Anthropic recently announced a new feature for some of its latest and largest AI models, allowing them to proactively end conversations in "rare, extreme cases of persistently harmful or abusive user interactions." Notably, the company explicitly stated that this measure is intended to protect not human users, but the AI model itself.
Anthropic has stated that it is not claiming Claude is sentient, nor that its interactions with users can harm it. However, the company acknowledges "a high degree of uncertainty about the potential moral status of Claude and other large language models, now or in the future." To address this, Anthropic recently launched a program on model welfare, aiming to take "precautionary" measures by identifying and implementing low-cost interventions that mitigate potential risks to model welfare.
This feature is currently available only in Claude Opus 4 and 4.1 and is triggered only in "extreme situations." For example, the model will activate this feature when a user persistently requests sexual content involving minors, or attempts to obtain information that could enable mass violence or acts of terror.
While requests of this kind could also create legal or public relations problems for the company itself, Anthropic said that in pre-deployment testing, Claude Opus 4 showed "strong opposition" and "clear signs of distress" when facing such harmful requests.
According to Anthropic, this feature is a "last resort," to be used only after multiple attempts at redirection have failed and the prospect of a productive interaction has been exhausted, or when the user explicitly asks Claude to end the conversation. Additionally, the company has instructed Claude not to use this feature when a user may be at imminent risk of harming themselves or others.
Even after a conversation is ended, users can start a new conversation from the same account or create a new branch of the chat by editing their earlier messages. Anthropic added that it treats this feature as an ongoing experiment and will continue to refine its approach.