Claude AI will end "persistently harmful or abusive" user interactions
Source: https://www.theverge.com/news/760561/anthropic-claude-ai-chatbot-end-harmful-conversations
Summary:
Claude chatbot adds an "end conversation" feature to handle requests for harmful content
AI company Anthropic has rolled out an "end conversation" capability for its Claude chatbot (Opus 4 and 4.1 models). When a user repeatedly asks for harmful material, such as content involving violence, terrorism, or minors, the system will cut off the conversation after multiple refusals and redirection attempts have failed. The company says the move is intended to protect the "potential welfare" of its AI models by keeping them out of interactions that cause "apparent distress."
Testing showed that Claude has a "robust and consistent aversion" to harmful content, and the end-conversation mechanism is triggered in extreme cases such as sexual content involving minors or guidance for violent and terrorist acts. Once a conversation has been ended, the user can no longer send messages in that thread, but can start a new chat or edit and resend earlier messages. Anthropic stresses that the cases triggering this feature are "extreme edge cases" and that ordinary discussion of controversial topics will not be restricted.
Notably, when a user shows signs of self-harm or of posing imminent harm to others, the conversation is kept open. Anthropic has partnered with Throughline, an online crisis support provider, to refine its responses to self-harm-related conversations.
Anthropic has also tightened Claude's usage policy: users are explicitly barred from using Claude to develop biological, nuclear, chemical, or radiological weapons, or to write malicious code or carry out network attacks. The change reflects the industry's continued attention to safety and ethics as AI technology advances rapidly.
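To make the reported behavior concrete, here is a minimal, purely hypothetical client-side sketch in TypeScript. It is not Anthropic's API: the Conversation type, the ended flag, and the helper functions are invented solely to illustrate the three rules described above (an ended thread rejects new messages, new chats remain available, and editing an earlier message forks a fresh thread).

```typescript
// Hypothetical sketch of the client-side rules described in the article.
// None of these types or functions are Anthropic's actual API.

type Message = { role: "user" | "assistant"; content: string };

interface Conversation {
  id: string;
  messages: Message[];
  ended: boolean; // set when the model ends the conversation as a "last resort"
}

function sendMessage(conv: Conversation, content: string): Conversation {
  if (conv.ended) {
    // An ended thread is read-only: no further user messages are accepted.
    throw new Error("This conversation was ended and no longer accepts messages.");
  }
  return { ...conv, messages: [...conv.messages, { role: "user", content }] };
}

function newConversation(id: string): Conversation {
  // Starting a fresh chat is always allowed, even right after one was ended.
  return { id, messages: [], ended: false };
}

function editAndRetry(conv: Conversation, index: number, content: string): Conversation {
  // Editing an earlier message forks a new, un-ended thread from that point,
  // which is how a user could continue a topic after a termination.
  const history = conv.messages.slice(0, index);
  return {
    id: `${conv.id}-retry`,
    messages: [...history, { role: "user", content }],
    ended: false,
  };
}
```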
English source:
Anthropic’s Claude AI chatbot can now end conversations deemed “persistently harmful or abusive,” as spotted earlier by TechCrunch. The capability is now available in Opus 4 and 4.1 models, and will allow the chatbot to end conversations as a “last resort” after users repeatedly ask it to generate harmful content despite multiple refusals and attempts at redirection. The goal is to help the “potential welfare” of AI models, Anthropic says, by terminating types of interactions in which Claude has shown “apparent distress.”
If Claude chooses to cut a conversation short, users won’t be able to send new messages in that conversation. They can still create new chats, as well as edit and retry previous messages if they want to continue a particular thread.
During its testing of Claude Opus 4, Anthropic says it found that Claude had a “robust and consistent aversion to harm,” including when asked to generate sexual content involving minors, or provide information that could contribute to violent acts and terrorism. In these cases, Anthropic says Claude showed a “pattern of apparent distress” and a “tendency to end harmful conversations when given the ability.”
Anthropic notes that conversations triggering this kind of response are “extreme edge cases,” adding that most users won’t encounter this roadblock even when chatting about controversial topics. The AI startup has also instructed Claude not to end conversations if a user is showing signs that they might want to hurt themselves or cause “imminent harm” to others. Anthropic partners with Throughline, an online crisis support provider, to help develop responses to prompts related to self-harm and mental health.
Last week, Anthropic also updated Claude’s usage policy as rapidly advancing AI models raise more concerns about safety. Now, the company prohibits people from using Claude to develop biological, nuclear, chemical, or radiological weapons, as well as to develop malicious code or exploit a network’s vulnerabilities.