OpenAI表示,人工智能浏览器可能始终难以抵御提示注入攻击的威胁。

内容总结:
OpenAI承认AI浏览器面临长期安全挑战,业界共探防御之道
尽管OpenAI正全力加固其Atlas AI浏览器的安全防护,但公司近日坦言,一种名为“提示词注入”的网络攻击风险短期内难以根除。这种攻击通过隐藏在网页或邮件中的恶意指令操控AI代理,引发了业界对AI在开放网络环境中安全运行能力的广泛担忧。
OpenAI在本周一发布的博客中明确表示:“提示词注入,就像网络诈骗和社会工程学攻击一样,很可能永远无法被彻底‘解决’。”公司承认,ChatGPT Atlas的“智能体模式”确实“扩大了安全威胁的暴露面”。
自去年十月Atlas浏览器发布以来,安全研究人员已多次演示如何通过简单的网页文字篡改其底层行为。英国国家网络安全中心本月也警告称,针对生成式AI应用的提示词注入攻击“或许永远无法完全消除”,可能导致数据泄露风险。该机构建议安全从业者应着力降低风险影响,而非追求完全“阻断”攻击。
面对这一持久挑战,OpenAI采取了独特的防御策略:开发基于大语言模型的“自动化攻击者”。这个通过强化学习训练的模拟黑客,能在虚拟环境中反复测试攻击手段,并观察目标AI的推理过程与应对行动。公司表示,该模拟攻击者已能诱使AI执行需数十甚至数百步的复杂恶意流程,并发现了人类红队测试中未曾出现的新型攻击策略。
在近期演示中,更新后的系统已能成功拦截隐藏在恶意邮件中的注入指令,避免AI代理误发辞职信。OpenAI强调,公司正依靠大规模测试和快速补丁周期,在真实攻击发生前强化系统。
然而,网络安全公司Wiz的首席安全研究员拉米·麦卡锡指出,强化学习仅是应对策略的一部分。他认为,当前智能体浏览器处于“中等自主性”与“极高数据访问权限”的风险叠加区,许多安全建议实质是在平衡两者关系。OpenAI目前建议用户为AI代理设定明确指令、限制登录权限,并要求关键操作前必须获得用户确认。
麦卡锡提醒,对于大多数日常使用场景而言,智能体浏览器带来的价值尚不足以平衡其当前的风险水平。“尽管访问邮件、支付等敏感数据的能力使其功能强大,但这也正是高风险所在。这种平衡未来会演变,但目前的权衡仍然非常现实。”
OpenAI表示,防范提示词注入是Atlas的首要任务,公司自产品发布前就已与第三方合作加强防护。尽管未透露具体攻击成功率变化数据,但公司承诺将持续将防御体系建设视为长期的AI安全课题。
中文翻译:
就在OpenAI努力强化其Atlas AI浏览器的网络安全防护时,该公司承认提示词注入攻击——一种通过操纵AI智能体执行隐藏在网页或电子邮件中的恶意指令的攻击方式——将长期构成威胁,这引发了人们对AI智能体在开放网络中安全运行能力的质疑。
OpenAI在周一的博客文章中写道:"提示词注入攻击就像网络诈骗和社会工程学攻击一样,可能永远无法被彻底'解决'。"该公司详细说明了如何加强Atlas的防御体系以应对持续不断的攻击,同时承认ChatGPT Atlas的"智能体模式""扩大了安全威胁面"。
OpenAI于去年10月推出ChatGPT Atlas浏览器后,安全研究人员迅速发布演示案例,证明只需在Google文档中写入特定文字就能改变底层浏览器行为。同日,Brave浏览器公司发布博文指出,间接提示词注入是包括Perplexity旗下Comet在内的AI浏览器的系统性挑战。
OpenAI并非唯一意识到提示词注入攻击难以根除的机构。英国国家网络安全中心本月初警告称,针对生成式AI应用的提示词注入攻击"可能永远无法完全消除",这使网站面临数据泄露风险。该政府机构建议网络安全专业人员降低提示词注入攻击的风险和影响,而非试图"阻止"攻击。
OpenAI表示:"我们将提示词注入视为一项长期的AI安全挑战,需要持续加强防御。"面对这项西西弗斯式的艰巨任务,该公司采取主动快速的响应机制,称该机制在内部发现新型攻击策略方面已显现初步成效,有助于在攻击实际发生前进行防范。
这与Anthropic、谷歌等竞争对手的策略并无本质区别:为应对持续的提示词攻击风险,必须建立多层次防御体系并持续进行压力测试。例如谷歌近期工作重点就集中在智能体系统的架构与策略层面控制。
但OpenAI的独特策略在于其"基于大语言模型的自动化攻击程序"。该程序本质上是通过强化学习训练的机器人,扮演黑客角色寻找向AI智能体植入恶意指令的途径。它能在仿真环境中测试攻击方案,模拟目标AI的思维过程和应对行动,通过分析响应不断优化攻击方式反复尝试。由于掌握目标AI的内部推理机制(外部攻击者无法获取),理论上该程序能比真实攻击者更快发现漏洞。
这是AI安全测试的常用策略:构建智能体来发现边界案例并在仿真环境中快速测试。OpenAI表示:"经过强化学习的攻击程序能诱导智能体执行数十步甚至数百步的复杂恶意工作流程,我们还发现了人类红队测试和外部报告中未曾出现的新型攻击策略。"
在演示案例中,OpenAI展示了自动化攻击程序如何将恶意电子邮件植入用户收件箱。当AI智能体扫描收件箱时,它遵循邮件中的隐藏指令发送了辞职信而非起草休假回复。但据该公司称,安全更新后"智能体模式"已能成功检测提示词注入企图并向用户发出警告。
OpenAI表示,虽然难以完全杜绝提示词注入攻击,但正依靠大规模测试和快速补丁周期在真实攻击发生前强化系统。该公司发言人未透露Atlas安全更新是否显著降低了成功注入率,但表示自产品发布前就已与第三方合作加强防护。
网络安全公司Wiz的首席安全研究员拉米·麦卡锡指出,强化学习虽能持续适应攻击者行为,但仅是解决方案的一部分。他告诉TechCrunch:"评估AI系统风险的有效方法是自主性乘以访问权限。智能体浏览器正处于这一领域的挑战区:中等自主性与极高访问权限的结合。当前多数建议都反映了这种权衡——限制登录权限主要降低风险敞口,而要求确认请求审核则约束了自主性。"
这些正是OpenAI为用户降低风险提出的两项建议。发言人表示Atlas也经过训练,会在发送消息或支付前寻求用户确认。OpenAI还建议用户给予智能体具体指令,而非提供收件箱访问权限后笼统要求"采取必要措施"。该公司指出:"过宽的权限范围会使隐藏或恶意内容更容易影响智能体,即使已部署安全防护措施。"
尽管OpenAI表示保护Atlas用户免受提示词注入攻击是首要任务,麦卡锡仍对高风险浏览器的投资回报率持审慎态度。他告诉TechCrunch:"对大多数日常使用场景而言,智能体浏览器尚未提供足够价值以平衡当前风险水平。鉴于其对电子邮件和支付信息等敏感数据的高访问权限(这也正是其强大之处),风险确实很高。这种平衡将会发展,但目前的权衡仍然非常现实。"
英文来源:
Even as OpenAI works to harden its Atlas AI browser against cyberattacks, the company admits that prompt injections, a type of attack that manipulates AI agents to follow malicious instructions often hidden in web pages or emails, is a risk that’s not going away anytime soon — raising questions about how safely AI agents can operate on the open web.
“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,’” OpenAI wrote in a Monday blog post detailing how the firm is beefing up Atlas’ armor to combat the unceasing attacks. The company conceded that “agent mode” in ChatGPT Atlas “expands the security threat surface.”
OpenAI launched its ChatGPT Atlas browser in October, and security researchers rushed to publish their demos, showing it was possible to write a few words in Google Docs that were capable of changing the underlying browser’s behavior. That same day, Brave published a blog post explaining that indirect prompt injection is a systematic challenge for AI-powered browsers, including Perplexity’s Comet.
OpenAI isn’t alone in recognizing that prompt-based injections aren’t going away. The U.K.’s National Cyber Security Centre earlier this month warned that prompt injection attacks against generative AI applications “may never be totally mitigated,” putting websites at risk of falling victim to data breaches. The U.K. government agency advised cyber professionals to reduce the risk and impact of prompt injections, rather than think the attacks can be “stopped.”
For OpenAI’s part, the company said: “We view prompt injection as a long-term AI security challenge, and we’ll need to continuously strengthen our defenses against it.”
The company’s answer to this Sisyphean task? A proactive, rapid-response cycle that the firm says is showing early promise in helping discover novel attack strategies internally before they are exploited “in the wild.”
That’s not entirely different from what rivals like Anthropic and Google have been saying: that to fight against the persistent risk of prompt-based attacks, defenses must be layered and continuously stress-tested. Google’s recent work, for example, focuses on architectural and policy-level controls for agentic systems.
But where OpenAI is taking a different tact is with its “LLM-based automated attacker.” This attacker is basically a bot that OpenAI trained, using reinforcement learning, to play the role of a hacker that looks for ways to sneak malicious instructions to an AI agent.
The bot can test the attack in simulation before using it for real, and the simulator shows how the target AI would think and what actions it would take if it saw the attack. The bot can then study that response, tweak the attack, and try again and again. That insight into the target AI’s internal reasoning is something outsiders don’t have access to, so, in theory, OpenAI’s bot should be able to find flaws faster than a real-world attacker would.
It’s a common tactic in AI safety testing: build an agent to find the edge cases and test against them rapidly in simulation.
“Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” wrote OpenAI. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”
In a demo (pictured in part above), OpenAI showed how its automated attacker slipped a malicious email into a user’s inbox. When the AI agent later scanned the inbox, it followed the hidden instructions in the email and sent a resignation message instead of drafting an out-of-office reply. But following the security update, “agent mode” was able to successfully detect the prompt injection attempt and flag it to the user, according to the company.
The company says that while prompt injection is hard to secure against in a foolproof way, it’s leaning on large-scale testing and faster patch cycles to harden its systems before they show up in real-world attacks.
An OpenAI spokesperson declined to share whether the update to Atlas’ security has resulted in a measurable reduction in successful injections, but says the firm has been working with third parties to harden Atlas against prompt injection since before launch.
Rami McCarthy, principal security researcher at cybersecurity firm Wiz, says that reinforcement learning is one way to continuously adapt to attacker behavior, but it’s only part of the picture.
“A useful way to reason about risk in AI systems is autonomy multiplied by access,” McCarthy told TechCrunch.
“Agentic browsers tend to sit in a challenging part of that space: moderate autonomy combined with very high access,” said McCarthy. “Many current recommendations reflect that trade-off. Limiting logged-in access primarily reduces exposure, while requiring review of confirmation requests constrains autonomy.”
Those are two of OpenAI’s recommendations for users to reduce their own risk, and a spokesperson said Atlas is also trained to get user confirmation before sending messages or making payments. OpenAI also suggests that users give agents specific instructions, rather than providing them access to your inbox and telling them to “take whatever action is needed.”
“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” per OpenAI.
While OpenAI says protecting Atlas users against prompt injections is a top priority, McCarthy invites some skepticism as to the return on investment for risk-prone browsers.
“For most everyday use cases, agentic browsers don’t yet deliver enough value to justify their current risk profile,” McCarthy told TechCrunch. “The risk is high given their access to sensitive data like email and payment information, even though that access is also what makes them powerful. That balance will evolve, but today the trade-offs are still very real.”
文章标题:OpenAI表示,人工智能浏览器可能始终难以抵御提示注入攻击的威胁。
文章链接:https://qimuai.cn/?post=2577
本站文章均为原创,未经授权请勿用于任何商业用途