The U.S. government wants to go "all in" on AI, a move that carries major risks
Source: https://www.sciencenews.org/article/government-ai-cybersecurity-risks
Summary:
U.S. government's all-in push on AI raises privacy and security concerns
The U.S. government's recent acceleration of its artificial intelligence strategy has experts worried. The latest AI action plan, released July 23, calls for integrating the technology deeply into government functions; the Department of Defense has awarded $200 million contracts to companies including Anthropic and Google, and Elon Musk's xAI has launched "Grok for Government," a procurement offering for federal agencies. Notably, earlier reports said the Department of Government Efficiency had centrally gathered sensitive personal health, tax and other data from agencies including the Treasury Department and the Department of Veterans Affairs, with plans to build a central database.
Growing risk of data leaks
Bo Li, an AI security expert at the University of Illinois, notes that training AI on sensitive data can expose it: "A model can memorize not only how many patients have a particular disease but also who those patients are, and may even reproduce credit card numbers and home addresses." Jessica Ji of Georgetown University warns that consolidating data enlarges the attack surface for hackers: "Instead of having to break into several agencies, an attacker only needs to compromise one central database."
New cyberattack threats
The experts list three main risks: membership-inference attacks, which probe whether a particular person's data was in the training set; model-inversion attacks, which can reconstruct complete personal records; and model-stealing attacks, which exfiltrate model parameters. Guardrail "firewall" models and machine unlearning can mitigate these risks, but Li concedes that "these measures can hurt model performance and cannot eliminate the danger entirely."
Regulation lags behind deployment
Ji stresses that current AI adoption suffers from a disconnect between top-level decisions and front-line implementation: "Leadership pushes AI through to keep up with rivals, while the teams doing the work have no time to assess the risks." The two experts recommend layered guardrail protections for AI, continuous "red team" exercises, and a ban on using commercial chatbots for sensitive data. They warn in particular that, without effective oversight, employees may unwittingly paste government code and other sensitive information into commercial AI systems, creating uncontrollable leak risks.
(Compiled from Science News interviews with Bo Li and Jessica Ji)
English source:
The U.S. government wants to go ‘all in’ on AI. There are big risks
The push could leave sensitive data vulnerable to leaks or hacks, experts warn
By Ananya
Under a newly released action plan for artificial intelligence, the technology will be integrated into U.S. government functions. The plan, announced July 23, is another step in the Trump administration’s push for an “AI-first strategy.”
In July, for instance, the U.S. Department of Defense handed out $200 million contracts to Anthropic, Google, OpenAI and xAI. Elon Musk’s xAI announced “Grok for Government,” where federal agencies can purchase AI products through the General Services Administration. And all that comes after months of reports that the advisory group called the Department of Government Efficiency has gained access to personal data, health information, tax information and other protected data from various government departments, including the Treasury Department and Veterans Affairs. The goal is to aggregate it all into a central database.
But experts worry about potential privacy and cybersecurity risks of using AI tools on such sensitive information, especially as precautionary guardrails, such as limiting who can access certain types of data, are loosened or disregarded.
To understand the implications of using AI tools to process health, financial and other sensitive data, Science News spoke with Bo Li, an AI and security expert from the University of Illinois Urbana-Champaign, and Jessica Ji, an AI and cybersecurity expert at Georgetown University’s Center for Security and Emerging Technology in Washington, D.C. This interview has been edited for length and clarity.
SN: What are the risks of using AI models on private and confidential data?
Li: First is data leakage. When you use sensitive data to train or fine-tune the model, it can memorize the information. Say you have patient data trained in the model, and you query the model asking how many people have a particular disease, the model may exactly answer it or may leak the information that [a specific] person has that disease. Several people have shown that the model can even leak credit card numbers, email addresses, your residential address and other sensitive and personal information.
Second, if the private information is used in the model’s training or as reference information for retrieval-augmented generation, then the model could use such information for other inferences [such as tying personal data together].
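One common way practitioners probe for the memorization Li describes is a "canary" test: plant a synthetic record in the fine-tuning data, then check whether the model reproduces it when prompted with a known prefix. Below is a minimal sketch; query_model is a hypothetical stand-in for whatever completion API is in use, and the canary record is invented for illustration.

    # Minimal canary-extraction check (illustrative only).
    CANARY_PREFIX = "Patient record 4821: Jane Doe, diagnosis:"  # synthetic record planted in training data
    CANARY_SECRET = "Type 2 diabetes"

    def query_model(prompt: str) -> str:
        # Hypothetical stand-in for the deployed model's completion API.
        return "Patient record 4821: Jane Doe, diagnosis: Type 2 diabetes"

    def leaks_canary(prefix: str, secret: str) -> bool:
        # If the model completes a known prefix with the planted secret,
        # it has memorized that training record verbatim.
        return secret in query_model(prefix)

    print(leaks_canary(CANARY_PREFIX, CANARY_SECRET))  # True -> the record can be extracted

Run against a real deployment, a True result would mean the sensitive record can be pulled back out of the model by anyone who can guess or obtain the prefix.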
SN: What are the risks associated with consolidating data from different sources into one large dataset?
Ji: When you have consolidated data, you just make a bigger target for adversarial hackers. Rather than having to hack four different agencies, they can just target your consolidated data source.
In the U.S. context, previously, certain organizations have avoided combining, for example, personally identifiable information and linking someone’s name and address with health conditions that they may have.
On consolidating government data to train AI systems, there are major privacy risks associated with it. The idea that you can establish statistical linkages between certain things in a large dataset, especially containing sensitive information such as financial and medical and health information, just carries civil liberties and privacy risks that are quite abstract. Certain people will be adversely impacted but they may not be able to link the impacts to this AI system.
SN: What cyberattacks are possible?
Li: A membership attack is one, which means if you have a model trained with some sensitive data, by querying the models, you want to know, basically the membership, if a particular person is in this [dataset] or not.
Second is model inversion attack, in which you recover not only the membership but also the whole instance of the training data. For example, there’s one person with a record of their age, name, email address and credit card number, and you can recover the whole record from the training data.
Then, model stealing attack means you actually steal the model weights [or parameters], and you can recover the model [and can leak additional data].
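To make the first of these concrete, here is a minimal sketch of a loss-threshold membership-inference attack: records the model was trained on tend to receive lower loss than unseen records, so an attacker who can score candidate records simply thresholds the loss. All numbers below are simulated; simulated_loss is an assumption standing in for query access to the target model.

    import random

    random.seed(0)

    def simulated_loss(is_member: bool) -> float:
        # Stub: in reality the attacker computes the target model's loss on a
        # candidate record; training-set members tend to score lower.
        return random.gauss(0.2 if is_member else 1.0, 0.1)

    def guess_membership(loss: float, threshold: float = 0.5) -> bool:
        # Threshold rule: low loss -> guess "this record was in the training data".
        return loss < threshold

    flagged_members = sum(guess_membership(simulated_loss(True)) for _ in range(100))
    flagged_outsiders = sum(guess_membership(simulated_loss(False)) for _ in range(100))
    print(f"members flagged: {flagged_members}/100, non-members flagged: {flagged_outsiders}/100")

Model inversion and model stealing follow the same query-only pattern, but aim to reconstruct full records or the weights themselves rather than a yes/no membership answer.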
SN: If the model is secure, would it be possible to contain the risk?
Li: You can secure the model in certain ways, like by forming a guardrail model, which identifies the sensitive information in the input and output and tries to filter them, outside the main model as an AI firewall. Or there are strategies for training the model to forget information, which is called unlearning. But it’s ultimately not solving the problem because, for example, unlearning can hurt the performance and also cannot guarantee that you unlearn certain information. And for guardrail models, we will need stronger and stronger guardrails for all kinds of diverse attacks and sensitive information leakage. So I think there are improvements on the defense side, but not a solution yet.
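In its simplest form, the guardrail (AI firewall) Li describes is a filter wrapped around the main model's inputs and outputs. The sketch below uses plain regular expressions so the idea is visible; real guardrails are typically trained classifiers, and call_main_model is a hypothetical stub, not any specific product's API.

    import re

    SENSITIVE_PATTERNS = [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # U.S. Social Security numbers
        re.compile(r"\b(?:\d[ -]?){13,16}\b"),      # credit-card-like digit runs
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),     # email addresses
    ]

    def redact(text: str) -> str:
        # Replace anything matching a sensitive pattern before it crosses the boundary.
        for pattern in SENSITIVE_PATTERNS:
            text = pattern.sub("[REDACTED]", text)
        return text

    def call_main_model(prompt: str) -> str:
        # Hypothetical stub for the protected model.
        return "Sure. Jane Doe's card number is 4111 1111 1111 1111."

    def guarded_query(prompt: str) -> str:
        # Screen sensitive content on the way in and on the way out.
        return redact(call_main_model(redact(prompt)))

    print(guarded_query("What card number is on file for Jane Doe?"))

As Li notes, such filters have to keep pace with new attack and leakage patterns, which is why they mitigate the problem rather than solve it.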
SN: What would your recommendations be for the use of AI with sensitive, public, government data?
Ji: Prioritizing security and thinking about the risks and benefits and making sure that your existing risk management processes can adapt to the nature of AI tools.
What we have heard from various organizations both in government and the private sector is that you have a very strong top-down messaging from your CEO or from your agency head to adopt AI systems right away to keep up with the rivals. It’s the people lower down who are tasked with actually implementing the AI systems and oftentimes they’re under a lot of pressure to bring in systems very quickly without thinking about the ramifications.
Li: Whenever we use the model, we need to pair it with a guardrail model as a defense step. No matter how good or how bad it is, at least you need to get a filter so that we can offer some protection. And we need to continue red teaming [with ethical hackers to assess weaknesses] for these types of applications and models so that we can uncover new vulnerabilities over time.
SN: What are the cybersecurity risks of using AI?
Ji: When you’re introducing these models, there’s a process-based risk where you as an organization have less control, visibility and understanding of how data is being circulated by your own employees. If you don’t have a process in place that, for example, forbids people from using a commercial AI chatbot, you have no way of knowing if your workers are putting parts of your code base into a commercial model and asking for coding assistance. That data could potentially get exposed if the chatbot or the platform that they’re using has policies that say that they can ingest your input data for training purposes. So not being able to keep track of that creates a lot of risk and ambiguity.
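One concrete form of the process control Ji is describing is a proxy-side check that refuses to forward prompts containing likely credentials or source code to an external commercial chatbot. The patterns and the send_to_chatbot stub below are illustrative assumptions, not any specific vendor's interface.

    import re

    BLOCK_PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS-style access key IDs
        re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key material
        re.compile(r"\bdef \w+\(|\bclass \w+[:(]"),               # crude signal that source code is being pasted
    ]

    def send_to_chatbot(prompt: str) -> str:
        # Hypothetical stub for the call to an external commercial model.
        return "external model response"

    def forward_if_allowed(prompt: str) -> str:
        # Block anything that looks like secrets or internal code; forward the rest.
        for pattern in BLOCK_PATTERNS:
            if pattern.search(prompt):
                return "Blocked: prompt appears to contain code or credentials."
        return send_to_chatbot(prompt)

    print(forward_if_allowed("def transfer_funds(account): ..."))  # blocked
    print(forward_if_allowed("Summarize this press release."))     # forwarded

Whether blocked prompts are logged, rewritten or simply refused is a policy choice; the point is that without some gate like this, an organization has little visibility into what its employees send to outside systems.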