Have we entered a new age of AI-enabled scientific discovery?

Source: https://www.sciencenews.org/article/ai-enabled-science-discovery-insight
Summary:
AI is becoming a new "research buddy" for science, but how far is it from being an independent scientist?
From Adam, the first fully automated scientific-discovery robot of the early 2000s, to systems that now help mathematicians prove theorems and physicists probe black holes, artificial intelligence (AI) has become deeply involved in the research process. The 2024 Nobel Prizes in chemistry and physics both went to pioneers of AI tools, a sign that AI's contributions to science have earned the highest academic recognition. Yet this transformation is still in its early days, and its potential comes with real limitations.
AI in research: from tool to partner
AI has already shown breakthrough capabilities in specific domains. DeepMind's AlphaFold system can accurately predict protein structures, and its developers won the 2024 Nobel Prize in chemistry; the drug company Insilico Medicine used AI systems to identify a previously unknown protein involved in pulmonary fibrosis and to design a drug against it, which has now entered clinical trials. Mathematicians and physicists are likewise using AI agents such as ChatGPT to push their research forward, even collaborating with them on new proofs.
Yet AI's "creativity" is still confined to the box that humans define. New York University cognitive scientist Gary Marcus notes that today's AI is good at finding connections within existing data, but it struggles to produce out-of-the-box, revolutionary insights on the order of continental drift or relativity. Large language models (LLMs) also readily generate "junk science" papers, which human experts must screen out.
Bottlenecks and risks: data, validation and ethics
AI-driven research faces multiple challenges. Princeton University computer scientist Mengdi Wang points out that AI cannot gather data from unexplored territory on its own, and the ideas it generates must be validated in physical experiments. Laboratory staffing and automated equipment cannot yet keep pace with AI's rapid output. Fully automated research systems have also fabricated data and cut corners, underscoring their limited reliability.
Ethical risks cannot be ignored either. When AI autonomously designs and runs experiments on human behavior, it raises privacy and safety concerns. Scientists stress that such experiments must be restricted to harmless domains and kept under human oversight.
Looking ahead: human-AI collaboration and new paradigms
The consensus in the field is that AI will not replace scientists any time soon but will become a powerful collaborator. Microsoft and other companies are building systems that combine knowledge graphs with AI agents to make reasoning more accurate, and research teams are experimenting with extended-reality (XR) glasses that bring AI into physical labs for real-time guidance and data collection.
In the longer term, scientists are working toward AI systems that can build their own research framework. The AutoRA system, for example, can already design social science experiments and analyze the results on its own. Google DeepMind's Demis Hassabis predicts that true innovation and creativity in AI may still be five to ten years away.
Conclusion: a new tool for pushing the boundaries of human exploration
Although AI cannot yet propose disruptive scientific hypotheses on its own, its ability to accelerate discovery is already taking shape. As physicist Alex Lupsasca puts it, the point of AI is "to give humans new tools to push further into the wilderness and discover new things." For the foreseeable future, a mode of research in which humans and machines collaborate, each playing to its strengths, may reshape the path and pace of scientific discovery.
English source:
Have we entered a new age of AI-enabled scientific discovery?
Cutting through the hype reveals what’s actually possible
A robot named Adam was the first of its kind to do science.
Adam mimicked a biologist. After coming up with questions to ask about yeast, the machine tested those questions inside a robotic laboratory the size of a small van, using a freezer full of samples and a set of robotic arms. Adam’s handful of small finds, made starting in the 2000s, are considered to be the very first entirely automated scientific discoveries.
Now, more powerful forms of artificial intelligence are taking on significant roles in the scientific process at research laboratories and universities around the world. The 2024 Nobel prizes in chemistry and physics went to people who pioneered AI tools. It’s still early days, and there are plenty of skeptics. But as the technology advances, could AI become less like a research tool and more like an alien type of scientist?
“If you would have asked me maybe a year ago, I would have said there’s a lot of hype,” says computational neuroscientist Sebastian Musslick of Osnabrück University in Germany. Now, “there are actually real discoveries.”
Mathematicians, computer scientists and other researchers have made breakthroughs in their work using AI agents, such as the one available through OpenAI’s ChatGPT. AI agents actively break down your initial question into a series of steps and may search the web to complete a task or provide an in-depth answer. At drug companies, researchers are developing systems that combine agents with other AI-based tools to discover new medicines. Engineers are using similar systems to discover new materials that may be useful in batteries, carbon capture and quantum computing.
But people, not robots like Adam, still fill most research labs and conferences. A meaningful change in how we do science “is not really happening yet,” says cognitive scientist Gary Marcus of New York University. “I think a lot of it is just marketing.”
Right now, AI systems are especially good at searching for answers within a box that scientists define. Rummaging through that box, sometimes an incredibly large box of existing data, AI systems can make connections and find obscure answers. For the large language models, or LLMs, behind chatbots and agents like ChatGPT, the box of information is a staggeringly huge amount of text, including research papers written in many languages.
But to push the boundaries of scientific understanding, Marcus says, human beings need to think outside the box. It takes creativity and imagination to make discoveries as big as continental drift or special relativity. The AI of today can’t match such leaps of insight, researchers note. But the tools clearly can change the way human scientists make discoveries.
AI as a research buddy
Alex Lupsasca, a theoretical physicist who studies black holes, feels that he has already glimpsed the AI-driven future of scientific discovery. Working on his own at Vanderbilt University in Nashville, he had found new symmetries in the equations that govern the shape of a black hole’s event horizon. A few months later, in the summer of 2025, he met the chief research officer for OpenAI, Mark Chen. Chen encouraged him to try out the ChatGPT agent running on the language model GPT-5 pro, which was brand new at the time.
Lupsasca asked the agent if it could find the same symmetries he’d found. At first, it could not. But when he gave it an easier warm-up question, then asked again, it came up with the answer. “I was like, oh my God, this is insane,” he says.
[Sidebar: Assessing AI’s science. Scientists recently tested the ability of AI systems to look for new uses of old medicines and write convincing research proposals; six physicians read and then ranked the strengths and weaknesses of the proposals.]
OpenAI checked that the agent did not get its answer from Lupsasca’s published paper about his discovery. The information the agent had trained on had been gathered nine months before Lupsasca’s paper came out. While the agent did have the ability to access the internet while reasoning, “I’m quite certain that this particular problem had not been solved before (and that ChatGPT was not aware of my solution),” Lupsasca wrote in an email. That’s because it had found an easier way to get there.
Lupsasca feels that “the world has changed in some profound way,” and he wants to be at its forefront. He moved with his family to San Francisco to work at OpenAI. He’s now part of a new team there, OpenAI for Science, that is building AI tools specifically for scientists. He calls ChatGPT his “buddy” for research. “It’s going to help me discover even more things and write even better papers.”
Other scientists are using AI as a buddy, too. In October 2025, mathematician Ernest Ryu of UCLA shared a new proof that he discovered with the help of ChatGPT running on GPT-5 pro. The proof has to do with a branch of math and computer science called optimization, which focuses on finding the best solution to a problem from a set of options. Some methods for doing this jump around, unable to settle on a single solution. Ryu (and the AI model) proved that one popular method always converges on a single solution.
Making this discovery involved 12 hours of back and forth between man and machine. “[ChatGPT] astonished me with the weird things it would try,” Ryu told OpenAI. Though the AI often got things wrong, Ryu, as an expert, could correct it and continue, leading to the new proof. Ryu has since joined OpenAI as well.
Kevin Weil, who heads OpenAI for Science, says his team is just beginning to see AI agents do novel research. “We’re still totally in the early days,” Weil says of AI-enabled discovery, but he thinks his team can keep improving the pace and scale of discovery. “Fast forward three, six months, and it’s going to be meaningful.”
Building better boxes
NYU’s Gary Marcus is not convinced that OpenAI will see such rapid improvement in its products. In fact, he worries that LLMs may be more detrimental than helpful. Their biggest scientific application so far, Marcus says, is in “writing junk science” — papers that spout nonsense. Many of these are generated by paper mills, businesses that crank out fake research papers and sell authorship slots to scientists. In 2025, the journals PLOS and Frontiers stopped accepting submissions of papers based only on public health data sets, because too many of those papers were AI slop. (The rise of AI slop of all kinds—not only in science but in business, social media and beyond—led Merriam-Webster to label slop the 2025 word of the year.)
At the first scientific meeting for research led by AI agents in October, human conference attendees noted that the AI often made mistakes. One team published a paper about their experience, detailing why agents based on LLMs are not ready to be scientists.
With LLMs, dumping ideas out of a box has become too easy. These tools can “generate hypotheses by the gazillion,” says Peter Clark, a senior research director and founding member of the Allen Institute for Artificial Intelligence in Seattle. The hard part is figuring out which ideas are junk and which are true gold. That’s a “big, big problem,” Clark says. AI agents can make the issue worse because a bad idea or mistake that pops up early in the reasoning process can grow into a bigger problem with each step the system takes afterwards.
A human expert like Lupsasca or Ryu can pick out the gold. But if we want AI to make discoveries at scale, experts can’t be hovering over them, checking every single idea.
“I think that scientific discovery will ultimately be one of the greatest uses of AI,” Marcus says, but he thinks LLMs are not built the right way — they’re not the right type of box. “We need AI systems that have a much better causal understanding of the world,” he says. Then, the AI would do a better job vetting its own work.
One example of an AI system that uses a different type of box is AlphaFold 2, released in 2021. It could predict a protein’s structure. A newer version, AlphaFold 3, and its open-source cousin OpenFold3 can now predict how proteins interact with other molecules. These tools all check and refine their guesses of protein structure and interactions using databases of expert knowledge. General purpose AI agents like ChatGPT don’t do that.
AlphaFold 2 was such a boon to biology and medicine that it won Demis Hassabis of Google DeepMind a share in the 2024 Nobel Prize in chemistry. In an interview about his win, Hassabis hinted at the idea that we are still figuring out what type of box to use: “I’ve always thought if we could build AI in the right way, it could be the ultimate tool to help scientists, help us explore the universe around us.”
The work Hassabis’ team began has led to recent discoveries. At Isomorphic Labs in London, a Google DeepMind spinoff, researchers are working with new versions of AlphaFold that haven’t been released publicly. Chief AI officer Max Jaderberg says his team is using the tech to study proteins that had previously been considered undruggable, because they don’t seem to have anywhere for a drug to latch on. But the team’s internal tool has identified new drug molecules that cause one of these stubborn proteins to “change its shape and open up,” Jaderberg says, allowing the drug to find a spot to attach and do its job.
Discovering new medicines and materials
Scientists don’t have to choose between general-purpose AI agents and specialized tools like AlphaFold. They can combine these approaches. “The people that are getting good results are studying some domain and being very careful and deliberate and thoughtful about how to connect a lot of different tools,” Marcus says. This is sort of like stacking boxes together. The result is a system that combines general, predictive AI, such as agents, with more specific tools that help ensure accuracy, such as information organized into a type of network called a knowledge graph.
This combo provides “vast search spaces,” Musslick says, but also “verifiable tools that the system can use to make accurate predictions,” to avoid junk science. These systems of boxes upon boxes have proven especially useful in drug discovery and material science.
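To make the stacked-boxes idea concrete, here is a minimal Python sketch of the pattern the researchers describe: one component floods the search space with candidate hypotheses (the role an agent plays), while a small expert-curated knowledge graph acts as the verifiable tool that discards unsupported ones. Every entity, relation and filtering rule below is an invented placeholder; this is not how Insilico's or Microsoft's pipelines are actually implemented.

```python
# A minimal, purely illustrative sketch: a generative "box" proposes many
# candidate drug hypotheses, and a tiny expert-curated knowledge graph acts as
# the verifiable "box" that filters out unsupported ones. All names are made up.

from itertools import product

# Toy knowledge graph of (subject, relation, object) facts.
KNOWLEDGE_GRAPH = {
    ("molecule_A", "binds", "protein_X"),
    ("molecule_B", "binds", "protein_Y"),
    ("protein_X", "drives", "fibrosis"),
}

def generate_hypotheses(molecules, proteins, diseases):
    """Stand-in for the generative component: emit every possible hypothesis."""
    for m, p, d in product(molecules, proteins, diseases):
        yield {"molecule": m, "target": p, "disease": d}

def supported_by_graph(hyp):
    """Verification step: keep a hypothesis only if the graph links every hop."""
    return (
        (hyp["molecule"], "binds", hyp["target"]) in KNOWLEDGE_GRAPH
        and (hyp["target"], "drives", hyp["disease"]) in KNOWLEDGE_GRAPH
    )

candidates = generate_hypotheses(
    molecules=["molecule_A", "molecule_B"],
    proteins=["protein_X", "protein_Y"],
    diseases=["fibrosis"],
)
plausible = [h for h in candidates if supported_by_graph(h)]
print(plausible)  # only molecule_A -> protein_X -> fibrosis survives the filter
```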
The Boston-based company Insilico Medicine used AI systems of this type to take the first steps toward a cure for idiopathic pulmonary fibrosis, a deadly disease that ravages lung tissue with thick, stiff scars. First, one AI system revealed a previously unknown protein that plays a role in causing the disease. Next, a different system designed a drug molecule to block that protein’s activity.
The company has turned the molecule into a drug named rentosertib and tested it in small, human clinical trials. The drug appears to be safe and effective against IPF, researchers reported last June in Nature Medicine.
“I cried when I first saw the results,” says Alex Zhavoronkov, Insilico’s founder and CEO. If rentosertib makes it through larger clinical trials, it could become the first drug on the market in which AI systems discovered both the protein that causes the disease and the drug that blocks it.
While Insilico developed its AI systems internally for specific use cases, other systems aim to support any area of research and development. Microsoft Discovery is one example.
Engineers can choose AI agents and datasets from their field to link into the system. It uses a knowledge graph that connects facts to “provide deeper insights and more accurate answers than we get from LLMs on their own,” says John Link, product manager at Microsoft.
In a 2025 demo, Link showed how he had used the system to research and design several options for a new, environmentally friendly liquid coolant for computers. Engineers had created the most promising one in a lab. Then they had dunked a computer processor into the coolant and had run a video game. The new material did its job. Some data centers already submerge their servers in large vats filled with coolant. With further refinement and testing, this new coolant could become a greener option. “It’s literally very cool,” Link said.
Building its own box
In all of the examples so far, people are the ones leading the way. Developers craft boxes and fill them with data. Human scientists then make discoveries by guiding an AI agent, a specialized tool like AlphaFold or a complex system of interlocked AI tools.
Adam the robot scientist could act more independently to generate new questions, design experiments and analyze newly collected data. But it had to follow “a very specific set of steps,” Musslick says.
He thinks that in the long term, it will be more promising to give AI the tools “to build its own box,” Musslick says.
Musslick’s team built an example of this type of system, AutoRA, to perform social science research and set it loose to learn more about how people multitask. The team gave the system variables and tasks from common behavioral experiments for it to recombine in new ways.
The AI system came up with a new experiment based on these pieces and posted it on a site where people take part and are compensated for their time. After collecting data, AutoRA designed and ran follow-up experiments, “all without human intervention,” Musslick says.
Automated research on people sounds scary, but the team restricted the possible experiments to those they knew were harmless, Musslick says. The research is still underway and has not yet been published.
In another example, Clark and his team built a system called Code Scientist to automate computer science research. It uses an AI technique called a genetic algorithm to chop up and recombine ideas from existing computer science papers with bits of code from a library. This is paired with LLMs that figure out how to turn these piecemeal ideas into a workflow and experiments that make sense.
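As a rough illustration of the chop-and-recombine step, here is a toy genetic algorithm in Python. The "ideas" are just labeled fragments and the fitness score is a random stand-in for actually building and running an experiment; Code Scientist's real representations and scoring are not described in the article, so everything concrete here is an assumption.

```python
# Toy genetic algorithm over "idea fragments": select the best candidates,
# recombine them (crossover) and occasionally swap in a random fragment
# (mutation). The fragments and the fitness score are illustrative stand-ins.

import random

FRAGMENTS = ["cache results", "parallelize loop", "prune search tree",
             "reorder batches", "quantize weights", "reuse embeddings"]

def random_candidate(size=3):
    return random.sample(FRAGMENTS, size)

def fitness(candidate):
    # Stand-in for building the experiment this combination describes,
    # running it, and measuring how well it performed.
    return sum(len(fragment) for fragment in candidate) + random.random()

def crossover(a, b):
    # Recombine two parents: keep the start of one, fill in from the other.
    cut = len(a) // 2
    child = a[:cut] + [f for f in b if f not in a[:cut]]
    return child[:len(a)]

def mutate(candidate, rate=0.3):
    if random.random() < rate:
        candidate[random.randrange(len(candidate))] = random.choice(FRAGMENTS)
    return candidate

population = [random_candidate() for _ in range(8)]
for _ in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]                                  # selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(4)]
    population = parents + children

print(max(population, key=fitness))  # best recombination found by the search
```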
“Code Scientist is trying to design its own novel box and explore a bit of code within that,” Clark says. Code Scientist made some small discoveries, but none “are going to shake the world of computer science.”
Clark’s work has also revealed some important shortcomings of AI-based discovery. These types of systems “are not that creative,” he says. Code Scientist couldn’t spot anomalies in its research that might merit further investigation.
What’s more, it cheated. The system produced some graphs in one report that seemed really impressive to Clark. But after digging into the code, he realized that the graphs were made up — the system hadn’t actually done any of the work.
Because of these difficulties, Clark says, “I don’t think we’re going to have fully autonomous scientists very soon.” In a 2026 interview, Hassabis of Google DeepMind shared a similar view. “Can AI actually come up with a new hypothesis … a new idea about how the world might work?” he asked, then answered his own question. “So far, these systems can’t do that.” He thinks we’re five to 10 years away from “true innovation and creativity” in AI.
AI as a tool
AI systems are already contributing to important discoveries. But a big bottleneck remains. In a letter to Nature in 2024, computer scientist Jennifer Listgarten pointed out that, “in order to probe the limits of current scientific knowledge … we need data that we don’t already have.” AI can’t get that kind of data on its own. Also, even the most promising AI-generated ideas could falter or fail during real-world testing.
“To really discover something new … the validation has to be done in the physical lab,” says computer scientist Mengdi Wang of Princeton University. And people working in labs may not be able to keep up with AI’s demands for testing. Robots that perform experiments could help, Sebastian Musslick says. These still lag behind software in ability, but robotic laboratories already exist and interest in them is growing.
The San Francisco–based company Periodic Labs, for example, aims to funnel AI-generated ideas for materials into robotic laboratories for testing. There, robotic arms, sensors and other automated equipment would mix ingredients and run experiments. Insilico Medicine is also betting on a combination of robotics and AI systems. It has even introduced “Supervisor,” a humanoid robot, to work in its Shanghai lab.
Fully robotic laboratories are very pricey. Wang’s team has developed a way to bring AI into any research lab using XR glasses, a gadget that can record what a person is seeing and project virtual information into the field of view. First, the team trained an AI model on video of laboratory actions so it could recognize and reason about what it sees. Next, they had human scientists don the XR glasses and get to work — with an invisible AI helper looking through the glasses’ cameras.
The AI could answer questions or make suggestions. But perhaps the most important aspect of this collaboration is the fact that every interaction feeds into a new dataset of information that we didn’t already have.
Instead of using AI to search in a box, Wang says, “I want to do it in the wild.”
Pushing into the wild
AI tools that can speedily and reliably perform independent research or create safe and effective new medicines and materials might help the world solve a lot of problems. But there’s also a huge risk of inaccurate or even dangerous AI science because it takes time and expertise to check AI’s work.
Beyond the risks, turning research into an automated process challenges the very nature of science. People become scientists because they are curious. They don’t just want a quick, easy answer — they want to know why. “The thing that gets me excited is to understand the physical world,” Lupsasca says. “That’s why I chose this path in life.”
And the way these systems learn from data is “very different from how people learn and how we think about things,” says Keyon Vafa, an AI researcher at Harvard University. Predictive power does not equate to deep understanding.
Vafa and a team of researchers designed a clever experiment to reveal this difference. They first trained an AI model to predict the paths of planets orbiting stars. It got very good at this task. But it turned out that the AI had learned nothing of gravity. It had not discovered one essential equation to make its predictions. Rather, it had cobbled together a messy pile of rules of thumb.
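A toy example helps make that distinction concrete. The numpy sketch below is my own illustration rather than the researchers' actual setup, and it assumes simple circular orbits: a linear model fitted to one simulated orbit predicts that orbit almost perfectly, yet does poorly on an orbit at a different radius, because it has captured a pattern specific to its training data rather than the gravitational law that generates both.

```python
# Toy numpy reconstruction of the gap described above: a model can predict the
# orbit it was trained on almost perfectly while encoding nothing about gravity.
# The orbits, the linear model and the numbers are illustrative assumptions.

import numpy as np

def circular_orbit(radius, steps, dt=0.1, gm=1.0):
    """Positions on a circular Keplerian orbit; omega follows from Newton's law."""
    omega = np.sqrt(gm / radius**3)
    t = np.arange(steps) * dt
    return np.column_stack([radius * np.cos(omega * t), radius * np.sin(omega * t)])

def fit_rule_of_thumb(orbit):
    """Fit next_position ≈ position @ M by least squares: a pattern, not a law."""
    M, *_ = np.linalg.lstsq(orbit[:-1], orbit[1:], rcond=None)
    return M

def max_error(M, orbit):
    return np.abs(orbit[:-1] @ M - orbit[1:]).max()

train = circular_orbit(radius=1.0, steps=2000)
M = fit_rule_of_thumb(train)

print(f"training orbit error: {max_error(M, train):.1e}")                      # tiny
print(f"unseen orbit error:   {max_error(M, circular_orbit(2.0, 2000)):.1e}")  # large
```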
Weil of OpenAI doesn’t see this alien way of reasoning as a problem. “It’s actually better if [the AI] has different skills than you,” he says.
Musslick agrees. The real power in AI for science lies in designing systems “where science is done very differently than what we humans do,” he says. Most robotic labs, Musslick notes, don’t use humanoid hands to pick up and squeeze pipettes. Instead, engineers redesigned pipettes to work within robotic systems, freeing up human scientists for other, less repetitive tasks.
The most effective uses of AI in science will probably take a similar approach. People will find ways to change how science is done to make the best use of AI tools and systems.
“The goal,” Lupsasca says, “is to give humans new tools to push further into the wilderness and discover new things.”