How social media fuels the worst of AI over-optimism

Summary:
Recent weeks have seen a string of "breakthrough" claims about AI's mathematical abilities on social media, repeatedly falling into a pattern of hype first, think later. The phenomenon was on full display in an excited post by OpenAI scientist Sébastien Bubeck, who announced that the company's latest large language model, GPT-5, had solved 10 unsolved math problems and proclaimed that "science acceleration via AI has officially begun." The claim was quickly debunked by a specialist: Thomas Bloom, a mathematician at the University of Manchester, pointed out that the supposedly unsolved problems were old ones whose solutions he simply had not yet catalogued, and that GPT-5 had merely surfaced existing results.
The episode reflects two sides of today's AI field. On one hand, social media tends to amplify exaggerated claims that have not been carefully verified, encouraging uncritical cheerleading; on the other, the models' real strength in literature search and knowledge synthesis (for example, finding existing solutions a researcher did not know about) is genuinely valuable yet easily buried under the hype. François Charton, a scientist focused on applying AI to mathematics, notes that large language models show significant promise for rapidly combing through vast bodies of published work, but social media is keener on dramatic "disruptive breakthroughs," even packaging solutions to undergraduate-level exercises as major advances.
Meanwhile, more sober assessments of large models' abilities in specialized fields such as medicine and law are emerging. Recent studies find that while the models can assist with diagnosis, they are flawed at recommending treatments, and in legal consultation they often give inconsistent or incorrect advice. The researchers warn that the evidence so far falls far short of proving reliability.
Even so, the dynamics of social media continue to fuel the bluster. "Everybody is communicating like crazy; nobody wants to be left behind," Charton admits. Public sparring among industry leaders on the platform, from Sam Altman to Gary Marcus, further amplifies that anxiety. Bubeck's misstep was embarrassing only because it was caught on the spot; plenty of unverified claims are still spreading online.
Also worth noting: the startup Axiom recently announced that its math-focused model AxiomProver had solved two long-open Erdős problems and performed strongly in a college-level math competition, earning praise from well-known figures in the field. That, too, invites a question: do competition results reflect a model's creative problem-solving, or its skill at recombining existing knowledge? Judging AI's real mathematical ability still requires looking past the noise of social media and digging into how these models actually solve problems that humans find hard.
Amid the rapid iteration of AI technology, balancing enthusiasm for innovation with scientific caution has become a question the industry must face.
English source:
How social media encourages the worst of AI boosterism
The era of hype first, think later.
Demis Hassabis, CEO of Google DeepMind, summed it up in three words: “This is embarrassing.”
Hassabis was replying on X to an overexcited post by Sébastien Bubeck, a research scientist at the rival firm OpenAI, announcing that two mathematicians had used OpenAI’s latest large language model, GPT-5, to find solutions to 10 unsolved problems in mathematics. “Science acceleration via AI has officially begun,” Bubeck crowed.
Put your math hats on for a minute, and let’s take a look at what this beef from mid-October was about. It’s a perfect example of what’s wrong with AI right now.
Bubeck was excited that GPT-5 seemed to have somehow solved a number of puzzles known as Erdős problems.
Paul Erdős, one of the most prolific mathematicians of the 20th century, left behind hundreds of puzzles when he died. To help keep track of which ones have been solved, Thomas Bloom, a mathematician at the University of Manchester, UK, set up erdosproblems.com, which lists more than 1,100 problems and notes that around 430 of them come with solutions.
When Bubeck celebrated GPT-5’s breakthrough, Bloom was quick to call him out. “This is a dramatic misrepresentation,” he wrote on X. Bloom explained that a problem isn’t necessarily unsolved if this website does not list a solution. That simply means Bloom wasn’t aware of one. There are millions of mathematics papers out there, and nobody has read all of them. But GPT-5 probably has.
It turned out that instead of coming up with new solutions to 10 unsolved problems, GPT-5 had scoured the internet for 10 existing solutions that Bloom hadn’t seen before. Oops!
There are two takeaways here. One is that breathless claims about big breakthroughs shouldn’t be made via social media: Less knee jerk and more gut check.
The second is that GPT-5’s ability to find references to previous work that Bloom wasn’t aware of is also amazing. The hype overshadowed something that should have been pretty cool in itself.
Mathematicians are very interested in using LLMs to trawl through vast numbers of existing results, François Charton, a research scientist who studies the application of LLMs to mathematics at the AI startup Axiom Math, told me when I talked to him about this Erdős gotcha.
But literature search is dull compared with genuine discovery, especially to AI’s fervent boosters on social media. Bubeck’s blunder isn’t the only example.
In August, a pair of mathematicians showed that no LLM at the time was able to solve a math puzzle known as Yu Tsumura’s 554th Problem. Two months later, social media erupted with evidence that GPT-5 now could. “Lee Sedol moment is coming for many,” one observer commented, referring to the Go master who lost to DeepMind’s AI AlphaGo in 2016.
But Charton pointed out that solving Yu Tsumura’s 554th Problem isn’t a big deal to mathematicians. “It’s a question you would give an undergrad,” he said. “There is this tendency to overdo everything.”
Meanwhile, more sober assessments of what LLMs may or may not be good at are coming in. At the same time that mathematicians were fighting on the internet about GPT-5, two new studies came out that looked in depth at the use of LLMs in medicine and law (two fields that model makers have claimed their tech excels at).
Researchers found that LLMs could make certain medical diagnoses, but they were flawed at recommending treatments. When it comes to law, researchers found that LLMs often give inconsistent and incorrect advice. “Evidence thus far spectacularly fails to meet the burden of proof,” the authors concluded.
But that’s not the kind of message that goes down well on X. “You’ve got that excitement because everybody is communicating like crazy—nobody wants to be left behind,” Charton said. X is where a lot of AI news drops first, it’s where new results are trumpeted, and it’s where key players like Sam Altman, Yann LeCun, and Gary Marcus slug it out in public. It’s hard to keep up—and harder to look away.
Bubeck’s post was only embarrassing because his mistake was caught. Not all errors are. Unless something changes, researchers, investors, and non-specific boosters will keep teeing each other up. “Some of them are scientists, many are not, but they are all nerds,” Charton told me. “Huge claims work very well on these networks.”
There’s a coda! I wrote everything you’ve just read above for the Algorithm column in the January/February 2026 issue of MIT Technology Review magazine (out very soon). Two days after that went to press, Axiom told me its own math model, AxiomProver, had solved two open Erdős problems (#124 and #481, for the math fans in the room). That’s impressive stuff for a small startup founded just a few months ago. Yup—AI moves fast!
But that’s not all. Five days later the company announced that AxiomProver had solved nine out of 12 problems in this year’s Putnam competition, a college-level math challenge that some people consider harder than the better-known International Math Olympiad (which LLMs from both Google DeepMind and OpenAI aced a few months back).
The Putnam result was lauded on X by big names in the field, including Jeff Dean, chief scientist at Google DeepMind, and Thomas Wolf, cofounder at the AI firm Hugging Face. Once again familiar debates played out in the replies. A few researchers pointed out that while the International Math Olympiad demands more creative problem-solving, the Putnam competition tests math knowledge—which makes it notoriously hard for undergrads, but easier, in theory, for LLMs that have ingested the internet.
How should we judge Axiom’s achievements? Not on social media, at least. And the eye-catching competition wins are just a starting point. Determining just how good LLMs are at math will require a deeper dive into exactly what these models are doing when they solve hard (read: hard for humans) math problems.
This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.