
Distinct AI Models Seem To Converge On How They Encode Reality

Published by qimuai · First-hand compilation



Source: https://www.quantamagazine.org/distinct-ai-models-seem-to-converge-on-how-they-encode-reality-20260107/

Summary:

AI models show signs of "cognitive convergence" and may be forming a unified representation of the world

The "Platonic representation hypothesis," recently proposed by a research team at the Massachusetts Institute of Technology, has sparked wide debate in the AI community. It holds that although different kinds of AI models are trained on very different data, their internal representations of the real world gradually converge as the models become more capable.

The hypothesis takes its inspiration from Plato's famous allegory of the cave. The team likens the real world to the objects outside the cave, and the various kinds of data to the "shadows" that reality casts. AI models are the prisoners, with access only to those data "shadows." The researchers argue that once different kinds of AI, such as language models and vision models, have been exposed to enough data, they appear to begin building similar internal representations that approach the underlying structure of reality.

To test the idea, researchers run experiments based on representational similarity analysis. A model's internal representation of a given concept (such as "dog") is recorded as a vector in a high-dimensional space, and similarity is assessed by comparing the geometric structure of the vector clusters produced by different models. As a researcher at New York University puts it, "It can kind of be described as measuring the similarity of similarities."

These experiments have found that, among models of the same type, stronger models tend to show greater representational similarity. More strikingly, one experiment showed that when vision models and language models are trained separately on images and text, their representations of the same concepts also converge as the models scale up. This suggests that the models are capturing, through different kinds of data, a shared structure of the reality behind them.

The hypothesis is far from a consensus view, however. Skeptics note that current studies mostly rely on image-text datasets in which the two modalities carry closely matched information by design (such as captioned Wikipedia pictures), whereas much of the information we encounter in the real world does not translate cleanly across modalities. As a researcher at the University of California, Berkeley put it, there is a reason people go to an art museum rather than just reading the catalog. He stresses that different models' understandings of the world will inevitably differ in irreducible ways.

Despite the controversy, this line of research has already shown practical promise. Last summer, researchers managed to translate internal representations between different language models. If the representational gap between vision and language models can likewise be bridged, it could lead to more efficient ways of training multimodal AI.

The leader of the MIT team argues that the endeavor of science is to find universals, and that identifying commonalities has more explanatory power than cataloging differences. Other researchers counter that no single theory is likely to fully capture the behavior of systems with trillions of parameters.

The debate over the Platonic representation hypothesis continues, and work to test it is ongoing. The discussion bears not only on the theoretical foundations of AI; it may also suggest new paths toward general-purpose AI that is closer to human cognition.


Full article (English source):

Distinct AI Models Seem To Converge On How They Encode Reality
Introduction
Read a story about dogs, and you may remember it the next time you see one bounding through a park. That’s only possible because you have a unified concept of “dog” that isn’t tied to words or images alone. Bulldog or border collie, barking or getting its belly rubbed, a dog can be many things while still remaining a dog.
Artificial intelligence systems aren’t always so lucky. These systems learn by ingesting vast troves of data in a process called training. Often, that data is all of the same type — text for language models, images for computer vision systems, and more exotic kinds of data for systems designed to predict the odor of molecules or the structure of proteins. So to what extent do language models and vision models have a shared understanding of dogs?
Researchers investigate such questions by peering inside AI systems and studying how they represent scenes and sentences. A growing body of research has found that different AI models can develop similar representations, even if they’re trained using different datasets or entirely different data types. What’s more, a few studies have suggested that those representations are growing more similar as models grow more capable. In a 2024 paper, four AI researchers at the Massachusetts Institute of Technology argued that these hints of convergence are no fluke. Their idea, dubbed the Platonic representation hypothesis, has inspired a lively debate among researchers and a slew of follow-up work.
The team’s hypothesis gets its name from a 2,400-year-old allegory by the Greek philosopher Plato. In it, prisoners trapped inside a cave perceive the world only through shadows cast by outside objects. Plato maintained that we’re all like those unfortunate prisoners. The objects we encounter in everyday life, in his view, are pale shadows of ideal “forms” that reside in some transcendent realm beyond the reach of the senses.
The Platonic representation hypothesis is less abstract. In this version of the metaphor, what’s outside the cave is the real world, and it casts machine-readable shadows in the form of streams of data. AI models are the prisoners. The MIT team’s claim is that very different models, exposed only to the data streams, are beginning to converge on a shared “Platonic representation” of the world behind the data.
“Why do the language model and the vision model align? Because they’re both shadows of the same world,” said Phillip Isola, the senior author of the paper.
Not everyone is convinced. One of the main points of contention involves which representations to focus on. You can’t inspect a language model’s internal representation of every conceivable sentence, or a vision model’s representation of every image. So how do you decide which ones are, well, representative? Where do you look for the representations, and how do you compare them across very different models? It’s unlikely that researchers will reach a consensus on the Platonic representation hypothesis anytime soon, but that doesn’t bother Isola.
“Half the community says this is obvious, and the other half says this is obviously wrong,” he said. “We were happy with that response.”
The Company Being Kept
If AI researchers don’t agree on Plato, they might find more common ground with his predecessor Pythagoras, whose philosophy supposedly started from the premise “All is number.” That’s an apt description of the neural networks that power AI models. Their representations of words or pictures are just long lists of numbers, each indicating the degree of activation of a specific artificial neuron.
To simplify the math, researchers typically focus on a single layer of a neural network in isolation, which is akin to taking a snapshot of brain activity in a specific region at a specific moment in time. They write down the neuron activations in this layer as a geometric object called a vector — an arrow that points in a particular direction in an abstract space. Modern AI models have many thousands of neurons in each layer, so their representations are high-dimensional vectors that are impossible to visualize directly. But vectors make it easy to compare a network’s representations: Two representations are similar if the corresponding vectors point in similar directions.
Within a single AI model, similar inputs tend to have similar representations. In a language model, for instance, the vector representing the word “dog” will be relatively close to vectors representing “pet,” “bark,” and “furry,” and farther from “Platonic” and “molasses.” It’s a precise mathematical realization of an idea memorably expressed more than 60 years ago by the British linguist John Rupert Firth: “You shall know a word by the company it keeps.”
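
To make that geometry concrete, here is a minimal sketch of the direction-based comparison. The four-dimensional vectors below are invented purely for illustration (real layers have thousands of dimensions), and cosine similarity is one standard way to measure whether two vectors point the same way:

```python
import numpy as np

def cosine_similarity(u, v):
    """Similarity of two representations: near 1.0 means same direction, near 0 means unrelated."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy "activation vectors" with made-up numbers, standing in for one layer's neuron activations.
dog      = np.array([0.9, 0.8, 0.1, 0.0])
pet      = np.array([0.8, 0.9, 0.2, 0.1])
molasses = np.array([0.0, 0.1, 0.9, 0.8])

print(cosine_similarity(dog, pet))       # high: the vectors point in similar directions
print(cosine_similarity(dog, molasses))  # low: the vectors point in mostly different directions
```
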
What about representations in different models? It doesn’t make sense to directly compare activation vectors from separate networks, but researchers have devised indirect ways to assess representational similarity. One popular approach is to embrace the lesson of Firth’s pithy quote and measure whether two models’ representations of an input keep the same company.
Imagine that you want to compare how two language models represent words for animals. First, you’ll compile a list of words — dog, cat, wolf, jellyfish, and so on. You’ll then feed these words into both networks and record their representations of each word. In each network, the representations will form a cluster of vectors. You can then ask: How similar are the overall shapes of the two clusters?
“It can kind of be described as measuring the similarity of similarities,” said Ilia Sucholutsky, an AI researcher at New York University.
In this simple example, you’d expect some similarity between the two models — the “cat” vector would probably be close to the “dog” vector in both networks, for instance, and the “jellyfish” vector would point in a different direction. But the two clusters probably won’t look exactly the same. Is “dog” more like “cat” than “wolf,” or vice versa? If your models were trained on different datasets, or built on different network architectures, they might not agree.
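
Here is a minimal sketch of that second-order comparison, assuming you already have each model's vectors for the same word list (random arrays stand in for real activations below). Correlating the two models' pairwise-similarity matrices is one common recipe; alternatives such as centered kernel alignment are also used in this literature:

```python
import numpy as np

def similarity_matrix(X):
    """Pairwise cosine similarities between rows (one row = one word's activation vector)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def representational_alignment(X, Y):
    """Correlate the off-diagonal entries of two models' similarity matrices."""
    A, B = similarity_matrix(X), similarity_matrix(Y)
    mask = ~np.eye(len(A), dtype=bool)   # ignore the trivial self-similarities on the diagonal
    return float(np.corrcoef(A[mask], B[mask])[0, 1])

words = ["dog", "cat", "wolf", "jellyfish"]
rng = np.random.default_rng(0)
model_a_vectors = rng.normal(size=(len(words), 512))   # stand-in for model A's layer activations
model_b_vectors = rng.normal(size=(len(words), 768))   # dimensions need not match across models

print(representational_alignment(model_a_vectors, model_b_vectors))
```

Because only the within-model geometry is compared, the two models' layers can have different sizes and even come from entirely different architectures.
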
Researchers began to explore representational similarity among AI models with this approach in the mid-2010s and found that different models’ representations of the same concepts were often similar, though far from identical. Intriguingly, a few studies found that more powerful models seemed to have more similarities in their representations than weaker ones. One 2021 paper dubbed this the “Anna Karenina scenario,” a nod to the opening line of the classic Tolstoy novel. Perhaps successful AI models are all alike, and every unsuccessful model is unsuccessful in its own way.
That paper, like much of the early work on representational similarity, focused only on computer vision, which was then the most popular branch of AI research. The advent of powerful language models was about to change that. For Isola, it was also an opportunity to see just how far representational similarity could go.
Convergent Evolution
The story of the Platonic representation hypothesis paper began in early 2023, a turbulent time for AI researchers. ChatGPT had been released a few months before, and it was increasingly clear that simply scaling up AI models — training larger neural networks on more data — made them better at many different tasks. But it was unclear why.
“Everyone in AI research was going through an existential life crisis,” said Minyoung Huh, an OpenAI researcher who was a graduate student in Isola’s lab at the time. He began meeting regularly with Isola and their colleagues Brian Cheung and Tongzhou Wang to discuss how scaling might affect internal representations.
Imagine a case where multiple models are trained on the same data, and the stronger models learn more similar representations. This isn’t necessarily because these models are creating a more accurate likeness of the world. They could just be better at grasping quirks of the training dataset.
Now consider models trained on different datasets. If their representations also converge, that would be more compelling evidence that models are getting better at grasping shared features of the world behind the data. Convergence between models that learned from entirely different data types, such as language and vision models, would provide even stronger evidence.
A year after their initial conversations, Isola and his colleagues decided to write a paper reviewing the evidence for convergent representations and presenting an argument for the Platonic representation hypothesis.
By then, other researchers had started studying similarities between vision and language model representations. Huh conducted his own experiment, in which he tested a set of five vision models and 11 language models of varying sizes on a dataset of captioned pictures from Wikipedia. He would feed the pictures into the vision models and the captions into the language models, and then compare clusters of vectors in the two types. He observed a steady increase in representational similarity as models became more powerful. It was exactly what the Platonic representation hypothesis predicted.
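
The article doesn't spell out which alignment score Huh used. The sketch below illustrates one metric from this line of work, a mutual nearest-neighbor overlap between image embeddings and caption embeddings, with random arrays standing in for the real vision-model and language-model outputs:

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each row's k nearest neighbors under cosine similarity, excluding itself."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)
    return np.argsort(-sims, axis=1)[:, :k]

def mutual_knn_alignment(img_embs, txt_embs, k=5):
    """Average overlap between an image's neighbors (in vision space) and
    its caption's neighbors (in language space); higher means more aligned."""
    img_nn, txt_nn = knn_indices(img_embs, k), knn_indices(txt_embs, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(img_nn, txt_nn)]
    return float(np.mean(overlaps))

# Row i of both arrays describes the same captioned picture; these random arrays are
# stand-ins for embeddings produced by a real vision model and a real language model.
rng = np.random.default_rng(0)
img_embs = rng.normal(size=(100, 768))
txt_embs = rng.normal(size=(100, 1024))

print(mutual_knn_alignment(img_embs, txt_embs))
```
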
Find the Universals
Of course, it’s never so simple. Measurements of representational similarity invariably involve a host of experimental choices that can affect the outcome. Which layers do you look at in each network? Once you have a cluster of vectors from each model, which of the many mathematical methods do you use to compare them? And which representations do you measure in the first place?
“If you only test one dataset, you don’t necessarily know how [the result] generalizes,” said Christopher Wolfram, a researcher at the University of Chicago who has studied representational similarity in language models. “Who knows what would happen if you did some weirder dataset?”
Isola acknowledged that the issue is far from settled. It’s not a question that any one paper can resolve: In principle, you can measure models’ representations of any picture or any sentence. To him, cases where models do exhibit convergence are more compelling than cases where they may not.
“The endeavor of science is to find the universals,” Isola said. “We could study the ways in which models are different or disagree, but that somehow has less explanatory power than identifying the commonalities.”
Other researchers argue that it’s more productive to focus on where models’ representations differ. Among them is Alexei Efros, a researcher at the University of California, Berkeley, who has been an adviser to three of the four members of the MIT team.
“They’re all good friends and they’re all very, very smart people,” Efros said. “I think they’re wrong, but that’s what science is about.”
Efros noted that in the Wikipedia dataset that Huh used, the images and text contained very similar information by design. But most data we encounter in the world has features that resist translation. “There is a reason why you go to an art museum instead of just reading the catalog,” he said.
Any intrinsic sameness across models doesn’t have to be perfect to be useful. Last summer, researchers devised a method to translate internal representations of sentences from one language model to another. And if language and vision model representations are to some extent interchangeable, that could lead to new ways to train models that learn from both data types. Isola and others explored one such training scheme in a recent paper.
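
The translation method itself isn't described here. As a rough illustration of why such translation is plausible when representations share structure, the sketch below fits an ordinary least-squares linear map between two models' vectors for the same sentences; the data is random stand-in material, and this is not the method from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for two language models' representations of the same 1,000 sentences;
# model B is constructed as a noisy linear transform of model A purely for demonstration.
model_a = rng.normal(size=(1000, 512))
model_b = model_a @ rng.normal(size=(512, 768)) + 0.1 * rng.normal(size=(1000, 768))

# Fit a linear map W so that model_a @ W approximates model_b.
W, *_ = np.linalg.lstsq(model_a, model_b, rcond=None)

# Translate a new representation from model A's space into model B's space.
new_sentence_a = rng.normal(size=(1, 512))
translated = new_sentence_a @ W
print(translated.shape)  # (1, 768): a vector living in model B's representation space
```
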
Despite these promising developments, other researchers think it’s unlikely that any single theory will fully capture the behavior of modern AI models.
“You can’t reduce a trillion-parameter system to simple explanations,” said Jeff Clune, an AI researcher at the University of British Columbia. “The answers are going to be complicated.”
