
For the First Time, AI Models Analyze Language As Well As a Human Expert

Posted by qimuai · Firsthand compilation



Source: https://www.wired.com/story/in-a-first-ai-models-analyze-language-as-well-as-a-human-expert/

Summary:

AI shows human-like language analysis, challenging the belief that language is uniquely human

Language has long been seen as a core trait setting humans apart from other animals and from machines. A recent study, however, shows that advanced large language models can analyze and reason about linguistic structure much as professional linguists do, prompting scientists to revisit a fundamental question: is human language truly unique?

In the study, Gašper Beguš, a linguist at the University of California, Berkeley, and his collaborators put several mainstream large language models through a battery of rigorous linguistic tests. The tasks covered syntactic tree analysis, ambiguity resolution, recognition of recursive structure, and even inferring the phonological rules of invented languages: tasks that normally require a deep human understanding of how a language system works.

Strikingly, OpenAI's o1 model stood out across several of the tests. It accurately parsed sentences with complex nested (recursive) structure, such as "The astronomy the ancients we revere studied was not separate from astrology," drawing professional-quality syntactic trees. It also detected the ambiguity in "Rowan fed his pet chicken" (a chicken kept as a pet versus chicken meat fed to a pet) and represented each reading with its own tree. Most notably, in the phonology tests, o1 correctly inferred the pronunciation rules of miniature languages the researchers had invented from scratch, showing that it can induce linguistic regularities with no prior exposure.

The finding directly challenges the view, voiced most prominently by the linguist Noam Chomsky, that artificial intelligence can only imitate language use by "marinating" in big data and is incapable of genuine linguistic analysis. Tom McCoy, a computational linguist at Yale University who was not involved in the research, called the work timely and "very important," saying it offers an ideal test bed for evaluating whether AI can reason like humans.

Current models remain constrained by their core training objective of predicting what comes next from what came before, and they have not yet matched humans in creativity or in proposing new linguistic theories. Even so, the results suggest that many complex properties once considered exclusive to human language are gradually being mastered by AI. Given the pace of progress, David Mortensen, a computational linguist at Carnegie Mellon University, believes it is "only a matter of time" before models that generalize more creatively from less data ultimately surpass humans in language understanding.

Beguš concludes: "It appears that we're less unique than we previously thought we were." As AI continues to make inroads into the core territory of language, we may need a more open, dynamic view of the boundary between human and machine intelligence.


Full article (English source):

The original version of this story appeared in Quanta Magazine.
Among the myriad abilities that humans possess, which ones are uniquely human? Language has been a top candidate at least since Aristotle, who wrote that humanity was “the animal that has language.” Even as large language models such as ChatGPT superficially replicate ordinary speech, researchers want to know if there are specific aspects of human language that simply have no parallels in the communication systems of other animals or artificially intelligent devices.
In particular, researchers have been exploring the extent to which language models can reason about language itself. For some in the linguistic community, language models not only don’t have reasoning abilities, they can’t. This view was summed up by Noam Chomsky, a prominent linguist, and two coauthors in 2023, when they wrote in The New York Times that “the correct explanations of language are complicated and cannot be learned just by marinating in big data.” AI models may be adept at using language, these researchers argued, but they’re not capable of analyzing language in a sophisticated way.
That view was challenged in a recent paper by Gašper Beguš, a linguist at the University of California, Berkeley; Maksymilian Dąbkowski, who recently received his doctorate in linguistics at Berkeley; and Ryan Rhodes of Rutgers University. The researchers put a number of large language models, or LLMs, through a gamut of linguistic tests—including, in one case, having the LLM generalize the rules of a made-up language. While most of the LLMs failed to parse linguistic rules in the way that humans are able to, one had impressive abilities that greatly exceeded expectations. It was able to analyze language in much the same way a graduate student in linguistics would—diagramming sentences, resolving multiple ambiguous meanings, and making use of complicated linguistic features such as recursion. This finding, Beguš said, “challenges our understanding of what AI can do.”
This new work is both timely and “very important,” said Tom McCoy, a computational linguist at Yale University who was not involved with the research. “As society becomes more dependent on this technology, it’s increasingly important to understand where it can succeed and where it can fail.” Linguistic analysis, he added, is the ideal test bed for evaluating the degree to which these language models can reason like humans.
Infinite Complexity
One challenge of giving language models a rigorous linguistic test is making sure they don’t already know the answers. These systems are typically trained on huge amounts of written information—not just the bulk of the internet, in dozens if not hundreds of languages, but also things like linguistics textbooks. The models could, in theory, simply memorize and regurgitate the information that they’ve been fed during training.
To avoid this, Beguš and his colleagues created a linguistic test in four parts. Three of the four parts involved asking the model to analyze specially crafted sentences using tree diagrams, which were first introduced in Chomsky’s landmark 1957 book, Syntactic Structures. These diagrams break sentences down into noun phrases and verb phrases and then further subdivide them into nouns, verbs, adjectives, adverbs, prepositions, conjunctions and so forth.
One part of the test focused on recursion—the ability to embed phrases within phrases. “The sky is blue” is a simple English sentence. “Jane said that the sky is blue” embeds the original sentence in a slightly more complex one. Importantly, this process of recursion can go on forever: “Maria wondered if Sam knew that Omar heard that Jane said that the sky is blue” is also a grammatically correct, if awkward, recursive sentence.
Recursion has been called one of the defining characteristics of human language by Chomsky and others—and indeed, perhaps a defining characteristic of the human mind. Linguists have argued that its limitless potential is what gives human languages their ability to generate an infinite number of possible sentences out of a finite vocabulary and a finite set of rules. So far, there’s no convincing evidence that other animals can use recursion in a sophisticated way.
Recursion can occur at the beginning or end of a sentence, but the form that is most challenging to master, called center embedding, takes place in the middle—for instance, going from “the cat died” to “the cat the dog bit died.”
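Both patterns are easy to make concrete. The short Python sketch below (an illustration of the grammar fragments described above, not code from the study) generates arbitrarily deep right-embedded and center-embedded sentences from a finite vocabulary and two fixed rules, which is the finite-means, infinite-output point linguists make about recursion:

```python
# Illustrative sketch (not from the paper): two finite rule sets that
# generate unboundedly many grammatical English sentences.

SPEAKERS = ["Jane", "Omar", "Sam", "Maria"]
SAY_VERBS = ["said that", "heard that", "knew that", "wondered whether"]

def right_embed(depth: int) -> str:
    """Wrap 'the sky is blue' in `depth` layers of clause-taking verbs."""
    sentence = "the sky is blue"
    for i in range(depth):
        sentence = f"{SPEAKERS[i % 4]} {SAY_VERBS[i % 4]} {sentence}"
    return sentence[0].upper() + sentence[1:] + "."

NOUNS = ["the cat", "the dog", "the man", "the boy"]
REL_VERBS = ["bit", "chased", "owned"]

def center_embed(depth: int) -> str:
    """Stack `depth` relative clauses mid-sentence: 'the cat the dog bit died'."""
    subjects = NOUNS[: depth + 1]
    verbs = [REL_VERBS[i % 3] for i in range(depth)]
    clause = " ".join(subjects + list(reversed(verbs)) + ["died"])
    return clause[0].upper() + clause[1:] + "."

print(right_embed(4))   # Maria wondered whether Sam knew that Omar heard
                        # that Jane said that the sky is blue.
print(center_embed(2))  # The cat the dog the man chased bit died.
```

Readers handle right_embed's output at almost any depth, while center_embed's output becomes nearly impenetrable after two or three levels, which is part of what makes center embedding such a demanding test.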
Beguš’ test fed the language models 30 original sentences that featured tricky examples of recursion. For example: “The astronomy the ancients we revere studied was not separate from astrology.” Using a syntactic tree, one of the language models—OpenAI’s o1—was able to determine that the sentence was structured like so:
The model then went further and added another layer of recursion to the sentence:
Beguš, among others, didn’t anticipate that this study would come across an AI model with a higher-level “metalinguistic” capacity: “the ability not just to use a language but to think about language,” as he put it.
That is one of the “attention-getting” aspects of their paper, said David Mortensen, a computational linguist at Carnegie Mellon University who was not involved with the work. There has been debate about whether language models are just predicting the next word (or linguistic token) in a sentence, which is qualitatively different from the deep understanding of language that humans have. “Some people in linguistics have said that LLMs are not really doing language,” he said. “This looks like an invalidation of those claims.”
What Do You Mean?
McCoy was surprised by o1’s performance in general, particularly by its ability to recognize ambiguity, which is “famously a difficult thing for computational models of language to capture,” he said. Humans “have a lot of commonsense knowledge that enables us to rule out the ambiguity. But it’s difficult for computers to have that level of commonsense knowledge.”
A sentence such as “Rowan fed his pet chicken” could be describing the chicken that Rowan keeps as a pet, or it could be describing the meal of chicken meat that he gave to his (presumably more traditional) animal companion. The o1 model correctly produced two different syntactic trees, one that corresponds to the first interpretation of the sentence and one that corresponds to the latter.
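The two readings correspond to two different constituency structures. As a rough illustration, here are simplified bracketings of each reading, rendered with the NLTK library (an assumption about tooling; these are not the diagrams from the paper):

```python
# Simplified, illustrative bracketings of the two readings of
# "Rowan fed his pet chicken", rendered with NLTK (pip install nltk).
from nltk.tree import Tree

# Reading 1: "his pet chicken" is one noun phrase; the chicken is the pet.
reading_1 = Tree.fromstring(
    "(S (NP Rowan) (VP (V fed) (NP (Det his) (Adj pet) (N chicken))))"
)

# Reading 2: "his pet" is the animal being fed; "chicken" is the food.
reading_2 = Tree.fromstring(
    "(S (NP Rowan) (VP (V fed) (NP (Det his) (N pet)) (NP (N chicken))))"
)

reading_1.pretty_print()  # prints an ASCII tree for each structure
reading_2.pretty_print()
```

Producing two distinct, well-formed trees for a single surface string is precisely the behavior the test was probing for.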
The researchers also carried out experiments related to phonology—the study of the pattern of sounds and of the way the smallest units of sound, called phonemes, are organized. To speak fluently, like a native speaker, people follow phonological rules that they might have picked up through practice without ever having been explicitly taught. In English, for example, adding an “s” to a word that ends in a “g” creates a “z” sound, as in “dogs.” But an “s” added to a word ending in “t” sounds more like a standard “s,” as in “cats.”
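Stated procedurally, the rule keys off whether the word's final sound is voiced. Here is a deliberately simplified sketch; it works from spelling rather than true phonemes, so silent letters and irregular plurals are out of scope:

```python
# Simplified sketch of the English plural rule described above.
# Spelling-based, so silent letters and irregulars are ignored.

SIBILANT_ENDINGS = ("s", "z", "x", "sh", "ch")  # take /ɪz/, as in "buses"
VOICELESS_ENDINGS = ("p", "t", "k", "f")        # take /s/, as in "cats"

def plural_s_sound(word: str) -> str:
    if word.endswith(SIBILANT_ENDINGS):
        return "/ɪz/"
    if word.endswith(VOICELESS_ENDINGS):
        return "/s/"
    return "/z/"  # after voiced sounds, including vowels and "g" ("dogs")

for w in ["dog", "cat", "bus"]:
    print(f"{w} + -s -> {plural_s_sound(w)}")
# dog + -s -> /z/   cat + -s -> /s/   bus + -s -> /ɪz/
```

Native speakers apply this rule effortlessly and mostly unconsciously, which is what makes it a good target for testing whether a model can recover it from data alone.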
In the phonology task, the group made up 30 new mini-languages, as Beguš called them, to find out whether the LLMs could correctly infer the phonological rules without any prior knowledge. Each language consisted of 40 made-up words. Here are some example words from one of the languages:
They then asked the language models to analyze the phonological processes of each language. For this language, o1 correctly wrote that “a vowel becomes a breathy vowel when it is immediately preceded by a consonant that is both voiced and an obstruent”—a sound formed by restricting airflow, like the “t” in “top.”
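Since the invented vocabulary is not reproduced in this text, the sketch below applies the rule as o1 stated it to hypothetical transcriptions, marking breathy voice with the IPA diacritic (U+0324):

```python
# The rule o1 induced, applied to hypothetical toy transcriptions.
# The study's actual made-up words are not reproduced in this text.

VOWELS = set("aeiou")
VOICED_OBSTRUENTS = set("bdgvz")  # voiced sounds made by restricting airflow

def apply_breathy_rule(word: str) -> str:
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if ch in VOWELS and i > 0 and word[i - 1] in VOICED_OBSTRUENTS:
            out[-1] += "\u0324"  # combining diaeresis below marks breathy voice
    return "".join(out)

print(apply_breathy_rule("badi"))  # b and d are voiced obstruents: ba̤di̤
print(apply_breathy_rule("pati"))  # p and t are voiceless: unchanged
```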
The languages were newly invented, so there’s no way that o1 could have been exposed to them during its training. “I was not expecting the results to be as strong or as impressive as they were,” Mortensen said.
Uniquely Human or Not?
How far can these language models go? Will they get better, without limit, simply by getting bigger—layering on more computing power, more complexity and more training data? Or are some of the characteristics of human language the result of an evolutionary process that is limited to our species?
The recent results show that these models can, in principle, do sophisticated linguistic analysis. But no model has yet come up with anything original, nor has it taught us something about language we didn’t know before.
If improvement is just a matter of increasing both computational power and the training data, then Beguš thinks that language models will eventually surpass us in language skills. Mortensen said that current models are somewhat limited. “They’re trained to do something very specific: given a history of tokens [or words], to predict the next token,” he said. “They have some trouble generalizing by virtue of the way they’re trained.”
But in view of recent progress, Mortensen said he doesn’t see why language models won’t eventually demonstrate an understanding of our language that’s better than our own. “It’s only a matter of time before we are able to build models that generalize better from less data in a way that is more creative.”
The new results show a steady “chipping away” at properties that had been regarded as the exclusive domain of human language, Beguš said. “It appears that we’re less unique than we previously thought we were.”
Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.
