
AI Agents Are Not Ideal Freelance Workers

Posted by qimuai. First-hand compilation.


Source: https://www.wired.com/story/ai-agents-are-terrible-freelance-workers/

Summary:

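A recent experiment offers a different perspective on the popular view that AI will replace office workers en masse. The data show that even the most advanced AI agents fall far short of commercially usable performance on simulated freelance work: they completed less than 3 percent of the tasks and earned only $1,810 of nearly $144,000 in simulated pay.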

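The Remote Labor Index, a new benchmark developed jointly by data annotation company Scale AI and the nonprofit Center for AI Safety (CAIS), evaluates how well frontier AI models can automate economically valuable work. Among the AI tools tested, Manus, an agent from the Chinese startup of the same name, performed best, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google.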

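CAIS director Dan Hendrycks notes that although some AI agents have improved significantly over the past year, that does not mean progress will continue at the same rate. The researchers worked with verified Upwork freelancers to design a set of simulated tasks spanning graphic design, video editing, game development, and administrative chores such as data scraping. Each task came with a job description, a directory of the files needed, and an example of the finished work produced by a human.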

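The study finds that although current AI models have improved at coding, math, and logical reasoning, they still have clear weaknesses when using a variety of tools and carrying out complex multistep tasks. “They don't have long-term memory storage and can't do continual learning from experiences. They can't pick up skills on the job like humans,” Hendrycks explains.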

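The finding contrasts with GDPval, a benchmark of economically valuable work released by OpenAI in September, according to which frontier models such as GPT-5 are approaching human performance on 220 office tasks. “We have debated AI and jobs for years, but most of it has been hypothetical or theoretical,” says Bing Liu, director of research at Scale AI.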

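The researchers concede that the new benchmark is not a perfect yardstick for AI's economic impact. Many professions include work the test does not cover, and in practice freelancers are likely to use AI tools to boost their productivity. Still, pressure from AI on jobs is becoming visible: Amazon announced this week that it would cut 14,000 jobs, a move it partly attributed to the rapid rise of generative AI. Beth Galetti, the company's senior vice president of people experience and technology, wrote in a publicly shared memo that “this generation of AI is the most transformative technology we've seen since the Internet.”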

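Judging by the Remote Labor Index, however, current AI is still unlikely to step directly into those vacated roles. The shift toward human-AI collaboration has a long way to go.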

Full translation:

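An experiment that challenges the idea of AI replacing office workers en masse suggests that even the most advanced AI agents struggle with online freelance work. The Remote Labor Index, a new benchmark developed jointly by data annotation company Scale AI and the nonprofit Center for AI Safety (CAIS), is designed to measure the ability of frontier AI models to automate economically valuable work.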

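The researchers had several leading AI agents attempt a series of simulated freelance tasks and found that even the best could complete less than 3 percent of the work, earning just $1,810 of a possible $143,991. The evaluation found that Manus, an agent from the Chinese startup of the same name, performed best, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google.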

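“I should hope this gives much more accurate impressions as to what's going on with AI capabilities,” says Dan Hendrycks, director of CAIS. He adds that although some AI agents have improved significantly over the past year or so, that does not mean progress will continue at the same rate.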

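Leaps in AI technology have fueled speculation that AI will surpass human intelligence and replace large numbers of jobs. In March, Anthropic CEO Dario Amodei predicted that 90 percent of coding work would be automated within months. But past predictions of AI displacing jobs have often missed the mark, such as the forecast that radiologists would soon be replaced by AI algorithms.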

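The researchers worked with verified Upwork freelancers to design a set of tasks spanning graphic design, video editing, game development, and administrative chores such as data scraping. Each task came with a job description, a directory of the files needed, and a sample of the finished work produced by a human.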

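Hendrycks says that although AI models have improved at coding, math, and logical reasoning in recent years, they still fall short when using tools and carrying out complex multistep tasks. “They don't have long-term memory storage and can't do continual learning from experiences. They can't pick up skills on the job like humans.”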

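The study offers a counterpoint to GDPval, a benchmark of economically valuable work released by OpenAI in September. According to GDPval, frontier AI models such as GPT-5 are approaching human performance on 220 office tasks. OpenAI did not provide a comment. “We have debated AI and jobs for years, but most of it has been hypothetical or theoretical,” says Bing Liu, director of research at Scale AI.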

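Liu and Hendrycks concede that the new benchmark is not a perfect yardstick for AI's economic impact, since many professions include work it does not cover. In reality, freelancers are more likely to use AI as a tool that boosts their productivity. The idea that AI is already taking jobs is gaining momentum, however: Amazon announced this week that it would cut 14,000 jobs, a move it partly attributed to the rapid rise of generative AI. Beth Galetti, the company's senior vice president of people experience and technology, wrote in a publicly shared memo that “this generation of AI is the most transformative technology we've seen since the Internet.”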

If the Remote Labor Index is any indication, however, AI is unlikely to be filling those vacated roles.

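Are you worried about AI taking your job? Share your view by sending an email to ailab@wired.com.
This piece is excerpted from Will Knight's AI Lab newsletter. Read previous newsletters here.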

English source:

Even the best artificial intelligence agents are fairly hopeless at online freelance work, according to an experiment that challenges the idea of AI replacing office workers en masse.
The Remote Labor Index, a new benchmark developed by researchers at data annotation company Scale AI and the Center for AI Safety (CAIS), a nonprofit, measures the ability of frontier AI models to automate economically valuable work.
The researchers gave several leading AI agents a range of simulated freelance work and found that even the best could perform less than 3 percent of the work, earning $1,810 out of a possible $143,991. The researchers looked at several tools and found the most capable to be Manus from a Chinese startup of the same name, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google.
“I should hope this gives much more accurate impressions as to what's going on with AI capabilities,” says Dan Hendrycks, director of CAIS. He adds that while some agents have improved significantly over the past year or so, that does not mean that this will continue at the same rate.
Spectacular AI advances have led to speculation about AI soon surpassing human intelligence and replacing vast numbers of workers. In March, Dario Amodei, CEO of Anthropic, suggested that 90 percent of coding work would be automated within a matter of months.
Previous waves of AI have inspired misplaced predictions about job displacement, for example concerning the imminent replacement of radiologists with AI algorithms.
The researchers generated a range of freelance tasks through verified Upwork workers. The tasks span a range of work including graphic design, video editing, game development, and administrative chores like scraping data. They combined a description of each job with a directory of files needed to perform the work and an example of a finished project produced by a human.
Hendrycks says that while AI models have gotten better at coding, math, and logical reasoning in recent years, they still struggle to use different tools and to perform complex tasks that involve numerous steps. “They don't have long-term memory storage and can't do continual learning from experiences. They can't pick up skills on the job like humans,” he says.
The analysis offers a counterpoint to a benchmark of economic work offered in September by OpenAI called GDPval, which purports to measure economically valuable work. According to GDPval, frontier AI models such as GPT-5 are approaching human abilities on 220 tasks across a range of office jobs. OpenAI did not provide a comment.
“We have debated AI and jobs for years, but most of it has been hypothetical or theoretical,” adds Bing Liu, director of research at Scale AI.
Liu and Hendrycks concede that the new benchmark is not a perfect yardstick for AI’s economic impact. Many professions include tasks not covered by the measure. In reality, many freelancers are also likely to use AI as a tool in a way that amplifies their productivity.
The idea that AI is already taking jobs is gaining momentum, however. This week Amazon announced that it would cut 14,000 jobs in a move that it partly blamed on the rapid rise of generative artificial intelligence. “This generation of AI is the most transformative technology we’ve seen since the Internet,” Beth Galetti, senior vice president of people experience and technology at Amazon, wrote in a publicly shared memo. “It's enabling companies to innovate much faster than ever before (in existing market segments and altogether new ones).”
If the Remote Labor Index is any indication, however, AI is unlikely to be stepping into any of these vacated roles.
Are you worried about AI taking your job? Let me know by sending an email to ailab@wired.com.
This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.
