This robot needs only a single AI model to master humanlike movement.
Source: https://www.wired.com/story/this-humanoid-robot-is-showing-signs-of-generalized-learning/
Summary:
The humanoid robot Atlas, famous for its parkour and dance routines, has now demonstrated something subtler and far more significant: walking and grasping driven by a single artificial intelligence model. The breakthrough, developed jointly by Boston Dynamics and the Toyota Research Institute (TRI), relies on a generalist large behavior model (LBM) that lets the robot coordinate its limbs much the way a person does.
Unlike the conventional approach, the new model drops the multi-model architecture that controls locomotion and grasping separately in favor of a unified framework that processes visual sensor data, proprioceptive information, and language prompts. The researchers trained the model with teleoperation, simulated demonstrations, and example videos, and the robot now moves in a noticeably more humanlike way, for instance adjusting its stance to stay balanced while bending down to pick up a low-lying object. More strikingly, the system shows untrained "emergent" abilities, such as bending over to retrieve an item it has accidentally dropped.
Some in the field see the result as robotics' "ChatGPT moment." The researchers note that, just as large language models gained unexpected abilities from training on massive amounts of data, robots may develop a wide range of skills along a similar path. Atlas and comparable robots are already showing early signs of generalized learning, handling tasks such as slicing vegetables and sweeping up coffee beans.
The field remains cautious about "emergent abilities": some experts point out that seemingly novel skills may trace back to examples implicitly memorized from the training data. TRI's previously transparent record of LBM research lends the findings credibility. Whether simply scaling up the data will keep unlocking emergent behavior is still an open question, but improved engineering methods are also recognized as a key driver.
The researchers believe robotics is approaching an inflection point, after which robots could autonomously handle tasks such as welding pipes or making coffee in complex environments. Beyond the technical leap in whole-body coordination, the work signals a new stage in bringing general-purpose robots into real-world use.
Translated article:
Atlas, the humanoid robot famous for its parkour and dance routines, has recently begun to demonstrate something more subtle but far more significant: it has learned to walk and to grab things using a single artificial intelligence model.
What is more, the robot's single learning model is showing some tantalizing "emergent" abilities, such as instinctively picking up an item it has dropped without ever having been trained to do so.
Boston Dynamics, the maker of Atlas, partnered with the Toyota Research Institute (TRI) to develop a generalist model that learns to control both arms and legs from a range of example actions. That departs from the norm: robots capable of learning usually rely on one model to walk and jump and a separate one to grasp items.
"The feet are just like additional hands, in some sense, to the model," says Russ Tedrake, the roboticist at the Toyota Research Institute and MIT who led the work. "And it works, which is just awesome."
The single model controlling Atlas takes in three streams of information: images from the robot's visual sensors, proprioceptive data from its bodily sensors (which give it a continuous sense of its own position and motion), and language prompts tied to different actions. Using a mix of teleoperation, simulation, and demonstration videos, the model was shown examples of Atlas performing a range of tasks. The resulting large behavior model (LBM) controls the humanoid in a more natural-seeming way. When picking items out of a bin, for example, the robot repositions its legs much as a person would to stay balanced while reaching down low. The model also exhibits some basic emergent behavior: when an item is dropped, the robot bends down to pick it up, a "recovery" skill that was never explicitly taught.
This is more exciting than it might seem. Just as large language models (LLMs) trained on huge amounts of text sometimes show unexpected abilities, such as writing code, roboticists hope that a similar strategy will yield robots with plenty of surprising new skills for getting things done.
Tedrake says Atlas and other robots are starting to show signs of generalized learning. His lab is also experimenting with training various robot arms to perform different tasks, including slicing vegetables and sweeping up spilled coffee beans.
While plenty of work remains, Tedrake says all the evidence so far suggests that the approaches used for large language models also work for robots. "I think it's changing everything," he says.
Gauging progress in robotics has become more challenging lately: videos show commercial humanoids seemingly breezing through complex chores such as loading refrigerators or taking out the trash. YouTube clips can be deceptive, though; these humanoids tend to be teleoperated, carefully programmed in advance, or trained on a single task under highly controlled conditions.
The new Atlas work signals that robotics is beginning to see the kind of advances that, in generative AI, eventually produced the general-purpose language models behind ChatGPT. Such progress could ultimately let robots operate with ease in all kinds of messy environments and pick up new skills quickly, from welding pipes to making espressos, without extensive retraining.
"It's definitely a step forward," says Ken Goldberg, a roboticist at UC Berkeley who receives some funding from TRI but was not involved in the Atlas work. "The coordination of legs and arms is a big deal." He cautions, however, that claims of emergent robot behavior should be treated carefully. Just as the surprising abilities of large language models can sometimes be traced back to examples in their training data, robots may demonstrate skills that seem more novel than they really are. It is essential, he adds, to know how often a robot succeeds and in what ways it fails during experiments. TRI has previously been transparent about its LBM work and is expected to release more data on the new model.
Whether simply scaling up the training data will unlock ever more emergent behavior remains an open question. At a debate held in May at the International Conference on Robotics and Automation in Atlanta, Goldberg and other researchers stressed that engineering methods will also play an important role going forward.
Tedrake, for his part, is convinced that robotics is nearing an inflection point, one that will enable humanoids to be widely used in real-world settings. "We need to get these robots out of the lab and start solving real problems," he says.
What do you think of Atlas's new skills? Do you think robotics is headed for a ChatGPT-style breakthrough? Share your thoughts at ailab@wired.com.
This is an edition of Will Knight's AI Lab newsletter. Previous editions can be read here.
English source:
Atlas, the humanoid robot famous for its parkour and dance routines, has recently begun demonstrating something altogether more subtle but also a lot more significant: It has learned to both walk and grab things using a single artificial intelligence model.
What is more, the robot’s single learning model is showing some tantalizingly “emergent” skills, like the ability to instinctively recover when it drops an item without having been trained to do so.
Boston Dynamics, the company that makes Atlas, together with the Toyota Research Institute (TRI), developed a generalist model that learns to control both arms and legs from a range of example actions. This is different from the norm: robots equipped with the ability to learn would usually rely on one model to walk and jump and another to grasp items.
“The feet are just like additional hands, in some sense, to the model,” says Russ Tedrake, a roboticist at the Toyota Research Institute and the Massachusetts Institute of Technology, who led the current work. “And it works, which is just awesome.”
The single model used to control Atlas is fed images from the robot’s visual sensors, proprioception data from bodily sensors (which give it a continuous sense of its position and movement), and language prompts related to different actions. The model is shown examples of Atlas performing a range of tasks using a mix of teleoperation, simulation, and demonstration videos. The resulting large behavior model (LBM) controls the humanoid robot in a more natural-seeming way. When picking items out of a bin, for example, the robot will reposition its legs much like a person to rebalance when reaching low down. The LBM also exhibits some basic emergent behavior. When the robot drops an item, for instance, it demonstrates a new “recovery” skill by bending down to pick it up.
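To make the single-model idea concrete, here is a minimal sketch, in Python with PyTorch, of a policy that fuses the three input streams described above (camera images, proprioceptive readings, and a language prompt) and outputs whole-body joint targets, trained by imitating demonstrations. Every name, dimension, and architectural choice below is an illustrative assumption, not the actual Boston Dynamics/TRI model.

# Illustrative sketch only: one policy network that fuses camera images,
# proprioception, and a language prompt into whole-body joint targets.
# Names, dimensions, and architecture are assumptions for illustration;
# this is not the actual Boston Dynamics / TRI implementation.
import torch
import torch.nn as nn

class WholeBodyPolicy(nn.Module):
    def __init__(self, proprio_dim=50, lang_dim=512, action_dim=30, hidden=256):
        super().__init__()
        # Small CNN encoder for camera frames (e.g. 3x96x96 RGB).
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP encoders for proprioception and a precomputed language embedding.
        self.proprio = nn.Sequential(nn.Linear(proprio_dim, hidden), nn.ReLU())
        self.lang = nn.Sequential(nn.Linear(lang_dim, hidden), nn.ReLU())
        # A single head outputs targets for arm and leg joints together,
        # so locomotion and manipulation share one model.
        self.head = nn.Sequential(
            nn.Linear(32 + hidden + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, image, proprio, lang_embedding):
        features = torch.cat(
            [self.vision(image), self.proprio(proprio), self.lang(lang_embedding)], dim=-1
        )
        return self.head(features)  # whole-body joint targets for the next control step

# Behavior cloning on demonstrations (teleoperation, simulation, videos):
# minimize the error between predicted and demonstrated actions.
policy = WholeBodyPolicy()
image = torch.randn(8, 3, 96, 96)   # batch of camera frames
proprio = torch.randn(8, 50)        # joint positions/velocities, IMU, etc.
lang = torch.randn(8, 512)          # embedding of a prompt such as "pick up the part"
demo_actions = torch.randn(8, 30)   # actions recorded from a demonstration
loss = nn.functional.mse_loss(policy(image, proprio, lang), demo_actions)
loss.backward()

In a sketch like this, the "feet as additional hands" idea simply means the leg joints sit in the same action vector as the arm joints, so one network learns to coordinate both.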
This is a lot more exciting than it might seem. Just as large language models (LLMs) fed by huge amounts of text data sometimes exhibit unexpected abilities, like the ability to code, roboticists hope that a similar strategy will produce robots that exhibit a lot of surprising new skills when trying to get things done.
Tedrake says that Atlas and other robots are starting to show signs of more generalized learning. His lab is also experimenting with different kinds of robot arms that are trained to perform various tasks, including slicing vegetables and sweeping up spilled coffee beans.
While there is a lot of work to do, Tedrake says all of the evidence so far suggests that the approaches used for LLMs also work for robots. “I think it's changing everything,” he says.
Gauging progress in robotics has become more challenging of late, of course, with video clips showing commercial humanoids performing complex chores, like loading refrigerators or taking out the trash, with seeming ease. YouTube clips can be deceptive, though, and humanoid robots tend to be either teleoperated, carefully programmed in advance, or trained to do a single task in very controlled conditions.
The new Atlas work is a big sign that robots are starting to experience the kind of equivalent advances in robotics that eventually led to the general language models that gave us ChatGPT in the field of generative AI. Eventually, such progress could give us robots that are able to operate in a wide range of messy environments with ease and are able to rapidly learn new skills—from welding pipes to making espressos—without extensive retraining.
“It's definitely a step forward,” says Ken Goldberg, a roboticist at UC Berkeley who receives some funding from TRI but was not involved with the Atlas work. “The coordination of legs and arms is a big deal.”
Goldberg says, however, that the idea of emergent robot behavior should be treated carefully. Just as the surprising abilities of large language models can sometimes be traced to examples included in their training data, he says that robots may demonstrate skills that seem more novel than they really are. He adds that it is helpful to know details about how often a robot succeeds and in what ways it fails during experiments. TRI has previously been transparent with the work it’s done on LBMs and may well release more data on the new model.
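As a rough illustration of the kind of reporting Goldberg is asking for, the sketch below tallies per-task success rates and failure modes from trial logs. The tasks, failure labels, and numbers are hypothetical, invented purely for illustration and not drawn from TRI's results.

# Hypothetical sketch: summarize robot trials by success rate and failure mode.
from collections import Counter

trials = [
    {"task": "bin_picking", "success": True,  "failure_mode": None},
    {"task": "bin_picking", "success": False, "failure_mode": "dropped_item"},
    {"task": "bin_picking", "success": False, "failure_mode": "missed_grasp"},
    {"task": "walk_and_place", "success": True, "failure_mode": None},
]

by_task = {}
for trial in trials:
    stats = by_task.setdefault(trial["task"], {"n": 0, "ok": 0, "failures": Counter()})
    stats["n"] += 1
    if trial["success"]:
        stats["ok"] += 1
    else:
        stats["failures"][trial["failure_mode"]] += 1

for task, s in by_task.items():
    rate = s["ok"] / s["n"]
    print(f"{task}: {s['ok']}/{s['n']} succeeded ({rate:.0%}); failures: {dict(s['failures'])}")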
Whether simply scaling up the data used to train robot models will unlock ever more emergent behavior remains an open question. At a debate held in May at the International Conference on Robotics and Automation in Atlanta, Goldberg and others cautioned that engineering methods will also play an important role going forward.
Tedrake, for one, is convinced that robotics is nearing an inflection point—one that will enable more real-world use of humanoids and other robots. “I think we need to put these robots out in the world and start doing real work,” he says.
What do you think of Atlas’ new skills? And do you think that we are headed for a ChatGPT-style breakthrough in robotics? Let me know your thoughts on ailab@wired.com.
This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.