Sensible Agent: A framework for unobtrusive interaction with proactive AR agents

Published by qimuai · First-hand compilation

Source: https://research.google/blog/sensible-agent-a-framework-for-unobtrusive-interaction-with-proactive-ar-agents/

Summary:

On September 18, 2025, the Google XR team published a research framework called "Sensible Agent," aimed at making human-computer interaction in augmented reality (AR) more natural. By sensing multimodal context in real time, including the user's gaze, hand availability, and ambient noise, the framework lets AR assistants proactively adapt how they interact, reducing demands on the user's attention.

Unlike conventional AR systems that depend on spoken commands, Sensible Agent delivers intelligent interaction through two core modules: it first uses vision-language models and audio analysis to understand what the user needs (such as translating a menu or recommending a route), then selects the interaction method best suited to the social setting (such as replacing speech with visual cues in noisy environments). A fully functional prototype has been built on Android XR and WebXR, integrating multimodal AI models for scene parsing and response generation.

In a comparative study with 10 participants, Sensible Agent significantly reduced cognitive load relative to a conventional voice assistant (the NASA-TLX mental demand score dropped from 65.0 to 21.1), and users rated their preference for it at 6.0 on a 7-point scale. Although interactions took slightly longer, participants praised its unobtrusiveness in social settings and the naturalness of the interaction.

The work points to a practical path for AR devices in everyday scenarios and could extend to smart homes and cross-device collaboration, with on-device computation protecting user data. The research was a multi-team effort at Google, and the paper was published at the UIST 2025 academic conference.

Full article:

Sensible Agent: A framework for unobtrusive interaction with proactive AR agents
September 18, 2025
Ruofei Du, Interactive Perception & Graphics Lead, and Geonsun Lee, Student Researcher, Google XR

Sensible Agent is a research prototype that enables AR agents to proactively adapt what they suggest and how they interact, using real-time context including gaze, hand availability, and environmental noise.

Recent innovations, such as Google's Project Astra, exemplify the potential of proactive agents embedded in augmented reality (AR) glasses to offer intelligent assistance that anticipates user needs and seamlessly integrates into everyday life. These agents promise remarkable convenience, from effortlessly navigating unfamiliar transit hubs to discreetly offering timely suggestions in crowded spaces. Yet, today's agents remain constrained by a significant limitation: they predominantly rely on explicit verbal commands from users. This requirement can be awkward or disruptive in social environments, cognitively taxing in time-sensitive scenarios, or simply impractical.

To address these challenges, we introduce Sensible Agent, published at UIST 2025, a framework designed for unobtrusive interaction with proactive AR agents. Sensible Agent is an advancement of our prior research in Human I/O and fundamentally reshapes this interaction by anticipating user intentions and determining the best approach to deliver assistance. It leverages real-time multimodal context sensing, subtle gestures, gaze input, and minimal visual cues to offer unobtrusive, contextually appropriate assistance. This marks a crucial step toward truly integrated, socially aware AR systems that respect user context, minimize cognitive disruption, and make proactive digital assistance practical for daily life.

Sensible Agent framework
At its core, Sensible Agent consists of two interconnected modules for (1) understanding "what" to assist with, and (2) determining "how" to provide assistance. First, Sensible Agent leverages multimodal sensing, using egocentric cameras and environmental context detection, to understand a user's current assistance needs. Whether you're navigating a crowded museum or rushing through a grocery store, the agent proactively decides the most helpful action, such as providing quick translations, suggesting popular dishes at a new restaurant, or quietly displaying a grocery list.
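To make the "what" module concrete, here is a minimal sketch in Python of how egocentric context might feed a proactive query. The `EgocentricContext` record, the `query_vlm` stub, and the prompt wording are all illustrative assumptions, not the prototype's actual code.

```python
# A minimal sketch of the "what to assist with" stage. The context record,
# the query_vlm stub, and the prompt are illustrative assumptions; the real
# prototype uses its own multimodal models and prompting.
from dataclasses import dataclass

@dataclass
class EgocentricContext:
    scene_labels: list[str]   # e.g. ["restaurant", "menu", "foreign text"]
    ambient_noise_db: float   # estimated from the microphone stream
    hands_busy: bool          # e.g. inferred from hand tracking

def query_vlm(prompt: str) -> str:
    """Placeholder for a call to a vision-language model."""
    return "translate menu"   # canned answer for the sketch

def propose_assistance(ctx: EgocentricContext) -> str | None:
    """Ask the model what help, if any, is worth offering right now."""
    prompt = (
        "Given this first-person scene, suggest ONE proactive action "
        "(e.g. 'translate menu', 'show grocery list') or 'none'. "
        f"Scene labels: {ctx.scene_labels}"
    )
    suggestion = query_vlm(prompt)
    return None if suggestion == "none" else suggestion
```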

Equally important, Sensible Agent intelligently chooses the least intrusive and most appropriate interaction method based on social context. For instance, if your hands are busy cooking, the agent might enable confirmation via a head nod. In a noisy environment, it might discreetly show visual icons instead of speaking out loud. This adaptive modality selection ensures assistance is always conveniently delivered while avoiding significant disruptions.
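As a sketch of this adaptive selection, the rules below reuse the `EgocentricContext` from the previous snippet; the noise threshold and modality names are invented for illustration and are not taken from the paper.

```python
# Illustrative rules for choosing "how" to interact; the threshold and
# modality names are assumptions, not values from the paper.
NOISY_DB = 70.0   # assumed cutoff above which speech is impractical

def choose_output_modality(ctx: EgocentricContext) -> str:
    # In loud places, show a visual icon instead of speaking aloud.
    return "visual_icon" if ctx.ambient_noise_db > NOISY_DB else "speech"

def choose_confirmation_input(ctx: EgocentricContext) -> str:
    if ctx.hands_busy:                    # e.g. hands occupied while cooking
        return "head_nod"
    if ctx.ambient_noise_db > NOISY_DB:   # speaking aloud would be awkward
        return "gaze_dwell"
    return "voice"
```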

Building the Sensible Agent prototype
To bring this concept to life, we implemented Sensible Agent as a fully functional prototype running on Android XR and WebXR, integrated with powerful multimodal AI models. The prototype includes four components: (1) a context parser that enables it to understand the scene, (2) a proactive query generator that determines what assistance is needed, (3) an interaction module that decides how to best offer assistance, and (4) a response generator that delivers the assistance.
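One way to picture how these four components fit together is the loop below, which wires the earlier sketches into a single update cycle; the class names and stub behavior are assumptions for illustration, not the actual Android XR / WebXR implementation.

```python
# Illustrative wiring of the four prototype components into one cycle;
# all names and stub behavior are assumptions for this sketch.
class ContextParser:
    def parse(self, frame, audio) -> EgocentricContext:
        # Real version: scene understanding on the egocentric frame plus
        # audio analysis; stubbed here with fixed values.
        return EgocentricContext(["museum", "painting"], 55.0, False)

class ProactiveQueryGenerator:
    def generate(self, ctx: EgocentricContext) -> str | None:
        return propose_assistance(ctx)            # decides "what"

class InteractionModule:
    def select(self, ctx: EgocentricContext) -> str:
        return choose_output_modality(ctx)        # decides "how"

class ResponseGenerator:
    def render(self, query: str, modality: str) -> None:
        print(f"[{modality}] {query}")            # stand-in for AR output

def agent_tick(frame, audio) -> None:
    """One cycle: sense -> decide what -> decide how -> deliver."""
    ctx = ContextParser().parse(frame, audio)
    query = ProactiveQueryGenerator().generate(ctx)
    if query is None:        # nothing worth interrupting the user for
        return
    modality = InteractionModule().select(ctx)
    ResponseGenerator().render(query, modality)
```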

User study
We conducted a structured user study comparing Sensible Agent with a conventional voice-controlled AR assistant based on Project Astra. Ten participants completed 12 real-world scenarios on Android XR devices, performing each scenario under both conditions.

Results
We measured cognitive load with the NASA Task Load Index (NASA-TLX), usability with the System Usability Scale (SUS), preference on a 7-point Likert scale, and interaction time. Compared with the baseline, Sensible Agent cut the NASA-TLX mental demand score from 65.0 to 21.1 and earned a preference rating of 6.0 out of 7, at the cost of slightly longer interactions.

Key finding: proactivity does more than reduce the burden of use; it reshapes the user's relationship with the agent. Participants felt that Sensible Agent acted more like a collaborative partner than a tool, and its nonverbal inputs, which mimic social cues, made interaction feel natural. In high-pressure or socially sensitive settings, how the agent interacts matters no less than what it offers.

Conclusion and future directions
This work demonstrates that by jointly reasoning about what to assist with and how to interact, proactive AR assistance can stay intelligent while remaining unobtrusive. By building multimodal sensing and real-time adaptation into both decision-making and interface design, the framework addresses a long-standing pain point in human-computer interaction.

Future work will focus on incorporating long-term history for personalization, extending support to multiple devices and environments, and exploring applications in smart homes and physical robots, while safeguarding user data through on-device inference. As AR becomes part of daily life, systems like Sensible Agent will lay the groundwork for efficient, considerate digital agents.

Acknowledgments
This project was a collaboration across multiple teams at Google. Core contributors: Geonsun Lee, Min Xia, Nels Numan, Xun Qian, David Li, Yanhe Chen, Achin Kulshrestha, Ishan Chatterjee, Yinda Zhang, Dinesh Manocha, David Kim, and Ruofei Du. We thank Zhongyi Zhou, Vikas Bahirwani, Jessica Bo, Zheng Xu, and Renhao Liu for feedback on early designs, and Alex Olwal, Adarsh Kowdle, and Guru Somadder for strategic guidance and review.

