Beyond one-on-one: Authoring, simulating, and testing dynamic human-AI group conversations

Content summary:
Google's DialogLab conversation-simulation platform pushes human-AI group interaction into a new stage of dynamic simulation
On February 10, 2026, researchers in Google's XR division released an innovative open-source framework called DialogLab. The platform aims to move past the "one-on-one" limitation of today's conversational AI systems and provide a dedicated tool for designing, simulating, and testing dynamic, multi-role human-AI group conversations.
Real-world settings such as team meetings, classroom discussions, and family gatherings involve multiple participants, shifting roles, and improvised exchanges, making them far more complex than simple question answering. DialogLab was built to fill the gap between the rigidity of traditional scripted dialogue and the unpredictability of purely generative models. Presented at ACM UIST 2025, the platform centers on a unified interface that lets developers flexibly configure conversational scenes, define agent personas, manage group structures, set turn-taking rules, and seamlessly orchestrate transitions between scripted and free-form improvised dialogue.
DialogLab's architecture separates a conversation's "social structure" (participants, roles, and sub-group relationships) from its "temporal flow", and supports rapid iteration through a three-stage visual author-test-verify workflow. Users build scenes with a drag-and-drop interface, fine-tune character attributes and interaction patterns, and, during testing, can use a "human control" mode to edit or steer AI utterances in real time, balancing automated generation with fine-grained manual control. A built-in verification and analytics panel visualizes turn-taking distribution and sentiment flow, helping developers evaluate interactions efficiently.
An evaluation with 14 users from game design, education, and social science found DialogLab intuitive to operate, flexible to control, and effective at simulating realistic group conversations. The "human control" mode in particular, which lets users actively redirect topics, prompt new perspectives, or inject emotional responses, was rated highest for engagement, effectiveness, and realism.
The research team sees broad application potential in education and training (e.g., public-speaking practice and interview simulation), game narrative (more natural non-player character interactions), and social science research (simulated group-dynamics experiments). Looking ahead, the framework may integrate richer multimodal behaviors (such as non-verbal gestures and facial expressions) and, with photorealistic avatars and 3D environments, deliver even more immersive simulated conversations in extended reality (XR).
Partly funded by a Google PhD Fellowship, this work marks an important step in the development of dynamic human-AI group conversation systems and opens new possibilities for building more natural, complex, and controllable human-AI interaction.
English source:
Beyond one-on-one: Authoring, simulating, and testing dynamic human-AI group conversations
February 10, 2026
Erzhen Hu, Student Researcher, and Ruofei Du, Interactive Perception & Graphics Lead, Google XR
DialogLab is a research prototype that provides a unified interface to configure conversational scenes, define agent personas, manage group structures, specify turn-taking rules, and orchestrate transitions between scripted narratives and improvisation.
Conversational AI has fundamentally reshaped how we interact with technology. While one-on-one interactions with large language models (LLMs) have seen significant advances, they rarely capture the full complexity of human communication. Many real-world dialogues, including team meetings, family dinners, or classroom lessons, are inherently multi-party. These interactions involve fluid turn-taking, shifting roles, and dynamic interruptions.
For designers and developers, simulating natural and engaging multi-party conversations has historically required a trade-off: settle for the rigidity of scripted interaction or accept the unpredictability of purely generative models. To bridge this gap, we need tools that blend the structural predictability of a script with the spontaneous, improvisational nature of human conversation.
To address this need, we introduce DialogLab, presented at ACM UIST 2025, an open-source prototyping framework designed to author, simulate, and test dynamic human-AI group conversations. DialogLab provides a unified interface to manage multi-party dialogue complexity, handling everything from defining agent personas to orchestrating complex turn-taking dynamics. By integrating real-time improvisation with structured scripting, the framework enables developers to test conversations ranging from a structured Q&A session to a free-flowing creative brainstorm. Our evaluations with 14 end users and domain experts validate that DialogLab supports efficient iteration and realistic, adaptable multi-party design for training and research.
A framework for dynamic conversation
DialogLab decouples a conversation’s social setup — such as participants, roles, subgroups, and relationships — from its temporal progression. This separation enables creators to author complex dynamics via a streamlined three-stage workflow: author, test, verify.
At its core, the DialogLab framework defines conversations along two dimensions:
- Group dynamics: This covers the social setup of the interaction.
  - A group is the top-level container (e.g., a conference social event).
  - Parties are sub-groups that have distinct roles (e.g., "presenters" and "audience").
  - Elements are the individual participants (human or AI) and any shared content, like a presentation slide.
- Conversation flow dynamics: This describes how the dialogue unfolds over time.
  - The flow is broken down into snippets, which represent distinct phases of the conversation. Each snippet has a defined set of participants, a sequence of conversational turns, and specific interaction styles (e.g., collaborative or argumentative). Creators can also define rules for interruptions and backchanneling to make the dialogue more realistic. A minimal sketch of this data model follows the list.
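To make the two dimensions concrete, here is a minimal sketch of what such a scene description might look like in code. DialogLab's actual schema is not published, so every class and field name below (Group, Party, Element, Snippet, and their attributes) is an illustrative assumption rather than the framework's real API.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """An individual participant (human or AI) or a piece of shared content."""
    name: str
    kind: str          # "human" | "ai" | "content"
    persona: str = ""  # e.g., 'curious student'; unused for shared content


@dataclass
class Party:
    """A sub-group with a distinct role, e.g., 'presenters' or 'audience'."""
    role: str
    members: list[Element] = field(default_factory=list)


@dataclass
class Group:
    """Top-level container for the social setup, e.g., a conference social."""
    name: str
    parties: list[Party] = field(default_factory=list)


@dataclass
class Snippet:
    """One phase of the conversation flow."""
    participants: list[str]         # names of elements active in this phase
    turn_order: list[str]           # scripted sequence of speakers
    style: str = "collaborative"    # or "argumentative", etc.
    allow_interruptions: bool = False
    allow_backchannel: bool = True  # short acknowledgements like "mm-hm"


# Example: the Q&A phase of a conference social event.
scene = Group(
    name="conference-social",
    parties=[
        Party("presenters", [Element("Ava", "ai", "confident presenter")]),
        Party("audience", [Element("You", "human"),
                           Element("Ben", "ai", "curious student")]),
    ],
)
qa_snippet = Snippet(
    participants=["Ava", "You", "Ben"],
    turn_order=["You", "Ava", "Ben"],
    allow_interruptions=True,
)
```

Decoupling the social setup (Group) from the temporal flow (Snippet) means the same cast can be reused across many conversation phases, which is what enables the rapid iteration the workflow below is built around.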
The “author-test-verify” workflow for dynamic conversation
DialogLab guides creators through a structured author-test-verify workflow, supported by a visual interface designed for rapid iteration.
- Authoring with visual tools: The interface features a drag-and-drop canvas where users position avatars and content from libraries to build scenes. Inspector panels allow for granular configuration, from an avatar’s persona to the interaction patterns within a specific snippet. To accelerate the design process, DialogLab offers auto-generated conversation prompts that can be fine-tuned to meet specific narrative goals.
- Simulation with human-in-the-loop: Testing is critical for multi-party interactions. DialogLab includes a live preview panel that displays the conversation transcript and a "human control" mode, where an audit panel suggests potential AI responses. The designer can edit, accept, or dismiss these suggestions, providing fine-grained control over the AI's contributions and allowing for rapid iterations.
- Verification and analytics: To validate the interaction, the verification dashboard serves as a diagnostic tool. It visualizes conversation dynamics, allowing creators to quickly analyze turn-taking distributions and sentiment flows without parsing through lengthy raw transcripts. A sketch of this kind of analysis follows the list.
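As a rough illustration of the dashboard's two reported views, the sketch below computes a turn-taking distribution and a coarse per-turn sentiment score over a transcript. The word-list sentiment scorer is a stand-in assumption for exposition; the paper does not describe the dashboard's actual analysis method.

```python
from collections import Counter

# Toy sentiment lexicons; a real system would use a proper sentiment model.
POSITIVE = {"great", "agree", "love", "thanks", "excellent"}
NEGATIVE = {"disagree", "wrong", "confusing", "boring"}


def turn_distribution(transcript):
    """Fraction of turns taken by each speaker."""
    counts = Counter(speaker for speaker, _ in transcript)
    total = sum(counts.values())
    return {speaker: n / total for speaker, n in counts.items()}


def sentiment_flow(transcript):
    """Coarse per-turn score: positive word hits minus negative word hits."""
    scores = []
    for _, utterance in transcript:
        words = set(utterance.lower().replace(",", "").replace(".", "").split())
        scores.append(len(words & POSITIVE) - len(words & NEGATIVE))
    return scores


transcript = [
    ("Ava", "Thanks everyone, let's get started."),
    ("Ben", "I disagree, that framing seems wrong."),
    ("You", "Great point, but consider the counterexample."),
]
print(turn_distribution(transcript))  # each speaker holds 1/3 of the turns
print(sentiment_flow(transcript))     # [1, -2, 1]
```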
Prototype evaluation
We evaluated DialogLab with 14 participants across game design, education, and social science research. Participants completed two tasks in DialogLab: designing an academic social event, and testing a group discussion with AI under three conditions (sketched as simple turn-taking policies after this list):
- Human control: When testing a conversation, the user can ask agents to “shift topic”, offer a “new perspective”, ask a “probe question”, or generate an “emotional response”.
- Autonomous: The simulated agents proactively participate in the conversation based on pre-defined orders (random or one-by-one), while generating emotional responses and topic shifts automatically.
- Reactive: The simulated human agent responds only when directly mentioned by other agents, simulating traditional human-AI turn-taking behaviors.
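The three conditions can be read as three speaker-selection policies. The sketch below is a hypothetical rendering of that idea; the function, its parameters, and the mode names are assumptions for exposition, not DialogLab's interface, though the behaviors follow the descriptions above.

```python
import random


def next_speaker(mode, agents, last_utterance, turn_index,
                 order="one_by_one", human_pick=None):
    """Pick who speaks next under one of the three evaluated conditions."""
    if mode == "human_control":
        # The designer steers directly, e.g., after asking an agent to
        # "shift topic" or produce an "emotional response".
        return human_pick
    if mode == "autonomous":
        # Agents participate proactively in a predefined order.
        if order == "random":
            return random.choice(agents)
        return agents[turn_index % len(agents)]  # one-by-one
    if mode == "reactive":
        # An agent speaks only when directly mentioned in the last utterance.
        mentioned = [a for a in agents if a.lower() in last_utterance.lower()]
        return mentioned[0] if mentioned else None  # otherwise stay silent
    raise ValueError(f"unknown mode: {mode}")


agents = ["Ava", "Ben"]
print(next_speaker("reactive", agents, "Ben, what do you think?", 0))  # Ben
print(next_speaker("autonomous", agents, "", 3))                       # Ben
print(next_speaker("human_control", agents, "", 4, human_pick="Ava"))  # Ava
```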
Participants rated each condition on a 5-point Likert scale. They found the human control mode to be significantly more engaging, and generally more effective and realistic, for simulating real-world conversations.
Participants’ feedback further highlighted the system's ability to balance automation with control:
- Intuitive and engaging: Most participants found DialogLab easy to use and the visual, drag-and-drop interface for setting up scenes and roles to be fun and efficient.
- Flexible and controllable: Users appreciated the balance between auto-generated prompts and the ability to fine-tune conversation details. The system's ability to model different moderation strategies was also highlighted as a key strength.
- Realistic simulation: The human control mode was the clear favorite for testing, with users reporting that it gave them a greater sense of agency and immersion. It was rated as more engaging, effective, and realistic for simulating human behavior compared to fully autonomous or purely reactive agents.
- Powerful verification: The verification dashboard was seen as a valuable diagnostic tool for quickly analyzing conversation dynamics without having to read through lengthy transcripts.
Future directions
DialogLab is more than just a research prototype; it's a step toward a future where human-AI collaboration is richer and more nuanced. The potential applications are vast:
- Education and skill development: Students could practice public speaking in front of a simulated audience, or professionals could rehearse difficult conversations and interviews.
- Game design and storytelling: Writers and game developers can create more believable and dynamic non-player characters (NPCs) that interact with each other and the player in more natural ways.
- Social science research: DialogLab can be used as a controlled environment to study group dynamics, allowing researchers to test hypotheses about social interaction without the logistical challenges of recruiting large groups of people.
Moving forward, we envision that richer multimodal behaviors, such as non-verbal gestures and facial expressions, could be integrated into this framework. We could also explore the use of photorealistic avatars and 3D environments, as in ChatDirector, to create even more immersive and realistic simulations within our open-source XR Blocks framework. We hope this research will inspire continued innovation in the exciting and emerging field of human-AI group conversation dynamics.
See the video demonstration of DialogLab to learn more.
Acknowledgements
Key contributors to the project include Erzhen Hu, Yanhe Chen, Mingyi Li, Vrushank Phadnis, Pingmei Xu, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, and Ruofei Du. We would like to extend our thanks to Adarsh Kowdle for providing feedback and assistance on the manuscript and the blog post. This project is partly sponsored by a Google PhD Fellowship.