Google Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning

Posted by qimuai on 2025-10-24 08:01 (first-hand compilation)

Source: https://research.google/blog/google-earth-ai-unlocking-geospatial-insights-with-foundation-models-and-cross-modal-reasoning/

Summary:

Google expands its Earth AI platform, combining multimodal foundation models to improve geospatial analysis

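On October 23, Google Research announced major advances in its Earth AI platform. The platform combines three families of foundation models (remote sensing imagery, population dynamics, and environment) with a geospatial reasoning agent built on Gemini, making it possible to work through complex geographic questions at planetary scale using automated, multi-step reasoning.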

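The advances fall into three areas. The new remote sensing foundation models support natural-language queries (for example, finding flooded roads in post-storm imagery) and improve text-based image search by more than 16% on average. The population dynamics model now provides globally consistent embeddings across 17 countries; in an independent University of Oxford study, adding these embeddings to a dengue forecasting model for Brazil raised R² for 12-month predictions from 0.456 to 0.656, a relative gain of about 44%. The environment models now produce global precipitation nowcasts, and forecasts for the most significant riverine floods cover 2 billion people.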

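The research also shows that combining models improves predictive power. In predicting FEMA's National Risk Index, fusing population and landscape embeddings improved R² by an average of 11% across 20 hazards compared with using either data source alone, with a 25% gain for tornado risk.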

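The platform is already used in several social-impact settings. United Nations Global Pulse uses the imagery models to speed up post-disaster damage assessment. The nonprofit GiveDirectly combines flood forecasts with geospatial reasoning to send cash aid to at-risk communities. Bellwether, a Google X moonshot, predicts building damage before a storm strikes, helping its insurance clients pay claims faster.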

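Google is now expanding enterprise access to the platform. New users include Public Storage, CARTO and Visiona Space Technology, and organizations can register interest in early access to capabilities such as the remote sensing foundation models.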

English source:

Google Earth AI: Unlocking geospatial insights with foundation models and cross-modal reasoning
October 23, 2025
Niv Efron, Senior Director of Engineering, and Luke Barrington, Director of Product Management, Google Research
Google Earth AI is our family of geospatial AI models and reasoning agents that provides users with actionable insights, grounded in real-world understanding. Today, we’re sharing our latest Earth AI innovations and expanding access to these new capabilities on Google Earth and Google Cloud.
For years, Google has developed AI models that enhance our understanding of the planet. These models help keep Google products fresh, for example, ensuring Maps is accurate by analyzing satellite images and giving Search users the most up-to-date alerts about weather and natural disasters.
As individual models grow more powerful, we’ve learned that many real-world questions require the combination of insights across domains. Answering complex queries like, "Where is a hurricane likely to make landfall? Which communities are most vulnerable and how should they prepare?" requires reasoning about imagery, population and the environment.
Earlier this year, we introduced Google Earth AI to solve this core challenge. By pairing our family of powerful foundation models with a geospatial reasoning agent, which uses our latest Gemini models, it’s becoming possible to perform complex, real-world reasoning at planetary scale. The models provide detailed understanding of our planet, grounded in real-world data. The agent, in turn, acts as an intelligent orchestrator. It deconstructs a complex question into a multi-step plan; executes the plan by calling on these foundation models, querying vast datastores, and using geospatial tools; and finally fuses the results at each step into a holistic answer.
Today, we're introducing new Earth AI innovations:

New Imagery and Population foundation models, along with technical details and evaluations showing state-of-the-art performance.
Demonstrations of our geospatial reasoning agent using these models to solve complex, multi-step geospatial queries.
To learn more, we invite you to read our full technical paper, "Google Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning". You can also get involved by expressing interest as we expand access to these new capabilities for developers and enterprises.
Building blocks of Earth AI: State-of-the-art foundation models
Imagery
Our new Remote Sensing Foundations models simplify and accelerate satellite imagery analysis using three core capabilities: vision-language models, open-vocabulary object detection, and adaptable vision backbones. Users can ask natural language queries, like "find all flooded roads" in an image captured after a storm, and get rapid, accurate answers. Our models are trained on a large corpus of high-resolution overhead imagery, paired with text descriptions. They achieve state-of-the-art results on multiple public Earth observation benchmarks. For instance, we achieve >16% average improvement on text-based image search tasks, while our zero-shot model for novel object detection more than doubles the baseline accuracy.
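One common way to implement text-based image search is to embed the query and the images in a shared vector space and rank images by cosine similarity. The sketch below illustrates that general idea only. The post does not say how Remote Sensing Foundations performs retrieval, and the tile names and three-dimensional vectors are made up for illustration.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def search(query_vec: List[float], image_vecs: Dict[str, List[float]], top_k: int = 2) -> List[str]:
    """Return the names of the top_k images most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in image_vecs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Hypothetical embeddings; a real system would obtain these from a
# vision-language model rather than hard-coding them.
image_vecs = {
    "tile_flooded_road": [0.9, 0.1, 0.0],
    "tile_dry_road": [0.2, 0.9, 0.1],
    "tile_forest": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedding of "flooded roads"

results = search(query_vec, image_vecs)
print(results)  # ['tile_flooded_road', 'tile_dry_road']
```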
Population
This area of research, which includes Mobility AI and Population Dynamics Foundations, aims to understand the complex interplay between people and places. Our latest research in Population Dynamics Foundations introduces two key innovations: globally consistent embeddings across 17 countries and monthly updated embeddings that capture the changing dynamics of human activity, which are critical for time-sensitive predictions. Population Dynamics Foundations has shown remarkable effectiveness in independent studies; for example, researchers at the University of Oxford found that incorporating these embeddings into a forecasting model for dengue fever in Brazil improved long-range R² (a metric that measures how well a model explains the actual disease rates) from 0.456 to 0.656 for 12-month predictions.
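To put the R² change in perspective, the two quoted values can be turned into an absolute and a relative gain with simple arithmetic. The helper name `relative_gain` below is ours, not something from the study.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


r2_before, r2_after = 0.456, 0.656  # values quoted in the post

absolute = r2_after - r2_before
relative = relative_gain(r2_before, r2_after)

print(f"absolute gain: {absolute:.3f}")  # absolute gain: 0.200
print(f"relative gain: {relative:.1%}")  # relative gain: 43.9%
```

The relative figure, roughly 44%, is an improvement in R² rather than in raw forecast accuracy.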
Environment
Our previously published research demonstrates state-of-the-art forecasts for medium-range weather, monsoon onsets, air quality and riverine floods. We've recently expanded these Environment models to make precipitation nowcasts for the entire planet, and we’re now covering 2 billion people with forecasts for the most significant riverine floods.
Increased predictive power by combining models
While each foundation model provides powerful insights, our findings confirm that combining models yields even more predictive power. This synergistic approach produces a more comprehensive and accurate understanding of real-world phenomena and dramatically improves predictions across critical applications.
For example, FEMA’s National Risk Index shows which communities are most at risk from natural hazards like floods and storms, based on a variety of factors including economic and social vulnerability as well as physical and environmental risk. By fusing embeddings that capture socio-economic features from our Population Dynamics Foundations and landscape features from AlphaEarth Foundations, we improved prediction of FEMA’s National Risk Index by an average of 11% in R² across 20 different hazards, versus using either data source alone, with the most significant gains in predicting risk from tornadoes (+25% R²) and riverine flooding (+17% R²).
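The fusion idea can be sketched as concatenating two feature sets and comparing held-out R² against each set alone. Everything in the example below is an assumption made for illustration: the data is synthetic, the two-dimensional "embeddings" and the linear target are invented, and ordinary least squares stands in for whatever model the paper actually uses. It does not reproduce the reported numbers.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Least-squares weights (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, rows):
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in rows]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


random.seed(0)
n = 300
# Synthetic stand-ins for the two embedding sources (not real data).
pop = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
land = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
# Invented target that depends on both sources plus noise.
risk = [
    2.0 * p[0] - 1.0 * p[1] + 3.0 * l[0] + 0.5 * l[1] + random.gauss(0, 0.5)
    for p, l in zip(pop, land)
]
fused = [p + l for p, l in zip(pop, land)]  # feature concatenation


def held_out_r2(features):
    """Fit on the first 200 rows and score R² on the remaining 100."""
    train, test = slice(0, 200), slice(200, n)
    weights = fit_ols(features[train], risk[train])
    return r_squared(risk[test], predict(weights, features[test]))


r2_pop, r2_land, r2_fused = held_out_r2(pop), held_out_r2(land), held_out_r2(fused)
print(f"population only: {r2_pop:.3f}")
print(f"landscape only:  {r2_land:.3f}")
print(f"fused:           {r2_fused:.3f}")
```

Because the synthetic target depends on both sources, each one alone can explain only part of the variance, which is the intuition behind the fused result in the post.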
Complex problem-solving via Geospatial Reasoning
The example above illustrates that tackling real-world problems requires insights from multiple models with diverse capabilities. Orchestrating these Earth AI insights is simplified by our new Gemini-powered Geospatial Reasoning agent. The agent deconstructs complex, natural language queries and plans a dynamic, multi-step path to an answer. To execute each step, the agent can call on “expert” sub-agents that are equipped with Earth AI models described above, as well as geospatial-specific tools. This modular network of agents allows for extensibility and customization.
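The plan, execute and fuse pattern described above can be sketched with a small registry of callable experts. This is a toy under stated assumptions: the function names, the keyword-based planner and the placeholder return values are all hypothetical, and a real agent would delegate planning to an LLM such as Gemini rather than use keyword rules.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


# Hypothetical expert sub-agents and tools. Each reads the shared context
# and returns new facts; the returned values are placeholders.
def environment_expert(ctx: Context) -> Context:
    return {"wind_zones": ["zone-1"]}


def demographics_tool(ctx: Context) -> Context:
    return {"counties": ["county-a", "county-b"]}


def imagery_expert(ctx: Context) -> Context:
    # Relies on an earlier step having added "counties" to the context.
    return {"infrastructure": {c: ["hospital"] for c in ctx["counties"]}}


REGISTRY: Dict[str, Callable[[Context], Context]] = {
    "environment": environment_expert,
    "demographics": demographics_tool,
    "imagery": imagery_expert,
}


def make_plan(query: str) -> List[str]:
    """Toy planner based on keyword rules, standing in for an LLM planner."""
    q = query.lower()
    plan: List[str] = []
    if "storm" in q or "hurricane" in q:
        plan.append("environment")
    if "communities" in q or "population" in q:
        plan.append("demographics")
    if "infrastructure" in q and "demographics" in plan:
        plan.append("imagery")
    return plan


def answer(query: str) -> Context:
    """Run each planned step in order, merging its result into one context."""
    ctx: Context = {"query": query, "trace": []}
    for step in make_plan(query):
        ctx.update(REGISTRY[step](ctx))
        ctx["trace"].append(step)
    return ctx


result = answer("Which communities and infrastructure are at risk from the hurricane?")
print(result["trace"])  # ['environment', 'demographics', 'imagery']
```

Keeping experts behind a registry is one way to get the extensibility the post mentions: adding a capability means registering one more callable.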
To see how it works, consider a user who wishes to identify specific populations that are vulnerable to the risk of an oncoming storm. The agent executes a transparent series of reasoning steps:
Invoke the Environment model to identify the specific geographic areas that are forecast to be at risk of hurricane force winds.
Query Data Commons for demographic statistics to identify higher-population counties in the area of predicted landfall.
Retrieve official boundaries for the counties of interest from BigQuery’s public datasets.
Perform a spatial intersection between the wind zones and official county boundaries.
Identify the most vulnerable postal codes by training a model on the fly using our Population Dynamics Foundations and county level statistics.
Use the Remote Sensing Foundations object detection model to identify critical infrastructure in satellite imagery taken over one of the most vulnerable postal codes.
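The spatial intersection step can be illustrated with axis-aligned bounding boxes. This is a deliberate simplification: real wind zones and county boundaries are polygons and would normally be intersected with a GIS library, and the coordinates and county names below are made up. The sketch also ignores boxes that cross the antimeridian.

```python
from typing import Dict, NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned bounding box in degrees."""
    west: float
    south: float
    east: float
    north: float


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping box, or None if the boxes do not overlap."""
    west, south = max(a.west, b.west), max(a.south, b.south)
    east, north = min(a.east, b.east), min(a.north, b.north)
    if west >= east or south >= north:
        return None
    return Box(west, south, east, north)


# Made-up coordinates for illustration only.
wind_zone = Box(-82.0, 26.0, -80.0, 28.0)
counties: Dict[str, Box] = {
    "county_a": Box(-81.0, 27.0, -79.0, 29.0),  # partly inside the wind zone
    "county_b": Box(-85.0, 30.0, -84.0, 31.0),  # entirely outside
}

affected: Dict[str, Box] = {}
for name, boundary in counties.items():
    overlap = intersect(wind_zone, boundary)
    if overlap is not None:
        affected[name] = overlap

print(affected)
# {'county_a': Box(west=-81.0, south=27.0, east=-80.0, north=28.0)}
```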
To evaluate the agent, we developed two new methods: a Q&A benchmark for fact-finding and analysis, with verifiable ground-truth answers based on publicly available data, and Crisis Response case studies for complex, predictive scenarios (e.g., solving the entire challenge above).
On the Q&A benchmark, our Geospatial Reasoning Agent achieved an overall accuracy of 0.82, significantly outperforming the baseline Gemini 2.5 Pro (0.50) and Gemini 2.5 Flash (0.39) agents (scores derived from ROUGE-L F1 and percentage error, higher is better). This highlights the importance of giving agents access to specialized geospatial models and tools for these types of queries.
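For readers unfamiliar with the two ingredients of the score, the sketch below computes ROUGE-L F1 (from the longest common subsequence of tokens) for text answers and a percentage error for numeric answers. The lowercase whitespace tokenization and the example strings are our assumptions. The post does not say how the two measures are normalized or combined into the overall accuracy, so that step is intentionally left out.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 using lowercase whitespace tokens (a simplifying assumption)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def percentage_error(truth: float, estimate: float) -> float:
    """Absolute error as a percentage of the true value (truth must be non-zero)."""
    return 100 * abs(estimate - truth) / abs(truth)


# Invented example answers, not taken from the benchmark.
reference = "the county with the highest population is Miami-Dade"
candidate = "Miami-Dade is the county with the highest population"

text_score = rouge_l_f1(reference, candidate)
numeric_error = percentage_error(200.0, 190.0)

print(text_score)     # 0.75
print(numeric_error)  # 5.0
```

The example shows a known property of ROUGE-L: a correct answer with a different word order is penalized, because only six of the eight tokens fall in a common subsequence.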
In the more complex Crisis Response scenarios, our paper demonstrates the benefit of orchestrating a diverse set of Environment, Remote Sensing and Population Dynamics insights via case studies. Leveraging specialized sub-agents for geospatial and demographic analysis, we’re able to solve real-world analysis tasks.
Unlocking our planet's potential, together
Earth AI represents a fundamental leap in planetary understanding. Our findings show that a multimodal, reasoning-based approach, built upon a foundation of state-of-the-art geospatial AI models, can unlock insights that are intractable with siloed analysis alone.
We are just beginning to explore the full potential of Earth AI and are committed to expanding access in order to help the global community address the planet’s most pressing challenges. For example:
Bellwether, a Google X moonshot, is using our weather forecasts, Population Dynamics Foundations embeddings, satellite image analysis and property databases to predict building damage before a storm strikes, helping their insurance clients pay claims faster so homeowners can start rebuilding sooner — saving them time, money and stress.
United Nations Global Pulse uses Earth AI Imagery models to assess damage after natural disasters, enabling governments and international organizations to rapidly respond to crises.
GiveDirectly is using Geospatial Reasoning with our flood forecasts to identify at-risk communities and send cash aid to help households prepare for and mitigate disaster.
In addition to supporting UN Global Pulse, GiveDirectly, and other organizations using Earth AI, Google.org is providing funds to partners like Khushi Baby, Cooper/Smith, Direct Relief and Froncort.ai who are utilizing Population Dynamics Foundations to model infectious diseases and improve public health action globally. New enterprise users of Earth AI include Public Storage, CARTO and Visiona Space Technology (part of Embraer).
We want to hear how Earth AI might be helpful to you. We encourage organizations to express interest in getting early access to Remote Sensing Foundations (available as Imagery models in Vertex AI), Population Dynamics Foundations, and Geospatial Reasoning.
