
Time series foundation models can be few-shot learners

Posted by qimuai · First-hand compilation



Source: https://research.google/blog/time-series-foundation-models-can-be-few-shot-learners/

Summary:

Google presents a new time-series forecasting method that enables accurate "few-shot" learning

On September 23, 2025, Google Research scientist Rajat Sen and software engineer Yichen Zhou announced research presented at ICML 2025 introducing a new method called "in-context fine-tuning." The method upgrades their existing time-series foundation model, TimesFM, into TimesFM-ICF, a model with "few-shot learning" capability and significantly improved forecasting performance.

Traditional time-series forecasting (e.g., predicting inventory or energy demand) typically requires building and training a dedicated model for each task, a slow process that demands considerable expertise. Google's earlier TimesFM model already enables "zero-shot" forecasting; the new work explores how a small number of relevant example series can further improve accuracy without adding any complexity for the user.

The researchers made a key change to the model's architecture. They introduced a learnable "common separator token" that acts like a digital "stop sign" in the data stream, clearly separating the historical example series used for reference from the target data to be forecast and preventing the model from conflating patterns from different sources. Through continued pre-training, the model learns to pick up regularities from these few in-context examples at inference time and apply them to the task at hand.

On 23 datasets the model had never seen, TimesFM-ICF improves forecasting accuracy by 6.8% over the base TimesFM. More importantly, it matches the performance of a model adapted with supervised fine-tuning, while skipping the extra training steps and compute that fine-tuning requires. The model also behaves as expected: the more relevant examples it is given, the more accurate the forecast, at the cost of longer inference time.

The technique could change how businesses forecast. To predict demand for a new product, for example, a company would no longer need to launch a full machine-learning project; feeding the model a few relevant examples is enough to obtain expert-level forecasts quickly, cutting costs, speeding up decisions, and making advanced forecasting widely accessible.

The team says its next step is to develop strategies that automatically select the most relevant in-context examples, further improving the foundation model's intelligence and adaptability.

Full translation:

Time series foundation models can be few-shot learners
September 23, 2025
Rajat Sen, Research Scientist, and Yichen Zhou, Software Engineer, Google Research

We present a novel approach to time-series forecasting that uses continued pre-training to teach a time-series foundation model to adapt to in-context examples at inference time.

Quick links
Time-series forecasting is essential for modern businesses, helping them predict everything from inventory needs to energy demand. Traditionally, this has meant building a separate, specialized model for each task, a process that is slow and requires significant expertise.

The emergence of zero-shot learning offered a solution. Our previous model, TimesFM, is a zero-shot, pre-trained foundation model that forecasts accurately without task-specific training. But could a few examples make the forecast even better? For instance, forecasting highway traffic would be more accurate if the model could consider data from nearby highways or from the same highway a few weeks earlier. The standard solution, supervised fine-tuning, which uses curated data to adapt an existing model, reintroduces exactly the complexity that zero-shot learning is meant to avoid.

In our paper "In-Context Fine-Tuning for Time-Series Foundation Models", presented at ICML 2025, we introduce a new approach that turns TimesFM into a few-shot learner. The technique uses continued pre-training to teach the model to learn from a handful of examples at inference time, ultimately matching the performance of supervised fine-tuning without placing any additional training burden on the user.

Redesigning the model
TimesFM is a patched decoder: it tokenizes every 32 contiguous timepoints (a patch) as an input token, applies a transformer stack to the sequence of input tokens to generate output tokens, and then uses a shared multilayer perceptron (MLP) to translate each output token back into a time series of 128 timepoints.
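To make the patched-decoder flow above concrete, here is a minimal, runnable sketch in plain NumPy. The patch length (32) and the 128-timepoint output per token come from the description; the toy model width, the random linear maps standing in for the MLPs, and the identity transformer stub are assumptions for illustration only, not the actual TimesFM layers.

```python
import numpy as np

INPUT_PATCH_LEN = 32    # each input token covers 32 contiguous timepoints
OUTPUT_PATCH_LEN = 128  # each output token decodes back to 128 timepoints
D_MODEL = 64            # toy model width, chosen only for this sketch

rng = np.random.default_rng(0)
W_in = rng.normal(size=(INPUT_PATCH_LEN, D_MODEL))    # stand-in for the input MLP
W_out = rng.normal(size=(D_MODEL, OUTPUT_PATCH_LEN))  # stand-in for the shared output MLP

def transformer_stack(tokens: np.ndarray) -> np.ndarray:
    """Placeholder for the causal transformer stack (identity, to keep the sketch runnable)."""
    return tokens

def forecast_sketch(history: np.ndarray) -> np.ndarray:
    # 1) Patch the history into input tokens of 32 contiguous timepoints each.
    n = len(history) // INPUT_PATCH_LEN
    patches = history[: n * INPUT_PATCH_LEN].reshape(n, INPUT_PATCH_LEN)
    # 2) Embed the patches and run them through the (stubbed) transformer stack.
    out_tokens = transformer_stack(patches @ W_in)     # shape (n, D_MODEL)
    # 3) Decode the last output token into the next 128 timepoints.
    return out_tokens[-1] @ W_out                      # shape (OUTPUT_PATCH_LEN,)

print(forecast_sketch(np.sin(np.arange(512) / 10.0)).shape)  # -> (128,)
```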

To build TimesFM-ICF (In-Context Fine-tuning), we start from the base model and continue pre-training with a new kind of context: the forecast history plus all in-context examples. The first step is to make sure the model does not confuse or conflate the forecast history with the in-context examples. Suppose the model is given several series that represent different things, say sunglasses sales from store A and umbrella sales from store B. If the series are simply concatenated into one stream, the model may read two separate, simple trends as a single up-and-down pattern.

To fix this, we insert a special, learnable "common separator token", functioning like a digital "stop sign" or a "new paragraph" marker, after each example series. Once the model attends to the separator of an example it has already seen, it no longer mixes that example up with the data it is currently trying to predict. In principle, this lets the model learn from patterns in those past examples, for instance concluding that "all of these stores' sales have shown consistent, directional trends lately, so I should predict an upward trend for the new store's sunscreen sales."
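Below is a minimal sketch of how such a context might be assembled, assuming each example series is patched into tokens and a single learnable separator embedding is appended after every example before the target history is attached; the helper names and dimensions are hypothetical, not TimesFM's actual API.

```python
import numpy as np

PATCH, D_MODEL = 32, 64
rng = np.random.default_rng(0)
W_in = rng.normal(size=(PATCH, D_MODEL))  # stand-in for the input MLP
SEP = rng.normal(size=(1, D_MODEL))       # the learnable "common separator token"

def patch_tokens(series: np.ndarray) -> np.ndarray:
    """Turn a 1-D series into a sequence of patch tokens."""
    n = len(series) // PATCH
    return series[: n * PATCH].reshape(n, PATCH) @ W_in

def build_context(in_context_examples, forecast_history):
    """Concatenate [example tokens, SEP] for each example, then the target history."""
    parts = []
    for example in in_context_examples:
        parts.append(patch_tokens(example))  # tokens of one past example
        parts.append(SEP)                    # "this example ends here"
    parts.append(patch_tokens(forecast_history))  # the history to forecast comes last
    return np.concatenate(parts, axis=0)          # shape (total_tokens, D_MODEL)

examples = [np.sin(np.arange(256) / 7.0), np.cos(np.arange(256) / 5.0)]
history = np.sin(np.arange(128) / 9.0)
print(build_context(examples, history).shape)  # -> (2 * (8 + 1) + 4, 64)
```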

Because the separator tokens, and the attention paid to them, are new to TimesFM, the second step is to continue pre-training the base model so it learns to use them. The recipe is straightforward: we build a new dataset that contains both in-context examples and separator tokens, and apply standard decoder-only next-token prediction training. Inputs pass through the MLP layer to produce tokens; these go into a causal self-attention (CSA) layer that attends only to earlier tokens in the sequence, which is crucial in time-series forecasting because it prevents the model from looking into the future; the CSA output then feeds a feed-forward network (FFN). The CSA and FFN blocks are repeated several times (the stacked transformer) before the result is passed to the output MLP layer.
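The following toy block sketches the CSA-plus-FFN step described above, mainly to make the causal masking concrete: each token can attend only to tokens at or before its own position. Single-head attention, the toy width, and the ReLU feed-forward are simplifying assumptions, not the real TimesFM layers.

```python
import numpy as np

D = 64
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
W1 = rng.normal(size=(D, 4 * D)) / np.sqrt(D)
W2 = rng.normal(size=(4 * D, D)) / np.sqrt(4 * D)

def causal_self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head CSA: token t attends only to tokens <= t, never to the future."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(D)
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # strictly-future positions
    scores[future] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def decoder_block(x: np.ndarray) -> np.ndarray:
    x = x + causal_self_attention(x)          # CSA with a residual connection
    return x + np.maximum(x @ W1, 0.0) @ W2   # ReLU feed-forward network, also residual

tokens = rng.normal(size=(10, D))  # e.g., patched history plus separator tokens
for _ in range(4):                 # the stacked transformer
    tokens = decoder_block(tokens)
print(tokens.shape)                # -> (10, 64)
```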

Testing the model
We evaluated TimesFM-ICF on 23 datasets the model had never seen during any phase of its training. Each benchmark dataset contains multiple time series. To forecast a series, we start from its immediate history, then sample sequences from its full history and from the histories of other series in the same dataset as in-context examples. This keeps the examples relevant and avoids any data leakage.
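An illustrative sketch of that evaluation setup, under the assumption that series in a dataset are time-aligned and that example windows are drawn only from data before the target's forecast start; the window length and number of examples are arbitrary choices for this sketch, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_context(dataset, target_idx, forecast_start, window=256, n_examples=4):
    """dataset: list of 1-D arrays from the same benchmark dataset.
    In-context example windows are cut off at `forecast_start`, so values the
    model is asked to forecast can never leak into its context."""
    examples = []
    while len(examples) < n_examples:
        i = rng.integers(len(dataset))
        series = dataset[i][:forecast_start]            # truncate to avoid leakage
        if len(series) < window:
            continue
        start = rng.integers(len(series) - window + 1)
        examples.append(series[start : start + window])
    history = dataset[target_idx][:forecast_start]      # the target's history up to the forecast start
    return examples, history

data = [np.sin(np.arange(1000) / p) for p in (5.0, 7.0, 11.0)]
examples, history = sample_context(data, target_idx=0, forecast_start=800)
print(len(examples), examples[0].shape, history.shape)  # -> 4 (256,) (800,)
```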

The chart below shows the geometric-mean (GM) aggregation of the mean absolute scaled errors (MASE), normalized by a naïve repeat of the last seasonal pattern. We focus on two baselines: the base, zero-shot TimesFM and a supervised fine-tuned version of TimesFM.
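For reference, a small sketch of the metric named above: MASE divides the forecast's mean absolute error by the in-sample error of a seasonal-naive forecast (repeating the value from one season earlier), and per-dataset scores are then aggregated with a geometric mean. The seasonal-period argument and the toy numbers are assumptions of the sketch; the paper's exact normalization may differ in detail.

```python
import numpy as np

def mase(actual, forecast, history, season=1):
    """Mean absolute scaled error: forecast error divided by the in-sample error
    of a seasonal-naive forecast that repeats the value from `season` steps back."""
    naive_error = np.mean(np.abs(history[season:] - history[:-season]))
    return np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))) / naive_error

def geometric_mean(scores):
    """Aggregate per-dataset scores into one number with a geometric mean."""
    scores = np.asarray(scores, dtype=float)
    return float(np.exp(np.mean(np.log(scores))))

# Toy usage: per-dataset MASE scores aggregated into a single GM score.
history = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
print(mase(actual=[15.0, 13.0], forecast=[14.0, 14.0], history=history))  # -> 0.625
print(geometric_mean([0.8, 1.1, 0.95]))
```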

TimesFM-ICF is 6.8% more accurate than the base TimesFM. Even more encouraging, it matches the performance of the supervised fine-tuned version without going through the fine-tuning process at all.

Beyond accuracy, TimesFM-ICF has other desirable properties. It behaves as expected, with more in-context examples yielding more accurate forecasts at the cost of longer inference time, and it makes more effective use of context than a model that simply sees a longer history.

Looking ahead: more accessible, more powerful forecasting
This approach has real practical significance: businesses can rely on a single, powerful model for more robust and flexible forecasting. For a new task such as predicting demand for a new product, there is no need to launch a full machine-learning project; a few relevant examples are enough to obtain expert-level forecasts, significantly lowering costs, accelerating decision-making, and bringing advanced forecasting to a much wider audience.

We are excited about the next steps, in particular developing strategies that automatically select the most relevant in-context examples. By making foundation models smarter and more adaptable, we hope to help more people make better data-driven decisions.

Acknowledgements
This research was led by Matthew Faw, a former student researcher, in collaboration with Google Research colleagues Abhimanyu Das and Ivan Kuznetsov. The blog post was prepared with the generous support of editors Mark Simborg and Kimberly Schwede.

English source:

Time series foundation models can be few-shot learners
September 23, 2025
Rajat Sen, Research Scientist, and Yichen Zhou, Software Engineer, Google Research
We present a novel approach to time-series forecasting that uses continued pre-training to teach a time-series foundation model to adapt to in-context examples at inference time.
Quick links
Time-series forecasting is essential for modern businesses, helping them predict everything from inventory needs to energy demands. Traditionally, this has involved building a separate, specialized model for each task — a process that is slow and requires significant expertise.
The emergence of zero-shot learning offered a solution. Our previous model, TimesFM, was a zero-shot, pre-trained foundation model that could accurately forecast without task-specific training. But what if a few examples could make the forecast even better? For instance, forecasting highway traffic would be more accurate if the model could consider data from other nearby highways or from the same highway a few weeks ago. The standard solution, supervised fine-tuning, which uses curated data to fine-tune an existing model, reintroduces the complexity one hopes to avoid with zero-shot learning.
In our new work, "In-Context Fine-Tuning for Time-Series Foundation Models", presented at ICML 2025, we introduce a novel approach that transforms TimesFM into a few-shot learner. This method uses continued pre-training to teach the model how to learn from a handful of examples at inference time. The result is a powerful new capability that matches the performance of supervised fine-tuning without requiring additional complex training from the user.
Redesigning the model
TimesFM is a patched decoder that tokenizes every 32 contiguous timepoints (a patch) as an input token and applies a transformer stack on top of the sequence of input tokens to generate the output tokens. It then applies a shared multilayer perceptron (MLP) to translate each output token back to a time series of 128 timepoints.
To create TimesFM-ICF (In-Context Fine-tuning), we start with the base TimesFM model and continue the pre-training with new context: the forecast history plus all in-context examples. The first step is to make sure the model doesn’t confuse or conflate the forecasting history and the in-context examples. Imagine you're giving the model a list of numbers that represent a few different things, maybe sunglasses sales figures from one store, then umbrella sales figures from another. If you just merge all those numbers together, the model might get confused, thinking it's one continuous stream of data. For example, if the first store’s sales were going up and the second store’s sales were going down, the model might incorrectly see it as a single up-and-down pattern, rather than two separate, simple trends.
To fix this, we put a special, learnable “common separator token” — like a digital "stop sign" or a "new paragraph" symbol — after each set of numbers. With these separators in place, as soon as the model attends to the separator token of an example it has seen before, it won't mix it up with the data it's currently trying to predict. This theoretically allows the model to learn from patterns in those past examples and apply that knowledge to the current forecast. For instance, the model could learn that "all the store sales are showing consistent, directional trends lately, so I should predict an upward trend for my new store’s sunscreen sales."
Since the separator tokens and the attention to them are new for TimesFM, our second step involves continuing the pre-training of the base TimesFM model to teach it about the new introductions. The recipe here is actually straightforward: we created a new dataset that includes both in-context examples and separator tokens, and we applied standard decoder-only next-token prediction training. Inputs are passed to the MLP layer, which generates tokens. These are passed to a causal self attention (CSA) layer that "attends to" information from previous tokens in the sequence, a step that's crucial in tasks like time-series forecasting as it prevents the model from looking into the future. The CSA then feeds into a feed-forward network (FFN). We repeat CSA and FFN multiple times (i.e., the stacked transformers) before connecting the result to the output MLP layer.
Testing the model
We evaluated TimesFM-ICF on 23 datasets that the model had never seen during any phase of its training. Each dataset in this benchmark has multiple time series. When we forecast a time series, we start with its immediate history, then sample sequences from its full history and the histories of other time series in the same dataset as in-context examples. This ensures the in-context examples are relevant and there is no leakage.
The chart below shows the geometric mean (GM) aggregation of the mean absolute scaled errors (MASE) normalized by a naïve repeat of the last seasonal pattern. We focus on two baselines here:
