
The AI tool AlphaGenome predicts how a single typo can rewrite a gene's story.

Posted by qimuai (first-hand compilation)



Source: https://www.sciencenews.org/article/ai-tool-alphagenome-predicts-genetics

Summary:

Google DeepMind releases AlphaGenome, a new AI genome model that advances the decoding of the "book of life"

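Google's AI company DeepMind recently published a new deep-learning model, AlphaGenome, in Nature. The model can analyze up to 1 million DNA bases at a time, double the 500,000-base window of Borzoi, the previous front-runner. It may significantly improve scientists' ability to read the genome, and it has potential applications in diagnosing rare genetic diseases, identifying cancer-driving mutations and synthetic biology.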

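The genome is often compared to an intricately structured encyclopedia of life. Besides the genes that carry the main storyline, it contains a large number of regulatory elements, noncoding regions and other "grammatical marks," and its parts fold together in space and interact over long distances. Traditional analysis tools tend to interpret one specific function at a time. AlphaGenome's innovation is that it predicts, at single-base resolution, how changes in a DNA sequence affect 11 biological processes at once, including RNA splicing and gene activity levels, giving a more systematic and integrated simulation of genome function.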

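Anshul Kundaje, a computational biologist at Stanford University, says the model is not just an extension of analysis length but "quite a leap forward in its overall utility." Because it scans longer stretches of DNA, it is more likely to catch the long-distance interactions common in gene regulation. In tests it outperformed existing specialized models on most measures; for example, it identified gene activity changes in certain cell types 14.7 percent better than Borzoi2.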

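Researchers also point to current limits. Unpublished data from Kundaje's lab indicate that the model struggles to predict how gene activity changes in individuals, so for now it is a tool for basic research and cannot be used directly for clinical diagnosis or treatment. Peter Koo, a computational biologist at Cold Spring Harbor Laboratory, stresses that AlphaGenome's success comes not from a single technical breakthrough but from a system of many engineering strategies. One of them, ensemble distillation, trains multiple copies of the model and has a single student model average their outputs into a consensus prediction, which tends to be more reliable than any individual model.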

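Natasha Latysheva, a researcher at Google DeepMind, says the model's strong performance across many genomic tasks shows that it has learned "a powerful general representation of DNA sequences and the complex processes these sequences encode." Judit García González, a human geneticist at the Icahn School of Medicine at Mount Sinai, says AlphaGenome unites previously scattered analysis functions in one tool, which could greatly simplify research workflows.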

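Looking ahead, Kundaje says this type of model has been "maxed out," and he expects the next big leap to come from new types of genomic data for the model or its descendants to analyze. AlphaGenome's release marks another key step for AI in decoding the genome.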

Translation:

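A new deep-learning AI model developed by Google DeepMind may help scientists better decipher the intricate narrative of the genetic code and explore how genetic "typos" change the story. The model, called AlphaGenome, is the latest in a line of steadily improving AI models for DNA analysis. The previous front-runner, Borzoi, could predict molecular signposts in stretches of DNA up to 500,000 bases long; AlphaGenome can analyze 1 million DNA building blocks at a time, researchers report January 28 in Nature. The model has potential applications in diagnosing rare genetic diseases, identifying cancer-driving mutations, designing synthetic DNA sequences or therapeutic RNAs, and deepening the understanding of basic biology.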

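"AlphaGenome is not just a bigger model in terms of context length, but it actually is quite a leap forward in its overall utility," says Anshul Kundaje, a computational biologist at Stanford University who develops AI models for genomics. A genetic change may have no effect on nearby genes yet alter the activity of genes far away. Because AlphaGenome scans longer stretches of DNA, it is more likely to catch such long-distance relationships.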

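But the model has limits. Unpublished data from Kundaje's lab indicate that it struggles to predict how gene activity changes in individuals. For now it is a tool for basic biological research, not something that can be used for clinical diagnosis or treatment. Kundaje says this type of model has been "maxed out" and predicts that the next big leap will come from scientists generating new types of data for the model or its descendants to analyze.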

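Peter Koo, a computational biologist at Cold Spring Harbor Laboratory in New York, notes that AlphaGenome can pinpoint biologically important spots down to a single base, much finer than Borzoi's 32-base-pair bins. That is a big task, because the model's reference is the 3-billion-base human genome. This set of genetic instructions, often called the book of life, is really more like a multivolume, choose-your-own-adventure, pop-up encyclopedia.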

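Genes are like the book's short stories, told in fragments that can be rearranged, shortened or skipped. Between the story fragments are passages with instructions for reading other stories. Pages and chapters are folded into one another like origami, so pulling a tab on one page can make something pop up chapters away. Much of the book was once dismissed as nonsense but is in fact grammar that cells need to make sense of the text: researchers have cataloged a dizzying array of punctuation marks, origami-like creases, syntax swaps, margin scribbles and other forms of biological grammar.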

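AlphaGenome's core task is to take a DNA sequence and predict how plot points, punctuation changes and other variations affect 11 biological processes, including RNA splicing, gene activity levels and protein-DNA interactions. The model draws on 5,930 data points from studies of human DNA and 1,128 from mouse DNA, and it can predict how changing a single letter in a million-base sequence alters the story. The researchers report that the new tool outperforms specialized models on most measures and does particularly well at identifying features in different cell types; for example, it identified gene activity changes in certain cell types 14.7 percent better than Borzoi2.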

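"By doing well on so many different genomic tasks simultaneously, we believe this demonstrates that the model has learned a powerful general representation of DNA sequences and the complex processes these sequences encode," Natasha Latysheva of Google DeepMind said at a news briefing on January 27. Judit García González, a human geneticist at the Icahn School of Medicine at Mount Sinai in New York, says the tool could greatly simplify research on how the genome works. Previously, a researcher might have needed to learn three different tools, each with its own caveats, to predict some 20 genomic functional consequences; AlphaGenome now unites them in one tool.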

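Koo stresses that the model is a fusion of many techniques: "There is no single innovation in AlphaGenome that one can pinpoint as a critical innovation. It's really a system of lots of tricks and engineering." One of those tricks is ensemble distillation, which Koo's lab has been experimenting with. The strategy pretrains multiple copies of the model, each on computationally mutated DNA; those copies then serve as teachers to a single student model that averages their outputs. Koo likens it to having 60 history professors give their accounts of an important event: "If you consider the consensus across what every historian agrees, what overlaps across their story lines, that is probably what might actually be true." The consensus, he says, "tends to be more reliable than trusting any individual model."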

English source:

A new deep-learning AI model may help scientists better decipher the plot of the genetic instruction book and learn how typos alter the story.
AlphaGenome, created by Google DeepMind, is the latest in an ever-improving line of AI models built to analyze vast stretches of DNA. The previous front-runner, a model called Borzoi, could predict molecular signposts in stretches of DNA 500,000 bases long. AlphaGenome can analyze 1 million DNA building blocks at a time, researchers report January 28 in Nature. The model may have practical implications for diagnosing rare genetic diseases, identifying cancer-driving mutations, designing synthetic DNA sequences or therapeutic RNAs and better understanding basic biology.
“AlphaGenome is not just a bigger model in terms of context length, but it actually is quite a leap forward in its overall utility,” says Anshul Kundaje, a computational biologist at Stanford University who develops AI models for genomics.
For instance, a genetic change may have no effect on nearby genes but could change activity of genes far away. Because AlphaGenome examines longer stretches of DNA, it is more likely to spot such long-distance relationships.
But AlphaGenome isn’t perfect. Unpublished data from Kundaje’s lab indicate the model struggles with predicting how gene activity changes in individuals. Right now, the model is a tool for uncovering basic biology, not something doctors could use to diagnose or treat patients.
AlphaGenome has “maxed out” what this type of model can do, Kundaje says. He predicts the next big leap will come from scientists generating new types of data for the model or its descendants to analyze.
AlphaGenome can pinpoint biologically important spots down to single base resolution, says Peter Koo, a computational biologist at Cold Spring Harbor Laboratory in New York. That’s much higher resolution than Borzoi, which flagged points of biological interest in 32 base-pair bins.
That’s a big task considering that the model’s reference is the 3-billion-base-long human genome, often called a genetic instruction book. The book is actually a multivolume, choose-your-own-adventure, pop-up encyclopedia.
Genes, the short stories of the book, are told in small phrases that can be rearranged, shortened or skipped. In between the story fragments are passages that may contain instructions for how to read a different story entirely. Pages and chapters are intricately folded into each other so that pulling a tab in one passage causes something to pop up chapters away.
Much of the book is filled with what many people thought was nonsense but is often essential reading material. Researchers have cataloged a dizzying array of punctuation marks, origami-like creases, syntax swaps, margin scribbles and other types of biological grammar that cells use to make sense of the book.
AlphaGenome’s task is to take a string of DNA letters and predict how plot points, punctuation and other variations affect 11 distinct biological processes, including RNA splicing, gene activity levels and certain protein-DNA interactions. The model considers 5,930 data points from studies of human DNA and 1,128 in mouse DNA. With those data, the AI can predict how changing a single letter, or base, in the million-base string alters the story.
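The single-letter comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not AlphaGenome's code or API: `toy_score` is a hypothetical stand-in for a trained model (it merely counts `TATA` motifs), and `substitute`, `variant_effect` and `scan_position` are names invented for this example. The idea is to score the reference sequence, score a copy with one base swapped, and take the difference.

```python
from typing import Callable, Dict

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts 'TATA' motifs."""
    return float(sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA"))


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the base at 0-based index pos replaced by alt."""
    if alt not in BASES:
        raise ValueError(f"unknown base: {alt!r}")
    if not 0 <= pos < len(seq):
        raise IndexError(f"position {pos} is outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq: str, pos: int, alt: str,
                   score: Callable[[str], float] = toy_score) -> float:
    """Score of the altered sequence minus score of the reference sequence."""
    return score(substitute(seq, pos, alt)) - score(seq)


def scan_position(seq: str, pos: int,
                  score: Callable[[str], float] = toy_score) -> Dict[str, float]:
    """Effect of each of the three possible substitutions at one position."""
    ref = seq[pos]
    return {alt: variant_effect(seq, pos, alt, score) for alt in BASES if alt != ref}


reference = "GGTATAGG"
# Swapping the A at index 3 for a C breaks the only TATA motif.
print(variant_effect(reference, 3, "C"))
print(scan_position(reference, 0))
```

Under this toy score, a substitution that destroys the motif gives a negative effect and one that creates it gives a positive effect; a real model would take the place of `toy_score` and would read the surrounding sequence context as well.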
Specialized computational models that predict subsets of these biological functions have been in use for years, but AlphaGenome outperforms them on most measures and does particularly well at identifying some features in different types of cells, the researchers report. For example, AlphaGenome identified gene activity changes in certain cell types 14.7 percent better than Borzoi2.
“By doing well on so many different genomic tasks simultaneously, we believe this demonstrates that the model has learned a powerful general representation of DNA sequences and the complex processes these sequences encode,” said Natasha Latysheva of Google DeepMind January 27 during a news briefing.
The tool could make things easier for researchers who are trying to understand how the genome works, says Judit García González, a human geneticist at the Icahn School of Medicine at Mount Sinai in New York City. Before AlphaGenome, a researcher “might need to use three different tools with their own caveats, and [have] to learn how they work, for predicting say 20 different genomic functional consequences,” she says. Now, AlphaGenome unites all those in one tool.
AlphaGenome isn’t an entirely new invention. It builds on previous models but uses aspects of those models in clever ways. “There is no single innovation in AlphaGenome that one can pinpoint as a critical innovation. It’s really a system of lots of tricks and engineering,” Koo says.
AlphaGenome used one trick called ensemble distillation that Koo’s lab has been experimenting with. That strategy pretrains multiple copies of the model each on computationally mutated DNA. Those models serve as teachers to a single student model that averages their outputs.
It’s like having 60 history professors give their account of an important event, Koo says. “If you consider the consensus across what every historian agrees, what overlaps across their story lines, that is probably what might actually be true.”
The consensus, he says, “tends to be more reliable than trusting any individual model.”
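The teacher-and-student idea can be illustrated with a deliberately tiny example. This is a sketch under stated assumptions, not the procedure used for AlphaGenome: each "model" here is just a straight line fitted by least squares, random noise added to the training labels stands in for computationally mutated DNA, and the count of 60 teachers simply echoes Koo's analogy rather than any documented number. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model: Line, x: float) -> float:
    slope, intercept = model
    return slope * x + intercept


def train_teachers(xs: Sequence[float], ys: Sequence[float],
                   n_teachers: int, noise: float) -> List[Line]:
    """Fit one teacher per seed, each on its own perturbed copy of the data."""
    teachers = []
    for seed in range(n_teachers):
        rng = random.Random(seed)  # fixed seeds keep the example reproducible
        perturbed = [y + rng.gauss(0.0, noise) for y in ys]  # assumed stand-in for mutated inputs
        teachers.append(fit_line(xs, perturbed))
    return teachers


def distill(xs: Sequence[float], teachers: Sequence[Line]) -> Line:
    """Train a single student on the average of the teachers' predictions."""
    targets = [mean(predict(t, x) for t in teachers) for x in xs]
    return fit_line(xs, targets)


XS = [float(i) for i in range(20)]
YS = [2.0 * x + 1.0 for x in XS]  # the underlying relationship the teachers try to recover
TEACHERS = train_teachers(XS, YS, n_teachers=60, noise=1.0)
STUDENT = distill(XS, TEACHERS)
print("student slope and intercept:", STUDENT)
```

Because every teacher sees a different perturbation, their individual errors partly cancel when averaged, so the student's line tends to sit closer to the underlying relationship than a typical single teacher's does.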
