Machine learning streamlines the complex process of making better proteins.

Source: https://www.sciencenews.org/article/machine-learning-better-proteins
Summary:
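Scientists have developed a machine learning framework called MULTI-evolve that could make protein engineering considerably more efficient. The technique predicts and selects better-performing protein variants in a single round of experiments. The work was published February 19 in Science.
Proteins are key ingredients in products such as medicines, biofuels and even detergents. Conventional engineering tries one amino acid substitution after another, but because substitutions influence one another, finding a good combination often takes many time-consuming rounds of trial and verification.
The new method combines experimental data with machine learning in a three-step workflow: first, predict the effect of each single amino acid substitution; next, make double-mutant proteins in the lab and test them to see how mutations interact; finally, train a model on those measurements to predict high-performing proteins carrying five or more mutations.
In tests on proteins including an antibody relevant to autoimmune diseases and a protein used in CRISPR gene editing, the method identified several mutation combinations that outperformed the original proteins. The researchers say it could help with tasks such as tracking proteins inside cells and improving gene therapies.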
English source:
Making high-performance proteins for medicines or consumer products can take trial after trial of tweaks, experiments and fine-tuning. A new machine learning framework squeezes all that into a single round of testing.
The technique, called MULTI-evolve, predicts how proteins will behave when several of their amino acids are swapped for others. MULTI-evolve blends laboratory experiments with machine learning to find these upgraded proteins, researchers report February 19 in Science.
Specially crafted proteins play a role in everyday products like medicines, biofuels and even laundry detergent. Scientists usually need to swap out multiple amino acids during the design process to boost a protein’s performance. But replacing one amino acid with another can change how the next swap will affect the protein’s function, so finding combinations of swaps that work well together often requires many iterative rounds of modifications and laboratory tests. “It’s this very high-dimensional search problem where we effectively do guess and check,” says Patrick Hsu, a bioengineer at the University of California, Berkeley, and the Arc Institute in Palo Alto, Calif.
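To get a feel for why the search is so large, a rough count helps. The sketch below assumes a hypothetical protein 300 amino acids long (a length chosen only for illustration, not taken from the study) and counts the variants that differ from the original at exactly k positions, with each changed position taking any of the 19 other standard amino acids.

```python
from math import comb

def count_variants(length, k):
    """Number of sequences that differ from the original at exactly k positions.

    Choose which k positions change, then pick one of the 19 other
    standard amino acids at each of them.
    """
    return comb(length, k) * 19 ** k

# Hypothetical protein length, for illustration only.
PROTEIN_LENGTH = 300

for k in (1, 2, 5):
    print(f"{k} substitution(s): {count_variants(PROTEIN_LENGTH, k):.3e} variants")
```

Under that assumption, single substitutions number in the thousands, while five simultaneous substitutions run to the order of 10^16 variants, far more than any laboratory could test one by one.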
Hsu and colleagues built the MULTI-evolve workflow to cut out most of those iterations and predict high-performing proteins with multiple swaps, or mutations, in one round of testing. To do that, they needed information about how different mutations affected each other. For each protein the team targeted, the workflow had three steps. First, the researchers used either previous data or machine learning techniques to predict how single amino acid swaps would affect protein function. Then, to establish how the mutations interacted with each other, they made a series of proteins in the lab, each carrying two of those mutations, and tested how well each one worked. Finally, they trained a machine learning model on that laboratory data and asked it to predict how well the target protein would function with five or more mutations.
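The three steps can be sketched in a few lines of Python. This is a toy illustration, not the MULTI-evolve implementation: the mutation names, the effect sizes and the `measure_in_lab` function are all invented, the "lab" is simulated (and treats the step-one estimates as exact), and a simple additive-plus-pairwise model stands in for whatever machine learning model the researchers actually trained.

```python
from itertools import combinations

# Step 1 (assumed input): predicted effect of each single mutation on activity,
# relative to the original protein. Names and numbers are invented.
single_effect = {
    "A12V": 0.30, "K45R": 0.25, "S78T": 0.20, "D101E": 0.15,
    "G130A": 0.10, "L152M": 0.05, "T170S": 0.02,
}

# Simulated lab only: hidden pairwise interactions the model never sees directly.
_hidden_interaction = {
    frozenset(("A12V", "K45R")): -0.40,   # these two clash
    frozenset(("S78T", "D101E")): 0.10,   # these two reinforce each other
    frozenset(("A12V", "L152M")): 0.15,
}

def measure_in_lab(mutations):
    """Stand-in for a real assay: single effects plus hidden pairwise interactions."""
    total = sum(single_effect[m] for m in mutations)
    for pair in combinations(mutations, 2):
        total += _hidden_interaction.get(frozenset(pair), 0.0)
    return total

# Step 2: "make" every double mutant and record its measured activity change.
double_measurements = {
    frozenset(pair): measure_in_lab(pair)
    for pair in combinations(single_effect, 2)
}

# Step 3: fit the model. Here "training" is just keeping the part of each
# double-mutant measurement that the two single effects do not explain.
interaction = {
    pair: measured - sum(single_effect[m] for m in pair)
    for pair, measured in double_measurements.items()
}

def predict(mutations):
    """Predicted activity change for any set of mutations (additive + pairwise)."""
    total = sum(single_effect[m] for m in mutations)
    total += sum(interaction[frozenset(p)] for p in combinations(mutations, 2))
    return total

def rank_combinations(k=5, top=3):
    """Score every k-mutation combination and return the highest-scoring few."""
    scored = [(predict(combo), combo) for combo in combinations(single_effect, k)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]

best_score, best_combo = rank_combinations()[0]
print(best_combo, round(best_score, 2))
```

In this toy data set, simply stacking the five best single mutations would include the clashing pair A12V and K45R, so the top-ranked combination drops K45R in favor of a weaker mutation that interacts well. That is the kind of interaction the double-mutant measurements are meant to expose.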
The team tested MULTI-evolve on three proteins, including an antibody relevant to autoimmune diseases and a protein used in CRISPR gene editing. In each case, the model found several combinations of mutations that in laboratory tests outperformed the original proteins, suggesting the model could pick out a set of swaps that work well together.
Among the many protein jobs MULTI-evolve could streamline, Hsu highlighted two: using one protein to track another’s movement inside a cell and building better gene therapies for people whose bodies don’t produce certain enzymes. “We’re excited about this work,” Hsu says. “I think there’s tremendous interest in how this actually changes the practice of science.”