

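1. School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240
2. Institute of Natural Sciences, Shanghai Jiao Tong University; Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240
3. Shanghai Artificial Intelligence Laboratory, Shanghai 200240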
Received: 16 January 2023
Revised: 29 March 2023
Published: 30 June 2023
KANG Liqi, TAN Pan, HONG Liang. Enzyme engineering in the age of artificial intelligence[J]. Synthetic Biology Journal, 2023, 4(3): 524-534. DOI: 10.12211/2096-8280.2023-009.
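Naturally occurring enzymes have highly diverse functions and are already used in industrial production and academic research, but the properties and functions of many of them do not yet fully meet application requirements; improving specific properties of such enzymes through modification is a central task of enzyme engineering. This article outlines the main stages in the development of enzyme engineering and focuses on research progress in artificial intelligence (AI)-assisted enzyme engineering. Enzyme engineering mainly comprises rational design, directed evolution, semi-rational design, and AI-assisted design. Rational design modifies an enzyme on the basis of prior knowledge such as its catalytic mechanism and structure. Directed evolution improves properties such as the stability and activity of the target enzyme by constructing random mutant libraries and applying high-throughput screening. Semi-rational design uses a range of computational methods to construct mutant libraries that are smaller and more sensible than those of directed evolution, which reduces the screening workload. Driven by large amounts of data, AI techniques can learn features that describe protein composition and evolution. By learning directly from naturally occurring protein sequences, co-evolutionary information, and structures, deep neural networks can already address many types of enzyme engineering problems, such as predicting beneficial mutations, optimizing protein stability, and improving catalytic activity. By analyzing the current state of enzyme engineering, this article aims to further promote the development and optimization of enzymes for broader application and to offer useful insights to researchers and practitioners.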
Enzymes have garnered significant attention in both research and industry for their unparalleled specificity and functionality, yet many natural enzymes do not fully meet application requirements, so opportunities remain for enhancing their physicochemical properties and fitness to improve catalytic performance. The primary objective of enzyme engineering is to optimize the fitness of targeted enzymes through various strategies for their modification, or even redesign. This review provides a comprehensive overview of progress made in enzyme engineering, with a focus on artificial intelligence (AI)-guided design methodology. Several key strategies have been employed in enzyme engineering, including rational design, directed evolution, semi-rational design, and AI-guided design. Rational design relies on an extensive knowledge base encompassing protein structures and catalytic mechanisms, allowing for purposeful manipulation of enzyme properties. Directed evolution, on the other hand, involves generating a library of random variants for subsequent high-throughput screening to identify beneficial mutations. Semi-rational design combines rational design and directed evolution, resulting in a smaller yet more targeted library of variants, which mitigates the high cost of screening the large libraries produced by directed evolution. In recent years, AI technologies, particularly deep neural networks, have emerged as a promising approach for enzyme engineering. AI-guided methods leverage a vast amount of information on protein sequences, multiple sequence alignments, and protein structures to learn key features and their correlations. These learned features can then be applied to various downstream tasks in enzyme engineering, such as predicting mutations with beneficial effects, optimizing protein stability, and enhancing catalytic activity. Here, we delve into the advancements and successes of each of these strategies, highlighting the growing impact of AI-guided design on the process. By offering a detailed examination of the current state of enzyme engineering, we aim to provide valuable insight for researchers and engineers to further advance the development and optimization of enzymes for more applications.
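The directed-evolution cycle summarized above (build a random variant library, screen it, keep the best variant) can be illustrated with a toy simulation. The following is a minimal sketch and not a method taken from the review: the fitness function `toy_fitness`, the hidden `target` sequence, and all parameter values are hypothetical stand-ins for a real high-throughput assay.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_fitness(seq, target):
    """Hypothetical assay: fraction of positions matching a hidden optimum."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rng):
    """Introduce one random point substitution at a random position."""
    pos = rng.randrange(len(seq))
    choices = [aa for aa in AMINO_ACIDS if aa != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def directed_evolution(start, target, rounds=20, library_size=200, seed=0):
    """Greedy rounds of random mutagenesis followed by screening.

    Each round builds a library of single mutants of the current best
    variant, scores every member (the "screen"), and keeps the top
    variant only if it improves on the parent.
    """
    rng = random.Random(seed)
    best = start
    history = [toy_fitness(best, target)]
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        top = max(library, key=lambda s: toy_fitness(s, target))
        if toy_fitness(top, target) > toy_fitness(best, target):
            best = top
        history.append(toy_fitness(best, target))
    return best, history


if __name__ == "__main__":
    # Both sequences are arbitrary illustrative strings, not real enzymes.
    best, history = directed_evolution("AAAAAAAAAA", "MKTAYIAKQR")
    print(best, history[-1])
```

In this toy setting, a semi-rational or AI-guided strategy would correspond to replacing the uniform random `mutate` step with one that proposes only variants favored by prior knowledge or by a trained model, so that fewer variants need to be screened per round.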