

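1. Center for Quantitative Biology, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871
2. Key Laboratory of Cell Proliferation and Differentiation of the Ministry of Education, School of Life Sciences, Peking University, Beijing 100871
3. Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu 610213, Sichuan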
Received: 12 May 2025
Revised: 6 June 2025
Published: 30 June 2025
SONG Chengzhi, LIN Yihan. AI-enabled directed evolution for protein engineering and optimization[J]. Synthetic Biology Journal, 2025, 6(3): 617-635. DOI: 10.12211/2096-8280.2025-044.
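Directed evolution is one of the core foundational technologies of synthetic biology. By mimicking in the laboratory the evolutionary process that occurs in nature, directed evolution uses functional screening to repeatedly recover protein sequences with improved performance from large libraries of mutant sequences, enabling functions that wild-type proteins can hardly achieve. Machine learning, protein language models, and other artificial intelligence (AI) methods developed in recent years have further broadened the technique's range of use and improved its efficiency, helping it deliver excellent results in the engineering of enzymes, antibodies, biosensors, and other proteins. This review summarizes the typical strategies used for mutant library construction and functional screening in traditional directed evolution, introduces the efficient continuous directed evolution platforms developed in recent years, and then discusses problems of directed evolution such as limited sequence space and a tendency to become trapped in local optima. Combining rapidly iterating machine learning models with directed evolution can, on the one hand, ease the limits on sequence-space exploration and, on the other, improve the experimental workflow along several dimensions, including starting-sequence design, intermediate library optimization, and extraction of functional information, making protein engineering more efficient. To make clear the application potential of combining directed evolution with machine learning, the review highlights representative cases of machine learning-assisted directed evolution. Finally, it briefly discusses potential challenges and future directions for the field.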
Directed evolution is one of the core enabling technologies in synthetic biology. By recapitulating in the laboratory the evolutionary processes that occur in nature, directed evolution employs functional screening to continually isolate variants with improved performance from large mutant libraries, achieving functions that are difficult to obtain with wild-type proteins. In recent years, rapidly advancing artificial intelligence (AI) approaches, such as machine learning and protein language models, have further expanded both the range of applications and the operational efficiency of directed evolution, yielding unprecedented achievements in the engineering of enzymes, antibodies, biosensors, and more. In this review, we first outline classic strategies and emerging techniques for mutagenesis and functional selection in traditional directed evolution, followed by an in-depth examination of various continuous directed evolution systems. We highlight common limitations of directed evolution, emphasizing issues such as constrained search space and susceptibility to local optima. Combining rapidly iterated AI methods with directed evolution offers promising solutions to these challenges. Protein language models, in particular, leverage learned patterns from experimental variants alongside fundamental protein properties, providing superior predictive accuracy for unexplored mutants and facilitating the extrapolation of sequence-function relationships to broader sequence space. AI-based methods enhance directed evolution workflows from multiple perspectives. De novo protein design and unsupervised protein language models aid in generating functional starting sequences with targeted sequence diversity. Machine learning models trained on experimental data enable the construction of optimized mutant libraries tailored for subsequent selection rounds. Additionally, models derived from statistical physics and dynamical systems help extract detailed functional information from data acquired across multiple selection rounds. Collectively, these machine learning approaches significantly enhance the overall efficiency of directed evolution. To illustrate the transformative potential of machine learning-assisted directed evolution, we discuss exemplary cases of protein function improvement and modification. Lastly, we briefly address ongoing challenges and future directions in this rapidly evolving and promising research area.
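The loop of training a model on screened variants and using it to build the next mutant library can be sketched in a few lines. The following is only a toy illustration under stated assumptions, not a method taken from the review: the four-letter alphabet, the synthetic fitness landscape with one epistatic term, the simple additive site-effect model, and all function names (`make_toy_landscape`, `fit_additive_model`, `propose_next_library`, `run_campaign`) are hypothetical. In practice the model might be a Gaussian process or a protein language model, and the fitness call would be a wet-lab screen.

```python
import itertools
import random

AMINO_ACIDS = "ACDE"  # assumed reduced alphabet: 4^4 = 256 variants in total
N_SITES = 4           # assumed number of mutated positions


def make_toy_landscape(seed=0):
    """Hypothetical ground truth: additive site effects plus one epistatic pair."""
    rng = random.Random(seed)
    site_effect = {(i, aa): rng.uniform(-1.0, 1.0)
                   for i in range(N_SITES) for aa in AMINO_ACIDS}

    def fitness(seq):
        value = sum(site_effect[(i, aa)] for i, aa in enumerate(seq))
        if seq[0] == "A" and seq[1] == "C":  # epistasis between sites 0 and 1
            value += 1.5
        return value

    return fitness


def fit_additive_model(measured):
    """Estimate each (site, residue) effect as the mean fitness of variants carrying it."""
    totals, counts = {}, {}
    for seq, y in measured.items():
        for key in enumerate(seq):  # key is (site index, residue)
            totals[key] = totals.get(key, 0.0) + y
            counts[key] = counts.get(key, 0) + 1
    overall = sum(measured.values()) / len(measured)
    effects = {k: totals[k] / counts[k] for k in totals}

    def predict(seq):
        # Residues never observed at a site fall back to the overall mean.
        return sum(effects.get(key, overall) for key in enumerate(seq))

    return predict


def propose_next_library(measured, library_size):
    """Rank all unmeasured variants by predicted fitness and keep the top ones."""
    predict = fit_additive_model(measured)
    space = ("".join(p) for p in itertools.product(AMINO_ACIDS, repeat=N_SITES))
    unseen = [v for v in space if v not in measured]
    return sorted(unseen, key=predict, reverse=True)[:library_size]


def run_campaign(rounds=3, library_size=24, seed=0):
    """Random starting library, then model-guided rounds of 'screening'."""
    fitness = make_toy_landscape(seed)
    rng = random.Random(seed + 1)
    space = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=N_SITES)]
    measured = {v: fitness(v) for v in rng.sample(space, library_size)}
    best_per_round = [max(measured.values())]
    for _ in range(rounds):
        for variant in propose_next_library(measured, library_size):
            measured[variant] = fitness(variant)  # stands in for a wet-lab screen
        best_per_round.append(max(measured.values()))
    return measured, best_per_round


if __name__ == "__main__":
    _, best = run_campaign()
    print("best fitness after each round:", [round(b, 3) for b in best])
```

Because the additive model ignores the interaction between sites 0 and 1, it can rank the epistatic optimum poorly; this is one concrete way a model-guided search may still miss parts of the landscape, and a reason richer models are of interest.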