

School of Pharmacy, Fudan University, Shanghai 201203, China
Received: 13 January 2023
Revised: 15 March 2023
Published: 30 June 2023
CHEN Zhihang, JI Menglin, QI Yifei. Research progress of artificial intelligence in designing protein structures[J]. Synthetic Biology Journal, 2023, 4(3): 464-487. DOI: 10.12211/2096-8280.2023-008.
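Proteins are indispensable carriers of all kinds of life activities, and their sequences determine their folded three-dimensional structures and functions. Proteins with specific functions have important applications in biomedicine and many other fields. Computational protein design can produce amino acid sequences from a desired protein function and structure, generating proteins that do not exist in nature. Traditional computational protein design usually obtains designed sequences by combining an energy function with a specific search and optimization algorithm. In recent years, with the development of advanced algorithms, the accumulation of big data, and the growth of computing hardware power, artificial intelligence has flourished and has gradually been applied to protein design. This article reviews recent progress of artificial intelligence in protein structure design, with a focus on introducing the various algorithms. It surveys the latest protein structure design algorithms from three aspects, namely fixed-backbone design, flexible-backbone design, and sequence-structure generation, and explains their novelty relative to traditional computational methods. Empowered by artificial intelligence, the success rate and plausibility of protein design have improved greatly, and the era of on-demand functional protein design is approaching.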
Proteins are essential to life because they carry out a great variety of biological functions. A protein's sequence determines its three-dimensional structure and therefore its physiological function. Proteins with specific functions have important applications in many fields, such as biomedicine, where they are used in drug design and delivery. In the past, protein engineering and directed evolution were commonly used to improve the activity and stability of proteins. These methods, however, are both complex and expensive, as they require a large number of biological experiments for validation. Computational protein design (CPD) allows amino acid sequences to be designed from desired protein functions and structures and, more intriguingly, can generate proteins not found in nature. Conventional CPD uses an energy function and an optimization algorithm to design protein sequences. In recent years, with the rapid development of artificial intelligence (AI) techniques, the accumulation of big data, and advances in high-speed computing, AI has made great progress in learning and has been successfully applied to CPD. In this review, we present a systematic overview of recent applications of AI in protein design, organized by input constraints and sampling-space size into three aspects: fixed-backbone design, flexible-backbone design, and sequence-structure generation. We focus on algorithms and protein feature encoding, present the effect of dataset size and architectural improvements on model prediction performance, and showcase several enzymes, antibodies, and binding proteins that were successfully designed using these models. The advantages of AI over traditional CPD methods are also discussed. Finally, we highlight challenges in AI-aided protein design and propose some strategies for addressing them.