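State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
BIAN Jiahao (born 1997), male, master's student. Research interest: combinatorial methods for artificial intelligence-assisted directed evolution. E-mail: bjh2170@sjtu.edu.cn
YANG Guangyu (born 1980), male, research professor and doctoral supervisor. Research interests: directed evolution of enzymes, high-throughput enzyme screening, applications of enzyme technology, and in vitro synthetic biology. E-mail: yanggy@sjtu.edu.cn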
Received: 2021-03-16,
Revised: 2021-05-24,
Published in print: 2022-06-30
BIAN Jiahao, YANG Guangyu. Artificial intelligence-assisted protein engineering[J]. Synthetic Biology Journal, 2022, 3(3): 429-444. DOI: 10.12211/2096-8280.2021-032.
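Protein engineering is one of the major research directions in synthetic biology. However, our understanding of fundamental biological questions such as protein folding and the natural evolution of enzymes remains limited, so de novo design of protein function by rational design is still difficult. Directed evolution mimics natural evolution in the laboratory and can effectively optimize protein function without relying on structural or mechanistic information. Yet directed evolution depends heavily on high-throughput screening, which limits its use for proteins that lack such screening methods. In recent years, artificial intelligence-assisted protein engineering has developed into an efficient new strategy for protein design. It has shown distinct advantages in protein structure prediction, function prediction, solubility prediction, and guiding smart library design, and represents a new wave of technical development following rational design and directed evolution. This article reviews recent progress in the application of artificial intelligence-assisted protein engineering, with emphasis on representative studies. After briefly introducing the principles and workflow of the strategy, we analyze three key factors that affect predictive model performance: data, molecular descriptors, and artificial intelligence algorithms. We summarize the main databases and the mainstream toolkits and platforms for molecular descriptors and algorithms used in this strategy, and describe their functions, uses, and websites. We also discuss the shortcomings that the artificial intelligence strategy still faces, such as insufficient high-quality data, bias in experimental data, and the lack of general-purpose models. As automated gene function annotation, ultra-high-throughput screening, and artificial intelligence algorithms continue to advance, they will provide sufficient high-quality data and more accurate algorithms, steadily improving the prediction accuracy of artificial intelligence-assisted protein engineering and offering greater support for synthetic biology research.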
Protein engineering is one of the important research fields of synthetic biology. However, de novo design of protein functions based on rational design is still challenging because of the limited understanding of biological fundamentals such as protein folding and the natural evolution mechanism of enzymes. Directed evolution can optimize protein functions effectively by mimicking the principle of natural evolution in the laboratory, without relying on structural or mechanistic information. However, directed evolution is highly dependent on high-throughput screening methods, which limits its application to proteins that lack such methods. In recent years, artificial intelligence has developed rapidly and has been integrated into many disciplines. In synthetic biology, artificial intelligence-assisted protein engineering has become an efficient strategy for protein engineering alongside rational design and directed evolution, and has shown unique advantages in predicting the structure, function, and solubility of proteins and enzymes. Artificial intelligence models can learn the internal properties and relationships in given sequence-function data sets and then predict the properties of virtual sequences. In this article, we review the application of artificial intelligence-assisted protein engineering. After introducing the basic principles and workflow of the strategy, we analyze three key points that affect the performance of the predictive model: data, molecular descriptors, and artificial intelligence algorithms. To provide useful tools for researchers who want to take advantage of this strategy, we summarize the main public databases and the diverse toolkits and web servers for common molecular descriptors and artificial intelligence algorithms. We also comment on the functions, applications, and websites of several artificial intelligence-assisted protein engineering platforms, through which a complete prediction task, including protein sequence representation, feature analysis, model construction, and output, can be completed easily. Finally, we analyze some challenges that remain to be solved in artificial intelligence-assisted protein engineering, such as the lack of high-quality data, bias in data sets, and the lack of universal models. However, with the development of automated gene annotation, ultra-high-throughput screening technologies, and artificial intelligence algorithms, sufficient high-quality data and appropriate algorithms will become available, which can enhance the performance of artificial intelligence-assisted protein engineering and thus facilitate the development of synthetic biology techniques.
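The abstract describes a workflow in which a model learns from a sequence-function data set and then predicts properties for virtual sequences. A minimal sketch of that idea follows, using only the Python standard library. It is an illustration under stated assumptions, not the method of any study covered by the review: the one-hot descriptor, the linear model, and every name in the code (`one_hot`, `toy_fitness`, `fit_linear`, `predict`) are hypothetical, and `toy_fitness` is a made-up additive rule that stands in for an experimental assay.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Descriptor: one block of 20 zero/one entries per sequence position."""
    return [1.0 if res == aa else 0.0 for res in seq for aa in AMINO_ACIDS]


def toy_fitness(seq):
    """Made-up additive rule standing in for an experimental assay (assumption)."""
    residue_score = {"A": 0.0, "V": 1.0, "L": 2.0}
    position_weight = (1.0, 0.3, 0.1)
    return sum(w * residue_score[r] for w, r in zip(position_weight, seq))


def fit_linear(X, y, lr=0.2, epochs=500):
    """Least-squares linear model trained by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, seq):
    """Predicted fitness of a sequence under the fitted linear model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, one_hot(seq))) + b


# Sequence-function data set: every second variant of a 3-site, 3-residue library.
library = ["".join(p) for p in product("AVL", repeat=3)]
train, unseen = library[0::2], library[1::2]
model = fit_linear([one_hot(s) for s in train], [toy_fitness(s) for s in train])

# Rank the "virtual" sequences the model has never seen by predicted fitness.
ranked = sorted(unseen, key=lambda s: predict(model, s), reverse=True)
print(ranked[:3])
```

Because the toy rule is additive and the training half of the library covers every residue at every site, the linear model recovers it and ranks the held-out variants correctly. Real sequence-function landscapes are rarely this simple, which is why the choice of data, descriptor, and algorithm matters.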