Co-evolutionary analysis meets machine learning for modelling biomolecular structures and interactions
CECAM-HQ-EPFL, Lausanne, Switzerland
In the last decade, coevolutionary methods have revolutionized the field of computational structural biology. They form the basis of today’s best-performing computational methods for the prediction of protein and RNA structures, as demonstrated by the success of the top-performing teams in the last CASP competitions. These methodological advances have enabled the study of protein structures, complexes and interaction networks through the analysis of sequence variation.
At their core, these methods rely on the statistical analysis of patterns of correlated mutations in large multiple sequence alignments. Based on the hypothesis that pairs of amino acids in close spatial proximity display correlated mutations, coevolutionary methods aim to solve the inverse problem: inferring the structural proximity of residues from the correlated mutations observed in natural sequences. The advent of high-throughput sequencing, which produced large databases of homologous protein sequences, and the development of theoretical tools to deal with correlated mutations led to major breakthroughs in the field, in particular the development of Direct-Coupling Analysis (DCA), which laid the foundation for subsequent refinements and led to an impressive number of novel structural insights.
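As a rough illustration of what "correlated mutations" means in practice, the sketch below scores every pair of alignment columns by their mutual information. This is not DCA: mutual information is a simpler, purely local statistic, whereas DCA fits a global model to separate direct from indirect correlations. The alignment `toy_msa` and the function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    # Empirical single-column and pairwise residue counts.
    f_i = Counter(col_i)
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def rank_column_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 2 vary together, as do 1 and 3.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKVE",
    "GRVD",
]

for (i, j), score in rank_column_pairs(toy_msa):
    print(f"columns {i}-{j}: MI = {score:.3f}")
```

In this toy alignment the two covarying column pairs receive the highest scores, while pairs that vary independently score close to zero. On real alignments, such raw scores are affected by phylogenetic bias, limited sampling and indirect correlations, which is part of what motivates global approaches such as DCA.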
Recently, applications of deep neural networks to the protein-structure prediction problem have demonstrated the power of using supervised learning to extract precise structural information from DCA results [3,4], attaining unprecedented accuracy in protein structure prediction. Crucially, these models rely on efficient neural-network architectures originally developed for unrelated image-processing tasks, which have been successfully repurposed for predicting biomolecular structures.
We are living in an exciting time, in which the convergence of coevolutionary analysis and machine learning techniques raises several key questions and opens novel directions: What methodological advances will allow the accurate study of protein families with very little available sequence information? What is the potential for rational sequence-based protein design? Can the methods recently applied to single-domain prediction easily be extended to the problem of predicting protein complexes? Can conformational ensembles be predicted from sequence covariation? What are promising directions for studying coevolution patterns in disordered proteins, where alignments may be difficult or even impossible to obtain?
We aim to bring together leading experts in coevolutionary modelling and researchers active in the development of deep-learning methods to discuss open questions in the field, to foster collaborations, and to tackle pressing challenges in computational structural biology.
Alessandro Barducci (Centre de Biologie Structurale (CNRS UMR5048, INSERM U1054)) - Organiser
Paolo De Los Rios (EPFL) - Organiser
Duccio Malinverni (St. Jude Children's Research Hospital) - Organiser