Hybrid Quantum Mechanics / Molecular Mechanics (QM/MM) Approaches to Biochemistry and Beyond
CECAM-HQ EPFL, Lausanne, Switzerland
Multiscale computational methods, boosted by high-performance computing (HPC) facilities, have become a powerful tool in a wealth of fields, with applications ranging from materials science to biochemistry [1-6]. Across this multidisciplinary landscape, what makes these approaches appealing to both academia and industry is the possibility of combining different levels of accuracy in the description of large systems, so that the complexity of the macromolecules or extended condensed phases targeted in laboratory experiments can be reproduced virtually. Advances in theoretical methods and algorithms, fostered by and adapted to continuously evolving HPC architectures for peta- and exascale supercomputers (SPCs), are a powerful resource but also a continuous challenge [7,8]. Supercomputing centers worldwide, such as those of the PRACE initiative in Europe, have also played a prominent role in the recent COVID-19 pandemic (see https://prace-ri.eu/prace-support-to-mitigate-impact-of-covid-19-pandemic/), underscoring the importance of computational science to society at large. Yet the efficient exploitation of these methods and SPCs calls for continuous training of researchers and students approaching this field for the first time.
From a technical standpoint, HPC facilities allow simulations of systems of unprecedented size over unprecedented time spans, with results that go beyond a simple qualitative understanding. This brings these virtual experiments closer to real ones and makes it possible to anticipate their outcome before moving to the laboratory. Nonetheless, the continuous evolution of HPC architectures calls for an equally continuous adaptation of codes and methods to these platforms. In chemistry and biochemistry specifically, one of the best suited and most widely used multiscale methods is the quantum mechanics/molecular mechanics (QM/MM) approach. The importance of this method was recognized in 2013 by the Nobel Prize in Chemistry, awarded jointly to M. Karplus, M. Levitt and A. Warshel. Historically, the first attempt at joining quantum and classical molecular mechanics, that is, a QM/MM approach, was made by two of these laureates, Warshel and Levitt, in 1976. This seminal work was the pioneering step that later, combined with the increasing computational power of modern parallel, vector-parallel and hybrid CPU-GPU platforms, led to a breakthrough in the simulation of realistic biosystems. Since then, biomolecular reactions have become a fertile field for QM/MM methods, both in terms of applications and in terms of the methodological developments that these challenging systems demand, as documented in the available reviews [9-11]. A subsequent step forward was the coupling of QM/MM dynamical simulations to free energy sampling techniques for the exploration of reaction mechanisms [12,13]. This opened an entirely new branch of computational biochemistry [14-17], making it possible to perform virtual experiments with remarkable accuracy that complement and support the traditional in vivo and in vitro ones.
The success of this new branch has even earned it a term of its own, in silico, coined in 1989 by the Mexican mathematician Pedro Miramontes.
The wide variety of QM/MM approaches, implemented in various software packages, makes choosing and using them a major challenge, especially for students, young researchers and newcomers to the field. Guidance from more experienced practitioners is therefore of paramount importance to the next-generation researchers who are expected to continue and take over the work of present developers and leading HPC users. In addition, the major codes able to perform high-level QM/MM simulations have become open source and/or freely downloadable by any researcher in need of this type of tool. Yet these packages are often used as “black boxes”, which hides severe pitfalls for unwary users who lack a precise knowledge of the calculations and methods implemented in these codes, not to mention their domain of applicability and inherent limitations, which are often overlooked.
The scope of our School, which follows previous editions held every two years from 2011 to 2019, is to offer a general and up-to-date overview of what can be done with QM/MM approaches. Since its first edition, this CECAM School has been kept dynamic and timely, evolving with the developments that the field has experienced over the years. As both developers and long-standing practitioners, we intend to offer an updated overview, focusing specifically on biomolecular systems. Given the large number of applications received for all former tutorials on this subject (82 in 2011, 50 in 2013, 64 in 2015, 74 in 2017 and 64 in 2019) and the restricted number of students who could be accepted (26 supported participants plus 6 unsupported in 2019), an analogous school would be highly welcome, particularly among those who are still asking for such an opportunity and who were excluded because of limits on available places and budget. We plan to present an updated yet detailed overview of the basics of QM/MM methods, emphasizing advantages and disadvantages, practical applications and new advanced techniques aimed at exploring the terrain beyond standard molecular dynamics simulations. Since the HPC developments of recent years have also greatly influenced the way in which these methods are coded, we also plan to briefly introduce the parallelism issues intrinsic to modern algorithms for applications on modern supercomputers. The main goal is to provide neophytes with the background and the necessary “toolbox” to tackle the simulation of complex systems of biological, medical and environmental relevance.
Moreover, recent applications such as protein-nucleic acid interactions [18-20] will be presented in direct connection with practical problems, such as the interaction of the “spike” protein of the SARS-CoV-2 virus with DNA, following the EU recommendations and the PRACE initiative for the use of computing resources to contribute to the understanding and possible mitigation of the impact of the COVID-19 pandemic (see https://prace-ri.eu/prace-support-to-mitigate-impact-of-covid-19-pandemic-awarded-projects/).
A major objective of this School is to enable beginners to select a specific QM region in a large biomolecular system, whose structure is generally obtained as a set of Cartesian coordinates and constituent chemical elements from the Protein Data Bank (https://www.pdb.org/) or from the other crystallographic structure databases available worldwide. The choice of an appropriate set of experimental coordinates is clearly crucial in determining the outcome of any QM/MM simulation.
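As a purely illustrative sketch of a first, geometric step in such a selection, the Python snippet below reads PDB-format ATOM/HETATM records and lists the residues having at least one atom within a distance cutoff of a chosen residue (for instance a bound ligand). The residue names, coordinates and the 4 Å cutoff are hypothetical values invented for the example, and the helper `pdb_line` exists only to build a toy input. A distance criterion is only a starting point: a production setup would also have to decide where covalent bonds are cut at the QM/MM boundary and how they are capped, which this sketch does not address.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    xyz: tuple


def parse_pdb_atoms(text):
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
            ))
    return atoms


def select_qm_residues(atoms, center_res_name, cutoff):
    """Return sorted (chain, res_seq, res_name) of residues with any atom
    within `cutoff` (same length unit as the coordinates) of the center residue."""
    centers = [a for a in atoms if a.res_name == center_res_name]
    selected = set()
    for a in atoms:
        if any(math.dist(a.xyz, c.xyz) <= cutoff for c in centers):
            selected.add((a.chain, a.res_seq, a.res_name))
    return sorted(selected)


def pdb_line(rec, serial, name, res, chain, seq, x, y, z):
    """Hypothetical helper: format one record with PDB column alignment."""
    return (f"{rec:<6}{serial:>5} {name:<4} {res:>3} {chain}{seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Toy input: a ligand at the origin, one nearby residue, one distant residue.
toy_pdb = "\n".join([
    pdb_line("ATOM", 1, "NE2", "HIS", "A", 57, 2.0, 0.0, 0.0),
    pdb_line("ATOM", 2, "CA", "ALA", "A", 10, 10.0, 0.0, 0.0),
    pdb_line("HETATM", 3, "C1", "LIG", "A", 900, 0.0, 0.0, 0.0),
])

atoms = parse_pdb_atoms(toy_pdb)
qm_residues = select_qm_residues(atoms, center_res_name="LIG", cutoff=4.0)
print(qm_residues)
```

In this toy input the histidine lies 2 Å from the ligand and is selected together with it, while the alanine at 10 Å is left to the MM region.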
A second and equally important issue is the problem of time scale and phase space sampling. QM/MM simulations share the picosecond time-scale limitation of any quantum chemical approach and, as such, standard dynamical simulations are insufficient to sample activated processes and enzymatic or catalytic reactions. Free energy sampling techniques such as the Blue Moon Ensemble (BME), metadynamics (MTD) and string methods (SM) are viable tools to overcome this problem, extending the simulations not only in the size of the system but also toward events occurring on the longer time scales typical of biochemical processes. These tools can be efficiently distributed on modern HPC platforms not only within standard parallelization paradigms (MPI), but also through a hierarchy of parallelization levels, from task groups for high-level application-based parallelization to hybrid MPI/OpenMP and GPUs. In this spirit, this School aims to present, through lectures and practical exercises, the state of the art in enhanced-sampling techniques, with special attention to those aspects that applicants to former editions of this school have indicated as most interesting.
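To give a flavor of how a history-dependent bias helps a simulation escape a free energy minimum, here is a minimal toy sketch of metadynamics in Python. It is not a QM/MM calculation and not an input for any of the codes used in the School: it assumes a single walker following overdamped Langevin dynamics on a one-dimensional double-well potential, in reduced units where kT and the friction are 1. The barrier height, Gaussian height and width, deposition stride and time step are all hypothetical values chosen for illustration.

```python
import math
import random


def double_well_force(x, barrier=5.0):
    """Force -dV/dx for V(x) = barrier * (x**2 - 1)**2, in units of kT."""
    return -4.0 * barrier * x * (x * x - 1.0)


def bias_force(x, centers, height, width):
    """Force -dV_bias/dx from the repulsive Gaussians deposited so far."""
    f = 0.0
    for c in centers:
        d = x - c
        f += height * d / width**2 * math.exp(-0.5 * (d / width) ** 2)
    return f


def bias_energy(x, centers, height, width):
    """Accumulated bias potential V_bias(x)."""
    return sum(height * math.exp(-0.5 * ((x - c) / width) ** 2) for c in centers)


def run_metadynamics(n_steps=10000, stride=50, height=0.3, width=0.2,
                     dt=1e-3, x0=-1.0, seed=0):
    """Overdamped Langevin dynamics with a Gaussian deposited every `stride` steps."""
    rng = random.Random(seed)
    x = x0
    centers = []   # positions where Gaussians have been deposited
    traj = []      # visited values of the collective variable
    noise = math.sqrt(2.0 * dt)  # kT = friction = 1
    for step in range(1, n_steps + 1):
        force = double_well_force(x) + bias_force(x, centers, height, width)
        x += force * dt + noise * rng.gauss(0.0, 1.0)
        traj.append(x)
        if step % stride == 0:
            centers.append(x)
    return traj, centers


traj, centers = run_metadynamics()
print(f"Gaussians deposited: {len(centers)}")
print(f"range explored: {min(traj):.2f} to {max(traj):.2f}")
```

The accumulated bias progressively fills the well in which the walker starts and pushes it over the barrier, which plain dynamics at this temperature would cross only rarely. In standard metadynamics the negative of the accumulated bias, here `-bias_energy(x, centers, 0.3, 0.2)`, serves as an estimate of the free energy profile up to an additive constant.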
We also plan to include specific details about the parallelization paradigms mentioned above. This aspect is becoming increasingly important in view of recent developments in HPC facilities (e.g. the Post-K project, http://post-k.cms-initiative.jp/, or the Exascale Computing Project, https://www.exascaleproject.org/). Furthermore, for the sake of completeness, at least a couple of lectures will be given on new methodologies for the full first-principles (density functional theory, DFT) treatment of van der Waals forces, the quantum treatment of the nuclei, and dynamical DFT-based excited states (surface hopping), which are important, for example, in photosynthesis. The practical sessions will be carried out with the codes CPMD [21,23] and CP2K [24,25], which are among the QM/MM codes that support ab initio molecular dynamics.
Mauro Boero (University of Strasbourg-CNRS) - Organiser
Ari Paavo Seitsonen (Ecole Normale Supérieure) - Organiser
Carme Rovira (University of Barcelona) - Organiser
Pablo Campomanes (University of Fribourg) - Organiser