CECAM - Microscopic simulations: forecasting the next two decades

Theoretical, methodological and algorithmic developments require long-term research investments. It is well accepted that developing a simulation code takes 10 to 20 years of effort. As in many other research fields (materials, fluid mechanics, climate modeling, etc.), this also holds for microscopic simulations, where systems such as finite-size nanodrops, polymers/bio-polymers or bulk/vapor interfaces are simulated at the atomic or molecular scale. For instance, the molecular modeling code CHARMM [1] and the quantum chemistry package GAUSSIAN [2] were first developed in the seventies, and both are still among the most popular codes in use today. The popular codes NAMD [3], GROMACS [4] and LAMMPS [5] and the quantum molecular code CPMD [6] are likewise still under development after more than 20 years. Many of the development teams of these codes now comprise at least 20 people (counting developers and contributors), but all were started by very small teams (usually no more than 2 to 3 people). Regarding methods, we must mention the never-ending race to refine force fields (version 36 of the CHARMM force field was proposed in July 2017 [7]) and the long efforts devoted to developing accurate coarse-grained approaches (such as the ongoing improvements to the MARTINI approach [8]). Lastly, we should also mention polarizable force fields, which were proposed and discussed as early as the seventies [9] but have been used intensively only during the last decade.

A priori, developing a theoretical method and writing a simulation code may be considered two distinct, standalone activities. Nevertheless, codes can be developed to allow the use of new theoretical methods that still need further improvement. For instance, we may cite the long-standing challenge of performing reliable and efficient geometry optimization with Quantum Monte Carlo (QMC) approaches [10], or the need to build specific barostats well suited to new multi-scale coarse-grained approaches like that proposed in Ref. 11. Moreover, we also have to consider the outstanding increase in available computational resources. In about 15 years, the power of a typical computing system available in national centers has increased from dozens of Tflops to hundreds of Pflops, with Exaflops systems on the horizon (see, among others, Ref. 12). While the computing landscape was stable over the last decade, dominated by standard and almost monolithic Intel CPU-based architectures, we are now facing an important evolution. First, more than 56% of the computational power of the fastest supercomputing systems now comes from GPUs [13]. Moreover, new actors are emerging with new architectures based on ARM computational units (see the recent announcement from Fujitsu and RIKEN [12]) and even computing systems specifically devoted to molecular dynamics (the ANTON machine, developed by the D.E. Shaw Research Laboratory [14]). This means that we have to be aware of the forthcoming generation of computing systems in order to propose new theoretical methods and algorithms that are not only interesting but also efficient. This is already particularly challenging on present massively parallel CPU architectures, on which obtaining the highest level of performance from a simulation code is far from obvious, in particular for codes from the microscopic field [15].
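To give a feel for the pace implied by the figures quoted above, the following minimal sketch computes the average annual growth factor and the doubling time of computing power. The endpoints used (30 Tflops and 300 Pflops) are purely illustrative assumptions chosen to stand for "dozens of Tflops" and "hundreds of Pflops"; they are not measured values, and the function name is hypothetical.

```python
import math

TFLOPS = 1e12
PFLOPS = 1e15


def implied_growth(start_flops, end_flops, years):
    """Return (total factor, average annual factor, doubling time in years)
    assuming steady exponential growth between the two endpoints."""
    total = end_flops / start_flops
    annual = total ** (1.0 / years)
    doubling = years * math.log(2) / math.log(total)
    return total, annual, doubling


# Illustrative endpoints only (assumptions, not data from the text):
# "dozens of Tflops" -> 30 Tflops, "hundreds of Pflops" -> 300 Pflops.
start = 30 * TFLOPS
end = 300 * PFLOPS

total, annual, doubling = implied_growth(start, end, years=15)

print(f"total increase : x{total:.0f}")
print(f"annual factor  : x{annual:.2f}")
print(f"doubling time  : {doubling:.2f} years")
```

Under these assumed endpoints, the power of a typical national system roughly doubles every year or so. A method designed today may therefore run, by the time its code matures, on machines several orders of magnitude more powerful than the current ones.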

On the experimental side, the main features and capacities of experimental apparatus also evolve rapidly. A full human genome can be decoded today within about a week by a single team, whereas only 20 years ago it required years of effort by a large international network of research centers. This means that bioinformatics developments today have to account for the “Niagara Falls” of data generated by these new sequencing tools, i.e. for the enormous storage capacity needed for these genetic data and for all the problems tied to analyzing them [16], problems that were far from critical 20 years ago. We may also cite new approaches that are emerging in biology, like the high-throughput screening of proteins by phage display and by droplet microfluidics, which allow one to map sequences to specific protein properties (like binding affinity or catalytic activity) for libraries comprising from 10^5 to 10^6 different proteins. This means that we are able today to experimentally benchmark protein libraries comprising about the same number of proteins as that hosted in the Protein Data Bank [17]. This suggests that new computational techniques, coupling high-throughput docking methods with efficient simulation approaches to further interpret the results of this kind of experiment, could be of great interest.

The problem of analyzing data generated by new experimental techniques is also known in other research fields where microscopic simulations are routinely used. For instance, new experimental techniques (like surface-sensitive photoelectron spectroscopy used in conjunction with the liquid microjet technique [18] or the gaseous ion nanocalorimetry technique [19]) allow one to investigate interface phenomena or to estimate thermodynamic quantities at bulk/vapor interfaces and in gas-phase nanodrops, like the properties of the hydrated electron and of halide anions. The latter two kinds of charged species are pivotal to understanding many processes specific to aerosols, which are known to have a deep impact on atmospheric pollution and hence on global climate, air quality and public health [20], [21], [22]. However, the experimental data generated using these new techniques are still difficult to interpret, leading to intense disagreements even about a property as important as the ability of the water surface to accept protons [23].

We may also cite here the difficulty, for obvious safety reasons, of experimentally investigating the behavior of chemical species pivotal in the nuclear energy field (like heavy ions and metals), in particular in the liquid phase over a large range of temperature and pressure conditions, for instance to study contamination in reactor plant primary circuits. Typical experiments to investigate the latter processes have to be performed over at least a year [24]. Yet an exhaustive data set on ion thermochemistry still has to be acquired so that it can be used by macroscopic chemistry codes like the program PhreeqC [25]. Similarly, we may also cite the example of working fluids for refrigeration. To replace fluids with high global warming potentials, new families of refrigerants [26] are proposed; however, experimental measurements of the thermodynamic and transport properties of new fluid candidates are expensive and time consuming, especially in the case of fluid mixtures with several possible compositions.

Microscopic simulations are considered a promising alternative route not only to interpret experiments but also to complement them (for instance to “feed” data banks on ion thermodynamic properties in the liquid phase). As discussed above, the development of simulation codes and of theoretical methods cannot be considered as standalone activities. They have to be driven by the need to complement experiments, to match potential new needs and to be well suited to the forthcoming high-performance computing systems, in order to reach the highest level of efficiency when performing simulations. It is thus pivotal to anticipate as soon as possible (and as far into the future as possible) what the new potential needs and the computational state of the art will be in the forthcoming decades, in order to initiate the development of a new generation of codes and theoretical methods that will be used by large communities, from basic research to industry.

Location: University Paul Sabatier, Toulouse, France

Organisers

References