CECAM - A roadmap for an atomistic machine learning software ecosystem

The learning of accurate and efficient atomic interaction potentials from quantum mechanical calculations was one of the first applications of machine learning in the physical sciences [1], and it is arguably one of the most successful. Over the last 15 years, the field of machine-learned interaction potentials (MLIPs) has evolved significantly. Many different models have been proposed and applied to a broad set of systems in materials science, chemistry, and biophysics [2-4], and the approach has expanded beyond interatomic potentials to include any quantity accessible to electronic-structure calculations. The mathematical and conceptual framework underlying ML architectures for atomistic simulations is now relatively well understood [5-7], and different models share many common features. For instance, equivariance and message passing between local atomic environments are recurring themes, even though schemes that incorporate long-range physics [8,9] and unrestricted models that relax some of the symmetry constraints are also being actively developed [10].
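To make the shared ingredients concrete, below is a minimal sketch of message passing over local atomic environments, written in Python with NumPy. It is a toy model and not any published architecture. The function names, the cosine cutoff, the tanh update and the use of raw atomic numbers as initial features are all illustrative assumptions. It shows only the simplest, invariant case. Because the model sees atoms only through interatomic distances, its output does not change under rotations, reflections, translations, or relabelling of atoms. Equivariant models go further and also propagate vector and tensor features that rotate with the structure.

```python
import numpy as np

def cutoff_fn(r, r_cut):
    # Smooth cosine cutoff (illustrative choice): 1 at r=0, 0 at and beyond r_cut
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)

def message_passing_energy(positions, features, r_cut=3.0, n_layers=2):
    """Toy invariant message passing (hypothetical, for illustration only).

    positions: (N, 3) Cartesian coordinates
    features:  (N,) initial per-atom scalars, here simply atomic numbers
    """
    # Pairwise distances are unchanged by rotations, reflections and translations
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Neighbour weights define each atom's local environment (no self-interaction)
    weights = cutoff_fn(dist, r_cut)
    np.fill_diagonal(weights, 0.0)
    h = features.astype(float)
    for _ in range(n_layers):
        # Each atom sums messages from neighbours inside the cutoff sphere
        messages = weights @ np.tanh(h)
        h = h + messages
    # Total energy as a sum of atomic contributions
    return float(np.sum(h ** 2))

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
z = np.array([1, 6, 1, 8, 1])
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
e1 = message_passing_energy(pos, z)
e2 = message_passing_energy(pos @ q.T, z)
print(np.isclose(e1, e2))  # True
```

Using only distances is the easiest way to guarantee invariance. It is also the restriction that equivariant and "unrestricted" architectures relax, in different directions.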

Despite these common ideas, the software ecosystem is currently very fragmented. Each model usually comes with its own monolithic implementation, and there is little shared infrastructure, although some efforts have recently appeared that aim to provide basic functionality in a more general setting (e.g. e3nn [11], dscribe [12], sphericart [13]). This is due in part to the very fast development of the field, in part to the fact that rapid prototyping is possible with general-purpose ML libraries such as PyTorch and JAX, and in part to the difficulty of producing efficient domain-specific libraries that can fully exploit accelerated hardware platforms.

Another open question is how to provide easy-to-use interfaces between the ML core and different types of traditional modelling software. Most machine-learning interatomic potentials provide a library interface through which molecular dynamics codes such as LAMMPS or OpenMM can call them, but this requires substantial effort for each target code. The situation is even more problematic for properties such as electronic densities or Hamiltonians, whose prediction involves interfacing with quantum chemistry software. These quantities have a considerably more complicated structure, and the machine learning task is more intimately coupled to technical choices in the host code, such as the basis set used to discretize the electronic wavefunction [14-16].
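One way to picture a shared interface is a single small contract that every model implements and every host code consumes. The sketch below is hypothetical and is not the API of LAMMPS, OpenMM, or any existing package. `AtomisticModel`, `ToyPairModel` and `velocity_verlet_step` are illustrative names. A harmonic pair potential stands in for a trained MLIP.

```python
from typing import Protocol, Tuple
import numpy as np

class AtomisticModel(Protocol):
    # Hypothetical minimal contract between an ML model and a host simulation code
    def compute(self, positions: np.ndarray, species: np.ndarray) -> Tuple[float, np.ndarray]:
        """Return (energy, forces) for an (N, 3) array of positions."""
        ...

class ToyPairModel:
    # Stand-in for a trained MLIP: E = k/2 * sum_{i<j} (r_ij - r0)^2
    def __init__(self, k: float = 1.0, r0: float = 1.0):
        self.k, self.r0 = k, r0

    def compute(self, positions, species):
        diff = positions[:, None, :] - positions[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        iu = np.triu_indices(len(positions), k=1)
        energy = 0.5 * self.k * np.sum((dist[iu] - self.r0) ** 2)
        # dE/dr_ij projected on the pair direction; diagonal masked to avoid 0/0
        safe = np.where(dist > 0, dist, 1.0)
        coeff = self.k * (dist - self.r0) / safe
        np.fill_diagonal(coeff, 0.0)
        forces = -np.sum(coeff[:, :, None] * diff, axis=1)
        return float(energy), forces

def velocity_verlet_step(model: AtomisticModel, positions, velocities, species, masses, dt):
    # The host integrator sees only the interface, never the model internals
    _, f = model.compute(positions, species)
    v_half = velocities + 0.5 * dt * f / masses[:, None]
    new_pos = positions + dt * v_half
    _, f_new = model.compute(new_pos, species)
    new_vel = v_half + 0.5 * dt * f_new / masses[:, None]
    return new_pos, new_vel

model = ToyPairModel(k=1.0, r0=1.0)
species = np.array([1, 1, 1, 1])
masses = np.ones(4)
pos = np.array([[0.0, 0.0, 0.0], [1.3, 0.0, 0.0], [0.0, 1.2, 0.0], [0.0, 0.0, 0.9]])
vel = np.zeros_like(pos)
e0, _ = model.compute(pos, species)
for _ in range(200):
    pos, vel = velocity_verlet_step(model, pos, vel, species, masses, dt=1e-3)
e_pot, _ = model.compute(pos, species)
e_total = e_pot + 0.5 * np.sum(masses[:, None] * vel ** 2)
```

Keeping the contract this small means each host code needs only one adapter for every model that implements it. A realistic interface would also have to specify units, periodic cells, neighbour lists and, for electronic-structure targets, the basis-set conventions discussed above.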


Location: CECAM-HQ-EPFL, Lausanne, Switzerland

Organisers

References