Automating atomistic machine learning
Location: CECAM-HQ-EPFL, Lausanne, Switzerland
Machine-learned interatomic potentials (MLIPs) have now become a standard approach in computational materials modelling. Early work in the field relied on hand-crafted models, carefully chosen heuristics, and substantial individual experience, often gained through trial and error. To move to the next stage and make MLIPs the genuine default in materials modelling, they must become accessible to, and usable by, a much wider audience. We view automation and automated workflows as a key enabler in this regard.
Until recently, the main reasons for automating tasks in computational materials science have been to facilitate high-throughput computations, to enable complex DFT-based workflows without customised scripts, and to ensure reproducibility and data provenance of DFT-based simulations. In this context, many automation frameworks have been developed within the community, each with its own focus. Only recently have steps been taken to integrate MLIPs into many of these frameworks. Training and applying MLIPs pose scientific and technical challenges that differ from those of DFT-based simulations. For example, large, complex workflows must be written and executed for iterative or active learning to optimise data generation, and GPUs must be integrated into the training process. Additionally, MLIPs can generate substantial amounts of data, posing further infrastructure challenges. Automation frameworks must be made ready for these challenges.
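The iterative (active-learning) workflows mentioned above can be sketched in a few lines. The following is a deliberately toy, self-contained illustration, not the API of any real MLIP framework: a one-dimensional function stands in for an expensive first-principles calculation, a bootstrap committee of polynomial fits stands in for an ensemble of potentials, and the committee's disagreement is used to decide which candidate configuration to label next.

```python
import numpy as np

def reference_energy(x):
    # Stand-in for an expensive first-principles (e.g. DFT) calculation:
    # a toy one-dimensional potential-energy surface.
    return np.sin(3 * x) + 0.5 * x**2

def fit_committee(X, y, n_models=4, degree=3, seed=0):
    # Fit a small committee of polynomial models on bootstrap resamples;
    # the spread of their predictions serves as an uncertainty estimate.
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))
        models.append(np.polyfit(X[idx], y[idx], degree))
    return models

def predict(models, x):
    # Return the committee mean and standard deviation at points x.
    preds = np.array([np.polyval(m, x) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)

# Active-learning loop: start from a few labelled configurations and
# repeatedly label the candidate the committee is most uncertain about.
rng = np.random.default_rng(0)
pool = np.linspace(-2.0, 2.0, 200)            # candidate configurations (here: scalars)
train_x = rng.choice(pool, 8, replace=False)  # initial training set
train_y = reference_energy(train_x)

for iteration in range(10):
    models = fit_committee(train_x, train_y, seed=iteration)
    _, std = predict(models, pool)
    pick = pool[np.argmax(std)]               # most uncertain candidate
    train_x = np.append(train_x, pick)
    train_y = np.append(train_y, reference_energy(pick))
```

In a production workflow, each of these steps (labelling, retraining, uncertainty-based selection) would be a managed task with its own provenance record, which is exactly where automation frameworks come in.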
MLIPs have traditionally relied on three core components: the data used to train the model, the representation of atomic environments (“descriptors”), and the regression task itself, which yields a mathematical model of the potential-energy surface of a given material. In recent years, descriptors and fitting frameworks have advanced substantially, exemplified by the atomic cluster expansion (ACE) and message-passing graph networks built upon it. The selection of training data has often been based, at least in part, on domain knowledge or active learning. It is interesting to explore how those selection steps can be automated.
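The regression component can be made concrete with a minimal sketch. Here, random feature vectors stand in for real descriptors (such as ACE basis functions), and ridge regression maps them to per-environment energies; this is an assumption-laden toy, not the fitting procedure of any particular MLIP code.

```python
import numpy as np

# Toy "descriptors": random feature vectors for a set of atomic environments,
# with energies generated from a hidden linear model plus small noise.
rng = np.random.default_rng(42)
n_env, n_feat = 50, 8
X = rng.normal(size=(n_env, n_feat))             # descriptor matrix
w_true = rng.normal(size=n_feat)                 # hidden "true" coefficients
E = X @ w_true + 0.01 * rng.normal(size=n_env)   # per-environment energies

# Ridge regression: solve (X^T X + lam * I) w = X^T E for the coefficients w.
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ E)

# Training error of the fitted linear potential model.
rmse = np.sqrt(np.mean((X @ w - E) ** 2))
```

Linear models of this form underlie several MLIP approaches; the regularisation strength `lam`, like every other choice here, is a hyperparameter that automated workflows could select systematically.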
Key questions, which will be refined with input from all invited speakers before the workshop and will serve as a starting point for discussion, include:
- How to automate atomistic ML? Which parts of the MLIP development process, for example, can and should be automated? How can automation accelerate the application of MLIPs in the wider field? What about other aspects of atomistic ML (for example, spectroscopy prediction)? Are there aspects that should not be automated, and instead remain under the user's direct control?
- How to interface with existing automation infrastructure? How do we exploit synergies with the more established "automation-for-DFT" community while ensuring that the ideas we develop are not tied to any one infrastructure?
- What are new methodological horizons? How does this type of automation need to change and adapt in light of the most recent developments in the field, for example, pre-trained atomistic foundation models, for which the focus shifts from fitting "from scratch" to data-efficient fine-tuning? Can emerging AI agents accelerate these automation tasks further?
- What are new applications? What future scientific insights can the automation of atomistic ML methods enable that were not possible before?
Organisers
Gian-Marco Rignanese (Université catholique de Louvain) - Organiser
Janine George (BAM Berlin, University of Jena, Germany) - Organiser
Volker Deringer (Oxford University, United Kingdom) - Organiser
