Uncertainty quantification in atomistic modeling: From uncertainty-aware density functional theory to machine learning
Location: CECAM-HQ-EPFL, Lausanne, Switzerland
Organisers
Genevieve Dusson (CNRS & Université Bourgogne Franche-Comté) - Organiser
Julia Maria Westermayr (Leipzig University) - Organiser
Federico Grasselli (University of Modena and Reggio Emilia) - Organiser
Sanggyu Chong (EPFL) - Organiser
Michael Herbst (EPFL) - Organiser
Uncertainty quantification (UQ) is a standard, widespread practice in the experimental sciences. However, rigorous uncertainty analysis in atomistic modeling, from density functional theory (DFT) calculations to machine learning (ML) models trained on DFT results, remains relatively underdeveloped, so that scientific results in these fields frequently come without any uncertainty or error quantification. This poses a significant challenge for innovation and progress in materials science, especially given the crucial role of multiscale numerical simulations in the modern chemical and physical research communities.
In conducting DFT calculations, numerical parameters such as basis set sizes, energy tolerances, convergence criteria, and many other preconditioning parameters need careful selection. Typically, these parameters are chosen heuristically, especially in high-throughput contexts, which can lead to inconsistent and unsystematic errors that make the resulting data hard to compare. Error-balancing strategies can improve parameter tuning in DFT simulations [1-5], but comprehensive error bounds for generic chemistry codes and fully integrated models are still lacking.
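As a minimal sketch of the kind of convergence testing underlying such error estimates, the toy example below scans a basis-set cutoff and reports the estimated discretization error relative to the most converged calculation. The function toy_total_energy is a hypothetical stand-in with an assumed exponential convergence profile; a real study would call an actual electronic-structure code.

```python
import numpy as np

def toy_total_energy(cutoff_eV):
    """Hypothetical stand-in for a DFT total-energy call.

    Mimics the roughly exponential convergence of the total energy
    with the plane-wave cutoff; the constants are fictitious.
    """
    E_converged = -152.37  # eV, fictitious fully converged energy
    return E_converged + 4.2 * np.exp(-cutoff_eV / 120.0)

cutoffs = np.arange(200, 1001, 100)  # eV
energies = np.array([toy_total_energy(c) for c in cutoffs])

# Use the most converged calculation as the reference and report the
# estimated discretization error at every lower cutoff.
reference = energies[-1]
for c, e in zip(cutoffs, energies):
    print(f"cutoff = {c:4d} eV   E = {e:9.4f} eV   |error| ~ {abs(e - reference):.2e} eV")
```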
The recent growth of data-driven atomistic ML models has extended materials modeling at ab initio accuracy beyond the conventionally accessible length and time scales. Atomistic ML models can be constructed to include uncertainty estimates, enabling error propagation all the way up to the physical observables [10]. For instance, Gaussian-process-based approaches and related classical ML methods (linear regression, kernel ridge regression, including their sparse versions) are naturally endowed with an estimate of the variance of the prediction for a given input [6]. Ensembles of ML models are another viable technique for estimating and propagating the uncertainty of ML models [16].
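As a concrete illustration of this built-in variance estimate, the sketch below fits a Gaussian process to a one-dimensional toy dataset with scikit-learn and queries the predictive standard deviation alongside the mean. The data, kernel, and hyperparameters are arbitrary placeholders, not choices tied to any particular atomistic application.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy 1D dataset standing in for, e.g., energies along some coordinate.
X_train = rng.uniform(0.0, 10.0, size=(30, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(30)

# RBF kernel plus a white-noise term for the observational noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# The GP returns both a mean prediction and its standard deviation,
# which can then be propagated to derived observables.
X_test = np.linspace(0.0, 12.0, 5).reshape(-1, 1)
mean, std = gpr.predict(X_test, return_std=True)
for x, m, s in zip(X_test.ravel(), mean, std):
    print(f"x = {x:5.2f}   prediction = {m:+.3f} ± {s:.3f}")
```

Note that the predicted standard deviation grows for the test points outside the training range, which is exactly the behavior exploited when such estimates are propagated or used to flag unreliable predictions.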
Although UQ for deep-neural-network-based models poses a challenge, recent work has shown that uncertainty estimates can be obtained by adopting Bayesian methods, such as Monte Carlo dropout [14] and the Gaussian-process interpretation of wide neural networks [11], as well as Gaussian mixture models [15] or other uncertainty surrogates [16]. More recently, cheap and reliable methods have been devised for deep neural networks based on the combination of the Laplace approximation and a last-layer approximation [7,8,12,13].
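A minimal sketch of the Monte Carlo dropout idea in the sense of [14] is shown below: dropout is kept active at inference time, and the spread over repeated stochastic forward passes serves as the uncertainty estimate. The tiny untrained MLP and random inputs are placeholders only; in practice the model would be a trained interatomic potential or property model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small MLP with dropout; a placeholder for a trained atomistic model.
model = nn.Sequential(
    nn.Linear(8, 64), nn.SiLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 64), nn.SiLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)

x = torch.randn(4, 8)  # placeholder input features

# Keep dropout stochastic at inference (model.train() enables it),
# then collect several forward passes per input.
model.train()
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])

mean = samples.mean(dim=0).squeeze(-1)
std = samples.std(dim=0).squeeze(-1)  # MC-dropout uncertainty estimate
for m, s in zip(mean, std):
    print(f"prediction = {m:+.3f} ± {s:.3f}")
```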
Finally, these on-the-fly ML uncertainty estimates can be leveraged to construct robust training datasets via active-learning strategies [9].
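One simple variance-based acquisition loop is sketched below: at each iteration, the unlabeled candidate with the largest predictive standard deviation is labeled and added to the training set. It reuses the scikit-learn GP from the earlier sketch; the oracle function standing in for an expensive reference calculation, the pool of candidates, and the acquisition rule are all illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def oracle(x):
    """Stand-in for an expensive reference (e.g. DFT) labeling step."""
    return np.sin(x).ravel()

X_pool = np.linspace(0.0, 10.0, 200).reshape(-1, 1)  # unlabeled candidates
X_train = X_pool[[0, -1]]                            # tiny initial dataset
y_train = oracle(X_train)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4)
for step in range(10):
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(X_train, y_train)
    _, std = gpr.predict(X_pool, return_std=True)

    # Acquire the most uncertain candidate and "compute" its label.
    idx = int(np.argmax(std))
    X_train = np.vstack([X_train, X_pool[idx:idx + 1]])
    y_train = np.append(y_train, oracle(X_pool[idx:idx + 1]))
    print(f"step {step}: acquired x = {X_pool[idx, 0]:.2f}, max std = {std[idx]:.3f}")
```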
Even where ML models can quantify the errors arising from model fitting, they critically also inherit the bias and error of the underlying reference method, often DFT, chosen to generate the training data. In this actively evolving research community, the holy grail of UQ would thus be a comprehensive approach that links the errors of the DFT calculations to those stemming from the statistical inference of atomistic ML models.
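There is no settled recipe for this end-to-end propagation, but as a deliberately simplistic illustration, the sketch below combines a DFT-level error estimate and an ML predictive uncertainty in quadrature, under the strong (and generally unverified) assumption that the two error sources are independent and Gaussian. The per-structure numbers are entirely fictitious.

```python
import numpy as np

# Hypothetical per-structure error estimates (eV), e.g. from a DFT
# error-balancing analysis and from a GP or ensemble ML model.
sigma_dft = np.array([0.020, 0.015, 0.030])
sigma_ml = np.array([0.010, 0.040, 0.010])

# Independence assumption: variances add, standard deviations do not.
sigma_total = np.sqrt(sigma_dft**2 + sigma_ml**2)
for d, m, t in zip(sigma_dft, sigma_ml, sigma_total):
    print(f"sigma_DFT = {d:.3f}  sigma_ML = {m:.3f}  ->  sigma_total = {t:.3f} eV")
```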
References