Thinking outside the box - beyond machine learning for quantum chemistry
Location: CECAM-DE-MM1P
Organisers
The field of machine learning (ML) is already having a rapid and substantial impact at the interfaces of the traditional disciplines of chemistry, physics, biology and materials science. Its ability to learn from existing examples and rapidly make meaningful predictions for new cases offers a new way to screen wide ranges of structures and to estimate the results of highly accurate methods at much reduced cost. However, several issues require careful thought when deploying these tools. Firstly, the reproducibility of model training is a topic of active debate that is receiving substantial attention, and within the last year calls for more physically based approaches have begun to appear. Secondly, the explainability and explicability of the predictions matter, particularly for some of the more powerful ML methods. Finally, models are difficult to extend additively: learning new cases tends to overwrite existing expertise, and predicting properties and responses outside the scope of the original model is usually not possible.
A counterpoint to these methods is the experience of the past 20 years with approximate quantum mechanical methods [1], which are now an essential part of the computational toolbox for a solid atomistic understanding of a broad range of physical, chemical and biological problems, for both large and challenging systems. These methods are parameterized, but can provide a clear physical understanding of complex structures and processes. Additionally, they can readily be extended to properties and systems outside their original parameterization and fitting sets. However, this commonly comes at the cost of substantial human effort to parameterize and test these models, which offers considerable opportunities for ML.
DFTB
The density-functional tight-binding (DFTB) approach is available as modular components within academic and/or commercial software products, including DFTB+ [19], ADF [20], ATK [21], deMon [22], Gaussian [23] and Materials Studio [24], as well as several MM force-field tools, e.g. CHARMM [25]. This considerably widens the reach of the method to potential users, both in academia and in industrial R&D. Overviews of the range of DFTB developments and extensions can be found in the special issues of the Journal of Physical Chemistry A 111, Number 26 (2007) and Physica Status Solidi (b) 249, Issue 2 (2012).
The most recent DFTB developers' meeting was held in November 2016 to report on and discuss the present status of DFTB developments in the different software products, and to join forces for further improvements in accuracy, parameterization of new systems and extensions of functionality.
Trends in Machine Learning
The Journal of Chemical Physics has recently hosted a special issue on "Data-enabled theoretical chemistry", which provides a comprehensive contemporary view of the field with over 40 contributions from leading scientists actively working on the integration of modern machine learning techniques into quantum chemistry [26]. The issue was motivated by preceding successes in the field, such as the systematic fitting of potential energy surfaces for molecular dynamics simulations and vibrational spectroscopy [27,28]. As also reviewed recently [29], laws of physics have been rediscovered with ML [30], atomization energies and other electronic ground-state properties of organic molecules can now be predicted with hybrid DFT accuracy [31], clusters can be identified [32] and compounds mapped [33]. ML can also be used to discover new molecules [34] or crystals [35], and even new reactions [36]. Various properties and systems have been studied with ML, including electrons [37], chemical potentials [38], ionic forces [39] and NMR shifts [40]. By now, neural networks and Gaussian processes have demonstrably surpassed DFT accuracy in the prediction of electronic ground-state properties of organic materials [41]. Efforts to further improve ML models and assess their applicability throughout compositional space are ongoing [42]. When it comes to the improvement of well-established QM methods, however, ML-based investigations, such as Ref. [43], remain sparse.
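To make concrete what "predicting ground-state properties from examples" involves, the following is a minimal sketch of kernel ridge regression with a row-norm-sorted Coulomb-matrix descriptor, one widely used baseline for small molecules. It is an illustration under stated assumptions, not the method of any cited work: the Laplacian kernel, the hyperparameters `sigma` and `lam`, the class name `KernelRidge` and the toy H2 data (a made-up Morse-like target, not reference energies) are all illustrative choices.

```python
import numpy as np


def coulomb_matrix(charges, coords, size):
    """Row-norm-sorted Coulomb matrix, zero-padded to size x size and flattened.

    charges: nuclear charges Z_i; coords: Cartesian positions in consistent length units.
    """
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    n = len(Z)
    # Pairwise interatomic distances |R_i - R_j|.
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    # Off-diagonal entries Z_i Z_j / |R_i - R_j|; the zero-distance diagonal gives inf,
    # which is overwritten just below.
    with np.errstate(divide="ignore"):
        M = np.outer(Z, Z) / dist
    # Diagonal entries 0.5 * Z_i^2.4 (a fit to free-atom energies).
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    # Sort atoms by descending row norm so the descriptor does not depend on input order.
    order = np.argsort(-np.linalg.norm(M, axis=1))
    M = M[order][:, order]
    padded = np.zeros((size, size))
    padded[:n, :n] = M
    return padded.ravel()


def laplacian_kernel(A, B, sigma):
    """k(x, x') = exp(-||x - x'||_1 / sigma) for all row pairs of A and B."""
    d = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-d / sigma)


class KernelRidge:
    """Minimal kernel ridge regression (hypothetical helper): solve (K + lam*I) alpha = y."""

    def __init__(self, sigma=1.0, lam=1e-8):
        self.sigma = sigma
        self.lam = lam

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        K = laplacian_kernel(self.X_train, self.X_train, self.sigma)
        rhs = np.asarray(y, dtype=float)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(K)), rhs)
        return self

    def predict(self, X):
        K_new = laplacian_kernel(np.asarray(X, dtype=float), self.X_train, self.sigma)
        return K_new @ self.alpha


if __name__ == "__main__":
    # Synthetic toy data: H2 at several bond lengths with a made-up Morse-like target,
    # NOT real reference energies.
    lengths = np.linspace(0.6, 1.6, 11)
    X = np.array([coulomb_matrix([1, 1], [[0, 0, 0], [r, 0, 0]], size=2) for r in lengths])
    y = (1 - np.exp(-1.9 * (lengths - 0.74))) ** 2
    model = KernelRidge(sigma=1.0, lam=1e-8).fit(X, y)
    print(model.predict(X[:3]), y[:3])
```

Fitting real atomization energies would require a reference data set and cross-validated choices of `sigma` and `lam`. Sorting by row norm is only one simple way to remove the dependence on atom ordering; other descriptors handle this differently.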
Germany
Marcus Elstner (Karlsruhe Institute of Technology) - Organiser
Thomas Frauenheim (University of Bremen) - Organiser
Switzerland
Anatole von Lilienfeld (University of Basel) - Organiser & speaker
United Kingdom
Ben Hourahine (University of Strathclyde) - Organiser & speaker
United States
Benjamin Nebgen (Los Alamos National Laboratory) - Organiser & speaker