CaMML - Chemistry and Materials Machine Learning School
Location: CECAM-UK-DARESBURY
For over fifty years, molecular simulation and atomistic modelling have been instrumental in advancing our understanding and fostering new developments in materials and molecular science. Methods like density functional theory (DFT) and interatomic potentials have enabled the quantitative modelling of system behaviour across a range of length and time scales. However, there has typically been a trade-off between the scale that can be explored and the accuracy of the method used. The recent boom in machine learning (ML) and data-driven strategies is set to bridge this gap, allowing large systems to be simulated over long timescales with high accuracy. As the accuracy of predictions for key materials and molecular properties improves, ML is now being used to design a wide range of molecules and materials, from small-molecule pharmaceuticals to porous frameworks and solid-state battery systems.
The significance of ML in materials and molecular modelling is highlighted by the surge in research within the field. There is a growing demand for researchers skilled in both molecular simulation and data science; however, few current educational programmes provide this combined training. This school aims to fill this gap by providing a strong foundation in ML basics, introducing recent, relevant developments from computer science, and demonstrating the application of these methods to challenges in materials and molecular modelling. The course targets early-career researchers who have a background in materials or molecular science and experience in Python, but limited knowledge of ML.
Course Content:
Introduction to the fundamentals of ML to ensure all participants have a solid grasp of the basic principles of building data-based models.
Neural networks, the foundation of many powerful modern ML techniques, covering architectures and training through backpropagation.
Graph neural networks (GNNs), which have revolutionised both molecular and materials machine learning with their inductive bias for graph-structured data. The course will cover basic GNN architectures and recent examples.
Machine-learned interatomic potentials (MLIPs), including the latest developments and how to generate and fine-tune models for specific applications in both materials and molecules.
Generative models, covering the core concepts and specific examples such as variational autoencoders, generative adversarial networks, and diffusion models.
Bayesian optimisation, covering core concepts and specific examples for materials and molecular science.
On the final day, participants will break into specialised sub-groups to explore the latest developments in ML for:
Materials - foundation model interatomic potentials.
Molecules - de novo design by reinforcement learning.
The syllabus will provide students with a robust understanding of the key statistical concepts and algorithms that form the basis of machine learning. It will then build on this knowledge to showcase the cutting-edge tools used in applied ML for materials and molecular modelling.
Organisers
Keith Butler (University College London) - Organiser
Alin Elena (Science and Technology Facilities Council - Scientific Computing) - Organiser
Alex Ganose (Imperial College London) - Organiser
Ioan-Bogdan Magdau (Newcastle University) - Organiser
Reinhard Maurer (University of Warwick) - Organiser
