Mixed-gen Session 5: Machine Learning
Location: Online meeting - hosted by CECAM-HQ
This is the fifth in a series of online events aimed mainly at PhD students and researchers in their first post-doc. Our goal is to provide a new venue where these young scientists can share their work, get expert feedback and strengthen scientific relations within the CECAM community.
The event is fully online and will have two parts. In the first, broadcast as a Zoom webinar, Prof. Michele Ceriotti (EPFL) will give a general talk on machine learning (title and abstract below). This will be followed by talks from two young members of the community describing their work in the same area. In the second part of the event, we shall move to a virtual poster session hosted in a Gather room, where more PhD students and researchers in their first post-doc will present related projects. The session’s speaker and other (surprise) expert guests will join us for this poster session to discuss exciting new science.
To participate
If you are a PhD student or a post-doc:
Please use the Participate tab on this page to start the application. You will have to log in with your CECAM account to access the application form. If you don't have a CECAM account yet, use the register option in the top right corner of the login page... and welcome to CECAM!
If you are a more senior scientist:
Please contact the organisers and we shall process your registration.
Submission of posters
(Please note that, at least for the time being, we shall accept posters only from PhD students or researchers in their first post-doc.)
After your application has been accepted, you will be able to submit a poster. On the CECAM page for this event, go to the “My participation” tab and click on “Add a poster”, providing in particular the title and abstract in the recommended format. On the same form you can also upload your poster file, in PNG or JPG format, if it is ready. Only these formats are accepted, since they are required to display the poster in the Gather session. If the poster file is not ready when you submit your abstract, you can upload it later by editing your submission (go to the “My participation” tab and click the three vertical dots in the “Actions” column of the “My posters” table). Please upload your poster as soon as possible so that the selection committee can reach a decision (see below).
Please note that posters will remain visible in the Gather room associated with this session until the end of the series (July 2021), unless otherwise requested.
DEADLINE FOR SUBMISSION: TEN DAYS BEFORE THE EVENT
Selection of posters
Posters will be selected by the event organisers with the support of our main speaker and the experts who will take part in the poster session.
Selection of the two talks by PhD students or first-year post-docs
These contributions, to be broadcast in the Zoom webinar during the first part of the event, will be chosen by lottery from the posters selected for the Gather session, after a preliminary screening by the organisers, the main speaker and the guest experts. Please indicate in your application if you DO NOT WANT your poster to be considered for this lottery.
THE DECISION ON THE POSTER AND THE OUTCOME OF THE LOTTERY SELECTION WILL BE COMMUNICATED ONE WEEK BEFORE THE EVENT
POSTER SUBMISSIONS AFTER THE SUBMISSION DEADLINE (TEN DAYS BEFORE THE EVENT) WILL BE ACCEPTED BUT WILL NOT BE CONSIDERED FOR UPGRADE TO A TALK. SUBMISSION WILL CLOSE DEFINITIVELY FOUR DAYS BEFORE THE EVENT.
SESSION 5. Title and abstract of talks
The thin line between physics and data
Michele Ceriotti, EPFL, Lausanne
As it has done with many other fields of science, machine learning has taken molecular and materials modeling by storm. There is virtually no simulation task to which machine-learning techniques have not been applied, usually very successfully.
In this talk I will take a step back and look at the relationship between this inductive, data-driven modeling paradigm and traditional physics-based approaches. I will adopt both a historical and a conceptual perspective. First, I will put the latest wave of machine-learning potentials and models in the context of well-established atomistic simulation techniques. Then, I will discuss the interplay between general-purpose statistical learning ideas and the domain-specific insights that can, and should, be applied to obtain accurate and transferable predictions.
From the description of long-range interactions between atoms and molecules, to the estimation of structural and functional properties of materials in realistic conditions, the future of atomic-scale simulations straddles the line between data and physics-driven modeling.
Atomic Cluster Expansion force fields for organic molecules: evaluation beyond RMSE
Dávid Péter Kovács, University of Cambridge
The efficient simulation of molecules and materials from first principles is a long-standing challenge in the physical sciences. Machine-learned force fields promise to speed up quantum mechanical simulations by several orders of magnitude, whilst maintaining the accuracy of high-level quantum mechanics. In the past three years, several new approaches built on Gaussian process regression and neural networks have been proposed to fulfil this promise. In this work we demonstrate that highly accurate molecular force fields can be built using the Atomic Cluster Expansion (ACE) framework and linear least squares regression. Our model is built from body-ordered symmetric polynomials, which are a natural extension of traditional molecular mechanics force fields. We show that these relatively simple models achieve state-of-the-art accuracy on the MD17 benchmark dataset of small organic molecules. Furthermore, we train several other machine learning models, such as sGDML, ANI and GAP, as well as a classical force field, and compare them on tasks such as normal mode prediction and extrapolation to high-temperature data. Finally, we fit the potential energy surface of a large, flexible organic molecule. The ACE model shows excellent transferability across temperatures, and we compare how well the different models reproduce the complex dihedral torsional energy landscape of the molecule from as few as 500 reference calculations.
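To make the idea of a linear, body-ordered force field more concrete, here is a minimal sketch of fitting energies by linear least squares on body-ordered features. It is not the ACE basis and not the authors' code: the descriptor `body_ordered_features` (a constant term, sums of inverse pair distances and a single triplet term), the helper names, the tetrahedral test geometry and the coefficients used to generate synthetic energies are all hypothetical, chosen only so that the example runs with NumPy.

```python
import itertools
import numpy as np

def body_ordered_features(positions, max_power=3):
    """Hypothetical toy descriptor (NOT the ACE basis).

    Returns rotation- and permutation-invariant features:
    a constant (1-body) term, 2-body sums of r_ij^-p and one 3-body sum.
    """
    n = len(positions)
    # Pairwise distance matrix r[i, j]
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    pairs = list(itertools.combinations(range(n), 2))
    # 2-body terms: sum over pairs of r_ij^(-p), p = 1..max_power
    two_body = [sum(r[i, j] ** (-p) for i, j in pairs)
                for p in range(1, max_power + 1)]
    # 3-body term: sum over triplets of 1 / (r_ij * r_ik * r_jk)
    three_body = sum(1.0 / (r[i, j] * r[i, k] * r[j, k])
                     for i, j, k in itertools.combinations(range(n), 3))
    return np.array([1.0, *two_body, three_body])

def fit_linear_potential(configs, energies):
    """Linear least squares fit of energies onto the features."""
    X = np.stack([body_ordered_features(c) for c in configs])
    coeffs, *_ = np.linalg.lstsq(X, energies, rcond=None)
    return coeffs

def predict(coeffs, config):
    """Energy of one configuration under the fitted linear model."""
    return body_ordered_features(config) @ coeffs

# Usage on synthetic data (no real reference calculations involved)
rng = np.random.default_rng(0)
base = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                 [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
configs = [base + rng.normal(scale=0.1, size=base.shape) for _ in range(50)]
# Made-up "true" coefficients used only to generate target energies
true_coeffs = np.array([-1.0, 0.5, -2.0, 1.0, 0.3])
energies = np.array([body_ordered_features(c) @ true_coeffs for c in configs])
coeffs = fit_linear_potential(configs, energies)
```

Because the model is linear in its coefficients, training reduces to a single least squares solve; a richer body-ordered basis changes the features but not the fitting step.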
Automated identification of collective variables and metastable states from molecular dynamics data
Yasemin Bozkurt Varolgünes, Max Planck Institute for Polymer Research
Extracting insight from the enormous quantity of data generated by molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition into the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. Variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, but this assumption is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics.
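The contrast between a unimodal Gaussian prior and a Gaussian mixture prior can be illustrated without a full autoencoder. The sketch below only evaluates the two priors on hypothetical two-dimensional latent codes drawn from two well-separated "metastable states", and assigns each code to its most probable mixture component. It is not the authors' GMVAE: there is no encoder, decoder or training, and the fixed component means, shared variance and equal weights are assumptions (in a GMVAE these quantities are typically learned or parameterised by the network).

```python
import numpy as np

def log_gaussian(z, mean, var):
    """Row-wise log density of an isotropic Gaussian N(mean, var * I)."""
    d = z.shape[-1]
    return -0.5 * (np.sum((z - mean) ** 2, axis=-1) / var
                   + d * np.log(2 * np.pi * var))

def component_log_joint(z, means, var, weights):
    """log w_k + log N(z; mu_k, var * I) for each component k, shape (n, K)."""
    return np.stack([np.log(w) + log_gaussian(z, m, var)
                     for w, m in zip(weights, means)], axis=-1)

def log_mixture_prior(z, means, var, weights):
    """log p(z) = log sum_k w_k N(z; mu_k, var * I), via stable log-sum-exp."""
    a = component_log_joint(z, means, var, weights)
    amax = a.max(axis=-1, keepdims=True)
    return amax[:, 0] + np.log(np.exp(a - amax).sum(axis=-1))

def assign_states(z, means, var, weights):
    """Hard cluster label: the component k maximising p(k | z)."""
    return component_log_joint(z, means, var, weights).argmax(axis=-1)

# Hypothetical latent codes from two separated basins (synthetic data)
rng = np.random.default_rng(1)
means = [np.array([-3.0, 0.0]), np.array([3.0, 0.0])]
z = np.concatenate([rng.normal(m, 0.5, size=(200, 2)) for m in means])
labels_true = np.repeat([0, 1], 200)
weights = [0.5, 0.5]

# Average log prior under the mixture vs. a standard normal prior
lp_mix = log_mixture_prior(z, means, 0.25, weights).mean()
lp_std = log_gaussian(z, np.zeros(2), 1.0).mean()
labels = assign_states(z, means, 0.25, weights)
```

On such multi-basin data the mixture prior assigns a higher average log density than a single standard normal, and the component responsibilities provide the clustering directly, which is the intuition behind using a mixture prior for metastable states.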
Sara Bonella (CECAM HQ) - Organiser
Ignacio Pagonabarraga (CECAM HQ) - Organiser