Physics-aware machine learning for molecules and materials
Location: Cornell Tech campus on Roosevelt Island, New York City, New York, USA
Machine learning models are now a routine workhorse for atomistic simulation and molecular design. In recent years, these approaches have shifted from purely data-driven fitting engines to physics-aware, interpretable, and uncertainty-calibrated tools. However, a central tension remains: when (and how) should we hardwire physical knowledge, and when should we let the data speak for itself? This workshop confronts this “bitter lesson”: the best-performing models sometimes emerge when we let them learn the physics from the data, yet explicit physics is often necessary for interpretability, generalization, and scientific usefulness.
One domain where these advances have been especially prevalent is the development of machine-learning interatomic potentials (MLIPs). Hard-coded symmetry and conservation constraints are integrated with graph neural networks[1] to predict interatomic forces at first-principles accuracy while retaining linear scaling, with recent models incorporating explicit physics-based treatments of long-range electrostatics.[2] However, many MLIP frameworks still omit key physical phenomena, such as magnetization, spin, charge, and electron transport, that are critical for accurately modeling important problems in catalysis, energy materials, and quantum matter. Incorporating these effects without sacrificing computational efficiency or scalability remains an open challenge.
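To make the division of labor concrete, a minimal sketch of the additive form many physics-aware MLIPs take is shown below: a learned short-range term (here a placeholder for a trained graph neural network) plus an explicitly computed long-range Coulomb term. All names (`coulomb_energy`, `total_energy`, `short_range_model`) are illustrative assumptions, not any particular package's API, and the Coulomb sum uses Gaussian units with no cutoff or Ewald treatment.

```python
import numpy as np

def coulomb_energy(positions, charges):
    """Explicit pairwise Coulomb sum (Gaussian units, no cutoff)."""
    e = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            e += charges[i] * charges[j] / r
    return e

def total_energy(positions, charges, short_range_model):
    # Physics-aware split: learned short-range term + analytic long-range term.
    return short_range_model(positions) + coulomb_energy(positions, charges)

# Toy system: opposite unit charges 2.0 apart; a zero stand-in for the GNN term.
pos = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
q = np.array([1.0, -1.0])
print(total_energy(pos, q, short_range_model=lambda p: 0.0))  # -0.5
```

The design point is that the learned component only has to capture what the explicit physics does not, which is the trade-off the workshop's central question interrogates.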
In parallel, there has been growing attention to interpretability in both molecular property prediction and MLIP models, achieved by incorporating energy decomposition,[2] feature attribution,[3] and counterfactual explanations[4] tailored to molecular graphs. These self-interpretable architectures expose latent functional-group contributions and enable domain experts to verify that models “get the right answer for the right reason.”
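A toy sketch of the energy-decomposition idea: the model outputs per-atom contributions whose sum is the molecular energy, so individual atoms or functional groups can be attributed a share of the prediction. The lookup table here is a deliberately trivial stand-in for a learned per-atom readout; the function name and values are illustrative assumptions.

```python
def per_atom_energies(atomic_numbers):
    # Stand-in for a learned per-atom readout; toy values by element (Ha).
    table = {1: -0.5, 6: -37.8, 8: -75.0}
    return [table[z] for z in atomic_numbers]

# Water: O, H, H. The total is the sum of interpretable atomic contributions.
contribs = per_atom_energies([8, 1, 1])
total = sum(contribs)                    # molecular energy
oh_group = contribs[0] + contribs[1]     # attribution to an O-H substructure
print(total)  # -76.0
```

Because attribution is just a partial sum over atom indices, the same decomposition can be read out at the atom, functional-group, or fragment scale, which is exactly the granularity question raised below.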
Lastly, generalizability and calibrated uncertainty have become important yardsticks for the interpretation and reliability of ML model predictions, especially outside their training distributions.[5] Techniques such as evidential losses, lightweight ensemble approaches, and directed-message-passing UQ layers now yield reliable, low-cost error bars, enabling active learning, chemical extrapolation, and foundation-model fine-tuning for niche applications.[6,7]
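Of these techniques, the ensemble approach is the simplest to sketch: fit several models on bootstrap resamples of the data and use the spread of their predictions as an error bar. The example below does this with a linear fit in place of a real MLIP; the setup (seed, noise level, ensemble size) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(x, y):
    # Stand-in for training one ensemble member; returns a predictor.
    a, b = np.polyfit(x, y, 1)
    return lambda xq: a * xq + b

# Synthetic data: y = 2x with small noise.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(0.0, 0.05, x.size)

# Bootstrap ensemble: each member sees a resampled dataset.
ensemble = []
for _ in range(5):
    idx = rng.integers(0, x.size, x.size)
    ensemble.append(fit_linear(x[idx], y[idx]))

preds = np.array([m(0.5) for m in ensemble])
mean, std = preds.mean(), preds.std()  # prediction and its error bar
```

The ensemble spread grows for queries far from the training data, which is what makes this style of UQ usable as an acquisition signal in active learning.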
Key Questions to be Tackled
- Where is the boundary between learning physics from data and explicitly encoding it, and how do we decide when physical inductive bias is essential versus limiting?
- What important physical effects (e.g. magnetism, spin, electron transport) are missing from current MLIPs, and how can they be incorporated effectively?
- Can foundation-model fine-tuning be guided by well-calibrated UQ to avoid overconfidence on niche chemistries?
- At what scale (atom, functional group, many-body term) does an explanation become most actionable for experimental chemists, and how can models adapt that scale to the task?
Collectively, these questions reflect a shift in the field from increasing predictive accuracy to ensuring models are physically grounded and aligned with meaningful scientific challenges. However, these research themes of physics integration, interpretability, generalizability, and UQ still evolve largely in isolation. This workshop will cross-fertilize ideas, benchmark competing strategies on common tasks, and chart a roadmap for physically driven, trustworthy molecular ML. It will advance rigorous and foundational science at a critical moment when fast-moving ML developments risk outpacing physical insight and methodological grounding.
Organisers
Camille Bilodeau (University of Virginia)
Stefano Martiniani (New York University)
Andrew Rosen (Princeton University)
Shuwen Yue (Cornell University)
Juan J. de Pablo (University of Chicago)
References
