From Machine-Learning Theory to Driven Complex Systems and back
Location: CECAM-HQ-EPFL, Lausanne, Switzerland
Organisers
Valentina Ros (CNRS and Université Paris-Saclay, France) - Organiser
Beatriz Seoane (Université Paris-Saclay, France) - Organiser
Aurélien Decelle (Universidad Complutense de Madrid, Spain) - Organiser
Elisabeth Agoritsas (University of Geneva, Switzerland) - Organiser
Registration deadline for abstracts: February 19, 2024 (please submit your abstract in the "motivation" section of the registration form).
Notification of accepted participants and talks: March 4, 2024.
In this workshop, we propose to gather researchers with complementary backgrounds, all involved in cutting-edge research in statistical physics, machine learning and statistical inference. The goal of this workshop is to strengthen the links between machine learning (ML), disordered systems and driven complex systems - such as structural glasses and dense active matter - to mutually exploit their theoretical and computational tools as well as their physical intuition. Our main focus will be on stochastic dynamical processes, out-of-equilibrium regimes and their insights into training dynamics, primarily from a computational perspective. In addition to deepening our theoretical understanding of the successes and limitations of ML, these connections will pave the way for the development of new algorithms and suggest alternative architectures.
We plan to address specifically the following topics:
- Dynamical Mean-Field Theory (DMFT)
- Generative neural networks for modeling
- Phase diagrams, landscapes and training optimization
Machine learning (ML) has become ubiquitous over the last decade. Many everyday tasks can now be accomplished with ML-assisted tools, such as ChatGPT as a writing assistant, Copilot as a programming assistant, or image-generating models for art and design. Owing to its strong impact on both industry and fundamental science, ML has become an extremely active research area, fostering lively exchanges between practitioners and theorists from very diverse communities. Its great success calls for a deeper theoretical understanding and for the integration of complementary expertise to address its many challenges [1,2,3].
Training an ML model with a particular architecture on a dataset amounts to evolving its parameters in a complex high-dimensional landscape defined by a given loss function. The main questions that arise are: (i) how the statistical properties of the landscape depend on the architecture and on the dataset statistics, (ii) what performance standard optimization algorithms, such as stochastic gradient descent, achieve on it, (iii) how these algorithms can be improved in terms of generalization and computational efficiency, and (iv) what impact the dataset statistics have on the learning process. In terms of modeling, a particular challenge is to design correlated artificial datasets with which the learning dynamics can be studied in a controlled manner. Moreover, important insights into both the learning process and practical applications should come from the interpretability of the learned parameters.
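To make this setting concrete, the following is a minimal illustrative sketch (not part of the workshop material) of such a controlled experiment: stochastic gradient descent on a synthetic teacher-student regression task. The data statistics, architecture and hyperparameters below are assumptions chosen for simplicity, not models from the references.

```python
# Minimal sketch: SGD on a synthetic teacher-student regression task (illustrative only).
# The data statistics (i.i.d. Gaussian inputs) and the quadratic loss are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, n = 100, 1000                             # input dimension, number of training samples
teacher = rng.normal(size=d) / np.sqrt(d)    # ground-truth weights generating the labels
X = rng.normal(size=(n, d))                  # i.i.d. Gaussian dataset
y = X @ teacher + 0.1 * rng.normal(size=n)   # noisy labels

w = np.zeros(d)                              # student weights, initialised at the origin
lr, batch, epochs = 0.05, 20, 50

for epoch in range(epochs):
    perm = rng.permutation(n)
    for start in range(0, n, batch):
        idx = perm[start:start + batch]
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch   # gradient of the quadratic loss
        w -= lr * grad                                    # SGD update
    train_loss = 0.5 * np.mean((X @ w - y) ** 2)
    if epoch % 10 == 0:
        print(f"epoch {epoch:3d}  train loss {train_loss:.4f}")
```

In such a toy setup one can directly vary the dataset statistics (correlations, noise level) and the batch size, and observe their effect on the training trajectory.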
While deep neural networks clearly handle increasingly complicated tasks, understanding how the formation of complex patterns relates to the statistical properties of the dataset is highly non-trivial, even for relatively simple architectures. Recent advances include the construction of novel loss functions aimed at accelerating learning [4], the development of synthetic datasets complex enough to mimic real datasets [5], and investigations of the structure of the underlying complex landscapes [6]. In parallel, out-of-equilibrium physics has proven particularly useful for developing and controlling powerful generative models that can fully describe the variety of complex datasets [7,8,9,10]. These studies have had a strong impact at the computational level. However, the development of new algorithms is often guided by intuitions about specific properties of the loss function, and a comprehensive understanding of the learning dynamics is still lacking.
Recent efforts in this direction rely on studying Langevin equations associated with simple models. The formalism of dynamical mean-field theory (DMFT), developed in the context of statistical physics to study the out-of-equilibrium dynamics of structural glasses [8,11] and even dense active systems [12], has been adapted to inference and ML models [13,14]. Its numerical implementation poses challenges that must be overcome to fully exploit it for improving the training process [15,16].
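As a purely illustrative companion to this discussion, the sketch below integrates a discretised Langevin equation for an N-dimensional toy model with random symmetric couplings and records the two-time correlation function, the central observable that DMFT describes in the large-N limit. The specific potential and parameters are assumptions made for the example, not the models studied in the references.

```python
# Minimal sketch: Euler-Maruyama discretisation of Langevin dynamics for an N-dimensional
# toy potential, tracking the two-time correlation C(t, t') = <x(t).x(t')>/N.
# The potential below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)

N, T, dt, steps = 500, 0.5, 0.01, 2000
J = rng.normal(size=(N, N)) / np.sqrt(N)
J = 0.5 * (J + J.T)                      # symmetric random couplings (toy disorder)

def force(x):
    # -grad of V(x) = -0.5 x.J.x + |x|^4 / (4N); the quartic term keeps x bounded
    return J @ x - (x @ x / N) * x

x = rng.normal(size=N)
snapshots = {}
for t in range(steps):
    noise = rng.normal(size=N)
    x = x + dt * force(x) + np.sqrt(2 * T * dt) * noise   # Euler-Maruyama step
    if t in (200, 500, 1000, 1999):
        snapshots[t] = x.copy()

# Two-time correlations between stored configurations
times = sorted(snapshots)
for i, t1 in enumerate(times):
    for t2 in times[i:]:
        C = snapshots[t1] @ snapshots[t2] / N
        print(f"C({t1}, {t2}) = {C:.3f}")
```

Comparing such numerically measured correlation (and response) functions with their DMFT predictions is one way the two levels of description can be confronted in practice.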
Finally, generative neural networks have great potential for modeling complex data. One approach is energy-based modeling, in which the probability distribution is represented by a Boltzmann distribution with a neural network as the energy function. Interpreting the trained neural network as a disordered interaction Hamiltonian is a powerful tool for inference applications [17,18]. However, further research is needed to understand how the dataset patterns are encoded in the model, and how to interpret them.
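To illustrate the energy-based viewpoint with an assumed toy setup (not the specific models of Refs. [17,18]), the sketch below trains a small restricted Boltzmann machine with one step of contrastive divergence on synthetic binary data; the learned coupling matrix can be read as the disordered interaction Hamiltonian mentioned above.

```python
# Minimal sketch: a small restricted Boltzmann machine (RBM) as an energy-based model,
# trained with one step of contrastive divergence (CD-1) on synthetic binary data.
# All sizes, data and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, n_samples = 20, 10, 500
# Synthetic dataset: noisy copies of two binary "patterns"
patterns = rng.integers(0, 2, size=(2, n_vis))
data = patterns[rng.integers(0, 2, size=n_samples)]
flip = rng.random(data.shape) < 0.05
data = np.where(flip, 1 - data, data).astype(float)

W = 0.01 * rng.normal(size=(n_vis, n_hid))   # couplings of the energy function
a = np.zeros(n_vis)                          # visible biases
b = np.zeros(n_hid)                          # hidden biases
lr, epochs, batch = 0.05, 30, 25

for epoch in range(epochs):
    perm = rng.permutation(n_samples)
    for start in range(0, n_samples, batch):
        v0 = data[perm[start:start + batch]]
        h0 = sigmoid(v0 @ W + b)                       # positive phase: hidden units given data
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ W.T + a)               # negative phase: one Gibbs step (CD-1)
        h1 = sigmoid(v1 @ W + b)
        # Updates approximate the gradient of the log-likelihood
        W += lr * (v0.T @ h0 - v1.T @ h1) / batch
        a += lr * (v0 - v1).mean(axis=0)
        b += lr * (h0 - h1).mean(axis=0)
    if epoch % 10 == 0:
        recon_err = np.mean((data - sigmoid(sigmoid(data @ W + b) @ W.T + a)) ** 2)
        print(f"epoch {epoch:3d}  reconstruction error {recon_err:.4f}")
```

After training, inspecting the rows of W (the couplings between visible and hidden units) is one simple way to ask how the dataset patterns have been encoded in the model's parameters.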