CECAM - L2M3: Large language models for materials, molecules and beyond

Large language models (LLMs) have had a significant impact on many scientific fields, and major scientific journals have published special issues on AI in science (e.g., Science 381 (6654), 2023). Our organizing committee and participants have played a crucial role in this movement by exploring applications of LLMs in chemistry and materials science and by contributing to the development of open-source solutions.

For example, it has been shown that LLMs can be fine-tuned to achieve impressive performance on chemistry and materials science benchmarks [1–4]. Some participants have even given LLMs access to external tools, such as Google Search and cloud robotics, enabling automated chemical synthesis. However, this has also raised safety concerns [5,6].

Despite the rapid advances and the attention this field has received, a fundamental question remains: "What is hype, and what is reality?" In a recent hackathon [7], we brought together more than 150 participants to build prototypes and better understand the potential applications of LLMs in chemistry and materials science. This collaborative effort raised several open questions that will require close collaboration across the community.

What are the safety and dual-use concerns, and how can we assess and mitigate them? Some prominent figures have warned of potentially devastating impacts of such models, while others have dismissed these concerns as exaggerated.
How should we approach the use of LLMs in science, particularly in chemistry and materials science? Using LLMs in a scientific setting poses several challenges. Many powerful models, such as GPT-4, have been trained by for-profit companies on proprietary data, which makes them difficult to evaluate scientifically. The continually evolving nature of these systems and the lack of systematic evaluations pose further obstacles. For example, BIG-bench (maintained by Google), one of the largest benchmark suites for LLMs, contains only two (superficial) chemistry tests.

Furthermore, the role of academic research is being questioned because access to the necessary computational resources is concentrated among a few industrial players.
How can we maximize the benefits of these models, and what does our community need to leverage these advances effectively? Most applications of LLMs in chemistry and materials science are still at the prototype or demo stage. There is no consensus on the most promising applications in the short, medium, and long term, nor is there agreement on the changes in science governance, safety measures, and education needed to facilitate progress in these areas.

The proposed CECAM workshop aims to bring together academia, industry, and non-profit organizations. Our goals are to discuss future directions, create a roadmap, develop new benchmarks and evaluations, and establish a framework for ongoing collaboration.

Location: CECAM-HQ-EPFL, Avenue de Forel 2, 1015 Lausanne, Switzerland

Organisers

References