CECAM - Open Databases Integration for Materials Design

Trial and error is the traditional approach to designing bespoke materials for specific applications. Typically, researchers rely on their intuition and experience to propose compounds, which are then synthesized and tested to determine whether they meet the target properties. Testing a single compound usually takes months, and the outcome is most often negative, so many cycles of trial and improvement are needed before a material is finalized. A typical research project is therefore complex, time-consuming, and costly. In a speech on June 24, 2011, President Obama announced the Materials Genome Initiative as a major US research priority aimed at fostering innovation more effectively.

The last decade has witnessed game-changing improvements in materials design. The exponential growth of computing power and the development of robust first-principles electronic structure codes make it possible to perform large sets of calculations automatically. This has provided a launchpad for the flourishing field of high-throughput (HT) ab initio computations [1,2]. The concept is simple yet powerful: the results of HT calculations are curated in large databases (DBs) that record the computed properties of existing and hypothetical materials. These DBs can then be screened for materials with the desired properties, taking much of the guesswork out of materials design. They can also be used to train machine learning models that help design new materials. Thanks to such HT calculations, many open-domain DBs have emerged. In parallel, the experimental community has continued to develop its own DBs of material properties [3,4,5,6,7,8]. Some of these experimental DBs are open (e.g., those at the National Institute of Standards and Technology), whilst others are marketed by companies. Given the number of available materials DBs, no list can be exhaustive, but some initiatives collect links to as many as possible (see, e.g., Refs. [9] and [10]).

Nonetheless, the landscape of materials DBs is fragmented. In some cases, a Representational State Transfer (REST) Application Programming Interface (API) is available [11,12,13] for querying the DB through scripts (though it is not always documented). Until recently, however, it was only possible to query one DB at a time, and the APIs varied from one DB to another. Furthermore, the lack of standards makes it complicated to access large-scale curated materials data. Flexible, uniform, computer-readable data standards are needed so that data can be shared and systematically mined.

To take advantage of these disparate databases, the OPTIMADE consortium [14] was formed with the goal of creating a common API through which all materials databases can be accessed. The consortium has brought together the developers and maintainers of the following DBs:

AFLOW distributed materials property repository: http://aflow.org
ChemDataExtractor in Cambridge: http://chemdataextractor.org
Materials Cloud: http://materialscloud.org
Materials Project: http://materialsproject.org
NOMAD (Novel Materials Discovery) Archive: https://nomad-lab.eu/prod/rae/gui/search
Open Quantum Materials Database: http://oqmd.org
Computational Materials Repository: http://cmr.fysik.dtu.dk
Open Materials Database: http://openmaterialsdb.se
Crystallography Open Database: http://www.crystallography.net/cod
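Since all of these providers serve the same API, a single query can, in principle, be sent to each of them unchanged. The Python sketch below only builds the request URLs (it makes no network call). The provider base URLs are hypothetical placeholders, not the real OPTIMADE endpoints of the databases listed above; the filter uses the standard `elements` and `nelements` properties of the structures endpoint, and `structures_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Hypothetical base URLs: each provider publishes its own OPTIMADE base URL,
# which is generally not the same as the website addresses listed above.
BASE_URLS = [
    "https://optimade.provider-a.example",
    "https://optimade.provider-b.example/",
]


def structures_query_url(base_url: str, filter_string: str, page_limit: int = 10) -> str:
    """Build a query URL for the versioned /v1/structures endpoint of one provider."""
    params = {"filter": filter_string, "page_limit": page_limit}
    # rstrip avoids a double slash when a base URL ends with "/".
    return f"{base_url.rstrip('/')}/v1/structures?{urlencode(params)}"


# One filter string, reused unchanged for every provider.
FILTER = 'elements HAS ALL "Si","O" AND nelements=2'
for base in BASE_URLS:
    print(structures_query_url(base, FILTER))
```

The query part of every generated URL is identical; only the base URL changes, which is exactly what a uniform API makes possible.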

Under the OPTIMADE banner, six workshops have built a community that develops the OPTIMADE API and aims to expand further in the future. Thanks to the discussions held during the workshops and on the mailing list, the first stable version (v1.0) of the OPTIMADE API specification has been released [15,16]. All of these databases have now implemented the OPTIMADE API, giving scientists immediate access to a wealth of materials data.
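Because every compliant database returns responses in the same layout, client code written once can read results from any of them. The sketch below assumes a response to a structures query shaped like the hand-written, heavily abridged `SAMPLE_RESPONSE` (top-level `data`, `meta` and `links`, as in the JSON:API layout used by the specification); the entry is illustrative, not real database output, and `summarize` and `next_page_url` are hypothetical helper names.

```python
import json
from typing import List, Optional, Tuple

# Hand-written, abridged response in the layout used by OPTIMADE v1.0
# (top-level "data", "meta" and "links"). Illustrative only, not real output.
SAMPLE_RESPONSE = """
{
  "data": [
    {"id": "example-1", "type": "structures",
     "attributes": {"chemical_formula_reduced": "O2Si", "nelements": 2}}
  ],
  "meta": {"data_returned": 1, "more_data_available": false},
  "links": {"next": null}
}
"""


def summarize(response_text: str) -> List[Tuple[str, str]]:
    """Return (id, reduced formula) pairs for every entry in a structures response."""
    payload = json.loads(response_text)
    return [
        (entry["id"], entry.get("attributes", {}).get("chemical_formula_reduced", "?"))
        for entry in payload.get("data", [])
    ]


def next_page_url(response_text: str) -> Optional[str]:
    """Return the URL of the next page, or None when there is no further page.

    The "next" link may be a plain URL string or a link object with an "href" key.
    """
    links = json.loads(response_text).get("links") or {}
    nxt = links.get("next")
    if isinstance(nxt, dict):
        return nxt.get("href")
    return nxt


print(summarize(SAMPLE_RESPONSE))      # [('example-1', 'O2Si')]
print(next_page_url(SAMPLE_RESPONSE))  # None
```

A client would typically keep requesting `next_page_url(...)` until it returns `None`, collecting the entries of each page along the way.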

A paper describing the API was recently published in the peer-reviewed journal Scientific Data [17], which should serve as a launchpad for future adoption and use of the API. In addition, a website [14] has been set up with a mailing list, a wiki, and a GitHub repository. The optimade-python-tools package [18] has been developed and released as the reference implementation for Python servers, lowering the barrier for new OPTIMADE adopters. A few tutorials have also been organized online [19,20].

This CECAM event will consist of both a tutorial and a workshop. The tutorial will be mainly online, but participants who wish to attend onsite are more than welcome. They can also stay for the rest of the workshop, and participants implementing OPTIMADE for their own databases are strongly encouraged to do so, in order to accelerate the adoption of OPTIMADE. For the workshop, onsite participation is strongly recommended; online attendance will be possible, but the quality of the remote experience cannot be guaranteed.

For the tutorial, participants will be asked to watch a series of videos before the meeting itself. After a Q&A session on the videos, the first hands-on session will comprise exercises on generating OPTIMADE queries (which can be used on all databases) and on different tools for querying the databases. During the second hands-on session, the tutees will choose among three options. They can continue the exercises on querying the databases. Alternatively, they can study how to implement an OPTIMADE client for an existing database. Finally, if they have a specific problem of their own that OPTIMADE can help with, the developers can help them address it. The tutees will continue working during the second day, with the tutors available at specific times. The tutorial will end on the morning of the third day with short elevator pitches in which the tutees present their achievements.
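As an illustration of the kind of query-generation exercise mentioned above, the following Python sketch assembles a filter string from a few simple constraints. The helper names `quote` and `build_filter` are hypothetical and not part of any OPTIMADE tool; the syntax they emit (`HAS ALL`, comparison operators, double-quoted strings, clauses joined with `AND`) follows the OPTIMADE filter grammar, but only a small subset of it is covered.

```python
from typing import Iterable, Optional


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_filter(
    elements: Iterable[str],
    min_elements: Optional[int] = None,
    max_elements: Optional[int] = None,
) -> str:
    """Combine simple constraints into one OPTIMADE filter string joined with AND."""
    clauses = []
    elems = list(elements)
    if elems:
        # Structures must contain every listed element (possibly among others).
        clauses.append("elements HAS ALL " + ",".join(quote(e) for e in elems))
    if min_elements is not None:
        clauses.append(f"nelements>={min_elements}")
    if max_elements is not None:
        clauses.append(f"nelements<={max_elements}")
    # An empty string means "no constraint"; callers may then omit the filter.
    return " AND ".join(clauses)


print(build_filter(["Si", "O"], max_elements=3))
# elements HAS ALL "Si","O" AND nelements<=3
```

Generating the string programmatically keeps quoting consistent, and the same string can then be sent to every OPTIMADE provider.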

The workshop will be dedicated to the further development of the OPTIMADE API.

Location: CECAM-HQ-EPFL, Lausanne, Switzerland

Organisers

References