Open Databases Integration for Materials Design
CECAM-HQ-EPFL, Lausanne, Switzerland
Designing new materials suitable for specific applications is a long, complex, and costly process. Researchers come up with new ideas based on intuition and experience, and synthesizing and evaluating the resulting candidate materials requires a tremendous amount of trial and error. It can take months to test a single new material, and most often the outcome is negative. In a speech on June 24, 2011, President Obama announced the Materials Genome Initiative as a major US research priority to foster innovation more effectively.
In the last few years, materials design has changed fundamentally. Thanks to the exponential growth of computer power and the development of robust first-principles electronic structure codes, it has become possible to perform large sets of calculations automatically. This is the flourishing field of high-throughput (HT) ab initio computations. The concept, though simple, is very powerful. HT calculations are used to create large databases (DBs) containing the calculated properties of existing and hypothetical materials. These DBs can then be intelligently interrogated to search for materials with desired properties, thereby removing the guesswork from materials design.
In this framework, various open-domain DBs have appeared online:
- the AFLOW distributed materials property repository: http://aflowlib.org
- the Materials Cloud: http://materialscloud.org
- the Harvard Clean Energy Project Database: http://molecularspace.org
- the Materials Project: http://materialsproject.org
- the NOMAD (Novel Materials Discovery) Archive: http://metainfo.nomad-coe.eu
- the Open Quantum Materials Database: http://oqmd.org
- the Computational Materials Repository: http://cmr.fysik.dtu.dk
- the Data Catalyst Genome: http://suncat.stanford.edu
- the Open Materials Database: http://openmaterialsdb.se
- the Theoretical Crystallography Open Database: http://www.crystallography.net/tcod
In some of these cases, a Representational State Transfer (REST) Application Programming Interface (API) is available to interrogate the DB through scripts, though it is not always documented. So far, however, it is only possible to interrogate one DB at a time, and the APIs differ widely from one DB to another.
Building on the results achieved in a previous workshop and subsequent discussions, the present workshop aims to continue working towards making these DBs interoperable. The ultimate goal is to enable querying several DBs simultaneously through a common API. This would greatly benefit the materials science community (e.g. by enhancing opportunities for data mining) and contribute to fostering innovation more effectively. It may also be very useful for the validation and verification of codes and theories, which has become increasingly important in the field.
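To make the idea of a common API concrete, here is a minimal Python sketch of one way a single query could be translated into per-DB requests. It is only an illustration under stated assumptions: the provider names, the example.org hosts, the URL paths, and the parameter names (`elements`, `species`) are all hypothetical and do not describe the API of any DB listed above or the API to be designed at the workshop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlencode


@dataclass(frozen=True)
class Provider:
    """One database: its endpoint and how it spells a common query."""
    name: str
    base_url: str
    # Maps the common query onto this provider's own parameter names.
    translate: Callable[[Dict[str, str]], Dict[str, str]]


def build_urls(providers: List[Provider], query: Dict[str, str]) -> Dict[str, str]:
    """Return one request URL per provider for the same common query."""
    return {
        p.name: f"{p.base_url}?{urlencode(p.translate(query))}"
        for p in providers
    }


# Hypothetical providers (placeholder hosts, invented paths and parameters).
PROVIDERS = [
    Provider(
        name="db_a",
        base_url="http://db-a.example.org/api/structures",
        translate=lambda q: {"elements": q["elements"]},
    ),
    Provider(
        name="db_b",
        base_url="http://db-b.example.org/rest/search",
        # This provider is assumed to expect hyphen-separated species.
        translate=lambda q: {"species": q["elements"].replace(",", "-")},
    ),
]

# One common query, expressed once, fanned out to every provider.
urls = build_urls(PROVIDERS, {"elements": "Si,O"})
for name, url in urls.items():
    print(name, url)
```

In this sketch the translation step lives on the client side. With a truly common API, each DB would accept the same query directly and the `translate` functions would become unnecessary, which is the situation the workshop aims for.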
To this end, we propose to gather all the key developers involved in the different efforts, both in Europe and in the US. The workshop would include some oral presentations about the different DBs and their latest capabilities, but the largest fraction of the time would be dedicated to more technical presentations and discussions about the first implementations of the common API. We also propose to leave more than a full day for a coding party to continue implementing the common API.
Despite the large efforts that have been put into the development of open databases for materials design, their integration is still lacking. Enabling it is thus the next step that needs to be taken, and the proposed workshop is intended to make this possible.
Gian-Marco Rignanese (Université catholique de Louvain) - Organiser
Matthias Scheffler (Fritz-Haber-Institut der Max-Planck-Gesellschaft) - Organiser
Saulius Gražulis (Vilnius University Life Science Center Institute of Biotechnology) - Organiser
Rickard Armiento (Linköping University) - Organiser & speaker
Nicola Marzari (EPFL) - Organiser