Mixed-gen Session 4: Data Driven Science
Location: Online meeting - hosted by CECAM-HQ
This is the fourth in a series of online events aimed mainly at PhD students and researchers in their first post-doc. Our goal is to provide a new venue for these young scientists to share their work, get expert feedback and strengthen scientific relations within the CECAM community.
The event is fully online and has two parts. In the first, broadcast as a Zoom webinar, Prof. Claudia Draxl (Humboldt University Berlin) will give a general talk in the area of data-driven science (title and abstract below). This will be followed by seminars in which two young members of the community describe their work in the same area. In the second part, we shall move to a virtual poster session hosted in a Gather room, where more PhD students and researchers in their first post-doc will present pertinent projects. The session’s speaker and other (surprise) expert guests will join us for this poster session to discuss exciting new science.
To participate
If you are a PhD student or a post-doc:
Please use the Participate tab on this page to start the application. You will have to log in using your CECAM account to access the application form. If you don't have a CECAM account yet, use the register option in the top right corner of the login page... and welcome to CECAM!
If you are a more senior scientist:
Please contact the organisers and we shall process your registration.
Submission of posters
(Please note that - at least for the time being - we shall accept posters only from PhD students or researchers in their first post-doc)
After your application is accepted, you will be able to submit a poster. On the CECAM page for this event, go to the “My participation” tab and click on “Add a poster”, providing in particular a title and abstract in the recommended format. On the same form you can upload your poster file in PNG or JPG format if it is ready. Only these formats are accepted, because they are needed to display the poster in the Gather session. If the poster file is not ready when you submit your abstract, you can upload it later by editing your submission (go to the “My participation” tab and click the three vertical dots in the “Actions” column of the “My posters” table). Please upload your poster as soon as possible to enable a decision from the selection committee - see below.
Please note that posters will be visible in the Gather room associated with this session until the end of the series (July 2021) unless otherwise requested.
DEADLINE FOR SUBMISSION: TEN DAYS BEFORE THE EVENT
Selection of posters
Posters will be selected by the event organisers with the support of our main speaker and the experts who will take part in the poster session.
Selection of the two talks by PhD students or first-year post-docs
These contributions, to be broadcast in the Zoom webinar in the first part of the event, will be selected via a lottery from the posters chosen for the Gather session, after a preliminary screening by the organisers, the main speaker and guest experts. Please indicate in your application if you DO NOT WANT your poster to be considered for this lottery.
THE DECISION ON THE POSTER AND THE OUTCOME OF THE LOTTERY SELECTION WILL BE COMMUNICATED ONE WEEK BEFORE THE EVENT
POSTER SUBMISSIONS RECEIVED AFTER THE SUBMISSION DEADLINE WILL BE ACCEPTED BUT NOT CONSIDERED FOR UPGRADE TO A TALK. SUBMISSION WILL CLOSE DEFINITIVELY FOUR DAYS BEFORE THE EVENT.
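As a rough aid for planning, the three time limits above can be worked out from the event day with simple date arithmetic. The sketch below is only an illustration: the function name, the dictionary keys and the example date are hypothetical and not part of the announcement, and "one week" is taken to mean seven days.

```python
from datetime import date, timedelta

# Days before the event for each milestone, as stated in the announcement.
# The key names are made up for this sketch.
OFFSETS_IN_DAYS = {
    "poster_submission_deadline": 10,
    "decisions_communicated": 7,   # "one week before the event"
    "submission_closed": 4,
}


def session_timeline(event_day: date) -> dict[str, date]:
    """Return the calendar date of each milestone for a given event day."""
    return {
        name: event_day - timedelta(days=days)
        for name, days in OFFSETS_IN_DAYS.items()
    }


# Hypothetical event date, used purely for illustration.
example_timeline = session_timeline(date(2021, 3, 25))
for name, day in example_timeline.items():
    print(f"{name}: {day.isoformat()}")
```

Posters submitted between the first and the last of these dates would still be accepted, but would not enter the lottery for a talk.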
SESSION 4. Title and abstract of talks
From data to knowledge
Claudia Draxl, Humboldt University Berlin
Research data paired with Artificial Intelligence (AI) enable a new quality of science. The ultimate goal in our research domain is to predict novel candidate materials for a given application, possibly even in regions of the materials space that no-one would think of. For a real breakthrough, key prerequisites have to be brought together: Data – not only Big but most relevant and reliable – and novel AI tools with predictive power. In this session, we will review where we are on this road. A special focus will be on recently developed data-analysis tools.
BioExcel-CV19: a database of COVID-19 related Molecular Dynamics trajectories
Adam Hospital, IRB Barcelona
BioExcel-CV19 is a platform designed to provide web access to atomistic MD trajectories of macromolecules involved in the COVID-19 disease. The main objective of BioExcel-CV19 is to provide a tool for scientists interested in COVID-19 research to interactively and graphically check key structural and flexibility features stemming from MD simulations. As these features vary depending on the structure analyzed, specific analyses were performed, uploaded to the database, and represented in the web portal. These analyses and key features were collected by direct interaction with the authors of the simulations. As an example, trajectories of the virus Receptor Binding Domain (RBD) bound to the human angiotensin-converting enzyme 2 (hACE2), i.e. the RBD-hACE2 complex, include interface observables (e.g. residue distances, hydrogen bonds), allowing an easy analysis of their behaviour along the simulation. The project is part of the open-access initiatives promoted by the world-wide scientific community to share information about COVID-19 research.
All data produced by the project are available for download through an associated programmatic-access API. Access: https://bioexcel-cv19.bsc.es/#/
MODNet: accurate and interpretable property predictions for limited materials datasets by feature selection and joint-learning
Pierre-Paul De Breuck, Université catholique de Louvain
In order to make accurate predictions of material properties, current machine-learning approaches generally require large amounts of data, which are often not available in practice. In this work, a novel all-round framework named MODNet is presented, which relies on a feedforward neural network, the selection of physically meaningful features and, when applicable, joint learning. Besides being faster to train, this approach is shown to outperform current graph-network models on small datasets. In particular, the vibrational entropy of crystals at 305 K is predicted with a mean absolute test error of 0.009 meV/K/atom (four times lower than in previous studies). Furthermore, joint learning reduces the test error compared to single-target learning and enables the prediction of multiple properties at once, such as temperature functions. Finally, the selection algorithm highlights the most important features and thus helps in understanding the underlying physics.
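To give a flavour of the two ingredients mentioned in this abstract, feature selection and the mean absolute error, here is a minimal sketch. It is not MODNet's selection algorithm, which the abstract does not specify: as a simple stand-in, features are ranked by the absolute Pearson correlation with the target. All function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, k):
    """Keep the k feature names with the largest |correlation| to the target.

    Stand-in ranking criterion for illustration only.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented for this sketch: three candidate features, one target.
toy_target = [1.0, 2.0, 3.0, 4.0]
toy_features = {
    "a": [2.0, 4.1, 5.9, 8.0],    # strongly correlated with the target
    "b": [1.0, -1.0, 1.0, -1.0],  # weakly correlated
    "c": [4.0, 3.0, 2.2, 1.0],    # strongly anti-correlated
}
print(select_features(toy_features, toy_target, k=2))
print(mean_absolute_error([1.0, 2.0], [1.5, 1.5]))
```

In a workflow of the kind described, the retained features would then be fed to the neural network, and the mean absolute error would be evaluated on held-out test data.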
Organisers
Sara Bonella (CECAM HQ) - Organiser
Ignacio Pagonabarraga (CECAM HQ) - Organiser