E-CAM Workshops

E-CAM Scoping Workshop. Solubility prediction

May 14, 2018 to May 15, 2018
Location : CECAM-FR-RA, Ecole Normale Supérieure de Lyon, France


  • Carine Michel (CNRS, Ecole Normale Supérieure de Lyon, France)
  • Daan Frenkel (University of Cambridge, United Kingdom)
  • Eduardo Sanz (Physical Chemistry Department, Chemistry Faculty, University Complutense of Madrid, Spain)
  • Robert Docherty (Pfizer Limited, United Kingdom)




E-CAM is an H2020 project that aims to create, develop and sustain a European infrastructure for computational science applied to simulation and modelling of materials and of biological processes of industrial and societal interest. Building on the already significant network of 15 CECAM centres across Europe and the PRACE initiative, E-CAM creates a distributed centre for simulation and modelling across the electronic, molecular and continuum length scales. The centre builds on the considerable European expertise and capability in this area of significant industrial and scientific relevance. The objective is to make a very strong impact on the European economy through the development of a key industrial capability in the rapidly developing area of technological innovation through computer modelling.
The ambitious goals of E-CAM will be achieved through three complementary instruments: the development, testing, maintenance, and dissemination of software targeted at end-user needs. It will also provide an environment for the long-term optimisation and maintenance of academic codes and will help to ensure that, in future, these codes are properly exploited by industry.
CECAM will provide two scoping workshops per year. These will ensure a strong connection with our industrial partners. One of the two workshops will be broad in scope, allowing industrial partners from different sectors to interact and to discuss new pilot projects across all four scientific work packages of E-CAM. The second workshop will be deep, concentrating on one or two scientific areas of particular interest to a number of our partners.
At the Mainz scoping workshop in September 2016, industrial partners expressed a strong interest in the problem of solubility prediction, which is the subject of this scoping workshop.

It has been reported that over 75% of drug development candidates have low solubility based on the Biopharmaceutics Classification System (BCS). An increasing trend towards low solubility is a major issue for drug development, as formulation of low-solubility compounds can be problematic. Despite tremendous efforts, a definitive, accurate and comprehensive approach to predicting solubility has proven elusive. Consequently, there have been a number of attempts to probe changes in solubility as a function of structural changes in specific classes of molecules, as well as systematic approaches looking at matched molecular pairs to determine improved solubility as a function of inferred crystal packing disruption. The focus of this workshop could be on the tools that allow an unprecedented deconstruction of the relative importance of molecular solvation and crystal packing for solubility. Recent work includes a systematic experimental approach to examine key thermodynamic functions, such as sublimation and hydration properties, as a function of structural modifications, and a comprehensive computational approach to lattice energy estimation from molecular descriptors. A recent review has analysed simple predictive methods for the estimation of aqueous solubility and the specific use of chemical informatics and theory to predict the solubility of drug-like molecules [1]. A recent paper highlights the potential of these approaches and the attempts to build scientific bridges across the two communities. The paper [2] uses co-crystals to optimise the dissolution rate of a psychotropic drug with known dissolution challenges.


Solubility calculations have been carried out using two general approaches [3]:

(1) the thermodynamic approach, which seeks the concentration at which the chemical potential of the electrolyte in solution is equal to that of the pure solid;
(2) a direct coexistence approach, in which the solution is equilibrated with a solid configuration (typically either a slab or a selected crystal environment) and the electrolyte concentration in the solution phase sufficiently far from the crystal surface is taken to be the solubility [4].

The algorithms for the calculation of solubility will be examined in detail at the workshop. Essentially, the chemical potential of a salt in the solid phase is given by the Gibbs free energy per molecule. This in turn is related to the Helmholtz free energy of the solid, estimated using the Einstein model, and to the molar volume of the solid at a fixed pressure, which can be determined by performing constant-NpT simulations of the solid at room temperature. The chemical potential of the solution can be calculated from the derivative of the Gibbs free energy of the solution with respect to the number of solute molecules. The Gibbs free energy can be estimated using a coupling parameter method combined with a technique such as MBAR or WHAM [5]. The derivative is calculated numerically by performing a number of simulations at different solute concentrations. The solubility limit is obtained when the chemical potentials of the solution and the solid are equal.
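The last two steps of this procedure (numerical differentiation of the Gibbs free energy, then locating the concentration at which the two chemical potentials match) can be illustrated with a minimal Python sketch. Everything numerical here is an assumption made for illustration: the toy ideal-solution free energy `toy_gibbs_energy`, the reduced units, the constants `KT`, `N_WATER` and `MU0`, and the value of the solid chemical potential. In a real study the free energies would come from coupling-parameter simulations analysed with MBAR or WHAM, and the solid chemical potential from the Einstein model calculation.

```python
import numpy as np

KT = 1.0          # thermal energy in reduced units (assumption)
N_WATER = 1000    # solvent molecules, held fixed in every run (assumption)
MU0 = -10.0       # toy reference chemical potential of the solute (assumption)


def toy_gibbs_energy(n_solute):
    """Toy ideal-solution G(N); stands in for simulation free-energy estimates."""
    n = np.asarray(n_solute, dtype=float)
    return n * MU0 + KT * n * (np.log(n / N_WATER) - 1.0)


def solution_chemical_potential(n_solute, gibbs):
    """Numerical derivative dG/dN over the set of simulated solute counts."""
    return np.gradient(gibbs, n_solute)


def solubility(n_solute, mu_solution, mu_solid):
    """Linearly interpolate the solute count where mu_solution equals mu_solid."""
    diff = np.asarray(mu_solution) - mu_solid
    for i in range(len(diff) - 1):
        if diff[i] == 0.0:
            return float(n_solute[i])
        if diff[i] * diff[i + 1] < 0.0:
            frac = diff[i] / (diff[i] - diff[i + 1])
            return float(n_solute[i] + frac * (n_solute[i + 1] - n_solute[i]))
    return None  # no crossing inside the sampled concentration range


def demo(mu_solid=-13.0):
    """Run the toy workflow: one 'simulation' per solute count, then the crossing."""
    n_solute = np.arange(10.0, 101.0, 10.0)
    mu = solution_chemical_potential(n_solute, toy_gibbs_energy(n_solute))
    return solubility(n_solute, mu, mu_solid)


if __name__ == "__main__":
    print("solute molecules at saturation (toy model):", demo())
```

For this toy model the exact answer is `N_WATER * exp((mu_solid - MU0) / KT)`, so the sketch also shows how the spacing of the sampled concentrations limits the accuracy of the numerical derivative and hence of the estimated solubility.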

In the complementary area of structure-activity relationships [6], we will discuss an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modelling method. We will examine the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation, splitting data into training, validation and test sets, descriptor filtering, application of modelling techniques and selection of the best model. We will explore the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models: a small set of in vivo data and a large set of physico-chemical data.
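The stages listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process discussed at the workshop: the descriptors and property values are synthetic (`make_toy_data` is a hypothetical stand-in for real descriptor calculation), the Gaussian Process is a bare-bones zero-mean model with an RBF kernel written in NumPy, and "selection of the best model" is reduced to choosing one kernel length scale on the validation set.

```python
import numpy as np


def make_toy_data(n=200, seed=0):
    """Synthetic descriptors and property values (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    x[:, 2] = 1.0  # a constant, uninformative descriptor to be filtered out
    y = np.sin(x[:, 0]) + 0.5 * x[:, 1] + 0.05 * rng.normal(size=n)
    return x, y


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between two sets of descriptor vectors."""
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_mean(x_train, y_train, x_new, length_scale, noise=1e-2):
    """Posterior mean of a zero-mean Gaussian Process with an RBF kernel."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    return rbf_kernel(x_new, x_train, length_scale) @ np.linalg.solve(k, y_train)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def build_model(x, y, length_scales=(0.5, 1.0, 2.0, 4.0), seed=0):
    rng = np.random.default_rng(seed)
    # Split into training (60%), validation (20%) and test (20%) sets.
    idx = rng.permutation(len(x))
    n_train, n_val = int(0.6 * len(x)), int(0.2 * len(x))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    # Descriptor filtering: drop near-constant columns, judged on training data only.
    keep = x[train].std(axis=0) > 1e-8
    mean, std = x[train][:, keep].mean(axis=0), x[train][:, keep].std(axis=0)

    def scaled(rows):
        return (x[rows][:, keep] - mean) / std

    y_mean = y[train].mean()

    def predict(rows, length_scale):
        return y_mean + gp_mean(scaled(train), y[train] - y_mean, scaled(rows), length_scale)

    # Fit every candidate model and select the best one on the validation set.
    scores = {ls: rmse(y[val], predict(val, ls)) for ls in length_scales}
    best = min(scores, key=scores.get)
    # The test set is touched once, to report the selected model's error.
    return {
        "n_descriptors": int(keep.sum()),
        "best_length_scale": best,
        "validation_rmse": scores,
        "test_rmse": rmse(y[test], predict(test, best)),
    }


if __name__ == "__main__":
    x, y = make_toy_data()
    print(build_model(x, y))
```

Keeping the filtering and scaling statistics tied to the training set, and reserving the test set for a single final evaluation, is what makes the reported error an honest estimate for unseen compounds.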



[1] D. Elder and R. Holm, Int. J. Pharm., 453, 3-11 (2013).
[2] D. Elder, R. Holm, R. de Diego and H. Lopez, Int. J. Pharm., 453, 88-100 (2013).
[3] I. Nezbeda, F. Moucka and W. R. Smith, Mol. Phys., 1665-1690 (2016).
[4] J. L. Aragones, E. Sanz and C. Vega, J. Chem. Phys., 136, 244508 (2012).
[5] R. Gozalbes and A. Pineda-Lucena, Bioorg. Med. Chem., 18, 7078-7084 (2010).
[6] J. Shaoxin Feng and Tonglei Li, J. Chem. Theory Comput., 2, 149-156 (2006).