CECAM - From sequences to functions: challenges in the computation of realistic genotype-phenotype maps

How genetic variation contributes to phenotypic variation is a central question for understanding the evolutionary process. The experimental characterization of the genotype-phenotype (GP) relationship is a formidable and expensive task; computational approaches have therefore been used repeatedly to predict phenotypes from genotypes and to uncover the statistical features of that relationship. Advances notwithstanding, an apparently insurmountable problem remains: the astronomically large size of the space of genotypes.
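To give a sense of scale, the short sketch below counts the sequences of a given length over a fixed alphabet. The function name and the chosen lengths are illustrative assumptions, not part of the workshop material; the only fact used is that an alphabet of size K yields K^L distinct sequences of length L.

```python
def genotype_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


# Illustrative lengths only; 4 is the size of the nucleotide alphabet.
for length in (10, 50, 100):
    size = genotype_space_size(4, length)
    # Report the order of magnitude rather than the full integer.
    print(f"L = {length:>3}: about 10^{len(str(size)) - 1} genotypes")
```

Even at modest lengths the count exceeds what exhaustive enumeration can handle, which is the difficulty the paragraph above refers to.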

The probability of expressing different phenotypes, or of experiencing mutations that modify the current phenotype, depends on the architecture of the GP map. This architecture ultimately determines how the space of functions is explored and the chances of survival or innovation under endogenous or exogenous changes. Since the GP map cannot be systematically studied by experimental means, a number of well-motivated models have been proposed. Examples are the RNA sequence-to-secondary structure map, the hydrophobic-polar model for protein folding, reaction-based metabolism, gene-regulatory networks, the quaternary structure of proteins, and the construction of metabolism from genomes through intermediate levels. Experimental, computational and theoretical results suggest that GP maps may have universal architectural properties.

Progress in our understanding of the GP map at various levels is relevant to different scientific communities, with interests that range from evolutionary theory to molecular design. For example, an up-to-date theory of evolution has to incorporate the networked and multilayered structure of GP maps, since that structure determines adaptive dynamics. Establishing the strengths and limitations of simplified GP maps should clarify their suitability for predicting the function of natural sequences (such as RNA or proteins). A sufficiently profound understanding of how DNA sequences map onto molecular function might be of great help in biotechnology and systems chemistry approaches. Finally, the way in which generic properties of the GP map shape adaptation in an ecological context has rarely been formally explored, so that, as of today, the overarching question of whether organismal phenotypes can be derived from microscopic properties of genotype spaces remains open.

The ideas to be discussed during the workshop are highly interdisciplinary. Although the main topic is biology, it has been addressed formally by physicists and mathematicians: despite its high complexity, the GP map apparently has generic properties that can likely be uncovered by combining molecular simulations, statistical physics, and mathematical modelling. Computer engineers have developed optimized codes and new algorithms to push numerical studies to their limits. The impossibility of exhaustively analyzing complete genotype spaces turns the problem back to mathematicians in search of analytical approaches. Particularly important here are the theory of complex networks, which describes the underlying structure of genotypes within phenotypes, and the design of coarse-grained models that provide the basis of a theory of evolution that avoids relying on microscopic genetic changes. Finally, models that cannot be falsified through empirical measurements are of no help. New technological approaches that permit massive data analysis (e.g. ultra-deep sequencing and automated measurement of the fitness of sequences) are currently helping to clarify which features of the different approaches are realistic and which are not. Thus, the main questions to be discussed are:

* Are there truly universal properties of the genotype-phenotype map? Comparative studies of different GP models, at present only manageable computationally, are essential to answer that question.
* Can universal properties of the GP map be extracted by computational means? It is unclear whether different models can be directly compared and, especially, how dependent the results are on the length of genotypes for the different models.
* How can results for short sequences be extrapolated to long sequences? The GP map for short sequences differs from the typical behavior of long sequences. Current computational power at best permits exploring system sizes near the boundary between the two regimes.
* Are statistical properties of a given GP map invariant under increases in the dimensionality of the genotype space? It is not known which topological properties of phenotype spaces vary with genotype length. A collaboration of computational and physical approaches is essential here.
* Is exploring the whole of a genotype space necessary to understand adaptation and innovation? If not, which phenotypes are relevant in the evolutionary process? Very likely, only abundant phenotypes are visible to evolution under random mutations. Accordingly, computational algorithms could be restricted to these phenotypes.
* What would characterise a minimal model of the GP map? If one could be defined, could its properties be studied exhaustively? Answering these questions again demands collaboration between computation and physics.
* How can prediction of phenotype from genotype be improved?
  - In protein folding: is a solution in terms of atomic interactions feasible? Is it useful?
  - In metabolism: how many levels, between sequence and catabolism, do we need to consider?
  - In variable environments: can we enumerate all compatible phenotypes, given a genotype and the environment where it unfolds?
* How can ultra-deep sequencing data be used to inform our understanding of genotype and phenotype spaces? As in other scientific areas, we face here a problem of Big Data, and of how to extract information from such data sets.
* Can robust phenotypes be designed?
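Several of the questions above (phenotype abundance, robustness, exhaustive study of a minimal model) can be made concrete on a toy example. The sketch below is purely illustrative and is not one of the models named in the text: it assumes a hypothetical three-letter alphabet in which `S` acts as a stop symbol, and defines the phenotype as the part of the genotype before the first `S`. All names (`ALPHABET`, `phenotype`, `enumerate_map`, `abundances`, `robustness`) are invented for this example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ABS"  # hypothetical alphabet; "S" plays the role of a stop symbol


def phenotype(genotype: str) -> str:
    """Toy map: the phenotype is the part of the genotype before the first 'S'."""
    return genotype.split("S", 1)[0]


def enumerate_map(length: int) -> dict:
    """Exhaustively map every genotype of the given length to its phenotype."""
    genotypes = ("".join(g) for g in product(ALPHABET, repeat=length))
    return {g: phenotype(g) for g in genotypes}


def abundances(gp: dict) -> Counter:
    """Number of genotypes that map to each phenotype."""
    return Counter(gp.values())


def robustness(genotype: str, gp: dict) -> float:
    """Fraction of single-point mutants that keep the same phenotype."""
    neutral, total = 0, 0
    for i, symbol in enumerate(genotype):
        for other in ALPHABET:
            if other == symbol:
                continue
            mutant = genotype[:i] + other + genotype[i + 1:]
            total += 1
            neutral += gp[mutant] == gp[genotype]
    return neutral / total


if __name__ == "__main__":
    gp = enumerate_map(4)
    # Show the most abundant phenotypes; the distribution is strongly skewed.
    for pheno, count in abundances(gp).most_common(5):
        print(repr(pheno), count)
```

In this toy map a few phenotypes account for most genotypes, and genotypes of abundant phenotypes tolerate more point mutations. Whether that behaviour carries over to realistic maps and to longer sequences is the kind of question the list raises; exhaustive enumeration of this sort is only feasible for very short lengths.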


Location: CECAM-ES, University of Zaragoza
