Model-Data Integration in Physical Systems
Monday 17th March 2014 to Tuesday 18th March 2014
08:30 to 09:00  Registration  
09:10 to 09:50 
Dimension reduction, coarse-graining and data assimilation in high-dimensional dynamical systems: modeling and computational issues
Modern computing technologies, such as massively parallel simulation, special-purpose high-performance computers, and high-performance GPUs, make it possible to simulate complex high-dimensional dynamical systems and generate time series in amounts too large to be grasped by traditional "look and see" analyses. This calls for robust and automated methods to extract the essential structural and dynamical properties from these data in a manner that depends little or not at all on human subjectivity. To this end, a decade of work has led to the development of analysis techniques that rely on partitioning the conformation space into discrete substates and reduce the dynamics to transitions between these states. A particularly successful class of methods of this type is Markov state models (MSMs), in which the transitions between the states in the partition are assumed to be memoryless jumps. The accuracy of these models depends crucially on the choice of these states. In this talk, I will discuss systematic strategies for identifying these states and quantifying the error of the resulting approximation. These methods will be illustrated on examples arising from molecular dynamics simulations of biomolecules.
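As a rough illustration of the MSM idea in this abstract, the sketch below estimates a transition matrix from a trajectory that has already been assigned to discrete states, by counting jumps at a fixed lag and normalising each row. The function name, the lag parameter and the toy trajectory are assumptions made for this example; the talk's strategies for choosing the states and bounding the approximation error are not reproduced here.

```python
def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalised transition counts for a discrete trajectory.

    dtraj    : list of state indices in 0..n_states-1 (hypothetical input)
    n_states : number of states in the partition
    lag      : lag time in trajectory steps (must be >= 1)
    """
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair each frame with the frame `lag` steps later and count the jump.
    for i, j in zip(dtraj, dtraj[lag:]):
        counts[i][j] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # State never left within the data: keep an all-zero row.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


# Toy two-state trajectory, for illustration only.
T = estimate_transition_matrix([0, 0, 1, 1, 0, 1], n_states=2)
```

Each row of `T` is the empirical probability of jumping from one state to every other state after one lag time, which is the memoryless approximation the abstract refers to.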

INI 1  
09:55 to 10:35 
Gaussian process regression in molecular modelling
The talk will summarise our recent work in using Gaussian process regression (a form of Bayesian nonparametric inference) to model moderate- to high-dimensional functions in chemistry. Applications range from constructing interatomic potentials (i.e. parametrisations of the Born-Oppenheimer potential energy surface) starting from total energy electronic structure calculations, to free energy surface reconstruction based on umbrella sampling trajectories. In every case studied so far, a careful treatment of data representation and hyperparameters leads to huge increases in computational efficiency and model fidelity.

INI 1  
10:40 to 11:25  Coffee/Tea Break  
11:30 to 12:10 
D Crommelin (Centrum voor Wiskunde en Informatica (CWI)) Modeling of unresolved scales with data-inferred stochastic processes
I will discuss a data-driven stochastic approach to modeling unresolved scales, in which feedback from microscale processes is represented by a network of Markov processes. The Markov processes are conditioned on macroscale model variables, and their properties are inferred from precomputed high-resolution (microscale-resolving) simulations. These processes are designed to emulate, in a statistical sense, the feedback observed in the high-resolution simulations, thereby providing a statistical-dynamical coupling between micro- and macroscale models. This work is primarily aimed at applications in atmosphere-ocean science (stochastic parameterizations of atmospheric convection and of mesoscale oceanic eddies).

INI 1  
12:15 to 12:55 
Implicit particle methods for high-dimensional, highly nonlinear systems
The implicit particle filter is one of a number of recently proposed particle filtering schemes in which the trajectory of each particle is informed by observations within each assimilation cycle. In the case of observations defined by a linear function of the state vector, taken every time step of the numerical model, the implicit particle filter is equivalent to the optimal importance filter, i.e., at each step, any given particle is drawn from the density of the system conditioned jointly upon the observation and the state of the particle at the previous time. The optimal importance filter was implemented for a shallow water model with O(10^4) state variables, and performed well with nominal demands on computing resources, but it exhibited some characteristics of the degeneracy some authors have predicted. We note the similarity of our scheme to other recently devised schemes, and propose a potential solution in the form of a fixed-lag smoother.

INI 1  
13:00 to 14:25  Sandwich Lunch in INI  
14:30 to 15:10 
L Colwell (University of Cambridge) Using evolutionary sequence variation to make inferences about protein structure and function
The explosive growth in the number of protein sequences gives rise to the possibility of using natural variation in sequences of homologous proteins to find residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by a group of residues, the mutations that separate one phenotype from another will be correlated. We show that correlations between amino acid mutations at different sites in a protein can be used to predict, de novo, tertiary protein structure of both globular and transmembrane proteins from large sequence alignments.
In addition, residues that determine the specificity of protein interactions can be identified from interprotein residue pairs that covary. Those amino acids whose mutation patterns are most highly constrained by evolution are found to often involve known functional sites of proteins, such as enzyme active sites, and ligand binding sites. These findings raise questions about the relationship between protein structure and function, and the evolutionary constraints that this relationship imposes on different proteins.
Our maximum-entropy-based analysis identifies a global probability with a minimal set of amino acid pair interactions that reproduce all the observed pairwise correlations in the data. The resulting probability model for the sequence of the protein of interest raises the possibility that we may be able to identify amino acids that control different protein phenotypes, and hence reprogramme existing proteins.

INI 1  
15:15 to 15:55 
Filtering partially observed chaotic deterministic dynamical systems
Many physical systems can be successfully modelled by a deterministic dynamical system for which, however, the initial conditions may contain uncertainty. In the presence of chaos this can lead to undesirable growth of uncertainty over time. However, when noisy observations of the system are present these may be used to compensate for the uncertainty in the initial state. This scenario is naturally modelled by viewing the initial state as given by a probability distribution, and then conditioning this probability distribution on the noisy observations, thereby reducing uncertainty. Filtering refers to the situation where the conditional distribution on the system state is updated sequentially, at the time of each observation. In this talk we investigate the asymptotic behaviour of this filtering distribution for large time.
We focus on a class of dissipative systems that includes the Lorenz '63 and '96 models, and the Navier-Stokes equations on a 2D torus. We first study the behaviour of a variant of the 3DVAR filter, creating a unified analysis which subsumes the existing work in [1,2] which, itself, builds on [3]. The optimality property of the true filtering distribution is then used, when combined with this modified 3DVAR analysis, to provide general conditions on the observation of our wide class of chaotic dissipative systems which ensure that the filtering distributions concentrate around the true state of the underlying system in the long-time asymptotic regime.
[1] C.E.A. Brett, K.F. Lam, K.J.H. Law, D.S. McCormick, M.R. Scott and A.M. Stuart, "Accuracy and stability of filters for dissipative PDEs." Physica D 245 (2013).
[2] K.J.H. Law, A. Shukla and A.M. Stuart, "Analysis of the 3DVAR Filter for the Partially Observed Lorenz '63 Model." Discrete and Continuous Dynamical Systems A, 34 (2014).
[3] K. Hayden, E. Olsen and E.S. Titi, "Discrete data assimilation in the Lorenz and 2D Navier-Stokes equations." Physica D 240 (2011).
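The sequential forecast-then-correct structure described in this abstract can be sketched for the Lorenz '63 model as follows. This is a minimal illustration, not the modified 3DVAR scheme analysed in the talk: it assumes that only the x-component is observed, uses a forward-Euler integrator for brevity, and uses a fixed scalar gain 1/(1 + eta^2). The function names, step size and default parameter values are choices made for this example.

```python
def lorenz63_euler_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz '63 state (x, y, z) by one forward-Euler step."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)


def analysis_step(forecast, obs_x, eta=0.1):
    """Correct the observed x-component with a fixed gain (assumed form).

    eta = 0 replaces x by the observation; large eta trusts the model.
    The unobserved components y and z are left unchanged.
    """
    x, y, z = forecast
    gain = 1.0 / (1.0 + eta ** 2)
    return (x + gain * (obs_x - x), y, z)


def filter_cycle(estimate, obs_x, steps_between_obs=10, eta=0.1):
    """One assimilation cycle: run the model forward, then correct with obs_x."""
    for _ in range(steps_between_obs):
        estimate = lorenz63_euler_step(estimate)
    return analysis_step(estimate, obs_x, eta)
```

Calling `filter_cycle` repeatedly with successive noisy observations of x gives the sequence of estimates whose long-time accuracy is the subject of the talk.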

INI 1  
16:00 to 16:45  Coffee/Tea Break  
16:50 to 17:30 
Transition path processes
Understanding rare events, such as transitions of a chemical system from reactant to product states, is a challenging problem because of the time scale separation. In this talk, we will discuss some recent progress in the mathematical theory of transition paths. In particular, we identify and characterize the stochastic process that corresponds to transition paths. The study of the transition path process helps to understand the transition mechanism and provides a framework to design and analyze numerical approaches for rare event sampling and simulation.

INI 1  
17:30 to 18:30  Wine Reception 
09:10 to 09:50 
P van Leeuwen (University of Reading) Particle filters for very high dimensional systems
Particle filters are one of the new data-assimilation methods that allow us to infer the characteristics of the full posterior probability density function. Until very recently the prevailing view was that particle filters are not applicable in high-dimensional systems. Recent developments have shown this to be incorrect, and I will discuss a few of these.
They are all based on the freedom related to the fact that we can choose particle movements from a different density than that described by the model under study, as long as we adapt the relative weight of the particle. This allows us to pull the particles towards future observations, reducing and even avoiding filter degeneracy. Using this we can explore more traditional data-assimilation and inverse modelling techniques that are based on linearisations to find very efficient particles that allow particle filtering in systems of arbitrary dimensions.
Different methods will be described and high-dimensional applications, including climate models, will illustrate the quality of the methods. I will also touch upon an ensemble data-assimilation framework that allows very easy and efficient coupling of models to ensemble data-assimilation methods without the need to change the model structure or model work flow.
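The reweighting idea in this abstract, moving particles with a proposal density other than the model's and correcting through the weights, can be sketched for a scalar state. The example assumes Gaussian model, proposal and observation densities and a direct observation of the state; all function and parameter names are hypothetical, and none of the specific schemes from the talk is implemented here.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a scalar Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_weight(x_new, obs, model_mean, model_var, prop_mean, prop_var, obs_var):
    """log w = log p(obs | x_new) + log p_model(x_new) - log q_proposal(x_new)."""
    log_lik = gaussian_logpdf(obs, x_new, obs_var)
    log_model = gaussian_logpdf(x_new, model_mean, model_var)
    log_prop = gaussian_logpdf(x_new, prop_mean, prop_var)
    return log_lik + log_model - log_prop


def normalise(log_weights):
    """Turn log-weights into weights summing to one (shifted for stability)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(weights):
    """1 / sum(w_i^2): equals N for equal weights, near 1 when degenerate."""
    return 1.0 / sum(w * w for w in weights)
```

A small effective sample size signals the filter degeneracy mentioned above; a proposal that pulls particles towards the observation is chosen to keep the weights closer to uniform.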

INI 1  
09:55 to 10:35 
Filter divergence and EnKF
The Ensemble Kalman Filter (EnKF) is a widely used tool for assimilating data with high-dimensional nonlinear models. Nevertheless, our theoretical understanding of the filter is largely supported by observational evidence rather than rigorous statements.
In this talk we attempt to make rigorous statements regarding "filter divergence", where the filter loses track of the underlying signal. To be specific, we focus on the more exotic phenomenon known as "catastrophic filter divergence", where the filter reaches machine infinity in finite time.

INI 1  
10:40 to 11:25  Coffee/Tea Break  
11:30 to 12:10 
Extracting spatiotemporal patterns from data with dynamics-adapted kernels
Kernel methods provide an attractive way of extracting features from data by biasing the geometry of the data in a controlled manner. In this talk, we discuss a family of kernels for dynamical systems featuring an explicit dependence on the dynamical vector field operating in the phase-space manifold, estimated empirically through finite differences of time-ordered data samples. In a suitable asymptotic limit, the associated diffusion operator generates diffusions along the integral curves of the dynamical vector field. We present applications to toy dynamical systems and data generated by comprehensive climate models.

INI 1  
12:15 to 12:55 
Analysis and interpretation of single-molecule force spectroscopy experiments
Protein unfolding and refolding trajectories under a constant stretching force measure manifestations of the underlying molecular processes in the end-to-end length fluctuations. In the case of the ubiquitin, I27 and NuG2 proteins, the distribution of unfolding times at a given force is best fit with a stretched exponential function, which requires a different physical interpretation from the commonly used Kramers' theory. On the other hand, we show that the collapse from a highly extended state to the folded length is well captured by simple diffusion along the free energy of the end-to-end length. The estimated diffusion coefficient of approximately 100 nm^2 s^-1 is significantly slower than expected from viscous effects alone, possibly because of the internal degrees of freedom of the protein. The reconstructed free energy profiles give validity to a physical model in which the multiple protein domains collapse all at once, independent of the number of domains in the chain.

INI 1  
13:00 to 14:25  Sandwich Lunch in INI  
14:30 to 15:10 
M Sarich (Freie Universität Berlin) Projecting multiscale processes and an approach to clustering of directed networks
Markov processes are widely used to model physical, chemical, or biological systems, and these processes often exhibit metastability. This leads to a multiscale behavior and the presence of rare events. We will discuss set-oriented and averaging-based approaches for time-reversible problems that can be connected to a multilevel Galerkin approximation of the transfer operator. Further, we will discuss possible extensions to the nonreversible case and an application to directed network clustering.

INI 1  
15:15 to 15:55 
Gaussian mixture transition models for identification of slow processes in molecular kinetics
The identification of slow processes from molecular dynamics (MD) simulations is a fundamental and important problem for analyzing and understanding complex molecular processes, because the slow processes governed by dominant eigenvalues and eigenfunctions of MD propagators contain essential information on structures and transition rates of metastable conformations. Most of the existing approaches to this problem, including Markov-model-based approaches and the variational approach, perform the identification by representing the dominant eigenfunctions as linear combinations of a set of basis functions. But the choice of basis functions remains an unsatisfactorily solved problem for these approaches. Here we take a Bayesian approach to slow process identification by developing a novel parametric model called the Gaussian mixture transition model (GMTM) to characterize MD propagators. The GMTM approximates the half-weighted density of an MD propagator by a Gaussian mixture model and allows for tractable computation of spectral components. In contrast with the other approaches based on Galerkin-type approximation, our approach can automatically adjust the Gaussian basis functions involved and handle the statistical uncertainties in the Bayesian framework. We demonstrate the effectiveness and accuracy of the proposed approach on some simulation examples.

INI 1  
16:00 to 16:45  Coffee/Tea Break  
16:50 to 17:30 
M Branicki (University of Edinburgh) Multi-model mixture density estimators & information theory for stochastic filtering and prediction
Multi Model Ensemble (MME) predictions are a popular ad hoc technique for improving imperfect predictions of high-dimensional, multiscale dynamical systems. The heuristic idea behind the MME framework is simple: given a collection of imperfect models, one considers predictions obtained through the convex superposition of the individual probabilistic forecasts in the hope of mitigating model error. However, it is not obvious whether this is a viable strategy, or which models (and with what weights) should be included in the MME forecast in order to achieve the best predictive performance. I will show that an information-theoretic approach to this problem allows for deriving a sufficient condition for improving dynamical predictions within the MME framework; moreover, this formulation gives rise to systematic and practical guidelines for optimising data assimilation techniques which are based on multi-model ensembles.
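The convex superposition of forecasts mentioned in this abstract can be illustrated with a small sketch. It assumes each model issues a scalar Gaussian forecast given as a (mean, variance) pair; the function names and the validity check on the weights are choices made for this example, and the information-theoretic criterion for selecting models and weights is not implemented.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def check_weights(weights):
    """Convex weights must be non-negative and sum to one."""
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to 1")


def mme_density(x, forecasts, weights):
    """Mixture density sum_i w_i p_i(x) for forecasts [(mean, var), ...]."""
    check_weights(weights)
    return sum(w * gaussian_pdf(x, m, v) for w, (m, v) in zip(weights, forecasts))


def mme_mean(forecasts, weights):
    """Mean of the mixture forecast: the weighted average of the model means."""
    check_weights(weights)
    return sum(w * m for w, (m, _) in zip(weights, forecasts))
```

For two models with means 0 and 2 and equal weights, `mme_mean` returns 1.0, while `mme_density` remains bimodal when the variances are small; the mixture keeps the spread between models rather than averaging it away.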

INI 1 