# Workshop Programme

## for period 17 - 18 March 2014

### Model-Data Integration in Physical Systems

Timetable

Monday 17 March

08:30-09:00 | Registration

09:10-09:50 | Vanden-Eijnden, E (Courant Institute of Mathematical Sciences)

Dimension reduction, coarse-graining and data assimilation in high-dimensional dynamical systems: modeling and computational issues | Sem 1

Modern computing technologies, such as massively parallel simulation, special-purpose high-performance computers, and high-performance GPUs, make it possible to simulate complex high-dimensional dynamical systems and generate time series in amounts too large to be grasped by traditional "look and see" analyses. This calls for robust and automated methods to extract the essential structural and dynamical properties from these data in a manner that depends little or not at all on human subjectivity. To this end, a decade of work has led to the development of analysis techniques which rely on the partitioning of the conformation space into discrete substates and reduce the dynamics to transitions between these states. A particularly successful class of methods of this type is Markov state models (MSMs), in which the transitions between the states in the partition are assumed to be memoryless jumps. The accuracy of these models crucially depends on the choice of these states. In this talk, I will discuss systematic strategies that make it possible to identify these states and quantify the error of the resulting approximation. These methods will be illustrated on examples arising from molecular dynamics simulations of biomolecules.

09:55-10:35 | Csanyi, G (University of Cambridge)

Gaussian process regression in molecular modelling | Sem 1

The talk will summarise our recent work in using Gaussian process regression (a form of Bayesian non-parametric inference) to model moderate to high dimensional functions in chemistry. Applications range from constructing interatomic potentials (i.e. parametrisations of the Born-Oppenheimer potential energy surface) starting from total energy electronic structure calculations, to free energy surface reconstruction based on umbrella sampling trajectories. In every case studied so far, a careful treatment of data representation and hyperparameters leads to huge increases in computational efficiency and model fidelity.

10:40-11:25 | Coffee/Tea Break

11:30-12:10 | Crommelin, D (Centrum voor Wiskunde en Informatica (CWI))

Modeling of unresolved scales with data-inferred stochastic processes | Sem 1

I will discuss a data-driven stochastic approach to modeling unresolved scales, in which feedback from micro-scale processes is represented by a network of Markov processes. The Markov processes are conditioned on macro-scale model variables, and their properties are inferred from pre-computed high-resolution (micro-scale resolving) simulations. These processes are designed to emulate, in a statistical sense, the feedback observed in the high-resolution simulations, thereby providing a statistical-dynamical coupling between micro- and macro-scale models. This work is primarily aimed at applications in atmosphere-ocean science (stochastic parameterizations of atmospheric convection and of mesoscale oceanic eddies).

12:15-12:55 | Miller, R (Oregon State University)

Implicit particle methods for high dimensional highly nonlinear systems | Sem 1

The implicit particle filter is one of a number of recently-proposed particle filtering schemes in which the trajectory of each particle is informed by observations within each assimilation cycle. In the case of observations defined by a linear function of the state vector, taken every time step of the numerical model, the implicit particle filter is equivalent to the optimal importance filter, i.e., at each step, any given particle is drawn from the density of the system conditioned jointly upon the observation and the state of the particle at the previous time. The optimal importance filter was implemented for a shallow water model with O(10^4) state variables, and performed well with nominal demands on computing resources, but it exhibited some characteristics of the degeneracy some authors have predicted. We note the similarity of our scheme to other recently-devised schemes, and propose a potential solution in the form of a fixed-lag smoother.

13:00-14:25 | Sandwich Lunch in INI

14:30-15:10 | Colwell, L (University of Cambridge)

Using evolutionary sequence variation to make inferences about protein structure and function | Sem 1

The explosive growth in the number of protein sequences gives rise to the possibility of using natural variation in sequences of homologous proteins to find residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by a group of residues, the mutations that separate one phenotype from another will be correlated. We show that correlations between amino acid mutations at different sites in a protein can be used to predict, de novo, tertiary protein structure of both globular and transmembrane proteins from large sequence alignments. In addition, residues that determine the specificity of protein interactions can be identified from inter-protein residue pairs that co-vary. Those amino acids whose mutation patterns are most highly constrained by evolution are found to often involve known functional sites of proteins, such as enzyme active sites, and ligand binding sites. These findings raise questions about the relationship between protein structure and function, and the evolutionary constraints that this relationship imposes on different proteins. Our maximum entropy based analysis identifies a global probability with a minimal set of amino acid pair interactions that reproduce all the observed pairwise correlations in the data. The resulting probability model for the sequence of the protein of interest raises the possibility that we may be able to identify amino acids that control different protein phenotypes, and hence re-programme existing proteins.

15:15-15:55 | Sanz-Alonso, D (University of Warwick)

Filtering partially observed chaotic deterministic dynamical systems | Sem 1

Many physical systems can be successfully modelled by a deterministic dynamical system for which, however, the initial conditions may contain uncertainty. In the presence of chaos this can lead to undesirable growth of uncertainty over time. However, when noisy observations of the system are present these may be used to compensate for the uncertainty in the initial state. This scenario is naturally modelled by viewing the initial state as given by a probability distribution, and then conditioning this probability distribution on the noisy observations, thereby reducing uncertainty. Filtering refers to the situation where the conditional distribution on the system state is updated sequentially, at the time of each observation. In this talk we investigate the asymptotic behaviour of this filtering distribution for large time. We focus on a class of dissipative systems that includes the Lorenz '63 and '96 models, and the Navier-Stokes equations on a 2D torus. We first study the behaviour of a variant on the 3DVAR filter, creating a unified analysis which subsumes the existing work in [1,2] which, itself, builds on [3]. The optimality property of the true filtering distribution is then used, when combined with this modified 3DVAR analysis, to provide general conditions on the observation of our wide class of chaotic dissipative systems which ensure that the filtering distributions concentrate around the true state of the underlying system in the long-time asymptotic regime.

[1] C.E.A. Brett, K.F. Lam, K.J.H. Law, D.S. McCormick, M.R. Scott and A.M. Stuart, "Accuracy and stability of filters for dissipative PDEs." Physica D 245 (2013).

[2] K.J.H. Law, A. Shukla and A.M. Stuart, "Analysis of the 3DVAR Filter for the Partially Observed Lorenz '63 Model." Discrete and Continuous Dynamical Systems A, 34 (2014).

[3] K. Hayden, E. Olsen and E.S. Titi, "Discrete data assimilation in the Lorenz and 2D Navier-Stokes equations." Physica D 240 (2011).

16:00-16:45 | Coffee/Tea Break

16:50-17:30 | Lu, J (Duke University)

Transition path processes | Sem 1

Understanding rare events, such as transitions of a chemical system from reactant to product states, is a challenging problem due to the time scale separation. In this talk, we will discuss some recent progress in the mathematical theory of transition paths. In particular, we identify and characterize the stochastic process that corresponds to transition paths. The study of the transition path process helps to understand the transition mechanism and provides a framework to design and analyze numerical approaches for rare event sampling and simulation.

17:30-18:30 | Wine Reception

Tuesday 18 March

09:10-09:50 | van Leeuwen, P (University of Reading)

Particle filters for very high dimensional systems | Sem 1

Particle filters are one of the new data-assimilation methods that allow us to infer the characteristics of the full posterior probability density function. Until very recently the general belief has been that particle filters are not applicable in high-dimensional systems. Recent developments have shown this to be incorrect, and I will discuss a few of these. They are all based on the freedom related to the fact that we can choose particle movements from a different density than that described by the model under study, as long as we adapt the relative weight of the particle. This allows us to pull the particles to future observations, reducing and even avoiding filter degeneracy. Using this we can explore more traditional data-assimilation and inverse modelling techniques that are based on linearisations to find very efficient particles that allow particle filtering in systems of arbitrary dimensions. Different methods will be described and high-dimensional applications, including climate models, will illustrate the quality of the methods. I will also touch upon an ensemble data-assimilation framework that allows very easy and efficient coupling of models to ensemble data-assimilation methods without the need to change the model structure or model workflow.

09:55-10:35 | Kelly, D (University of North Carolina)

Filter divergence and EnKF | Sem 1

The Ensemble Kalman Filter (EnKF) is a widely used tool for assimilating data with high dimensional nonlinear models. Nevertheless, our theoretical understanding of the filter is largely supported by observational evidence rather than rigorous statements. In this talk we attempt to make rigorous statements regarding "filter divergence", where the filter loses track of the underlying signal. To be specific, we focus on the more exotic phenomenon known as "catastrophic filter divergence", where the filter reaches machine infinity in finite time.

10:40-11:25 | Coffee/Tea Break

11:30-12:10 | Giannakis, D (New York University)

Extracting spatiotemporal patterns from data with dynamics-adapted kernels | Sem 1

Kernel methods provide an attractive way of extracting features from data by biasing the geometry of the data in a controlled manner. In this talk, we discuss a family of kernels for dynamical systems featuring an explicit dependence on the dynamical vector field operating in the phase-space manifold, estimated empirically through finite differences of time-ordered data samples. In a suitable asymptotic limit, the associated diffusion operator generates diffusions along the integral curves of the dynamical vector field. We present applications to toy dynamical systems and data generated by comprehensive climate models.

12:15-12:55 | Brujic, J (New York University)

Analysis and interpretation of single molecule force spectroscopy experiments | Sem 1

Protein unfolding and refolding trajectories under a constant stretching force measure manifestations of the underlying molecular processes in the end-to-end length fluctuations. In the case of ubiquitin, I27 and NuG2 protein, the distribution of unfolding times at a given force is best fit with a stretched exponential function, which requires an alternative physical interpretation to the commonly used Kramers' theory. On the other hand, we show that the collapse from a highly extended state to the folded length is well captured by simple diffusion along the free energy of the end-to-end length. The estimated diffusion coefficient of ~100 nm² s⁻¹ is significantly slower than expected from viscous effects alone, possibly because of the internal degrees of freedom of the protein. The reconstructed free energy profiles give validity to a physical model in which the multiple protein domains collapse all at once, independent of the number of domains in the chain.

13:00-14:25 | Sandwich Lunch in INI

14:30-15:10 | Sarich, M (Freie Universität Berlin)

Projecting multiscale processes and an approach to clustering of directed networks | Sem 1

Markov processes are widely used to model physical, chemical, or biological systems, and these processes often exhibit metastability. This leads to a multiscale behavior and the presence of rare events. We will discuss set-oriented and averaging-based approaches for time-reversible problems that can be connected to a multilevel Galerkin approximation of the transfer operator. Further, we will discuss possible extensions to the non-reversible case and an application to directed network clustering.

15:15-15:55 | Wu, H (Freie Universität Berlin)

Gaussian mixture transition models for identification of slow processes in molecular kinetics | Sem 1

The identification of slow processes from molecular dynamics (MD) simulations is a fundamental and important problem for analyzing and understanding complex molecular processes, because the slow processes governed by dominant eigenvalues and eigenfunctions of MD propagators contain essential information on structures and transition rates of metastable conformations. Most of the existing approaches to this problem, including Markov model based approaches and the variational approach, perform the identification by representing the dominant eigenfunctions as linear combinations of a set of basis functions. But the choice of basis functions is still an unsatisfactorily solved problem for these approaches. Here we take a Bayesian approach to slow process identification by developing a novel parametric model called Gaussian mixture transition model (GMTM) to characterize MD propagators. The GMTM approximates the half-weighted density of an MD propagator by a Gaussian mixture model and allows for tractable computation of spectral components. In contrast with the other Galerkin-type approximation based approaches, our approach can automatically adjust the involved Gaussian basis functions and handle the statistical uncertainties in the Bayesian framework. We demonstrate by some simulation examples the effectiveness and accuracy of the proposed approach.

16:00-16:45 | Coffee/Tea Break

16:50-17:30 | Branicki, M (University of Edinburgh)

Multi model mixture density estimators & information theory for stochastic filtering and prediction | Sem 1

Multi Model Ensemble (MME) predictions are a popular ad-hoc technique for improving imperfect predictions of high-dimensional, multi-scale dynamical systems. The heuristic idea behind the MME framework is simple: given a collection of imperfect models, one considers predictions obtained through the convex superposition of the individual probabilistic forecasts in the hope of mitigating model error. However, it is not obvious whether this is a viable strategy, or which models, and with what weights, should be included in the MME forecast in order to achieve the best predictive performance. I will show that an information-theoretic approach to this problem allows for deriving a sufficient condition for improving dynamical predictions within the MME framework; moreover, this formulation gives rise to systematic and practical guidelines for optimising data assimilation techniques which are based on multi model ensembles.