# Workshop Programme

## for period 22 - 25 April 2014

### Monte Carlo Inference for Complex Statistical Models

22 - 25 April 2014

Timetable

Tuesday 22 April | ||||

08:30-09:05 | Registration | |||

09:05-09:15 | Welcome from Christie Marr (INI Deputy Director) | |||

09:15-10:15 | Andrieu, C (University of Bristol) |
|||

tba | Sem 1 | |||

10:15-10:30 | Morning Coffee | |||

10:30-11:05 | Chopin, N (Centre de Recherche en Économie et Statistique (CREST)) |
|||

Sequential Quasi-Monte Carlo | Sem 1 | |||

Co-author: Mathieu Gerber (Université de Lausanne and CREST) We develop a new class of algorithms, SQMC (Sequential Quasi-Monte Carlo), as a variant of SMC (Sequential Monte Carlo) based on low-discrepancy points. The complexity of SQMC is O(N\log N), where N is the number of simulations at each iteration, and its error rate is smaller than the Monte Carlo rate O(N^{-1/2}). The only requirement to implement SQMC is the ability to write the simulation of particle x^n_t given x^n_{t-1} as a deterministic function of x^n_{t-1} and uniform variates. We show that SQMC is amenable to the same extensions as standard SMC, such as forward smoothing, backward smoothing, unbiased likelihood evaluation, and so on. In particular, SQMC may replace SMC within a PMCMC (particle Markov chain Monte Carlo) algorithm. We establish several convergence results. We provide numerical evidence in several difficult scenarios than SQMC significantly outperforms SMC in terms of approximation error. |
||||

11:05-11:40 | Moulines, E (Télécom ParisTech) |
|||

On the uniform ergodicity of the particle Gibbs sampler | Sem 1 | |||

Co-authors: Randal Douc (Telecom SudParis), Fred Lindsten (Cambridge) The particle Gibbs sampler is a systematic way of using a particle filter within Markov chain Monte Carlo (MCMC). This results in an off-the-shelf Markov kernel on the space of state trajectories, which can be used to simulate from the full joint smoothing distribution for a state space model in an MCMC scheme. We show that the PG Markov kernel is uniformly ergodic under rather general assumptions, that we will carefully review and discuss. In particular, we provide an explicit rate of convergence which reveals that: (i) for fixed number of data points, the convergence rate can be made arbitrarily good by increasing the number of particles, and (ii) under general mixing assumptions, the convergence rate can be kept constant by increasing the number of particles superlinearly with the number of observations. We illustrate the applicability of our result by studying in detail two common state space models with non-compact state spaces. |
||||

11:40-12:15 | Thiery, AH (National University of Singapore) |
|||

Speeding-up Pseudo-marginal MCMC using a surrogate model | Sem 1 | |||

The pseudo-marginal MCMC algorithm is a powerful tool for exploring the posterior distribution when only unbiased stochastic estimates of the target density are available. In many situations, although there is no closed-form expression for this density, computationally cheap deterministic estimates are also available; one can then use a delayed-acceptance strategy for exploiting these cheap approximations. As powerful as they are, the use of such algorithms are difficult in practice: it involves tuning the MCMC proposals and choosing the computational budget that one is willing to invest in the creation of the unbiased estimates while taking into account the quality of the cheap deterministic approximations. In this talk we discuss how high-dimensional asymptotic results can help in the tuning of these delayed-acceptance pseudo-marginal MCMC algorithms. This is joint work with Chris Sherlock and Andrew Golightly. |
||||

12:30-13:30 | Lunch at Wolfson Court | |||

13:45-14:20 | Pitt, M (University of Warwick) |
|||

Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator | Sem 1 | |||

When an unbiased estimator of the likelihood is used within an Metropolis-Hastings scheme, it is necessary to tradeoff the number of samples used to evaluate the likelihood against the computing time. Many samples will result in a scheme which has similar properties to the case where the likelihood is exactly known but will be expensive. Few samples will result in faster estimation but at the expense of slower mixing of the Markov chain. We explore the relationship between the number of samples and the efficiency of the resulting Metropolis-Hastings estimates. Under the assumption that the distribution of the additive noise introduced by the log-likelihood estimator is independent of the point at which this log-likelihood is evaluated and other relatively mild assumptions, we provide guidelines on the number of samples to select for a general Metropolis-Hastings proposal. We illustrate on a complex stochastic volatility model that these assumptions are approximately satisfied experimentally and that the theoretical insights with regards to inefficiency and computational time hold true. Keywords: Bayesian inference; Estimated likelihood; Metropolis-Hastings; Particle filtering. |
||||

14:20-14:55 | Reich, S (Universität Potsdam) |
|||

Particle filters for infinite-dimensional systems: combining localization and optimal transportation | Sem 1 | |||

Co-author: Yuan Cheng (University of Potsdam) Particle filters or sequential Monte Carlo methods are powerful tools for adjusting model state to data. However they suffer from the curse of dimensionality and have not yet found wide-spread application in the context of spatio-temporal evolution models. On the other hand, the ensemble Kalman filter with its simple Gaussian approximation has successfully been applied to such models using the concept of localization. Localization allows one to account for a spatial decay of correlation in a filter algorithm. In my talk, I will propose novel particle filter implementations which are suitable for localization and, as the ensemble Kalman filter, fit into the broad class of linear transform filters. In case of a particle filter this transformation will be determined by ideas from optimal transportation while in case of the ensemble Kalman filter one essentially relies on the linear Kalman update formulas. This common framework also allows for a mixture of particle and ensemble Kalman filters. Numerical results will be provided for the Lorenz-96 model which is a crude model for nonlinear advection. |
||||

14:55-15:30 | Stuart, A (University of Warwick) |
|||

The Filtering Distribution For Partially Observed Chaotic Dynamical Systems | Sem 1 | |||

Co-author: Daniel Sanz (University of Warwick) Many physical systems can be successfully modelled by a deterministic dynamical system for which, however, the initial conditions may contain uncertainty. In the presence of chaos this can lead to undesirable growth of uncertainty over time. However, when noisy observations of the system are present these may be used to compensate for the uncertainty in the initial state. This scenario is naturally modelled by viewing the initial state as given by a probability distribution, and to then condition this probability distribution on the noisy observations, thereby reducing uncertainty. Filtering refers to the situation where the conditional distribution on the system state is updated sequentially, at the time of each observation. In this talk we investigate the asymptotic behaviour of this filtering distribution for large time. We focus on a class of dissipative systems that includes the Lorenz '63 and '96 models, and the Navier-Stokes equations on a 2D torus. We first st udy the behaviour of a variant on the 3DVAR filter, creating a unified analysis which subsumes the existing work in [1,2] which, itself, builds on [3]. The optimality property of the true filtering distribution is then used, when combined with this modified 3DVAR analysis, to provide general conditions on the observation of our wide class of chaotic dissipative systems which ensure that the filtering distributions concentrate around the true state of the underlying system in the long-time asymptotic regime. [1] C.E.A. Brett, K.F. Lam, K.J.H. Law, D.S. McCormick, M.R. Scott and A.M. Stuart, ``Accuracy and stability of filters for dissipative PDEs.'' Physica D 245(2013). [2] K.J.H. Law, A. Shukla and A.M. Stuart, ``Analysis of the 3DVAR Filter for the Partially Observed Lorenz '63 Model.'' Discrete and Continuous Dynamical Systems A, 34(2014). [3] K. Hayden, E. Olsen and E.S. Titi, ``Discrete data assimilation in the Lorenz and 2D Navier-Stokes equations.'' Physica D 240(2011). |
||||

15:30-15:50 | Afternoon Tea | |||

15:50-16:25 | Lee, A (University of Warwick) |
|||

Locally adaptive Monte Carlo methods | Sem 1 | |||

Co-authors: Christophe Andrieu (University of Bristol), Arnaud Doucet (University of Oxford) In various situations of interest, natural implementations of Monte Carlo algorithms such as Markov chain Monte Carlo and sequential Monte Carlo can perform poorly due to uneven performance in different parts of the space in which they operate. For example, in Markov chain Monte Carlo a Markov kernel may behave increasingly poorly in the tails of the target distribution of interest and in sequential Monte Carlo the quality of associated estimates may plummet if too few particles are used at a particular time. We overview a particular strategy, local adaptation, that seeks to overcome some of these phenomena in practice. |
||||

16:25-17:00 | Whiteley, NP (University of Bristol) |
|||

Particle filtering subject to interaction constraints | Sem 1 | |||

Co-authors: Kari Heine (Bristol), Taylan Cemgil (Bogazici), Anthony Lee (Warwick) The potential benefits of parallel or distributed implementation motivates study of the interaction structure of particle filters. Can we do away with resampling, or at least re-structure it in such a way as to be more naturally suited to non-serial implementation? What role does resampling really play in endowing these algorithms with attractive properties? This talk will introduce some new algorithms in this context and discuss their theoretical properties. Related Links: http://arxiv.org/abs/1309.2918 |
||||

17:00-18:00 | Welcome Wine Reception |

Wednesday 23 April | ||||

09:15-10:15 | Marin, J-M (Université Montpellier 2) |
|||

ABC methods for Bayesian model choice | Sem 1 | |||

Approximate Bayesian computation (ABC), also known as likelihood-free methods, have become a standard tool for the analysis of complex models, primarily in population genetics. The development of new ABC methodology is undergoing a rapid increase in the past years, as shown by multiple publications, conferences and even software. While one valid interpretation of ABC based estimation is connected with nonparametrics, the setting is quite different for model choice issues. We examined in Grelaud et al. (2009, Bayesian Analysis) the use of ABC for Bayesian model choice in the specific of Gaussian random fields (GRF), relying on a sufficient property only enjoyed by GRFs to show that the approach was legitimate. Despite having previously suggested the use of ABC for model choice in a wider range of models in the DIYABC software (Cornuet et al., 2008, Bioinformatics), we present in Robert et al. (2011, PNAS) evidence that the general use of ABC for model choice can be a real problem. Finally, in Marin et al. (2014, JRSS B), we derive necessary and sufficient conditions on summary statistics for the corresponding Bayes factor to be convergent, namely to asymptotically select the true model. In this talk, we will present these different results. Marin, Pillai, Robert and Rousseau (2014) Relevant statistics for Bayesian model choice, to appear in the Journal of the Royal Statistical Society, Series B Robert, Cornuet, Marin and Pillai (2011) Lack of confidence in approximate Bayesian computation model choice, Proceedings of the National Academy of Science, 108(37), 15112-15117 Grelaud, Robert, Marin, Rodolphe and Taly (2009) ABC likelihood-free methods for model choice in Gibbs random fields, Bayesian Analysis, 4(2), 317-336 Cornuet, Santos, Beaumont, Robert, Marin, Balding, Guillemaud and Estoup (2008) Inferring population history with DIY ABC: a user-friendly approach Approximate Bayesian Computation, Bioinformatics, 24(23), 2713-2719 |
||||

10:15-10:30 | Morning Coffee | |||

10:30-11:05 | Kohn, R (University of New South Wales) |
|||

Speeding up MCMC by Efficient Data Subsampling | Sem 1 | |||

Co-authors: Chris carter (University of New South wales ), Eduardo Mendes (University of New South wales ) The computing time for Markov Chain Monte Carlo (MCMC) algorithms can be prohibitively large for datasets with many observations, especially when the data density for each observation is costly to evaluate. We propose a framework based on a Pseudo-marginal MCMC where the likelihood function is unbiasedly estimated from a random subset of the data, resulting in substantially fewer density evaluations. The subsets are selected using efficient sampling schemes, such as Probability Proportional-to-Size (PPS) sampling where the inclusion probability of an observation is proportional to an approximation of its contribution to the likelihood function. We illustrate the method on a large dataset of Swedish firms containing half a million observations. |
||||

11:05-11:40 | Vihola, MS (University of Jyväskylä) |
|||

Establishing some order amongst exact approximation MCMCs | Sem 1 | |||

Co-author: Christophe Andrieu (University of Bristol) Exact approximation Markov chain Monte Carlo (MCMC) algorithms are a general class of algorithms for Bayesian inference in complex models. We discover a general sufficient condition which allows to order two implementations of such algorithms in terms of mean acceptance probability and asymptotic variance. The key condition is convex order between the weight distributions, which emerges naturally when the weight distributins stem from importance sampling approximations with different number of samples. |
||||

11:40-12:15 | Latuszynski, KG (University of Warwick) |
|||

The Bernoulli Factory, extensions and applications | Sem 1 | |||

The celebrated Bernoulli Factory problem can be phrased as following: Given a black box to simulate a p-coin with unknown p, and given known function f, how to design an algorithm for sampling an f(p)-coin? Solving the Bernoulli Factory involves some elegant and enjoyable mathematics and somewhat surprisingly the problem shows up as a major burden in many computational settings, such as executing certain types of MCMC algorithms. I will discuss some of these applications as well as extensions of the Bernoulli Factory to more general random variables. |
||||

12:30-13:30 | Lunch at Wolfson Court | |||

13:45-14:20 | Beskos, A (National University of Singapore) |
|||

Sequential Monte Carlo methods for applications in Data Assimilation | Sem 1 | |||

Sequential Monte Carlo (SMC) methods are nowadays routinely applied in a variety of complex applications: hidden Markov models, dynamical systems, target tracking, control problems, just to name a few. Whereas SMC have been greatly refined in the last decades and are now much better understood, they are still known to suffer from the curse of dimensionality: algorithms can sometimes break down exponentially fast with the dimension of the state space. As a consequence, practitioners in high-dimensional Data Assimilation applications in atmospheric sciences, oceanography and elsewhere will typically use 3D-Var or Kalman-filter-type approximations that will provide biased estimates in the presence of non-linear model dynamics. The talk will concentrate on a class of SMC algorithms and will look at ways to reduce the cost of the algorithms as a function of the dimension of the state space. Explicit asymptotic results will clarify the effect of the dimension at the properties of the algorithm and could provide a platform for algorithmic optimisation in high dimensions. Applications will be shown in the context of Data Assimilation, in a problem where the objective is to target the posterior distribution of the initial condition of the Navier-Stokes equation given a Gaussian and noisy observations at different instances and locations of the spatial field. The dimension of the signal is in theory infinite-dimensional - in practice 64x64 or more depending on the resolution – thus posing great challenges for the development and efficiency of SMC methods. |
||||

14:20-14:55 | Woodard, DB (Cornell University) |
|||

Statistical Methods for Ambulance Fleet Management | Sem 1 | |||

We introduce statistical methods to address two problems arising in the management of ambulance fleets: (1) predicting the distribution of ambulance driving time between arbitrary start and end locations in a road network; and (2) space-time forecasting of ambulance demand. These predictions are critical for deciding how many ambulances should be deployed at a given time and where they should be stationed, which ambulance should be dispatched to an emergency, and whether and how to schedule ambulances for non-urgent patient transfers. We demonstrate the accuracy and operational impact of our methods using ambulance data from Toronto Emergency Medical Services. For driving time prediction the relevant data are Global Positioning System (GPS) recordings from historical lights-and-sirens ambulance trips. Challenges include the typically large size of the road network and dataset (70,000 network links and 160,000 historical trips for Toronto); the lack of trips in the historical data that follow precisely the route of interest; and the strong temporal, weather, and other effects on driving time. We introduce a model of the travel time at the trip level, as well as a computationally efficient procedure for two-stage estimation in that model. For space-time forecasting of demand we develop integer time-series factor models and spatio-temporal mixture models, which capture the complex weekly and daily patterns in demand as well as changes in the spatial demand density over time. Related Links: •http://people.orie.cornell.edu/woodard/publications.html - Links to Woodard's publications on ambulance fleet management |
||||

14:55-15:30 | Sherlock, C (Lancaster University) |
|||

Adaptive delayed-acceptance pseudo-marginal random walk Metropolis | Sem 1 | |||

Co-authors: Andrew Golightly (Newcastle University) and Daniel Henderson (Newcastle University) Delayed-acceptance (DA) pseudo-marginal Metropolis-Hastings (MH) algorithms can be applied when it is computationally expensive to calculate an unbiased estimate of the true posterior, but a computationally cheap approximation is available. A first accept-reject stage is applied, with the cheap approximation substituted for the true posterior in the MH acceptance ratio. Only for those proposals which pass through the first stage is the computationally expensive true posterior (or unbiased estimate thereof) evaluated, with a second accept-reject stage ensuring that detailed balance is satisfied with respect to the intended posterior. A weighted average of all previous unbiased estimates of the true posterior provides a generic approximation to the true posterior. If only the k-nearest neighbours are used in the average then evaluation of the approximate posterior can be made computationally cheap provided that the points at which the posterior has been estimated unbiasedly are stored in a multi-dimensional binary tree, similar to a KD-tree. |
||||

15:30-15:50 | Afternoon Tea | |||

15:50-16:25 | Douc, R (Telecom SudParis) |
|||

Identifiability conditions for partially-observed Markov chains | Sem 1 | |||

Co-authors: François ROUEFF (LTCI, UMR 5141, Telecom Paristech, France), Tepmony SIM (LTCI, UMR 5141, Telecom Paristech, France) This paper deals with a parametrized family of partially-observed bivariate Markov chains. We establish that the limit of the normalized log-likelihood is maximized when the parameter belongs to the equivalence class of the true parameter, which is a key feature for obtaining consistency of the Maximum Likelihood Estimators in well-specified models. A novel aspect of this work is that geometric ergodicity of the Markov chain associated to the complete data, or exponential separation on measures are no more needed provided that the invariant distribution is assumed to be unique, regardless its rate of convergence to the equilibrium. The result is presented in a general framework including both fully dominated or partially dominated models as Hidden Markov models or Observation-Driven times series of counts. |
||||

16:25-17:00 | Teh, YW (University of Oxford) |
|||

Stochastic Gradient Langevin Dynamics for Large Scale Bayesian Inference | Sem 1 | |||

The Bayesian approach to statistical machine learning is a theoretically well-motivated framework to learning from data. It provides a coherent framework to reasoning about uncertainties, and an inbuilt protection against overfitting. However, computations in the framework can be expensive, and most approaches to Bayesian computations do not scale well to the big data setting. In this talk we propose a new computational approach for Bayesian learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. We apply the method to logistic regression and latent Dirichlet allocation, showing state-of-the-art performance. Joint work with Max Welling and Sam Patterson. |
||||

19:30-22:00 | Conference Dinner at Emmanuel College |

Thursday 24 April | ||||

09:15-10:15 | Holmes, C (University of Oxford) |
|||

Robust statistical decisions under model misspecification by re-weighted Monte Carlo samplers | Sem 1 | |||

Large complex data sets typically demand approximate models at some level of specification. In such situations it is important for the analyst to examine the robustness of conclusions to approximate predictions. Recent developments in optimal control and econometrics have established formal methods for linear quadratic state space models (see e.g. Hansen and Sargent 2008) by considering the local-minimax outcome within an information divergence (Kullback-Leibler) neighbourhood around the approximating model. Here we show how these approaches can be extended to arbitrary probability models using Monte Carlo methods. We derive theoretical results establishing the uniqueness of the Kullback-Leibler criteria, as well as Bayesian non-parametric methods to sample from the space of probability distributions within a fixed divergence constraint. |
||||

10:15-10:30 | Morning Coffee | |||

10:30-11:05 | Olsson, JR (KTH - Royal Institute of Technology) |
|||

Particle islands and archipelagos: some large sample theory | Sem 1 | |||

Co-authors: Christelle Vergé, Pierre Del Moral, and Eric Moulines This talk discusses parallelisation of sequential Monte Carlo methods via the particle island framework (Vergé et al., 2013) and presents some novel convergence results for methods of this sort. More specifically, we introduce the concept of weighted archipelagos (i.e. sets of weighted particle islands, where each island is itself a weighted sample of particles) and define three different operations on such archipelagos, namely: selection on the island level, selection on the particle level, and mutation. We then establish that these operations preserve a set of convergence properties, including asymptotic normality, of the archipelago as the number of islands as well as the number of particles of each island tend jointly to infinity. Moreover, we provide recursive formulas for the asymptotic variance associated with each operation. As our results allow arbitrary compositions of the mentioned operations to be analysed, we may use the same for establishing the convergence properties of not only the double bootstrap algorithm but also generalisations of this algorithm. |
||||

11:05-11:40 | Golightly, A (Newcastle University) |
|||

Bayesian inference for sparsely observed diffusions | Sem 1 | |||

Co-authors: Chris Sherlock (Lancaster University) We consider Bayesian inference for parameters governing nonlinear multivariate diffusion processes using data that may be incomplete, subject to measurement error and observed sparsely in time. We adopt a high frequency imputation approach to inference, by introducing additional time points between observations and working with the Euler-Maruyama approximation over the induced discretisation. We assume that interest lies in the marginal parameter posterior and sample this target via particle MCMC. A vanilla implementation based on a bootstrap filter is eschewed in favour of an auxiliary particle filter where the latent path is extended by sampling a discretisation of a conditioned diffusion. This conditioned diffusion should be carefully constructed to allow for nonlinear dynamics between observations. |
||||

11:40-12:15 | Rebeschini, P (Princeton University) |
|||

Particle filters and curse of dimensionality | Sem 1 | |||

Co-author: Ramon van Handel (Princeton University) A problem that arises in many applications is to compute the conditional distributions of stochastic models given observed data. While exact computations are rarely possible, particle filtering algorithms have proved to be very useful for approximating such conditional distributions. Unfortunately, the approximation error of particle filters grows exponentially with dimension, a phenomenon known as curse of dimensionality. This fact has rendered particle filters of limited use in complex data assimilation problems that arise, for example, in weather forecasting. In this talk I will argue that it is possible to develop “local” particle filtering algorithms whose approximation error is dimension-free. By exploiting conditional decay of correlations properties of high-dimensional models, we prove for the simplest possible algorithm of this type an error bound that is uniform both in time and in the model dimension. (Joint work with R. van Handel) Related Links: http://arxiv.org/abs/1301.6585 - Preprint |
||||

12:30-13:30 | Lunch at Wolfson Court | |||

13:45-14:20 | Mengersen, K (Queensland University of Technology) |
|||

Pre-Processing for Approximate Bayesian Computation in Image Analysis | Sem 1 | |||

Co-authors: Matthew Moores (QUT, Australia), Christian Robert (U. Paris Dauphine, France) Existing algorithms for approximate Bayesian computation (ABC) assume that it is feasible to simulate pseudo-data from the model at each iteration. However, the computational cost of these simulations can be prohibitive for high dimensional data. An important example is the Potts model, which is commonly used in image analysis. The dimension of the state vector in this model is equal to the size of the data, which can be millions of pixels. We introduce a preparatory computation step before model fitting to improve the scalability of ABC. The output of this precomputation can be reused across multiple datasets. We illustrate this method by estimating the smoothing parameter for satellite images, demonstrating that the pre-computation step can sufficiently reduce the average runtime required for model fitting to enable analysis in realistic, if not yet real, time. |
||||

14:20-14:55 | Prangle, D (University of Reading) |
|||

Lazy ABC | Sem 1 | |||

In approximate Bayesian computation (ABC) algorithms, parameter proposals are accepted if corresponding simulated datasets are sufficiently close to the observations. Producing the large quantity of model simulations needed requires considerable computer time. However, it is often clear early on in a simulation that it is unlikely to produce a close match. This talk is on an ABC algorithm which saves time by abandoning such simulations early. A probabilistic stopping rule is used which leaves the target distribution unchanged from that of standard ABC. Applications of this idea beyond ABC are also discussed. |
||||

14:55-15:30 | Schmidler, S (Duke University) |
|||

Parallel Markov Chain Monte Carlo | Sem 1 | |||

Co-author: Doug VanDerwerken (Duke University) Markov chain Monte Carlo is an inherently serial algorithm. Although the likelihood calculations for individual steps can sometimes be parallelized, the serial evolution of the process is widely viewed as incompatible with parallization, offering no speedup for sampling algorithms which require large numbers of iterations to converge to equilibrium. We provide a methodology for parallelizing Markov chain Monte Carlo across large numbers of independent, asynchronous processors. The method is originally motivated by sampling multimodal target distributions, where we see an exponential speed-up in running time. However we show that the approach is general purpose and applicable to all Markov chain Monte Carlo simulations, and demonstrate speed-ups proportional to the number of available processors on slowly mixing chains with unimodal target distributions. The approach is simple and easy to implement, and suggests additional directions for further research. |
||||

15:30-15:50 | Afternoon Tea | |||

15:50-16:25 | Vollmer, S (University of Oxford) |
|||

Consistency and CLTs for stochastic gradient Langevin dynamics based on subsampled data | Sem 1 | |||

Co-authors: Alexandre Thiery (National University of Singapore), Yee-Whye Teh (University of Oxford) Applying MCMC to large data sets is expensive. Both calculating the acceptance probability and creating informed proposals depending on the likelihood require an iteration through the whole data set. The recently proposed Stochastic Gradient Langevin Dynamics (SGLD) circumvents this problem by generating proposals based on only a subset of the data and skipping the accept-reject step. In order to heuristically justify the latter, the step size converges to zero in a non-summable way. Under appropriate Lyapunov conditions, we provide a rigorous foundation for this algorithm by showing consistency of the weighted sample average and proving a CLT for it. Surprisingly, the fraction of the data subset selection does not have an influence on the asymptotic variance. |
||||

16:25-17:00 | Papaspiliopoulos, O (Institució Catalana de Recerca i Estudis Avançats (ICREA)) |
|||

Optimal filtering and the dual process | Sem 1 | |||

Co-author: Matteo Ruggiero (Turin) We link optimal filtering for hiddenMarkov models to the notion of duality forMarkov processes.We show that when the signal is dual to a process that has two components, one deterministic and one a pure death process, and with respect to functions that define changes of measure conjugate to the emission density, the filtering distributions evolve in the family of finite mixtures of such measures and the filter can be computed at a cost that is polynomial in the number of observations. Special cases of our framework include the Kalman filter, and computable filters for the Cox–Ingersoll–Ross process and the one-dimensional Wright– Fisher process, which have been investigated before in the literature. The dual we obtain for the Cox– Ingersoll–Ross process appears to be new in the literature. Related Links: http://www.isi-web.org/images/bernoulli/BEJ1305-022.pdf - article |

Friday 25 April | ||||

09:50-10:25 | Stumpf, M (Imperial College London) |
|||

Approximate Bayesian Inference for Stochastic Processes | Sem 1 | |||

Co-authors: Paul Kirk (Imperial College London), Angelique Ale (Imperial College London), Ann Babtie (Imperial College London), Sarah Filippi (Imperial College London), Eszter Lakatos (Imperial College London), Daniel Silk (Imperial College London), Thomas Thorn (University of Edinburgh) We consider approximate Bayesian computation (ABC) approaches to model the dynamics and evolution of molecular networks. Initially conceived to cope with problems with intractable likelihoods, ABC has gained popularity over the past decade. But there are still considerable problems in applying ABC to real-world problems, some of which are shared with exact Bayesian inference, but some are due to the nature of ABC. Here we will present some recent advances that allow us to apply ABC sequential Monte Carlo (SMC) to real biological problems. The rate of convergence of ABC-SMC depends crucially on the schedule of thresholds, ?t, t=1,2,…,T, and the perturbation kernels used to generate proposals from the previous population of parameters. We show how both of these can be tuned individually, and jointly. Careful calibration of the ABC-SMC approach can result in a 10-fold reduction in the computational burden (or more). I will also provide an overview of an alternative approach where, rather than approximating the likelihood in an ABC framework, we provide approximations to the master equation that describes the evolution of the stochastic system, that go beyond the conventional linear noise approximation (LNA). This allows us to tackle systems with ``interesting dynamics", that are typically beyond the scope of the LNA, and we will show how to use such approaches in exact Bayesian inference procedures (including nested sampling and SMC). |
||||

10:25-10:40 | Morning Coffee | |||

10:40-11:15 | Schön, TB (Uppsala Universitet) |
|||

Sequential Monte Carlo methods for graphical models | Sem 1 | |||

Co-authors: Christian A. Naesseth (Linkoping University), Fredrik Lindsten (University of Cambridge) We develop a sequential Monte Carlo (SMC) algorithm for inference in general probabilistic graphical model. Via a sequential decomposition of the PGM we find a sequence of auxiliary distributions defined on a monotonically increasing sequence of probability spaces. By targeting these auxiliary distributions using purpose built SMC samplers we are able to approximate the full joint distribution defined by the graphical model. Our SMC sampler also provides an unbiased estimate of the partition function (normalization constant) and we show how it can be used within a particle Markov chain Monte Carlo framework. This allows for better approximations of the marginals and for unknown parameters to be estimated. The proposed inference algorithms can deal with an arbitrary graph structure and the domain of the random variables in the graph can be discrete or continuous. Related Links: http://arxiv.org/pdf/1402.0330v1.pdf - Associated paper http://user.it.uu.se/~thosc112/index.html - Speaker (Thomas Schön) home page |
||||

11:15-11:50 | Murray, L (CSIRO) |
|||

Sequential Monte Carlo with Highly Informative Observations | Sem 1 | |||

Co-author: Pierre Del Moral (University of New South Wales) We introduce a sequential Monte Carlo (SMC) method for sampling the state of continuous-time state-space models when observations are highly informative, a situation in which standard SMC methods can perform poorly. The most extreme case is where the observations are exact---of the state itself---and the problem is that of simulating diffusion bridges between given starting and ending states. The basic idea is to introduce a sequence of intermediate weighting and resampling steps between observation times, guiding particles towards the ending state. A few designs that have been useful in practice are given, and demonstrated on some applied problems that feature complex models. |
||||

11:50-12:25 | Li, K (Uppsala University) |
|||

Generalised Particle Filters with Gaussian Mixtures | Sem 1 | |||

Stochastic filtering is defined as the estimation of a partially observed dynamical system. A massive scientific and computational effort has been dedicated to the development of numerical methods for approximating the solution of the filtering problem. Approximating with Gaussian mixtures has been very popular since the 1970s, however the existing work is only based on the success of the numerical implementation and is not theoretically justified. We fill this gap and conduct a rigorous analysis of a new Gaussian mixture approximation to the solution of the filtering problem. In particular, we construct the corresponding approximating algorithm, deduce the L2-convergence rate and prove a central limit type theorem for the approximating system. In addition, we show a numerical example to illustrate some features of this algorithm. This is joint work with Dan Crisan (Imperial College London). References: [1] D. Crisan, K. Li, “A central limit type theorem for Gaussian mixture approximations to the nonlinear filtering problem”, ArXiv1401:6592, (2014). [2] D. Crisan, K. Li, “Generalised particle filters with Gaussian mixtures”, accepted by Stochastic Processes and their Applications, ArXiv1306:0255, (2013). [3] D. Crisan, K. Li, “Generalised particle filters with Gaussian measures”, Proceedings of 19th European Signal Processing Conference, Barcelona, Spain, pp. 659-663, (2011). |
||||

12:30-13:30 | Lunch at Wolfson Court | |||

14:00-14:45 | Sharia, T (Royal Holloway, University of London) |
|||

Truncated stochastic approximation with moving bounds | Sem 1 | |||

A wide class of truncated stochastic approximation procedures with moving random bounds will be discussed. This class of procedures has three main characteristics: truncations with random moving bounds, a matrix-valued random step-size sequence, and a dynamically changing random regression function. While we believe that the proposed class of procedures will find its way to a wider range of applications, the main motivation is to accommodate applications to parametric statistical estimation theory. The proposed method allows for incorporation of auxiliary information into the estimation process, and is consistent and asymptotically efficient under certain regularity conditions. Related Links: •http://arxiv.org/pdf/1101.0031v4.pdf - Link to the paper in ArXiv |
||||

14:45-15:15 | Tadic, V (University of Bristol) |
|||

Asymptotic Properties of Recursive Maximum Likelihood Estimation in State-Space Models | Sem 1 | |||

Co-author: Arnaud Doucet (University of Oxford) Recursive maximum likelihood algorithm for state-space models (i.e., for continuous state hidden Markov models) is an iterative estimation method based on particle filter and stochastic gradient search. In this talk, resent results on its asymptotic properties are presented. These results are focused on the asymptotic bias and the asymptotic variance. They also involve diffusion approximation, almost-sure and mean-square convergence of the recursive maximum likelihood algorithm. Some auxiliary (yet, rather interesting) results on the asymptotic properties of the particle filter and the log-likelihood are presented in the talk, too. |