skip to content

Timetable (SINW01)

Scalable statistical inference

Monday 3rd July 2017 to Friday 7th July 2017

Monday 3rd July 2017
11:45 to 12:30 Registration
12:30 to 13:20 Lunch @ Wolfson Court
13:20 to 13:30 Welcome from Christie Marr (INI Deputy Director)
13:30 to 14:15 Sylvia Richardson
Data compression with statistical guarantees
Joint talk with Daniel Ahfock (MRC Biostatistics Unit @ University of Cambridge)

The talk is concerned with translating recent ideas from computer science on probabilistic data-compression techniques into a statistical framework that can be ‘safely’ applied for speeding linear regression analyses for very larges sample sizes in bio-medicine.

 Our motivation is to facilitate the use of multivariate regression and model exploration in tall data sets, so that, for example, genetic association analyses carried out on hundreds of thousands of subjects can investigate multivariate effects for a set of explanatory features, rather than be restricted to one feature at a time associations for computational feasibility.

Among the many approaches to dealing with tall data, probabilistic data compression techniques using random linear mapping, developed in the computer science community, so called sketching, are particularly suitable for linear regression problems. In the first part of the talk, we will present a hierarchical representation of sketching, which allows deriving statistical properties (distributional) of different sketching algorithms. In particular, we will discuss how the signal to noise ratio in the original data set is important for the choice of sketching algorithm. In the second part of the talk, we will further refine some of the approximation guarantees and consider iterative sketches. The talk will be illustrated on a genetic analysis of the link between a blood cell trait and the HLA region involving a sample of 130,000 people.

14:15 to 15:00 Alexandre Bouchard
Monte Carlo without rejection
Co-authors: Arnaud Doucet (Oxford), Sebastian Vollmer (Warwick), George Deligiannidis (King's College London), Paul Vanetti (Oxford)

Markov chain Monte Carlo methods have become standard tools to sample from complex high-dimensional probability measures. Many available techniques rely on discrete-time reversible Markov chains whose transition kernels built up over the Metropolis-Hastings algorithm. In our recent work, we investigate an alternative approach, the Bouncy Particle Sampler (BPS) where the target distribution of interest is explored using a continuous-time, non reversible Markov process. In this alternative approach, a particle moves along straight lines continuously around the space and, when facing a high energy barrier, it is not rejected but its path is modified by bouncing against this barrier. The resulting non-reversible Markov process provides a rejection-free Markov chain Monte Carlo sampling scheme. This method, inspired from recent work in the molecular simulation literature, is shown to be a valid, efficient sampling scheme applicable to a wide range of Bayesian problems. We present several additional original methodological extensions and establish various theoretical properties of these procedures. We demonstrate experimentally the efficiency of these algorithms on a variety of Bayesian inference problems.
15:00 to 15:30 Afternoon Tea
15:30 to 16:15 Helen Ogden
Inference with approximate likelihoods
In cases where it is infeasible to compute the likelihood exactly, an alternative is to find some numerical approximation to the likelihood, then to use this approximate likelihood in place of the exact likelihood to do inference about the model parameters. This is a fairly commonly used approach, and I will give several examples of approximate likelihoods which have been used in this way. But is this a valid approach to inference? I will give conditions under which inference with an approximate likelihood shares some of the same asymptotic properties as inference with the exact likelihood, and describe the implications in some examples. I will finish with some ideas about how to construct scalable likelihood approximations which give statistically valid inference.
16:15 to 17:00 Havard Rue
Scalable statistical inference with INLA
INLA do approximate Bayesian inference for the class of latent Gaussian models. It has shown sucessful allowing statisticians and applied scientists to fast and reliable Bayesian inference for a huge class of additve models, within reasonable time. Especially, the use of spatial Gaussian models using the SPDE approach has been particularly popular. Although most models runs within reasonable time, we are facing with the current implementation, limitations for really huge models like large space time models. In this talk I will discuss the current situation and possible strategies to improve the situation.
17:00 to 18:00 Welcome Wine Reception at INI
Tuesday 4th July 2017
09:00 to 09:45 Michael Jordan
Variational, Hamiltonian and Symplectic Perspectives on Acceleration
Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings.  While many generalizations and extensions of Nesterov's original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration
concept. We study accelerated methods from a continuous-time perspective.  We show that there is a Lagrangian functional that we call the "Bregman Lagrangian" which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated
gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods.  We show that the continuous-time limit of all of these methods correspond to traveling the same curve in spacetime at different speeds.  We also describe a "Bregman Hamiltonian" which generates the accelerated dynamics, we develop a symplectic integrator for this Hamiltonian and we discuss relations between this symplectic integrator and classical Nesterov acceleration.  [Joint work with Andre Wibisono, Ashia Wilson and Michael Betancourt.]
09:45 to 10:30 Anthony Lee
Unbiased approximations of products of expectations
I will describe recent work with Simone Tiberi (Zurich) and Giacomo Zanella (Bocconi), on the unbiased approximation of a product of n expectations. Such products arise, e.g., as values of the likelihood function in latent variable models, and unbiased approximations can be used in a pseudo-marginal Markov chain to facilitate inference. A straightforward, standard approach consists of approximating each term using an independent average of M i.i.d. random variables and taking the product of these approximations. While simple, this typically requires M to be O(n) so that the total number of random variables required is N = Mn = O(n^2) in order to control the relative variance of the approximation. Using all N random variables to approximate each expectation is less wasteful when producing them is costly, but produces a biased approximation. We propose an alternative to these two approximations that uses most of the N samples to approximate each expectation in such a way that the estimate of the product of expectations is unbiased. We analyze the variance of this approximation and show that it can result in N = O(n) being sufficient for the relative variance to be controlled as n increases. In situations where the cost of simulations dominates overall computational time, and fixing the relative variance, the proposed approximation is almost n times faster than the standard approach to compute.
10:30 to 11:00 Morning Coffee
11:00 to 11:45 Steven Scott
Comparing Consensus Monte Carlo Strategies for Distributed Bayesian Computation
Consensus Monte Carlo is an algorithm for conducting Monte Carlo based Bayesian inference on large data sets distributed across many worker machines in a data center. The algorithm operates by running a separate Monte Carlo algorithm on each worker machine, which only sees a portion of the full data set. The worker-level posterior samples are then combined to form a Monte Carlo approximation to the full posterior distribution based on the complete data set. We compare several methods of carrying out the combination, including a new method based on approximating worker-level simulations using a mixture of multivariate Gaussian distributions. We find that resampling and kernel density based methods break down after 10 or sometimes fewer dimensions, while the new mixture-based approach works well, but the necessary mixture models take too long to fit.
11:45 to 12:30 Jun Liu
12:30 to 13:30 Lunch @ Wolfson Court
13:30 to 14:15 Tamara Broderick
Coresets for scalable Bayesian logistic regression
Co-authors: Jonathan H. Huggins (MIT), Trevor Campbell (MIT)

The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. However, standard Bayesian inference algorithms are computationally expensive, so their direct application to large datasets can be difficult or infeasible. Rather than modify existing algorithms, we instead leverage the insight that data is often redundant via a pre-processing step. In particular, we construct a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We then input this small coreset to existing posterior inference algorithms without modification. To demonstrate the feasibility of this approach, we develop an efficient coreset construction algorithm for Bayesian logistic regression models. We provide theoretical guarantees on the size and approximation quality of the coreset -- both for fixed, known datasets, and in expectation for a wide class o f data generative models. Our approach permits efficient construction of the coreset in both streaming and parallel settings, with minimal additional effort. We demonstrate the efficacy of our approach on a number of synthetic and real-world datasets, and find that, in practice, the size of the coreset is independent of the original dataset size.
14:15 to 15:00 Cristiano Varin
Designing efficient composite likelihoods
Composite likelihood is an inference function constructed by compounding component likelihoods based on low dimensional marginal or conditional distributions. Since the components are multiplied as if they were independent, the composite likelihood inherits the properties of likelihood inference from a misspecified model. The virtue of composite likelihood inference is “combining the advantages of likelihood with computational feasibility” (Reid, 2013). Given the wide applicability, composite likelihoods are attracting interest as scalable surrogate for intractable likelihoods. Despite the promise, application of composite likelihood is still limited by some theoretical and computational issues which have received only partial or initial responses. Open theoretical questions concern characterization of general model conditions assuring validity of composite likelihood inference, optimal selection of component likelihoods and precise evaluation of estimation uncertainty. Computational issues concern how to design composite likelihood methods to balance statistical efficiency and computational efficiency.

In this talk, after a critical review of composite likelihood theory, I shall focus on the potential merits of composite likelihood inference in modeling temporal and spatial variation of disease incidence. The talk is based on past work with Nancy Reid (Toronto) and David Firth (Warwick), and various new projects with Manuela Cattelan (Padova), Xanthi Pedeli (Venice) and Guido Masarotto (Padova).  
15:00 to 15:30 Afternoon Tea
15:30 to 16:15 Kerrie Mengersen
Transferability: as easy as ABC?
An intense area of interest in ecology is transferability: how to apply a model developed for one region to another region. This type of extrapolation is common in many areas yet is still underdeveloped statistically, particularly when models are complex or intractable. In this presentation I would like to discuss and ask advice about aspects of this problem, namely taking an experimental design approach to the problem and setting vaguely informative priors. Two questions will also be asked. 
16:15 to 17:00 Darren Wilkinson
Category theory and functional programming for scalable statistical modelling and computational inference
This talk considers both the theoretical and computational requirements for scalable statistical modelling and computation. It will be argued that programming languages typically used for statistical computing do not naturally scale, and that functional programming languages by contrast are ideally suited to the development of scalable statistical algorithms. The mathematical subject of category theory provides the necessary theoretical underpinnings for rigorous analysis and reasoning about functional algorithms, their correctness, and their scalability. Used in conjunction with other tools from theoretical computer science, such as recursion schemes, these approaches narrow the gap between statistical theory and computational implementation, providing numerous benefits, not least automatic parallelisation and distribution of algorithms.
Wednesday 5th July 2017
09:00 to 09:45 Paul Fearnhead
Asymptotics of Approximate Bayesian Computation
Many statistical applications involve models for which are easy to sample from, but for which it is difficult to evaluate the likelihood. Approximate Bayesian computation is a likelihood-free method for implementing Bayesian inference in such cases. This talk will overview some recent results on the theoretical properties of approximate Bayesian computation which consider the performance of ABC as we get more data. It will cover questions such as: when does the ABC posterior concentrate on the true parameter value? What distribution does the ABC posterior converge to? And what is the frequentist distribution of point-estimates derived using ABC. It will also cover the impact of Monte Carlo error on estimates obtained using ABC, and consider whether, asympotically, it is possible to efficiently estimate parameters using ABC if we have a fixed Monte Carlo sample size.

This is joint work with Wentao Li: and; the talk will also cover work by David Frazier, Martin, Robert and Rousseau:
09:45 to 10:30 Wentao Li (Lancaster University); (Newcastle University)
Validating approximate Bayesian computation on posterior convergence
Co-author: Paul Fearnhead (Lancaster University)

Many statistical applications involve models for which it is difficult to evaluate the likelihood, but relatively easy to sample from. Approximate Bayesian computation is a likelihood-free method for implementing Bayesian inference in such cases. We present a number of surprisingly strong asymptotic results for the regression-adjusted version of approximate Bayesian Computation introduced by Beaumont et al. (2002). We show that for an appropriate choice of the bandwidth in approximate Bayesian computation, using regression-adjustment will lead to a posterior that, asymptotically, correctly quantifies uncertainty. Furthermore, for such a choice of bandwidth we can implement an importance sampling algorithm to sample from the posterior whose acceptance probability tends to 1 as we increase the data sample size. This compares favourably to results for standard approximate Bayesian computation, where the only way to obtain its posterior that correctly quantifies uncertainty is to choose a much smaller bandwidth, for which the acceptance probability tends to 0 and hence for which Monte Carlo error will dominate.

Related Links
10:30 to 11:00 Morning Coffee
11:00 to 11:45 Sam Livingstone
Kinetic energy choice in Hamiltonian/hybrid Monte Carlo
We consider how different choices of kinetic energy in Hamiltonian Monte Carlo affect algorithm performance. To this end, we introduce two quantities which can be easily evaluated, the composite gradient and the implicit noise. Results are established on integrator stability and geometric convergence, and we show that choices of kinetic energy that result in heavy-tailed momentum distributions can exhibit an undesirable negligible moves property, which we define. A general efficiency-robustness trade off is outlined, and implementations which rely on approximate gradients are also discussed. Two numerical studies illustrate our theoretical findings, showing that the standard choice which results in a Gaussian momentum distribution is not always optimal in terms of either robustness or efficiency.
11:45 to 12:30 Christophe Andrieu
On the ordering of some nonreversible Markov chain and Markov process Monte Carlo methods
We will show how one can derive simple criteria to order the asymptotic variance of a broad class of Monte Carlo algorithms based on Markov chains or processes. Interestingly these results mirror existing results for algorithms based on reversible processes.

Joint work with Sam Livingstone.
12:30 to 13:30 Lunch @ Wolfson Court
13:30 to 14:15 Murray Pollock
Exact Bayesian Inference for Big Data: Single- and Multi-Core Approaches
Co-authors: Hongsheng Dai (Essex), Paul Fearnhead (Lancaster), Adam Johansen (Warwick), Divakar Kumar (Warwick), Gareth Roberts (Warwick)

This talk will introduce novel methodologies for exploring posterior distributions by modifying methodology for exactly (without error) simulating diffusion sample paths. The methodologies discussed have found particular applicability to "Big Data" problems. We begin by presenting the Scalable Langevin Exact Algorithm (ScaLE) and recent methodological extensions (including Re-ScaLE, which avoids the need for particle approximation in ScaLE), which has remarkably good scalability properties as the size of the data set increases (it has sub-linear cost, and potentially no cost as a function of data size). ScaLE has particular applicability in the “single-core” big data setting - in which inference is conducted on a single computer. In the second half of the talk we will present methodology to exactly recombine inferences on separate data sets computed on separate cores - an exact version of “divide and conquer". As such this approach has particu lar applicability in the “multi-core” big data setting. We conclude by commenting on future work on the confluence of these approaches. Joint work with Hongsheng Dai, Paul Fearnhead, Adam Johansen, Divakar Kumar, Gareth Roberts.
14:15 to 15:00 Gareth Roberts (University of Warwick)
Optimisation and complexity for Gibbs samplers for hierarchical and crossed-effect models
We study the convergence properties of the Gibbs Sampler in the context of Gaussian hierarchical and crossed-effect models. We develop a novel methodology based on multi-grid decompositions to derive analytic expressions for the convergence rates of the algorithm, extending significantly the class of conditionally Gaussian models amenable to direct analysis. In the hierarchical context, our work gives a rather complete understanding of the Gibbs Sampler behaviour for symmetric models (with arbitrary depth), while providing approximations and bounds for the non-symmetric cases. The theoretical results give rise to simple and easy-to-implement guidelines to optimise practical implementations of the Gibbs samplers on such models. While the good performances of the Gibbs Sampler in hierarchically-structured models is renowned, the context of crossed-effect models is drastically different. Here hierarchical centering is not possible and the convergence of commonly implemented Gibbs Sampler strategies deteriorates as the data-size increases, resulting in super-linear computational complexity (potentially even quadratic) in the number of data-points. We show how to leverage the negatively-correlated structure of crossed-effect models to design easy-to-implement collapsed Gibbs Samplers whose complexity matches the one of hierarchical scenarios. 

This is joint work with Giacomo Zanella and Omiros Papaspiliopoulos.  
15:00 to 15:30 Afternoon Tea
15:30 to 16:15 Chris Holmes
Fast Bayesian Boolean Matrix Factorisation
Boolean matrix factorisation decomposes a binary data matrix into an approximating Boolean product of two low rank, binary matrices: one containing meaningful patterns (signatures), the other quantifying how the observations can be expressed as a logical combination of these patterns.  

We introduce a probabilistic model for Boolean matrix factorisation, termed the “OrMachine”, and derive a Metropolised Gibbs sampler that facilitates efficient parallel posterior inference on commodity hardware. On real world and simulated data, our Bayesian method provides state of the art performance for Boolean matrix factorisation and matrix completion. The method supports full posterior inference, which is important in applications, for example in controlling false positive rates in collaborative filtering and, crucially, improves the interpretability of the inferred patterns. The proposed model and computation scale to large datasets as motivated by an analysis of single cell gene expression data recording measurements from 1.3 million mouse brain cells across 11 thousand genes. 
16:15 to 17:00 Louis Aslett
Towards Encrypted Inference for Arbitrary Models
There has been substantial progress in development of statistical methods which are amenable to computation with modern cryptographic techniques, such as homomorphic encryption.  This has enabled fitting and/or prediction of models in areas from classification and regression through to genome wide association studies.  However, these are techniques devised to address specific models in specific settings, with the broader challenge of an approach to inference for arbitrary models and arbitrary data sets receiving less attention.  This talk will discuss very recent results from ongoing work towards an approach which may allow theoretically arbitrary low dimensional models to be fitted fully encrypted, keeping the model and prior secret from data owners and vice-versa.  The methodology will be illustrated with a variety of examples, together with a discussion of the ongoing direction of the work.

17:00 to 18:00 Board Meeting INI 1
19:30 to 22:00 Formal Dinner at Trinity College
Thursday 6th July 2017
09:00 to 09:45 Helen Zhang
Hierarchy-preserving regularization solution paths for identifying interactions in high dimensional data
Co-authors: Ning Hao (University of Arizona), Yang Feng (Columbia University)

Interaction screening for high-dimensional settings has recently drawn much attention in the literature. A variety of interaction screening approaches have been proposed for regression and classification problems. However, most of existing regularization methods for interaction selections are limited to low or moderate dimensional data analysis, due to their complex programing with inequality constraints and demanded prohibitive storage and computational cost when handling high dimensional data. This talk will present our recent work on scalable regularization methods to interaction selection under hierarchical constraints for high dimensional regression and classification. We first consider two-stage LASSO methods and establish their theoretical properties. Then a new regularization method, called Regularization Algorithm under Marginality Principle (RAMP), is developed to compute hierarchy-preserving regularization solution paths efficiently. In contrast to existing regular ization methods, the proposed methods avoid storing the entire design matrix and sidestep complex constraints and penalties, making them feasible to ultra-high dimensional data analysis. The new methods are further extended to handling binary responses. Extensive numerical results will be presented as well. 
09:45 to 10:30 Katherine Heller
Mobile Apps and Machine Learning for Improving Healthcare
The first part of this talk centers on the analysis of student influenza data. Students in dormitories at the University of Michigan were given smartphones with mobile a mobile app, called iEpi, that captured data about their locations, interactions, and disease symptoms. We develop Graph-coupled Hidden Markov Models (GCHMMs) which use this data to predict whether a student was likely to fall ill due to their interactions. Using a hierarchical version of GCHMMs we can combine with demographic data and see that certain characteristics, such as drinking, and poor sleep quality, increased the likelihood of contracting influenza, as well as recovery time.

The second part discusses the development of a new mobile app, MS Mosaic, for tracking symptoms in multiple sclerosis (MS) patients. The app includes data in the form of daily surveys, fitness tracker information, and mobile phone task data. The daily surveys about symptoms and medications can potentially be completed with a single notification swipe, sleep and activity data can be collected passively using HealthKit, and mobile phone tasks include finger tapping, gait analysis, as well as additional cognitive and motor tasks. Data collected provides an opportunity for the development of novel machine learning methods for learning about chronic disease, and novel sensor types. The app will soon be released to the Apple app store, and piloted in clinic at Duke University.

If time remains we will briefly look at some of the other healthcare work on using Gaussian Process models on EHR data, going on currently at Duke.

Coauthors: Kai Fan, Allison Aiello, Lee Hartsell, Joe Futoma, and Sanjay Hariharan
10:30 to 11:00 Morning Coffee
11:00 to 11:45 James Scott
Detecting radiological anomalies
Radiologically active materials are used widely in industry, medicine, and research. Yet an unsecured, lost, or stolen radiological source can present a major threat to public safety.  To deal with the potential environmental and security hazards posed by such a scenario, govenment agencies use various detection procedures at ports of entry to their countries.  Moreover, security agencies that try to prevent terrorist attacks are keenly interested in the problem of identifying and locating stolen or smuggled radiation samples. Even at the local level, police departments have shown increasing interest in the deployment of systems for detecting anomalous radiological sources.

Statistically speaking, the radiological anomaly-detection problem is one of detecting a change in distribution. Sequential data is collected from a sensor that measures the energies of arriving gamma rays. These observed energies are random variables drawn from an energy spectrum, which is a probability distribution over the set of possible gamma-ray energies. The question is whether those measured energies are from the normal background spectrum, and therefore harmless, or whether they are from an anomalous spectrum due to the presence of a nearby radiological source.  In this talk I will describe some new statistical methods we’ve developed for deal with two major challenges in this setting: 1) characterizing the spatially varying background radiation in dense urban areas; and 2) flagging anomalous readings from spatially distributed sensor networks in a statistically rigorous way.

This is joint work with Wesley Tansey, Oscar Padilla, Alex Reinhart, and Alex Athey.
11:45 to 12:30 Po-Ling Loh
Community recovery in weighted stochastic block models
Co-authors: Min Xu (University of Pennsylvania), Varun Jog (University of Wisconsin - Madison)

Identifying communities in a network is an important problem in many fields, including social science, neuroscience, military intelligence, and genetic analysis. In the past decade, the Stochastic Block Model (SBM) has emerged as one of the most well-studied and well-understood statistical models for this problem. Yet, the SBM has an important limitation: it assumes that each network edge is drawn from a Bernoulli distribution. This is rather restrictive, since weighted edges are fairly ubiquitous in scientific applications, and disregarding edge weights naturally results in a loss of valuable information. In this paper, we study a weighted generalization of the SBM, where observations are collected in the form of a weighted adjacency matrix, and the weight of each edge is generated independently from a distribution determined by the community membership of its endpoints. We propose and analyze a novel algorithm for community estimation in the weighted SBM based on various su broutines involving transformation, discretization, spectral clustering, and appropriate refinements. We prove that our procedure is optimal in terms of its rate of convergence, and that the misclassification rate is characterized by the Renyi divergence between the distributions of within-community edges and between-community edges. In the regime where the edges are sparse, we also establish sharp thresholds for exact recovery of the communities. Our theoretical results substantially generalize previously established thresholds derived specifically for unweighted block models. Furthermore, our algorithm introduces a principled and computationally tractable method of incorporating edge weights to the analysis of network data.
12:30 to 13:30 Lunch @ Wolfson Court
13:30 to 14:15 Matti Vihola
Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo and exact approximation
We consider an importance sampling (IS) type estimator based on Markov chain Monte Carlo (MCMC) which targets an approximate marginal distribution. The IS approach, based on unbiased estimators, is consistent, and provides a natural alternative to delayed acceptance (DA) pseudo-marginal MCMC. The IS approach enjoys many benefits against DA, including a straightforward parallelisation. We focus on a Bayesian latent variable model setting, where the MCMC operates on the hyperparameters, and the latent variable distributions are approximated. 
14:15 to 15:00 Arnaud Doucet
The Correlated Pseudo-Marginal Method
15:00 to 15:30 Afternoon Tea
15:30 to 16:15 Sinan Yildirim
Scalable Monte Carlo inference for state-space models
Co-authors: Christophe Andrieu (University of Bristol), Arnaud Doucet (University of Oxford)

We present an original simulation-based method to estimate likelihood ratios efficiently for general state-space models. Our method relies on a novel use of the conditional Sequential Monte Carlo (cSMC) algorithm introduced in Andrieu et al. (2010) and presents several practical advantages over standard approaches. The ratio is estimated using a unique source of randomness instead of estimating separately the two likelihood terms involved. Beyond the benefits in terms of variance reduction one may expect in general from this type of approach, an important point here is that the variance of this estimator decreases as the distance between the likelihood parameters decreases. We show how this can be exploited in the context of Monte Carlo Markov chain (MCMC) algorithms, leading to the development of a new class of exact-approximate MCMC methods to perform Bayesian static parameter inference in state-space models. We show through simulations that, in contrast to the Particle Mar ginal Metropolis–Hastings (PMMH) algorithm of Andrieu et al. (2010), the computational effort required by this novel MCMC scheme scales favourably for large data sets.
16:15 to 17:00 Chris Sherlock
The Discrete Bouncy Particle Sampler
The Bouncy Particle Sampler (BPS) is a continuous-time, non-reversible MCMC algorithm that shows great promise in efficiently sampling from certain high-dimensional distributions; a particle moves with a fixed velocity except that occasionally it "bounces" off the hyperplane perpendicular to the gradient of the target density. One practical difficulty is that for each specific target distribution, a locally-valid upper bound on the component of the gradient in the direction of movement must be found so as to allow for simulation of the bounce times via Poisson thinning; for efficient implementation this bound should also be tight. In dimension $d=1$, the discrete-time version of the Bouncy Particle Sampler (and, equivalently, of the Zig-Zag sampler, another continuous-time, non-reversible algorithm) is known to consist of fixing a time step, $\Delta t$, and proposing a shift of $v \Delta t$ which is accepted with a probability dependent on the ratio of target evaluated at the proposed and current positions; on rejection the velocity is reversed. We present a discrete-time version of the BPS that is valid in any dimension $d\ge 1$ and the limit of which (as $\Delta t\downarrow 0$) is the BPS, which is rejection free. The Discrete BPS has the advantages of non-reversible algorithms in terms of mixing, but does not require an upper bound on a Poisson intensity and so is straightforward to apply to complex targets, such as those which can be evaluated pointwise but for which general properties, such as local or global Lipshitz bounds on derivatives, cannot be obtained. [Joint work with Dr. Alex Thiery].
Friday 7th July 2017
09:00 to 09:45 Eric François Moulines
Langevin MCMC: theory and methods
Nicolas Brosse, Ecole Polytechnique, Paris
Alain Durmus, Telecom ParisTech and Ecole Normale Supérieure Paris-Saclay
Marcelo Pereira, Herriot-Watt University, Edinburgh

The complexity and sheer size of modern datasets, to whichever increasingly demanding questions are posed, give rise to major challenges. Traditional simulation methods often scale poorly with data size and model complexity and thus fail for the most complex of modern problems.
We are considering the problem of sampling from a log-concave distribution. Many problems in machine learning fall into this framework,
such as linear ill-posed inverse problems with sparsity-inducing priors, or large scale Bayesian binary regression.

The purpose of this lecture is to explain how we can use ideas which have proven very useful in machine learning community to
solve large-scale optimization problems to design efficient sampling algorithms.
Most of the efficient algorithms know so far may be seen as variants of the gradient descent algorithms,
most often coupled with « partial updates » (coordinates descent algorithms). This, of course, suggests studying methods derived from Euler discretization of the Langevin diffusion. Partial updates may in this context as « Gibbs steps »This algorithm may be generalized in the non-smooth case by « regularizing » the objective function. The Moreau-Yosida inf-convolution algorithm is an appropriate candidate in such case.

We will prove convergence results for these algorithms with explicit convergence bounds both in Wasserstein distance and in total variation. Numerical illustrations will be presented (on the computation of Bayes factor for model choice, Bayesian analysis of high-dimensional regression, aggregation of estimators) to illustrate our results.
09:45 to 10:30 David Firth (University of Warwick)
Bradley-Terry models for pair-comparison networks: Structure and scalability
In this talk I will discuss various aspects of Bradley-Terry models, focusing especially on some aspects that can make the Bradley-Terry model an interesting "toy" case when considering scalable computation and inference more generally.  Included will be some (very!) fresh simulation experiments to investigate: (a) large-network bias; and (b) model reformulation to improve distributional properties as well as computational tractability in "structured" Bradley-Terry models.  This relates to on-going work with Warwick PhD students David Selby and Ella Kaye, and also with Heather Turner (Warwick) and Ioannis Kosmidis (UCL) --- and so I am especially keen to engage with other participants in this INI programme, to explore competing approaches.
10:30 to 11:00 Morning Coffee
11:00 to 11:45 Jose Blanchet
Exact Sampling for Multivariate Diffusions
We provide the first generic exact simulation algorithm for multivariate diffusions. Current exact sampling algorithms for diffusions require the existence of a transformation which can be used to reduce the sampling problem to the case of a constant diffusion matrix and a drift which is the gradient of some function. Such transformation, called Lamperti transformation, can be applied in general only in one dimension. So, completely different ideas are required for exact sampling of generic multivariate diffusions. The development of these ideas is the main contribution of this paper. Our strategy combines techniques borrowed from the theory of rough paths, on one hand, and multilevel Monte Carlo on the other. (Joint work with Fan Zhang.)
11:45 to 12:30 Christian Robert (CNRS & Université Paris-Dauphine)
Inference in generative models using the Wasserstein distance
In purely generative models, one can simulate data given parameters but not necessarily evaluate the likelihood. We use Wasserstein distances between empirical distributions of observed data and empirical distributions of synthetic data drawn from such models to estimate their parameters. Previous interest in the Wasserstein distance for statistical inference has been mainly theoretical, due to computational limitations. Thanks to recent advances in numerical transport, the computation of these distances has become feasible, up to controllable approximation errors. We leverage these advances to propose point estimators and quasi-Bayesian distributions for parameter inference, first for independent data. For dependent data, we extend the approach by using delay reconstruction and residual reconstruction techniques. For large data sets, we propose an alternative distance using the Hilbert space-filling curve, which computation scales as n log n where n is the size of the data. We provide a theoretical study of the proposed estimators, and adaptive Monte Carlo algorithms to approximate them. The approach is illustrated on four examples: a quantile g-and-k distribution, a toggle switch model from systems biology, a Lotka-Volterra model for plankton population sizes and a L\'evy-driven stochastic volatility model.

[This is joint work with Espen Bernton (Harvard University), Pierre E. Jacob (Harvard University), Mathieu Gerber (University of Bristol).]

12:30 to 13:30 Lunch @ Wolfson Court
University of Cambridge Research Councils UK
    Clay Mathematics Institute London Mathematical Society NM Rothschild and Sons