Scalable statistical inference
Monday 3rd July 2017 to Friday 7th July 2017
11:45 to 12:30  Registration  
12:30 to 13:20  Lunch @ Wolfson Court  
13:20 to 13:30  Welcome from Christie Marr (INI Deputy Director)  
13:30 to 14:15 
Sylvia Richardson (University of Cambridge) Data compression with statistical guarantees
Joint talk with Daniel Ahfock (MRC Biostatistics Unit @ University of Cambridge) The talk is concerned with translating recent ideas from computer science on probabilistic datacompression techniques into a statistical framework that can be ‘safely’ applied for speeding linear regression analyses for very larges sample sizes in biomedicine. Our motivation is to facilitate the use of multivariate regression and model exploration in tall data sets, so that, for example, genetic association analyses carried out on hundreds of thousands of subjects can investigate multivariate effects for a set of explanatory features, rather than be restricted to one feature at a time associations for computational feasibility. Among the many approaches to dealing with tall data, probabilistic data compression techniques using random linear mapping, developed in the computer science community, so called sketching, are particularly suitable for linear regression problems. In the first part of the talk, we will present a hierarchical representation of sketching, which allows deriving statistical properties (distributional) of different sketching algorithms. In particular, we will discuss how the signal to noise ratio in the original data set is important for the choice of sketching algorithm. In the second part of the talk, we will further refine some of the approximation guarantees and consider iterative sketches. The talk will be illustrated on a genetic analysis of the link between a blood cell trait and the HLA region involving a sample of 130,000 people. http://arxiv.org/abs/1706.03665 
INI 1  
14:15 to 15:00 
Alexandre Bouchard (University of British Columbia) Monte Carlo without rejection
Coauthors: Arnaud Doucet (Oxford), Sebastian Vollmer (Warwick), George Deligiannidis (King's College London), Paul Vanetti (Oxford) Markov chain Monte Carlo methods have become standard tools to sample from complex highdimensional probability measures. Many available techniques rely on discretetime reversible Markov chains whose transition kernels built up over the MetropolisHastings algorithm. In our recent work, we investigate an alternative approach, the Bouncy Particle Sampler (BPS) where the target distribution of interest is explored using a continuoustime, non reversible Markov process. In this alternative approach, a particle moves along straight lines continuously around the space and, when facing a high energy barrier, it is not rejected but its path is modified by bouncing against this barrier. The resulting nonreversible Markov process provides a rejectionfree Markov chain Monte Carlo sampling scheme. This method, inspired from recent work in the molecular simulation literature, is shown to be a valid, efficient sampling scheme applicable to a wide range of Bayesian problems. We present several additional original methodological extensions and establish various theoretical properties of these procedures. We demonstrate experimentally the efficiency of these algorithms on a variety of Bayesian inference problems. 
INI 1  
15:00 to 15:30  Afternoon Tea  
15:30 to 16:15 
Helen Ogden (University of Southampton) Inference with approximate likelihoods
In cases where it is infeasible to compute the likelihood exactly, an
alternative is to find some numerical approximation to the likelihood,
then to use this approximate likelihood in place of the exact likelihood
to do inference about the model parameters. This is a fairly commonly
used approach, and I will give several examples of approximate
likelihoods which have been used in this way. But is this a valid
approach to inference? I will give conditions under which inference with
an approximate likelihood shares some of the same asymptotic properties
as inference with the exact likelihood, and describe the implications
in some examples. I will finish with some ideas about how to construct
scalable likelihood approximations which give statistically valid
inference. 
INI 1  
16:15 to 17:00 
Havard Rue (Norwegian University of Science and Technology) Scalable statistical inference with INLA
INLA do approximate Bayesian inference for the class of latent Gaussian models. It has shown sucessful allowing statisticians and applied scientists to fast and reliable Bayesian inference for a huge class of additve models, within reasonable time. Especially, the use of spatial Gaussian models using the SPDE approach has been particularly popular. Although most models runs within reasonable time, we are facing with the current implementation, limitations for really huge models like large space time models. In this talk I will discuss the current situation and possible strategies to improve the situation. 
INI 1  
17:00 to 18:00  Welcome Wine Reception at INI 
09:00 to 09:45 
Michael Jordan (University of California, Berkeley) Variational, Hamiltonian and Symplectic Perspectives on Acceleration
Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. While many generalizations and extensions of Nesterov's original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration concept. We study accelerated methods from a continuoustime perspective. We show that there is a Lagrangian functional that we call the "Bregman Lagrangian" which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its nonEuclidean extension, and accelerated higherorder gradient methods. We show that the continuoustime limit of all of these methods correspond to traveling the same curve in spacetime at different speeds. We also describe a "Bregman Hamiltonian" which generates the accelerated dynamics, we develop a symplectic integrator for this Hamiltonian and we discuss relations between this symplectic integrator and classical Nesterov acceleration. [Joint work with Andre Wibisono, Ashia Wilson and Michael Betancourt.] 
INI 1  
09:45 to 10:30 
Anthony Lee (University of Warwick) Unbiased approximations of products of expectations
I will describe recent work with Simone Tiberi (Zurich) and Giacomo Zanella (Bocconi), on the unbiased approximation of a product of n expectations. Such products arise, e.g., as values of the likelihood function in latent variable models, and unbiased approximations can be used in a pseudomarginal Markov chain to facilitate inference. A straightforward, standard approach consists of approximating each term using an independent average of M i.i.d. random variables and taking the product of these approximations. While simple, this typically requires M to be O(n) so that the total number of random variables required is N = Mn = O(n^2) in order to control the relative variance of the approximation. Using all N random variables to approximate each expectation is less wasteful when producing them is costly, but produces a biased approximation. We propose an alternative to these two approximations that uses most of the N samples to approximate each expectation in such a way that the estimate of the product of expectations is unbiased. We analyze the variance of this approximation and show that it can result in N = O(n) being sufficient for the relative variance to be controlled as n increases. In situations where the cost of simulations dominates overall computational time, and fixing the relative variance, the proposed approximation is almost n times faster than the standard approach to compute.

INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
Steven Scott (Google) Comparing Consensus Monte Carlo Strategies for Distributed Bayesian Computation
Consensus Monte Carlo is an algorithm for conducting Monte Carlo based Bayesian inference on large data sets distributed across many worker machines in a data center. The algorithm operates by running a separate Monte Carlo algorithm on each worker machine, which only sees a portion of the full data set. The workerlevel posterior samples are then combined to form a Monte Carlo approximation to the full posterior distribution based on the complete data set. We compare several methods of carrying out the combination, including a new method based on approximating workerlevel simulations using a mixture of multivariate Gaussian distributions. We find that resampling and kernel density based methods break down after 10 or sometimes fewer dimensions, while the new mixturebased approach works well, but the necessary mixture models take too long to fit.

INI 1  
11:45 to 12:30 
Jun Liu (Harvard University) tba 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court  
13:30 to 14:15 
Tamara Broderick (Massachusetts Institute of Technology) Coresets for scalable Bayesian logistic regression
Coauthors: Jonathan H. Huggins (MIT), Trevor Campbell (MIT) The use of Bayesian methods in largescale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. However, standard Bayesian inference algorithms are computationally expensive, so their direct application to large datasets can be difficult or infeasible. Rather than modify existing algorithms, we instead leverage the insight that data is often redundant via a preprocessing step. In particular, we construct a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We then input this small coreset to existing posterior inference algorithms without modification. To demonstrate the feasibility of this approach, we develop an efficient coreset construction algorithm for Bayesian logistic regression models. We provide theoretical guarantees on the size and approximation quality of the coreset  both for fixed, known datasets, and in expectation for a wide class o f data generative models. Our approach permits efficient construction of the coreset in both streaming and parallel settings, with minimal additional effort. We demonstrate the efficacy of our approach on a number of synthetic and realworld datasets, and find that, in practice, the size of the coreset is independent of the original dataset size. 
INI 1  
14:15 to 15:00 
Cristiano Varin (Università Cà Foscari di Venezia) Designing efficient composite likelihoods
Composite likelihood is an inference function constructed by compounding component likelihoods based on low dimensional marginal or conditional distributions. Since the components are multiplied as if they were independent, the composite likelihood inherits the properties of likelihood inference from a misspecified model. The virtue of composite likelihood inference is “combining the advantages of likelihood with computational feasibility” (Reid, 2013). Given the wide applicability, composite likelihoods are attracting interest as scalable surrogate for intractable likelihoods. Despite the promise, application of composite likelihood is still limited by some theoretical and computational issues which have received only partial or initial responses. Open theoretical questions concern characterization of general model conditions assuring validity of composite likelihood inference, optimal selection of component likelihoods and precise evaluation of estimation uncertainty. Computational issues concern how to design composite likelihood methods to balance statistical efficiency and computational efficiency. In this talk, after a critical review of composite likelihood theory, I shall focus on the potential merits of composite likelihood inference in modeling temporal and spatial variation of disease incidence. The talk is based on past work with Nancy Reid (Toronto) and David Firth (Warwick), and various new projects with Manuela Cattelan (Padova), Xanthi Pedeli (Venice) and Guido Masarotto (Padova). 
INI 1  
15:00 to 15:30  Afternoon Tea  
15:30 to 16:15 
Kerrie Mengersen (Queensland University of Technology) Transferability: as easy as ABC?
An intense area of interest in ecology is transferability: how to apply a model developed for one region to another region. This type of extrapolation is common in many areas yet is still underdeveloped statistically, particularly when models are complex or intractable. In this presentation I would like to discuss and ask advice about aspects of this problem, namely taking an experimental design approach to the problem and setting vaguely informative priors. Two questions will also be asked.

INI 1  
16:15 to 17:00 
Darren Wilkinson (Newcastle University) Category theory and functional programming for scalable statistical modelling and computational inference
This talk considers both the theoretical and computational requirements for scalable statistical modelling and computation. It will be argued that programming languages typically used for statistical computing do not naturally scale, and that functional programming languages by contrast are ideally suited to the development of scalable statistical algorithms. The mathematical subject of category theory provides the necessary theoretical underpinnings for rigorous analysis and reasoning about functional algorithms, their correctness, and their scalability. Used in conjunction with other tools from theoretical computer science, such as recursion schemes, these approaches narrow the gap between statistical theory and computational implementation, providing numerous benefits, not least automatic parallelisation and distribution of algorithms.

INI 1 
09:00 to 09:45 
Paul Fearnhead (Lancaster University) Asymptotics of Approximate Bayesian Computation
Many statistical applications involve models for which are easy to sample from, but for which it is difficult to evaluate the likelihood. Approximate Bayesian computation is a likelihoodfree method for implementing Bayesian inference in such cases. This talk will overview some recent results on the theoretical properties of approximate Bayesian computation which consider the performance of ABC as we get more data. It will cover questions such as: when does the ABC posterior concentrate on the true parameter value? What distribution does the ABC posterior converge to? And what is the frequentist distribution of pointestimates derived using ABC. It will also cover the impact of Monte Carlo error on estimates obtained using ABC, and consider whether, asympotically, it is possible to efficiently estimate parameters using ABC if we have a fixed Monte Carlo sample size. This is joint work with Wentao Li: https://arxiv.org/abs/1506.03481 and https://arxiv.org/abs/1609.07135; the talk will also cover work by David Frazier, Martin, Robert and Rousseau: https://arxiv.org/abs/1607.06903 
INI 1  
09:45 to 10:30 
Wentao Li (Lancaster University) Validating approximate Bayesian computation on posterior convergence
Coauthor: Paul Fearnhead (Lancaster University) Many statistical applications involve models for which it is difficult to evaluate the likelihood, but relatively easy to sample from. Approximate Bayesian computation is a likelihoodfree method for implementing Bayesian inference in such cases. We present a number of surprisingly strong asymptotic results for the regressionadjusted version of approximate Bayesian Computation introduced by Beaumont et al. (2002). We show that for an appropriate choice of the bandwidth in approximate Bayesian computation, using regressionadjustment will lead to a posterior that, asymptotically, correctly quantifies uncertainty. Furthermore, for such a choice of bandwidth we can implement an importance sampling algorithm to sample from the posterior whose acceptance probability tends to 1 as we increase the data sample size. This compares favourably to results for standard approximate Bayesian computation, where the only way to obtain its posterior that correctly quantifies uncertainty is to choose a much smaller bandwidth, for which the acceptance probability tends to 0 and hence for which Monte Carlo error will dominate. Related Links 
INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
Sam Livingstone (University of Bristol) Kinetic energy choice in Hamiltonian/hybrid Monte Carlo
We consider how different choices of kinetic energy in Hamiltonian Monte Carlo affect algorithm performance. To this end, we introduce two quantities which can be easily evaluated, the composite gradient and the implicit noise. Results are established on integrator stability and geometric convergence, and we show that choices of kinetic energy that result in heavytailed momentum distributions can exhibit an undesirable negligible moves property, which we define. A general efficiencyrobustness trade off is outlined, and implementations which rely on approximate gradients are also discussed. Two numerical studies illustrate our theoretical findings, showing that the standard choice which results in a Gaussian momentum distribution is not always optimal in terms of either robustness or efficiency.

INI 1  
11:45 to 12:30 
Christophe Andrieu (University of Bristol) On the ordering of some nonreversible Markov chain and Markov process Monte Carlo methods
We will show how one can derive simple criteria to order the asymptotic variance of a broad class of Monte Carlo algorithms based on Markov chains or processes. Interestingly these results mirror existing results for algorithms based on reversible processes. Joint work with Sam Livingstone. 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court  
13:30 to 14:15 
Murray Pollock (University of Warwick) Exact Bayesian Inference for Big Data: Single and MultiCore Approaches
Coauthors: Hongsheng Dai (Essex), Paul Fearnhead (Lancaster), Adam Johansen (Warwick), Divakar Kumar (Warwick), Gareth Roberts (Warwick) This talk will introduce novel methodologies for exploring posterior distributions by modifying methodology for exactly (without error) simulating diffusion sample paths. The methodologies discussed have found particular applicability to "Big Data" problems. We begin by presenting the Scalable Langevin Exact Algorithm (ScaLE) and recent methodological extensions (including ReScaLE, which avoids the need for particle approximation in ScaLE), which has remarkably good scalability properties as the size of the data set increases (it has sublinear cost, and potentially no cost as a function of data size). ScaLE has particular applicability in the “singlecore” big data setting  in which inference is conducted on a single computer. In the second half of the talk we will present methodology to exactly recombine inferences on separate data sets computed on separate cores  an exact version of “divide and conquer". As such this approach has particu lar applicability in the “multicore” big data setting. We conclude by commenting on future work on the confluence of these approaches. Joint work with Hongsheng Dai, Paul Fearnhead, Adam Johansen, Divakar Kumar, Gareth Roberts. 
INI 1  
14:15 to 15:00 
Gareth Roberts (University of Warwick) Optimisation and complexity for Gibbs samplers for hierarchical and crossedeffect models
We
study the convergence properties of the Gibbs Sampler in the context of
Gaussian hierarchical and crossedeffect models. We develop a novel methodology
based on multigrid decompositions to derive analytic expressions for the
convergence rates of the algorithm, extending significantly the class of
conditionally Gaussian models amenable to direct analysis. In the hierarchical
context, our work gives a rather complete understanding of the Gibbs Sampler
behaviour for symmetric models (with arbitrary depth), while providing
approximations and bounds for the nonsymmetric cases. The theoretical results
give rise to simple and easytoimplement guidelines to optimise practical
implementations of the Gibbs samplers on such models. While the good
performances of the Gibbs Sampler in hierarchicallystructured models is
renowned, the context of crossedeffect models is drastically different. Here
hierarchical centering is not possible and the convergence of commonly
implemented Gibbs Sampler strategies deteriorates as the datasize increases,
resulting in superlinear computational complexity (potentially even quadratic)
in the number of datapoints. We show how to leverage the negativelycorrelated
structure of crossedeffect models to design easytoimplement collapsed Gibbs
Samplers whose complexity matches the one of hierarchical scenarios. This is joint work with Giacomo Zanella and Omiros Papaspiliopoulos. 
INI 1  
15:00 to 15:30  Afternoon Tea  
15:30 to 16:15 
Chris Holmes (University of Oxford) Fast Bayesian Boolean Matrix Factorisation
Boolean matrix factorisation decomposes a binary data matrix
into an approximating Boolean product of two low rank, binary matrices: one
containing meaningful patterns (signatures), the other quantifying how the
observations can be expressed as a logical combination of these patterns.
We introduce a probabilistic model for Boolean matrix factorisation, termed the “OrMachine”, and derive a Metropolised Gibbs sampler that facilitates efficient parallel posterior inference on commodity hardware. On real world and simulated data, our Bayesian method provides state of the art performance for Boolean matrix factorisation and matrix completion. The method supports full posterior inference, which is important in applications, for example in controlling false positive rates in collaborative filtering and, crucially, improves the interpretability of the inferred patterns. The proposed model and computation scale to large datasets as motivated by an analysis of single cell gene expression data recording measurements from 1.3 million mouse brain cells across 11 thousand genes. 
INI 1  
16:15 to 17:00 
Louis Aslett (University of Oxford) Towards Encrypted Inference for Arbitrary Models
There has been substantial progress in development of statistical methods which are amenable to computation with modern cryptographic techniques, such as homomorphic encryption. This has enabled fitting and/or prediction of models in areas from classification and regression through to genome wide association studies. However, these are techniques devised to address specific models in specific settings, with the broader challenge of an approach to inference for arbitrary models and arbitrary data sets receiving less attention. This talk will discuss very recent results from ongoing work towards an approach which may allow theoretically arbitrary low dimensional models to be fitted fully encrypted, keeping the model and prior secret from data owners and viceversa. The methodology will be illustrated with a variety of examples, together with a discussion of the ongoing direction of the work. 
INI 1  
17:00 to 18:00  Board Meeting  INI 1  
19:30 to 22:00  Formal Dinner at Trinity College 
09:00 to 09:45 
Helen Zhang (University of Arizona) Hierarchypreserving regularization solution paths for identifying interactions in high dimensional data
Coauthors: Ning Hao (University of Arizona), Yang Feng (Columbia University) Interaction screening for highdimensional settings has recently drawn much attention in the literature. A variety of interaction screening approaches have been proposed for regression and classification problems. However, most of existing regularization methods for interaction selections are limited to low or moderate dimensional data analysis, due to their complex programing with inequality constraints and demanded prohibitive storage and computational cost when handling high dimensional data. This talk will present our recent work on scalable regularization methods to interaction selection under hierarchical constraints for high dimensional regression and classification. We first consider twostage LASSO methods and establish their theoretical properties. Then a new regularization method, called Regularization Algorithm under Marginality Principle (RAMP), is developed to compute hierarchypreserving regularization solution paths efficiently. In contrast to existing regular ization methods, the proposed methods avoid storing the entire design matrix and sidestep complex constraints and penalties, making them feasible to ultrahigh dimensional data analysis. The new methods are further extended to handling binary responses. Extensive numerical results will be presented as well. 
INI 1  
09:45 to 10:30 
Katherine Heller (Duke University) Mobile Apps and Machine Learning for Improving Healthcare
The first part of this talk centers on the analysis of student influenza data. Students in dormitories at the University of Michigan were given smartphones with mobile a mobile app, called iEpi, that captured data about their locations, interactions, and disease symptoms. We develop Graphcoupled Hidden Markov Models (GCHMMs) which use this data to predict whether a student was likely to fall ill due to their interactions. Using a hierarchical version of GCHMMs we can combine with demographic data and see that certain characteristics, such as drinking, and poor sleep quality, increased the likelihood of contracting influenza, as well as recovery time. The second part discusses the development of a new mobile app, MS Mosaic, for tracking symptoms in multiple sclerosis (MS) patients. The app includes data in the form of daily surveys, fitness tracker information, and mobile phone task data. The daily surveys about symptoms and medications can potentially be completed with a single notification swipe, sleep and activity data can be collected passively using HealthKit, and mobile phone tasks include finger tapping, gait analysis, as well as additional cognitive and motor tasks. Data collected provides an opportunity for the development of novel machine learning methods for learning about chronic disease, and novel sensor types. The app will soon be released to the Apple app store, and piloted in clinic at Duke University. If time remains we will briefly look at some of the other healthcare work on using Gaussian Process models on EHR data, going on currently at Duke. Coauthors: Kai Fan, Allison Aiello, Lee Hartsell, Joe Futoma, and Sanjay Hariharan 
INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
James Scott (University of Texas at Austin) Detecting radiological anomalies
Radiologically active materials are used widely in industry, medicine, and research. Yet an unsecured, lost, or stolen radiological source can present a major threat to public safety. To deal with the potential environmental and security hazards posed by such a scenario, govenment agencies use various detection procedures at ports of entry to their countries. Moreover, security agencies that try to prevent terrorist attacks are keenly interested in the problem of identifying and locating stolen or smuggled radiation samples. Even at the local level, police departments have shown increasing interest in the deployment of systems for detecting anomalous radiological sources. Statistically speaking, the radiological anomalydetection problem is one of detecting a change in distribution. Sequential data is collected from a sensor that measures the energies of arriving gamma rays. These observed energies are random variables drawn from an energy spectrum, which is a probability distribution over the set of possible gammaray energies. The question is whether those measured energies are from the normal background spectrum, and therefore harmless, or whether they are from an anomalous spectrum due to the presence of a nearby radiological source. In this talk I will describe some new statistical methods we’ve developed for deal with two major challenges in this setting: 1) characterizing the spatially varying background radiation in dense urban areas; and 2) flagging anomalous readings from spatially distributed sensor networks in a statistically rigorous way. This is joint work with Wesley Tansey, Oscar Padilla, Alex Reinhart, and Alex Athey. 
INI 1  
11:45 to 12:30 
PoLing Loh (University of WisconsinMadison) Community recovery in weighted stochastic block models
Coauthors: Min Xu (University of Pennsylvania), Varun Jog (University of Wisconsin  Madison) Identifying communities in a network is an important problem in many fields, including social science, neuroscience, military intelligence, and genetic analysis. In the past decade, the Stochastic Block Model (SBM) has emerged as one of the most wellstudied and wellunderstood statistical models for this problem. Yet, the SBM has an important limitation: it assumes that each network edge is drawn from a Bernoulli distribution. This is rather restrictive, since weighted edges are fairly ubiquitous in scientific applications, and disregarding edge weights naturally results in a loss of valuable information. In this paper, we study a weighted generalization of the SBM, where observations are collected in the form of a weighted adjacency matrix, and the weight of each edge is generated independently from a distribution determined by the community membership of its endpoints. We propose and analyze a novel algorithm for community estimation in the weighted SBM based on various su broutines involving transformation, discretization, spectral clustering, and appropriate refinements. We prove that our procedure is optimal in terms of its rate of convergence, and that the misclassification rate is characterized by the Renyi divergence between the distributions of withincommunity edges and betweencommunity edges. In the regime where the edges are sparse, we also establish sharp thresholds for exact recovery of the communities. Our theoretical results substantially generalize previously established thresholds derived specifically for unweighted block models. Furthermore, our algorithm introduces a principled and computationally tractable method of incorporating edge weights to the analysis of network data. 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court  
13:30 to 14:15 
Matti Vihola (University of Jyväskylä) Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo and exact approximation
We consider an importance sampling (IS) type estimator based on Markov chain Monte Carlo (MCMC) which targets an approximate marginal distribution. The IS approach, based on unbiased estimators, is consistent, and provides a natural alternative to delayed acceptance (DA) pseudomarginal MCMC. The IS approach enjoys many benefits against DA, including a straightforward parallelisation. We focus on a Bayesian latent variable model setting, where the MCMC operates on the hyperparameters, and the latent variable distributions are approximated.

INI 1  
14:15 to 15:00 
Arnaud Doucet (University of Oxford) The Correlated PseudoMarginal Method
Joint work with George Deligiannidis and Michael Pitt The pseudomarginal algorithm is a popular Metropolis–Hastingstype scheme which samples asymptotically from a target probability density when we are only able to estimate unbiasedly an unnormalised version of it. However, for the performance of this scheme not to degrade as the number T of data points increases, it is typically necessary for the number N of Monte Carlo samples to be proportional to T to control the relative variance of the likelihood ratio estimator appearing in the acceptance probability of this algorithm. The correlated pseudomarginal algorithm is a modification of the pseudomarginal method using a likelihood ratio estimator computed using two correlated likelihood estimators. For random effects models, we show under regularity conditions that the parameters of this scheme can be selected such that the relative variance of this likelihood ratio estimator is controlled when N increases sublinearly with T and we provide guidelines on how to optimise the parameters of the algorithm based on a nonstandard weak convergence analysis. The efficiency of computations for Bayesian inference relative to the pseudomarginal method empirically increases with T and is higher than two orders of magnitude in some of our examples. 
INI 1  
15:00 to 15:30  Afternoon Tea  
15:30 to 16:15 
Sinan Yildirim (Sabanci University) Scalable Monte Carlo inference for statespace models
Coauthors: Christophe Andrieu (University of Bristol), Arnaud Doucet (University of Oxford) We present an original simulationbased method to estimate likelihood ratios efficiently for general statespace models. Our method relies on a novel use of the conditional Sequential Monte Carlo (cSMC) algorithm introduced in Andrieu et al. (2010) and presents several practical advantages over standard approaches. The ratio is estimated using a unique source of randomness instead of estimating separately the two likelihood terms involved. Beyond the benefits in terms of variance reduction one may expect in general from this type of approach, an important point here is that the variance of this estimator decreases as the distance between the likelihood parameters decreases. We show how this can be exploited in the context of Monte Carlo Markov chain (MCMC) algorithms, leading to the development of a new class of exactapproximate MCMC methods to perform Bayesian static parameter inference in statespace models. We show through simulations that, in contrast to the Particle Mar ginal Metropolis–Hastings (PMMH) algorithm of Andrieu et al. (2010), the computational effort required by this novel MCMC scheme scales favourably for large data sets. 
INI 1  
16:15 to 17:00 
Chris Sherlock (Lancaster University) The Discrete Bouncy Particle Sampler
The Bouncy Particle Sampler (BPS) is a continuoustime, nonreversible
MCMC algorithm that shows great promise in efficiently sampling from
certain highdimensional distributions; a particle moves with a fixed
velocity except that occasionally it "bounces" off the hyperplane
perpendicular to the gradient of the target density. One practical
difficulty is that for each specific target distribution, a
locallyvalid upper bound on the component of the gradient in the
direction of movement must be found so as to allow for simulation of the
bounce times via Poisson thinning; for efficient implementation this
bound should also be tight. In dimension $d=1$, the discretetime
version of the Bouncy Particle Sampler (and, equivalently, of the
ZigZag sampler, another continuoustime, nonreversible algorithm) is
known to consist of fixing a time step, $\Delta t$, and proposing a
shift of $v \Delta t$ which is accepted with a probability dependent on
the ratio of target evaluated at the proposed and current positions; on
rejection the velocity is reversed. We present a discretetime version
of the BPS that is valid in any dimension $d\ge 1$ and the limit of
which (as $\Delta t\downarrow 0$) is the BPS, which is rejection free.
The Discrete BPS has the advantages of nonreversible algorithms in
terms of mixing, but does not require an upper bound on a Poisson
intensity and so is straightforward to apply to complex targets, such as
those which can be evaluated pointwise but for which general
properties, such as local or global Lipshitz bounds on derivatives,
cannot be obtained. [Joint work with Dr. Alex Thiery].

INI 1 
09:00 to 09:45 
Eric François Moulines (Télécom ParisTech) Langevin MCMC: theory and methods
Nicolas Brosse, Ecole Polytechnique, Paris Alain Durmus, Telecom ParisTech and Ecole Normale Supérieure ParisSaclay Marcelo Pereira, HerriotWatt University, Edinburgh The complexity and sheer size of modern datasets, to whichever increasingly demanding questions are posed, give rise to major challenges. Traditional simulation methods often scale poorly with data size and model complexity and thus fail for the most complex of modern problems. We are considering the problem of sampling from a logconcave distribution. Many problems in machine learning fall into this framework, such as linear illposed inverse problems with sparsityinducing priors, or large scale Bayesian binary regression. The purpose of this lecture is to explain how we can use ideas which have proven very useful in machine learning community to solve largescale optimization problems to design efficient sampling algorithms. Most of the efficient algorithms know so far may be seen as variants of the gradient descent algorithms, most often coupled with « partial updates » (coordinates descent algorithms). This, of course, suggests studying methods derived from Euler discretization of the Langevin diffusion. Partial updates may in this context as « Gibbs steps »This algorithm may be generalized in the nonsmooth case by « regularizing » the objective function. The MoreauYosida infconvolution algorithm is an appropriate candidate in such case. We will prove convergence results for these algorithms with explicit convergence bounds both in Wasserstein distance and in total variation. Numerical illustrations will be presented (on the computation of Bayes factor for model choice, Bayesian analysis of highdimensional regression, aggregation of estimators) to illustrate our results. 
INI 1  
09:45 to 10:30 
David Firth (University of Warwick) BradleyTerry models for paircomparison networks: Structure and scalability
In this talk I will discuss various aspects of BradleyTerry models, focusing especially on some aspects that can make the BradleyTerry model an interesting "toy" case when considering scalable computation and inference more generally. Included will be some (very!) fresh simulation experiments to investigate: (a) largenetwork bias; and (b) model reformulation to improve distributional properties as well as computational tractability in "structured" BradleyTerry models. This relates to ongoing work with Warwick PhD students David Selby and Ella Kaye, and also with Heather Turner (Warwick) and Ioannis Kosmidis (UCL)  and so I am especially keen to engage with other participants in this INI programme, to explore competing approaches.

INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
Jose Blanchet (Columbia University); (Stanford University) Exact Sampling for Multivariate Diffusions
We provide the first generic exact simulation algorithm for multivariate diffusions.
Current exact sampling algorithms for diffusions require the existence of a transformation which
can be used to reduce the sampling problem to the case of a constant diffusion matrix and a drift
which is the gradient of some function. Such transformation, called Lamperti transformation, can
be applied in general only in one dimension. So, completely different ideas are required for exact
sampling of generic multivariate diffusions. The development of these ideas is the main contribution
of this paper. Our strategy combines techniques borrowed from the theory of rough paths, on one
hand, and multilevel Monte Carlo on the other. (Joint work with Fan Zhang.)

INI 1  
11:45 to 12:30 
Christian Robert (CNRS & Université ParisDauphine) Inference in generative models using the Wasserstein distance
In purely generative models, one can simulate data given parameters but not
necessarily evaluate the likelihood. We use Wasserstein distances between
empirical distributions of observed data and empirical distributions of
synthetic data drawn from such models to estimate their parameters. Previous
interest in the Wasserstein distance for statistical inference has been mainly
theoretical, due to computational limitations. Thanks to recent advances in
numerical transport, the computation of these distances has become feasible, up
to controllable approximation errors. We leverage these advances to propose
point estimators and quasiBayesian distributions for parameter inference,
first for independent data. For dependent data, we extend the approach by using
delay reconstruction and residual reconstruction techniques. For large data
sets, we propose an alternative distance using the Hilbert spacefilling curve,
which computation scales as n log n where n is the size of the data. We
provide a theoretical study of the proposed estimators, and adaptive Monte
Carlo algorithms to approximate them. The approach is illustrated on four
examples: a quantile gandk distribution, a toggle switch model from systems
biology, a LotkaVolterra model for plankton population sizes and a
L\'evydriven stochastic volatility model. [This is joint work with Espen Bernton (Harvard University), Pierre E. Jacob (Harvard University), Mathieu Gerber (University of Bristol).] 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court 