09:00 to 09:45 Helen Zhang (University of Arizona) Hierarchy-preserving regularization solution paths for identifying interactions in high dimensional data
Co-authors: Ning Hao (University of Arizona), Yang Feng (Columbia University)
Interaction screening for high-dimensional settings has recently drawn much attention in the literature, and a variety of interaction screening approaches have been proposed for regression and classification problems. However, most existing regularization methods for interaction selection are limited to low- or moderate-dimensional data analysis, because their formulations involve complex inequality constraints and demand prohibitive storage and computational cost when handling high dimensional data. This talk will present our recent work on scalable regularization methods for interaction selection under hierarchical constraints in high dimensional regression and classification. We first consider two-stage LASSO methods and establish their theoretical properties. Then a new regularization method, called the Regularization Algorithm under Marginality Principle (RAMP), is developed to compute hierarchy-preserving regularization solution paths efficiently. In contrast to existing regularization methods, the proposed methods avoid storing the entire design matrix and sidestep complex constraints and penalties, making them feasible for ultra-high dimensional data analysis. The new methods are further extended to handle binary responses. Extensive numerical results will be presented as well. INI 1

09:45 to 10:30 Katherine Heller (Duke University) Mobile Apps and Machine Learning for Improving Healthcare
The first part of this talk centers on the analysis of student influenza data. Students in dormitories at the University of Michigan were given smartphones with a mobile app, called iEpi, that captured data about their locations, interactions, and disease symptoms.
We develop Graph-coupled Hidden Markov Models (GCHMMs), which use these data to predict whether a student is likely to fall ill as a result of their interactions. Using a hierarchical version of GCHMMs, we can incorporate demographic data and see that certain characteristics, such as drinking and poor sleep quality, increase both the likelihood of contracting influenza and the recovery time.

The second part discusses the development of a new mobile app, MS Mosaic, for tracking symptoms in multiple sclerosis (MS) patients. The app collects data in the form of daily surveys, fitness tracker information, and mobile phone task data. The daily surveys about symptoms and medications can potentially be completed with a single notification swipe, sleep and activity data can be collected passively using HealthKit, and the mobile phone tasks include finger tapping, gait analysis, and additional cognitive and motor tasks. The data collected provide an opportunity for the development of novel machine learning methods for learning about chronic disease and about novel sensor types. The app will soon be released to the Apple App Store and piloted in clinic at Duke University. If time remains, we will briefly look at some of the other healthcare work currently going on at Duke on using Gaussian process models on EHR data.
Coauthors: Kai Fan, Allison Aiello, Lee Hartsell, Joe Futoma, and Sanjay Hariharan INI 1

10:30 to 11:00 Morning Coffee

11:00 to 11:45 James Scott (University of Texas at Austin) Detecting radiological anomalies
Radiologically active materials are used widely in industry, medicine, and research. Yet an unsecured, lost, or stolen radiological source can present a major threat to public safety. To deal with the potential environmental and security hazards posed by such a scenario, government agencies use various detection procedures at ports of entry to their countries.
Moreover, security agencies that try to prevent terrorist attacks are keenly interested in the problem of identifying and locating stolen or smuggled radiation samples. Even at the local level, police departments have shown increasing interest in deploying systems for detecting anomalous radiological sources.

Statistically speaking, the radiological anomaly-detection problem is one of detecting a change in distribution. Sequential data are collected from a sensor that measures the energies of arriving gamma rays. These observed energies are random variables drawn from an energy spectrum, which is a probability distribution over the set of possible gamma-ray energies. The question is whether those measured energies come from the normal background spectrum, and are therefore harmless, or from an anomalous spectrum due to the presence of a nearby radiological source. In this talk I will describe some new statistical methods we have developed for dealing with two major challenges in this setting: 1) characterizing the spatially varying background radiation in dense urban areas; and 2) flagging anomalous readings from spatially distributed sensor networks in a statistically rigorous way.

This is joint work with Wesley Tansey, Oscar Padilla, Alex Reinhart, and Alex Athey. INI 1

11:45 to 12:30 Po-Ling Loh (University of Wisconsin-Madison) Community recovery in weighted stochastic block models
Co-authors: Min Xu (University of Pennsylvania), Varun Jog (University of Wisconsin - Madison)
Identifying communities in a network is an important problem in many fields, including social science, neuroscience, military intelligence, and genetic analysis. In the past decade, the Stochastic Block Model (SBM) has emerged as one of the most well-studied and well-understood statistical models for this problem. Yet the SBM has an important limitation: it assumes that each network edge is drawn from a Bernoulli distribution.
This is rather restrictive, since weighted edges are fairly ubiquitous in scientific applications, and disregarding edge weights naturally results in a loss of valuable information. In this paper, we study a weighted generalization of the SBM, in which observations are collected in the form of a weighted adjacency matrix, and the weight of each edge is generated independently from a distribution determined by the community memberships of its endpoints. We propose and analyze a novel algorithm for community estimation in the weighted SBM based on various subroutines involving transformation, discretization, spectral clustering, and appropriate refinements. We prove that our procedure is optimal in terms of its rate of convergence, and that the misclassification rate is characterized by the Rényi divergence between the distributions of within-community edges and between-community edges. In the regime where the edges are sparse, we also establish sharp thresholds for exact recovery of the communities. Our theoretical results substantially generalize previously established thresholds derived specifically for unweighted block models. Furthermore, our algorithm introduces a principled and computationally tractable method of incorporating edge weights into the analysis of network data. INI 1

12:30 to 13:30 Lunch @ Wolfson Court

13:30 to 14:15 Matti Vihola (University of Jyväskylä) Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo and exact approximation
We consider an importance sampling (IS) type estimator based on Markov chain Monte Carlo (MCMC) which targets an approximate marginal distribution. The IS approach, based on unbiased estimators, is consistent, and provides a natural alternative to delayed acceptance (DA) pseudo-marginal MCMC. The IS approach enjoys many benefits over DA, including straightforward parallelisation.
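As a toy illustration of the IS correction just described (run MCMC on a cheap approximate marginal, then reweight each state by the exact-to-approximate density ratio), the sketch below uses exactly computable weights; the specific targets, proposal scale, and chain length are assumptions made for the demo, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

log_pi   = lambda th: -0.5 * th**2                    # "exact" marginal: N(0, 1)
log_pi_a = lambda th: -0.5 * ((th - 0.3) / 1.2) ** 2  # cheap approximation (assumed)

# 1) Random-walk Metropolis chain targeting the *approximate* marginal.
n_iter, th = 50_000, 0.0
chain = np.empty(n_iter)
for i in range(n_iter):
    prop = th + rng.normal()
    if np.log(rng.uniform()) < log_pi_a(prop) - log_pi_a(th):
        th = prop
    chain[i] = th

# 2) IS correction: reweight each state by pi/pi_a.  Here the weights are
#    exact; in the talk's setting they would be replaced by unbiased
#    estimates, which keeps the estimator consistent.
logw = log_pi(chain) - log_pi_a(chain)
w = np.exp(logw - logw.max())
post_mean = np.sum(w * chain) / np.sum(w)  # self-normalised estimate of E[theta]
```

The weighted estimate recovers the mean of the exact marginal (zero here) even though the chain itself targets the shifted, inflated approximation.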
We focus on a Bayesian latent variable model setting, where the MCMC operates on the hyperparameters and the latent variable distributions are approximated. INI 1

14:15 to 15:00 Arnaud Doucet (University of Oxford) The Correlated Pseudo-Marginal Method INI 1

15:00 to 15:30 Afternoon Tea

15:30 to 16:15 Sinan Yildirim (Sabanci University) Scalable Monte Carlo inference for state-space models
Co-authors: Christophe Andrieu (University of Bristol), Arnaud Doucet (University of Oxford)
We present an original simulation-based method to estimate likelihood ratios efficiently for general state-space models. Our method relies on a novel use of the conditional Sequential Monte Carlo (cSMC) algorithm introduced in Andrieu et al. (2010) and presents several practical advantages over standard approaches. The ratio is estimated using a single source of randomness, instead of estimating the two likelihood terms involved separately. Beyond the variance reduction one may expect in general from this type of approach, an important point here is that the variance of this estimator decreases as the distance between the likelihood parameters decreases. We show how this can be exploited in the context of Markov chain Monte Carlo (MCMC) algorithms, leading to the development of a new class of exact-approximate MCMC methods for Bayesian static parameter inference in state-space models. We show through simulations that, in contrast to the Particle Marginal Metropolis–Hastings (PMMH) algorithm of Andrieu et al. (2010), the computational effort required by this novel MCMC scheme scales favourably for large data sets.
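The single-source-of-randomness idea behind this estimator can be illustrated with plain Monte Carlo (not the cSMC construction from the talk); the toy quantities, parameter values, and sample size below are assumptions made for the demo.

```python
import numpy as np

# Estimate the ratio Z(theta') / Z(theta), where Z(theta) = E[exp(theta * X)]
# with X ~ N(0, 1), so the true value is Z(theta) = exp(theta^2 / 2).
rng = np.random.default_rng(1)
n = 100_000
theta, theta_p = 1.0, 1.05          # two nearby parameter values

x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# (a) Two independent estimates: the error does not shrink as theta' -> theta.
ratio_indep = np.exp(theta_p * x1).mean() / np.exp(theta * x2).mean()

# (b) One source of randomness: the same draws enter numerator and
#     denominator, so their errors cancel as theta' -> theta.
ratio_common = np.exp(theta_p * x1).mean() / np.exp(theta * x1).mean()

true_ratio = np.exp(0.5 * (theta_p**2 - theta**2))
```

As the two parameter values approach each other, the common-randomness estimator concentrates around the true ratio, which is exactly the property exploited in the MCMC scheme above.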
INI 1

16:15 to 17:00 Chris Sherlock (Lancaster University) The Discrete Bouncy Particle Sampler
The Bouncy Particle Sampler (BPS) is a continuous-time, non-reversible MCMC algorithm that shows great promise in efficiently sampling from certain high-dimensional distributions: a particle moves with a fixed velocity except that occasionally it "bounces" off the hyperplane perpendicular to the gradient of the target density. One practical difficulty is that, for each specific target distribution, a locally valid upper bound on the component of the gradient in the direction of movement must be found so as to allow simulation of the bounce times via Poisson thinning; for efficient implementation this bound should also be tight. In dimension $d=1$, the discrete-time version of the Bouncy Particle Sampler (and, equivalently, of the Zig-Zag sampler, another continuous-time, non-reversible algorithm) is known to consist of fixing a time step, $\Delta t$, and proposing a shift of $v \Delta t$, which is accepted with a probability depending on the ratio of the target evaluated at the proposed and current positions; on rejection the velocity is reversed. We present a discrete-time version of the BPS that is valid in any dimension $d\ge 1$ and whose limit (as $\Delta t\downarrow 0$) is the rejection-free BPS. The Discrete BPS has the mixing advantages of non-reversible algorithms, but does not require an upper bound on a Poisson intensity and so is straightforward to apply to complex targets, such as those which can be evaluated pointwise but for which general properties, such as local or global Lipschitz bounds on derivatives, cannot be obtained. [Joint work with Dr. Alex Thiery.] INI 1
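For $d=1$, the discrete-time scheme spelled out in the abstract (deterministic shift $v \Delta t$, Metropolis-style accept/reject, velocity reversal on rejection) can be sketched as follows; the target, step size, and run length are arbitrary choices for the illustration.

```python
import numpy as np

def discrete_bps_1d(log_target, x0, dt, n_steps, rng):
    """Discrete-time bouncy particle / zig-zag scheme in d = 1: propose a
    deterministic shift v*dt, accept it with a Metropolis ratio, and
    reverse the velocity v on rejection."""
    x, v = x0, 1.0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x_prop = x + v * dt
        if np.log(rng.uniform()) < log_target(x_prop) - log_target(x):
            x = x_prop      # accept: keep moving in the same direction
        else:
            v = -v          # reject: "bounce" by reversing the velocity
        samples[i] = x
    return samples

# Illustration on a standard normal target.
rng = np.random.default_rng(0)
draws = discrete_bps_1d(lambda x: -0.5 * x * x, x0=0.0, dt=0.5,
                        n_steps=50_000, rng=rng)
```

Note that, as the abstract emphasises, no bound on a Poisson intensity is needed here: the scheme only evaluates the log target pointwise.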