09:45 to 10:20 Registration

10:20 to 10:50 Morning Coffee

10:50 to 11:00 Welcome from Christie Marr (INI Deputy Director)

11:00 to 11:45 Iain Johnstone (Stanford University): Eigenstructure in high dimensional random effects models
The eigenstructure of i.i.d. samples from high dimensional data is known to show phenomena unexpected from experience with low dimensions: eigenvalue spreading, bias and eigenvector inconsistency. Motivated by some problems in quantitative genetics, we now consider high dimensional data with more than one level of variation, and more specifically the spectral properties of estimators of the various components of variance matrices. We briefly describe bulk and edge eigenvalue distributions in this setting, and then focus more attention on properties and estimation of 'spiked' models. This is joint work with Zhou Fan and Yi Sun.
INI 1

11:45 to 12:30 Philippe Rigollet (Massachusetts Institute of Technology): Uncoupled isotonic regression via minimum Wasserstein deconvolution
Isotonic regression is a standard problem in shape constrained estimation where the goal is to estimate an unknown nondecreasing regression function $f$ from independent pairs $(x_i, y_i)$ where $\mathbb{E}[y_i]=f(x_i)$, $i=1,\ldots,n$. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart, where one is given only the unpaired sets $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moment conditions on $y_i$, together with an efficient algorithm. Both upper and lower bounds are articulated around moment-matching arguments that are also pertinent to learning mixtures of distributions and to deconvolution. [Joint work with Jonathan Weed (MIT)]
INI 1

12:30 to 14:00 Buffet Lunch at INI

14:00 to 14:45 Matthew Stephens (University of Chicago): On applications of Empirical Bayes approaches to the Normal Means problem
The normal means problem is very simple: given normally-distributed observations with known variances and unknown means, estimate the means. That is, given $X_j \sim N(\theta_j, \sigma_j^2)$, estimate $\theta_j$. A key idea is that one can do better than the maximum likelihood estimates $\hat{\theta}_j = X_j$, in particular by use of appropriate "shrinkage" estimators. One attractive way to perform shrinkage estimation in practice is to use Empirical Bayes methods: that is, to assume that the $\theta_j$ are independent and identically distributed from some distribution $g$ that is to be estimated from the data. Then, given such an estimate $\hat{g}$, the posterior distributions of the $\theta_j$ can be computed to perform inference. We call this the "Empirical Bayes Normal Means" (EBNM) problem. Despite its simplicity, solving the EBNM problem has a wide range of practical applications. Here we present some flexible non-parametric approaches we have recently developed for solving the EBNM problem, and describe their application to several different settings: false discovery rate (FDR) estimation, non-parametric smoothing, and sparse matrix factorization problems (i.e., sparse factor analysis and sparse principal components analysis).
INI 1
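As a concrete illustration of the EBNM setup in the abstract above, here is a minimal sketch in which the prior $g$ is restricted to a single zero-mean normal whose variance is fitted by moment matching; the talk's non-parametric families are far more flexible, and the function name is ours, for illustration only.

```python
import numpy as np

def ebnm_normal(x, s):
    """Empirical Bayes normal means with g restricted to N(0, tau^2),
    tau^2 estimated by moment matching.  A deliberately simple stand-in
    for the flexible non-parametric priors discussed in the talk."""
    # Marginally X_j ~ N(0, tau^2 + s_j^2), so E[X_j^2 - s_j^2] = tau^2.
    tau2 = max(float(np.mean(x ** 2 - s ** 2)), 0.0)
    # Posterior mean of theta_j shrinks each observation toward zero.
    return tau2 / (tau2 + s ** 2) * x

rng = np.random.default_rng(0)
s = np.full(500, 2.0)                         # known noise sds
theta = rng.normal(0.0, 1.0, size=500)        # true means
x = theta + rng.normal(0.0, 1.0, size=500) * s
theta_hat = ebnm_normal(x, s)
print(np.mean((theta_hat - theta) ** 2))      # well below the MLE's MSE of ~4
```

The shrinkage factor tau^2/(tau^2 + s_j^2) is exactly the posterior-mean weight for a normal prior; richer choices of g change this weight but not the overall estimate-then-shrink recipe.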
14:45 to 15:30 Jana Jankova (University of Cambridge): Asymptotic Inference for Eigenstructure of Large Covariance Matrices
A vast number of methods have been proposed in the literature for point estimation of the eigenstructure of covariance matrices in high-dimensional settings. In this work, we study uncertainty quantification and propose methodology for inference and hypothesis testing for individual loadings of the covariance matrix. We base our methodology on a Lasso-penalized M-estimator which, despite non-convexity, may be solved by a polynomial-time algorithm such as coordinate or gradient descent. Our results provide theoretical guarantees on asymptotic normality of the new estimators and may be used for valid hypothesis testing and variable selection. These results are achieved under a sparsity condition relating the number of non-zero loadings, the sample size, the dimensionality of the covariance matrix and its spectrum. This talk is based on joint work with Sara van de Geer.
INI 1

15:30 to 16:00 Afternoon Tea

16:00 to 16:45 Flori Bunea (Cornell University): A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics
Topic models have become popular for the analysis of data consisting of a collection of n independent multinomial observations. The model links all cell probabilities, collected in a p x n matrix, via the assumption that this matrix can be factorized as the product of two non-negative matrices A and W. Topic models were originally developed in text mining, where one browses through n documents, based on a dictionary of p words, covering K topics. In this terminology, the matrix A is called the word-topic matrix and is the main target of estimation. It can be viewed as a matrix of conditional probabilities, and it is uniquely defined under appropriate separability assumptions, discussed in this talk. Notably, the unique A is required to satisfy what is commonly known as the anchor word assumption, under which A has an unknown number of rows respectively proportional to K-dimensional canonical basis vectors. The indices of such rows are referred to as anchor words. Recent computationally feasible algorithms, with theoretical guarantees, constructively utilize this assumption by linking the estimation of the set of anchor words with that of estimating the K vertices of a simplex. This crucial step in the estimation of A requires K to be known, and cannot be easily extended to the more realistic set-up in which K is unknown. In this talk, we take a different view on anchor word estimation, and on the estimation of the word-topic matrix A. We propose a new method of estimation in topic models that is not a variation on the existing simplex-finding algorithms, and that estimates K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any n, p, K and document length N, and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We illustrate, via simulations, that the new algorithm is faster and more accurate than the current ones, even though we start out with the computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.
INI 1
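The simplex-vertex view of anchor words mentioned in the abstract above can be illustrated with the classical successive projection heuristic; this is a textbook baseline in which K is assumed known, not the talk's new estimator (which estimates K from the data), and the function name is ours.

```python
import numpy as np

def successive_projection(Q, K):
    """Greedy anchor-word search: repeatedly pick the row of Q farthest
    from the span of rows chosen so far; those rows are the candidate
    simplex vertices (anchor words).  K is assumed known here."""
    R = Q / Q.sum(axis=1, keepdims=True)   # put rows on the probability simplex
    anchors = []
    for _ in range(K):
        i = int(np.argmax(np.linalg.norm(R, axis=1)))
        anchors.append(i)
        v = R[i] / np.linalg.norm(R[i])
        R = R - np.outer(R @ v, v)         # project all rows off that direction
    return anchors

# Q would typically be an estimated p x p word co-occurrence matrix.
```

The extreme rows of the normalized matrix are exactly the rows proportional to canonical basis vectors under the anchor word assumption, which is why a farthest-point search recovers them.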
17:00 to 18:00 Welcome Wine Reception & Posters at INI

09:00 to 09:45 Guy Bresler (Massachusetts Institute of Technology): Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure
The prototypical high-dimensional statistics problem entails finding a structured signal in noise. Many of these problems exhibit a curious phenomenon: the amount of data needed by all known computationally efficient algorithms far exceeds what is needed for inefficient algorithms that search over all possible structures. A line of work initiated by Berthet and Rigollet in 2013 has aimed to explain these gaps by reducing from conjecturally hard problems in computer science. However, the delicate nature of average-case reductions has limited the applicability of this approach. In this work we introduce several new techniques to give a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture. These include tight lower bounds for Planted Independent Set, Planted Dense Subgraph, Sparse Spiked Wigner and Sparse PCA, as well as for new models that we introduce. Joint work with Matthew Brennan and Wasim Huleihel.
INI 1

09:45 to 10:30 Chao Gao (University of Chicago): Reduced Isotonic Regression
Consider an $n$-dimensional vector $X$ with mean $\theta^*$. In this talk, we consider $\theta^*$ that is both nondecreasing and has a piecewise constant structure. We establish the exact minimax rate of estimating such monotone functions, and thus give a non-trivial answer to an open problem in the shape-constrained analysis literature. The minimax rate involves an interesting iterated logarithmic dependence on the dimension. We then develop a penalized least-squares procedure for estimating $\theta^*$ adaptively. This estimator is shown to achieve the derived minimax rate, without knowledge of the number of pieces, in linear time. We further allow the model to be misspecified and derive oracle inequalities with the optimal rates for the proposed estimator. This is joint work with Fang Han and Cun-hui Zhang.
INI 1

10:30 to 11:00 Morning Coffee

11:00 to 11:45 Ryan Tibshirani (Carnegie Mellon University): Dykstra's Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions
We study connections between Dykstra's algorithm for projecting onto an intersection of convex sets, the augmented Lagrangian method of multipliers or ADMM, and block coordinate descent. We prove that coordinate descent for a regularized regression problem, in which the penalty is a separable sum of support functions, is exactly equivalent to Dykstra's algorithm applied to the dual problem. ADMM on the dual problem is also seen to be equivalent, in the special case of two sets, with one being a linear subspace. These connections, aside from being interesting in their own right, suggest new ways of analyzing and extending coordinate descent. For example, from existing convergence theory on Dykstra's algorithm over polyhedra, we discern that coordinate descent for the lasso problem converges at an (asymptotically) linear rate. We also develop two parallel versions of coordinate descent, based on the Dykstra and ADMM connections. Finally, we discuss the implications of this work for backfitting in additive models.
INI 1
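For readers unfamiliar with Dykstra's algorithm from the abstract above, a minimal sketch: unlike plain alternating projections, the per-set correction terms make the iterates converge to the exact Euclidean projection onto the intersection. The example sets and function names are ours.

```python
import numpy as np

def dykstra(y, projections, n_iter=100):
    """Dykstra's algorithm: project y onto the intersection of convex
    sets, each supplied as its own projection operator.  The correction
    terms distinguish this from plain alternating projections."""
    x = y.copy()
    corrections = [np.zeros_like(y) for _ in projections]
    for _ in range(n_iter):
        for i, proj in enumerate(projections):
            z = x + corrections[i]       # re-add this set's correction
            x = proj(z)
            corrections[i] = z - x       # store what the projection removed
    return x

# Example: intersect {x >= 0} with {sum(x) = 1}, i.e. the simplex.
proj_nonneg = lambda x: np.maximum(x, 0.0)
proj_affine = lambda x: x + (1.0 - x.sum()) / x.size
print(dykstra(np.array([0.9, -0.4, 0.8]), [proj_nonneg, proj_affine]))
# -> approximately [0.55, 0.0, 0.45], the true simplex projection
```

The talk's dual correspondence identifies each coordinate-descent block update with one of these per-set projection steps.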
11:45 to 12:30 Peter Bickel (University of California, Berkeley): Two network scale challenges: constructing and fitting hierarchical block models, and fitting large block models using the mean field method
Work with S. Bhattacharyya, T. Li, E. Levina, S. Mukherjee and P. Sarkar. Networks are a complex type of structure arising in many applications. They are usually represented by a graph, with possibly weighted edges, plus additional covariates (such as directions). Block models have been studied for some time as basic approximations to ergodic stationary probability models for single graphs, and a huge number of fitting methods have been developed for these models, some of which we will touch on. The mean field method, in which an increasing number of parameters must be fitted, is used not only for multiple membership block models but also in applications such as LDA; if the graph is too large, poor behaviour of the method can be seen. We have developed what we call "patch methods" for fitting, which help both computationally and inferentially in such situations, but much further analysis is needed. It is intuitively clear but mathematically unclear how knowledge of the model having nested scales helps in fitting large-scale as opposed to small-scale parameters. We will discuss this issue through an example.
INI 1

12:30 to 14:00 Buffet Lunch at INI

14:00 to 14:45 Marten Herman Wegkamp (Cornell University): Adaptive estimation of the rank of the regression coefficient matrix
We consider the multivariate response regression problem with a regression coefficient matrix of low, unknown rank. In this setting, we analyze a new criterion for selecting the optimal reduced rank. This criterion does not require estimation of the unknown variance of the noise, nor does it depend on a delicate choice of a tuning parameter. We develop an iterative, fully data-driven procedure that adapts to the optimal signal-to-noise ratio. This procedure finds the true rank in a few steps with overwhelming probability. At each step, our estimate increases, while at the same time it does not exceed the true rank. Our finite sample results hold for any sample size and any dimension, even when the number of responses and of covariates grow much faster than the number of observations. An extensive simulation study confirms our theoretical findings in both low and high dimensional settings. This is joint work with Xin Bing.
INI 1

14:45 to 15:30 Maryam Fazel (University of Washington): Competitive Online Algorithms for Budgeted Allocation with Application to Online Experiment Design
We consider an online resource allocation problem, where the goal is to maximize a function of a positive semidefinite (PSD) matrix with a scalar budget constraint. The problem data arrives online, and the algorithm needs to make an irrevocable decision at each step. Of particular interest are classic experiment design problems in the online setting, with the algorithm deciding whether to allocate budget to each experiment as new experiments become available sequentially. We analyze two classes of primal-dual algorithms and provide bounds on their competitive ratios. Our analysis relies on a smooth surrogate of the objective function that needs to satisfy a new diminishing-returns property. We then formulate a convex optimization problem to directly optimize this competitive ratio bound; this design problem can be solved offline before the data start arriving. We give examples of computing the smooth surrogate for D-optimal and A-optimal experiment design. This is joint work with Reza Eghbali and James Saunderson.
INI 1
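A toy version of the online budgeted D-optimal setting from the abstract above, using a naive fixed-threshold acceptance rule rather than the talk's primal-dual algorithms; the marginal log-det gain comes from the matrix determinant lemma, and all names and constants here are illustrative.

```python
import numpy as np

def online_dopt(vectors, costs, budget, threshold):
    """Accept an arriving experiment x (rank-one update x x^T) if its
    marginal gain in log det per unit cost clears a fixed threshold.
    Illustrates the online setting only; no competitive-ratio guarantee
    is claimed for this rule."""
    d = vectors[0].shape[0]
    M = 1e-3 * np.eye(d)                  # regularized information matrix
    spent, chosen = 0.0, []
    for t, (x, c) in enumerate(zip(vectors, costs)):
        # Matrix determinant lemma: log det(M + x x^T) - log det(M)
        gain = np.log(1.0 + x @ np.linalg.solve(M, x))
        if spent + c <= budget and gain / c >= threshold:
            M += np.outer(x, x)
            spent += c
            chosen.append(t)
    return chosen, M
```

The irrevocability in the abstract is visible here: once an experiment is skipped or bought, the decision is never revisited, which is what makes the competitive analysis non-trivial.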
15:30 to 16:00 Afternoon Tea

16:00 to 16:45 Alex d'Aspremont (CNRS - Ecole Normale Superieure Paris): An Approximate Shapley-Folkman Theorem
With Thomas Kerdreux and Igor Colin. The Shapley-Folkman theorem shows that Minkowski averages of uniformly bounded sets tend to be convex when the number of terms in the sum becomes much larger than the ambient dimension. In optimization, Aubin and Ekeland (1976) show that this produces an a priori bound on the duality gap of separable nonconvex optimization problems involving finite sums. This bound is highly conservative and depends on unstable quantities; we relax it in several directions to show that nonconvexity can have a much milder impact on finite sum minimization problems such as empirical risk minimization and multi-task classification. As a byproduct, we show a new version of Maurey's classical approximate Carathéodory lemma where we sample a significant fraction of the coefficients, without replacement.
INI 1
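Maurey's sampling argument mentioned at the end of the abstract above can be sketched as follows; the talk's refinement samples a large fraction of the coefficients without replacement, whereas this baseline uses the classical with-replacement scheme.

```python
import numpy as np

def approx_caratheodory(V, w, k, rng):
    """Maurey's probabilistic argument: a point x = V @ w in the convex
    hull of the columns of V is approximated by the average of k columns
    drawn i.i.d. with probabilities w.  Expected squared error decays
    like diam^2 / k, independent of the ambient dimension."""
    idx = rng.choice(V.shape[1], size=k, p=w)
    return V[:, idx].mean(axis=1)

rng = np.random.default_rng(1)
V = rng.normal(size=(50, 200))        # 200 points in R^50
w = rng.dirichlet(np.ones(200))       # convex combination weights
x = V @ w
for k in (10, 100, 1000):
    print(k, np.linalg.norm(approx_caratheodory(V, w, k, rng) - x))
```

Running this shows the error shrinking roughly like 1/sqrt(k), which is the dimension-free rate the lemma guarantees.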
09:00 to 09:45 Urvashi Oswal (University of Wisconsin-Madison): Selection and Clustering of Correlated variables using OWL/GrOWL regularizers
In high-dimensional linear regression problems, it is likely that several covariates are highly correlated; e.g., in fMRI data, certain voxels or brain regions may have very correlated activation patterns. Using standard sparsity-inducing regularization (such as the Lasso) in such scenarios is known to be unsatisfactory, as it leads to the selection of only a subset of the highly correlated covariates. For engineering purposes and/or scientific interpretability, it is often desirable to explicitly identify all of the covariates that are relevant for modeling the data. In this talk, I will present clustering properties and error bounds of Ordered Weighted $\ell_1$ (OWL) regularization for linear regression, and a group variant for multi-task regression called Group OWL (GrOWL). I will present the application of OWL/GrOWL in three settings: (1) brain networks, (2) subspace clustering and (3) deep learning. I will demonstrate, in theory and experiments, how OWL/GrOWL deal with strongly correlated covariates by automatically clustering and averaging regression coefficients associated with those covariates. This is joint work with Robert Nowak, Mário Figueiredo, Tim Rogers and Chris Cox.
INI 1

09:45 to 10:30 Elizaveta Levina (University of Michigan): Matrix completion in network analysis
Matrix completion is an active area of research in itself, and a natural tool to apply to network data, since many real networks are observed incompletely and/or with noise. However, developing matrix completion algorithms for networks requires taking the network structure into account. This talk will discuss three examples of matrix completion used for network tasks. First, we discuss the use of matrix completion for cross-validation on network data, a long-standing problem in network analysis. The two other examples focus on reconstructing incompletely observed networks, with structured missingness resulting from network sampling mechanisms. One scenario we consider is egocentric sampling, where a set of nodes is selected first and then their connections to the entire network are observed. Another scenario focuses on data from surveys, where people are asked to name a given number of friends. We show that matrix completion, when used appropriately, can generally be very helpful in solving network problems.
INI 1
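As background for the matrix-completion talk above, a generic low-rank completion baseline (Soft-Impute, in the style of Mazumder, Hastie and Tibshirani, 2010); the network-aware procedures in the talk adapt ideas of this kind to structured missingness, which this sketch does not attempt.

```python
import numpy as np

def soft_impute(X, mask, lam, n_iter=200):
    """Iterative SVD soft-thresholding for low-rank matrix completion.
    X holds observed values (anything at unobserved entries is ignored);
    mask is True where entries are observed."""
    Z = np.zeros_like(X)
    for _ in range(n_iter):
        filled = np.where(mask, X, Z)          # keep observed entries fixed
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)           # shrink singular values
        Z = (U * s) @ Vt
    return Z

# Example: recover a rank-3 matrix from ~50% noisy observed entries.
rng = np.random.default_rng(0)
L = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 40))
mask = rng.random((60, 40)) < 0.5
X = np.where(mask, L + 0.1 * rng.normal(size=L.shape), 0.0)
L_hat = soft_impute(X, mask, lam=1.0)
print(np.linalg.norm(L_hat - L) / np.linalg.norm(L))   # small relative error
```

For networks, the observed mask is not uniformly random (egocentric sampling, fixed-choice surveys), which is precisely the complication the talk addresses.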
10:30 to 11:00 Morning Coffee

11:00 to 11:45 Garvesh Raskutti (University of Wisconsin-Madison): Estimating sparse additive auto-regressive network models
Consider a multivariate time series, which may correspond to spike train responses for multiple neurons in a brain, crime event data across multiple regions, and many other settings. An important challenge associated with these time series models is to estimate an influence network between the d variables, especially when the number of variables d is large, meaning we are in the high-dimensional setting. Prior work has focused on parametric vector auto-regressive models. However, parametric approaches are somewhat restrictive in practice. In this paper, we use the non-parametric sparse additive model (SpAM) framework to address this challenge. Using a combination of beta- and phi-mixing properties of Markov chains and empirical process techniques for reproducing kernel Hilbert spaces (RKHSs), we provide upper bounds on mean-squared error in terms of the sparsity s, the logarithm of the dimension log d, the number of time points T, and the smoothness of the RKHSs. Our rates are sharp up to logarithmic factors in many cases. We also provide numerical experiments that support our theoretical results and display potential advantages of using our non-parametric SpAM framework on a Chicago crime dataset.
INI 1

11:45 to 12:30 Peter Bartlett (University of California, Berkeley): Representation, optimization and generalization properties of deep neural networks
Deep neural networks have improved the state-of-the-art performance for prediction problems across an impressive range of application areas. This talk describes some recent results in three directions. First, we investigate the impact of depth on representational properties of deep residual networks, which compute near-identity maps at each layer, showing how their representational power improves with depth and that the functional optimization landscape has the desirable property that stationary points are optimal. Second, we study the implications for optimization in deep linear networks, showing how the success of a family of gradient descent algorithms that regularize towards the identity function depends on a positivity condition of the regression function. Third, we consider how the performance of deep networks on training data compares to their predictive accuracy; we demonstrate deviation bounds that scale with a certain "spectral complexity" (a toy proxy for this quantity is sketched after the schedule), and we compare the behavior of these bounds with the observed performance of these networks in practical problems. Joint work with Steve Evans, Dylan Foster, Dave Helmbold, Phil Long, and Matus Telgarsky.
INI 1

12:30 to 14:00 Buffet Lunch at INI

14:00 to 17:00 Free Afternoon
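Referring back to the "spectral complexity" in Bartlett's abstract: its dominant factor is, roughly, the product of the layers' spectral norms. The sketch below computes only that crude proxy, not the exact quantity from the paper, which also involves per-layer correction terms.

```python
import numpy as np

def spectral_complexity_proxy(weights):
    """Product of layer spectral norms (largest singular values): the
    leading factor in spectral-complexity margin bounds for deep nets.
    A crude proxy; the full bound has additional per-layer terms."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)),
          rng.normal(size=(32, 32)),
          rng.normal(size=(32, 10))]
print(spectral_complexity_proxy(layers))
```

Because the proxy is a product over layers, rescaling one layer up and another down leaves it unchanged, matching the scale-invariance of the networks these bounds describe.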