# Timetable (STSW02)

## Statistics of geometric features and new data types

Monday 19th March 2018 to Friday 23rd March 2018

### Monday 19th March 2018

**09:00 to 09:50** Registration

**09:50 to 10:00** Welcome from David Abrahams (INI Director)

**10:00 to 11:00** Victor Panaretos (EPFL - Ecole Polytechnique Fédérale de Lausanne): *Procrustes Analysis of Covariance Operators and Optimal Transport of Gaussian Processes*

Covariance operators are fundamental in functional data analysis, providing the canonical means to analyse functional variation via the celebrated Karhunen-Loève expansion. These operators may themselves be subject to variation, for instance in contexts where multiple functional populations are to be compared. Statistical techniques to analyse such variation are intimately linked with the choice of metric on covariance operators, and the intrinsic infinite-dimensionality of these operators. We will describe the manifold-like geometry of the space of trace-class infinite-dimensional covariance operators and associated key statistical properties, under the recently proposed infinite-dimensional version of the Procrustes metric. In particular, we will identify this space with that of centred Gaussian processes equipped with the Wasserstein metric of optimal transportation. The identification allows us to provide a description of those aspects of the geometry that are important in terms of statistical inference, and establish key properties of the Fréchet mean of a random sample of covariances, as well as generative models that are canonical for such metrics. The latter will allow us to draw connections with the problem of registration of warped functional data. Based on joint work with V. Masarotto (EPFL) and Y. Zemel (Göttingen).
Room: INI 1

**11:00 to 11:30** Morning Coffee

**11:30 to 12:30** Jim Ramsay (McGill University): *Dynamic Smoothing Meets Gravity*

Authors: Jim Ramsay (McGill University), Michelle Carey (University College Dublin) and Juan Li (McGill University)

Systems of differential equations are often used to model buffering processes that modulate a non-smooth high-energy input so as to produce an output that is smooth and that distributes the energy load over time and space. Handwriting is buffered in this way. We show that the smooth, complex script that spells "statistics" in Chinese can be represented as a buffered version of a series of 46 equal-interval step inputs. The buffer consists of three undamped oscillating springs, one for each orthogonal coordinate. The periods of oscillation vary slightly across the three coordinates in a way that reflects the masses that are moved by muscle activations. Our analyses of data on juggling three balls and on lip motion during speech confirm that this model works for a wide variety of human motions. We use the term "dynamic smoothing" for the estimation of a structured input functional object along with the buffer characteristics.

Room: INI 1

**12:30 to 13:30** Lunch @ Churchill College

**13:30 to 14:30** Bodhisattva Sen (Columbia University): *Adaptive Confidence Bands for Shape-Restricted Regression in Multi-dimension using Multiscale Tests*

Co-author: Pratyay Datta (Columbia University)

We consider a multidimensional (continuous) Gaussian white noise regression model. We define suitable multiscale tests in this scenario. Utilizing these tests, we construct confidence bands for the underlying regression function with guaranteed coverage probability, assuming that the underlying function is isotonic or convex. These confidence bands are shown to be asymptotically optimal in an appropriate sense. Computational issues will also be discussed.
Room: INI 1

**14:30 to 15:30** Richard Samworth (University of Cambridge): *Isotonic regression in general dimensions*

Co-authors: Qiyang Han (University of Washington), Tengyao Wang (University of Cambridge), Sabyasachi Chatterjee (University of Illinois)

We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^d$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-\min\{2/(d+2),\,1/d\}}$ in the empirical $L_2$ loss, up to poly-logarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of $(k/n)^{\min\{1,\,2/d\}}$, again up to poly-logarithmic factors. Previous results are confined to the case $d \leq 2$. Finally, we establish corresponding bounds (which are new even in the case $d=2$) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to poly-logarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate.

Room: INI 1

**15:30 to 16:00** Afternoon Tea

**16:00 to 17:00** Rainer von Sachs (Université Catholique de Louvain): *Intrinsic wavelet regression for curves and surfaces of Hermitian positive definite matrices*

Co-author: Joris Chau (ISBA, UC Louvain)

In multivariate time series analysis, non-degenerate autocovariance and spectral density matrices are necessarily Hermitian and positive definite.
An estimation methodology that preserves these properties is developed based on intrinsic wavelet transforms applied to nonparametric wavelet regression for curves in the non-Euclidean space of Hermitian positive definite matrices. Via intrinsic average-interpolation in a Riemannian manifold equipped with a natural invariant Riemannian metric, we derive the wavelet coefficient decay and linear wavelet thresholding convergence rates of intrinsically smooth curves. For nonparametric spectral density estimation specifically, an important property of the intrinsic linear or nonlinear wavelet spectral estimator under the invariant Riemannian metric is that it is independent of the choice of coordinate system of the time series, in contrast to most existing approaches. As a generalisation of this one-dimensional denoising of matrix-valued curves in the Riemannian manifold, we also present higher-dimensional intrinsic wavelet transforms, applied in particular to time-varying spectral estimation of non-stationary multivariate time series, i.e. surfaces of Hermitian positive definite matrices.

Related links:
- https://cran.r-project.org/web/packages/pdSpecEst/index.html (R package "pdSpecEst", v1.2.1, on CRAN)
- https://jchau.shinyapps.io/pdSpecEst/ (Shiny app "pdSpecEst")

Room: INI 1

**17:00 to 18:00** Welcome Wine Reception & Poster Session
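The von Sachs abstract works with intrinsic averages under "a natural invariant Riemannian metric" on positive definite matrices. As a minimal sketch, assuming this is the affine-invariant metric and using real symmetric 2×2 matrices in place of complex Hermitian ones, the code below computes the geodesic distance and the geodesic midpoint (the intrinsic average of two matrices) with the closed-form 2×2 matrix geometric mean. It is not the pdSpecEst implementation, and all function names are illustrative.

```python
import math

# 2x2 matrices are tuples of rows ((a, b), (c, d)). Assumption: real symmetric
# positive definite matrices stand in for the complex Hermitian ones of the talk.

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(m):
    d = det(m)
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))

def transpose(m):
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))

def riemannian_distance(a, b):
    """Affine-invariant distance sqrt(sum_i log(lam_i)^2), lam_i the eigenvalues of a^-1 b."""
    m = matmul(inverse(a), b)
    trace, d = m[0][0] + m[1][1], det(m)
    # a^-1 b is similar to a symmetric positive definite matrix: real, positive eigenvalues.
    lam1 = (trace + math.sqrt(max(trace * trace - 4.0 * d, 0.0))) / 2.0
    lam2 = d / lam1  # the two eigenvalues multiply to the determinant
    return math.hypot(math.log(lam1), math.log(lam2))

def geodesic_midpoint(a, b):
    """Matrix geometric mean a # b, the midpoint of the geodesic from a to b (2x2 closed form)."""
    alpha, beta = math.sqrt(det(a)), math.sqrt(det(b))
    # a/alpha and b/beta have unit determinant; for such 2x2 matrices
    # the geometric mean is (a + b) / sqrt(det(a + b)).
    s = tuple(tuple(a[i][j] / alpha + b[i][j] / beta for j in range(2)) for i in range(2))
    factor = math.sqrt(alpha * beta) / math.sqrt(det(s))
    return tuple(tuple(factor * v for v in row) for row in s)
```

Replacing A and B by C A Cᵀ and C B Cᵀ for any invertible C leaves `riemannian_distance` unchanged. This is the kind of independence from the coordinate system that the abstract highlights. The midpoint G sits at half the distance from each endpoint and satisfies G A⁻¹ G = B.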
### Tuesday 20th March 2018

**09:00 to 10:00** Robert Nowak (University of Wisconsin-Madison); (Toyota Technological Institute): *Learning Low-Dimensional Metrics*

This talk discusses the problem of learning a low-dimensional Euclidean metric from distance comparisons. Specifically, consider a set of $n$ items with high-dimensional features and suppose we are given a set of (possibly noisy) distance comparisons of the form $\operatorname{sign}(\operatorname{dist}(x,y) - \operatorname{dist}(x,z))$, where $x$, $y$ and $z$ are the features associated with three such items. The goal is to learn the distance function that generates such comparisons. The talk focuses on several key issues pertaining to the theoretical foundations of metric learning:

1. optimization methods for learning general low-dimensional (low-rank) metrics as well as sparse metrics;
2. upper and lower (minimax) bounds on prediction error;
3. quantification of the sample complexity of metric learning in terms of the dimension of the feature space and the dimension/rank of the underlying metric;
4. bounds on the accuracy of the learned metric relative to the underlying true generative metric.

Our results involve novel mathematical approaches to the metric learning problem and shed new light on the special case of ordinal embedding (aka non-metric multidimensional scaling). This is joint work with Lalit Jain and Blake Mason.

Room: INI 1

**10:00 to 11:00** Matthew Reimherr (Pennsylvania State University): *Manifold Data Analysis with Applications to High-Resolution 3D Imaging*

Many scientific areas are faced with the challenge of extracting information from large, complex, and highly structured data sets. A great deal of modern statistical work focuses on developing tools for handling such data. In this work we present a new subfield of functional data analysis (FDA), which we call Manifold Data Analysis, or MDA. MDA is concerned with the statistical analysis of samples where one or more of the variables measured on each unit is a manifold, thus resulting in as many manifolds as we have units.
We propose a framework that converts manifolds into functional objects, an efficient 2-step functional principal component method, and a manifold-on-scalar regression model. This work is motivated by an anthropological application involving 3D facial imaging data, which is discussed extensively throughout. The proposed framework is used to understand how individual characteristics, such as age and genetic ancestry, influence the shape of the human face.

Room: INI 1

**11:00 to 11:30** Morning Coffee

**11:30 to 12:30** Davide Pigoli (King's College London): *Speech as object data: exploring cross-linguistic changes in Romance languages*

Exploring phonetic change between languages is of particular importance in understanding the history and geographical spread of languages. While many studies have considered differences in textual form or in phonetic transcription, it is somewhat more difficult to analyse speech recordings in this manner, although speech is usually the dominant mode of transmission. Here, we propose a novel approach to explore phonetic changes, using log-spectrograms of speech recordings. After pre-processing the data to remove inherent individual differences, we identify time and frequency covariance functions as a feature of the language; in contrast, the mean depends mostly on the word that has been uttered. We use these means and covariances to postulate paths between languages, and we illustrate some preliminary results obtained when the model is applied to recordings of speakers of a few Romance languages. This is joint work with P.Z. Hadjipantelis, J.S. Coleman and J.A.D. Aston.

Room: INI 1

**12:30 to 13:30** Lunch @ Churchill College

**13:30 to 14:30** Michelle Carey (University College Dublin): *Uncertainty quantification for Geo-spatial process*

Co-author: Prof. James Ramsay

Geospatial data are observations of a process that are collected together with their geographical location.
This type of data is abundant in many scientific fields; examples include population censuses, social and demographic data (health, justice, education), economic data (business surveys, trade, transport, tourism, agriculture, etc.) and environmental (atmospheric and oceanographic) data. Such data are often distributed over irregularly shaped spatial domains with complex boundaries and interior holes. Modelling approaches must account for the spatial dependence over these irregular domains as well as describe their temporal evolution. Dynamic systems modelling has huge potential in statistics, as evidenced by the amount of activity in functional data analysis. Many seemingly complex forms of functional variation can be more simply represented as a set of differential equations, either ordinary or partial. In this talk, I will present a class of semiparametric regression models with differential regularization in the form of PDEs. This methodology will be called Data2PDE (“Data to Partial Differential Equations”). Data2PDE characterizes spatial processes that evolve over complex geometries in the presence of uncertain, incomplete and often noisy observations, together with prior knowledge, expressed as a PDE, about the physical principles governing the process.

Room: INI 1

**14:30 to 15:30** Ian Dryden (University of Nottingham): *Object Data Driven Discovery*

Object data analysis is an important tool in the many disciplines where the data have much richer structure than the usual numbers or vectors. An initial question to ask is: what are the most basic data units, that is, what are the atoms of the data? We give an introduction to this topic, where the statistical analysis of object data has a wide variety of applications. An important aspect of the analysis is to reduce the dimension to a small number of key features while respecting the geometry of the manifold in which the objects lie.
Three case studies are given which exemplify the types of issues that are encountered: (i) describing changes in variability in damaged DNA; (ii) testing for geometrical differences in carotid arteries, where patients are at high or low risk of aneurysm; (iii) clustering enzymes observed over time. In all three applications the structure of the data manifolds is important, in particular the manifold of covariance matrices, unlabelled size-and-shape space and shape space.

Room: INI 1

**15:30 to 16:00** Afternoon Tea

**16:00 to 17:00** Fang Yao (University of Toronto): *Functional regression on manifold with contamination*

We propose a new perspective on functional regression with a predictor process via the concept of a manifold that is intrinsically finite-dimensional and embedded in an infinite-dimensional functional space, where the predictor is contaminated with discrete/noisy measurements. By a novel method of functional local linear manifold smoothing, we achieve a polynomial rate of convergence that adapts to the intrinsic manifold dimension and the level of sampling/noise contamination, with a phase transition phenomenon depending on their interplay. This is in contrast to the logarithmic convergence rate in the literature on functional nonparametric regression. We demonstrate that the proposed method enjoys favorable finite sample performance relative to commonly used methods via simulated and real data examples. (Joint with Zhenhua Lin)

Room: INI 1
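The name of Fang Yao's method refers to local linear smoothing. The sketch below shows only the classical scalar building block, not the functional manifold method of the talk. It computes an ordinary local linear estimate of a regression function at a point `x0`, fitting a kernel-weighted straight line around `x0` and returning its intercept. The Gaussian kernel and the bandwidth `h` are assumptions chosen for illustration.

```python
import math

def local_linear(x, y, x0, h):
    """Local linear estimate of E[Y | X = x0] with a Gaussian kernel of bandwidth h."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        u = xi - x0
        w = math.exp(-0.5 * (u / h) ** 2)  # kernel weight of observation i
        s0 += w
        s1 += w * u
        s2 += w * u * u
        t0 += w * yi
        t1 += w * u * yi
    # Intercept of the weighted least-squares line y ~ a + b (x - x0)
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)
```

Because the local fit is a line, this estimator reproduces linear regression functions exactly, even at the boundary of the design. That boundary behaviour is one reason local linear smoothing is usually preferred over a local constant (Nadaraya-Watson) fit.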
### Wednesday 21st March 2018

**09:00 to 10:00** Rebecca Willett (University of Wisconsin-Madison): *Graph Total Variation for Inverse Problems with Highly Correlated Designs*

Co-authors: Garvesh Raskutti (University of Wisconsin), Yuan Li (University of Wisconsin)

Sparse high-dimensional linear regression and inverse problems have received substantial attention over the past two decades. Much of this work assumes that explanatory variables are only mildly correlated. However, in modern applications ranging from functional MRI to genome-wide association studies, we observe highly correlated explanatory variables and associated design matrices that do not exhibit key properties (such as the restricted eigenvalue condition). In this talk, I will describe novel methods for robust sparse linear regression in these settings. Using side information about the strength of correlations among explanatory variables, we form a graph with edge weights corresponding to pairwise correlations. This graph is used to define a graph total variation regularizer that promotes similar weights for correlated explanatory variables. I will show how the graph structure encapsulated by this regularizer interacts with correlated design matrices to yield provably accurate estimates. The proposed approach outperforms standard methods in a variety of experiments on simulated and real fMRI data. This is joint work with Yuan Li and Garvesh Raskutti.

Room: INI 1

**10:00 to 11:00** Sofia Olhede (University College London): *Small and Large Scale Network Features*

Comparing and contrasting networks is hindered by their strongly non-Euclidean structure. I will discuss how one determines “optimal” features to compare two networks of different sparsity and size. As the topology of any complex system is key to understanding its structure and function, the result will be developed from topological ideas. Fundamentally, algebraic topology guarantees that any system represented by a network can be understood through its closed paths.
The length of each path provides a notion of scale, which is vitally important in characterizing dominant modes of system behavior. Here, by combining topology with scale, we prove the existence of universal features which reveal the dominant scales of any network. We use these features to compare several canonical network types in the context of a social media discussion which evolves through the sharing of rumors, leaks and other news. Our analysis enables, for the first time, a universal understanding of the balance between loops and tree-like structure across network scales, and an assessment of how this balance interacts with the spreading of information online. Crucially, our results allow networks to be quantified and compared in a purely model-free way that is theoretically sound, fully automated, and inherently scalable.

Room: INI 1

**11:00 to 11:30** Morning Coffee

**11:30 to 12:30** Johannes Schmidt-Hieber (Universiteit Leiden): *Statistical theory for deep neural networks with ReLU activation function*

The universal approximation theorem states that neural networks are capable of approximating any continuous function up to a small error that depends on the size of the network. The expressive power of a network does not, however, guarantee that deep networks perform well on data. For that, control of the statistical estimation risk is needed. In the talk, we derive statistical theory for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to logarithmic factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models.
While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks in which the number of potential parameters is much bigger than the sample size. Interestingly, the depth (number of layers) of the neural network architectures plays an important role, and our theory suggests that scaling the network depth with the logarithm of the sample size is natural.

Related links:
- https://arxiv.org/abs/1708.06633 (article)

Room: INI 1

**12:30 to 13:30** Lunch @ Churchill College

**13:30 to 17:00** Free Afternoon
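The Willett abstract builds a graph whose edge weights are pairwise correlations between explanatory variables. It then penalizes differences between the coefficients of connected variables. The minimal sketch below relies on assumptions the abstract does not state. Only pairs with sample correlation above a hypothetical `threshold` are connected, the edge weight is that correlation, and the penalty is the weighted sum of absolute coefficient differences. How the talk combines this regularizer with the data-fit term and tunes it is not reproduced.

```python
import math

def correlation(u, v):
    """Sample (Pearson) correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def correlation_graph(columns, threshold):
    """Edges (j, k, weight) between explanatory variables whose correlation exceeds threshold."""
    edges = []
    for j in range(len(columns)):
        for k in range(j + 1, len(columns)):
            r = correlation(columns[j], columns[k])
            if r > threshold:  # assumption: only strongly, positively correlated pairs are linked
                edges.append((j, k, r))
    return edges

def graph_total_variation(beta, edges):
    """Sum over edges of weight * |beta_j - beta_k|; zero when linked variables share a coefficient."""
    return sum(w * abs(beta[j] - beta[k]) for j, k, w in edges)
```

In a regression of this kind the penalty would be added, times a tuning constant, to a least-squares or inverse-problem fit. Coefficient vectors that give correlated variables similar weights are then cheaper than ones that split the signal arbitrarily between them.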
### Thursday 22nd March 2018

**09:00 to 10:00** Jingjing Zou (University of Cambridge): *Mixed Effects Model on Functional Manifolds / Sampling Directed Networks*

I would like to talk about two projects.

*Mixed Effects Model on Functional Manifolds.* Co-authors: John Aston (University of Cambridge), Lexin Li (UC Berkeley). We propose a generalized mixed effects model to study effects of subject-specific covariates on geometric and functional features of the subjects' surfaces. Here the covariates include both time-invariant covariates, which affect both the geometric and functional features, and time-varying covariates, which result in longitudinal changes in the functional textures. In addition, we extend the usual mixed effects model to model the covariance between a subject's geometric deformation and functional textures on the surface.

*Sampling Directed Networks.* Co-authors: Richard Davis (Columbia University), Gennady Samorodnitsky (Cornell University), Zhi-Li Zhang (University of Minnesota). We propose a sampling procedure for the nodes in a network with the goal of estimating uncommon population features of the entire network. Such features might include tail behavior of the in-degree and out-degree distributions as well as their joint distribution. Our procedure is based on selecting random initial nodes and then following the path of linked nodes in a structured fashion. In this procedure, targeted nodes with desired features, such as large in-degree, will have a larger probability of being retained. In order to construct nearly unbiased estimates of the quantities of interest, weights associated with the sampled nodes must be calculated. We will illustrate this procedure and compare it with a sampling scheme based on multiple random walks on several data sets, including webpage network data and Google+ social network data.
Room: INI 1

**10:00 to 11:00** Hao Chen (University of California, Davis): *New two-sample tests based on adjacency*

Two-sample tests for multivariate data and non-Euclidean data are widely used in many fields. We study a nonparametric testing procedure that utilizes graphs representing the similarity among observations. It can be applied to any data type as long as an informative similarity measure on the sample space can be defined. Existing tests based on a similarity graph lack power either for location or for scale alternatives. A new test is proposed that utilizes a common pattern overlooked previously, and it works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well under finite samples, facilitating its application to large data sets. Another new test statistic will also be discussed that addresses a problem of the classic test of this type under unequal sample sizes. Both tests are illustrated on an application comparing networks under different conditions.

Room: INI 1

**11:00 to 11:30** Morning Coffee

**11:30 to 12:30** Alexander Aue (University of California, Davis): *Limiting spectral distributions for a class of high-dimensional time series*

This talk discusses extensions to the time series case of the Marcenko-Pastur law on limiting spectral distributions (LSDs) for the eigenvalues of high-dimensional sample covariance matrices. The main result will be on establishing a non-linear integral equation characterizing the LSD in terms of its Stieltjes transform. Intuition will be presented for the simple case of a first-order moving average time series, and evidence will be provided indicating the applicability of the result to problems involving the estimation of certain quadratic forms as they arise, for example, when dealing with the Markowitz portfolio problem.
The talk is based on joint work with Haoyang Liu (Florida State) and Debashis Paul (UC Davis).

Room: INI 1

**12:30 to 13:30** Lunch @ Churchill College

**13:30 to 14:30** Regina Liu (Rutgers, The State University of New Jersey): *Fusion and Individualized Fusion Learning from Diverse Data Sources by Confidence Distribution*

Inferences from different data sources can often be fused together to yield more powerful findings than those from individual sources alone. We present a new approach for fusion learning using so-called confidence distributions (CDs). We further develop individualized fusion learning, ‘iFusion’, for drawing efficient individualized inference by fusing the learnings from relevant data sources. This approach is robust for handling heterogeneity arising from diverse data sources, and is ideally suited for goal-directed applications such as precision medicine. In essence, iFusion strategically ‘borrows strength’ from relevant individuals to improve efficiency while retaining its inference validity. Computationally, the fusion approach here is parallel in nature and scales up well in comparison with competing approaches. The performance of the approach is demonstrated by simulation studies and risk valuation of aircraft landing data.

Room: INI 1

**14:30 to 15:30** Debashis Paul (University of California, Davis): *Spectral estimation for a class of high-dimensional linear processes*

We present results about the limiting behavior of the empirical distribution of eigenvalues of weighted integrals of the sample periodogram for a class of high-dimensional linear processes. The processes under consideration are characterized by having simultaneously diagonalizable coefficient matrices. We make use of these asymptotic results, derived under the setting where the dimension and sample size are comparable, to formulate an estimation strategy for the distribution of eigenvalues of the coefficients of the linear process.
This approach generalizes existing work on estimation of the spectrum of an unknown covariance matrix for high-dimensional i.i.d. observations. (Joint work with Jamshid Namdari and Alexander Aue)

Room: INI 1

**15:30 to 16:00** Afternoon Tea

**16:00 to 17:00** Eardi Lila (University of Cambridge): *Statistical Analysis of Functions on Surfaces, with an application to Medical Imaging*

Co-author: John Aston (University of Cambridge)

In Functional Data Analysis, data are commonly assumed to be smooth functions on a fixed interval of the real line. In this work, we introduce a comprehensive framework for the analysis of functional data whose domain is a two-dimensional manifold that itself varies from sample to sample. We formulate a statistical model for such data, which we call Functions on Surfaces, that enables a joint representation of the geometric and functional aspects, and we propose an associated estimation framework. We apply the proposed framework to the analysis of neuroimaging data of cortical thickness, acquired from the brains of different subjects and thus lying on domains with different geometries.

Room: INI 1

**19:30 to 22:00** Formal Dinner at St John's College
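Hao Chen's abstract concerns two-sample tests built on a similarity graph over the pooled observations. The sketch below illustrates the existing graph-based approach that the talk improves on, not the new tests. It builds a k-nearest-neighbour graph on the pooled sample and counts edges within sample 1 (R1), within sample 2 (R2) and between the samples (R0). It then computes a permutation p-value for the classic statistic, where unusually few between-sample edges point to a difference. Euclidean distance, `k = 3` and the number of permutations are assumptions.

```python
import math
import random

def knn_edges(points, k):
    """Undirected k-nearest-neighbour graph on the pooled sample (Euclidean distance)."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for _, j in nearest:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

def edge_counts(edges, labels):
    """Return (R1, R2, R0): edges within sample 1, within sample 2, and between the samples."""
    r1 = r2 = r0 = 0
    for i, j in edges:
        if labels[i] != labels[j]:
            r0 += 1
        elif labels[i] == 1:
            r1 += 1
        else:
            r2 += 1
    return r1, r2, r0

def edge_count_test(sample1, sample2, k=3, n_perm=999, seed=0):
    """Permutation test: few between-sample edges (small R0) is evidence against equal distributions."""
    points = list(sample1) + list(sample2)
    labels = [1] * len(sample1) + [2] * len(sample2)
    edges = knn_edges(points, k)
    observed = edge_counts(edges, labels)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # relabel the pooled points under the null hypothesis
        if edge_counts(edges, labels)[2] <= observed[2]:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)
```

The graph only enters through the edge list, so any similarity graph could replace `knn_edges`, matching the abstract's point that the approach works for any data type with an informative similarity measure. R1 and R2 are returned as well, since other graph-based statistics can be built from the same counts.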
### Friday 23rd March 2018

**09:00 to 10:00** Sara Anna van de Geer (ETH Zürich): *A concentration interval for the Lasso*

We consider the linear model and the Lasso estimator. Our goal is to provide upper and lower bounds for the prediction error that are close to each other. We assume that the active components of the vector of regression coefficients are sufficiently large in absolute value (in a sense that will be specified) and that the tuning parameter is suitably chosen. The bounds depend on so-called compatibility constants. We will present the definition of the compatibility constants and discuss their relation with restricted eigenvalues. As an example, we consider the least squares estimator with total variation penalty and present bounds with a small gap. For the case of random design, we assume that the rows of the design matrix are i.i.d. copies of a Gaussian random vector. We assume that the largest eigenvalue of the covariance matrix remains bounded and establish, under some mild compatibility conditions, upper and lower bounds with ratio tending to one.

Room: INI 1

**10:00 to 11:00** Miguel del Álamo (Georg-August-Universität Göttingen): *Multiscale Bounded Variation Regularization*

Co-authors: Housen Li (University of Goettingen), Axel Munk (University of Goettingen)

In nonparametric regression and inverse problems, variational methods based on bounded variation (BV) penalties are a well-known and established tool for yielding edge-preserving reconstructions, which is a desirable feature in many applications. Despite its practical success, the theory behind BV-regularization is poorly understood: most importantly, there is a lack of convergence guarantees in spatial dimension $d \geq 2$. In this talk we present a variational estimator that combines a BV penalty and a multiscale constraint, and prove that it converges to the truth at the optimal rate.
Our theoretical analysis relies on a careful study of the multiscale constraint, which is motivated by the statistical properties of the noise and relates in a natural way to certain Besov spaces of negative smoothness. Further, the main novelty of our approach is the use of refined interpolation inequalities between function spaces. We also illustrate the performance of these variational estimators in simulations on signals and images.

Room: INI 1

**11:00 to 11:30** Morning Coffee

**11:30 to 12:30** Piotr Fryzlewicz (London School of Economics): *Multiscale methods and recursion in data science*

The talk starts on a general note: we first attempt to define a "multiscale" method / algorithm as a recursive program acting on a dataset in a suitable way. Wavelet transformations, unbalanced wavelet transformations and binary segmentation are all examples of multiscale methods in this sense. Using the example of binary segmentation, we illustrate the benefits of the recursive formulation of multiscale algorithms from the software implementation and theoretical tractability viewpoints. We then become more specific and study the canonical problem of a-posteriori detection of multiple change-points in the mean of a piecewise-constant signal observed with noise. Even in this simple set-up, many publicly available state-of-the-art methods struggle for certain classes of signals. In particular, this poor performance is observed in methods that work by minimising a "fit to the data plus a penalty" criterion, the reason being that it is challenging to think of a penalty that works well over a wide range of signal classes. To overcome this issue, we propose a new approach whereby methods learn from the data as they proceed and, as a result, operate differently for different signal classes.
As an example of this approach, we revisit our earlier change-point detection algorithm, Wild Binary Segmentation, and make it data-adaptive by equipping it with a recursive mechanism for deciding "on the fly" how many sub-samples of the input data to draw, and where to draw them. This is in contrast to the original Wild Binary Segmentation, which is not recursive. We show that this significantly improves the algorithm, particularly for signals with frequent change-points.

Related links:
- https://CRAN.R-project.org/package=breakfast (R software package "breakfast", which provides an implementation of Adaptive Wild Binary Segmentation)

Room: INI 1

**12:30 to 13:30** Lunch @ Churchill College
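Fryzlewicz's abstract uses binary segmentation as the running example of a recursive multiscale algorithm. The minimal sketch below implements standard binary segmentation with the CUSUM statistic, written recursively. On the current segment it finds the split that maximises the absolute CUSUM, keeps the split if it exceeds a threshold, and recurses on both halves. This is plain binary segmentation, not Wild Binary Segmentation or its adaptive version, and it does not mirror the interface of the breakfast package. The `threshold` is a user-supplied assumption; in practice it would be calibrated to the noise level.

```python
import math

def cusum(prefix, s, e, b):
    """CUSUM contrast for splitting x[s..e] (inclusive) between positions b and b+1."""
    n = e - s + 1
    m = b - s + 1  # length of the left part x[s..b]
    r = e - b      # length of the right part x[b+1..e]
    left = prefix[b + 1] - prefix[s]
    right = prefix[e + 1] - prefix[b + 1]
    return math.sqrt(r / (n * m)) * left - math.sqrt(m / (n * r)) * right

def binary_segmentation(x, threshold):
    """Return sorted indices b such that a change in mean is declared between x[b] and x[b+1]."""
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)  # prefix[i] = x[0] + ... + x[i-1]
    found = []

    def recurse(s, e):
        if e - s < 1:
            return  # a segment of length one cannot be split
        best = max(range(s, e), key=lambda b: abs(cusum(prefix, s, e, b)))
        if abs(cusum(prefix, s, e, best)) > threshold:
            found.append(best)
            recurse(s, best)      # left sub-segment
            recurse(best + 1, e)  # right sub-segment

    recurse(0, len(x) - 1)
    return sorted(found)
```

For a noiseless signal such as `[0.0] * 50 + [5.0] * 50 + [0.0] * 50`, the absolute CUSUM on each segment peaks at a true change-point, so the recursion recovers both changes and then stops on the constant pieces. The fixed threshold is exactly the kind of one-size-fits-all tuning choice that the talk's data-adaptive recursion aims to avoid.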