Statistics of geometric features and new data types
Monday 19th March 2018 to Friday 23rd March 2018
09:00 to 09:50  Registration  
09:50 to 10:00  Welcome from David Abrahams (INI Director)  
10:00 to 11:00 
Victor Panaretos (EPFL – École Polytechnique Fédérale de Lausanne) Procrustes Analysis of Covariance Operators and Optimal Transport of Gaussian Processes
Covariance operators are fundamental in functional data analysis, providing the canonical means to analyse functional variation via the celebrated Karhunen–Loève expansion. These operators may themselves be subject to variation, for instance in contexts where multiple functional populations are to be compared. Statistical techniques to analyse such variation are intimately linked with the choice of metric on covariance operators, and the intrinsic infinite-dimensionality of these operators. We will describe the manifold-like geometry of the space of trace-class infinite-dimensional covariance operators and associated key statistical properties, under the recently proposed infinite-dimensional version of the Procrustes metric. In particular, we will identify this space with that of centred Gaussian processes equipped with the Wasserstein metric of optimal transportation. The identification allows us to provide a description of those aspects of the geometry that are important in terms of statistical inference, and establish key properties of the Fréchet mean of a random sample of covariances, as well as generative models that are canonical for such metrics. The latter will allow us to draw connections with the problem of registration of warped functional data. Based on joint work with V. Masarotto (EPFL) and Y. Zemel (Göttingen).
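In the finite-dimensional case, the Procrustes/Wasserstein distance between two centred Gaussians has the closed form $d^2(\Sigma_1,\Sigma_2) = \operatorname{tr}\Sigma_1 + \operatorname{tr}\Sigma_2 - 2\operatorname{tr}\big(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\big)^{1/2}$. A minimal numerical sketch, with finite covariance matrices standing in for the trace-class operators of the talk (illustrative only, not the speakers' code):

```python
import numpy as np
from scipy.linalg import sqrtm

def procrustes_wasserstein(S1, S2):
    """2-Wasserstein (Procrustes) distance between centred Gaussians N(0, S1), N(0, S2)."""
    root = sqrtm(S1)
    cross = sqrtm(root @ S2 @ root)
    # sqrtm can return a tiny spurious imaginary part; discard it
    d2 = np.trace(S1) + np.trace(S2) - 2.0 * np.real(np.trace(cross))
    return np.sqrt(max(d2, 0.0))

# Two covariance matrices as finite-dimensional stand-ins for covariance operators
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
print(procrustes_wasserstein(A, A))  # ~0: the metric vanishes on identical covariances
print(procrustes_wasserstein(A, B))
```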

INI 1  
11:00 to 11:30  Morning Coffee  
11:30 to 12:30 
Jim Ramsay (McGill University) Dynamic Smoothing Meets Gravity
Authors: Jim Ramsay, Michelle Carey and Juan Li
Institutions: McGill University, University College Dublin, and McGill University
Systems of differential equations are often used to model buffering processes that modulate a non-smooth high-energy input so as to produce an output that is smooth and that distributes the energy load over time and space. Handwriting is buffered in this way. We show that the smooth complex script that spells "statistics" in Chinese can be represented as a buffered version of a series of 46 equal-interval step inputs. The buffer consists of three undamped oscillating springs, one for each orthogonal coordinate. The periods of oscillation vary slightly over the three coordinates in a way that reflects the masses that are moved by muscle activations. Our analyses of data on juggling three balls and on lip motion during speech confirm that this model works for a wide variety of human motions.
We use the term "dynamic smoothing" for the estimation of a structured input functional object along with the buffer characteristics.

INI 1  
12:30 to 13:30  Lunch @ Churchill College  
13:30 to 14:30 
Bodhisattva Sen (Columbia University) Adaptive Confidence Bands for Shape-Restricted Regression in Multidimension using Multiscale Tests
Co-author: Pratyay Datta (Columbia University)
We consider a multidimensional (continuous) Gaussian white noise regression model. We define suitable multiscale tests in this scenario. Utilizing these tests, we construct confidence bands for the underlying regression function with guaranteed coverage probability, assuming that the underlying function is isotonic or convex. These confidence bands are shown to be asymptotically optimal in an appropriate sense. Computational issues will also be discussed. 
INI 1  
14:30 to 15:30 
Richard Samworth (University of Cambridge) Isotonic regression in general dimensions
Co-authors: Qiyang Han (University of Washington), Tengyao Wang (University of Cambridge), Sabyasachi Chatterjee (University of Illinois)
We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^d$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-\min\{2/(d+2),1/d\}}$ in the empirical $L_2$ loss, up to polylogarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of $(k/n)^{\min\{1,2/d\}}$, again up to polylogarithmic factors. Previous results are confined to the case $d\leq 2$. Finally, we establish corresponding bounds (which are new even in the case $d=2$) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to polylogarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate. 
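The $d=1$ special case of this least squares estimator is classical isotonic regression, computable by the pool adjacent violators algorithm (PAVA). A minimal sketch of that special case (illustrative only, not the authors' code):

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: least squares fit under a monotone increasing constraint (d = 1)."""
    blocks = []                              # list of (block mean, block size)
    for v in y:
        blocks.append((float(v), 1))
        # merge adjacent blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, s2 = blocks.pop()
            m1, s1 = blocks.pop()
            blocks.append(((m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2))
    fit = []
    for mean, size in blocks:
        fit.extend([mean] * size)
    return np.array(fit)

y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 4.5])
print(pava(y))  # [1.  2.5 2.5 4.5 4.5 4.5]
```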
INI 1  
15:30 to 16:00  Afternoon Tea  
16:00 to 17:00 
Rainer von Sachs (Université Catholique de Louvain) Intrinsic wavelet regression for curves and surfaces of Hermitian positive definite matrices
Co-author: Joris Chau (ISBA, UC Louvain)
In multivariate time series analysis, non-degenerate autocovariance and spectral density matrices are necessarily Hermitian and positive definite. An estimation methodology which preserves these properties is developed based on intrinsic wavelet transforms being applied to nonparametric wavelet regression for curves in the non-Euclidean space of Hermitian positive definite matrices. Via intrinsic average-interpolation in a Riemannian manifold equipped with a natural invariant Riemannian metric, we derive the wavelet coefficient decay and linear wavelet thresholding convergence rates of intrinsically smooth curves. Applying this more specifically to nonparametric spectral density estimation, an important property of the intrinsic linear or nonlinear wavelet spectral estimator under the invariant Riemannian metric is that it is independent of the choice of coordinate system of the time series, in contrast to most existing approaches. As a generalisation of this one-dimensional denoising of matrix-valued curves in the Riemannian manifold, we also present higher-dimensional intrinsic wavelet transforms, applied in particular to time-varying spectral estimation of non-stationary multivariate time series, i.e. surfaces of Hermitian positive definite matrices.

INI 1  
17:00 to 18:00  Welcome Wine Reception & Poster Session 
09:00 to 10:00 
Robert Nowak (University of Wisconsin-Madison; Toyota Technological Institute) Learning Low-Dimensional Metrics
This talk discusses the problem of learning a low-dimensional Euclidean metric from distance comparisons. Specifically, consider a set of n items with high-dimensional features and suppose we are given a set of (possibly noisy) distance comparisons of the form sign(dist(x,y) − dist(x,z)), where x, y, and z are the features associated with three such items. The goal is to learn the distance function that generates such comparisons. The talk focuses on several key issues pertaining to the theoretical foundations of metric learning: 1) optimization methods for learning general low-dimensional (low-rank) metrics as well as sparse metrics; 2) upper and lower (minimax) bounds on prediction error; 3) quantification of the sample complexity of metric learning in terms of the dimension of the feature space and the dimension/rank of the underlying metric; 4) bounds on the accuracy of the learned metric relative to the underlying true generative metric. Our results involve novel mathematical approaches to the metric learning problem and shed new light on the special case of ordinal embedding (a.k.a. non-metric multidimensional scaling).
This is joint work with Lalit Jain and Blake Mason.
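A toy version of this triplet-based metric learning problem can be sketched as follows: a low-rank factor L (so the metric is M = LL^T) is fit by stochastic subgradient descent on a hinge loss over comparisons. The data, loss, margin, and step size below are illustrative assumptions, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat, rank = 5, 2
L_true = rng.normal(size=(n_feat, rank))   # ground-truth low-rank metric factor

def dist2(L, x, y):
    z = (x - y) @ L
    return z @ z

# Generate noiseless triplet comparisons sign(dist(x,y) - dist(x,z)) from the true metric
X = rng.normal(size=(200, n_feat))
triplets = []
for _ in range(2000):
    i, j, k = rng.choice(200, size=3, replace=False)
    if dist2(L_true, X[i], X[j]) < dist2(L_true, X[i], X[k]):
        triplets.append((i, j, k))       # j is closer to i than k is
    else:
        triplets.append((i, k, j))

# Stochastic subgradient descent on a hinge loss over the low-rank factor L
L = rng.normal(size=(n_feat, rank)) * 0.1
lr, margin = 0.01, 0.1
for epoch in range(20):
    for i, j, k in triplets:
        if dist2(L, X[i], X[j]) - dist2(L, X[i], X[k]) + margin > 0:  # violated triplet
            u, v = X[i] - X[j], X[i] - X[k]
            L -= lr * 2 * (np.outer(u, u) - np.outer(v, v)) @ L

agree = sum(dist2(L, X[i], X[j]) < dist2(L, X[i], X[k]) for i, j, k in triplets)
print(agree / len(triplets))  # fraction of training comparisons the learned metric reproduces
```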

INI 1  
10:00 to 11:00 
Matthew Reimherr (Pennsylvania State University) Manifold Data Analysis with Applications to High-Resolution 3D Imaging
Many scientific areas are faced with the challenge of extracting information from large, complex, and highly structured data sets. A great deal of modern statistical work focuses on developing tools for handling such data. In this work we present a new subfield of functional data analysis (FDA), which we call Manifold Data Analysis, or MDA. MDA is concerned with the statistical analysis of samples where one or more variables measured on each unit is a manifold, thus resulting in as many manifolds as we have units. We propose a framework that converts manifolds into functional objects, an efficient two-step functional principal component method, and a manifold-on-scalar regression model. This work is motivated by an anthropological application involving 3D facial imaging data, which is discussed extensively throughout. The proposed framework is used to understand how individual characteristics, such as age and genetic ancestry, influence the shape of the human face.

INI 1  
11:00 to 11:30  Morning Coffee  
11:30 to 12:30 
Davide Pigoli (King's College London) Speech as object data: exploring cross-linguistic changes in Romance languages
Exploring phonetic change between languages is of particular importance in the understanding of the history and geographical spread of languages. While many studies have considered differences in textual form or in phonetic transcription, it is somewhat more difficult to analyse speech recordings in this manner, although this is usually the dominant mode of transmission.
Here, we propose a novel approach to explore phonetic changes, using log-spectrograms of speech recordings. After preprocessing the data to remove inherent individual differences, we identify time and frequency covariance functions as a feature of the language; in contrast, the mean depends mostly on the word that has been uttered. We use these means and covariances to postulate paths between languages, and we illustrate some preliminary results obtained when the model is applied to recordings of speakers of a few Romance languages.
This is part of joint work with P.Z. Hadjipantelis, J.S. Coleman and J.A.D. Aston.

INI 1  
12:30 to 13:30  Lunch @ Churchill College  
13:30 to 14:30 
Michelle Carey (University College Dublin) Uncertainty Quantification for Geospatial Processes
Co-author: James Ramsay (McGill University)
Geospatial data are observations of a process that are collected with reference to their geographical location. This type of data is abundant in many scientific fields; some examples include: population census, social and demographic (health, justice, education), economic (business surveys, trade, transport, tourism, agriculture, etc.) and environmental (atmospheric and oceanographic) data. They are often distributed over irregularly shaped spatial domains with complex boundaries and interior holes. Modelling approaches must account for the spatial dependence over these irregular domains as well as describing their temporal evolution. Dynamic systems modelling has a huge potential in statistics, as evidenced by the amount of activity in functional data analysis. Many seemingly complex forms of functional variation can be more simply represented as a set of differential equations, either ordinary or partial. In this talk, I will present a class of semiparametric regression models with differential regularization in the form of PDEs. This methodology will be called Data2PDE ("Data to Partial Differential Equations"). Data2PDE characterizes spatial processes that evolve over complex geometries in the presence of uncertain, incomplete and often noisy observations, and prior knowledge regarding the physical principles of the process characterized by a PDE. 
INI 1  
14:30 to 15:30 
Ian Dryden (University of Nottingham) Object Data Driven Discovery
Object data analysis is an important tool in the many disciplines where the data have much richer structure than the usual numbers or vectors. An initial question to ask is: what are the most basic data units, i.e. what are the atoms of the data? We give an introduction to this topic, where the statistical analysis of object data has a wide variety of applications. An important aspect of the analysis is to reduce the dimension to a small number of key features while respecting the geometry of the manifold in which the objects lie. Three case studies are given which exemplify the types of issues that are encountered: i) describing changes in variability in damaged DNA; ii) testing for geometrical differences in carotid arteries, where patients are at high or low risk of aneurysm; iii) clustering enzymes observed over time. In all three applications the structure of the data manifolds is important, in particular the manifold of covariance matrices, unlabelled size-and-shape space, and shape space.

INI 1  
15:30 to 16:00  Afternoon Tea  
16:00 to 17:00 
Fang Yao (University of Toronto) Functional regression on manifold with contamination
We propose a new perspective on functional regression with a predictor process via the concept of a manifold that is intrinsically finite-dimensional and embedded in an infinite-dimensional functional space, where the predictor is contaminated with discrete/noisy measurements. By a novel method of functional local linear manifold smoothing, we achieve a polynomial rate of convergence that adapts to the intrinsic manifold dimension and the level of sampling/noise contamination, with a phase transition phenomenon depending on their interplay. This is in contrast to the logarithmic convergence rate in the literature of functional nonparametric regression. We demonstrate that the proposed method enjoys favorable finite-sample performance relative to commonly used methods via simulated and real data examples.
(Joint with Zhenhua Lin)

INI 1 
09:00 to 10:00 
Rebecca Willett (University of Wisconsin-Madison) Graph Total Variation for Inverse Problems with Highly Correlated Designs
Co-authors: Garvesh Raskutti (University of Wisconsin), Yuan Li (University of Wisconsin)
Sparse high-dimensional linear regression and inverse problems have received substantial attention over the past two decades. Much of this work assumes that explanatory variables are only mildly correlated. However, in modern applications ranging from functional MRI to genome-wide association studies, we observe highly correlated explanatory variables and associated design matrices that do not exhibit key properties (such as the restricted eigenvalue condition). In this talk, I will describe novel methods for robust sparse linear regression in these settings. Using side information about the strength of correlations among explanatory variables, we form a graph with edge weights corresponding to pairwise correlations. This graph is used to define a graph total variation regularizer that promotes similar weights for correlated explanatory variables. I will show how the graph structure encapsulated by this regularizer interacts with correlated design matrices to yield provably accurate estimates. The proposed approach outperforms standard methods in a variety of experiments on simulated and real fMRI data. This is joint work with Yuan Li and Garvesh Raskutti. 
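A minimal sketch of the idea on synthetic data: correlated predictors are linked by graph edges, and a graph total variation penalty pulls their weights together. The edge construction, penalty weight, and plain subgradient solver below are illustrative choices, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 10
# Two groups of five strongly correlated predictors, each group sharing one true weight
z = rng.normal(size=(n, 2))
X = np.repeat(z, 5, axis=1) + 0.1 * rng.normal(size=(n, p))
w_true = np.array([1.0] * 5 + [-1.0] * 5)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Side information: edges between predictor pairs with high sample correlation
C = np.corrcoef(X, rowvar=False)
edges = [(i, j) for i in range(p) for j in range(i + 1, p) if abs(C[i, j]) > 0.8]

# Subgradient descent on  ||y - Xw||^2 / n  +  lam * sum_{(i,j) in E} |w_i - w_j|
w = np.zeros(p)
lam, lr = 0.05, 0.01
for t in range(2000):
    grad = 2 * X.T @ (X @ w - y) / n
    for i, j in edges:
        s = np.sign(w[i] - w[j])        # subgradient of the graph TV term
        grad[i] += lam * s
        grad[j] -= lam * s
    w -= lr * grad

print(np.round(w, 2))  # weights within each correlated group should be close
```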
INI 1  
10:00 to 11:00 
Sofia Olhede (University College London) Small and Large Scale Network Features
Comparing and contrasting networks is hindered by their strongly non-Euclidean structure. I will discuss how one determines "optimal" features to compare two different networks of different sparsity and size. As the topology of any complex system is key to understanding its structure and function, the result will be developed from topological ideas. Fundamentally, algebraic topology guarantees that any system represented by a network can be understood through its closed paths. The length of each path provides a notion of scale, which is vitally important in characterizing dominant modes of system behavior. Here, by combining topology with scale, we prove the existence of universal features which reveal the dominant scales of any network. We use these features to compare several canonical network types in the context of a social media discussion which evolves through the sharing of rumors, leaks and other news. Our analysis enables for the first time a universal understanding of the balance between loops and tree-like structure across network scales, and an assessment of how this balance interacts with the spreading of information online. Crucially, our results allow networks to be quantified and compared in a purely model-free way that is theoretically sound, fully automated, and inherently scalable.

INI 1  
11:00 to 11:30  Morning Coffee  
11:30 to 12:30 
Johannes Schmidt-Hieber (Universiteit Leiden) Statistical theory for deep neural networks with ReLU activation function
The universal approximation theorem states that neural networks are capable of approximating any continuous function up to a small error that depends on the size of the network. The expressive power of a network does, however, not guarantee that deep networks perform well on data. For that, control of the statistical estimation risk is needed. In the talk, we derive statistical theory for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to logarithmic factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with the number of potential parameters being much bigger than the sample size. Interestingly, the depth (number of layers) of the neural network architectures plays an important role, and our theory suggests that scaling the network depth with the logarithm of the sample size is natural.
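For intuition only, here is a tiny dense ReLU network fit to a univariate nonparametric regression sample by full-batch gradient descent. It illustrates the estimator class, not the sparse architectures or the theory of the talk; the target function, width, and learning rate are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from a univariate nonparametric regression model  y = f(x) + noise
n = 256
x = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * x) + 0.05 * rng.normal(size=(n, 1))

# One hidden layer of ReLU units, trained by full-batch gradient descent on squared loss
h = 32
W1 = rng.normal(size=(1, h)) * 0.5
b1 = np.zeros(h)
W2 = rng.normal(size=(h, 1)) * 0.5
b2 = np.zeros(1)
lr = 0.05

def forward(x):
    a = np.maximum(x @ W1 + b1, 0.0)    # ReLU activation
    return a, a @ W2 + b2

for step in range(3000):
    a, pred = forward(x)
    err = pred - y
    gW2 = a.T @ err / n
    gb2 = err.mean(axis=0)
    da = err @ W2.T * (a > 0)           # backpropagate through the ReLU
    gW1 = x.T @ da / n
    gb1 = da.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(x)
print(float(np.mean((pred - y) ** 2)))  # training MSE, far below the variance of y
```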

INI 1  
12:30 to 13:30  Lunch @ Churchill College  
13:30 to 17:00  Free Afternoon 
09:00 to 10:00 
Jingjing Zou (University of Cambridge) Mixed Effects Model on Functional Manifolds / Sampling Directed Networks
I would like to talk about two projects.
Co-authors of Mixed Effects Model on Functional Manifolds: John Aston (University of Cambridge), Lexin Li (UC Berkeley)
We propose a generalized mixed effects model to study effects of subject-specific covariates on geometric and functional features of the subjects' surfaces. Here the covariates include both time-invariant covariates, which affect both the geometric and functional features, and time-varying covariates, which result in longitudinal changes in the functional textures. In addition, we extend the usual mixed effects model to model the covariance between a subject's geometric deformation and functional textures on the surface.
Co-authors of Sampling Directed Networks: Richard Davis (Columbia University), Gennady Samorodnitsky (Cornell University), Zhi-Li Zhang (University of Minnesota).
We propose a sampling procedure for the nodes in a network with the goal of estimating uncommon population features of the entire network. Such features might include tail behavior of the in-degree and out-degree distributions as well as their joint distribution. Our procedure is based on selecting random initial nodes and then following the path of linked nodes in a structured fashion. In this procedure, targeted nodes with desired features, such as large in-degree, will have a larger probability of being retained. In order to construct nearly unbiased estimates of the quantities of interest, weights associated with the sampled nodes must be calculated. We will illustrate this procedure and compare it with a sampling scheme based on multiple random walks on several data sets, including webpage network data and Google+ social network data.

INI 1  
10:00 to 11:00 
Hao Chen (University of California, Davis) New two-sample tests based on adjacency
Two-sample tests for multivariate data and non-Euclidean data are widely used in many fields. We study a nonparametric testing procedure that utilizes graphs representing the similarity among observations. It can be applied to any data type as long as an informative similarity measure on the sample space can be defined. Existing tests based on a similarity graph lack power either for location or for scale alternatives. A new test is proposed that utilizes a common pattern overlooked previously, and it works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well under finite samples, facilitating its application to large data sets. Another new test statistic will also be discussed that addresses a problem classic tests of this type encounter under unequal sample sizes. Both tests are illustrated on an application of comparing networks under different conditions.
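For orientation, a classical graph-based two-sample statistic of this family (in the spirit of the Friedman–Rafsky edge-count test, not the new statistics of the talk) can be sketched with a k-nearest-neighbour similarity graph and a permutation null:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_edges(Z, k=5):
    """Undirected edge set of the k-nearest-neighbour graph on the rows of Z."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    edges = set()
    for i in range(len(Z)):
        for j in np.argsort(D[i])[:k]:
            edges.add((min(i, j), max(i, j)))
    return list(edges)

def between_count(labels, edges):
    """Edges joining the two samples; unusually few suggests the samples differ."""
    return sum(labels[i] != labels[j] for i, j in edges)

# Sample 1 vs sample 2 with a location shift
X = rng.normal(size=(40, 3))
Y = rng.normal(size=(40, 3)) + 1.5
Z = np.vstack([X, Y])
labels = np.array([0] * 40 + [1] * 40)
edges = knn_edges(Z)

obs = between_count(labels, edges)
perm = [between_count(rng.permutation(labels), edges) for _ in range(500)]
p_value = np.mean([b <= obs for b in perm])   # left tail: few between-sample edges
print(obs, p_value)
```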

INI 1  
11:00 to 11:30  Morning Coffee  
11:30 to 12:30 
Alexander Aue (University of California, Davis) Limiting spectral distributions for a class of highdimensional time series
This talk discusses extensions to the time series case of the Marchenko–Pastur law on limiting spectral distributions (LSDs) for the eigenvalues of high-dimensional sample covariance matrices. The main result will be on establishing a nonlinear integral equation characterizing the LSD in terms of its Stieltjes transform. Intuition will be presented for the simple case of a first-order moving average time series, and evidence will be provided indicating the applicability of the result to problems involving the estimation of certain quadratic forms as they arise, for example, when dealing with the Markowitz portfolio problem. The talk is based on joint work with Haoyang Liu (Florida State) and Debashis Paul (UC Davis).
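The classical (i.i.d., non-time-series) Marchenko–Pastur law that the talk extends can be checked numerically: for $\gamma = p/n$ and identity population covariance, the sample covariance eigenvalues concentrate on $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2]$. A quick illustration under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 400                 # dimension-to-sample-size ratio gamma = p/n = 0.2
gamma = p / n
X = rng.normal(size=(n, p))      # i.i.d. Gaussian entries: the classical case
S = X.T @ X / n                  # sample covariance matrix
evals = np.linalg.eigvalsh(S)

# Marchenko-Pastur support edges for identity population covariance
lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
print(evals.min(), evals.max(), (lo, hi))  # spectrum fills the predicted interval
```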

INI 1  
12:30 to 13:30  Lunch @ Churchill College  
13:30 to 14:30 
Regina Liu (Rutgers, The State University of New Jersey) Fusion and Individualized Fusion Learning from Diverse Data Sources by Confidence Distribution
Inferences from different data sources can often be fused together to yield more powerful findings than those from individual sources alone. We present a new approach for fusion learning by using so-called confidence distributions (CD). We further develop individualized fusion learning, 'iFusion', for drawing efficient individualized inference by fusing the learnings from relevant data sources. This approach is robust for handling heterogeneity arising from diverse data sources, and is ideally suited for goal-directed applications such as precision medicine. In essence, iFusion strategically 'borrows strength' from relevant individuals to improve efficiency while retaining its inference validity. Computationally, the fusion approach here is parallel in nature and scales up well in comparison with competing approaches. The performance of the approach is demonstrated by simulation studies and risk evaluation of aircraft landing data.

INI 1  
14:30 to 15:30 
Debashis Paul (University of California, Davis) Spectral estimation for a class of highdimensional linear processes
We present results about the limiting behavior of the empirical distribution of eigenvalues of weighted integrals of the sample periodogram for a class of high-dimensional linear processes. The processes under consideration are characterized by having simultaneously diagonalizable coefficient matrices. We make use of these asymptotic results, derived under the setting where the dimension and sample size are comparable, to formulate an estimation strategy for the distribution of eigenvalues of the coefficients of the linear process. This approach generalizes existing works on estimation of the spectrum of an unknown covariance matrix for high-dimensional i.i.d. observations.
(Joint work with Jamshid Namdari and Alexander Aue) 
INI 1  
15:30 to 16:00  Afternoon Tea  
16:00 to 17:00 
Eardi Lila (University of Cambridge) Statistical Analysis of Functions on Surfaces, with an application to Medical Imaging
Coauthor: John Aston (University of Cambridge)
In Functional Data Analysis, data are commonly assumed to be smooth functions on a fixed interval of the real line. In this work, we introduce a comprehensive framework for the analysis of functional data whose domain is a two-dimensional manifold, where the domain itself is subject to variability from sample to sample. We formulate a statistical model for such data, that we call Functions on Surfaces, which enables a joint representation of the geometric and functional aspects, and propose an associated estimation framework. We apply the proposed framework to the analysis of neuroimaging data of cortical thickness, acquired from the brains of different subjects, and thus lying on domains with different geometries. 
INI 1  
19:30 to 22:00  Formal Dinner at St John's College 
09:00 to 10:00 
Sara Anna van de Geer (ETH Zürich) A concentration interval for the Lasso
We consider the linear model and the Lasso estimator. Our goal is to provide upper and lower bounds for the prediction error that are close to each other. We assume that the active components of the vector of regression coefficients are sufficiently large in absolute value (in a sense that will be specified) and that the tuning parameter is suitably chosen. The bounds depend on so-called compatibility constants. We will present the definition of the compatibility constants and discuss their relation with restricted eigenvalues. As an example, we consider the least squares estimator with total variation penalty and present bounds with small gap. For the case of random design, we assume that the rows of the design matrix are i.i.d. copies of a Gaussian random vector. We assume that the largest eigenvalue of the covariance matrix remains bounded and establish, under some mild compatibility conditions, upper and lower bounds with ratio tending to one.
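A numerical sketch of the setting: a sparse linear model with active coefficients large in absolute value, the Lasso computed by proximal gradient descent (ISTA). The tuning parameter below is a standard universal-type choice for illustration, not the talk's specific calibration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.normal(size=(n, p))            # rows are i.i.d. Gaussian (random design)
beta = np.zeros(p)
beta[:s] = 5.0                         # active coefficients well separated from zero
sigma = 0.5
y = X @ beta + sigma * rng.normal(size=n)

lam = 2 * sigma * np.sqrt(2 * np.log(p) / n)   # universal-type tuning parameter
step = n / np.linalg.norm(X, 2) ** 2           # 1 / Lipschitz constant of the smooth part

# ISTA (proximal gradient) for  (1/2n)||y - Xb||^2 + lam * ||b||_1
b = np.zeros(p)
for _ in range(2000):
    g = X.T @ (X @ b - y) / n
    b = b - step * g
    b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)   # soft-thresholding

pred_err = np.sum((X @ (b - beta)) ** 2) / n
print(pred_err)   # prediction error, of order lam^2 * s up to compatibility constants
```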

INI 1  
10:00 to 11:00 
Miguel del Álamo (Georg-August-Universität Göttingen) Multiscale Bounded Variation Regularization
Co-authors: Housen Li (University of Göttingen), Axel Munk (University of Göttingen)
In nonparametric regression and inverse problems, variational methods based on bounded variation (BV) penalties are a well-known and established tool for yielding edge-preserving reconstructions, which is a desirable feature in many applications. Despite its practical success, the theory behind BV regularization is poorly understood: most importantly, there is a lack of convergence guarantees in spatial dimension $d \geq 2$. In this talk we present a variational estimator that combines a BV penalty and a multiscale constraint, and prove that it converges to the truth at the optimal rate. Our theoretical analysis relies on a proper analysis of the multiscale constraint, which is motivated by the statistical properties of the noise, and relates in a natural way to certain Besov spaces of negative smoothness. Further, the main novelty of our approach is the use of refined interpolation inequalities between function spaces. We also illustrate the performance of these variational estimators in simulations on signals and images. 
INI 1  
11:00 to 11:30  Morning Coffee  
11:30 to 12:30 
Piotr Fryzlewicz (London School of Economics) Multiscale methods and recursion in data science
The talk starts on a general note: we first attempt to define a "multiscale" method / algorithm as a recursive program acting on a dataset in a suitable way. Wavelet transformations, unbalanced wavelet transformations and binary segmentation are all examples of multiscale methods in this sense. Using the example of binary segmentation, we illustrate the benefits of the recursive formulation of multiscale algorithms from the software implementation and theoretical tractability viewpoints. We then turn more specific and study the canonical problem of a posteriori detection of multiple changepoints in the mean of a piecewise-constant signal observed with noise. Even in this simple setup, many publicly available state-of-the-art methods struggle for certain classes of signals. In particular, this poor performance is observed in methods that work by minimising a "fit to the data plus a penalty" criterion, the reason being that it is challenging to think of a penalty that works well over a wide range of signal classes. To overcome this issue, we propose a new approach whereby methods learn from the data as they proceed, and, as a result, operate differently for different signal classes. As an example of this approach, we revisit our earlier changepoint detection algorithm, Wild Binary Segmentation, and make it data-adaptive by equipping it with a recursive mechanism for deciding "on the fly" how many subsamples of the input data to draw, and where to draw them. This is in contrast to the original Wild Binary Segmentation, which is not recursive. We show that this significantly improves the algorithm, particularly for signals with frequent changepoints.
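Plain binary segmentation (the non-wild, non-adaptive baseline mentioned above as a canonical recursive multiscale method) can be sketched as recursive CUSUM splitting; the threshold and simulated signal here are illustrative choices:

```python
import numpy as np

def cusum(x):
    """CUSUM statistics |C_b| for every candidate split b; the argmax estimates a changepoint."""
    n = len(x)
    b = np.arange(1, n)
    left = np.cumsum(x)[:-1]
    total = x.sum()
    return np.abs(np.sqrt((n - b) / (n * b)) * left
                  - np.sqrt(b / (n * (n - b))) * (total - left))

def binseg(x, lo, hi, thresh, found):
    """Binary segmentation: split at the CUSUM argmax, then recurse on both halves."""
    if hi - lo < 2:
        return
    stat = cusum(x[lo:hi])
    b = int(np.argmax(stat))
    if stat[b] > thresh:
        cp = lo + b + 1
        found.append(cp)
        binseg(x, lo, cp, thresh, found)
        binseg(x, cp, hi, thresh, found)

rng = np.random.default_rng(0)
# Piecewise-constant mean with changepoints at 100 and 200, plus Gaussian noise
x = np.concatenate([np.zeros(100), 2 * np.ones(100), np.zeros(100)]) + rng.normal(size=300)
cps = []
binseg(x, 0, len(x), thresh=4.5, found=cps)
print(sorted(cps))  # estimated changepoints, near the true locations 100 and 200
```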

INI 1  
12:30 to 13:30  Lunch @ Churchill College 