# Seminars (SCHW03)


Event When Speaker Title Presentation Material
SCHW03 23rd June 2008
10:00 to 11:00
Variable selection in very high dimensional regression and classification

SCHW03 23rd June 2008
11:30 to 12:30
Dimension reduction

Ursula Gather joint work with Charlotte Guddat

Progress in computer science over the last decades has led to 'floods of data' which can be stored and have to be handled to extract the information of interest. As an example, consider data from the field of genetics, where the dimension may run into the thousands. Classical statistical tools are not able to cope with this situation.

Hence, a number of dimension reduction procedures have been developed which may be applied when considering nonparametric regression procedures. The aim is to find a subspace of the predictor space which is of much lower dimension but still contains the important information on the relation between response and predictors.

We will review a number of procedures for dimension reduction (e.g. SIR, SAVE) in multiple regression and consider them under robustness aspects as well. As a special case we include methods for variable selection (e.g. EARTH, SIS) and introduce a new robust approach for the case when n is much smaller than p.

SCHW03 23rd June 2008
14:00 to 15:00
Stability-based regularisation

The properties of L1-penalized regression have been examined in detail in recent years. I will review some of the developments for sparse high-dimensional data, where the number of variables p is potentially very much larger than sample size n. The necessary conditions for convergence are less restrictive if looking for convergence in L2-norm than if looking for convergence in L0-quasi-norm. I will discuss some implications of these results. These promising theoretical developments notwithstanding, it is unfortunately often observed in practice that solutions are highly unstable. If running the same model selection procedure on a new set of samples, or indeed a subsample, results can change drastically. The choice of the proper regularization parameter is also not obvious in practice, especially if one is primarily interested in structure estimation and only secondarily in prediction. Some preliminary results suggest, though, that the stability or instability of results is informative when looking for suitable data-adaptive regularization.

SCHW03 23rd June 2008
15:30 to 16:30
T Cai Large-scale multiple testing: finding needles in a haystack

Due to advances in technology, it has become increasingly common in scientific investigations to collect vast amounts of data with complex structures. Examples include microarray studies, fMRI analysis, and astronomical surveys. The analysis of these data sets poses many statistical challenges not present in smaller scale studies. In these studies, it is often required to test thousands and even millions of hypotheses simultaneously. Conventional multiple testing procedures are based on thresholding the ordered p-values. In this talk, we consider large-scale multiple testing from a compound decision theoretical point of view by treating it as a constrained optimization problem. The solution to this optimization problem yields an oracle procedure. A data-driven procedure is then constructed to mimic the performance of the oracle and is shown to be asymptotically optimal. In particular, the results show that, although the p-value is appropriate for testing a single hypothesis, it fails to serve as the fundamental building block in large-scale multiple testing. Time permitting, I will also discuss simultaneous testing of grouped hypotheses.

This is joint work with Wenguang Sun (University of Pennsylvania).
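To make "thresholding the ordered p-values" concrete, here is a minimal sketch of one well-known conventional procedure of that kind, the Benjamini-Hochberg step-up rule. It is chosen purely for illustration and is not the oracle or data-driven procedure of the talk; the function name and the example p-values are assumptions of this sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative step-up rule: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= alpha * k / m.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values for ten hypotheses.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, alpha=0.05))
# -> [True, True, False, False, False, False, False, False, False, False]
```

The decision for each hypothesis depends on the data only through the ranked p-values, which is the building block the abstract argues is not fundamental at large scale.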

SCHW03 24th June 2008
09:00 to 10:00
Fitting survival models with P>>n predictors: beyond proportional hazards

In a recent paper by Bovelstad et al. [1] partial likelihood ridge regression as used in [2] turned out to be the most successful approach to predicting survival with gene expression data.

However the proportional hazard model used in these models is quite simple and might not be realistic if there is a long survival follow-up. Exploring the fit of the model by using a cross-validated prognostic index leads to the conclusion that the effect of the predictor derived in [2] is neither linear nor constant over time.

We will discuss penalized reduced rank models as a way to obtain robust extensions of the Cox model for this type of data. For time varying effects the reduced rank model of [3] can be employed, while nonlinear effects can be introduced by means of bilinear terms. The predictive performance of such models can be regulated by penalization in combination with cross-validation.

References

[1] Bovelstad, HM; Nygard, S; Storvold, HL; et al. Predicting survival from microarray data - a comparative study. Bioinformatics, 23 (16): 2080-2087, Aug 15 2007.

[2] van Houwelingen, HC; Bruinsma, T; Hart, AAM; et al. Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine, 25 (18): 3201-3216, Sep 30 2006.

[3] Perperoglou, A; le Cessie, S; van Houwelingen, HC. Reduced-rank hazard regression for modeling non-proportional hazards. Statistics in Medicine, 25 (16): 2831-2845, Aug 30 2006.

SCHW03 24th June 2008
10:00 to 11:00
Model selection and estimation with multiple reproducing kernel Hilbert spaces

In this talk, we consider the problem of learning a target function that belongs to the linear span of a large number of reproducing kernel Hilbert spaces. Such a problem arises naturally in many practical situations, with ANOVA, the additive model and multiple kernel learning as the best-known and most important examples. We investigate approaches based on l1-type complexity regularization. We study the theoretical properties from both the variable selection and the estimation perspectives. We establish several probabilistic inequalities providing bounds on the excess risk and L2-error that depend on the sparsity of the problem.

(Parts of the talk are based on joint work with Vladimir Koltchinskii.)

SCHW03 24th June 2008
11:30 to 12:30
A Tsybakov Sparsity oracle inequalities

The quality of solving several statistical problems, such as adaptive nonparametric estimation, aggregation of estimators, estimation under the sparsity scenario and weak learning, can be assessed in terms of sparsity oracle inequalities (SOI) for the prediction risk. One of the challenges is to build estimators that attain the sharpest SOI under minimal assumptions on the dictionary. Methods of sparse estimation are mainly of two types. Some of them, like the BIC, enjoy nice theoretical properties in terms of SOI without any assumption on the dictionary but are computationally infeasible starting from relatively modest dimensions p. Others, like the Lasso or the Dantzig selector, can be easily realized for very large p but their theoretical performance is conditioned by severe restrictions on the dictionary. We will focus on Sparse Exponential Weighting, a new method of sparse recovery realizing a compromise between theoretical properties and computational efficiency. The theoretical performance of the method in terms of SOI is comparable with that of the BIC. No assumption on the dictionary is required. At the same time, the method is computationally feasible for relatively large dimensions p. It is constructed using an exponential weighting with suitably chosen priors, and its analysis is based on the PAC-Bayesian ideas in statistical learning.

SCHW03 24th June 2008
14:00 to 14:20
The exchangeable graph model for statistical network analysis

Observations consisting of measurements on pairs of objects (or conditions) arise in a number of settings in the biological sciences (www.yeastgenome.org), with collections of scientific publications (www.jstor.org) and other hyper-linked resources (www.wikipedia.org), and in social networks (www.linkedin.com). Analyses of such data typically aim at identifying structure among the units of interest, in a low dimensional space, to support the generation of substantive hypotheses, to partially automate semantic categorization, to facilitate browsing, and, more generally, to simplify complex data into useful patterns.

In this talk we introduce the exchangeable graph model and show its utility: 1. as a quantitative tool for exploring static/dynamic networks; 2. as a new paradigm for theoretical analyses of graph connectivity. Within this modeling context, we discuss alternative specifications and extensions that address fundamental issues in data analysis of complex interacting systems: bridging global and local phenomena, data integration, dynamics, and scalable inference.

SCHW03 24th June 2008
14:20 to 14:40
M West Data, models, inference and computation for dynamic cellular networks in systems biology

Advances in bioengineering technologies are generating the ability to measure increasingly high-resolution, dynamic data on complex cellular networks at multiple biological and temporal scales. Single-cell molecular studies, in which data is generated on the levels of expression of a small number of proteins within individual cells over time using time-lapse fluorescent microscopy, is one critical emerging area. Single cell experiments have potential to develop centrally in both mechanistic studies of natural biological systems as well as via synthetic biology -- the latter involving engineering of small cellular networks with well-defined function, so providing opportunity for controlled experimentation and bionetwork design. There is a substantial lag, however, in the ability to integrate, understand and utilize data generated from single-cell fluorescent microscopy studies. I will highlight aspects of this area from the perspective of our work in single cell studies in synthetic bacterial systems that emulate key aspects of mammalian gene networks central to all human cancers. I will touch on:

(a) DATA: Raw data come as movies of colonies of cells developing through time, with a need for imaging methods to estimate cell-specific levels of fluorescence measuring mRNA levels of one or several tagged genes within each cell. This is complicated by the progression of cells through multiple cell divisions that raises questions of tracking the lineages of individual cells over time.

(b) MODELS: In the context of our synthetic gene networks engineered into bacterial cells, we have developed discrete-time statistical dynamic models inspired by basic biochemical network modelling of the stochastic regulatory gene network. These models allow the incorporation of multiple components of noise that is "intrinsic" to biological networks as well as approximation and measurement errors, and provide the opportunity to formally evaluate the capacity of single cell data to inform on biochemical parameters and "recover" network structure in contexts of contaminating noise.

(c) INFERENCE & COMPUTATION: Our approaches to model fitting have developed Bayesian methods for inference in non-linear time series. This involves MCMC methods that impute parameter values coupled with novel, effective Metropolis methods for what can be very high-dimensional latent states representing the unobserved levels of mRNA or proteins on nodes in the network as well as contributions from "missing" nodes.

This work is collaborative with Jarad Niemi and Quanli Wang (Statistical Science at Duke), Lingchong You and Chee-Meng Tan (Bioengineering at Duke).

SCHW03 24th June 2008
14:40 to 15:00
Statistical network analysis and inference: methods and applications

Exploring the statistical properties and hidden characteristics of network entities, and the stochastic processes behind temporal evolution of network topologies, is essential for computational knowledge discovery and prediction based on network data from biology, social sciences and various other fields. In this talk, I first discuss a hierarchical Bayesian framework that combines the mixed membership model and the stochastic blockmodel for inferring latent multi-facet roles of nodes in networks, and for estimating stochastic relationships (i.e., cooperativeness or antagonisms) between roles. Then I discuss a new formalism for modeling network evolution over time based on temporal exponential random graphs (TERGM), and an MCMC algorithm for posterior inference of the latent time-specific networks. The proposed methodology makes it possible to reverse-engineer the latent sequence of temporally rewiring networks given longitudinal measurements of node attributes, such as intensities of gene expressions or social metrics of actors, even when only a single snapshot of such measurements resulting from each (time-specific) network is available.

Joint with Edo Airoldi, Dave Blei, Steve Fienberg, Fan Guo and Steve Hanneke

SCHW03 24th June 2008
15:30 to 16:30
High dimensional inference in bioinformatics and genomics

Bioinformatics came to the scene when biology started to automate its experiments. Although this would have led to large n and small p situations in other sciences, the complex nature of biology meant that it soon started to focus on lots of different variables, resulting in the now well-known small n, large p situations. One such case is the inference of regulatory networks: the number of networks is exponential in the number of nodes, whereas the available data is typically just a fraction thereof. We will present a penalized inference method that deals with such problems and draws on experience with hypothesis testing. It has similarities with Approximate Bayesian Computation and seems to lead to exact inference in a few specific cases.

SCHW03 24th June 2008
16:30 to 17:30
Liquid association for large scale gene expression and network studies

The fast-growing public repertoire of microarray gene expression databases provides individual investigators with unprecedented opportunities to study transcriptional activities for genes of their research interest at no additional cost. Methods such as hierarchical clustering, principal component analysis, gene network and others, have been widely used. They offer biologists valuable genome-wide portraits of how genes are co-regulated in groups. Such approaches have a limitation because it often turns out that the majority of genes do not fall into the detected gene clusters. If one has a gene of primary interest in mind and cannot find any nearby clusters, what additional analysis can be conducted? In this talk, I will show how to address this issue via the statistical notion of liquid association. An online biodata mining system is developed in my lab for aiding biologists to distil information from a web of aggregated genomic knowledgebase and data sources at multi-levels, including gene ontology, protein complexes, genetic markers, drug sensitivity. The computational issue of liquid association and the challenges faced in the context of high p low n problems will be addressed.

SCHW03 25th June 2008
09:00 to 10:00
R Tibshirani The Lasso: some novel algorithms and applications

I will discuss some procedures for modelling high-dimensional data, based on L1 (lasso) -style penalties. I will describe pathwise coordinate descent algorithms for the lasso, which are remarkably fast and facilitate application of the methods to very large datasets for the first time. I will then give examples of new applications of the methods to microarray classification, undirected graphical models for cell pathways, and the fused lasso for signal detection, including comparative genomic hybridization.
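The idea behind pathwise coordinate descent can be sketched in a few lines. The code below is a minimal pure-Python illustration, not the speaker's implementation: it minimises the objective (1/(2n))·||y − Xb||² + lam·||b||₁ one coordinate at a time using soft-thresholding, and solves along a decreasing sequence of penalties with warm starts. The function names, the fixed number of sweeps (no convergence check) and the toy data are assumptions of this sketch.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200, beta_init=None):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = list(beta_init) if beta_init is not None else [0.0] * p
    # Residual y - X beta for the starting coefficients.
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # Mean squared value of each column: the curvature along that coordinate.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the partial residual that
            # excludes coordinate j's own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def lasso_path(X, y, lams, n_sweeps=200):
    """Solve from the largest penalty to the smallest, warm-starting each fit."""
    path = []
    beta = None
    for lam in sorted(lams, reverse=True):
        beta = lasso_coordinate_descent(X, y, lam, n_sweeps, beta_init=beta)
        path.append((lam, beta))
    return path


# Toy design with orthogonal columns (made-up data).
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]
for lam, beta in lasso_path(X, y, [2.0, 0.5, 0.1]):
    print(lam, beta)
```

Warm starts are what make the pathwise approach cheap: neighbouring penalty values have similar solutions, so each fit starts close to its optimum.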

SCHW03 25th June 2008
10:00 to 11:00
Sparsity in machine learning: approaches and analyses

SCHW03 25th June 2008
11:30 to 12:30
A Owen Transposably invariant sample reuse: the pigeonhole bootstrap and blockwise cross-validation

Sample reuse methods like the bootstrap and cross-validation are widely used in statistics and machine learning. They provide measures of accuracy with some face value validity that is not dependent on strong model assumptions.

These methods depend on repeating or omitting cases, while keeping all the variables in those cases. But for many data sets, it is not obvious whether the rows are cases and columns are variables, or vice versa. For example, with movie ratings organized by movie and customer, both movie and customer IDs can be thought of as variables.

This talk looks at bootstrap and cross-validation methods that treat rows and columns of the matrix symmetrically. We get the same answer on X as on X'. McCullagh has proved that no exact bootstrap exists in a certain framework of this type (crossed random effects). We show that a method based on resampling both rows and columns of the data matrix tracks the true error, for some simple statistics applied to large data matrices.

Similarly we look at a method of cross-validation that leaves out blocks of the data matrix, generalizing a proposal due to Gabriel that is used in the crop science literature. We find empirically that this approach provides a good way to choose the number of terms in a truncated SVD model or a non-negative matrix factorization. We also apply some recent results in random matrix theory to the truncated SVD case.
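The row-and-column resampling described above can be sketched as follows. This is a minimal illustration under simplifying assumptions (a dense matrix with no missing entries, rows and columns each drawn independently with replacement); the function names and the toy ratings matrix are hypothetical and not taken from the talk.

```python
import random


def grand_mean(matrix):
    """Mean of all entries of a dense matrix given as a list of rows."""
    total = sum(sum(row) for row in matrix)
    return total / (len(matrix) * len(matrix[0]))


def pigeonhole_bootstrap(matrix, statistic, n_reps=1000, seed=0):
    """Resample rows and columns independently with replacement.

    Each replicate keeps the original shape; an entry appears as many times
    as the product of its row count and its column count.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    replicates = []
    for _ in range(n_reps):
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = [[matrix[i][j] for j in cols] for i in rows]
        replicates.append(statistic(resampled))
    return replicates


# Made-up ratings: rows are customers, columns are movies.
ratings = [[5.0, 3.0, 4.0], [4.0, 2.0, 5.0], [1.0, 2.0, 3.0], [3.0, 3.0, 4.0]]
reps = pigeonhole_bootstrap(ratings, grand_mean, n_reps=500, seed=42)
mean_rep = sum(reps) / len(reps)
variance = sum((r - mean_rep) ** 2 for r in reps) / (len(reps) - 1)
print("bootstrap variance of the grand mean:", variance)
```

Because rows and columns are treated identically, nothing in the procedure depends on which index is regarded as the "case" and which as the "variable".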

SCHW03 26th June 2008
09:00 to 10:00
J-L Wang Covariate adjusted functional principal component analysis for longitudinal data

Classical multivariate principal component analysis has been extended to functional data and termed Functional principal component analysis (FPCA). Much progress has been made but most existing FPCA approaches do not accommodate covariate information, and it is the goal of this talk to develop alternative approaches to incorporate covariate information in FPCA, especially for irregular or sparse functional data. Two approaches are studied: the first incorporates covariate effects only through the mean response function, but the second approach adjusts the covariate effects for both the mean and covariance functions of the response. Both new approaches can accommodate measurement errors and allow data to be sampled at regular or irregular time grids. Asymptotic results are developed and numerical support provided through simulations and a data example. A comparison of the two approaches will also be discussed.

SCHW03 26th June 2008
10:00 to 11:00
Penalized empirical risk minimization and sparse recovery problems

A number of problems in regression and classification can be stated as penalized empirical risk minimization over a linear span or a convex hull of a given dictionary with convex loss and convex complexity penalty, such as, for instance, $\ell_1$-norm. We will discuss several oracle inequalities showing how the error of the solution of such problems depends on the "sparsity" of the problem and the "geometry" of the dictionary.
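As a concrete instance, and in notation that is an assumption of this note rather than the abstract's, write the dictionary as $\{h_1, \dots, h_N\}$, the convex loss as $\ell$, and the regularisation parameter as $\varepsilon > 0$. Penalized empirical risk minimization over the linear span with an $\ell_1$-norm penalty then reads:

$$
\hat{\lambda} = \arg\min_{\lambda \in \mathbb{R}^N} \left[ \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(Y_i, f_\lambda(X_i)\bigr) + \varepsilon \|\lambda\|_1 \right], \qquad f_\lambda = \sum_{j=1}^{N} \lambda_j h_j .
$$

With squared-error loss this reduces to the Lasso; "sparsity" refers to the number of non-zero coordinates of a good approximating $\lambda$.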

SCHW03 26th June 2008
11:30 to 12:30
The Nystrom extension and spectral methods in learning: low-rank approximation of quadratic forms and products

Spectral methods are of fundamental importance in statistics and machine learning, as they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. Motivated by such applications, we present here two new algorithms for the approximation of positive semi-definite kernels, together with error bounds that improve upon known results. The first of these, based on sampling, leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach, based on sorting, provides for the selection of a partition in a deterministic way. After detailing their numerical implementation and verifying performance via simulation results for representative problems in statistical data analysis, we conclude with an extension of these results to the sparse representation of linear operators and the efficient approximation of matrix products.
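For orientation, the classical Nyström approximation (the standard form, not the two new algorithms of the talk) can be written as follows; the symbols are assumptions of this note. Given an $n \times n$ positive semi-definite kernel matrix $K$ and an index subset $I$ of size $m$,

$$
\tilde{K} = K_{\cdot I} \, K_{II}^{+} \, K_{I \cdot},
$$

where $K_{\cdot I}$ is the $n \times m$ submatrix of columns indexed by $I$, $K_{I \cdot}$ is its transpose, $K_{II}$ is the $m \times m$ principal submatrix, and $^{+}$ denotes the Moore-Penrose pseudo-inverse. The approximation has rank at most $m$ and reproduces $K$ exactly on the rows and columns in $I$; a sampling-based and a sorting-based method differ in how the partition into $I$ and its complement is chosen.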

SCHW03 26th June 2008
14:00 to 14:20
Limiting theorems for large dimensional sample means, sample covariance matrices and Hotelling's T2 statistics

It is well known that sample means and sample covariance matrices are independent if the samples are i.i.d. from the Gaussian distribution. In this talk, by investigating random quadratic forms involving sample means and sample covariance matrices, we suggest the conjecture that the sample means and the sample covariance matrices under general distribution functions are asymptotically independent in the large dimensional case when the dimension of the vectors and the sample size both go to infinity with their ratio being a positive constant. As a byproduct, the central limit theorem for the Hotelling $T^2$ statistic under the large dimensional case is established.
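For reference, the Hotelling statistic is the standard example of such a quadratic form. In notation assumed here (not given in the abstract), for i.i.d. $p$-dimensional observations $x_1, \dots, x_n$ with mean $\mu$, sample mean $\bar{x}$ and sample covariance matrix $S$, with $n > p$ so that $S$ is invertible,

$$
T^2 = n \, (\bar{x} - \mu)^\top S^{-1} (\bar{x} - \mu).
$$

In the classical Gaussian setting with fixed $p$, $\frac{n-p}{p(n-1)} T^2$ follows an $F_{p,\,n-p}$ distribution; the talk concerns the regime where $p$ grows proportionally with $n$.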

SCHW03 26th June 2008
14:20 to 14:40
JQ Shi Generalised Gaussian process functional regression model

In this talk, I will discuss a functional regression problem with non-Gaussian functional (longitudinal) response with functional predictors. This type of problem includes, for example, binomial and Poisson response data, occurring in many biomedical and engineering experiments. We propose a generalised Gaussian process functional regression model for such regression situations. We suppose that there exists an underlying latent process between the inputs and the response. The latent process is defined by a Gaussian process functional regression model, which is connected with stepwise response data by means of a link function.

SCHW03 26th June 2008
14:40 to 15:00
Estimation of large volatility matrix for high-frequency financial data

Statistical theory for estimating large covariance matrices shows that even for noiseless synchronized high-frequency financial data, the existing realized volatility based estimators of the integrated volatility matrix of p assets are inconsistent for large p (the number of assets) and large n (the sample size for high-frequency data). This paper proposes new types of estimators of the integrated volatility matrix for noisy non-synchronized high-frequency data. We show that when both n and p go to infinity with p/n approaching a constant, the proposed estimators are consistent with good convergence rates. Our simulations demonstrate the excellent performance of the proposed estimators under complex stochastic volatility matrices. We have applied the methods to high-frequency data with over 600 stocks.

SCHW03 26th June 2008
15:30 to 16:30
Graph decomposition for community identification and covariance constraints

An application in large databases is to find well-connected clusters of nodes in an undirected graph where a link represents interaction between objects. Examples include finding tight-knit communities in social networks, identifying related product-clusters in collaborative filtering, and finding genes which collaborate in different biological functions. Unlike graph-partitioning, in this scenario an object may belong to more than one community -- for example, a person might belong to more than one group of friends, or a gene may be active in more than one gene-network. I'll discuss an approach to identifying such overlapping communities based on extending the incidence matrix decomposition of a graph to a clique-decomposition. Clusters are then identified by approximate variational (mean-field) inference in a related probabilistic model. The resulting decomposition has the side-effect of enabling a parameterisation of positive definite matrices under zero-constraints on entries in the matrix. Provided the graph corresponding to the constraints is decomposable, all such matrices are reachable by this parameterisation. In the non-decomposable case, we show how the method forms an approximation of the space and relate it to more standard latent variable parameterisations of zero-constrained covariances.

SCHW03 26th June 2008
16:30 to 17:30
Permutation-invariant covariance regularisation in high dimensions

Estimation of covariance matrices has a number of applications, including principal component analysis, classification by discriminant analysis, and inferring independence and conditional independence between variables, and the sample covariance matrix has many undesirable features in high dimensions unless regularized. Recent research mostly focused on regularization in situations where variables have a natural ordering. When no such ordering exists, regularization must be performed in a way that is invariant under variable permutations. This talk will discuss several new sparse covariance estimators that are invariant to variable permutations. We obtain convergence rates that make explicit the trade-offs between the dimension, the sample size, and the sparsity of the true model, and illustrate the methods on simulations and real data. We will also discuss a method for finding a "good" ordering of the variables when it is not provided, based on the Isomap, a manifold projection algorithm.

The talk includes joint work with Adam Rothman, Amy Wagaman, Ji Zhu (University of Michigan) and Peter Bickel (UC Berkeley).
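As a simple illustration of a permutation-invariant regulariser, the sketch below hard-thresholds the off-diagonal entries of the sample covariance matrix. Entrywise thresholding is a well-known estimator of this type and is used here only as an example; it should not be read as one of the new estimators of the talk. The function names, the threshold value and the toy data are assumptions.

```python
def sample_covariance(data):
    """Unbiased sample covariance of data given as n rows of p floats."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            s = sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)
            cov[j][k] = cov[k][j] = s
    return cov


def hard_threshold_covariance(cov, t):
    """Set off-diagonal entries with absolute value below t to zero.

    The rule looks at each entry on its own, so relabelling the variables
    simply relabels the rows and columns of the result.
    """
    p = len(cov)
    return [
        [cov[j][k] if j == k or abs(cov[j][k]) >= t else 0.0 for k in range(p)]
        for j in range(p)
    ]


# Made-up data: 4 observations of 3 variables.
data = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]
for row in hard_threshold_covariance(sample_covariance(data), t=0.5):
    print(row)
```

Estimators that exploit a natural ordering, by contrast, shrink entries according to their distance from the diagonal, so their output changes when the variables are permuted.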

SCHW03 27th June 2008
09:00 to 09:20
Optimal prediction from relevant components

In Helland (1990) the partial least squares regression model was formulated in terms of an algorithm on the parameters of the model. A version of this parametric algorithm has recently been used by several authors in connection with determining the central subspace and the central mean subspace of sufficient model reduction, as a method where matrix inversion is avoided. A crucial feature of the parametric PLS model is that the algorithm stops after m steps, where m is the number of relevant components. The corresponding sample algorithm will not usually stop after m steps, implying that the ordinary PLS estimates fall outside the parameter space, and thus cannot be maximally efficient.

We approach this problem using group theory. The X-covariance matrix is endowed with a rotation group, and in addition the regression coefficients upon the X-principal components are endowed with scale groups. This gives a transitive group on each subspace corresponding to m relevant components; more precisely, these subspaces give the orbits of the group. The ordinary PLS predictor is equivariant under this group. It is a known fact that in such situations the best equivariant estimator is equal to the Bayes estimator when the prior is taken as the invariant measure of the group. This Bayes estimator is found by an MCMC method, and is verified to be better than the ordinary PLS predictor.

SCHW03 27th June 2008
09:20 to 09:40
Dimension selection with independent component analysis and its application to prediction

We consider the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. We review current methods, and propose a dimension selector based on Independent Component Analysis which finds the most non-Gaussian lower-dimensional directions in the data. A criterion for choosing the optimal dimension is based on bias-adjusted skewness and kurtosis. We show how this dimension selector can be applied in supervised learning with independent components, both in a regression and classification framework.

SCHW03 27th June 2008
09:40 to 10:00
L Li Model free variable selection via sufficient dimension reduction

Sufficient dimension reduction (SDR) has proven effective at transforming high dimensional problems into low dimensional projections, while losing no regression information and pre-specifying no parametric model during the phase of dimension reduction. However, existing SDR methods suffer from the fact that each dimension reduction component is a linear combination of all the original predictors, and thus cannot perform variable selection. In this talk, we propose a regularized SDR estimation strategy, which is capable of simultaneous dimension reduction and variable selection. We demonstrate that the new estimator achieves consistency in variable selection without requiring any traditional model, while retaining root-n estimation consistency of the dimension reduction basis. Both simulation studies and real data analyses are reported.

SCHW03 27th June 2008
10:00 to 11:00
Estimation of nonlinear functionals: recent results and open problems

We present a theory of point and interval estimation for nonlinear functionals in parametric, semi-, and non-parametric models based on higher order influence functions. The theory reproduces many previous results, produces new non-root n results, and opens up the ability to perform optimal non-root n inference in complex high dimensional models. We present novel rate-optimal point and interval estimators for various functionals of central importance to biostatistics in settings in which estimation at the expected root n rate is not possible, owing to the curse of dimensionality. We also show that our higher order influence functions have a multi-robustness property that extends the double robustness property of first order influence functions. Open questions will be discussed.

SCHW03 27th June 2008
11:30 to 12:30
Applications of approximate inference and experimental design for sparse (generalised) linear models

Sparsity, or more general sub-Gaussianity, is a fundamental regularization principle for high-dimensional statistics. A recent surge of activity has clarified the behaviour of efficient sparse estimators in the worst case, but much less is known about practically efficient approximations to Bayesian inference, which is required for higher-level tasks such as experimental design.

We present an efficient framework for Bayesian inference on generalized linear models with sparsity priors, based on the expectation propagation algorithm, a deterministic variational approximation. We highlight some applications where this framework produces promising results. We hope to convey the relevance of approximate inference methods in practice, which substantially go beyond point estimation, yet whose theoretical properties and algorithmic scalability remain insufficiently understood.

SCHW03 27th June 2008
14:00 to 15:00
Statistics in astronomy: the Taiwanese-American occultation survey

More than a thousand small planetary bodies with radii >100 km have recently been detected beyond Neptune using large telescopes. The purpose of the TAOS project is to measure directly the number of these Kuiper Belt Objects (KBOs) down to the typical size of cometary nuclei (a few km). When a KBO moves in between the earth and a distant star it will block the starlight momentarily, for about a quarter of a second. A telescope monitoring the starlight will thus see it blinking. Three small (20 inch) dedicated robotic telescopes equipped with 2,048 x 2,048 CCD cameras are operated in coincidence so that the sequence and timing of the three separate blinks can be used to distinguish real events from false alarms. A fourth telescope will be added soon. TAOS will increase our knowledge about the Kuiper Belt, the home of most short period comets that return to the inner solar system every few years. This knowledge will help us to understand the formation and evolution of comets in the early solar system as well as to estimate the flux of comets impacting our home planet.

In this talk I will describe some of the statistical challenges that arise when hundreds or thousands of stars are simultaneously monitored every quarter of a second, every night of the year on which observation is possible, with the aim of detecting a few events. TAOS will produce a databank of the order of 10 terabytes per year, which is small by the standards of recent and future astronomical surveys. My intent in this talk is not to provide definitive methods of analysis but, rather, I hope that this concrete example of high dimensional non-Gaussian data informs the discussion of future directions in high dimensional data analysis to which this meeting is devoted.