Seminars (SCH)

Videos and presentation materials from other INI events are also available.

Event When Speaker Title Presentation Material
SCHW01 7th January 2008
10:00 to 11:00
Breakdown point of model selection when the number of variables exceeds the number of observations
SCHW01 7th January 2008
11:30 to 12:30
The deterministic lasso

We study high-dimensional generalized linear models and risk minimization using the Lasso. The risk is taken under a random probability measure P' and the target is an overall minimizer of the risk under some other, nonrandom probability measure P. We restrict ourselves to a set S where P' and P are close to each other, and present an oracle inequality under a so-called compatibility condition between the L_2 norm and the l_1 norm.

SCHW01 7th January 2008
14:00 to 15:00
Methods for visualizing high dimensional data

In this presentation, we review some fundamentals of visualization and then proceed to describe methods and combinations of methods useful for visualizing high dimensional data. Some methods include parallel coordinates, smooth interpolations of parallel coordinates, grand tours including wrapping tours, fractal tours, pseudo-grand tours, and pixel tours.
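
As a small illustration of the first of these methods, the sketch below draws a parallel-coordinates plot with pandas/matplotlib; the two-group data set, its column names and the plotting options are invented for the example.

```python
# A minimal parallel-coordinates sketch (pandas/matplotlib); the data are synthetic.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(0)
n, p = 60, 6
# Two synthetic groups that differ in a few coordinates.
a = rng.normal(0.0, 1.0, size=(n, p))
b = rng.normal(0.0, 1.0, size=(n, p)) + np.array([2, 0, 1, 0, 0, 2])
df = pd.DataFrame(np.vstack([a, b]), columns=[f"x{j}" for j in range(1, p + 1)])
df["group"] = ["A"] * n + ["B"] * n

# Each observation becomes a polyline across the p parallel axes.
parallel_coordinates(df, "group", alpha=0.4)
plt.title("Parallel coordinates of two synthetic groups")
plt.show()
```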

SCHW01 7th January 2008
15:30 to 16:30
A Young Bootstrap and parametric inference: successes and challenges

We review parametric frequentist inference as it has developed over the last 25 years or so. Two main strands have emerged: analytic procedures based on small-sample asymptotics and simulation (bootstrap) approaches. We argue that the latter yield, with appropriate handling of nuisance parameters, a simple and flexible methodology, yet one which nevertheless retains the finer inferential components of parametric theory in an automatic fashion. Performance of the bootstrap methods, even in problems with high-dimensional parameters but small data sample sizes, points in favour of their being the method of choice in complex settings, such as those motivating this programme.
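
For readers unfamiliar with the simulation strand, here is a generic parametric-bootstrap sketch (a basic bootstrap interval for an exponential rate); it is a minimal illustration of the idea, not the speaker's methodology or handling of nuisance parameters.

```python
# A generic parametric bootstrap (not the speaker's procedure): a basic
# bootstrap confidence interval for the rate of an exponential model.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.5, size=40)        # data from Exp(rate = 2.5)

rate_hat = 1 / x.mean()                            # MLE of the rate

B = 5000
boot = np.empty(B)
for b in range(B):
    xb = rng.exponential(scale=1 / rate_hat, size=x.size)   # simulate from the fitted model
    boot[b] = 1 / xb.mean()

q_lo, q_hi = np.quantile(boot, [0.025, 0.975])
ci = (2 * rate_hat - q_hi, 2 * rate_hat - q_lo)    # basic bootstrap interval
print("MLE:", round(rate_hat, 3), "95% basic bootstrap CI:", np.round(ci, 3))
```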

SCHW01 8th January 2008
09:00 to 10:00
Practical and information-theoretic limitations in high-dimensional inference

This talk considers questions of two types concerning high-dimensional inference. First, given a practical (polynomial-time) algorithm, what are the limits of its performance? Second, how do such practical limitations compare to information-theoretic bounds, which apply to the performance of any algorithm regardless of computational complexity?

We analyze these issues in high-dimensional versions of two canonical inference problems: (a) support recovery in sparse regression; and (b) the sparse PCA or eigenvector problem. For the sparse regression problem, we describe a sharp threshold on the sample size n that controls success/failure of \ell_1-constrained quadratic programming (the Lasso), as a function of the problem size p and sparsity index k (number of non-zero entries). Using information-theoretic methods, we prove that the Lasso is order-optimal for sublinear sparsity (vanishing k/p), but sub-optimal for linear sparsity (k/p bounded away from zero). For the sparse eigenvector problem, we analyze a semidefinite programming relaxation due to d'Aspremont et al., and establish a similar failure/success transition for triplets (n,p,k) tending to infinity.

Based on joint work with Arash Amini, John Lafferty, and Pradeep Ravikumar.

SCHW01 8th January 2008
10:00 to 11:00
Some thoughts on nonparametric classification: nearest neighbours, bagging and max likelihood estimation of shape-constrained densities

The $k$-nearest neighbour rule is arguably the simplest and most intuitively appealing nonparametric classifier. We will discuss recent results on the optimal choice of $k$ in situations where the underlying populations have densities with a certain smoothness in $\mathbb{R}^d$. Extensions to the bagged nearest neighbour classifier, which can be regarded as a weighted $k$-nearest neighbour classifier, are also possible, and yield a somewhat surprising comparison with the unweighted case.

Another possibility for nonparametric classification is based on estimating the underlying densities explicitly. An attractive alternative to kernel methods is based on the maximum likelihood estimator, which can be shown to exist if the densities satisfy certain shape constraints, such as log-concavity. We will also discuss an algorithm for computing the estimator in this case, which results in a classifier that is fully automatic yet still nonparametric.
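
A minimal sketch of the nearest-neighbour rule from the first paragraph, with k chosen by cross-validation (scikit-learn); the simulated data, the grid of k values and the distance-weighted variant shown at the end are illustrative choices, not the optimal-k theory of the talk.

```python
# Choosing k for the k-nearest neighbour classifier by cross-validation (illustrative only).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
n, d = 400, 5
X = rng.normal(size=(n, d))
# Labels from a smooth decision boundary plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.3 * rng.normal(size=n) > 0).astype(int)

grid = GridSearchCV(KNeighborsClassifier(),
                    {"n_neighbors": list(range(1, 52, 2))}, cv=5)
grid.fit(X, y)
print("cross-validated k:", grid.best_params_["n_neighbors"],
      " accuracy:", round(grid.best_score_, 3))

# A distance-weighted rule, loosely analogous to weighted nearest-neighbour
# classifiers in that nearer neighbours receive larger weights.
weighted = KNeighborsClassifier(n_neighbors=grid.best_params_["n_neighbors"],
                                weights="distance").fit(X, y)
```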

SCHW01 8th January 2008
11:30 to 12:30
RD Cook Model-based sufficient dimension reduction for regression

Dimension reduction in regression, represented primarily by principal components, is ubiquitous in the applied sciences. This is an old idea that has moved to a position of prominence in recent years because technological advances now allow scientists to routinely formulate regressions in which the number p of predictors is considerably larger than in the past. Although "large" p regressions are perhaps mainly responsible for renewed interest, dimension reduction methodology can be useful regardless of the size of p.

Starting with a little history and a definition of "sufficient reductions", we will consider a variety of models for dimension reduction in regression. The models start from one in which maximum likelihood estimation produces principal components, step along a few incremental expansions, and end with forms that have the potential to improve on some standard methodology. This development provides remedies for two concerns that have dogged principal components in regression: principal components are typically computed from the predictors alone and thus make no apparent use of the response, and they are not equivariant under full rank linear transformation of the predictors.

SCHW01 8th January 2008
14:00 to 15:00
Kernel-based contrast functions for sufficient dimension reduction

We present a new methodology for sufficient dimension reduction (the problem of finding a subspace $S$ such that the projection of the covariate vector $X$ onto $S$ captures the statistical dependency of the response $Y$ on $X$). Our methodology derives directly from a formulation of sufficient dimension reduction in terms of the conditional independence of the covariate $X$ from the response $Y$, given the projection of $X$ on the central subspace (cf. Li, 1991; Cook, 1998). We show that this conditional independence assertion can be characterized in terms of conditional covariance operators on reproducing kernel Hilbert spaces and we show how this characterization leads to an M-estimator for the central subspace. The resulting estimator is shown to be consistent under weak conditions; in particular, we do not have to impose linearity or ellipticity conditions of the kinds that are generally invoked for SDR methods. We also present empirical results showing that the new methodology is competitive in practice.

SCHW01 8th January 2008
15:30 to 16:30
J Fan Challenge of dimensionality in model selection and classification

Model selection and classification using high-dimensional features arise frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing, due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is paramount to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The connections with the sure independence screening (SIS) and iterative SIS (ISIS) of Fan and Lv (2007) in model selection will be elucidated and extended. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
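
The sketch below gives the flavour of screening-plus-independence-rule classification: rank features by two-sample t-statistics, keep the top m, and apply a nearest-centroid rule on the retained features. The simulated data and the fixed m are illustrative; the data-driven threshold proposed in the talk is not implemented.

```python
# Feature screening + independence (nearest-centroid) rule, in the spirit of FAIR.
# The simulated data and the number of retained features m are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 100, 2000, 20                      # n samples, p features, k informative
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)
X[y == 1, :k] += 1.0                         # shift the first k features in class 1

# Two-sample t-statistics, one per feature.
m1, m0 = X[y == 1].mean(0), X[y == 0].mean(0)
v1, v0 = X[y == 1].var(0, ddof=1), X[y == 0].var(0, ddof=1)
n1, n0 = (y == 1).sum(), (y == 0).sum()
t = (m1 - m0) / np.sqrt(v1 / n1 + v0 / n0)

m = 30
keep = np.argsort(-np.abs(t))[:m]            # retain the m features with largest |t|

# Independence rule on the retained features: assign to the nearer class centroid,
# coordinates weighted by the average within-class variances.
w = (v1[keep] + v0[keep]) / 2
d1 = ((X[:, keep] - m1[keep]) ** 2 / w).sum(1)
d0 = ((X[:, keep] - m0[keep]) ** 2 / w).sum(1)
pred = (d1 < d0).astype(int)
print("training accuracy with", m, "selected features:", (pred == y).mean())
```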

SCHW01 8th January 2008
16:30 to 17:30
P Bickel Regularised estimation of high dimensional covariance matrices

We review, with examples, various important parameters depending on the population covariance matrix, such as inverses and eigenstructures, and the uses they are put to. We give a brief discussion of well-known pathologies of the empirical covariance matrix in various applications when the data is high dimensional, which imply inconsistency of "plug-in" estimates of the parameters mentioned. We introduce different notions of sparsity of such matrices and show how some of these are intimately related. We then review a number of methods taking advantage of such sparsity in the population matrices. In particular we state results with various collaborators, particularly E. Levina, establishing rates of convergence of our estimates of parameters as above, as dimension and sample size tend to infinity, that are uniform over large classes of sparse population covariance matrices. We conclude with some simulations, a data analysis supporting the asymptotics, and a discussion of future directions.
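
As one concrete example of a sparsity-exploiting estimator of the kind reviewed here, the sketch below hard-thresholds the sample covariance matrix; the threshold constant in front of sqrt(log p / n) is a guess for illustration, not a recommended choice.

```python
# Hard-thresholded sample covariance for a sparse population covariance (illustrative constants).
import numpy as np

rng = np.random.default_rng(4)
p, n = 200, 100
# A sparse "true" covariance: tridiagonal (banded) structure.
sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)

S = np.cov(X, rowvar=False)                   # sample covariance
lam = 1.0 * np.sqrt(np.log(p) / n)            # threshold level; the constant is a guess
S_thr = np.where(np.abs(S) >= lam, S, 0.0)
np.fill_diagonal(S_thr, np.diag(S))           # keep the diagonal untouched

err_sample = np.linalg.norm(S - sigma, 2)     # operator (spectral) norm errors
err_thresh = np.linalg.norm(S_thr - sigma, 2)
print("operator-norm error, sample vs thresholded:",
      round(err_sample, 3), round(err_thresh, 3))
```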

SCHW01 9th January 2008
09:00 to 10:00
F Murtagh The ultrametric topology perspective on analysis of massive, very high dimensional data stores

An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying the extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We also discuss applications to very high frequency time series segmentation and modeling. Other applications will be described, in particular in the area of textual data mining.

References

[1] F. Murtagh, On ultrametricity, data coding, and computation, Journal of Classification, 21, 167-184, 2004.

[2] F. Murtagh, G. Downs and P. Contreras, "Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding", SIAM Journal on Scientific Computing, in press, 2007.

[3] F. Murtagh, The remarkable simplicity of very high dimensional data: application of model-based clustering, submitted, 2007.

[4] F. Murtagh, Symmetry in data mining and analysis: a unifying view based on hierarchy, submitted, 2007.

SCHW01 9th January 2008
10:00 to 11:00
L Duembgen P-values for computer-intensive classifiers

The first part of the talk presents p-values for classification in general. These are an interesting alternative to classifiers or posterior distributions of class labels. Their purpose is to quantify uncertainty when classifying a single observation, even if we don't have information on the prior distribution of class labels.

After illustrating this concept with some examples and procedures, we focus on computational issues and discuss p-values involving regularization, in particular, LASSO type penalties, to cope with high-dimensional data.

(Part of this talk is based on joint work with Axel Munk, Goettingen, and Bernd-Wolfgang Igl, Luebeck.)

SCHW01 9th January 2008
11:30 to 12:30
W Stuetzle Nonparametric cluster analysis: estimating the cluster tree of a density

The general goal of clustering is to identify distinct groups in a collection of objects. To cast clustering as a statistical problem we regard the feature vectors characterizing the objects as a sample from some unknown probability density. The premise of nonparametric clustering is that groups correspond to modes of this density. The cluster tree summarizes the connectivity structure of the level sets of a density; leaves of the tree correspond to modes of the density. I will define the cluster tree, present methods for estimating it, show examples, and discuss some open problems.

SCHW01 9th January 2008
14:00 to 15:00
M West Sparsity modelling in large-scale dynamic models for portfolio analysis

I will discuss some of our recent work in dynamic modelling for multivariate time series that combines stochastic volatility and graphical modelling ideas. I will describe the modelling ideas and resulting matrix-variate, dynamic graphical models, and aspects of Bayesian methodology and computation for model fitting and structure search. Practical implications of the framework when applied to financial time series for predictive portfolio analysis will highlight some of the reasons for interest in sparsely structured, conditional independence models of volatility matrices.

SCHW01 9th January 2008
15:30 to 16:30
Computationally tractable statistical estimation when there are more variables than observations

We consider the fundamental problem of estimating the mean of a vector y = X beta + z, where X is an n by p design matrix in which one can have far more variables than observations and z is a stochastic error term---the so-called `p > n' setup. When \beta is sparse, or more generally, when there is a sparse subset of covariates providing a close approximation to the unknown mean response, we ask whether or not it is possible to accurately estimate the mean using a computationally tractable algorithm.

We show that in a surprisingly wide range of situations, the lasso happens to nearly select the best subset of variables. Quantitatively speaking, we prove that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error one would achieve with an oracle supplying perfect information about which variables should be included in the model and which variables should not. Interestingly, our results describe the average performance of the lasso; that is, the performance one can expect in an overwhelming majority of cases where X\beta is a sparse or nearly sparse superposition of variables, but not in all cases.

Our results are sharp, nonasymptotic and widely applicable, since they simply require that pairs of predictor variables not be overly collinear.
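
A hedged numerical illustration of the flavour of this result: fit the lasso on a sparse p > n problem and compare its error with an 'oracle' least-squares fit that is told the true support. The tuning parameter below is a standard sqrt(log p / n)-type guess, not the choice analysed in the talk.

```python
# Lasso vs. oracle least squares on a sparse p > n problem (illustrative tuning).
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(5)
n, p, k, sigma = 100, 500, 10, 1.0
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = rng.choice([-2.0, 2.0], size=k)
y = X @ beta + sigma * rng.normal(size=n)

lasso = Lasso(alpha=2 * sigma * np.sqrt(np.log(p) / n)).fit(X, y)

oracle = LinearRegression().fit(X[:, :k], y)       # pretend the true support is known

mse = lambda f: np.mean((f - X @ beta) ** 2)       # squared error against the true mean X*beta
print("lasso MSE:", round(mse(lasso.predict(X)), 3),
      " oracle MSE:", round(mse(oracle.predict(X[:, :k])), 3),
      " nonzeros picked by the lasso:", int(np.sum(lasso.coef_ != 0)))
```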

SCHW01 9th January 2008
16:30 to 17:30
Learning in high dimensions, noise, sparsity and treelets

In recent years there has been a growing practical need to perform learning (classification, regression, etc.) in high-dimensional settings where p >> n. Consequently, instead of the standard limit $n\to\infty$, learning algorithms are typically analyzed in the joint limit $p,n\to\infty$. In this talk we present a different approach that keeps $p,n$ fixed but treats the noise as a small parameter. The resulting perturbation analysis reveals the importance of a robust low-dimensional representation of the noise-free signals, the possible failure of simple variable selection methods, and the key role of sparsity for the success of learning in high dimensions. We also discuss sparsity in an a priori unknown basis and a possible data-driven adaptive construction of such a basis, called treelets. We present a few applications of our analysis, mainly to errors-in-variables linear regression, principal component analysis, and rank determination.

SCHW01 10th January 2008
09:00 to 10:00
Estimating a response parameter in missing data models with high-dimensional covariates

We discuss a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on estimating equations that are $U$-statistics in the observations. The $U$-statistics are based on higher order influence functions that extend ordinary linear influence functions of the parameter of interest, and represent higher derivatives of this parameter. For parameters for which the matching cannot be perfect the method leads to a bias-variance trade-off, and results in estimators that converge at a slower than root-n-rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at root-n-rate.

SCHW01 10th January 2008
10:00 to 11:00
Persistence: alternative proofs of some results of Greenshtein and Ritov
SCHW01 10th January 2008
11:30 to 12:30
Looking at models in high-dimensional data spaces

What do the fishing net models of self-organizing maps look like in the data space? How do the estimated mean vectors and variance-covariance ellipses from model-based clustering fit the clusters? How does small n, large p affect the variability in the estimates of the separating hyperplane from support vector machine models? These are a few of the things that we may discuss in this talk. The goal is to calibrate participants' eyes to viewing high-dimensional spaces and stimulate thought about what types of plots might accompany high-dimensional statistical analysis.

SCHW01 10th January 2008
14:00 to 15:00
The surprising structure of Gaussian point clouds and its implications for signal processing

We will explore connections between the structure of high-dimensional convex polytopes and information acquisition for compressible signals. A classical result in the field of convex polytopes is that if N points are drawn i.i.d. from a Gaussian distribution in dimension n<<N, then only order (log N)^n of the points are vertices of their convex hull. Recent results show that provided n grows slowly with N, then with high probability all of the points are vertices of their convex hull. More surprisingly, a rich "neighborliness" structure emerges in the faces of the convex hull. One implication of this phenomenon is that an N-vector with k non-zeros can be recovered computationally efficiently from only n random projections with n=2e k log(N/n). Alternatively, the best k-term approximation of a signal in any basis can be recovered from 2e k log(N/n) non-adaptive measurements, which is within a log factor of the optimal rate achievable for adaptive sampling. Additional implications for randomized error correcting codes will be presented.
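
A toy check of the recovery claim: solve min ||x||_1 subject to Ax = b as a linear program (scipy) and compare with the true sparse vector. The problem sizes are tiny and the 2e k log(N/n) constant is not verified here.

```python
# Basis pursuit (min ||x||_1 s.t. Ax = b) via linear programming, on a tiny example.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
N, n, k = 120, 40, 5                       # ambient dim, measurements, sparsity
A = rng.normal(size=(n, N)) / np.sqrt(n)   # Gaussian measurement matrix
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.choice([-1.0, 1.0], k)
b = A @ x0

# Variables are (x, t) with -t <= x <= t; minimise sum(t) subject to A x = b.
c = np.concatenate([np.zeros(N), np.ones(N)])
I = np.eye(N)
A_ub = np.block([[I, -I], [-I, -I]])       # encodes  x - t <= 0  and  -x - t <= 0
b_ub = np.zeros(2 * N)
A_eq = np.hstack([A, np.zeros((n, N))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * N + [(0, None)] * N)
x_hat = res.x[:N]
print("LP solved:", res.success, " max recovery error:", np.max(np.abs(x_hat - x0)))
```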

SCHW01 10th January 2008
15:30 to 16:30
Finding low-dimensional structure in high-dimensional data

In high-dimensional data analysis, one is often faced with the problem that real data is noisy and in many cases given in coordinates that are not informative for understanding the data structure itself or for performing later tasks, such as clustering, classification and regression. The combination of noise and high dimensions (>100-1000) presents challenges for data analysis and calls for efficient dimensionality reduction tools that take the inherent geometry of natural data into account. In this talk, I will first describe treelets – an adaptive multi-scale basis inspired by wavelets and hierarchical trees. I will then, in the second half of my talk, describe diffusion maps -- a general framework for dimensionality reduction, data set parameterization and clustering that combines ideas from eigenmaps, spectral graph theory and harmonic analysis. Our construction is based on a Markov random walk on the data, and allows one to define a system of coordinates that is robust to noise, and that reflects the intrinsic geometry or connectivity of the data points in a diffusion process. I will outline where we stand and what problems still remain.

(Part of this work is joint with R.R. Coifman, S. Lafon, B. Nadler and L. Wasserman)
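
A bare-bones diffusion-map sketch in the spirit of the second half of the talk: Gaussian affinities, random-walk normalisation, and the leading nontrivial eigenvectors as coordinates. The noisy-circle data, bandwidth and diffusion time are arbitrary illustrative choices.

```python
# A bare-bones diffusion map on a noisy circle (bandwidth chosen by hand).
import numpy as np

rng = np.random.default_rng(7)
n = 300
theta = rng.uniform(0, 2 * np.pi, n)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(n, 2))

# Gaussian affinity matrix and Markov (random-walk) normalisation.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
eps = 0.1
W = np.exp(-d2 / eps)
P = W / W.sum(axis=1, keepdims=True)       # the random walk itself (diagonalised indirectly below)

# Eigen-decompose the symmetrised kernel for numerical stability; it shares P's spectrum.
D = W.sum(axis=1)
S = W / np.sqrt(np.outer(D, D))
vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
psi = vecs / np.sqrt(D)[:, None]           # right eigenvectors of P

t = 3                                      # diffusion time
coords = psi[:, 1:3] * vals[1:3] ** t      # skip the trivial first eigenvector
print("embedding shape:", coords.shape)
```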

SCHW01 10th January 2008
16:30 to 17:30
P Niyogi A geometric perspective on learning theory and algorithms

Increasingly, we face machine learning problems in very high dimensional spaces. We proceed with the intuition that although natural data live in very high dimensions, they have relatively few degrees of freedom. One way to formalize this intuition is to model the data as lying on or near a low dimensional manifold embedded in the high dimensional space. This point of view leads to a new class of algorithms that are "manifold motivated" and a new set of theoretical questions that surround their analysis. A central construction in these algorithms is a graph or simplicial complex that is data-derived and we will relate the geometry of these to the geometry of the underlying manifold. Applications to embedding, clustering, classification, and semi-supervised learning will be considered.

SCHW01 11th January 2008
09:00 to 10:00
High-dimensional variable selection and graphs: sparsity, faithfulness and stability

Over the last few years, substantial progress has been achieved on high-dimensional variable selection (and graphical modeling) using L1-penalization methods. Diametrically opposed to penalty-based schemes is the PC-algorithm, a special hierarchical multiple testing procedure, which exploits the so-called faithfulness assumption from graphical modeling. For asymptotic consistency in high-dimensional settings, the different approaches require very different "coherence" conditions, say for the design matrix in a linear model. From a conceptual point of view, the PC-algorithm allows one to identify not only regression-type associations but also directed edges in a graph and causal effects (in the sense of Pearl's intervention operator). Thereby, sparsity, faithfulness and stability play a crucial role. We will discuss potential and limitations from a theoretical and practical point of view.

SCHW01 11th January 2008
10:00 to 11:00
Time series regression with semiparametric factor dynamics

High-dimensional regression problems which reveal dynamic behavior are typically analyzed by time propagation of a small number of factors. The inference on the whole system is then based on the low-dimensional time series analysis. Such high-dimensional problems occur frequently in many different fields of science. In this paper we address the problem of inference when the factors and factor loadings are estimated by semiparametric methods. This more flexible modelling approach poses an important question: Is it justified, from an inferential point of view, to base statistical inference on the estimated time series factors? We show that the difference between the inference based on the estimated time series and the 'true' unobserved time series is asymptotically negligible. Our results justify fitting vector autoregressive processes to the estimated factors, which allows one to study the dynamics of the whole high-dimensional system with a low-dimensional representation. The talk reports on joint projects with Szymon Borak, Wolfgang Härdle, Jens Perch Nielsen and Byeong U. Park.

SCHW01 11th January 2008
11:30 to 12:30
B Yu Using side information for prediction

Extracting useful information from high-dimensional data is the focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized L2 minimization method Lasso has been popular. However, Lasso is often seen as not having enough regularization in the large p case.

In this talk, we propose two methods that take into account side information in the penalized L2 framework, in order to bring the needed extra regularization in the large p case. First, we combine different norms including L1 to introduce the Composite Absolute Penalties (CAP) family. CAP allows the grouping and hierarchical relationships between the predictors to be expressed. It covers and goes beyond existing work, including the group lasso and elastic net. Path-following algorithms and simulation results will be presented to compare with Lasso in terms of prediction and sparsity. Second, motivated by the problem of predicting fMRI signals from input natural images, we investigate a method that uses side information in the unlabeled data for prediction. We present a theoretical result in the case of p/n -> constant and apply the method to the fMRI data problem. (The second part is a report on ongoing research.)

SCHW01 11th January 2008
14:00 to 15:00
A physicist's approach to high-dimensional inference
SCHW01 11th January 2008
15:30 to 16:30
Models, model lists, model spaces and predictive optimality

Sources of uncertainty related to model specification are often the single biggest factors in inference. In the predictive context, we demonstrate the effect of varying the model list used for averaging and varying the averaging strategy in computational examples. In addition, by varying the model space while using similar lists and averaging strategies, we demonstrate the effect of the space itself computationally. Thus, it is reasonable to associate a concept of variance and bias not just to individual models but to other aspects of an overall modeling strategy. Moreover, although difficult to formalize, good prediction is seen to be associated with a sort of complexity matching between the space and the unknown function, and with robustness. In some cases, the relationship among complexity, variance-bias, robustness and averaging strategy seems to be dependent on sample size. Taken together, these considerations can be formalized into an overview that may serve as a framework for more general inferential problems in Statistics.

SCH 14th January 2008
11:00 to 12:00
Innovative higher criticism for detecting sparse signals in correlated noise
SCH 16th January 2008
11:00 to 12:00
Hierarchically penalised Cox regression for censored data with grouped variables and its oracle property
SCH 18th January 2008
11:00 to 12:00
M Pontil A spectral regularisation framework for multi-task structure learning
SCH 22nd January 2008
11:00 to 12:00
Statistical issues and metabolomics
SCH 24th January 2008
11:00 to 12:00
Excess mass estimation
SCH 24th January 2008
15:00 to 17:00
An informal introduction to sufficient dimension reduction
SCH 25th January 2008
11:00 to 12:00
An ensemble approach to improved prediction from multitype data
SCH 29th January 2008
11:00 to 12:30
Model selection and sharp asymptotic minimaxity

We will show that a class of model selection procedures are asymptotically sharp minimax to recover sparse signals over a wide range of parameter spaces. Connections to Bayesian model selection, the MDL principle and wavelet estimation will be discussed.

SCH 31st January 2008
09:00 to 10:00
High frequency micro structure in futures markets
SCH 31st January 2008
10:00 to 10:45
Choosing a portfolio of many assets
SCH 31st January 2008
11:00 to 12:00
P Clarkson A database of foreign exchange deals
SCH 5th February 2008
11:00 to 12:00
Approximation methods in statistical learning theory

Spectral methods are of fundamental importance in statistical learning, as they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. Using traditional methods, such an approximation can be obtained with computational complexity that scales as the cube of the number of training examples. For the growing number of applications dealing with very large or high-dimensional data sets, however, these techniques are too costly. A known alternative is the Nystrom extension from finite element methods. While its application to machine learning has previously been suggested in the literature, we introduce here what is, to the best of our knowledge, the first randomized algorithm of this type to yield a relative approximation error bound. Our results follow from a new class of algorithms for the approximation of matrix products, which reveal connections between classical linear algebraic quantities such as Schur complements and techniques from theoretical computer science such as the notion of volume sampling.
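
A minimal sketch of the Nystrom idea mentioned above: sample m columns of the kernel matrix and reconstruct K ~ C W^+ C^T. Uniform column sampling is used below; the randomized, relative-error algorithm described in the talk is not reproduced.

```python
# Nystrom low-rank approximation of an RBF kernel matrix (uniform column sampling).
import numpy as np

rng = np.random.default_rng(8)
n, d, m = 1000, 10, 80                       # n points, sample m << n columns
X = rng.normal(size=(n, d))

def rbf(A, B, gamma=0.1):
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

idx = rng.choice(n, m, replace=False)
C = rbf(X, X[idx])                           # n x m block of kernel columns
W = C[idx]                                   # m x m block on the sampled points
K_hat = C @ np.linalg.pinv(W) @ C.T          # Nystrom extension: K ~ C W^+ C^T

K = rbf(X, X)                                # full kernel, only for checking the error
print("relative Frobenius error:",
      np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```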

SCH 7th February 2008
11:00 to 12:00
Modelling human motion with Gaussian processes

Human motion capture data is a high dimensional time series. Probabilistic modelling of this high dimensional data is affected by problems of dimensionality. In this talk we will show how Gaussian processes can be used to reduce the dimensionality and construct accurate models of human motion. The main application will be three dimensional human pose reconstruction from images.

SCH 8th February 2008
11:00 to 12:00
Properties of regularisation operators in learning theory

We consider the properties of a large class of learning algorithms defined in terms of classical regularization operators for ill-posed problems. This class includes regularized least-squares, the Landweber method, $\nu$-methods and truncated singular value decomposition on hypothesis spaces of vector-valued functions defined in terms of suitable reproducing kernels. In particular, universal consistency, minimax rates and statistical adaptation of the methods will be discussed.

SCH 12th February 2008
11:00 to 12:00
J Kent Procrustes methods for projective shape

Projective shape is important in computer vision to represent the information in a scene that is invariant under different camera views. The simplest example is the cross ratio, which represents the projective shape of four collinear points. One way to study projective shape is through projective invariants. However, a disadvantage is that there seems to be no natural metric structure on these invariants, making it difficult to quantify differences between different projective shapes. The purpose of this talk is to describe a metric structure for projective shapes. Then, using Procrustes methods, the beginnings of a statistical theory will be developed to construct averages and describe variability for a collection of projective shapes.

SCH 13th February 2008
14:00 to 15:00
YH Said Text mining and high dimensional statistical analysis

Text mining can be thought of as a synthesis of information retrieval, natural language processing and statistical data mining. The set of documents being considered can scale to hundreds of thousands and the associated lexicon can be a million or more words. Analysis is often done by consideration of a term-document matrix or even a bigram-document matrix. The dimensionality of the term vector can thus easily be a million or more. In this talk I will describe some of the approaches to text mining on which we have been working. This is joint work with Dr Edward Wegman.
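
For concreteness, a toy term-document and bigram-document matrix built with scikit-learn's CountVectorizer (which returns documents-by-terms, the transpose of the usual term-document convention); the three 'documents' are invented.

```python
# Term-document and bigram-document matrices for three toy documents.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["high dimensional data are sparse",
        "text mining scales to many documents",
        "sparse methods help with high dimensional text"]

unigrams = CountVectorizer()                       # term counts
bigrams = CountVectorizer(ngram_range=(2, 2))      # bigram counts

X1 = unigrams.fit_transform(docs)                  # documents x terms, sparse matrix
X2 = bigrams.fit_transform(docs)                   # documents x bigrams
print(X1.shape, X2.shape)
print(unigrams.get_feature_names_out()[:5])
```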

SCH 15th February 2008
11:00 to 12:00
An introduction to variational methods for incomplete-data problems

Likelihood and Bayesian inference for incomplete-data problems tend to involve computational complications. In Bayesian inference, for example, simulation-based methods such as Markov chain Monte Carlo represent one approach to dealing with such difficulties. The talk will describe a more deterministic approach, based on so-called variational approximations. These have been developed in the computer science literature and versions of them for likelihood analysis and Bayesian analysis will be described in the talk. Application to the analysis of mixture models and extensions thereof will be discussed, as will general issues concerning the theoretical properties of the methods.

SCH 18th February 2008
15:00 to 15:30
Bayesian hierarchical clustering
SCH 18th February 2008
15:30 to 16:00
Bayesian nonparametric latent feature models
SCH 18th February 2008
16:00 to 16:30
New models for relational classification
SCH 18th February 2008
16:30 to 17:00
Gaussian process methods for large and high-dimensional data sets
SCH 19th February 2008
11:00 to 12:00
M Seeger Expectation Propagation -- Experimental Design for the Sparse Linear Model

Expectation propagation (EP) is a novel variational method for approximate Bayesian inference, which has given promising results in terms of computational efficiency and accuracy in several machine learning applications. It can readily be applied to inference in linear models with non-Gaussian priors, generalised linear models, or nonparametric Gaussian process models, among others, yet has not been used in Statistics so far to our knowledge. I will give an introduction to this framework. I will then show how to address sequential experimental design for a linear model with non-Gaussian sparsity priors, giving some results in two different machine learning applications. These results indicate that experimental design for these models may have significantly different properties than for linear-Gaussian models, where Bayesian inference is analytically tractable and experimental design seems best understood. EP as a statistical approximation technique, and especially experimental design for models different from linear-Gaussian ones, is not well understood theoretically. To advance our understanding, it seems promising to relate it to work in Statistics on multivariate continuous-variable distributions, and I am hoping very much for feedback from the audience in that respect.

SCH 21st February 2008
11:00 to 12:00
Some statistical problems from artificial intelligence
SCH 22nd February 2008
11:00 to 12:00
Functional sparsity

Substantial progress has recently been made on understanding the behaviour of sparse linear models in the high-dimensional setting, where the number of variables can greatly exceed the number of samples. This problem has attracted the interest of multiple communities, including applied mathematics, signal processing, statistics and machine learning. But linear models often rely on unrealistically strong assumptions, made mainly for convenience. Going beyond parametric models, can we understand the properties of high-dimensional functions that enable them to be estimated accurately from sparse data? In this talk we present some progress on this problem, showing that many of the recent results for sparse linear models can be extended to the infinite-dimensional setting of nonparametric function estimation. In particular, we present some theory for estimating sparse additive models, together with algorithms that are scalable to high dimensions. We illustrate these ideas with an application to functional sparse coding of natural images. This is joint work with Han Liu, Pradeep Ravikumar, and Larry Wasserman.

SCH 26th February 2008
11:00 to 12:00
Learning latent activities in large scale dynamical problems

Many machine learning problems can be cast as problems of learning highly structured latent activities or dynamics. I will discuss typical approaches to these problems, and illustrate this using the problems of modelling handwriting and modelling fMRI data. However the problem of really learning complicated structural dynamics still seems elusive, and I will briefly discuss what approaches may be fruitful in achieving this.

SCH 28th February 2008
11:00 to 12:00
Pre-modelling via BART

Consider the canonical regression set-up where one wants to learn about the relationship between y, a variable of interest, and x_1,...,x_p, p potential predictor variables. Although one may ultimately want to build a parametric model to describe and summarize this relationship, preliminary analysis via flexible nonparametric models may provide useful guidance. For this purpose we propose BART (Bayesian Additive Regression Trees), a flexible nonparametric ensemble Bayes approach for estimating f(x_1,...,x_p), which is E(Y|x_1,...,x_p), for obtaining predictive regions for future y, for describing the marginal effects of subsets of x_1,...,x_p and for model-free variable selection. Essentially, BART approximates f by a Bayesian 'sum-of-trees' model where fitting and inference are accomplished via an iterative backfitting MCMC algorithm. By using a large number of trees, which yields a redundant basis for f, BART is seen to be remarkably effective at finding highly nonlinear relationships hidden within a large number of irrelevant potential predictors. BART also provides an omnibus test: the absence of any relationship between y and any subset of x_1,...,x_p is indicated when BART posterior intervals for f reveal no signal. (This is joint work with Hugh Chipman and Robert McCulloch.)

SCH 29th February 2008
11:00 to 12:00
Some thoughts about the design of dissimilarity measures

In many situations, dissimilarities between objects cannot be measured directly, but have to be constructed from some known characteristics of the objects of interest, e.g. some values on certain variables.

From a philosophical point of view, the assumption of the objective existence of a 'true' but not directly observable dissimilarity value between two objects is highly questionable. We treat the dissimilarity construction problem as a problem of the choice or design of such a measure and not as an estimation problem of some existing but unknown quantities.

Therefore, subjective judgment is necessarily involved, and the main aim of the design of a dissimilarity measure is the proper representation of a subjective or intersubjective concept (usually of subject-matter experts) of similarity or dissimilarity between the objects.

The design of dissimilarity measures is of particular interest when analyzing high-dimensional data, because methods such as MDS and nearest neighbour techniques operate on dissimilarity matrices and such matrices are not essentially more complex when derived from high dimensional data.

Some guidelines for the choice and design of dissimilarity measures are given and illustrated by the construction of a new dissimilarity measure between species distribution areas in biogeography, which are formalized as binary presence-absence data on a set of geographic units.

I will also discuss alternatives to the Euclidean distance and their implications for high-dimensional situations in which it is not feasible to use information about the meaning of individual variables to construct a dissimilarity measure.
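
As one concrete, standard example of a dissimilarity for binary presence-absence data (not the new measure proposed in the talk), the sketch below computes pairwise Jaccard dissimilarities between simulated species ranges.

```python
# Jaccard dissimilarity between binary presence-absence vectors
# (a standard choice, not the new measure proposed in the talk).
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(9)
n_species, n_units = 5, 40                 # species x geographic units, 0/1 presence
X = (rng.random((n_species, n_units)) < 0.3).astype(bool)

D = squareform(pdist(X, metric="jaccard"))  # 1 - |A & B| / |A | B| for each pair of species
print(np.round(D, 2))
```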

SCH 4th March 2008
11:00 to 12:00
JQ Shi Gaussian process functional regression model for curve prediction and clustering

In this talk I will first discuss Gaussian Process Functional Regression (GPFR) model, which is used to model functional response curves with a set of functional covariates (the dimension of the covariates may be very large). There are two main features: modelling nonlinear and nonparametric functional regression relationship and modelling covariance structure and mean structure simultaneously. The method gives very accurate results for curve fitting and prediction but side-steps the problem of heterogeneity. I will then discuss how to define a hierarchical mixture model to model 'spatially' indexed functional data, i.e., the heterogeneity is dependent on factors such as region or individual patient's information. The mixture model has also been used for curve clustering, but focusing on the problem of clustering functional relationships between response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. Some applications based on simulated data and real data will be presented.

SCH 6th March 2008
11:00 to 12:00
Non-parametric estimation of HARDI diffusion weighted magnetic resonance imaging data

Diffusion-Weighted Magnetic Resonance Imaging captures the diffusion of water molecules in tissue. The impediment of this diffusion process by nerves enables the characterisation of white matter structure and the measurement of quantitative descriptions of white matter integrity.

Initial quantification of the diffusion was based on modelling the Diffusion PDF parametrically, and as such the parameters of the PDF can be estimated, if with some model-choice issues. A single Gaussian Diffusion Tensor model can for example be determined with a minimum of 6 measurements. Of special interest is inferring the orientational structure of the PDF and as much as one third of all white matter voxels in the brain experience orientational heterogeneity. It is hard to model orientational heterogeneity parametrically, and to estimate the PDF without bias a substantial number of additional measurements are required. We discuss non-parametric estimation methods of the important characteristics of the diffusion PDF, and inherent limitations in estimation based on a clinically feasible acquisition protocol. We discuss combining hard and soft shrinkage procedures with a suitable basis representation, and how to construct non-parametric summaries of the diffusion with reduced variance without incurring substantial bias.

This is joint work with Brandon Whitcher, CIC Hammersmith, GSK.

SCH 11th March 2008
11:00 to 12:00
Total variation and curves

We discuss the approximation of data from one- and two-dimensional curves using total-variation-based techniques. Our aim will be to minimise complexity among all functions which satisfy a criterion for approximation. Complexity will be measured by the number of local extreme values or variational properties of the functions. Our criteria for approximation will be based on a multiscale analysis of the residuals.
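
A short total-variation denoising sketch using scikit-image's Chambolle implementation on a piecewise-constant image; the weight parameter and the test image are arbitrary, and the multiscale residual criterion described above is not implemented.

```python
# Total-variation denoising of a noisy piecewise-constant image (scikit-image).
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(10)
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0                     # a piecewise-constant region
noisy = img + 0.3 * rng.normal(size=img.shape)

denoised = denoise_tv_chambolle(noisy, weight=0.2)   # weight chosen by eye
print("noisy RMSE:", round(np.sqrt(np.mean((noisy - img) ** 2)), 3),
      " TV RMSE:", round(np.sqrt(np.mean((denoised - img) ** 2)), 3))
```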

SCH 11th March 2008
14:00 to 15:00
Proteomics data analysis

Within the context of expression proteomics, we developed a novel approach to identify and assess meaningful differences in functional datasets. Given multiple proteomic profiles (generated by a Matrix Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometer) from subjects who belonged to one of two treatment groups, we extracted and classified biologically relevant information using Bayesian nonparametric methods. We modelled f(t), the mean ion abundance per spectrum, via an adaptive kernel regression approach, and relied on an underlying Levy random field to control model complexity. We began by implementing a Levy random fields model for an individual spectrum, and extended it hierarchically to include data from multiple spectra. To make the extension, we asserted that each multi-modal spectrum depended upon one, time and resolution dependent, marked Gamma process, but was unique for reasons including random, biological or measurement error. Upon eliciting parameter prior distributions, we designed a Markov chain Monte Carlo algorithm that enabled exploration of a trans-dimensional model space and posterior predictions of experimental-group status.

SCH 12th March 2008
16:15 to 17:00
Some issues raised by high dimension in Statistics - a partial overview of the SCH Programme
SCH 13th March 2008
11:00 to 12:00
Multilevel modelling of proteomic mass-spectrometry data

Statistical methodology for the analysis of proteomic mass-spectrometry data is proposed using multilevel modelling. Each high-dimensional spectrum is represented using a near-orthogonal low dimensional basis of Gaussian functions. Multivariate mixed effect models are proposed in the lower dimensional space. In particular, differences between groups are investigated using fixed effect parameters, and individual variability of spectra is modelled using random effects. A deterministic peak fitting algorithm provides initial estimates of the near-orthogonal Gaussian basis, and the estimates are updated using a two-stage iterative method. The multilevel model is fitted using a parallel procedure for computational convenience. The methodology is applied to proteomic mass-spectrometry data from serum samples from melanoma patients categorized as Stage I or Stage IV, and significant locations of peaks are identified. Finally comparisons with other methods, including simple feature-based statistics and more complicated Bayesian Markov chain Monte Carlo inference are also made. This is joint work with William Browne (University of Bristol) and Kelly Handley (University of Birmingham).

SCH 17th March 2008
17:00 to 18:00
D Donoho More unknowns than equations? Not a problem! Use Sparsity!
Everything you were taught about underdetermined systems of linear equations is wrong...

Okay, that's too strong. But you have been taught things in undergraduate linear algebra which, if you are an engineer or scientist, may be holding you back. The main one is that if you have more unknowns than equations, you're lost. Don't believe it. At the moment there are many interesting problems in the information sciences where researchers are currently confounding expectations by turning linear algebra upside down:

  • (a) A standard magnetic resonance imaging device can now produce a clinical-quality image using a factor of 8 less time than previously thought.
  • (b) A Fourier imaging system can observe just the lowest frequencies of a sparse nonnegative signal and perfectly reconstruct all the unmeasured high frequencies of the signal.
  • (c) A communications system can transmit a very weak signal perfectly in the presence of intermittent but arbitrarily powerful jamming.

Moreover, in each case the methods are convenient and computationally tractable.

Mathematically, what's going on is a recent explosion of interest in finding the sparsest solution to certain systems of underdetermined linear equations. This problem is known to be NP-Hard in general, and hence the problem sounds intractable. Surprisingly, in some particular cases, it has been found that one can find the sparsest solution by l¹ minimization, which is a convex optimization problem and so tractable. Many researchers are now actively working to explain and exploit this phenomenon. It's responsible for the examples given above.

In my talk, I'll discuss this curious behavior of l¹ minimization and connect it with some pure mathematics -- convex polytope theory and oriented matroid theory.

Ultimately, the pure math behind this phenomenon concerns some accessible but very surprising properties of random point clouds in very high dimensions: each point gets very neighborly!

I'll also explain the connection of this phenomenon to the Newton Institute's ongoing program "Statistical Theory and Methods for Complex, High-Dimensional Data".

SCH 18th March 2008
11:00 to 12:00
E Hancock Analysis of graphs using diffusion processes and random walks (a random walk through spectral graph theory)

This talk will focus on how graph-structures can be analysed using diffusion processes and random walks. It will commence by explaining the relationship between the heat equation on a graph, the spectrum of the Laplacian matrix (the degree matrix minus the weighted adjacency matrix) and the steady-state random walk. The talk will then focus in some depth on how the heat kernel, i.e. the solution of the heat equation, can be used to characterise graph structure in a compact way. One of the important steps here is to show that the zeta function is the moment generating function of the heat kernel trace, and that the zeta function is determined by the distribution of paths and the number of spanning trees in a graph. We will then explore a number of applications of these ideas in image analysis and computer vision. This will commence by showing how the heat kernel can be used for the anisotropic smoothing of complex non-Euclidean image data, including tensor MRI. We will then show how a similar diffusion process based on the Fokker-Planck equation can be used for consistent image labelling. Thirdly, we will show how permutation invariant characteristics extracted from the heat-kernel can be used for learning shape classes. If time permits, the talk will conclude by showing how quantum walks on graphs can overcome some of the problems which limit the utility of classical random walks.
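
A minimal numerical sketch of the central objects: the graph Laplacian L = D - A, the heat kernel exp(-tL), and its trace (whose behaviour in t underlies the zeta-function characterisation mentioned above). The 5-node path graph is just a toy example.

```python
# Heat kernel of a small graph: H(t) = expm(-t L), with L = D - A.
import numpy as np
from scipy.linalg import expm

# A 5-node path graph: weighted adjacency A and Laplacian L = D - A.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(1)) - A

for t in (0.5, 1.0, 2.0):
    H = expm(-t * L)                         # heat kernel at diffusion time t
    print(f"t={t}: trace of heat kernel = {np.trace(H):.3f}")

# The heat-kernel trace equals sum_i exp(-t * lambda_i) over the Laplacian eigenvalues.
lam = np.linalg.eigvalsh(L)
print("check via eigenvalues at t=1:", np.sum(np.exp(-1.0 * lam)).round(3))
```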

SCH 19th March 2008
11:00 to 12:00
Bootstrapping divergence weighted independence graphs

Independence graphs give an overview of multivariate dependency. After a brief introduction to information divergence and to conditional independence graphs, we show that DWIGs fall within the paradigm of design-based inference. Bootstrap resampling tests the stability of the DWIG parameters when increasing the dimension of the underlying data set.

SCH 26th March 2008
11:00 to 12:00
YW Teh Improvements to variational Bayesian inference

Variational Bayesian (VB) inference is an approximate inference framework that has been successfully applied in a wide variety of graphical models. It is well accepted that VB provides lowered variance in posterior estimation in exchange for higher bias, as opposed to Markov chain Monte Carlo (MCMC) inference. In this talk we shall explore improvements to the VB framework in order to reduce bias, in the context of a specific Bayesian network called latent Dirichlet allocation. Specifically we consider two ideas: collapsing or integrating out variables before any approximations are made, and hybrid methods that combine VB and MCMC techniques.

SCH 27th March 2008
11:00 to 12:00
B Kleijn The semiparametric Bernstein-Von Mises theorem

The Bernstein-Von Mises theorem provides a detailed relation between frequentist and Bayesian statistical methods in smooth, parametric models. It states that the posterior distribution converges to a normal distribution centred on the maximum-likelihood estimator with covariance proportional to the inverse Fisher information. In this talk we consider conditions under which such an assertion holds for the marginal posterior of a parameter of interest in semiparametric models. From a practical point of view, this enables the use of Bayesian computational techniques (e.g. MCMC simulation) to obtain frequentist confidence intervals that would otherwise be hard to compute. (Joint work with P. Bickel.)
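
For concreteness, the standard parametric statement being extended (in the usual notation, with \hat\theta_n the MLE and I(\theta_0) the Fisher information); the talk concerns the analogous assertion for the marginal posterior of the interest parameter in semiparametric models.

```latex
% Parametric Bernstein-von Mises: total-variation convergence of the posterior
% to a normal law centred at the MLE, with inverse-Fisher-information covariance.
\[
  \bigl\| \Pi\bigl(\,\cdot \mid X_1,\dots,X_n\bigr)
        - N\!\bigl(\hat\theta_n,\; n^{-1} I(\theta_0)^{-1}\bigr) \bigr\|_{TV}
  \;\xrightarrow{\;P_{\theta_0}\;}\; 0 .
\]
```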

SCHW02 31st March 2008
10:00 to 11:00
The evolution of promoter sequence

We have produced an evolutionary model for promoters (and more generally for genomic regulatory sequence) analogous to the commonly used synonymous/nonsynonymous mutation models for protein-coding sequence. Although our model, called Sunflower, relies on some simple assumptions, it captures enough of the biology of transcription factor action to show clear correlation with other biological features. Sunflower predicts a binding profile of transcription factors to DNA sequence, in which different factors compete for the same potential binding sites. Sunflower can also model cooperative binding. We can control the apparent concentration of the factors by setting parameters uniformly or from gene expression data. The parameterized model simultaneously estimates a continuous measurement of binding occupancy across the genomic sequence for each factor. We can then introduce either a localized mutation (such as a SNP) or a coordinated set of mutations (for example, from a haplotype or another species), rerun the binding model and record the difference in binding profiles using their relative entropy. A single mutation can alter interactions both upstream and downstream of its position due to potential overlapping binding sites, and our statistic captures this domino effect.

Results from Sunflower show many features in agreement with known biology. For example, the overall binding occupancy rises over transcription start sites, and CpG desert promoters show sharper localization signals relative to the transcription start site. More interesting are correlates to variation both between species and within them. Over evolutionary time, we observe a clear excess of low-scoring mutations fixed in promoters, consistent with most changes being neutral. However, this is not consistent across all promoters, and some promoters show more rapid divergence. This divergence often occurs in the presence of relatively constant protein coding divergence. Interestingly, different classes of promoters show different sensitivity to mutations, with developmental and immunological genes having promoters inherently more sensitive to mutations than housekeeping genes.

SCHW02 31st March 2008
11:30 to 12:30
Functional genomics and the forest of life

We will discuss the 0-dimensional statistical problem of alignment, and its relation to the high-dimensional problem of phylogeny. In particular, we will discuss the relevance of the "space of phylogenetic oranges" and its relation to the above problems. We will also discuss "sequence annealing", which is a new alignment strategy based on these ideas.

SCHW02 31st March 2008
14:00 to 15:00
Understanding interactomes by data integration
SCHW02 31st March 2008
15:30 to 16:30
GJ McLachlan On mixture models in high-dimensional testing for the detection of differential gene expression

An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As there are usually thousands of genes to be considered simultaneously, one encounters high-dimensional testing problems. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null (not differentially expressed). The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have limitations due to the minimal assumptions made, or are computationally intensive because of the more specific assumptions adopted. By converting to a z-score the value of the test statistic used to test the significance of each gene, we propose a simple two-component normal mixture that models adequately the distribution of this score. The approach provides an estimate of the local false discovery rate (FDR) for each gene, which is taken to be the posterior probability that the gene is null. Genes with the local FDR less than a specified threshold C are taken to be differentially expressed. For a given C, this approach also provides estimates of the implied overall errors such as the (global) FDR and the false negative/positive rates.
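
A rough sketch of the two-component idea: fit a two-component normal mixture to the z-scores and read off the posterior probability of the near-zero-mean component as the local FDR. The mixture is fitted here with scikit-learn's EM routine rather than the authors' implementation, and the simulated z-scores and the threshold C are illustrative.

```python
# Local FDR from a two-component normal mixture of z-scores (illustrative only).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(11)
p0, m = 0.9, 5000
is_null = rng.random(m) < p0
z = np.where(is_null, rng.normal(0, 1, m), rng.normal(2.5, 1, m))   # simulated z-scores

gm = GaussianMixture(n_components=2, random_state=0).fit(z.reshape(-1, 1))
null_comp = int(np.argmin(np.abs(gm.means_.ravel())))   # component with mean nearest 0

# Posterior probability of the null component = estimated local FDR for each gene.
local_fdr = gm.predict_proba(z.reshape(-1, 1))[:, null_comp]

C = 0.2
called = local_fdr < C
print("genes called differentially expressed:", called.sum(),
      " realised FDR:", round(is_null[called].mean(), 3))
```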

SCHW02 31st March 2008
16:30 to 17:30
Statistical challenges in using comparative genomics for the identification of functional sequences

There are two main aspects of comparative sequence analysis that rely on high-dimensional statistical approaches: identifying evolutionarily constrained regions and determining the significance of their overlap with functional sequences. The identification of constrained sequences largely relies on our understanding of evolutionary models and applying them to multi-sequence alignments. However, our understanding of evolutionary processes is incomplete and our ability to generate “perfect” multi-sequence alignments is hampered by incomplete sequence datasets and general uncertainty in the process; these factors can lead to multiple equally plausible alignments, only one of which is typically represented in downstream analyses. In order to mitigate some of these issues, we have been developing new comparative genomics approaches that take into account the biochemical and physical properties of DNA, such that we can understand which substitutions are more “tolerable” with respect to the three-dimensional structure of DNA, and thus more “neutral” in evolution. We also plan to start taking alignment uncertainty into account in our predictions of constrained sequences. Determining the significance of our improved sequence constraint methods relies on a new statistical approach for determining the significance of overlap with known functional annotations. This new method, devised by Peter Bickel and colleagues, was applied to analyses performed within the ENCODE consortium and provides the basis for newer methods that will be discussed later in this meeting.

SCHW02 1st April 2008
09:00 to 10:00
Structural variation in the human genome

Over the past three years it has become rapidly appreciated that the human genome varies in its structure as well as its sequence, by virtue of a panoply of different chromosomal rearrangements, some that alter the number of copies of DNA segments, and others that alter orientation but not copy number. Evidence is growing from diverse sources that this source of genomic variation has an appreciable functional impact, and yet we remain far from a complete catalogue of this form of variation let alone its biological consequences. In my talk I will summarise the progress to date and highlight the appreciable statistical challenges that remain, with particular reference to the approaches being adopted in my group towards assaying copy number variation and assessing its impact on complex traits through genetic association studies.

SCHW02 1st April 2008
10:00 to 11:00
Y Benjamini Selective inference in complex research problems

We shall highlight the problem of selective inference in genomics using some recent studies. The False Discovery Rate (FDR) approach to this problem will be reviewed, and then we shall discuss: (i) advances in hierarchical testing with an example from a study associating gene expression in the brain with multiple traits of behavior; (ii) screening for partial conjunctions in order to address replicability; and (iii) selective confidence intervals in the frequentist and Bayesian frameworks.

SCHW02 1st April 2008
11:30 to 12:30
Efficient use of population genome sequencing data

With the advent of new sequencing technologies that reduce the cost of DNA sequencing by a factor of a hundred, we have moved into the era of population genomic sequencing, where we sample many individuals from a population to study natural genetic variation genome-wide. However, at this scale sequencing is still costly. I will discuss strategies to use low-coverage sequencing on multiple samples from a population, and some of the complications in using the resulting data for population genetic analyses. Examples will be drawn from the Saccharomyces Genome Resequencing Project (SGRP), in which we have collected sequence data from 70 yeast strains, and from planning for the 1000 Genomes Project to characterise human genetic variation down to 1% allele frequency.

Related Links

SCHW02 1st April 2008
14:00 to 15:00
M West Sparsity modelling in gene expression pathway studies

I will discuss aspects of large-scale multivariate modelling utilising sparsity priors for ANOVA, regression and latent factor analysis in gene expression studies. Specific attention will be given to the development of experimental gene expression signatures in cell lines and animal models, and their extrapolation/evaluation in gene pathway-focused analyses of data from human disease contexts. The role of sparse statistical modelling in signature identification, and in evaluation of complex interacting "sub-pathway" related patterns in gene expression in observational data sets, will be highlighted. I will draw on data and examples from some of our projects in cancer and cardiovascular genomics.

SCHW02 1st April 2008
15:30 to 16:30
Population genomics of human gene expression

The recent comparative analysis of the human genome has revealed a large fraction of functionally constrained non-coding DNA in mammalian genomes. However, our understanding of the function of non-coding DNA is very limited. In this talk I will present recent analyses by my group and collaborators that aim at the identification of functionally variable regulatory regions in the human genome by correlating SNPs and copy number variants with gene expression data. I will also present some analyses of the inference of trans-regulatory interactions and the evolutionary consequences of gene expression variation.

SCHW02 2nd April 2008
09:00 to 10:00
A Enright Computational analysis and prediction of microRNA binding sites

MicroRNAs (miRNAs) are small 22-nucleotide RNA molecules that directly bind to the 3' untranslated regions (UTRs) of protein-coding messenger RNAs. This binding event represses the target transcript, rendering it unsuitable for protein production and causing its degradation. Many miRNAs have been found and a large number of them have already been implicated in human disease and development. We have developed a number of computational approaches for predicting the target transcripts of miRNAs. One method (miRanda) is purely computational and uses a simple dynamic programming algorithm and a statistical model to identify significant binding sites. Our second approach (Sylamer) is an algorithm for scanning genome sequences for 7mer words and testing gene-expression data to identify gene sets which are significantly enriched or depleted in such 7mer words using hypergeometric statistics. This combined computational/experimental approach has worked extremely well for identifying candidate miRNA targets in B and T blood cells, in developing zebrafish embryos and in mouse mutants with deafness.
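As a loose illustration of the Sylamer-style word-enrichment idea (not the published algorithm), the sketch below scores a single word for enrichment among the leading genes of a ranked list with a hypergeometric test; the UTR strings, word and cutoff are invented for illustration.

```python
import numpy as np
from scipy.stats import hypergeom

def word_enrichment(utrs_ranked, word, cutoff):
    """Hypergeometric P-value for enrichment of `word` among the 3'UTRs of the
    first `cutoff` genes in a ranked list (one UTR string per gene)."""
    has_word = np.array([word in u for u in utrs_ranked])
    N = len(utrs_ranked)          # total genes in the ranked list
    K = has_word.sum()            # genes containing the word anywhere in the list
    n = cutoff                    # genes in the leading set
    k = has_word[:cutoff].sum()   # word-containing genes in the leading set
    return hypergeom.sf(k - 1, N, K, n)   # P(X >= k) without replacement

# toy usage with made-up UTR sequences
rng = np.random.default_rng(0)
utrs = ["".join(rng.choice(list("ACGT"), 200)) for _ in range(500)]
print(word_enrichment(utrs, "GCACTTT", cutoff=100))
```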

Related Links

SCHW02 2nd April 2008
10:00 to 11:00
L1-regularisation, motif regression and ChIP-on-chip data analysis

Motivated by the proposed format of talks, we include the following: (i) a review of statistical facts about L1-regularization for high-dimensional problems; (ii) some adaptations of motif regression (Conlon, Liu, Lieb & Liu, 2003) for scoring potential motifs or for presence/absence of other biological targets of interest (e.g. proteins) by integrating multiple data sources; (iii) using the concepts for analyzing ChIP-on-chip data from human liver cells (with a side remark on signal extraction) for HIF-dependent transcriptional networks.

Issue (i) deals with a general purpose method for variable selection or feature extraction which is potentially useful for a broad variety of (multiple) bio-molecular and high-dimensional data. Issue (ii) is - in our experience - an interesting method to improve upon some chosen "standard" methodology by making use of additional data sources. Finally, issue (iii) is work in progress with the Ricci lab at ETH Zurich: it is an illustration for statisticians and - of course - the "real thing" for biologists.

Related Links

SCHW02 2nd April 2008
11:30 to 12:30
Extraction and classification of cellular and genetic phenotypes from automated microscopy data

I will start the presentation with an overview of the Bioconductor project, a large international open source and open development software project for the analysis and comprehension of genomic data. Its goals are to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data; to facilitate the integration of biological metadata in the analysis of experimental data, e.g. literature data and gene and genome annotation data; to allow the rapid development of extensible, scalable, and interoperable software; to promote high-quality documentation and reproducible research; and to provide training in computational and statistical methods for the analysis of genomic data. While much of the initial focus has been on microarray analysis, one of the recent developments has been the development of methods, and computational infrastructure, for the analysis of cell-based assays using various phenotypic readouts.

Changes in cell shape are important for many processes during development and disease. However, the cellular mechanisms and molecular components that underlie these processes remain poorly understood. Here we present a rapid and automated approach to identify and categorize genes based on their phenotypic signatures at the single-cell level. Perturbations by RNAi on a whole-genome scale led to the identification of several hundred genes with distinct cell shape phenotypes. More than 6,000,000 cells were individually profiled into different phenotypic classes. The approach permits the ‘segmentation’ of the genome into phenotypic clusters using complex phenotypic signatures.

SCHW02 2nd April 2008
14:00 to 15:00
Ultra-deep sequencing of mixed virus populations

The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response, vaccine design, and antiviral drug therapy. Recently developed ultra-deep sequencing technologies can be used for quantifying this diversity by direct sequencing of the mixed virus population. We present statistical and computational methods for the analysis of such sequence data. Inference of the population structure from observed reads is based on error correction, reconstruction of a minimal set of haplotypes that explain the data, and eventually estimation of haplotype frequencies. We demonstrate our approach by analyzing simulated data and by comparison to 165 sequences obtained from clonal Sanger sequencing of four independent, diverse HIV populations.

Related Links

SCHW02 3rd April 2008
09:00 to 10:00
Cracking the regulatory code: predicting expression patterns from DNA sequence

Precise control of gene expression lies at the heart of nearly all biological processes. However, despite enormous advances in understanding this process from both experimental and theoretical perspectives, we are still missing a quantitative description of the underlying transcriptional control mechanisms, and the remaining questions, such as how regulatory sequence elements ‘compute’ expression from the inputs they receive, are still very basic.

In this talk, I will present our progress towards the ultimate goal of developing integrated quantitative models for transcription regulation, spanning all aspects of the process, including the DNA sequence, regulators, and expression patterns. I will first describe a novel thermodynamic model that computes expression patterns as a function of cis-regulatory sequence and the binding site preferences and expression of participating transcription factors. I will show that when applied to the segmentation gene network of Drosophila, the model accurately predicts the expression of many known cis-regulatory modules, even across species, and reveals important organizing principles of transcriptional regulation in the network: that both strong and large numbers of weaker binding sites contribute, leading to high occupancy of the module DNA, and conferring robustness against mutation; and that clustering of weaker sites permits cooperative binding, which is necessary to sharpen the patterns.

Related Links

SCHW02 3rd April 2008
10:00 to 11:00
PJ Bickel Refined nonparametric methods for genomic inference

Inference about genomic features faces the particular difficulty that, save for interspecies variation, we have only one copy of any of the genomes that Nature might have produced. We postulate a framework which includes an “ergodic hypothesis” which permits us to compute p-values and confidence bounds. These seem to be as conservative as could be hoped for. Our methods, in crude form, were applied to data from the ENCODE project (Birney et al (2007)). We will discuss our model and refinements of the methods previously proposed.

SCHW02 3rd April 2008
11:30 to 12:30
Steps toward directed identification of disease genes: predicting the consequences of genetic perturbations

Related Links

SCHW02 3rd April 2008
14:00 to 15:00
High-resolution identification of active gene regulatory elements

I will discuss methods we use to identify active gene regulatory elements within the human genome and some of the current obstacles and hurdles we still need to overcome.

SCHW02 3rd April 2008
15:30 to 16:30
High-resolution binding specificity profiles of transcription factors and cis regulatory codes in DNA
SCHW02 4th April 2008
09:00 to 10:00
Functional genomic approaches to stem cell biology

Embryonic stem (ES) cells are similar to the transient population of self-renewing cells within the inner cell mass of the preimplantation blastocyst (epiblast), capable of pluripotential differentiation to all specialised cell types comprising the adult organism. These cells undergo continuous self-renewal to produce identical daughter cells, or can develop into specialised progenitors and terminally differentiated cells. A variety of molecular pathways involved in embryonic development have been elucidated, including those influencing stem cell differentiation. As a result, we know of a number of key transcriptional regulators and signalling molecules that play essential roles in manifesting nuclear potency and self-renewal capacity of embryo-derived and tissue-specific stem cells. Despite these efforts, however, only a small number of components have been identified, and large-scale characterisation of these processes remains incomplete. While the precise biological niche is believed to direct differentiation and development in vivo, it is now possible to utilise explanted stem cell lines as an in vitro model of cell fate assignment and differentiation. The aim of the studies discussed here is to map the global transcriptomic and proteomic activity of ES cells during various stages of differentiation and lineage commitment in tissue culture. This approach will help characterise the functional roles of key developmental regulators and yield more rational approaches to manipulating stem cell behaviour in vitro. The generation of large-scale data from microarray and functional genomic experiments will help to identify and characterise the regulatory influence of key transcription factors, signaling genes and non-coding RNAs involved in early developmental pathways, leading to a more detailed understanding of the molecular mechanisms of vertebrate embryogenesis.

SCHW02 4th April 2008
10:00 to 11:00
G McVean Approximate genealogical inference

For many inferential problems in evolutionary biology and population genetics, considerable power can be gained by explicitly modelling the genealogical relationship between DNA sequences. In the presence of recombination, genealogical relationships are described by a complex graph. While it is theoretically possible to explore the posterior distribution of such graphs using techniques such as MCMC, in most realistic situations the computational complexity of such methods makes them impractical. One possible solution is to develop approximations to full genealogical inference. I will discuss what properties such approximations should have and describe one approach that samples local genealogical relationships along a genome.

SCHW02 4th April 2008
11:30 to 12:30
Genomic principles for feedback regulation of metabolism

Small molecule metabolism is the highly coordinated interconversion of chemical substrates through enzyme-catalysed reactions. It is central to the viability of all organisms, as it enables the assimilation of nutrients for energy production and the synthesis of precursors for all cellular components. The system is tightly regulated so cells can respond efficiently to environmental changes. This is optimised to minimise the substantial cost of enzyme production and core metabolite depletion, and to maximise the benefit of cell growth and division. It is commonly known that this regulation is achieved by controlling either (i) the availability of enzymes or (ii) their activities. Though the molecular mechanisms behind these two regulatory processes have been elucidated in great detail, we still lack insight into how they are deployed and complement each other at a global level. Here, I will present a genome-scale analysis of how regulatory feedback by small molecules controls the metabolic system, and examine how the two modes of regulation are deployed throughout the system.

Bio: Nick Luscombe, Group Leader, EMBL-European Bioinformatics Institute Nick completed his PhD with Professor Janet Thornton at University College London (1996-2000), studying the basis for specificity of DNA-binding proteins. He then moved to Yale University as a post-doctoral fellow with Professor Mark Gerstein (2000-2004). During this time, he shifted his research focus to genomics, with a particular emphasis on transcriptional regulation in yeast. He has been a Group Leader at EMBL-EBI since 2005, examining the control of interesting biological systems.

SCHW02 4th April 2008
14:00 to 15:00
A bayesian probabilistic approach to transform public microarray repositories into disease diagnosis databases

Predicting phenotypes from genotypes is one of the major challenges of functional genomics. In this talk, we aim to take the first step towards using microarray repositories to create a disease diagnosis database, or, in general, for phenotype prediction. This will provide an important application for the enormous amount of costly-to-generate, yet freely available, genomics data. In many disease diagnosis cases, it is not obvious which potential disease should be targeted, and screening across the enormous accumulation of disease expression profiles will help to narrow down the disease candidates. In addition, such profile-based diagnosis is especially useful for those diseases that lack biochemical diagnostic tests.

SCH 8th April 2008
11:00 to 12:00
Empirical efficiency maximisation: improved locally efficient covariate adjustment

It has long been recognized that covariate adjustment can increase precision in randomized experiments, even when it is not strictly necessary. Adjustment is often straightforward when a discrete covariate partitions the sample into a handful of strata, but becomes more involved when modern studies collect copious amounts of baseline information on each subject. This dilemma helped motivate locally efficient estimation, in which one attempts to gain efficiency through a (possibly misspecified) working model. However, with complex high-dimensional covariates, where one might have no belief in the working model, misspecification can actually decrease precision. We propose a new method, empirical efficiency maximization, to target the working model element minimizing asymptotic variance for the resulting parameter estimate, whether or not the working model is (approximately) correct. Gains are demonstrated relative to standard locally efficient estimators.

SCH 10th April 2008
11:00 to 12:00
C Taylor Boosting kernel estimates

Kernel density estimation can be used to implement an estimate of Bayes' rule for classification. Kernel functions can also be used in nonparametric regression, and all three topics (classification, regression and clustering) are examples of "statistical learning". Boosting - an iterative procedure for improving estimates - is increasingly widely used due to its impressive performance. In this talk we give an introduction to these kernel methods as well as to boosting. We show how to implement boosting in each case, and illustrate (both theoretically, and by example) the effect on bias and variance.
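For concreteness, here is a minimal kernel estimate of Bayes' rule for classification, without the boosting step discussed in the talk; the Gaussian kernel, default bandwidths and toy data are assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_classify(X_train, y_train, X_test):
    """Plug-in Bayes rule: assign each test point to the class with the largest
    prior-weighted kernel density estimate (Gaussian kernel, Scott bandwidth)."""
    classes = np.unique(y_train)
    kdes = {c: gaussian_kde(X_train[y_train == c].T) for c in classes}
    priors = {c: np.mean(y_train == c) for c in classes}
    scores = np.array([priors[c] * kdes[c](X_test.T) for c in classes])
    return classes[np.argmax(scores, axis=0)]

# toy two-class example in two dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.repeat([0, 1], 100)
print(kde_classify(X, y, np.array([[0.0, 0.0], [2.0, 2.0]])))
```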

SCH 15th April 2008
11:00 to 12:00
Data visualisation via pairwise displays

We take a graph-theoretic approach to the component ordering problem in the layout of statistical graphics. We use Eulerian tours and Hamiltonian decompositions of complete graphs to ameliorate order effects. Similarly, visual effects of selected salient features in the data are amplified with traversals of edge-weighted graphs. Examples of these techniques include improved versions of multiple comparison displays, interaction plots, star glyph displays and parallel coordinate plots; the improved interaction plots and star glyph displays, in particular, are based on graph traversals. We present algorithms based on classical graph theory methods. These, along with the new graphical displays, are available as an R package.

This is joint work with R.W. Oldford (Waterloo).

SCH 17th April 2008
11:00 to 12:00
Determining the number of factors in a linear mixture model from limited noisy data

Determining the number of signals (sources/components) in a linear mixture model is a fundamental problem in many scientific fields, including signal processing and analytical chemistry. While most methods in signal processing are based on information-theoretic criteria, in this talk we'll describe a novel non-parametric estimation method based on a sequence of hypothesis tests. The proposed method uses the eigenvalues of the sample covariance matrix, and combines a matrix perturbation approach with recent results from random matrix theory regarding the behaviour of noise eigenvalues. We'll present the theoretical derivation of the method, an analysis of its consistency and its limit of detection. As we'll show in simulations, under a wide range of conditions our method compares favourably with other common methods.

Joint work with Shira Kritchman (Weizmann).
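The sketch below is not the proposed test itself, but a cruder illustration of the same ingredients: compare sample-covariance eigenvalues with the Marchenko-Pastur upper edge expected under pure noise, assuming the noise variance is known.

```python
import numpy as np

def count_signals(X, sigma2=1.0):
    """Estimate the number of signal components in X (n samples x p variables)
    by counting sample-covariance eigenvalues above the Marchenko-Pastur
    upper edge sigma2 * (1 + sqrt(p/n))^2 expected under pure noise."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    evals = np.linalg.eigvalsh(S)[::-1]
    edge = sigma2 * (1 + np.sqrt(p / n)) ** 2
    return int(np.sum(evals > edge))

# toy example: 3 strong factors buried in noise, n=200 samples, p=100 variables
rng = np.random.default_rng(2)
F = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 100)) * 2.0
X = F + rng.normal(size=(200, 100))
print(count_signals(X))   # typically prints 3
```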

SCH 21st April 2008
11:00 to 12:00
Spectra and generalisation

The talk briefly reviews generalisation bounds for Support Vector Machines and poses the question of whether the spectrum of the empirical covariance matrix can be used to improve the quality of the bounds. Early results in this direction are surveyed before introducing a recent bound on the number of dichotomies of a graph in terms of the spectrum of the graph Laplacian. This result gives a bound on transductive algorithms that minimise the cut size of the classification. The result is then generalised to other bilinear forms and hence applied to Support Vector Classification. In order to obtain an inductive bound the eigenvalues of the true covariance must be estimated from those of a sample covariance matrix. Possible improvements in the quality of the bound are discussed.

SCH 22nd April 2008
11:00 to 12:00
Empirical likelihood with a growing number of parameters
SCH 24th April 2008
11:00 to 12:00
A Bayesian reassessment of nearest-neighbour classification

The k-nearest-neighbour procedure is a well-known deterministic method used in supervised classification. This paper proposes a reassessment of this approach as a statistical technique derived from a proper probabilistic model; in particular, we modify the assessment made in a previous analysis of this method undertaken by Holmes & Adams (2002, 2003), and evaluated by Manocha & Girolami (2007), where the underlying probabilistic model is not completely well-defined. Once a clear probabilistic basis for the k-nearest-neighbour procedure is established, we derive computational tools for conducting Bayesian inference on the parameters of the corresponding model. In particular, we assess the difficulties inherent to pseudo-likelihood and to path sampling approximations of an intractable normalising constant, and propose a perfect sampling strategy to implement a correct MCMC sampler associated with our model. If perfect sampling is not available, we suggest using a Gibbs sampling approximation. Illustrations of the performance of the corresponding Bayesian classifier are provided for several benchmark datasets, demonstrating in particular the limitations of the pseudo-likelihood approximation in this set-up.

[Joint work with L. Cucala, J.-M. Marin, and D.M. Titterington]

SCH 29th April 2008
11:00 to 12:00
Making the sky searchable: large scale astronomical pattern recognition
SCH 30th April 2008
11:00 to 12:00
Testing for sparse normal means: is there a signal?

Donoho and Jin (2004), following work of Ingster (1999), studied the problem of testing for a signal in a sparse normal means model and showed that there is a "detection boundary" above which the signal can be detected and below which no test has any power. They showed that Tukey's "higher criticism" statistic achieves the detection boundary. I will introduce a new family of test statistics based on phi-divergences (indexed by a real number s with values between -1 and 2) which all achieve the Donoho-Jin-Ingster detection boundary. I will also review recent work on estimating the proportion of non-zero means.
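For orientation, here is a minimal sketch of the higher-criticism statistic applied to two-sided normal p-values; restricting the maximum to the smaller half of the p-values is one common convention, and the toy data are invented.

```python
import numpy as np
from scipy.stats import norm

def higher_criticism(z):
    """Higher-criticism statistic: sort two-sided p-values and maximise the
    standardised excess of small p-values over the uniform expectation."""
    p = np.sort(2 * norm.sf(np.abs(z)))
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    return np.max(hc[: n // 2])       # maximise over the smaller p-values

rng = np.random.default_rng(3)
null = rng.normal(size=10_000)                    # pure noise
sparse = null.copy(); sparse[:50] += 3.0          # 50 shifted means hidden in 10,000
print(higher_criticism(null), higher_criticism(sparse))
```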

SCH 30th April 2008
14:00 to 15:00
Looking at data and models in high-dimensional spaces: (1) Tools and tips for making good plots

This session focuses on making static plots for publications, utilizing contemporary wisdom on plot design. It includes the choice of background and grid lines, color use, and aspect ratio. We'll use R and the package ggplot2, and the web site vischeck for color checks.

SCH 6th May 2008
11:00 to 12:00
Non-asymptotic variable identification via the Lasso and the elastic net

The topic of l_1-regularized or Lasso-type estimation has received considerable attention over the past decade. Recent theoretical advances have been mainly concerned with the risk of the estimators and corresponding sparsity oracle inequalities. In this talk we will investigate the quality of the l_1-penalized estimators from a different perspective, shifting the emphasis to non-asymptotic variable selection, which complements the consistent variable selection literature. Our main results are established for regression models, with emphasis on the square and logistic loss. The identification of the tagged SNPs associated with a disease, in genome-wide association studies, provides the principal motivation for this analysis. The performance of the method depends crucially on the choice of the tuning sequence, and we discuss non-asymptotic choices for which we can correctly detect sets of variables associated with the response at any pre-specified confidence level. These tuning sequences are different for the two loss functions, but in both cases larger than those required for best risk performance. The stability of the design matrix is another major issue in correct variable selection, especially when the total number of variables exceeds the sample size. A possible solution is provided by further regularization, for instance via an l_1+l_2 or elastic net penalty. We discuss the merits and limitations of this method in the same context as above.
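A generic scikit-learn sketch of l_1 and elastic-net selection on a toy regression is given below; the cross-validated tuning used here is only for illustration and is not the non-asymptotic tuning-sequence choice discussed in the talk.

```python
import numpy as np
from sklearn.linear_model import LassoCV, ElasticNetCV

# toy design: p=200 variables, n=100 observations, 5 true signals
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 200))
beta = np.zeros(200); beta[:5] = 2.0
y = X @ beta + rng.normal(size=100)

lasso = LassoCV(cv=5).fit(X, y)
enet = ElasticNetCV(cv=5, l1_ratio=0.5).fit(X, y)   # combined l_1 + l_2 penalty

print("lasso selects:", np.flatnonzero(lasso.coef_))
print("elastic net selects:", np.flatnonzero(enet.coef_))
```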

SCH 7th May 2008
14:00 to 15:00
Looking at data and models in high-dimensional spaces: (2) How, when and why to use interactive and dynamic graphics

This session will be an explanation of graphics for high-dimensional spaces, and ways to calibrate your eyes to recognise structure. We'll also discuss graphics in association with data mining methods (perhaps self-organizing maps, model-based clustering, support vector machines and neural networks). We'll use R, ggobi, and the package rggobi.

SCH 8th May 2008
11:00 to 12:00
M Wegkamp Lasso type classifiers with a reject option

We consider the problem of binary classification where one can, for a particular cost, choose not to classify an observation. We present a simple oracle inequality for the excess risk of structural risk minimizers using a generalized lasso penalty.

SCH 13th May 2008
11:00 to 12:00
A Hero Entropic graphs for high-dimensional data analysis

A minimal spanning tree (MST) spanning random points has total spanning length that converges to the entropy of the underlying density generating the points. This celebrated result was first established by Beardwood, Halton and Hammersley (1958) and has since been extended to other random Euclidean and non-Euclidean graphs, such as the geodesic MST (GMST) and the k-nearest neighbor graph (kNNG) over a random set of points. Using the BHH theory of random graphs one can construct graph-based estimates of topological properties of a high-dimensional distribution of a data sample. This leads, for example, to a model-free consistent estimator of the intrinsic dimension of a data manifold and a high-performance non-parametric anomaly detector. We will illustrate this entropic graph approach for applications including: anomaly detection in Internet traffic; activity detection in a MICA2 wireless network; and intrinsic dimension estimation of image databases.
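As a rough sketch of the graph-length statistic behind such estimators, the fragment below computes the total edge length of a Euclidean MST with scipy; the BHH normalising constants needed to turn this into an actual entropy estimate are omitted, and the toy samples are invented.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_length(X, gamma=1.0):
    """Total (power-weighted) edge length of the Euclidean MST over the rows of X.
    By BHH-type results this length, suitably normalised, estimates a Renyi
    entropy of the underlying density; the normalisation is omitted here."""
    D = squareform(pdist(X)) ** gamma          # pairwise distances, optionally powered
    T = minimum_spanning_tree(D)               # sparse matrix of MST edge weights
    return T.sum()

rng = np.random.default_rng(5)
uniform = rng.uniform(size=(500, 5))                 # diffuse, high-entropy sample
clustered = rng.normal(scale=0.05, size=(500, 5))    # concentrated, low-entropy sample
print(mst_length(uniform), mst_length(clustered))    # longer tree for the diffuse sample
```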

SCH 14th May 2008
11:00 to 12:00
Object oriented data analysis

Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.

SCH 14th May 2008
14:00 to 15:00
Looking at data and models in high-dimensional spaces: (3) Determining significance of structure

Now we'll look at using permutations and simulation to check for the significance of structure, and to compare with null samples. We'll also discuss re-ordering methods to better reveal structure. We'll use R, ggobi, and the package rggobi.

SCH 15th May 2008
11:00 to 12:00
Assessing high-dimensional latent variable models

Having built a probabilistic model, a natural question is: "what probability does my model assign to the data?".

We might fit the model's parameters to avoid having to compute an intractable marginal likelihood. Even then, evaluating a test-set probability with fixed parameters can be difficult. I will discuss recent work on evaluating high-dimensional undirected graphical models and models with many latent variables. This allows direct comparisons of the probabilistic predictions made by graphical models with hundreds of thousands of parameters against simpler alternatives.

SCH 19th May 2008
16:40 to 17:10
Frontiers in applications of data mining
SCH 19th May 2008
17:10 to 17:40
Frontiers in applications of machine learning
SCH 19th May 2008
17:40 to 18:30
Panel discussion
SCH 20th May 2008
11:00 to 12:00
On stratified path sampling of the thermodynamic integral: computing Bayes factors for nonlinear ODE models of biochemical pathways

Bayes factors provide a means of objectively ranking a number of plausible statistical models based on their evidential support. Computing Bayes factors is far from straightforward and methodology based on thermodynamic integration can provide stable estimates of the integrated likelihood. This talk will consider a stratified sampling strategy in estimating the thermodynamic integral and will consider issues such as optimal paths and the variance of the overall estimator. The main application considered will be the computation of Bayes factors for biochemical pathway models based on systems of nonlinear ordinary differential equations (ODE). A large scale study of the ExtraCellular Regulated Kinase (ERK) pathway will be discussed where recent Small Interfering RNA (siRNA) experimental validation of the predictions made using the computed Bayes factors is presented.
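To make the thermodynamic-integration identity concrete, here is a toy illustration on a conjugate Gaussian model, where the power posterior is available in closed form; it only demonstrates log p(y) = \int_0^1 E_t[log p(y|theta)] dt and has none of the stratification or ODE machinery of the talk. All names and numbers are invented.

```python
import numpy as np
from scipy.stats import norm

# Toy model: y_i ~ N(theta, 1), prior theta ~ N(0, 1).  The power posterior
# p_t(theta) proportional to p(y|theta)^t p(theta) is Gaussian, so we can sample
# it exactly and approximate log p(y) by a trapezoid rule over a temperature ladder.
rng = np.random.default_rng(6)
y = rng.normal(loc=1.0, size=20)
n, ybar = len(y), y.mean()

def expected_loglik(t, n_draws=5000):
    var_t = 1.0 / (1.0 + t * n)                 # power-posterior variance
    mean_t = var_t * t * n * ybar               # power-posterior mean
    theta = rng.normal(mean_t, np.sqrt(var_t), n_draws)
    return np.mean([norm.logpdf(y, th, 1.0).sum() for th in theta])

ts = np.linspace(0.0, 1.0, 21)                  # temperature ladder
vals = np.array([expected_loglik(t) for t in ts])
log_evidence = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts))   # trapezoid rule

# Exact log marginal likelihood for comparison: y ~ N(0, I + 11')
cov = np.eye(n) + np.ones((n, n))
exact = -0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1]
                + y @ np.linalg.solve(cov, y))
print(log_evidence, exact)
```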

SCH 21st May 2008
11:00 to 12:00
Slow subspace learning

Slow feature learning exploits the intuition that in realistic processes successively observed stimuli are likely to have the same interpretation, while independently observed stimuli are likely to be interpreted differently. The talk discusses such a method for stationary, absolutely regular processes taking values in a high-dimensional space. A projection to a low-dimensional subspace is selected from a finite number of observations on the basis of a criterion which rewards data variance and penalizes the variance of the velocity vector. Convergence theorems, error analysis and some experiments are reported.
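A slow-feature-analysis-style sketch of this criterion is given below: choose directions that minimise the variance of the discrete-time velocity relative to the data variance via a generalised eigenproblem. This is a generic construction under assumed toy data, not necessarily the estimator analysed in the talk.

```python
import numpy as np
from scipy.linalg import eigh

def slow_subspace(X, k):
    """Return a d x k projection whose components have unit data variance and
    minimal velocity variance: solve C_dot w = lambda C w and keep the k
    smallest-lambda eigenvectors (C = data covariance, C_dot = covariance of
    first differences)."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)
    C_dot = np.cov(np.diff(Xc, axis=0), rowvar=False)
    evals, evecs = eigh(C_dot, C)          # generalised symmetric eigenproblem
    return evecs[:, np.argsort(evals)[:k]]

# toy data: a slowly varying latent signal embedded in 10 noisy coordinates
rng = np.random.default_rng(7)
t = np.linspace(0, 10, 2000)
slow = np.sin(t)
X = np.outer(slow, rng.normal(size=10)) + 0.5 * rng.normal(size=(2000, 10))
W = slow_subspace(X, 1)
print(np.abs(np.corrcoef(X @ W[:, 0], slow)[0, 1]))   # close to 1
```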

SCH 22nd May 2008
11:00 to 12:00
Latent variable models of transcriptional regulation

The expression of genes as messenger RNA (mRNA) in the cell is regulated by the activity of transcription factor proteins. The measurement of mRNA concentration for essentially all genes can be routinely carried out using high-throughput experimental techniques such as microarrays. It is much less straightforward to measure the concentration of activated transcription factor proteins. Latent variable models have therefore been developed which treat transcription factors as unobserved chemical species whose active concentration and effect can be inferred indirectly from the expression levels of their target genes. We are developing two classes of latent variable models. For small sub-systems, e.g. genes controlled by a single transcription factor, we model the process of transcription using ordinary differential equations, with the transcription factor's concentration modeled using a Gaussian process prior distribution over functions. For larger systems, with hundreds of transcription factors controlling thousands of genes, we use simple discrete-time or non-temporal linear models. Bayesian methods provide a natural means for inference of transcription factor concentrations and other model parameters of interest.

Joint work with Neil Lawrence.

SCH 27th May 2008
15:00 to 15:30
On estimating covariances between many assets with histories of highly variable length
SCH 27th May 2008
15:30 to 16:00
Nonparametric estimation of a log-concave density
SCH 27th May 2008
16:00 to 16:30
Factorial mixture of Gaussians and the marginal independence model
SCH 27th May 2008
16:30 to 17:00
Understanding uncertainty
SCH 28th May 2008
11:00 to 12:00
HG Mueller Functional regression and additive models

Functional regression analysis aims at situations where predictors or responses in a regression setting include random functions. Early functional linear models were based on the assumption of observing complete random trajectories, while more recent approaches emphasize more realistic settings of repeated noisy measurements, as encountered in longitudinal studies or online data. Recent joint work with Yao on a functional additive model (FAM) will be discussed. FAM has good asymptotic and practical properties and provides desirable flexibility.

SCH 29th May 2008
11:00 to 12:00
N Cristianini Learning curves: lessons from statistical machine translation
SCH 3rd June 2008
11:00 to 12:00
On the approximation of quadratic forms and sparse matrix products

Thus far, sparse representations have been exploited largely in the context of robustly estimating functions in a noisy environment from a few measurements. In this context, the existence of a basis in which the signal class under consideration is sparse is used to decrease the number of necessary measurements while controlling the approximation error. In this talk, we instead focus on sparse representations of linear operators, with the objective of minimizing the number of operations required to perform basic operations (here, multiplication) on their matrix representations. We employ a representation in terms of sums of rank-one operators, and show how solving a sparse approximation problem akin to model selection for an induced quadratic form in turn guarantees a bounded approximation error for the product of two matrices. Connections to multilinear algebra by way of exterior products in turn yield new randomized algorithms for this and other tasks involving the large matrices and high-dimensional covariance operators that arise in modern statistical practice.

(joint work with Mohamed-Ali Belabbas)

SCH 5th June 2008
11:00 to 12:00
Approximation of functional spatial regression models using bivariate splines

We consider the functional linear regression model where the explanatory variable is a random surface and the response is a real random variable, with bounded or normal noise. Bivariate splines over triangulations represent the random surfaces. We use this representation to construct least squares estimators of the regression function with or without a penalization term. Under the assumptions that the regressors in the sample are bounded and span a large enough space of functions, bivariate spline approximation properties yield the consistency of the estimators. Simulations demonstrate the quality of the asymptotic properties on a realistic domain. We also carry out an application to ozone forecasting over the US that illustrates the predictive skills of the method.

This is joint work with Ming-Jun Lai.

SCH 6th June 2008
11:00 to 12:00
Challenges of regional climate modelling and validation

As attention shifts from broad global summaries of climate change to more specific regional results there is a need for statistics to analyze observations and model output that have significant variability and also to quantify the uncertainty in regional projections. This talk will survey some work on interpreting regional climate experiments. In large multi-model studies one challenge is to understand the contributions of different global and regional model combinations to the simulated climate. This is difficult because the individual runs tend to be short in length. Thus one is faced with the paradox of generating massive data sets that still demand statistical analysis to quantify significant features. We suggest some approaches based on functional data analysis that leverage sparse matrix techniques to handle large spatial fields.

(Joint work with Cari Kaufman, Stephen Sain and Linda Mearns.)

SCH 10th June 2008
11:00 to 12:00
Sparse recovery in convex hulls based on entropy penalisation
SCH 12th June 2008
11:00 to 12:00
Confidence sets for the optimal approximating model - bridging a gap between adaptive point estimation and confidence regions
SCH 17th June 2008
11:00 to 12:00
JC van Houwelingen Global testing of association and/or predictability in regression problems with p>>n predictors
Global testing is an accepted strategy for 'screening' regression problems and controlling the family-wise error rate. For p>n a global test was introduced by Jelle Goeman based on a random effects embedding of the regression problem. This talk will first review this global test and discuss its link with goodness-of-fit tests. Secondly, its use as a screening instrument will be discussed along with its link with cross-validation error. Finally, an adaptation will be presented that is better suited for screening on predictability. References: Goeman et al. (2006) JRSSB 68, 477-493; Goeman et al. (2005) Bioinformatics 21, 1950-1957; Goeman et al. (2004) Bioinformatics 20, 93-99; Le Cessie and van Houwelingen (1995) Biometrics 51, 600-614.
SCHW05 18th June 2008
14:00 to 15:00
S Godsill Sequential inference for dynamically evolving groups of objects

In this talk I will describe recent work on tracking for groups of objects. The aim of the process is to infer evolving groupings of moving objects over time, including group affiliations and individual object states. Behaviour of group objects is modelled using interacting multiple object models, in which individuals attempt stochastically to adjust their behaviour to be `similar' to that of other objects in the same group; this idea is formalised as a multi-dimensional stochastic differential equation for group object motion. The models are estimated algorithmically using sequential Markov chain Monte Carlo approximations to the filtering distributions over time, allowing for more complex modelling scenarios than the more familiar importance-sampling based Monte Carlo filtering schemes. Examples will be presented from GMTI data trials for multiple vehicle motion.

Related Links

SCHW05 18th June 2008
15:30 to 16:10
Y Cai A Bayesian method for non-Gaussian autoregressive quantile function time series models

Many time series in economics and finance are non-Gaussian. In this paper, we propose a Bayesian approach to non-Gaussian autoregressive quantile function time series models where the scale parameter of the models does not depend on the values of the time series. This approach is parametric, so we also compare it with the semi-parametric approach of Koenker (2005). A simulation study and applications to real time series show that the method works very well.

SCHW05 18th June 2008
16:10 to 16:50
X Luo State estimation in high dimensional systems: the method of the ensemble unscented Kalman filter

The ensemble Kalman filter (EnKF) is a Monte Carlo implementation of the Kalman filter, which is often adopted to reduce the computational cost when dealing with high-dimensional systems. In this work, we propose a new EnKF scheme based on the concept of the unscented transform, which therefore will be called the ensemble unscented Kalman filter (EnUKF). Under the assumption of Gaussian distribution of the estimation errors, it can be shown analytically that the EnUKF can achieve more accurate estimations of the ensemble mean and covariance than the ordinary EnKF. Therefore incorporating the unscented transform into an EnKF may benefit its performance. Numerical experiments conducted on a 40-dimensional system support this argument.

SCHW05 18th June 2008
16:50 to 17:30
A modern perspective on auxiliary particle filters

The auxiliary particle filter (APF) is a popular algorithm for the Monte Carlo approximation of the optimal filtering equations of state space models. This talk presents a summary of several recent developments which affect the practical implementation of this algorithm as well as simplifying its theoretical analysis. In particular, an interpretation of the APF, which makes use of an auxiliary sequence of distributions, allows the approach to be extended to more general Sequential Monte Carlo algorithms. The same interpretation allows existing theoretical results for standard particle filters to be applied directly. Several non-standard implementations and applications will also be discussed.

SCHW05 19th June 2008
09:00 to 09:40
VA Reisen Estimating multiple fractional seasonal long-memory parameter

This paper explores seasonal and long-memory time series properties by using the seasonal fractionally ARIMA model when the seasonal data has two seasonal periods, namely, s1 and s2. The stationarity and invertibility parameter conditions are established for the model studied. To estimate the memory parameters, the method given in Reisen, Rodrigues and Palma (2006 a,b), which is a variant of the technique proposed in Geweke and Porter-Hudak (1983) (GPH), is generalized here to deal with a time series with multiple seasonal fractional long-memory parameters. The accuracy of the method is investigated through Monte Carlo experiments and the good performance of the estimator indicates that it can be an alternative procedure to estimate seasonal and cyclical long-memory time series data.

SCHW05 19th June 2008
09:40 to 10:20
Y Shen Variational Markov Chain Monte Carlo for inference in partially observed stochastic dynamic systems

In this paper, we develop a set of novel Markov chain Monte Carlo algorithms for Bayesian inference in partially observed non-linear diffusion processes. The Markov chain Monte Carlo algorithms we develop herein use an approximating distribution to the true posterior as the proposal distribution for an independence sampler. The approximating distribution utilises the posterior approximation computed using the recently developed variational Gaussian process approximation method. Flexible blocking strategies are then introduced to further improve the mixing, and thus the efficiency, of the Markov chain Monte Carlo algorithms. The algorithms are tested on two cases of a double-well potential system. It is shown that the blocked versions of the variational sampling algorithms outperform Hybrid Monte Carlo sampling in terms of computational efficiency, except for cases where multi-modal structure is present in the posterior distribution.

SCHW05 19th June 2008
10:20 to 11:00
Two problems with variational expectation maximisation for time-series models

Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground, retaining distributional information about uncertainty in latent variables, unlike maximum a posteriori (MAP) methods, and yet requiring fewer computational resources than Markov chain Monte Carlo methods. In particular the variational expectation maximisation (vEM) and variational Bayes algorithms, both involving variational optimisation of a free energy, are widely used in time-series modelling. Here, we investigate the success of vEM in simple probabilistic time-series models. First we consider the inference step of vEM, and show that a consequence of the well-known compactness property is a failure to propagate uncertainty in time, thus limiting the usefulness of the retained distributional information. In particular, the uncertainty may appear to be smallest precisely when the approximation is poorest. Second, we consider parameter learning and analytically reveal systematic biases in the parameters found by vEM. Surprisingly, simpler variational approximations (such as mean-field) can lead to less bias than more complicated structured approximations.

Related Links

SCHW05 19th June 2008
11:30 to 12:30
M Opper Approximate Inference for Continuous Time Markov Processes

Continuous time Markov processes (such as jump processes and diffusions) play an important role in the modelling of dynamical systems in many scientific areas.

In a variety of applications, the stochastic state of the system as a function of time is not directly observed. One only has access to a set of noisy observations taken at a discrete set of times. The problem is then to infer the unknown state path as accurately as possible. In addition, model parameters (like diffusion constants or transition rates) may also be unknown and have to be estimated from the data. While it is fairly straightforward to present a theoretical solution to these estimation problems, a practical solution in terms of PDEs or by Monte Carlo sampling can be time consuming and one is looking for efficient approximations. I will discuss approximate solutions to this problem such as variational approximations to the probability measure over paths and weak noise expansions.

SCHW05 19th June 2008
14:00 to 15:00
Recent applications of spatial point processes to multiple-object tracking
The point process framework is natural for the multiple-object tracking problem and is increasingly playing a central role in the derivation of new inference schemes. Interest in this framework is largely due to Ronald Mahler's derivation of a filter that propagates the first moment of a Markov-in-time spatial point process observed in noise. Since then there have been several extensions to this result, with accompanying numerical implementations based on Sequential Monte Carlo. These results will be presented.
SCHW05 19th June 2008
15:20 to 16:00
Multi-object tracking with representations of the symmetric group

We present a framework for maintaining and updating a time varying distribution over permutations matching tracks to real world objects. Our approach hinges on two insights from the theory of harmonic analysis on noncommutative groups. The first is that it is sufficient to maintain certain “low frequency” Fourier components of this distribution. The second is that marginals and observation updates can be efficiently computed from such components by extensions of Clausen’s FFT for the symmetric group.

Related Links

SCHW05 19th June 2008
16:00 to 17:00
C Williams Factorial switching linear dynamical systems for physiological condition monitoring

Condition monitoring often involves the analysis of measurements taken from a system which "switches" between different modes of operation in some way. Given a sequence of observations, the task is to infer which possible condition (or "switch setting") of the system is most likely at each time frame. In this paper we describe the use of factorial switching linear dynamical models for such problems. A particular advantage of this construction is that it provides a framework in which domain knowledge about the system being analysed can easily be incorporated.

We demonstrate the flexibility of this type of model by applying it to the problem of monitoring the condition of a premature baby receiving intensive care. The state of health of a baby cannot be observed directly, but different underlying factors are associated with particular patterns of measurements, e.g. in the heart rate, blood pressure and temperature. We use the model to infer the presence of two different types of factors: common, recognisable regimes (e.g. certain artifacts or common physiological phenomena), and novel patterns which are clinically significant but have unknown cause. Experimental results are given which show the developed methods to be effective on real intensive care unit monitoring data.

Joint work with John Quinn and Neil McIntosh

Related Links

SCHW05 19th June 2008
17:00 to 17:30
Bayesian Gaussian process models for multi-sensor time-series prediction

We propose a powerful prediction algorithm built upon Gaussian processes (GPs). They are particularly useful for their flexibility, facilitating accurate prediction even in the absence of strong physical models. GPs further allow us to work within a completely Bayesian framework. As such, we show how the hyperparameters of our system can be marginalised by use of Bayesian Monte Carlo, a principled method of approximate integration. We employ the error bars of the GP's prediction as a means to select only the most informative observations to store. This allows us to introduce an iterative formulation of the GP to give a dynamic, on-line algorithm. We also show how our error bars can be used to perform active data selection, allowing the GP to select where and when it should next take a measurement. We demonstrate how our methods can be applied to multi-sensor prediction problems where data may be missing, delayed and/or correlated. In particular, we present a real network of weather sensors as a testbed for our algorithm.
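A bare-bones GP regression sketch showing how the predictive variance can drive where to measure next is given below; the squared-exponential kernel and fixed hyperparameters are assumptions, and the Bayesian Monte Carlo marginalisation described in the talk is not included.

```python
import numpy as np

def sq_exp(a, b, ell=1.0, sf2=1.0):
    """Squared-exponential covariance between two sets of scalar inputs."""
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_predict(x, y, x_star, noise=0.1):
    """GP posterior mean and variance at x_star given noisy observations (x, y)."""
    K = sq_exp(x, x) + noise * np.eye(len(x))
    Ks = sq_exp(x, x_star)
    Kss = sq_exp(x_star, x_star)
    alpha = np.linalg.solve(K, y)
    mean = Ks.T @ alpha
    var = np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks))
    return mean, var

# active selection: measure next where the predictive variance is largest
rng = np.random.default_rng(8)
x = rng.uniform(0, 10, 8)
y = np.sin(x) + 0.1 * rng.normal(size=8)
grid = np.linspace(0, 10, 200)
mean, var = gp_predict(x, y, grid)
print("next measurement at x =", grid[np.argmax(var)])
```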

SCHW05 20th June 2008
09:00 to 09:40
GJ McLachlan Clustering of time course gene-expression data via mixture regression models

In this paper, we consider the use of mixtures of linear mixed models to cluster data which may be correlated and replicated and which may have covariates. This approach can thus be used to cluster time series data. For each cluster, a regression model is adopted to incorporate the covariates, and the correlation and replication structure in the data are specified by the inclusion of random effects terms. The procedure is illustrated in its application to the clustering of time-course gene expression data.

SCHW05 20th June 2008
09:40 to 10:20
Markov chain Monte Carlo algorithms for Gaussian processes

We discuss Markov chain Monte Carlo algorithms for sampling functions in Gaussian process models. A first algorithm is a local sampler that iteratively samples each local part of the function by conditioning on the remaining part of the function. The partitioning of the domain of the function into regions is automatically carried out during the burn-in sampling phase. A more advanced algorithm uses control variables which are auxiliary function values that summarize the properties of the function. At each iteration, the algorithm proposes new values for the control variables and then generates the function from the conditional Gaussian process prior. The control input locations are found by minimizing the total variance of the conditional prior. We apply these algorithms to estimate non-linear differential equations in Systems Biology.

SCHW05 20th June 2008
10:20 to 11:00
Is that really the pattern we're looking for? Bridging the gap between statistical uncertainty and dynamic programming algorithms

Two approaches to statistical pattern detection, when using hidden or latent variable models, are to use either dynamic programming algorithms or Monte Carlo simulations. The first produces the most likely underlying sequence from which patterns can be detected but gives no quantification of the error, while the second allows quantification of the error but is only approximate due to sampling error. This paper describes a method to determine the statistical distributions of patterns in the underlying sequence without sampling error in an efficient manner. This approach allows the incorporation of restrictions about the kinds of patterns that are of interest directly into the inference framework, and thus facilitates a true consideration of the uncertainty in pattern detection.

SCHW05 20th June 2008
11:30 to 12:30
E Moulines Adaptive Monte Carlo Markov Chains

In this talk, we present in a common unifying framework several adaptive Monte Carlo Markov chain algorithms (MCMC) that have been recently proposed in the literature. We prove that under a set of verifiable conditions, ergodic averages calculated from the output of a so-called adaptive MCMC sampler converge to the required value and can even, under more stringent assumptions, satisfy a central limit theorem. We prove that the conditions required are satisfied for the Independent Metropolis-Hastings algorithm and the Random Walk Metropolis algorithm with symmetric increments. Finally we propose an application of these results to the case where the proposal distribution of the Metropolis-Hastings update is a mixture of distributions from a curved exponential family. Several illustrations will be provided.
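As a small illustration of the kind of diminishing-adaptation scheme covered by such ergodicity results (not the specific algorithms of the talk), here is a random-walk Metropolis sampler whose proposal scale is tuned towards a target acceptance rate with decaying step sizes; target, step-size schedule and acceptance-rate target are assumptions.

```python
import numpy as np

def adaptive_rwm(logpi, x0=0.0, n=20000, target=0.44):
    """Random-walk Metropolis whose log proposal scale is adapted with a
    Robbins-Monro step gamma_t = t^(-0.6) towards a target acceptance rate.
    The diminishing step sizes keep the adaptation asymptotically negligible."""
    rng = np.random.default_rng(9)
    x, log_s = x0, 0.0
    chain = np.empty(n)
    for t in range(1, n + 1):
        prop = x + np.exp(log_s) * rng.normal()
        accept = np.log(rng.uniform()) < logpi(prop) - logpi(x)
        if accept:
            x = prop
        log_s += t ** -0.6 * ((1.0 if accept else 0.0) - target)
        chain[t - 1] = x
    return chain, np.exp(log_s)

# toy target: an equal-weight two-component normal mixture
logpi = lambda x: np.logaddexp(-0.5 * x ** 2, -0.5 * (x - 4) ** 2)
chain, scale = adaptive_rwm(logpi)
print("adapted proposal scale:", scale, "posterior mean estimate:", chain[5000:].mean())
```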

SCHW05 20th June 2008
14:00 to 15:00
O Papaspiliopoulos A methodological framework for Monte Carlo estimation of continuous-time processes

In this talk I will review a mathodological framework for the estimation of partially observed continuous-time processes using Monte Carlo methods. I will presente different types of data structures and frequency regimes and will focus on unbiased (with respect to discretization errors) Monte Carlo methods for parameter estimation and particle filtering of continuous-time processes. An important component of the methodology is the Poisson estimator and I will discuss some of its properties. I will also present some results on the parameter estimation using variations of the smooth particle filter which exploit the graphical model structure inherent in partially observed continuous-time Markov processes.

SCHW05 20th June 2008
15:30 to 16:10
High frequency variability and microstructure bias
Microstructure noise can substantially bias the estimation of the volatility of an Ito process. Such noise is inherently multiscale, causing eventual inconsistency in estimation as the sampling rate becomes more frequent. Methods have been proposed to remove this bias using subsampling mechanisms. We instead take a frequency-domain approach and advocate learning the degree of contamination from the data. The volatility can be seen as an aggregation of contributions from many different frequencies. Having learned the degree of contamination allows us to correct these contributions frequency by frequency and calculate a bias-corrected estimator. This procedure is fast, robust to different signal-to-microstructure scenarios, and is also extended to the problem of correlated microstructure noise. Theory can be developed as long as the Ito process has harmonizable increments and a suitable dynamic spectral range.
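A toy simulation of the bias is sketched below: the realized variance of a noise-contaminated log-price moves away from the true integrated variance as sampling becomes more frequent. The frequency-domain correction described in the talk is not implemented, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 23400                                # one "day" of second-by-second prices
true_vol = 0.01
latent = np.cumsum(true_vol / np.sqrt(n) * rng.normal(size=n))   # efficient log-price
observed = latent + 0.0005 * rng.normal(size=n)                  # microstructure noise

for step in (1, 5, 30, 300):
    r = np.diff(observed[::step])        # returns at a coarser sampling interval
    print(f"sampling every {step:>3} ticks: realized variance = {np.sum(r**2):.6f}")
print("true integrated variance  =", true_vol ** 2)
```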
SCHW05 20th June 2008
16:10 to 17:10
Nonparametric Bayesian times series models: infinite HMMs and beyond

Hidden Markov models (HMMs) are one of the most widely used statistical models for time series. Traditionally, HMMs have a known structure with a fixed number of states and are trained using maximum likelihood techniques. The infinite HMM (iHMM) allows a potentially unbounded number of hidden states, letting the model use as many states as it needs for the data (Beal, Ghahramani and Rasmussen 2002). Teh, Jordan, Beal and Blei (2006) showed that a form of the iHMM could be derived from the Hierarchical Dirichlet Process, and described a Gibbs sampling algorithm based on this for the iHMM. I will talk about recent work we have done on infinite HMMs. In particular: we now have a much more efficient inference algorithm based on dynamic programming, called 'Beam Sampling', which should make it possible to apply iHMMs to larger problems. We have also developed a factorial version of the iHMM which makes it possible to have an unbounded number of binary state variables, and can be thought of as a time-series generalization of the Indian buffet process.

Joint work with Jurgen van Gael (Cambridge), Yunus Saatci (Cambridge) and Yee Whye Teh (Gatsby Unit, UCL).

Related Links

SCHW03 23rd June 2008
10:00 to 11:00
Variable selection in very high dimensional regression and classification
SCHW03 23rd June 2008
11:30 to 12:30
Dimension reduction

Ursula Gather, joint work with Charlotte Guddat

Progress in computer science over the last decades has led, in practice, to ’floods of data’ which can be stored and have to be handled to extract the information of interest. As an example, consider data from the field of genetics, where the dimension may increase to values up in the thousands. Classical statistical tools are not able to cope with this situation.

Hence, a number of dimension reduction procedures have been developed which may be applied when considering nonparametric regression procedures. The aim is to find a subspace of the predictor space which is of much lower dimension but still contains the important information on the relation between response and predictors.

We will review a number of procedures for dimension reduction (e.g. SIR, SAVE) in multiple regression and consider them under robustness aspects as well. As a special case we include methods for variable selection (e.g. EARTH, SIS) and introduce a new robust approach for the case when n is much smaller than p.
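For reference, here is a compact sketch of sliced inverse regression (the SIR method mentioned above): whiten the predictors, average them within slices of the ordered response, and take leading eigenvectors of the slice-mean covariance. The slice count and toy model are assumptions, and no robustification is included.

```python
import numpy as np

def sir(X, y, n_slices=10, k=1):
    """Sliced inverse regression: whiten X, average it within slices of the
    ordered response, and take top eigenvectors of the covariance of the slice
    means as estimated effective dimension-reduction directions."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Z = Xc @ np.linalg.inv(L).T                      # whitened predictors
    order = np.argsort(y)
    M = np.zeros((p, p))
    for sl in np.array_split(order, n_slices):
        m = Z[sl].mean(axis=0)
        M += len(sl) / n * np.outer(m, m)            # weighted slice-mean covariance
    evals, evecs = np.linalg.eigh(M)
    eta = evecs[:, np.argsort(evals)[::-1][:k]]      # top-k directions, whitened scale
    beta = np.linalg.solve(L.T, eta)                 # map back to original coordinates
    return beta / np.linalg.norm(beta, axis=0)

# toy model: y depends on X only through one linear combination
rng = np.random.default_rng(11)
X = rng.normal(size=(500, 8))
b = np.r_[1.0, 1.0, np.zeros(6)]
y = (X @ b) ** 3 + rng.normal(size=500)
print(sir(X, y).ravel())    # roughly proportional to (1, 1, 0, ..., 0), up to sign
```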

SCHW03 23rd June 2008
14:00 to 15:00
Stability-based regularisation

The properties of L1-penalized regression have been examined in detail in recent years. I will review some of the developments for sparse high-dimensional data, where the number of variables p is potentially very much larger than the sample size n. The necessary conditions for convergence are less restrictive if one looks for convergence in the L2-norm rather than in the L0-quasi-norm. I will discuss some implications of these results. These promising theoretical developments notwithstanding, it is unfortunately often observed in practice that solutions are highly unstable: if the same model selection procedure is run on a new set of samples, or indeed a subsample, the results can change drastically. The choice of the proper regularization parameter is also not obvious in practice, especially if one is primarily interested in structure estimation and only secondarily in prediction. Some preliminary results suggest, though, that the stability or instability of results is informative when looking for suitable data-adaptive regularization.
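
As a purely illustrative sketch of the kind of subsample-based stability diagnostic alluded to above (the penalty, subsample fraction and number of subsamples below are arbitrary choices, not those of the talk), one can record how often each variable is selected by the lasso across random subsamples:

import numpy as np
from sklearn.linear_model import Lasso

def selection_frequencies(X, y, alpha=0.1, n_subsamples=100, frac=0.5, seed=0):
    # Fraction of random subsamples in which each variable receives a
    # non-zero lasso coefficient; values near 0 or 1 indicate stable decisions.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    m = int(frac * n)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=m, replace=False)
        coef = Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx]).coef_
        counts += (coef != 0)
    return counts / n_subsamples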

SCHW03 23rd June 2008
15:30 to 16:30
T Cai Large-scale multiple testing: finding needles in a haystack

Due to advances in technology, it has become increasingly common in scientific investigations to collect vast amounts of data with complex structures. Examples include microarray studies, fMRI analysis, and astronomical surveys. The analysis of these data sets poses many statistical challenges not present in smaller scale studies. In such studies, it is often required to test thousands and even millions of hypotheses simultaneously. Conventional multiple testing procedures are based on thresholding the ordered p-values. In this talk, we consider large-scale multiple testing from a compound decision theoretic point of view by treating it as a constrained optimization problem. The solution to this optimization problem yields an oracle procedure. A data-driven procedure is then constructed to mimic the performance of the oracle and is shown to be asymptotically optimal. In particular, the results show that, although the p-value is appropriate for testing a single hypothesis, it fails to serve as the fundamental building block in large-scale multiple testing. Time permitting, I will also discuss simultaneous testing of grouped hypotheses.
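
For reference, the conventional ordered-p-value approach that the talk argues against as a building block is typified by the Benjamini-Hochberg step-up procedure; a minimal implementation (illustrative only) is:

import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    # Reject the hypotheses whose ordered p-values fall below the BH line i*q/m.
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject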

This is joint work with Wenguang Sun (University of Pennsylvania).

Related Links

SCHW03 24th June 2008
09:00 to 10:00
Fitting survival models with P>>n predictors: beyond proportional hazards

In a recent paper by Bovelstad et al. [1], partial likelihood ridge regression, as used in [2], turned out to be the most successful approach to predicting survival from gene expression data.

However, the proportional hazards model used in these approaches is quite simple and might not be realistic when there is a long survival follow-up. Exploring the fit of the model by using a cross-validated prognostic index leads to the conclusion that the effect of the predictor derived in [2] is neither linear nor constant over time.

We will discuss penalized reduced rank models as a way to obtain robust extensions of the Cox model for this type of data. For time varying effects the reduced rank model of [3] can be employed, while nonlinear effects can be introduced by means of bilinear terms. The predictive performance of such models can be regulated by penalization in combination with cross-validation.
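
For concreteness, the ridge-penalized Cox approach of [2] maximizes the penalized log partial likelihood, written here in its standard form with $R(t_i)$ the risk set at event time $t_i$:

$$\ell_{\text{pen}}(\beta) = \sum_{i:\,\delta_i = 1} \Big[ x_i^\top \beta - \log \sum_{j \in R(t_i)} \exp(x_j^\top \beta) \Big] - \lambda \|\beta\|_2^2,$$

with $\lambda$ chosen by cross-validation; the reduced-rank and bilinear extensions discussed in the talk relax the assumption that $x^\top\beta$ enters linearly and with a time-constant effect.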

References
[1] Bovelstad HM, Nygard S, Storvold HL, et al. Predicting survival from microarray data - a comparative study. Bioinformatics 23(16): 2080-2087, 2007.
[2] van Houwelingen HC, Bruinsma T, Hart AAM, et al. Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine 25(18): 3201-3216, 2006.
[3] Perperoglou A, le Cessie S, van Houwelingen HC. Reduced-rank hazard regression for modeling non-proportional hazards. Statistics in Medicine 25(16): 2831-2845, 2006.

SCHW03 24th June 2008
10:00 to 11:00
Model selection and estimation with multiple reproducing kernel Hilbert spaces

In this talk, we consider the problem of learning a target function that belongs to the linear span of a large number of reproducing kernel Hilbert spaces. Such problems arise naturally in many practical situations, with ANOVA models, additive models and multiple kernel learning as the best known and most important examples. We investigate approaches based on l1-type complexity regularization and study their theoretical properties from both the variable selection and the estimation perspective. We establish several probabilistic inequalities providing bounds on the excess risk and the L2-error that depend on the sparsity of the problem.
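
Schematically, the l1-type complexity regularization studied here takes the form (a generic formulation for illustration, not necessarily the exact estimator of the talk):

$$\hat f = \arg\min_{f = \sum_{j=1}^{M} f_j,\; f_j \in \mathcal{H}_j} \Big\{ \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, f(x_i)\big) + \lambda \sum_{j=1}^{M} \|f_j\|_{\mathcal{H}_j} \Big\},$$

where the sum of RKHS norms plays the role of an $\ell_1$ penalty over the component spaces and so drives whole components $f_j$ to zero.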

(Part of the talk is based on joint work with Vladimir Koltchinskii.)

SCHW03 24th June 2008
11:30 to 12:30
A Tsybakov Sparsity oracle inequalities
The quality of solving several statistical problems, such as adaptive nonparametric estimation, aggregation of estimators, estimation under the sparsity scenario and weak learning, can be assessed in terms of sparsity oracle inequalities (SOI) for the prediction risk. One of the challenges is to build estimators that attain the sharpest SOI under minimal assumptions on the dictionary. Methods of sparse estimation are mainly of two types. Some of them, like the BIC, enjoy nice theoretical properties in terms of SOI without any assumption on the dictionary but are computationally infeasible starting from relatively modest dimensions p. Others, like the Lasso or the Dantzig selector, can easily be realized for very large p, but their theoretical performance is conditioned by severe restrictions on the dictionary. We will focus on Sparse Exponential Weighting, a new method of sparse recovery realizing a compromise between theoretical properties and computational efficiency. The theoretical performance of the method in terms of SOI is comparable with that of the BIC, and no assumption on the dictionary is required. At the same time, the method is computationally feasible for relatively large dimensions p. It is constructed using exponential weighting with suitably chosen priors, and its analysis is based on PAC-Bayesian ideas in statistical learning.
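
Schematically, and only as a generic illustration of exponential weighting rather than a statement of the exact procedure, the aggregate takes the form

$$\hat f_n = \int_{\Theta} f_\theta \,\hat\pi_n(d\theta), \qquad \hat\pi_n(d\theta) \propto \exp\{-n\, r_n(\theta)/\beta\}\, \pi(d\theta),$$

where $r_n(\theta)$ is the empirical risk of $f_\theta$, $\beta > 0$ is a temperature parameter, and $\pi$ is a prior chosen to favour sparse $\theta$.
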
SCHW03 24th June 2008
14:00 to 14:20
The exchangeable graph model for statistical network analysis

Observations consisting of measurements on pairs of objects (or conditions) arise in a number of settings: in the biological sciences (www.yeastgenome.org), in collections of scientific publications (www.jstor.org) and other hyper-linked resources (www.wikipedia.org), and in social networks (www.linkedin.com). Analyses of such data typically aim at identifying structure among the units of interest, in a low dimensional space, to support the generation of substantive hypotheses, to partially automate semantic categorization, to facilitate browsing, and, more generally, to simplify complex data into useful patterns.

In this talk we introduce the exchangeable graph model and show its utility: 1. as a quantitative tool for exploring static/dynamic networks; 2. as a new paradigm for theoretical analyses of graph connectivity. Within this modeling context, we discuss alternative specifications and extensions that address fundamental issues in data analysis of complex interacting systems: bridging global and local phenomena, data integration, dynamics, and scalable inference.

SCHW03 24th June 2008
14:20 to 14:40
M West Data, models, inference and computation for dynamic cellular networks in systems biology

Advances in bioengineering technologies are generating the ability to measure increasingly high-resolution, dynamic data on complex cellular networks at multiple biological and temporal scales. Single-cell molecular studies, in which data are generated on the expression levels of a small number of proteins within individual cells over time using time-lapse fluorescent microscopy, are one critical emerging area. Single-cell experiments have the potential to play a central role both in mechanistic studies of natural biological systems and in synthetic biology -- the latter involving the engineering of small cellular networks with well-defined function, so providing opportunities for controlled experimentation and bionetwork design. There is a substantial lag, however, in the ability to integrate, understand and utilize data generated from single-cell fluorescent microscopy studies. I will highlight aspects of this area from the perspective of our work on single cell studies in synthetic bacterial systems that emulate key aspects of mammalian gene networks central to all human cancers. I will touch on:

(a) DATA: Raw data come as movies of colonies of cells developing through time, with a need for imaging methods to estimate cell-specific levels of fluorescence measuring mRNA levels of one or several tagged genes within each cell. This is complicated by the progression of cells through multiple cell divisions, which raises questions of tracking the lineages of individual cells over time.

(b) MODELS: In the context of our synthetic gene networks engineered into bacterial cells, we have developed discrete-time statistical dynamic models inspired by basic biochemical network modelling of the stochastic regulatory gene network. These models allow the incorporation of multiple components of noise "intrinsic" to biological networks, as well as approximation and measurement errors, and provide the opportunity to formally evaluate the capacity of single cell data to inform on biochemical parameters and "recover" network structure in the presence of contaminating noise.

(c) INFERENCE & COMPUTATION: Our approaches to model fitting have developed Bayesian methods for inference in non-linear time series. This involves MCMC methods that impute parameter values coupled with novel, effective Metropolis methods for what can be very high-dimensional latent states representing the unobserved levels of mRNA or proteins on nodes in the network as well as contributions from "missing" nodes.

This work is collaborative with Jarad Niemi and Quanli Wang (Statistical Science at Duke), and Lingchong You and Chee-Meng Tan (Bioengineering at Duke).

SCHW03 24th June 2008
14:40 to 15:00
Statistical network analysis and inference: methods and applications

Exploring the statistical properties and hidden characteristics of network entities, and the stochastic processes behind the temporal evolution of network topologies, is essential for computational knowledge discovery and prediction based on network data from biology, the social sciences and various other fields. In this talk, I first discuss a hierarchical Bayesian framework that combines the mixed membership model and the stochastic blockmodel for inferring latent multi-facet roles of nodes in networks, and for estimating stochastic relationships (i.e., cooperativeness or antagonism) between roles. I then discuss a new formalism for modeling network evolution over time based on temporal exponential random graph models (TERGMs), and an MCMC algorithm for posterior inference of the latent time-specific networks. The proposed methodology makes it possible to reverse-engineer the latent sequence of temporally rewiring networks given longitudinal measurements of node attributes, such as intensities of gene expression or social metrics of actors, even when only a single snapshot of such measurements, resulting from each (time-specific) network, is available.
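
In its generic form (written here only for orientation), a temporal exponential random graph model specifies the transition between consecutive network snapshots as

$$P\big(Y^{t} \mid Y^{t-1}; \theta\big) = \frac{1}{Z(\theta, Y^{t-1})} \exp\big\{\theta^\top \Phi\big(Y^{t}, Y^{t-1}\big)\big\},$$

where $\Phi$ collects temporal network statistics (e.g. edge stability, reciprocity, transitivity) and $Z$ is the normalizing constant; posterior inference over the latent sequence $Y^{1},\dots,Y^{T}$ is what the MCMC algorithm mentioned above targets.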

Joint with Edo Airoldi, Dave Blei, Steve Fienberg, Fan Guo and Steve Hanneke

SCHW03 24th June 2008
15:30 to 16:30
High dimensional inference in bioinformatics and genomics

Bioinformatics came to the scene when biology started to automate its experiments. Although this would have led to “large n, small p” situations in other sciences, the complex nature of biology meant that it soon started to focus on a great many different variables, resulting in the now well-known “small n, large p” situations. One such case is the inference of regulatory networks: the number of possible networks is exponential in the number of nodes, whereas the available data are typically just a fraction thereof. We will present a penalized inference method that deals with such problems and draws on experience with hypothesis testing. It has similarities with Approximate Bayesian Computation and seems to lead to exact inference in a few specific cases.

SCHW03 24th June 2008
16:30 to 17:30
Liquid association for large scale gene expression and network studies

The fast-growing public repertoire of microarray gene expression databases provides individual investigators with unprecedented opportunities to study transcriptional activities for genes of their research interest at no additional cost. Methods such as hierarchical clustering, principal component analysis, gene networks and others have been widely used. They offer biologists valuable genome-wide portraits of how genes are co-regulated in groups. Such approaches have a limitation because it often turns out that the majority of genes do not fall into the detected gene clusters. If one has a gene of primary interest in mind and cannot find any nearby clusters, what additional analysis can be conducted? In this talk, I will show how to address this issue via the statistical notion of liquid association. An online biodata mining system has been developed in my lab to aid biologists in distilling information from a web of aggregated genomic knowledge bases and data sources at multiple levels, including gene ontology, protein complexes, genetic markers and drug sensitivity. The computational issues of liquid association and the challenges faced in the context of high p, low n problems will be addressed.

SCHW03 25th June 2008
09:00 to 10:00
R Tibshirani The Lasso: some novel algorithms and applications

I will discuss some procedures for modelling high-dimensional data, based on L1 (lasso) -style penalties. I will describe pathwise coordinate descent algorithms for the lasso, which are remarkably fast and facilitate application of the methods to very large datasets for the first time. I will then give examples of new applications of the methods to microarray classification, undirected graphical models for cell pathways, and the fused lasso for signal detection, including comparative genomic hybridization.
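
As a bare-bones illustration of the coordinate descent idea (not the speaker's implementation, and without the pathwise warm starts, active-set tricks and standardization that make the real algorithms fast), each coordinate of the lasso solution is updated by soft-thresholding a univariate fit to the partial residual:

import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    # Minimizes (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    # Assumes no column of X is identically zero.
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                                        # current residual
    col_norm = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ r / n + col_norm[j] * b[j]   # fit to partial residual
            b_new = soft_threshold(rho, lam) / col_norm[j]
            r += X[:, j] * (b[j] - b_new)                # update residual in place
            b[j] = b_new
    return b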

SCHW03 25th June 2008
10:00 to 11:00
Sparsity in machine learning: approaches and analyses
SCHW03 25th June 2008
11:30 to 12:30
A Owen Transposably invariant sample reuse: the pigeonhole bootstrap and blockwise cross-validation

Sample reuse methods like the bootstrap and cross-validation are widely used in statistics and machine learning. They provide measures of accuracy with some face value validity that is not dependent on strong model assumptions.

These methods depend on repeating or omitting cases, while keeping all the variables in those cases. But for many data sets it is not obvious whether the rows are cases and the columns are variables, or vice versa. For example, with movie ratings organized by movie and customer, both movie and customer IDs can be thought of as variables.

This talk looks at bootstrap and cross-validation methods that treat rows and columns of the matrix symmetrically. We get the same answer on X as on X'. McCullagh has proved that no exact bootstrap exists in a certain framework of this type (crossed random effects). We show that a method based on resampling both rows and columns of the data matrix tracks the true error, for some simple statistics applied to large data matrices.
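
A minimal sketch of the row-and-column resampling idea (illustrative only; the talk's theoretical analysis concerns specific statistics and crossed random effects):

import numpy as np

def pigeonhole_bootstrap(X, stat, n_boot=200, seed=0):
    # Resample rows and columns with replacement, recompute a scalar-valued
    # statistic on the resampled matrix, and return the bootstrap replicates.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    reps = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)
        cols = rng.integers(0, p, size=p)
        reps[b] = stat(X[np.ix_(rows, cols)])
    return reps

The spread of the replicates then serves as an estimate of the sampling variability of stat(X), treating rows and columns symmetrically.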

Similarly we look at a method of cross-validation that leaves out blocks of the data matrix, generalizing a proposal due to Gabriel that is used in the crop science literature. We find empirically that this approach provides a good way to choose the number of terms in a truncated SVD model or a non-negative matrix factorization. We also apply some recent results in random matrix theory to the truncated SVD case.

Related Links

SCHW03 26th June 2008
09:00 to 10:00
J-L Wang Covariate adjusted functional principal component analysis for longitudinal data

Classical multivariate principal component analysis has been extended to functional data and termed functional principal component analysis (FPCA). Much progress has been made, but most existing FPCA approaches do not accommodate covariate information; the goal of this talk is to develop alternative approaches to incorporate covariate information in FPCA, especially for irregular or sparse functional data. Two approaches are studied: the first incorporates covariate effects only through the mean response function, while the second adjusts the covariate effects for both the mean and covariance functions of the response. Both new approaches can accommodate measurement errors and allow data to be sampled at regular or irregular time grids. Asymptotic results are developed and numerical support provided through simulations and a data example. A comparison of the two approaches will also be discussed.

SCHW03 26th June 2008
10:00 to 11:00
Penalized empirical risk minimization and sparse recovery problems

A number of problems in regression and classification can be stated as penalized empirical risk minimization over a linear span or a convex hull of a given dictionary with convex loss and convex complexity penalty, such as, for instance, $\ell_1$-norm. We will discuss several oracle inequalities showing how the error of the solution of such problems depends on the "sparsity" of the problem and the "geometry" of the dictionary.
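
In the generic setting referred to above, for a dictionary $h_1,\dots,h_N$ one considers (a schematic formulation for illustration):

$$\hat\lambda = \arg\min_{\lambda \in \mathbb{R}^N} \Big\{ \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, f_\lambda(x_i)\big) + \varepsilon \|\lambda\|_1 \Big\}, \qquad f_\lambda = \sum_{j=1}^{N} \lambda_j h_j,$$

and the oracle inequalities bound the excess risk of $f_{\hat\lambda}$ in terms of how many dictionary elements are needed to approximate the target and of geometric characteristics of the dictionary.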

SCHW03 26th June 2008
11:30 to 12:30
The Nystrom extension and spectral methods in learning: low-rank approximation of quadratic forms and products

Spectral methods are of fundamental importance in statistics and machine learning, as they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. Motivated by such applications, we present here two new algorithms for the approximation of positive semi-definite kernels, together with error bounds that improve upon known results. The first of these—based on sampling—leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach—based on sorting—provides for the selection of a partition in a deterministic way. After detailing their numerical implementation and verifying performance via simulation results for representative problems in statistical data analysis, we conclude with an extension of these results to the sparse representation of linear operators and the efficient approximation of matrix products.
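
For orientation, the classical Nystrom extension, which the new sampling- and sorting-based schemes refine, approximates a positive semi-definite kernel matrix from a subset of its columns (a sketch of the standard construction, not of the talk's algorithms):

import numpy as np

def nystrom_approximation(K, idx):
    # Approximate a PSD kernel matrix K by C W^+ C^T, where C holds the
    # columns indexed by idx and W is the corresponding principal submatrix.
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

Here idx might, for instance, be a uniformly sampled subset of indices; how the subset (or partition) is chosen is precisely where the two new algorithms differ.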

SCHW03 26th June 2008
14:00 to 14:20
Limiting theorems for large dimensional sample means, sample covariance matrices and Hotelling's T2 statistics

It is well known that sample means and sample covariance matrices are independent if the samples are i.i.d. from the Gaussian distribution. In this talk, by investigating random quadratic forms involving sample means and sample covariance matrices, we suggest the conjecture that sample means and sample covariance matrices under general distribution functions are asymptotically independent in the large dimensional case, when the dimension of the vectors and the sample size both go to infinity with their ratio tending to a positive constant. As a byproduct, the central limit theorem for the Hotelling $T^2$ statistic in the large dimensional case is established.
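
For reference, with sample mean $\bar X$ and sample covariance matrix $S$ from a sample of size $n$, the Hotelling statistic for testing $\mu = \mu_0$ is

$$T^2 = n\,(\bar X - \mu_0)^\top S^{-1} (\bar X - \mu_0),$$

and it is the joint large-$(n,p)$ behaviour of $\bar X$ and $S$ entering this quadratic form that the asymptotic independence conjecture addresses.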

SCHW03 26th June 2008
14:20 to 14:40
JQ Shi Generalised Gaussian process functional regression model

In this talk, I will discuss a functional regression problem with a non-Gaussian functional (longitudinal) response and functional predictors. This type of problem includes, for example, binomial and Poisson response data, occurring in many biomedical and engineering experiments. We propose a generalised Gaussian process functional regression model for such regression situations. We suppose that there exists an underlying latent process between the inputs and the response. The latent process is defined by a Gaussian process functional regression model, which is connected with the stepwise response data by means of a link function.

SCHW03 26th June 2008
14:40 to 15:00
Estimation of large volatility matrix for high-frequency financial data

Statistical theory for estimating large covariance matrices shows that, even for noiseless synchronized high-frequency financial data, the existing realized-volatility-based estimators of the integrated volatility matrix of p assets are inconsistent for large p (the number of assets) and large n (the sample size of the high-frequency data). This paper proposes new types of estimators of the integrated volatility matrix for noisy non-synchronized high-frequency data. We show that when both n and p go to infinity with p/n approaching a constant, the proposed estimators are consistent with good convergence rates. Our simulations demonstrate the excellent performance of the proposed estimators under complex stochastic volatility matrices. We have applied the methods to high-frequency data on over 600 stocks.

SCHW03 26th June 2008
15:30 to 16:30
Graph decomposition for community identification and covariance constraints

An application in large databases is to find well-connected clusters of nodes in an undirected graph where a link represents interaction between objects: for example, finding tight-knit communities in social networks, identifying related product clusters in collaborative filtering, or finding genes which collaborate in different biological functions. Unlike graph partitioning, in this scenario an object may belong to more than one community -- for example, a person might belong to more than one group of friends, or a gene may be active in more than one gene network. I'll discuss an approach to identifying such overlapping communities based on extending the incidence matrix decomposition of a graph to a clique decomposition. Clusters are then identified by approximate variational (mean-field) inference in a related probabilistic model. The resulting decomposition has the side-effect of enabling a parameterisation of positive definite matrices under zero-constraints on entries in the matrix. Provided the graph corresponding to the constraints is decomposable, all such matrices are reachable by this parameterisation. In the non-decomposable case, we show how the method forms an approximation of the space and relate it to more standard latent variable parameterisations of zero-constrained covariances.

SCHW03 26th June 2008
16:30 to 17:30
Permutation-invariant covariance regularisation in high dimensions

Estimation of covariance matrices has a number of applications, including principal component analysis, classification by discriminant analysis, and inferring independence and conditional independence between variables. The sample covariance matrix, however, has many undesirable features in high dimensions unless regularized. Recent research has mostly focused on regularization in situations where the variables have a natural ordering. When no such ordering exists, regularization must be performed in a way that is invariant under variable permutations. This talk will discuss several new sparse covariance estimators that are invariant to variable permutations. We obtain convergence rates that make explicit the trade-offs between the dimension, the sample size, and the sparsity of the true model, and illustrate the methods on simulations and real data. We will also discuss a method for finding a "good" ordering of the variables when it is not provided, based on Isomap, a manifold projection algorithm.
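
One simple example of a permutation-invariant regularizer, shown only to fix ideas and not necessarily among the new estimators of the talk, is entrywise hard thresholding of the sample covariance matrix:

import numpy as np

def threshold_covariance(X, t):
    # Hard-threshold the off-diagonal entries of the sample covariance matrix;
    # the resulting estimator does not depend on how the variables are ordered.
    S = np.cov(X, rowvar=False)
    T = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))    # keep the variances untouched
    return T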

The talk includes joint work with Adam Rothman, Amy Wagaman, Ji Zhu (University of Michigan) and Peter Bickel (UC Berkeley).

SCHW03 27th June 2008
09:00 to 09:20
Optimal prediction from relevant components

In Helland (1990) the partial least squares regression model was formulated in terms of an algorithm on the parameters of the model. A version of this parametric algorithm has recently been used by several authors in connection with determining the central subspace and the central mean subspace of sufficient model reduction, as a method in which matrix inversion is avoided. A crucial feature of the parametric PLS model is that the algorithm stops after m steps, where m is the number of relevant components. The corresponding sample algorithm will not usually stop after m steps, implying that the ordinary PLS estimates fall outside the parameter space and thus cannot be maximally efficient.

We approach this problem using group theory. The X-covariance matrix is endowed with a rotation group, and in addition the regression coefficients upon the X-principal components are endowed with scale groups. This gives a transitive group on each subspace corresponding to m relevant components; more precisely, these subspaces give the orbits of the group. The ordinary PLS predictor is equivariant under this group. It is a known fact that in such situations the best equivariant estimator is equal to the Bayes estimator when the prior is taken as the invariant measure of the group. This Bayes estimator is found by a MCMC method, and is verified to be better than the ordinary PLS predictor.

SCHW03 27th June 2008
09:20 to 09:40
Dimension selection with independent component analysis and its application to prediction

We consider the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. We review current methods, and propose a dimension selector based on Independent Component Analysis which finds the most non-Gaussian lower-dimensional directions in the data. A criterion for choosing the optimal dimension is based on bias-adjusted skewness and kurtosis. We show how this dimension selector can be applied in supervised learning with independent components, both in a regression and classification framework.
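
A crude stand-in for this procedure (the bias adjustment and the exact criterion of the talk are omitted; FastICA and the plain skewness/kurtosis measures below are illustrative substitutes) is to extract independent components and rank them by non-Gaussianity:

import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.decomposition import FastICA

def rank_components_by_nongaussianity(X, n_components):
    # Extract independent components and order them by a simple
    # non-Gaussianity score based on skewness and excess kurtosis.
    S = FastICA(n_components=n_components, random_state=0).fit_transform(X)
    score = np.abs(skew(S, axis=0)) + np.abs(kurtosis(S, axis=0))
    order = np.argsort(score)[::-1]
    return order, score[order]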

SCHW03 27th June 2008
09:40 to 10:00
L Li Model free variable selection via sufficient dimension reduction

Sufficient dimension reduction (SDR) has proven effective at transforming high dimensional problems into low dimensional projections, while losing no regression information and pre-specifying no parametric model during the dimension reduction phase. However, existing SDR methods suffer from the fact that each dimension reduction component is a linear combination of all the original predictors, and thus cannot perform variable selection. In this talk, we propose a regularized SDR estimation strategy which is capable of simultaneous dimension reduction and variable selection. We demonstrate that the new estimator achieves consistency in variable selection without requiring any traditional model, while retaining root-n estimation consistency of the dimension reduction basis. Both simulation studies and real data analyses are reported.

SCHW03 27th June 2008
10:00 to 11:00
Estimation of nonlinear functionals: recent results and open problems

We present a theory of point and interval estimation for nonlinear functionals in parametric, semi-, and non-parametric models based on higher order influence functions. The theory reproduces many previous results, produces new non-root-n results, and opens up the ability to perform optimal non-root-n inference in complex high dimensional models. We present novel rate-optimal point and interval estimators for various functionals of central importance to biostatistics in settings in which estimation at the expected root-n rate is not possible, owing to the curse of dimensionality. We also show that our higher order influence functions have a multi-robustness property that extends the double robustness property of first order influence functions. Open questions will be discussed.

SCHW03 27th June 2008
11:30 to 12:30
Applications of approximate inference and experimental design for sparse (generalised) linear models

Sparsity, or more generally sub-Gaussianity, is a fundamental regularization principle for high-dimensional statistics. A recent surge of activity has clarified the behaviour of efficient sparse estimators in the worst case, but much less is known about practically efficient approximations to Bayesian inference, which is required for higher-level tasks such as experimental design.

We present an efficient framework for Bayesian inference in generalized linear models with sparsity priors, based on the expectation propagation algorithm, a deterministic variational approximation. We highlight some applications where this framework produces promising results. We hope to convey the practical relevance of approximate inference methods, which go substantially beyond point estimation, yet whose theoretical properties and algorithmic scalability remain insufficiently understood.
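
As a schematic of the setting (generic notation, not necessarily that of the talk), a sparse generalized linear model places a sparsity-inducing prior, e.g. a Laplace prior, on the coefficients,

$$p(\beta \mid y) \;\propto\; \prod_{i=1}^{n} p\big(y_i \mid x_i^\top \beta\big)\, \prod_{j=1}^{p} \tfrac{\tau}{2}\, e^{-\tau |\beta_j|},$$

and expectation propagation approximates this intractable posterior by iteratively replacing each non-Gaussian factor with a Gaussian "site" term, yielding a Gaussian approximation whose mean and covariance can then be used for prediction and for scoring candidate designs in experimental design.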

SCHW03 27th June 2008
14:00 to 15:00
Statistics in astronomy: the Taiwanese-American occultation survey

More than a thousand small planetary bodies with radii >100 km have recently been detected beyond Neptune using large telescopes. The purpose of the TAOS project is to measure directly the number of these Kuiper Belt Objects (KBOs) down to the typical size of cometary nuclei (a few km). When a KBO moves in between the Earth and a distant star, it blocks the starlight momentarily, for about a quarter of a second, so a telescope monitoring the starlight will see it blinking. Three small (20 inch) dedicated robotic telescopes equipped with 2,048 x 2,048 CCD cameras are operated in coincidence so that the sequence and timing of the three separate blinks can be used to distinguish real events from false alarms. A fourth telescope will be added soon. TAOS will increase our knowledge of the Kuiper Belt, the home of most short-period comets that return to the inner solar system every few years. This knowledge will help us to understand the formation and evolution of comets in the early solar system as well as to estimate the flux of their impacts on our home planet.

In this talk I will describe some of the statistical challenges that arise when hundreds or thousands of stars are simultaneously monitored every quarter of a second, every night of the year on which observation is possible, with the aim of detecting a few events. TAOS will produce a databank of the order of 10 terabytes per year, which is small by the standards of recent and future astronomical surveys. My intent in this talk is not to provide definitive methods of analysis but, rather, I hope that this concrete example of high dimensional non-Gaussian data informs the discussion of future directions in high dimensional data analysis to which this meeting is devoted.

Related Links
