Videos and presentation materials from other INI events are also available.
Event  When  Speaker  Title  Presentation Material 

SCHW01 
7th January 2008 10:00 to 11:00 
Breakdown point of model selection when the number of variables exceeds the number of observations  
SCHW01 
7th January 2008 11:30 to 12:30 
The deterministic lasso We study high-dimensional generalized linear models and risk minimization using the Lasso. The risk is taken under a random probability measure P' and the target is an overall minimizer of the risk under some other non-random probability measure P. We restrict ourselves to a set S where P' and P are close to each other, and present an oracle inequality under a so-called compatibility condition between the L_2 norm and the l_1 norm. 

SCHW01 
7th January 2008 14:00 to 15:00 
Methods for visualizing high-dimensional data In this presentation, we review some fundamentals of visualization and then proceed to describe methods and combinations of methods useful for visualizing high-dimensional data. Some methods include parallel coordinates, smooth interpolations of parallel coordinates, grand tours including wrapping tours, fractal tours, pseudo grand tours, and pixel tours. 

SCHW01 
7th January 2008 15:30 to 16:30 
A Young 
Bootstrap and parametric inference: successes and challenges We review parametric frequentist inference as it has developed over the last 25 years or so. Two main strands have emerged: analytic procedures based on small-sample asymptotics, and simulation (bootstrap) approaches. We argue that the latter yield, with appropriate handling of nuisance parameters, a simple and flexible methodology, yet one which nevertheless retains the finer inferential components of parametric theory in an automatic fashion. The performance of bootstrap methods, even in problems with high-dimensional parameters but small data sample sizes, points in favour of their being the method of choice in complex settings, such as those motivating this programme.


SCHW01 
8th January 2008 09:00 to 10:00 
Practical and information-theoretic limitations in high-dimensional inference This talk considers questions of two types concerning high-dimensional inference. First, given a practical (polynomial-time) algorithm, what are the limits of its performance? Second, how do such practical limitations compare to information-theoretic bounds, which apply to the performance of any algorithm regardless of computational complexity? We analyze these issues in high-dimensional versions of two canonical inference problems: (a) support recovery in sparse regression; and (b) the sparse PCA or eigenvector problem. For the sparse regression problem, we describe a sharp threshold on the sample size n that controls success/failure of \ell_1-constrained quadratic programming (the Lasso), as a function of the problem size p and the sparsity index k (number of non-zero entries). Using information-theoretic methods, we prove that the Lasso is order-optimal for sublinear sparsity (vanishing k/p), but suboptimal for linear sparsity (k/p bounded away from zero). For the sparse eigenvector problem, we analyze a semidefinite programming relaxation due to d'Aspremont et al., and establish a similar failure/success transition for triplets (n,p,k) tending to infinity. Based on joint work with Arash Amini, John Lafferty, and Pradeep Ravikumar. 

SCHW01 
8th January 2008 10:00 to 11:00 
Some thoughts on nonparametric classification: nearest neighbours, bagging and max likelihood estimation of shape-constrained densities The $k$-nearest neighbour rule is arguably the simplest and most intuitively appealing nonparametric classifier. We will discuss recent results on the optimal choice of $k$ in situations where the underlying populations have densities with a certain smoothness in $\mathbb{R}^d$. Extensions to the bagged nearest neighbour classifier, which can be regarded as a weighted $k$-nearest neighbour classifier, are also possible, and yield a somewhat surprising comparison with the unweighted case. Another possibility for nonparametric classification is based on estimating the underlying densities explicitly. An attractive alternative to kernel methods is based on the maximum likelihood estimator, which can be shown to exist if the densities satisfy certain shape constraints, such as log-concavity. We will also discuss an algorithm for computing the estimator in this case, which results in a classifier that is fully automatic yet still nonparametric.
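As a point of reference for the abstract above, the basic $k$-nearest neighbour rule can be sketched in a few lines. This is a generic illustration (the function and variable names are our own), not the weighted or bagged variants analysed in the talk.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Classify a single point x by majority vote among its k nearest
    training points under Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Two well-separated classes in R^2
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(X, y, np.array([0.05, 0.1])))  # → 0
print(knn_classify(X, y, np.array([5.05, 5.0])))  # → 1
```

The bagged classifier discussed in the talk corresponds to averaging this rule over bootstrap resamples of the training set, which effectively induces data-dependent weights on the neighbours.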


SCHW01 
8th January 2008 11:30 to 12:30 
RD Cook 
Model-based sufficient dimension reduction for regression Dimension reduction in regression, represented primarily by principal components, is ubiquitous in the applied sciences. This is an old idea that has moved to a position of prominence in recent years because technological advances now allow scientists to routinely formulate regressions in which the number p of predictors is considerably larger than in the past. Although "large" p regressions are perhaps mainly responsible for renewed interest, dimension reduction methodology can be useful regardless of the size of p. Starting with a little history and a definition of "sufficient reductions", we will consider a variety of models for dimension reduction in regression. The models start from one in which maximum likelihood estimation produces principal components, step along a few incremental expansions, and end with forms that have the potential to improve on some standard methodology. This development provides remedies for two concerns that have dogged principal components in regression: principal components are typically computed from the predictors alone and do not make apparent use of the response, and they are not equivariant under full rank linear transformation of the predictors.

SCHW01 
8th January 2008 14:00 to 15:00 
Kernel-based contrast functions for sufficient dimension reduction We present a new methodology for sufficient dimension reduction (the problem of finding a subspace $S$ such that the projection of the covariate vector $X$ onto $S$ captures the statistical dependency of the response $Y$ on $X$). Our methodology derives directly from a formulation of sufficient dimension reduction in terms of the conditional independence of the covariate $X$ from the response $Y$, given the projection of $X$ on the central subspace (cf. Li, 1991; Cook, 1998). We show that this conditional independence assertion can be characterized in terms of conditional covariance operators on reproducing kernel Hilbert spaces, and we show how this characterization leads to an M-estimator for the central subspace. The resulting estimator is shown to be consistent under weak conditions; in particular, we do not have to impose linearity or ellipticity conditions of the kinds that are generally invoked for SDR methods. We also present empirical results showing that the new methodology is competitive in practice.

SCHW01 
8th January 2008 15:30 to 16:30 
J Fan 
Challenge of dimensionality in model selection and classification Model selection and classification using high-dimensional features arise frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification remains poorly understood. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing, due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is paramount to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The connections with the sure independence screening (SIS) and iterative SIS (ISIS) of Fan and Lv (2007) in model selection will be elucidated and extended. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.


SCHW01 
8th January 2008 16:30 to 17:30 
P Bickel 
Regularised estimation of high-dimensional covariance matrices We review, with examples, various important parameters depending on the population covariance matrix, such as inverses and eigenstructures, and the uses they are put to. We give a brief discussion of well-known pathologies of the empirical covariance matrix in various applications when the data are high-dimensional, which imply inconsistency of "plug-in" estimates of the parameters mentioned. We introduce different notions of sparsity of such matrices and show how some of these are intimately related. We then review a number of methods taking advantage of such sparsity in the population matrices. In particular, we state results with various collaborators, particularly E. Levina, establishing rates of convergence of our estimates of parameters as above, as dimension and sample size tend to infinity, that are uniform over large classes of sparse population covariance matrices. We conclude with some simulations, a data analysis supporting the asymptotics, and a discussion of future directions.


SCHW01 
9th January 2008 09:00 to 10:00 
F Murtagh 
The ultrametric topology perspective on analysis of massive, very high dimensional data stores An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying the extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We also discuss application to very high frequency time series segmentation and modeling. Other applications will be described, in particular in the area of textual data mining. References [1] F. Murtagh, On ultrametricity, data coding, and computation, Journal of Classification, 21, 167-184, 2004. [2] F. Murtagh, G. Downs and P. Contreras, "Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding", SIAM Journal on Scientific Computing, in press, 2007. [3] F. Murtagh, The remarkable simplicity of very high dimensional data: application of model-based clustering, submitted, 2007. [4] F. Murtagh, Symmetry in data mining and analysis: a unifying view based on hierarchy, submitted, 2007.


SCHW01 
9th January 2008 10:00 to 11:00 
L Duembgen 
P-values for computer-intensive classifiers The first part of the talk presents p-values for classification in general. These are an interesting alternative to classifiers or posterior distributions of class labels. Their purpose is to quantify uncertainty when classifying a single observation, even if we don't have information on the prior distribution of class labels. After illustrating this concept with some examples and procedures, we focus on computational issues and discuss p-values involving regularization, in particular LASSO-type penalties, to cope with high-dimensional data. (Part of this talk is based on joint work with Axel Munk, Goettingen, and Bernd-Wolfgang Igl, Luebeck.)


SCHW01 
9th January 2008 11:30 to 12:30 
W Stuetzle 
Nonparametric cluster analysis: estimating the cluster tree of a density The general goal of clustering is to identify distinct groups in a collection of objects. To cast clustering as a statistical problem, we regard the feature vectors characterizing the objects as a sample from some unknown probability density. The premise of nonparametric clustering is that groups correspond to modes of this density. The cluster tree summarizes the connectivity structure of the level sets of a density; leaves of the tree correspond to modes of the density. I will define the cluster tree, present methods for estimating it, show examples, and discuss some open problems.
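To make the level-set picture above concrete, here is a minimal sketch (our own illustration, not the speaker's estimator): points whose estimated density exceeds a level are grouped into connected components, and the component counts across levels trace out the cluster tree.

```python
import numpy as np

def kde(points, x, h):
    """Gaussian kernel density estimate at x from a 1-D sample."""
    return np.mean(np.exp(-0.5 * ((points - x) / h) ** 2)) / (h * np.sqrt(2 * np.pi))

def level_set_clusters(points, level, h=0.5, radius=1.0):
    """Count connected components of the estimated level set {f >= level}:
    keep points with density >= level, connect pairs within `radius`,
    and count components via union-find."""
    kept = [p for p in points if kde(points, p, h) >= level]
    parent = list(range(len(kept)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if abs(kept[i] - kept[j]) <= radius:
                parent[find(j)] = find(i)
    return len({find(i) for i in range(len(kept))})

data = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])  # a density with two modes
print(level_set_clusters(data, level=0.05))       # → 2
```

Sweeping `level` from 0 upwards and recording how components appear and split recovers (an estimate of) the cluster tree; leaves appear at the levels where single modes remain.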

SCHW01 
9th January 2008 14:00 to 15:00 
M West 
Sparsity modelling in large-scale dynamic models for portfolio analysis I will discuss some of our recent work in dynamic modelling for multivariate time series that combines stochastic volatility and graphical modelling ideas. I will describe the modelling ideas and resulting matrix-variate, dynamic graphical models, and aspects of Bayesian methodology and computation for model fitting and structure search. Practical implications of the framework when applied to financial time series for predictive portfolio analysis will highlight some of the reasons for interest in sparsely structured, conditional independence models of volatility matrices. 

SCHW01 
9th January 2008 15:30 to 16:30 
Computationally tractable statistical estimation when there are more variables than observations We consider the fundamental problem of estimating the mean of a vector y = X beta + z, where X is an n by p design matrix in which one can have far more variables than observations, and z is a stochastic error term: the so-called `p > n' setup. When \beta is sparse, or more generally, when there is a sparse subset of covariates providing a close approximation to the unknown mean response, we ask whether or not it is possible to accurately estimate the mean using a computationally tractable algorithm. We show that in a surprisingly wide range of situations, the lasso happens to nearly select the best subset of variables. Quantitatively speaking, we prove that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error one would achieve with an oracle supplying perfect information about which variables should be included in the model and which should not. Interestingly, our results describe the average performance of the lasso; that is, the performance one can expect in an overwhelming majority of cases where X\beta is a sparse or nearly sparse superposition of variables, but not in all cases. Our results are sharp, non-asymptotic and widely applicable, since they simply require that pairs of predictor variables not be overly collinear. 
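The quadratic program in question, the lasso, can be solved by simple iterative soft-thresholding. The sketch below (our own illustration, not the algorithm analysed in the talk) sets up a p > n problem with a sparse coefficient vector and runs proximal gradient descent on the lasso objective.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise 0.5*||y - X b||^2 + lam*||b||_1 by iterative
    soft-thresholding (proximal gradient descent with step 1/L)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        g = X.T @ (X @ b - y)              # gradient of the smooth part
        z = b - g / L
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return b

rng = np.random.default_rng(0)
n, p, k = 50, 200, 3                       # p >> n, sparse truth
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:k] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(n)

b_hat = lasso_ista(X, y, lam=2.0)
print("nonzero coefficients:", np.sum(b_hat != 0))
```

Since the step size is 1/L with L the exact Lipschitz constant, each iteration cannot increase the objective, so the fit starting from the zero vector is guaranteed to do at least as well as the zero vector itself.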

SCHW01 
9th January 2008 16:30 to 17:30 
Learning in high dimensions, noise, sparsity and treelets In recent years there is a growing practical need to perform learning (classification, regression, etc.) in high-dimensional settings where p >> n. Consequently, instead of the standard limit $n\to\infty$, learning algorithms are typically analyzed in the joint limit $p,n\to\infty$. In this talk we present a different approach that keeps $p,n$ fixed, but considers noise as a small parameter. The resulting perturbation analysis reveals the importance of a robust low-dimensional representation of the noise-free signals, the possible failure of simple variable selection methods, and the key role of sparsity for the success of learning in high dimensions. We also discuss sparsity in an a priori unknown basis and a possible data-driven adaptive construction of such a basis, called treelets. We present a few applications of our analysis, mainly to errors-in-variables linear regression problems, principal component analysis, and rank determination. 

SCHW01 
10th January 2008 09:00 to 10:00 
Estimating a response parameter in missing data models with high-dimensional covariates We discuss a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on estimating equations that are $U$-statistics in the observations. The $U$-statistics are based on higher-order influence functions that extend ordinary linear influence functions of the parameter of interest, and represent higher derivatives of this parameter. For parameters for which the matching cannot be perfect, the method leads to a bias-variance trade-off, and results in estimators that converge at a slower than root-n rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at a root-n rate. 

SCHW01 
10th January 2008 10:00 to 11:00 
Persistence: alternative proofs of some results of Greenshtein and Ritov  
SCHW01 
10th January 2008 11:30 to 12:30 
Looking at models in high-dimensional data spaces What do the fishing net models of self-organizing maps look like in the data space? How do the estimated mean vectors and variance-covariance ellipses from model-based clustering fit the clusters? How does small n, large p affect the variability in the estimates of the separating hyperplane from support vector machine models? These are a few of the things that we may discuss in this talk. The goal is to calibrate participants' eyes to viewing high-dimensional spaces and stimulate thought about what types of plots might accompany high-dimensional statistical analysis. 

SCHW01 
10th January 2008 14:00 to 15:00 
The surprising structure of Gaussian point clouds and its implications for signal processing We will explore connections between the structure of high-dimensional convex polytopes and information acquisition for compressible signals. A classical result in the field of convex polytopes is that if N points are distributed Gaussian i.i.d. at random in dimension n << N, then only order (log N)^n of the points are vertices of their convex hull. Recent results show that provided n grows slowly with N, then with high probability all of the points are vertices of the convex hull. More surprisingly, a rich "neighborliness" structure emerges in the faces of the convex hull. One implication of this phenomenon is that an N-vector with k non-zeros can be recovered efficiently from only n random projections, with n = 2e k log(N/n). Alternatively, the best k-term approximation of a signal in any basis can be recovered from 2e k log(N/n) non-adaptive measurements, which is within a log factor of the optimal rate achievable for adaptive sampling. Additional implications for randomized error-correcting codes will be presented.
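The recovery statement above can be tried numerically: minimising the l1 norm subject to the n random projections is a linear program. The sketch below is our own illustration (dimensions chosen only so that n comfortably exceeds 2e k log(N/n)), using SciPy's `linprog` on the standard split-variable formulation x = u - v with u, v >= 0.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Recover a sparse x from b = A x by minimising ||x||_1 subject to
    A x = b, written as an LP in the split variables x = u - v, u, v >= 0."""
    n, N = A.shape
    c = np.ones(2 * N)                      # sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * N))
    u, v = res.x[:N], res.x[N:]
    return u - v

rng = np.random.default_rng(1)
N, k, n = 40, 2, 16                         # n above the ~2e k log(N/n) threshold
x_true = np.zeros(N); x_true[[3, 17]] = [2.0, -1.0]
A = rng.standard_normal((n, N))

x_hat = basis_pursuit(A, A @ x_true)        # typically recovers x_true in this regime
```

Because the true signal is itself feasible for the LP, the solution's l1 norm can never exceed that of the true signal; exact recovery is the typical, though not guaranteed, outcome at these dimensions.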


SCHW01 
10th January 2008 15:30 to 16:30 
Finding low-dimensional structure in high-dimensional data In high-dimensional data analysis, one is often faced with the problem that real data are noisy and in many cases given in coordinates that are not informative for understanding the data structure itself, or for performing later tasks such as clustering, classification and regression. The combination of noise and high dimensions (>100-1000) presents challenges for data analysis and calls for efficient dimensionality reduction tools that take the inherent geometry of natural data into account. In this talk, I will first describe treelets, an adaptive multiscale basis inspired by wavelets and hierarchical trees. In the second half of my talk, I will describe diffusion maps, a general framework for dimensionality reduction, data set parameterization and clustering that combines ideas from eigenmaps, spectral graph theory and harmonic analysis. Our construction is based on a Markov random walk on the data, and allows one to define a system of coordinates that is robust to noise, and that reflects the intrinsic geometry or connectivity of the data points in a diffusion process. I will outline where we stand and what problems still remain. (Part of this work is joint with R.R. Coifman, S. Lafon, B. Nadler and L. Wasserman) 
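A minimal version of the diffusion-map construction described above can be sketched as follows (our own illustration, not the authors' code): form a Gaussian affinity matrix, normalise it into a Markov transition matrix, and use the leading non-trivial eigenvectors as coordinates.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2, t=1):
    """First non-trivial diffusion coordinates of the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)                   # Gaussian affinities
    D = K.sum(axis=1)
    # The random walk P = D^{-1} K shares eigenvalues with its symmetric
    # conjugate S = D^{-1/2} K D^{-1/2}, which we can diagonalise with eigh.
    S = K / np.sqrt(D[:, None] * D[None, :])
    w, V = np.linalg.eigh(S)
    idx = np.argsort(w)[::-1]
    w, V = w[idx], V[:, idx]
    psi = V / np.sqrt(D)[:, None]           # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return (w[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.1, (10, 3)),  # two well-separated groups
               rng.normal(3, 0.1, (10, 3))])
coords = diffusion_map(X, eps=1.0)
```

For data with two well-separated groups, the first diffusion coordinate is approximately piecewise constant with opposite signs on the two groups, which is what makes the embedding useful for clustering.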

SCHW01 
10th January 2008 16:30 to 17:30 
P Niyogi 
A geometric perspective on learning theory and algorithms Increasingly, we face machine learning problems in very high-dimensional spaces. We proceed with the intuition that although natural data live in very high dimensions, they have relatively few degrees of freedom. One way to formalize this intuition is to model the data as lying on or near a low-dimensional manifold embedded in the high-dimensional space. This point of view leads to a new class of algorithms that are "manifold motivated" and a new set of theoretical questions that surround their analysis. A central construction in these algorithms is a data-derived graph or simplicial complex, and we will relate the geometry of these to the geometry of the underlying manifold. Applications to embedding, clustering, classification, and semi-supervised learning will be considered. 

SCHW01 
11th January 2008 09:00 to 10:00 
High-dimensional variable selection and graphs: sparsity, faithfulness and stability Over the last few years, substantial progress has been achieved on high-dimensional variable selection (and graphical modeling) using L1-penalization methods. Diametrically opposed to penalty-based schemes is the PC-algorithm, a special hierarchical multiple testing procedure, which exploits the so-called faithfulness assumption from graphical modeling. For asymptotic consistency in high-dimensional settings, the different approaches require very different "coherence" conditions, say for the design matrix in a linear model. From a conceptual aspect, the PC-algorithm allows one to identify not only regression-type associations but also directed edges in a graph and causal effects (in the sense of Pearl's intervention operator). Thereby, sparsity, faithfulness and stability play a crucial role. We will discuss potential and limitations from both a theoretical and a practical point of view.

SCHW01 
11th January 2008 10:00 to 11:00 
Time series regression with semiparametric factor dynamics High-dimensional regression problems which reveal dynamic behavior are typically analyzed by time propagation of a small number of factors. The inference on the whole system is then based on the low-dimensional time series analysis. Such high-dimensional problems occur frequently in many different fields of science. In this paper we address the problem of inference when the factors and factor loadings are estimated by semiparametric methods. This more flexible modelling approach poses an important question: is it justified, from an inferential point of view, to base statistical inference on the estimated time series factors? We show that the difference between the inference based on the estimated time series and the `true' unobserved time series is asymptotically negligible. Our results justify fitting vector autoregressive processes to the estimated factors, which allows one to study the dynamics of the whole high-dimensional system with a low-dimensional representation. The talk reports on joint projects with Szymon Borak, Wolfgang H\"ardle, Jens Perch Nielsen and Byeong U. Park. 

SCHW01 
11th January 2008 11:30 to 12:30 
B Y Yu 
Using side information for prediction Extracting useful information from high-dimensional data is the focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized L2 minimization method Lasso has been popular. However, Lasso is often seen as not having enough regularization in the large-p case. In this talk, we propose two methods that take into account side information in the penalized L2 framework, in order to bring the needed extra regularization in the large-p case. First, we combine different norms, including L1, to introduce the Composite Absolute Penalties (CAP) family. CAP allows the grouping and hierarchical relationships between the predictors to be expressed. It covers and goes beyond existing works including the grouped lasso and elastic nets. Path-following algorithms and simulation results will be presented to compare with Lasso in terms of prediction and sparsity. Second, motivated by the problem of predicting fMRI signals from input natural images, we investigate a method that uses side information in the unlabeled data for prediction. We present a theoretical result in the case of p/n > constant and apply the method to the fMRI data problem. (It is noted that the second part is a report on ongoing research.)


SCHW01 
11th January 2008 14:00 to 15:00 
A physicist's approach to high-dimensional inference  
SCHW01 
11th January 2008 15:30 to 16:30 
Models, model lists, model spaces and predictive optimality Sources of uncertainty related to model specification are often the single biggest factors in inference. In the predictive context, we demonstrate the effect of varying the model list used for averaging and varying the averaging strategy in computational examples. In addition, by varying the model space while using similar lists and averaging strategies, we demonstrate the effect of the space itself computationally. Thus, it is reasonable to associate a concept of variance and bias not just with individual models but with other aspects of an overall modeling strategy. Moreover, although difficult to formalize, good prediction is seen to be associated with a sort of complexity matching between the space and the unknown function, and with robustness. In some cases, the relationship among complexity, variance-bias, robustness and averaging strategy seems to depend on sample size. Taken together, these considerations can be formalized into an overview that may serve as a framework for more general inferential problems in Statistics. 

SCH 
14th January 2008 11:00 to 12:00 
Innovative higher criticism for detecting sparse signals in correlated noise  
SCH 
16th January 2008 11:00 to 12:00 
Hierarchically penalised Cox regression for censored data with grouped variables and its oracle property  
SCH 
18th January 2008 11:00 to 12:00 
M Pontil  A spectral regularisation framework for multi-task structure learning  
SCH 
22nd January 2008 11:00 to 12:00 
Statistical issues and metabolomics  
SCH 
24th January 2008 11:00 to 12:00 
Excess mass estimation  
SCH 
24th January 2008 15:00 to 17:00 
An informal introduction to sufficient dimension reduction  
SCH 
25th January 2008 11:00 to 12:00 
An ensemble approach to improved prediction from multitype data  
SCH 
29th January 2008 11:00 to 12:30 
Model selection and sharp asymptotic minimaxity We will show that a class of model selection procedures are asymptotically sharp minimax to recover sparse signals over a wide range of parameter spaces. Connections to Bayesian model selection, the MDL principle and wavelet estimation will be discussed. 

SCH 
31st January 2008 09:00 to 10:00 
High frequency micro structure in futures markets  
SCH 
31st January 2008 10:00 to 10:45 
Choosing a portfolio of many assets  
SCH 
31st January 2008 11:00 to 12:00 
P Clarkson  A database of foreign exchange deals  
SCH 
5th February 2008 11:00 to 12:00 
Approximation methods in statistical learning theory Spectral methods are of fundamental importance in statistical learning, as they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. Using traditional methods, such an approximation can be obtained with computational complexity that scales as the cube of the number of training examples. For the growing number of applications dealing with very large or high-dimensional data sets, however, these techniques are too costly. A known alternative is the Nystrom extension from finite element methods. While its application to machine learning has previously been suggested in the literature, we introduce here what is, to the best of our knowledge, the first randomized algorithm of this type to yield a relative approximation error bound. Our results follow from a new class of algorithms for the approximation of matrix products, which reveal connections between classical linear algebraic quantities such as Schur complements and techniques from theoretical computer science such as the notion of volume sampling. 
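For reference, the plain (non-randomized) Nystrom extension that the abstract starts from can be sketched as follows; the randomized, error-bounded variant presented in the talk differs in how the landmark columns are sampled.

```python
import numpy as np

def nystrom(K, landmarks):
    """Low-rank approximation of a PSD kernel matrix K from the columns
    indexed by `landmarks`: K ~ C W^+ C^T, costing O(n m^2) for m landmarks
    instead of O(n^3) for a full eigendecomposition."""
    C = K[:, landmarks]                     # n x m slab of sampled columns
    W = K[np.ix_(landmarks, landmarks)]     # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 2))
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-d2)                             # RBF kernel matrix

K_full = nystrom(K, np.arange(30))          # all columns: exact, since K K^+ K = K
K_low = nystrom(K, np.arange(10))           # 10 landmarks: the cheap approximation
```

With all columns as landmarks the formula reduces to K K^+ K = K (a Moore-Penrose identity), which is a convenient sanity check; the practical regime is m << n.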

SCH 
7th February 2008 11:00 to 12:00 
Modelling human motion with Gaussian processes Human motion capture data is a high-dimensional time series. Probabilistic modelling of this high-dimensional data is affected by problems of dimensionality. In this talk we will show how Gaussian processes can be used to reduce the dimensionality and construct accurate models of human motion. The main application will be three-dimensional human pose reconstruction from images. 

SCH 
8th February 2008 11:00 to 12:00 
Properties of regularisation operators in learning theory We consider the properties of a large class of learning algorithms defined in terms of classical regularization operators for ill-posed problems. This class includes regularized least-squares, the Landweber method, $\nu$-methods and truncated singular value decomposition on hypothesis spaces of vector-valued functions defined in terms of suitable reproducing kernels. In particular, universal consistency, minimax rates and statistical adaptation of these methods will be discussed. 

SCH 
12th February 2008 11:00 to 12:00 
J Kent 
Procrustes methods for projective shape Projective shape is important in computer vision to represent the information in a scene that is invariant under different camera views. The simplest example is the cross ratio, which represents the projective shape of four collinear points. One way to study projective shape is through projective invariants. However, a disadvantage is that there seems to be no natural metric structure on these invariants, making it difficult to quantify differences between different projective shapes. The purpose of this talk is to describe a metric structure for projective shapes. Then, using Procrustes methods, the beginnings of a statistical theory will be developed to construct averages and describe variability for a collection of projective shapes. 
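The invariance underlying the abstract's simplest example is easy to verify numerically: the cross ratio of four collinear points is unchanged by any projective transformation of the line. A minimal check (the transformation coefficients below are arbitrary choices of ours, subject only to the denominator staying non-zero):

```python
def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given by coordinates on the line."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def projective(x, m=2.0, n=1.0, p=0.5, q=3.0):
    """A 1-D projective transformation x -> (m x + n) / (p x + q)."""
    return (m * x + n) / (p * x + q)

pts = [0.0, 1.0, 2.0, 4.0]
before = cross_ratio(*pts)
after = cross_ratio(*[projective(x) for x in pts])
print(before, after)  # both 1.5: the cross ratio is projectively invariant
```

The metric structure discussed in the talk goes beyond such invariants; this example only illustrates why raw invariants alone carry no natural notion of distance between projective shapes.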

SCH 
13th February 2008 14:00 to 15:00 
YH Said 
Text mining and high dimensional statistical analysis Text mining can be thought of as a synthesis of information retrieval, natural language processing and statistical data mining. The set of documents being considered can scale to hundreds of thousands, and the associated lexicon can be a million or more words. Analysis is often done by consideration of a term-document matrix or even a bigram-document matrix. The dimensionality of the term vector can thus easily be a million or more. In this talk I will describe some of the approaches to text mining on which we have been working. This is joint work with Dr Edward Wegman. 

SCH 
15th February 2008 11:00 to 12:00 
An introduction to variational methods for incomplete-data problems Likelihood and Bayesian inference for incomplete-data problems tend to involve computational complications. In Bayesian inference, for example, simulation-based methods such as Markov chain Monte Carlo represent one approach to dealing with such difficulties. The talk will describe a more deterministic approach, based on so-called variational approximations. These have been developed in the computer science literature, and versions of them for likelihood analysis and Bayesian analysis will be described in the talk. Application to the analysis of mixture models and extensions thereof will be discussed, as will general issues concerning the theoretical properties of the methods. 

SCH 
18th February 2008 15:00 to 15:30 
Bayesian hierarchical clustering  
SCH 
18th February 2008 15:30 to 16:00 
Bayesian nonparametric latent feature models  
SCH 
18th February 2008 16:00 to 16:30 
New models for relational classification  
SCH 
18th February 2008 16:30 to 17:00 
Gaussian process methods for large and high-dimensional data sets  
SCH 
19th February 2008 11:00 to 12:00 
M Seeger 
Expectation Propagation: Experimental Design for the Sparse Linear Model Expectation propagation (EP) is a novel variational method for approximate Bayesian inference, which has given promising results in terms of computational efficiency and accuracy in several machine learning applications. It can readily be applied to inference in linear models with non-Gaussian priors, generalised linear models, or nonparametric Gaussian process models, among others, yet to our knowledge has not been used in Statistics so far. I will give an introduction to this framework. I will then show how to address sequential experimental design for a linear model with non-Gaussian sparsity priors, giving some results in two different machine learning applications. These results indicate that experimental design for these models may have significantly different properties than for linear-Gaussian models, where Bayesian inference is analytically tractable and experimental design seems best understood. EP as a statistical approximation technique, and especially experimental design for models different from linear-Gaussian ones, is not well understood theoretically. To advance the understanding, it seems promising to relate it to work in Statistics on multivariate continuous-variable distributions, and I am hoping very much for feedback from the audience in that respect. 

SCH 
21st February 2008 11:00 to 12:00 
Some statistical problems from artificial intelligence  
SCH 
22nd February 2008 11:00 to 12:00 
Functional sparsity Substantial progress has recently been made on understanding the behaviour of sparse linear models in the high-dimensional setting, where the number of variables can greatly exceed the number of samples. This problem has attracted the interest of multiple communities, including applied mathematics, signal processing, statistics and machine learning. But linear models often rely on unrealistically strong assumptions, made mainly for convenience. Going beyond parametric models, can we understand the properties of high-dimensional functions that enable them to be estimated accurately from sparse data? In this talk we present some progress on this problem, showing that many of the recent results for sparse linear models can be extended to the infinite-dimensional setting of nonparametric function estimation. In particular, we present some theory for estimating sparse additive models, together with algorithms that are scalable to high dimensions. We illustrate these ideas with an application to functional sparse coding of natural images. This is joint work with Han Liu, Pradeep Ravikumar, and Larry Wasserman. 

SCH 
26th February 2008 11:00 to 12:00 
Learning latent activities in large-scale dynamical problems Many machine learning problems can be cast as problems of learning highly structured latent activities or dynamics. I will discuss typical approaches to these problems, and illustrate this using the problems of modelling handwriting and modelling fMRI data. However, the problem of really learning complicated structural dynamics still seems elusive, and I will briefly discuss what approaches may be fruitful in achieving this. 

SCH 
28th February 2008 11:00 to 12:00 
Premodelling via BART Consider the canonical regression setup where one wants to learn about the relationship between y, a variable of interest, and x_1,...,x_p, p potential predictor variables. Although one may ultimately want to build a parametric model to describe and summarize this relationship, preliminary analysis via flexible nonparametric models may provide useful guidance. For this purpose we propose BART (Bayesian Additive Regression Trees), a flexible nonparametric ensemble Bayes approach for estimating f(x_1,...,x_p), which is E(Y | x_1,...,x_p), for obtaining predictive regions for future y, for describing the marginal effects of subsets of x_1,...,x_p and for model-free variable selection. Essentially, BART approximates f by a Bayesian 'sum-of-trees' model where fitting and inference are accomplished via an iterative backfitting MCMC algorithm. By using a large number of trees, which yields a redundant basis for f, BART is seen to be remarkably effective at finding highly nonlinear relationships hidden within a large number of irrelevant potential predictors. BART also provides an omnibus test: the absence of any relationship between y and any subset of x_1,...,x_p is indicated when BART posterior intervals for f reveal no signal. (This is joint work with Hugh Chipman and Robert McCulloch.) 

SCH 
29th February 2008 11:00 to 12:00 
Some thoughts about the design of dissimilarity measures In many situations, dissimilarities between objects cannot be measured directly, but have to be constructed from some known characteristics of the objects of interest, e.g. some values on certain variables. From a philosophical point of view, the assumption of the objective existence of a 'true' but not directly observable dissimilarity value between two objects is highly questionable. We treat the dissimilarity construction problem as a problem of the choice or design of such a measure and not as an estimation problem of some existing but unknown quantities. Therefore, subjective judgment is necessarily involved, and the main aim of the design of a dissimilarity measure is the proper representation of a subjective or intersubjective concept (usually of subject-matter experts) of similarity or dissimilarity between the objects. The design of dissimilarity measures is of particular interest when analyzing high-dimensional data, because methods such as MDS and nearest neighbour techniques operate on dissimilarity matrices, and such matrices are not essentially more complex when derived from high-dimensional data. Some guidelines for the choice and design of dissimilarity measures are given and illustrated by the construction of a new dissimilarity measure between species distribution areas in biogeography, which are formalized as binary presence-absence data on a set of geographic units. I will also discuss alternatives to the Euclidean distance and their implications for high-dimensional situations in which it is not feasible to use information about the meaning of individual variables to construct a dissimilarity measure. 
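As one concrete instance of a designed dissimilarity for binary presence-absence data, the Jaccard dissimilarity (a common choice in biogeography, not necessarily the new measure constructed in the talk) deliberately ignores joint absences:

```python
def jaccard_dissimilarity(x, y):
    """Jaccard dissimilarity between two binary presence-absence vectors:
    1 - |intersection| / |union|.  Joint absences are ignored, a design
    choice that matters for species range data, where most geographic
    units lack most species."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

# Two species recorded over six geographic units (hypothetical data):
sp1 = [1, 1, 0, 0, 1, 0]
sp2 = [1, 0, 0, 0, 1, 1]
d = jaccard_dissimilarity(sp1, sp2)
# Shared in 2 units, present in 4 units overall, so d = 1 - 2/4 = 0.5.
```

The design point is visible in the code: adding empty units to both ranges leaves the value unchanged, which Euclidean distance on the raw 0/1 vectors would not do.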

SCH 
4th March 2008 11:00 to 12:00 
JQ Shi 
Gaussian process functional regression model for curve prediction and clustering In this talk I will first discuss the Gaussian Process Functional Regression (GPFR) model, which is used to model functional response curves with a set of functional covariates (the dimension of the covariates may be very large). There are two main features: modelling the nonlinear and nonparametric functional regression relationship, and modelling the covariance structure and mean structure simultaneously. The method gives very accurate results for curve fitting and prediction but sidesteps the problem of heterogeneity. I will then discuss how to define a hierarchical mixture model to model 'spatially' indexed functional data, i.e., where the heterogeneity depends on factors such as region or individual patient information. The mixture model has also been used for curve clustering, focusing on the problem of clustering functional relationships between the response curve and the covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. Some applications based on simulated data and real data will be presented. 

SCH 
6th March 2008 11:00 to 12:00 
Nonparametric estimation of HARDI diffusion weighted magnetic resonance imaging data Diffusion-Weighted Magnetic Resonance Imaging captures the diffusion of water molecules in tissue. The impediment of this diffusion process by nerves enables the characterisation of white matter structure and the measurement of quantitative descriptions of white matter integrity. Initial quantification of the diffusion was based on modelling the diffusion PDF parametrically, and as such the parameters of the PDF can be estimated, if with some model-choice issues. A single Gaussian diffusion tensor model can for example be determined with a minimum of 6 measurements. Of special interest is inferring the orientational structure of the PDF, and as much as one third of all white matter voxels in the brain experience orientational heterogeneity. It is hard to model orientational heterogeneity parametrically, and to estimate the PDF without bias a substantial number of additional measurements is required. We discuss nonparametric estimation methods for the important characteristics of the diffusion PDF, and inherent limitations in estimation based on a clinically feasible acquisition protocol. We discuss combining hard and soft shrinkage procedures with a suitable basis representation, and how to construct nonparametric summaries of the diffusion with reduced variance without incurring substantial bias. This is joint work with Brandon Whitcher, CIC Hammersmith, GSK. 

SCH 
11th March 2008 11:00 to 12:00 
Total variation and curves We discuss the approximation of data from one- and two-dimensional curves using total-variation-based techniques. Our aim will be to minimise complexity among all functions which satisfy a criterion for approximation. Complexity will be measured by the number of local extreme values or variational properties of the functions. Our criteria for approximation will be based on a multiscale analysis of the residuals. 
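The two complexity measures mentioned, discrete total variation and the number of local extreme values, are easy to state for a sampled one-dimensional signal; a minimal sketch (the example signal is invented):

```python
def total_variation(x):
    """Discrete total variation of a 1-D signal:
    the sum of absolute successive differences."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def count_local_extrema(x):
    """Number of strict interior local maxima/minima, one of the
    complexity measures mentioned in the abstract."""
    n = 0
    for i in range(1, len(x) - 1):
        if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0:
            n += 1
    return n

sig = [0, 2, 1, 3, 3, 0]
tv = total_variation(sig)        # |2| + |-1| + |2| + |0| + |-3| = 8
ext = count_local_extrema(sig)   # a maximum at index 1, a minimum at index 2
```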

SCH 
11th March 2008 14:00 to 15:00 
Proteomics data analysis Within the context of expression proteomics, we developed a novel approach to identify and assess meaningful differences in functional datasets. Given multiple proteomic profiles (generated by a Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometer) from subjects who belonged to one of two treatment groups, we extracted and classified biologically relevant information using Bayesian nonparametric methods. We modelled f(t), the mean ion abundance per spectrum, via an adaptive kernel regression approach, and relied on an underlying Lévy random field to control model complexity. We began by implementing a Lévy random field model for an individual spectrum, and extended it hierarchically to include data from multiple spectra. To make the extension, we asserted that each multimodal spectrum depended upon one time- and resolution-dependent marked Gamma process, but was unique for reasons including random, biological or measurement error. Upon eliciting parameter prior distributions, we designed a Markov chain Monte Carlo algorithm that enabled exploration of a trans-dimensional model space and posterior predictions of experimental-group status. 

SCH 
12th March 2008 16:15 to 17:00 
Some issues raised by high dimension in Statistics: a partial overview of the SCH Programme  
SCH 
13th March 2008 11:00 to 12:00 
Multilevel modelling of proteomic mass-spectrometry data Statistical methodology for the analysis of proteomic mass-spectrometry data is proposed using multilevel modelling. Each high-dimensional spectrum is represented using a near-orthogonal low-dimensional basis of Gaussian functions. Multivariate mixed effect models are proposed in the lower dimensional space. In particular, differences between groups are investigated using fixed effect parameters, and individual variability of spectra is modelled using random effects. A deterministic peak fitting algorithm provides initial estimates of the near-orthogonal Gaussian basis, and the estimates are updated using a two-stage iterative method. The multilevel model is fitted using a parallel procedure for computational convenience. The methodology is applied to proteomic mass-spectrometry data from serum samples from melanoma patients categorized as Stage I or Stage IV, and significant locations of peaks are identified. Finally, comparisons with other methods, including simple feature-based statistics and more complicated Bayesian Markov chain Monte Carlo inference, are also made. This is joint work with William Browne (University of Bristol) and Kelly Handley (University of Birmingham). 

SCH 
17th March 2008 17:00 to 18:00 
D Donoho 
More unknowns than equations? Not a problem! Use Sparsity!
Everything you were taught about underdetermined systems of linear equations is wrong... Okay, that's too strong. But you have been taught things in undergraduate linear algebra which, if you are an engineer or scientist, may be holding you back. The main one is that if you have more unknowns than equations, you're lost. Don't believe it. At the moment there are many interesting problems in the information sciences where researchers are currently confounding expectations by turning linear algebra upside down.
Moreover, in each case the methods are convenient and computationally tractable. Mathematically, what's going on is a recent explosion of interest in finding the sparsest solution to certain systems of underdetermined linear equations. This problem is known to be NP-Hard in general, and hence the problem sounds intractable. Surprisingly, in some particular cases, it has been found that one can find the sparsest solution by l¹ minimization, which is a convex optimization problem and so tractable. Many researchers are now actively working to explain and exploit this phenomenon. It's responsible for the examples given above. In my talk, I'll discuss this curious behavior of l¹ minimization and connect it with some pure mathematics: convex polytope theory and oriented matroid theory. Ultimately, the pure math behind this phenomenon concerns some accessible but very surprising properties of random point clouds in very high dimensions: each point gets very neighborly! I'll also explain the connection of this phenomenon to the Newton Institute's ongoing program "Statistical Theory and Methods for Complex, High-Dimensional Data". 
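To make the combinatorial difficulty concrete, here is a toy brute-force search for the sparsest solution of a 2-equation, 4-unknown system. The matrix and right-hand side are invented for illustration, and the single-column check is a toy simplification; a real application would use the l¹ (linear programming) relaxation rather than enumerating supports, which is exponential in general:

```python
from itertools import combinations

def solve_2x2(A, b, cols):
    """Solve the 2x2 system restricted to two columns of A (Cramer's
    rule); returns None if the restricted system is singular."""
    i, j = cols
    a, c = A[0][i], A[0][j]
    d, e = A[1][i], A[1][j]
    det = a * e - c * d
    if abs(det) < 1e-12:
        return None
    return (b[0] * e - c * b[1]) / det, (a * b[1] - b[0] * d) / det

def sparsest_solution(A, b, n):
    """Exhaustive search over supports of size 1 then 2 for a sparse
    solution of Ax = b (2 equations, n unknowns)."""
    for i in range(n):  # size-1 supports (toy check: both entries nonzero)
        if abs(A[0][i]) > 1e-12 and abs(A[1][i]) > 1e-12:
            t = b[0] / A[0][i]
            if abs(t * A[1][i] - b[1]) < 1e-9:
                x = [0.0] * n
                x[i] = t
                return x
    for cols in combinations(range(n), 2):  # size-2 supports
        sol = solve_2x2(A, b, cols)
        if sol is not None:
            x = [0.0] * n
            x[cols[0]], x[cols[1]] = sol
            return x
    return None

# 2 equations, 4 unknowns; b is 3 times column 1 of A, so a 1-sparse
# solution exists even though the system is underdetermined.
A = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 2.0]]
b = [6.0, 3.0]
x = sparsest_solution(A, b, 4)  # [0.0, 3.0, 0.0, 0.0]
```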

SCH 
18th March 2008 11:00 to 12:00 
E Hancock 
Analysis of graphs using diffusion processes and random walks (a random walk through spectral graph theory) This talk will focus on how graph structures can be analysed using diffusion processes and random walks. It will commence by explaining the relationship between the heat equation on a graph, the spectrum of the Laplacian matrix (the degree matrix minus the weighted adjacency matrix) and the steady-state random walk. The talk will then focus in some depth on how the heat kernel, i.e. the solution of the heat equation, can be used to characterise graph structure in a compact way. One of the important steps here is to show that the zeta function is the moment generating function of the heat kernel trace, and that the zeta function is determined by the distribution of paths and the number of spanning trees in a graph. We will then explore a number of applications of these ideas in image analysis and computer vision. This will commence by showing how the heat kernel can be used for the anisotropic smoothing of complex non-Euclidean image data, including tensor MRI. We will then show how a similar diffusion process based on the Fokker-Planck equation can be used for consistent image labelling. Thirdly, we will show how permutation invariant characteristics extracted from the heat kernel can be used for learning shape classes. If time permits, the talk will conclude by showing how quantum walks on graphs can overcome some of the problems which limit the utility of classical random walks. 

SCH 
19th March 2008 11:00 to 12:00 
Bootstrapping divergence weighted independence graphs Independence graphs give an overview of multivariate dependency. After a brief introduction to information divergence and to conditional independence graphs we show DWIGs fall within the paradigm of design based inference. Bootstrap resampling tests the stability of the DWIG parameters when increasing the dimension of the underlying data set. 
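The resampling step described is the standard nonparametric bootstrap; a minimal generic sketch for the standard error of a statistic (not specific to DWIG parameters, and with invented data):

```python
import random

def bootstrap_se(data, statistic, n_boot=2000, seed=42):
    """Nonparametric bootstrap standard error: resample the data with
    replacement, recompute the statistic on each resample, and take
    the standard deviation across replicates."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        reps.append(statistic(sample))
    m = sum(reps) / n_boot
    return (sum((r - m) ** 2 for r in reps) / (n_boot - 1)) ** 0.5

mean = lambda xs: sum(xs) / len(xs)
data = [2.1, 3.4, 1.9, 4.0, 2.8, 3.3, 2.5, 3.9]   # hypothetical sample
se = bootstrap_se(data, mean)
```

The same loop, with the statistic replaced by a fitted DWIG parameter, gives the stability assessment described in the abstract.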

SCH 
26th March 2008 11:00 to 12:00 
YW Teh 
Improvements to variational Bayesian inference Variational Bayesian (VB) inference is an approximate inference framework that has been successfully applied in a wide variety of graphical models. It is well accepted that VB provides lowered variance in posterior estimation in exchange for higher bias, as opposed to Markov chain Monte Carlo (MCMC) inference. In this talk we shall explore improvements to the VB framework in order to reduce bias, in the context of a specific Bayesian network called latent Dirichlet allocation. Specifically we consider two ideas: collapsing or integrating out variables before any approximations are made, and hybrid methods that combine VB and MCMC techniques. 

SCH 
27th March 2008 11:00 to 12:00 
B Kleijn 
The semiparametric Bernstein-von Mises theorem The Bernstein-von Mises theorem provides a detailed relation between frequentist and Bayesian statistical methods in smooth, parametric models. It states that the posterior distribution converges to a normal distribution centred on the maximum-likelihood estimator with covariance proportional to the inverse Fisher information. In this talk we consider conditions under which such an assertion holds for the marginal posterior of a parameter of interest in semiparametric models. From a practical point of view, this enables the use of Bayesian computational techniques (e.g. MCMC simulation) to obtain frequentist confidence intervals that are otherwise hard to compute. (Joint work with P. Bickel.) 
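Stated informally for the smooth parametric case (with true parameter theta_0, Fisher information I(theta_0), and maximum-likelihood estimator hat-theta_n), the theorem says that the total variation distance between the posterior and its normal approximation vanishes in probability:

```latex
\sup_{B}\;\Bigl|\,\Pi\bigl(\theta \in B \,\big|\, X_1,\dots,X_n\bigr)
  \;-\; N\!\bigl(\hat\theta_n,\; n^{-1} I(\theta_0)^{-1}\bigr)(B)\,\Bigr|
  \;\xrightarrow{\;P_{\theta_0}\;}\; 0 .
```

The semiparametric question of the talk is when the same display holds for the marginal posterior of the finite-dimensional parameter of interest, with the infinite-dimensional nuisance parameter integrated out.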

SCHW02 
31st March 2008 10:00 to 11:00 
The evolution of promoter sequence We have produced an evolutionary model for promoters (and more generally for genomic regulatory sequence) analogous to the commonly used synonymous/non-synonymous mutation models for protein-coding sequence. Although our model, called Sunflower, relies on some simple assumptions, it captures enough of the biology of transcription factor action to show clear correlation with other biological features. Sunflower predicts a binding profile of transcription factors to DNA sequence, in which different factors compete for the same potential binding sites. Sunflower can also model cooperative binding. We can control the apparent concentration of the factors by setting parameters uniformly or from gene expression data. The parameterized model simultaneously estimates a continuous measurement of binding occupancy across the genomic sequence for each factor. We can then introduce either a localized mutation (such as a SNP) or a coordinated set of mutations (for example, from a haplotype or another species), rerun the binding model and record the difference in binding profiles using their relative entropy. A single mutation can alter interactions both upstream and downstream of its position due to potential overlapping binding sites, and our statistic captures this domino effect. Results from Sunflower show many features in agreement with known biology. For example, the overall binding occupancy rises over transcription start sites, and CpG desert promoters show sharper localization signals relative to the transcription start site. More interesting are correlates to variation both between species and within them. Over evolutionary time, we observe a clear excess of low scoring mutations fixed in promoters, consistent with most changes being neutral. However, this is not consistent across all promoters, and some promoters show more rapid divergence. This divergence often occurs in the presence of relatively constant protein-coding divergence. 
Interestingly, different classes of promoters show different sensitivity to mutations, with developmental and immunological genes having promoters inherently more sensitive to mutations than housekeeping genes. 

SCHW02 
31st March 2008 11:30 to 12:30 
Functional genomics and the forest of life We will discuss the 0-dimensional statistical problem of alignment, and its relation to the high-dimensional problem of phylogeny. In particular, we will discuss the relevance of the "space of phylogenetic oranges" and its relation to the above problems. We will also discuss "sequence annealing", which is a new alignment strategy based on these ideas. 

SCHW02 
31st March 2008 14:00 to 15:00 
Understanding interactomes by data integration  
SCHW02 
31st March 2008 15:30 to 16:30 
GJ McLachlan 
On mixture models in high-dimensional testing for the detection of differential gene expression An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As there are usually thousands of genes to be considered simultaneously, one encounters high-dimensional testing problems. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null (not differentially expressed). The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have limitations due to the minimal assumptions made, or are computationally intensive because of more specific assumptions. By converting the value of the test statistic used to test the significance of each gene to a z-score, we propose a simple two-component normal mixture that adequately models the distribution of this score. The approach provides an estimate of the local false discovery rate (FDR) for each gene, which is taken to be the posterior probability that the gene is null. Genes with local FDR less than a specified threshold C are taken to be differentially expressed. For a given C, this approach also provides estimates of the implied overall errors such as the (global) FDR and the false negative/positive rates. Related Links
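The local FDR computation described above is a one-line Bayes calculation once the two-component normal mixture is fitted; a minimal sketch with illustrative (not estimated) mixture parameters:

```python
import math

def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def local_fdr(z, pi0, mu1, sigma1):
    """Local false discovery rate under a two-component normal mixture
    for gene z-scores: null N(0,1) with weight pi0, non-null
    N(mu1, sigma1^2) with weight 1 - pi0.  Returns the posterior
    probability that the gene is null."""
    f0 = normal_pdf(z)
    f = pi0 * f0 + (1 - pi0) * normal_pdf(z, mu1, sigma1)
    return pi0 * f0 / f

# Illustrative parameters: 90% null genes, non-null scores ~ N(2.5, 1).
fdr_near_null = local_fdr(0.1, 0.9, 2.5, 1.0)  # z in the null bulk: near 1
fdr_in_tail = local_fdr(4.0, 0.9, 2.5, 1.0)    # z far in the tail: near 0
```

A gene is then called differentially expressed when its local FDR falls below the chosen threshold C.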


SCHW02 
31st March 2008 16:30 to 17:30 
Statistical challenges in using comparative genomics for the identification of functional sequences There are two main aspects of comparative sequence analysis that rely on high-dimensional statistical approaches: identifying evolutionarily constrained regions and determining the significance of their overlap with functional sequences. The identification of constrained sequences largely relies on our understanding of evolutionary models and applying them to multi-sequence alignments. However, our understanding of evolutionary processes is incomplete, and our ability to generate perfect multi-sequence alignments is hampered by incomplete sequence datasets and general uncertainty in the process; these factors can lead to multiple equally plausible alignments, only one of which is typically represented in downstream analyses. In order to mitigate some of these issues, we have been developing new comparative genomics approaches that take into account the biochemical and physical properties of DNA, such that we can understand which substitutions are more tolerable with respect to the three-dimensional structure of DNA, and thus more neutral in evolution. We also plan to take alignment uncertainty into account in our predictions of constrained sequences. Determining the significance of our improved sequence constraint methods relies on a new statistical approach for determining the significance of overlap with known functional annotations. This new method, devised by Peter Bickel and colleagues, was applied to analyses performed within the ENCODE consortium and provides the basis for newer methods that will be discussed later in this meeting. Related Links


SCHW02 
1st April 2008 09:00 to 10:00 
Structural variation in the human genome Over the past three years it has become rapidly appreciated that the human genome varies in its structure as well as its sequence, by virtue of a panoply of different chromosomal rearrangements, some that alter the number of copies of DNA segments, and others that alter orientation but not copy number. Evidence is growing from diverse sources that this source of genomic variation has an appreciable functional impact, and yet we remain far from a complete catalogue of this form of variation let alone its biological consequences. In my talk I will summarise the progress to date and highlight the appreciable statistical challenges that remain, with particular reference to the approaches being adopted in my group towards assaying copy number variation and assessing its impact on complex traits through genetic association studies. 

SCHW02 
1st April 2008 10:00 to 11:00 
Y Benjamini 
Selective inference in complex research problems We shall highlight the problem of selective inference in genomics using some recent studies. The False Discovery Rate (FDR) approach to this problem will be reviewed, and then we shall discuss: (i) advances in hierarchical testing with an example from a study associating gene expression in the brain with multiple traits of behavior; (ii) screening for partial conjunctions in order to address replicability; and (iii) selective confidence intervals in the frequentist and Bayesian frameworks. 
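The (global) FDR control underlying the approach reviewed here is typically achieved with the Benjamini-Hochberg step-up procedure; a minimal sketch with made-up p-values:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: find the largest rank k
    with p_(k) <= q*k/m and reject the k smallest p-values.  Returns
    the indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])

# Ten hypothetical p-values; at q = 0.05 only the two smallest survive
# the step-up comparison p_(k) <= 0.05*k/10.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, q=0.05)  # [0, 1]
```

Hierarchical and selective variants such as those in the talk build on this basic step-up rule.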

SCHW02 
1st April 2008 11:30 to 12:30 
Efficient use of population genome sequencing data With the advent of new sequencing technologies that reduce the cost of DNA sequencing by a factor of a hundred, we have moved into the era of population genomic sequencing, where we sample many individuals from a population to study natural genetic variation genome-wide. However, at this scale sequencing is still costly. I will discuss strategies to use low coverage sequencing on multiple samples from a population, and some of the complications in using the resulting data for population genetic analyses. Examples will be drawn from the Saccharomyces Genome Resequencing Project (SGRP), in which we have collected sequence data from 70 yeast strains, and planning for the 1000 Genomes Project to characterise human genetic variation down to 1% allele frequency. Related Links


SCHW02 
1st April 2008 14:00 to 15:00 
M West 
Sparsity modelling in gene expression pathway studies I will discuss aspects of large-scale multivariate modelling utilising sparsity priors for ANOVA, regression and latent factor analysis in gene expression studies. Specific attention will be given to the development of experimental gene expression signatures in cell lines and animal models, and their extrapolation/evaluation in gene pathway-focused analyses of data from human disease contexts. The role of sparse statistical modelling in signature identification, and in evaluation of complex interacting "sub-pathway" related patterns in gene expression in observational data sets, will be highlighted. I will draw on data and examples from some of our projects in cancer and cardiovascular genomics. 

SCHW02 
1st April 2008 15:30 to 16:30 
Population genomics of human gene expression The recent comparative analysis of the human genome has revealed a large fraction of functionally constrained noncoding DNA in mammalian genomes. However, our understanding of the function of noncoding DNA is very limited. In this talk I will present recent analysis by my group and collaborators that aims at the identification of functionally variable regulatory regions in the human genome by correlating SNPs and copy number variants with gene expression data. I will also present some analysis on inference of trans-regulatory interactions and evolutionary consequences of gene expression variation. 

SCHW02 
2nd April 2008 09:00 to 10:00 
A Enright 
Computational analysis and prediction of microRNA binding sites MicroRNAs (miRNAs) are small, 22-nucleotide RNA molecules that directly bind to the 3' untranslated regions of protein-coding messenger RNAs. This binding event represses the target transcript, rendering it unsuitable for protein production and causing its degradation. Many miRNAs have been found, and a large number of them have already been implicated in human disease and development. We have developed a number of computational approaches for predicting the target transcripts of miRNAs. One method (miRanda) is purely computational and uses a simple dynamic programming algorithm and a statistical model to identify significant binding sites. Our second approach (Sylamer) is an algorithm for scanning genome sequences for 7-mer words and testing gene-expression data to identify gene sets which are significantly enriched or depleted in such 7-mer words using hypergeometric statistics. This combined computational/experimental approach has worked extremely well for identifying candidate miRNA targets in B and T blood cells, in developing Zebrafish embryos and in mouse mutants with deafness. Related Links 
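The Sylamer-style enrichment test described above reduces to a hypergeometric tail probability; a minimal sketch with hypothetical counts (the numbers are invented, not from the talk):

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the chance of seeing
    at least k word-containing genes in a drawn set of n genes, out of
    N genes of which K contain the word."""
    denom = comb(N, n)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / denom

# Hypothetical numbers: 1000 genes, 100 of them contain a given 7-mer;
# a set of 50 differentially expressed genes contains 15 of them.
# Expected under no enrichment: 50 * 100/1000 = 5, so 15 is a strong excess.
p = hypergeom_upper_tail(15, 100, 50, 1000)
```

Sylamer applies a test of this form across all 7-mer words and across nested gene sets, so the actual tool also handles the attendant multiple-testing issues.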

SCHW02 
2nd April 2008 10:00 to 11:00 
L1-regularisation, motif regression and ChIP-on-chip data analysis Motivated by the proposed format of talks, we include the following: (i) a review of statistical facts about L1-regularization for high-dimensional problems; (ii) some adaptations of motif regression (Conlon, Liu, Lieb & Liu, 2003) for scoring potential motifs or for presence/absence of other biological targets of interest (e.g. proteins) by integrating multiple data sources; (iii) using the concepts for analyzing ChIP-on-chip data from human liver cells (with a side remark on signal extraction) for HIF-dependent transcriptional networks. Issue (i) deals with a general purpose method for variable selection or feature extraction which is potentially useful for a broad variety of (multiple) biomolecular and high-dimensional data. Issue (ii) is, in our experience, an interesting method to improve upon some chosen "standard" methodology by making use of additional data sources. Finally, issue (iii) is work in progress with the Ricci lab at ETH Zurich: it is an illustration for statisticians and, of course, the "real thing" for biologists. Related Links 

SCHW02 
2nd April 2008 11:30 to 12:30 
Extraction and classification of cellular and genetic phenotypes from automated microscopy data I will start the presentation with an overview of the Bioconductor project, a large international open source and open development software project for the analysis and comprehension of genomic data. Its goals are to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data; to facilitate the integration of biological metadata in the analysis of experimental data, e.g. literature data, gene and genome annotation data; to allow the rapid development of extensible, scalable, and interoperable software; to promote high-quality documentation and reproducible research; and to provide training in computational and statistical methods for the analysis of genomic data. While much of the initial focus has been on microarray analysis, one of the recent developments has been methods, and computational infrastructure, for the analysis of cell-based assays using various phenotypic readouts. Changes in cell shape are important for many processes during development and disease. However, cellular mechanisms and molecular components that underlie these processes remain poorly understood. Here we present a rapid and automated approach to identify and categorize genes based on their phenotypic signatures on a single-cell level. Perturbations by RNAi on a whole genome scale led to the identification of several hundred genes with distinct cell shape phenotypes. More than 6,000,000 cells were individually profiled into different phenotypic classes. The approach permits the segmentation of the genome into phenotypic clusters using complex phenotypic signatures. 

SCHW02 
2nd April 2008 14:00 to 15:00 
Ultra-deep sequencing of mixed virus populations The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response, vaccine design, and antiviral drug therapy. Recently developed ultra-deep sequencing technologies can be used for quantifying this diversity by direct sequencing of the mixed virus population. We present statistical and computational methods for the analysis of such sequence data. Inference of the population structure from observed reads is based on error correction, reconstruction of a minimal set of haplotypes that explain the data, and eventually estimation of haplotype frequencies. We demonstrate our approach by analyzing simulated data and by comparison to 165 sequences obtained from clonal Sanger sequencing of four independent, diverse HIV populations. Related Links


SCHW02 
3rd April 2008 09:00 to 10:00 
Cracking the regulatory code: predicting expression patterns from DNA sequence Precise control of gene expression lies at the heart of nearly all biological processes. However, despite enormous advances in understanding this process from both experimental and theoretical perspectives, we are still missing a quantitative description of the underlying transcriptional control mechanisms, and the remaining questions, such as how regulatory sequence elements compute expression from the inputs they receive, are still very basic. In this talk, I will present our progress towards the ultimate goal of developing integrated quantitative models for transcription regulation, spanning all aspects of the process, including the DNA sequence, regulators, and expression patterns. I will first describe a novel thermodynamic model that computes expression patterns as a function of cis-regulatory sequence and the binding site preferences and expression of participating transcription factors. I will show that when applied to the segmentation gene network of Drosophila, the model accurately predicts the expression of many known cis-regulatory modules, even across species, and reveals important organizing principles of transcriptional regulation in the network: that both strong binding sites and large numbers of weaker sites contribute, leading to high occupancy of the module DNA and conferring robustness against mutation; and that clustering of weaker sites permits cooperative binding, which is necessary to sharpen the patterns. Related Links


SCHW02 
3rd April 2008 10:00 to 11:00 
PJ Bickel 
Refined nonparametric methods for genomic inference Inference about genomic features faces the particular difficulty that, save for interspecies variation, we have only one copy of any of the genomes that Nature might have produced. We postulate a framework which includes an ergodic hypothesis which permits us to compute p-values and confidence bounds. These seem to be as conservative as could be hoped for. Our methods, in crude form, were applied to data from the ENCODE project (Birney et al (2007)). We will discuss our model and refinements of the methods previously proposed. 

SCHW02 
3rd April 2008 11:30 to 12:30 
Steps toward directed identification of disease genes: predicting the consequences of genetic perturbations Related Links


SCHW02 
3rd April 2008 14:00 to 15:00 
High-resolution identification of active gene regulatory elements I will discuss methods we use to identify active gene regulatory elements within the human genome and some of the current obstacles we still need to overcome. 

SCHW02 
3rd April 2008 15:30 to 16:30 
High-resolution binding specificity profiles of transcription factors and cis-regulatory codes in DNA  
SCHW02 
4th April 2008 09:00 to 10:00 
Functional genomic approaches to stem cell biology Embryonic stem (ES) cells are similar to the transient population of self-renewing cells within the inner cell mass of the preimplantation blastocyst (epiblast), capable of pluripotential differentiation to all specialised cell types comprising the adult organism. These cells undergo continuous self-renewal to produce identical daughter cells, or can develop into specialised progenitors and terminally differentiated cells. A variety of molecular pathways involved in embryonic development have been elucidated, including those influencing stem cell differentiation. As a result, we know of a number of key transcriptional regulators and signalling molecules that play essential roles in manifesting nuclear potency and self-renewal capacity of embryo-derived and tissue-specific stem cells. Despite these efforts, however, only a small number of components have been identified and large-scale characterisation of these processes remains incomplete. While the precise biological niche is believed to direct differentiation and development in vivo, it is now possible to utilise explanted stem cell lines as an in vitro model of cell fate assignment and differentiation. The aim of the studies discussed here is to map the global transcriptomic and proteomic activity of ES cells during various stages of differentiation and lineage commitment in tissue culture. This approach will help characterise the functional roles of key developmental regulators and yield more rational approaches to manipulating stem cell behaviour in vitro. The generation of large-scale data from microarray and functional genomic experiments will help to identify and characterise the regulatory influence of key transcription factors, signaling genes and non-coding RNAs involved in early developmental pathways, leading to a more detailed understanding of the molecular mechanisms of vertebrate embryogenesis. 

SCHW02 
4th April 2008 10:00 to 11:00 
G McVean 
Approximate genealogical inference For many inferential problems in evolutionary biology and population genetics considerable power can be gained by explicitly modelling the genealogical relationship between DNA sequences. In the presence of recombination, genealogical relationships are described by a complex graph. While it is theoretically possible to explore the posterior distribution of such graphs using techniques such as MCMC, in most realistic situations the computational complexity of such methods makes them impractical. One possible solution is to develop approximations to full genealogical inference. I will discuss what properties such approximations should have and describe one approach that samples local genealogical relationships along a genome. 

SCHW02 
4th April 2008 11:30 to 12:30 
Genomic principles for feedback regulation of metabolism Small molecule metabolism is the highly coordinated interconversion of chemical substrates through enzyme-catalysed reactions. It is central to the viability of all organisms as it enables the assimilation of nutrients for energy production and the synthesis of precursors for all cellular components. The system is tightly regulated so cells can respond efficiently to environmental changes. This is optimised to minimise the substantial cost of enzyme production and core metabolite depletion, and to maximise the benefit of cell growth and division. It is commonly known that this regulation is achieved by controlling either (i) the availability of enzymes or (ii) their activities. Though the molecular mechanisms behind these two regulatory processes have been elucidated in great detail, we still lack insight into how they are deployed and complement each other at a global level. Here, I will present a genome-scale analysis of how regulatory feedback by small molecules controls the metabolic system, and examine how the two modes of regulation are deployed throughout the system. Bio: Nick Luscombe, Group Leader, EMBL-European Bioinformatics Institute. Nick completed his PhD with Professor Janet Thornton at University College London (1996-2000), studying the basis for specificity of DNA-binding proteins. He then moved to Yale University as a postdoctoral fellow with Professor Mark Gerstein (2000-2004). During this time, he shifted his research focus to genomics, with a particular emphasis on transcriptional regulation in yeast. He has been a Group Leader at EMBL-EBI since 2005, examining the control of interesting biological systems. 

SCHW02 
4th April 2008 14:00 to 15:00 
A Bayesian probabilistic approach to transform public microarray repositories into disease diagnosis databases Predicting phenotypes from genotypes is one of the major challenges of functional genomics. In this talk, we aim to take the first step towards using microarray repositories to create a disease diagnosis database, or in general, for phenotype prediction. This will provide an important application for the enormous amount of costly to generate, yet freely available, genomics data. In many disease diagnosis cases, it is not obvious which potential disease should be targeted, and screening across the enormous accumulation of disease expression profiles will help to narrow down the disease candidates. In addition, such profile-based diagnosis is especially useful for those diseases that lack biochemical diagnosis tests. 

SCH 
8th April 2008 11:00 to 12:00 
Empirical efficiency maximisation: improved locally efficient covariate adjustment It has long been recognized that covariate adjustment can increase precision in randomized experiments, even when it is not strictly necessary. Adjustment is often straightforward when a discrete covariate partitions the sample into a handful of strata, but becomes more involved when modern studies collect copious amounts of baseline information on each subject. This dilemma helped motivate locally efficient estimation, in which one attempts to gain efficiency through a (possibly misspecified) working model. However, with complex high-dimensional covariates, where one might have no belief in the working model, misspecification can actually decrease precision. We propose a new method, empirical efficiency maximization, to target the working model element minimizing asymptotic variance for the resulting parameter estimate, whether or not the working model is (approximately) correct. Gains are demonstrated relative to standard locally efficient estimators. 

SCH 
10th April 2008 11:00 to 12:00 
C Taylor 
Boosting kernel estimates Kernel density estimation can be used to implement an estimate of Bayes' rule for classification. Kernel functions can also be used in nonparametric regression, and all three topics (classification, regression and clustering) are examples of "statistical learning". Boosting, an iterative procedure for improving estimates, is increasingly widely used due to its impressive performance. In this talk we give an introduction to these kernel methods as well as to boosting. We show how to implement boosting in each case, and illustrate (both theoretically, and by example) the effect on bias and variance. 
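As a minimal sketch of the first ingredient of the talk, a kernel density estimate plugged into Bayes' rule for classification (illustrative only: the function names and toy data are my own, and no boosting step is shown):

```python
import math

def kde(x, data, h):
    # Gaussian kernel density estimate at the point x with bandwidth h
    norm = len(data) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) / norm

def kde_classify(x, class_data, h=0.5):
    # plug-in Bayes rule: pick the class maximising prior times estimated density
    n = sum(len(d) for d in class_data.values())
    return max(class_data, key=lambda c: len(class_data[c]) / n * kde(x, class_data[c], h))

classes = {"A": [0.0, 0.2, -0.1, 0.1], "B": [3.0, 3.2, 2.9, 3.1]}
print(kde_classify(0.05, classes), kde_classify(3.1, classes))  # each point goes to the nearby class
```

Boosting would iteratively reweight the training points that this plug-in rule misclassifies.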

SCH 
15th April 2008 11:00 to 12:00 
Data visualisation via pairwise displays We take a graph-theoretic approach to the component ordering problem in the layout of statistical graphics. We use Eulerian tours and Hamiltonian decompositions of complete graphs to ameliorate order effects. Similarly, visual effects of selected salient features in the data are amplified with traversals of edge-weighted graphs. Examples of these techniques, based on graph traversals, include improved versions of multiple comparison displays, interaction plots, star glyph displays and parallel coordinate plots. We present algorithms based on classical graph theory methods. These, along with the new graphical displays, are available as an R package. This is joint work with R.W. Oldford (Waterloo). 

SCH 
17th April 2008 11:00 to 12:00 
Determining the number of factors in a linear mixture model from limited noisy data Determining the number of signals (sources / components) in a linear mixture model is a fundamental problem in many scientific fields, including signal processing and analytical chemistry. While most methods in signal processing are based on information-theoretic criteria, in this talk we'll describe a novel nonparametric estimation method based on a sequence of hypothesis tests. The proposed method uses the eigenvalues of the sample covariance matrix, and combines a matrix perturbation approach with recent results from random matrix theory regarding the behaviour of noise eigenvalues. We'll present the theoretical derivation of the method, and an analysis of its consistency and limit of detection. As we'll show in simulations, under a wide range of conditions our method compares favourably with other common methods. Joint work with Shira Kritchman (Weizmann). 

SCH 
21st April 2008 11:00 to 12:00 
Spectra and generalisation The talk briefly reviews generalisation bounds for Support Vector Machines and poses the question of whether the spectrum of the empirical covariance matrix can be used to improve the quality of the bounds. Early results in this direction are surveyed before introducing a recent bound on the number of dichotomies of a graph in terms of the spectrum of the graph Laplacian. This result gives a bound on transductive algorithms that minimise the cut size of the classification. The result is then generalised to other bilinear forms and hence applied to Support Vector Classification. In order to obtain an inductive bound the eigenvalues of the true covariance must be estimated from those of a sample covariance matrix. Possible improvements in the quality of the bound are discussed. 

SCH 
22nd April 2008 11:00 to 12:00 
Empirical likelihood with a growing number of parameters  
SCH 
24th April 2008 11:00 to 12:00 
A Bayesian reassessment of nearest-neighbour classification The k-nearest-neighbour procedure is a well-known deterministic method used in supervised classification. This paper proposes a reassessment of this approach as a statistical technique derived from a proper probabilistic model; in particular, we modify the assessment made in a previous analysis of this method undertaken by Holmes & Adams (2002, 2003), and evaluated by Manocha & Girolami (2007), where the underlying probabilistic model is not completely well-defined. Once a clear probabilistic basis for the $k$-nearest-neighbour procedure is established, we derive computational tools for conducting Bayesian inference on the parameters of the corresponding model. In particular, we assess the difficulties inherent to pseudo-likelihood and to path sampling approximations of an intractable normalising constant, and propose a perfect sampling strategy to implement a correct MCMC sampler associated with our model. If perfect sampling is not available, we suggest using a Gibbs sampling approximation. Illustrations of the performance of the corresponding Bayesian classifier are provided for several benchmark datasets, demonstrating in particular the limitations of the pseudo-likelihood approximation in this setup. [Joint work with L. Cucala, J.M. Marin, and D.M. Titterington] 
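The deterministic k-nearest-neighbour baseline that the talk reassesses can be sketched in a few lines (toy data and names are illustrative; the Bayesian treatment replaces this majority vote with inference under a probabilistic model):

```python
import math
from collections import Counter

def knn_classify(x, data, k):
    # deterministic k-nearest-neighbour majority vote;
    # data is a list of (point, label) pairs
    neighbours = sorted(data, key=lambda d: math.dist(x, d[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify((0.2, 0.2), train, k=3))  # vote of the 3 nearest points
```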

SCH 
29th April 2008 11:00 to 12:00 
Making the sky searchable: large scale astronomical pattern recognition  
SCH 
30th April 2008 11:00 to 12:00 
Testing for sparse normal means: is there a signal? Donoho and Jin (2004), following work of Ingster (1999), studied the problem of testing for a signal in a sparse normal means model and showed that there is a "detection boundary" above which the signal can be detected and below which no test has any power. They showed that Tukey's "higher criticism" statistic achieves the detection boundary. I will introduce a new family of test statistics based on phi-divergences (indexed by a real number s with values between 1 and 2), which all achieve the Donoho-Jin-Ingster detection boundary. I will also review recent work on estimating the proportion of nonzero means. 
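The higher criticism statistic is straightforward to compute from the ordered p-values; the sketch below uses one common form of the statistic (the exact normalisation varies across papers, so treat this as illustrative):

```python
import math

def higher_criticism(pvals):
    # Tukey's higher criticism: the maximal standardised discrepancy between
    # the empirical distribution of the p-values and the uniform distribution
    p = sorted(pvals)
    n = len(p)
    return max(math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1 - pi))
               for i, pi in enumerate(p, start=1) if 0 < pi < 1)

null_p = [i / 101 for i in range(1, 101)]               # roughly uniform: no signal
signal_p = [1e-6] * 5 + [i / 96 for i in range(1, 96)]  # a few tiny p-values: sparse signal
print(higher_criticism(null_p) < higher_criticism(signal_p))
```

A few very small p-values, the signature of a sparse signal, inflate the statistic far beyond its value under the null.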

SCH 
30th April 2008 14:00 to 15:00 
Looking at data and models in high-dimensional spaces: (1) Tools and tips for making good plots This session focuses on making static plots for publications utilizing contemporary wisdom on plot design. It includes choice of background and grid lines, color use, and aspect ratio. We'll use R and the package ggplot2, and the web site vischeck for color checks. 

SCH 
6th May 2008 11:00 to 12:00 
Non-asymptotic variable identification via the Lasso and the elastic net The topic of l_1 regularized or Lasso-type estimation has received considerable attention over the past decade. Recent theoretical advances have been mainly concerned with the risk of the estimators and corresponding sparsity oracle inequalities. In this talk we will investigate the quality of the l_1 penalized estimators from a different perspective, shifting the emphasis to non-asymptotic variable selection, which complements the consistent variable selection literature. Our main results are established for regression models, with emphasis on the square and logistic loss. The identification of the tagged SNPs associated with a disease, in genome-wide association studies, provides the principal motivation for this analysis. The performance of the method depends crucially on the choice of the tuning sequence, and we discuss non-asymptotic choices for which we can correctly detect sets of variables associated with the response at any prespecified confidence level. These tuning sequences are different for the two loss functions, but in both cases larger than those required for best risk performance. The stability of the design matrix is another major issue in correct variable selection, especially when the total number of variables exceeds the sample size. A possible solution is provided by further regularization, for instance via an l_1+l_2 or elastic net penalty. We discuss the merits and limitations of this method in the same context as above. 
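For readers unfamiliar with how l_1 penalized estimators are computed, here is a small coordinate-descent sketch of the Lasso under square loss (a standard algorithm, not the tuning-sequence procedure proposed in the talk; data and names are illustrative):

```python
import random

def soft_threshold(z, g):
    return z - g if z > g else z + g if z < -g else 0.0

def lasso_cd(X, y, lam, iters=200):
    # cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual excluding coordinate j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            zj = sum(X[i][j] * r[i] for i in range(n)) / n
            nj = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(zj, lam) / nj
    return b

random.seed(1)
n, p = 60, 5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2 * X[i][0] + random.gauss(0, 0.1) for i in range(n)]  # only variable 0 is active
b = lasso_cd(X, y, lam=0.3)
print(b[0] > 1.0, all(abs(v) < 0.3 for v in b[1:]))  # variable 0 selected, rest shrunk away
```

Note how the choice of lam governs which variables survive: this is the tuning-sequence question the abstract addresses.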

SCH 
7th May 2008 14:00 to 15:00 
Looking at data and models in high-dimensional spaces: (2) How, when and why to use interactive and dynamic graphics This session will be an explanation of graphics for high-dimensional spaces, and ways to calibrate your eyes to recognise structure. We'll also discuss graphics in association with data mining methods, perhaps self-organizing maps, model-based clustering, support vector machines and neural networks. We'll use R, ggobi, and the package rggobi. 

SCH 
8th May 2008 11:00 to 12:00 
M Wegkamp 
Lasso-type classifiers with a reject option We consider the problem of binary classification where one can, for a particular cost, choose not to classify an observation. We present a simple oracle inequality for the excess risk of structural risk minimizers using a generalized lasso penalty. 
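The reject option itself can be illustrated with the plug-in rule that abstains near the decision boundary (a textbook sketch, not Wegkamp's penalized estimator; the threshold 1 - d for rejection cost d < 1/2 is the standard choice):

```python
def classify_with_reject(eta, reject_cost):
    # plug-in Bayes rule with a reject option: given eta = P(Y = 1 | X = x)
    # and a rejection cost d < 1/2, abstain whenever max(eta, 1 - eta) < 1 - d
    if max(eta, 1 - eta) < 1 - reject_cost:
        return "reject"
    return 1 if eta >= 0.5 else 0

print(classify_with_reject(0.55, 0.2))  # near the boundary: abstain
print(classify_with_reject(0.95, 0.2))  # confident: classify as 1
```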

SCH 
13th May 2008 11:00 to 12:00 
A Hero 
Entropic graphs for high-dimensional data analysis A minimal spanning tree (MST) spanning random points has total spanning length that converges to the entropy of the underlying density generating the points. This celebrated result was first established by Beardwood, Halton and Hammersley (1958) and has since been extended to other random Euclidean and non-Euclidean graphs, such as the geodesic MST (GMST) and the k-nearest neighbor graph (kNNG) over a random set of points. Using the BHH theory of random graphs one can construct graph-based estimates of topological properties of a high-dimensional distribution of a data sample. This leads, for example, to a model-free consistent estimator of intrinsic dimension of a data manifold and a high-performance nonparametric anomaly detector. We will illustrate this entropic graph approach for applications including: anomaly detection in Internet traffic; activity detection in a MICA2 wireless network; and intrinsic dimension estimation of image databases. 
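The BHH-style link between spanning length and spread can be seen in a toy experiment: the MST of a sample drawn from a wider (higher-entropy) density is longer. The sketch below uses a naive pure-Python Prim's algorithm; all data are illustrative:

```python
import math, random

def mst_length(points):
    # Prim's algorithm: total Euclidean length of the minimal spanning tree
    in_tree = {0}
    total = 0.0
    while len(in_tree) < len(points):
        j, d = min(
            ((j, min(math.dist(points[j], points[i]) for i in in_tree))
             for j in range(len(points)) if j not in in_tree),
            key=lambda t: t[1])
        in_tree.add(j)
        total += d
    return total

random.seed(0)
tight = [(random.random(), random.random()) for _ in range(50)]           # unit square
spread = [(3 * random.random(), 3 * random.random()) for _ in range(50)]  # wider support
print(mst_length(tight) < mst_length(spread))  # wider support, longer tree
```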

SCH 
14th May 2008 11:00 to 12:00 
Object oriented data analysis Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics. 

SCH 
14th May 2008 14:00 to 15:00 
Looking at data and models in high-dimensional spaces: (3) Determining significance of structure Now we'll look at using permutations and simulation to check for the significance of structure, and to compare with null samples. We'll also discuss reordering methods to better reveal structure. We'll use R, ggobi, and the package rggobi. 

SCH 
15th May 2008 11:00 to 12:00 
Assessing high-dimensional latent variable models Having built a probabilistic model, a natural question is: "what probability does my model assign to the data?". We might fit the model's parameters to avoid having to compute an intractable marginal likelihood. Even then, evaluating a test-set probability with fixed parameters can be difficult. I will discuss recent work on evaluating high-dimensional undirected graphical models and models with many latent variables. This allows direct comparisons of the probabilistic predictions made by graphical models with hundreds of thousands of parameters against simpler alternatives. 

SCH 
19th May 2008 16:40 to 17:10 
Frontiers in applications of data mining  
SCH 
19th May 2008 17:10 to 17:40 
Frontiers in applications of machine learning  
SCH 
19th May 2008 17:40 to 18:30 
Panel discussion  
SCH 
20th May 2008 11:00 to 12:00 
On stratified path sampling of the thermodynamic integral: computing Bayes factors for nonlinear ODE models of biochemical pathways Bayes factors provide a means of objectively ranking a number of plausible statistical models based on their evidential support. Computing Bayes factors is far from straightforward, and methodology based on thermodynamic integration can provide stable estimates of the integrated likelihood. This talk will consider a stratified sampling strategy in estimating the thermodynamic integral and will consider issues such as optimal paths and the variance of the overall estimator. The main application considered will be the computation of Bayes factors for biochemical pathway models based on systems of nonlinear ordinary differential equations (ODEs). A large-scale study of the Extracellular Regulated Kinase (ERK) pathway will be discussed, where recent Small Interfering RNA (siRNA) experimental validation of the predictions made using the computed Bayes factors is presented. 
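The thermodynamic-integration identity behind such estimators, log Z as the integral over the inverse temperature t of the expected log-likelihood under the power posterior p_t proportional to prior times likelihood^t, can be checked exactly on a tiny discrete model (a numerical sketch, not the stratified sampler of the talk; the toy prior and likelihoods are invented):

```python
import math

def thermo_log_evidence(prior, loglik, temps):
    # log Z = integral_0^1 E_{p_t}[log L] dt, where p_t(theta) is
    # proportional to prior(theta) * L(theta)^t; trapezoid rule over temps
    def expected_loglik(t):
        w = [p * math.exp(t * ll) for p, ll in zip(prior, loglik)]
        return sum(wi * ll for wi, ll in zip(w, loglik)) / sum(w)
    es = [expected_loglik(t) for t in temps]
    return sum((temps[k + 1] - temps[k]) * (es[k] + es[k + 1]) / 2
               for k in range(len(temps) - 1))

prior = [0.25] * 4                                    # uniform prior over 4 parameter values
loglik = [math.log(v) for v in (0.1, 0.4, 0.2, 0.3)]  # likelihood of the data under each
exact = math.log(sum(p * math.exp(ll) for p, ll in zip(prior, loglik)))
approx = thermo_log_evidence(prior, loglik, [k / 200 for k in range(201)])
print(abs(approx - exact) < 1e-3)  # the temperature integral recovers the exact log Z
```

In realistic ODE models the inner expectation is intractable and must itself be estimated by MCMC at each temperature, which is where the stratification and path-choice questions of the talk arise.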

SCH 
21st May 2008 11:00 to 12:00 
Slow subspace learning Slow feature learning exploits the intuition that in realistic processes subsequently observed stimuli are likely to have the same interpretation, while independently observed stimuli are likely to be interpreted differently. The talk discusses such a method for stationary, absolutely regular processes taking values in a high-dimensional space. A projection to a low-dimensional subspace is selected from a finite number of observations on the basis of a criterion which rewards data variance and penalizes the variance of the velocity vector. Convergence theorems, error analysis and some experiments are reported. 

SCH 
22nd May 2008 11:00 to 12:00 
Latent variable models of transcriptional regulation The expression of genes as messenger RNA (mRNA) in the cell is regulated by the activity of transcription factor proteins. The measurement of mRNA concentration for essentially all genes can be routinely carried out using high-throughput experimental techniques such as microarrays. It is much less straightforward to measure the concentration of activated transcription factor proteins. Latent variable models have therefore been developed which treat transcription factors as unobserved chemical species whose active concentration and effect can be inferred indirectly from the expression levels of their target genes. We are developing two classes of latent variable models. For small subsystems, e.g. genes controlled by a single transcription factor, we model the process of transcription using ordinary differential equations with the transcription factor's concentration modeled using a Gaussian process prior distribution over functions. For larger systems, with hundreds of transcription factors controlling thousands of genes, we use simple discrete-time or non-temporal linear models. Bayesian methods provide a natural means for inference of transcription factor concentrations and other model parameters of interest. Joint work with Neil Lawrence. 

SCH 
27th May 2008 15:00 to 15:30 
On estimating covariances between many assets with histories of highly variable length  
SCH 
27th May 2008 15:30 to 16:00 
Nonparametric estimation of a log-concave density  
SCH 
27th May 2008 16:00 to 16:30 
Factorial mixture of Gaussians and the marginal independence model  
SCH 
27th May 2008 16:30 to 17:00 
Understanding uncertainty  
SCH 
28th May 2008 11:00 to 12:00 
HG Mueller 
Functional regression and additive models Functional regression analysis aims at situations where predictors or responses in a regression setting include random functions. Early functional linear models were based on the assumption of observing complete random trajectories, while more recent approaches emphasize more realistic settings of repeated noisy measurements, as encountered in longitudinal studies or online data. Recent joint work with Yao on a functional additive model (FAM) will be discussed. FAM has good asymptotic and practical properties and provides desirable flexibility. 

SCH 
29th May 2008 11:00 to 12:00 
N Cristianini 
Learning curves: lessons from statistical machine translation  
SCH 
3rd June 2008 11:00 to 12:00 
On the approximation of quadratic forms and sparse matrix products Thus far, sparse representations have been exploited largely in the context of robustly estimating functions in a noisy environment from a few measurements. In this context, the existence of a basis in which the signal class under consideration is sparse is used to decrease the number of necessary measurements while controlling the approximation error. In this talk, we instead focus on sparse representations of linear operators, with the objective of minimizing the number of operations required to perform basic operations (here, multiplication) on their matrix representations. We employ a representation in terms of sums of rank-one operators, and show how solving a sparse approximation problem akin to model selection for an induced quadratic form in turn guarantees a bounded approximation error for the product of two matrices. Connections to multilinear algebra by way of exterior products in turn yield new randomized algorithms for this and other tasks involving the large matrices and high-dimensional covariance operators that arise in modern statistical practice. (joint work with Mohamed-Ali Belabbas) 

SCH 
5th June 2008 11:00 to 12:00 
Approximation of functional spatial regression models using bivariate splines We consider the functional linear regression model where the explanatory variable is a random surface and the response is a real random variable, with bounded or normal noise. Bivariate splines over triangulations represent the random surfaces. We use this representation to construct least squares estimators of the regression function with or without a penalization term. Under the assumptions that the regressors in the sample are bounded and span a large enough space of functions, bivariate splines approximation properties yield the consistency of the estimators. Simulations demonstrate the quality of the asymptotic properties on a realistic domain. We also carry out an application to ozone forecasting over the US that illustrates the predictive skills of the method. This is joint work with Ming-Jun Lai. 

SCH 
6th June 2008 11:00 to 12:00 
Challenges of regional climate modelling and validation As attention shifts from broad global summaries of climate change to more specific regional results there is a need for statistics to analyze observations and model output that have significant variability and also to quantify the uncertainty in regional projections. This talk will survey some work on interpreting regional climate experiments. In large multimodel studies one challenge is to understand the contributions of different global and regional model combinations to the simulated climate. This is difficult because the individual runs tend to be short in length. Thus one is faced with the paradox of generating massive data sets that still demand statistical analysis to quantify significant features. We suggest some approaches based on functional data analysis that leverage sparse matrix techniques to handle large spatial fields. (Joint work with Cari Kaufman, Stephen Sain and Linda Mearns.) 

SCH 
10th June 2008 11:00 to 12:00 
Sparse recovery in convex hulls based on entropy penalisation  
SCH 
12th June 2008 11:00 to 12:00 
Confidence sets for the optimal approximating model: bridging a gap between adaptive point estimation and confidence regions  
SCH 
17th June 2008 11:00 to 12:00 
JC van Houwelingen 
Global testing of association and/or predictability in regression problems with p>>n predictors
Global testing is an accepted strategy for 'screening' regression problems and controlling the familywise error rate. For p 

SCHW05 
18th June 2008 14:00 to 15:00 
S Godsill 
Sequential inference for dynamically evolving groups of objects In this talk I will describe recent work on tracking for groups of objects. The aim of the process is to infer evolving groupings of moving objects over time, including group affiliations and individual object states. Behaviour of group objects is modelled using interacting multiple object models, in which individuals attempt stochastically to adjust their behaviour to be 'similar' to that of other objects in the same group; this idea is formalised as a multidimensional stochastic differential equation for group object motion. The models are estimated algorithmically using sequential Markov chain Monte Carlo approximations to the filtering distributions over time, allowing for more complex modelling scenarios than the more familiar importance-sampling based Monte Carlo filtering schemes. Examples will be presented from GMTI data trials for multiple vehicle motion. Related Links 

SCHW05 
18th June 2008 15:30 to 16:10 
Y Cai 
A Bayesian method for non-Gaussian autoregressive quantile function time series models Many time series in economics and finance are non-Gaussian. In this paper, we propose a Bayesian approach to non-Gaussian autoregressive quantile function time series models where the scale parameter of the models does not depend on the values of the time series. This approach is parametric, so we also compare the proposed parametric approach with the semiparametric approach (Koenker, 2005). A simulation study and applications to real time series show that the method works very well. 

SCHW05 
18th June 2008 16:10 to 16:50 
X Luo 
State estimation in high-dimensional systems: the method of the ensemble unscented Kalman filter The ensemble Kalman filter (EnKF) is a Monte Carlo implementation of the Kalman filter, which is often adopted to reduce the computational cost when dealing with high-dimensional systems. In this work, we propose a new EnKF scheme based on the concept of the unscented transform, which therefore will be called the ensemble unscented Kalman filter (EnUKF). Under the assumption of Gaussian distribution of the estimation errors, it can be shown analytically that the EnUKF can achieve more accurate estimations of the ensemble mean and covariance than the ordinary EnKF. Therefore, incorporating the unscented transform into an EnKF may benefit its performance. Numerical experiments conducted on a $40$-dimensional system support this argument. 
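A scalar analysis step of the ordinary EnKF, the baseline that the EnUKF improves on, can be sketched with the standard perturbed-observation update (all numbers and names are illustrative):

```python
import random

def enkf_update(ensemble, obs, obs_var):
    # scalar EnKF analysis step with perturbed observations:
    # each member is nudged toward the observation by the Kalman gain
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = var / (var + obs_var)
    return [x + gain * (obs + random.gauss(0, obs_var ** 0.5) - x) for x in ensemble]

random.seed(2)
prior = [random.gauss(0.0, 1.0) for _ in range(500)]  # forecast ensemble around 0
posterior = enkf_update(prior, obs=2.0, obs_var=0.5)
post_mean = sum(posterior) / len(posterior)
post_var = sum((x - post_mean) ** 2 for x in posterior) / (len(posterior) - 1)
print(post_mean > 0.5, post_var < 1.0)  # pulled toward the observation, spread reduced
```

The unscented variant replaces the purely random ensemble with deterministically chosen sigma points to better capture the mean and covariance.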

SCHW05 
18th June 2008 16:50 to 17:30 
A modern perspective on auxiliary particle filters The auxiliary particle filter (APF) is a popular algorithm for the Monte Carlo approximation of the optimal filtering equations of state space models. This talk presents a summary of several recent developments which affect the practical implementation of this algorithm as well as simplifying its theoretical analysis. In particular, an interpretation of the APF, which makes use of an auxiliary sequence of distributions, allows the approach to be extended to more general Sequential Monte Carlo algorithms. The same interpretation allows existing theoretical results for standard particle filters to be applied directly. Several nonstandard implementations and applications will also be discussed. 
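For context, the standard bootstrap particle filter that the APF refines can be sketched for a toy random-walk state space model (the model, names and parameters here are illustrative, not from the talk):

```python
import math, random

def bootstrap_filter(obs, n, trans_sd, obs_sd):
    # bootstrap particle filter for the toy model
    #   x_t = x_{t-1} + N(0, trans_sd^2),   y_t = x_t + N(0, obs_sd^2)
    parts = [random.gauss(0, 1) for _ in range(n)]
    means = []
    for y in obs:
        parts = [x + random.gauss(0, trans_sd) for x in parts]          # propagate
        w = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in parts]   # weight
        means.append(sum(x * wi for x, wi in zip(parts, w)) / sum(w))   # filtered mean
        parts = random.choices(parts, weights=w, k=n)                   # resample
    return means

random.seed(4)
truth = [0.5 * t for t in range(10)]                 # slowly drifting latent state
obs = [x + random.gauss(0, 0.3) for x in truth]
est = bootstrap_filter(obs, n=2000, trans_sd=0.7, obs_sd=0.3)
print(abs(est[-1] - truth[-1]) < 1.0)  # the filtered mean tracks the state
```

The auxiliary particle filter modifies this scheme by pre-weighting particles with a look-ahead to the next observation before propagation.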

SCHW05 
19th June 2008 09:00 to 09:40 
VA Reisen 
Estimating multiple fractional seasonal long-memory parameters This paper explores seasonal and long-memory time series properties by using the seasonal fractional ARIMA model when the seasonal data has two seasonal periods, namely s1 and s2. The stationarity and invertibility parameter conditions are established for the model studied. To estimate the memory parameters, the method given in Reisen, Rodrigues and Palma (2006a,b), which is a variant of the technique proposed in Geweke and Porter-Hudak (1983) (GPH), is generalized here to deal with a time series with multiple seasonal fractional long-memory parameters. The accuracy of the method is investigated through Monte Carlo experiments, and the good performance of the estimator indicates that it can be an alternative procedure for estimating seasonal and cyclical long-memory time series data. 
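The classical (non-seasonal) GPH estimator that this work generalises is a least-squares regression of the log-periodogram on a function of frequency near the origin; the slope estimates the memory parameter d. A minimal sketch, with the common sqrt(n) bandwidth as an assumed default:

```python
import numpy as np

def gph_estimate(x, m=None):
    """Geweke-Porter-Hudak log-periodogram regression estimate of the memory parameter d."""
    n = len(x)
    if m is None:
        m = int(np.sqrt(n))                          # common bandwidth choice
    freqs = 2 * np.pi * np.arange(1, m + 1) / n      # first m Fourier frequencies
    dft = np.fft.fft(x - np.mean(x))
    I = np.abs(dft[1:m + 1]) ** 2 / (2 * np.pi * n)  # periodogram ordinates
    y = np.log(I)
    z = -2 * np.log(2 * np.sin(freqs / 2))           # regressor; slope of y on z is d-hat
    z_c = z - z.mean()
    return z_c @ (y - y.mean()) / (z_c @ z_c)        # OLS slope
```

On white noise (d = 0) the estimate should be close to zero.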

SCHW05 
19th June 2008 09:40 to 10:20 
Y Shen 
Variational Markov chain Monte Carlo for inference in partially observed stochastic dynamic systems In this paper, we develop a set of novel Markov chain Monte Carlo algorithms for Bayesian inference in partially observed nonlinear diffusion processes. The Markov chain Monte Carlo algorithms we develop herein use an approximating distribution to the true posterior as the proposal distribution for an independence sampler. The approximating distribution utilises the posterior approximation computed using the recently developed variational Gaussian process approximation method. Flexible blocking strategies are then introduced to further improve the mixing, and thus the efficiency, of the Markov chain Monte Carlo algorithms. The algorithms are tested on two cases of a double-well potential system. It is shown that the blocked versions of the variational sampling algorithms outperform Hybrid Monte Carlo sampling in terms of computational efficiency, except for cases where multimodal structure is present in the posterior distribution. 

SCHW05 
19th June 2008 10:20 to 11:00 
Two problems with variational expectation maximisation for time-series models Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground, retaining distributional information about uncertainty in latent variables, unlike maximum a posteriori (MAP) methods, and yet requiring fewer computational resources than Markov chain Monte Carlo methods. In particular the variational expectation maximisation (vEM) and variational Bayes algorithms, both involving variational optimisation of a free energy, are widely used in time-series modelling. Here, we investigate the success of vEM in simple probabilistic time-series models. First we consider the inference step of vEM, and show that a consequence of the well-known compactness property is a failure to propagate uncertainty in time, thus limiting the usefulness of the retained distributional information. In particular, the uncertainty may appear to be smallest precisely when the approximation is poorest. Second, we consider parameter learning and analytically reveal systematic biases in the parameters found by vEM. Surprisingly, simpler variational approximations (such as mean-field) can lead to less bias than more complicated structured approximations.


SCHW05 
19th June 2008 11:30 to 12:30 
M Opper 
Approximate Inference for Continuous Time Markov Processes Continuous time Markov processes (such as jump processes and diffusions) play an important role in the modelling of dynamical systems in many scientific areas. In a variety of applications, the stochastic state of the system as a function of time is not directly observed: one only has access to a set of noisy observations taken at a discrete set of times. The problem is then to infer the unknown state path as well as possible. In addition, model parameters (like diffusion constants or transition rates) may also be unknown and have to be estimated from the data. While it is fairly straightforward to present a theoretical solution to these estimation problems, a practical solution in terms of PDEs or by Monte Carlo sampling can be time consuming, and one looks for efficient approximations. I will discuss approximate solutions to this problem such as variational approximations to the probability measure over paths and weak noise expansions. 

SCHW05 
19th June 2008 14:00 to 15:00 
Recent applications of spatial point processes to multipleobject tracking
The point process framework is natural for the multiple-object tracking problem and is increasingly playing a central role in the derivation of new inference schemes. Interest in this framework is largely due to Ronald Mahler's derivation of a filter that propagates the first moment of a Markov-in-time spatial point process observed in noise. Since then there have been several extensions to this result, with accompanying numerical implementations based on Sequential Monte Carlo. These results will be presented.


SCHW05 
19th June 2008 15:20 to 16:00 
Multi-object tracking with representations of the symmetric group We present a framework for maintaining and updating a time-varying distribution over permutations matching tracks to real world objects. Our approach hinges on two insights from the theory of harmonic analysis on non-commutative groups. The first is that it is sufficient to maintain certain low frequency Fourier components of this distribution. The second is that marginals and observation updates can be efficiently computed from such components by extensions of Clausen's FFT for the symmetric group.


SCHW05 
19th June 2008 16:00 to 17:00 
C Williams 
Factorial switching linear dynamical systems for physiological condition monitoring Condition monitoring often involves the analysis of measurements taken from a system which "switches" between different modes of operation in some way. Given a sequence of observations, the task is to infer which possible condition (or "switch setting") of the system is most likely at each time frame. In this paper we describe the use of factorial switching linear dynamical models for such problems. A particular advantage of this construction is that it provides a framework in which domain knowledge about the system being analysed can easily be incorporated. We demonstrate the flexibility of this type of model by applying it to the problem of monitoring the condition of a premature baby receiving intensive care. The state of health of a baby cannot be observed directly, but different underlying factors are associated with particular patterns of measurements, e.g. in the heart rate, blood pressure and temperature. We use the model to infer the presence of two different types of factors: common, recognisable regimes (e.g. certain artifacts or common physiological phenomena), and novel patterns which are clinically significant but have unknown cause. Experimental results are given which show the developed methods to be effective on real intensive care unit monitoring data. Joint work with John Quinn and Neil McIntosh.


SCHW05 
19th June 2008 17:00 to 17:30 
Bayesian Gaussian process models for multisensor timeseries prediction We propose a powerful prediction algorithm built upon Gaussian processes (GPs). They are particularly useful for their flexibility, facilitating accurate prediction even in the absence of strong physical models. GPs further allow us to work within a completely Bayesian framework. As such, we show how the hyperparameters of our system can be marginalised by use of Bayesian Monte Carlo, a principled method of approximate integration. We employ the error bars of the GP's prediction as a means to select only the most informative observations to store. This allows us to introduce an iterative formulation of the GP to give a dynamic, online algorithm. We also show how our error bars can be used to perform active data selection, allowing the GP to select where and when it should next take a measurement. We demonstrate how our methods can be applied to multisensor prediction problems where data may be missing, delayed and/or correlated. In particular, we present a real network of weather sensors as a testbed for our algorithm. 
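The core machinery this talk builds on can be sketched as standard GP regression, whose posterior variance supplies the "error bars" used for active data selection. The squared-exponential kernel and the hyperparameter values below are illustrative assumptions, not the authors' exact model:

```python
import numpy as np

def gp_predict(X, y, Xs, ell=1.0, sf=1.0, noise=0.1):
    """GP regression posterior mean and variance at test inputs Xs
    (1-d inputs, squared-exponential covariance)."""
    def k(A, B):
        return sf ** 2 * np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell ** 2)
    K = k(X, X) + noise ** 2 * np.eye(len(X))   # training covariance plus noise
    Ks = k(Xs, X)                               # cross-covariance, shape (n_test, n_train)
    mean = Ks @ np.linalg.solve(K, y)           # posterior mean
    v = np.linalg.solve(K, Ks.T)
    var = sf ** 2 - np.sum(Ks * v.T, axis=1)    # posterior variance ("error bars")
    return mean, var
```

At a training input with small noise, the prediction nearly interpolates the observation and the error bar shrinks, which is what drives the informative-observation selection described above.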

SCHW05 
20th June 2008 09:00 to 09:40 
GJ McLachlan 
Clustering of time course geneexpression data via mixture regression models In this paper, we consider the use of mixtures of linear mixed models to cluster data which may be correlated and replicated and which may have covariates. This approach can thus be used to cluster time series data. For each cluster, a regression model is adopted to incorporate the covariates, and the correlation and replication structure in the data are specified by the inclusion of random effects terms. The procedure is illustrated in its application to the clustering of timecourse gene expression data. 

SCHW05 
20th June 2008 09:40 to 10:20 
Markov chain Monte Carlo algorithms for Gaussian processes We discuss Markov chain Monte Carlo algorithms for sampling functions in Gaussian process models. A first algorithm is a local sampler that iteratively samples each local part of the function by conditioning on the remaining part of the function. The partitioning of the domain of the function into regions is automatically carried out during the burnin sampling phase. A more advanced algorithm uses control variables which are auxiliary function values that summarize the properties of the function. At each iteration, the algorithm proposes new values for the control variables and then generates the function from the conditional Gaussian process prior. The control input locations are found by minimizing the total variance of the conditional prior. We apply these algorithms to estimate nonlinear differential equations in Systems Biology. 

SCHW05 
20th June 2008 10:20 to 11:00 
Is that really the pattern we're looking for? Bridging the gap between statistical uncertainty and dynamic programming algorithms Two approaches to statistical pattern detection, when using hidden or latent variable models, are to use either dynamic programming algorithms or Monte Carlo simulations. The first produces the most likely underlying sequence from which patterns can be detected but gives no quantification of the error, while the second allows quantification of the error but is only approximate due to sampling error. This paper describes a method to determine the statistical distributions of patterns in the underlying sequence without sampling error in an efficient manner. This approach allows the incorporation of restrictions about the kinds of patterns that are of interest directly into the inference framework, and thus facilitates a true consideration of the uncertainty in pattern detection. 
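The dynamic-programming side of this trade-off is the Viterbi algorithm, which returns a single most likely state sequence with no attached uncertainty. A minimal sketch on log-probabilities (the toy transition and emission values in the usage are assumptions for illustration):

```python
import numpy as np

def viterbi(logA, logB, logpi):
    """Most likely hidden state path by dynamic programming.
    logA[i, j]: log transition i->j; logB[t, s]: log p(y_t | state s)."""
    T, S = logB.shape
    delta = logpi + logB[0]                      # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)           # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA           # scores[i, j]: best path via i into j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # trace the backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

On a two-state chain with state-revealing emissions, the recovered path follows the observations, but nothing in the output quantifies how confident that reconstruction is.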

SCHW05 
20th June 2008 11:30 to 12:30 
E Moulines 
Adaptive Monte Carlo Markov Chains In this talk, we present in a common unifying framework several adaptive Markov chain Monte Carlo (MCMC) algorithms that have been recently proposed in the literature. We prove that under a set of verifiable conditions, ergodic averages calculated from the output of a so-called adaptive MCMC sampler converge to the required value and can even, under more stringent assumptions, satisfy a central limit theorem. We prove that the required conditions are satisfied for the Independent Metropolis-Hastings algorithm and the Random Walk Metropolis algorithm with symmetric increments. Finally we propose an application of these results to the case where the proposal distribution of the Metropolis-Hastings update is a mixture of distributions from a curved exponential family. Several illustrations will be provided. 
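A concrete instance of the diminishing-adaptation schemes analysed here is a random-walk Metropolis sampler whose proposal scale is tuned by a Robbins-Monro recursion toward a target acceptance rate (0.44 is a common one-dimensional target). This is a generic illustration, not one of the specific algorithms from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

def adaptive_rwm(logpi, x0=0.0, n=20000, target=0.44):
    """Random-walk Metropolis with a diminishing Robbins-Monro adaptation
    of the proposal scale; the step size decays so adaptation vanishes."""
    x, log_s = x0, 0.0
    samples = np.empty(n)
    for t in range(n):
        prop = x + np.exp(log_s) * rng.normal()
        alpha = min(1.0, np.exp(logpi(prop) - logpi(x)))   # acceptance probability
        if rng.random() < alpha:
            x = prop
        log_s += (alpha - target) / (t + 1) ** 0.6         # Robbins-Monro update
        samples[t] = x
    return samples, np.exp(log_s)
```

Targeting a standard normal, the post-burn-in samples recover its mean and standard deviation.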

SCHW05 
20th June 2008 14:00 to 15:00 
O Papaspiliopoulos 
A methodological framework for Monte Carlo estimation of continuous-time processes In this talk I will review a methodological framework for the estimation of partially observed continuous-time processes using Monte Carlo methods. I will present different types of data structures and frequency regimes, and will focus on unbiased (with respect to discretization errors) Monte Carlo methods for parameter estimation and particle filtering of continuous-time processes. An important component of the methodology is the Poisson estimator, and I will discuss some of its properties. I will also present some results on parameter estimation using variations of the smooth particle filter which exploit the graphical model structure inherent in partially observed continuous-time Markov processes. 

SCHW05 
20th June 2008 15:30 to 16:10 
High frequency variability and microstructure bias
Microstructure noise can substantially bias the estimation of the volatility of an Ito process. Such noise is inherently multiscale, causing eventual inconsistency in estimation as the sampling rate becomes more frequent. Methods have been proposed to remove this bias using subsampling mechanisms. We instead take a frequency domain approach and advocate learning the degree of contamination from the data. The volatility can be seen as an aggregation of contributions from many different frequencies. Having learned the degree of contamination allows us to correct these contributions frequency by frequency and calculate a bias-corrected estimator. This procedure is fast, robust to different signal-to-microstructure scenarios, and is also extended to the problem of correlated microstructure noise. Theory can be developed as long as the Ito process has harmonizable increments and suitable dynamic spectral range.
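The bias at high sampling frequency is easy to reproduce numerically: with additive i.i.d. noise of variance v, the realized variance of n noisy increments inflates by roughly 2nv, while sparse subsampling keeps the bias small. The simulated price path and noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def realized_variance(prices, every=1):
    """Sum of squared returns at a chosen sampling interval."""
    r = np.diff(prices[::every])
    return float(np.sum(r ** 2))

n = 23400                                          # one "day" of per-second prices
sigma = 0.02 / np.sqrt(n)                          # 2% daily volatility per tick
efficient = np.cumsum(rng.normal(0.0, sigma, n))   # latent efficient log-price
observed = efficient + rng.normal(0.0, 5e-4, n)    # add microstructure noise
rv_tick = realized_variance(observed)              # every tick: noise-dominated
rv_5min = realized_variance(observed, every=300)   # 5-minute sampling: mild bias
```

Here the true integrated variance is 0.0004; `rv_tick` should land near the noise-driven level of roughly 2n(5e-4)^2, an order of magnitude too large, while `rv_5min` stays close to the truth.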


SCHW05 
20th June 2008 16:10 to 17:10 
Nonparametric Bayesian time series models: infinite HMMs and beyond Hidden Markov models (HMMs) are one of the most widely used statistical models for time series. Traditionally, HMMs have a known structure with a fixed number of states and are trained using maximum likelihood techniques. The infinite HMM (iHMM) allows a potentially unbounded number of hidden states, letting the model use as many states as it needs for the data (Beal, Ghahramani and Rasmussen 2002). Teh, Jordan, Beal and Blei (2006) showed that a form of the iHMM could be derived from the Hierarchical Dirichlet Process, and described a Gibbs sampling algorithm based on this for the iHMM. I will talk about recent work we have done on infinite HMMs. In particular: we now have a much more efficient inference algorithm based on dynamic programming, called 'Beam Sampling', which should make it possible to apply iHMMs to larger problems. We have also developed a factorial version of the iHMM which makes it possible to have an unbounded number of binary state variables, and can be thought of as a time-series generalization of the Indian buffet process. Joint work with Jurgen van Gael (Cambridge), Yunus Saatci (Cambridge) and Yee Whye Teh (Gatsby Unit, UCL).


SCHW03 
23rd June 2008 10:00 to 11:00 
Variable selection in very high dimensional regression and classification  
SCHW03 
23rd June 2008 11:30 to 12:30 
Dimension reduction Ursula Gather (joint work with Charlotte Guddat) Progress in computer science in recent decades has led to 'floods of data', which can be stored and must be handled to extract the information of interest therein. As an example, consider data from the field of genetics, where the dimension may increase to values up in the thousands. Classical statistical tools are not able to cope with this situation. Hence, a number of dimension reduction procedures have been developed which may be applied when considering nonparametric regression. The aim is to find a subspace of the predictor space which is of much lower dimension but still contains the important information on the relation between response and predictors. We will review a number of procedures for dimension reduction (e.g. SIR, SAVE) in multiple regression and consider them under robustness aspects as well. As a special case we include methods for variable selection (e.g. EARTH, SIS) and introduce a new robust approach for the case when n is much smaller than p. 

SCHW03 
23rd June 2008 14:00 to 15:00 
Stability-based regularisation The properties of L1-penalized regression have been examined in detail in recent years. I will review some of the developments for sparse high-dimensional data, where the number of variables p is potentially very much larger than the sample size n. The necessary conditions for convergence are less restrictive if looking for convergence in L2-norm than if looking for convergence in L0-quasi-norm. I will discuss some implications of these results. These promising theoretical developments notwithstanding, it is unfortunately often observed in practice that solutions are highly unstable: if running the same model selection procedure on a new set of samples, or indeed a subsample, results can change drastically. The choice of the proper regularization parameter is also not obvious in practice, especially if one is primarily interested in structure estimation and only secondarily in prediction. Some preliminary results suggest, though, that the stability or instability of results is informative when looking for suitable data-adaptive regularization. 
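The instability under subsampling can be quantified directly: refit a selector on many random half-subsamples and record how often each variable is chosen. The correlation-screening selector below is a deliberately simple stand-in for the L1-penalized fits discussed in the talk:

```python
import numpy as np

rng = np.random.default_rng(3)

def select(X, y, k=5):
    """Toy selector: the k variables most correlated (in absolute value) with y."""
    c = np.abs((X - X.mean(0)).T @ (y - y.mean()))
    return set(np.argsort(c)[-k:])

def selection_frequencies(X, y, n_sub=50, frac=0.5, k=5):
    """Refit the selector on random half-subsamples and count how often
    each variable is chosen -- a crude stability diagnostic."""
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, int(frac * n), replace=False)
        for j in select(X[idx], y[idx], k):
            counts[j] += 1
    return counts / n_sub
```

Strong signal variables are selected in essentially every subsample, while unstable choices show intermediate frequencies.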

SCHW03 
23rd June 2008 15:30 to 16:30 
T Cai 
Large-scale multiple testing: finding needles in a haystack Due to advances in technology, it has become increasingly common in scientific investigations to collect vast amounts of data with complex structures. Examples include microarray studies, fMRI analysis, and astronomical surveys. The analysis of these data sets poses many statistical challenges not present in smaller scale studies. In these studies, it is often required to test thousands and even millions of hypotheses simultaneously. Conventional multiple testing procedures are based on thresholding the ordered p-values. In this talk, we consider large-scale multiple testing from a compound decision theoretic point of view by treating it as a constrained optimization problem. The solution to this optimization problem yields an oracle procedure. A data-driven procedure is then constructed to mimic the performance of the oracle and is shown to be asymptotically optimal. In particular, the results show that, although the p-value is appropriate for testing a single hypothesis, it fails to serve as the fundamental building block in large-scale multiple testing. Time permitting, I will also discuss simultaneous testing of grouped hypotheses. This is joint work with Wenguang Sun (University of Pennsylvania).
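The conventional baseline the talk starts from, thresholding the ordered p-values, is exemplified by the Benjamini-Hochberg step-up procedure. A minimal sketch (the FDR level `q` is a free parameter):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Benjamini-Hochberg step-up: reject the k smallest p-values,
    where k = max{ i : p_(i) <= q*i/m }."""
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = np.asarray(pvals)[order]
    below = np.nonzero(sorted_p <= q * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        k = below[-1]                    # largest index passing its threshold
        reject[order[:k + 1]] = True     # reject all smaller p-values too
    return reject
```

On the six p-values (0.001, 0.008, 0.039, 0.041, 0.2, 0.9) at q = 0.1, the step-up rule rejects the four smallest, including 0.041, which exceeds its own per-test threshold but is swept in by the step-up.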


SCHW03 
24th June 2008 09:00 to 10:00 
Fitting survival models with P>>n predictors: beyond proportional hazards In a recent paper by Bovelstad et al. [1], partial likelihood ridge regression as used in [2] turned out to be the most successful approach to predicting survival with gene expression data. However, the proportional hazards model used in these papers is quite simple and might not be realistic if there is a long survival follow-up. Exploring the fit of the model by using a cross-validated prognostic index leads to the conclusion that the effect of the predictor derived in [2] is neither linear nor constant over time. We will discuss penalized reduced rank models as a way to obtain robust extensions of the Cox model for this type of data. For time-varying effects the reduced rank model of [3] can be employed, while nonlinear effects can be introduced by means of bilinear terms. The predictive performance of such models can be regulated by penalization in combination with cross-validation.
References
[1] Bovelstad, HM; Nygard, S; Storvold, HL; et al. Predicting survival from microarray data - a comparative study. Bioinformatics, 23 (16): 2080-2087, 2007.
[2] van Houwelingen, HC; Bruinsma, T; Hart, AAM; et al. Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine, 25 (18): 3201-3216, 2006.
[3] Perperoglou, A; le Cessie, S; van Houwelingen, HC. Reduced-rank hazard regression for modeling nonproportional hazards. Statistics in Medicine, 25 (16): 2831-2845, 2006. 

SCHW03 
24th June 2008 10:00 to 11:00 
Model selection and estimation with multiple reproducing kernel Hilbert spaces In this talk, we consider the problem of learning a target function that belongs to the linear span of a large number of reproducing kernel Hilbert spaces. Such a problem arises naturally in many practical situations, with the ANOVA model, the additive model and multiple kernel learning as the most well known and important examples. We investigate approaches based on l1-type complexity regularization. We study the theoretical properties from both the variable selection and estimation perspectives. We establish several probabilistic inequalities providing bounds on the excess risk and L2-error that depend on the sparsity of the problem. (Parts of the talk are based on joint work with Vladimir Koltchinskii.) 

SCHW03 
24th June 2008 11:30 to 12:30 
A Tsybakov 
Sparsity oracle inequalities
The quality of solving several statistical problems, such as adaptive nonparametric estimation, aggregation of estimators, estimation under the sparsity scenario and weak learning, can be assessed in terms of sparsity oracle inequalities (SOI) for the prediction risk. One of the challenges is to build estimators that attain the sharpest SOI under minimal assumptions on the dictionary. Methods of sparse estimation are mainly of two types. Some of them, like the BIC, enjoy nice theoretical properties in terms of SOI without any assumption on the dictionary but are computationally infeasible starting from relatively modest dimensions p. Others, like the Lasso or the Dantzig selector, can be easily realized for very large p, but their theoretical performance is conditioned by severe restrictions on the dictionary. We will focus on Sparse Exponential Weighting, a new method of sparse recovery realizing a compromise between theoretical properties and computational efficiency. The theoretical performance of the method in terms of SOI is comparable with that of the BIC. No assumption on the dictionary is required. At the same time, the method is computationally feasible for relatively large dimensions p. It is constructed using an exponential weighting with suitably chosen priors, and its analysis is based on the PAC-Bayesian ideas in statistical learning.


SCHW03 
24th June 2008 14:00 to 14:20 
The exchangeable graph model for statistical network analysis Observations consisting of measurements on pairs of objects (or conditions) arise in a number of settings in the biological sciences (www.yeastgenome.org), with collections of scientific publications (www.jstor.org) and other hyperlinked resources (www.wikipedia.org), and in social networks (www.linkedin.com). Analyses of such data typically aim at identifying structure among the units of interest, in a low dimensional space, to support the generation of substantive hypotheses, to partially automate semantic categorization, to facilitate browsing, and to simplify complex data into useful patterns, more in general. In this talk we introduce the exchangeable graph model and show its utility: 1. as a quantitative tool for exploring static/dynamic networks; 2. as a new paradigm for theoretical analyses of graph connectivity. Within this modeling context, we discuss alternative specifications and extensions that address fundamental issues in data analysis of complex interacting systems: bridging global and local phenomena, data integration, dynamics, and scalable inference. 

SCHW03 
24th June 2008 14:20 to 14:40 
M West 
Data, models, inference and computation for dynamic cellular networks in systems biology Advances in bioengineering technologies are generating the ability to measure increasingly highresolution, dynamic data on complex cellular networks at multiple biological and temporal scales. Singlecell molecular studies, in which data is generated on the levels of expression of a small number of proteins within individual cells over time using timelapse fluorescent microscopy, is one critical emerging area. Single cell experiments have potential to develop centrally in both mechanistic studies of natural biological systems as well as via synthetic biology  the latter involving engineering of small cellular networks with welldefined function, so providing opportunity for controlled experimentation and bionetwork design. There is a substantial lag, however, in the ability to integrate, understand and utilize data generated from singlecell fluorescent microscopy studies. I will highlight aspects of this area from the perspective of our work in single cell studies in synthetic bacterial systems that emulate key aspects of mammalian gene networks central to all human cancers. I will touch on: (a) DATA: Raw data come as movies of colonies of cells developing through time, with a need for imaging methods to estimate cellspecific levels of fluorescence measuring mRNA levels of one or several tagged genes within each cell. This is complicated by the progression of cells through multiple cell divisions that raises questions of tracking the lineages of individual cells over time. (b) MODELS: In the context of our synthetic gene networks engineered into bacterial cells, we have developed discretetime statistical dynamic models inspired by basic biochemical network modelling of the stochastic regulatory gene network. 
These models allow the incorporation of multiple components of noise that is "intrinsic" to biological networks, as well as approximation and measurement errors, and provide the opportunity to formally evaluate the capacity of single cell data to inform on biochemical parameters and "recover" network structure in contexts of contaminating noise. (c) INFERENCE & COMPUTATION: Our approaches to model fitting have developed Bayesian methods for inference in nonlinear time series. This involves MCMC methods that impute parameter values, coupled with novel, effective Metropolis methods for what can be very high-dimensional latent states representing the unobserved levels of mRNA or proteins on nodes in the network, as well as contributions from "missing" nodes. This work is collaborative with Jarad Niemi and Quanli Wang (Statistical Science at Duke), Lingchong You and CheeMeng Tan (Bioengineering at Duke). 

SCHW03 
24th June 2008 14:40 to 15:00 
Statistical network analysis and inference: methods and applications Exploring the statistical properties and hidden characteristics of network entities, and the stochastic processes behind temporal evolution of network topologies, are essential for computational knowledge discovery and prediction based on network data from biology, social sciences and various other fields. In this talk, I first discuss a hierarchical Bayesian framework that combines the mixed membership model and the stochastic blockmodel for inferring latent multifacet roles of nodes in networks, and for estimating stochastic relationships (i.e., cooperativeness or antagonisms) between roles. Then I discuss a new formalism for modeling network evolution over time based on temporal exponential random graphs (TERGM), and a MCMC algorithm for posterior inference of the latent timespecific networks. The proposed methodology makes it possible to reverseengineer the latent sequence of temporally rewiring networks given longitudinal measurements of node attributes, such as intensities of gene expressions or social metrics of actors, even when a single snapshot of such measurement resulted from each (timespecific) network is available. Joint with Edo Airoldi, Dave Blei, Steve Fienberg, Fan Guo and Steve Hanneke 

SCHW03 
24th June 2008 15:30 to 16:30 
High dimensional inference in bioinformatics and genomics Bioinformatics came to the scene when biology started to automate its experiments. Although this would have led to large n and small p situations in other sciences, the complex nature of biology meant that it soon started to focus on lots of different variables, resulting in now wellknown small n, large p situations. One such case is the inference of regulatory networks: the amount of networks is exponential in the number of nodes, whereas the available data is typically just a fraction thereof. We will present a penalized inference method that deals with such problems, that draws on experience with hypothesis testing. It has similarities with Approximate Bayesian Computation and seems to lead to exact inference in a few specific cases. 

SCHW03 
24th June 2008 16:30 to 17:30 
Liquid association for large scale gene expression and network studies The fast-growing public repertoire of microarray gene expression databases provides individual investigators with unprecedented opportunities to study transcriptional activities for genes of their research interest at no additional cost. Methods such as hierarchical clustering, principal component analysis, gene networks and others have been widely used. They offer biologists valuable genome-wide portraits of how genes are co-regulated in groups. Such approaches have a limitation because it often turns out that the majority of genes do not fall into the detected gene clusters. If one has a gene of primary interest in mind and cannot find any nearby clusters, what additional analysis can be conducted? In this talk, I will show how to address this issue via the statistical notion of liquid association. An online biodata mining system has been developed in my lab to aid biologists in distilling information from a web of aggregated genomic knowledge bases and data sources at multiple levels, including gene ontology, protein complexes, genetic markers and drug sensitivity. The computational issue of liquid association and the challenges faced in the context of high p, low n problems will be addressed. 

SCHW03 
25th June 2008 09:00 to 10:00 
R Tibshirani 
The Lasso: some novel algorithms and applications I will discuss some procedures for modelling highdimensional data, based on L1 (lasso) style penalties. I will describe pathwise coordinate descent algorithms for the lasso, which are remarkably fast and facilitate application of the methods to very large datasets for the first time. I will then give examples of new applications of the methods to microarray classification, undirected graphical models for cell pathways, and the fused lasso for signal detection, including comparative genomic hybridization. 
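The pathwise coordinate descent update mentioned here has a closed form: cycling through coordinates, each update is a univariate soft-thresholding step against the partial residual. A minimal dense-matrix sketch (fast implementations add warm starts over a grid of penalties and active-set tricks):

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                    # residual, maintained incrementally
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = r + X[:, j] * b[j]                   # add back j's contribution
            rho = X[:, j] @ r / n                    # partial correlation
            b[j] = soft_threshold(rho, lam) / col_ss[j]
            r = r - X[:, j] * b[j]                   # subtract updated contribution
    return b
```

On data with one strong predictor, the fitted coefficient is shrunk toward zero by roughly the penalty and the irrelevant coefficients are set exactly to zero or near it.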

SCHW03 
25th June 2008 10:00 to 11:00 
Sparsity in machine Learning: approaches and analyses  
SCHW03 
25th June 2008 11:30 to 12:30 
A Owen 
Transposably invariant sample reuse: the pigeonhole bootstrap and blockwise cross-validation Sample reuse methods like the bootstrap and cross-validation are widely used in statistics and machine learning. They provide measures of accuracy with some face value validity that is not dependent on strong model assumptions. These methods depend on repeating or omitting cases, while keeping all the variables in those cases. But for many data sets, it is not obvious whether the rows are cases and columns are variables, or vice versa. For example, with movie ratings organized by movie and customer, both movie and customer IDs can be thought of as variables. This talk looks at bootstrap and cross-validation methods that treat rows and columns of the matrix symmetrically. We get the same answer on X as on X'. McCullagh has proved that no exact bootstrap exists in a certain framework of this type (crossed random effects). We show that a method based on resampling both rows and columns of the data matrix tracks the true error, for some simple statistics applied to large data matrices. Similarly we look at a method of cross-validation that leaves out blocks of the data matrix, generalizing a proposal due to Gabriel that is used in the crop science literature. We find empirically that this approach provides a good way to choose the number of terms in a truncated SVD model or a nonnegative matrix factorization. We also apply some recent results in random matrix theory to the truncated SVD case.
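A sketch of the row-and-column resampling idea: draw rows and columns independently with replacement and recompute the statistic on the induced submatrix. This is a simplified illustration of the pigeonhole bootstrap, not Owen's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(4)

def pigeonhole_bootstrap(M, stat, n_boot=200):
    """Resample rows and columns independently (with replacement) and
    recompute stat on the doubly-resampled matrix; symmetric in M and M.T."""
    r, c = M.shape
    reps = []
    for _ in range(n_boot):
        ri = rng.choice(r, r)                 # resampled row indices
        ci = rng.choice(c, c)                 # resampled column indices
        reps.append(stat(M[np.ix_(ri, ci)]))
    return np.array(reps)
```

The spread of the replicates gives a variance estimate that accounts for both row and column effects, unlike a bootstrap over rows alone.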


SCHW03 
26th June 2008 09:00 to 10:00 
JL Wang 
Covariate adjusted functional principal component analysis for longitudinal data Classical multivariate principal component analysis has been extended to functional data and termed functional principal component analysis (FPCA). Much progress has been made, but most existing FPCA approaches do not accommodate covariate information. The goal of this talk is to develop alternative approaches that incorporate covariate information in FPCA, especially for irregular or sparse functional data. Two approaches are studied: the first incorporates covariate effects only through the mean response function, while the second adjusts for covariate effects in both the mean and covariance functions of the response. Both new approaches can accommodate measurement errors and allow data to be sampled at regular or irregular time grids. Asymptotic results are developed and numerical support provided through simulations and a data example. A comparison of the two approaches will also be discussed. 
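For densely observed curves on a common grid, plain FPCA (the step the covariate-adjusted versions build on) reduces to an eigendecomposition of the sample covariance of the discretized curves. A minimal numpy sketch, with no covariate adjustment and a regular grid assumed:

```python
import numpy as np

def fpca(curves, n_components=1):
    # curves: (n_subjects, n_gridpoints) array of functions observed on a
    # common regular grid. Center the curves, eigendecompose the sample
    # covariance, and return eigenvalues, eigenfunctions, and PC scores.
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / len(curves)
    vals, vecs = np.linalg.eigh(cov)              # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]  # take the top ones
    eigenfunctions = vecs[:, order]
    scores = centered @ eigenfunctions             # PC scores per subject
    return vals[order], eigenfunctions, scores
```

The covariate-adjusted approaches of the talk replace the fixed mean and covariance here with functions of the covariates; for sparse or irregular designs the raw covariance above must also be replaced by a smoothed estimate.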

SCHW03 
26th June 2008 10:00 to 11:00 
Penalized empirical risk minimization and sparse recovery problems A number of problems in regression and classification can be stated as penalized empirical risk minimization over a linear span or a convex hull of a given dictionary, with a convex loss and a convex complexity penalty such as the $\ell_1$-norm. We will discuss several oracle inequalities showing how the error of the solution of such problems depends on the "sparsity" of the problem and the "geometry" of the dictionary. 
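In symbols, for a dictionary $h_1,\dots,h_N$ the $\ell_1$-penalized version of such problems takes the form (notation illustrative):

```latex
\hat{\lambda} = \arg\min_{\lambda \in \mathbb{R}^N}
  \left\{ \frac{1}{n}\sum_{i=1}^{n}
    \ell\!\left(Y_i, \sum_{j=1}^{N}\lambda_j h_j(X_i)\right)
  + \varepsilon \|\lambda\|_{1} \right\}
```

with convex loss $\ell$ and regularization parameter $\varepsilon$; the oracle inequalities bound the risk of $\sum_j \hat{\lambda}_j h_j$ in terms of the sparsity of the best $\lambda$ and the geometry of $\{h_j\}$.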

SCHW03 
26th June 2008 11:30 to 12:30 
The Nystrom extension and spectral methods in learning: low-rank approximation of quadratic forms and products Spectral methods are of fundamental importance in statistics and machine learning, as they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. Motivated by such applications, we present here two new algorithms for the approximation of positive semidefinite kernels, together with error bounds that improve upon known results. The first of these, based on sampling, leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the second, based on sorting, provides for the selection of a partition in a deterministic way. After detailing their numerical implementation and verifying performance via simulation results for representative problems in statistical data analysis, we conclude with an extension of these results to the sparse representation of linear operators and the efficient approximation of matrix products. 
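Both algorithms refine the basic Nystrom extension, which approximates a positive semidefinite kernel matrix from a subset of its columns. A minimal numpy sketch of that baseline (landmark selection here is just an explicit index choice, not the sampling or sorting schemes of the talk):

```python
import numpy as np

def nystrom(K, landmarks):
    # Nystrom approximation of a PSD kernel matrix K:
    #   K_hat = C W^+ C^T,
    # where C holds the landmark columns of K and W is the landmark
    # submatrix. The approximation is exact when the landmark columns
    # span the range of K.
    C = K[:, landmarks]
    W = K[np.ix_(landmarks, landmarks)]
    return C @ np.linalg.pinv(W) @ C.T
```

For an n x n kernel and c landmarks this costs O(n c^2 + c^3) instead of the O(n^3) of a full eigendecomposition, which is what makes the approach attractive at scale.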

SCHW03 
26th June 2008 14:00 to 14:20 
Limiting theorems for large dimensional sample means, sample covariance matrices and Hotelling's T^2 statistic It is well known that sample means and sample covariance matrices are independent if the samples are i.i.d. draws from a Gaussian distribution. In this talk, by investigating random quadratic forms involving sample means and sample covariance matrices, we suggest the conjecture that sample means and sample covariance matrices under general distribution functions are asymptotically independent in the large-dimensional case, when the dimension of the vectors and the sample size both go to infinity with their ratio tending to a positive constant. As a by-product, the central limit theorem for the Hotelling $T^2$ statistic in the large-dimensional case is established. 
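For reference, the statistic in question, for sample mean $\bar{X}$ and sample covariance matrix $S$ from $n$ observations and hypothesized mean $\mu_0$, is

```latex
T^2 = n\,(\bar{X} - \mu_0)^{\top} S^{-1} (\bar{X} - \mu_0)
```

whose classical fixed-dimension null distribution is no longer valid when the dimension grows proportionally to $n$, the regime treated here.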

SCHW03 
26th June 2008 14:20 to 14:40 
JQ Shi 
Generalised Gaussian process functional regression model In this talk, I will discuss a functional regression problem with a non-Gaussian functional (longitudinal) response and functional predictors. This type of problem includes, for example, binomial and Poisson response data, occurring in many biomedical and engineering experiments. We propose a generalised Gaussian process functional regression model for such regression situations. We suppose that there exists an underlying latent process between the inputs and the response. The latent process is defined by a Gaussian process functional regression model, which is connected with the response data by means of a link function. 

SCHW03 
26th June 2008 14:40 to 15:00 
Estimation of large volatility matrix for high-frequency financial data Statistical theory for estimating large covariance matrices shows that, even for noiseless synchronized high-frequency financial data, the existing realized-volatility-based estimators of the integrated volatility matrix of p assets are inconsistent for large p (the number of assets) and large n (the sample size for high-frequency data). This paper proposes new types of estimators of the integrated volatility matrix for noisy, non-synchronized high-frequency data. We show that when both n and p go to infinity with p/n approaching a constant, the proposed estimators are consistent with good convergence rates. Our simulations demonstrate the excellent performance of the proposed estimators under complex stochastic volatility matrices. We have applied the methods to high-frequency data with over 600 stocks. 

SCHW03 
26th June 2008 15:30 to 16:30 
Graph decomposition for community identification and covariance constraints An application in large databases is to find well-connected clusters of nodes in an undirected graph where a link represents interaction between objects: for example, finding tight-knit communities in social networks, identifying related product clusters in collaborative filtering, or finding genes which collaborate in different biological functions. Unlike graph partitioning, in this scenario an object may belong to more than one community; for example, a person might belong to more than one group of friends, or a gene may be active in more than one gene network. I'll discuss an approach to identifying such overlapping communities based on extending the incidence matrix decomposition of a graph to a clique decomposition. Clusters are then identified by approximate variational (mean-field) inference in a related probabilistic model. The resulting decomposition has the side effect of enabling a parameterisation of positive definite matrices under zero constraints on entries in the matrix. Provided the graph corresponding to the constraints is decomposable, all such matrices are reachable by this parameterisation. In the non-decomposable case, we show how the method forms an approximation of the space and relate it to more standard latent variable parameterisations of zero-constrained covariances. 

SCHW03 
26th June 2008 16:30 to 17:30 
Permutation-invariant covariance regularisation in high dimensions Estimation of covariance matrices has a number of applications, including principal component analysis, classification by discriminant analysis, and inferring independence and conditional independence between variables. The sample covariance matrix, however, has many undesirable features in high dimensions unless regularized. Recent research has mostly focused on regularization in situations where the variables have a natural ordering. When no such ordering exists, regularization must be performed in a way that is invariant under variable permutations. This talk will discuss several new sparse covariance estimators that are invariant to variable permutations. We obtain convergence rates that make explicit the trade-offs between the dimension, the sample size, and the sparsity of the true model, and illustrate the methods on simulations and real data. We will also discuss a method for finding a "good" ordering of the variables when one is not provided, based on Isomap, a manifold projection algorithm. The talk includes joint work with Adam Rothman, Amy Wagaman, Ji Zhu (University of Michigan) and Peter Bickel (UC Berkeley). 
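The simplest permutation-invariant estimator of this kind is entrywise hard thresholding of the sample covariance, shown below as an illustrative baseline (the talk's estimators include refinements beyond this):

```python
def sample_covariance(X):
    # X is a list of observation rows; returns the p x p sample
    # covariance matrix (normalized by n).
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[j] - means[j]) * (row[k] - means[k]) for row in X) / n
             for k in range(p)] for j in range(p)]

def threshold_covariance(S, t):
    # Hard-thresholding estimator: keep an off-diagonal entry only if
    # its absolute value exceeds t, and always keep the diagonal.
    # The estimator commutes with variable permutations, since each
    # entry is treated identically regardless of its position.
    p = len(S)
    return [[S[j][k] if j == k or abs(S[j][k]) > t else 0.0
             for k in range(p)] for j in range(p)]
```

The threshold t trades off bias against variance; the convergence rates discussed in the talk dictate how t should scale with the dimension and sample size.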

SCHW03 
27th June 2008 09:00 to 09:20 
Optimal prediction from relevant components In Helland (1990) the partial least squares regression model was formulated in terms of an algorithm on the parameters of the model. A version of this parametric algorithm has recently been used by several authors in connection with determining the central subspace and the central mean subspace of sufficient model reduction, as a method where matrix inversion is avoided. A crucial feature of the parametric PLS model is that the algorithm stops after m steps, where m is the number of relevant components. The corresponding sample algorithm will not usually stop after m steps, implying that the ordinary PLS estimates fall outside the parameter space and thus cannot be maximally efficient. We approach this problem using group theory. The X-covariance matrix is endowed with a rotation group, and in addition the regression coefficients upon the X-principal components are endowed with scale groups. This gives a transitive group on each subspace corresponding to m relevant components; more precisely, these subspaces give the orbits of the group. The ordinary PLS predictor is equivariant under this group. It is a known fact that in such situations the best equivariant estimator is equal to the Bayes estimator when the prior is taken as the invariant measure of the group. This Bayes estimator is found by an MCMC method, and is verified to be better than the ordinary PLS predictor. 

SCHW03 
27th June 2008 09:20 to 09:40 
Dimension selection with independent component analysis and its application to prediction We consider the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. We review current methods, and propose a dimension selector based on independent component analysis which finds the most non-Gaussian lower-dimensional directions in the data. A criterion for choosing the optimal dimension is based on bias-adjusted skewness and kurtosis. We show how this dimension selector can be applied in supervised learning with independent components, both in a regression and a classification framework. 
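The criterion rests on sample skewness and excess kurtosis as measures of non-Gaussianity of a projected direction (both are zero for a Gaussian). Plain population-style versions, without the bias adjustment used in the talk, look like:

```python
def skewness(x):
    # Third standardized central moment; zero for any symmetric sample.
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    return sum((v - m) ** 3 for v in x) / n / var ** 1.5

def excess_kurtosis(x):
    # Fourth standardized central moment minus 3; zero for a Gaussian,
    # positive for heavy tails, negative for light tails.
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    return sum((v - m) ** 4 for v in x) / n / var ** 2 - 3.0
```

A direction scoring high on either statistic (in absolute value) is retained as informative; the dimension is chosen where these scores fall off toward their Gaussian values.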

SCHW03 
27th June 2008 09:40 to 10:00 
L Li 
Model free variable selection via sufficient dimension reduction Sufficient dimension reduction (SDR) has proven effective in transforming high-dimensional problems into low-dimensional projections, while losing no regression information and pre-specifying no parametric model during the dimension reduction phase. However, existing SDR methods suffer from the fact that each dimension reduction component is a linear combination of all the original predictors, and thus cannot perform variable selection. In this talk, we propose a regularized SDR estimation strategy which is capable of simultaneous dimension reduction and variable selection. We demonstrate that the new estimator achieves consistency in variable selection without requiring any traditional model, while retaining root-n estimation consistency of the dimension reduction basis. Both simulation studies and real data analyses are reported. 

SCHW03 
27th June 2008 10:00 to 11:00 
Estimation of nonlinear functionals: recent results and open problems We present a theory of point and interval estimation for nonlinear functionals in parametric, semiparametric, and nonparametric models based on higher order influence functions. The theory reproduces many previous results, produces new non-root-n results, and opens up the ability to perform optimal non-root-n inference in complex high-dimensional models. We present novel rate-optimal point and interval estimators for various functionals of central importance to biostatistics, in settings in which estimation at the expected root-n rate is not possible owing to the curse of dimensionality. We also show that our higher order influence functions have a multi-robustness property that extends the double robustness property of first order influence functions. Open questions will be discussed. 

SCHW03 
27th June 2008 11:30 to 12:30 
Applications of approximate inference and experimental design for sparse (generalised) linear models Sparsity, or more generally sub-Gaussianity, is a fundamental regularization principle for high-dimensional statistics. A recent surge of activity has clarified the behaviour of efficient sparse estimators in the worst case, but much less is known about practically efficient approximations to Bayesian inference, which is required for higher-level tasks such as experimental design. We present an efficient framework for Bayesian inference on generalized linear models with sparsity priors, based on the expectation propagation algorithm, a deterministic variational approximation. We highlight some applications where this framework produces promising results. We hope to convey the relevance of approximate inference methods in practice, which go substantially beyond point estimation, yet whose theoretical properties and algorithmic scalability remain insufficiently understood. 

SCHW03 
27th June 2008 14:00 to 15:00 
Statistics in astronomy: the Taiwanese-American Occultation Survey More than a thousand small planetary bodies with radii >100 km have recently been detected beyond Neptune using large telescopes. The purpose of the TAOS project is to measure directly the number of these Kuiper Belt Objects (KBOs) down to the typical size of cometary nuclei (a few km). When a KBO moves in between the earth and a distant star, it blocks the starlight momentarily, for about a quarter of a second. A telescope monitoring the starlight will thus see it blinking. Three small (20 inch) dedicated robotic telescopes equipped with 2,048 x 2,048 CCD cameras are operated in coincidence, so that the sequence and timing of the three separate blinks can be used to distinguish real events from false alarms. A fourth telescope will be added soon. TAOS will increase our knowledge of the Kuiper Belt, the home of most short-period comets that return to the inner solar system every few years. This knowledge will help us to understand the formation and evolution of comets in the early solar system, as well as to estimate the flux of their impacts on our home planet. In this talk I will describe some of the statistical challenges that arise when hundreds or thousands of stars are simultaneously monitored every quarter of a second, on every night of the year on which observation is possible, with the aim of detecting a few events. TAOS will produce a databank of the order of 10 terabytes per year, which is small by the standards of recent and future astronomical surveys. My intent in this talk is not to provide definitive methods of analysis; rather, I hope that this concrete example of high dimensional non-Gaussian data informs the discussion of future directions in high dimensional data analysis to which this meeting is devoted. Related Links 
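The three-telescope coincidence logic can be illustrated with a toy sketch: each telescope independently flags candidate dips in its light curve, and an occultation candidate is kept only if all three telescopes flag a dip within a small timing window. The flagging, window size, and function names below are hypothetical simplifications of the actual pipeline:

```python
def coincident_events(flags_a, flags_b, flags_c, window=1):
    # flags_*: per-telescope lists of 0/1 dip flags, one entry per
    # quarter-second exposure on a shared clock. An event at epoch t
    # requires a flagged dip in all three light curves within
    # +/- window samples of t; chance alignment of noise in three
    # independent detectors is far rarer than in any one of them.
    events = []
    for t, a in enumerate(flags_a):
        if not a:
            continue
        near = range(max(0, t - window), min(len(flags_b), t + window + 1))
        if any(flags_b[u] for u in near) and any(flags_c[u] for u in near):
            events.append(t)
    return events
```

The statistical challenge described in the talk lies upstream of this step: setting per-telescope flagging thresholds so that, over terabytes of data per year, the coincidence rate of false alarms stays negligible while the handful of real events survives.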