
Statistical Theory and Methods for Complex, High-Dimensional Data

7th January 2008 to 27th June 2008

Organisers: Professor D Banks (Duke), Professor P Bickel (California, Berkeley), Professor IM Johnstone (Stanford) and Professor DM Titterington (Glasgow)

Scientific Advisors: Professor CM Bishop (Microsoft Research), Professor P Hall (Australian National), Professor J Shawe-Taylor (University College London) and Professor S van de Geer (Zurich)

Programme Theme

Most of twentieth-century statistical theory was restricted to problems in which the number p of 'unknowns', such as parameters, is much less than n, the number of experimental units. However, the practical environment has changed dramatically over the last twenty years or so, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is comparatively small but the underlying dimension is massive, leading to the desire to fit complex models for which the effective p is very large. Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science.

Some methodological advances have been made, but there is a need to provide firm consolidation in the form of a systematic and critical assessment of the new approaches as well as appropriate theoretical underpinning in this 'large p, small n' context. The existence of key applications strongly motivates the programme, but the fundamental aim is to promote core theoretical and methodological research. Both frequentist and Bayesian paradigms will be featured. The programme is directed at a broad research community, including both mainstream statisticians and the growing population of researchers in machine learning.

The methodological issues likely to be covered fall roughly into four overlapping categories:

  • strategies for explicit and implicit dimension-reduction, including latent-structure methods, semiparametric models and large-scale multiple testing;
  • classification methods for complex datasets, including machine-learning methods such as support vector machines;
  • asymptotics for increasing dimension, including the application of random matrix theory to high-dimensional multivariate methods;
  • graphical and other visualisation methods for complex datasets.

In addition, discussion of particular applications will permeate the programme.

Additional Sponsor

Microsoft Research
