
Statistical Theory and Methods for Complex, High-Dimensional Data

Participation in INI programmes is by invitation only. Anyone wishing to apply to participate in the associated workshop(s) should use the relevant workshop application form.

7th January 2008 to 27th June 2008
David Banks (Duke University)
Peter Bickel (University of California, Berkeley)
Iain Johnstone (Stanford University)
Mike Titterington (University of Glasgow)


Scientific Advisors: Professor CM Bishop (Microsoft Research), Professor P Hall (Australian National), Professor J Shawe-Taylor (University College London) and Professor S van de Geer (Zurich)

Programme Theme

Most of twentieth-century statistical theory was restricted to problems in which the number p of 'unknowns', such as parameters, is much less than n, the number of experimental units. However, the practical environment has changed dramatically over the last twenty years or so, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is comparatively small but the underlying dimension is massive, leading to the desire to fit complex models for which the effective p is very large. Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science.

Some methodological advances have been made, but there is a need to provide firm consolidation in the form of a systematic and critical assessment of the new approaches as well as appropriate theoretical underpinning in this 'large p, small n' context. The existence of key applications strongly motivates the programme, but the fundamental aim is to promote core theoretical and methodological research. Both frequentist and Bayesian paradigms will be featured. The programme is directed at a broad research community, including both mainstream statisticians and the growing population of researchers in machine learning.

The methodological issues likely to be covered fall roughly into four overlapping categories:

  • strategies for explicit and implicit dimension reduction, including latent-structure methods, semiparametric models and large-scale multiple testing;
  • classification methods for complex datasets, including machine-learning methods such as support vector machines;
  • asymptotics for increasing dimension, including the application of random matrix theory to high-dimensional multivariate methods;
  • graphical and other visualisation methods for complex datasets.

In addition, discussion of particular applications will permeate the programme.

Additional Sponsor

Microsoft Research
