
Finding low-dimensional structure in high-dimensional data

Thursday 10th January 2008 - 15:30 to 16:30
INI Seminar Room 1
Session Chair: Enno Mammen

In high-dimensional data analysis, one is often faced with the problem that real data is noisy and, in many cases, given in coordinates that are not informative for understanding the data structure itself or for performing later tasks such as clustering, classification and regression. The combination of noise and high dimensions (more than 100 to 1000) presents challenges for data analysis and calls for efficient dimensionality reduction tools that take the inherent geometry of natural data into account. In this talk, I will first describe treelets, an adaptive multi-scale basis inspired by wavelets and hierarchical trees. In the second half of the talk, I will describe diffusion maps, a general framework for dimensionality reduction, data set parameterization and clustering that combines ideas from eigenmaps, spectral graph theory and harmonic analysis. Our construction is based on a Markov random walk on the data, and allows one to define a system of coordinates that is robust to noise and that reflects the intrinsic geometry or connectivity of the data points in a diffusion process. I will outline where we stand and what problems still remain.

(Part of this work is joint with R.R. Coifman, S. Lafon, B. Nadler and L. Wasserman)
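
As a rough illustration of the Markov random walk construction mentioned in the abstract, the sketch below builds a Gaussian affinity matrix on a toy data set, row-normalizes it into transition probabilities, and compares t-step diffusion distances within and between two clusters. It follows one standard formulation from the diffusion maps literature and is not necessarily what the talk presented. The toy points, the bandwidth eps, the number of steps t and all function names are assumptions made for this example.

```python
import math

def gaussian_kernel(points, eps):
    """Symmetric affinities k(x, y) = exp(-|x - y|^2 / eps); eps is an assumed bandwidth."""
    def sq_dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    return [[math.exp(-sq_dist(x, y) / eps) for y in points] for x in points]

def markov_matrix(kernel):
    """Row-normalize the kernel so each row holds one-step transition probabilities."""
    return [[v / sum(row) for v in row] for row in kernel]

def mat_mul(a, b):
    """Plain dense matrix product, adequate for a toy example."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def mat_power(p, t):
    """t-step transition matrix P^t for an integer t >= 1."""
    result = p
    for _ in range(t - 1):
        result = mat_mul(result, p)
    return result

def diffusion_distance(p_t, stationary, i, j):
    """Distance between the t-step distributions started at points i and j,
    weighted by the inverse of the stationary distribution."""
    return math.sqrt(sum((p_t[i][z] - p_t[j][z]) ** 2 / stationary[z]
                         for z in range(len(stationary))))

# Hypothetical toy data: two well-separated clusters on a line.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]

K = gaussian_kernel(points, eps=1.0)
degrees = [sum(row) for row in K]
stationary = [d / sum(degrees) for d in degrees]  # stationary distribution of the walk

P = markov_matrix(K)
P_t = mat_power(P, 3)  # three steps of the random walk (assumed value of t)

within = diffusion_distance(P_t, stationary, 0, 2)   # same cluster
between = diffusion_distance(P_t, stationary, 0, 3)  # different clusters
print(within < between)
```

In this formulation, the leading eigenvectors of P, scaled by powers of their eigenvalues, give coordinates whose Euclidean distances reproduce these diffusion distances. The sketch skips the eigendecomposition to stay within the Python standard library.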
