The ultrametric topology perspective on analysis of massive, very high dimensional data stores

Presented by: 
F Murtagh [Royal Holloway]
Wednesday 9th January 2008 - 09:00 to 10:00
INI Seminar Room 1
Session Chair: 
Jianqing Fan

An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying the extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We also discuss an application to the segmentation and modeling of very high frequency time series. Other applications will be described, in particular in the area of textual data mining.
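As a rough illustration of what quantifying ultrametricity can mean, recall that a distance d is ultrametric when d(x, z) <= max(d(x, y), d(y, z)) for all points x, y, z, which is equivalent to every triangle being isosceles with its two largest sides equal. The sketch below samples triangles from a point cloud and reports the share whose two largest sides agree to within a relative tolerance. It is a simplified stand-in and not necessarily the coefficient used in the talk or in the references; the function names, the tolerance `tol`, and the uniform test data are all assumptions made for the example.

```python
import math
import random


def uniform_cloud(n, dim, seed=0):
    """Hypothetical test data: n points drawn uniformly from the unit cube in `dim` dimensions."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


def ultrametric_fraction(points, n_triplets=500, tol=0.05, seed=0):
    """Estimate the share of sampled triangles that are nearly ultrametric.

    A triangle respects the strong triangle inequality exactly when its two
    largest sides are equal. Here (an assumption of this sketch) a triangle
    counts as nearly ultrametric when those two sides differ by at most
    `tol` times the largest side.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_triplets):
        a, b, c = rng.sample(points, 3)
        # Euclidean side lengths, sorted so sides[2] is the largest.
        sides = sorted([math.dist(a, b), math.dist(b, c), math.dist(a, c)])
        if sides[2] - sides[1] <= tol * sides[2]:
            hits += 1
    return hits / n_triplets


if __name__ == "__main__":
    for dim in (2, 20, 200, 1000):
        cloud = uniform_cloud(50, dim, seed=dim)
        print(dim, ultrametric_fraction(cloud))
```

Under these assumptions the reported share should rise with the dimension, because pairwise distances concentrate around a common value and triangles drift toward equilateral, which is a special case of the ultrametric isosceles condition.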


[1] F. Murtagh, On ultrametricity, data coding, and computation, Journal of Classification, 21, 167-184, 2004.

[2] F. Murtagh, G. Downs and P. Contreras, Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding, SIAM Journal on Scientific Computing, in press, 2007.

[3] F. Murtagh, The remarkable simplicity of very high dimensional data: application of model-based clustering, submitted, 2007.

[4] F. Murtagh, Symmetry in data mining and analysis: a unifying view based on hierarchy, submitted, 2007.
