
Principal components analysis in tree space

Tuesday 21st June 2011 - 16:30 to 16:50
INI Seminar Room 1
Phylogenetic analysis commonly gives rise to a collection or sample of inferred evolutionary trees, each differing from the others. There is a need for methods that visualize, compare, and quantify variability in such sets of trees, in terms of both topological and geometrical differences. Standard tools of multivariate analysis such as multi-dimensional scaling and clustering have been applied to sets of trees, but Principal Components Analysis (PCA) cannot be applied directly since the space of evolutionary trees on a fixed set of taxa is not a vector space. I propose a novel geometrical approach to PCA in tree-space that works in an analogous way to standard linear Euclidean PCA. Given a data set of phylogenetic trees, a geodesic path is sought that maximises the variance of the data under a form of projection within tree-space onto the path. Geodesic paths identified in this way reveal and quantify the principal sources of variation in the original collection of trees in terms of both topology and branch lengths, and can be visualized as animations of smoothly changing alternative evolutionary trees. The potential of the approach is illustrated by applying tree-space PCA to experimental data from metazoa and a simulation study of long-branch attraction.
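For orientation, the Euclidean procedure that the tree-space method is described as paralleling can be sketched as follows. This is a minimal illustration of standard linear PCA only, not the speaker's tree-space algorithm: it finds the straight line through the mean that maximises the variance of the orthogonally projected points, whereas the approach in the abstract searches over geodesic paths in tree-space. The function names and the choice of power iteration are assumptions made for this sketch.

```python
import math


def first_principal_direction(points, iters=200):
    """Return (mean, unit direction) of the line maximising projected variance.

    Hypothetical helper for ordinary Euclidean PCA, using power iteration
    on the sample covariance matrix.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    centred = [[p[k] - mean[k] for k in range(dim)] for p in points]
    # Sample covariance matrix (divisor n).
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(dim)]
           for i in range(dim)]
    # Start from a fixed unit vector; this fails only if the start is
    # exactly orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points identical: no direction of variation
            break
        v = [x / norm for x in w]
    return mean, v


def projected_variance(points, mean, direction):
    """Variance of the points after orthogonal projection onto the line."""
    scores = [sum((p[k] - mean[k]) * direction[k] for k in range(len(mean)))
              for p in points]
    return sum(s * s for s in scores) / len(points)


# Toy data lying exactly on the line y = 2x.
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, direction = first_principal_direction(data)
variance = projected_variance(data, mean, direction)
```

In the tree-space setting, the straight line would be replaced by a geodesic path and the orthogonal projection by the form of projection within tree-space that the abstract mentions; neither is implemented here.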