Explicit bounds for the stability of maximum likelihood trees
Seminar Room 1, Newton Institute
Maximum likelihood inference of phylogenetic trees relies on an explicit model of evolution, which governs both the analysis of sequence evolution and the computation of the likelihood of a tree. Since this computation is based on a finite number of observed nucleotides, the likelihood has a random component, possibly a large one, and the robustness of the inferred tree has to be assessed.
Felsenstein's bootstrap test, along with the bootstrap-based tests developed subsequently, is the most commonly used test of the reliability of a tree. However, it ignores the relation between the size of the data, the number of species in the study and the stability of the phylogeny. The bootstrap also relies on resampling with replacement, which can be time-consuming, especially for large data sets.
We propose to bound the variability of the empirical likelihood around its true value, for a given phylogeny. We also bound the probability that a phylogeny appears better than another one "just by chance" when in reality it is worse. These bounds, obtained with measure concentration tools, account for the number of species and the number of nucleotides. In particular, they give the minimum number of nucleotides needed to achieve a given confidence level when working with a given number of species.
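To give a feel for how a concentration bound turns a confidence level into a minimum number of nucleotides, here is a minimal sketch based on the classical Hoeffding inequality. It is not the bound presented in the talk: it assumes independent sites and a per-site log-likelihood confined to an interval of known width, and the names `hoeffding_tail`, `min_sites` and `value_range` are hypothetical. In practice the width of that interval would presumably depend on the evolution model and on the number of species.

```python
import math


def hoeffding_tail(n_sites, t, value_range):
    """Hoeffding upper bound on P(|empirical mean - true mean| >= t).

    Assumes n_sites independent per-site terms, each lying in an
    interval of width value_range (an assumption of this sketch).
    """
    bound = 2.0 * math.exp(-2.0 * n_sites * t ** 2 / value_range ** 2)
    return min(1.0, bound)


def min_sites(t, delta, value_range):
    """Smallest n for which the Hoeffding bound above is at most delta."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * t ** 2))


if __name__ == "__main__":
    # Illustrative numbers only: deviation 0.1, confidence 95%, unit range.
    n = min_sites(t=0.1, delta=0.05, value_range=1.0)
    print(n, hoeffding_tail(n, 0.1, 1.0))
```

Because the required number of sites grows with the square of `value_range` and only logarithmically with `1 / delta`, tightening the confidence level is cheap compared with widening the range of the per-site terms.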
Finally, to illustrate the behaviour of our method, we compare it to the bootstrap on a toy example.