Recovering reticulation: a theoretical algorithm for a practical problem
Seminar Room 2, Newton Institute Gatehouse
Reticulate (non-tree-like) evolution is a fundamental process in the evolution of certain groups of species, including certain birds and plants. This process results in species being a composite of DNA regions derived from different ancestors. Consequently, conflicting signals in a data set may result not from sampling or modelling errors, but from reticulation having played a role in the evolutionary history of the species under consideration.
Assuming that our initial data set is correct, a fundamental problem for biologists is to compute the minimum number of reticulation events that explains this set. This smallest number sets a lower bound on the number of such events and provides an indication of the extent to which reticulation has influenced the evolutionary history of a collection of present-day species.
In this talk, we focus our attention on the case of this problem in which the initial set consists of two (evolutionary) trees. This may seem rather special, but there are several reasons for this focus. Firstly, the problem is NP-hard even when the initial set consists of two such trees. Secondly, we are interested in finding a general solution rather than one that is restricted in some way. Lastly, the problem in which the initial data set consists of binary sequences can be interpreted as a sequence of two-tree problems.
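To make the two-tree setting concrete, the following minimal sketch detects conflicting signals between two rooted trees on the same species. It is an illustration only, not the algorithm of the talk: the nested-tuple tree encoding, the function names, and the example trees are all assumptions introduced here. It relies on the standard fact that two clusters (sets of species descending from a common vertex) can sit in one tree only if they are disjoint or nested, so any conflicting pair indicates that at least one reticulation event is needed to explain both trees. It does not compute the minimum number of such events.

```python
from itertools import product


def clusters(tree):
    """Collect the leaf set below every internal vertex of a rooted tree.

    Hypothetical encoding: a leaf is a string, an internal vertex is a
    tuple of its children.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def compatible(a, b):
    """Two clusters fit in a single tree iff they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def conflicts(tree_1, tree_2):
    """List every pair of clusters, one from each tree, that cannot coexist."""
    return [
        (a, b)
        for a, b in product(clusters(tree_1), clusters(tree_2))
        if not compatible(a, b)
    ]


# Illustrative trees on the species a, b, c, d (not data from the talk).
tree_a = ((("a", "b"), "c"), "d")
tree_b = (("a", ("b", "c")), "d")

for left, right in conflicts(tree_a, tree_b):
    print(sorted(left), "conflicts with", sorted(right))
```

An empty result from `conflicts` means a single tree can display both inputs, so no reticulation is required. A non-empty result only certifies that the minimum is at least one; finding the exact minimum is the NP-hard problem described above.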