# Seminars (PLGW01)

Videos and presentation materials from other INI events are also available.

Search seminar archive

Event When Speaker Title Presentation Material
PLGW01 3rd September 2007
10:00 to 11:00
Computational and mathematical challenges involved in very large-scale phylogenetics

Phylogenetic inference presents enormous computational and mathematical challenges, but these are particularly exacerbated when dealing with very large datasets (containing thousands of sequences) or when sequences evolve under complex models of evoluiton. In this talk, I will describe some of the recent progress in large-scale phylogenetics. In particular, I will talk about multiple sequence alignment and its implications for large-scale phylogenetics.

PLGW01 3rd September 2007
11:30 to 11:50
Dendroscope - an interactive viewer for large phylogenetic trees

Research in evolution requires software for visualizing and editing phylogenetic trees, for increasingly very large datasets, such as arise in expression analysis or metagenomics, for example. It would be desirable to have a program that provides these services in an efficient and user-friendly way, and that can be easily installed and run on all major operating systems. Although a large number of tree visualization tools are freely available, some as a part of more comprehensive analysis packages, all have drawbacks in one or more domains. They either lack some of the standard tree visualization techniques or basic graphics and editing features, or they are restricted to small trees containing only tens of thousands of taxa. Moreover, many programs are difficult to install or are not available for all common operating systems. We have developed a new program, Dendroscope, for the interactive visualization and navigation of phylogenetic trees. The program provides all standard tree visualizations and is optimized to run interactively on trees containing hundreds of thousands of taxa. The program provides tree editing and graphics export capabilities. To support the inspection of large trees, Dendroscope offers a magnification tool. The software is written in Java 1.4 and installers are provided for Linux/Unix, MacOS X and Windows XP. Dendroscope is a user-friendly program for visualizing and navigating phylogenetic trees, for both small and large datasets.

PLGW01 3rd September 2007
11:50 to 12:10
Appropriate models for heterogeneous multi-gene data sites

Multigene data sets are becoming the norm for phylogenetic studies; so called phylogenomic datasets may involve hundreds of genes for many species. These data sets create challenges for current phylogenetic methods, as different genes have different functions and hence evolve under different processes. The question is how best to model this heterogeneity to give reliable phylogenetic estimates of the species tree.

Here we discuss two approaches. The first approach involves stochastic partitioning of genes into different classes, the second approach is a form of penalised maximum likelihood where the penalty term 'encourages' parameter estimates to be similar between different gene trees.

PLGW01 3rd September 2007
12:10 to 12:30
H Li Incorporating speices phylogeny in the reconstruction of gene trees

Reconstructing accurate trees is one of the most challenging topics in phylogenetics. In this talk, two new algorithms will be presented in attempt to improve the accuracy of gene tree building by incorporating species phylogeny. Both of them are presented in the CFG (Context-Free Grammar) framework. The first algorithm is called tree merge, which takes a set of gene trees built by different algorithms and then merges them together to get a new tree that can be better than all the input. The second algorithm calculates a likelihood with respect to the species evolution and incorporates this likelihood to the likelihood calculated from the sequence alignment. Maximum likelihood tree can then be searched. The two algorithms have been implemented in NJTREE software which is the core engine of the TreeFam database. NJTREE is also extensively used in the orthology prediction pipeline of Ensembl Compara, and is proved to be more accurate than traditional algorithms in reconstructing phylogenetic trees in a multi-gene family.

PLGW01 3rd September 2007
14:00 to 15:00
Mixture models in phylogenetic inference

I describe a general likelihood-based mixture model approach for inferring phylogenetic trees from gene-sequence or other character- state data. Conventional models of gene-sequence evolution assume that all sites evolve according to a single homogeneous model of evolution or that rates of evolution vary among sites according to some statistical distribution, such as the gamma. In a phylogenetic context, a mixture model allows for more than one model of the evolutionary process to be fitted to each site in the alignment, with the likelihood being summed over models at each site. I describe how mixture modelling can accommodate cases in which different sites in the alignment evolve in qualitatively distinct ways, a phenomenon we call pattern heterogeneity. Mixture models can also accommodate sites whose rate of evolution varies in different parts of the tree. This is often called heterotachy and I show how a mixture model approach can improve on a parametric covarion model for heterotachous data, and how the model can be used to identify which sites are evolving heterotachously and where in the tree this behaviour occurs. The mixture model does not require prior knowledge of the patterns or of the nature of the heterotachy in the data, nor does it partition the data. We present studies to show that the model correctly retrieves known patterns from simulated gene-sequence data, and present evidence that it generally improves the likelihood of the data substantially over simpler models of gene-sequence evolution. Mixture modelling seems a promising technique for discovering some of the complex evolutionary processes that underly sequence evolution, and may be of interest to researchers who wish to identify how specific sites respond adaptively to particular environments. We implement the model within a Bayesian Markov-Chain Monte Carlo framework, and make it available in our BayesPhylogenies software (www.evolution.rdg.ac.uk).

PLGW01 3rd September 2007
15:30 to 15:50
G Margos Multilocus sequence analysis of Borrelia burgdorferi s.l.

Lyme borreliosis, the most frequent vector-borne disease in the Northern Hemisphere, is caused by spirochaetes belonging to the genus Borrelia. These bacteria now comprise 12 named "species", thus forming a species complex referred to as Borrelia burdorferi sensu lato. All known genotypes of this species complex are maintained by vertebrate hosts and ticks, however, their maintenance systems and pathogenic properties are remarkably diverse. A major question addressed in our project is whether genotypically and ecologically defined Borrelia populations are congruent. So far, most genotyping schemes of Borrelia are based on non-coding spacer regions and genes encoding outer surface proteins, many of which are under frequency-dependent immune selection or prone to recombination. In order to better resolve the geographical population structure and the phylogenetic relationships of these bacteria, we are currently developing and exploring multilocus sequence analysis (MLSA) based on housekeeping genes. In this talk I will present some 'proof of principle' data on the suitability of MLSA for genotyping spirochaetal strains.

PLGW01 3rd September 2007
15:50 to 16:10
Comparing phylogenetic trees

Comparing phylogenetic trees is an important part of analyzing biological data sets. Many of the popular distance metrics are NP-hard but have been shown to efficient to approximate and tractable in small cases. We discuss recent results in comparing two phylogenetic trees under the tree-bisection-and-reconnection (TBR) and subtree-prune-and-regraft (SPR) distances.

PLGW01 3rd September 2007
16:10 to 16:30
Phylogenetic reconstructions based on RNA secondary structure

It is known that RNA secondary structure motifs are usually conserved, and have a vital influence on the activity of the RNA molecules. We present a novel approach for phylogenetic reconstruction, based on comparing secondary structures of homologous RNA molecules from different species. By manipulating the RSmatch algorithm for measuring RNA secondary structure similarity, we computed pairwise distances between secondary structures. Consequently, we applied the neighbor joining algorithm on the distance matrix, to produce phylogenetic trees. Initial analysis of the results exhibits a good agreement with the accepted taxonomy. Comparison with the trees based RNA sequences alone have also produced favorable results.

This is joint work with Moran Cabili, Assaf Meirovich, and Metsada Pasmanik-Chor

PLGW01 4th September 2007
09:00 to 10:00
Advances and limitations of maximum likelihood phylogenetics

I'll present recent developments in inferring phylogenies from sequences by maximum likelihood (ML). In fact, this approach involves two main components: probabilistic modelling of substitution events, and algorithmics to infer near-optimal trees. I'll show that algorithmics has considerably progressed during the last few years, allowing now for fast ML inference from large datasets. On the other hand, I'll underline the difficulties encountered with modelling, especially of protein evolution, and show that elaborating more realistic and improved substitution models is a major challenge.

PLGW01 4th September 2007
10:00 to 10:20
Restrictions on meaningful phylogenetic networks
PLGW01 4th September 2007
10:20 to 10:40
Inference rules for super-network construction

A fundamental aspect of understanding molecular evolution is species phylogeny reconstruction. A commonly used approach in this context is to infer it from a collection of gene tress using, for example, a supertree or supermatrix method. Although popular, one of the main drawbacks of such a method is that it can suppress conflict within the gene trees arising from, for example, lateral gene transfer or recombination. Until now, the only known approaches capable of accommodating such conflict have been Z-closure super-networks and Q-imputation super-networks.

In this talk, we introduce the novel Y-inference rule and show that under certain circumstances this inference rule on its own or in combination with one Meacham's inference rules constitute an attractive alternative approach that complements the above list of super-network approaches.

This is joint work with Stefan Gruenewald (AS-MPG Partner Institute for Computational Biology, China) and Qiong Wu (University of East Anglia, UK).

PLGW01 4th September 2007
10:40 to 11:00
Maximum parsimony on phylogenetic networks: some rigorous results

Phylogenies, the evolutionary histories of groups of organisms, play a major role in representing the interrelationships among biological entities. Although many biological processes can be effectively modeled and summarized as a binary tree, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks of relationships rather than trees of relationships.

Maximum parsimony (MP) is one of the most popular methods used for phylogenetic tree reconstruction. Roughly, this method is based on the assumption that "evolution is parsimonious", that is, the best evolutionary trees are the ones that minimize the number of changes along the edges of the tree.

In previous works, we formulated a maximum parsimony (MP) criterion for reconstructing and evaluating phylogenetic networks, and demonstrated its quality on biological as well as synthetic datasets. In this work, we show further theoretical insights and results regarding various aspects of the MP criterion of phylogenetic networks.

PLGW01 4th September 2007
11:30 to 11:50
Integer programming for phylogenetic computations

We consider the following three related problems: 1) Given complete data (taxon by character data) remove the minimum number of characters so that the remaining data is pairwise compatible (contains a perfect phylogeny); 2) Given data with missing entries, impute the missing data in order to minimize, in the completed data, the number of pairs of characters that are incompatible; 3) Given data with missing entries, impute the missing data to minimize, in the completed data, the number of characters that must be removed so that the remaining sites are pairwise compatible. We consider in detail the cases of two, three and four states allowed per character and describe compact integer programming formulations that solve these problems (the approach extends in principle to any fixed number of states). We discuss empirical tests on problem instances of size of interest in phylogenetics, and establish that these instances can be solved very efficiently in practice, even though the problems are all NP-hard.

PLGW01 4th September 2007
11:50 to 12:10
Quartet compatability: new results and surprising counterexamples

Many methods in phylogenetic tree reconstruction are not feasible for a large number of taxa. Hence, it is a natural approach to construct trees on small taxa sets first and then to look for a big tree (supertree) on the union of the taxa sets that contains all information of the small trees. However, even if all input trees are quartets, i.e. binary trees with four leaves, it is an NP-complete problem to decide whether they are compatible.

I will present a sufficient condition for a set of binary phylogenetic trees to be compatible. That result can be used to give a much shorter proof of the known characterization of quartet sets of minimal size which are defining, i.e. compatible with a unique supertree. Further, I will show some quartet sets with surprising properties which are counterintuitive and explain the hardness of the compatibility problem. One example is an incompatible collection of quartets for which every two quartets overlap in at most one taxon. It has been known for fifteen years that such quartet sets exist but the example is the first one that has been constructed.

PLGW01 4th September 2007
12:10 to 12:30
On k-leaf powers

In 2002, Nishimura, Ragde and Thilikos introduced the notion of k-leaf power and k-leaf root, motivated by the following: ... a fundamental problem in computational biology is the reconstruction of the phylogeny, or evolutionary history, of a set of species or genes, typically represented as a phylogenetic tree ...''. The species occur as leaves of the phylogenetic tree. Let G=(V_G,E_G) be a finite undirected graph. For k at least 2, a tree T is a $k$-leaf root of G if V_G is the leaf set of T and two vertices x,y in V_G are adjacent in G if and only if their distance d_T(x,y) in T is at most k. The graph G is a $k$-leaf power if it has a k-leaf root. Obviously, a graph is a 2-leaf power if and only if it is the disjoint union of cliques, i.e., it contains no induced path of three vertices. Recent work on leaf powers contains characterisations of 3- and 4-leaf powers and their variants. It is well known that every k-leaf power is a strongly chordal graph. For k at least 6, no characterisation of $k$-leaf powers and no efficient recognition is known. We discuss some open problems and new results on leaf powers.

PLGW01 4th September 2007
14:00 to 15:00
Tripartitions in phylogenetic networks

Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of reticulate evolution events, like recombination, hybridization, and lateral gene transfer. In a recent series of papers devoted to the study of reconstructibility of phylogenetic networks, the well-known bipartition metric for phylogenetic trees was generalized to the so-called tripartition metric for phylogenetic networks. In this talk, we show that, in fact, this tripartition metric does not satisfy the separation axiom of distances (zero distance means isomorphism, or, in a more relaxed version, zero distance means indistinguishability in some specific sense) in any of the classes of phylogenetic networks where it is claimed to do so. We also present a class of phylogenetic networks whose members can be singled out by means of their sets of tripartitions, and hence, where the latter can be used to define a meaningful metric.

PLGW01 4th September 2007
15:30 to 15:50
Branch and bound construction of balanced minimum evolution optimal trees

The Balanced Minimum Evolution (BME) score has been recently proposed as a criterion for reconstructing phylogenetic trees based on a distance matrix. It is based on Pauplin's formula, which provides a natural estimate of the total length of a tree, based on its topology and a matrix of estimated pairwise distances. The objective is to find the tree topology that minimizes this length estimate, as short trees are usually the ones that best reflect the data.

Recently, it has been shown that Neighbor Joining can be viewed as a greedy algorithm trying to minimise BME. Together with other theoretical reasons, there are strong experimental reasons supporting BME-guided tree reconstruction. However, the published methods are heuristic and do not attempt to construct BME-optimal trees.

The main aim of this talk will be to present a Branch and Bound approach for finding BME-optimal trees. We derived a simple bound on the BME score of a tree based on the score of a partially constructed tree. This allows us to eliminate the need to explore large parts of the space of all possible trees, but still guarantees that all optimal trees will be found. The efficiency of this approach compares well with that of other Branch and Bound approaches such as the ones for Maximum Parsimony. Finally the topological accuracy of the reconstructed trees will also be discussed.

PLGW01 4th September 2007
15:50 to 16:10
New analytical results for the speciation times in neutral models

Did a reconstructed phylogeny evolve under a neutral model? We provide an analytical way  opposed to the common simulation approach  for comparing the speciation times of a reconstructed tree with the dates proposed by the neutral models, e.g. via lineages-through-time plots. Further, if dates are missing in a reconstructed tree which evolved under a neutral model, we provide the time of the speciation events. These methods, which make use of the rank function on a phylogenetic tree, can also be applied for calculating analytically the loss in biodiversity when some species become extinct. The neutral models considered in this work are the Yule model and a generalization which includes extinction as well, the critical branching process (CBP) model, recently introduced by D. Aldous and L. Popovic. Trees under these neutral models are conditioned on having $n$ extant species today which is crucial for data analysis.

PLGW01 4th September 2007
16:10 to 16:30
Partitioning the sample space on five taxa for the neighbour joining algorithm

In this talk, we will analyze the behavior of the Neighbor Joining algorithm on five taxa and we will show that the partition of the sample (data) space for estimation of a tree topology with five taxa into subspaces, within each of which the Neighbor Joining algorithm returns the same tree topology. A key of our method to partition the sample space is the action of the symmetric group $S_5$ on the set of distance matrices by changing the labels of leaves. The method described in this paper can be generalized to trees with more than five taxa.

This is joint work with Kord Eickmeyer and a preprint is available at http://arxiv.org/abs/math.CO/0703081.

PLGW01 5th September 2007
09:00 to 10:00
Reconstruction of phylogeny by stratophenetic tracing in the fossil record and its comparison with molecular data

The information on the relatedness between taxa and the direction of evolution are only indirectly coded in the input data of all molecular phylogenies. Fossil record, on the other hand, contains direct information on the order, absolute timing and nature of evolutionary events. There is, however, a small hitch: the fossil record only preserves a fraction of the phylogenetically relevant information and it is often so patchy and incomplete, that reconstructions of phylogeny based on fossil data are not superior to molecular methods. A notable exception is presented by the fossil record of marine microplankton, in particular planktonic foraminifera. Their calcite shells are rich in characters and their prolific production and excellent preservation in oceanic sediments produced probably the most complete fossil record on Earth. Here, phylogenetic relationships can be reconstructed by the method of stratophenetic tracing  the nearest one can come to direct observations from a time machine  where both the topology and the timing of divergences are inferred by back-peeling of layers of sediments and tracking changes in morphology of the evolving species through time. The fossil record of planktonic foraminifera thus offers a unique possibility to ground truce molecular phylogenies, the methods that produce them and the assumptions and processes on which the methods are based. We will present a comparison between fossil phylogeny of one monophylum of planktonic foraminifera with corresponding SSU rDNA phylogenies, showing unequivocal direct evidence for long-branch artefacts, enormous differences in substitution rates and incongruent tree topologies, underscoring the potential of foraminifera as a model organism in phylogenetics.

PLGW01 5th September 2007
10:00 to 10:20
Explicit bounds for the stability of maximum likelihood trees

The application of the maximum likelihood framework for inferring phylogenetic trees is based on the adoption of an explicit evolution model on which depend both the analysis of sequence evolution and the computation of the likelihood of a tree. Since this computation is based on the observed nucleotides, which are only in finite number, the likelihood has a random component, possibly large, and the robustness of the inferred tree has to be assessed.

Felsenstein's bootstrap test, along with the subsequently developed bootstrap-based tests, is the most commonly used test of reliability of a tree. However, it discards the relation between the size of the data, the number of species in the study and the stability of the phylogeny. The bootstrap is also based on resampling with replacement, which can be time consuming especially for large sets of data.

We propose to bound the variability of the empirical likelihood around its true value, for a given phylogeny. We also bound the probability of a phylogeny being better than another one "just by chance" when in reality it is worse. These bounds, obtained with measure concentration tools, account for the number of species and the number of nucleotides. In particular, they give the minimum number of nucleotides needed to achieve a given confidence level, when working with a given number of species.

Finally, to illustrate the behaviour of our method, we compare it to bootstrap on a toy example.

PLGW01 5th September 2007
10:20 to 10:40
D Metzler Insertion-deletion models for dequence evolution and Bayesian sampling methods for multiple alignment

In current practice the estimation of a phylogenetic tree is a two-step procedure: first a multiple alignment is computed and subsequently a phylogenetic tree is reconstructed, based on the alignment. However, it is well known that the alignment and the tree reconstruction problem are intertwined. Thus, it is of great interest to estimate alignment and tree simultaneously. A precondition for this is the availability of biologically plausible sequence evolution models which are compatible with efficient alignment algorithms. We discuss a variant of the Thorne-Kishino-Felsenstein model, having equal rates of insertions and of deletions of sequence fragments, for more than two sequences related by a phylogenetic tree. Finally, we discuss the perspectives for statistical multiple alignment based on this type of model.

Most parts of this talk are based on joint work with Roland Fleissner, Arndt von Haeseler, Ana Arribas-Gil, and Anton Wakolbinger.

PLGW01 5th September 2007
10:40 to 11:00
Indentifiability of the GTR=gamma substitution model of DNA evolution

The GTR+Gamma+I model of DNA substitution is probably the most widely used model in practical phylogenetic inference. However, a valid proof of the identifiability of its parameters (tree topology, root distribution, rate matrix, proportion of invariable sites, and alpha shape parameter) has not been given.

After indicating the gaps in the previously-published proof of the GTR+Gamma+I model, and discussing various mathematical approaches for establishing identifiability results, a sketch of a recent proof of the identifiability of GTR+Gamma will be given.

This is joint work with Cecile Ane' and John Rhodes.

PLGW01 5th September 2007
11:30 to 12:30
Likelihood calculations on the ancestral recombination graph
PLGW01 6th September 2007
09:00 to 10:00
Model averaging in phylogenies

It is well known that the use of different probabilistic models of evolution may change the results of phylogenetic analyses, and during the last years much research has been devoted to the selection of best-fit models. An alternative to model selection is model averaging or multimodel inference, where all models are simultaneously used to infer different parameters of the substitution process, or the phylogenetic tree (topology and branches) itself. Here I will discuss some preliminary simulations that suggest that model-averaging might not be very useful to estimate substitution parameters, but that might reduce the variance associated to the estimation of tree topology. New software that implements model averaging for phylogenetics will also be introduced.

PLGW01 6th September 2007
10:00 to 10:20
FA Matsen Glimpses from the strange world of phylogenetic mixtures

Rates of evolution and evolutionary history may vary across the genome of a given organism. The standard way of modeling the resulting data is a "mixture model" where a given phylogenetic pattern comes from one of a collection of trees with fixed probabilities. In this talk I will focus on recent results (joint with Mike Steel and Elchanan Mossel) which investigate what pattern probabilities are and are not possible when rates of evolution vary on a single topology.

In particular, we have shown that a mixture of two processes on a tree of one topology can produce exactly the same expected site pattern frequencies (i.e. sequence data) as an unmixed process on a different topology. For mixtures of a number of processes the situation is even more dire: in a certain sense the topology is not uniquely determined for a majority of data vectors. I will also present other related results along with the associated mathematics and indicate some interesting future directions for research.

PLGW01 6th September 2007
10:20 to 10:40
Neighbor joining algorithms for inferring phylogenies via LCA-distances

Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology from both practical and theoretical considerations. We study algorithms which are based on a characterization of edge-weighted trees by distances to LCAs ({\em Least Common Ancestors}). This characterization enables a direct application of ultrametric reconstruction techniques to trees which are not necessarily ultrametric. A simple and natural neighbor joining criterion based on this observation is used to provide a family of efficient neighbor-joining algorithms. These algorithms are shown to reconstruct a refinement of the Buneman tree, which implies optimal robustness to noise under criteria defined by Atteson. In this sense, they outperform many popular algorithms such as Saitou\&Nei's NJ. Preliminary experimental results indicate that when executed from an appropriate root taxon, this algorithm provides reconstruction of phylogenies which are competitive with NJ and other common algorithms.

PLGW01 6th September 2007
10:40 to 11:00
Reconstruction of phylogenetic trees with very short branches

One of the central challenges in phylogenetics is the accurate reconstruction of phylogenetic trees from short taxon-sequences. The sequence length required for correct topological reconstruction depends on certain properties of the tree such as its depth and edge-weights. Of special interest are fast converging algorithms, which, under the assumption of a constant lower bound on edge weights, require only polynomial length sequences to guarantee correct topological reconstruction with high probability. However, when the original phylogenetic tree contains very short edges, the required sequence-length may be too long for practical purposes. Moreover, incorrect reconstruction of short edges may (and in practice often does) propagate in the sense that it prevents the correct reconstruction of long edges as well. Therefore, most known reconstruction algorithms assume that the original tree has a fully resolved topology and all its edges are sufficiently long.

In this talk we present a fast converging reconstruction algorithm which avoids this assumption. The algorithm returns a partially resolved topology which contains all sufficiently long edges of the original tree, and no edges which contradict the original topology. It is designed to reconstruct more edges when the input sequences grow longer. Unlike existing fast converging reconstruction algorithms, this method provides a reconstruction guarantee for every taxon-sequence length. This makes it more appropriate for practical use.

PLGW01 6th September 2007
11:30 to 11:50
Nature reserve selection problems

The Nature Reserve Selection Problem (NRS) is a problem that arises in the context of conservation biology. Subject to budgetary constraints, the problem is to select a set of conservation areas to preserve so that the phylogenetic diversity of the species contained within those areas is maximized. It was recently shown that NRS is NP-hard. In this talk, we give a tight polynomial-time approximation algorithm for NRS and describe a closely-related problem for which little is known.

PLGW01 6th September 2007
11:50 to 12:10
Computing phylogenetic diversity for split systems

In conservation biology, it is important to measure, predict, and preserve biodiversity as many species are facing extinction. In 1992, Faith proposed measuring the diversity of a collection of species in terms of their relationship on a phylogenetic tree, and using this information to identify collections of species with high diversity. Here we are interested in some variants of the resulting optimisation problem that arise when considering species whose evolution is better represented by a network rather than a tree. More specifically, we consider the problem of computing phylogenetic diversity relative to split systems. We show that for general split systems, this problem is NP-hard. In addition we provide some efficient algorithms for some special classes of split systems, in particular presenting an optimal O(n) time algorithm for phylogenetic trees and an O(n log n + nk) time algorithm for circular split systems.

Keywords: Phylogenetic tree, Phylogenetic network, Phylogenetic diversity, Biodiversity conservation, Split systems

PLGW01 6th September 2007
12:10 to 12:30
K Hartmann Prioritising species for biodoversity conservation

Conservation organisations are often faced with the problem of allocating limited funds to the conservation of many needy species. The ultimate goal is to ensure that the expected future biodiversity is as high as possible. Other factors that need to be considered are the costs of conserving different species and the expected increase in their survival probability that conservation would provide them. The Noah's Ark Problem (NAP) is a mathematical framework that formalises this problem and uses phylogenetic diversity as a measure of biodiversity.

Whilst much mathematical work has been done on the NAP actual applications of it to real problems has been limited. In this talk I consider some common criticisms of the NAP. I will also briefly introduce a new software package I have developed that bundles common algorithms for the NAP together with a graphical interface.

PLGW01 6th September 2007
14:00 to 15:00
Big trees: challenges and opportunites in the phylogenetic analysis of large data sets

It is becoming easier to obtain genetic sequences. Many of these will be used for phylogenetic analyses. But these analyses have traditionally been hampered by the size of the data (usually, the number of sequences) that can be analysed satisfactorily. In this talk, I consider three problems that have arisen as a consequence of having "too much" data. First, supertree reconstruction may provide some relief with large datasets, but what is the best way of reconstructing supertrees? I explore the possibility of constructing maximum-likelihood estimates of supertrees, and discuss some of the features that such trees may possess. Second, I consider how we can break up datasets to make satisfactory inferences. Finally, I look at the new generation of automatic sequencing machines, and the promise of large amounts of data. I outline some computational approaches that may be useful in analysing this type of data. This talk provides very little in the way of concrete results but a great deal of informed speculation. It is not my intention to instruct, but to stimulate discussion.

PLGW01 6th September 2007
15:30 to 15:50
BC Emerson Mitochondrial DNA heteroplasmy and recombination in a reptilian secondary contact zone

Inheritance of mitochondrial DNA in animals is thought to be both strictly maternal and non-recombining. Nevertheless, in the last decade evidence for recombination of mitochondrial DNA has been observed in several animal species, but few reports come from natural populations and most of the time the frequency of individuals possessing more than one mtDNA type is very low. This talk presents evidence for extensive heteroplasmy and mitochondrial recombination in a region of secondary contact of a lizard species (Lacerta lepida). Individuals at the centre of the contact zone possess heteroplasmic point mutations at the mtDNA cytochrome b gene. The heteroplasmic condition was further tested by cloning 20 individuals at the centre of the zone and sequencing 8 to 10 clones per individual. All individuals tested had recombinant mtDNA sequences, often more than one. The results suggest that mtDNA recombination was derived from the fusion of the maternal and paternal mtDNAs due to paternal leakage, which is thought to be more frequent in hybrid zones. To our knowledge this is the first study reporting such extensive heteroplasmy and recombination in the mtDNA genome in natural populations of a species with typical maternal mtDNA inheritance. Our findings suggest that mtDNA may recombine regularly and that secondary contact zones, where highly divergent molecules come into contact, may provide natural laboratories for studying this phenomenon.

PLGW01 6th September 2007
15:50 to 16:10
Generalising from consensus to supertree methods

There have been a number of attempts to generalise from consensus methods to supertree analogues that are equivalent to the consensus method in the consensus case and which preserve desirable properties of the method beyond consensus. Recently, Cotton and Wilkinson (2007) used a characterisation of the classical majority-rule consensus tree in terms of its minimisation of the sum of consensus tree to input tree distances as a basis for generalisation to the supertree case. Here I use the same general approach to generalise from strict, loose (= semi-strict) classical and reduced consensus to the corresponding supertree methods. Not all properties of the consensus methods can be simultaneously generalised beyond consensus and not all properties are desirable in the supertree case. Useful generalisation requires that we distinguish between those consensus properties that are 'essential' or 'most desirable' beyond consensus and those that seem to be merely a consequence of the special case of consensus. The loose supertree method is compared to other supertree methods, particularly to an alternative generalisation of the semi-strict consensus due to Goloboff and Pol (2002) which, it is argued, preserves consensus properties that are unhelpful more generally.

Cotton, J. A., and M. Wilkinson. 2007. Majority-rule supertrees. Systematic Biology 56: 445-452.

Goloboff, P. A., and D. Pol. 2002. Semi-strict supertrees. Cladistics 18: 514-525.

PLGW01 7th September 2007
09:00 to 10:00
Mapping human genetic ancestry
The human genome is a mosaic with respect to its evolutionary history. Based on a phylogenetic analysis of 23,210 DNA sequences alignments from human, chimpanzee, gorilla, orangutan and rhesus, we present a map of human genetic ancestry. For about 23% of our genome we share no immediate genetic ancestry with our closest living relative, the chimpanzee. This encompasses genes and exons to the same extent as intergenic regions. We, conclude that about 1/3 of our genes evolved as human specific lineages before the differentiation of human, chimps and gorillas took place. This explains recurrent findings of very old human specific morphological traits in the fossils record, which predate the recent emergence of the human species about 5 million years ago. Furthermore, the sorting of such ancestral phenotypic polymorphisms in subsequent speciation events provides a parsimonious explanation why evolutionary derived characteristics are shared among species that are not each other's closest relatives.
PLGW01 7th September 2007
10:00 to 10:20
Consistency of phylogenetic tree search algorithms based on the balanced minimum evolution principle

In phylogenetics, it is common practise to infer evolutionary or phylogenetic trees by searching tree-space, that is, searching through the collection of all possible phylogenetic trees on a given set of species or taxa using topological rearrangements and some optimality criterion. Recently, such an approach called the balanced nearest neighbor interchange (BNNI) algorithm was introduced, which is based on the balanced minimum evolution (BME) principle. This algorithm takes as input a distance matrix and a putative phylogenetic tree on a given set of species. It modifies this tree using nearest neighbor interchange (NNI) operations in such a way that the total BME length of the resulting tree decreases relative to the distance matrix. This process is repeated until a tree with (locally) shortest BME length is found. Guided by numerous computer simulations it has been conjectured that the BNNI algorithm is consistent, that is, when it is applied to an input distance that is a tree-metric, then it will always converge to the (unique) tree corresponding to that metric. To date this conjecture remains open. Here, however, we prove that a related algorithm | which we call the BSPR algorithm for short | that results from replacing NNI operations by the more general subtree prune and regraft (SPR) operations in the BNNI algorithm is, in fact, consistent. Moreover, we show that even if the input distance matrix contains (small) errors relative to the tree-metric, then the BSPR algorithm will still return the corresponding tree.

PLGW01 7th September 2007
10:20 to 10:40
Optimal and hereditarily optimal realisation of metric spaces

Given a finite set X and a metric d on X, a realization of (X,d) is a weighted graph G=(V,E,w) such that X is a subset of V and the distance d(x,y) between any pair of elements in X is the shortest distance between x and y in G. A realization such that the total edge weight of G is minimal is known as an optimal realization. It is well-known that if there exists a tree which is a realization of (X,d) then this tree is the unique optimal realization. In general however, optimal realizations are not trees, not necessarily unique, and notoriously hard to find. It has been suggested by Andreas Dress that optimal realizations may be possible to find by deleting edges in a larger graph known as the hereditarily optimal realization of the given metric, which is unique and can be explicitly calculated. Both optimal and hereditarily optimal realizations can be useful concepts in phylogenetics, especially when studying e.g. plants or viruses, where a phylogenetic tree is not necessarily the best representation of the evolutionary relationships.

We will outline some recent progress on this problem, showing both that optimal realizations can be found within hereditarily optimal realizations for all metrics on less than five elements, but also that there exist counterexamples, which in turn give rise to new questions.

PLGW01 7th September 2007
10:40 to 11:00
Properties of the TBR metric

Tree rearrangement operations are used in computational biology as a measure of the similarity between two binary phylogenetic trees on the same leaf set. For unrooted trees, the three operations of primary interest are nearest-neighbour interchange (NNI), subtree prune-and-regraft (SPR) and tree bisection-and-reconnection (TBR). All three of these have been shown to induce metrics on the set of unrooted binary phylogenetic trees.

The unit-neighbourhood of a tree T is the set of all trees that can be obtained by performing exactly one tree rearrangement on T. For both NNI and SPR, the size of this unit-neighbourhood is independent of the shape of T, and depends only on the size of the leaf set. The same is not true for TBR however, although both upper and lower bounds are known.

We present a recursion for calculating the size of the TBR unit- neighbourhood for any tree in the space of unrooted binary phylogenetic trees with n leaves. We also use this to establish improved upper and lower bounds on this size, and characterise those trees for which the given upper bound is sharp.

PLGW01 7th September 2007
11:30 to 12:30
Stars, nets, and the "war on error": Six mathematical challenges