EMBO Workshop on Current Challenges and Problems in Phylogenetics
Monday 3rd September 2007 to Friday 7th September 2007
08:30 to 09:55  Registration  
09:55 to 10:00  David Wallace  Welcome  INI 1  
10:00 to 11:00 
Computational and mathematical challenges involved in very largescale phylogenetics Phylogenetic inference presents enormous computational and mathematical challenges, but these are particularly exacerbated when dealing with very large datasets (containing thousands of sequences) or when sequences evolve under complex models of evoluiton. In this talk, I will describe some of the recent progress in largescale phylogenetics. In particular, I will talk about multiple sequence alignment and its implications for largescale phylogenetics. Related Links

INI 1  
11:00 to 11:30  Coffee  
11:30 to 11:50 
Dendroscope  an interactive viewer for large phylogenetic trees Research in evolution requires software for visualizing and editing phylogenetic trees, for increasingly very large datasets, such as arise in expression analysis or metagenomics, for example. It would be desirable to have a program that provides these services in an efficient and userfriendly way, and that can be easily installed and run on all major operating systems. Although a large number of tree visualization tools are freely available, some as a part of more comprehensive analysis packages, all have drawbacks in one or more domains. They either lack some of the standard tree visualization techniques or basic graphics and editing features, or they are restricted to small trees containing only tens of thousands of taxa. Moreover, many programs are difficult to install or are not available for all common operating systems. We have developed a new program, Dendroscope, for the interactive visualization and navigation of phylogenetic trees. The program provides all standard tree visualizations and is optimized to run interactively on trees containing hundreds of thousands of taxa. The program provides tree editing and graphics export capabilities. To support the inspection of large trees, Dendroscope offers a magnification tool. The software is written in Java 1.4 and installers are provided for Linux/Unix, MacOS X and Windows XP. Dendroscope is a userfriendly program for visualizing and navigating phylogenetic trees, for both small and large datasets. Related Links

INI 1  
11:50 to 12:10 
Appropriate models for heterogeneous multigene data sites Multigene data sets are becoming the norm for phylogenetic studies; so called phylogenomic datasets may involve hundreds of genes for many species. These data sets create challenges for current phylogenetic methods, as different genes have different functions and hence evolve under different processes. The question is how best to model this heterogeneity to give reliable phylogenetic estimates of the species tree. Here we discuss two approaches. The first approach involves stochastic partitioning of genes into different classes, the second approach is a form of penalised maximum likelihood where the penalty term 'encourages' parameter estimates to be similar between different gene trees. 
INI 1  
12:10 to 12:30 
H Li ([Wellcome Trust Sanger]) Incorporating speices phylogeny in the reconstruction of gene trees Reconstructing accurate trees is one of the most challenging topics in phylogenetics. In this talk, two new algorithms will be presented in attempt to improve the accuracy of gene tree building by incorporating species phylogeny. Both of them are presented in the CFG (ContextFree Grammar) framework. The first algorithm is called tree merge, which takes a set of gene trees built by different algorithms and then merges them together to get a new tree that can be better than all the input. The second algorithm calculates a likelihood with respect to the species evolution and incorporates this likelihood to the likelihood calculated from the sequence alignment. Maximum likelihood tree can then be searched. The two algorithms have been implemented in NJTREE software which is the core engine of the TreeFam database. NJTREE is also extensively used in the orthology prediction pipeline of Ensembl Compara, and is proved to be more accurate than traditional algorithms in reconstructing phylogenetic trees in a multigene family. Related Links

INI 1  
12:30 to 13:30  Lunch at Wolfson Court  
14:00 to 15:00 
Mixture models in phylogenetic inference I describe a general likelihoodbased mixture model approach for inferring phylogenetic trees from genesequence or other character state data. Conventional models of genesequence evolution assume that all sites evolve according to a single homogeneous model of evolution or that rates of evolution vary among sites according to some statistical distribution, such as the gamma. In a phylogenetic context, a mixture model allows for more than one model of the evolutionary process to be fitted to each site in the alignment, with the likelihood being summed over models at each site. I describe how mixture modelling can accommodate cases in which different sites in the alignment evolve in qualitatively distinct ways, a phenomenon we call pattern heterogeneity. Mixture models can also accommodate sites whose rate of evolution varies in different parts of the tree. This is often called heterotachy and I show how a mixture model approach can improve on a parametric covarion model for heterotachous data, and how the model can be used to identify which sites are evolving heterotachously and where in the tree this behaviour occurs. The mixture model does not require prior knowledge of the patterns or of the nature of the heterotachy in the data, nor does it partition the data. We present studies to show that the model correctly retrieves known patterns from simulated genesequence data, and present evidence that it generally improves the likelihood of the data substantially over simpler models of genesequence evolution. Mixture modelling seems a promising technique for discovering some of the complex evolutionary processes that underly sequence evolution, and may be of interest to researchers who wish to identify how specific sites respond adaptively to particular environments. We implement the model within a Bayesian MarkovChain Monte Carlo framework, and make it available in our BayesPhylogenies software (www.evolution.rdg.ac.uk). 
INI 1  
15:00 to 15:30  Tea  
15:30 to 15:50 
G Margos ([Bath]) Multilocus sequence analysis of Borrelia burgdorferi s.l. Lyme borreliosis, the most frequent vectorborne disease in the Northern Hemisphere, is caused by spirochaetes belonging to the genus Borrelia. These bacteria now comprise 12 named "species", thus forming a species complex referred to as Borrelia burdorferi sensu lato. All known genotypes of this species complex are maintained by vertebrate hosts and ticks, however, their maintenance systems and pathogenic properties are remarkably diverse. A major question addressed in our project is whether genotypically and ecologically defined Borrelia populations are congruent. So far, most genotyping schemes of Borrelia are based on noncoding spacer regions and genes encoding outer surface proteins, many of which are under frequencydependent immune selection or prone to recombination. In order to better resolve the geographical population structure and the phylogenetic relationships of these bacteria, we are currently developing and exploring multilocus sequence analysis (MLSA) based on housekeeping genes. In this talk I will present some 'proof of principle' data on the suitability of MLSA for genotyping spirochaetal strains. 
INI 1  
15:50 to 16:10 
Comparing phylogenetic trees Comparing phylogenetic trees is an important part of analyzing biological data sets. Many of the popular distance metrics are NPhard but have been shown to efficient to approximate and tractable in small cases. We discuss recent results in comparing two phylogenetic trees under the treebisectionandreconnection (TBR) and subtreepruneandregraft (SPR) distances. 
INI 1  
16:10 to 16:30 
Phylogenetic reconstructions based on RNA secondary structure It is known that RNA secondary structure motifs are usually conserved, and have a vital influence on the activity of the RNA molecules. We present a novel approach for phylogenetic reconstruction, based on comparing secondary structures of homologous RNA molecules from different species. By manipulating the RSmatch algorithm for measuring RNA secondary structure similarity, we computed pairwise distances between secondary structures. Consequently, we applied the neighbor joining algorithm on the distance matrix, to produce phylogenetic trees. Initial analysis of the results exhibits a good agreement with the accepted taxonomy. Comparison with the trees based RNA sequences alone have also produced favorable results. This is joint work with Moran Cabili, Assaf Meirovich, and Metsada PasmanikChor 
INI 1  
16:45 to 18:30  Welcome Wine Reception and poster session  
18:45 to 19:30  Dinner at Wolfson Court (Residents only) 
09:00 to 10:00 
Advances and limitations of maximum likelihood phylogenetics I'll present recent developments in inferring phylogenies from sequences by maximum likelihood (ML). In fact, this approach involves two main components: probabilistic modelling of substitution events, and algorithmics to infer nearoptimal trees. I'll show that algorithmics has considerably progressed during the last few years, allowing now for fast ML inference from large datasets. On the other hand, I'll underline the difficulties encountered with modelling, especially of protein evolution, and show that elaborating more realistic and improved substitution models is a major challenge. 
INI 1  
10:00 to 10:20  Restrictions on meaningful phylogenetic networks  INI 1  
10:20 to 10:40 
Inference rules for supernetwork construction A fundamental aspect of understanding molecular evolution is species phylogeny reconstruction. A commonly used approach in this context is to infer it from a collection of gene tress using, for example, a supertree or supermatrix method. Although popular, one of the main drawbacks of such a method is that it can suppress conflict within the gene trees arising from, for example, lateral gene transfer or recombination. Until now, the only known approaches capable of accommodating such conflict have been Zclosure supernetworks and Qimputation supernetworks. In this talk, we introduce the novel Yinference rule and show that under certain circumstances this inference rule on its own or in combination with one Meacham's inference rules constitute an attractive alternative approach that complements the above list of supernetwork approaches. This is joint work with Stefan Gruenewald (ASMPG Partner Institute for Computational Biology, China) and Qiong Wu (University of East Anglia, UK). 
INI 1  
10:40 to 11:00 
Maximum parsimony on phylogenetic networks: some rigorous results Phylogenies, the evolutionary histories of groups of organisms, play a major role in representing the interrelationships among biological entities. Although many biological processes can be effectively modeled and summarized as a binary tree, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks of relationships rather than trees of relationships. Maximum parsimony (MP) is one of the most popular methods used for phylogenetic tree reconstruction. Roughly, this method is based on the assumption that "evolution is parsimonious", that is, the best evolutionary trees are the ones that minimize the number of changes along the edges of the tree. In previous works, we formulated a maximum parsimony (MP) criterion for reconstructing and evaluating phylogenetic networks, and demonstrated its quality on biological as well as synthetic datasets. In this work, we show further theoretical insights and results regarding various aspects of the MP criterion of phylogenetic networks. 
INI 1  
11:00 to 11:30  Coffee  
11:30 to 11:50 
Integer programming for phylogenetic computations We consider the following three related problems: 1) Given complete data (taxon by character data) remove the minimum number of characters so that the remaining data is pairwise compatible (contains a perfect phylogeny); 2) Given data with missing entries, impute the missing data in order to minimize, in the completed data, the number of pairs of characters that are incompatible; 3) Given data with missing entries, impute the missing data to minimize, in the completed data, the number of characters that must be removed so that the remaining sites are pairwise compatible. We consider in detail the cases of two, three and four states allowed per character and describe compact integer programming formulations that solve these problems (the approach extends in principle to any fixed number of states). We discuss empirical tests on problem instances of size of interest in phylogenetics, and establish that these instances can be solved very efficiently in practice, even though the problems are all NPhard. Related Links

INI 1  
11:50 to 12:10 
Quartet compatability: new results and surprising counterexamples Many methods in phylogenetic tree reconstruction are not feasible for a large number of taxa. Hence, it is a natural approach to construct trees on small taxa sets first and then to look for a big tree (supertree) on the union of the taxa sets that contains all information of the small trees. However, even if all input trees are quartets, i.e. binary trees with four leaves, it is an NPcomplete problem to decide whether they are compatible. I will present a sufficient condition for a set of binary phylogenetic trees to be compatible. That result can be used to give a much shorter proof of the known characterization of quartet sets of minimal size which are defining, i.e. compatible with a unique supertree. Further, I will show some quartet sets with surprising properties which are counterintuitive and explain the hardness of the compatibility problem. One example is an incompatible collection of quartets for which every two quartets overlap in at most one taxon. It has been known for fifteen years that such quartet sets exist but the example is the first one that has been constructed. 
INI 1  
12:10 to 12:30 
On kleaf powers In 2002, Nishimura, Ragde and Thilikos introduced the notion of kleaf power and kleaf root, motivated by the following: ``... a fundamental problem in computational biology is the reconstruction of the phylogeny, or evolutionary history, of a set of species or genes, typically represented as a phylogenetic tree ...''. The species occur as leaves of the phylogenetic tree. Let G=(V_G,E_G) be a finite undirected graph. For k at least 2, a tree T is a $k$leaf root of G if V_G is the leaf set of T and two vertices x,y in V_G are adjacent in G if and only if their distance d_T(x,y) in T is at most k. The graph G is a $k$leaf power if it has a kleaf root. Obviously, a graph is a 2leaf power if and only if it is the disjoint union of cliques, i.e., it contains no induced path of three vertices. Recent work on leaf powers contains characterisations of 3 and 4leaf powers and their variants. It is well known that every kleaf power is a strongly chordal graph. For k at least 6, no characterisation of $k$leaf powers and no efficient recognition is known. We discuss some open problems and new results on leaf powers. 
INI 1  
12:30 to 13:30  Lunch at Wolfson Court  
14:00 to 15:00 
Tripartitions in phylogenetic networks Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of reticulate evolution events, like recombination, hybridization, and lateral gene transfer. In a recent series of papers devoted to the study of reconstructibility of phylogenetic networks, the wellknown bipartition metric for phylogenetic trees was generalized to the socalled tripartition metric for phylogenetic networks. In this talk, we show that, in fact, this tripartition metric does not satisfy the separation axiom of distances (zero distance means isomorphism, or, in a more relaxed version, zero distance means indistinguishability in some specific sense) in any of the classes of phylogenetic networks where it is claimed to do so. We also present a class of phylogenetic networks whose members can be singled out by means of their sets of tripartitions, and hence, where the latter can be used to define a meaningful metric. Related Links

INI 1  
15:00 to 15:30  Tea  
15:30 to 15:50 
Branch and bound construction of balanced minimum evolution optimal trees The Balanced Minimum Evolution (BME) score has been recently proposed as a criterion for reconstructing phylogenetic trees based on a distance matrix. It is based on Pauplin's formula, which provides a natural estimate of the total length of a tree, based on its topology and a matrix of estimated pairwise distances. The objective is to find the tree topology that minimizes this length estimate, as short trees are usually the ones that best reflect the data. Recently, it has been shown that Neighbor Joining can be viewed as a greedy algorithm trying to minimise BME. Together with other theoretical reasons, there are strong experimental reasons supporting BMEguided tree reconstruction. However, the published methods are heuristic and do not attempt to construct BMEoptimal trees. The main aim of this talk will be to present a Branch and Bound approach for finding BMEoptimal trees. We derived a simple bound on the BME score of a tree based on the score of a partially constructed tree. This allows us to eliminate the need to explore large parts of the space of all possible trees, but still guarantees that all optimal trees will be found. The efficiency of this approach compares well with that of other Branch and Bound approaches such as the ones for Maximum Parsimony. Finally the topological accuracy of the reconstructed trees will also be discussed. 
INI 1  
15:50 to 16:10 
New analytical results for the speciation times in neutral models Did a reconstructed phylogeny evolve under a neutral model? We provide an analytical way opposed to the common simulation approach for comparing the speciation times of a reconstructed tree with the dates proposed by the neutral models, e.g. via lineagesthroughtime plots. Further, if dates are missing in a reconstructed tree which evolved under a neutral model, we provide the time of the speciation events. These methods, which make use of the rank function on a phylogenetic tree, can also be applied for calculating analytically the loss in biodiversity when some species become extinct. The neutral models considered in this work are the Yule model and a generalization which includes extinction as well, the critical branching process (CBP) model, recently introduced by D. Aldous and L. Popovic. Trees under these neutral models are conditioned on having $n$ extant species today which is crucial for data analysis. 
INI 1  
16:10 to 16:30 
Partitioning the sample space on five taxa for the neighbour joining algorithm In this talk, we will analyze the behavior of the Neighbor Joining algorithm on five taxa and we will show that the partition of the sample (data) space for estimation of a tree topology with five taxa into subspaces, within each of which the Neighbor Joining algorithm returns the same tree topology. A key of our method to partition the sample space is the action of the symmetric group $S_5$ on the set of distance matrices by changing the labels of leaves. The method described in this paper can be generalized to trees with more than five taxa. This is joint work with Kord Eickmeyer and a preprint is available at http://arxiv.org/abs/math.CO/0703081. Related Links 
INI 1  
18:45 to 19:30  Dinner at Wolfson Court (Residents only) 
09:00 to 10:00 
Reconstruction of phylogeny by stratophenetic tracing in the fossil record and its comparison with molecular data The information on the relatedness between taxa and the direction of evolution are only indirectly coded in the input data of all molecular phylogenies. Fossil record, on the other hand, contains direct information on the order, absolute timing and nature of evolutionary events. There is, however, a small hitch: the fossil record only preserves a fraction of the phylogenetically relevant information and it is often so patchy and incomplete, that reconstructions of phylogeny based on fossil data are not superior to molecular methods. A notable exception is presented by the fossil record of marine microplankton, in particular planktonic foraminifera. Their calcite shells are rich in characters and their prolific production and excellent preservation in oceanic sediments produced probably the most complete fossil record on Earth. Here, phylogenetic relationships can be reconstructed by the method of stratophenetic tracing the nearest one can come to direct observations from a time machine where both the topology and the timing of divergences are inferred by backpeeling of layers of sediments and tracking changes in morphology of the evolving species through time. The fossil record of planktonic foraminifera thus offers a unique possibility to ground truce molecular phylogenies, the methods that produce them and the assumptions and processes on which the methods are based. We will present a comparison between fossil phylogeny of one monophylum of planktonic foraminifera with corresponding SSU rDNA phylogenies, showing unequivocal direct evidence for longbranch artefacts, enormous differences in substitution rates and incongruent tree topologies, underscoring the potential of foraminifera as a model organism in phylogenetics. 
INI 1  
10:00 to 10:20 
Explicit bounds for the stability of maximum likelihood trees The application of the maximum likelihood framework for inferring phylogenetic trees is based on the adoption of an explicit evolution model on which depend both the analysis of sequence evolution and the computation of the likelihood of a tree. Since this computation is based on the observed nucleotides, which are only in finite number, the likelihood has a random component, possibly large, and the robustness of the inferred tree has to be assessed. Felsenstein's bootstrap test, along with the subsequently developed bootstrapbased tests, is the most commonly used test of reliability of a tree. However, it discards the relation between the size of the data, the number of species in the study and the stability of the phylogeny. The bootstrap is also based on resampling with replacement, which can be time consuming especially for large sets of data. We propose to bound the variability of the empirical likelihood around its true value, for a given phylogeny. We also bound the probability of a phylogeny being better than another one "just by chance" when in reality it is worse. These bounds, obtained with measure concentration tools, account for the number of species and the number of nucleotides. In particular, they give the minimum number of nucleotides needed to achieve a given confidence level, when working with a given number of species. Finally, to illustrate the behaviour of our method, we compare it to bootstrap on a toy example. 
INI 1  
10:20 to 10:40 
D Metzler ([Frankfurt]) Insertiondeletion models for dequence evolution and Bayesian sampling methods for multiple alignment In current practice the estimation of a phylogenetic tree is a twostep procedure: first a multiple alignment is computed and subsequently a phylogenetic tree is reconstructed, based on the alignment. However, it is well known that the alignment and the tree reconstruction problem are intertwined. Thus, it is of great interest to estimate alignment and tree simultaneously. A precondition for this is the availability of biologically plausible sequence evolution models which are compatible with efficient alignment algorithms. We discuss a variant of the ThorneKishinoFelsenstein model, having equal rates of insertions and of deletions of sequence fragments, for more than two sequences related by a phylogenetic tree. Finally, we discuss the perspectives for statistical multiple alignment based on this type of model. Most parts of this talk are based on joint work with Roland Fleissner, Arndt von Haeseler, Ana ArribasGil, and Anton Wakolbinger. Related Links

INI 1  
10:40 to 11:00 
Indentifiability of the GTR=gamma substitution model of DNA evolution The GTR+Gamma+I model of DNA substitution is probably the most widely used model in practical phylogenetic inference. However, a valid proof of the identifiability of its parameters (tree topology, root distribution, rate matrix, proportion of invariable sites, and alpha shape parameter) has not been given. After indicating the gaps in the previouslypublished proof of the GTR+Gamma+I model, and discussing various mathematical approaches for establishing identifiability results, a sketch of a recent proof of the identifiability of GTR+Gamma will be given. This is joint work with Cecile Ane' and John Rhodes. 
INI 1  
11:00 to 11:30  Coffee  
11:30 to 12:30  Likelihood calculations on the ancestral recombination graph  INI 1  
12:30 to 13:30  Lunch at Wolfson Court  
18:45 to 19:30  Dinner at Wolfson Court (Residents only) 
09:00 to 10:00 
Model averaging in phylogenies It is well known that the use of different probabilistic models of evolution may change the results of phylogenetic analyses, and during the last years much research has been devoted to the selection of bestfit models. An alternative to model selection is model averaging or multimodel inference, where all models are simultaneously used to infer different parameters of the substitution process, or the phylogenetic tree (topology and branches) itself. Here I will discuss some preliminary simulations that suggest that modelaveraging might not be very useful to estimate substitution parameters, but that might reduce the variance associated to the estimation of tree topology. New software that implements model averaging for phylogenetics will also be introduced. Related Links

INI 1  
10:00 to 10:20 
FA Matsen ([California]) Glimpses from the strange world of phylogenetic mixtures Rates of evolution and evolutionary history may vary across the genome of a given organism. The standard way of modeling the resulting data is a "mixture model" where a given phylogenetic pattern comes from one of a collection of trees with fixed probabilities. In this talk I will focus on recent results (joint with Mike Steel and Elchanan Mossel) which investigate what pattern probabilities are and are not possible when rates of evolution vary on a single topology. In particular, we have shown that a mixture of two processes on a tree of one topology can produce exactly the same expected site pattern frequencies (i.e. sequence data) as an unmixed process on a different topology. For mixtures of a number of processes the situation is even more dire: in a certain sense the topology is not uniquely determined for a majority of data vectors. I will also present other related results along with the associated mathematics and indicate some interesting future directions for research. 
INI 1  
10:20 to 10:40 
Neighbor joining algorithms for inferring phylogenies via LCAdistances Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology from both practical and theoretical considerations. We study algorithms which are based on a characterization of edgeweighted trees by distances to LCAs ({\em Least Common Ancestors}). This characterization enables a direct application of ultrametric reconstruction techniques to trees which are not necessarily ultrametric. A simple and natural neighbor joining criterion based on this observation is used to provide a family of efficient neighborjoining algorithms. These algorithms are shown to reconstruct a refinement of the Buneman tree, which implies optimal robustness to noise under criteria defined by Atteson. In this sense, they outperform many popular algorithms such as Saitou\&Nei's NJ. Preliminary experimental results indicate that when executed from an appropriate root taxon, this algorithm provides reconstruction of phylogenies which are competitive with NJ and other common algorithms. 
INI 1  
10:40 to 11:00 
Reconstruction of phylogenetic trees with very short branches One of the central challenges in phylogenetics is the accurate reconstruction of phylogenetic trees from short taxonsequences. The sequence length required for correct topological reconstruction depends on certain properties of the tree such as its depth and edgeweights. Of special interest are fast converging algorithms, which, under the assumption of a constant lower bound on edge weights, require only polynomial length sequences to guarantee correct topological reconstruction with high probability. However, when the original phylogenetic tree contains very short edges, the required sequencelength may be too long for practical purposes. Moreover, incorrect reconstruction of short edges may (and in practice often does) propagate in the sense that it prevents the correct reconstruction of long edges as well. Therefore, most known reconstruction algorithms assume that the original tree has a fully resolved topology and all its edges are sufficiently long. In this talk we present a fast converging reconstruction algorithm which avoids this assumption. The algorithm returns a partially resolved topology which contains all sufficiently long edges of the original tree, and no edges which contradict the original topology. It is designed to reconstruct more edges when the input sequences grow longer. Unlike existing fast converging reconstruction algorithms, this method provides a reconstruction guarantee for every taxonsequence length. This makes it more appropriate for practical use. 
INI 1  
11:00 to 11:30  Coffee  
11:30 to 11:50 
Nature reserve selection problems The Nature Reserve Selection Problem (NRS) is a problem that arises in the context of conservation biology. Subject to budgetary constraints, the problem is to select a set of conservation areas to preserve so that the phylogenetic diversity of the species contained within those areas is maximized. It was recently shown that NRS is NPhard. In this talk, we give a tight polynomialtime approximation algorithm for NRS and describe a closelyrelated problem for which little is known. 
INI 1  
11:50 to 12:10 
Computing phylogenetic diversity for split systems In conservation biology, it is important to measure, predict, and preserve biodiversity as many species are facing extinction. In 1992, Faith proposed measuring the diversity of a collection of species in terms of their relationship on a phylogenetic tree, and using this information to identify collections of species with high diversity. Here we are interested in some variants of the resulting optimisation problem that arise when considering species whose evolution is better represented by a network rather than a tree. More specifically, we consider the problem of computing phylogenetic diversity relative to split systems. We show that for general split systems, this problem is NPhard. In addition we provide some efficient algorithms for some special classes of split systems, in particular presenting an optimal O(n) time algorithm for phylogenetic trees and an O(n log n + nk) time algorithm for circular split systems. Keywords: Phylogenetic tree, Phylogenetic network, Phylogenetic diversity, Biodiversity conservation, Split systems 
INI 1  
12:10 to 12:30 
K Hartmann ([Canterbury]) Prioritising species for biodoversity conservation Conservation organisations are often faced with the problem of allocating limited funds to the conservation of many needy species. The ultimate goal is to ensure that the expected future biodiversity is as high as possible. Other factors that need to be considered are the costs of conserving different species and the expected increase in their survival probability that conservation would provide them. The Noah's Ark Problem (NAP) is a mathematical framework that formalises this problem and uses phylogenetic diversity as a measure of biodiversity. Whilst much mathematical work has been done on the NAP actual applications of it to real problems has been limited. In this talk I consider some common criticisms of the NAP. I will also briefly introduce a new software package I have developed that bundles common algorithms for the NAP together with a graphical interface. Related Links

INI 1  
12:30 to 13:30  Lunch at Wolfson Court  
14:00 to 15:00 
Big trees: challenges and opportunites in the phylogenetic analysis of large data sets It is becoming easier to obtain genetic sequences. Many of these will be used for phylogenetic analyses. But these analyses have traditionally been hampered by the size of the data (usually, the number of sequences) that can be analysed satisfactorily. In this talk, I consider three problems that have arisen as a consequence of having "too much" data. First, supertree reconstruction may provide some relief with large datasets, but what is the best way of reconstructing supertrees? I explore the possibility of constructing maximumlikelihood estimates of supertrees, and discuss some of the features that such trees may possess. Second, I consider how we can break up datasets to make satisfactory inferences. Finally, I look at the new generation of automatic sequencing machines, and the promise of large amounts of data. I outline some computational approaches that may be useful in analysing this type of data. This talk provides very little in the way of concrete results but a great deal of informed speculation. It is not my intention to instruct, but to stimulate discussion. 
INI 1  
15:00 to 15:30  Tea  
15:30 to 15:50 
BC Emerson ([East Anglia]) Mitochondrial DNA heteroplasmy and recombination in a reptilian secondary contact zone Inheritance of mitochondrial DNA in animals is thought to be both strictly maternal and nonrecombining. Nevertheless, in the last decade evidence for recombination of mitochondrial DNA has been observed in several animal species, but few reports come from natural populations and most of the time the frequency of individuals possessing more than one mtDNA type is very low. This talk presents evidence for extensive heteroplasmy and mitochondrial recombination in a region of secondary contact of a lizard species (Lacerta lepida). Individuals at the centre of the contact zone possess heteroplasmic point mutations at the mtDNA cytochrome b gene. The heteroplasmic condition was further tested by cloning 20 individuals at the centre of the zone and sequencing 8 to 10 clones per individual. All individuals tested had recombinant mtDNA sequences, often more than one. The results suggest that mtDNA recombination was derived from the fusion of the maternal and paternal mtDNAs due to paternal leakage, which is thought to be more frequent in hybrid zones. To our knowledge this is the first study reporting such extensive heteroplasmy and recombination in the mtDNA genome in natural populations of a species with typical maternal mtDNA inheritance. Our findings suggest that mtDNA may recombine regularly and that secondary contact zones, where highly divergent molecules come into contact, may provide natural laboratories for studying this phenomenon. 
INI 1  
15:50 to 16:10 
Generalising from consensus to supertree methods There have been a number of attempts to generalise from consensus methods to supertree analogues that are equivalent to the consensus method in the consensus case and which preserve desirable properties of the method beyond consensus. Recently, Cotton and Wilkinson (2007) used a characterisation of the classical majorityrule consensus tree in terms of its minimisation of the sum of consensus tree to input tree distances as a basis for generalisation to the supertree case. Here I use the same general approach to generalise from strict, loose (= semistrict) classical and reduced consensus to the corresponding supertree methods. Not all properties of the consensus methods can be simultaneously generalised beyond consensus and not all properties are desirable in the supertree case. Useful generalisation requires that we distinguish between those consensus properties that are 'essential' or 'most desirable' beyond consensus and those that seem to be merely a consequence of the special case of consensus. The loose supertree method is compared to other supertree methods, particularly to an alternative generalisation of the semistrict consensus due to Goloboff and Pol (2002) which, it is argued, preserves consensus properties that are unhelpful more generally. Cotton, J. A., and M. Wilkinson. 2007. Majorityrule supertrees. Systematic Biology 56: 445452. Goloboff, P. A., and D. Pol. 2002. Semistrict supertrees. Cladistics 18: 514525. 
INI 1  
19:30 to 23:00  Conference Dinner  Christ's College Dining Hall 
09:00 to 10:00 
Mapping human genetic ancestry
The human genome is a mosaic with respect to its evolutionary history. Based on a phylogenetic analysis of 23,210 DNA sequences alignments from human, chimpanzee, gorilla, orangutan and rhesus, we present a map of human genetic ancestry. For about 23% of our genome we share no immediate genetic ancestry with our closest living relative, the chimpanzee. This encompasses genes and exons to the same extent as intergenic regions. We, conclude that about 1/3 of our genes evolved as human specific lineages before the differentiation of human, chimps and gorillas took place. This explains recurrent findings of very old human specific morphological traits in the fossils record, which predate the recent emergence of the human species about 5 million years ago. Furthermore, the sorting of such ancestral phenotypic polymorphisms in subsequent speciation events provides a parsimonious explanation why evolutionary derived characteristics are shared among species that are not each other's closest relatives.

INI 1  
10:00 to 10:20 
Consistency of phylogenetic tree search algorithms based on the balanced minimum evolution principle In phylogenetics, it is common practise to infer evolutionary or phylogenetic trees by searching treespace, that is, searching through the collection of all possible phylogenetic trees on a given set of species or taxa using topological rearrangements and some optimality criterion. Recently, such an approach called the balanced nearest neighbor interchange (BNNI) algorithm was introduced, which is based on the balanced minimum evolution (BME) principle. This algorithm takes as input a distance matrix and a putative phylogenetic tree on a given set of species. It modifies this tree using nearest neighbor interchange (NNI) operations in such a way that the total BME length of the resulting tree decreases relative to the distance matrix. This process is repeated until a tree with (locally) shortest BME length is found. Guided by numerous computer simulations it has been conjectured that the BNNI algorithm is consistent, that is, when it is applied to an input distance that is a treemetric, then it will always converge to the (unique) tree corresponding to that metric. To date this conjecture remains open. Here, however, we prove that a related algorithm  which we call the BSPR algorithm for short  that results from replacing NNI operations by the more general subtree prune and regraft (SPR) operations in the BNNI algorithm is, in fact, consistent. Moreover, we show that even if the input distance matrix contains (small) errors relative to the treemetric, then the BSPR algorithm will still return the corresponding tree. 
INI 1  
10:20 to 10:40 
Optimal and hereditarily optimal realisation of metric spaces Given a finite set X and a metric d on X, a realization of (X,d) is a weighted graph G=(V,E,w) such that X is a subset of V and the distance d(x,y) between any pair of elements in X is the shortest distance between x and y in G. A realization such that the total edge weight of G is minimal is known as an optimal realization. It is wellknown that if there exists a tree which is a realization of (X,d) then this tree is the unique optimal realization. In general however, optimal realizations are not trees, not necessarily unique, and notoriously hard to find. It has been suggested by Andreas Dress that optimal realizations may be possible to find by deleting edges in a larger graph known as the hereditarily optimal realization of the given metric, which is unique and can be explicitly calculated. Both optimal and hereditarily optimal realizations can be useful concepts in phylogenetics, especially when studying e.g. plants or viruses, where a phylogenetic tree is not necessarily the best representation of the evolutionary relationships. We will outline some recent progress on this problem, showing both that optimal realizations can be found within hereditarily optimal realizations for all metrics on less than five elements, but also that there exist counterexamples, which in turn give rise to new questions. Related Links

INI 1  
10:40 to 11:00 
Properties of the TBR metric Tree rearrangement operations are used in computational biology as a measure of the similarity between two binary phylogenetic trees on the same leaf set. For unrooted trees, the three operations of primary interest are nearestneighbour interchange (NNI), subtree pruneandregraft (SPR) and tree bisectionandreconnection (TBR). All three of these have been shown to induce metrics on the set of unrooted binary phylogenetic trees. The unitneighbourhood of a tree T is the set of all trees that can be obtained by performing exactly one tree rearrangement on T. For both NNI and SPR, the size of this unitneighbourhood is independent of the shape of T, and depends only on the size of the leaf set. The same is not true for TBR however, although both upper and lower bounds are known. We present a recursion for calculating the size of the TBR unit neighbourhood for any tree in the space of unrooted binary phylogenetic trees with n leaves. We also use this to establish improved upper and lower bounds on this size, and characterise those trees for which the given upper bound is sharp. 
INI 1  
11:00 to 11:30  Coffee  
11:30 to 12:30  Stars, nets, and the "war on error": Six mathematical challenges  INI 1  
12:30 to 12:35  Closing remarks  
12:35 to 13:30  Lunch at Wolfson Court  
18:45 to 19:30  Dinner at Wolfson Court (Residents only) 