08:30 to 09:45 Registration 09:45 to 10:00 Opening remarks and welcome from Dr Ben Mestel (INI Deputy Director) 10:00 to 11:00 Recent Progress on Multi-State Perfect Phylogeny (Compatibility) Problems The Multi-State Perfect Phylogeny Problem is an extension of the Binary Perfect Phylogeny (or Compatibility) Problem, allowing evolutionary characters to have more than two states. The problem is of interest due to prior elegant mathematical and algorithmic results, and due to integer data that is increasingly available. In 1975, Buneman showed a how to view the Multi-State problem as one of triangulating non-chordal graphs, but the result has not been widely exploited as a computational tool. In this talk, I discuss our recent work on Multi-State Perfect Phylogeny problems, mostly by exploiting Buneman's result. I will talk about three sets of results. The first set of results concerns problems that extend the multi-state problem: The Missing Data (MD) Problem, where some entries in the input are missing and the question is whether (bounded) values can be imputed so that the full data has a multi-state perfect phylogeny; The Character-Removal (CR) Problem, minimizing the number of characters to remove so that the resulting data has a multi-state perfect phylogeny. Using results on triangulation of non-chordal graphs, we establish new necessary and sufficient conditions for the existence of a perfect phylogeny with (or without) missing data. This leads to solutions of the MD and CR problems using integer linear programming. The second set of results concerns generalization of the famous four-gametes test" that characterizes the compatibility of binary data. Such a generalization was sought since the 1970s. Our new generalization establishes that 3-state data has a perfect phylogeny if and only if every subset of three characters has a 3-state perfect phylogeny. We also characterize the patterns in the data that prohibit a 3-state perfect phylogeny. We briefly discuss efforts to generalize this to more states. The third set of results concerns reductions of multi-state data to binary data, to the 2-SAT problem, and to sparser problem instances. INI 1 11:00 to 11:30 Morning coffee 11:30 to 11:50 The Complexity of the uSPR distance We show that subtree prune and regraft (uSPR) distance on unrooted trees is fixed parameter tractable with respect to the distance. We also make progress on a conjecture of Steel on the preservation of uSPR distance under chain reduction. INI 1 11:50 to 12:10 M Fischer ([CIBIV, Vienna])Revisiting the question how many characters are needed to reconstruct the true tree The question of how many sequence sites are required to recover the evolutionary relationship of the underlying species accurately is important for phylogeneticists. It is known that a particularly challenging problem for phylogenetic methods arises when a rapid divergence event occurred in the distant past, which leads to long pending branches and a short internal branch in the corresponding phylogenetic tree. While most previous approaches tackling this problem considered only 2-state models, we investigate the scenario based on all four (DNA) character states. Particularly, we analyze a binary unrooted 4-taxon phylogenetic tree with a short interior edge and pending edges of multiple length. In my talk, I will present an optimal branch length of the interior edge in this case and I will explain how many characters are at least needed to reconstruct the 'true' tree. INI 1 12:10 to 12:30 S Linz ([Tubingen])On the Complexity of the Quartet Challenge A collection of quartets (unrooted phylogenetic trees on four taxa) is compatible if there exists a phylogenetic tree that explains all ancestral relationships given by the quartets. It is well-known that deciding whether or not a set of quartets is compatible is an NP-complete problem. A natural extension of this problem asks if, given a phylogenetic tree T that is compatible with the input, is T the only such tree. Several graph-theoretic characterizations have been established to approach this problem, which is known as the Quartet Challenge. However, the complexity of this challenge remains open. In this talk, we show that the Quartet Challenge is ASP-complete (i.e. it is computationally hard to decide whether another solution exists) by using a particular type of polynomial-time reduction from a variation of the Betweenness problem. This is joint work with Maria Luisa Bonet (UPC, Spain) and Katherine St. John (CUNY, USA). INI 1 12:30 to 13:30 Lunch at Wolfson Court 14:00 to 15:00 Free Time 15:00 to 15:30 Afternoon tea 15:30 to 15:50 Metrics on Multi-labeled Trees: Interrelationships and Diameter Bounds Multi-labeled trees or MUL-trees, for short, are trees whose leaves are labeled by elements of some non-empty finite set $X$ such that more than one leaf may be labeled by the same element of $X$. This class of trees includes phylogenetic trees and tree shapes. MUL-trees arise naturally in, for example, biogeography and gene evolution studies and also in the area of phylogenetic network reconstruction. In this talk we introduce novel metrics which may be used to compare MUL-trees, most of which generalize well-known metrics on phylogenetic trees and tree shapes. These metrics can be used, for example, to better understand the space of MUL-trees or to help visualize collections of MUL-trees. In addition, we describe some relationships between the MUL-tree metrics that we present and also give some novel diameter bounds for these metrics. This is joint work with A. Spillner, University of Greifswald, Germany, and R. Suchecki and V. Moulton, both School of Computing Sciences, University of East Anglia, UK. INI 1 15:50 to 16:10 M Thuillard Identifying Lateral Transfers with Neighbor-Nets With the ever increasing availability of complete genomes, the importance of lateral transfers in evolution has been clearly recognized and increasing doubts are being raised about the feasibility of inferring a tree of life based on genome evolution. Identifying lateral gene transfers is therefore an important problem in evolutionary biology. A distance matrix in the Farris space that fulfills the Kalmanson inequalities can be exactly represented by an outer-planar split network or an X-tree. Using a simple model of lateral transfers one can show that a circular order of the taxa labeling the leaves of a phylogenetic tree is quite robust to lateral transfers. A circular order of the taxa is an order in which the taxa are encountered through a clockwise scanning of the tree embedded in the plane. If lateral transfers occur only between consecutive taxa in such an order, it can be shown that, within the usual distance-based framework for constructing phylogenetic trees, the o uter-planar network reconstructed by applying the Neighbor-Net algorithm furnishes a circular order of the nodes with the corresponding distance matrices in the Farris space fulfilling the Kalmanson inequalities. There are limits to the robustness of a circular order to lateral transfers. A new approach has been developed to deal with such cases. The approach combines the Neighbor-Net algorithm for computing phylogenetic networks with the Minimum Contradiction method. A subset of taxa, prescribed using Neighbor-Net, is obtained by measuring how closely the Kalmanson inequalities are fulfilled by each taxon. A criterion is then used to identify the taxa possibly involved in a lateral transfer between non-consecutive taxa. We illustrate the utility of the new approach by applying it to a distance matrix for Archaea, Bacteria and Eukaryota. Our new approach gives a way to use Neighbor-Nets to get further biological information, for instance to localize and identify lateral transfers or to detect large deviations to an X-tree or an outer-planar split network topology using distance matrices. In future, we expect that it will become increasingly important to develop new tools such as this that can allow one to extract useful biological information from trees and networks. INI 1 16:10 to 16:30 Exploring Treespace Phylogenies, or evolutionary histories, play a central role in modern biology, illustrating the interrelationships between species, and also aiding the prediction of structural, physiological, and biochemical properties. The reconstruction of the underlying evolutionary history from a set of morphological characters or biomolecular sequences is difficult since the optimality criteria favored by biologists are NP-hard, and the space of possible answers is huge. The number of possible phylogenetic trees for n taxa is (2n − 5)!!. Due to the hardness and the large number of possible answers, clever searching techniques and heuristics are used to estimate the underlying tree. We explore the underlying space of trees, under different metrics, in particular the nearest-neighbor-interchange (NNI), subtree-prune-and-regraft (SPR), tree-bisection-and-reconnection (TBR), and Robinson-Foulds (RF) distances. INI 1 16:30 to 16:50 B Eldon ([Oxford])Concordance between species trees and gene trees with multiple mergers of ancestral lineages Concordance between species trees and gene genealogies are considered for gene genealogies allowing multiple mergers of ancestral lineages. Donnelly and Kurtz, Pitman, and Sagitov independently introduced coalescent processes (Lambda coalescent) that allow asynchronous multiple mergers of ancestral lineages. Kingman's coalescent allows only two ancestral lineages to merge at a time. Lambda coalescents may be more appropriate than Kingman's coalescent for some marine organisms with high fecundity and broadcast spawning. Multiple mergers, ie allowing any number of ancestral lineages present each time to coalesce, complicates computations. Two methods will be presented to compute the concordance, one relies on enumeration of all possible sequences of mergers, and the other is the spectral decomposition of the rate matrix. Results will be presented for species trees consisting of two to four species, and for different Lambda coalescents. INI 1 16:50 to 17:10 Quantifying diversity and abundance in soil microbial communities from high-throughput sequence data Soil microbial communities are essential in many different ways. For instance, they support crop productivity and maintain environmental stability through their roles in the carbon, nitrogen, oxygen and sulphur cycles. They also play a central part in plant health and contribute in the bioremediation of contaminated soils. Understanding the complex structure of soil microbial communities is crucial to better manage agricultural soils and minimise the negative impact of agricultural practices. However, this poses important biological, statistical and computational challenges because of the vast numbers and diversity of microbes present in each gram of soil, the paucity of knowledge concerning the majority of them and the sheer amount of data typically observed in soil metagenomics studies. We address the problem of identifying bacterial groups present in soil and estimating their relative abundance using high-throughput data sampled from agricultural fields. Our procedure computes an overall phylogeny as the agreement among several individual phylogenies independently inferred from carefully selected marker genes. The principle being that such an agreement leads to the true phylogeny underlying all the observed taxa. The overall phylogeny provides information on the identity and abundance of bacterial groups through its clade structure and the relative sizes of these clades. Most importantly, because our method does not bin organisms according to reference databases, it sheds light into the identity and abundance of not only previously known soil bacteria but also of the 'unknowns'. As a first step, we demonstrate the ability of our method to correctly discriminate between different bacterial groups and to quantify their relative abundance using simulated data. We then go on to analyse Roche/454 data sampled from one of Rothamsted's long-term experimental fields. We present preliminary results on this analysis. INI 1 17:10 to 18:10 Welcome wine reception
 09:00 to 10:00 A Mooers (Simon Fraser University)Trees as representations of evolutionary information worth keeping Weighted trees can be viewed as depictions of informational redundancy among tips. There is a rich literature in conservation biology on how this view can be used to maximize this information in triage situations or predict the future loss of such evolutionary information. Early views that redundancy is expected to be high is supported neither by theory on tree shapes nor by surveys of phylogenies. This means that work on maximization is relevant and timely. However, much triage is done within, rather than between, independent lineages; more theoretical work is needed to translate relevant metrics to, e.g., haplotype networks on landscapes. INI 1 10:00 to 10:20 W Hordijk ([Lausanne])Speciation & Extinction Rate Estimation from Phylogenetic Trees Recently there has been an increasing interest in trying to estimate rates of evolution (in particular speciation and extinction rates) from phylogenetic tree data. However, this is currently still a difficult problem. We use a method based on lineage survival probabilities and actual speciation times to infer speciation and exitinction rates from (simulated) phylogenetic trees. Our results show that large trees (several hundred species) are necessary to get statistically significant results. This work is still ongoing, and we are investigating how to get more accurate and reliable results on smaller trees, and whether this is possible at all given the statistical nature of the problem. INI 1 10:20 to 10:40 Coalescent-based Species Tree Inference from Gene Tree Topologies Under Incomplete Lineage Sorting by Maximum Likelihood Incomplete lineage sorting can cause incongruence between phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. Developing robust computational inference approaches is currently of interests in studying incomplete lineage sorting. In this talk, I will present a new coalescent-based algorithm for inferring species tree with maximum likelihood. I will first describe an improved method for computing the probability of a gene tree topology for a species tree, which is much faster than an existing algorithm. Based on this method, I will present a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. In this algorithm, we search for the best species tree by starting from candidate species trees found by a parsimony method and performing local search to obtain better trees with higher likelihood. This algorithm, called {STELLS}, has been imp lemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than several existing methods for many datasets, especially when there is noise (in terms of topology, branch lengths and rooting) in gene trees. We also show the STELLS algorithm is efficient and can be applied to real biological datasets. INI 1 10:40 to 11:00 Tree reconciliations: beyond the LCA mapping The least common ancestor (LCA) mapping is an important tool to reconcile discordance between gene tree and species tree, and has been well studied in the literature. In this talk, we report some recent results in studying general reconciliations, i.e., leaf-preserving and order-preserving maps, between gene tree and species tree. Here we introduce three cost models to infer, respectively, the gene duplication, gene loss and deep coalescent events. And we also show that the canonical partial order dened on the set of reconciliations is compatible with these cost models. Finally, we discuss how to generalize these results to non-binary gene trees. INI 1 11:00 to 11:30 Morning coffee 11:30 to 11:50 A Lie algebraic classification of continuous-time Markov models In recent work we have discussed the importance of multiplicative closure for the Markov models used in phylogenetics. For continuous-time Markov chains, a sufficient condition for multiplicative closure of a model class is ensured by demanding that the set of rate matrices belonging to the model class form a Lie algebra. It is the case that some Markov models do form Lie algebras (eg. JC, F81, K3ST and GMM), and we refer to these models as Lie Markov models''. Importantly, other Markov models unequivocally do not form Lie algebras (the most conspicuous example being GTR). The question then naturally arises: How do we generate a full list of Lie Markov models? To answer this question in full generality seems quite difficult as the interaction between the operations of a Lie algebra and the stochastic requirements of rate matrices are somewhat at odds, and it seems that this question has not been addressed in the algebraic or stochastic mathematics literature. In this talk, we will discuss how we have made signficant progress in generating Lie Markov models by demanding that the models have certain symmetrics under nucleotide permutations. >From a theoretical perspective, we show that the Lie Markov models include, and hence provide a unifying concept for, group-based'' and "equivariant" models, whilst also generating many other interesting examples. We also argue that our scheme is very pleasing in the context of applied phylogenetics, as, for a given symmetry of nucelotide substitution, it provides a natural hierarchy of models with increasing number of parameters. This is joint work with Jesús Fernández-Sánchez and Peter Jarvis. INI 1 11:50 to 12:10 A generalization of Stirling numbers and distribution of phylogenetic trees P.L. Erdos and L.A. Szekely provided a bijection between rooted semi-labeled trees and set partitions, and hence Stirling numbers of the second kind. This, with the asymptotic normality of the Sirling numbers of the second kind (Harper) translates into the asymptotic normality of rooted semi-labeled trees with a fixed number of vertices and a variable number of internal vertices. We apply Harper's method and the Erdos-szekely bijection to obtain the asymptotic normality of of phylogenetic trees. INI 1 12:10 to 12:30 Mammalian phylogeny reveals recent diversi cation rate shifts Phylogenetic trees of present-day species allow investigation of the rate of evolution which led to the present-day diversity. A recent analysis of the complete mammalian phylogeny challenged the view of explosive mammalian evolution after the K/T boundary (65 Ma). However, due to lack of appropriate methods, the diversication rates in the more recent past of mammalian evolution could not be determined. Here, I provide a method which reveals that the tempo of mammalian evolution did not change until about 33 Ma. This constant period was followed by a peak of diversication rates between 33 and 30 Ma. Thereafter, diversication rates remained high and constant until 8.55 Ma. Diversication rates declined signicantly at 8.55 and 3.35 Ma. Investigation of mammalian subgroups (marsupials, placentals, and the six largest placental subgroups) reveals that the diversication rate peak at 33-30 Ma ago is mainly driven by rodents, cetartiodactyla and marsupials. The recent diversication rate decrease is signicant for all analyzed subgroups but eulipotyphla, cetartiodactyla and primates. My likelihood approach is not limited to mammalian evolution. It provides a robust framework to infer diversication rate changes and mass extinction events in phylogenies, reconstructed from e.g. present-day species or virus data. In particular, the method is very robust towards noise and uncertainty in the phylogeny, and can account for incomplete taxon sampling. INI 1 12:30 to 13:30 Lunch at Wolfson Court 14:00 to 15:00 Integrative analysis if environmental sequences In metagenomics, the aim is to understand the composition and operation of complex microbial assemblages in environmental samples through sequencing and analysis of their DNA. Similarly, metatranscriptomics and metaproteomics target the RNA and proteins obtained from such samples. A major challenge in the analysis of environmental sequences is data integration. The question is how to anaylze different types of data in a unified approach, addressing both taxonomic and functional aspects. We will discuss how to perform taxonomic analysis based on the NCBI taxonomy. Functional analysis can be based on classifications such as the SEED classification of functional roles and subsystems or the KEGG classification of pathways and enzymes. We will also discuss how to compare different datasets using statistical approaches, ecological indices and methods such as Neighbor-net and multi-dimensional scaling. INI 1 15:00 to 15:30 Afternoon tea 15:30 to 17:30 Free afternoon