skip to content
 

Timetable (PLGW03)

Future Directions in Phylogenetic Methods and Models

Monday 17th December 2007 to Friday 21st December 2007

Monday 17th December 2007
08:30 to 09:55 Registration
09:55 to 10:00 Welcome - David Wallace INI 1
10:00 to 11:00 T Warnow ([Texas at Austin])
The tree of life: algorithmic and software challenges
Chair: Vincent Moulton

The Tree of Life initiative -- to reconstruct the evolutionary history of all organisms -- is the computational grand challenge of evolutionary biology. Current methods are limited to problems several orders of magnitude smaller and also fail to provide sufficient accuracy at the high end of their range.The Cyberinfrastructure for Phylogenetic Research (CIPRes) project, whcih is funded by a $11.6M Information Technology Grant from the US National Science Foundation, is a multi-institutional project whose goal is to help develop the computational infrastructure for evolutionary biologists so that they can analyze large datasets. In this talk, I will describe the work done by the CIPRES project towards these objectives, and will address some of the interesting and pressing problems that will remain.

Related Links

INI 1
11:00 to 11:30 Coffee
11:30 to 11:50 T Nye ([Newcastle])
Seeing the wood for trees: analysing multiple alternative phylogenies
Chair: Vincent Moulton

Phylogenetic analysis very commonly produces several alternative trees for a given set of taxa. This can be due to having several, possibly conflicting, sources of information (such as different sets of orthologous genes) or as a result of the nature of the analytic method itself (for example, sampling a large number of trees from a posterior distribution). Faced with several alternative trees, biologists typically seek a single concensus phylogeny -- thereby discarding important information.

We review some recent approaches to dealing with alternative phylogenies that arise from a multivariate statistical view-point. We go on to describe a novel approach for comparing alternative phylogenies that is based on the idea of a "tree-of-trees": trees with similar topologies are clustered together in the same way that a single phylogeny clusters species with similar DNA sequences. The tree-of-trees is constructed in an analogous way to a most parsimonious tree for DNA data. We illustrate these ideas by applying our method to some simple data sets.

INI 1
11:50 to 12:10 J Fernandez-Sanchez ([ETSEIB-UPC])
Performance of new invariant methods for phylogenetic reconstuction
Chair: Vincent Moulton

An attempt to use phylogenetic invariants for tree reconstruction was made at the end of the 80s and the beginning of the 90s by Lake and Cavender-Felsenstein. However, the efficiency of the methods based on invariants was not obvious ([Huelsenbeck, 1995]), probably because they only used few generators of the set of phylogenetic invariants. Results and simulation studies about the performance of a new method of phylogenetic reconstruction based on invariants ([Casanellas et al., 2005]) are presented in this talk. These simulations show that it is very competitive and highly efficient, especially for nonhomogeneous phylogenetic trees. Recent improvements for this method using algebraic geometry techniques are also discussed. Part of these results has appeared in [Casanellas and Fernandez-Sanchez, 2007].

References: [Casanellas and Fernandez-Sanchez, 2007] Casanellas, M and Fernandez-Sanchez, J. (2007). Performance of a new invariants method method on homogeneous and nonhomogeneous quarted trees. Mol Biol Evol 24 :288-293.

[Casanellas et al., 2005] Casanellas, M; Garcia, L and Sullivant, S (2005). Catalog of small trees. In Pachter, L. and Sturmfels, B., editors, Algebraic Statistics for computational biology, chapter 15. Cambridge University Press.

[Huelsenbeck, 1995] Huelsenbeck, J. (1995). Performance of phylogentic methods in simulation. Syst. Biol., 44:17–48.

INI 1
12:30 to 13:30 Lunch at Wolfson Court
14:00 to 14:20 W Gilks ([Leeds])
Incorporating uncertainty in distance-matrix phylogenetic reconstruction
Chair: Kathi Huber

One approach to estimating phylogenetic trees supposes that a matrix of estimated evolutionary distances between taxa is available. Agglomerative methods have been proposed in which closely related taxon-pairs are successively combined to form ancestral taxa. Several of these computationally efficient agglomerative algorithms involve steps to reduce the variance in estimated distances. However, formal statistical models and methods for agglomerative distance-based phylogenetic reconstruction have not previously been published.

We propose a statistical agglomerative phylogenetic method which formally considers uncertainty in distance estimates and how it propagates during the agglomerative process. It simultaneously produces two topologically identical rooted trees, one tree having branch lengths proportional to elapsed time, and the other having branch lengths proportional to underlying evolutionary divergence. The method models two major sources of variation which have been separately discussed in the literature: noise, reflecting inaccuracies in measuring divergences, and distortion, reflecting randomness in the amounts of divergence in different parts of the tree. The methodology is based on successive hierarchical generalised least-squares regressions. It involves only means, variances and covariances of distance estimates, thereby avoiding full distributional assumptions. Exploitation of the algebraic structure of the estimation leads to an algorithm with computational complexity comparable to the leading published agglomerative methods. A parametric bootstrap procedure allows full uncertainty in the phylogenetic reconstruction to be assessed.

INI 1
14:20 to 14:40 I Gronau ([Technion])
Adaptive fast convergence - towards optimal reconstruction guarantees for Phylogenetic trees
Chair: Kathi Huber

One of the central challenges in phylogenetics is to be able to reliably resolve as much of the topology of the evolutionary tree from short taxon-sequences. In the past decade much attention has been focused on studying fast converging reconstruction algorithms, which guarantee (w.h.p.) correct reconstruction of the entire tree from sequences of near-minimal length (assuming some accepted model of sequence evolution along the tree). The major drawback of these methods is that when the sequences are too short to correctly reconstruct the tree in its entirety, they do not provide any reconstruction guarantee for sufficiently long edges. Specifically, the presence of some very short edges in the model tree may prevent these algorithms from reconstructing even edges of moderate length.

In this talk we present a stronger reconstruction guarantee called "adaptive fast convergence", which provides guarantees for the correct reconstruction of all sufficiently long edges of the original tree. We then present a general technique, which (unlike previous reconstruction techniques) employs dynamic edge-contraction during the reconstruction of the tree. We conclude by demonstrating how this technique is used to achieve adaptive fast convergence.

Related Links

INI 1
14:40 to 15:00 S Moran ([Technion])
On the hardness of inferring Phylogenies from triplet-dissimilarities
Chair: Kathi Huber

We considers the problem of reconstructing a phylogenetic tree from triplet dissimilarities, which are dissimilarities defined over taxon-triplets. Triplet dissimilarities are possibly the simplest generalization of pairwise dissimilarities, and were used for phylogenetic reconstructions in the past few years. We study the hardness of finding a tree best fitting a given triplet-dissimilarity table under the "maximal difference" criterion. We show that the corresponding decision problem is NP-hard and that the corresponding optimization problem cannot be approximated in polynomial time within a constant multiplicative factor smaller than 1.4. We also address the issue of best-fit under "maximal distortion", which corresponds to the largest ratio between matching entries in two triplet-dissimilarity tables. We show that it is NP-hard to approximate the corresponding optimization problem within any constant multiplicative factor. On the positive side, we present a polynomial time constant-rate approximation algorithm for this problem.

INI 1
15:00 to 15:30 Tea
15:30 to 15:50 M Lacey ([Tulane])
A signal-to-noise analysis of phylogeny estimation by neighbour-joining
Chair: Kathi Huber

The property of fast-convergence describes phylogeny reconstruction methods that, with high probability, recover the true tree from sequences that grow polynomially in the number of taxa. While provably fast-converging methods have been developed, the neighbor-joining (NJ) algorithm of Saitou and Nei remains one of the most popular methods used in practice. This algorithm is known to converge for sequences that are exponential in n, but no lower bound for its convergence rate has been established. To address this theoretical question, we analyze the performance of the NJ algorithm on a type of phylogeny known as a "caterpillar tree." We find that, for sequences of polynomial length in the number of taxa, the variability of the NJ criterion is sufficiently high that the algorithm is likely to fail even in the first step of the phylogeny reconstruction process, regardless of the degree of polynomial considered. This result demonstrates that, for general trees, the exponential bound cannot be improved.

INI 1
15:50 to 16:10 M Fischer ([Canterbury])
How many characters are needed to recontruct the true tree?
Chair: Kathi Huber

Maximum Parsimony, which basically favors the tree hypothesizing the fewest number of character state changes, is still one of the most important methods of tree reconstruction. Even so, MP is known to be prone to fail when the underlying phylogenetic tree has some extreme characteristics, as it is the case, for example, in the Felsenstein-zone or when interior edges are very short. While the Felsenstein-zone has already been well-examined over the years, some aspects of the latter problem remained unsolved. We investigated this on a binary unrooted 4-taxon phylogenetic tree with a short interior edge and pending edges of multiple length. In my talk, I will present an optimal branch length of the interior edge in this scenario and I will explain how many characters are at least needed to reconstruct the ‘true’ tree. Furthermore, we investigated some more properties of MP and found that for some trees, MP performs better when only a subset of the given set of taxa is considered. I will show such a phylogenetic tree and explain, for instance, why for a subset of size 1, i.e. a single leaf, this surprising scenario cannot happen.

INI 1
16:10 to 16:30 G Naylor ([Florida State])
The deep phylogeny problem and its consequences
Chair: Kathi Huber

Some of the most interesting and long-standing evolutionary questions concern relationships among ancient lineages. Molecular sequence data have not really provided the anticipated clear-cut answers to these questions. In general relationships that were cloaked in uncertainty based on traditional comparative anatomy remain uncertain when subjected to molecular approaches. I will present an overview of some of the difficulties associated with estimating relationships among ancient vertebrate lineages based on sequence comparisons.

INI 1
16:30 to 16:50 P Lockhart ([Massey])
On the nature of heterotachy
Chair: Kathi Huber

Heterotachy describes different lineage specific rates of evolution at individual sites in a multiple sequence alignment. Different underlying processes could contribute to this phenomenon. We discuss the form of heterotachy that is likely to occur in Nature. We show using simulations that under a biologically realistic model of heterotachy tree building can have low reconstruction accuracy. We discuss this problem.

INI 1
16:50 to 18:00 Welcome Wine Reception
18:45 to 19:30 Dinner at Wolfson Court (Residents only)
Tuesday 18th December 2007
09:00 to 10:00 A Dress (Chinese Academy of Sciences)
Recent progress in phylogenetic combinatorics
Chair: Stephen Willson
INI 1
10:00 to 10:20 A Kupczok ([CIBIV])
An exact algorithm for computing the geodesic distance between phylogenetic trees
Chair: Stephen Willson

Most of the used measurements for the distinctness of two phylogenetic trees exclusively evaluate topological criteria. For example, the prevalent Robinson-Foulds-distance counts the number of differing splits between two trees. To present a measure which also incorporates branch lengths information, Billera et al. (2001, Adv. Appl. Math) gave a formal mathematical definition of the space of phylogenetic trees and introduced a metric on this space, the so-called geodesic distance. In this framework, a tree topology with n interior branches corresponds to an n-dimensional nonnegative Euclidean space, and special instances of branch lengths define a point in that space. Two tree topologies with m splits in common share an m-dimensional surface. Therefore, the tree space is connected and Billera et al. have shown that there exists a unique shortest path between every two points in this space, the geodesic path of two trees. The length of this path is a continuous distance measure for phylogenetic trees and knowing the exact progression of the path helps in adressing further questions like computing a consensus tree with branch lengths from a set of phylogenetic trees. We demonstrate how to compute the geodesic path and its length in tree space exactly. The method is worst-case exponential in the number of differing splits between the two trees. However, information about incompatibe splits and branch lengths is utilized to avoid the enumeration of futile paths and to improve the performance.

INI 1
10:20 to 10:40 M Owen ([Cornell])
Practical distance computation in the space of phylogenetic trees
Chair: Stephen Willson

There are many different methods for constructing phylogenetic trees from biological data. To either compare one such algorithm with another, or to find the likelihood of a certain tree generated from the data, researchers often compute a distance between trees. In this talk, we present a practical algorithm for computing the geodesic distance, as well as the geodesic path, between two phylogenetic trees in the tree space introduced by Billera, Holmes, and Vogtmann(2001). In doing so, we show that the possible shortest paths between two trees can be represented as chains in a partially ordered set, whose elements are the sets formed by a closure relation on the incompatible edges between the trees. Furthermore, we give a linear time algorithm for computing the geodesic candidate corresponding to each maximal chain in this partially ordered set. Dynamic programming techniques can be used to further prune the search of the partially ordered set for the shortest path.

INI 1
10:40 to 11:00 J Koolen ([Pohang])
Optimal realisations
Chair: Stephen Willson

For a metric $(X,d)$ a weighted graph $G= (V, E, w)$ is called a realization of $d$ if (i) $X \subseteq V$ (ii) $d_g(x,y) = d(x,y)$ for all $x, y \in X$.

A realization is called {\em optimal} if the sum $\sum_{xy \in E} w(xy)$ is minimal among all the realizations. It is known that to find an optimal realization is NP-hard in general. But for the case of a tree-metric, i.e. a metric coming from a tree, it is the underlying (weighted) tree.

in this talk I will discuss properties of optimal realizations and why they are useful for the area of phyogenetic methods

INI 1
11:00 to 11:30 Coffee
11:30 to 11:50 J Rhodes ([Alaska Fairbanks])
The identifyability of matrix parameters for covarion-like models
Chair: Stephen Willson

A covarion or site-specific-rate-variation model of sequence evolution describes a process in which the substitution rate of each character (nucleotide, codon, amino acid) can speed up or slow down over time, as might be caused by changing functional constraints. Though the model was formulated a decade ago, only in the last few years was it shown that the topology of the evolutionary tree is identifiable for this model. We now establish identifiability of the numerical parameters, such as edge lengths and substitution rates. Necessary technical assumptions are that the number of rate classes be less than the number of character states, the tree relates at least 7 taxa, and numerical parameters are `generic'.

INI 1
11:50 to 12:10 M Bordewich ([Durham])
Optimising phylogenetic diversity across two trees
Chair: Stephen Willson

We present a polynomial-time algorithm for finding an optimal set of taxa that maximizes the weighted-sum of the phylogenetic diversity across two phylogenetic trees. This resolves one of the challenges proposed at the 'Current Challenges and Problems in Phylogenetics' workshop in the first week of the Phylogenetics programme. It also completely closes the gap between optimizing phylogenetic diversity on one tree, which is known to be in P, and optimizing phylogenetic diversity across three or more trees, which is known to be NP-hard.

INI 1
12:30 to 13:30 Lunch at Wolfson Court
14:00 to 14:20 R Yoshida ([Kentucky])
On the optimality of the neighbour-joining algorithm
Chair: Magnus Bordewich

The popular neighbor-joining (NJ) algorithm used for phylogenetic tree reconstruction is a greedy algorithm for finding the minimum evolution (ME) tree associated to a dissimilarity map. >From this point of view, NJ is ``optimal'' when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the ME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps ${\R}_{+}^{n \choose 2}$ to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for $n \leq 10$. A key requirement is the measurement of volumes of spherical polytopes in high dimension, which we obtain using a combination of traditional Monte Carlo methods and polyhedral algorithms. Our analysis reveals new insights into the performance of the NJ and ME algorithms for phylogenetic reconstruction. In particular we show that highly unrelated trees can be cooptimal in ME reconstruction, and that NJ optimality regions are not convex. We also conjecture that neighbor joining's ability to recover the ME tree depends on the diameter of the ME tree.

This is joint work with K. Eickmeyer, P. Huggins, and L. Pachter.

INI 1
14:20 to 14:40 V Moulton ([East Anglia])
Encoding phylogenetic trees with weighted quartets
Chair: Magnus Bordewich

Various methods have been proposed for constructing phylogenetic trees (and networks) that work by trying to piece together quartet trees, i.e. phylogenetic trees with four leaves. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdos provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees - or, equivalently, compatible split systems - and discuss some extensions of this result to more general split systems.

This is joint work with Stefan Gruenewald, Katharina Huber, Charles Semple and Andreas Spillner.

INI 1
14:40 to 15:00 C Semple ([Canterbury])
Identifying Phylogenies: extending the notion of unique
Chair: Magnus Bordewich

Uniqueness is an extremely strong property. Even if the input data contains no conflicting information, to find that there is exactly one evolutionary tree that is consistent with the data is very unlikely. A property that is somewhat weaker and yet essentially just as good is that of identifiability. In this talk, we introduce and describe some recent work on this property.

INI 1
15:00 to 15:30 Tea
15:30 to 15:50 P Wagner ([Rostock])
On (k.l)-leaf powers
Chair: Magnus Bordewich

We say that, for k>1 and l>k, a tree T is a (k,l)-leaf root of a graph G=(V,E), if V is the set of leaves of T, for all edges xy in E, the distance d(x,y) between x and y in T (i.e., the number of edges of the unique path between the leaves x and y in T) is at most k and, for all non-edges xy outside E, d(x,y) is at least l. A graph G is a (k,l)-leaf power, if it has a (k,l)-leaf root.

This new notion generalises the concept of k-leaf power, which was introduced and studied by Nishimura, Ragde and Thilikos in 2002, motivated by the search for underlying phylogenetic trees: "...a fundamental problem in computational biology is the reconstruction of the phylogeny, or evolutionary history, of a set of species or genes, typically represented as a phylogenetic tree..." The species occur as leaves of the phylogenetic tree.

Recently, a lot of work has been done on k-leaf powers and roots as well as on their variants phylogenetic roots and Steiner roots. For k=3 and k=4, structural characterisations and linear time recognition algorithms of k-leaf powers are known, and, recently, a polynomial time recognition of 5-leaf powers was given. For larger k, the recognition problem is open. We give structural characterisations of (k,l)-leaf powers, for some k and l, which also imply efficient recognition of these classes, and in this way we also improve and extend a recent paper from 2006 by Kennedy, Lin and Yan on strictly chordal graphs and leaf powers.

INI 1
15:50 to 16:20 C Howe ([Cambridge])
Phylogenetics of medieval manuscripts
Chair: Magnus Bordewich

There are remarkable similarities between the error-prone copying of DNA sequences and the incorporation of errors by scribes as they copied manuscripts in the days before printing. Since the mid-19th century, manuscript scholars have used the distribution of ‘common errors’ to try to reconstruct the copying history of sets of manuscripts in a very labour-intensive way. We have shown that phylogenetic tree-building programs can be a powerful tool for scholars to analyse manuscript copying history much more rapidly than before. Of particular interest is the phenomenon of ‘contamination’, where a scribe would use more than one version of a text in constructing a copy. This has obvious parallels to lateral gene transfer and recombination. We are also interested in analysing DNA from the parchment itself to provide insights into medieval husbandry patterns as well as manuscript provenance.

INI 1
16:20 to 16:40 Discussion INI 1
Wednesday 19th December 2007
09:00 to 10:00 D Aldous ([California])
Some broad questions about the tree of life
Chair: Elizabeth Allman

Stepping away from technical issues in tree reconstruction from molecular data on extant species, I will describe broader questions that can be asked about the Tree of Life, and our attempts to devise a framework for their study. Here are three such questions. (i) It is appealing to say that any readily identifiable clade arose through adaptive radiation. Does data support this statement? What would trees look like if they didn't arise through adaptive radiation? (ii) For some clades (e.g. horse) the number $n$ of extant species is much smaller than the maximum (over past time) number $M$ of coexisting species. Is this remarkable? How large should we expect $M/n$ to be for a typical clade? (iii) Given the true phylogenetic tree on the extant species of a clade, but no information about extinct species, can one make any (even crude) estimates of past speciation and extinction rates?

More technical questions concern the likely shape of trees on higher-order taxa, or the likely size of fluctuation in time series of numbers of such taxa.

Based on joint work with Lea Popovic and Maxim Krikun.

Related Links

INI 1
10:00 to 10:20 T Gernhard ([Munchen])
Ranked tree shapes, shuffles, and a new test for neutrality
Chair: Elizabeth Allman

Phylogenetic tree shapes are an established object of study, and have proven useful for testing macroevolutionary hypotheses. However, tree shapes are typically considered without branch length information, which eliminates a considerable amount of useful data. We propose the use of ``ranked'' phylogenetic tree shapes-- tree shapes such that the order of diversification events is specified-- to test macroevolutionary hypotheses. We describe two classes of null models on ranked tree shapes, and then propose a test for deviation from these models. In particular, our test rejects the null hypothesis when relative rates of diversification for two daughter clades change over time. Thus, we consider the test as looking for evidence of ``bursting'' diversification on one hand, or ``refractory'' diversification on the other. Our test rejects neutral evolution for a sample of sequences of the Hepatitis C virus which will be discussed.

INI 1
10:20 to 10:40 B Faller ([Canterbury])
Distribution of phylogenetic diversity under random extinction
Chair: Elizabeth Allman

Phylogenetic diversity is a measure for describing how much of an evolutionary tree is spanned by a subset of species. If one applies this to the (unknown) subset of current species that will still be present at some future time, then this 'future phylogenetic diversity' provides a measure of the impact of various extinction scenarios in biodiversity conservation. We have studied the distribution of future phylogenetic diversity under a simple model of extinction (a generalized 'field of bullets' model). In this talk, I present our finding that the distribution of future phylogenetic diversity converges to a normal distribution as the number of species grows. This result is significant for biodiversity conservation.

INI 1
10:40 to 11:10 Coffee
11:10 to 11:30 A von Haeseler ([MFPL, Vienna])
10 Men and 1 woman?
Chair: Elizabeth Allman

Recently Patterson et al. 2006 speculated about a complex speciation of humans and chimpanzees. We re-analysed their data and will discuss a more parsimonious solution. Our interpretation of the data is supported by remains of behavioral traits in isolated human populations and in behavioral aspects of chimpanzees.

INI 1
11:30 to 11:50 Discussion INI 1
12:00 to 13:00 Lunch at Wolfson Court
13:00 to 17:00 Excursion to Bletchley Park INI 1
18:45 to 19:30 Dinner at Wolfson Court (Residents only)
Thursday 20th December 2007
09:00 to 10:00 A Roger ([Dalhousie])
Markov models of protein evolution: Lets get real!
Chair: Daniel Huson

Widely-used models of amino acid sequence evolution, such as the WAG and JTT models, are based on empirically-derived rate matrices that capture the exchangeability and frequencies of amino acids observed in large databases of alignments. For phylogenetic estimation, these are typically combined with some kind of parametric model for differeing rates amongst sites. However, these models ignore important effects in the evolution of real protein sequences such as changes in the rate of evolution at a position over the tree, changes in the amino acid frequencies in different lineages and site-specific amino acid exchangeabilities and frequencies that depend on the 3D structural and functional context of the site in the protein. Here I provide an overview of several models we have developed to capture these biologically important effects. We have addressed the problem of rate changes over lineages by implementing a general covarion model for amino acid sequences in the maximum likelihood framework. We have approached the problem of site-specific substitution processes in two ways. First we have implemented a “mixture of frequencies” model that captures the 3 dominant amino acid frequency patterns in a set of 21 taxon-rich alignments. Second, we have implemented an independence energy model that uses statistical energy potentials from the 3D structure of the protein of interest to construct site-specific substitution matrices. Likelihood ratio tests indicate that all of three of these models have significantly better fit to real protein sequence data than the standard empirical amino acid substitution models. Furthermore, some of these more biologically realistic substitution models avoid phylogenetic estimation biases that can arise if these important effects are ignored.

Related Links

INI 1
10:00 to 10:20 T Gesell ([MFPL, Vienna])
A phylogenetic definition of structure and further animations on the sequence space
Chair: Daniel Huson

We want to show pictures and animations illustrating possible novel ways to model sequence evolution, as well as discussing the importance of a definition of structure. Our phylogenetic definition of structure consists of three aspects: The substitution matrix, a neighbourhood system and the phylogenetic tree. The substitution matrix specifies the evolutionary dynamics of nucleotide evolution. However, the matrix is influenced by the neighbourhood system, that defines the interactions among sites in a DNA sequences. The phylogenetic tree introduces an additional dependency pattern in the observed sequences. To take all this aspects into account, we have developed SISSI (Simulating Site-Specific Interactions). SISSI simulates the evolution of a nucleotide sequence along a phylogenetic tree taking into account user defined site-specific interactions.Thus, it is possible to mimic sequence evolution under structural constraints. Based on illustrative examples and simulations, we will discuss the different definitions of structure, for example single minimum free energy and consensus structure, as well as our definition of structure from a phylogenetic point of view. This will be demonstrated with different phylogenetic animations.

INI 1
10:20 to 10:40 L Zhang ([Singapore])
Selecting genomes for reconstruction of ancestral genomes
Chair: Daniel Huson

It is often impossible to sequence all descendent genomes to reconstruct an ancestral genome. This fact leads to studying the genome selection for reconstruction problem. The talk covers our recent work in reconstruction accuracy analysis and greedy algorithm for the problem of selecting a given number of genomes for the best reconstruction.

INI 1
10:40 to 11:00 D Catanzaro ([Libre de Bruxelles])
MIP models for phylenetic reconstruction under minimum evolution
Chair: Daniel Huson

Molecular phylogenetics provides several criteria to select a phylogeny among plausible alternative ones. Usually, such criteria can be expressed in terms of objective functions, and the phylogenies optimizing them are referred as optimal. One of the most important criteria is Minimum Evolution (ME) which states that the optimal phylogeny for a given set of organisms is the one whose sum of the edge weights is minimal. Finding the phylogeny satisfying the minimum evolution criterion involves the solution of an optimization problem, called Minimum Evolution Problem (MEP), notoriously NP-Hard. Herewith, we introduce a number of mixed integer programming models and provide possible cuts and lower bounds for the optimal value.

INI 1
11:00 to 11:30 Coffee
11:30 to 11:50 G Baele ([Gent])
Challenges in modeling context-dependent evolution
Chair: Daniel Huson

Over the past decade, several proposals have been made for relaxing the assumption that sites evolve independently in phylogenetic inference, for example by modeling base-pair evolution, CpG effects and certain cases of site-dependent evolution. In this work, we model dependencies between sites by allowing the evolution at a site to depend upon its “context of evolution”, as given by its two immediate neighboring sites. Specifically, each site is assigned a specific evolutionary nucleotide model depending on the identities of its neighbors. To efficiently evaluate the corresponding models, we employ a “data augmentation” approach in an MCMC framework. In this presentation, we introduce the above approach and pay special attention to 2 potential problems. The first originates from our model’s implicit assumption that the neighboring sites remain fixed along each branch, which may be restrictive, especially for long branches. To accommodate this, we add intermediate (single-child) nodes on longer branches to explicitly allow for the neighboring sites to evolve along such branches. We evaluate the importance of these intermediate “ancestral sequences” by calculating the appropriate Bayes factor via thermodynamic integration. The second problem lies in the drastic increase in parameters that may arise when taking into account every possible context of evolution. We discuss efficient Bayesian approaches for model building which acknowledge the trade-off between added number of parameters and increased model fit.

INI 1
11:50 to 12:10 S Whelan ([Manchester])
A simple model for a complex world
Chair: Daniel Huson

Models of nucleotide substitution have enabled accurate reconstruction of many evolutionary relationships. The majority of commonly used models are special cases of the General Time Reversible (GTR) model with rate variation across sites described by a Γ-distribution (+Γ) and by a proportion of invariant sites (+I). Unfortunately, GTR+Γ+I family models rarely describe all the variation present in real sequence data. Many cases of complex phenomena, such as changes in GC content and heterotachy, have been identified, and several studies have shown that analyses produced from simple models can lead to erroneous conclusions.

I introduce a simple general hidden Markov model of nucleotide substitution that allows many evolutionarily important factors to vary both spatially between sites in the alignment and temporally through the tree. The new model may be considered a generalisation of the covarion model of Tuffley and Steel, which enables the rate of evolution, nucleotide frequencies, and the transition/transversion rate ratio to vary according to a hidden process. I describe the new model’s relationship to existing models and use it to investigate spatial and temporal variation in real data. The results of these analyses show that all of the factors examined have the potential to vary both temporally and spatially. Furthermore, the new model provides substantially better fit to sequence data than GTR+Γ+I models.

INI 1
12:30 to 13:30 Lunch at Wolfson Court
14:00 to 14:20 E Nisbet ([Cambridge])
Evolution of Chloroplasts
Chair: Charles Semple

The classical, text-book view is that chloroplasts in plants and algae arose from a single endosymbiotic event where an early eukaryote engulfed a photosynthetic cyanobacterium. This cyanobacterium became enslaved, giving rise to the three classes of chloroplast: those in green algae and plants, those in red algae and those in glaucophytes. By analysing the phylogenetic history of all retained, chloroplast genes and with comparison to cyanobacterial genomes, we present evidence showing that the history of the chloroplast may be more complex, and that more than one endosymbiosis event cannot be excluded.

RER Nisbet (Cambridge), M. Spencer (Liverpool), C. J. Howe (Cambridge), P.J. Lockhart (Massey).

INI 1
14:20 to 14:40 K St John ([New York])
Untangling tanglegrams
Chair: Charles Semple

Given two phylogenetic trees on the same leaf set, how best can they be displayed? If the trees are presented "sideways" with their roots on the outside and leaves in the middle, we can join the corresponding leaves in the trees by edges. Call this diagram a tanglegram. The crossing number of a tanglegram is the number of leaf-leaf edges that cross. Work by Dwyer and Schreiber `04 (and rediscovered by Valiente and co-workers, WABI `07) show that the tanglegram with optimal crossing number can be found in polynomial time if one tree is fixed. We present several new results for the more general case when both trees can be rearranged to give the tanglegram with the minimal crossing number.

INI 1
14:40 to 15:00 S Linz ([Canterbury])
Hybridisation in non-binary trees
Chair: Charles Semple

Molecular phylogenetics, the study of reconstructing evolutionary trees, is a well-established field of scientific endeavor. However, in certain circumstances, evolution is not best represented by trees. For example, a comparison of gene trees representing a set of taxa and reconstructed for different genetic loci sometimes reveals conflicting tree topologies. These discrepancies are often due to reticulation events such as horizontal gene transfer and hybridization. Assuming that the initial data set is correct, a basic problem is to compute the minimum number of reticulation events to explain the evolutionary history for a set of present-day species.

Recently, it has been shown that a theoretical framework - based on the notion of agreement forests - can be used to calculate the number of reticulation events for two rooted binary phylogenetic trees. Together with two reduction rules, this leads to an algorithm that runs efficiently for many biological examples and gives the exact solution. However, for many data sets, the reconstructed trees are not fully resolved. In this talk, we extend this framework to non-binary trees. Further, we show that the subtree prune and regraft distance between two non-binary trees, which is often used to model reticulate evolution, can be approached in a similar way.

INI 1
15:00 to 15:30 Tea
15:30 to 15:50 D Huson ([Tubingen])
Phylogenetic networks
Chair: Charles Semple

This will be a report on recent work on phylogenetic networks.

Related Links

INI 1
15:50 to 16:10 S Willson ([Iowa State])
Reconstruction of certain phylogenetic networks from the genomes at leaves
Chair: Charles Semple

A network N is a rooted acyclic directed graph. A base-set X for N is a subset of vertices including the root (or outgroup), all leaves, and all vertices of outdegree 1. A simple model of evolution is considered in which all characters are binary and in which back-mutations occur only at hybrid vertices. It is assumed that the genome is known for each member of the base-set X. If the network is known and is assumed to be "normal," then the genome of every vertex is uniquely determined and can be explicitly reconstructed. Under additional hypotheses involving time-consistency and separation of the hybrid vertices, the network itself can also be reconstructed from the genomes of all members of X. A polynomial-time procedure is outlined for performing the reconstruction.

INI 1
16:10 to 16:30 Discussion INI 1
19:30 to 23:00 Conference Dinner at Corpus Christi College
Friday 21st December 2007
09:00 to 10:00 N Rosenberg ([Michigan])
Lineage sorting and multi-species coalescent models
Chair: Mike Steel

Assuming that evolution follows a coalescent model within individual species, we examine the distribution of gene tree topologies conditional on species trees and the computation of this distribution. A particular focus of this talk will be on the properties of "anomalous gene trees," gene tree topologies that are more likely to evolve than the topology that matches the species tree. Some new results will be shown that illustrate the increase in complexity of anomalous gene trees with five or more taxa, and the results will be discussed in terms of their implications for phylogenetic inference from multigene data.

INI 1
10:00 to 10:20 J Degnan ([Michigan])
Coalescent consequences for consensus cladograms
Chair: Mike Steel

To investigate the theoretical properties of consensus trees obtained from large numbers of gene trees evolving at different loci, we construct consensus trees from independent gene trees that occur in proportion to their probabilities from coalescent theory. We consider majority-rule, rooted triple (R*), and greedy consensus both asymptotically as the number of loci approaches infinity and for finite numbers of loci. We investigate the effects of species tree branch lengths and find different consistency results for the three methods: majority-rule consensus trees can be increasingly likely to be unresolved, although not misleading; greedy consensus trees can be misleading for all species tree topologies with at least 5 taxa; and R* consensus is statistically consistent, i.e., guaranteed to return the species tree topology given enough loci, for any binary species tree topology with any set of branch lengths. We also investigate the performance of consensus trees as species tree estimators when there are finite numbers of loci.

INI 1
10:20 to 10:40 C Kosiol ([Cornell])
The influence of effective population size on a genome-wide positive selection scan
Chair: Mike Steel

There has been some discussion on the effect of population size on positive selection scans. It has been argued that positive selection as well as purifying selection is less efficient in humans and chimps than rodents because of their smaller population sizes than murids (Keightley et al., 2005; Rhesus Macaque Sequence and Analysis Consortium, 2007).

We have performed a positive selection scan on six mammalian genomes: human, chimpanzee, mouse, rat and dog. We find fewer positive selected genes on the primate clade than on the rodent clade. We try to quantify this effect by estimating ratios of population sizes of the species from the genomic data. The estimation is based on a extension of population genetic interpretation of a branch model of codon substitutions by Halpern and Bruno 1998 (see also Nielsen and Yang, 2003). The estimates we obtain for the ratio of macaque and human effective population sizes are in good agreement with the ratio calculated from polymorphim data (Wall et al., 2003, Hernandez et al, 2007). Calculations of further population size ratios on the mammalian tree with our method are underway. We will also discuss limitations of the estimation of population ratios from genomes of single individuals.

INI 1
10:40 to 11:00 Y Song ([California])
A genealogical approach to studying asexuality
Chair: Mike Steel

Given molecular genetic data from diploid individuals that, at present, reproduce mostly or exclusively asexually without recombination, an important problem in evolutionary biology is detecting evidence of past sexual reproduction (i.e., meiosis and mating) and recombination (both meiotic and mitotic). However, currently there is a lack of computational tools for carrying out such a study. In this talk, I will take a modest step toward filling that gap and describe a new method of testing asexuality by explicitly considering the evolutionary history of asexual diploid individuals.

INI 1
11:00 to 11:30 Coffee
11:30 to 11:50 A Labarre ([Libre de Bruxelles])
Minimum common supergraphs and haplotype networks
Chair: Mike Steel

In 2005 Cassens, Mardulyn and Milinkovitch proposed a new method for constructing phylogenetic networks in the context of intraspecific genealogies, also known as ``haplotype networks''. The proposed method, called ``Union of Maximum Parsimonious Trees'', is based on the global maximum parsimony approach, which aims at combining all most parsimonious trees into a single graph (in that context, trees are unrooted and undirected). However, their algorithm makes a number of arbitrary choices, produces solutions whose quality depends on the order in which the merging process is performed, and is a heuristic with an unclear objective function. We propose a combinatorial optimisation problem that can be used as a formal model for building such a graph, which consists in finding the minimum common supergraph of a given set of partially labelled trees. We propose a polynomial-time algorithm for solving the problem on two trees of a certain class, and a branch-and-bound algorithm in the case of two arbitrary trees. We will also discuss possible approaches when dealing with more than two trees.

INI 1
11:50 to 12:10 E Mossel ([California])
Incomplete lineage sorting: consistent phylogeny estimation from multiple loci
Chair: Mike Steel

We introduce a simple algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent, that is, it returns the correct tree given sufficiently many unlinked loci. We also show that our technique can tolerate moderate estimation errors.

INI 1
12:10 to 12:30 Discussion, concluding remarks INI 1
12:30 to 13:30 Lunch at Wolfson Court
University of Cambridge Research Councils UK
    Clay Mathematics Institute London Mathematical Society NM Rothschild and Sons