Challenges in modeling context-dependent evolution
Seminar Room 1, Newton Institute
Over the past decade, several proposals have been made for relaxing the assumption that sites evolve independently in phylogenetic inference, for example by modeling base-pair evolution, CpG effects and certain cases of site-dependent evolution. In this work, we model dependencies between sites by allowing the evolution at a site to depend upon its evolutionary context, as given by its two immediate neighboring sites. Specifically, each site is assigned its own nucleotide substitution model depending on the identities of its neighbors. To evaluate the corresponding models efficiently, we employ a data augmentation approach in an MCMC framework. In this presentation, we introduce the above approach and pay special attention to two potential problems. The first originates from our model's implicit assumption that the neighboring sites remain fixed along each branch, which may be restrictive, especially for long branches. To relax this assumption, we add intermediate (single-child) nodes on longer branches to explicitly allow the neighboring sites to evolve along such branches. We evaluate the importance of these intermediate ancestral sequences by calculating the appropriate Bayes factor via thermodynamic integration. The second problem lies in the drastic increase in the number of parameters that may arise when taking into account every possible context of evolution. We discuss efficient Bayesian approaches for model building which acknowledge the trade-off between the number of added parameters and increased model fit.
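To make the parameter growth concrete, the following minimal sketch maps each site to one of the 16 neighbor contexts and counts free parameters when every context gets its own nucleotide model. It is an illustration only, not the implementation used in this work: the function names are hypothetical, the end sites are simply skipped, and the figure of 8 free parameters per model assumes a general time-reversible model (5 free exchangeabilities plus 3 free base frequencies), which the abstract does not specify.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# One context per ordered (left, right) neighbor pair: 4 * 4 = 16 contexts.
CONTEXTS = {pair: i for i, pair in enumerate(product(NUCLEOTIDES, repeat=2))}


def context_indices(seq):
    """Return the context index of each interior site of seq.

    Assumption: the first and last sites lack one neighbor and are skipped.
    """
    return [CONTEXTS[(seq[i - 1], seq[i + 1])] for i in range(1, len(seq) - 1)]


def total_free_parameters(params_per_model, n_contexts=len(CONTEXTS)):
    """Free parameters when each context has its own substitution model."""
    return params_per_model * n_contexts


if __name__ == "__main__":
    # Interior sites of ACGTA have neighbor pairs (A,G), (C,T), (G,A).
    print(context_indices("ACGTA"))
    # Hypothetical: 8 free parameters per model, one model per context.
    print(total_free_parameters(8))
```

Under these assumptions a single 8-parameter model becomes 128 parameters across all contexts, which is the kind of increase that motivates sharing parameters between contexts during model building.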