Isaac Newton Institute for Mathematical Sciences

Mathematical and Statistical Aspects of Molecular Biology

Bayesian logic regression using a perfect phylogeny

Authors: Taane G Clark (Department of Statistics, University of Oxford), Maria De Iorio (Department of Mathematics, Imperial College), Robert C Griffiths (Department of Statistics, University of Oxford)


Haplotype data capture the genetic variation among individuals in a population and among populations. An understanding of this variation and the ancestral history of haplotypes is essential in genetic association studies of complex disease. We introduce a method for detecting associations between disease and haplotypes in a candidate gene region or candidate block with little or no recombination. In this setting, a perfect phylogeny constraint, or the equivalent gene tree representation, describes the evolutionary relationship between single-nucleotide polymorphisms (SNPs) in the haplotypes. Our approach extends the logic regression approach (Ruczinski et al., 2003) to a Bayesian framework, and constrains the model space to that of a gene tree or perfect phylogeny. The gene tree hypothesis thereby restricts the set of possible solutions to the association problem. Because the method sits within a regression framework, environmental factors and SNP-environment interactions can be incorporated, and tools for model diagnostics can be implemented. We demonstrate our method on simulated data from a coalescent model, as well as on data from a candidate gene study of smoking persistence.
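
As an illustration of the perfect phylogeny constraint mentioned above, the following minimal sketch applies the standard four-gamete test to a set of haplotypes. It is not the authors' method or code. It assumes biallelic SNPs coded 0/1 and no missing data, and the function name `admits_perfect_phylogeny` and the example haplotypes are hypothetical. Under those assumptions, a set of haplotypes is compatible with a perfect phylogeny exactly when no pair of SNPs displays all four combinations 00, 01, 10 and 11.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test for binary haplotypes (hypothetical helper).

    haplotypes: sequence of equal-length tuples of 0/1 alleles.
    Returns True if no pair of SNP sites shows all four gametes
    (0,0), (0,1), (1,0), (1,1), i.e. the data are compatible with
    a perfect phylogeny (no recombination or recurrent mutation).
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct allele pairs observed at sites i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


# Illustrative (made-up) data: three SNPs, four haplotypes.
compatible = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1)]
# Two SNPs showing all four gametes, which violates the constraint.
incompatible = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(incompatible))
```

In a region with little or no recombination, data that pass this check can be arranged on a gene tree, which is the model space to which the approach described above is restricted.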