Isaac Newton Institute for Mathematical Sciences

Mathematical and Statistical Aspects of Molecular Biology

Graphical modeling of the joint distribution of alleles at associated loci.

Author: Alun Thomas (University of Utah)

Abstract

Pairwise linkage disequilibrium, haplotype blocks and recombination hot spots provide only a partial description of the patterns of dependence and independence between the allelic states at proximal loci. On the gross scale, where recombination and spatial relationships dominate, the associations can be reasonably described in these terms. However, on the fine scale of current high-density maps, the mutation process is also important: it creates associations between loci that are independent of the physical ordering and that cannot be summarized with pairwise measures of association.

Graphical modeling provides a standard statistical framework for characterizing precisely this sort of complex stochastic data. While graphical models are often used in situations where assumptions lead naturally to specific models, it is less well known that estimation of graphical models from data is also a well-developed field.

We show how decomposable graphical models can be fitted to dense genetic data. The objective function is the maximized log likelihood for the model, penalized by a multiple of the model's degrees of freedom. Simulated annealing is used to find good solutions. The great potential of this approach is that categorical phenotypes can be included in the same analysis, so that association with polymorphisms is assessed jointly with the inter-locus associations. We illustrate our method and its potential with phenotypic data on sex and prostate cancer, together with genotypic data from 25 loci in the ELAC2 gene. The results contain third- and fourth-order locus interactions and show that, at this density of markers, linkage disequilibrium is not related to physical distance in a simple monotonic fashion. Graphical models provide more flexibility to express these features of the joint distribution of alleles than do simple monotonic functions connecting physical and genetic maps.
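
The objective described above can be illustrated with a minimal sketch. It is not the author's implementation. It relies on the standard result that, for a decomposable model on complete categorical data, the maximized log likelihood is the sum of saturated marginal log likelihoods over the cliques minus the same sum over the separators, and the degrees of freedom decompose in the same way. The names `penalized_score`, `accept`, `alpha` (the penalty multiple) and `temperature`, and the representation of cliques and separators as tuples of column indices, are assumptions made for illustration. The abstract does not specify the penalty multiple, the annealing moves, or how unphased genotypes are handled.

```python
import math
import random
from collections import Counter


def marginal_loglik(data, variables):
    """Sum of n(x) * log(n(x) / N) over observed cells of the marginal table."""
    n_total = len(data)
    counts = Counter(tuple(row[v] for v in variables) for row in data)
    return sum(n * math.log(n / n_total) for n in counts.values())


def marginal_df(levels, variables):
    """Free parameters of a saturated table: product of level counts minus 1."""
    return math.prod(levels[v] for v in variables) - 1


def penalized_score(data, levels, cliques, separators, alpha):
    """Maximized log likelihood minus alpha times the degrees of freedom.

    Assumes `cliques` and `separators` come from a junction tree of a
    decomposable graph; an empty separator () contributes zero to both terms.
    """
    loglik = (sum(marginal_loglik(data, c) for c in cliques)
              - sum(marginal_loglik(data, s) for s in separators))
    df = (sum(marginal_df(levels, c) for c in cliques)
          - sum(marginal_df(levels, s) for s in separators))
    return loglik - alpha * df


def accept(current, proposed, temperature, rng=random):
    """Metropolis-style acceptance rule, one common choice for simulated annealing."""
    if proposed >= current:
        return True
    return rng.random() < math.exp((proposed - current) / temperature)


# Hypothetical toy data: two biallelic loci in complete association.
data = [(0, 0)] * 5 + [(1, 1)] * 5
levels = [2, 2]
independent = penalized_score(data, levels, [(0,), (1,)], [()], alpha=1.0)
joined = penalized_score(data, levels, [(0, 1)], [], alpha=1.0)
print(joined > independent)
```

In a full search, a proposal would add or remove one edge while keeping the graph decomposable, the score would be recomputed, and `accept` would be called with a temperature that decreases over iterations. A categorical phenotype would simply be one more column of `data`.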