Ancestral Recombination Graphs for Missing and Removable Data
Seminar Room 1, Newton Institute
Recombination is a widespread feature of genomes, and the study of recombination patterns is an important component in the design and analysis of disease association studies. Algorithms for studying recombination must keep pace with the growing size of data sets, in both the number of individuals sampled and the number of sites typed. The goal of our work is to develop and implement algorithms that analyze recombination events in data that existing methods cannot handle, including data sets that are too large and data sets that contain missing entries. We demonstrate a branch-and-bound approach that is practical, robust, and efficient, and we show that it can be extended to compute recombination bounds on circular data and on input data with missing entries. We apply our methods to simulated data and to a benchmark lipoprotein gene, and find that by including sites ignored in previous analyses (because of missing entries), the overall amount of recombination detected can be increased over prior methods.
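To illustrate why keeping sites with missing entries can raise the amount of recombination detected, here is a minimal sketch using the classical four-gamete test and the Hudson-Kaplan lower bound. This is not the branch-and-bound method described in the talk. The binary `0`/`1` encoding, the `?` missing-entry marker, and all function names are assumptions made for this example.

```python
from itertools import combinations

MISSING = "?"  # assumed marker for a missing entry


def incompatible(rows, i, j):
    """True if sites i and j show all four gametes (00, 01, 10, 11)
    among the rows in which both entries are observed."""
    seen = set()
    for row in rows:
        a, b = row[i], row[j]
        if a == MISSING or b == MISSING:
            continue  # skip only this row, not the whole site
        seen.add((a, b))
    return len(seen) == 4


def incompatible_pairs(rows):
    """All site pairs (i, j), i < j, that fail the four-gamete test."""
    n_sites = len(rows[0])
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if incompatible(rows, i, j)]


def hudson_kaplan_bound(rows):
    """Largest number of incompatible intervals that overlap in at most
    an endpoint, found greedily by right endpoint."""
    intervals = sorted(incompatible_pairs(rows), key=lambda p: p[1])
    count, last_end = 0, -1
    for start, end in intervals:
        if start >= last_end:
            count += 1
            last_end = end
    return count


def drop_sites_with_missing(rows):
    """Mimic an analysis that discards every site with a missing entry."""
    keep = [k for k in range(len(rows[0]))
            if all(row[k] != MISSING for row in rows)]
    return ["".join(row[k] for k in keep) for row in rows]


# Hypothetical toy data: the second site has one missing entry.
sample = ["00", "01", "10", "11", "1?"]
print(hudson_kaplan_bound(sample))
print(hudson_kaplan_bound(drop_sites_with_missing(sample)))
```

Skipping only the rows in which an entry is missing keeps the bound valid: if all four gametes appear among the observed rows, they appear under every way of filling in the missing entries. In the toy data the incompatibility survives when the partially observed site is kept and disappears when that site is discarded.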