Calling small indels in the 1000 Genomes low-coverage and high-coverage pilots
Seminar Room 1, Newton Institute
We developed a Bayesian realignment approach to call small indels in the 1000 Genomes low-coverage and high-coverage pilots. The method takes a set of candidate indels as input; from these and other candidate sequence variants identified from the mapped reads, we generate candidate haplotype sequences, which represent alternative hypotheses to the reference sequence. The reads are then realigned to these candidate haplotypes, yielding likelihoods which, combined with suitable prior haplotype probabilities, are used to make the indel calls. The approach naturally accommodates context-dependent indel error rates in sequencing. We will show results for the 1000 Genomes data, together with validation experiments that indicate the method achieved a low false-discovery rate.
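
The general idea of scoring reads against candidate haplotypes can be illustrated with a small Python sketch. This is not the implementation used for the 1000 Genomes calls. It is a toy version under strong simplifying assumptions: each read is placed without gaps at its best position on each haplotype, a single uniform per-base error rate replaces the context-dependent indel error model, and only one haploid sequence is considered. All function names, the error rate, the priors, and the example sequences are hypothetical.

```python
import math


def apply_indel(reference, pos, ref_allele, alt_allele):
    """Build a candidate haplotype by replacing ref_allele at pos with alt_allele."""
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("ref_allele does not match the reference at pos")
    return reference[:pos] + alt_allele + reference[pos + len(ref_allele):]


def read_log_likelihood(read, haplotype, error_rate=0.01):
    """Approximate log P(read | haplotype) by the best ungapped placement.

    Assumption: mismatches are independent substitutions at a uniform rate.
    """
    best = -math.inf
    for start in range(len(haplotype) - len(read) + 1):
        window = haplotype[start:start + len(read)]
        mismatches = sum(1 for r, h in zip(read, window) if r != h)
        log_lik = (mismatches * math.log(error_rate / 3)
                   + (len(read) - mismatches) * math.log(1 - error_rate))
        best = max(best, log_lik)
    return best


def haplotype_posteriors(reads, haplotypes, priors, error_rate=0.01):
    """Combine read likelihoods with prior haplotype probabilities (Bayes' rule)."""
    log_post = {}
    for name, sequence in haplotypes.items():
        log_lik = sum(read_log_likelihood(r, sequence, error_rate) for r in reads)
        log_post[name] = math.log(priors[name]) + log_lik
    # Normalise in log space to avoid underflow.
    peak = max(log_post.values())
    total = sum(math.exp(v - peak) for v in log_post.values())
    return {name: math.exp(v - peak) / total for name, v in log_post.items()}


# Toy example: a candidate 2 bp deletion of "GG" at position 7 (0-based).
reference = "GATTACAGGCTTAGCCATG"
haplotypes = {
    "reference": reference,
    "deletion": apply_indel(reference, 7, "GG", ""),
}
priors = {"reference": 0.999, "deletion": 0.001}  # hypothetical prior values
reads = ["TACACTTAG", "ACACTTAGCC"]  # both reads span the deletion

posteriors = haplotype_posteriors(reads, haplotypes, priors)
for name, prob in posteriors.items():
    print(name, prob)
```

In this toy case both reads match the deletion haplotype exactly and need several mismatches to fit the reference, so the posterior favours the deletion despite its small prior. A realistic caller would replace the ungapped scoring with a gapped alignment model whose indel error rates depend on sequence context, such as homopolymer run length.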