Integration of sequence and array data in a population and haplotype-based model of SNPs and CNVs
Seminar Room 1, Newton Institute
Read depth analysis has been proposed as a method for detecting copy number variants (CNVs) from second-generation sequence data. However, the resolution of this approach depends strongly on coverage. The resolution of current single-sample methods may therefore be limited in low-coverage population sequencing projects such as the 1000 Genomes Project.
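The dependence of resolution on coverage can be illustrated with a back-of-the-envelope sketch. It is not the method presented in the talk: it assumes read counts in a window are Poisson distributed, a hypothetical read length of 100 bp, and an arbitrary threshold of z standard deviations for calling a heterozygous deletion. The function names are illustrative only.

```python
import math


def expected_reads(window_bp, coverage, read_len=100, copy_number=2):
    """Expected read count in a window, assuming reads scale with copy number."""
    return coverage * window_bp / read_len * copy_number / 2.0


def min_window_bp(coverage, read_len=100, z=3.0):
    """Smallest window in which a heterozygous deletion (copy number 1)
    lies at least z Poisson standard deviations below the diploid mean.

    With diploid mean lam, the deletion mean is lam / 2 and the diploid
    standard deviation is sqrt(lam), so we need lam / 2 >= z * sqrt(lam),
    i.e. lam >= 4 * z**2.
    """
    lam_needed = 4.0 * z * z
    return math.ceil(lam_needed * read_len / coverage)


if __name__ == "__main__":
    for cov in (2.0, 4.0, 30.0):
        print(f"{cov:>4}x coverage: window of at least {min_window_bp(cov)} bp")
```

Under these assumptions a 4x sample needs windows of roughly 900 bp, whereas a 30x sample needs roughly 120 bp, which is the sense in which low-coverage data limit single-sample resolution.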
To overcome this limitation, we developed a haplotype model that jointly learns the local CNV structure and the CNV/SNP haplotype structure across the entire population. We used dense array CGH data collected on HapMap samples to assess the resolution improvement available from this approach.
We have previously quantified the improvement in CNV genotyping accuracy obtained by integrating multiple genotyping and CGH platforms in a single probabilistic model. To investigate whether existing array data can improve CNV genotyping accuracy from low-coverage sequence data, we also integrated array and sequence data in our model and assessed the resulting improvement in CNV genotyping accuracy.
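As a minimal sketch of what integrating array and sequence evidence in one probabilistic model can look like, the example below combines a Poisson likelihood for a read count with a Gaussian likelihood for an array log2 ratio, assuming the two are conditionally independent given the copy number. This is a toy single-sample, single-locus illustration, not the population and haplotype-based model described above; the noise parameters, the flat prior, and the floor values used for copy number 0 are all assumptions.

```python
import math

COPY_NUMBERS = [0, 1, 2, 3, 4]


def log_poisson(k, lam):
    """Log probability of observing k reads under a Poisson mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_gauss(x, mu, sd):
    """Log density of a Gaussian with mean mu and standard deviation sd."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def cn_posterior(read_count=None, diploid_mean=None, log2_ratio=None, array_sd=0.3):
    """Posterior over copy number from sequence and/or array evidence.

    Either data source may be omitted by leaving it as None. A flat prior
    over COPY_NUMBERS is assumed.
    """
    log_post = []
    for c in COPY_NUMBERS:
        lp = 0.0
        if read_count is not None:
            # Small floor so copy number 0 still allows a few stray reads.
            lam = diploid_mean * max(c, 0.05) / 2.0
            lp += log_poisson(read_count, lam)
        if log2_ratio is not None:
            # Floor so the expected log2 ratio stays finite at copy number 0.
            mu = math.log2(max(c, 0.25) / 2.0)
            lp += log_gauss(log2_ratio, mu, array_sd)
        log_post.append(lp)
    # Normalise in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    seq_only = cn_posterior(read_count=6, diploid_mean=8.0)
    combined = cn_posterior(read_count=6, diploid_mean=8.0, log2_ratio=-0.9)
    print("sequence only:", [round(p, 3) for p in seq_only])
    print("sequence + array:", [round(p, 3) for p in combined])
```

With only six reads where eight are expected for a diploid sample, the sequence data alone cannot separate copy numbers 1 and 2 with any confidence; adding an array log2 ratio near -1 concentrates the posterior on copy number 1.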