Next-generation data analysis
Seminar Room 1, Newton Institute
I will describe the analysis pipeline we are developing for Illumina/Solexa sequencing, including the programs 'next_phred' for image analysis, basecalling, and quality assignment, 'phaster' for ultrafast quality-aware alignment of reads to a reference genome, and 'phast_lane' for identifying sequence variants from aligned reads. Next_phred's methods are quite different from Illumina's and yield substantially more alignable reads, with a reduced error rate and more discriminating quality values. Phaster uses a simple word-frequency-based strategy to efficiently search reads against an indexed reference genome. It finds gapped alignments, reports mapping quality scores analogous to those in Maq (Li et al. 2008), and in our tests is superior in speed and sensitivity to programs based on the Burrows-Wheeler transform.
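
As a rough illustration of what searching reads against an indexed reference can look like, the sketch below builds a table of fixed-length words from a reference and uses the least frequent words of a read to vote for candidate start positions. This is not phaster's actual algorithm, which the abstract does not specify; the function names (build_index, candidate_positions), the word length k, the max_seeds cutoff, and the rarest-word-first heuristic are all assumptions made for illustration, and gapped alignment and quality-aware scoring are left out.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map each k-letter word to the list of reference positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k, max_seeds=4):
    """Vote for read start positions using the rarest words found in the read.

    Assumed heuristic: words that occur least often in the reference are the
    most informative seeds, so they are looked up first.
    """
    seeds = []
    for offset in range(len(read) - k + 1):
        hits = index.get(read[offset:offset + k])
        if hits:  # skip words that never occur in the reference
            seeds.append((len(hits), offset, hits))
    seeds.sort(key=lambda seed: (seed[0], seed[1]))  # rarest first

    votes = Counter()
    for _, offset, hits in seeds[:max_seeds]:
        for pos in hits:
            # A hit at pos for a word at this offset implies the read starts here.
            votes[pos - offset] += 1
    return votes.most_common()


if __name__ == "__main__":
    index = build_index("ACGTACGTTTGACCA", k=3)
    print(candidate_positions("TTGACC", index, k=3))
```

Each returned pair is a candidate start position and the number of seed words supporting it; a full aligner would then verify the top candidates with a gapped alignment before assigning a mapping quality.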