Genome-wide characteristics of sequence coverage by next-generation sequencing: how does this impact interpretation?
Seminar Room 1, Newton Institute
A greatly increased capacity to generate sequence data from a sample brings unprecedented resolution of the genome or transcriptome under study. Interpretations derived from these data often hinge on the density or counts of sequence reads observed from a particular region of the genome or transcriptome, either to compare samples or to describe qualitatively which sequences are present in the sample. A challenging aspect of this kind of analysis is that sequence read density across the genome has been observed to be highly variable within and between samples, and the sources of this variability are yet to be fully explained. This talk briefly explores examples of this variability, possible causes underlying a small number of them, and how this understanding can be used to improve interpretation. In particular, we investigate the utility of an empirically derived understanding of intra-genome k-mer uniqueness to inform sequence read alignment and interpretation. We examine the properties of k-mers within a range of sequence datasets with respect to sequencing bias, functional annotation and interpretation of alignment outcomes.
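As a rough illustration of what intra-genome k-mer uniqueness can mean in practice, the sketch below counts every overlapping k-mer in a single sequence and reports the fraction of start positions whose k-mer occurs exactly once. It is not the method used in the talk: the function names `kmer_counts` and `unique_start_fraction` are hypothetical, the toy sequence is invented, and the sketch assumes a single forward strand with no reverse-complement handling and no ambiguous bases.

```python
from collections import Counter


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every overlapping k-mer in one sequence (forward strand only)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def unique_start_fraction(genome: str, k: int) -> float:
    """Fraction of start positions whose k-mer occurs exactly once.

    Assumption: uniqueness is judged within this one sequence only,
    ignoring the reverse complement.
    """
    n_positions = len(genome) - k + 1
    if n_positions <= 0:
        return 0.0  # sequence shorter than k: no k-mers to assess
    counts = kmer_counts(genome, k)
    unique = sum(
        1 for i in range(n_positions) if counts[genome[i:i + k]] == 1
    )
    return unique / n_positions


if __name__ == "__main__":
    toy = "ACGTACGTTT"  # invented toy sequence, not real data
    for k in (2, 4, 5):
        print(k, unique_start_fraction(toy, k))
```

In this toy case the unique fraction rises as k grows, which is the intuition behind using longer k-mers (or reads) to resolve repetitive regions: a read that starts at a non-unique position cannot be placed unambiguously by exact matching alone.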