Statistical challenges in using comparative genomics for the identification of functional sequences
Seminar Room 1, Newton Institute
There are two main aspects of comparative sequence analysis that rely on high-dimensional statistical approaches: identifying evolutionarily constrained regions and determining the significance of their overlap with functional sequences. The identification of constrained sequences relies largely on our understanding of evolutionary models and on applying them to multi-sequence alignments. However, our understanding of evolutionary processes is incomplete, and our ability to generate perfect multi-sequence alignments is hampered by incomplete sequence datasets and general uncertainty in the alignment process; these factors can lead to multiple equally plausible alignments, only one of which is typically represented in downstream analyses. To mitigate some of these issues, we have been developing new comparative genomics approaches that take into account the biochemical and physical properties of DNA, so that we can understand which substitutions are more tolerable with respect to the three-dimensional structure of DNA, and thus closer to neutral in evolution. We also plan to start incorporating alignment uncertainty into our predictions of constrained sequences. Evaluating our improved sequence-constraint methods relies on a new statistical approach for determining the significance of their overlap with known functional annotations. This new method, devised by Peter Bickel and colleagues, was applied to analyses performed within the ENCODE consortium and provides the basis for newer methods that will be discussed later in this meeting.
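To make the overlap question concrete, here is a minimal sketch of one simple way to ask whether constrained bases overlap functional annotations more than expected by chance. It is not the method of Bickel and colleagues mentioned above; it swaps in a plain circular-shift permutation test, and it assumes both tracks are given as binary per-base masks over a single sequence. The function names and the toy data are hypothetical.

```python
import random


def overlap_bases(a, b):
    """Count positions covered by both binary masks (1 = covered)."""
    return sum(x & y for x, y in zip(a, b))


def circular_shift_pvalue(constrained, functional, n_perm=1000, seed=0):
    """Estimate how often a randomly rotated constrained track overlaps
    the functional track at least as much as the observed one.

    Rotating the whole track keeps the size and spacing of the
    constrained blocks intact, which a per-base shuffle would destroy.
    """
    rng = random.Random(seed)
    n = len(constrained)
    observed = overlap_bases(constrained, functional)
    hits = 0
    for _ in range(n_perm):
        k = rng.randrange(n)  # random rotation offset
        shifted = constrained[k:] + constrained[:k]
        if overlap_bases(shifted, functional) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example (hypothetical data): a 100-base sequence with one
# constrained block and one partially overlapping functional block.
constrained = [1 if 10 <= i < 30 else 0 for i in range(100)]
functional = [1 if 15 <= i < 35 else 0 for i in range(100)]

observed, p_value = circular_shift_pvalue(constrained, functional)
print(observed, p_value)
```

A rotation-based null is only a rough stand-in: it treats the sequence as circular and homogeneous, which real genomes are not, and that limitation is one reason more careful approaches are needed.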
- http://genome.gov/encode/ - Information about the ENCODE Consortium
- http://genome.gov/Pages/Research/ENCODE/nature05874.pdf - Main publication from the ENCODE Pilot Project