Isaac Newton Institute for Mathematical Sciences

Mathematical and Statistical Aspects of Molecular Biology

Statistical methodologies for detecting non-coding RNAs in vertebrate genomes

Author: Gayle McEwen (RFCGR)

Abstract

In recent years, the importance of non-coding RNAs (ncRNAs) as diverse functional molecules within the cell has become apparent. Many classes of ncRNA play vital roles in transcriptional and translational regulation, development and splicing. Most ncRNAs were first discovered fortuitously. Many bioinformatic and statistical methodologies have recently been developed to predict new ncRNAs, but a large number of ncRNAs are thought to remain undiscovered. We have used the partition function algorithm for sub-optimal folding of RNA to compare the folding free energy of known ncRNAs with that of random shuffles of the same sequences (with dinucleotide content preserved). From this comparison, we have found a number of statistical measures that can detect ncRNAs, although only weakly. To improve the predictions in vertebrate genomes, we have used a comparative genomics approach with the human, mouse and Fugu rubripes genomes to locate conserved non-coding regions. These conserved regions are then tested to see whether they show the characteristics of ncRNAs.
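
The comparison of a sequence with its dinucleotide-preserving shuffles can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the abstract does not name its statistical measures, so the z-score used here is an assumed example of one; the shuffle follows the general Euler-path idea of Altschul and Erickson; and `energy_fn` is a hypothetical caller-supplied function that would wrap whatever RNA folding program is used to obtain a free energy.

```python
import math
import random
import statistics
from collections import defaultdict


def _reaches(v, target, last_edge):
    """Follow the chosen last edges from v; True if the path ends at target."""
    seen = set()
    while v != target:
        if v in seen:  # cycle that never reaches the final symbol
            return False
        seen.add(v)
        v = last_edge[v]
    return True


def dinucleotide_shuffle(seq, rng):
    """Return a shuffle of seq with exactly the same dinucleotide counts."""
    if len(seq) < 3:
        return seq
    final = seq[-1]
    edges = defaultdict(list)  # symbol -> list of symbols that follow it
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    # Pick one "last exit" per symbol and retry until all of them lead to
    # the final symbol; this guarantees the walk below uses every edge.
    while True:
        last_edge = {v: rng.choice(out) for v, out in edges.items() if v != final}
        if all(_reaches(v, final, last_edge) for v in last_edge):
            break
    order = {}
    for v, out in edges.items():
        rest = list(out)
        if v in last_edge:
            rest.remove(last_edge[v])
            rng.shuffle(rest)
            rest.append(last_edge[v])  # the chosen last exit is used last
        else:
            rng.shuffle(rest)
        order[v] = rest
    walk = [seq[0]]
    used = defaultdict(int)
    for _ in range(len(seq) - 1):
        cur = walk[-1]
        walk.append(order[cur][used[cur]])
        used[cur] += 1
    return "".join(walk)


def z_score(native, controls):
    """Standardise a native value against control values (assumed measure)."""
    sd = statistics.stdev(controls)
    if sd == 0:
        return math.nan  # controls do not vary, so the score is undefined
    return (native - statistics.mean(controls)) / sd


def energy_z_score(seq, energy_fn, n_shuffles=100, seed=0):
    """Z-score of energy_fn(seq) against dinucleotide-preserving shuffles.

    energy_fn is a hypothetical callable mapping a sequence to a free energy.
    """
    rng = random.Random(seed)
    controls = [energy_fn(dinucleotide_shuffle(seq, rng)) for _ in range(n_shuffles)]
    return z_score(energy_fn(seq), controls)
```

Under this sketch, a strongly negative z-score would mean that the native sequence folds more stably than shuffled sequences of the same dinucleotide composition. How many shuffles to use and where to set a threshold are left open here.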