Isaac Newton Institute for Mathematical Sciences

Mathematical and Statistical Aspects of Molecular Biology

Sparse Bayesian Learning for analysis of biological sequences

Authors: Thomas A. Down (Sanger Institute), Tim J. P. Hubbard (Sanger Institute)

Abstract

Sparse Bayesian Learning is a recently developed machine learning framework that focuses on building simple models from large sets of candidate features. Here, we describe a protocol for using a Sparse Bayesian trainer, the Relevance Vector Machine (RVM), to explore extremely large sets of candidate features, and a family of models that apply the RVM to classifying and detecting interesting points and regions in biological sequence data. These models have been applied successfully to the prediction of promoters, transcription start sites and other sites of interest in large genome sequences.