Bayesian nonparametric methods for prediction in EST analysis (Venue: GH seminar RM2)
Seminar Room 2, Newton Institute Gatehouse
Expressed sequence tag (EST) analyses are an important tool for gene identification in organisms. Given a preliminary EST survey from a cDNA library, various features of a possible additional sample have to be predicted. For instance, interest may lie in estimating the number of new genes to be detected and the gene discovery rate at each additional read. We propose a Bayesian nonparametric approach for prediction in EST analysis based on nonparametric priors inducing Gibbs-type exchangeable random partitions, and we derive estimators for the relevant quantities. Several EST datasets are analysed using the two-parameter Poisson-Dirichlet process, the most notable example of a Gibbs-type prior. Our proposal has appealing properties compared with frequentist nonparametric methods, which become unstable when prediction is required for large future samples.
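
As a rough illustration of the two quantities mentioned above, the following sketch evaluates the standard closed-form predictions under a two-parameter Poisson-Dirichlet process with parameters sigma in (0, 1) and theta > -sigma, given a basic sample of n reads containing j distinct genes. The function names, the parameter values and the choice to treat sigma and theta as known are assumptions made for this example. The abstract does not say how the parameters are chosen, and the estimators presented in the talk may differ in form.

```python
from math import exp, lgamma


def rising_ratio(a, b, m):
    """Ratio of rising factorials (a)_m / (b)_m, with (x)_m = Gamma(x + m) / Gamma(x).

    Computed on the log scale so that large m does not overflow.
    """
    return exp(lgamma(a + m) - lgamma(a) - lgamma(b + m) + lgamma(b))


def check_params(sigma, theta):
    # Parameter range of the two-parameter Poisson-Dirichlet process used here.
    if not (0.0 < sigma < 1.0) or theta <= -sigma:
        raise ValueError("need 0 < sigma < 1 and theta > -sigma")


def expected_new_genes(n, j, m, sigma, theta):
    """Expected number of new genes in m additional reads, given j genes in n reads."""
    check_params(sigma, theta)
    a = theta + n
    return (j + theta / sigma) * (rising_ratio(a + sigma, a, m) - 1.0)


def discovery_rate(n, j, m, sigma, theta):
    """Probability that read n + m + 1 reveals a new gene, before seeing the m extra reads."""
    check_params(sigma, theta)
    a = theta + n
    return (theta + j * sigma) / a * rising_ratio(a + sigma, a + 1.0, m)


# Illustrative values only; they are NOT taken from any EST dataset.
n, j, sigma, theta = 1000, 400, 0.6, 50.0
for m in (100, 1000, 10000):
    print(m, expected_new_genes(n, j, m, sigma, theta), discovery_rate(n, j, m, sigma, theta))
```

With m = 0 the discovery rate reduces to (theta + j * sigma) / (theta + n), the usual one-step probability of observing a new gene. Summing the discovery rates over reads 0, ..., m - 1 recovers the expected number of new genes in m reads, which gives a simple consistency check between the two functions.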