Pre-modelling via BART

Thursday 28th February 2008 - 11:00 to 12:00
INI Seminar Room 1

Consider the canonical regression set-up in which one wants to learn about the relationship between y, a variable of interest, and x_1,...,x_p, p potential predictor variables. Although one may ultimately want to build a parametric model to describe and summarize this relationship, preliminary analysis via flexible nonparametric models may provide useful guidance. For this purpose we propose BART (Bayesian Additive Regression Trees), a flexible nonparametric Bayesian ensemble approach for estimating f(x_1,...,x_p) = E(Y|x_1,...,x_p), for obtaining predictive regions for future y, for describing the marginal effects of subsets of x_1,...,x_p, and for model-free variable selection. Essentially, BART approximates f by a Bayesian 'sum-of-trees' model in which fitting and inference are accomplished via an iterative backfitting MCMC algorithm. By using a large number of trees, which yields a redundant basis for f, BART is seen to be remarkably effective at finding highly nonlinear relationships hidden within a large number of irrelevant potential predictors. BART also provides an omnibus test: the absence of any relationship between y and any subset of x_1,...,x_p is indicated when the BART posterior intervals for f reveal no signal. (This is joint work with Hugh Chipman and Robert McCulloch.)
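The sum-of-trees structure behind the abstract can be illustrated with a minimal sketch. The following toy code is not BART (there is no prior and no MCMC sampling over tree space); it is only a greedy, deterministic analogue that fits a sum of many small trees (depth-1 stumps) to residuals, to show how a redundant collection of weak trees can recover a nonlinear f hidden among irrelevant predictors. All variable names and the shrinkage constant are illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: nonlinear signal in x_1, x_2, plus 8 irrelevant predictors.
n, p = 200, 10
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, n)

def fit_stump(x_col, resid):
    """Best single-split stump (depth-1 tree) for one predictor column."""
    best = (np.inf, None, 0.0, 0.0)
    for s in np.quantile(x_col, np.linspace(0.1, 0.9, 9)):
        left = x_col <= s
        ml, mr = resid[left].mean(), resid[~left].mean()
        sse = ((resid[left] - ml) ** 2).sum() + ((resid[~left] - mr) ** 2).sum()
        if sse < best[0]:
            best = (sse, s, ml, mr)
    return best[1], best[2], best[3]

# Sum of many small trees, each fit to the residual left by the others.
# (BART instead *samples* each tree from a posterior via backfitting MCMC;
# here we just take a greedy, shrunken residual fit as a stand-in.)
m = 50          # number of trees in the sum
shrink = 0.3    # keeps each tree's contribution small, as BART's prior does
fit = np.zeros(n)
for _ in range(m):
    resid = y - fit
    best_sse, best_tree = np.inf, None
    for j in range(p):
        s, ml, mr = fit_stump(X[:, j], resid)
        pred = np.where(X[:, j] <= s, ml, mr)
        sse = ((resid - shrink * pred) ** 2).sum()
        if sse < best_sse:
            best_sse, best_tree = sse, (j, s, ml, mr)
    j, s, ml, mr = best_tree
    fit += shrink * np.where(X[:, j] <= s, ml, mr)

mse_baseline = np.var(y)                 # error of the constant predictor
mse_fit = np.mean((y - fit) ** 2)        # error of the sum-of-stumps fit
print(mse_baseline, mse_fit)
```

Each stump is a crude basis element, yet the sum drives the training error well below the constant-predictor baseline, which is the sense in which a large collection of trees forms a redundant basis for f. The Bayesian machinery the talk describes replaces the greedy residual fit with posterior draws, which is what yields the predictive regions and posterior intervals mentioned above.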
