# Seminars (BNRW01)

Event When Speaker Title Presentation Material
BNRW01 6th August 2007
09:00 to 10:30
Dirichlet process, related priors and posterior asymptotics

We begin with the problem of prior construction on the space of probability measures and motivate the Dirichlet process as a natural candidate. Naively, the process may be constructed on a product space through the Kolmogorov consistency theorem, but measure theoretic difficulties arise. To avoid these problems, we describe a construction of the Dirichlet process through countable co-ordinate projections. We discuss several properties of the Dirichlet process such as expectation, variance, posterior conjugacy, self-similarity, support, discreteness, marginal distribution, ties, convergence and the Sethuraman construction. To construct a prior for density estimation, the Dirichlet process may be convolved with a kernel. Next we turn our attention to the study of consistency and convergence rates of the posterior distribution for density estimation. We argue that positive prior probability of a neighborhood of the true density defined by the Kullback-Leibler divergence plays a key role in consistency studies. We sketch arguments to show that commonly used priors such as Dirichlet mixtures, Polya trees and Gaussian processes satisfy the Kullback-Leibler property under mild restrictions. Additional conditions involving existence of certain tests or bounds for metric entropy are needed for topologies other than the weak topology. Next we study convergence rates of posterior distributions with respect to the Hellinger or the total variation distance. We argue that the rate of concentration of the prior in the Kullback-Leibler type neighborhoods of the true density and the growth rate of metric entropies determine the convergence rate. We discuss the example of Dirichlet mixture of normals.
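The Sethuraman (stick-breaking) construction mentioned in this abstract can be illustrated with a short sketch. This is not taken from the talk: the function names, the truncation level `num_atoms`, and the standard normal base measure are assumptions chosen for illustration, and the truncation only approximates the infinite sum.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """Return the first `num_atoms` stick-breaking weights and the leftover mass.

    Each V_k is drawn from Beta(1, alpha); weight k is V_k times the
    length of stick left after the first k - 1 breaks.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


def draw_truncated_dp(alpha, num_atoms, base_sampler, rng):
    """Draw a truncated random discrete measure: atoms from the base measure,
    weights from stick-breaking. `base_sampler` is a hypothetical callable."""
    weights, leftover = stick_breaking_weights(alpha, num_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(num_atoms)]
    return atoms, weights, leftover


rng = random.Random(0)
atoms, weights, leftover = draw_truncated_dp(
    alpha=2.0,
    num_atoms=50,
    base_sampler=lambda r: r.gauss(0.0, 1.0),  # assumed base measure N(0, 1)
    rng=rng,
)
```

The weights plus the leftover mass sum to one, so `leftover` indicates how much probability the truncation ignores.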

BNRW01 6th August 2007
11:00 to 12:30
Models beyond the Dirichlet process
BNRW01 6th August 2007
14:00 to 15:30
Applications to biostatistics
BNRW01 6th August 2007
16:00 to 17:30
Applications to machine learning
BNRW01 7th August 2007
09:00 to 09:30
Bayesian semiparametric analysis of gene-environment interaction under conditional gene-environment independence (Venue: GH seminar RM2)

In case-control studies of gene-environment association with disease, when genetic and environmental exposures can be assumed to be independent in the underlying population, one may exploit the independence in order to derive more efficient estimation techniques than the traditional logistic regression analysis (Chatterjee and Carroll, 2005). However, covariates that stratify the population, such as age, ethnicity and the like, could potentially lead to non-independence. Modeling these stratification effects introduces a large number of parameters in the retrospective likelihood. We provide a novel semiparametric Bayesian approach to model stratification effects under the assumption of gene-environment independence in the control population using a Dirichlet Process Mixture. We illustrate the methods by applying them to data from a population-based case-control study on ovarian cancer conducted in Israel. A simulation study is conducted to compare our method with other popular choices. The results reflect that the semiparametric Bayesian model allows incorporation of key scientific evidence in the form of a prior and offers a flexible, robust alternative when standard parametric model assumptions for the distribution of the genetic and environmental exposures do not hold.

BNRW01 7th August 2007
09:00 to 10:00
Posterior consistency of logistic random effect models

We study the posterior consistency of logistic random effect models. Usual parametric priors are put on the regression coefficients of the fixed effects and a nonparametric prior such as a Dirichlet process or Polya tree is put on the distribution of random effects. We give sufficient conditions for the consistency of the joint posterior of the regression coefficients and random effect distribution, and explain how to prove it. Also, we discuss the limitations of the proposed sufficient conditions and possible extensions.

BNRW01 7th August 2007
09:30 to 10:00
M Ruggiero Bayesian countable representation of some population genetics diffusions (Venue: GH seminar RM2)

The Fleming-Viot processes are probability-measure-valued diffusions which arise as large population limits of a wide class of population genetics models. In a few formulations their stationary distribution is known to be either the Dirichlet process or a functional of the Dirichlet process, but the connections with Bayesian statistics are still to be explored. This work provides several explicit constructions of Fleming-Viot processes in the Bayesian nonparametric framework, and yields a previously unknown stationary distribution. In particular, by means of known and newly defined generalised Pólya-urn schemes, several types of pure jump particle processes are introduced, describing the evolution in time of an exchangeable population. In each case, the process of empirical measures of the individuals converges in the Skorohod space to a specific Fleming-Viot diffusion, and the stationary distribution is the de Finetti measure of the infinite sequence of individuals. In the presence of viability selection the stationary distribution turns out to be the two-parameter Poisson-Dirichlet process.

BNRW01 7th August 2007
10:00 to 11:00
Some good news about nonparametric priors in density estimation

Bayesian nonparametric methods have recently gained popularity in the context of density estimation. In particular, the density estimators arising from the mixture of Dirichlet process (MDP) and the mixture of normalized inverse Gaussian process are now commonly exploited in practice. We perform a sensitivity analysis for a wide class of Bayesian nonparametric density estimators by perturbing the prior itself by means of a suitable function. Our findings bring some clear evidence in favor of Bayesian nonparametric density estimators due to a neutralization of the perturbation in the posterior distribution.

BNRW01 7th August 2007
11:30 to 12:30
W Johnson Semi-parametric survival analysis with time dependent covariates

We discuss semi-parametric modeling of survival data with time dependent covariates. The traditional Cox model, the Cox and Oakes model, and extensions of the proportional odds model and the accelerated failure time model are all considered. Baseline survival is modeled with a mixture of finite Polya trees in each instance. Model selection among semi-parametric families is accomplished using the log pseudo marginal likelihood approach discussed in Geisser and Eddy (1979). Joint modeling of longitudinal and survival data is discussed and compared with fixed versus imputed values for the longitudinal process, using a particular data set.

BNRW01 7th August 2007
14:00 to 15:00
Genetic association studies in the presence of population structure and admixture

There has been considerable discussion among genetics researchers about the impact of population structure on association studies. When one samples from a population made up of subpopulations differing in disease risk and allele frequencies, estimates of disease association with a candidate locus can be exaggerated or attenuated. For example, any allele that occurs more frequently in a subpopulation with higher disease prevalence will potentially show a statistical association with the disease phenotype even if it is not linked to a causative locus. Different approaches have been proposed to address this issue. Attractive among these is that of Pritchard and coauthors (2000, 2001). They proposed modeling the subpopulations, classifying individuals accordingly and essentially pooling resulting stratified inference. Initial and most current methods adopting this approach mainly proceed sequentially and/or parametrically through clustering, classification and inference. We consider a unified semiparametric regression to model appropriately and to integrate out population structure in making association inference within a cohesive Bayesian framework. While this approach is feasible in additional situations (e.g., case-control studies), here we focus on the case of quantitative traits. Effectiveness of the proposed model and related Markov chain Monte Carlo computations is demonstrated via simulated data.

BNRW01 7th August 2007
15:30 to 16:30
Useful priors for covariance operators

Formulating useful priors for covariance operators is a challenging problem. This problem arises when one wishes to perform Bayesian inference with functional data. We discuss the issues and show that common priors for covariance matrices do not extend to operators on infinite dimensional function spaces. A method for constructing priors is described along with some of the mathematical properties of the priors. We show how to compute with these priors and give an application.

BNRW01 7th August 2007
16:30 to 17:00
Bayesian nonparametrics and variance regression: mixtures of Dirichlet processes and the slice sampler

Mixtures of Dirichlet Processes (MDP) have been widely used as a method of overcoming the discreteness of the Dirichlet Process (DP). The two approaches taken to sample from the Dirichlet measure are the marginal approach (Escobar and West 1995), where the measure is integrated out within the Gibbs sampler via a clever use of the Polya urn construction of the DP, and the conditional approach (see Ishwaran and Zarepour 2000, 2002), which makes use of the infinite sum construction of the DP (see Sethuraman 1994). The ways around this infinite sum construction are either approximations or truncations (Ishwaran and Zarepour 2000) or using the retrospective sampler (see Papaspiliopoulos and Roberts 2005). The retrospective sampler deals with the infinite sum directly, via use of reversible jump steps. We introduce a simpler sampler which, instead of using reversible jumps, introduces an auxiliary variable and incorporates the slice sampler within the construction of the posteriors for the Gibbs sampler (see P. Damien, J. Wakefield, S.G. Walker 1999). The new algorithm works with the infinite sum construction of the DP from the very beginning, and by introducing auxiliary variables the Gibbs sampler updating is done within finite sets.
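The claim that auxiliary variables reduce the updating to finite sets can be illustrated with a small sketch. It is only one reading of the idea, not the speaker's algorithm: it assumes each observation carries a uniform slice variable `u` and that only components whose stick-breaking weight exceeds `u` are candidates. The function names and parameter values are hypothetical, and the full Gibbs sampler is not shown.

```python
import random


def extend_sticks(weights, remaining, alpha, threshold, rng):
    """Break off further sticks until the unallocated mass drops below `threshold`.

    Every weight not yet generated is smaller than `remaining`, so once
    remaining < threshold no unseen component can have weight above threshold.
    """
    weights = list(weights)
    while remaining >= threshold:
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


def active_components(weights, u):
    """Indices of the finitely many components with weight above the slice level u."""
    return [j for j, w in enumerate(weights) if w > u]


rng = random.Random(1)
u = 0.05  # hypothetical slice variable for one observation
weights, remaining = extend_sticks([], 1.0, alpha=1.0, threshold=u, rng=rng)
candidates = active_components(weights, u)
```

Only the components in `candidates` would need to be considered when reallocating that observation, which is why no truncation of the infinite sum is required.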

BNRW01 7th August 2007
17:00 to 17:30
Canonical representations for dependent Dirichlet populations

A wide class of Markov processes having a Ferguson-Dirichlet stationary measure is characterized by solving the following problem: fix the eigenfunctions of the generator to be orthogonal polynomials; what do all possible eigenvalues look like? Such a representation reveals a strong connection with the classical Lancaster problem of finding the correlation structure of bivariate distributions with fixed marginals. A similar representation is shown to hold for processes on the discrete simplex with Multinomial-Dirichlet stationary measure. The connection between the two classes of stochastic processes has a strong Bayesian flavour, which stems from a probabilistic derivation of all multivariate orthogonal polynomials involved.

BNRW01 7th August 2007
17:30 to 18:30
Minimally informative nonparametric Bayesian procedures

We address the problem of how to conduct a minimally informative nonparametric Bayesian analysis. The central question is how to devise a model so that the posterior distribution satisfies a few basic properties. In order to satisfy these properties, the concept of local mass emerges, and the limiting Dirichlet process (or limdir) model is constructed. The notion of local mass suggests that nonparametric prior distributions be constructed in a different fashion than is typical. Use of the limdir model is illustrated for one-way analysis of variance with a pair of data sets. Consistency issues in this context are addressed.

BNRW01 8th August 2007
09:00 to 09:30
On the posterior structure of NRMI (Venue: GH seminar RM2)

This talk will present results on the posterior structure of normalized random measures with independent increments (NRMI). Such results are essential both for the understanding and use of NRMI. Given these, their implementation in hierarchical mixture models is trivial. Moreover, one should keep in mind that the usefulness of discrete nonparametric priors is not limited to mixture models.

Joint work with Lancelot F. James (Hong Kong University of Science and Technology) and Antonio Lijoi (University of Pavia).

BNRW01 8th August 2007
09:00 to 10:00
Semiparametric inference for the accelerated failure time model using hierarchical mixtures with generalised gamma processes

We adopt a Bayesian semiparametric approach for an accelerated failure time model, when the error distribution is a mixture of parametric densities on the positive reals with a (normalized) generalized gamma process (Brix, 1999) as mixing measure. This class of mixtures encompasses the Dirichlet process mixture (DPM) model, but it is more flexible in the detection of clusters in the data, as far as density estimation is concerned. Markov chain Monte Carlo techniques will be used to estimate the predictive distribution of the survival time, along with the posterior distribution of the regression parameters, for real and simulated datasets.

BNRW01 8th August 2007
09:30 to 10:00
Semiparametric Bayes joint modeling with functional predictors (Venue: GH seminar RM2)

We consider the problem of semiparametric Bayes joint modeling of predictors and a response variable, with a particular emphasis on functional predictors. Parametric models for the predictor and response are coupled through a joint distribution for subject-specific predictor and response coefficients. This joint distribution is assigned a flexible mixture prior, which allows the response distribution within predictor clusters to be unknown. To avoid label ambiguity and accelerate computation, we propose a combined sequential updating and Gibbs sampling algorithm for posterior computation. The methods are applied to data on women's weight gain during pregnancy and birth weight.

BNRW01 8th August 2007
10:00 to 11:00
M De Iorio A DDP model for survival regression

We develop a Dependent Dirichlet Process model for survival analysis data. A major feature of the proposed approach is that there is no necessity for resulting survival curve estimates to satisfy the ubiquitous proportional hazards assumption. An illustration based on a cancer clinical trial is given where survival probabilities for times early in the study are estimated to be lower for those on a high dose treatment regimen than for those on the low dose treatment, while the reverse is true for later times, possibly due to the toxic effect of the high dose for those who are not as healthy at the beginning of the study.

BNRW01 8th August 2007
11:30 to 12:30
Hybrid Dirichlet processes for functional data

I discuss Bayesian modeling of random effects and clustering in functional data; examples include functional regression and spatial data with individual heterogeneity. In recent years there has been an enormous growth of interest in statistical applications of Bayesian nonparametric procedures for modeling heterogeneity and clustering structures in the data. Indeed, the Dirichlet process, and more generally species sampling priors, have proved extremely fruitful for modeling the clustering allocation or the appearance of new clusters, or species, in samples from a population of (potentially infinite) species. Each individual is allocated to one of the observed species or to a new one according to a probability law which is implicit in the choice of the prior (e.g., by the well known Polya urn scheme for the Dirichlet process). However, for functional data this implies that a new species is envisaged even if the curve differs from the previously observed ones only for some coordinates. This often produces as many species as the sample size, thus defeating the clustering purposes of the model. Instead, a more effective description of the data could be obtained by allowing hybrid species, where portions of the curves may belong to different species. This can model local mutations of the curves from one species to a new one. In other words, the Dirichlet process implies a probability law on global random partitions, while for multivariate or functional data new notions of dependent local partitions arise.

In my talk, I will first consider (finite or infinite) mixture models of Gaussian processes where the mixing distribution is a (finite-dimensional) functional Dirichlet process. However, functional Dirichlet processes imply a global effect for the mixing variables that induce the clustering, while, for functional data, local random effects along the curve are often more sensitive. Thus, we propose a generalized family of species sampling priors well suited for modeling hybrid species and local random partitions. This class of priors is appealing in that it provides a general and directly interpretable Bayesian mixture model for functional data, including as special cases models with global or local effects proposed very recently in the literature. Theoretical properties of the proposed priors will be developed, including a weak limit result for the finite-dimensional case. Applications to simulated data and image classification will illustrate the procedure.
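The Polya urn allocation that this abstract refers to (each individual joins an observed species or starts a new one) can be sketched as follows. This is the standard global urn scheme for the Dirichlet process, not the hybrid prior proposed in the talk; the function name and parameter values are illustrative assumptions.

```python
import random


def polya_urn_partition(n, alpha, rng):
    """Sequentially allocate n individuals to species.

    Individual i (0-based) joins existing species k with probability
    sizes[k] / (alpha + i) and founds a new species with probability
    alpha / (alpha + i).
    """
    labels = []
    sizes = []
    for i in range(n):
        r = rng.random() * (alpha + i)
        cumulative = 0.0
        choice = len(sizes)  # default: a new species
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                choice = k
                break
        if choice == len(sizes):
            sizes.append(0)
        sizes[choice] += 1
        labels.append(choice)
    return labels, sizes


rng = random.Random(42)
labels, sizes = polya_urn_partition(n=100, alpha=1.5, rng=rng)
```

Here `labels` is a single global partition of the individuals, which is the behaviour the abstract contrasts with local partitions along a curve.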

BNRW01 8th August 2007
14:00 to 15:00
A Simoni Regularised posteriors in linear ill-posed inverse problems

The aim of the paper is to obtain a solution for a signal-noise problem, namely we want to make inference on an unobserved infinite dimensional parameter through a noisy indirect observation of this quantity. The parameter of interest appears as the solution of an ill-posed inverse problem. We place ourselves in a Bayesian framework so that the parameter of interest is a stochastic process and the solution to the inference problem is the posterior distribution of such a parameter. We define and propose an easy way to identify the posterior distribution on a functional space, but due to the infinite dimension of our problem it is only possible to compute a regularized version of it. Furthermore, under some regularity condition on the true value of the parameter, we prove "frequentist" consistency of the regularized posterior distribution, but we find that the prior distribution is not able to generate the true value of the parameter satisfying this regularity condition. This perfectly agrees with previous literature and confirms once again the possible prior inconsistency in infinite-dimensional Bayesian experiments already stressed by Diaconis and Freedman (1986). However, the prior distribution that we specify is able to generate trajectories of the parameter of interest very close to the true value. We also compute sufficient statistics for infinite dimensional parameters. A Monte Carlo simulation confirms good properties of the proposed estimator.

BNRW01 8th August 2007
15:30 to 16:30
Bayesian semiparametric analysis for a single item maintenance optimis

We address the problem of a finite horizon single item maintenance optimization structured as a combination of preventive and corrective maintenance in a nuclear power plant environment. We present Bayesian semiparametric models to estimate the failure time distribution and costs involved. The objective function for the optimization is the expected total cost of maintenance over the pre-defined finite time horizon. Typically, the mathematical modeling of failure times is based on parametric models. These models fail to capture the true underlying relationships in the data; indeed, under a parametric assumption, the hazard rates are treated as unimodal, which, as shown in this paper, is incorrect. Importantly, we show that assuming an increasing failure rate, as is typically done, is way off the mark in the present context. Since hazard and cost estimates feed into the optimization phase, from a risk management perspective, potentially gross errors, resulting from purely parametric models, can be obviated. We show the effectiveness of our approach using real data from the South Texas Project Nuclear Operating Company (STPNOC) located in Bay City, Texas.

BNRW01 8th August 2007
16:30 to 17:30
Functional data analysis using a Lévy random fields model for multi-spectra peak identification and classification

We developed a novel approach for assessing proteomic differences between subjects of two treatment groups. Given multiple, high dimensional, proteomic profiles generated by Matrix Assisted Laser Desorption Ionization, Time-of-Flight mass spectrometers (MALDI-TOF MS), we used Bayesian nonparametric methods to reduce the data to include only biologically relevant information upon which we based classification. We began by implementing a Lévy random fields model that extracted pertinent features from individual spectra, and then extended this single spectrum model to incorporate data from multiple spectra. Specifically, we assert that one, m/z and resolution dependent, marked Gamma Process influences every, within-population, multi-modal spectrum and expect random, biological, or measurement error to force spectra to deviate from the process parameters. Under this assertion, a Bayesian hierarchical approach naturally models data quality control variables and peak parameters while leading to posterior predictions of experimental-group status.

BNRW01 8th August 2007
17:30 to 18:30
Bayesian nonparametric modelling with the Dirichlet process regression smoother

In this paper we discuss the problem of Bayesian fully nonparametric regression. A new construction of priors for nonparametric regression is discussed and a specific prior, the Dirichlet Process Regression Smoother, is proposed. We consider the problem of centring our process over a class of regression models and propose fully nonparametric regression models with flexible location structures. Computational methods are developed for all models described. Results are presented for simulated and actual data examples.

BNRW01 9th August 2007
09:00 to 09:30
Bayesian semiparametric cure rate model with an unknown threshold (Venue: GH seminar RM2)

We propose a Bayesian semiparametric model for survival data with a cure fraction. We explicitly consider a finite cure time in the model, which allows us to separate the cured and the uncured populations. We take a mixture prior of a Markov gamma process and a point mass at zero to model the baseline hazard rate function of the entire population. We focus on estimating the cure threshold after which subjects are considered cured. We can incorporate covariates through a structure similar to the proportional hazards model and allow the cure threshold also to depend on the covariates. For illustration, we undertake simulation studies and a full Bayesian analysis of a bone marrow transplant data set.

BNRW01 9th August 2007
09:00 to 10:00
Flexibly modelling conditional distributions in regression

A general methodology for nonparametric regression modelling is proposed based on a mixture-of-experts model extended along two important dimensions. First, the experts are allowed to be heteroscedastic. The standard model with homoscedastic experts is shown to give a poor fit to heteroscedastic data in finite samples, especially when the number of covariates is large. Moreover, with heteroscedastic experts we typically need a lot fewer of them, which is beneficial for interpretation and the efficiency of the inference algorithm. The second main extension is the introduction of variable selection among the covariates in the mean, variance, and in the set of covariates that control the mixture probabilities. The variable selection acts as a self-adjusting mechanism which is a very effective guard against overfitting, and makes fitting of high-dimensional nonparametric models feasible. We also point out a certain type of identification problem that arises with nonparametric experts, and we design the variable selection prior to solve this problem.

BNRW01 9th August 2007
09:30 to 10:00
K Yu Bayesian inference for quantile regression, expectile regression and M-quantile regression (Venue: GH seminar RM2)

Quantile regression, expectile regression and M-quantile regression, including time series versions of these models, have become popular, with wide applications in recent years. Based on the authors' recent work on Bayesian quantile regression, this talk will outline nonparametric Bayesian inference for quantile regression, including quantile autoregression. Moreover, this talk will introduce the idea of Bayesian expectile regression and M-quantile regression.
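For readers less familiar with the terms, the sketch below shows the loss functions that conventionally define quantiles and expectiles in the simplest, covariate-free case. It is a textbook illustration and not the Bayesian procedure of the talk; the function names and the toy data are assumptions.

```python
def check_loss(residual, tau):
    """Quantile 'check' loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(data, q, tau):
    """Sum of check losses when q is used as the tau-th quantile."""
    return sum(check_loss(y - q, tau) for y in data)


def expectile(data, tau, iterations=100):
    """tau-th expectile via iteratively reweighted means.

    Minimises the asymmetric squared loss |tau - 1[y < mu]| * (y - mu)^2.
    """
    mu = sum(data) / len(data)
    for _ in range(iterations):
        weights = [tau if y > mu else 1.0 - tau for y in data]
        mu = sum(w * y for w, y in zip(weights, data)) / sum(weights)
    return mu


data = [1.0, 2.0, 3.0, 4.0, 100.0]
median = min(data, key=lambda q: total_check_loss(data, q, 0.5))
mean_like = expectile(data, 0.5)
```

With `tau = 0.5` the check loss recovers the median and the expectile recovers the mean, which shows how the two notions differ in their sensitivity to the outlying value.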

BNRW01 9th August 2007
10:00 to 11:00
Postulating monotonicity in nonparametric Bayesian regression

Strong structural assumptions, such as constant proportionality between hazard rates in analyzing survival data, or similar proportionality between odds when considering binary responses, are often imposed on the form of the regression function describing the effects of the covariates on a response. This is typically done as a modelling convention and without real support from contextual substantive arguments, evidence coming from earlier studies, or careful diagnostics afterwards. Here we consider one particular way of relaxing such assumptions, by postulating that the dependencies between the considered response and at least some of the covariates are monotonic in an assumed direction. We then consider a way of constructing such models, based on an extension of piecewise constant functions to the multivariate case. Applying Bayesian inference and MCMC, we then illustrate the method by an epidemiological study of some risk factors for cardiovascular diseases.

BNRW01 9th August 2007
11:30 to 12:30
The matrix stick-breaking process: flexible Bayes meta analysis

In analyzing data from multiple related studies, it is often of interest to borrow information across studies and to cluster similar studies. Although parametric hierarchical models are commonly used, a concern is sensitivity to the form chosen for the random effects distribution. A Dirichlet process (DP) prior can allow the distribution to be unknown, while clustering studies. However, the DP does not allow local clustering of studies with respect to a subset of the coefficients without making independence assumptions. Motivated by this problem, we propose a matrix stick-breaking process (MSBP) as a prior for a matrix of random probability measures. Properties of the MSBP are considered, and methods are developed for posterior computation using MCMC. Using the MSBP as a prior for a matrix of study-specific regression coefficients, we demonstrate advantages over parametric modeling in simulated examples. The methods are further illustrated using applications to a multinational bioassay study and to borrowing of information in compressing signals.

BNRW01 9th August 2007
14:00 to 15:00
Gaussian processes for machine learning

The aim of this talk is to give an overview of the work that has been going on in the Machine Learning community with respect to Gaussian process prediction; this may be of particular interest to statisticians who are less familiar with the machine learning literature.

Particular topics to be covered include approximations for inference (e.g. expectation propagation), covariance functions, dealing with hyperparameters, theoretical viewpoints, and approximations for large datasets.
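As background for the overview above, here is a minimal sketch of exact Gaussian process prediction (posterior mean only) in one dimension. It is the standard textbook computation and is not taken from the talk; the squared exponential covariance, the fixed hyperparameters, and all function names are assumptions, and the small linear solver is written out only to keep the example free of external libraries.

```python
import math


def sq_exp_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance function (assumed choice)."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean k(x*, X) (K + noise_var * I)^{-1} y at each test input."""
    n = len(x_train)
    gram = [
        [sq_exp_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    coeffs = solve_linear(gram, y_train)
    return [
        sum(sq_exp_kernel(xs, x_train[i]) * coeffs[i] for i in range(n))
        for xs in x_test
    ]


x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
mean = gp_posterior_mean(x_train, y_train, [0.0, 0.5])
```

The cost of the solve grows cubically with the number of training points, which is the motivation for the large-dataset approximations listed among the topics.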

BNRW01 9th August 2007
15:30 to 16:30
Bayesian nonparametric single-index regression

The single-index model provides a flexible approach to nonlinear regression, and unlike many nonparametric regression models, this model is interpretable, easily handles high-dimensional covariates, and readily incorporates interactions among individual covariates. The model is defined by y = g(w'x) + e, where g is an unspecified univariate (ridge) function, x is a p-dimensional covariate, and the regression errors e[1],...,e[n] are assumed to be iid Normal with common variance. Previous research on the single-index model has mainly been limited to issues of point-estimation, where specifically, the focus is to estimate the function g by maximizing a penalized likelihood, with the function defined by splines having a fixed number of knots located at fixed locations of the predictor space. In this talk I will discuss a novel, Bayesian nonparametric approach to the single-index model, where the function g is modeled by linear splines with the number and locations of knots treated as unknown parameters, where random-effect parameters are added to the model to describe the effects of different clusters of observations, and where the variance of the regression error is allowed to change nonparametrically with the value of the covariate. In particular, the random-effects are modeled by a Dirichlet Process centered on a Normal distribution, and the error variances are modeled by a Dirichlet Process centered on an inverse-gamma distribution. Moreover, this new single-index model can readily handle observed dependent variables that are either continuous, binary, ordered-categories, or counts. I will illustrate Bayesian nonparametric single-index models through the analysis of real data of students from secondary schools.
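To make the model equation y = g(w'x) + e concrete, the sketch below simulates data from the basic single-index model with iid Normal errors. It does not implement the Bayesian nonparametric extension described in the talk; the ridge function `tanh`, the index vector, and the function name are assumptions for illustration.

```python
import math
import random


def simulate_single_index(n, w, g, noise_sd, rng):
    """Simulate n observations from y = g(w'x) + e with e ~ Normal(0, noise_sd^2).

    Covariates are drawn as independent standard normals (an assumption).
    """
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w]
        index = sum(wj * xj for wj, xj in zip(w, x))  # the single index w'x
        noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        xs.append(x)
        ys.append(g(index) + noise)
    return xs, ys


rng = random.Random(7)
w = [0.6, -0.8, 0.0]  # hypothetical index vector with unit norm
xs, ys = simulate_single_index(n=200, w=w, g=math.tanh, noise_sd=0.1, rng=rng)

# Noise-free version, useful for checking that y depends on x only through w'x.
xs0, ys0 = simulate_single_index(n=5, w=w, g=math.tanh, noise_sd=0.0, rng=rng)
```

However many covariates there are, the response depends on them only through the scalar w'x, which is what keeps the model interpretable in high dimensions.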

BNRW01 9th August 2007
16:30 to 17:00
A Bayes method for a monotone hazard rate via ${S}$-paths

A class of random hazard rates, which is defined as a mixture of an indicator kernel convolved with a completely random measure, is of interest. We provide an explicit characterization of the posterior distribution of this mixture hazard rate model via a finite mixture of $\mathbf{S}$-paths. A closed and tractable Bayes estimator for the hazard rate is derived to be a finite sum over $\mathbf{S}$-paths. The path characterization or the estimator is proved to be a Rao-Blackwellization of an existing partition characterization or partition-sum estimator. This accentuates the importance of $\mathbf{S}$-paths in Bayesian modeling of monotone hazard rates. An efficient Markov chain Monte Carlo method is proposed to approximate this class of estimates. It is shown that the $\mathbf{S}$-path characterization also exists in modeling with covariates by a proportional hazard model, and the proposed algorithm again applies.

BNRW01 9th August 2007
17:00 to 17:30
Alternative posterior consistency results in nonparametric binary regression using Gaussian process priors

We establish consistency of the posterior distribution when a Gaussian process prior is used as a prior distribution for the unknown binary regression function. Specifically, we take the work of Ghosal and Roy (2007) as our starting point, and then weaken their assumptions on the smoothness of the Gaussian process kernel while retaining a stronger yet applicable condition about design points. Furthermore, we extend their results to multi-dimensional covariates under a weaker smoothness condition on the Gaussian process. Finally, we study the extent to which posterior consistency can be achieved under a general model structure, when additional hyperparameters in the covariance function of a Gaussian process are involved.

BNRW01 10th August 2007
09:00 to 09:30
Asymptotics for posterior hazards (Venue: GH seminar RM2)

A popular Bayesian nonparametric approach to survival analysis consists in modeling hazard rates as kernel mixtures driven by a completely random measure. A comprehensive analysis of the asymptotic behaviour of such models is provided. Consistency of the posterior distribution is investigated and central limit theorems for both linear and quadratic functionals of the posterior hazard rate are derived. The general results are then specialized to various specific kernels and mixing measures, thus yielding consistency under minimal conditions and neat central limit theorems for the distribution of functionals.

BNRW01 10th August 2007
09:00 to 10:00
FA Quintana Bayesian clustering with regression

We consider clustering with regression, i.e., we develop a probability model for random clusters that is indexed by covariates. The two motivating applications are inference for a clinical trial and for survival of patients with breast cancer. As part of the desired inference we wish to define clusters of patients. A prior probability model for cluster memberships should include a regression on patient baseline covariates. We build on product partition models (PPM) and define an extension of the PPM that includes the desired regression. This is achieved by including in the cohesion a new factor that increases the probability that experimental units with similar covariates are placed in the same cluster. We discuss implementations suitable for continuous, categorical, count and ordinal covariates.
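As a rough sketch of the construction described above, and not necessarily the exact form used in the talk, a product partition model and its covariate-dependent extension could be written as follows. The symbols are assumptions introduced here for illustration: $\rho_n = \{S_1, \dots, S_k\}$ is a partition of the $n$ experimental units, $c(\cdot)$ is the cohesion, $x^*_j$ collects the covariates of the units in cluster $S_j$, and $g(\cdot)$ is a hypothetical similarity function that is large when the covariates in a cluster are close to one another.

```latex
% Standard PPM prior on partitions
p(\rho_n) \;\propto\; \prod_{j=1}^{k} c(S_j)

% Sketch of the extension: each cluster contributes an extra
% covariate-similarity factor g(x^*_j) (notation assumed, not from the abstract)
p(\rho_n \mid x_1, \dots, x_n) \;\propto\; \prod_{j=1}^{k} c(S_j)\, g(x^*_j)
```

Under this reading, partitions that group units with similar covariates receive more prior mass because their similarity factors are larger.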

BNRW01 10th August 2007
09:30 to 10:00
Bayesian nonparametric methods for prediction in EST analysis (Venue: GH seminar RM2)

Expressed sequence tag (EST) analyses are an important tool for gene identification in organisms. Given a preliminary EST survey from a certain cDNA library, various features of a possible additional sample have to be predicted. For instance, interest may lie in estimating the number of new genes to be detected and the gene discovery rate at each additional read. We propose a Bayesian nonparametric approach for prediction in EST analysis based on nonparametric priors inducing Gibbs-type exchangeable random partitions and derive estimators for the relevant quantities. Several EST datasets are analysed by resorting to the two-parameter Poisson-Dirichlet process, which represents the most remarkable Gibbs-type prior. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples.
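To make the prediction problem concrete, the following minimal sketch simulates additional reads under the two-parameter Poisson-Dirichlet predictive rule: given $n$ reads spread over $k$ distinct genes, the next read is a new gene with probability $(\theta + \sigma k)/(\theta + n)$, and otherwise belongs to observed gene $j$ with probability proportional to $n_j - \sigma$. This is a Monte Carlo illustration, not the estimators derived in the talk. The function names, the example counts and the parameter values are all hypothetical, and $\sigma$ and $\theta$ are treated as known rather than estimated.

```python
import random


def discovery_probability(counts, sigma, theta):
    """Probability that the next read reveals a previously unseen gene."""
    n, k = sum(counts), len(counts)
    return (theta + sigma * k) / (theta + n)


def simulate_new_genes(counts, m, sigma, theta, rng):
    """Simulate m additional reads and return how many new genes appear."""
    sizes = list(counts)          # reads per observed gene
    n = sum(sizes)
    k_old = len(sizes)
    for _ in range(m):
        k = len(sizes)
        if rng.random() < (theta + sigma * k) / (theta + n):
            sizes.append(1)       # a new gene is discovered
        else:
            # existing gene j is chosen with weight n_j - sigma
            weights = [s - sigma for s in sizes]
            j = rng.choices(range(k), weights=weights)[0]
            sizes[j] += 1
        n += 1
    return len(sizes) - k_old


def expected_new_genes(counts, m, sigma, theta, n_sims=2000, seed=0):
    """Monte Carlo average of the number of new genes in m further reads."""
    rng = random.Random(seed)
    total = sum(simulate_new_genes(counts, m, sigma, theta, rng)
                for _ in range(n_sims))
    return total / n_sims


if __name__ == "__main__":
    observed = [5, 3, 1]          # hypothetical read counts for three genes
    print(discovery_probability(observed, sigma=0.5, theta=1.0))
    print(expected_new_genes(observed, m=20, sigma=0.5, theta=1.0))
```

Simulation is used here only because it follows the predictive rule line by line; in practice one would prefer closed-form estimators of the kind the abstract refers to.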

BNRW01 10th August 2007
10:00 to 11:00
S Basu Double Dirichlet process mixtures

In this work we consider a new class of Dirichlet process mixtures, which we call the double and multiple DPM class, that generates a clustering structure in the data different from those generated by simple DPM or other DPM models. Double and related DPM models can be fitted by MCMC methods through multiple applications of the standard Polya urn and blocked Gibbs samplers within each sweep of the sampling. Based on experimental investigations we show that the proposed model performs reasonably well both when the model is correctly specified and when it is misspecified. We also investigate the similarity between the clustering produced by the model fit and the true clustering. Finally, we consider model comparison and model diagnostics, and illustrate the implementation, performance and applicability of the proposed class of DPM models in regressions for survival data and clustered longitudinal data.

BNRW01 10th August 2007
11:30 to 12:30
Normalised kernel-weighted random measures

This talk discusses a wide class of probability measure-valued processes to be used as nonparametric priors for problems with time-varying, spatially-varying or covariate-dependent distributions. They are constructed by normalizing correlated random measures, which are stationary and have a known marginal process. Dependence is modelled using kernels (a method that has become popular in spatial modelling). The ideas extend Griffin (2007), which used an exponential kernel in time series problems, to arbitrary kernel functions. Computational issues will be discussed and the ideas will be illustrated by examples in financial time series.

Griffin, J. E. (2007): "The Ornstein-Uhlenbeck Dirichlet Process and other measure valued processes for Bayesian inference," Technical Report, University of Warwick.

BNRW01 10th August 2007
14:00 to 15:00
Convexification and multimodality of random probability measures

In this work we develop and describe a new class of nonparametric prior distributions on the subspace of the random multivariate distributions. Our methodology is based on a variant of Khinchin's representation theorem for unimodal distributions extended to multimodal multivariate cases. Results using our approach in a bivariate setting with a random draw from a Dirichlet process are presented.
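For orientation, the classical univariate Khinchin theorem states that a random variable $X$ is unimodal about 0 exactly when $X = UZ$ in distribution, with $U$ uniform on $(0,1)$ and independent of $Z$. The sketch below illustrates only that classical one-dimensional case, not the multimodal multivariate variant developed in the talk. It takes the distribution of $Z$ to be a draw from a Dirichlet process, approximated by truncated stick-breaking. The function names, the truncation level and the Gaussian base measure are assumptions made for illustration.

```python
import random


def stick_breaking_dp(alpha, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw from DP(alpha, G0): returns (weights, atoms)."""
    weights, atoms = [], []
    remaining = 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - v
    # Give the leftover stick to the last atom so the weights sum to one.
    weights[-1] += remaining
    return weights, atoms


def sample_unimodal(weights, atoms, size, rng):
    """Sample X = U * Z with U ~ Uniform(0, 1) and Z drawn from the discrete measure."""
    zs = rng.choices(atoms, weights=weights, k=size)
    return [rng.random() * z for z in zs]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical choices: alpha = 2, base measure N(0, 3^2), 50 atoms.
    w, a = stick_breaking_dp(2.0, lambda r: r.gauss(0.0, 3.0), 50, rng)
    xs = sample_unimodal(w, a, 1000, rng)
    print(min(xs), max(xs))
```

Each atom $z_j$ contributes a uniform component on the interval between 0 and $z_j$, so the resulting random density is a mixture of uniforms with mode at 0.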