# Seminars (SCB)

Videos and presentation materials from other INI events are also available.


Event When Speaker Title Presentation Material
SCBW01 30th October 2006
10:05 to 11:00
Now you know your ABCs: examples and problems
SCBW01 30th October 2006
11:30 to 12:30
T Pettitt From doubly intractable distributions via auxiliary variables to likelihood free inference
SCBW01 30th October 2006
14:00 to 15:15
D Balding Some developments of ABC

Approximate Bayesian computation (ABC), in which parameter sets simulated under the prior are accepted or rejected according to whether a simulated dataset resembles the observed data, has become a widely used tool in population genomic studies since Pritchard et al (1999), and its use is growing in other areas. Developments of the basic idea include regression adjustment of the accepted values to mitigate the effects of discrepancies between simulated and observed datasets (Beaumont et al 2002), and embedding the approximation within a Metropolis-Hastings algorithm to create "likelihood-free" MCMC. We review these and more recent developments, for example those based on sequential Monte Carlo and various adaptive simulation schemes.
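The accept/reject scheme described above is simple enough to sketch in a few lines. The following toy Python example (not the speaker's code; the normal model, prior, tolerance and sample sizes are invented for illustration) infers a normal mean by comparing sample means:

```python
import random

def abc_rejection(observed_mean, n_obs, prior_sample, simulate, tol, n_draws):
    """Basic ABC rejection: keep prior draws whose simulated summary
    statistic falls within `tol` of the observed summary."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        sim = simulate(theta, n_obs)
        if abs(sum(sim) / n_obs - observed_mean) < tol:
            accepted.append(theta)
    return accepted

random.seed(1)
# Toy example: infer a normal mean; the summary statistic is the sample mean.
prior = lambda: random.gauss(0.0, 5.0)
model = lambda theta, n: [random.gauss(theta, 1.0) for _ in range(n)]
post = abc_rejection(2.0, 50, prior, model, tol=0.2, n_draws=20000)
print(len(post), sum(post) / len(post))
```

The accepted `post` values approximate draws from the posterior; tightening `tol` improves the approximation at the cost of fewer acceptances, which is exactly the trade-off the regression adjustment of Beaumont et al is designed to ease.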

SCBW01 30th October 2006
15:45 to 17:00
A Frigessi Estimating functions in indirect inference

There are models for which the evaluation of the likelihood is infeasible in practice. For these models the Metropolis-Hastings acceptance probability cannot be easily computed. This is the case, for instance, when only departure times from a G/G/1 queue are observed and inference on the arrival and service distributions is required. Indirect inference is a method to estimate a parameter theta in models whose likelihood function does not have an analytical closed form, but from which random samples can be drawn for fixed values of theta. First an auxiliary model is chosen whose parameter beta can be directly estimated. Next, the parameters in the auxiliary model are estimated for the original data, leading to an estimate betahat. The parameter beta is also estimated by using several sampled data sets, simulated from the original model for different values of the original parameter theta. Finally, the parameter theta which leads to the best match to betahat is chosen as the indirect inference estimate. We analyse which properties an auxiliary model should have to give satisfactory indirect inference. We look at the situation where the data are summarized in a vector statistic T, and the auxiliary model is chosen so that inference on beta is drawn from T only. Under appropriate assumptions the asymptotic covariance matrix of the indirect estimators is proportional to the asymptotic covariance matrix of T and componentwise inversely proportional to the square of the derivative, with respect to theta, of the expected value of T. We discuss how these results can be used in selecting good estimating functions. We apply our findings to the queuing problem.
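The match-the-auxiliary-estimate recipe above can be sketched on a toy problem. In this invented example (not from the talk) the auxiliary statistic T is simply the sample mean, and theta is chosen by grid search to match it:

```python
import random

def indirect_inference(data, simulate, theta_grid, n_sim=20):
    """Minimal indirect-inference sketch: the auxiliary statistic T is the
    sample mean; pick the theta whose simulated T best matches T(data)."""
    T = lambda xs: sum(xs) / len(xs)
    t_obs = T(data)                      # the 'betahat' of the auxiliary model
    best_theta, best_gap = None, float("inf")
    for theta in theta_grid:
        # average the auxiliary estimate over several simulated data sets
        t_sim = sum(T(simulate(theta, len(data))) for _ in range(n_sim)) / n_sim
        gap = abs(t_sim - t_obs)
        if gap < best_gap:
            best_theta, best_gap = theta, gap
    return best_theta

random.seed(2)
# Toy model: exponential with rate theta; true theta = 2.0.
simulate = lambda theta, n: [random.expovariate(theta) for _ in range(n)]
data = simulate(2.0, 500)
theta_hat = indirect_inference(data, simulate, [i / 10 for i in range(5, 41)])
print(theta_hat)
```

In a real application the grid search would be replaced by numerical optimisation, and the auxiliary model would be chosen with the covariance considerations discussed in the abstract in mind.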

SCBW01 31st October 2006
09:00 to 10:00
Population-based MC for sampling trans-dimensional Bayesian regression models
SCBW01 31st October 2006
10:00 to 11:00
Sequentially interacting Markov Chain Monte Carlo

We introduce a methodology to sample from a sequence of probability distributions and estimate their unknown normalizing constants. This problem is traditionally addressed using Sequential Monte Carlo (SMC) methods which rely on importance sampling/resampling ideas. We design here an alternative iterative algorithm. This algorithm is based on a sequence of interacting Markov chain Monte Carlo (MCMC) algorithms. We establish the convergence of this non-Markovian scheme and demonstrate this methodology on various examples arising in Bayesian inference.

(This is a joint work with Anthony E. Brockwell, Department of Statistics, Carnegie Mellon)

SCBW01 31st October 2006
11:30 to 12:30
O Cappe Adaptive population Monte Carlo

In (Cappé et al., 2004) we proposed a simulation scheme termed Population Monte Carlo which may be viewed as an Iterated Importance Sampling approach, with resampling and Markovian instrumental simulations. This scheme is also a particular case of the Sequential Monte Carlo Sampling approach of (Del Moral et al., 2006). I will discuss the Population Monte Carlo approach, focussing on the case where the target distribution is held fixed and the importance kernels are adapted during the iterations in order to optimize a performance criterion. In the case where the importance kernel is composed of a mixture of fixed kernels, the mixture weights can be adapted using update rules which are remarkably stable and have interesting connections with the Expectation-Maximization algorithm.

This talk is based on work done with (or by) several of my colleagues in Paris - Arnaud Guillin, Jean-Michel Marin, Christian P. Robert (Cérémade) and Randal Douc (École Polytechnique) - as well as on ideas related to the ECOSSTAT project.

SCBW01 31st October 2006
14:00 to 15:15
Deterministic alternatives to MCMC

MCMC provides a powerful set of tools for inference in Bayesian models. However, for many applications to large scale problems, MCMC methods can be relatively inefficient compared to new deterministic approximations developed in the machine learning community. I will describe several modern and generally applicable deterministic algorithms for approximate inference, and mention their advantages and disadvantages compared to MCMC. To focus the talk, I will describe all algorithms in the context of inference in Dirichlet process mixtures (DPMs), a classical non-parametric Bayesian statistical model used to define mixture models with countably infinitely many components. In particular, I will cover the following algorithms for inference in DPMs: (1) the traditional Gibbs sampling MCMC algorithm, (2) Variational Bayesian (VB) approximations, (3) the Expectation Propagation (EP) algorithm, and (4) a new approximate inference method for DPMs based on Bayesian hierarchical clustering (BHC). All these algorithms provide different speed / accuracy tradeoffs for large-scale problems, and the underlying concepts can be applied to virtually any statistical model. My conclusion is the following: MCMC is but one of many ways of approaching intractable inference problems, and modern statistics is likely to benefit by broadening the toolbox to include novel inference methods arising from other communities.

Joint work with: Matthew Beal (U Buffalo), Katherine Heller (UCL), and Tom Minka (Microsoft).

SCBW01 1st November 2006
09:00 to 10:00
Applications of extended ensemble Monte Carlo

"Extended Ensemble Monte Carlo" is proposed as a generic term covering methods such as Parallel Tempering and Multicanonical Monte Carlo. These methods sample a composition or extension of given distributions, and provide a simple and practical way to attack slow mixing and ergodicity problems. The purpose of the talk is not to present a novel algorithm, but to explore variations and applications of the algorithms, including their use in statistical physics, combinatorics, and rare event sampling. It is stressed that overcoming slow mixing is a key to extending the fields of application of MCMC.
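As a minimal illustration of one such method, here is a parallel tempering sketch on an invented bimodal target (the temperatures, step size and target below are arbitrary choices, not from the talk): chains at higher temperatures cross the barrier easily, and state swaps carry that mobility down to the cold chain.

```python
import math, random

def parallel_tempering(log_target, temps, n_iter=5000, step=1.0):
    """Parallel tempering sketch: one random-walk Metropolis chain per
    temperature, with a state-swap proposal between a random adjacent
    pair at each iteration; the coldest chain samples log_target."""
    k = len(temps)
    states = [0.0] * k
    cold_samples = []
    for _ in range(n_iter):
        # within-chain Metropolis updates at each temperature
        for j in range(k):
            x = states[j]
            y = x + random.gauss(0, step)
            if math.log(random.random()) < (log_target(y) - log_target(x)) / temps[j]:
                states[j] = y
        # swap proposal between a random adjacent pair of temperatures
        j = random.randrange(k - 1)
        a = (1 / temps[j] - 1 / temps[j + 1]) * (
            log_target(states[j + 1]) - log_target(states[j]))
        if math.log(random.random()) < a:
            states[j], states[j + 1] = states[j + 1], states[j]
        cold_samples.append(states[0])
    return cold_samples

def log_t(x):
    # Bimodal toy target: mixture of N(-4,1) and N(4,1), via log-sum-exp.
    a, b = -0.5 * (x + 4) ** 2, -0.5 * (x - 4) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

random.seed(7)
s = parallel_tempering(log_t, temps=[1.0, 2.0, 4.0, 8.0])
frac_right = sum(1 for x in s if x > 0) / len(s)
print(frac_right)
```

A single Metropolis chain at temperature 1 would typically stay stuck in one mode over this many iterations; with tempering, `frac_right` should come out near 0.5.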

SCBW01 1st November 2006
10:00 to 11:00
Sequential Monte Carlo for Generalized Linear Mixed Models

Sequential Monte Carlo methods for static problems combine the advantages of importance sampling and Markov chain based methods. We demonstrate how to use these exciting new techniques to fit generalised linear mixed models. A normal approximation to the likelihood is used to generate an initial sample, then transition kernels, reweighting and resampling result in evolution to a sample from the full posterior distribution. Since the technique does not rely on any ergodicity properties of the transition kernels, we can modify these kernels adaptively, resulting in a more efficient sampler.
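The general recipe, an initial sample from an approximation followed by reweighting, resampling and kernel moves, can be sketched as a tempered SMC sampler. This toy version (a one-dimensional invented target, not the GLMM setting of the talk; the starting normal approximation is assumed) moves a particle cloud from a broad normal to the target:

```python
import math, random

def smc_static(log_target, n=1000, n_steps=20, step_sd=0.5):
    """SMC sketch for a static target: start from a broad normal
    approximation, temper toward the target, and at each stage reweight,
    resample, and apply one random-walk Metropolis move."""
    mu0, sd0 = 0.0, 3.0                           # assumed initial approximation
    log_q = lambda x: -0.5 * ((x - mu0) / sd0) ** 2
    particles = [random.gauss(mu0, sd0) for _ in range(n)]
    d_beta = 1.0 / n_steps
    for s in range(1, n_steps + 1):
        beta = s * d_beta
        # incremental importance weights pi_s / pi_{s-1} = (target/q)^d_beta
        logw = [d_beta * (log_target(x) - log_q(x)) for x in particles]
        m = max(logw)
        w = [math.exp(l - m) for l in logw]
        particles = random.choices(particles, weights=w, k=n)  # resample
        # one Metropolis move targeting the current tempered density
        log_pi = lambda x, b=beta: (1 - b) * log_q(x) + b * log_target(x)
        moved = []
        for x in particles:
            y = x + random.gauss(0, step_sd)
            if math.log(random.random()) < log_pi(y) - log_pi(x):
                x = y
            moved.append(x)
        particles = moved
    return particles

random.seed(6)
# Toy target: N(2, 0.5^2), reached from the broad N(0, 3^2) start.
log_t = lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2
out = smc_static(log_t)
print(sum(out) / len(out))
```

Because correctness does not rest on ergodicity of the move kernel, its step size could be tuned adaptively from the current particle cloud, which is the efficiency gain the abstract points to.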

SCBW01 1st November 2006
11:30 to 12:30
T Johnson A sequential importance sampler for reconstructing genetic pedigrees
SCBW01 1st November 2006
14:00 to 15:15
Retrospective sampling

Fuelled by the proliferation of large and complex data sets, statistical modelling has become increasingly ambitious. In fact, the use of models parameterised by an infinite number of parameters or latent variables is increasingly common. This is particularly appealing when models are most naturally formulated within an infinite dimensional setting, for instance continuous time-series.

It is well understood that obtaining widely applicable statistical methodology for flexible families of complex models requires powerful computational techniques (such as MCMC, SMC, etc). Unfortunately, infinite-dimensional simulation is not usually feasible, so the use of these computational methods for infinite-dimensional models is often only possible by adopting some kind of finite-dimensional approximation to the chosen model. This is unappealing since the impact of the approximation is often difficult to assess, and the procedure often amounts to using an approximate finite-dimensional model.

The talk will discuss a general technique for simulation which attempts to work directly with infinite-dimensional random variables, without truncation or approximation. The talk is illustrated by concrete examples from the simulation of diffusion processes and Bayesian inference for Dirichlet mixture models. One surprising feature of the methodology is that exact infinite-dimensional algorithms are commonly far more efficient than approximate ones.

SCBW01 1st November 2006
15:45 to 17:00
C Andrieu The expected auxiliary variable method for Monte Carlo simulation

The expected auxiliary variable method is a general framework for Monte Carlo simulation in situations where quantities of interest are intractable and prevent the implementation of classical methods. The method finds application where marginal computations are of interest, where transdimensional move design is difficult in model selection setups, or when the normalising constant of a particular distribution is unknown but required for exact computations. I will present several examples of applications of this principle, as well as some theoretical results that we have recently obtained in some specific scenarios.

SCBW01 2nd November 2006
09:00 to 10:00
Extensions of the CE method for statistical analysis

The Cross-Entropy (CE) method is a new Monte Carlo paradigm pioneered by Rubinstein (1999) in Operations Research. Its primary applications are (i) the calculation of probabilities of rare events, and (ii) the optimisation of irregular, multi-modal functions. While these two objectives seem to have little in common, the CE approach manages to express them in a similar framework. In this talk, we will explain how Statistics can benefit from the CE method, and how the CE method can in turn benefit from statistical methodology. We will discuss the following particular applications in Statistics: Monte Carlo p-values, simulation of truncated distributions, variable selection, and mixture estimation. We will see that in each case CE provides significant improvements over current methods. Interestingly, we will also see that vanilla CE rarely works directly, but standard tools from statistical inference allow for developing more efficient algorithms. In particular, we will discuss a CE-EM algorithm for mixture estimation, which outperforms any straight CE or EM algorithm in terms of finding higher modes of the likelihood function.
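A minimal sketch of CE in its optimisation guise, on an invented one-dimensional multi-modal objective (the objective, sample size, elite fraction and iteration count are all arbitrary illustration choices, not from the talk): sample candidates from a parametric density, refit the density to the best-scoring fraction, and repeat until it concentrates on a high mode.

```python
import math, random

def ce_maximise(f, mu=0.0, sigma=3.0, n=200, elite_frac=0.1, iters=40):
    """Cross-Entropy optimisation sketch: sample candidates from a normal,
    refit the normal to the highest-scoring ('elite') fraction, repeat."""
    n_elite = max(2, int(n * elite_frac))
    for _ in range(iters):
        xs = [random.gauss(mu, sigma) for _ in range(n)]
        elite = sorted(xs, key=f, reverse=True)[:n_elite]
        mu = sum(elite) / n_elite
        sigma = math.sqrt(sum((x - mu) ** 2 for x in elite) / n_elite) + 1e-6
    return mu

random.seed(5)
# Irregular multi-modal toy objective; its global maximum is near x ~ 2.5.
f = lambda x: -(x - 2.0) ** 2 + 1.5 * math.cos(5.0 * x)
best = ce_maximise(f)
print(best)
```

The broad initial `sigma` lets early iterations see several modes; the elite refit then concentrates on the global one, which is the behaviour that makes CE attractive for irregular likelihood surfaces.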

SCBW01 2nd November 2006
10:00 to 11:00
Nested sampling

Nested sampling is a new Monte Carlo algorithm invented by John Skilling. Whereas most Monte Carlo methods aim to generate samples from a posterior or to estimate posterior expectations, nested sampling's central aim is to evaluate the evidence (the normalizing constant, also known as the marginal likelihood or partition function). This important quantity can be computed by standard Monte Carlo methods (such as Gibbs sampling) only by adding extra computations (such as reversible-jump Monte Carlo or thermodynamic integration) which require careful tuning.

I will review nested sampling and describe tests of the method on graphical models.

(Joint work with Iain Murray and John Skilling)
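The central loop of nested sampling is short enough to sketch. This toy version (uniform prior on [0,1], invented Gaussian likelihood) uses plain rejection sampling for the constrained draw, where serious implementations use constrained MCMC, and omits the final live-point correction:

```python
import math, random

def nested_sampling(log_like, prior_sample, n_live=100, n_iter=700):
    """Nested-sampling sketch (Skilling): repeatedly replace the worst of
    n_live prior draws with a new draw constrained to higher likelihood,
    accumulating the evidence Z = sum_i L_i * (X_{i-1} - X_i), where
    X_i ~ exp(-i/n_live) is the expected remaining prior mass."""
    live = [prior_sample() for _ in range(n_live)]
    log_l = [log_like(x) for x in live]
    z, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda j: log_l[j])
        l_min = log_l[worst]
        x_now = math.exp(-i / n_live)
        z += math.exp(l_min) * (x_prev - x_now)
        x_prev = x_now
        # draw a replacement with L > L_min (plain rejection here;
        # real implementations use constrained MCMC instead)
        while True:
            cand = prior_sample()
            if log_like(cand) > l_min:
                live[worst], log_l[worst] = cand, log_like(cand)
                break
    return z  # contribution of the final live points omitted for brevity

random.seed(3)
# Toy: uniform prior on [0,1], Gaussian likelihood of width 0.1 at 0.5;
# the analytic evidence is about 0.1 * sqrt(2*pi) ~ 0.25.
log_l = lambda x: -0.5 * ((x - 0.5) / 0.1) ** 2
z = nested_sampling(log_l, random.random)
print(z)
```

The key point the abstract makes is visible here: the evidence comes out directly from the run, with no thermodynamic integration or reversible-jump machinery bolted on.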

SCBW01 2nd November 2006
11:30 to 12:30
Sequential Monte Carlo methods: can we replace the resampling step?

Sequential Monte Carlo methods provide reliable approximations of the conditional distribution of a certain signal process given the data obtained from an associated observation process. The generic SMC method involves sampling from the prior distribution of the signal and then using a weighted bootstrap technique (or equivalent) with weights defined by the likelihood of the most recent observation data. If the number of updating stages becomes large, the repeated application of the weighted bootstrap may lead to what the literature describes as an "impoverished sample" or "sample attrition": the sample being carried forward will have fewer and fewer distinct values. In this talk, I propose a method that attempts to solve this problem for the continuous time filtering problem. The method replaces sampling from the prior with sampling from a distribution that depends on the entire (existing) sample and the most recent observation data, and it does not contain the weighted bootstrap step. The method is motivated by recent advances in the area of McKean-Vlasov representations for solutions of stochastic PDEs and their application to solving the filtering problem in a continuous time framework.

SCBW01 2nd November 2006
14:00 to 15:15
Importance sampling for diffusion processes

In this talk I will discuss various techniques for constructing importance sampling estimators which approximate expectations (e.g the likelihood function, filtering densities, etc) when modelling with diffusion processes.

SCBW01 2nd November 2006
15:45 to 17:00
D Wilkinson Bayesian inference for nonlinear multivariate diffusion processes

In this talk I will give an overview of the problem of conducting Bayesian inference for the fixed parameters of nonlinear multivariate diffusion processes observed partially, discretely, and possibly with error. I will present a sequential strategy based on either SIR or MCMC-based filtering for approximate diffusion bridges, and a "global" MCMC algorithm that does not degenerate as the degree of data augmentation increases. The relationship of these techniques to methods of approximate Bayesian computation will be highlighted.

SCBW01 3rd November 2006
09:00 to 10:00
D Frenkel Configurationally-Biased MC and Virtual-move parallel tempering

Parallel tempering is a Monte Carlo scheme that was introduced by Geyer in 1991 as a tool to boost the efficiency of Monte Carlo sampling in Mendelian genetics. The technique has found widespread application in statistical mechanics. I will describe how, in some cases, the efficiency of the method may be increased by including information about MC trial moves that are excluded from the Markov chain. If time allows, I will discuss Biased Monte Carlo schemes that allow us to sample the conformations of composite objects, such as polymers.

SCBW01 3rd November 2006
10:00 to 11:00
Branching process Monte Carlo

This talk is an exploration of the possible role for branching processes and related models in Monte Carlo simulation from a complex distribution, such as a Bayesian posterior. The motivation is that branching processes can support antithetic behaviour in a natural way by making offspring negatively correlated, and also that branching paths may assist in navigating past slowly-mixing parts of the state space. The basic theory of branching processes as used for sampling is established, including the appropriate analogue of global balance with respect to the target distribution, evaluation of moments, in particular asymptotic variances, and a start on the spectral theory. Although our model is a kind of 'population Monte Carlo', it should be noted that it has virtually nothing to do with particle filters, etc. Our target is not sequentially evolving, and we rely on ergodicity for convergence of functionals of the target distribution, rather than using importance sampling.

This is joint work with Antonietta Mira (University of Insubria, Varese, Italy).

SCBW01 3rd November 2006
14:00 to 15:15
Sampling the energy landscape: thermodynamics and rates

Stationary points of the potential energy surface provide a natural way to coarse-grain calculations of thermodynamics and kinetics, as well as a framework for basin-hopping global optimisation. Thermodynamic properties can be obtained from samples of local minima using the basin-sampling approach, and kinetic information can be extracted if the samples are extended to include transition states. Using statistical rate theory a minimum-to-minimum rate constant can be associated with each transition state, and phenomenological rates between sets of local minima that define thermodynamic states of interest can be calculated using a new graph transformation approach. Since the number of stationary points grows exponentially with system size a sampling scheme is required to produce representative pathways. The discrete path sampling approach provides a systematic way to achieve this objective once a single connected path between products and reactants has been located. In large systems such paths may involve dozens of stationary points of the potential energy surface. New algorithms have been developed for both geometry optimisation and making connections between distant local minima, which have enabled rates to be calculated for a wide variety of systems.

SCBW01 3rd November 2006
15:45 to 17:00
Randomized quasi-Monte Carlo for Markov Chains

Quasi-Monte Carlo (QMC) methods are numerical techniques for estimating large-dimensional integrals, usually over the unit hypercube. They can be applied, at least in principle, to any stochastic simulation whose aim is to estimate a mathematical expectation. This covers a wide range of applications. Practical error bounds are hard to obtain with QMC but randomized quasi-Monte Carlo (RQMC) permits one to compute an unbiased estimator of the integral, together with a confidence interval. RQMC can in fact be viewed as a variance-reduction technique.

In this talk, we review some key ideas of RQMC methods and provide concrete examples of their application to simulate systems modeled as Markov chains. We also present a new RQMC method, called array-RQMC, recently introduced to simulate Markov chains over a large number of steps. Our numerical illustrations indicate that RQMC can dramatically reduce the variance compared with standard Monte Carlo.
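A minimal RQMC sketch: a randomly shifted rank-1 lattice rule, where independent random shifts give an unbiased estimate together with a confidence interval, as described above. The generating vector, point count and integrand below are illustrative, untuned choices, not the array-RQMC method of the talk:

```python
import math, random

def rqmc_estimate(f, n=1024, dim=2, n_shifts=20):
    """Randomly shifted rank-1 lattice rule: each random shift gives an
    unbiased estimate of the integral of f over [0,1)^dim; independent
    shifts yield a variance estimate and hence a confidence interval."""
    gen = [1, 409]          # illustrative generating vector for n = 1024
    estimates = []
    for _ in range(n_shifts):
        shift = [random.random() for _ in range(dim)]
        total = 0.0
        for i in range(n):
            point = [((i * gen[d]) / n + shift[d]) % 1.0 for d in range(dim)]
            total += f(point)
        estimates.append(total / n)
    mean = sum(estimates) / n_shifts
    var = sum((e - mean) ** 2 for e in estimates) / (n_shifts - 1)
    half_width = 1.96 * math.sqrt(var / n_shifts)
    return mean, half_width

random.seed(4)
# Toy integrand with known integral: x*y over the unit square integrates to 1/4.
mean, hw = rqmc_estimate(lambda p: p[0] * p[1])
print(mean, hw)
```

For a smooth integrand like this, the lattice points give a far smaller `hw` than plain Monte Carlo with the same budget would, which is the variance-reduction view of RQMC taken in the abstract.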

SCBW03 20th November 2006
10:05 to 11:00
A Frigessi Investigating the spread of infectious salmon anemia in Atlantic salmon farming: a stochastic space-time model

Infectious salmon anemia is an infectious disease of farmed salmon. The first outbreak was in Norway in 1984. Control strategies have not yet succeeded in controlling the spread in Norway and North America. The purpose of this research was to investigate the relative importance of the main risk factors associated with different routes of transmission. We study proximity to an infectious farm, measured by distance and contact network, and the amount of biomass at farm sites. We allow for a further unidentified transmission route, possibly representing boat traffic or infected smolt. We suggest a stochastic space-time model for the disease along the farm sites of the Norwegian coast. We analyse data between 2000 and 2005, containing 73 cases and about 1100 farm sites. We shall present the model and preliminary results.

This is joint work with Ida Scheel (University of Oslo), Magne Aldrin (NR Norwegian Computing Centre) and Peder A. Jansen (The Norwegian Veterinary Institute).

SCBW03 20th November 2006
11:30 to 12:30
RB O'Hara Estimation of births deaths and immigration from mark-recapture data

The analysis of mark-recapture data is undergoing a period of development and expansion. Here we contribute to that by presenting a model which includes both births and immigration, as well as the usual deaths. Data come from a long-term study of the willow tit (Parus montanus), where we can assume that all births are recorded, and hence immigrants can also be identified. We model the rates of immigration, birth rate per parent, and death rates of juveniles and adults. Using a hierarchical model allows us to incorporate annual variation in these parameters. The model is fitted to the data using MCMC, as a Bayesian analysis. In addition to the model fitting, we also check several aspects of the model fit, in particular whether survival varies with age or immigrant status, and whether capture probability is affected by previous capture history. The latter check is important, as independence of capture histories is a key assumption that simplifies the model considerably. Here we find that the capture probability depends strongly on whether the individual was captured in the previous year. Our work moves MRR modelling closer to a description of the dynamics of the whole population, with the obvious potential for prediction, and use in making decisions about population management.

SCBW03 20th November 2006
14:00 to 15:15
Recent advances in statistical ecology using computationally intensive methods

Computationally intensive methods are becoming increasing popular within statistical ecology for analysing complex stochastic systems. Particular attention will focus on capture-recapture (and/or tag-recovery) data. We will concentrate on the use of Bayesian methods within this area and the (reversible jump) Markov chain Monte Carlo algorithm, for exploring the posterior distribution of interest. A number of issues will be discussed, including model discrimination and model-averaging, incorporating individual heterogeneity and dealing with missing data. Real data sets will be considered, illustrating the application and implementation of these methods and demonstrating the increased understanding of the systems obtained through the analysis. Areas of continuing and future research will also be discussed.

SCBW03 20th November 2006
15:45 to 17:00
ST Buckland Embedding population dynamics models in inference

Increasing pressures on the environment are generating an ever-increasing need to manage animal and plant populations sustainably, and to protect and rebuild endangered populations. Effective management requires reliable mathematical models, so that the consequences of management action can be predicted, and the uncertainty in these predictions quantified. These models must be able to predict the response of populations to anthropogenic change, while handling the major sources of uncertainty. We describe a simple building block approach to formulating discrete-time models. These models may include demographic stochasticity, environmental variability through covariates or random effects, multi-species dynamics such as in predator-prey and competition models, movement such as in metapopulation models, non-linear effects such as density dependence, and mating models. We discuss methods for fitting such models to time series of data, and quantifying uncertainty in parameter estimates and population states, including model uncertainty, using computer-intensive Bayesian methods.

SCBW03 21st November 2006
09:00 to 10:00
Use of Monte Carlo particle filters to fit and compare models for the dynamics of wild animal populations
SCBW03 21st November 2006
10:00 to 11:00
Do wandering albatrosses really perform Levy flights when foraging?

We examine the hypothesis that wandering albatrosses (Diomedea exulans) undergo Levy flights when roaming the skies in search of oceanic food sources. Levy flights are random walks whose step lengths are taken from a distribution with infinite variance, such as a power-law. The Levy flight consequently has no typical scale, and this has been interpreted as being an efficient way of searching for food on the ocean surface. We first re-analyse the original data that were used to infer Levy flights. These data come from wet/dry loggers that record the time periods for which the birds were airborne or on the ocean surface. We cast doubt as to whether these data are sufficient to conclude Levy flight behaviour. This prompts us to analyse recent data from birds fitted with much higher resolution wet/dry loggers. We find that the widely-held Levy flight hypothesis can be refuted by the newer data. We will also briefly discuss other data sets and ecological questions arising from the unique Antarctic environment.
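The power-law step mechanism is easy to simulate. This sketch (invented parameters, unrelated to the albatross data) draws Pareto-distributed step lengths by inverse-CDF sampling, so a handful of enormous steps dominate the walk, the scale-free signature described above:

```python
import math, random

def levy_flight(n_steps, alpha=1.5, x_min=1.0):
    """2-D Levy-flight sketch: step lengths from a Pareto (power-law)
    distribution with tail exponent alpha (infinite variance for alpha < 2),
    directions uniform. Inverse-CDF sampling: L = x_min * U^(-1/alpha)."""
    x = y = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        length = x_min * random.random() ** (-1.0 / alpha)
        phi = random.uniform(0, 2 * math.pi)
        x += length * math.cos(phi)
        y += length * math.sin(phi)
        path.append((x, y))
    return path

random.seed(8)
path = levy_flight(1000)
steps = [math.hypot(path[i + 1][0] - path[i][0], path[i + 1][1] - path[i][1])
         for i in range(1000)]
print(max(steps), sorted(steps)[500])
```

The maximum step is orders of magnitude larger than the median step, which is exactly the heavy-tailed behaviour that the wet/dry logger analyses in the talk try to detect in flight durations.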

SCBW03 21st November 2006
11:30 to 12:30
Covariate information in complex event history data - some thoughts arising from a case study

The motivation behind this talk comes from considering epidemiological follow-up data for the purpose of studying the role of various risk factors of cardiovascular diseases. Commonly in such studies the statistical analysis is based on a hazard regression model where the covariates (e.g. blood pressure, cholesterol level, or body mass index) are measured only at the baseline. In addition to considering such more traditional risk factors, it is becoming increasingly common to try and assess also the role of some genetic factors contributing to the aetiology of such diseases, and then usually restricting the analysis to certain candidate loci that are potentially causative on the basis of the available information about their function. In principle, the corresponding causal mechanisms can involve pathways that are direct in the sense that they influence, in the postulated model structure, directly the outcome variable, or indirect in that their effect on the outcome is mediated via the levels of the measured risk factors.

SCBW03 21st November 2006
14:00 to 15:15
A general space-time growth-interaction process for inferring and developing structure from partial observations

Not only have marked point processes received relatively little attention in the literature, but most analyses ignore the fact that in real life spatial structure often develops dynamically through time. We therefore develop a computationally fast and robust spatial-temporal process, based on stochastic immigration-death and deterministic growth-interaction, which enables both single and multiple snap-shot marked point process data to be studied in considerable depth. Combining logistic and linear growth with (symmetric) disc-interaction and (asymmetric) area-interaction generates a wide variety of mark-point spatial structures. A maximum pseudo-likelihood approach is developed for parameter estimation at fixed times, and a least squares procedure for parameter estimation based on multiple time points.

A related problem in spatial statistics and stochastic geometry concerns the modelling and statistical analysis of hard particle systems involving discs or spheres. Successively filling the remaining empty space leads to a limiting maximum packing pattern whose structure depends on the given characteristics of the particles. Using our process to develop such patterns extends current methods, since a newly arrived particle is not immediately rejected if it does not fit into a specific gap, but can change size to adapt to the interaction pressure placed on it.

SCBW03 22nd November 2006
09:00 to 10:00
C Gilligan Parameter estimation for spatio-temporal models of botanical epidemics
SCBW03 22nd November 2006
10:00 to 11:00
T Kypraios Robust MCMC algorithms for Bayesian inference in stochastic epidemic models

In general, inference problems for disease outbreak data are complicated by the facts that (i) the data are inherently dependent and (ii) the data are usually incomplete, in the sense that the actual process of infection is not observed. We adopt a Bayesian approach and apply Markov chain Monte Carlo (MCMC) methods in order to make inference for the parameters of interest (such as infection and removal rates). We show that once the size of the data set increases, the standard methods perform poorly. Therefore, apart from centered reparameterisation, we extend the non-centered and partially non-centered algorithms presented in Neal and Roberts (2005). Finally, we adopt a fully Bayesian approach to analyse the Foot-and-Mouth disease outbreak that occurred in 2001 in the UK, and also discuss modelling approaches for a potential Avian Influenza outbreak in the UK poultry industry.

SCBW03 22nd November 2006
11:30 to 12:30
The persistence of measles: from the schoolyard to sub-Saharan Africa
SCBW03 22nd November 2006
14:00 to 15:15
Bayesian experimental design with Stochastic epidemic models

Inference and parameter estimation for stochastic epidemic models has been greatly facilitated by Bayesian methods and associated computational techniques such as Markov chain Monte Carlo. The question of how experiments should be designed (e.g. how populations should be sampled in space and time) to maximise the insights gained from these analyses is now being considered. This talk will describe how the Bayesian approach to experimental design, originally due to Muller, can be applied in the context of nonlinear stochastic epidemic models. In this approach, the design itself is treated as a random quantity. A distribution, which depends fundamentally on the utility of the design, is assigned to model parameters, experimental outcome and experimental design jointly. The design which is optimal, in terms of having the highest expected utility, corresponds to the mode of the design marginal distribution. We will demonstrate how, by using approximations to parameter likelihoods based on moment closure methods, it is computationally feasible to implement this approach to design experiments in practically relevant situations. In particular, we use the methods to explore possible designs for microcosm experiments on epidemics of fungal pathogens in plant communities.

SCBW03 22nd November 2006
15:45 to 17:00
Small worlds and giant epidemics

Key problems for models of disease spread relate to threshold, velocity of spread, final size and control. All these aspects depend crucially on the network structure of individual interactions.

Networks of interest range from the highly localised case, where interactions are only between near neighbours, to the opposite global extreme where all interact equally with all, so that a disease can spread much more quickly through the population. Understandably, there has been much recent interest in `small-world' and meta-population models, in which a relatively small number of long-distance connections can change a network from local to effectively global. Such models seem particularly relevant to the changed patterns of human and animal diseases in a world whose connectivity, in terms of both travel and trade, has increased hugely in recent decades.

In consequence, a number of different mathematical and statistical approaches have been developed recently that focus on networks. I shall discuss the strengths and weaknesses of some of these approaches, with examples drawn from both human and animal diseases, such as SARS, Foot and Mouth disease and avian flu. I shall also discuss the wider implications, illustrating what mathematics can and cannot do in helping us predict and control disease outbreaks.

SCBW03 23rd November 2006
09:00 to 10:00
A probabilistic test of the neutral model

A neutral model of community dynamics has been built using the hierarchical Bayesian framework and fitted with Markov Chain Monte Carlo methods to three community datasets. To fit the data well, the model would need parameter values that are impossible. This suggests that variation in species abundances cannot be explained solely by random drift between species as suggested by the neutral model.

SCBW03 23rd November 2006
10:00 to 11:00
Estimating mixing between subpopulations using respondent driven sampling

It is widely acknowledged that the level of mixing within a population plays an important role in the transmission dynamics of infectious diseases. However, obtaining information on mixing is notoriously difficult. Respondent-driven sampling (RDS), a kind of chain-referral sampling, is becoming an increasingly popular approach to sampling 'hidden' populations, such as injection drug users and men who have sex with men. RDS involves giving study participants a small number of coupons to give to other potential participants who are their friends or acquaintances. As a side-effect of the recruitment process, RDS provides information on mixing between different populations and, by asking individuals about their relationship to the person who recruited them, the extent of overlap between social and sexual networks. Current analytical techniques treat the recruitment process as a Markov chain, which is inappropriate as individuals may recruit more than one individual. We show how stochastic context-free grammars (SCFGs) can be used to model the tree-like recruitment process, which allows us to test for non-random mixing between subpopulations (e.g. infected/uninfected), for independence of characteristics between recruitees of a given recruiter, and for differences in patterns of mixing between different populations. We discuss the similarity of the recruitment process to a multitype branching process and a stochastic susceptible-infected epidemiological model.

SCBW03 23rd November 2006
11:30 to 12:30
Modeling tuberculosis in areas of high HIV prevalence

We describe a discrete event simulation model of tuberculosis (TB) and HIV disease, parameterized to describe the dual epidemics in Harare, Zimbabwe. TB and HIV are the leading causes of death from infectious disease among adults worldwide and the number of TB cases has risen significantly since the start of the HIV epidemic, particularly in Sub-Saharan Africa, where the HIV epidemic is most severe. There is a need to devise new strategies for TB control in countries with a high prevalence of HIV. This model has been designed to investigate strategies for reducing TB transmission by more efficient TB case detection. The model structure and its validation are discussed.

SCBW03 23rd November 2006
14:00 to 15:15
A Ganesh Epidemics on graphs: thresholds and curing strategies

We consider the contact process (SIS epidemic) on finite undirected graphs and study the relationship between the expected epidemic lifetime, the infection and cure rates, and properties of the graph. In particular, we show the following: 1) if the ratio of cure rate to infection rate exceeds the spectral radius of the graph, then the epidemic dies out quickly. 2) If the ratio of cure rate to infection rate is smaller than a generalisation of the isoperimetric constant, then the epidemic is long-lived. These results suffice to establish thresholds on certain classes of graphs with homogeneous node degrees. In addition, we obtain thresholds for epidemics on power-law graphs. Finally, we use these techniques to study the efficacy of different schemes for distributing curing resources among the nodes.
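
The first threshold can be illustrated with a toy discrete-time approximation of the contact process (all rates and the graph below are illustrative; a 6-node cycle is used because its spectral radius is exactly 2):

```python
import random

random.seed(0)

# adjacency list of a 6-node cycle; its spectral radius is exactly 2
nbrs = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

def spectral_radius(nbrs, iters=200):
    """Largest adjacency eigenvalue, by power iteration."""
    x = {v: 1.0 for v in nbrs}
    norm = 1.0
    for _ in range(iters):
        y = {v: sum(x[w] for w in nbrs[v]) for v in nbrs}
        norm = max(abs(val) for val in y.values())
        x = {v: val / norm for v, val in y.items()}
    return norm

def sis_lifetime(nbrs, infection_rate, cure_rate, dt=0.01, max_steps=100_000):
    """Time until no node is infected (crude discrete-time approximation)."""
    infected = set(nbrs)                       # start with every node infected
    for step in range(max_steps):
        new = set(infected)
        for v in infected:
            if random.random() < cure_rate * dt:
                new.discard(v)                 # v recovers
            for w in nbrs[v]:
                if random.random() < infection_rate * dt:
                    new.add(w)                 # v reinfects a neighbour
        infected = new
        if not infected:
            return step * dt
    return max_steps * dt

rho = spectral_radius(nbrs)
t = sis_lifetime(nbrs, infection_rate=1.0, cure_rate=5.0)  # ratio 5 > rho = 2
print("spectral radius:", rho, "| subcritical lifetime:", t)
```

With the cure-to-infection ratio well above the spectral radius, extinction is fast; pushing the ratio below the threshold makes the lifetime grow dramatically.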

SCBW03 23rd November 2006
15:45 to 17:00
Building and fitting models of host-virus interaction

We treat a panmictic host population interacting with a virus. The virus is transmitted both horizontally and vertically. We modify the Moran model to describe the stochastic dynamics of individual host and viral lineages. For a sample of individuals from the population, the model gives rise to a branching and coalescing graph that contains the combined host and viral genealogies as a subgraph. The associated diffusion process, obtained in the limit of large host population, is related to the Neuhauser-Krone selection graph process.

We consider two study populations: cougars infected with FIV and a UK cohort of HIV patients. We fit the joint host-virus process to viral sequence data and known host pedigrees (which are trivial in the human case). We use MCMC to average over the variable dimension parameter space of labeled graphs.

SCBW03 24th November 2006
09:00 to 10:00
Uses and abuses of stochastic models in veterinary epidemiology

Equine influenza causes disease that is similar to human infection with influenza A H1N1 and H3N2, at least in terms of pathogenesis, transmission and population-level phylogeny, but markedly different in its seasonality: there are no obvious, consistent winter peaks of transmission.

The talk will focus on a programme of work directed at a better understanding of the epidemiology and control of equine influenza infection. The programme has used stochastic versions of SEIR models, parameterised from experimental and epidemiological data on the disease in the natural host. Optimising the use of vaccination is of particular interest. Empirical data have allowed the extension of the basic models into ones assessing the impact of virus selection (antigenic drift), and explain how rather small differences observed experimentally scale up to substantial population-level effects. More recent developments explore the extension of the basic model into one involving variably connected patches. Practical issues relating to the parameterisation of more complex stochastic epidemiological models, which face all those working in similar fields, will be discussed, and comparison will be made with other epidemiological work.

SCBW03 24th November 2006
10:00 to 11:00
Bayesian inference for structured population models given final outcome data

We consider the problem of Bayesian inference for infection rates in a multi-type stochastic epidemic model in which the population has a given structure, given data on final outcome. For such data, a likelihood is both analytically and numerically intractable. This problem can be overcome by imputation of suitable latent variables. We describe two such approaches based on different representations of the epidemic model. We also consider extensions of the methodology to the situation where the observed data are a fraction of the entire population. The methods are illustrated with data on influenza outbreaks.

SCBW03 24th November 2006
11:30 to 12:30
Climate-driven spatial dynamics of plague among prairie dog colonies

I will present a Bayesian hierarchical model for the joint spatial dynamics of a host-parasite system. The model was fitted to long-term data on the regional plague dynamics and metapopulation dynamics of the black-tailed prairie dog, a declining keystone species of North American prairies. The rate of plague transmission between colonies increases with increasing precipitation while the rate of infection from unknown sources decreases in response to hot weather. The annual dispersal distance of plague is about 10 km and topographic relief reduces the transmission rate. Larger colonies are more likely to become infected, but colony area does not affect the infectiousness of colonies. The results suggest that prairie dog movements do not drive the spread of plague through the landscape. Instead, prairie dogs are useful sentinels of plague epizootics. Simulations suggest that the model can be used for predicting long-term colony and plague dynamics as well as for identifying which colonies are most likely to become infected in a specific year.

SCBW03 24th November 2006
14:00 to 15:15
Statistical inference for epidemics among a population of households

This talk is concerned with a stochastic model for the spread of an SIR (susceptible → infective → removed) epidemic among a closed, finite population that contains several types of individual and is partitioned into households. A pseudolikelihood framework is presented for making statistical inference about the parameters governing such epidemics from final outcome data, when possibly only some of the households in the population are observed. The framework includes parameter estimation, hypothesis tests and goodness-of-fit. Asymptotic properties of the procedures are derived when the numbers of households in both the sample and the population are large, which correctly account for dependencies between households. The methodology is illustrated by applications to data on a variola minor outbreak in Sao Paulo and to data on influenza outbreaks in Tecumseh, Michigan.

SCBW03 24th November 2006
15:45 to 17:00
Exact Bayesian inference and model selection for some infection models

While much progress in the analysis of infectious disease data depends upon MCMC methodology, the simpler and more exact method of rejection sampling can sometimes be very useful. Using examples of influenza data from a population divided into households, this talk will illustrate the use of rejection sampling in model fitting; the use of an initial sample to improve the efficiency of the algorithm; and selection between competing models of differing dimensionality.
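
A minimal sketch of the exact rejection idea for household data, assuming a toy Reed-Frost chain-binomial model with a hypothetical prior and observation (not the talk's actual models): draws from the prior are accepted only when the simulated final size exactly matches the observed one, so accepted values are exact posterior samples.

```python
import random

random.seed(42)

def final_size(n_susceptible, p, initial_infectives=1):
    """Final number ever infected in a Reed-Frost chain-binomial outbreak."""
    s, i, total = n_susceptible, initial_infectives, initial_infectives
    while i > 0 and s > 0:
        escape = (1.0 - p) ** i                # prob a susceptible escapes this generation
        new_i = sum(random.random() > escape for _ in range(s))
        s -= new_i
        total += new_i
        i = new_i
    return total

observed = 3                                   # hypothetical observed final size
accepted = []
while len(accepted) < 500:
    p = random.random()                        # Uniform(0, 1) prior on transmission prob
    if final_size(4, p) == observed:
        accepted.append(p)                     # exact match => exact posterior draw

post_mean = sum(accepted) / len(accepted)
print("posterior mean transmission probability:", round(post_mean, 2))
```

Because final outcome data are discrete, exact matching has non-negligible acceptance probability here, which is what makes rejection sampling competitive with MCMC in this setting.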

SCBW02 11th December 2006
10:00 to 11:00
Overview of statistical issues in genome-wide association testing
SCBW02 11th December 2006
11:45 to 12:30
Causal effects in functional genomics

The power of traditional genetics studies to identify the genetic determinants of diseases is limited by the fact that complex disease traits depend on small incremental contributions from many loci. Integrative functional genomics represents a relatively novel approach to the problem. The idea is to use hypotheses on the patho-physiological mechanisms underlying the studied disease to focus attention on a restricted collection of molecular pathways and corresponding inheritable molecular phenotypes. On each sampled individual, information is collected at the DNA sequence level, within or around candidate genes, as well as at the clinical phenotype and molecular phenotype levels.

We propose a general statistical framework for the design and analysis of functional genomics studies of the above kind. Our approach uses a directed graph representation of a probability model of the problem, incorporating "intervention nodes" for formal reasoning about causes and effects of causes, as proposed by Dawid. In fact, meaningful biological questions can often be formulated in terms of effects of specific interventions, for example, the effect of blocking a certain receptor by a drug. Our approach involves mapping available biological knowledge onto the graph, using graph semantics and conditional independence reasoning to formulate meaningful scientific questions, and identifying appropriate experimental designs for answering those questions. Finally, the graph can be used as a computational framework for estimating the quantities of interest.

The method will be illustrated with the aid of our study of the effect of platelet sensitivity on thrombotic occlusive events.

SCBW02 11th December 2006
14:00 to 15:00
Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles

We consider methodology for Bayesian model-based clustering of gene expression profiles, that is, measurements of expression levels of a large number of genes, typically from microarray assays, across a number of different experimental conditions and/or biological subjects. We follow a familiar approach using Dirichlet-process-based models to cluster the genes implicitly, but depart from standard practice in several ways. First, we incorporate regression on covariate information at the condition/subject level by modelling regression coefficients, not the expectations of the data directly. More importantly, we replace the Dirichlet process by one of a richer family of models, generated from a stick-colouring-and-breaking construction, under which cluster identities are not exchangeable: this allows modelling a 'background' cluster, for example. Finally, we follow a formal decision-theoretic approach to point estimation of the clustering, using a pairwise coincidence loss function. This is joint work with John Lau at Bristol.

SCBW02 11th December 2006
15:30 to 16:30
P Brown Aspects of feature selection in Mass Spec proteomic functional data

We look at functional data as arising from mass spectrometry data used in proteomics. The data may contain experimental factors and covariates, but the aim is to provide interpretation and to discriminate between two or more groups. Modelling is often facilitated by the use of wavelets.

We review a variety of approaches to (i) modelling the functional data as response (ii) modelling directly the discriminatory categories conditional on functional data and experimental factor/covariates. Our ultimate focus will be on Bayesian models that allow regularisation. To this end we look at a variety of forms of scale mixture of normal prior distributions including forms of hyper-lasso and approaches to robustness and stability of discrimination. We are particularly interested in fast algorithms capable of scaling up to very many variables and which are flexible enough to allow a variety of prior structures.

Keywords: Bayesian methods; Hyper-lasso; Bayesian Wavelet functional modelling; MCMC; EM algorithm.

SCBW02 12th December 2006
09:00 to 10:00
JM Thornton From protein structure to biological function: progress and limitations

Understanding the relationship between protein structure and biological function has long been a major goal of structural biology. With the advent of many structural genomics projects, there is a practical need for tools to analyse and characterise the possible functional attributes for a new structure.

One of the major challenges in assigning function is to recognise a cognate ligand which may be a small molecule or a large macromolecule. At EBI we have been developing a range of methods which seek to annotate a functional site. These methods include:

- Using sequence data and global and local structure comparisons to recognise distant relatives or short sequence patterns that are characteristic of binding sites.
- Using 3-dimensional templates for functional sites, defined from proteins of known structure and function, which can identify similarities between the query protein and other proteins in the PDB.
- Using spherical harmonics to define the shape of a binding site and to compare this shape with all known binding sites in the PDB and with the small molecule metabolome.

These methods have some success, dependent upon the shape and flexibility of the binding site. In this presentation I will review our progress in this area and describe application to the sulphur transferases. Some of this work has been integrated into the ProFunc pipeline (coordinated by R.A. Laskowski), a web server that provides automated annotation for a new protein structure (http://www.ebi.ac.uk/thornton-srv/databases/ProFunc/).

References:

Laskowski, R.A., Watson, J.D. & Thornton, J.M. (2005). ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res., 33, W89-W93.

Laskowski, R.A., Watson, J.D. & Thornton, J.M. (2005). Protein function prediction using local 3D templates. J. Mol. Biol., 351, 614-626.

Morris, R.J., Kahraman, A. & Thornton, J.M. (2005). Binding pocket shape analysis for protein function prediction. Acta Cryst., D61, C156-C157.

SCBW02 12th December 2006
10:00 to 10:45
K Walter Modelling the boundaries of highly conserved non-coding DNA sequences in vertebrates

A comparison of the human and fish genomes produced more than 1000 highly conserved non-coding elements (CNEs), sequences of DNA that show a remarkable degree of similarity between human and fish despite an evolutionary distance of about 900 million years.

The high sequence conservation suggests that these CNEs possess some kind of function, though neither their function nor which part of their sequence is functional have been well defined yet. Since each CNE was defined by a pairwise sequence alignment, its boundary might not be accurate enough to design biological experiments to help identify its role in the genome. In a first step an examination of the CNE's nucleotide composition revealed a striking A+T pattern at the CNE boundary in fish as well as human.

In a further step we propose a probabilistic model that takes into consideration not only nucleotide composition but also phylogenetic information, and that aims to define the functional part of CNEs by using multiple sequence alignments of human, mouse, chicken, frog and fish.

SCBW02 12th December 2006
11:30 to 12:30
Bayesian analysis of gene expression data

Microarray experiments and gene expression data have a number of characteristics that make them attractive but challenging for Bayesian analysis. There are many sources of variability, the variability is structured at different levels (array-specific, gene-specific, ...) and the ratio of signal to noise is low. Typical experiments involve few samples but a large number of genes, so that borrowing information, e.g. across genes, to improve inference becomes essential. Hence embedding the inference in a hierarchical model formulation is natural.

Bayesian models adapted to the level of information processed have been developed to address some of the questions raised, ranging from modelling the signal to synthesising gene lists across different experiments. In this talk, I shall illustrate their use in a variety of contexts: probe level models attempting to quantify uncertainty of the signal, differential expression mixture models and gene list synthesis. Cutting across these developments are important issues of MCMC performance and model checking. These issues will be illustrated on case studies.

This is joint work with colleagues on the BGX project: Marta Blangiardo, Natalia Bochkina, Anne Mette Hein (now at Aarhus), Alex Lewin, Ernest Turro and Peter Green (Bristol).

SCBW02 12th December 2006
14:00 to 15:00
M Dermitzakis Inference of cis and trans regulatory variation in the human genome

The recent comparative analysis of the human genome has revealed a large number of conserved non-coding sequences in mammalian genomes. However, our understanding of the function of non-coding DNA is very limited. In this talk I will present recent analysis in my group and collaborators that aims at the identification of functionally variable regulatory regions in the human genome by correlating SNPs and copy number variants with gene expression data. I will also be presenting some analysis on inference of trans regulatory interactions and evolutionary consequences of gene expression variation.

SCBW02 12th December 2006
15:30 to 16:15
A Bayesian approach to association mapping in admixed populations

Admixed populations, such as African Americans, are an important resource for identifying mutations contributing to human disease. Until recently, disease mapping in such populations has used so-called "admixture linkage disequilibrium" to map loci for diseases whose incidence varies markedly between populations. These analysis methods typically require unlinked markers spaced at wide intervals, while data now becoming available provide genotype information at much higher densities. High-density data provide exquisite information about admixture, and association information, but present methodological challenges. We have developed and implemented an approach that utilizes such data to probabilistically infer admixture segments and to perform disease mapping. The approach uses the HapMap data as a framework, and employs a fully Bayesian methodology, providing a natural weighting of both broad-scale admixture linkage disequilibrium and fine-scale association information. The software is currently being applied to data from a prostate cancer association study examining African American cases and controls. Future population-genetics-based directions are briefly discussed.

SCBW02 12th December 2006
16:15 to 17:00
C Barnes Techniques for the detection of copy number variation using SNP genotyping arrays

The key goal of medical genetics is the search for the genetic variation responsible for disease. A major focus is on the use of single nucleotide polymorphism (SNP) genotyping arrays for genome wide association studies. However, recent studies suggest that copy number variation (CNV) accounts for a significant fraction of the total variation in the human genome. The susceptibility to a number of diseases, including HIV infection, is already known to be associated with copy number variants but the full functional and phenotypic impact of CNVs is not yet fully understood. In order to search for CNVs using SNP genotyping platforms we have developed a number of normalization schemes. These incorporate allele specific corrections, quantile normalization and corrections for the source, GC content and length of the PCR products. We have also developed methods of locating and categorizing CNVs using both existing algorithms, such as SWArray and CBS, and novel tools. These are implemented within a high throughput framework, essential for processing large datasets already available and from future projects. Here we present a map of common CNVs based on studies of a large set of healthy individuals.

SCBW02 13th December 2006
09:00 to 10:00
Minimal ancestral recombination graphs

Finding the ancestral recombination graph (ARG) that explains a data set with the least number of recombinations is the parsimony-ARG analogue of finding parsimonious phylogenies. This is a hard computational problem and two main approaches will be discussed. Firstly, a "scan along sequences" dynamic programming approach that works for up to 10 sequences of any length. Secondly, a "trace history back in time" branch and bound approach that can work very fast for much larger numbers of sequences, but can also fail totally, depending on the data. The second approach can also be extended to include gene conversion. Finally, the number of ancestral states that could be encountered for a given data set is counted for small numbers of sequences and segregating sites. It is also illustrated how likelihood calculations can be done on a restricted graph that contains close-to-minimal histories of a set of sequences.
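
One of the classical lower bounds underlying such minimal-history computations, the Hudson-Kaplan four-gamete bound, can be sketched in a few lines (a simplification of the bounds discussed in the talk; binary sequences are assumed): any pair of sites exhibiting all four gametes requires at least one recombination between them, and disjoint incompatible intervals each require their own.

```python
def rmin_hudson_kaplan(seqs):
    """Lower bound on recombinations for binary sequences (rows = haplotypes)."""
    n_sites = len(seqs[0])
    incompatible = []
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            gametes = {(s[i], s[j]) for s in seqs}
            if len(gametes) == 4:              # four-gamete test fails
                incompatible.append((i, j))
    # greedily pick disjoint intervals; each needs its own recombination
    incompatible.sort(key=lambda ij: ij[1])
    count, last_end = 0, -1
    for i, j in incompatible:
        if i >= last_end:
            count += 1
            last_end = j
    return count

# toy data: sites 0 and 1 show all four gametes => at least one recombination
print(rmin_hudson_kaplan(["00", "01", "10", "11"]))   # -> 1
```

The branch-and-bound methods cited below sharpen this kind of bound considerably, but the four-gamete test remains the basic building block.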

Allen, B. and Steel, M. (2001). Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics 5, 1-13.

Baroni, M., Grunewald, S., Moulton, V. and Semple, C. (2005). Bounding the number of hybridisation events for a consistent evolutionary history. Journal of Mathematical Biology 51, 171-182.

Bordewich, M. and Semple, C. (2004). On the computational complexity of the rooted subtree prune and regraft distance. Annals of Combinatorics 8, 409-423.

Hein, J.J., Jiang, T., Wang, L. and Zhang, K. (1996). On the complexity of comparing evolutionary trees. Discrete Applied Mathematics 71, 153-169.

Hein, J., Schierup, M. and Wiuf, C. (2004). Gene Genealogies, Variation and Evolution. Oxford University Press.

Lyngsø, R.B., Song, Y.S. and Hein, J. (2005). Minimum recombination histories by branch and bound. Lecture Notes in Bioinformatics: Proceedings of WABI 2005, 3692, 239-250.

Myers, S.R. and Griffiths, R.C. (2003). Bounds on the minimum number of recombination events in a sample history. Genetics 163, 375-394.

Song, Y.S., Lyngsø, R.B. and Hein, J. (2005). Counting ancestral states in population genetics. Submitted.

Song, Y.S. and Hein, J. (2005). Constructing minimal ancestral recombination graphs. J. Comp. Biol., 12, 147-169.

Song, Y.S. and Hein, J. (2004). On the minimum number of recombination events in the evolutionary history of DNA sequences. J. Math. Biol., 48, 160-186.

Song, Y.S. and Hein, J. (2003). Parsimonious reconstruction of sequence evolution and haplotype blocks: finding the minimum number of recombination events. Lecture Notes in Bioinformatics, Proceedings of WABI'03, 2812, 287-302.

SCBW02 13th December 2006
10:00 to 10:45
Estimating the effects of SNPs on protein structure

Understanding the effects that non-synonymous single nucleotide polymorphisms have on the structures of the gene products, the proteins, is important in identifying the origins of complex diseases. A method based on environment-specific amino acid substitutions observed within homologous protein families with known 3D structures was used to predict changes in stability caused by mutations. In the task of predicting only the sign of the stability change, our method performs comparably to or better than other published methods, with an accuracy of 71%. The method was applied to a set of disease-associated and non-disease-associated mutations and was shown to distinguish the two sets in terms of protein stability. Our method may therefore have application in correlating SNPs with diseases caused by protein instability.

SCBW02 13th December 2006
11:30 to 12:30
Probabilistic modelling of metabolic regulation in prokaryotes

Surprisingly little is known about regulatory processes in prokaryotes outside a small group of model species such as Escherichia coli. Probabilistic models can help to combine the comparatively sparse direct experimental evidence for regulation in less well known organisms such as Mycobacterium tuberculosis with gene expression data and results from the application of bioinformatics and genomic tools. I will discuss the challenges of such a project and some of the statistical concepts that might be useful for tackling them.

SCBW02 14th December 2006
09:00 to 10:00
Estimating genealogies from marker data: a Bayesian approach

An issue often encountered in statistical genetics is whether, or to what extent, it is possible to estimate the degree to which individuals sampled from a background population are related to each other, on the basis of the available diploid multi-locus genotype data and some information on the demography of that population. In this talk, this question is considered by using an explicit modelling and reconstruction of the pedigrees and gene flows at the marker loci. For computational reasons, the analysis is restricted to a relatively recent history of the population, currently extending, depending on the data, up to ten or twenty generations backwards in time. As a computational tool, we use Markov chain Monte Carlo numerical integration on the state space of genealogies of the sample individuals. The main technical challenge has been in devising a variety of joint proposal distributions which would guarantee that the algorithm has reasonable mixing properties. As illustrations of the method, we consider the question of relatedness both in terms of individuals (pedigree-based relatedness estimation) and at the level of genes/genomes (IBD estimation), using both simulated and real data.

SCBW02 14th December 2006
10:00 to 10:45
Detecting natural selection with empirical codon models: a synthesis of population genetics and molecular phylogenetics

The estimation of empirical codon models sheds new light on recently discussed questions about biological pressures and processes acting during codon sequence evolution (Averof et al., Science 287:1283 (2000), Bazykin et al., Nature 429:558 (2004), Friedman and Hughes, MBE 22:1285 (2005), Keightley et al., PLoS Biol 3:282 (2005)).

My results show that modelling the evolutionary process is improved by allowing for single, double and triple nucleotide changes; the affiliation between DNA triplets and the amino acid they encode is a main factor driving evolution; and the nonsynonymous-synonymous rate ratio is a suitable measure to classify substitution patterns observed for different proteins. However, comparing models estimated from genomic data and polymorphism data indicates that double and triple changes are not instantaneous.

This new view of how codon evolution proceeds has consequences for selection studies. I will discuss how, under the new empirical codon model, purifying selection is less purifying and cases of positive selection appear weaker than under the standard codon models (Yang et al., Genetics 155:431-449 (2000)).

SCBW02 14th December 2006
11:30 to 12:30
R Nielsen Detecting selection from population genetic data

We will present an analysis of several large-scale human population genetic data sets. We use a combination of simulation and analytical approaches to identify genes and genomic regions targeted by Darwinian selection. The biological implications of some of the results are discussed.

SCBW02 14th December 2006
14:00 to 15:00
N Patterson Population structure and eigenanalysis

When analyzing genetic data, one often wishes to determine if the samples are from a population that has structure. Can the samples be regarded as randomly chosen from a homogeneous population, or do the data imply that the population is not genetically homogeneous? We show that an old method (principal components) together with modern statistics (Tracy-Widom theory) can be combined to yield a fast and effective answer to this question. The technique is simple and practical on the largest datasets, and can be applied both to genetic markers that are biallelic and to markers that are highly polymorphic, such as microsatellites. The theory also allows us to estimate the data size needed to detect structure if our samples are in fact from two populations that have a given but small level of differentiation.
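
The core eigenanalysis step can be sketched with toy data (hypothetical allele frequencies and sample sizes; a real analysis would use many more markers and would assess significance of the leading eigenvalue with Tracy-Widom theory rather than eyeballing the projection):

```python
import random

random.seed(7)

def simulate_genotypes(n_ind, n_snp, freq):
    """Each entry is a 0/1/2 minor-allele count drawn at allele frequency freq[j]."""
    return [[sum(random.random() < freq[j] for _ in range(2)) for j in range(n_snp)]
            for _ in range(n_ind)]

n_snp = 200
f1 = [random.uniform(0.1, 0.5) for _ in range(n_snp)]
f2 = [min(0.95, f + 0.3) for f in f1]          # shifted frequencies => two populations
geno = simulate_genotypes(20, n_snp, f1) + simulate_genotypes(20, n_snp, f2)
n = len(geno)

# centre each SNP column
means = [sum(row[j] for row in geno) / n for j in range(n_snp)]
x = [[row[j] - means[j] for j in range(n_snp)] for row in geno]

# individual-by-individual covariance, then its leading eigenvector by power iteration
cov = [[sum(a * b for a, b in zip(x[i], x[k])) / n_snp for k in range(n)]
       for i in range(n)]
v = [random.random() - 0.5 for _ in range(n)]
for _ in range(100):
    w = [sum(cov[i][k] * v[k] for k in range(n)) for i in range(n)]
    norm = max(abs(val) for val in w) or 1.0
    v = [val / norm for val in w]

# with structure this strong, the two populations separate along PC1
side1 = sum(val > 0 for val in v[:20])
side2 = sum(val > 0 for val in v[20:])
print("pop1 positive PC1 loadings:", side1, "| pop2:", side2)
```

In an unstructured sample the leading eigenvalue is no larger than random-matrix theory predicts, which is exactly what the Tracy-Widom test formalises.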

SCBW02 14th December 2006
15:30 to 16:15
C Bird Exploring the role of noncoding DNA in the function of the human genome through variation

The function of conserved non-coding DNA sequences (CNCs) has been speculated about since their discovery. We have begun to investigate this by using variation data to study the effect of genomic location upon these sequences. We have used the phase II HapMap consortium SNP data to first investigate the signature of selective constraint on non-coding regions. Our results show that new (derived) alleles of SNPs within CNCs are rarer than new alleles in non-conserved regions (P = 3 × 10^-18), indicating that evolutionary pressure has suppressed CNC-derived allele frequencies. We have used whole-genome alignments of the human, chimp and macaque genomes to identify 1356 non-coding sequences, conserved across multiple mammals, which show a significantly accelerated substitution rate in the human lineage, indicated by a relative rate test in the human-chimp-macaque alignments. We subsequently test which of these 1356 sequences are a result of relaxation of selective constraint versus positive selection. Detectable segmental duplications are by their nature primate-specific events. An intriguing question is whether these rapidly evolving CNCs are enriched within segmental duplications. The accelerated CNCs could be due to a loss of selective constraint or to positive selection, and either of these scenarios could relate to differential gene expression patterns between their associated paralogous genes. We have identified an enrichment of accelerated CNCs in the most recently formed segmental duplications. We are currently investigating the potential for reciprocal changes in duplicated CNCs. We have also recently identified a group of accelerated CNCs that contain SNPs that are significant contributors to gene expression variation. We will present our current computational and functional analysis of the evolutionary properties of CNCs within and between species and the functional consequences for gene expression.

SCBW02 14th December 2006
16:15 to 17:00
C Hoggart A hybrid Bayesian method for detecting multiple causal variants from Genome-Wide association studies

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants of small effect, which is a plausible scenario for many complex diseases. Moreover, many simulation studies assume a single causal variant and so more complex realities are ignored. Analysing large numbers of variants simultaneously is now becoming feasible, thanks to developments in Bayesian stochastic search methods. We combine Bayesian shrinkage methods together with a local stochastic model search to identify complex interactions, both local and distal. Our approach can analyse up to 10,000 SNPs simultaneously, and finds multiple potential disease models each with an associated probability. We illustrate its power in comparison with a range of alternative methods, in simulations that incorporate multiple causal loci, acting singly or in interacting pairs, among 4,000 SNPs in a 20Mb region. We argue that, implemented in a two-stage procedure, our hybrid Bayesian analysis can provide a powerful solution to the problem of extracting maximal information from genome-wide association studies.

SCBW02 15th December 2006
09:00 to 10:00
A Thomas Towards linkage analysis with markers in linkage disequilibrium by graphical modelling

Recent developments of MCMC integration methods for computations on graphical models for two applications in statistical genetics are reviewed: modelling allelic association and pedigree based linkage analysis. Estimation of graphical models from haploid and diploid genotypes, and the importance of MCMC updating schemes beyond what is strictly necessary for irreducibility, are described and illustrated. We then outline an approach combining these methods to compute linkage statistics when alleles at the marker loci are in linkage disequilibrium. Other extensions suitable for analysis of SNP genotype data in pedigrees are also discussed and programs that implement these methods, and which are available from the author's web site, are described. We conclude with a discussion of how this still experimental approach might be further developed.

SCBW02 15th December 2006
10:00 to 10:45
Approximate Bayesian computation vs Markov chain Monte Carlo

Approximate Bayesian Computation (ABC) is a recently developed Bayesian technique that can be used to extract information from DNA data. The method was first introduced to population genetics by Pritchard et al. (1999).

Since 2002, with Beaumonts paper on the subject, its usage has been strongly increased. This Bayesian approach is used to estimate several demographic history parameters, from populations, using DNA data. Its main advantages are the decrease on computation time demanding and the increase on efficiency and flexibility when dealing with multiparameter models.

In this project, a particular ABC method, similar to that used by Beaumont (2006), was compared against a commonly used Markov chain Monte Carlo (MCMC) method (Hey and Nielsen, 2004) to assess the accuracy of the ABC approach. The use of ABC with more complex demographic models was also explored. Both approaches use DNA sequence data to extract demographic information (e.g. population sizes, times of splitting events, migration rates).

The study confirms the competitiveness of ABC when compared with an MCMC approach, as well as its potential role in research involving more complex, and therefore more realistic, models.
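The basic rejection-ABC scheme underlying the methods compared above can be sketched as follows. The toy model and all names are illustrative, not taken from the talk: parameters are drawn from the prior, data are simulated and reduced to a summary statistic, and draws are kept only when the simulated summary lies close to the observed one.

```python
import random

def abc_rejection(observed_summary, prior_sample, simulate, distance,
                  tolerance, n_draws):
    """Basic rejection ABC: keep prior draws whose simulated summary
    statistic falls within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()         # 1. draw a parameter from the prior
        summary = simulate(theta)      # 2. simulate data, reduce to a summary
        if distance(summary, observed_summary) <= tolerance:
            accepted.append(theta)     # 3. accept if close to the observation
    return accepted                    # accepted draws approximate the posterior

# Toy example: infer the mean of a normal distribution with known spread
# from the sample mean of 50 observations.
random.seed(1)
obs_mean = 4.0
posterior = abc_rejection(
    observed_summary=obs_mean,
    prior_sample=lambda: random.uniform(0.0, 10.0),  # flat prior on the mean
    simulate=lambda mu: sum(random.gauss(mu, 1.0) for _ in range(50)) / 50,
    distance=lambda a, b: abs(a - b),
    tolerance=0.2,
    n_draws=5000,
)
```

In demographic applications the simulator is a coalescent model, the summaries are population-genetic statistics, and refinements such as Beaumont's regression adjustment correct the accepted draws for the discrepancy between simulated and observed summaries.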