 

Seminars (DAE)

Videos and presentation materials from other INI events are also available.


Event When Speaker Title Presentation Material
DAEW01 18th July 2011
11:30 to 12:30
Optimal model-based design for experiments: some new objective and constraint formulations
The presentation will briefly review some of the reasons for the recent renewed interest in the Design of Experiments (DoE) and some key developments which, in the author's view and experience, underpin and enable this success. One is identified in the ability to combine classical DoE methods with substantially more sophisticated mathematical descriptions of the physics in the experiment being designed, thus putting the "model-based" firmly in front of DoE. Another, the main subject of the talk, is a better understanding of the relationship between desired performance and evaluation metric(s), leading to the disaggregation of a single "best" design objective into constituent components, and to much richer formulations of the design problem that can be tailored to specific situations. A final reason is the substantial improvement in the numerical and computing tools supporting the model-based design of experiments, but also, and chiefly, in the availability of integrated modelling/solution environments which make the whole technology accessible to a much wider engineering community. The presentation will illustrate, with reference to examples, some of the new problem formulations that can be used to represent more sophisticated design requirements (including parameter precision, anti-correlation, robustness to uncertainty) and, briefly, some of the newer solution approaches (including design of parallel experiments, on-line re-design). It will also illustrate some successful applications in a variety of demanding industrial areas, ranging from fuel cells, to complex reactor design, to biomedical applications.
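
As a minimal illustration of the kind of model-based formulation referred to above, the sketch below sets up a local D-optimal design problem for a hypothetical two-parameter nonlinear response (our own example, not the speaker's code or models): a scalar function of the Fisher information matrix is maximised over constrained sampling times.

    # Minimal sketch (hypothetical model): locally D-optimal sampling times for
    # y(t) = theta0*(1 - exp(-theta1*t)) + noise, with bound constraints on the times.
    import numpy as np
    from scipy.optimize import minimize

    theta_nom = np.array([1.0, 0.5])      # nominal parameter values (local design)

    def sensitivities(times, th):
        # dy/dtheta evaluated at each sampling time
        t = np.asarray(times)
        return np.column_stack([1.0 - np.exp(-th[1] * t),
                                th[0] * t * np.exp(-th[1] * t)])

    def neg_log_det_fim(times):
        J = sensitivities(times, theta_nom)
        fim = J.T @ J                     # unit error variance assumed
        sign, logdet = np.linalg.slogdet(fim)
        return -logdet if sign > 0 else np.inf

    x0 = np.linspace(0.5, 10.0, 4)        # four sampling times constrained to [0.1, 10]
    res = minimize(neg_log_det_fim, x0, bounds=[(0.1, 10.0)] * len(x0))
    print("Locally D-optimal sampling times:", np.sort(res.x))

Richer formulations of the type discussed in the talk replace or augment this single objective with additional criteria and constraints (parameter precision, anti-correlation, robustness to uncertainty).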
DAEW01 18th July 2011
14:00 to 15:00
Optimal experimental design for nonlinear systems: Application to microbial kinetics identification
Dynamic biochemical processes are omnipresent in industry, e.g., brewing, production of enzymes and pharmaceuticals. However, since accurate models are required for model based optimisation and measurements are often labour and cost intensive, Optimal Experiment Design (OED) techniques for parameter estimation are valuable tools to limit the experimental burden while maximising the information content. To this end, often scalar measures of the Fisher information matrix (FIM) are exploited in the objective function. In this contribution, we focus on the parameter estimation of nonlinear microbial kinetics. More specifically, the following issues are addressed: (1) Nonlinear kinetics. Since microbial kinetics is most often nonlinear, the unknown parameters appear explicitly in the design equations. Therefore, selecting optimal initialization values for these parameters as well as setting up a convergent sequential design scheme is of great importance. (2) Biological kinetics. Since we deal with models for microbial kinetics, the design of dynamic experiments is facing additional constraints. For example, upon applying a step change in temperature, an (unmodelled) lag phase is induced in the microbial population's response. To avoid this, additional constraints need to be formulated on the admissible gradients of the input profiles thus safeguarding model validity under dynamically changing environmental conditions. (3) Not only do different scalar measures of the FIM exist, but they may also be competing. For instance, the E-criterion tries to minimise the largest error, while the modified E-criterion aims at obtaining a similar accuracy for all parameters. Given this competing nature, a multi-objective optimisation approach is adopted for tackling these OED problems. The aim is to produce the set of optimal solutions, i.e., the so-called Pareto set, in order to illustrate the trade-offs to be made. In addition, combinations of parameter estimation quality and productivity related objectives are explored in order to allow an accurate estimation during production runs, and decrease down-time and losses due to modelling efforts. To this end, ACADO Multi-Objective has been employed, which is a flexible toolkit for solving dynamic optimisation or optimal control problems with multiple and conflicting objectives. The results obtained are illustrated with both simulation studies and experimental data collected in our lab.
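
For concreteness, the competing scalar FIM measures mentioned in point (3) can be written down explicitly; the sketch below evaluates the D-, E- and modified E-criteria on an arbitrary illustrative information matrix (our example, not taken from the talk).

    # Illustrative only: scalar design criteria on an arbitrary 2x2 Fisher information matrix
    import numpy as np

    fim = np.array([[4.0, 1.2],
                    [1.2, 0.8]])

    eig = np.linalg.eigvalsh(fim)
    d_crit = np.linalg.det(fim)            # D-criterion: maximise overall precision
    e_crit = eig.min()                     # E-criterion: maximise the smallest eigenvalue
    mod_e_crit = eig.max() / eig.min()     # modified E-criterion: minimise the condition number

    print(d_crit, e_crit, mod_e_crit)

Because the E- and modified E-criteria pull in different directions, a multi-objective treatment of the kind described in the abstract traces out the Pareto set rather than a single optimum.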
DAEW01 18th July 2011
15:30 to 16:30
Individuals are different: Implications on the design of experiments
If dynamics are measured repeatedly in biological entities like human beings or animals, the diversity of individuals may have a crucial impact on the outcomes of the measurements. An adequate approach for this situation is to assume random coefficients for each individual. This leads to non-linear mixed models, which have attracted increasing popularity in many fields of application in recent years due to advanced computer facilities. In such studies the main emphasis is placed on the estimation of population (location) parameters for the mean behaviour of the individuals, but interest may also lie in the prediction of further responses for the specific individuals under investigation. Here we will indicate the problems and implications of this approach for the design of experiments and illustrate various consequences by the simple example of an exponential decay. However, it remains unsolved what the "correct" measure of performance of a design is in this setting.
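
A worked statement of the exponential-decay example mentioned above, in our own notation (the speaker's parametrisation may differ), is:

    % Random-coefficient exponential decay: individual i observed at times t_{ij}
    y_{ij} = a_i \, e^{-b_i t_{ij}} + \varepsilon_{ij}, \qquad
    (a_i, b_i)^\top \sim \mathcal{N}\bigl((\alpha, \beta)^\top, D\bigr), \qquad
    \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).

Estimation of the population parameters targets (α, β, D, σ²), whereas prediction targets the individual coefficients (a_i, b_i); a design that is good for one aim need not be good for the other.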
DAEW01 18th July 2011
16:30 to 17:00
Optimal design for the estimation of population location parameters in nonlinear mixed effects models
Nonlinear mixed effects models are frequently used in the analysis of grouped data. Especially in pharmacological studies the observed individuals usually share a common response structure, such that information from individual responses might be merged to obtain efficient estimates. Mixed effects models can be used to model population studies by assuming the individual parameter vectors to be realizations of independently distributed random variables, which, for nonlinear response functions of the individual parameters, yields nontrivial models. Unfortunately, problems occur in nonlinear mixed effects models, as there exists no closed-form representation of the likelihood function of the observations and hence no closed form of the Fisher information. Optimal designs in nonlinear mixed effects models are usually based on approximations of the Fisher information, such that bad approximations might lead to bad experimental designs. In this talk we discuss different approaches for approximating the information matrix and the influence of the approximations on the implied designs in pharmacokinetic studies.
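
One commonly used approximation of the kind compared in such work is the first-order (FO) linearisation of the individual model around zero random effects, under which the marginal model becomes Gaussian and the Fisher information has a closed form (our notation; the talk also considers other approximations):

    % FO linearisation of individual i's model f(xi_i, beta, b_i) around b = 0
    y_i \approx f(\xi_i, \beta) + Z_i b_i + \varepsilon_i, \qquad
    Z_i = \left.\frac{\partial f(\xi_i, \beta, b)}{\partial b}\right|_{b=0}, \qquad
    b_i \sim \mathcal{N}(0, \Omega), \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I),
    % approximate marginal model
    y_i \mathrel{\dot\sim} \mathcal{N}\bigl(f(\xi_i, \beta), \, V_i\bigr), \qquad
    V_i = Z_i \Omega Z_i^\top + \sigma^2 I.

The standard Gaussian expression for the information matrix can then be evaluated; how faithfully this linearised model represents the true nonlinear mixed model is exactly what determines the quality of the implied designs.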
DAEW01 18th July 2011
17:00 to 17:30
Optimal experimental design for stochastic population models
Markov population processes are popular models for studying a wide range of phenomena including the spread of disease, the evolution of chemical reactions and the movements of organisms in population networks (metapopulations). Our ability to use these models effectively can be limited by our knowledge about parameters, such as disease transmission and recovery rates in an epidemic. Recently, there has been interest in devising optimal experimental designs for stochastic models, so that practitioners can collect data in a manner that maximises the precision of maximum likelihood estimates of the parameters for these models. I will discuss some recent work on optimal design for a variety of population models, beginning with some simple one-parameter models where the optimal design can be obtained analytically and moving on to more complicated multi-parameter models in epidemiology that involve latent states and non-exponentially distributed infectious periods. For these more complex models, the optimal design must be arrived at using computational methods and we rely on a Gaussian diffusion approximation to obtain analytical expressions for the Fisher information matrix, which is at the heart of most optimality criteria in experimental design. I will outline a simple cross-entropy algorithm that can be used for obtaining optimal designs for these models. We will also explore some recent work on optimal designs for population networks with the aim of estimating migration parameters, with application to avian metapopulations.
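
The cross-entropy method mentioned above can be sketched generically as follows (a toy stand-in utility replaces the FIM-based epidemic-model criterion; none of this is the speaker's implementation): sample candidate designs from a parametric distribution, keep an elite fraction ranked by the design criterion, and refit the sampling distribution to the elite until it concentrates.

    # Generic cross-entropy search over three observation times in [0, 10]
    import numpy as np

    rng = np.random.default_rng(0)

    def utility(design):
        # Placeholder criterion; in practice this would be a Fisher-information-based
        # score computed from the population model under study.
        return -np.var(design)

    mu, sigma = np.full(3, 5.0), np.full(3, 3.0)   # sampling distribution over the design
    for it in range(50):
        samples = rng.normal(mu, sigma, size=(200, 3)).clip(0.0, 10.0)
        scores = np.array([utility(s) for s in samples])
        elite = samples[np.argsort(scores)[-20:]]   # top 10% of candidate designs
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6

    print("Approximate optimal design:", np.sort(mu))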
DAEW01 19th July 2011
09:00 to 10:00
VB Melas On sufficient conditions for implementing the functional approach
Let us consider the general nonlinear regression model under standard assumptions on the experimental errors. Let also the following assumptions be fulfilled: (i) the regression function depends on a scalar variable belonging to the design interval, (ii) the derivatives of the function with respect to the parameters generate an extended Chebyshev system on the design interval, (iii) the matrix of second derivatives of the optimality criterion with respect to the different information matrix elements is positive definite. Then under non-restrictive assumptions it can be proved that the Jacobi matrix of the system of differential equations that defines implicitly support points and weight coefficients of the optimal design is invertible. This allows us to implement the Implicit Function Theorem for representing the points and the weights by a Taylor series. The corresponding theorems as well as particular examples of nonlinear models are elaborated. The results are generalisations of those given in the monograph published recently by the author.
DAEW01 19th July 2011
10:00 to 11:00
Enhanced model-based experiment design techniques for parameter identification in complex dynamic systems under uncertainty
A wide class of physical systems can be described by dynamic deterministic models expressed in the form of systems of differential and algebraic equations. Once a dynamic model structure is found adequate to represent a physical system, a set of identification experiments needs to be carried out to estimate the set of parameters of the model in the most precise and accurate way. Model-based design of experiments (MBDoE) techniques represent a valuable tool for the rapid assessment and development of dynamic deterministic models, allowing for the maximisation of the information content of the experiments in order to support and improve the parameter identification task. However, uncertainty in the model parameters, in the model structure itself or in the representation of the experimental facility may lead to design procedures that turn out to be scarcely informative. Additionally, constraints may turn out to be violated, thus making the experiment infeasible or even unsafe. Handling uncertainty is a complex and still open problem, although over the last years significant research effort has been devoted to tackling some issues in this area. Here, some approaches developed at CAPE-Lab at the University of Padova will be critically discussed. First, Online Model-Based Redesign of Experiment (OMBRE) strategies will be taken into account. In OMBRE the objective is to exploit the information as soon as it is generated by the running experiment. The manipulated input profiles of the running experiment are updated by performing one or more intermediate experiment designs (i.e., redesigns), and each redesign is performed adopting the current value of the parameter set. In addition, a model updating policy including disturbance estimation embedded within an OMBRE strategy (DE-OMBRE) can be considered. In the DE-OMBRE approach, an augmented model lumping the effect of systematic errors is considered to estimate both the states and the system outputs in a given time frame, updating the constraint conditions in a consistent way as soon as the effect of unknown disturbances propagates in the system. Backoff-based MBDoE, where uncertainty is explicitly accounted for so as to plan a test that is both optimally informative and safe by design, is finally discussed.
DAEW01 19th July 2011
11:30 to 12:30
Optimal experimental designs for stochastic processes whose covariance is a function of the mean
Recent literature on the analysis of compartmental models emphasizes the need for stochastic processes whose covariance structure depends on the mean. Covariance functions must be positive definite; ensuring this for a stochastic process whose covariance is a function of the mean is nontrivial and constitutes one of the challenges of the present work. We show that there exists a class of functions that, composed with the mean of the process, preserve positive definiteness and can be used for the purposes of the present talk. We offer some examples for an easy construction of such covariances and then study the problem of locally D-optimal design through both simulation studies and real data from a radiation retention model in the human body.
DAEW01 19th July 2011
14:00 to 15:00
New approach to designing experiments with correlated observations
I will review some results of an on-going joint research project with Holger Dette and Andrey Pepelyshev. In this project, we propose and develop a new approach to the problem of optimal design for regression experiments with correlated observations. This approach extends the well-known techniques of Bickel-Herzberg and covers the cases of long-range dependence in observations and different asymptotic relations between the number of observations and the size of the design space. In many interesting cases the correlation kernels become singular, which implies that traditional methods are no longer applicable. In these cases, potential theory can be used to derive optimality conditions and establish the existence and uniqueness of the optimal designs. In many instances the optimal designs can be computed explicitly.
DAEW01 19th July 2011
15:30 to 16:15
On exact optimal sampling designs for processes with a product covariance structure
Assume a random process with a parametrized mean value and a Wiener covariance structure. For this model, we will exhibit three classes of mean value functions for which it is possible to find an explicit form of the exact optimal sampling design. We will also show that the optimum design problems with a product covariance structure can be transformed one into another. This gives us insight into the relations between seemingly different optimal design problems.
DAEW01 19th July 2011
16:15 to 17:00
A Bardow Optimal experimental design for the well and the ill(-posed problems)
The talk discusses both recent applications and extensions of model-based optimal experimental design (OED) theory for challenging problems motivated from chemical engineering. Despite the progress of advanced modeling and simulation methods, experiments will continue to form the basis of all engineering and science. Since experiments usually require significant effort, best use of these resources should be made. Model-based optimal experimental design provides a rigorous framework to achieve this goal by determining the best settings for the experimental degrees of freedom for the question of interest. In this work, the benefits of applying optimal experimental design methods will be demonstrated for the determination of physical properties in chemical engineering applications. In particular, the application to diffusion measurements is considered. Since diffusion is slow, current experiments tend to be very time-consuming. Recently, lab-on-a-chip technology brought the promise of speeding up the measurements due to a drastic decrease in characteristic distances and thus diffusion time. Here, a rigorous optimization of microfluidic experiments for the determination of diffusion coefficients is performed. The OED results are quantitatively validated in experiments showing that the accuracy of diffusion measurements can be increased by orders of magnitude while reducing measurement times to minutes. After discussing applications, extensions of classical OED methods are presented. In particular, the experimental design of ill-posed problems is considered. Here, classical design approaches lead to even qualitatively wrong designs, whereas the recently introduced METER criterion allows for a sound solution. The METER criterion aims at the minimization of the expected total error and thereby captures the bias-variance trade-off in ill-posed problems. For the development of predictive models for physical properties, model discrimination and validation are critical steps. For this task, a rational framework is proposed to identify the components and mixtures that allow for optimal model discrimination. The proposed framework combines model-based methods for optimal experimental design with approaches from computer-aided molecular design (CAMD). By selecting the right mixtures to test, a targeted and more efficient approach towards predictive models for physical properties becomes viable.
DAEW01 19th July 2011
17:00 to 17:30
Bayesian optimization: A framework for optimal computational effort for experimental design
DOE on models involving time or space dynamics is often very computationally demanding. Predicting a single experimental outcome may require significant computation, let alone evaluating a design criterion and optimizing it with respect to design parameters. To find the exact optimum of the design criterion would typically take infinite computation, and any finite computation will yield a result possessing some uncertainty (due to approximation of the design criterion as well as stopping the optimization procedure). Ideally, one would like to optimize not only the design criterion, but also the way it is approximated and optimized in order to get the largest likely improvement in the design criterion relative to the computational effort spent. Using a Bayesian method for the optimization of the design criterion (not only for calculating the design criterion) can accomplish such an optimal trade-off between (computational) resources spent planning the experiment and expected gain from carrying it out. This talk will lay out the concepts and theory necessary to perform a fully Bayesian optimization that maximizes the expected improvement of the design criterion in relation to the computational effort spent.
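
To make the idea concrete, a minimal expected-improvement step with a Gaussian-process surrogate for an expensive design criterion might look as follows (our own generic sketch assuming scikit-learn and SciPy are available; it is not the speaker's framework, which additionally trades off the computational cost of each evaluation):

    # One expected-improvement step over a 1-D design variable with a GP surrogate
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def criterion(x):                     # stand-in for an expensive design criterion
        return -(x - 0.3) ** 2

    X = np.array([[0.0], [0.5], [1.0]])   # design settings evaluated so far
    y = criterion(X).ravel()

    gp = GaussianProcessRegressor().fit(X, y)
    grid = np.linspace(0, 1, 200).reshape(-1, 1)
    mean, std = gp.predict(grid, return_std=True)

    best = y.max()
    z = (mean - best) / np.maximum(std, 1e-12)
    ei = (mean - best) * norm.cdf(z) + std * norm.pdf(z)   # expected improvement
    print("Next design setting to evaluate:", grid[np.argmax(ei)])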
DAEW01 20th July 2011
09:00 to 10:00
L Pronzato Adaptive design and control
There exist strong relations between experimental design and control, for instance in situations where optimal inputs are constructed in order to obtain precise parameter estimation in dynamical systems or when suitably designed perturbations are introduced in adaptive control to force enough excitation into the system. The presentation will focus on adaptive design when the construction of an optimal experiment requires the knowledge of the model parameters and current estimated values are substituted for unknown true values. This adaptation to estimated values creates dependency among observations and makes the investigation of the asymptotic behaviors of the design and estimator a much more complicated issue than when the design is specified independently of the observations. Also, even if the system considered is static, this adaptation introduces some feedback and the adaptive-design mechanism can be considered as a particular adaptive-control scheme. The role of experimental design in the asymptotic properties of estimators will be emphasized. The assumption that the set of experimental variables (design points) is finite facilitates the study of the asymptotic properties of estimators (strong consistency and asymptotic normality) in stochastic regression models. Two situations will be considered: adaptive D-optimal design and adaptive design with a cost constraint where the design should make a compromise between maximizing an information criterion (D-optimality) and minimizing a cost (function optimization). The case when the weight given to cost minimization asymptotically dominates will be considered in detail in connection with self-tuning regulation and self-tuning optimization problems.
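
The adaptive-design mechanism described above can be summarised in a generic sketch (a simplified one-parameter static model of our own choosing, not the speaker's setting, for which the D-criterion reduces to the scalar information itself): at each step the next design point maximises the information evaluated at the current estimate, the response is observed, and the estimate is updated, so the observations become dependent through the design.

    # Generic adaptive D-optimal design loop for y = exp(-theta*x) + noise on a finite design set
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(1)
    true_theta = 0.7
    design_set = np.linspace(0.1, 5.0, 50)    # finite set of candidate design points
    xs, ys, theta_hat = [], [], 1.0           # data so far and initial parameter guess

    def total_info(theta, points):
        # Fisher information for the scalar parameter theta
        s = -np.array(points) * np.exp(-theta * np.array(points))
        return np.sum(s ** 2)

    for step in range(20):
        # information-maximising choice at the current estimate
        gains = [total_info(theta_hat, xs + [x]) for x in design_set]
        x_next = design_set[int(np.argmax(gains))]
        xs.append(x_next)
        ys.append(np.exp(-true_theta * x_next) + 0.05 * rng.standard_normal())
        # update the estimate by least squares on all data collected so far
        sse = lambda th: np.sum((np.array(ys) - np.exp(-th * np.array(xs))) ** 2)
        theta_hat = minimize_scalar(sse, bounds=(0.01, 5.0), method="bounded").x

    print("Final estimate after adaptive design:", theta_hat)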
DAEW01 20th July 2011
10:00 to 10:30
Methodology and application of optimal input design for parameter estimation
An optimal input design technique for parameter estimation is presented in this talk. The original idea is the combination of a dynamic programming method with a gradient algorithm for an optimal input synthesis. This approach allows us to include realistic practical constraints on the input and output variables. A description of this approach is presented, followed by an example concerning an aircraft longitudinal flight.
DAEW01 20th July 2011
10:30 to 11:00
Neural networks for nonlinear modeling of dynamic systems: Design problems
We start from a brief review of artificial neural networks with external dynamics as models for nonlinear dynamic systems (NARX, NFIR). We discuss problems arising in the design of such networks. In particular, we put emphasis on active learning, i.e., on iterative improvements of the Fisher information matrix. Furthermore, we propose random projections (applied to input and/or output signals) for increasing the robustness of the model selection process.
DAEW01 20th July 2011
11:30 to 12:30
H Hjalmarsson Applications-oriented experiment design for dynamical systems
In this talk we present a framework for applications-oriented experiment design for dynamic systems. The idea is to generate a design such that certain performance criteria of the application are satisfied with high probability. We discuss how to approximate this problem by a convex optimization problem and how to address the Achilles' heel of optimal experiment design, namely that the optimal design depends on the true system. We also elaborate on how the cost of an identification experiment is related to the performance requirements of the application and the importance of experiment design in reduced order modeling. We illustrate the methods on some problems from control and systems theory.
DAEW01 20th July 2011
14:00 to 15:00
Optimal input signals for parameter estimation in distributed-parameter systems
In the first part of the lecture we recall classical results on selecting optimal input signals for parameter estimation in systems with temporal (or spatial) dynamics only and their generalizations to unbounded signals. As a motivation for studying input signals which can influence our system both in space and in time, we provide several examples of new techniques that have emerged in high-energy lasers and in micro- and nano-technologies. We also mention the increasing role of cameras as sensors. Then, we discuss extensions of optimality conditions for input signals, trying to reveal an interplay between their spatial and temporal behavior. We concentrate on open-loop input signals for linear systems, described by partial differential equations (PDE) or their Green's functions. Finally, we sketch the following open problems: (i) simultaneous optimization of sensor positions and input signals, (ii) experiment design for estimating spatially varying coefficients of PDEs.
DAEW01 20th July 2011
15:30 to 16:15
Numerical methods and application strategies for optimum experimental design for nonlinear differential equation models
We consider dynamic processes which are modeled by systems of nonlinear differential equations. Usually the models contain parameters whose values are unknown. To calibrate the models, the parameters have to be estimated from experimental data. Due to the uncertainty of the data, the resulting parameter estimate is random. Its uncertainty can be described by confidence regions and the relevant variance-covariance matrix. The statistical significance of the parameter estimation can be maximized by minimizing design criteria defined on the variance-covariance matrix with respect to controls describing the layout and processing of experiments, subject to constraints on experimental costs and operability. The resulting optimum experimental design problems are constrained non-standard optimal control problems whose objective depends implicitly on the derivatives of the model states with respect to the parameters. For a numerical solution we have developed methods based on the direct approach of optimal control, on quasi-Newton methods for nonlinear optimization, and on the efficient integration and differentiation of differential equations. To use experimental design for practical problems, we have developed strategies including robustification, multiple experiment formulations, a sequential strategy and an on-line approach. Application examples show that optimally designed experiments yield information about processes that is much more reliable, obtained much faster and at a significantly lower cost than with trial-and-error or black-box approaches. We have implemented our methods in the software package VPLAN, which is applied to practical problems from several partners in different fields such as chemistry, chemical engineering, systems biology, epidemiology and robotics. In this talk we formulate experimental design problems, present numerical methods for their solution, discuss application strategies and give application examples from practice.
DAEW01 20th July 2011
16:15 to 17:00
SGM Biedermann Optimal design for inverse problems
In many real life applications, it is impossible to observe the feature of interest directly. For example, non-invasive medical imaging techniques rely on indirect observations to reconstruct an image of the patient's internal organs. We investigate optimal designs for such inverse problems. We use the optimal designs as benchmarks to investigate the efficiency of designs commonly used in applications. Several examples are discussed for illustration. Our designs provide guidelines to scientists regarding the experimental conditions at which the indirect observations should be taken in order to obtain an accurate estimate for the object of interest.
DAEW01 20th July 2011
17:00 to 17:30
Bayesian experimental design for percolation and other random graph models
The problem of optimal arrangement of nodes of a random graph will be discussed in this workshop. The nodes of the graphs under study are fixed, but their edges are random and established according to the so-called edge-probability function. This function may depend on the weights attributed to the pairs of graph nodes (or distances between them) and a statistical parameter. It is the purpose of experimentation to make inference on the statistical parameter and, thus, to learn as much as possible about it. We also distinguish between two different experimentation scenarios: progressive and instructive designs. We adopt a utility-based Bayesian framework to tackle this problem. We prove that the infinitely growing or diminishing node configurations asymptotically represent the worst node arrangements. We also obtain the exact solution to the optimal design problem for proximity (geometric) graphs and a numerical solution for graphs with threshold edge-probability functions. We use simulation-based optimisation methods, mainly Monte Carlo and Markov chain Monte Carlo, in order to obtain solutions in the general case. We study the optimal design problem for inference based on partial observations of random graphs by employing a data augmentation technique. In particular, we consider inference and optimal design problems for finite open clusters from bond percolation on the integer lattices and derive a range of both numerical and analytical results for these graphs. (Our motivation here is that open clusters in bond percolation may be seen as final outbreaks of an SIR epidemic with constant infectious times.) We introduce inner-outer design plots by considering a bounded region of the lattice and deleting some of the lattice nodes within this region, and show that the 'mostly populated' designs are not necessarily optimal in the case of incomplete observations under both progressive and instructive design scenarios. Some of the obtained results may generalise to other lattices.
DAEW01 21st July 2011
09:00 to 09:45
D Ucinski Sensor network scheduling for identification of spatially distributed processes
Since for distributed parameter systems it is impossible to observe their states over the entire spatial domain, the question arises of where to locate discrete sensors so as to estimate the unknown system parameters as accurately as possible. Neither researchers nor practitioners doubt that making use of sensors placed in an `intelligent' manner may lead to dramatic gains in the achievable accuracy of the parameter estimates, so efficient sensor location strategies are highly desirable. At the same time, the complexity of the sensor location problem implies that there are very few sensor placement methods which are readily applicable to practical situations. What is more, they are not well known among researchers. The aim of the talk is to give an account of both classical and recent original work on optimal sensor placement strategies for parameter identification in dynamic distributed systems modelled by partial differential equations. The reported work constitutes an attempt to meet the needs created by practical applications, especially regarding environmental processes, through the development of new techniques and algorithms or by adopting methods which have been successful in the akin fields of optimal control and optimum experimental design. In the planning phase, real-valued functions of the Fisher information matrix of the parameters are primarily employed as the performance indices to be minimized with respect to the positions of pointwise sensors. Extensive numerical results are included to show the efficiency of the proposed algorithms. A couple of case studies regarding the design of air quality monitoring networks and network design for groundwater pollution problems are adopted as an illustration aimed at showing the strength of the proposed approach in studying practical problems.
DAEW01 21st July 2011
09:45 to 10:30
Resource-limited mobile sensor routing for parameter estimation of distributed systems
The problem of determining optimal observation strategies for identification of unknown parameters in distributed-parameter systems is discussed. In particular, a setting is considered in which the measurement process is performed by collecting spatial data from mobile nodes with sensing capacity forming an organized network. The framework is based on the use of a criterion defined on the Fisher information matrix associated with the estimated parameters as a measure of the information content in the measurements. Motivations stem from engineering practice, where the clustering of measurements at some spatial positions and at a given time moment often leads to a decrease in the robustness of the observational system to model misspecification. Furthermore, there are some technical limitations imposed on the sensor paths in order to avoid collisions, satisfy energy constraints and/or provide a proper deployment of mobile sensor nodes. The approach is to convert the problem to a canonical optimal control one in which the control forces of the sensors may be optimized. Then, through an adaptation of some pairwise communication algorithms, a numerical scheme is developed which decomposes the resulting problem and distributes the computational burden between network nodes. Numerical solutions are then obtained using widespread powerful numerical packages which handle the various constraints imposed on the node motions. As a result, an adaptive scheme is outlined to determine guidance policies for network nodes in a decentralized fashion.
DAEW01 21st July 2011
10:30 to 11:00
From parametric optimization to optimal experimental design: A new perspective in the context of partial differential equations
We propose a new perspective on the optimal experimental design (OED) problem, several theoretical and computational aspects of which have been previously studied. The formal setting of parametric optimization leads to the definition of a generalized framework from which the OED problem can be derived. Although this approach does not have a direct impact on the computational aspects, it links the OED problem to a wider field of theoretical results, ranging from optimal control problems to the stability of optimization problems. Following this approach, we derive the OED problem in the context of partial differential equations (PDE) and present a primal-dual active set strategy to solve the constrained OED problem. Numerical examples are presented.
DAEW01 21st July 2011
11:30 to 12:30
Bayesian experimental design for stochastic dynamical models
Advances in Bayesian computational methods have meant that it is now possible to fit a broad range of stochastic, non-linear dynamical models (including spatio-temporal formulations) within a rigorous statistical framework. In epidemiology these methods have proved particularly valuable for producing insights into the transmission dynamics of historical epidemics and for assessing potential control strategies. On the other hand, there has been less attention paid to the question of how future data should be collected most efficiently for the purpose of analysis with these models. This talk will describe how the Bayesian approach to experimental design can be applied with standard epidemic models in order to identify the most efficient manner for collecting data to provide information on key rate parameters. Central to the approach is the representation of the design as a 'parameter' in an extended parameter space, with the optimal design appearing as the marginal mode for an appropriately specified joint distribution. We will also describe how approximations, derived using moment-closure techniques, can be applied in order to make tractable the computation of likelihood functions which, given the partial nature of the data, would be prohibitively complex using methods such as data augmentation. The talk will illustrate the ideas in the context of designing microcosm experiments to study the spread of fungal pathogens in agricultural crops, where the design problem relates to the particular choice of sampling times used. We will examine the use of utility functions based entirely on information measures that quantify the difference between prior and posterior parameter distributions, and also discuss how economic factors can be incorporated in the construction of utilities for this class of problems. The talk will demonstrate how, if sampling times are appropriately selected, it may be possible to reduce drastically the amount of sampling required in comparison to designs currently used, without compromising the information gained on key parameters. Some challenges and opportunities for future research on design with stochastic epidemic models will also be discussed.
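
For reference, the information-measure utilities mentioned above are typically expected Kullback-Leibler divergences between posterior and prior; a standard nested Monte Carlo estimator, written here in our own notation and independent of the specific epidemic model, is

    % expected information gain of a design d, with theta_n, theta_m ~ p(theta)
    % and y_n simulated from p(y | theta_n, d)
    U(d) = \mathbb{E}_{y \mid d}\Bigl[\mathrm{KL}\bigl(p(\theta \mid y, d)\,\|\,p(\theta)\bigr)\Bigr]
    \approx \frac{1}{N}\sum_{n=1}^{N}\Bigl\{\log p(y_n \mid \theta_n, d)
    - \log \tfrac{1}{M}\sum_{m=1}^{M} p(y_n \mid \theta_m, d)\Bigr\}.

The moment-closure approximations described in the talk are what make the likelihood terms p(y | θ, d) cheap enough to evaluate inside such a scheme for partially observed epidemics.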
DAEW01 21st July 2011
14:00 to 15:00
Spatial design criteria and space-filling properties
Several papers have recently strengthened the bridge connecting geostatistics and spatial econometrics. For these two fields various criteria have been developed for constructing optimal spatial sampling designs. We will explore relationships between these types of criteria as well as allude to their space-filling, or non-space-filling, properties.
DAEW01 21st July 2011
15:30 to 16:15
M Stehlík Optimal design and properties of correlated processes with semicontinuous covariance
Semicontinuous covariance functions have been used in regression and kriging by many authors. In a recent work we introduced purely topologically defined regularity conditions on covariance kernels which are still applicable for increasing and infill domain asymptotics for regression problems and kriging. These conditions are related to the semicontinuous maps of Ornstein-Uhlenbeck processes. Thus these conditions can be of benefit for stochastic processes on more general spaces than metric ones. Besides, the new regularity conditions relax the continuity of the covariance function by consideration of a semicontinuous covariance. We discuss the applicability of the introduced topological regularity conditions for optimal design of random fields. A stochastic process with parametrized mean and covariance is observed over a compact set. The information obtained from observations is measured through an information functional (defined on the Fisher information matrix). We start with a discussion of the role of equidistant designs for the correlated process. Various aspects of their prospective optimality will be reviewed and some issues on designing for spatial processes will also be discussed. Finally we will concentrate on relaxing the continuity of the covariance. We will introduce regularity conditions for isotropic processes with semicontinuous covariance such that increasing domain asymptotics is still feasible, although more flexible behavior may occur here. In particular, the role of the nugget effect will be illustrated and a practical application of stochastic processes with semicontinuous covariance will be given.
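
As a concrete point of reference (our notation, not necessarily the parametrisation used in the talk), the Ornstein-Uhlenbeck covariance and its nugget-augmented version illustrate the relaxation from continuous to merely semicontinuous kernels:

    % OU covariance, and the same kernel with a nugget term tau^2
    C(s,t) = \sigma^2 e^{-\lambda |s-t|}, \qquad
    C_\tau(s,t) = \sigma^2 e^{-\lambda |s-t|} + \tau^2\,\mathbf{1}\{s = t\}.

The nugget term makes C_τ discontinuous (only semicontinuous) on the diagonal, since the off-diagonal limit σ² is strictly smaller than the diagonal value σ² + τ².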
DAEW01 22nd July 2011
09:00 to 10:00
Advances in nonlinear geoscientific experimental and survey design
Geoscience is replete with inverse problems that must be solved routinely. Many such problems, such as using satellite remote-sensing data to estimate properties of the Earth's surface, or solving geophysical imaging and monitoring problems for potentially dynamic properties of the Earth's subsurface, involve large datasets that cost millions of dollars to collect. Optimising the information content of such data is therefore crucial. While linearised experimental design methods have been deployed within the geosciences, most geophysical problems are significantly nonlinear. This renders linearised design criteria invalid, as they can significantly over- or under-estimate the information content of any dataset. Over the past few years we have therefore focussed on developing new nonlinear design methods that can be applied to practical data types and geometries for surveys of increasing size. We will summarise three advances in practical nonlinear design: one using a new design criterion applied in the data space, one using a new 'bi-focal' model space criterion, and one using a fast Monte Carlo refinement procedure that significantly speeds up nonlinear design calculations. The first two techniques are applied to the design of subsurface (micro-)seismic energy-source location problems; the third is applied to the design of so-called industrial seismic amplitude-versus-offset data sets used to derive (an)elastic properties of subsurface geological strata. Using the first of these we managed to produce an industrially practical geophysical survey design using fully non-linearised methods.
DAEW01 22nd July 2011
10:00 to 11:00
PB Wilkinson SMART: Progress towards an optimising time-lapse geoelectrical imaging system
Electrical resistivity tomography (ERT) is a widely-used geophysical technique for shallow subsurface investigations and monitoring. A range of automatic multi-electrode ERT systems, both commercial and academic, are routinely used to collect resistivity data sets that cover large survey areas at high spatial and temporal density. But despite the flexibility of these systems, the data still tend to be measured using traditional arrangements of electrodes. Recent research by several international groups has highlighted the possibility of using automatically generated survey designs which are optimised to produce the best possible tomographic image resolution given the limitations of time and practicality required to collect and process the data. Here we examine the challenges of applying automated ERT survey design to real experiments where resistivity imaging is being used to monitor subsurface processes. Using synthetic and real examples we address the problems of avoiding electrode polarisation effects, making efficient use of multiple simultaneous measurement channels, and making optimal measurements in noisy environments. These are essential steps towards implementing SMART (Sensitivity-Modulated Adaptive Resistivity Tomography), a robust self-optimising ERT monitoring system. We illustrate the planned design and operation of the SMART system using a simulated time-lapse experiment to monitor a saline tracer. The results demonstrate the improvements in image resolution that can be expected over traditional ERT monitoring.
DAEW01 22nd July 2011
11:30 to 12:30
Information-based methods in dynamic learning
The history of information/entropy in learning due to Blackwell, Renyi, Lindley and others is sketched. Using results of de Groot, with new proofs, we arrive at a general class of information functions which gives "expected" learning in the Bayes sense. It is shown how this is intimately connected with the theory of majorization: learning means a more peaked distribution in a majorization sense. Counter-examples show that in some real situations it is possible to un-learn in the sense of having a less peaked posterior than prior. This does not happen in the standard Gaussian case, but does in cases such as the Beta-mixed binomial. Applications are made to experimental design. For designs for non-linear and dynamic systems an idea of "local learning" is defined, in which the above theory is applied locally. Some connection with ideas of "active learning" in the machine learning area is attempted.
DAE 28th July 2011
14:00 to 16:00
Optimal resistor networks and their relevance to incomplete block designs
DAE 4th August 2011
10:00 to 12:00
A Cakiroglu Optimal designs and geometries
DAEW02 9th August 2011
09:30 to 10:30
D Bates Lecture on modelling: Mixed effects non-linear and generalized linear models
Mixed-effects models are defined by the distributions of two vector-valued random variables, an n-dimensional response vector, Y, and an unobserved q-dimensional random-effects vector, B. The mean of the conditional distribution, Y|B=b, depends on a linear predictor expression of the form Xβ + Zb, where β is a p-dimensional fixed-effects parameter vector and the fixed and known model matrices, X and Z, are of the appropriate dimensions. For linear mixed-effects models the conditional mean is the linear predictor; for generalized linear mixed-effects models the conditional mean is the value of an inverse link function applied to the linear predictor; and for a nonlinear mixed-effects model the conditional mean is the result of applying a nonlinear model function for which the parameter vector is derived from the linear predictor. We describe the formulation of these mixed-effects models and provide computationally effective expressions for the profiled deviance function through which the maximum likelihood parameter estimates can be determined. In the case of the linear mixed-effects model the profiled deviance expression is exact. For generalized linear or nonlinear mixed-effects models the profiled deviance is approximated, either through a Laplace approximation or, at the expense of somewhat greater computational effort, through adaptive Gauss-Hermite quadrature.
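
In compact form (using the notation of the abstract and assuming, as is standard in this formulation, Gaussian random effects; g denotes a link function and f a nonlinear model function), the three model classes are:

    % linear predictor and random effects
    B \sim \mathcal{N}(0, \Sigma_\theta), \qquad \eta = X\beta + Zb,
    % conditional means
    \text{LMM:}\quad \mathbb{E}(Y \mid B=b) = \eta, \qquad
    \text{GLMM:}\quad \mathbb{E}(Y \mid B=b) = g^{-1}(\eta), \qquad
    \text{NLMM:}\quad \mathbb{E}(Y \mid B=b) = f(\phi), \ \phi \text{ derived from } \eta,
    % marginal likelihood to be maximised
    L(\beta, \theta, \sigma) = \int p(y \mid b)\, p(b)\, db.

The integral is available in closed form for the linear mixed model and is otherwise approximated by the Laplace or adaptive Gauss-Hermite schemes behind the profiled deviance described above.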
DAEW02 9th August 2011
11:00 to 11:45
To estimate or to predict - implications on the design for linear mixed models
In recent years mixed models have attracted increasing popularity in many fields of application due to advanced computer facilities. Although the main theme of the present workshop is devoted to optimal design of experiments for non-linear mixed models, it may be illustrative to elaborate the specific features of mixed models already in the linear case: besides the estimation of population (location) parameters for the mean behaviour of the individuals, a prediction of the response for the specific individuals under investigation may be of particular interest, for example in oncology studies to determine the further treatment of the patients investigated. While there have been some recent developments in optimal design for estimating the population parameters, the problem of optimal design for prediction has been considered completely solved since the seminal paper by Gladitz and Pilz (1982). However, the optimal designs obtained there require the population parameters to be known, or may be considered as an approximation if the number of individuals is large. The latter may be inadequate when the resulting "optimal design" fails to allow for estimation of the population parameters. Therefore we will develop the theory and solutions for finite numbers of individuals. Finally we will illustrate the trade-off in optimal designs caused by the two competing aims of estimation and prediction by a simple example. Gladitz, J. and J. Pilz (1982): Construction of optimal designs in random coefficient regression models. Statistics 13, 371-385.
DAEW02 9th August 2011
11:45 to 12:30
Experimental designs for estimating variance components
Many experiments are designed to estimate the fixed parameters in the required models as precisely as possible. For example, D-optimum designs ensure that the volume of the confidence ellipsoid for these parameters is minimized. In some cases, only some of the fixed parameters are of interest; DS-optimality is then required. However, little attention has been given to the accuracy of the estimation of the variance components in the models, even though they are very important for the interpretation of the results and in some cases it is their estimation that is the reason for the studies to be carried out. We give examples of such studies and focus on the design of experiments where only the variance components are important. The resulting DV-optimum designs are useful in crossed or split-plot validation experiments where fixed effects can be regarded as nuisance parameters. We conclude with some considerations about the implications of our results on the design of experiments where both the fixed parameters and the variance components are important.
DAEW02 9th August 2011
14:00 to 14:45
S Gilmour GLMs and GLMMs in the analysis of randomized experiments
The Normal linear model analysis is usually used as an approximation to the exact randomization analysis and extended to structures, such as nonorthogonal split-plot designs, as a natural approximation. If the responses are counts or proportions a generalized linear model (GLM) is often used instead. It will be shown that GLMs do not in general arise as natural approximations to the randomization model. Instead the natural approximation is a generalized linear mixed model (GLMM).
DAEW02 9th August 2011
14:45 to 15:30
Block designs for non-normal data via conditional and marginal models
Many experiments in all areas of science, technology and industry measure a response that cannot be adequately described by a linear model with normally distributed errors. In addition, the further complication often arises of needing to arrange the experiment into blocks of homogeneous units. Examples include industrial manufacturing experiments with binary responses, clinical trials where subjects receive multiple treatments and crystallography experiments in early-stage drug discovery. This talk will present some new approaches to the design of such experiments, assuming both conditional (subject-specific) and marginal (population-averaged) models. The different methods will be compared, and some advantages and disadvantages highlighted. Common issues, including the impact of correlations and the dependence of the design on the values of model parameters, will also be discussed.
DAEW02 9th August 2011
16:00 to 16:45
T Waite Designs for mixed models with binary response
For an experiment measuring a binary response, a generalized linear model such as the logistic or probit is typically used to model the data. However these models assume that the responses are independent. In blocked experiments, where responses in the same block are potentially correlated, it may be appropriate to include random effects in the predictor, thus producing a generalized linear mixed model (GLMM). Obtaining designs for such models is complicated by the fact that the information matrix, on which most optimality criteria are based, is computationally expensive to evaluate (indeed if one computes naively, the search for an optimal design is likely to take several months). When analyzing GLMMs, it is common to use analytical approximations such as marginal quasi-likelihood (MQL) and penalized quasi-likelihood (PQL) in place of full maximum likelihood. In this talk, we consider the use of such computationally cheap approximations as surrogates for the true information matrix when producing designs. This reduces the computational burden substantially, and enables us to obtain designs within a much shorter time frame. However, other issues also need to be considered such as the accuracy of the approximations and the dependence of the optimal design on the unknown values of the parameters. In particular, we evaluate the effectiveness of designs found using these analytic approximations through comparison to designs that are found using a more computationally expensive numerical approximation to the likelihood.
DAEW02 10th August 2011
09:30 to 10:30
E Demidenko Lecture on estimation and inference: Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
Traditionally, linear and nonlinear mixed effects models are estimated by maximum likelihood assuming a normal distribution. The goal of this lecture is to discuss non-iterative methods for estimation of linear mixed models and simplified methods for estimation of generalized linear and nonlinear mixed models. In particular, we will talk about testing for the presence of random effects, an often overlooked fundamental test in the framework of mixed effects models. Simplified methods for generalized linear mixed models (GLMM), such as conditional logistic regression models with random intercepts and the Poisson model for count data, will be discussed. Limitations of the popular generalized estimating equation (GEE) approach are uncovered. On the other hand, it is shown that this approach is valid for the Poisson mixed model. A fixed sample maximum likelihood approach is introduced and its statistical properties are investigated via statistical simulations. Open problems and future work on statistical inference for mixed models are outlined.
DAEW02 10th August 2011
11:00 to 11:45
S Ueckert Optimal design of clinical trials in Alzheimer's disease
Clinical trials in Alzheimer's disease are long, costly and have a low success rate. In this setting, optimal design theory constitutes a valuable tool to help plan clinical studies efficiently. Our work presents how the theory of optimal design can be applied to a complex situation like an Alzheimer's trial. We illustrate how to handle challenges like dropout, covariate relationships and clinically relevant constraints. Furthermore, we illustrate how clinical trials can be directly optimized for drug effect detection power.
DAEW02 10th August 2011
11:45 to 12:30
Optimal designs for pharmacokinetic and viral dynamic nonlinear mixed effect models in HIV treatment
Background: Nonlinear mixed effects models (NLMEM) are increasingly used for analysis of dose-exposure-response models. Methods for “population” design evaluation/optimisation are needed for complex models to limit the number of samples in each patient. Approaches for population design optimisation based on the Fisher information matrix for NLMEM have been developed, mostly using a first-order approximation of the model. Antiretroviral treatment in patients with HIV infection is complex and shows large inter-individual variability. Pharmacokinetic and viral dynamic models are available to describe the evolution of concentrations, viral loads and CD4 counts. Parameters of these models are estimated through NLMEM.

Objectives: 1) to evaluate and optimise designs in patients for the pharmacokinetic study of an antiretroviral drug (zidovudine) and its active metabolite using cost functions, 2) to evaluate and optimise designs for viral dynamic response and study the power to compare treatment efficacy.

Methods: We used the models and parameters estimated from data of patients in the COPHAR 2 - ANRS 111 trial. Measuring the active metabolite concentrations of zidovudine is costly, as they are intracellular, and we explored D-optimal population designs using various cost functions. The viral dynamic model is a complex model written in ordinary differential equations. We proposed sparse designs with a limited number of visits per patient during the one-year follow-up. We studied the predicted power to compare two treatments. These analyses were performed using PFIM3.2, an R function that we developed for population designs.

Results: We found a design with only three samples for zidovudine and two samples for its active metabolite and showed that the optimal designs varied with the cost functions. For the viral dynamic model, we showed that a design with 6 visits, if optimally located, can provide good information on the response. We evaluated the power to compare two treatments and computed the number of subjects needed to get adequate power.

Conclusion: We showed that population design optimisation provides efficient designs respecting clinical constraints in multi-response nonlinear mixed effects models.
DAEW02 10th August 2011
14:00 to 14:45
Bayesian enrichment strategies for randomized discontinuation trials
We propose optimal choice of the design parameters for random discontinuation designs (RDD) using a Bayesian decision-theoretic approach. We consider applications of RDDs to oncology phase II studies evaluating activity of cytostatic agents. The design consists of two stages. The preliminary open-label stage treats all patients with the new agent and identifies a possibly sensitive subpopulation. The subsequent second stage randomizes, treats, follows, and compares outcomes among patients in the identified subgroup, with randomization to either the new or a control treatment. Several tuning parameters characterize the design: the number of patients in the trial, the duration of the preliminary stage, and the duration of follow-up after randomization. We define a probability model for tumor growth, specify a suitable utility function, and develop a computational procedure for selecting the optimal tuning parameters.
DAEW02 10th August 2011
14:45 to 15:30
Sequential stopping for high-throughput experiments
In high-throughput experiments sample size is typically chosen informally. Although formal sample size calculations have been proposed, they depend critically on prior knowledge. We propose a sequential strategy which, by updating knowledge when new data is available, depends less critically on prior assumptions. Compared to fixed sample size approaches, our sequential strategy stops experiments early when enough evidence has been accumulated, and recommends continuation when additional data is likely to provide valuable information. The approach is based on a decision-theoretic framework, guaranteeing that the chosen design proceeds in a coherent fashion. We propose a utility function based on the number of true positives which is straightforward to specify and intuitively appealing. As for most sequential design problems, an exact solution is computationally prohibitive. To address the computational challenge and also to limit the dependence on an arbitrarily chosen utility function we propose instead a simulation-based approximation with decision boundaries. The approach allows us to determine good designs within reasonable computing time and is characterized by intuitively appealing decision boundaries. We apply the method to next-generation sequencing, microarray and reverse phase protein array studies. We show that it can lead to substantial increases in posterior expected utility. An implementation of the proposed approach is available in the Bioconductor package gaga.
DAEW02 10th August 2011
16:00 to 16:30
Profiling the deviance to assess variability of parameter estimates in mixed models
DAEW02 11th August 2011
09:30 to 10:30
Mixed models: design of experiments
After a short discussion of commonalities between mixed effects models and the Bayesian setting, I define two design problems. The first one is related to the estimation of the population parameters and is often used in the comparison of different treatments or in dose response studies. The necessity to estimate individual parameters (for a specific experimental unit like a clinical center or even a patient) leads to another optimization problem. I compare various criteria of optimality for both settings and derive elemental information matrices for various special cases. The latter allows the standard machinery of optimal design of experiments to be applied.
DAEW02 11th August 2011
11:00 to 11:45
M Patan Group sensor scheduling for parameter estimation of random-effects distributed systems
The problem of sensor location for a monitoring network with stationary nodes used to estimate unknown parameters of a distributed-parameter system is addressed. In particular, we consider the situation in which the system parameters may change randomly between experimental runs, owing to slight fluctuations in experimental conditions or to differences in the individual properties of the observed distributed systems. A proper theoretical formulation of the sensor scheduling problem is provided, together with a characterization of the optimal solutions. The theory is applicable to those practical situations in which a distributed system is sensitive to sampling or gives a different response at each run of the experiment. In the presented approach, some results from experimental design theory for dynamic systems are extended for the purpose of configuring a sensor grid, in order to obtain a practical and numerically tractable representation of optimum designs for estimation of the mean values of the parameters. A suitable computational scheme is illustrated by a numerical example of a sensor scheduling problem for a two-dimensional dynamical distributed process representing the performance of a magnetic brake.
DAEW02 11th August 2011
11:45 to 12:30
Optimum designs for transform-both-sides nonlinear mixed effects model in the presence of covariates
In the early stages of the drug development process, pharmaceutical companies are interested in whether candidate compounds show interactions with other drugs. Since most drugs are metabolized in the human liver, these early-stage pharmacokinetic experiments are conducted at different concentrations of the compound under study with randomly selected liver tissues. As enzymes play a vital role in metabolizing drugs, examination of the association between the compound and different enzymes can be useful in understanding the compound's potential for adverse drug reactions. The Michaelis-Menten model is often used to examine the association between enzymes and compound. In many cases a transform-both-sides Michaelis-Menten model fits pharmacokinetic data better than the regular Michaelis-Menten model. In this talk, we will discuss optimum designs for such transform-both-sides Michaelis-Menten models when information on covariates associated with the randomly selected liver tissues is available.
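
As a hedged illustration of what a transform-both-sides Michaelis-Menten model can look like (the transformation and random-effects structure used in the talk may differ), let v_{ij} be the reaction velocity measured at substrate concentration s_{ij} in liver tissue i, and let h_\lambda be a Box-Cox transformation applied to both sides:

    h_\lambda(v_{ij}) = h_\lambda\!\left( \frac{V_{\max,i}\, s_{ij}}{K_{m,i} + s_{ij}} \right) + \varepsilon_{ij},
    \qquad
    h_\lambda(z) = \begin{cases} (z^{\lambda}-1)/\lambda, & \lambda \neq 0,\\ \log z, & \lambda = 0, \end{cases}

where V_{\max,i} and K_{m,i} vary across the randomly selected tissues and may depend on tissue-level covariates.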
DAEW02 12th August 2011
09:45 to 10:30
S Leonov Approximation of the individual Fisher information matrix and its use in design of population PK/PD studies
We continue a discussion started at the PODE 2010 meeting about different types of approximation of the individual Fisher information matrix and their use in the design of population PK/PD studies which are described by nonlinear mixed effects models. We focus on several Monte Carlo-based options and provide examples of their performance.
DAEW02 12th August 2011
11:00 to 11:45
T Mielke Approximation of the Fisher information and design in nonlinear mixed effects models
The lack of a closed-form representation of the probability density of the observations is one of the main problems in the analysis of nonlinear mixed effects models. Often local approximations based on linearizations of the model are used to approximately describe the properties of estimators. The Fisher information is of special interest for designing experiments, as its inverse yields a lower bound on the variance of any unbiased estimator. Different linearization approaches for the model yield different approximations of the true underlying stochastic model and of the Fisher information (Mielke and Schwabe, 2010). The presentation focuses on alternative motivations of Fisher information approximations based on conditional moments. For an individual design with known inter-individual and intra-individual variances, the Fisher information for estimating the population location parameter vector can be written in terms of conditional moments, so that approximations of the expectation of the conditional variance and of the variance of the conditional expectation yield approximations of the Fisher information that rely less on distributional assumptions. Tierney et al. (1986) described fully exponential Laplace approximations as an accurate method for approximating posterior moments and densities in Bayesian models. We present approximations of the Fisher information obtained by approximating conditional moments with a similar heuristic, and compare the impact of different Fisher information approximations on the optimal design for estimating the population location parameters in pharmacokinetic studies.
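
The conditional-moment route mentioned above rests on the standard decompositions, written here in generic notation only as a reminder: for observations y with random effects b,

    E[y] = E_b\{ E[y \mid b] \}, \qquad \operatorname{Var}(y) = E_b\{ \operatorname{Var}(y \mid b) \} + \operatorname{Var}_b\{ E[y \mid b] \}.

Approximating E_b\{\operatorname{Var}(y \mid b)\} and \operatorname{Var}_b\{E[y \mid b]\} (for instance by Laplace-type expansions) gives approximate marginal moments \mu(\theta) and V(\theta), which can then be inserted into the usual Gaussian form of the Fisher information,

    M(\theta)_{kl} \approx \frac{\partial \mu^{\top}}{\partial \theta_k} V^{-1} \frac{\partial \mu}{\partial \theta_l}
    + \frac{1}{2} \operatorname{tr}\!\left( V^{-1} \frac{\partial V}{\partial \theta_k} V^{-1} \frac{\partial V}{\partial \theta_l} \right).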
DAEW02 12th August 2011
11:45 to 12:30
Improved conditional approximations of the population Fisher information matrix
We present an extended approximation of the Fisher Information Matrix (FIM) for nonlinear mixed effects models based on a first order conditional (FOCE) approximation of the population likelihood. Unlike previous FOCE-based FIMs, we use the empirical Bayes estimates to derive the FIM. In several examples, compared to the previous FOCE-based FIM, the improved FIM predicts parameter uncertainty much closer to the simulation-based empirical parameter uncertainty. Furthermore, this approach seems more robust with respect to other choices in the FIM approximation (full versus reduced FIM). Finally, the new FOCE-derived FIM is slightly closer to the simulated empirical precision than the FO-based FIM.
DAEW02 12th August 2011
14:00 to 14:45
T Waterhouse Experiences in optimal design for population PK/PD models
In all stages of contemporary drug development, the use of mixed effects ("population") models has become crucial to the understanding of pharmacokinetic (PK) and pharmacodynamic (PD) data. Population PK/PD models allow for the use of sparse sampling (i.e., fewer samples per subject), and they can be used to explain different sources of variability, ultimately leading to the possibility of dose optimisation for special populations or even individuals.

The design of trials involving population PK/PD models is often assessed via simulation, although the use of optimal design is gaining prominence. In recent years there have been a number of methodological advances in this area, but this talk will focus on more practical considerations of optimal design in the setting of a pharmaceutical company, from time and cost constraints to awareness and acceptance of optimal design methods. Several examples will be presented for illustration.
DAEW02 12th August 2011
14:45 to 15:30
S Duffull A general method to determine sampling windows for nonlinear mixed effects models
Many clinical pharmacology studies require repeated measurements to be taken on each patient, and analysis of the data is conducted within the framework of nonlinear mixed effects models. It is increasingly common to design these studies using information-theoretic principles, due to the need for parsimony in the presence of many logistical and ethical constraints. D-optimal design methods are often used to identify the best possible study conditions, such as the dose and the number and timing of blood samples. However, the optimal times for collecting blood samples may not be feasible in clinical practice. Sampling windows, time intervals within which blood samples may be collected, have been proposed to provide flexibility while preserving efficient parameter estimation. Due to the complexity of nonlinear mixed effects models there is generally no analytical solution available to determine sampling windows. We propose a method for the determination of sampling windows based on MCMC sampling techniques. The proposed method reaches the stationary distribution rapidly and provides time-sensitive windows around the optimal design points. The proposed method is applicable to determine windows around any continuous design variable for which repeated measures per run are required. This has particular importance for clinical pharmacology studies.
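
The following toy sketch (Python, assuming a one-parameter exponential model with unit error variance; this is not the authors' algorithm, only the general flavour of MCMC-derived windows) samples candidate times with a target density proportional to a power of the relative design efficiency and reports an interval of the sampled times:

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy model y = exp(-theta * t) + error: the (locally) D-optimal single sampling
    # time on t > 0 is t* = 1/theta, where the squared sensitivity t^2 * exp(-2*theta*t)
    # is maximised.
    theta = 0.5

    def rel_efficiency(t):
        info = (t * np.exp(-theta * t)) ** 2           # information of one sample at time t
        info_opt = ((1 / theta) * np.exp(-1.0)) ** 2   # information at the optimal time t*
        return info / info_opt

    # Random-walk Metropolis sampler whose target density is proportional to
    # efficiency^kappa; larger kappa concentrates the window around t*.
    kappa, t_current, samples = 10.0, 1.0 / theta, []
    for _ in range(20000):
        t_prop = abs(t_current + rng.normal(scale=0.5))
        if rng.uniform() < (rel_efficiency(t_prop) / rel_efficiency(t_current)) ** kappa:
            t_current = t_prop
        samples.append(t_current)

    window = np.percentile(samples[2000:], [5, 95])    # central 90% of sampled times
    print("window around t* = %.2f: %.2f to %.2f" % (1 / theta, *window))

The width of the reported window is governed by the power kappa, which plays the role of a tolerance on how much design efficiency one is prepared to give up.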
DAEW02 12th August 2011
16:00 to 16:30
Software for the population optimum design of experiments
DAEW03 15th August 2011
09:00 to 09:45
Design of clinical trials with multiple end points of different types
Several correlated endpoints are observed in almost any clinical trial. Typically one of them is designated the primary end point and the design (dose allocation and sample size) is driven by a single response model. I discuss the design problem with multiple end points, which may potentially be of different types. For instance, the efficacy end point may be continuous while the toxicity end point may be discrete. I emphasize the necessity to differentiate between response and utility functions. The response (end point) functions are what we observe, while the utility functions are what should be reported or used in the decision-making process. The discussed criteria of optimality are related to the latter and usually describe the precision of their estimators.
DAEW03 15th August 2011
09:45 to 10:30
The Role of Operating Characteristic in Assessing Bayesian Designs in Pharmaceutical Drug Development
The available guidelines on the reporting of Bayesian clinical trials cover many important aspects, including the choice of prior, issues surrounding computation such as the convergence of MCMC approaches, and the appropriate statistics for summarising posterior distributions. Noteworthy is the total absence of any discussion of operating characteristics of Bayesian designs. This may be because these guidelines are largely written by academic and/or autonomous government groups and not by those involved in pharmaceutical drug development, for example sponsor associations or regulatory agencies. However, operating characteristics are becoming of increasing importance in drug development, as witnessed by the EMA and FDA guidances on adaptive designs and the FDA guidance on Bayesian methodology in device trials. In this talk I investigate issues in determining operating characteristics of clinical trial designs, with a particular emphasis on Bayesian designs, but will also cover more general issues such as the design of simulation experiments and simulation evidence for strong control of type I error.
DAEW03 15th August 2011
11:00 to 11:45
The Critical Path: "Biomarker Development and Streamlining Clinical Trials"
DAEW03 15th August 2011
11:45 to 12:30
Experiments for Enzyme Kinetic Models
Enzymes are biological catalysts that act on substrates. The speed of reaction as a function of substrate concentration typically follows the nonlinear Michaelis-Menten model. The reactions can be modified by the presence of inhibitors, which can act by several different mechanisms, leading to a variety of models, all also nonlinear.

The talk will describe the models and derive optimum experimental designs for model building. When the model is known these include D-optimum designs for all the parameters for which we obtain analytical solutions. Ds-optimum designs for the inhibition constant are also of scientific importance.

When the model is not known, the choice is often between two three-parameter models. These can be combined into a single four-parameter model. Ds-optimum designs for the parameter of combination provide a means of establishing which model is true. However, T-optimum designs provide tests of maximum power for departures from the individual models. With two models on an equal footing, compound T-optimum designs are required. Their properties are compared with those of the Ds-optimum designs in the combined model, which have the advantage of being easier to compute.
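
As a small numerical illustration of the analytical D-optimal solutions mentioned above (a sketch assuming nominal values V = 1, K = 2, unit error variance and design region [0, 10]; these numbers are mine, not from the talk), the following Python snippet searches over two-point equal-weight designs for the Michaelis-Menten model and compares the result with the classical closed-form support points:

    import numpy as np
    from itertools import combinations

    def info_matrix(points, V, K):
        # Fisher information of an equal-weight design on the given support points
        # for the Michaelis-Menten model v = V*s/(K + s), unit error variance.
        M = np.zeros((2, 2))
        for s in points:
            f = np.array([s / (K + s), -V * s / (K + s) ** 2])  # gradient w.r.t. (V, K)
            M += np.outer(f, f) / len(points)
        return M

    V, K, s_max = 1.0, 2.0, 10.0                  # assumed nominal values and design region [0, s_max]
    grid = np.linspace(0.01, s_max, 400)
    best = max(combinations(grid, 2), key=lambda p: np.linalg.det(info_matrix(p, V, K)))
    print("numerical D-optimal support:", [round(s, 2) for s in best])
    print("closed-form candidate     :", [round(s_max * K / (2 * K + s_max), 2), s_max])

The grid search recovers support at the upper end of the design region and at approximately K*s_max/(2K + s_max), in line with the known two-point result for this model.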
DAEW03 15th August 2011
14:00 to 14:45
The Role of Cluster Randomization Trials in Health Research
Cluster randomization trials are those that randomize intact social units or clusters of individuals to different intervention groups. Such trials have been particularly widespread in the evaluation of educational programs and innovations in the provision of health care. This talk will deal with basic issues that must be considered when investigators first consider adopting a cluster randomization trial. Foremost among these is the need to justify the choice of this design given its statistical inefficiency relative to an individually randomized design. The role of matching and stratification in the design of a cluster trial, and the reasons why many such trials are underpowered, will also be discussed.
DAEW03 15th August 2011
14:00 to 14:45
Biomarker-based Bayesian Adaptive Designs for Targeted Agent Development - Implementation and Lessons Learned from the BATTLE Trial
Advances in biomedicine have fueled the development of targeted agents in cancer therapy. Targeted therapies have been shown to be more efficacious and less toxic than conventional chemotherapies. Targeted therapies, however, do not work for all patients. One major challenge is to identify markers for predicting treatment efficacy. We have developed biomarker-based Bayesian adaptive designs to (1) identify prognostic and predictive markers for targeted agents, (2) test treatment efficacy, and (3) provide better treatments for patients enrolled in the trial. In contrast to frequentist equal randomization designs, Bayesian adaptive randomization designs allow treating more patients with effective treatments, monitoring the trial more frequently to stop ineffective treatments early, and increasing efficiency while controlling type I and type II errors. Bayesian adaptive designs can be more efficient, more ethical, and more flexible in study conduct than standard designs. We have recently completed a biopsy-required, biomarker-driven lung cancer trial, BATTLE, for evaluating four targeted treatments. Lessons learned from the design, conduct, and analysis of this Bayesian adaptive design will be given.
DAEW03 15th August 2011
14:45 to 15:25
Cluster Randomised Trials: coping with selective recruitment, baseline covariates and anticipated drop-outs?
DAEW03 15th August 2011
14:45 to 15:25
K Wathen ISPY-2: Adaptive Design to Identify Treatments for Biomarker
The ISPY2 process is a new approach to conducting clinical research that utilizes a patient’s biomarker measurements to predict which treatment is most likely to provide benefit. Patients will be adaptively randomized and the treatment assignment probabilities will be altered to favor the treatment that, on average, appears superior for a given patient’s biomarker characteristics. In contrast to the traditional phase II clinical trial, which has a fixed number of treatments, the ISPY2 process will allow new agents to enter the trial as they become available and will "graduate" treatments based on the likelihood of future success in a subset of the patient population. A simulation study is presented and examples given to demonstrate the adaptive nature of the design.
DAEW03 15th August 2011
15:25 to 16:05
Sample size calculations for cluster randomised trials
In this talk I will address the major issues in calculating an adequate sample size for cluster randomised trials. It has long been recognised that in order for these trials to be adequately powered, between-cluster variability must be accounted for in the sample size calculations. This is usually done by using an estimate of the intra-cluster correlation coefficient (ICC) in a design effect, which is then used to adjust the sample size required for an individually randomised trial aiming to detect the same clinically important difference. More recently it has been recognised that variable cluster size should also be accounted for, and a simple adjustment to the design effect provides a means to do this. Investigators still face three challenges, however: lack of information about variability in cluster size prior to the trial; lack of information about the value of the ICC prior to the trial; and the fact that the adjustment for variable cluster size does not strictly match all methods of analysis. I will illustrate these challenges with some examples and outline approaches that have been and could be adopted to address them.
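
For concreteness, here is a minimal sketch of the calculation discussed above (illustrative numbers, standard normal-approximation formulae, and one commonly used coefficient-of-variation adjustment for variable cluster size; the speaker's exact formulation may differ):

    from scipy.stats import norm

    def n_individual(delta, sd, alpha=0.05, power=0.80):
        # per-arm sample size for a two-arm comparison of means (normal approximation)
        z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
        return 2 * ((z_a + z_b) * sd / delta) ** 2

    def design_effect(mean_size, icc, cv=0.0):
        # inflation factor for cluster randomisation; cv is the coefficient of variation
        # of cluster size (cv = 0 recovers the familiar 1 + (m - 1) * icc)
        return 1 + ((cv ** 2 + 1) * mean_size - 1) * icc

    n_ind = n_individual(delta=0.3, sd=1.0)                        # hypothetical effect size
    n_clu = n_ind * design_effect(mean_size=20, icc=0.05, cv=0.4)  # hypothetical ICC and cluster sizes
    print(round(n_ind), "per arm (individual randomisation);", round(n_clu), "per arm (cluster randomisation)")

The sketch makes visible the two pieces of prior information the talk highlights as problematic: the ICC and the variability in cluster size both enter the design effect directly.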
DAEW03 15th August 2011
15:25 to 16:05
T Braun Bayesian Adaptive Designs for Identifying Maximum Tolerated Combinations of Two Agents
Phase I trials of combination cancer therapies have been published for a variety of cancer types. Unfortunately, a majority of these trials suffer from poor study designs that escalate doses of only one of the agents and/or use an algorithmic approach to determine which combinations of the two agents maintain a desired rate of dose-limiting toxicities (DLTs), which we refer to as maximum tolerated combinations (MTCs). We present a survey of recent approaches we have developed for the design of Phase I trials seeking to determine the MTC. For each approach, we present a model for the probability of DLT as a function of the doses of both agents. We use Bayesian methods to adaptively estimate the parameters of the model as each patient completes their follow-up in the trial, from which we determine the doses to assign to the next patient enrolled in the trial. We describe methods for generating prior distributions for the parameters in our model from a basic set of information elicited from clinical investigators. We compare and contrast the performance of each approach in a series of simulations of a hypothetical trial that examines combinations of four doses of two agents and compare the results to those of an algorithmic design known as an A+B+C design.
DAEW03 15th August 2011
16:35 to 17:15
A cluster-randomised cross-over trial
I will describe a trial which combined a cluster-randomised design with a cross-over design. The Preterm Infant Parenting (PIP) trial evaluated a nurse-led training intervention delivered to parents of prematurely born babies to help them meet their babies' needs. An individually randomised trial risked extensive "contamination" of parents in the control arm with knowledge of the intervention, so the investigators instead randomised neonatal units. However, neonatal units differ widely, and only 6 neonatal units were available, so a conventional cluster randomised design would have been underpowered. In the selected design, the six neonatal units were randomly allocated to deliver intervention or control to families recruited during a first 6-month period; after a 2-month interval, each unit then delivered the opposite condition to families recruited during a second 6-month period.

I will present the relative precisions of individually randomised, cluster-randomised and cluster-crossover designs, and design issues including the need for a wash-out period to minimise carry-over. The analysis can be conveniently done using cluster-level summaries. I will end by discussing whether cluster-crossover designs should be more widely used.
DAEW03 15th August 2011
16:35 to 17:15
Dose Selection Incorporating PK/PD Information in Early Phase Clinical Trials.
Early phase clinical trials generate information on pharmacokinetic parameters and on safety issues. In addition, a dose level, or a set of dose levels, needs to be selected for further examination in later phases. If patients, rather than healthy volunteers, take part in the early phase, it may be possible to observe the effects of the drug on the disease. In the presentation we will discuss some statistical, ethical and economic aspects of designing optimum adaptive clinical trials for dose selection incorporating both pharmacokinetic and pharmacodynamic endpoints.
DAEW03 15th August 2011
17:15 to 17:55
Ethical issues posed by cluster randomized trials in health research
The cluster randomized trial (CRT) is used increasingly in knowledge translation research, quality improvement research, community-based intervention studies, public health research, and research in developing countries. While there is a small but growing literature on the subject, ethical issues raised by CRTs require further analysis. CRTs only partly fit within the current paradigm of research ethics. They pose difficult ethical issues for two basic reasons related to their design. First, CRTs involve the randomization of groups rather than individuals, and our understanding of the moral status of groups is incomplete. As a result, the answers to pivotal ethical questions, such as who may speak on behalf of a particular group and on what authority they may do so, are unclear. Second, in CRTs the units of randomization, experimentation, and observation may differ, meaning, for instance, that the group that receives the experimental intervention may not be the same as the group from which data are collected. The implications for the ethics of trials of experimental interventions with (solely) indirect effects on patients and others are not currently well understood. Here I lay out some basic considerations on who is a research subject, from whom one must obtain informed consent, and the use of gatekeepers in CRTs in health research (Trials 2011; 12(1): 100).
DAEW03 15th August 2011
17:15 to 17:55
K Cheung Objective Calibration of the Bayesian Continual Reassessment Method
The continual reassessment method (CRM) is a Bayesian model-based design for percentile estimation in sequential dose finding trials. The main idea of the CRM is to treat the next incoming patient (or group of patients) at a recent posterior update of the target percentile. This approach is intuitive and ethically appealing on a conceptual level. However, the performance of the CRM can be sensitive to how the CRM model is specified. In addition, since the specified model directly affects the generation of the design points in the trial, sensitivity analysis may not be feasible after the data are collected.

As there are infinitely many ways to specify a CRM model, the process of model calibration, typically done by trial and error in practice, can be complicated and time-consuming. In my talk, I will first review the system of model parameters in the CRM, and then describe some semi-automated algorithms to specify these parameters based on existing dose finding theory. Simulation results will be given to illustrate this semi-automated calibration process in the context of some real trial examples.
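
For readers unfamiliar with the ingredients being calibrated, the sketch below (Python; the skeleton, target DLT rate and prior variance are arbitrary illustrative values, not recommendations from the talk) shows a one-parameter "power" working model of the kind commonly used in the CRM, together with the posterior update that drives the next dose recommendation:

    import numpy as np

    # Illustrative skeleton (prior DLT-probability guesses per dose) and target rate.
    skeleton = np.array([0.05, 0.10, 0.20, 0.30, 0.45])
    target = 0.25
    a_grid = np.linspace(-4, 4, 2001)
    prior = np.exp(-a_grid ** 2 / (2 * 1.34))   # N(0, 1.34) prior on a; variance chosen for illustration

    def posterior_mean_p(doses_given, dlt):
        # One-parameter "power" working model: p_i(a) = skeleton_i ** exp(a).
        p = skeleton[None, :] ** np.exp(a_grid)[:, None]
        y = np.array(dlt)
        lik = np.prod(p[:, doses_given] ** y * (1 - p[:, doses_given]) ** (1 - y), axis=1)
        w = prior * lik                          # unnormalised posterior on the grid
        return (w[:, None] * p).sum(axis=0) / w.sum()

    post_p = posterior_mean_p(doses_given=[0, 0, 1], dlt=[0, 0, 1])  # three patients observed so far
    print("recommended next dose:", int(np.argmin(np.abs(post_p - target))) + 1)

The calibration problem the talk addresses is precisely the choice of quantities such as the skeleton and the prior variance, which are fixed arbitrarily in this sketch.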
DAEW03 16th August 2011
09:30 to 10:00
Adaptive Dose-Ranging Designs with Two Efficacy Endpoints
Following the introduction of the continual reassessment method by O’Quigley, Pepe and Fisher, there has been considerable interest in formal statistical procedures for phase I dose-finding studies. The great majority of published accounts relate to cancer patients treated once with a single dose of the test drug who return a single binary observation concerning the incidence of toxicity. However, most phase I dose-finding studies are not of such a simple form. Drugs being developed for milder conditions than cancer are usually first tested in healthy volunteers who participate in multiple dosing periods, returning a continuous pharmacokinetic response each time.

This talk will describe Bayesian decision procedures which have been developed for such dose-finding studies in healthy volunteers. The principles behind the approach will be described and an evaluation of its properties presented. An account will be given of an implementation of the approach in a study conducted in Scandinavia. Generalisation to studies in which more than one response is used will also be discussed.
DAEW03 16th August 2011
10:00 to 10:30
Penalized optimal design for dose finding
We consider optimal design under a cost constraint, where a scalar coefficient L sets the compromise between information and cost. For suitable cost functions, one can force the support points of an optimal design measure to concentrate around points of minimum cost by increasing the value of L, which can be considered as a tuning parameter that specifies the importance given to the cost constraint.

An example of adaptive design in a dose-finding problem with a bivariate binary model will be presented. As usual in nonlinear situations, the optimal design for any arbitrary choice of L depends on the unknown value of the model parameters. The construction of this optimal design can be made adaptive, by using a steepest-ascent algorithm where the current estimated value of the parameters (by Maximum Likelihood) is substituted for their unknown value. Then, taking advantage of the fact that the design space (the set of available doses) is finite, one can prove the strong consistency and asymptotic normality of the ML estimator when L is kept constant. Since the cost is reduced when L is increased, it is tempting to let L increase with the number of observations (patients enrolled in the trial). The strong consistency of the ML estimator is then preserved when L increases slowly enough.
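
Schematically, a penalized criterion of the kind described in this abstract can be written as follows (an illustrative form in generic notation, not necessarily the exact criterion used in the talk): maximise, over design measures \xi on the set of available doses \mathcal{X},

    \Phi_L(\xi; \theta) = \log \det M(\xi, \theta) - L \int_{\mathcal{X}} c(x)\, \xi(\mathrm{d}x),

where M is the Fisher information matrix, c(x) \ge 0 is the cost or penalty attached to dose x, and increasing the tuning parameter L drives the support of the optimal \xi towards low-cost doses.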
DAEW03 16th August 2011
10:30 to 11:00
C Jennison Jointly optimal design of Phase II and Phase III clinical trials: an over-arching approach
We consider the joint design of Phase II and Phase III trials. We propose a decision theoretic formulation with a gain function arising from a positive Phase III outcome and costs for sampling and for time taken to reach a positive conclusion. With a prior for the dose response model and a risk curve for the probability that doses fail on safety grounds, the challenge is to optimise the design for comparing doses in Phase II, the choice of dose or doses to take forward to Phase III, and the Phase III design. We shall show it is computationally feasible to tackle this problem and discuss possible generalisations from an initial, simple formulation.
DAEW03 16th August 2011
11:30 to 12:00
Utility and pitfalls of dose ranging trials with multiple study objectives: fixed or adaptive
Multiple study objectives have been proposed in dose-ranging studies. Traditionally, a dose-response study is pursued as a fixed design with an equal randomization ratio to each study arm and with the single study objective of detecting a dose-response (DR) relationship. The PhRMA adaptive dose-ranging working group has taken ownership of the problem, using an adaptive design or an adaptive analysis approach and acknowledging the exploratory nature of the trial. The authors have critically pursued multiple study objectives via simulation studies (JBS 2007, SBR 2010) and concluded that achieving the first goal of detecting DR is much easier than achieving the fourth goal of estimating it, or the second and third goals of identifying the target dose to bring into the confirmatory phase. It is tempting to consider dose-ranging, dose-response and sometimes exposure-response studies as pivotal evidence, especially when they are designed as a two-stage adaptive trial. Designing according to the study objective is vital to the success of the study. In this presentation, the utility and pitfalls of a two-stage adaptive dose-ranging trial will be elucidated. Challenges and reflections on some of the successful and not-so-successful regulatory examples will be highlighted. The appropriate distinction between the learning stage and the confirmatory stage in a drug development program will also be discussed using some typical studies.
DAEW03 16th August 2011
12:00 to 12:30
J Pinheiro Improving dose-finding methods in clinical development: design, adaptation, and modeling
The pharmaceutical industry experiences increasingly challenging conditions, with a combination of escalating development costs, tougher regulatory environment, expiring patents on important drugs, and fewer promising drugs in late stage of development. Part of this pipeline problem is attributed to poor dose selection for confirmatory trials leading to high attrition rates (estimated at 50%) for Phase 3 programs. Improving the efficiency of drug development, in general, and of dose-finding studies in particular, is critical for the survival of the industry. A variety of methods have been proposed to improve dose selection and, more broadly, understanding of the dose-response relationship for a compound. Among them: adaptive designs, modeling and simulation approaches, optimal designs, and clinical utility indices. In this talk we’ll discuss and illustrate the utilization of some of those approaches in the context of dose-finding trials. The results of a comprehensive set of simulation studies conducted by the PhRMA working group on Adaptive Dose-Ranging Studies will be used to discuss the relative merits of the various approaches and to motivate recommendations on their use in practice.
DAEW03 16th August 2011
14:00 to 14:30
Response-adaptive dose-finding under model uncertainty
In pharmaceutical drug development, dose-finding studies are of critical importance because both safety and clinically relevant efficacy have to be demonstrated for a specific dose of a new compound before market authorization. Motivated by a real dose-finding study, we propose response-adaptive designs addressing two major challenges in dose-finding studies: uncertainty about the dose-response models and large variability in parameter estimates. To allocate new cohorts of patients in an ongoing study, we use optimal designs that are robust under model uncertainty. In addition, we use a Bayesian shrinkage approach to stabilize the parameter estimates over the successive interim analyses used in the adaptations. This approach allows us to calculate updated parameter estimates and model probabilities that can then be used to calculate the optimal design for subsequent cohorts. The resulting designs are hence robust with respect to model misspecification and additionally can efficiently adapt to the information accrued in an ongoing study. We focus on adaptive designs for estimating the minimum effective dose, although alternative optimality criteria or mixtures thereof could be used, enabling the design to address multiple objectives. In an extensive simulation study, we investigate the operating characteristics of the proposed method under a variety of scenarios.
DAEW03 16th August 2011
14:30 to 15:00
Dose Escalation using a Bayesian Model: rational decision rules.
In dose escalation studies, the potential ethical costs of administering high doses must be weighed against the added utility of gaining safety information about high (and potentially effective) doses. This is the rationale for starting with low doses while confidence in the safety of the drug is low, and escalating to higher doses as confidence grows. A decision-theoretic framework is proposed.
DAEW03 16th August 2011
15:00 to 15:30
Bayesian approaches to Phase I clinical trials: methodological and practical aspects
Statistics plays an important role in drug development, in particular in confirmatory (phase III) clinical trials, where statistically convincing evidence is a requirement for the registration of a drug. However, statistical contributions to phase I clinical trials are typically sparse. A notable exception is oncology, where statistical methods abound. After a short review of the main approaches to phase I cancer trials, we discuss a fully adaptive model-based Bayesian approach which strikes a reasonable balance with regard to various objectives. First, proper quantification of the risk of dose-limiting toxicities (DLT) is the key to acceptable dosing recommendations during the trial, and the declaration of the maximum tolerable dose (MTD), a dose with an acceptable risk of DLT, at the end of the trial. In other words, statistically driven dosing recommendations should be clinically meaningful. Second, the operating characteristics of the design should be acceptable. That is, the probability of finding the correct MTD should be reasonably high. Third, not too many patients should be exposed to overly toxic doses. And fourth, the approach should allow for the inclusion of relevant study-external information, such as pre-clinical data or data from other human studies. The methodological and practical aspects of the Bayesian approach to dose finding trials in Oncology phase I will be discussed, and examples from actual trials will be used to illustrate and highlight important issues. The presentation concludes with a discussion of the main challenges for a large-scale implementation of innovative clinical trial designs in the pharmaceutical industry.
DAEW03 16th August 2011
15:30 to 16:00
S Leonov Application of model-based designs in drug development
We discuss the use of optimal model-based designs at different stages of drug development. Special attention is given to adaptive model-based designs in dose finding studies and designs for nonlinear mixed models which arise in population pharmacokinetic/pharmacodynamic studies. Examples of software tools and their application are provided.
DAEW03 16th August 2011
16:45 to 18:00
M Krams Design of Experiments in Healthcare, dose-ranging studies, astrophysics and other dangerous things
Panel discussion of the day's topics including:
  • Clinical objectives at different stages of drug development
  • Their formulation in terms of design of experiment objectives
  • Optimal design for each objective
  • Challenges in their implementation
DAEW03 17th August 2011
09:00 to 09:45
A Brief History of DCEs and Several Important Challenges

A confrontation with reality led to the integration of conjoint measurement, discrete multivariate analysis of contingency tables, random utility theory, discrete choice models and the design of statistical experiments. Few seem to realise that discrete choice experiments (DCEs) are in fact sparse, incomplete contingency tables. Thus, much of that literature informs and assists the design and analysis of DCEs, such that complex statistical models are often largely unnecessary. Many lack this perspective, and hence much of the literature is dominated by model-driven views of the design and analysis of DCEs.

The transition from the first DCEs to the present was very incremental and haphazard, with many advances being driven by market confrontations. For example "availability" designs arose from being asked to solve problems with out-of-stock conditions, infrastructure interruptions (eg, road or bridge closures), etc. Progress became more rapid and systematic from the late 1990s onwards, particularly with researchers skilled in optimal design theory getting involved in the field. Thus, there have been major strides in the optimal design of DCEs, but there now seems to be growing awareness that experiments on humans pose interesting issues for "optimal" design, particularly designs that seek to optimise statistical efficiency.

Along the way we stumbled onto individuals, error variance differences, cognitive process differences and we're still stumbling.

This talk is about a journey that starts in 1927 with paired comparisons, travels along an ad hoc path until it runs into an airline in 1978, emerges five years later as a systematic way to design and implement multiple comparisons, and slowly wanders back and forth until it begins to pick up speed and follow a "more optimal" path. Where is it going? Well, one researcher's optimum may well be one human's suboptimum. Where should it be going? The road ahead is littered with overconfidence and assumptions. A better path is to invest in insurance against ignorance and assumptions.

DAEW03 17th August 2011
09:45 to 10:30
Discrete Choice Experiments in Health Economics
Since their introduction in health economics in the early 1990s, there has been an increased interest in the use of discrete choice experiments (DCEs), both at the applied and methodological level. At the applied level, whilst the technique was introduced into health economics to go beyond narrow definitions of health benefits (Quality Adjusted Life Years, QALYs) and value broader measures of utility (patient experiences/well-being), the technique is now being applied to address an ever-increasing range of policy questions. Methodological developments have also been made with respect to methods for developing attributes and levels, techniques for defining choice sets to present to individuals (experimental design) and methods for analysis of response data. This talk considers the journey of DCEs in health economics, discussing both where we are and where we should go.
DAEW03 17th August 2011
11:00 to 11:45
J Rose Sample size, statistical power and discrete choice experiments: How much is enough
Discrete choice experiments (DCE) represent an important method for capturing data on the preferences held by both patients and health care practitioners for various health care policies and/or products. Identifying methods for reducing the number of respondents required for stated choice (SC) experiments is important for many studies, given increases in survey costs. Such reductions, however, must not come at the cost of a loss of reliability in the parameter estimates obtained from models of discrete choice.

The usual method of reducing the number of sampled respondents in DCE experiments conducted in health studies appears to be using orthogonal fractional factorial experimental designs with respondents assigned to choice situations via either a blocking variable or via random assignment. Through the use of larger block sizes (i.e., each block has a larger number of choice situations) or by the use of a greater number of choice situations being randomly assigned per respondent, analysts may decrease the number of respondents whilst retaining a fixed number of choice observations collected. It should be noted, however, that whilst such strategies reduce the number of respondents required for DCE experiments, they also reduce the variability observed in other covariates collected over the sample.

Yet despite practical reasons to reduce survey costs, particularly through reductions in the sample sizes employed in DCE studies, questions persist as to the minimum number of choice observations, both in terms of the number of respondents and the number of questions asked of each respondent, that are required to obtain reliable parameter estimates for discrete choice models estimated from DCE data. In this talk, we address both issues in the context of the main methods of generating experimental designs for DCEs in health care studies. We demonstrate a method for calculating the minimum sample size required for a DCE that does not require rules of thumb.
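
One way such a rule-of-thumb-free calculation can be formalised (a sketch in the spirit of the "S-estimate" used in the stated choice design literature; the exact method presented in the talk may differ) is to require each parameter's asymptotic t-ratio to reach significance. If \sigma_k^2(1) denotes the asymptotic variance of \hat\beta_k when a single respondent completes the design, then the number of respondents must satisfy

    N \ge \max_k \left( \frac{ z_{1-\alpha/2} \sqrt{\sigma_k^2(1)} }{ |\beta_k| } \right)^{2},

so the minimum sample size is driven by the hardest-to-estimate parameter and by the prior parameter values at which the design is evaluated.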
DAEW03 17th August 2011
11:00 to 11:45
C Taylor Systematic review of the use of stepped wedge cluster randomized trials
Background In a stepped wedge cluster randomized controlled trial, clusters are randomly allocated to the order in which they will receive the intervention, with one cluster receiving the intervention at the beginning of each study period (step). Therefore by the end of the recruitment period all clusters have received the intervention, but the number of periods in the ‘control’ and ‘intervention’ sections of the wedge will vary across clusters.

Objective To describe the application of the stepped wedge cluster randomized controlled trial design using a systematic review.

Study Design and Setting We searched MEDLINE, EMBASE, PSYCINFO, HMIC, CINAHL, Cochrane Library, Web of Knowledge and Current Controlled Trials Register for articles published up to January 2010. Stepped wedge cluster randomized controlled trials from all fields of research were included. Two authors independently reviewed and extracted data from the studies.

Results Twenty-five studies were included in the review. Motivations for using the design included ethical, logistical, financial, social and political acceptability and methodological reasons. Most studies were evaluating an intervention during routine implementation. For most of the included studies there was also a belief or empirical evidence suggesting that the intervention would do more good than harm. There was variation in data analysis methods and insufficient quality of reporting.

Conclusions The stepped wedge cluster randomized controlled trial design has been mainly used for evaluating interventions during routine implementation, particularly for interventions that have been shown to be effective in more controlled research settings, or where there is lack of evidence of effectiveness but there is a strong belief that the intervention will do more good than harm. There is need for consistent data analysis and reporting.
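
As a concrete picture of the allocation pattern described in the Background above (a hypothetical 5-cluster, 6-period wedge, not one of the reviewed trials), the following Python sketch builds the control/intervention indicator matrix:

    import numpy as np

    def stepped_wedge(n_clusters, n_periods):
        # 0 = control, 1 = intervention; one cluster (or group of clusters)
        # crosses over to the intervention at each step and stays on it.
        X = np.zeros((n_clusters, n_periods), dtype=int)
        for c in range(n_clusters):
            X[c, c + 1:] = 1
        return X

    print(stepped_wedge(5, 6))

Each row is a cluster and each column a period; all clusters start under control, and in practice the order in which clusters cross over is randomised.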
DAEW03 17th August 2011
11:45 to 12:30
P Goos Optimal designs for discrete choice experiments in the presence of many attributes
In a discrete choice experiment each respondent typically chooses the best product or service sequentially from many groups or choice sets of alternatives which are characterized by a number of different attributes. Respondents can find it difficult to trade off prospective products or services when every attribute of the offering changes in each comparison. Especially in studies involving many attributes, respondents get overloaded by the complexity of the choice task. To overcome respondent fatigue, it makes sense to simplify the comparison by holding some of the attributes constant in every choice set. A study in the health care literature where eleven attributes were allocated across three different experimental designs with only five attributes being varied motivates the approach we present. However, our algorithm is more general, allowing for any number of attributes and a smaller number of fixed attributes. We describe our algorithmic approach and show how the resulting design performed in our motivating example.
DAEW03 17th August 2011
11:45 to 12:30
Challenges in the Design and Analysis of a Randomized, Phased Implementation (Stepped-Wedge) Study in Brazil
The cluster randomized one-way crossover design, known as a stepped-wedge design, is becoming increasingly popular, especially for health studies in less industrialized countries. This design, however, presents numerous challenges, both for design and analysis.

Two issues regarding the design of a stepped-wedge study will be highlighted: randomization and power. Specifically, first, there is the question of how best to constrain the randomization so that it is balanced over time with respect to covariates; a highly constrained but ad hoc procedure will be presented. Second, the various pieces of information necessary for a full power calculation will be delineated.

As with cluster-randomized designs in general, close attention must be given to the study hypotheses of interest, and to the relation of these to the two levels of intervention: cluster and individual. A study of isoniazid prophylaxis implementation in 29 clinics in Rio de Janeiro is used to exemplify the range of questions that can arise. A few analyses of the data are also presented, to illustrate the degree to which the data-analytic choices made to address these questions can affect the results, and to show the longitudinal complexities that need to be considered.
DAEW03 17th August 2011
12:30 to 13:00
H Grossmann Partial profile paired comparison designs for avoiding information overload
The inclusion of many attributes makes a choice experiment more realistic. The price to be paid for this increased face validity is however that the respondents' task becomes cognitively more demanding. In order to avoid negative side effects, such as fatigue or information overload, a common strategy is to employ partial profiles, which are incomplete descriptions of the available alternatives. This talk presents efficient designs for the situation where each choice set is a pair of partial profiles and where only the main effects of the attributes are to be estimated.
DAEW03 17th August 2011
12:30 to 13:00
SG Thompson Stepped wedge randomised trials
DAEW03 18th August 2011
09:00 to 09:45
E Lancsar Discrete choice experimental design for alternative specific choice models: an application exploring preferences for drinking water
Health economic applications of discrete choice experiments have generally used generic forced choice experimental designs, or to a lesser extent generic designs with an appended status quo or opt out option. Each has implications for the types of indirect utility functions that can be estimated from such designs. Less attention has been paid to allowing for alternative specific choice experiments. This paper focuses on the development and use of an experimental design that allows for both labelled alternatives and alternative specific attribute effects in the context of a best worst choice study designed to investigate preferences for different types of drinking water. Results including testing for alternative specific effects and preferences for different types of drinking water options are presented, with implications explored.
DAEW03 18th August 2011
09:45 to 10:30
The usefulness of Bayesian optimal designs for discrete choice experiments

Recently, the use of Bayesian optimal designs for discrete choice experiments has gained a lot of attention, stimulating the development of Bayesian choice design algorithms. Characteristic for the Bayesian design strategy is that it incorporates the available information about people's preferences for various product attributes in the choice design. In a first part of this talk, we show how this information can be best incorporated in the design using an experiment from health care in which preferences are measured for changes in eleven health system performance domains.

The Bayesian design methodology contrasts with the linear design methodology, which is also used in discrete choice design and which depends, for any claims of optimality, on the unrealistic assumption that people have no preference for any of the attribute levels. Nevertheless, linear design principles have often been used to construct discrete choice experiments. In a second part, we show using a simulation study that the resulting utility-neutral optimal designs are not competitive with Bayesian optimal designs for estimation purposes.

DAEW03 18th August 2011
11:45 to 12:30
P van de Ven An efficient alternative to the complete matched-pairs design for assessing non-inferiority of a new diagnostic test
Studies for assessing non-inferiority of a new diagnostic test relative to a standard test typically use a complete matched-pairs design in which results for both tests are obtained for all subjects. We present alternative non-inferiority tests for the situation where results for the standard test are obtained for all subjects but results for the new test are obtained for a subset of those subjects only. This situation is common when results for the standard test are available from a monitoring or screening programme or from a large biobank. A stratified sampling procedure is presented for drawing the subsample of subjects that receive the new diagnostic test with strata defined by the two outcome categories of the standard test. Appropriate statistical tests for non-inferiority of the new diagnostic test are derived. We show that if diagnostic test positivity is low, the number of subjects to be tested with the new test is minimized when stratification is non-proportional.
DAEW03 18th August 2011
14:00 to 14:45
From Bench to Bedside: The Application of Differential Protein Networks on Bayesian Adaptive Designs for Trials with Targeted Therapies
DAEW03 18th August 2011
14:45 to 15:30
K Kim A Bayesian Adaptive Design with Biomarkers for Targeted Therapies and Some Commentary on Adaptive Designs
Pharmacogenomic biomarkers are considered an important component of targeted therapies as they can potentially be used to identify patients who are more likely to benefit from them. New study designs may be helpful which can evaluate both the prognosis based on the biomarkers and the response to targeted therapies. In this talk I will present a recently developed Bayesian response-adaptive design. The design utilizes individual pharmacogenomic profiles and clinical outcomes as they become available during the course of the trial to assign the most effective treatment to patients. I will present simulation studies of the proposed design. In closing I will share my perspectives on adaptive designs in general.
DAEW03 18th August 2011
16:00 to 16:45
Optimizing the Concentration and Bolus of a Drug Delivered by Continuous Infusion
We consider treatment regimes in which an agent is administered continuously at a specified concentration until either a therapeutic response is achieved or a predetermined maximum infusion time is reached. Additionally, a portion of the planned maximum total amount of the agent is administered as an initial bolus. Efficacy is the time to response, and toxicity is a binary indicator of an adverse event that may occur after infusion. The amount of the agent received by the patient thus depends on the time to response, which in turn affects the probability of toxicity. An additional complication arises if response is evaluated periodically, since the response time is interval-censored. We address the problem of designing a clinical trial in which such response time data and toxicity are used to jointly optimize the concentration and size of the initial bolus. We propose a sequentially adaptive Bayesian design that chooses the optimal treatment for each patient by maximizing the posterior mean utility of the joint efficacy-toxicity outcome. The methodology is illustrated by a clinical trial of tissue plasminogen activator (tPA) infused intra-arterially as rapid treatment for acute ischemic stroke. The fundamental problem is that too little tPA may not dissolve the clot that caused the stroke, but too much may cause a symptomatic intra-cranial hemorrhage, which often is fatal. A computer simulation study of the design in the context of the tPA trial is presented.
DAEW03 18th August 2011
16:45 to 17:30
Discussion of three talks on (covariate) adaptive designs
DAEW03 19th August 2011
09:45 to 10:30
Functional uniform prior distributions for nonlinear regression
In this talk I will consider the topic of finding prior distributions in nonlinear modelling situations, that is, when a major component of the statistical model depends on a nonlinear function. Making use of a functional change-of-variables theorem, one can derive a distribution that is uniform in the space of functional shapes of the underlying nonlinear function and then back-transform to obtain a prior distribution for the original model parameters. The primary application considered here is nonlinear regression in the context of clinical dose-finding trials. The priors so constructed have the advantage that they are parametrization invariant, as opposed to uniform priors on the parameter scale, and can be calculated before data collection, as opposed to the Jeffreys prior. I will investigate the priors for a real data example and for the calculation of Bayesian optimal designs, which require the prior distribution to be available before data collection has started (so that classical objective priors such as Jeffreys priors cannot be used).
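
Only as a rough sketch of the change-of-variables idea (my notation, assuming an L2-type metric on function space; the construction in the talk may differ in detail): with regression function f(x, \theta) and design region \mathcal{X}, a prior that is uniform on functional shapes can be induced through the metric tensor of the map \theta \mapsto f(\cdot, \theta),

    p(\theta) \propto \sqrt{ \det \int_{\mathcal{X}} \nabla_\theta f(x, \theta)\, \nabla_\theta f(x, \theta)^{\top} \, \mathrm{d}x },

which is parametrization invariant because the right-hand side transforms with the Jacobian of any change of parameters.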
DAEW03 19th August 2011
11:00 to 11:45
PKPD modelling to optimize dose-escalation trials in Oncology
The purpose of dose-escalation trials in Oncology is to determine the highest dose that provides the desired treatment effect without unacceptable toxicity, the so-called Maximum Tolerated Dose (MTD). Neuenschwander et al. [1] introduced a Bayesian model-based approach that provides realistic inferential statements about the probabilities of a Dose-Limiting Toxicity (DLT) at each dose level. After each patient cohort, information is derived from the posterior distribution of the model parameters. This model output helps the clinical team to define the dose for the next patient cohort. The approach not only allows for more efficient patient allocation, but also for the inclusion of prior information regarding the shape of the dose-toxicity curve. However, in its simplest form, the method relies on the assumption that toxicity events are driven solely by the dose and that the patient population is homogeneous with respect to the response. This is rarely the case, in particular in a very heterogeneous cancer patient population. Stratification of the response by covariates, such as disease, disease status and baseline characteristics, could potentially reduce the variability and allow identification of subpopulations that are more or less prone to experience an event. This stratification requires enough data to be available, which is rarely the case when toxicity events are used as the response variable. We propose to use a PKPD approach to model the mechanistic process underlying the toxicity. In this way, all the data, including those from patients that have not (yet) experienced a toxicity event, are taken into account. Furthermore, various covariates can be introduced into the model, and predictions can be made for patient subgroups of interest. Thus, we aim to reduce the number of patients exposed to low and inefficient doses, the number of cohorts and the total number of patients required to define the MTD. Finally, we hope to reach the MTD faster and at a lower cost. We test the methodology on a concrete example and discuss the benefits and drawbacks of the approach.

References

[1] Neuenschwander B., Branson M., Gsponer T. Critical aspects of the Bayesian approach to Phase I cancer trials. Statistics in Medicine 2008; 27:2420-2439.
[2] Piantadosi S. and Liu G. Improved designs for dose escalation studies using pharmacokinetic measurements. Statistics in Medicine 1996; 15:1605-1618.
[3] Müller P. and Quintana F. A. (2010) Random partition models with regression on covariates. Journal of Statistical Planning and Inference 140(10), 2801-2808.
[4] Berry S., Carlin B., Lee J. and Müller P. Bayesian Adaptive Methods for Clinical Trials. CRC Press, 2010.
DAEW03 19th August 2011
11:45 to 12:30
Designs and models for Phase I oncology trials with intra-patient dose escalation
DAEW03 19th August 2011
14:00 to 14:45
N Flournoy Some Issues in Response-Adaptive Designs for Dose-finding Experiments
We discuss some of the many issues involved in selecting an adaptive design. Included in these considerations are choices to go with frequentist or Bayesian, parametric or nonparametric procedures. There is great appeal in using all the information gained to date, but in many settings, two or three stage designs have been shown to perform almost as well as fully adaptive ones. Furthermore, with many procedures, an unfortunate string of early responses can have strong undesirable effects on estimates. These consequences can be mitigated by using a short term memory procedure rather than a long term memory procedure. When interest is in the MTD, placing subjects around the MTD is symbiotic with efficiently estimating the MTD; this is not so when interest is in finding a dose that is efficacious without toxicity. The compromise between designing to optimize for ethical treatment versus optimizing for efficiency in estimating best dose should be given serious attention in practice; although it has been stated that adaptive designs let one do both, this simply is not the case. Finally, we briefly consider issues related to stopping for toxicity and lack of efficacy, sample size recalculation and dropping or adding treatments. How flexible should a clinical trial be? Are analysts prepared for the negative impact such flexibility has on estimates of effect size?
DAEW03 19th August 2011
14:45 to 15:30
B Rosenberger Principles for Response-Adaptive Randomization
We discuss guiding principles for the use of response-adaptive randomization in clinical trials. First, we describe a set of criteria by which the investigator can determine whether response-adaptive randomization is useful. Then we discuss a template for the appropriate selection of a response-adaptive randomization procedure. Such guidance should be useful in designing state-of-the-art clinical trials.
DAEW03 19th August 2011
15:30 to 16:15
A Giovagnoli Recent developments in adaptive clinical trials to account for individual and collective ethics
Most Phase III clinical trials are carried out in order to compare different drugs or therapies. The aim may be to estimate some treatment effects separately or, more commonly, to estimate or test their differences. The ethical concern of assigning treatments to patients so as to care for each of them individually often conflicts with the demands for rigorous experimentation on one hand, and randomization on the other. Recently, there has been a growing statistical interest in sequential procedures for treatment comparison which at each stage use the available information with the ethical aim of skewing allocations towards the best treatment. In two recent papers ([1], [2]) the present authors have approached the problem via the optimization of a compromise criterion, obtained by taking a weighted average of a design optimality measure and a measure of the subjects' risk. The relative weights in the compound criterion have been allowed to depend on the true state of nature, since it is reasonable to suppose that the more the effects of the treatments differ, the more important for the patients are the chances of receiving the best treatment.

The purpose of this presentation is to extend the theoretical results of [1] and [2] and enhance their applicability by means of some numerical examples. We shall first of all find a "target" allocation, namely one that optimizes the above-mentioned compound criterion for different response models, also taking into account observable categorical covariates. Since the target does in general depend on the unknown parameters, the implementation of adaptive randomization methods to make the experiment converge to the desired target is illustrated. For simplicity here we consider the most common case of just two treatments.

References

  1. A. Baldi Antognini, A. Giovagnoli (2010) "Compound Optimal Allocation for Individual and Collective Ethics in Binary Clinical Trials", Biometrika 97(4), 935-946.
  2. A. Baldi Antognini, M. Zagoraiou (2010) "Covariate adjusted designs for combining efficiency, ethics and randomness in normal response trials", in mODa 9 - Advances in Model Oriented Design and Analysis (A. Giovagnoli, A. Atkinson, B. Torsney eds, C. May coed.), Heidelberg: Physica-Verlag, Springer, 17-24, ISBN: 978-3-7908-2409-4.
DAEW04 30th August 2011
09:30 to 10:30
J Wu Post-Fisherian Experimentation: from Physical to Virtual
Experimental design has been a scientific discipline since the founding work of Fisher. During the 80-year history, its development has been largely dominated by work in physical experiments. With advances in high-performance computing and numerical modeling, virtual experiments on a computer have become viable. This talk will highlight some major developments (physical and virtual) in this long period. Fisher’s principles (replication, randomization, blocking) will be reviewed, together with principles (effect hierarchy, sparsity, heredity) for factorial experiments. A fresh look at interactions and effect aliasing will be provided, with some surprisingly new insights on an age-old problem. Robust parameter design, another significant development which focuses on variation modeling and reduction, will be mentioned. Turning to computer experiments, the key differences with physical experiments will be highlighted. These include the lack of replication errors which entails new governing principles other than Fisher’s and the use of space-filling designs instead of fractional factorials. There are two strategies for modeling and analysis: based on Gaussian processes or on function approximations. These seemingly conflicting approaches can be better linked by bringing a stochastic structure to the numerical errors. Throughout the talk, real experiments/data, ranging from manufacturing to nanotechnology, will be used for illustration. (Note: this talk will be an adapted version of the COPSS Fisher Lecture the speaker will deliver during the Joint Statistical Meetings in Miami in August).
DAEW04 30th August 2011
11:00 to 11:30
Randomization, Regularization and Covariate Balance in Response-Adaptive Designs for Clinical Trials
Results on the sequential construction of optimum experimental designs yield a very general procedure for generating response-adaptive designs for the sequential allocation of treatments to patients in clinical trials. The designs provide balance across prognostic factors with a controllable amount of randomization and specifiable skewing towards better treatments. Results on the loss, bias and power of such rules will be discussed and the importance of regularization will be stressed in the avoidance of extreme allocations. The designs will be considered in the wider context of decisions about treatment allocation to patients within the study and to the population of future patients.
DAEW04 30th August 2011
11:30 to 12:00
N Flournoy Information in adaptive optimal design with emphasis on the two stage case
In 1963, Box and Hunter, followed by many others, recommended selecting sequential treatments to maximize the increment of some information measure (e.g., the determinant of the Fisher information matrix). Under nonlinear regression models, because information is a function of unknown parameters, such increments must be estimated; and the information from different stages is not independent. To explore the accrual of information in adaptive designs, we study a basic one-parameter nonlinear regression model with additive independent normal errors. The stage 1 treatment is taken to be fixed, and the treatment allocation rule for stage 2 is taken to be a unique function of maximum likelihood estimates derived from stage 1 data. Although conditioning on the design is common in data analyses, we show that, in this scenario, conditioning on the stage 2 treatment is equivalent to conditioning on the stage 1 data. This raises questions about the role of conditioning in the analysis of adaptive designs. We also explore the efficiency of conducting studies in stages and the effect of allocating different proportions of subjects to stage 1 versus stage 2.
DAEW04 30th August 2011
12:00 to 12:30
A particle filter for Bayesian sequential design
A particle filter approach is presented for sequential design with a focus on Bayesian adaptive dose finding studies for the estimation of the maximum tolerable dose. The approach is computationally convenient in that the information of newly observed data can be incorporated through a simple re-weighting step. Furthermore, the method does not require prior information represented as imagined data as in other dose finding approaches, although such data can be included straightforwardly if available. We also consider a flexible parametric model together with a newly developed hybrid design utility that can produce more robust estimates of the target dose in the presence of substantial model and parameter uncertainty.
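As a rough, illustrative sketch of the kind of re-weighting step described above (not the authors' implementation), the Python fragment below re-weights prior particles for an assumed two-parameter logistic dose-toxicity model by the binomial likelihood of a newly observed cohort; the reference dose, prior and cohort numbers are all hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical prior particles for a logistic dose-toxicity model:
    # logit P(DLT | dose d) = a + b * log(d / d_ref)
    n_particles = 5000
    a = rng.normal(-1.5, 1.0, n_particles)          # intercept particles
    b = np.exp(rng.normal(0.0, 0.5, n_particles))   # positive slope particles
    weights = np.full(n_particles, 1.0 / n_particles)

    def p_dlt(a, b, dose, d_ref=10.0):
        """Probability of a dose-limiting toxicity under each particle."""
        return 1.0 / (1.0 + np.exp(-(a + b * np.log(dose / d_ref))))

    def reweight(weights, a, b, dose, n_patients, n_dlt):
        """Multiply weights by the binomial likelihood of the new cohort, then renormalise."""
        p = p_dlt(a, b, dose)
        lik = p**n_dlt * (1.0 - p)**(n_patients - n_dlt)
        w = weights * lik
        return w / w.sum()

    # New cohort observed: 3 patients at dose 15, one DLT (illustrative numbers).
    weights = reweight(weights, a, b, dose=15.0, n_patients=3, n_dlt=1)

    # Posterior summaries used for choosing the next dose, e.g. P(DLT) at candidate doses.
    for dose in [5.0, 10.0, 15.0, 20.0]:
        post_mean = np.sum(weights * p_dlt(a, b, dose))
        print(f"dose {dose:5.1f}: posterior mean P(DLT) = {post_mean:.3f}")

In a full implementation the particles would also be resampled when the effective sample size degenerates, a step omitted here for brevity.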
DAEW04 30th August 2011
14:00 to 14:30
D Montgomery Constructing and Assessing Exact G-Optimal Designs
Methods for constructing G-optimal designs are reviewed. A new and very efficient algorithm for generating near G-optimal designs is introduced, and employed to construct designs for second-order models over cuboidal regions. The algorithm involves the use of Brent’s minimization algorithm with coordinate exchange to create designs for 2 to 5 factors. Designs created using this new method either match or exceed the G-efficiency of previously reported designs. A new graphical tool, the variance ratio fraction of design space (VRFDS) plot, is used for comparison of the prediction variance for competing designs over a given region of interest. Using the VRFDS plot to compare G-optimal designs to I-optimal designs shows that the G-optimal designs have higher prediction variance over the vast majority of the design region. This suggests that, for many response surface studies, I-optimal designs may be superior to G-optimal designs.
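As a rough illustration of this style of search (not the algorithm of the talk), the sketch below runs a coordinate-exchange loop for a full second-order model in two factors, using scipy's bounded Brent-type scalar minimiser for each coordinate and approximating the maximum scaled prediction variance over a grid; the run size, grid and starting design are arbitrary.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def model_matrix(pts):
        """Full second-order model in two factors: 1, x1, x2, x1*x2, x1^2, x2^2."""
        x1, x2 = pts[:, 0], pts[:, 1]
        return np.column_stack([np.ones(len(pts)), x1, x2, x1 * x2, x1**2, x2**2])

    grid = np.array([[a, b] for a in np.linspace(-1, 1, 21)
                            for b in np.linspace(-1, 1, 21)])
    F_grid = model_matrix(grid)

    def max_spv(design):
        """Maximum scaled prediction variance N * f(x)'(X'X)^{-1} f(x), approximated on the grid."""
        X = model_matrix(design)
        M_inv = np.linalg.inv(X.T @ X)
        v = np.einsum('ij,jk,ik->i', F_grid, M_inv, F_grid)
        return len(design) * v.max()

    rng = np.random.default_rng(1)
    design = rng.uniform(-1, 1, size=(9, 2))          # arbitrary 9-run starting design

    for sweep in range(20):                           # coordinate-exchange sweeps
        improved = False
        for i in range(design.shape[0]):
            for j in range(design.shape[1]):
                def objective(c, i=i, j=j):
                    trial = design.copy()
                    trial[i, j] = c
                    return max_spv(trial)
                res = minimize_scalar(objective, bounds=(-1, 1), method='bounded')
                if res.fun < max_spv(design) - 1e-8:
                    design[i, j] = res.x
                    improved = True
        if not improved:
            break

    p = 6                                             # number of model parameters
    print("G-efficiency (p / max SPV):", round(p / max_spv(design), 3))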
DAEW04 30th August 2011
14:30 to 15:00
Causal Inference from 2-level factorial designs
A framework for causal inference from two-level factorial and fractional factorial designs with particular sensitivity to applications to social, behavioral and biomedical sciences is proposed. The framework utilizes the concept of potential outcomes that lies at the center stage of causal inference and extends Neyman's repeated sampling approach for estimation of causal effects and randomization tests based on Fisher's sharp null hypothesis to the case of 2-level factorial experiments. The framework allows for statistical inference from a finite population, permits definition and estimation of parameters other than "average factorial effects" and leads to more flexible inference procedures than those based on ordinary least squares estimation from a linear model. It also ensures validity of statistical inference when the investigation becomes an observational study in lieu of a randomized factorial experiment due to randomization restrictions.
DAEW04 30th August 2011
15:00 to 15:30
Engineering-Driven Statistical Adjustment and Calibration
There can be discrepancy between physics-based models and reality, which can be reduced by statistically adjusting and calibrating the models using real data. Gaussian process models are commonly used for capturing the bias between the physics-based model and the truth. Although this is a powerful approach, the resulting adjustment can be quite complex and physically non-interpretable. A different approach is proposed here which is to postulate adjustment models based on the engineering judgment of the observed discrepancy. This often leads to models that are very simple and easy to interpret. The approach will be illustrated using many real case studies.
DAEW04 30th August 2011
16:00 to 16:30
An overview of functional data analysis with an application to facial motion modelling
Data in the form of curves, trajectories and shape changes present unique challenges. We present an overview of functional data analysis. We show how these methods can be used to model facial motion with application to cleft lip surgery.
DAEW04 30th August 2011
16:30 to 17:00
Robust microarray experiments by design: a multiphase framework

Speed and Yang with Smyth (2008) outlined six main phases in genomics, proteomics and metabolomics microarray experiments. They suggested that statisticians could assist in the design of experiments in each phase of such an experiment. That being the case, the experiments potentially involve multiple randomizations (Brien and Bailey, 2006) and are multiphase. Consequently, a multiphase framework for their design will be explored, the first step of which is to list out the phases in the experiment. One set of six phases for the physical conduct of microarray experiments will be described and the sources of variability that affect these phases discussed.

The multiphase design of an example microarray experiment will be investigated, beginning with the simplest option of completely randomizing in every phase and then examining the consequences of batching in one or more phases and of not randomizing in all phases. To examine the properties of a design, a mixed model and ANOVA that include terms and sources for all the identified phases will be derived. For this, the factor-allocation description of Brien, Harch, Correll and Bailey (2011) will be used. It is argued that the multiphase framework used is flexible, promotes the consideration of randomization in all phases and facilitates the identification of all sources of variability at play in a microarray experiment.

DAEW04 30th August 2011
17:00 to 17:30
Designed Biofluid Mixtures Allow Feature-Wise Evaluation Of Metabolic Profiling Analytical Platforms
The development of spectral analysis platforms for targeted metabolic profiling may help streamline quantification and will undoubtedly facilitate biological interpretation of metabolomics/metabonomics datasets. A general method for evaluating the performance, coverage and applicability of analytical methods in metabolic profiling is much needed to aid biomarker assessment.

The substantial variation in spectral and compositional background that exists in samples generated by real biofluid studies is often not captured by traditional evaluations of analytical performance that use compound additions (spikes). Such approaches may therefore underestimate the contribution of matrix effects to the measurement of major metabolites and confound analysis.

We illustrate how a strategy of mixing intact biofluids in a predetermined experimental design can be used to evaluate, compare and optimise the performance of quantitative spectral analysis tools in conditions that better approximate a real metabolic profiling experiment.

Results of preliminary experiments on two commonly-used profiling platforms (high-resolution 1D 1H nuclear magnetic resonance (NMR) spectroscopy and ultra high performance liquid chromatography-mass spectrometry (UPLC-MS)) are discussed. Use of multivariate regression allowed feature-wise statistics to be generated as a summary of the overall performance of each platform.

The use of designed biofluid mixtures as a basis of evaluating the feature-wise variation in instrument response provides a rational basis for exploiting information from several samples simultaneously, in contrast to spectral deconvolution, which is typically applied to one spectrum at a time.
DAEW04 31st August 2011
09:00 to 09:45
P Goos Optimal design of blocked and split-plot experiments for fixed-effects and variance-components estimation
Many industrial experiments, such as block experiments and split-plot experiments, involve one or more restrictions on the randomization. In these experiments the observations are obtained in groups. A key difference between blocked and split-plot experiments is that there are two sorts of factors in split-plot experiments. Some factors are held constant for all the observations within a group or whole plot, whereas others are reset independently for each individual observation. The former factors are called whole-plot factors, whereas the latter are referred to as sub-plot factors. Often, the levels of the whole-plot factors are, in some sense, hard to change, while the levels of the sub-plot factors are easy to change. D-optimal designs, which guarantee efficient estimation of the fixed effects of the statistical model that is appropriate given the random block or split-plot structure, have been constructed in the literature by various authors. However, in general, model estimation for block and split-plot designs requires the use of generalized least squares and the estimation of two variance components. We propose a new Bayesian optimal design criterion which does not just focus on fixed-effects estimation but also on variance-component estimation. A novel feature of the criterion is that it incorporates prior information about the variance components through log-normal or beta prior distributions. Finally, we also present an algorithm for generating efficient designs based on the new criterion. We implement several lesser-known quadrature approaches for the numerical approximation of the new optimal design criterion. We demonstrate the practical usefulness of our work by generating optimal designs for several real-life experimental scenarios.
DAEW04 31st August 2011
09:45 to 10:20
Split-Plot Experiments with Factor-Dependent Whole Plot Sizes
In industrial split-plot experiments, the number of runs within each whole plot is usually determined independently from the factor settings. As a matter of fact, it is often equal to the number of runs that can be done within a given period of time or to the number of samples that can be processed in one oven run or with one batch. In such cases, the size of every whole plot in the experiment is fixed no matter what factor levels are actually used in the experiment. In this talk, we discuss the design of a real-life experiment on the production of coffee cream where the number of runs within a whole plot is not fixed, but depends on the level of one of the whole-plot factors. We provide a detailed discussion of various ways to set up the experiment and discuss how existing algorithms to construct optimal split-plot designs can be modified for that purpose. We conclude with a few general recommendations.
DAEW04 31st August 2011
11:00 to 12:00
Panel Discussion: Future Directions for DOE
Five prominent researchers, each from a different background, will briefly describe some of the directions future research in design of experiments could take. Among the challenges addressed will be increasingly large and complex data sets, increased computing power and its impact on design and analysis and challenges arising from unexplored areas of application. The audience will be invited to contribute their own opinions.

Featuring Anthony Atkinson, Robert Wilkinson, John Stufken, David M. Steinberg and R. A. Bailey.

DAEW04 31st August 2011
12:00 to 12:20
Poster Storm
DAEW04 31st August 2011
14:00 to 14:30
H Dette Optimal designs, matrix polynomials and random matrices
In this talk we relate classical optimal design problems for weighted polynomial regression to random matrix theory. Exploring this relation, we are able to derive old and new results in this important field of mathematical physics. In particular, we study the asymptotic eigenvalue distribution of a generalized class of random band matrices.
DAEW04 31st August 2011
14:30 to 15:00
Nature-Inspired Metaheuristic Algorithms for Generating Optimal Experimental Designs
We explore a particle swarm optimization (PSO) method for finding optimal experimental designs. This method is relatively new, simple yet powerful and widely used in many fields to tackle real problems. The method does not assume the objective function to be optimized is convex or differentiable. We demonstrate using examples that once a given regression model is specified, the PSO method can generate many types of optimal designs quickly, including optimal minimax designs where effective algorithms to generate such designs remain elusive.
DAEW04 31st August 2011
15:00 to 15:30
D-optimal designs for Two-Variable Binary Logistic Models with Interaction
It is not uncommon for medical researchers to administer two drugs simultaneously to a patient and to monitor the response as binary, that is either positive or negative. Interest lies in the interaction of the drugs and specifically in whether that interaction is synergistic, antagonistic or simply additive. A number of statistical models for this setting have been proposed in the literature, some complex, but arguably the most widely used is the two-variable binary logistic model which can be formulated succinctly as ln(p/(1-p)) = beta0 + beta1 x1 + beta2 x2 + beta12 x1 x2 (*) where p is the probability of a positive response, x1 and x2 are the doses or log-doses of the drugs and beta0, beta1, beta2 and beta12 are unknown parameters. There is a broad base of research on the fitting, analysis and interpretation of this model but, somewhat surprisingly, few studies on the construction of the attendant optimal designs. In fact there are two substantive reports on this design problem, both unpublished, namely the Ph.D. thesis of Kupchak (2000) and the technical report of Jia and Myers (2001). In this talk the problem of constructing D-optimal designs for the model (*) is addressed. The approach builds on that of Jia and Myers (2001) with design points represented in logit space and lying on hyperbolae in that space. Algebraic results proved somewhat elusive and just two tentative propositions are given. To counter this, a taxonomy of designs, obtained numerically and dictated by the values of the unknown parameters, is also reported. This work forms part of the Ph.D. thesis of Kabera (2009) and is joint with Gaetan Kabera of the Medical Research Council of South Africa and Prince Ndlovu of the University of South Africa. References Jia Y. and Myers R.H. (2001). “Optimal Experimental Designs for Two-variable Logistic Regression Models.” Technical Report, Department of Statistics, VPI & SU, Blacksburg, Virginia. Kabera M.G. (2009). “D-optimal Designs for Drug Synergy.” Ph.D. thesis, University of KwaZulu-Natal. Kupchak P.I. (2000). “Optimal Designs for the Detection of Drug Interaction.” Ph.D. thesis, University of Toronto.
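For readers who wish to experiment with model (*), the sketch below (a minimal illustration with arbitrary local parameter guesses and an arbitrary candidate design, not one of the designs reported in the talk) evaluates the Fisher information matrix and the D-criterion under the two-variable logistic model with interaction.

    import numpy as np

    def info_matrix(doses, weights, beta):
        """Fisher information for logit(p) = b0 + b1*x1 + b2*x2 + b12*x1*x2
        evaluated at weighted design points."""
        x1, x2 = doses[:, 0], doses[:, 1]
        F = np.column_stack([np.ones(len(doses)), x1, x2, x1 * x2])
        eta = F @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = weights * p * (1.0 - p)          # logistic model weights
        return (F * w[:, None]).T @ F

    beta_guess = np.array([-1.0, 1.0, 1.0, 0.5])   # assumed local parameter values

    # A candidate 4-point design with equal weights (illustrative only).
    design_pts = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
    design_wts = np.full(4, 0.25)

    M = info_matrix(design_pts, design_wts, beta_guess)
    print("log det M (D-criterion):", np.log(np.linalg.det(M)))

A locally D-optimal design maximises this log-determinant over the design points and weights, and therefore depends on the assumed values in beta_guess.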
DAEW04 31st August 2011
16:00 to 16:30
Estimating the heterogeneity distribution of willingness-to-pay using individualized choice sets
Two prominent approaches exist nowadays for estimating the distribution of willingness-to-pay (WTP) based on choice experiments. One is to work in the usual preference space in which the random utility model is expressed in terms of partworths. These partworths or utility coefficients are estimated together with their distribution. The WTP and the corresponding heterogeneity distribution of WTP are derived from these results. The other approach reformulates the utility in terms of WTP (called WTP-space) and estimates the WTP and the heterogeneity distribution of WTP directly. Though often used, working in preference space has severe drawbacks as it often leads to WTP-distributions with long flat tails, infinite moments and therefore many extreme values.

By moving to WTP-space, authors have tried to improve the estimation of WTP and its distribution from a modeling perspective. In this paper we will further improve the estimation of individual level WTP and corresponding heterogeneity distribution by designing the choice sets more efficiently. We will generate individual sequential choice designs in WTP space. The use of this sequential approach is motivated by findings of Yu et al. (2011) who show that this approach allows for superior estimation of the utility coefficients and their distribution. The key feature of this approach is that it uses Bayesian methods to generate individually optimized choice sets sequentially based on prior information of each individual which is further updated after each choice made. Based on a simulation study in which we compare the efficiency of this sequential design procedure with several non-sequential choice designs, we can conclude that the sequential approach improves the estimation results substantially.

DAEW04 31st August 2011
16:30 to 17:00
Assessing the Efficiencies of Optimal Discrete Choice Experiments in the Presence of Respondent Fatigue

Discrete choice experiments are an increasingly popular form of marketing research due to the accessibility of on-line respondents. While statistically optimal experimental designs have been developed for use in discrete choice experiments, recent research has suggested that efficient designs often fatigue or burden the respondent to the point that decreased response rates and/or decreased response precision are observed. Our study was motivated by high early-termination rates for one such optimally-designed study.

In this talk, we examine the design of discrete choice experiments in the presence of respondent fatigue and/or burden. To do so, we propose a model that links the respondent's utility error variance to a function that accommodates respondent fatigue and burden. Based on estimates of fatigue and burden effects from our own work and published studies, we study the impact of these factors on the realized efficiencies of commonly-used D-optimal choice designs. The trade-offs between the number of surveys, the number of choice sets per survey, and the number of profiles per choice set are delineated.

DAEW04 31st August 2011
17:00 to 17:30
M Crabbe Improving the efficiency of individualized designs for the mixed logit model by including covariates
Conjoint choice experiments have become an established tool to get a deeper insight in the choice behavior of consumers. Recently, the discrete choice literature focused attention on the use of covariates like demographics, socio-economic variables or other individual-specific characteristics in design and estimation of discrete choice models, more specifically on whether the incorporation of such choice related respondent information aids in increasing estimation and prediction accuracy. The discrete choice model considered in this paper is the panel mixed logit model. This random-effects choice model accommodates preference heterogeneity and moreover, accounts for the correlation between individuals’ successive choices. Efficient choice data for the panel mixed logit model is obtained by individually adapted sequential Bayesian designs, which are customized to the specific preferences of a respondent, and reliable estimates for the model parameters are acquired by means of a hierarchical Bayes estimation approach. This research extends both experimental design and model estimation for the panel mixed logit model to include covariate information. Simulation studies of various experimental settings illustrate how the inclusion of influential covariates yields more accurate estimates for the individual parameters in the panel mixed logit model. Moreover, we show that the efficiency loss in design and estimation resulting from including choice unrelated respondent characteristics is negligible.
DAEW04 1st September 2011
09:00 to 09:30
Optimal designs for 2^k experiments with binary response
We consider optimal designs for an experiment with k qualitative factors at 2 levels each with binary response. For local D-optimality we derive theoretical results, propose algorithms to search for optimal designs and study properties of the algorithms. The robustness of optimal designs to specification of assumed parameter values is studied. We also briefly examine Bayesian optimal designs.
DAEW04 1st September 2011
09:30 to 10:00
D-optimal designs for multinomial experiments
Consider a multinomial experiment where the value of a response variable falls in one of k classes. The k classes are not assumed to have a hierarchical structure. Let p_ij represent the probability that the ith experimental unit gives a response that falls in the jth class. By modelling ln(p_ij/p_i1) as a linear function of the values of m predictor variables, we may analyse the results of the experiment using a Generalized Linear Model.

We describe the construction of D-optimal experimental designs for use in such an experiment. Difficulties in obtaining these designs will be described, together with attempts to overcome these obstacles.
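As a small numerical illustration of the model just described (made-up coefficients, not the authors' designs), the baseline-category logit probabilities can be computed as follows.

    import numpy as np

    def class_probabilities(x, B):
        """Baseline-category logit: ln(p_ij / p_i1) = x_i' b_j for classes j = 2..k,
        with class 1 as the baseline.  x has shape (m,), B has shape (m, k-1)."""
        eta = x @ B                                    # linear predictors for classes 2..k
        expo = np.concatenate(([1.0], np.exp(eta)))    # class 1 contributes exp(0) = 1
        return expo / expo.sum()

    # Hypothetical example: m = 2 predictors, k = 3 response classes.
    B = np.array([[0.8, -0.4],
                  [0.3,  1.2]])
    x = np.array([1.0, 0.5])
    print(class_probabilities(x, B))                   # probabilities sum to one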
DAEW04 1st September 2011
10:00 to 10:30
B Torsney Fitting Latent Variable Models for Paired Comparisons and Ranking Studies - An Application of Optimal Design Theory

In a paired comparisons experiment a subject has to indicate which of two 'treatments' Ti, Tj is preferred. We observe Oij, the frequency with which Ti is preferred to Tj in nij comparisons. Under a class of models for such data, which includes the Bradley-Terry and Thurstone models, P(Ti is preferred to Tj) = F(pi - pj), where F(.) is a symmetric distribution function and (pi) is a treatment index. For identifiability purposes constraints must be imposed on the parameters: one option is to constrain the pi to sum to one, where pi = ln(lambda_i) for positive treatment parameters lambda_i; an alternative is to constrain the lambda_i to sum to one. Thus theorems identifying optimal design weights and algorithms for determining them carry over to the maximum likelihood estimation of these parameters.

Of course these tools can also be used to determine locally optimal designs for such models.

We will explore this fusion of topics, taking the opportunity to expand on the class of models, both for simple paired comparisons data and also for data consisting of orderings or rankings. In particular we will exploit multiplicative algorithms for maximum likelihood estimation.
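As a concrete, if simplified, illustration of a multiplicative update of this kind (an MM-type fixed-point iteration on invented data, not necessarily the speaker's algorithm), the sketch below fits the Bradley-Terry member of the model class with the treatment parameters normalised to sum to one, exactly as optimal design weights are.

    import numpy as np

    # O[i, j] = number of times treatment i was preferred to treatment j (invented data).
    O = np.array([[0, 7, 8],
                  [3, 0, 6],
                  [2, 4, 0]], dtype=float)
    n = O + O.T                    # n[i, j] = total comparisons of i with j
    wins = O.sum(axis=1)           # total wins of each treatment

    pi = np.full(3, 1.0 / 3.0)     # start from equal "weights", constrained to sum to one
    for it in range(200):
        denom = np.array([sum(n[i, j] / (pi[i] + pi[j]) for j in range(3) if j != i)
                          for i in range(3)])
        pi_new = wins / denom      # multiplicative (MM) update for Bradley-Terry
        pi_new /= pi_new.sum()     # renormalise, as with optimal design weights
        if np.max(np.abs(pi_new - pi)) < 1e-10:
            pi = pi_new
            break
        pi = pi_new

    print("fitted Bradley-Terry weights:", np.round(pi, 4))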

DAEW04 1st September 2011
11:00 to 11:30
Bayesian Adaptive Design for State-space Models with Covariates
Modelling data that change over space and time is important in many areas, such as environmental monitoring of air and noise pollution using a sensor network over a long period of time. Often such data are collected dynamically together with the values of a variety of related variables. Due to resource limitations, an optimal choice (or design) for the locations of the sensors is important for achieving accurate predictions. This choice depends on the adopted model, that is, the spatial and temporal processes, and the dependence of the responses on relevant covariates. We investigate adaptive designs for state-space models where the selection of locations at time point $t_{n+1}$ draws on information gained from observations made at the locations sampled at preceding time points $t_1, \ldots, t_n$. A Bayesian design selection criterion is developed and its performance is evaluated using several examples.
DAEW04 1st September 2011
11:30 to 12:00
Design of Networked Experiments
We consider experiments on a number of subjects, and examine how the links between subjects in an experiment affect the optimal design. For example, in a marketing experiment, it is reasonable to believe that a product may be preferred more by a subject whose 'friend' also prefers that product, and we may wish to use this 'friendship' information to improve our design.

We present optimal designs to measure both the direct effect and the network effect. We discuss how the structure of the network has a large influence on the optimal design, but show that even if we know many properties of the network, as represented by the eigenvalues of a graph, we cannot determine an absolute design.

We present examples based on marketing experiments, and show how the results can be applied to experiments in social sciences and elsewhere.
DAEW04 1st September 2011
12:00 to 12:20
Poster Storm
DAEW04 1st September 2011
16:20 to 16:50
Trip to Rothamsted - The Development of Statistical Design Concepts at Rothamsted
Modern applied statistics began in 1919, when R.A. Fisher was appointed as the first statistician at Rothamsted. Experiments were already taking place of course, notably the Broadbalk long-term fertiliser trial of wheat at Rothamsted which had been running since 1843. However, concepts like replication, randomization, blocking and factorial structure were unknown – and it took some strong persuasion by Fisher before they became accepted.

Only very limited analysis options were available too, and again some of the concepts like degrees of freedom, or methodology like maximum likelihood, proved to be very controversial. Nevertheless, by the time that Fisher left Rothamsted in 1935, the foundations had been laid for very many of the design principles and statistical methods that we now take for granted.

In this talk I shall describe how some of the key ideas developed under the challenges from the biological research at Rothamsted, sketch out a few of the events and controversies, and indicate how some of the research threads have been continued by his successors.
DAEW04 2nd September 2011
09:00 to 09:30
On Minimal-Point Designs
A minimal-point design has a number of experimental runs equal to the number of parameters. This is the minimal effort possible to obtain unbiased estimates of all parameters. Some recent advances for minimal-point designs under various models will be discussed. Specifically, a new class of minimal-point designs robust to interactions for first-order models is proposed; a new class of minimal-point designs for definitive screening, making use of conference matrices, will be explored; and, if time permits, new minimal-point designs for full second-order response surface models will be discussed. A related issue, the construction of conference matrices and their applications in the design of experiments, will be introduced.
DAEW04 2nd September 2011
09:30 to 10:00
J Godolphin The Specification of Robust Binary Block Designs Against the Loss of Whole Blocks
A new criterion is suggested for measuring the robustness of a binary block design D against the loss of whole blocks, which is based on the concept of rectangle concurrencies.

This criterion is superior to the notion of minimal concurrence that has been employed previously and it enables improved conditions for classifying the robustness status of D to be derived.

It is shown that several classes of binary block designs, with the positive feature of possessing either partial balance or near balance, are maximally robust; thus expanding a classic result in the literature, known as Ghosh's theorem, that confines this status to balanced designs.

DAEW04 2nd September 2011
10:00 to 10:30
New Classes of Second-Order Equivalent-Estimation Split-Plot Designs
In many industrial experiments, complete randomization of the runs is impossible as, often, they involve factors whose levels are hard or costly to change. In such cases, the split-plot design is a cost-efficient alternative that reduces the number of independent settings of the hard-to-change factors. In general, the use of generalized least squares is required for model estimation based on data from split-plot designs. However, the ordinary least squares estimator is equivalent to the generalized least squares estimator for some split-plot designs, including some second-order split-plot response surface designs. These designs are called equivalent-estimation designs. An important consequence of the equivalence is that basic experimental design software can be used to analyze the data. We introduce two new families of equivalent-estimation split-plot designs, one based on subset designs and another based on a class of rotatable response surface designs constructed using supplementary difference sets. The resulting designs complement existing catalogs of equivalent-estimation designs and allow for a more flexible choice of the number of hard-to-change factors, the number of easy-to-change factors, the number and size of whole plots and the total sample size.
DAEW04 2nd September 2011
11:00 to 11:30
The algebraic method in statistics: Betti numbers and Alexander duality
After a brief review of the algebraic method in statistics, using G-bases, some newer results are described. The first relates the average degree concept to the Betti numbers of the monomial ideal of models. "Flatter" models in the sense of having lower degree are associated with more complex ideals having larger Betti numbers. The Alexander duality relates models and their complements within a factorial framework and leads to large classes of design for which it is straightforward to read off the model structure.
DAEW04 2nd September 2011
11:30 to 12:00
An Evolutionary Approach to Experimental Design for Combinatorial Optimization
In this presentation we investigate an approach which combines statistical methods and optimization algorithms in order to explore a large search space when the great number of variables and the economical constraints limit the ability of classical techniques to reach the optimum of a function. The method we propose - the Model Based Ant Colony Design (MACD) - couples real experimentation with simulated experiments and boosts an “Ant Colony” algorithm (Dorigo et al., 2004) by means of a simulator (strictly speaking an emulator), i.e. a predictive statistical model. Candidate solutions are generated by computer simulation using Ant Colony Optimization, a probabilistic technique for solving computational problems that consist in finding good paths through graphs, which is based on the foraging behaviour of real ants. The evaluation of the candidate solutions is achieved by physical experiments and is fed back into the simulative phase in a recursive way.

The properties of the proposed approach are studied by means of numerical simulations, testing the algorithm on some mathematical benchmark functions. Generation after generation, the evolving design requires a small number of experimental points to test, and consequently a small investment in terms of resources. Furthermore, since the research was inspired by a real problem in Enzyme Engineering and Design, namely finding a new enzyme with a specific biological function, we have tested MACD on the real application. The results show that the algorithm has explored a region of the sequence space not sampled by natural evolution, identifying artificial sequences that fold into a tertiary structure closely related to the target one.
DAEW04 2nd September 2011
12:00 to 12:30
A Boukouvalas Optimal Design under Heteroscedasticity for Gaussian Process Emulators with replicated observations
Computer models, or simulators, are widely used in a range of scientific fields to aid understanding of the processes involved and make predictions. Such simulators are often computationally demanding and are thus not amenable to statistical analysis. Emulators provide a statistical approximation, or surrogate, for the simulators accounting for the additional approximation uncertainty.

For random output, or stochastic, simulators the output dispersion, and thus variance, is typically a function of the inputs. This work extends the emulator framework to account for such heteroscedasticity by constructing two new heteroscedastic Gaussian process representations and proposes an experimental design technique to optimally learn the model parameters. The design criterion is an extension of Fisher information to heteroscedastic variance models. Replicated observations are efficiently handled in both the design and model inference stages. We examine the effect of such optimal designs on both model parameter uncertainty and predictive variance through a series of simulation experiments on both synthetic and real world simulators.
DAEW04 2nd September 2011
13:30 to 14:00
Optimal design of experiments with very low average replication
Trials of new crop varieties usually have very low average replication. Thus one possibility is to have a single plot for each new variety and several plots for a control variety, with the latter well spread out over the field. A more recent proposal is to ignore the control, and instead have two plots for each of a small proportion of the new varieties.

Variation in the field may be accounted for by a polynomial trend, by spatial correlation, or by blocking. However, if the experiment has a second phase, such as making bread from flour milled from the grain produced in the first phase, then that second phase usually has blocks. The optimality criterion used is usually the A criterion: the average variance of the pairwise differences between the new varieties. I shall compare designs under the A criterion when the average replication is much less than two.
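For orientation, the A-criterion referred to here can be evaluated directly for any candidate block design. The sketch below (with an invented toy design of six varieties in four blocks, average replication two, rather than any of the designs compared in the talk) computes the average variance of pairwise treatment differences from the information matrix.

    import numpy as np
    from itertools import combinations

    def pairwise_avg_variance(blocks, n_treatments):
        """Average variance (in units of sigma^2) of estimated pairwise treatment
        differences for a binary block design, via the information matrix
        C = diag(r) - N diag(1/k) N' and its Moore-Penrose inverse."""
        n_blocks = len(blocks)
        N = np.zeros((n_treatments, n_blocks))
        for b, trts in enumerate(blocks):
            for t in trts:
                N[t, b] += 1
        r = N.sum(axis=1)                       # treatment replications
        k = N.sum(axis=0)                       # block sizes
        C = np.diag(r) - N @ np.diag(1.0 / k) @ N.T
        Cp = np.linalg.pinv(C)
        pairs = list(combinations(range(n_treatments), 2))
        variances = [Cp[i, i] + Cp[j, j] - 2 * Cp[i, j] for i, j in pairs]
        return float(np.mean(variances))

    # Toy example: 6 treatments in 4 blocks of size 3, average replication 2.
    blocks = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]
    print("A-value:", round(pairwise_avg_variance(blocks, 6), 4))

In the setting of the talk the average would be restricted to pairs of new varieties, but the computation is the same.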
DAEW04 2nd September 2011
14:00 to 14:30
A Dean Screening strategies in the presence of interactions
Screening is the process of using designed experiments and statistical analyses to search through a large number of potentially influential factors in order to discover the few factors that have a substantial effect on a measured response (i.e. that are "active"). In this setting, conventional fractional factorial experiments typically require too many observations to be economically viable. To overcome this problem in practice, interactions are often dropped from consideration and assumed to be negligible, sometimes without substantive justification. Such loss of information can be a serious problem in industrial experimentation because exploitation of interactions is a key tool for product improvement. This talk describes an assessment and comparison of two screening strategies for interactions, namely supersaturated designs and group screening, together with a variety of data analysis methods, based on shrinkage regression and Bayesian methods. Recommendations on the use of the screening strategies are provided through simulation studies.
DAEW04 2nd September 2011
14:30 to 15:00
B Jones A Class of Three-Level Designs for Definitive Screening in the Presence of Second-Order Effects
Screening designs are attractive for assessing the relative impact of a large number of factors on a response of interest. Experimenters often prefer quantitative factors with three levels over two-level factors because having three levels allows for some assessment of curvature in the factor-response relationship. Yet, the most familiar screening designs limit each factor to only two levels. We propose a new class of designs that have three levels, provide estimates of main effects that are unbiased by any second-order effect, require only one more than twice as many runs as there are factors, and avoid confounding of any pair of second-order effects. Moreover, for designs having six factors or more, our designs allow for the estimation of the full quadratic model in any three factors. In this respect, our designs may render follow-up experiments unnecessary in many situations, thereby increasing the efficiency of the entire experimentation process.
DAEW04 2nd September 2011
15:00 to 15:30
A comparison of three Bayesian approaches for constructing model robust designs
While optimal designs are commonly used in the design of experiments, the optimality of those designs frequently depends on the form of an assumed model. Several useful criteria have been proposed to reduce such dependence, and efficient designs have then been constructed based on the criteria, often algorithmically. In the model robust design paradigm, a space of possible models is specified and designs are sought that are efficient for all models in the space. The Bayesian criterion given by DuMouchel and Jones (1994) posits a single model that contains both primary and potential terms. In this article we propose a new Bayesian model robustness criterion that combines aspects of both of these approaches. We then evaluate the efficacy of these three alternatives empirically. We conclude that the model robust criteria generally lead to improved robustness; however, the increased robustness can come at a significant cost in terms of computing requirements.
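For orientation, the DuMouchel and Jones (1994) criterion mentioned above can be written as maximising det(X'X + K/tau^2), where K is a diagonal matrix with zeros for primary terms and ones for potential terms and tau is the prior standard deviation of the potential-term coefficients. A minimal numerical sketch, with an arbitrary one-factor candidate design, follows.

    import numpy as np

    def bayesian_d_criterion(X, n_primary, tau=1.0):
        """DuMouchel-Jones style criterion |X'X + K / tau^2|, with K diagonal:
        0 for the first n_primary columns (primary terms), 1 for potential terms."""
        p = X.shape[1]
        K = np.diag([0.0] * n_primary + [1.0] * (p - n_primary))
        return np.linalg.det(X.T @ X + K / tau**2)

    # Illustrative design in one factor: primary terms (1, x), potential term x^2.
    x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
    X = np.column_stack([np.ones_like(x), x, x**2])
    print("Bayesian D-criterion value:", round(bayesian_d_criterion(X, n_primary=2), 4))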
DAEW04 2nd September 2011
16:00 to 16:30
B Rosenberger Sequential Monitoring of Randomization Tests
The U. S. Food and Drug Administration often requires a randomization-based analysis of the primary outcome in a clinical trial, which they sometimes refer to as "re-randomization tests" (we prefer "randomization tests"). Conditional inference is inherently difficult when using a Monte Carlo approach to "re-randomize", and is impossible using standard techniques for some randomization procedures. We describe a new approach by deriving the exact conditional distribution of the randomization procedure and then using Monte Carlo to generate sequences directly from the conditional reference set. We then extend this technique to sequential monitoring, by computing the exact joint distribution of sequentially-computed conditional randomization tests. This allows for a spending-function approach using randomization tests instead of population-based tests. Defining information under a randomization model is tricky, and we describe various ways to "estimate" information using the exact conditional variance of the randomization test statistics.
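The Monte Carlo flavour of an unconditional randomization test can be illustrated in a few lines; the sketch below uses complete randomisation, a difference-in-means statistic and simulated data, and does not attempt the conditional or sequential machinery described in the talk.

    import numpy as np

    rng = np.random.default_rng(42)

    # Simulated trial data: treatment indicator (1 = treatment, 0 = control) and outcomes.
    assignment = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
    outcome = rng.normal(0.0, 1.0, size=assignment.size) + 0.8 * assignment

    def test_statistic(y, z):
        return y[z == 1].mean() - y[z == 0].mean()

    observed = test_statistic(outcome, assignment)

    # Monte Carlo randomization test: re-randomize under the same procedure
    # (here, a fixed number of treated subjects) and recompute the statistic.
    n_mc = 10000
    count = 0
    for _ in range(n_mc):
        perm = rng.permutation(assignment)
        if abs(test_statistic(outcome, perm)) >= abs(observed):
            count += 1

    print("Monte Carlo randomization p-value:", (count + 1) / (n_mc + 1))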
DAEW04 2nd September 2011
16:30 to 17:00
Optimising the allocation of participants in a two-stage randomised experiment to estimate selection, preference and treatment effects
Experimental outcomes may be affected by the choice of treatment that participants might make (if they were indeed allowed to choose), a so-called selection effect, and by whether they actually receive their preferred treatment, a so-called preference effect. Selection and preference effects can be important (possibly even larger than the usual treatment effect), but they cannot be estimated in conventional randomised experimental designs.

An alternative approach is the two-stage randomised design, in which participants are first randomly divided into two subgroups. In one subgroup, participants are randomly assigned to treatments, while in the other, participants are allowed to choose their own treatment. This approach can yield estimates of the direct treatment effect, and of the preference and selection effects. The latter two provide insight that goes considerably beyond what is possible in standard randomised experiments, notably the usual parallel group design.

In this presentation, we will consider the optimal proportion of participants who should be allocated to the choice subgroup and allowed to determine their own treatment. The precisions of the estimated selection, preference and treatment effects are functions of: the total sample size; the proportion of participants allocated to choose their treatment; the variances of the response (or outcome); the proportions of participants who select each treatment in the choice group; and the selection, preference and treatment effects themselves. We develop general expressions for the optimum proportion of participants in the choice group, depending on the inverses of these variances, and on which effects are of primary interest. We illustrate the results with trial data comparing alternative clinical management strategies for women with abnormal results on cervical screening.
DAEW05 5th September 2011
10:00 to 10:20
T Santner Conference Organisation and Goals
DAEW05 5th September 2011
10:20 to 10:45
Saving Millions (lives and money) with Simulation Experiments
There are fundamental differences between real]world and computer simulation experiments. In simulations, an experimenter has full control over everything ] time, uncertainty, causality, structure, environment, etc. This talk demonstrates how synergies in simulation experiments, models, codes, and analysis sometimes make these indistinguishable. This motivates us to revise two famous quotes about statistical modeling and experimental design in the simulation context. George E. P. Box, one of the most prolific statisticians in recent history, is attributed with the quote gAll models are wrong; some are usefulh. A simulation model and experiment that is designed for a specific analytical purpose may be both more wrong and more useful. Sir R. A. Fisher, credited as the creator of statistical experimental design, is often quoted as saying, gThe best time to design an experiment is after youfve run it.h The best time to design a simulation model and experiment is often while youfre running it. Examples are presented where analysis]specific simulations and simulation]specific experiments have greatly improved the production and distribution of life]saving biopharmaceuticals. However, the methods apply more generally to production and service systems.
DAEW05 5th September 2011
11:30 to 11:55
Emulating the Nonlinear Matter Power Spectrum for the Universe
DAEW05 5th September 2011
12:10 to 12:30
Poster Preview (All Poster Presenters)
DAEW05 5th September 2011
14:00 to 14:30
Design for Variation
DAEW05 5th September 2011
14:30 to 15:00
Design and analysis of variable fidelity multi-objective experiments
At the core of any design process is the need to predict performance and vary designs accordingly. Performance prediction may come in many forms, from back-of-envelope through high fidelity simulations to physical testing. Such experiments may be one- or two-dimensional simplifications and may include all or some environmental factors. Traditional practice is to increase the fidelity and expense of the experiments as the design progresses, superseding previous low-fidelity results. However, by employing a surrogate modelling approach, all results can contribute to the design process. This talk presents the use of nested space filling experimental designs and a co-Kriging based multi-objective expected improvement criterion to select Pareto optimal solutions. The method is applied to the design of an unmanned air vehicle wing and the rear wing of a race-car.
DAEW05 5th September 2011
15:30 to 16:00
The Design of Validation Experiments
When we build an emulator to analyse a computer experiment the normal practice is to use a two stage approach to design. An initial space filling design is used to build the emulator. A second experiment is then carried out to validate the emulator. In this paper I will consider what form this validation experiment should take. Current practice is to use another space filling design, unrelated to the first. Clearly we want our validation design to be space filling: we want to validate the emulator everywhere. But we might have other criteria as well for validation. For instance we might want to make sure that we validate our estimate of the spatial scales, in which case we want our validation design to include points at varying distances from our original design.
DAEW05 5th September 2011
16:00 to 16:30
Statistical calibration of complex computer models: A case study with the Lyon-Fedder-Mobarry model of the magnetosphere
The magnetosphere is the region of the Earth's magnetic field that forms a protective bubble which impedes the transfer of energy and momentum from solar wind plasma. The Lyon-Fedder-Mobarry model is used for coupled magnetosphere-ionosphere simulation and to model the effect of electron storms on the upper atmosphere. This model is generally run in a high-performance computing environment and the output represents a bivariate spatial-temporal field. In this work, we outline an approach for calibrating this computer model and quantifying the uncertainty in the calibration parameters that combines the computationally expensive but sparser high-resolution output with the lower fidelity but computationally inexpensive low-resolution output.
DAEW05 5th September 2011
16:30 to 17:00
Emulating complex codes: The implications of using separable covariance functions
Emulators are crucial in experiments where the computer code is sufficiently expensive that the ensemble of runs cannot span the parameter space. In this case they allow the ensemble to be augmented with additional judgements concerning smoothness and monotonicity. The emulator can then replace the code in inferential calculations, but in my experience a more important role for emulators is in trapping code errors.

The theory of emulation is based around the construction of a stochastic process prior, which is then updated by conditioning on the runs in the ensemble. Almost invariably, this prior contains a component with a separable covariance function. This talk considers exactly what this separability implies for the nature of the underlying function. The strong conclusion is that processes with separable covariance functions are second-order equivalent to the product of second-order uncorrelated processes.

This is an alarmingly strong prior judgement about the computer code, ruling out interactions. But, like the property of stationarity, it does not survive the conditioning process. The cautionary response is to include several regression terms in the emulator prior.
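The separability property discussed here is easy to check numerically: on a full grid of inputs, a separable covariance function yields a covariance matrix equal to the Kronecker product of the one-dimensional covariance matrices. The sketch below verifies this for squared-exponential components with illustrative length-scales.

    import numpy as np

    def sq_exp(a, b, lengthscale):
        """One-dimensional squared-exponential covariance matrix."""
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / lengthscale) ** 2)

    x1 = np.linspace(0.0, 1.0, 4)
    x2 = np.linspace(0.0, 1.0, 3)

    # Separable covariance evaluated directly on the full grid ...
    grid = np.array([(a, b) for a in x1 for b in x2])
    K_full = np.array([[np.exp(-0.5 * ((p[0] - q[0]) / 0.3) ** 2)
                        * np.exp(-0.5 * ((p[1] - q[1]) / 0.7) ** 2)
                        for q in grid] for p in grid])

    # ... equals the Kronecker product of the one-dimensional matrices.
    K_kron = np.kron(sq_exp(x1, x1, 0.3), sq_exp(x2, x2, 0.7))
    print(np.allclose(K_full, K_kron))   # True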
DAEW05 5th September 2011
17:15 to 18:15
A Short Overview of Orthogonal Arrays
Combinatorial arrangements now known as orthogonal arrays were introduced for use in statistics in the 1940's. The primary purpose for their introduction was to guide the selection of level combinations in a fractional factorial experiment, and this is still an important reason for their interest in statistics. Various criteria based on statistical properties have been introduced over the years to distinguish between different orthogonal arrays of the same size, and some authors have attempted to enumerate all non-isomorphic arrays of small sizes. Orthogonal arrays also possess interesting relationships to several other combinatorial arrangements, including to error-correcting codes and Hadamard matrices. In this talk, aimed at a general mathematical audience, we will present a brief and selective overview of orthogonal arrays, including their existence, construction, and relationships to other arrangements.
DAEW05 6th September 2011
09:00 to 09:30
Construction of orthogonal and nearly orthogonal Latin hypercube designs for computer experiments
We present a method for constructing good designs for computer experiments. The method derives its power from its basic structure that builds large designs using small designs. We specialize the method for the construction of orthogonal Latin hypercubes and obtain many results along the way. In terms of run sizes, the existence problem of orthogonal Latin hypercubes is completely solved. We also present an explicit result showing how large orthogonal Latin hypercubes can be constructed using small orthogonal Latin hypercubes. Another appealing feature of our method is that it can easily be adapted to construct other designs. We examine how to make use of the method to construct nearly orthogonal Latin hypercubes.
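For readers unfamiliar with the objects being constructed, the sketch below generates a plain random Latin hypercube design and reports its largest column correlation; the orthogonal constructions described in the talk are far more structured, so this is only a baseline illustration of what 'orthogonality' is measuring.

    import numpy as np

    def random_latin_hypercube(n_runs, n_factors, rng):
        """Each column is a random permutation of centred levels, scaled to [-1, 1]."""
        levels = (np.arange(n_runs) + 0.5) / n_runs * 2.0 - 1.0
        return np.column_stack([rng.permutation(levels) for _ in range(n_factors)])

    rng = np.random.default_rng(3)
    D = random_latin_hypercube(n_runs=16, n_factors=4, rng=rng)

    # Orthogonality is measured by the off-diagonal column correlations:
    corr = np.corrcoef(D, rowvar=False)
    off_diag = corr[~np.eye(4, dtype=bool)]
    print("max |column correlation|:", round(np.abs(off_diag).max(), 3))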
DAEW05 6th September 2011
09:30 to 10:00
Weighted space-filling designs
We investigate methods of incorporating known dependencies between variables into design selection for computer codes. Adaptations of computer-generated coverage and spread designs are considered, with “distance” between two input points redefined to include a weight function. The methods can include quantitative and qualitative variables, and different types of prior information. They are particularly appropriate for computer codes where there may be large areas of the design space in which it is not scientifically useful or relevant to take an observation. The different approaches are demonstrated through interrogation of a computer model for atmospheric dispersion.
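One simple way to realise the weighting idea (an illustration only, not the coverage or spread algorithms of the talk) is a greedy maximin selection in which the distance between two candidate points is multiplied by a weight function evaluated at both points; the weight function below is hypothetical.

    import numpy as np

    rng = np.random.default_rng(7)
    candidates = rng.uniform(0.0, 1.0, size=(500, 2))

    def weight(x):
        """Hypothetical relevance weight: down-weight a corner of the input space."""
        return np.exp(-3.0 * np.linalg.norm(x - np.array([1.0, 1.0]), axis=-1))

    def weighted_dist(x, y):
        """One plausible weighted distance: Euclidean distance scaled by both weights."""
        return np.linalg.norm(x - y) * weight(x) * weight(y)

    def greedy_weighted_maximin(cands, n_points):
        chosen = [int(np.argmax(weight(cands)))]            # start at the heaviest point
        for _ in range(n_points - 1):
            min_d = np.array([min(weighted_dist(c, cands[j]) for j in chosen)
                              for c in cands])
            min_d[chosen] = -np.inf                          # do not pick a point twice
            chosen.append(int(np.argmax(min_d)))
        return cands[chosen]

    design = greedy_weighted_maximin(candidates, 10)
    print(design.round(2))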
DAEW05 6th September 2011
10:00 to 10:30
B Iooss Space filling designs for computer experiments: some algorithms and numerical results on industrial problems
Complex computer codes, for instance those simulating physical phenomena, are often too time-expensive to be used directly to perform uncertainty, sensitivity, optimization and robustness analyses. A widely accepted method to circumvent this problem consists in replacing CPU-expensive computer models by inexpensive mathematical functions, called metamodels. A necessary condition for successful modelling of these computer experiments is to explore the whole space of variations of the computer model input variables. However, in many industrial applications we are faced with the harsh problem of the high dimensionality of the exploration space. In this communication, we will first focus on the metamodel validation process, which consists in evaluating the metamodel predictivity with respect to the initial computer code. This step has to be carried out with caution and robustness in industrial applications, especially in the framework of safety studies.

We propose and test an algorithm which optimizes the distance between the validation points and the metamodel learning points in order to estimate the true metamodel predictivity with a minimum number of validation points. Comparisons with classical validation algorithms and an application to a nuclear safety computer code show the relevance of this sequential validation design. Second, we will present some recent results about the properties of different space-filling designs. In practice, one has to choose which design to use in an exploratory phase of a numerical model. We will show the usefulness of some classification tools, such as those based on minimal spanning trees. We adopt a numerical approach to compare the performance of different types of space-filling designs as a function of their interpoint distances, L2-discrepancies and various sub-projection properties. Finally, we will present two recent problems posed by industrial applications: the introduction of inequality constraints between the inputs of a space-filling design, and the building of space-filling designs mixing quantitative and qualitative factors.
DAEW05 6th September 2011
11:30 to 12:00
Orthogonal nearly Latin hypercubes
DAEW05 6th September 2011
12:00 to 12:30
Sequential screening with elementary effects
The Elementary Effects (EE) method (Morris, 1991) is a simple but effective screening strategy. Starting from a number of initial points, the method creates random trajectories to then estimate factor effects. In turn, those estimates are used for factor screening. Recent research advances (Campolongo et al., 2004,2006) have enhanced the performance of the elementary effects method and the projections of the resulting design (Pujol, 2008). The presentation concentrates on a proposal (Boukouvalas et al., 2011) which turns the elementary effects method into a sequential design strategy. After describing the methodology, some examples are given and compared against the traditional EE method.
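For reference, the basic elementary-effects computation from random trajectories looks roughly as follows (a minimal sketch on a toy function with assumed settings; the sequential strategy of the talk builds on, but is not shown by, this).

    import numpy as np

    def morris_trajectory(k, levels, rng):
        """One random trajectory: a base grid point plus k steps, each moving one factor by +/- delta."""
        delta = levels / (2.0 * (levels - 1))
        base = rng.integers(0, levels, size=k) / (levels - 1)   # grid points in [0, 1]
        order = rng.permutation(k)
        traj, x = [base.copy()], base.copy()
        for i in order:
            x = x.copy()
            x[i] += delta if x[i] + delta <= 1.0 else -delta    # stay inside [0, 1]
            traj.append(x)
        return np.array(traj), order

    def elementary_effects(f, traj, order):
        """EE_i = change in f when factor i moves, divided by the signed step actually taken."""
        y = np.array([f(x) for x in traj])
        ee = np.empty(len(order))
        for step, i in enumerate(order):
            ee[i] = (y[step + 1] - y[step]) / (traj[step + 1, i] - traj[step, i])
        return ee

    def toy_model(x):                                           # illustrative test function
        return 2.0 * x[0] + 0.5 * x[1] ** 2 + 0.1 * x[2] + x[0] * x[1]

    rng = np.random.default_rng(11)
    all_ee = np.array([elementary_effects(toy_model, *morris_trajectory(3, 4, rng))
                       for _ in range(10)])                     # 10 random trajectories
    print("mean |EE| per factor:", np.abs(all_ee).mean(axis=0).round(2))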
DAEW05 6th September 2011
14:00 to 15:00
Panel: Challenges When Interfacing Physical Experiments and Computer Models
DAEW05 6th September 2011
15:30 to 16:00
Batch sequential experimental designs for computer experiments
Finding optimal designs for computer experiments that are modeled using a stationary Gaussian Stochastic Process (GaSP) model is challenging because optimality criteria are usually functions of the unknown model parameters. One popular approach is to adopt sequential strategies. These have been shown to be very effective when the optimality criterion is formulated as an expected improvement function. Most of these sequential strategies assume observations are taken sequentially one at a time. However, when observations can be taken k at a time, it is not obvious how to implement sequential designs. We discuss the problems that can arise when implementing batch sequential designs and present several strategies for sequential designs taking observations in k-at-a-time batches. We illustrate these strategies with examples.
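The expected improvement function mentioned above has the familiar closed form EI(x) = (fmin - mu(x)) Phi(z) + sigma(x) phi(z), with z = (fmin - mu(x))/sigma(x), for minimisation under a Gaussian predictive distribution. A minimal sketch is given below; the batch strategies discussed in the talk are not reproduced.

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, f_min):
        """EI for minimisation, given GP predictive mean mu and std sigma at candidate points."""
        mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
        ei = np.zeros_like(mu)
        pos = sigma > 0
        z = (f_min - mu[pos]) / sigma[pos]
        ei[pos] = (f_min - mu[pos]) * norm.cdf(z) + sigma[pos] * norm.pdf(z)
        return ei

    # Hypothetical predictions at three candidate inputs, current best observed value 1.2.
    print(expected_improvement(mu=[1.0, 1.5, 0.9], sigma=[0.3, 0.8, 0.05], f_min=1.2))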
DAEW05 6th September 2011
16:00 to 16:30
Bridge Designs for Modeling Systems with Small Error Variance
A necessary characteristic of designs for deterministic computer simulations is that they avoid replication. This characteristic is also necessary for one-dimensional projections of the design, since it may turn out that only one of the design factors has any non-negligible effect on the response. Latin Hypercube designs have uniform one-dimensional projections but are not efficient for fitting low-order polynomials when there is a small error variance. D-optimal designs are very efficient for polynomial fitting but have substantial replication in projections. We propose a new class of designs that bridge the gap between Latin Hypercube designs and D-optimal designs. These designs guarantee a minimum distance between points in any one-dimensional projection. Subject to this constraint they are D-optimal for any pre-specified model.
DAEW05 6th September 2011
16:30 to 17:00
Non-collapsing Space-filling Designs for Bounded Regions
Many physical processes can be described via mathematical models implemented as computer codes. Since a computer code may take hours or days to produce a single output, a cheaper surrogate model (emulator) may be fitted for exploring the region of interest. The performance of the emulator depends on the "space-filling" properties of the design; that is, how well the design points are spread throughout the experimental region. The output from many computer codes is deterministic, in which case no replications are required at, or near, any design point to estimate error variability. In addition, designs that do not replicate any value for any single input ("non-collapsing" designs) are the most efficient when one or more inputs turn out to have little effect on the response. An algorithm is described for constructing space-filling and non-collapsing designs for computer experiments when the input region is bounded.
DAEW05 7th September 2011
09:30 to 09:40
A Visitor's Guide to Experiment Design for Dynamic Discrete-Event Stochastic Simulation
DAEW05 7th September 2011
09:40 to 10:00
YJ Son Distributed Federation of Multi-paradigm Simulations and Decision Models for Planning and Control
In this talk, we first discuss simulation-based shop floor planning and control, where 1) on-line simulation is used to evaluate decision alternatives at the planning stage, 2) the same simulation model (executing in fast mode) used at the planning stage is used as a real-time task generator (real-time simulation) during the control stage, and 3) the real-time simulation drives the manufacturing system by sending messages to and receiving messages from an executor. We then discuss how simulation-based shop floor planning and control can be extended to enterprise-level activities (top floor). To this end, we discuss the analogies between the shop floor and top floor in terms of the components required to construct simulation-based planning and control systems, such as resource models, coordination models, physical entities, and simulation models. Differences between them are also discussed in order to identify new challenges that we face for top floor planning and control. A major difference is the way a simulation model is constructed so that it can be used for planning, depending on whether time synchronization among member simulations becomes an issue or not. We also discuss the distributed computing platform using web services and grid computing technologies, which allows us to integrate simulation and decision models, and software and hardware components. Finally, we discuss the DDDAMS (Dynamic Data-Driven Adaptive Multi-Scale Simulation) framework, where the aim is to augment the validity of simulation models in the most economical way by incorporating dynamic data into the executing model and by letting the executing model steer the measurement process for selective data updates.
DAEW05 7th September 2011
10:00 to 10:30
M Marathe Policy Informatics for Co-evolving Socio-technical Networks: Issues in Believability and Usefulness
The talk outlines a high-resolution interaction-based approach to support policy informatics for large co-evolving socio-technical networks. Such systems consist of a large number of interacting physical, technological, and human/societal components. Quantitative changes in HPC, including faster machines and service-oriented software, have created qualitative changes in the way information can be integrated into the analysis of these large heterogeneous systems and into supporting policy makers as they consider the pros and cons of various decision choices.

Agent-oriented simulation is an example of an interaction based computational technique useful for reasoning about biological, information and social networks. Developing scalable models raises important computational and conceptual issues, including computational efficiency, necessity of detailed representation and uncertainty quantification.

The talk will describe the development of a high-performance-computing-based crisis management system called the Comprehensive National Incident Management System (CNIMS). As an illustrative case study we will describe how CNIMS can be used for developing a scalable computer-assisted decision support system for pandemic planning and response. We will conclude by discussing challenging validation and verification issues that arise when developing such models.
DAEW05 7th September 2011
11:30 to 12:00
Enhancing Stochastic Kriging Metamodels with Stochastic Gradient Estimators
Stochastic kriging is the natural extension of kriging metamodels for the design and analysis of computer experiments to the design and analysis of stochastic simulation experiments where response variance may differ substantially across the design space. In addition to estimating the mean response, it is sometimes possible to obtain an unbiased or consistent estimator of the response-surface gradient from the same simulation runs. However, like the response itself, the gradient estimator is noisy. In this talk we present methodology for incorporating gradient estimators into response surface prediction via stochastic kriging, evaluate its effectiveness in improving prediction, and specifically consider two gradient estimators: the score function/likelihood ratio method and infinitesimal perturbation analysis.
DAEW05 7th September 2011
12:00 to 12:30
Simulation optimization via bootstrapped Kriging: survey
This presentation surveys simulation optimization via Kriging (also called Gaussian Process or spatial correlation) metamodels. These metamodels may be analyzed through bootstrapping, which is a versatile statistical method but must be adapted to the specific problem being analyzed. More precisely, a random or discrete-event simulation may be run several times for the same scenario (combination of simulation input values); the resulting replicated responses may be resampled with replacement, which is called "distribution-free bootstrapping". In engineering, however, deterministic simulation is often applied; such a simulation is run only once for the same scenario, so "parametric bootstrapping" is used. This bootstrapping assumes a multivariate Gaussian distribution, which is sampled after its parameters are estimated from the simulation input/output data. More specifically, this talk covers the following recent approaches: (1) Efficient Global Optimization (EGO) via Expected Improvement (EI) using parametric bootstrapping to obtain an estimator of the Kriging predictor's variance accounting for the randomness resulting from estimating the Kriging parameters. (2) Constrained optimization via Mathematical Programming applied to Kriging metamodels using distribution-free bootstrapping to validate these metamodels. (3) Robust optimization accounting for an environment that is not exactly known (so it is uncertain); this optimization may use Mathematical Programming and Kriging with distribution-free bootstrapping to estimate the Pareto frontier. (4) Bootstrapped Kriging may preserve a characteristic such as monotonicity of the outputs as a function of the inputs.
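The distribution-free bootstrapping step described above can be illustrated with a minimal sketch (the data are simulated stand-ins and the subsequent Kriging fit is omitted; this is not the speaker's code).

```python
# Hedged sketch of "distribution-free bootstrapping": for each simulated scenario,
# resample its replicated outputs with replacement and average them, yielding one
# bootstrapped response per scenario (which would then feed a Kriging fit).
import numpy as np

def distribution_free_bootstrap(replicates, n_boot=200, seed=0):
    """replicates: list of 1-D arrays, one per scenario (possibly unequal lengths)."""
    rng = np.random.default_rng(seed)
    boot_means = np.empty((n_boot, len(replicates)))
    for b in range(n_boot):
        for i, r in enumerate(replicates):
            boot_means[b, i] = rng.choice(r, size=len(r), replace=True).mean()
    return boot_means   # each row is one bootstrapped set of per-scenario responses

reps = [np.random.default_rng(i).normal(loc=i, scale=1 + i, size=10) for i in range(5)]
bm = distribution_free_bootstrap(reps)
print(bm.mean(axis=0), bm.std(axis=0))   # bootstrap estimate of per-scenario mean and its s.e.
```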
DAEW05 8th September 2011
09:30 to 10:00
SG Henderson Input Uncertainty: Examples and Philosophies
I will present two examples that point to the importance of appreciating input uncertainty in simulation modeling. Both examples are set in the context of optimization problems where the objective function is estimated (with noise) through discrete-event simulation. I will then discuss some modeling philosophies adopted by the deterministic optimization research community in the context of input uncertainty, with the goal of identifying some modeling philosophies that might be appropriate for simulation optimization.
DAEW05 8th September 2011
10:00 to 10:30
Some Challenges with Input Uncertainty
In this short presentation, I will summarize several recent applications from various parts of health care that highlight interesting challenges with modeling and input uncertainty. Beyond the usual challenges associated with Bayesian model averaging-type approaches for uncertainty about statistical input parameters, further challenges will be identified. They include the difficulty that some decision makers have in thinking about data for “input” parameters when “output” data are easier to observe, the fact that data on intermediate or surrogate endpoints may be easier or less expensive to collect, and the possibility that the structure of the model that translates inputs to outputs is itself uncertain. These examples and challenges have a number of implications, many not yet adequately addressed, for policy decisions, the sensitivity of decisions to input uncertainty, the prioritization of data collection to reduce input uncertainty effectively, when to stop learning and make a system design decision, and how to model extreme events (e.g., heavy tails) that may imply the nonexistence of certain moments of interest. We will review a number of these as time permits.
DAEW05 8th September 2011
10:30 to 11:00
Screening for Important Inputs by Bootstrapping
We consider resampling techniques in two contexts involving uncertainty in input modelling. The first concerns the fitting of input models to input data. This is a problem of estimation and can be treated either parametrically or non-parametrically. In either case the problem of assessing uncertainty in the fitted input model arises. We discuss how resampling can be used to deal with this. The second problem concerns the situation where the simulation output depends on a large number of input variables and the problem is to identify which input variables are important in influencing output behaviour. Again we discuss how resampling can be used to handle this problem. An interesting aspect of both problems is that the replications used in the resampling are mutually independent. This means that greatly increased processing speed is possible if replications can be carried out in parallel. Recent developments in computer architecture make parallel implementation much more readily available. This has a particularly interesting consequence for handling input uncertainty when simulation is used in real-time decision taking, where processing speed is paramount. We discuss this aspect especially in the context of real-time system improvement, if not real-time optimization.
DAEW05 8th September 2011
11:30 to 12:00
Metamodels and the Bootstrap for Input Model Uncertainty Analysis
The distribution of simulation output statistics includes variation from the finiteness of the samples used to construct input probability models. Metamodels and bootstrapping provide a way to characterize this error. The metamodel-fitting experiment benefits from a sequential design strategy. We describe the elements of such a strategy, and show how they impact performance.
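The basic mechanism behind this kind of input-uncertainty analysis can be sketched as follows (a toy stand-in for the simulation and the input data; the metamodel-fitting experiment itself is not reproduced here).

```python
# Hedged sketch: resample the real-world input data, refit the input distribution
# each time, run the simulation (here a cheap stand-in) with the refitted
# parameters, and measure how much output spread is due to input-model estimation error.
import numpy as np

rng = np.random.default_rng(42)
observed_service_times = rng.exponential(scale=2.0, size=50)   # finite real-world sample

def simulation(mean_service, n_customers=1000, rng=None):
    """Stand-in for a stochastic simulation: a crude sojourn-time-like summary."""
    s = rng.exponential(scale=mean_service, size=n_customers)
    return s.mean() + 0.5 * s.var()

outputs = []
for _ in range(200):
    boot = rng.choice(observed_service_times, size=len(observed_service_times), replace=True)
    outputs.append(simulation(boot.mean(), rng=rng))       # refit = plug in bootstrapped mean
print(f"output spread due to input-model uncertainty: sd = {np.std(outputs):.3f}")
```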
DAEW05 8th September 2011
12:00 to 12:30
R V Joseph Coupled Gaussian Process Models
Gaussian Process (GP) models are commonly employed in computer experiments for modeling deterministic functions. The model assumes second-order stationarity and therefore the predictions can become poor when such assumptions are violated. In this work, we propose a more accurate approach by coupling two GP models together, which incorporates non-stationarity in both the mean and the variance. It gives better predictions when the experimental design is sparse and can also improve the prediction intervals by quantifying the change of local variability associated with the response. Advantages of the new predictor are demonstrated using several examples from the literature.
DAEW05 8th September 2011
14:00 to 14:30
Assessing simulator uncertainty using evaluations from several different simulators
Any simulator-based prediction must take account of the discrepancy between the simulator and the underlying system. In physical systems, such as climate, this discrepancy has a complex, unknown structure that makes direct elicitation very demanding. Here, we propose a fundamentally different framework to that currently in use and consider the information in a collection of simulator evaluations, known as a Multi-Model Ensemble (MME). We justify our approach in terms of its transparency, tractability, and consistency with standard practice in, say, Climate Science. The statistical modelling framework is that of second-order exchangeability, within a Bayes linear treatment. We apply our methods to a reconstruction of boreal winter surface temperature.
DAEW05 8th September 2011
14:30 to 15:00
Constrained Optimization and Calibration for Deterministic and Stochastic Simulation Experiments
Optimization of the output of computer simulators, whether deterministic or stochastic, is a challenging problem because of the typical severe multimodality. The problem is further complicated when the optimization is subject to unknown constraints, those that depend on the value of the output, so the function must be evaluated in order to determine if the constraint has been violated. Yet, even an invalid response may still be informative about the function, and thus could potentially be useful in the optimization. We develop a statistical approach based on Gaussian processes and Bayesian learning to approximate the unknown function and to estimate the probability of meeting the constraints, leading to a sequential design for optimization and calibration.
DAEW05 8th September 2011
15:30 to 17:00
JPC Kleijnen & S Chick & S Sanchez Panel: Input uncertainty and experimental robustness
DAEW05 9th September 2011
09:00 to 09:30
Interpolation of Deterministic Simulator Outputs using a Gaussian Process Model
For many expensive deterministic computer simulators, the outputs do not have replication error and the desired metamodel (or statistical emulator) is an interpolator of the observed data. Realizations of Gaussian spatial processes (GP) are commonly used to model such simulator outputs. Fitting a GP model to n data points requires the computation of the inverse and determinant of n x n correlation matrices, R, that are sometimes computationally unstable due to near-singularity of R. This happens if any pair of design points are very close together in the input space. The popular approach to overcome near-singularity is to introduce a small nugget (or jitter) parameter in the model that is estimated along with other model parameters. The inclusion of a nugget in the model often causes unnecessary over-smoothing of the data. In this talk, we present a lower bound on the nugget that minimizes the over-smoothing and an iterative regularization approach to construct a predictor that further improves the interpolation accuracy. We also show that the proposed predictor converges to the GP interpolator.
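To illustrate the underlying numerical issue, here is a sketch (not the authors' bound) that chooses the smallest nugget bringing the condition number of the correlation matrix below a target; the correlation function, design and target are illustrative assumptions.

```python
# Illustrative sketch: pick the smallest nugget delta such that cond(R + delta*I)
# falls below a target, keeping R numerically invertible while smoothing as little
# as possible. R is a Gaussian correlation matrix on a toy one-dimensional design.
import numpy as np

def gaussian_corr(X, theta=10.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-theta * d2)

def minimal_nugget(R, target_cond=1e8):
    lam = np.linalg.eigvalsh(R)                     # eigenvalues of symmetric R, ascending
    lam_min, lam_max = lam[0], lam[-1]
    if lam_max / max(lam_min, 1e-300) <= target_cond:
        return 0.0
    # cond(R + dI) = (lam_max + d) / (lam_min + d) = target_cond  =>  solve for d
    return (lam_max - target_cond * lam_min) / (target_cond - 1.0)

rng = np.random.default_rng(3)
X = np.sort(rng.random(30))[:, None]
R = gaussian_corr(X)
delta = minimal_nugget(R)
print(f"condition number {np.linalg.cond(R):.2e}, minimal nugget {delta:.2e}")
np.linalg.cholesky(R + delta * np.eye(len(X)))      # now factorises without error (usually)
```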
DAEW05 9th September 2011
09:30 to 10:00
D Bingham Calibration of multi-fidelity models for radiative shock
Environmental and economic industries rely on high-performance materials such as lightweight alloys, recyclable motor vehicle and building components, and high-efficiency lighting. Material properties, as expressed through crystal structure, are crucial to this understanding. Based on first-principles calculations, it is still impossible in most materials to infer ground-state properties purely from a knowledge of their atomic components. While many methods attempt to predict crystal structures and compound stability, we explore models which infer the existence of structures on the basis of combinatorics and geometric simplicity. Computational models based on these first physics principles are called VASP codes. We illustrate the use of a statistical surrogate model to produce predictions of VASP codes as a function of a moderate number of VASP inputs.
DAEW05 9th September 2011
10:00 to 10:30
Investigating discrepancy in computer model predictions
In most computer model predictions, there will be two sources of uncertainty: uncertainty in the choice of model input parameters, and uncertainty in how well the computer model represents reality. Dealing with the second source of uncertainty can be difficult, particularly when we have no field data with which to compare the accuracy of the model predictions. We propose a framework for investigating the "discrepancy" of the computer model output: the difference between the model run at its 'best' inputs and reality, which involves 'opening the black box' and considering structural errors within the model. We can then use sensitivity analysis tools to identify important sources of model error, and hence direct effort into improving the model. Illustrations are given in the field of health economic modelling.
DAEW05 9th September 2011
11:30 to 12:00
Bayesian Calibration of Computer Model Ensembles

Using field observations to calibrate complex mathematical models of a physical process allows one to obtain statistical estimates of model parameters and construct predictions of the observed process that ideally incorporate all sources of uncertainty. Many of the methods in the literature use response surface approaches, and have demonstrated success in many applications. However, there are notable limitations, such as when one has a small ensemble of model runs where the model outputs are high dimensional. In such instances, arriving at a response surface model that reasonably describes the process can be difficult, and computational issues may also render the approach impractical.

In this talk we present an approach that has numerous benefits compared to some popular methods. First, we avoid the problems associated with defining a particular regression basis or covariance model by making a Gaussian assumption on the ensemble. By applying Bayes' theorem, the posterior distribution of unknown calibration parameters and predictions of the field process can be constructed. Second, as the approach relies on the empirical moments of the distribution, computational and stationarity issues are much reduced compared to some popular alternatives. Finally, in the situation that additional observations are arriving over time, our method can be seen as a fully Bayesian generalization of the popular Ensemble Kalman Filter.

DAEW05 9th September 2011
12:00 to 12:30
Accurate emulators for large-scale computer experiments
Large-scale computer experiments are becoming increasingly important in science. A multi-step procedure for modeling such experiments is introduced, which builds an accurate interpolator in multiple steps. In practice, the procedure shows substantial improvements in overall accuracy, but its theoretical properties are not well established. We introduce the terms nominal and numeric error and decompose the overall error of an interpolator into nominal and numeric portions. Bounds on the numeric and nominal error are developed to show theoretically that substantial gains in overall accuracy can be attained with the multi-step approach.
DAEW05 9th September 2011
14:00 to 15:30
Panel: Future Challenges Integrating Multiple Modes of Experimentation
DAE 26th September 2011
09:30 to 09:40
Cambridge Statistics Initiative (CSI) Special One-Day meeting: Introduction
DAE 26th September 2011
09:40 to 09:50
Measures for capturing coverage of genetic variation in a population
The price of DNA sequencing and related technologies has dropped to a point where we can consider sequencing the genomes of a sufficiently large sample of individuals in a human population to capture almost all genetic variation.

We are already engaged in such studies in population isolates such as Kuusamo in the north-east of Finland (population 20,000, founded by 34 families around 1650) or Orkney (population 15,000). There are many questions of efficient design, but also of what the quantity of interest is and how to measure it. Almost all variation is shared by inheritance, but every person has some genetic variants not inherited from their parents, due to new mutations. I will introduce some of the measures and strategies we are using, and I hope to initiate discussion. I believe there is lots of scope for new ideas.

DAE 26th September 2011
09:50 to 10:00
Statistical modeling of gene expression levels
Next-generation sequencing (NGS) technology has revolutionized our ability to assay the genome, transcriptome and epigenome of multiple different organisms. However, to ensure that the data generated can be utilized to answer pertinent biological questions, it is vital that appropriate statistical tools are developed.

In this talk, I will discuss some of the statistical challenges that arise when modeling gene expression measurements made using NGS. I will also discuss how current models will have to be extended and adapted as we move from studying gene expression levels measured across large populations of cells to measurements made at the single-cell level.

DAE 26th September 2011
10:00 to 10:10
E Zeggini Rare variant analysis in large-scale association and sequencing studies
Recent advances in whole-genome genotyping technologies, the availability of large, well-defined sample sets, and a better understanding of common human sequence variation, coupled with the development of appropriate quality control and analysis pipelines, have led to the identification of many novel common genetic determinants of complex traits. However, despite these successes, much of the genetic component of these traits remains unaccounted for. One largely unexplored paradigm which may contribute to this missing heritability is a model of multiple rare causal variants, each of modest effect and residing within the same functional unit, for example a gene. Joint analysis of rare variants, searching for accumulations of minor alleles in individuals, for a dichotomous or quantitative trait, may thus provide signals of association with complex phenotypes that could not have been identified through traditional association analysis of single nucleotide polymorphisms (SNPs). However, statistical methods to perform such joint analyses of rare variants have not yet been fully developed or evaluated. We have implemented rare variant analysis methods in user-friendly software and have extended approaches to collapsing rare allele tables and allele-matching tests by incorporating variant-specific quality scores (for example arising from next generation sequencing studies in which different positions have been covered at different depths) and genotype-specific probabilities (for example arising from 1000 Genomes Project-imputed data). We evaluate these methods and find increases in power to detect association under varying allelic architectures and parameters. We make recommendations for the analysis of rare variants in large-scale association and next generation sequencing studies.
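The simplest "collapsing" idea referred to above can be sketched as follows; the simulated genotypes and the two-sample test are illustrative stand-ins for the quality-weighted and allele-matching tests described in the talk.

```python
# Hedged sketch of a burden-style collapsing test: count the rare minor alleles each
# individual carries across a gene and compare the burden between cases and controls.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
n_cases, n_controls, n_variants = 500, 500, 20
maf = rng.uniform(0.001, 0.01, size=n_variants)                    # rare variants only
controls = rng.binomial(2, maf, size=(n_controls, n_variants))
cases = rng.binomial(2, np.minimum(2.5 * maf, 1.0), size=(n_cases, n_variants))  # enriched in cases

burden_cases = cases.sum(axis=1)        # rare-allele count per case
burden_controls = controls.sum(axis=1)  # rare-allele count per control
stat, p = ttest_ind(burden_cases, burden_controls)
print(f"mean burden cases={burden_cases.mean():.2f}, controls={burden_controls.mean():.2f}, p={p:.3g}")
```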
DAE 26th September 2011
10:10 to 10:20
Estimating statistical significance of exome sequencing data for rare mendelian disorders using population-wide linkage analysis
Exome sequencing of a small number of unrelated affected individuals has proved to be a highly effective approach for identifying causative genes of rare mendelian diseases. A widely used strategy is to consider as candidate causative mutations only those variants that have not been seen previously in other individuals, and those variants predicted to affect protein sequence, e.g. non-synonymous variants or stop-codons.

For the recessive disorder Gray Platelet Syndrome we identified 7 novel coding mutations in 4 affected individuals, all in different locations in one gene and absent from 994 individuals from the 1000 Genomes project; intuitively a highly significant result (Albers et al. Nat Genet 2011). However, in the case where the candidate causative mutations segregate at low frequency in the general population, the significance may be less obvious. This raises a number of questions: what is the statistical significance of such findings in small numbers of affected individuals? If we assumed that the causative mutations are not necessarily in coding sequence, would these results be genome-wide significant? Motivated by these issues, we are developing a statistical model based on the idea that filtering out previously seen variants can be thought of as performing a whole-population parametric linkage analysis, whereby the individuals carrying previously seen variants represent the unaffected individuals.

We use the coalescent, a mathematical description of the notion that ultimately all individuals in a population are descendants of a single common ancestor, to model the unknown pedigree shared by the affected individuals and the unaffected individuals. I will discuss implications of population stratification, false positive variant calls and variation in coverage for singleton rates and significance estimates.

DAE 26th September 2011
10:20 to 10:30
On the exploration of Affymetrix ligation-based SNP assays
Single Nucleotide Polymorphisms (SNPs) are genetic variants that arise through the alteration of a single nucleotide (A, C, G or T) on the DNA sequence. At SNP locations, the possible nucleotides are referred to as alleles and are labelled, for simplicity, as A and B. The combinations of alleles (AA, AB and BB) are called genotypes and play an important role in genome-wide association studies (GWAS), through which researchers investigate the association between traits of interest (like diseases) and genomic markers. GWASes depend strongly on the availability of accurate genotypes for the samples involved in the study. Several methodologies can be used for genotyping and one of the most common is DNA microarrays. Affymetrix is well known for manufacturing one-color arrays comprised of probes 25 nucleotides long. However, their latest genotyping platform, called Axiom, uses a multicolor strategy to label the majority of the SNPs on ligation-based assays that use 30nt long probes.

Previous algorithms for both processing and genotype calling relied on the properties of the data generated by the older assays. Therefore, they may require modifications in order to be used with data from this new product. Here, we present a comprehensive investigation of the properties of the data generated using Axiom arrays, including the changes that are to be implemented in our algorithm for preprocessing SNP data and a discussion of the impact of this shift on downstream analyses involving SNP data.

DAE 26th September 2011
10:30 to 10:40
Inferring an individual's "physiological" age from multiple ageing-related phenotypes
What is ageing? One hypothesis is that ageing is global systemic degradation of multiple organ systems. Based on this assumption we propose a linear model which attempts to infer an individual's "physiological" age from multiple clinical measurements. Inference is performed using the variational Bayes approximation in the Infer.NET framework, a Microsoft Research project akin to WinBUGS. We apply the model to around 6000 individuals in the Twins UK study and look for gene expression levels and SNPs associated with the ageing "delta": the difference between an individual's physiological and chronological age.

We propose an extension allowing non-linear variation of the clinical variables with age using a mixture of experts model. Finally we question whether a model with multiple dimensions of ageing might more closely resemble reality.

DAE 26th September 2011
10:40 to 10:50
Bayesian evidence synthesis to estimate progression of human papillomavirus
Human papillomavirus (HPV) types 16 and 18 are associated with about 70% of cervical cancers. To evaluate the long-term benefits of cervical screening and vaccination against HPV, estimates of the natural history of HPV are required. A Markov model has previously been developed to estimate progression rates of HPV, through grades of neoplasia, to cancer. The model was fitted to cross-sectional data by age group from the UK, including data from a trial of HPV testing, population cervical screening data, and cancer registry data. Parameter uncertainties and model choices were originally only acknowledged by informal scenario analysis. We therefore reimplement this model in a Bayesian framework to take full account of parameter and model uncertainty. Assumptions may then be weighted coherently according to how well they are supported by data. There is a complex network of evidence and parameters, involving misclassified and aggregated data, data available on different age groupings, and external data of indirect relevance. This is implemented as a Bayesian graphical model, and posterior distributions are estimated by MCMC. This work raises issues of uncertainty in complex evidence syntheses, and aims to encourage greater use in practice of techniques which are familiar in the statistical world.
DAE 26th September 2011
12:00 to 12:10
S Seaman Inverse probability weighting with missing predictors of missingness or treatment assignment
Inverse probability weighting is commonly used in two situations. First, it is used in a propensity score approach to deal with confounding in non-randomised studies of effect of treatment on outcome. Here weights are inverse probabilities of assignment to active treatment. Second, it is used to correct bias arising when an analysis model is fitted to incomplete data by restricting to complete cases. Here weights are inverse probabilities of being a complete case.

Usually weights are estimated by regressing an indicator of whether the individual receives active treatment (in the first situation) or is a complete case (in the second) on a set of predictors.

Problems arise when these predictors can be missing. In this presentation, I shall discuss a method that involves multiply imputing these missing predictors.
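For context, a minimal illustration of standard complete-case inverse probability weighting follows (the missing-predictor complication that the talk addresses is not handled here; data and model are invented, and scikit-learn is assumed available).

```python
# Hedged sketch: weights are the inverse of the estimated probability of being a
# complete case, estimated by regressing the complete-case indicator on a predictor.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)                                     # fully observed predictor of missingness
y = 2.0 + 1.5 * x + rng.normal(size=n)                     # outcome of interest
observed = rng.random(n) < 1 / (1 + np.exp(-(0.5 + x)))    # missingness depends on x

p_complete = LogisticRegression().fit(x[:, None], observed).predict_proba(x[:, None])[:, 1]
w = 1.0 / p_complete[observed]
print("naive complete-case mean:", y[observed].mean())
print("IPW-corrected mean:      ", np.average(y[observed], weights=w))
print("full-data mean:          ", y.mean())
```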

DAE 26th September 2011
12:10 to 12:20
Log-concavity, nearest-neighbour classification, variable selection and the Statistics Clinic
I will give a brief overview of some of my current research interests. These include log-concave density estimation and its applications, optimal weighted nearest neighbour classification, and the use of subsampling to improve high-dimensional variable selection algorithms. Finally, I will advertise the Statistics Clinic, http://www.statslab.cam.ac.uk/clinic which meets fortnightly during term, and where anyone in the university can obtain free statistical advice.
DAE 26th September 2011
12:20 to 12:30
Functional Bernstein-type inequalities via Rademacher processes, with applications to statistics
DAE 26th September 2011
12:30 to 12:40
R Evans Variation independent parametrizations
Variation independence can be a useful tool for developing algorithms and for parameter interpretation. We present a simple method for creating variation independent parametrizations of some discrete models using Fourier-Motzkin elimination, with some examples.
DAE 26th September 2011
12:40 to 12:50
S Yildirim Forward Smoothing and Online EM in changepoint systems
In this talk, I will focus on forward smoothing in changepoint systems, which are generally used to model heterogeneity in statistical data. After describing the SMC implementation of forward smoothing, I will show how the online EM algorithm can be used for parameter estimation in changepoint systems.
DAE 26th September 2011
14:00 to 14:10
F Huszar & NMT Houlsby Bayesian sequential experiment design for quantum tomography
Quantum tomography is a valuable tool in quantum information processing and experimental quantum physics, being essential for characterisation of quantum states, processes, and measurement equipment. Quantum state tomography (QST) aims to determine the unobservable quantum state of a system from outcomes of measurements performed on an ensemble of identically prepared systems. Measurements in quantum systems are non-deterministic, hence QST is a classical statistical estimation problem.

Full tomography of quantum states is inherently resource-intensive: even in moderately sized systems these experiments often take weeks. Sequential optimal experiment design aims at making these experiments shorter by adaptively reconfiguring the measurement in the light of partial data. In this talk, I am going to introduce the problem of quantum state tomography from a statistical estimation perspective, and describe a sequential Bayesian Experiment Design framework that we developed. I will report simulated experiments in which our framework achieves a ten-fold reduction in required experimentation time.

DAE 26th September 2011
14:10 to 14:20
Design and analysis of biodiversity experiments
Colleagues in ecology designed an experiment to see whether various favourable responses were affected by the number of different species present in the ecosystem, keeping the total number of organisms constant.

I thought that their data were better explained by a model that was more obvious to me. I will describe the experiment, the family of models we discussed, the conclusion from the data analysis, and the design of subsequent studies.

DAE 26th September 2011
14:20 to 14:30
Optimal design and analysis procedures in two stage trials with a binary endpoint
Two-stage trial designs provide the flexibility to stop early for efficacy or futility, and are popular because they have a smaller sample size on average compared to a traditional trial with the same type I and II errors. This makes them financially attractive but also has the ethical benefit of reducing, in the long run, the number of patients who are given ineffective treatments. Therefore designs which minimise the expected sample size are referred to as 'optimal'. However, two-stage designs can impart a substantial bias into the parameter estimate at the end of the trial. The properties of standard and bias adjusted maximum likelihood estimators, as well as mean and median unbiased estimators, are reviewed with respect to a binary endpoint. Optimal two-stage design and analysis procedures are then identified that balance projected sample size considerations with estimator performance.
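The operating characteristics that drive such "optimal" designs can be computed directly from binomial probabilities. The sketch below uses a futility-only stopping rule (a simplification of the efficacy-or-futility designs in the talk) and a candidate design chosen purely for illustration.

```python
# Hedged sketch: P(reject H0) and expected sample size for a two-stage design with
# a binary endpoint, stopping for futility after stage one.
import numpy as np
from scipy.stats import binom

def two_stage_operating_chars(n1, r1, n, r, p):
    """Stop (for futility) if <= r1 responses among the first n1 patients;
    otherwise continue to n patients and reject H0 if total responses exceed r."""
    k1 = np.arange(n1 + 1)
    pmf1 = binom.pmf(k1, n1, p)
    cont = k1 > r1
    # P(reject) = sum over continuing stage-one outcomes of P(stage two pushes the total over r)
    p_reject = (pmf1[cont] * binom.sf(r - k1[cont], n - n1, p)).sum()
    expected_n = n1 + (n - n1) * pmf1[cont].sum()
    return p_reject, expected_n

# one candidate design evaluated under a null response rate 0.1 and an alternative 0.3
for p in (0.1, 0.3):
    prob, en = two_stage_operating_chars(n1=12, r1=1, n=35, r=5, p=p)
    print(f"p={p}: P(reject H0)={prob:.3f}, E[N]={en:.1f}")
```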
DAE 26th September 2011
14:30 to 14:40
Multiphase experiments in the biological sciences
Multitiered experiments are characterized as involving multiple randomizations (Brien et al., 2003; Brien and Bailey, 2006). Multiphase experiments are one class of such experiments, other classes being some superimposed experiments and some plant and animal experiments. Particularly common are multiphase experiments with at least one later laboratory phase (Brien et al., 2011); some examples of them, illustrating current research areas, will be presented.
DAE 26th September 2011
14:40 to 14:50
Causal inference in observational epidemiology: looking into mechanism
We propose a method for the study of gene-gene, gene-environment and gene-treatment interactions which are interpretable in terms of mechanism. Tests for detecting mechanistic - as opposed to 'statistical' - interactions have been previously proposed, but they are meaningful only if a number of assumptions and conditions are verified. Consequently, they are not always applicable and, in those situations where they are, their validity depends on an appropriate choice of stratifying variables. This paper proposes a novel formulation of the problem. We illustrate the method with the aid of studies where evidence from case-control studies of genetic association is combined with information from biological experiments, to elucidate the role of specific molecular mechanisms (autophagy, ion channels) in susceptibility to specific diseases (Crohn's Disease, Multiple Sclerosis).
DAE 26th September 2011
14:50 to 15:00
Structural equation modeling analysis for causal inference from multiple omics datasets
Recent developments in technology allow us to collect multiple high-dimensional 'omics' datasets from thousands of individuals in a highly standardized and unbiased manner. Open questions remain as to how best to integrate the multiple omics datasets to understand underlying biological mechanisms and infer causal pathways. We have begun exploring causal relationships between genetic variants, clinically relevant quantitative phenotypes and metabolomics datasets using Structural Equation Modeling (SEM), applied to a subset of the common disease loci identified from genome-wide association studies. We provide proof-of-principle evidence that SEM analysis is able to identify reproducible path models supporting association of SNPs to intermediate phenotypes through metabolomics intermediates. We address further challenges arising from the analysis of multiple omics datasets and suggest future directions, including nonlinear model-based approaches and simultaneous dimension reduction (or variable selection) methods.
DAE 26th September 2011
16:05 to 16:15
Identifying the effect of treatment on the treated
In the counterfactual literature, the effect of treatment on the treated (ETT) is often branded as the effect on the treated group. This definition of ETT is vague and potentially misleading because ETT is the effect on those who would normally be treated. A more transparent definition of ETT is given within the decision theoretic framework. The proposed definition of ETT is used to highlight misuse of terminology in the literature and discuss the types of studies that can be used for identifying ETT. Criteria for identifying ETT from observational data, when there are unobserved confounders, are given. The criteria are compared to those formulated within the counterfactual framework.
DAE 26th September 2011
16:15 to 16:25
Communicating and evaluating probabilities
I will talk about three related projects: (a) Getting children to express numerical confidence in their knowledge (b) Collaboration with the Met Office in an online weather game incorporating probabilistic forecasts. (c) Development of an online quiz.
DAE 26th September 2011
16:25 to 16:35
S Nowozin Statistical problems in computer vision
Computer vision is one of the many fields that have successfully adopted machine learning for building predictive models. Yet, despite their success, some of the field's most popular models, such as conditional random fields, remain poorly understood theoretically and require approximations to be practical. I discuss a few open theoretical and practical questions about these models in the computer vision context.
DAE 26th September 2011
16:35 to 16:45
A Bejan Using velocity fields in evaluating urban traffic congestion via sparse public transport data and crowdsourced maps
It is widely recognised that congestion in urban areas causes financial loss to business and increased use of energy compared with free-flowing traffic. Providing one with accurate information on traffic conditions can encourage journeys at times of low congestion and uptake of public transport. Installing a static measurement infrastructure in a city to provide this information may be an expensive option and potentially invade privacy. Increasingly, public transport vehicles are equipped with sensors to provide realtime arrival time estimates, but these data are fleet-specific and sparse. Recent work with colleagues from the Cambridge University Computer Laboratory showed how to overcome data mining issues and use this kind of data to statistically analyse journey times experienced by road users generally (i.e. journey durations experienced by public transport users as well as individual car drivers) and the influence of various factors (e.g. time of day, school/out of school term effects, etc.) [Be10, Ba11]. Furthermore, we showed how the specifics of these location data may be used in conjunction with other sources of data, such as crowdsourced maps, in order to recover speed information from the sparse movement data and reconstruct information on transport traffic flow dynamics in terms of velocity fields on road networks [Be11]. In my short talk I will present a number of snapshots illustrating this analysis and some results, and introduce the problem of comparing/classifying velocity fields and early spotting of accidents and their consequences for the traffic and road users.
DAE 26th September 2011
16:45 to 16:55
Evaluating Peterborough's no cold calling initiative using space-time Bayesian hierarchical modelling
As part of a wider Neighbourhood Policing strategy, Cambridgeshire Constabulary instituted "No Cold Calling" (NCC) zones to reduce cold calling (unsolicited visits to sell products/services), which is often associated with rogue trading and distraction burglary. We evaluated the NCC-targeted areas chosen in 2005-6 and report whether they experienced a measurable impact on burglary rates in the period up to 2008. Time series data for burglary at the Census Output Area level is analysed using a Bayesian hierarchical modelling approach, addressing issues often encountered in small area quantitative policy evaluation. Results reveal a positive NCC impact on stabilising burglary rates in the targeted areas.
DAE 26th September 2011
17:10 to 17:30
P Dawid Cambridge Statistics Initiative (CSI) Special One-Day meeting: Discussion & Conclusions
DAE 11th October 2011
11:00 to 12:30
INRA: Generation of flexible regular fractional factorial designs and its implementation in R
DAE 13th October 2011
14:00 to 15:00
H Grossmann Automatic analysis of variance for orthogonal plot structures
DAE 25th October 2011
11:00 to 12:00
Optimal cross-over designs for full interaction models
DAEW06 26th October 2011
09:30 to 12:30
Monomial Ideals

Eduardo Saenz (Universidad de La Rioja) Introduction to monomial ideals and reliability

The analysis of system reliability can be performed using the algebra of monomial ideals. In this short introduction we give an idea of what algebraic tools are used, the advantages and drawbacks of this method and some directions of future work.

Henry Wynn (London School of Economics) Monomial ideals, Alexander duality

Hugo Maruri-Aguilar (Queen Mary, University of London) Aberration and Betti numbers

DAEW06 26th October 2011
13:30 to 17:00
T Kahle & G Pistone & F Rapallo & S Kuhnt & D Bruynooghe Graphical models, Markov bases and related topics

Thomas Kahle (Max-Planck-Institut fur Mathematik, Leipzig) What's new with Markov bases and how to understand support sets of log-linear models via polytopes

Gianni Pistone (Università degli Studi di Torino) Hilbert basis in design and loglinear models Abstract - see below.

Fabio Rapallo (Università degli Studi del Piemonte Orientale) Weakened independence models

It is known that in a two-way contingency table the set of all 2 by 2 minors characterizes the independence model. A family of models can be defined by selecting subsets of minors. These models are termed "weakened independence models" (Carlini and Rapallo, 2011). Restricting to adjacent minors, some results have been obtained in the study of such models. For instance, the sufficient statistic is fully described. Several problems are still open in this research topic:
  • (a) the use of such models to detect clusters in the contingency tables;
  • (b) to study the connections between weakened independence models and mixture models;
  • (c) to generalize the definition to more complex models;
  • (d) to extend the theory to general 2 by 2 minors.

Sonja Kuhnt (Technische Universität Dortmund) Algebraic identifiability and comparison of generalised linear models

As part of the Collaborative Research Centre 823 at the Technical University of Dortmund we work on a project which is concerned with "Modelling and controlling thermokinetic coating processes". The effect of spraying parameters on the coating quality is indirectly modelled by using in-flight characteristics of the particles. Therefore we study firstly the relationship between machine parameters X and in-flight particles Y and secondly the relationship between the in-flight particles Y and the coating properties Z. Besides the main topic of modelling and controlling of the two step process we are interested in the choice of suitable experimental designs. Here we extract two research questions, which can be set into the algebraic statistics framework. So far generalised linear models have turned out to be a suitable model class. We would like to know which models can be identified based on a chosen design. Secondly we would like to compare the difference in models derived from a direct regression of Z on X compared with a regression of Z on Y, where the regression relationship from X to Y is known. These questions can be reformulated as follows:
  • 1. How can we derive results of algebraic identifiability with respect to generalized linear models?
  • 2. How can we compare direct and two step regression models based on algebraic statistics?
We look forward to a productive discussion of these problems.

Daniel Bruynooghe (London School of Economics) Differential cumulants and monomial ideals

DAEW06 27th October 2011
09:00 to 12:30
R Bailey & H Warren & M Piera Rogantin & R Fontana & H Maruri-Aguilar Algebraic Approaches to Combinatorial Design

Hugo Maruri-Aguilar (Queen Mary, University of London) Some computational results for block designs

Rosemary Bailey (Queen Mary, University of London) Connectivity in block design, and Laplacian matrices

Helen Warren (London School of Hygiene and Tropical Medicine) Robustness of block designs

A new robustness criterion, vulnerability, measures the likelihood of an incomplete block design resulting in a disconnected eventual design due to the loss of random observations during the course of the experiment. Formulae have been derived for calculating the vulnerability measure, which aids design selection and comparison by producing a full vulnerability ranking of a set of competing designs. For example, this provides a new method for distinguishing between non-isomorphic BIBDs, since despite them all having identical optimality properties, their vulnerabilities can vary. Theory has been developed relating design concurrences to block intersection counts. These combinatorial results have provided further insight into the properties and characteristics of robust designs. Furthermore, they have led to interesting closure properties for vulnerability between BIBDs and their complements, between BIBDs and non-balanced designs constructed from them by the removal or addition of blocks (e.g. Regular Graph Designs, Nearly Balanced Designs), and between BIBDs and replicated BIBDs. It would be interesting to investigate the combinatorial properties of replicated designs in more detail, from connectedness and optimality perspectives, especially since other work on crossover designs has similarly found that replication leads to less robustness, and in order to extend the concept of vulnerability to other blocked designs, e.g. row-column designs, crossover designs and factorial designs. Finally, it would be interesting to incorporate prior knowledge of varying probabilities for each observation being lost, rather than assuming observation loss to be random.

Maria Piera Rogantin (Università degli Studi di Genova) Use of indicator functions in design

Roberto Fontana (Politecnico di Torino) Algebra and factorial designs

DAE 10th November 2011
17:00 to 18:00
P Dawid Causal Inference from Experimental Data (30th R A Fisher Memorial Lecture)
One of the greatest scientific advances of the 20th Century was not substantive, but methodological: the laying out by Fisher of the principles of sound experimentation, so allowing valid conclusions to be drawn about the effects of interventions - what we must surely regard as "causal inference". More recently "causal inference" has developed as a major enterprise in its own right, with its own specialist formulations and methods; however, these owe more to Neyman than to Fisher. In this lecture I shall explore the connexions and contrasts between older and newer ideas in causal inference, revisit an old argument between Neyman and Fisher, and argue for the restructuring of modern theories of causal inference along more Fisherian lines.
DAE 24th November 2011
14:00 to 14:30
Information in Two Stage Adaptive Design
DAE 24th November 2011
14:30 to 15:00
Optimizing Combination Therapy under a Bivariate Weibull Distribution with Application to Toxicity and Efficacy Responses
OFB010 30th November 2011
11:00 to 12:00
T Davis Dimensional Analysis in Experimental Design
In this talk, and since we are in the Isaac Newton Institute, I will focus on using the physics of the problem being tackled to determine a strategy to design an experiment to fit a model for prediction. At the heart of the approach is an application of Edgar Buckingham’s 1914 “Pi” theorem. Buckingham’s result, which is based on dimensional analysis, has been seemingly neglected by statisticians, but it provides a “bridge” between a purely theoretical approach to model building, and an empirical one based on e.g. polynomial approximations such as 2nd order response surfaces. I will illustrate the ideas with a few examples, in the hope that I can show that dimensional analysis should take its place at the heart of experimental design in engineering applications.
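A worked illustration of the Pi theorem (my own toy example, not taken from the talk) follows: the number of independent dimensionless groups equals the number of variables minus the rank of the dimensional matrix, and the groups span its null space.

```python
# Illustrative sketch of Buckingham's Pi theorem for a swinging pendulum with
# variables period t, length l, gravity g, mass m.
import numpy as np
from scipy.linalg import null_space

# columns = variables (t, l, g, m); rows = base dimensions (M, L, T)
dim_matrix = np.array([
    [0, 0,  0, 1],   # mass
    [0, 1,  1, 0],   # length
    [1, 0, -2, 0],   # time
])
ns = null_space(dim_matrix)          # each column = exponents of one dimensionless Pi group
n_groups = ns.shape[1]
print(f"{dim_matrix.shape[1]} variables, rank {np.linalg.matrix_rank(dim_matrix)}, "
      f"so {n_groups} dimensionless group(s)")
print("exponents (t, l, g, m):", np.round(ns[:, 0] / np.abs(ns[:, 0]).max(), 3))
# the single group is t * sqrt(g / l) (up to sign and scaling of the exponents);
# mass drops out entirely, so the experiment need not vary it.
```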
OFB010 30th November 2011
12:00 to 12:30
DoE in the Automotive Industry - Approaching the Limits of Current Methods?
The presentation outlines the main applications of DoE in the field of automotive engine development and calibration. DoE has been applied to engine calibration (optimising the settings of electronically-controlled engine parameters for low emissions and fuel consumption) for many years. The task has become significantly more complex in recent years due to the various new fuel injection technologies, and up to ten variables must be calibrated at each and every engine speed and load. Many engine responses are non-linear and there are considerable interactions between control variables so the conservative approach of separate DoEs at multiple speed-load conditions still predominates. Polynomials are adequate for such "local" models with narrow variable ranges and six or fewer variables. But over wider ranges or when speed and load are included (so-called "global" models) responses are highly non-linear and polynomials are unsuitable. Some practitioners use radial basis functions, neural networks or stochastic process models but such methods do not always yield the requisite accuracy for "global" models. Furthermore, the most reliable of these techniques, stochastic process models, are limited by computational considerations when datasets are large. The overview of the current "state of the art" methods is presented with the aim of stimulating discussion on what mathematical methods could form the basis of future DoE tools for the automotive industry.
OFB010 30th November 2011
14:00 to 14:30
From ideas to implementation
At GlaxoSmithKline we use sequential experimental design to generate the process understanding that identifies critical process parameters and safe operating conditions for the manufacture of active pharmaceutical ingredients used in medicines. We are always looking for more efficient and effective experimental strategies and ways of managing uncertainty. At a recent conference, Stuart Hunter observed that "within the last ten years there has been some spectacular progress in the field of experimental design - the arena has completely changed". We want to show the decision-making process around how much, and when, to invest in experimental designs: what are the benefits, what risks are faced and are these acceptable? We will discuss how we have taken learnings from a couple of recently published papers on supersaturated designs and definitive screening designs and show how we have implemented these ideas to add value within GlaxoSmithKline. References: Marley, C. J. and Woods, D. C. (2010), "A comparison of design and model selection methods for supersaturated experiments," Computational Statistics and Data Analysis, 54, 3158-3167; Jones, B. and Nachtsheim, C. J. (2011), "A class of three-level designs for definitive screening in the presence of second-order effects," Journal of Quality Technology, 43(1).
OFB010 30th November 2011
14:30 to 15:00
Mixture of Mixture Designs: Optimisation of Laundry Formulations
OFB010 30th November 2011
15:00 to 15:30
Experimental design challenges in fuel & lubricant R & D
OFB010 30th November 2011
16:30 to 17:30
Industry Day Panel Discussion
DAE 8th December 2011
14:00 to 15:00
Multiplicative algorithms for computing D-optimal experimental designs with linear constraints on the design weights
DAE 19th December 2011
14:00 to 15:00
Strategies to Analyse Unreplicated Factorial Split-Plot Designs
DAE 20th December 2011
10:00 to 11:00
An introduction to polynomial chaos and its applications
DAEW07 6th July 2015
10:00 to 10:45
Designing an adaptive trial with treatment selection and a survival endpoint
We consider a clinical trial in which two versions of a new treatment are compared against control with the primary endpoint of overall survival. At an interim analysis, mid-way through the trial, one of the two treatments is selected, based on the short term response of progression-free survival. For such an adaptive design the familywise type I error rate can be protected by use of a closed testing procedure to deal with the two null hypotheses and combination tests to combine data from before and after the interim analysis. However, with the primary endpoint of overall survival, there is still a danger of inflating the type I error rate: we present a way of applying the combination test that solves this problem simply and effectively. With the methodology in place, we then assess the potential benefits of treatment selection in this adaptive trial design.
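A generic version of the weighted inverse-normal combination test used in such designs can be sketched as follows; the weights, p-values and significance level are illustrative, and the talk's specific handling of the overall survival endpoint is more subtle than this.

```python
# Hedged sketch: combine two independent stage-wise one-sided p-values with
# pre-specified weights w1 + w2 = 1, then compare with the normal critical value.
import numpy as np
from scipy.stats import norm

def inverse_normal_combination(p1, p2, w1=0.5, w2=0.5):
    """Combined z-statistic of the weighted inverse-normal combination test."""
    return np.sqrt(w1) * norm.isf(p1) + np.sqrt(w2) * norm.isf(p2)

z = inverse_normal_combination(p1=0.04, p2=0.03)
print(f"combined z = {z:.3f}, reject at one-sided 2.5% level: {z > norm.isf(0.025)}")
```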
DAEW07 6th July 2015
11:15 to 12:00
Adaptive dose-finding with power control
A main objective in dose-finding trials, besides the characterisation of the dose-response relationship, is often to prove that the drug has an effect. The sample size calculation prior to the trial aims to control the power for this proof of effect. In most cases, however, there is great uncertainty during planning concerning the anticipated effect of the drug. Sample size re-estimation based on an unblinded interim effect estimate has been used in this situation. In practice, this re-estimation can have drawbacks, as the sample size becomes variable, which makes planning and funding complicated. Furthermore, it introduces the risk that people start to speculate about the effect once the re-estimated sample size is communicated. In this talk, we will investigate a design which avoids this problem but controls the power for the proof of effect in an adaptive way. We discuss methods for proper statistical inference for the described design.
DAEW07 6th July 2015
12:00 to 12:30
Start-up designs for response-adaptive randomization procedures
Response-adaptive randomization procedures are appropriate for clinical trials in which two or more treatments are to be compared, patients arrive sequentially and the response of each patient is recorded before the next patient arrives. For those procedures which involve sequential estimation of model parameters, start-up designs are commonly used in order to provide initial estimates of the parameters. In this talk a suite of such start-up designs for two treatments and binary patient responses are considered and compared in terms of the number of patients required in order to give meaningful parameter estimates, the number of patients allocated to the better treatment and the bias in the parameter estimates. It is shown that permuted block designs with blocks of size 4 are to be preferred over a wide range of parameter values. Results from a simulation study involving complete trials will also be reported.
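For illustration, a permuted-block start-up allocation with blocks of size 4 can be generated as in the sketch below (an illustrative implementation, not the authors' code).

```python
# Hedged sketch: within each block of size 4, two patients receive each treatment in
# random order, so the allocation stays balanced throughout the start-up phase.
import numpy as np

def permuted_block_startup(n_blocks, block_size=4, seed=0):
    rng = np.random.default_rng(seed)
    assert block_size % 2 == 0
    blocks = [rng.permutation([0, 1] * (block_size // 2)) for _ in range(n_blocks)]
    return np.concatenate(blocks)            # 0 = treatment A, 1 = treatment B

alloc = permuted_block_startup(n_blocks=3)
print(alloc, "running imbalance:", np.cumsum(2 * alloc - 1))
```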
DAEW07 6th July 2015
13:30 to 14:15
An adaptive optimal design with a small fixed stage-one sample size
A large number of experiments in clinical trials, biology, biochemistry, etc. are, out of necessity, conducted in two stages. A first-stage experiment (a pilot study) is often used to gain information about feasibility of the experiment or to provide preliminary data for grant applications. We study the theoretical statistical implications of using a small sample of data (1) to design the second stage experiment and (2) in combination with the second-stage data for data analysis. To illuminate the issues, we consider an experiment under a non-linear regression model with normal errors. We show how the dependency between data in the different stages affects the distribution of parameter estimates when the first-stage sample size is fixed and finite; letting the second stage sample size go to infinity, maximum likelihood estimates are found to have a mixed normal distribution.
DAEW07 6th July 2015
14:15 to 15:00
Some critical issues in adaptive experimental designs for treatment comparisons
The study of adaptive designs has received impetus over the past 20 years, mainly because of its potential in applications to clinical trials, and a large number of these designs are now available in the statistical and biomedical literature, ranging from simple ones like Efron’s Biased Coin Design and Zelen’s Play-the-Winner to more sophisticated ones: D_A-optimum Biased Coin designs, Doubly-Adaptive designs, ERADE, Randomly Reinforced Urns, Reinforced Doubly-Adaptive Biased Coins, etc., not forgetting covariates (Covariate-Adjusted Response-Adaptive, Covariate-Adaptive Biased Coin, Covariate-adjusted Doubly-adaptive Biased Coin designs, etc.).

A complicating factor is that nowadays adaptive experiments are in general multipurpose: they try to simultaneously achieve inferential efficiency, bias avoidance, and utilitarian or ethical gains. Another is the very nature of an adaptive design, which is a random experiment that may or may not converge to a fixed treatment allocation.

This talk does not intend to be a survey of the existing literature, rather it is an effort to highlight potentially critical points: inference after an adaptive experiment, combined versus constrained optimization, speed of convergence to a desired target, possible misuse of simulations. Each of these points will be discussed with reference to one or more specific adaptive designs.

DAEW07 6th July 2015
15:30 to 16:15
V Dragalin Adaptive population enrichment designs
There is a growing interest among regulators and sponsors in using precision medicine approaches that allow targeted patients to receive maximum benefit from the correct dose of a specific drug. Population enrichment designs offer a specific adaptive trial methodology to study the effect of experimental treatments in various sub-populations of patients under investigation. Instead of limiting the enrolment only to the enriched population, these designs enable the data-driven selection of one or more pre-specified subpopulations at an interim analysis and the confirmatory proof of efficacy in the selected subset at the end of the trial. In this presentation, the general methodology and the design issues that arise when planning such a trial will be described and illustrated using two case studies.
DAEW07 6th July 2015
16:15 to 17:00
A population-finding design with non-parametric Bayesian response model
Targeted therapies based on analysis of the genomic aberrations of the tumor have become a mainstream direction in cancer prognosis and treatment. Studies that match patients to targeted therapies for their particular genomic aberrations, across different cancer types, are known as basket trials. For such trials it is important to find and identify the subgroup of patients who can most benefit from an aberration-specific targeted therapy, possibly across multiple cancer types.

We propose an adaptive Bayesian clinical trial design for such subgroup identification and adaptive patient allocation. We start with a decision theoretic approach, then construct a utility function and a flexible non-parametric Bayesian response model. The main features of the proposed design and population finding methods are that we allow for variable sets of covariates to be recorded for different patients and, at least in principle, for high-order interactions of covariates. The separation of the decision problem and the probability model allows for the use of highly flexible response models. Another important feature is the adaptive allocation of patients to an optimal treatment arm based on posterior predictive probabilities. The proposed approach is demonstrated via extensive simulation studies.

DAEW07 7th July 2015
10:05 to 10:30
10 years of progress in population design methodology and applications
DAEW07 7th July 2015
10:30 to 11:00
S Leonov & T Mielke Optimal design and parameter estimation for population PK/PD models
In this presentation we discuss methods of model-based optimal experimental design that are used in population pharmacokinetic/pharmacodynamic studies and focus on links between various parameter estimation methods for nonlinear mixed effects models and various options for approximation of the information matrix in optimal design algorithms.
DAEW07 7th July 2015
11:30 to 12:00
Evaluation of the Fisher information matrix in nonlinear mixed effects models using Monte Carlo Markov Chains
For the analysis of longitudinal data, and especially in the field of pharmacometrics, nonlinear mixed effect models (NLMEM) are used to estimate population parameters and the interindividual variability. To design these studies, optimal design based on the expected Fisher information matrix (FIM) can be used instead of performing time-consuming clinical trial simulations. Until recently, the FIM in NLMEM was mostly evaluated by first-order linearization (FO). We propose an approach to evaluate the exact FIM using Monte Carlo (MC) approximation and Markov chain Monte Carlo (MCMC). Our approach is applicable to continuous as well as discrete data and was implemented in R using the probabilistic programming language STAN. This language makes it possible to draw MCMC samples efficiently and to calculate the partial derivatives of the conditional log-likelihood directly from the model. The method requires several minutes for a FIM evaluation but yields an asymptotically exact FIM. Furthermore, computation time remains similar even for complex models with many parameters. We compare our approach to clinical trial simulation for various continuous and discrete examples.
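For concreteness, the quantities involved can be written in standard notation (not taken from the talk itself): the expected FIM is the expectation over the data of the outer product of the score of the marginal log-likelihood, and Fisher's identity is what allows that score to be approximated from MCMC draws of the random effects together with the partial derivatives of the conditional log-likelihood.

```latex
% Expected Fisher information for population parameters \theta in a NLMEM:
\mathcal{I}(\theta) \;=\; \mathbb{E}_{y}\!\left[
  \frac{\partial \log L(\theta\,;y)}{\partial \theta}\,
  \frac{\partial \log L(\theta\,;y)}{\partial \theta^{\top}}\right],
\qquad
L(\theta\,;y) \;=\; \int p\bigl(y \mid b,\theta\bigr)\,p\bigl(b \mid \theta\bigr)\,db .

% Fisher's identity: the score of the marginal likelihood is the posterior mean of the
% gradient of the conditional (complete-data) log-likelihood, so it can be approximated
% from MCMC draws b^{(1)},\dots,b^{(S)} \sim p(b \mid y,\theta):
\frac{\partial \log L(\theta\,;y)}{\partial \theta}
  \;=\; \mathbb{E}_{b \mid y,\theta}\!\left[
  \frac{\partial \log p\bigl(y, b \mid \theta\bigr)}{\partial \theta}\right]
  \;\approx\; \frac{1}{S}\sum_{s=1}^{S}
  \frac{\partial \log p\bigl(y, b^{(s)} \mid \theta\bigr)}{\partial \theta}.

% The outer expectation over y is then replaced by a Monte Carlo average over
% datasets simulated from the model at the design being evaluated.
```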
DAEW07 7th July 2015
12:00 to 12:30
Computation of the Fisher information matrix for discrete nonlinear mixed effects models
Despite an increasing use of optimal design methodology for non-linear mixed effect models (NLMEMs) during the clinical drug development process (Mentré et al., 2013), examples involving discrete data NLMEMs remain scarce (Ernest et al., 2014). One reason is the limitations of existing approaches to calculate the Fisher information matrix (FIM), which are either model dependent and based on linearization (Ogungbenro and Aarons, 2011) or computationally very expensive (Nyberg et al., 2009). The main computational challenges in the computation of the FIM for discrete NLMEMs revolve around the calculation of two integrals: first, the integral required to calculate the expectation over the data, and second, the integral of the likelihood over the distribution of the random effects. In this presentation, Monte-Carlo (MC), Latin-Hypercube (LH) and Quasi-Random (QR) sampling for the calculation of the first integral, as well as adaptive Gaussian quadrature (AGQ) and QR sampling for the calculation of the second, are proposed. The resulting methods are compared for a number of discrete data models and evaluated in the context of model based adaptive optimal design.
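A minimal sketch of the two nested integrals, under assumptions chosen purely for illustration (a Poisson random-intercept model with invented parameter values, plain Gauss-Hermite quadrature for the inner integral rather than AGQ or QR sampling, and a finite-difference score), might look as follows.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.stats import poisson

# Hypothetical discrete NLMEM: Poisson counts per subject with a random intercept,
#   log lambda_ij = beta0 + beta1 * x_j + b_i,   b_i ~ N(0, omega^2).
x = np.array([0.0, 1.0, 2.0, 4.0])          # within-subject design points (illustrative)
theta = np.array([1.0, -0.3, 0.5])           # beta0, beta1, omega (assumed values)
nodes, weights = hermgauss(15)               # Gauss-Hermite rule for the inner integral

def marginal_loglik(y, theta):
    """log of the likelihood integrated over the random intercept (inner integral)."""
    beta0, beta1, omega = theta
    b = np.sqrt(2.0) * omega * nodes                          # change of variables
    lam = np.exp(beta0 + beta1 * x[None, :] + b[:, None])     # (n_nodes, len(x))
    cond = np.prod(poisson.pmf(y, lam), axis=1)               # p(y | b) at each node
    return np.log(np.sum(weights * cond) / np.sqrt(np.pi))

def score_fd(y, theta, h=1e-4):
    """Score of the marginal log-likelihood via central finite differences."""
    s = np.zeros_like(theta)
    for k in range(len(theta)):
        e = np.zeros_like(theta); e[k] = h
        s[k] = (marginal_loglik(y, theta + e) - marginal_loglik(y, theta - e)) / (2 * h)
    return s

def fim_mc(theta, n_sim=2000, seed=1):
    """Outer expectation over the data by Monte Carlo: FIM = E[score score^T]."""
    rng = np.random.default_rng(seed)
    beta0, beta1, omega = theta
    F = np.zeros((len(theta), len(theta)))
    for _ in range(n_sim):
        b = rng.normal(0.0, omega)
        y = rng.poisson(np.exp(beta0 + beta1 * x + b))
        s = score_fd(y, theta)
        F += np.outer(s, s)
    return F / n_sim

F = fim_mc(theta)
print("per-subject FIM:\n", F)
print("D-criterion (log det):", np.log(np.linalg.det(F)))
```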
DAEW07 7th July 2015
14:00 to 14:30
Designs for generalized linear models with random block effects
For an experiment measuring independent discrete responses, a generalized linear model, such as the logistic or log-linear, is typically used to analyse the data. In blocked experiments, where observations from the same block are potentially correlated, it may be appropriate to include random effects in the predictor, thus producing a generalized linear mixed model. Selecting optimal designs for such models is complicated by the fact that the Fisher information matrix, on which most optimality criteria are based, is computationally expensive to evaluate. In addition, the dependence of the information matrix on the unknown values of the parameters must be overcome by, for example, use of a pseudo-Bayesian approach. For the first time, we evaluate the efficiency, for estimating conditional models, of optimal designs from closed-form approximations to the information matrix, derived from marginal quasilikelihood and generalized estimating equations. It is found that, for binary-response models, naive application of these approximations may result in inefficient designs. However, a simple correction for the marginal attenuation of parameters yields much improved designs when the intra-block dependence is moderate. For stronger intra-block dependence, our adjusted marginal modelling approximations are sometimes less effective. Here, more efficient designs may be obtained from a novel asymptotic approximation. The use of approximations from this suite reduces the computational burden of design search substantially, enabling straightforward selection of multi-factor designs.
DAEW07 7th July 2015
14:30 to 15:00
Design and Analysis of In-Vitro Pharmacokinetic Experiments
In many pharmacokinetic experiments, the main goal is to identify enzymes that are related to the metabolic process of a substrate of interest. Since most of the enzymes that are involved in drug metabolism are located in the liver, human liver microsomes (HLMs) are used in these in vitro experiments. Experimental runs are conducted for each HLM at different levels of substrate concentration and the response, the initial rate of reaction, is measured from each experimental run. The relationship between such a response and the substrate concentration is usually nonlinear and so it is assessed from the size of the nonlinear regression parameters. However, the use of different HLMs requires additional random effects and there might also be covariate information on these HLMs. A further complication is uncertainty about the error structure of the resulting nonlinear mixed model. Methods for obtaining optimal designs for such models will be described. The resulting designs will be compared with the larger designs used in current practice. It will be shown that considerable savings in experimental time and effort are possible. Practical issues around the design and analysis will be discussed, along with suggestions of how the methods are best implemented.
DAEW07 7th July 2015
15:00 to 15:30
Incorporating pharmacokinetic information in phase I studies in small populations
Objectives: To review and extend existing methods which take into account PK measurements in sequential adaptive designs for early dose-finding studies in small populations, and to evaluate the impact of PK measurements on the selection of the maximum tolerated dose (MTD). Methods: This work is set in the context of phase I dose-finding studies in oncology, where the objective is to determine the MTD while limiting the number of patients exposed to high toxicity. We assume toxicity to be related to a PK measure of exposure, and consider 6 possible dose levels. Three Bayesian phase I methods from the literature were modified and compared to the standard continual reassessment method (CRM) through simulations. In these methods the PK measurement, more precisely the AUC, enters as a covariate in a link function for the probability of toxicity (Piantadosi and Liu (1996), Whitehead et al. (2007)) and/or as the dependent variable in a linear regression on dose (Patterson et al. (1999), Whitehead et al. (2007)). We simulated trials based on a model for the TGF-β inhibitor LY2157299 in patients with glioma (Gueorguieva et al., 2014). The PK model was reduced to a one-compartment model with first-order absorption, as in Lestini et al. (2014), in order to obtain a closed-form solution for the probability of toxicity. Toxicity was assumed to occur when the value of a function of AUC exceeded a given threshold, either with or without inter-individual variability (IIV). For each scenario, we simulated 1000 trials with 30, 36 and 42 patients. Results: Methods which incorporate PK measurements performed well when informative prior knowledge was available in the form of Bayesian prior distributions on the parameters. On the other hand, with the prior information held fixed, methods that included PK values as a covariate were less flexible and led to trials with more toxicities than the same trials with the CRM. Conclusion: Incorporating PK values as a covariate did not alter the efficiency of estimation of the MTD when the prior was well specified. The next step will be to assess the impact on the estimation of the dose-concentration-toxicity curve for the different approaches and to explore the introduction of fully model-based PK/PD in dose allocation rules.
DAEW07 7th July 2015
16:00 to 16:30
Model based adaptive optimal designs of adult to children bridging studies using FDA proposed stopping criteria
In this work we apply the FDA-proposed precision criteria for pediatric pharmacokinetic studies (Wang et al., 2012) as a stopping criterion for a model based adaptive optimal design (MBAOD) of an adult-to-children pharmacokinetic bridging study. We demonstrate the power of the MBAOD compared to both traditional designs and non-adaptive optimal designs.
DAEW07 7th July 2015
16:30 to 17:00
Cost constrained optimal designs for regression models with random parameters
I describe various optimization problems related to the design of experiments for regression models with random parameters, also known as mixed effects models and population models. In terms of the latter, two different goals can be pursued: estimation of population parameters and estimation of individual parameters. Correspondingly, we face two types of optimality criteria and cost constraints. Additional strata appear once one observes that two experimental situations occur in practice: repeated observations either are, or are not, admissible for a given experimental unit (object or subject). Clinical studies with multiple sites with slightly different treatment outcomes (treatment-by-site interaction) are an example where repeated and independent observations are possible - a few subjects can be enrolled on each treatment arm. PK studies may serve as an example where repeated observations cannot be performed - only one observation at a given moment can be performed on a subject. All these caveats lead to different design problems that I try to link together.
DAEW07 8th July 2015
09:00 to 09:40
Model-robust designs for quantile regression
We give methods for the construction of designs for regression models, when the purpose of the investigation is the estimation of the conditional quantile function and the estimation method is quantile regression. The designs are robust against misspecified response functions, and against unanticipated heteroscedasticity. Our methods, previously developed for approximate linear models, are modified so as to apply to non-linear responses. The results will be illustrated in a dose-response example.
DAEW07 8th July 2015
09:40 to 10:20
Locally optimal designs for errors-in-variables models
We consider the construction of locally optimal designs for nonlinear regression models when there are measurement errors in the predictors. Corresponding approximate design theory is developed for models subject to a functional error structure for both maximum likelihood and least squares estimation where the latter leads to non-concave optimisation problems. Locally D-optimal designs are found explicitly for the Michaelis-Menten, EMAX and exponential regression models and are then compared with the corresponding designs derived under the assumption of no measurement error in concrete applications.
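As a point of reference for the no-measurement-error baseline against which the errors-in-variables designs are compared, the sketch below (with assumed local values of V and K, a design space [0, b], and attention restricted to two-point equal-weight designs) recovers numerically the classical two-point locally D-optimal design for the Michaelis-Menten model.

```python
import numpy as np

# Locally D-optimal design for the Michaelis-Menten model eta(x) = V*x/(K + x)
# on [0, b], WITHOUT measurement error in x.  Local values are illustrative.
V, K, b = 1.0, 2.0, 10.0

def info_matrix(points, weights):
    """Normalized information matrix sum_i w_i f(x_i) f(x_i)^T for theta = (V, K)."""
    M = np.zeros((2, 2))
    for x, w in zip(points, weights):
        f = np.array([x / (K + x), -V * x / (K + x) ** 2])   # gradient of eta w.r.t. (V, K)
        M += w * np.outer(f, f)
    return M

# Two-point designs with equal weights: fix the upper point at b, grid-search the lower one.
grid = np.linspace(0.01, b, 2000)
dets = [np.linalg.det(info_matrix([x, b], [0.5, 0.5])) for x in grid]
x_star = grid[int(np.argmax(dets))]
print("numerical lower support point:", round(x_star, 3))
print("analytic value b*K/(b + 2*K): ", round(b * K / (b + 2 * K), 3))
```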
DAEW07 8th July 2015
10:20 to 11:00
WK Wong Nature-inspired meta-heuristic algorithms for generating optimal experimental designs
Nature-inspired meta-heuristic algorithms are increasingly studied and used in many disciplines to solve high-dimensional complex optimization problems in the real world. It appears relatively few of these algorithms are used in mainstream statistics even though they are simple to implement, very flexible and able to find an optimal or a nearly optimal solution quickly. Frequently, these methods do not require any assumption on the function to be optimized and the user only needs to input a few tuning parameters.

I will demonstrate the usefulness of some of these algorithms for finding different types of optimal designs for nonlinear models in dose-response studies. Algorithms that I plan to discuss are more recent ones, such as Cuckoo Search and Particle Swarm Optimization. I will also compare their performance and advantages relative to deterministic state-of-the-art algorithms.
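A compact sketch of how such an algorithm can be applied to design problems is given below; it is generic particle swarm optimisation applied to a three-parameter Emax dose-response model with assumed local parameter values, not the speaker's implementation.

```python
import numpy as np

# PSO search for a locally D-optimal design: three equally weighted doses in [0, d_max]
# for eta(d) = E0 + Emax*d/(ED50 + d).  Model and local values are illustrative.
E0, Emax, ED50, d_max = 1.0, 5.0, 2.0, 10.0
rng = np.random.default_rng(0)

def neg_log_det(doses):
    """Criterion to minimise: -log det of the normalised information matrix."""
    M = np.zeros((3, 3))
    for d in doses:
        f = np.array([1.0, d / (ED50 + d), -Emax * d / (ED50 + d) ** 2])
        M += np.outer(f, f) / len(doses)
    sign, logdet = np.linalg.slogdet(M)
    return np.inf if sign <= 0 else -logdet

def pso(n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(0.0, d_max, size=(n_particles, 3))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([neg_log_det(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    gbest_val = pbest_val.min()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, d_max)
        vals = np.array([neg_log_det(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        if vals.min() < gbest_val:
            gbest, gbest_val = pos[np.argmin(vals)].copy(), vals.min()
    return np.sort(gbest), -gbest_val

doses, log_det = pso()
print("support doses:", np.round(doses, 3), " log det:", round(log_det, 3))
```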

DAEW07 8th July 2015
11:30 to 12:00
Construction of efficient experimental designs under resource constraints
We will introduce "resource constraints" as a general concept that covers many practical restrictions on experimental design, such as limits on budget, time, or available material. To compute optimal or near-optimal exact designs of experiments under multiple resource constraints, we will propose a tabu search heuristic related to the Detmax procedure. To illustrate the scope and performance of the proposed algorithm, we chose the problem of construction of optimal experimental designs for dose escalation studies.
DAEW07 8th July 2015
12:00 to 12:30
Designs for dose-escalation trials
For First-in-Human trials of a new drug, healthy volunteers are recruited in cohorts. For safety reasons, only the lowest dose and placebo may be used in the first cohort, and no new planned dose may be used until the one immediately below has been used in a previous cohort. How should doses be allocated to cohorts?
OFB024 9th July 2015
10:00 to 10:15
J Toland Welcome and Introduction
OFB024 9th July 2015
10:15 to 10:45
V Dragalin Design of Experiments for IMI2 Strategic Research Agenda "The Right Prevention and Treatment for the Right Patient at The Right Time"
OFB024 9th July 2015
10:45 to 11:15
IDeAL (Integrated DEsign and AnaLysis of small population group trials) - a Collaborative Approach between Industry and Academia
OFB024 9th July 2015
11:30 to 12:00
Accounting for Parameter Uncertainty in Two-Stage Designs for Phase II Dose-Response Studies
OFB024 9th July 2015
12:00 to 12:30
Adaptive Dose Finding Designs in the Presence Of Model Uncertainty
OFB024 9th July 2015
12:30 to 13:00
Confirmatory Testing for a Beneficial Treatment Effect in Dose-Response Studies using MCP-Mod and an Adaptive Interim Analysis
OFB024 9th July 2015
13:45 to 14:15
Bayesian Adaptive Randomisation Revisited
OFB024 9th July 2015
14:15 to 14:45
Y Tymofyeyev Brick Tunnel Randomization, Overcoming Inconvenient Allocation Ratio
OFB024 9th July 2015
14:45 to 15:15
Optimal Design of Clinical Trials with Censoring Driven by Random Enrollment
OFB024 9th July 2015
15:30 to 16:00
Sequential Parallel Comparison Design for Trials with High Placebo Response: Overview and Case Studies
OFB024 9th July 2015
16:00 to 16:15
Regulatory Aspects of Adaptive Clinical Trial Designs: Perspectives from Scientific Advice to Marketing Authorisation Applications (Part 1)
OFB024 9th July 2015
16:15 to 16:30
Regulatory Aspects of Adaptive Clinical Trial Designs: Perspectives from Scientific Advice to Marketing Authorisation Applications (Part 2)
OFB024 9th July 2015
16:30 to 17:00
Open Discussion & Questions
DAEW07 10th July 2015
09:00 to 09:45
Efficient study designs
In the current economic climate there is a tightening of research income, while the demand to answer important clinical questions is, arguably, the same or even increasing. Efficient study designs enable these questions to be answered while making optimal use of finite resources.

An efficient design is defined here as a "statistically robust" design that facilitates quicker and/or lower-cost decisions when compared to conventional trial designs. Efficient design is an umbrella term which encompasses approaches such as adaptive designs and trials in which routine health service data are used for the primary outcomes, as well as "smarter" ways of running trials. By being efficient in the study design, resources and time can be saved in the assessment of new health technologies.

Through a series of case studies the benefits of efficient study designs will be highlighted. For example: the PLEASANT trial used routinely collected data for data collection and as a result was approximately £1 million cheaper; a retrospective analysis of RATPAC as a group sequential trial showed that if the trial had been adaptive it would have required a third of the patients and saved £250,000. An audit of publicly funded trials suggested that if futility analyses were undertaken then 20% of trials would stop early and the proportion of successfully recruiting trials would increase from 45% to 64%. As well as highlighting the benefits, the presentation will also discuss some of the challenges.

DAEW07 10th July 2015
09:45 to 10:30
Cross-sectional versus longitudinal design: does it matter?
Mixed models play an important role in describing observations in healthcare. To keep things simple we only consider linear mixed models, and we are only interested in the typical response, i.e. in the population location parameters. When discussing the corresponding design issues one is often faced with the widespread belief that standard designs which are optimal, when the random effects are neglected, retain their optimality in the mixed model. This seems to be obviously true, if there are only random block effects related to a unit specific level (e.g. random intercept). However, if there are also random effect sizes which vary substantially between units, then these standard designs may lose their optimality property. This phenomenon occurs in the situation of a cross-sectional design, where the experimental setting is fixed for each unit and varies between units, as well as in the situation of a longitudinal design, where the experimental setting varies within units and coincides across units. We will compare the resulting optimal solutions and check their relative efficiencies.
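The phenomenon can be checked numerically. The sketch below, with assumed variance components and design points (all invented for illustration), computes the information for the population intercept and slope of a straight-line mixed model under the two types of design with the same total number of observations.

```python
import numpy as np

# Straight-line mixed model y_ij = (a + a_i) + (b + b_i) x_ij + e_ij with independent
# random intercept a_i and random slope b_i.  We compare the information for (a, b) under
#   (i)  a cross-sectional design: each unit measured once, x varying between units,
#   (ii) a longitudinal design:    each unit measured at all x values.
sigma2 = 1.0                       # residual variance (assumed)
D = np.diag([0.5, 0.5])            # random intercept / slope variances (assumed)
x_points = np.array([0.0, 0.5, 1.0])
N = 60                             # total number of observations in both designs

def unit_information(x_unit):
    """X' V^{-1} X for one unit observed at the points in x_unit."""
    X = np.column_stack([np.ones_like(x_unit), x_unit])
    V = X @ D @ X.T + sigma2 * np.eye(len(x_unit))
    return X.T @ np.linalg.solve(V, X)

# (i) cross-sectional: N units, one observation each, spread equally over x_points
M_cross = sum((N // len(x_points)) * unit_information(np.array([x])) for x in x_points)

# (ii) longitudinal: N / len(x_points) units, each observed at every x in x_points
M_long = (N // len(x_points)) * unit_information(x_points)

for name, M in [("cross-sectional", M_cross), ("longitudinal", M_long)]:
    print(f"{name:16s} var(b_hat) = {np.linalg.inv(M)[1, 1]:.4f}")
```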
DAEW07 10th July 2015
11:00 to 11:45
Randomization for small clinical trials
Rare disease clinical trials present unique problems for the statistician designing the study. First, the trial may be small, reflecting the small population of patients with the disease. Hence, the usual large-sample beneficial properties of randomization (balancing on unknown covariates, distribution of standard tests, convergence to a target allocation) may not apply. We describe the impact of such trials on the choice of randomization procedure, and discuss randomization as a basis for inference. We conclude that, in small trials, the randomization procedure chosen does matter, and that randomization tests should be used as a matter of course because of their property of preserving the type I error rate under time trends.
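The following sketch, with simulated data and an arbitrary randomization procedure (a simple random allocation rule), illustrates a re-randomization test of the kind advocated here; it is not code from the talk, and any procedure could be substituted provided the same one is used both to allocate and to re-randomize.

```python
import numpy as np

rng = np.random.default_rng(2015)
n = 20                                   # a small trial

def allocate(n, rng):
    """Random allocation rule: exactly n/2 patients per arm, in random order."""
    seq = np.array([0] * (n // 2) + [1] * (n // 2))
    rng.shuffle(seq)
    return seq

def randomization_p_value(y, treat, rng, n_rerand=5000):
    """P-value of the difference in means under re-randomization of the design."""
    obs = y[treat == 1].mean() - y[treat == 0].mean()
    count = 0
    for _ in range(n_rerand):
        t = allocate(len(y), rng)
        stat = y[t == 1].mean() - y[t == 0].mean()
        count += abs(stat) >= abs(obs)
    return (count + 1) / (n_rerand + 1)

# Simulated null data with a strong time trend but no treatment effect.
time_trend = np.linspace(0.0, 3.0, n)
treat = allocate(n, rng)
y = time_trend + rng.normal(size=n)      # outcome drifts over calendar time
print("randomization p-value:", randomization_p_value(y, treat, rng))
```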
DAEW07 10th July 2015
11:45 to 12:30
Assessment of randomization procedures based on single sequences under selection bias
Randomization is a key feature of randomized clinical trials, aiming to protect against various types of bias. Different randomization procedures have been introduced in the past decades and their analytical properties have been studied by various authors. Among other properties, balancing behaviour and protection against selection and chronological bias have been investigated. In summary, however, no procedure performs best on all criteria. On the other hand, in the design phase of a clinical trial the scientist has to select a particular randomization procedure to be used in the allocation process, taking into account the research conditions of the trial. Up to now, little support has been available to guide the scientist here, e.g. in weighing up the properties against the practical needs of the research question to be answered by the clinical trial. We propose a method to assess the impact of chronological and selection bias on the type I error probability in a parallel group randomized clinical trial with a continuous normal outcome, in order to derive scientific arguments for the selection of an appropriate randomization procedure.

This is joint work with Simon Langer.

DAEW07 10th July 2015
13:30 to 14:15
H Dette Optimal designs for correlated observations
In this talk we will relate optimal design problems for regression models with correlated observations to estimation problems in stochastic differential equations. By using "classical" theory of mathematical statistics for stochastic processes we provide a complete solution of the optimal design problem for a broad class of correlation structures.
DAEW07 10th July 2015
14:15 to 15:00
Designs for dependent observations
We consider the problem of optimal experimental design for regression models on an interval, where the observations are correlated and the errors come from either a Markov or a conditionally Markov process. We study transformations of these regression models and the corresponding designs. We show, in particular, that in many cases we can assume that the underlying error model is the Brownian motion.

This is a joint work with H. Dette and A. Pepelyshev.

DAEW07 10th July 2015
15:30 to 16:15
J Kunert Optimal crossover designs with interactions between treatments and experimental units
The interactions are modelled as random variables and therefore induce a correlation between observations of the same treatment on the same experimental unit. As a result, we have a dependence structure that depends on the design.
DAEW07 10th July 2015
16:15 to 17:00
J Stufken On connections between orthogonal arrays and D-optimal designs for certain generalized linear models with group effects
We consider generalized linear models in which the linear predictor depends on a single quantitative variable and on multiple factorial effects. For each level combination of the factorial effects (or run, for short), a design specifies the number of values of the quantitative variable (or values, for short) that are to be used, and also specifies what those values are. Moreover, a design informs us, for each combination of runs and values, what proportion of times it should be used. Stufken and Yang (2012, Statistica Sinica) obtained complete class results for locally optimal designs for such models. The complete classes that they found consisted of designs with at most two values for each run. Many optimal designs found numerically in these complete classes turned out to have precisely two values for each run, resulting in designs with a large support size. Focusing on D-optimality, we show that under certain assumptions for the linear predictor, optimal designs with smaller support sizes can be found through the use of orthogonal arrays. This is joint work with Xijue Tan, and was part of his PhD dissertation at the University of Georgia.