
Large-scale multiple testing: finding needles in a haystack

Presented by: T Cai [Pennsylvania]
Monday 23rd June 2008 - 15:30 to 16:30
INI Seminar Room 1
Session Chair: Richard Samworth

Due to advances in technology, it has become increasingly common in scientific investigations to collect vast amounts of data with complex structures. Examples include microarray studies, fMRI analysis, and astronomical surveys. The analysis of these data sets poses many statistical challenges not present in smaller-scale studies. In these studies, one must often test thousands or even millions of hypotheses simultaneously. Conventional multiple testing procedures are based on thresholding the ordered p-values. In this talk, we consider large-scale multiple testing from a compound decision-theoretic point of view by treating it as a constrained optimization problem. The solution to this optimization problem yields an oracle procedure. A data-driven procedure is then constructed to mimic the performance of the oracle and is shown to be asymptotically optimal. In particular, the results show that, although the p-value is appropriate for testing a single hypothesis, it fails to serve as the fundamental building block in large-scale multiple testing. Time permitting, I will also discuss simultaneous testing of grouped hypotheses.

This is joint work with Wenguang Sun (University of Pennsylvania).
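
The contrast drawn in the abstract, between thresholding ordered p-values and an oracle rule derived from a compound decision view, can be illustrated with a small sketch. The abstract does not spell out either procedure, so everything here is an assumption made for illustration: the p-value rule is taken to be the Benjamini-Hochberg step-up procedure, the oracle is taken to be a step-up rule on local false discovery rates (Lfdr) under a two-group normal mixture whose parameters are known, and all function names and simulation settings (`bh_reject`, `lfdr_reject`, `simulate`, the null proportion 0.8, the alternative mean 2.5) are hypothetical. A data-driven version would replace the known mixture with estimates, which is not shown.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def two_sided_p(z):
    """Two-sided p-value P(|Z| >= |z|) for Z ~ N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= alpha * k / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])


def lfdr_reject(lfdr, alpha):
    """Step-up rule on local fdr values: reject the k smallest, where k is the
    largest rank whose running mean of ordered Lfdr values is <= alpha."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    k, running = 0, 0.0
    for rank, i in enumerate(order, start=1):
        running += lfdr[i]
        if running / rank <= alpha:
            k = rank
    return set(order[:k])


def simulate(m=5000, p0=0.8, mu1=2.5, alpha=0.10, seed=0):
    """Assumed two-group model: z ~ p0 * N(0,1) + (1 - p0) * N(mu1, 1)."""
    rng = random.Random(seed)
    is_null = [rng.random() < p0 for _ in range(m)]
    z = [rng.gauss(0.0 if null else mu1, 1.0) for null in is_null]

    pvals = [two_sided_p(zi) for zi in z]
    # Oracle Lfdr uses the true mixture, which is unknown in practice.
    lfdr = [
        p0 * normal_pdf(zi)
        / (p0 * normal_pdf(zi) + (1.0 - p0) * normal_pdf(zi, mu1))
        for zi in z
    ]

    out = {}
    for name, rejected in (("bh", bh_reject(pvals, alpha)),
                           ("lfdr", lfdr_reject(lfdr, alpha))):
        false_rej = sum(1 for i in rejected if is_null[i])
        out[name] = {
            "rejections": len(rejected),
            "false_rejections": false_rej,
            "fdp": false_rej / max(len(rejected), 1),
        }
    return out


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(name, stats)
```

Running the script prints, for each rule, the number of rejections, the number of false rejections, and the realised false discovery proportion in one simulated data set. The alternative is deliberately one-sided (mean 2.5) because a two-sided p-value ignores the sign of z while the Lfdr does not, which is one simple setting where the two rankings of hypotheses differ.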
