Large-scale multiple testing: finding needles in a haystack
Seminar Room 1, Newton Institute
Due to advances in technology, it has become increasingly common in scientific investigations to collect vast amount of data with complex structures. Examples include microarray studies, fMRI analysis, and astronomical surveys. The analysis of these data sets poses many statistical challenges not present in smaller scale studies. In these studies, it is often required to test thousands and even millions of hypotheses simultaneously. Conventional multiple testing procedures are based on thresholding the ordered p-values. In this talk, we consider large-scale multiple testing from a compound decision theoretical point of view by treating it as a constrained optimization problem. The solution to this optimization problem yields an oracle procedure. A data-driven procedure is then constructed to mimic the performance of the oracle and is shown to be asymptotically optimal. In particular, the results show that, although p-value is appropriate for testing a single hypothesis, it fails to serve as the fundamental building block in large-scale multiple testing. Time permitting, I will also discuss simultaneous testing of grouped hypotheses.
This is joint work with Wenguang Sun (University of Pennsylvania).
- http://www-stat.wharton.upenn.edu/~tcai/ - Download papers here.
If it doesn't, something may have gone wrong with our embedded player.
We'll get it fixed as soon as possible.