skip to content

Transposably invariant sample reuse: the pigeonhole bootstrap and blockwise cross-validation

Presented by: 
A Owen [Stanford]
Wednesday 25th June 2008 - 11:30 to 12:30
INI Seminar Room 1
Session Chair: 
John Rice

Sample reuse methods like the bootstrap and cross-validation are widely used in statistics and machine learning. They provide measures of accuracy with some face value validity that is not dependent on strong model assumptions.

These methods depend on repeating or omitting cases, while keeping all the variables in those cases. But for many data sets, it is not obvious whether the rows are cases and colunns are variables, or vice versa. For example, with movie ratings organized by movie and customer, both movie and customer IDs can be thought of as variables.

This talk looks at bootstrap and cross-validation methods that treat rows and columns of the matrix symmetrically. We get the same answer on X as on X'. McCullagh has proved that no exact bootstrap exists in a certain framework of this type (crossed random effects). We show that a method based on resampling both rows and columns of the data matrix tracks the true error, for some simple statistics applied to large data matrices.

Similarly we look at a method of cross-validation that leaves out blocks of the data matrix, generalizing a proposal due to Gabriel that is used in the crop science literature. We find empirically that this approach provides a good way to choose the number of terms in a truncated SVD model or a non-negative matrix factorization. We also apply some recent results in random matrix theory to the truncated SVD case.

Related Links

The video for this talk should appear here if JavaScript is enabled.
If it doesn't, something may have gone wrong with our embedded player.
We'll get it fixed as soon as possible.
University of Cambridge Research Councils UK
    Clay Mathematics Institute London Mathematical Society NM Rothschild and Sons