Non-asymptotic variable identification via the Lasso and the elastic net
Seminar Room 2, Newton Institute Gatehouse
The topic of l_1 regularized or Lasso-type estimation has received considerable attention over the past decade. Recent theoretical advances have been mainly concerned with the risk of the estimators and the corresponding sparsity oracle inequalities. In this talk we investigate the quality of l_1 penalized estimators from a different perspective, shifting the emphasis to non-asymptotic variable selection, which complements the consistent variable selection literature. Our main results are established for regression models, with emphasis on the square and logistic losses. The identification of tagged SNPs associated with a disease in genome-wide association studies provides the principal motivation for this analysis. The performance of the method depends crucially on the choice of the tuning sequence, and we discuss non-asymptotic choices for which we can correctly detect the sets of variables associated with the response at any pre-specified confidence level. These tuning sequences differ for the two loss functions, but in both cases they are larger than those required for best risk performance. The stability of the design matrix is another major issue in correct variable selection, especially when the total number of variables exceeds the sample size. A possible solution is provided by further regularization, for instance via an l_1 + l_2 or elastic net penalty. We discuss the merits and limitations of this method in the same context as above.
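The flavor of the l_1 versus l_1 + l_2 comparison can be sketched with a toy experiment. The snippet below is a minimal, assumed illustration (not the tuning sequences or theory from the talk): it fits a Lasso (pure l_1 penalty) and an elastic net (l_1 + l_2 penalty) by coordinate descent on synthetic square-loss data with more variables than observations, and reports which variables each method selects. All dimensions and penalty levels are arbitrary choices for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    # Soft-thresholding operator: S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def enet_coordinate_descent(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for
        (1/2n) ||y - X b||^2 + lam1 ||b||_1 + (lam2/2) ||b||_2^2.
    Setting lam2 = 0 recovers the Lasso; lam2 > 0 gives the elastic net,
    whose l_2 term stabilizes the design when p > n."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with variable j removed from the fit
            resid = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
    return b


# Synthetic regression with more variables (p) than samples (n);
# only the first s variables truly affect the response.
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0
y = X @ beta + rng.standard_normal(n)

b_lasso = enet_coordinate_descent(X, y, lam1=0.5, lam2=0.0)
b_enet = enet_coordinate_descent(X, y, lam1=0.5, lam2=0.5)

print("Lasso selects:", np.flatnonzero(b_lasso))
print("Elastic net selects:", np.flatnonzero(b_enet))
```

A larger l_1 penalty shrinks more noise variables exactly to zero at the cost of extra bias on the signal coefficients, which mirrors the abstract's point that tuning sequences good for variable selection are larger than those best for risk.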