TY - JOUR
T1 - Variable selection and model building via likelihood basis pursuit
AU - Zhang, Hao Helen
AU - Wahba, Grace
AU - Lin, Yi
AU - Voelker, Meta
AU - Ferris, Michael
AU - Klein, Ronald
AU - Klein, Barbara
N1 - Funding Information:
Hao Helen Zhang is Assistant Professor, Department of Statistics, North Carolina State University, Raleigh, NC 27695 (E-mail: [email protected]). Grace Wahba is IJ Schoenberg and Bascom Professor (E-mail: wahba@stat.wisc.edu) and Yi Lin is Associate Professor (E-mail: [email protected]), Department of Statistics, and Michael Ferris is Professor (E-mail: ferris@cs.wisc.edu), Department of Computer Sciences and Industrial Engineering, University of Wisconsin-Madison, Madison, WI 53706. Meta Voelker is Senior Analyst, Alphatech Inc., Arlington, VA 22203 (E-mail: meta.voelker@dc.alphatech.com). Ronald Klein is Professor (E-mail: [email protected]) and Barbara Klein is Professor (E-mail: [email protected]), Department of Ophthalmology, Medical School, University of Wisconsin-Madison, Madison, WI 53726. This work was supported in part by National Science Foundation grants DMS-00-72292, DMS-01-34987, DMS-04-05913, and CCR-9972372; National Institutes of Health grants EY09946 and EY03083; and AFOSR grant F49620-01-1-0040. The authors thank the editor, the associate editor, and the two referees for their constructive comments and suggestions that have led to significant improvement of this article.
PY - 2004/9
Y1 - 2004/9
AB - This article presents a nonparametric penalized likelihood approach for variable selection and model building, called likelihood basis pursuit (LBP). In the setting of a tensor product reproducing kernel Hilbert space, we decompose the log-likelihood into the sum of different functional components such as main effects and interactions, with each component represented by appropriate basis functions. Basis functions are chosen to be compatible with variable selection and model building in the context of a smoothing spline ANOVA model. Basis pursuit is applied to obtain the optimal decomposition in terms of having the smallest l1 norm on the coefficients. We use the functional L1 norm to measure the importance of each component and determine the "threshold" value by a sequential Monte Carlo bootstrap test algorithm. As a generalized LASSO-type method, LBP produces shrinkage estimates for the coefficients, which greatly facilitates the variable selection process and provides highly interpretable multivariate functional estimates at the same time. To choose the regularization parameters appearing in the LBP models, generalized approximate cross-validation (GACV) is derived as a tuning criterion. To make GACV widely applicable to large datasets, its randomized version is proposed as well. A technique "slice modeling" is used to solve the optimization problem and makes the computation more efficient. LBP has great potential for a wide range of research and application areas such as medical studies, and in this article we apply it to two large ongoing epidemiologic studies, the Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR) and the Beaver Dam Eye Study (BDES).
KW - Generalized approximate cross-validation
KW - LASSO
KW - Monte Carlo bootstrap test
KW - Nonparametric variable selection
KW - Slice modeling
KW - Smoothing spline ANOVA
UR - http://www.scopus.com/inward/record.url?scp=4944223275&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4944223275&partnerID=8YFLogxK
U2 - 10.1198/016214504000000593
DO - 10.1198/016214504000000593
M3 - Article
AN - SCOPUS:4944223275
SN - 0162-1459
VL - 99
SP - 659
EP - 672
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 467
ER -