TY - JOUR
T1 - Linear or nonlinear? Automatic structure discovery for partially linear models
AU - Zhang, Hao Helen
AU - Cheng, Guang
AU - Liu, Yufeng
N1 - Funding Information:
Hao Helen Zhang is Associate Professor, Department of Statistics, North Carolina State University, Raleigh, NC 27695, and Associate Professor, Department of Mathematics, University of Arizona, Tucson, AZ 85721. Guang Cheng is Assistant Professor, Department of Statistics, Purdue University, West Lafayette, IN 47906. Yufeng Liu is Associate Professor, Department of Statistics and Operations Research, Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC 27599. The authors are supported in part by NSF grants DMS-0645293 (Zhang), DMS-0906497 (Cheng), and DMS-0747575 (Liu), NIH grants NIH/NCI R01 CA-085848 (Zhang), NIH/NCI R01 CA-149569 (Liu), and NIH/NCI P01 CA142538 (Zhang and Liu). The authors thank the editor, the associate editor, and two reviewers for their helpful comments and suggestions which led to a much improved presentation.
PY - 2011
Y1 - 2011
N2 - Partially linear models provide a useful class of tools for modeling complex data by naturally incorporating a combination of linear and nonlinear effects within one framework. One key question in partially linear models is the choice of model structure, that is, how to decide which covariates are linear and which are nonlinear. This is a fundamental, yet largely unsolved problem for partially linear models. In practice, one often assumes that the model structure is given or known and then makes estimation and inference based on that structure. Alternatively, there are two methods in common use for tackling the problem: hypotheses testing and visual screening based on the marginal fits. Both methods are quite useful in practice but have their drawbacks. First, it is difficult to construct a powerful procedure for testing multiple hypotheses of linear against nonlinear fits. Second, the screening procedure based on the scatterplots of individual covariate fits may provide an educated guess on the regression function form, but the procedure is ad hoc and lacks theoretical justifications. In this article, we propose a new approach to structure selection for partially linear models, called the LAND (Linear And Nonlinear Discoverer). The procedure is developed in an elegant mathematical framework and possesses desired theoretical and computational properties. Under certain regularity conditions, we show that the LAND estimator is able to identify the underlying true model structure correctly and at the same time estimate the multivariate regression function consistently. The convergence rate of the new estimator is established as well. We further propose an iterative algorithm to implement the procedure and illustrate its performance by simulated and real examples. Supplementary materials for this article are available online.
KW - Model selection
KW - RKHS
KW - Semiparametric regression
KW - Shrinkage
KW - Smoothing splines
UR - http://www.scopus.com/inward/record.url?scp=80054678826&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80054678826&partnerID=8YFLogxK
U2 - 10.1198/jasa.2011.tm10281
DO - 10.1198/jasa.2011.tm10281
M3 - Article
AN - SCOPUS:80054678826
SN - 0162-1459
VL - 106
SP - 1099
EP - 1112
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 495
ER -