TY - JOUR
T1 - Interaction Screening for Ultrahigh-Dimensional Data
AU - Hao, Ning
AU - Zhang, Hao Helen
N1 - Funding Information:
Ning Hao is Assistant Professor, Department of Mathematics, University of Arizona, Tucson, AZ 85721 (E-mail: nhao@math.arizona.edu). Hao Helen Zhang is Associate Professor, Department of Mathematics, University of Arizona, Tucson, AZ 85721 (E-mail: hzhang@math.arizona.edu). The authors are partially supported by NSF Grants DMS-1309507 (Hao and Zhang), DMS-1347844 (Zhang), NIH Grants NIH/NCI R01 CA-085848 (Zhang) and P01 CA142538 (Zhang), AMS-Simons Travel Grant (Hao). The authors are grateful to Dr. Han Xiao and to the editors, associate editor, and four referees for their helpful comments and suggestions.
Publisher Copyright:
© 2014 American Statistical Association.
PY - 2014/9
Y1 - 2014/9
N2 - In ultrahigh-dimensional data analysis, it is extremely challenging to identify important interaction effects, and a top concern in practice is computational feasibility. For a dataset with n observations and p predictors, the augmented design matrix including all linear and order-2 terms is of size n × (p² + 3p)/2. When p is large, say more than tens of hundreds, the number of interactions is enormous and beyond the capacity of standard machines and software tools for storage and analysis. In theory, the interaction-selection consistency is hard to achieve in high-dimensional settings. Interaction effects have heavier tails and more complex covariance structures than main effects in a random design, making theoretical analysis difficult. In this article, we propose to tackle these issues by forward-selection-based procedures called iFOR, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure. Two algorithms, iFORT and iFORM, are studied. Computationally, the iFOR procedures are designed to be simple and fast to implement. No complex optimization tools are needed, since only OLS-type calculations are involved; the iFOR algorithms avoid storing and manipulating the whole augmented matrix, so the memory and CPU requirement is minimal; the computational complexity is linear in p for sparse models, hence feasible for p ≫ n. Theoretically, we prove that they possess sure screening property for ultrahigh-dimensional settings. Numerical examples are used to demonstrate their finite sample performance. Supplementary materials for this article are available online.
AB - In ultrahigh-dimensional data analysis, it is extremely challenging to identify important interaction effects, and a top concern in practice is computational feasibility. For a dataset with n observations and p predictors, the augmented design matrix including all linear and order-2 terms is of size n × (p² + 3p)/2. When p is large, say more than tens of hundreds, the number of interactions is enormous and beyond the capacity of standard machines and software tools for storage and analysis. In theory, the interaction-selection consistency is hard to achieve in high-dimensional settings. Interaction effects have heavier tails and more complex covariance structures than main effects in a random design, making theoretical analysis difficult. In this article, we propose to tackle these issues by forward-selection-based procedures called iFOR, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure. Two algorithms, iFORT and iFORM, are studied. Computationally, the iFOR procedures are designed to be simple and fast to implement. No complex optimization tools are needed, since only OLS-type calculations are involved; the iFOR algorithms avoid storing and manipulating the whole augmented matrix, so the memory and CPU requirement is minimal; the computational complexity is linear in p for sparse models, hence feasible for p ≫ n. Theoretically, we prove that they possess sure screening property for ultrahigh-dimensional settings. Numerical examples are used to demonstrate their finite sample performance. Supplementary materials for this article are available online.
KW - Forward selection
KW - GWAS
KW - Heredity condition
KW - Sure screening
UR - http://www.scopus.com/inward/record.url?scp=84907499739&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907499739&partnerID=8YFLogxK
U2 - 10.1080/01621459.2014.881741
DO - 10.1080/01621459.2014.881741
M3 - Article
AN - SCOPUS:84907499739
VL - 109
SP - 1285
EP - 1301
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
SN - 0162-1459
IS - 507
ER -