Abstract
The problem of interaction selection in high-dimensional data analysis has recently received much attention. This note aims to address and clarify several fundamental issues in interaction selection for linear regression models, especially when the input dimension p is much larger than the sample size n. We first discuss how to give a formal definition of “importance” for main and interaction effects. Then we focus on two-stage methods, which are computationally attractive for high-dimensional data analysis but thus far have been regarded as heuristic. We revisit the counterexample of Turlach and provide new insight to justify two-stage methods from a theoretical perspective. Finally, we suggest new strategies for interaction selection under the marginality principle and provide some simulation results.
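To illustrate why two-stage methods are computationally attractive, the sketch below counts the interaction candidates that a second stage would consider once a first stage has selected a few main effects. It is not taken from the article: the function name `candidate_interactions`, the index values, and the choice p = 1000 are hypothetical. It uses the usual reading of the heredity condition, in which an interaction is eligible under strong heredity only if both parent main effects were selected, and under weak heredity if at least one was. Quadratic terms are left out for brevity.

```python
from itertools import combinations


def candidate_interactions(selected_main, p, heredity="strong"):
    """Return index pairs (j, k) with j < k that are eligible as
    stage-two interaction candidates, given the stage-one selection."""
    selected = set(selected_main)
    if heredity == "strong":
        # Both parent main effects must have been selected in stage one.
        return list(combinations(sorted(selected), 2))
    if heredity == "weak":
        # At least one parent main effect must have been selected.
        return [
            (j, k)
            for j, k in combinations(range(p), 2)
            if j in selected or k in selected
        ]
    raise ValueError("heredity must be 'strong' or 'weak'")


p = 1000                     # hypothetical number of predictors
selected_main = [3, 17, 42]  # hypothetical stage-one output

strong = candidate_interactions(selected_main, p, "strong")
weak = candidate_interactions(selected_main, p, "weak")
all_pairs = p * (p - 1) // 2  # pairs a one-stage search would face

print(len(strong), len(weak), all_pairs)
```

Under these assumptions the second stage searches 3 pairs (strong heredity) or 2,994 pairs (weak heredity) instead of all 499,500. How the main effects are selected in the first stage, and how the surviving candidates are then screened, is left open here.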
Original language | English (US) |
---|---|
Pages (from-to) | 291-297 |
Number of pages | 7 |
Journal | American Statistician |
Volume | 71 |
Issue number | 4 |
State | Published - Oct 2 2017 |
Keywords
- Heredity condition
- Hierarchical structure
- Interaction effects
- Linear model
- Marginality principle
ASJC Scopus subject areas
- Statistics and Probability
- General Mathematics
- Statistics, Probability and Uncertainty