Incomplete data with blockwise missing patterns are commonly encountered in analytics, and solutions typically entail listwise deletion or imputation. However, as the proportion of missing values in the input features grows, listwise or columnwise deletion discards information, while imputation degrades the integrity of the training dataset. We present blockwise reduced modeling (BRM), a method for analyzing blockwise missing patterns that adapts and improves upon the notion of reduced modeling proposed by Friedman, Kohavi, and Yun in 1996 as lazy classification trees. In contrast to the original idea of reduced modeling, which delays model induction until a prediction is required, our method exploits the blockwise missing patterns to pre-train ensemble models that require minimal imputation of data. Models are pre-trained over overlapping subsets of the incomplete dataset that contain only populated values. During prediction, each test instance is mapped to one of these models based on its pattern of missing features. BRM can be applied to any supervised learning model for tabular data. We evaluate the utility of BRM using multiple linear and nonlinear models trained on four datasets – Alzheimer's Disease Neuroimaging Initiative, Wellbuilt for Wellbeing, Facebook COVID-19 Symptoms, and Capital Bike-share System; the first three contain real blockwise missing patterns. We demonstrate that BRM outperforms existing benchmarks in predictive performance for both linear and nonlinear models across all datasets. It also scales well and is more reliable than existing benchmarks. BRM is particularly useful for analyzing incomplete data in which multiple features have more than 20% missing values.
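The train-per-pattern, dispatch-at-prediction idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes one least-squares model per observed-feature pattern (the paper supports arbitrary supervised learners), and the function names `fit_brm` and `predict_brm` are invented for this sketch.

```python
import numpy as np

def pattern_key(row):
    # A missing pattern is identified by the indices of observed features.
    return tuple(np.flatnonzero(~np.isnan(row)))

def fit_brm(X, y):
    """Pre-train one model per observed-feature pattern, using every
    training row that is fully populated on that pattern's features."""
    models = {}
    for cols in {pattern_key(r) for r in X}:
        idx = list(cols)
        rows = ~np.isnan(X[:, idx]).any(axis=1)        # rows populated on this block
        A = np.c_[np.ones(rows.sum()), X[rows][:, idx]]  # design matrix with intercept
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models[cols] = coef
    return models

def predict_brm(models, x):
    # Map the test instance to the model matching its missing pattern.
    cols = pattern_key(x)
    coef = models[cols]
    return coef[0] + x[list(cols)] @ coef[1:]
```

In this toy version a test instance whose pattern was never seen during training raises a `KeyError`; a fuller treatment would fall back to the closest pre-trained pattern or impute the few features needed to reach one.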
Date made available: 2022