A Reduced Modeling Approach for Making Predictions With Incomplete Data Having Blockwise Missing Patterns

  • Karthik Srinivasan (Creator)
  • Faiz Currim (Creator)
  • Sudha Ram (Creator)

Dataset

Description

Incomplete data with blockwise missing patterns are commonly encountered in analytics, and solutions typically entail listwise deletion or imputation. However, as the proportion of missing values in input features increases, listwise or columnwise deletion leads to information loss, while imputation diminishes the integrity of the training dataset. We present the blockwise reduced modeling (BRM) method for analyzing blockwise missing patterns, which adapts and improves upon the notion of reduced modeling proposed by Friedman, Kohavi, and Yun in 1996 as lazy decision trees. Whereas the original idea of reduced modeling delays model induction until a prediction is required, our method exploits the blockwise missing patterns to pre-train ensemble models that require minimal imputation of data. Models are pre-trained over the overlapping subsets of an incomplete dataset that contain only populated values. During prediction, each test instance is mapped to one of these models based on its feature-missing pattern. BRM can be applied to any supervised learning model for tabular data. We benchmark the predictive performance of BRM using simulations of blockwise missing patterns on three complete datasets from public repositories. Thereafter, we evaluate its utility on three datasets with actual blockwise missing patterns. We demonstrate that BRM is superior to most existing benchmarks in terms of predictive performance for linear and nonlinear models. It also scales well and is more reliable than existing benchmarks for making predictions on data with blockwise missing patterns.
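The pre-train-then-route idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `BlockwiseReducedModel` and `NearestNeighbor`, the use of `None` to mark a missing value, and the toy data are all assumptions made for illustration. The sketch assumes one model per missing pattern seen in training, fitted on every training row that has at least those features populated (so the training subsets overlap), and it raises an error for unseen patterns instead of attempting the minimal imputation the description mentions.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing value (an assumption)


def observed(row: Row) -> frozenset:
    """Return the set of column indices that are populated in a row."""
    return frozenset(i for i, v in enumerate(row) if v is not None)


class NearestNeighbor:
    """Toy 1-nearest-neighbor learner; a stand-in for any supervised model."""

    def fit(self, X: List[List[float]], y: List[int]) -> "NearestNeighbor":
        self.X, self.y = [list(r) for r in X], list(y)
        return self

    def predict_one(self, x: List[float]) -> int:
        dists = [sum((a - b) ** 2 for a, b in zip(r, x)) for r in self.X]
        return self.y[dists.index(min(dists))]


class BlockwiseReducedModel:
    """Hypothetical sketch: one pre-trained model per observed-feature pattern."""

    def __init__(self, make_learner: Callable[[], NearestNeighbor]):
        self.make_learner = make_learner
        self.models: Dict[frozenset, NearestNeighbor] = {}

    def fit(self, X: List[Row], y: List[int]) -> "BlockwiseReducedModel":
        for pattern in {observed(r) for r in X}:
            if not pattern:
                continue  # a row with no populated features carries no signal
            cols = sorted(pattern)
            # Overlapping subset: every row whose populated features
            # include this pattern, restricted to the pattern's columns.
            keep = [i for i, r in enumerate(X) if pattern <= observed(r)]
            Xs = [[X[i][c] for c in cols] for i in keep]
            ys = [y[i] for i in keep]
            self.models[pattern] = self.make_learner().fit(Xs, ys)
        return self

    def predict_one(self, x: Row) -> int:
        pattern = observed(x)
        if pattern not in self.models:
            # The sketch does not impute; unseen patterns are rejected.
            raise KeyError("no model pre-trained for this missing pattern")
        return self.models[pattern].predict_one([x[c] for c in sorted(pattern)])


# Toy data: columns 2 and 3 form a block that is missing for the first two rows.
X_train = [
    [1.0, 2.0, None, None],
    [1.5, 2.5, None, None],
    [8.0, 9.0, 0.0, 1.0],
    [9.0, 8.0, 1.0, 0.0],
]
y_train = [0, 0, 1, 1]

brm = BlockwiseReducedModel(NearestNeighbor).fit(X_train, y_train)
pred_partial = brm.predict_one([8.5, 8.9, None, None])  # routed to the 2-feature model
pred_full = brm.predict_one([8.5, 9.0, 0.0, 1.0])       # routed to the 4-feature model
```

In this toy run the two-feature model is trained on all four rows, including the complete ones, so a test instance missing the second block can still benefit from rows that were fully observed. No training row is dropped and no value is imputed.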
Date made available: 2023
Publisher: Code Ocean
