A semi-parallel framework for greedy information-theoretic feature selection

Heng Liu, Gregory Ditzler

Research output: Contribution to journal › Article › peer-review

9 Scopus citations


Feature selection (FS) is a well-studied area that avoids issues related to the curse of dimensionality and overfitting. FS is a preprocessing procedure that identifies a feature subset that is both relevant and non-redundant. Although FS has been driven by the exploration of “big data” and the development of high-performance computing, the implementation of scalable information-theoretic FS remains an under-explored topic. In this contribution, we revisit the greedy optimization procedure of information-theoretic filter FS and propose a semi-parallel optimization paradigm that can produce a feature set equivalent to that of the greedy FS algorithms in a fraction of the time. We focus on greedy selection algorithms because their computational cost grows rapidly with the number of features. Our framework is benchmarked against twelve datasets, including one extremely large dataset with more than a million features, and we show that it significantly speeds up FS while selecting nearly the same features as state-of-the-art information-theoretic FS methods.
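For readers unfamiliar with the greedy optimization procedure the abstract refers to, the following is a minimal sketch of one common information-theoretic filter of this family (an mRMR-style forward selection), not the semi-parallel framework proposed in the paper itself. The function names and the relevance-minus-mean-redundancy scoring rule are illustrative assumptions; at each step the feature maximizing the score is added to the selected set.

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def greedy_mrmr(X, y, k):
    """Greedy forward selection: at each of k steps, pick the feature that
    maximizes relevance I(X_f; y) minus mean redundancy with the features
    already selected (an mRMR-style criterion, shown here for illustration)."""
    selected = []
    remaining = set(range(X.shape[1]))
    for _ in range(k):
        best, best_score = None, -np.inf
        for f in remaining:
            relevance = mutual_info(X[:, f], y)
            redundancy = (np.mean([mutual_info(X[:, f], X[:, s])
                                   for s in selected])
                          if selected else 0.0)
            if relevance - redundancy > best_score:
                best, best_score = f, relevance - redundancy
        selected.append(best)
        remaining.remove(best)
    return selected
```

The inner loop over `remaining` is embarrassingly parallel within a single step, which is the kind of structure a semi-parallel scheme can exploit; the sequential dependence between steps is what makes fully parallelizing greedy FS non-trivial.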

Original language: English (US)
Pages (from-to): 13-28
Number of pages: 16
Journal: Information Sciences
State: Published - Aug 2019


Keywords

  • Feature selection
  • Information theory
  • Parallel computing

ASJC Scopus subject areas

  • Software
  • Information Systems and Management
  • Artificial Intelligence
  • Theoretical Computer Science
  • Control and Systems Engineering
  • Computer Science Applications


