Data poisoning against information-theoretic feature selection

Heng Liu, Gregory Ditzler

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Machine learning models are typically built on the assumption that no adversary exists to subvert a classifier's objective. As a result, machine learning pipelines exhibit vulnerabilities in adversarial environments. Feature selection (FS) is an essential preprocessing stage in data analytics and has been widely used in security-sensitive machine learning applications; however, FS has been largely overlooked in adversarial machine learning research. Recent empirical work has demonstrated that FS is also vulnerable in an adversarial environment. Although the research community has made extensive efforts over the past decade to improve classifiers’ robustness and develop countermeasures against adversaries, only a few contributions have investigated FS's behavior in a malicious environment. Given that machine learning pipelines increasingly rely on FS to combat the “curse of dimensionality” and overfitting, insecure FS can be the “Achilles heel” of data pipelines. In this contribution, we explore the weaknesses of information-theoretic FS methods by designing a generic FS poisoning algorithm. We also show the transferability of the proposed poisoning method across seven information-theoretic FS methods. Experiments on 16 benchmark datasets demonstrate the efficacy of the proposed poisoning algorithm and the existence of transferability.

Original language: English (US)
Pages (from-to): 396-411
Number of pages: 16
Journal: Information Sciences
Volume: 573
State: Published - Sep 2021

Keywords

  • Adversarial learning
  • Feature selection
  • Information theory

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
