TY - JOUR
T1 - Data poisoning against information-theoretic feature selection
AU - Liu, Heng
AU - Ditzler, Gregory
N1 - Funding Information:
This work was supported by grants from the Department of Energy #DE-NA0003946, National Science Foundation CAREER #1943552, and Army Research Lab W56KGU-20-C-0002. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Publisher Copyright:
© 2021 Elsevier Inc.
PY - 2021/9
Y1 - 2021/9
N2 - A typical assumption in machine learning is that a learning model is developed without considering the existence of an adversary who can subvert the classifier's objective. As a result, machine learning pipelines exhibit vulnerabilities in an adversarial environment. Feature Selection (FS) is an essential preprocessing stage in data analytics and has been widely used in security-sensitive machine learning applications; however, FS research in adversarial machine learning has been largely overlooked. Recent empirical work has demonstrated that FS is also vulnerable in an adversarial environment. Although the research community has made extensive efforts over the past decade to improve classifiers' robustness and develop countermeasures against adversaries, only a few contributions have investigated the behavior of FS in a malicious environment. Given that machine learning pipelines increasingly rely on FS to combat the “curse of dimensionality” and overfitting, insecure FS can be the “Achilles heel” of data pipelines. In this contribution, we explore the weaknesses of information-theoretic FS methods by designing a generic FS poisoning algorithm. We also show the transferability of the proposed poisoning method across seven information-theoretic FS methods. Experiments on 16 benchmark datasets demonstrate the efficacy of the proposed poisoning algorithm and the existence of transferability.
KW - Adversarial learning
KW - Feature selection
KW - Information theory
UR - http://www.scopus.com/inward/record.url?scp=85107825123&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107825123&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2021.05.049
DO - 10.1016/j.ins.2021.05.049
M3 - Article
AN - SCOPUS:85107825123
VL - 573
SP - 396
EP - 411
JO - Information Sciences
JF - Information Sciences
SN - 0020-0255
ER -