TY - GEN
T1 - An Explainable Outlier Detection-based Data Cleaning Approach for Intrusion Detection
AU - Ha, Theodore
AU - Shao, Sicong
AU - Hariri, Salim
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The effectiveness of machine learning (ML)-based intrusion detection systems (IDSs) for detecting widespread cyberattacks on critical infrastructure and government systems has been demonstrated in recent years. Nevertheless, with ML models becoming more complex, people can hardly understand their decisions. Further, most works on model explanations focus on analyzing the ML model itself. However, data cleaning is also vital in influencing the model's detection behavior. On the other hand, data cleaning for ML-based IDSs is challenging because modern IDS datasets may contain outliers that affect the training stage. In this work, we propose an explainable data cleaning approach for intrusion detection, which can effectively perform explainable isolation forest-based outlier detection in the data preprocessing stage for intrusion detection. Through experiments on real-world network intrusion datasets, we evaluate the effectiveness of our approach. Experiment results demonstrate that eliminating outliers improves intrusion detection and that data cleaning using outlier detection is explainable.
KW - data cleaning
KW - explainable
KW - intrusion detection
KW - outlier removal
UR - http://www.scopus.com/inward/record.url?scp=85190108228&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190108228&partnerID=8YFLogxK
U2 - 10.1109/AICCSA59173.2023.10479247
DO - 10.1109/AICCSA59173.2023.10479247
M3 - Conference contribution
AN - SCOPUS:85190108228
T3 - Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
BT - 2023 20th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2023 - Proceedings
PB - IEEE Computer Society
T2 - 20th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2023
Y2 - 4 December 2023 through 7 December 2023
ER -