TY - JOUR
T1 - Review of malicious code detection in data mining applications: challenges, algorithms, and future direction
AU - Razaque, Abdul
AU - Bektemyssova, Gulnara
AU - Yoo, Joon
AU - Hariri, Salim
AU - Khan, Meer Jaro
AU - Nalgozhina, Nurgul
AU - Hwang, Jaeryong
AU - Khan, M. Ajmal
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/6
Y1 - 2025/6
AB - In an era where machine learning critically underpins business operations, detecting vulnerabilities introduced by malicious code has become increasingly essential. Although prior research has extensively explored malicious code within machine learning algorithms, a targeted analysis specifically designed to identify and address these threats remains necessary. This paper presents an exhaustive literature review, focusing on the key processes of insertion, recognition, decision-making, and selection of malicious codes. We aim to uncover architectural weaknesses in data mining applications that amplify system vulnerabilities. Leveraging an integrative review covering publications from 2008 to 2024, we synthesize insights from a diverse array of academic and digital sources, examining 167 pertinent articles. This rigorous approach reveals the nuanced effects of malicious code on feature selection algorithms, crucial for maintaining data integrity. Our findings indicate that malicious code can significantly disrupt various sectors, including industrial, telecommunications, and biological data mining, adversely affecting clustering, classification, and regression algorithms. However, an encouraging outcome is observed in advanced feature selection algorithms that demonstrate resilience by effectively filtering out irrelevant data inputs. The paper concludes with a strong call for the development of sophisticated detection methods, which are vital for mitigating the growing risks associated with malicious code. It stresses the importance of proactive algorithm identification and classification to preserve the efficacy of data mining. Current challenges in accurately classifying machine learning algorithms raise concerns about data privacy, security, and potential biases. Ongoing research is crucial for improving data interoperability and algorithm transparency, thereby strengthening the defense mechanisms of machine learning applications against the complex and evolving landscape of cyber threats.
KW - Cyber threats
KW - Data mining
KW - Industrial applications
KW - Malicious code detection
KW - Vulnerabilities
UR - http://www.scopus.com/inward/record.url?scp=85217282005&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85217282005&partnerID=8YFLogxK
U2 - 10.1007/s10586-024-05017-x
DO - 10.1007/s10586-024-05017-x
M3 - Article
AN - SCOPUS:85217282005
SN - 1386-7857
VL - 28
JO - Cluster Computing
JF - Cluster Computing
IS - 3
M1 - 206
ER -