TY - GEN
T1 - Detecting cyber threats in non-English hacker forums
T2 - 2020 IEEE Symposium on Security and Privacy Workshops, SPW 2020
AU - Ebrahimi, Mohammadreza
AU - Samtani, Sagar
AU - Chai, Yidong
AU - Chen, Hsinchun
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation (NSF) under Grants SES-1314631 (SaTC SBE), ACI-1443019 (DIBBs), CNS-1936370 (SaTC CORE), and CNS-1850362 (CRII SaTC).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - The regularity of devastating cyber-attacks has made cybersecurity a grand societal challenge. Many cybersecurity professionals are closely examining the international Dark Web to proactively pinpoint potential cyber threats. Despite its potential, the Dark Web contains hundreds of thousands of non-English posts. While machine translation (MT) is the prevailing approach to processing non-English text, applying MT to hacker forum text results in mistranslations. In this study, we draw upon Long Short-Term Memory (LSTM), Cross-Lingual Knowledge Transfer (CLKT), and Generative Adversarial Networks (GANs) principles to design a novel Adversarial CLKT (A-CLKT) approach. A-CLKT operates on untranslated text to retain the original semantics of the language and leverages the collective knowledge about cyber threats across languages to create a language-invariant representation without any manual feature engineering or external resources. Three experiments demonstrate how A-CLKT outperforms state-of-the-art machine learning, deep learning, and CLKT algorithms in identifying cyber threats in French and Russian forums.
KW - Adversarial learning
KW - Cross-lingual knowledge transfer
KW - Generative adversarial networks
KW - Hacker forums
KW - Long short-term memory
UR - http://www.scopus.com/inward/record.url?scp=85099724665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099724665&partnerID=8YFLogxK
U2 - 10.1109/SPW50608.2020.00021
DO - 10.1109/SPW50608.2020.00021
M3 - Conference contribution
AN - SCOPUS:85099724665
T3 - Proceedings - 2020 IEEE Symposium on Security and Privacy Workshops, SPW 2020
SP - 20
EP - 26
BT - Proceedings - 2020 IEEE Symposium on Security and Privacy Workshops, SPW 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 May 2020
ER -