TY - GEN
T1 - Detecting cyber threats in non-english dark net markets
T2 - 16th IEEE International Conference on Intelligence and Security Informatics, ISI 2018
AU - Ebrahimi, Mohammadreza
AU - Surdeanu, Mihai
AU - Chen, Hsinchun
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/24
Y1 - 2018/12/24
N2 - Recent advances in proactive cyber threat intelligence rely on early detection of cyber threats in hacker communities. Dark Net Markets (DNMs) are growing platforms in hacker community that provide hackers with highly- specialized tools and products which may not be found in other platforms. While text classification techniques have been used for cyber threat detection in English DNMs, the task is hindered in non-English platforms due to the language barrier and lack of ground-truth data. Current approaches use monolingual models on machine translated data to overcome these challenges. However, the translation errors can deteriorate the classification results. The abundance of data in English DNMs can be leveraged in learning non-English threats without using machine translation. In this study, we show that a deep cross-lingual model that can jointly learn the common language representation from two languages, significantly outperforms a monolingual model learned on machine translated data for identifying cyber threats in non-English DNMs. Unlike most studies, our approach does not require any external data source such as bilingual word embeddings or bilingual lexicons. Our experiments on Russian DNMs show that this approach can achieve better performance than state-of-the-art methods for non-English cyber threat detection in malicious hacker community.
AB - Recent advances in proactive cyber threat intelligence rely on early detection of cyber threats in hacker communities. Dark Net Markets (DNMs) are growing platforms in hacker community that provide hackers with highly- specialized tools and products which may not be found in other platforms. While text classification techniques have been used for cyber threat detection in English DNMs, the task is hindered in non-English platforms due to the language barrier and lack of ground-truth data. Current approaches use monolingual models on machine translated data to overcome these challenges. However, the translation errors can deteriorate the classification results. The abundance of data in English DNMs can be leveraged in learning non-English threats without using machine translation. In this study, we show that a deep cross-lingual model that can jointly learn the common language representation from two languages, significantly outperforms a monolingual model learned on machine translated data for identifying cyber threats in non-English DNMs. Unlike most studies, our approach does not require any external data source such as bilingual word embeddings or bilingual lexicons. Our experiments on Russian DNMs show that this approach can achieve better performance than state-of-the-art methods for non-English cyber threat detection in malicious hacker community.
KW - Cross-lingual transfer learning
KW - Cyber threat
KW - Dark Net Markets
KW - Deep learning
UR - http://www.scopus.com/inward/record.url?scp=85061055388&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061055388&partnerID=8YFLogxK
U2 - 10.1109/ISI.2018.8587404
DO - 10.1109/ISI.2018.8587404
M3 - Conference contribution
AN - SCOPUS:85061055388
T3 - 2018 IEEE International Conference on Intelligence and Security Informatics, ISI 2018
SP - 85
EP - 90
BT - 2018 IEEE International Conference on Intelligence and Security Informatics, ISI 2018
A2 - Lee, Dongwon
A2 - Mezzour, Ghita
A2 - Kumaraguru, Ponnurangam
A2 - Saxena, Nitesh
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 November 2018 through 11 November 2018
ER -