TY - GEN
T1 - Chinese underground market jargon analysis based on unsupervised learning
AU - Zhao, Kangzhi
AU - Zhang, Yong
AU - Xing, Chunxiao
AU - Li, Weifeng
AU - Chen, Hsinchun
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/11/15
Y1 - 2016/11/15
N2 - With the rapid growth of its online population, China has become the world's largest online market. This growth has also given rise to the Chinese underground market, which has facilitated many of the cybercrimes in China. Consequently, there is a need for research scrutinizing Chinese underground markets. One major challenge facing cybersecurity researchers is understanding unfamiliar cybercriminal jargon. To this end, we analyze jargon in the Chinese underground market. In particular, we utilize recent advances in unsupervised machine learning methods, namely word embedding and Latent Dirichlet Allocation. We evaluate our work on a research testbed encompassing 29 exclusive underground market QQ groups with 23,000 members. Specifically, we test the ability of the proposed approach to learn words semantically similar to known cybersecurity-related jargon. Results suggest that state-of-the-art unsupervised learning approaches can help better understand cybercriminal language, providing promising insights for future research on Chinese underground markets.
KW - Chinese underground market
KW - cybersecurity
KW - language model
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85004093068&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85004093068&partnerID=8YFLogxK
U2 - 10.1109/ISI.2016.7745450
DO - 10.1109/ISI.2016.7745450
M3 - Conference contribution
AN - SCOPUS:85004093068
T3 - IEEE International Conference on Intelligence and Security Informatics: Cybersecurity and Big Data, ISI 2016
SP - 97
EP - 102
BT - IEEE International Conference on Intelligence and Security Informatics
A2 - Mao, Wenji
A2 - Wang, G. Alan
A2 - Zhou, Lina
A2 - Kaati, Lisa
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Conference on Intelligence and Security Informatics, ISI 2016
Y2 - 28 September 2016 through 30 September 2016
ER -