Proactively Identifying Emerging Hacker Threats from the Dark Web

Sagar Samtani, Hongyi Zhu, Hsinchun Chen

Research output: Contribution to journal › Article › peer-review

16 Scopus citations


Cybersecurity experts have appraised the total global cost of malicious hacking activities to be $450 billion annually. Cyber Threat Intelligence (CTI) has emerged as a viable approach to combat this societal issue. However, existing processes are criticized as inherently reactive to known threats. To address these concerns, CTI experts have suggested proactively examining emerging threats in the vast, international online hacker community. In this study, we aim to develop proactive CTI capabilities by exploring online hacker forums to identify emerging threats in terms of popularity and tool functionality. To achieve these goals, we create a novel Diachronic Graph Embedding Framework (D-GEF). D-GEF operates on a Graph-of-Words (GoW) representation of hacker forum text to generate word embeddings in an unsupervised manner. Semantic displacement measures adopted from the diachronic linguistics literature identify how terminology evolves. A series of benchmark experiments illustrate D-GEF's ability to generate higher-quality embeddings than state-of-the-art word embedding models (e.g., word2vec) in tasks pertaining to semantic analogy, clustering, and threat classification. D-GEF's practical utility is illustrated with in-depth case studies on web application and denial of service threats targeting PHP and Windows technologies, respectively. We also discuss the implications of the proposed framework for strategic, operational, and tactical CTI scenarios. All datasets and code are publicly released to facilitate scientific reproducibility and extensions of this work.
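The two ideas the abstract leans on, a Graph-of-Words built from forum text and a semantic displacement score between time periods, can be illustrated with a small sketch. This is not D-GEF's released code. The sliding-window co-occurrence weighting, the function names `build_graph_of_words` and `semantic_displacement`, the window size, and the toy vectors are all hypothetical. The graph-convolution embedding step is omitted, and the displacement function assumes the two periods' vectors have already been aligned into a common space (for example via orthogonal Procrustes), which the abstract does not specify.

```python
import math
from collections import defaultdict


def build_graph_of_words(tokens, window=3):
    """Build an undirected, weighted Graph-of-Words (hypothetical construction).

    Nodes are unique terms; an edge joins two distinct terms that appear within
    `window` consecutive tokens, weighted by how often that happens.
    """
    edges = defaultdict(int)
    for i, term in enumerate(tokens):
        # Link the current token to the next (window - 1) tokens.
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != term:
                # Sort the pair so (a, b) and (b, a) map to the same edge.
                edge = tuple(sorted((term, other)))
                edges[edge] += 1
    return dict(edges)


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def semantic_displacement(term, emb_t0, emb_t1):
    """Cosine distance between a term's vectors in two time periods.

    Assumes emb_t0 and emb_t1 are already aligned to a shared vector space.
    """
    return cosine_distance(emb_t0[term], emb_t1[term])


if __name__ == "__main__":
    tokens = "sql injection exploit targets php login page sql injection".split()
    gow = build_graph_of_words(tokens, window=3)
    print(gow[("injection", "sql")])  # 2: the pair co-occurs twice

    # Hypothetical, pre-aligned 2-D vectors for "php" in two periods.
    emb_2016 = {"php": [1.0, 0.0]}
    emb_2018 = {"php": [0.0, 1.0]}
    print(semantic_displacement("php", emb_2016, emb_2018))  # 1.0: orthogonal
```

A larger displacement suggests a term's usage context shifted between periods; a score near 0 suggests stable usage. Alignment matters because independently trained embedding spaces can differ by an arbitrary rotation, which would inflate raw cosine distances.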

Original language: English (US)
Article number: 3409289
Journal: ACM Transactions on Privacy and Security
Issue number: 4
State: Published - Aug 2020

Keywords

  • Cyber threat intelligence
  • deep learning
  • diachronic linguistics
  • graph convolutions
  • graph embeddings
  • hacker forums
  • ransomware
  • text graphs

ASJC Scopus subject areas

  • Computer Science(all)
  • Safety, Risk, Reliability and Quality
