TY - GEN
T1 - Labeling Hacker Exploits for Proactive Cyber Threat Intelligence
T2 - 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020
AU - Ampel, Benjamin
AU - Samtani, Sagar
AU - Zhu, Hongyi
AU - Ullman, Steven
AU - Chen, Hsinchun
N1 - Funding Information:
This work was supported in part by the National Science Foundation under grant numbers DUE-1303362 (SFS), OAC-1917117 (CICI), and CNS-1850362 (SaTC CRII).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/11/9
Y1 - 2020/11/9
AB - With the rapid development of new technologies, vulnerabilities are at an all-time high. Companies are investing in developing Cyber Threat Intelligence (CTI) to counteract these new vulnerabilities. However, this CTI is generally reactive based on internal data. Hacker forums can provide proactive CTI value through automated analysis of new trends and exploits. One way to identify exploits is by analyzing the source code that is posted on these forums. These source code snippets are often noisy and unlabeled, making standard data labeling techniques ineffective. This study aims to design a novel framework for the automated collection and categorization of hacker forum exploit source code. We propose a deep transfer learning framework, the Deep Transfer Learning for Exploit Labeling (DTL-EL). DTL-EL leverages the learned representation from professional labeled exploits to better generalize to hacker forum exploits. This model classifies the collected hacker forum exploits into eight predefined categories for proactive and timely CTI. The results of this study indicate that DTL-EL outperforms other prominent models in hacker forum literature.
KW - Hacker forums
KW - cyber threat intelligence
KW - deep transfer learning
KW - exploit labeling
KW - source code
KW - text classification
UR - http://www.scopus.com/inward/record.url?scp=85098966836&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098966836&partnerID=8YFLogxK
U2 - 10.1109/ISI49825.2020.9280548
DO - 10.1109/ISI49825.2020.9280548
M3 - Conference contribution
AN - SCOPUS:85098966836
T3 - Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020
BT - Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 November 2020 through 10 November 2020
ER -