A Generative Adversarial Learning Framework for Breaking Text-Based CAPTCHA in the Dark Web

Ning Zhang, Mohammadreza Ebrahimi, Weifeng Li, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

6 Scopus citations

Abstract

Cyber threat intelligence (CTI) requires automated, large-scale monitoring of dark web platforms (e.g., Dark Net Markets and carding shops). While methods exist for collecting data from the surface web, large-scale dark web data collection is commonly hindered by anti-crawling measures, of which text-based CAPTCHA is the most prohibitive. Text-based CAPTCHA requires the user to recognize a combination of hard-to-read characters. Dark web CAPTCHA patterns are intentionally designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing CAPTCHA breaking methods cannot handle these challenges and are therefore not applicable to the dark web. In this study, we propose a novel framework for breaking text-based CAPTCHA in the dark web. The proposed framework uses a Generative Adversarial Network (GAN) to counteract dark web-specific background noise and leverages an enhanced character segmentation algorithm. We evaluated the proposed method on both benchmark and dark web CAPTCHA testbeds. It significantly outperformed the state-of-the-art baseline methods on all datasets, achieving success rates above 92.08% on the dark web testbeds. Our research enables the CTI community to develop advanced capabilities for large-scale dark web monitoring.
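
The abstract names an enhanced character segmentation algorithm but does not describe it. For readers unfamiliar with the segmentation step, a minimal sketch of a generic vertical-projection baseline follows. It is NOT the authors' enhanced algorithm. It assumes an already-binarized image given as a list of rows with 1 for ink and 0 for background, and the function name column_projection_segments and its min_width parameter are hypothetical.

```python
from typing import List, Optional, Tuple


def column_projection_segments(
    binary: List[List[int]], min_width: int = 1
) -> List[Tuple[int, int]]:
    """Generic vertical-projection segmentation (hypothetical helper, not the paper's method).

    Returns half-open column spans [start, end) whose columns contain ink.
    Spans narrower than min_width are discarded as likely noise specks.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection: count the ink pixels in each column.
    profile = [sum(row[c] for row in binary) for c in range(width)]

    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c  # entering a run of ink columns
        elif count == 0 and start is not None:
            if c - start >= min_width:
                segments.append((start, c))
            start = None  # leaving the run
    # Close a run that touches the right edge of the image.
    if start is not None and width - start >= min_width:
        segments.append((start, width))
    return segments


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 1, 0, 1],
        [1, 0, 0, 0, 1, 0, 1],
        [1, 1, 0, 0, 1, 1, 1],
    ]
    print(column_projection_segments(image))  # [(0, 2), (4, 7)]
```

Plain projection cannot separate characters that touch or overlap, and it splits only on fully empty columns, so background noise that leaves ink in the gaps merges neighbouring characters. These are the kinds of cases that a denoising step and a more elaborate segmentation method have to handle.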

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728188003
State: Published - Nov 9 2020
Event: 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020 - Virtual, Arlington, United States
Duration: Nov 9 2020 to Nov 10 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020

Conference

Conference: 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020
Country/Territory: United States
City: Virtual, Arlington
Period: 11/9/20 to 11/10/20

Keywords

  • automated CAPTCHA breaking
  • cyber threat intelligence
  • dark web
  • generative adversarial networks

ASJC Scopus subject areas

  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Information Systems
