TY - GEN
T1 - Identifying, Collecting, and Monitoring Personally Identifiable Information
T2 - 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020
AU - Liu, Yizhi
AU - Lin, Fang Yu
AU - Ahmad-Post, Zara
AU - Ebrahimi, Mohammadreza
AU - Zhang, Ning
AU - Hu, James Lee
AU - Xin, Jingyu
AU - Li, Weifeng
AU - Chen, Hsinchun
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation (NSF) under Secure and Trustworthy Cyberspace (SaTC) (grant No. 1936370), Cybersecurity Innovation for Cyberinfrastructure (grant No. 1917117), and CyberCorps Scholarship-for-Service (grant No. 1921485).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/11/9
Y1 - 2020/11/9
N2 - Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect data breach victims, researchers focus on collecting exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on exposed PII collected from either the dark web or the surface web. Due to the wide exposure of PII on both the dark web and surface web, collecting from only the dark web or the surface web could result in an underestimation of privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this paper, we summarize our effort to systematically identify, collect, and monitor a total of 1,212,004,819 exposed PII records across both the dark web and surface web. Our effort resulted in 5.8 million stolen SSNs, 845,000 stolen credit/debit cards, and 1.2 billion stolen account credentials. From the surface web, we identified and collected over 1.3 million PII records of the victims whose PII is exposed on the dark web. To the best of our knowledge, this is the largest academic collection of exposed PII, which, if properly anonymized, enables various privacy research inquiries, including assessing privacy risk and identifying at-risk populations.
AB - Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect data breach victims, researchers focus on collecting exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on exposed PII collected from either the dark web or the surface web. Due to the wide exposure of PII on both the dark web and surface web, collecting from only the dark web or the surface web could result in an underestimation of privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this paper, we summarize our effort to systematically identify, collect, and monitor a total of 1,212,004,819 exposed PII records across both the dark web and surface web. Our effort resulted in 5.8 million stolen SSNs, 845,000 stolen credit/debit cards, and 1.2 billion stolen account credentials. From the surface web, we identified and collected over 1.3 million PII records of the victims whose PII is exposed on the dark web. To the best of our knowledge, this is the largest academic collection of exposed PII, which, if properly anonymized, enables various privacy research inquiries, including assessing privacy risk and identifying at-risk populations.
KW - PII
KW - dark web
KW - data breach
KW - data collection
KW - privacy
KW - surface web
UR - http://www.scopus.com/inward/record.url?scp=85098945773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098945773&partnerID=8YFLogxK
U2 - 10.1109/ISI49825.2020.9280540
DO - 10.1109/ISI49825.2020.9280540
M3 - Conference contribution
AN - SCOPUS:85098945773
T3 - Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020
BT - Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 November 2020 through 10 November 2020
ER -