TY - GEN
T1 - A framework for stylometric similarity detection in online settings
AU - Abbasi, Ahmed
AU - Chen, Hsinchun
PY - 2007
Y1 - 2007
N2 - Online marketplaces and communication media such as email, web sites, forums, and chat rooms have been ubiquitously integrated into our everyday lives. Unfortunately, the anonymous nature of these channels makes them an ideal avenue for online fraud, hackers, and cybercrime. Anonymity and the sheer volume of online content make cyber identity tracing an essential yet strenuous endeavor for Internet users and human analysts. In order to address these challenges, we propose a framework for online stylometric analysis to assist in distinguishing authorship in online communities based on writing style. Our framework includes the use of a scalable identity-level similarity detection technique coupled with an extensive stylistic feature set and an identity database. The framework is intended to support stylometric authentication for Internet users as well as provide support for forensic investigations. The proposed technique and extended feature set were evaluated on a test bed encompassing thousands of feedback comments posted by 100 electronic market traders. The method outperformed benchmark stylometric techniques with an accuracy of approximately 95% when differentiating between 200 trader identities. The results indicate that the proposed stylometric analysis approach may help mitigate the effects of online anonymity abuse.
UR - http://www.scopus.com/inward/record.url?scp=84870155622&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870155622&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84870155622
SN - 9781604233810
T3 - Association for Information Systems - 13th Americas Conference on Information Systems, AMCIS 2007: Reaching New Heights
SP - 1442
EP - 1451
BT - Association for Information Systems - 13th Americas Conference on Information Systems, AMCIS 2007
T2 - 13th Americas Conference on Information Systems, AMCIS 2007
Y2 - 10 August 2007 through 12 August 2007
ER -