Abstract
The advent and rapid proliferation of internet communication have given rise to numerous security issues. The anonymity of online media such as email, websites, and forums makes them an attractive channel for criminal activity. Increased globalization and the borderless nature of the internet have further amplified these concerns by adding a multilingual dimension. The world's social and political climate has drawn a great deal of attention to Arabic. In this study we apply authorship identification techniques to Arabic web forum messages. Our research uses lexical, syntactic, structural, and content-specific writing style features for authorship identification. We address some of the problematic characteristics of Arabic en route to developing an Arabic language model that provides a respectable level of classification accuracy for authorship discrimination. We also run experiments to evaluate the effectiveness of different feature types and classification techniques on our dataset.
Original language | English (US) |
---|---|
Pages (from-to) | 183-197 |
Number of pages | 15 |
Journal | Lecture Notes in Computer Science |
Volume | 3495 |
State | Published - 2005 |
Event | IEEE International Conference on Intelligence and Security Informatics, ISI 2005, Atlanta, GA, United States; Duration: May 19, 2005 → May 20, 2005 |
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science