Cross-topic authorship attribution: Will out-of-topic data help?

Upendra Sapkota, Thamar Solorio, Manuel Montes-y-Gómez, Steven Bethard, Paolo Rosso

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Scopus citations

Abstract

Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. But in real scenarios, this assumption is too strong. The goal of this study is to improve the prediction results in cross-topic AA (CTAA), where the training data comes from one topic but the test data comes from another. Our proposed idea is to build a predictive model for one topic using documents from all other available topics. In addition to improving the performance of CTAA, we also make a thorough analysis of the sensitivity to changes in topic of the four most commonly used feature types in AA. We empirically illustrate that our proposed framework is significantly better than the one trained on a single out-of-domain topic and is as effective, in some cases, as the same-topic setting.
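
The setup described in the abstract, training a model for one topic on documents from all other available topics, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `(author, topic, text)` tuple layout, the function names, and the character 3-gram profile classifier are all assumptions chosen to keep the example self-contained. The abstract does not name the four feature types or the learning algorithm studied.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (an assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cross_topic_splits(docs):
    """Yield (held_out_topic, train, test) once per topic.

    docs is a list of (author, topic, text) tuples (hypothetical layout).
    train holds every document NOT on the held-out topic; test holds
    the documents on the held-out topic.
    """
    topics = sorted({topic for _, topic, _ in docs})
    for held_out in topics:
        train = [d for d in docs if d[1] != held_out]
        test = [d for d in docs if d[1] == held_out]
        yield held_out, train, test


def train_profiles(train):
    """Sum n-gram counts per author over all training documents."""
    profiles = defaultdict(Counter)
    for author, _, text in train:
        profiles[author].update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the author whose normalized profile overlaps most with text."""
    grams = char_ngrams(text)

    def overlap(profile):
        total = sum(profile.values()) or 1
        return sum(count * profile[g] / total for g, count in grams.items())

    # Iterate authors in sorted order so ties resolve deterministically.
    return max(sorted(profiles), key=lambda author: overlap(profiles[author]))
```

For each held-out topic, one would call `train_profiles(train)` and then `predict` on every test document, averaging accuracy over topics. A single out-of-topic baseline, as contrasted in the abstract, would instead restrict `train` to documents from just one other topic.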

Original language: English (US)
Title of host publication: COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014
Subtitle of host publication: Technical Papers
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 1228-1237
Number of pages: 10
ISBN (Electronic): 9781941643266
State: Published - 2014
Externally published: Yes
Event: 25th International Conference on Computational Linguistics, COLING 2014 - Dublin, Ireland
Duration: Aug 23 2014 → Aug 29 2014

Publication series

Name: COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers

Conference

Conference: 25th International Conference on Computational Linguistics, COLING 2014
Country/Territory: Ireland
City: Dublin
Period: 8/23/14 → 8/29/14

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
