Domain adaptation for authorship attribution: Improved structural correspondence learning

Upendra Sapkota, Thamar Solorio, Manuel Montes-y-Gómez, Steven Bethard

Research output: Chapter in Book/Report/Conference proceedingConference contribution

14 Scopus citations

Abstract

We present the first domain adaptation model for authorship attribution to leverage unlabeled data. The model includes extensions to structural correspondence learning needed to make it appropriate for the task. For example, we propose a median-based classification instead of the standard binary classification used in previous work. Our results show that punctuation-based character n-grams form excellent pivot features. We also show how singular value decomposition plays a critical role in achieving domain adaptation, and that replacing (instead of concatenating) non-pivot features with correspondence features yields better performance.

Original languageEnglish (US)
Title of host publication54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
PublisherAssociation for Computational Linguistics (ACL)
Pages2226-2235
Number of pages10
ISBN (Electronic)9781510827585
DOIs
StatePublished - 2016
Externally publishedYes
Event54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: Aug 7 2016Aug 12 2016

Publication series

Name54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
Volume4

Conference

Conference54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Country/TerritoryGermany
CityBerlin
Period8/7/168/12/16

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Domain adaptation for authorship attribution: Improved structural correspondence learning'. Together they form a unique fingerprint.

Cite this