TY - GEN
T1 - Cross-topic authorship attribution
T2 - 25th International Conference on Computational Linguistics, COLING 2014
AU - Sapkota, Upendra
AU - Solorio, Thamar
AU - Montes-y-Gómez, Manuel
AU - Bethard, Steven
AU - Rosso, Paolo
PY - 2014
Y1 - 2014
N2 - Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. But in real scenarios, this assumption is too strong. The goal of this study is to improve the prediction results in cross-topic AA (CTAA), where the training data comes from one topic but the test data comes from another. Our proposed idea is to build a predictive model for one topic using documents from all other available topics. In addition to improving the performance of CTAA, we also make a thorough analysis of the sensitivity to changes in topic of four of the most commonly used feature types in AA. We empirically illustrate that our proposed framework is significantly better than the one trained on a single out-of-domain topic and is as effective, in some cases, as the same-topic setting.
AB - Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. But in real scenarios, this assumption is too strong. The goal of this study is to improve the prediction results in cross-topic AA (CTAA), where the training data comes from one topic but the test data comes from another. Our proposed idea is to build a predictive model for one topic using documents from all other available topics. In addition to improving the performance of CTAA, we also make a thorough analysis of the sensitivity to changes in topic of four of the most commonly used feature types in AA. We empirically illustrate that our proposed framework is significantly better than the one trained on a single out-of-domain topic and is as effective, in some cases, as the same-topic setting.
UR - http://www.scopus.com/inward/record.url?scp=84959911457&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959911457&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84959911457
T3 - COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers
SP - 1228
EP - 1237
BT - COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014
PB - Association for Computational Linguistics, ACL Anthology
Y2 - 23 August 2014 through 29 August 2014
ER -