TY - GEN
T1 - Detection of Puffery on the English Wikipedia
AU - Bertsch, Amanda
AU - Bethard, Steven
N1 - Publisher Copyright: © 2021 Association for Computational Linguistics.
PY - 2021
Y1 - 2021
AB - On Wikipedia, an online crowdsourced encyclopedia, volunteers enforce the encyclopedia’s editorial policies. Wikipedia’s policy on maintaining a neutral point of view has inspired recent research on bias detection, including “weasel words” and “hedges”. Yet to date, little work has been done on identifying “puffery,” phrases that are overly positive without a verifiable source. We demonstrate that collecting training data for this task requires some care, and construct a dataset by combining Wikipedia editorial annotations and information retrieval techniques. We compare several approaches to predicting puffery, and achieve 0.963 f1 score by incorporating citation features into a RoBERTa model. Finally, we demonstrate how to integrate our model with Wikipedia’s public infrastructure to give back to the Wikipedia editor community.
UR - https://www.scopus.com/pages/publications/85138782394
U2 - 10.18653/v1/2021.wnut-1.36
DO - 10.18653/v1/2021.wnut-1.36
M3 - Conference contribution
AN - SCOPUS:85138782394
T3 - W-NUT 2021 - 7th Workshop on Noisy User-Generated Text, Proceedings of the Conference
SP - 329
EP - 333
BT - W-NUT 2021 - 7th Workshop on Noisy User-Generated Text, Proceedings of the Conference
A2 - Xu, Wei
A2 - Ritter, Alan
A2 - Baldwin, Tim
A2 - Rahimi, Afshin
PB - Association for Computational Linguistics (ACL)
T2 - 7th Workshop on Noisy User-Generated Text, W-NUT 2021
Y2 - 11 November 2021
ER -