TY - GEN
T1 - Visual supervision in bootstrapped information extraction
AU - Berger, Matthew
AU - Nagesh, Ajay
AU - Levine, Joshua A.
AU - Surdeanu, Mihai
AU - Zhang, Hao Helen
N1 - Publisher Copyright:
© 2018 Association for Computational Linguistics
PY - 2018
Y1 - 2018
N2 - We challenge a common assumption in active learning, that a list-based interface populated by informative samples provides for efficient and effective data annotation. We show how a 2D scatterplot populated with diverse and representative samples can yield improved models given the same time budget. We consider this for bootstrapping-based information extraction, in particular named entity classification, where human and machine jointly label data. To enable effective data annotation in a scatterplot, we have developed an embedding-based bootstrapping model that learns the distributional similarity of entities through the patterns that match them in a large data corpus, while being discriminative with respect to human-labeled and machine-promoted entities. We conducted a user study to assess the effectiveness of these different interfaces, and analyze bootstrapping performance in terms of human labeling accuracy, label quantity, and labeling consensus across multiple users. Our results suggest that supervision acquired from the scatterplot interface, despite being noisier, yields improvements in classification performance compared with the list interface, due to a larger quantity of supervision acquired.
AB - We challenge a common assumption in active learning, that a list-based interface populated by informative samples provides for efficient and effective data annotation. We show how a 2D scatterplot populated with diverse and representative samples can yield improved models given the same time budget. We consider this for bootstrapping-based information extraction, in particular named entity classification, where human and machine jointly label data. To enable effective data annotation in a scatterplot, we have developed an embedding-based bootstrapping model that learns the distributional similarity of entities through the patterns that match them in a large data corpus, while being discriminative with respect to human-labeled and machine-promoted entities. We conducted a user study to assess the effectiveness of these different interfaces, and analyze bootstrapping performance in terms of human labeling accuracy, label quantity, and labeling consensus across multiple users. Our results suggest that supervision acquired from the scatterplot interface, despite being noisier, yields improvements in classification performance compared with the list interface, due to a larger quantity of supervision acquired.
UR - http://www.scopus.com/inward/record.url?scp=85070706291&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070706291&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85070706291
T3 - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
SP - 2043
EP - 2053
BT - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
A2 - Riloff, Ellen
A2 - Chiang, David
A2 - Hockenmaier, Julia
A2 - Tsujii, Jun'ichi
PB - Association for Computational Linguistics
T2 - 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Y2 - 31 October 2018 through 4 November 2018
ER -