Diamonds in the rough: Event extraction from imperfect microblog data

Ander Intxaurrondo, Eneko Agirre, Oier Lopez De Lacalle, Mihai Surdeanu

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

10 Scopus citations

Abstract

We introduce a distantly supervised event extraction approach that extracts complex event templates from microblogs. We show that this near real-time data source is more challenging than news because it contains information that is both approximate (e.g., with values that are close to but different from the gold truth) and ambiguous (due to the brevity of the texts), impacting both the evaluation and extraction methods. For the former, we propose a novel, "soft", F1 metric that incorporates similarity between extracted fillers and the gold truth, giving partial credit to different but similar values. With respect to extraction methodology, we propose two extensions to the distant supervision paradigm: to address approximate information, we allow positive training examples to be generated from information that is similar but not identical to gold values; to address ambiguity, we aggregate contexts across tweets discussing the same event. We evaluate our contributions on the complex domain of earthquakes, where events have up to 20 arguments. Our results indicate that, despite their simplicity, our contributions yield a statistically significant improvement of 33% (relative) over a strong distantly supervised system. The dataset containing the knowledge base, relevant tweets and manual annotations is publicly available.
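
The idea of a "soft" F1 that gives partial credit to similar but non-identical fillers can be illustrated with a minimal Python sketch. This is not the metric defined in the paper: the similarity function (one minus the relative difference for numeric values, exact match for everything else), the slot names, and the helper names `filler_similarity` and `soft_f1` are all assumptions made for illustration only.

```python
from typing import Callable, Dict


def filler_similarity(predicted: str, gold: str) -> float:
    """Hypothetical similarity in [0, 1] between two slot fillers.

    Assumption: numeric fillers get 1 minus their relative difference;
    non-numeric fillers get 1.0 on a case-insensitive exact match, else 0.0.
    """
    try:
        p, g = float(predicted), float(gold)
    except ValueError:
        return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0
    if p == g:
        return 1.0
    return max(0.0, 1.0 - abs(p - g) / max(abs(p), abs(g)))


def soft_f1(
    predicted: Dict[str, str],
    gold: Dict[str, str],
    sim: Callable[[str, str], float] = filler_similarity,
) -> float:
    """F1 in which each extracted filler earns partial credit via `sim`."""
    # Sum similarity-weighted credit over slots present in both templates.
    credit = sum(sim(value, gold[slot]) for slot, value in predicted.items() if slot in gold)
    precision = credit / len(predicted) if predicted else 0.0
    recall = credit / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def exact_match(predicted: str, gold: str) -> float:
    """Strict comparison, which recovers the usual 'hard' F1."""
    return 1.0 if predicted == gold else 0.0


# Illustrative (made-up) earthquake template: the extracted magnitude is close
# to the gold value but not identical.
gold_event = {"magnitude": "6.0", "depth": "10", "country": "Chile"}
extracted = {"magnitude": "5.7", "country": "Chile"}

soft_score = soft_f1(extracted, gold_event)
hard_score = soft_f1(extracted, gold_event, sim=exact_match)
```

In this made-up example the strict score counts the near-miss magnitude as an error, while the soft score rewards it almost fully, which is the kind of behaviour the abstract motivates for approximate microblog data.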

Original language: English (US)
Title of host publication: NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 641-650
Number of pages: 10
ISBN (Electronic): 9781941643495
State: Published - 2015
Event: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 - Denver, United States
Duration: May 31 2015 - Jun 5 2015

Publication series

Name: NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Other

Other: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Country/Territory: United States
City: Denver
Period: 5/31/15 - 6/5/15

ASJC Scopus subject areas

  • Computer Science Applications
  • Language and Linguistics
  • Linguistics and Language
