Customizing an Information Extraction System to a New Domain

Mihai Surdeanu, David McClosky, Mason R. Smith, Andrey Gusev, Christopher D. Manning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Scopus citations

Abstract

We introduce several ideas that improve the performance of supervised information extraction systems with a pipeline architecture, when they are customized for new domains. We show that: (a) a combination of a sequence tagger with a rule-based approach for entity mention extraction yields better performance for both entity and relation mention extraction; (b) improving the identification of syntactic heads of entity mentions helps relation extraction; and (c) a deterministic inference engine captures some of the joint domain structure, even when introduced as a postprocessing step to a pipeline system. All in all, our contributions yield a 20% relative increase in F1 score in a domain significantly different from the domains used during the development of our information extraction system.

Original language: English (US)
Title of host publication: Workshop on Relational Models of Semantics, RELMS 2011 at the 49th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, ACL-HLT 2011 - Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 2-10
Number of pages: 9
ISBN (Electronic): 9781932432985
State: Published - 2011
Externally published: Yes
Event: ACL 2011 Workshop on Relational Models of Semantics, RELMS 2011 - Portland, United States
Duration: Jun 23 2011 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: ACL 2011 Workshop on Relational Models of Semantics, RELMS 2011
Country/Territory: United States
City: Portland
Period: 6/23/11 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
