TY - GEN
T1 - Customizing an Information Extraction System to a New Domain
AU - Surdeanu, Mihai
AU - McClosky, David
AU - Smith, Mason R.
AU - Gusev, Andrey
AU - Manning, Christopher D.
N1 - Publisher Copyright: © Proceedings of the Annual Meeting of the Association for Computational Linguistics 2011.
PY - 2011
Y1 - 2011
N2 - We introduce several ideas that improve the performance of supervised information extraction systems with a pipeline architecture, when they are customized for new domains. We show that: (a) a combination of a sequence tagger with a rule-based approach for entity mention extraction yields better performance for both entity and relation mention extraction; (b) improving the identification of syntactic heads of entity mentions helps relation extraction; and (c) a deterministic inference engine captures some of the joint domain structure, even when introduced as a postprocessing step to a pipeline system. All in all, our contributions yield a 20% relative increase in F1 score in a domain significantly different from the domains used during the development of our information extraction system.
UR - http://www.scopus.com/inward/record.url?scp=84944053050&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84944053050&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84944053050
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 2
EP - 10
BT - Workshop on Relational Models of Semantics, RELMS 2011 at the 49th Annual Meeting of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - ACL 2011 Workshop on Relational Models of Semantics, RELMS 2011
Y2 - 23 June 2011
ER -