Feature-rich two-stage logistic regression for monolingual alignment

Md Arafat Sultan, Steven Bethard, Tamara Sumner

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

12 Scopus citations

Abstract

Monolingual alignment is the task of pairing semantically similar units from two pieces of text. We report a top-performing supervised aligner that operates on short text snippets. We employ a large feature set to (1) encode similarities among semantic units (words and named entities) in context, and (2) address cooperation and competition for alignment among units in the same snippet. These features are deployed in a two-stage logistic regression framework for alignment. On two benchmark data sets, our aligner achieves F1 scores of 92.1% and 88.5%, with statistically significant error reductions of 4.8% and 7.3% over the previous best aligner. It produces top results in extrinsic evaluation as well.
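The two-stage idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the single similarity feature, the "strongest rival" competition feature, the toy data, and all function names (`train_logreg`, `stage2_features`, and so on) are assumptions made for this example. Stage 1 scores each candidate pair from its similarity alone; stage 2 re-scores each pair using the stage-1 probabilities of other candidate pairs that share a unit with it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability of alignment for feature vector x; the bias is the last weight."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (no regularization)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def stage2_features(pairs, sims, p1):
    """Hypothetical competition feature: the highest stage-1 probability
    among other candidate pairs that share a unit with this pair."""
    feats = []
    for k, (i, j) in enumerate(pairs):
        rivals = [p1[m] for m, (a, b) in enumerate(pairs)
                  if m != k and (a == i or b == j)]
        feats.append([sims[k], p1[k], max(rivals, default=0.0)])
    return feats

# Invented toy data: candidate pairs as (unit index in snippet 1, unit index
# in snippet 2), one similarity score per pair, and gold alignment labels.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 1)]
sims = [0.9, 0.3, 0.2, 0.8, 0.7, 0.6]
gold = [1, 0, 0, 1, 1, 0]

# Stage 1: score each pair independently from its similarity feature.
w1 = train_logreg([[s] for s in sims], gold)
p1 = [predict(w1, [s]) for s in sims]

# Stage 2: re-score each pair with features derived from stage-1 outputs.
X2 = stage2_features(pairs, sims, p1)
w2 = train_logreg(X2, gold)
p2 = [predict(w2, x) for x in X2]

aligned = [pair for pair, p in zip(pairs, p2) if p >= 0.5]
print(aligned)
```

Training and predicting on the same six pairs is only for brevity; a real evaluation would use held-out data, and the feature set described in the abstract is far larger than the two features used here.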

Original language: English (US)
Title of host publication: Conference Proceedings - EMNLP 2015
Subtitle of host publication: Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics (ACL)
Pages: 949-959
Number of pages: 11
ISBN (Electronic): 9781941643327
State: Published - 2015
Externally published: Yes
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Lisbon, Portugal
Duration: Sep 17 2015 – Sep 21 2015

Publication series

Name: Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing

Conference

Conference: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015
Country/Territory: Portugal
City: Lisbon
Period: 9/17/15 – 9/21/15

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
