Spinning straw into gold: Using free text to train monolingual alignment models for non-factoid question answering

Rebecca Sharp, Peter Jansen, Mihai Surdeanu, Peter Clark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

Monolingual alignment models have been shown to boost the performance of question answering systems by "bridging the lexical chasm" between questions and answers. The main limitation of these approaches is that they require semistructured training data in the form of question-answer pairs, which is difficult to obtain in specialized domains or low-resource languages. We propose two inexpensive methods for training alignment models solely using free text, by generating artificial question-answer pairs from discourse structures. Our approach is driven by two representations of discourse: a shallow sequential representation, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and one from the biology domain, and two types of non-factoid questions: manner and reason. We show that these alignment models trained directly from discourse structures imposed on free text improve performance considerably over an information retrieval baseline and a neural network language model trained on the same data.
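
The shallow sequential idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the artificial pairs are built, so everything here is an assumption made for illustration: the naive regex sentence splitter, the function names `split_sentences` and `sequential_pairs`, and the `window` parameter are all hypothetical and are not taken from the paper. The sketch treats each sentence as a pseudo-question and the sentences that follow it as the pseudo-answer.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sequential_pairs(text: str, window: int = 1) -> List[Tuple[str, str]]:
    """Pair each sentence (pseudo-question) with up to `window`
    following sentences joined together (pseudo-answer).

    `window` is a hypothetical parameter, not a setting from the paper.
    """
    sents = split_sentences(text)
    pairs: List[Tuple[str, str]] = []
    for i, sent in enumerate(sents):
        following = sents[i + 1 : i + 1 + window]
        if following:  # the last sentence has nothing after it
            pairs.append((sent, " ".join(following)))
    return pairs


if __name__ == "__main__":
    doc = "Plants need light. They use it to make sugar. Sugar stores energy."
    for question, answer in sequential_pairs(doc):
        print(question, "=>", answer)
```

Pairs produced this way could then be passed to any alignment trainer in place of real question-answer pairs. The deep variant mentioned in the abstract would instead take its pairs from the output of a Rhetorical Structure Theory discourse parser, which this sketch does not attempt.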

Original language: English (US)
Title of host publication: NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 231-237
Number of pages: 7
ISBN (Electronic): 9781941643495
State: Published - 2015
Event: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 - Denver, United States
Duration: May 31 2015 → Jun 5 2015

Publication series

Name: NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Other

Other: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Country/Territory: United States
City: Denver
Period: 5/31/15 → 6/5/15

ASJC Scopus subject areas

  • Computer Science Applications
  • Language and Linguistics
  • Linguistics and Language
