Using sentence-selection heuristics to rank text segments in TXTRACTOR

Daniel McDonald, Hsinchun Chen

Research output: Contribution to conference › Paper › peer-review

40 Scopus citations

Abstract

TXTRACTOR is a tool that uses established sentence-selection heuristics to rank text segments, producing summaries that contain a user-defined number of sentences. Text segments are identified in order to maximize topic diversity, an adaptation of the Maximal Marginal Relevance criterion used by Carbonell and Goldstein [5]. Sentence-selection heuristics are then used to rank the segments. We hypothesize that ranking text segments via traditional sentence-selection heuristics produces a balanced summary with more useful information than one produced by segmentation alone. The summary is created in a three-step process: 1) sentence evaluation, 2) segment identification, and 3) segment ranking. As the required length of the summary changes, low-ranking segments can be dropped from (or higher-ranking segments added to) the summary. To validate the approach, we compared the output of TXTRACTOR to the output of a segmentation tool based on the TextTiling algorithm.
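
The three-step process described in the abstract could be sketched roughly as follows. This is an illustrative toy, not the TXTRACTOR implementation: the abstract does not name the heuristics used, so the scoring rule here (average word frequency plus a bonus for the opening sentence) is an assumption, as are the function names `score_sentences`, `rank_segments`, and `summarize`. Segment boundaries are assumed to be supplied by the caller, and the sketch takes one sentence per segment, so the summary length cannot exceed the number of segments.

```python
from collections import Counter
import re


def tokenize(sentence):
    # Lowercase alphabetic tokens only; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+", sentence.lower())


def score_sentences(sentences):
    # Step 1, sentence evaluation (assumed heuristic, not from the paper):
    # average document-wide frequency of the sentence's words, plus a
    # fixed bonus for the first sentence of the document.
    freq = Counter(w for s in sentences for w in tokenize(s))
    scores = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        tf = sum(freq[w] for w in words) / len(words) if words else 0.0
        bonus = 1.0 if i == 0 else 0.0
        scores.append(tf + bonus)
    return scores


def rank_segments(scores, boundaries):
    # Steps 2 and 3: `boundaries` is a list of (start, end) sentence-index
    # pairs (end exclusive) assumed to come from a separate segmenter.
    # Each segment is ranked by its best-scoring sentence.
    ranked = []
    for start, end in boundaries:
        best = max(range(start, end), key=lambda i: scores[i])
        ranked.append((scores[best], best))
    ranked.sort(reverse=True)
    return ranked


def summarize(sentences, boundaries, n):
    # Keep the top-n segments, one sentence each, in document order.
    # Shrinking n drops the lowest-ranking segments first.
    scores = score_sentences(sentences)
    ranked = rank_segments(scores, boundaries)
    chosen = sorted(idx for _, idx in ranked[:n])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = ["Cats purr.", "Cats sleep a lot.", "Dogs bark.", "Dogs fetch sticks."]
    segments = [(0, 2), (2, 4)]
    print(summarize(doc, segments, 2))
```

Because each selected sentence comes from a different segment, the summary covers distinct topics before it repeats one, which is the diversity goal the abstract attributes to segmentation.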

Original language: English (US)
Pages: 28-35
Number of pages: 8
State: Published - 2002
Event: Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries - Portland, OR, United States
Duration: Jul 14 2002 → Jul 18 2002

Other

Other: Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries
Country/Territory: United States
City: Portland, OR
Period: 7/14/02 → 7/18/02

Keywords

  • Information retrieval
  • Text extraction
  • Text segmentation
  • Text summarization

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
