Abstract
TXTRACTOR is a tool that uses established sentence-selection heuristics to rank text segments, producing summaries that contain a user-defined number of sentences. Text segments are identified in order to maximize topic diversity, an adaptation of the Maximal Marginal Relevance criterion used by Carbonell and Goldstein [5]. Sentence-selection heuristics are then used to rank the segments. We hypothesize that ranking text segments via traditional sentence-selection heuristics produces a balanced summary with more useful information than one produced by segmentation alone. The summary is created in a three-step process: 1) sentence evaluation, 2) segment identification, and 3) segment ranking. As the required length of the summary changes, low-ranking segments can be dropped from the summary or higher-ranking segments added to it. To validate the approach, we compared the output of TXTRACTOR with the output of a segmentation tool based on the TextTiling algorithm.
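The three-step process described in the abstract can be illustrated with a minimal Python sketch. It is not TXTRACTOR's implementation: the abstract does not specify which heuristics are used, so the scoring rule here (mean term frequency plus a bonus for the opening sentence), the function names `score_sentences`, `rank_segments`, and `summarize`, and the choice of taking one top sentence per selected segment are all assumptions made for illustration. Segment identification is assumed to have been done already, for example by a TextTiling-style segmenter, and is passed in as lists of sentence indices.

```python
import re
from collections import Counter


def score_sentences(sentences):
    """Step 1 (assumed heuristic): mean document-level term frequency of a
    sentence's words, plus a small bonus for the first sentence."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    freq = Counter(word for tokens in tokenized for word in tokens)
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0
        scores.append(tf + (1.0 if i == 0 else 0.0))
    return scores


def rank_segments(segments, scores):
    """Step 3: order segment indices by the mean score of their sentences,
    highest first. Each segment is a non-empty list of sentence indices."""
    def mean_score(segment):
        return sum(scores[i] for i in segment) / len(segment)

    return sorted(range(len(segments)),
                  key=lambda k: mean_score(segments[k]),
                  reverse=True)


def summarize(sentences, segments, n):
    """Build a summary of at most n sentences: take the best sentence from
    each of the n top-ranked segments (an assumed selection rule), then
    restore document order."""
    scores = score_sentences(sentences)
    ranked = rank_segments(segments, scores)
    chosen = [max(segments[k], key=lambda i: scores[i]) for k in ranked[:n]]
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    sentences = [
        "Cats chase mice.",
        "Cats sleep a lot.",
        "Rockets burn fuel.",
        "Rockets reach orbit.",
    ]
    # Step 2 is assumed to be done already: two topical segments.
    segments = [[0, 1], [2, 3]]
    print(summarize(sentences, segments, 2))
```

Shrinking `n` drops the lowest-ranked segments from the summary and growing it adds the next-ranked ones, which mirrors the length adjustment described above. Under this selection rule, a request for more sentences than there are segments returns one sentence per segment.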
Original language | English (US) |
---|---|
Pages | 28-35 |
Number of pages | 8 |
State | Published - 2002 |
Event | Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, Portland, OR, United States (Jul 14 2002 → Jul 18 2002) |
Other
Other | Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries |
---|---|
Country/Territory | United States |
City | Portland, OR |
Period | 7/14/02 → 7/18/02 |
Keywords
- Information retrieval
- Text extraction
- Text segmentation
- Text summarization
ASJC Scopus subject areas
- Software
- Information Systems
- Computer Science Applications
- Library and Information Sciences