Abstract
Most information retrieval systems use stopword lists and stemming algorithms. However, we have found that recognizing singular and plural nouns, verb forms, negation, and prepositions can produce dramatically different text classification results. We present results from text classification experiments that compare relevancy signatures, which use local linguistic context, with corresponding indexing terms that do not. In two different domains, relevancy signatures produced better results than the simple indexing terms. These experiments suggest that stopword lists and stemming algorithms may remove or conflate many words that could be used to create more effective indexing terms.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 130-136 |
| Number of pages | 7 |
| Journal | SIGIR Forum (ACM Special Interest Group on Information Retrieval) |
| State | Published - 1995 |
| Externally published | Yes |
| Event | 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1995, Seattle, WA, USA. Duration: Jul 9 1995 → Jul 13 1995 |
ASJC Scopus subject areas
- Management Information Systems
- Hardware and Architecture
Title
Little words can make a big difference for text classification