Abstract
Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most rely on features of the ambiguous word and its surrounding words and are trained on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. The knowledge base consists of the UMLS semantic types assigned to concepts found in the sentence and the relationships between these semantic types. A naïve Bayes classifier was trained for 15 words with 100 examples each. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in eight experimental conditions. Performance was measured as accuracy under 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. In that condition, accuracy was on average 10% higher than the baseline; however, the change ranged from an 8% deterioration to a 29% improvement. In a follow-up evaluation, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators.
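The evaluation setup described above (a naïve Bayes classifier over semantic-type features, compared with a most-frequent-sense baseline under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the add-one smoothing, the interleaved fold assignment, the function names, and the toy examples for the word "cold" are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (semantic_types, sense) pairs."""
    sense_counts = Counter(sense for _, sense in examples)
    feat_counts = defaultdict(Counter)   # sense -> semantic type -> count
    vocab = set()
    for feats, sense in examples:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab

def predict_nb(model, feats):
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        # Log prior plus add-one-smoothed log likelihoods (smoothing is an assumption).
        score = math.log(sense_counts[sense] / total)
        denom = sum(feat_counts[sense].values()) + len(vocab)
        for f in feats:
            if f in vocab:  # ignore semantic types never seen in training
                score += math.log((feat_counts[sense][f] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best

def cross_validate(examples, k=10):
    """Return (naive Bayes accuracy, most-frequent-sense baseline accuracy)."""
    correct_nb = correct_base = 0
    for i in range(k):
        test = examples[i::k]  # every k-th example forms one fold
        train = [ex for j, ex in enumerate(examples) if j % k != i]
        model = train_nb(train)
        majority = model[0].most_common(1)[0][0]
        for feats, sense in test:
            correct_nb += predict_nb(model, feats) == sense
            correct_base += majority == sense
    return correct_nb / len(examples), correct_base / len(examples)

# Hypothetical toy data: semantic types found in sentences containing "cold".
toy = ([(["Sign or Symptom", "Pharmacologic Substance"], "disease")] * 12
       + [(["Natural Phenomenon or Process"], "temperature")] * 8)

nb_acc, base_acc = cross_validate(toy, k=10)
print(f"naive Bayes: {nb_acc:.2f}  baseline: {base_acc:.2f}")
```

The toy data are perfectly separable, so the sketch only demonstrates the mechanics of the comparison; it says nothing about the accuracies reported in the study.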
Original language | English (US) |
---|---|
Pages (from-to) | 381-385 |
Number of pages | 5 |
Journal | Studies in health technology and informatics |
Volume | 107 |
State | Published - 2004 |
Externally published | Yes |
Keywords
- Artificial intelligence
- UMLS
- Unified Medical Language System
- machine learning
- naïve Bayes
- small datasets
- symbolic knowledge
- word sense disambiguation
ASJC Scopus subject areas
- Biomedical Engineering
- Health Informatics
- Health Information Management