Topic model methods for automatically identifying out-of-scope resources

Steven Bethard, Soumya Ghosh, James H. Martin, Tamara Sumner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

Recent years have seen the rise of subject-themed digital libraries, such as the NSDL pathways and the Digital Library for Earth System Education (DLESE). These libraries often need to manually verify that contributed resources cover topics that fit within the theme of the library. We show that such scope judgments can be automated using a combination of text classification techniques and topic modeling. Our models address two significant challenges in making scope judgments: only a small number of out-of-scope resources are typically available, and the topic distinctions required for digital libraries are much more subtle than classic text classification problems. To meet these challenges, our models combine support vector machine learners optimized to different performance metrics and semantic topics induced by unsupervised statistical topic models. Our best model is able to distinguish resources that belong in DLESE from resources that don't with an accuracy of around 70%. We see these models as the first steps towards increasing the scalability of digital libraries and dramatically reducing the workload required to maintain them.

Original language: English (US)
Title of host publication: JCDL'09 - Proceedings of the 2009 ACM/IEEE Joint Conference on Digital Libraries
Pages: 19-28
Number of pages: 10
State: Published - 2009
Externally published: Yes
Event: 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09 - Austin, TX, United States
Duration: Jun 15 2009 → Jun 19 2009

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09
Country/Territory: United States
City: Austin, TX
Period: 6/15/09 → 6/19/09

Keywords

  • Digital libraries
  • Machine learning
  • Relevance
  • Scope
  • Topics

ASJC Scopus subject areas

  • General Engineering
