Unsupervised extraction of text segments from heterogeneous document collections

Research output: Contribution to journal › Article › peer-review

This paper describes a simple, unsupervised bootstrapping procedure that identifies morphological description segments in heterogeneous biodiversity document collections. Although our project uses the procedure to preprocess biodiversity literature for semantic annotation of morphological descriptions, it can also be used to crawl the Web for morphological descriptions to feed a biodiversity niche search engine.
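The abstract does not give the details of the procedure, so the following is only a minimal sketch of the general idea of unsupervised bootstrapping for segment identification, not the authors' method. It assumes that a segment is a short text passage, that a handful of seed terms (here, hypothetical plant-morphology words) is available, and that a segment counts as a morphological description when enough of its tokens fall in the current vocabulary. The names `bootstrap`, `threshold`, `min_count`, and `max_iter`, as well as the vocabulary-growth rule, are all illustrative assumptions.

```python
import re
from collections import Counter

# Lowercase alphabetic tokens only; digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def score(tokens, vocab):
    """Fraction of a segment's tokens that are in the current vocabulary."""
    if not tokens:
        return 0.0
    return sum(t in vocab for t in tokens) / len(tokens)


def select(tokenized, vocab, threshold):
    """Indices of segments whose vocabulary overlap reaches the threshold."""
    return {i for i, toks in enumerate(tokenized) if score(toks, vocab) >= threshold}


def bootstrap(segments, seed_terms, threshold=0.3, min_count=2, max_iter=5):
    """Hypothetical bootstrapping loop (an illustration, not the paper's method).

    Each round labels segments with the current vocabulary, then adds terms
    that occur at least `min_count` times in labeled segments and never in
    unlabeled ones. Stops when no new terms appear or after `max_iter` rounds.
    """
    vocab = set(seed_terms)
    tokenized = [tokenize(s) for s in segments]
    for _ in range(max_iter):
        selected = select(tokenized, vocab, threshold)
        inside = Counter(t for i in selected for t in tokenized[i])
        outside = Counter(
            t for i, toks in enumerate(tokenized) if i not in selected for t in toks
        )
        new_terms = {
            t for t, c in inside.items() if c >= min_count and outside[t] == 0
        } - vocab
        if not new_terms:
            break
        vocab |= new_terms
    # Final labeling with the fully expanded vocabulary.
    return sorted(select(tokenized, vocab, threshold)), vocab


if __name__ == "__main__":
    segments = [
        "Leaves ovate, 3-5 cm long; petals white, obovate.",
        "Petals yellow, obovate; sepals green, ovate.",
        "Sepals green; leaves lanceolate, glabrous.",
        "The specimen was collected in 1998 near the river.",
        "Funding was provided by the national science foundation.",
    ]
    seed = {"leaves", "petals", "sepals", "ovate"}
    found, vocab = bootstrap(segments, seed)
    print(found)                 # [0, 1, 2]
    print(sorted(vocab - seed))  # ['green', 'obovate']
```

Requiring a learned term to be entirely absent from unlabeled segments keeps this toy example easy to follow. A realistic heterogeneous collection would probably need a softer criterion, such as a frequency ratio between labeled and unlabeled segments; that choice is likewise an assumption here.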

Original language: English (US)
Journal: Proceedings of the ASIST Annual Meeting
State: Published - Nov 2010

Keywords

  • Biodiversity document collections
  • Morphological description
  • Segment information retrieval
  • Semantic annotation
  • Unsupervised machine learning

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
