Unsupervised semantic markup of literature for biodiversity digital libraries

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

This paper reports further development of machine learning techniques for semantic markup of biodiversity literature, especially morphological descriptions of living organisms such as those hosted at efloras.org and algaebase.org. Earlier research explored syntactic parsing and supervised machine learning techniques. The limitations of these techniques prompted our investigation of an unsupervised learning approach that combines the strengths of the earlier techniques while avoiding their limitations. Semantic markup at the organ and character levels is discussed. Research on semantic markup of natural heritage literature has a direct impact on the development of semantic-based access in biodiversity digital libraries.

Original language: English (US)
Title of host publication: JCDL'08
Subtitle of host publication: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008
Pages: 25-28
Number of pages: 4
State: Published - 2008
Event: 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08 - Pittsburgh, PA, United States
Duration: Jun 16 2008 to Jun 20 2008

Publication series

Name: Proceedings of the ACM International Conference on Digital Libraries

Other

Other: 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08
Country/Territory: United States
City: Pittsburgh, PA
Period: 6/16/08 to 6/20/08

Keywords

  • Biodiversity informatics
  • Morphological description
  • Natural heritage literature
  • Semantic annotation
  • Semantic markup
  • Tagging
  • Unsupervised machine learning
  • XML

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
