Building the “Plant glossary”—a controlled botanical vocabulary using terms extracted from the floras of North America and China

Lorena Endara, Heather A. Cole, J. Gordon Burleigh, Nathalie S. Nagalingum, James A. Macklin, Jing Liu, Sonali Ranade, Hong Cui

Research output: Contribution to journal › Article › peer-review

11 Scopus citations


Taxonomic descriptions contain valuable phenotypic data that are often not directly accessible for modern evolutionary, ecological, or biodiversity analyses. We describe a process for building a consensus-based controlled vocabulary from taxonomic descriptions of plants, which can also be applied to building controlled vocabularies for other taxon groups. Controlled vocabularies are useful as lexicons for text mining algorithms, as a source of candidate terms for ontologies, and as guides to help future authors use domain vocabulary more appropriately and consistently. We extracted phenotype-describing terms from descriptions in 30 volumes of the Flora of North America and Flora of China and merged these with terms from the Categorical Glossary of the Flora of North America. Seven contributors placed the terms into a set of categories until two or more categorizations agreed for each term. Term categorization makes the meaning of a term more explicit for subsequent users of the glossary. The resulting “Plant Glossary” (terms and their categorization) contains 9228 terms grouped in 53 categories. Differences in term categorization represented 49% of the categorization effort, and the many differences among individual classifications can be attributed to individual interpretation of terms and to the fluid nature of descriptive language used in Floras. The difficulties experienced while classifying the terms allowed us to explore cases where the use of language can hinder the accurate and detailed annotation of taxonomic descriptions. The Plant Glossary represents a significant step towards creating and enriching formal ontologies for plant phenotypes, as the semantic phenomena found through this exercise are useful background information for building ontologies. The glossary has been used by new software to parse and annotate plant taxonomic descriptions, and over 6000 new terms are available for creating ontologies.
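The consensus rule described in the abstract (a term is settled once two or more categorizations agree) could be sketched roughly as follows. This is an illustrative reading only, not the authors' software: the function name `first_consensus`, the `min_agreement` parameter, and the example category labels are all assumptions introduced here.

```python
from collections import Counter


def first_consensus(assignments, min_agreement=2):
    """Scan categorizations in the order they were submitted.

    Returns (category, n_used), where category is the first one to be
    chosen `min_agreement` times and n_used is how many categorizations
    were needed. Returns (None, len(assignments)) if no category reaches
    the threshold. Hypothetical helper, not taken from the paper.
    """
    counts = Counter()
    for n_used, category in enumerate(assignments, start=1):
        counts[category] += 1
        if counts[category] >= min_agreement:
            return category, n_used
    return None, len(assignments)


# Hypothetical terms and category labels, for illustration only.
submissions = {
    "ovate": ["shape", "shape"],
    "glaucous": ["coating", "coloration", "coating"],
    "stout": ["size", "shape"],
}

for term, cats in submissions.items():
    category, n_used = first_consensus(cats)
    status = category if category is not None else "unresolved"
    print(f"{term}: {status} after {n_used} categorization(s)")
```

Under this reading, a term on which the first two contributors disagree simply stays open until a later categorization matches an earlier one, which is one way disagreement could come to account for a large share of the overall effort.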

Original language: English (US)
Pages (from-to): 953-966
Number of pages: 14
Issue number: 4
State: Published - 2017


Keywords

  • Controlled vocabulary
  • Phenotypic traits
  • Plant glossary
  • Semantics
  • Taxonomic descriptions

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Plant Science

