TY - JOUR
T1 - A natural language processing pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants
AU - Endara, Lorena
AU - Burleigh, J. Gordon
AU - Laporte, Marie Angélique
AU - Cooper, Laurel
AU - Jaiswal, Pankaj
AU - Cui, Hong
N1 - Funding Information:
Nathalie Nagalingum (California Academy of Sciences) and Eric Schuettpeltz (Smithsonian Institution) provided taxonomic descriptions and guided the sampling strategy; Annika Smith (Florida Museum of Natural History) contributed term definitions. This work was supported by the National Science Foundation NSF-Building a Comprehensive Evolutionary History of Flagellate Plants (DEB-1541506), and NSF-Exploring Taxon Concepts (ETC) through Analyzing Fine-Grained Semantic Markup of Descriptive Literature (DBI-1147266). Funding for the Planteome project is provided by the National Science Foundation award IOS-1340112.
Publisher Copyright:
© 2018 CEUR-WS.
PY - 2018
Y1 - 2018
N2 - Assembling large-scale phenotypic datasets for evolutionary and biodiversity studies of plants can be extremely difficult and time-consuming. New semi-automated Natural Language Processing (NLP) pipelines can extract phenotypic data from taxonomic descriptions, and their performance can be enhanced by incorporating information from ontologies such as the Plant Ontology (PO) and the Plant Trait Ontology (TO). These ontologies are powerful tools for comparing phenotypes across taxa for large-scale evolutionary and ecological analyses, but they are largely focused on terms associated with flowering plants. We describe a bottom-up approach to identify terms from flagellate plants (including bryophytes, lycophytes, ferns, and gymnosperms) that can be added to existing plant ontologies. We first parsed a large corpus of electronic taxonomic descriptions using the Explorer of Taxon Concepts tool (http://taxonconceptexplorer.org/) and identified flagellate plant-specific terms that were missing from the existing ontologies. We extracted new structure and trait terms, and we are currently incorporating the missing structure terms into the PO and modifying the definitions of existing terms to expand their coverage to flagellate plants. We will incorporate trait terms into the TO in the near future.
AB - Assembling large-scale phenotypic datasets for evolutionary and biodiversity studies of plants can be extremely difficult and time-consuming. New semi-automated Natural Language Processing (NLP) pipelines can extract phenotypic data from taxonomic descriptions, and their performance can be enhanced by incorporating information from ontologies such as the Plant Ontology (PO) and the Plant Trait Ontology (TO). These ontologies are powerful tools for comparing phenotypes across taxa for large-scale evolutionary and ecological analyses, but they are largely focused on terms associated with flowering plants. We describe a bottom-up approach to identify terms from flagellate plants (including bryophytes, lycophytes, ferns, and gymnosperms) that can be added to existing plant ontologies. We first parsed a large corpus of electronic taxonomic descriptions using the Explorer of Taxon Concepts tool (http://taxonconceptexplorer.org/) and identified flagellate plant-specific terms that were missing from the existing ontologies. We extracted new structure and trait terms, and we are currently incorporating the missing structure terms into the PO and modifying the definitions of existing terms to expand their coverage to flagellate plants. We will incorporate trait terms into the TO in the near future.
KW - Flagellate plants
KW - Matrices
KW - Natural language processing
KW - Phenotypic traits
KW - Phylogeny
KW - Plant ontology
KW - Plant trait ontology
KW - Taxonomic descriptions
UR - http://www.scopus.com/inward/record.url?scp=85059844191&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059844191&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85059844191
SN - 1613-0073
VL - 2285
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 9th International Conference on Biological Ontology, ICBO 2018
Y2 - 7 August 2018 through 10 August 2018
ER -