Feasibility and Utility of Lexical Analysis for Occupational Health Text

Philip I Harber, Gondy Augusta Leroy

Research output: Contribution to journal › Article › peer-review

6 Scopus citations


Objective: Assess feasibility and potential utility of natural language processing (NLP) for storing and analyzing occupational health data. Methods: Basic NLP lexical analysis methods were applied to 89,000 Mine Safety and Health Administration (MSHA) free text records. Steps included tokenization, term and co-occurrence counts, term annotation, and identifying exposure-health effect relationships. Presence of terms in the Unified Medical Language System (UMLS) was assessed. Results: The methods efficiently demonstrated common exposures, health effects, and exposure-injury relationships. Many workplace terms are not present in UMLS or map inaccurately. Conclusions: Use of free text rather than narrowly defined numerically coded fields is feasible, flexible, and efficient. It has potential to encourage workers and clinicians to provide more data and to support automated knowledge creation. The lexical method used is easily generalizable to other areas. The UMLS vocabularies should be enhanced to be relevant to occupational health.
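The first lexical steps named in the Methods (tokenization, term counts, and co-occurrence counts) can be illustrated with a minimal sketch. This is not the authors' implementation: the sample narratives, the stopword list, and the function names `tokenize`, `count_terms`, and `count_cooccurrences` are all hypothetical, and co-occurrence is assumed here to mean two terms appearing in the same record. Term annotation and UMLS lookup are not shown.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical narratives for illustration only; these are not MSHA records.
records = [
    "Employee slipped on wet floor and strained lower back.",
    "Miner inhaled dust while drilling and reported cough.",
    "Employee strained back lifting cable.",
]

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"and", "on", "the", "a", "while", "of", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def count_terms(docs):
    """Count how often each term occurs across all records."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def count_cooccurrences(docs):
    """Count, for each unordered term pair, the records containing both terms."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))  # unique terms, in a stable order
        pairs.update(combinations(terms, 2))
    return pairs


term_counts = count_terms(records)
pair_counts = count_cooccurrences(records)

print(term_counts.most_common(3))
print(pair_counts[("back", "strained")])
```

Pairs such as ("back", "strained") that recur across records are the kind of signal that could then be reviewed as candidate exposure-health effect relationships.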

Original language: English (US)
Pages (from-to): 578-587
Number of pages: 10
Journal: Journal of Occupational and Environmental Medicine
Issue number: 6
State: Published - Jun 1 2017

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health

