TY - JOUR
T1 - Feasibility and Utility of Lexical Analysis for Occupational Health Text
AU - Harber, Philip I
AU - Leroy, Gondy Augusta
N1 - Publisher Copyright: Copyright © 2017 American College of Occupational and Environmental Medicine.
PY - 2017/6/1
Y1 - 2017/6/1
AB - Objective: Assess feasibility and potential utility of natural language processing (NLP) for storing and analyzing occupational health data. Methods: Basic NLP lexical analysis methods were applied to 89,000 Mine Safety and Health Administration (MSHA) free text records. Steps included tokenization, term and co-occurrence counts, term annotation, and identifying exposure-health effect relationships. Presence of terms in the Unified Medical Language System (UMLS) was assessed. Results: The methods efficiently demonstrated common exposures, health effects, and exposure-injury relationships. Many workplace terms are not present in UMLS or map inaccurately. Conclusions: Use of free text rather than narrowly defined numerically coded fields is feasible, flexible, and efficient. It has potential to encourage workers and clinicians to provide more data and to support automated knowledge creation. The lexical method used is easily generalizable to other areas. The UMLS vocabularies should be enhanced to be relevant to occupational health.
UR - http://www.scopus.com/inward/record.url?scp=85020637663&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020637663&partnerID=8YFLogxK
U2 - 10.1097/JOM.0000000000001035
DO - 10.1097/JOM.0000000000001035
M3 - Article
C2 - 28598934
AN - SCOPUS:85020637663
SN - 1076-2752
VL - 59
SP - 578
EP - 587
JO - Journal of Occupational and Environmental Medicine
JF - Journal of Occupational and Environmental Medicine
IS - 6
ER -