TY - JOUR
T1 - Transparent deep learning to identify autism spectrum disorders (ASD) in EHR using clinical notes
AU - Leroy, Gondy
AU - Andrews, Jennifer G.
AU - Kealohi-Preece, Madison
AU - Jaswani, Ajay
AU - Song, Hyunju
AU - Galindo, Maureen Kelly
AU - Rice, Sydney A.
N1 - Publisher Copyright: © 2024 The Author(s). Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2024/6/1
Y1 - 2024/6/1
N2 - Objective: Machine learning (ML) is increasingly employed to diagnose medical conditions, with algorithms trained to assign a single label using a black-box approach. We created an ML approach using deep learning that generates outcomes that are transparent and in line with clinical diagnostic rules. We demonstrate our approach for autism spectrum disorders (ASD), a neurodevelopmental condition with increasing prevalence. Methods: We use unstructured data from the Centers for Disease Control and Prevention (CDC) surveillance records labeled by a CDC-trained clinician with ASD A1-3 and B1-4 criterion labels per sentence and with ASD case labels per record using Diagnostic and Statistical Manual of Mental Disorders (DSM-5) rules. One rule-based algorithm, three deep ML algorithms, and six ensembles were compared and evaluated using a test set with 6773 sentences (N = 35 cases) set aside in advance. Criterion and case labeling were evaluated for each ML algorithm and ensemble. Case labeling outcomes were also compared with seven traditional tests. Results: Performance for criterion labeling was highest for the hybrid BiLSTM ML model. The best case labeling was achieved by an ensemble of two BiLSTM ML models using a majority vote. It achieved 100% precision (or PPV), 83% recall (or sensitivity), 100% specificity, 91% accuracy, and 0.91 F-measure. A comparison with existing diagnostic tests shows that our best ensemble was more accurate overall. Conclusions: Transparent ML is achievable even with small datasets. By focusing on intermediate steps, deep ML can provide transparent decisions. Because the approach leverages data redundancies, ML errors at the intermediate level have a low impact on final outcomes.
AB - Objective: Machine learning (ML) is increasingly employed to diagnose medical conditions, with algorithms trained to assign a single label using a black-box approach. We created an ML approach using deep learning that generates outcomes that are transparent and in line with clinical diagnostic rules. We demonstrate our approach for autism spectrum disorders (ASD), a neurodevelopmental condition with increasing prevalence. Methods: We use unstructured data from the Centers for Disease Control and Prevention (CDC) surveillance records labeled by a CDC-trained clinician with ASD A1-3 and B1-4 criterion labels per sentence and with ASD case labels per record using Diagnostic and Statistical Manual of Mental Disorders (DSM-5) rules. One rule-based algorithm, three deep ML algorithms, and six ensembles were compared and evaluated using a test set with 6773 sentences (N = 35 cases) set aside in advance. Criterion and case labeling were evaluated for each ML algorithm and ensemble. Case labeling outcomes were also compared with seven traditional tests. Results: Performance for criterion labeling was highest for the hybrid BiLSTM ML model. The best case labeling was achieved by an ensemble of two BiLSTM ML models using a majority vote. It achieved 100% precision (or PPV), 83% recall (or sensitivity), 100% specificity, 91% accuracy, and 0.91 F-measure. A comparison with existing diagnostic tests shows that our best ensemble was more accurate overall. Conclusions: Transparent ML is achievable even with small datasets. By focusing on intermediate steps, deep ML can provide transparent decisions. Because the approach leverages data redundancies, ML errors at the intermediate level have a low impact on final outcomes.
KW - autism spectrum disorders
KW - deep learning
KW - machine learning
KW - natural language processing
KW - transparency
UR - http://www.scopus.com/inward/record.url?scp=85193946285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85193946285&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocae080
DO - 10.1093/jamia/ocae080
M3 - Article
C2 - 38626184
AN - SCOPUS:85193946285
SN - 1067-5027
VL - 31
SP - 1313
EP - 1321
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 6
ER -