TY - GEN
T1 - Classifying abnormalities in computed tomography radiology reports with rule-based and natural language processing models
AU - Han, Songyue
AU - Tian, James
AU - Kelly, Mark
AU - Selvakumaran, Vignesh
AU - Henao, Ricardo
AU - Rubin, Geoffrey D.
AU - Lo, Joseph Y.
N1 - Publisher Copyright: © 2019 SPIE.
PY - 2019
Y1 - 2019
AB - Purpose: When applying machine learning algorithms to the classification and detection of abnormalities in medical imaging, many researchers find it hard to obtain enough labeled data. This is especially difficult for modalities such as computed tomography (CT), with potentially 1000 or more slice images per case. To solve this problem, we plan to use machine learning algorithms to identify abnormalities within existing radiologist reports, thus creating case-level labels that may be used for weakly supervised training on the image data. We used a two-stage procedure to label the CT reports. In the first stage, a rule-based system labeled a smaller set of cases automatically with high accuracy. In the second stage, we developed machine learning algorithms using the labels from the rule-based system and word vectors learned without supervision from unlabeled CT reports. Method: In this study, we used approximately 24,000 CT reports from Duke University Health System. We initially focused on three organs: the lungs, liver/gallbladder, and kidneys. We first developed a rule-based system that can quickly identify certain types of abnormalities within CT reports with high accuracy. For each organ and disease combination, we produced several hundred cases with rule-based labels. These labels were combined with word vectors generated using word2vec from all the unlabeled reports to train two different machine learning algorithms: (a) average of word vectors merged by logistic regression, and (b) recurrent neural network (RNN). Result: Performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) on an independent test set of 440 reports for which those organs were manually labeled as normal or abnormal by clinical experts. For lungs, the AUC was 0.796 for the average word vector model and 0.827 for the RNN. For liver, it was 0.683 for the average word vector model and 0.791 for the RNN. For kidneys, it was 0.786 for the average word vector model and 0.928 for the RNN. Conclusion: It is possible to label large numbers of cases automatically. These rule-based labels can then be used to build a classification model for large numbers of medical reports. With word2vec and other transfer learning techniques, we can achieve good generalization performance.
KW - Computed Tomography
KW - Machine learning
KW - Natural language processing
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85068110752&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068110752&partnerID=8YFLogxK
U2 - 10.1117/12.2513577
DO - 10.1117/12.2513577
M3 - Conference contribution
AN - SCOPUS:85068110752
T3 - Progress in Biomedical Optics and Imaging - Proceedings of SPIE
BT - Medical Imaging 2019
A2 - Mori, Kensaku
A2 - Hahn, Horst K.
PB - SPIE
T2 - Medical Imaging 2019: Computer-Aided Diagnosis
Y2 - 17 February 2019 through 20 February 2019
ER -