Comparison of evaluation metrics of deep learning for imbalanced imaging data in osteoarthritis studies

Shen Liu, Frank Roemer, Yong Ge, Edward J. Bedrick, Zong Ming Li, Ali Guermazi, Leena Sharma, Charles Eaton, Marc C. Hochberg, David J. Hunter, Michael C. Nevitt, Wolfgang Wirth, C. Kent Kwoh, Xiaoxiao Sun

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: To compare the evaluation metrics for deep learning methods that were developed using imbalanced imaging data in osteoarthritis studies.

Materials and methods: This retrospective study utilized 2996 sagittal intermediate-weighted fat-suppressed knee MRIs with MRI Osteoarthritis Knee Score readings from 2467 participants in the Osteoarthritis Initiative study. We obtained probabilities of the presence of bone marrow lesions (BMLs) from MRIs in the testing dataset at the sub-region (15 sub-regions), compartment, and whole-knee levels based on the trained deep learning models. We compared different evaluation metrics (e.g., receiver operating characteristic (ROC) and precision-recall (PR) curves) in the testing dataset with various class ratios (presence of BMLs vs. absence of BMLs) at these three data levels to assess the model's performance.

Results: In a subregion with an extremely high imbalance ratio, the model achieved a ROC-AUC of 0.84, a PR-AUC of 0.10, a sensitivity of 0, and a specificity of 1.

Conclusion: The commonly used ROC curve is not sufficiently informative, especially in the case of imbalanced data. We provide the following practical suggestions based on our data analysis: 1) ROC-AUC is recommended for balanced data, 2) PR-AUC should be used for moderately imbalanced data (i.e., when the proportion of the minor class is above 5% and less than 50%), and 3) for severely imbalanced data (i.e., when the proportion of the minor class is below 5%), it is not practical to apply a deep learning model, even with the application of techniques addressing imbalanced data issues.
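The pattern reported in the Results (a high ROC-AUC alongside a low PR-AUC, zero sensitivity, and perfect specificity) can be reproduced on a toy example. The sketch below is illustrative only: the labels and scores are synthetic and are not the study's data, the function names are hypothetical, PR-AUC is approximated by average precision, the 0.5 decision threshold is an assumption, and `recommend_metric` simply encodes the abstract's three suggestions. The abstract does not say how a minority proportion of exactly 5% should be handled; here it is assumed to count as moderate imbalance.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise approximation of PR-AUC: mean precision at each positive's rank.

    Assumes no positive shares a score with a negative (true for the toy data).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity at a fixed threshold (0.5 is an assumption)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp / sum(labels), tn / (len(labels) - sum(labels))


def recommend_metric(minority_fraction):
    """Encode the abstract's suggestions; treating exactly 5% as moderate is an assumption."""
    if minority_fraction < 0.05:
        return "severe imbalance: applying a deep learning model is not practical"
    if minority_fraction < 0.5:
        return "PR-AUC"
    return "ROC-AUC"


# Synthetic test set: 2 positives among 100 cases (2% minority class).
labels = [1, 1] + [0] * 98
scores = [0.30, 0.20] + [0.35] * 5 + [0.25] * 10 + [0.05] * 83

print("ROC-AUC:", round(roc_auc(labels, scores), 3))
print("Average precision:", round(average_precision(labels, scores), 3))
print("Sensitivity, specificity:", sensitivity_specificity(labels, scores))
print("Suggested metric:", recommend_metric(sum(labels) / len(labels)))
```

Because every score falls below the threshold, the toy classifier never predicts a positive, so sensitivity is 0 and specificity is 1. ROC-AUC still looks favorable because most negatives rank below both positives, while average precision is pulled down by the few negatives that outrank them.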

Original language: English (US)
Pages (from-to): 1242-1248
Number of pages: 7
Journal: Osteoarthritis and Cartilage
Volume: 31
Issue number: 9
State: Published - Sep 2023

Keywords

  • Bone marrow lesion
  • Deep learning
  • Imbalanced data
  • Osteoarthritis
  • Precision recall curve
  • Receiver operating characteristic

ASJC Scopus subject areas

  • Rheumatology
  • Biomedical Engineering
  • Orthopedics and Sports Medicine
