TY - JOUR
T1 - Classification of imbalanced oral cancer image data from high-risk population
AU - Song, Bofan
AU - Li, Shaobai
AU - Sunny, Sumsum
AU - Gurushanth, Keerthi
AU - Mendonca, Pramila
AU - Mukhia, Nirza
AU - Patrick, Sanjana
AU - Gurudath, Shubha
AU - Raghavan, Subhashini
AU - Tsusennaro, Imchen
AU - Leivon, Shirley T.
AU - Kolur, Trupti
AU - Shetty, Vivek
AU - Bushan, Vidya
AU - Ramesh, Rohan
AU - Peterson, Tyler
AU - Pillai, Vijay
AU - Wilder-Smith, Petra
AU - Sigamani, Alben
AU - Suresh, Amritha
AU - Kuriakose, Moni Abraham
AU - Birur, Praveen
AU - Liang, Rongguang
N1 - Publisher Copyright: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
PY - 2021/10/1
Y1 - 2021/10/1
N2 - Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are imbalanced, which often degrades classification performance. Aim: To reduce the class bias caused by data imbalance. Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We use weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve the neural network performance of oral cancer image classification with the imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings. Results: By applying both data-level and algorithm-level approaches to the deep learning training process, the performance of the minority classes, which were difficult to distinguish at the beginning, has been improved. The accuracy of the "premalignancy" class also increased, which is ideal for screening applications. Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets could be reduced using both data- and algorithm-level methods. Our study may provide an important basis for understanding the influence of imbalanced datasets on oral cancer deep learning classifiers and how to mitigate it.
AB - Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are imbalanced, which often degrades classification performance. Aim: To reduce the class bias caused by data imbalance. Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We use weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve the neural network performance of oral cancer image classification with the imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings. Results: By applying both data-level and algorithm-level approaches to the deep learning training process, the performance of the minority classes, which were difficult to distinguish at the beginning, has been improved. The accuracy of the "premalignancy" class also increased, which is ideal for screening applications. Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets could be reduced using both data- and algorithm-level methods. Our study may provide an important basis for understanding the influence of imbalanced datasets on oral cancer deep learning classifiers and how to mitigate it.
KW - deep learning
KW - ensemble learning
KW - imbalanced multi-class datasets
KW - mobile screening device
KW - oral cancer
UR - http://www.scopus.com/inward/record.url?scp=85118701338&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118701338&partnerID=8YFLogxK
U2 - 10.1117/1.JBO.26.10.105001
DO - 10.1117/1.JBO.26.10.105001
M3 - Article
C2 - 34689442
AN - SCOPUS:85118701338
SN - 1083-3668
VL - 26
JO - Journal of Biomedical Optics
JF - Journal of Biomedical Optics
IS - 10
M1 - 105001
ER -