TY - JOUR
T1 - Interpretable deep learning approach for oral cancer classification using guided attention inference network
AU - Figueroa, Kevin Chew
AU - Song, Bofan
AU - Sunny, Sumsum
AU - Li, Shaobai
AU - Gurushanth, Keerthi
AU - Mendonca, Pramila
AU - Mukhia, Nirza
AU - Patrick, Sanjana
AU - Gurudath, Shubha
AU - Raghavan, Subhashini
AU - Imchen, Tsusennaro
AU - Leivon, Shirley T.
AU - Kolur, Trupti
AU - Shetty, Vivek
AU - Bushan, Vidya
AU - Ramesh, Rohan
AU - Pillai, Vijay
AU - Wilder-Smith, Petra
AU - Sigamani, Alben
AU - Suresh, Amritha
AU - Kuriakose, Moni Abraham
AU - Birur, Praveen
AU - Liang, Rongguang
N1 - Funding Information:
This work was supported by the National Institute of Biomedical Imaging and Bioengineering (No. UH2EB022623), the National Cancer Institute (No. UH3CA239682), and the National Institute of Dental and Craniofacial Research (No. R01DE030682) of the National Institutes of Health (NIH).
Publisher Copyright:
© The Authors.
PY - 2022/1/1
Y1 - 2022/1/1
AB - Significance: Convolutional neural networks (CNNs) show potential for the automated classification of different cancer lesions. However, their lack of interpretability and explainability makes their decisions difficult to understand. Furthermore, a CNN's attention may incorrectly concentrate on areas surrounding the salient object rather than on the object to be recognized itself, because the network has no incentive to focus solely on the correct subjects to be detected. This undermines the reliability of CNNs, especially in biomedical applications. Aim: Develop a deep learning training approach that makes the network's predictions interpretable and directly guides its attention to concentrate on, and accurately delineate, cancerous regions of the image. Approach: We utilized Selvaraju et al.'s gradient-weighted class activation mapping (Grad-CAM) to inject interpretability and explainability into CNNs. We adopted a two-stage training process with data augmentation techniques and Li et al.'s guided attention inference network (GAIN), training on images captured with our customized mobile oral screening devices. The GAIN architecture consists of three training streams: a classification stream, an attention mining stream, and a bounding box stream. By adopting the GAIN training architecture, we jointly optimized the classification and segmentation accuracy of our CNN, treating its attention maps as reliable priors to develop maps with more complete and accurate segmentation. Results: The network's attention maps reveal what the network focuses on during its decision-making process. The results also show that the proposed method guides the trained network to highlight and focus its attention on the correct lesion areas in the images when making a decision, rather than on relevant yet incorrect regions. Conclusions: We demonstrate the effectiveness of our approach for more interpretable and reliable classification of oral potentially malignant and malignant lesions.
KW - Guided attention inference network
KW - Interpretable deep learning
KW - Oral cancer
UR - http://www.scopus.com/inward/record.url?scp=85123567795&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123567795&partnerID=8YFLogxK
DO - 10.1117/1.JBO.27.1.015001
M3 - Article
C2 - 35023333
AN - SCOPUS:85123567795
VL - 27
JO - Journal of Biomedical Optics
JF - Journal of Biomedical Optics
SN - 1083-3668
IS - 1
M1 - 015001
ER -