
Lightweight TransUNet with Knowledge Distillation for Efficient Medical Image Segmentation

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Computer-assisted interventions and postoperative surgical video analysis have advanced significantly in recent years, contributing to improvements in surgical planning, skill assessment, and training. These advances have transformed the surgical landscape by enabling near real-time segmentation of medical images and providing decision support systems that offer critical guidance and assistance to surgeons at all levels of experience. One of the leading deep neural network models used in medical image analysis is TransUNet, which combines the strengths of the Transformer and U-Net architectures. Leveraging this hybrid architecture, TransUNet has achieved superior performance in a variety of medical segmentation tasks. However, its computational demands, largely inherited from the Transformer, result in high model complexity and reduced inference efficiency. Such challenges limit its deployment in clinical settings that require real-time processing. To address these limitations, we propose an efficient approach that combines knowledge distillation with a modified TransUNet architecture. Specifically, we replace the inherited Multi-Head Self-Attention (MHSA) with a Single-Head Self-Attention (SHSA) mechanism to mitigate the quadratic computational complexity of MHSA, and we then train the optimized lightweight TransUNet (student) model to mimic a high-performing TransUNet teacher model through knowledge distillation. This scheme effectively reduces the complexity of the student model while maintaining accurate segmentation results, thus enabling real-time performance in clinical settings.
In our experiments, we evaluated our approach on the Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) benchmark dataset and the Cataract-1K dataset, demonstrating that our distilled model with SHSA achieves an improved trade-off between accuracy and latency, making it more suitable for practical deployment in surgical environments.
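
As a rough illustration of the teacher-student training described in the abstract, the sketch below computes a per-pixel distillation loss in plain Python. The abstract does not state the loss the authors use, so the formulation here (a weighted sum of hard-label cross-entropy and a temperature-softened KL term, in the style of standard knowledge distillation) is an assumption, as are the names distillation_loss, temperature, and alpha and all of the toy values.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Mean per-pixel loss (assumed formulation, not taken from the paper):

        alpha * CE(student, label)
        + (1 - alpha) * T^2 * KL(teacher_T || student_T)

    student_logits, teacher_logits: one list of class logits per pixel.
    labels: one integer class index per pixel.
    """
    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        # Hard-label term: cross-entropy of the student against ground truth.
        ce = -log_softmax(s)[y]
        # Soft-label term: KL divergence between temperature-softened outputs.
        log_pt = log_softmax(t, temperature)
        log_ps = log_softmax(s, temperature)
        kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
        total += alpha * ce + (1.0 - alpha) * temperature ** 2 * kl
    return total / len(labels)


# Toy usage: two pixels, three classes (all values are made up).
student = [[0.2, 1.5, -0.3], [2.0, 0.1, 0.4]]
teacher = [[0.0, 2.5, -1.0], [3.0, -0.5, 0.2]]
loss = distillation_loss(student, teacher, labels=[1, 0])
```

The soft-label term is zero when the student reproduces the teacher's logits exactly, so under this assumed formulation alpha controls how much the student is pulled toward the ground truth versus the teacher's output distribution.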

Original language: English (US)
Title of host publication: Real-Time Image Processing and Deep Learning 2025
Editors: Nasser Kehtarnavaz, Mukul V. Shirvaikar
Publisher: SPIE
ISBN (Electronic): 9781510687059
DOIs
State: Published - 2025
Event: Real-Time Image Processing and Deep Learning 2025 - Orlando, United States
Duration: Apr 14 2025 to Apr 15 2025

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 13458
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: Real-Time Image Processing and Deep Learning 2025
Country/Territory: United States
City: Orlando
Period: 4/14/25 to 4/15/25

Keywords

  • Attention Mechanism
  • Knowledge Distillation
  • Medical Image Analysis
  • Real-Time Segmentation
  • TransUNet

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
