MUTUAL EFFORT FOR EFFICIENCY: A SIMILARITY-BASED TOKEN PRUNING FOR VISION TRANSFORMERS IN SELF-SUPERVISED LEARNING

  • Sheng Li
  • Qitao Tan
  • Yue Dai
  • Zhenglun Kong
  • Tianyu Wang
  • Jun Liu
  • Ao Li
  • Ninghao Liu
  • Yufei Ding
  • Xulong Tang
  • Geng Yuan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Self-supervised learning (SSL) offers a compelling solution to the extensive labeled-data requirements of traditional supervised learning. With the proven success of Vision Transformers (ViTs) in supervised tasks, there is growing interest in adapting them to SSL frameworks. However, although SSL can achieve high accuracy without labeled data, its high computational demands pose substantial challenges, particularly on resource-limited platforms such as edge devices. Recent studies in supervised learning have shown that token pruning can reduce training costs by removing less informative tokens without compromising accuracy. However, SSL's dual-branch encoders make traditional single-branch pruning strategies less effective, as they fail to account for critical cross-branch similarity information, leading to reduced accuracy in SSL. To this end, we introduce SimPrune, a novel token pruning strategy designed for ViTs in SSL. SimPrune leverages cross-branch similarity information to prune tokens efficiently while retaining essential semantic information across both branches. Additionally, we incorporate a difficulty-aware pruning strategy to further enhance SimPrune's effectiveness. Experimental results show that our approach effectively reduces training computation while maintaining accuracy. Specifically, it achieves 24% savings in training cost compared to the SSL baseline without sacrificing accuracy.
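The abstract describes SimPrune only at a high level, so the following is a minimal PyTorch sketch of the core idea as stated there: score each branch's patch tokens by their cosine similarity to the tokens of the other branch, and keep the best-matched ones. The function name cross_branch_similarity_prune, the max-over-matches scoring rule, and the keep_ratio value are illustrative assumptions rather than the authors' released implementation; the difficulty-aware component is omitted.

```python
import torch
import torch.nn.functional as F

def cross_branch_similarity_prune(tokens_a, tokens_b, keep_ratio=0.7):
    """Illustrative sketch, not the paper's code: retain the tokens in each
    branch that are most similar to tokens in the other branch, on the
    intuition that cross-branch similarity marks shared semantic content.

    tokens_a, tokens_b: (B, N, D) and (B, M, D) patch tokens from the two SSL views.
    keep_ratio: fraction of tokens kept per branch (hypothetical default).
    """
    # Cosine similarity between every pair of tokens across the two branches.
    a = F.normalize(tokens_a, dim=-1)                  # (B, N, D)
    b = F.normalize(tokens_b, dim=-1)                  # (B, M, D)
    sim = torch.einsum("bnd,bmd->bnm", a, b)           # (B, N, M)

    # Score each token by its best match in the opposite branch.
    score_a = sim.max(dim=2).values                    # (B, N)
    score_b = sim.max(dim=1).values                    # (B, M)

    def keep_top(tokens, scores, ratio):
        k = max(1, int(tokens.shape[1] * ratio))
        idx = scores.topk(k, dim=1).indices            # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return torch.gather(tokens, 1, idx)            # (B, k, D)

    return keep_top(tokens_a, score_a, keep_ratio), keep_top(tokens_b, score_b, keep_ratio)
```

In a dual-branch SSL setup (two augmented views fed through paired encoders), a pruning step of this kind would sit between transformer blocks so that later layers process fewer tokens, which is where the reported training-cost savings would come from; the exact placement and schedule are not specified in the abstract.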

Original language: English (US)
Title of host publication: 13th International Conference on Learning Representations, ICLR 2025
Publisher: International Conference on Learning Representations, ICLR
Pages: 40063-40080
Number of pages: 18
ISBN (Electronic): 9798331320850
State: Published - 2025
Event: 13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: Apr 24, 2025 - Apr 28, 2025

Publication series

Name: 13th International Conference on Learning Representations, ICLR 2025

Conference

Conference: 13th International Conference on Learning Representations, ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 4/24/25 - 4/28/25

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
