TY - GEN
T1 - Mutual Effort for Efficiency
T2 - 13th International Conference on Learning Representations, ICLR 2025
AU - Li, Sheng
AU - Tan, Qitao
AU - Dai, Yue
AU - Kong, Zhenglun
AU - Wang, Tianyu
AU - Liu, Jun
AU - Li, Ao
AU - Liu, Ninghao
AU - Ding, Yufei
AU - Tang, Xulong
AU - Yuan, Geng
N1 - Publisher Copyright:
© 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
PY - 2025
Y1 - 2025
AB - Self-supervised learning (SSL) offers a compelling solution to the extensive labeled-data requirements of traditional supervised learning. With the proven success of Vision Transformers (ViTs) in supervised tasks, there is increasing interest in adapting them to SSL frameworks. However, despite SSL's ability to achieve high accuracy without labeled data, its high computational demands pose substantial challenges, particularly on resource-limited platforms such as edge devices. Recent studies in supervised learning have shown that token pruning can reduce training costs by removing less informative tokens without compromising accuracy. In SSL, however, the dual-branch encoders make traditional single-branch pruning strategies less effective, as they fail to account for critical cross-branch similarity information, leading to reduced accuracy. To this end, we introduce SimPrune, a novel token pruning strategy designed for ViTs in SSL. SimPrune leverages cross-branch similarity information to prune tokens efficiently, retaining essential semantic information across both branches. Additionally, we incorporate a difficulty-aware pruning strategy to further enhance SimPrune's effectiveness. Experimental results show that our proposed approach effectively reduces training computation while maintaining accuracy. Specifically, it cuts training costs by 24% compared to the SSL baseline without sacrificing accuracy.
UR - https://www.scopus.com/pages/publications/105010204694
M3 - Conference contribution
AN - SCOPUS:105010204694
T3 - 13th International Conference on Learning Representations, ICLR 2025
SP - 40063
EP - 40080
BT - 13th International Conference on Learning Representations, ICLR 2025
PB - International Conference on Learning Representations, ICLR
Y2 - 24 April 2025 through 28 April 2025
ER -