TY - GEN
T1 - SPViT
T2 - 17th European Conference on Computer Vision, ECCV 2022
AU - Kong, Zhenglun
AU - Dong, Peiyan
AU - Ma, Xiaolong
AU - Meng, Xin
AU - Niu, Wei
AU - Sun, Mengshu
AU - Shen, Xuan
AU - Yuan, Geng
AU - Ren, Bin
AU - Tang, Hao
AU - Qin, Minghai
AU - Wang, Yanzhi
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult. Considering the computation complexity, the internal data pattern of ViTs, and the edge device deployment, we propose a latency-aware soft token pruning framework, SPViT, which can be set up on vanilla Transformers of both flatten and hierarchical structures, such as DeiTs and Swin-Transformers (Swin). More concretely, we design a dynamic attention-based multi-head token selector, which is a lightweight module for adaptive instance-wise token selection. We further introduce a soft pruning technique, which integrates the less informative tokens chosen by the selector module into a package token rather than discarding them completely. SPViT is bound to the trade-off between accuracy and latency requirements of specific edge devices through our proposed latency-aware training strategy. Experiment results show that SPViT significantly reduces the computation cost of ViTs with comparable performance on image classification. Moreover, SPViT can guarantee the identified model meets the latency specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile devices. For example, SPViT reduces the latency of DeiT-T to 26 ms (26%−41% superior to existing works) on the mobile device with 0.25%−4% higher top-1 accuracy on ImageNet. Our code is released at https://github.com/PeiyanFlying/SPViT.
AB - Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult. Considering the computation complexity, the internal data pattern of ViTs, and the edge device deployment, we propose a latency-aware soft token pruning framework, SPViT, which can be set up on vanilla Transformers of both flatten and hierarchical structures, such as DeiTs and Swin-Transformers (Swin). More concretely, we design a dynamic attention-based multi-head token selector, which is a lightweight module for adaptive instance-wise token selection. We further introduce a soft pruning technique, which integrates the less informative tokens chosen by the selector module into a package token rather than discarding them completely. SPViT is bound to the trade-off between accuracy and latency requirements of specific edge devices through our proposed latency-aware training strategy. Experiment results show that SPViT significantly reduces the computation cost of ViTs with comparable performance on image classification. Moreover, SPViT can guarantee the identified model meets the latency specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile devices. For example, SPViT reduces the latency of DeiT-T to 26 ms (26%−41% superior to existing works) on the mobile device with 0.25%−4% higher top-1 accuracy on ImageNet. Our code is released at https://github.com/PeiyanFlying/SPViT.
KW - FPGA
KW - Hardware acceleration
KW - Mobile devices
KW - Model compression
KW - Vision transformer
UR - https://www.scopus.com/pages/publications/85142778327
UR - https://www.scopus.com/pages/publications/85142778327#tab=citedBy
U2 - 10.1007/978-3-031-20083-0_37
DO - 10.1007/978-3-031-20083-0_37
M3 - Conference contribution
AN - SCOPUS:85142778327
SN - 9783031200823
T3 - Lecture Notes in Computer Science
SP - 620
EP - 640
BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 October 2022 through 27 October 2022
ER -