TY - GEN
T1 - Attention as activation
AU - Dai, Yimian
AU - Oehmcke, Stefan
AU - Gieseke, Fabian
AU - Wu, Yiquan
AU - Barnard, Kobus
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61573183, the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) under Grant No. 201900029, the Nanjing University of Aeronautics and Astronautics PhD short-term visiting scholar project under Grant No. 180104DF03, the Excellent Chinese and Foreign Youth Exchange Program of the China Association for Science and Technology, and the China Scholarship Council under Grant No. 201806830039.
Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
AB - Activation functions and attention mechanisms are typically treated as having different purposes and have evolved differently. However, both concepts can be formulated as a nonlinear gating function. Inspired by their similarity, we propose a novel type of activation unit, the attentional activation (ATAC) unit, which unifies activation functions and attention mechanisms. In particular, we propose a local channel attention module for simultaneous nonlinear activation and element-wise feature refinement, which locally aggregates point-wise cross-channel feature contexts. By replacing the well-known rectified linear units with ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better with a modest number of additional parameters. We conducted detailed ablation studies on the ATAC units using several host networks with varying network depths to empirically verify their effectiveness and efficiency. Furthermore, we compared the performance of the ATAC units against existing activation functions as well as other attention mechanisms on the CIFAR-10, CIFAR-100, and ImageNet datasets. Our experimental results show that networks constructed with the proposed ATAC units generally yield performance gains over their competitors given a comparable number of parameters.
UR - http://www.scopus.com/inward/record.url?scp=85110521482&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85110521482&partnerID=8YFLogxK
U2 - 10.1109/ICPR48806.2021.9413020
DO - 10.1109/ICPR48806.2021.9413020
M3 - Conference contribution
AN - SCOPUS:85110521482
T3 - Proceedings - International Conference on Pattern Recognition
SP - 4131
EP - 4136
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Pattern Recognition, ICPR 2020
Y2 - 10 January 2021 through 15 January 2021
ER -
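
Note: a minimal sketch, kept outside the record above, of how an ATAC-style unit could be realized from the abstract's description alone: a local channel attention module built from point-wise (1x1) cross-channel aggregation whose sigmoid output gates the input element-wise in place of a ReLU. The bottleneck structure, the reduction ratio, and all names (ATACSketch, reduction) are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class ATACSketch(nn.Module):
    """Hypothetical sketch of an attentional-activation-style unit.

    The gate uses only point-wise (1x1) convolutions, so it keeps the
    spatial resolution of the input and gates every element individually,
    acting as both a nonlinear activation and a feature-refinement step.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)  # assumed bottleneck width
        self.gate = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),   # local cross-channel aggregation
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),   # restore channel dimension
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),                                 # nonlinear gating in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gating: plays the role the ReLU would in the host network.
        return x * self.gate(x)

# Usage: drop-in replacement for the activation in a convolutional block.
block = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), ATACSketch(64))
y = block(torch.randn(1, 3, 32, 32))  # -> shape (1, 64, 32, 32)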