TY - GEN
T1 - Deep self-attention for sequential recommendation
AU - Zhang, Beichuan
AU - Xiao, Zhijiao
AU - Zhong, Shenghua
N1 - Funding Information:
This work was supported by the National Natural Science Foundation of China [No. 62002230], the Natural Science Foundation of Guangdong Province [No. 2019A1515011181], the Science and Technology Innovation Commission of Shenzhen [No. JCYJ20190808162613130], and the Shenzhen high-level talents program.
Publisher Copyright:
© 2021 Knowledge Systems Institute Graduate School. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Sequential recommendation aims to recommend the next item a user is likely to interact with by capturing useful sequential patterns from the user's historical behaviors. It has recently become an important and popular component of various e-commerce platforms. As a successful architecture, the Transformer has been widely used to adaptively capture the dynamics of users' historical behaviors for sequential recommendation. In recommender systems, however, the embedding size is usually small, and with small embeddings the dot-product in the Transformer may be limited in modeling the complex relevance between keys and queries. To address this common but neglected issue, we present a new model, Deep Self-Attention for Sequential Recommendation (DSASrec), which introduces chunking deep attention to compute attention weights. Chunking deep attention consists of two modules: a deep module and a chunking module. The deep module improves the nonlinearity of the attention function, and the chunking module calculates attention weights several times, similar to multi-head attention in the Transformer. Extensive experiments on three benchmark datasets show that our model achieves state-of-the-art results. Our implementation is available in PyTorch.
AB - Sequential recommendation aims to recommend the next item a user is likely to interact with by capturing useful sequential patterns from the user's historical behaviors. It has recently become an important and popular component of various e-commerce platforms. As a successful architecture, the Transformer has been widely used to adaptively capture the dynamics of users' historical behaviors for sequential recommendation. In recommender systems, however, the embedding size is usually small, and with small embeddings the dot-product in the Transformer may be limited in modeling the complex relevance between keys and queries. To address this common but neglected issue, we present a new model, Deep Self-Attention for Sequential Recommendation (DSASrec), which introduces chunking deep attention to compute attention weights. Chunking deep attention consists of two modules: a deep module and a chunking module. The deep module improves the nonlinearity of the attention function, and the chunking module calculates attention weights several times, similar to multi-head attention in the Transformer. Extensive experiments on three benchmark datasets show that our model achieves state-of-the-art results. Our implementation is available in PyTorch.
KW - Chunking representation
KW - Deep learning
KW - Dot-product
KW - Recommender system
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85114271300&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114271300&partnerID=8YFLogxK
U2 - 10.18293/SEKE2021-035
DO - 10.18293/SEKE2021-035
M3 - Conference contribution
AN - SCOPUS:85114271300
T3 - Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE
SP - 321
EP - 326
BT - Proceedings - SEKE 2021
PB - Knowledge Systems Institute Graduate School
T2 - 33rd International Conference on Software Engineering and Knowledge Engineering, SEKE 2021
Y2 - 1 July 2021 through 10 July 2021
ER -