TY - JOUR
T1 - Scalable generalized linear bandits
T2 - 31st Annual Conference on Neural Information Processing Systems, NIPS 2017
AU - Jun, Kwang Sung
AU - Bhargava, Aniruddha
AU - Nowak, Robert
AU - Willett, Rebecca
N1 - Funding Information:
Acknowledgments This work was partially supported by the NSF grant IIS-1447449 and the MURI grant 2015-05174-04. The authors thank Yasin Abbasi-Yadkori and Anshumali Shrivastava for providing constructive feedback and Xin Hunt for her contribution at the initial stage.
Publisher Copyright:
© 2017 Neural information processing systems foundation. All rights reserved.
PY - 2017
Y1 - 2017
N2 - Generalized Linear Bandits (GLBs), a natural extension of the stochastic linear bandits, has been popular and successful in recent years. However, existing GLBs scale poorly with the number of rounds and the number of arms, limiting their utility in practice. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time t, we propose a new algorithm that performs online computations to enjoy a constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-confidence-set Conversion (GLOC method) that takes any online learning algorithm and turns it into a GLB algorithm. As a special case, we apply GLOC to the online Newton step algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work. Second, for the case where the number N of arms is very large, we propose new algorithms in which each next arm is selected via an inner product search. Such methods can be implemented via hashing algorithms (i.e., "hash-amenable") and result in a time complexity sublinear in N. While a Thompson sampling extension of GLOC is hash-amenable, its regret bound for d-dimensional arm sets scales with d3/2, whereas GLOC's regret bound scales with d. Towards closing this gap, we propose a new hash-amenable algorithm whose regret bound scales with d5/4. Finally, we propose a fast approximate hash-key computation (inner product) with a better accuracy than the state-of-the-art, which can be of independent interest. We conclude the paper with preliminary experimental results confirming the merits of our methods.
AB - Generalized Linear Bandits (GLBs), a natural extension of the stochastic linear bandits, has been popular and successful in recent years. However, existing GLBs scale poorly with the number of rounds and the number of arms, limiting their utility in practice. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time t, we propose a new algorithm that performs online computations to enjoy a constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-confidence-set Conversion (GLOC method) that takes any online learning algorithm and turns it into a GLB algorithm. As a special case, we apply GLOC to the online Newton step algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work. Second, for the case where the number N of arms is very large, we propose new algorithms in which each next arm is selected via an inner product search. Such methods can be implemented via hashing algorithms (i.e., "hash-amenable") and result in a time complexity sublinear in N. While a Thompson sampling extension of GLOC is hash-amenable, its regret bound for d-dimensional arm sets scales with d3/2, whereas GLOC's regret bound scales with d. Towards closing this gap, we propose a new hash-amenable algorithm whose regret bound scales with d5/4. Finally, we propose a fast approximate hash-key computation (inner product) with a better accuracy than the state-of-the-art, which can be of independent interest. We conclude the paper with preliminary experimental results confirming the merits of our methods.
UR - http://www.scopus.com/inward/record.url?scp=85047002400&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047002400&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85047002400
SN - 1049-5258
VL - 2017-December
SP - 99
EP - 109
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 4 December 2017 through 9 December 2017
ER -