TY - JOUR
T1 - Thompson Sampling for Robust Transfer in Multi-Task Bandits
AU - Wang, Zhi
AU - Zhang, Chicheng
AU - Chaudhuri, Kamalika
N1 - Funding Information:
We thank Geelon So for insightful discussions. ZW and KC thank the National Science Foundation under IIS 1915734 and CCF 1719133 for research support. CZ acknowledges startup funding support from the University of Arizona. We also thank the anonymous reviewers for their constructive feedback.
Publisher Copyright:
Copyright © 2022 by the author(s)
PY - 2022
Y1 - 2022
N2 - We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly-optimal using a novel concentration inequality for multi-task data aggregation at random stopping times. Finally, we evaluate the algorithm on synthetic data and show that the TS-type algorithm enjoys superior empirical performance in comparison with the UCB-based algorithm and a baseline algorithm that performs TS for each individual task without transfer.
AB - We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly-optimal using a novel concentration inequality for multi-task data aggregation at random stopping times. Finally, we evaluate the algorithm on synthetic data and show that the TS-type algorithm enjoys superior empirical performance in comparison with the UCB-based algorithm and a baseline algorithm that performs TS for each individual task without transfer.
UR - http://www.scopus.com/inward/record.url?scp=85163107058&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163107058&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85163107058
SN - 2640-3498
VL - 162
SP - 23363
EP - 23416
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 39th International Conference on Machine Learning, ICML 2022
Y2 - 17 July 2022 through 23 July 2022
ER -