Thompson Sampling for Robust Transfer in Multi-Task Bandits

Zhi Wang, Chicheng Zhang, Kamalika Chaudhuri

Research output: Contribution to journalConference articlepeer-review

Abstract

We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly-optimal using a novel concentration inequality for multi-task data aggregation at random stopping times. Finally, we evaluate the algorithm on synthetic data and show that the TS-type algorithm enjoys superior empirical performance in comparison with the UCB-based algorithm and a baseline algorithm that performs TS for each individual task without transfer.

Original languageEnglish (US)
Pages (from-to)23363-23416
Number of pages54
JournalProceedings of Machine Learning Research
Volume162
StatePublished - 2022
Event39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: Jul 17 2022Jul 23 2022

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Thompson Sampling for Robust Transfer in Multi-Task Bandits'. Together they form a unique fingerprint.

Cite this