Regret Analysis of Stochastic Multi-armed Bandit Problem with Clustered Information Feedback

Tianchi Zhao, Bo Jiang, Ming Li, Ravi Tandon

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

In this paper, we analyze the regret bound of Multi-armed Bandit (MAB) algorithms under the setting where the payoffs of an arbitrary-size cluster of arms are observable in each round. Compared to the well-studied bandit or full feedback setting, where the payoffs of the selected arm or all the arms are observable, the clustered feedback setting can be viewed as a generalization and a connection. We focus on two most representative MAB algorithms: Upper Confidence Bound and Thompson sampling, and adapt them into the clustered feedback setting. Then, we theoretically derive the regret bound for each of them considering the general type of payoffs (value comes from continuous domains). We show that the regret bounds of these two algorithms with clustered information feedback depend only on the number of clusters. Finally, we simulate both synthetic data and real-world data to compare the performance of these algorithms with different numbers of observable payoffs in each round, the results validate our analysis.

Original languageEnglish (US)
Title of host publication2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781728169262
DOIs
StatePublished - Jul 2020
Event2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: Jul 19 2020Jul 24 2020

Publication series

NameProceedings of the International Joint Conference on Neural Networks

Conference

Conference2020 International Joint Conference on Neural Networks, IJCNN 2020
Country/TerritoryUnited Kingdom
CityVirtual, Glasgow
Period7/19/207/24/20

Keywords

  • clustered information feedback
  • multi-armed bandit
  • problem-dependent bound.
  • regret bound

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Regret Analysis of Stochastic Multi-armed Bandit Problem with Clustered Information Feedback'. Together they form a unique fingerprint.

Cite this