Anytime exploration for multi-armed bandits using confidence information

Kwang Sung Jun, Robert Nowak

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

We introduce anytime Explore-m, a pure exploration problem for multi-armed bandits (MAB) that requires making a prediction of the topm arms at every time step. Anytime Explorers is more practical than fixed budget or fixed confidence formulations of the top-m problem, since many applications involve a finite, but unpredictable, budget. However, the development and analysis of anytime algorithms present many challenges. We propose AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-m. Our analysis shows that the sample complexity of AT-LUCB is competitive to anytime variants of existing algorithms. Moreover, our empirical evaluation on AT-LUCB shows that AT-LUCB performs as well as or better than state-of-the-art baseline methods for anytime Explore-m.

Original languageEnglish (US)
Title of host publication33rd International Conference on Machine Learning, ICML 2016
EditorsKilian Q. Weinberger, Maria Florina Balcan
PublisherInternational Machine Learning Society (IMLS)
Pages1530-1544
Number of pages15
ISBN (Electronic)9781510829008
StatePublished - 2016
Externally publishedYes
Event33rd International Conference on Machine Learning, ICML 2016 - New York City, United States
Duration: Jun 19 2016Jun 24 2016

Publication series

Name33rd International Conference on Machine Learning, ICML 2016
Volume3

Conference

Conference33rd International Conference on Machine Learning, ICML 2016
Country/TerritoryUnited States
CityNew York City
Period6/19/166/24/16

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Computer Networks and Communications

Cite this