Anytime exploration for multi-armed bandits using confidence information

Kwang Sung Jun, Robert Nowak

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations


We introduce anytime Explore-m, a pure exploration problem for multi-armed bandits (MAB) that requires making a prediction of the topm arms at every time step. Anytime Explorers is more practical than fixed budget or fixed confidence formulations of the top-m problem, since many applications involve a finite, but unpredictable, budget. However, the development and analysis of anytime algorithms present many challenges. We propose AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-m. Our analysis shows that the sample complexity of AT-LUCB is competitive to anytime variants of existing algorithms. Moreover, our empirical evaluation on AT-LUCB shows that AT-LUCB performs as well as or better than state-of-the-art baseline methods for anytime Explore-m.

Original languageEnglish (US)
Title of host publication33rd International Conference on Machine Learning, ICML 2016
EditorsKilian Q. Weinberger, Maria Florina Balcan
PublisherInternational Machine Learning Society (IMLS)
Number of pages15
ISBN (Electronic)9781510829008
StatePublished - 2016
Externally publishedYes
Event33rd International Conference on Machine Learning, ICML 2016 - New York City, United States
Duration: Jun 19 2016Jun 24 2016

Publication series

Name33rd International Conference on Machine Learning, ICML 2016


Conference33rd International Conference on Machine Learning, ICML 2016
Country/TerritoryUnited States
CityNew York City

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Computer Networks and Communications

Cite this