Contextual bandits with continuous actions: Smoothing, zooming, and adapting

Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

Research output: Contribution to journalConference articlepeer-review

14 Scopus citations

Abstract

We study contextual bandit learning for any competitor policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.

Original languageEnglish (US)
Pages (from-to)2025-2027
Number of pages3
JournalProceedings of Machine Learning Research
Volume99
StatePublished - 2019
Externally publishedYes
Event32nd Conference on Learning Theory, COLT 2019 - Phoenix, United States
Duration: Jun 25 2019Jun 28 2019

Keywords

  • Contextual bandits
  • Lipschitz bandits
  • Nonparametric learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Contextual bandits with continuous actions: Smoothing, zooming, and adapting'. Together they form a unique fingerprint.

Cite this