Contextual bandits with continuous actions: Smoothing, zooming, and adapting

Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

Research output: Contribution to journal / Article / Peer-review

13 Scopus citations

Abstract

We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent "zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.
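To make the idea of "competing with a smoothed version of the policy class" concrete, here is a minimal sketch of one common smoothing operator. It assumes an action space of [0, 1] and a uniform kernel of bandwidth h: instead of playing the action a = π(x) chosen by a policy, the smoothed policy plays an action drawn uniformly from [a - h, a + h] clipped to [0, 1]. The function names (`smoothing_interval`, `sample_smoothed_action`, `smoothed_loss`) and the step-shaped loss are hypothetical illustrations, not the paper's code or its exact definitions.

```python
import random

def smoothing_interval(a, h):
    # Bandwidth-h window around action a, clipped to the assumed action space [0, 1].
    return max(0.0, a - h), min(1.0, a + h)

def sample_smoothed_action(a, h, rng):
    # Draw uniformly from the clipped window: the randomized action a smoothed policy plays.
    lo, hi = smoothing_interval(a, h)
    return rng.uniform(lo, hi)

def smoothed_loss(loss, a, h, n_grid=1000):
    # Expected loss under the uniform window, approximated with the midpoint rule.
    lo, hi = smoothing_interval(a, h)
    step = (hi - lo) / n_grid
    return sum(loss(lo + (i + 0.5) * step) for i in range(n_grid)) / n_grid

if __name__ == "__main__":
    def step_loss(x):
        # Hypothetical discontinuous loss: 0 left of 0.5, 1 from 0.5 on (not Lipschitz).
        return 0.0 if x < 0.5 else 1.0

    print(step_loss(0.5))                      # unsmoothed loss at a = 0.5 -> 1.0
    print(smoothed_loss(step_loss, 0.5, 0.1))  # smoothed loss at a = 0.5 -> 0.5
    rng = random.Random(0)
    print(sample_smoothed_action(0.05, 0.1, rng))  # some action in [0, 0.15]
```

With the step loss, the unsmoothed loss jumps from 0 to 1 as a crosses 0.5, while the smoothed loss changes only gradually (roughly in proportion to the shift in a divided by h). This is why a benchmark defined through smoothing can be meaningful without any continuity assumption on the loss; smaller h brings the benchmark closer to the original policy class at the cost of a harder learning problem.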

Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 21
State: Published - Jul 2020

Keywords

  • Contextual bandits
  • Nonparametric learning

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability

