Abstract
We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent "zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.
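As a rough illustration of what competing with a "smoothed" policy can mean, the sketch below assumes the action space is the interval [0, 1] and that smoothing replaces a policy's chosen action with a uniform draw from a band of half-width `h` around it. Both assumptions, and the names `smoothed_action`, `smoothed_loss`, and `spike_loss`, are hypothetical choices for this example and are not taken from the abstract; the paper's own definitions may differ.

```python
import random


def smoothed_action(policy_action, h, rng):
    """Draw an action uniformly from [a - h, a + h], clipped to [0, 1].

    Assumption: uniform smoothing over a band of half-width h.
    """
    lo = max(0.0, policy_action - h)
    hi = min(1.0, policy_action + h)
    return rng.uniform(lo, hi)


def smoothed_loss(loss_fn, policy_action, h, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the expected loss of the smoothed action."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += loss_fn(smoothed_action(policy_action, h, rng))
    return total / n_samples


def spike_loss(a):
    """A discontinuous (non-Lipschitz) loss: zero only on a narrow interval."""
    return 0.0 if 0.49 <= a <= 0.51 else 1.0


if __name__ == "__main__":
    # A wide band averages over mostly bad actions; a narrow band stays
    # inside the good region.
    print(smoothed_loss(spike_loss, policy_action=0.5, h=0.1))
    print(smoothed_loss(spike_loss, policy_action=0.5, h=0.01))
```

Under these assumptions the smoothed loss is well defined even though `spike_loss` is discontinuous: with `h = 0.1` the band has width 0.2 and the zero-loss region has width 0.02, so the average is roughly 0.9, while a much narrower band recovers a loss near zero. This is one way to read the abstract's claim that the smoothed benchmark needs no continuity assumptions.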
| Original language | English (US) |
|---|---|
| Journal | Journal of Machine Learning Research |
| Volume | 21 |
| State | Published - Jul 2020 |
Keywords
- Contextual bandits
- Nonparametric learning
ASJC Scopus subject areas
- Software
- Artificial Intelligence
- Control and Systems Engineering
- Statistics and Probability
Title
Contextual bandits with continuous actions: Smoothing, zooming, and adapting