Abstract
We study contextual bandit learning with an arbitrary competitor policy class and a continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming” behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price of adaptivity and deriving optimal adaptive algorithms that require no additional information.
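To make the smoothing idea concrete, the sketch below illustrates one plausible instantiation: a deterministic policy over actions in [0, 1] is smoothed by playing uniformly within a bandwidth-h window around its chosen action, and its smoothed value is estimated by importance weighting against a logging policy with known density. The helper names (`smoothed_action`, `smoothed_value_estimate`) and the uniform-kernel choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def smoothed_action(policy_action, h, rng):
    """Sample from the bandwidth-h smoothing of a deterministic policy:
    uniform on [policy_action - h, policy_action + h], clipped to [0, 1].
    (Illustrative uniform-kernel smoothing; assumed, not from the paper's code.)"""
    lo, hi = max(0.0, policy_action - h), min(1.0, policy_action + h)
    return rng.uniform(lo, hi)

def smoothed_value_estimate(logged, policy, h):
    """Importance-weighted estimate of the smoothed value of `policy`.
    `logged` is a list of (context, action, logging_density, reward) tuples
    gathered by any logging policy with a known density over [0, 1].
    Hypothetical helper for illustration only."""
    total = 0.0
    for x, a, logging_density, r in logged:
        center = policy(x)
        lo, hi = max(0.0, center - h), min(1.0, center + h)
        # Density of the smoothed policy at the logged action (uniform kernel).
        target_density = 1.0 / (hi - lo) if lo <= a <= hi else 0.0
        total += (target_density / logging_density) * r
    return total / len(logged)

# Example: evaluate a fixed policy on synthetic uniform-logging data.
rng = np.random.default_rng(0)
logged = [(x, a, 1.0, float(abs(a - x) < 0.1))
          for x, a in zip(rng.uniform(size=500), rng.uniform(size=500))]
print(smoothed_value_estimate(logged, policy=lambda x: x, h=0.1))
```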
| Original language | English (US) |
|---|---|
| Pages (from-to) | 2025-2027 |
| Number of pages | 3 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 99 |
| State | Published - 2019 |
| Externally published | Yes |
| Event | 32nd Conference on Learning Theory, COLT 2019 - Phoenix, United States |
| Duration | Jun 25 2019 → Jun 28 2019 |
Keywords
- Contextual bandits
- Lipschitz bandits
- Nonparametric learning
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability