Abstract
We study contextual bandit learning for any competitor policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming” behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.
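As an informal illustration of the smoothing idea mentioned in the abstract (this is a hedged sketch, not the paper's algorithm): a policy over a continuous action space such as [0, 1] can be "smoothed" by playing an action uniformly at random within a bandwidth `h` of the base policy's action. The function names and the bandwidth parameter `h` below are hypothetical, chosen only for illustration.

```python
import random

def smooth_action(base_action: float, h: float) -> float:
    """Sample an action uniformly within bandwidth h of base_action,
    clipped to the unit interval [0, 1].

    This sketches the generic notion of a smoothed policy: instead of
    committing to a single point, spread mass over a small interval.
    """
    lo = max(0.0, base_action - h)
    hi = min(1.0, base_action + h)
    return random.uniform(lo, hi)

def estimate_smoothed_value(base_action: float, h: float,
                            reward_fn, n_samples: int = 1000) -> float:
    """Monte Carlo estimate of the expected reward of the smoothed
    policy around base_action, under a (hypothetical) reward function."""
    total = 0.0
    for _ in range(n_samples):
        total += reward_fn(smooth_action(base_action, h))
    return total / n_samples
```

The point of competing with the smoothed class is that such a smoothed policy has a well-defined value even when the underlying reward function is not continuous, which is why no continuity assumption is needed for that regret bound.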
| Original language | English (US) |
|---|---|
| Pages (from-to) | 2025-2027 |
| Number of pages | 3 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 99 |
| State | Published - 2019 |
| Externally published | Yes |
| Event | 32nd Conference on Learning Theory, COLT 2019 - Phoenix, United States |
| Duration | Jun 25 2019 → Jun 28 2019 |
Keywords
- Contextual bandits
- Lipschitz bandits
- Nonparametric learning
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence