Minimum Empirical Divergence for Sub-Gaussian Linear Bandits

Abstract
We propose a novel linear bandit algorithm called LinMED (Linear Minimum Empirical Divergence), a linear extension of the MED algorithm originally designed for multi-armed bandits. LinMED is a randomized algorithm that admits a closed-form computation of the arm-sampling probabilities, unlike the popular randomized algorithm linear Thompson sampling. This feature proves useful for off-policy evaluation, where unbiased evaluation requires accurately computing the sampling probability of each chosen arm. We prove that LinMED enjoys a near-optimal regret bound of d√n up to logarithmic factors, where d is the dimension and n is the time horizon. We further show that LinMED enjoys a problem-dependent regret of order (d²/∆)·log²(n)·log(log(n)), where ∆ is the smallest sub-optimality gap. Our empirical study shows that LinMED performs competitively with state-of-the-art algorithms.
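The abstract's point that unbiased off-policy evaluation hinges on knowing the logging policy's arm-sampling probabilities can be illustrated with the standard inverse propensity scoring (IPS) estimator. The sketch below is not code from the paper; the arm means, logging policy, and target policy are hypothetical, and it only shows why a closed-form sampling probability (as LinMED provides) makes such evaluation straightforward.

```python
import random

def ips_value(logged, target_probs):
    """Unbiased off-policy value estimate via inverse propensity scoring.

    logged: list of (arm, reward, logging_prob) tuples, where logging_prob
            is the probability the logging policy assigned to the chosen arm.
    target_probs: dict arm -> probability of that arm under the target policy.
    """
    total = 0.0
    for arm, reward, p_log in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to pick this arm than the logging policy was.
        total += (target_probs[arm] / p_log) * reward
    return total / len(logged)

random.seed(0)
true_means = {0: 0.2, 1: 0.8}   # hypothetical Bernoulli reward means
logging = {0: 0.5, 1: 0.5}      # uniform logging policy (known probabilities)
target = {0: 0.1, 1: 0.9}       # policy we wish to evaluate offline

logged = []
for _ in range(100_000):
    arm = 0 if random.random() < logging[0] else 1
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    logged.append((arm, reward, logging[arm]))

est = ips_value(logged, target)
# True target-policy value is 0.1*0.2 + 0.9*0.8 = 0.74; est should be close.
```

The estimator is unbiased only because each logged propensity `p_log` is exact; an algorithm whose sampling probabilities must be approximated (e.g. by Monte Carlo, as with linear Thompson sampling) would introduce bias through the importance weights.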
| Original language | English (US) |
|---|---|
| Pages (from-to) | 1585-1593 |
| Number of pages | 9 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 258 |
| State | Published - 2025 |
| Event | 28th International Conference on Artificial Intelligence and Statistics, AISTATS 2025 - Mai Khao, Thailand |
| Duration | May 3 2025 → May 5 2025 |
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence