Abstract
We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The unknown in the problem is a d1 by d2 matrix (formula presented) that defines the reward, and has low rank (formula presented). Determination of (formula presented) with this low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called “Explore-Subspace-Then-Refine” (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called “almost-low-dimensional OFUL” (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is (formula presented) where Õ hides logarithmic factors and T is the time horizon, which improves upon the regret of(formula presented) attained for a naïve linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a naïve linear bandit reduction.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 3163-3172 |
| Number of pages | 10 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 97 |
| State | Published - 2019 |
| Externally published | Yes |
| Event | 36th International Conference on Machine Learning, ICML 2019 - Long Beach, United States Duration: Jun 9 2019 → Jun 15 2019 |
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Bilinear Bandits with Low-rank Structure'. Together they form a unique fingerprint.Cite this
- APA
- Standard
- Harvard
- Vancouver
- Author
- BIBTEX
- RIS