TY - GEN
T1 - Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem
AU - Cao, Jincheng
AU - Jiang, Ruichen
AU - Abolfazli, Nazanin
AU - Hamedani, Erfan Yazdandoost
AU - Mokhtari, Aryan
N1 - Publisher Copyright:
© 2023 Neural information processing systems foundation. All rights reserved.
PY - 2023
Y1 - 2023
N2 - In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic convex optimization problem. We introduce novel stochastic bilevel optimization methods that locally approximate the solution set of the lower-level problem via a stochastic cutting plane, and then run a conditional gradient update with variance reduction techniques to control the error induced by using stochastic gradients. For the case that the upper-level function is convex, our method requires Õ(max{1/ϵf², 1/ϵg²}) stochastic oracle queries to obtain a solution that is ϵf-optimal for the upper-level and ϵg-optimal for the lower-level. This guarantee improves the previous best-known complexity of O(max{1/ϵf⁴, 1/ϵg⁴}). Moreover, for the case that the upper-level function is non-convex, our method requires at most Õ(max{1/ϵf³, 1/ϵg³}) stochastic oracle queries to find an (ϵf, ϵg)-stationary point. In the finite-sum setting, we show that the number of stochastic oracle calls required by our method is Õ(√n/ϵ) and Õ(√n/ϵ²) for the convex and non-convex settings, respectively, where ϵ = min{ϵf, ϵg}.
AB - In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic convex optimization problem. We introduce novel stochastic bilevel optimization methods that locally approximate the solution set of the lower-level problem via a stochastic cutting plane, and then run a conditional gradient update with variance reduction techniques to control the error induced by using stochastic gradients. For the case that the upper-level function is convex, our method requires Õ(max{1/ϵf², 1/ϵg²}) stochastic oracle queries to obtain a solution that is ϵf-optimal for the upper-level and ϵg-optimal for the lower-level. This guarantee improves the previous best-known complexity of O(max{1/ϵf⁴, 1/ϵg⁴}). Moreover, for the case that the upper-level function is non-convex, our method requires at most Õ(max{1/ϵf³, 1/ϵg³}) stochastic oracle queries to find an (ϵf, ϵg)-stationary point. In the finite-sum setting, we show that the number of stochastic oracle calls required by our method is Õ(√n/ϵ) and Õ(√n/ϵ²) for the convex and non-convex settings, respectively, where ϵ = min{ϵf, ϵg}.
UR - https://www.scopus.com/pages/publications/85191166682
UR - https://www.scopus.com/pages/publications/85191166682#tab=citedBy
M3 - Conference contribution
AN - SCOPUS:85191166682
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
A2 - Oh, A.
A2 - Naumann, T.
A2 - Globerson, A.
A2 - Saenko, K.
A2 - Hardt, M.
A2 - Levine, S.
PB - Neural information processing systems foundation
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -