Low-thrust many-revolution trajectory design and orbit transfers are becoming increasingly important with the development of high specific impulse, low-thrust engines. Closed-loop feedback-driven (CLFD) control laws can solve these trajectory design problems at minimal computational cost and offer potential for autonomous guidance. However, they rely on user-defined parameters, which limit their optimality. In this work, an actor-critic reinforcement learning framework is proposed to make the parameters of the Lyapunov-based Q-law state-dependent, so that the controller can adapt as the dynamics evolve during a transfer. The proposed framework should be independent of the particular CLFD control law and provides improved solutions for mission analysis. There is also potential for future on-board autonomous use, as trajectories are closed-form and can be generated without an initial guess. The current results focus on GTO-GEO transfers, first in Keplerian dynamics and then with eclipse and J2 effects. Both time-optimal and mass-optimal transfers are presented, and the stability with respect to uncertainties in orbit determination is discussed. The task of handling orbit perturbations is left to future work.