In this paper, a meta-reinforcement learning approach is used to generate a guidance algorithm capable of carrying out multi-target missions. Specifically, two models are trained to perform multiple fuel-optimal low-thrust rendezvous maneuvers between circular co-planar orbits with close radii. The first model is based entirely on a Multilayer Perceptron (MLP) neural network, while the second also includes a Long Short-Term Memory (LSTM) layer, which improves generalization by incorporating memory-dependent internal states. Both networks are trained via Proximal Policy Optimization (PPO) on a wide distribution of transfers that encompasses all trajectories connecting any pair of targets in a given set within a given time window. The aim is to produce a nearly optimal guidance law that can be used directly for any transfer leg of the actual multi-target mission. To assess the validity of the proposed approach, a sensitivity analysis is carried out on a single leg by varying the radius of either the initial or the final orbit, the transfer time, and the initial phase angle between the chaser and the target. The results show that the LSTM-equipped network reconstructs the optimal control more accurately in almost all the analyzed scenarios and, at the same time, achieves, on average, a lower terminal constraint violation.
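
As a purely illustrative aid, the training distribution of transfers described above can be pictured as random draws of single rendezvous legs. The following minimal Python sketch is not the authors' implementation: the names `TransferLeg` and `sample_leg`, the uniform sampling of transfer time and phase angle, and the example radii and time bounds are all assumptions introduced here.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TransferLeg:
    """One rendezvous leg between two circular co-planar orbits (hypothetical container)."""
    r_initial: float      # radius of the departure orbit (nondimensional, assumed)
    r_final: float        # radius of the arrival orbit (nondimensional, assumed)
    transfer_time: float  # duration of the leg (nondimensional, assumed)
    phase_angle: float    # initial chaser-target phase angle, in radians


def sample_leg(target_radii, time_bounds, rng):
    """Draw one leg: a pair of distinct targets, a transfer time, and a phase angle.

    Uniform sampling is an assumption made for illustration only.
    """
    r_initial, r_final = rng.sample(target_radii, 2)  # two distinct targets
    transfer_time = rng.uniform(time_bounds[0], time_bounds[1])
    phase_angle = rng.uniform(0.0, 2.0 * math.pi)
    return TransferLeg(r_initial, r_final, transfer_time, phase_angle)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the draw is repeatable
    radii = [0.98, 1.00, 1.02]      # hypothetical target orbit radii
    print(sample_leg(radii, (5.0, 10.0), rng))
```

In a training loop, each episode would start from one such draw, so that the policy is exposed to many different legs rather than a single transfer.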