TY - GEN
T1 - Meta-Reinforcement Learning for Adaptive Spacecraft Guidance during Multi-Target Missions
AU - Federici, Lorenzo
AU - Scorsoglio, Andrea
AU - Zavoli, Alessandro
AU - Furfaro, Roberto
N1 - Publisher Copyright:
© 2021 by Mr. Lorenzo Federici. Published by the IAF, with permission and released to the IAF to publish in all forms.
PY - 2021
Y1 - 2021
N2 - In this paper, a meta-reinforcement learning approach is used to generate a guidance algorithm capable of carrying out multi-target missions. Specifically, two models are trained to learn how to perform multiple fuel-optimal low-thrust rendezvous maneuvers between circular co-planar orbits with close radii. The first model is entirely based on a Multilayer Perceptron (MLP) neural network, while the second one also relies on a Long Short-Term Memory (LSTM) layer, which provides augmented generalization capability by incorporating memory-dependent internal states. The two networks are trained via Proximal Policy Optimization (PPO) on a wide distribution of transfers, which encompasses all possible trajectories connecting any pair of targets of a given set within a given time window. The aim is to produce a nearly-optimal guidance law that can be directly used for any transfer leg of the actual multi-target mission. To assess the validity of the proposed approach, a sensitivity analysis on a single leg is carried out by varying the radius of either the initial or the final orbit, the transfer time, and the initial phase angle between the chaser and the target. The results show that the LSTM-equipped network is able to better reconstruct the optimal control in almost all the analyzed scenarios and, at the same time, to achieve, on average, a lower terminal constraint violation.
UR - http://www.scopus.com/inward/record.url?scp=85123888613&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123888613&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85123888613
T3 - Proceedings of the International Astronautical Congress, IAC
BT - IAF Astrodynamics Symposium 2021 - Held at the 72nd International Astronautical Congress, IAC 2021
PB - International Astronautical Federation, IAF
T2 - IAF Astrodynamics Symposium 2021 at the 72nd International Astronautical Congress, IAC 2021
Y2 - 25 October 2021 through 29 October 2021
ER -