TY - JOUR
T1 - Meta-reinforcement learning for adaptive spacecraft guidance during finite-thrust rendezvous missions
AU - Federici, Lorenzo
AU - Scorsoglio, Andrea
AU - Zavoli, Alessandro
AU - Furfaro, Roberto
N1 - Publisher Copyright:
© 2022 IAA
PY - 2022/12
Y1 - 2022/12
N2 - In this paper, a meta-reinforcement learning approach is investigated to design an adaptive guidance algorithm capable of carrying out multiple rendezvous space missions. Specifically, both a standard fully-connected network and a recurrent neural network are trained by proximal policy optimization on a wide distribution of finite-thrust rendezvous transfers between circular coplanar orbits. The recurrent network is also provided with the control and reward from the previous simulation step, allowing it to build, through its history-dependent state, an internal representation of the considered task distribution. The ultimate goal is to obtain a model that can adapt to unseen tasks and produce a nearly optimal guidance law along any transfer leg of a multi-target mission. As a first step toward the solution of a complete multi-target problem, a sensitivity analysis on the single rendezvous leg is carried out in this paper by varying the radius of either the initial or the final orbit, the transfer time, and the initial phasing between the chaser and the target. Numerical results show that the recurrent-network-based meta-reinforcement learning approach reconstructs the optimal control more accurately in almost all the analyzed scenarios and, at the same time, meets the terminal rendezvous condition with greater accuracy, even for problem instances that fall outside the original training domain.
AB - In this paper, a meta-reinforcement learning approach is investigated to design an adaptive guidance algorithm capable of carrying out multiple rendezvous space missions. Specifically, both a standard fully-connected network and a recurrent neural network are trained by proximal policy optimization on a wide distribution of finite-thrust rendezvous transfers between circular coplanar orbits. The recurrent network is also provided with the control and reward from the previous simulation step, allowing it to build, through its history-dependent state, an internal representation of the considered task distribution. The ultimate goal is to obtain a model that can adapt to unseen tasks and produce a nearly optimal guidance law along any transfer leg of a multi-target mission. As a first step toward the solution of a complete multi-target problem, a sensitivity analysis on the single rendezvous leg is carried out in this paper by varying the radius of either the initial or the final orbit, the transfer time, and the initial phasing between the chaser and the target. Numerical results show that the recurrent-network-based meta-reinforcement learning approach reconstructs the optimal control more accurately in almost all the analyzed scenarios and, at the same time, meets the terminal rendezvous condition with greater accuracy, even for problem instances that fall outside the original training domain.
KW - Autonomous spacecraft guidance
KW - Meta-reinforcement learning
KW - Optimal control
KW - Proximal policy optimization
KW - Recurrent neural network
KW - Rendezvous mission
UR - http://www.scopus.com/inward/record.url?scp=85138828635&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138828635&partnerID=8YFLogxK
U2 - 10.1016/j.actaastro.2022.08.047
DO - 10.1016/j.actaastro.2022.08.047
M3 - Article
AN - SCOPUS:85138828635
SN - 0094-5765
VL - 201
SP - 129
EP - 141
JO - Acta Astronautica
JF - Acta Astronautica
ER -