TY - JOUR
T1 - Image-Based Deep Reinforcement Meta-Learning for Autonomous Lunar Landing
AU - Scorsoglio, Andrea
AU - D'Ambrosio, Andrea
AU - Ghilardi, Luca
AU - Gaudet, Brian
AU - Curti, Fabio
AU - Furfaro, Roberto
N1 - Publisher Copyright:
© 2022, AIAA International. All rights reserved.
PY - 2022/1
Y1 - 2022/1
N2 - Future exploration and human missions on large planetary bodies (e.g., the Moon and Mars) will require advanced guidance, navigation, and control algorithms for the powered descent phase, which must be capable of unprecedented levels of autonomy. The advent of machine learning, and specifically reinforcement learning, has enabled new possibilities for closed-loop autonomous guidance and navigation. In this paper, image-based reinforcement meta-learning is applied to solve the lunar pinpoint powered descent and landing task with uncertain dynamic parameters and actuator failure. The agent, a deep neural network, takes real-time images and ranging observations acquired during the descent and maps them directly to thrust commands (i.e., a sensor-to-action policy). Training, validation, and Monte Carlo simulations show that the resulting closed-loop guidance policy achieves errors on the order of meters across different scenarios, even when the environment is partially observed and the state of the spacecraft is not fully known.
AB - Future exploration and human missions on large planetary bodies (e.g., the Moon and Mars) will require advanced guidance, navigation, and control algorithms for the powered descent phase, which must be capable of unprecedented levels of autonomy. The advent of machine learning, and specifically reinforcement learning, has enabled new possibilities for closed-loop autonomous guidance and navigation. In this paper, image-based reinforcement meta-learning is applied to solve the lunar pinpoint powered descent and landing task with uncertain dynamic parameters and actuator failure. The agent, a deep neural network, takes real-time images and ranging observations acquired during the descent and maps them directly to thrust commands (i.e., a sensor-to-action policy). Training, validation, and Monte Carlo simulations show that the resulting closed-loop guidance policy achieves errors on the order of meters across different scenarios, even when the environment is partially observed and the state of the spacecraft is not fully known.
UR - http://www.scopus.com/inward/record.url?scp=85123899809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123899809&partnerID=8YFLogxK
U2 - 10.2514/1.A35072
DO - 10.2514/1.A35072
M3 - Article
AN - SCOPUS:85123899809
SN - 0022-4650
VL - 59
SP - 153
EP - 165
JO - Journal of Spacecraft and Rockets
JF - Journal of Spacecraft and Rockets
IS - 1
ER -