Adaptive pinpoint and fuel efficient Mars landing using reinforcement learning

Brian Gaudet, Roberto Furfaro

Research output: Contribution to journal › Article › peer-review

31 Scopus citations


Future unconstrained and science-driven missions to Mars will require advanced guidance algorithms that can adapt to more demanding mission requirements, e.g., landing on selected locales with pinpoint accuracy while autonomously flying fuel-efficient trajectories. In this paper, a novel guidance algorithm designed by applying the principles of reinforcement learning (RL) theory is presented. The goal is to devise an adaptive guidance algorithm that enables robust, fuel-efficient, and accurate landing without the need for offline trajectory generation and real-time tracking. Results from a Monte Carlo simulation campaign show that the algorithm is capable of autonomously following trajectories that are close to the optimal minimum-fuel solutions with an accuracy that surpasses that of past and future Mars missions. The proposed RL-based guidance algorithm exhibits a high degree of flexibility and can easily accommodate autonomous retargeting while maintaining accuracy and fuel efficiency. Although reinforcement learning and similar machine learning techniques have previously been applied to aerospace guidance and control problems (e.g., autonomous helicopter control), this appears, to the best of the authors' knowledge, to be the first application of reinforcement learning to the problem of autonomous planetary landing.

Original language: English (US)
Article number: 7004667
Pages (from-to): 397-411
Number of pages: 15
Journal: IEEE/CAA Journal of Automatica Sinica
Issue number: 4
State: Published - Oct 1 2014


Keywords

  • Markov decision process
  • Mars landing guidance
  • policy iteration
  • reinforcement learning

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Information Systems
  • Artificial Intelligence


