Future Mars missions will require advanced guidance, navigation, and control algorithms for the powered descent phase in order to target specific surface locations and achieve pinpoint accuracy (landing error ellipse < 5 m radius). This requires both a navigation system capable of estimating the lander's state in real time and a guidance and control system that can map the estimated lander state to body-frame actuator commands. In this paper we present a novel integrated guidance and control algorithm designed by applying the principles of reinforcement learning theory. The key innovation is the use of reinforcement learning to learn a policy that maps the lander's estimated state directly to actuator commands, yielding accurate and fuel-efficient trajectories. Specifically, we use proximal policy optimization, a policy gradient method, to learn the policy. We present simulation results demonstrating the guidance and control system's performance in a 6-DOF simulation environment, and demonstrate robustness to noise and system parameter uncertainty.
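As a point of reference for the method named above, the following is a minimal sketch of the clipped surrogate objective at the core of proximal policy optimization, for a single transition; the function name, scalar (per-transition) form, and default clip parameter are illustrative simplifications, not the paper's implementation.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate objective of PPO for one transition.

    logp_new / logp_old: log-probabilities of the taken action under the
    current and behavior (data-collecting) policies.
    advantage: estimated advantage of the taken action.
    eps: clip parameter limiting how far the ratio may move from 1.
    """
    # Probability ratio between the new and old policies.
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    # Clip the ratio into [1 - eps, 1 + eps] before weighting the advantage.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # PPO maximizes the pessimistic (minimum) of the two terms, which
    # removes the incentive for excessively large policy updates.
    return min(unclipped, clipped)
```

For example, with a positive advantage the objective stops growing once the ratio exceeds 1 + eps, which is what bounds each policy update.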