This paper investigates the use of meta-reinforcement learning for the autonomous guidance of a low-thrust spacecraft during the terminal phase of an impact mission to a binary asteroid system. The control policy is represented by a convolutional-recurrent neural network, which maps optical observations collected by the on-board camera to the optimal control thrust and thrusting times. The network is trained with Proximal Policy Optimization, a state-of-the-art policy-gradient reinforcement learning algorithm. The final phase of the DART mission is used as a test case. The objective is to maneuver the spacecraft so that it impacts the smaller body, Dimorphos, of the 65803 Didymos binary system. The spacecraft dynamics are described within the bi-elliptic restricted four-body problem, with an additional solar radiation pressure term. The initial conditions are randomly scattered according to the actual specifications of the DART mission. A random error on the orbital position of Dimorphos is also included to reflect uncertainty in the binary system’s characteristics and dynamics. The control system aims to minimize the error on the final spacecraft position. Numerical results show that the guidance system correctly drives the spacecraft towards the final impact point in almost all test scenarios.
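
For readers unfamiliar with Proximal Policy Optimization, the clipped surrogate objective at the core of the algorithm can be sketched as follows. This is a generic NumPy illustration of the standard formulation, not the training code used in this work: the function name `ppo_clipped_objective`, its argument names, and the clipping value of 0.2 are assumptions made for the example.

```python
import numpy as np


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO, averaged over a batch (to be maximized).

    logp_new   : log-probabilities of the sampled actions under the current policy
    logp_old   : log-probabilities of the same actions under the data-collecting policy
    advantages : advantage estimates for those actions
    clip_eps   : clipping half-width (hypothetical default, chosen for illustration)
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    adv = np.asarray(advantages, dtype=float)

    # Probability ratio between the current and the old policy.
    ratio = np.exp(logp_new - logp_old)

    # Unclipped and clipped surrogate terms.
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv

    # The element-wise minimum removes the incentive to move the ratio
    # outside the interval [1 - clip_eps, 1 + clip_eps].
    return float(np.mean(np.minimum(unclipped, clipped)))
```

When the current and old policies coincide, the ratio equals one and the objective reduces to the mean advantage. Once the ratio moves outside the clipping interval in the direction favored by the advantage, the objective stops growing, which limits the size of each policy update.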