In this paper, we address the problem of deception detection in videos. Current approaches are limited in that they (i) apply only to short videos capturing a single act of deception, (ii) are hard to interpret, and (iii) do not make use of any human behavioral model that could aid the detection task. To address these limitations, we propose a novel framework that takes as input one-dimensional Facial Action Unit (FAU) and gaze signals. By operating on this higher-level input rather than the raw video, we are able to train a conceptually simple, modular, and powerful model that achieves state-of-the-art performance in video-based deception detection. Finally, we propose a novel approach to interpreting our model’s predictions by computing the neural network’s attention in the time domain. This method enables domain scientists to perform retrospective analysis of deceptive behavior.
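As a minimal sketch of the kind of temporal attention used for interpretation (not the authors' actual architecture; all function names, dimensions, and parameters below are illustrative assumptions), per-frame FAU/gaze features can be pooled with learned attention weights that indicate which time steps drive the prediction:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention_pool(features, w, v):
    """Additive attention pooling over time (illustrative sketch).

    features: (T, D) array of per-frame FAU/gaze features
    w:        (D, H) projection matrix (hypothetical learned parameter)
    v:        (H,)   scoring vector   (hypothetical learned parameter)
    Returns the pooled (D,) representation and per-frame weights (T,).
    """
    scores = np.tanh(features @ w) @ v   # one scalar score per frame, shape (T,)
    alpha = softmax(scores)              # attention weights, sum to 1 over time
    pooled = alpha @ features            # attention-weighted sum of frame features
    return pooled, alpha

# Toy example with random data standing in for extracted FAU/gaze signals.
rng = np.random.default_rng(0)
T, D, H = 50, 20, 8                      # 50 frames, 20-dim features (assumed sizes)
feats = rng.standard_normal((T, D))
w = rng.standard_normal((D, H))
v = rng.standard_normal(H)
pooled, alpha = temporal_attention_pool(feats, w, v)
```

In such a setup, inspecting `alpha` over the video timeline is what allows a retrospective reading of which moments the model treated as indicative of deception.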