We present a general approach for automatically matching electronic slides to videos of the corresponding presentations, for use in distance learning and video proceedings of conferences. We handle a wide variety of videos, with differing frame compositions and color balances, arbitrary slide sequences, and dynamic camera switching, pan, tilt, and zoom. To achieve high accuracy, we develop a two-phase process with unsupervised scene background modelling. In the first phase, scale-invariant feature transform (SIFT) keypoints are used for frame-to-slide matching under a constrained projective transformation (constrained homography) estimated with random sample consensus (RANSAC). Successful first-phase matches are then used to automatically build a scene background model. In the second phase, the background model is applied to the remaining unmatched frames to improve matching performance in difficult cases, such as wide field-of-view camera shots in which the slide occupies only a small portion of the frame. We also show that color correction helps when color-related similarity measures are used to identify slides. We provide detailed quantitative experimental results characterizing the effect of each part of our approach. The results show that our approach is robust and achieves high performance in matching slides to a number of videos with different styles.
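To make the first-phase geometry concrete, the following is a minimal sketch of RANSAC estimation of a homography from putative slide-to-frame keypoint correspondences. It is an illustration under stated assumptions, not the implementation evaluated here: SIFT extraction and descriptor matching are assumed to have already produced the point pairs, all function names are hypothetical, the inlier tolerance and iteration count are arbitrary, and the plausibility test `is_plausible` (a bound on the perspective terms) is only a stand-in for whatever constraints the constrained homography actually imposes.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    Returns None when the system is numerically singular."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_homography(pairs):
    """Estimate a homography (row-major, h33 fixed to 1) from exactly
    four (slide_point, frame_point) pairs."""
    a, b = [], []
    for (x, y), (u, v) in pairs:
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return None if h is None else h + [1.0]

def project(h, pt):
    """Map a slide point into the frame; None if it maps to infinity."""
    x, y = pt
    w = h[6] * x + h[7] * y + h[8]
    if abs(w) < 1e-12:
        return None
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)

def is_plausible(h, max_perspective=1e-3):
    # ASSUMPTION: a hypothetical constraint that rejects strong
    # perspective distortion; the real constraints may differ.
    return abs(h[6]) <= max_perspective and abs(h[7]) <= max_perspective

def ransac_homography(pairs, iterations=500, tol=3.0, seed=0):
    """Return (best_homography, inlier_pairs) over random 4-point samples."""
    if len(pairs) < 4:
        return None, []
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iterations):
        h = fit_homography(rng.sample(pairs, 4))
        if h is None or not is_plausible(h):
            continue
        inliers = []
        for src, dst in pairs:
            p = project(h, src)
            if p is None:
                continue
            dx, dy = p[0] - dst[0], p[1] - dst[1]
            if dx * dx + dy * dy <= tol * tol:
                inliers.append((src, dst))
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

In such a sketch, the number of inliers returned for each candidate slide could serve as a matching score for a frame, with the highest-scoring slide taken as the match when the score is high enough; that decision rule is likewise an assumption of this illustration.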