TY - JOUR
T1 - Robust spatiotemporal matching of electronic slides to presentation videos
AU - Fan, Quanfu
AU - Barnard, Kobus
AU - Amir, Arnon
AU - Efrat, Alon
N1 - Funding Information:
Manuscript received July 19, 2009; revised February 28, 2010; accepted January 06, 2011. Date of publication January 31, 2011; date of current version July 15, 2011. This work was supported in part by the Arizona Center for Information Science and Technology (ACIST) and in part by IBM. The work of A. Efrat was supported in part by the National Science Foundation (NSF) CAREER under Grant #CCF-0348000. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Kenneth K. M. Lam.
PY - 2011/8
Y1 - 2011/8
N2 - We describe a robust and efficient method for automatically matching and time-aligning electronic slides to videos of the corresponding presentations. Matching electronic slides to videos provides new methods for indexing, searching, and browsing videos in distance-learning applications. However, robust automatic matching is challenging due to varied frame composition, slide distortion, camera movement, low-quality video capture, and arbitrary slide sequences. Our fully automatic approach combines image-based matching of slides to video frames with a temporal model for slide changes and camera events. To address these challenges, we begin by extracting scale-invariant feature transform (SIFT) keypoints from both slides and video frames, and matching them subject to a consistent projective transformation (homography) using random sample consensus (RANSAC). We use the initial set of matches to construct a background model and a binary classifier for separating video frames showing slides from those without. We then introduce a new matching scheme that exploits less distinctive SIFT keypoints, enabling us to tackle more difficult images. Finally, we improve upon the purely visual matching by using the estimated matching probabilities as part of a hidden Markov model (HMM) that integrates temporal information and detected camera operations. Detailed quantitative experiments characterize each part of our approach and demonstrate an average accuracy of over 95% on 13 presentation videos.
KW - Distance learning
KW - homography constraint
KW - matching slides to video frames
KW - scale-invariant feature transform (SIFT) keypoints
KW - video indexing and browsing
UR - http://www.scopus.com/inward/record.url?scp=79960518181&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79960518181&partnerID=8YFLogxK
U2 - 10.1109/TIP.2011.2109727
DO - 10.1109/TIP.2011.2109727
M3 - Article
C2 - 21292597
AN - SCOPUS:79960518181
SN - 1057-7149
VL - 20
SP - 2315
EP - 2328
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 8
M1 - 5705574
ER -