Search the audio, browse the video - A generic paradigm for video collections

Arnon Amir, Savitha Srinivasan, Alon Efrat

Research output: Contribution to journal › Review article › peer-review

9 Scopus citations


The amount of digital video being shot, captured, and stored is growing at a rate faster than ever before. The large amount of stored video is not penetrable without efficient video indexing, retrieval, and browsing technology. Most prior work in the field can be roughly categorized into two classes. One class is based on image processing techniques, often called content-based image and video retrieval, in which video frames are indexed and searched for visual content. The other class is based on spoken document retrieval, which relies on automatic speech recognition and text queries. Both approaches have major limitations. In the first approach, semantic queries pose a great challenge, while the second, speech-based approach, does not support efficient video browsing. This paper describes a system where speech is used for efficient searching and visual data for efficient browsing, a combination that takes advantage of both approaches. A fully automatic indexing and retrieval system has been developed and tested. Automated speech recognition and phonetic speech indexing support text-to-speech queries. New browsable views are generated from the original video. A special synchronized browser allows instantaneous, context-preserving switching from one view to another. The system was successfully used to produce searchable-browsable video proceedings for three local conferences.

Original language: English (US)
Pages (from-to): 209-222
Number of pages: 14
Journal: Eurasip Journal on Applied Signal Processing
Issue number: 2
State: Published - Feb 1 2003


Keywords

  • Automatic video indexing
  • Phonetic speech retrieval
  • Video and speech retrieval
  • Video browsing

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Electrical and Electronic Engineering

