Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary

P. Duygulu, K. Barnard, J. F. G. de Freitas, D. A. Forsyth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1101 Scopus citations

Abstract

We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and the keywords supplied with the images is then learned, using a method based on expectation-maximization (EM). This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to cluster words that individually are difficult to predict into clusters that can be predicted well. For example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
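The lexicon-learning step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes an EM procedure in the style of IBM Model 1 from statistical machine translation, where each image is treated as an aligned "sentence pair" of region types and keywords. The function names (`learn_lexicon`, `annotate`), the region-type labels, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def learn_lexicon(pairs, iterations=20):
    """Estimate t[(word, region_type)] ~ p(word | region_type) with EM.

    pairs: list of (region_types, words) tuples, one per image.
    Assumption: an IBM-Model-1-style update, not the paper's exact model.
    """
    vocab = {w for _, words in pairs for w in words}
    # Uniform initialisation over the keyword vocabulary.
    t = defaultdict(lambda: 1.0 / len(vocab))
    for _ in range(iterations):
        counts = defaultdict(float)   # expected (word, region) co-assignments
        totals = defaultdict(float)   # expected assignments per region type
        for regions, words in pairs:
            for w in words:
                # E-step: posterior that word w was produced by each region.
                norm = sum(t[(w, r)] for r in regions)
                for r in regions:
                    p = t[(w, r)] / norm
                    counts[(w, r)] += p
                    totals[r] += p
        # M-step: renormalise expected counts per region type.
        t = defaultdict(float,
                        {(w, r): c / totals[r] for (w, r), c in counts.items()})
    return t

def annotate(t, region):
    """Return the most probable word for a region type."""
    candidates = {w: p for (w, r), p in t.items() if r == region}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus: region-type labels paired with image keywords.
toy_pairs = [
    (["sky_blob", "grass_blob"], ["sky", "grass"]),
    (["sky_blob", "water_blob"], ["sky", "water"]),
    (["grass_blob", "tiger_blob"], ["grass", "tiger"]),
]
lexicon = learn_lexicon(toy_pairs)
for region in ["sky_blob", "grass_blob", "water_blob", "tiger_blob"]:
    print(region, "->", annotate(lexicon, region))
```

No image states which keyword belongs to which region, yet the co-occurrence statistics across images are enough for EM to separate them in this toy case. That is the sense in which the abstract compares the problem to learning a lexicon from an aligned bitext.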

Original language: English (US)
Title of host publication: Computer Vision - ECCV 2002 - 7th European Conference on Computer Vision, Proceedings
Editors: Anders Heyden, Gunnar Sparr, Mads Nielsen, Peter Johansen
Publisher: Springer-Verlag
Pages: 97-112
Number of pages: 16
ISBN (Electronic): 9783540437482
DOIs
State: Published - 2002
Event: 7th European Conference on Computer Vision, ECCV 2002 - Copenhagen, Denmark
Duration: May 28 2002 to May 31 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2353
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th European Conference on Computer Vision, ECCV 2002
Country/Territory: Denmark
City: Copenhagen
Period: 5/28/02 to 5/31/02

Keywords

  • Correspondence
  • EM algorithm
  • Object recognition

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
