Schema-less, semantics-based change detection for XML documents

Shuohao Zhang, Curtis Dyreson, Richard T. Snodgrass

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

6 Scopus citations

Abstract

Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in a historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions, but techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.
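The two steps named in the abstract (isolate identifiers, then associate elements by identifier) can be illustrated with a minimal sketch. This is not the algorithm from the paper. It assumes, purely for illustration, that an identifier is a single child element whose non-empty text is unique among all elements with the same tag in the old version, and that two matched elements are "the same" when their direct children carry the same tag and text pairs in any order. The function names `find_identifiers`, `key_elements`, and `detect_changes`, and the sample documents, are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_identifiers(root):
    """Map each tag to a child tag whose text uniquely identifies elements of that tag.

    Assumption for this sketch: an identifier is one child element, present in
    every element with the tag, with non-empty and pairwise distinct text.
    """
    by_tag = defaultdict(list)
    for elem in root.iter():
        by_tag[elem.tag].append(elem)
    identifiers = {}
    for tag, elems in by_tag.items():
        candidates = None
        for e in elems:
            child_tags = {c.tag for c in e}
            candidates = child_tags if candidates is None else candidates & child_tags
        for cand in sorted(candidates or ()):
            values = [(e.findtext(cand) or "").strip() for e in elems]
            if all(values) and len(set(values)) == len(values):
                identifiers[tag] = cand
                break
    return identifiers

def key_elements(root, identifiers):
    """Index elements by (tag, identifier value), wherever they sit in the tree."""
    keyed = {}
    for elem in root.iter():
        ident = identifiers.get(elem.tag)
        if ident is not None:
            value = (elem.findtext(ident) or "").strip()
            keyed[(elem.tag, value)] = elem
    return keyed

def content(elem):
    """Order-insensitive summary of an element's direct children."""
    return sorted((c.tag, (c.text or "").strip()) for c in elem)

def detect_changes(old_xml, new_xml):
    """Associate elements across two versions by identifier, ignoring position."""
    old_root = ET.fromstring(old_xml)
    new_root = ET.fromstring(new_xml)
    # Simplification: identifiers are inferred from the old version only.
    identifiers = find_identifiers(old_root)
    old = key_elements(old_root, identifiers)
    new = key_elements(new_root, identifiers)
    report = {"same": [], "changed": [], "deleted": [], "inserted": []}
    for key, old_elem in old.items():
        if key not in new:
            report["deleted"].append(key)
        elif content(old_elem) == content(new[key]):
            report["same"].append(key)
        else:
            report["changed"].append(key)
    report["inserted"] = [key for key in new if key not in old]
    return report

# Hypothetical sample data: the new version moves two books under a new
# <shelf> element, reorders the children of one, and edits the other.
OLD_VERSION = """<library>
  <book><title>A</title><year>2001</year></book>
  <book><title>B</title><year>2002</year></book>
</library>"""

NEW_VERSION = """<library>
  <shelf>
    <book><year>2002</year><title>B</title></book>
    <book><title>A</title><year>2003</year></book>
  </shelf>
  <book><title>C</title><year>2004</year></book>
</library>"""

if __name__ == "__main__":
    print(detect_changes(OLD_VERSION, NEW_VERSION))
```

In the sample data every book moves to a different position or depth, so a purely positional comparison would have little to match on. Keying on the conserved `title` text still pairs each book with its counterpart, which is the intuition the abstract describes.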

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Xiaofang Zhou, Maria E. Orlowska, Stanley Su, Mike P. Papazoglou, Keith G. Jeffery
Publisher: Springer-Verlag
Pages: 279-290
Number of pages: 12
ISBN (Electronic): 3540238948, 9783540238942
State: Published - 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3306
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
