Analysis of memory constrained live provenance

Peng Chen, Tom Evans, Beth Plale

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

3 Scopus citations


We conjecture that meaningful analysis of large-scale provenance can be preserved by analyzing provenance data in limited memory while the data is still in motion; that the provenance need not be fully resident before analysis can occur. As a proof of concept, this paper defines a stream model for reasoning about provenance data in motion for Big Data provenance. We propose a novel streaming algorithm for the backward provenance query, and apply it to the live provenance captured from agent-based simulations. The performance test demonstrates high throughput, low latency and good scalability in a distributed stream processing framework built on Apache Kafka and Spark Streaming.

Original language: English (US)
Title of host publication: Provenance and Annotation of Data and Processes - 6th International Provenance and Annotation Workshop, IPAW 2016, Proceedings
Editors: Boris Glavic, Marta Mattoso
Number of pages: 13
ISBN (Print): 9783319405926
State: Published - 2016
Event: 6th International Provenance and Annotation Workshop, IPAW 2016 - McLean, United States
Duration: Jun 7 2016 → Jun 8 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 6th International Provenance and Annotation Workshop, IPAW 2016
Country/Territory: United States


  • Agent-Based model
  • Live data provenance
  • Stream processing

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

