Hashedcubes: Simple, Low Memory, Real-Time Visual Exploration of Big Data

Cícero A.L. Pahins, Sean A. Stephens, Carlos Scheidegger, João L.D. Comba

Research output: Contribution to journal › Article › peer-review

77 Scopus citations

Abstract

We propose Hashedcubes, a data structure that enables real-time visual exploration of large datasets and improves on the state of the art through its low memory requirements, low query latencies, and implementation simplicity. In some instances, Hashedcubes requires two orders of magnitude less space than recent data cube visualization proposals. In this paper, we describe the algorithms to build and query Hashedcubes, and show how it can drive well-known interactive visualizations such as binned scatterplots, linked histograms, and heatmaps. We report memory usage, build time, and query latencies for a variety of synthetic and real-world datasets, and find that although Hashedcubes sometimes answers queries slightly more slowly than the state of the art, the typical query is answered fast enough to easily sustain an interaction. In datasets with hundreds of millions of elements, only about 2% of the queries take longer than 40 ms. Finally, we discuss the limitations of the data structure, potential space-time tradeoffs, and future research directions.

Original language: English (US)
Article number: 7539326
Pages (from-to): 671-680
Number of pages: 10
Journal: IEEE Transactions on Visualization and Computer Graphics
Volume: 23
Issue number: 1
State: Published - Jan 2017

Keywords

  • Scalability
  • data cube
  • interactive exploration
  • multidimensional data

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design
