TY - JOUR
T1 - Hashedcubes
T2 - Simple, Low Memory, Real-Time Visual Exploration of Big Data
AU - Pahins, Cícero A.L.
AU - Stephens, Sean A.
AU - Scheidegger, Carlos
AU - Comba, João L.D.
N1 - Funding Information:
We would like to thank AT&T Research for providing the set of queries, CNPq Process 308851/2015-3 and Capes for the financial support, as well as the anonymous reviewers.
Publisher Copyright:
© 2016 IEEE.
PY - 2017/1
Y1 - 2017/1
N2 - We propose Hashedcubes, a data structure that enables real-time visual exploration of large datasets and improves on the state of the art by virtue of its low memory requirements, low query latencies, and implementation simplicity. In some instances, Hashedcubes notably requires two orders of magnitude less space than recent data cube visualization proposals. In this paper, we describe the algorithms to build and query Hashedcubes, and how it can drive well-known interactive visualizations such as binned scatterplots, linked histograms, and heatmaps. We report memory usage, build time, and query latencies for a variety of synthetic and real-world datasets, and find that although Hashedcubes sometimes offers slightly slower query times than the state of the art, the typical query is answered fast enough to easily sustain interaction. In datasets with hundreds of millions of elements, only about 2% of the queries take longer than 40 ms. Finally, we discuss the limitations of the data structure, potential space-time tradeoffs, and future research directions.
AB - We propose Hashedcubes, a data structure that enables real-time visual exploration of large datasets and improves on the state of the art by virtue of its low memory requirements, low query latencies, and implementation simplicity. In some instances, Hashedcubes notably requires two orders of magnitude less space than recent data cube visualization proposals. In this paper, we describe the algorithms to build and query Hashedcubes, and how it can drive well-known interactive visualizations such as binned scatterplots, linked histograms, and heatmaps. We report memory usage, build time, and query latencies for a variety of synthetic and real-world datasets, and find that although Hashedcubes sometimes offers slightly slower query times than the state of the art, the typical query is answered fast enough to easily sustain interaction. In datasets with hundreds of millions of elements, only about 2% of the queries take longer than 40 ms. Finally, we discuss the limitations of the data structure, potential space-time tradeoffs, and future research directions.
KW - Scalability
KW - data cube
KW - interactive exploration
KW - multidimensional data
UR - http://www.scopus.com/inward/record.url?scp=84998953602&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84998953602&partnerID=8YFLogxK
U2 - 10.1109/TVCG.2016.2598624
DO - 10.1109/TVCG.2016.2598624
M3 - Article
C2 - 27875182
AN - SCOPUS:84998953602
SN - 1077-2626
VL - 23
SP - 671
EP - 680
JO - IEEE Transactions on Visualization and Computer Graphics
JF - IEEE Transactions on Visualization and Computer Graphics
IS - 1
M1 - 7539326
ER -