TY - JOUR
T1 - Looks Good to Me
T2 - Visualizations As Sanity Checks
AU - Correll, Michael
AU - Li, Mingwei
AU - Kindlmann, Gordon
AU - Scheidegger, Carlos
N1 - Funding Information: Scheidegger and Li’s work in this project was partially supported by NSF award IIS-1513651 and the Arizona Board of Regents. Publisher Copyright: © 2018 IEEE.
PY - 2019/1
Y1 - 2019/1
AB - Famous examples such as Anscombe's Quartet highlight that one of the core benefits of visualizations is allowing people to discover visual patterns that might otherwise be hidden by summary statistics. This visual inspection is particularly important in exploratory data analysis, where analysts can use visualizations such as histograms and dot plots to identify data quality issues. Yet, these visualizations are driven by parameters such as histogram bin size or mark opacity that have a great deal of impact on the final visual appearance of the chart, but are rarely optimized to make important features visible. In this paper, we show that data flaws have varying impact on the visual features of visualizations, and that the adversarial or merely uncritical setting of design parameters of visualizations can obscure the visual signatures of these flaws. Drawing on the framework of Algebraic Visualization Design, we present the results of a crowdsourced study showing that common visualization types can appear to reasonably summarize distributional data while hiding large and important flaws such as missing data and extraneous modes. We make use of these results to propose additional best practices for visualizations of distributions for data quality tasks.
KW - Graphical perception
KW - data quality
KW - univariate visualizations
UR - http://www.scopus.com/inward/record.url?scp=85052643856&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052643856&partnerID=8YFLogxK
U2 - 10.1109/TVCG.2018.2864907
DO - 10.1109/TVCG.2018.2864907
M3 - Article
AN - SCOPUS:85052643856
SN - 1077-2626
VL - 25
SP - 830
EP - 839
JO - IEEE Transactions on Visualization and Computer Graphics
JF - IEEE Transactions on Visualization and Computer Graphics
IS - 1
M1 - 8440818
ER -