TY - GEN
T1 - Collaborative visual analysis with RCloud
AU - North, Stephen
AU - Scheidegger, Carlos
AU - Urbanek, Simon
AU - Woodhull, Gordon
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/12/4
Y1 - 2015/12/4
N2 - Consider the emerging role of data science teams embedded in larger organizations. Individual analysts work on loosely related problems, and must share their findings with each other and the organization at large, moving results from exploratory data analyses (EDA) into automated visualizations, diagnostics and reports deployed for wider consumption. There are two problems with the current practice. First, there are gaps in this workflow: EDA is performed with one set of tools, and automated reports and deployments with another. Second, these environments often assume a single-developer perspective, while data scientist teams could get much benefit from easier sharing of scripts and data feeds, experiments, annotations, and automated recommendations, which are well beyond what traditional version control systems provide. We contribute and justify the following three requirements for systems built to support current data science teams and users: discoverability, technology transfer, and coexistence. In addition, we contribute the design and implementation of RCloud, a system that supports the requirements of collaborative data analysis, visualization and web deployment. About 100 people used RCloud for two years. We report on interviews with some of these users, and discuss design decisions, tradeoffs and limitations in comparison to other approaches.
KW - collaboration
KW - computer-supported cooperative work
KW - provenance
KW - visual analytics process
KW - visualization
UR - http://www.scopus.com/inward/record.url?scp=84983189224&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84983189224&partnerID=8YFLogxK
U2 - 10.1109/VAST.2015.7347627
DO - 10.1109/VAST.2015.7347627
M3 - Conference contribution
AN - SCOPUS:84983189224
T3 - 2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings
SP - 25
EP - 32
BT - 2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings
A2 - Chen, Min
A2 - Andrienko, Gennady
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE Conference on Visual Analytics Science and Technology, VAST 2015
Y2 - 25 October 2015 through 30 October 2015
ER -