TY - JOUR
T1 - DBMS metrology
T2 - Measuring query time
AU - Currim, Sabah
AU - Snodgrass, Richard T.
AU - Suh, Young Kyoon
AU - Zhang, Rui
N1 - Funding Information:
This research has been supported in part by NSF grants IIS-0415101, IIS-0639106, CNS-0838948, IIS-1016205, and EIA-0080123, and by partial support from a grant from Microsoft Corporation. We thank Benjamin Dicken, Preetha Chatterjee, Pallavi Chilappagari, Jennifer Dempsey, David Gallup, Kevan Holdaway, Matthew Johnson, Andrey Kvochko, Derek Merek, Lopamudra Sarangi, Cheng Yi, and Haziel Zuniga for their contributions to AZDBLAB; and Tom Buchanan, Phil Kaslo, Tom Lowry, and John Luiten for constructing and maintaining our experimental instrument, a laboratory of 10 machines and associated software. Finally, we thank Nikolaus Augsten and the referees of the previous conference version and of this journal version for their many helpful comments.
Publisher Copyright:
© 2016 ACM.
PY - 2016/11
Y1 - 2016/11
AB - It is surprisingly hard to obtain accurate and precise measurements of the time spent executing a query because there are many sources of variance. To understand these sources, we review relevant per-process and overall measures obtainable from the Linux kernel and introduce a structural causal model relating these measures. A thorough correlational analysis provides strong support for this model. We attempted to determine why a particular measurement wasn't repeatable and then to devise ways to eliminate or reduce that variance. This enabled us to articulate a timing protocol that applies to proprietary DBMSes, that ensures the repeatability of a query, and that obtains a quite accurate query execution time while dropping very few outliers. This resulting query time measurement procedure, termed the Tucson Timing Protocol Version 2 (TTPv2), consists of the following steps: (i) perform sanity checks to ensure data validity; (ii) drop some query executions via clearly motivated predicates; (iii) drop some entire queries at a cardinality, again via clearly motivated predicates; (iv) for those that remain, compute a single measured time by a carefully justified formula over the underlying measures of the remaining query executions; and (v) perform post-analysis sanity checks. The result is a mature, general, robust, self-checking protocol that provides a more precise and more accurate timing of the query. The protocol is also applicable to other operating domains in which measurements of multiple processes, each doing computation and I/O, are needed.
KW - Accuracy
KW - Database ergalics
KW - Repeatability
KW - Tucson timing protocol
UR - http://www.scopus.com/inward/record.url?scp=84997241532&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84997241532&partnerID=8YFLogxK
U2 - 10.1145/2996454
DO - 10.1145/2996454
M3 - Article
AN - SCOPUS:84997241532
SN - 0362-5915
VL - 42
JO - ACM Transactions on Database Systems
JF - ACM Transactions on Database Systems
IS - 1
M1 - 3
ER -