TY - GEN
T1 - Skew handling techniques in sort-merge join
AU - Li, Wei
AU - Gao, Dengfeng
AU - Snodgrass, Richard Thomas
N1 - Funding Information:
This work was supported in part by NSF Grants EIA-0080123 and IIS-0100436 and by grants from Amazon.corn and the Boeing Corporation.
Publisher Copyright:
© 2002 ACM.
PY - 2002/6/3
Y1 - 2002/6/3
N2 - Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. However, for sort-merge join, even a small amount of skew present in realistic data can result in a significant performance hit on a commercial DBMS. This paper examines the negative ramifications of skew in sort-merge join and proposes several refinements that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew and are thus suitable for replacing existing sort-merge implementations. We also show how sort-merge band join performance is significantly enhanced with these refinements.
AB - Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. However, for sort-merge join, even a small amount of skew present in realistic data can result in a significant performance hit on a commercial DBMS. This paper examines the negative ramifications of skew in sort-merge join and proposes several refinements that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew and are thus suitable for replacing existing sort-merge implementations. We also show how sort-merge band join performance is significantly enhanced with these refinements.
UR - http://www.scopus.com/inward/record.url?scp=85134326971&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134326971&partnerID=8YFLogxK
U2 - 10.1145/564691.564711
DO - 10.1145/564691.564711
M3 - Conference contribution
AN - SCOPUS:85134326971
T3 - Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD 2002
SP - 169
EP - 180
BT - Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD 2002
PB - Association for Computing Machinery, Inc
T2 - 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD 2002
Y2 - 3 June 2002 through 6 June 2002
ER -