TY - JOUR
T1 - Main memory-based algorithms for efficient parallel aggregation for temporal databases
AU - Gao, Dengfeng
AU - Gendrano, Jose Alvin G.
AU - Moon, Bongki
AU - Snodgrass, Richard T.
AU - Park, Minseok
AU - Huang, Bruce C.
AU - Rodrigue, Jim M.
N1 - Funding Information:
This work was sponsored in part by National Science Foundation CAREER Award (IIS-9876037), NSF Grants IIS-0100436, CDA-9500991, EAE-0080123, IRI-9632569, IIS-9817798, and NSF Research Infrastructure Programs EIA-0080123 and EIA-9500991, and by grants from Amazon.com and the Boeing Corporation. We would like to thank the reviewers for their suggestions, which improved this paper. We also thank Greg Andrews and Tom Lowry for their helpful insights on the impact of the network architecture on our performance measures.
PY - 2004/9
Y1 - 2004/9
N2 - The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and stringency of response time requirements has out-paced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for the shared-nothing architecture; these algorithms are based on the sequential aggregation tree algorithm. We are particularly interested in developing parallel algorithms that can maximally exploit available memory to quickly compute large-scale temporal aggregates without intermediate disk writes and reads. Via an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation impacted the performance of the algorithms. For distributed result placement, we discovered that greedy time division merge was the obvious choice. For centralized results and high data reduction, pair-wise merge was preferred for a large number of processing nodes; for low data reduction, it only performed well up to 32 nodes. This led us to a centralized variant of greedy time division merge which was best for the remaining cases. We present a cost model that closely predicts the running time of greedy time division merge.
AB - The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and stringency of response time requirements has out-paced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for the shared-nothing architecture; these algorithms are based on the sequential aggregation tree algorithm. We are particularly interested in developing parallel algorithms that can maximally exploit available memory to quickly compute large-scale temporal aggregates without intermediate disk writes and reads. Via an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation impacted the performance of the algorithms. For distributed result placement, we discovered that greedy time division merge was the obvious choice. For centralized results and high data reduction, pair-wise merge was preferred for a large number of processing nodes; for low data reduction, it only performed well up to 32 nodes. This led us to a centralized variant of greedy time division merge which was best for the remaining cases. We present a cost model that closely predicts the running time of greedy time division merge.
KW - Aggregation tree algorithm
KW - Cost model
KW - Data reduction
KW - Parallel temporal aggregation
KW - Result placement
KW - Temporal declustering
UR - http://www.scopus.com/inward/record.url?scp=3543114951&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=3543114951&partnerID=8YFLogxK
U2 - 10.1023/B:DAPD.0000028553.70337.e1
DO - 10.1023/B:DAPD.0000028553.70337.e1
M3 - Article
AN - SCOPUS:3543114951
SN - 0926-8782
VL - 16
SP - 123
EP - 163
JO - Distributed and Parallel Databases
JF - Distributed and Parallel Databases
IS - 2
ER -