TY - GEN
T1 - HDagg: Hybrid Aggregation of Loop-Carried Dependence Iterations in Sparse Matrix Computations
T2 - 36th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022
AU - Zarebavani, Behrooz
AU - Cheshmi, Kazem
AU - Liu, Bangtian
AU - Strout, Michelle Mills
AU - Dehnavi, Maryam Mehri
N1 - Funding Information:
This work was supported in part by NSERC Discovery Grants (RGPIN-06516, DGECR00303), the Canada Research Chairs program, and U.S. NSF awards NSF CCF-1814888, NSF CCF-1657175, NSF CCF-1563732; used the Extreme Science and Engineering Discovery Environment (XSEDE) [46], which is supported by NSF grant number ACI-1548562; and was enabled in part by Compute Canada and SciNet.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - This paper proposes a novel aggregation algorithm, called Hybrid DAG Aggregation (HDagg), that groups iterations of sparse matrix computations with loop-carried dependence to improve their parallel execution on multicore processors. Prior approaches to optimizing sparse matrix computations fail to provide an efficient balance between locality, load balance, and synchronization, and are primarily optimized for codes with a tree-structured data dependence. HDagg is optimized for sparse matrix computations whose data dependence graphs (DAGs) do not have a tree structure, such as incomplete matrix factorization algorithms. It uses a hybrid approach to aggregate vertices and wavefronts in the DAG of a sparse computation to create well-balanced parallel workloads with good locality. Across three sparse kernels (triangular solve, incomplete Cholesky, and incomplete LU), HDagg outperforms existing sparse libraries such as MKL with an average speedup of 3.56× and is faster than state-of-the-art inspector-executor approaches that optimize sparse computations, i.e., DAGP, LBC, wavefront parallelism techniques, and SpMP, by average speedups of 3.87×, 3.41×, 1.95×, and 1.43×, respectively.
AB - This paper proposes a novel aggregation algorithm, called Hybrid DAG Aggregation (HDagg), that groups iterations of sparse matrix computations with loop-carried dependence to improve their parallel execution on multicore processors. Prior approaches to optimizing sparse matrix computations fail to provide an efficient balance between locality, load balance, and synchronization, and are primarily optimized for codes with a tree-structured data dependence. HDagg is optimized for sparse matrix computations whose data dependence graphs (DAGs) do not have a tree structure, such as incomplete matrix factorization algorithms. It uses a hybrid approach to aggregate vertices and wavefronts in the DAG of a sparse computation to create well-balanced parallel workloads with good locality. Across three sparse kernels (triangular solve, incomplete Cholesky, and incomplete LU), HDagg outperforms existing sparse libraries such as MKL with an average speedup of 3.56× and is faster than state-of-the-art inspector-executor approaches that optimize sparse computations, i.e., DAGP, LBC, wavefront parallelism techniques, and SpMP, by average speedups of 3.87×, 3.41×, 1.95×, and 1.43×, respectively.
KW - Loop-carried Dependence
KW - Parallelism
KW - Sparse Matrix Computations
UR - http://www.scopus.com/inward/record.url?scp=85136336267&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136336267&partnerID=8YFLogxK
U2 - 10.1109/IPDPS53621.2022.00121
DO - 10.1109/IPDPS53621.2022.00121
M3 - Conference contribution
AN - SCOPUS:85136336267
T3 - Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022
SP - 1217
EP - 1227
BT - Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 May 2022 through 3 June 2022
ER -