TY - GEN
T1 - POSTER
T2 - 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2022
AU - Cheshmi, Kazem
AU - Strout, Michelle Mills
AU - Dehnavi, Maryam Mehri
N1 - Funding Information:
This work was supported in part by NSERC Discovery Grants (RGPIN-06516, DGECR00303), the Canada Research Chairs program, and U.S. NSF awards CCF-1814888 and CCF-1657175; it used the Extreme Science and Engineering Discovery Environment (XSEDE) [14], which is supported by NSF grant number ACI-1548562, and was enabled in part by Compute Canada and SciNet. We would also like to thank George Huang for his involvement in the prototyping of the approach.
Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/4/2
Y1 - 2022/4/2
N2 - This work proposes a framework called FuSy that analyzes the data dependence graphs (DAGs) of two sparse kernels and creates an efficient schedule to execute the kernels in combination. Sparse kernels are frequently used in scientific codes and machine learning algorithms, and they are very often used in combination. Iterative linear system solvers are an example, where kernels such as sparse triangular solve (SpTRSV) and sparse matrix-vector multiplication (SpMV) are called consecutively in each iteration of the solver. Prior approaches typically optimize these sparse kernels independently, leading to high synchronization overheads and low locality. We propose an approach that analyzes the DAGs of two sparse kernels and then creates a new order of execution that enables running the two kernels efficiently in parallel. To investigate the efficiency of our approach, we compare it with the state-of-the-art MKL library for two kernel combinations, SpTRSV-SpMV and SpMV-SpTRSV, which are commonly used in iterative solvers. Experimental results show that our approach is on average 2.6X and 1.8X faster than the MKL library for a set of matrices from the SuiteSparse matrix repository.
AB - This work proposes a framework called FuSy that analyzes the data dependence graphs (DAGs) of two sparse kernels and creates an efficient schedule to execute the kernels in combination. Sparse kernels are frequently used in scientific codes and machine learning algorithms, and they are very often used in combination. Iterative linear system solvers are an example, where kernels such as sparse triangular solve (SpTRSV) and sparse matrix-vector multiplication (SpMV) are called consecutively in each iteration of the solver. Prior approaches typically optimize these sparse kernels independently, leading to high synchronization overheads and low locality. We propose an approach that analyzes the DAGs of two sparse kernels and then creates a new order of execution that enables running the two kernels efficiently in parallel. To investigate the efficiency of our approach, we compare it with the state-of-the-art MKL library for two kernel combinations, SpTRSV-SpMV and SpMV-SpTRSV, which are commonly used in iterative solvers. Experimental results show that our approach is on average 2.6X and 1.8X faster than the MKL library for a set of matrices from the SuiteSparse matrix repository.
KW - loop fusion
KW - loop-carried dependence
KW - sparse matrix code
UR - http://www.scopus.com/inward/record.url?scp=85127564261&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127564261&partnerID=8YFLogxK
U2 - 10.1145/3503221.3508439
DO - 10.1145/3503221.3508439
M3 - Conference contribution
AN - SCOPUS:85127564261
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 459
EP - 460
BT - PPoPP 2022 - Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
Y2 - 2 April 2022 through 6 April 2022
ER -