TY - GEN

T1 - POSTER

T2 - 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2022

AU - Cheshmi, Kazem

AU - Strout, Michelle Mills

AU - Dehnavi, Maryam Mehri

N1 - Funding Information:
This work was supported in part by NSERC Discovery Grants (RGPIN-06516, DGECR00303), the Canada Research Chairs program, and U.S. NSF awards NSF CCF-1814888 and NSF CCF-1657175; used the Extreme Science and Engineering Discovery Environment (XSEDE) [14], which is supported by NSF grant number ACI-1548562; and was enabled in part by Compute Canada and SciNet. We would also like to thank George Huang for his involvement in the prototyping of the approach.
Publisher Copyright:
© 2022 Owner/Author.

PY - 2022/4/2

Y1 - 2022/4/2

N2 - This work proposes a framework called FuSy that analyzes the data dependence graphs (DAGs) of two sparse kernels and creates an efficient schedule to execute the kernels in combination. Sparse kernels are frequently used in scientific codes and machine learning algorithms, and they are very often used in combination. Iterative linear system solvers are one example, where kernels such as sparse triangular solve (SpTRSV) and sparse matrix-vector multiplication (SpMV) are called consecutively in each iteration of the solver. Prior approaches typically optimize these sparse kernels independently, leading to high synchronization overheads and low locality. We propose an approach that analyzes the DAGs of two sparse kernels and then creates a new order of execution that enables running the two kernels efficiently in parallel. To investigate the efficiency of our approach, we compare it with the state-of-the-art MKL library for two kernel combinations, SpTRSV-SpMV and SpMV-SpTRSV, which are commonly used in iterative solvers. Experimental results show that our approach is on average 2.6X and 1.8X faster than the MKL library for a set of matrices from the SuiteSparse matrix repository.

AB - This work proposes a framework called FuSy that analyzes the data dependence graphs (DAGs) of two sparse kernels and creates an efficient schedule to execute the kernels in combination. Sparse kernels are frequently used in scientific codes and machine learning algorithms, and they are very often used in combination. Iterative linear system solvers are one example, where kernels such as sparse triangular solve (SpTRSV) and sparse matrix-vector multiplication (SpMV) are called consecutively in each iteration of the solver. Prior approaches typically optimize these sparse kernels independently, leading to high synchronization overheads and low locality. We propose an approach that analyzes the DAGs of two sparse kernels and then creates a new order of execution that enables running the two kernels efficiently in parallel. To investigate the efficiency of our approach, we compare it with the state-of-the-art MKL library for two kernel combinations, SpTRSV-SpMV and SpMV-SpTRSV, which are commonly used in iterative solvers. Experimental results show that our approach is on average 2.6X and 1.8X faster than the MKL library for a set of matrices from the SuiteSparse matrix repository.

KW - loop fusion

KW - loop-carried dependence

KW - sparse matrix code

UR - http://www.scopus.com/inward/record.url?scp=85127564261&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85127564261&partnerID=8YFLogxK

U2 - 10.1145/3503221.3508439

DO - 10.1145/3503221.3508439

M3 - Conference contribution

AN - SCOPUS:85127564261

T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

SP - 459

EP - 460

BT - PPoPP 2022 - Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

PB - Association for Computing Machinery

Y2 - 2 April 2022 through 6 April 2022

ER -