TY - GEN
T1 - Matrox
T2 - 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2020
AU - Liu, Bangtian
AU - Cheshmi, Kazem
AU - Soori, Saeed
AU - Strout, Michelle Mills
AU - Dehnavi, Maryam Mehri
N1 - Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/2/19
Y1 - 2020/2/19
N2 - Hierarchical matrix approximations have gained significant traction in the machine learning and scientific community as they exploit available low-rank structures in kernel methods to compress the kernel matrix. The resulting compressed matrix, HMatrix, is used to reduce the computational complexity of operations such as HMatrix-matrix multiplications with tuneable accuracy in an evaluation phase. Existing implementations of HMatrix evaluations do not preserve locality and often lead to unbalanced parallel execution with high synchronization. Also, current solutions require the compression phase to re-execute if the kernel method or the required accuracy change. MatRox is a framework that uses novel structure analysis strategies with code specialization and a storage format to improve locality and create load-balanced parallel tasks for HMatrix-matrix multiplications. Modularization of the matrix compression phase enables the reuse of computations when there are changes to the input accuracy and the kernel function. The MatRox-generated code for matrix-matrix multiplication is 2.98×, 1.60×, and 5.98× faster than library implementations available in GOFMM, SMASH, and STRUMPACK respectively. Additionally, the ability to reuse portions of the compression computation for changes to the accuracy leads to up to 2.64× improvement with MatRox over five changes to accuracy using GOFMM.
AB - Hierarchical matrix approximations have gained significant traction in the machine learning and scientific community as they exploit available low-rank structures in kernel methods to compress the kernel matrix. The resulting compressed matrix, HMatrix, is used to reduce the computational complexity of operations such as HMatrix-matrix multiplications with tuneable accuracy in an evaluation phase. Existing implementations of HMatrix evaluations do not preserve locality and often lead to unbalanced parallel execution with high synchronization. Also, current solutions require the compression phase to re-execute if the kernel method or the required accuracy change. MatRox is a framework that uses novel structure analysis strategies with code specialization and a storage format to improve locality and create load-balanced parallel tasks for HMatrix-matrix multiplications. Modularization of the matrix compression phase enables the reuse of computations when there are changes to the input accuracy and the kernel function. The MatRox-generated code for matrix-matrix multiplication is 2.98×, 1.60×, and 5.98× faster than library implementations available in GOFMM, SMASH, and STRUMPACK respectively. Additionally, the ability to reuse portions of the compression computation for changes to the accuracy leads to up to 2.64× improvement with MatRox over five changes to accuracy using GOFMM.
UR - https://www.scopus.com/pages/publications/85082399190
UR - https://www.scopus.com/pages/publications/85082399190#tab=citedBy
U2 - 10.1145/3332466.3374548
DO - 10.1145/3332466.3374548
M3 - Conference contribution
AN - SCOPUS:85082399190
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 389
EP - 402
BT - PPoPP 2020 - Proceedings of the 2020 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
Y2 - 22 February 2020 through 26 February 2020
ER -