TY - JOUR
T1 - SGCNAX: A Scalable Graph Convolutional Neural Network Accelerator With Workload Balancing
T2 - IEEE Transactions on Parallel and Distributed Systems
AU - Li, Jiajun
AU - Zheng, Hao
AU - Wang, Ke
AU - Louri, Ahmed
N1 - Funding Information:
This work was supported by the National Science Foundation under Grants CCF-1702980, CCF-1812495, CCF-1901165, and CCF-2131946.
Publisher Copyright:
© 1990-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Graph Convolutional Neural Networks (GCNs) have emerged as promising tools for graph-based machine learning applications. Because GCNs are both compute- and memory-intensive, processing large-scale GCNs efficiently poses a major challenge for the underlying hardware. In this article, we introduce SGCNAX, a scalable GCN accelerator architecture for the high-performance and energy-efficient acceleration of GCNs. Unlike prior GCN accelerators, which either employ limited loop optimization techniques or determine design variables by random sampling, we systematically explore loop optimization techniques for GCN acceleration and propose a flexible GCN dataflow that adapts to different GCN configurations to achieve optimal efficiency. We further propose two hardware-based techniques to address the workload imbalance caused by the uneven distribution of zeros in GCNs. Specifically, SGCNAX exploits an outer-product-based computation architecture that mitigates intra-PE (Processing Element) workload imbalance, and employs a group-and-shuffle approach to mitigate inter-PE workload imbalance. Simulation results show that SGCNAX performs 9.2×, 1.6×, and 1.2× better, and reduces DRAM accesses by factors of 9.7×, 2.9×, and 1.2×, compared to HyGCN, AWB-GCN, and GCNAX, respectively.
AB - Graph Convolutional Neural Networks (GCNs) have emerged as promising tools for graph-based machine learning applications. Because GCNs are both compute- and memory-intensive, processing large-scale GCNs efficiently poses a major challenge for the underlying hardware. In this article, we introduce SGCNAX, a scalable GCN accelerator architecture for the high-performance and energy-efficient acceleration of GCNs. Unlike prior GCN accelerators, which either employ limited loop optimization techniques or determine design variables by random sampling, we systematically explore loop optimization techniques for GCN acceleration and propose a flexible GCN dataflow that adapts to different GCN configurations to achieve optimal efficiency. We further propose two hardware-based techniques to address the workload imbalance caused by the uneven distribution of zeros in GCNs. Specifically, SGCNAX exploits an outer-product-based computation architecture that mitigates intra-PE (Processing Element) workload imbalance, and employs a group-and-shuffle approach to mitigate inter-PE workload imbalance. Simulation results show that SGCNAX performs 9.2×, 1.6×, and 1.2× better, and reduces DRAM accesses by factors of 9.7×, 2.9×, and 1.2×, compared to HyGCN, AWB-GCN, and GCNAX, respectively.
KW - Graph convolutional neural networks
KW - dataflow accelerators
KW - domain-specific accelerators
KW - memory access optimization
UR - http://www.scopus.com/inward/record.url?scp=85121346330&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121346330&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2021.3133691
DO - 10.1109/TPDS.2021.3133691
M3 - Article
AN - SCOPUS:85121346330
VL - 33
SP - 2834
EP - 2845
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
SN - 1045-9219
IS - 11
ER -