Abstract
Graph Convolutional Neural Networks (GCNs) have emerged as promising tools for graph-based machine learning applications. Because GCNs are both compute- and memory-intensive, processing large-scale GCNs efficiently poses a major challenge for the underlying hardware. In this article, we introduce SGCNAX, a scalable GCN accelerator architecture for high-performance and energy-efficient acceleration of GCNs. Unlike prior GCN accelerators that either employ limited loop optimization techniques or determine design variables by random sampling, we systematically explore loop optimization techniques for GCN acceleration and propose a flexible GCN dataflow that adapts to different GCN configurations to achieve optimal efficiency. We further propose two hardware-based techniques to address the workload imbalance caused by the uneven distribution of zeros in GCNs. Specifically, SGCNAX exploits an outer-product-based computation architecture that mitigates intra-PE (Processing Element) workload imbalance, and employs a group-and-shuffle approach to mitigate inter-PE workload imbalance. Simulation results show that SGCNAX performs 9.2×, 1.6×, and 1.2× better, and reduces DRAM accesses by factors of 9.7×, 2.9×, and 1.2×, compared to HyGCN, AWB-GCN, and GCNAX, respectively.
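To make the outer-product computation style named in the abstract concrete, below is a minimal, illustrative Python sketch of outer-product sparse matrix multiplication (SpGEMM), the general technique the abstract credits with mitigating intra-PE workload imbalance. This is not the SGCNAX hardware dataflow or the authors' code; the function name and toy matrices are hypothetical, chosen only to show why work per step depends on a column/row nonzero pair rather than on how zeros line up inside inner products.

```python
# Illustrative sketch (hypothetical, not SGCNAX RTL): outer-product SpGEMM.
import numpy as np
from scipy import sparse

def outer_product_spgemm(A: sparse.csc_matrix, B: sparse.csr_matrix) -> np.ndarray:
    """Compute C = A @ B by accumulating rank-1 outer products.

    Step k pairs column k of A with row k of B, so the work per step is
    determined solely by the nonzeros of that column/row pair.
    """
    m, k_dim = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for k in range(k_dim):
        col = A.getcol(k)          # sparse column k of A
        row = B.getrow(k)          # sparse row k of B
        if col.nnz == 0 or row.nnz == 0:
            continue               # empty pairs contribute nothing
        # Rank-1 update built only from the nonzero entries.
        C[col.indices[:, None], row.indices] += col.data[:, None] * row.data
    return C

# Toy usage: a small random sparse "adjacency" times a sparse feature block.
A = sparse.random(6, 6, density=0.3, format="csc", random_state=0)
X = sparse.random(6, 4, density=0.5, format="csr", random_state=1)
assert np.allclose(outer_product_spgemm(A, X), (A @ X).toarray())
```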
Original language | English (US) |
---|---|
Pages (from-to) | 2834-2845 |
Number of pages | 12 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 33 |
Issue number | 11 |
DOIs | |
State | Published - Nov 1 2022 |
Externally published | Yes |
Keywords
- Graph convolutional neural networks
- dataflow accelerators
- domain-specific accelerators
- memory access optimization
ASJC Scopus subject areas
- Signal Processing
- Hardware and Architecture
- Computational Theory and Mathematics