CURE: A High-Performance, Low-Power, and Reliable Network-on-Chip Design Using Reinforcement Learning

Ke Wang, Ahmed Louri

Research output: Contribution to journal, Article, peer-review

25 Scopus citations

Abstract

We propose CURE, a deep reinforcement learning (DRL)-based NoC design framework that simultaneously reduces network latency, improves energy efficiency, and tolerates both transient errors and permanent faults. CURE combines several architectural innovations with a DRL-based hardware controller that manages design complexity and optimizes the trade-offs among them. First, we propose reversible multi-function adaptive channels (RMCs) to reduce NoC power consumption and network latency. Second, we implement new fault-secure adaptive error-correction hardware in each router to improve reliability against both transient errors and permanent faults. Third, we propose a router power-gating and bypass design that powers off NoC components to reduce power and extend chip lifespan. Finally, to handle the complex dynamic interactions among these techniques, we use DRL to train a proactive control policy that improves fault tolerance, reduces power consumption, and improves performance. Simulations using the PARSEC benchmark suite show that CURE reduces end-to-end packet latency by 39 percent, improves energy efficiency by 92 percent, and lowers static and dynamic power consumption by 24 and 38 percent, respectively, over conventional solutions. Measured by mean time to failure (MTTF), CURE is 7.7× more reliable than the conventional NoC design.
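To make the idea of a learned per-router control policy concrete, here is a minimal sketch. It is not the authors' controller: it swaps the paper's deep RL agent for plain tabular Q-learning, and the three actions (`normal`, `strong_ecc`, `power_gate_bypass`), the two-level load/error state, the cost weights in `toy_reward`, and the cyclic workload in `next_state` are all hypothetical, chosen only to echo the error-correction and power-gating/bypass techniques named above.

```python
import random

# Hypothetical per-router actions, loosely inspired by the techniques in the abstract.
# This is NOT CURE's actual action space, reward function, or state encoding.
ACTIONS = ("normal", "strong_ecc", "power_gate_bypass")
# Toy state: (load_level, error_level), each 0 = low or 1 = high.
STATES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def toy_reward(state, action):
    """Made-up cost model: reward = -(latency + power + uncorrected-error penalty)."""
    load, error = state
    latency, power, penalty = 1.0 + load, 1.0, 0.0
    if action == "strong_ecc":
        latency += 0.2  # assumed extra decode cycles
        power += 0.3    # assumed extra decoder energy
    elif action == "power_gate_bypass":
        latency += 2.0 * load  # assumed: bypass path congests under high load
        power -= 0.7           # assumed: gated router saves static power
    if error and action != "strong_ecc":
        penalty = 3.0          # assumed: errors slip through without strong correction
    return -(latency + power + penalty)


def next_state(state):
    """Deterministic toy workload: cycle through the four states regardless of action."""
    return STATES[(STATES.index(state) + 1) % len(STATES)]


def train(steps=50_000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular one-step Q-learning with epsilon-greedy exploration; returns the greedy policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    s = STATES[0]
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)                       # explore
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
        r, s2 = toy_reward(s, a), next_state(s)
        best_next = max(q[(s2, act)] for act in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2
    return {st: max(ACTIONS, key=lambda act: q[(st, act)]) for st in STATES}


policy = train()
for state, action in policy.items():
    print(state, "->", action)
```

With these made-up costs the agent learns to gate idle, error-free routers, keep them on under high load, and switch to stronger correction when errors are frequent. Because this toy workload picks the next state independently of the action, the best policy here is simply the lowest immediate cost per state; a real NoC controller faces actions whose effects carry forward in time, which is where a learned, proactive policy matters more.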

Original language: English (US)
Article number: 9061016
Pages (from-to): 2125-2138
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 31
Issue number: 9
State: Published - Sep 1 2020
Externally published: Yes

Keywords

  • Computer architecture
  • deep reinforcement learning
  • network-on-chip (NoC)
  • reliability

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
