High-performance, Energy-efficient, Fault-tolerant Network-on-Chip Design Using Reinforcement Learning

Ke Wang, Ahmed Louri, Avinash Karanth, Razvan Bunescu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Scopus citations

Abstract

Network-on-Chips (NoCs) are becoming the standard communication fabric for multi-core and system-on-chip (SoC) architectures. As technology continues to scale, transistors and wires on the chip are becoming increasingly vulnerable to various fault mechanisms, especially timing errors, which degrade the energy efficiency and performance of NoCs. Typical techniques for handling timing errors are reactive in nature, responding to faults after they occur. They rely on error detection/correction techniques that result in excessive power consumption and degraded performance, since the error detection/correction hardware is constantly enabled. On the other hand, indiscriminately disabling error handling hardware can induce more errors and intrusive retransmission traffic. The challenge, therefore, is to balance the trade-offs among error rate, packet retransmission, performance, and energy. In this paper, we propose a proactive fault-tolerant mechanism that uses reinforcement learning (RL) to optimize energy efficiency and performance. First, we propose a new proactive error handling technique comprising a dynamic scheme for enabling per-router error detection/correction hardware and an effective retransmission mechanism. Second, we propose the use of RL to train the dynamic control policy, with the goals of providing increased fault tolerance, reduced power consumption, and improved performance compared to conventional techniques. Our evaluation indicates that, on average, end-to-end packet latency is lowered by 55%, energy efficiency is improved by 64%, and retransmission caused by faults is reduced by 48% relative to reactive error correction techniques.
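
The abstract does not specify the RL algorithm, state features, actions, or reward. As a rough illustration of the general idea only (a learned per-router policy that decides when to enable error detection/correction hardware), the following is a minimal sketch assuming tabular Q-learning, two hypothetical fault-level states, two hypothetical router modes, and a made-up cost model. None of these names or numbers come from the paper.

```python
import random

# Everything below is an illustrative assumption, not the paper's design.
ACTIONS = ("ecc_off", "ecc_on")        # hypothetical per-router modes
STATES = ("low_error", "high_error")   # hypothetical discretized fault levels


def toy_reward(state, action, rng):
    """Made-up cost model: enabled ECC costs energy, and an uncorrected
    fault costs a retransmission. ECC is assumed to fix every fault."""
    fault_prob = 0.02 if state == "low_error" else 0.60
    energy_cost = 1.0 if action == "ecc_on" else 0.0
    faulted = rng.random() < fault_prob
    retransmit_cost = 5.0 if (faulted and action == "ecc_off") else 0.0
    return -(energy_cost + retransmit_cost)


def train(steps=50000, alpha=0.05, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration for one router."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward = toy_reward(state, action, rng)
        # Toy assumption: the fault level changes at random between steps.
        next_state = rng.choice(STATES)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update toward the bootstrapped target.
        q[(state, action)] += alpha * (
            reward + gamma * best_next - q[(state, action)]
        )
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued mode for each fault level."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

Under this made-up cost model, `greedy_policy(train())` would be expected to keep the error-handling hardware off when faults are rare and turn it on when they are frequent, which mirrors the trade-off between energy and retransmission that the abstract describes.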

Original language: English (US)
Title of host publication: Proceedings of the 2019 Design, Automation and Test in Europe Conference and Exhibition, DATE 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1166-1171
Number of pages: 6
ISBN (Electronic): 9783981926323
State: Published - May 14 2019
Event: 22nd Design, Automation and Test in Europe Conference and Exhibition, DATE 2019 - Florence, Italy
Duration: Mar 25 2019 to Mar 29 2019

Publication series

Name: Proceedings of the 2019 Design, Automation and Test in Europe Conference and Exhibition, DATE 2019

Conference

Conference: 22nd Design, Automation and Test in Europe Conference and Exhibition, DATE 2019
Country/Territory: Italy
City: Florence
Period: 3/25/19 to 3/29/19

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
  • Safety, Risk, Reliability and Quality
  • Control and Optimization
