Dynamic error mitigation in NoCs using intelligent prediction techniques

Dominic DiTomaso, Travis Boraten, Avinash Kodi, Ahmed Louri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

22 Scopus citations

Abstract

Networks-on-chip (NoCs) are quickly becoming the standard communication fabric for multi-core systems. As technology continues to scale down into the nanometer regime, device behavior will become increasingly unreliable due to a combination of aging, soft errors, aggressive transistor design, and process-voltage-temperature variations. Further, stringent timing constraints in NoCs are designed so that data can be pushed faster. The net result is an increase in errors which must be mitigated by the NoC. Typical techniques for handling faults are often reactive: they respond only after the error has occurred, which makes the recovery process inefficient in energy and time. In this paper, we take a different approach and propose proactive, fault-tolerant schemes that are employed before the fault affects the system. We propose to utilize machine learning techniques to train a decision tree which can be used to predict faults efficiently in the network. Based on the prediction model, we dynamically mitigate these predicted faults through error correction codes (ECC) and relaxed timing transmission. Our results indicate that, on average, we can accurately predict timing errors 60.6% better than a static single error correction and double error detection (SECDED) technique, resulting in an average 26.8% reduction in retransmitted packets, an average net speedup of 3.31x, and an average energy savings of 60.0% over other designs for real traffic patterns.
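
The prediction-then-mitigation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the feature names (`temperature_c`, `recent_errors`, `link_utilization`), the thresholds, the tree shape, and the mode labels are all hypothetical, chosen only to show how a small decision tree could map per-link observations to one of the two mitigation styles the abstract names (ECC and relaxed timing transmission).

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical mitigation labels; the abstract names ECC and relaxed timing,
# the "baseline" label (no extra protection) is an assumption of this sketch.
BASELINE = "baseline"
ECC = "ecc"
RELAXED = "relaxed_timing"


@dataclass
class Node:
    """Internal decision node: compare one feature against a threshold."""
    feature: str
    threshold: float
    low: Union["Node", str]   # followed when sample[feature] <= threshold
    high: Union["Node", str]  # followed when sample[feature] > threshold


# Hand-written tree with made-up features and thresholds (assumptions).
# In a trained model these splits would be learned from fault data instead.
TREE = Node(
    "temperature_c", 80.0,
    low=Node("recent_errors", 2, low=BASELINE, high=ECC),
    high=Node("link_utilization", 0.5, low=ECC, high=RELAXED),
)


def predict_mode(tree: Union[Node, str], sample: dict) -> str:
    """Walk the tree from the root until a leaf (a mode label) is reached."""
    node = tree
    while isinstance(node, Node):
        if sample[node.feature] > node.threshold:
            node = node.high
        else:
            node = node.low
    return node


if __name__ == "__main__":
    cool_link = {"temperature_c": 70, "recent_errors": 0, "link_utilization": 0.9}
    hot_busy_link = {"temperature_c": 90, "recent_errors": 0, "link_utilization": 0.7}
    print(predict_mode(TREE, cool_link))      # baseline
    print(predict_mode(TREE, hot_busy_link))  # relaxed_timing
```

The point of the sketch is the control flow: the mitigation mode is picked before a transfer from cheap-to-evaluate comparisons, rather than chosen after an error has already forced a retransmission.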

Original language: English (US)
Title of host publication: MICRO 2016 - 49th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
ISBN (Electronic): 9781509035083
State: Published - Dec 14 2016
Event: 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016 - Taipei, Taiwan, Province of China
Duration: Oct 15 2016 to Oct 19 2016

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: 2016-December
ISSN (Print): 1072-4451

Other

Other: 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 10/15/16 to 10/19/16

ASJC Scopus subject areas

  • Hardware and Architecture
