Tackling permanent faults in the Network-on-Chip router pipeline

Pavan Poluri, Ahmed Louri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

The proliferation of multi-core and many-core chips for performance scaling is making the Network-on-Chip (NoC) occupy a growing amount of silicon area spanning several metal layers. The NoC is neither immune to hard faults and transient faults nor unaffected by the adverse increase in hard faults caused by technology scaling. The ramifications for the NoC are immense: a single fault in the NoC may paralyze the working of the entire chip. To this end, we propose a Permanent Fault Tolerant Router (PFTR) that is capable of tolerating multiple permanent faults in the pipeline. PFTR is designed by making architectural modifications to individual pipeline stages of the baseline NoC router. These architectural modifications involve adding minimal extra circuitry and exploiting temporal parallelism to accomplish fault tolerance. Tolerance of multiple faults is achieved by striking a balance between three important design factors: area overhead, power overhead, and reliability. We use the Silicon Protection Factor (SPF) [13] as the metric to assess the reliability improvement of the proposed architecture. SPF takes into account both the number of faults required to cause failure and the area overhead of the additional circuitry. The SPF calculation reveals that the proposed PFTR is 11 times more reliable than the baseline NoC router. Synthesis results using Cadence Encounter RTL Compiler at 45 nm technology show that the additional circuitry adds an area overhead of 31% and a power overhead of 30% with respect to the baseline NoC router. PFTR provides much better reliability with much less overhead than other fault-tolerant routers such as BulletProof [13], Vicis [14], and RoCo [15].
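
As a rough illustration of how a metric like SPF combines fault count and area cost, the sketch below assumes the form SPF = (faults needed to cause failure) / (1 + fractional area overhead). This form is an assumption made here for illustration; the abstract does not state the formula. The function names are hypothetical, and the baseline value assumes an unprotected router that fails on its first fault. The "implied" fault count is only what this assumed formula yields from the abstract's figures (SPF of 11, 31% area overhead); it is not a result reported by the authors.

```python
def silicon_protection_factor(faults_to_failure: float, area_overhead: float) -> float:
    """Assumed SPF form: faults needed to cause failure, divided by relative area.

    area_overhead is a fraction of the baseline area (0.31 means 31% extra area).
    """
    if faults_to_failure <= 0:
        raise ValueError("faults_to_failure must be positive")
    if area_overhead < 0:
        raise ValueError("area_overhead must be non-negative")
    return faults_to_failure / (1.0 + area_overhead)


def implied_faults_to_failure(spf: float, area_overhead: float) -> float:
    """Invert the assumed formula: fault count implied by an SPF and an overhead."""
    return spf * (1.0 + area_overhead)


# Assumption: an unprotected baseline fails on its first fault and has no overhead.
baseline_spf = silicon_protection_factor(faults_to_failure=1, area_overhead=0.0)

# Figures taken from the abstract: SPF of 11 and 31% area overhead.
implied = implied_faults_to_failure(spf=11, area_overhead=0.31)

print(f"baseline SPF (assumed): {baseline_spf:.2f}")
print(f"implied faults to failure under the assumed formula: {implied:.2f}")
```

Under this assumed form, extra circuitry only pays off when it raises the number of tolerable faults faster than it grows the area, which mirrors the trade-off among area, power, and reliability that the abstract describes.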

Original language: English (US)
Title of host publication: Proceedings - 2013 25th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2013
Publisher: IEEE Computer Society
Pages: 49-56
Number of pages: 8
ISBN (Print): 9781479929276
State: Published - 2013
Event: 2013 25th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2013 - Porto de Galinhas, PE, Brazil
Duration: Oct 23 2013 → Oct 26 2013

Publication series

Name: Proceedings - Symposium on Computer Architecture and High Performance Computing
ISSN (Print): 1550-6533

Other

Other: 2013 25th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2013
Country/Territory: Brazil
City: Porto de Galinhas, PE
Period: 10/23/13 → 10/26/13

Keywords

  • Area
  • Latency
  • Network-on-Chip
  • Power
  • Reliability
  • Router architecture

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
