Abstract
The increasing number of cores on a chip has made the network-on-chip (NoC) concept the standard communication paradigm for chip multiprocessors. A fault in an NoC can severely degrade the performance of a chip. Therefore, it is vital to design fault-tolerant NoCs. In this paper, we present Shield, a reliable NoC router architecture that has the unique ability to tolerate both hard and soft errors in the routing pipeline using techniques such as spatial redundancy, exploitation of idle cycles, bypassing of faulty resources, and selective hardening. Using Mean Time to Failure and Silicon Protection Factor metrics, we show that Shield is six times more reliable than the unprotected baseline router and at least 1.5 times more reliable than existing fault-tolerant router architectures. We introduce a new metric called Soft Error Improvement Factor and show that the soft error tolerance of Shield is improved threefold relative to the unprotected baseline router. This reliability improvement incurs an area and power overhead of 34 and 31 percent, respectively. Latency analysis using SPLASH-2 and PARSEC reveals that in the presence of faults, latency increases by a modest 13 and 10 percent, respectively.
Original language | English (US) |
---|---|
Article number | 7390298 |
Pages (from-to) | 3058-3070 |
Number of pages | 13 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 27 |
Issue number | 10 |
State | Published - Oct 1 2016 |
Keywords
- Network-on-chip
- hard faults
- mean time to failure
- router architecture
- soft errors
ASJC Scopus subject areas
- Signal Processing
- Hardware and Architecture
- Computational Theory and Mathematics