Abstract
Creating an environment of "no doubt" for computing systems is critical for supporting next-generation science, engineering, and commercial applications. Reconfigurable devices such as Field Programmable Gate Arrays (FPGAs) give designers a seductive tool to use as a basis for sophisticated but highly reliable platforms. Through partial and dynamic reconfiguration, reconfigurable computing platforms can potentially enhance reliability and recover from catastrophic failures, and can eliminate the need for the redundant hardware resources typically used by existing fault-tolerant systems. We propose a two-level self-healing methodology to offer 100% availability for mission-critical systems with comparatively less hardware overhead and performance degradation. Our proposed system first undertakes healing at the node level. If the fault cannot be rectified at the node level, network-level healing is then undertaken. We have designed a system based on Xilinx Virtex-5 FPGAs and Cirronet wireless mesh nodes to demonstrate autonomous wireless healing capability among networked node devices. Our prototype is a proof-of-concept work that demonstrates the feasibility of using FPGAs to provide maximum computational availability in a critical self-healing distributed architecture.
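The two-level escalation described in the abstract (try node-level healing first, fall back to network-level healing) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` class, the `repair_locally` hook, the `can_host` predicate, and the `heal` function are all hypothetical names introduced here for illustration, and the sketch models only the control flow, not FPGA partial reconfiguration or the wireless mesh protocol.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Node:
    """Hypothetical stand-in for one networked FPGA node."""
    name: str
    # Assumed hook: returns True if a local repair attempt
    # (for example, a partial reconfiguration) succeeded.
    repair_locally: Callable[[], bool]


def heal(failed: Node, peers: Sequence[Node],
         can_host: Callable[[Node], bool]) -> str:
    """Return a label naming the level that resolved the fault."""
    # Level 1: attempt to heal on the failed node itself.
    if failed.repair_locally():
        return f"node-level:{failed.name}"
    # Level 2: look for a peer able to take over the failed function.
    for peer in peers:
        if peer is not failed and can_host(peer):
            return f"network-level:{peer.name}"
    # Neither level could recover the function.
    return "unrecovered"
```

In this sketch, `heal` returns a string such as `node-level:a` when the local repair hook succeeds, and only consults the peers when it does not, which mirrors the ordering stated in the abstract.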
Original language | English (US) |
---|---|
Pages (from-to) | 269-284 |
Number of pages | 16 |
Journal | Cluster Computing |
Volume | 12 |
Issue number | 3 |
State | Published - 2009 |
Keywords
- Adaptive
- FPGA
- Partial reconfiguration
- Reconfigurable
- Self-healing
ASJC Scopus subject areas
- Software
- Computer Networks and Communications