Implementation of Self-Healing Asynchronous Circuits at the Example of a Video-Processing Algorithm
T. Panhofer, W. Friesenbichler, A. Steininger
Vienna University of Technology

Outline
Motivation & Objective
Asynchronous Logic
Self-Healing Concept
Case Study: Self-Healing (SH) Implementation of a Video-Processing Algorithm
Experimental Results (& Lessons Learnt)
Conclusion & Outlook
threshold voltages, delays, leakages,…
lower voltage, smaller critical charge,…
more functions/chip, higher temperature
Asynchronous logic is based on local handshaking (closed loop), not on a global clock (open loop).
Implicit request (carried by the data encoding itself), explicit acknowledge
Dual-rail encoded data: two representations each for HI and LO tokens, used in alternating "phases"
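The slides do not state which dual-rail code is used. A minimal sketch, assuming a level-encoded dual-rail (LEDR) style code, shows how each token can have two representations that alternate with the phase: the arrival of a new token (the implicit request) is visible on the data rails alone, and the receiver answers with an explicit acknowledge. The rail names `v`/`t` and the functions `encode`, `decode`, `transmit`, `receive` and `wires_changed` are hypothetical.

```python
# Hypothetical sketch: LEDR-style dual-rail encoding, assumed here as one
# way to give each HI/LO token two representations in alternating phases.


def encode(value: int, phase: int) -> tuple[int, int]:
    """Return the (v, t) rail pair for one bit in phase 0 or 1."""
    # Rail v carries the value; rail t is chosen so that v XOR t == phase.
    return value, value ^ phase


def decode(rails: tuple[int, int]) -> tuple[int, int]:
    """Recover (value, phase) from a (v, t) rail pair."""
    v, t = rails
    return v, v ^ t


def transmit(bits: list[int]) -> list[tuple[int, int]]:
    """Encode a token stream, alternating the phase after every token."""
    codewords = []
    phase = 1  # the reset state (0, 0) decodes as phase 0, so start at 1
    for bit in bits:
        codewords.append(encode(bit, phase))
        phase ^= 1
    return codewords


def receive(codewords: list[tuple[int, int]]) -> tuple[list[int], int]:
    """Accept a token only when it carries the expected phase."""
    expected_phase, ack, values = 1, 0, []
    for rails in codewords:
        value, phase = decode(rails)
        if phase != expected_phase:
            continue  # no new token yet: the implicit request is not seen
        values.append(value)
        ack ^= 1  # explicit acknowledge: toggle the ack wire
        expected_phase ^= 1
    return values, ack


def wires_changed(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of rails that toggle between two consecutive codewords."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    stream = transmit([1, 1, 0, 1])
    print(stream)           # [(1, 0), (1, 1), (0, 1), (1, 1)]
    print(receive(stream))  # ([1, 1, 0, 1], 0)
```

In this encoding every pair of consecutive codewords, starting from the reset state (0, 0), differs in exactly one rail, so a single wire transition signals each new token regardless of whether the value changed.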
☺ DI (delay-insensitive) logic tends to stop working when a defect occurs
☺ the stalled handshake signals tend to point to the location of the defect
☺ temporal robustness (tolerance of delay variations) makes re-routing easier
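Why stalled handshakes point at the defect can be illustrated with a simplified token-level sketch (not a gate-level model) of a linear pipeline in which one stage, by assumption, never accepts a token, as it would if a stuck-at fault kept its handshake from completing. Upstream stages fill up and wait for an acknowledge that never comes, downstream stages drain, and the boundary between full and empty stages marks the defective stage. `run_pipeline` and `locate_stall` are hypothetical names.

```python
# Hypothetical token-level model of a linear asynchronous pipeline with one
# defective stage that never accepts a token (its handshake never completes).


def run_pipeline(n_stages: int, faulty: int, rounds: int) -> list[bool]:
    """Return which stages hold a token after `rounds` handshake rounds."""
    full = [False] * n_stages
    for _ in range(rounds):
        # The environment (sink) always acknowledges the last stage.
        if full[-1]:
            full[-1] = False
        # Tokens advance into an empty successor unless it is the faulty one.
        for i in range(n_stages - 2, -1, -1):
            if full[i] and not full[i + 1] and i + 1 != faulty:
                full[i], full[i + 1] = False, True
        # The environment (source) always offers a new token to stage 0.
        if not full[0] and faulty != 0:
            full[0] = True
    return full


def locate_stall(full: list[bool]) -> int:
    """Return the stage right after the last waiting (full) stage."""
    for i in range(len(full) - 1):
        if full[i] and not full[i + 1]:
            return i + 1
    # No full-to-empty boundary: in this model, stage 0 refuses all tokens.
    return 0


if __name__ == "__main__":
    state = run_pipeline(6, faulty=3, rounds=20)
    print(state)                # [True, True, True, False, False, False]
    print(locate_stall(state))  # 3
```

A real pipeline with forks, joins and loops has no single boundary like this, so the sketch only shows the tendency named in the slide.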
TMR: no interruption of service (2oo3, 2-out-of-3)
Self-healing: service possibly interrupted (1oo2, 1-out-of-2)
more options to bypass a defective element; no need to rely on "luck" (next defect not in
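As a rough illustration of the two schemes, the sketch below contrasts a bitwise 2-out-of-3 majority voter, which masks one faulty copy without interrupting service, with a 1-out-of-2 arrangement in which a halted unit is replaced by its spare at the cost of an interruption. Modelling a halted unit as one that returns `None` is an assumption (in the spirit of DI logic stopping rather than computing wrong values); `vote_2oo3` and `run_1oo2` are hypothetical names.

```python
from typing import Callable, Optional


def vote_2oo3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant results (TMR voter)."""
    return (a & b) | (a & c) | (b & c)


def run_1oo2(
    primary: Callable[[int], Optional[int]],
    spare: Callable[[int], Optional[int]],
    x: int,
) -> tuple[Optional[int], bool]:
    """Use the primary unit; switch to the spare if the primary halts.

    Returns the result and whether service was interrupted by a switch-over.
    """
    result = primary(x)
    if result is None:  # primary stopped: reconfigure to the spare
        return spare(x), True
    return result, False


if __name__ == "__main__":
    print(bin(vote_2oo3(0b1100, 0b1010, 0b0110)))  # 0b1110

    def healthy(x: int) -> int:
        return 2 * x

    def halted(x: int) -> None:
        return None

    print(run_1oo2(halted, healthy, 21))  # (42, True)
```

The voter needs three copies plus voting logic to keep running through one fault; the 1oo2 arrangement needs only one working unit out of two, but the switch-over shows up as a pause in service.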
static => too memory intensive
dynamic => too performance intensive
"Random repair" without diagnosis:
the bits of a counter control the reconfiguration switches
the counter counts up upon each watchdog timeout
if the defect is not removed => the circuit is still halted
with the first valid configuration, circuit operation resumes
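The random-repair loop can be sketched under assumptions the slides do not state: each counter bit drives one hypothetical switch that replaces a primary element by its spare, a configuration is "valid" when every element switched in is healthy, and each watchdog timeout of the halted circuit advances the counter by one. `switch_settings`, `config_works` and `random_repair` are illustrative names, not the authors' design.

```python
from typing import Optional


def switch_settings(counter: int, n_switches: int) -> list[bool]:
    """Bit i of the counter drives switch i (True = use the spare)."""
    return [bool((counter >> i) & 1) for i in range(n_switches)]


def config_works(settings: list[bool], bad_primary: set[int], bad_spare: set[int]) -> bool:
    """A configuration works if no defective element is switched in."""
    for i, use_spare in enumerate(settings):
        if use_spare and i in bad_spare:
            return False
        if not use_spare and i in bad_primary:
            return False
    return True


def random_repair(
    n_switches: int, bad_primary: set[int], bad_spare: set[int]
) -> tuple[Optional[int], int]:
    """Count up on every watchdog timeout until the circuit runs again.

    Returns (working counter value or None, number of watchdog timeouts).
    """
    n_configs = 2 ** n_switches
    counter, timeouts = 0, 0
    while not config_works(switch_settings(counter, n_switches), bad_primary, bad_spare):
        # Circuit still halted: the watchdog expires and the counter counts up.
        timeouts += 1
        if timeouts >= n_configs:
            return None, timeouts  # every configuration tried, none works
        counter = (counter + 1) % n_configs
    return counter, timeouts


if __name__ == "__main__":
    print(random_repair(3, {1}, set()))     # (2, 2)
    print(random_repair(3, {1, 2}, set()))  # (6, 6)
    print(random_repair(3, {1}, {1}))       # (None, 8)
```

The model ignores circuit state; in hardware, each ineffective configuration may corrupt state, which is one of the lessons listed later in the slides.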
linear correction
dead column correction
pipeline with forks, joins and loops
long mission time, extreme environment, high dependability required, no manual repair possible
STEFAN = Synthesizeable Test Environment For Asynchronous Networks
Autonomous reconfiguration
Single stuck-at fault injected at internal
Counter used as
Figure: # of 4-input LUTs (Xilinx Virtex-4)
Standard FPGAs can be used for prototyping of
207% resources, but tolerance of multiple faults
Reconfiguration Unit might have significant impact
counter causes overhead => use an LFSR
too many values to try => split controllers
ineffective repair attempts may corrupt state
block-wise diagnosis with local "random" repair
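Replacing the counter by an LFSR can be sketched with a 4-bit Fibonacci LFSR for the polynomial x^4 + x^3 + 1, chosen here only as an example: it visits all 15 non-zero states using a shift and a single XOR, with no carry chain. The width, taps and function names (`lfsr4_step`, `lfsr4_sequence`, `worst_case_trials`) are assumptions. An LFSR of this kind never reaches the all-zero state, so that switch configuration would not be tried. The second function illustrates, under the assumption that block-wise diagnosis identifies the stalled block, why splitting controllers shrinks the search.

```python
def lfsr4_step(state: int) -> int:
    """One step of a 4-bit Fibonacci LFSR, taps for x^4 + x^3 + 1."""
    feedback = (state ^ (state >> 1)) & 1  # XOR of the two lowest bits
    return (state >> 1) | (feedback << 3)


def lfsr4_sequence(seed: int = 1) -> list[int]:
    """All states visited from `seed` until the sequence repeats."""
    states, state = [], seed
    while True:
        states.append(state)
        state = lfsr4_step(state)
        if state == seed:
            return states


def worst_case_trials(block_widths: list[int], split: bool) -> int:
    """Worst-case repair attempts with maximal-length LFSRs.

    One shared controller must try every combination; with block-wise
    diagnosis (assumed to identify the stalled block), only that block's
    controller cycles, so the worst case is the sum over blocks.
    """
    if split:
        return sum(2 ** w - 1 for w in block_widths)
    return 2 ** sum(block_widths) - 1


if __name__ == "__main__":
    print(lfsr4_sequence())  # [1, 8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, 7, 3]
    print(worst_case_trials([4, 4], split=False))  # 255
    print(worst_case_trials([4, 4], split=True))   # 30
```

With two 4-bit blocks, one shared 8-bit LFSR may need up to 255 attempts, while two independent 4-bit LFSRs need at most 15 + 15 = 30, and fewer ineffective attempts also means fewer chances to corrupt state.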
detection, reconfiguration and recovery
coarse grain: constant overhead
fine grain: decreasing relative
switches