
Institut für Technische Informatik Chair for Embedded Systems - Prof. Dr. J. Henkel

Lars Bauer, Artjom Grudnitsky, Hongyan Zhang, Jörg Henkel

Lecture, Summer Semester (SS) 2014

Reconfigurable and Adaptive Systems (RAS)

8. Fault Tolerance and Reliability in FPGA-based Systems


RAS Topic Overview

  • 1. Introduction
  • 2. Overview
  • 3. Special Instructions
  • 4. Fine-Grained Reconfigurable Processors
  • 5. Configuration Prefetching
  • 6. Coarse-Grained Reconfigurable Processors
  • 7. Adaptive Reconfigurable Processors
  • 8. Fault Tolerance by Reconfiguration

Outline of this chapter:

  • Introduction
  • Fault Detection and Mitigation Techniques
  • Applications of Reliability Techniques: LHC, Space, OTERA

8.1 Introduction


Why Fault Tolerance?

# of dopant atoms in the transistor channel

[Figure: the number of dopant atoms in the transistor channel shrinks with each technology node; src: ITRS. Photo: Gordon E. Moore (co-founded Intel in 1968)]

CMOS scaling increases the occurrence of

  • Manufacturing defects
  • Post-deployment degradation

Especially important for FPGAs, as they contain a large number of transistors and interconnect wires

Environmental conditions can incur temporary faults

  • E.g. the aerospace industry uses hardened devices for mission-critical tasks and FPGAs for non-critical data processing

Unlike ASICs, FPGAs can adapt to deal with permanent and temporary faults


Types of Faults

Permanent faults: e.g. stuck-at failures in CLBs and opens, bridges, and shorts in the programmable switching matrix

  • Can occur during the fabrication process without being detected
  • Damage to device resources may also appear during the life cycle of the FPGA

Transient faults: have a temporary cause that can alter signal values or state stored in memory cells, creating indefinite and incorrect states in the computation

  • E.g. caused by a high-energy particle strike resulting in an energy exchange and charge displacement

Intermittent faults: have a permanent cause in the structure of the circuit, but their effect is intermittent, e.g. depending on temperature or power consumption


Negative Bias Temperature Instability (NBTI)

Breakdown of Si-H bonds at the silicon-oxide interface due to voltage/thermal stress causes interface traps

Affects mostly P-MOSFETs, because of the negative gate bias

  • The effect in N-MOSFETs is negligible

Despite the research focus: NBTI is observed, but not yet fully understood

[Figure: cross-section of a P-type MOSFET (gate, source S, drain D, oxide) under negative gate bias (Vg < 0, "stress"): Si-H bonds at the silicon-oxide interface break, leaving interface traps and releasing H+]


Negative Bias Temperature Instability (NBTI) (cont'd)

NBTI manifests itself as a shift in Vth

  • Causes an increase in transistor delay
  • NBTI leads to delay faults and, eventually, circuit failure

Recovery effect in periods of no stress

  • When voltage and temperature are low, Vth can shift back towards its original value
  • Full recovery from a stress period is only possible in infinite time
  • In practice, the overall Vth shift increases over longer periods, e.g. months or years

[Figure: Vth shift [V] over time while Vg alternates between stress (Vg < 0) and recovery phases; the shift relaxes during recovery but never returns fully to zero]
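To make the stress/recovery behavior concrete, here is a minimal simulation sketch using a generic power-law aging model; the model form and the coefficients A, n, and tau are illustrative assumptions, not the measured NBTI model behind the plots.

```python
# Toy NBTI model (assumption: generic power-law build-up under stress,
# exponential partial recovery). A, n, tau are illustrative, not fitted.
import math

A, n = 0.01, 0.16

def stress(dvth, dt):
    """During stress the Vth shift grows roughly as A * t^n; convert the
    current shift back to an equivalent stress time to continue the curve."""
    t_eq = (dvth / A) ** (1 / n) if dvth > 0 else 0.0
    return A * (t_eq + dt) ** n

def recover(dvth, dt, tau=1e5):
    """Stress-free phases relax the shift toward zero, but full recovery
    would take infinite time."""
    return dvth * math.exp(-dt / tau)

dvth = 0.0
for phase in ["stress", "recover"] * 4:        # alternating Vg < 0 / Vg = 0
    dvth = stress(dvth, 3600) if phase == "stress" else recover(dvth, 3600)
    print(f"{phase:7s}  dVth = {1000 * dvth:.1f} mV")  # net shift keeps growing
```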

NBTI and Temperature

Temperature plays an important role in NBTI modeling

  • Higher temperatures increase the shift in threshold voltage
  • The Vth shift is approximately 50% higher at 75°C than at 55°C
  • The NBTI effect at a constant 75°C is approximately equal to alternating between 85°C and 25°C


NBTI Impact on Lifetime of SRAM

[Figure: Static Noise Margin (SNM) degradation (0%–40%) after 7 years in 32 nm vs. the percentage of time that the cell stores a zero]

src: S. Kothawade, K. Chakraborty, S. Roy, "Analysis and mitigation of NBTI aging in register file: An end-to-end approach"

The NBTI effect is minimal when the cell stores a zero about half of the time, because the NBTI stress is then distributed equally between the two PMOS transistors in the SRAM cell


Types of Degradation (cont'd)

Hot-Carrier Injection (HCI): build-up of trapped charges in the gate-channel interface region

  • Progressive reduction of carrier mobility
  • Increase in the CMOS threshold voltage
  • Slower switching speed, which leads to timing problems


Types of Degradation (cont'd)

Time-Dependent Dielectric Breakdown (TDDB): over time, a conducting path forms through thin oxide layers [CCMA10]

[Figure: transistor cross-section (gate G, drain D, source S) with a conducting path through the gate oxide]


Main Reason for Many of These Effects: High Fields

src: Radhakrishnan et al., IEDM (2001)

Most device problems can be traced back to high-field effects – related to the failure to follow Dennard scaling



Dennard Scaling vs. Power Density

Transistor and power scaling are no longer balanced

  • Scaling is limited by power
  • Higher power density leads to thermal problems
  • Accelerates aging effects

Assuming a constant chip area; chip frequency may reduce due to wire delay; under classical scaling the voltage scales as 1/S

src: G. Venkatesh et al., "Conservation Cores: Reducing the Energy of Mature Computations", ASPLOS '10

S: scaling factor; device: transistor

Quantity              Classical scaling (Dennard)   Power-limited scaling
Device count          S^2                           S^2
Device frequency      S                             S
Device power (cap)    1/S                           1/S
Device power (Vdd)    1/S^2                         ~1
Power density         1                             S^2
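A quick worked example makes the table concrete (one technology step with S = 2 and constant chip area; the numbers follow directly from the scaling factors above):

```python
# Worked example for the scaling table above, for one step S = 2.
# Power density = device count x per-device power (constant chip area).
S = 2
device_count = S ** 2                 # same area, devices shrink by S

dennard_power = 1 / S ** 2            # Vdd and capacitance both scale down
power_limited = 1.0                   # Vdd no longer scales

print(device_count * dennard_power)   # 1.0 -> power density stays constant
print(device_count * power_limited)   # 4.0 -> power density grows as S^2
```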


Types of Degradation (cont'd)

Electromigration: thermally activated metal ions may leave their potential wells

  • The electric field and momentum exchange with electrons direct the metal-ion migration
  • Can lead to open/short circuits

[Image: electromigration damage in an interconnect; src: Wikipedia]


Sources: Intel, S. Borkar @ DAC'03, Patrick-Emil Zörner, W.D. Nix (1992), L. Finkelstein (Intel, 2005), R. Baumann (TI) @ Design&Test'05, Ziegler (IBM) @ IBM JRD'96

[Figure: high-energy particle (neutron or proton) striking a CMOS cross-section (gate, isolation, n+/p+ diffusions, N-well, P-well, P-substrate); the charge deposited along the particle track is collected in the depletion region]

Radiation-Induced Faults

Single Event Upsets (SEU) / Single Event Transients (SET)

  • Most common: a single bit flip in an SRAM cell

SEU effect on ASICs

  • Transient (the only variation is the time duration of the fault)
  • Even if latched, it will eventually be overwritten

SEU effect on FPGAs

  • Permanent (until reset/reconfiguration) if the configuration memory is hit by the SEU


8.2 Fault Detection and Mitigation Techniques


Modular Redundancy

Masks errors, but does not correct the underlying fault

  • Problem: error accumulation

External

  • Multiple FPGAs work in lockstep, i.e. they perform the same operation in each cycle
  • The outputs are sent to a radiation-hardened voter

Internal

  • Replicate a functional block inside the FPGA

Popular configurations (see the sketch below)

  • Triple Modular Redundancy (TMR)
  • Duplication with Comparison (DWC)
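A minimal sketch of the two popular configurations, modeling the replicated functional blocks as plain functions; the names and structure are illustrative, not a hardware implementation:

```python
# Sketch: TMR majority voting and duplication with comparison (DWC).
from collections import Counter

def tmr_vote(m1, m2, m3, *inputs):
    """Return the majority result of three replicas. This masks a single
    faulty replica, but does not repair the underlying fault."""
    results = [m(*inputs) for m in (m1, m2, m3)]
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one replica failed")
    return value

def dwc_check(m1, m2, *inputs):
    """DWC detects a mismatch, but cannot tell which replica is faulty."""
    r1, r2 = m1(*inputs), m2(*inputs)
    return r1, r1 == r2

# Usage: a stuck-at-0 replica is outvoted by the two good ones.
good = lambda a, b: a ^ b
bad = lambda a, b: 0
assert tmr_vote(good, good, bad, 1, 0) == 1
```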


Fault Detection Methods Comparison (src: [SSC08])

Modular redundancy

  • Detection speed: fast – as soon as the fault manifests
  • Resource overhead: very large – triplication + voter
  • Performance overhead: very small – voter delay
  • Granularity: coarse – protects module-sized blocks
  • Coverage: good – all manifest errors are detected


Concurrent Error Detection

More space-efficient than modular redundancy

  • Error-coding algorithms (e.g. parity) at data flows/stores

Time redundancy can be used for concurrent error detection

  • Repeat the computation in a way that allows errors to be detected
  • First computation at t0: compute the result in combinational logic, store the result
  • Second computation at t0+d: encode the operands, compute in combinational logic, decode the result, compare to the first result


Concurrent Error Detection (cont'd)

[Figure: time-redundancy scheme with encode/decode logic around the combinational logic and a comparator on the results]

src: [LCR03]


Concurrent Error Detection (cont'd)

Different techniques for encode/decode, e.g. bit inversion to detect stuck-at faults

Recomputation with shifted operands (RESO) for faulty arithmetic slices

  • Encode: left-shift the operands
  • Decode: right-shift the result

Combine with Duplication with Comparison (DWC), as sketched below

  • RESO determines which module is faulty; DWC uses the result of the other module
  • Less area required than TMR
  • Slightly slower (time-shifted re-computation)
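The sketch below recomputes with left-shifted operands; a fault in a single bit slice then affects different result bits in the two runs, so the comparison fails. The adder functions are hypothetical stand-ins for arithmetic modules:

```python
# Sketch: recomputation with shifted operands (RESO), combined with DWC.

def reso_check(adder, a, b):
    """First computation at t0, encoded recomputation at t0 + d:
    left-shift the operands, right-shift the result, compare."""
    r1 = adder(a, b)
    r2 = adder(a << 1, b << 1) >> 1
    return r1, r1 == r2

def dwc_with_reso(adder_a, adder_b, a, b):
    """If the duplicated modules disagree, RESO decides which module is
    faulty and the result of the other module is used."""
    ra, rb = adder_a(a, b), adder_b(a, b)
    if ra == rb:
        return ra
    _, a_ok = reso_check(adder_a, a, b)
    return ra if a_ok else rb
```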


Fault Detection Methods Comparison (src: [SSC08])

Modular redundancy

  • Detection speed: fast – as soon as the fault manifests
  • Resource overhead: very large – triplication + voter
  • Performance overhead: very small – voter delay
  • Granularity: coarse – protects module-sized blocks
  • Coverage: good – all manifest errors are detected

Concurrent error detection

  • Detection speed: fast – as soon as the fault manifests
  • Resource overhead: medium – tradeoff with coverage
  • Performance overhead: small – CRC logic delay
  • Granularity: medium – tradeoff with resources
  • Coverage: medium – not practical for all types of functionality


Offline BIST

Built-in Self-Test (BIST): does not use external test equipment

In FPGAs: test configurations containing

  • A test pattern generator (TPG)
  • An output response analyzer (ORA)
  • Between them: the device under test (DUT), i.e. logic and interconnect

Can test for faults that are difficult to cover in online tests, e.g. the clock network

Major drawback: the system must enter a dedicated test mode


Fault Detection Methods Comparison (src: [SSC08])

Modular redundancy

  • Detection speed: fast – as soon as the fault manifests
  • Resource overhead: very large – triplication + voter
  • Performance overhead: very small – voter delay
  • Granularity: coarse – protects module-sized blocks
  • Coverage: good – all manifest errors are detected

Concurrent error detection

  • Detection speed: fast – as soon as the fault manifests
  • Resource overhead: medium – tradeoff with coverage
  • Performance overhead: small – CRC logic delay
  • Granularity: medium – tradeoff with resources
  • Coverage: medium – not practical for all types of functionality

Off-line BIST

  • Detection speed: slow – only when offline
  • Resource overhead: very small
  • Performance overhead: small – start-up delay
  • Granularity: fine – possible to detect the exact error
  • Coverage: very good – all faults, including dormant ones


Roving

Online BIST: split the FPGA into equal-sized regions

  • One region performs a self-test, the others perform the design function
  • When the test is complete: swap the test region with an untested functional region and test the new region

Lower area overhead (1 region + controller logic)

Problems:

  • Swapping may "stretch" connections between regions, giving slower timing (may require a clock-speed reduction)
  • Functional blocks may be inoperable during the swap (depends on how it is implemented)


Self-Testing AReas (Roving STARs)

STARs consist of tiles performing BIST

  • STARs rove over the FPGA left/right (H-STAR) and up/down (V-STAR)
  • The Test Pattern Generator (TPG) sends data to the Block Under Test (BUT); the Output Response Analyzer (ORA) detects faults

src: [ESSA00]


Roving STARs

Roving is controlled by an embedded processor

Blocks under test are tested in different configurations, e.g. as user RAM, LUT, adder, etc.

The test strategy does not use signature analysis; instead it tests 2 identically configured blocks and compares their responses (see the sketch below)

  • Each block in a tile is tested twice, with a different partner block each time

H-STARs are 2 rows high, V-STARs are 2 columns wide

  • Tiles are not necessarily 2x2; they can also be 2x3, etc.

[Figure: tile layout – the TPG drives two identically configured BUTs, whose responses feed the ORA]
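A sketch of this comparison-based strategy; blocks are modeled as functions, and a block that mismatches with two different partners is declared faulty (the names and the dict-based API are illustrative):

```python
# Sketch: ORA compares the responses of two identically configured blocks;
# testing each block with two different partners isolates the faulty one.
from itertools import combinations

def compare_test(block_a, block_b, patterns):
    """ORA: apply each TPG pattern to both blocks and compare responses."""
    return all(block_a(p) == block_b(p) for p in patterns)

def diagnose(blocks, patterns):
    """blocks: dict mapping block id -> block function. A block that
    mismatches with at least two partners is declared faulty."""
    mismatches = dict.fromkeys(blocks, 0)
    for a, b in combinations(blocks, 2):
        if not compare_test(blocks[a], blocks[b], patterns):
            mismatches[a] += 1
            mismatches[b] += 1
    return [bid for bid, m in mismatches.items() if m >= 2]
```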


Roving STARs

Depending on the current location of the STARs, the working area of the FPGA is divided into 1, 2 or 4 regions

  • Virtual coordinate system of the working area without the STARs

src: [ESSA00]


Roving STARs

Model: the FPGA system function is composed of "logic cell functions"

  • Each fits into 1 Configurable Logic Block (CLB) on the FPGA
  • "Logic cell functions" are defined by coordinates in the virtual coordinate system
  • CLBs are defined in the physical coordinate system
  • The mapping depends on the position of the STARs

Blocks can be faulty, partially usable, or fault-free

  • Partially faulty blocks can implement some, but not all, logic cell functions
  • STARs test blocks in different modes and can determine which modes are fault-free


Roving STARs

Fault-tolerance approach, 3 levels:

  • I. STAR parking: when a fault is detected, the STAR that detected it stops moving. The user application is notified for a possible rollback. The fault is determined and reported to the controller.
  • II. Reconfigure the system function: if the logic cell can still use the block (usable or sufficiently partially usable), do not reconfigure. Otherwise, remap the logic cell to a spare working block. Remapping is performed by the controller while the STARs are parked; when done, the STARs continue roving.
  • III. STAR stealing: when no spares are available, take out part of the STARs and use them as spares. The affected tiles may no longer be able to perform BIST. Try to maintain at least 1 roving STAR.


Fault Detection Methods Comparison (src: [SSC08])

Modular redundancy

  • Detection speed: fast – as soon as the fault manifests
  • Resource overhead: very large – triplication + voter
  • Performance overhead: very small – voter delay
  • Granularity: coarse – protects module-sized blocks
  • Coverage: good – all manifest errors are detected

Concurrent error detection

  • Detection speed: fast – as soon as the fault manifests
  • Resource overhead: medium – tradeoff with coverage
  • Performance overhead: small – CRC logic delay
  • Granularity: medium – tradeoff with resources
  • Coverage: medium – not practical for all types of functionality

Off-line BIST

  • Detection speed: slow – only when offline
  • Resource overhead: very small
  • Performance overhead: small – start-up delay
  • Granularity: fine – possible to detect the exact error
  • Coverage: very good – all faults, including dormant ones

Roving

  • Detection speed: medium – on the order of 1 second
  • Resource overhead: medium – empty test block + controller
  • Performance overhead: large – the clock is stopped to swap blocks, and critical paths may lengthen
  • Granularity: fine – possible to detect the exact error
  • Coverage: very good – multiple manifest and latent faults are detected


Scrubbing

Repair faults in configuration memory by updating the affected configuration frame

For Xilinx FPGAs there are 3 ways to access configuration memory: JTAG (slow, external), SelectMAP (fast, external), ICAP (fast, internal)

Scrubbing protects only configuration data, not memory elements

  • Cannot scrub LUTs that are used as user RAM ("distributed RAM")
  • Cannot scrub BlockRAM (embedded memory in FPGAs)
  • Use other protection schemes for memory elements, e.g. parity or error-correcting codes


Blind Scrubbing

Strategy: continuous overwriting

  • Read the original configuration frame from external memory
  • Write it to the FPGA, even if no SEUs are present

Advantages: simple implementation, minimal additional hardware, fast repair

src: [HSWK09]


Readback Scrubbing

Strategy: only overwrite a frame if a fault is detected (see the sketch below)

  • Read back the configuration data
  • Check it against the original configuration data (e.g. CRC comparison)
  • On error: write the corrected configuration data back to the FPGA

Advantage: SEU logging is possible

src: [HSWK09]
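Both strategies in one sketch, run against a mock frame-based configuration port; config_port, golden, and the read_frame/write_frame API are hypothetical stand-ins for the real configuration interface (e.g. ICAP or SelectMAP):

```python
# Sketch: blind vs. readback scrubbing over configuration frames.
import zlib

def blind_scrub(config_port, golden):
    """Blind scrubbing: unconditionally rewrite every frame, SEU or not."""
    for addr, frame in golden.items():
        config_port.write_frame(addr, frame)

def readback_scrub(config_port, golden, seu_log):
    """Readback scrubbing: rewrite only frames that mismatch the golden
    data (CRC comparison), and log every detected SEU."""
    for addr, frame in golden.items():
        if zlib.crc32(config_port.read_frame(addr)) != zlib.crc32(frame):
            seu_log.append(addr)                 # SEU logging
            config_port.write_frame(addr, frame)
```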


Internal Scrubbing

Strategy:

  • Read a configuration frame via the ICAP
  • Check the frame-internal CRC code and correct errors if necessary
  • Write the configuration frame back via the ICAP

Xilinx-proprietary method

  • No external memory required
  • Uses BRAM → the scrubber itself is vulnerable to SEUs
  • Error correction can only correct 1-bit errors; 2-bit errors are detected but not corrected; 4- and 8-bit errors can go completely undetected


Partial Reconfiguration Scrubbing

Traditional scrubbing methods cannot be used with partial reconfiguration (PR)

  • Scrubbing uses the configuration port constantly
  • When loading a PR bitstream, the scrubber tries to read/write configuration memory while the PR logic tries to write to it
  • Even if scrubbing pauses for PR, the scrubber will immediately overwrite the PR region again (i.e. the scrubber 'repairs' the region back to the old configuration)

Potential solution: update the "golden" bitstream (see the sketch below)

  • The golden bitstream is the reference bitstream, kept in radiation-hardened memory and used for scrubbing
  • Write the PR modifications to the golden bitstream in an atomic operation (i.e. scrubbing must not read that part of the hardened memory in between)
  • Then scrubbing will reconfigure the PR part onto the FPGA after a short delay
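A sketch of this solution: the PR frames are spliced into the golden bitstream atomically with respect to the scrubber, here serialized with a lock; golden, pr_frames, config_port, and readback_scrub are the hypothetical stand-ins from the scrubbing sketch above:

```python
# Sketch: partial reconfiguration via an atomic golden-bitstream update.
import threading

golden_lock = threading.Lock()

def apply_partial_reconfiguration(golden, pr_frames):
    """Atomically splice the PR frames into the golden bitstream; the
    scrubber then 'repairs' the PR region to the NEW configuration on
    its next pass, completing the reconfiguration after a short delay."""
    with golden_lock:
        golden.update(pr_frames)

def scrub_pass(config_port, golden, seu_log):
    """One readback-scrubbing pass, serialized against golden updates so
    it never observes a half-written PR modification."""
    with golden_lock:
        readback_scrub(config_port, golden, seu_log)
```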


Partial Reconfiguration Scrubbing

Implemented on a Virtex-4

  • Communication interface: UART, receives bitstreams from a host computer
  • Memory: 64 MB SDRAM for bitstream storage
  • An arbiter resolves memory-access conflicts between the decoder and the scrubber

src: [HSWK09]


Partial Reconfiguration Scrubbing

Bitstream decoder: prepares the bitstream for insertion into the golden bitstream

Configuration controller: manages scrubbing

  • Read a frame from the golden bitstream and from configuration memory
  • Compute the CRC values
  • If they differ, write the frame from the golden bitstream to configuration memory

Partial reconfiguration is done automatically by the configuration controller

  • The golden bitstream is updated with the PR bitstream
  • The configuration controller detects "SEUs" in the modified frames
  • The frames in configuration memory are overwritten → PR complete


Other Fault-Repair Techniques

Column/row shifting: spare lines of cells at the end of the array

  • When an error is detected in a row/column, bypass the whole row/column via multiplexers and use the spare

Alternative configurations: split the FPGA into tiles such that multiple configurations for each tile implement the same functionality

  • Once the error is located, load a configuration that does not use the faulty resource

Others: online re-routing, …


8.3 Reliability for LHC

ALICE – A Large Ion Collider Experiment

  • One of the experiments using the Large Hadron Collider (LHC) at CERN
  • Task: characterize the quark-gluon plasma produced through collisions of heavy ions
  • The Transition Radiation Detector (TRD) identifies fast electrons in the central barrel
  • Consists of 540 readout chambers

src: CERN, ALICE Set Up, http://aliceinfo.cern.ch/Public/Objects/Chapter2/ALICE-SetUp-NewSimple.jpg


Detector Control System (DCS)

Task: ensure safe operation of the TRD

  • Provide the front-end electronics with configuration and calibration data

Some design goals from the Design Report:

  • Coherent and homogeneous: to allow integration of independently developed components
  • Flexible and scalable: e.g. hardware upgrades, procedural changes
  • Must be operational throughout the lifetime of the experiment, even during shutdown phases
  • Available, safe, reliable: safety of the detector equipment
  • Equipment configuration and data archiving must be easily maintainable


Detector Control System (DCS)

DCS board: developed at the Kirchhoff Institute of Physics (Heidelberg)

Several variants for different components of the detector; using an FPGA allows the same board layout to be reused

  • Interface with the front-end electronics in the readout chambers – 540 boards
  • Low/high-voltage power control & trigger control – 50 boards
  • Control & configure the readout control units (which pass measurement data to the data acquisition systems) – 216 boards

src: [K08]


DCS Board – Hardware

Altera Excalibur FPGA

  • SRAM-based, 4190 logic elements (about 100k gates)
  • Embedded ARM9 processor
  • MMU, SDRAM controller, UART, watchdog, etc.

32 MB SDRAM, 8 MB Flash (FPGA configuration data, bootloader, software)

ARM's Advanced High-performance Bus (AHB) is used for the on-board interconnect

Ethernet (→ PC), LVDS (→ front-end electronics)


DCS Board – Software

Bootloader

  • At the beginning of flash memory
  • Initializes the CPU, configures the FPGA, loads the kernel into RAM

Linux kernel, file system with user software

  • Drivers for most board components as modules
  • Application for detector control
  • Standard UNIX utilities


DCS Board – Flash Reconfiguration

If a board fails to start up (e.g. flash image corrupted by radiation), it can be reconfigured from a neighboring board

  • Boards are connected in a ring, in addition to Ethernet
  • Accessible via JTAG
  • A special FPGA configuration that receives data over Ethernet and writes it to flash bypasses the CPU and reduces reconfiguration time


DCS Board – Radiation Tolerance

More potential points of failure than a dedicated ASIC controller

  • But: also more mechanisms to deal with such faults

Expected: no permanent damage to the hardware, only Single Event Upsets (SEUs) in memory/registers

Radiation tests at the level of radiation expected in the detector: 1 SEU every few hours per board


DCS Test Mechanisms

SDRAM test: fill the memory with a pattern, read it back and verify, send a UDP packet via the network on error (see the sketch below)

  • CPU not used, no OS needed
  • 100% of the memory can be tested

FPGA configuration SRAM

  • Triple modular redundancy + majority voter detect functional errors
  • No readback of configuration data is possible with this FPGA
  • Configuration errors are found by testing the TMR functionality

The SDRAM and SRAM tests can be used to estimate radiation susceptibility – they are not used in regular operation

Online memory self-test

  • Fill unused memory with test patterns and verify
  • Implemented as a kernel module
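A sketch of the SDRAM pattern test: fill, verify, report over UDP. The monitoring address and the bytearray standing in for the SDRAM are assumptions for illustration:

```python
# Sketch: pattern-based memory test with UDP error reporting.
import socket

MONITOR_ADDR = ("192.0.2.1", 9999)     # hypothetical monitoring host
PATTERN = 0xA5

def fill(memory):
    """Write the known pattern to every byte (memory: bytearray)."""
    for i in range(len(memory)):
        memory[i] = PATTERN

def verify(memory):
    """Report every byte that no longer matches the pattern (e.g. flipped
    by an SEU) via a UDP packet, and return the faulty addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    errors = [i for i, v in enumerate(memory) if v != PATTERN]
    for addr in errors:
        sock.sendto(f"SEU at 0x{addr:08x}".encode(), MONITOR_ADDR)
    return errors
```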


8.4 Reliability in Space


Reconfigurable Fault Tolerance (RFT) in Space

Different scenario: FPGAs in space-based applications

  • On-board preprocessing of data to minimize downlink bandwidth

Common fault detection/mitigation

  • Radiation-hardened devices – very expensive, lower performance
  • TMR – problem: area overhead (> 200%), assumes the worst-case scenario

Use reconfiguration to adapt to the desired level of redundancy/performance

  • Developed at the University of Florida


RFT Architecture

SoC with Partial Reconfiguration Regions (PRRs) that contain additional processing modules/accelerators

All components except the PRRs may be protected by TMR

A MicroBlaze keeps track of the modules

  • Which modules are active or not
  • Switches fault-tolerance strategies using the ICAP
  • Initiates recovery when a module encounters an error

src: [JGC09]


RFT Modes

Triple Modular Redundancy (TMR) mode: replicate the module in three different PRRs

  • Voting is implemented in the RFT controller
  • On error: interrupt to the MicroBlaze, which initiates recovery – save the system state, reconfigure the PRR, load the module state back

High-Performance mode: no fault tolerance provided by the system

  • Reliability through module-internal means is still possible


RFT Modes

Self-Checking Pair (SCP) mode:

  • Replicate the module in two different PRRs
  • On error → reconfigure both, repeat the computation

Switching RFT modes:

  • Triggered by external events or prior knowledge of the environment
  • The RFT controller disables the affected PRRs, extracts their state, and changes the voting procedures
  • Partial bitstreams are sent to the ICAP
  • The RFT controller re-enables the bus connections


RFT Case Study – ISS

International Space Station

  • Low Earth Orbit – 400 km altitude, 92 min per orbit; avoids travel over the poles to minimize radiation exposure to the crew

SEU rates depend on solar activity, the particular device, etc.

  • Here: only estimates

src: [JGC09]


RFT Case Study – ISS

Prior knowledge of orbit and solar conditions

  • High-Performance mode in orbit sections with low SEU rates
  • Reconfigure to TMR mode when radiation exposure is high

During both modes: scrubbing of the configuration memory in 30-second cycles

src: [JGC09]


RFT Case Study – ISS

Results

  • The configuration-memory repair rate (scrubbing) is much higher than the SEU rate
  • During high-radiation periods, traditional TMR and RFT perform similarly (RFT is in TMR mode)
  • During low-radiation parts, RFT performs better
  • Average performance of RFT over TMR: 2.3x


RFT Case Study – HEO

Highly Elliptical Orbits (HEO) stay longer over an area and can cover polar regions

  • Used by communication satellites
  • Geostationary orbits only cover equatorial regions

Average radiation is higher

src: [JGC09]


RFT Case Study – HEO

The system switches between TMR mode (3 PRRs used) and Self-Checking Pair mode (4 PRRs used, running 2 applications)

Modules checkpoint their state every 5 minutes


8.5 OTERA


OTERA – Online TEst strategies for runtime Reconfigurable Architectures

RISPP revisited: reliable online reconfiguration using online tests

  • Is the fabric fault-free?
  • Has the reconfiguration process completed correctly?
  • Must be ensured at runtime!

[Figure: RISPP architecture – core pipeline (IF, ID, EXE, MEM, WB) with data cache/scratchpad, off-chip memory, memory controller, load/store & address-generation units, and reconfigurable containers attached via inter-container buses]


OTERA – Test Methods

Pre-configuration test (PRET)

  • Tests the structural integrity of the reconfigurable fabric
  • Executed online, before reconfiguration with the mission logic

Post-configuration test (PORT)

  • Tests correct reconfiguration and interconnection
  • Functional, software-based test
  • Executed online, at speed


Example: Testing a Lookup Table

Principal structure of a LUT: truth table (memory cells) + multiplexer

2 test configurations (see the sketch below)

  • Set each memory cell to 0 and to 1 → configure the LUT as XOR and as XNOR
  • Exhaustive test set (2^n patterns for an n-input LUT)

Optimizations:

  • C-testable array
  • Pipelining for at-speed test

[Figure: LUT in the XOR configuration – the input combinations 00, 01, 10, 11 select the stored truth-table cells]
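The sketch below drives an n-input LUT exhaustively in both complementary test configurations; read_lut is a hypothetical hook that returns the response of the LUT under test after loading the given configuration:

```python
# Sketch: exhaustive LUT test with two configurations (XOR and XNOR), so
# every memory cell is checked in both the 0 and the 1 state.
from itertools import product

def lut_model(cells, inputs):
    """Fault-free n-input LUT: the inputs select one memory cell."""
    return cells[int("".join(map(str, inputs)), 2)]

def test_lut(read_lut, n):
    """Apply all 2**n patterns per configuration and compare against the
    fault-free model; returns False as soon as a fault is detected."""
    for config in ("xor", "xnor"):
        flip = 0 if config == "xor" else 1
        cells = [sum(bits) % 2 ^ flip for bits in product((0, 1), repeat=n)]
        for inputs in product((0, 1), repeat=n):
            if read_lut(config, inputs) != lut_model(cells, inputs):
                return False        # fault detected
    return True
```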

OTERA – Test Procedure

1. Basic pre-configuration online test (PRET)

[Figure: the run-time system drives the PRET via the reconfiguration port]

src: [BBI+12]

OTERA – Test Procedure

2. Reconfigure the accelerator into the container

[Figure: the run-time system sends the bitstream data via the reconfiguration port]

src: [BBI+12]

OTERA – Test Procedure

3. Post-reconfiguration online test (PORT)

  • After reconfiguration
  • Periodically during operation

[Figure: the run-time system triggers the PORT on the configured container]

src: [BBI+12]


OTERA – PRET System Integration

  • Connect the Test Pattern Generator (TPG) and the Output Response Analyzer (ORA) with the Reconf. Containers
  • They can use the Inter-Container Buses for communication
  • After loading a Test Configuration (TC), the test is performed like a regular application-specific Special Instruction

[Figure: RISPP architecture with TPG & ORA attached to the Inter-Container Buses next to the Reconf. Containers; the run-time system loads the TC data through the ICAP]

src: [BBI+12]


Test Configurations

9 test configurations (TCs) cover all targeted faults in CLBs

Test-configuration scheduling is integrated into the system scheduling & configuration infrastructure

TC  Tested CLB subcomponents          PRET overhead [CLBs]  Bitstream size [KB]  Freq. [MHz]  Number of patterns
1   LUT as XOR, via FF                2                     24.0                 207          64
2   LUT as XNOR, via FF               2                     24.0                 207          64
3   Carry MUX, via latch              1                     28.6                 168          6
4   Carry MUX, via latch              1                     26.1                 154          6
5   Carry XOR, via FF                 1                     28.0                 168          6
6   Carry XOR, via FF                 1                     28.2                 154          6
7   Carry-I/O multiplexed             1                     27.1                 183          6
8   LUT as shift reg. with slice MUX  1                     22.9                 157          6
9   LUT as RAM with slice output      7                     22.3                 225          320


OTERA – Test Scheduling

[Figure: container index vs. time for three schedules – a) accelerator configurations (SAD, Transform, SAV, QuadSub, PointFilter, Clip) without tests, b) 1 test configuration per accelerator configuration, c) 9 test configurations per accelerator configuration]


OTERA – Performance Overhead

H.264 video encoding running on the reconfigurable system; investigating different test frequencies

  • 1 Test Configuration (TC) per X Accelerator Configurations (ACs)

Negligible application performance impact, typically < 1%

[Figure: performance loss [%] (0.0%–1.4%) vs. number of reconfigurable containers (5–14), for 1 TC per 1, 2, 3, and 4 ACs]

src: [BBI+12]


OTERA – Test Latency

Test latency: the time to complete all tests (9 test configurations for all containers)

  • Short test latency (between 1.2 and 14.1 s)
  • Depends on the number of containers and the test frequency

[Figure: average test latency [s] (2–16 s) vs. number of reconfigurable containers (5–14), for 1 TC per 1, 2, 3, and 4 ACs]

src: [BBI+12]


Module Diversification

Implement functional modules in different ways in terms of CLB usage (placement constraints)

→ Diversified configurations

[Figure: four diversified configurations A1–A4 of the same module; each CLB is used, unused, or faulty, and each configuration avoids a different set of CLBs]

src: [ZBK+13]


Generate Diversified Configurations

Goal: create a minimal set of diversified configurations that tolerates any single-CLB fault (see the sketch below)

  • Track for each CLB how many configurations have already used it (score matrix)
  • Create a new configuration out of an existing one by swapping the most often used CLBs with the least often used ones

[Figure: score matrix over the CLB grid after configurations A1–A3]

src: [ZBK+13]
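A sketch of this greedy diversification idea; configurations are modeled as sets of CLB ids, and placement legality is ignored for brevity:

```python
# Sketch: derive a new diversified configuration by swapping the most-used
# CLBs of an existing configuration with the least-used unused CLBs,
# tracking per-CLB usage in a score matrix (dict: CLB id -> usage count).

def diversify(base_config, all_clbs, scores, n_swaps):
    """Return a new configuration derived from base_config."""
    unused = [c for c in all_clbs if c not in base_config]
    most_used = sorted(base_config, key=lambda c: scores[c])[-n_swaps:]
    least_used = sorted(unused, key=lambda c: scores[c])[:n_swaps]
    new_config = (set(base_config) - set(most_used)) | set(least_used)
    for c in new_config:
        scores[c] += 1              # update the score matrix
    return new_config
```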


Stress Balancing for Aging Mitigation

CLBs are stressed non-uniformly

  • Decreasing stress = reducing aging
  • Distribute the stress over the CLBs

[Figure: stress estimation over the CLB grid]

src: [ZBK+13]


Stress Reduction

[Figure: a), b) two diversified configurations; c) an alternating schedule; d) a balanced schedule of the minimal set (4 configurations)]

src: [ZBK+13]


GUARD: GUAranteed Reliability in Dynamically Reconfigurable Systems

Goal: maximize performance under given reliability constraints (e.g. failure rate < 10^-10)

[Figure: base architecture with reconfigurable containers (A1, A2, A3) plus the parts new in GUARD – runtime variants selection driven by the current soft-error rate and the reliability constraints, a scrubbing controller, and a reconfiguration controller]

src: [ZKI+14]


Variants of Accelerated Functions

Trade off performance against reliability

[Figure: a) example of an accelerated function using accelerator types A1, A2, A3 in three steps (3 containers); b) faster variant with two parallel instances of A3 (4 containers); c) reliable variant with a triplicated implementation of A3 and a voter (5 containers)]

src: [ZKI+14]


Reliability of Accelerators

The reliability of an accelerator is determined by

  • The number of critical configuration bits (an upset in a critical bit changes the function, an upset in a non-critical bit does not)
  • The resident time since the configuration was last "fresh" (reconfiguration or scrubbing)

[Figure: reliability starts at 1 when fresh and decays over the resident time; it must not fall below the reliability constraint]

src: [ZKI+14]
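As a hedged sketch of why these two quantities matter (a simplified independence model, not the exact formulation in [ZKI+14]): assume each critical configuration bit is upset independently at a per-bit soft-error rate λ; then

```latex
% Simplified model: independent upsets at per-bit rate \lambda,
% N_{crit} critical bits, resident time t since the last refresh.
R(t) = e^{-\lambda \, N_{crit} \, t}
% Scrubbing every T_s resets t, so a constraint R_{\min} only has to
% hold at t = T_s:  e^{-\lambda N_{crit} T_s} \ge R_{\min}
% -> improve R by reducing N_{crit} (redundancy) or T_s (scrubbing).
```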


Reliability of Accelerators (cont'd)

[Figure: same reliability-over-time plot – the number of critical configuration bits can be reduced, e.g. with more redundancy, which flattens the decay and keeps the reliability above the constraint]

src: [ZKI+14]


Reliability of Accelerators (cont'd)

[Figure: same reliability-over-time plot – more frequent scrubbing resets the resident time earlier, keeping the reliability above the constraint]

src: [ZKI+14]


Variants Selection – Greedy Algorithm

C: all variants of the required accelerated functions; R: the selected variants (see the sketch below)

  • 1. Prune unreliable variants in C
  • 2. Search for the variant with the highest speed-up per container: vbest
  • 3. Update R and remove vbest from C
  • 4. Update the container requirements
  • 5. Prune unfitting variants in C
  • 6. If C is not empty, continue at step 2; otherwise determine the scrubbing rate

src: [ZKI+14]
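A compact sketch of this loop; the Variant records and the reliability/fit checks are illustrative stand-ins for the reliability model and the container budget:

```python
# Sketch: GUARD-style greedy selection of accelerator variants.
from dataclasses import dataclass

@dataclass
class Variant:
    function: str        # which accelerated function it implements
    speedup: float
    containers: int      # how many reconf. containers it occupies
    reliability: float

def select_variants(candidates, free_containers, r_min):
    selected = []
    C = [v for v in candidates if v.reliability >= r_min]  # prune unreliable
    while C:
        # variant with the highest speed-up per occupied container
        vbest = max(C, key=lambda v: v.speedup / v.containers)
        selected.append(vbest)
        free_containers -= vbest.containers
        # keep one variant per function; prune variants that no longer fit
        C = [v for v in C if v.function != vbest.function
             and v.containers <= free_containers]
    return selected      # afterwards: determine the scrubbing rate
```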


Results: Runtime Adaptation

Average performance improvement: 42.6%

[Figure: performance [million accel. functions/s] vs. soft-error rate for Threshold TMR/DWC [Jacobs2012] and for GUARD with r = 10 and r = 9; the DWC/TMR switching threshold (r = 10) is marked]

src: [ZKI+14]


Conclusion

Developed a thorough CLB test and integrated it into a reconfigurable system

  • Uses system facilities for reconfiguration and test access
  • Extended the tool-chain to create partial bitstreams for Test Configurations
  • Transparent for the application
  • Very low area & performance overhead, fast test latency

Realized fault tolerance and aging mitigation via diversified module configurations

  • Dynamic performance/reliability trade-off
  • Validated on a HW prototype


Sources, References, Further Reading

[CCMA10] M. Choudhury, V. Chandra, K. Mohanram, R. Aitken: "Analytical model for TDDB-based performance degradation in combinational logic", Design, Automation and Test in Europe (DATE), pp. 423-428, 2010.

[LCR03] F. Lima, L. Carro, R. Reis: "Designing fault tolerant systems into SRAM-based FPGAs", Design Automation Conference (DAC), pp. 650-655, 2003.

[CCCV05] N. Campregher, P.Y.K. Cheung, G.A. Constantinides, M. Vasilko: "Analysis of yield loss due to random photolithographic defects in the interconnect structure of FPGAs", 13th Int'l Symposium on Field-Programmable Gate Arrays (FPGA), pp. 138-148, 2005.

[SSC08] E. Stott, P. Sedcole, P. Cheung: "Fault tolerant methods for reliability in FPGAs", Int'l Conference on Field Programmable Logic and Applications (FPL), pp. 415-420, 2008.

[ESSA00] J. Emmert, C. Stroud, B. Skaggs, M. Abramovici: "Dynamic fault tolerance in FPGAs via partial reconfiguration", IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 165-174, 2000.

[LC07] A. Lesea, K. Castellani-Coulie: "Experimental study and analysis of soft errors in 90nm Xilinx FPGA and beyond", 9th European Conference on Radiation and Its Effects on Components and Systems (RADECS), pp. 1-5, 2007.

[B06] M. Berg: "Fault tolerance implementation within SRAM based FPGA designs based upon the increased level of single event upset susceptibility", 12th IEEE Int'l On-Line Testing Symposium (IOLTS), pp. 89-91, 2006.


Sources, References, Further Reading (cont'd)

[HSWK09] J. Heiner, B. Sellers, M. Wirthlin, J. Kalb: "FPGA partial reconfiguration via configuration scrubbing", Int'l Conference on Field Programmable Logic and Applications (FPL), pp. 99-104, 2009.

[K08] T. Krawutschke: "A flexible and reliable embedded system for detector control in a high energy physics experiment", Int'l Conference on Field Programmable Logic and Applications (FPL), pp. 155-160, 2008.

[M07] J. Mercado: "The ALICE Transition Radiation Detector Control System", Int'l Conference on Accelerators and Large Experimental Physics Control Systems (ICALEPCS), pp. 181-183, 2007.

[ALCol03] ALICE Collaboration: "ALICE Technical Design Report of the Trigger, Data Acquisition, High-Level Trigger and Control System", ISBN 92-9083-217-7, pp. 359-412, 2003.

[JGC09] A. Jacobs, A. George, G. Cieslewski: "Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space", Int'l Conference on Field Programmable Logic and Applications (FPL), pp. 199-204, 2009.

[BBI+12] L. Bauer, C. Braun, M. E. Imhof, M. A. Kochte, H. Zhang, H.-J. Wunderlich, J. Henkel: "OTERA: Online Test Strategies for Reliable Reconfigurable Architectures", NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pp. 38-45, 2012.

[ZBK+13] H. Zhang, L. Bauer, M. A. Kochte, E. Schneider, C. Braun, M. E. Imhof, H.-J. Wunderlich, J. Henkel: "Module Diversification: Fault Tolerance and Aging Mitigation for Runtime Reconfigurable Architectures", IEEE Int'l Test Conference (ITC), pp. 1-10, 2013.

[ZKI+14] H. Zhang, M. A. Kochte, M. E. Imhof, L. Bauer, H.-J. Wunderlich, J. Henkel: "GUARD: GUAranteed Reliability in Dynamically Reconfigurable Systems", IEEE/ACM Design Automation Conference (DAC), 2014.