

SLIDE 1

Approximate Computing on Unreliable Silicon

Georgios Karakonstantis2, Jeremy Constantin, Andreas Burg1, Adam Teman1

1Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland 2Queen’s University Belfast, U.K.

Dagstuhl 30-11/15

SLIDE 2

Objective: Improve energy efficiency

Classical approach
Main idea: Reduce the complexity of an algorithm.

Techniques
  • Scale down bit-precision (a sketch follows at the end of this slide)
  • Prune computations
  • Simplify algorithms

Metrics
  • Quality (SNR, PSNR, …)
  • Energy

New approach
Main idea: Utilize the application’s error resiliency to address hardware-induced errors.

Techniques
  • Allow and tolerate errors
  • Limit errors to less significant computations and variables
  • Ensure graceful performance degradation

Metrics
  • Quality (SNR, PSNR, …)
  • Energy
  • Yield
  • Reliability (e.g., MTTF)
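As an illustration of the classical bit-precision scaling technique listed above, a minimal C sketch, assuming a simple fixed-point dot product; the function names and the number of dropped bits are illustrative and not taken from the talk.

/* Minimal sketch of classical bit-precision scaling: an approximate dot
 * product that truncates the n_drop least significant bits of each
 * fixed-point operand before multiplying. All names are illustrative. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static int32_t truncate_lsbs(int32_t x, unsigned n_drop)
{
    /* Clearing the low bits keeps the dynamic range but reduces precision. */
    return x & ~(((int32_t)1 << n_drop) - 1);
}

static int64_t approx_dot(const int32_t *a, const int32_t *b, size_t len, unsigned n_drop)
{
    int64_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc += (int64_t)truncate_lsbs(a[i], n_drop) * truncate_lsbs(b[i], n_drop);
    return acc;
}

int main(void)
{
    int32_t a[4] = {1000, -2047, 513, 77};
    int32_t b[4] = {321, 15, -999, 2};
    printf("exact result:       %lld\n", (long long)approx_dot(a, b, 4, 0));
    printf("approximate result: %lld\n", (long long)approx_dot(a, b, 4, 4)); /* drop 4 LSBs */
    return 0;
}

In hardware, operating on truncated operands lets the datapath be narrowed, trading quality (SNR) for energy, which is exactly the trade-off captured by the classical metrics above.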
SLIDE 3

  • Variability summarizes three different problems: true randomness, lack of knowledge, and the inability to model (chaotic behavior); it creates the need for overdesign to account for worst-case assumptions
  • Failure to design under all worst-case assumptions can and will lead to hardware misbehavior
  • Two main types of failures:
    • Logic level: violation of timing constraints causes erroneous computations and control-plane failures
    • Memory: data is lost or not properly stored

SLIDE 4

Sources of variability:

  • Static components: random dopant fluctuation, process variations, line-edge roughness
  • Dynamic/runtime factors: voltage (Vdd) variation, thermal effects, data dependencies
  • Wearout/aging: NBTI
  • Single event upsets: the only errors that are truly random (intentionally not covered in this talk)

SLIDE 5
  • Non-ergodic behavior renders the analysis of circuits under variations difficult: averaging requires great care

(Figure: failure types over product life, from manufacturing failures through runtime/dynamic failures [time scale: seconds] to wearout failures [time scale: years].)

Manufacturing failures (die-to-die and within-die variations)
  • Each die is an individual realization of a random process
  • Parameters are fixed after manufacturing

Runtime/dynamic failures
  • Behavior of each circuit is mostly deterministic and on a short time scale
  • “Randomness” is due to random data and model uncertainty
  • Averaging is only meaningful with truly random input

Wearout failures
  • Aging is a slow process
  • Parameters change on a long time scale
  • A long-term average is meaningless

SLIDE 6
  • Predicting the exact timing of a circuit is almost impossible, even if all factors are precisely known
  • Predicting the consequences of a timing failure at any one or multiple points is even harder today
  • Different instances of the same circuit behave very differently
  • Despite this high sensitivity, the behavior of each circuit instance is, unfortunately, also deterministic

SLIDE 7

Quality (SNR) degradation of different adders under frequency over-scaling.

Some key observations:

  • The transition region of graceful quality degradation is small
  • Better architectures are also more sensitive to errors (smaller transition region)

SLIDE 8

Objective: exploit timing margins in low-power processors

  • Error-detection sequentials measure timing margins in all pipeline stages
  • Cycle-by-cycle adjustable clock generator
  • Processor state determines the instantaneous clock period

Critical Range Optimization in OpenRISC

Opportunity:
  • +38% speedup
  • 24% power consumption

J. Constantin, et al., “Exploiting dynamic timing margins in microprocessors for frequency-over-scaling with instruction-based clock adjustment”, DATE 2015
SLIDE 9

Graphs removed since unpublished. Summary of main points:

  • Under timing violations without additional sources of randomness, there is a sudden transition between fault-free operation and 100% failure beyond the static timing limit
  • When adding uncertainty by means of supply-voltage noise, we get a transition region between functional operation and full failure
  • Unfortunately, the transition region is rather small (e.g., 50 MHz at a clock of ~700 MHz)

SLIDE 10

New paradigm: allow for graceful performance degradation

  • Approximate computing
  • Scalable algorithms
  • Stochastic computing
  • Application/algorithm-level fault tolerance

(Figure: histogram of path-delay occurrences at nominal and low VDD against the target delay; execution time versus quality with a task deadline.)

Consideration of the application level provides additional scalability: graceful performance degradation.

Application to communications: iterative algorithms adjust to process variations.

SLIDE 11
  • Memories account for the bulk of leakage and active power consumption
  • There is a clear relationship between savings (in area and power) and the amount of errors we expect
  • Errors can easily be located and associated with individual variables or quantities at higher abstraction levels
  • Important variables can be protected against errors
  • The impact of errors is easy to model accurately and can be propagated well through the stack and other abstraction levels

SLIDE 12

Application of unreliable memories to forward error correction decoders (HSPA+ system, transmitter shown in figure)

  • Study of the inherent fault tolerance of wireless systems: the system tolerates a surprisingly high number of defects in costly memories
  • Compact “better-than-worst-case” memory design for fault-tolerant applications
  • Memories with graceful performance degradation
  • Average-case refresh (figure annotations: retention time tret, 50x)

1. A. Teman, et al., “Energy versus data integrity trade-offs in embedded high-density logic compatible dynamic memories”, DATE 2015
2. G. Karakonstantis, et al., “On the exploitation of the inherent error resilience of wireless systems under unreliable silicon”, DAC 2012
3. P. Meinerzhagen, et al., “Refresh-Free Dynamic Standard-Cell Based Memories: Application to QC-LDPC Decoder”, ISCAS 2015

SLIDE 13

Controlled errors with a modified test criterion

  • Conventional yield criterion: accept only dies with no errors
  • Modified yield criterion: accept dies with fewer than N errors

(Figure: distributions of bit errors per die (<5, <100, >100 errors across 80%, 40%, 20% of dies) at nominal and reduced VDD, with the resulting yields: 80% (OK) and 90% (high) at nominal VDD, 60% (too low) and 80% (OK) at reduced VDD for the conventional and modified criteria, respectively. Moving toward low-power operation causes yield loss under the conventional criterion.)

  • Improves yield for a given power/quality metric
  • Keeps the yield under more stringent power constraints (see the sketch below)
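A minimal sketch of the two test criteria, assuming per-die bit-error counts are available from a (hypothetical) memory test; the error counts and the threshold N = 100 are illustrative, not the measured distributions from the slide.

/* The two yield criteria on a set of hypothetical per-die bit-error counts:
 * a die passes the conventional criterion only if it is error-free, and the
 * modified criterion if it has fewer than N errors. Numbers are illustrative. */
#include <stdio.h>

static double yield_percent(const int *errors_per_die, int dies, int max_errors)
{
    int accepted = 0;
    for (int i = 0; i < dies; i++)
        if (errors_per_die[i] < max_errors)
            accepted++;
    return 100.0 * accepted / dies;
}

int main(void)
{
    /* Illustrative bit-error counts for 10 dies tested at a reduced supply voltage. */
    int errors[10] = {0, 0, 0, 0, 0, 0, 2, 7, 40, 500};
    printf("conventional (no errors): %.0f%%\n", yield_percent(errors, 10, 1));
    printf("modified (< 100 errors):  %.0f%%\n", yield_percent(errors, 10, 100));
    return 0;
}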
SLIDE 14

Problem:

  • Each manufactured die is subject to a specific error pattern (number of errors and error locations)
  • The impact on quality depends strongly on the number of errors and on the error location (word and bit location)

Non-ergodicity invalidates quality assessment across dies. Impact on the quality distribution:

  • Some chips with fewer than N errors work perfectly, others fail miserably

(Figure: different instances of the same memory, bit positions from LSB to MSB. A few errors in MSBs and many errors in LSBs have very different performance impact; see the sketch below.)
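A small worked example of why the error location matters, assuming a 32-bit two's-complement representation: flipping bit k changes the stored value by 2^k, so an MSB-side fault is orders of magnitude more damaging than an LSB-side one. The stored value and bit positions are illustrative.

/* Why the bit location matters: flipping bit k of a two's-complement integer
 * changes its value by 2^k, so an MSB-side error is far more damaging than an
 * LSB-side one. The stored value and bit positions are illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int32_t flip_bit(int32_t word, unsigned bit)
{
    return (int32_t)((uint32_t)word ^ (UINT32_C(1) << bit));
}

int main(void)
{
    int32_t stored  = 1000;                  /* value held in the memory word */
    int32_t lsb_err = flip_bit(stored, 1);   /* fault near the LSB            */
    int32_t msb_err = flip_bit(stored, 29);  /* fault near the MSB            */
    printf("LSB-side error: %d (off by %d)\n", lsb_err, abs(lsb_err - stored));
    printf("MSB-side error: %d (off by %d)\n", msb_err, abs(msb_err - stored));
    return 0;
}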

SLIDE 15
  • Binning based on the specific error pattern is not feasible due to too many different patterns (predicting the impact of each pattern on quality during test is impossible)
  • Proper test criteria are hard to define, and ensuring consistent quality is difficult

Solution: ensure that all chips with a given number of errors have the same average quality

  • The average behavior over time must be independent of the physical error location
  • Add logic to memories to change the mapping between logical and physical locations (a small sketch follows below)

(Figure: physical-to-logical bit/address mapping over time/algorithm iterations. Physical failures remain in the same location, but the logical bit failures wander around in the memory; quality changes with each application of the algorithm, i.e., errors are averaged out over time.)
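A minimal sketch of such a remapping, assuming a simple per-iteration rotation of the logical-to-physical bit assignment; the rotation scheme, word width and stuck-at fault model are illustrative, not the actual circuit from the talk.

/* Rotating logical-to-physical bit mapping sketch: the physical faulty cell
 * stays put, but rotating the bit assignment every algorithm iteration makes
 * the fault hit a different logical bit each time, so its impact averages
 * out over time. The rotation scheme and fault model are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define WORD_BITS 16u

/* Map a logical bit position to a physical one using a per-iteration rotation. */
static unsigned log2phys(unsigned logical_bit, unsigned iteration)
{
    return (logical_bit + iteration) % WORD_BITS;
}

static uint16_t mem_write(uint16_t value, unsigned iteration, uint16_t stuck_at_zero_mask)
{
    uint16_t physical = 0;
    for (unsigned b = 0; b < WORD_BITS; b++)
        if (value & (1u << b))
            physical |= (uint16_t)(1u << log2phys(b, iteration));
    return physical & (uint16_t)~stuck_at_zero_mask;   /* faulty cells force 0 */
}

static uint16_t mem_read(uint16_t physical, unsigned iteration)
{
    uint16_t value = 0;
    for (unsigned b = 0; b < WORD_BITS; b++)
        if (physical & (1u << log2phys(b, iteration)))
            value |= (uint16_t)(1u << b);
    return value;
}

int main(void)
{
    uint16_t fault = (uint16_t)(1u << 14);   /* one physical cell (bit 14) stuck at 0 */
    for (unsigned it = 0; it < 4; it++) {
        uint16_t rd = mem_read(mem_write(0xFFFFu, it, fault), it);
        printf("iteration %u: read 0x%04X (logical error in bit %u)\n",
               it, (unsigned)rd, (14u + WORD_BITS - it) % WORD_BITS);
    }
    return 0;
}

Because the faulty physical cell hits a different logical bit in every iteration, the long-run average error becomes the same for every die with a given number of faults, independent of where the faults physically sit.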
SLIDE 16

Best-effort statistical data correction; data representations for unreliable memories

C. Roth, et al., “Statistical data correction for unreliable memories”, Asilomar 2014
C. Roth, et al., “Data mapping for unreliable memories”, Allerton Conference on Communication, Control, and Computing, 2012

SLIDE 17

Idea: Identify failing bit locations during runtime and store bits of lower significance (LSB) in those locations

S. Ganapathy, et al., “Mitigating the Impact of Faults in Unreliable Memories for Error Resilient Applications”, DAC 2015
SLIDE 18
Bit Shuffling Mechanism

  • Identify failing bits in a memory word at run-time
  • Use a shifter to store the bits of lower significance (LSBs) in those locations
  • Shuffling can be performed at varying levels of granularity:
    • On a per-bit basis, where the failing bit always stores the LSB
    • On a segment basis, where groups of bits are shifted
  • Helps trade off area and power against output quality (a sketch of the per-bit variant follows below)
  • Magnitude of the error computed for a 32-bit integer in 2’s complement mode (figure annotations: 2n, m segments/word)
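A minimal sketch of the per-bit shuffling variant, assuming a single known failing (stuck-at-0) cell per word; a software bit swap stands in for the hardware shifter, and all names and values are illustrative.

/* Per-bit shuffling sketch: if physical bit position fault_pos of a word is
 * known to fail (modeled here as stuck-at-0), swap it with the LSB before
 * writing so the faulty cell only ever holds the least significant bit.
 * Purely illustrative; the real mechanism uses a hardware shifter. */
#include <stdint.h>
#include <stdio.h>

static uint32_t swap_bits(uint32_t w, unsigned i, unsigned j)
{
    uint32_t diff = ((w >> i) ^ (w >> j)) & 1u;   /* 1 iff the two bits differ */
    return w ^ (diff << i) ^ (diff << j);
}

/* Write path: move the LSB into the faulty position, then model the fault. */
static uint32_t write_shuffled(uint32_t value, unsigned fault_pos)
{
    return swap_bits(value, 0, fault_pos) & ~(1u << fault_pos);
}

/* Read path: undo the shuffle. */
static uint32_t read_shuffled(uint32_t stored, unsigned fault_pos)
{
    return swap_bits(stored, 0, fault_pos);
}

int main(void)
{
    uint32_t v = 0xCAFEBABFu;
    unsigned fault_pos = 30;                          /* an MSB-side cell fails */
    uint32_t plain = v & ~(1u << fault_pos);          /* without shuffling      */
    uint32_t shuf  = read_shuffled(write_shuffled(v, fault_pos), fault_pos);
    printf("original:    0x%08X\n", (unsigned)v);
    printf("no shuffle:  0x%08X (error %u)\n", (unsigned)plain, (unsigned)(v - plain));
    printf("shuffled:    0x%08X (error %u)\n", (unsigned)shuf, (unsigned)(v - shuf));
    return 0;
}

With shuffling, the worst-case error per word is bounded by the weight of the LSB instead of the weight of whichever bit happens to sit on the faulty cell.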

SLIDE 19
  • Priority ECC design: the most significant 16 bits of a 32-bit word are protected with a (22,16) SECDED ECC (sketch below)
  • Compared to a full (39,32) SECDED ECC, this reduces power, area and latency overhead by as much as 83%, 89% and 77%, respectively
  • For the 3 evaluated applications (Elasticnet, PCA and KNN), output quality is within 10%, 0.2% and 7% of a fault-free memory with SECDED ECC
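A minimal sketch of the priority ECC organization, assuming a generic (22,16) extended Hamming SECDED code built with the textbook construction; this is not the encoder of the referenced design, only an illustration of protecting the upper 16 bits of a 32-bit word while leaving the lower 16 bits unprotected.

/* Priority ECC sketch: only the upper 16 bits of a 32-bit word are covered by
 * a (22,16) SECDED code (Hamming(21,16) plus one overall parity bit); the
 * lower 16 bits are stored unprotected. Generic textbook construction. */
#include <stdint.h>
#include <stdio.h>

/* Encode 16 data bits into codeword bits 1..21 plus overall parity in bit 0.
 * Positions 1, 2, 4, 8, 16 hold the Hamming check bits. */
static uint32_t secded22_encode(uint16_t data)
{
    uint32_t cw = 0;
    for (int pos = 1, d = 0; pos <= 21; pos++) {
        if ((pos & (pos - 1)) == 0) continue;         /* skip check positions */
        if (data & (1u << d++)) cw |= 1u << pos;
    }
    for (int p = 1; p <= 16; p <<= 1) {               /* Hamming check bits   */
        unsigned parity = 0;
        for (int pos = 1; pos <= 21; pos++)
            if (pos & p) parity ^= (cw >> pos) & 1u;
        if (parity) cw |= 1u << p;
    }
    unsigned all = 0;                                 /* overall parity bit   */
    for (int pos = 1; pos <= 21; pos++) all ^= (cw >> pos) & 1u;
    return cw | all;
}

/* Decode, correcting a single bit error (nonzero syndrome with odd overall
 * parity); a nonzero syndrome with even parity would flag a double error. */
static uint16_t secded22_decode(uint32_t cw)
{
    unsigned syndrome = 0, all = 0;
    for (int p = 1; p <= 16; p <<= 1) {
        unsigned parity = 0;
        for (int pos = 1; pos <= 21; pos++)
            if (pos & p) parity ^= (cw >> pos) & 1u;
        if (parity) syndrome |= (unsigned)p;
    }
    for (int pos = 0; pos <= 21; pos++) all ^= (cw >> pos) & 1u;
    if (syndrome != 0 && all) cw ^= 1u << syndrome;
    uint16_t data = 0;
    for (int pos = 1, d = 0; pos <= 21; pos++) {
        if ((pos & (pos - 1)) == 0) continue;
        if (cw & (1u << pos)) data |= (uint16_t)(1u << d);
        d++;
    }
    return data;
}

/* A stored 32-bit value: protected MSB half plus raw LSB half. */
typedef struct { uint32_t msb_cw; uint16_t lsb_raw; } prot_word_t;

static prot_word_t prot_store(uint32_t value)
{
    prot_word_t w = { secded22_encode((uint16_t)(value >> 16)), (uint16_t)value };
    return w;
}

static uint32_t prot_load(prot_word_t w)
{
    return ((uint32_t)secded22_decode(w.msb_cw) << 16) | w.lsb_raw;
}

int main(void)
{
    prot_word_t w = prot_store(0xDEADBEEFu);
    w.msb_cw ^= 1u << 13;      /* inject a single bit-flip into the MSB half */
    printf("recovered: 0x%08X\n", (unsigned)prot_load(w));  /* 0xDEADBEEF */
    return 0;
}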

SLIDE 20
  • Need to distinguish between different types of errors
  • An accurate model for some errors may be equally important to have, yet it is difficult to obtain
  • A killing factor in today’s hardware is the small transition region between 100% functional operation and complete failure
  • We may need new hardware structures that ensure graceful performance degradation
  • Accepting errors in memories is currently the most promising and best understood approach