

SLIDE 1

Approximate Computing on Unreliable Silicon

Georgios Karakonstantis2, Jeremy Constantin, Andreas Burg1, Adam Teman1

1Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland 2Queen’s University Belfast, U.K.

Dagstuhl 30-11/15

SLIDE 2

Objective: Improve energy efficiency

Classical approach
Main idea: Reduce the complexity of an algorithm.

Techniques
  • Scale down bit-precision (a sketch follows at the end of this slide)
  • Prune computations
  • Simplify algorithms

Metrics
  • Quality (SNR, PSNR, …)
  • Energy

New approach
Main idea: Utilize the application’s error resiliency to address hardware-induced errors.

Techniques
  • Allow and tolerate errors
  • Limit errors to less significant computations and variables
  • Ensure graceful performance degradation

Metrics
  • Quality (SNR, PSNR, …)
  • Energy
  • Yield
  • Reliability (e.g., MTTF)
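As an illustration of the classical bit-precision scaling technique listed above, a minimal C sketch, assuming a simple fixed-point dot product; the function names and the number of dropped bits are illustrative and not taken from the talk.

/* Minimal sketch of classical bit-precision scaling: an approximate dot
 * product that truncates the n_drop least significant bits of each
 * fixed-point operand before multiplying. All names are illustrative. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static int32_t truncate_lsbs(int32_t x, unsigned n_drop)
{
    /* Clearing the low bits keeps the dynamic range but reduces precision. */
    return x & ~(((int32_t)1 << n_drop) - 1);
}

static int64_t approx_dot(const int32_t *a, const int32_t *b, size_t len, unsigned n_drop)
{
    int64_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc += (int64_t)truncate_lsbs(a[i], n_drop) * truncate_lsbs(b[i], n_drop);
    return acc;
}

int main(void)
{
    int32_t a[4] = {1000, -2047, 513, 77};
    int32_t b[4] = {321, 15, -999, 2};
    printf("exact result:       %lld\n", (long long)approx_dot(a, b, 4, 0));
    printf("approximate result: %lld\n", (long long)approx_dot(a, b, 4, 4)); /* drop 4 LSBs */
    return 0;
}

In hardware, operating on truncated operands lets the datapath be narrowed, trading quality (SNR) for energy, which is exactly the trade-off captured by the classical metrics above.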
SLIDE 3

  • Variability summarizes three different problems: true randomness, lack of knowledge, and the inability to model (chaotic behavior); it creates the need for overdesign to account for worst-case assumptions
  • Failure to design under all worst-case assumptions can and will lead to hardware misbehavior
  • Two main types of failures:
    • Logic level: violation of timing constraints causes erroneous computations and control-plane failures
    • Memory: data is lost or not properly stored

SLIDE 4

Sources of variability:

  • Static components: random dopant fluctuation, process variations, line-edge roughness
  • Dynamic/runtime factors: voltage (Vdd) variation, thermal effects, data dependencies
  • Wearout/aging: NBTI
  • Single event upsets: the only errors that are truly random (intentionally not covered in this talk)

SLIDE 5
  • Non-ergodic behavior renders the analysis of circuits under variations difficult: averaging requires great care

(Figure: failure types over product life, from manufacturing failures through runtime/dynamic failures [time scale: seconds] to wearout failures [time scale: years].)

Manufacturing failures (die-to-die and within-die variations)
  • Each die is an individual realization of a random process
  • Parameters are fixed after manufacturing

Runtime/dynamic failures
  • Behavior of each circuit is mostly deterministic and on a short time scale
  • “Randomness” is due to random data and model uncertainty
  • Averaging is only meaningful with truly random input

Wearout failures
  • Aging is a slow process
  • Parameters change on a long time scale
  • A long-term average is meaningless

SLIDE 6
  • Predicting the exact timing of a circuit is almost impossible, even if all factors are precisely known
  • Predicting the consequences of a timing failure at any one or multiple points is even harder today
  • Different instances of the same circuit behave very differently
  • Despite this high sensitivity, the behavior of each circuit instance is, unfortunately, also deterministic

SLIDE 7

Quality (SNR) degradation of different adders under frequency over-scaling.

Some key observations:

  • The transition region of graceful quality degradation is small
  • Better architectures are also more sensitive to errors (smaller transition region)

SLIDE 8

Objective: exploit timing margins in low-power processors

  • Error-detection sequentials measure timing margins in all pipeline stages
  • Cycle-by-cycle adjustable clock generator
  • Processor state determines the instantaneous clock period

Critical Range Optimization in OpenRISC

Opportunity:
  • +38% speedup
  • 24% power consumption

J. Constantin, et al., “Exploiting dynamic timing margins in microprocessors for frequency-over-scaling with instruction-based clock adjustment”, DATE 2015
SLIDE 9

Graphs removed since unpublished. Summary of main points:

  • Under timing violations without additional sources of randomness, there is a sudden transition between fault-free operation and 100% failure beyond the static timing limit
  • When adding uncertainty by means of supply-voltage noise, we get a transition region between functional operation and full failure
  • Unfortunately, the transition region is rather small (e.g., 50 MHz at a clock of ~700 MHz)

SLIDE 10

New paradigm: allow for graceful performance degradation

  • Approximate computing
  • Scalable algorithms
  • Stochastic computing
  • Application/algorithm-level fault tolerance

(Figure: histogram of path-delay occurrences at nominal and low VDD against the target delay; execution time versus quality with a task deadline.)

Consideration of the application level provides additional scalability: graceful performance degradation.

Application to communications: iterative algorithms adjust to process variations.

SLIDE 11
  • Memories account for the bulk of leakage and active power consumption
  • There is a clear relationship between savings (in area and power) and the amount of errors we expect
  • Errors can easily be located and associated with individual variables or quantities at higher abstraction levels
  • Important variables can be protected against errors
  • The impact of errors is easy to model accurately and can be propagated well through the stack and other abstraction levels

SLIDE 12

Application of unreliable memories to forward error correction decoders (HSPA+ system, transmitter shown in figure)

  • Study of the inherent fault tolerance of wireless systems: the system tolerates a surprisingly high number of defects in costly memories
  • Compact “better-than-worst-case” memory design for fault-tolerant applications
  • Memories with graceful performance degradation
  • Average-case refresh (figure annotations: retention time tret, 50x)

1. A. Teman, et al., “Energy versus data integrity trade-offs in embedded high-density logic compatible dynamic memories”, DATE 2015
2. G. Karakonstantis, et al., “On the exploitation of the inherent error resilience of wireless systems under unreliable silicon”, DAC 2012
3. P. Meinerzhagen, et al., “Refresh-Free Dynamic Standard-Cell Based Memories: Application to QC-LDPC Decoder”, ISCAS 2015

SLIDE 13

Controlled errors with a modified test criterion

  • Conventional yield criterion: accept only dies with no errors
  • Modified yield criterion: accept dies with fewer than N errors

(Figure: distributions of bit errors per die (<5, <100, >100 errors across 80%, 40%, 20% of dies) at nominal and reduced VDD, with the resulting yields: 80% (OK) and 90% (high) at nominal VDD, 60% (too low) and 80% (OK) at reduced VDD for the conventional and modified criteria, respectively. Moving toward low-power operation causes yield loss under the conventional criterion.)

  • Improves yield for a given power/quality metric
  • Keeps the yield under more stringent power constraints (see the sketch below)
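A minimal sketch of the two test criteria, assuming per-die bit-error counts are available from a (hypothetical) memory test; the error counts and the threshold N = 100 are illustrative, not the measured distributions from the slide.

/* The two yield criteria on a set of hypothetical per-die bit-error counts:
 * a die passes the conventional criterion only if it is error-free, and the
 * modified criterion if it has fewer than N errors. Numbers are illustrative. */
#include <stdio.h>

static double yield_percent(const int *errors_per_die, int dies, int max_errors)
{
    int accepted = 0;
    for (int i = 0; i < dies; i++)
        if (errors_per_die[i] < max_errors)
            accepted++;
    return 100.0 * accepted / dies;
}

int main(void)
{
    /* Illustrative bit-error counts for 10 dies tested at a reduced supply voltage. */
    int errors[10] = {0, 0, 0, 0, 0, 0, 2, 7, 40, 500};
    printf("conventional (no errors): %.0f%%\n", yield_percent(errors, 10, 1));
    printf("modified (< 100 errors):  %.0f%%\n", yield_percent(errors, 10, 100));
    return 0;
}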
SLIDE 14

Problem:

  • Each manufactured die is subject to a specific error pattern (number of errors and error locations)
  • The impact on quality depends strongly on the number of errors and on the error location (word and bit location)

Non-ergodicity invalidates quality assessment across dies. Impact on the quality distribution:

  • Some chips with fewer than N errors work perfectly, others fail miserably

(Figure: different instances of the same memory, bit positions from LSB to MSB. A few errors in MSBs and many errors in LSBs have very different performance impact; see the sketch below.)
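A small worked example of why the error location matters, assuming a 32-bit two's-complement representation: flipping bit k changes the stored value by 2^k, so an MSB-side fault is orders of magnitude more damaging than an LSB-side one. The stored value and bit positions are illustrative.

/* Why the bit location matters: flipping bit k of a two's-complement integer
 * changes its value by 2^k, so an MSB-side error is far more damaging than an
 * LSB-side one. The stored value and bit positions are illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int32_t flip_bit(int32_t word, unsigned bit)
{
    return (int32_t)((uint32_t)word ^ (UINT32_C(1) << bit));
}

int main(void)
{
    int32_t stored  = 1000;                  /* value held in the memory word */
    int32_t lsb_err = flip_bit(stored, 1);   /* fault near the LSB            */
    int32_t msb_err = flip_bit(stored, 29);  /* fault near the MSB            */
    printf("LSB-side error: %d (off by %d)\n", lsb_err, abs(lsb_err - stored));
    printf("MSB-side error: %d (off by %d)\n", msb_err, abs(msb_err - stored));
    return 0;
}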

SLIDE 15
  • Binning based on the specific error pattern is not feasible due to too many different patterns (predicting the impact of each pattern on quality during test is impossible)
  • Proper test criteria are hard to define, and ensuring consistent quality is difficult

Solution: ensure that all chips with a given number of errors have the same average quality

  • The average behavior over time must be independent of the physical error location
  • Add logic to memories to change the mapping between logical and physical locations (a small sketch follows below)

(Figure: physical-to-logical bit/address mapping over time/algorithm iterations. Physical failures remain in the same location, but the logical bit failures wander around in the memory; quality changes with each application of the algorithm, i.e., errors are averaged out over time.)
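A minimal sketch of such a remapping, assuming a simple per-iteration rotation of the logical-to-physical bit assignment; the rotation scheme, word width and stuck-at fault model are illustrative, not the actual circuit from the talk.

/* Rotating logical-to-physical bit mapping sketch: the physical faulty cell
 * stays put, but rotating the bit assignment every algorithm iteration makes
 * the fault hit a different logical bit each time, so its impact averages
 * out over time. The rotation scheme and fault model are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define WORD_BITS 16u

/* Map a logical bit position to a physical one using a per-iteration rotation. */
static unsigned log2phys(unsigned logical_bit, unsigned iteration)
{
    return (logical_bit + iteration) % WORD_BITS;
}

static uint16_t mem_write(uint16_t value, unsigned iteration, uint16_t stuck_at_zero_mask)
{
    uint16_t physical = 0;
    for (unsigned b = 0; b < WORD_BITS; b++)
        if (value & (1u << b))
            physical |= (uint16_t)(1u << log2phys(b, iteration));
    return physical & (uint16_t)~stuck_at_zero_mask;   /* faulty cells force 0 */
}

static uint16_t mem_read(uint16_t physical, unsigned iteration)
{
    uint16_t value = 0;
    for (unsigned b = 0; b < WORD_BITS; b++)
        if (physical & (1u << log2phys(b, iteration)))
            value |= (uint16_t)(1u << b);
    return value;
}

int main(void)
{
    uint16_t fault = (uint16_t)(1u << 14);   /* one physical cell (bit 14) stuck at 0 */
    for (unsigned it = 0; it < 4; it++) {
        uint16_t rd = mem_read(mem_write(0xFFFFu, it, fault), it);
        printf("iteration %u: read 0x%04X (logical error in bit %u)\n",
               it, (unsigned)rd, (14u + WORD_BITS - it) % WORD_BITS);
    }
    return 0;
}

Because the faulty physical cell hits a different logical bit in every iteration, the long-run average error becomes the same for every die with a given number of faults, independent of where the faults physically sit.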
SLIDE 16

Best-effort statistical data correction; data representations for unreliable memories

C. Roth, et al., “Statistical data correction for unreliable memories”, Asilomar 2014
C. Roth, et al., “Data mapping for unreliable memories”, Allerton Conference on Communication, Control, and Computing, 2012

SLIDE 17

Idea: Identify failing bit locations during runtime and store bits of lower significance (LSB) in those locations

S. Ganapathy, et al., “Mitigating the Impact of Faults in Unreliable Memories for Error Resilient Applications”, DAC 2015
SLIDE 18
Bit Shuffling Mechanism

  • Identify failing bits in a memory word at run-time
  • Use a shifter to store the bits of lower significance (LSBs) in those locations
  • Shuffling can be performed at varying levels of granularity:
    • On a per-bit basis, where the failing bit always stores the LSB
    • On a segment basis, where groups of bits are shifted
  • Helps trade off area and power against output quality (a sketch of the per-bit variant follows below)
  • Magnitude of the error computed for a 32-bit integer in 2’s complement mode (figure annotations: 2n, m segments/word)
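A minimal sketch of the per-bit shuffling variant, assuming a single known failing (stuck-at-0) cell per word; a software bit swap stands in for the hardware shifter, and all names and values are illustrative.

/* Per-bit shuffling sketch: if physical bit position fault_pos of a word is
 * known to fail (modeled here as stuck-at-0), swap it with the LSB before
 * writing so the faulty cell only ever holds the least significant bit.
 * Purely illustrative; the real mechanism uses a hardware shifter. */
#include <stdint.h>
#include <stdio.h>

static uint32_t swap_bits(uint32_t w, unsigned i, unsigned j)
{
    uint32_t diff = ((w >> i) ^ (w >> j)) & 1u;   /* 1 iff the two bits differ */
    return w ^ (diff << i) ^ (diff << j);
}

/* Write path: move the LSB into the faulty position, then model the fault. */
static uint32_t write_shuffled(uint32_t value, unsigned fault_pos)
{
    return swap_bits(value, 0, fault_pos) & ~(1u << fault_pos);
}

/* Read path: undo the shuffle. */
static uint32_t read_shuffled(uint32_t stored, unsigned fault_pos)
{
    return swap_bits(stored, 0, fault_pos);
}

int main(void)
{
    uint32_t v = 0xCAFEBABFu;
    unsigned fault_pos = 30;                          /* an MSB-side cell fails */
    uint32_t plain = v & ~(1u << fault_pos);          /* without shuffling      */
    uint32_t shuf  = read_shuffled(write_shuffled(v, fault_pos), fault_pos);
    printf("original:    0x%08X\n", (unsigned)v);
    printf("no shuffle:  0x%08X (error %u)\n", (unsigned)plain, (unsigned)(v - plain));
    printf("shuffled:    0x%08X (error %u)\n", (unsigned)shuf, (unsigned)(v - shuf));
    return 0;
}

With shuffling, the worst-case error per word is bounded by the weight of the LSB instead of the weight of whichever bit happens to sit on the faulty cell.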

SLIDE 19
  • Priority ECC design: the most significant 16 bits of a 32-bit word are protected with a (22,16) SECDED ECC (sketch below)
  • Compared to a full (39,32) SECDED ECC, this reduces power, area and latency overhead by as much as 83%, 89% and 77%, respectively
  • For the 3 evaluated applications (Elasticnet, PCA and KNN), output quality is within 10%, 0.2% and 7% of a fault-free memory with SECDED ECC
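A minimal sketch of the priority ECC organization, assuming a generic (22,16) extended Hamming SECDED code built with the textbook construction; this is not the encoder of the referenced design, only an illustration of protecting the upper 16 bits of a 32-bit word while leaving the lower 16 bits unprotected.

/* Priority ECC sketch: only the upper 16 bits of a 32-bit word are covered by
 * a (22,16) SECDED code (Hamming(21,16) plus one overall parity bit); the
 * lower 16 bits are stored unprotected. Generic textbook construction. */
#include <stdint.h>
#include <stdio.h>

/* Encode 16 data bits into codeword bits 1..21 plus overall parity in bit 0.
 * Positions 1, 2, 4, 8, 16 hold the Hamming check bits. */
static uint32_t secded22_encode(uint16_t data)
{
    uint32_t cw = 0;
    for (int pos = 1, d = 0; pos <= 21; pos++) {
        if ((pos & (pos - 1)) == 0) continue;         /* skip check positions */
        if (data & (1u << d++)) cw |= 1u << pos;
    }
    for (int p = 1; p <= 16; p <<= 1) {               /* Hamming check bits   */
        unsigned parity = 0;
        for (int pos = 1; pos <= 21; pos++)
            if (pos & p) parity ^= (cw >> pos) & 1u;
        if (parity) cw |= 1u << p;
    }
    unsigned all = 0;                                 /* overall parity bit   */
    for (int pos = 1; pos <= 21; pos++) all ^= (cw >> pos) & 1u;
    return cw | all;
}

/* Decode, correcting a single bit error (nonzero syndrome with odd overall
 * parity); a nonzero syndrome with even parity would flag a double error. */
static uint16_t secded22_decode(uint32_t cw)
{
    unsigned syndrome = 0, all = 0;
    for (int p = 1; p <= 16; p <<= 1) {
        unsigned parity = 0;
        for (int pos = 1; pos <= 21; pos++)
            if (pos & p) parity ^= (cw >> pos) & 1u;
        if (parity) syndrome |= (unsigned)p;
    }
    for (int pos = 0; pos <= 21; pos++) all ^= (cw >> pos) & 1u;
    if (syndrome != 0 && all) cw ^= 1u << syndrome;
    uint16_t data = 0;
    for (int pos = 1, d = 0; pos <= 21; pos++) {
        if ((pos & (pos - 1)) == 0) continue;
        if (cw & (1u << pos)) data |= (uint16_t)(1u << d);
        d++;
    }
    return data;
}

/* A stored 32-bit value: protected MSB half plus raw LSB half. */
typedef struct { uint32_t msb_cw; uint16_t lsb_raw; } prot_word_t;

static prot_word_t prot_store(uint32_t value)
{
    prot_word_t w = { secded22_encode((uint16_t)(value >> 16)), (uint16_t)value };
    return w;
}

static uint32_t prot_load(prot_word_t w)
{
    return ((uint32_t)secded22_decode(w.msb_cw) << 16) | w.lsb_raw;
}

int main(void)
{
    prot_word_t w = prot_store(0xDEADBEEFu);
    w.msb_cw ^= 1u << 13;      /* inject a single bit-flip into the MSB half */
    printf("recovered: 0x%08X\n", (unsigned)prot_load(w));  /* 0xDEADBEEF */
    return 0;
}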

SLIDE 20
  • Need to distinguish between different types of errors
  • An accurate model for some errors may be equally important to have, yet it is difficult to obtain
  • A killing factor in today’s hardware is the small transition region between 100% functional operation and complete failure
  • We may need new hardware structures that ensure graceful performance degradation
  • Accepting errors in memories is currently the most promising and best understood approach