Parallel Streaming Computation on Error-Prone Processors

SLIDE 1

Parallel Streaming Computation on Error-Prone Processors

Yavuz Yetim, Margaret Martonosi, Sharad Malik

SLIDE 2

Hardware Errors on the Rise

Soft Errors Due to Cosmic Rays [Sierawski et al., 2011]
[Plot: Upsets/B muons/Mb vs. Technology Node (nm)]

Random Process Variation [Khun et al., 2011]
[Plot: Average Number of Dopant Atoms vs. Technology Node (nm)]

SLIDE 3

Traditional Solutions

  • Higher latencies or voltage margins
    [Plot: PDF of Delay; Norm Number of Dies vs. Delay (ps), with the reliable region marked]
  • Redundancy
    – Processor 1 and Processor 2 with input replication and output check (up to 100%)
    – Memory subsystem with ECC
      SECDED: 1-cycle latency, ~10k gates
      4EC5ED: 14-cycle latency, ~100k gates

High Power, Performance and Area Overhead

SLIDE 4

Architectures for Error-Prone Computing

EnerJ [Sampson et al., 2011]

ERSA [Leem et al., 2010]

Reliable core & memory, main thread:

  • algorithmic control
  • worker thread error handling

Unreliable core & memory, worker thread:

  • do-all unit
  • restarted on error

Flikker [Liu et al., 2011]

  • Reliable memory
  • Unreliable memory

critical int x; int y;

Diagram labels: Processor / Memory; Unreliable execution unit / register / memory; Reliable execution unit / register / memory; Instruction / Data tolerant

Legend: Reliable, Unreliable

*

SLIDE 5

To Minimal Reliable Hardware

Error-tolerant application on an error-prone processor

Output:

  • Crashes due to memory errors
  • Hangs due to control-flow errors

SLIDE 6

To Minimal Reliable Hardware

Error-tolerant application on an error-prone processor

Output:

  • Crashes due to memory errors
  • Hangs due to control-flow errors

StreamIt programming model + memory segmentation

Filter 1, Filter 2, Filter 3, Filter 4

Control-flow with scopes:

  • Known run-times of modular control-flow regions determine timeout limits
  • Coarse-grain sequencing of computation

Memory (regions with R/W/X permissions):

  • Only permitted accesses are allowed; others are dropped

SLIDE 7

To Minimal Reliable Hardware

Error-tolerant application on an error-prone processor

Output:

  • Crashes due to memory errors
  • Hangs due to control-flow errors

Error-tolerant application on an error-prone processor + coarse-grain control-flow, memory, I/O management*

Output: Graceful quality degradation with errors

*Extracting Useful Computation From Error-Prone Processors [Yetim et al, 2013]

SLIDE 8

Communication Errors For Parallel Streaming Applications

Error-tolerant application on multiple processing nodes with single-threaded protection

Output: Unacceptable quality

This work

  • Communication errors
    – Unrecoverable corruption of the communication mechanism
    – Data misalignment among producer/consumer threads
  • CommGuard
    – Application-level communication information
    – Low overhead recovery from communication errors

SLIDE 9

Outline

  • Motivation
  • Communication Errors in Parallel Streaming Applications

  • CommGuard System Overview
  • Experimental Methodology and Results
  • Conclusions

SLIDE 10

Communication Errors: Transmission Failure

Producer → push → Concurrent Software Queue → pop → Consumer

Concurrent software queue:

  • List of free pointers
  • List of data pointers
  • Locks
  • State shared by both ends
  • State retained throughout computation

Corruption in lists, pointers and locks is permanent
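
To make the point about retained queue state concrete, here is a minimal sketch, assuming a toy ring buffer in place of the concurrent software queue. The class `SoftwareQueue`, the function `run`, and the single-bit flip in the head index are all hypothetical illustrations, not the queue used in the talk.

```python
class SoftwareQueue:
    """Toy ring buffer whose head/tail indices live in error-prone memory."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0  # next slot to pop
        self.tail = 0  # next slot to push

    def push(self, item):
        self.buf[self.tail % len(self.buf)] = item
        self.tail += 1

    def pop(self):
        item = self.buf[self.head % len(self.buf)]
        self.head += 1
        return item


def run(corrupt_at=None):
    """Push and pop six items; optionally flip one bit of the head index."""
    q = SoftwareQueue(capacity=8)
    received = []
    for i in range(6):
        q.push(i)
        if i == corrupt_at:
            q.head ^= 1 << 1  # hypothetical single-bit error in queue state
        received.append(q.pop())
    return received


print(run())              # every item arrives in order
print(run(corrupt_at=1))  # after the flip, no later item is delivered
```

Because the head index is kept for the whole computation, one flipped bit leaves the consumer reading the wrong slots from then on; nothing in the queue itself restores the index.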

SLIDE 11

Communication Errors: Transmission Failure

Producer → push → Error-free Hardware Queue → pop → Consumer

  • Data items are flowing
  • Image is not coherent

SLIDE 12

Communication Errors: Misalignment I

Producer(): push R; push G; push B;
Consumer(): pop R; pop G; pop B;

Error-free Hardware Queue: G R B R B R

Misalignment due to a control-flow error is permanent
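
The effect can be reproduced with a small sketch, assuming a control-flow error makes the producer skip exactly one push. The functions `producer` and `consumer` and the `skip_at` parameter are hypothetical names used only for this illustration; the queue itself is error-free.

```python
from collections import deque


def producer(num_pixels, skip_at=None):
    """Push R, G, B per pixel; skip one push to mimic a control-flow error."""
    q = deque()
    n = 0
    for p in range(num_pixels):
        for ch in "RGB":
            if n != skip_at:
                q.append((ch, p))
            n += 1
    return q


def consumer(q):
    """Pop three items per pixel and record which channel tags arrived."""
    pixels = []
    while len(q) >= 3:
        pixels.append("".join(q.popleft()[0] for _ in range(3)))
    return pixels


print(consumer(producer(3)))             # ['RGB', 'RGB', 'RGB']
print(consumer(producer(4, skip_at=1)))  # ['RBR', 'GBR', 'GBR']
```

After the single skipped push, every later pixel is assembled from the wrong channels, even though no further error occurs.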

SLIDE 13

Communication Errors: Misalignment II

Producer R, Producer G, Producer B → Join

Diagram labels: P[0:63], R[0:63], G[0:63], B[0:63], P[64:127], R[64:127], G[64:127], B[64:127], R[128:191], G[192:255], B[128:191]

Misalignment at join nodes is also permanent

SLIDE 14

Outline

  • Motivation
  • Communication Errors in Parallel Streaming Applications

  • CommGuard System Overview
  • Experimental Methodology and Results
  • Conclusions

SLIDE 15

CommGuard Overview

Producer → Consumer: items are grouped into iterations, delimited by iteration markers

  • Expecting item, received marker: PAD
  • Expecting marker, received item: DISCARD
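
The two rules can be sketched as a small consumer-side checker. This is only an illustration under stated assumptions: the sentinel `MARKER`, the padding value `PAD_VALUE`, and the function `checked_iterations` are hypothetical, markers are assumed to close each iteration and to arrive intact, and the slides do not say what value is used for padding.

```python
MARKER = object()  # hypothetical sentinel standing in for an iteration marker
PAD_VALUE = 0      # assumed padding value; not specified in the slides


def checked_iterations(stream, items_per_iter):
    """Yield lists of exactly items_per_iter items, re-aligning on markers."""
    current = []
    for tok in stream:
        if tok is MARKER:
            # Expecting item, received marker: PAD the missing items.
            current += [PAD_VALUE] * (items_per_iter - len(current))
            yield current
            current = []
        elif len(current) < items_per_iter:
            current.append(tok)
        # Otherwise: expecting marker, received item, so DISCARD it.


stream = [1, 2, 3, MARKER,      # correct iteration
          4, 6, MARKER,         # one item lost
          7, 8, 9, 9, MARKER]   # one item extra
print(list(checked_iterations(stream, 3)))
```

Each iteration is repaired locally, so a lost or extra item damages one iteration instead of shifting everything that follows.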

SLIDE 16

CommGuard Overview

Split and join nodes, each with a local iteration counter

At a join, for all incoming edges:

  • If items missing: PAD
  • If items extra: DISCARD
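
A join that follows these rules might look like the sketch below. It assumes each incoming edge delivers frames tagged with an iteration id, which is an assumption made for illustration; the function `join` and its parameters are hypothetical and not taken from the CommGuard design.

```python
def join(edges, num_iters, items_per_iter, pad=0):
    """Round-robin join aligning every incoming edge to a local counter.

    Each edge is a list of (iteration_id, items) frames in increasing id order.
    """
    pos = [0] * len(edges)
    out = []
    for it in range(num_iters):  # local iteration counter
        for e, frames in enumerate(edges):
            # Items extra: DISCARD frames that belong to earlier iterations.
            while pos[e] < len(frames) and frames[pos[e]][0] < it:
                pos[e] += 1
            if pos[e] < len(frames) and frames[pos[e]][0] == it:
                items = list(frames[pos[e]][1])[:items_per_iter]
                pos[e] += 1
            else:
                items = []  # the frame for this iteration never arrived
            # Items missing: PAD up to the expected count.
            out.append(items + [pad] * (items_per_iter - len(items)))
    return out


r = [(0, ["r0"]), (1, ["r1"]), (2, ["r2"])]
g = [(0, ["g0"]), (2, ["g2"])]                            # frame 1 lost
b = [(0, ["b0"]), (0, ["bX"]), (1, ["b1"]), (2, ["b2"])]  # stale extra frame
print(join([r, g, b], num_iters=3, items_per_iter=1))
```

The lost frame on edge `g` is padded and the stale frame on edge `b` is discarded, so iteration 2 is joined from matching frames again.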

SLIDE 17

CommGuard System Overview

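Unreliable Producer → Frame Inserter → Hardware Queue → Frame Checker → Unreliable Consumer

  • Producer to Frame Inserter: Push, New iteration
  • Frame Inserter to Hardware Queue: Header, Item
  • Hardware Queue to Frame Checker: Header, Item
  • Consumer to Frame Checker: Pop, New iteration
  • Frame Checker actions: Pad, Discard, Pad & Discard, Stall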

SLIDE 18

Outline

  • Motivation
  • Communication Errors in Parallel Streaming Applications

  • CommGuard System Overview
  • Experimental Methodology and Results
  • Conclusions

SLIDE 19

Experimental Methodology

  • Built on prior simulation infrastructure by [Yetim et al, DATE 2013]
    – Virtutech Simics modeling 32-bit Intel x86
    – Error injection capabilities
    – Protection modules for sequential streaming applications
    – Architecturally visible errors following distribution with given mean time between errors (MTBE)
      • Pick error injection cycle
      • Pick random register, pick random bit
      • Flip bit, repeat
  • Extensions for multi-core simulation
    – Monitor scheduling of selected threads
    – Pin threads to processor cores
    – Per-core error injection
    – Protection modules implemented for every core
  • Modeled frame checker and frame inserter
  • JPEG Decoder as a streaming application
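
The injection loop (pick a cycle, pick a register, pick a bit, flip, repeat) can be sketched as follows. The slides do not name the distribution, so exponentially distributed gaps with mean MTBE are an assumption here; `inject_errors` and its parameters are hypothetical and unrelated to the Simics-based infrastructure.

```python
import random


def inject_errors(registers, total_cycles, mtbe, rng, width=32):
    """Flip one random bit of one random register at each sampled cycle."""
    regs = list(registers)
    log = []
    # Pick error injection cycle (assumed: exponential gaps with mean mtbe).
    cycle = rng.expovariate(1.0 / mtbe)
    while cycle < total_cycles:
        r = rng.randrange(len(regs))  # pick random register
        b = rng.randrange(width)      # pick random bit
        regs[r] ^= 1 << b             # flip bit
        log.append((int(cycle), r, b))
        cycle += rng.expovariate(1.0 / mtbe)  # repeat
    return regs, log


regs, log = inject_errors([0] * 8, total_cycles=10_000, mtbe=1_000,
                          rng=random.Random(0))
print(len(log), "bit flips injected")
```

Passing an explicit `random.Random` instance keeps a run reproducible, and giving each core its own instance is one way to mimic per-core injection.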

SLIDE 20

Output at Different Error Rates

  • Output quality restored after misalignment through CommGuard

  • Graceful output degradation with increasing errors

SLIDE 21

Run-time Overhead Due to Stalls

  • Run-time increases due to stalls caused by misalignments
  • Overhead is only 2% even at high error rates

SLIDE 22

Amount of Padding

  • Padding to resolve misalignments is observed even at low error rates

SLIDE 23

Outline

  • Motivation
  • Communication Errors in Parallel Streaming Applications

  • CommGuard System Overview
  • Experimental Methodology and Results
  • Conclusions

SLIDE 24

Conclusions

  • Communication in parallel applications adds fragility
    – Error-prone communication subsystem
    – Data misalignments due to asynchronous threads
  • Explicit communication & control-flow can be used
    – Encapsulate coarse-grain data units
    – Use small checker circuitry to recover from communication errors
  • Low overhead solutions to sustain quality
    – Only ~150 B of reliable state per core and less than 2% run-time overhead even at high error rate
    – 16 dB can be sustained for errors as frequent as every 1 ms

SLIDE 25

Parallel Streaming Computation on Error-Prone Processors

Yavuz Yetim, Margaret Martonosi, Sharad Malik

SLIDE 26

Backup Slides

SLIDE 27

Suitably Error Tolerant

SLIDE 28

Frame Checker FSM

SLIDE 29

Avoid Running Indefinitely

  • Regular execution: Program runs Loop 1, then Loop 2
  • Indefinite run due to errors: a loop runs too long
  • Divide program into regions with time limits: Scope 1 (Loop 1), Scope 2 (Loop 2); too long, break
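
A minimal sketch of the idea, assuming each region's time limit is expressed as a step budget. The helpers `run_scope`, `make_loop` and `run_program` are hypothetical names, and the corrupted loop counter is a stand-in for a control-flow error.

```python
def run_scope(step, budget):
    """Run step() until it reports completion or the budget is spent.

    Returns True if the region finished on its own, False if it was cut off.
    """
    for _ in range(budget):
        if step():
            return True
    return False  # too long: break out and move on to the next scope


def make_loop(n, corrupt=False):
    """Build a loop body that finishes after n steps unless corrupted."""
    state = {"i": 0}

    def step():
        state["i"] += 1
        if corrupt:
            state["i"] = 0  # mimic an error that keeps resetting the counter
        return state["i"] >= n

    return step


def run_program(scopes):
    """scopes: list of (step, budget); every scope is entered once, in order."""
    return [run_scope(step, budget) for step, budget in scopes]


print(run_program([(make_loop(5), 10),
                   (make_loop(5, corrupt=True), 10),
                   (make_loop(3), 10)]))
```

The corrupted loop is cut off when its budget runs out, and the scope after it still executes, so the program as a whole cannot run indefinitely.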

SLIDE 30

Disallowed Memory Accesses

  • Regular execution: memory regions with R/X and R/W permissions
  • Crash due to errors: a disallowed access (X or W) crashes the program
  • Suppress crashes: on a disallowed access (X or W), don't crash, bump PC
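
As an illustration only, the drop-and-continue behavior can be sketched with a toy interpreter. The region layout in `REGIONS`, the helper `allowed`, and the `(op, addr, value)` program format are all hypothetical and not part of the described hardware.

```python
REGIONS = [  # hypothetical layout: (start, end, permissions)
    (0x0000, 0x1000, "RX"),  # code
    (0x1000, 0x2000, "RW"),  # data
]


def allowed(addr, kind):
    """kind is 'R', 'W' or 'X'; True if a region covering addr grants it."""
    return any(lo <= addr < hi and kind in perms for lo, hi, perms in REGIONS)


def execute(program, memory):
    """program: list of (op, addr, value) with op in {'load', 'store'}."""
    pc, loaded = 0, []
    while pc < len(program):
        op, addr, value = program[pc]
        if op == "store" and allowed(addr, "W"):
            memory[addr] = value
        elif op == "load" and allowed(addr, "R"):
            loaded.append(memory.get(addr, 0))
        # Disallowed access: do not crash; drop it and bump the PC.
        pc += 1
    return loaded


mem = {}
prog = [("store", 0x1004, 7),
        ("store", 0x0004, 9),     # write into the R/X region: dropped
        ("load", 0x1004, None),
        ("load", 0x9000, None)]   # outside every region: dropped
print(execute(prog, mem), mem)
```

The two disallowed accesses have no effect and execution continues with the next instruction, which is the behavior the slide summarizes as bumping the PC.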

SLIDE 31

Overall Design

  • MIS: Coarse-grained control flow constraints and recovery
  • MFU: Coarse-grained constraints on memory accesses
  • Streamed I/O: Manages bounded data streams

SLIDE 32

Communication Errors: Single-threaded

Toy producer-consumer streaming application: Producer (push 16), Consumer (pop 64)

Core 0: P P P P C ...

Statically allocated 64-item buffer

  • Static location is preserved in reliable I-Cache throughout the computation
  • Every new [P] or [C] iteration recovers the pointer values
  • Communication never halts indefinitely

SLIDE 33

Shared State

  • The inserter and the checker need to keep state to operate
  • State below is shared by every inserter and checker belonging to a node

Value | Details | (S)tatic or (D)ynamic
Firing per frame | How many times a node needs to fire before the computation starts for the next frame | S
Frame limit | Number of total frames the application needs to process | S
Active frame | How many frames have been processed so far | D
Active firing | How many times the node has fired for the active frame | D

SLIDE 34

Additional Frame Checker State

State | Details | (E)rroneous or (N)ormal
Receiving items | Node is receiving items for the active frame | N
Expecting a header | Node has started a new frame computationally, hence the next item in the queue should be a header | N
Discarding | The computation in the node is ahead of the communication of the edge | E
Padding | The communication of the edge is ahead of the computation in the node | E

SLIDE 35

CommGuard Placement

Previous → FC → Filter → FI → Next (FC: frame checker, FI: frame inserter)

SLIDE 36

Output Quality For Varying MTBEs

  • Compare lossy compression to error-prone decompression
  • For raw image file I, encoded file E and decoded files F or P:
    – Raw Image, via Compression, gives Compressed Image
    – Compressed Image, decompressed error-free, gives Decompressed Image. Baseline: Error-free SNR
    – Compressed Image, decompressed error-prone, gives Decompressed Image. Ours: Error-prone SNR
  • This study was performed for MP3 and JPEG decoder benchmarks
    – Widely used
    – Full-runs
    – Each experimental setting: 10 times