SLIDE 1

Architectural Methods to Understand Soft Errors/ Process Variations in DSN 2012

Jun YAO Nara Institute of Science and Technology

yaojun@is.naist.jp

SLIDE 2

Brief Introduction of DSN’12

• DSN’12
  • The 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks
  • Two symposia combined into one conference
    • PDS: Performance and Dependability Symposium (performance, dependability, and security)
    • DCCS: Dependable Computing and Communication Systems (dependability and security)

Nara Institute of Science & Technology, YAO, Jun, 2012/8/28

SERConf 2012@Fukuoka

SLIDE 3

Fields of Papers

• DSN/PDS
  • 24 papers accepted (acceptance rate: 30%).
  • 3 of them (12.5%) relate to processor architecture or lower levels.
• DSN/DCCS
  • 27 papers accepted (acceptance rate: 17.3%).
• ISCA 2012: acceptance rate 18%.
• Impression:
  • Far more software than hardware papers.
  • Security and availability topics are preferred.
  • Should try PDS (rejected by DCCS).

SLIDE 4

Papers of Interest

• "Understanding Soft Error Propagation Using Efficient Vulnerability-Driven Fault Injection" (PDS)
  • Xin Xu and Man-Lap Li, George Washington University
  • doi: 10.1109/DSN.2012.6263923
  • Purpose:
    • Inject errors effectively during simulation/validation
    • Understand the outcomes of error injection

SLIDE 5

Soft Errors in Microprocessors

• Causes: particle strikes, radiation, …
• Consequences:
  • Abort: system crash or hang, or the application exits abnormally
  • Silent data corruption (SDC): the application does not abort but produces wrong output
  • Masked: the fault is not visible (> 90%)
    ✔ Lowers the cost of detection and protection
    ✕ Bad for evaluation
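
The three consequences above can be expressed as a small classifier over fault-injection runs. This is only a minimal sketch: the names `Outcome` and `classify_run`, and the idea of comparing against a fault-free "golden" output, are assumptions for illustration and are not taken from the slides or the paper.

```python
from enum import Enum


class Outcome(Enum):
    ABORT = "abort"    # crash, hang, or abnormal exit
    SDC = "sdc"        # run finished, but the output is wrong
    MASKED = "masked"  # run finished with the correct output


def classify_run(golden_output, faulty_output, crashed_or_hung):
    """Classify one injected run against a fault-free (golden) run.

    All three parameters are hypothetical inputs supplied by a test harness.
    """
    if crashed_or_hung:
        return Outcome.ABORT
    if faulty_output != golden_output:
        return Outcome.SDC
    return Outcome.MASKED


print(classify_run("42", "41", crashed_or_hung=False))
```

A harness would call `classify_run` once per injected fault and tally the three outcomes.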

SLIDE 6

Some data from Yao’s research

[Figure: DMR processor DARA. Two redundant pipelines with stages IF, ID, RR, EX, MA, WB and comparators (=?) between them.]

• Alpha source ➜ fault injection rate: 0.58 FF/sec in DARA 0.46
• Architectural vulnerability factor (AVF)
  • out = in & 0x3 ➜ out is sensitive to in[1:0] only.
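
The masking example above can be checked by flipping each input bit in turn and observing whether the output changes. This is a minimal sketch; the helper name `sensitive_bits` and the 8-bit input width are assumptions made for illustration.

```python
def out(value):
    """The slide's example: only the two low bits reach the output."""
    return value & 0x3


def sensitive_bits(func, value, width=8):
    """Return the bit positions whose single-bit flip changes func's output."""
    golden = func(value)
    return [bit for bit in range(width)
            if func(value ^ (1 << bit)) != golden]


# Flips in bits 2..7 are masked by the AND; only bits 0 and 1 matter.
print(sensitive_bits(out, 0b10110110))
```

Under these assumptions, 6 of the 8 possible single-bit flips on the input are masked, which is the kind of injection the vulnerability-driven approach tries to avoid.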
SLIDE 7

Vulnerability-Driven Fault Injection

• Goal: reduce the number of errors injected into masked values
  • The same number of injections yields more erroneous results.
• Approach:
  • Guide error injection by vulnerability analysis (VA)
• Results:
  • Increases error occurrence by 59%.

SLIDE 8

VA: Get the CriticalFault Injection Space

Instruction trace:
  Load [R1] ➜ R2
  Add 0x1, R2 ➜ R3
  Move R4 ➜ R3
  Load [R5] ➜ R2

• First-level dynamically dead (FDD) instruction
  • The Add instruction above: its result in R3 is overwritten by the Move before anything reads it.
• Transitively dynamically dead (TDD) instruction
  • Its result is generated but consumed only by dead instructions (here the first Load, whose R2 value feeds only the dead Add).
• Remove these instructions to get the critical fault injection space.
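
The dead-instruction analysis on this trace can be sketched as one forward pass that records who reads each result, followed by a backward pass that marks dead instructions. This is a simplified illustration, not the paper's implementation: the tuple encoding of the trace, the function name `find_dead`, and the assumption that R2 and R3 are still live when the trace ends are all hypothetical. Memory side effects (stores) are ignored.

```python
# Hypothetical trace encoding: (text, destination register, source registers).
TRACE = [
    ("Load [R1] -> R2", "R2", ["R1"]),
    ("Add 0x1, R2 -> R3", "R3", ["R2"]),
    ("Move R4 -> R3", "R3", ["R4"]),
    ("Load [R5] -> R2", "R2", ["R5"]),
]


def find_dead(trace, live_out):
    """Map instruction index to "FDD" or "TDD" for dynamically dead entries."""
    last_writer = {}                   # register -> index of its latest writer
    consumers = [[] for _ in trace]    # readers of each instruction's result
    for i, (_text, dest, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in last_writer:
                consumers[last_writer[reg]].append(i)
        last_writer[dest] = i
    # Results still held in a live-out register may be read after the trace.
    live_at_end = {last_writer[reg] for reg in live_out if reg in last_writer}

    dead = {}
    for i in reversed(range(len(trace))):  # consumers always come later
        if i in live_at_end:
            continue
        if not consumers[i]:
            dead[i] = "FDD"                # result never read
        elif all(c in dead for c in consumers[i]):
            dead[i] = "TDD"                # result read only by dead code
    return dead


dead = find_dead(TRACE, live_out={"R2", "R3"})
critical = [text for i, (text, _d, _s) in enumerate(TRACE) if i not in dead]
print(dead, critical)
```

With these assumptions the Add is marked FDD, the first Load is marked TDD, and only the Move and the second Load remain in the critical injection space.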

SLIDE 9

Guided Error Injection Flow

1. Collect the instruction trace.
2. Generate the (reduced) injection map.
3. Simulation: inject errors at random, guided by the map.
4. Analyze the results (is the error visible?).

[Figure: flow diagram; labels: 29%, 49%]
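
The four steps above can be sketched as a small campaign loop. This is a hedged illustration only: `build_injection_map`, `run_campaign`, and the `run_with_fault` callback (standing in for the simulator) are hypothetical names, and uniform random sampling over the reduced map is an assumption.

```python
import random


def build_injection_map(trace_length, dead_indices):
    """Step 2: keep only instructions that vulnerability analysis did not mark dead."""
    return [i for i in range(trace_length) if i not in dead_indices]


def run_campaign(trace_length, dead_indices, run_with_fault, count, seed=0):
    """Steps 3 and 4: inject at random points from the map, count visible errors.

    run_with_fault(index) is a hypothetical simulator hook that returns True
    when the injected error is visible in the program outcome.
    """
    injection_map = build_injection_map(trace_length, dead_indices)
    if not injection_map:
        raise ValueError("injection map is empty")
    rng = random.Random(seed)          # seeded for repeatable campaigns
    visible = 0
    for _ in range(count):
        target = rng.choice(injection_map)
        if run_with_fault(target):
            visible += 1
    return visible / count


# Toy stand-in for the simulator: faults at indices 2 and 3 are always visible.
rate = run_campaign(4, {0, 1}, lambda index: index in (2, 3), count=100)
print(rate)
```

Because the dead indices are excluded before sampling, no injection is spent on instructions already known to be masked.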

SLIDE 10

Overall Error Injection Result

• CriticalFault provides 18% more error occurrences on average.
• SDC errors increase under guided injection.

SLIDE 11

More Interesting Results

• Classify injections by type
  • Three categories:
    • Faulty control: T = a > 0; if (T) goto Loop_exit;
    • Faulty address: LD [R1] ➜ R2; ST R2 ➜ [R3];
    • Faulty data: a = b + c; etc.
• Two kinds of exploration:
  • How a soft error propagates inside the processor
  • How long it remains a problem
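
One way to picture the three categories is to classify a fault by how the corrupted value is used. The rule below is an assumption for illustration and may differ from the paper's actual criteria; `classify_fault` and its arguments are hypothetical names.

```python
def classify_fault(op, corrupted_operand_role):
    """Assign an injected fault to one of the three slide categories.

    op: hypothetical opcode class, e.g. "branch", "load", "store", "alu".
    corrupted_operand_role: "condition", "address", or "value".
    """
    if op == "branch" or corrupted_operand_role == "condition":
        return "faulty control"
    if op in ("load", "store") and corrupted_operand_role == "address":
        return "faulty address"
    return "faulty data"


print(classify_fault("branch", "condition"))  # if (T) goto Loop_exit
print(classify_fault("load", "address"))      # LD [R1] -> R2 with R1 corrupted
print(classify_fault("alu", "value"))         # a = b + c
```
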
SLIDE 12

Faulty Control/Address

• 60% of faulty-control cases are not visible in the final program results.
  • ➜ Quite different from what I expected.
• 90% of address faults result in ABORT:
  • Protecting addresses is highly necessary.

SLIDE 13

Faulty Data

• The outcomes of faulty data resemble those of all faults.
  • Faulty data leads to all possible outcomes.

SLIDE 14

Lifetime of Injected Errors

• In abort cases, the control path diverges within 100 instructions.

[Figure: an erroneous branch (Err) separates the correct path from the wrong path]
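
The lifetime measurement can be sketched as the distance between the injection point and the first instruction at which the faulty run leaves the correct path. This is an illustrative sketch: `divergence_latency` and the representation of each run as a list of program-counter values are assumptions, not details from the slides.

```python
def divergence_latency(golden_pcs, faulty_pcs, inject_index):
    """Instructions executed from injection until the control paths differ.

    golden_pcs / faulty_pcs: hypothetical program-counter traces of the
    fault-free run and the injected run. Returns None if they never differ
    within the common length of the two traces.
    """
    common = min(len(golden_pcs), len(faulty_pcs))
    for i in range(inject_index, common):
        if golden_pcs[i] != faulty_pcs[i]:
            return i - inject_index
    return None


golden = [0, 4, 8, 12, 16]
faulty = [0, 4, 8, 20, 24]   # wrong path taken at the fourth instruction
print(divergence_latency(golden, faulty, inject_index=1))
```
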

SLIDE 15

Conclusion

• Gives a way to reduce the error injection space
• Shows how different instruction types respond to error injection
• My impression: balancing cost and behavior is important
  • The cost of redundancy is always high.
  • But without redundancy, we cannot trace error occurrence.

SLIDE 16

Papers of Interest

• "VARIUS-NTV: A Microarchitectural Model to Capture the Increased Sensitivity of Manycores to Process Variations at Near-Threshold Voltages" (DCCS)
  • Ulya R. Karpuzcu, Krishna B. Kolluru, Nam Sung Kim, and Josep Torrellas, UIUC
  • doi: 10.1109/DSN.2012.6263951
  • Purpose: modeling process variation
    • Estimate the effects of near-threshold voltage (NTV) operation in manycore architectures

SLIDE 17

Approaches

• Extends the existing VARIUS model with NTV support
• Download at:
  • http://iacoma.cs.uiuc.edu/varius/ntv/variusNTV.html

SLIDE 18

Evaluation Setup

• 288-core chip
  • 36 clusters, 8 cores per cluster
  • Core: single-issue, in-order
• 11 nm process

SLIDE 19

Results

• Variations in Vddmin

SLIDE 20

Variation of Frequency

• Sub-threshold zone
  • 2.3x
• NTV zone
  • 3.7x difference
• Please download it and try it.

SLIDE 21

The End
