Measurement of Timing Error Detection Performance of Software-based Error Detection Mechanisms and Its Correlation with Simulation

Yutaka Masuda, Masanori Hashimoto, Takao Onoye

Dept. of Information Systems Eng., Osaka University


SLIDE 1

Measurement of Timing Error Detection Performance of Software-based Error Detection Mechanisms and Its Correlation with Simulation

Yutaka Masuda, Masanori Hashimoto, Takao Onoye

Dept. of Information Systems Eng., Osaka University

{masuda.yutaka, hasimoto}@ist.osaka-u.ac.jp

SLIDE 2

Agenda

  • Background and objective
  • Silicon measurement
  • Correlation between silicon measurement and simulation
  • Conclusion

SLIDE 3

Challenges in Post-Silicon Validation

  • A number of tests are run, and unexpected behavior happens due to:
      – Logic bug
      – Electrical timing error (This work)
  • Long detection latency (e.g. billions of cycles): the error shows up only later as a system crash, blue screen, etc.
  • Cycles beyond the trace buffer depth are discarded, so the error cannot be recorded in the trace buffer!

To localize errors w/ trace buffer, we need to quickly detect errors!!

SLIDE 4

EDM* Trans. for quick error detection

  • E.g. EDM-L (EDM for short Latency) [1]
      – Duplicate all instructions
      – Check : when a variable is written

Original program : a = b;
EDM program : a0 = b0; a1 = b1; if (a0 != a1) error();

Flow : Original program → EDM trans. → EDM program → C/C++ compile → binary in RAM → Input & Run on the processor. No HW modification.

(*) Error Detection Mechanisms, one of the SW-based error detection techniques.

EDM-L quickly detects 86 % of electrical timing errors that vary exec. results [1] (evaluated only in simulation).

[1] Y. Masuda, M. Hashimoto, and T. Onoye, “Performance Evaluation of Software-based Error Detection Mechanisms for Localizing Electrical Timing Failures under Dynamic Supply Noise,” Proc. ICCAD, 2015.
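
The EDM-L idea on this slide (duplicate every instruction, check when a variable is written) can be written out by hand for the single statement `a = b;`. This is only a minimal sketch: the transformation in [1] is applied to the whole program, and the names `edm_error_count`, `edm_error`, and `copy_with_edm` are hypothetical. The `volatile` qualifier is an assumption of this sketch, used so that a compiler does not merge the two copies.

```c
#include <stdint.h>

/* Hypothetical error hook: a real EDM program would stop or log here. */
static int edm_error_count = 0;
static void edm_error(void) { edm_error_count++; }

/* Original statement:  a = b;
 * EDM-L style version: keep two copies of every variable, duplicate the
 * assignment, and compare the copies right after the write. */
static int32_t copy_with_edm(int32_t b0, int32_t b1)
{
    /* volatile keeps the compiler from folding the duplicated writes
     * into one (an assumption of this sketch, not a claim about [1]). */
    volatile int32_t a0 = b0;   /* original instruction   */
    volatile int32_t a1 = b1;   /* duplicated instruction */

    if (a0 != a1) {             /* check when the variable is written */
        edm_error();
    }
    return a0;
}
```

With matching copies, `copy_with_edm(5, 5)` returns 5 and leaves the counter untouched. If a timing error corrupts one copy, modeled here by passing different values, the check calls `edm_error()`.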

SLIDE 5

Objective

  • 1. To answer "How many electrical errors can EDM* localize?" based on silicon measurement!
      – Scenario 1 : Localize electrical errors in the original program (the EDM program has to reproduce the original error).
      – Scenario 2 : Localize electrical errors that vary exec. results (detection with short latency).
  • 2. To evaluate correlation between sim. and silicon results.

SLIDE 6

Reproducibility and Detectability

For EDM to work well, 2 conditions should be satisfied.

COND1 : Reproducibility (necessary for Scenario 1)

  • The error in the original program is reproduced in the EDM program (original + duplicated instructions + checks).

COND2 : Detectability (necessary for Scenario 1 and Scenario 2)

  • The EDM program detects the error quickly: error detection latency ≤ 1000 cycles → satisfied.
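
The two conditions above split every measurement into four groups, which are the groups reported later in the results. A minimal sketch of that classification follows. The record layout and all identifiers (`edm_run`, `edm_classify`, and so on) are hypothetical; only the 1000-cycle limit comes from the slide.

```c
#include <stdbool.h>

/* Latency threshold for COND2 taken from the slide: 1000 cycles. */
#define EDM_LATENCY_LIMIT 1000L

/* Hypothetical record for one measurement (names are illustrative). */
typedef struct {
    bool reproduced;      /* COND1: EDM program shows the original error */
    bool detected;        /* the EDM check fired at all                  */
    long latency_cycles;  /* cycles from error occurrence to detection   */
} edm_run;

typedef enum {
    EDM_BOTH,        /* both COND1 and COND2 satisfied    */
    EDM_ONLY_COND1,  /* only COND1 satisfied              */
    EDM_ONLY_COND2,  /* only COND2 satisfied              */
    EDM_NEITHER      /* neither COND1 nor COND2 satisfied */
} edm_class;

/* COND2: detected, and detected within the latency limit. */
static bool edm_cond2(const edm_run *r)
{
    return r->detected && r->latency_cycles <= EDM_LATENCY_LIMIT;
}

static edm_class edm_classify(const edm_run *r)
{
    bool c1 = r->reproduced;
    bool c2 = edm_cond2(r);
    if (c1 && c2) return EDM_BOTH;
    if (c1)       return EDM_ONLY_COND1;
    if (c2)       return EDM_ONLY_COND2;
    return EDM_NEITHER;
}
```

Under this reading, Scenario 1 counts only the `EDM_BOTH` group, while Scenario 2 depends on COND2 alone.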

SLIDE 7

Agenda

  • Background and objective
  • Silicon measurement
  • Correlation between silicon measurement and simulation
  • Conclusion

SLIDE 8

Preparation

Evaluate the error occurrence border freq. for each workload and Vdd.

Setup : PC (USB connection), DC voltage source (supplies Vdd), and test chip (MeP processor fabricated in 65nm).

[Figure: "true_result.txt" and "false_result.txt" points plotted over voltage [V] (1.0 to 1.4) and frequency [MHz] (200 to 400), annotated with the border frequency.]

SLIDE 9

Measurement

Evaluate error occurrence time for computing error detection latency.

  • Timeline of one run : initialization @10 MHz → user program, first Nfast cycles @ border freq. → rest of user program @10 MHz
  • Check whether the exec. results have an error.
      – Repeat program execution by changing Nfast in a binary search manner.

Measurement conditions

  • User program : dijkstra, sha, crc (MiBench)
  • Supply voltage : 1.0 - 1.4 V with 0.1 V interval
  • Test chip : 5 chips
  • Total : 75 measurements (3 programs × 5 voltages × 5 chips)
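
The binary search over Nfast described on this slide could look like the sketch below. It assumes the outcome is monotone in Nfast: once Nfast is large enough to cover the cycle where the error occurs, every larger Nfast also gives wrong results. The callback type `run_fn`, the function `find_error_cycle`, and the stand-in `fake_run` are hypothetical names; on the real setup the callback would drive the test chip.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: run the program once with the first `nfast`
 * cycles at the border frequency and return nonzero if the execution
 * results are wrong. */
typedef int (*run_fn)(long nfast, void *ctx);

/* Smallest nfast in [lo, hi] for which the run fails, or -1 if even
 * nfast = hi gives correct results. Assumes failures are monotone
 * in nfast (see the note above). */
static long find_error_cycle(run_fn run, void *ctx, long lo, long hi)
{
    assert(run != NULL && lo <= hi);
    if (!run(hi, ctx)) {
        return -1;
    }
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        if (run(mid, ctx)) {
            hi = mid;        /* error already inside the first mid cycles */
        } else {
            lo = mid + 1;    /* error occurs later than mid               */
        }
    }
    return lo;
}

/* Stand-in for the chip, for demonstration only: the error is assumed
 * to occur at the cycle stored in *ctx. */
static int fake_run(long nfast, void *ctx)
{
    long error_cycle = *(const long *)ctx;
    return nfast >= error_cycle;
}
```

Each probe costs one full program execution, so the search needs about log2(hi - lo) runs per measurement. The detection latency would then be the cycle at which the EDM check fires minus the value returned here.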

SLIDE 10

Evaluation Result

COND1 : Reproducibility, COND2 : Detectability

Scenario 1 : Detect 25 % of original errors.

  • Both COND1 and COND2 satisfied : 25%
  • Only COND1 satisfied : 4%
  • Only COND2 satisfied : 31%
  • Neither COND1 nor COND2 satisfied : 40%

Scenario 2 : Detect 56 % of errors varying results.

  • Detected & latency < 1000 cycles : 56%
  • Detected & latency > 1000 cycles : 11%
  • Not detected & correct results : 0%
  • Not detected & incorrect results : 33%

SLIDE 11

Agenda

  • Background and objective
  • Silicon measurement
  • Correlation between silicon measurement and simulation
  • Conclusion

SLIDE 12

Simulation setup

Consider 2 simulation setups:

  • 1. Previous Sim. [1]
  • 2. Sim. which updates PDN and definition of border freq.

Evaluation setup     PDN design          Border freq.
Silicon              Low noise           Exec. results vary
Previous Sim. [1]    3% - 7% Vdd drop    Timing error occurs
Updated Sim.         Zero noise          Exec. results vary

[Figure: # of errors versus freq., marking the points "Error occurs" and "Results vary".]

SLIDE 13

Correlation between silicon and sim. (Scenario 1)

(Localize electrical timing errors in original program)

COND1 : Reproducibility, COND2 : Detectability

                                     Silicon   Updated Sim.   Previous Sim. [1]
Both COND1 and COND2 satisfied       25%       23%            0%
Only COND1 satisfied                 4%        7%             4%
Only COND2 satisfied                 31%       20%            20%
Neither COND1 nor COND2 satisfied    40%       50%            76%

Consistent between updated sim. and silicon

  – Detectability for original errors : 25% (Silicon), 23% (Updated Sim.)

SLIDE 14

Correlation between silicon and sim. (Scenario 2)

(Localize potential errors that vary results)

                                     Silicon   Updated Sim.   Previous Sim. [1]
Detected & latency < 1000 cycles     56%       44%            20%
Detected & latency > 1000 cycles     11%       43%            1%
Not detected & correct results       0%        0%             77%
Not detected & incorrect results     33%       13%            2%

For errors varying results, EDM detects:

  • 56 % (Silicon)
  • 44 % (Updated Sim.)
  • 87 % = 20 / (20 + 1 + 2) (Previous Sim.)

Consistency improvement by simulation update
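
One reading of the Scenario 2 numbers, stated here as an assumption rather than something the slide spells out, is that the detection ratio is taken only over errors that vary results, so the "not detected & correct results" share is left out of the denominator. With the Previous Sim. shares (20%, 1%, 77%, 2%) this gives 20 / (20 + 1 + 2), which is about 87 %. For Silicon and Updated Sim. the excluded share is 0%, so the ratio equals the first share (56 % and 44 %). The helper name `detect_ratio_percent` is hypothetical.

```c
/* Share of result-varying errors that EDM detects within the latency
 * limit. Runs with "not detected & correct results" are left out of the
 * denominator because their results do not vary (an assumption of this
 * sketch). Inputs are the pie-chart percentages; the return value is a
 * percentage, or -1.0 if the denominator is not positive. */
static double detect_ratio_percent(double detected_fast,
                                   double detected_slow,
                                   double missed_incorrect)
{
    double total = detected_fast + detected_slow + missed_incorrect;
    if (total <= 0.0) {
        return -1.0;
    }
    return 100.0 * detected_fast / total;
}
```

Calling `detect_ratio_percent(20, 1, 2)` reproduces the Previous Sim. figure, and `detect_ratio_percent(56, 11, 33)` reproduces the silicon figure.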

SLIDE 15

Agenda

  • Background and objective
  • Silicon measurement
  • Correlation between silicon measurement and simulation
  • Conclusion

SLIDE 16

Conclusion

  • Evaluated error detection performance of EDM transformation for supply noise induced timing errors based on silicon measurement.
      – Considered two EDM usage scenarios.
      – In Scenario 1, EDM detected 25% of original errors.
      – In Scenario 2, EDM detected 56% of errors varying results.
  • Evaluated correlation of EDM performance between sim. and silicon.
      – Updated PDN design and definition of border frequency.
      – Results are consistent between updated sim. and silicon.

SLIDE 17

Backup Slide

Difficulty of Electrical Error Localization

Can SW-based trans. debug the original error?

  • Program transformation changes the inst. sequence.
      – Original program : Inst. seq. #1, Inst. seq. #2, ・・・
      – Transformed program : Inst. seq. #1 + #1', Check, ・・・
  • Supply voltage varies.
  • Does the same error appear?

[Figure: supply voltage over time for the original program and the transformed program, with the error marked.]

SLIDE 18

Backup Slide

Why low reproduction ratio?

Even when the same instructions are executed, memory and register usage changes. ⇒ EDM changes inductive noise, and this prevents the error reproduction.

[Figure: voltage [mV] (940 to 1040) versus time [ns] (1 to 6) for "dijkstra_full-EDM" and "dijkstra_original".]

[Figure: ratio [%] (10 to 40) versus minimum voltage in the MOV instruction [mV] (941 to 947).]