Post-Silicon Bug Diagnosis with Inconsistent Executions
Andrew DeOrio, Daya Shanker Khudia, Valeria Bertacco
9 November 2011
University of Michigan
ICCAD’11
Impact of errors: functional bugs
17 Jan 1995
FDIV bug: Intel announces a pre-tax charge of $475M against earnings to replace flawed processors
Kris Kaspersky: Remote Code Execution Through Intel CPU Bugs
post-si platform
pushl %ebp movl %ebp
same post-silicon test
many different results
[Whetsel 1991, Abramovici 2006, Dahlgren 2003]
[Park 2009]
[Gao 2009, Li 2010, Yang 2008]
[Figure: overview of the flow. Labels: post-si test, post-si platform, hw sensors, HW logging, signatures, SW post-analysis, pass/fail, model, bug band, bug location, bug time]
[Figure: HW logging and SW post-analysis. A signal is summarized per window (example: time@1=2 in one window, time@1=1 in the next). The signatures of a passing testcase and a failing testcase are compared ("match?"). Plots: distribution of signature value (0 to 1) for passing testcases and failing testcases]
Example: time@1
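A minimal sketch of a time@1-style signature, assuming it means the share of cycles in each window during which the signal is 1. The function name, the fixed window length, and the sample trace are hypothetical and are not taken from the slides; the counts 2 and 1 shown on the earlier slide correspond to the unnormalized numerators here.

```python
from typing import List, Sequence


def time_at_1(samples: Sequence[int], window: int) -> List[float]:
    """Fraction of cycles in each window during which the signal is 1."""
    if window <= 0:
        raise ValueError("window must be positive")
    signatures = []
    # Only complete windows are summarized; a trailing partial window is dropped.
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        signatures.append(sum(1 for bit in chunk if bit) / window)
    return signatures


# Hypothetical per-cycle samples of one signal, two windows of four cycles.
trace = [0, 1, 1, 0, 0, 0, 1, 0]
print(time_at_1(trace, window=4))  # [0.5, 0.25]
```

Normalizing by the window length keeps the signature between 0 and 1, which matches the axis range of the distribution plots, but that normalization is an assumption.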
Example: CRC
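A minimal sketch of a CRC-style signature, assuming one checksum is computed over the samples of each window. The slides do not state which CRC is used; `zlib.crc32` from the Python standard library stands in here, and the function name and sample trace are hypothetical.

```python
import zlib
from typing import List, Sequence


def crc_signatures(samples: Sequence[int], window: int) -> List[int]:
    """One CRC-32 value per complete window of 0/1 samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    signatures = []
    for start in range(0, len(samples) - window + 1, window):
        # Encode each sample as one byte (0 or 1) before checksumming.
        chunk = bytes(1 if bit else 0 for bit in samples[start:start + window])
        signatures.append(zlib.crc32(chunk))
    return signatures


# Both windows hold two 1s, so a count-based signature cannot tell them
# apart, while the checksum also depends on when the 1s occur.
trace = [0, 1, 1, 0, 1, 0, 0, 1]
first, second = crc_signatures(trace, window=4)
print(first != second)  # True
```

Unlike a count, a checksum is sensitive to the order of values inside the window, so small timing differences between runs change the signature.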
[Figure: HW logging on the chip under test. Labels: register with EN input (shown twice), constant 1, memory buffer, off-chip through debug port]
[Figure: SW post-analysis. Behavior of 1 signal from the MEM stage of a 5-stage pipeline processor; axis ticks 0.2 to 1 and 4 to 24. Bands drawn at µ ± 2σ. Marked: bug band, bug occurrence, bug detected]
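The µ ± 2σ bands can be sketched as follows. This is one plausible reading, not the authors' stated procedure: it assumes a band is built per window from the signatures of the passing runs and another from the failing runs, and that the bug band is the part of the failing band lying outside the passing band. All function names and the sample data are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

Band = Tuple[float, float]  # (low, high)


def band(signatures: Sequence[float], k: float = 2.0) -> Band:
    """Band of mean +/- k standard deviations over one window's signatures."""
    mu = mean(signatures)
    sigma = pstdev(signatures)
    return (mu - k * sigma, mu + k * sigma)


def bug_band(passing: Band, failing: Band) -> float:
    """Width of the failing band that lies outside the passing band."""
    below = max(0.0, min(failing[1], passing[0]) - failing[0])
    above = max(0.0, failing[1] - max(failing[0], passing[1]))
    return below + above


def first_bug_window(passing_runs: Sequence[Sequence[float]],
                     failing_runs: Sequence[Sequence[float]],
                     threshold: float = 0.0) -> Optional[int]:
    """Index of the first window whose bug band exceeds the threshold.

    runs[r][w] is the signature of run r in window w.
    """
    for w in range(len(passing_runs[0])):
        p = band([run[w] for run in passing_runs])
        f = band([run[w] for run in failing_runs])
        if bug_band(p, f) > threshold:
            return w
    return None


# Hypothetical signatures: the groups agree until the third window.
passing_runs = [[0.50, 0.50, 0.50], [0.52, 0.48, 0.50], [0.48, 0.52, 0.50]]
failing_runs = [[0.50, 0.50, 0.90], [0.52, 0.48, 0.95], [0.48, 0.52, 0.85]]
print(first_bug_window(passing_runs, failing_runs))  # 2
```

A nonzero threshold trades false positives against false negatives; the value used in the original work is not given in this transcript.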
[Figure: signatures arranged by signals (signalA, signalB, signalC) and windows (1 to 4), once for the failing group and once for the passing group; the comparison yields a bug band per signal and window]
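One way to turn a signals-by-windows grid into a bug location and time is sketched below. It assumes a bug band width is already available for every signal and window (the input dictionary is hypothetical), and it reports the earliest window in which any signal shows a bug band, with the signals in that window ranked by band width. The ranking rule is an illustrative assumption, not something the slides state.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def localize(bug_bands: Dict[str, Sequence[float]],
             threshold: float = 0.0) -> Optional[Tuple[int, List[str]]]:
    """Return (window index, signals ranked by bug band) for the first hit."""
    if not bug_bands:
        return None
    n_windows = min(len(values) for values in bug_bands.values())
    for w in range(n_windows):
        hits = [(values[w], name)
                for name, values in bug_bands.items()
                if values[w] > threshold]
        if hits:
            # Widest bug band first; ties are broken by signal name.
            hits.sort(key=lambda hit: (-hit[0], hit[1]))
            return w, [name for _, name in hits]
    return None


# Hypothetical bug band widths for three signals over four windows.
grid = {
    "signalA": [0.0, 0.0, 0.00, 0.1],
    "signalB": [0.0, 0.0, 0.30, 0.4],
    "signalC": [0.0, 0.0, 0.05, 0.0],
}
print(localize(grid))  # (2, ['signalB', 'signalC'])
```

Window indices are 0-based here, so index 2 corresponds to window 3 in the figure's numbering.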
detected signals detection latency
testcase     PCX gnt SA  Xbar elect  BR fxn      MMU fxn     PCX atm SA  PCX fxn     Xbar combo  MCU combo   MMU combo   EXU elect
blimp_rand   √+          √           √           √           √+          √+          √+          f.n.        √+          f.n.
fp_addsub    n.b.        f.p.        √           √           √           √+          f.p.        n.b.        √+          f.p.
fp_muldiv    n.b.        f.p.        √           √           √           √+          f.p.        f.p.        √+          f.p.
isa2_basic   n.b.        f.n.        √           n.b.        √+          √+          √+          √+          n.b.        f.n.
isa3_asr_pr  n.b.        √           √           f.n.        √+          √           √+          √+          √           √
isa3_window  n.b.        √           √           n.b.        √+          √           f.n.        f.n.        n.b.        √
ldst_sync    n.b.        √+          √           √           √+          √+          √+          √+          √+          n.b.
mpgen_smc    n.b.        √+          √           √           √+          √+          √+          √+          √+          √+
n2_lsu_asi   n.b.        f.n.        √           f.n.        √+          √+          √+          √+          √+          n.b.
tlu_rand     n.b.        √+          √           √           √+          √+          √+          √+          √+          √+

Legend: n.b. = no bug; √ = found; √+ = exact signal; f.p. = false positive; f.n. = false negative
[Figure: detection latency. Bar chart of Δ time from bug injection to detection (cycles; axis ticks 1,000 to 6,000) for each bug (PCX gnt SA, XBar elect, BR fxn, MMU fxn, PCX atm SA, PCX fxn, XBar combo, MCU combo, MMU combo, EXU elect) and the AVERAGE]
[Figure: bar chart per bug (PCX gnt SA, XBar elect, BR fxn, MMU fxn, PCX atm SA, PCX fxn, XBar combo, MCU combo, MMU combo, EXU elect) and the AVERAGE; axis ticks 40 to 200, axis label not recoverable]
[Figure: false negatives, false positives, and their sum; axis ticks 20 to 60 and 0.2 to 1, axis labels not recoverable]