Bridging Pre- and Post-silicon Debugging with BiPeD
Andrew DeOrio, Jialin Li, and Valeria Bertacco
November 2012
University of Michigan
ICCAD “Simulation-based Verification” Session
Verification Opportunities
Pre-silicon
+ High observability
+ Reproducible bugs
Post-silicon
+ High speed
Little information sharing between pre- and post-silicon
Shared correctness model
– Pre-silicon high observability → learn correct behavior
– Post-silicon high speed → enforce correct behavior
Result: high speed; high observability with detailed debugging information; no need for bug reproduction
– Pre-silicon: protocol extraction learns the protocols of the design's interfaces
– Post-silicon: protocol detection enforces those protocols while running unknown tests
– Transaction extraction: data transferred off-chip becomes debugging information
Figure: BiPeD flow. Pre-silicon, tests run on a logic simulator and protocol extraction fills the Protocol Database. Post-silicon, the test platform flags an error, and transaction extraction reports the time, the errant transaction, its location (signals), and the transaction history.
Protocol extraction (pre-silicon)
– Run pre-silicon tests on the Design Under Test in simulation
– Select the interface signals to analyze
– Protocol extraction builds a protocol diagram that describes the interface behavior; each event is one value of the selected signals, e.g., 01100, 00000, 00010, 00100, 00101
“INFERNO: Streamlining Verification with Inferred Semantics”, DeOrio, et al., 2009
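As a rough illustration of protocol extraction, the sketch below builds an event protocol diagram from simulated traces of the selected interface signals. It is a simplified stand-in, not INFERNO's algorithm: it assumes each trace is a list of per-cycle bit strings, treats every distinct value as an event, and records a transition whenever the value changes (cycles that hold the same value add nothing). The name `extract_protocol` is hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple

Transition = Tuple[str, str]


def extract_protocol(traces: Iterable[Sequence[str]]) -> Tuple[Set[str], Set[Transition]]:
    """Build an event protocol diagram from simulated interface traces.

    Each trace lists the value of the selected interface signals in every
    cycle, as a bit string such as "01100".
    """
    events: Set[str] = set()
    transitions: Set[Transition] = set()
    for trace in traces:
        previous = None
        for value in trace:
            events.add(value)  # every observed value becomes an event (node)
            # A change of value is a transition (edge); held values add nothing.
            if previous is not None and value != previous:
                transitions.add((previous, value))
            previous = value
    return events, transitions


# The protocol is the union of everything seen across all pre-silicon tests.
events, transitions = extract_protocol([
    ["01100", "01100", "00100", "00000", "00100"],
    ["00000", "00010", "00000"],
])
print(sorted(events))  # ['00000', '00010', '00100', '01100']
print(sorted(transitions))
```

Taking the union over all traces reflects the idea that higher pre-silicon coverage yields a more complete diagram; anything never simulated is absent and would later be flagged.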
Figure: event protocol diagram for events 01100, 00000, 00010, 00100, 00101 and the transitions between them, alongside waveforms of protect, thread sync, TLB bypass, ASI reload, and flush over time (cycles).
TLU-LSU interface of the SPARC core; events 01100, 00000, 00010, 00100, 00101
– bit 0: protect
– bit 1: thread sync
– bit 2: TLB bypass
– bit 3: ASI reload
– bit 4: flush
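Decoding an event into the signals it asserts is straightforward once the bit order is fixed. The slides do not state which character is bit 0; this sketch assumes bit 0 is the leftmost character, the reading under which 00100 means TLB bypass and 00101 means TLB bypass with flush, matching the transaction names used elsewhere in the deck. `TLU_LSU_SIGNALS` and `decode_event` are hypothetical names.

```python
from typing import List, Sequence

# Signal name for each bit position of the TLU-LSU interface (bit 0 first).
# Assumption: bit 0 is the leftmost character of the event string.
TLU_LSU_SIGNALS = ("protect", "thread sync", "TLB bypass", "ASI reload", "flush")


def decode_event(event: str, names: Sequence[str] = TLU_LSU_SIGNALS) -> List[str]:
    """List the signals asserted in an event bit string."""
    if len(event) != len(names) or set(event) - {"0", "1"}:
        raise ValueError(f"expected a {len(names)}-bit string, got {event!r}")
    return [name for bit, name in zip(event, names) if bit == "1"]


for event in ["01100", "00000", "00010", "00100", "00101"]:
    print(event, decode_event(event))
```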
Protocol detection (post-silicon)
– Load the protocols into programmable hardware: a protocol detector and a circular buffer on the test platform (TLU, cache, crossbar, memory)
– Run high-coverage post-silicon tests
– An error is detected when the monitored interface violates a protocol
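Figure: protocol detector hardware. The monitored interface feeds an event CAM; a priority encoder identifies the current event, and the current and previous events are looked up in a transition CAM. The valid-event and valid-transition outputs drive the error signal, and each event is appended to the event history.
The test platform, with its protocol detector and circular buffer, can detect multiple protocols simultaneously.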
Each cycle, the detector checks the event, checks the transition, and records the history.
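The three per-cycle steps can be modeled in software to make the detector's behavior concrete. This is a hypothetical behavioral sketch, not the BiPeD hardware: Python sets stand in for the event and transition CAMs, a bounded `collections.deque` stands in for the circular buffer, and it assumes only cycles where the interface value changes are checked and recorded. It also stores the cycle number with each event for readability; the slides do not say whether the hardware does. `ProtocolDetector` and `step` are hypothetical names.

```python
from collections import deque
from typing import Iterable, Optional, Tuple


class ProtocolDetector:
    """Behavioral model of one protocol detector (hypothetical sketch)."""

    def __init__(self, events: Iterable[str],
                 transitions: Iterable[Tuple[str, str]],
                 history_size: int = 1024):
        self.events = set(events)            # stands in for the event CAM
        self.transitions = set(transitions)  # stands in for the transition CAM
        # Circular buffer: once full, the oldest entry is dropped.
        self.history = deque(maxlen=history_size)
        self.previous: Optional[str] = None

    def step(self, cycle: int, value: str) -> Optional[str]:
        """Process one cycle; return an error message or None."""
        if value == self.previous:
            return None  # interface held its value: nothing to check
        error = None
        # Check event: is this signal value a known event?
        if value not in self.events:
            error = f"cycle {cycle}: invalid event {value}"
        # Check transition: was (previous, current) seen pre-silicon?
        elif self.previous is not None and (self.previous, value) not in self.transitions:
            error = f"cycle {cycle}: invalid transition {self.previous} -> {value}"
        # Record history, including the event that triggered any error.
        self.history.append((cycle, value))
        self.previous = value
        return error


detector = ProtocolDetector(
    events={"00000", "00010", "00100", "10100"},
    transitions={("00000", "00010"), ("00010", "00000"),
                 ("00000", "00100"), ("00100", "00000")},
    history_size=4,
)
trace = ["00000", "00010", "00010", "00000", "00100", "00000", "00010", "10100"]
for cycle, value in enumerate(trace):
    error = detector.step(cycle, value)
    if error:
        print(error)  # cycle 7: invalid transition 00010 -> 10100
        print(list(detector.history))  # the last 4 recorded events
        break
```

Detecting several protocols at once would correspond to running one `ProtocolDetector` per protocol on the same cycle stream.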
– Storage: 15.3 KB each, sized for the biggest OST2 protocol (33 bits × 62 events, 622 transitions), with a 1,024-event history; 10 protocols
– Cycle 10,000
Transaction extraction
– After post-silicon tests expose an error, the circular buffer contents are transferred off-chip
– Transaction extraction, based on Inferno [DeOrio, et al., 2009], turns the recorded events into transactions
Transactions on the TLU interface:
– thread sync: 01100 → 00100
– burst TLB bypass w/ thread sync: 00100 → 00000 → 01100
– TLB bypass: 00000 → 00100
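In the deck, transaction extraction relies on Inferno. The sketch below is not Inferno's algorithm; it covers only the step of labeling a recorded event history once transaction patterns are known, using a swapped-in greedy longest-match segmentation. The pattern table is copied from the three transactions listed above. `TLU_PATTERNS` and `extract_transactions` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

# Transaction patterns from the TLU example: event sequence -> name.
TLU_PATTERNS: Dict[Tuple[str, ...], str] = {
    ("01100", "00100"): "thread sync",
    ("00100", "00000", "01100"): "burst TLB bypass w/ thread sync",
    ("00000", "00100"): "TLB bypass",
}


def extract_transactions(history: Sequence[str],
                         patterns: Dict[Tuple[str, ...], str] = TLU_PATTERNS) -> List[str]:
    """Split an event history into named transactions, longest match first."""
    ordered = sorted(patterns.items(), key=lambda item: len(item[0]), reverse=True)
    result: List[str] = []
    i = 0
    while i < len(history):
        for pattern, name in ordered:
            if tuple(history[i:i + len(pattern)]) == pattern:
                result.append(name)
                i += len(pattern)
                break
        else:
            # No known transaction starts here: keep the raw event visible.
            result.append(f"unknown {history[i]}")
            i += 1
    return result


history = "01100 00100 00100 00000 01100 00000 00100 00000 00100".split()
print(extract_transactions(history))
# ['thread sync', 'burst TLB bypass w/ thread sync', 'TLB bypass', 'TLB bypass']
```

Reporting unmatched events as "unknown" rather than dropping them keeps the history complete, which matters near the error, where the events are least likely to fit a known transaction.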
TLU-LSU transactions: address reload, TLB bypass w/ flush, TLB bypass, burst TLB bypass w/ sync
00100 → 00000 → 00101
Buggy transition! 00010 → 10100
Extracted transaction history, with a subset of the TLU protocol diagram (events 01100, 00000, 00010, 00100, 00101)
Recorded events: 01100 00100 00100 00000 01100 00000 00100 00000 00100
Cycles: 3,694–3,732; 4,492–4,531; 4,539–4,543; 4,545–4,602; ...; 4,609–10,017
Transactions: thread sync; burst TLB bypass w/ thread sync; TLB bypass; TLB bypass; ...; reload, flush
– May miss bugs that only affect data signals
– Interface signal selection is important
– High pre-silicon coverage → fewer false positives
– If a false positive is encountered, update the Protocol Database
– 10 testcases × 100 random seeds (variable memory delay, random crossbar traffic): 1,000 passing runs
– 10 bugs, e.g., a functional bug in the PCX, fetch thread ID: 100 buggy runs
– 10 interfaces, with BiPeD HW and BiPeD SW
Figure: detected transactions for each bug (branch, CCX, memory, execute, FPU, fetch, perf., TLU, PCX) on each interface (branch EX valid, inst. cache-proc, MEM rd ack, FPU except., fetch thread, LSU access, table walk, PCX stall, CCX/PCX req, CPX), marking the first interface to find each bug. f.p. = false positive, f.n. = false negative.
Figure: cumulative events and cumulative transitions discovered, by testcase and total number of test executions.
Figure: total and unique transactions captured as a function of circular buffer size, from 32 entries (0.1 KB) to 1,024 entries (4 KB).
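The two endpoint sizes follow from simple arithmetic if each buffer entry holds one 33-bit event, the width quoted for the biggest OST2 protocol. That per-entry width is an assumption, since the slides do not give the entry format. Under it, 32 entries take 132 bytes (about 0.1 KB) and 1,024 entries take 4,224 bytes (about 4.1 KB, shown as 4 KB). `buffer_kb` is a hypothetical helper.

```python
def buffer_kb(entries: int, event_width_bits: int = 33) -> float:
    """Circular-buffer storage in KB (1 KB = 1,024 bytes).

    Assumes one event of `event_width_bits` bits per entry and no timestamp.
    """
    return entries * event_width_bits / 8 / 1024


for entries in (32, 64, 128, 256, 512, 1024):
    print(f"{entries:5d} entries: {buffer_kb(entries):.2f} KB")
```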
Figure: false positives (percent) for each omitted testcase.
– Detect invariants and check tests against them [Ammons 2002, Ernst 2008]
– Inferno: verification with transactions [DeOrio 2009]
– Data mining high-level specifications [Li 2010]
– Manual debugging [Abramovici 2006]
– Automated debugging of specific components [Park 2011]
– Manual, hardcoded transaction checkers [Singerman 2011]
with post-silicon detection
– Coverage metrics
– Runtime verification