Accelerating Multiprocessor Simulation with a Memory Timestamp Record
Kenneth Barr Heidi Pan Michael Zhang Krste Asanovic March 21, 2005
Massachusetts Institute of Technology
Intelligent sampling gives best speed-accuracy tradeoff for …
Barr, Pan, Zhang, and Asanović. ISPASS. March 21, 2005. 2
[Figure: sampling strategies compared on a timeline: single sample (SimPoints); warmup + sample; Fast Functional Warming (SMARTS, FFW); Memory Timestamp Record. Phases shown: ignored, ISA only, ISA+µarch, ISA+MTR update, reconstruct caches, detailed measure.]
Fast (less warmup), but tied to µarch
Slow due to warmup, but allows any µarch
Fast, NOT tied to µarch, supports multiprocessors…
[Figure: target system: CPU1, CPU2 … CPUn, each with a cache ($), sharing Memory and a Directory; CPU activity plotted against time.]
(Alameldeen and Wood, 2003)
– DRAM refresh
– Hard disk arrangement delays DMA
– Incoming packet interrupts application
– Locking order reversed
– Processes migrate
… differently!
[Figure: the same workload on CPUs 1–4 across runs, finishing at Time = 1.8, Time = 2.1, and Time = 2.5.]
Memory Timestamp Record (MTR): one row per memory block
Block Address (0 … N-1) | Last Writer | Last Writetime | Last Readtime (CPU0 … CPUn-1)
[Figure: the MTR table shown alongside the target multiprocessor: CPU1, CPU2 … CPUn, each with a cache ($), plus Memory and a Directory.]
MTR example. Memory trace: Read a; Read e (time 1); Read b (time 2); Read c (time 3); Write b by CPU1 (time 4).
MTR after the trace:
Block Address | Last Writer | Last Writetime | Last Readtime (CPU0)
a | | |
b | CPU1 | 4 | 2
c | | | 3
d | | |
e | | | 1
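The example trace has reads of e, b and c at times 1–3 and CPU1's write of b at time 4. The update rule it illustrates can be sketched in Python. This is a minimal sketch that assumes a sparse dictionary keyed by block address and integer timestamps. The names `MTREntry`, `MemoryTimestampRecord`, `read` and `write` are hypothetical and do not come from the authors' simulator. The reads are assumed to come from CPU0. Each access costs one dictionary update, which keeps fast mode cheap.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MTREntry:
    """One MTR row (hypothetical layout): last writer, its time, per-CPU last read times."""
    last_writer: Optional[int] = None
    last_writetime: Optional[int] = None
    last_readtime: Dict[int, int] = field(default_factory=dict)  # cpu id -> time


class MemoryTimestampRecord:
    """Sparse MTR: only blocks touched during fast-forwarding get a row."""

    def __init__(self) -> None:
        self.entries: Dict[str, MTREntry] = {}

    def read(self, cpu: int, block: str, time: int) -> None:
        # A read only updates the reading CPU's Last Readtime column.
        self.entries.setdefault(block, MTREntry()).last_readtime[cpu] = time

    def write(self, cpu: int, block: str, time: int) -> None:
        # A write overwrites Last Writer and Last Writetime; read times are kept.
        entry = self.entries.setdefault(block, MTREntry())
        entry.last_writer = cpu
        entry.last_writetime = time


# Replay the example trace (reads assumed to come from CPU0).
mtr = MemoryTimestampRecord()
mtr.read(0, "e", 1)
mtr.read(0, "b", 2)
mtr.read(0, "c", 3)
mtr.write(1, "b", 4)
print(mtr.entries["b"])  # MTREntry(last_writer=1, last_writetime=4, last_readtime={0: 2})
```

Nothing in the record depends on cache size or associativity. This is what lets one MTR serve many target configurations.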
[Figure: the MTR table (Block Address, Last Writer, Last Writetime, Last Readtime for CPU0 … CPUn-1) beside a target cache with four sets (Set 0–3) and two ways (Way 0, Way 1).]
– One set, two ways
– Determine which blocks map to the same set.
– Only the blocks with the most recent timestamps (as many as there are ways) are present. Check validity later.
[Figure: the example MTR (blocks a–e; b last written by CPU1 at time 4) beside reconstructed caches with Set 0, Set 1 and Way 0, Way 1, filled in frame by frame; the final frame adds a second cache and asks "CPU1?"]
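Reconstructing one CPU's cache from the MTR can be sketched as follows. The sketch makes three assumptions:
- Block addresses are integers.
- The set index is the block address modulo the number of sets.
- Replacement is LRU. Keeping the `ways` most recently touched blocks per set is exactly what LRU would leave resident.

The tuple layout and the names `last_access` and `reconstruct_cache` are hypothetical. As on the slide, validity is not checked here, so a block another CPU has since written still occupies its way until a later pass marks it invalid.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical MTR row: (last_writer, last_writetime, {cpu: last_readtime})
Entry = Tuple[Optional[int], Optional[int], Dict[int, int]]


def last_access(entry: Entry, cpu: int) -> Optional[int]:
    """Most recent time `cpu` touched the block: its last read, or its write if it is the last writer."""
    writer, wtime, reads = entry
    times = [reads[cpu]] if cpu in reads else []
    if writer == cpu and wtime is not None:
        times.append(wtime)
    return max(times) if times else None


def reconstruct_cache(mtr: Dict[int, Entry], cpu: int,
                      num_sets: int, ways: int) -> Dict[int, List[int]]:
    """Return {set index: resident blocks, most recently used first} for one CPU."""
    candidates: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for block, entry in mtr.items():
        t = last_access(entry, cpu)
        if t is not None:  # blocks this CPU never touched cannot be in its cache
            candidates[block % num_sets].append((t, block))
    cache: Dict[int, List[int]] = {}
    for set_idx, blocks in candidates.items():
        blocks.sort(reverse=True)  # newest timestamp first
        cache[set_idx] = [block for _, block in blocks[:ways]]
    return cache


# Hypothetical MTR: blocks 0, 2, 4 map to set 0; block 1 maps to set 1.
mtr = {
    0: (None, None, {0: 1}),
    2: (0, 5, {}),            # CPU0 wrote block 2 at time 5
    4: (None, None, {0: 3}),
    1: (1, 6, {0: 2}),        # CPU0 read block 1 at time 2; CPU1 wrote it at time 6
}
print(reconstruct_cache(mtr, cpu=0, num_sets=2, ways=2))  # {0: [2, 4], 1: [1]}
```

Block 1 is placed in CPU0's set 1 even though CPU1 later wrote it. Deciding that this copy is invalid is the separate validity step.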
[Figure: three caches, each with four sets (Set 0–3) and two ways (Way 0, Way 1).]
Which cache has the most recent copy of ‘b’?
[Figure: the example MTR (b: Last Writer CPU1, Last Writetime 4; CPU0 Last Readtime 2) beside the two reconstructed caches (Set 0, Set 1; Way 0, Way 1). CPU0's copy of b is marked invalid; CPU1's copy is marked valid, dirty.]
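One way to do the validity check under an MSI protocol is to compare the CPU's own last access with the block's Last Writetime. This assumes the block is already resident in the CPU's reconstructed cache. The sketch is minimal: the rules and the name `line_state` are assumptions, and a MESI variant is not handled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical MTR row: (last_writer, last_writetime, {cpu: last_readtime})
Entry = Tuple[Optional[int], Optional[int], Dict[int, int]]


def line_state(entry: Entry, cpu: int) -> str:
    """MSI state of a block already placed in `cpu`'s reconstructed cache."""
    writer, wtime, reads = entry
    own = [reads[cpu]] if cpu in reads else []
    if writer == cpu and wtime is not None:
        own.append(wtime)
    if not own:
        raise ValueError("block was never touched by this CPU")
    t_cpu = max(own)
    if writer is not None and writer != cpu and wtime is not None and wtime > t_cpu:
        return "I"  # another CPU wrote after our last access: our copy was invalidated
    if writer == cpu and wtime is not None:
        # Our write is the latest; a later read by another CPU downgrades us to S.
        later_remote_read = any(t > wtime for c, t in reads.items() if c != cpu)
        return "S" if later_remote_read else "M"
    return "S"  # clean copy with no newer write


b = (1, 4, {0: 2})  # slide example: CPU0 read b at time 2, CPU1 wrote b at time 4
print(line_state(b, 0), line_state(b, 1))  # I M
```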
[Figure: directory state reconstructed from the example MTR.]
Block Address | State | Sharers
a | S | CPU0
b | M | CPU1
c | S | CPU0
d | I | (none)
e | S | CPU0
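Given the per-CPU line states, the directory entry follows directly:
- M, if a CPU holds the block dirty; that CPU is the owner.
- S, with the list of CPUs holding clean copies, if any do.
- I, otherwise.

This is a minimal sketch that assumes MSI and a hypothetical helper named `directory_entry`. Blocks that no reconstructed cache holds are treated as I.

```python
from typing import Dict, List, Tuple


def directory_entry(states: Dict[int, str]) -> Tuple[str, List[int]]:
    """Directory state and owner/sharer list from per-CPU MSI line states."""
    owners = sorted(cpu for cpu, s in states.items() if s == "M")
    if owners:
        return "M", owners  # a coherent MSI system has exactly one owner here
    sharers = sorted(cpu for cpu, s in states.items() if s == "S")
    if sharers:
        return "S", sharers
    return "I", []


# Per-CPU states consistent with the slide's example.
caches = {
    "a": {0: "S"},
    "b": {0: "I", 1: "M"},
    "c": {0: "S"},
    "d": {},
    "e": {0: "S"},
}
for block, states in caches.items():
    print(block, *directory_entry(states))
# a S [0]
# b M [1]
# c S [0]
# d I []
# e S [0]
```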
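Time: CPU0 writes b; CPU0 reads b; CPU0 writes b; CPU0 reads b; CPU0 writes b′, evicting b.
[Figure: the MTR row for b (Address, Writer, Writetime, Last Readtime for CPU0 and CPU1): Writer CPU0, Writetime n, with a later CPU0 timestamp n+k.]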
[Figure: simulator organization built on Bochs: CPU 0, CPU 1 … CPU N-1; Main Memory (Magic Memory); Memory Timestamp Record; Detailed Memory System; Detailed Mode Enable and Stall signals.]
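The organization in the figure can be sketched as a memory front end that the CPU models call on every reference. Only two ideas come from the slides: a Detailed Mode Enable switch that leads to cache reconstruction, and a Stall path back to the CPUs. Everything else is a hypothetical assumption:
- the class `MemoryFrontEnd` and its callbacks;
- the stand-in MTR, a plain map from block to last access time;
- keeping the MTR updated during detailed mode.

```python
from typing import Callable, Dict


class MemoryFrontEnd:
    """Hypothetical router between fast mode (MTR only) and detailed mode."""

    def __init__(self,
                 reconstruct: Callable[[Dict[int, int]], None],
                 detailed_access: Callable[[int, int, bool], int]) -> None:
        self.mtr: Dict[int, int] = {}  # stand-in MTR: block -> last access time
        self.reconstruct = reconstruct
        self.detailed_access = detailed_access
        self.detailed_mode = False
        self.time = 0

    def set_detailed_mode(self, enable: bool) -> None:
        # Entering detailed mode rebuilds cache/directory state from the MTR once.
        if enable and not self.detailed_mode:
            self.reconstruct(self.mtr)
        self.detailed_mode = enable

    def access(self, cpu: int, block: int, is_write: bool) -> int:
        """Return stall cycles for this access (always 0 in fast mode)."""
        self.time += 1
        self.mtr[block] = self.time  # assumed: MTR is kept current in both modes
        if self.detailed_mode:
            return self.detailed_access(cpu, block, is_write)
        return 0


rebuilt = []
fe = MemoryFrontEnd(reconstruct=lambda m: rebuilt.append(dict(m)),
                    detailed_access=lambda cpu, block, w: 10)  # fixed 10-cycle stall
fe.access(0, 5, False)                 # fast mode: no stall
fe.set_detailed_mode(True)             # reconstruct from {5: 1}
print(fe.access(1, 5, True), rebuilt)  # 10 [{5: 1}]
```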
– scientific (computational fluid dynamics)
– OpenMP (loop iterations in parallel)
– Fortran
– dbench (Samba): several clients making file-centric system calls
– Apache: several clients hammer a web server (via the loopback interface)
– uses spawn/sync primitives (dynamic thread creation/scheduling)
– Functional simulation of the ISA
– Cache / directory state kept accurate
– Both FFW and MTR should be accurate and fast
– MTR should be faster than FFW
– To be useful, FFW and MTR must answer questions the same way as a detailed model, but faster
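One hedged reading of "answer questions the same way" is that a fast-forwarded simulator should rank design alternatives (for example, MSI vs. MESI) in the same order as the detailed model, even if the absolute values differ. The sketch checks only that ordering. The metric, the name `same_answer`, and the numbers are illustrative assumptions, not results.

```python
from typing import Dict


def same_answer(detailed: Dict[str, float], fast: Dict[str, float]) -> bool:
    """True if both simulators rank the design alternatives in the same order."""
    if detailed.keys() != fast.keys():
        raise ValueError("both runs must cover the same configurations")
    return sorted(detailed, key=detailed.__getitem__) == sorted(fast, key=fast.__getitem__)


# Made-up miss rates (%), for illustration only.
detailed = {"MSI": 4.0, "MESI": 3.0}
fast = {"MSI": 4.6, "MESI": 3.4}
print(same_answer(detailed, fast))  # True
```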
– Every 10k cycles, choose a victim processor
– The victim runs 25% slower to emulate variation
– Note: variation has a MUCH larger effect during fast mode
[Figure: dbench Cache 1 miss rate (%) for 1:10, 1:100, and 1:1000.]
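The perturbation above can be sketched as a per-interval multiplier on each CPU's cycle cost. Three parts of this minimal sketch are assumptions:
- reading "25% slower" as a 1.25× multiplier on the victim's timing;
- using a seeded `random.Random`;
- the function name `slowdown_schedule`.

Each seed gives a different, repeatable perturbed run of the same workload.

```python
import random
from typing import List


def slowdown_schedule(num_cpus: int, total_cycles: int, interval: int = 10_000,
                      slowdown: float = 1.25, seed: int = 0) -> List[List[float]]:
    """Per-interval timing multipliers: one randomly chosen victim CPU runs slower."""
    rng = random.Random(seed)  # fixed seed makes a given perturbed run repeatable
    schedule = []
    for _ in range(0, total_cycles, interval):
        victim = rng.randrange(num_cpus)
        schedule.append([slowdown if cpu == victim else 1.0 for cpu in range(num_cpus)])
    return schedule


for row in slowdown_schedule(num_cpus=4, total_cycles=30_000):
    print(row)  # exactly one entry per row is 1.25
```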
actual
message types, the MSI vs. MESI change is dramatic.
– All fast-forward bars move with the detailed bar.
– Movement goes beyond the range of detailed runs
more closely match detailed run
– Or, tune victim/slowdown
[Chart labels: writeback rep (no ambig. resolution); (no ambig. resolution)]
[Figure: two panels of runtime (normalized to FFW 1:10), y-axis from 0.2 to 1, at several sampling ratios.]
– MTR does less work in common case
– MTR has a costlier transition, but
– Reconstruction scales with touched lines, not total accesses
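The scaling claim can be illustrated with a simplified stand-in for the MTR: one CPU, reads only, and a dictionary of last-access times. The names are hypothetical. Fast mode pays a constant-time update per access. Reconstruction only visits the rows that exist, which means the distinct lines touched, however many accesses produced them.

```python
import random
from typing import Dict


def fast_forward(num_accesses: int, footprint: int, seed: int = 0) -> Dict[int, int]:
    """Single-CPU, read-only MTR stand-in: block -> last access time."""
    rng = random.Random(seed)
    mtr: Dict[int, int] = {}
    for t in range(num_accesses):
        mtr[rng.randrange(footprint)] = t  # O(1) update per access in fast mode
    return mtr


mtr = fast_forward(num_accesses=100_000, footprint=64)
# Reconstruction walks the MTR once: at most 64 rows, never 100,000 accesses.
print(len(mtr), "rows to scan for", 100_000, "accesses")
```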