
Accelerating Multiprocessor Simulation with a Memory Timestamp Record



  1. Accelerating Multiprocessor Simulation with a Memory Timestamp Record
     Kenneth Barr, Heidi Pan, Michael Zhang, Krste Asanović
     Massachusetts Institute of Technology
     March 21, 2005

  2. Intelligent sampling gives the best speed-accuracy tradeoff for uniprocessors (Yi, HPCA '05)
     • Single sample: simulate one detailed sample; the rest of the run is ignored
     • Fast-forward + single sample: ISA-only fast-forward, then measure one detailed sample
     • Fast-forward + warmup + sample: ISA-only fast-forward, then detailed warmup before the measured sample
     • Selective sampling (SimPoints)
     • Statistical sampling
     • Statistical sampling with ISA + µarch fast functional warming (SMARTS, FFW)
     • Memory Timestamp Record: ISA + MTR update during fast-forwarding, then reconstruct caches
     Barr, Pan, Zhang, and Asanović. ISPASS. March 21, 2005.

  3. Snapshots amortize fast-forwarding, but require slow warming or bind to a particular µarch
     • ISA-only snapshots: allow any µarch, but slow because of warmup
     • ISA + µarch snapshots: fast (less warmup), but tied to one µarch
     • MTR snapshots: fast, NOT tied to a µarch, and support multiprocessors…

  4. Multiprocessor simulation is especially slow
     • More cores → more state and complexity → long, complex simulations
     • Full-system, threaded apps → more variability → more simulation time
     [Figure: CPU1 … CPUn, each with a private cache ($), connected to memory and a directory]

  5. For full-system simulations of commercial workloads, subtle variation matters! (Alameldeen and Wood, 2003)
     [Figure: the same workload run three times on CPUs 1-4, finishing at Time = 2.5, Time = 1.8, and Time = 2.1]
     • All runs produce the same result, but each has a different runtime because of:
       – DRAM refresh
       – Hard disk arrangement delays DMA
       – Incoming packet interrupts the application
       – Locking order reversed
       – Processes migrate
     • Is our new gizmo a success? Maybe the OS just ordered threads differently!
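To put a number on the spread in the figure, the sketch below computes the mean, the sample standard deviation, and the fastest-to-slowest range of the three runtimes. Treating the three runs as independent samples is an assumption made here for illustration; the variable names are hypothetical.

```python
from statistics import mean, stdev

# Runtimes of the three runs shown in the figure (same units as the slide).
runtimes = [2.5, 1.8, 2.1]

avg = mean(runtimes)       # arithmetic mean
spread = stdev(runtimes)   # sample standard deviation (n - 1 denominator)
# Gap between slowest and fastest run, relative to the fastest run.
rel_range = (max(runtimes) - min(runtimes)) / min(runtimes)

print(f"mean = {avg:.3f}, stdev = {spread:.3f}, range = {rel_range:.0%} of fastest run")
```

The slowest run takes roughly 39% longer than the fastest one, so a single-run speedup smaller than this spread cannot safely be credited to the new gizmo rather than to a different thread ordering.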

  6-7. What is the Memory Timestamp Record (MTR)?
     • The MTR is an abstract picture of a multiprocessor's coherence state
       – Allows quick updates during fast-forwarding
       – Fills in concrete caches and directory prior to sampling
     MTR layout, one row per memory block:
       Block Address   Last Readtime           Last Writetime   Last Writer
                       CPU0   …   CPUn-1
       0               …                       …                …
       …
       N-1             …                       …                …
     [Figure: the MTR is used to fill in the caches ($) of CPU1 … CPUn and the memory directory]
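One way to hold the rows described above is sketched below. It is a minimal sketch that assumes one last-read slot per CPU and a dictionary keyed by block address, so that only touched blocks occupy space. The class names `MTREntry` and `MemoryTimestampRecord` and the method `row` are hypothetical, not taken from the authors' simulator.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MTREntry:
    """One MTR row (one memory block), following the slide's column layout."""
    last_read: List[Optional[int]]     # Last Readtime, one slot per CPU (None = never read)
    last_write: Optional[int] = None   # Last Writetime (None = never written)
    last_writer: Optional[int] = None  # Last Writer (CPU id)


@dataclass
class MemoryTimestampRecord:
    num_cpus: int
    # Sparse storage (an assumption): rows exist only for blocks that were touched.
    rows: Dict[int, MTREntry] = field(default_factory=dict)

    def row(self, block: int) -> MTREntry:
        """Return the row for `block`, creating an empty one on first touch."""
        if block not in self.rows:
            self.rows[block] = MTREntry(last_read=[None] * self.num_cpus)
        return self.rows[block]


mtr = MemoryTimestampRecord(num_cpus=4)
mtr.row(0x40).last_read[2] = 17  # hypothetical access: CPU2 reads block 0x40 at time 17
print(len(mtr.rows), mtr.rows[0x40])
```

Only block 0x40 gets a row; every other block address costs nothing until it is accessed.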

  8-13. MTR example: update
     • The MTR contains one entry per memory block; locality keeps it sparse.
     • New access times overwrite old ones (self-compressing).
     Memory trace:
       Time   CPU0     CPU1
       0      Read a
       1      Read e
       2      Read b
       3      Read c
       4               Write b
     Resulting MTR:
       Block   Last Readtime          Last Writetime   Last Writer
               CPU0   …   CPUn-1
       a       0
       b       2                      4                CPU1
       c       3
       d
       e       1
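The update rule above can be replayed in a few lines. This sketch assumes a read sets only the reading CPU's Last Readtime and a write sets Last Writetime and Last Writer, which reproduces the table on the slide; the helper names `new_row` and `update` and the dictionary row layout are hypothetical.

```python
from typing import Dict

NUM_CPUS = 2  # the trace only involves CPU0 and CPU1


def new_row() -> dict:
    # Hypothetical row layout: per-CPU last read time, last write time, last writer.
    return {"read": [None] * NUM_CPUS, "write": None, "writer": None}


def update(mtr: Dict[str, dict], time: int, cpu: int, op: str, block: str) -> None:
    """Record one access; a newer timestamp simply overwrites the older one."""
    row = mtr.setdefault(block, new_row())  # rows appear only for touched blocks
    if op == "R":
        row["read"][cpu] = time
    else:  # "W"
        row["write"] = time
        row["writer"] = cpu


# (time, cpu, op, block) from the memory trace on the slide
trace = [(0, 0, "R", "a"), (1, 0, "R", "e"), (2, 0, "R", "b"),
         (3, 0, "R", "c"), (4, 1, "W", "b")]

mtr: Dict[str, dict] = {}
for t, cpu, op, blk in trace:
    update(mtr, t, cpu, op, blk)

for blk in sorted(mtr):
    print(blk, mtr[blk])
```

Block d never receives a row, and however many times a block is touched, it keeps a single row, so the record grows with the memory footprint rather than with trace length.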

  14-16. Step 1, coalesce: determining the correct cache tags
     [Figure: MTR rows (Block Address, Last Readtime for CPU0 … CPUn-1, Last Writetime, Last Writer) are mapped into a cache with Sets 0-3 and Ways 0-1]
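Determining tags starts by splitting each block address into a set index and a tag for the chosen cache organization. The sketch below assumes the low-order bits of the block address select the set (a common but not universal choice) and uses the 4-set, 2-way cache drawn on the slide; `split_block_address` and the block addresses are hypothetical.

```python
NUM_SETS = 4  # Sets 0-3, as drawn on the slide
NUM_WAYS = 2  # Ways 0-1


def split_block_address(block: int, num_sets: int = NUM_SETS) -> tuple:
    """Return (set index, tag), assuming low-order bits of the block address index the set."""
    return block % num_sets, block // num_sets


# Hypothetical block addresses that have rows in the MTR.
blocks = [0, 1, 4, 6, 8, 13]

by_set = {}
for b in blocks:
    s, tag = split_block_address(b)
    by_set.setdefault(s, []).append(tag)

# Sets with more candidate tags than ways need a choice during coalescing.
overfull = [s for s, tags in by_set.items() if len(tags) > NUM_WAYS]
print(by_set, overfull)
```

Here set 0 receives three candidate tags for only two ways, which is the situation the coalesce step resolves by consulting the timestamps.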

  17-23. MTR example: coalesce
     • Choose an organization: two sets, two ways
     • Coalesce
       – Determine which blocks map to the same set
       – Only the most recent timestamps, up to one per way, are kept in each set. Check validity later.
     • Starting MTR (from the update example): CPU0 read a at 0, e at 1, b at 2, and c at 3; CPU1 wrote b at 4
     • What are the contents of CPU0's cache?
       – a (time 0) goes to set 0, way 0
       – b (time 2) goes to set 1, way 0
       – c (time 3) goes to set 0, way 1
       – e (time 1) also maps to set 0; it is newer than a (time 0), so it replaces a in way 0
                  Way 0   Way 1
         Set 0    e 1     c 3
         Set 1    b 2
     • CPU1? Its write of b at time 4 places b in set 1:
                  Way 0   Way 1
         Set 0
         Set 1    b 4
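The coalesce walk-through above can be sketched as follows. Several things are assumptions chosen to match the slide's pictures: blocks a through e have consecutive block addresses 0 through 4, a CPU's timestamp for a block is the later of its own last read and (if it was the last writer) the last write, and each set keeps its most recent blocks up to the number of ways, as LRU replacement would. The names `last_touch` and `coalesce` are hypothetical, and validity checking is left out, as on the slide.

```python
NUM_SETS, NUM_WAYS = 2, 2  # organization from the example: two sets, two ways

# MTR after the update example: block -> (per-CPU last read, last write, last writer).
MTR = {
    "a": ([0, None], None, None),
    "b": ([2, None], 4, 1),
    "c": ([3, None], None, None),
    "e": ([1, None], None, None),
}
# Assumption: blocks a..e have consecutive block addresses 0..4.
ADDR = {name: i for i, name in enumerate("abcde")}


def last_touch(row, cpu):
    """Most recent time `cpu` read or wrote this block, or None if it never did."""
    reads, wtime, writer = row
    times = [t for t in (reads[cpu], wtime if writer == cpu else None) if t is not None]
    return max(times) if times else None


def coalesce(cpu):
    """Per set, keep the NUM_WAYS most recently touched blocks (validity checked later)."""
    sets = {s: [] for s in range(NUM_SETS)}
    for blk, row in MTR.items():
        t = last_touch(row, cpu)
        if t is not None:
            sets[ADDR[blk] % NUM_SETS].append((t, blk))
    # Sorted oldest to newest; list order is not the same as way order.
    return {s: sorted(cands)[-NUM_WAYS:] for s, cands in sets.items()}


print("CPU0:", coalesce(0))
print("CPU1:", coalesce(1))
```

CPU0 ends up with e and c in set 0 and b in set 1, and CPU1 holds only b in set 1, as on the slide. CPU0's copy of b (read at time 2) is older than CPU1's write at time 4, which is the kind of case the deferred validity check has to resolve.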
