dist-gem5: Distributed Simulation of Compute Clusters

  1. dist-gem5: Distributed Simulation of Compute Clusters
     Mohammad Alian, Umur Darbaz, Gabor Dozsa, Stephan Diestelhorst, Daehoon Kim, Nam Sung Kim
     University of Illinois Urbana-Champaign; ARM Ltd., Cambridge, UK

  2. Outline
     • motivation
       - accelerating large-scale simulation
     • dist-gem5 architecture
       - packet forwarding
       - synchronization
       - checkpointing
       - network model
     • evaluation
       - validation, speedup, synchronization overhead
     • conclusion

  3. Outline (repeats slide 2)

  4. What is gem5 – overview
     • full-system, cycle-level, event-driven simulator
     • used and maintained at universities and in industry
     [Figure: gem5 component overview: CPU models (Atomic, Timing, In-Order, Out-of-Order, KVMv7, KVMv8), ARM ISA support (ARMv7-A, ARMv8), interconnect (crossbar, snoop filter, bridges), memory (L1–L3 caches, DRAM, HMC, Flash), core-integrated IP and IO components (e.g., GICv2, UART, DMA, UFS, NVMe, 10Gb NIC), GPU models (NoMali), and simulation support]
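
"Event-driven" means the simulator does not step through every cycle. It keeps a queue of events ordered by simulated tick and jumps straight from one event to the next. The minimal Python analogue below illustrates only that idea. The names `EventQueue`, `schedule`, and `run` are hypothetical and are not gem5's C++ API.

```python
import heapq
import itertools
from typing import Callable


class EventQueue:
    """Toy discrete-event queue: events run in tick order, FIFO among equal ticks."""

    def __init__(self) -> None:
        self.cur_tick = 0
        self._heap: list = []          # entries: (tick, sequence number, callback)
        self._seq = itertools.count()  # tie-breaker so equal ticks keep insertion order

    def schedule(self, tick: int, callback: Callable[[], None]) -> None:
        """Insert an event; scheduling into the simulated past is an error."""
        if tick < self.cur_tick:
            raise ValueError(f"tick {tick} is before current tick {self.cur_tick}")
        heapq.heappush(self._heap, (tick, next(self._seq), callback))

    def run(self, until: int) -> None:
        """Process every event with tick <= until, then advance time to `until`."""
        while self._heap and self._heap[0][0] <= until:
            tick, _, callback = heapq.heappop(self._heap)
            self.cur_tick = tick  # jump directly to the next event's tick
            callback()            # the callback may schedule further events
        self.cur_tick = max(self.cur_tick, until)


if __name__ == "__main__":
    q = EventQueue()
    q.schedule(500, lambda: print("tick", q.cur_tick, "second event"))
    q.schedule(100, lambda: print("tick", q.cur_tick, "first event"))
    q.run(1000)
```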

  5. Why dist-gem5?
     • the performance and power dissipation of a distributed system
       - depend on a complex interplay among system components at scale
     • we need a full-system, cycle-level simulator that is fast enough to simulate a large-scale computer system
     • distributed simulation:
       - simulate a distributed system with many simulation hosts
     [Figure: scale, devices, OS, cores, caches, memory, network, and ISAs together determine performance and power]

  6. dist-gem5 architecture – high-level view
     • gem5 processes, each modeling a full system, run in parallel on a cluster of physical machines
     • the simulated network switch
       - forwards packets among the simulated systems
       - synchronizes the distributed simulation
       - simulates the network topology
     [Figure: four physical hosts (#1–#4): three run gem5 processes for simulated systems #1–#3, and one runs the simulated network switch]

  7. Outline (repeats slide 2)

  8. dist-gem5 architecture – core components
     • packet forwarding
     • synchronization
     • distributed checkpointing
     • simulated network

  9. dist-gem5 architecture – core components (repeats slide 8)

  10. dist-gem5 architecture – packet forwarding
     [Figure: physical hosts #1–#3, each with a physical NIC, connected to ports 1–3 of a physical switch]

  11. dist-gem5 architecture – packet forwarding
     [Figure: gem5 #1 on physical host #1 and gem5 #2 on physical host #2 model simulated systems #1 and #2, each with a simulated NIC; gem5 #3 on physical host #3 models the simulated switch (sim ports 0 and 1); all hosts are connected through their physical NICs to the physical switch]

  12. dist-gem5 architecture – packet forwarding
     • simulated packets are embedded into host TCP/IP packets
     [Figure: same setup as slide 11; each simulated packet travels between a simulated NIC and the simulated switch inside a TCP packet carried by the physical network]
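
One way to embed a simulated Ethernet frame in a host TCP stream is to prefix it with a small header and let TCP provide reliable, in-order delivery. The sketch below assumes a hypothetical header that holds the simulated send tick (64-bit) and the frame length (32-bit). dist-gem5's real message format is not described on these slides, and the helper names are illustrative.

```python
import socket
import struct

# Hypothetical header: simulated send tick (u64) + frame length (u32), network byte order.
# This is NOT dist-gem5's actual wire format.
HEADER = struct.Struct("!QI")


def send_msg(sock: socket.socket, send_tick: int, frame: bytes) -> None:
    """Embed one simulated Ethernet frame in the host TCP byte stream."""
    sock.sendall(HEADER.pack(send_tick, len(frame)) + frame)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """TCP is a byte stream, so keep reading until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)


def recv_msg(sock: socket.socket) -> tuple:
    """Unwrap one message and return (send_tick, simulated Ethernet frame)."""
    send_tick, length = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return send_tick, _recv_exact(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()  # stands in for the TCP connection between two hosts
    send_msg(a, 1_000_000, b"simulated Ethernet frame")
    print(recv_msg(b))
    a.close()
    b.close()
```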

  13. Asynchronous processing of incoming messages
     • simulation thread (main thread)
       - processes events from, and inserts events into, the event queue
       - on a send-packet event, encapsulates the simulated Ethernet packet in a message and sends it out
     • receiver thread
       - one is created for each gem5 process
       - waits for incoming packets
       - creates a receive-packet event and inserts it into the event queue
     [Figure: inside a gem5 process on a physical host, the simulation thread works through the event queue (eventQ) and sends packets out through the physical NIC; the receiver thread turns packets arriving at the physical NIC into recv-pkt events in eventQ]
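
The division of labor between the two threads can be sketched with Python's `threading` and `queue` modules. This is a simplified, hypothetical model. A `queue.Queue` stands in for the physical NIC or socket, the delivery tick is assumed to be the send tick plus the link delay, and the names `SharedEventQueue`, `receiver_thread`, and `simulate` are illustrative only.

```python
import heapq
import itertools
import queue
import threading


class SharedEventQueue:
    """Tick-ordered event queue shared by the simulation and receiver threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # both threads modify the heap
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for events at the same tick

    def insert(self, tick, name, data=None):
        with self._lock:
            heapq.heappush(self._heap, (tick, next(self._seq), name, data))

    def pop_due(self, until):
        """Pop the earliest event with tick <= until, or return None."""
        with self._lock:
            if self._heap and self._heap[0][0] <= until:
                return heapq.heappop(self._heap)
            return None


def receiver_thread(link, eventq, link_delay):
    """Wait for incoming messages and turn each one into a recv-pkt event."""
    while True:
        msg = link.get()  # blocks, like waiting on a socket
        if msg is None:   # hypothetical shutdown sentinel
            return
        send_tick, frame = msg
        eventq.insert(send_tick + link_delay, "recv_pkt", frame)


def simulate(eventq, until, outbox):
    """Simulation (main) thread: process events in tick order up to `until`."""
    delivered = []
    while (event := eventq.pop_due(until)) is not None:
        tick, _, name, data = event
        if name == "send_pkt":
            outbox.put((tick, data))        # encapsulate the packet and send it out
        elif name == "recv_pkt":
            delivered.append((tick, data))  # hand the frame to the simulated NIC
    return delivered


if __name__ == "__main__":
    link, outbox, eventq = queue.Queue(), queue.Queue(), SharedEventQueue()
    rx = threading.Thread(target=receiver_thread, args=(link, eventq, 1_000_000))
    rx.start()
    link.put((200, b"incoming"))  # a message from a peer gem5 process
    link.put(None)
    rx.join()
    eventq.insert(300, "send_pkt", b"outgoing")
    print(simulate(eventq, 10**7, outbox), outbox.get_nowait())
```

This structure alone does not guarantee accuracy. The receiver thread may insert an event whose tick the simulation thread has already passed, which is the problem the synchronization mechanism has to solve.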

  14. dist-gem5 architecture – core components (repeats slide 8)

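  15. Need for synchronization
     • the receiver gem5 can run ahead of the sender's send time because of
       - mismatched physical hosts
       - different events to be processed
     • as a result, packets can arrive late (after their expected delivery time)
     • the receiver gem5 must be slowed down to ensure simulation accuracy
     • solution: quantum-based synchronization
     [Figure: wall-clock timeline; sender gem5#0 sends a packet at its send time, and the expected delivery time is the send time plus the simulated network delay, but receiver gem5#1 has already advanced past that point when the packet arrives (recv time)]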
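
A toy model makes the "run ahead" problem concrete. Assume that each gem5 process advances simulated time at a constant rate (simulated ticks per wall-clock second), that both processes start at the same moment, and that the host network adds a fixed wall-clock latency. Real simulation speed varies over time, so this is only an illustration, and all numbers in the check are hypothetical. A packet is late if the receiver has already simulated past the packet's expected delivery tick when the packet physically arrives.

```python
def receiver_tick_at_arrival(send_tick, sender_rate, receiver_rate, host_latency_s):
    """Simulated tick the receiver has reached when the packet physically arrives.

    Rates are simulated ticks per wall-clock second (assumed constant), and both
    gem5 processes are assumed to start at wall-clock time 0.
    """
    arrival_wall_s = send_tick / sender_rate + host_latency_s
    return receiver_rate * arrival_wall_s


def is_late(send_tick, link_delay, sender_rate, receiver_rate, host_latency_s):
    """True if the receiver already simulated past the expected delivery tick."""
    expected_delivery = send_tick + link_delay
    reached = receiver_tick_at_arrival(send_tick, sender_rate, receiver_rate, host_latency_s)
    return reached > expected_delivery


if __name__ == "__main__":
    # Assuming gem5's default 1 ps tick, a 1 us link delay is 1_000_000 ticks.
    print(is_late(10_000_000, 1_000_000, 1e6, 1e6, 50e-6))  # hosts equally fast
    print(is_late(10_000_000, 1_000_000, 1e6, 2e6, 50e-6))  # receiver host twice as fast
```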

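  16. Accurate packet forwarding
     • quantum: the interval, in simulated time, between periodic global synchronizations
     • a sync event flushes the inter-gem5 communication channels
     • if quantum ≤ simulated link delay:
       - a packet's expected delivery tick falls at or after the end of the quantum in which it was sent (inside the next quantum when quantum = link delay), so the sync delivers the packet before the receiver reaches that tick
     • optimal quantum size for accurate forwarding = simulated link delay, because it is the largest quantum that still guarantees accurate delivery and therefore needs the fewest synchronizations
     [Figure: global syncs divide wall-clock time into quanta; gem5#0 sends a packet at send time, and the expected delivery time (send time + simulated network delay) lies beyond the next sync, so gem5#1 receives the packet in time]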
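
The condition on this slide follows from one inequality. Let Q be the quantum and d the simulated link delay. A packet sent at tick t_s during quantum [kQ, (k+1)Q) has expected delivery tick t_s + d. Since t_s ≥ kQ, we get t_s + d ≥ kQ + d ≥ (k+1)Q whenever Q ≤ d. The delivery tick is therefore never earlier than the sync boundary the receiver may already have reached. The sketch below checks that condition and counts the syncs a given quantum costs. The function names are hypothetical, and quanta are assumed to be half-open intervals.

```python
def first_late_send(quantum, link_delay, send_ticks):
    """Return the first send tick whose packet could be delivered late, else None.

    Quanta are half-open intervals [k*Q, (k+1)*Q). At the sync that ends quantum k,
    every in-flight message is flushed, but the receiver may already have simulated
    up to the boundary (k+1)*Q. Delivery is accurate only if the expected delivery
    tick is at or after that boundary.
    """
    for t_send in send_ticks:
        boundary = (t_send // quantum + 1) * quantum  # end of the sender's quantum
        if t_send + link_delay < boundary:
            return t_send
    return None


def sync_count(sim_ticks, quantum):
    """Global synchronizations needed to cover sim_ticks (ceiling division)."""
    return -(-sim_ticks // quantum)


if __name__ == "__main__":
    d = 1_000_000  # 1 us link delay at a 1 ps tick (assumed)
    for q in (250_000, d, 2 * d):
        print(q, first_late_send(q, d, range(0, 10**8, 7919)), sync_count(10**9, q))
```

A quantum smaller than the link delay keeps forwarding accurate but multiplies the number of syncs. This is why a quantum equal to the link delay is the best trade-off.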

  17. dist-gem5 architecture – core components (repeats slide 8)

  18. dist-gem5 architecture – network modeling
     [Figure: a two-level tree with one aggregate switch above top-of-rack switches #0–#7; each top-of-rack switch connects eight servers (#0–#7, #8–#15, …, #56–#63), 64 servers in total; the switches are simulated in one gem5 process]
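
The tree on this slide maps server IDs to switches with simple integer arithmetic. The sketch assumes, based on the figure, that top-of-rack (ToR) switch i serves servers 8i to 8i + 7 on its ports p0–p7. The constants and function names are hypothetical and are not part of dist-gem5's configuration scripts.

```python
SERVERS_PER_RACK = 8  # servers #0-#7 on ToR #0, #8-#15 on ToR #1, ... (from the figure)
NUM_RACKS = 8         # ToR switches #0-#7 under one aggregate switch


def tor_of(server):
    """Return (ToR switch index, ToR downlink port) for a server ID."""
    if not 0 <= server < SERVERS_PER_RACK * NUM_RACKS:
        raise ValueError(f"no server #{server} in a {NUM_RACKS}x{SERVERS_PER_RACK} tree")
    return divmod(server, SERVERS_PER_RACK)


def switch_path(src, dst):
    """Switches a packet crosses on its way from server src to server dst."""
    src_tor, _ = tor_of(src)
    dst_tor, _ = tor_of(dst)
    if src_tor == dst_tor:
        return [f"ToR#{src_tor}"]  # same rack: the packet never reaches the aggregate switch
    return [f"ToR#{src_tor}", "aggregate", f"ToR#{dst_tor}"]


if __name__ == "__main__":
    print(tor_of(57), switch_path(1, 6), switch_path(3, 60))
```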

  19. Configurable network model
     • configurable baseline Ethernet switch model
       - number of ports, delay, bandwidth, buffer size
     [Figure: the switch model has a MAC table and, for each port n, an input port (IPORT#n), an in-order queue (In-orderQ#n), and an output port (OPORT#n). In one gem5 process, a simulated etherSwitch acting as the aggregate switch (ports p0–p7) connects over simulated etherLinks to port p8 of each top-of-rack switch #0–#7; top-of-rack ports p0–p7 connect over distEtherLinks to the gem5 processes on the physical hosts]
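
A baseline switch of this kind can be sketched as a learning switch. It records which port each source MAC address arrived on, forwards frames to the learned port, floods frames with unknown destinations, and keeps one FIFO per output port so that frames leave in arrival order. This is a hypothetical Python model, not gem5's EtherSwitch. It models only the port count and buffer size, not delay or bandwidth, and the placement of the in-order queues is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Frame:
    src: str  # source MAC address
    dst: str  # destination MAC address
    payload: bytes = b""


@dataclass
class LearningSwitch:
    """Hypothetical baseline switch: MAC learning plus per-output-port FIFOs."""
    num_ports: int
    buffer_frames: int  # queue capacity per output port, in frames
    mac_table: dict = field(default_factory=dict)
    dropped: int = 0

    def __post_init__(self):
        # one in-order queue per port; frames leave in the order they were queued
        self.out_queues = [deque() for _ in range(self.num_ports)]

    def receive(self, in_port, frame):
        self.mac_table[frame.src] = in_port  # learn which port reaches frame.src
        out = self.mac_table.get(frame.dst)
        if out is None:
            targets = [p for p in range(self.num_ports) if p != in_port]  # flood
        else:
            targets = [out]
        for port in targets:
            if port == in_port:
                continue  # destination sits behind the ingress port: filter the frame
            fifo = self.out_queues[port]
            if len(fifo) >= self.buffer_frames:
                self.dropped += 1  # tail drop when the output buffer is full
            else:
                fifo.append(frame)

    def transmit(self, port):
        """Dequeue the oldest frame waiting on `port`, or None if it is empty."""
        fifo = self.out_queues[port]
        return fifo.popleft() if fifo else None


if __name__ == "__main__":
    sw = LearningSwitch(num_ports=4, buffer_frames=2)
    sw.receive(0, Frame("A", "B"))  # B is unknown, so the frame is flooded
    sw.receive(1, Frame("B", "A"))  # A was learned on port 0
    print(sw.mac_table, [len(q) for q in sw.out_queues])
```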

  20. Outline (repeats slide 2)

  21. Methodology – simulation techniques
     • example: simulating a cluster with 7 nodes and 1 network switch using
       - dist-gem5
       - single-threaded-gem5
       - parallel-gem5
     [Figure: dist-gem5 runs eight gem5 processes (gem5#0–#7, one each for systems #0–#6 and the switch) across two quad-core physical hosts; single-threaded-gem5 simulates all seven systems and the switch in one gem5 process on one quad-core physical host; parallel-gem5 simulates them in parallel on one quad-core physical host]

  22. Methodology – experimental setup
     • focus on off-chip network performance, using network-intensive applications
       - iperf, memcached, httperf, tcptest, netperf, NAS Parallel Benchmarks
     • verification/validation against:
       - single-threaded-gem5
       - a physical cluster: 4 nodes with AMD A10-5800K processors
     • speedup comparison against:
       - single-threaded-gem5
       - parallel-gem5

     gem5 configuration
     category | configuration
     ---------+--------------------------------
     core     | O3, 4 cores; 4-way superscalar
     memory   | 8 GB DDR3, 1600 MHz
     network  | Intel GbE NIC; 1 μs link latency
     OS       | Linux Ubuntu 14.04 (kernel 4.3)

  23. Verification
     • with the same node and network configuration:
       - dist-gem5 generates simulation statistics identical to those of single-threaded-gem5
       - for different cluster sizes
     [Figure: dist-gem5 (gem5#0–#7 across two quad-core physical hosts) = single-threaded-gem5 (all systems and the switch in one gem5 process)]
