
Methods for Emulation of Multi-Core CPU Performance



  1. Methods for Emulation of Multi-Core CPU Performance
     Tomasz Buchert (1), Lucas Nussbaum (2), Jens Gustedt (1)
     (1) INRIA Nancy – Grand Est; (2) LORIA / Nancy-Université

  2. Validation of distributed systems
     Approaches:
     - Theoretical approach (paper and pencil)
       - For: the most general results and understanding
       - Against: very hard (leads to unsolvability results)
     - Experimentation (real application on a real environment)
       - For: realistic context, credibility
       - Against: difficulty of preparation and control, questionable reproducibility
     - Simulation (modeled application inside a modeled environment)
       - For: very simple and perfectly reproducible
       - Against: experimental bias, possibly unrealistic
     - Emulation (real application inside a modeled environment)
       - For: control over the experiment parameters
       - Against: difficult


  4. Emulation
     The perfect emulated environment should emulate (independently):
     - Network bandwidth, latency, topology
     - Memory capabilities
     - Background noise (network, faults)
     - CPU speed and its features
     Some parts are implemented in Wrekavoc, a tool to define and control the heterogeneity of a cluster.
     In this talk, however, we specifically concentrate on the emulation of CPU speed.

  5. Our goal
     [Figure: eight cores, numbered 0 to 7]
     (1) control over the speed of each CPU/core independently

  6. Our goal (cont.)
     [Figure: cores 0 to 7 with virtual nodes VN 1, VN 2, VN 3 and Virtual node 4]
     (2) ability to create separately scheduled zones of tasks

  7. Existing methods (CPU-Freq)
     Hardware solution to reduce heat, noise and power usage
     For:
     - no overhead of emulation
     - completely unintrusive
     - meaningful CPU time measure
     Against:
     - only a finite set of different frequency levels

  8. Existing methods (CPU-Lim)
     Method available in the Wrekavoc tool
     Algorithm:
     - if CPU usage ≥ threshold → send SIGSTOP to the process
     - if CPU usage < threshold → send SIGCONT to the process
     - CPU usage = (CPU time of the process) / (process lifetime)
     For:
     - easy and almost POSIX-compliant
     Against:
     - intrusive and unscalable
     - decision based on one process instead of global CPU usage
     - sleeping is indistinguishable from preemption
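The CPU-Lim decision rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide, not Wrekavoc's code: the function names and the example threshold of 0.5 are assumptions, and the sketch only selects a signal without sending it (POSIX signals are assumed to be available).

```python
import signal


def cpu_usage(cpu_time_s: float, lifetime_s: float) -> float:
    """CPU usage as defined on the slide: CPU time of the process / process lifetime."""
    if lifetime_s <= 0:
        return 0.0  # a process that has just started has used no CPU yet
    return cpu_time_s / lifetime_s


def cpu_lim_decision(cpu_time_s: float, lifetime_s: float, threshold: float) -> signal.Signals:
    """Pick the signal for one process (hypothetical helper, per-process decision only)."""
    if cpu_usage(cpu_time_s, lifetime_s) >= threshold:
        return signal.SIGSTOP  # over budget: suspend the process
    return signal.SIGCONT      # under budget: let it run


# A process that computed for 6 s out of 10 s exceeds a 0.5 threshold and is stopped.
print(cpu_lim_decision(6.0, 10.0, 0.5).name)
# A process that slept for 6 s out of 10 s looks the same as one that was preempted.
print(cpu_lim_decision(4.0, 10.0, 0.5).name)
```

The second call illustrates the drawback listed on the slide: the usage ratio cannot tell voluntary sleeping apart from preemption.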

  9. Existing methods (Fracas)
     - Based on an idea from KRASH (a load injection tool)
     - Uses Linux cgroups and the Completely Fair Scheduler
     - A predefined portion of the CPU is given to tasks burning CPU
     - All other processes are given the remaining CPU time
     [Figure: Core 1, Core 2 and Core 3, each shared between a CPU burner and the emulated processes]

  10. Existing methods (Fracas, cont.)
      For:
      - unintrusive
      - scalable
      Against:
      - unportable to other systems
      - sensitive to the configuration of the scheduler

  11. New methods (CPU-Gov)
      - Generalization of CPU-Freq
      - Alternates between two neighbouring hardware frequencies
      [Figure: within each period τ, the core runs at 2.4 GHz for 0.25 τ and at 1.2 GHz for 0.75 τ, which emulates 1.5 GHz]

  12. New methods (CPU-Gov, cont.)
      For:
      - no overhead, unintrusive and meaningful CPU time measure (inherited from CPU-Freq)
      - continuous range of emulated frequency
      Against:
      - dependency on the hardware implementation (inherited from CPU-Freq)
      - special algorithm for small values of emulated frequency
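How long CPU-Gov would stay at each of the two neighbouring frequencies can be worked out with a short Python sketch. It assumes that the emulated frequency is the time-weighted average of the two hardware levels; the function name, the level list and the period length are hypothetical. Targets below the lowest hardware level, for which the slides mention a special algorithm, are not handled.

```python
def cpu_gov_split(target_ghz, levels_ghz, period_s=1.0):
    """Return ((low_ghz, seconds_at_low), (high_ghz, seconds_at_high)) for one period.

    Assumption: emulated frequency = time-weighted mean of the two levels.
    """
    levels = sorted(levels_ghz)
    if not levels[0] <= target_ghz <= levels[-1]:
        raise ValueError("target outside the range of hardware levels")
    # Find the pair of neighbouring levels that brackets the target.
    for low, high in zip(levels, levels[1:]):
        if low <= target_ghz <= high:
            high_share = (target_ghz - low) / (high - low)
            return (low, (1.0 - high_share) * period_s), (high, high_share * period_s)
    # Only one level is available and the target equals it.
    return (levels[0], period_s), (levels[0], 0.0)


(low, t_low), (high, t_high) = cpu_gov_split(1.5, [1.2, 2.4])
print(f"{t_low:.2f} s at {low} GHz, {t_high:.2f} s at {high} GHz")
```

For a 1.5 GHz target between 1.2 GHz and 2.4 GHz, this gives roughly three quarters of the period at the lower level and one quarter at the higher one.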

  13. New methods (CPU-Hogs)
      - Generalization of the CPU-burning technique
      - For each core, a high-priority thread is created
      - These threads "burn" a required number of CPU cycles
      For:
      - simple and portable (POSIX)
      - does not rely on the hardware
      Against:
      - theoretical problems with scalability (not observed)

  14. New methods (CPU-Hogs, cont.)
      [Figure: timeline of cores 0 and 1 over time]
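The burning idea behind CPU-Hogs can be illustrated with a minimal Python sketch. It assumes that a core emulating frequency f on hardware running at f_max must lose a share 1 - f/f_max of every period to the hog. The function names and the 0.1 s period are hypothetical, and the sketch leaves out the parts that matter in practice: one high-priority thread per core, and pinning each thread to its core.

```python
import time


def hog_schedule(emulated_ghz, max_ghz, period_s=0.1):
    """Split one period into (burn_s, free_s) for a single core.

    Assumption: the emulated processes should get emulated_ghz / max_ghz of the period.
    """
    if not 0 < emulated_ghz <= max_ghz:
        raise ValueError("emulated frequency must be in (0, max_ghz]")
    free_share = emulated_ghz / max_ghz
    return (1.0 - free_share) * period_s, free_share * period_s


def burn(duration_s):
    """Busy-wait for duration_s seconds, consuming CPU cycles instead of sleeping."""
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        pass


burn_s, free_s = hog_schedule(1.5, 3.0)
burn(burn_s)        # the hog occupies the core...
time.sleep(free_s)  # ...then yields it to the emulated processes
```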

  15. Evaluation
      Microbenchmarks with different types of work:
      - CPU intensive: running a tight computational loop
      - IO bound: sending UDP packets over a network
      - CPU and IO intensive: sleeping mixed with a computation
      - multiprocessing: running multiple processes with CPU work
      - multithreading: running multiple threads with CPU work
      - memory speed (STREAM benchmark): sustainable memory bandwidth
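As a rough illustration of the CPU-intensive microbenchmark, the following Python sketch counts iterations of a tight loop and reports loops per second, the unit used on the result plots. It is not the benchmark used in the talk: the function name and the 0.2 s measurement window are assumptions.

```python
import time


def loops_per_second(duration_s=0.2):
    """Run a tight loop for about duration_s seconds and return iterations per second."""
    count = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        count += 1  # stand-in for one unit of computational work
    elapsed = time.perf_counter() - start
    return count / elapsed


print(f"{loops_per_second():.0f} loops/sec")
```

Under a working CPU-speed emulation, this figure would be expected to fall in proportion to the emulated frequency.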

  16. Evaluation (cont.)
      - Tested with 1, 2, 4 and 8 emulated cores
      - X-axis: emulated frequency
      - Y-axis: speed perceived by the benchmark
      - Each test repeated 40 times; results = average with 95% confidence interval
      - Evaluation performed on the Grid’5000 platform
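The "average with 95% confidence interval" over 40 repetitions can be computed as in the sketch below. The slides do not say how the interval was obtained, so the normal approximation used here is an assumption (a Student t interval would be slightly wider for 40 samples), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(samples) / sqrt(n)
    return mean(samples), half_width


# 40 made-up measurements in loops/sec, for illustration only.
runs = [4000.0 + (i % 5) * 10.0 for i in range(40)]
m, h = mean_with_ci(runs)
print(f"{m:.1f} ± {h:.1f} loops/sec")
```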

  17. Grid’5000
      - 9 sites, 1528 machines
      - Sites: Lille, Rennes, Orsay, Nancy, Bordeaux, Lyon, Grenoble, Toulouse, Sophia
      - Dedicated to research on distributed systems and HPC

  18. CPU intensive work (one core)
      [Figure: Loops/sec vs. emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]
      All methods work as expected.

  19. IO-intensive work (one core)
      [Figure: Loops/sec vs. emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]
      IO operations should not scale with CPU frequency.

  20. Memory speed (one core)
      [Figure: MB/sec vs. emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]
      Ideally, memory speed would not scale with CPU frequency either.

  21. Computing and sleeping workload (one core)
      [Figure: Loops/sec vs. emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]
      The relation should be proportional, but CPU-Lim’s is not.

  22. Multiprocessing benchmark (one core)
      [Figure: Loops/sec vs. emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]
      This relation should be proportional again (but CPU-Lim’s is not).

  23. Multithreading benchmark (one core)
      [Figure: Loops/sec vs. emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]
      The execution speed scales with the frequency.

  24. Multithreading benchmark (two cores)
      [Figure: Loops/sec vs. emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]
      CPU-Lim and Fracas run twice as slow as CPU-Freq.

  25. Multiprocessing benchmark (two cores)
      [Figure: Loops/sec vs. emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]
      CPU-Lim and Fracas fail in this benchmark too.
