Methods for Emulation of Multi-Core CPU Performance

SLIDE 1

Methods for Emulation of Multi-Core CPU Performance

Tomasz Buchert (1), Lucas Nussbaum (2), Jens Gustedt (1)

(1) INRIA Nancy – Grand Est
(2) LORIA / Nancy-Université

SLIDE 2

Validation of distributed systems

Approaches:

Theoretical approach (paper and pencil)
  • the most general results and understanding
  • very hard (leads to unsolvability results)

Experimentation (real application on a real environment)
  • realistic context, credibility
  • difficulty of preparation and control, questionable reproducibility

Simulation (modeled application inside modeled environment)
  • very simple and perfectly reproducible
  • experimental bias, possibly unrealistic

Emulation (real application inside a modeled environment)
  • control over the experiment parameters
  • difficult

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 2 / 26

SLIDE 4

Emulation

The perfect emulated environment should emulate (independently):
  • Network bandwidth, latency, topology
  • Memory capabilities
  • Background noise (network, faults)
  • CPU speed and its features

Some parts are implemented in Wrekavoc, a tool to define and control the heterogeneity of a cluster.

In this talk, however, we specifically concentrate on:

Emulation of CPU speed

SLIDE 5

Our goal

[Figure: a row of numbered CPU cores]

(1) control over the speed of each CPU/core independently

SLIDE 6

Our goal

[Figure: numbered CPU cores grouped into virtual nodes VN 1 to VN 4]

(2) ability to create separately scheduled zones of tasks

SLIDE 7

Existing methods (CPU-Freq)

Hardware solution to reduce heat, noise and power usage

For:
  • no overhead of emulation
  • completely unintrusive
  • meaningful CPU time measure

Against:
  • only a finite set of different frequency levels

SLIDE 8

Existing methods (CPU-Lim)

Method available in the Wrekavoc tool

Algorithm:
  • if CPU usage ≥ threshold → send SIGSTOP to the process
  • if CPU usage < threshold → send SIGCONT to the process

CPU usage = (CPU time of the process) / (process lifetime)

For:
  • easy and almost POSIX-compliant

Against:
  • intrusive and unscalable
  • decision based on one process instead of global CPU usage
  • sleeping is indistinguishable from preemption
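
The CPU-Lim decision rule can be sketched in a few lines of Python. This is only an illustration of the rule stated on the slide, not the Wrekavoc implementation; the function names and the sample numbers are hypothetical.

```python
import signal


def cpu_usage(cpu_time: float, lifetime: float) -> float:
    """Fraction of its lifetime that a process has spent on the CPU."""
    if lifetime <= 0:
        return 0.0
    return cpu_time / lifetime


def choose_signal(usage: float, threshold: float) -> signal.Signals:
    """SIGSTOP when usage reaches the threshold, SIGCONT otherwise."""
    return signal.SIGSTOP if usage >= threshold else signal.SIGCONT


# Hypothetical sample: 3 s of CPU time over a 4 s lifetime, threshold 0.5.
usage = cpu_usage(cpu_time=3.0, lifetime=4.0)
decision = choose_signal(usage, threshold=0.5)
print(usage, decision.name)
```

A controller built on this would presumably poll each process and deliver the chosen signal with something like `os.kill(pid, decision)`. Because the ratio uses the whole process lifetime, time spent sleeping lowers the usage just as preemption does, which is the weakness listed above.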

SLIDE 9

Existing methods (Fracas)

  • Based on an idea from KRASH (a load injection tool)
  • Uses Linux Cgroups and the Completely Fair Scheduler
  • A predefined portion of the CPU is given to tasks burning CPU
  • All other processes are given the remaining CPU time

[Figure: Core 1, Core 2 and Core 3, each shared between emulated processes and a CPU burner]

SLIDE 10

Existing methods (Fracas)

For:
  • unintrusive
  • scalable

Against:
  • unportable to other systems
  • sensitive to the configuration of the scheduler

SLIDE 11

New methods (CPU-Gov)

  • Generalization of CPU-Freq
  • Alternates between two neighbouring hardware frequencies

[Figure: over a period τ, the CPU runs at 1.2 GHz for 0.75τ and at 2.4 GHz for 0.25τ, which emulates 1.5 GHz]
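
The numbers in the figure are consistent with a time-weighted average of the two frequencies: 0.75 × 1.2 GHz + 0.25 × 2.4 GHz = 1.5 GHz. A minimal sketch of that arithmetic follows, assuming the emulated frequency is exactly this average; the function name `time_shares` is hypothetical and not taken from the talk.

```python
def time_shares(f_low: float, f_high: float, f_target: float) -> tuple[float, float]:
    """Return the fractions of a period to spend at f_low and at f_high.

    Assumes the emulated frequency is the time-weighted average of the
    two neighbouring hardware frequencies.
    """
    if f_low >= f_high or not f_low <= f_target <= f_high:
        raise ValueError("need f_low <= f_target <= f_high and f_low < f_high")
    share_high = (f_target - f_low) / (f_high - f_low)
    return 1.0 - share_high, share_high


# Values from the slide: emulate 1.5 GHz between 1.2 GHz and 2.4 GHz.
low_share, high_share = time_shares(1.2, 2.4, 1.5)
print(round(low_share, 2), round(high_share, 2))
```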

SLIDE 12

New methods (CPU-Gov)

For:
  • no overhead, unintrusive and meaningful CPU time measure (inherited from CPU-Freq)
  • continuous range of emulated frequency

Against:
  • dependency on the hardware implementation (inherited from CPU-Freq)
  • special algorithm for small values of emulated frequency

SLIDE 13

New methods (CPU-Hogs)

  • Generalization of the CPU-burning technique
  • For each core, a high-priority thread is created
  • These threads "burn" a required number of CPU cycles

For:
  • simple and portable (POSIX)
  • does not rely on the hardware

Against:
  • theoretical problems with scalability (not observed)
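
As a rough illustration of CPU burning, the sketch below busy-waits for part of each period and sleeps for the rest. It rests on assumptions not stated in the talk: that the burned share of each period is 1 − f_emulated / f_max, and that a single unprivileged loop stands in for the per-core high-priority threads. The names `burn_fraction` and `hog` are hypothetical.

```python
import time


def burn_fraction(f_emulated: float, f_max: float) -> float:
    """Share of each period to burn (assumed to be 1 - f_emulated / f_max)."""
    if not 0 < f_emulated <= f_max:
        raise ValueError("need 0 < f_emulated <= f_max")
    return 1.0 - f_emulated / f_max


def hog(period: float, fraction: float, cycles: int) -> float:
    """Busy-wait for `fraction` of each period, then sleep for the rest.

    Returns the total number of seconds spent busy-waiting.
    """
    burned = 0.0
    for _ in range(cycles):
        start = time.perf_counter()
        while time.perf_counter() - start < period * fraction:
            pass  # burn CPU cycles
        burned += time.perf_counter() - start
        time.sleep(period * (1.0 - fraction))
    return burned


# Hypothetical example: emulate 1.5 GHz on a 3.0 GHz core for three 10 ms periods.
fraction = burn_fraction(1.5, 3.0)
busy = hog(period=0.01, fraction=fraction, cycles=3)
print(fraction, round(busy, 3))
```

A faithful version would start one such thread per core, pin it to that core and raise its scheduling priority; those steps are omitted here.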

SLIDE 14

New methods (CPU-Hogs)

[Figure: CPU-burning periods shown for each core over time]

SLIDE 15

Evaluation

Microbenchmarks with different types of work:
  • CPU intensive: running a tight computational loop
  • IO bound: sending UDP packets over a network
  • CPU and IO intensive: sleeping mixed with a computation
  • multiprocessing: running multiple processes with CPU work
  • multithreading: running multiple threads with CPU work
  • memory speed (STREAM benchmark): sustainable memory bandwidth
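
To make the CPU-intensive case concrete, here is a minimal sketch of a tight-loop microbenchmark that reports loops per second, the unit used in the result plots. It is not the authors' benchmark; the function name, the loop body and the default parameters are illustrative assumptions.

```python
import time


def loops_per_second(duration: float = 0.05, work: int = 1000) -> float:
    """Run a fixed-size arithmetic loop repeatedly for about `duration` seconds."""
    loops = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        x = 0
        for i in range(work):  # the "tight computational loop"
            x += i * i
        loops += 1
    elapsed = time.perf_counter() - start
    return loops / elapsed


rate = loops_per_second()
print(f"{rate:.0f} loops/sec")
```

Under a working emulation method, this rate should be roughly proportional to the emulated frequency.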

SLIDE 16

Evaluation (cont.)

  • Tested with 1, 2, 4 and 8 emulated cores
  • X-axis: emulated frequency
  • Y-axis: speed perceived by the benchmark
  • Each test repeated 40 times; results = average with 95% confidence interval
  • Evaluation performed on the Grid’5000 platform
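
The talk does not say how the confidence interval was computed. One common way to obtain an average with a 95% confidence interval is sketched below, using a normal approximation (a Student t quantile would give a slightly wider interval for 40 runs). The sample values are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(samples) / sqrt(n)
    return mean(samples), half_width


# Made-up measurements standing in for repeated benchmark runs.
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
avg, half = mean_with_ci(runs)
print(f"{avg:.2f} ± {half:.2f}")
```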

SLIDE 17

Grid’5000

9 sites, 1528 machines

Lille, Rennes, Orsay, Nancy, Bordeaux, Lyon, Grenoble, Toulouse, Sophia

Dedicated to research on distributed systems and HPC

SLIDE 18

CPU intensive work (one core)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

All methods work as expected.

SLIDE 19

IO-intensive work (one core)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

IO operations should not scale with CPU frequency.

SLIDE 20

Memory speed (one core)

[Plot: MB/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

Ideally, memory speed would not scale with CPU frequency either.

SLIDE 21

Computing and sleeping workload (one core)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

The relation should be proportional, but CPU-Lim’s is not.

SLIDE 22

Multiprocessing benchmark (one core)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

This relation should be proportional again (but CPU-Lim’s is not).

SLIDE 23

Multithreading benchmark (one core)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

The execution speed scales with the frequency.

SLIDE 24

Multithreading benchmark (two cores)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

CPU-Lim and Fracas run twice as slow as CPU-Freq.

SLIDE 25

Multiprocessing benchmark (two cores)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

CPU-Lim and Fracas fail in this benchmark too.

SLIDE 26

Multiprocessing benchmark (four cores)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

What happened here?

SLIDE 27

Multiprocessing benchmark (eight cores)

[Plot: Loops/sec versus emulated CPU frequency (GHz) for CPU-Lim, CPU-Hogs, Fracas, CPU-Gov and CPU-Freq]

Fracas fails once more (but CPU-Lim doesn’t!).

SLIDE 28

Summary of the evaluation

CPU-Freq:
  • very good results
  • coarse granularity

CPU-Lim:
  • not scalable due to implementation, intrusive
  • higher variance
  • controls processes, not threads

Fracas:
  • good behavior for a single-task workload
  • scalable
  • bad behavior for multitask workload
  • behavior differs from one version of Linux to another

SLIDE 29

Summary of the evaluation (cont.)

CPU-Gov and CPU-Hogs:

  • improvement over previous methods
  • good and stable behavior in virtually every benchmark
  • scalability
  • independent from the underlying OS

SLIDE 30

Future work

  • Emulate memory bandwidth
  • Emulate other aspects of CPU
  • Test the methods with real-life applications
  • Integrate the best methods into an open source, user-friendly emulator (Wrekavoc)

SLIDE 31

Conclusions

  • Presented CPU-Hogs, CPU-Gov and previously existing methods
  • Compared them by running a set of microbenchmarks
  • Evaluated experimentally on Grid’5000
  • New methods show a big improvement in the quality of emulation

SLIDE 32

Thanks for listening.

Questions?
