SLIDE 1

Datacenter application interference

  • CMPs (popular in datacenters) offer increased

throughput and reduced power consumption

  • They also increase resource sharing between

applications, which can result in negative interference.

SLIDE 2

Resource contention is well studied … at least on single machines. Three main methods:

  • (1) Gladiator-style match-ups
  • (2) Static analysis to predict application resource usage
  • (3) Measure benchmark resource usage; apply to live applications

SLIDE 3

A new methodology for understanding datacenter interference is needed. One that can handle the complexities of a datacenter:

  • (10s of) thousands of applications
  • real user inputs
  • production hardware
  • financially feasible
  • low overhead

Hardware counter measurements of live applications.

SLIDE 4

Our contributions

  • 1. ID complexities in datacenters
  • 2. New measurement methodology
  • 3. First large-scale study of measured interference on live datacenter applications

SLIDE 5

Complexities of understanding application interference in a datacenter

SLIDE 6

Large chips and high core utilizations

Profiling 1000 12-core, 24-hyperthread Google servers running production workloads revealed that the average machine had >14/24 HW threads in use.

SLIDE 7

Heterogeneous application mixes

Often applications have more than one co-runner on a machine. Observed max of 19 unique co-runner threads (out of 24 HW threads).

[Figure: distribution of machines with 0-1, 2-3, and 4+ co-runners]

SLIDE 8

Application complexities

  • Fuzzy definitions
  • Varying and sometimes unpredictable inputs
  • Unknown optimal performance

SLIDE 9

Hardware & Economic Complexities

  • Varying micro-arch platforms
  • Necessity for low overhead = limited measurement capabilities
  • Corporate policies

SLIDE 10

Measurement methodology

SLIDE 11

Measurement Methodology

The goal: A generic methodology to collect application interference data on live production datacenter servers.

SLIDE 12

Measurement Methodology

[Figure: timeline of App. A and App. B executing over time]

SLIDE 13

Measurement Methodology

  • 1. Use sample-based monitoring to collect per-machine, per-core event (HW counter) sample data.
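To make step 1 concrete, here is a minimal sketch of what one hardware-counter sample might carry; the `Sample` class and its field names are illustrative assumptions, not the actual collector's schema:

```python
from dataclasses import dataclass

# Hypothetical record for one hardware-counter sample (illustrative only).
@dataclass(frozen=True)
class Sample:
    machine_id: int    # which server the sample came from
    cpu: int           # hardware thread (0-23 on a 24-thread machine)
    app: str           # application binary running on that CPU
    index: int         # per-app sample counter (A:1, A:2, ...)
    instructions: int  # instructions retired in this sample window
    cycles: int        # unhalted cycles in the same window

    @property
    def ipc(self) -> float:
        # Instructions per cycle for this sample alone.
        return self.instructions / self.cycles

s = Sample(machine_id=7, cpu=3, app="A", index=1,
           instructions=2_000_000, cycles=1_000_000)
print(s.ipc)  # 2.0
```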

SLIDE 14

Measurement Methodology

[Figure: App. A's execution divided into samples A:1-A:6 and App. B's into B:1-B:4, each 2M instructions long]

SLIDE 15

Measurement Methodology

  • 2. Identify sample-sized co-runner relationships…

SLIDE 16

Measurement Methodology

[Figure: sample timelines for App. A and App. B] Samples A:1-A:6 are co-runners with App. B. Samples B:1-B:4 are co-runners with App. A.

SLIDE 17

Measurement Methodology

[Figure: sample timelines for Apps. A, B, and C] Say that a new App. C starts running on CPU 1… then B:4 no longer has App. A as a co-runner.
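Step 2 can be sketched as an overlap test on sample windows, assuming each sample carries a CPU id and a (start, end) measurement window; the record layout and timestamps are an illustrative stand-in for however the real collector delimits samples:

```python
# Two measurement windows overlap if neither ends before the other starts.
def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

# Yield pairs of sample ids whose windows overlap on different CPUs,
# i.e., samples that were co-runners.
def corunner_pairs(samples):
    pairs = []
    for i, si in enumerate(samples):
        for sj in samples[i + 1:]:
            if si["cpu"] != sj["cpu"] and overlaps(si["win"], sj["win"]):
                pairs.append((si["id"], sj["id"]))
    return pairs

samples = [
    {"id": "A:1", "cpu": 0, "win": (0, 10)},
    {"id": "A:2", "cpu": 0, "win": (10, 20)},
    {"id": "B:1", "cpu": 1, "win": (5, 15)},
]
print(corunner_pairs(samples))  # [('A:1', 'B:1'), ('A:2', 'B:1')]
```

If a new sample from App. C displaces one of these windows, the pairing simply disappears from the output, mirroring the B:4 example above.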

SLIDE 18

Measurement Methodology

  • 3. Filter relationships by arch.-independent interference classes…

SLIDE 19

Measurement Methodology

Example interference class: the co-runners are on opposite sockets.

SLIDE 20

Measurement Methodology

Example interference class: the co-runners share only I/O.
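One way to sketch step 3's classification, assuming a hypothetical 2-socket, 12-core, 24-hyperthread layout in which threads t and t+12 share a core and each socket holds six cores (real machines expose the actual topology through the OS, so this mapping is an assumption):

```python
# Classify a co-runner pair by an architecture-independent interference class,
# under an assumed topology: hyperthreads t and t+12 share a core,
# cores 0-5 sit on socket 0, cores 6-11 on socket 1.
def interference_class(cpu_a, cpu_b):
    core = lambda t: t % 12          # hyperthread siblings share a core
    socket = lambda t: core(t) // 6  # six cores per socket
    if core(cpu_a) == core(cpu_b):
        return "shared core"
    if socket(cpu_a) == socket(cpu_b):
        return "shared socket"
    return "opposite sockets"        # share only I/O and main memory

print(interference_class(3, 15))  # shared core
print(interference_class(3, 4))   # shared socket
print(interference_class(3, 8))   # opposite sockets
```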

SLIDE 21

Measurement Methodology

  • 4. Aggregate equivalent co-schedules.

SLIDE 22

Measurement Methodology

For example:

  • Aggregate all the samples of App. A that have App. B as a shared-core co-runner.
  • Aggregate all samples of App. A that have App. B as a shared-core co-runner and App. C as a shared-socket co-runner.
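The aggregation in step 4 can be sketched as grouping sample IPCs under a co-schedule key; the dictionaries below are illustrative, not the real data model:

```python
from collections import defaultdict

# Pool all samples of a base application that saw the same co-schedule,
# keyed by (base app, frozenset of (co-runner, interference class)).
def aggregate(samples):
    groups = defaultdict(list)
    for s in samples:
        key = (s["app"], frozenset(s["corunners"]))
        groups[key].append(s["ipc"])
    return dict(groups)

samples = [
    {"app": "A", "corunners": [("B", "shared core")], "ipc": 1.4},
    {"app": "A", "corunners": [("B", "shared core")], "ipc": 1.6},
    {"app": "A", "corunners": [("B", "shared core"), ("C", "shared socket")], "ipc": 1.1},
]
groups = aggregate(samples)
print(groups[("A", frozenset([("B", "shared core")]))])  # [1.4, 1.6]
```

Using a frozenset as the key means the two example aggregations above land in separate groups, since their co-schedules differ.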

SLIDE 23

Measurement Methodology

  • 5. Finally, calculate statistical indicators (means, medians) to get a midpoint performance for app. interference comparisons.

SLIDE 24

Measurement Methodology

[Figure: aggregated samples give App. A an avg. IPC of 2.0 and App. B an avg. IPC of 1.5]
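Step 5 reduces each aggregated group to midpoint indicators; here is a minimal sketch using Python's statistics module, with made-up per-sample IPC values chosen to echo the 2.0 vs. 1.5 example:

```python
from statistics import mean, median

# Reduce a group of per-sample IPCs to midpoint indicators for comparison.
def indicators(ipcs):
    return {"mean": mean(ipcs), "median": median(ipcs)}

ipc_app_a = [2.5, 1.5, 2.0, 2.0]    # hypothetical App. A samples
ipc_app_b = [1.5, 1.25, 1.75, 1.5]  # hypothetical App. B samples
print(indicators(ipc_app_a))  # {'mean': 2.0, 'median': 2.0}
print(indicators(ipc_app_b))  # {'mean': 1.5, 'median': 1.5}
```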

SLIDE 25

Applying the measurement methodology at Google.

SLIDE 26

Applying the Methodology @ Google

Experiment details:

  • Events: instructions, IPC
  • Sampling period: 2.5 million instructions
  • Number of machines*: 1000

* All had Intel Westmere chips (24 hyperthreads, 12 cores) with matching clock speed, RAM, and O/S.

Method:

  • 1. Collect samples

SLIDE 27

Applying the Methodology @ Google

Experiment details:

  • Events: instructions, IPC
  • Sampling period: 2.5 million instructions
  • Number of machines*: 1000

* All had Intel Westmere chips (24 hyperthreads, 12 cores) with matching clock speed, RAM, and O/S.

Collection results:

  • Unique binary apps: 1102
  • Co-runner relationships (top 8 apps):
      • Avg. shared-core relationships: 1M (min 2K)
      • Avg. shared-socket: 9.5M (min 12K)
      • Avg. opposite-socket: 11M (min 14K)

Method:

  • 1. Collect samples
  • 2. ID sample-sized relationships
  • 3. Filter by interference classes

SLIDE 28

Applying the Methodology @ Google

Method (continued):

  • 4. Aggregate equiv. schedules
  • 5. Calculate statistical indicators

SLIDE 29

Analyze Interference

[Figure: streetview's IPC changes with its top co-runners; overall median IPC across 1102 applications]

SLIDE 30

Beyond noisy interferers (shared core)

[Figure: grid of base applications vs. co-running applications; legend: less or positive interference, negative interference, noisy data]

SLIDE 31

Beyond noisy interferers (shared core)

* Recall the minimum pair has 2K samples; medians across the full grid of 1102 apps.

[Figure: grid of base applications vs. co-running applications; legend: less or positive interference, negative interference, noisy data]

SLIDE 32

Performance Strategies

  • Restrict negative beyond-noisy interferers (or encourage positive interferers as co-runners)
  • Isolate sensitive or antagonistic applications

SLIDE 33

Takeaways

  • 1. New datacenter application interference studies can use our identified complexities as a checklist.
  • 2. Our measurement methodology (verified at Google in the first large-scale measurements of live datacenter interference) is generally applicable and shows promising initial performance opportunities.

SLIDE 34

Questions?

melanie@cs.columbia.edu http://arcade.cs.columbia.edu/
