SLIDE 1

ACM Symposium on Cloud Computing 2019

SLIDE 2

Tenants rent Virtual Machines (VMs); cloud providers operate the cloud infrastructures.

Operating these infrastructures entails great budget expenditure for:

  • Data center equipment
  • Power provisioning

SLIDE 3

Tenants rent Virtual Machines (VMs); cloud providers operate the cloud infrastructures, with great budget expenditure for data center equipment and power provisioning.

  • Virtual resources might be provisioned by tenants for peak load
  • Tenants' VM placement is challenging for providers

SLIDE 4

fg = foreground/online workload

[Figure: CDF of average CPU and memory usage, Alibaba cluster trace (2018); cumulative probability F(x) vs. average usage x]

SLIDE 5

fg = foreground/online workload

[Figure: VM-level CPU usage for the Azure trace (2017); CPU utilization (%) vs. time (days)]

[Figure: CDF of average CPU and memory usage, Alibaba cluster trace (2018); cumulative probability F(x) vs. average usage x]

slide-6
SLIDE 6

(Same figures as Slide 5.)

Great opportunity to use cloud idle resources

SLIDE 7

[Figure: CDF of average CPU and memory usage, Alibaba cluster trace (2018); cumulative probability F(x) vs. average usage x]

SLIDE 8

[Same Alibaba CDF figure as Slide 7]

  • bg = background/batch workload

SLIDE 9

[Same Alibaba CDF figure as Slide 7]

  • bg = background/batch workload

Problem statement: How to schedule background batch jobs to improve utilization without hurting black-box foreground performance?

SLIDE 10

SLIDE 11

  • fg: Facebook
  • bg: FB-Hadoop

SLIDE 12

[Architecture diagram: a physical server hosts the foreground Virtual Machine (VM), the Scavenger Daemon, and background containers (Container [1] … Container [n_socket]), each running a worker process; data sources are reached over the network]
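As a rough sketch of how such a daemon might manage background containers, the snippet below uses Docker's --cpuset-cpus and --memory flags; the image name, core lists, and limits are illustrative assumptions, not values from the talk.

```python
import subprocess

def launch_bg_container(name: str, image: str, cpus: str, mem: str) -> None:
    """Start a background worker container with hard resource caps."""
    subprocess.run(
        ["docker", "run", "-d", "--name", name,
         "--cpuset-cpus", cpus,   # pin to specific cores, e.g. "2,3"
         "--memory", mem,         # cap memory, e.g. "4g"
         image],
        check=True,
    )

def update_bg_cpus(name: str, cpus: str) -> None:
    """Tighten or relax the container's core allocation at runtime."""
    subprocess.run(["docker", "update", "--cpuset-cpus", cpus, name],
                   check=True)

# Illustrative usage (image name is hypothetical):
# launch_bg_container("bg_worker_1", "spark-worker:latest", "2,3", "4g")
```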

SLIDE 13

Motivational experiment (Ubuntu 16.04, KVM, Docker): a VM running Web serving and a container running DCopy are pinned to CPU cores that share the Last Level Cache (LLC), using Linux's cpuset cgroups. A sketch of such pinning follows.
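A minimal sketch of this kind of pinning through the cgroup v1 cpuset interface (the talk does not show code; the cgroup name, core list, and PID below are illustrative, and writing these files requires root):

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup/cpuset"  # cgroup v1 mount point (assumed)

def pin_to_cores(name: str, cores: str, mems: str, pid: int) -> None:
    """Create a cpuset cgroup and move `pid` into it.

    `cores` is a core list like "2-3"; `mems` is a memory node, e.g. "0".
    """
    path = os.path.join(CGROUP_ROOT, name)
    os.makedirs(path, exist_ok=True)
    # Both cpuset.cpus and cpuset.mems must be set before attaching tasks.
    with open(os.path.join(path, "cpuset.cpus"), "w") as f:
        f.write(cores)
    with open(os.path.join(path, "cpuset.mems"), "w") as f:
        f.write(mems)
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(str(pid))

# Example with a hypothetical PID: pin the background worker to cores 2-3,
# leaving cores 0-1 for the foreground VM's vCPU threads.
# pin_to_cores("bg_dcopy", "2-3", "0", 12345)
```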

SLIDE 14

(Same setup as Slide 13.)

[Plot: Web-serving 95%ile response-time (RT) degradation (%) vs. background CPU usage (%)]

SLIDE 15

(Same setup as Slide 13.)

[Plots: Web-serving 95%ile RT degradation (%) and Instructions Per Cycle (IPC) degradation (%) vs. background CPU usage (%)]

SLIDE 16

(Same setup and plots as Slide 15.)

IPC is used as a performance proxy for the black-box foreground workload.
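For illustration, IPC for a black-box process can be sampled from hardware counters; a sketch using the perf CLI (assuming perf is installed and counter access is permitted; the CSV field layout and event-name suffixes can vary slightly across perf versions):

```python
import subprocess

def measure_ipc(pid: int, seconds: float = 1.0) -> float:
    """Sample instructions and cycles for `pid` and return IPC."""
    out = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", "instructions,cycles",
         "-p", str(pid), "--", "sleep", str(seconds)],
        capture_output=True, text=True, check=True,
    ).stderr  # perf stat writes its counts to stderr
    counts = {}
    for line in out.splitlines():
        fields = line.split(",")
        # CSV rows look like: <value>,<unit>,<event>,...; skip "<not counted>"
        if len(fields) >= 3 and fields[0].strip().isdigit():
            counts[fields[2]] = int(fields[0])
    # Assumes both events were counted; a robust tool would handle misses.
    return counts["instructions"] / counts["cycles"]
```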

SLIDE 17

SLIDE 18

Our generic online algorithm:

  • Monitor VMs' perf metric (e.g., memory usage) over a window of size window-size
  • Calculate its mean, ν, and standard deviation, τ
  • React based on the VMs' perf metric relative to ν ± d·τ (see the sketch after this list)

Simplified illustration [normalized metric value (memory usage, network usage) over time]:

  • Metric above ν + d·τ: bg--, down to bg = 0
  • Metric below ν − d·τ: bg++, up to the headroom bg = 1 − (ν + d·τ)
  • In between: do nothing
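A minimal sketch of this control loop under stated assumptions: read_metric() and set_bg_share() are hypothetical hooks (not from the talk), the metric is normalized to [0, 1], and the step size and defaults are illustrative rather than the talk's settings.

```python
import time
from collections import deque
from statistics import mean, stdev

def control_loop(read_metric, set_bg_share,
                 window_size=60, d=2.0, interval_s=1.0):
    """Throttle a background (bg) workload from one VM's metric stream.

    read_metric() -> float       normalized fg usage in [0, 1] (assumed hook)
    set_bg_share(share: float)   resource fraction granted to bg (assumed hook)
    """
    window = deque(maxlen=window_size)
    bg, step = 0.0, 0.05  # current bg share and adjustment granularity
    while True:
        x = read_metric()
        window.append(x)
        if len(window) == window_size:
            nu, tau = mean(window), stdev(window)
            if x > nu + d * tau:        # fg unusually busy: back off (bg--)
                bg = max(0.0, bg - step)
            elif x < nu - d * tau:      # fg unusually idle: grow bg (bg++)
                headroom = max(0.0, 1.0 - (nu + d * tau))
                bg = min(headroom, bg + step)
            # within the band: do nothing
            set_bg_share(bg)
        time.sleep(interval_s)
```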

SLIDE 19

SLIDE 20

Foreground:

  • Training: CloudSuite, a widely used benchmark suite
  • Testing: TailBench, designed for latency-critical applications

Background (SparkBench):

  • KMeans: a popular clustering algorithm
  • SparkPi: computes Pi with very high precision

SLIDE 21

(Same workloads as Slide 20.)

The training set (CloudSuite) is used for sensitivity analysis; the testing set (TailBench) for the experimental evaluation.

SLIDE 22

Workload   Domain                      Tail latency scale
Xapian     Online search               Milliseconds
Moses      Real-time translation       Milliseconds
Silo       In-memory database (OLTP)   Microseconds
Specjbb    Java middleware             Microseconds
Masstree   Key-value store             Microseconds
Shore      On-disk database (OLTP)     Milliseconds
Sphinx     Speech recognition          Seconds
Img-dnn    Image recognition           Milliseconds


The load generators employed in TailBench are open-loop.

http://people.csail.mit.edu/sanchez/papers/2016.tailbench.iiswc.pdf
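For context, an open-loop generator issues requests on a fixed arrival schedule regardless of whether earlier responses have returned, so queueing delay shows up in measured tail latency instead of throttling the offered load. A minimal sketch with a hypothetical send_request():

```python
import random
import threading
import time

def open_loop(send_request, rate_rps: float, duration_s: float) -> None:
    """Issue requests with exponential (Poisson) inter-arrival times.

    Each request runs on its own thread, so a slow response never delays
    subsequent arrivals (the defining open-loop property).
    """
    deadline = time.time() + duration_s
    while time.time() < deadline:
        threading.Thread(target=send_request, daemon=True).start()
        time.sleep(random.expovariate(rate_rps))
```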

SLIDE 23

[Testbed diagram: two physical machines, PM1 and PM2 (Ubuntu 16.04, KVM, Docker), each with 250GB DRAM, a 10 Gb/s network, and two 10-core processor sockets, each socket with a 25MB LLC; the Resource Manager, Name Node, and Data Node run on one machine]

SLIDE 24

(Same testbed as Slide 23, now hosting VM1 alongside a background workload.)

SLIDE 25

(Same testbed as Slide 23, now hosting VM1 and VM2, each alongside a background workload.)

SLIDE 26

SLIDE 27

[Plot: 95%ile latency degradation (%) for each VM1 workload || VM2 workload pair, with bg: SparkPi; lower is better]

SLIDE 28

[Same plot as Slide 27, annotated: CPU 43%↑, Memory 201%↑]

SLIDE 29

[Plots: 95%ile latency degradation (%) per VM1 workload || VM2 workload pair, for bg: SparkPi (CPU 43%↑, Memory 201%↑) and bg: KMeans; lower is better]

SLIDE 30

[Plots: 95%ile latency degradation (%) per VM1 workload || VM2 workload pair, for bg: SparkPi (CPU 43%↑, Memory 201%↑) and bg: KMeans (CPU 34%↑, Memory 321%↑); lower is better]

SLIDE 31

Lab testbed: 2-vCPU foreground VM, 2-core background container.

[Plot: increase in transfer time (%) for Sorting and FFT under Baseline, Heracles, Static 160Mbps, Static 80Mbps, and Scavenger; lower is better]

SLIDE 32

Scavenger outperforms static approaches while affording higher background usage.

[Same plot as Slide 31 (lab testbed), annotated: CPU 37%↑, network 180Mbps↑]

SLIDE 33

Cloud testbed: 4-vCPU foreground VM, 6-core background DCopy container.

[Plot: normalized 95%ile latency for xapian, moses, silo, specjbb, masstree, shore, sphinx, and img-dnn under No background, Baseline, and Scavenger; bars annotated with absolute 95%ile latencies; lower is better]

SLIDE 34

(Same plot and testbed as Slide 33, annotated: 3-5% CPU↑.)

Scavenger can successfully and aggressively regulate the bg workload to mitigate its impact on fg performance.

SLIDE 35

SLIDE 36
