 
ACM Symposium on Cloud Computing 2019
Tenants: rent Virtual Machines (VMs)
Cloud providers: operate cloud infrastructures
Great budget expenditure for:
• Data center equipment
• Power provisioning
Ø Virtual resources might be provisioned (by tenants) for peak load
Ø Tenants’ VM placement (by providers) is challenging
[Figure: CDF of average CPU and memory usage, Alibaba cluster trace (2018). x-axis: X = average usage; y-axis: cumulative probability, F(x). fg = foreground/online workload]
[Figure: VM-level CPU usage for the Azure trace (2017). x-axis: time (days); y-axis: CPU utilization (%)]
Great opportunity to use cloud idle resources
[Figure: CDF of average CPU and memory usage, Alibaba cluster trace (2018). x-axis: X = average usage; y-axis: cumulative probability, F(x). bg = background/batch workload]
Problem statement: How to schedule background batch jobs to improve utilization without hurting black-box foreground performance?
fg: facebook; bg: FB-Hadoop
[Figure: architecture. Labels: physical server; network; Virtual Machine (VM); Container [1] … Container [n_socket]; worker process (one per container shown); Scavenger Daemon; data sources]
[Figure: setup. VM running Web serving and container running DCopy on CPU cores 0 to 3, with the Last Level Cache (LLC). Ubuntu 16.04, KVM, Docker. Using Linux’s cpuset cgroups]
[Figure: 95%ile RT degradation (%) and Instructions Per Cycle (IPC) degradation (%) vs. background CPU usage (%)]
IPC is used as performance proxy
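As a rough illustration of restricting a background container to specific cores through cpuset cgroups, the sketch below only builds a `docker update --cpuset-cpus` command line; it does not run it. The container name `bg-dcopy` and the chosen cores are hypothetical, and the slides do not say that this exact command was used.

```python
def cpuset_spec(cores):
    """Format core ids as a cpuset list string, e.g. [3, 2] -> '2,3'."""
    return ",".join(str(c) for c in sorted(set(cores)))


def docker_cpuset_command(container, cores):
    """Build (but do not execute) a command that pins a container to cores.

    `docker update --cpuset-cpus` changes the cpuset cgroup of a running
    container. The container name is a hypothetical placeholder.
    """
    return ["docker", "update", "--cpuset-cpus", cpuset_spec(cores), container]


if __name__ == "__main__":
    # Hypothetical example: keep the background container on cores 2 and 3.
    print(" ".join(docker_cpuset_command("bg-dcopy", [2, 3])))
```

The returned list could be passed to `subprocess.run` on a host where Docker is installed.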
Ø Our generic online algorithm
• Monitor VMs’ perf metric (e.g., memory usage) for window-size
• Calculate mean, ν, and standard deviation, τ
• React based on the VMs’ perf metric and ν ± d·τ

Simplified illustration (normalized metric value [memory usage, network usage] over time, per window-size):
• Above ν + d·τ: bg--
• Between ν − d·τ and ν + d·τ: do nothing
• Below ν − d·τ: bg++
• Headroom: bg = 0; bg = 1 − (ν + d·τ)
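The reaction rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the population standard deviation is an arbitrary choice, and reading "bg = 1 − (ν + d·τ)" as the headroom available to the background workload is an interpretation of the slide.

```python
from statistics import mean, pstdev


def react(window, current, d=1.0):
    """Decide how to adjust the background (bg) workload.

    window  : recent normalized metric samples (values in [0, 1])
    current : latest normalized metric value of the foreground VM
    d       : width of the band around the mean, in standard deviations
    """
    nu = mean(window)      # mean over the window
    tau = pstdev(window)   # standard deviation (population form: an assumption)
    if current > nu + d * tau:
        return "bg--"      # foreground usage rose: shrink background
    if current < nu - d * tau:
        return "bg++"      # foreground usage fell: grow background
    return "do nothing"


def headroom(window, d=1.0):
    """Headroom read as 1 - (nu + d*tau), clamped at 0 (an interpretation)."""
    return max(0.0, 1.0 - (mean(window) + d * pstdev(window)))


if __name__ == "__main__":
    samples = [0.4, 0.5, 0.6]  # made-up normalized memory usage
    for value in (0.3, 0.5, 0.7):
        print(value, react(samples, value))
```

A larger `d` widens the "do nothing" band, so the controller reacts less often.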
Foreground
• Training: CloudSuite. Widely used benchmark suite
• Testing: TailBench. Designed for latency-critical applications
Background (SparkBench)
• KMeans. A popular clustering algorithm
• SparkPi. Computes Pi with very high precision
Sensitivity analysis; experimental evaluation
The load generators employed in TailBench are open-loop.

Workload   Domain                        Tail latency scale
Xapian     Online search                 Milliseconds
Moses      Real-time translation         Milliseconds
Silo       In-memory database (OLTP)     Microseconds
Specjbb    Java middleware               Microseconds
Masstree   Key-value store               Microseconds
Shore      On-disk database (OLTP)       Milliseconds
Sphinx     Speech recognition            Seconds
Img-dnn    Image recognition             Milliseconds

http://people.csail.mit.edu/sanchez/papers/2016.tailbench.iiswc.pdf
[Figure: testbed. PM 1: processor socket 0 and processor socket 1, each with cores 0 to 9 and an LLC of size 25MB; Background VM 1 and Background VM 2 overlaid on the cores. 250GB DRAM; 10 Gb/s network; KVM, Docker; Ubuntu 16.04. PM 2: Resource Manager, Name Node, Data Node]
[Figure: 95%ile latency degradation (%) for each "VM 1 Workload || VM 2 Workload" pair; lower is better]
• bg: SparkPi. CPU 43%↑, Memory 201%↑
• bg: KMeans. CPU 34%↑, Memory 321%↑
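For readers unfamiliar with the metric, the sketch below shows one common way to compute a 95%ile latency degradation: compare the 95th-percentile latency with and without the background workload. The nearest-rank percentile definition, the function names, and the sample data are assumptions for illustration; the slides do not state how the percentile was computed.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile using the nearest-rank definition."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def degradation_pct(baseline, colocated, p=95):
    """Percent increase of the p-th percentile latency under co-location."""
    base = percentile_nearest_rank(baseline, p)
    colo = percentile_nearest_rank(colocated, p)
    return (colo - base) / base * 100.0


if __name__ == "__main__":
    # Made-up latencies (ms): alone vs. co-located with a background job.
    alone = list(range(1, 101))
    together = [2 * x for x in alone]
    print(degradation_pct(alone, together))
```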
Lab testbed: 2-vCPU foreground VM, 2-core background container.
[Figure: increase in transfer time (%) for Sorting and FFT under Baseline, Heracles, Static 160Mbps, Static 80Mbps, and Scavenger; lower is better]
CPU 37%↑, Network 180Mbps↑
Scavenger outperforms static approaches while affording higher background usage.
Cloud testbed: 4-vCPU foreground VM, 6-core background DCopy container.
[Figure: normalized 95%ile latency for xapian, moses, silo, specjbb, masstree, shore, sphinx, img-dnn. Legend: No background 95%ile latency, Baseline, Scavenger; lower is better. Values printed on the plot: 35346, 369968, 13, 177343, 62, 50612]
CPU 3-5%↑
Scavenger can successfully and aggressively regulate bg workload to mitigate its impact on fg performance.