Reconciling High Server Utilization and Sub-millisecond Quality-of-Service

Reconciling High Server Utilization and Sub-millisecond Quality-of-Service
Jacob Leverich and Christos Kozyrakis, Stanford University
EuroSys ’14, April 14th, 2014

Server utilization is low
• Amazon EC2 [Liu, CGC’11]
• Industry average [McKinsey’09]
• Utilization at Google [Barroso and Holzle, 2007]
• Capital-inefficient
• Power-inefficient
[Figure: histogram of CPU utilization (x-axis, 0.0 to 1.0) against fraction of time (y-axis, 0 to 0.03)]

Why so low?
• Diurnal variation
• Capacity for future growth, unexpected spikes
• Server/workload mismatch
Simple solution: Cluster Consolidation

Two consolidation examples
• Analytics cluster with unused memory
– Cores 70%, Memory 20%: Memcached?
• Memcached cluster with unused CPU
– Cores 30%, Memory 85%: Analytics?

Consolidation → Poor Performance & Quality of Service
• Interference on shared resources
– Cores, caches, memory, storage, network
– QoS violations in low-latency applications
• Latency correlated with revenue [Mayer’06]
• Simple solutions lead to low utilization
– Don’t co-locate work with low-latency services
– Inflate reservations to reduce co-located jobs

Can we reconcile high utilization and good quality of service?
Project MUTILATE: More Utilization with Low Latency

Contributions
• Identified key QoS vulnerabilities for sub-millisecond services
– Queuing delay, scheduling delay, thread load imbalance
• Developed best practices to maintain good QoS
– Queuing delay: Interference-aware provisioning
– Scheduling delay: Use alternatives to CFS
– Thread load imbalance: Dynamically share connections/requests [or pin threads]
– Network interference: NIC receive-flow steering
• 17-52% reduction in TCO with good QoS despite interference

Focused on memcached
• Low nominal latency: 100s of usecs
• Sensitive to interference
• Good example of an event-based service
– Arch. shared by REDIS, node.js, lighttpd, nginx, etc.
• Focus on interference due to consolidation
– Ignore misbehaving clients, large requests, etc. [Shue, OSDI’12, “Pisces”]

Life of a memcached request
[Figure: timeline of a request in numbered steps (1 to 15). The client issues “GET foo” (write syscall, TCP/IP TX, NIC, wire, switch); the server handles it across kernel and user space (NIC, IRQ, TCP/IP RX, epoll, libevent, schedule/activate, read, memcached, write, TCP/IP TX, NIC); the reply “VALUE foo / bar / END” returns over the wire and switch to the client NIC.]

QoS vulnerabilities
[Figure: server-side portion of the request timeline (NIC, IRQ, TCP/IP RX, epoll, libevent, schedule/activate, read, memcached, write, TCP/IP TX, NIC), split into kernel and user space and annotated with ~10 usecs and ~19-21 usecs (unloaded)]
• Queuing delay
– Function of load and service time
• Scheduling delay
– Wait time and context switch latency
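
The statement that queuing delay is a function of load and service time can be made concrete with a textbook M/M/1 queue. This is a sketch under an assumption added here for illustration (the slides do not name a queuing model): with Poisson arrivals and exponential service, the mean time in system is S / (1 - rho), where S is the mean service time and rho is the load. The 20 usec service time and the function name are hypothetical.

```python
def mm1_mean_latency_us(service_time_us: float, load: float) -> float:
    """Mean time in system (queuing + service) for an M/M/1 queue.

    Assumption: M/M/1 is used purely for illustration; it is not the
    model from the talk. `load` is the utilization rho, 0 <= rho < 1.
    """
    if service_time_us <= 0:
        raise ValueError("service_time_us must be positive")
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1)")
    return service_time_us / (1.0 - load)


if __name__ == "__main__":
    # Hypothetical 20 usec service time; latency grows sharply near saturation.
    for load in (0.3, 0.5, 0.8, 0.9, 0.95):
        latency = mm1_mean_latency_us(20.0, load)
        print(f"load {load:.0%}: {latency:.0f} usecs mean latency")
```

Under this model, anything that lengthens service time (for example, interference from a co-located job) raises latency at every load level and moves the saturation point to a lower request rate.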

Let’s capacity plan a cluster
• Want to support 1B queries/sec total
– Accounts for diurnal variation, unexpected spikes (worst-case peak)
– Must maintain low latency
• How many servers do we need?

Provisioning for Quality of Service
• ~1M QPS
• For 1B QPS we need 1,000 servers @ 1M QPS
[Figure: average and 95th-percentile latency (usecs, 0 to 1000) vs. memcached QPS (% of peak)]
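
The server count on this slide is a single division, sketched below. The function name is illustrative; the two inputs (1B QPS total, about 1M QPS per server) are the figures from the slides.

```python
def servers_needed(total_qps: int, per_server_qps: int) -> int:
    """Smallest number of servers whose combined capacity covers total_qps."""
    if total_qps < 0 or per_server_qps <= 0:
        raise ValueError("total_qps must be >= 0 and per_server_qps > 0")
    # Integer ceiling division avoids floating-point rounding.
    return (total_qps + per_server_qps - 1) // per_server_qps


if __name__ == "__main__":
    print(servers_needed(1_000_000_000, 1_000_000))  # prints 1000
```

Rounding up matters when the peak is not an exact multiple of per-server capacity: a fractional server still has to be provisioned as a whole one.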

Provisioning for Quality of Service
• Tons of spare capacity! Can I make use of it?
[Figure: average and 95th-percentile latency (usecs, 0 to 1000) vs. memcached QPS (% of peak), shown with the histogram of CPU utilization @ Google [Barroso’07] and markers for nominal QPS and provisioned QPS]

Cluster consolidation
• Memcached cluster
– 1,000 servers, 30% nominal load
• Analytics cluster
– 1,000 servers, 50% load, best-effort batch jobs
• Can we combine this capacity?
– Must ensure we don’t disturb provisioned QoS for the memcached server

Latency @ 80% QPS
Baseline (no interference)
[Figure: average and 95th-percentile latency (usecs, 0 to 700) over time (20 seconds)]

Latency @ 80% QPS with 471.omnetpp
• Can’t maintain QoS we provisioned for.
• This is why workload consolidation is dangerous!
[Figure: 95th-percentile latency (usecs, 0 to 4,000) over time (20 seconds) with 471.omnetpp co-located, compared with the old (baseline) 95th-percentile and average latency]
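
Whether provisioned QoS still holds under interference comes down to comparing a tail percentile of measured latency against a target. The sketch below uses the nearest-rank percentile definition, which is one common choice and not necessarily the method used in the talk. The sample values and the 500 usec target are made up for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_qos(samples_us: Sequence[float], target_us: float, pct: float = 95.0) -> bool:
    """True if the pct-th percentile latency is within the target."""
    return percentile_nearest_rank(samples_us, pct) <= target_us


if __name__ == "__main__":
    # Hypothetical latency samples in usecs, not measurements from the talk.
    baseline = [100.0] * 19 + [450.0]
    interfered = [100.0] * 18 + [3000.0] * 2
    print(meets_qos(baseline, target_us=500.0))    # prints True
    print(meets_qos(interfered, target_us=500.0))  # prints False
```

Note that the mean of the interfered samples moves far less than the 95th percentile does, which is why the slides track both curves.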

Related work
• CPI² [EuroSys’13]
– Punish workload causing interference
• Bubble-Up, Paragon [MICRO’11, ASPLOS’13]
– Identify or predict workloads that interfere, don’t consolidate
• Manage symptoms, don’t address causes

Latency with heavy L3 interference
[Figure: average and 95th-percentile latency (usecs, 0 to 1000) vs. memcached QPS (% of peak), with the provisioned QPS marked]

Latency with heavy L3 interference
[Figure: average and 95th-percentile latency (usecs, 0 to 1000) vs. memcached QPS (% of peak), with and without L3 interference, with the provisioned QPS marked. Inset: memcached steps 11 to 13 (read, write), annotated with ~10 usecs and ~16 usecs]
