SLIDE 1

WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS

Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez* *Cornell University **Cavium Inc.

SLIDE 2

EXECUTIVE SUMMARY

  • How to achieve low tail latency for interactive cloud services?
    • Tail latency is more important and more challenging to meet
    • The entire stack from SW to HW is involved
  • Understand how tail latency reacts to application and system changes
    • See how current designs work
    • Get insights on future designs

Introduction • Characterization• Implications Page 1 of 20

SLIDE 3

MOTIVATION

SLIDE 4

LOW LATENCY

  • Tail latency
    • e.g., QoS defined as the 99th-percentile latency staying within 500 µs
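To make the QoS definition concrete, here is a minimal sketch that computes the 99th-percentile latency of a batch of request latencies with the nearest-rank method and checks it against a 500 µs target. The function names and the sample data are hypothetical, chosen only for illustration; real load testers may use other percentile estimators.

```python
import math


def percentile(samples_us, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples_us:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_us)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_qos(samples_us, target_us=500.0, p=99):
    """True if the p-th percentile latency is within the target."""
    return percentile(samples_us, p) <= target_us


# Hypothetical measurements: 990 fast requests and 10 slow outliers (in µs).
latencies = [120.0] * 990 + [800.0] * 10
print(percentile(latencies, 99))  # 120.0
print(meets_qos(latencies))       # True
```

The 1% of requests above the percentile are allowed to be arbitrarily slow; one more outlier (11 out of 1000) would push the 99th percentile to 800 µs and violate the target.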

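$0.99^5 \approx 0.95$: a request that fans out to five servers, each meeting its latency target with probability 0.99, meets the target end to end only about 95% of the time (assuming independent servers).

[Figure: five servers, each labeled 0.99]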
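The fan-out arithmetic above can be checked with a short sketch. It assumes, as the equation does, that servers miss their targets independently; the function names are hypothetical. The second function inverts the calculation to ask how strict each server must be for the whole request to meet a given percentile.

```python
def prob_all_meet_target(per_server_prob, fanout):
    """Probability that every one of `fanout` sub-requests meets its target (independence assumed)."""
    return per_server_prob ** fanout


def required_per_server_prob(end_to_end_prob, fanout):
    """Per-server probability needed so the whole fan-out meets `end_to_end_prob`."""
    return end_to_end_prob ** (1 / fanout)


for n in (1, 5, 100):
    print(n, round(prob_all_meet_target(0.99, n), 3))

# To keep 99% of 5-way fan-out requests on target, each server needs roughly
# a 99.8th-percentile guarantee rather than a 99th-percentile one.
print(round(required_per_server_prob(0.99, 5), 4))
```

This is why tail latency, rather than average latency, governs services whose requests touch many servers.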

SLIDE 5

LOW TAIL LATENCY REQUIREMENTS

  • The entire stack from SW to HW is involved

[Figure: system stack with Application, OS, Hardware, Resource Manager, and Virtualization layers]

  • Application bottleneck
  • Different use cases
  • Scalability
  • Overhead of virtualization
  • SW isolation mechanisms
  • Overhead of context switching
  • HW isolation mechanisms
  • Hyperthreading

SLIDE 6

CATEGORIZE LC APPLICATIONS

  • By tail-latency requirement
    • µs: Memcached
    • ms: web server, in-memory database
    • s: persistent database
  • By statefulness
    • Stateful: Memcached
    • Stateless: web server

SLIDE 7

SELECTED LC WORKLOADS

  • NGINX
    • Web server
    • Stateless
    • 99th-percentile latency in tens of ms
  • Memcached
    • Key-value store
    • Stateful
    • 99th-percentile latency in hundreds of µs

[Figure: NGINX and Memcached positioned by QoS strictness and statefulness]

SLIDE 8

SERVER ARCHITECTURE

Intel Xeon E5-2699 v4 (14nm, $4,115)
  • 22 cores, 2 threads/core
  • L1 I/D: 32/32KB per core; L2: 256KB per core
  • LLC: 55MB, 20 ways
  • Memory: 128GB DDR4
  • NIC: 10Gbps

Cavium ThunderX (28nm, $785)
  • 48 cores, 1 thread/core
  • L1 I/D: 78/32KB per core
  • LLC: 16MB, 16 ways
  • Memory: 128GB DDR4
  • NIC: 10Gbps
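A few derived numbers help compare the big and small platforms. The sketch below does plain arithmetic on the figures listed for the two servers (core counts, threads per core, LLC size, and price as given on the slide); the dictionary layout and helper names are illustrative only.

```python
# Figures as listed for the two servers; layout and names are illustrative.
PLATFORMS = {
    "Xeon E5-2699 v4": {"cores": 22, "threads_per_core": 2, "llc_mb": 55, "price_usd": 4115},
    "ThunderX": {"cores": 48, "threads_per_core": 1, "llc_mb": 16, "price_usd": 785},
}


def hardware_threads(p):
    """Logical CPUs visible to the OS."""
    return p["cores"] * p["threads_per_core"]


def llc_mb_per_core(p):
    """Shared last-level cache divided evenly across cores."""
    return p["llc_mb"] / p["cores"]


for name, p in PLATFORMS.items():
    print(f"{name}: {hardware_threads(p)} hardware threads, "
          f"{llc_mb_per_core(p):.2f} MB of LLC per core")

price_ratio = PLATFORMS["Xeon E5-2699 v4"]["price_usd"] / PLATFORMS["ThunderX"]["price_usd"]
print(f"Xeon costs {price_ratio:.1f}x as much as ThunderX")
```

On these numbers Xeon has 7.5x more LLC per core (2.5 MB vs. about 0.33 MB), while ThunderX offers slightly more hardware threads (48 vs. 44) at roughly a fifth of the price.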

SLIDE 9

STUDIED PARAMETERS

[Figure: system stack with Application, OS, Hardware, Resource Manager, and Virtualization layers]

  • Application bottleneck
  • Different use cases
  • Scalability
  • Overhead of virtualization
  • SW isolation mechanisms
  • Overhead of context switching
  • HW isolation mechanisms
  • Hyperthreading

SLIDE 10

INPUT LOAD

[Figure: effect of input load on Memcached and NGINX, on Xeon and ThunderX; annotated 5.2x and 5x]

SLIDE 11

MEMCACHED LATENCY DECOMPOSITION

[Figure: breakdown of Memcached request latency into NIC, IRQ, kernel, syscall, user, receive, and send components, on Xeon and ThunderX, at 10% and at 90% of max throughput]

  • ThunderX is 2x slower than Xeon
  • Network delay
  • Little user-space processing
  • Queuing delay
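Why queuing delay appears at 90% but not at 10% of max throughput can be illustrated with a textbook M/M/1 queue. This is a swapped-in model chosen for illustration, not how the study measured or modeled Memcached: in M/M/1 the response time is exponentially distributed with rate mu - lambda, so its p-th percentile is -ln(1 - p) / (mu - lambda). The service rate below is a hypothetical value.

```python
import math


def mm1_percentile_us(service_rate_per_s, utilization, p=0.99):
    """p-th percentile response time (µs) of an M/M/1 queue.

    Response time is exponential with rate mu - lambda, where lambda = utilization * mu.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    mu = service_rate_per_s
    lam = utilization * mu
    return -math.log(1 - p) / (mu - lam) * 1e6


# Hypothetical server completing 100,000 requests/s (10 µs mean service time).
low = mm1_percentile_us(100_000, 0.10)
high = mm1_percentile_us(100_000, 0.90)
print(f"p99 at 10% load: {low:.1f} us")
print(f"p99 at 90% load: {high:.1f} us")
print(f"ratio: {high / low:.0f}x")
```

In this model the 99th percentile grows 9x between the two load points even though per-request service time is unchanged, because the gap (1 - 0.1) / (1 - 0.9) comes entirely from waiting in the queue.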

SLIDE 12

STUDIED PARAMETERS

[Figure: system stack with Application, OS, Hardware, Resource Manager, and Virtualization layers]

  • Application bottleneck
  • Different use cases
  • Scalability
  • Overhead of virtualization
  • SW isolation mechanisms
  • Overhead of context switching
  • HW isolation mechanisms
  • Hyperthreading

SLIDE 13

MEMCACHED VALUE SIZE

[Figure panels: Xeon; ThunderX]

  • Memory copy
  • Network processing and transmission
  • ThunderX is more sensitive
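One part of the value-size cost can be bounded with simple arithmetic: the time just to serialize the value onto the 10 Gbps NIC both servers use. This sketch ignores packet headers, memory copies, and kernel network processing, so it is a lower bound under stated assumptions rather than a model of the measured latency; the helper name is hypothetical.

```python
NIC_GBPS = 10  # both platforms use a 10 Gbps NIC


def wire_time_us(value_bytes, link_gbps=NIC_GBPS):
    """Serialization time of the value alone, ignoring headers and protocol overhead."""
    bits = value_bytes * 8
    return bits / (link_gbps * 1e9) * 1e6


for size in (100, 1024, 100 * 1024):
    print(f"{size:>7} B -> {wire_time_us(size):8.2f} us")
```

Small values cost well under a microsecond on the wire, while values of around 100 KB cost tens of microseconds, a noticeable share of a sub-millisecond tail-latency budget before any copying or protocol processing is counted.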

SLIDE 14

NUMBER OF MEMCACHED ITEMS

[Figure panels: Xeon; ThunderX]

  • Cache capacity
  • ThunderX is more sensitive
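The cache-capacity effect can be approximated by asking whether the hot item set fits in the last-level cache. This sketch reads the slide's LLC sizes as MiB and assumes a hypothetical 1 KiB footprint per item (key, value, and metadata); in practice the LLC is also shared with code, kernel data, and hash-table structures, so the real capacity for items is smaller.

```python
MIB = 2**20
# LLC sizes from the server-architecture slide, read as MiB.
LLC_BYTES = {"Xeon": 55 * MIB, "ThunderX": 16 * MIB}


def items_fitting_in_llc(llc_bytes, bytes_per_item):
    """Upper bound on how many items' data can be cache-resident at once."""
    return llc_bytes // bytes_per_item


def working_set_fits(llc_bytes, num_items, bytes_per_item):
    """True if num_items items of the given size fit in the LLC."""
    return num_items * bytes_per_item <= llc_bytes


ITEM_BYTES = 1024  # hypothetical per-item footprint
for name, llc in LLC_BYTES.items():
    print(name, items_fitting_in_llc(llc, ITEM_BYTES),
          working_set_fits(llc, 30_000, ITEM_BYTES))
```

Under these assumptions a 30,000-item working set stays cache-resident on Xeon but spills to DRAM on ThunderX, which is consistent with ThunderX being the more sensitive platform.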

SLIDE 15

STUDIED PARAMETERS

[Figure: system stack with Application, OS, Hardware, Resource Manager, and Virtualization layers]

  • Application bottleneck
  • Different use cases
  • Scalability
  • Overhead of virtualization
  • SW isolation mechanisms
  • Overhead of context switching
  • HW isolation mechanisms
  • Hyperthreading

SLIDE 16

SCALABILITY

[Figure panels: Memcached; NGINX]

  • Interrupt handling
  • Load imbalance
  • Lock contention
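A common way to reduce lock contention is lock striping: instead of one global lock around a shared table, keys are partitioned across many independent locks, so threads touching different stripes never wait on each other. The sketch below illustrates that general technique; it is not necessarily what Memcached or the authors used, and `StripedStore` and its methods are hypothetical names. In CPython the GIL hides any speedup, so this shows the structure only.

```python
import threading


class StripedStore:
    """Counter store whose keys are partitioned across independent locks."""

    def __init__(self, num_stripes=16):
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._shards = [{} for _ in range(num_stripes)]

    def _stripe(self, key):
        # Same key always maps to the same stripe within one process.
        return hash(key) % len(self._locks)

    def incr(self, key, delta=1):
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is locked
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + delta
            return shard[key]

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._shards[i].get(key, default)


def worker(store, keys, rounds):
    for _ in range(rounds):
        for k in keys:
            store.incr(k)


store = StripedStore()
keys = [f"key{i}" for i in range(8)]
threads = [threading.Thread(target=worker, args=(store, keys, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("key0"))  # 4000
```

With one global lock every `incr` would serialize behind every other; with striping, contention is limited to threads whose keys hash to the same stripe.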

SLIDE 17

STUDIED PARAMETERS

[Figure: system stack with Application, OS, Hardware, Resource Manager, and Virtualization layers]

  • Application bottleneck
  • Different use cases
  • Scalability
  • Overhead of virtualization
  • SW isolation mechanisms
  • Overhead of context switching
  • HW isolation mechanisms
  • Hyperthreading

SLIDE 18

CONTEXT SWITCHING

[Figure panels: Memcached on Xeon; Memcached on ThunderX]


  • Statically spawned threads vs. dynamically allocated cores
  • ThunderX is more sensitive
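The mismatch between statically spawned threads and dynamically allocated cores can be made concrete. Memcached fixes its worker-thread count at startup (its `-t` option), while a resource manager may grant it more or fewer cores over time. When there are fewer cores than threads, some cores must time-share threads, which forces context switches. The sketch below, with hypothetical helper names, places threads round-robin on the allocated cores and flags oversubscription; it does not pin real threads.

```python
from collections import Counter


def threads_per_core(num_threads, cores):
    """Round-robin a fixed number of threads onto the allocated cores."""
    if not cores:
        raise ValueError("need at least one core")
    placement = Counter(cores[i % len(cores)] for i in range(num_threads))
    return {c: placement.get(c, 0) for c in cores}


def oversubscribed(num_threads, cores):
    """True if some core must time-share (context switch) between threads."""
    return max(threads_per_core(num_threads, cores).values()) > 1


# A fixed pool of 8 worker threads under two hypothetical core allocations.
for allocated in ([0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3]):
    print(len(allocated), "cores:", threads_per_core(8, allocated),
          oversubscribed(8, allocated))
```

Matching the thread count to the current core allocation, or resizing the pool when the allocation changes, avoids this time-sharing.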
SLIDE 19

STUDIED PARAMETERS

[Figure: system stack with Application, OS, Hardware, Resource Manager, and Virtualization layers]

  • Application bottleneck
  • Different use cases
  • Scalability
  • Overhead of virtualization
  • SW isolation mechanisms
  • Overhead of context switching
  • HW isolation mechanisms
  • Hyperthreading

SLIDE 20

HYPERTHREADING

  • Reduce the overhead of context switching
  • Allocate two threads on two hyperthreads
  • Make better use of execution units
  • Co-locate different applications

[Figure panels: Memcached & NGINX on the same hyperthreads; Memcached & NGINX on different hyperthreads]
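Reproducing placements like these on Linux requires knowing which logical CPUs are hyperthread siblings of the same physical core. Linux exposes this in sysfs at `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, using the cpulist syntax (for example `0,22` or `0-1`). The sketch below parses that syntax; the helper names are hypothetical, and the resulting CPU ids could then be passed to a pinning tool such as `taskset -c`.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse Linux cpulist syntax such as '0,22' or '0-3,8' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def hyperthread_siblings(cpu):
    """Return the logical CPUs sharing a physical core with `cpu` (Linux only)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return parse_cpu_list(path.read_text())


print(parse_cpu_list("0,22"))   # [0, 22]
print(parse_cpu_list("0-3,8"))  # [0, 1, 2, 3, 8]
```

On a 2-way SMT machine such as the Xeon, each core reports two siblings; on ThunderX (1 thread/core) each CPU lists only itself.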

SLIDE 21

IMPLICATIONS OF THESE STUDIES

  • Reduce queuing delays
  • Improve elasticity
  • Lock alternatives
  • Load balance
  • Reduce the overhead of virtualization
  • Avoid context switching
  • Make best use of SW isolation mechanisms
  • Big vs. small cores
  • Make best use of HW features

[Figure: system stack with Application, OS, Hardware, Resource Manager, and Virtualization layers]


QUESTIONS?