SLIDE 1

PARTIES: QOS-AWARE RESOURCE PARTITIONING

FOR MULTIPLE INTERACTIVE SERVICES

Shuang Chen, Christina Delimitrou, José F. Martínez
Cornell University

SLIDE 2

COLOCATION OF APPLICATIONS

[Figure: latency-critical and best-effort applications colocated on a multicore chip; each core (P) has private caches, and all cores share the last-level cache]

Page 1 of 15: Motivation • Characterization • PARTIES • Evaluation • Conclusions

SLIDE 3

PRIOR WORK

§ Interference during colocation

§ Scheduling [Nathuji'10, Mars'13, Delimitrou'14]

  • Avoid co-scheduling of apps that may interfere
  • May require offline knowledge
  • Limit colocation options

§ Resource partitioning [Sanchez’11, Lo’15]

  • Partition shared resources
  • At most 1 LC app + multiple best-effort jobs


SLIDE 4

TRENDS IN DATACENTERS

Monolith (1 LC + many BE) → Microservices (many LC + many BE)

  • More LC jobs
  • All have QoS targets

SLIDE 5

MAIN CONTRIBUTIONS

§ Workload characterization

  • The impact of resource sharing
  • The effectiveness of resource isolation
  • Relationship between different resources

§ PARTIES: First QoS-aware resource manager for colocation of many LC services

  • Dynamic partitioning of 9 shared resources
  • No a priori application knowledge
  • 61% higher throughput under QoS constraints
  • Adapts to varying load patterns


SLIDE 6

INTERACTIVE LC APPLICATIONS

Application        Memcached        Xapian       NGINX        Moses                  MongoDB              Sphinx
Domain             Key-value store  Web search   Web server   Real-time translation  Persistent database  Speech recognition
Target QoS         600us            5ms          10ms         15ms                   300ms                2.5s
Max Load           1,280,000        8,000        560,000      2,800                  240                  14
User/Sys/IO CPU%   13 / 78 / 0      42 / 23 / 0  20 / 50 / 0  50 / 14 / 0            0.3 / 0.2 / 57       85 / 0.6 / 0
LLC MPKI           0.55             0.03         0.06         10.48                  0.01                 6.28
Memory Capacity    9.3 GB           0.02 GB      1.9 GB       2.5 GB                 18 GB                1.4 GB
Memory Bandwidth   0.6 GB/s         0.01 GB/s    0.6 GB/s     26 GB/s                0.03 GB/s            3.1 GB/s
Disk Bandwidth     0 MB/s           0 MB/s       0 MB/s       0 MB/s                 5 MB/s               0 MB/s
Network Bandwidth  3.0 Gbps         0.07 Gbps    6.2 Gbps     0.001 Gbps             0.01 Gbps            0.001 Gbps

SLIDE 7

INTERACTIVE LC APPLICATIONS (same table as Slide 6)

Max load: max RPS under QoS target when running alone
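The "max load" definition above can be made concrete with a small sketch: collect per-request latencies at a given load and check the tail against the QoS target. This is an illustrative helper, not code from PARTIES; the nearest-rank method and the choice of the 99th percentile are assumptions.

```python
# Illustrative only: check whether a run at some load level meets its QoS
# target, given the per-request latencies (in seconds) collected at that load.

def tail_latency(latencies, percentile=99.0):
    """Nearest-rank percentile of a list of request latencies."""
    ordered = sorted(latencies)
    rank = max(0, int(round(percentile / 100.0 * len(ordered))) - 1)
    return ordered[rank]

def meets_qos(latencies, qos_target):
    """True if the tail (p99) latency is within the QoS target."""
    return tail_latency(latencies, 99.0) <= qos_target

# Memcached's target is 600us: a run whose p99 stays below that passes.
samples = [0.0003] * 99 + [0.0005]
print(meets_qos(samples, qos_target=0.0006))  # → True
```

Max load is then the highest request rate at which this check still passes when the service runs alone.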

SLIDE 8

INTERFERENCE STUDY

[Heat map: % of max load each application (Memcached, Xapian, NGINX, Moses, MongoDB, Sphinx) sustains under its QoS target when a given resource is contended (hyperthread, CPU, power, LLC capacity, LLC bandwidth, memory bandwidth, memory capacity, disk bandwidth, network bandwidth); 0% = extremely sensitive, 100% = not sensitive at all]

  • Applications are sensitive to the resources they use heavily
  • Applications with strict QoS targets are more sensitive

SLIDE 9

ISOLATION MECHANISMS

[Figure: multicore chip; each core (P) has private caches, and the last-level cache is shared]

  • Core mapping (hyperthreads, core counts): cgroup
  • Core frequency (power): ACPI frequency driver
  • LLC capacity (cache capacity, cache bandwidth, memory bandwidth): Intel CAT
  • Memory capacity: cgroup
  • Disk bandwidth: cgroup
  • Network bandwidth: qdisc

SLIDE 10

RESOURCE FUNGIBILITY

§ Resources are fungible

  • More flexibility in resource allocation
  • Simplifies the resource manager

[Figure: feasible (cores × cache ways) allocations for Xapian, stand-alone and with memory interference; many distinct core/cache-way combinations satisfy the QoS target, and interference shifts which combinations are feasible]
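The fungibility claim can be illustrated with a toy search over allocations. The latency model below is entirely synthetic (not from the paper's measurements); it exists only to show that several distinct core/cache-way combinations can satisfy one QoS target.

```python
# Synthetic illustration of resource fungibility: cores and LLC ways
# partially substitute for each other, so many (cores, ways) pairs meet QoS.

def synthetic_latency(cores, ways):
    # Made-up model: more of either resource lowers latency.
    return 100.0 / (1.5 * cores + ways)

def feasible_allocations(qos_target, max_cores=13, max_ways=19):
    return [(c, w)
            for c in range(1, max_cores + 1)
            for w in range(1, max_ways + 1)
            if synthetic_latency(c, w) <= qos_target]

allocs = feasible_allocations(qos_target=5.0)
# Several distinct core counts appear among the feasible points, i.e. a
# manager can trade cache ways for cores (and vice versa).
print(len({c for c, _ in allocs}) > 1)  # → True
```

This flexibility is what lets a resource manager pick whichever feasible allocation frees up the resource another colocated service needs.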

SLIDE 11

PARTIES: DESIGN PRINCIPLES

§ PARTIES

  • PARTitioning for multiple InteractivE Services

§ Design principles

  • LC apps are equally important
  • Allocation should be dynamic and fine-grained
  • No a priori application knowledge or offline profiling
  • Recover quickly from incorrect decisions
  • Migration is used as a last resort


SLIDE 12

PARTIES

Latency monitor: the client side measures request latency; the server-side main function polls tail latency every 100 ms

Controller:
  • 5 knobs organized into 2 wheels
  • Start from a random resource
  • Follow the wheels to visit all resources
  • QoS violations? Upsize! Excess resources (latency slack above 20%)? Downsize!

[Animation: core (C), cache ($), and frequency (F) allocations move between App 1, App 2, and an unallocated pool as the controller downsizes App 2 and upsizes App 1]
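The upsize/downsize decisions above can be sketched as a tiny control loop. This is a simplification, not the PARTIES implementation: it collapses the two wheels into one list, and the 5% upsize threshold is an assumption (only the 100 ms interval and the 20% downsize slack appear on the slide).

```python
import random

# Simplified single wheel; PARTIES organizes its 5 knobs into 2 wheels.
RESOURCE_WHEEL = ["cores", "frequency", "llc_ways", "memory", "disk_bw"]

def slack(latency, qos_target):
    """Fraction of the QoS target still unused."""
    return (qos_target - latency) / qos_target

def next_resource(current):
    """Follow the wheel: visit resources in a fixed circular order."""
    i = RESOURCE_WHEEL.index(current)
    return RESOURCE_WHEEL[(i + 1) % len(RESOURCE_WHEEL)]

def adjust(latency, qos_target, resource):
    """One decision per 100 ms monitoring interval."""
    s = slack(latency, qos_target)
    if s < 0.05:                # (assumed threshold) QoS at risk: upsize
        return ("upsize", resource)
    if s > 0.20:                # ample slack: give resources back
        return ("downsize", resource)
    return ("keep", resource)

start = random.choice(RESOURCE_WHEEL)   # start from a random resource
action, knob = adjust(latency=9.9, qos_target=10.0, resource=start)
print(action)  # → upsize
```

Starting from a random resource and walking the wheel is what lets the controller recover quickly from an adjustment that did not help: it simply moves on to the next resource.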

SLIDE 13

METHODOLOGY

§ Platform: Intel E5-2699 v4

  • Single socket with 22 cores (8 IRQ cores)

§ Virtualization

  • LXC 2.0.7

§ Load generators

  • Open loop
  • Request inter-arrival distribution: exponential
  • Request popularity: Zipfian

§ Testing strategy

  • Constant load: 30 s warmup, 1-minute measurement, repeated 5 times
  • Varying load simulates diurnal load patterns

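The load-generator settings above (open loop, exponential inter-arrivals, Zipfian request popularity) can be sketched as follows. Function names and the Zipf skew value are illustrative, not taken from the actual generator.

```python
import random

def inter_arrival_times(target_rps, n, seed=0):
    """Open loop: send times are drawn in advance (a Poisson arrival
    process), independent of how quickly the server responds."""
    rng = random.Random(seed)
    return [rng.expovariate(target_rps) for _ in range(n)]

def zipfian_key(rng, n_keys, skew=0.99):
    """Sample a key with Zipfian popularity via inverse-CDF over weights."""
    weights = [1.0 / (k ** skew) for k in range(1, n_keys + 1)]
    r = rng.random() * sum(weights)
    for k, w in enumerate(weights, start=1):
        r -= w
        if r <= 0:
            return k
    return n_keys

gaps = inter_arrival_times(target_rps=1000, n=10000)
rng = random.Random(42)
keys = [zipfian_key(rng, n_keys=100) for _ in range(1000)]
# The mean gap approximates 1/RPS, and low key indices dominate the trace.
print(sum(gaps) / len(gaps), keys.count(1), keys.count(100))
```

The open-loop property matters for tail latency: because arrivals do not slow down when the server falls behind, queueing delays show up honestly in the measured tail.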

SLIDE 14

CONSTANT LOADS: MEMCACHED, XAPIAN & NGINX

[Contour plots: max sustainable load of NGINX (%) as a function of Memcached and Xapian load (% of max), under Unmanaged, Heracles, PARTIES, and Oracle]

Oracle:
  • Offline profiling
  • Always finds the global optimum

Heracles:
  • No partitioning between BE jobs
  • Suspends BE upon QoS violation
  • No interaction between resources

SLIDE 15

MORE EVALUATION

Constant loads
§ All 2- and 3-app mixes under PARTIES
§ Comparison with Heracles for 2- to 6-app mixes

Diurnal load pattern
§ Colocation of Memcached, Xapian and Moses

PARTIES overhead
§ Convergence time for 2- to 6-app mixes

SLIDE 16

CONCLUSIONS

§ Need to manage multiple LC apps

§ Insights

  • Resource partitioning
  • Resource fungibility

§ PARTIES

  • Partition 9 shared resources
  • No offline knowledge required
  • 61% higher throughput under QoS targets
  • Adapts to varying load patterns


SLIDE 17

PARTIES: QOS-AWARE RESOURCE PARTITIONING

FOR MULTIPLE INTERACTIVE SERVICES

Shuang Chen, Christina Delimitrou, José F. Martínez
Cornell University
http://tiny.cc/parties