Cake: Enabling High-level SLOs on Shared Storage Systems (PowerPoint PPT Presentation)




SLIDE 1

Cake: Enabling High-level SLOs on Shared Storage Systems

Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, Ion Stoica
University of California, Berkeley
SOCC 2012

SLIDE 2

 Introduction
 Problem And Challenge
 Solutions
 System Design
 Implementation
 Evaluation
 Conclusion
 Future work

Content

SLIDE 3

Introduction

 Rich web applications
 A single slow storage request can dominate the overall response time
 High-percentile latency SLOs
  • Deal with the latency present at the 95th or 99th percentile

SLIDE 4

Introduction

Datacenter applications

 Latency-sensitive
 Throughput-oriented

Accessing distributed storage systems

 Applications don’t share storage systems
 Service-level objectives on throughput or latency

SLIDE 5

Introduction

 SLOs reflect the performance expectations
 Amazon, Google, and Microsoft have identified SLO violations as a major cause of user dissatisfaction
 For example:
  • A web client might require a 99th-percentile latency SLO of 100 ms
  • A batch job might require a throughput SLO of 100 scan requests per second
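For illustration, the two example SLOs could be written as simple declarative specifications. This is a hypothetical sketch; the `SLO` class and its field names are illustrative, not part of Cake's actual interface:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SLO:
    """Hypothetical high-level SLO spec: either a latency bound at a
    percentile, or a throughput floor (all names are illustrative)."""
    client_id: str
    percentile: Optional[float] = None      # e.g. 99.0 for a p99 target
    latency_ms: Optional[float] = None      # latency bound at that percentile
    throughput_rps: Optional[float] = None  # or a requests/second floor

# Web client: 99th-percentile latency SLO of 100 ms
front_end = SLO(client_id="web", percentile=99.0, latency_ms=100.0)

# Batch job: throughput SLO of 100 scan requests per second
batch = SLO(client_id="batch", throughput_rps=100.0)
```

The point of such a spec is that clients state goals in these high-level terms, rather than in low-level parameters like MB/s.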

SLIDE 6

Problem And Challenge

 Physically separating storage systems
 Each system must be provisioned for its individual peak load
 Segregation of data leads to degraded user experience
 Operational complexity
  • Requires additional maintenance staff
  • More software bugs and configuration errors
SLIDE 7

Problem And Challenge

 Focusing solely on controlling disk-level resources
 High-level storage SLOs require consideration of resources beyond the disk
 Disconnect between the high-level SLOs and performance parameters like MB/s
 Requires tedious, manual translation
 Places more burden on the programmer or system operator

SLIDE 8

Solutions

Cake
A coordinated, multi-resource scheduler for shared distributed storage environments, with the goal of achieving both high throughput and bounded latency.

SLIDE 9

Architecture

System Design

SLIDE 10

System Design

 First-level schedulers at each resource
  • Provide mechanisms for differentiated scheduling
  • Split large requests into smaller chunks
  • Limit the number of outstanding device requests

SLIDE 11

System Design

 Cake’s second-level scheduler acts as a feedback loop
  • Continually adjusts resource allocations at each of the first-level schedulers
  • Maximizes SLO compliance of the system while attempting to increase utilization
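One iteration of such a feedback loop might look like the following. This is a minimal sketch under stated assumptions: the function name, the 10% step size, and the 0.8 headroom threshold are illustrative choices, not Cake's actual algorithm:

```python
def feedback_step(measured_p99_ms, slo_ms, share, min_share=0.05, max_share=0.95):
    """One feedback iteration: grow the latency-sensitive client's share
    when its SLO is violated, and shrink it (freeing capacity for batch
    work) when there is comfortable headroom.
    The +/-10% step and 0.8 headroom factor are arbitrary assumptions."""
    if measured_p99_ms > slo_ms:            # SLO violated: allocate more
        share = min(max_share, share * 1.10)
    elif measured_p99_ms < 0.8 * slo_ms:    # headroom: reclaim for batch
        share = max(min_share, share * 0.90)
    return share                            # otherwise hold steady

# A violated 100 ms SLO (measured p99 of 130 ms) raises the share.
share = feedback_step(measured_p99_ms=130.0, slo_ms=100.0, share=0.5)
```

Repeating this each interval moves allocations toward the smallest share that still satisfies the SLO, which is what lets batch throughput fill the remaining capacity.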

SLIDE 12

First-level Resource Scheduling

Differentiated scheduling

[figure panels a and b]
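Differentiated scheduling is commonly built from the two mechanisms the evaluation later exercises: reservations and proportional shares. A sketch of a queue-pick policy combining them (the function, its arguments, and the lottery rule are assumptions, not Cake's exact policy):

```python
import random

def pick_queue(queues, shares, served, reserved=None, reserve_floor=0):
    """queues: name -> pending request count; shares: name -> weight;
    served: name -> requests served so far.
    Reservation: if the reserved client is below its guaranteed floor and
    has pending work, serve it first. Otherwise do a weighted-lottery
    proportional-share pick among the non-empty queues."""
    if reserved and queues.get(reserved) and served.get(reserved, 0) < reserve_floor:
        return reserved
    backlogged = [q for q, n in queues.items() if n > 0]
    if not backlogged:
        return None
    weights = [shares[q] for q in backlogged]
    return random.choices(backlogged, weights=weights)[0]
```

With a high weight, the batch client wins most lotteries; the reservation guarantees the front-end a minimum service rate regardless of weights.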

SLIDE 13

First-level Resource Scheduling

 Split large requests
 Control the number of outstanding requests

[figure panels c and d]
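The two first-level mechanisms on this slide can be sketched as follows. The 64 KB chunk size and the limit of 4 outstanding requests are illustrative assumptions, not Cake's tuned values:

```python
import threading

CHUNK_BYTES = 64 * 1024                       # illustrative chunk size
outstanding = threading.BoundedSemaphore(4)   # cap on in-flight device requests

def split_request(offset, length, chunk=CHUNK_BYTES):
    """Split one large read into (offset, size) chunks, so a big batch
    request cannot monopolize the device between scheduling decisions."""
    chunks = []
    while length > 0:
        size = min(chunk, length)
        chunks.append((offset, size))
        offset += size
        length -= size
    return chunks

def issue(chunk, do_io):
    """Issue one chunk, blocking while too many requests are outstanding.
    Short device queues keep scheduling control at the software level."""
    with outstanding:
        return do_io(*chunk)
```

Both mechanisms trade a little raw throughput for control: smaller units of work mean more frequent opportunities to re-prioritize the latency-sensitive client.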

SLIDE 14

Second-level Scheduling

 Multi-resource Request Lifecycle
 Request processing in a storage system involves far more than just accessing disk
 This necessitates a coordinated, multi-resource approach to scheduling

SLIDE 15

Second-level Scheduling

 Multi-resource Request Lifecycle

SLIDE 16

Second-level Scheduling

 High-level SLO Enforcement
 Cake’s second-level scheduler
  • Satisfies the latency requirements of latency-sensitive front-end clients
  • Maximizes the throughput of throughput-oriented batch clients
 Two phases of second-level scheduling decisions
  • For disk, in the SLO compliance-based phase
  • For non-disk resources, in the queue occupancy-based phase

SLIDE 17

Second-level Scheduling

 The initial SLO compliance-based phase
  • Decides on disk allocations based on client performance
 The queue occupancy-based phase
  • Balances allocations in the rest of the system to keep the disk utilized and improve overall performance
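A sketch of the two phases as functions. The step sizes, thresholds, and the rule for following disk allocation are assumptions for illustration, not Cake's published algorithm:

```python
def disk_phase(p99_ms, slo_ms, disk_share, step=0.05):
    """SLO compliance-based phase: set the disk allocation directly from
    observed client performance against its SLO.
    The 0.05 step and 0.8 headroom factor are illustrative assumptions."""
    if p99_ms > slo_ms:                  # violated: more disk for the client
        return min(0.95, disk_share + step)
    if p99_ms < 0.8 * slo_ms:            # headroom: release disk to batch
        return max(0.05, disk_share - step)
    return disk_share

def non_disk_phase(disk_share, occupancy, target_occupancy=0.8):
    """Queue occupancy-based phase: scale the non-disk allocation (CPU,
    handler threads) with how full the upstream queues are, so the rest
    of the system keeps the disk fed without over-allocating."""
    return min(0.95, disk_share * (occupancy / target_occupancy))
```

The split matters because disk is the bottleneck resource: its allocation is driven by the SLO itself, while the other resources only need enough allocation to avoid starving the disk stage.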

SLIDE 18

Implementation

 Chunking Large Requests

SLIDE 19

Implementation

 Number of Outstanding Requests

SLIDE 20

Implementation

 Cake Second-level Scheduler: SLO Compliance-based Scheduling

SLIDE 21

Implementation

 Cake Second-level Scheduler: Queue Occupancy-based Scheduling

SLIDE 22

Evaluation

 Proportional Shares and Reservations

When the front-end client is sending low throughput, reservations are an effective way of reducing queue time at HDFS

SLIDE 23

Evaluation

 Proportional Shares and Reservations

When the front-end is sending high throughput, proportional share is an effective mechanism for reducing latency

SLIDE 24

Evaluation

 Single vs. Multi-resource Scheduling

Without separate queues and differentiated scheduling, CPU contention occurs within HBase when running many concurrent threads

SLIDE 25

Evaluation

 Single vs. Multi-resource Scheduling

Thread-per-request displays greatly increased latency with chunked request sizes

SLIDE 26

Evaluation

 Convergence Time
 Diurnal Workload
 Spike Workload
 Latency Throughput Trade-off
 Quantifying Benefits of Consolidation

SLIDE 27

Conclusion

 Coordinating resource allocation across multiple software layers
 Allowing application programmers to specify high-level SLOs directly to the storage system
 Allowing consolidation of latency-sensitive and throughput-oriented workloads

SLIDE 28

Conclusion

 Allowing users to flexibly move within the storage latency vs. throughput trade-off by choosing different high-level SLOs
 Using Cake has concrete economic and business advantages

SLIDE 29

Future work

 SLO admission control
 Influence of DRAM and SSDs
 Composable application-level SLOs
 Automatic parameter tuning
 Generalization to multiple SLOs

SLIDE 30

Thank You