The SCADS Director: Scaling a Distributed Storage System Under Stringent Performance Requirements


SLIDE 1

The SCADS Director:

Scaling a Distributed Storage System Under Stringent Performance Requirements

Beth Trushkowsky, Peter Bodík, Armando Fox, Michael J. Franklin, Michael I. Jordan, David A. Patterson FAST 2011

SLIDE 2

elasticity for interactive web apps


Interactivity service-level objective (SLO): over any 1-minute interval, 99% of requests are satisfied in less than 100ms.

Targeted system features:

  • horizontally scalable
  • API for data movement
  • backend for interactive apps

[Diagram: clients, web servers, storage tier]
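To make the SLO concrete, here is a minimal sketch (my illustration, not the authors' code) of checking one interval's worth of latency samples against it:

```python
# Minimal sketch: check the interactivity SLO "over any 1-minute
# interval, 99% of requests are satisfied in less than 100ms".

SLO_PERCENTILE = 99      # percent of requests that must meet the threshold
SLO_THRESHOLD_MS = 100   # latency threshold in milliseconds

def slo_satisfied(latencies_ms):
    """True if at least 99% of the latencies sampled in one
    1-minute interval are under 100ms."""
    if not latencies_ms:
        return True  # no traffic, no violation
    fast = sum(1 for l in latencies_ms if l < SLO_THRESHOLD_MS)
    return 100.0 * fast / len(latencies_ms) >= SLO_PERCENTILE

# Example: one slow request out of 200 samples still satisfies the SLO.
print(slo_satisfied([20] * 199 + [250]))  # True (99.5% under 100ms)
```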

SLIDE 3

Wikipedia workload trace, June 2009


[Figure: Wikipedia request-rate trace with a spike annotated "Michael Jackson dies"]

SLIDE 4

overprovisioning the storage system


(assuming data stored on ten servers)

  • overprovision by 300% to handle the spike
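A back-of-the-envelope reading of that number (the per-server rate and spike factor below are assumptions for illustration, not figures from the slide):

```python
# Illustrative arithmetic (numbers are assumptions): with data spread
# over ten servers, a hotspot that quadruples load on one key range
# forces a static deployment to provision every server for that peak.

normal_rate = 1000   # req/s per server under normal load (assumed)
spike_factor = 4     # peak-to-normal ratio during the spike (assumed)
servers = 10

static_capacity = servers * normal_rate * spike_factor  # provisioned for peak
needed_normally = servers * normal_rate                 # used most of the time

overprovision = 100.0 * (static_capacity - needed_normally) / needed_normally
print(f"overprovisioning: {overprovision:.0f}%")  # 300% with these numbers
```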

SLIDE 5

contributions


  • Cloud computing is a mechanism for storage elasticity
      • Scale up when needed
      • Scale down to save money
  • We address the scaling policy
      • Challenges of latency-based scaling
      • Model-based approach for elasticity to deal with a stringent SLO
      • Fine-grained workload monitoring aids in scaling up and down
      • Show elasticity for both a hotspot and a diurnal workload pattern

SLIDE 6

SCADS key/value store


  • Features
      • Partitioning (until some minimum data size)
      • Replication
      • Add/remove servers
  • Properties
      • Range-based partitioning
      • Data maintained in memory for performance
      • Eventually consistent

(see SCADS: Scale-independent storage for social computing applications, CIDR’09)
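To illustrate the range-based partitioning property, a toy sketch (not SCADS code; the split points and node names are made up):

```python
import bisect

# Toy illustration of range-based partitioning: each server owns a
# contiguous key range; a sorted list of range-start keys routes a
# lookup to the owning server.

range_starts = ["a", "g", "n", "t"]            # assumed split points
nodes = ["node1", "node2", "node3", "node4"]   # hypothetical servers

def server_for(key):
    """Route a key to the server whose range contains it."""
    idx = bisect.bisect_right(range_starts, key) - 1
    return nodes[max(idx, 0)]

print(server_for("mango"))  # node2: owns range ["g", "n")
print(server_for("zebra"))  # node4: owns range ["t", ...)
```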

SLIDE 7

classical closed-loop control for elasticity?


[Diagram: classical closed loop: the Controller observes sampled upper-percentile latency from the SCADS cluster and sends actions to the Action Executor, which applies a new configuration]
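For contrast with what follows, a sketch of what such a classical loop might look like (my illustration; the low-watermark value is an assumption). Acting directly on the noisy percentile is what produces the oscillations shown on the next slide:

```python
# Sketch of a classical closed-loop controller: scale directly on the
# sampled upper-percentile latency. Because that signal is noisy, this
# loop tends to oscillate between adding and removing servers.

def closed_loop_step(sample_p99_latency, add_server, remove_server,
                     slo_ms=100, low_watermark_ms=50):
    p99 = sample_p99_latency()   # noisy measurement
    if p99 > slo_ms:
        add_server()             # scale up on an apparent violation
    elif p99 < low_watermark_ms:
        remove_server()          # scale down on apparent slack
```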

SLIDE 8

oscillations from a noisy signal


[Figure: 99th-percentile latency vs. time]

Noisy signal… Will smoothing help?

SLIDE 9

too much smoothing masks spike


[Figure: smoothed 99th-percentile latency vs. time; the spike is masked]

SLIDE 10

variation for smoothing intervals

[Figure: standard deviation [ms] (log scale) vs. smoothing interval [min] for 99th-percentile latency and mean latency, raw and smoothed, measured on SCADS running on Amazon EC2]
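A toy experiment (synthetic lognormal latencies, not the paper's EC2 measurements) that reproduces the qualitative point: the windowed 99th percentile stays noisy across smoothing window sizes while the windowed mean stabilizes quickly:

```python
import random, statistics

# Compare noise in the windowed 99th percentile vs. the windowed mean.
random.seed(0)
latencies = [random.lognormvariate(3.0, 0.8) for _ in range(60_000)]  # ms

def percentile(xs, p):
    xs = sorted(xs)
    return xs[min(int(p / 100 * len(xs)), len(xs) - 1)]

for window in (1_000, 5_000, 20_000):  # stand-ins for smoothing intervals
    chunks = [latencies[i:i + window] for i in range(0, len(latencies), window)]
    p99s = [percentile(c, 99) for c in chunks]
    means = [statistics.mean(c) for c in chunks]
    print(window,
          round(statistics.stdev(p99s), 1),    # stays large
          round(statistics.stdev(means), 2))   # shrinks fast
```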

SLIDE 11

model-predictive control (MPC)

  • MPC instead of classical closed-loop
      • Upper-percentile latency is a noisy signal
      • Use per-server workload as a predictor of upper-percentile latency
      • Therefore need a model that predicts SLO violations based on observed workload
  • Reacting with MPC
      • Use a model of the system to determine a sequence of actions that changes state to meet the constraint
      • Execute the first steps, then re-evaluate


[Diagram: model maps observed workload to predicted SLO violation]
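In code, one iteration of that loop might look like the following sketch (my paraphrase of the slide; `predicts_violation` and `plan_actions` are stand-ins for the model and the planner, not real APIs):

```python
# One iteration of an MPC-style control loop: predict SLO violations
# from smoothed per-server workload, plan a sequence of actions, execute
# only the first, then re-observe and re-plan.

def control_step(model, workload_per_server, plan_actions, execute):
    # 1. Predict which servers would violate the SLO under current workload.
    at_risk = [s for s, w in workload_per_server.items()
               if model.predicts_violation(w)]
    # 2. Plan a sequence of actions (replicate / move bins / add servers).
    plan = plan_actions(at_risk, workload_per_server)
    # 3. Execute only the first action; the world changes, so re-plan on
    #    the next iteration rather than trusting the whole plan.
    if plan:
        execute(plan[0])
```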

SLIDE 12

model-predictive control loop


[Diagram: model-predictive control loop: sampled workload is smoothed into a Workload Histogram, Performance Models predict SLO violations, and the Controller issues actions through the Action Executor to reconfigure the SCADS cluster; sampled latency is still collected]

SLIDE 13

building a performance model

  • Benchmark SCADS servers on Amazon's EC2
  • Steady-state model
      • Single-server capacity
      • Explore space of possible workloads
  • Binary classifier: SLO violation or not


[Figure: benchmark scatter plots of put workload [req/sec] vs. get workload [req/sec] for get/put mixes of 50/50, 80/20, 90/10, and 95/5; each point is labeled "No violation" or "Violation"]
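A sketch of the kind of binary classifier this produces (the capacity numbers below are hypothetical; the paper fits its model to real EC2 benchmark data):

```python
# Hypothetical linear decision boundary over steady-state get/put
# workload: a single server violates the SLO once its combined load
# exceeds capacity.

GET_CAPACITY = 8000   # assumed max sustainable gets/sec on one server
PUT_CAPACITY = 1800   # assumed max sustainable puts/sec on one server

def predicts_violation(get_rate, put_rate):
    """True if the model expects this per-server workload to violate
    the SLO."""
    return get_rate / GET_CAPACITY + put_rate / PUT_CAPACITY > 1.0

print(predicts_violation(3000, 300))   # False: well under capacity
print(predicts_violation(7000, 1200))  # True: combined load too high
```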

SLIDE 14

how much data to move?


[Figure: workload (requests/sec) vs. time]

SLIDE 15

finer-granularity workload monitoring

  • Need fine-grained workload monitoring
      • Data movement especially impacts the tail of the latency distribution
      • Only move enough data to alleviate performance issues
      • Move data quickly
      • Better for scaling down later
      • Monitor workload on small units of data (bins)
      • Move/copy bins between servers (see the sketch below)
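A sketch of per-bin workload monitoring (an illustration, not the SCADS Director's code; the bin boundaries are assumptions):

```python
import bisect
from collections import Counter

# Keys fall into small contiguous ranges ("bins"); the controller tracks
# each bin's request rate so it can move or copy only the hot bins.

bin_starts = list("abcdefghijklmnopqrstuvwxyz")  # toy bin boundaries
bin_counts = Counter()

def record_request(key):
    """Attribute one request to the bin whose key range contains it."""
    b = max(bisect.bisect_right(bin_starts, key) - 1, 0)
    bin_counts[b] += 1

def hottest_bins(n=3):
    """The bins a controller would consider replicating or moving first."""
    return bin_counts.most_common(n)

for k in ["melon", "mango", "map", "kiwi"]:
    record_request(k)
print(hottest_bins(1))  # [(12, 3)]: the "m" bin is hottest
```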


SLIDE 16

summary of approach


  • Fine-grained monitoring and performance model
      • Determine the amount of data to move from an overloaded server
      • Estimate how much “extra room” an underloaded server has
      • Know when it is safe to coalesce servers
  • Replication for predictability and robustness
      • See the paper and/or tonight’s poster session

SLIDE 17

controller stages


[Diagram: controller stages (Stage 1: Replicate, Stage 2: Partition, Stage 3: Allocate servers) operating on per-bin workload against a workload threshold across storage nodes N1-N7]

SLIDE 18

controller stages


[Diagram: same controller stages, animation step showing a destination node for partitioned bins]

SLIDE 19

controller stages


[Diagram: same controller stages, final animation step]
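A sketch of the three stages as a planning function (my reading of the diagram; the data structures and threshold are assumptions, and Stage 3 is only indicated):

```python
# Plan actions for one control round. `threshold` is the per-server
# workload one node can carry without risking an SLO violation.

def plan(bins, servers, threshold):
    """bins: {bin_id: req_rate}; servers: {server_id: set of bin_ids}."""
    actions = []
    # Stage 1: Replicate: bins too hot for any single server get replicas.
    for b, rate in bins.items():
        if rate > threshold:
            actions.append(("replicate", b, int(rate // threshold) + 1))
    # Stage 2: Partition: move bins off servers whose total load is too high.
    for server, owned in servers.items():
        load = sum(bins[b] for b in owned)
        remaining = set(owned)
        while load > threshold and remaining:
            victim = max(remaining, key=lambda b: bins[b])  # hottest bin
            remaining.discard(victim)
            load -= bins[victim]
            actions.append(("move", victim, server))
    # Stage 3: Allocate servers: add nodes to host moved bins, or coalesce
    # underloaded servers when scaling down (omitted in this sketch).
    return actions
```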

SLIDE 20

experimental results

  • Experiment setup
      • Up to 20 SCADS servers run on m1.small instances on Amazon EC2
      • Server capacity: 800MB, due to the in-memory restriction
      • 5-10 data bins per server
      • 100ms SLO on read latency
  • Workload profiles
      • Hotspot
          • 100% workload increase in five minutes on a single data item
          • Based on the spike experienced by CNN.com on 9/11
      • Diurnal
          • Workload increases during the day, decreases at night
          • Replayed trace at 12x speedup

SLIDE 21

extra workload directed to single data item

[Figure: hotspot experiment: aggregate request rate vs. time, and per-bin request rate vs. time showing the hot bin rising far above the other 199 bins]

SLIDE 22

replicating hot data


[Figure: hotspot experiment timeline: per-bin request rate, 99th-percentile latency [ms], and number of servers, each vs. time]

SLIDE 23

scaling up and down


  • Number of servers in the two experiments stays close to “ideal”
  • Over-provisioning tradeoff
      • Amplify workload by 10%, 30%
  • Savings
      • Known peak: 16%
      • 30% headroom: 41%

[Figure: diurnal experiment: aggregate request rate [req/s] and number of servers vs. simulated time [min], comparing the ideal, elastic 10%, and elastic 30% policies]

SLIDE 24

cost-risk tradeoff


  • Over-provisioning
      • Allows more time before a violation occurs
      • Cost-risk tradeoff
  • Comparing over-provisioning for the diurnal experiment
      • Recall the SLO parameters: threshold, percentile, interval
      • Over-provisioning factor of 30% vs. 10%

    Max percentile achieved:

    Interval   30%    10%
    5 min      99.5   99
    1 min      99     95
    20 sec     95     90

SLIDE 25

conclusion

  • Elasticity for storage servers is possible by leveraging cloud computing
  • Upper-percentile latency is too noisy to drive control directly
  • Model-based approach to build a control framework for elasticity subject to a stringent performance SLO
  • Finer-grained workload monitoring
      • Minimizes the impact of data movement on performance
      • Responds quickly to workload fluctuations
  • Evaluated on EC2 with hotspot and diurnal workloads

SLIDE 26

increasing replication


[Figure: “99th percentile latency with varying replication”: CDF vs. latency [ms] for 5 nodes with 1 replica, 10 nodes with 2 replicas, and 15 nodes with 3 replicas]