Princeton University
Performance Isolation and Fairness for Multi-Tenant Cloud Storage
David Shue*, Michael Freedman*, and Anees Shaikh✦
*Princeton ✦IBM Research
Slide 2 — Setting: Shared Storage in the Cloud. Tenants (Z, Y, T, F) run on shared cloud services such as S3, EBS, and SQS, all of which sit atop a shared key-value storage layer.

Slide 3 — That shared key-value store is itself built from many data nodes (DD) serving all tenants.
Slide 4 — Multiple co-located tenants ⇒ resource contention: requests from Z, Y, T, and F land on the same data nodes.

Slide 5 — On a single "big iron" server, fair queuing can arbitrate that contention locally; a distributed store cannot rely on one such choke point.
Slide 6 — Distributed system ⇒ distributed resource allocation: each tenant's keyspace (Z, Y, T, F) is spread across many storage servers (SS), so allocation must be coordinated across nodes.

Slide 7 — Skewed object popularity ⇒ variable per-node demand: popularity differs across data partitions, so a tenant's demand is uneven from node to node.
Slide 8 — Disparate workloads ⇒ different bottleneck resources: 1kB GETs (large reads), 10B GETs (small reads), 1kB SETs (large writes), and 10B SETs (small writes) from tenants like Zynga, Yelp, and Foursquare stress different resources of the shared store.

Slide 9 — The challenges, in summary:
- Multiple co-located tenants ⇒ resource contention
- Distributed system ⇒ distributed resource allocation
- Skewed object popularity ⇒ variable per-node demand
- Disparate workloads ⇒ different bottleneck resources
Slide 10 — Goal: per-tenant minimum guarantees with high utilization (work-conserving), in contrast to per-tenant rate limiting, which is non-work-conserving and leaves capacity idle.
Slide 11 — Pisces architecture: Tenant A and Tenant B VMs issue requests (e.g., GET on a hashed key, 1101100) through request routers (RR) to the storage servers, under a central controller. Three mechanisms sit in this path: Partition Placement (PP) at the controller, Replica Selection (RS) at the routers, and Fair Queuing (FQ) at the servers.

Slide 12 — The fourth mechanism, Weight Allocation (WA), gives each tenant a local share weight on every storage node: WA1 and WB1 on node 1, WA2 and WB2 on node 2.
Slides 13–14 — [Figure: architecture recap; per-node weights WA1/WB1 and WA2/WB2 set the stage for the mechanism walkthrough.]
Slide 15 — Partition Placement (PP): when a storage node becomes overloaded, the controller collects per-partition tenant demand and bin-packs partitions onto nodes.

Slide 16 — The migration prescribed by that bin-packing results in a feasible partition placement: the aggregate demand on every node fits within its capacity. (A bin-packing sketch follows below.)
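The slide does not specify the packing algorithm, so this is a minimal sketch assuming a greedy first-fit-decreasing heuristic; demands and capacity are in arbitrary rate units:

```python
def place_partitions(demands, capacity, nodes):
    """Greedy first-fit-decreasing bin-packing of partitions onto nodes.

    demands:  dict partition_id -> measured tenant demand (e.g., kreq/s)
    capacity: per-node capacity in the same units
    nodes:    list of node ids
    Returns {node_id: [partition_ids]} or None if no placement fits.
    """
    load = {n: 0.0 for n in nodes}
    placement = {n: [] for n in nodes}
    # Place the most demanding partitions first (FFD heuristic).
    for pid, d in sorted(demands.items(), key=lambda kv: -kv[1]):
        target = next((n for n in nodes if load[n] + d <= capacity), None)
        if target is None:
            return None  # infeasible at this capacity; need more nodes
        load[target] += d
        placement[target].append(pid)
    return placement
```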
Slide 17 — Weight Allocation (WA): weights start equal everywhere (WA1 = WB1, WA2 = WB2), regardless of where each tenant's demand actually lands.

Slide 18 — When a node overloads, the controller computes each tenant's per-node +/- mismatch between demand and weight and selects the maximum mismatch. It then performs a reciprocal weight swap: A takes weight from B on node 1 (A←B, so WA1 > WB1) while B takes weight from A on node 2 (A→B, so WA2 < WB2), leaving each tenant's total weight unchanged. (A sketch of one swap follows below.)
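Slide 23 names this mechanism "maximum bottleneck flow weight exchange"; the sketch below simplifies it to a single two-tenant swap with an assumed fixed step size:

```python
def reciprocal_swap(weights, demand, tenants=("A", "B"), step=0.1):
    """One reciprocal weight exchange between two tenants.

    weights[node][tenant] and demand[node][tenant] are fractions of
    each node's capacity. Moves `step` weight from B to A on the node
    where A is most under-weighted, and the same amount from A to B on
    the node where A is most over-weighted, so per-tenant totals are
    preserved.
    """
    a, b = tenants
    mismatch = {n: demand[n][a] - weights[n][a] for n in weights}
    need = max(mismatch, key=mismatch.get)   # A wants more weight here
    give = min(mismatch, key=mismatch.get)   # A can give weight back here
    delta = min(step, weights[need][b], weights[give][a])
    weights[need][a] += delta; weights[need][b] -= delta
    weights[give][a] -= delta; weights[give][b] += delta
    return need, give, delta
```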
Slide 19 — Replica Selection (RS): the request routers initially split each tenant's GETs 50%/50% across replicas, even though the per-node weights are now skewed (WA1 > WB1, WA2 < WB2).

Slide 20 — The routers detect the weight mismatch through request latency: requests sent to a replica where the tenant's weight is small queue longer, so the routers shift traffic toward the better-weighted replica, e.g., to a 60%/40% split. (A latency-driven selection sketch follows below.)
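Slide 23 calls this "FAST replica selection", suggesting a FAST-TCP-style controller that adapts each replica's traffic share from observed latency. A minimal sketch under that assumption, with an illustrative gain `alpha`:

```python
def update_split(split, latency, alpha=0.1):
    """Latency-driven replica selection weights (FAST-TCP-inspired).

    split:   dict replica -> fraction of this tenant's requests
    latency: dict replica -> recent mean request latency (seconds)
    Shifts traffic toward replicas whose latency is below the
    traffic-weighted average, then renormalizes to sum to 1.
    """
    avg = sum(split[r] * latency[r] for r in split)
    for r in split:
        # Below-average latency => grow this replica's share.
        split[r] = max(1e-6, split[r] * (1 + alpha * (avg - latency[r]) / avg))
    total = sum(split.values())
    return {r: s / total for r, s in split.items()}

split = {"node1": 0.5, "node2": 0.5}
split = update_split(split, {"node1": 0.002, "node2": 0.003})
# one step -> ~51%/49%; repeated steps drift toward the 60%/40% split
```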
Slide 21 — Fair Queuing (FQ): at each storage server, tenants' requests interleave (GET 1101100, GET 0100111), and the tenants may bottleneck on different resources: a bandwidth-limited tenant (large GETs) competes with a request-limited one (small GETs), so the fair share must be defined on the bottleneck resource (e.g., outbound bytes).

Slide 22 — The server scheduler therefore tracks a per-tenant resource vector (request rate, bytes in/out) and enforces each tenant's fair share of its dominant resource. (A dominant-share sketch follows below.)
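A small sketch of dominant-share accounting in the spirit of DRFQ; the resource names, capacities, and selection rule here are illustrative, not the talk's exact scheduler:

```python
def dominant_share(usage, capacity, weight):
    """Weighted dominant share of one tenant.

    usage:    dict resource -> consumed amount (e.g., req/s, bytes/s out)
    capacity: dict resource -> node capacity in the same units
    The share is the tenant's largest normalized resource usage,
    scaled down by its weight.
    """
    return max(usage[r] / capacity[r] for r in usage) / weight

def pick_next(tenants):
    """Serve the backlogged tenant with the smallest dominant share."""
    return min(tenants, key=lambda t: dominant_share(t["usage"], t["cap"], t["w"]))

a = {"usage": {"req": 50.0, "bytes_out": 60e6},
     "cap":   {"req": 100.0, "bytes_out": 100e6}, "w": 1.0}
b = {"usage": {"req": 70.0, "bytes_out": 1e6},
     "cap":   {"req": 100.0, "bytes_out": 100e6}, "w": 1.0}
pick_next([a, b])  # -> a: its dominant share (0.60) trails b's (0.70)
```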
Slide 23 — The four mechanisms operate at complementary timescales and levels of system visibility (a scheduler sketch follows below):
- PP, Partition Placement: controller, minutes, global visibility; placement under fairness and capacity constraints.
- WA, Weight Allocation: controller, minutes, global visibility; maximum bottleneck flow weight exchange.
- RS, Replica Selection: request routers (RR), seconds, local visibility; FAST replica selection policies.
- FQ, Fair Queuing: storage servers (SS), microseconds, local visibility; DRR token-based DRFQ scheduler.
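A toy version of the "DRR token-based" idea named above: deficit round robin where a request's cost is its dominant-resource charge rather than a packet length. The cost model is an assumption for illustration:

```python
from collections import deque

def drr_schedule(queues, quantum, cost):
    """Deficit round robin over per-tenant request queues.

    queues:  dict tenant -> deque of pending requests
    quantum: tokens credited to each backlogged tenant per round
    cost(r): tokens a request consumes, e.g., its dominant-resource
             charge under DRFQ rather than a plain byte count.
    Yields requests in a weighted, work-conserving order.
    """
    deficit = {t: 0 for t in queues}
    while any(queues.values()):
        for t, q in queues.items():
            if not q:
                continue
            deficit[t] += quantum  # credit tokens for this round
            while q and cost(q[0]) <= deficit[t]:
                deficit[t] -= cost(q[0])
                yield q.popleft()
        # Idle tenants keep no credit, as in classic DRR.
        for t, q in queues.items():
            if not q:
                deficit[t] = 0
```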
Slide 26 — Evaluation setup: 8 tenants, 8 clients, 8 storage nodes; Zipfian object popularity; 1kB GET requests with an ideal fair share of 110 kreq/s per tenant. Fairness metric: Min-Max Ratio (MMR) = min tenant rate / max tenant rate, in (0,1], where 1.0 means perfectly even shares.
[Figure: per-tenant GET throughput (kreq/s) vs. time (s); unmodified Membase achieves 0.57 MMR, Pisces 0.98 MMR.]
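The metric is simple enough to state directly in code; the rates below are illustrative:

```python
def min_max_ratio(rates):
    """Min-Max Ratio: min tenant rate / max tenant rate, in (0, 1].

    1.0 means all tenants receive identical throughput; values near 0
    mean the best-served tenant dwarfs the worst-served one.
    """
    return min(rates) / max(rates)

min_max_ratio([63, 110, 105, 108])  # -> ~0.57, like unmodified Membase
```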
Slide 27 — Mechanism contribution, with half the tenants at 2x demand and half at 1x. Layering the mechanisms steadily improves fairness (MMR per configuration, even-demand / mixed-demand panels):
- Unmodified Membase: 0.57 / 0.36
- FQ only: 0.59 / 0.58
- PP + FQ: 0.64 / 0.74
- WA + PP + FQ: 0.93 / 0.96
- RS + WA + PP + FQ (full Pisces): 0.98 / 0.97, with 0.89–0.90 across the remaining mixed-demand panels
[Figures: per-tenant GET throughput (kreq/s) vs. time (s) for each configuration.]
Slide 28 — Aggregate system throughput: Pisces stays within 5% of unmodified Membase for 1kB requests; the gap exceeds 19% for 10B requests, where per-request processing costs dominate.
[Figure: aggregate GET throughput (kreq/s), Unmodified Membase vs. Pisces, 1kB and 10B requests.]
Slide 29 — Weighted tenant shares: 4 heavy hitters at 100x weight, 20 moderate-demand tenants at 10x, and 40 low-demand tenants at 1x. Pisces holds each class near its weighted share; measured MMRs across the weight classes and experiment phases: 0.98, 0.91, 0.91, 0.89, and 0.56.
[Figure: per-tenant GET throughput (kreq/s) vs. time (s), grouped by weight class.]
Slide 30 — Dominant resource fairness across disparate workloads: one tenant's 1kB workload is bandwidth limited while another's 10B workload is request limited. Each tenant receives 76% of its own bottleneck resource (outbound bandwidth for 1kB, request rate for 10B) while the 1kB tenant consumes only 24% of the request rate: both exceed a 50% single-resource share because the scheduler is work-conserving over the resource vector.
[Figures: Bandwidth (Mb/s) vs. time (s) and GET requests (kreq/s) vs. time (s).]
Slide 31 — Dynamic demand: tenants with constant, bursty, and diurnal (2x weight) demand patterns. The diurnal tenants receive ~2x the throughput of the others, which split the remainder evenly, and the system keeps resource utilization high throughout.
[Figure: GET throughput (kreq/s) vs. time (s) per demand pattern.]
Slide 32 — Summary: Pisces composes four mechanisms to provide per-tenant performance isolation and fairness in shared key-value storage:
- PP: Partition Placement
- WA: Weight Allocation
- RS: Replica Selection
- FQ: Fair Queuing