Availability Knob
Flexible User-Defined Availability in the Cloud
Mohammad Shahrad and David Wentzlaff
October 5, 2016
IaaS Providers and Availability Guarantees
One thing in common: fixed 99.95% availability!
Cloud customers:
Cloud infrastructures:
SW reliability
* WTP= Willingness to Pay
Let’s have clients ask for their desired availability and be charged correspondingly.
Cloud Scheduler
Cloud management
Cloud Scheduler
Service Level Agreements (SLAs)
(e.g. 99.8% / 7 days)
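The SLA example above can be turned into a downtime budget with simple arithmetic. The following is a minimal sketch, assuming the SLA is read as an availability target over a fixed window; the function name `allowed_downtime_minutes` is hypothetical and not from the talk.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: float) -> float:
    """Downtime budget (in minutes) implied by an availability target over a window."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    window_minutes = window_days * 24 * 60
    # The unavailable fraction of the window is the downtime the SLA tolerates.
    return (1.0 - availability_pct / 100.0) * window_minutes


if __name__ == "__main__":
    # The slide's example: 99.8% over 7 days.
    print(round(allowed_downtime_minutes(99.8, 7), 2))
    # The fixed guarantee mentioned earlier, over an assumed 30-day window.
    print(round(allowed_downtime_minutes(99.95, 30), 2))
```

Under this reading, 99.8% over 7 days leaves roughly 20 minutes of tolerated downtime per window.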
e.g. (99.95%,1.00), (99.9%,0.95)
considering possible penalties using:
User’s experienced vs. requested DT
Expected PM time-to-next-failure
VM size and expected DT** length in case of failure
PM* Failure DB
Service Record DB
* PM= Physical Machine ** DT= Downtime
Extra knowledge of user availability demand enables new scheduling features:
Benign VM* Migration (BVM)
Deliberate Downtimes (DDT)
* VM= Virtual Machine
Periodic migration of over-served VMs to cheaper resources
* DTF= Downtime Fulfillment ** SLO= Service Level Objective
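A minimal sketch of how one BVM round might pick its migration candidates. It assumes that over-service is measured by downtime fulfillment (DTF), taken here as experienced downtime divided by the downtime budget of the requested SLO. That definition, the `VM` record, and `select_bvm_candidates` are illustrative assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    experienced_dt_min: float  # downtime experienced so far in the SLO window
    allowed_dt_min: float      # downtime budget implied by the requested SLO (> 0)


def downtime_fulfillment(vm: VM) -> float:
    """Assumed DTF: share of the downtime budget already used (lower = more over-served)."""
    return vm.experienced_dt_min / vm.allowed_dt_min


def select_bvm_candidates(vms: list[VM], top_fraction: float = 0.10) -> list[VM]:
    """Return the most over-served VMs (lowest DTF); at least one if any VMs exist."""
    if not vms:
        return []
    count = max(1, math.floor(len(vms) * top_fraction))
    return sorted(vms, key=downtime_fulfillment)[:count]


if __name__ == "__main__":
    fleet = [VM(f"vm{i}", experienced_dt_min=float(i), allowed_dt_min=20.0)
             for i in range(20)]
    # With 20 VMs and a 10% cut, the two least-interrupted VMs are selected.
    print([vm.name for vm in select_bvm_candidates(fleet)])
```

A scheduler could call such a selection periodically and move the returned VMs to cheaper machines; how the target machine is chosen is outside this sketch.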
Requested Avail. Delivered Avail.
Safety Margin
How to set prices to ensure mutual benefit? How does AK make money?
Clients may:
Providers can:
* SLO= Service Level Objective ** DT = Downtime
Pricing for incentive compatibility
Using game theory to ensure:
Higher market efficiency through supply chain flexibility
Lowering OpEx, Extra Bidding/Sprinting
Compensates risks & supply/demand disparity
~10% Cost Reduction
~20% Profit Increase
Infrequency of Failures
Accelerated testing
Simulations
Data center scale
[1] http://gdkomeg.en.made-in-china.com/productimage [1]
Scalability
Resolution/Accuracy trade-off
Diverse Applications
Multiple VMs
Various Machine Types (cost/resilience trade-off)
1000 machines, 12000 users, Normal demand dist., 6 months
BVM every 1 hr for the top 10% of over-served clients
Cost Reduction
Increased Miss Rate 0.19% 0.34%
1000 machines, 12000 users, Uniform demand dist. [3 nines, 5 nines], 30 days
BVM every 1 hr for the top 10% of over-served clients
Benefits of BVM depend on machine type blend and data-center utilization.
1000 machines, 12000 users, Normal demand dist. [3 nines, 5 nines], 6 months
BVM every 1 hr for the top 10% of over-served clients
Benefits of DDT depend on demand distribution.
DDT
Downtime Price
* WTP= Willingness to Pay
AK Satisfaction Fixed-avail Satisfaction
Mohammad Shahrad
mshahrad@princeton.edu
Client must have the incentive to change his plan.
Price under each option:
No change; fixed $A_1$: $P_{A_1}$
Deliberate failures by user to earn cash back: $P_{A_1} - SC_{A_1}(\alpha A_1 + (1-\alpha) A_2)$
Change to $A_2$: $\alpha P_{A_1} + (1-\alpha) P_{A_2}$
Plan update condition: upper bound of $SC$ given arbitrary $P$
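One way to read the plan update condition is that the client should prefer changing to A2 over staying on A1 and causing deliberate failures to collect service credit (SC). Assuming α is the fraction of the billing period spent at A1, requiring α·P_A1 + (1−α)·P_A2 ≤ P_A1 − SC rearranges to SC ≤ (1−α)·(P_A1 − P_A2). The sketch below encodes that bound. The interpretation of α, the function names, and the numeric prices are assumptions made for illustration, not taken from the talk.

```python
def max_service_credit(price_a1: float, price_a2: float, alpha: float) -> float:
    """Assumed upper bound on service credit that keeps a plan change attractive.

    Derived from alpha*P_A1 + (1 - alpha)*P_A2 <= P_A1 - SC.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * (price_a1 - price_a2)


def prefers_plan_change(price_a1: float, price_a2: float,
                        alpha: float, service_credit: float) -> bool:
    """True if switching to A2 costs no more than staying on A1 and claiming credit."""
    cost_if_changed = alpha * price_a1 + (1.0 - alpha) * price_a2
    cost_if_gamed = price_a1 - service_credit
    return cost_if_changed <= cost_if_gamed


if __name__ == "__main__":
    # Illustrative relative prices only.
    p_a1, p_a2, alpha = 1.00, 0.95, 0.5
    print(max_service_credit(p_a1, p_a2, alpha))
    print(prefers_plan_change(p_a1, p_a2, alpha, service_credit=0.01))
    print(prefers_plan_change(p_a1, p_a2, alpha, service_credit=0.05))
```

With these illustrative numbers, a small credit keeps the honest plan change cheaper, while a credit above the bound makes deliberate failures the cheaper option for the client.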
Nash equilibrium:
[Figure: Missed SLOs (%) versus Catastrophic Event Length (Hour); series: AK (Uniform Dist), Fixed Availability]
can use to gather avail data: