Availability Knob: Flexible User-Defined Availability in the Cloud

Mohammad Shahrad and David Wentzlaff, October 5, 2016


  1. Availability Knob: Flexible User-Defined Availability in the Cloud. Mohammad Shahrad and David Wentzlaff, October 5, 2016.

  2. IaaS Providers and Availability Guarantees. One thing in common: fixed 99.95% availability!
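To put the fixed 99.95% figure in concrete terms, the arithmetic below converts an availability target into the downtime it permits. This is a minimal sketch; the function name and the 30-day period are illustrative assumptions, not something the slides specify.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Downtime (in minutes) that an availability target permits over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


# Assumed 30-day period: 99.95% leaves about 21.6 minutes of downtime.
print(round(allowed_downtime_minutes(99.95, 30), 1))
```

Under the same assumption, a client who can tolerate an hour of downtime per month is paying for more availability than needed.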

  3. What’s wrong with fixed availability?
     Cloud infrastructures: • Heterogeneous HW & SW reliability
     Cloud customers: • Various downtime demands • Different WTP*
     * WTP = Willingness to Pay

  4. The Availability Knob (AK). Let clients ask for their desired availability and be charged accordingly.

  5. What should change in the cloud to support AK? • Service Level Agreements (SLAs) • Cloud scheduler • Cloud management • Gathering failure data and building failure statistics • Availability-aware scheduling

  6. How do SLAs look with AK?
     1. Desired availability / period (e.g. 99.8% / 7 days)
     2. Availability price scale, e.g. (99.95%, 1.00), (99.9%, 0.95)
     3. Variable service credit (penalty)
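The first two SLA ingredients can be sketched as a small data structure plus a price lookup. The tier values are the slide's own examples; the class name `AkSla`, the function `relative_price`, and the rule "pay for the cheapest tier that meets the request" are assumptions made for illustration. The variable service credit is not modeled here because the slide does not give its form.

```python
from dataclasses import dataclass

# Example price scale from the slide: (availability %, relative price).
PRICE_SCALE = [(99.95, 1.00), (99.9, 0.95)]


@dataclass(frozen=True)
class AkSla:
    """Hypothetical container for a client's availability request."""
    desired_availability_pct: float
    period_days: float


def relative_price(sla: AkSla, price_scale=PRICE_SCALE) -> float:
    """Assumed rule: charge the cheapest tier that meets or exceeds the request."""
    eligible = [price for avail, price in price_scale
                if avail >= sla.desired_availability_pct]
    if not eligible:
        raise ValueError("no tier offers the requested availability")
    return min(eligible)


# The slide's example request: 99.8% over 7 days.
print(relative_price(AkSla(99.8, 7)))
```

With only two tiers, a 99.8% request falls back to the 99.9% tier; a finer price scale would let the price track the request more closely.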

  7. The AK Scheduler. Inputs: PM* Failure DB, Service Record DB.
     1. Check for available resources
     2. Find the cheapest resource considering possible penalties, using:
        • Expected PM time-to-next-failure
        • VM size and expected DT** length in case of failure
        • User’s experienced vs. requested DT
     * PM = Physical Machine ** DT = Downtime
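The two scheduler steps can be sketched as a filter followed by a minimum over expected cost. Everything below is a hypothetical model, not the authors' scheduler: `Machine`, its fields, and the penalty rule are assumptions, and `failure_probability` stands in for whatever the PM failure database derives from the expected time-to-next-failure.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cost: float                   # assumed cost of hosting the VM for the rest of the period
    free_capacity: int            # assumed capacity, in VM-size units
    failure_probability: float    # assumed chance of a failure before the period ends
    expected_downtime_min: float  # expected DT length if a failure occurs


def expected_total_cost(m, experienced_dt, allowed_dt, penalty):
    """Hosting cost plus the expected penalty if a failure would exceed the requested DT."""
    would_violate = experienced_dt + m.expected_downtime_min > allowed_dt
    return m.cost + (m.failure_probability * penalty if would_violate else 0.0)


def pick_machine(machines, vm_size, experienced_dt, allowed_dt, penalty):
    # Step 1: keep only machines with enough free resources.
    candidates = [m for m in machines if m.free_capacity >= vm_size]
    if not candidates:
        return None
    # Step 2: choose the cheapest one once possible penalties are included.
    return min(candidates,
               key=lambda m: expected_total_cost(m, experienced_dt, allowed_dt, penalty))


machines = [
    Machine("cheap-flaky", 1.0, 4, 0.5, 30.0),
    Machine("reliable", 1.2, 4, 0.01, 30.0),
]
# Strict client (20 min allowed): the penalty risk outweighs the cheaper machine.
print(pick_machine(machines, 2, 0.0, 20.0, 1.0).name)
# Relaxed client (60 min allowed): one failure fits in the budget.
print(pick_machine(machines, 2, 0.0, 60.0, 1.0).name)
```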

  8. AK-Specific Scheduler Features. Extra knowledge of user availability demand enables new scheduling features: • Benign VM* Migration (BVM) • Deliberate Downtimes (DDT). * VM = Virtual Machine

  9. Benign VM Migration (BVM). VMs can be over-served: • Low failure rate • Assignment to HR resources (resource shortfall). Remedy: periodic migration of over-served VMs to cheaper resources. * DTF = Downtime Fulfillment ** SLO = Service Level Objective
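Selecting which over-served VMs to migrate could look like the sketch below. The slides name a Downtime Fulfillment (DTF) metric but do not define it, so the ratio used here (experienced downtime divided by allowed downtime) is an assumption, as are the function name and the 10% default, which echoes the fraction quoted in the evaluation slides.

```python
def select_for_migration(vms, fraction=0.10):
    """Return the most over-served VMs, i.e. those with the lowest assumed DTF.

    vms maps a VM name to (experienced_downtime, allowed_downtime), allowed > 0.
    """
    def fulfillment(item):
        experienced, allowed = item[1]
        return experienced / allowed  # assumed DTF definition

    # A VM counts as over-served while it has used less downtime than it may.
    over_served = sorted((kv for kv in vms.items() if fulfillment(kv) < 1.0),
                         key=fulfillment)
    if not over_served:
        return []
    # Roughly the top `fraction` of over-served VMs, and at least one.
    count = max(1, int(len(over_served) * fraction))
    return [name for name, _ in over_served[:count]]


vms = {f"vm{i}": (float(i), 10.0) for i in range(12)}  # vm10 and vm11 are not over-served
print(select_for_migration(vms))
```

Running this selection periodically and moving the returned VMs to cheaper machines mirrors the periodic migration the slide describes.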

  10. Deliberate Downtimes (DDT). Providers can deliberately fail VMs near the end of the period. Motivations: • Bidding redeemed resources • Building market incentives • Lowering energy consumption • etc. [Figure labels: delivered availability, safety margin, requested availability]
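One way to read the delivered/requested/safety-margin relationship is as a downtime budget: the provider aims to deliver the requested availability plus a margin, and whatever downtime is still unused near the end of the period could be spent deliberately. The sketch below is an interpretation under that assumption; the function and parameter names are hypothetical.

```python
def deliberate_downtime_budget(requested_pct, safety_margin_pct,
                               period_min, experienced_dt_min):
    """Minutes of downtime the provider could still impose without dropping
    below requested availability plus the safety margin (assumed model)."""
    target_pct = requested_pct + safety_margin_pct  # delivered availability to aim for
    allowed_min = period_min * (1 - target_pct / 100)
    return max(0.0, allowed_min - experienced_dt_min)


# 99.0% requested, 0.5-point margin, 10000-minute period, 20 minutes already lost.
print(round(deliberate_downtime_budget(99.0, 0.5, 10_000, 20.0), 2))
```

A larger safety margin shrinks the budget, trading reclaimed resources for a lower risk of missing the SLO.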

  11. Economics of AK. How to set prices to ensure mutual benefit? How does AK make money?

  12. Incentive Compatibility.
     Providers can: - neglect meeting SLOs*
     Clients may: - run buggy VMs - cause deliberate DTs**
     Pricing for incentive compatibility, using game theory to ensure:
     - Providers maximize profit margin by not violating SLOs
     - Clients pay less by asking for their true demands
     * SLO = Service Level Objective ** DT = Downtime

  13. How does AK make money?
     1. Adapting service to real demand: higher market efficiency through supply chain flexibility
     2. More efficient resource utilization: lowering OpEx, extra bidding/sprinting
     3. Variable profit margins: compensates risks & supply/demand disparity
     Result: ~10% cost reduction, ~20% profit increase

  14. AK Deployment • No hardware change required • Low technology adoption cost • Existing fixed availability is a subset of AK • Can be offered as an optional feature • Easy shift to the new model

  15. How to evaluate AK? • Infrequency of failures • Accelerated testing • Simulations • Data center scale
     1. Stochastic simulations in MATLAB [1]
     2. Prototype implementation with OpenStack
     [1] http://gdkomeg.en.made-in-china.com/productimage

  16. AKSim: Stochastic Cloud Simulator • Various machine types • Scalability (cost/resilience trade-off) • Resolution/accuracy trade-off • Diverse applications • Multiple VMs
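To give a feel for what a stochastic availability simulation involves, here is a minimal Monte Carlo sketch. It is not AKSim, which the slides say was written in MATLAB; this Python stand-in assumes exponentially distributed time between failures and a fixed repair time, and every name and parameter in it is hypothetical.

```python
import random


def simulate_downtime(period_h, mttf_h, repair_h, rng):
    """Total downtime (hours) of one machine over one period."""
    t, down = 0.0, 0.0
    while True:
        t += rng.expovariate(1.0 / mttf_h)  # assumed exponential time to next failure
        if t >= period_h:
            return down
        outage = min(repair_h, period_h - t)  # an outage cannot outlast the period
        down += outage
        t += outage


def miss_rate(requested_pct, period_h, mttf_h, repair_h, trials, seed=0):
    """Fraction of simulated periods whose downtime exceeds the requested allowance."""
    rng = random.Random(seed)
    allowed_h = period_h * (1 - requested_pct / 100)
    misses = sum(simulate_downtime(period_h, mttf_h, repair_h, rng) > allowed_h
                 for _ in range(trials))
    return misses / trials


# Same machine, two different availability requests over a 30-day (720 h) period.
print(miss_rate(99.0, 720, 500, 4, 2000, seed=1))
print(miss_rate(99.9, 720, 500, 4, 2000, seed=1))
```

Because failures are rare, many simulated periods are needed before the miss rate stabilizes, which is one reason simulation is attractive compared with waiting for real failures.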

  17. OpenStack AK Prototype

  18. Availability-aware Scheduler. Setup: 1000 machines, 12000 users, normal demand dist., 6 months; BVM every 1 hr for top 10% of over-served clients.

  19. Benign VM Migration (BVM). ~7% cost reduction; increased miss rate (0.19% to 0.34%). Benefits of BVM depend on machine type blend and data-center utilization. Setup: 1000 machines, 12000 users, uniform demand dist. [3 nines, 5 nines], 30 days; BVM every 1 hr for top 10% of over-served clients.

  20. Deliberate Downtimes (DDT). Benefits of DDT depend on demand distribution. Setup: 1000 machines, 12000 users, normal demand dist. [3 nines, 5 nines], 6 months; BVM every 1 hr for top 10% of over-served clients.

  21. Improved Service Satisfaction. [Figure: AK satisfaction vs. fixed-availability satisfaction; axes: downtime, price.] * WTP = Willingness to Pay

  22. Things to Remember • Supply chain flexibility -> market efficiency • Knowing user demand can enable new techniques • Game theory to ensure mutual economic incentive • Leveraging reliability/cost trade-offs

  23. The Availability Knob. Mohammad Shahrad, mshahrad@princeton.edu

  24. Back-up Slides

  25. What if the client’s demand changes? The client must have an incentive to change plans. Upper bound of SC given arbitrary P. Plan update condition, comparing the price under each option:
     • No change (fixed $A_1$): $P_{A_1}$
     • Deliberate failures by the user to earn cash back: $P_{A_1} - SC_{A_1}(\alpha A_1 + (1-\alpha) A_2)$
     • Change to $A_2$: $\alpha P_{A_1} + (1-\alpha) P_{A_2}$

  26. Nash Equilibrium

  27. Catastrophic Failure & AK • When the whole cloud service is down. [Figure: missed SLOs (%) vs. catastrophic event length (hours, 0 to 8), comparing AK (uniform dist.) with fixed availability.]

  28. Why OpenStack • VM migration (unlike Eucalyptus) • Diverse hypervisor support (KVM) • AWS compatibility • Big community (good support) • Real-world adoption in public/private/hybrid clouds

  29. Some More Results

  30. Service Credit Reshaping

  31. Availability Monitoring Tools • Some existing performance monitoring tools that AK can use to gather availability data: • Nagios (used in AWS) • Zabbix • Ganglia
