Optimizing Client-side Resource Utilization in Public Clouds (PowerPoint presentation)

SLIDE 1

Optimizing Client-side Resource Utilization in Public Clouds

Swapnil Haria, Mihir Patil, Haseeb Tariq, Anup Rathi

SLIDE 2

Outline

  • Motivation
  • Solution
  • Implementation
  • Evaluation
  • Conclusion
SLIDE 6

Cloud Services (Not a distraction anymore[1])

  • 30% of total cloud revenue
  • Annual revenues crossed $5 billion

Other players:

[1] Jeff Bezos' Risky Bet, November 2006, http://www.bloomberg.com/bw/stories/2006-11-12/jeff-bezos-risky-bet

SLIDE 9

Popularity

  • ZERO up-front capital expenses
  • On-demand hardware availability
  • Flexible pricing options

"Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use."

Elastic Compute Cloud (EC2)

SLIDE 11

Limitations

  • Resources are allocated in fixed-size chunks (EC2 instances)
    • From 1 core, 1 GB RAM up to 36 cores, 244 GB RAM
  • Users must accurately predict application requirements
    • Undersized VM: performance degradation
    • Oversized VM: extra costs

Multiple applications, multiple VMs, no peace
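
To make the cost of oversizing concrete: with fixed-size instances, a request is rounded up to the smallest instance type that fits, and the difference is paid for but unused. The sketch below assumes a hypothetical catalogue; only the 1-core/1 GB and 36-core/244 GB endpoints come from the slide, while the intermediate sizes, the names and all prices are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cores: int
    mem_gb: float
    price_per_hour: float  # hypothetical prices, not actual EC2 rates


# Illustrative catalogue spanning the 1-core/1 GB to 36-core/244 GB range;
# intermediate sizes and all prices are assumptions.
CATALOGUE = [
    InstanceType("tiny", 1, 1.0, 0.01),
    InstanceType("medium", 2, 4.0, 0.05),
    InstanceType("large", 4, 16.0, 0.20),
    InstanceType("huge", 36, 244.0, 2.80),
]


def smallest_fit(cores: float, mem_gb: float) -> InstanceType:
    """Cheapest instance type that covers both the core and memory request."""
    fits = [t for t in CATALOGUE if t.cores >= cores and t.mem_gb >= mem_gb]
    if not fits:
        raise ValueError("request exceeds the largest instance type")
    return min(fits, key=lambda t: t.price_per_hour)


def wasted_fraction(cores: float, mem_gb: float) -> tuple[float, float]:
    """Fractions of the chosen instance's cores and memory left unused."""
    t = smallest_fit(cores, mem_gb)
    return 1 - cores / t.cores, 1 - mem_gb / t.mem_gb
```

An application needing 3 cores and 5 GB lands on the 4-core/16 GB type, leaving a quarter of the cores and about 69% of the memory idle; if its demand later doubles, the same instance becomes undersized.
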

SLIDE 12

Challenges

  • Application requirements vary widely
    • Black Friday for e-commerce websites
    • Evenings and late nights for Netflix
    • Slashdot effect!

Sources:
http://www.xad.com/media-mentions/mobile-activity-on-xmas-eve-24-pct-higher-than-black-friday/
http://www.techspot.com/news/46048-netflix-represents-327-of-north-americas-peak-web-traffic.html
CMUSphinx Project

SLIDE 15

Challenges

  • Humans are bad at estimating workload requirements[2]
  • A study of developers at Twitter submitting jobs to a datacenter:
    • 70% overestimated requirements by 10x
    • 20% underestimated requirements by 5x

terrible

[2] Quasar: Resource-Efficient and QoS-Aware Cluster Management. Christina Delimitrou and Christos Kozyrakis. ASPLOS 2014.

SLIDE 16

Outline

  • Motivation
  • Solutions
  • Implementation
  • Evaluation
  • Conclusion
SLIDE 17

Resource as a Service (RaaS)[3]

  • Fine-grained cloud reservations
    • CPU (cycles), memory (pages), I/O (bandwidth), time (seconds)
  • Where does it stop?
  • Reduces wasted costs, but difficult to reason about
  • Hardware feasibility issues for service providers

[3] The rise of RaaS: the resource-as-a-service cloud. Orna Agmon Ben-Yehuda et al. Commun. ACM 2014

SLIDE 18

Proposal

SLIDE 19

Tell me more!

  • Application Mobility
  • Real-time Management

SLIDE 20

Application Mobility

  • On-demand application migration across machines
  • Conventional issues:
    • Application state stored in the kernel (file descriptors, sockets)
    • Residual dependencies left on the source machine
    • Execution continuity

We need:

  • Process isolation (even from the kernel)
  • Minimal state in the kernel
SLIDE 21

Now where did I see that before?

Image source: Wikipedia

SLIDE 22

Where do I find one of these?

An old idea (library operating systems), but making a comeback in cloud OSes:

  • Drawbridge from Microsoft Research
  • MirageOS from the University of Cambridge

Both (claim to) support application migration!

SLIDE 24

Real-time Management

  • Monitor application requirements in real time
    • Relatively easy
    • Working set sizes, idle cycles
  • Use application migration to organize processes on VMs
    • Complex
    • Varying configurations and prices of VMs
    • Identifying processes to migrate
    • Downtime / budgets!
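
Monitoring idle cycles is the "relatively easy" half. As a minimal sketch, assuming Linux guests, a monitor can sample the aggregate `cpu` line of `/proc/stat` twice and compute the idle fraction over the interval; the function names are hypothetical, not the authors' code.

```python
from __future__ import annotations


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # user, nice, system, idle, iowait, irq, softirq, steal; the guest columns
    # that may follow are already counted in user/nice, so they are skipped.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + values[4]  # idle + iowait
    return idle, sum(values)


def idle_fraction(before: str, after: str) -> float:
    """Fraction of CPU time spent idle between two /proc/stat samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return (idle1 - idle0) / elapsed if elapsed > 0 else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first (aggregate) line of /proc/stat on a Linux host."""
    with open(path) as f:
        return f.readline()
```

Calling `read_cpu_line()`, waiting one sampling interval, and calling it again yields the two samples; a VM whose idle fraction stays high is a candidate for consolidation.
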
SLIDE 25

Policies

Steps

  • Determine migration events
  • Identify process(es) for migration
  • Choose a target from existing VMs, if possible
  • Otherwise, figure out instance types for creating new VMs
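
The first two steps can be sketched as follows, assuming a single resource dimension (e.g. cores) and hypothetical `Proc`/`VM` types. The 30% underutilization threshold and the "smallest process that clears the overload" rule are illustrative choices, not the authors' policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    demand: float  # current demand, e.g. cores (hypothetical single resource)


@dataclass
class VM:
    name: str
    capacity: float
    procs: list[Proc] = field(default_factory=list)

    def load(self) -> float:
        return sum(p.demand for p in self.procs)


def migration_event(vm: VM, low: float = 0.3) -> str | None:
    """Step 1: flag a VM that is overcommitted or mostly idle (assumed threshold)."""
    utilization = vm.load() / vm.capacity
    if utilization > 1.0:
        return "overcommitted"
    if utilization < low:
        return "underutilized"
    return None


def pick_process(vm: VM) -> Proc:
    """Step 2 (overcommitted VM): move the smallest process that clears the
    overload; if no single process does, move the largest one."""
    excess = vm.load() - vm.capacity
    sufficient = [p for p in vm.procs if p.demand >= excess]
    if sufficient:
        return min(sufficient, key=lambda p: p.demand)
    return max(vm.procs, key=lambda p: p.demand)
```

Preferring the smallest sufficient process is meant to keep migration downtime low, since downtime depends on process size.
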
SLIDE 26

Policies

Metrics (in order of priority)

  • Maximize VM utilization
  • Satisfy performance guarantees
  • Minimize costs

User-Defined Parameters

  • Upper limit on cost
  • Max downtime per process
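
The user-defined parameters act as hard constraints on every migration. A minimal sketch, assuming the downtime limit is a per-process budget reset daily alongside the spending counter (the slide does not give a period); `Limits`, `Ledger` and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Limits:
    max_cost_per_day: float  # user-defined upper limit on spending (dollars)
    max_downtime_s: float    # user-defined downtime budget per process (seconds)


@dataclass
class Ledger:
    spent_today: float = 0.0
    downtime_s: dict[str, float] = field(default_factory=dict)


def migration_allowed(limits: Limits, ledger: Ledger, proc: str,
                      extra_cost: float, expected_downtime_s: float) -> bool:
    """Admit a migration only if it respects both user-defined parameters."""
    within_budget = ledger.spent_today + extra_cost <= limits.max_cost_per_day
    used = ledger.downtime_s.get(proc, 0.0)
    within_downtime = used + expected_downtime_s <= limits.max_downtime_s
    return within_budget and within_downtime


def record_migration(ledger: Ledger, proc: str, cost: float,
                     downtime_s: float) -> None:
    """Charge a completed migration against the cost and downtime budgets."""
    ledger.spent_today += cost
    ledger.downtime_s[proc] = ledger.downtime_s.get(proc, 0.0) + downtime_s
```

With the roughly 30 seconds of downtime per migration observed in the proof-of-concept model, a 60-second budget admits two migrations per process, which mirrors the "max migrations per process per day" parameter used in the evaluation.
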
SLIDE 27

Policies

  • Single application per VM
    • Easy to reason about
    • Use a naive best-fit model to find target VMs
  • Multiple applications per VM
    • Highly complex optimization problem (NP-hard)
    • Use heuristics!
    • Use best fit and explore nearby options to find target VMs
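
A naive best-fit placement, covering the last two policy steps, might look like the sketch below. It assumes one resource dimension and hypothetical VM and instance-type names; the "explore nearby options" heuristic for multiple applications per VM is not shown, since the slides do not describe it.

```python
from __future__ import annotations


def best_fit(demand: float, free: dict[str, float]) -> str | None:
    """Existing VM whose free capacity fits the demand most tightly, if any."""
    candidates = {vm: room for vm, room in free.items() if room >= demand}
    if not candidates:
        return None
    return min(candidates, key=lambda vm: candidates[vm] - demand)


def new_instance(demand: float, sizes: dict[str, float]) -> str:
    """Smallest instance type whose capacity covers the demand."""
    fitting = [(capacity, name) for name, capacity in sizes.items() if capacity >= demand]
    if not fitting:
        raise ValueError("demand exceeds the largest instance type")
    return min(fitting)[1]


def place(demand: float, free: dict[str, float],
          sizes: dict[str, float]) -> tuple[str, bool]:
    """Return (target, is_new_vm), reusing an existing VM when one fits."""
    target = best_fit(demand, free)
    if target is not None:
        return target, False
    return new_instance(demand, sizes), True
```

Best fit leaves the least stranded capacity on the chosen VM, which serves the top-priority metric of maximizing VM utilization.
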
SLIDE 28

Software Architecture

SLIDE 32

Outline

  • Motivation
  • Solutions
  • Implementation
  • Evaluation
  • Conclusion
SLIDE 33

Proof of Concept Model

  • Linux Containers (LXC)
    • Emulate the isolated processes of Drawbridge/MirageOS
  • Checkpoint/Restore In Userspace (CRIU)
    • Checkpoint containers on VM A
    • Migrate checkpoint files to VM B
    • Restore on VM B
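
The checkpoint, copy and restore sequence can be scripted. A minimal sketch, assuming LXC's `lxc-checkpoint` front end to CRIU, `rsync` and `ssh` access between the VMs, and that the container's configuration and root filesystem already exist on VM B; the container name, directory and host are hypothetical. By default it only prints the commands.

```python
from __future__ import annotations

import shlex
import subprocess


def migration_commands(name: str, ckpt_dir: str, target_host: str) -> list[list[str]]:
    """Commands for checkpoint (VM A), copy, and restore (VM B)."""
    return [
        # 1. On VM A: dump the container through CRIU, then stop it (-s).
        ["lxc-checkpoint", "-n", name, "-D", ckpt_dir, "-s"],
        # 2. Copy the checkpoint images to the same path on VM B.
        ["rsync", "-a", f"{ckpt_dir}/", f"{target_host}:{ckpt_dir}/"],
        # 3. On VM B: restore the container from those images (-r).
        ["ssh", target_host, "lxc-checkpoint", "-n", name, "-D", ckpt_dir, "-r"],
    ]


def migrate(name: str, ckpt_dir: str, target_host: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in migration_commands(name, ckpt_dir, target_host):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Because the container is stopped before the copy, downtime spans the dump, the transfer and the restore, consistent with downtime that depends on process size.
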
SLIDE 34

Simulator

  • Rapidly validate migration policies
  • Evaluate the influence of policy parameters on results
  • Written in about 2000 lines of Java code
SLIDE 35

Outline

  • Motivation
  • Solutions
  • Implementation
  • Evaluation
  • Conclusion
SLIDE 36

Experimental Setup

  • Proof-of-concept model (WIP)
    • Live migration of SPEC benchmarks running in LXC
    • Observed downtime: 30 seconds (depending on process size)
  • Migration policy simulations
    • Used our own random workload generator
    • Two workloads of each type: static, high variability, and low variability
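
The slides do not describe the random workload generator. As a hypothetical sketch, the three workload types could differ only in how far hourly demand strays from a base level; the `generate` function, the `SPREAD` values and the hourly granularity are all assumptions.

```python
from __future__ import annotations

import random

# Relative spread of each workload type around its base demand
# (illustrative values, not the authors' parameters).
SPREAD = {"static": 0.0, "low": 0.1, "high": 0.5}


def generate(kind: str, hours: int, base: float, seed: int | None = None) -> list[float]:
    """Hourly demand samples for a 'static', 'low' or 'high' variability workload."""
    rng = random.Random(seed)  # seeded for reproducible simulation runs
    spread = SPREAD[kind]
    return [max(0.0, base * (1.0 + rng.uniform(-spread, spread))) for _ in range(hours)]
```

Two workloads of each type over three days could then be drawn with `{kind: [generate(kind, 72, 2.0, seed=i) for i in range(2)] for kind in SPREAD}`.
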
SLIDE 37

Capping Costs

[Figure: two panels plotted against max spending limit per day (dollars: 3, 4, 4.5, 5) for single-app and multiple-app policies. Left: number of migrations. Right: overcommitment.]

SLIDE 38

Constraining Downtime

[Figure: two panels plotted against max migrations per process per day (2, 3, 4, 5) for single-app and multiple-app policies. Left: total cost. Right: overcommitment.]

SLIDE 39

Suppressing Spikes

[Figure: two panels plotted against median window size (1, 4, 8) for single-app and multiple-app policies. Left: number of migrations. Right: overcommitment.]
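
One plausible reading of the median window parameter, stated here as an assumption, is that migration decisions act on a running median of recent demand samples instead of the latest sample, so a single spike cannot trigger a migration. A minimal sketch with a hypothetical `smoothed` helper:

```python
from __future__ import annotations

from statistics import median


def smoothed(samples: list[float], window: int) -> list[float]:
    """Trailing median over the last `window` samples (window=1 is a no-op)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [median(samples[max(0, i - window + 1): i + 1]) for i in range(len(samples))]
```

With a window of 4, an isolated spike in `[1, 1, 1, 9, 1, 1]` is filtered out entirely, while a sustained rise in `[1, 1, 5, 5, 5]` still reaches 5 after a few samples; larger windows suppress longer spikes but react more slowly to genuine changes in demand.
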

SLIDE 40

Show me the money

  • Baseline
    • Used the same workloads as the simulation
    • Picked from available VMs that would best fit the workloads
    • No migrations!
    • Cost for 3 days: $45.36
  • Our solution
    • No migration policy requires more than $15 for 3 days
    • At least 66% money saved!
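
The savings figure follows from the two costs on the slide. Because $15 is an upper bound on every policy's three-day cost, the computed fraction is a lower bound on the savings; the names below are illustrative.

```python
def savings(baseline_cost: float, policy_cost: float) -> float:
    """Fraction of the baseline cost saved by a policy."""
    return (baseline_cost - policy_cost) / baseline_cost


BASELINE_3_DAYS = 45.36    # best-fit VMs, no migrations
POLICY_CAP_3_DAYS = 15.00  # upper bound on every migration policy's cost

print(f"baseline per day: ${BASELINE_3_DAYS / 3:.2f}")
print(f"minimum savings: {savings(BASELINE_3_DAYS, POLICY_CAP_3_DAYS):.1%}")
```

The baseline works out to $15.12 per day, roughly three times the $5 top of the daily spending limits explored when capping costs, and the savings to at least 66.9%.
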
SLIDE 41

Conclusions

  • Streamlining cloud operations becomes important with increasing scale
  • Current IaaS reservation models are insufficient
  • Better support needed from cloud providers
    • e.g., Amazon EC2 Container Service
  • Migration policies have to optimize in a multi-dimensional space
    • Simple ones offer savings too!
SLIDE 42

Questions?

SLIDE 43

BACKUPS

SLIDE 44

Single application per VM

SLIDE 45

Effect of cost per day

[Figure: two panels plotted against max amount allowed per day (dollars: 3, 4, 4.5, 5). Left: migrations and cost. Right: overcommitment.]

SLIDE 46

Migrations cap

[Figure: two panels plotted against max number of migrations per process per day (2, 3, 4). Left: migrations and cost. Right: overcommitment.]

SLIDE 47

Median window variations

[Figure: two panels plotted against median window size (1, 4, 8). Left: migrations and cost. Right: overcommitment.]

SLIDE 48

Multiple applications per VM

SLIDE 49

Effect of cost per day

[Figure: two panels plotted against max amount allowed per day (dollars: 3, 4, 4.5, 5). Left: migrations and cost. Right: overcommitment.]

SLIDE 50

Migrations cap

[Figure: two panels plotted against max number of migrations per process per day (3, 4, 5). Left: migrations and cost. Right: overcommitment.]

SLIDE 51

Median window variations

[Figure: two panels plotted against median window size (1, 4, 8). Left: migrations and cost. Right: overcommitment.]