Cloud Spot Markets are Not Sustainable: The Case for Transient Guarantees



slide-1
SLIDE 1

Cloud Spot Markets are Not Sustainable: The Case for Transient Guarantees

Supreeth Subramanya, Amr Rizk, David Irwin

slide-2
SLIDE 2

Selling Idle Cloud Capacity

❝ Shared warehouse scale machines tend to have 10-50% utilization ❞

[2013] The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

has its limitations

  • Users bid in a 2nd price auction
  • EC2 continually evaluates supply and demand to price spot servers
  • Allocate: bid price ≥ spot price
  • Revoke: bid price < spot price

Commoditized compute
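The allocate and revoke rules on this slide can be illustrated with a minimal sketch. It assumes a hypothetical price trace sampled at fixed intervals and a fixed bid; the function name `spot_intervals` and the sample prices are made up for illustration and are not taken from EC2.

```python
def spot_intervals(prices, bid):
    """Return [start, end) index ranges during which the server is held.

    Rule from the slide: allocate while bid >= spot price,
    revoke as soon as bid < spot price.
    """
    intervals = []
    start = None
    for i, price in enumerate(prices):
        if bid >= price:
            if start is None:
                start = i  # server allocated at this sample
        elif start is not None:
            intervals.append((start, i))  # server revoked at this sample
            start = None
    if start is not None:
        intervals.append((start, len(prices)))  # still held at end of trace
    return intervals


# Hypothetical hourly spot prices in $ (illustrative only)
prices = [0.10, 0.12, 0.30, 0.11, 0.09, 0.25]
print(spot_intervals(prices, bid=0.15))
```

Each gap between two intervals is a revocation, which is the event that later slides charge with checkpointing and recomputation overhead.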

slide-3
SLIDE 3

Commodity Spot Markets

  • Commodity and futures markets are great at pricing resources and balancing supply and demand
  • Not possible to “beat the market” by predicting future prices (Efficient Market Hypothesis)
  • but … mature markets are inherently volatile

slide-4
SLIDE 4

Compute Time vs. Other Commodities

Compute time is ❝ stateful ❞

  • 1. Losing a server unpredictably incurs an overhead
  • 2. This overhead decreases the useful compute time of the server

∴ market volatility reduces amount of compute time purchased

As the cloud spot markets mature, the value of resources they allocate will decrease

slide-5
SLIDE 5


Understanding Spot Market Characteristics

slide-6
SLIDE 6

Spot Servers are Intrinsically Less Valuable!

Single-node batch job on a spot VM: a stateful batch job runs on a spot VM and checkpoints to remote disk.

Optimal interval of checkpointing:

$T_{opt} \approx \sqrt{2 \cdot \epsilon \cdot MTTR}$

Expected runtime:

$E[T_{spot}] = T + \left( \frac{T}{T_{opt}} \cdot \epsilon \right) + \left( \frac{T}{MTTR} \cdot \frac{T_{opt}}{2} \right)$

The three terms are the actual runtime, the checkpointing overhead, and the recomputation, respectively.

❝ On average, spot servers get less work done per unit of time compared to an equivalent on-demand server ❞
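The two formulas on this slide can be evaluated with a short sketch. It assumes that ε is the time to write one checkpoint and that MTTR is the mean time between revocations, all in the same time unit; the function names and the sample numbers are hypothetical.

```python
import math


def optimal_checkpoint_interval(eps, mttr):
    """T_opt ~ sqrt(2 * eps * MTTR), as given on the slide."""
    return math.sqrt(2 * eps * mttr)


def expected_spot_runtime(t, eps, mttr):
    """E[T_spot] = T + (T / T_opt) * eps + (T / MTTR) * (T_opt / 2)."""
    t_opt = optimal_checkpoint_interval(eps, mttr)
    checkpoint_overhead = (t / t_opt) * eps   # one checkpoint per interval
    recomputation = (t / mttr) * (t_opt / 2)  # half an interval lost per revocation
    return t + checkpoint_overhead + recomputation


# Hypothetical job: 12 hours of work, 0.5 h per checkpoint, MTTR of 4 h
print(optimal_checkpoint_interval(0.5, 4))  # 2.0
print(expected_spot_runtime(12, 0.5, 4))    # 18.0
```

With the interval set to T_opt, the checkpointing and recomputation terms come out equal (3 hours each in this example), which is the balance the square-root rule strikes.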

slide-7
SLIDE 7

Spot Servers are Intrinsically Less Valuable!

Equilibrium price of spot (or the price at which spot stops being cheap):

$P_{eq} = \frac{T_{on\text{-}demand}}{E[T_{spot}]} \cdot P_{on\text{-}demand}$

  • Stateful batch job on a spot VM: completion time 20 hours
  • Stateful batch job on an on-demand server: completion time 12 hours

❝ For this application, a spot server with a 40% discount on the on-demand price provides no savings at all ❞
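The slide's example can be checked numerically. The sketch below applies the equilibrium-price relation to the 12-hour and 20-hour completion times; the on-demand price of $0.10 per hour is a made-up value used only to show the units, and the function name is hypothetical.

```python
def equilibrium_price(t_on_demand, expected_t_spot, p_on_demand):
    """P_eq = (T_on-demand / E[T_spot]) * P_on-demand."""
    return (t_on_demand / expected_t_spot) * p_on_demand


P_ON_DEMAND = 0.10  # hypothetical on-demand price in $/hour

p_eq = equilibrium_price(t_on_demand=12, expected_t_spot=20, p_on_demand=P_ON_DEMAND)
break_even_discount = 1 - p_eq / P_ON_DEMAND

print(f"equilibrium price: ${p_eq:.3f}/hour")
print(f"break-even discount: {break_even_discount:.0%}")

# Total cost is the same on both server types at the equilibrium price
print(f"on-demand cost: ${12 * P_ON_DEMAND:.2f}, spot cost: ${20 * p_eq:.2f}")
```

Because 12 / 20 = 0.6, the spot server breaks even at 60% of the on-demand price, so a 40% discount only matches the on-demand cost and any smaller discount makes spot more expensive for this job.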

slide-8
SLIDE 8

Distilling the Spot Market Characteristics

We identify three key metrics: Availability, Volatility, Predictability

[Figure: three timelines of availability over unit time, each marking compute time and time to checkpoint]

  • Available, Not Volatile, Predictable: a single availability period AS; needs just one checkpoint
  • Available, Volatile, Predictable: availability periods a1, a2, a3, a4, ..., aV with ∑ai = AS; needs as many checkpoints as there are revocations
  • Available, Volatile, Unpredictable: needs periodic checkpointing (fchkp); periods with a3, a4 < fchkp become lost time

slide-9
SLIDE 9

Market Characteristics Impact the Performance

[Figure: hourly price (in $), Jan 1 to Mar 1, c4.large (Linux) us-east-1] Mature markets are more volatile and less predictable

[Figure: hourly price (in $), Jan 1 to Mar 1, cg1.4xlarge (Linux) us-east-1] Deprecated/rarely used markets are less volatile and more predictable

[Figure: equilibrium price of markets; performance (% of on-demand) for OnDemand, c4.large and cg1.4xl, broken down into useful server time, checkpointing overhead and recomputation]

slide-10
SLIDE 10


On Spot Market Evolution

slide-11
SLIDE 11

State of EC2 Spot Markets (Adoption level, Cost and Complexity)

  • 2009-2014: Low adoption, priced cheaply, complex to use
  • 2015 onwards: Increasing adoption, priced moderately, decreasing complexity
  • Under mature market conditions: Demand ≈ Supply, equilibrium price, convenient to use

As they mature, cloud spot markets may not maximize the value of idle cloud capacity

slide-12
SLIDE 12


Transient Guarantees

❝ Uncertainty is more stressful than knowing for sure something bad will happen ❞

de Berker, Archy O., et al. “Computations of uncertainty mediate acute stress responses in humans.” Nature communications 7 (2016)

slide-13
SLIDE 13

Why Transient Guarantees?

Not all spots are alike, and there are many ways to sell them

  • Idle cloud capacity ranges from highly available nodes to highly volatile nodes
  • EC2 Spot and GCE Preemptible: no explicit information on availability and volatility
  • Transient Guarantees (MTTR based): Class-1 (high MTTR) through Class-N (low MTTR)

slide-14
SLIDE 14

Transient Guarantees

Providing probabilistic assurances on availability, volatility and predictability of spot servers

E.g., Class-1 servers come with an MTTR of 55 hours, and Class-4 servers 2 hours

  • Increase revenue through differentiated offering
  • Retain the freedom to reclaim any server
  • Able to value spot servers correctly
  • Minimize fault-tolerance overhead
  • Partitioning transient nodes into classes
  • Fixed pricing vs. market pricing
  • Verifying transient guarantees
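As a rough illustration of "able to value spot servers correctly", the sketch below combines the checkpointing model from the earlier slides with the two MTTR values quoted here. It assumes a checkpoint time of 5 minutes, which is a made-up figure, and it assumes the same expected-runtime model applies to every class; the function name is hypothetical.

```python
import math


def equilibrium_fraction(eps_hours, mttr_hours):
    """Fraction of the on-demand price at which a transient server breaks even.

    Uses T_opt = sqrt(2 * eps * MTTR) and the expected-runtime model, expressed
    per hour of useful work so that the job length cancels out.
    """
    t_opt = math.sqrt(2 * eps_hours * mttr_hours)
    # 1 hour of work + checkpointing overhead + expected recomputation
    slowdown = 1 + eps_hours / t_opt + t_opt / (2 * mttr_hours)
    return 1 / slowdown


EPS_HOURS = 5 / 60  # hypothetical: 5 minutes per checkpoint

# MTTR values quoted on the slide
CLASS_MTTR_HOURS = {"Class-1": 55, "Class-4": 2}

for name, mttr in CLASS_MTTR_HOURS.items():
    fraction = equilibrium_fraction(EPS_HOURS, mttr)
    print(f"{name}: worth about {fraction:.0%} of the on-demand price")
```

Under these assumptions a Class-1 server is worth roughly 95% of the on-demand price and a Class-4 server roughly 78%. The exact figures depend entirely on the assumed checkpoint time, but the ordering shows why a published MTTR lets a user price each class.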

slide-15
SLIDE 15

Thank you!

Supreeth Subramanya

http://people.umass.edu/ssubramanya/
