Cloud Spot Markets are Not Sustainable: The Case for Transient Guarantees
Supreeth Subramanya, Amr Rizk, David Irwin
Selling Idle Cloud Capacity has its limitations
2/15
❝ Shared warehouse scale machines tend to have 10-50% utilization ❞
[2013] The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.
- Users bid in a 2nd price auction
- EC2 continually evaluates supply and demand to price spot servers
- Allocate: bid price ≥ spot price
- Revoke: bid price < spot price
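The allocate/revoke rule above can be sketched in a few lines. This is only an illustration of the comparison stated on the slide, not EC2's actual pricing logic; the function names and the sample prices are hypothetical.

```python
def spot_decision(bid_price: float, spot_price: float) -> str:
    """Apply the slide's rule: allocate if bid >= spot price, otherwise revoke."""
    return "allocate" if bid_price >= spot_price else "revoke"


def replay(bid_price: float, spot_prices: list[float]) -> list[str]:
    """Evaluate one fixed bid against a sequence of spot prices."""
    return [spot_decision(bid_price, p) for p in spot_prices]


# Hypothetical hourly spot prices (in $); not real EC2 data.
print(replay(0.50, [0.30, 0.50, 0.60]))
```

A server held at a bid of $0.50 would be kept while the spot price is at or below $0.50 and revoked once the price rises above it.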
Commoditized compute
3/15
Commodity Spot Markets
- Commodity and futures markets are great at pricing resources and balancing supply and demand, but …
- Mature markets are inherently volatile
- Efficient Market Hypothesis: it is not possible to “beat the market” by predicting future prices
4/15
Compute Time vs. Other Commodities
Compute time is ❝ stateful ❞
1. Losing a server unpredictably incurs an overhead
2. This overhead decreases the useful compute time of the server
∴ Market volatility reduces the amount of compute time purchased
As the cloud spot markets mature, the value of resources they allocate will decrease
5/15
Understanding Spot Market Characteristics
6/15
Spot Servers are Intrinsically Less Valuable!
[Figure: single-node stateful batch job on a spot VM, checkpointing to remote disk]
Optimal interval of checkpointing:
$$T_{opt} \approx \sqrt{2 \cdot \epsilon \cdot MTTR}$$
❝ On average, spot servers get less work done per unit of time compared to an equivalent on-demand server ❞
Expected runtime:
$$E[T_{spot}] = \underbrace{T}_{\text{actual runtime}} + \underbrace{\frac{T}{T_{opt}} \cdot \epsilon}_{\text{checkpointing overhead}} + \underbrace{\frac{T}{MTTR} \cdot \frac{T_{opt}}{2}}_{\text{recomputation}}$$
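As a rough illustration, the optimal checkpoint interval and the expected spot runtime can be computed as follows. The sketch assumes that 𝝴 is the time to write one checkpoint and that MTTR is the mean time between revocations, with all quantities in hours; the function names and sample values are hypothetical and not taken from the slides.

```python
import math


def optimal_checkpoint_interval(eps: float, mttr: float) -> float:
    """T_opt ≈ sqrt(2 * eps * MTTR), with eps assumed to be the time per checkpoint."""
    return math.sqrt(2.0 * eps * mttr)


def expected_spot_runtime(t: float, eps: float, mttr: float) -> float:
    """E[T_spot] = T + (T / T_opt) * eps + (T / MTTR) * (T_opt / 2)."""
    t_opt = optimal_checkpoint_interval(eps, mttr)
    checkpoint_overhead = (t / t_opt) * eps   # number of checkpoints times cost of each
    recomputation = (t / mttr) * (t_opt / 2)  # revocations times half an interval lost
    return t + checkpoint_overhead + recomputation


# Hypothetical job: 12 hours of work, 0.1 h per checkpoint, MTTR of 5 h.
print(optimal_checkpoint_interval(0.1, 5.0))
print(expected_spot_runtime(12.0, 0.1, 5.0))
```

With these sample values the interval comes out at 1 hour and the two overhead terms are equal (1.2 hours each), which is what choosing the interval by the square-root rule is meant to achieve.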
7/15
Equilibrium price of spot (or the price at which spot stops being cheap):
$$P_{eq} = P_{on\text{-}demand} \cdot \frac{T_{on\text{-}demand}}{E[T_{spot}]}$$
Spot Servers are Intrinsically Less Valuable!
- Stateful batch job on a spot VM: completion time 20 hours
- Stateful batch job on an on-demand server: completion time 12 hours
❝ For this application, a spot server with a 40% discount on the on-demand price provides no savings at all ❞
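The 40% figure follows from the equilibrium price being the on-demand price scaled by the ratio of on-demand runtime to expected spot runtime. A minimal sketch of that arithmetic, using the 12-hour and 20-hour completion times from this slide; the function names and the $0.10 hourly price are hypothetical, chosen only for illustration.

```python
def equilibrium_price(p_on_demand: float, t_on_demand: float, t_spot: float) -> float:
    """P_eq = P_on-demand * (T_on-demand / E[T_spot])."""
    return p_on_demand * (t_on_demand / t_spot)


def break_even_discount(t_on_demand: float, t_spot: float) -> float:
    """Discount on the on-demand price at which spot and on-demand cost the same."""
    return 1.0 - t_on_demand / t_spot


P_ON_DEMAND = 0.10  # hypothetical on-demand price in $/hour

p_eq = equilibrium_price(P_ON_DEMAND, t_on_demand=12.0, t_spot=20.0)
print(f"equilibrium price: ${p_eq:.3f}/hour")
print(f"break-even discount: {break_even_discount(12.0, 20.0):.0%}")

# Total cost is identical at the equilibrium price: 20 h of spot vs. 12 h of on-demand.
print(f"spot cost: ${p_eq * 20.0:.2f}, on-demand cost: ${P_ON_DEMAND * 12.0:.2f}")
```

Any spot price above 60% of the on-demand price would make the spot run more expensive for this job, despite the lower hourly rate.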
8/15
[Figure: three availability timelines (availability vs. unit time t), each marking compute time and time to checkpoint]
- Available, Not Volatile, Predictable: a single availability period of length AS; needs just one checkpoint
- Available, Volatile, Predictable: availability periods a1, a2, a3, a4, …, aV with ∑ai = AS; needs as many checkpoints as there are revocations
- Available, Volatile, Unpredictable: needs periodic checkpointing (fchkp); periods with a3, a4 < fchkp show up as lost time
Distilling the Spot Market Characteristics
We identify three key metrics: Availability, Volatility, Predictability
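To make the three cases concrete, the sketch below compares useful compute time under each one. It is a deliberately simplified model written for illustration, not the authors' method: it assumes a checkpoint is written just before each predictable revocation, and that with unpredictable revocations only work inside completed compute-plus-checkpoint cycles survives. The availability periods, the checkpoint time, and the `interval` parameter (standing in for fchkp) are all hypothetical.

```python
import math


def useful_time_end_of_period(periods: list[float], chkp_time: float) -> float:
    """Predictable revocations: one checkpoint right before each period ends."""
    return sum(max(a - chkp_time, 0.0) for a in periods)


def useful_time_periodic(periods: list[float], chkp_time: float, interval: float) -> float:
    """Unpredictable revocations: checkpoint after every `interval` of compute.

    Work done since the last completed checkpoint is lost at each revocation.
    """
    cycle = interval + chkp_time
    return sum(math.floor(a / cycle) * interval for a in periods)


# Hypothetical availability periods (hours) that sum to A_S = 10.
periods = [4.0, 3.0, 0.5, 0.5, 2.0]
CHKP = 0.25      # assumed time to write one checkpoint (hours)
INTERVAL = 0.75  # assumed compute time between periodic checkpoints (hours)

print(useful_time_end_of_period([sum(periods)], CHKP))  # not volatile, predictable
print(useful_time_end_of_period(periods, CHKP))         # volatile, predictable
print(useful_time_periodic(periods, CHKP, INTERVAL))    # volatile, unpredictable
```

For the same 10 hours of total availability, this model yields 9.75, 8.75, and 6.75 useful hours respectively. The two 0.5-hour periods are shorter than one full cycle, so they contribute nothing in the unpredictable case.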
9/15
Market Characteristics Impact the Performance
[Figure: performance (% of on-demand) for OnDemand, c4.large, and cg1.4xl, broken down into useful server time, checkpointing overhead, and recomputation; annotated with the equilibrium price of markets]
[Figure: hourly price (in $), Jan 1 to Mar 1, c4.large (Linux), us-east-1] Mature markets are more volatile and less predictable
[Figure: hourly price (in $), Jan 1 to Mar 1, cg1.4xlarge (Linux), us-east-1] Deprecated/rarely used markets are less volatile and more predictable
10/15
On Spot Market Evolution
11/15
State of EC2 Spot Markets
(Adoption level, cost, and complexity)
- 2009-2014: low adoption, priced cheaply, complex to use
- 2015 onwards: increasing adoption, priced moderately, decreasing complexity
- Under mature market conditions: demand ≈ supply, equilibrium price, convenient to use
As they mature, cloud spot markets may not maximize the value of idle cloud capacity
12/15
Transient Guarantees
❝ Uncertainty is more stressful than knowing for sure something bad will happen ❞
de Berker, Archy O., et al. “Computations of uncertainty mediate acute stress responses in humans.” Nature Communications 7 (2016).
13/15
[Figure: idle cloud capacity, ranging from highly available nodes to highly volatile nodes]
- EC2 Spot and GCE Preemptible: no explicit information on availability and volatility
- Transient Guarantees (MTTR based): Class-1 (high MTTR) through Class-N (low MTTR)
Why Transient Guarantees?
Not all spots are alike, and there are many ways to sell them
14/15
Transient Guarantees
Providing probabilistic assurances on availability, volatility and predictability of spot servers
E.g., Class-1 servers come with an MTTR of 55 hours, and Class-4 servers 2 hours
- Increase revenue through differentiated offering
- Retain the freedom to reclaim any server
- Able to value spot servers correctly
- Minimize fault-tolerance overhead
- Partitioning transient nodes into classes
- Fixed pricing vs. market pricing
- Verifying transient guarantees
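To see how an advertised MTTR could let a user value a class, the sketch below applies the checkpointing model from the earlier slides (Topt ≈ √(2 · 𝝴 · MTTR), with overhead made up of checkpointing plus recomputation) to the two example classes. The checkpoint time of 0.1 hours is an assumption, as are the function and variable names; only the 55-hour and 2-hour MTTR values come from the slide.

```python
import math


def overhead_fraction(eps: float, mttr: float) -> float:
    """Extra runtime per unit of work: eps / T_opt + T_opt / (2 * MTTR)."""
    t_opt = math.sqrt(2.0 * eps * mttr)
    return eps / t_opt + t_opt / (2.0 * mttr)


EPS_HOURS = 0.1  # assumed time to write one checkpoint; not from the slides
classes = {"Class-1": 55.0, "Class-4": 2.0}  # MTTR in hours, from the slide's example

for name, mttr in classes.items():
    frac = overhead_fraction(EPS_HOURS, mttr)
    break_even = 1.0 / (1.0 + frac)  # price, relative to on-demand, with equal total cost
    print(f"{name}: MTTR = {mttr:g} h, overhead = {frac:.1%}, "
          f"break-even price = {break_even:.1%} of on-demand")
```

Under this model the overhead simplifies to √(2 · 𝝴 / MTTR), so a class with a higher MTTR supports a higher break-even price. That is one way to read the claim that guarantees let users value spot servers correctly.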
Thank you!
Supreeth Subramanya
http://people.umass.edu/ssubramanya/
15/15