
SLIDE 1

Information Models: Creating and Preserving Value in Volatile Resources

Chaojie Zhang, Varun Gupta, Andrew A. Chien
University of Chicago
June 25, 2019, ROSS Workshop

1 Chien - ROSS 2019

SLIDE 2

Excess Resources in the Cloud

  • IaaS demand is expanding
  • Demand fluctuates
  • Capacity must meet peak demand
  • → Excess resources
  • Excess is offered as volatile resources

[Figure labels: Excess Resources, Foreground load, Resources, N]

SLIDE 3

What are Volatile Resources?

  • Unreliable: can be unilaterally revoked
  • Examples
      • Google Preemptible VMs
      • AWS Spot Instances
  • Consequences
      • Wasted work
      • Delayed critical path

[Figure labels: Cloud Operator; Reliable Resource User (Requests); Volatile Resource User (Requests, Revocation)]

SLIDE 4

Arming Users with Information

[Figure labels: Volatile Resource Availability; Information Models (summary)]

SLIDE 5

Maximizing Value of Volatile Resources

  • What information model do users need to maximize the value they get from volatile resources?
  • Assumption: if user value is maximized, cloud providers can sell volatile resources for more money

[Figure labels: Volatile Resource Availability; Information Models; Volatile Resource User; Optimized Use]

SLIDE 6

Main Contributions

  • Show a specific, small information model that dramatically increases users’ ability to achieve value
  • Cloud providers can provide information models without compromising internal resource management flexibility
  • Results are robust over 608 AWS Spot Instance pools
      • 4 regions, millions of CPUs

SLIDE 7

Information Models

  • What information enables users to target volatile resources to extract the most value?
  • Interval duration PDFs
      • 1. MTTR
      • 2. 10pctile
      • 3. 90pctile
      • Full
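As a rough sketch of how small information models like these could be derived from observed interval durations, the snippet below reduces a list of durations to a mean and two percentiles. The function name, the sample data, and the use of the conventional percentile definition are assumptions made for illustration; the slides do not define how 10pctile and 90pctile are computed.

```python
import statistics

def summarize_intervals(durations_min):
    """Reduce observed interval durations (minutes) to a few summary numbers."""
    # quantiles(n=10) returns the 9 decile cut points:
    # index 0 is the 10th percentile, index 8 is the 90th.
    deciles = statistics.quantiles(durations_min, n=10, method="inclusive")
    return {
        "mean": statistics.mean(durations_min),
        "p10": deciles[0],
        "p90": deciles[8],
    }

# Hypothetical interval durations in minutes (not data from the study).
sample = [5, 10, 15, 20, 30, 45, 60, 120, 240, 480, 960]
print(summarize_intervals(sample))
```

The point of the sketch is only that each model is a handful of numbers, far smaller than the full distribution of interval durations.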

SLIDE 8

Evaluation of Information Models

  • Resource dynamics: 3 months of data from 608 AWS Spot Instance pools
      • 5-minute intervals, 15 million data points
  • User behaviors
      • Match computations to resources (duration ~ time to revocation)
      • Maximize the expected value of the job duration over the intervals
  • Utility function
      • Step function (batch and workflow tasks)
  • Metrics
      • Total user value
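The user behavior described on this slide can be illustrated with a minimal sketch. It assumes a step utility in which a job of length d earns value d if the interval lasts at least d minutes and nothing if it is revoked earlier, and it assumes the user sees the full list of past interval durations. Both are simplifying assumptions, and the function name and data are hypothetical.

```python
from bisect import bisect_left

def best_job_duration(durations_min):
    """Return (d, expected_value) maximizing d * P(interval >= d)."""
    ordered = sorted(durations_min)
    n = len(ordered)
    best_d, best_value = None, 0.0
    for d in ordered:
        # Number of observed intervals that last at least d minutes.
        survivors = n - bisect_left(ordered, d)
        # Step utility (assumed): value d if the job finishes, 0 if revoked.
        value = d * survivors / n
        if value > best_value:
            best_d, best_value = d, value
    return best_d, best_value

# Hypothetical history: three short intervals and one long one.
print(best_job_duration([10, 10, 10, 100]))  # prints (100, 25.0)
```

Only observed durations are tried as candidates, because with a step utility the expected value between two observed durations is always improved by moving up to the next observed one.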

SLIDE 9

Evaluation: Total Value vs. Information Models

  • Comparing three information models
  • 90pctile gives best results
  • 30% value increase


SLIDE 10

Evaluation: Total Value of Information Models

  • Comparing three information models, with Full as a reference
  • 90pctile gives the best results
  • 30% value increase
  • Limited information models achieve most (90%) of the benefit of Full
  • Results are robust over the vast majority of the 608 instance pools

SLIDE 11

Evaluation: Robustness of Info Model Benefit

  • But what if cloud providers use a range of volatile resource management (VRM, revocation) policies?
  • Information model benefit and ordering are robust across
      • A range of VRMs
      • All 608 instance pools

[Figure label: Mean of 608 pools]

SLIDE 12

Information Models: Summary

  • It’s hard for users to maximize value with no information, and cloud providers are afraid of sharing too much
  • Just limited information (mean + 90th percentile) dramatically increases user value
  • However, cloud providers worry that an information model will constrain resource management

SLIDE 13

Challenge: Statistical Guarantees and Resource Management “Freedom”

  • If we give out an information model (a statistical guarantee), does it constrain resource management?
  • Changed foreground load → changed statistics

[Figure panels: Original foreground load and volatile resource availability; Original foreground load vs. increased frequency; Original foreground load vs. increased magnitude]

SLIDE 14

What about a Change in Magnitude?

  • Consider a drastic reduction in volatile resources (1 → 1/K)
      • K = 1, 2, 3
  • How does this affect 90pctile?
      • 2-week sliding window
  • Magnitude change has no impact on 90pctile statistical guarantees → no constraint!

SLIDE 15

What about a Change in Frequency?

  • Increase the volatile resource variation frequency by contracting the time base (1 → 1/F)
      • F = 1, 2, 3
  • How does this affect 90pctile?
      • 2-week sliding window
  • Frequency change reduces 90pctile dramatically
      • Violates the guarantee!
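One way to see why contracting the time base breaks a percentile guarantee: if every interval duration is divided by F, any percentile of the durations is divided by F as well. The sketch below checks this on hypothetical data with a nearest-rank percentile. The helper names and the percentile convention are assumptions, and the 2-week sliding window from the slide is not modeled.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]

def contract_time_base(durations_min, factor):
    """Speed up resource variation: every interval shrinks by `factor`."""
    return [d / factor for d in durations_min]

# Hypothetical interval durations in minutes (not data from the study).
base = [5, 10, 15, 20, 30, 45, 60, 120, 240, 480, 960]
for f in (1, 2, 3):
    print(f, percentile(contract_time_base(base, f), 90))
```

With these numbers the 90th percentile falls from 480 minutes to 240 and then 160 as F goes from 1 to 3, so a guarantee published at F = 1 no longer holds.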

SLIDE 16

Can We Preserve the Guarantee?

  • Idea: guarantee-preserving resource management
      • Maintain the 90pctile guarantee under frequency change
  • Offline static algorithm
      • Reshape the distribution by withholding each interval for X minutes
      • Kills short intervals, shortens long intervals
  • What is the best X?
      • Find the smallest X that preserves the guarantee

[Figure label: X minutes]
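The search for the smallest X can be sketched as a linear scan, under several assumptions that the slides do not confirm: the guarantee is checked with a nearest-rank percentile over the offered intervals, X is tried on a fixed grid of `step` minutes, and an interval no longer than X is never offered at all. All names and data here are hypothetical, and this is not the authors' implementation.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]

def withhold(durations_min, x):
    """Intervals no longer than x vanish; longer ones lose x minutes."""
    return [d - x for d in durations_min if d > x]

def smallest_withholding(durations_min, target, pct=90, step=1):
    """Smallest x on the grid whose offered intervals meet the target."""
    x = 0
    longest = max(durations_min)
    while x < longest:
        offered = withhold(durations_min, x)
        if offered and percentile(offered, pct) >= target:
            return x
        x += step
    return None  # no withholding time on the grid restores the guarantee

# Hypothetical heavy-tailed durations (minutes) and a target of 700 minutes.
observed = [3, 4, 5, 5, 6, 8, 10, 12, 15, 20,
            25, 30, 40, 60, 90, 150, 300, 600, 1200, 2400]
print(smallest_withholding(observed, target=700))  # prints 3
```

A linear scan is used rather than a binary search because the percentile of the offered intervals is not monotone in X: it drops slowly while intervals are only shortened and jumps up when a short interval is removed. A different percentile convention can be tried by changing `pct`.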

SLIDE 17

Online Dynamic Algorithms

  • Idea: AIMD, online targeting
  • Doubles the 90pctile: preserves the guarantee and reduces job failures
  • Information model => good user value
  • Preserving resource management => providers’ flexibility
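The slide names AIMD (additive increase, multiplicative decrease) but gives no details. Purely as an illustration of the AIMD rule, the sketch below assumes the controlled quantity is a withholding time X in minutes: X grows by a fixed step while the measured percentile is below the guaranteed value and is cut by a constant factor once the guarantee is met. The controlled quantity, the constants, and the function names are all assumptions and may differ from the authors' algorithm.

```python
def aimd_update(x, measured, target, add_step=1.0, decrease=0.5):
    """One AIMD step on the withholding time x (minutes)."""
    if measured < target:
        return x + add_step   # additive increase: guarantee is violated
    return x * decrease       # multiplicative decrease: guarantee is met

def run_aimd(measurements, target, x0=0.0):
    """Apply aimd_update to a sequence of measured percentile values."""
    x, trace = x0, []
    for measured in measurements:
        x = aimd_update(x, measured, target)
        trace.append(x)
    return trace

# Hypothetical measured percentiles (minutes) against a 700-minute guarantee.
print(run_aimd([600, 500, 550, 800, 900], target=700))
# prints [1.0, 2.0, 3.0, 1.5, 0.75]
```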

SLIDE 18

Classifying 608 Instance Types

  • 3 classes of instance types: Stable, Transition, Unstable
  • 400 Stable
      • The 90pctile is consistent
  • 177 Transition
      • 90pctile guarantee is matched most of the time
  • 31 Unstable
      • 90pctile is unstable, low, unusable

[Figure panels: Stable; Transition]

SLIDE 19

Evaluation: Preserving 90pctile Guarantees

  • Guarantee-preserving algorithms
      • Effective for Stable pools
      • Helpful for Transition pools

[Figure axis: Violation Percentage (time)]
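A violation percentage measured over time could be computed along the following lines. The sketch assumes the guaranteed statistic is sampled at equally spaced times, so that the share of samples below the guaranteed value stands in for the share of time in violation. The function name and the numbers are hypothetical.

```python
def violation_percentage(measured, guaranteed):
    """Percent of equally spaced samples that fall below the guarantee."""
    if not measured:
        raise ValueError("need at least one measurement")
    violations = sum(1 for m in measured if m < guaranteed)
    return 100.0 * violations / len(measured)

# Hypothetical percentile readings (minutes) against a 700-minute guarantee.
print(violation_percentage([700, 650, 720, 710], guaranteed=700))  # prints 25.0
```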

SLIDE 20

Related Work

  • Volatile resource characterization
      • Characterization of price [Javadi 2011, Tang 2012, Wolski 2017] and revocation behavior [Chohan 2010]
  • Engineering reliable resources
      • Checkpointing [Khatua 2013], replication [Voorsluys 2012, Xu 2016], migration [Yi 2013, Jung 2013]
      • Construct an “economy class” of nearly reliable resources [Carvalho 2014]
  • Value of information
      • Transient guarantee [Shastri 2016]
  • Guarantee-preserving algorithms
      • None

SLIDE 21

Summary & Future Work

  • Small information model → large increase in user value
      • 90pctile info model: two numbers
      • 30% average increase, up to 2X
      • 90% of the benefit of full disclosure
  • Guarantee-preserving algorithms can preserve guarantees and maintain the cloud provider’s flexibility
  • Results are robust over 608 AWS Spot Instance pools
  • For more information: http://zccloud.cs.uchicago.edu/ and
      • Chaojie Zhang, Varun Gupta, and Andrew A. Chien, Information Models: Creating and Preserving Value in Volatile Cloud Resources, in the IEEE International Conference on Cloud Engineering (IC2E), June 2019, Prague, Czech Republic.