SPOTLYTICS: HOW TO USE CLOUD MARKET PLACES FOR DATA ANALYTICS?
TIM KRASKA, ELKHAN DADASHOV, CARSTEN BINNIG

CLOUD IAAS
Idea: Rent virtual machines from a provider and run your software on them (e.g., DBMS, Spark, etc.)
Instance sizes: small, medium, large, extra large
Typical Pricing Models
• On-demand: fixed price per hour (e.g., 10 cents/hour)
• Reserved: basic fee based on a contract over x years + lower hourly rate compared to on-demand

MARKET-BASED IAAS
IaaS providers overprovision their resources
Market-based IaaS: Overcapacity is sold under a dynamic pricing scheme
• High overcapacity => low price
• Low overcapacity => high price (but other parameters also influence the price)
Main provider: Amazon Spot Instances

AWS SPOT INSTANCES: USAGE MODEL
• Bid price ≥ market price: the instance is granted
• Bid price < market price: the instance is not granted, or is revoked if it is already running
[Figure: market price plotted against a bid price of 5 cents]
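The grant/revoke rule above can be sketched in a few lines of Python. This is only an illustration of the rule on the slide, not an AWS API: the function name `granted_intervals` and the price trace are hypothetical, and prices are given in cents.

```python
from typing import List


def granted_intervals(bid: float, market_prices: List[float]) -> List[bool]:
    """For each interval, report whether the instance is held.

    The instance is granted (or kept) while bid >= market price and is
    not granted (or revoked) as soon as bid < market price.
    """
    return [bid >= price for price in market_prices]


# Hypothetical market prices in cents, with a bid price of 5 cents.
trace = [3.0, 4.5, 5.0, 7.2, 4.0]
print(granted_intervals(5.0, trace))  # [True, True, True, False, True]
```

In this made-up trace the instance is lost in the fourth interval, when the market price rises above the bid.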

AWS SPOT INSTANCES: PRICE MODEL
Prices differ per instance type, region, and zone
[Figure: spot market price compared with the on-demand (no contract) and reserved (3 years) prices]

AWS SPOT INSTANCES: BILLING
• Billing is based on an interval ε (1 h for Spot)
• Cost: price at launch time × number of intervals (the price is re-evaluated every interval)
• Discount: for non-full intervals if the instance is terminated by the provider
[Figure: example with a bid price of 5 cents]
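A minimal sketch of this billing rule, under stated assumptions: each started interval is charged at the market price observed at its start, and the discount for a provider-terminated partial interval is taken to be 100% (the slide does not give the size of the discount). The function name `spot_cost` and its parameters are hypothetical.

```python
import math
from typing import List


def spot_cost(prices_at_interval_start: List[float],
              runtime_intervals: float,
              terminated_by_provider: bool) -> float:
    """Cost of one spot instance under interval-based billing.

    prices_at_interval_start[i] is the market price at the start of
    interval i; each billed interval is charged at that price.
    ASSUMPTION: a partial last interval is free if the provider revoked
    the instance, and billed in full if the user shut it down.
    """
    full = math.floor(runtime_intervals)
    partial = runtime_intervals > full
    if partial and not terminated_by_provider:
        billed = full + 1   # user pays for the started interval
    else:
        billed = full       # only completed intervals are charged
    if billed > len(prices_at_interval_start):
        raise ValueError("price trace is shorter than the billed runtime")
    return sum(prices_at_interval_start[:billed])


prices = [3.0, 4.0, 3.5]                 # hypothetical prices in cents
print(spot_cost(prices, 2.5, False))     # 10.5
print(spot_cost(prices, 2.5, True))      # 7.0
```

Under these assumptions, the same 2.5 intervals of runtime cost less when the provider ends the run than when the user does.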

CHALLENGES FOR ANALYTICS ON SPOT
Main goal should be to save monetary cost
Fault tolerance of systems plays a key role
Other peculiarities:
• all machines of the same type fail together
• weird, almost binary (high price, low price) behavior
• price fluctuations for some types suddenly stopped
• abnormally high spikes
• etc.

PROBLEM STATEMENT
• Given a job J (e.g., a Map-Reduce program or a SQL query) and a fault-tolerance strategy FT
• Find the best deployment strategy to minimize the overall monetary cost of executing J
Deployment strategy? Price: 5c / hour, Type: 3 x m4.large

COARSE-GRAINED RESTART
Scheme implemented in a distributed DBMS
Recovery: Restart the complete query
[Figure: operator instances 1 to 5 on Node 1 and Node 2]

FINE-GRAINED RESTART + CHECKPOINTS
Scheme implemented in Hadoop
Recovery: Restart of individual operator instances
[Figure: operator instances 1 to 5 on Node 1 and Node 2, with Temp stores between them]

FINE-GRAINED RESTART + LINEAGE
Scheme implemented in Spark
Recovery: Restart of individual operator instances + lineage
[Figure: operator instances 1 to 5 on Node 1 and Node 2]

CONTRIBUTIONS OF THIS PAPER?
Cost analysis for different fault-tolerance strategies
• Coarse-grained query restart
• Fine-grained restart / checkpointing
• Fine-grained restart / lineage
Result 1. It is never beneficial to shut down an instance before the end of the billing interval ε.

COARSE-GRAINED RESTART
Runtime costs of a job J (without failure)
• The job is composed of multiple tasks
• Runtime of a task on one instance: R = R_CPU / I_CPU (R_CPU: total cycles, I_CPU: cycles of the instance in one ε)
• Runtime of a task on n instances: R/n
On failure: complete restart
Result 2. Running a job in a single billing interval ε is cheaper than running the job with fewer resources over several intervals.
• Assume that q · m is the number of machines needed to run the job in exactly one billing interval
• Then m is the number of machines needed to run the job in q intervals
• Thus, the costs of a successful run are equal
• However, the probability of failure increases with the runtime k
Result 3. Using more machines to finish early can be beneficial (depending on the failure rate λ).
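The argument behind Result 2 can be checked numerically with a small sketch. It assumes a fixed price per machine per billing interval and, since the slides do not state a failure model, a Poisson failure process with rate λ per interval. All names and values below are hypothetical.

```python
import math


def success_cost(machines: int, intervals: int, price: float) -> float:
    """Cost of one failure-free run: every machine pays for every interval."""
    return machines * intervals * price


def survival_probability(failure_rate: float, intervals: int) -> float:
    """Probability that no failure occurs during the whole run.

    ASSUMPTION: failures follow a Poisson process with `failure_rate`
    failures per billing interval; the slides do not fix a model.
    """
    return math.exp(-failure_rate * intervals)


m, q, price, lam = 4, 3, 5.0, 0.2        # hypothetical values

wide = success_cost(q * m, 1, price)     # q*m machines for 1 interval
narrow = success_cost(m, q, price)       # m machines for q intervals
print(wide, narrow)                      # 60.0 60.0

# The longer run is more likely to be hit by a failure and restarted.
print(survival_probability(lam, 1) > survival_probability(lam, q))  # True
```

Both deployments cost the same when nothing fails, but the run spread over q intervals is exposed to failures for longer, and under coarse-grained restart every failure means paying for the work again.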

EXP: VARYING # OF MACHINES
Low failure rate (λ = 0.75 -> every 800 minutes)
[Figure: axis ranging from many instances to few instances]
Setup: us-east-1c, m1.large, Linux instance type with an on-demand price of $0.175 and a bid price of $0.0263 (15% of the on-demand price)

EXP: VARYING # OF MACHINES
High failure rate (λ = 1.8 -> every 33 minutes)
[Figure: axis ranging from many instances to few instances]
Setup: us-east-1c, m1.large, Linux instance type with an on-demand price of $0.175 and a bid price of $0.0263 (15% of the on-demand price)

FINE-GRAINED + CHECKPOINT
Result 4. The expected cost of using n or 2 · n machines for a job is the “same” with checkpointing.
Intuition:
• Checkpointing allows work to resume without “losing” the work already invested
• Doubling the machines halves the runtime but doubles the cost per billing interval
Result 5. Using a single instance to finish a job in a single checkpointing interval is the cheapest and most risk-averse option.
Intuition:
• High variance for one interval (i.e., pay nothing or all)
• Less variance for more intervals
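The intuition behind Result 4 can be written as a one-line calculation. As a simplification that is not from the slides, assume a price p per machine per billing interval, a task runtime of R intervals on one instance (so R/n on n instances, as defined earlier), no checkpointing overhead, and no rounding to whole intervals:

```latex
\mathrm{cost}(n) = n \cdot p \cdot \frac{R}{n} = p \cdot R
\qquad\text{and}\qquad
\mathrm{cost}(2n) = 2n \cdot p \cdot \frac{R}{2n} = p \cdot R
```

Because checkpointing keeps the work done before a failure, a failure does not add a restart term that depends on n, which is why the two costs stay "the same" in expectation under these assumptions.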

EXP: ONE VS. MANY MACHINES
Bid price: median of the prices from 4 years
Setup: three machine types (m2.xlarge, m2.2xlarge, and m2.4xlarge), all from the us-east-1a data center

FINE-GRAINED + LINEAGE
Result 6. Same as coarse-grained query restart on Spot Instances if we do not mix instance types.

CONCLUSIONS
Market-based IaaS for data analytics
Main contributions: cost analysis for different FT schemes
• Query restart: get more machines to pay less
• Fine-grained / checkpointed (Hadoop): one machine saves most
• Fine-grained / lineage (Spark): same as query restart
Future work:
• Mixing instance types, bid prices for deployment
• Minimize runtime for a given budget
