SLIDE 1

Estimating Cloud Application Performance Based on Micro-Benchmark Profiling

Joel Scheuner, Philipp Leitner

Joel Scheuner · scheuner@chalmers.se · GitHub: joe4dev · Twitter: @joe4dev

Supported by [sponsor logos]

SLIDE 2

Context: Public Infrastructure-as-a-Service Clouds

[Figure: the cloud service model stacks. Each stack comprises the layers Applications, Data, Runtime, Middleware, OS, Virtualization, Servers, Storage, and Networking. Moving from IaaS via PaaS to SaaS, the boundary between user-managed and provider-managed layers shifts upward: IaaS users manage everything from the OS up, while SaaS is fully provider-managed.]

Infrastructure-as-a-Service (IaaS) · Platform-as-a-Service (PaaS) · Software-as-a-Service (SaaS)

SLIDE 3

Motivation: Capacity Planning in IaaS Clouds

What cloud provider should I choose?

https://www.cloudorado.com

SLIDE 4

Motivation: Capacity Planning in IaaS Clouds

What cloud service (i.e., instance type) should I choose?

[Chart: number of Amazon EC2 instance types per year, growing from a handful in 2006 to well over 100 in 2017]

t2.nano: 0.05–1 vCPU, 0.5 GB RAM, $0.006/h
x1e.32xlarge: 128 vCPUs, 3904 GB RAM, $26.688/h
(a per-hour price spread of roughly 4,400×)

→ Impractical to test all instance types

SLIDE 5

Topic: Performance Benchmarking in the Cloud

“The instance type itself is a very major tunable parameter”

— Brendan Gregg (@brendangregg), AWS re:Invent '17: https://youtu.be/89fYOo1V2pA?t=5m4s

SLIDE 6

Background

Micro benchmarks are generic, artificial, and resource-specific: each targets a single resource (CPU, memory, I/O, network).

Application benchmarks are specific, real-world, and resource-heterogeneous: a domain workload measures overall performance (e.g., response time) across mixed resource usage.

SLIDE 7

Problem: Isolation, Reproducibility of Execution

[Same spectrum as before: micro benchmarks (generic, artificial, resource-specific: CPU, memory, I/O, network) vs. application benchmarks (specific, real-world, resource-heterogeneous: overall performance, e.g., response time). The problem is executing both in isolation and reproducibly.]

SLIDE 8

Question

How relevant are the generic, resource-specific results of micro benchmarks for estimating the overall performance of real-world application benchmarks?

SLIDE 9

Research Questions

PRE – Performance Variability: Does the performance of equally configured cloud instances vary relevantly?

RQ1 – Estimation Accuracy: How accurately can a set of micro benchmarks estimate application performance?

RQ2 – Micro Benchmark Selection: Which subset of micro benchmarks estimates application performance most accurately?

SLIDE 10

Idea

Run both micro benchmarks (CPU, memory, I/O, network) and application benchmarks (overall performance, e.g., response time) on N cloud VMs, then evaluate a prediction model that estimates an application's performance–cost trade-off from the micro benchmark results.

SLIDE 11

Methodology

Benchmark Design

SLIDE 12

CPU

  • sysbench/cpu-single-thread
  • sysbench/cpu-multi-thread
  • stressng/cpu-callfunc
  • stressng/cpu-double
  • stressng/cpu-euler
  • stressng/cpu-fft
  • stressng/cpu-fibonacci
  • stressng/cpu-int64
  • stressng/cpu-loop
  • stressng/cpu-matrixprod

Memory

  • sysbench/memory-4k-block-size
  • sysbench/memory-1m-block-size

Micro benchmarks (CPU, Memory, I/O, Network): broad resource coverage combined with specific resource testing.

I/O

  • [file I/O] sysbench/fileio-1m-seq-write
  • [file I/O] sysbench/fileio-4k-rand-read
  • [disk I/O] fio/4k-seq-write
  • [disk I/O] fio/8k-rand-read

Network

  • iperf/single-thread-bandwidth
  • iperf/multi-thread-bandwidth
  • stressng/network-epoll
  • stressng/network-icmp
  • stressng/network-sockfd
  • stressng/network-udp

Software (OS)

  • sysbench/mutex
  • sysbench/thread-lock-1
  • sysbench/thread-lock-128
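
To make the scenario naming concrete, here is a minimal sketch of how one of these micro benchmarks could be invoked and its headline metric collected. This is an illustration, not the paper's actual harness; the sysbench 1.0 CLI flags and output format are assumptions.

```python
import re
import subprocess

def run_sysbench_cpu(threads: int, seconds: int = 60) -> float:
    """Run a sysbench CPU scenario and return events per second (higher is better)."""
    result = subprocess.run(
        ["sysbench", "cpu", f"--threads={threads}", f"--time={seconds}", "run"],
        capture_output=True, text=True, check=True,
    )
    # sysbench 1.0 prints a line such as "events per second:  1234.56"
    match = re.search(r"events per second:\s*([\d.]+)", result.stdout)
    if match is None:
        raise RuntimeError("unexpected sysbench output")
    return float(match.group(1))

# sysbench/cpu-single-thread vs. sysbench/cpu-multi-thread from the list above
print("single-thread:", run_sysbench_cpu(threads=1))
print("multi-thread:", run_sysbench_cpu(threads=4))  # e.g., one worker per vCPU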
SLIDE 13

Application Benchmarks

Overall performance (e.g., response time) measured with two applications:

  • Molecular Dynamics Simulation (MDSim)
  • WordPress Benchmark (WPBench): multiple short blogging session scenarios (read, search, comment)

[Chart: WPBench load profile — the number of concurrent threads ramps between 20 and 100 over the roughly 8-minute run]
SLIDE 14

Methodology

Benchmark Design → Benchmark Execution

A Cloud Benchmark Suite Combining Micro and Applications Benchmarks

QUDOS@ICPE’18, Scheuner and Leitner

SLIDE 15

Execution Methodology

Randomized Multiple Interleaved Trials (RMIT) [1]: every trial executes all benchmarks, each trial in a freshly randomized order, e.g.:

Trial 1: B A C
Trial 2: C B A
Trial 3: A C B

30 benchmark scenarios × 3 trials ≈ 2–3 h runtime

[1] A. Abedi and T. Brecht. Conducting repeatable experiments in highly variable cloud computing environments. ICPE '17
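
A minimal sketch of how an RMIT execution plan can be generated (an illustration of the scheme from [1], not the actual tooling; benchmark names are placeholders):

```python
import random

def rmit_plan(benchmarks: list[str], trials: int, seed: int = 1) -> list[list[str]]:
    """Randomized Multiple Interleaved Trials: every trial executes all
    benchmarks once, and each trial uses an independently shuffled order."""
    rng = random.Random(seed)  # fixed seed makes the plan itself reproducible
    plan = []
    for _ in range(trials):
        order = benchmarks[:]
        rng.shuffle(order)
        plan.append(order)
    return plan

for i, trial in enumerate(rmit_plan(["A", "B", "C"], trials=3), start=1):
    print(f"Trial {i}: {' '.join(trial)}")
```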

SLIDE 16

Benchmark Manager

Cloud WorkBench (CWB): a tool for scheduling cloud experiments. GitHub: sealuzh/cloud-workbench

Cloud WorkBench – Infrastructure-as-Code Based Cloud Benchmarking

CloudCom’14, Scheuner, Leitner, Cito, and Gall

Cloud WorkBench: Benchmarking IaaS Providers based on Infrastructure-as-Code

Demo@WWW’15, Scheuner, Cito, Leitner, and Gall

SLIDE 17

Methodology

Benchmark Design → Benchmark Execution → Data Pre-Processing → Data Analysis

[Thumbnail: RSD-per-configuration chart; shown in full under the PRE results below]

A Cloud Benchmark Suite Combining Micro and Applications Benchmarks

QUDOS@ICPE’18, Scheuner and Leitner

Estimating Cloud Application Performance Based on Micro Benchmark Profiling

CLOUD’18, Scheuner and Leitner

SLIDE 18

Performance Data Set

>240 Virtual Machines (VMs) × 3 iterations ≈ 750 VM hours; >60,000 measurements (258 per instance). The PRE study covers m1.small and m3.medium in the eu + us regions and m3.large in eu; RQ1+2 cover all instance types below.

| Instance Type | vCPUs | ECU* | RAM [GiB] | Virtualization | Network Performance |
| m1.small   | 1 | 1   | 1.7  | PV     | Low      |
| m1.medium  | 1 | 2   | 3.75 | PV     | Moderate |
| m3.medium  | 1 | 3   | 3.75 | PV/HVM | Moderate |
| m1.large   | 2 | 4   | 7.5  | PV     | Moderate |
| m3.large   | 2 | 6.5 | 7.5  | HVM    | Moderate |
| m4.large   | 2 | 6.5 | 8.0  | HVM    | Moderate |
| c3.large   | 2 | 7   | 3.75 | HVM    | Moderate |
| c4.large   | 2 | 8   | 3.75 | HVM    | Moderate |
| c3.xlarge  | 4 | 14  | 7.5  | HVM    | Moderate |
| c4.xlarge  | 4 | 16  | 7.5  | HVM    | High     |
| c1.xlarge  | 8 | 20  | 7    | PV     | High     |

* ECU := Elastic Compute Unit (i.e., Amazon's metric for CPU performance)

SLIDE 19

PRE – Performance Variability: Does the performance of equally configured cloud instances vary relevantly? Results:

[Boxplots: relative standard deviation (RSD) [%] of the benchmark metrics (threads, latency, file I/O random, network, file I/O sequential) per configuration — m1.small (eu), m1.small (us), m3.medium (eu), m3.medium (us), m3.large (eu). Mean RSDs: 4.41%, 4.3%, 3.16%, 3.32%, 4.14%; two outliers at 54% and 56%.]
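
RSD here is presumably the usual coefficient of variation expressed as a percentage, computed per benchmark metric over the equally configured instances:

$$ \mathrm{RSD} = \frac{\sigma}{\mu} \times 100\,\% $$

where $\sigma$ is the standard deviation and $\mu$ the mean of the metric across all instances of one configuration.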

SLIDE 20

RQ1 – Estimation Accuracy: How accurately can a set of micro benchmarks estimate application performance? Approach:

On each of the 12 instance types (m1.small through c1.xlarge), measure the micro benchmark metrics micro1, micro2, …, microN and the application metrics app1, app2. Then fit a linear regression model that estimates an application metric (e.g., app1) from a micro benchmark metric (e.g., micro1), using forward feature selection to optimize the relative error.
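
A compact sketch of this modeling step, assuming ordinary least squares and a greedy forward selection that minimizes leave-one-instance-type-out relative error (the exact validation protocol is an assumption; the data arrays are placeholders, not the paper's data):

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares linear regression with an intercept term."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def relative_error(y_true, y_pred):
    """Mean relative error in percent."""
    return float(np.mean(np.abs(y_pred - y_true) / y_true) * 100)

def forward_select(X, y, max_features=3):
    """Greedily add the micro benchmark (column of X) that most reduces
    the leave-one-instance-type-out relative error."""
    selected = []
    for _ in range(max_features):
        best_j, best_err = None, float("inf")
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            preds = np.array([
                fit_predict(np.delete(X[:, cols], i, axis=0), np.delete(y, i),
                            X[i:i + 1, cols])[0]
                for i in range(len(y))  # hold out one instance type at a time
            ])
            err = relative_error(y, preds)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected

# X: one row per instance type (12 rows), one column per micro benchmark metric;
# y: the application metric (e.g., WPBench read response time) per instance type.
```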

SLIDE 21

RQ1 – Estimation Accuracy: How accurately can a set of micro benchmarks estimate application performance? Results:

[Scatter plot: WPBench read response time [ms] (25–100) vs. Sysbench CPU multi-thread duration [s] (1000–2000) for all 12 instance types — m1.small, m3.medium (pv), m3.medium (hvm), m1.medium, m3.large, m1.large, c3.large, m4.large, c4.large, c3.xlarge, c4.xlarge, c1.xlarge — split into train and test groups, with the fitted regression line.]

Relative Error (RE) = 12.5%, R² = 99.2%
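
The reported relative error is presumably the standard percentage deviation of the estimate $\hat{y}$ from the measurement $y$:

$$ \mathrm{RE} = \frac{|\hat{y} - y|}{y} \times 100\,\% $$

so RE = 12.5% means the estimated WPBench read response time deviates from the measured one by 12.5% on average.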

SLIDE 22

RQ2 – Micro Benchmark Selection: Which subset of micro benchmarks estimates application performance most accurately? Results:

| Predictor | Relative Error [%] |
| Sysbench – CPU Multi Thread (micro benchmark)  | 12  |
| Sysbench – CPU Single Thread (micro benchmark) | 454 |
| vCPUs (baseline)                               | 616 |
| ECU (baseline; Amazon's metric for CPU performance) | 359 |
| Cost (baseline)                                | 663 |
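
For illustration, applying the winning one-feature model amounts to a single linear formula; the coefficients below are hypothetical placeholders, not the paper's fitted values:

```python
# Hypothetical fitted coefficients of the one-feature linear model
intercept, slope = 5.0, 0.045   # illustrative values only

# Measured on a candidate instance type: sysbench CPU multi-thread duration [s]
sysbench_multi_duration = 800.0

# Estimated application metric: WPBench read response time [ms]
estimate = intercept + slope * sysbench_multi_duration
print(f"estimated response time: {estimate:.1f} ms")
```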

SLIDE 23

RQ – Implications

  • Selected micro benchmarks are suitable for estimating application performance
  • Benchmarks cannot be used interchangeably → configuration is important
  • The baseline metrics vCPU and ECU are insufficient

SLIDE 24

Threats to Validity

Construct Validity

Almost 100% of benchmarking reports are wrong, because benchmarking is "very, very error-prone"¹ [senior performance architect @ Netflix]

→ Guidelines, rationalization, open source

¹ https://www.youtube.com/watch?v=vm1GJMp0QN4&feature=youtu.be&t=18m29s

Internal Validity

The extent to which cloud environmental factors, such as multi-tenancy, evolving infrastructure, or dynamic resource limits, affect the performance level of a VM instance.

→ Variability quantified in the PRE study; interfering processes stopped

External Validity (Generalizability)

Other cloud providers? Larger instance types? Other application domains?

→ Future work

Reproducibility

The extent to which the methodology and analysis are repeatable at any time for anyone and thereby lead to the same conclusions — challenging in a dynamic cloud environment.

→ Fully automated execution, open source

SLIDE 25

Related Work

Application Performance Prediction / Application Performance Profiling:

  • System-level resource monitoring [1,2]
  • Compiler-level program similarity [3]
  • Trace and replay with CloudProphet [4,5]
  • Bayesian cloud configuration refinement for big data analytics [6]

[1] Athanasia Evangelinou, Michele Ciavotta, Danilo Ardagna, Aliki Kopaneli, George Kousiouris, and Theodora Varvarigou. Enterprise applications cloud rightsizing through a joint benchmarking and optimization approach. Future Generation Computer Systems, 2016.
[2] Mauro Canuto, Raimon Bosch, Mario Macias, and Jordi Guitart. A methodology for full-system power modeling in heterogeneous data centers. In Proceedings of the 9th International Conference on Utility and Cloud Computing (UCC '16), 2016.
[3] Kenneth Hoste, Aashish Phansalkar, Lieven Eeckhout, Andy Georges, Lizy K. John, and Koen De Bosschere. Performance prediction based on inherent program similarity. In PACT '06, 2006.
[4] Ang Li, Xuanran Zong, Ming Zhang, Srikanth Kandula, and Xiaowei Yang. CloudProphet: predicting web application performance in the cloud. ACM SIGCOMM Poster, 2011.
[5] Ang Li, Xuanran Zong, Srikanth Kandula, Xiaowei Yang, and Ming Zhang. CloudProphet: towards application performance prediction in cloud. In Proceedings of the ACM SIGCOMM 2011 Conference (SIGCOMM '11), 2011.
[6] Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. CherryPick: adaptively unearthing the best cloud configurations for big data analytics. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17), 2017.

SLIDE 26

Conclusion

Methodology: Benchmark Design → Benchmark Execution → Data Pre-Processing → Data Analysis

A Cloud Benchmark Suite Combining Micro and Applications Benchmarks. QUDOS@ICPE'18, Scheuner and Leitner

Estimating Cloud Application Performance Based on Micro Benchmark Profiling. CLOUD'18, Scheuner and Leitner

Implications: the selected micro benchmarks are suitable for estimating application performance; benchmarks cannot be used interchangeably → configuration is important; the baseline metrics vCPU and ECU are insufficient.

Contact: scheuner@chalmers.se · @joe4dev

[Recap of the key results and motivation: RQ1 — the linear model over Sysbench CPU multi-thread duration estimates WPBench read response time with RE = 12.5% and R² = 99.2% across the 12 instance types. RQ2 — the best micro benchmark (Sysbench CPU Multi Thread, 12% relative error) far outperforms the baselines vCPUs (616%), ECU (359%), and cost (663%). Motivation — the number of EC2 instance types grew from a handful in 2006 to well over 100 in 2017, spanning t2.nano (0.05–1 vCPU, 0.5 GB RAM, $0.006/h) to x1e.32xlarge (128 vCPUs, 3904 GB RAM, $26.688/h) → impractical to test all instance types.]