Towards Understanding the Workload of a IaaS Cloud (Loïc Perennou)




SLIDE 1

Towards Understanding the Workload of a IaaS Cloud

Loïc Perennou

Outscale, ISEP loic.perennou@outscale.com

September 13, 2018

SLIDE 2

Outline

◮ Introduction
◮ Data Collection
◮ Comparison of Outscale’s and Azure’s Workloads
◮ Relationship Between Tags and CPU Utilization

SLIDE 4

Outscale

◮ Founded in 2010, acquired by Dassault Systèmes in 2017.
◮ Provides virtualized hardware such as VMs, and services to manage them.
◮ Develops its own orchestrator, TINA OS, compatible with Amazon EC2.

SLIDE 5

Motivations

◮ We need to make resource allocation fit utilization.
◮ Utilization is unknown when a VM starts, but could be predicted by ML.
◮ Data must be available to propose and test models.

SLIDE 6

Related Cloud Workload Traces

Organization     Google      Eucalyptus Sys.   Bitbrains    Azure
Year             2011        2014              2015         2017
# jobs/VMs       0.7M jobs   9,173 VMs         1,750 VMs    2M VMs
Resource usage   no          no                yes          yes
Starts/stops     yes         yes               no           yes
Reference        [1, 2]      [5]               [3]          [4]

◮ Problem: We are not sure if Outscale’s workload is similar to Azure’s.

SLIDE 8

Overview

[Figure: data collection architecture. Component labels: user, API, TINA orchestrator, syslog, database, infrastructure, operating system, hardware, VM1, VM2, probe, server. Arrow labels: calls, manages, logs, reads counters, sends.]

2 data sources:

◮ Logs of user actions from TINA OS
◮ Measurements of hardware utilization of Virtual Machines

SLIDE 9

Descriptive Statistics

◮ 4 months
◮ 700 000 VMs in total
◮ 10 000 VMs running simultaneously
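The slides do not say how the number of simultaneously running VMs was obtained. As a minimal sketch, assuming each VM is described by a hypothetical (start, stop) pair taken from the start/stop logs, the peak concurrency can be computed with a sweep over sorted events:

```python
def peak_concurrency(intervals):
    """Return the maximum number of VMs running at the same time.

    `intervals` is a list of (start, stop) pairs in any consistent time unit.
    This input format is an assumption, not the format of the TINA OS logs.
    """
    events = []
    for start, stop in intervals:
        events.append((start, 1))    # a VM starts
        events.append((stop, -1))    # a VM stops
    # Sort by time; at equal times, stops (-1) are processed before starts (+1)
    events.sort(key=lambda e: (e[0], e[1]))
    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Toy data: four VMs with overlapping lifetimes
vms = [(0, 10), (5, 15), (10, 20), (12, 13)]
print(peak_concurrency(vms))
```

Processing stops before starts at identical timestamps means a VM that stops exactly when another starts is not counted as overlapping; the opposite convention is equally defensible.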

SLIDE 11

Distribution of Resources Requested by VMs

[Figure: stacked bars showing the % of VMs for OSC client, OSC internal, OSC all, and Microsoft. Panel (a) cores requested, with categories 1, 2, 4, 8, 16, 20, 24, 36. Panel (b) RAM requested, with ranges [0;2[, [2;4[, [4;8[, [8;16[, [16;32[, [32;inf[.]

◮ Internal accounts at Outscale launch small VMs (tests).
◮ Clients request bigger VMs at Outscale than at Microsoft.
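A distribution like the one in panel (b) can be sketched by assigning each requested RAM size to a half-open range and computing the share of VMs per range. The range edges come from the figure legend; the function names and the toy input are hypothetical, and the unit is whatever the original figure uses.

```python
from bisect import bisect_right
from collections import Counter

# Range boundaries taken from the figure legend: [0;2[, [2;4[, ..., [32;inf[
RAM_EDGES = [2, 4, 8, 16, 32]
RAM_LABELS = ["[0;2[", "[2;4[", "[4;8[", "[8;16[", "[16;32[", "[32;inf["]


def ram_bucket(ram):
    """Return the label of the half-open range that contains `ram`."""
    return RAM_LABELS[bisect_right(RAM_EDGES, ram)]


def bucket_shares(ram_requests):
    """Return the percentage of VMs falling in each RAM range."""
    counts = Counter(ram_bucket(r) for r in ram_requests)
    total = len(ram_requests)
    return {label: 100 * counts.get(label, 0) / total for label in RAM_LABELS}


# Toy data, not taken from the Outscale or Azure traces
shares = bucket_shares([1, 2, 3, 64])
print(shares)
```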

SLIDE 12

Distribution of Runtime

[Figure: CDF of VM runtime, P{runtime < x} = y (in %), against runtime in minutes on a logarithmic axis from 10^0 to 10^5, for Outscale clients and Microsoft.]

◮ The runtime of 65% of VMs is < 1 h.
◮ VMs created by Outscale clients run slightly longer than those at Microsoft.
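The quantity plotted, P{runtime < x}, is an empirical CDF. A minimal sketch of how one point of that curve can be computed, assuming a plain list of runtimes in minutes (the values below are invented for illustration):

```python
from bisect import bisect_left


def empirical_cdf(runtimes, x):
    """Fraction of VMs whose runtime is strictly below x, i.e. P{runtime < x}."""
    ordered = sorted(runtimes)
    # bisect_left returns the number of elements strictly smaller than x
    return bisect_left(ordered, x) / len(ordered)


# Toy runtimes in minutes, not taken from the traces
runtimes_min = [2, 5, 30, 45, 59, 90, 600, 1440, 10000, 43200]
share_under_1h = empirical_cdf(runtimes_min, 60)
print(f"{share_under_1h:.0%} of VMs run for less than 1 h")
```

Evaluating the function on a logarithmic grid of x values gives the full curve.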

SLIDE 13

VM Start Rate

[Figure: number of VMs started (smoothed) against hour of the week (axis ticks from 20 to 160), for Outscale clients and Microsoft.]

◮ 2 peaks/day at Outscale, 1 at Microsoft.
◮ Less activity at Outscale on weekends.
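A start-rate profile of this kind can be sketched by mapping each VM start timestamp to an hour of the week and counting. The sketch assumes that hour 0 is Monday 00:00, which the slide does not state, and it omits the smoothing step mentioned on the axis label.

```python
from collections import Counter
from datetime import datetime


def hour_of_week(ts):
    """Map a datetime to 0..167, where 0 is Monday 00:00-00:59 (assumed origin)."""
    return ts.weekday() * 24 + ts.hour


def starts_per_hour_of_week(start_times):
    """Count VM starts in each of the 168 hours of the week."""
    counts = Counter(hour_of_week(ts) for ts in start_times)
    return [counts.get(h, 0) for h in range(168)]


# Toy timestamps: two starts on a Monday morning, one on a Saturday afternoon
starts = [
    datetime(2018, 9, 10, 9, 15),
    datetime(2018, 9, 10, 9, 40),
    datetime(2018, 9, 15, 14, 0),
]
profile = starts_per_hour_of_week(starts)
print(profile[9], profile[134])
```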

SLIDE 14

Relationship Between Start Time and Runtime

◮ Daily creation of VMs from Monday to Friday.
◮ VMs created on Friday run during the whole weekend.

SLIDE 15

Conclusion on Workload Comparison

◮ Bigger requests, longer runtimes at Outscale.
◮ Relatively more activity during the week, less on weekends.
◮ Activity patterns exist, at least for some users.

SLIDE 17

Definition of Tags

Freely-typed string that describes a VM.

◮ Example (ideal): “Release 2.4 of Kafka used in production”.
◮ Example (real): “EV6MTNDBLU FUn3xlIATTiOAoDJYIeYGA MT Database2 0 420403n2q”.

SLIDE 18

Methodology

◮ Group VMs according to their tags (clustering).
◮ Visualize the CPU utilization of VMs within each cluster.
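The second step can be approximated numerically before any plotting. The sketch below assumes that cluster labels are already available and that each VM has a single hypothetical CPU utilization value in percent; it reports the mean and spread per cluster, so that a cluster with a large spread signals that tags alone do not determine utilization.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cpu_summary_by_cluster(labels, cpu_utilization):
    """Return {cluster: (mean, population std dev)} of CPU utilization."""
    by_cluster = defaultdict(list)
    for label, cpu in zip(labels, cpu_utilization):
        by_cluster[label].append(cpu)
    return {c: (mean(values), pstdev(values)) for c, values in by_cluster.items()}


# Toy data: one homogeneous cluster and one heterogeneous cluster
labels = ["A", "A", "A", "B", "B", "B"]
cpu = [2.0, 3.0, 4.0, 5.0, 50.0, 95.0]
summary = cpu_summary_by_cluster(labels, cpu)
for cluster, (avg, spread) in sorted(summary.items()):
    print(cluster, round(avg, 1), round(spread, 1))
```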

SLIDE 19

Convert Text Tags to Vectors for Clustering

Figure: Dictionary Vectorization
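The figure itself is not available here. One common reading of dictionary vectorization is a bag-of-words encoding: build a dictionary of all tokens seen in the tags, then represent each tag by its token counts. The tokenization rule below (lowercase, split on non-alphanumeric characters) is an assumption, as are all function names.

```python
import re


def tokenize(tag):
    """Lowercase a tag and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", tag.lower()) if tok]


def build_dictionary(tags):
    """Map every distinct token to a column index (alphabetical order)."""
    vocabulary = sorted({tok for tag in tags for tok in tokenize(tag)})
    return {tok: index for index, tok in enumerate(vocabulary)}


def vectorize(tag, dictionary):
    """Return the token-count vector of a tag; unknown tokens are ignored."""
    vector = [0] * len(dictionary)
    for tok in tokenize(tag):
        if tok in dictionary:
            vector[dictionary[tok]] += 1
    return vector


# Toy tags, not taken from the Outscale data
tags = ["kafka prod", "kafka test", "database prod"]
dictionary = build_dictionary(tags)
vectors = [vectorize(tag, dictionary) for tag in tags]
print(dictionary)
print(vectors)
```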

SLIDE 20

Hierarchical Clustering

◮ At the beginning, there is 1 group per vector.
◮ The two closest groups are merged (based on the distance between their elements).
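The two steps above can be sketched as a small agglomerative loop. The slide does not specify the distance or the linkage rule, so this sketch assumes Euclidean distance and average linkage, and it stops at a hypothetical target number of groups.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(g1, g2, vectors):
    """Mean pairwise distance between the elements of two groups."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in g1 for j in g2)
    return total / (len(g1) * len(g2))


def agglomerate(vectors, n_groups):
    """Merge the two closest groups until only n_groups remain."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups = [[i] for i in range(len(vectors))]  # 1 group per vector
    while len(groups) > n_groups:
        a, b = min(
            combinations(range(len(groups)), 2),
            key=lambda p: average_linkage(groups[p[0]], groups[p[1]], vectors),
        )
        merged = groups[a] + groups[b]
        groups = [g for k, g in enumerate(groups) if k not in (a, b)] + [merged]
    return groups


# Toy vectors: indices 0 and 1 are close, as are indices 2 and 3
vectors = [[0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1]]
groups = agglomerate(vectors, 2)
print(sorted(sorted(g) for g in groups))
```

This naive version recomputes all pairwise group distances at every merge, which is fine for illustration but too slow for hundreds of thousands of tags.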

SLIDE 21

Visualization of the CPU utilization of tag groups

Figure: group A

Low utilization for every VM

SLIDE 22

Visualization of the CPU utilization of tag groups

Figure: group B

Tags alone fail to explain the variance.

SLIDE 23

Conclusion

◮ Resource allocation of VMs needs to be based on predicted utilization.
◮ Predictive models need data to be trained and tested.
◮ Outscale’s data differs from Azure’s, which justifies looking for our own models.
◮ Text information (tags) could provide interesting features (ongoing work).

SLIDE 24

References I

[1] J. Wilkes, “More Google cluster data.” Google research blog, Nov. 2011. Posted at http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html.

[2] C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch, “Heterogeneity and dynamicity of clouds at scale: Google trace analysis,” in Proceedings of the Third ACM Symposium on Cloud Computing, SoCC ’12, (New York, NY, USA), pp. 7:1–7:13, ACM, 2012.

[3] S. Shen, V. v. Beek, and A. Iosup, “Statistical characterization of business-critical workloads hosted in cloud datacenters,” in 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 465–474, May 2015.

SLIDE 25

References II

[4] E. Cortez, A. Bonde, A. Muzio, M. Russinovich, M. Fontoura, and R. Bianchini, “Resource central: Understanding and predicting workloads for improved resource management in large cloud platforms,” in Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17, (New York, NY, USA), pp. 153–167, ACM, 2017.

[5] R. Wolski and J. Brevik, “Using parametric models to represent private cloud workloads,” IEEE Transactions on Services Computing, vol. 7, pp. 714–725, Oct 2014.
