

SLIDE 1

December 6th, 2013 - DEIB - PoliMi 1

Automatic clustering of similar VM to improve the scalability of monitoring and management in IaaS cloud infrastructures

  • C. Canali
  • R. Lancellotti

University of Modena and Reggio Emilia Department of Engineering “Enzo Ferrari”

SLIDE 2

WEBLab

  • WEBLab: Web Engineering and Benchmarking Lab
  • Contributing to
    – DIEF - Department of Engineering “Enzo Ferrari” (not only automotive)
    – CRIS - Research Center on Security
  • Research interests
    – Distributed systems
    – Cloud computing
    – Performance / scalability issues
    – Monitoring in distributed systems
    – Security in networked / cloud systems
    – ...

SLIDE 3

Agenda

  • Background and motivation
    – IaaS Cloud
    – Reference scenario
    – Traditional approach vs. clustering
    – Impact on monitoring and management
  • Clustering based on metric correlation
    – Theoretical model(s)
    – Experimental evaluation
  • Clustering based on Bhattacharyya distance
    – Theoretical model(s)
    – Experimental evaluation
  • Conclusion and future work
SLIDE 4

Cloud computing

SLIDE 5

Cloud computing

  • Cloud computing AKA utility computing
  • Access to resources and services:
    – Multiple customers → same provider
    – Leveraging economies of scale
    – No initial cost (pay per use)
    – Exploit virtualization technologies
  • Multiple cloud paradigms:
    – SaaS
    – PaaS
    – IaaS

NOTE: We may still have long-term commitments (e.g. reserved instances)

SLIDE 6

Challenges: monitoring

  • Large data centers (> 10^5 VMs) → huge amount of data
  • Multiple data centers → geographic data exchange
  • VM can be anything → treat VMs as black boxes
  • → Scalability issues

SLIDE 7

Challenges: monitoring

  • Current approach → reduce the amount of data in a uniform way:
    – Reduce sampling frequency
    – Reduce number of metrics considered
  • → Reduced monitoring effectiveness
    – Less information available to take management decisions

SLIDE 8

Challenges: management

  • Large data centers → large optimization problems
    – Too many variables
    – Too many bounds
    – Like a huge multi-dimensional Tetris
  • VM can be anything → treat VMs as black boxes → difficult search for complementary workloads
  • → Scalability issues

SLIDE 9

Challenges: management

  • Current approach → reduce the number of bounds:
    – Assume VM resource utilization is constant over long periods (e.g. day/night)
    – Reduce number of metrics considered
    – Consider only nominal resource utilization
    – Rely on hierarchical management
  • → Reduced management effectiveness
    – No support for fine-grained management
    – Sub-optimal management decisions

SLIDE 10


Exploiting VM similarity

  • No information on VM behavior is used to improve scalability
  • Proposal: automatically cluster VMs with similar behavior
  • Requirements:
    – No human intervention
    – No models for VM classes
    – No crystal ball

SLIDE 11

Improving monitoring scalability

  • Group similar VMs together
  • Elect a few (e.g., 3) cluster representatives
    – Support for Byzantine failures in representatives
  • Detailed monitoring of cluster representatives
  • Reduced monitoring of other VMs

SLIDE 12

Improving monitoring scalability

  • Numeric example
  • Every VM as a black box:
    – 1000 VMs, K metrics, 1 sample/5 min
    → 288·10^3·K samples/day
  • With clustering:
    – 15 clusters, ~67 VMs per cluster
    – 3 representatives per cluster
    → 45 VMs, K metrics, 1 sample/5 min
    – Non-representatives:
    → 955 VMs, K metrics, 1 sample/6 hours
    → in total 16.8·10^3·K samples/day
  • Data collected reduced by about 17:1
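The arithmetic above can be reproduced in a few lines. A minimal pure-Python sketch (function and constant names are illustrative, not from the slides):

```python
# Data-volume arithmetic for the numeric example above.
# K metrics per VM: counts below are per metric (multiply by K).

SAMPLES_PER_DAY_FAST = 24 * 60 // 5   # 1 sample / 5 min -> 288 samples/day
SAMPLES_PER_DAY_SLOW = 24 // 6        # 1 sample / 6 h   -> 4 samples/day

def samples_black_box(n_vms):
    """Every VM monitored in detail."""
    return n_vms * SAMPLES_PER_DAY_FAST

def samples_clustered(n_vms, n_clusters, reps_per_cluster):
    """Detailed monitoring only for the cluster representatives."""
    reps = n_clusters * reps_per_cluster
    return reps * SAMPLES_PER_DAY_FAST + (n_vms - reps) * SAMPLES_PER_DAY_SLOW

full = samples_black_box(1000)            # 288000 (x K)
reduced = samples_clustered(1000, 15, 3)  # 16780  (x K)
ratio = full / reduced                    # roughly 17:1
```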
SLIDE 13

Improving management scalability

  • Server placement and consolidation
  • Build a small consolidation solution
  • Replicate solution as a building block

[Diagram: global problem → building block solution → residual problem solution]

SLIDE 14

Reference scenario

  • IaaS with long-term commitment
    – Amazon Reserved Instances, private cloud
  • Reactive VM relocation
    – Local manager
  • Periodic global consolidation
    – Global optimization

SLIDE 15

Proposed methodology

  • Methodology:
    – Define a quantitative model for VM behavior
    – Cluster similar VMs together
  • Elect a few (e.g., 3) cluster representatives
  • Fine-grained monitoring of cluster representatives
  • Reduced monitoring applied to other VMs
    – Reduced number of metrics
    – Lower sampling frequency

[Diagram: Data samples (time series) → Extract quantitative model of VM behavior → Quantitative model → Clustering → Clustering solution]

SLIDE 16

Design choices

  • How to represent VM behavior?
  • Use correlation between metrics
    – Possible enhancement: use PCA
  • Use probability distribution of metrics
    – Use histograms & Bhattacharyya distance
    – May need to select which information is “useful”
    – Must merge heterogeneous information from multiple metrics
    – May exploit ensemble techniques to provide robust performance
    – Possible enhancement: use histogram smoothing

SLIDE 17

Design choices

  • How to perform clustering?
  • Use k-means
    – When VM behavior is represented as a feature vector
  • Use spectral clustering
    – When VM behavior can be used to compute a distance/similarity between VMs

SLIDE 18

Agenda

  • Background and motivation
    – IaaS Cloud
    – Reference scenario
    – Traditional approach vs. clustering
    – Impact on monitoring and management
  • Clustering based on metric correlation
    – Theoretical model(s)
    – Experimental evaluation
  • Clustering based on Bhattacharyya distance
    – Theoretical model(s)
    – Experimental evaluation
  • Conclusion and future work
SLIDE 19

Theoretical model

  • Extraction of a quantitative model of VM behavior
    – Input: time series of metrics describing the behavior of VM n (X1, ..., Xm)
    – Compute the correlation matrix S_n for each VM n
    – Output: feature vectors V_n

NOTE: We exploit the symmetry of matrix S_n to remove redundant information
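The model-extraction step can be sketched in pure Python: compute pairwise Pearson correlations among the m metric time series of one VM and keep only the upper triangle of S_n, since its symmetry makes the lower half redundant. Names and sample data are illustrative:

```python
# Build a per-VM feature vector from the upper triangle of the
# metric-correlation matrix S_n (pure Python, no external deps).
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def feature_vector(series):
    """series: list of m time series -> upper-triangle correlations."""
    m = len(series)
    return [pearson(series[i], series[j])
            for i in range(m) for j in range(i + 1, m)]

# Example: 3 metrics -> feature vector of length 3*(3-1)/2 = 3
cpu  = [1, 2, 3, 4, 5]
net  = [2, 4, 6, 8, 10]   # perfectly correlated with cpu
disk = [5, 4, 3, 2, 1]    # anti-correlated with cpu
v = feature_vector([cpu, net, disk])   # approx [1.0, -1.0, -1.0]
```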

SLIDE 20

Theoretical model

  • Clustering of VMs
    – Input: feature vectors V_n
    – Clustering based on the k-means algorithm
    – Output: clustering solution
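The clustering step can be sketched as a compact Lloyd's k-means, assuming naive deterministic initialization for reproducibility; a real run would use random restarts, and this is not the slides' own implementation:

```python
# Minimal Lloyd's k-means over feature vectors (pure Python sketch).
def kmeans(points, k, iters=20):
    dims = len(points[0])
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sum(
                (p[d] - centroids[c][d]) ** 2 for d in range(dims)))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(p[d] for p in members) / len(members)
                                for d in range(dims)]
    return labels

# Two obvious groups of correlation-based feature vectors:
vs = [[1.0, 0.9], [0.9, 1.0], [-0.8, -0.9], [-0.9, -1.0]]
labels = kmeans(vs, 2)   # the two groups end up in different clusters
```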

SLIDE 21

Case study

  • Data center supporting an e-health Web application
    – Web servers and DBMS
    – 110 VMs
    – 11 metrics for each VM
    – Sampling frequency: 5 min
  • Goal: separate Web servers and DBMS
    – Main metric: purity of clustering
  • Three types of analyses
    – Impact of time series length
    – Impact of filtering techniques
    – Impact of number of nodes
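Purity, the main quality metric here, credits each cluster with its majority ground-truth class and divides by the total number of VMs. A minimal sketch with illustrative labels:

```python
# Purity of a clustering solution against ground-truth classes.
from collections import Counter

def purity(clusters, truth):
    """clusters, truth: same-length label lists -> purity in (0, 1]."""
    total = 0
    for c in set(clusters):
        members = [t for cl, t in zip(clusters, truth) if cl == c]
        total += Counter(members).most_common(1)[0][1]  # majority class size
    return total / len(truth)

truth    = ["web", "web", "web", "db", "db", "db"]
clusters = [0, 0, 1, 1, 1, 1]     # one web VM misplaced
p = purity(clusters, truth)       # (2 + 3) / 6
```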

SLIDE 22

Impact of time series length

  • Reduction of available data → reduction in the purity of clustering
  • Purity > 0.7 for time series > 20 days

SLIDE 23

Impact of filtering techniques

  • Application of data filtering:
    – Remove idle periods in time series
  • Data filtering improves performance
    – Removal of periods providing limited information
  • Purity > 0.8 even for 5-day time series

SLIDE 24

Impact of number of nodes

  Number of VMs   Purity   Clustering time [s]
       10          1.00          49.7
       30          0.86          59.5
       50          0.84          68.6
       70          0.84          78.0
       90          0.83          88.3
      110          0.84          95.3

  • Purity is not adversely affected by the number of VMs
    – Purity ~ 0.85 for 30-110 VMs

SLIDE 25

Proposed enhancement

  • The clustering time grows
    – Linearly with the number of VMs
    – Quadratically with the number of metrics
  • → Potential scalability issue
  • Can we reduce the number of metrics?
  • Can we reduce the quadratic relationship?
SLIDE 26

Proposed enhancement

  • The clustering time grows
    – Linearly with the number of VMs
    – Quadratically with the number of metrics
  • → Potential scalability issue
  • Can we reduce the number of metrics?
    → NO: clustering purity is heavily affected
  • Can we reduce the quadratic relationship?
    → YES: we can exploit PCA techniques

SLIDE 27

Reducing number of metrics

SLIDE 28

PCA-based technique

SLIDE 29

PCA-based technique

  • Building the feature vector:
SLIDE 30

How many principal components?

  • Use of a scree plot
  • 1 component captures ~60% of variance
  • → good enough for us
SLIDE 31

Performance evaluation

SLIDE 32

Performance evaluation

SLIDE 33

Agenda

  • Background and motivation
    – IaaS Cloud
    – Reference scenario
    – Traditional approach vs. clustering
    – Impact on monitoring and management
  • Clustering based on metric correlation
    – Theoretical model(s)
    – Experimental evaluation
  • Clustering based on Bhattacharyya distance
    – Theoretical model(s)
    – Experimental evaluation
  • Conclusion and future work
SLIDE 34

Modeling VM behavior

  • Model based on the probability distribution of resource usage
    – Multiple resources (metrics) considered
  • Histogram for every metric, every VM
    – Normalized histogram (∑h = 1)
    – B: number of buckets (critical)

[Diagram: Data samples → Histograms (VM behavior) → Similarity (distance matrix) → Clustering → Clustering solution]

SLIDE 35

Defining VM similarity

  • Use of the Bhattacharyya distance
    – Determine a distance matrix for each couple of VMs, for each metric m:

  D_m(n1, n2) = −ln( ∑_i √( h_{n1,i} · h_{n2,i} ) )
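The per-metric distance D_m translates directly into pure Python, assuming the two normalized histograms share the same B buckets:

```python
# Bhattacharyya distance between two normalized histograms.
from math import log, sqrt

def bhattacharyya(h1, h2):
    """h1, h2: normalized histograms (each sums to 1) over the same buckets."""
    bc = sum(sqrt(a * b) for a, b in zip(h1, h2))  # Bhattacharyya coefficient
    return -log(bc)

identical = bhattacharyya([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])   # -> 0.0
far       = bhattacharyya([0.9, 0.1, 0.0], [0.0, 0.1, 0.9])   # large distance
```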

SLIDE 36

Merging multi-metric information

  • For each metric we have a different

distance information

  • How to merge the contribution of

each metric?

  • Two solutions:

– Euclidean distance merging – Solve separate clustering problems

and merge clustering solutions (clustering ensemble)

VM behavior Hist Data samp. Similarity

  • Dist. Mat.

Clustering Clust. solution

SLIDE 37

Merging multi-metric information

[Diagram: two merging strategies, Euclidean distance merging vs. clustering ensemble]

SLIDE 38

Euclidean distance merging

  • For each couple of VMs n1, n2:

  D(n1, n2) = √( ∑_m a_m · D_m(n1, n2)² )

  • Open problems:
    – Is it correct to consider every metric together?
    – Is there a way to select the right metrics?
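The merging formula can be sketched with the weights a_m exposed, so that setting a_m = 0 drops a metric and a_m = 1 keeps it; names are illustrative:

```python
# Euclidean merging of per-metric Bhattacharyya distances for one VM pair.
from math import sqrt

def merge_distances(per_metric, weights=None):
    """per_metric: list of D_m values; weights: optional a_m coefficients."""
    if weights is None:
        weights = [1.0] * len(per_metric)
    return sqrt(sum(a * d * d for a, d in zip(weights, per_metric)))

d = merge_distances([0.3, 0.4])              # sqrt(0.09 + 0.16) = 0.5
d_sel = merge_distances([0.3, 0.4], [1, 0])  # metric 2 dropped -> 0.3
```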

SLIDE 39

Choosing the right metrics

  • With Euclidean merging, multiple metrics determine the final distance matrix
  • Not every metric provides significant information
  • Proposal to identify relevant metrics:
    – Consider autocorrelation: ACF decreasing rapidly → random variations
    – Consider the coefficient of variation (CV):
      CV » 1 → spiky and noisy behavior
      CV « 1 → little information provided
  • → Merge information from metrics with
    – ACF decreasing slowly
    – CV ~ 1
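The two selection statistics can be sketched as follows; these are the standard definitions rather than the slides' exact code, and the cutoffs (CV around 1, slowly decaying ACF) remain a tuning choice:

```python
# Coefficient of variation and lag-k autocorrelation of a time series.
from math import sqrt

def cv(x):
    """Coefficient of variation: std / mean (population std)."""
    n = len(x)
    mean = sum(x) / n
    std = sqrt(sum((v - mean) ** 2 for v in x) / n)
    return std / mean

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return num / var

flat  = [10, 10, 10, 10, 10, 10]   # CV = 0: little information, discard
trend = [1, 2, 3, 4, 5, 6, 7, 8]   # ACF decays slowly: keep this metric
cv_flat = cv(flat)
acf_trend = acf(trend, 1)
```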

SLIDE 40

Clustering ensemble

  • Two-step process
  • For every metric m:
    – Compute the Bhattacharyya distance matrix
    – Compute a clustering solution
  • Compute the co-occurrence matrix A
    – For each couple of VMs, count the number of times they are in the same cluster
  • Clustering using matrix A as affinity
  • OK to consider every metric?
    – The quorum-based approach ensures good robustness of results

SLIDE 41

Clustering ensemble: example

Clustering solutions (3 metrics):
         Metric 1   Metric 2   Metric 3
  VM1      CL1        CL2        CL2
  VM2      CL1        CL2        CL1
  VM3      CL2        CL1        CL1
  VM4      CL2        CL1        CL1

Co-occurrence matrix A:
         VM1   VM2   VM3   VM4
  VM1     3     2     0     0
  VM2     2     3     1     1
  VM3     0     1     3     3
  VM4     0     1     3     3
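The co-occurrence step for this example can be sketched in a few lines of pure Python; it reproduces the matrix A from the three per-metric clustering solutions:

```python
# Build the co-occurrence matrix A: A[i][j] counts how many per-metric
# clustering solutions place VM i and VM j in the same cluster.
labels_per_metric = [            # rows: metrics 1-3; columns: VM1-VM4
    ["CL1", "CL1", "CL2", "CL2"],
    ["CL2", "CL2", "CL1", "CL1"],
    ["CL2", "CL1", "CL1", "CL1"],
]

n = 4
A = [[sum(sol[i] == sol[j] for sol in labels_per_metric)
      for j in range(n)] for i in range(n)]
# A[0][1] == 2 (VM1 and VM2 agree on metrics 1 and 2), A[2][3] == 3
```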

SLIDE 42

SLIDE 43

Clustering algorithm

  • Use of a spectral clustering algorithm
    – Input: square, symmetric distance/affinity matrix
    – Output: cluster ID for every VM
  • Additional feature:
    – The number of clusters can be automatically determined through spectral gap analysis

SLIDE 44

Case study

  • IaaS cloud supporting e-health
    – Web servers and DBMS
    – 110 VMs
    – 10 metrics for each VM
    – Sampling frequency: 5 min
    – Euclidean merging of metrics
  • Goal: separate Web servers and DBMS
    – Main metric: purity of clustering
  • Three types of analyses
    – Impact of time series length
    – Impact of metric selection
    – Impact of histogram characteristics

SLIDE 45

Impact of time series length

NOTE: We consider Euclidean distance merging

SLIDE 46

Impact of metric selection

[Chart: clustering purity for single metrics: network I/O, memory paging, number of processes]

SLIDE 47

Impact of metric selection

SLIDE 48

Impact of histogram characteristics

SLIDE 49

Histogram smoothing

  • The Bhattacharyya distance is affected by quantization errors in histograms
    → sensitivity to histogram characteristics
  • Proposal: Gaussian smoothing of histograms before computing the Bhattacharyya distance
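A minimal sketch of the proposed smoothing: convolve each histogram with a small Gaussian kernel (truncated at the histogram edges) and renormalize so the buckets still sum to 1. The kernel width sigma and radius are assumed tuning parameters, not values from the slides:

```python
# Gaussian smoothing of a normalized histogram (pure Python sketch).
from math import exp

def smooth(hist, sigma=1.0, radius=2):
    kernel = [exp(-(k * k) / (2 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    out = []
    for i in range(len(hist)):
        acc = w = 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if 0 <= j < len(hist):          # truncate at the edges
                acc += hist[j] * kernel[k + radius]
                w += kernel[k + radius]
        out.append(acc / w)
    total = sum(out)
    return [v / total for v in out]         # renormalize: sum stays 1

spiky = [0.0, 1.0, 0.0, 0.0, 0.0]
s = smooth(spiky)   # mass spread over neighbouring buckets, still sums to 1
```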

SLIDE 50

Histogram smoothing

  • No smoothing: Bhattacharyya distance 0.496
  • Smoothing: Bhattacharyya distance 0.297
  • → Reduction by 40%

SLIDE 51

Effect of histogram smoothing

SLIDE 52

Clustering Ensemble

  • Overall goal:
    – Reduce sensitivity to histogram characteristics (number of histogram buckets)
    – No need to select significant metrics
    – No smoothing required
  • Potential drawback:
    – Higher computational cost

SLIDE 53

Clustering Ensemble

SLIDE 54

Clustering Ensemble

  • Major stability improvement
  • Almost insensitive to the number of histogram buckets

[Chart: purity vs. number of buckets, clustering ensemble vs. Euclidean merging]

SLIDE 55

Clustering Ensemble

  • Significant performance penalty:
    – 1 clustering for each metric
    – Typically uses more metrics than Euclidean merging

[Chart: clustering time, clustering ensemble vs. Euclidean merging]

SLIDE 56

Agenda

  • Background and motivation
    – IaaS Cloud
    – Reference scenario
    – Traditional approach vs. clustering
    – Impact on monitoring and management
  • Clustering based on metric correlation
    – Theoretical model(s)
    – Experimental evaluation
  • Clustering based on Bhattacharyya distance
    – Theoretical model(s)
    – Experimental evaluation
  • Conclusion and future work
SLIDE 57

Conclusion and future work

  • Scalability in IaaS cloud systems is an open issue
  • Proposal and analysis of multiple methodologies to improve scalability through clustering of similar VMs:
    – Representing VM behavior using correlation
    – Reduction of correlation data with PCA
    – Representing VM behavior with histograms
    – Euclidean merging of distances
    – Metric selection
    – Histogram smoothing
    – Clustering ensemble

SLIDE 58

Conclusion and future work

  • Experimental results are encouraging
    – Can achieve high clustering purity
    – Can provide accurate clustering even with very short time series
    – Can provide stable results
    – Time for clustering is acceptable
  • This is not a crystal ball
    – But it may be a useful tool to improve monitoring and management of cloud data centers

SLIDE 59

Conclusion and future work

  • Future research directions:
    – Evaluate different models for VM behavior
    – Application of clustering to improve scalability of data center management

SLIDE 60

References

  • Claudia Canali, Riccardo Lancellotti, "Automatic Virtual Machine Clustering based on Bhattacharyya Distance for Multi-Cloud Systems", Proc. of 1st International Workshop on Multi-cloud Applications and Federated Clouds (Multi-Cloud'13), Prague, April 2013
  • Claudia Canali, Riccardo Lancellotti, "Automated Clustering of Virtual Machines based on Correlation of Resource Usage", Journal of Communications Software and Systems (JCOMSS), Vol. 8, No. 4, Dec. 2012
  • Claudia Canali, Riccardo Lancellotti, "Automated Clustering of VMs for Scalable Cloud Monitoring and Management", Proc. of 20th International Conference on Software, Telecommunications and Computer Networks (SOFTCOM'12), Split, Croatia, 11-13 Sept. 2012
  • Claudia Canali, Riccardo Lancellotti, "Exploiting Ensemble Techniques for Automatic Virtual Machine Clustering in Cloud Systems", to appear in Automated Software Engineering
  • Claudia Canali, Riccardo Lancellotti, "Improving Scalability of Cloud Monitoring through PCA-based Clustering of Virtual Machines", to appear in Journal of Computer Science and Technology

SLIDE 61

Automatic clustering of similar VM to improve the scalability of monitoring and management in IaaS cloud infrastructures

  • C. Canali
  • R. Lancellotti

University of Modena and Reggio Emilia Department of Engineering “Enzo Ferrari”