Characterizing Private Clouds: A Large-Scale Empirical Analysis of - PowerPoint PPT Presentation

Characterizing Private Clouds: A Large-Scale Empirical Analysis of Enterprise Clusters
Ignacio Cano, Srinivas Aiyar, Arvind Krishnamurthy
University of Washington – Nutanix Inc.
ACM Symposium on Cloud Computing, October 2016

Private Clouds

Private Clouds
• Cloud computing that delivers service to a single organization, as opposed to public clouds, which serve many.
• Direct control of infrastructure and data.
• Carry management and maintenance costs.

Motivation
• Increasing trend in the use of private clouds within companies.
• Private cloud deployments require careful consideration of what will happen in the future:
  – Capacity
  – Failures
  – …

Motivation
• Research Questions:
  – What are the most common failures?
  – What type of workloads are typically run?
  – How is the storage used? What about CPU usage?
  – How do additional replicas impact data durability?
  – What causes companies to expand their clusters?
• Need Measurement Data!

Related Work (Setting \ Study: Hardware Failures, Storage, Compute)
Desktops
• HW Failures in PCs [Nightingale et al., EuroSys’11]
• Metadata in Windows PCs [Agrawal et al., TOS’07]
• I/O on Apple computers [Harter et al., SOSP’11]
• Disk/CPU Usage and Load [Bolosky et al., SIGMETRICS’00]
Public Clouds
• HW reliability [Vishwanath et al., SoCC’11]
• Data Characteristics and Access Patterns [Liu et al., IEEE/ACM CCGrid’13]
• Workloads characterization [Mishra et al., SIGMETRICS’10]
• Scheduling on Heterogeneous Clusters [Reiss et al., SoCC’12]
Limited prior work on Private Clouds!

In this talk
• Large-Scale Measurement Study of Private Clouds
  – Lower hardware failure rates
  – Nodes overprovisioned
  – Stable storage and CPU usage
• Modeling based on the Measurements
  – Each extra replica provides substantial durability improvements
  – Storage needs drive growth more than compute

Outline
• Large-Scale Measurement Study of Private Clouds
  – Context
  – Cluster Profiles
  – Failure Analysis
  – Workload Characteristics
• Modeling based on the Measurements
  – Durability
  – Cluster Growth

Nutanix Clusters
• Integrated Compute-Storage
• Operations interposed at the hypervisor level and redirected to CVMs
• Random replication
• VM migration
• Global view of cluster state

Clusters Summary
Statistics       Value
# of Clusters    2168
# of Nodes       13394 (6.18 Nodes/Cluster)
Cluster Sizes    3 - 40
# of Disks       ~70K
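The nodes-per-cluster figure follows directly from the two counts in the summary. The short check below reproduces it; the disks-per-node line is only a rough estimate, since the slide gives the disk count as an approximate "~70K".

```python
# Counts taken from the summary statistics.
num_clusters = 2168
num_nodes = 13394

nodes_per_cluster = num_nodes / num_clusters
print(f"{nodes_per_cluster:.2f} nodes per cluster")  # → 6.18 nodes per cluster

# Rough estimate only: the disk count is reported as "~70K".
approx_disks = 70_000
disks_per_node = approx_disks / num_nodes
print(f"about {disks_per_node:.1f} disks per node")
```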

Node Configurations
                 Storage               Compute
Configuration    SSD (TB)   HDD (TB)   Cores   Clock Rate (GHz)   Memory (GB)
Config-1         1.6        8          24      2.5                384
Config-2         0.8        4          12      2.4                128
Config-3         0.8        30         16      2.4                256
• Storage-heavy
• Compute-heavy
• Mostly homogeneous within a cluster

Workloads
Workload                         Example Applications                      Configuration
Virtual Desktop Infrastructure   Citrix XenDesktop, VMware Horizon/View    Config-1
Server                           SQL Server, Exchange Mail Server          Config-2, Config-3
Big Data                         Splunk, Hadoop                            Config-3
Others                           IT Infrastructure, Custom applications    Mix

Distribution of VMs per Node
[Figure: Avg. # of VMs per Node (y-axis) vs. Size of Cluster (# of Nodes) (x-axis), broken down by VM size: 1 vCPU, 2-4 vCPUs, > 4 vCPUs. Annotations: Most 2-4 vCPUs; Median 21; Highest density; Lowest density.]

Failures
• We only consider failures that require manual intervention, i.e., human operators annotate the cause of the problem.

Hardware Failures
[Figure: bar chart of % of Total Hardware Cases by component: HDD, Memory, SSD, PSU, BIOS-Image, IPMI, Node, Chassis, NIC, BMC-Image, BMC-Hardware, Cables, CPU, Fan, Rail, GPU.]
• Top 3 account for around 50% of HW failures

Annual Return Rate
Component   ARR (%)   Prior studies
HDD         0.76      2-9 %
SSD         0.72      4-10 % (4 years)
• Lower return rates
• Enterprise-grade commodity HW

Workload Characteristics
• Usage over time seems to be stable/predictable: 80% of the clusters use
  – Storage: mean <= 50%, std <= 8%
  – CPU: mean <= 20%, std <= 5%
• SSDs can generally maintain the working set
  – 80% of nodes use <= 500 GB for the working set
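The thresholds above can be read as a simple per-cluster test on a usage time series. The sketch below is one way to apply them. It assumes usage is sampled as a percentage of capacity and uses the population standard deviation; the slides do not say how the samples were taken or which estimator was used, and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def is_stable(usage_pct, max_mean, max_std):
    """True if a usage series (percent of capacity) stays within both thresholds."""
    return mean(usage_pct) <= max_mean and pstdev(usage_pct) <= max_std

# Hypothetical samples for one cluster, in percent of capacity.
storage_usage = [41.0, 42.5, 43.0, 44.0, 45.5]
cpu_usage = [12.0, 15.0, 11.0, 14.0, 13.0]

# Thresholds from the slide: storage mean <= 50%, std <= 8%; CPU mean <= 20%, std <= 5%.
storage_stable = is_stable(storage_usage, max_mean=50.0, max_std=8.0)
cpu_stable = is_stable(cpu_usage, max_mean=20.0, max_std=5.0)
print(storage_stable, cpu_stable)
```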

Durability Model
• Estimate the probability of data loss.
• Assumptions:
  – replication factor of 2
  – random replication (replicate to a random node)
• The time required to create a new replica when a node goes down:
  $$\Delta t = \frac{d}{(n - 1)\, v}$$
  where $d$ is the data to be replicated, $n - 1$ is the number of remaining live nodes, and $v$ is the data transfer rate.
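As a worked illustration of the re-replication time, d divided by (n − 1) times v, the sketch below evaluates it for one set of inputs. The units (TB and TB per hour) and the example values are assumptions chosen for illustration, not figures from the study.

```python
def replication_time_hours(data_tb, n_nodes, rate_tb_per_hour):
    """Time to re-create replicas of a failed node's data: d / ((n - 1) * v)."""
    if n_nodes < 2:
        raise ValueError("need at least one remaining live node")
    return data_tb / ((n_nodes - 1) * rate_tb_per_hour)

# Hypothetical example: 8 TB to re-replicate, 4-node cluster, 0.5 TB/hour per node.
delta_t = replication_time_hours(data_tb=8.0, n_nodes=4, rate_tb_per_hour=0.5)
print(f"{delta_t:.2f} hours")
```

Because the remaining n − 1 nodes share the work, Δt shrinks as the cluster grows when d and v are held fixed.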

Durability Model
• p(∆t) = probability of node failure in ∆t time.
• We decompose the overall period over which we want to provide the durability guarantee into a sequence of intervals, each of length ∆t.
• Q = data loss event where two failures occur within ∆t time, i.e., data could not be replicated.

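Durability Model
• Then the probability that there is no data loss in an interval ∆t:
  $$P(\neg Q, \Delta t) \le (1 - p(\Delta t))^{n} + n\, p(\Delta t)\, (1 - p(\Delta t))^{n-1}\, (1 - p(\Delta t))^{n-1}$$
  – $(1 - p(\Delta t))^{n}$: no failures
  – $n\, p(\Delta t)\, (1 - p(\Delta t))^{n-1}$: exactly one node fails
  – $(1 - p(\Delta t))^{n-1}$ (final factor): the remaining n-1 nodes do not fail within ∆t time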
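The right-hand side of the per-interval expression is easy to evaluate numerically. The sketch below computes its three factors separately so each maps to one label on the slide. The slide states the expression as an upper bound, and this function simply evaluates that bound; the example values for p and n are hypothetical.

```python
def p_no_loss_interval(p, n):
    """Right-hand side of the bound on P(not Q, dt) for per-node failure probability p and n nodes."""
    no_failures = (1 - p) ** n                     # no node fails in the interval
    exactly_one_fails = n * p * (1 - p) ** (n - 1) # exactly one node fails
    rest_survive = (1 - p) ** (n - 1)              # remaining n-1 nodes do not fail within dt
    return no_failures + exactly_one_fails * rest_survive

# Hypothetical example: 1e-4 failure probability per interval, 6-node cluster.
print(p_no_loss_interval(p=1e-4, n=6))
```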

Durability Model
• On a yearly basis, we consider all ∆t intervals in a year.
• Probability of no data loss within a year is:
  $$P_{\text{durability}} = P(\neg Q, \Delta t)^{N(\Delta t)}$$
  where $N(\Delta t)$ is the # of intervals of ∆t time in a year.
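The yearly figure raises the per-interval probability to the number of intervals in a year. The sketch below takes the per-interval probability as an input, assumes ∆t is measured in hours and a 365-day year, and uses hypothetical example values.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

def annual_durability(p_interval_no_loss, delta_t_hours):
    """P_durability = P(not Q, dt) ** N(dt), with N(dt) = intervals of length dt per year."""
    n_intervals = HOURS_PER_YEAR / delta_t_hours
    return p_interval_no_loss ** n_intervals

# Hypothetical example: per-interval no-loss probability with 5-hour intervals.
durability = annual_durability(p_interval_no_loss=0.999999999, delta_t_hours=5.0)
print(f"annual data loss probability ~ {1 - durability:.2e}")
```

Shorter re-replication times mean more intervals per year, but each interval carries a smaller failure probability, so the two effects have to be evaluated together rather than in isolation.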

Durability in Private Clouds
[Figure: Fraction of Clusters (y-axis, 0 to 1) vs. Data Loss (Probability) (x-axis, 1e-12 to 0.0001), for RF2 and RF3.]
• Rule of Thumb: each additional replica provides an additional 5 9’s of durability
• Most clusters have 5 9’s with RF2, and 10 9’s with RF3
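To relate the x-axis values to the "9's" in the rule of thumb, a data loss probability can be converted into a count of nines. The sketch below uses the common convention that durability 1 − 10^(−k) has k nines; that convention is an assumption here, as the slides do not define it.

```python
import math

def nines(data_loss_probability):
    """Number of nines of durability, taking durability = 1 - data loss probability."""
    return -math.log10(data_loss_probability)

rf2_nines = nines(1e-5)    # loss probability of 1e-05 on the plot's axis
rf3_nines = nines(1e-10)   # loss probability of 1e-10 on the plot's axis
print(round(rf2_nines), round(rf3_nines), round(rf3_nines - rf2_nines))
```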

Cluster Growth Analysis
• Customers periodically add nodes to their existing clusters.
• What drives such growth?
• We resort to machine learning
  – Binary classification problem
  – Logistic Regression with L1 regularization
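For readers unfamiliar with the technique, here is a minimal pure-Python sketch of L1-regularized logistic regression, fitted by proximal gradient descent. It is not the authors' implementation: the solver, the hyperparameters (`lam`, `lr`, `epochs`), and the synthetic data are all assumptions made for illustration. The point is that the L1 penalty shrinks the weights of uninformative features toward zero, which is what makes the surviving weights usable as a signal of feature importance.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of t * |w|: shrinks w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logreg(X, y, lam=0.05, lr=0.5, epochs=500):
    """Minimize mean log-loss + lam * ||w||_1 (bias unpenalized) by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # Gradient step on the log-loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Synthetic data (hypothetical): the label depends on feature 0 only; feature 1 is noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if x[0] > 0 else 0 for x in X]

weights, bias = fit_l1_logreg(X, y)
print(weights, bias)
```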

Cluster Growth Analysis
• Use 200 clusters that grew at least once in a period of 8 months.
• 15K examples (70% train, 10% val, 20% test).
• Train with different combinations of features to understand which are important.
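A 70/10/20 split can be sketched as below. The slides give only the proportions, so the exact example count (15,000 for "15K"), the random shuffle, and the seed are assumptions; the study may have split differently, for example by cluster or by time.

```python
import random

def split_examples(examples, train_pct=70, val_pct=10, seed=0):
    """Shuffle and split into train/val/test; the test set takes whatever remains."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

examples = list(range(15000))  # stand-in IDs for the ~15K examples
train_set, val_set, test_set = split_examples(examples)
print(len(train_set), len(val_set), len(test_set))  # → 10500 1500 3000
```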

Features
Cluster Features F_c       Description
n(nodes)                   discretized # of nodes
n(vms)                     # of vms per node
Storage Features F_s       Description
r(ssd)                     ssd usage to ssd capacity ratio
r(hdd)                     hdd usage to hdd capacity ratio
r(store)                   storage usage to total capacity ratio
Performance Features F_p   Description
n(vcpus)                   # of virtual cpus
n(iops)                    # of iops per node
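The storage features are plain usage-to-capacity ratios. The sketch below shows one way they could be computed from per-node measurements. The `NodeSample` record and its field names are hypothetical, and "total capacity" is assumed to mean SSD plus HDD capacity, which the slides do not spell out.

```python
from dataclasses import dataclass

@dataclass
class NodeSample:
    # Hypothetical record of one node's storage measurements, in TB.
    ssd_used_tb: float
    ssd_capacity_tb: float
    hdd_used_tb: float
    hdd_capacity_tb: float

def storage_features(s):
    """Compute r(ssd), r(hdd), and r(store) as usage-to-capacity ratios."""
    total_used = s.ssd_used_tb + s.hdd_used_tb
    total_capacity = s.ssd_capacity_tb + s.hdd_capacity_tb  # assumption: SSD + HDD
    return {
        "r(ssd)": s.ssd_used_tb / s.ssd_capacity_tb,
        "r(hdd)": s.hdd_used_tb / s.hdd_capacity_tb,
        "r(store)": total_used / total_capacity,
    }

# Example with Config-2 capacities (0.8 TB SSD, 4 TB HDD); usage values are hypothetical.
sample = NodeSample(ssd_used_tb=0.4, ssd_capacity_tb=0.8, hdd_used_tb=3.0, hdd_capacity_tb=4.0)
features = storage_features(sample)
print(features)
```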

What drives cluster growth?
1. Cluster Size: upgrades from 3-4 node clusters
2. Storage Needs: HDD usage
3. Compute Needs: number of VMs
Storage more than compute!
