Characterization & Analysis of a Server Consolidation Benchmark on Xen

Padma Apparao
Ravi Iyer, Don Newell, Xiaomin Zhang, Tom Adelmeyer
Intel Corporation
Nov 16th, 2007


Background

  • Virtualization and consolidation are a growing trend in datacenters
    – More than 40% of servers are expected to run a consolidated workload by 2010
  • The problem: no analysis methodology or performance studies are in place for understanding consolidated workloads

[Diagram: a traditional server running a workload on a single OS, vs. a server running three workloads, each in its own guest OS, on a VMM or hypervisor]


Motivation

  • Performance characterization is useful for
    – Providing feedback to IT administrators
      – Deploying applications with a fair share of resources for end users is challenging with virtualization
    – Providing feedback to platform architects
      – To project future platform performance
      – How apps scale on future platforms
      – To optimize future architectures for consolidation
      – Architectural effects on consolidation
        – Cache and other resource-sharing effects
        – Virtualization overhead effects
    – Providing feedback to VMM developers
      – How the platform's resources (cores, I/O devices) are scheduled
      – Scheduling heuristics may be suboptimal without an execution profile


A Consolidation Benchmark

  • vConsolidate (vCon) is one of the proposed benchmarks for virtualization consolidation
    – Developed by Intel
    – VMM agnostic
  • VMmark is another consolidation benchmark, developed by VMware
  • vSPEC is a virtualization benchmark being defined by the SPEC committee
  • Our focus is vCon

vConsolidate Benchmark Configuration

  • vCon defines various profiles; we chose profile 3
  • System config:
    – Intel dual-socket Core 2 Duo machine
    – Core 2 Duo processors at 3 GHz with a 4 MB second-level cache
    – 16 GB system memory
    – Intel VT technology
  • Tools:
    – Xentop / sar
    – Virtual EMON, developed by Intel
    – Xen code instrumentation

Profiles 1 and 2 (per VM: vCPUs, vMemory, OS, App):
  Workload             Profile 1                             Profile 2
  Web (Webbench)       1, 1.0 GB, Windows 32-bit, IIS        2, 1.5 GB, Windows 32-bit, IIS
  Mail (Loadsim)       1, 1.0 GB, Windows 32-bit, Exchange   1, 1.5 GB, Windows 32-bit, Exchange
  Database (Sysbench)  1, 1.0 GB, Windows 32-bit, MS SQL     2, 1.5 GB, Windows 64-bit, MS SQL
  Java (SPECjbb)       1, 1.7 GB, Windows 32-bit, BEA JVM    2, 2.0 GB, Windows 64-bit, BEA JVM
  Idle                 1, 0.4 GB, Windows 32-bit             1, 0.4 GB, Windows 32-bit

Profiles 3 and 4 (per VM: vCPUs, vMemory, OS, App):
  Workload             Profile 3                             Profile 4
  Web (Webbench)       2, 1.5 GB, Linux 32-bit, Apache       2, 2.0 GB, Windows 32-bit, IIS
  Mail (Loadsim)       1, 1.5 GB, Windows 32-bit, Exchange   2, 2.0 GB, Windows 32-bit, Exchange
  Database (Sysbench)  2, 1.5 GB, Linux 64-bit, MySQL        4, 2.0 GB, Windows 64-bit, MS SQL
  Java (SPECjbb)       2, 2.0 GB, Linux 64-bit, BEA JVM      4, 2.0 GB, Windows 64-bit, BEA JVM
  Idle                 1, 0.4 GB, Windows 32-bit             1, 0.4 GB, Windows 32-bit


Results and Analysis: Performance Impact

[Chart: throughput of JBB, Sysbench, WebBench, and Mail, dedicated (Alone) vs. consolidated (vCon), normalized to the dedicated run]

  • For the dedicated measurements, each workload runs alone within a single VM
  • SPECjbb loses 37% under consolidation
  • Sysbench loses 58%
  • Webbench loses 20%
  • Mail loses 32%
  • Degradation is likely due to contention for resources (core, cache, memory, I/O, network) and to virtualization overheads
  • The reduction in CPU utilization is due to core contention

[Chart: CPU utilization of each workload, dedicated vs. consolidated, normalized to the dedicated run]


Results and Analysis: Architectural Characterization

  • Understand where the performance loss comes from
    – SPECjbb CPI (cycles per instruction) increases by 37%
    – Most of the CPI increase is due to an increase in L2 MPI (misses per instruction)
      – Due to cache pollution, i.e. cache interference, when running with the other workloads
    – Similar behavior is observed for the other workloads

[Chart: SPECjbb score delta, CPI delta, and L2 MPI delta in vCon, normalized to SPECjbb running alone]
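The CPI and L2 MPI metrics used throughout this deck are simple hardware-counter ratios. A minimal sketch of how the deltas are derived, using hypothetical counter values (not the measured vCon data):

```python
# Sketch: deriving CPI and L2 MPI deltas from raw performance counters.
# The counter values below are hypothetical illustrations.

def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions

def l2_mpi(l2_misses, instructions):
    """L2 cache misses per instruction."""
    return l2_misses / instructions

# Dedicated (alone) vs. consolidated (vCon) runs of the same workload
alone = {"cycles": 9.0e9,  "insts": 6.0e9, "l2_miss": 3.0e7}
vcon  = {"cycles": 12.3e9, "insts": 6.0e9, "l2_miss": 5.4e7}

cpi_alone = cpi(alone["cycles"], alone["insts"])
cpi_vcon  = cpi(vcon["cycles"], vcon["insts"])
mpi_alone = l2_mpi(alone["l2_miss"], alone["insts"])
mpi_vcon  = l2_mpi(vcon["l2_miss"], vcon["insts"])

# Normalize to the dedicated run, as the deck's charts do
cpi_delta = cpi_vcon / cpi_alone  # a value of 1.37 would match jbb's 37% CPI growth
mpi_delta = mpi_vcon / mpi_alone
```

The same normalization applies to the throughput scores: divide the consolidated score by the dedicated score.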


Results and Analysis: Cache Scaling

[Chart: raw performance of JBB, Sysbench, WebBench, and Mail, dedicated (Alone) vs. consolidated (vCon), at 1 MB, 2 MB, and 4 MB L2 cache sizes, normalized to the dedicated environment]

  • Useful for understanding how much the workloads benefit from larger caches
  • JBB and Sysbench do well with larger caches
  • Helps platform architects make decisions about cache sizes for future platforms

Cache scaling for SPECjbb (in vCon):
            1MB    2MB    4MB
  Score     1      1.31   1.78
  CPI       1      0.77   0.57
  L2 MPI    1      0.75   0.49

Cache scaling for Sysbench (in vCon):
            1MB    2MB    4MB
  Score     1      1.41   1.60
  CPI       1      0.83   0.76
  L2 MPI    1      0.70   0.57

Cache scaling for Webbench (in vCon):
            1MB    2MB    4MB
  Score     1      1.08   1.18
  CPI       1      0.92   0.88
  L2 MPI    1      0.84   0.69

Cache scaling for Mail (in vCon):
            1MB    2MB    4MB
  Score     1      1.15   1.09
  CPI       1      1.09   0.71
  L2 MPI    1      0.67   1.05


Results and Analysis: vCon Execution Profile – Life of a VM

  • Understand how a VM behaves over time
  • We instrumented the scheduler to record on which pCPU a VM is running, for how long, where it migrated to, when it came back, and who ran while it was migrated away
    – Helps understand cache interference
    – Helps understand the behavior of the scheduler

CPU% per VM, measured with xentop vs. computed from the scheduler profile:
  VM     Xentop   Scheduler profile
  dom0   30%      36%
  JBB    122%     120%
  Sys    116%     118%
  Web    114%     112%
  Mail   6%       8%

  • We measured CPU utilization both with xentop and with our instrumentation
  • The numbers are quite close, validating our methodology
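The xentop cross-check can be reproduced by summing each VM's run intervals from the scheduler trace. A sketch with a hypothetical record format of (vm, pcpu, start_ms, end_ms); the real instrumentation records more fields:

```python
# Sketch: per-VM CPU% from scheduler trace records, for cross-checking
# against xentop. Trace entries are hypothetical.
from collections import defaultdict

trace = [
    ("jbb",  0,    0,  600),
    ("sys",  1,    0,  550),
    ("jbb",  1,  550, 1000),
    ("dom0", 0,  600,  750),
    ("jbb",  0,  750, 1000),
]

wall_ms = 1000.0  # length of the observation window per pCPU

busy = defaultdict(float)
for vm, pcpu, start, end in trace:
    busy[vm] += end - start

# As in xentop, CPU% can exceed 100% when a VM keeps several pCPUs busy
cpu_pct = {vm: 100.0 * ms / wall_ms for vm, ms in busy.items()}
```

In this toy trace, jbb runs on two pCPUs concurrently, so its CPU% comes out above 100%, mirroring the 120%+ figures in the table.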


Results and Analysis: vCon Execution Profile – Life of a VM (continued)

  • A VM runs on all pCPUs, with no particular affinitization
    – Shows good dynamic load balancing by the scheduler
  • A VM comes back to the same pCPU most of the time
    – Helps reduce cache misses, provided the cache was not polluted in the meantime
    – What is missing: which VMs ran during the interim while the VM was migrated away, and the extent of the cache pollution (quantification of the resulting cache misses)

CPU% per VM across physical CPUs:
  VM     All cpus   pCPU0   pCPU1   pCPU2   pCPU3
  Dom0   100%       19%     33%     27%     8%
  Jbb    100%       32%     28%     20%     25%
  Sys    100%       26%     25%     28%     24%
  Web    100%       18%     20%     27%     40%
  Mail   100%       37%     23%     18%     2%

                                 Dom0   JBB   SYS   WEB   MAIL
  % time came back to same cpu   95%    87%   92%   92%   97%
  % time went to another cpu     5%     13%   8%    8%    3%

[Chart: time profile of the vCPUs (JBB-0/1, Sys-0/1, Web-0/1) across the four physical CPUs over time]
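The per-pCPU time split and the return-to-same-pCPU percentages above can both be derived from an ordered list of scheduling intervals for one vCPU. A sketch with hypothetical interval data of (pcpu, runtime_ms):

```python
# Sketch: per-vCPU migration statistics from an ordered list of
# (pcpu, runtime_ms) scheduling intervals. Intervals are hypothetical;
# the real numbers come from the Xen scheduler instrumentation.
from collections import defaultdict

intervals = [(0, 30), (0, 25), (1, 10), (0, 40), (0, 20), (2, 5)]

time_on = defaultdict(float)  # pcpu -> total runtime on that pCPU
same = moved = 0              # resumptions on the same / another pCPU
prev = None
for pcpu, ms in intervals:
    time_on[pcpu] += ms
    if prev is not None:
        if pcpu == prev:
            same += 1
        else:
            moved += 1
    prev = pcpu

total = sum(time_on.values())
share = {p: t / total for p, t in time_on.items()}  # time split across pCPUs
return_rate = same / (same + moved)                  # cf. the 87-97% above
```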


Results and Analysis: vCon Execution Profile – Cache Interference

[Chart: % of CPU time each VM (Dom0, Jbb, Sys, Web, Mail) spent running concurrently with another VM; values shown include 28.4%, 28.8%, 30.3%, 9%, and 1.8%]

  • Cache interference impacts the performance of the workload
  • Find out which VM/vCPU shares the second-level cache with which other VM/vCPU, and for how much time
  • Of the 30.3% of time SPECjbb spent running alongside another VM, 10.5% was with Sysbench, 10.2% with Webbench, 7.5% with the other SPECjbb vCPU, and 2% with Mail
  • Knowing the L2 MPI and CPI impact of one VM on another, we can determine the cache interference
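The co-residency percentages above amount to computing the overlap between two vCPUs' run intervals on cores that share an L2. A sketch with hypothetical intervals; a real trace would have many more, non-overlapping per-core entries:

```python
# Sketch: measuring L2 co-residency. Given run intervals (start, end)
# for two vCPUs pinned to cores that share an L2, the overlap of the
# interval sets is the time the two contend for the cache.

def overlap(intervals_a, intervals_b):
    """Total time during which both interval sets are active."""
    total = 0.0
    for a0, a1 in intervals_a:
        for b0, b1 in intervals_b:
            total += max(0.0, min(a1, b1) - max(a0, b0))
    return total

jbb_core0 = [(0, 40), (60, 100)]   # hypothetical run intervals for JBB-0
sys_core1 = [(20, 70), (90, 120)]  # hypothetical run intervals for Sys-0

shared = overlap(jbb_core0, sys_core1)  # time jbb shares the L2 with sys
pct = 100.0 * shared / 120.0            # as a % of the observation window
```

Repeating this for every VM pair yields the breakdown of who shared the cache with whom, and for how long.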


Results and Analysis: vCon Execution Profile – Cache Interference (continued)

[Diagrams: vCPU-to-core affinitization across the two L2 caches; e.g., Dom0-0 and JBB-0 on cores sharing one L2 with Dom0-1 and JBB-1 on the other, and similarly with Sys-0 and Sys-1 added]

  • Affinitize the vCPUs to different cores
  • Measure the cache MPI with each of the workloads
  • SPECjbb loses 16% running with the other JBB vCPU, 14% with Sysbench, 11% with Webbench, and 3% with Mail

[Chart: SPECjbb L2 MPI, normalized to running alone, when running with JBB, Sys, Web, Mail, and in vCon]

[Chart: SPECjbb CPI, normalized to running alone, for the same configurations]


From Characterization to Modeling

  • How do we build a performance projection model?

[Diagram: native workload performance (CPU%, L2 MPI, VT events and costs) combined with virtualization overheads, core interference, and cache interference to yield the projected performance of the workload in consolidation]
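One simple way to realize the projection idea is multiplicative: start from the native (dedicated) score and apply a degradation factor for each effect. The factor values below are hypothetical placeholders, not calibrated results from this study:

```python
# Sketch of a performance projection model: native performance scaled by
# degradation factors for core contention, cache interference, and
# virtualization overheads. All factor values are hypothetical.

def project(native_score, core_factor, cache_factor, virt_factor):
    """Each factor is the fraction of performance retained (0..1]."""
    return native_score * core_factor * cache_factor * virt_factor

native = 100.0       # score of the workload running alone
core_factor = 0.90   # loss from sharing cores with other VMs
cache_factor = 0.80  # loss from L2 interference (from the MPI data)
virt_factor = 0.95   # cost of VMM events (VM exits, interrupts, ...)

projected = project(native, core_factor, cache_factor, virt_factor)
```

In practice each factor would be derived from the measured data: the cache factor from the L2 MPI interference study, the core factor from the scheduler profile, and the virtualization factor from VT event counts and costs.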


What kind of hooks would we like?

  • How can we get the execution profile of a VM without instrumentation?
  • Have some logging in the VMM that gives:
    – Where (on which pCPU) the VM started running, and what other VMs were running on all the other pCPUs
    – Where (to which pCPU) the VM/vCPU migrated
    – When the vCPU came back to the same pCPU
    – How long it ran on the pCPU before migrating
    – The longest duration the VM ran on any pCPU / with any other VM
  • While the VM was running, what events (context switches, interrupts (external and IPIs), page faults, TLB flushes, etc.) were generated, and how much time the VMM took to service them
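The event side of the proposed hooks amounts to per-VM, per-event counts and service times. A sketch of what such an accounting structure could look like; the event names and service times are hypothetical:

```python
# Sketch: the kind of per-VM event accounting the proposed VMM hooks
# would expose. Event names and service times are hypothetical.
from collections import defaultdict

class VMEventLog:
    def __init__(self):
        # vm -> event -> [count, total service time in microseconds]
        self.stats = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))

    def record(self, vm, event, service_us):
        entry = self.stats[vm][event]
        entry[0] += 1
        entry[1] += service_us

    def summary(self, vm):
        return {e: {"count": c, "total_us": t}
                for e, (c, t) in self.stats[vm].items()}

log = VMEventLog()
log.record("jbb", "page_fault", 2.5)
log.record("jbb", "page_fault", 3.0)
log.record("jbb", "tlb_flush", 1.0)
```

Aggregated this way, the log directly supplies the "VT events and costs" input of the projection model sketched earlier.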


Summary

  • The need for performance characterization of a consolidation benchmark
  • Introduction of the vConsolidate benchmark developed by Intel
  • Dedicated vs. consolidated performance
    – Sysbench and SPECjbb lose the most performance
    – Degradation is due to contention for resources
  • Architectural characterization of cache interference
    – Scheduling studies to understand the life of a VM/vCPU
    – Affinitization studies to understand cache interference
  • Overview of building a performance projection model
  • Xen hooks needed

Thank you. Questions?


Server Consolidation Characterization and Modeling
  • IT infrastructure and management: how best to provision the applications so as to meet performance and SLA criteria?
  • Future platform/CPU architectures: what platform/CPU features are needed in the future to support server consolidation?
  • VMM optimizations: how to improve and optimize scheduling algorithms for resource management and consolidation?