1 IISWC 2008
Implications of Cache Asymmetry on Server Consolidation Performance
Presenter: Omesh Tickoo
Padma Apparao, Ravi Iyer, Don Newell *Hardware Architecture Lab Intel Corporation
– Virtualization and consolidation are a growing trend in datacenters
– Majority of servers expected to run consolidated workloads within a few years
[Figure: a single workload on a single O/S on one server]
– Performance analysis of consolidation scenarios is challenging
– Server consolidation performance as a function of cache contention & asymmetry
[Figure: Workload 1, Workload 2, and Workload 3, each in its own guest OS, consolidated on one server on top of a VMM or hypervisor]
[Figure: four cache configurations, each showing Task1, Task2, Taskx, and Taskn running on cores (C) attached to caches]
(a) Symmetric Private Caches of Equal Size
(b) Virtually Asymmetric Shared Caches of Equal Size
(c) Physically Asymmetric Private Caches of Different Size
(d) Virtually & Physically Asymmetric Shared Caches of Different Size
What are the implications
VM/Workload                  Vcpus   Memory (MB)
Java/SPECjbb (bops/sec)      2       2056
Database/Sysbench (Tx/sec)   2       1544
Web/Webbench (Tx/sec)        2       1544
Mail/Exchange (hits/sec)     1       1544
Idle                         1       418
5 VMs
[Figure: platform with four LLCs and memory, running Xen 3.1 with the vConsolidate VMs]
– But virtually asymmetric
– But virtually asymmetric also
[Figure: SPECjbb, Sysbench, and Webbench Performance (Symmetric Caches); Thruput, CPI, and MPI normalized to 2MB, for LLC sizes of 2MB, 3MB, 4MB, and 6MB]
[Figure: four LLCs and memory, all LLCs the same size; a single virtual machine runs either with no LLC sharing or with LLC sharing]
SPECjbb2005 is most sensitive to cache size: 50% performance improvement from 2MB to 6MB
Sysbench and Webbench show less than 10% improvement
[Figure: four LLCs and memory, all LLCs the same size, running the Consolidated Virtual Machines (vCon)]
[Figure: SPECjbb Performance with Virtual Cache Asymmetry, one panel for 6MB and one for 4MB; Thruput, CPI, and MPI normalized to running alone, for JBB alone, JBB+JBB, JBB+Sysbench, JBB+Webbench, and JBB in vCon]
Consolidation causes ~30% loss in performance
Cache interference => 20%
Core interference => 9%
[Figure: four LLCs and memory running an Individual Virtual Machine; some LLCs are 6M, the other LLCs are smaller (4M, 3M or 2M)]
[Figure: SPECjbb, Sysbench, and Webbench (Physically Asymmetric Caches); Thruput, CPI, and MPI normalized to 6MB-6MB, for the configurations 6-6, 6-4, 6-3, and 6-2]
SPECjbb2005 is affected the most
Sysbench and Webbench are not affected much
[Figure: four LLCs and memory running the Consolidated Virtual Machines (vCon); some LLCs are 6M, the other LLCs are smaller (4M, 3M or 2M)]
[Figure: SPECjbb, Sysbench, and Webbench Performance on Virtual+Physical Asymmetry; Thruput, CPI, and MPI normalized to 6MB-6MB, for the configurations 6-6, 6-4, 6-3, and 6-2]
SPECjbb is affected the most (as expected)
Sysbench and Webbench are not affected much
Opportunity to move Sysbench and Webbench to smaller-cache cores => can improve performance of SPECjbb?
interference
take this into account
and small cores
affinitize
– Cache-sensitive VMs to large-cache-cores
– Cache-insensitive VMs to small-cache-cores
Workload   Metric   vcpu0 (affinitized to 6MB)   vcpu1 (floating)   % benefit
JBB        CPI      1.51                         1.80               19%
JBB        MPI      0.0051                       0.0070             39%
Sysbench   CPI      2.51                         2.96               18%
Sysbench   MPI      0.0016                       0.0020             25%
Webbench   CPI      2.59                         2.88               11%
Webbench   MPI      0.0023                       0.0026             11%
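The "% benefit" column appears to be the gap between the floating vcpu and the affinitized vcpu, expressed relative to the affinitized value. That formula is an assumption inferred from the numbers, not something the slides state. A minimal Python sketch that recomputes the column from the table values follows; the function name `percent_benefit` is hypothetical. Under this assumption the CPI rows match the slide, while the MPI rows for JBB and Webbench come out slightly different, possibly because the MPI values shown on the slide are rounded.

```python
# CPI and MPI values copied from the affinitization table:
# (vcpu0 affinitized to 6MB, vcpu1 floating)
measurements = {
    "JBB":      {"CPI": (1.51, 1.80), "MPI": (0.0051, 0.0070)},
    "Sysbench": {"CPI": (2.51, 2.96), "MPI": (0.0016, 0.0020)},
    "Webbench": {"CPI": (2.59, 2.88), "MPI": (0.0023, 0.0026)},
}


def percent_benefit(affinitized: float, floating: float) -> float:
    """Assumed definition: how much higher the floating vcpu's metric is,
    as a percentage of the affinitized vcpu's metric (lower is better)."""
    return 100.0 * (floating - affinitized) / affinitized


for workload, metrics in measurements.items():
    for metric, (affinitized, floating) in metrics.items():
        benefit = percent_benefit(affinitized, floating)
        print(f"{workload:9s} {metric}: {benefit:.0f}%")
```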
Affinitization Experiment:
– Affinitize one vcpu to a large-cache core
– Leave the other vcpu floating
– Allows detection of cache sensitivity for improved scheduling
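One way the detected sensitivity could feed a scheduler is sketched below in Python. This is an illustration under stated assumptions, not the scheduler used in the study: sensitivity is taken to be the relative CPI gap between the floating and affinitized vcpus, the placement is a simple greedy ranking, and the names `VmProfile`, `place_vms`, and `large_cache_slots` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmProfile:
    name: str
    cpi_affinitized: float  # CPI of the vcpu pinned to a large-cache core
    cpi_floating: float     # CPI of the vcpu left floating

    @property
    def sensitivity(self) -> float:
        # Assumed metric: relative CPI penalty of not being on the large cache.
        return (self.cpi_floating - self.cpi_affinitized) / self.cpi_affinitized


def place_vms(profiles, large_cache_slots):
    """Greedy placement: the most cache-sensitive VMs get the large-cache
    cores; the remaining VMs go to the small-cache cores."""
    ranked = sorted(profiles, key=lambda p: p.sensitivity, reverse=True)
    return {
        p.name: ("large" if i < large_cache_slots else "small")
        for i, p in enumerate(ranked)
    }


# CPI values taken from the affinitization experiment table.
profiles = [
    VmProfile("JBB", 1.51, 1.80),
    VmProfile("Sysbench", 2.51, 2.96),
    VmProfile("Webbench", 2.59, 2.88),
]
placement = place_vms(profiles, large_cache_slots=1)
print(placement)
```

With a single large-cache slot, this ranking puts JBB on the large-cache core and the other two workloads on small-cache cores, which is in line with the opportunity the slides describe. A real policy would likely also weigh MPI and throughput, which this sketch ignores.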
– Symmetric
– Virtual Asymmetry
– Physical Asymmetry
– Virtual + Physical Asymmetry
workload
– Using vConsolidate & asymmetric CMP platform