Real-Time Cache Management for Multi-Core Virtualization


SLIDE 1

EMSOFT 2016

Real-Time Cache Management for Multi-Core Virtualization

Hyoseung Kim 1,2 Raj Rajkumar 2

1 University of California, Riverside 2 Carnegie Mellon University

SLIDE 2

Benefits of Multi-Core Processors

  • Consolidation of real-time systems onto a single hardware platform

– Reduces the number of CPUs and wiring harness among them
– Leads to a significant reduction in size, weight, and cost requirements

[Figure: workload consolidation from several single-core platforms onto one multi-core platform]

SLIDE 3

Virtualization of Real-Time Systems

  • Barriers to consolidation

– Each app. could have been developed independently by different vendors

  • Bare-metal / Proprietary OS
  • Linux / Android

– Different license issues

  • Consolidation via virtualization

– Each application can maintain its own implementation
– Minimizes the re-certification process
– Fault isolation
– IP protection, license segregation

[Figure: virtualization of several systems onto a multi-core CPU running a real-time hypervisor]

SLIDE 4

Virtual Machines and Hypervisor

  • Two-level hierarchical scheduling structure

– Task scheduling on virtual CPUs (VCPUs) by guest OSs
– VCPU scheduling on physical CPUs (PCPUs) by the hypervisor

[Figure: two virtual machines (VMs), each with a guest OS that schedules tasks on its VCPUs; the hypervisor schedules the VCPUs on the PCPUs]

SLIDE 5

Shared Cache Interference

  • Shared last-level cache (LLC)

– Reduces task execution time
– Allows consolidating more tasks onto a single hardware platform

  • Cache interference in multi-core virtualization

– ① Intra-VCPU cache interference: tasks running on the same VCPU
– ② Inter-VCPU cache interference: tasks running on different VCPUs

Cache interference must be addressed for real-time predictability

[Figure: a VM whose guest OS runs tasks on two VCPUs, with ① marking interference within one VCPU and ② marking interference between VCPUs]

SLIDE 6

Page Coloring for S/W Cache Control

  • Page coloring

– Software-based, OS-level cache partitioning mechanism
– Used by many prior cache management schemes developed for non-virtualized multi-core systems [1, 2, 3, 4]

[1] H. Kim et al. A coordinated approach for practical OS-level cache management in multi-core real-time systems. In ECRTS, 2013.
[2] R. Mancuso et al. Real-time cache management framework for multi-core architectures. In RTAS, 2013.
[3] N. Suzuki et al. Coordinated bank and cache coloring for temporal protection of memory accesses. In ICESS, 2013.
[4] B. C. Ward et al. Making shared caches more predictable on multicore platforms. In ECRTS, 2013.

[Figure: physically-indexed, set-associative cache. The physical address is split into a physical page # and a page offset; the cache mapping splits the same address into a set index and a line offset. The color index is the part of the set index that falls within the physical page #.]
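As a minimal sketch of the address split described on this slide, the color of a page can be computed from the set-index bits that lie above the page offset. The function name `page_color`, the 64-byte line size, and the 4 KiB page size are illustrative assumptions, not values taken from the slides, and the geometry is assumed to be a power of two.

```python
def page_color(phys_addr: int, cache_size: int, ways: int,
               line_size: int = 64, page_size: int = 4096) -> int:
    """Return the cache color of the page that contains phys_addr.

    Assumes a physically-indexed, set-associative cache whose sizes
    are all powers of two (hypothetical defaults: 64 B lines, 4 KiB pages).
    """
    num_sets = cache_size // (ways * line_size)       # sets in the cache
    sets_per_page = page_size // line_size            # sets covered by one page
    set_index = (phys_addr // line_size) % num_sets   # set this address maps to
    # The set-index bits above the page offset form the color index.
    return set_index // sets_per_page


LLC_SIZE = 2 * 1024 * 1024   # 2 MiB cache (illustrative)
WAYS = 16                    # 16-way set-associative (illustrative)

# Pages 5 and 37 are 32 pages apart, so they share a color in a 32-color cache.
print(page_color(5 * 4096, LLC_SIZE, WAYS))    # prints 5
print(page_color(37 * 4096, LLC_SIZE, WAYS))   # prints 5
```

An OS that implements page coloring restricts a task to physical pages whose color falls in the task's assigned set, which confines the task to the matching cache sets.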

SLIDE 7

Challenges in Virtualization (1/2)

  • 1. Page coloring and algorithms based on it do not work in a VM due to the additional address layer at the hypervisor

[Figure: Task 1 and Task 2 run in a virtual machine (VM). Page coloring in the guest OS maps virtual pages to physical pages of the VM, but the hypervisor maps those to physical pages of the host machine, so they are no longer mapped to the expected cache colors.]
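The effect can be illustrated with a toy model. The sketch below assumes four cache colors, a color function of page number modulo the number of colors, and a made-up hypervisor mapping (`gpa_to_hpa`); none of these values come from the slides.

```python
NUM_COLORS = 4  # hypothetical number of cache colors


def color_of(page_number: int) -> int:
    """Color of a page, assuming color = page number mod number of colors."""
    return page_number % NUM_COLORS


# The guest OS believes it has given the task three pages of color 1.
guest_pages = [1, 5, 9]

# Hypothetical hypervisor mapping: guest-physical page -> host-physical page.
gpa_to_hpa = {1: 14, 5: 7, 9: 20}

guest_colors = [color_of(p) for p in guest_pages]
host_colors = [color_of(gpa_to_hpa[p]) for p in guest_pages]

print(guest_colors)  # prints [1, 1, 1]
print(host_colors)   # prints [2, 3, 0]
```

The guest sees a single color, while the pages actually occupy three different colors of the host cache, so a partition chosen inside the guest gives no isolation on the host.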

SLIDE 8

Challenges in Virtualization (2/2)

  • 2. Even if page coloring works in a VM, legacy systems to be virtualized may not have page coloring support

– Such systems will suffer from cache interference
– Support for closed-source guest OSs is needed

  • 3. Prior real-time cache management schemes cannot answer:

– How to find a VM’s resource requirement in the presence of cache interference?
– How to allocate the host machine's cache to VMs to be consolidated?

SLIDE 9

Our Contributions

  • Real-time cache management for multi-core virtualization
  • vLLC and vColoring

– Provide a way to allocate host cache colors to individual tasks running in a virtual machine → First software-based techniques
– Prototype implemented in KVM running on x86 and ARM platforms

  • Cache management scheme

– Allocates cache colors to tasks in a VM while satisfying timing constraints
– Finds a VM’s CPU demand w.r.t. the number of cache colors assigned to it
– Minimizes the total utilization of VMs to be consolidated → First approach

SLIDE 10

Outline

  • Introduction and Motivation
  • Real-Time Cache Management for Multi-Core Virtualization

– System model
– vLLC and vColoring
– Cache management scheme

  • Evaluation
  • Conclusions

SLIDE 11

System Model

  • Hypervisor: implements page coloring
  • Guest OSs: may or may not have page coloring
  • Partitioned fixed-priority scheduling for both the hypervisor & guest OSs
  • VM $:= (v_1, v_2, \ldots, v_{N_{vcpu}})$
  • VCPU $v_i := (C_i^v(k), T_i^v)$

– $C_i^v(k)$: Execution budget with $k$ cache colors assigned to it
– $T_i^v$: Budget replenishment period

  • Task $\tau_i := (C_i(k), T_i, D_i)$

– $C_i(k)$: Worst-case execution time (WCET) with $k$ cache colors assigned to it
– $T_i$: Period
– $D_i$: Relative deadline

SLIDE 12

vLLC: Virtual Last-Level Cache

  • Technique for guest OSs with page coloring (e.g., Linux/RK)

– Provides

  • Virtual LLC (Last-level cache) information
  • Host physical pages corresponding to the virtual LLC

[Figure: vLLC example in four steps (① to ④). Host machine: host LLC info of 256KB size, 512 sets, 16-way, with host LLC colors 1 to 4. Guest VM: virtual LLC info of 128KB size, 256 sets, 16-way, with virtual LLC colors 1 and 2 backed by host colors 2 and 4. Guest cache color 1 = host cache color 2; guest cache color 2 = host cache color 4. Page coloring in the guest selects guest physical pages, which correspond to host physical pages of the assigned colors.]

SLIDE 13

vLLC: Virtual Last-Level Cache

  • Virtual LLC information

– # of cache colors: $n = S / (W \cdot P)$, where $S$: cache size, $W$: # of ways, $P$: size of page frame
– $P$ is fixed; $S$ and $W$ are the values to virtualize

  • Trapping and emulating cache-related operations

– x86: executions of the CPUID instruction
– ARM Cortex-A15: accesses to the CCSIDR and CSSELR registers

  • Limitations

– The number of cache colors is restricted to a power of two
– Cannot support a guest OS where page coloring is hard-coded
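The color-count relation on this slide (cache size divided by the product of the number of ways and the page frame size) can be checked against the figures from the vLLC example slide. The sketch below assumes a 4 KiB page frame, which the slides do not state; `num_colors` and `virtual_llc_size` are hypothetical helper names, not part of the authors' implementation.

```python
KIB = 1024


def num_colors(cache_size: int, ways: int, page_size: int = 4 * KIB) -> int:
    """Number of cache colors: cache size / (ways * page frame size)."""
    return cache_size // (ways * page_size)


def virtual_llc_size(colors: int, ways: int, page_size: int = 4 * KIB) -> int:
    """Cache size to report to a guest that is given `colors` host colors.

    Inverts the relation above with the page size held fixed. The color
    count must be a power of two, matching the limitation on the slide.
    """
    if colors <= 0 or colors & (colors - 1):
        raise ValueError("number of cache colors must be a power of two")
    return colors * ways * page_size


# Host LLC from the example: 256 KB, 16-way.
print(num_colors(256 * KIB, ways=16))             # prints 4

# A guest that receives 2 host colors sees a 128 KB, 16-way virtual LLC.
print(virtual_llc_size(2, ways=16) // KIB)        # prints 128
```

Under these assumptions the results match the example: four host colors for the 256KB host LLC, and a 128KB virtual LLC for a guest holding two of them.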

SLIDE 14

vColoring: Virtual Coloring of Cache

  • Technique for guest OSs without page coloring support

– Re-maps guest pages to host pages for the requested cache colors
– Applicable to VMs running closed-source, proprietary guest OSs

[Figure: vColoring steps ① to ④ for a request for color 1, spanning the guest VM and the host machine: the task's page table base address; guest page table traversal over present and user-accessible PTEs; finding the host page mapped to each guest page; page migration of host physical pages from color X to color 1.]

Guest page tables are not changed at all
→ Cache allocation is transparent to the guest OS
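The last two steps (finding the host page behind each guest page and migrating it to a requested color) can be sketched as a toy simulation. Everything here is an assumption for illustration: the dictionary `gpa_to_hpa` stands in for the hypervisor's guest-to-host mapping, `guest_pages` stands in for the result of the guest page table traversal, and colors are taken as host page number modulo a made-up color count. It is not the KVM implementation.

```python
from typing import Dict, List, Set

NUM_COLORS = 4  # hypothetical number of host cache colors


def host_color(host_page: int) -> int:
    """Color of a host page, assuming color = page number mod NUM_COLORS."""
    return host_page % NUM_COLORS


def vcolor_remap(guest_pages: List[int],
                 gpa_to_hpa: Dict[int, int],
                 free_host_pages: List[int],
                 allowed_colors: Set[int]) -> Dict[int, int]:
    """Re-map each guest page to a host page of an allowed color.

    Only the hypervisor-side mapping changes; the guest page numbers
    (the dictionary keys) stay the same, so the guest sees no difference.
    """
    for gpa in guest_pages:
        old = gpa_to_hpa[gpa]
        if host_color(old) in allowed_colors:
            continue  # already backed by a page of a requested color
        new = next((p for p in free_host_pages
                    if host_color(p) in allowed_colors), None)
        if new is None:
            raise MemoryError("no free host page with a requested color")
        free_host_pages.remove(new)
        # A real hypervisor would copy the page contents from old to new here.
        gpa_to_hpa[gpa] = new
        free_host_pages.append(old)  # the old host page becomes free
    return gpa_to_hpa


mapping = {10: 4, 11: 5, 12: 6}   # host colors 0, 1, 2
free_pages = [8, 9, 13, 17]       # host colors 0, 1, 1, 1
result = vcolor_remap([10, 11, 12], mapping, free_pages, allowed_colors={1})
print(result)  # prints {10: 9, 11: 5, 12: 13}
```

After the call every guest page is backed by a host page of color 1, while the guest page numbers are untouched, which is the property that makes the allocation transparent to the guest OS.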

SLIDE 15

Outline

  • Introduction and Motivation
  • Real-Time Cache Management for Multi-Core Virtualization

– System model
– vLLC and vColoring
– Cache management scheme

  • Evaluation
  • Conclusions

SLIDE 16

Allocating Cache Colors to Tasks

  • Two types of cache interference: Inter-VCPU & Intra-VCPU
  • Simple approach 1: Complete cache partitioning (CCP)

– No cache sharing at all
– May result in poor performance due to smaller cache size

  • Simple approach 2: Complete cache sharing (CCS) among tasks on the same VCPU

– No cache sharing between tasks on different VCPUs
– Bounds intra-VCPU interference with Cache-Related Preemption Delay (CRPD)
– May suffer from high CRPD

  • Our approach: Controlled sharing of cache colors on each VCPU

– Goal: finds a cache-to-task allocation that minimizes taskset utilization → NP-hard
– Approximates CRPD caused by task $\tau_i$ to reduce the complexity, assuming all other tasks have been assigned all cache colors

SLIDE 17

Designing a Cache-Aware VM

  • VM’s CPU demand

– The sum of the CPU demands of VCPUs in the VM

  • Our approach: Cache-aware VM designing algorithm (CAVM)

– Phase 1: Allocates cache-sensitive tasks to the same VCPU so that they can benefit from cache sharing

  • After Phase 1, each VCPU has its own taskset

– Phase 2: Derives each VCPU's CPU demands w.r.t. the number of cache colors assigned to it

  • Determines the minimum budget $C_i^v(k)$ for all possible $k$ values

Affected by the allocation of tasks and cache colors to VCPUs

SLIDE 18

Allocating Host Cache Colors to VMs

  • Goal: determines the number of cache colors for each VCPU of the VMs to be consolidated, while minimizing the total VM utilization
  • Our approach: Dynamic programming

[Equation not recovered from the slide. Its annotations read: "Minimum number of cache colors to satisfy timing constraints" and "Finds the maximum utilization gain made by additional cache colors"]
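Since the slide's recurrence is not available in this transcript, the following is only a generic knapsack-style dynamic program in the same spirit, not the authors' formulation. It assumes each VCPU receives its own disjoint set of colors and that a table `util[v][k-1]` gives the utilization of VCPU `v` with `k` colors, with `math.inf` marking color counts below the minimum that satisfies its timing constraints. The function name and the table layout are hypothetical.

```python
import math
from typing import List, Tuple


def allocate_colors(util: List[List[float]],
                    total_colors: int) -> Tuple[float, List[int]]:
    """Split total_colors among VCPUs to minimize the summed utilization.

    util[v][k-1] is the utilization of VCPU v when it holds k colors
    (math.inf if k colors are too few to meet its timing constraints).
    Returns (minimum total utilization, colors given to each VCPU).
    """
    n = len(util)
    # best[v][c]: minimum utilization of the first v VCPUs using exactly c colors
    best = [[math.inf] * (total_colors + 1) for _ in range(n + 1)]
    choice = [[0] * (total_colors + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for v in range(1, n + 1):
        for c in range(total_colors + 1):
            for k in range(1, min(c, len(util[v - 1])) + 1):
                cand = best[v - 1][c - k] + util[v - 1][k - 1]
                if cand < best[v][c]:
                    best[v][c] = cand
                    choice[v][c] = k
    # Using fewer than all colors is allowed; pick the best total.
    c = min(range(total_colors + 1), key=lambda x: best[n][x])
    if math.isinf(best[n][c]):
        raise ValueError("no feasible allocation with the given colors")
    total = best[n][c]
    alloc: List[int] = []
    for v in range(n, 0, -1):      # walk the recorded choices backwards
        alloc.append(choice[v][c])
        c -= choice[v][c]
    alloc.reverse()
    return total, alloc


# Two VCPUs, four colors; utilization drops as each VCPU gets more colors.
table = [[0.875, 0.5, 0.375, 0.375],
         [0.75, 0.5, 0.25, 0.25]]
print(allocate_colors(table, 4))   # prints (1.0, [2, 2])
```

In the example, giving two colors to each VCPU yields a total utilization of 1.0, lower than either 1-and-3 split (1.125), because the largest drop for both VCPUs comes from the second color.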

SLIDE 19

Outline

  • Introduction and Motivation
  • Real-Time Cache Management for Multi-Core Virtualization

– System model
– vLLC and vColoring
– Cache management scheme

  • Evaluation
  • Conclusions

SLIDE 20

Implementation

  • Experimental setup

– x86: Intel i7-2600 (four cores @ 3.4 GHz) → 8 MB LLC, 32 colors
– ARM: Exynos 5422 (four Cortex-A15 cores @ 2 GHz) → 2 MB LLC, 32 colors
– Hypervisor: Implemented in KVM, but applicable to other hypervisors
– Guest OSs: Linux/RK, Vanilla Linux, MS Windows Embedded (x86 only)

  • Implementation overhead

SLIDE 21

vLLC and vColoring

  • Execution times of a synthetic task

[Figure, x86 panel: normalized execution time (%) vs. # of cache colors (4 to 32) for Linux/RK w/ vLLC, Vanilla Linux w/ vColoring, and MS Windows w/ vColoring]

[Figure, ARM panel: normalized execution time (%) vs. # of cache colors (4 to 32) for Linux/RK w/ vLLC and Vanilla Linux w/ vColoring]

SLIDE 22

Cache Management Scheme

  • Experimental results with random tasksets

– Quad-core, 2 VMs, 4 VCPUs per VM, 2 MB LLC, 10 to 15 tasks
– Cache color reload time: 207 μsec (obtained from our ARM board)

  • VM utilization w.r.t. the number of cache colors

[Figure: total VM utilization (lower is better) vs. number of cache colors (16 to 32) for Ours, BFD+CCP, WFD+CCP, FFD+CCP, BFD+CCS, WFD+CCS, and FFD+CCS]

Our scheme yields 1.18x to 1.54x lower utilization

SLIDE 23

Conclusions

  • Real-time cache management for multi-core virtualization
  • vLLC and vColoring

– Hypervisor-level techniques to control cache allocation to individual tasks running in a virtual machine
– Evaluated with Linux/RK, vanilla Linux, and MS Windows Embedded

  • Cache management scheme

– Determines cache-to-task allocation
– Designs a VM in the presence of cache interference
– Minimizes the total utilization of VMs
– Up to 1.54x lower utilization

  • Future work: main memory interference in virtualization

– vColoring: applicable to DRAM bank partitioning

SLIDE 24

Real-Time Cache Management for Multi-Core Virtualization

Thank You

Hyoseung Kim 1,2 Raj Rajkumar 2

1 University of California, Riverside 2 Carnegie Mellon University