

SLIDE 1

ECRTS 2013

A Coordinated Approach for Practical OS-Level Cache Management in Multi-Core Real-Time Systems

Hyoseung Kim, Arvind Kandhalu, Prof. Raj Rajkumar

Electrical and Computer Engineering Carnegie Mellon University

SLIDE 2

Why Multi-Core Processors?

  • Processor development trend

– Increasing overall performance by integrating multiple cores

  • Embedded systems: actively adopting multi-core CPUs
  • Automotive:

– Freescale i.MX6 quad-core CPU
– Qorivva dual-core ECU

  • Avionics and defense:

– COTS multi-core processors, e.g., rugged Intel i7-based single-board computers


SLIDE 3

Multi-Core CPUs for Real-Time Systems

  • Large shared cache in COTS multi-core processors
  • Use of shared cache in real-time systems

– Reduce task execution times
– Consolidate more tasks on a single multi-core processor
– Implement a cost-efficient real-time system

Examples: Intel Core i7 (8-15 MB L3 cache), Freescale i.MX6 (1 MB L2 cache)


SLIDE 4

Uncontrolled Shared Cache

  • 1. Inter-core interference: tasks running on different cores contend for the shared cache (40% slowdown*)
  • 2. Intra-core interference: tasks time-sharing the same core interfere in the cache (27% slowdown*)

[Diagram: four cores C1-C4, each with private L1/L2 caches, all sharing the L3 cache]

Uncontrolled use of the shared cache → severely degrades the predictability of real-time systems

* PARSEC benchmarks on an Intel i7


SLIDE 5

Cache Partitioning

  • Page coloring (S/W cache partitioning)

– Can be implemented on COTS multi-core processors
– Provides cache performance isolation among tasks

Address mapping (task virtual address → physical address → cache location):

– Virtual address: virtual page # | page offset (g bits; page size 2^g)
– Physical address: physical page # | page offset
– Cache mapping: set index (s bits; 2^s sets) | line offset (l bits; cache line 2^l)
– Color index: the (s + l - g) bits where the physical page # and the set index overlap
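The color-index arithmetic can be checked numerically. A minimal sketch (Python, not part of the original slides); the 2048-set figure is an assumption about the indexed L3 slice, chosen so the result matches the 32 partitions used later in the evaluation:

```python
def num_colors(page_size, line_size, num_sets):
    # colors = 2^(s + l - g): the bits shared by the physical page
    # number and the cache set index
    g = (page_size - 1).bit_length()   # page size  = 2^g
    l = (line_size - 1).bit_length()   # cache line = 2^l
    s = (num_sets - 1).bit_length()    # sets       = 2^s
    return 1 << (s + l - g)

def color_of(phys_page_number, colors):
    # a physical page's color is the low-order color bits of its page number
    return phys_page_number % colors

# 4 KB pages, 64 B lines, 2048 indexed sets -> 32 colors
print(num_colors(4096, 64, 2048))
```

Consecutive physical pages cycle through the colors, which is what makes OS-level page allocation a practical partitioning knob.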


SLIDE 6

Problems with Page Coloring (1/2)

  • 1. Memory co-partitioning problem

– Physical pages are grouped into memory partitions by color
– Memory usage ≠ cache usage

[Diagram: color indices 0-31 define 32 cache partitions; each task's virtual pages map to physical pages drawn from the corresponding memory partitions]

If τ2's memory usage < 2 memory partitions → memory wastage
If τ1's memory usage > 1 memory partition → page swapping or memory pressure
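The co-partitioning arithmetic is easy to make concrete. A small sketch using the evaluation's parameters (1 GB memory, 32 colors); the 40 MB task is a hypothetical example, not from the slides:

```python
def mem_partition_size_mb(total_mem_mb, num_colors):
    # page coloring splits physical memory evenly across the colors
    return total_mem_mb // num_colors

def wastage_mb(task_mem_mb, n_partitions, part_size_mb):
    # memory reserved for a task but left unused when its cache needs
    # force it to hold more partitions than its footprint requires
    return n_partitions * part_size_mb - task_mem_mb

# 1 GB / 32 colors -> 32 MB memory partitions; a 40 MB task that is
# given 2 partitions for cache reasons strands 24 MB of memory
print(wastage_mb(40, 2, mem_partition_size_mb(1024, 32)))
```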


SLIDE 7

Problems with Page Coloring (2/2)

  • 2. Limited number of cache partitions

– Results in degraded performance as the number of tasks increases
– The number of tasks cannot exceed the number of cache partitions

[Diagram: 32 cache partitions (color indices 0-31), each dedicated to one of 32 tasks τ1-τ32]


SLIDE 8

Our Goals

  • Challenges

– Uncontrolled shared cache: cache interference penalties
– Cache partitioning (page coloring):

  • Memory co-partitioning → memory wastage or shortage
  • Limited number of cache partitions

  • Key idea: Controlled sharing of partitioned caches while maintaining timing predictability

1. Provide predictability on multi-core real-time systems
2. Mitigate the problems of memory co-partitioning and limited partitions
3. Allocate cache partitions efficiently


SLIDE 9

Outline

  • Motivation
  • Coordinated Cache Management

– System Model
– Per-core Cache Reservation
– Reserved Cache Sharing
– Cache-Aware Task Allocation

  • Evaluation
  • Conclusion


SLIDE 10

System Model

  • Task model τj : (Cj^q, Tj, Dj, Mj)

– Cj^q: Worst-case execution time (WCET) of task τj when it runs alone in a system with q cache partitions
– Tj: Period of task τj
– Dj: Relative deadline of task τj
– Mj: Maximum physical memory requirement of task τj

  • Partitioned fixed-priority preemptive scheduling
  • Assumptions

– Tasks do not self-suspend
– Tasks do not share memory

[Graph: WCET vs. number of cache partitions (1-6)]

→ Cj^q is non-increasing with q
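The task model above can be written down directly. A minimal sketch (the field names are mine, not the paper's notation); note how more cache partitions lower a task's utilization because Cj^q is non-increasing in q:

```python
from dataclasses import dataclass

@dataclass
class Task:
    wcet: dict       # q -> Cj^q: WCET when running alone with q cache partitions
    period: float    # Tj
    deadline: float  # Dj
    mem_pages: int   # Mj: maximum physical memory requirement (in pages)

    def util(self, q):
        # utilization shrinks (or stays equal) as more cache
        # partitions are assigned, since Cj^q is non-increasing in q
        return self.wcet[q] / self.period

t = Task(wcet={1: 10, 2: 8, 4: 6}, period=20, deadline=20, mem_pages=100)
print(t.util(1), t.util(4))
```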


SLIDE 11

Coordinated Cache Management

  • Mechanisms for controlled sharing of cache partitions, plus a policy module controlling them:

  • 1. Per-core Cache Reservation
  • 2. Reserved Cache Sharing
  • 3. Cache-Aware Task Allocation

[Diagram: task parameters τi: (Ci^p, Ti, Di, Mi) feed the coordinated cache manager, which sits on top of page coloring (cache partitioning) and partitioned fixed-priority scheduling across cores 1..NC, managing memory/cache partitions 1..NP with bounded penalties]

– Per-core cache reservation → prevents inter-core cache interference
– Reserved cache sharing → mitigates the problems with page coloring

Considerations:
1. Preserving schedulability
2. Guaranteeing memory requirements


SLIDE 12

Intra-Core Cache Interference

  • 1. Cache warm-up delay

– Occurs at the beginning of each period of a task
– Caused by the executions of other tasks while the task is inactive

  • 2. Cache-related preemption delay

– Occurs when a task is preempted by a higher-priority task
– Imposed on the preempted task

[Timeline: when the higher-priority task arrives, it preempts the running task; the preempted task pays a cache warm-up delay at the start of its period and a cache-related preemption delay when it resumes]

Our RT-test:
– Bounds intra-core cache interference
– Is independent of the specific cache analysis used
– Allows estimating each task's WCET in isolation from other tasks
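The slide does not reproduce the paper's RT-test, so purely as an illustration of the idea, here is a standard iterative response-time recurrence augmented with warm-up and per-preemption cache-delay terms. All parameters are hypothetical, and the recurrence assumes the core's utilization is below 1 so the iteration converges:

```python
from math import ceil

def response_time(C, warmup, hp):
    # C: WCET of the task under analysis (with its cache partitions)
    # warmup: its cache warm-up delay, charged once per job
    # hp: [(Cj, Tj, crpd_j)] for higher-priority tasks on the same core,
    #     where crpd_j bounds the cache-related preemption delay one
    #     preemption by that task imposes on lower-priority tasks
    R = C + warmup
    while True:
        R_next = C + warmup + sum(ceil(R / Tj) * (Cj + d) for Cj, Tj, d in hp)
        if R_next == R:
            return R
        R = R_next

# one higher-priority task (Cj=2, Tj=5, crpd=1) interfering with C=4, warmup=1
print(response_time(4, 1, [(2, 5, 1)]))
```

Checking the converged value against the deadline then gives the schedulability verdict for that task.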


SLIDE 13

Page Allocation for Cache Sharing

  • Sharing cache partitions = Sharing memory partitions

– Cache sharing can be restricted by task memory requirements
– Depends on how pages are allocated

  • Our approach

– Allocate pages to a task from memory partitions in round-robin order

[Diagram: task τ1's 8 virtual pages are allocated round-robin from its 2 memory partitions (color indices 0 and 1), 4 pages from each]

→ Bounds the worst-case memory usage in a memory partition
→ Developed a memory feasibility test for cache-partition sharing
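The round-robin policy above can be sketched in a few lines (an illustration, not the Linux/RK implementation); the point is that a task with M pages spread over k partitions puts at most ceil(M / k) pages in any one of them:

```python
from math import ceil

def allocate_round_robin(n_pages, partitions):
    # deal the task's pages to its memory partitions in round-robin order
    alloc = {p: 0 for p in partitions}
    for i in range(n_pages):
        alloc[partitions[i % len(partitions)]] += 1
    return alloc

def worst_case_pages_per_partition(n_pages, n_partitions):
    # round-robin bounds any single partition's usage by ceil(M / k)
    return ceil(n_pages / n_partitions)

# 8 pages over 2 partitions -> 4 from each, as on the slide
print(allocate_round_robin(8, [0, 1]))
```

This bound is what makes a memory feasibility test for shared cache partitions possible: each sharer's worst-case footprint per partition is known in advance.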


SLIDE 14

Coordinated Cache Management

[Same architecture diagram as Slide 11: task parameters τi: (Ci^p, Ti, Di, Mi), page coloring (cache partitioning), and partitioned fixed-priority scheduling across cores 1..NC, now highlighting the third component]

Cache-Aware Task Allocation
→ Algorithm to allocate tasks and cache partitions to cores


SLIDE 15

Cache-Aware Task Allocation (1/2)

  • Objectives

– Reduce the number of cache partitions required for a given taskset

  • Remaining cache partitions → non-real-time tasks
  • Saving CPU usage

– Exploit the benefits of cache sharing

  • Our approach

– Based on the BFD (best-fit decreasing) bin-packing heuristic

  • Load concentration is helpful for cache sharing

– Gradually assign caches to cores while allocating tasks to cores

  • Use cache reservation and cache sharing during task allocation


SLIDE 16

  • Step 1: Each core is initially assigned zero cache partitions
  • Step 2: Find the core where the task fits best
  • Step 3: If not found, try to find the best-fit core for the task, assuming each core has 1 more cache partition than before
  • Step 4: Once found, the best-fit core is assigned the task and the assumed cache partition(s)


Cache-Aware Task Allocation (2/2)

[Example: tasks τ1 (util 0.7), τ2 (0.4), τ3 (0.3), τ4 (0.2) and cores 1-4 with assigned cache partitions; giving τ1 more cache partitions decreases its utilization (Ui = Ci / Ti), and a core's remaining space (e.g., 0.3, or 0.5 when periods are harmonic) can take further tasks]
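The four steps can be sketched as code. This is an illustrative simplification, not the paper's algorithm: it uses a plain utilization bound (load ≤ 1) where the paper applies its response-time and memory feasibility tests, and the task data is hypothetical:

```python
def cache_aware_allocate(tasks, n_cores, n_total_parts):
    # tasks[i] is a dict {q: utilization with q cache partitions};
    # u() falls back to the largest profiled q, since Cj^q (and hence
    # utilization) is non-increasing in q
    u = lambda t, q: t[min(q, max(t))]
    cores = [{"tasks": [], "parts": 0} for _ in range(n_cores)]  # step 1
    used = 0
    # best-fit decreasing: place heaviest tasks (utilization at q=1) first
    for ti in sorted(range(len(tasks)), key=lambda i: -tasks[i][1]):
        placed, extra = False, 0
        while not placed and used + extra <= n_total_parts:
            best, best_slack = None, None
            for c in cores:                      # step 2: best-fit search
                q = c["parts"] + extra
                if q == 0:                       # a task needs >= 1 partition
                    continue
                load = sum(u(tasks[j], q) for j in c["tasks"]) + u(tasks[ti], q)
                slack = 1.0 - load               # utilization bound stands in
                if slack >= 0 and (best_slack is None or slack < best_slack):
                    best, best_slack = c, slack
            if best is None:
                extra += 1                       # step 3: assume one more partition
            else:                                # step 4: commit task + partitions
                best["tasks"].append(ti)
                best["parts"] += extra
                used += extra
                placed = True
        if not placed:
            return None                          # taskset not schedulable this way
    return cores

tasks = [{1: 0.7, 2: 0.5}, {1: 0.4, 2: 0.3}, {1: 0.3, 2: 0.2}, {1: 0.2, 2: 0.15}]
print(cache_aware_allocate(tasks, 2, 4))
```

Because load is concentrated (best fit) rather than balanced, tasks packed on the same core can share that core's partitions, which is exactly what lowers the total partition count.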


SLIDE 17

Outline

  • Motivation
  • Coordinated Cache Management

– Task model
– Per-core Cache Reservation
– Reserved Cache Sharing
– Cache-Aware Task Allocation

  • Evaluation
  • Conclusion


SLIDE 18

Implementation

  • Based on Linux/RK Memory Reservation

– Page pool stores unallocated physical pages
– Classifies pages into memory partitions by their color indices


[Diagram: the page pool of Linux/RK memory reservation keeps a mem-partition header and its pages for each cache color index 1..NP; for each real-time task τi: (Ci^p, Ti, Di, Mi), a CPU/memory reserve is created with its cache partitions, using the task's memory requirement Mi = m pages, its cache color indices, and its core index]


SLIDE 19

Experimental Setup

  • Target system and system parameters

– Implemented in Linux/RK (Linux 2.6)
– Intel i7-2600 quad-core processor → NC = 4 cores
– 8 MB shared L3 cache → NP = 32 cache partitions
– Physical memory (Mtotal): 1 GB / 2 GB → mem-partition size: 32 MB / 64 MB
– Number of tasks: n = {8, 12, 16}

  • Task functions are from the PARSEC benchmarks
  • Mixture of cache-sensitive and cache-insensitive tasks
  • Cj^q and Mj for tasks are estimated ahead of time


SLIDE 20

Evaluation Methodology

  • Metrics

1. Cache partition usage
2. CPU utilization

  • Evaluated schemes

1. BFD: Best-Fit Decreasing + Page Coloring
2. WFD: Worst-Fit Decreasing + Page Coloring

  • No cache partition sharing

3. CATA: Our scheme (Cache-Aware Task Allocation)


SLIDE 21

Cache Partition Usage

  • Minimum amount of cache required to schedule given tasksets

CATA requires 12-25% fewer cache partitions than BFD and WFD

[Bar chart: cache usage (%) under BFD, WFD, and CATA for 8/12/16 tasks at Mtotal = 1 GB and 2 GB; smaller is better; some BFD/WFD configurations are N/A]

Fewer cache partitions → fewer memory partitions → mitigates the memory wastage of page coloring


SLIDE 22

CPU Utilization

  • Total accumulated CPU utilization required to schedule given tasksets

– Same number of cache partitions is used (NP = 32)

[Bar chart: total CPU utilization (%) under BFD, WFD, and CATA for 8/12/16 tasks at Mtotal = 1 GB and 2 GB; smaller is better; CATA's savings per taskset size: 16-32%, 35-44%, 49% (1 GB) and 14-29%, 30-38%, 40-41% (2 GB)]

CATA requires 14-49% less CPU utilization than BFD and WFD
More tasks → larger utilization benefit → mitigates the limited availability of cache partitions
Our scheme efficiently allocates cache partitions → mitigates the two problems with page coloring


SLIDE 23

Conclusions

  • Multi-core CPUs for real-time systems

– Uncontrolled shared cache: temporal interference among tasks
– Page coloring: memory wastage/shortage, limited number of partitions

  • Coordinated OS-Level Cache Management

– No special H/W support, no modifications to application S/W
– Per-core cache reservation & reserved cache sharing

  • Preserves task schedulability
  • Guarantees task memory requirements

– Cache-aware task allocation

  • Determines efficient task and cache allocation
  • Yields 9-18% improvement in utilization on real platforms


SLIDE 24

Linux/RK

  • https://rtml.ece.cmu.edu/redmine/projects/rk/


  • x86 (32/64bit)
  • ARM (Cortex-A9)
  • Global/Partitioned scheduling
  • CPU/Mem reservation
  • Cache/Bank coloring
  • Task profiling mechanism