Real-Time Multi/Many-Core Architecture, Heechul Yun

SLIDE 1

Real-Time Multi/Many-Core Architecture

Heechul Yun

SLIDE 2

Real-Time Multi/Many-Core Architecture

  • Projects on Real-Time CPU Architectures
  • Assigned Papers

– Shedding the Shackles of Time-Division Multiplexing. RTSS, 2018
– Deterministic Memory Abstraction and Supporting Multicore System Architecture. ECRTS, 2018

SLIDE 3

Trends in Automotive E/E Systems

  • A. Hamann (Bosch). “Industrial Challenge: Moving from Classical to High-Performance Real-Time Systems.” WATER, 2018.

Source: Bosch

Centralization & High-Performance HW

SLIDE 4

Modern System-on-a-Chip (SoC)

[Figure: SoC block diagram with Core1, Core2, GPU, NPU… sharing a cache, a memory controller (MC), and DRAM]

  • Integrate multiple cores, GPU, accelerators
  • Good performance, size, weight, power
  • Challenges: time predictability

SLIDE 5

Worst-Case Execution Time (WCET)

  • Real-time scheduling theory is based on the assumption of known WCETs of real-time tasks

Image source: [Wilhelm et al., 2008]
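
To illustrate why scheduling theory depends on known WCETs, here is a minimal sketch of the classic Liu and Layland utilization-bound test for rate-monotonic scheduling. It is not taken from the slides. The function name `rm_utilization_test` and the task set are hypothetical. The test takes each task's WCET as an input, so its verdict is only as trustworthy as those values.

```python
def rm_utilization_test(tasks):
    """Sufficient (not necessary) rate-monotonic schedulability test.

    tasks: list of (wcet, period) pairs, both in the same time unit.
    Returns (utilization, bound, passes).
    """
    n = len(tasks)
    utilization = sum(wcet / period for wcet, period in tasks)
    bound = n * (2 ** (1 / n) - 1)  # Liu and Layland bound for n tasks
    return utilization, bound, utilization <= bound


# Hypothetical task set: (WCET, period) in milliseconds.
tasks = [(1, 4), (1, 5), (2, 10)]
u, bound, ok = rm_utilization_test(tasks)
print(f"U = {u:.3f}, bound = {bound:.3f}, passes = {ok}")
```

If the WCET of the first task were really 2 ms instead of 1 ms, the utilization would rise to 0.9 and the same task set would no longer pass the test.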

SLIDE 6

Computing WCET

  • Static analysis

– Input: program code, architecture model
– Output: WCET
– Problem: an accurate architecture model is hard to build, and the resulting bound is pessimistic

  • Measurement

– No guarantee of observing the true worst case
– But widely used in practice
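
A minimal sketch of the measurement approach, assuming a simple "largest observed time" estimate. The helper `measure_max_time` and the `workload` function are hypothetical stand-ins for a real task and test harness.

```python
import time


def measure_max_time(func, inputs, repeats=5):
    """Run func on every input several times; return the largest observed time (seconds)."""
    observed_max = 0.0
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            func(x)
            elapsed = time.perf_counter() - start
            observed_max = max(observed_max, elapsed)
    return observed_max


def workload(n):
    # Hypothetical task under test: execution time depends on the input.
    return sum(i * i for i in range(n))


high_water_mark = measure_max_time(workload, inputs=[10, 1_000, 100_000])
print(f"max observed execution time: {high_water_mark * 1e6:.1f} us")
```

The result covers only the inputs and system states that were actually exercised, so it can underestimate the true WCET. That is the limitation noted above.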

SLIDE 7

Memory Hierarchies, Pipelines, and Buses for Future Architectures in Time-Critical Embedded Systems

IEEE TCAD, 2009

SLIDE 8

“Problematic” CPU Features

  • Architectures are optimized to improve average-case performance
  • WCET estimation is hard because of

– Pipelining
– TLBs/caches
– Superscalar execution
– Out-of-order scheduling
– Branch predictors
– Hardware prefetchers
– Basically anything that affects processor state

SLIDE 9

Static Timing Analysis

[Figure: overview of static timing analysis, from the program's control flow to the processor's timing behavior]

SLIDE 10

Control Flow Graph (CFG)

  • Analyze code
  • Split basic blocks
  • Compute per-block WCET

– Use an abstract CPU model
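
The steps above can be sketched as a longest-path computation over a small control flow graph. This is a simplified illustration, not the method of any particular tool. The CFG, the per-block cycle counts, and the name `wcet_bound` are hypothetical. It assumes an acyclic graph (loops would need iteration bounds) and that per-block WCETs can simply be added.

```python
from functools import lru_cache

# Hypothetical acyclic CFG: basic block -> list of successor blocks.
cfg = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
# Hypothetical per-block WCETs in cycles, as an abstract CPU model might report.
block_wcet = {"entry": 5, "cond": 2, "then": 20, "else": 8, "exit": 3}


def wcet_bound(cfg, block_wcet, start):
    """Longest path, in cycles, from start to any exit block of an acyclic CFG."""

    @lru_cache(maxsize=None)
    def longest_from(block):
        tail = max((longest_from(s) for s in cfg[block]), default=0)
        return block_wcet[block] + tail

    return longest_from(start)


print(wcet_bound(cfg, block_wcet, "entry"))  # 5 + 2 + 20 + 3 = 30
```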

SLIDE 11

Timing Anomalies

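  • Locally faster != globally faster

– A local best case (e.g., a cache hit) can lead to a longer overall execution time than the local worst case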

Image source: [Wilhelm et al., 2008]

SLIDE 13

Challenge: Shared Memory Hierarchy

  • Memory performance varies widely due to interference
  • Task WCET can be extremely pessimistic

[Figure: Task 1 to Task 4 run on Core1 to Core4, each core with its own I and D caches; all cores share the cache, the memory controller (MC), and DRAM]

SLIDE 14

Effect of Memory Interference

  • DNN control task suffers >10X slowdown

– When different tasks are co-scheduled on the idle cores

[Figure: bar chart of normalized execution time (y-axis 2 to 12), solo vs. co-run, for DNN (Core 0,1) and BwWrite (Core 2,3). Platform diagram: Core1 to Core4 sharing the LLC and DRAM, running DNN and BwWrite]

Waqar Ali and Heechul Yun. “RT-Gang: Real-Time Gang Scheduling Framework for Safety-Critical Systems.” RTAS, 2019 (to appear)

SLIDE 15

Cache Denial-of-Service Attacks

Michael G. Bechtel and Heechul Yun. “Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention.” In RTAS, 2019 (to appear, Outstanding Paper Award)

[Figure: Core1 to Core4 sharing the LLC; one core runs the victim, the others run attackers]

  • Observed worst-case: >300X slowdown

– On simple in-order multicores (Raspberry Pi3, Odroid C2)

Difficult to guarantee predictable timing

SLIDE 16

Real-Time CPU Architectures

  • PRET

– UC Berkeley.

  • MERASA/parMERASA project

– EU

  • ACROSS

– EU

  • ARAMIS

– Germany

  • EMC2

– EU

SLIDE 17

FlexPRET: A Processor Platform for Mixed-Criticality Systems

RTAS, 2014

SLIDE 19

PRET Pipeline

[Figure: thread-interleaved pipeline diagram over time t. Stages: FETCH, DECODE, REGACC, MEM, EXECUTE, EXCEPT. One instruction enters per clock, rotating through THREAD#1 to THREAD#6, so Thread 1, Instruction 2 is fetched only after Thread 1, Instruction 1 has passed through all six stages]
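
A minimal sketch of the thread-interleaved pipeline, assuming six hardware threads served in strict round-robin order, one fetch per clock, and no stalls. The stage names follow the slide. The function `pipeline_snapshot` is hypothetical.

```python
STAGES = ["FETCH", "DECODE", "REGACC", "MEM", "EXECUTE", "EXCEPT"]
NUM_THREADS = 6


def pipeline_snapshot(cycle):
    """Map each stage to the (thread, instruction) it holds at the given cycle.

    Returns None for a stage that is still empty while the pipeline fills.
    """
    snapshot = {}
    for depth, stage in enumerate(STAGES):
        issue_cycle = cycle - depth  # cycle in which this instruction was fetched
        if issue_cycle < 0:
            snapshot[stage] = None
            continue
        thread = issue_cycle % NUM_THREADS + 1
        instruction = issue_cycle // NUM_THREADS + 1
        snapshot[stage] = (thread, instruction)
    return snapshot


for cycle in range(8):
    print(cycle, pipeline_snapshot(cycle))
```

Under these assumptions every stage holds an instruction from a different thread at any given cycle, so instructions in flight never depend on each other, and Thread 1 fetches its second instruction at cycle 6.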

SLIDE 20

FlexPRET Pipeline

SLIDE 21

Hardware Support for WCET Analysis of Hard Real-Time Multicore Systems

ISCA 2009

SLIDE 22

Analyzable Multicore Architecture

  • Idea 1: Bound interference on shared resources

– On-chip shared bus
– (Shared) L2 cache

  • Idea2: WCET computation mode

SLIDE 23

Architecture

SLIDE 24

Round-Robin Bus Arbitration

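  • $UBD = (N_{HRT} - 1) \times L_{bus}$

– UBD: upper bound on the delay of one bus request; $N_{HRT}$: number of hard real-time tasks competing for the bus; $L_{bus}$: bus latency of one request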
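
A small sketch of the bound, assuming a round-robin arbiter that has just served the requester under analysis and that every granted request holds the bus for exactly `l_bus` cycles. The function names and the numbers (4 requesters, 9 cycles) are hypothetical.

```python
def upper_bound_delay(n_hrt, l_bus):
    """UBD = (N_HRT - 1) * L_bus: worst-case wait of one request."""
    return (n_hrt - 1) * l_bus


def round_robin_wait(n_hrt, l_bus, pending):
    """Wait of requester 0 when the arbiter has just passed it.

    pending[i] is True if requester i has a request queued. The arbiter polls
    requesters 1 .. n_hrt - 1 in order before it returns to requester 0.
    """
    wait = 0
    for requester in range(1, n_hrt):
        if pending[requester]:
            wait += l_bus  # the bus is held for one full transaction
    return wait


print(upper_bound_delay(4, 9))                            # 27
print(round_robin_wait(4, 9, [True, True, True, True]))   # 27, the worst case
print(round_robin_wait(4, 9, [True, False, True, False])) # 9
```

The bound is reached only when every other requester has a request pending, which is why the actual wait is often much shorter than the UBD.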

SLIDE 25

Request vs. Job-level WCET Analysis

  • Request-level analysis

– Assume worst-case interference for each access of the task under analysis
– Pessimistic, as not all accesses will suffer interference

  • Job-level analysis

– Assume the total number of competing memory accesses is known
– Can reduce pessimism
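
The difference between the two analyses can be sketched numerically. This assumes a round-robin bus on which each competing access delays the task by at most one bus slot of `l_bus` cycles. The function names and all numbers are hypothetical.

```python
def request_level_bound(own_accesses, n_cores, l_bus):
    """Every access of the task is assumed to wait for all other cores."""
    return own_accesses * (n_cores - 1) * l_bus


def job_level_bound(own_accesses, n_cores, l_bus, competing_accesses):
    """The delay is also capped by the number of competing accesses that exist."""
    interfering = min(own_accesses * (n_cores - 1), competing_accesses)
    return interfering * l_bus


own, cores, latency = 1000, 4, 9  # hypothetical task and platform
print(request_level_bound(own, cores, latency))   # 27000 cycles
print(job_level_bound(own, cores, latency, 500))  # 4500 cycles
```

With only 500 competing accesses during the job, the job-level bound is six times smaller here. The two bounds coincide once the competing cores issue at least 3000 accesses.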

SLIDE 26

Summary

  • Timing anomalies

– Locally fast != globally fast on architectures that are not timing-compositional (i.e., most architectures)

  • Timing compositional architecture

– Free of timing anomalies

SLIDE 27

Discussion

  • Why is this interesting?
  • Are assumptions realistic?

– Task model
– Cache model
– Memory model
– CPU (pipeline) model

SLIDE 29

Atomic vs. Split-Transaction Bus

  • J. P. Shen and M. H. Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. Waveland Press, 2013.

SLIDE 30

Announcement

  • Mini Project #1
  • DeepPicar Competition

– Build a self-driving car
– Based on DeepPicar
– Competition format

SLIDE 31

Acknowledgement

  • Some slides are from:

– Prof. Rodolfo Pellizzoni, University of Waterloo
– Prof. Edward A. Lee, University of California, Berkeley
