EECS 388: Embedded Systems
10. Timing Analysis
Heechul Yun
Agenda
– Execution time analysis
– Static timing analysis
– Measurement based timing analysis
Execution Time Analysis
Will my brake-by-wire system actuate the brakes
Image source: [Wilhelm et al., 2008]
– Loops w/ finite bounds
– No recursion
– Run uninterrupted
– consider loop bounds
– use abstract CPU model
– by summing up the WCETs of the longest path
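The bullets above can be sketched in code. This is a minimal sketch, not a real static analyzer: it assumes the control-flow graph is loop-free and given in topological order, that each loop has already been collapsed into a single block whose cost is its loop bound times its body cost, and that per-block cycle counts come from some abstract CPU model. The names `struct block` and `wcet_longest_path`, and all cycle counts, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64
#define MAX_SUCC   2

/* One basic block of a loop-free control-flow graph (hypothetical layout). */
struct block {
    unsigned cycles;      /* worst-case cycles of the block (abstract CPU model) */
    unsigned loop_bound;  /* max iterations if the block is a collapsed loop, else 1 */
    int succ[MAX_SUCC];   /* indices of successor blocks, -1 if unused */
};

/*
 * Blocks must be in topological order: every successor index is larger
 * than the index of its predecessor. Returns the summed worst-case cycles
 * of the longest path that starts at block 0.
 */
unsigned long wcet_longest_path(const struct block *cfg, size_t n)
{
    unsigned long wcet[MAX_BLOCKS];

    assert(n > 0 && n <= MAX_BLOCKS);

    /* Walk backwards so every successor is finished before its predecessors. */
    for (size_t k = n; k-- > 0; ) {
        unsigned long worst_succ = 0;

        for (int j = 0; j < MAX_SUCC; j++) {
            int s = cfg[k].succ[j];
            if (s < 0)
                continue;
            assert((size_t)s > k && (size_t)s < n);
            if (wcet[s] > worst_succ)
                worst_succ = wcet[s];
        }
        /* Loop bound scales the block cost; the worst successor path is added. */
        wcet[k] = (unsigned long)cfg[k].cycles * cfg[k].loop_bound + worst_succ;
    }
    return wcet[0];
}
```

For a branch whose one arm is a loop of 5 cycles bounded by 8 iterations and whose other arm costs 30 cycles, the loop arm (40 cycles) dominates, so the longest path goes through it.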
Volatile memory Non-volatile memory
CPU: 32 bit RISC-V
Clock: 320 MHz
SRAM: 16 (D) + 16 (I) KB
Flash: 4MB
32 bit data bus
(Image: ct.de/Maik Merten (CC BY SA 4.0))
Image source: PC Watch.
CPU: 4x Cortex-A72@1.5GHz
L2 cache (shared): 1MB
GPU: VideoCore VI@500MHz
DRAM: 1/2/4 GB LPDDR4-3200
Storage: micro-SD
Slide source: Edward A. Lee and Prabal Dutta (UCB)
[Figure: direct-mapped cache. Each set (Set 0, Set 1, ..., Set S) holds one line made of a valid bit, a tag, and a block. The m-bit address is split, from bit m-1 down, into t tag bits, s set-index bits, and b block-offset bits. Each line has 1 valid bit, t tag bits, and B = 2^b bytes per block.]
A “set” consists of one “line.” If the tag of the address matches the tag of the line, then we have a “cache hit.” Otherwise, the fetch goes to main memory, updating the line.
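The hit and miss rule for a direct-mapped cache can be sketched as a small simulator. This is a sketch under stated assumptions: the sizes (4 sets, 16-byte blocks, 32-bit addresses) and the names `struct dm_cache` and `dm_cache_access` are hypothetical, only tags are tracked (no data), and a line must also have its valid bit set before a tag match counts as a hit.

```c
#include <stdbool.h>
#include <stdint.h>

#define DM_S_BITS   2                   /* s: set-index bits (assumed) */
#define DM_B_BITS   4                   /* b: block-offset bits (assumed) */
#define DM_NUM_SETS (1u << DM_S_BITS)

/* One cache line: valid bit and tag; the data block is omitted. */
struct line {
    bool valid;
    uint32_t tag;
};

/* Direct mapped: exactly one line per set. */
struct dm_cache {
    struct line sets[DM_NUM_SETS];
};

/* Returns true on a hit. On a miss the line is refilled with the new tag. */
bool dm_cache_access(struct dm_cache *c, uint32_t addr)
{
    uint32_t set = (addr >> DM_B_BITS) & (DM_NUM_SETS - 1);
    uint32_t tag = addr >> (DM_B_BITS + DM_S_BITS);
    struct line *l = &c->sets[set];

    if (l->valid && l->tag == tag)
        return true;                    /* cache hit */

    /* Cache miss: fetch from main memory and update the line. */
    l->valid = true;
    l->tag = tag;
    return false;
}
```

Two addresses that map to the same set but carry different tags evict each other, which is one reason execution time depends on the memory access pattern.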
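[Figure: direct-mapped cache with two sets (Set 0, Set 1), each holding one line made of a valid bit, a tag, and a block. The 32-bit address is split, from bit m-1 down, into t = 27 tag bits, s = 1 set-index bit, and b = 4 block-offset bits. Each line has 1 valid bit, t tag bits, and B = 2^b bytes per block.]
Four floats per block, four bytes per float, means 16 bytes, so b = 4.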
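The address split for this example (t = 27, s = 1, b = 4 on a 32-bit address) can be written out with shifts and masks. A minimal sketch; the names `struct addr_fields` and `split_address` are hypothetical.

```c
#include <stdint.h>

#define OFFSET_BITS 4   /* b = 4: 16-byte blocks */
#define SET_BITS    1   /* s = 1: two sets */
                        /* t = 32 - 1 - 4 = 27 tag bits remain */

struct addr_fields {
    uint32_t tag;       /* upper 27 bits */
    uint32_t set;       /* 1 bit: Set 0 or Set 1 */
    uint32_t offset;    /* byte position inside the 16-byte block */
};

/* Split a 32-bit address into tag, set index, and block offset. */
struct addr_fields split_address(uint32_t addr)
{
    struct addr_fields f;

    f.offset = addr & ((1u << OFFSET_BITS) - 1);
    f.set    = (addr >> OFFSET_BITS) & ((1u << SET_BITS) - 1);
    f.tag    = addr >> (OFFSET_BITS + SET_BITS);
    return f;
}
```

Assuming a float array that starts at a 16-byte aligned address, elements 0 to 3 share one block, and element 4 lands in the next block, which maps to the other set.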
Autonomous Car.” In RTCSA, 2018
# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
const int sched_prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};
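To see what the weights mean in practice, the sketch below estimates each task's CPU share as its weight divided by the sum of the weights of all runnable tasks on the same core. It is an illustration only, assuming all tasks are always runnable and compete on one core; `nice_to_weight` and `cpu_share` are hypothetical helper names, and the table is a copy of the one above.

```c
#include <assert.h>

/* Copy of the nice-to-weight table shown above. */
static const int prio_to_weight[40] = {
    88761, 71755, 56483, 46273, 36291,
    29154, 23254, 18705, 14949, 11916,
     9548,  7620,  6100,  4904,  3906,
     3121,  2501,  1991,  1586,  1277,
     1024,   820,   655,   526,   423,
      335,   272,   215,   172,   137,
      110,    87,    70,    56,    45,
       36,    29,    23,    18,    15,
};

/* Weight for a nice value in [-20, 19]. */
int nice_to_weight(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return prio_to_weight[nice + 20];
}

/* Fraction of one core that task `idx` gets among `n` runnable tasks. */
double cpu_share(const int *nice, int n, int idx)
{
    long total = 0;

    assert(n > 0 && idx >= 0 && idx < n);
    for (int i = 0; i < n; i++)
        total += nice_to_weight(nice[i]);
    return (double)nice_to_weight(nice[idx]) / (double)total;
}
```

With two tasks at nice 0 and nice 1, the shares are 1024/1844 and 820/1844, roughly 55% and 45%. A nice value therefore shifts the share but does not give a task strict priority over the others.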
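#define MEM_SIZE (4*1024*1024)

char ptr[MEM_SIZE];

int main(void)
{
    long sum = 0;   /* accumulates the bytes that are read */

    while (1) {
        /* read one byte every 64 bytes across the whole buffer */
        for (int i = 0; i < MEM_SIZE; i += 64) {
            sum += ptr[i];
        }
    }
}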
#define MEM_SIZE (4*1024*1024)

char ptr[MEM_SIZE];

int main(void)
{
    while (1) {
        /* write one byte every 64 bytes across the whole buffer */
        for (int i = 0; i < MEM_SIZE; i += 64) {
            ptr[i] = 0xff;
        }
    }
}
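A measurement-based estimate for such a loop can be taken by timing repeated passes and keeping the longest one. The sketch below assumes a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC`; the helper names `now_ns`, `write_pass`, and `worst_observed_ns` are hypothetical. The result is only the worst time observed in these runs, not a guaranteed WCET.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#define MEM_SIZE (4*1024*1024)

static char buf[MEM_SIZE];

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* One bounded pass of the write loop: one store every 64 bytes. */
static void write_pass(void)
{
    volatile char *p = buf;   /* volatile keeps the stores from being optimized away */

    for (int i = 0; i < MEM_SIZE; i += 64)
        p[i] = (char)0xff;
}

/* Runs `runs` passes and returns the longest observed pass time in ns. */
uint64_t worst_observed_ns(int runs)
{
    uint64_t worst = 0;

    for (int r = 0; r < runs; r++) {
        uint64_t start = now_ns();
        write_pass();
        uint64_t elapsed = now_ns() - start;

        if (elapsed > worst)
            worst = elapsed;
    }
    return worst;
}
```

Running the same measurement alone and then next to a memory-intensive task on another core gives a rough view of how much shared-resource contention stretches the observed time.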
[Figure: four cores (Core1, Core2, Core3, Core4) share a Last Level Cache (LLC), a Memory Controller (MC), and DRAM.]
[Figure: Task 1 to Task 4 run on Core1 to Core4, one task per core. Each core has its own I and D blocks; all cores share the Shared Cache, the Memory Controller (MC), and DRAM.]
[Figure: a single CPU running tasks T1 and T2 on top of one memory hierarchy, next to four cores (Core 1 to Core 4) running tasks T1 to T8 on top of one memory hierarchy.]
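[Figure: bar chart of Normalized Execution Time (y-axis ticks 2 to 12), Solo vs. Corun, for DNN (Core 0,1) and BwWrite (Core 2,3). Platform diagram: Core1 to Core4 share the LLC and DRAM, with DNN and BwWrite running on different cores.]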
Waqar Ali and Heechul Yun. “RT-Gang: Real-Time Gang Scheduling Framework for Safety-Critical Systems.” RTAS, 2019
https://youtu.be/Jm6KSDqlqiU