Administrivia
- Mini project deadline: today
  – Attach the capture of the evaluation run output
- Guest lecture on Friday
  – "Algorithmic Verification of Stability of Hybrid Systems" by Dr. Pavithra Prabhakar (K-State)
– Original research
– Building a cyber-physical system (robot) on a selected hardware platform
– Repeating the evaluation of a chosen paper
– Lots of sensor data to process
– More performance, less cost
– Save space, weight, and power (SWaP)
[Figure: unicore CPU (threads T1–T2 on one core with its memory hierarchy) vs. multicore CPU (threads T1–T8 across Cores 1–4 sharing the memory hierarchy)]
– Bigger, more complex applications
– Large amounts of data processing
– Out-of-order core
– Multicore
– Accelerator
[Figure: Core1–Core4 with per-core LLC slices, connected through a memory controller to DRAM]
[Figure: quad-core die with Cores 0–3, per-core L2 Caches 0–3, a shared L3 cache, the DRAM interface and DRAM memory controller, and DRAM banks]
This slide is from Prof. Onur Mutlu
[Figure: processor with two memory channels, each connecting to a DIMM (dual in-line memory module)]
[Figure: a DIMM in side view; the front of the DIMM holds Rank 0 (a collection of 8 chips) and the back holds Rank 1]
[Figure: Rank 0 (front) and Rank 1 (back) share the memory channel: the Data <0:63> and Addr/Cmd lines are common, and chip-select CS <0:1> picks the rank]
[Figure: Rank 0's 64-bit data bus Data <0:63> is split across Chips 0–7, each chip driving 8 bits: Chip 0 → <0:7>, Chip 1 → <8:15>, …, Chip 7 → <56:63>]
[Figure: Chip 0 contains multiple banks (Bank 0, …), all sharing the chip's 8-bit interface <0:7>]
[Figure: Bank 0 internals: 16K rows (row 0 … row 16k-1), each 2kB wide; an access activates a row into the row buffer, from which 1B columns are read out over the chip's <0:7> interface]
[Figure: physical memory space (0x00 … 0xFFFF…F) divided into 64B cache blocks; the block at 0x40 maps to Channel 0, DIMM 0, Rank 0]
[Figure: within Rank 0, the 64B cache block is striped across Chips 0–7 (<0:7>, <8:15>, …, <56:63> of Data <0:63>); each column access to the open row delivers 8B (1B per chip), so consecutive columns (Row 0, Col 0, Col 1, …) supply consecutive 8B pieces of the block]
A 64B cache block takes 8 I/O cycles to transfer. During the process, 8 columns are read sequentially.
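One plausible way to sketch this address mapping in code (the exact bit layout is an illustrative assumption; real memory controllers choose their own interleaving):

```python
# Sketch: decode a physical address into DRAM coordinates.
# The bit layout is an illustrative assumption, not any real controller's map:
# 8B beats on the 64-bit bus, 8 banks, rows of 2kB per chip
# (so 2048 rank-level 8B columns per row).
def decode(addr):
    byte_in_bus = addr & 0x7            # which byte of the 8B data bus
    column      = (addr >> 3) & 0x7FF   # 2048 8B columns per row
    bank        = (addr >> 14) & 0x7    # assume 8 banks per rank
    row         = addr >> 17
    return {"row": row, "bank": bank, "column": column, "byte": byte_in_bus}

# The 8 consecutive 8B beats of the 64B cache block at 0x40 read 8
# consecutive columns of the same row, matching the burst described above.
beats = [decode(0x40 + 8 * i) for i in range(8)]
assert all(b["row"] == 0 and b["bank"] == 0 for b in beats)
assert [b["column"] for b in beats] == list(range(8, 16))
```

Under this layout, any two addresses in the same 64B block differ only in the column and byte fields, which is why one cache-line fill stays within a single open row.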
[Figure: Core1–Core4 share the L3 cache, the memory controller (MC), and a DRAM DIMM with Banks 1–4]
– DRAM banks can be accessed in parallel
– Example device: DDR3 1333MHz
– Achieved bandwidth is less than the peak. How much?
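For reference, the theoretical peak of the DDR3 1333MHz part mentioned above is just the transfer rate times the bus width (standard DDR3 figures, quoted nominally):

```python
# DDR3-1333: 1333 MT/s (mega-transfers per second) on a 64-bit (8B) data bus.
transfers_per_sec = 1333 * 10**6
bus_bytes = 8                       # 64-bit channel
peak_bw_gb = transfers_per_sec * bus_bytes / 1e9
print(round(peak_bw_gb, 1))  # → 10.7 (GB/s theoretical peak per channel)
```

Real workloads achieve only a fraction of this because of row misses, bank conflicts, and bus turnaround overheads, which is the point of the question above.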
[Figure: Banks 1–4, each with rows (Row 1 … Row 5) and its own row buffer; READ (Bank 1, Row 3, Col 7) activates Row 3 into Bank 1's row buffer, reads/writes Col 7, and precharge closes the row]
– Row miss: 19 cycles; row hit: 9 cycles (*)
(*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting)
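The hit/miss asymmetry can be captured in a toy model, using the slide's figures (9-cycle hit, 19-cycle miss) and assuming one open row per bank:

```python
# Toy per-bank row-buffer model with the slide's latencies:
# row hit = 9 cycles, row miss (precharge + activate + read) = 19 cycles.
ROW_HIT, ROW_MISS = 9, 19

def access_latency(row_buffer, bank, row):
    """Return the latency of one access and update the bank's open row."""
    if row_buffer.get(bank) == row:
        return ROW_HIT                  # row already in the row buffer
    row_buffer[bank] = row              # close the old row, activate the new one
    return ROW_MISS

rb = {}
trace = [(1, 3), (1, 3), (1, 5)]        # (bank, row) access pairs
print([access_latency(rb, b, r) for b, r in trace])  # → [19, 9, 19]
```

The second access hits the open row and takes less than half the time of the first; the third targets a different row of the same bank, forcing a full miss again.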
Kim et al., “Bounding Memory Interference Delay in COTS-based Multi-Core Systems,” RTAS’14
– Translate requests to DRAM command sequences
– Timing constraints: e.g., minimum write-to-read delay, activation time, …
– Resource conflicts: bank, bus, channel
– Buffering, reordering, pipelining in scheduling requests
– Buffer read/write requests from CPU cores
– Unpredictable queuing delay due to reordering
Bruce Jacob et al., "Memory Systems: Cache, DRAM, Disk," Fig. 13.1.
Initial queue (2 row switches):
  1. Core1: READ Row 1, Col 1
  2. Core2: READ Row 2, Col 1
  3. Core1: READ Row 1, Col 2
Reordered queue (1 row switch):
  1. Core1: READ Row 1, Col 1
  2. Core1: READ Row 1, Col 2
  3. Core2: READ Row 2, Col 1
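The reordering above is what a first-ready, first-come-first-served (FR-FCFS) scheduler produces: among queued requests it prefers row-buffer hits, and breaks ties by age. A minimal sketch (data structures are illustrative) reproducing the example:

```python
# Minimal FR-FCFS sketch: serve a request to the open row if one exists,
# otherwise serve the oldest request; count the resulting row switches.
def fr_fcfs(queue):
    """queue: list of (core, row, col) in arrival order.
    Returns (service order, number of row switches)."""
    pending = list(queue)
    order, open_row, switches = [], None, 0
    while pending:
        hits = [r for r in pending if r[1] == open_row]
        req = hits[0] if hits else pending[0]   # row hit first, else oldest
        if req[1] != open_row:
            if open_row is not None:
                switches += 1                   # precharge + activate needed
            open_row = req[1]
        pending.remove(req)
        order.append(req)
    return order, switches

initial = [("Core1", 1, 1), ("Core2", 2, 1), ("Core1", 1, 2)]
order, switches = fr_fcfs(initial)
print(switches)  # → 1 (vs. 2 row switches when served strictly in arrival order)
```

Note that Core2's request is delayed behind a younger request from Core1; this throughput-oriented reordering is exactly the source of the unpredictable queuing delay noted above.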
– Open-page policy: keep the row open after an access
– Close-page policy: close the row after an access
– Each memory request accesses ALL banks
– Each core has dedicated DRAM banks (bank partitioning)
– Use analysis-friendly scheduling (e.g., round-robin)
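Bank partitioning can be sketched as page coloring: the OS restricts each core's physical pages so the bank-index bits fall in that core's private banks. The bit positions below are illustrative assumptions; the real ones depend on the platform's address map:

```python
# Sketch of software bank partitioning via page coloring.
# Assumption: the bank index occupies physical-address bits 14..16
# (8 banks); real DRAM address maps vary and must be reverse-engineered.
BANK_SHIFT, NUM_BANKS = 14, 8

def bank_of(addr):
    """Bank index of a physical address under the assumed map."""
    return (addr >> BANK_SHIFT) & (NUM_BANKS - 1)

def page_allowed(page_addr, core_banks):
    """OS-side check: give this physical page to the core only if it
    maps into one of the core's private banks."""
    return bank_of(page_addr) in core_banks

core0_banks = {0, 1}                 # core 0 owns banks 0 and 1
page = 0x4000                        # example physical page base address
print(page_allowed(page, core0_banks))  # → True (page maps to bank 1)
```

With such an allocator, requests from different cores never conflict on a bank's row buffer, which removes one source of inter-core interference at the cost of reduced bank-level parallelism per core.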