Memory Hierarchy Design
Chapter 5 and Appendix C
Overview
Problem
– CPU vs Memory performance imbalance
Solution
– Driven by temporal and spatial locality
– Memory hierarchies
memories
slower secondary storage
the higher levels
[Figure: CPU (400MHz), Cache, Bus (66MHz), and Main Memory (10MHz); labels: Data object transfer, Block transfer]
[Figure: Cache and Main Memory, with Tag]
Size     LRU     Random   LRU     Random   LRU     Random
16KB     5.18%   5.69%    4.67%   5.29%    4.39%   4.96%
64KB     1.88%   2.01%    1.54%   1.66%    1.39%   1.53%
256KB    1.15%   1.17%    1.13%   1.13%    1.12%   1.12%
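The table compares LRU and Random replacement. As a rough illustration of what LRU means for a single cache set, the sketch below keeps a timestamp per way and evicts the way with the oldest one. The names `cache_set_t` and `access_lru`, the 4-way associativity, and the timestamp scheme are all assumptions made for this example; real hardware usually approximates LRU with fewer bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WAYS = 4 };  /* assumed associativity, not taken from the slides */

typedef struct {
    uint32_t tag[WAYS];
    int      valid[WAYS];
    unsigned last_used[WAYS];  /* timestamp of the most recent access */
    unsigned clock;            /* incremented on every access */
} cache_set_t;

/* Access one tag in the set. Returns 1 on a hit, 0 on a miss.
   On a miss, the first invalid way is filled; if all ways are valid,
   the least recently used way is replaced. */
static int access_lru(cache_set_t *s, uint32_t tag)
{
    assert(s != NULL);
    int victim = 0;
    s->clock++;

    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->last_used[w] = s->clock;   /* refresh recency on a hit */
            return 1;
        }
    }
    for (int w = 0; w < WAYS; w++) {
        if (!s->valid[w]) { victim = w; break; }
        if (s->last_used[w] < s->last_used[victim]) victim = w;
    }
    s->valid[victim] = 1;
    s->tag[victim] = tag;
    s->last_used[victim] = s->clock;
    return 0;
}
```

A zero-initialized `cache_set_t` starts empty; touching a block again before the set fills up protects it from being the next victim.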
[Figure: cache organization. Address fields: Tag (21), Index (8), offset (5); Data in, Data out; two banks of Valid, Tag, Data (256); tag compare (=?); Write buffer; Lower level memory]
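The address in the figure is split into tag, index, and offset fields. A minimal sketch of that split follows, assuming the numbers 21, 8, and 5 are the field widths in bits, which gives a 34-bit address. The type `cache_addr_t` and the function `split_address` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths: 21-bit tag, 8-bit index, 5-bit offset. */
enum { OFFSET_BITS = 5, INDEX_BITS = 8, TAG_BITS = 21 };

typedef struct {
    uint32_t tag;     /* compared against the stored tag (=?) */
    uint32_t index;   /* selects one of 2^8 = 256 cache entries */
    uint32_t offset;  /* byte within the 2^5 = 32-byte block */
} cache_addr_t;

/* Split a 34-bit address (21 + 8 + 5) into its three fields. */
static cache_addr_t split_address(uint64_t addr)
{
    assert(addr < (UINT64_C(1) << (TAG_BITS + INDEX_BITS + OFFSET_BITS)));
    cache_addr_t f;
    f.offset = (uint32_t)(addr & ((1u << OFFSET_BITS) - 1u));
    f.index  = (uint32_t)((addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u));
    f.tag    = (uint32_t)(addr >> (OFFSET_BITS + INDEX_BITS));
    return f;
}
```

Under these assumptions a 5-bit offset matches the 256-bit (32-byte) data entries shown in the figure.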
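[Figure: two write buffers, each entry holding a Write address and four valid bits (V). First buffer: separate entries for addresses 100, 104, 108, 112, each with one valid bit set. Second buffer: a single entry at address 100 with all four valid bits set.]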
[Figure: percentage (5% to 30%) versus Cache size (1 KB to 128 KB); series: Data Cache, Unified]
0.75 * (1 + 0.0064 * 50) + 0.25 * (1 + 0.0647 * 50) = 2.05
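The expression above can be checked with a few lines of C. This sketch reads each parenthesized term as hit time + miss rate * miss penalty (1 cycle, 50 cycles) and the factors 0.75 and 0.25 as the fractions of accesses belonging to two streams. That reading is an assumption, since the slide text defining the terms did not survive. The function names `amat` and `weighted_amat` are hypothetical.

```c
#include <assert.h>

/* Average memory access time for one access stream, in clock cycles:
   hit time + miss rate * miss penalty. */
static double amat(double hit_time, double miss_rate, double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}

/* Weighted average of two streams; frac_a is the share of stream A,
   and stream B gets the remaining 1 - frac_a. */
static double weighted_amat(double frac_a, double amat_a, double amat_b)
{
    assert(frac_a >= 0.0 && frac_a <= 1.0);
    return frac_a * amat_a + (1.0 - frac_a) * amat_b;
}
```

Calling `weighted_amat(0.75, amat(1.0, 0.0064, 50.0), amat(1.0, 0.0647, 50.0))` gives 0.75 * 1.32 + 0.25 * 4.235 = 2.04875, which rounds to the 2.05 on the slide.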
[Figure: timeline showing Hit time, Pseudo hit time, and Miss penalty]
[Figure: cache datapath with Address, Data in, Data out, Tag, two tag comparisons (=?), Write buffer, and Lower level memory]
/* Before */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    r = 0;
    for (k = 0; k < N; k = k+1)
      r = r + y[i][k]*z[k][j];
    x[i][j] = r;
  };
–Read all NxN elements of z[]
–Read N elements of 1 row of y[] repeatedly
–Write N elements of 1 row of x[]
–3 NxNx4 => no capacity misses
/* After */
/* B is the blocking factor; x[][] must start at zero because partial sums accumulate into it */
for (jj = 0; jj < N; jj = jj+B)
  for (kk = 0; kk < N; kk = kk+B)
    for (i = 0; i < N; i = i+1)
      for (j = jj; j < min(jj+B,N); j = j+1) {
        r = 0;
        for (k = kk; k < min(kk+B,N); k = k+1)
          r = r + y[i][k]*z[k][j];
        x[i][j] = x[i][j] + r;
      };
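For reference, here is a self-contained sketch of both loop nests as compilable C functions, so the blocked version can be checked against the unblocked one. The matrix size `N = 6`, the helper `min_int`, and the function names are assumptions made for this example. The tile loops use the exclusive bounds `min(jj+B, N)` and `min(kk+B, N)` with a strict `<`, so every row and column of a tile is visited even when `B` does not divide `N`.

```c
#include <assert.h>

enum { N = 6 };  /* hypothetical small size so results are easy to check */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Unblocked triple loop: x = y * z. */
static void matmul_naive(double x[N][N], double y[N][N], double z[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double r = 0.0;
            for (int k = 0; k < N; k++)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* Blocked version: each (jj, kk) pair reuses a B-by-B tile of z.
   x is zeroed first because partial sums are accumulated into it. */
static void matmul_blocked(double x[N][N], double y[N][N], double z[N][N], int B)
{
    assert(B > 0);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 0.0;

    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + B, N); j++) {
                    double r = 0.0;
                    for (int k = kk; k < min_int(kk + B, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;
                }
}
```

In practice `B` would be chosen so that the tiles being reused fit in the cache; the value is workload and cache dependent and is not specified here.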
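[Figure: TLB Operation and Cache Operation. The virtual address (Page#, Offset) is looked up in the TLB; a TLB miss goes to the Page Table. The resulting real address (Tag, Remainder) is looked up in the cache; a hit returns the Value, a miss fetches the Value from Main Memory.]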
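The TLB half of the figure can be sketched in C as follows. This is only an illustration under assumed parameters: 4 KB pages, a 4-entry fully associative TLB with round-robin replacement, and a flat, fully populated page table. None of these sizes or names (`mmu_t`, `translate`) come from the slides, and the cache lookup that follows translation is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PAGE_BITS = 12, TLB_ENTRIES = 4, NUM_PAGES = 16 };  /* assumed sizes */

typedef struct { bool valid; uint32_t vpn; uint32_t frame; } tlb_entry_t;

typedef struct {
    tlb_entry_t tlb[TLB_ENTRIES];
    uint32_t page_table[NUM_PAGES];  /* page# -> frame, assumed fully populated */
    unsigned next_victim;            /* round-robin replacement (an assumption) */
    unsigned hits, misses;
} mmu_t;

/* Translate a virtual address to a real address: split into page# and
   offset, try the TLB, and fall back to the page table on a TLB miss. */
static uint32_t translate(mmu_t *m, uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1u);
    assert(vpn < NUM_PAGES);

    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (m->tlb[i].valid && m->tlb[i].vpn == vpn) {
            m->hits++;                               /* TLB hit */
            return (m->tlb[i].frame << PAGE_BITS) | offset;
        }
    }
    m->misses++;                                     /* TLB miss */
    uint32_t frame = m->page_table[vpn];
    tlb_entry_t *e = &m->tlb[m->next_victim];
    e->valid = true;
    e->vpn = vpn;
    e->frame = frame;
    m->next_victim = (m->next_victim + 1u) % TLB_ENTRIES;
    return (frame << PAGE_BITS) | offset;
}
```

The first access to a page takes the miss path and installs the mapping; a second access to the same page is then served from the TLB.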
– Latency: Cache Miss Penalty
– Bandwidth: I/O & Large Block Miss Penalty (L2)
– Dynamic since it needs to be refreshed periodically
– Addresses divided into 2 halves (Memory as a 2D matrix): RAS (Row Access Strobe)
– No refresh (6 transistors/bit vs. 1 transistor/bit, area is 10X)
– Address not divided: full address
[Figure: three organizations of CPU, Cache, bus, and Memory; labels include Multiplexor, Memory bank 0 to Memory bank 3, and memory bus widths of 32/64 bits and 256/512 bits]