Computer Organization & Assembly Language Programming (CSE 2312)
Lecture 21: Caches
Taylor Johnson
Announcements and Outline
Programming assignment 1 assigned, due 11/4 by midnight
Review Example Debugging UART Interaction with
November 4, 2014 2
registers.
level
lower level
Miss ratio = 1 – hit ratio
from upper level
word CPU)
from memory, which takes time M.
determine that.
How do we know if
Where do we look?
#Blocks is a
Use low-order
Index Tag Data Valid
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Miss      110
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       Miss      010
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       Miss      010
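The access sequence traced in the tables above (22, 26, 22, 26, 16, 3, 16, 18) can be replayed with a small simulation. This is only a sketch: it assumes an 8-block direct-mapped cache with one word per block, models just the valid bit and tag (not the data), and the names cache_line, dm_access, and dm_count_hits are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 8   /* assumed: 8 one-word blocks, as in the tables */

/* One cache entry: valid bit plus tag (the data itself is not modeled). */
typedef struct {
    bool valid;
    unsigned tag;
} cache_line;

/* Look up one word address. Returns true on a hit; on a miss the
   indexed entry is overwritten with the new tag. */
bool dm_access(cache_line cache[NUM_BLOCKS], unsigned word_addr)
{
    unsigned index = word_addr % NUM_BLOCKS;  /* low-order 3 bits */
    unsigned tag   = word_addr / NUM_BLOCKS;  /* remaining high-order bits */

    if (cache[index].valid && cache[index].tag == tag)
        return true;

    cache[index].valid = true;   /* miss: fetch block, replacing any old one */
    cache[index].tag   = tag;
    return false;
}

/* Runs a trace through an initially empty cache; returns the hit count. */
size_t dm_count_hits(const unsigned *trace, size_t len)
{
    cache_line cache[NUM_BLOCKS] = {0};   /* all entries start invalid */
    size_t hits = 0;

    for (size_t i = 0; i < len; ++i)
        if (dm_access(cache, trace[i]))
            ++hits;
    return hits;
}
```

Calling dm_count_hits on the eight addresses above reports 3 hits (the second accesses to 22, 26, and 16). The final access to 18 misses because it maps to index 010, which still holds the block with tag 11.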
Bits    31..10   9..4    3..0
Field   Tag      Index   Offset
Width   22 bits  6 bits  4 bits
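The 22/6/4-bit split above can be expressed with shifts and masks. The sketch below assumes the usual reading of those fields for a 32-bit byte address (low 4 bits are the byte offset within a block, the next 6 bits the cache index, the top 22 bits the tag). The names addr_fields and split_address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 4   /* bits 3..0: byte offset within a block */
#define INDEX_BITS  6   /* bits 9..4: cache index                */
                        /* bits 31..10: tag (the remaining 22)   */

typedef struct {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
} addr_fields;

/* Split a 32-bit byte address into tag, index, and offset. */
addr_fields split_address(uint32_t addr)
{
    addr_fields f;
    f.offset = addr & ((1u << OFFSET_BITS) - 1);
    f.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    f.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return f;
}
```

As an illustrative input (not taken from the slides), byte address 1200 gives offset 0, index 11, and tag 1: 1200 / 16 is block address 75, and 75 mod 64 is 11.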
memory takes 100 cycles
initialization)
Direct mapped

Block    Cache   Hit/miss   Cache content after access
address  index              0        1        2        3
0        0       miss       Mem[0]
8        0       miss       Mem[8]
0        0       miss       Mem[0]
6        2       miss       Mem[0]            Mem[6]
8        0       miss       Mem[8]            Mem[6]
2-way set associative

Block    Cache   Hit/miss   Cache content after access
address  index              Set 0              Set 1
0        0       miss       Mem[0]
8        0       miss       Mem[0]   Mem[8]
0        0       hit        Mem[0]   Mem[8]
6        0       miss       Mem[0]   Mem[6]
8        0       miss       Mem[8]   Mem[6]
Fully associative
Block    Hit/miss   Cache content after access
address
0        miss       Mem[0]
8        miss       Mem[0]   Mem[8]
0        hit        Mem[0]   Mem[8]
6        miss       Mem[0]   Mem[8]   Mem[6]
8        hit        Mem[0]   Mem[8]   Mem[6]
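The three tables above can be reproduced with one routine parameterized by associativity. This is a sketch under stated assumptions: a 4-block cache, block addresses 0, 8, 0, 6, 8, and least-recently-used (LRU) replacement within a set, which is consistent with the replacements shown in the tables. The function name count_misses is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_BLOCKS 4   /* assumed cache capacity in blocks */

/* Counts misses for a trace of block addresses in a TOTAL_BLOCKS-block
   cache with `ways` blocks per set and LRU replacement.
   ways = 1 is direct mapped; ways = TOTAL_BLOCKS is fully associative. */
size_t count_misses(const unsigned *trace, size_t len, unsigned ways)
{
    assert(ways >= 1 && ways <= TOTAL_BLOCKS && TOTAL_BLOCKS % ways == 0);
    unsigned num_sets = TOTAL_BLOCKS / ways;

    unsigned block[TOTAL_BLOCKS];     /* block address held in each slot  */
    size_t last_used[TOTAL_BLOCKS];   /* time of last use; 0 = empty slot */
    for (unsigned i = 0; i < TOTAL_BLOCKS; ++i) {
        block[i] = 0;
        last_used[i] = 0;
    }

    size_t misses = 0;
    for (size_t t = 0; t < len; ++t) {
        unsigned base = (trace[t] % num_sets) * ways;  /* first slot of set */
        unsigned victim = base;
        int hit = 0;

        for (unsigned w = base; w < base + ways; ++w) {
            if (last_used[w] != 0 && block[w] == trace[t]) {
                victim = w;            /* found: just refresh its timestamp */
                hit = 1;
                break;
            }
            if (last_used[w] < last_used[victim])
                victim = w;            /* empty or least recently used so far */
        }
        if (!hit) {
            ++misses;
            block[victim] = trace[t];
        }
        last_used[victim] = t + 1;     /* timestamps start at 1 */
    }
    return misses;
}
```

For the trace 0, 8, 0, 6, 8 this gives 5 misses with ways = 1, 4 misses with ways = 2, and 3 misses with ways = 4, matching the hit/miss columns above.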
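void dgemm (int n, double* A, double* B, double* C)
{
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
    {
      double cij = C[i+j*n];            /* cij = C[i][j] */
      for (int k = 0; k < n; k++)
        cij += A[i+k*n] * B[k+j*n];     /* cij += A[i][k]*B[k][j] */
      C[i+j*n] = cij;                   /* C[i][j] = cij */
    }
}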
new accesses
#define BLOCKSIZE 32

void do_block (int n, int si, int sj, int sk, double *A, double *B, double *C)
{
  for (int i = si; i < si+BLOCKSIZE; ++i)
    for (int j = sj; j < sj+BLOCKSIZE; ++j)
    {
      double cij = C[i+j*n];               /* cij = C[i][j] */
      for (int k = sk; k < sk+BLOCKSIZE; k++)
        cij += A[i+k*n] * B[k+j*n];        /* cij += A[i][k]*B[k][j] */
      C[i+j*n] = cij;                      /* C[i][j] = cij */
    }
}

void dgemm (int n, double* A, double* B, double* C)
{
  for (int sj = 0; sj < n; sj += BLOCKSIZE)
    for (int si = 0; si < n; si += BLOCKSIZE)
      for (int sk = 0; sk < n; sk += BLOCKSIZE)
        do_block(n, si, sj, sk, A, B, C);
}
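The blocked version above steps through the matrices in full BLOCKSIZE tiles, so it reads and writes out of bounds unless n is a multiple of BLOCKSIZE. One possible way to lift that restriction, shown here only as a sketch and not as part of the lecture's code, is to clamp each tile's upper bounds to n. The names min_int, do_block_clamped, dgemm_any_n, and dgemm_reference are hypothetical; dgemm_reference is a plain triple loop included only so the blocked result can be checked against it.

```c
#include <assert.h>

#define BLOCKSIZE 32

static int min_int(int a, int b) { return a < b ? a : b; }

/* Multiply one tile; upper bounds are clamped to n so that partial
   tiles at the matrix edge stay in bounds. */
static void do_block_clamped(int n, int si, int sj, int sk,
                             const double *A, const double *B, double *C)
{
    int i_end = min_int(si + BLOCKSIZE, n);
    int j_end = min_int(sj + BLOCKSIZE, n);
    int k_end = min_int(sk + BLOCKSIZE, n);

    for (int i = si; i < i_end; ++i)
        for (int j = sj; j < j_end; ++j) {
            double cij = C[i + j*n];             /* cij = C[i][j] */
            for (int k = sk; k < k_end; ++k)
                cij += A[i + k*n] * B[k + j*n];  /* cij += A[i][k]*B[k][j] */
            C[i + j*n] = cij;                    /* C[i][j] = cij */
        }
}

/* C += A * B for n x n column-major matrices; n need not be a
   multiple of BLOCKSIZE. */
void dgemm_any_n(int n, const double *A, const double *B, double *C)
{
    assert(n >= 0);
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block_clamped(n, si, sj, sk, A, B, C);
}

/* Unblocked triple loop, used only as a reference for checking results. */
void dgemm_reference(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double cij = C[i + j*n];
            for (int k = 0; k < n; ++k)
                cij += A[i + k*n] * B[k + j*n];
            C[i + j*n] = cij;
        }
}
```

A simple check is to run both functions on the same inputs with an n such as 40 and compare the output matrices element by element. With small integer-valued entries the sums are exact, so the results should match exactly.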
[Figure: Unoptimized vs. Blocked]
bytes/sec
~=650 MB
at a time). Then, striping cannot be used.