Carnegie Mellon
Cache Lab Implementation and Blocking
Slides courtesy of: Aditya Shah, CMU
Welcome to the World of Pointers!
Outline
Schedule
Memory organization
Caching
Cache lab
SRAM (cache)
DRAM (main memory)
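Temporal locality: recently referenced items are likely to be referenced again soon
Spatial locality: items with nearby addresses tend to be referenced close together in time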
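Both kinds of locality show up when the same row-major C array is summed in two loop orders. This is a minimal sketch; the size N = 64 and the names sum_rows and sum_cols are illustrative, not from the lab. Both functions return the same total, but sum_rows walks consecutive addresses (spatial locality) while sum_cols jumps N ints between accesses; the running sum is reused on every iteration in both (temporal locality).

```c
#define N 64  /* illustrative matrix size */

/* Row-wise traversal: consecutive accesses touch adjacent addresses,
   so every byte of each fetched cache block gets used. */
long sum_rows(int a[N][N]) {
    long sum = 0;  /* reused every iteration: temporal locality */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-wise traversal: consecutive accesses are N * sizeof(int)
   bytes apart, so each access may land in a different cache block. */
long sum_cols(int a[N][N]) {
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}
```

On typical hardware the row-wise order tends to run faster once the array is much larger than the cache; the exact gap depends on the machine.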
Addresses are 64-bit on the shark machines:
Block offset: b bits
Set index: s bits
Tag bits: (address size - b - s)
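Extracting the three fields is a matter of shifts and masks. A minimal sketch, assuming s + b < 64 so that the shifts are well defined; split_address and addr_fields are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* The three fields of an address, lowest bits first in the address. */
typedef struct {
    uint64_t tag;
    uint64_t set;
    uint64_t offset;
} addr_fields;

/* Split a 64-bit address given s set-index bits and b block-offset bits.
   The remaining 64 - s - b high bits form the tag. Assumes s + b < 64. */
addr_fields split_address(uint64_t addr, int s, int b) {
    addr_fields f;
    f.offset = addr & ((UINT64_C(1) << b) - 1);         /* low b bits */
    f.set    = (addr >> b) & ((UINT64_C(1) << s) - 1);  /* next s bits */
    f.tag    = addr >> (s + b);                         /* remaining high bits */
    return f;
}
```

For example, with s = 4 and b = 4, address 0x1234 splits into offset 0x4, set 0x3, and tag 0x12.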
A cache is a set of S = 2^s cache sets.
A cache set is a set of E cache lines.
Each cache line stores a block of B = 2^b bytes.
Total capacity = S * B * E bytes
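A quick worked check of the capacity formula, counting only data bytes (valid bits and tags are not included). The function name cache_capacity and the example parameters are illustrative assumptions.

```c
#include <stddef.h>

/* Total data capacity in bytes: S * E * B, where S = 2^s sets,
   E lines per set, and B = 2^b bytes per block. */
size_t cache_capacity(int s, int E, int b) {
    size_t S = (size_t)1 << s;
    size_t B = (size_t)1 << b;
    return S * (size_t)E * B;
}
```

For instance, s = 5, E = 1, b = 5 gives 32 * 1 * 32 = 1024 bytes, and s = 6, E = 8, b = 6 gives 64 * 8 * 64 = 32768 bytes.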
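[Figure: cache organization. S = 2^s sets with E lines per set. Each line holds a valid bit (v), a tag, and a data block of B = 2^b bytes (bytes 0 to B-1). The address of a word is split into t tag bits, s set-index bits, and b block-offset bits; the data begins at the block offset within the block.]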
Part (a): Building a cache simulator
Part (b): Optimizing matrix transpose
A cache simulator is NOT a cache! It only counts hits, misses, and evictions; it does not store memory contents.
Your cache simulator needs to work for different values of s, b, and E.
Use the LRU (Least Recently Used) replacement policy.
A cache is just a 2D array of cache lines: S sets, each with E lines.
Each cache_line has a valid bit, a tag, and the bookkeeping needed for LRU replacement.
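One way to put these pieces together is sketched below. It stores no data (a cache simulator is not a cache), only a valid bit, a tag, and a last-use timestamp per line, and it evicts the line with the oldest timestamp as the LRU victim. The type names, the flat S * E line array, and the timestamp scheme are assumptions for illustration; other LRU bookkeeping works too.

```c
#include <stdint.h>

/* One simulated cache line: bookkeeping only, no data. */
typedef struct {
    int valid;            /* 1 if the line holds a block */
    uint64_t tag;         /* tag of the block held */
    unsigned long lru;    /* time of last use; smallest = least recently used */
} cache_line;

typedef struct {
    int s, E, b;
    cache_line *lines;    /* S * E lines; set i is lines[i*E] .. lines[i*E + E - 1] */
    unsigned long clock;  /* incremented on every access */
    int hits, misses, evictions;
} cache_sim;

/* Simulate one access to addr with LRU replacement,
   updating the hit, miss, and eviction counters. */
void cache_access(cache_sim *c, uint64_t addr) {
    uint64_t set = (addr >> c->b) & ((UINT64_C(1) << c->s) - 1);
    uint64_t tag = addr >> (c->s + c->b);
    cache_line *lines = &c->lines[set * c->E];
    c->clock++;

    for (int i = 0; i < c->E; i++) {
        if (lines[i].valid && lines[i].tag == tag) {  /* hit */
            c->hits++;
            lines[i].lru = c->clock;
            return;
        }
    }

    c->misses++;
    int victim = 0;
    for (int i = 0; i < c->E; i++) {
        if (!lines[i].valid) {      /* prefer an empty line */
            victim = i;
            break;
        }
        if (lines[i].lru < lines[victim].lru)
            victim = i;             /* otherwise track the oldest line */
    }
    if (lines[victim].valid)        /* set is full: evicting the LRU line */
        c->evictions++;
    lines[victim].valid = 1;
    lines[victim].tag = tag;
    lines[victim].lru = c->clock;
}
```

For example, with s = 0, b = 0, and E = 2, the address sequence 0, 1, 0, 2, 0, 1 gives 2 hits, 4 misses, and 2 evictions.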
getopt() automates parsing elements on the Unix command line.
A switch statement is used on the local variable holding the return value of getopt().
Think about how to handle invalid inputs.
For more information,
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char **argv) {
    int opt, x = 0, y = 0;
    /* looping over arguments */
    while (-1 != (opt = getopt(argc, argv, "x:y:"))) {
        /* determine which argument it's processing */
        switch (opt) {
        case 'x':
            x = atoi(optarg);
            break;
        case 'y':
            y = atoi(optarg);
            break;
        default:
            printf("wrong argument\n");
            break;
        }
    }
    printf("x = %d, y = %d\n", x, y);
    return 0;
}
Suppose the program executable was called “foo”. Then running ./foo -x 1 -y 3 sets x to 1 and y to 3.
The fscanf() function is just like scanf() except that it reads from a specified stream (a FILE *) instead of standard input.
For more information,
fscanf will be useful in reading lines from the trace files.
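A sketch of reading trace records with fscanf, assuming each record has the form op address,size (for example " L 10,1") with a hexadecimal address and a decimal size. That record format and the name read_trace are assumptions; check them against the trace files you are given.

```c
#include <stdio.h>

/* Read up to max records of the assumed form "op address,size" from fp
   into parallel arrays. Returns the number of records read. */
int read_trace(FILE *fp, char ops[], unsigned long addrs[], int sizes[], int max) {
    int n = 0;
    char op;
    unsigned long addr;
    int size;
    /* " %c" skips whitespace (including newlines) before the operation,
       %lx reads a hex address, and ",%d" matches the comma and the size. */
    while (n < max && fscanf(fp, " %c %lx,%d", &op, &addr, &size) == 3) {
        ops[n] = op;
        addrs[n] = addr;
        sizes[n] = size;
        n++;
    }
    return n;
}
```

The loop stops as soon as fscanf cannot match all three fields, which includes reaching the end of the file.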
Use malloc to allocate memory on the heap.
Always free what you malloc; otherwise you may leak memory.
Don’t free memory you didn’t allocate.
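Because S and E are only known at run time, the simulated cache has to live on the heap. A minimal sketch, with the hypothetical names alloc_cache and free_cache: it allocates one array of set pointers plus one array of E lines per set, and frees exactly those allocations. calloc zero-fills, so every line starts out invalid.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int valid;
    uint64_t tag;
    unsigned long lru;
} cache_line;

/* Allocate S = 2^s sets of E zeroed lines each. Returns NULL on failure. */
cache_line **alloc_cache(int s, int E) {
    size_t S = (size_t)1 << s;
    cache_line **sets = malloc(S * sizeof *sets);
    if (sets == NULL)
        return NULL;
    for (size_t i = 0; i < S; i++) {
        sets[i] = calloc((size_t)E, sizeof **sets);
        if (sets[i] == NULL) {      /* undo the partial allocation */
            while (i-- > 0)
                free(sets[i]);
            free(sets);
            return NULL;
        }
    }
    return sets;
}

/* Free exactly what alloc_cache allocated: each set, then the set array. */
void free_cache(cache_line **sets, int s) {
    size_t S = (size_t)1 << s;
    for (size_t i = 0; i < S; i++)
        free(sets[i]);
    free(sets);
}
```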
Matrix Transpose (A -> B)
How do we optimize this operation using the cache?
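As a baseline, the straightforward transpose is sketched below; the function name and the parameter order (N rows, M columns) are assumptions. Reads of A move along a row, but writes to B move down a column, so consecutive writes to B can each touch a different cache block.

```c
/* Transpose the N x M row-major matrix A into the M x N matrix B. */
void transpose_naive(int N, int M, int A[N][M], int B[M][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            B[j][i] = A[i][j];  /* row-wise read of A, column-wise write of B */
}
```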
Suppose the block size is 8 bytes.
Access A[0][0]: cache miss
Access B[0][0]: cache miss
Access A[0][1]: cache hit (with 4-byte ints, it shares the block just loaded for A[0][0])
Access B[1][0]: cache miss
Blocking: divide the matrix into sub-matrices. The size of a sub-matrix depends on the cache block size and the cache size.
Try different sub-matrix sizes.
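A minimal sketch of blocking applied to the transpose, with the tile size bsize left as a parameter so that different sub-matrix sizes are easy to try. The name transpose_blocked is hypothetical, and the bound checks let it handle dimensions that are not multiples of bsize. It does not, by itself, address conflict misses between A and B.

```c
/* Transpose the N x M matrix A into B one bsize x bsize tile at a time,
   so the rows of A and B touched by a tile can stay cached together. */
void transpose_blocked(int N, int M, int A[N][M], int B[M][N], int bsize) {
    for (int ii = 0; ii < N; ii += bsize)
        for (int jj = 0; jj < M; jj += bsize)
            /* transpose one tile, clipped at the matrix edges */
            for (int i = ii; i < ii + bsize && i < N; i++)
                for (int j = jj; j < jj + bsize && j < M; j++)
                    B[j][i] = A[i][j];
}
```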
c = (double *) calloc(n*n, sizeof(double));
/* Multiply n x n matrices a and b */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}
Assume:
First iteration:
[Figure: each cache block is 8 matrix elements wide]
Assume:
Second iteration:
Total misses:
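The miss count for the unblocked code can be worked out as follows, assuming (these assumptions are not stated above) that matrix elements are doubles, a cache block holds 8 of them, and the cache is too small to keep anything between iterations. One iteration computes one element of c by reading a row of a (8 elements per miss) and a column of b (a new block on every element). The result matches the (9/8) n^3 total quoted in the summary.

```latex
\begin{aligned}
\text{misses per iteration} &= \underbrace{\frac{n}{8}}_{\text{row of } a} + \underbrace{n}_{\text{column of } b} = \frac{9n}{8} \\
\text{total misses} &= n^2 \cdot \frac{9n}{8} = \frac{9}{8}\, n^3
\end{aligned}
```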
c = (double *) calloc(n*n, sizeof(double));
/* Multiply n x n matrices a and b using B x B blocks.
   B is the block size, assumed to be a constant that divides n. */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k, i1, j1, k1;
    for (i = 0; i < n; i += B)
        for (j = 0; j < n; j += B)
            for (k = 0; k < n; k += B)
                /* B x B mini matrix multiplications */
                for (i1 = i; i1 < i+B; i1++)
                    for (j1 = j; j1 < j+B; j1++)
                        for (k1 = k; k1 < k+B; k1++)
                            c[i1*n + j1] += a[i1*n + k1] * b[k1*n + j1];
}
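A runnable variant of the blocked multiply, sketched with the tile size bs passed as a parameter instead of a constant B, and with min_int bounds so that bs need not divide n. The names mmm_blocked and min_int are hypothetical. Blocking only reorders the additions, so with small integer inputs the result should match the unblocked loop exactly.

```c
/* Smaller of two ints, used to clip tiles at the matrix edge. */
static int min_int(int x, int y) { return x < y ? x : y; }

/* c += a * b for n x n row-major matrices, processed in bs x bs tiles. */
void mmm_blocked(const double *a, const double *b, double *c, int n, int bs) {
    for (int i = 0; i < n; i += bs)
        for (int j = 0; j < n; j += bs)
            for (int k = 0; k < n; k += bs)
                /* bs x bs mini matrix multiplication */
                for (int i1 = i; i1 < min_int(i + bs, n); i1++)
                    for (int j1 = j; j1 < min_int(j + bs, n); j1++)
                        for (int k1 = k; k1 < min_int(k + bs, n); k1++)
                            c[i1*n + j1] += a[i1*n + k1] * b[k1*n + j1];
}
```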
Assume:
First (block) iteration:
Assume:
Second (block) iteration:
Total misses:
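The blocked miss count can be worked out in the same style, assuming (not stated above) doubles, 8 doubles per cache block, and a cache large enough to hold three B x B blocks at once. Each block iteration reads n/B blocks of a and n/B blocks of b, each block costing B^2/8 misses, and there are (n/B)^2 block iterations; misses on the block of c are lower order and ignored. This reproduces the 1/(4B) n^3 figure quoted in the summary.

```latex
\begin{aligned}
\text{misses per block iteration} &= 2 \cdot \frac{n}{B} \cdot \frac{B^2}{8} = \frac{nB}{4} \\
\text{total misses} &= \left(\frac{n}{B}\right)^2 \cdot \frac{nB}{4} = \frac{1}{4B}\, n^3
\end{aligned}
```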
No blocking: (9/8) * n^3 misses
Blocking: 1/(4B) * n^3 misses
Suggest the largest possible block size B, but limit 3B^2 < C, so that one B x B block each from a, b, and c fits in the cache at once!
Reason for dramatic difference:
For a detailed discussion of blocking:
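The limit 3B^2 < C can be turned into a small calculation. This sketch assumes C is the cache capacity measured in matrix elements; as an illustrative assumption, a 32 KB cache holding 8-byte doubles gives C = 4096, for which the largest B is 36. The function name largest_block is hypothetical, and in practice one might also round B to a multiple of the elements per cache block.

```c
/* Largest tile size B with 3*B*B < C, where C is the cache capacity
   in matrix elements: three B x B tiles (from a, b, c) must fit at once. */
int largest_block(long C) {
    int B = 0;
    while (3L * (B + 1) * (B + 1) < C)
        B++;
    return B;
}
```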
Cache:
Test Matrices:
Things you’ll need to know:
Strict compilation flags
Reasons:
Add “-Werror” to your compilation flags so that every warning is treated as an error
Remember to #include the header files that declare the functions you use.
If a function declaration is missing, the compiler reports an implicit declaration, and with -Werror the build fails.
Live example
The first row of matrix A evicts the first row of matrix B (they map to the same cache sets).
Matrices are stored in memory in row-major order.
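To see why rows of A and B can evict each other, compute which set each element maps to. This sketch assumes a direct-mapped cache with s = 5 and b = 5 (32 sets of 32-byte blocks, 1 KB total), 32 x 32 int matrices, and a hypothetical layout in which B starts exactly 32 * 32 * sizeof(int) bytes after A; set_index and elem_addr are illustrative names. Under those assumptions A[i][j] and B[i][j] always land in the same set, which hurts most on the diagonal, where A[i][i] is read and B[i][i] is written back to back.

```c
#include <stdint.h>

/* Set index of addr in a cache with 2^s sets and 2^b-byte blocks. */
uint64_t set_index(uint64_t addr, int s, int b) {
    return (addr >> b) & ((UINT64_C(1) << s) - 1);
}

/* Address of element [i][j] of a row-major int matrix with `cols`
   columns starting at `base`: base + (i * cols + j) * sizeof(int). */
uint64_t elem_addr(uint64_t base, int i, int j, int cols) {
    return base + ((uint64_t)i * cols + j) * sizeof(int);
}
```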
Read the style guideline
Start forming good habits now!