
Cache Memories: Lecture, Oct. 30, 2018 (Bryant and O'Hallaron)



  1. Cache Memories. Lecture, Oct. 30, 2018. Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition

  2. General Cache Concept
     • Cache: smaller, faster, more expensive memory; caches a subset of the blocks (diagram slots labeled 8, 4, 9, 14, 10, 3)
     • Data is copied in block-sized transfer units (blocks 10 and 4 shown in transfer)
     • Memory: larger, slower, cheaper memory, viewed as partitioned into “blocks” (numbered 0 to 15)
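A minimal sketch of the idea above, assuming a direct-mapped placement rule (block number modulo the number of slots). The slide does not state a placement policy, so that rule, the `toy_cache` type, and both function names are hypothetical and only serve to show a cache holding a subset of memory blocks and copying a whole block in on a miss.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 4   /* assumed cache capacity, in blocks */

/* A toy cache: each slot remembers which memory block it currently holds. */
typedef struct {
    int  block[NUM_SLOTS];
    bool valid[NUM_SLOTS];
} toy_cache;

/* Start with every slot empty. */
void toy_cache_init(toy_cache *c)
{
    for (int s = 0; s < NUM_SLOTS; s++) {
        c->block[s] = -1;
        c->valid[s] = false;
    }
}

/* Access one memory block. Returns true on a hit. On a miss, the block is
   "copied" into its slot, replacing whatever block was there before. */
bool toy_cache_access(toy_cache *c, int block_no)
{
    assert(block_no >= 0);
    int slot = block_no % NUM_SLOTS;   /* assumed direct-mapped placement */
    if (c->valid[slot] && c->block[slot] == block_no)
        return true;
    c->block[slot] = block_no;
    c->valid[slot] = true;
    return false;
}
```

Under this assumed rule, loading blocks 8, 9, 14, and 3 fills the four slots, and a later access to block 4 misses and replaces block 8 in slot 0.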


  6. Structure Representation

         struct rec {
             int a[4];
             size_t i;
             struct rec *next;
         };

     Layout starting at address r (byte offsets): a at 0, i at 16, next at 24, end of structure at 32.
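The offsets in this layout can be checked with `offsetof`. The sketch below is illustrative only; `rec_layout` and `get_rec_layout` are hypothetical names, and the exact numbers depend on the platform's type sizes and alignment rules.

```c
#include <assert.h>
#include <stddef.h>

struct rec {
    int a[4];
    size_t i;
    struct rec *next;
};

/* Byte offsets of each field plus the total structure size. */
typedef struct {
    size_t off_a;
    size_t off_i;
    size_t off_next;
    size_t total;
} rec_layout;

/* Ask the compiler where it placed each field of struct rec. */
rec_layout get_rec_layout(void)
{
    rec_layout l = {
        offsetof(struct rec, a),
        offsetof(struct rec, i),
        offsetof(struct rec, next),
        sizeof(struct rec)
    };
    assert(l.off_a == 0);   /* the first member always starts at offset 0 */
    return l;
}
```

On a typical x86-64 Linux system (4-byte `int`, 8-byte `size_t` and pointers) this yields 0, 16, 24, and 32, matching the slide. Other platforms may differ.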

  7. I[0].A  I[0].B  I[0].BV[0]  I[0].B[1]
     I[1].A  I[1].B  I[1].BV[0]  I[1].B[1]

  8. I[0].A  I[0].B  I[0].BV[0]  I[0].B[1]
     I[1].A  I[1].B  I[1].BV[0]  I[1].B[1]
     I[2].A  I[2].B  I[2].BV[0]  I[2].B[1]
     I[3].A  I[3].B  I[3].BV[0]  I[3].B[1]
     Each block associated with the first half of the array has a unique spot in memory.
     2^9

  9. Cache Optimization Techniques

     These two loops compute the same result.

     Column-by-column traversal (j outer, i inner):

         for (j = 0; j < 3; j = j + 1) {
             for (i = 0; i < 3; i = i + 1) {
                 x[i][j] = 2 * x[i][j];
             }
         }

     Row-by-row traversal (i outer, j inner):

         for (i = 0; i < 3; i = i + 1) {
             for (j = 0; j < 3; j = j + 1) {
                 x[i][j] = 2 * x[i][j];
             }
         }

     Inner loop analysis: the array is stored in row-major order, so memory holds
     X[0][0] X[0][1] X[0][2] X[1][0] X[1][1] X[1][2] X[2][0] X[2][1] X[2][2]
     With 4-byte elements, the first six occupy addresses 0x00-0x03, 0x04-0x07, 0x08-0x0B, 0x0C-0x0F, 0x10-0x13, 0x14-0x17.
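To make the inner loop analysis concrete, the following sketch computes where an element sits in a row-major array and how far apart consecutive inner-loop accesses are for each loop order. The function names and the `inner_is_j` flag are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of x[i][j] from the start of a row-major array that has
   `ncols` columns of `elem_size`-byte elements. */
size_t row_major_offset(size_t i, size_t j, size_t ncols, size_t elem_size)
{
    assert(j < ncols);
    return (i * ncols + j) * elem_size;
}

/* Distance in bytes between two consecutive inner-loop accesses.
   inner_is_j != 0: the inner loop varies j (walks along a row).
   inner_is_j == 0: the inner loop varies i (walks down a column). */
size_t inner_loop_stride(int inner_is_j, size_t ncols, size_t elem_size)
{
    assert(ncols > 0 && elem_size > 0);
    return inner_is_j ? elem_size : ncols * elem_size;
}
```

For the 3 x 3 array of 4-byte elements on the slide, the row-by-row loop moves 4 bytes per inner iteration, while the column-by-column loop jumps 12 bytes, a full row, each time.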

  10. Cache Optimization Techniques

      The two loop orders from the previous slide compute the same result. The row-by-row version, written for a heap-allocated array indexed by hand in row-major order (here N = 3):

          int *x = malloc(N * N * sizeof(int));
          for (i = 0; i < 3; i = i + 1) {
              for (j = 0; j < 3; j = j + 1) {
                  x[i*N + j] = 2 * x[i*N + j];
              }
          }

      Array in row-major order:
      X[0][0] X[0][1] X[0][2] X[1][0] X[1][1] X[1][2] X[2][0] X[2][1] X[2][2]
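A runnable sketch of the heap-allocated version, under the assumption that the array holds `int` elements. The helper names are hypothetical. Note that `malloc` takes a size in bytes, so the element count is multiplied by the element size.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an n x n int array as one contiguous row-major block and fill it
   with x[i*n + j] = i*n + j. Returns NULL on failure; the caller frees it. */
int *make_matrix(size_t n)
{
    int *x = malloc(n * n * sizeof *x);   /* bytes, not element count */
    if (x == NULL)
        return NULL;
    for (size_t k = 0; k < n * n; k++)
        x[k] = (int)k;
    return x;
}

/* Double every element, visiting memory in address order (i outer, j inner). */
void double_row_order(int *x, size_t n)
{
    assert(x != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            x[i * n + j] = 2 * x[i * n + j];
}

/* Double every element, jumping n elements between accesses (j outer, i inner). */
void double_col_order(int *x, size_t n)
{
    assert(x != NULL);
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            x[i * n + j] = 2 * x[i * n + j];
}
```

Both functions leave the array in the same state. They differ only in the order in which memory is touched, which is what matters for the cache.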

  11. Matrix Multiplication Refresher

  12. Miss Rate Analysis for Matrix Multiply
      • Assume:
        • Block size = 32B (big enough for four doubles)
        • Matrix dimension (N) is very large
        • Cache is not even big enough to hold multiple rows
      • Analysis Method:
        • Look at access pattern of inner loop
      [Diagram: C = A x B, where element (i, j) of C is computed from row i of A (indexed by k) and column j of B (indexed by k)]

  13. Layout of C Arrays in Memory (review)
      • C arrays are allocated in row-major order
        • each row occupies contiguous memory locations
      • Stepping through columns in one row:
        • for (i = 0; i < n; i++) sum += a[0][i];
        • accesses successive elements
        • if block size (B) > sizeof(a_ij) bytes, exploits spatial locality
        • miss rate = sizeof(a_ij) / B
      • Stepping through rows in one column:
        • for (i = 0; i < n; i++) sum += a[i][0];
        • accesses distant elements
        • no spatial locality!
        • miss rate = 1 (i.e. 100%)
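The two miss rates above can be reproduced with a very simple model: count a miss whenever an access lands in a different block than the previous one. This is a sketch under stated assumptions (the array starts on a block boundary and only the most recently touched block is remembered), and `count_block_misses` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Count misses for `n_accesses` accesses spaced `stride_bytes` apart, using
   blocks of `block_bytes`. Model: the first access misses, and any access
   that falls in a different block than the previous access misses. */
size_t count_block_misses(size_t n_accesses, size_t stride_bytes,
                          size_t block_bytes)
{
    assert(block_bytes > 0);
    size_t misses = 0;
    size_t prev_block = 0;
    for (size_t t = 0; t < n_accesses; t++) {
        size_t block = (t * stride_bytes) / block_bytes;
        if (t == 0 || block != prev_block)
            misses++;
        prev_block = block;
    }
    return misses;
}
```

With 8-byte doubles and 32-byte blocks, stepping along a row of 1024 elements (stride 8) gives 256 misses, a rate of 8/32 = 0.25. Stepping down a column of a 1024-wide array (stride 1024 * 8) gives 1024 misses, a rate of 1.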

  14. Matrix Multiplication (ijk)

          /* ijk */
          for (i=0; i<n; i++) {
            for (j=0; j<n; j++) {
              sum = 0.0;
              for (k=0; k<n; k++)
                sum += a[i][k] * b[k][j];
              c[i][j] = sum;
            }
          }

      Source: matmult/mm.c

      Inner loop:

                                         A          B            C
      Access                             (i,*)      (*,j)        (i,j)
      Pattern                            Row-wise   Column-wise  Fixed
      Misses per inner loop iteration    0.25       1.0          0.0
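The per-array figures in this analysis follow from the access pattern alone. The sketch below encodes that rule under the lecture's assumptions (8-byte doubles, 32-byte blocks, very large n); the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* How the inner loop walks one of the three matrices. */
typedef enum { ROW_WISE, COLUMN_WISE, FIXED } access_pattern;

/* Expected misses per inner-loop iteration for a single matrix. */
double misses_per_iteration(access_pattern p, size_t elem_bytes,
                            size_t block_bytes)
{
    assert(elem_bytes > 0 && elem_bytes <= block_bytes);
    switch (p) {
    case ROW_WISE:    return (double)elem_bytes / (double)block_bytes;
    case COLUMN_WISE: return 1.0;   /* every access touches a new block */
    case FIXED:       return 0.0;   /* same element every iteration */
    }
    return 0.0;
}

/* Sum over the three matrices A, B, and C. */
double total_misses_per_iteration(access_pattern a, access_pattern b,
                                  access_pattern c, size_t elem_bytes,
                                  size_t block_bytes)
{
    return misses_per_iteration(a, elem_bytes, block_bytes)
         + misses_per_iteration(b, elem_bytes, block_bytes)
         + misses_per_iteration(c, elem_bytes, block_bytes);
}
```

For ijk, calling `total_misses_per_iteration(ROW_WISE, COLUMN_WISE, FIXED, 8, 32)` gives 0.25 + 1.0 + 0.0 = 1.25.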

  15. Matrix Multiplication (jik)

          /* jik */
          for (j=0; j<n; j++) {
            for (i=0; i<n; i++) {
              sum = 0.0;
              for (k=0; k<n; k++)
                sum += a[i][k] * b[k][j];
              c[i][j] = sum;
            }
          }

      Source: matmult/mm.c

      Inner loop:

                                         A          B            C
      Access                             (i,*)      (*,j)        (i,j)
      Pattern                            Row-wise   Column-wise  Fixed
      Misses per inner loop iteration    0.25       1.0          0.0

  16. Matrix Multiplication (kij)

          /* kij */
          for (k=0; k<n; k++) {
            for (i=0; i<n; i++) {
              r = a[i][k];
              for (j=0; j<n; j++)
                c[i][j] += r * b[k][j];
            }
          }

      Source: matmult/mm.c

      Inner loop:

                                         A          B            C
      Access                             (i,k)      (k,*)        (i,*)
      Pattern                            Fixed      Row-wise     Row-wise
      Misses per inner loop iteration    0.0        0.25         0.25

  17. Matrix Multiplication (ikj)

          /* ikj */
          for (i=0; i<n; i++) {
            for (k=0; k<n; k++) {
              r = a[i][k];
              for (j=0; j<n; j++)
                c[i][j] += r * b[k][j];
            }
          }

      Source: matmult/mm.c

      Inner loop:

                                         A          B            C
      Access                             (i,k)      (k,*)        (i,*)
      Pattern                            Fixed      Row-wise     Row-wise
      Misses per inner loop iteration    0.0        0.25         0.25

  18. Matrix Multiplication (jki)

          /* jki */
          for (j=0; j<n; j++) {
            for (k=0; k<n; k++) {
              r = b[k][j];
              for (i=0; i<n; i++)
                c[i][j] += a[i][k] * r;
            }
          }

      Source: matmult/mm.c

      Inner loop:

                                         A            B        C
      Access                             (*,k)        (k,j)    (*,j)
      Pattern                            Column-wise  Fixed    Column-wise
      Misses per inner loop iteration    1.0          0.0      1.0

  19. Matrix Multiplication (kji)

          /* kji */
          for (k=0; k<n; k++) {
            for (j=0; j<n; j++) {
              r = b[k][j];
              for (i=0; i<n; i++)
                c[i][j] += a[i][k] * r;
            }
          }

      Source: matmult/mm.c

      Inner loop:

                                         A            B        C
      Access                             (*,k)        (k,j)    (*,j)
      Pattern                            Column-wise  Fixed    Column-wise
      Misses per inner loop iteration    1.0          0.0      1.0

  20. Summary of Matrix Multiplication

      ijk (& jik):
      • 2 loads, 0 stores
      • misses/iter = 1.25

          for (i=0; i<n; i++) {
            for (j=0; j<n; j++) {
              sum = 0.0;
              for (k=0; k<n; k++) {
                sum += a[i][k] * b[k][j];
              }
              c[i][j] = sum;
            }
          }

      kij (& ikj):
      • 2 loads, 1 store
      • misses/iter = 0.5

          for (k=0; k<n; k++) {
            for (i=0; i<n; i++) {
              r = a[i][k];
              for (j=0; j<n; j++) {
                c[i][j] += r * b[k][j];
              }
            }
          }

      jki (& kji):
      • 2 loads, 1 store
      • misses/iter = 2.0

          for (j=0; j<n; j++) {
            for (k=0; k<n; k++) {
              r = b[k][j];
              for (i=0; i<n; i++) {
                c[i][j] += a[i][k] * r;
              }
            }
          }
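The three loop orders summarized above can be written as complete functions and checked against each other. This sketch assumes flat row-major arrays of `double`, which is a choice made here so the functions are self-contained; the `mm_*` names are hypothetical. The kij and jki versions accumulate into `c`, so the caller must zero `c` first, while ijk overwrites it.

```c
#include <assert.h>
#include <stddef.h>

/* ijk: inner loop walks a row of a and a column of b. Overwrites c. */
void mm_ijk(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* kij: inner loop walks rows of b and c. Accumulates; c must start zeroed. */
void mm_kij(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    for (size_t k = 0; k < n; k++)
        for (size_t i = 0; i < n; i++) {
            double r = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += r * b[k * n + j];
        }
}

/* jki: inner loop walks columns of a and c. Accumulates; c must start zeroed. */
void mm_jki(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    for (size_t j = 0; j < n; j++)
        for (size_t k = 0; k < n; k++) {
            double r = b[k * n + j];
            for (size_t i = 0; i < n; i++)
                c[i * n + j] += a[i * n + k] * r;
        }
}
```

All three produce the same product. Only the memory access order of the inner loop differs, which is the source of the different miss counts.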

  21. Core i7 Matrix Multiply Performance
      [Figure: cycles per inner loop iteration (log scale, 1 to 100) versus array size n (50 to 700), one curve per loop order. The curves fall into three groups, labeled from top to bottom: jki / kji, ijk / jik, kij / ikj.]

  22. Example: Matrix Multiplication

          c = (double *) calloc(n*n, sizeof(double));

          /* Multiply n x n matrices a and b */
          void mmm(double *a, double *b, double *c, int n) {
              int i, j, k;
              for (i = 0; i < n; i++)
                  for (j = 0; j < n; j++)
                      for (k = 0; k < n; k++)
                          c[i*n + j] += a[i*n + k] * b[k*n + j];
          }

      [Diagram: c = a * b, where row i of a and column j of b produce element (i, j) of c]

  23. Cache Miss Analysis
      • Assume:
        • Matrix elements are doubles
        • The matrix is square (n x n)
        • Cache block = 8 doubles
        • Cache size C << n (much smaller than n)
      • First iteration:
        • n/8 + n = 9n/8 misses
      • Afterwards in cache: (schematic)
      [Diagram: c = a * b with dimension n marked and a strip labeled "8 wide"]

  24. Cache Miss Analysis
      • Assume:
        • Matrix elements are doubles
        • Cache block = 8 doubles
        • Cache size C << n (much smaller than n)
      • Second iteration:
        • Again: n/8 + n = 9n/8 misses
      • Total misses:
        • 9n/8 * n^2 = (9/8) * n^3
      [Diagram: c = a * b with dimension n marked and a strip labeled "8 wide"]
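The total above is plain arithmetic: each of the n^2 elements of c costs n/8 misses for a row of a plus n misses for a column of b. A small sketch of that estimate, with a hypothetical function name and the block size passed as a parameter:

```c
#include <assert.h>

/* Estimated cache misses for the unblocked n x n multiply, assuming the
   cache is much smaller than n so nothing useful survives between
   iterations. `doubles_per_block` is 8 in the lecture's example. */
unsigned long long unblocked_misses(unsigned long long n,
                                    unsigned long long doubles_per_block)
{
    assert(doubles_per_block > 0 && n % doubles_per_block == 0);
    /* One element of c: a row of a (n / block) plus a column of b (n). */
    unsigned long long per_element = n / doubles_per_block + n;
    return per_element * n * n;   /* n^2 elements of c */
}
```

For n = 800 and 8 doubles per block this gives 900 * 640000 = 576,000,000, which equals (9/8) * 800^3.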

  25. Blocked Matrix Multiplication
      [Diagram: c += a * b, computed one block at a time, with (i1, j1) marking the current block of c. Block size B x B.]

  26. [Diagram, repeated: c += a * b at block (i1, j1). Block size B x B.]

  27. [Diagram: c += a * b at block (i1, j1). Block size B x B.]
      Example: two 4 x 4 operands, each partitioned into 2 x 2 blocks:

           1  2 |  5  6          1  2 |  5  6
           3  4 |  7  8          3  4 |  7  8
          ------+------         ------+------
           9 10 | 13 14          9 10 | 13 14
          11 12 | 15 16         11 12 | 15 16

  28. [Diagram: c += a * b at block (i1, j1). Block size B x B.]
      Example: two 4 x 4 operands, each partitioned into 2 x 2 blocks:

           1  2 |  5  6          1  2 |  5  6
           3  4 |  7  8          3  4 |  7  8
          ------+------         ------+------
           9 10 | 13 14          9 10 | 13 14
          11 12 | 15 16         11 12 | 15 16

      Block computation shown:

          | 1 2 |   | 1 2 |     | 5 6 |   |  9 10 |
          | 3 4 | * | 3 4 |  +  | 7 8 | * | 11 12 |
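The transcript ends before any blocked code appears, so the following is a sketch of one common way to write a blocked multiply rather than the lecture's own version. It assumes flat row-major arrays, a block size that divides n evenly, and a result array `c` that starts zeroed; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Blocked n x n multiply: c += a * b, processed in bsize x bsize blocks.
   Requires n to be a multiple of bsize. c must be zeroed by the caller
   if a plain product is wanted. */
void mmm_blocked(const double *a, const double *b, double *c,
                 size_t n, size_t bsize)
{
    assert(bsize > 0 && n % bsize == 0);
    for (size_t i = 0; i < n; i += bsize)
        for (size_t j = 0; j < n; j += bsize)
            for (size_t k = 0; k < n; k += bsize)
                /* bsize x bsize mini matrix multiplication */
                for (size_t i1 = i; i1 < i + bsize; i1++)
                    for (size_t j1 = j; j1 < j + bsize; j1++)
                        for (size_t k1 = k; k1 < k + bsize; k1++)
                            c[i1 * n + j1] += a[i1 * n + k1] * b[k1 * n + j1];
}
```

With the 4 x 4 example matrix used for both operands and `bsize` = 2, the top-left block of the result is the sum of the two block products shown on the slide, which works out to 118, 132, 166, 188.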
