INFOMOV – Optimization & Vectorization
J. Bikker - Sep-Nov 2016 - Lecture 5: “Caching (2)”
Welcome! Today’s Agenda:
- Caching: Recap
- Data Locality
- Alignment
- False Sharing
- A Handy Guide (to Pleasing the Cache)
Refresher:
Three types of cache:
- Fully associative
- Direct mapped
- N-set associative
In an N-set associative cache, each memory address can be stored in N slots. Example:
INFOMOV – Lecture 5 – “Caching (2)” 3
32KB, 8-way set-associative, 64 bytes per cache line: 64 sets of 512 bytes
32-bit address:
| 31 … 12 | 11 … 6 | 5 … 0  |
|   tag   | set nr | offset |
32-bit address:
| 31 … 12 | 11 … 6                      | 5 … 0          |
|   tag   | set nr: index 0..63 (6 bit) | offset in line |
Each set holds 8 cache lines (slots 0..7).
Examples (tag | set nr | offset):
0x00001234 → 0001 | 001000 | 110100
0x00008234 → 1000 | 001000 | 110100
0x00006234 → 0110 | 001000 | 110100
0x0000A234 → 1010 | 001000 | 110100
0x0000A240 → 1010 | 001001 | 000000
0x0000F234 → 1111 | 001000 | 110100
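The decomposition above can be expressed directly in code (a sketch under the stated cache parameters: 64-byte lines, 64 sets):

```cpp
#include <cstdint>

// Decompose a 32-bit address for a 32KB, 8-way cache with 64-byte lines:
// bits 5..0 = offset within the line, bits 11..6 = set number,
// bits 31..12 = tag.
uint32_t lineOffset( uint32_t addr ) { return addr & 63; }        // low 6 bits
uint32_t setNr( uint32_t addr )      { return (addr >> 6) & 63; } // next 6 bits
uint32_t tag( uint32_t addr )        { return addr >> 12; }       // remaining 20 bits
```

Note how five of the six example addresses share set nr 001000: they all compete for the same 8 slots.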
Theoretical consequence:
64 bytes per cache line
Example*:

int* arr = new int[64 * 1024 * 1024];
// loop 1
for( int i = 0; i < 64 * 1024 * 1024; i++ ) arr[i] *= 3;
// loop 2
for( int i = 0; i < 64 * 1024 * 1024; i += 16 ) arr[i] *= 3;

Which one takes longer to execute?
*: http://igoro.com/archive/gallery-of-processor-cache-effects
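A minimal timing harness for this experiment could look as follows (assumed code, not from the slides; absolute timings vary by machine):

```cpp
#include <chrono>
#include <cstdio>

// Time one pass over the array with the given stride.
double timeLoop( int* arr, int n, int step )
{
    auto t0 = std::chrono::high_resolution_clock::now();
    for( int i = 0; i < n; i += step ) arr[i] *= 3;
    auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::milli>( t1 - t0 ).count();
}

void runExperiment()
{
    const int N = 64 * 1024 * 1024;
    int* arr = new int[N]();
    // step 16 = 64 bytes = exactly one cache line per iteration
    printf( "loop 1 (step 1):  %.1f ms\n", timeLoop( arr, N, 1 ) );
    printf( "loop 2 (step 16): %.1f ms\n", timeLoop( arr, N, 16 ) );
    delete[] arr;
}
```

On typical hardware the two loops report similar times: the multiplications are cheap, and both loops pull every 64-byte cache line of the array through the memory system exactly once.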
If a value straddles a 64-byte cache line boundary, reading it causes not one but two cache misses. Example:
struct Pixel { float r, g, b; }; // 12 bytes Pixel screen[768][1024];
Assuming pixel (0,0) is aligned to a cache line boundary, pixels (0,1)..(0,5) start at offsets 12, 24, 36, 48 and 60 in memory: pixel (0,5) straddles the 64-byte boundary. Walking column 5 will be very expensive.
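The cost difference between walking a row and walking a column can be sketched like this (assumed code, not from the slides):

```cpp
struct Pixel { float r, g, b; }; // 12 bytes
Pixel screen[768][1024];         // row-major: a row is 12KB of consecutive bytes

float sumRow( int y )    // sequential access: one cache miss per 64 bytes
{
    float sum = 0;
    for( int x = 0; x < 1024; x++ ) sum += screen[y][x].r;
    return sum;
}

float sumColumn( int x ) // strided access: jumps 12KB between reads,
{                        // so (almost) every pixel misses, some twice
    float sum = 0;
    for( int y = 0; y < 768; y++ ) sum += screen[y][x].r;
    return sum;
}
```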
Considering the Cache
Why do Caches Work?
Reusing data
Ideal pattern:
Typical pattern:
using algorithm 2, … Note: GPUs typically follow the ideal pattern (more on that later).
Example: rotozooming
Method: interleave the bits of X and Y into a single index:
X = 1 1 0 0 0 1 0 1 1 0 1 1 0 1
Y = 1 0 1 1 0 1 1 0 1 0 1 1 1 0
Improving data locality: z-order / Morton curve
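A common way to compute the Morton index is with bit-spreading masks (a sketch; this particular routine is an assumption, not taken from the slides):

```cpp
#include <cstdint>

// Spread the low 16 bits of v over the even bit positions:
// 0b1011 -> 0b01000101.
uint32_t part1By1( uint32_t v )
{
    v &= 0x0000ffff;
    v = (v | (v << 8)) & 0x00ff00ff;
    v = (v | (v << 4)) & 0x0f0f0f0f;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Interleave the bits of x and y: 2D-nearby pixels get nearby 1D indices.
uint32_t mortonIndex( uint32_t x, uint32_t y )
{
    return part1By1( x ) | (part1By1( y ) << 1);
}
```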
Data Locality
Wikipedia:
Temporal Locality – “If at one point in time a particular memory location is referenced, then it is likely that the same location will be referenced again in the near future.”
Spatial Locality – “If a particular memory location is referenced at a particular time, then it is likely that nearby memory locations will be referenced in the near future.” *
* More info: http://gameprogrammingpatterns.com/data-locality.html
Data Locality
How do we increase data locality?
- Linear access – Sometimes as simple as swapping for loops *
- Tiling – Example of working on a small subset of the data at a time.
- Streaming – Operate on/with data until done.
- Reducing data size – Smaller things are closer together.
How do trees/linked lists/hash tables fit into this?
* For an elaborate example see https://www.cs.duke.edu/courses/cps104/spring11/lects/19-cache-sw2.pdf
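The loop swap mentioned above can be sketched as follows (assumed example for a row-major 2D array):

```cpp
const int W = 1024, H = 1024;

// x in the outer loop: every access jumps a full row (4KB) - poor locality.
long long sumBad( const int (*img)[W] )
{
    long long sum = 0;
    for( int x = 0; x < W; x++ )
        for( int y = 0; y < H; y++ ) sum += img[y][x];
    return sum;
}

// x in the inner loop: memory is read sequentially - one miss per 64 bytes.
long long sumGood( const int (*img)[W] )
{
    long long sum = 0;
    for( int y = 0; y < H; y++ )
        for( int x = 0; x < W; x++ ) sum += img[y][x];
    return sum;
}
```

Both functions compute the same sum; only the traversal order (and thus the cache behavior) differs.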
Cache line size and data alignment
What is wrong with this struct?
struct Particle { float x, y, z; float vx, vy, vz; float mass; }; // size: 28 bytes
Two particles will fit in a cache line (taking up 56 bytes). The next particle will straddle two cache lines.
Better:
struct Particle { float x, y, z; float vx, vy, vz; float mass, dummy; }; // size: 32 bytes
Note: As soon as we read any field from a particle, the other fields are guaranteed to be in L1 cache. If you update x, y and z in one loop, and vx, vy, vz in a second loop, it is better to merge the two loops.
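A sketch of such a merged loop (assumed code; the gravity constant and the update scheme are illustrative, not from the slides):

```cpp
struct Particle { float x, y, z; float vx, vy, vz; float mass, dummy; }; // 32 bytes

// Velocity and position updates merged into one loop: each 64-byte cache
// line (two particles) is loaded once per frame instead of twice.
void updateMerged( Particle* p, int count, float dt )
{
    for( int i = 0; i < count; i++ )
    {
        p[i].vy -= 9.81f * dt;   // velocity update (gravity, illustrative)
        p[i].x += p[i].vx * dt;  // position update: same cache line,
        p[i].y += p[i].vy * dt;  // guaranteed to be in L1 already
        p[i].z += p[i].vz * dt;
    }
}
```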
Cache line size and data alignment
What is wrong with this allocation?
struct Particle { float x, y, z; float vx, vy, vz; float mass, dummy; }; // size: 32 bytes Particle particles[512];
Although two particles will fit in a cache line, we have no guarantee that the address of the first particle is a multiple of 64.
Note: Is it bad if particles straddle a cache line boundary? Not necessarily: if we read the array sequentially, we sometimes get 2, but sometimes 0 cache misses. For random access, this is not a good idea.
Cache line size and data alignment
Controlling the location in memory of arrays: An address that is divisible by 64 has its lowest 6 bits set to zero. In hex: all addresses ending in 00, 40, 80 or C0. Enforcing this:
Particle* particles = (Particle*)_aligned_malloc( 512 * sizeof( Particle ), 64 );

Or:

__declspec(align(64)) struct Particle { … };
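Outside MSVC, a portable C++17 sketch of the same idea (assumed alternative; std::aligned_alloc is not available in MSVC, where _aligned_malloc above applies):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct Particle { float x, y, z, vx, vy, vz, mass, dummy; }; // 32 bytes

// std::aligned_alloc (C++17) guarantees a 64-byte-aligned array start,
// so every pair of particles shares exactly one cache line.
// Note: the total size must be a multiple of the alignment.
Particle* allocParticles( size_t count )
{
    return (Particle*)std::aligned_alloc( 64, count * sizeof( Particle ) );
}

bool isCacheLineAligned( const void* p )
{
    return ((uintptr_t)p & 63) == 0; // lowest 6 bits zero
}
```

Memory from allocParticles must be released with std::free, not delete.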
Multiple Cores using Caches
Two cores can hold copies of the same data. Not as unlikely as you may think – Example:
unsigned char* data = new unsigned char[COUNT];
for( int i = 0; i < COUNT; i++ ) data[i] = rand() % 256;
// count byte values
int counter[256] = {};
for( int i = 0; i < COUNT; i++ ) counter[data[i]]++;
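When this count is parallelized naively, counters belonging to different threads can land in the same cache line, which then ping-pongs between the cores (false sharing). A sketch of the standard fix, padding each thread's data to its own 64-byte line (assumed code, not from the slides):

```cpp
#include <cstddef>
#include <thread>

struct PaddedCounter
{
    long long value;
    char pad[64 - sizeof( long long )]; // keep neighbours in separate cache lines
};

// Count the nonzero bytes in data using two threads, one per array half.
// Without the padding, part[0] and part[1] would share one cache line.
long long countNonZero( const unsigned char* data, size_t n )
{
    PaddedCounter part[2] = {};
    auto worker = [&]( int t )
    {
        size_t begin = t * (n / 2), end = (t == 0) ? n / 2 : n;
        for( size_t i = begin; i < end; i++ )
            if( data[i] ) part[t].value++;
    };
    std::thread t1( worker, 0 ), t2( worker, 1 );
    t1.join(); t2.join();
    return part[0].value + part[1].value;
}
```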
[Diagram: each core runs two hardware threads (T0, T1) and has its own L1 I-$, L1 D-$ and L2 $; the cores share a single L3 $.]
Multiple Cores using Caches
Multithreading GlassBall, options:
How to Please the Cache
Or: “how to evade RAM”
- Use fewer variables
- Limit the scope of your variables
- Pack multiple values in a single variable
- Use floats and ints (they use different registers)
- Compile for 64-bit (more registers)
- Arrays will never go in registers
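For instance, packing four byte-sized values into one 32-bit integer keeps them together in a single variable, and thus a single register (illustrative example, not from the slides):

```cpp
#include <cstdint>

// Pack four 0..255 channel values into one 32-bit value (0xAABBGGRR layout).
uint32_t packRGBA( uint32_t r, uint32_t g, uint32_t b, uint32_t a )
{
    return (a << 24) | (b << 16) | (g << 8) | r;
}

uint32_t unpackRed( uint32_t c )   { return c & 255; }
uint32_t unpackAlpha( uint32_t c ) { return c >> 24; }
```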
- Read sequentially
- Keep data small
- Use tiling / Morton order
- Fetch data once, work until done (streaming)
- Reuse memory locations
- Use padding if needed
- Don’t pad for sequential access
- Use aligned malloc / __declspec align
- Assume 64-byte cache lines
- Prefetch
- Use a prefetch thread
- Use streaming writes
- Separate mutable / immutable data
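A sketch of prefetching combined with streaming (non-temporal) writes, using SSE2 intrinsics (assumed example; the prefetch distance of 256 bytes ahead is a tunable guess):

```cpp
#include <immintrin.h>

// Copy n ints (n a multiple of 4, dst 16-byte aligned) without polluting
// the cache with the destination: _mm_prefetch pulls source lines toward
// L1 ahead of use, _mm_stream_si128 writes around the cache.
void streamCopy( const int* src, int* dst, int n )
{
    for( int i = 0; i < n; i += 4 )
    {
        _mm_prefetch( (const char*)(src + i + 64), _MM_HINT_T0 ); // fetch ahead
        __m128i v = _mm_loadu_si128( (const __m128i*)(src + i) );
        _mm_stream_si128( (__m128i*)(dst + i), v );               // non-temporal store
    }
    _mm_sfence(); // order the streaming stores before subsequent accesses
}
```

Prefetching past the end of the array is harmless: prefetch is a hint and never faults.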
Use the profiler!