Cache-Oblivious Algorithms Paper Reading Group Matteo Frigo - PowerPoint PPT Presentation

Cache-Oblivious Algorithms
Paper Reading Group
Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran
Presented by: Maksym Planeta
03.09.2015

Table of Contents
◮ Introduction
◮ Cache-oblivious algorithms
  ◮ Matrix multiplication
  ◮ Matrix transposition
  ◮ Fast Fourier Transform
  ◮ Sorting
◮ Relieved system model
◮ Experimental evaluation
◮ Conclusion

Matrix multiplication
ORD-MULT(A, B, C), where A is m × n, B is n × p and C is m × p:
for i ← 1 to m
  for j ← 1 to p
    for k ← 1 to n
      C_ij ← C_ij + A_ik × B_kj
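
As a sketch only (Python with nested lists, 0-based indices instead of the slide's 1-based ones; the name ord_mult is ours), ORD-MULT transcribes directly. The innermost loop walks down column j of B, so with row-major storage consecutive accesses to B are p elements apart.

```python
def ord_mult(A, B, C):
    """C += A * B for A (m x n), B (n x p), C (m x p), stored as nested lists.

    Direct transcription of the triple loop: i over rows of A,
    j over columns of B, k over the shared dimension n.
    """
    m, n, p = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(p):
            for k in range(n):
                # B[k][j] jumps a whole row of B per step of k
                C[i][j] += A[i][k] * B[k][j]
    return C
```

For example, ord_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]) accumulates the ordinary product into the zero matrix.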


Matrix layout
Like in C...
(a) Figure: Row major order (8 × 8 matrix; storage offsets 0–7 fill the first row, 8–15 the second row, and so on)
Or like in Fortran...
(b) Figure: Column major order (the same matrix; storage offsets 0–7 fill the first column, 8–15 the second column, and so on)
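
The two figures differ only in how an index pair (i, j) maps to a storage offset. A minimal sketch, assuming 0-based indices and the hypothetical helper names below, that regenerates the offset grids of figures (a) and (b) for an 8 × 8 matrix:

```python
def row_major_offset(i, j, n_cols):
    # C order: the n_cols elements of row i are contiguous.
    return i * n_cols + j


def col_major_offset(i, j, n_rows):
    # Fortran order: the n_rows elements of column j are contiguous.
    return j * n_rows + i


def offset_grid(n, offset):
    """n x n grid whose (i, j) entry is the storage offset of element (i, j)."""
    return [[offset(i, j, n) for j in range(n)] for i in range(n)]


row_major = offset_grid(8, row_major_offset)  # first row: 0, 1, ..., 7
col_major = offset_grid(8, col_major_offset)  # first row: 0, 8, ..., 56
```

Which layout is in use decides which loop order in ORD-MULT walks memory contiguously.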

Cache-friendly algorithm
BLOCK-MULT(A, B, C, n)
for i ← 1 to n/s
  for j ← 1 to n/s
    for k ← 1 to n/s
      ORD-MULT(A_ik, B_kj, C_ij, s)
Here A_ik, B_kj and C_ij denote s × s blocks of the n × n matrices, and s is a tuning parameter.
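
A Python sketch of BLOCK-MULT under the same assumptions (nested lists, 0-based indices, hypothetical name block_mult). It inlines the block-level ORD-MULT and, unlike the slide's version, clips the last block with min() so that n need not be a multiple of s:

```python
def block_mult(A, B, C, n, s):
    """C += A * B for n x n matrices (nested lists), processed in s x s blocks.

    Each (I, J, K) step is an ORD-MULT on the blocks A_IK, B_KJ, C_IJ.
    Edge blocks are clipped with min(), so n need not be a multiple of s.
    """
    for I in range(0, n, s):
        for J in range(0, n, s):
            for K in range(0, n, s):
                # ORD-MULT restricted to one triple of blocks
                for i in range(I, min(I + s, n)):
                    for j in range(J, min(J + s, n)):
                        acc = C[i][j]
                        for k in range(K, min(K + s, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

With s = n the outer loops run once and the code degenerates to ORD-MULT on the whole matrices.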

BLOCK-MULT issues
Being cache-aware is hard:
◮ Cumbersome structure
◮ Complicated choice of s
◮ Picking the wrong s is expensive
◮ Problematic if n mod s ≠ 0
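
One way to see why s is awkward: it has to be derived from the cache size. A common rule of thumb, assumed here rather than taken from the slides, is to pick the largest s for which three s × s blocks (one each from A, B and C) fit in a cache of Z words. The helper choose_block_size is hypothetical:

```python
import math


def choose_block_size(Z):
    """Largest s with 3 * s * s <= Z, i.e. three s x s blocks fit in Z words.

    Heuristic only: ignores the line length L, associativity and any
    other data competing for the cache. Returns at least 1 even if Z < 3.
    """
    # 3*s*s <= Z  <=>  s*s <= Z // 3 for integer s
    return max(math.isqrt(Z // 3), 1)
```

For example, a 32 KiB cache holding 8-byte doubles gives Z = 4096 words and s = 36. A machine with several cache levels would want a different s for each level, which a single parameter cannot express.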

Motivation
◮ Keeping the algorithm simple is nice.
◮ But cache efficiency is a must.


System model
◮ Two-level memory
◮ Fully associative
◮ Strictly optimal replacement
◮ Automatic replacement
◮ Tall cache: $Z = \Omega(L^2)$, where:
  ◮ Z – number of words in the cache
  ◮ L – number of words in a cache line
Figure 1: The ideal-cache model (the CPU performs W work; the cache holds Z words in Z/L lines of length L, organized by an optimal replacement strategy; Q cache misses go to main memory).
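
To make Z, L and Q concrete, a small sketch of the ideal cache as a miss counter. Assumptions: the input is a trace of word addresses, the cache holds Z // L lines of L words each and is fully associative, and replacement follows Belady's offline MIN policy (evict the resident line whose next use lies farthest in the future), the optimal strategy the model assumes. The function name ideal_cache_misses is hypothetical:

```python
def ideal_cache_misses(addresses, Z, L):
    """Return Q, the number of misses for a trace of word addresses, in a
    fully associative cache of Z words split into lines of L words,
    using Belady's optimal offline (MIN) replacement policy."""
    capacity = Z // L  # number of cache lines
    if capacity < 1:
        raise ValueError("cache must hold at least one line (Z >= L)")
    lines = [a // L for a in addresses]  # line touched by each access

    # next_use[t]: position of the next access to lines[t] after time t
    never = float("inf")
    next_use = [never] * len(lines)
    seen = {}
    for t in range(len(lines) - 1, -1, -1):
        next_use[t] = seen.get(lines[t], never)
        seen[lines[t]] = t

    cache = {}  # resident line -> time of its next use
    misses = 0
    for t, line in enumerate(lines):
        if line not in cache:
            misses += 1
            if len(cache) == capacity:
                # Evict the resident line needed farthest in the future.
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[line] = next_use[t]
    return misses
```

For instance, the trace [0, 4, 8, 0, 4, 8] with L = 4 and Z = 8 (two lines) incurs 4 misses under this policy, whereas least-recently-used replacement would miss on all 6 accesses.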
