Cache-Oblivious Algorithms and Data Structures, Gerth Stølting Brodal - PowerPoint PPT Presentation


  1. Cache-Oblivious Algorithms and Data Structures
     Gerth Stølting Brodal, University of Aarhus

  2. Outline
     • Motivation
       – A typical workstation
       – A trivial program
     • Memory models
       – I/O model
       – Ideal cache model
     • Basic cache-oblivious algorithms
       – Matrix multiplication
       – Search trees
       – Sorting
     • Some experimental results
     • Conclusion

  3. A Typical Workstation

  4. Customizing a Dell 650
     Processor speed: 2.4 – 3.2 GHz
     L1 cache size: 16 KB
     L1 cache line size: 64 Bytes
     L2 cache size: 256 – 512 KB
     L2 cache line size: 128 Bytes
     L3 cache size: 0.5 – 2 MB
     Memory: 1/4 – 4 GB
     Hard disk: 36 GB – 146 GB, 7,200 – 15,000 RPM
     CD/DVD: 8 – 48x
     Sources: www.dell.dk, www.intel.com

  5. Customizing a Dell 650 (cont.)
     Do we want to know?

  6. Hierarchical Memory Basics
     [Figure: CPU, L1, L2, L3, RAM, Disk in order of increasing access time and space, with block sizes B1, B2, B3, B4 marked between levels]
     • Data moved between adjacent memory levels in blocks

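  7. A Trivial Program
     for (i = 0; i + d < n; i += d) A[i] = i + d;          /* link every d-th cell to the next one */
     A[i] = 0;                                              /* last cell points back to cell 0 */
     for (i = 0, j = 0; j < 8*1024*1024; j++) i = A[i];     /* follow 2^23 links */
     [Figure: array A of length n, linked cells d apart]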
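The program on this slide can be packaged as two small C functions. This is a minimal sketch, not the code used for the measurements: the names build_cycle and chase are hypothetical, and size_t stands in for the integer types the slide leaves undeclared.

```c
#include <assert.h>
#include <stddef.h>

/* Link every d-th cell of A[0..n-1] into a cycle: 0 -> d -> 2d -> ... -> 0.
   (build_cycle is a hypothetical name; the slide shows only the loop.) */
void build_cycle(size_t *A, size_t n, size_t d)
{
    size_t i;
    assert(n > 0 && d > 0);
    for (i = 0; i + d < n; i += d)
        A[i] = i + d;
    A[i] = 0;   /* close the cycle */
}

/* Follow `steps` links starting at cell 0 and return the final index.
   Returning the index keeps the compiler from optimizing the loop away. */
size_t chase(const size_t *A, size_t steps)
{
    size_t i = 0;
    for (size_t j = 0; j < steps; j++)
        i = A[i];
    return i;
}
```

Timing a call such as chase(A, 8*1024*1024), for example with clock() from <time.h>, for varying n and d is one way to rerun the experiment; absolute numbers will depend on the machine.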

  8. A Trivial Program (cont.), d = 1
     [Plot: running time in seconds (0 to 200) against log n (0 to 25)]
     RAM: n ≈ 2^25 ≡ 128 MB

  9. A Trivial Program (cont.), d = 1
     [Plot: running time in seconds (0 to 3) against log n (2 to 20)]
     L1: n ≈ 2^12 ≡ 16 KB
     L2: n ≈ 2^16 ≡ 256 KB

  10. A Trivial Program (cont.), n = 2^24
      [Plot: running time in seconds (0 to 2) against log d (0 to 25)]
      Cache line: d = 2^3 ≡ 32 Bytes

  11. A Trivial Program (cont.), n = 2^24
      Do we want to know?

  12. A Trivial Program (cont.): If you want to know...
      Experiments were performed on a DELL 8000, Pentium III, 850 MHz, 128 MB RAM, running Linux 2.4.2, and using gcc version 2.96 with optimization -O3.
      L1 instruction and data caches
      • 4-way set associative, 32-byte line size
      • 16 KB instruction cache and 16 KB write-back data cache
      L2 level cache
      • 8-way set associative, 32-byte line size
      • 256 KB
      Source: www.intel.com

  13. Algorithmic Problem
      • Memory hierarchy has become a fact of life
      • Accessing non-local storage may take a very long time
      • Good locality is important for achieving high performance

      Level      Latency    Relative to CPU
      Register   0.5 ns     1
      L1 cache   0.5 ns     1-2
      L2 cache   3 ns       2-7
      DRAM       150 ns     80-200
      TLB        500+ ns    200-2000
      Disk       10 ms      10^7
      (latency increases down the table)

  18. Algorithmic Problem
      • Modern hardware is not uniform: many different parameters
        – Number of memory levels
        – Cache sizes
        – Cache line/disk block sizes
        – Cache associativity
        – Cache replacement strategy
        – CPU/BUS/memory speed
      • Programs should ideally run for many different parameters
        – by knowing many of the parameters at runtime
        – by knowing few essential parameters
        – ignoring the memory hierarchies
        [slide annotation next to these options: practice]
      • Programs are executed on unpredictable configurations
        – Generic portable and scalable software libraries
        – Code downloaded from the Internet, e.g. Java applets
        – Dynamic environments, e.g. multiple processes

  20. Hierarchical Memory Models: many parameters
      [Figure: CPU, L1, L2, L3, RAM, Disk in order of increasing access time and space]
      • Limited success, since the models are too complicated

  22. I/O Model: two parameters (Aggarwal and Vitter 1988)
      [Figure: CPU, cache of size M, memory; I/Os move blocks of size B between cache and memory]
      • Measure number of block transfers between two memory levels
      • Bottleneck in many computations
      • Very successful (simplicity)
      Limitations
      • Parameters B and M must be known
      • Does not handle multiple memory levels
      • Does not handle dynamic M

  24. Ideal Cache Model: no parameters!? (Frigo, Leiserson, Prokop, Ramachandran 1999)
      [Figure: CPU, cache of size M, memory; I/Os move blocks of size B between cache and memory]
      • Program with only one memory
      • Analyze in the I/O model for
        – optimal off-line cache replacement strategy
        – arbitrary B and M
      Advantages
      • Optimal on arbitrary level ⇒ optimal on all levels
      • Portability: B and M not hard-wired into the algorithm
      • Dynamically changing parameters

  25. Justification of the Ideal-Cache Model (Frigo, Leiserson, Prokop, Ramachandran 1999)
      Optimal replacement
      • LRU + 2 × cache size ⇒ at most 2 × cache misses (Sleator and Tarjan, 1985)
      • Corollary: T_{M,B}(N) = O(T_{2M,B}(N)) ⇒ #cache misses using LRU is O(T_{M,B}(N))
      Two memory levels
      • Optimal cache-oblivious algorithm satisfying T_{M,B}(N) = O(T_{2M,B}(N)) ⇒ optimal #cache misses on each level of a multilevel LRU cache
      Full associativity
      • Simulation of LRU on a direct mapped cache
      • Explicit memory management
      • Dictionary (2-universal hash functions) of cache lines in memory
      • Expected O(1) access time to a cache line in memory
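To experiment with the LRU statements on this slide, cache misses can be counted with a small simulator. The following is a minimal sketch of a fully associative LRU cache holding M/B lines of B elements each. All names (lru_cache, lru_init, lru_access, scan_misses) and the MAX_LINES limit are assumptions made for illustration. It counts misses only; it does not prove the Sleator and Tarjan bound.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINES 64   /* hypothetical upper bound on M/B for this sketch */

typedef struct {
    size_t block_size;                  /* B, in elements */
    size_t num_lines;                   /* M / B */
    size_t tag[MAX_LINES];              /* block number held by each line */
    unsigned long last_use[MAX_LINES];  /* time of last access, 0 = empty */
    unsigned long clock;                /* logical time, one tick per access */
    unsigned long misses;
} lru_cache;

void lru_init(lru_cache *c, size_t M, size_t B)
{
    assert(B > 0 && M >= B && M / B <= MAX_LINES);
    c->block_size = B;
    c->num_lines = M / B;
    c->clock = 0;
    c->misses = 0;
    for (size_t k = 0; k < c->num_lines; k++)
        c->last_use[k] = 0;
}

/* Access element addr; on a miss, evict the least recently used line. */
void lru_access(lru_cache *c, size_t addr)
{
    size_t block = addr / c->block_size;
    size_t victim = 0;
    c->clock++;
    for (size_t k = 0; k < c->num_lines; k++) {
        if (c->last_use[k] != 0 && c->tag[k] == block) {
            c->last_use[k] = c->clock;   /* hit: refresh recency */
            return;
        }
        if (c->last_use[k] < c->last_use[victim])
            victim = k;                  /* empty lines (time 0) go first */
    }
    c->misses++;
    c->tag[victim] = block;
    c->last_use[victim] = c->clock;
}

/* Misses incurred by scanning elements 0..N-1 `passes` times in a row. */
unsigned long scan_misses(size_t M, size_t B, size_t N, size_t passes)
{
    lru_cache c;
    lru_init(&c, M, B);
    for (size_t p = 0; p < passes; p++)
        for (size_t a = 0; a < N; a++)
            lru_access(&c, a);
    return c.misses;
}
```

For instance, two passes over 32 elements with B = 4 miss on every block when M = 16, because LRU evicts each block just before it is needed again, while with M = 32 only the first pass misses. The linear search per access keeps the sketch short; it is not meant to be efficient.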

  28. Warm-up: Scanning
      sum = 0
      for i = 1 to N do sum = sum + A[i]
      O(N/B) I/Os
      [Figure: array A of N elements, split into blocks of size B]
      Corollary: Cache-oblivious selection requires O(N/B) I/Os (Hoare 1961 / Blum et al. 1973)
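The scanning bound can be checked by counting how many blocks a scan touches. This is a minimal sketch; the function name scan_block_transfers and the start parameter (the element offset at which the array begins) are assumptions introduced for illustration. It assumes each touched block is transferred exactly once, which holds for a sequential scan as long as the cache can hold at least one block.

```c
#include <assert.h>
#include <stddef.h>

/* Number of distinct size-B blocks touched when scanning N consecutive
   elements whose first element sits at element offset `start`. */
size_t scan_block_transfers(size_t start, size_t N, size_t B)
{
    assert(B > 0);
    if (N == 0)
        return 0;
    size_t first = start / B;            /* block of the first element */
    size_t last = (start + N - 1) / B;   /* block of the last element */
    return last - first + 1;
}
```

A block-aligned array of 100 elements with B = 8 touches 13 blocks, that is ceil(N/B); shifting the start by 5 elements touches 14. The count is therefore at most ceil(N/B) + 1, in line with the O(N/B) bound on the slide when N is at least B, and the code never needs to know B or M.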
