Coping with the Memory Hierarchy the Cache-Oblivious Way


  1. Coping with the Memory Hierarchy the Cache-Oblivious Way. Rolf Fagerberg, University of Aarhus. Imada, SDU, February 18, 2004.

  2. Overview • The memory hierarchy • The I/O-model • The cache-oblivious model • Examples of cache-oblivious algorithms • Double for-loop (with applications) • Searching • Sorting • Theoretical limits of cache-obliviousness Fagerberg: The Cache-Oblivious Way 2

  5. The Memory Hierarchy
     Modern computers: CPU, registers, Cache1, Cache2, Cache3, RAM, disk, tertiary storage.
       Level       Access time          Volume
       Registers   1 cycle              1 KB
       Cache       10 cycles            512 KB
       RAM         100 cycles           512 MB
       Disk        20,000,000 cycles    80 GB
     The gap increases over time. Real problems of gigabyte, terabyte, and even petabyte size: databases (finance, phone companies, banks, weather, geology, geography, astronomy), WWW, GIS systems, computer graphics.

  7. Classic RAM Model
     The RAM model: a CPU connected to a memory. Add: O(1). Branch: O(1). Memory access: O(1).
     Increasingly inadequate.

  8. Overview √ The memory hierarchy • The I/O-model • The cache-oblivious model • Examples of cache-oblivious algorithms • Double for-loop (with applications) • Searching • Sorting • Theoretical limits of cache-obliviousness

  9. I/O Model (Aggarwal and Vitter 1988)
     Two layers: a CPU with an internal memory, and an external memory.
     N = problem size, M = memory size, B = I/O block size.
     Cost: number of I/Os.

  11. Example
                   CPU time   In-place   Worst case   I/O
       Heapsort    N log N    yes        yes          N log N
       Quicksort   N log N    yes        no           (N log N)/B
       Mergesort   N log N    no         yes          (N log N)/B
     Random memory access ⇒ page fault at every access. Sequential memory access ⇒ page fault every B accesses. Typically, B ∼ 10³.
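To make the page-fault claim concrete, here is a minimal Python sketch that counts block transfers for an access trace. It assumes a fully associative cache of M words managed by LRU; the I/O model itself leaves replacement to the algorithm. The function name `count_faults` and the parameter values are illustrative choices.

```python
import random
from collections import OrderedDict

def count_faults(addresses, M, B):
    """Count block transfers for an LRU cache holding M // B blocks of B words each."""
    capacity = M // B
    cache = OrderedDict()  # block id -> None, ordered from least to most recently used
    faults = 0
    for addr in addresses:
        block = addr // B
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            faults += 1  # miss: one I/O to fetch the block
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return faults

N, M, B = 100_000, 1_000, 100
rng = random.Random(42)
random_trace = [rng.randrange(N) for _ in range(N)]

print("sequential:", count_faults(range(N), M, B))  # one fault per block: N / B = 1000
print("random:    ", count_faults(random_trace, M, B))  # close to one fault per access
```

The sequential scan pays one I/O per B accesses. The random trace almost never finds its block in the 10 cached blocks.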

  12. I/O-Optimal Sorting
     Binary mergesort: $\frac{N}{B}\log_2\frac{N}{M}$ I/Os.
     Multi-way merging: maximal merge degree ≈ M/B.
     Multi-way mergesort: $\frac{N}{B}\log_{M/B}\frac{N}{M}$ I/Os.
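The multi-way idea can be sketched in Python, with ordinary lists standing in for external memory. This illustrates the pass structure under stated assumptions; it is not an external-memory implementation. Runs of M elements are sorted in memory. Each pass then merges groups of k = M/B runs with `heapq.merge`, so the number of passes should be about $\lceil \log_k(N/M) \rceil$. The function name and the returned pass counter are hypothetical.

```python
import heapq

def multiway_mergesort(data, M, B):
    """Sort `data`, simulating M/B-way external mergesort.

    Returns (sorted_list, merge_passes). On a real disk each merge pass
    would read and write all N/B blocks once.
    """
    k = max(2, M // B)  # merge degree: one B-sized input buffer per run fits in memory
    # Run formation: each chunk of M elements fits in memory and is sorted there.
    runs = [sorted(data[i:i + M]) for i in range(0, len(data), M)]
    passes = 0
    while len(runs) > 1:
        # One pass: merge every group of k consecutive runs into a single run.
        runs = [list(heapq.merge(*runs[i:i + k])) for i in range(0, len(runs), k)]
        passes += 1
    return (runs[0] if runs else []), passes

data = [(7919 * i) % 10_007 for i in range(10_000)]  # scrambled input
result, passes = multiway_mergesort(data, M=100, B=10)  # 100 runs, k = 10
print(passes)  # 2
_, binary_passes = multiway_mergesort(data, M=100, B=50)  # k = 2: binary merging
print(binary_passes)  # 7
```

With the same 100 initial runs, merge degree 10 needs 2 passes, while binary merging needs 7. This is the gap between $\log_{M/B}$ and $\log_2$ in the two bounds.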

  13. I/O Model Facts
     • Scanning: $\Theta(N/B)$ I/Os.
     • Searching: $\Theta(\log_B N)$ I/Os by B-trees.
     • Sorting: $\Theta\left(\frac{N}{B}\log_{M/B}\frac{N}{M}\right)$ I/Os by $\frac{M}{B}$-way mergesort.
     • Permuting: $\Theta\left(\min\left\{N, \frac{N}{B}\log_{M/B}\frac{N}{M}\right\}\right)$ I/Os by direct move or sorting.
     1988-2004: many algorithms and data structures for problems from computational geometry, graphs, strings, . . .
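Plugging numbers into these bounds shows how small the $\log_{M/B}$ factor is in practice. The following Python sketch evaluates the formulas with all constant factors dropped, which is an assumption since the Θ-bounds hide constants. It also clamps the sorting bound to at least one scan when $N \le M$. Function names and example parameters are illustrative.

```python
import math

def scan_ios(N, B):
    return math.ceil(N / B)

def search_ios(N, B):
    return math.log(N, B)

def sort_ios(N, M, B):
    # (N/B) * log_{M/B}(N/M), but never less than one scan of the input
    return (N / B) * max(1.0, math.log(N / M, M / B))

def permute_ios(N, M, B):
    # Either move each element directly (N I/Os) or sort, whichever is cheaper
    return min(N, sort_ios(N, M, B))

N, M, B = 2**30, 2**20, 2**10
print(scan_ios(N, B))     # 1048576
print(search_ios(N, B))   # about 3
print(sort_ios(N, M, B))  # 1048576.0, since log_{M/B}(N/M) = 1 here
```

For tiny blocks the direct move wins. For example, `permute_ios(2**20, 2**4, 2)` returns N, because the sorting bound exceeds N.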

  14. Overview √ The memory hierarchy √ The I/O-model • The cache-oblivious model • Examples of cache-oblivious algorithms • Double for-loop (with applications) • Searching • Sorting • Theoretical limits of cache-obliviousness

  15. Computer Models
     Reality: CPU, registers, L1 cache, L2 cache, RAM, disk, with increasing access time.
     Models: the RAM model (CPU and memory); multi-level models; the I/O model (CPU, memory of size M, block transfers of size B); and a new model: cache-obliviousness.

  17. Cache-Oblivious Model (Frigo, Leiserson, Prokop, Ramachandran, FOCS’99)
     • Program in the RAM model. • Analyze in the I/O model for arbitrary B and M. • Assume an optimal off-line cache replacement strategy.
     Advantages: • Optimal on an arbitrary level ⇒ optimal on all levels • Portability • Simplicity of the model.
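An "optimal off-line" replacement strategy knows the future access sequence. The classic such policy is Belady's MIN, which evicts the block whose next use lies furthest ahead. Below is a small Python sketch comparing it with LRU. The trace is given directly as block ids, and both the trace and the capacity are made-up examples. The usual justification for the assumption is a known competitive result (Sleator and Tarjan): LRU given twice the memory incurs at most about twice as many misses as the optimal policy.

```python
from collections import OrderedDict

def lru_faults(trace, capacity):
    """Misses of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    faults = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
        else:
            faults += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = None
    return faults

def min_faults(trace, capacity):
    """Misses of Belady's off-line MIN policy: evict the block used furthest in the future."""
    # next_use[i]: position of the next access to trace[i] after i (infinity if none)
    next_use = [0] * len(trace)
    upcoming = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = upcoming.get(trace[i], float("inf"))
        upcoming[trace[i]] = i
    cache = {}  # block -> position of its next access
    faults = 0
    for i, block in enumerate(trace):
        if block not in cache:
            faults += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)  # needed furthest in the future
                del cache[victim]
        cache[block] = next_use[i]
    return faults

trace = [0, 1, 2, 3] * 5  # cyclic scan over 4 blocks; the cache holds 3
print(lru_faults(trace, 3))  # 20: LRU always evicts the block needed next
print(min_faults(trace, 3))  # 9
```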

  19. Cache-Oblivious Results
     Scanning ⇒ stack, queue, selection, . . .
     Matrix multiplication, FFT: FOCS’99
     Sorting: FOCS’99, ICALP’02, ALENEX’04
     Search trees: Prokop 99, FOCS’00, WAE’01, SODA’02 × 2, ESA’02, FOCS’03
     Priority queues: STOC’02, ISAAC’02
     Graph algorithms: STOC’02, BRICS-04-2
     Computational geometry: 2 × ICALP’02, SCG’03
     Scanning dynamic sets: ESA’02
     Power of cache-obliviousness: STOC’03

  21. Overview √ The memory hierarchy √ The I/O-model √ The cache-oblivious model • Examples of cache-oblivious algorithms • Double for-loop (with applications) • Searching • Sorting • Theoretical limits of cache-obliviousness

  22. Double for-loop
     X, Y arrays of length n: for each i in X, for each j in Y, process the pair (X[i], Y[j]).
     I/O complexity: $n \cdot \frac{n}{B} = \frac{n^2}{B}$.
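The $n^2/B$ count for the plain double loop can be checked with a small simulation. This Python sketch is illustrative and makes three assumptions: X occupies addresses [0, n) and Y occupies [n, 2n) of one address space; an LRU cache of M words in blocks of B words stands in for the model's optimal replacement; and the class and function names are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Counts block transfers (I/Os) for a fully associative LRU cache of M words."""
    def __init__(self, M, B):
        self.B, self.capacity = B, M // B
        self.blocks = OrderedDict()
        self.ios = 0

    def access(self, addr):
        block = addr // self.B
        if block in self.blocks:
            self.blocks.move_to_end(block)
        else:
            self.ios += 1
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)
            self.blocks[block] = None

def plain_double_loop(n, cache):
    """for i in X: for j in Y: touch X[i] and Y[j]. Returns the number of pairs visited."""
    pairs = 0
    for i in range(n):
        for j in range(n):
            cache.access(i)      # X[i]
            cache.access(n + j)  # Y[j]
            pairs += 1           # the loop body would process (X[i], Y[j]) here
    return pairs

n, M, B = 256, 64, 8
cache = LRUCache(M, B)
pairs = plain_double_loop(n, cache)
print(pairs, cache.ios)  # 65536 8224, i.e. n*n/B + n/B, since Y's n/B blocks do not fit
```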

  23. Double for-loop
     More efficient version in the I/O-model: load a chunk of M elements of X into memory, then scan Y once for the whole chunk.
     I/O complexity: $\frac{n}{M} \cdot \frac{n}{B} = \frac{n^2}{MB}$.
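A sketch of the blocked version, using the same simulation assumptions (X at addresses [0, n), Y at [n, 2n), an LRU cache; hypothetical names). One deviation from the slide is deliberate: the chunk size is M/2 rather than M, so that the chunk of X and the current block of Y fit in an LRU cache together. The count then becomes $(n/(M/2))(n/B) + n/B$.

```python
from collections import OrderedDict

class LRUCache:
    """Counts block transfers (I/Os) for a fully associative LRU cache of M words."""
    def __init__(self, M, B):
        self.B, self.capacity = B, M // B
        self.blocks = OrderedDict()
        self.ios = 0

    def access(self, addr):
        block = addr // self.B
        if block in self.blocks:
            self.blocks.move_to_end(block)
        else:
            self.ios += 1
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)
            self.blocks[block] = None

def blocked_double_loop(n, chunk, cache):
    """Process X in chunks of `chunk` elements; scan Y once per chunk."""
    pairs = 0
    for start in range(0, n, chunk):
        for j in range(n):
            for i in range(start, min(start + chunk, n)):
                cache.access(i)      # X[i], from the chunk kept in memory
                cache.access(n + j)  # Y[j]
                pairs += 1
    return pairs

n, M, B = 256, 64, 8
cache = LRUCache(M, B)
pairs = blocked_double_loop(n, M // 2, cache)  # half of memory holds the X chunk
print(pairs, cache.ios)  # 65536 288
```

Compared with the plain loop on the same parameters, the I/O count drops by roughly a factor of M/2. Note that the chunk size has to be chosen from M, which makes this version cache-aware.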

  24. Double for-loop
     Cache-oblivious version: split X and Y into halves of size n/2 and recurse on the four pairs of halves.
     I/O complexity: again $\frac{n^2}{MB}$.
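A sketch of the recursive version under the same simulation assumptions (X at [0, n), Y at [n, 2n), an LRU cache, hypothetical names). It additionally assumes n is a power of two. Neither M nor B appears in the recursion itself. With n = 256, M = 64 and B = 8, every subproblem of size 32 touches exactly 8 blocks, which equals the cache capacity. Each of its blocks is therefore fetched at most once, so the total is at most $(256/32)^2 \cdot 8 = 512$ I/Os.

```python
from collections import OrderedDict

class LRUCache:
    """Counts block transfers (I/Os) for a fully associative LRU cache of M words."""
    def __init__(self, M, B):
        self.B, self.capacity = B, M // B
        self.blocks = OrderedDict()
        self.ios = 0

    def access(self, addr):
        block = addr // self.B
        if block in self.blocks:
            self.blocks.move_to_end(block)
        else:
            self.ios += 1
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)
            self.blocks[block] = None

def recursive_double_loop(x0, y0, size, n, cache):
    """Visit all pairs of X[x0:x0+size] x Y[y0:y0+size] by recursing on halves."""
    if size == 1:
        cache.access(x0)      # X[x0]
        cache.access(n + y0)  # Y[y0]
        return 1
    half = size // 2
    pairs = 0
    for dx in (0, half):      # four subproblems: (X half, Y half) pairs
        for dy in (0, half):
            pairs += recursive_double_loop(x0 + dx, y0 + dy, half, n, cache)
    return pairs

n, M, B = 256, 64, 8  # n must be a power of two in this sketch
cache = LRUCache(M, B)
pairs = recursive_double_loop(0, 0, n, n, cache)
print(pairs, cache.ios)
```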

  25. Double for-loop: cache-oblivious version (code listing not recoverable from the extraction)

  26. Experiments: sizes within RAM (element size 4 bytes)
     Plot: time in seconds (log scale) versus log₂ of array size in bytes (15 to 21); curves: plain, cache-aware (L1), cache-aware (L2), cache-oblivious.
     Setup: 366 MHz Pentium II, 128 MB RAM, 256 KB cache, gcc -O3, Linux.

  27. Experiments: sizes exceeding RAM (element size 1 KB)
     Plot: time in seconds (log scale) versus log₂ of array size in bytes (19 to 27); curves: plain, cache-aware (L2), cache-aware (RAM), cache-oblivious.
     Setup: 366 MHz Pentium II, 128 MB RAM, 256 KB cache, gcc -O3, Linux.

  28. For-loop Applications: join in databases; dynamic programming (bioinformatics); matrix multiplication (scientific computing).
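Matrix multiplication fits the same recursive pattern. The following Python sketch is a divide-and-conquer multiply-add on quadrants; it illustrates the idea and is not code from the talk. It assumes square matrices stored as lists of lists, with size n a power of two. The cutoff `base`, below which it switches to plain loops, is a hypothetical tuning constant chosen independently of M and B.

```python
def matmul_add(a, b, c, n, ar=0, ac=0, br=0, bc=0, cr=0, cc=0, base=4):
    """Add the product of the n-by-n blocks of a and b into the n-by-n block of c.

    (ar, ac), (br, bc), (cr, cc) are the top-left corners of the three blocks.
    """
    if n <= base:
        # Small block: plain triple loop
        for i in range(n):
            for k in range(n):
                aik = a[ar + i][ac + k]
                for j in range(n):
                    c[cr + i][cc + j] += aik * b[br + k][bc + j]
        return
    h = n // 2
    for i in (0, h):
        for j in (0, h):
            for k in (0, h):
                # Quadrant C[i][j] += A[i][k] * B[k][j]
                matmul_add(a, b, c, h, ar + i, ac + k, br + k, bc + j, cr + i, cc + j, base)

n = 8
a = [[i + 2 * j for j in range(n)] for i in range(n)]
b = [[i - j for j in range(n)] for i in range(n)]
c = [[0] * n for _ in range(n)]
matmul_add(a, b, c, n)
```

As with the double for-loop, subproblems eventually become small enough to fit in any cache level, without the code knowing M or B.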

  29. Overview √ The memory hierarchy √ The I/O-model √ The cache-oblivious model • Examples of cache-oblivious algorithms √ Double for-loop (with applications) • Searching • Sorting • Theoretical limits of cache-obliviousness

  30. Static Cache-Oblivious Trees
     Recursive memory layout (van Emde Boas layout), Prokop 1999: a binary tree of height $h$ is split into a top tree $A$ of height $\lfloor h/2 \rfloor$ and bottom trees $B_1, \ldots, B_k$ of height $\lceil h/2 \rceil$. The tree is stored as $A, B_1, \ldots, B_k$, each laid out recursively.
     Searches use $O(\log_B N)$ I/Os.
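The layout order can be computed directly. The following Python sketch makes two assumptions: the tree is a complete binary tree with nodes numbered in BFS order (root 1, children 2v and 2v+1), and heights are split as on the slide (top ⌊h/2⌋, bottoms ⌈h/2⌉). The dictionary `pos`, which maps nodes to array slots, is a simplification for readability. The function names are hypothetical.

```python
def veb_order(root, height):
    """BFS node ids of the subtree at `root` (of the given height) in van Emde Boas order."""
    if height == 1:
        return [root]
    top_h = height // 2        # top tree A has height floor(h/2)
    bottom_h = height - top_h  # each bottom tree B_i has height ceil(h/2)
    order = veb_order(root, top_h)
    first = root << top_h      # leftmost descendant of root at depth top_h
    for r in range(first, first + (1 << top_h)):
        order += veb_order(r, bottom_h)  # B_1, ..., B_k stored one after another
    return order

def build(sorted_keys, height):
    """Store a search tree over sorted_keys in van Emde Boas layout; returns (layout, pos)."""
    n = (1 << height) - 1
    assert len(sorted_keys) == n
    key_of = {}
    keys = iter(sorted_keys)
    def inorder(v):  # in-order assignment turns the complete tree into a search tree
        if v <= n:
            inorder(2 * v)
            key_of[v] = next(keys)
            inorder(2 * v + 1)
    inorder(1)
    order = veb_order(1, height)
    pos = {v: slot for slot, v in enumerate(order)}
    return [key_of[v] for v in order], pos

def search(layout, pos, key):
    """Ordinary root-to-leaf search; only the array positions follow the vEB layout."""
    v = 1
    while v in pos:
        k = layout[pos[v]]
        if key == k:
            return True
        v = 2 * v if key < k else 2 * v + 1
    return False

print(veb_order(1, 4))  # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
layout, pos = build(list(range(10, 160, 10)), 4)
```

The search path itself is unchanged; only the placement of nodes in memory differs. This is why the layout, not the search code, yields the $O(\log_B N)$ bound for every block size simultaneously.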
