cache oblivious sorting

Cache Oblivious Sorting. Gerth Stølting Brodal, University of Aarhus

Cache Oblivious Sorting. Gerth Stølting Brodal, University of Aarhus. Algorithms and Data Structures, Bertinoro, Forlì, Italy, June 22-28, 2003.


  1. Cache Oblivious Sorting. Gerth Stølting Brodal, University of Aarhus. Algorithms and Data Structures, Bertinoro, Forlì, Italy, June 22-28, 2003

  2. Foundation

  3. Outline of Talk • Cache oblivious model • Sorting problem • Binary and multiway merge-sort • Funnel-sort • Lower bound — tall cache assumption • Experimental results • Conclusions Gerth S. Brodal: Cache Oblivious Sorting 3

  5. Cache Oblivious Model (Frigo, Leiserson, Prokop, Ramachandran, FOCS’99) • Program in the RAM model • Analyze in the I/O model for arbitrary B and M [Figure: CPU, cache of size M, and memory; each I/O moves a block of size B between cache and memory] Advantages: • Optimal on arbitrary level ⇒ optimal on all levels • Portability [Figure: memory hierarchy CPU, L1, L2, RAM, Disk, with increasing access time and space]

  6. Sorting Problem • Input: array containing $x_1, \ldots, x_N$ • Output: array with $x_1, \ldots, x_N$ in sorted order • Elements can be compared and copied • Example: 3 4 8 2 8 4 0 4 4 6 ⇓ 0 2 3 4 4 4 4 6 8 8

  8. Binary Merge-Sort [Figure: merge tree with a merging step between consecutive rows; rows from top to bottom: 0 2 3 4 4 4 4 6 8 8 (output); 2 3 4 8 8 0 4 4 4 6; 3 4 2 8 8 0 4 4 4 6; 2 8 0 4; 3 4 8 2 8 4 0 4 4 6 (input)] • Recursive; two arrays; size $O(M)$ internally in cache • $O(N \log N)$ comparisons • $O\left(\frac{N}{B} \log_2 \frac{N}{M}\right)$ I/Os
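
The recursive scheme on this slide can be sketched in a few lines of Python. This is an illustrative in-memory version only: the function names `merge` and `merge_sort` are chosen here, and the sketch makes no attempt to model blocks, cache size, or I/Os.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the left list on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # At most one of the two lists still has elements left.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(a):
    """Binary merge-sort: split in half, sort each half, merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))


# The example input from the slides.
print(merge_sort([3, 4, 8, 2, 8, 4, 0, 4, 4, 6]))
# prints [0, 2, 3, 4, 4, 4, 4, 6, 8, 8]
```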

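  9. Degree and I/O bounds of Merge-Sort and Funnel-Sort

      Algorithm      Degree                               I/O
      Merge-Sort     2                                    $O\left(\frac{N}{B} \log_2 \frac{N}{M}\right)$
      Merge-Sort     $d$  ($d \le \frac{M}{B} - 1$)       $O\left(\frac{N}{B} \log_d \frac{N}{M}\right)$
      Merge-Sort     $\Theta\left(\frac{M}{B}\right)$     $O\left(\frac{N}{B} \log_{M/B} \frac{N}{M}\right) = O(\mathrm{Sort}_{M,B}(N))$  (Aggarwal and Vitter 1988)
      Funnel-Sort    2                                    $O\left(\frac{1}{\varepsilon} \mathrm{Sort}_{M,B}(N)\right)$  ($M \ge B^{1+\varepsilon}$)  (Frigo, Leiserson, Prokop and Ramachandran 1999; Brodal and Fagerberg 2002)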
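
To get a feel for how the merge degree changes the bound, the expression $\frac{N}{B} \log_d \frac{N}{M}$ can be evaluated directly. The sketch below drops the constants hidden by the O-notation, so the numbers are only relative; the function name `merge_sort_ios` and the parameter values are assumptions picked for illustration, and the degree `M // B` stands in for $\Theta(M/B)$.

```python
import math


def merge_sort_ios(n, m, b, d):
    """Evaluate (N/B) * log_d(N/M), the d-way merge-sort bound without constants."""
    return (n / b) * (math.log2(n / m) / math.log2(d))


# Hypothetical parameters, counted in elements: N = 2^30, M = 2^20, B = 2^10.
n, m, b = 2**30, 2**20, 2**10

binary = merge_sort_ios(n, m, b, 2)          # degree 2
multiway = merge_sort_ios(n, m, b, m // b)   # degree M/B, stand-in for Theta(M/B)

print(f"binary:   {binary:.0f}")
print(f"multiway: {multiway:.0f}")
print(f"ratio:    {binary / multiway:.1f}")
```

With these parameters the ratio equals $\log_2(M/B)$, which is the factor saved by merging with degree $M/B$ instead of degree 2.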

  11. Funnel-Sort

  14. k-merger (Frigo et al., FOCS’99) • Merges k sorted input streams into one sorted output stream • Recursive definition: a top merger $M_0$ reads from buffers $B_1, \ldots, B_{\sqrt{k}}$, each of size $k^{3/2}$, which are filled by bottom mergers $M_1, \ldots, M_{\sqrt{k}}$; all of these are $k^{1/2}$-mergers • Recursive layout: $M_0, B_1, M_1, B_2, M_2, \ldots, B_{\sqrt{k}}, M_{\sqrt{k}}$

  17. Lazy k-merger (Brodal and Fagerberg 2002) [Figure: top merger $M_0$ reading from buffers $B_1, \ldots, B_{\sqrt{k}}$, which are filled by bottom mergers $M_1, \ldots, M_{\sqrt{k}}$]

      Procedure Fill(v)
        while out-buffer not full
          if left in-buffer empty
            Fill(left child)
          if right in-buffer empty
            Fill(right child)
          perform one merge step

      Lemma: If $M \ge B^2$ and the output buffer has size $k^3$, then $O\left(\frac{k^3}{B} \log_M(k^3) + k\right)$ I/Os are done during an invocation of Fill(root).
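
The Fill procedure can be made concrete with a small Python sketch of a binary merger tree. Several things here are assumptions made for illustration and are not taken from the slides: the class names `Leaf` and `Merger`, the helper `merge_all`, the use of one uniform buffer capacity for every node (the real k-merger sizes its buffers recursively), and the `exhausted` flag, which handles input streams running dry, a case the slide's pseudocode leaves implicit.

```python
from collections import deque


class Leaf:
    """A sorted input stream; its buffer holds the whole input."""

    def __init__(self, items):
        self.buf = deque(items)
        self.exhausted = True  # nothing can ever be added to a leaf

    def fill(self):
        pass


class Merger:
    """A binary merger node with a bounded output buffer."""

    def __init__(self, left, right, capacity):
        self.left, self.right = left, right
        self.capacity = capacity
        self.buf = deque()
        self.exhausted = False

    def fill(self):
        while len(self.buf) < self.capacity:       # out-buffer not full
            for child in (self.left, self.right):
                if not child.buf and not child.exhausted:
                    child.fill()                   # in-buffer empty: refill it
            l, r = self.left.buf, self.right.buf
            if not l and not r:                    # both inputs drained for good
                self.exhausted = True
                return
            # Perform one merge step.
            if not r or (l and l[0] <= r[0]):
                self.buf.append(l.popleft())
            else:
                self.buf.append(r.popleft())


def build(streams, capacity):
    """Build a balanced binary merger tree over the given sorted streams."""
    if len(streams) == 1:
        return Leaf(streams[0])
    mid = len(streams) // 2
    return Merger(build(streams[:mid], capacity),
                  build(streams[mid:], capacity), capacity)


def merge_all(streams, capacity=4):
    """Merge sorted streams by repeatedly calling fill on the root."""
    if not streams:
        return []
    root = build(streams, capacity)
    out = []
    while True:
        root.fill()
        out.extend(root.buf)
        root.buf.clear()
        if root.exhausted:
            return out


print(merge_all([[2, 3, 4, 8, 8], [0, 4, 4, 4, 6]]))
# prints [0, 2, 3, 4, 4, 4, 4, 6, 8, 8]
```

The merger is lazy in the sense that a child is only asked for more elements when its buffer has run empty, so work is pulled on demand from the root.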

  19. Funnel-Sort (Brodal and Fagerberg 2002; Frigo, Leiserson, Prokop and Ramachandran 1999) • Divide input into $N^{1/3}$ segments of size $N^{2/3}$ • Recursively MergeSort each segment • Merge sorted segments by an $N^{1/3}$-merger • Merger sizes $k$ over the recursion levels: $N^{1/3}, N^{2/9}, N^{4/27}, \ldots, 2$ • Theorem: Funnel-Sort performs $O(\mathrm{Sort}_{M,B}(N))$ I/Os for $M \ge B^2$
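
The recursion structure of Funnel-Sort can be sketched as follows. This shows only the divide step into roughly $N^{1/3}$ segments of size roughly $N^{2/3}$; the final merge uses Python's `heapq.merge` as a stand-in for the $N^{1/3}$-merger, so the sketch has none of the cache behaviour of the real algorithm. The name `funnel_sort` and the cutoff `BASE_CASE` are assumptions chosen for this example.

```python
import heapq

BASE_CASE = 8  # hypothetical cutoff below which the input is sorted directly


def funnel_sort(a):
    """Recursion skeleton of Funnel-Sort with a stand-in merger."""
    n = len(a)
    if n <= BASE_CASE:
        return sorted(a)
    # Segment size about N^(2/3), which gives about N^(1/3) segments.
    seg = round(n ** (2 / 3))
    segments = [funnel_sort(a[i:i + seg]) for i in range(0, n, seg)]
    # Stand-in for the N^(1/3)-merger (not a funnel).
    return list(heapq.merge(*segments))


print(funnel_sort([3, 4, 8, 2, 8, 4, 0, 4, 4, 6]))
# prints [0, 2, 3, 4, 4, 4, 4, 6, 8, 8]
```

A segment of size $N^{2/3}$ is itself merged by an $(N^{2/3})^{1/3} = N^{2/9}$-merger one level down, which is where the sequence of merger sizes on the slide comes from.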

  21. Lower Bound (Brodal and Fagerberg 2003)

                     Block Size    Memory    I/Os
      Machine 1      $B_1$         $M$       $t_1$
      Machine 2      $B_2$         $M$       $t_2$

      One algorithm, two machines, $B_1 \le B_2$

      Trade-off: $8 t_1 B_1 + 3 t_1 B_1 \log \frac{8 M t_2}{t_1 B_1} \ge N \log \frac{N}{M} - 1.45 N$
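
The trade-off can be explored numerically by evaluating both sides for candidate I/O counts. In the sketch below, the logarithms are assumed to be base 2 (the slide does not state the base), and the function names and parameter values are hypothetical; a pair $(t_1, t_2)$ for which the left side falls below the right side cannot be achieved by any single algorithm.

```python
import math


def tradeoff_lhs(t1, b1, t2, m):
    """Left side: 8*t1*B1 + 3*t1*B1*log(8*M*t2 / (t1*B1)), log base 2 assumed."""
    x = t1 * b1
    return 8 * x + 3 * x * math.log2(8 * m * t2 / x)


def tradeoff_rhs(n, m):
    """Right side: N*log(N/M) - 1.45*N, log base 2 assumed."""
    return n * math.log2(n / m) - 1.45 * n


def is_ruled_out(n, m, b1, t1, t2):
    """True if the pair (t1, t2) violates the trade-off inequality."""
    return tradeoff_lhs(t1, b1, t2, m) < tradeoff_rhs(n, m)


# Hypothetical setting: N = 2^30, M = 2^20, B1 = 1, and t1 = N/4 I/Os on machine 1.
n, m, b1, t1 = 2**30, 2**20, 1, 2**28
print(is_ruled_out(n, m, b1, t1, t2=2**10))  # very few I/Os on machine 2
print(is_ruled_out(n, m, b1, t1, t2=2**20))  # many more I/Os on machine 2
```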

  22. Lower Bound

      Algorithm            Assumption                    I/Os
      Lazy Funnel-sort     $B \le M^{1-\varepsilon}$     (a) $B_2 = M^{1-\varepsilon}$: $\mathrm{Sort}_{B_2,M}(N)$
                                                         (b) $B_1 = 1$: $\mathrm{Sort}_{B_1,M}(N) \cdot \frac{1}{\varepsilon}$
      Binary Merge-sort    $B \le M/2$                   (a) $B_2 = M/2$: $\mathrm{Sort}_{B_2,M}(N)$
                                                         (b) $B_1 = 1$: $\mathrm{Sort}_{B_1,M}(N) \cdot \log M$

      Corollary: (a) ⇒ (b)

  24. Fake Proof • Goal: $8 t_1 B_1 + 3 t_1 B_1 \log \frac{8 M t_2}{t_1 B_1} \ge N \log \frac{N}{M} - 1.45 N$ • Merging sorted lists $X$ and $Y$ takes $\approx |X| \log \frac{|Y|}{|X|}$ comparisons • In total $t_1 B_1$ elements touched ⇒ $t_1 B_1 / t_2$ elements touched on average per $B_2$-I/O ⇒ effective $B_2$ is $t_1 B_1 / t_2$ • Comparisons gained per $B_2$-I/O: $\frac{t_1 B_1}{t_2} \cdot \log \frac{M}{t_1 B_1 / t_2}$ • Hence: $t_1 B_1 \cdot \log \frac{M t_2}{t_1 B_1} \ge N \log N - 1.45 N$ • One problem: online choice
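
The step from the per-I/O gain to the final inequality of the fake proof can be spelled out as below. This is only a sketch of the heuristic argument: it multiplies the average gain by the number $t_2$ of $B_2$-I/Os, and it assumes base-2 logarithms, for which $N \log N - 1.45 N$ is a lower estimate of $\log N!$, the number of comparisons needed to sort.

```latex
\begin{align*}
\text{comparisons gained in total}
  &\approx t_2 \cdot \frac{t_1 B_1}{t_2} \cdot \log \frac{M}{t_1 B_1 / t_2} \\
  &= t_1 B_1 \cdot \log \frac{M t_2}{t_1 B_1} \\
  &\ge N \log N - 1.45 N .
\end{align*}
```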

  25. Ideas from Real Proof [Figure: decision tree T over the array A, with nodes for the operations I/O$_1$[s, t], I/O$_2$[s, t], comparisons A[i] ≤ A[j], and assignments A[i] ← A[j], and with the answers at the leaves] $8 t_1 B_1 + 3 t_1 B_1 \log \frac{8 M t_2}{t_1 B_1} \ge \text{height} \ge N \log \frac{N}{M} - 1.45 N$

  27. Hardware

      Processor type             Pentium 4                         Pentium 3                         MIPS 10000
      Workstation                Dell PC                           Delta PC                          SGI Octane
      Operating system           GNU/Linux Kernel version 2.4.18   GNU/Linux Kernel version 2.4.18   IRIX version 6.5
      Clock rate                 2400 MHz                          800 MHz                           175 MHz
      Address space              32 bit                            32 bit                            64 bit
      Integer pipeline stages    20                                12                                6
      L1 data cache size         8 KB                              16 KB                             32 KB
      L1 line size               128 Bytes                         32 Bytes                          32 Bytes
      L1 associativity           4 way                             4 way                             2 way
      L2 cache size              512 KB                            256 KB                            1024 KB
      L2 line size               128 Bytes                         32 Bytes                          32 Bytes
      L2 associativity           8 way                             4 way                             2 way
      TLB entries                128                               64                                64
      TLB associativity          Full                              4 way                             64 way
      TLB miss handler           Hardware                          Hardware                          Software
      Main memory                512 MB                            256 MB                            128 MB

  28. Wall Clock (Pentium 4, 512/512) [Plot: wall clock time per element (0.1µs to 100.0µs) against number of elements (1,000,000 to 1,000,000,000) for ffunnelsort, funnelsort, lowscosa, stdsort, ami_sort, msort-c, msort-m] Kristoffer Vinther 2003

  29. Page Faults (Pentium 4, 512/512) [Plot: page faults per block of elements (0.0 to 30.0) against number of elements (1,000,000 to 1,000,000,000) for ffunnelsort, funnelsort, lowscosa, stdsort, msort-c, msort-m] Kristoffer Vinther 2003

  30. Cache Misses (MIPS 10000, 1024/128) [Plot: L2 cache misses per lines of elements (0.0 to 30.0) against number of elements (100,000 to 1,000,000,000) for ffunnelsort, funnelsort, lowscosa, stdsort, msort-c, msort-m] Kristoffer Vinther 2003

  31. TLB Misses (MIPS 10000, 1024/128) [Plot: TLB misses per block of elements (1.0 to 10.0) against number of elements (100,000 to 1,000,000,000) for ffunnelsort, funnelsort, lowscosa, stdsort, msort-c, msort-m] Kristoffer Vinther 2003
