In-Place Data Structures: Which Complexity Measures Do Matter?


  1. In-Place Data Structures: Which Complexity Measures Do Matter?
     Jyrki Katajainen (1, 2), Jingsen Chen (3), Stefan Edelkamp (4), Amr Elmasry (5), Max Stenmark (2)
     (1) Københavns Universitet, (2) Jyrki Katajainen and Company, (3) Luleå Tekniska Universitet, (4) Universität Bremen, (5) Alexandria University
     © Performance Engineering Laboratory. ARCO meeting at ITU, fall 2012.

  2. Model of computation
     Available
     • An infinite array a suitable for storing elements
     • O(1) other memory locations for storing elements
     • O(1) other variables (counters, indices, bit strings of length ⌈lg(1 + n)⌉)
     Requirement
     • If the data structure stores n elements, these elements must be kept in the first n locations of a.
     [Figure: array a holding n = 8 elements at indices 0 to 7, plus the workspace]

  3. Coverage
     In-place data structures
     • Binary heaps
     • Static search trees
     Complexity measures
     • Space utilization
     • # Element comparisons
     • # Element moves
     • # Cache misses
     • # Branch mispredictions
     • Running time
     What is important? Aha! The whole cycle
     [Figure: cycle with the stages design, analysis, experimentation, implementation]

  4. Binary heaps

         construct()
           for (i = parent(n − 1); i ≥ 0; −−i)
             siftdown(i)

         minimum()
           return a[0]

         insert(x)
           a[n] = x
           siftup(n)
           n += 1

         extract-min()
           min = a[0]
           n −= 1
           a[0] = a[n]
           siftdown(0)
           return min

         left-child(i)
           return 2i + 1

         right-child(i)
           return 2i + 2

         parent(i)
           return ⌊(i − 1) / 2⌋

     [Figure: a binary min-heap with n = 8 stored as a = 8 10 26 75 12 46 75 80 at indices 0 to 7]
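The pseudocode above can be turned into a small runnable sketch. The version below is one possible reading, not the authors' code: it assumes int elements, uses a std::vector in place of the infinite array a (so n is a.size()), and writes siftdown and siftup with swaps for brevity; all function names are chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Index arithmetic from the slide (0-based array).
inline std::size_t left_child(std::size_t i) { return 2 * i + 1; }
inline std::size_t right_child(std::size_t i) { return 2 * i + 2; }
inline std::size_t parent(std::size_t i) { return (i - 1) / 2; }  // requires i > 0

// Move a[i] down until neither child is smaller (min-heap order).
inline void siftdown(std::vector<int>& a, std::size_t i, std::size_t n) {
  while (left_child(i) < n) {
    std::size_t c = left_child(i);
    if (right_child(i) < n && a[right_child(i)] < a[c]) c = right_child(i);
    if (!(a[c] < a[i])) break;  // a[i] is no larger than its smaller child
    std::swap(a[i], a[c]);
    i = c;
  }
}

// Move a[i] up while it is smaller than its parent.
inline void siftup(std::vector<int>& a, std::size_t i) {
  while (i > 0 && a[i] < a[parent(i)]) {
    std::swap(a[i], a[parent(i)]);
    i = parent(i);
  }
}

// construct(): siftdown from the last internal node back to the root.
inline void construct(std::vector<int>& a) {
  std::size_t const n = a.size();
  if (n < 2) return;
  for (std::size_t i = parent(n - 1) + 1; i-- > 0;) siftdown(a, i, n);
}

inline void insert(std::vector<int>& a, int x) {
  a.push_back(x);  // a[n] = x; n += 1
  siftup(a, a.size() - 1);
}

inline int extract_min(std::vector<int>& a) {
  assert(!a.empty());
  int const min = a[0];
  a[0] = a.back();  // move the last element to the root
  a.pop_back();     // n -= 1
  siftdown(a, 0, a.size());
  return min;
}
```

Calling construct on the eight example keys in any order leaves the minimum, 8, in a[0]; repeated extract_min calls then return the keys in nondecreasing order.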

  5. Experimental setup
     Standard benchmark: construct a heap of size n
     Input data: all elements are of type int
     Repetitions: repeat each experiment r times, r = 2^26 / n
     Reported value: measurement result divided by r × n
     Processor: Intel® Core™ i5-2520M CPU @ 2.50GHz × 4
     Memory system: 12-way-associative L3 cache of 3 MB; cache lines of 64 B; main memory of 3.8 GB
     Operating system: Ubuntu 12.04 (Linux kernel 3.2.0-29-generic)
     Compiler: g++ (gcc version 4.6.3) with optimization -O3
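The repetition scheme (r = 2^26 / n runs, result divided by r × n) could be driven by a harness along the following lines. This is a sketch under assumptions the slide does not state: random int keys with a fixed seed, a fresh copy of the same input for every run, and timing with std::chrono::steady_clock around each run (which adds some clock overhead for small n). The names repetitions and ns_per_element, and the budget parameter, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// r = budget / n repetitions; the slide uses budget = 2^26.
inline std::size_t repetitions(std::size_t n,
                               std::size_t budget = std::size_t(1) << 26) {
  assert(n > 0);
  return std::max<std::size_t>(1, budget / n);  // at least one run (assumption)
}

// Times construct(first, beyond) on r copies of the same input and returns
// the total time divided by r * n, in nanoseconds.
template <typename constructor>
double ns_per_element(std::size_t n, constructor construct,
                      std::size_t budget = std::size_t(1) << 26) {
  std::size_t const r = repetitions(n, budget);
  std::mt19937 generator(2012);  // fixed seed; random keys are an assumption
  std::uniform_int_distribution<int> pick;
  std::vector<int> input(n);
  for (int& x : input) x = pick(generator);

  std::vector<int> work(n);
  std::chrono::steady_clock::duration total(0);
  for (std::size_t k = 0; k < r; ++k) {
    work = input;  // restoring the input is not timed
    auto const start = std::chrono::steady_clock::now();
    construct(work.begin(), work.end());
    total += std::chrono::steady_clock::now() - start;
  }
  double const ns = std::chrono::duration<double, std::nano>(total).count();
  return ns / (double(r) * double(n));
}
```

For example, passing a callable that forwards to std::make_heap with n = 1024 performs 65536 repetitions under the default budget.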

  6. Reduce # element comparisons

     | Inventor       | construct | insert    | extract-min     | Extra space |
     |----------------|-----------|-----------|-----------------|-------------|
     | Williams/Floyd | 2n        | ∼ lg n    | ∼ 2 lg n        | O(1) words  |
     | Gonnet & Munro | 1.625n    |           |                 | Θ(n) words  |
     | Gonnet & Munro |           | ∼ lg lg n | ∼ lg n + log* n | O(1) words  |
     | Lower bounds   | ∼ 1.37n   | Ω(1)      | ∼ lg n          | Ω(1) words  |

     construct: Use a binomial tree in the construction
     insert: Binary search on the siftup path
     extract-min: lg n − lg lg n levels down along the siftdown path, siftup or recur further down
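The insert idea (binary search on the siftup path) can be sketched as follows. This is an illustrative version, not the authors' implementation: it assumes int keys in a std::vector, a max-heap (largest key at the root, as std::make_heap builds), and 1-based node numbers where the parent of node j is node j / 2. Because the keys on the path from the root to the new leaf are nonincreasing, the position of x on that path can be found with about lg lg n comparisons; the elements below that position are then shifted down one level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// v holds a max-heap; node j (1-based) is stored in v[j - 1].
inline void insert_binary_search(std::vector<int>& v, int x) {
  v.push_back(x);
  std::size_t const leaf = v.size();  // 1-based number of the new last node
  int depth = 0;                      // number of edges from the root to leaf
  for (std::size_t t = leaf; t > 1; t >>= 1) ++depth;

  // The ancestor of leaf at depth t is node leaf >> (depth - t);
  // node_at returns its 0-based slot in v.
  auto node_at = [&](int t) { return (leaf >> (depth - t)) - 1; };

  // "key < x" is false for a prefix of the depths 0..depth-1 and true
  // afterwards, so binary search finds the first depth whose key is smaller
  // than x (or depth itself if there is none).
  int lo = 0, hi = depth;
  while (lo < hi) {
    int const mid = lo + (hi - lo) / 2;
    if (v[node_at(mid)] < x) hi = mid; else lo = mid + 1;
  }

  // Shift the path elements at depths lo..depth-1 one level down, place x.
  for (int t = depth; t > lo; --t) v[node_at(t)] = v[node_at(t - 1)];
  v[node_at(lo)] = x;
}
```

The number of element moves is unchanged compared with an ordinary siftup; only the comparisons are reduced.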

  7. Floyd’s heap-construction program [Floyd 1964]

         #include <iterator>  // std::iterator_traits

         // Sift a[i] down in the 1-based heap a[1..n]; the children of node i are 2i and 2i + 1.
         template <typename position, typename index, typename comparator>
         void siftdown(position a, index i, index n, comparator less) {
           typedef typename std::iterator_traits<position>::value_type element;
           element copy = a[i];
         loop:
           index j = 2 * i;
           if (j <= n) {
             if (j < n)
               if (less(a[j], a[j + 1]))
                 j = j + 1;  // choose the larger child
             if (less(copy, a[j])) {
               a[i] = a[j];
               i = j;
               goto loop;
             }
           }
           a[i] = copy;
         }

         template <typename position, typename comparator>
         void make_heap(position first, position beyond, comparator less) {
           typedef typename std::iterator_traits<position>::difference_type index;
           position const a = first - 1;  // 1-based view: a[1] is *first
           index const n = beyond - first;
           for (index i = n / 2; i > 0; --i)
             siftdown(a, i, n, less);
         }

     [Figure: the example heap with n = 8 and a = 8 10 26 75 12 46 75 80]

  8. Remove an easy-to-predict if
     opt 1: Make sure that siftdown is always called with an odd n, so that the test if (j < n) ... can be removed

     Before:

         for (index i = n / 2; i > 0; --i)
           siftdown(a, i, n, less);

     After:

         template <typename position, typename index, typename comparator>
         void siftup(position a, index j, comparator less) {
           ...
         }

         index const m = (n & 1) ? n : n - 1;
         for (index i = m / 2; i > 0; --i)
           siftdown(a, i, m, less);
         siftup(a, n, less);

     Construction time [ns]

     | n    | F   | F1  |
     |------|-----|-----|
     | 2^10 | 7.5 | 7.1 |
     | 2^15 | 7.4 | 7.0 |
     | 2^20 | 8.2 | 7.9 |
     | 2^25 | 8.9 | 8.4 |
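A complete version of opt 1 might look as follows. This is a sketch under assumptions: the body of siftup is elided on the slide, so the one given here is a plain textbook siftup; one_based is a hypothetical helper that provides the 1-based view a[1..n] without forming the pointer first - 1; and the names siftdown_odd and make_heap_opt1 are chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper (not on the slide): 1-based view of a range.
template <typename position>
struct one_based {
  position first;
  typename std::iterator_traits<position>::reference
  operator[](std::ptrdiff_t i) const { return first[i - 1]; }
};

// siftdown for odd n only: every node with a left child j also has a right
// child j + 1, so the test "if (j < n)" is not needed.
template <typename array, typename comparator>
void siftdown_odd(array a, std::ptrdiff_t i, std::ptrdiff_t n, comparator less) {
  assert(n % 2 == 1);
  auto copy = a[i];
  for (std::ptrdiff_t j = 2 * i; j <= n; j = 2 * i) {
    if (less(a[j], a[j + 1])) j = j + 1;  // choose the larger child
    if (!less(copy, a[j])) break;
    a[i] = a[j];
    i = j;
  }
  a[i] = copy;
}

// Textbook siftup (assumed body): move a[j] up while its parent is smaller.
template <typename array, typename comparator>
void siftup(array a, std::ptrdiff_t j, comparator less) {
  auto copy = a[j];
  while (j > 1 && less(a[j / 2], copy)) {
    a[j] = a[j / 2];
    j = j / 2;
  }
  a[j] = copy;
}

template <typename position, typename comparator>
void make_heap_opt1(position first, position beyond, comparator less) {
  std::ptrdiff_t const n = beyond - first;
  if (n < 2) return;
  one_based<position> a{first};
  std::ptrdiff_t const m = (n & 1) ? n : n - 1;  // largest odd size <= n
  for (std::ptrdiff_t i = m / 2; i > 0; --i) siftdown_odd(a, i, m, less);
  siftup(a, n, less);  // inserts a[n] when n is even; a no-op when n is odd
}
```

When n is even, the first m = n - 1 elements are heapified and the last element is inserted with one siftup; when n is odd, the final siftup costs a single comparison.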

  9. Remove a hard-to-predict if
     opt 2: Interpret the result of a comparison as an integer and use this value in normal index arithmetic

     Before:

         if (condition) {
           j = j + 1;
         }

     After:

         j = j + condition;

     Construction time [ns]

     | n    | F1  | F12 |
     |------|-----|-----|
     | 2^10 | 7.1 | 4.8 |
     | 2^15 | 7.0 | 4.9 |
     | 2^20 | 7.9 | 6.3 |
     | 2^25 | 8.4 | 7.2 |
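The two forms of the child selection can be put side by side as a small sketch. The function names are hypothetical, and the code assumes that less returns bool, which converts to 0 or 1 in the addition. Whether the compiler actually emits branch-free machine code for the second form depends on the compiler and its options.

```cpp
#include <cassert>
#include <cstddef>

// Branching form: whether the branch is taken depends on the data,
// so it is hard to predict.
template <typename array, typename comparator>
std::ptrdiff_t larger_child_branching(array const& a, std::ptrdiff_t j,
                                      comparator less) {
  if (less(a[j], a[j + 1])) {
    j = j + 1;
  }
  return j;
}

// opt 2: use the comparison result (false = 0, true = 1) in index arithmetic.
template <typename array, typename comparator>
std::ptrdiff_t larger_child_arithmetic(array const& a, std::ptrdiff_t j,
                                       comparator less) {
  return j + less(a[j], a[j + 1]);
}
```

Both functions return the same index for every input; only the way the result is computed differs.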

  10. Commercial break: Lean programs
      • A program has a constant number of unnested loops.
      • Each loop is branch-free, except the final conditional branch at the end.
      • A branch predictor is static: forward branches are not taken and backward branches are taken.
      • Each such program induces O(1) branch mispredictions in this model.

      Theorem. Let P be a program of length κ, measured in the number of assembly-language instructions. Assume that the running time of P is t(n) for an input of size n. There exists a program Q of length O(κ) that is equivalent to P, runs in O(κ t(n)) time for the same input as P, and induces O(1) branch mispredictions. [Elmasry, Katajainen 2012]

  11. Reduce # element moves
      opt 3: Do not make any element moves when the element at the root stays in its original location

      Before:

          element copy = a[i];

      After:

          element copy;
          index k = 2 * i;
          k = k + less(a[k], a[k + 1]);
          if (less(a[i], a[k])) {
            copy = a[i];
            a[i] = a[k];
          }
          else {
            return;
          }
          i = k;

      Aha! Loop unrolling

      Construction time [ns]

      | n    | F12 | F123 |
      |------|-----|------|
      | 2^10 | 4.8 | 4.3  |
      | 2^15 | 4.9 | 4.6  |
      | 2^20 | 6.3 | 5.9  |
      | 2^25 | 7.2 | 6.9  |

      Element moves

      | n    | F    | F123 |
      |------|------|------|
      | 2^10 | 1.73 | 1.52 |
      | 2^15 | 1.74 | 1.53 |
      | 2^20 | 1.74 | 1.53 |
      | 2^25 | 1.74 | 1.52 |
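A full siftdown built around the unrolled first iteration could look like this. It is a sketch, not the authors' code: it assumes that n is odd (as arranged by opt 1), combines the fragment with the arithmetic child selection of opt 2, adds a guard for leaves, and uses a hypothetical one_based helper for 1-based indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper (not on the slide): 1-based view of a range.
template <typename position>
struct one_based {
  position first;
  typename std::iterator_traits<position>::reference
  operator[](std::ptrdiff_t i) const { return first[i - 1]; }
};

// siftdown on a[1..n] for odd n with the first iteration unrolled: if the
// root of the subtree already dominates its larger child, nothing is moved.
template <typename array, typename comparator>
void siftdown_opt3(array a, std::ptrdiff_t i, std::ptrdiff_t n, comparator less) {
  assert(n % 2 == 1);
  std::ptrdiff_t k = 2 * i;
  if (k > n) return;                // guard (assumption): a leaf needs no work
  k = k + less(a[k], a[k + 1]);     // larger child, chosen without a branch
  if (!less(a[i], a[k])) return;    // root stays in place: zero element moves
  auto copy = a[i];                 // only now is the element copied
  a[i] = a[k];
  i = k;
  for (std::ptrdiff_t j = 2 * i; j <= n; j = 2 * i) {
    j = j + less(a[j], a[j + 1]);
    if (!less(copy, a[j])) break;
    a[i] = a[j];
    i = j;
  }
  a[i] = copy;
}
```

In the version on slide 7 an element that stays at the root is still copied out and written back, which costs two moves; here that case costs none.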

  12. Reduce # cache misses
      opt 4: Visit the nodes in reverse depth-first order instead of reverse breadth-first order [Bojesen et al. 2000]

      Before:

          for (index i = n / 2; i > 0; --i)
            siftdown(a, i, n, less);

      After:

          index j = n / 2;
          index const i = j / 2;
          while (j > i) {
            siftdown(a, j, n, less);
            index z = j;
            while ((z & 1) == 0) {
              z /= 2;
              siftdown(a, z, n, less);
            }
            --j;
          }

      Note: as transcribed, this loop handles every node after both of its subtrees when the nodes i + 1, ..., n / 2 lie on a single level, for example when n = 2^k − 1. For other sizes it can reach a node too early; with n = 8 the siftdown calls are made for the nodes 4, 2, 1, 3 in this order.

      Construction time [ns]

      | n    | F   | F123 | F1-4 |
      |------|-----|------|------|
      | 2^10 | 7.4 | 4.3  | 5.2  |
      | 2^15 | 7.4 | 4.6  | 5.1  |
      | 2^20 | 8.2 | 5.9  | 5.2  |
      | 2^25 | 8.7 | 6.9  | 5.1  |
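The reverse depth-first visiting order can also be expressed recursively: heapify the right subtree, then the left subtree, then sift down the root. The sketch below swaps the slide's iterative loop for this recursion (depth about lg n) so that it works for every n; it is an illustration, not the authors' code, and one_based, heapify_subtree and make_heap_depth_first are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper (not on the slide): 1-based view of a range.
template <typename position>
struct one_based {
  position first;
  typename std::iterator_traits<position>::reference
  operator[](std::ptrdiff_t i) const { return first[i - 1]; }
};

// Floyd-style siftdown on a[1..n]; works for any n.
template <typename array, typename comparator>
void siftdown(array a, std::ptrdiff_t i, std::ptrdiff_t n, comparator less) {
  auto copy = a[i];
  for (std::ptrdiff_t j = 2 * i; j <= n; j = 2 * i) {
    if (j < n && less(a[j], a[j + 1])) j = j + 1;  // choose the larger child
    if (!less(copy, a[j])) break;
    a[i] = a[j];
    i = j;
  }
  a[i] = copy;
}

// Reverse depth-first order: right subtree, left subtree, then the root.
template <typename array, typename comparator>
void heapify_subtree(array a, std::ptrdiff_t root, std::ptrdiff_t n,
                     comparator less) {
  if (2 * root > n) return;  // a leaf, or a position past the end
  heapify_subtree(a, 2 * root + 1, n, less);
  heapify_subtree(a, 2 * root, n, less);
  siftdown(a, root, n, less);
}

template <typename position, typename comparator>
void make_heap_depth_first(position first, position beyond, comparator less) {
  std::ptrdiff_t const n = beyond - first;
  one_based<position> a{first};
  heapify_subtree(a, 1, n, less);
}
```

Each siftdown starts right after the two subtrees below it have been processed, so the nodes it touches were accessed recently; this locality is what the optimization relies on.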

  13. Making the GM algorithm in-place
      [Figure: a top tree of size ∼ n / lg n above bottom trees of size ∼ lg n]
      1. Improve GM: O(n) words → O(n) bits
      2. Apply the improved algorithm for all bottom trees; keep the bits needed compactly in a word
      3. Use F’s siftdown approach for the top tree.

      Element comparisons: ∼ 2n → ∼ 1.625n
      Element moves: ∼ 2n → ∼ 2.125n
      Cache misses: ∼ (n lg B) / B → ∼ n / B, assuming that B lg n << M (B block size; M memory size)

      Construction time [ns]

      | n    | F   | GM  |
      |------|-----|-----|
      | 2^10 | 7.4 | 8.0 |
      | 2^15 | 7.4 | 7.7 |
      | 2^20 | 8.2 | 7.7 |
      | 2^25 | 8.7 | 7.7 |

  14. Heap construction: Summary

      Construction time [ns]

      | n    | std  | F   | F123 | F1-4 | GM  |
      |------|------|-----|------|------|-----|
      | 2^10 | 10.7 | 7.4 | 4.3  | 5.2  | 8.0 |
      | 2^15 | 10.4 | 7.4 | 4.6  | 5.1  | 7.7 |
      | 2^20 | 11.0 | 8.2 | 5.9  | 5.2  | 7.7 |
      | 2^25 | 11.5 | 8.7 | 6.9  | 5.1  | 7.7 |

      Instructions

      | n    | std  | F    | F123 | F1-4 | GM   |
      |------|------|------|------|------|------|
      | 2^20 | 35.5 | 20.8 | 13.4 | 16.2 | 42.9 |

      Element comparisons

      | n    | std/F | GM   |
      |------|-------|------|
      | 2^10 | 1.98  | 1.80 |
      | 2^15 | 1.99  | 1.66 |
      | 2^20 | 1.99  | 1.63 |
      | 2^25 | 2     | 1.63 |

      Element moves

      | n    | std  | F    | GM   |
      |------|------|------|------|
      | 2^10 | 3.99 | 1.99 | 2.15 |
      | 2^15 | 3.99 | 1.99 | 2.39 |
      | 2^20 | 4    | 1.99 | 2.38 |
      | 2^25 | 4    | 2    | 2.38 |

      Branches / mispredictions

      | n    | std         | F           | F123        | F1-4        | GM          |
      |------|-------------|-------------|-------------|-------------|-------------|
      | 2^10 | 5.39 / 0.96 | 4.53 / 0.81 | 2.17 / 0.27 | 2.42 / 0.47 | 3.60 / 0.66 |
      | 2^15 | 5.40 / 0.89 | 2.43 / 0.78 | 2.18 / 0.24 | 2.43 / 0.47 | 2.39 / 0.38 |
      | 2^20 | 5.41 / 0.89 | 4.57 / 0.78 | 2.18 / 0.24 | 2.43 / 0.47 | – / –       |
      | 2^25 | 5.41 / 0.89 | 4.56 / 0.78 | 2.18 / 0.24 | 2.43 / 0.47 | – / –       |

      I/Os / misses (per n/B)

      | n    | std/F       | F1-4        | GM          |
      |------|-------------|-------------|-------------|
      | 2^10 | 1.00 / 1.00 | 1.00 / 1.00 | 0.95 / 0.95 |
      | 2^15 | 5.66 / 1.00 | 1.03 / 1.00 | 1.03 / 1.00 |
      | 2^20 | 5.87 / 4.94 | 1.04 / 1.00 | – / –       |
      | 2^25 | 5.87 / 5.84 | 1.04 / 0.99 | – / –       |

  15. Static search trees

          construct()
            sort(a, a + n)

          is-member(x)
            i = 0
            k = n
            while i ≠ k
              if x < a[i]
                k = i
                i = left-child(i)
              else if a[i] < x
                i = right-child(i)
              else
                return yes
            return no

          left-child(i)
            return ...

          right-child(i)
            return ...

      [Figure: search tree over the sorted array a = 8 10 12 26 46 75 75 80 (n = 8, indices 0 to 7): root at index 4 (46); below it indices 2 (12) and 6 (75); then indices 1 (10), 3 (26), 5 (75), 7 (80); and index 0 (8)]
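The bodies of left-child and right-child are elided on the slide and are not reconstructed here. As a stand-in, the sketch below uses the simplest layout, a plain sorted array searched through midpoints (textbook binary search), and keeps the same three-way comparison structure as is-member; it assumes int keys in a std::vector, and the function names are chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// construct(): sorting is all the preprocessing this layout needs.
inline void construct(std::vector<int>& a) { std::sort(a.begin(), a.end()); }

// Assumed layout: the search tree is implicit in the midpoints of the
// current range a[lo..hi-1]; this is not the slide's child-index scheme.
inline bool is_member(std::vector<int> const& a, int x) {
  std::size_t lo = 0, hi = a.size();
  while (lo != hi) {
    std::size_t const i = lo + (hi - lo) / 2;
    if (x < a[i]) {
      hi = i;        // continue in the left part
    } else if (a[i] < x) {
      lo = i + 1;    // continue in the right part
    } else {
      return true;   // neither smaller nor larger: found
    }
  }
  return false;
}
```

No extra space is used beyond a constant number of indices, which matches the in-place model of slide 2.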
