Helsinki, 8 December 2003 Title: The current truth about heaps



SLIDE 1

Helsinki, 8 December 2003

Title: The current truth about heaps
Speaker: Jyrki Katajainen
Co-workers: Claus Jensen and Fabio Vitale

This talk is about the heaps we all love. I will explain how the heap functions are implemented in the CPH STL program library. The main contribution of the work done by my co-workers and myself is an experimental evaluation of various heap variants proposed in the computing literature. We have also done micro-benchmarking, which gives some directions for future research.

These slides are available at http://www.cphstl.dk/.

© Performance Engineering Laboratory

SLIDE 2

9th Scandinavian Workshop on Algorithm Theory

July 8–10, 2004
Louisiana Museum of Modern Art
Humlebæk, Denmark
http://swat.diku.dk/

Deadline for submission: February 10, 2004 at noon (GMT)
Notification of authors: March 23, 2004
Final version due: April 20, 2004
End of early registration: May 4, 2004

SLIDE 3

SLIDE 4

Heap functions in the STL

void push_heap(position A, position Z, ordering f);

Effect: [diagram: a heap followed by one new element becomes a single heap]

at most log2 n comparisons

void pop_heap(position A, position Z, ordering f);

Effect: [diagram: the top element of the heap is moved to the last position, and the remaining elements form a heap]

at most 2 log2 n comparisons

void make_heap(position A, position Z, ordering f);

Effect: [diagram: an arbitrary sequence becomes a heap]

at most 3n comparisons

void sort_heap(position A, position Z, ordering f);

Effect: [diagram: a heap becomes a sorted sequence]

at most n log2 n comparisons
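The four functions correspond to the heap algorithms in the standard header <algorithm>. The following is a minimal sketch, not CPH STL code: counting_less, push_comparisons, and heap_sort_counted are hypothetical helpers introduced here only to count how many element comparisons a call performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ordering that counts how often it is invoked.
struct counting_less {
    std::size_t* count;
    bool operator()(int a, int b) const { ++*count; return a < b; }
};

// Number of comparisons made by one push_heap() on a heap built from v.
std::size_t push_comparisons(std::vector<int> v, int x) {
    std::size_t c = 0;
    counting_less f{&c};
    std::make_heap(v.begin(), v.end(), f);
    v.push_back(x);                         // new element sits at Z - 1
    c = 0;                                  // count the push only
    std::push_heap(v.begin(), v.end(), f);
    const std::size_t used = c;
    assert(std::is_heap(v.begin(), v.end()));
    return used;
}

// Sorts a copy of v with make_heap() + sort_heap() and reports the
// total number of comparisons through the reference parameter.
std::vector<int> heap_sort_counted(std::vector<int> v, std::size_t& comparisons) {
    comparisons = 0;
    counting_less f{&comparisons};
    std::make_heap(v.begin(), v.end(), f);
    std::sort_heap(v.begin(), v.end(), f);
    return v;
}
```

Pushing onto a 15-element heap gives n = 16, so the bound above allows at most log2 16 = 4 comparisons for that call.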

SLIDE 5

How would you do it?

SLIDE 6

Jones 1986

Operation sequence (hold model):

push()^N [pop() push()]^K

e ← pop()
increase the priority of e by −ln(drand())
push(e)

Input data: element size: 4 B; #elements: 1–2^13.5
Environment: computer: VAX 11/780 running UNIX (BSD 4.2); cache: 8 kB; TLB: 64 entries; compiler: Berkeley Pascal with optimization enabled
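The hold model can be sketched as a small driver. This is an illustration under stated assumptions, not Jones' Pascal code: hold_model is a hypothetical name, std::priority_queue with std::greater stands in for the queue under test (smallest priority first), the initial priorities are drawn from the same exponential distribution, and std::mt19937 replaces drand().

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <queue>
#include <random>
#include <vector>

// Runs push()^N [pop() push()]^K and returns the K popped priorities.
std::vector<double> hold_model(std::size_t n, std::size_t k, unsigned seed) {
    std::vector<double> popped;
    if (n == 0) return popped;              // nothing to hold
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    // 1 - u lies in (0, 1], so the logarithm is always finite.
    auto increment = [&] { return -std::log(1.0 - uniform(gen)); };

    std::priority_queue<double, std::vector<double>, std::greater<double>> q;
    for (std::size_t i = 0; i < n; ++i) q.push(increment());

    for (std::size_t j = 0; j < k; ++j) {
        double e = q.top();                 // e <- pop()
        q.pop();
        popped.push_back(e);
        e += increment();                   // increase the priority of e
        q.push(e);                          // push(e)
    }
    return popped;
}
```

Because every increment is non-negative, the popped priorities form a non-decreasing sequence, which gives a cheap sanity check for any queue plugged into the driver.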

SLIDE 7

LaMarca & Ladner 1996

Operation sequence: Hold model?
#define NOTSORANDNUM(x) (x + RANDNUM())
Input data: element size: 8 B; #elements: 2^10–2^23
Environment: computer: DEC Alphastation 250; processor: Alpha 21064A 266 MHz; L1 cache: 8 kB; L2 cache: direct-mapped, 2 MB, 32 B per line; compiler?: cc

SLIDE 8

Sanders 1999

Operation sequence: [push() pop() push()]^N [pop() push() pop()]^N
Input data: element size: 4 B, drawn randomly; satellite data: 4 B; #elements: 2^8–2^23
Environment: computer: Pentium II 300 MHz; compiler: g++ -O6
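A rough sketch of a driver for this operation sequence follows. It is not Sanders' program: sanders_sequence and sequence_stats are hypothetical names, std::priority_queue (a max-queue) stands in for the queue under test, and the 4 B of satellite data are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <random>

struct sequence_stats {
    std::size_t max_size;    // largest queue size seen
    std::size_t final_size;  // queue size after the whole sequence
};

// Runs [push() pop() push()]^N [pop() push() pop()]^N on random keys.
sequence_stats sanders_sequence(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::priority_queue<unsigned> q;
    sequence_stats s{0, 0};
    auto push = [&] {
        q.push(static_cast<unsigned>(gen()));
        s.max_size = std::max(s.max_size, q.size());
    };
    auto pop = [&] { q.pop(); };

    // First phase: the queue grows by one element per round.
    for (std::size_t i = 0; i < n; ++i) { push(); pop(); push(); }
    // Second phase: the queue shrinks by one element per round.
    for (std::size_t i = 0; i < n; ++i) { pop(); push(); pop(); }

    s.final_size = q.size();
    return s;
}
```

The queue grows to N elements in the first phase and is empty again after the second, so both phases exercise every queue size between 0 and N.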

SLIDE 9

Brengel et al. 1999

Operation sequence:

push()^N / pop()^N

Input data: element size: 4 B, drawn randomly from [0 .. 10^7]; #elements: 1 · 10^6–200 · 10^6
Environment: computer: Sparc Ultra 1/143; main memory: 256 MB, 8 kB per page; local disk: 9 GB fast-wide SCSI; logical block size: 64 kB; buffer size: 16 MB

SLIDE 10

Edelkamp & Stiegeler 2002

Operation sequence:

make(N) [pop()]^N

Input data: element size: 4 B, floating point numbers drawn randomly; #elements: 10^6; ordering: f^0(x) = x and f^i(x) = ln(f^(i−1)(x+1)) for i > 0
Environment: computer: Pentium III 450 MHz; compiler: g++ -O2

SLIDE 11

How would you do it now?

SLIDE 12

Sanders’ programs on Pentium II

[Plot: Sanders' programs: [push()]^N [pop()]^N. x-axis: n (1000 to 1e+07); y-axis: execution time per element in nanoseconds (500 to 3000). Curves: 2-ary heap, 4-ary heap.]

SLIDE 13

Sanders’ programs on Pentium III

[Plot: Sanders' programs on Pentium III: [push()]^N [pop()]^N. x-axis: n (1000 to 1e+07); y-axis: execution time per element in nanoseconds (500 to 2500). Curves: 2-ary heap, 4-ary heap.]

SLIDE 14

Sanders’ programs on Pentium IV

[Plot: Sanders' programs on Pentium IV: [push()]^N [pop()]^N. x-axis: n (1000 to 1e+07); y-axis: execution time per element in nanoseconds (200 to 1600). Curves: 2-ary heap, 4-ary heap.]

SLIDE 15

Cost of unsigned int operations

initializations                  instruction               unsigned int
p ← 1,   a[i] ← 0, x ← 2^20      a[i] ← x                  n = 2^10 .. 2^24: 4.1–4.7 ns
p ← 617, a[i] ← 0, x ← 2^20      a[i] ← x                  n = 2^10 .. 2^14: 7.3–8.9 ns
                                                           n = 2^15: 12 ns
                                                           n = 2^16: 29 ns
                                                           n = 2^16 .. 2^22: 62–63 ns
p ← 1,   a[i] ← 0, x ← 2^20      x ← a[i]                  n = 2^10 .. 2^24: 3.3–3.8 ns
p ← 617, a[i] ← 0, x ← 2^20      x ← a[i]                  n = 2^10 .. 2^15: 3.3–4.1 ns
                                                           n = 2^16: 23 ns
                                                           n = 2^17 .. 2^22: 45–55 ns
p ← 1,   a[i] ← 0, x ← 2^20      r ← (a[i] < x)            n = 2^10 .. 2^24: 5.3–5.8 ns
p ← 1,   a[i] ← 0, x ← 2^20      r ← (ln(a[i]) < ln(x))    n = 2^10 .. 2^24: 580–610 ns
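A micro-benchmark of this kind might look like the sketch below. The slide does not say how i advances, so the code assumes that p is the stride between consecutive accesses, i ← (i + p) mod n; ns_per_store is a hypothetical name, and the timing loop is an illustration rather than the authors' harness.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Times the instruction a[i] <- x over n accesses and returns the
// average cost in nanoseconds. Assumption: i advances by the stride p
// modulo n, and p < n.
double ns_per_store(std::vector<unsigned>& a, std::size_t p, unsigned x) {
    const std::size_t n = a.size();
    if (n == 0) return 0.0;
    std::size_t i = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t k = 0; k < n; ++k) {
        a[i] = x;                // the measured instruction
        i += p;
        if (i >= n) i -= n;      // wrap around without a division
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(n);
}
```

Since 617 is odd, the stride is coprime to every power of two, so with p = 617 the loop touches each of the n = 2^k positions exactly once while jumping across cache lines. A serious measurement would repeat the loop many times and guard against the compiler optimizing the stores away.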

SLIDE 16

Cost of bigint operations

initializations                  instruction       bigint
p ← 1,   a[i] ← 0, x ← 2^20      a[i] ← x          n = 2^10 .. 2^21: 60–66 ns
                                                   n = 2^22: 290 ns
p ← 617, a[i] ← 0, x ← 2^20      a[i] ← x          n = 2^10 .. 2^12: 75–78 ns
                                                   n = 2^13: 117 ns
                                                   n = 2^14: 229 ns
                                                   n = 2^15 .. 2^20: 297–318 ns
                                                   n = 2^21 .. 2^22: 748–752 ns
p ← 1,   a[i] ← 0, x ← 2^20      x ← a[i]          n = 2^10 .. 2^22: 18–21 ns
p ← 617, a[i] ← 0, x ← 2^20      x ← a[i]          n = 2^10 .. 2^12: 24 ns
                                                   n = 2^13: 83 ns
                                                   n = 2^14: 180 ns
                                                   n = 2^15 .. 2^22: 230–260 ns
p ← 1,   a[i] ← 0, x ← 2^20      r ← (a[i] < x)    n = 2^10 .. 2^22: 13–16 ns

SLIDE 17

Other current research

Pointer-based methods: hopelessly slow → theoretical computer science
Methods with good amortized bounds: terrible worst case → not relevant for us
Methods with few element moves: bad cache behaviour → not good for us
External-memory methods: high constants → relevant only for very large data sets
Cache-oblivious methods: huge constants → theoretical computer science

SLIDE 18

Our policy-based framework

template <arity d, typename position, typename ordering>
class heap_policy {
public:
  typedef typename std::iterator_traits<position>::difference_type index;
  typedef typename std::iterator_traits<position>::difference_type level;
  typedef typename std::iterator_traits<position>::value_type element;

  template <typename integer>
  heap_policy(integer n = 0);

  bool is_root(index) const;
  bool is_first_child(index) const;
  index size() const;
  level depth(index) const;
  index root() const;
  index leftmost_leaf() const;
  index last_leaf() const;
  index first_child(index) const;
  index parent(index) const;
  index ancestor(index, level) const;
  index top_some_absent(position, index, const ordering&) const;
  index top_all_present(position, index, const ordering&) const;
  void update(position, index, const element&);
  void erase_last_leaf(position, const ordering&);
  void insert_new_leaf(position, const ordering&);

private:
  index n;
};
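To make the navigation members concrete, here is a minimal sketch of the index arithmetic such a policy could use for a d-ary heap stored in an array. It is not the CPH STL implementation: dary_index is a hypothetical name, and the layout (0-based indices, root at 0, children of i at d·i + 1 .. d·i + d) is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical index arithmetic for a d-ary heap stored in an array.
// Assumed layout: root at index 0, children of i at d*i + 1 .. d*i + d.
template <std::size_t d>
struct dary_index {
    static_assert(d >= 2, "a heap needs arity at least 2");

    static std::size_t root() { return 0; }
    static bool is_root(std::size_t i) { return i == 0; }
    static std::size_t first_child(std::size_t i) { return d * i + 1; }
    // Precondition: i is not the root.
    static std::size_t parent(std::size_t i) { return (i - 1) / d; }
    // First children are exactly the indices congruent to 1 modulo d.
    static bool is_first_child(std::size_t i) { return i % d == 1; }
    // Number of parent steps from i up to the root.
    static std::size_t depth(std::size_t i) {
        std::size_t level = 0;
        while (!is_root(i)) {
            i = parent(i);
            ++level;
        }
        return level;
    }
};
```

Keeping this arithmetic behind a policy means the same heap algorithms can be instantiated for 2-, 3-, and 4-ary heaps by changing only the template argument.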

SLIDE 19

Input data

                        cheap move                    expensive move
cheap comparison        unsigned int                  bigint
expensive comparison    unsigned int ln comparison    (int, bigint) ln comparison

SLIDE 20

One new old idea: local heaps

SLIDE 21

Our solution for sort_heap()

In-place mergesort by Katajainen, Pasanen, and Teuhola [1996]
Fine-tuning not yet implemented
Almost as fast as quicksort, see CPH STL Report 2003-2

SLIDE 22

Our solution for make_heap()

Depth-first heap construction by Bojesen, Katajainen, and Spork [2000]
Almost optimal in all respects
Other work: fewer element comparisons → theoretical computer science

SLIDE 23

Various approaches for pop_heap()

– top-down → many element comparisons
– bottom-up → typical case good
– move-saving bottom-up → theoretical computer science
– binary-search top-down
– two-levels-at-a-time top-down
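As an illustration of the bottom-up approach in the list above, the sketch below first lets a hole sink along the path of larger children to a leaf (one comparison per level) and only then sifts the displaced last element up from that leaf. It is a textbook-style sketch for binary max-heaps, not the CPH STL code, and bottom_up_pop_heap is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Bottom-up pop for a binary max-heap on [first, last): afterwards the
// old top is at last - 1 and [first, last - 1) is a heap again.
template <typename RandomIt, typename Less>
void bottom_up_pop_heap(RandomIt first, RandomIt last, Less less) {
    typedef typename std::iterator_traits<RandomIt>::difference_type index;
    typedef typename std::iterator_traits<RandomIt>::value_type element;
    const index n = last - first;
    if (n < 2) return;

    element top = std::move(first[0]);
    element x = std::move(first[n - 1]);     // element that must be re-inserted
    const index m = n - 1;                   // heap size after the pop

    // Phase 1: move the hole down to a leaf, always toward the larger child.
    index hole = 0;
    for (index child = 1; child < m; child = 2 * hole + 1) {
        if (child + 1 < m && less(first[child], first[child + 1])) ++child;
        first[hole] = std::move(first[child]);
        hole = child;
    }

    // Phase 2: sift x up from the leaf; typically only a few steps.
    while (hole > 0) {
        const index parent = (hole - 1) / 2;
        if (!less(first[parent], x)) break;
        first[hole] = std::move(first[parent]);
        hole = parent;
    }
    first[hole] = std::move(x);
    first[m] = std::move(top);
}
```

The typical case is good because the re-inserted element comes from the bottom of the heap and usually belongs near a leaf, so the second phase stops early.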

SLIDE 24

Various approaches for push_heap()

– move-saving top-down → slow
– bottom-up → typical case good
– bottom-up with buffering → complicated
– binary-search bottom-up
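One possible reading of the binary-search bottom-up approach is sketched below: the ancestors of the new leaf are already ordered along the path to the root, so the final position of the new element can be located by binary search over the levels, using about log2 log2 n comparisons. This is an assumption about what the slide means, not the CPH STL code; ancestor, depth, and binary_search_push_heap are hypothetical helpers for a binary max-heap with 0-based indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Ancestor of index i that lies k levels above it (k <= depth(i)).
inline std::size_t ancestor(std::size_t i, std::size_t k) {
    return ((i + 1) >> k) - 1;
}

// Depth of index i; the root has depth 0.
inline std::size_t depth(std::size_t i) {
    std::size_t level = 0;
    while (i > 0) {
        i = (i - 1) / 2;
        ++level;
    }
    return level;
}

// a[0 .. size - 1) is a max-heap and a.back() is the new element.
template <typename T, typename Less>
void binary_search_push_heap(std::vector<T>& a, Less less) {
    if (a.size() < 2) return;
    const std::size_t leaf = a.size() - 1;
    T x = std::move(a[leaf]);

    // Find the lowest level whose ancestor is not smaller than x;
    // hi == depth(leaf) + 1 means that x becomes the new root.
    std::size_t lo = 1, hi = depth(leaf) + 1;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (less(a[ancestor(leaf, mid)], x)) lo = mid + 1;
        else hi = mid;
    }

    // Shift the lo - 1 smaller ancestors down one level each.
    std::size_t hole = leaf;
    for (std::size_t k = 1; k < lo; ++k) {
        const std::size_t parent = (hole - 1) / 2;
        a[hole] = std::move(a[parent]);
        hole = parent;
    }
    a[hole] = std::move(x);
}
```

The number of element moves is the same as in the plain bottom-up approach; only the comparisons are reduced, which matters mainly when comparisons are expensive.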

SLIDE 25

Efficiency of 2-, 3-, 4-ary heaps

[Plot: Efficiency of various sorting functions for random integers. x-axis: n (1000 to 1e+07); y-axis: execution time per element in nanoseconds (200 to 1800). Curves: SGI::partial_sort(); bottom-up approach: 3-ary heap; bottom-up approach: 2-ary heap; bottom-up approach: 4-ary heap; SGI::sort().]

SLIDE 26

Efficiency of 2-, 3-, 4-ary heaps

[Plot: Efficiency of various sorting functions for random integers using ln comparison. x-axis: n (1000 to 1e+07); y-axis: execution time per element in nanoseconds (2000 to 16000). Curves: bottom-up approach: 4-ary heap; bottom-up approach: 3-ary heap; SGI::sort(); bottom-up approach: 2-ary heap; SGI::partial_sort().]

SLIDE 27

Efficiency of local heaps

[Plot: Efficiency of various sorting functions for random integers. x-axis: n (1000 to 1e+07); y-axis: execution time per element in nanoseconds (200 to 1800). Curves: SGI::partial_sort(); two-by-two top-down approach: 1-local heap, 5-local heap, 4-local heap, 3-local heap, 2-local heap; SGI::sort().]

SLIDE 28

Efficiency of local heaps

[Plot: Efficiency of various sorting functions for random integers using ln comparison. x-axis: n (1000 to 1e+07); y-axis: execution time per element in nanoseconds (2000 to 14000). Curves: two-by-two top-down approach: 1-local heap, 5-local heap, 4-local heap, 2-local heap, 3-local heap; SGI::sort(); SGI::partial_sort().]

SLIDE 29

Conclusions

– In 40 years: not much progress
– At the moment it is not clear how big the overhead of local heaps is for small problem sizes.
– Some combinations of various approaches have still to be tested.
– Code-tuning of the best approaches is still to be done.
– It takes time to develop fast library routines.
– How does technology influence the efficiency of the library routines?

SLIDE 30

Exercise of the week

How many element comparisons does the operation sequence [push() | pop()]^N incur in the worst case? Or what is the amortized complexity of each of these operations?

1.5N log2 N is an obvious upper bound and N log2 N an obvious lower bound. Recall that the operation sequence

make(N) [pop()]^N

requires about 1.5N log2 N element comparisons.
