Weak Heaps and Friends: Recent Developments



SLIDE 1

Performance Engineering Laboratory

IWOCA 2013: Rouen, France (1)

Weak Heaps and Friends: Recent Developments

Stefan Edelkamp (1), Amr Elmasry (2), Jyrki Katajainen (3, 4), Armin Weiß (5)

(1) University of Bremen
(2) Alexandria University
(3) University of Copenhagen
(4) Jyrki Katajainen and Company
(5) University of Stuttgart

These are Stefan’s slides for his invited talk at IWOCA 2013

SLIDE 2

(Complete) Weak Heap

SLIDE 3

Array-Based Representation

[Figure: a weak heap with 11 nodes drawn as a tree; each node shows its array index and its element]

Array a of elements, array r of bits (1’s in cyan in the figure)

index i:  0   1   2   3   4   5   6   7   8   9  10
a[i]:     8  12  10  47  49  53  46  75  80  26  27

For the node at index i: left child at index 2i + r_i, right child at index 2i + 1 − r_i, parent at index ⌊i/2⌋
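
The index arithmetic of this slide can be sketched in Python. The helper names are hypothetical, and the bit vector r used in the example is an assumption: the cyan colouring is not recoverable from the transcript, so r[4] = 1 is simply one setting under which the array from the slide satisfies the weak-heap ordering (every element is at least as large as the element at its distinguished ancestor).

```python
def left_child(i, r):
    # Index of the left child of node i; r[i] is the bit stored for node i.
    return 2 * i + r[i]


def right_child(i, r):
    # Index of the right child of node i.
    return 2 * i + 1 - r[i]


def distinguished_ancestor(j, r):
    # Climb while j is a left child; the parent reached from a right child
    # is the distinguished ancestor. Requires j >= 1 and r[0] == 0.
    while (j & 1) == r[j >> 1]:
        j >>= 1
    return j >> 1


def is_weak_heap(a, r):
    # Weak-heap ordering (min version): no element is smaller than the
    # element stored at its distinguished ancestor.
    return all(a[distinguished_ancestor(j, r)] <= a[j]
               for j in range(1, len(a)))


a = [8, 12, 10, 47, 49, 53, 46, 75, 80, 26, 27]   # array a from the slide
r = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]             # assumed bits
print(is_weak_heap(a, r))
```

With all bits set to 0 the same array is not a weak heap, because 26 at index 9 would then be the right child of 49 at index 4.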

SLIDE 4

Why Weak Heaps?

Number of element comparisons:

Data structure              construct  minimum  insert   extract-min
binary heap [Flo64,Wil64]   2n         0        ⌈lg n⌉   2⌈lg n⌉
weak heap [Dut93]           n − 1      0        ⌈lg n⌉   ⌈lg n⌉

Repeated insertions [IWOCA-12, JDA-13]

[Figure: number of element comparisons per n against n (logarithmic scale, 10^3 to 10^7) for the operation sequence insert^n; curves: binary heap, weak heap, weak queue]
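
As an illustration of the n − 1 bound for construct, here is a minimal sketch of the standard bottom-up weak-heap construction (join every node with its distinguished ancestor, from the last index down to 1). The function names and the comparison counter are hypothetical additions for this example.

```python
def distinguished_ancestor(j, r):
    # Parent reached after climbing up from j while j is a left child.
    while (j & 1) == r[j >> 1]:
        j >>= 1
    return j >> 1


def build_weak_heap(a):
    # Rearranges a in place; returns the bit array r and the number of
    # element comparisons performed (exactly len(a) - 1 for len(a) >= 1).
    n = len(a)
    r = [0] * n
    comparisons = 0
    for j in range(n - 1, 0, -1):
        i = distinguished_ancestor(j, r)
        comparisons += 1
        if a[j] < a[i]:
            # join: swap the two elements and exchange the subtrees of j
            a[i], a[j] = a[j], a[i]
            r[j] ^= 1
    return r, comparisons


def is_weak_heap(a, r):
    return all(a[distinguished_ancestor(j, r)] <= a[j]
               for j in range(1, len(a)))


data = [(i * 37) % 101 for i in range(100)]   # 100 distinct integers
bits, used = build_weak_heap(data)
print(used, data[0], is_weak_heap(data, bits))
```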

SLIDE 5

Structure of the Talk: Research Questions

What is the "best" heap-construction algorithm?
What is the "best" sorting algorithm?
What is the "best" priority queue?

SLIDE 6

What is the best in-place heap-construction algorithm?

Best ∼ in terms of element comparisons and practical running time
In-place ∼ Θ(1) extra words

SLIDE 7

Some Options

Element comparisons:

Inventor           Abbreviation   Worst     Average   Extra space
Floyd              alg. F         2n        ∼1.88n    Θ(1) words
Gonnet & Munro     alg. GM        ∼1.625n   ∼1.625n   Θ(n) words
McDiarmid & Reed   alg. MR        2n        ∼1.52n    Θ(n) bits
Li & Reed          lower bound    ∼1.37n    ∼1.37n    Ω(1) words

Average-case results assume that the input is a random permutation of n distinct elements
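
For reference, a minimal sketch of Floyd’s bottom-up construction (alg. F) with an explicit comparison counter; the function name and the counter are hypothetical. Each sift-down step costs at most two element comparisons, which is where the 2n worst-case figure comes from.

```python
def floyd_build_heap(a):
    # Turns a into a binary min-heap in place (0-based indexing) and
    # returns the number of element comparisons performed.
    n = len(a)
    comparisons = 0
    for start in range(n // 2 - 1, -1, -1):
        j = start
        while True:
            child = 2 * j + 1
            if child >= n:
                break
            if child + 1 < n:
                comparisons += 1            # pick the smaller child
                if a[child + 1] < a[child]:
                    child += 1
            comparisons += 1                # compare parent with that child
            if a[j] <= a[child]:
                break
            a[j], a[child] = a[child], a[j]
            j = child
    return comparisons


data = [(i * 37) % 101 for i in range(100)]
used = floyd_build_heap(data)
print(used, used / len(data))
```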

SLIDE 8

Building Binary Heaps

Weak heap: lower bound n − 1 (element comparisons)
Weak heap → binary heap: ∼0.625n [IWOCA-12, MFCS-12]
→ 1.625n heap construction, n bits (worst case)

Bottom trees:
→ 1.625n in-place heap construction (worst case)
[→ 1.52n in-place heap construction (average case)]

SLIDE 9

Weak Heap → Binary Heap

GM: Build a binary heap in two phases:
1) Construct a heap-ordered binomial tree
2) Convert this tree into a binary heap

Alternative: a complete weak heap → # element comparisons
C(8) = 1, C(2^k) = 2C(2^(k−1)) + k − 1
For n = 2^k ≥ 8, the solution of this relation is C(n) = 5/8 · n − lg n − 1

Alternative: a navigation pile → fewer element moves

[Figure: a navigation pile showing the elements, the tournament tree, the heights, and the navigation bits 011 | 1101 | 0111]
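
The closed form can be checked against the recurrence numerically. The short script below is only a sanity check; the function names are hypothetical, and it works with 8·C(n) so that all arithmetic stays in integers.

```python
def conversion_cost(k):
    # C(2^k) from the recurrence C(8) = 1, C(2^k) = 2 C(2^(k-1)) + k - 1.
    if k == 3:
        return 1
    return 2 * conversion_cost(k - 1) + k - 1


def closed_form_times_8(k):
    # 8 * (5/8 * n - lg n - 1) for n = 2^k.
    n = 2 ** k
    return 5 * n - 8 * k - 8


for k in range(3, 21):
    assert 8 * conversion_cost(k) == closed_form_times_8(k)
print(conversion_cost(10))
```

Substituting the closed form into the right-hand side gives 2(5/8 · 2^(k−1) − (k − 1) − 1) + k − 1 = 5/8 · 2^k − k − 1, which is the same induction step in symbols.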

SLIDE 10

Bottom-Tree Conversion

Bottom trees: all complete binary trees of size m = 2^(⌊lg lg n⌋+1) − 1

  1. Convert all bottom trees to bottom heaps
  2. Ensure heap order at upper levels by using Floyd’s sift-down procedure
  3. Optimize element moves by handling binary micro trees of size 7 differently

  • Elements involved in all bottom-heap constructions ≤ n → 1.625n element comparisons
  • At most ⌈n/2^(h+1)⌉ nodes of height h → o(n) element comparisons at the levels above the bottom trees

SLIDE 11

Experimental Setup and Summary

Random permutations of n distinct int(eger)s for different (small, medium, large, and very large) problem sizes
Programs tuned to construct binary heaps of size 2^k − 1

  • GM showed acceptable practical performance
  • number of element comparisons and element moves was larger for in-situ GM than for in-situ MR
  • in-situ GM was faster than in-situ MR
  • but beaten by F and its BKS variant

SLIDE 12

Element Comparisons

n          std    F      BKS    in-situ GM   in-situ MR
2^10 − 1   1.64   1.86   1.86   1.74         1.52
2^15 − 1   1.64   1.88   1.88   1.65         1.54
2^20 − 1   1.64   1.88   1.88   1.63         1.53
2^25 − 1   1.65   1.88   1.88   1.63         1.53

std: Bottom-up heap construction (make_heap, Floyd, Wegener)
BKS: Improved version of Floyd’s algorithm (Bojesen et al. [JEA-00])

SLIDE 13

Execution Times

n          std    F      BKS    in-situ GM   in-situ MR
2^10 − 1   22.3   14.6   17.1   21.3         26.2
2^15 − 1   22.2   14.6   17.4   23.0         24.4
2^20 − 1   29.3   21.9   17.8   22.9         23.6
2^25 − 1   29.8   21.7   17.5   22.9         23.6

std: Bottom-up heap construction (make_heap, Floyd, Wegener)
BKS: Improved version of Floyd’s algorithm (Bojesen et al. [JEA-00])

SLIDE 14

What is the best constant-factor-optimal in-situ/adaptive sorting algorithm?

Best ∼ in terms of element comparisons and practical running time
In-situ ∼ Θ(lg n) extra words
Adaptive ∼ with respect to inversions

SLIDE 15

Sequential Sorting

Lower bound: lg n! = n lg n − n/ln 2 + O(lg n), where 1/ln 2 ≈ 1.4426

  • Worst case: n lg n + 0.1n [Dutton 1993, BIT]
  • Best case/index sorting: n lg n − 0.9n [STACS-00, JEA-02]
  • QuickWeakHeapsort: n lg n + 0.2n on average, in-place [JEA-02]
  • Optimal adaptive sorting: n lg(Inv(n)/n) + O(n) worst case, two options [IWOCA-11, JDA-12]
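
To get a feeling for the lower bound, the following sketch compares the exact value of lg n! (computed through the log-gamma function) with the approximation n lg n − n/ln 2. The function names are hypothetical; the remaining gap is the O(lg n) term.

```python
import math


def lg_factorial(n):
    # lg(n!) computed as ln Gamma(n + 1) / ln 2.
    return math.lgamma(n + 1) / math.log(2)


def lower_bound_approx(n):
    # Leading terms of the information-theoretic bound: n lg n - n / ln 2.
    return n * math.log2(n) - n / math.log(2)


for n in (2 ** 10, 2 ** 15, 2 ** 20):
    gap = lg_factorial(n) - lower_bound_approx(n)
    print(n, round(gap, 2), round(gap / math.log2(n), 2))
```

The printed gap grows roughly like half of lg n, which is why the slides quote the constant −1.44 for the linear term.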

SLIDE 16

Constant-Factor-Optimal Algorithms

Worst, Average, Observed: constant c in n lg n + cn element comparisons

Algorithm                 Space     Time        Worst   Average   Observed
Lower bound               O(1)      Ω(n lg n)   -1.44   -1.44
BUHeapsort [Weg93]        O(1)      O(n lg n)   ω(1)    –         [0.35,0.39]
WeakHeapsort [Dut93]      O(n/w)    O(n lg n)   0.09    –         [-0.46,-0.42]
RWeakHeapsort [ES02]      O(n)      O(n lg n)   -0.91   -0.91     -0.91
Mergesort [Knu73]         O(n)      O(n lg n)   -0.91   -1.26     –
EWeakHeapsort             O(n)      O(n lg n)   -0.91   -1.26     –
Insertionsort [Knu73]     O(1)      O(n^2)      -0.91   -1.38     –
MergeInsertion [Knu73]    O(n)      O(n^2)      -1.32   -1.3999   [-1.43,-1.41]
InPlaceMergesort [R92]    O(1)      O(n lg n)   -1.32   –         –
QuickHeapsort [DW13]      O(1)      O(n lg n)   ω(1)    -0.03     ≈ 0.20
                          O(n/w)    O(n lg n)   ω(1)    -0.99     ≈ -1.24
QuickMergesort (IS)       O(lg n)   O(n lg n)   -0.32   -1.38     –
QuickMergesort            O(1)      O(n lg n)   -0.32   -1.26     [-1.29,-1.27]
QuickMergesort (MI)       O(lg n)   O(n lg n)   -0.32   -1.3999   [-1.41,-1.40]

SLIDE 17

Idea of QuickXsort

As in Quicksort, the array is partitioned into the elements greater than and less than some pivot element

Then one part of the array is sorted by some algorithm X and the other part is sorted recursively

The advantage of this procedure is that, if X is a black box, then in QuickXsort the part of the array which is not currently being sorted may be used as temporary space, which yields an in-situ variant of X

By taking a sample of Θ(√n) elements when selecting the pivot, QuickXsort performs, on average, the same number of element comparisons as X up to an o(n) lower-order term

SLIDE 18

Results for Small Datasets

[Figure: Small-Scale Comparison Experiment. (Number of element comparisons − n log n) per n against n (logarithmic scale, 2^10 to 2^16); curves: Lower Bound, Insertionsort, Merge Insertion, Improved Merge Insertion]

[Figure: Small-Scale Runtime Experiment. Execution time per (#elements)^2 [µs] against n (logarithmic scale, 2^10 to 2^16); curves: Insertionsort, Merge Insertion, Improved Merge Insertion]

SLIDE 19

Results for Large Datasets

[Figure: Large-Scale Comparison Experiment. (Number of element comparisons − n log n) per n against n (logarithmic scale, 2^10 to 2^22); curves: Quicksort Median Sqrt, STL Introsort (out of range), STL Mergesort, QuickMergesort (MI) Median Sqrt, QuickMergesort Median 3, QuickMergesort Median Sqrt, QuickWeakHeapsort Median Sqrt, Lower Bound]

[Figure: Large-Scale Runtime Experiment. Execution time per element [µs] against n (logarithmic scale, 2^12 to 2^22); curves: Quicksort Median Sqrt, STL Introsort, STL Mergesort, QuickMergesort (MI) Median Sqrt, QuickMergesort Median 3, QuickMergesort Median Sqrt, QuickWeakHeapsort Median Sqrt]

SLIDE 20

Adaptive Sorting

  • A sorting algorithm is adaptive with respect to a measure of disorder if it sorts all input sequences, but performs particularly well on those that have a low amount of disorder.
  • The running time of such an algorithm is measured as a function of the length of the input, n, and the amount of disorder. Hence, the running time varies between O(n) and O(n lg n) depending on the amount of disorder.
  • The algorithm should be adaptive without knowing the amount of disorder beforehand.

Let x1, x2, . . . , xn be a sequence of n elements. For simplicity, assume that all elements are distinct.

Inv(n) := |{(i, j) | 1 ≤ i < j ≤ n and xi > xj}| is one measure of disorder
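
As a small illustration of the measure Inv, the sketch below counts inversions with a merge-based routine in O(n lg n) time. The function name is hypothetical; any other counting method would serve equally well.

```python
def count_inversions(xs):
    # Returns the number of pairs (i, j) with i < j and xs[i] > xs[j].
    def sort_and_count(seq):
        if len(seq) <= 1:
            return seq, 0
        mid = len(seq) // 2
        left, inv_left = sort_and_count(seq[:mid])
        right, inv_right = sort_and_count(seq[mid:])
        merged = []
        inversions = inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes every remaining element of left
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_and_count(list(xs))[1]


print(count_inversions([1, 2, 3]), count_inversions([3, 2, 1]))
```

A sorted sequence has 0 inversions and a reversed one has n(n − 1)/2, the two extremes between which an adaptive algorithm is expected to scale.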

SLIDE 21

Adaptive Heapsort

input: sequence x1, x2, . . . , xn of n elements

    Construct an empty Cartesian tree C
    hint ← 0
    for i ∈ {1, 2, . . . , n}
        hint ← C.insert(xi, hint)
    Construct an empty priority queue Q
    Q.insert(C.minimum())
    for j ∈ {1, 2, . . . , n}
        xj ← Q.extract-min()
        Let Y be the set of children xj has in C
        for each y ∈ Y
            Q.insert(y)

[Figure: Cartesian tree with the minimum xi at the root, x1..xi−1 in the left subtree and xi+1..xn in the right subtree]

Idea: Keep Q small [Levcopoulos & Petersson 1993]
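
The pseudocode can be turned into a short runnable sketch. Two substitutions are made here and should be read as assumptions, not as the talk’s implementation: the Cartesian tree is built with the usual stack-based scan instead of hinted insertions, and Python’s heapq (a binary heap) stands in for the priority queue Q, so the constant factors of the slides do not apply.

```python
import heapq


def adaptive_heapsort(xs):
    # Sorts xs by building a min-rooted Cartesian tree and repeatedly
    # extracting the minimum among the current candidates.
    n = len(xs)
    if n == 0:
        return []
    left = [-1] * n
    right = [-1] * n
    stack = []                      # indices on the rightmost path
    for i in range(n):
        last = -1
        while stack and xs[stack[-1]] > xs[i]:
            last = stack.pop()
        left[i] = last              # popped chain becomes the left subtree
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0]

    result = []
    queue = [(xs[root], root)]      # Q holds (element, index) pairs
    while queue:
        value, i = heapq.heappop(queue)
        result.append(value)
        for child in (left[i], right[i]):
            if child != -1:
                heapq.heappush(queue, (xs[child], child))
    return result


print(adaptive_heapsort([3, 1, 2]))
```

On a nearly sorted input the Cartesian tree is close to a path, so Q stays small, which is the point of the "Keep Q small" remark.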

SLIDE 22

Theoretical Race

For priority queue Q, element comparisons ≤ βn lg(Inv(n)/n) + O(n)

Q                                β     Reference
binary heap                      3
  combined extract-min insert    2.5   [Levcopoulos & Petersson 1993]
binomial queue                   2     [folklore]
weak heap                        2
  combined extract-min insert    1.5   [folklore]
multipartite priority queue      1     [Elmasry, Jensen & Katajainen 2008]

Goal: Achieve constant-factor optimality, i.e. β = 1, and at the same time ensure practicality!

SLIDE 23

One-Buffer Solution

Task: insert in O(1) amortized time; extract-min in O(lg n) worst-case time including at most lg n + O(1) element comparisons

Idea: Temporarily store inserted elements in a buffer and, once it is full (size lg n), move the elements to the main structure in bulk

[Figure: a weak heap with an attached insertion buffer, drawn as a tree and as an array; labels: minheap, minbuffer]

SLIDE 24

Experiments

Running time and element comparisons when n = 10^8

[Figure: Running times (n = 10^8). time / n [s] against #inversions (10^8 to 10^15); curves: splaysort, introsort, adaptive heapsort (weak queue), adaptive heapsort (weak heap)]

[Figure: Comparison counts (n = 10^8). #element comparisons / n against #inversions (10^8 to 10^15); curves: splaysort, introsort, adaptive heapsort (weak queue), adaptive heapsort (weak heap)]

SLIDE 25

What is the best elementary priority queue?

Best ∼ in terms of element comparisons and practical running time
Elementary ∼ no handles, no support of decrease and delete

SLIDE 26

Elementary Priority Queues

Lower bounds for binary heaps: Ω(lg lg n) for insert, lg n + log* n − O(1) for extract-min (under some assumptions)

Bulk insertion in weak heaps: O(1) amortized for insert, lg n for extract-min
→ Engineered weak heaps: O(1) for insert, lg n for extract-min, space n/w + O(1)

Bulk insertion in binary heaps: O(1) amortized for insert, lg n amortized for extract-min
→ Optimal in-place heaps: O(1) for insert, lg n for extract-min, space O(1) [submitted]

SLIDE 27

Performance of Some Priority Queues

Data structure                 Space        insert           extract-min
binary heaps [Wil64]           O(1)         lg n + O(1)      2 lg n + O(1)
binom. queues [Bro78,Vui78]    O(n)         O(1)             2 lg n + O(1)
heaps on heaps [GM86]          O(1)         lg lg n + O(1)   lg n + log* n + O(1)
queue of pennants [CMP88]      O(1)         O(1)             3 lg n + log* n + O(1)
multipartite PQs [EJK08]       O(n)         O(1)             lg n + O(1)
engin. weak heaps [EEK13]      n/w + O(1)   O(1)             lg n + O(1)
optimal in-place heaps         O(1)         O(1)             lg n + O(1)

All data structures support construct in O(n) and minimum in O(1) worst-case time

SLIDE 28

Strong Heaps

A strong heap is a binary heap where nodes dominate their right siblings

Rotating sift-down

[Figure: a strong heap on the elements 1 to 17 (without 2 and 16) before and after the operation "Replace the minimum with 16", performed with a rotating sift-down]
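
The definition can be made concrete with a small checker. This is a sketch under two assumptions: the heap is a min-heap stored in a 0-based array (children of index i at 2i + 1 and 2i + 2), and "dominates" means "is not larger than". The function name is hypothetical.

```python
def is_strong_heap(a):
    # Binary min-heap order plus: every left child is not larger than
    # its right sibling.
    n = len(a)
    for i in range(1, n):
        if a[(i - 1) // 2] > a[i]:          # parent must dominate child
            return False
        is_left_child = (i % 2 == 1)
        if is_left_child and i + 1 < n and a[i] > a[i + 1]:
            return False                     # must dominate right sibling
    return True


print(is_strong_heap([1, 2, 3, 4, 5, 6, 7]))   # sorted arrays qualify
print(is_strong_heap([1, 3, 2]))               # a heap, but not a strong one
```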

SLIDE 29

Strong Heaps

Strong sift-down

[Figure: the same strong heap before and after the operation "Replace the minimum with 16", performed with a strong sift-down]

SLIDE 30

Optimal In-Place Heaps

[Figure: layout of an optimal in-place heap; labels: top heap, bottom heaps, ⌊lg n_0⌋ − ⌈lg lg n_0⌉, ⌈lg lg n_0⌉, submersion area and insertion buffer (both of size O(lg^2 n_0)), min, ℓ1, r1, ℓ2, r2]

SLIDE 31

What is the best bound when handling a request sequence consisting of n insert, n extract-min, and m decrease operations?

Best ∼ in terms of element comparisons and practical running time

SLIDE 32

Addressable Priority Queues

insert
    input: element
    output: locator

minimum
    input: none
    output: locator

delete
    input: locator
    output: none

decrease
    input: locator, element
    output: none

union
    input: two priority queues
    output: one priority queue

extract-min
    p ← minimum()
    delete(p)

SLIDE 33

Market Analysis

Efficiency:

Operation   binary heap      binomial queue        Fibonacci heap   run-relaxed heap
            (worst case)     (worst case)          (amortized)      (worst case)
minimum     Θ(1)             Θ(1)                  Θ(1)             Θ(1)
insert      Θ(lg n)          Θ(1)                  Θ(1)             Θ(1)
decrease    Θ(lg n)          Θ(lg n)               Θ(1)             Θ(1)
delete      Θ(lg n)          Θ(lg n)               Θ(lg n)          Θ(lg n)
union       Θ(lg m × lg n)   Θ(min{lg m, lg n})    Θ(1)             Θ(min{lg m, lg n})

Here m and n denote the number of elements in the priority queues just prior to the operation.

SLIDE 34

Result

Rank-relaxed weak heaps are better than Fibonacci heaps!

Data structure            Element comparisons
Fibonacci heap            2m + 2.89n lg n
Rank-relaxed weak heap    2m + 1.5n lg n

But they are not simpler!

Data structure            Lines of code
Binary heap               205
Fibonacci heap            296
Rank-relaxed weak heap    883

SLIDE 35

Pointer-Based Representation

  • Three pointers per node: left child, right child, parent
  • One element per node; in the right subtree no element is smaller
  • Except the root, the nodes that have at most one child are at the last two levels only
  • Recall where the minimum is

[Figure: a pointer-based weak heap with a pointer to the minimum]

Run relaxation: Remove a potential violation if absolutely necessary (lazy)
Rank relaxation: Remove a potential violation whenever possible (eager)

SLIDE 36

Registries

Leaf registry: Keep track of the nodes that have one child or no children

[Figure: a weak heap with its minimum pointer and leaf registry]

Mark registry: Keep track of the potential violations

[Figure: a weak heap with its minimum pointer and mark registry]

Basic operations in both: Location-based insert and delete

SLIDE 37

Transformations

[Figure: the four transformations drawn on nodes u, v, w, y, z with subtrees A, B, C, D, each shown with its alternative outcomes; gray nodes are marked]

a) cleaning transformation
b) parent transformation
c) sibling transformation
d) pair transformation

SLIDE 38

Rank-Relaxed Weak Heap

[Figure: a rank-relaxed weak heap with a pointer to the minimum]

  • λ ≤ ⌊lg n⌋ − 1 nodes marked; they may violate the weak-heap ordering

insert: Insert a leaf, mark it, apply λ-reducing transformations as long as possible.

decrease: Decrease the value in the given node, mark it, apply λ-reducing transformations as long as possible.

extract-min: Find the minimum (at the root or one of the marked nodes), borrow a leaf, fix the structure of the subtree that lost its root, mark the root of the fixed subtree, apply λ-reducing transformations as long as possible.

Improvement in extract-min: If the mark registry is more than half full before the minimum finding, empty it.

SLIDE 39

Our Play with Dijkstra’s Algorithm

With your search engine, you will find many experimental studies on Dijkstra’s algorithm. Be critical when you read the results.

  • Which algorithm
  • Which graph representation
  • Which priority queue
  • Which tuning level

[Figure: Dijkstra’s algorithm in progress; labels: source, scanned, labelled, unlabelled, priority queue, minimum]

a factor of two speed-up
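
For readers who want a baseline to experiment with, here is a minimal sketch of Dijkstra’s algorithm. It is not the benchmarked code: it uses Python’s heapq and re-inserts a vertex instead of calling decrease, which is a common substitute when the priority queue has no locators. The function name and the graph format (a dict mapping a vertex to a list of (neighbour, weight) pairs) are assumptions of this example.

```python
import heapq


def dijkstra(adj, source):
    # adj: dict vertex -> list of (neighbour, non-negative weight).
    # Returns a dict with the shortest distance to every reachable vertex.
    dist = {source: 0}
    scanned = set()
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in scanned:
            continue                    # stale entry left by a re-insertion
        scanned.add(u)
        for v, w in adj.get(u, ()):
            candidate = d + w
            if v not in dist or candidate < dist[v]:
                dist[v] = candidate     # v becomes (or stays) labelled
                heapq.heappush(queue, (candidate, v))
    return dist


graph = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2)], 3: []}
print(dijkstra(graph, 0))
```

The questions on the slide apply directly: swapping the queue, the graph layout, or the re-insertion policy changes both the comparison count and the running time.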

SLIDE 40

Policy-Based Benchmarking

Element comparisons

[Figure: SSSP, normal graphs. Number of element comparisons (2e+07 to 1.2e+08) against number of vertices n (1e+06 to 2e+06); curves: 2-3 heap, violation heap, pairing heap, Fibonacci heap, tuned-relaxed WQ, run-relaxed WQ, weak queue, bu-heap, weak-heap, LEDA pairing heap, LEDA Fibonacci heap]

SLIDE 41

Policy-Based Benchmarking

Running time

[Figure: SSSP, normal graphs. CPU time in s (4 to 16) against number of vertices n (1e+06 to 2e+06); curves: 2-3 heap, violation heap, pairing heap, Fibonacci heap, tuned rank-relaxed WQs, rank-relaxed WQ, weak queue, bu-heap, weak-heap, LEDA pairing heap, LEDA Fibonacci heap]

SLIDE 42

Graph Representation

CPH STL
  • adjacency arrays
  • simple
  • static
  • 16m + 16n + O(1) bytes for a graph with m edges and n vertices

LEDA
  • adjacency lists
  • nice interface
  • fully dynamic
  • parameterized
  • 52m + 60n + O(1) bytes [LEDA Book, § 6.14]

[Figure: layout of the vertex and edge records; labels: vertex, edge, end points, edge pointer, tentative distance, state, weight]

a factor of two speed-up
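
A static adjacency-array layout can be sketched as follows. This only illustrates the general idea (one offsets array of length n + 1 plus flat arrays for edge targets and weights); it is not the CPH STL layout, and all names are hypothetical.

```python
def build_adjacency_arrays(n, edges):
    # edges: list of (u, v, weight) with vertices numbered 0..n-1.
    # Returns (offsets, targets, weights); the out-edges of u occupy the
    # positions offsets[u] .. offsets[u + 1] - 1 of targets and weights.
    offsets = [0] * (n + 1)
    for u, _, _ in edges:
        offsets[u + 1] += 1
    for u in range(n):
        offsets[u + 1] += offsets[u]        # prefix sums of the out-degrees
    targets = [0] * len(edges)
    weights = [0] * len(edges)
    next_free = offsets[:-1]                # slicing copies the list
    for u, v, w in edges:
        targets[next_free[u]] = v
        weights[next_free[u]] = w
        next_free[u] += 1
    return offsets, targets, weights


def out_edges(u, offsets, targets, weights):
    # Lists the (target, weight) pairs of the edges leaving u.
    return [(targets[k], weights[k])
            for k in range(offsets[u], offsets[u + 1])]


arrays = build_adjacency_arrays(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)])
print(arrays[0], out_edges(0, *arrays))
```

Because all edges of a vertex are contiguous, scanning a vertex touches consecutive memory, which is the cache argument behind preferring the static layout when the graph does not change.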

SLIDE 43

Avoiding Indirection

  • Combine the graph vertex and the priority-queue node [Knuth 1994] → improves cache behaviour

a factor of two speed-up

SLIDE 44

Tuning

Running time per n [µs]

Operation     n           CPH STL Fibonacci heap   LEDA 6.2 Fibonacci heap
insert        10 000      0.10                     0.18
              100 000     0.09                     0.15
              1 000 000   0.09                     0.15
decrease      10 000      0.03                     0.06
              100 000     0.05                     0.22
              1 000 000   0.06                     0.31
extract-min   10 000      0.7                      1.2
              100 000     1.4                      2.7
              1 000 000   2.8                      4.5

Element comparisons per n

Operation     n           Values (CPH STL Fibonacci heap, LEDA 6.2 Fibonacci heap)
insert        10 000      1
              100 000     1
              1 000 000   1
decrease      10 000      2
              100 000     2
              1 000 000   2
extract-min   10 000      16.2, 29.9
              100 000     21.2, 38.3
              1 000 000   26.2, 46.5

On my computer (Ubuntu, g++, with -O3)

a factor of two speed-up

SLIDE 45

Parameterized Design

[Figure: the weak heap and its type parameters]

  element type: int, double, ...
  comparator type: std::less, std::greater, ...
  node type: weak-heap node, combined weak-heap node & graph node, ...
  modifier type: relaxed heap modifier, ...
  level-registry type: leaf registry, ...
  mark-registry type: naive mark registry, eager mark registry, lazy mark registry, ...

  • comparators shared
  • nodes shared
  • transformations shared
  • level registries shared
  • mark registries shared

a factor of two less code

SLIDE 46

What Is the Best?

Our reference sequence
  Theory: rank-relaxed weak heap
  Dijkstra (time): binary heap [Williams 1964]
  Dijkstra (comps): weak heap [Dutton 1993]

Worst case per operation
  insert (time): Fibonacci heap [Fredman & Tarjan 1987]
  insert (comps): Fibonacci heap
  decrease (time): Fibonacci heap
  decrease (comps): Fibonacci heap
  extract-min (time): weak queue [Vuillemin 1978]
  extract-min (comps): weak heap

SLIDE 47

Questions

SLIDE 48

  • J. Bojesen, J. Katajainen, and M. Spork. Performance engineering case study: Heap construction. ACM J. Exp. Algorithmics 5:Article 15, 2000
  • A. Bruun, S. Edelkamp, J. Katajainen, and J. Rasmussen. Policy-based benchmarking of weak heaps and their relatives. SEA 2010, LNCS 6049, pp. 424–435, Springer, 2010
  • D. Cantone and G. Cinotti. QuickHeapsort, an efficient mix of classical sorting algorithms. Theoret. Comput. Sci. 285(1):25–42, 2002
  • J. Chen, S. Edelkamp, A. Elmasry, and J. Katajainen. In-place heap construction with optimized comparisons, moves, and cache misses. MFCS 2012, LNCS 7464, pp. 259–270, Springer, 2012
  • T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. 3rd edition, The MIT Press, 2009
  • V. Diekert and A. Weiß. Quickheapsort: Modifications and improved analysis. CSR 2013, LNCS 7913, pp. 24–35, Springer, 2013
  • R. D. Dutton. Weak-heap sort. BIT 33(3):372–381, 1993
  • S. Edelkamp, A. Elmasry, and J. Katajainen. Two constant-factor-optimal realizations of adaptive heapsort. IWOCA 2011, LNCS 7056, pp. 195–208, Springer, 2011
  • S. Edelkamp, A. Elmasry, and J. Katajainen. A catalogue of algorithms for building weak heaps. IWOCA 2012, LNCS 7643, pp. 249–262, Springer, 2012
  • S. Edelkamp, A. Elmasry, and J. Katajainen. The weak-heap data structure: Variants and applications. J. Discrete Algorithms 16:187–205, 2012
  • S. Edelkamp, A. Elmasry, and J. Katajainen. The weak-heap family of priority queues in theory and praxis. CATS 2012, Conferences in Research and Practice in Information Technology 128, Australian Computer Society, pp. 103–112, 2012

SLIDE 49

  • S. Edelkamp, A. Elmasry, and J. Katajainen. Optimal in-place heaps, 2013
  • S. Edelkamp, A. Elmasry, and J. Katajainen. Weak heaps engineered. J. Discrete Algorithms, 2013
  • S. Edelkamp and P. Stiegeler. Implementing Heapsort with n log n − 0.9n and Quicksort with n log n + 0.2n comparisons. ACM J. Exp. Algorithmics 7:Article 5, 2002
  • S. Edelkamp and I. Wegener. On the performance of weak-heapsort. STACS 2000, LNCS 1770, pp. 254–266, Springer, 2000
  • S. Edelkamp and A. Weiß. Quickmergesort: Efficient sorting with n log n − 1.399n + o(n) comparisons on average, 2013
  • A. Elmasry, C. Jensen, and J. Katajainen. Multipartite priority queues. ACM Trans. Algorithms 5(1):Article 14, 2008
  • J. Katajainen. The ultimate heapsort. CATS 1998, Australian Computer Science Communications 20(3), pp. 87–96, Springer-Verlag Singapore, 1998
  • D. E. Knuth. Sorting and Searching. The Art of Computer Programming 3, 2nd edition, Addison Wesley Longman, 1998
  • C. J. H. McDiarmid and B. A. Reed. Building heaps fast. J. Algorithms 10(3):352–365, 1989
  • J. Vuillemin. A data structure for manipulating priority queues. Commun. ACM 21(4):309–315, 1978
  • I. Wegener. The worst case complexity of McDiarmid and Reed’s variant of Bottom-Up Heapsort is less than n log n + 1.1n. Inform. and Comput. 97(1):86–96, 1992
  • J. W. J. Williams. Algorithm 232: Heapsort. Commun. ACM 7(6):347–348, 1964