SLIDE 1
Helsinki, 8 December 2003

Title: The current truth about heaps
Speaker: Jyrki Katajainen
Co-workers: Claus Jensen and Fabio Vitale

This talk is about the heaps we all love. I will explain how the heap functions are implemented in the CPH STL.
SLIDE 2
SLIDE 3
© Performance Engineering Laboratory
SLIDE 4
Heap functions in the STL
void push_heap(position A, position Z, ordering f);
Effect: [diagram: a new last element is sifted into an existing heap]
at most log₂ n comparisons
void pop_heap(position A, position Z, ordering f);
Effect: [diagram: the top element is removed and the heap is re-established]
at most 2 log₂ n comparisons
void make_heap(position A, position Z, ordering f);
Effect: [diagram: an arbitrary sequence is turned into a heap]
at most 3n comparisons
void sort_heap(position A, position Z, ordering f);
Effect: [diagram: a heap is turned into a sorted sequence]
at most n log₂ n comparisons
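In current C++ these four functions are std::push_heap, std::pop_heap, std::make_heap, and std::sort_heap; a minimal sketch of how they compose, using the default ordering std::less (i.e. a max-heap):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Build a heap, push one extra element, pop the maximum, then sort.
std::vector<int> demo(std::vector<int> v, int extra) {
    std::make_heap(v.begin(), v.end());   // arbitrary sequence -> heap
    v.push_back(extra);
    std::push_heap(v.begin(), v.end());   // sift the new last element up
    std::pop_heap(v.begin(), v.end());    // move the maximum to the back...
    v.pop_back();                         // ...and discard it
    std::sort_heap(v.begin(), v.end());   // heap -> ascending order
    return v;
}
```

For example, demo({3, 1, 2}, 42) builds a heap of {3, 1, 2}, pushes 42, pops it again (it is the maximum), and returns {1, 2, 3}.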
SLIDE 5
How would you do it?
SLIDE 6
Jones 1986
Operation sequence (hold model):
push()^N [pop() push()]^K

e ← pop()
increase the priority of e by −ln(drand())
push(e)

Input data: element size: 4 B; #elements: 1–2^13.5
Environment: computer: VAX 11/780 running UNIX (BSD 4.2); cache: 8 kB; TLB: 64 entries; compiler: Berkeley Pascal with optimization enabled
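A sketch of this hold model in modern C++, with std::priority_queue standing in for the heap and a uniform draw in (0, 1) standing in for drand() (both substitutions are mine, not Jones's setup):

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <functional>
#include <queue>
#include <vector>

// Hold model: N initial pushes, then K rounds of e <- pop();
// push(e) with the priority of e increased by -ln(drand()).
// A min-heap, so pop() returns the smallest priority.
using min_heap =
    std::priority_queue<double, std::vector<double>, std::greater<double>>;

min_heap simulate_hold(int N, int K) {
    min_heap pq;
    for (int i = 0; i < N; ++i)
        pq.push(static_cast<double>(std::rand()) / RAND_MAX);
    for (int k = 0; k < K; ++k) {
        double e = pq.top();                                // e <- pop()
        pq.pop();
        double u = (std::rand() + 1.0) / (RAND_MAX + 2.0);  // drand() in (0, 1)
        pq.push(e - std::log(u));                           // priority += -ln(u)
    }
    return pq;
}
```

The heap size stays at N throughout the measured phase, which is what makes the hold model a steady-state benchmark.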
SLIDE 7
LaMarca & Ladner 1996
Operation sequence: hold model?
#define NOTSORANDNUM(x) (x + RANDNUM())
Input data: element size: 8 B; #elements: 2^10–2^23
Environment: computer: DEC Alphastation 250; processor: Alpha 21064A, 266 MHz; L1 cache: 8 kB; L2 cache: direct-mapped, 2 MB, 32 B per line; compiler?: cc
SLIDE 8
Sanders 1999
Operation sequence: [push() pop() push()]^N [pop() push() pop()]^N
Input data: element size: 4 B, drawn randomly; satellite data: 4 B; #elements: 2^8–2^23
Environment: computer: Pentium II, 300 MHz; compiler: g++ -O6
SLIDE 9
Brengel et al. 1999
Operation sequence: push()^N pop()^N
Input data: element size: 4 B, drawn randomly from [0..10^7]; #elements: 1·10^6–200·10^6
Environment: computer: Sparc Ultra 1/143; main memory: 256 MB, 8 kB per page; local disk: 9 GB fast-wide SCSI; logical block size: 64 kB; buffer size: 16 MB
SLIDE 10
Edelkamp & Stiegeler 2002
Operation sequence: make(N) [pop()]^N
Input data: element size: 4 B, floating-point numbers drawn randomly; #elements: 10^6; ordering: f^0(x) = x and f^i(x) = ln(f^{i−1}(x+1)) for i > 0
Environment: computer: Pentium III, 450 MHz; compiler: g++ -O2
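The iterated orderings can be coded directly from the definition above; a small sketch (the function name is mine):

```cpp
#include <cassert>
#include <cmath>

// f^0(x) = x and f^i(x) = ln(f^(i-1)(x + 1)) for i > 0.
// Each extra level of i makes a comparison key more expensive to
// evaluate while preserving the order, since ln is increasing.
double f(int i, double x) {
    if (i == 0)
        return x;
    return std::log(f(i - 1, x + 1.0));
}
```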
SLIDE 11
How would you do it now?
SLIDE 12
Sanders’ programs on Pentium II
[plot: execution time per element in nanoseconds vs. n (10^3–10^7) for the sequence [push()]^N [pop()]^N; curves: 2-ary heap, 4-ary heap]
SLIDE 13
Sanders’ programs on Pentium III
[plot: execution time per element in nanoseconds vs. n (10^3–10^7) for the sequence [push()]^N [pop()]^N; curves: 2-ary heap, 4-ary heap]
SLIDE 14
Sanders’ programs on Pentium IV
[plot: execution time per element in nanoseconds vs. n (10^3–10^7) for the sequence [push()]^N [pop()]^N; curves: 2-ary heap, 4-ary heap]
SLIDE 15
Cost of unsigned int operations
initializations (all rows): a[i] ← 0; x ← 2^20; p as listed

instruction               p        n             time per instruction
a[i] ← x                  p ← 1    2^10..2^24    4.1–4.7 ns
a[i] ← x                  p ← 617  2^10..2^14    7.3–8.9 ns
                                   2^15          12 ns
                                   2^16          29 ns
                                   2^17..2^22    62–63 ns
x ← a[i]                  p ← 1    2^10..2^24    3.3–3.8 ns
x ← a[i]                  p ← 617  2^10..2^15    3.3–4.1 ns
                                   2^16          23 ns
                                   2^17..2^22    45–55 ns
r ← (a[i] < x)            p ← 1    2^10..2^24    5.3–5.8 ns
r ← (ln(a[i]) < ln(x))    p ← 1    2^10..2^24    580–610 ns
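The p ← 1 versus p ← 617 initializations suggest that the array index advances with stride p, so the same instruction is measured once with cache-friendly and once with cache-hostile accesses. A hypothetical re-creation of such a micro-benchmark (the function name and timing harness are mine):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Time the store a[i] <- x over n steps, advancing i by the stride p
// modulo n.  With n a power of two and p odd, every slot is visited
// exactly once; p = 1 is sequential, p = 617 defeats the cache.
double ns_per_store(std::size_t n, std::size_t p) {
    std::vector<unsigned int> a(n, 0);
    unsigned int x = 1u << 20;
    auto start = std::chrono::steady_clock::now();
    std::size_t i = 0;
    for (std::size_t k = 0; k < n; ++k) {
        a[i] = x;           // the timed instruction
        i = (i + p) % n;    // next index, stride p
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / n;
}
```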
SLIDE 16
Cost of bigint operations
initializations (all rows): a[i] ← 0; x ← 2^20; p as listed

instruction       p        n             time per instruction
a[i] ← x          p ← 1    2^10..2^21    60–66 ns
                           2^22          290 ns
a[i] ← x          p ← 617  2^10..2^12    75–78 ns
                           2^13          117 ns
                           2^14          229 ns
                           2^15..2^20    297–318 ns
                           2^21..2^22    748–752 ns
x ← a[i]          p ← 1    2^10..2^22    18–21 ns
x ← a[i]          p ← 617  2^10..2^12    24 ns
                           2^13          83 ns
                           2^14          180 ns
                           2^15..2^22    230–260 ns
r ← (a[i] < x)    p ← 1    2^10..2^22    13–16 ns
SLIDE 17
Other current research
– Pointer-based methods: hopelessly slow → theoretical computer science
– Methods with good amortized bounds: terrible worst case → not relevant for us
– Methods with few element moves: bad cache behaviour → not good for us
– External-memory methods: high constants → relevant only for very large data sets
– Cache-oblivious methods: huge constants → theoretical computer science
SLIDE 18
Our policy-based framework
template <arity d, typename position, typename ordering>
class heap_policy {
public:
    typedef typename std::iterator_traits<position>::difference_type index;
    typedef typename std::iterator_traits<position>::difference_type level;
    typedef typename std::iterator_traits<position>::value_type element;

    template <typename integer>
    heap_policy(integer n = 0);

    bool is_root(index) const;
    bool is_first_child(index) const;
    index size() const;
    level depth(index) const;
    index root() const;
    index leftmost_leaf() const;
    index last_leaf() const;
    index first_child(index) const;
    index parent(index) const;
    index ancestor(index, level) const;
    index top_some_absent(position, index, const ordering&) const;
    index top_all_present(position, index, const ordering&) const;
    void update(position, index, const element&);
    void erase_last_leaf(position, const ordering&);
    void insert_new_leaf(position, const ordering&);

private:
    index n;
};
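The navigation members leave the storage layout to the policy. Assuming the usual implicit layout of a complete d-ary heap with the root at index 0 (an assumption; the CPH STL policy may differ), the index arithmetic would be:

```cpp
#include <cassert>
#include <cstddef>

// Implicit d-ary layout: the children of node i occupy indices
// d*i + 1 .. d*i + d, so the parent of i > 0 is (i - 1) / d.
template <std::size_t d>
struct implicit_layout {
    static std::size_t first_child(std::size_t i) { return d * i + 1; }
    static std::size_t last_child(std::size_t i)  { return d * i + d; }
    static std::size_t parent(std::size_t i)      { return (i - 1) / d; }
    static bool is_root(std::size_t i)            { return i == 0; }
    static bool is_first_child(std::size_t i)     { return i % d == 1; }
};
```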
SLIDE 19
Input data
                        cheap move                     expensive move
cheap comparison        unsigned int                   bigint
expensive comparison    unsigned int, ln comparison    (int, bigint), ln comparison
SLIDE 20
One new old idea: local heaps
SLIDE 21
Our solution for sort_heap()
– In-place mergesort by Katajainen, Pasanen, and Teuhola [1996]
– Fine-tuning not yet implemented
– Almost as fast as quicksort; see CPH STL Report 2003-2
SLIDE 22
Our solution for make_heap()
– Depth-first heap construction by Bojesen, Katajainen, and Spork [2000]
– Almost optimal in all respects
– Other work: fewer element comparisons → theoretical computer science
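The cited paper gives the depth-first construction itself; for a baseline, here is the classic bottom-up (Floyd) construction that it is measured against, sketched for a binary max-heap:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sift a[i] down within a[0..n) until both children are no larger.
void sift_down(std::vector<int>& a, std::size_t i, std::size_t n) {
    while (2 * i + 1 < n) {
        std::size_t c = 2 * i + 1;              // left child
        if (c + 1 < n && a[c + 1] > a[c]) ++c;  // pick the larger child
        if (a[i] >= a[c]) return;
        std::swap(a[i], a[c]);
        i = c;
    }
}

// Floyd's construction: handle the internal nodes from right to left.
void build_heap(std::vector<int>& a) {
    for (std::size_t i = a.size() / 2; i-- > 0; )
        sift_down(a, i, a.size());
}
```

Floyd's method achieves the at-most-3n comparison bound quoted earlier; the depth-first variant improves mainly on its cache behaviour.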
SLIDE 23
Various approaches for pop_heap()
– top-down → many element comparisons
– bottom-up → good in the typical case
– move-saving bottom-up → theoretical computer science
– binary-search top-down
– two-levels-at-a-time top-down
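As an illustration, a sketch of the bottom-up strategy for a binary max-heap (mine, not the CPH STL code): the hole at the root walks down along the larger children with one comparison per level, the last leaf refills the hole, and a usually short sift-up finishes the job.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Remove and return the maximum of a non-empty binary max-heap.
int pop_bottom_up(std::vector<int>& a) {
    int top = a[0];
    std::size_t n = a.size() - 1;   // index of the last leaf
    std::size_t hole = 0;
    while (2 * hole + 1 < n) {      // walk the hole down to a leaf
        std::size_t c = 2 * hole + 1;
        if (c + 1 < n && a[c + 1] > a[c]) ++c;
        a[hole] = a[c];
        hole = c;
    }
    a[hole] = a[n];                 // refill with the last leaf
    a.pop_back();
    while (hole != 0 && a[(hole - 1) / 2] < a[hole]) {  // sift up
        std::swap(a[hole], a[(hole - 1) / 2]);
        hole = (hole - 1) / 2;
    }
    return top;
}
```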
SLIDE 24
Various approaches for push_heap()
– move-saving top-down → slow
– bottom-up → good in the typical case
– bottom-up with buffering → complicated
– binary-search bottom-up
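For contrast with the variants above, the plain bottom-up push for a binary max-heap (a sketch; the binary-search variant would binary-search the root-to-leaf ancestor path instead of stepping level by level):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Append x and sift it up until its parent is at least as large.
void push_bottom_up(std::vector<int>& a, int x) {
    a.push_back(x);
    std::size_t i = a.size() - 1;
    while (i != 0 && a[(i - 1) / 2] < a[i]) {
        std::swap(a[i], a[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}
```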
SLIDE 25
Efficiency of 2-, 3-, 4-ary heaps
[plot: execution time per element in nanoseconds vs. n (10^3–10^7) for sorting random integers; curves: SGI::partial_sort(), bottom-up 2-, 3-, and 4-ary heaps, SGI::sort()]
SLIDE 26
Efficiency of 2-, 3-, 4-ary heaps
[plot: execution time per element in nanoseconds vs. n (10^3–10^7) for sorting random integers with the ln comparison; curves: bottom-up 2-, 3-, and 4-ary heaps, SGI::sort(), SGI::partial_sort()]
SLIDE 27
Efficiency of local heaps
[plot: execution time per element in nanoseconds vs. n (10^3–10^7) for sorting random integers; curves: two-by-two top-down 1- to 5-local heaps, SGI::partial_sort(), SGI::sort()]
SLIDE 28
Efficiency of local heaps
[plot: execution time per element in nanoseconds vs. n (10^3–10^7) for sorting random integers with the ln comparison; curves: two-by-two top-down 1- to 5-local heaps, SGI::sort(), SGI::partial_sort()]
SLIDE 29
Conclusions
– In 40 years: not much progress.
– At the moment it is not clear how big the overhead of local heaps is for small problem sizes.
– Some combinations of the various approaches still have to be tested.
– Code-tuning of the best approaches is still to be done.
– It takes time to develop fast library routines.
– How does technology influence the efficiency of library routines?
SLIDE 30