Algorithm Engineering (aka How to Write Fast Code), CS260

Algorithm Engineering (aka How to Write Fast Code), CS260, Lecture 6
Yan Gu
I/O Algorithms and Parallel Samplesort

CS260: Algorithm Engineering, Lecture 6
• The I/O Model
• Sampling in Algorithm Design
• Parallel Samplesort

Last week: The I/O model
• The I/O model has two special memory transfer instructions:
  • Read transfer: load a block from slow memory
  • Write transfer: write a block to slow memory
• The complexity of an algorithm on the I/O model (I/O complexity) is measured by: #(read transfers) + #(write transfers)
[Figure: CPU, fast memory holding $M/B$ blocks of size $B$, and slow memory; edge costs 0 (CPU to fast memory) and 1 (fast memory to slow memory).]
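To make the cost measure concrete, here is a minimal sketch that simulates scanning an array one block at a time and tallies read transfers. The function name `scan_with_io_count` and the parameter `block_size` (playing the role of $B$) are hypothetical, and the simulation assumes the array is stored contiguously in slow memory and that nothing is written back.

```python
def scan_with_io_count(data, block_size):
    """Sum `data` while counting simulated read transfers.

    Assumption: each slice of `block_size` contiguous elements is loaded
    from slow memory with exactly one read transfer.
    """
    total = 0
    reads = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]  # one read transfer
        reads += 1
        total += sum(block)  # work inside fast memory is free in this model
    return total, reads


if __name__ == "__main__":
    total, reads = scan_with_io_count(list(range(10)), block_size=4)
    print(total, reads)  # 45 3
```

Under these assumptions a scan of $n$ elements costs $\lceil n/B \rceil$ read transfers and no write transfers.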

Cache-Oblivious Algorithms
• Algorithms not parameterized by $B$ or $M$
• These algorithms are unaware of the parameters of the memory hierarchy
• Analyze in the ideal cache model: same as the I/O model, except that optimal replacement is assumed
[Figure: CPU, fast memory holding $M/B$ blocks of size $B$, and slow memory; edge costs 0 (CPU to fast memory) and 1 (fast memory to slow memory).]

Why Sampling?
• Yan has an array $\{a_0, a_1, \ldots, a_{n-1}\}$ such that $a_i = 0$ or $1$, and Yan wants to know how many $0$s are in the array
• Scan: linear work, can be parallelized
• Sounds like a good idea?

Why Sampling?
• Yan has an array $\{a_0, a_1, \ldots, a_{n-1}\}$ and a function $f(\cdot)$ such that $f(a_i) = 0$ or $1$, and Yan wants to know how many $a_i$ have $f(a_i) = 0$

Why Sampling?
• Yan has an array $\{a_0, a_1, \ldots, a_{n-1}\}$ and $n$ functions $f_1(\cdot), \ldots, f_n(\cdot)$ such that $f_j(a_i) = 0$ or $1$, and Yan wants to know, for each $f_j$, how many $a_i$ have $f_j(a_i) = 0$
• Takes quadratic work, which does not scale to reasonable input sizes
• Examples:
  • Find the median $m$ of the $a_i$: $f_m(a_i)$ = "$a_i < m$", check if $\#(f_{a_j}(a_i) = 0)$ is $n/2$
  • Find a good pivot $p$ in quicksort (e.g., $n/4 \le \#(f_p(a_i) = 0) \le 3n/4$)
  • Guarantee all sorts of properties in graph, geometry and other algorithms

Approximate Solution: Sampling
• Yan has an array $\{a_0, a_1, \ldots, a_{n-1}\}$ and a function $f(\cdot)$ such that $f(a_i) = 0$ or $1$, and Yan wants to know how many $a_i$ have $f(a_i) = 0$
• Uniformly randomly pick $k$ elements, count the cases with $f(a_i) = 0$ (denoted $k_0$), and estimate by $n \cdot k_0 / k$
• As long as $k$ is sufficiently large, we are “confident” in our estimate
• On the other hand, when $k$ is small, the result can be random
• When is the estimation good?
• What is “good”?
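The estimator described above can be sketched in a few lines. The name `estimate_zero_count` and the optional `rng` argument are hypothetical. The sketch also assumes sampling with replacement, so that the samples are independent.

```python
import random


def estimate_zero_count(a, f, k, rng=None):
    """Estimate how many elements x of `a` satisfy f(x) == 0.

    Draws k indices uniformly at random (with replacement, an assumption),
    counts the hits k0, and scales up: n * k0 / k.
    """
    rng = rng or random.Random()
    n = len(a)
    k0 = sum(1 for _ in range(k) if f(a[rng.randrange(n)]) == 0)
    return n * k0 / k


if __name__ == "__main__":
    a = [i % 4 for i in range(10_000)]       # exactly 2500 zeros
    f = lambda x: 0 if x == 0 else 1
    print(estimate_zero_count(a, f, k=4000, rng=random.Random(1)))
```

The estimate costs $k$ evaluations of $f$ instead of $n$; how large $k$ has to be is the question the slide closes with.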

Approximate Solution: Sampling
• What is “good”?
  • With high probability (informal): happens with probability $1 - n^{-c}$ for any constant $c > 0$
  • This is large when $n$ is reasonably large, like $> 10^6$
• When is the estimation good?
  • Claim: when $k_0$ is $\Omega(\log n)$
• How far can reality be off from the estimate?
  • Assume there are $z$ elements with $f(a_i) = 0$, and we have $k$ samples with $k_0$ hits. The expected number of hits is $\mathrm{E}[k_0] = kz/n$.
  • The probability that this is off by 100% (i.e., $k_0 > 2kz/n$) is at most $e^{-kz/(3n)}$
  • Since $k_0 \approx kz/n$, $e^{-kz/(3n)}$ is $n^{-c}$ when $k_0 = \Omega(\log n)$, because $e^{-kz/(3n)} \approx e^{-k_0/3} < e^{-c' \log_2 n} = n^{-c}$
  • The probability that this is off by 1% (i.e., $k_0 > 1.01\,kz/n$, so $\delta = 0.01$) is at most $e^{-\delta^2 kz/(3n)}$
  • Since $k_0 \approx kz/n$, $e^{-\delta^2 kz/(3n)}$ is $n^{-c}$ when $k_0 = \Omega(\log n)$, because $e^{-\delta^2 kz/(3n)} \approx e^{-k_0/(3 \cdot 100^2)} < e^{-c' \log_2 n} = n^{-c}$

Chernoff bound: for $n$ independent random variables in $\{0, 1\}$, let $X$ be the sum and $\mu = \mathrm{E}[X]$; then for any $0 \le \delta \le 1$, $\Pr[X \ge (1 + \delta)\mu] \le e^{-\delta^2 \mu / 3}$.
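To get a feel for the constants, the Chernoff bound above can be evaluated numerically. This is only a sketch: the function names are hypothetical, and `expected_hits_needed` simply solves $e^{-\delta^2 \mu / 3} \le n^{-c}$ for $\mu$, giving $\mu \ge 3c \ln n / \delta^2$.

```python
import math


def chernoff_upper_tail(mu, delta):
    """Upper bound on Pr[X >= (1 + delta) * mu] for 0 <= delta <= 1."""
    if not 0 <= delta <= 1:
        raise ValueError("this form of the bound needs 0 <= delta <= 1")
    return math.exp(-delta * delta * mu / 3)


def expected_hits_needed(n, c, delta):
    """Smallest expected hit count mu with bound <= n**(-c)."""
    return 3 * c * math.log(n) / (delta * delta)


if __name__ == "__main__":
    mu = expected_hits_needed(n=10**6, c=2, delta=1.0)
    print(round(mu, 1), chernoff_upper_tail(mu, 1.0))
```

For $n = 10^6$, $c = 2$ and $\delta = 1$ this gives roughly 83 expected hits. Shrinking $\delta$ to 0.01 multiplies the requirement by $100^2$, matching the $3 \cdot 100^2$ factor in the 1% case.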

Rules of Thumb for Sampling
• Example applications:
  • Find the median $m$ of the $a_i$: $f_m(a_i)$ = "$a_i < m$", check if $\#(f_{a_j}(a_i) = 0)$ is $n/2$
  • Find a good pivot $p$ in quicksort (e.g., $n/4 \le \#(f_p(a_i) = 0) \le 3n/4$)
  • Guarantee all sorts of properties in graph, geometry and other algorithms
• Take some samples! Uniformly randomly pick $k$ elements, count the cases with $f(a_i) = 0$ (denoted $k_0$), and estimate by $n \cdot k_0 / k$
  • 4 sample hits give you a reasonable result
  • 20 sample hits give you confidence
  • 100 sample hits are sufficient!
• Remember: only hits count
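As one illustration of the quicksort example, a pivot can be taken as the median of a small random sample and then checked against the $n/4$ to $3n/4$ criterion. Both function names are hypothetical, and the check counts elements smaller than the pivot, which is one reading of the slide's condition.

```python
import random


def sample_pivot(a, k, rng=None):
    """Return the median of k uniformly random samples from `a`."""
    rng = rng or random.Random()
    sample = sorted(a[rng.randrange(len(a))] for _ in range(k))
    return sample[len(sample) // 2]


def is_good_pivot(a, p):
    """True if between n/4 and 3n/4 elements of `a` are smaller than p."""
    n = len(a)
    smaller = sum(1 for x in a if x < p)
    return n / 4 <= smaller <= 3 * n / 4


if __name__ == "__main__":
    rng = random.Random(7)
    a = list(range(10_000))
    rng.shuffle(a)
    p = sample_pivot(a, k=101, rng=rng)
    print(p, is_good_pivot(a, p))
```

Choosing the pivot touches only $k$ elements; the linear-time check in `is_good_pivot` is there for demonstration and would not be run in a real sort.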

Parallel and I/O-efficient Sorting Algorithms
• Classic sorting algorithms are easy to parallelize
  • Quicksort: find a “good” pivot, apply partition (filter) to find the elements that are smaller and those that are larger, and recurse
  • Mergesort: apply parallel merge for $\log_2 n$ rounds
• But they are not I/O efficient, since we need $\log_2 n$ rounds of global data movement
• We now introduce samplesort, which is both highly parallel and I/O efficient

Sample-sort outline
Analogous to multiway quicksort
1. Split the input array into $\sqrt{N}$ contiguous subarrays of size $\sqrt{N}$. Sort the subarrays recursively (sequentially)
2. Choose $\sqrt{N} - 1$ “good” pivots $p_1 \le p_2 \le \cdots \le p_{\sqrt{N}-1}$
3. Distribute the subarrays into buckets, according to the pivots
4. Recursively sort the buckets
5. Copy the concatenated buckets back to the input array
[Figure: $\sqrt{N}$ sorted subarrays of size $\sqrt{N}$ are distributed into buckets of size $\approx \sqrt{N}$, with Bucket 1 $\le p_1 \le$ Bucket 2 $\le p_2 \le \cdots \le p_{\sqrt{N}-1} \le$ Bucket $\sqrt{N}$; after step 5 the array is sorted.]
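The five steps can be put together in a short sequential sketch. It is not the parallel implementation from the lecture: the local sorts use Python's built-in `sorted`, the distribution is a naive element-by-element placement that does not yet exploit the sorted subarrays, and the names `samplesort` and `base_size`, along with the oversampling factor of about $\log_2 N$, are assumptions made for illustration.

```python
import bisect
import math
import random


def samplesort(a, rng=None, base_size=16):
    """Sequential sketch of the sample-sort outline; returns a new sorted list."""
    rng = rng or random.Random(0)
    n = len(a)
    if n <= base_size:
        return sorted(a)
    s = math.isqrt(n)  # about sqrt(N)

    # Step 1: split into contiguous subarrays of size ~sqrt(N), sort each.
    subarrays = [sorted(a[i:i + s]) for i in range(0, n, s)]

    # Step 2: choose s - 1 pivots from a sorted random sample.
    over = max(1, math.ceil(math.log2(n)))
    sample = sorted(a[rng.randrange(n)] for _ in range(s * over))
    pivots = [sample[i * over] for i in range(1, s)]

    # Step 3: distribute; bucket j receives x with pivots[j-1] <= x < pivots[j].
    buckets = [[] for _ in range(s)]
    for sub in subarrays:
        for x in sub:
            buckets[bisect.bisect_right(pivots, x)].append(x)

    # Steps 4 and 5: sort each bucket and concatenate.
    out = []
    for b in buckets:
        if len(b) == n:
            # Degenerate split (e.g. all keys equal): avoid endless recursion.
            out.extend(sorted(b))
        else:
            out.extend(samplesort(b, rng, base_size))
    return out


if __name__ == "__main__":
    data = [random.randrange(1000) for _ in range(5000)]
    assert samplesort(data) == sorted(data)
```

Returning a new list instead of copying back into the input keeps the sketch short; step 5 as stated on the slide writes the result back in place.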

Choosing good pivots based on sampling
2. Choose $\sqrt{N} - 1$ “good” pivots $p_1 \le p_2 \le \cdots \le p_{\sqrt{N}-1}$
Can be achieved by randomly picking $c\sqrt{N}\log N$ random samples, sorting them, and picking every $(c \log N)$-th element
This step is fast
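A sketch of this pivot selection, with a helper that reports the resulting bucket sizes so the balance can be inspected. The names `choose_pivots` and `bucket_sizes`, the default `c=2`, the base-2 logarithm, and sampling with replacement are all assumptions; the slide fixes none of them.

```python
import bisect
import math
import random


def choose_pivots(a, c=2, rng=None):
    """Pick about sqrt(N) - 1 pivots from c * sqrt(N) * log N random samples."""
    rng = rng or random.Random()
    n = len(a)
    if n < 2:
        return []
    s = math.isqrt(n)
    step = max(1, math.ceil(c * math.log2(n)))  # every (c log N)-th sample
    sample = sorted(a[rng.randrange(n)] for _ in range(s * step))
    return [sample[i * step] for i in range(1, s)]


def bucket_sizes(a, pivots):
    """Count how many elements of `a` fall into each pivot-delimited bucket."""
    sizes = [0] * (len(pivots) + 1)
    for x in a:
        sizes[bisect.bisect_right(pivots, x)] += 1
    return sizes


if __name__ == "__main__":
    rng = random.Random(11)
    a = list(range(10_000))
    rng.shuffle(a)
    pivots = choose_pivots(a, rng=rng)
    sizes = bucket_sizes(a, pivots)
    print(len(pivots), min(sizes), max(sizes))
```

Only the sample is sorted, which is why the step is cheap compared with sorting the whole input.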

Sequential local sorts (e.g., call std::sort)
1. Split the input array into $\sqrt{N}$ contiguous subarrays of size $\sqrt{N}$. Sort the subarrays recursively (sequentially)
4. Recursively sort the buckets (sequentially)
[Figure: the $\sqrt{N}$ sorted subarrays of size $\sqrt{N}$, and the buckets Bucket 1 $\le p_1 \le$ Bucket 2 $\le p_2 \le \cdots \le p_{\sqrt{N}-1} \le$ Bucket $\sqrt{N}$.]

Key Part: the Distribution Phase
3. Distribute the subarrays into buckets, according to the pivots
[Figure: $\sqrt{N}$ sorted subarrays of size $\sqrt{N}$ are distributed into buckets of size $\approx \sqrt{N}$, with Bucket 1 $\le p_1 \le$ Bucket 2 $\le p_2 \le \cdots \le p_{\sqrt{N}-1} \le$ Bucket $\sqrt{N}$.]
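Because every subarray is already sorted, the elements it contributes to each bucket form one contiguous segment, so the segment boundaries can be found by binary searching each pivot. The sketch below shows that idea sequentially. It is an assumption about how the distribution could be organized, not the lecture's parallel procedure, and the function names are hypothetical.

```python
import bisect


def split_sorted_subarray(sub, pivots):
    """Cut sorted `sub` into len(pivots) + 1 contiguous segments.

    Segment j holds the elements x with pivots[j-1] <= x < pivots[j].
    """
    cuts = [0] + [bisect.bisect_left(sub, p) for p in pivots] + [len(sub)]
    return [sub[cuts[j]:cuts[j + 1]] for j in range(len(pivots) + 1)]


def distribute(subarrays, pivots):
    """Append each subarray's j-th segment to bucket j."""
    buckets = [[] for _ in range(len(pivots) + 1)]
    for sub in subarrays:
        for j, segment in enumerate(split_sorted_subarray(sub, pivots)):
            buckets[j].extend(segment)
    return buckets


if __name__ == "__main__":
    subarrays = [[1, 4, 7, 9], [2, 3, 8], [5, 6, 10]]
    print(distribute(subarrays, pivots=[4, 8]))
    # [[1, 2, 3], [4, 7, 5, 6], [9, 8, 10]]
```

Each bucket is a concatenation of sorted segments, so it still needs the sort in step 4. Moving whole segments rather than single elements means each copy touches contiguous memory.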
