Algorithm Engineering (aka. How to Write Fast Code), CS260

SLIDE 1

Algorithm Engineering

(aka. How to Write Fast Code)

I/O Algorithms and Parallel Samplesort

CS260, Lecture 6, Yan Gu

SLIDE 2

CS260: Algorithm Engineering Lecture 6


  • The I/O Model
  • Sampling in Algorithm Design
  • Parallel Samplesort

SLIDE 3

SLIDE 4
  • The I/O model has two special memory transfer instructions:
    • Read transfer: load a block from slow memory
    • Write transfer: write a block to slow memory
  • The complexity of an algorithm on the I/O model (I/O complexity) is measured by: #(read transfers) + #(write transfers)

Last week - The I/O model

[Figure: CPU, a fast memory holding $M/B$ blocks of size $B$, and a slow memory]
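The cost measure can be made concrete with a small trace simulation. The following minimal Python sketch counts read transfers for a sequence of word addresses. It assumes the usual conventions (fast memory of $M$ words holding $M/B$ blocks of $B$ words). It also assumes LRU replacement and does not model write transfers. The function name `count_read_transfers` is hypothetical.

```python
from collections import OrderedDict


def count_read_transfers(addresses, M, B):
    """Count block reads for a trace of word addresses.

    Fast memory holds M // B blocks of B words; on a miss the least
    recently used block is evicted (write-backs are not counted).
    """
    capacity = M // B
    cache = OrderedDict()  # block id -> None, least recently used first
    reads = 0
    for addr in addresses:
        block = addr // B  # slow-memory block that contains this word
        if block in cache:
            cache.move_to_end(block)  # hit: now most recently used
        else:
            reads += 1  # miss: one read transfer
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the LRU block
    return reads


n, M, B = 256, 1024, 64
row_major = [i * n + j for i in range(n) for j in range(n)]
col_major = [i * n + j for j in range(n) for i in range(n)]
print(count_read_transfers(row_major, M, B))  # 1024 = n * n / B
print(count_read_transfers(col_major, M, B))  # 65536: every access misses
```

Both traversals read the same $n^2$ words of a row-major matrix. However, the column-major order needs a new block on every access, because one column touches more than $M/B$ distinct blocks.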

SLIDE 5

Cache-Oblivious Algorithms

  • Algorithms not parameterized by $B$ or $M$
  • These algorithms are unaware of the parameters of the memory hierarchy
  • Analyze in the ideal cache model: same as the I/O model except that optimal replacement is assumed

[Figure: the same CPU / fast memory ($M/B$ blocks of size $B$) / slow memory diagram]

SLIDE 6


  • The I/O Model
  • Sampling in Algorithm Design
  • Parallel Samplesort

SLIDE 7
  • Yan has an array $\{a_0, a_1, \ldots, a_{n-1}\}$ such that $a_i = 0$ or $1$, and Yan wants to know how many 0s are in the array
  • Scan: linear work, can be parallelized
  • Sounds like a good idea?

Why Sampling?

SLIDE 8
  • Yan has an array $\{a_0, a_1, \ldots, a_{n-1}\}$ and a function $f(\cdot)$ such that $f(a_i) = 0$ or $1$, and Yan wants to know how many $f(a_i) = 0$

Why Sampling?

SLIDE 9
  • Yan has an array $\{a_0, a_1, \ldots, a_{n-1}\}$ and $n$ functions $f_1(\cdot), \ldots, f_n(\cdot)$ such that $f_j(a_i) = 0$ or $1$, and Yan wants to know, for each $j$, how many $f_j(a_i) = 0$
  • Takes quadratic work, which is infeasible for reasonably large inputs
  • Examples:
    • Find the median $m$ of the $a_i$: let $f_m(a_i) = [a_i < m]$ and check, for each candidate $a_j$, whether $\#(f_{a_j}(a_i) = 0)$ is $n/2$
    • Find a good pivot $p$ in quicksort (e.g., $n/4 \le \#(f_p(a_i) = 0) \le 3n/4$)
    • Guarantee all sorts of properties in graph, geometry, and other algorithms

Why Sampling?
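The exact approach to the median example can be sketched as follows. Each count is one linear scan, but trying all $n$ candidates $a_j$ costs $\Theta(n^2)$ work in total. This minimal Python sketch assumes distinct elements and returns the upper median for even $n$. The names `count_zeros` and `median_by_counting` are hypothetical.

```python
def count_zeros(a, f):
    """Exact #(f(a_i) == 0): one linear scan (easy to parallelize)."""
    return sum(1 for x in a if f(x) == 0)


def median_by_counting(a):
    """Median of distinct elements found by trying every candidate a_j.

    f_m(a_i) = [a_i < m], so #(f_m(a_i) == 0) counts the a_i >= m.
    n candidates times an O(n) scan each gives Theta(n^2) work.
    """
    n = len(a)
    for m in a:
        at_least_m = count_zeros(a, lambda x: 1 if x < m else 0)
        if at_least_m == n - n // 2:  # exactly n // 2 elements are smaller
            return m
    return None


print(median_by_counting([5, 1, 4, 2, 3]))  # 3
```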

SLIDE 10
  • Yan has an array $\{a_0, a_1, \ldots, a_{n-1}\}$ and $n$ functions; for one such function $f(\cdot)$ with $f(a_i) = 0$ or $1$, Yan wants to know how many $f(a_i) = 0$
  • Uniformly at random pick $k$ elements, count the cases with $f(a_i) = 0$ (denote this count $k_0$), and estimate the answer by $n \cdot k_0 / k$
  • As long as $k$ is sufficiently large, we are “confident” in our estimate
  • On the other hand, when $k$ is small, the result can be essentially random
  • When is the estimate good?
  • What is “good”?

Approximate Solution: Sampling
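The estimator $n \cdot k_0 / k$ takes only a few lines. In this minimal Python sketch, samples are drawn independently (with replacement), which matches the setting of the Chernoff bound used in this lecture. The data, the predicate `f` and the name `estimate_zeros` are illustrative assumptions.

```python
import random


def estimate_zeros(a, f, k, rng):
    """Estimate #(f(a_i) == 0) as n * k0 / k from k uniform samples.

    Samples are independent and drawn with replacement.
    Returns the estimate and the number of hits k0.
    """
    n = len(a)
    k0 = sum(1 for _ in range(k) if f(a[rng.randrange(n)]) == 0)  # hits
    return n * k0 / k, k0


rng = random.Random(2024)
a = [rng.random() for _ in range(200_000)]
f = lambda x: 0 if x < 0.3 else 1  # roughly 30% of elements have f = 0
estimate, hits = estimate_zeros(a, f, 10_000, rng)
exact = sum(1 for x in a if f(x) == 0)
print(exact, round(estimate), hits)
```

The estimate costs $k$ evaluations of $f$ instead of $n$, and its accuracy is governed by the number of hits $k_0$.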

SLIDE 11
  • What is “good”?
  • With high probability (informal): happens with probability $1 - n^{-c}$ for any constant $c > 0$
  • This is close to 1 when $n$ is reasonably large, e.g., $n > 10^6$
  • When is the estimate good?
  • Claim: when $k_0$ is $\Omega(\log n)$
  • How far can reality be from the estimate?

Approximate Solution: Sampling

SLIDE 12
  • When is the estimate good?
  • Claim: when $k_0$ is $\Omega(\log n)$
  • How far can reality be from the estimate?
  • Assume there are $z$ elements with $f(a_i) = 0$, and we take $k$ samples with $k_0$ hits. The expected number of hits is $\mathrm{E}[k_0] = kz/n$.
  • The probability that this is off by 100% (i.e., $k_0 > 2kz/n$) is at most $e^{-kz/(3n)}$

Approximate Solution: Sampling

Chernoff bound: for $n$ independent random variables in $\{0, 1\}$, let $X$ be their sum and $\mu = \mathrm{E}[X]$; then for any $0 \le \varepsilon \le 1$, $\Pr[X \ge (1 + \varepsilon)\mu] \le e^{-\varepsilon^2 \mu / 3}$
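The bound can be sanity-checked numerically. The following Python sketch computes the exact binomial tail of $k_0$, treating each of the $k$ independent samples as a hit with probability $z/n$, and compares it with $e^{-\varepsilon^2 \mu / 3}$. The values $k = 100$ and $z/n = 0.05$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def binomial_upper_tail(k, p, t):
    """Exact Pr[X >= t] for X ~ Binomial(k, p)."""
    t = math.ceil(t)
    return sum(math.comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(t, k + 1))


def chernoff_bound(mu, eps):
    """Chernoff upper bound on Pr[X >= (1 + eps) * mu], for 0 <= eps <= 1."""
    return math.exp(-(eps**2) * mu / 3)


k, p = 100, 0.05  # k samples; a fraction p = z / n of the elements are hits
mu = k * p        # E[k0] = k z / n
for eps in (0.5, 1.0):
    tail = binomial_upper_tail(k, p, (1 + eps) * mu)
    print(f"eps={eps}: exact tail {tail:.4f} <= bound {chernoff_bound(mu, eps):.4f}")
```

The exact tail is noticeably smaller than the bound here. This is expected, because the Chernoff bound trades tightness for a simple closed form.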

SLIDE 13
  • When is the estimate good?
  • Claim: when $k_0$ is $\Omega(\log n)$
  • How far can reality be from the estimate?
  • Assume there are $z$ elements with $f(a_i) = 0$, and we take $k$ samples with $k_0$ hits. The expected number of hits is $\mathrm{E}[k_0] = kz/n$.
  • The probability that this is off by 100% (i.e., $k_0 > 2kz/n$) is at most $e^{-kz/(3n)}$
  • Since $k_0 \approx kz/n$, $e^{-kz/(3n)}$ is at most $n^{-c}$ when $k_0 = \Omega(\log n)$, because $e^{-kz/(3n)} \approx e^{-k_0/3} \le e^{-c' \log_2 n} = n^{-c}$ whenever $k_0 \ge 3c' \log_2 n$ (with $c = c'/\ln 2$)

Approximate Solution: Sampling

SLIDE 14
  • When is the estimate good?
  • Claim: when $k_0$ is $\Omega(\log n)$
  • How far can reality be from the estimate?
  • Assume there are $z$ elements with $f(a_i) = 0$, and we take $k$ samples with $k_0$ hits. The expected number of hits is $\mathrm{E}[k_0] = kz/n$.
  • The probability that this is off by 1% (i.e., $k_0 > 1.01kz/n$) is at most $e^{-\varepsilon^2 kz/(3n)}$, with $\varepsilon = 0.01$
  • Since $k_0 \approx kz/n$, $e^{-\varepsilon^2 kz/(3n)}$ is at most $n^{-c}$ when $k_0 = \Omega(\log n)$, because $e^{-\varepsilon^2 kz/(3n)} \approx e^{-k_0/(3 \cdot 100^2)} \le e^{-c' \log_2 n} = n^{-c}$ whenever $k_0 \ge 3 \cdot 100^2 \cdot c' \log_2 n$

Approximate Solution: Sampling

Chernoff bound: for $n$ independent random variables in $\{0, 1\}$, let $X$ be their sum and $\mu = \mathrm{E}[X]$; then for any $0 < \varepsilon < 1$, $\Pr[X \ge (1 + \varepsilon)\mu] \le e^{-\varepsilon^2 \mu / 3}$
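Solving $e^{-\varepsilon^2 k_0 / 3} \le n^{-c}$ for $k_0$ gives $k_0 \ge 3c \ln n / \varepsilon^2$. For constant $\varepsilon$ and $c$, this is $\Omega(\log n)$. The small Python helper below makes the $1/\varepsilon^2$ price of a 1% guarantee visible. Its name `hits_needed` is hypothetical. It uses the natural log, which changes only the constant relative to $\log_2 n$.

```python
import math


def hits_needed(n, c, eps):
    """Smallest k0 with exp(-eps^2 * k0 / 3) <= n^(-c)."""
    return math.ceil(3 * c * math.log(n) / eps**2)


n, c = 10**6, 1
for eps in (1.0, 0.01):  # off by 100% vs. off by 1%
    print(eps, hits_needed(n, c, eps))  # 42, then 414466
```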

SLIDE 15
  • Example applications:
    • Find the median $m$ of the $a_i$: $f_m(a_i) = [a_i < m]$, check if $\#(f_{a_j}(a_i) = 0)$ is $n/2$
    • Find a good pivot $p$ in quicksort (e.g., $n/4 \le \#(f_p(a_i) = 0) \le 3n/4$)
    • Guarantee all sorts of properties in graph, geometry, and other algorithms
  • Take some samples! Uniformly at random pick $k$ elements, count the cases with $f(a_i) = 0$ (denote this count $k_0$), and estimate by $n \cdot k_0 / k$
    • 4 sample hits give you a reasonable result
    • 20 sample hits give you confidence
    • 100 sample hits are sufficient!
    • Remember: only hits count

Rules of Thumb for Sampling
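A back-of-the-envelope way to read these numbers follows; it is a sketch, not from the slides. Suppose $k_0 \sim \mathrm{Binomial}(k, p)$. Then the relative standard deviation of the estimate $n \cdot k_0 / k$ is $\sqrt{(1-p)/\mathrm{E}[k_0]}$. When hits are rare, this is about $1/\sqrt{k_0}$. It depends on the number of hits, not on $n$ or $k$ alone, which is one way to see why only hits count. The helper name `relative_std` is hypothetical.

```python
import math


def relative_std(expected_hits, p=0.0):
    """Relative standard deviation of n * k0 / k when E[k0] = expected_hits.

    k0 ~ Binomial(k, p), so sd(k0) / E[k0] = sqrt((1 - p) / (k * p)).
    p = 0 gives the rare-hit approximation 1 / sqrt(E[k0]).
    """
    return math.sqrt((1 - p) / expected_hits)


for hits in (4, 20, 100):
    print(hits, f"{relative_std(hits):.0%}")  # 50%, 22%, 10%
```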

SLIDE 16


  • The I/O Model
  • Sampling in Algorithm Design
  • Parallel Samplesort

SLIDE 17

Parallel and I/O-efficient Sorting Algorithms

  • Classic sorting algorithms are easy to parallelize
    • Quicksort: find a “good” pivot, apply partition (filter) to find the elements that are smaller and those that are larger, and recurse
    • Mergesort: apply parallel merge for $\log_2 n$ rounds
  • But they are not I/O-efficient, since they need $\log_2 n$ rounds of global data movement
  • We now introduce samplesort, which is both highly parallel and I/O-efficient
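The quicksort bullet can be sketched sequentially in Python as below. Each list comprehension plays the role of the parallel filter. Two choices are assumptions of this sketch: the random pivot (the slide only asks for a “good” one) and the three-way split, which handles duplicates. Every recursion level reads and writes all of its data. With $\Theta(\log n)$ levels in expectation, this is exactly the global data movement that the slide flags.

```python
import random


def quicksort(a, rng):
    """Quicksort with a random pivot and a three-way partition via filters."""
    if len(a) <= 1:
        return list(a)
    p = a[rng.randrange(len(a))]  # in the middle half w.p. about 1/2
    less = [x for x in a if x < p]  # filter: one full pass over the data
    equal = [x for x in a if x == p]
    greater = [x for x in a if x > p]
    # Every level of recursion moves all of its elements again.
    return quicksort(less, rng) + equal + quicksort(greater, rng)


rng = random.Random(7)
data = [rng.randrange(1000) for _ in range(5000)]
print(quicksort(data, rng) == sorted(data))  # True
```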

SLIDE 18

Sample-sort outline

Analogous to multiway quicksort

  1. Split the input array into $\sqrt{n}$ contiguous subarrays of size $\sqrt{n}$. Sort the subarrays recursively

[Figure: $\sqrt{n}$ sorted subarrays, each of size $\sqrt{n}$]

SLIDE 19

Sample-sort outline

[Figure: $\sqrt{n}$ sorted subarrays, each of size $\sqrt{n}$]

Analogous to multiway quicksort

  1. Split the input array into $\sqrt{n}$ contiguous subarrays of size $\sqrt{n}$. Sort the subarrays recursively (sequentially)

SLIDE 20

Sample-sort outline

  2. Choose $\sqrt{n} - 1$ “good” pivots $p_1 \le p_2 \le \cdots \le p_{\sqrt{n}-1}$
  3. Distribute the subarrays into buckets, according to the pivots

[Figure: each of the $\sqrt{n}$ sorted subarrays is split across Bucket 1 $\le p_1 \le$ Bucket 2 $\le p_2 \le \cdots \le p_{\sqrt{n}-1} \le$ Bucket $\sqrt{n}$; each bucket has size $\approx \sqrt{n}$]

SLIDE 21

  4. Recursively sort the buckets
  5. Copy the concatenated buckets back to the input array

Sample-sort outline

[Figure: Bucket 1 $\le p_1 \le$ Bucket 2 $\le p_2 \le \cdots \le p_{\sqrt{n}-1} \le$ Bucket $\sqrt{n}$, each sorted, so their concatenation is sorted]
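The five steps fit in a compact sequential Python sketch. This is an illustration under assumptions, not the course's reference implementation. Several choices are made here rather than taken from the slides: the base-case size `base`, the oversampling constant `c`, the use of `bisect` instead of a merge to find each element's bucket, and running the recursive calls sequentially. The fallback for a bucket that receives every key (e.g., all keys equal) only guarantees termination.

```python
import bisect
import math
import random


def samplesort(a, rng, base=64, c=2):
    """Sequential sketch of samplesort's five steps; returns a new sorted list."""
    n = len(a)
    if n <= base:
        return sorted(a)  # small input: sequential sort
    s = math.isqrt(n)  # about sqrt(n) subarrays and buckets
    size = math.ceil(n / s)
    # Step 1: split into contiguous subarrays; sort each one recursively.
    subs = [samplesort(a[i:i + size], rng, base, c) for i in range(0, n, size)]
    # Step 2: sample c * s * log n keys, sort them, keep every (c log n)-th.
    step = c * math.ceil(math.log2(n))
    sample = sorted(a[rng.randrange(n)] for _ in range(step * s))
    pivots = sample[step - 1::step][: s - 1]
    # Step 3: distribute; bucket j holds keys x with pivots[j-1] < x <= pivots[j].
    buckets = [[] for _ in range(len(pivots) + 1)]
    for sub in subs:
        for x in sub:
            buckets[bisect.bisect_left(pivots, x)].append(x)
    # Steps 4 and 5: sort each bucket recursively, then concatenate.
    out = []
    for b in buckets:
        # If one bucket got every key (e.g. all keys equal), recursing would
        # not shrink the input, so sort it directly instead.
        out.extend(sorted(b) if len(b) == n else samplesort(b, rng, base, c))
    return out


rng = random.Random(1)
data = [rng.randrange(10_000) for _ in range(20_000)]
print(samplesort(data, rng) == sorted(data))  # True
```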

SLIDE 22

Choosing good pivots based on sampling

  2. Choose $\sqrt{n} - 1$ “good” pivots $p_1 \le p_2 \le \cdots \le p_{\sqrt{n}-1}$

This can be achieved by picking $c\sqrt{n}\log n$ samples uniformly at random, sorting them, and picking every $(c \log n)$-th element. This step is fast.
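The pivot rule can be tried on its own. The following Python sketch implements it and then measures the resulting bucket sizes. The constant $c = 2$, the sampling with replacement and the function name `choose_pivots` are assumptions of this sketch.

```python
import bisect
import math
import random


def choose_pivots(a, s, rng, c=2):
    """Pick s - 1 pivots: sample c * s * log2(n) keys, sort, keep every (c log n)-th."""
    n = len(a)
    step = c * max(1, math.ceil(math.log2(n)))
    sample = sorted(a[rng.randrange(n)] for _ in range(step * s))
    return sample[step - 1::step][: s - 1]


rng = random.Random(3)
n = 100_000
a = list(range(n))
rng.shuffle(a)
s = math.isqrt(n)  # 316 buckets
pivots = choose_pivots(a, s, rng)
sizes = [0] * s
for x in a:
    sizes[bisect.bisect_left(pivots, x)] += 1  # bucket of x
print(len(pivots), max(sizes), n // s)
```

Each bucket spans $c \log n$ consecutive samples. Its size is therefore an average over many samples, so it concentrates around $n/s$. Taking a single random pivot per bucket would not give this concentration.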

SLIDE 23

Sequential local sorts (e.g., call std::sort)

  1. Split the input array into $\sqrt{n}$ contiguous subarrays of size $\sqrt{n}$. Sort the subarrays recursively (sequentially)
  4. Recursively sort the buckets (sequentially)

[Figure: $\sqrt{n}$ sorted subarrays distributed into Bucket 1 $\le p_1 \le$ Bucket 2 $\le p_2 \le \cdots \le p_{\sqrt{n}-1} \le$ Bucket $\sqrt{n}$]

SLIDE 24

Key Part: the Distribution Phase

  3. Distribute the subarrays into buckets, according to the pivots

[Figure: $\sqrt{n}$ sorted subarrays distributed into Bucket 1 $\le p_1 \le$ Bucket 2 $\le p_2 \le \cdots \le p_{\sqrt{n}-1} \le$ Bucket $\sqrt{n}$; each bucket has size $\approx \sqrt{n}$]

SLIDE 25

Key Part: the Distribution Phase

  • For simplicity, assume $n = 16$, and the input (4 sorted subarrays of size 4) is [1, 2, 3, 4, 1, 1, 3, 3, 1, 2, 2, 4, 1, 2, 4, 4]
  • First, get the count for each subarray in each bucket: [1, 1, 1, 1, 2, 0, 2, 0, 1, 2, 0, 1, 1, 1, 0, 2]
  • Then, transpose the array and scan to compute the offsets: [1, 2, 1, 1, 1, 0, 2, 1, 1, 2, 0, 0, 1, 0, 1, 2] gives offsets [0, 1, 3, 4, 5, 6, 6, 8, 9, 10, 12, 12, 12, 13, 13, 14]
  • Lastly, move each element to the corresponding bucket:
    • start: [∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅]
    • after subarray 1: [1, ∅, ∅, ∅, ∅, 2, ∅, ∅, ∅, 3, ∅, ∅, 4, ∅, ∅, ∅]
    • after subarray 2: [1, 1, 1, ∅, ∅, 2, ∅, ∅, ∅, 3, 3, 3, 4, ∅, ∅, ∅]
    • after all subarrays: [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]
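The worked example can be reproduced with a three-pass Python sketch: count, transpose plus exclusive scan, scatter. The slide does not list the pivots, so this sketch assumes pivots 1, 2, 3, which is consistent with its buckets holding the values 1, 2, 3, 4. The function name `distribute` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def distribute(subarrays, pivots):
    """Three-pass distribution: count, transpose + exclusive scan, scatter.

    Bucket j holds keys x with pivots[j-1] < x <= pivots[j].
    Returns (counts, transposed counts, offsets, output array).
    """
    m, nb = len(subarrays), len(pivots) + 1
    # Pass 1: counts[i][j] = number of keys of subarray i that go to bucket j.
    counts = [[0] * nb for _ in range(m)]
    for i, sub in enumerate(subarrays):
        for x in sub:
            counts[i][bisect_left(pivots, x)] += 1
    # Pass 2: lay the counts out bucket-major (transpose), then exclusive scan.
    transposed = [counts[i][j] for j in range(nb) for i in range(m)]
    offsets = [0] + list(accumulate(transposed))[:-1]
    # Pass 3: each (bucket, subarray) pair writes into its own disjoint slice.
    out = [None] * sum(transposed)
    cursor = list(offsets)  # next free slot for each (bucket, subarray) pair
    for i, sub in enumerate(subarrays):
        for x in sub:
            slot = bisect_left(pivots, x) * m + i
            out[cursor[slot]] = x
            cursor[slot] += 1
    return counts, transposed, offsets, out


subarrays = [[1, 2, 3, 4], [1, 1, 3, 3], [1, 2, 2, 4], [1, 2, 4, 4]]
counts, transposed, offsets, out = distribute(subarrays, pivots=[1, 2, 3])
print(sum(counts, []))  # [1, 1, 1, 1, 2, 0, 2, 0, 1, 2, 0, 1, 1, 1, 0, 2]
print(transposed)       # [1, 2, 1, 1, 1, 0, 2, 1, 1, 2, 0, 0, 1, 0, 1, 2]
print(offsets)          # [0, 1, 3, 4, 5, 6, 6, 8, 9, 10, 12, 12, 12, 13, 13, 14]
print(out)              # [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]
```

The (bucket, subarray) slices are disjoint, so the scatter of different subarrays could proceed in parallel without conflicts.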

SLIDE 26

Additional Details Left for You

  • How to decide the count of each bucket in each subarray
    • Hint: use a (sequential) merge algorithm
  • How to transpose the array of counts and write the elements to the buckets I/O-efficiently
    • Hint: use divide-and-conquer
  • Find the best #pivots and #subarrays
    • How do #pivots and #subarrays affect performance?
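Here is one possible reading of the first two hints, sketched in Python under assumptions; it is not the intended solution. A subarray and the pivot list are both sorted, so a single merge-style pass yields that subarray's bucket counts. For the transpose, a recursion that always halves the longer side keeps sub-blocks roughly square at every scale, the usual cache-oblivious pattern. Both function names and the base-case size of 16 entries are hypothetical.

```python
def bucket_counts_by_merge(sub, pivots):
    """Bucket counts for one sorted subarray via a merge with sorted pivots.

    Bucket b holds keys x with pivots[b-1] < x <= pivots[b]; the two-pointer
    scan does O(len(sub) + len(pivots)) work in one sequential pass.
    """
    counts = [0] * (len(pivots) + 1)
    b = 0
    for x in sub:
        while b < len(pivots) and x > pivots[b]:
            b += 1  # x is past pivot b, so advance to the next bucket
        counts[b] += 1
    return counts


def transpose(src, dst, r0, r1, c0, c1):
    """Write the transpose of src[r0:r1][c0:c1] into dst (dst[j][i] = src[i][j]).

    Recursively halve the longer side until the block is small, so sub-blocks
    stay roughly square at every scale.
    """
    if (r1 - r0) * (c1 - c0) <= 16:
        for i in range(r0, r1):
            for j in range(c0, c1):
                dst[j][i] = src[i][j]
    elif r1 - r0 >= c1 - c0:
        mid = (r0 + r1) // 2
        transpose(src, dst, r0, mid, c0, c1)
        transpose(src, dst, mid, r1, c0, c1)
    else:
        mid = (c0 + c1) // 2
        transpose(src, dst, r0, r1, c0, mid)
        transpose(src, dst, r0, r1, mid, c1)


subarrays = [[1, 2, 3, 4], [1, 1, 3, 3], [1, 2, 2, 4], [1, 2, 4, 4]]
counts = [bucket_counts_by_merge(sub, [1, 2, 3]) for sub in subarrays]
by_bucket = [[0] * len(counts) for _ in range(len(counts[0]))]
transpose(counts, by_bucket, 0, len(counts), 0, len(counts[0]))
print(by_bucket)  # [[1, 2, 1, 1], [1, 0, 2, 1], [1, 2, 0, 0], [1, 0, 1, 2]]
```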

SLIDE 27

Samplesort is I/O-efficient

  • Only needs two rounds of global data accesses
    • For input size $n$ between 10 million and 100 billion
  • In the midterm project, you can choose to implement this algorithm and engineer its performance
    • This is harder than matrix multiplication, but easier than semisort
    • Expected score is 100%
  • Discussion: what is the work of samplesort? And what about its depth?

SLIDE 28

Next lecture: Semisort

  • https://www.cs.ucr.edu/~ygu/teaching/algeng/algeng.html
  • https://ilearn.ucr.edu/webapps/blackboard/execute/announcement?method=search&context=course&course_id=_307782_1