Algorithm Engineering
(aka. How to Write Fast Code)
I/O Algorithms and Parallel Samplesort
CS260 - Lecture 6, Yan Gu
Outline:
The I/O Model
Sampling in Algorithm Design
Parallel Samplesort
The I/O model has two special memory transfer instructions:
Read transfer: load a block from slow memory
Write transfer: write a block to slow memory
The complexity of an algorithm on the I/O model (I/O complexity) is measured by: #(read transfers) + #(write transfers)
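[Figure: CPU attached to a fast memory of size $M$ holding $M/B$ blocks; blocks of size $B$ move between fast memory and slow memory, and each read or write transfer costs 1.]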
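To make the cost measure concrete, the sketch below simulates a two-level memory and counts transfers. It is only an illustration under assumptions that are not in the slides: fast memory uses LRU replacement and write-allocate, dirty blocks still in fast memory at the end are not flushed, and the names `IOModel`, `scan_cost`, and `strided_cost` are hypothetical.

```python
from collections import OrderedDict


class IOModel:
    """Two-level memory: fast memory holds M // B blocks of B words each."""

    def __init__(self, M, B):
        self.B = B
        self.capacity = M // B       # number of blocks that fit in fast memory
        self.fast = OrderedDict()    # block id -> dirty flag, kept in LRU order
        self.reads = 0               # read transfers so far
        self.writes = 0              # write transfers so far

    def _touch(self, addr, dirty):
        block = addr // self.B
        if block in self.fast:       # hit: no transfer, refresh LRU position
            self.fast[block] = self.fast[block] or dirty
            self.fast.move_to_end(block)
            return
        if len(self.fast) >= self.capacity:
            _, was_dirty = self.fast.popitem(last=False)  # evict LRU block
            if was_dirty:
                self.writes += 1     # write transfer for a modified block
        self.reads += 1              # read transfer to load the block
        self.fast[block] = dirty

    def read(self, addr):
        self._touch(addr, dirty=False)

    def write(self, addr):
        self._touch(addr, dirty=True)

    def io_complexity(self):
        return self.reads + self.writes


def scan_cost(n, M, B):
    """Transfers needed to read addresses 0..n-1 in order."""
    mem = IOModel(M, B)
    for i in range(n):
        mem.read(i)
    return mem.io_complexity()


def strided_cost(n, M, B, stride):
    """Transfers needed to read the same addresses in stride-major order."""
    mem = IOModel(M, B)
    for start in range(stride):
        for i in range(start, n, stride):
            mem.read(i)
    return mem.io_complexity()
```

With `M = 64` and `B = 8`, scanning 1024 words costs 1024/8 = 128 transfers, while visiting the same words with stride 8 costs 1024 transfers, one per access.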
Cache-oblivious algorithms: algorithms not parameterized by $B$ or $M$, the parameters of the memory hierarchy
Analyze in the ideal cache model: same as the I/O model except optimal replacement is assumed
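As a small illustration of an algorithm that never mentions the block size or the fast-memory size, here is a sketch of a recursive divide-and-conquer matrix transpose. It is a standard textbook example and is not taken from these slides; the function name `transpose` and the `cutoff` constant are assumptions. Python nested lists do not model a real memory layout, so the sketch only shows the access pattern.

```python
def transpose(src, dst, r0, r1, c0, c1, cutoff=4):
    """Write src[r][c] into dst[c][r] for r in [r0, r1) and c in [c0, c1).

    The recursion halves the longer side until the tile is small. The cutoff
    is a fixed small constant (>= 1), not a function of any cache parameter.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= cutoff and cols <= cutoff:
        for r in range(r0, r1):
            for c in range(c0, c1):
                dst[c][r] = src[r][c]
    elif rows >= cols:
        mid = (r0 + r1) // 2          # split the row range
        transpose(src, dst, r0, mid, c0, c1, cutoff)
        transpose(src, dst, mid, r1, c0, c1, cutoff)
    else:
        mid = (c0 + c1) // 2          # split the column range
        transpose(src, dst, r0, r1, c0, mid, cutoff)
        transpose(src, dst, r0, r1, mid, c1, cutoff)


# Usage: transpose a 7 x 5 matrix into a 5 x 7 matrix.
src = [[r * 5 + c for c in range(5)] for r in range(7)]
dst = [[None] * 7 for _ in range(5)]
transpose(src, dst, 0, 7, 0, 5)
```

Because the tiles shrink geometrically, at some recursion level they fit in fast memory whatever its size is, which is the usual argument for why this access pattern behaves well in the ideal cache model.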
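Yan has an array $a$ of 0s and 1s, and wants to know how many 0s are in the array
Yan has a function $f(\cdot)$ such that $f(a_i) = 0$ or $1$, and wants to know how many $f(a_i) = 0$
Yan has multiple functions $f_j(\cdot)$ such that $f_j(a_i) = 0$ or $1$, and wants to know how many $f_j(a_i) = 0$
For example, with $f_j(a_i) =$ "$a_i < j$", check if $\#(f_j(a_i) = 0)$ is $n/2$ (or approximately so: $n/4 \le \#(f_j(a_i) = 0) \le 3n/4$)
With sampling: take $s$ uniformly random samples, compute the $f(a_i) = 0$ case (denoted as $s_0$), and estimate the answer by $n \cdot s_0 / s$
The estimate should fail with probability at most $n^{-c}$ for a constant $c > 0$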
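The estimator described above (sample, count the zero cases, scale by $n/s$) can be sketched in a few lines. The function name `estimate_zero_count` and the choice to sample with replacement are assumptions made for this illustration.

```python
import random


def estimate_zero_count(a, f, s, rng=random):
    """Estimate how many elements x of a satisfy f(x) == 0, using s samples."""
    n = len(a)
    # Draw s positions uniformly at random (with replacement) and count hits.
    s0 = sum(1 for _ in range(s) if f(a[rng.randrange(n)]) == 0)
    # Scale the hit fraction s0 / s back up to the full array.
    return n * s0 / s


# Usage: an array in which a quarter of the entries are 0.
a = [0] * 250_000 + [1] * 750_000
estimate = estimate_zero_count(a, lambda x: x, 10_000, random.Random(0))
```

Only `s` evaluations of `f` are needed instead of `n`, at the price of an answer that is approximate and correct only with high probability.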
Assume we take $s$ samples and get $s_0$ hits. The expected #hits is $E[s_0] = sA/n$, where $A$ is the true number of 0s.
By the Chernoff bound with $\mu = E[s_0] = sA/n$, the probability that $s_0$ exceeds its expectation by a factor of $1 + \delta$ is at most $e^{-\delta^2 sA/(3n)}$
This probability is $n^{-c}$ when $s_0 = \Omega(\log n)$, because, for example with $\delta = 1/100$,
$e^{-\delta^2 sA/(3n)} \approx e^{-s_0/(3 \cdot 100^2)} < e^{-c' \log_2 n} = n^{-c}$
Chernoff bound: for $n$ independent random variables in $\{0, 1\}$, let $X$ be the sum, and $\mu = E[X]$, then for any $0 < \delta < 1$, $\Pr[X \ge (1 + \delta)\mu] \le e^{-\delta^2 \mu / 3}$
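The Chernoff bound above can be turned into a rough budget: how many expected hits make the tail probability drop below $n^{-c}$? Solving $e^{-\delta^2 \mu / 3} \le n^{-c}$ gives $\mu \ge 3c \ln n / \delta^2$, which is $\Theta(\log n)$ for constant $\delta$ and $c$. The helper names below are hypothetical, and the sketch only evaluates the formula.

```python
import math


def chernoff_upper_tail(delta, mu):
    """Chernoff bound e^(-delta^2 * mu / 3) on Pr[X >= (1 + delta) * mu]."""
    return math.exp(-delta * delta * mu / 3.0)


def hits_for_failure(n, c, delta):
    """Ceiling of 3 * c * ln(n) / delta^2, enough expected hits for n^(-c)."""
    return math.ceil(3.0 * c * math.log(n) / (delta * delta))


# Usage: n = 2^20 elements, failure probability 1/n, relative error 1/2.
mu = hits_for_failure(2 ** 20, 1, 0.5)
bound = chernoff_upper_tail(0.5, mu)   # no larger than 2 ** -20
```

A smaller $\delta$ raises the requirement by a factor of $1/\delta^2$, which is where the $100^2$ in the exponent above comes from.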
Classic sorting algorithms are easy to parallelize
Quicksort: partition into the elements that are smaller and the elements that are larger than a pivot, and recurse
movement
I/O efficient
Analogous to multiway quicksort:
1. Split the input array into $\sqrt{n}$ contiguous subarrays of size $\sqrt{n}$. Sort the subarrays recursively (sequentially)
2. Choose pivots $p_1 \le p_2 \le \dots \le p_{\sqrt{n}-1}$
3. Distribute the elements into $\sqrt{n}$ buckets delimited by the pivots; each bucket has size $\approx \sqrt{n}$
4. Recursively sort the buckets (sequentially)
[Figure: a row of sorted subarrays, each of size $\sqrt{n}$; below it, Bucket 1, Bucket 2, ..., Bucket $\sqrt{n}$, separated by the pivots $\le p_1 \le \dots \le p_{\sqrt{n}-1} \le$, each of size $\approx \sqrt{n}$.]
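The overall structure of the algorithm (sort subarrays, pick pivots, distribute into buckets, sort buckets) can be sketched sequentially as follows. This is not the lecture's implementation: the pivot rule (sort an oversampled random sample and keep every `oversample`-th element), the `base` cutoff, and the fallback for a bucket that does not shrink are all assumptions made for this illustration.

```python
import bisect
import math
import random


def samplesort(a, rng=random, base=16, oversample=4):
    """Sequential sketch of samplesort; returns a new sorted list."""
    n = len(a)
    if n <= base:
        return sorted(a)
    k = math.isqrt(n)                 # about sqrt(n) subarrays and buckets
    size = -(-n // k)                 # ceil(n / k) elements per subarray

    # Step 1: split into contiguous subarrays and sort each recursively.
    subarrays = [samplesort(a[i:i + size], rng, base, oversample)
                 for i in range(0, n, size)]

    # Step 2 (assumed rule): k - 1 pivots from a sorted random sample.
    sample = sorted(rng.choice(a) for _ in range(oversample * k))
    pivots = sample[oversample::oversample]

    # Step 3: cut every sorted subarray at the pivots by binary search;
    # piece j of each subarray belongs to bucket j.
    buckets = [[] for _ in range(k)]
    for sub in subarrays:
        cuts = [0] + [bisect.bisect_right(sub, p) for p in pivots] + [len(sub)]
        for j in range(k):
            buckets[j].extend(sub[cuts[j]:cuts[j + 1]])

    # Step 4: sort each bucket recursively and concatenate. A bucket that
    # holds every element (e.g. all keys equal) is sorted directly instead.
    out = []
    for b in buckets:
        out.extend(sorted(b) if len(b) == n
                   else samplesort(b, rng, base, oversample))
    return out
```

In a parallel version the loops over subarrays and buckets would run concurrently; this sketch keeps them sequential so that only the data flow is visible.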
For simplicity, assume $n = 16$ (4 subarrays and 4 buckets)
[Figure: the 16-element input array]
First, get the count for each subarray in each bucket
[Figure: array of 16 counts]
Then, transpose the array and scan to compute the offsets
[Figure: the transposed count array and the 16 offsets produced by the scan]
Lastly, move each element to the corresponding bucket
[Figure: the output array, filled in step by step until all 16 positions are occupied]
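The three steps above (count, transpose and scan, move) can be sketched on a small input. The 16 input values from the slide are not recoverable, so the example data below is made up, and the function name `distribute` and its return shape are assumptions.

```python
import bisect
from itertools import accumulate


def distribute(subarrays, pivots):
    """Move elements of sorted subarrays into buckets laid out contiguously."""
    k = len(pivots) + 1               # number of buckets
    m = len(subarrays)                # number of subarrays
    # Cut positions of every sorted subarray at the pivots.
    cuts = [[0] + [bisect.bisect_right(sub, p) for p in pivots] + [len(sub)]
            for sub in subarrays]
    # counts[i][j] = number of elements of subarray i that go to bucket j.
    counts = [[c[j + 1] - c[j] for j in range(k)] for c in cuts]
    # Transpose to bucket-major order and flatten.
    flat = [counts[i][j] for j in range(k) for i in range(m)]
    # Exclusive scan: offsets[j * m + i] is where piece (i, j) starts.
    offsets = [0] + list(accumulate(flat))[:-1]
    out = [None] * sum(flat)
    for j in range(k):
        for i in range(m):
            start = offsets[j * m + i]
            out[start:start + counts[i][j]] = subarrays[i][cuts[i][j]:cuts[i][j + 1]]
    bucket_starts = [offsets[j * m] for j in range(k)]
    return out, bucket_starts


# Usage with made-up data: 4 sorted subarrays of 4 elements, 3 pivots.
subarrays = [[3, 8, 9, 15], [1, 4, 12, 16], [2, 5, 6, 14], [7, 10, 11, 13]]
out, bucket_starts = distribute(subarrays, [4, 8, 12])
# out           == [3, 1, 4, 2, 8, 5, 6, 7, 9, 12, 10, 11, 15, 16, 14, 13]
# bucket_starts == [0, 4, 8, 12]
```

Because the scan fixes every piece's destination in advance, all pieces can be written independently, which is what makes the move step easy to parallelize.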
How to decide the count of each bucket in each subarray
How to transpose the array for counts and write the elements to buckets I/O efficiently
Find the best #pivots and #subarrays
Only need two rounds of global data accesses
For the midterm project, you can choose to implement this algorithm and engineer the performance
Discussion: what is the work for samplesort? And what about the depth?
https://www.cs.ucr.edu/~ygu/teaching/algeng/algeng.html
https://ilearn.ucr.edu/webapps/blackboard/execute/announcement?method=search&context=course&course_id=_307782_1