Lecture "Datenstrukturen und Algorithmen" (Data Structures and Algorithms), Last Lecture 2018
Felix Friedrich, 30.5.2018. Topics: Map/Reduce, Sorting Networks, Exam
MAP AND REDUCE AND MAP/REDUCE

Summing a Vector: Accumulator vs. Divide and Conquer
[Figure: summing a vector sequentially with an accumulator vs. with a divide-and-conquer tree of additions.]
Q: Why is the result the same? A: associativity: (a+b) + c = a + (b+c)
[Figure: a divide-and-conquer summation tree with the operands reordered. Is this correct?]
Only if the operation is commutative: a + b = b + a
Reduction:
- simple examples: sum, max
- reductions over programmer-defined operations
- operation properties (associativity / commutativity) define the correct executions
- supported in most parallel languages / frameworks
- powerful construct
Reduction in C++ with std::accumulate (since C++17, std::reduce additionally accepts a parallel execution policy):

    std::vector<double> v;
    ...
    double result = std::accumulate(
        v.begin(), v.end(), 0.0,
        [](double a, double b){ return a + b; });
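For comparison, a minimal sketch (not from the slides) using C++17's std::reduce, which relaxes the strict left-to-right evaluation order of std::accumulate:

```cpp
#include <numeric>
#include <vector>

double sum(const std::vector<double>& v) {
    // std::reduce may reorder and regroup the additions, which is exactly why
    // the operation must be associative (and commutative for arbitrary
    // regrouping); passing std::execution::par (header <execution>) as an
    // additional first argument requests a parallel reduction
    return std::reduce(v.begin(), v.end(), 0.0);
}
```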
Map: apply an operation to each element, e.g. multiply two vectors elementwise. [Figure: elementwise multiplication]

Map/Reduce: map (multiply elementwise), then reduce (accumulate). [Figure: elementwise multiplication followed by accumulation]
    // example data
    std::vector<double> v1(1024, 0.5);
    auto v2 = v1;
    std::vector<double> result(1024);
    // map
    std::transform(v1.begin(), v1.end(), v2.begin(), result.begin(),
                   [](double a, double b){ return a * b; });
    // reduce
    double value = std::accumulate(result.begin(), result.end(), 0.0); // = 256
Map/reduce combines two parallelisation patterns:

    result = f(in1) ⊕ f(in2) ⊕ f(in3) ⊕ f(in4), where f = map and ⊕ = reduce (associative).

Examples: numerical reduction, word count in a document, (word, document) lists, maximal temperature per month over 50 years, etc.
Maximal Temperature per Month for 50 years
Assume we (you and I) had to do this together. How would we distribute the work? What is the generic model? How would we ideally be prepared for different reductions (min, max, avg)?
[Figure: input data, day/temperature pairs such as 01/-5, 01/-8, ..., 03/12, 03/14, ..., 05/20, 05/19, ..., 07/28, 07/38, ...]
Each map process gets day/temperature pairs and maps them to month/temperature pairs.
[Figure: the map output collected in per-month buckets: Jan: 01/-5, 01/-8, 01/-8, ...; Feb: 02/13, 02/14, 02/12, ...; Mar: 03/13, 03/14, 03/12, ...; April: 04/23, 04/24, 04/22, ...; Dec: 12/0, 12/-2, 12/2, ...]
data gets sorted / shuffled by month
[Figure: the per-month buckets and the reduction results: Jan 18, Feb 20, Mar 22, April 28, ..., Dec 20.]
Each reduce process gets its own month with its values and applies the reduction (here: the maximum) to it.
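The three phases can be put into a minimal sequential sketch (the function name maxPerMonth and the (month, temperature) encoding are assumptions for illustration, not from the slides):

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// map: emit (month, temperature); shuffle: group by month; reduce: maximum
std::map<int,int> maxPerMonth(const std::vector<std::pair<int,int>>& records) {
    std::map<int, std::vector<int>> byMonth;      // shuffle: one bucket per key
    for (const auto& [month, temp] : records)     // map phase: emit key/value
        byMonth[month].push_back(temp);
    std::map<int,int> result;
    for (const auto& [month, temps] : byMonth)    // reduce phase: per-key max
        result[month] = *std::max_element(temps.begin(), temps.end());
    return result;
}
```

Swapping std::max_element for a sum or a mean changes the reduction without touching the map or shuffle phases.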
Frameworks and tools have been written to perform map/reduce. Map/reduce is a central pattern in Big Data and Cloud Computing (and actually not that new); it is also available with the Streams concept in Java (>= 8) and in many frameworks.
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113. DOI: 10.1145/1327452.1327492, http://doi.acm.org/10.1145/1327452.1327492

You may have heard of Google's "MapReduce" or Apache's Hadoop.
Idea: perform maps/reduces on data (key/value pairs) using many machines and combine the partial results.
Separates how to do the recursive divide-and-conquer from what computation to perform; well suited to distributed computing.
Count word occurrences in a very large file (GBytes), e.g.:

    how are you today do you like the weather outside I do I do I wish you the very best for your exams.
The huge file is split into parts that are distributed to the mappers as key/value pairs (e.g. position / string):

    <0, "how are you today">  <15, "do you like ...">  <35, "I do">  <39, "I do">  <43, "I wish you the very best">  <70, "for your exams">

Each mapper emits key/value pairs (word, count):

    <"how",1> <"are",1> <"you",1> ...  <"I",1> <"do",1> <"I",1> <"do",1>  <"I",1> <"wish",1> <"you",1> ...
The pairs are shuffled into unique key/value pairs (word, counts), e.g. <"do",1,1,1>, <"for",1>, ..., which are distributed to the reducers.

Each reducer sums the counts for its keys and writes key/value pairs (word, count) to the target file(s):

    are 1, best 1, you 3, ..., do 3, for 1, I 3, ...
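The word-count pipeline can be sketched on a single machine as follows (a minimal sketch; real frameworks run the map, shuffle, and reduce phases on different machines, and wordCount is an assumed name):

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// each part stands for the input of one mapper; the mappers emit (word, 1),
// the shuffle groups by word, and the reducers sum the counts per word
std::map<std::string,int> wordCount(const std::vector<std::string>& parts) {
    std::map<std::string,int> counts;
    for (const auto& part : parts) {
        std::istringstream in(part);
        std::string word;
        while (in >> word) ++counts[word];  // emit (word, 1), reduce by summing
    }
    return counts;
}
```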
SORTING NETWORKS
Recap sorting:
- simple algorithms, O(n²): insertion sort, selection sort, bubble sort, shell sort, ...
- fancier algorithms, O(n log n): heap sort, merge sort, quick sort (average), ...
- comparison lower bound: Ω(n log n)
- horrible algorithms, Ω(n²): bogo sort, stooge sort
A comparator takes inputs x, y and outputs min(x,y), max(x,y). Shorter notation:

    // sorts (a, b) in ascending order if dir == true, descending otherwise
    void compare(int& a, int& b, bool dir) {
        if (dir == (a > b)) { std::swap(a, b); }
    }
[Figure: comparator notation: two horizontal wires a and b joined by a vertical comparator. Example network sorting (5, 1, 4, 3) step by step into (1, 3, 4, 5).]
[Figure: the oblivious comparison tree for four inputs; many of its cases are redundant.]
[Figure: a sorting network: n input wires x1, x2, ..., xn enter on the left; the sorted outputs y1 ≤ y2 ≤ ... ≤ yn leave on the right.]
Sorting networks from insertion sort and bubble sort. With parallelism: insertion sort = bubble sort!

How many steps does a computer with an infinite number of processors (comparators) require to sort using parallel bubble sort? Answer: 2n - 3. Can this be improved? How many comparisons? Answer: (n-1)·n/2. How many comparators are required (at a time)? Answer: n/2. Reusable comparators: n-1.
Odd-Even Transposition Sort:

    9 8 2 7 3 1 5 6 4   (input)
    8 9 2 7 1 3 5 6 4   (step 1)
    8 2 9 1 7 3 5 4 6   (step 2)
    2 8 1 9 3 7 4 5 6   (step 3)
    2 1 8 3 9 4 7 5 6   (step 4)
    1 2 3 8 4 9 5 7 6   (step 5)
    1 2 3 4 8 5 9 6 7   (step 6)
    1 2 3 4 5 8 6 9 7   (step 7)
    1 2 3 4 5 6 8 7 9   (step 8)
    1 2 3 4 5 6 7 8 9   (step 9)
    template <typename T>
    void oddEvenTranspositionSort(std::vector<T>& v, bool dir) {
        const int n = static_cast<int>(v.size());
        for (int i = 0; i < n; ++i)                  // n phases, alternating even / odd
            for (int j = i % 2; j + 1 < n; j += 2)   // compare neighbouring pairs
                compare(v[j], v[j + 1], dir);
    }
Compared to parallel bubble sort: the same number of comparators (at a time), the same number of comparisons, but fewer parallel steps (depth): n.
In a massively parallel setup, bubble sort is thus not too bad. But we can do better...
[Figure: two sorting networks on four inputs, one of depth 4 and one of depth 3.] Prove that the two networks above sort four numbers. Easy?
Theorem (0-1 principle): If a network with n input lines sorts all 2^n sequences of 0s and 1s into non-decreasing order, it will sort any arbitrary sequence.
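The theorem makes verifying small networks cheap: instead of all permutations, it suffices to test the 2^n 0-1 inputs. A sketch for a depth-3 network on four inputs (the comparator layers below are the standard construction, assumed here):

```cpp
#include <array>
#include <utility>
#include <vector>

// a 4-input sorting network of depth 3 (5 comparators in 3 parallel layers)
const std::vector<std::pair<int,int>> network = {
    {0,1}, {2,3},   // layer 1
    {0,2}, {1,3},   // layer 2
    {1,2}           // layer 3
};

// 0-1 principle: the network sorts everything iff it sorts all 2^4 0-1 inputs
bool sortsAllZeroOneInputs() {
    for (int bits = 0; bits < (1 << 4); ++bits) {
        std::array<int,4> a;
        for (int i = 0; i < 4; ++i) a[i] = (bits >> i) & 1;
        for (auto [i, j] : network)             // run the oblivious network
            if (a[i] > a[j]) std::swap(a[i], a[j]);
        for (int i = 0; i + 1 < 4; ++i)
            if (a[i] > a[i + 1]) return false;  // output not non-decreasing
    }
    return true;
}
```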
Assume a monotonic function f with f(x) ≤ f(y) whenever x ≤ y, and a network N that sorts. Let N transform (x1, x2, ..., xn) into (y1, y2, ..., yn); then it also transforms (f(x1), f(x2), ..., f(xn)) into (f(y1), f(y2), ..., f(yn)).¹ Now assume y_i > y_(i+1) for some i, and consider the monotonic function

    f(x) = 0 if x < y_i,  1 if x ≥ y_i.

N converts (f(x1), f(x2), ..., f(xn)) into (f(y1), f(y2), ..., f(y_i), f(y_(i+1)), ..., f(yn)), which contains f(y_i) = 1 directly before f(y_(i+1)) = 0 and is therefore a 0-1 sequence that is not sorted.
x not sorted by N ⇒ there is an f(x) ∈ {0,1}^n not sorted by N. Equivalently: f(x) sorted by N for all f(x) ∈ {0,1}^n ⇒ x sorted by N for all x.
Argue: if x is sorted by a network N, then so is any monotonic function of x. Show: if x is not sorted by the network, then there is a monotonic function f that maps x to 0s and 1s such that f(x) is not sorted by the network. [Figure: the input (20, 8, 1, 30, 5, 9) through an example network, and its image under a 0-1 threshold function.]
¹ All comparators must act in the same way for the f(x_i) as they do for the x_i.
Bitonic (merge) sort is a parallel algorithm for sorting. If enough processors are available, bitonic sort beats the Ω(n log n) lower bound for comparison sorting (in terms of parallel steps). Asymptotic runtime of O(n log² n) for sequential execution; very good asymptotic runtime in the parallel case (as we'll see below). Worst case = average case = best case.
A sequence (x1, x2, ..., xn) is bitonic if it can be circularly shifted such that it is first monotonically increasing and then monotonically decreasing. Examples: (1, 2, 3, 4, 5, 3, 1, 0) and (4, 3, 2, 1, 2, 4, 6, 5).
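The definition can be checked mechanically: a sequence is bitonic iff, read circularly, the direction of change flips at most twice. A sketch (isBitonic is an assumed name):

```cpp
#include <cstddef>
#include <vector>

bool isBitonic(const std::vector<int>& v) {
    const int n = static_cast<int>(v.size());
    std::vector<int> signs;                       // circular difference signs
    for (int i = 0; i < n; ++i) {
        int d = v[(i + 1) % n] - v[i];
        if (d != 0) signs.push_back(d > 0 ? 1 : -1);
    }
    int flips = 0;                                // circular sign changes
    for (std::size_t i = 0; i < signs.size(); ++i)
        if (signs[i] != signs[(i + 1) % signs.size()]) ++flips;
    return flips <= 2;                            // up-then-down, shifted
}
```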
A half-cleaner connects wire i with wire i + n/2 for i = 0, ..., n/2 - 1. [Figure: a bitonic 0-1 input passes through a half-cleaner; afterwards one half is clean (all equal) and the other half is bitonic.]

    template <typename T>
    void halfClean(std::vector<T>& a, int lo, int n, bool dir){
        for (int i = lo; i < lo + n/2; i++)
            compare(a[i], a[i + n/2], dir);
    }
If the input of a half-cleaner is a bitonic sequence of 0s and 1s, then for the output it holds that both halves are bitonic, at least one half is clean (consisting of only 0s or only 1s), and every element of the top half is ≤ every element of the bottom half.
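For small n this can be verified exhaustively over all bitonic 0-1 inputs (a brute-force sketch checking the clean-half and ordering claims; the helper names are assumptions):

```cpp
#include <utility>
#include <vector>

void compare(int& a, int& b, bool dir) { if (dir == (a > b)) std::swap(a, b); }

void halfClean(std::vector<int>& a, int lo, int n, bool dir) {
    for (int i = lo; i < lo + n/2; i++) compare(a[i], a[i + n/2], dir);
}

bool allEqual(const std::vector<int>& a, int lo, int n) {
    for (int i = lo; i < lo + n; ++i) if (a[i] != a[lo]) return false;
    return true;
}

// check the lemma for every bitonic 0-1 sequence of length n (n even):
// after the half-cleaner, one half is clean and top half <= bottom half
bool lemmaHolds(int n) {
    for (int bits = 0; bits < (1 << n); ++bits) {
        std::vector<int> a(n);
        for (int i = 0; i < n; ++i) a[i] = (bits >> i) & 1;
        int borders = 0;                           // a 0-1 word is bitonic iff
        for (int i = 0; i < n; ++i)                // it has at most two
            borders += a[i] != a[(i + 1) % n];     // circular 01/10 borders
        if (borders > 2) continue;                 // skip non-bitonic inputs
        halfClean(a, 0, n, true);
        if (!allEqual(a, 0, n/2) && !allEqual(a, n/2, n/2)) return false;
        for (int i = 0; i < n/2; ++i)              // every top element must be
            for (int j = n/2; j < n; ++j)          // <= every bottom element
                if (a[i] > a[j]) return false;
    }
    return true;
}
```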
[Figure: case analysis of a half-cleaner applied to bitonic 0-1 inputs; depending on where the block of 1s lies, either the top half or the bottom half comes out clean, and the other half comes out bitonic.]
A bitonic sequence is sorted by a half-cleaner of width n followed by the recursive application to both halves: bitonicToSorted(n) = halfclean(n), then bitonicToSorted(n/2) on the top and on the bottom half. [Figure: the resulting network of half-cleaners of widths n, n/2, n/4, ... turns a bitonic input into a sorted output.]

    void bitonicToSorted(std::vector<int>& a, int lo, int n, bool dir){
        if (n > 1){
            int m = n/2;
            halfClean(a, lo, n, dir);
            bitonicToSorted(a, lo, m, dir);
            bitonicToSorted(a, lo + m, n - m, dir);
        }
    }
A bi-merger on two sorted sequences acts like a half-cleaner on a bitonic sequence (when one of the two sequences is reversed). [Figure: two sorted 0-1 input halves; the bi-merger produces the same output as reversing the second half and applying a half-cleaner.]
    template <typename T>
    void bimerge(std::vector<T>& a, int lo, int n, bool dir){
        for (int i = 0; i < n/2; i++)
            compare(a[lo + i], a[lo + n - i - 1], dir);
    }
A merger for two sorted input halves: bimerge(n), followed by bitonicToSorted(n/2) on each half (i.e. half-cleaners of widths n/2, n/4, ...). [Figure: Merger(n) = bimerge(n) + 2 × bitonicToSorted(n/2), producing a fully sorted output from two sorted halves.]
    void bitonicMerge(std::vector<int>& a, int lo, int n, bool dir){
        if (n > 1){
            int m = n/2;
            bimerge(a, lo, n, dir);
            bitonicToSorted(a, lo, m, dir);
            bitonicToSorted(a, lo + m, n - m, dir);
        }
    }
bitonicSort(n) = bitonicSort(n/2) on each half, then bitonicMerge(n). Since the bi-merger based merge expects two halves sorted in the same direction, both recursive calls use dir:

    template <typename T>
    void bitonicSort(std::vector<T>& a, int lo, int n, bool dir){
        if (n > 1){
            int m = n/2;
            bitonicSort(a, lo, m, dir);
            bitonicSort(a, lo + m, n - m, dir);
            bitonicMerge(a, lo, n, dir);
        }
    }
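Putting the pieces together, a self-contained sequential sketch of the whole network (n is assumed to be a power of two; both recursive calls sort in the same direction dir, since bimerge pairs element i with element n-1-i and thereby implicitly reverses the second half):

```cpp
#include <utility>
#include <vector>

const bool ASCENDING = true;
const bool DESCENDING = false;

void compare(int& a, int& b, bool dir) {
    if (dir == (a > b)) std::swap(a, b);
}

void halfClean(std::vector<int>& a, int lo, int n, bool dir) {
    for (int i = lo; i < lo + n/2; i++)
        compare(a[i], a[i + n/2], dir);
}

// sorts a bitonic subsequence a[lo..lo+n)
void bitonicToSorted(std::vector<int>& a, int lo, int n, bool dir) {
    if (n > 1) {
        int m = n/2;
        halfClean(a, lo, n, dir);
        bitonicToSorted(a, lo, m, dir);
        bitonicToSorted(a, lo + m, n - m, dir);
    }
}

// first stage of the merger: compare element i with element n-1-i
void bimerge(std::vector<int>& a, int lo, int n, bool dir) {
    for (int i = 0; i < n/2; i++)
        compare(a[lo + i], a[lo + n - i - 1], dir);
}

// merges two halves that are each sorted in direction dir
void bitonicMerge(std::vector<int>& a, int lo, int n, bool dir) {
    if (n > 1) {
        int m = n/2;
        bimerge(a, lo, n, dir);
        bitonicToSorted(a, lo, m, dir);
        bitonicToSorted(a, lo + m, n - m, dir);
    }
}

void bitonicSort(std::vector<int>& a, int lo, int n, bool dir) {
    if (n > 1) {
        int m = n/2;
        bitonicSort(a, lo, m, dir);
        bitonicSort(a, lo + m, n - m, dir);
        bitonicMerge(a, lo, n, dir);
    }
}
```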
[Figure: the complete bitonic sorting network for n = 8: bitonicSort(8) = 2 × bitonicSort(4) followed by bitonicMerge(8); each Merger consists of a bi-merger stage followed by half-cleaner stages of decreasing width.]
How many steps (parallel depth)? The network consists of log n merger stages, and a merger of width 2^i has depth log 2^i = i (#mergers × #steps per merger):

    sum_{i=1}^{log n} log 2^i  =  sum_{i=1}^{log n} i  =  log n · (log n + 1) / 2  =  Θ(log² n)
EXAM

Copying is allowed, but not clever. Printing handwritten notes from a tablet is also allowed, as long as the handwriting is not substantially reduced in size (relative to normal handwriting).
[Exercise sessions take place tomorrow, Friday June 1: discussion of exercise sheet 13.]

... explain? ... was covered (geometric algorithms, branch-and-bound) ... were not treated in Widmayer/Püschel (in particular C++ / Parallel Programming)
NOT exam material:
randomized quicksort, amortized analysis of move-to-front, proof of the universal hashing theorem, derivation of the Fibonacci numbers via generating functions, proof of the amortized costs of the Fibonacci heap.
I remain reachable at felix.friedrich@inf.ethz.ch