Vorlesung Datenstrukturen und Algorithmen, Letzte Vorlesung 2018
SLIDE 1

Vorlesung Datenstrukturen und Algorithmen, Letzte Vorlesung 2018

Felix Friedrich, 30.5.2018: Map/Reduce, Sorting Networks, Prüfung

SLIDE 2

MAP AND REDUCE AND MAP/REDUCE

SLIDE 3

Summing a Vector

[Figure: summing a vector with a single accumulator vs. divide and conquer]

Q: Why is the result the same? A: Associativity: (a + b) + c = a + (b + c).

SLIDE 4

Summing a Vector

[Figure: divide-and-conquer summation with reordered operands. Is this correct?]

Only if the operation is commutative: a + b = b + a
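To see the difference concretely: string concatenation is associative, so a left-to-right fold with std::accumulate is well defined, but it is not commutative, so a reduction that reorders its operands would be wrong here (a minimal sketch; the function name is ours):

```cpp
#include <numeric>
#include <string>
#include <vector>

// Left fold over concatenation: associativity makes any grouping of the
// operands equivalent, but reordering them would change the result, since
// concatenation is not commutative.
std::string concatLeft(const std::vector<std::string>& v) {
    return std::accumulate(v.begin(), v.end(), std::string{});
}
```

concatLeft({"a", "b", "c"}) yields "abc"; a divide-and-conquer reduction that keeps the operand order is still correct thanks to associativity, while any reordering would produce a different string.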

SLIDE 5

Reductions

Simple examples: sum, max. Reductions over programmer-defined operations:

  • the operation's properties (associativity / commutativity) define the correct executions
  • supported in most parallel languages / frameworks
  • a powerful construct

SLIDE 6

C++ Reduction

  • std::accumulate (requires associativity)
  • std::reduce (requires commutativity; since C++17; can specify an execution policy)

std::vector<double> v;
...
double result = std::accumulate(
    v.begin(), v.end(), 0.0,
    [](double a, double b){ return a + b; });
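A sketch of std::reduce, which, unlike std::accumulate, may regroup and reorder the operands and therefore wants an associative and commutative operation (an execution policy such as std::execution::par could additionally be passed as the first argument; omitted here to keep the sketch portable):

```cpp
#include <numeric>
#include <vector>

// Sum with std::reduce (C++17): the operation defaults to +, which is
// associative and commutative on these exactly representable values.
double sumReduce(const std::vector<double>& v) {
    return std::reduce(v.begin(), v.end(), 0.0);
}
```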

SLIDE 7

Elementwise Multiplication

[Figure: Map, applying multiplication (x) elementwise to two input vectors]

SLIDE 8

Scalar Product

[Figure: Map (elementwise multiply) followed by Reduce (accumulate the products)]

SLIDE 9

C++ Scalar Product (map + reduce)

// example data
std::vector<double> v1(1024, 0.5);
auto v2 = v1;
std::vector<double> result(1024);
// map
std::transform(v1.begin(), v1.end(), v2.begin(), result.begin(),
               [](double a, double b){ return a * b; });
// reduce
double value = std::accumulate(result.begin(), result.end(), 0.0); // = 256
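Since C++17 the map and reduce steps above can also be fused into a single pass with std::transform_reduce, avoiding the intermediate result vector (a sketch; the function name is ours):

```cpp
#include <numeric>
#include <vector>

// One-pass scalar product: the transform (elementwise *) is fused with the
// reduce (+), so no intermediate vector of products is materialized.
double scalarProduct(const std::vector<double>& v1,
                     const std::vector<double>& v2) {
    return std::transform_reduce(v1.begin(), v1.end(), v2.begin(), 0.0);
}
```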

SLIDE 10

Map & Reduce = MapReduce

Combination of two parallelisation patterns:

result = g(in1) ⊕ g(in2) ⊕ g(in3) ⊕ g(in4)

where g = map and ⊕ = reduce (associative).

Examples: numerical reduction, word count in a document, (word, document) lists, maximal temperature per month over 50 years, etc.

SLIDE 11

Motivating Example

Maximal Temperature per Month for 50 Years

  • Input: 50 * 365 day/temperature pairs
  • Output: 12 month/max-temperature pairs

Assume we had to do this together. How would we distribute the work? What is the generic model? How would we ideally be prepared for different reductions (min, max, avg)?

SLIDE 12

Maximal Temperature per Month: Map

[Figure: input day/temperature pairs (01/5, 01/8, ..., 03/12, 03/14, ..., 05/20, 05/19, ..., 07/28, 07/38, ...) partitioned across map processes]

Each map process gets day/temperature pairs and maps them to month/temperature pairs.

SLIDE 13

Maximal Temperature per Month: Shuffle

[Figure: the mapped pairs are grouped into per-month buckets (Jan: 01/-5, 01/-8, ...; Feb: 02/13, 02/14, ...; Mar: 03/13, 03/14, ...; ...; Dec: 12/0, 12/-2, ...)]

The data gets sorted / shuffled by month.

SLIDE 14

Maximal Temperature per Month: Reduce

[Figure: the per-month buckets (Jan: 01/-5, 01/-8, ...; Feb: 02/13, ...; ...; Dec: 12/0, ...) are reduced to one maximum per month: Jan 18, Feb 20, Mar 22, April 28, ..., Dec 20]

Each reduce process gets its own month with its values and applies the reduction (here: max) to it.
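The three phases can be sketched sequentially in a few lines (the names and the crude day-to-month mapping are illustrative assumptions, not from the slides): each pair is mapped to a month, grouped by month, and reduced with max:

```cpp
#include <map>
#include <utility>
#include <vector>

// Map day/temperature pairs to months, shuffle (group) by month, and reduce
// each group with max. The day->month mapping is a crude stand-in.
std::map<int, int> maxPerMonth(const std::vector<std::pair<int, int>>& dayTemp) {
    std::map<int, int> maxOfMonth;
    for (const auto& [dayOfYear, temp] : dayTemp) {
        int month = dayOfYear / 31 + 1;                             // map
        auto [it, inserted] = maxOfMonth.try_emplace(month, temp);  // shuffle
        if (!inserted && temp > it->second) it->second = temp;      // reduce
    }
    return maxOfMonth;
}
```

In the distributed setting, each phase would run on its own set of workers; the per-month grouping is exactly what the shuffle provides for free.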

SLIDE 15

Map/Reduce

A strategy for implementing parallel algorithms.

  • map: A master worker takes the problem input, divides it into smaller sub-problems, and distributes the sub-problems to workers (threads).
  • reduce: The master worker collects sub-solutions from the workers and combines them in some way to produce the overall answer.

SLIDE 16

Map/Reduce

Frameworks and tools have been written to perform map/reduce.

  • MapReduce framework by Google
  • Hadoop framework by Yahoo!
  • related to the ideas of Big Data and Cloud Computing
  • also related to functional programming (and actually not that new); available with the Streams concept in Java (>= 8)
  • Map and reduce are user-supplied plug-ins; the rest is provided by the frameworks.

Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113. DOI=10.1145/1327452.1327492 http://doi.acm.org/10.1145/1327452.1327492

SLIDE 17

MapReduce on Clusters

You may have heard of Google's "map/reduce" or of Hadoop. Idea: perform maps/reduces on data using many machines.

  • The system takes care of distributing the data and managing fault tolerance.
  • You just write code to map one element (a key-value pair) and to reduce elements (key-value pairs) to a combined result.

This separates how to do recursive divide-and-conquer from what computation to perform: an old idea from higher-order functional programming, transferred to large-scale distributed computing.

SLIDE 18

Example

Count word occurrences in a very large file (GBytes).

File = "how are you today do you like the weather outside I do I do I wish you the very best for your exams."

SLIDE 19

Mappers

[Figure: the huge file is split into parts 1-3; each part becomes key/value pairs (e.g. position / string): <0, "how are you today">, <15, "do you like ...">, <35, "I do">, <39, "I do">, <43, "I wish you the very best">, <70, "for your exams">, distributed to mappers 1-3]

SLIDE 20

Mappers

[Figure: the input key/value pairs (position / string), e.g. <0, "how are you today">, ..., <70, "for your exams">, go into mappers 1-3; the output is key/value pairs (word, count): <"how",1> <"are",1> <"you",1> ... <"I",1> <"do",1> <"I",1> <"do",1> <"I",1> <"wish",1> <"you",1> ...]
SLIDE 21

Shuffle / Sort

[Figure: the mapper outputs <"how",1> <"are",1> <"you",1> ... are shuffled into unique key/value pairs (word, counts) such as <"do",1,1,1> and <"for",1>, which are distributed to reducers 1-2]

SLIDE 22

Reduce

[Figure: the shuffled unique key/value pairs (word, counts), e.g. <"do",1,1,1>, <"for",1>, <"are",1>, <"best",1>, are reduced by reducers 1-2 into target file(s): are 1, best 1, you 3, ..., do 3, for 1, I 3, ...]
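Run sequentially, the whole word-count pipeline collapses into a few lines: the mapper's (word, 1) pairs and the per-key reduction become a map increment (a sketch, not the distributed implementation):

```cpp
#include <map>
#include <sstream>
#include <string>

// Map: emit (word, 1) for every word. Shuffle/reduce: sum the counts per key.
// In the distributed version these phases run on different machines.
std::map<std::string, int> wordCount(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++counts[word];
    return counts;
}
```

On the example file above this yields, among others, do 3, I 3, you 3, matching the target file on the slide.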
SLIDE 23

SORTING NETWORKS

SLIDE 24

Lower bound on sorting

  • Simple algorithms, O(n^2): insertion sort, selection sort, bubble sort, shell sort, ...
  • Fancier algorithms, O(n log n): heap sort, merge sort, quick sort (avg), ...
  • Comparison lower bound: Ω(n log n)
  • Horrible algorithms, Ω(n^2): bogo sort, stooge sort

SLIDE 25

Comparator

[Figure: a comparator takes inputs x and y and outputs min(x, y) and max(x, y); shorter notation: a vertical connection between two horizontal wires]

SLIDE 26

A comparator in code; dir selects the direction (for dir == true the pair is ordered ascending, i.e. swap if a > b):

void compare(int& a, int& b, bool dir) {
    if (dir == (a > b)) {
        std::swap(a, b);
    }
}

[Figure: the two comparator orientations, ascending (<) and descending (>)]

SLIDE 27

Sorting Networks

[Figure: a network of comparators transforming an input sequence, e.g. 5, 1, 4, 3, into sorted order 1, 3, 4, 5]

SLIDE 28

Sorting Networks are Oblivious (and Redundant)

[Figure: the comparison tree of a sorting network on four inputs; because the comparison sequence is fixed in advance (oblivious), some branches cover redundant cases]

SLIDE 29

Recursive Construction: Insertion

[Figure: a sorting network for n inputs followed by a chain of comparators that inserts input n+1 into the sorted outputs y1 ... yn]

SLIDE 30

Recursive Construction: Selection

[Figure: a chain of comparators first selects the maximum of the n+1 inputs; the remaining n values are then sorted by a sorting network]

SLIDE 31

Applied recursively...

Insertion yields insertion sort, selection yields bubble sort. With parallelism: insertion sort = bubble sort!

SLIDE 32

Question

How many steps does a computer with an infinite number of processors (comparators) require to sort using parallel bubble sort?
Answer: 2n - 3. Can this be improved?
How many comparisons? Answer: n(n-1)/2.
How many comparators are required at a time? Answer: n/2; with reusable comparators: n - 1.

SLIDE 33

Improving parallel Bubble Sort

Odd-Even Transposition Sort, starting from 9 8 2 7 3 1 5 6 4:

step 1: 8 9 2 7 1 3 5 6 4
step 2: 8 2 9 1 7 3 5 4 6
step 3: 2 8 1 9 3 7 4 5 6
step 4: 2 1 8 3 9 4 7 5 6
step 5: 1 2 3 8 4 9 5 7 6
step 6: 1 2 3 4 8 5 9 6 7
step 7: 1 2 3 4 5 8 6 9 7
step 8: 1 2 3 4 5 6 8 7 9
step 9: 1 2 3 4 5 6 7 8 9

SLIDE 34

void oddEvenTranspositionSort(std::vector<int>& v, bool dir) {
    int n = v.size();
    for (int i = 0; i < n; ++i) {        // n phases, alternating odd/even
        for (int j = i % 2; j + 1 < n; j += 2)
            compare(v[j], v[j+1], dir);  // compare adjacent pairs
    }
}

SLIDE 35

Improvement?

Same number of comparators (at a time), same number of comparisons, but fewer parallel steps (depth): n.

In a massively parallel setup, bubble sort is thus not too bad. But we can do better...

SLIDE 36

Parallel sorting

[Figure: two four-input sorting networks, one of depth 4 and one of depth 3]

Prove that the two networks above sort four numbers. Easy?

SLIDE 37

Zero-one principle

Theorem: If a network with n input lines sorts all 2^n sequences of 0s and 1s into non-decreasing order, then it sorts any arbitrary sequence of n numbers into non-decreasing order.
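The theorem gives a cheap correctness test: instead of all n! input permutations, it suffices to run the network on the 2^n inputs of 0s and 1s. A sketch for four lines (the comparator list used in the usage note is a standard depth-3 network, an assumption rather than something taken from the slides):

```cpp
#include <algorithm>
#include <array>
#include <utility>
#include <vector>

// Run a comparator network on all 2^4 zero-one inputs; by the zero-one
// principle the network sorts arbitrary numbers iff all of these end sorted.
bool sortsAllZeroOne(const std::vector<std::pair<int, int>>& network) {
    for (int bits = 0; bits < 16; ++bits) {
        std::array<int, 4> a;
        for (int i = 0; i < 4; ++i) a[i] = (bits >> i) & 1;
        for (auto [i, j] : network)
            if (a[i] > a[j]) std::swap(a[i], a[j]);   // ascending comparator
        if (!std::is_sorted(a.begin(), a.end())) return false;
    }
    return true;
}
```

With the classic network {{0,1}, {2,3}, {0,2}, {1,3}, {1,2}} this returns true; dropping the final comparator makes it fail, e.g. on the input 0 1 1 0.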

SLIDE 38

Proof

Let f be a monotonic function, i.e. f(x) ≤ f(y) whenever x ≤ y, and let N be a comparison network. If N transforms (x1, x2, ..., xn) into (y1, y2, ..., yn), then it also transforms (f(x1), f(x2), ..., f(xn)) into (f(y1), f(y2), ..., f(yn)).

Assume yj > yj+1 for some j, and consider the monotonic function

f(x) = 0 if x < yj, and f(x) = 1 if x ≥ yj.

Then N converts (f(x1), ..., f(xn)) into (f(y1), ..., f(yj), f(yj+1), ..., f(yn)), which contains the unsorted pair f(yj) = 1, f(yj+1) = 0.

Argue: if x is sorted by N, then so is any monotonic image of x, e.g. the input 20 8 1 30 5 9 and its 0/1 images move along the same wires. Show: if x is not sorted by N, then there is a monotonic f mapping x to 0s and 1s such that f(x) is not sorted by N.

In summary: x not sorted by N ⇒ there is an f(x) ∈ {0,1}^n not sorted by N. Contrapositively: if all sequences in {0,1}^n are sorted by N, then every x is sorted by N.

SLIDE 39

Proof (cont.)

The key step: every comparator acts in the same way on the f(xj) as it does on the xj, because for a monotonic f, min(f(x), f(y)) = f(min(x, y)) and max(f(x), f(y)) = f(max(x, y)).

SLIDE 40

Bitonic Sort

Bitonic (merge) sort is a parallel sorting algorithm. With enough processors, bitonic sort gets below the Ω(n log n) comparison lower bound in terms of parallel steps (the bound counts comparisons, not steps). Asymptotic runtime O(n log^2 n) for sequential execution; very good asymptotic runtime in the parallel case (as we'll see below). Worst = average = best case.

SLIDE 41

Bitonic

A sequence (x1, x2, ..., xn) is bitonic if it can be circularly shifted such that it is first monotonically increasing and then monotonically decreasing. Examples: (1, 2, 3, 4, 5, 3, 1, 0) and (4, 3, 2, 1, 2, 4, 6, 5).
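The definition translates directly into a naive O(n^2) check: try every circular shift and test for a rise followed by a fall (a sketch; the function name is ours):

```cpp
#include <vector>

// A sequence is bitonic iff some circular shift of it first increases
// monotonically and then decreases. Naive check over all n shifts.
bool isBitonic(const std::vector<int>& a) {
    int n = static_cast<int>(a.size());
    for (int s = 0; s < n; ++s) {
        int i = 1;
        while (i < n && a[(s + i - 1) % n] <= a[(s + i) % n]) ++i;  // rise
        while (i < n && a[(s + i - 1) % n] >= a[(s + i) % n]) ++i;  // fall
        if (i == n) return true;  // the whole shifted sequence was covered
    }
    return false;
}
```

Both slide examples pass: the first needs no shift, the second becomes increasing-then-decreasing after shifting to (1, 2, 4, 6, 5, 4, 3, 2).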

SLIDE 42

Bitonic 0-1 Sequences

A bitonic 0-1 sequence has the form 0^j 1^k 0^l or 1^j 0^k 1^l.

SLIDE 43

The Half-Cleaner

[Figure: a half-cleaner applied to a bitonic 0-1 input; both output halves are bitonic, and one half is clean]

void halfclean(std::vector<int>& a, int lo, int n, bool dir) {
    for (int i = lo; i < lo + n/2; i++)
        compare(a[i], a[i + n/2], dir);
}

SLIDE 44

The Half-Cleaner

[Figure: the second case, where the clean half is the other one; the code is the same as on the previous slide]

SLIDE 45

Bitonic Split Example

[Figure: applying the half-cleaner to a bitonic sequence yields two bitonic subsequences]

SLIDE 46

Lemma

Input: a bitonic sequence of 0s and 1s. For the output of the half-cleaner it holds that

  • the upper and the lower half are bitonic,
  • one of the two halves is bitonic clean (only 0s or only 1s),
  • every number in the upper half is ≤ every number in the lower half.

SLIDE 47

Proof: All cases

[Figure: case analysis of the half-cleaner on a bitonic 0-1 input, comparing top and bottom halves]

SLIDE 48

[Figure: half-cleaner case analysis, continued]

SLIDE 49

[Figure: half-cleaner case analysis, continued]

SLIDE 50

[Figure: half-cleaner case analysis, continued]

SLIDE 51

The four remaining cases (010 / 101 patterns)

[Figure: the four remaining case analyses of the half-cleaner, each showing top and bottom halves with one half bitonic clean]

SLIDE 52

Construction BitonicToSorted

[Figure: a bitonic input is sorted by a half-cleaner of width n, followed by half-cleaners of width n/2, n/4, ... on the sub-sequences]

SLIDE 53

Recursive Construction

[Figure: bitonicToSorted(n) = halfclean(n), then bitonicToSorted(n/2) on each half]

SLIDE 54

BitonicToSorted sorts a Bitonic Sequence

void bitonicToSorted(std::vector<int>& a, int lo, int n, bool dir) {
    if (n > 1) {
        int m = n / 2;
        halfclean(a, lo, n, dir);
        bitonicToSorted(a, lo, m, dir);
        bitonicToSorted(a, lo + m, n - m, dir);
    }
}

[Figure: halfclean(n) followed by bitonicToSorted(n/2) on each half turns a bitonic input into a sorted output]

SLIDE 55

Bi-Merger

[Figure: two sorted sequences; reversing one makes the concatenation bitonic, so a half-cleaner applies]

A bi-merger on two sorted sequences acts like a half-cleaner on a bitonic sequence (when one of the sequences is reversed).

SLIDE 56

Bi-Merger

void bimerge(std::vector<int>& a, int lo, int n, bool dir) {
    for (int i = 0; i < n/2; i++)
        compare(a[lo + i], a[lo + n - i - 1], dir);
}

SLIDE 57

Merger

[Figure: a merger for two sorted inputs: bimerge(n), then halfclean(n/2) on each half, then smaller half-cleaners, producing a sorted output]

SLIDE 58

Merger

[Figure: equivalently, bimerge(n) leaves two bitonic halves, which bitonicToSorted(n/2) turns into a sorted output]

SLIDE 59

BitonicMerge sorts a Half-Sorted Sequence

void bitonicMerge(std::vector<int>& a, int lo, int n, bool dir) {
    if (n > 1) {
        int m = n / 2;
        bimerge(a, lo, n, dir);
        bitonicToSorted(a, lo, m, dir);
        bitonicToSorted(a, lo + m, n - m, dir);
    }
}

[Figure: bimerge followed by bitonicToSorted on each half merges two sorted halves into a sorted sequence]

SLIDE 60

Recursive Construction of a Sorter

[Figure: bitonicSort(n) = bitonicSort(n/2) on each half (in opposite directions), followed by bitonicMerge(n)]

SLIDE 61

void bitonicSort(std::vector<int>& a, int lo, int n, bool dir) {
    if (n > 1) {
        int m = n / 2;
        bitonicSort(a, lo, m, ASCENDING);       // ASCENDING == true
        bitonicSort(a, lo + m, n - m, DESCENDING);  // DESCENDING == false
        bitonicMerge(a, lo, n, dir);
    }
}
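Putting the pieces together: a compact, self-contained version of the recursion for power-of-two sizes, with the merge phase written iteratively (a sketch assembled from the slide fragments; dir = true means ascending):

```cpp
#include <algorithm>
#include <vector>

// Bitonic sort for power-of-two n: sort the halves in opposite directions,
// then turn the resulting bitonic sequence into a sorted one by half-cleaning
// at widths n, n/2, ..., 2 (the iterative form of bitonicToSorted).
void bitonicSortRec(std::vector<int>& a, int lo, int n, bool dir) {
    if (n <= 1) return;
    int m = n / 2;
    bitonicSortRec(a, lo, m, true);       // first half ascending
    bitonicSortRec(a, lo + m, m, false);  // second half descending
    for (int len = n; len > 1; len /= 2)
        for (int base = lo; base < lo + n; base += len)
            for (int i = base; i < base + len / 2; ++i)
                if ((a[i] > a[i + len / 2]) == dir)
                    std::swap(a[i], a[i + len / 2]);  // comparator
}
```

bitonicSortRec(v, 0, v.size(), true) sorts v ascending whenever v.size() is a power of two; other sizes would need the n - m splits from the slide version.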

SLIDE 62

Example

[Figure: bitonicMerge(8) unrolled into bimerge and halfclean stages, containing bitonicMerge(4) and bitonicMerge(2)]

SLIDE 63

Example

[Figure: Merger(8) unrolled into a bi-merger and half-cleaners, containing Merger(4) and Merger(2)]

SLIDE 64

Bitonic Merge Sort

How many steps? There are log n merger stages, and the merger of stage i has log 2^i = i steps:

∑_{i=1}^{log n} log 2^i = ∑_{i=1}^{log n} i = log n · (log n + 1) / 2 = O(log^2 n)

(sum over the mergers of the number of steps per merger)

SLIDE 65

About the Exam

  • takes place on 6.8.2018 from 9:00 to 11:00 (2h)
  • content: Datenstrukturen und Algorithmen, C++ Advanced, Parallel Programming
  • allowed materials: 4 A4 pages, handwritten or at least 11pt font size. Copying is allowed but not clever. Printing handwriting from a tablet is also allowed, as long as the writing is not shrunk significantly (relative to ordinary handwriting).

SLIDE 66

Proposal: Q&A before the exam.

  • School holidays start 16 July, so the date should be before that.
  • For you: the later, the better. Proposal: around 12 July.

SLIDE 67

Preparation

  • Do (or have done) the exercises. [Tomorrow, Friday 1 June, exercise sessions take place: discussion of exercise 13.]
  • Can you explain the lecture content to a colleague (without slides)?
  • Old exams are available on the website.
  • The old exams by Prof. Widmayer contain material that was not covered (geometric algorithms, branch-and-bound).
  • The "new" exams by me contain material that was not covered in the lectures by Widmayer/Püschel (in particular C++ / Parallel Programming).

SLIDE 68

Exclusions

NOT exam material:

  • details of the longer proofs (running time of Blum's algorithm, analysis of randomized quicksort, amortized analysis of move-to-front, proof of the universal hashing theorem, proof of the Fibonacci numbers via generating functions, proof of the amortized costs of the Fibonacci heap)
  • atomic registers / RMW operations / lock-free programming
  • hardware architectures, pipelining, Peterson's algorithm

SLIDE 69

I remain reachable at felix.friedrich@inf.ethz.ch