Recap: Prefix Sums Given A : set of n integers Find B : prefix sums - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1 / 86

Recap: Prefix Sums

  • Given A: set of n integers
  • Find B: prefix sums

A: 3 1 1 7 2 5 9 2 4 3 3
B: 3 4 5 12 14 19 28 30 34 37 40

slide-2
SLIDE 2

2 / 86

Recap: Parallel Prefix Sums

  • Recursive algorithm

– Recursively computes sums
– Use partial sums to get prefix sums

  • T(n) = O(log n)
  • W(n) = O(n)
  • Hard to get intuition
  • Iterative algorithm easier to grasp?
slide-3
SLIDE 3

3 / 86

Iterative prefix sum

  • 2 phases: up-sweep, down-sweep
  • Up-sweep pseudocode:
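The pseudocode itself did not survive in this transcript. As a minimal sketch of what the up-sweep could look like, assuming the input length is a power of two and using the level names B[0], B[1], ... from the following slides (the function name `up_sweep` is illustrative, not from the slides):

```python
def up_sweep(a):
    """Return the levels B[0], B[1], ... of pairwise partial sums of a."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes len(a) is a power of two")
    levels = [list(a)]                      # B[0] is a copy of A
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each entry sums one adjacent pair of the level below. The pairs are
        # independent, so a PRAM could compute a whole level in O(1) time.
        levels.append([prev[2 * i] + prev[2 * i + 1]
                       for i in range(len(prev) // 2)])
    return levels


B = up_sweep([3, 1, 1, 7, 2, 5, 9, 2])
for h, level in enumerate(B):
    print(f"B[{h}] =", level)
```

There are log n levels above B[0], and their sizes halve each time, which is consistent with T(n) = O(log n) and W(n) = O(n).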
slide-4
SLIDE 4

4 / 86

Up-sweep phase

A:    3 1 1 7 2 5 9 2
B[0]: 3 1 1 7 2 5 9 2

slide-5
SLIDE 5

5 / 86

Up-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: (empty)

slide-6
SLIDE 6

6 / 86

Up-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: (empty)
B[2]: (empty)

slide-7
SLIDE 7

7 / 86

Up-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: (empty)
B[2]: (empty)
B[3]: (empty)

slide-8
SLIDE 8

8 / 86

Up-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: 4 8 7 11
B[2]: (empty)
B[3]: (empty)

slide-9
SLIDE 9

9 / 86

Up-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: 4 8 7 11
B[2]: 12 18
B[3]: (empty)

slide-10
SLIDE 10

10 / 86

Up-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: 4 8 7 11
B[2]: 12 18
B[3]: 30

slide-11
SLIDE 11

11 / 86

Down-sweep phase
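The down-sweep pseudocode is also missing from this transcript. The sketch below is one common formulation that reproduces the C[h] rows of the walk-through on the following slides; it assumes a power-of-two input length, and the helper names (`up_sweep`, `down_sweep`) are illustrative rather than taken from the slides.

```python
def up_sweep(a):
    """Levels of pairwise partial sums: B[0] = a, B[h][i] = B[h-1][2i] + B[h-1][2i+1]."""
    levels = [list(a)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * i] + prev[2 * i + 1]
                       for i in range(len(prev) // 2)])
    return levels


def down_sweep(b):
    """Turn up-sweep levels b into prefix-sum levels c; c[0] is the answer."""
    top = len(b) - 1
    c = [None] * len(b)
    c[top] = list(b[top])                   # C[top] = B[top], the total sum
    for h in range(top - 1, -1, -1):
        row = []
        for i in range(len(b[h])):          # entries are independent of each other
            if i == 0:
                row.append(b[h][0])                        # leftmost: nothing to its left
            elif i % 2 == 1:
                row.append(c[h + 1][i // 2])               # right child: parent's prefix sum
            else:
                row.append(c[h + 1][i // 2 - 1] + b[h][i]) # left child: prefix of left neighbour's parent plus own sum
        c[h] = row
    return c


C = down_sweep(up_sweep([3, 1, 1, 7, 2, 5, 9, 2]))
for h in range(len(C) - 1, -1, -1):
    print(f"C[{h}] =", C[h])
```

Each level again costs O(1) parallel time, so the two sweeps together stay within T(n) = O(log n) and W(n) = O(n).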

slide-12
SLIDE 12

12 / 86

Down-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: 4 8 7 11
B[2]: 12 18
C[3]: 30

slide-13
SLIDE 13

13 / 86

Down-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: 4 8 7 11
B[2]: 12 18
C[3]: 30

slide-14
SLIDE 14

14 / 86

Down-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: 4 8 7 11
B[2]: 12 18
C[3]: 30

slide-15
SLIDE 15

15 / 86

Down-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: 4 8 7 11
C[2]: 12 30
C[3]: 30

slide-16
SLIDE 16

16 / 86

Down-sweep phase

B[0]: 3 1 1 7 2 5 9 2
B[1]: 4 8 7 11
C[2]: 12 30
C[3]: 30

slide-17
SLIDE 17

17 / 86

Down-sweep phase

B[0]: 3 1 1 7 2 5 9 2
C[1]: 4 12 19 30
C[2]: 12 30
C[3]: 30

slide-18
SLIDE 18

18 / 86

Down-sweep phase

B[0]: 3 1 1 7 2 5 9 2
C[1]: 4 12 19 30
C[2]: 12 30
C[3]: 30

slide-19
SLIDE 19

19 / 86

Down-sweep phase

C[0]: 3 4 5 12 14 19 28 30
C[1]: 4 12 19 30
C[2]: 12 30
C[3]: 30

slide-20
SLIDE 20

20 / 86

Down-sweep phase

C[0]: 3 4 5 12 14 19 28 30
A:    3 1 1 7 2 5 9 2

slide-21
SLIDE 21

21 / 86

Down-sweep phase

C[0]: 3 4 5 12 14 19 28 30
A:    3 4 5 12 14 19 28 30 (A now holds the prefix sums)

slide-22
SLIDE 22

22 / 86

Applications of prefix sums

  • More useful than it seems:

– Create an array of 1s and 0s
– Prefix sums give the number of 1s up to each point
– Used to separate an array into two parts

  • Using almost any criterion!
  • Examples:

– separate an array into upper-case and lower-case letters
– separate an array into numbers > x and ≤ x

slide-23
SLIDE 23

23 / 86

Example: string separation

  • Separate array A into lower-case and upper-case:

A: a P r e R E c F I o X o S U l M S

slide-24
SLIDE 24

24 / 86

Example: string separation

  • Create bitstring B:
  • 1 if upper-case, 0 otherwise

A: a P r e R E c F I o X o S U l M S

slide-25
SLIDE 25

25 / 86

Example: string separation

  • Create bitstring B:
  • 1 if upper-case, 0 otherwise
  • Time/work to do this in parallel?

B:    0  1  0  0  1  1  0  1  1  0  1  0  1  1  0  1  1
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-26
SLIDE 26

26 / 86

Example: string separation

  • Create bitstring B:
  • 1 if upper-case, 0 otherwise
  • Time/work to do this in parallel?

W(n) = O(n)
T(n) = O(1)

B:    0  1  0  0  1  1  0  1  1  0  1  0  1  1  0  1  1
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-27
SLIDE 27

27 / 86

Example: string separation

  • Perform prefix sums on B

B:    0  1  0  0  1  1  0  1  1  0  1  0  1  1  0  1  1
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-28
SLIDE 28

28 / 86

Example: string separation

  • Perform prefix sums on B
  • What is B[i]?

B:    0  1  1  1  2  3  3  4  5  5  6  6  7  8  8  9 10
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-29
SLIDE 29

29 / 86

Example: string separation

  • Perform prefix sums on B
  • What is B[i]?

– The number of capital letters with index ≤ i

B:    0  1  1  1  2  3  3  4  5  5  6  6  7  8  8  9 10
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-30
SLIDE 30

30 / 86

Example: string separation

  • Copy capital letters into C
  • How can we use B to write only capitals into C?

C:    _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _
B:    0  1  1  1  2  3  3  4  5  5  6  6  7  8  8  9 10
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-31
SLIDE 31

31 / 86

Example: string separation

  • Copy capital letters into C
  • How can we use B to write only capitals into C?

– B[i] is the index of each capital in C!

C:    _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _
B:    0  1  1  1  2  3  3  4  5  5  6  6  7  8  8  9 10
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-32
SLIDE 32

32 / 86

Example: string separation

  • Copy capital letters into C
  • How can we use B to write only capitals into C?

– B[i] is the index of each capital in C!

idx:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
C:    P  R  E  F  I  X  S  U  M  S  _  _  _  _  _  _  _
B:    0  1  1  1  2  3  3  4  5  5  6  6  7  8  8  9 10
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-33
SLIDE 33

33 / 86

Example: string separation

  • Create B'
  • 1 for lower-case, 0 otherwise

idx:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
C:    P  R  E  F  I  X  S  U  M  S  _  _  _  _  _  _  _
B':   1  0  1  1  0  0  1  0  0  1  0  1  0  0  1  0  0
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-34
SLIDE 34

34 / 86

Example: string separation

  • Prefix sums on B'

idx:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
C:    P  R  E  F  I  X  S  U  M  S  _  _  _  _  _  _  _
B':   1  1  2  3  3  3  4  4  4  5  5  6  6  6  7  7  7
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-35
SLIDE 35

35 / 86

Example: string separation

  • Copy lower-case into the rest of C

idx:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
C:    P  R  E  F  I  X  S  U  M  S  _  _  _  _  _  _  _
B':   1  1  2  3  3  3  4  4  4  5  5  6  6  6  7  7  7
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S

slide-36
SLIDE 36

36 / 86

Example: string separation

  • Copy lower-case into the rest of C
  • C[j] = A[i] for each lower-case A[i]

– where j = B[n] + B'[i] = 10 + B'[i]

idx:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
C:    P  R  E  F  I  X  S  U  M  S  a  r  e  c  o  o  l
B':   1  1  2  3  3  3  4  4  4  5  5  6  6  6  7  7  7
A:    a  P  r  e  R  E  c  F  I  o  X  o  S  U  l  M  S
slide-37
SLIDE 37

37 / 86

Example: string separation

  • Create B and B'
  • Prefix sums
  • Copy into C
  • Total algorithm

slide-38
SLIDE 38

38 / 86

Example: string separation

  • Create B and B'
  • Prefix sums
  • Copy into C
  • Total algorithm

slide-39
SLIDE 39

39 / 86

Example: string separation

Step               T(n)        W(n)
Create B and B'    O(1)        O(n)
Prefix sums        O(log n)    O(n)
Copy into C        O(1)        O(n)
Total algorithm    O(log n)    O(n)
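The three steps above can be sketched in a few lines. This is a sequential stand-in, not a parallel implementation: `itertools.accumulate` plays the role of the parallel prefix sums, and each loop iteration is independent, so it could be handed to its own processor. The function name `separate` and the input string are assumptions chosen to resemble the slide example.

```python
from itertools import accumulate


def separate(a, pred):
    """Stable split of a: elements satisfying pred first, then the rest."""
    n = len(a)
    b = [1 if pred(x) else 0 for x in a]    # bitstring B
    b_not = [1 - bit for bit in b]          # bitstring B'
    ps = list(accumulate(b))                # prefix sums of B
    ps_not = list(accumulate(b_not))        # prefix sums of B'
    total = ps[-1] if n else 0              # number of elements satisfying pred
    c = [None] * n
    for i in range(n):                      # every write targets a distinct slot
        if b[i]:
            c[ps[i] - 1] = a[i]             # -1 because Python lists are 0-indexed
        else:
            c[total + ps_not[i] - 1] = a[i]
    return c


A = list("aPreREcFIoXoSUlMS")
C = separate(A, str.isupper)
print("".join(C))
```

Because the prefix sums give every element a distinct destination, no two writes collide, which is why the copy step needs only O(1) parallel time.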

slide-40
SLIDE 40

40 / 86

Quicksort Review

  • Quicksort is a popular sorting algorithm

– Works in-place
– O(n²) worst-case
– BUT O(n log n) expected

  • Each recursive call:

– Find pivot
– Partition around pivot

slide-41
SLIDE 41

41 / 86

Sequential Quicksort
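The code on this slide was not captured in the transcript. The sketch below is one version consistent with the walk-through on the following slides: the pivot is swapped to the front, `i` scans the rest of the range, and `part` marks the end of the ≤ pivot region. Choosing the middle element as pivot is an assumption made only so that the example picks 4, as the slides do.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] in place; return the pivot's final index."""
    p = (lo + hi + 1) // 2                  # assumed pivot choice (middle element)
    a[lo], a[p] = a[p], a[lo]               # move the pivot to the front
    pivot = a[lo]
    part = lo                               # last index of the "<= pivot" region
    for i in range(lo + 1, hi + 1):
        if a[i] <= pivot:                   # the TRUE / FALSE test on the slides
            part += 1
            a[part], a[i] = a[i], a[part]
    a[lo], a[part] = a[part], a[lo]         # put the pivot between the two regions
    return part


def quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    part = partition(a, lo, hi)
    quicksort(a, lo, part - 1)              # the two calls touch disjoint ranges
    quicksort(a, part + 1, hi)


A = [3, 9, 1, 7, 4, 5, 8, 2]
quicksort(A)
print(A)
```

The two recursive calls work on disjoint ranges, which is what makes running them in parallel safe; the sequential scan inside `partition` is the part that still needs a parallel replacement.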

slide-42
SLIDE 42

42 / 86

Select pivot

A: 3 9 1 7 4 5 8 2 (pivot: 4)

slide-43
SLIDE 43

43 / 86

Select pivot

A: 4 9 1 7 3 5 8 2 (pivot 4 swapped to the front)

slide-44
SLIDE 44

44 / 86

Partition elements

A: 4 9 1 7 3 5 8 2

slide-45
SLIDE 45

45 / 86

Partition elements

A: 4 9 1 7 3 5 8 2
Markers: part

slide-46
SLIDE 46

46 / 86

Partition elements

A: 4 9 1 7 3 5 8 2
Markers: part, i (i at 9)

slide-47
SLIDE 47

47 / 86

Partition elements

A: 4 9 1 7 3 5 8 2
Markers: part, i (i at 9)
A[i] ≤ pivot? FALSE

slide-48
SLIDE 48

48 / 86

Partition elements

A: 4 9 1 7 3 5 8 2
Markers: part, i (i at 1)
A[i] ≤ pivot? TRUE

slide-49
SLIDE 49

49 / 86

Partition elements

A: 4 1 9 7 3 5 8 2 (1 and 9 swapped)
Markers: part, i
A[i] ≤ pivot? TRUE

slide-50
SLIDE 50

50 / 86

Partition elements

A: 4 1 9 7 3 5 8 2
Markers: part, i (i at 7)
A[i] ≤ pivot? FALSE

slide-51
SLIDE 51

51 / 86

Partition elements

A: 4 1 9 7 3 5 8 2
Markers: part, i (i at 3)
A[i] ≤ pivot? TRUE

slide-52
SLIDE 52

52 / 86

Partition elements

A: 4 1 3 7 9 5 8 2 (3 and 9 swapped)
Markers: part, i
A[i] ≤ pivot? TRUE

slide-53
SLIDE 53

53 / 86

Partition elements

A: 4 1 3 7 9 5 8 2
Markers: part, i (i at 5)
A[i] ≤ pivot? FALSE

slide-54
SLIDE 54

54 / 86

Partition elements

A: 4 1 3 7 9 5 8 2
Markers: part, i (i at 8)
A[i] ≤ pivot? FALSE

slide-55
SLIDE 55

55 / 86

Partition elements

A: 4 1 3 7 9 5 8 2
Markers: part, i (i at 2)
A[i] ≤ pivot? TRUE

slide-56
SLIDE 56

56 / 86

Partition elements

A: 4 1 3 2 9 5 8 7 (2 and 7 swapped)
Markers: part, i
A[i] ≤ pivot? TRUE

slide-57
SLIDE 57

57 / 86

Recurse

A: 4 1 3 2 9 5 8 7
Markers: part

slide-58
SLIDE 58

58 / 86

Recursion sorts sublists

A: 1 2 3 4 5 7 8 9
Markers: part

slide-59
SLIDE 59

59 / 86

How can we parallelize?

  • Select pivot: O(1)
  • Partition: ???
  • Recursive calls: parallel calls

slide-60
SLIDE 60

60 / 86

Parallel partition

A: 4 9 1 7 3 5 8 2 (pivot: A[0] = 4)

  • Separate all elements ≤ pivot
  • How can we do this in parallel?

slide-61
SLIDE 61

61 / 86

Parallel partition

A: 4 9 1 7 3 5 8 2 (pivot: A[0] = 4)

  • Separate all elements ≤ pivot
  • How can we do this in parallel?

– Prefix sums!

slide-62
SLIDE 62

62 / 86

Parallel partition

  • Create B[i] by comparing A[i] to pivot

– 1 if A[i] ≤ A[0]
– 0 otherwise

A: 4 9 1 7 3 5 8 2
B: 1 0 1 0 1 0 0 1

slide-63
SLIDE 63

63 / 86

Parallel partition

  • Prefix sums on B

A: 4 9 1 7 3 5 8 2
B: 1 1 2 2 3 3 3 4

slide-64
SLIDE 64

64 / 86

Parallel partition

  • Write each A[i] ≤ A[0] to array C

– C[B[i]] = A[i] (C is indexed from 1 here)

A: 4 9 1 7 3 5 8 2
B: 1 1 2 2 3 3 3 4
C: 4 1 3 2 _ _ _ _

slide-65
SLIDE 65

65 / 86

Parallel partition

  • Create B' as opposite of B

– B'[i] = 1 if A[i] > A[0]
– B'[i] = 0 otherwise

A:  4 9 1 7 3 5 8 2
B': 0 1 0 1 0 1 1 0
C:  4 1 3 2 _ _ _ _

slide-66
SLIDE 66

66 / 86

Parallel partition

  • Prefix sums on B'

A:  4 9 1 7 3 5 8 2
B': 0 1 1 2 2 3 4 4
C:  4 1 3 2 _ _ _ _

slide-67
SLIDE 67

67 / 86

Parallel partition

  • Write remaining elements to C

– C[B[n-1] + B'[i]] = A[i]

A:  4 9 1 7 3 5 8 2
B': 0 1 1 2 2 3 4 4
C:  4 1 3 2 9 7 5 8
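Putting the partition steps together with the recursion gives the following sketch of the quicksort variant. It runs sequentially here: `accumulate` stands in for the parallel prefix sums, and the two recursive calls are the ones that would run in parallel. The names `prefix_partition` and `quicksort_ps` are illustrative, and moving the pivot to the boundary before recursing is an added step (not shown on the slides) so that each call shrinks its input.

```python
from itertools import accumulate


def prefix_partition(a):
    """Partition a around a[0]; return (c, k) where c[:k] <= pivot < c[k:]."""
    n = len(a)
    if n == 0:
        return [], 0
    pivot = a[0]
    b = [1 if x <= pivot else 0 for x in a]     # B
    b_gt = [1 - bit for bit in b]               # B'
    ps = list(accumulate(b))                    # prefix sums on B
    ps_gt = list(accumulate(b_gt))              # prefix sums on B'
    k = ps[-1]                                  # B[n-1]: count of elements <= pivot
    c = [None] * n
    for i in range(n):                          # independent writes, O(1) parallel time
        j = ps[i] - 1 if b[i] else k + ps_gt[i] - 1   # 0-indexed destination
        c[j] = a[i]
    return c, k


def quicksort_ps(a):
    """Return a sorted copy of a (not in-place: each level copies into c)."""
    if len(a) <= 1:
        return list(a)
    c, k = prefix_partition(a)
    c[0], c[k - 1] = c[k - 1], c[0]             # pivot lands in c[0]; move it to the boundary
    return quicksort_ps(c[:k - 1]) + [c[k - 1]] + quicksort_ps(c[k:])


print(prefix_partition([4, 9, 1, 7, 3, 5, 8, 2]))
print(quicksort_ps([4, 9, 1, 7, 3, 5, 8, 2]))
```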

slide-68
SLIDE 68

68 / 86

Parallel quicksort analysis

  • Each recursive call performs prefix sum
  • Worst-case, pivot is always min or max:
  • If we assume “good” pivot is chosen:
slide-69
SLIDE 69

69 / 86

Parallel quicksort analysis

  • Assuming a “good” pivot choice:
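The formulas for these cases were not captured in the transcript. Under the assumption that each call spends O(log n) time and O(n) work on its prefix sums, and that a “good” pivot splits the array roughly in half, the standard recurrences would be:

```latex
\begin{aligned}
\text{Worst case (pivot is min or max):}\quad
  & T(n) = T(n-1) + O(\log n) = O(n \log n), \\
  & W(n) = W(n-1) + O(n) = O(n^2). \\[4pt]
\text{Balanced pivot:}\quad
  & T(n) = T(n/2) + O(\log n) = O(\log^2 n), \\
  & W(n) = 2\,W(n/2) + O(n) = O(n \log n).
\end{aligned}
```

The balanced-pivot time bound matches the O(log² n) runtime quoted on the next slide.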
slide-70
SLIDE 70

70 / 86

Issues with parallel quicksort

  • Have to copy A to C => not in-place

– O(n) extra space needed

  • O(log² n) “average” parallel runtime
  • Recursive definition

– Difficult to make iterative
– Perform many small prefix-sums

  • Performance overhead
slide-71
SLIDE 71

71 / 86

Iterative solution

  • What if we can combine recursive calls

– One iteration for each level

  • Separate recursive calls on partitions:

A: P0 = 1 3 5 2 | P1 = 7 6 9 | P2 = 12 16 19 17 14 | P3 = 22 25 20 23

slide-72
SLIDE 72

72 / 86

Iterative solution

  • Know size of partition i = |Pi|
  • Find a pivot for each partition

A: P0 = 1 3 5 2 | P1 = 7 6 9 | P2 = 12 16 19 17 14 | P3 = 22 25 20 23

slide-73
SLIDE 73

73 / 86

Iterative solution

  • Know size of partition i = |Pi|
  • Find a pivot for each partition

– Move pivots to front

A: P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23

slide-74
SLIDE 74

74 / 86

Iterative solution

  • Know size of partition i = |Pi|
  • Find a pivot for each partition

– Move pivots to front

  • Compute B

– Compare each to the pivot in its partition

A: P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23
B: P0 = 1 1 0 1 | P1 = 1 1 1 | P2 = 1 1 0 0 1 | P3 = 1 0 1 0

slide-75
SLIDE 75

75 / 86

Iterative solution

  • Want prefix sum within each partition:
  • Segmented prefix sums

– Each partition is a separate segment

A: P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23
B: P0 = 1 1 0 1 | P1 = 1 1 1 | P2 = 1 1 0 0 1 | P3 = 1 0 1 0

slide-76
SLIDE 76

76 / 86

Iterative solution

  • Want prefix sum within each partition:
  • Segmented prefix sums

– Each partition is a separate segment
– Can combine into 1 operation...

A: P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23
B: P0 = 1 2 2 3 | P1 = 1 2 3 | P2 = 1 2 2 2 3 | P3 = 1 1 2 2

slide-77
SLIDE 77

77 / 86

Segmented prefix sums

  • Input array A and flag bits F

– 1 if start of new segment
– 0 otherwise

  • Prefix sums, except sum resets when F[i]=1

A:  3  1  4  1  5  2  1  3  4  0  2  6  1  0  3  4
F:  0  0  0  1  0  0  0  0  0  1  0  1  1  0  0  0

slide-78
SLIDE 78

78 / 86

Segmented prefix sums

  • Input array A and flag bits F

– 1 if start of new segment
– 0 otherwise

  • Prefix sums, except sum resets when F[i]=1

A:  3  1  4  1  5  2  1  3  4  0  2  6  1  0  3  4
F:  0  0  0  1  0  0  0  0  0  1  0  1  1  0  0  0
C:  3  4  8  1  6  8  9 12 16  0  2  6  1  1  4  8
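One way to see why segmented prefix sums cost no more than ordinary ones: they are a scan over (flag, value) pairs with an associative combining operator, so the same up-sweep/down-sweep machinery applies. The sketch below shows that operator, with `itertools.accumulate` standing in for the parallel scan. The names `seg_op` and `segmented_prefix_sums` are illustrative, and the sample input is an assumption reconstructed to resemble the slide's example.

```python
from itertools import accumulate


def seg_op(left, right):
    """Combine two (flag, sum) pairs; a flag on the right restarts the sum."""
    lf, lv = left
    rf, rv = right
    return (lf | rf, rv if rf else lv + rv)


def segmented_prefix_sums(a, f):
    """Inclusive prefix sums of a that reset wherever f[i] == 1."""
    pairs = accumulate(zip(f, a), seg_op)
    return [value for _, value in pairs]


A = [3, 1, 4, 1, 5, 2, 1, 3, 4, 0, 2, 6, 1, 0, 3, 4]
F = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
print(segmented_prefix_sums(A, F))
```

Because `seg_op` is associative, the pairs can be combined in any tree order, which is exactly what the parallel sweeps require.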

slide-79
SLIDE 79

79 / 86

Partition with segments

  • Create F with partition boundaries

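A: P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23
B: P0 = 1 1 0 1 | P1 = 1 1 1 | P2 = 1 1 0 0 1 | P3 = 1 0 1 0
F: P0 = 0 0 0 0 | P1 = 1 0 0 | P2 = 1 0 0 0 0 | P3 = 1 0 0 0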

slide-80
SLIDE 80

80 / 86

Partition with segments

  • Create F with partition boundaries
  • Perform segmented prefix sums on B and F

B: P0 = 1 2 2 3 | P1 = 1 2 3 | P2 = 1 2 2 2 3 | P3 = 1 1 2 2
F: P0 = 0 0 0 0 | P1 = 1 0 0 | P2 = 1 0 0 0 0 | P3 = 1 0 0 0
A: P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23

slide-81
SLIDE 81

81 / 86

Partition with segments

  • Create F with partition boundaries
  • Perform segmented prefix sums on B and F
  • Copy A[i] into C[B[i]] (plus partition offsets)

B: P0 = 1 2 2 3 | P1 = 1 2 3 | P2 = 1 2 2 2 3 | P3 = 1 1 2 2
C: P0 = 3 1 2 _ | P1 = 9 6 7 | P2 = 16 12 14 _ _ | P3 = 22 20 _ _
A: P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23

slide-82
SLIDE 82

82 / 86

Partition with segments

  • Repeat for > pivots:

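– Build B'

B': P0 = 0 0 1 0 | P1 = 0 0 0 | P2 = 0 0 1 1 0 | P3 = 0 1 0 1
C:  P0 = 3 1 2 _ | P1 = 9 6 7 | P2 = 16 12 14 _ _ | P3 = 22 20 _ _
A:  P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23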

slide-83
SLIDE 83

83 / 86

Partition with segments

  • Repeat for > pivots:

– Segmented prefix sums on B'

B': P0 = 0 0 1 1 | P1 = 0 0 0 | P2 = 0 0 1 2 2 | P3 = 0 1 1 2
C:  P0 = 3 1 2 _ | P1 = 9 6 7 | P2 = 16 12 14 _ _ | P3 = 22 20 _ _
A:  P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23

slide-84
SLIDE 84

84 / 86

Partition with segments

  • Repeat for > pivots:

– Copy remaining A values into C

C:  P0 = 3 1 2 5 | P1 = 9 6 7 | P2 = 16 12 14 19 17 | P3 = 22 20 25 23
B': P0 = 0 0 1 1 | P1 = 0 0 0 | P2 = 0 0 1 2 2 | P3 = 0 1 1 2
A:  P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23

slide-85
SLIDE 85

85 / 86

Partition with segments

  • Ready for next iteration...

B': P0 = 0 0 1 1 | P1 = 0 0 0 | P2 = 0 0 1 2 2 | P3 = 0 1 1 2
C:  P0 = 3 1 2 5 | P1 = 9 6 7 | P2 = 16 12 14 19 17 | P3 = 22 20 25 23
A:  P0 = 3 1 5 2 | P1 = 9 6 7 | P2 = 16 12 19 17 14 | P3 = 22 25 20 23
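One level of the iterative scheme can be sketched as below. It is a sequential stand-in: `segmented_scan` uses `itertools.accumulate` in place of a parallel segmented scan, and the per-partition loops would be flat parallel loops on a real machine. The function names, the `starts` list of partition offsets, and the sample array are assumptions for illustration; each partition's pivot is taken to be already at the front of its partition, as on the slides.

```python
from itertools import accumulate


def segmented_scan(values, flags):
    """Inclusive prefix sums that restart wherever flags[i] == 1."""
    def op(left, right):
        (lf, lv), (rf, rv) = left, right
        return (lf | rf, rv if rf else lv + rv)
    return [v for _, v in accumulate(zip(flags, values), op)]


def partition_level(a, starts):
    """Partition every segment of a around its first element in one pass.

    starts lists each partition's first index (starts[0] must be 0).
    """
    n = len(a)
    ends = starts[1:] + [n]
    f = [0] * n
    for s in starts[1:]:                    # F marks partition boundaries; F[0] stays 0
        f[s] = 1
    pivot = [None] * n
    for s, e in zip(starts, ends):          # pivot of each element's own partition
        for i in range(s, e):
            pivot[i] = a[s]
    b = [1 if a[i] <= pivot[i] else 0 for i in range(n)]   # B
    b_gt = [1 - bit for bit in b]                          # B'
    sb = segmented_scan(b, f)
    sb_gt = segmented_scan(b_gt, f)
    c = [None] * n
    for s, e in zip(starts, ends):
        k = sb[e - 1]                       # elements <= pivot in this partition
        for i in range(s, e):
            offset = sb[i] - 1 if b[i] else k + sb_gt[i] - 1
            c[s + offset] = a[i]            # s is the partition offset
    return c


A = [3, 1, 5, 2, 9, 6, 7, 16, 12, 19, 17, 14, 22, 25, 20, 23]
print(partition_level(A, [0, 4, 7, 12]))
```

For the next iteration, each partition would be split at its own boundary (its count of elements ≤ pivot), giving a new `starts` list; that bookkeeping is the "partition offsets" tracking mentioned in the closing notes.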

slide-86
SLIDE 86

86 / 86

Notes about Iterative quicksort

  • Need to keep track of partition offsets, etc.
  • Still need to pick good pivots
  • Same runtime as recursive
  • Easier to optimize

– Unroll loops, etc.

  • Less overhead (on most architectures)