

slide-1
SLIDE 1

15-150 Fall 2020

Stephen Brookes

Lecture 17 Sequences and cost graphs

Halloween, a full moon and a time change all happening simultaneously

slide-2
SLIDE 2

announcements

  • Next Tuesday (3 Nov) is ELECTION DAY
  • Class will be held as usual (on zoom)
  • Homework is NOT DUE on Tuesday

to allow you time to participate in voting

  • NEW: the TAs will hold a Weekly Review

this evening (will be recorded, too)

slide-3
SLIDE 3

today

parallel programming

  • cost semantics
  • Brent’s Theorem
  • sequences:

an abstract type with efficient parallel operations

slide-4
SLIDE 4

parallelism

exploiting multiple processors by evaluating independent code simultaneously

  • low-level implementation
  • scheduling work onto processors
  • high-level planning
  • designing code abstractly
  • without baking in a schedule
slide-5
SLIDE 5
our approach

design abstractly

  • behavioral correctness
  • asymptotic runtime (work, span)

reason abstractly

  • independently of schedule
  • cost semantics and evaluation
slide-6
SLIDE 6
  • You design the code
  • The compiler schedules the work
slide-7
SLIDE 7

functional benefits

  • No side effects, so…

evaluation order doesn’t affect correctness

  • Can build abstract types that support

efficient parallel-friendly operations

  • Can use work and span to predict

potential for parallel speed-up

  • Work and span are independent of

scheduling details

slide-8
SLIDE 8

caveat

  • In practice, it’s hard to achieve speed-up
  • Current language implementations

don’t make it easy

  • Problems include:
  • scheduling overhead
  • locality of data (cache problems)
  • runtime sensitive to scheduling choices
slide-9
SLIDE 9

why bother?

  • It’s good to think abstractly first

and figure out details later

  • Focus on data dependencies

when you design your code

  • Our thesis: this approach to parallelism

will prevail... (and 15-210 builds on these ideas...)

slide-10
SLIDE 10

cost semantics

We already introduced work and span

  • Work estimates the sequential evaluation time
    on a single processor
  • Span takes account of data dependency, and estimates the
    parallel evaluation time with unlimited processors

slide-11
SLIDE 11

cost semantics

  • We showed how to calculate work and span

for recursive functions with recurrence relations

  • Now we introduce cost graphs,

another way to deal with work and span

  • Cost graphs also allow us to talk about schedules...

... and the potential for speed-up

slide-12
SLIDE 12

cost graphs

A cost graph is a series-parallel graph

  • a directed graph, with source and sink
  • nodes represent (constant-time) units of work
  • edges represent data dependencies
  • branching indicates potential parallelism

slide-13
SLIDE 13

series-parallel graphs

A series-parallel graph is built from:

  • a single node
  • sequential composition of G1 and G2: an edge from the sink of G1
    to the source of G2
  • parallel composition of G1 and G2: the two graphs side by side,
    between a new source and a new sink

slide-14
SLIDE 14

work and span

  • The work is the number of nodes
  • The span is the length of the longest path

from source to sink

of a cost graph G

  span(G) ≤ work(G)

slide-15
SLIDE 15

sequential composition (G1 then G2):

  work = work G1 + work G2 + c

parallel composition (G1 and G2 side by side):

  work = work G1 + work G2 + c

dependent code … add the work
independent code … add the work

slide-16
SLIDE 16

sequential composition (G1 then G2):

  span = span G1 + span G2 + c

parallel composition (G1 and G2 side by side):

  span = max(span G1, span G2) + c

dependent code … add the span
independent code … max the span
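As a sketch (not from the slides), the composition rules above translate directly into SML over a datatype of series-parallel cost graphs. The datatype and names here are ours; the additive constant c is dropped, since these pictures omit explicit source and sink nodes:

```sml
(* series-parallel cost graphs, as a sketch *)
datatype graph = One                      (* a single node: one unit of work *)
               | Series of graph * graph  (* sequential composition *)
               | Par of graph * graph     (* parallel composition *)

(* work = number of nodes: add for both compositions *)
fun work One = 1
  | work (Series (g1, g2)) = work g1 + work g2
  | work (Par (g1, g2))    = work g1 + work g2

(* span = longest path: add when sequential, max when parallel *)
fun span One = 1
  | span (Series (g1, g2)) = span g1 + span g2
  | span (Par (g1, g2))    = Int.max (span g1, span g2)
```

For every graph built this way, span(G) ≤ work(G), since max never exceeds sum.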

slide-17
SLIDE 17

sources and sinks

  • Sometimes we omit them from pictures
  • no loss of generality
  • easy to put them back in
  • No difference, asymptotically
  • a single node represents an additive

constant amount of work and span

  • Allows easier explanation of execution
slide-18
SLIDE 18

example

[cost graph with 11 numbered nodes ①–⑪]

work = 11  (number of nodes)
span = 4   (longest path length)

An edge means the earlier node must be done before the later one;
each node represents a single unit of work.

slide-19
SLIDE 19

using cost graphs

  • Every expression can be given a cost graph
  • Can calculate work and span using the graph
  • These are asymptotically the same as

the work and span derived from recurrence relations

Work and span provide asymptotic estimates of actual running time,
under certain assumptions:

  • basic ops take constant time
  • work: single processor
  • span: unlimited processors

slide-20
SLIDE 20

scheduling

  • Work: number of nodes
  • Span: length of critical path

[cost graph of nodes ①–⑪: w = 11, s = 4]

An optimal parallel schedule assigns units of work to processors,
respecting data dependency: rounds (i)–(v), i.e. 5 rounds, or 4 steps,
using 5 processors.

slide-21
SLIDE 21

example

What if there are only 2 processors?

[cost graph of nodes ①–⑪: w = 11, s = 4]

A best schedule for 2 processors: rounds (i)–(vi), i.e. 6 rounds, 5 steps.

2 processors cannot do the job as fast as 5 (!)

slide-22
SLIDE 22

Brent’s Theorem

An expression with work w and span s can be evaluated on a p-processor machine in time O(max(w/p, s)).

Optimal schedule using p processors: do (up to) p units of work each round.
Total work to do is w; needs at least s steps.

Find me the smallest p such that w/p ≤ s.
Using more than this many processors won’t yield any speed-up.

Richard Brent is an illustrious Australian mathematician and computer
scientist. He is known for Brent’s Theorem, which shows that a parallel
algorithm can always be adapted to run on fewer processors with only the
obvious time penalty: a beautiful example of an “obvious” but non-trivial
theorem.

David Brent is the manager of the Slough branch of Wernham-Hogg. He wants
to know how many computers to buy to improve office efficiency.
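The smallest p with w/p ≤ s is ⌈w/s⌉, so David Brent's question has a one-line answer. A sketch in SML (the function name minProcs is ours, not from the lecture):

```sml
(* smallest p such that w/p <= s, i.e. ceiling(w/s); assumes w, s > 0 *)
fun minProcs (w, s) = (w + s - 1) div s

val p = minProcs (11, 4)  (* 3: matches the graph with w = 11, s = 4 *)
```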

slide-23
SLIDE 23

example

[cost graph of nodes ①–⑪: w = 11, s = 4]

min {p | w/p ≤ s} is 3

A best schedule for 3 processors (5 rounds, 4 steps):

  (i) ① ③ ④   (ii) ② ⑤ ⑥   (iii) ⑦ ⑧   (iv) ⑨ ⑩   (v) ⑪

3 processors can do the work as fast as 5 (!)

slide-24
SLIDE 24

summary

  • Cost graphs give us another way to talk

about work and span

  • Brent’s Theorem tells us about the

potential for parallel speed-up

  • check if w/p ≤ s
slide-25
SLIDE 25

next

  • Exploiting parallelism in ML
  • A signature for parallel collections
  • Cost analysis of implementations
  • Cost benefits of parallel algorithm design
  • we revisit some list-based functions
  • sequence-based functions are faster
slide-26
SLIDE 26

sequences

signature SEQ =
sig
  type 'a seq
  exception Range
  val tabulate  : (int -> 'a) -> int -> 'a seq
  val length    : 'a seq -> int
  val nth       : int -> 'a seq -> 'a
  val split     : 'a seq -> 'a seq * 'a seq
  val map       : ('a -> 'b) -> 'a seq -> 'b seq
  val reduce    : ('a * 'a -> 'a) -> 'a -> 'a seq -> 'a
  val mapreduce : ('a -> 'b) -> 'b -> ('b * 'b -> 'b) -> 'a seq -> 'b
end

slide-27
SLIDE 27

SEQ

  • We may expand the SEQ signature later…

… with some extra functions

  • For today, let’s keep it simple
  • Purpose: a value of type t seq

is a sequence of values of type t

  • with faster operations

than those available for t list

slide-28
SLIDE 28

implementations

  • Many ways to implement the signature
  • lists, balanced trees, arrays, ...
  • For each one, can give a cost analysis
  • There may be implementation trade-offs
  • lists: access is O(n), length is O(n)
  • arrays: access is O(1), length is O(1)
  • trees: access is O(log n), length is ??

Obviously, a list-based implementation of sequences isn’t going to be faster than lists! But arrays and trees can be.

slide-29
SLIDE 29

Seq : SEQ

  • An abstract parameterized type of sequences
  • Think of a sequence as a parallel collection
  • With parallel-friendly operations
  • constant-time access to items
  • efficient map and reduce

We’ll work today with an implementation Seq : SEQ based on vectors

slide-30
SLIDE 30

notation

  • We have an abstract type of sequences
  • We want to think about sequence values

in a way that’s independent of any specific implementation

  • could be lists, arrays, trees, …
  • We need a neutral notation for sequences

⟨v0, ..., vn-1⟩ This is NOT program syntax!

slide-31
SLIDE 31

notation

  • Remember that if we have structures like

      ListSeq : SEQ    ArraySeq : SEQ    BalancedTreeSeq : SEQ

    we can use qualified names like ListSeq.empty, and types like int ListSeq.seq

slide-32
SLIDE 32

think abstractly

  • We’ll mostly use the abstract notation

for sequences

  • We’ll give abstract specifications
  • But we’ll discuss work/span characteristics

for a specific implementation

  • other implementations may have

different work/span

slide-33
SLIDE 33

sequence values

We use math notation like

  ⟨v1, ..., vn⟩   ⟨v0, ..., vn-1⟩   ⟨ ⟩

for sequence values.

⟨1, 2, 4, 8⟩ is a value of type int seq.

A value of type t seq is a sequence of values of type t.

slide-34
SLIDE 34

equality

Two sequence values are (extensionally) equal iff they have the same
length and have equal items at all positions:

  ⟨v1, ..., vn⟩ = ⟨u1, ..., um⟩  if and only if  n = m and for all i, vi = ui

Again, this is NOT program notation.

slide-35
SLIDE 35
operations

For our given structure Seq : SEQ, we specify

  • the (extensional) behavior
  • the cost semantics

of each operation.

Other implementations of SEQ are designed to have the same extensional
behavior but may have different work/span profiles. Learn to choose wisely!

slide-36
SLIDE 36

tabulate

REQUIRES f total & n ≥ 0
ENSURES  tabulate f n = ⟨f 0, ..., f (n-1)⟩

  • If Gi is the cost graph for f(i), the cost graph for
    tabulate f n puts G0, ..., Gn-1 in parallel

W(tabulate f n) = W(f 0) + … + W(f (n-1)) + c
S(tabulate f n) = max {S(f 0), …, S(f (n-1))} + c

If f is O(1), the work for tabulate f n is O(n)
If f is O(1), the span for tabulate f n is O(1)

slide-37
SLIDE 37

examples

  • tabulate (fn x:int => x) 6       = ⟨0, 1, 2, 3, 4, 5⟩
  • tabulate (fn x:int => x*x) 6     = ⟨0, 1, 4, 9, 16, 25⟩
  • tabulate (fn _ => raise Range) 0 = ⟨ ⟩

slide-38
SLIDE 38

length

length ⟨v1, ..., vn⟩ = n

  • Cost graph is a single node
  • Work is O(1)
  • Span is O(1)

Contrast: List.length [v1, ..., vn] = n has work and span O(n)

slide-39
SLIDE 39

nth

nth i ⟨v0, ..., vn-1⟩ = vi            if 0 ≤ i < n
                      = raise Range   otherwise

  • Cost graph is a single node
  • Work is O(1)
  • Span is O(1)

Seq provides constant-time access to items. Contrast with lists.

slide-40
SLIDE 40

examples

When f is total & 0 ≤ i < n:

  length (tabulate f n) = n
  nth i (tabulate f n) = f i

For all types t and values S : t seq,

  S = tabulate (fn i => nth i S) (length S)

slide-41
SLIDE 41

split

  • Work is O(n)
  • Span is O(1)

split ⟨v1, ..., v2n⟩   = (⟨v1, ..., vn⟩, ⟨vn+1, ..., v2n⟩)
split ⟨v1, ..., v2n+1⟩ = (⟨v1, ..., vn⟩, ⟨vn+1, ..., v2n+1⟩)
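One way to realize this spec generically, using only tabulate, length, and nth from SEQ (a sketch, not necessarily how Seq implements it). The left half gets ⌊n/2⌋ items, and with O(1) nth the work is O(n) and the span O(1), as stated above:

```sml
fun split s =
  let
    val n = length s
    val h = n div 2   (* left half gets floor(n/2) items *)
  in
    (tabulate (fn i => nth i s) h,
     tabulate (fn i => nth (i + h) s) (n - h))
  end
```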

slide-42
SLIDE 42

map

map f ⟨v0, ..., vn-1⟩ = ⟨f v0, ..., f vn-1⟩

  • If f is constant time, the work is O(n) and the span is O(1)
    (contrast with List.map)

map f ⟨v0, ..., vn-1⟩ has a cost graph with G0, ..., Gn-1 in parallel,
where each Gi is the cost graph for f vi.

slide-43
SLIDE 43

notes

  • The shape of the cost graph shows that

map f ⟨v0, ..., vn-1⟩ evaluates f v0, ..., f vn-1 in parallel,

producing a sequence value ⟨x0, ..., xn-1⟩

in which each xi is the value of f vi


slide-44
SLIDE 44

notes

  • For List.map f [v0, …, vn-1] the cost graph chains
    G0, ..., Gn-1 in sequence, showing sequential evaluation
    of f v0 … f vn-1

slide-45
SLIDE 45

examples

map (fn x => x*x) ⟨1, 2, 3⟩ = ⟨1, 4, 9⟩

If f, g total then map g (tabulate f n) = tabulate (g o f) n:

  map g ⟨f 0, ..., f (n-1)⟩ = ⟨g (f 0), ..., g (f (n-1))⟩
                            = ⟨(g o f) 0, ..., (g o f) (n-1)⟩
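Checking the fusion law on a small case (sq and inc are our own hypothetical helpers, not from the lecture):

```sml
fun sq (x : int) = x * x
fun inc x = x + 1

(* both sides should be <1, 2, 5, 10> *)
val lhs = map inc (tabulate sq 4)
val rhs = tabulate (inc o sq) 4
```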

slide-46
SLIDE 46

reduce

reduce should be used to combine a sequence using an associative
total function g with identity element z

  • g : t * t -> t is associative iff for all x1, x2, x3 : t,
      g(x1, g(x2, x3)) = g(g(x1, x2), x3)
  • z is an identity element for g iff for all x : t,
      g(x, z) = x

We write v0 g v1 g ... g vn-1 for the result of combining v0, …, vn-1.

reduce g z ⟨v0, ..., vn-1⟩ = v0 g v1 g ... g vn-1 g z
                           = v0 g v1 g ... g vn-1

slide-47
SLIDE 47

examples

(op +) : int * int -> int           associative, with identity element 0
(op * ) : int * int -> int          associative, with identity element 1
(op @) : t list * t list -> t list  associative, with identity element [ ]

reduce (op +) 0 ⟨v0, ..., vn-1⟩   = v0 + … + vn-1
reduce (op * ) 1 ⟨v0, ..., vn-1⟩  = v0 * … * vn-1
reduce (op @) [ ] ⟨v0, ..., vn-1⟩ = v0 @ … @ vn-1

slide-48
SLIDE 48

reduce

  • When g is total, associative & z is an identity for g,
      reduce g z ⟨v0, ..., vn-1⟩ = v0 g v1 g ... g vn-1
    (needs to use g n times)
  • If g is constant time,
      reduce g z ⟨v0, ..., vn-1⟩ has work O(n) and span O(log n),
    by divide-and-conquer (contrast with foldr, foldl on lists)

slide-49
SLIDE 49

reduce (op +) 0 ⟨1, 2, 3, 4, 5, 6, 7, 8⟩

[cost graph: a balanced binary tree of + nodes, combining
 1 2 3 4 5 6 7 8 pairwise, then combining those results pairwise,
 then one final +]

slide-50
SLIDE 50

cost graphs

reduce g z ⟨v1, ..., v2n⟩
  = g (reduce g z ⟨v1, ..., vn⟩, reduce g z ⟨vn+1, ..., v2n⟩)

reduce splits the sequence into halves, so the cost graph
G⟨1, ..., 2n⟩ puts G⟨1, ..., n⟩ and G⟨n+1, ..., 2n⟩ in parallel,
followed by a node for g.

Let W(m) = work for reduce g z S when length S = m. Then

  W(2n) = 2*W(n) + c     so W(n) is O(n)
  S(2n) = S(n) + c       so S(n) is O(log2 n)

slide-51
SLIDE 51

reduce

  • Can be defined (using length, split, nth) in a generic way,
    independent of the way sequences are implemented:

fun reduce g z S =
  case (length S) of
    0 => z
  | 1 => nth 0 S
  | n => let val (L, R) = split S
         in g (reduce g z L, reduce g z R) end

slide-52
SLIDE 52

reduce

  • For a specific structure : SEQ it may be possible (even better)
    to define reduce in a way that exploits how sequences are
    represented, e.g. in BalancedTreeSeq : SEQ:

fun reduce g z Empty = z
  | reduce g z (Node (A, x, B)) =
      let val (a, b) = (reduce g z A, reduce g z B)
      in g (a, g (x, b)) end

slide-53
SLIDE 53

reduce

REQUIRES g total, associative & z an identity for g
ENSURES  reduce g z ⟨v0, ..., vn-1⟩ = v0 g v1 g ... g vn-1

  • The requirements are crucial!
  • To see why, prove correctness of reduce,

given that nth, split, length satisfy their specs

slide-54
SLIDE 54

correctness

If g is total, associative, and z is an identity for g, then
  reduce g z ⟨v0, ..., vn-1⟩ = v0 g v1 g ... g vn-1

Proof: by induction on n

fun reduce g z S =
  case (length S) of
    0 => z
  | 1 => nth 0 S
  | n => let val (L, R) = split S
         in g (reduce g z L, reduce g z R) end

EXERCISE

slide-55
SLIDE 55

mapreduce

  • When g is associative and z is an identity,
      mapreduce f z g ⟨v0, ..., vn-1⟩ = (f v0) g ... g (f vn-1) g z
  • When f, g are constant time, mapreduce f z g ⟨v0, ..., vn-1⟩
    has work O(n) and span O(log n)
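One generic way to meet this spec is to fuse the map into a divide-and-conquer reduce, so the sequence is traversed once. A sketch in the style of the generic reduce, using only length, nth, and split (not necessarily how Seq defines it):

```sml
fun mapreduce f z g S =
  case (length S) of
    0 => z
  | 1 => f (nth 0 S)   (* apply f at the leaves *)
  | n => let val (L, R) = split S
         in g (mapreduce f z g L, mapreduce f z g R) end
```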

slide-56
SLIDE 56

examples

fun sum (s : int seq) : int = reduce (op +) 0 s

fun count (s : int seq seq) : int = sum (map sum s)

slide-57
SLIDE 57

analysis

  • Let s be a value of type int seq seq

consisting of n rows, each of length n

  • What are the work and span for

count s ?

fun sum (s : int seq) : int = reduce (op +) 0 s

fun count (s : int seq seq) : int = sum (map sum s)

slide-58
SLIDE 58

analysis

Let s = ⟨s1, ..., sn⟩, with si = ⟨xi1, ..., xin⟩, and let ti = sum si.

map sum s = ⟨sum s1, ..., sum sn⟩

For each i, sum si = reduce (op +) 0 ⟨xi1, ..., xin⟩.

cost graph of sum si: a balanced tree of depth log2 n
  work is O(n), span is O(log n)

cost graph of map sum s: the graphs for sum s1, ..., sum sn in parallel
  work is O(n²), span is O(log n)

slide-59
SLIDE 59

analysis

Let ti = sum si. Then

count s = sum ⟨t1, ..., tn⟩

cost graph of sum (map sum s): the graphs for sum s1, ..., sum sn
in parallel (depth log2 n), followed by the graph for sum ⟨t1, ..., tn⟩
(depth log2 n)

  work is O(n²), span is O(log n)

slide-60
SLIDE 60

sequence laws

length (tabulate f n) = n                  if f total, n ≥ 0
nth i (map f S) = f (nth i S)              if f total, 0 ≤ i < length S
S = tabulate (fn i => nth i S) (length S)

Every sensible implementation of SEQ should validate these equations

ListSeq ArraySeq BalancedTreeSeq

slide-61
SLIDE 61

exercises

  • Define functions

reverse : 'a seq -> 'a seq
zip : 'a seq * 'b seq -> ('a * 'b) seq

and analyze their work and span, with reasonable specifications

slide-62
SLIDE 62

reverse

fun reverse (s : 'a seq) : 'a seq =
  let val n = length s
  in tabulate (fn i => nth (n - i - 1) s) n end

What are the work and span for

reverse ⟨v1, ..., vn⟩ ?

Use the given information about work/span for length, nth, tabulate

slide-63
SLIDE 63

zip : 'a seq * 'b seq -> ('a * 'b) seq
REQUIRES length xs = length ys
ENSURES  zip (⟨x1, ..., xn⟩, ⟨y1, ..., yn⟩) = ⟨(x1,y1), ..., (xn,yn)⟩

fun zip (xs, ys) =
  let val n = Int.min (length xs, length ys)
  in tabulate (fn i => (nth i xs, nth i ys)) n end

zip

Use the given information about work/span for length, nth, tabulate

slide-64
SLIDE 64

zip analysis

  • See why we need to REQUIRE that the two sequences
    have the same length
  • What happens otherwise? E.g.

val xs = tabulate (fn i => i) 4
val ys = tabulate (fn i => i*i) 3
val squares = zip (xs, ys)
nth 4 squares
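Tracing this example with the zip defined on the previous slide, which truncates to the shorter length (a sketch of the evaluation):

```sml
val xs = tabulate (fn i => i) 4     (* <0, 1, 2, 3> *)
val ys = tabulate (fn i => i*i) 3   (* <0, 1, 4> *)

(* Int.min (4, 3) = 3, so the extra item of xs is silently dropped *)
val squares = zip (xs, ys)          (* <(0,0), (1,1), (2,4)> *)

(* valid indices are 0..2, so this raises the exception Range *)
val _ = nth 4 squares
```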

slide-65
SLIDE 65

summary

  • An abstract type of sequences
  • Signature SEQ
  • A variety of different implementations
  • ListSeq : SEQ
  • ArraySeq : SEQ
  • BalancedTreeSeq : SEQ
  • To think abstractly, ignore implementation

details and focus on the signature…

  • But work/span are implementation-dependent
slide-66
SLIDE 66

Happy Halloween (15150-style)

[joke slides: "HALLOWEEN N+1: THE INDUCTION", red-black trees,
vicious bishops, most generals, Barnes-Hut, and a parody book cover:
"Continuation Passing Style For Dummies: The Fun and Easy Way to Create
Complex Incomprehensible Programs that Work. Your First Aid Kit For
Cleaning Up Messy Direct-style Programs, Explained in Plain English.
How To Make Your Program Backtrack. REQUIRES tricks, ENSURES treats"]

slide-67
SLIDE 67

happy halloween

NIGHTMARE ON ML STREET

slide-68
SLIDE 68

Check out Vote411.org

Use it to help register to vote, check registrations,
or learn how to obtain absentee ballots.

Use it to find polling locations and check ID requirements for voting.