Parallel Functional Arrays, by Ananya Kumar, Guy Blelloch, and Robert Harper (presentation slides)



slide-1
SLIDE 1

Parallel Functional Arrays

Ananya Kumar, Guy Blelloch, Robert Harper
Carnegie Mellon University

slide-2
SLIDE 2

Goals

  • Functional arrays
  • Efficient (constant time)
  • Parallel
  • Well-defined cost semantics
slide-3
SLIDE 3

Previous Work – Monads

  • Thread mutable state
  • Enforce single reference to array
  • Need completely different code
  • Not parallel
slide-4
SLIDE 4

Previous Work – Specialized Type System

  • Enforce single threadedness of arrays
  • Not available in most languages
  • Hard to reason about
slide-5
SLIDE 5

Previous Work – Reference Counting

  • Check reference counts
  • If one, update in place, else copy
  • Depends on compiler
  • Hard to reason about
slide-6
SLIDE 6

Sequences

A = NEW(5, 0)
B = SET(A, 0, 3)
C = SET(A, 2, 3)
D = SET(C, 4, 14)
E = SET(D, 1, 11)

[Figure: array contents after each operation]
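The intended meaning of these operations can be sketched with a plain copying model. This is only a reference semantics, not the implementation from the talk. It assumes that NEW(n, v) builds a sequence of length n filled with v and that SET(S, i, v) returns a new sequence and leaves S unchanged. The names `new` and `set_` are made up for this sketch.

```python
def new(size, init):
    """Assumed meaning of NEW(size, init): `size` copies of `init`."""
    return (init,) * size


def set_(seq, i, value):
    """Assumed meaning of SET(seq, i, value): a copy with index i replaced."""
    return seq[:i] + (value,) + seq[i + 1:]


A = new(5, 0)
B = set_(A, 0, 3)
C = set_(A, 2, 3)   # A is reused here, so B and C branch from the same parent
D = set_(C, 4, 14)
E = set_(D, 1, 11)

for name, seq in [("A", A), ("B", B), ("C", C), ("D", D), ("E", E)]:
    print(name, list(seq))
```

Every SET in this model copies all N elements, which is the cost the talk aims to avoid.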

slide-16
SLIDE 16

Previous Work

  • N = size of array
  • Dietz – O(log log N) per operation
  • Trailer arrays – O(1) for leaves
  • Improvements by Chuang, O'Neill
  • No support for concurrency
slide-17
SLIDE 17

Our Approach

  • Functional
  • Efficient – O(1) for leaves, fast for interior
  • Parallel – wait-free
  • Well-defined cost semantics
slide-18
SLIDE 18

Sequence Implementation

[Figure: implementation of sequences C, D, and E from the earlier example]

slide-19
SLIDE 19

Main Sections

  • Cost dynamics
  • Concurrent implementation
slide-20
SLIDE 20

Fork-Join Parallelism

(1+2) || (3+4)

  • Fork: 1+2 and 3+4 are evaluated in parallel
  • 1+2 evaluates to 3, and 3+4 evaluates to 7
  • Join: the result is the pair (3, 7)
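The fork and join steps can be illustrated with a small sketch using Python's standard thread pool. The helper name `par` is made up for this example, and the talk does not prescribe any particular runtime. The point is the structure (fork two tasks, wait for both, pair the results), not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def par(f, g):
    """Fork f and g as two tasks, then join by waiting for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(f)    # fork
        right = pool.submit(g)   # fork
        return (left.result(), right.result())  # join


result = par(lambda: 1 + 2, lambda: 3 + 4)
print(result)
```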

slide-26
SLIDE 26

Work and Span

[Figure: cost tree with node costs N, log(N), 1, 1, 1, 1]

  • Work: size of cost tree = N + log(N) + 4
  • Span: depth of cost tree = N + log(N) + 2
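A minimal sketch of how work and span can be computed over a cost tree follows. The tree shape used here is an assumption chosen to reproduce the totals on the slide (a node of cost N, then one of cost log(N), then two parallel branches of two unit-cost steps each). The original figure may be laid out differently. The class names `Leaf`, `Serial`, and `Parallel` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    cost: float


@dataclass(frozen=True)
class Serial:
    parts: tuple   # sub-trees run one after another


@dataclass(frozen=True)
class Parallel:
    parts: tuple   # sub-trees run side by side


def work(tree):
    """Total cost of every node: the 'size' of the cost tree."""
    if isinstance(tree, Leaf):
        return tree.cost
    return sum(work(p) for p in tree.parts)


def span(tree):
    """Cost of the longest dependency chain: the 'depth' of the cost tree."""
    if isinstance(tree, Leaf):
        return tree.cost
    if isinstance(tree, Serial):
        return sum(span(p) for p in tree.parts)
    return max(span(p) for p in tree.parts)


N = 1024
branch = Serial((Leaf(1), Leaf(1)))
tree = Serial((Leaf(N), Leaf(math.log2(N)), Parallel((branch, branch))))

print(work(tree), span(tree))
```

Serial composition adds both work and span, while parallel composition adds work but takes the maximum span.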

slide-28
SLIDE 28

Scheduling Theorems

  • Work and span together bound the execution cost on a P-processor machine
  • Goal: evaluate the cost of using sequences on a P-processor machine
  • Sufficient to evaluate work and span
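The slide does not state which scheduling theorem is meant. One standard result of this kind is the greedy-scheduler bound (often credited to Brent): with work W and span S, a greedy scheduler on P processors takes at most W/P + S steps and at least max(W/P, S). The sketch below only evaluates those two expressions. The function names are made up.

```python
def greedy_upper_bound(work, span, processors):
    """W/P + S: upper bound on steps taken by a greedy scheduler."""
    return work / processors + span


def lower_bound(work, span, processors):
    """max(W/P, S): no schedule on P processors can finish faster."""
    return max(work / processors, span)


# Hypothetical program: a million units of work, critical path of 100.
W, S, P = 1_000_000, 100, 10
print(lower_bound(W, S, P), greedy_upper_bound(W, S, P))
```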
slide-29
SLIDE 29

Parallel Structural Dynamics

  • Cost of running program with ∞ processors
  • Deterministic
slide-30
SLIDE 30

Interleaved Structural Dynamics

  • Cost of running program with 1 processor
  • Non-deterministic
slide-31
SLIDE 31

Interleaved Structural Dynamics

  • Store which sequences are interior and leaf
slide-32
SLIDE 32

Work = Non-Deterministic

[Figure: operations GET, SET, GET, GET on A (leaf, size N), followed by a join]

slide-33
SLIDE 33

Work (Good Interleaving)

A (leaf), size N; operations GET, SET, GET, GET, then Join.

Step   Current Work   Total Work
1      1              1
2      1              2
3      1              3
4      1              4

slide-38
SLIDE 38

Work (Bad Interleaving)

A (leaf), size N; operations GET, SET, GET, GET, then Join.

Step   Current Work   Total Work
1      1              1
2      1              2
3      log(N)         2 + log(N)
4      log(N)         2 + 2log(N)
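The two totals can be reproduced with a small cost simulation. It rests on assumptions not spelled out on the slides: a GET on A costs 1 while A is a leaf and log(N) once A has become interior, and the single SET costs 1 and turns A into an interior sequence. The cost of a SET on an interior sequence is not given in the slides, so this sketch refuses to guess it.

```python
import math


def total_work(schedule, n):
    """Charge each operation on sequence A in the order given by `schedule`."""
    a_is_leaf = True
    total = 0.0
    for op in schedule:
        if op == "GET":
            total += 1 if a_is_leaf else math.log2(n)
        elif op == "SET":
            if not a_is_leaf:
                raise NotImplementedError("SET on an interior sequence: cost not given")
            total += 1
            a_is_leaf = False   # A now has a newer version, so it is interior
        else:
            raise ValueError(f"unknown operation: {op}")
    return total


N = 1024
good = total_work(["GET", "GET", "GET", "SET"], N)  # every GET runs before the SET
bad = total_work(["GET", "SET", "GET", "GET"], N)   # two GETs run after the SET
print(good, bad)
```

With N = 1024 the bad order costs 2 + 2·log2(N), matching the last row of the table above.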

slide-42
SLIDE 42

GET-GET Case

[Figure: operations GET, GET, GET, GET on A (leaf, size N), followed by a join]

slide-43
SLIDE 43

SET-GET Case

[Figure: operations GET, SET, GET, GET on A (leaf, size N), followed by a join]

slide-44
SLIDE 44

SET-SET Case

[Figure: operations GET, SET, SET, GET on A (leaf, size N), followed by a join]

slide-45
SLIDE 45

Upper Bounding Work

  • Deterministic evaluational dynamics
  • Store which sequences are leaf and interior
  • Store the number of “cheap” (cost = 1) GETs on each sequence
  • At the join, if the sequence was modified on one side, make the GETs expensive (cost = log(N))
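One possible reading of the join rule, sketched under loud assumptions: each branch records how many cheap GETs it performed on the sequence and whether it modified it, and at the join the cheap GETs of a branch are re-charged at log(N) when the other branch modified the sequence. The slide does not say which side's GETs are re-charged, so this is an interpretation and not the authors' stated rule. `Branch` and `get_cost_bound` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    cheap_gets: int   # GETs that were charged cost 1 inside this branch
    modified: bool    # whether this branch performed a SET on the sequence


def get_cost_bound(left, right, n):
    """Upper bound on total GET cost across both branches at the join."""
    expensive = math.log2(n)
    left_cost = left.cheap_gets * (expensive if right.modified else 1)
    right_cost = right.cheap_gets * (expensive if left.modified else 1)
    return left_cost + right_cost


N = 1024
# Assumed split of the slide's operations: one branch does GET then SET,
# the other branch does GET then GET.
bound = get_cost_bound(Branch(1, True), Branch(2, False), N)
print(bound)
```

Adding the unit cost of the SET gives 22 for N = 1024, which equals 2 + 2log(N), the total of the bad interleaving.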

slide-46
SLIDE 46

Upper Bounding Work

  • Showed that upper bounds are valid for all interleavings
  • Showed that the upper bound is tight*
slide-47
SLIDE 47

A = NEW(5, 0)

[Figure: Seq A (Version 1) points to ArrayData 1 (Version = 1)]

slide-48
SLIDE 48

B = SET(A, 2, 5)

[Figure: Seq A (Version 1) and Seq B (Version 2) both point to ArrayData 1 (now Version = 2). The values array holds 5, with a log entry (Version 1, Value).]
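The picture suggests a versioned structure along the following lines. This is a sequential sketch only, built on assumptions: each sequence is a (version, array data) pair, the array data keeps the newest values plus a per-index log of (version, old value) entries, a sequence is a leaf when its version equals the array data's version, and a SET on an interior sequence falls back to a full copy. The slides say interior operations are "fast" without giving details, so the copy is a placeholder. All names are hypothetical.

```python
class ArrayData:
    def __init__(self, values):
        self.version = 1
        self.values = list(values)
        self.logs = [[] for _ in values]   # per index: (version, old value)


class Seq:
    def __init__(self, data, version):
        self.data = data
        self.version = version


def new(size, init):
    return Seq(ArrayData([init] * size), 1)


def get(seq, i):
    # The first log entry at or after our version holds the value we saw.
    for version, old in seq.data.logs[i]:
        if seq.version <= version:
            return old
    return seq.data.values[i]


def set_(seq, i, value):
    data = seq.data
    if seq.version == data.version:                    # leaf: update in place
        data.logs[i].append((seq.version, data.values[i]))
        data.values[i] = value
        data.version += 1
        return Seq(data, data.version)
    fresh = [get(seq, j) for j in range(len(data.values))]  # interior: copy
    fresh[i] = value
    return Seq(ArrayData(fresh), 1)


def to_list(seq):
    return [get(seq, j) for j in range(len(seq.data.values))]


A = new(5, 0)
B = set_(A, 2, 5)       # A is a leaf, so B shares A's array data
C = set_(A, 0, 7)       # A is now interior, so C gets its own copy
print(to_list(A), to_list(B), to_list(C))
```

After `B = set_(A, 2, 5)`, reading index 2 through A consults the log and returns 0, while reading through B returns 5 straight from the values array.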

slide-49
SLIDE 49

Naïve SET

  • Implementation of SET(A, i, v)
  • First set values[i] = v
  • Then add a log entry to arraydata
slide-50
SLIDE 50

GET-SET Race

Sequence A, version 1
Array data AD, version 1
Values = [0, 0, 0, 0, 0]
Logs = empty

Step   Thread 1                    Thread 2     Result
1      Values[2] = 5                            ✓
2                                  GET(A, 2)    5
3      Add log entry to Logs[2]                 ✓

GET(A, 2) returns 5, but A is version 1 and should still read 0.

slide-54
SLIDE 54

A Wait-Free Solution

  • Can be fixed by adding the log entry before mutating the values array
  • Other issues in GET require careful ordering
  • Other issues in SET require compare & swap
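The effect of the two orderings can be replayed step by step in a single thread. This sketch does not use real threads, and it leaves out the careful ordering inside GET and the compare & swap in SET that the slide mentions. It assumes a per-index log of (version, old value) entries, which is a guess at the layout. `ArrayData` and `get` are hypothetical names.

```python
class ArrayData:
    def __init__(self, size, init):
        self.values = [init] * size
        self.logs = [[] for _ in range(size)]   # per index: (version, old value)


def get(data, version, i):
    """Read index i as seen by a sequence with the given version."""
    for logged_version, old in data.logs[i]:
        if version <= logged_version:
            return old
    return data.values[i]


# Naive order: mutate first, log second. A reader slips in between.
naive = ArrayData(5, 0)
naive.values[2] = 5                    # step 1 (thread 1)
naive_read = get(naive, 1, 2)          # step 2 (thread 2): GET(A, 2)
naive.logs[2].append((1, 0))           # step 3 (thread 1)

# Log-first order: record the old value, then mutate.
fixed = ArrayData(5, 0)
fixed.logs[2].append((1, fixed.values[2]))   # step 1 (thread 1)
fixed_read = get(fixed, 1, 2)                # step 2 (thread 2): GET(A, 2)
fixed.values[2] = 5                          # step 3 (thread 1)

print(naive_read, fixed_read)
```

In the naive replay the version 1 reader sees the new value 5. In the log-first replay it finds the log entry and sees the old value 0, while a version 2 reader sees 5 once the write completes.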
slide-55
SLIDE 55

Experimental Results

  • Compared sequences to regular arrays
  • Random & sequential accesses
  • Writing: 2-3 times slower
  • Reading: under 10% slower
slide-56
SLIDE 56

Concurrent Results

  • Compared
    – 1 thread reading a million times
    – 2 threads reading half a million times
  • 2 threads were > 1.75 times faster
slide-57
SLIDE 57

Summary

  • Functional array implementation
  • O(1) operations for leaf
  • Wait-free concurrent
  • Well-defined cost semantics
slide-58
SLIDE 58

Future Work

  • Prove concurrent costs of sequence implementation
  • Tighter cost bounds
  • Extend to disjoint sets, unordered sets
  • Lower bound for functional array costs
slide-59
SLIDE 59

Acknowledgements

  • Joe Tassarotti for lots of advice on the correctness proof
  • Danny Sleator for ideas on lower bounds for functional array costs
  • NSF, Air Force Office, Intel for grants