Parallel Functional Arrays

Parallel Functional Arrays: Ananya Kumar, Guy Blelloch, Robert Harper



  1. Parallel Functional Arrays. Ananya Kumar, Guy Blelloch, Robert Harper. Carnegie Mellon University

  2. Goals • Functional arrays • Efficient (constant time) • Parallel • Well defined cost semantics

  3. Previous Work - Monads • Thread mutable state • Enforce single reference to array • Need completely different code • Not parallel

  4. Previous Work – Specialized Type System • Enforce single threadedness of arrays • Not available in most languages • Hard to reason about

  5. Previous Work – Reference Counting • Check reference counts • If one, update in place, else copy • Depends on compiler • Hard to reason about

  6. Sequences • A = NEW(5, 0) gives [0, 0, 0, 0, 0] • B = SET(A, 0, 3) gives [3, 0, 0, 0, 0] • C = SET(A, 2, 3) gives [0, 0, 3, 0, 0] • D = SET(C, 4, 14) gives [0, 0, 3, 0, 14] • E = SET(D, 1, 11) gives [0, 11, 3, 0, 14]
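The semantics of slide 6 can be pinned down with a small reference model. This is a minimal sketch that copies on every SET, so it costs O(N) per update and is not the constant-time implementation the talk describes; the names `new` and `set_` are illustrative stand-ins for NEW and SET.

```python
def new(n, v):
    """Return a sequence of n copies of v (an immutable tuple)."""
    return (v,) * n


def set_(seq, i, v):
    """Return a new sequence equal to seq except at index i; seq is unchanged."""
    return seq[:i] + (v,) + seq[i + 1:]


# The chain of updates from slide 6.
A = new(5, 0)
B = set_(A, 0, 3)
C = set_(A, 2, 3)
D = set_(C, 4, 14)
E = set_(D, 1, 11)
```

Every earlier sequence stays readable after later updates, which is the property a functional array has to preserve while avoiding the copy.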

  16. Previous Work • N = size of array • Dietz – O(log log N) per operation • Trailer arrays – O(1) for leaves • Improvements by Chuang, O’Neill • No support for concurrency

  17. Our Approach • Functional • Efficient – O(1) for leaves, fast for interior • Parallel – wait-free • Well defined cost semantics

  18. Sequence Implementation [diagram: sequences C, D, E over the shared values 0 11 3 0 14, with labels 2, 3, 4]

  19. Main Sections • Cost dynamics • Concurrent implementation

  20. Fork-Join Parallelism (1+2) || (3+4)

  21. Fork-Join Parallelism (1+2) || (3+4) Fork

  22. Fork-Join Parallelism (1+2) || (3+4) 3+4 1+2

  23. Fork-Join Parallelism (1+2) || (3+4) 3+4 1+2 7 3

  24. Fork-Join Parallelism (1+2) || (3+4) 3+4 1+2 7 3 Join

  25. Fork-Join Parallelism (1+2) || (3+4) 3+4 1+2 7 3 (3, 7)
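The fork-join steps on slides 20 to 25 can be sketched with Python's standard thread pool. The helper name `par` is an assumption for illustration; the talk does not name a concrete API.

```python
from concurrent.futures import ThreadPoolExecutor


def par(f, g):
    """Fork f and g as two tasks, join on both, and return the pair of results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(f)    # fork the left branch
        right = pool.submit(g)   # fork the right branch
        # result() blocks until the task finishes, so this line is the join.
        return (left.result(), right.result())


result = par(lambda: 1 + 2, lambda: 3 + 4)
```

Here `result` is the pair of branch values, matching the (3, 7) shown on slide 25.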

  26. Work and Span • Work: size of cost tree • Span: depth of cost tree [diagram: cost tree with node costs N, log(N), 1, 1, 1, 1]

  27. Work and Span • Work: N + log(N) + 4 • Span: N + log(N) + 2 [diagram: cost tree with node costs N, log(N), 1, 1, 1, 1]

  28. Scheduling Theorems • Work + Span gives execution cost on a P-processor machine • Goal: evaluate cost of using sequences on a P-processor machine • Sufficient to evaluate work and span
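Work and span can be computed mechanically from a cost tree. The sketch below assumes a tree built from leaves, sequential composition, and parallel composition, and it uses the standard greedy-scheduler bound T_P ≤ W/P + S; that this is the exact theorem the talk relies on is an assumption. The class names and the example tree are illustrative and are not the tree from slides 26 and 27.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    cost: int          # cost of one primitive step


@dataclass
class Seq:
    left: object       # runs first
    right: object      # runs after left finishes


@dataclass
class Par:
    left: object       # both branches may run at the same time
    right: object


def work(t):
    """Total cost of all nodes: the running time on one processor."""
    if isinstance(t, Leaf):
        return t.cost
    return work(t.left) + work(t.right)


def span(t):
    """Cost of the longest dependency chain: the running time with unbounded processors."""
    if isinstance(t, Leaf):
        return t.cost
    if isinstance(t, Seq):
        return span(t.left) + span(t.right)
    return max(span(t.left), span(t.right))


def greedy_bound(w, s, p):
    """Upper bound on running time with p processors under greedy scheduling."""
    return w / p + s


tree = Seq(Leaf(1), Seq(Par(Leaf(8), Leaf(3)), Leaf(1)))
w, s = work(tree), span(tree)
```

For this tree the work is 13 and the span is 10, so the bound for two processors is 13/2 + 10 = 16.5.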

  29. Parallel Structural Dynamics • Cost of running program with ∞ processors • Deterministic

  30. Interleaved Structural Dynamics • Cost of running program with 1 processor • Non-deterministic

  31. Interleaved Structural Dynamics • Store which sequences are interior and leaf

  32. Work = Non-Deterministic A (leaf), size N GET GET GET SET Join

  33. Work (Good Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 1 GET SET Join

  34. Work (Good Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 2 GET SET Join

  35. Work (Good Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 3 GET SET Join

  36. Work (Good Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 4 GET SET Join

  37. Work = Non-Deterministic A (leaf), size N GET GET GET SET Join

  38. Work (Bad Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 1 GET SET Join

  39. Work (Bad Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 2 GET SET Join

  40. Work (Bad Interleaving) A (leaf), size N GET GET Current Work: log(N) Total Work: 2 + log(N) GET SET Join

  41. Work (Bad Interleaving) A (leaf), size N GET GET Current Work: log(N) Total Work: 2 + 2log(N) GET SET Join

  42. GET-GET Case A (leaf), size N GET GET GET GET Join

  43. SET-GET Case A (leaf), size N GET GET GET SET Join

  44. SET-SET Case A (leaf), size N SET GET GET SET Join

  45. Upper Bounding Work • Deterministic evaluational dynamics • Store which sequences are leaf and interior • Store the number of “cheap” (cost = 1) GETs on each sequence • At the join, if sequence was modified on one side, make the GETs expensive (cost = log(N))
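The bookkeeping on slide 45 can be sketched for a single leaf sequence shared by two parallel branches. This is one reading of the rule, stated as an assumption: the GETs counted on one branch are charged log(N) each when the other branch modified the sequence, and 1 each otherwise. The function name and parameters are hypothetical, SET costs are not modelled, and log is taken base 2.

```python
import math


def get_cost_bound(n, gets_left, gets_right, set_left, set_right):
    """Upper bound on the total GET cost for one leaf sequence of size n.

    gets_left / gets_right: number of GETs issued by each branch.
    set_left / set_right: whether each branch performed a SET on the sequence.
    """
    expensive = math.log2(n)
    # A branch's GETs stay cheap only if the other branch never modified the sequence.
    left = gets_left * (expensive if set_right else 1)
    right = gets_right * (expensive if set_left else 1)
    return left + right


# Slides 38 to 41: left branch does GET GET, right branch does GET SET, N = 1024.
bad_case = get_cost_bound(1024, gets_left=2, gets_right=1, set_left=False, set_right=True)
```

With N = 1024 the GETs are bounded by 2·log(N) + 1 = 21; adding a unit cost for the SET gives 2 + 2·log(N), the total shown for the bad interleaving on slide 41.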

  46. Upper Bounding Work • Showed that upper bounds are valid for all interleavings • Showed that the upper bound is tight *

  47. A = NEW(5, 0) • Seq A: Version 1 • ArrayData 1 (Version = 1): values 0 0 0 0 0

  48. B = SET(A, 2, 5) • Seq A: Version 1 • Seq B: Version 2 • ArrayData 1 (Version = 2): values 0 0 5 0 0 • Log entry: Version 1, Value 0
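The version-and-log layout on slides 47 and 48 can be sketched sequentially as follows. This is a single-threaded illustration under stated assumptions, not the talk's wait-free implementation: the class and function names are hypothetical, a SET on an interior sequence is assumed to copy, and an interior GET scans the log linearly for brevity.

```python
class ArrayData:
    """Shared storage: current values plus a per-index log of overwritten values."""

    def __init__(self, n, v):
        self.version = 1
        self.values = [v] * n
        self.logs = [[] for _ in range(n)]  # logs[i] holds (version, old_value) pairs


class Seq:
    """A sequence is a handle: shared array data plus the version it observes."""

    def __init__(self, data, version):
        self.data = data
        self.version = version


def new(n, v):
    return Seq(ArrayData(n, v), 1)


def get(seq, i):
    # An entry tagged with the reader's version or later holds the value it saw.
    for version, old_value in seq.data.logs[i]:
        if version >= seq.version:
            return old_value
    return seq.data.values[i]


def set_(seq, i, v):
    data = seq.data
    if seq.version != data.version:
        # Interior sequence: copy its view into fresh storage (assumed policy).
        fresh = new(len(data.values), v)
        fresh.data.values = [get(seq, j) for j in range(len(data.values))]
        fresh.data.values[i] = v
        return fresh
    # Leaf sequence: log the old value, then update in place in O(1).
    data.logs[i].append((seq.version, data.values[i]))
    data.values[i] = v
    data.version += 1
    return Seq(data, data.version)


A = new(5, 0)
B = set_(A, 2, 5)
```

After these two calls, A and B share one ArrayData at version 2 with values 0 0 5 0 0 and one log entry (version 1, value 0), as on slide 48; reading index 2 gives 0 through A and 5 through B.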

  49. Naïve SET • Implementation of SET(A, i, v) • First set values[i] = v • Then add a log entry to arraydata

  50-53. GET-SET Race • Setup: Sequence A, version 1; array data AD, version 1; Values = [0, 0, 0, 0, 0]; Logs = empty • Step 1 (Thread 1): Values[2] = 5 ✓ • Step 2 (Thread 2): GET(A, 2) returns 5, although A was never modified and should still read 0 • Step 3 (Thread 1): Add log entry to Logs[i] ✓

  54. A Wait-Free Solution • Can be fixed by adding log entry before mutating values array • Other issues in GET require careful ordering • Other issues in SET require compare & swap
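The effect of the ordering can be replayed deterministically in one thread. The sketch below is not concurrent code and leaves out the compare & swap and the other ordering issues the slide mentions; the lookup rule in `get` and all names are assumptions for illustration. It runs thread 2's GET(A, 2) between the two steps of thread 1's SET, once in the naive order and once with the log entry first.

```python
def get(values, logs, reader_version, i):
    """Read index i as seen by a sequence with the given version (assumed rule)."""
    for version, old_value in logs[i]:
        if version >= reader_version:
            return old_value
    return values[i]


def replay(log_first):
    """Replay SET(A, 2, 5) with a GET(A, 2) by version 1 between its two steps."""
    values = [0, 0, 0, 0, 0]
    logs = [[] for _ in values]

    def write_value():
        values[2] = 5

    def add_log():
        logs[2].append((1, 0))  # version 1 saw the value 0 at index 2

    steps = (add_log, write_value) if log_first else (write_value, add_log)
    steps[0]()
    seen = get(values, logs, 1, 2)  # thread 2 reads in the middle of the SET
    steps[1]()
    return seen


naive_result = replay(log_first=False)
fixed_result = replay(log_first=True)
```

In the naive order the reader sees 5 even though A was never modified; with the log entry written first it sees 0.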

  55. Experimental Results • Compared sequences to regular arrays • Random & sequential accesses • Writing: 2-3 times slower • Reading: under 10% slower

  56. Concurrent Results • Compared 1 thread reading a million times with 2 threads reading half a million times • 2 threads were > 1.75 times faster

  57. Summary • Functional array implementation • O(1) operations for leaf • Wait-free concurrent • Well defined cost semantics

  58. Future Work • Prove concurrent costs of sequence implementation • Tighter cost bounds • Extend to disjoint sets, unordered sets • Lower bound for functional array costs

  59. Acknowledgements • Joe Tassarotti for lots of advice on correctness proof • Danny Sleator for ideas on lower bounds for functional array costs • NSF, Air Force Office, Intel for grants
