SLIDE 1

A Compiler Representation for Incremental Parallelization

Christoph Angerer and Thomas Gross ETH Zurich

SLIDE 2

Parallel Continuation Passing Style

  • Unified intermediate representation for:
      • Fully sequential code
      • Parallel code
      • Advanced control flow
  • Allows gradual parallelization of sequential programs

SLIDE 3

Background

  • Compilers use suitable internal representations for:
      • Analysis
      • Program transformations / optimizations
  • Common IRs:
      • Static Single Assignment (SSA)
      • Continuation-Passing Style (CPS)
  • Automated translation from source into IR

SLIDE 4

Problem

  • Current IRs lack support for parallelism:
      • No way for the compiler to trade off/select the grain size of parallelization
      • No way to incrementally transform sequential programs into parallel versions
  • In CPS: there can be only one tail call
    ⇒ It is impossible to fork computation (in the IR)

SLIDE 5

Brief Review of CPS

  • A function never returns to its caller
  • Instead, a function expects a continuation function as an additional parameter
  • A return is replaced with a (tail-)call to this continuation, passing the result value

[Figure: control flow between code blocks A, B, and C under call/return and under CPS]

  call/return:  r=b(...);  return res;
  CPS:          b(..., fun c);  c(res);  fun (int res) { ... }
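The contrast between the two styles can be sketched in Python. This is an illustration only: the function names (`b_direct`, `a_direct`, `b_cps`, `a_cps`) and the arithmetic are hypothetical and not taken from the slides, and Python does not eliminate tail calls, so only the shape of the control flow carries over.

```python
# Direct style: b returns a value to its caller, which then continues.
def b_direct(x):
    return x * 2

def a_direct(x):
    r = b_direct(x)   # call b ...
    return r + 1      # ... and continue after the return (block "C")

# CPS: b never returns a result; it (tail-)calls the continuation c instead.
def b_cps(x, c):
    c(x * 2)

def a_cps(x, ret):
    # The code after the call (block "C") becomes an explicit function
    # that receives the result as its parameter.
    b_cps(x, lambda res: ret(res + 1))

results = []
a_cps(20, results.append)   # the final continuation just records the value
```

Both versions compute the same value; in the CPS version the "rest of the computation" is an ordinary value that is passed around.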

SLIDE 6

fib with call/return

int fib(int k) {
  if (k <= 2) return 1;
  else return fib(k-1) + fib(k-2);
}

SLIDE 8

CPS Example

fun fib(int k, fun ret) {
  if (k <= 2) ret(1);
  else ret(
    //fib(k-1) + fib(k-2)
  );
}

SLIDE 11

CPS Example

fun fib(int k, fun ret) {
  if (k <= 2) ret(1);
  else
    //fib(k-1) → left
    //fib(k-2) → right
    ret(left + right);
}

[Figure labels 1, 2, 3 mark the evaluation steps.]

SLIDE 12

CPS Example

fun fib(int k, fun ret) {
  if (k <= 2) ret(1);
  else
    fib(k-1, fun(left) {       //fib(k-1) → left
      fib(k-2, fun(right) {    //fib(k-2) → right
        ret(left + right);
      })
    })
}
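A runnable counterpart of the CPS fib may help when following the nesting of continuations. The sketch below is in Python; `fib_cps` and `out` are hypothetical names, and the structure mirrors the slide under the assumption that `ret` is an ordinary one-argument function.

```python
def fib_cps(k, ret):
    """CPS fib: the result is passed to `ret` instead of being returned."""
    if k <= 2:
        ret(1)
    else:
        # The continuation of fib(k-1) receives `left` and starts fib(k-2);
        # the continuation of fib(k-2) receives `right` and delivers the sum.
        fib_cps(k - 1, lambda left:
            fib_cps(k - 2, lambda right:
                ret(left + right)))

out = []
fib_cps(10, out.append)   # out now holds fib(10)
```

Each call has exactly one successor, which is why this form fixes a strictly sequential evaluation order.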

SLIDE 13

Basic Idea of pCPS

  • Relax tail-call restriction
      • Allow more than one successor
      • Enable forking of computation
  • Explicit happens-before relationships
      • Part of the IR
      • Can be analyzed and changed by the compiler

SLIDE 14

Parallel CPS

class Main {
  task t() {
    schedule(this.foo());
  }
  task foo() { ... }
  ...
}

SLIDE 18

Parallel CPS

class Main {
  task t() {
    schedule(this.foo());
    schedule(this.bar(42));
  }
  task foo() { ... }
  task bar(int x) { ... }
  ...
}

SLIDE 21

Parallel CPS

class Main {
  task t() {
    Activation a = schedule(this.foo());
    Activation b = schedule(this.bar(42));
    Activation c = schedule(this.blubb(a, b));
  }
  task foo() { ... }
  task bar(int x) { ... }
  task blubb(Activation x, Activation y) { ... }
  ...
}

SLIDE 25

Parallel CPS

class Main {
  task t() {
    Activation a = schedule(this.foo());
    Activation b = schedule(this.bar(42));
    Activation c = schedule(this.blubb(a, b));
    a → c;
    b → c;
  }
  task foo() { ... }
  task bar(int x) { ... }
  task blubb(Activation x, Activation y) { ... }
  ...
}

SLIDE 28

Parallel CPS

class Main {
  task t(Activation later) {
    Activation a = schedule(this.foo());
    Activation b = schedule(this.bar(42));
    Activation c = schedule(this.blubb(a, b));
    a → c;
    b → c;
    c → later;
  }
  task foo() { ... }
  task bar(int x) { ... }
  task blubb(Activation x, Activation y) { ... }
  ...
}

SLIDE 31

pCPS in a Nutshell

  • A task is similar to a method:
      • Code that is executed in the context of this
  • Instead of calling a task, one schedules it for later execution:

      Activation b = schedule(this.bar(42));

  • The →-statement creates an explicit happens-before relationship:

      a → b;

SLIDE 32

pCPS in a Nutshell (2)

  • The currently executing Activation is accessible through the keyword now
  • Implicit happens-before relationship between now and a newly scheduled task:

      Activation a = schedule(this.bar(42));
      /* implicit: now → a; */

  • At runtime, a scheduler constantly chooses executable ⟨object, task()⟩ pairs (Activations)
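The semantics of schedule, the →-statement, and now can be sketched as a small single-threaded runtime in Python. This is a hypothetical illustration, not the authors' implementation: the `Scheduler` class, its `before` method (standing in for the →-statement), and the simplification that `blubb` takes no arguments are all assumptions. The sketch always picks the first ready activation, whereas a real scheduler is free to pick any ready one, possibly running several in parallel.

```python
class Activation:
    def __init__(self, fn, args):
        self.fn, self.args = fn, args
        self.pending = 0       # number of unfinished predecessors
        self.successors = []
        self.done = False

class Scheduler:
    def __init__(self):
        self.now = None        # the currently executing activation
        self.activations = []

    def schedule(self, fn, *args):
        a = Activation(fn, args)
        self.activations.append(a)
        if self.now is not None:
            self.before(self.now, a)   # implicit: now → a
        return a

    def before(self, a, b):
        """Explicit happens-before edge a → b."""
        if not a.done:         # an edge from a finished activation is already satisfied
            a.successors.append(b)
            b.pending += 1

    def run(self):
        while True:
            ready = [a for a in self.activations
                     if not a.done and a.pending == 0]
            if not ready:
                break
            self.now = ready[0]
            self.now.fn(*self.now.args)
            self.now.done = True
            for s in self.now.successors:
                s.pending -= 1
            self.now = None

log = []
s = Scheduler()

def foo(): log.append("foo")
def bar(x): log.append(f"bar({x})")
def blubb(): log.append("blubb")

def t():
    a = s.schedule(foo)
    b = s.schedule(bar, 42)
    c = s.schedule(blubb)
    s.before(a, c)   # a → c
    s.before(b, c)   # b → c

s.schedule(t)
s.run()
```

Only the two explicit edges constrain `blubb`; `foo` and `bar(42)` are unordered with respect to each other, so a parallel scheduler could run them concurrently.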

SLIDE 33

fib in pCPS

task fib(int k, Activation later) {
  ... } else {
    //make sum available in closures
    Activation sum;
    Activation left = schedule(fib(k-1, sum));
    Activation right = schedule(fib(k-2, sum));
    sum = schedule(this.sum(left, right));
    left → right;    //left-to-right evaluation
    right → sum;
    sum → later;
  }
}
task sum(Activation left, Activation right) { ... }

SLIDE 37

fib in pCPS

[Figure: activation graph for the fib task shown on slide 33, with nodes ⟨fib(k)⟩, ⟨fib(k-1)⟩, ⟨fib(k-2)⟩, ⟨sum()⟩, and ⟨later()⟩]
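To see the pCPS fib run end to end, the following Python sketch pairs it with a minimal sequential runtime. Several things here are assumptions rather than content of the slides: the `Runtime` class, the `before` method standing in for the →-statement, and in particular the `result` slot on each activation, since the slides elide how values travel between tasks.

```python
class Activation:
    def __init__(self, fn, args):
        self.fn, self.args = fn, args
        self.pending = 0        # unfinished predecessors
        self.successors = []
        self.done = False
        self.result = None      # assumption: slot for passing values between tasks

class Runtime:
    def __init__(self):
        self.now, self.activations = None, []

    def schedule(self, fn, *args):
        a = Activation(fn, args)
        self.activations.append(a)
        if self.now is not None:
            self.before(self.now, a)    # implicit: now → a
        return a

    def before(self, a, b):             # a → b
        if not a.done:
            a.successors.append(b)
            b.pending += 1

    def run(self):
        while True:
            ready = [a for a in self.activations
                     if not a.done and a.pending == 0]
            if not ready:
                return
            self.now = ready[0]
            self.now.fn(*self.now.args)
            self.now.done = True
            for s in self.now.successors:
                s.pending -= 1
            self.now = None

rt = Runtime()

def fib(k, later):
    me = rt.now                 # the activation executing this fib(k)
    if k <= 2:
        me.result = 1
        return
    def add():                  # plays the role of task sum(left, right)
        me.result = left.result + right.result
    total = rt.schedule(add)
    left = rt.schedule(fib, k - 1, total)
    right = rt.schedule(fib, k - 2, total)
    rt.before(left, right)      # left → right: left-to-right evaluation
    rt.before(right, total)     # right → sum
    rt.before(total, later)     # sum → later

results = []
final = rt.schedule(lambda: results.append(root.result))
root = rt.schedule(fib, 10, final)
rt.before(root, final)
rt.run()
```

Because every nested sum activation is ordered before its enclosing `later`, the outer sum only runs after both subtrees have stored their results.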

SLIDE 38

pCPS as an IR

  • Compiler can gradually increase/decrease parallelism
      • By adding/removing happens-before relationships
      • By combining/splitting tasks
  • Programmer may provide annotations to allow/disallow/support certain optimizations
  • In fib(): computation of the left- and right-hand sides of the + can be done in parallel
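One way to quantify "increasing parallelism" is the length of the longest happens-before chain: the shorter the chain, the more activations may overlap. The Python sketch below computes that length for a single level of fib's activation graph before and after dropping the left → right edge. The helper `span`, the node names, and the one-level view (nested activations are ignored) are assumptions made for illustration.

```python
from functools import lru_cache

def span(nodes, edges):
    """Number of activations on the longest happens-before chain."""
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)

    @lru_cache(maxsize=None)
    def longest_from(n):
        return 1 + max((longest_from(m) for m in succ[n]), default=0)

    return max(longest_from(n) for n in nodes)

nodes = ["fib(k)", "fib(k-1)", "fib(k-2)", "sum", "later"]
sequential = [
    ("fib(k)", "fib(k-1)"), ("fib(k)", "fib(k-2)"), ("fib(k)", "sum"),  # implicit now → a
    ("fib(k-1)", "fib(k-2)"),   # left → right
    ("fib(k-2)", "sum"),        # right → sum
    ("sum", "later"),           # sum → later
]
# Drop left → right, but keep left ordered before sum.
parallel = [e for e in sequential if e != ("fib(k-1)", "fib(k-2)")]
parallel.append(("fib(k-1)", "sum"))

seq_span = span(nodes, sequential)   # chain through all five activations
par_span = span(nodes, parallel)     # left and right are no longer ordered
```

In this sketch the chain shrinks from five activations to four, because `fib(k-1)` and `fib(k-2)` may now run side by side.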

SLIDE 39

Removing Unnecessary Edges

[Figure: happens-before graph over the activations ⟨fib(k)⟩, ⟨fib(k-1)⟩, ⟨fib(k-2)⟩, ⟨sum()⟩, and ⟨later()⟩ with edges labeled e, f, g. Edge f is marked as an unnecessary edge and removed, leaving e and g.]

Need to fix transitive ordering:

  ⟨fib(k)⟩ → ⟨fib(k-2)⟩
  ⟨fib(k-1)⟩ → ⟨sum()⟩

SLIDE 43

Fix Transitive Ordering

[Figure: the original graph over ⟨fib(k)⟩, ⟨fib(k-1)⟩, ⟨fib(k-2)⟩, ⟨sum()⟩, and ⟨later()⟩ with edges e, f, g, next to the transformed graph, which is built up step by step: first with edges e and g, then with e, e’, g, and finally with e, e’, g, g’.]
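The repair step can be sketched as a small graph transformation in Python. The rule used here is an assumption about what the figure shows: when an edge u → v is removed, every direct predecessor of u is connected to v and u is connected to every direct successor of v, so that all orderings other than u before v survive. The function name `remove_edge` and the mapping of the labels e, f, g onto concrete edges are likewise hypothetical.

```python
def remove_edge(edges, u, v):
    """Remove u → v and re-add the orderings that held only through it."""
    assert (u, v) in edges
    preds_u = {a for a, b in edges if b == u}   # direct predecessors of u
    succs_v = {b for a, b in edges if a == v}   # direct successors of v
    result = set(edges) - {(u, v)}
    result |= {(p, v) for p in preds_u}         # predecessors of u still precede v
    result |= {(u, s) for s in succs_v}         # u still precedes successors of v
    return result

edges = {
    ("fib(k)", "fib(k-1)"),     # assumed to be edge e
    ("fib(k-1)", "fib(k-2)"),   # assumed to be edge f
    ("fib(k-2)", "sum"),        # assumed to be edge g
    ("sum", "later"),
}
fixed = remove_edge(edges, "fib(k-1)", "fib(k-2)")
```

Under these assumptions the two edges added by `remove_edge` are exactly the orderings ⟨fib(k)⟩ → ⟨fib(k-2)⟩ and ⟨fib(k-1)⟩ → ⟨sum()⟩ named on the slides.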

SLIDE 54

Related Work

  • SSA for parallel programs
      • J. Lee, S. Midkiff, and D. A. Padua. Concurrent Static Single Assignment Form and Constant Propagation for Explicitly Parallel Programs
      • H. Srinivasan, J. Hook, and M. Wolfe. Static Single Assignment Form for Explicitly Parallel Programs
  • OpenMP and Cilk
      • K. Randall. Cilk: Efficient Multithreaded Computing
  • Erbium
      • C. Miranda, P. Dumont, A. Cohen, M. Duranton, and A. Pop. Erbium: A Deterministic, Concurrent Intermediate Representation for Portable and Scalable Performance

SLIDE 55

Concluding Remarks

  • Current compiler representations lack support for parallel constructs
  • pCPS allows a compiler to incrementally increase (and decrease) parallelism
      • Starting from a sequential program
      • By adding/removing edges
      • By combining/splitting tasks
  • Different independent optimizations can be integrated into a single optimizing compiler

SLIDE 56

Questions?