

Schedule Trees

Sven Verdoolaege¹, Serge Guelton², Tobias Grosser³, Albert Cohen³

¹ INRIA, École Normale Supérieure and KU Leuven
² École Normale Supérieure and Télécom Bretagne
³ INRIA and École Normale Supérieure

January 20, 2014


Outline

1. Introduction
   ◮ Example
   ◮ Single Statement
   ◮ Multiple Statements
   ◮ Schedule Trees
2. Advantages
   ◮ Useful in several contexts
   ◮ More natural
   ◮ More convenient
   ◮ More expressive
   ◮ Extensible
3. Conclusion

1. Introduction


Introductory Example

    for (i = 0; i <= N; ++i)
    S:  a[i] = g(i);
    for (i = 0; i <= N; ++i)
    T:  b[i] = f(a[N-i]);

Fused loop (alternative order):

    for (i = 0; i <= N; ++i) {
        a[i] = g(i);
        b[N-i] = f(a[i]);
    }

Iteration domain

{ S[i] : 0 ≤ i ≤ N; T[i] : 0 ≤ i ≤ N }

Dependences

{ S[i] → T[N − i] : 0 ≤ i ≤ N }

Execution Order

◮ Original Order

S[0], S[1], S[2], . . . , S[N − 1], S[N], T[0], T[1], T[2], . . . , T[N − 1], T[N]

◮ Alternative Order

S[0], T[N], S[1], T[N − 1], S[2], T[N − 2], . . . , S[N − 1], T[1], S[N], T[0]


Expressing Transformations (Single Statement)

    for (i = 0; i <= N; ++i)
        b[i] = f(a[N-i]);
⇒
    for (i = 0; i <= N; ++i)
        b[N-i] = f(a[i]);

Two approaches

1. Modify Iteration Domain

    T[i] → T′[N − i]

◮ iteration domains have an implicit execution order (lexicographic order)
◮ AST generator takes the modified iteration domain as input
◮ access relations and dependence relations are adjusted accordingly

2. Explicit Schedule

    T[i] → [N − i]

◮ iteration domains have no implicit execution order
◮ execution order is determined by the schedule space (lexicographic order)
◮ AST generator takes the iteration domain and the schedule as input
◮ schedule is typically a piecewise quasi-affine function



Representing Schedules for Multiple Statements

    for (i = 0; i <= N; ++i)
        a[i] = g(i);
    for (i = 0; i <= N; ++i)
        b[i] = f(a[N-i]);

Fused loop:

    for (i = 0; i <= N; ++i) {
        a[i] = g(i);
        b[N-i] = f(a[i]);
    }

Informal
◮ original: first S[i] with S[i] → [i], then T[i] with T[i] → [i]
◮ fused: S[i] → [i]; T[i] → [N − i], with first S[i], then T[i] for equal schedule values

Kelly
◮ original: S : { [i] → [0, i] }, T : { [i] → [1, i] }
◮ fused: S : { [i] → [i, 0] }, T : { [i] → [N − i, 1] }

union map
◮ original: { S[i] → [0, i]; T[i] → [1, i] }
◮ fused: { S[i] → [i, 0]; T[i] → [N − i, 1] }

⇒ encode statement ordering in affine function

schedule tree
◮ original:

    sequence
    ├── filter { S[i] }
    │   └── band { S[i] → [i] }
    └── filter { T[i] }
        └── band { T[i] → [i] }

◮ fused:

    band { S[i] → [i]; T[i] → [N − i] }
    └── sequence
        ├── filter { S[i] }
        └── filter { T[i] }

Other representations:
◮ “2d + 1”: special case of Kelly’s abstraction
◮ band forest: precursor to schedule trees


Schedule Trees

    domain { S[i] : 0 ≤ i ≤ N; T[i] : 0 ≤ i ≤ N }
    └── context { : N mod 256 = 0 }
        └── sequence
            ├── filter { S[i] }
            │   └── band { S[i] → [i] }
            └── filter { T[i] }
                └── band { T[i] → [i] }

Core node types

◮ Band: multi-dimensional piecewise quasi-affine partial schedule
◮ Filter: selects statement instances that are executed by descendants
◮ Sequence: children executed in given order
◮ Set: children executed in arbitrary order

“External” node types

◮ Domain: set of statement instances to be scheduled
◮ Context: external constraints on symbolic constants

Convenience node types

◮ Mark: attach additional information to subtrees
◮ Leaf: for easy navigation


Comparison

Kelly’s abstraction

    T1 : { [i] → [0, i] }
    T2 : { [i, j] → [1, j, 0, i] }
    T3 : { [i] → [1, i − 1, 1] }

◮ schedule spread over statements
◮ relaxed lexicographic order

union maps

    { S1[i] → [0, i, 0, 0]; S2[i, j] → [1, j, 0, i]; S3[i] → [1, i − 1, 1, 0] }

◮ single object
◮ strict lexicographic order
◮ schedule transformations can be composed

schedule trees

    sequence
    ├── filter { S1[i] }
    │   └── band { S1[i] → [i] }
    └── filter { S2[i, j]; S3[i] }
        └── band { S2[i, j] → [j]; S3[i] → [i − 1] }
            └── sequence
                ├── filter { S2[i, j] }
                │   └── band { S2[i, j] → [i] }
                └── filter { S3[i] }

◮ single object
◮ relaxed lexicographic order

2. Advantages


Schedule Uses

(Figure: the schedule in the compilation flow. Extracting the original order yields a schedule that feeds dependence analysis; the dependences drive scheduling; the resulting schedule may be transformed further and is passed to AST generation.)

Representing the original execution order
◮ Input to dependence analysis (in isl)
◮ Basis for manual/incremental transformations

Scheduling
◮ Construction based on dependences
◮ Schedule modifications

AST generation
◮ Generate AST from schedule


Schedule Trees Everywhere

Old PPCG:
    C code → parse → internal tree → encode → union map → decode → internal tree → dependence analysis → dependences → scheduler → band forest → tile → band forest → encode → union map → decode → internal tree → AST generator → AST

New PPCG:
    C code → parse → schedule tree → dependence analysis → dependences → scheduler → schedule tree → tile → schedule tree → AST generator → AST


Schedule Construction Example

    for (i = 0; i <= N; ++i)
    S:  a[i] = g(i);
    for (i = 0; i <= N; ++i)
    T:  b[i] = f(a[N-i]);
    U:  c = 0;

Iteration domain

{ S[i] : 0 ≤ i ≤ N; T[i] : 0 ≤ i ≤ N; U[] }

Dependences

{ S[i] → T[N − i] : 0 ≤ i ≤ N }

Constructed schedule tree (⊥ denotes a leaf):

    set
    ├── filter { U[] }
    │   └── ⊥
    └── filter { S[i]; T[i] }
        └── band { S[i] → [i]; T[i] → [N − i] }
            └── sequence
                ├── filter { S[i] }
                │   └── ⊥
                └── filter { T[i] }
                    └── ⊥

U[] has no dependences with S or T, so it ends up under a set node (any order). The band maps each dependent pair S[i], T[N − i] to the same schedule value i, and the sequence node executes S[i] before T[N − i] within that value.

⇒ natural representation of constructed schedule


Local Transformations

Typical scenario:

1. Construct tilable bands (e.g., using the Pluto algorithm)
2. Individually tile (some) tilable bands
   ◮ Given a band D(i) → f(i), insert a band D(i) → ⌊f(i)/S⌋ above it
   ◮ First iterate over blocks of size S, then iterate within each block

Tiled individually:
◮ bands of different dimensionality
◮ different tile sizes S per band

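    set
    ├── filter { S2[i, j, k] }
    │   └── band { S2[i, j, k] → (k, j) }
    │       └── band { S2[i, j, k] → (i) }
    └── filter { S1[i, j]; S3[i, j, k] }
        └── band { S1[i, j] → (i, j, 0); S3[i, j, k] → (i, j, k) }

After tiling the band of S1 and S3 with tile sizes s0, s1, s2:

    filter { S1[i, j]; S3[i, j, k] }
    └── band { S1[i, j] → (⌊i/s0⌋, ⌊j/s1⌋, 0); S3[i, j, k] → (⌊i/s0⌋, ⌊j/s1⌋, ⌊k/s2⌋) }
        └── band { S1[i, j] → (i, j, 0); S3[i, j, k] → (i, j, k) }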



Local Transformations

Schedule tree:

    set
    ├── filter { S2[i, j, k] }
    │   └── band { S2[i, j, k] → (k, j) }
    │       └── band { S2[i, j, k] → (i) }
    └── filter { S1[i, j]; S3[i, j, k] }
        └── band { S1[i, j] → (i, j, 0); S3[i, j, k] → (i, j, k) }

Kelly’s abstraction:

    T1 : { [i, j] → [1, i, j, 0] }
    T2 : { [i, j, k] → [0, k, j, i] }
    T3 : { [i, j, k] → [1, i, j, k] }

How to identify the node that needs to be tiled?
◮ by an interval of schedule dimensions
◮ by a list of statements, or by the values of the dimensions that encode the set/sequence

A union map representation additionally requires aligning all statements in a single schedule space.


CARP Project

Design tools and techniques to aid Correct and Efficient Accelerator Programming


Advanced Use: CUDA/OpenCL Code Generation

Schedule tree logically split into two parts

◮ Outer part mapped to host code
◮ Subtrees mapped to device code

Device part has additional symbolic constants
⇒ block and thread identifiers
⇒ internal context nodes

Each thread executes only part of the iteration domain
⇒ selected using filter nodes

Old PPCG used nested AST generation
⇒ difficult to understand and debug

Advanced Use: CUDA/OpenCL Code Generation

    for (t = 0; t < T; t++) {
        for (i = 1; i < N - 1; i++)
    S:      B[i] = 0.33333 * (A[i-1] + A[i] + A[i + 1]);
        for (j = 1; j < N - 1; j++)
    T:      A[j] = B[j];
    }

    band { S[t, i] → [t]; T[t, j] → [t] }
    └── band { S[t, i] → [0]; T[t, j] → [1] }
        └── set
            ├── filter { T[t, j] }
            │   └── mark: kernel
            │       └── context { 0 ≤ b < 32768 ∧ 0 ≤ t < 32 }
            │           └── filter { T[t, j] : b = ⌊j/32⌋ mod 32768 }
            │               └── band { T[t, j] → [⌊j/32⌋] }
            │                   └── filter { T[t, j] : t = j mod 32 }
            │                       └── band { T[t, j] → [j mod 32] }
            └── filter { S[t, i] }
                └── mark: kernel
                    └── context { 0 ≤ b < 32768 ∧ 0 ≤ t < 32 }
                        └── filter { S[t, i] : b = ⌊i/32⌋ mod 32768 }
                            └── band { S[t, i] → [⌊i/32⌋] }
                                └── filter { S[t, i] : t = i mod 32 }
                                    └── band { S[t, i] → [i mod 32] }

◮ subtree mapped to device: everything below a “mark: kernel” node
◮ introduce identifiers: the context node, with block identifier b and thread identifier t
◮ filter on identifiers: the filter nodes that relate b and t to the loop iterators


Extension

In the final stages of scheduling, additional statements may need to be added:
◮ copy code
◮ synchronization
◮ . . .

These additional statements depend on their ancestors:
◮ the statements should only be executed in a given part of the schedule tree
◮ their iteration domains depend on the outer schedule (e.g., the data to be copied)

⇒ new “extension” node type
⇒ maps outer schedule dimensions to an extra iteration domain

Extension

    context { 0 ≤ b0, b1 < 128 ∧ 0 ≤ t0 < 32 ∧ 0 ≤ t1 < 16 }
    └── filter { S0[i, j] : b0 = ⌊i/32⌋ mod 128 ∧ b1 = ⌊j/32⌋ mod 128;
                 S1[i, j, k] : b0 = ⌊i/32⌋ mod 128 ∧ b1 = ⌊j/32⌋ mod 128 }
        └── extension { [] → write C[u, v] : 0 ≤ u, v ≤ 4095 ∧ b0 = ⌊u/32⌋ ∧ b1 = ⌊v/32⌋ }
            └── sequence
                ├── filter { S0[i, j]; S1[i, j, k] }
                │   └── band { S0[i, j] → [⌊i/32⌋, ⌊j/32⌋]; S1[i, j, k] → [⌊i/32⌋, ⌊j/32⌋] }
                │       └── band { S0[i, j] → [0]; S1[i, j, k] → [⌊k/32⌋] }
                │           └── extension { [i0, i1, i2] → sync[];
                │                           [i0, i1, i2] → read A[u, v] : 0 ≤ u, v ≤ 4095 ∧ b0 = ⌊u/32⌋ ∧ i2 = ⌊v/32⌋;
                │                           [i0, i1, i2] → read B[u, v] : . . . }
                └── filter { write C[u, v] }
                    └── filter { write C[32b0 + t0, v] : t1 = v mod 16 }
                        └── band { write C[u, v] → [u, v] }

3. Conclusion


Conclusion

◮ Exploit the tree nature of a schedule rather than encoding it in a flat representation
◮ Schedule trees are
   ◮ useful in several contexts
   ◮ more natural
   ◮ more convenient
   ◮ more expressive
   ◮ extensible

Future work
◮ apply separation on schedule tree
◮ additional node types
   ◮ parametric tiling
   ◮ clustering
   ◮ . . .