Polyhedral AST generation is more than scanning polyhedra



slide-1
SLIDE 1

spcl.inf.ethz.ch @spcl_eth

Polyhedral AST generation is more than scanning polyhedra

Tobias Grosser, Sven Verdoolaege, Albert Cohen
ETH Zurich, Polly Labs, INRIA

TOPLAS - Presented at PLDI’16

  • 15 June 2016, Santa Barbara, USA

1 / 38

slide-2
SLIDE 2

AST Generation at the Heart of Research

  • PolyMage - ASPLOS’15
  • Associative Reordering - PLDI’14
  • Pluto - PLDI’08
  • LLVM Polly - PPL’12
  • Basic Structured Linear Algebra Compiler - CGO’16
  • Hybrid-Hexagonal Tiling of Stencils - CGO’14

2 / 38

slide-3
SLIDE 3

Hybrid-Hexagonal Tiling for Stencil Computations

3 / 38

slide-4
SLIDE 4

Copy code from hybrid hexagonal tiling - Original

  for (c2 = 0; c2 <= 1; c2 += 1)
    for (c3 = 1; c3 <= 4; c3 += 1)
      for (c4 = max(((t1-c3+130) % 128) + c3 - 2,
                    ((t1+c3+125) % 128) - c3 + 3);
           c4 <= min(((c2+c3) % 2) + c3 + 128,
                     ((c2+c3) % 2) - c3 + 134);
           c4 += 128)
        if (c3 + c4 >= 7 ||
            (c4 == t1 && c3 + 2 >= t1 && t1 + c3 <= 6 &&
             t1 + c3 >= ((t1 + c2 + 2 * c3 + 1) % 2) + 3 &&
             t1 + 2 >= ((t1 + c2 + 2 * c3 + 1) % 2) + c3) ||
            (c4 == t1 && c3 == 1 && t1 <= 5 && t1 >= 4 &&
             c2 <= 1 && c2 >= 0))
          A[c2][6 * b0 + c3][128 * g7 + c4 - 4] = ...;

4 / 38

slide-5
SLIDE 5

Copy code from hybrid hexagonal tiling - Unrolled

  A[0][6 * b0 + 1][128 * g7 + ((t1 + 125) % 128) - 1] = ...;
  A[0][6 * b0 + 2][128 * g7 + ((t1 + 127) % 128) - 3] = ...;
  if (t1 <= 2 && t1 >= 1)
    A[0][6 * b0 + 2][128 * g7 + t1 + 128] = ...;
  A[0][6 * b0 + 3][128 * g7 + ((t1 + 127) % 128) - 3] = ...;
  if (t1 <= 2 && t1 >= 1)
    A[0][6 * b0 + 3][128 * g7 + t1 + 128] = ...;
  A[0][6 * b0 + 4][128 * g7 + ((t1 + 125) % 128) - 1] = ...;
  A[1][6 * b0 + 1][128 * g7 + ((t1 + 126) % 128) - 2] = ...;
  A[1][6 * b0 + 2][128 * g7 + ((t1 + 126) % 128) - 2] = ...;
  if (t1 <= 3 && t1 >= 2)
    A[1][6 * b0 + 2][128 * g7 + t1 + 128] = ...;
  A[1][6 * b0 + 3][128 * g7 + ((t1 + 126) % 128) - 2] = ...;
  if (t1 <= 3 && t1 >= 2)
    A[1][6 * b0 + 3][128 * g7 + t1 + 128] = ...;
  A[1][6 * b0 + 4][128 * g7 + ((t1 + 126) % 128) - 2] = ...;

5 / 38

slide-6
SLIDE 6

Program

  for (i = 0; i <= n; i++)
    for (j = 0; j <= i; j++)
      S(i,j);

[Figure: grid of iteration points with axes i and j]

State of Variables

Statement Instances Executed

6 / 38

slide-39
SLIDE 39

Program

  for (i = 0; i <= n; i++)
    for (j = 0; j <= i; j++)
      S(i,j);

Iteration space

[Figure: grid of iteration points with axes i and j, bounded by the constraints j <= i, j >= 0, i <= n (n = 4), i >= 0]

State of Variables

  n = 4, i = 4, j = 4

Statement Instances Executed

  S(4,0), S(4,1), S(4,2), S(4,3), S(4,4)
  S(3,0), S(3,1), S(3,2), S(3,3)
  S(2,0), S(2,1), S(2,2)
  S(1,0), S(1,1)
  S(0,0)

  = {S(i, j) | 0 ≤ i ≤ n ∧ 0 ≤ j ≤ i}
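The set above can be checked against the loop nest by brute force. The following is a minimal sketch, not taken from the slides; the helper names `in_domain` and `count_instances` are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: membership test for the iteration space
 * { S(i, j) | 0 <= i <= n and 0 <= j <= i }. */
static int in_domain(int i, int j, int n)
{
    return 0 <= i && i <= n && 0 <= j && j <= i;
}

/* Run the loop nest from the slide, check that every visited point
 * satisfies the four constraints, and return the number of instances. */
static int count_instances(int n)
{
    int count = 0;
    for (int i = 0; i <= n; i++)
        for (int j = 0; j <= i; j++) {
            assert(in_domain(i, j, n));
            count++;
        }
    return count;
}
```

For n = 4, `count_instances(4)` counts the 15 instances listed on the slide (5 + 4 + 3 + 2 + 1).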

6 / 38

slide-40
SLIDE 40

AST Generation - Basic Example

  { S1(i) → (i, 0, 0) | 0 ≤ i < n;
    S2(i, j) → (i, 1, j) | 0 ≤ j < i < n;
    S3(i) → (i, 2, 0) | 0 ≤ i < n }

7 / 38

slide-41
SLIDE 41

AST Generation - Basic Example

  { (i, 0, 0) → S1(i) | 0 ≤ i < n;
    (i, 1, j) → S2(i, j) | 0 ≤ j < i < n;
    (i, 2, 0) → S3(i) | 0 ≤ i < n }

8 / 38

slide-42
SLIDE 42

AST Generation - Basic Example

  { (i, 0, 0) → S1(i) | 0 ≤ i < n;
    (i, 1, j) → S2(i, j) | 0 ≤ j < i < n;
    (i, 2, 0) → S3(i) | 0 ≤ i < n }

Project on dim. 1

  { (i) | 0 ≤ i < n }

  for (i = 0; i < n; i++) {
    ...
  }

8 / 38

slide-43
SLIDE 43

AST Generation - Basic Example

  { (i, 0, 0) → S1(i) | 0 ≤ i < n;
    (i, 1, j) → S2(i, j) | 0 ≤ j < i < n;
    (i, 2, 0) → S3(i) | 0 ≤ i < n }

Project on dim. 1

  { (i) | 0 ≤ i < n }

Project on dim. 1, 2

  { (i, t) | 0 ≤ i < n ∧ 0 ≤ t ≤ 2 }

  for (i = 0; i < n; i++) {
    // t = 0
    S1(i);
    // t = 1
    ...
    // t = 2
    S3(i);
  }

9 / 38

slide-44
SLIDE 44

AST Generation - Basic Example

  { (i, 0, 0) → S1(i) | 0 ≤ i < n;
    (i, 1, j) → S2(i, j) | 0 ≤ j < i < n;
    (i, 2, 0) → S3(i) | 0 ≤ i < n }

Project on dim. 1

  { (i) | 0 ≤ i < n }

Project on dim. 1, 2

  { (i, t) | 0 ≤ i < n ∧ 0 ≤ t ≤ 2 }

Project on dim. 1, 2, 3

  { (i, t, j) | 0 ≤ i < n ∧ 0 ≤ t ≤ 2 ∧ 0 ≤ j < i }

  for (i = 0; i < n; i++) {
    // t = 0
    S1(i);
    // t = 1
    for (j = 0; j < i; j++)
      S2(i, j);
    // t = 2
    S3(i);
  }
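One way to read this example: the generated loop nest must run the statement instances in the lexicographic order of their schedule tuples (i, t, j). The sketch below compares the two orders by brute force. It is an illustration under that reading, not the algorithm used by any particular generator; all names (`record`, `run_ast`, `run_schedule`, `same_order`) are hypothetical, and the inner loop is taken to iterate over 0 ≤ j < i as the schedule domain states.

```c
#include <assert.h>

#define MAX_TRACE 256

/* One executed statement instance: statement id (1..3) and its iterators. */
struct instance { int stmt, i, j; };
struct trace { struct instance at[MAX_TRACE]; int len; };

static void record(struct trace *t, int stmt, int i, int j)
{
    assert(t->len < MAX_TRACE);
    t->at[t->len++] = (struct instance){ stmt, i, j };
}

/* The loop nest obtained after projecting onto dimensions 1, 2 and 3. */
static void run_ast(struct trace *t, int n)
{
    for (int i = 0; i < n; i++) {
        record(t, 1, i, 0);             /* t = 0 */
        for (int j = 0; j < i; j++)
            record(t, 2, i, j);         /* t = 1 */
        record(t, 3, i, 0);             /* t = 2 */
    }
}

/* Reference: visit schedule tuples (i, s, j) in lexicographic order over a
 * bounding box and apply the inverse schedule to pick the statement. */
static void run_schedule(struct trace *t, int n)
{
    for (int i = 0; i < n; i++)
        for (int s = 0; s <= 2; s++)
            for (int j = 0; j < n; j++) {
                if (s == 0 && j == 0) record(t, 1, i, 0);
                if (s == 1 && j < i)  record(t, 2, i, j);
                if (s == 2 && j == 0) record(t, 3, i, 0);
            }
}

/* Returns 1 if both executions agree instance by instance. */
static int same_order(int n)
{
    struct trace a = { .len = 0 }, b = { .len = 0 };
    run_ast(&a, n);
    run_schedule(&b, n);
    if (a.len != b.len)
        return 0;
    for (int k = 0; k < a.len; k++)
        if (a.at[k].stmt != b.at[k].stmt || a.at[k].i != b.at[k].i ||
            a.at[k].j != b.at[k].j)
            return 0;
    return 1;
}
```

The reference scan wastes iterations on empty schedule points; the generated AST avoids them because each projection yields exact loop bounds.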

10 / 38

slide-45
SLIDE 45

Elimination of Existentially Quantified Variables

Domain

  { (t) : (∃ α : α ≥ −1 + t ∧ 2α ≥ 1 + t ∧ α ≤ t ∧ 4α ≤ N + 2t) }

Quantifier Elimination

  { (t) : (t ≥ 3 ∧ 2t ≤ 4 + N) ∨ (t ≤ 2 ∧ t ≥ 1 ∧ 2t ≤ N) }

  for (c0 = 1; c0 <= min(2, floordiv(N, 2)); c0 += 1)
    // body
  for (c0 = 3; c0 <= floordiv(N, 2) + 2; c0 += 1)
    // body

Fourier-Motzkin (Rational Quantifier Elimination)

  { (t) : 2t ≤ 4 + N ∧ N ≥ 2 ∧ t ≥ 1 }

  for (c0 = 1; c0 <= floordiv(N, 2) + 2; c0 += 1)
    // body

11 / 38

slide-46
SLIDE 46

Elimination of Existentially Quantified Dimensions

  QE: { (t) : (t ≥ 3 ∧ 2t ≤ 4 + N) ∨ (t ≤ 2 ∧ t ≥ 1 ∧ 2t ≤ N) }
  FM: { (t) : 2t ≤ 4 + N ∧ N ≥ 2 ∧ t ≥ 1 }

[Figure: points of both sets plotted with axes t and N]

Two more points in FM: { (2) : 2 ≤ N ≤ 3 }

◮ Simple code at outer levels → Fourier-Motzkin
◮ No approximation at innermost level → Quant. Elimination
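The difference between the two eliminations can be reproduced by enumeration. The sketch below tests the original existential constraint directly over the integers and compares it with the QE and FM sets as written on the slide. Function names are hypothetical, and the search box passed to `count_extra_points` is an arbitrary choice.

```c
#include <assert.h>

/* Exact test: is there an integer alpha with
 *   alpha >= t - 1, 2*alpha >= 1 + t, alpha <= t, 4*alpha <= N + 2*t ?
 * The first and third constraints leave only alpha = t - 1 and alpha = t. */
static int in_exact(int t, int N)
{
    for (int alpha = t - 1; alpha <= t; alpha++)
        if (2 * alpha >= 1 + t && 4 * alpha <= N + 2 * t)
            return 1;
    return 0;
}

/* Integer quantifier elimination result, as given on the slide. */
static int in_qe(int t, int N)
{
    return (t >= 3 && 2 * t <= 4 + N) || (t <= 2 && t >= 1 && 2 * t <= N);
}

/* Fourier-Motzkin (rational) elimination result, as given on the slide. */
static int in_fm(int t, int N)
{
    return 2 * t <= 4 + N && N >= 2 && t >= 1;
}

/* Over the box [lo, hi] x [lo, hi], check that QE is exact and that FM is
 * an over-approximation, and count the points FM adds. */
static int count_extra_points(int lo, int hi)
{
    int extra = 0;
    for (int N = lo; N <= hi; N++)
        for (int t = lo; t <= hi; t++) {
            assert(in_qe(t, N) == in_exact(t, N));
            assert(!in_exact(t, N) || in_fm(t, N));
            if (in_fm(t, N) && !in_exact(t, N))
                extra++;
        }
    return extra;
}
```

The two extra points are t = 2 with N = 2 and N = 3: the rational solution α = 1.5 exists for N = 2, but no integer α does.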

12 / 38

slide-48
SLIDE 48

Semantic Unrolling

Domain: {i | 0 ≤ i < 1000 ∧ N ≤ i < N + 4}

Lower Bound: 0 ≤ i

  if (N <= 0 && 0 < N + 4) S(0);
  if (N <= 1 && 1 < N + 4) S(1);
  if (N <= 2 && 2 < N + 4) S(2);
  if (N <= 3 && 3 < N + 4) S(3);
  ...
  if (N <= 999 && 999 < N + 4) S(999);

Lower Bound: N ≤ i

  if (N >= 0 && N <= 999) S(N);
  if (N >= -1 && N <= 998) S(N + 1);
  if (N >= -2 && N <= 997) S(N + 2);
  if (N >= -3 && N <= 996) S(N + 3);
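Unrolling from the lower bound N ≤ i needs only a handful of guarded statements, while unrolling from 0 ≤ i needs a thousand. The sketch below checks the short form against the domain by brute force. The names are hypothetical, and the fourth guard, for S(N + 3), is included here on the assumption that all four values of the domain N ≤ i < N + 4 should be covered.

```c
#include <assert.h>

/* Reference: scan 0 <= i < 1000 and keep the points with N <= i < N + 4.
 * Bit k of the result is set iff S(N + k) is executed. */
static unsigned ref_mask(int N)
{
    unsigned mask = 0;
    for (int i = 0; i < 1000; i++)
        if (N <= i && i < N + 4)
            mask |= 1u << (i - N);
    return mask;
}

/* Unrolled relative to the lower bound N <= i: four guarded statements. */
static unsigned unrolled_mask(int N)
{
    unsigned mask = 0;
    if (N >= 0 && N <= 999)  mask |= 1u << 0;   /* S(N)     */
    if (N >= -1 && N <= 998) mask |= 1u << 1;   /* S(N + 1) */
    if (N >= -2 && N <= 997) mask |= 1u << 2;   /* S(N + 2) */
    if (N >= -3 && N <= 996) mask |= 1u << 3;   /* S(N + 3), assumed */
    return mask;
}

/* Returns 1 if both forms execute the same instances for all N in [lo, hi]. */
static int unrolling_matches(int lo, int hi)
{
    for (int N = lo; N <= hi; N++)
        if (ref_mask(N) != unrolled_mask(N))
            return 0;
    return 1;
}
```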

13 / 38

slide-49
SLIDE 49

Isolation

Domain: {(i) | m ≤ i < n}
Schedule: {(i) → (i)}

  for (i = m; i < n; i++)
    A(i);

14 / 38

slide-50
SLIDE 50

Isolation

Domain: {(i) | m ≤ i < n}
Schedule: {(i) → (4⌊i/4⌋, i)}

  for (c0 = 4 * floordiv(m, 4); c0 < n; c0 += 4)
    for (c1 = max(m, c0); c1 <= min(n - 1, c0 + 3); c1 += 1)
      A(c1);
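The strip-mined nest should visit exactly m, m + 1, ..., n - 1 in order, including for negative m, where floor division matters. Here is a small sketch that checks this. The function `floordiv` follows the definition of the `floordiv` macro given in this deck's AST Expression Generation slide; `check_strip_mined` and the min/max helpers are hypothetical names.

```c
#include <assert.h>

/* Floor division for d > 0 (rounds toward negative infinity). */
static int floordiv(int n, int d)
{
    assert(d > 0);
    return n < 0 ? -((-n + d - 1) / d) : n / d;
}

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Run the strip-mined nest and check that it visits m, m+1, ..., n-1 in
 * order. Returns the number of instances, or -1 on any mismatch. */
static int check_strip_mined(int m, int n)
{
    int expected = m;   /* next point a plain "for (i = m; i < n; i++)" runs */
    for (int c0 = 4 * floordiv(m, 4); c0 < n; c0 += 4)
        for (int c1 = max_int(m, c0); c1 <= min_int(n - 1, c0 + 3); c1 += 1) {
            if (c1 != expected)
                return -1;
            expected++;
        }
    return expected == max_int(m, n) ? expected - m : -1;
}
```

The max and min in the inner bounds are what make the first and last tile partial; removing them for the full tiles is the purpose of isolation.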

15 / 38

slide-51
SLIDE 51

Isolation

Domain: {(i) | m ≤ i < n}
Schedule: {(i) → (4⌊i/4⌋, i)}
Isolate: {(t) | m ≤ t ∧ t + 3 < n}

  // Before
  if (n >= m + 4)
    for (c1 = m; c1 <= 4 * floordiv(m - 1, 4) + 3; c1 += 1)
      S(c1);
  // Main
  for (c0 = 4 * floordiv(m - 1, 4) + 4; c0 < n - 3; c0 += 4)
    for (c1 = c0; c1 <= c0 + 3; c1 += 1)
      S(c1);
  // After
  if (n >= m + 4 && 4 * floordiv(n - 1, 4) + 3 >= n) {
    for (c1 = 4 * floordiv(n - 1, 4); c1 < n; c1 += 1)
      S(c1);
  } else if (m + 3 >= n)
    // Other
    for (c0 = 4 * floordiv(m, 4); c0 < n; c0 += 4)
      for (c1 = max(m, c0); c1 <= min(n - 1, c0 + 3); c1 += 1)
        S(c1);
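The isolated version is longer, so it is worth confirming that it still runs each instance m, ..., n - 1 exactly once and in order. The sketch below transcribes the code from the slide and checks it exhaustively for small m and n. The checker struct and function names are hypothetical, and the nesting of the "After"/"Other" branches follows one reading of the flattened slide text.

```c
#include <assert.h>

static int floordiv(int n, int d)
{
    assert(d > 0);
    return n < 0 ? -((-n + d - 1) / d) : n / d;
}

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Tracks the next instance a plain "for (i = m; i < n; i++)" would run. */
struct checker { int expected; int ok; };

static void visit(struct checker *c, int i)
{
    if (i != c->expected)
        c->ok = 0;
    c->expected++;
}

/* Transcription of the isolated code. Returns the instance count, or -1
 * if the visiting order differs from m, m+1, ..., n-1. */
static int run_isolated(int m, int n)
{
    struct checker c = { m, 1 };
    /* Before: partial tile in front of the isolated part */
    if (n >= m + 4)
        for (int c1 = m; c1 <= 4 * floordiv(m - 1, 4) + 3; c1 += 1)
            visit(&c, c1);
    /* Main: full tiles, no min/max in the inner bounds */
    for (int c0 = 4 * floordiv(m - 1, 4) + 4; c0 < n - 3; c0 += 4)
        for (int c1 = c0; c1 <= c0 + 3; c1 += 1)
            visit(&c, c1);
    /* After: partial tile behind the isolated part */
    if (n >= m + 4 && 4 * floordiv(n - 1, 4) + 3 >= n) {
        for (int c1 = 4 * floordiv(n - 1, 4); c1 < n; c1 += 1)
            visit(&c, c1);
    } else if (m + 3 >= n) {
        /* Other: fewer than four iterations in total */
        for (int c0 = 4 * floordiv(m, 4); c0 < n; c0 += 4)
            for (int c1 = max_int(m, c0); c1 <= min_int(n - 1, c0 + 3); c1 += 1)
                visit(&c, c1);
    }
    return (c.ok && c.expected == max_int(m, n)) ? c.expected - m : -1;
}

/* Exhaustive check over a small box of parameter values. */
static int isolation_matches(int lo, int hi)
{
    for (int m = lo; m <= hi; m++)
        for (int n = lo; n <= hi; n++)
            if (run_isolated(m, n) != max_int(n - m, 0))
                return 0;
    return 1;
}
```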

16 / 38

slide-54
SLIDE 54

AST Expression Generation

  Piecewise Affine Expr.    AST Expression
  (i) → (⌊i/4⌋)             floordiv(i, 4)
  (i) → (i mod 4)           i - 4 * floordiv(i, 4)

C implementation

  #define floordiv(n, d) \
    (((n)<0) ? -((-(n)+(d)-1)/(d)) : (n)/(d))

  Pw. Aff. Expr.      Context         AST Expression
  (i) → (⌊i/4⌋)       i ≥ 0           i / 4
                      i ≤ 0           -((-i + 3) / 4)
                      i mod 4 = 0     i / 4
  (i) → (i mod 4)     i ≥ 0           i % 4
                      i ≤ 0           -((-i + 3) % 4) + 3
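The context-specific expressions can be validated against the generic `floordiv` form. The sketch below does so for divisor 4 over a symmetric range. It assumes C99 or later, where `/` truncates toward zero; the helper names are hypothetical.

```c
#include <assert.h>

/* Generic floor division for d > 0, same definition as the macro above. */
static int floordiv(int n, int d)
{
    assert(d > 0);
    return n < 0 ? -((-n + d - 1) / d) : n / d;
}

/* Context-specialized forms from the table, for divisor 4. */
static int div4_nonneg(int i)   { return i / 4; }
static int div4_nonpos(int i)   { return -((-i + 3) / 4); }
static int div4_multiple(int i) { return i / 4; }
static int mod4_nonneg(int i)   { return i % 4; }
static int mod4_nonpos(int i)   { return -((-i + 3) % 4) + 3; }

/* Returns 1 if every specialized form agrees with the generic one on all
 * i in [-bound, bound] for which its context holds. */
static int specializations_agree(int bound)
{
    for (int i = -bound; i <= bound; i++) {
        int q = floordiv(i, 4);
        int r = i - 4 * q;              /* i mod 4, always in 0..3 */
        if (r < 0 || r > 3)
            return 0;
        if (i >= 0 && (div4_nonneg(i) != q || mod4_nonneg(i) != r))
            return 0;
        if (i <= 0 && (div4_nonpos(i) != q || mod4_nonpos(i) != r))
            return 0;
        if (i % 4 == 0 && div4_multiple(i) != q)
            return 0;
    }
    return 1;
}
```

Knowing the sign of i, or that i is a multiple of 4, is what lets the branch in the generic macro disappear.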

17 / 38

slide-55
SLIDE 55

Schedule Trees - A structured schedule representation

  for (i = 0; i < n; i++) {
    for (j = i; j < n; j++)
      for (k = 0; k < p1; k++)
        S1: A[i][j] = k * B[i];
    // Mark "A"
    S2: A[i][i] = A[i][i] / B[i];
  }

  domain: S1(i, j, k) | 0 ≤ i ≤ j < n ∧ 0 ≤ k < p1; S2(i) | 0 ≤ i < n
    band: S1(i, j, k) → (i); S2(i) → (i)
      sequence
        filter: S1(i, j, k)
          band: S1(i, j, k) → (j, k)
        filter: S2(i)
          marker: Mark "A"

18 / 38

slide-56
SLIDE 56

Example - Start

  domain: S1(i, j, k) | 0 ≤ i ≤ j < n ∧ 0 ≤ k < p1
    band: S1(i, j, k) → (i, j, k)

  for (i = 0; i < n; i++)
    for (j = i; j < n; j++)
      for (k = 0; k < n; k++)
        S1: S(i,j,k)

19 / 38

slide-57
SLIDE 57

Example - Tiling

  domain: S1(i, j, k) | 0 ≤ i ≤ j < n ∧ 0 ≤ k < p1
    band: S1(i, j, k) → (⌊i/128⌋, ⌊j/128⌋, ⌊k/128⌋)
      band: S1(i, j, k) → (i%128, j%128, k%128)

  for (c0 = 0; c0 < n; c0 += 128)
    for (c1 = 0; c1 < n; c1 += 128)
      for (c2 = 0; c2 < n; c2 += 128)
        for (c3 = 0; c3 <= min(127, n - c0 - 1); c3 += 1)
          for (c4 = 0; c4 <= min(127, n - c1 - 1); c4 += 1)
            for (c5 = 0; c5 <= min(127, n - c2 - 1); c5 += 1)
              S1(c0 + c3, c1 + c4, c2 + c5);

20 / 38
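A tiled nest is correct only if the tile loops and point loops together visit every point exactly once. The sketch below checks that property for the nest as shown, which scans the full cube 0 ≤ i, j, k < n; the i ≤ j and k < p1 constraints of the domain do not appear in the slide's code, so they are not modeled here. The tile size is a parameter `T` (128 on the slide) so that small cases can be tested; all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Returns 1 if the tiled nest visits every (i, j, k) in [0, n)^3 exactly
 * once, and 0 otherwise (or if memory cannot be allocated). */
static int tiling_covers_cube(int n, int T)
{
    assert(n > 0 && T > 0);
    size_t total = (size_t)n * n * n;
    unsigned char *seen = calloc(total, 1);
    if (!seen)
        return 0;
    int ok = 1;
    for (int c0 = 0; c0 < n; c0 += T)
      for (int c1 = 0; c1 < n; c1 += T)
        for (int c2 = 0; c2 < n; c2 += T)
          /* point loops: offsets inside the current tile */
          for (int c3 = 0; c3 <= min_int(T - 1, n - c0 - 1); c3 += 1)
            for (int c4 = 0; c4 <= min_int(T - 1, n - c1 - 1); c4 += 1)
              for (int c5 = 0; c5 <= min_int(T - 1, n - c2 - 1); c5 += 1) {
                  size_t idx = ((size_t)(c0 + c3) * n + (c1 + c4)) * n
                               + (c2 + c5);
                  if (seen[idx]++)      /* visited before: duplicate */
                      ok = 0;
              }
    for (size_t p = 0; p < total; p++)
        if (seen[p] != 1)               /* never visited */
            ok = 0;
    free(seen);
    return ok;
}
```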

slide-58
SLIDE 58

Example - Split

  domain: S1(i, j, k) | 0 ≤ i ≤ j < n ∧ 0 ≤ k < p1
    band: S1(i, j, k) → (⌊i/128⌋, ⌊j/128⌋, ⌊k/128⌋)
      band: S1(i, j, k) → (i%128)
        band: S1(i, j, k) → (j%128)
          band: S1(i, j, k) → (k%128)

  for (c0 = 0; c0 < n; c0 += 128)
    for (c1 = 0; c1 < n; c1 += 128)
      for (c2 = 0; c2 < n; c2 += 128)
        for (c3 = 0; c3 <= min(127, n - c0 - 1); c3 += 1)
          for (c4 = 0; c4 <= min(127, n - c1 - 1); c4 += 1)
            for (c5 = 0; c5 <= min(127, n - c2 - 1); c5 += 1)
              S1(c0 + c3, c1 + c4, c2 + c5);

21 / 38

slide-59
SLIDE 59

Example - Strip-mine and interchange

  domain: S1(i, j, k) | 0 ≤ i ≤ j < n ∧ 0 ≤ k < p1
    band: S1(i, j, k) → (⌊i/128⌋, ⌊j/128⌋, ⌊k/128⌋)
      band: S1(i, j, k) → (i%128)
        band: S1(i, j, k) → (⌊(j%128)/8⌋)
          band: S1(i, j, k) → (k%128)
            band: S1(i, j, k) → (j%8)

  [...]
  for (c3 = 0; c3 <= min(127, n - c0 - 1); c3 += 1)
    for (c4 = 0; c4 <= min(127, n - c1 - 1); c4 += 8)
      for (c5 = 0; c5 <= min(127, n - c2 - 1); c5 += 1)
        // SIMD Parallel Loop
        // at most 8 iterations
        for (c6 = 0; c6 <= min(7, n - c1 - c4 - 1); c6 += 1)
          S1(c0 + c3, c1 + c4 + c6, c2 + c5);

22 / 38

slide-60
SLIDE 60

Example - Isolate Core Computation

  domain: S1(i, j, k) | 0 ≤ i ≤ j < n ∧ 0 ≤ k < p1
    band: S1(i, j, k) → (⌊i/128⌋, ⌊j/128⌋, ⌊k/128⌋)
      band: S1(i, j, k) → (i%128)
        band: S1(i, j, k) → (⌊(j%128)/8⌋)
              options: {isolate[[a, b, c, d] → [e]] : b < ⌊n/128⌋}
          band: S1(i, j, k) → (k%128)
            band: S1(i, j, k) → (j%8)

  [...]
  for (c3 = 0; c3 <= min(127, n - c0 - 1); c3 += 1)
    if (n >= 128 * c1 + 128) {
      for (c4 = 0; c4 <= 127; c4 += 8)
        for (c5 = 0; c5 <= min(127, n - c2 - 1); c5 += 1)
          // SIMD Parallel Loop
          // Exactly 8 Iterations
          for (c6 = 0; c6 <= 7; c6 += 1)
            S1(c0 + c3, c1 + c4 + c6, c2 + c5);
    } else {
      // Handle remainder

23 / 38

slide-61
SLIDE 61

Experimental Evaluation

24 / 38

slide-62
SLIDE 62

Robustness

25 / 38

slide-63
SLIDE 63

Generated Code Performance – Consistent Performance

[Scatter plot: relative execution time (isl) against relative execution time (CodeGen+), both axes ranging from 0.5 to 2; series: gcc, clang, icc]

26 / 38

slide-64
SLIDE 64

Generated Code Performance – Outliers

[Scatter plot: relative execution time (isl) against relative execution time (CodeGen+), both axes ranging from 0.5 to 2; series: gcc, clang, icc]

27 / 38

slide-67
SLIDE 67

Code Quality: youcefn [Bastoul 2004]

CLooG 0.14.1

  for(i=1; i<=n-2; i++) {
    S0(i,i);
    S1(i,i);
    for(j=i+1; j<=n-1; j++)
      S1(i,j);
    S1(i,n);
    S2(i,n);
  }
  S0(n-1,n-1);
  S1(n-1,n-1);
  S1(n-1,n);
  S2(n-1,n);
  S0(n,n);
  S1(n,n);
  S2(n,n);
  for (i=n+1; i <= m; i++)
    S3(i,j);

CodeGen+

  for(i=1; i<=m; i++) {
    if(i>=n+1) {
      S2(i,n);
    } else {
      S0(i,i);
      S1(i,i);
      if (i>=n)
        S2(i,i);
    }
    for(j=i+1; j<=n-1; j++)
      S0(i,j);
    if(n >= i+1) {
      S0(i,n);
      S2(i,n);
    }
  }

isl codegen

  for (c0=1;c0<=n;c0+=1) {
    S0(c0, c0);
    for (c1=c0;c1<=n;c1+=1)
      S1(c0, c1);
    S2(c0, n);
  }
  for (c0=n+1;c0<=m;c0+=1)
    S2(c0, n);

28 / 38

slide-68
SLIDE 68

youcefn [Bastoul 2004] - Statistics

Instruction Count

[Bar chart: instruction count (axis up to 18000) per compiler (clang, gcc, icc) for the code generators CLooG 0.14.1, CodeGen+ and isl]

Code Size

[Bar chart: code size (axis up to 3000) per compiler (clang, gcc, icc) for the code generators CLooG 0.14.1, CodeGen+ and isl]

29 / 38

slide-71
SLIDE 71

Code Quality: [Chen 2012] - Figure 8(b)

CLooG 0.18.1

  if (n >= 2)
    for (i = 2; i <= n; i += 2) {
      if (i%4 == 0)
        S0(i);
      if ((i+2)%4 == 0)
        S1(i);
    }

isl codegen

  for (c0 = 2; c0 < n - 1; c0 += 4) {
    S1(c0);
    S0(c0 + 2);
  }
  if (n >= 2 && n % 4 >= 2)
    S1(-(n % 4) + n + 2);

CodeGen+

  #define intMod(a,b) ((a) >= 0 ? (a) % (b) : (b) - abs((a) % (b)) % (b))
  for(i = 2; i <= n; i += 2)
    if (intMod(i,4) == 0)
      S0(i);
    else
      S1(i);
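The isl version removes the per-iteration modulo tests by making the stride explicit and peeling the last iteration. The sketch below checks that it runs the same statement instances, in the same order, as the guarded CLooG loop. It is an illustration with hypothetical names (`emit`, `run_guarded`, `run_isl`, `versions_agree`), not part of any of the tools.

```c
#include <assert.h>

#define MAX_EVENTS 1024

/* Sequence of executed statement instances: statement id and argument. */
struct events { int stmt[MAX_EVENTS]; int arg[MAX_EVENTS]; int len; };

static void emit(struct events *e, int stmt, int arg)
{
    assert(e->len < MAX_EVENTS);
    e->stmt[e->len] = stmt;
    e->arg[e->len] = arg;
    e->len++;
}

/* Guarded loop, as in the CLooG 0.18.1 output. */
static void run_guarded(struct events *e, int n)
{
    if (n >= 2)
        for (int i = 2; i <= n; i += 2) {
            if (i % 4 == 0)
                emit(e, 0, i);
            if ((i + 2) % 4 == 0)
                emit(e, 1, i);
        }
}

/* Stride-4 loop with a peeled final S1, as in the isl output. */
static void run_isl(struct events *e, int n)
{
    for (int c0 = 2; c0 < n - 1; c0 += 4) {
        emit(e, 1, c0);
        emit(e, 0, c0 + 2);
    }
    if (n >= 2 && n % 4 >= 2)
        emit(e, 1, -(n % 4) + n + 2);
}

/* Returns 1 if both versions agree for every n in [lo, hi]. */
static int versions_agree(int lo, int hi)
{
    for (int n = lo; n <= hi; n++) {
        struct events a = { .len = 0 }, b = { .len = 0 };
        run_guarded(&a, n);
        run_isl(&b, n);
        if (a.len != b.len)
            return 0;
        for (int k = 0; k < a.len; k++)
            if (a.stmt[k] != b.stmt[k] || a.arg[k] != b.arg[k])
                return 0;
    }
    return 1;
}
```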

30 / 38

slide-72
SLIDE 72

[Chen 2012] - Figure 8(b) - Statistics

Instruction Count

[Bar chart: instruction count (axis up to 600) per compiler (clang, gcc, icc) for the code generators CLooG 0.18.1, CodeGen+ and isl]

Code Size

[Bar chart: code size (axis up to 1200) per compiler (clang, gcc, icc) for the code generators CLooG 0.18.1, CodeGen+ and isl]

31 / 38

slide-73
SLIDE 73

[Chen 2012] - Figure 8(b) - Statistics (-no-vec, -no-unroll)

Instruction Count

[Bar chart: instruction count (axis up to 1200) per compiler (clang, gcc, icc) for the code generators CLooG 0.18.1, CodeGen+ and isl]

Code Size

[Bar chart: code size (axis up to 180) per compiler (clang, gcc, icc) for the code generators CLooG 0.18.1, CodeGen+ and isl]

32 / 38

slide-75
SLIDE 75

Modulo and Existentially Quantified Variables

CodeGen+

  // Simple
  for(i = intMod(n,128); i <= 127; i += 128)
    S(i);
  // Shifted
  for(i = 7+intMod(t1-7,128); i <= 134; i += 128)
    S(i);
  // Conditional
  for(i = 7+intMod(t1-7,128); i <= 130; i += 128)
    S(i);

isl codegen

  // Simple
  S(n % 128);
  // Shifted
  S(((t1 + 121) % 128) + 7);
  // Conditional
  if ((t1 + 121) % 128 <= 123)
    S(((t1 + 125) % 128) + 3);
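Each CodeGen+ loop above runs at most once, which is why isl can replace it with a single statement. The sketch below compares the two forms for non-negative parameters, which is an assumption made here because the isl expressions use the C `%` operator. `mod_floor` is a hypothetical helper standing in for a mathematical modulo (it is not CodeGen+'s `intMod` macro), and -1 is used as a "no instance executed" marker.

```c
#include <assert.h>

/* Hypothetical helper: mathematical modulo with result in [0, b), b > 0. */
static int mod_floor(int a, int b)
{
    assert(b > 0);
    int r = a % b;
    return r < 0 ? r + b : r;
}

/* Loop forms: return the executed i, or -1 if the loop body never runs.
 * The assert checks that the loop runs at most once. */
static int one_trip(int start, int upper)
{
    int result = -1;
    for (int i = start; i <= upper; i += 128) {
        assert(result == -1);
        result = i;
    }
    return result;
}

static int simple_loop(int n)       { return one_trip(mod_floor(n, 128), 127); }
static int shifted_loop(int t1)     { return one_trip(7 + mod_floor(t1 - 7, 128), 134); }
static int conditional_loop(int t1) { return one_trip(7 + mod_floor(t1 - 7, 128), 130); }

/* Expression forms, as in the isl output. */
static int simple_expr(int n)   { return n % 128; }
static int shifted_expr(int t1) { return ((t1 + 121) % 128) + 7; }
static int conditional_expr(int t1)
{
    if ((t1 + 121) % 128 <= 123)
        return ((t1 + 125) % 128) + 3;
    return -1;
}

/* Returns 1 if loop and expression forms agree for parameters 0..max_param. */
static int modulo_forms_agree(int max_param)
{
    for (int p = 0; p <= max_param; p++)
        if (simple_loop(p) != simple_expr(p) ||
            shifted_loop(p) != shifted_expr(p) ||
            conditional_loop(p) != conditional_expr(p))
            return 0;
    return 1;
}
```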

33 / 38

slide-76
SLIDE 76

Modulo and Existentially Quantified Variables - Statistics

Instruction Count

[Bar charts, log scale (10^0 to 10^2), panels Simple, Shifted and Conditional: instruction count per compiler (clang, gcc, icc) for CodeGen+, isl and isl (unsigned mod)]

Code Size

[Bar charts, log scale (10^1 to 10^3), panels Simple, Shifted and Conditional: code size per compiler (clang, gcc, icc) for CodeGen+, isl and isl (unsigned mod)]

34 / 38

slide-78
SLIDE 78

Polyhedral Unrolling

Normal loop code

  // Two e.q. variables
  for (c0 = 0; c0 <= 7; c0 += 1)
    if (2 * (2 * c0 / 3) >= c0)
      S(c0);
  // Multiple bounds
  for (c0 = 0; c0 <= 1; c0 += 1)
    for (c1 = max(t1 - 384, t2 - 514); c1 < t1 - 255; c1 += 1)
      if (c1 + 256 == t1 ||
          (t1 >= 126 && t2 <= 255 && c1 + 384 == t1) ||
          (t2 == 256 && c1 + 384 == t1))
        S(c0, c1);

Unrolled

  // Two e.q. variables
  S(0);
  S(2);
  S(3);
  S(4);
  S(5);
  S(6);
  S(7);
  // Multiple bounds
  if (t1 >= 126)
    S(0, t1 - 384);
  S(0, t1 - 256);
  if (t1 >= 126)
    S(1, t1 - 384);
  S(1, t1 - 256);
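For the "Two e.q. variables" case, the unrolled statement list can be checked directly against the guarded loop. The sketch below does this with bit masks; the function names are hypothetical, and the "Multiple bounds" case is left out because it depends on constraints on t1 and t2 that the slide does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Bit k of the result is set iff S(k) runs in the guarded loop. */
static unsigned loop_instances(void)
{
    unsigned mask = 0;
    for (int c0 = 0; c0 <= 7; c0 += 1)
        if (2 * (2 * c0 / 3) >= c0)
            mask |= 1u << c0;
    return mask;
}

/* Bit k of the result is set iff S(k) appears in the unrolled code. */
static unsigned unrolled_instances(void)
{
    static const int run[] = { 0, 2, 3, 4, 5, 6, 7 };
    unsigned mask = 0;
    for (size_t k = 0; k < sizeof run / sizeof run[0]; k++)
        mask |= 1u << run[k];
    return mask;
}
```

Only c0 = 1 fails the guard (2 * (2 / 3) = 0 < 1), so the unrolled code consists of seven unconditional statements.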

35 / 38

slide-79
SLIDE 79

Polyhedral Unrolling - Statistics

Instruction Count

[Bar charts, log scale, panels Two e.q. variables (10^0 to 10^1) and Multiple Bounds (10^1 to 10^4): instruction count per compiler (clang, gcc, icc) for isl and isl unrolled]

Code Size

[Bar charts, log scale, panels Two e.q. variables (10^1 to 10^2) and Multiple Bounds (10^2 to 10^3): code size per compiler (clang, gcc, icc) for isl and isl unrolled]

36 / 38

slide-80
SLIDE 80

AST Generation Strategies for Hybrid-Hexagonal Tiling

[Bar charts, panels Heat 2D and Heat 3D: GFLOPS (axis from 5 to 30) for the option sets no, all, all - isolation, all - IO unrolling, all - compute unrolling, all - modulo detection]

Hybrid hexagonal/classical tiling for GPUs, Tobias Grosser, Albert Cohen, Justin Holewinski, P. Sadayappan, Sven Verdoolaege, International Symposium on Code Generation and Optimization (CGO’14)

Hardware: NVIDIA NVS 5200M GPU, CUDA 5.5

37 / 38

slide-81
SLIDE 81

AST Generation beyond Polyhedral Scanning

◮ Complete support for Presburger Relations
    ◮ Existentially quantified variables
    ◮ Piecewise schedules
◮ Aggressive simplification of AST expressions
◮ Stride and component detection
◮ Fine-grained options: code-size vs. control
◮ Specialization:
    ◮ Polyhedral unrolling
    ◮ User-directed versioning
◮ AST generation from structured schedules

http://playground.pollylabs.org

38 / 38