SLIDE 1

Input Space Splitting for OpenCL

Simon Moll, Johannes Doerfert, Sebastian Hack

Saarbrücken Graduate School of Computer Science Saarland University Saarbrücken, Germany

October 29, 2015

SLIDE 2

OpenCL: Execution Model

Simon Moll, Johannes Doerfert, Sebastian Hack Background October 29, 2015 2 / 25

SLIDE 3

OpenCL: Parallelized & Vectorized

SLIDE 5

Vectorization (SIMD)

Perform the same operation for multiple vector lanes simultaneously.

Vector Patterns

Consecutive: contiguous entries <i, i + 1, i + 2, i + 3>
Uniform: single entry <i, i, i, i> → i
Divergent: unrelated entries <i, j, 7, −>

for (i = 0; i < 16; i++)
    O[i] = I[i] + 2;

for (i = 0; i < 16; i += 2)
    O[i] = I[i] + 1;
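The three vector patterns can be made concrete with a small classifier. This is a minimal sketch, not the analysis used by the authors: the names `VecPattern` and `classify_lanes` are hypothetical, and it inspects concrete lane values at run time, whereas a vectorizer would derive the pattern statically.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of a vector of lane values, following the three patterns above. */
typedef enum { UNIFORM, CONSECUTIVE, DIVERGENT } VecPattern;

/* Classify n lane values:
   all lanes equal              -> UNIFORM
   lane l equals lane 0 plus l  -> CONSECUTIVE
   anything else                -> DIVERGENT */
VecPattern classify_lanes(const int *lanes, size_t n) {
    assert(n == 0 || lanes != NULL);
    int uniform = 1, consecutive = 1;
    for (size_t l = 1; l < n; ++l) {
        if (lanes[l] != lanes[0]) uniform = 0;
        if (lanes[l] != lanes[0] + (int)l) consecutive = 0;
    }
    if (uniform) return UNIFORM;
    if (consecutive) return CONSECUTIVE;
    return DIVERGENT;
}
```

With four lanes, the indices of the first loop (i, i + 1, i + 2, i + 3) classify as consecutive, while the stride-2 indices of the second loop (i, i + 2, i + 4, i + 6) fall into the divergent class under this simple three-way scheme.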

SLIDE 6

Diverging Control Flow

[Figure: control-flow graph over blocks a, b, c, d, e, f]

Thread  Trace
1       a b c e f
2       a b d e f
3       a b c e b c e f
4       a b c e b d e f

Different threads execute different code paths

SLIDE 7

Diverging Control Flow

[Figure: two control-flow graphs over blocks a, b, c, d, e, f]

Thread  Trace
1       a b c d e b c d e f
2       a b c d e b c d e f
3       a b c d e b c d e f
4       a b c d e b c d e f

Different threads execute different code paths
Execute everything, mask out results of inactive threads (using predication, blending)
Control flow to data flow conversion on ASTs [Allen & Kennedy ’83]
Whole-Function Vectorization on SSA CFGs [K & H ’11]
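The "execute everything, mask out inactive threads" idea can be sketched in plain C. This is an illustration under assumptions, not the authors' implementation: the lane count `LANES`, the function names, and the two branch bodies (stand-ins for blocks c and d) are all hypothetical, and real code would use SIMD registers rather than arrays.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Reference: every lane (thread) follows its own branch. */
void branchy(const int *in, int *out) {
    assert(in != NULL && out != NULL);
    for (size_t l = 0; l < LANES; ++l) {
        if (in[l] < 0)
            out[l] = -in[l];     /* stand-in for block c */
        else
            out[l] = in[l] * 2;  /* stand-in for block d */
    }
}

/* Linearized: all lanes execute both blocks; a per-lane mask
   then selects (blends) the result that is kept. */
void linearized(const int *in, int *out) {
    assert(in != NULL && out != NULL);
    int mask[LANES], then_val[LANES], else_val[LANES];
    for (size_t l = 0; l < LANES; ++l) mask[l] = in[l] < 0;
    for (size_t l = 0; l < LANES; ++l) then_val[l] = -in[l];
    for (size_t l = 0; l < LANES; ++l) else_val[l] = in[l] * 2;
    for (size_t l = 0; l < LANES; ++l)
        out[l] = mask[l] ? then_val[l] : else_val[l];
}
```

Both functions produce the same results; the linearized form trades the branch for extra work in lanes whose result is discarded.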

SLIDE 9

Non-Divergent Control Flow

Idea: optimize cases where threads do not diverge

[Figure: two control-flow graphs over blocks a, b, c, d, e, f]

Thread  Trace
1       a b c e b d e f
2       a b c e b d e f
3       a b c e b d e f
4       a b c e b d e f

Option 1: Insert dynamic predicate-tests & branches to skip paths

◮ “Branch on superword condition code” (BOSCC) [Shin et al. PACT’07]
◮ Additional overhead for dynamic test
◮ Does not help against increased register pressure
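A dynamic predicate test in the spirit of BOSCC can be sketched as follows. This is a hedged illustration, not the scheme from the cited paper: `any_lane`, `boscc_select`, the lane count, and the two branch bodies are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Dynamic test: is the predicate set for at least one lane? */
static int any_lane(const int *mask) {
    for (size_t l = 0; l < LANES; ++l)
        if (mask[l]) return 1;
    return 0;
}

/* Executes each predicated block only if some lane needs it.
   Returns how many of the two blocks were actually executed. */
int boscc_select(const int *in, int *out) {
    assert(in != NULL && out != NULL);
    int mask[LANES], not_mask[LANES];
    int executed = 0;
    for (size_t l = 0; l < LANES; ++l) {
        mask[l] = in[l] < 0;
        not_mask[l] = !mask[l];
    }
    if (any_lane(mask)) {            /* skip the block if no lane is active */
        for (size_t l = 0; l < LANES; ++l)
            if (mask[l]) out[l] = -in[l];
        ++executed;
    }
    if (any_lane(not_mask)) {
        for (size_t l = 0; l < LANES; ++l)
            if (not_mask[l]) out[l] = in[l] * 2;
        ++executed;
    }
    return executed;
}
```

When all lanes agree, one block is skipped entirely, but the `any_lane` test itself is paid on every execution, which is the overhead the slide refers to.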

SLIDE 10

Non-Divergent Control Flow

Idea: optimize cases where threads do not diverge

[Figure: two control-flow graphs over blocks a, b, c, d, e, f, with annotations u and v]

Thread  Trace
1       a b c e b d e f
2       a b c e b d e f
3       a b c e b d e f
4       a b c e b d e f

Option 2: Statically prove non-divergence of certain blocks

◮ Non-divergent blocks can be excluded from linearization
◮ Less executed code, less register pressure
◮ More conservative than dynamic test

SLIDE 11

Non-Divergent Control Flow

Idea: optimize cases where threads do not diverge

[Figure: two control-flow graphs over blocks a, b, c, d, e, f, with annotations u and u]

Thread  Trace
1       a b c e f
2       a b c e f
3       a b c e b d e f
4       a b c e b d e f
5       a b c e b d e f
6       a b c e b d e f

Option 3: Statically split non-divergent inputs

◮ Code versions with improved divergence properties
◮ Orthogonal to both other options ⟹ combination possible

SLIDE 12

2D Convolution

[Figure: 2D convolution illustration with axes x and y]

Simon Moll, Johannes Doerfert, Sebastian Hack Motivation October 29, 2015 7 / 25

SLIDE 19

2D Convolution

int left = x - 2;
int right = x + 2;
int top = y - 2;
int bottom = y + 2;
int sum = 0;
for (int i = left; i <= right; ++i)
    for (int j = top; j <= bottom; ++j)
        sum += input[j][i] * mask[j - top][i - left];
output[y][x] = sum;

SLIDE 20

2D Convolution

auto left = x - 2;
auto right = x + 2;
int top = y - 2;
int bottom = y + 2;
int sum = 0;
for (auto i = left; i <= right; ++i)
    for (int j = top; j <= bottom; ++j)
        sum += input[j][i] * mask[j - top][i - left];
output[y][x] = sum;

SLIDE 21

2D Convolution

[Figure: 2D convolution illustration with axes x and y]

SLIDE 23

2D Convolution

int left = MAX(0, x - 2);
int right = MIN(width - 1, x + 2);
int top = MAX(0, y - 2);
int bottom = MIN(height - 1, y + 2);
int sum = 0;
for (int i = left; i <= right; ++i)
    for (int j = top; j <= bottom; ++j)
        sum += input[j][i] * mask[j - (y - 2)][i - (x - 2)];
output[y][x] = sum;

SLIDE 24

2D Convolution

auto left = MAX(0, x - 2);
auto right = MIN(width - 1, x + 2);
int top = MAX(0, y - 2);
int bottom = MIN(height - 1, y + 2);
int sum = 0;
for (auto i = left; i <= right; ++i)
    for (int j = top; j <= bottom; ++j)
        sum += input[j][i] * mask[j - (y - 2)][i - (x - 2)];
output[y][x] = sum;

SLIDE 27

Input Space Splitting

[Figure: two x-y plots of the input space, one labeled vector and one labeled scalar]
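For the 2D convolution, splitting the input space can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' generated code: images and the 5x5 mask are stored as flat row-major arrays, all function names are hypothetical, and the dispatch happens per pixel, whereas the approach in the slides splits the input space into separate code versions ahead of time.

```c
#include <assert.h>

#define R 2                 /* mask radius: a 5x5 mask as in the slides */
#define K (2 * R + 1)       /* mask width */
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Splitting predicate: the whole mask fits inside the image. */
int is_interior(int width, int height, int x, int y) {
    return x >= R && x + R <= width - 1 && y >= R && y + R <= height - 1;
}

/* General version with clamped bounds; valid for every pixel. */
int conv_clamped(const int *input, const int *mask,
                 int width, int height, int x, int y) {
    int left = MAX(0, x - R), right = MIN(width - 1, x + R);
    int top = MAX(0, y - R), bottom = MIN(height - 1, y + R);
    int sum = 0;
    for (int i = left; i <= right; ++i)
        for (int j = top; j <= bottom; ++j)
            sum += input[j * width + i] * mask[(j - (y - R)) * K + (i - (x - R))];
    return sum;
}

/* Specialized version: constant trip counts, no clamping.
   Only valid where is_interior holds. */
int conv_interior(const int *input, const int *mask,
                  int width, int height, int x, int y) {
    assert(is_interior(width, height, x, y));
    int sum = 0;
    for (int i = x - R; i <= x + R; ++i)
        for (int j = y - R; j <= y + R; ++j)
            sum += input[j * width + i] * mask[(j - (y - R)) * K + (i - (x - R))];
    return sum;
}

/* Use the specialized version on the interior, the general one on the border. */
void convolve_split(const int *input, const int *mask, int *output,
                    int width, int height) {
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            output[y * width + x] = is_interior(width, height, x, y)
                ? conv_interior(input, mask, width, height, x, y)
                : conv_clamped(input, mask, width, height, x, y);
}
```

In the interior, the loop bounds no longer depend on x and y through MAX/MIN, so all work items run the same number of iterations and the code is a better fit for vectorization.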

Simon Moll, Johannes Doerfert, Sebastian Hack Input Space Splitting October 29, 2015 13 / 25

SLIDE 31

The Polyhedral Model

for (int i = 0; i <= N; i++)
    for (int j = 0; j <= N; j++) {
        S: A[i][j] = /* ... */ ;
        if (j <= i)
            P: A[i][j] += A[j][i];
    }

[Figure: iteration space over axes i and j, each bounded by N]

I_S = {(S, (i, j)) | 0 ≤ i ≤ N ∧ 0 ≤ j ≤ N}
I_P = {(P, (i, j)) | 0 ≤ i ≤ N ∧ 0 ≤ j ≤ i}
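The two iteration domains can be checked by brute force. The sketch below simply runs the loop nest and counts how often each statement would execute; the function names `count_IS` and `count_IP` are hypothetical.

```c
#include <assert.h>

/* Number of points in I_S: statement S runs for every (i, j)
   with 0 <= i <= N and 0 <= j <= N. */
long count_IS(int N) {
    assert(N >= 0);
    long count = 0;
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++)
            ++count;
    return count;
}

/* Number of points in I_P: statement P runs only under the
   guard j <= i, which tightens the domain to 0 <= j <= i. */
long count_IP(int N) {
    assert(N >= 0);
    long count = 0;
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++)
            if (j <= i)
                ++count;
    return count;
}
```

For N = 3 this gives 16 points for S (the full square) and 10 for P (the triangle on and below the diagonal), matching the closed forms (N + 1)^2 and (N + 1)(N + 2) / 2.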

SLIDE 33

The Polyhedral Model

for (int i = 0; i <= N; i++)
    for (int j = 0; j <= N; j++) {
        S: A[i][j] = /* ... */ ;
        if (j <= i)
            P: A[i][j] += A[j][i];
    }

[Figure: iteration space over axes i and j, each bounded by N]

F_S = {(S, (i, j)) → (i, j)}
F_P1 = {(P, (i, j)) → (i, j)}
F_P2 = {(P, (i, j)) → (j, i)}

SLIDE 36

Splitting Predicates

Full Tile Predicate

for (int i = 0; i <= N; i++)
    for (int j = 0; j <= N; j += 8) {
        S: A[i][j] = /* ... */ ;
        if (j <= i)
            P: A[i][j] += A[j][i];
    }

Full_S = {(S, (i, j)) | (j − (j mod 8)) + 7 ≤ N}
Full_P = {(P, (i, j)) | (j − (j mod 8)) + 7 ≤ min(i, N)}
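The full tile predicates translate directly into code. This is a minimal sketch with hypothetical names (`TILE`, `tile_start`, `full_S`, `full_P`); it assumes a tile width of 8, matching the `j += 8` step, and non-negative j.

```c
#include <assert.h>

#define TILE 8  /* tile (vector) width, matching the j += 8 loop step */

/* First index of the tile that contains j: j - (j mod 8). */
int tile_start(int j) {
    assert(j >= 0);
    return j - (j % TILE);
}

/* Full_S: the whole tile of j lies inside the domain of S (j <= N). */
int full_S(int N, int i, int j) {
    (void)i;  /* the domain of S does not constrain j through i */
    return tile_start(j) + (TILE - 1) <= N;
}

/* Full_P: the whole tile lies inside the domain of P (j <= min(i, N)). */
int full_P(int N, int i, int j) {
    int bound = i < N ? i : N;
    return tile_start(j) + (TILE - 1) <= bound;
}
```

For N = 20, the tile starting at j = 8 covers 8 to 15 and is full for S, while the tile starting at j = 16 would reach 23 and is therefore only a partial tile.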

Simon Moll, Johannes Doerfert, Sebastian Hack Approach October 29, 2015 16 / 25

SLIDE 39

Splitting Predicates

Uniform Access Predicate

for (int i = 0; i <= N; i++)
    for (int j = 0; j <= N; j += 8) {
        S: A[i][j] = /* ... */ ;
        if (j <= i)
            P: A[i][j] += A[j][i];
    }

Uni_{F_S1} = {(S, (i, j)) | F_S1(i, j + 1) = F_S1(i, j)} = {}
Uni_{F_P1} = {(P, (i, j)) | F_P1(i, j + 1) = F_P1(i, j)} = {}
Uni_{F_P2} = {(P, (i, j)) | F_P2(i, j + 1) = F_P2(i, j)} = {}

SLIDE 41

Splitting Predicates

Consecutive Access Predicate

for (int i = 0; i <= N; i++)
    for (int j = 0; j <= N; j += 8) {
        S: A[i][j] = /* ... */ ;
        if (j <= i)
            P: A[i][j] += A[j][i];
    }

Cons_{F_S1} = {(S, (i, j)) | F_S1(i, j + 1) = F_S1(i, j) + 1} = I_S
Cons_{F_P1} = {(P, (i, j)) | F_P1(i, j + 1) = F_P1(i, j) + 1} = I_P
Cons_{F_P2} = {(P, (i, j)) | F_P2(i, j + 1) = F_P2(i, j) + 1} = {}
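The uniform and consecutive access predicates can be evaluated pointwise on the three access functions. The sketch below rests on one assumption that the slides do not spell out: "+ 1" on a two-dimensional index is read as "the next element in the innermost array dimension". The type `Index` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A two-dimensional array index (row, column). */
typedef struct { int row, col; } Index;
typedef Index (*AccessFn)(int i, int j);

/* The access functions of the running example. */
Index F_S(int i, int j)  { Index r = { i, j }; return r; }  /* A[i][j] in S       */
Index F_P1(int i, int j) { Index r = { i, j }; return r; }  /* A[i][j] in P       */
Index F_P2(int i, int j) { Index r = { j, i }; return r; }  /* A[j][i] read in P  */

/* Uniform: neighbouring iterations j and j + 1 touch the same element. */
int is_uniform(AccessFn F, int i, int j) {
    assert(F != NULL);
    Index a = F(i, j), b = F(i, j + 1);
    return a.row == b.row && a.col == b.col;
}

/* Consecutive: iteration j + 1 touches the element right after the
   one touched by iteration j (assumed: same row, next column). */
int is_consecutive(AccessFn F, int i, int j) {
    assert(F != NULL);
    Index a = F(i, j), b = F(i, j + 1);
    return a.row == b.row && b.col == a.col + 1;
}
```

Under this reading, the accesses A[i][j] are consecutive everywhere, the transposed access A[j][i] is consecutive nowhere, and none of the three accesses is uniform, which agrees with the sets on the slide.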

SLIDE 43

CFG simplification

Hoisting conditionals

for (int i = 0; i <= N; i++)
    for (int j = 0; j <= N; j += 8)
        if (i <= NumParticles) {
            S: ...
        }

After hoisting:

for (int i = 0; i <= MIN(N, NumParticles); i++)
    for (int j = 0; j <= N; j += 8)
        S: ...

SLIDE 45

CFG simplification

Hoisting conditionals

for (int i = 0; i <= N; i++)
    for (int j = 0; j <= N; j += 8)
        if (reverse) {
            S: ...
        } else {
            P: ...
        }

After hoisting:

if (reverse) {
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j += 8)
            S: ...
} else {
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j += 8)
            P: ...
}
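Hoisting a loop-invariant conditional out of the loop nest can be checked with a small runnable example. This sketch makes several assumptions: the bodies of S and P are arbitrary stand-ins, N is fixed to 15, and the inner loop keeps the bound `j <= N` in both versions so that the two functions are equivalent.

```c
#include <assert.h>

enum { N = 15, STEP = 8 };

/* Original form: the invariant condition is tested in every iteration. */
void loops_with_branch(int reverse, int out[N + 1][N + 1]) {
    assert(out != 0);
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j += STEP) {
            if (reverse)
                out[i][j] = i + j;  /* stand-in for S */
            else
                out[i][j] = i - j;  /* stand-in for P */
        }
}

/* Hoisted form: the condition is tested once, and each
   loop nest contains a single straight-line body. */
void loops_hoisted(int reverse, int out[N + 1][N + 1]) {
    assert(out != 0);
    if (reverse) {
        for (int i = 0; i <= N; i++)
            for (int j = 0; j <= N; j += STEP)
                out[i][j] = i + j;  /* stand-in for S */
    } else {
        for (int i = 0; i <= N; i++)
            for (int j = 0; j <= N; j += STEP)
                out[i][j] = i - j;  /* stand-in for P */
    }
}
```

After hoisting, neither loop nest contains control flow in its body, so no linearization or masking is needed when the nest is vectorized.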

SLIDE 52

Predicate Based Domain Splitting

[Figure: flow diagram with the stages Kernel, Polyhedral Model, Splitting Predicates, Subkernel (×3), Scalar Kernel, Scalar Codelet, Vector Codelet (×3), Optimized Kernel]

SLIDE 53

Evaluation

Pipeline

[Figure: pipeline diagram. Labels in extraction order: App; OpenCL driver; Barrier elimination; Polly; Optimization; JIT; IOC; OpenCL API; Kernel module (LLVM); Barrier-free kernels; Polyhedral kernel; Vector codelets; Scalar codelets; Dispatch code; Domain Knowledge]

SLIDE 54

Evaluation

Performance

[Figure: bar chart of speed-up (axis ticks 0.2 to 1) for the benchmarks BinOpt, BS, Floyd, LUD, DCT, C2D, Myo; series: scalar, vec only, split, Intel]

SLIDE 55

Ongoing Work

Model synchronization in the Polyhedral Model.
Apply polyhedral optimizations (scheduling).
Improve the representation of non-affine parts.

Simon Moll, Johannes Doerfert, Sebastian Hack Conclusion & Ongoing Work October 29, 2015 24 / 25

SLIDE 56

Conclusion

SLIDE 57

Simon Moll, Johannes Doerfert, Sebastian Hack Backup October 29, 2015 26 / 25

SLIDE 60

OpenCL Programming Model

[Figure: diagram with the labels work, work group, work item]

SLIDE 61

Codelet Score

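\[
\mathrm{Score}_n(k) :=
\begin{cases}
\sum_{Q \in k,\; F \in F_Q} w_{\mathrm{cons}}\,\mathrm{Box}(\mathrm{Cons}_F(d_k)) + w_{\mathrm{uni}}\,\mathrm{Box}(\mathrm{Uni}_F(d_k)) & \text{if } n \ge w \\
 & \text{otw.}
\end{cases}
\]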

SLIDE 62

Access Splitting Predicate

\[
I^{C}_{k} := \bigcap_{Q \in k} \; \bigcap_{\substack{F \in F_Q \text{ s.t.} \\ \mathrm{Cons}_F(d_k) \neq \emptyset}} \mathrm{Cons}_F(d_k).
\]

SLIDE 63

Full Tile Predicate

[Figure: two index ranges ending in n-1, n, one covered by a full tile and one by a partial tile. The tile containing id starts at id − (id mod w) and ends at id − (id mod w) + (w − 1).]
