Input Space Splitting for OpenCL
Simon Moll, Johannes Doerfert, Sebastian Hack
Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
October 29, 2015
OpenCL: Execution Model
OpenCL: Parallelized & Vectorized
Vectorization (SIMD)
Perform the same operation for multiple vector lanes simultaneously.
Vector Patterns
Consecutive: contiguous entries <i, i + 1, i + 2, i + 3>
Uniform: single entry <i, i, i, i> → i
Divergent: unrelated entries <i, j, 7, −>
    for (i = 0; i < 16; i++) O[i] = I[i] + 2;
    for (i = 0; i < 16; i += 2) O[i] = I[i] + 1;
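The three vector patterns above can be checked mechanically for the indices touched by one vector. The following is a minimal sketch; the enum and the function `classify_lanes` are illustrative names, not part of the presented system. In this three-way scheme, a strided access such as the one produced by the `i += 2` loop is neither consecutive nor uniform, so the sketch reports it as divergent.

```c
#include <stddef.h>

/* Pattern names follow the slide; the type itself is hypothetical. */
typedef enum { PATTERN_UNIFORM, PATTERN_CONSECUTIVE, PATTERN_DIVERGENT } pattern_t;

/* Classify the indices accessed by the `width` lanes of one vector. */
pattern_t classify_lanes(const int *idx, size_t width)
{
    int uniform = 1, consecutive = 1;
    for (size_t l = 1; l < width; ++l) {
        if (idx[l] != idx[0])          uniform = 0;      /* not <i,i,i,i>       */
        if (idx[l] != idx[0] + (int)l) consecutive = 0;  /* not <i,i+1,i+2,...> */
    }
    if (uniform)     return PATTERN_UNIFORM;   /* also covers width <= 1 */
    if (consecutive) return PATTERN_CONSECUTIVE;
    return PATTERN_DIVERGENT;
}
```

For example, the lane indices {5, 6, 7, 8} are consecutive, {3, 3, 3, 3} are uniform, and {0, 2, 4, 6} fall into the divergent bucket.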
Diverging Control Flow
[Figure: control-flow graph with blocks a, b, c, d, e, f.]
Thread  Trace
1       a b c e f
2       a b d e f
3       a b c e b c e f
4       a b c e b d e f
Different threads execute different code paths.
[Figure: two control-flow graphs over blocks a, b, c, d, e, f.]
Thread  Trace
1       a b c d e b c d e f
2       a b c d e b c d e f
3       a b c d e b c d e f
4       a b c d e b c d e f
Execute everything, mask out results of inactive threads (using predication, blending).
Control flow to data flow conversion on ASTs [Allen & Kennedy ’83]
Whole-Function Vectorization on SSA CFGs [K & H ’11]
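The "execute everything, mask out" idea can be sketched in plain C with lanes modelled as array elements. This is an illustration only: `LANES`, `branchy`, and `linearized` are hypothetical names, and the two branch bodies are arbitrary stand-ins for blocks c and d.

```c
#define LANES 4

/* Scalar reference: each thread takes exactly one side of the branch. */
void branchy(const int *in, int *out)
{
    for (int l = 0; l < LANES; ++l) {
        if (in[l] > 0) out[l] = in[l] * 2;   /* stands in for block c */
        else           out[l] = -in[l];      /* stands in for block d */
    }
}

/* Linearized form: both sides run for all lanes, a mask blends the results. */
void linearized(const int *in, int *out)
{
    int mask[LANES], then_val[LANES], else_val[LANES];
    for (int l = 0; l < LANES; ++l) mask[l] = in[l] > 0;
    for (int l = 0; l < LANES; ++l) then_val[l] = in[l] * 2;   /* all lanes */
    for (int l = 0; l < LANES; ++l) else_val[l] = -in[l];      /* all lanes */
    for (int l = 0; l < LANES; ++l)
        out[l] = mask[l] ? then_val[l] : else_val[l];          /* blend */
}
```

Both functions produce the same output, but the linearized one always pays for both branch bodies, which is the cost the following slides try to avoid.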
Non-Divergent Control Flow
Idea: optimize cases where threads do not diverge
[Figure: two control-flow graphs over blocks a, b, c, d, e, f.]
Thread  Trace
1       a b c e b d e f
2       a b c e b d e f
3       a b c e b d e f
4       a b c e b d e f
Option 1: Insert dynamic predicate-tests & branches to skip paths
◮ “Branch on superword condition code” (BOSCC) [Shin et al. PACT’07]
◮ Additional overhead for dynamic test
◮ Does not help against increased register pressure
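A dynamic test in the spirit of Option 1 can be sketched as follows. This is not the BOSCC implementation from the cited paper; `any_lane` and `boscc_select` are hypothetical names, and lanes are again modelled as array elements. Each predicated block is guarded by a test that skips it when no lane is active.

```c
#define LANES 4

/* Returns 1 if at least one lane of the mask is set. */
static int any_lane(const int *mask)
{
    for (int l = 0; l < LANES; ++l)
        if (mask[l]) return 1;
    return 0;
}

/* Runs the predicated blocks and returns how many were actually executed. */
int boscc_select(const int *in, int *out)
{
    int mask[LANES], not_mask[LANES], executed = 0;
    for (int l = 0; l < LANES; ++l) {
        mask[l] = in[l] > 0;
        not_mask[l] = !mask[l];
    }
    if (any_lane(mask)) {                    /* dynamic test guarding block c */
        for (int l = 0; l < LANES; ++l)
            if (mask[l]) out[l] = in[l] * 2;
        ++executed;
    }
    if (any_lane(not_mask)) {                /* dynamic test guarding block d */
        for (int l = 0; l < LANES; ++l)
            if (not_mask[l]) out[l] = -in[l];
        ++executed;
    }
    return executed;
}
```

When all lanes agree, only one block runs; the price is the extra `any_lane` test on every execution, which matches the overhead noted in the list above.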
Option 2: Statically prove non-divergence of certain blocks
◮ Non-divergent blocks can be excluded from linearization
◮ Less executed code, less register pressure
◮ More conservative than dynamic test
Thread  Trace
1       a b c e f
2       a b c e f
3       a b c e b d e f
4       a b c e b d e f
5       a b c e b d e f
6       a b c e b d e f
Option 3: Statically split non-divergent inputs
◮ Code versions with improved divergence properties
◮ Orthogonal to both other options ⇒ combination possible
2D Convolution
[Figure: 2D convolution on an image grid with axes x and y.]
    int left = x - 2;
    int right = x + 2;
    int top = y - 2;
    int bottom = y + 2;
    int sum = 0;
    for (int i = left; i <= right; ++i)
        for (int j = top; j <= bottom; ++j)
            sum += input[j][i] * mask[j - top][i - left];
    output[y][x] = sum;
2D Convolution
    auto left = x - 2;
    auto right = x + 2;
    int top = y - 2;
    int bottom = y + 2;
    int sum = 0;
    for (auto i = left; i <= right; ++i)
        for (int j = top; j <= bottom; ++j)
            sum += input[j][i] * mask[j - top][i - left];
    output[y][x] = sum;
2D Convolution
    int left = MAX(0, x - 2);
    int right = MIN(width - 1, x + 2);
    int top = MAX(0, y - 2);
    int bottom = MIN(height - 1, y + 2);
    int sum = 0;
    for (int i = left; i <= right; ++i)
        for (int j = top; j <= bottom; ++j)
            sum += input[j][i] * mask[j - (y - 2)][i - (x - 2)];
    output[y][x] = sum;
2D Convolution
    auto left = MAX(0, x - 2);
    auto right = MIN(width - 1, x + 2);
    int top = MAX(0, y - 2);
    int bottom = MIN(height - 1, y + 2);
    int sum = 0;
    for (auto i = left; i <= right; ++i)
        for (int j = top; j <= bottom; ++j)
            sum += input[j][i] * mask[j - (y - 2)][i - (x - 2)];
    output[y][x] = sum;
Input Space Splitting
[Figure: the x-y input space split into a region labeled "vector" and a region labeled "scalar".]
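For the clamped 2D convolution, such a split can be sketched by hand: interior points, where no clamping is needed, get a version with constant loop bounds, while border points keep the clamped version. This is a hand-written illustration under assumed sizes (`W`, `H`, `R` are made up), not output of the presented tool; in the actual approach the interior version would be the vectorized one.

```c
#define W 8                      /* assumed image width  */
#define H 6                      /* assumed image height */
#define R 2                      /* mask radius (5x5 mask) */
#define K (2 * R + 1)
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Border-safe version with clamped bounds; trip counts depend on (x, y). */
static int conv_clamped(int in[H][W], int mask[K][K], int x, int y)
{
    int sum = 0;
    for (int i = MAX(0, x - R); i <= MIN(W - 1, x + R); ++i)
        for (int j = MAX(0, y - R); j <= MIN(H - 1, y + R); ++j)
            sum += in[j][i] * mask[j - (y - R)][i - (x - R)];
    return sum;
}

/* Interior version: constant trip counts, the same for all threads. */
static int conv_interior(int in[H][W], int mask[K][K], int x, int y)
{
    int sum = 0;
    for (int i = x - R; i <= x + R; ++i)
        for (int j = y - R; j <= y + R; ++j)
            sum += in[j][i] * mask[j - (y - R)][i - (x - R)];
    return sum;
}

/* Dispatch on a predicate over the input space (x, y). */
void conv_split(int in[H][W], int mask[K][K], int out[H][W])
{
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            int interior = x >= R && x < W - R && y >= R && y < H - R;
            out[y][x] = interior ? conv_interior(in, mask, x, y)
                                 : conv_clamped(in, mask, x, y);
        }
}

/* Reference: the clamped version everywhere. */
void conv_reference(int in[H][W], int mask[K][K], int out[H][W])
{
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            out[y][x] = conv_clamped(in, mask, x, y);
}
```

`conv_split` and `conv_reference` compute the same image; the split only changes which code version handles which part of the input space.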
The Polyhedral Model
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++) {
    S:      A[i][j] = /* ... */ ;
            if (j <= i)
    P:          A[i][j] += A[j][i];
        }
[Figure: iteration space with axes i and j, each ranging up to N.]
I_S = {(S, (i, j)) | 0 ≤ i ≤ N ∧ 0 ≤ j ≤ N}
I_P = {(P, (i, j)) | 0 ≤ i ≤ N ∧ 0 ≤ j ≤ i}
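The two iteration domains can be read as membership tests on (i, j). The sketch below enumerates the loop nest and counts the points of each domain; the function names are illustrative. For a given N the counts should match the closed forms (N + 1)² for statement S and (N + 1)(N + 2)/2 for statement P.

```c
/* Membership tests for the iteration domains of S and P. */
int in_I_S(int i, int j, int N) { return 0 <= i && i <= N && 0 <= j && j <= N; }
int in_I_P(int i, int j, int N) { return 0 <= i && i <= N && 0 <= j && j <= i; }

/* Count the points of both domains by walking the original loop nest. */
void count_domains(int N, long *size_S, long *size_P)
{
    *size_S = 0;
    *size_P = 0;
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++) {
            *size_S += in_I_S(i, j, N);   /* S runs on every iteration */
            *size_P += in_I_P(i, j, N);   /* P runs only where j <= i  */
        }
}
```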
F_S = {(S, (i, j)) → (i, j)}
F_P1 = {(P, (i, j)) → (i, j)}
F_P2 = {(P, (i, j)) → (j, i)}
Splitting Predicates
Full Tile Predicate
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j += 8) {
    S:      A[i][j] = /* ... */ ;
            if (j <= i)
    P:          A[i][j] += A[j][i];
        }
Full_S = {(S, (i, j)) | (j − (j mod 8)) + 7 ≤ N}
Full_P = {(P, (i, j)) | (j − (j mod 8)) + 7 ≤ min(i, N)}
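Written as C predicates, the full tile conditions check whether the last index of the 8-wide tile containing j still lies inside the statement's domain. This is a direct transcription of the two sets above for non-negative j; the function names are illustrative.

```c
/* Last index of the width-w tile that contains j (assumes j >= 0). */
int tile_end(int j, int w) { return (j - (j % w)) + (w - 1); }

/* Full tile predicate for S: the whole tile satisfies j <= N. */
int full_tile_S(int i, int j, int N)
{
    (void)i;                       /* the domain of S does not bound j by i */
    return tile_end(j, 8) <= N;
}

/* Full tile predicate for P: the whole tile satisfies j <= min(i, N). */
int full_tile_P(int i, int j, int N)
{
    int bound = i < N ? i : N;
    return tile_end(j, 8) <= bound;
}
```

With N = 20, the tile starting at j = 0 is full for S, while the tile starting at j = 16 would end at 23 and is therefore partial.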
Uniform Access Predicate
Uni_FS = {(S, (i, j)) | F_S(i, j + 1) = F_S(i, j)} = {}
Uni_FP1 = {(P, (i, j)) | F_P1(i, j + 1) = F_P1(i, j)} = {}
Uni_FP2 = {(P, (i, j)) | F_P2(i, j + 1) = F_P2(i, j)} = {}
Consecutive Access Predicate
Cons_FS = {(S, (i, j)) | F_S(i, j + 1) = F_S(i, j) + 1} = I_S
Cons_FP1 = {(P, (i, j)) | F_P1(i, j + 1) = F_P1(i, j) + 1} = I_P
Cons_FP2 = {(P, (i, j)) | F_P2(i, j + 1) = F_P2(i, j) + 1} = {}
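The uniform and consecutive access predicates compare the location accessed at (i, j + 1) with the one accessed at (i, j). The sketch below makes one assumption that the slides do not state: accesses are linearized row-major with an assumed row length `M`, so "+ 1" means the next element in memory. All names are illustrative.

```c
#define M 16   /* assumed row length of A; any value > 1 gives the same results */

typedef long (*access_fn)(int i, int j);

/* Linearized access functions of the example (row-major assumption). */
long F_S(int i, int j)  { return (long)i * M + j; }   /* A[i][j] written by S */
long F_P1(int i, int j) { return (long)i * M + j; }   /* A[i][j] in P         */
long F_P2(int i, int j) { return (long)j * M + i; }   /* A[j][i] in P         */

/* Uniform: neighbouring lanes access the same location. */
int is_uniform(access_fn F, int i, int j)
{
    return F(i, j + 1) == F(i, j);
}

/* Consecutive: neighbouring lanes access adjacent locations. */
int is_consecutive(access_fn F, int i, int j)
{
    return F(i, j + 1) == F(i, j) + 1;
}
```

Under this assumption the accesses A[i][j] are consecutive at every point and A[j][i] never is, and none of the three accesses is uniform.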
CFG simplification
Hoisting conditionals
Before:
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j += 8)
            if (i <= NumParticles) {
    S:          ...
            }
After:
    for (int i = 0; i <= MIN(N, NumParticles); i++)
        for (int j = 0; j <= N; j += 8)
    S:      ...
Before:
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j += 8)
            if (reverse) {
    S:          ...
            } else {
    P:          ...
            }
After:
    if (reverse) {
        for (int i = 0; i <= N; i++)
            for (int j = 0; j <= N; j += 8)
    S:          ...
    } else {
        for (int i = 0; i <= N; i++)
            for (int j = 0; j <= N; j += 8)
    P:          ...
    }
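Hoisting a loop-invariant conditional such as `reverse` does not change what is computed, only where the test happens. The sketch below checks this on a toy loop nest; the bodies standing in for S and P are arbitrary, and both function names are hypothetical.

```c
/* Original form: the invariant condition is tested in every iteration. */
long nest_original(int n, int reverse)
{
    long acc = 0;
    for (int i = 0; i <= n; i++)
        for (int j = 0; j <= n; j += 8) {
            if (reverse) acc += i + j;   /* stands in for S */
            else         acc -= i * j;   /* stands in for P */
        }
    return acc;
}

/* Hoisted form: one test, then a branch-free loop nest per case. */
long nest_hoisted(int n, int reverse)
{
    long acc = 0;
    if (reverse) {
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= n; j += 8)
                acc += i + j;            /* S only */
    } else {
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= n; j += 8)
                acc -= i * j;            /* P only */
    }
    return acc;
}
```

After hoisting, each loop nest contains a single straight-line body, so no masking is needed inside the vectorized loop.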
Predicate Based Domain Splitting
[Figure: flow from Kernel through Polyhedral Model and Splitting Predicates to three Subkernels and a Scalar Kernel, then to three Vector Codelets and a Scalar Codelet, and finally to the Optimized Kernel.]
Evaluation
Pipeline
[Figure labels: App, OpenCL driver, Barrier elimination, Polly, Optimization, JIT, IOC, OpenCL API, Kernel module (LLVM), Barrier-free kernels, Polyhedral kernel, Vector codelets, Scalar codelets, Dispatch code, Domain Knowledge.]
Performance
[Figure: bar chart of speed-up (axis ticks 0.2 to 1) for the benchmarks BinOpt, BS, Floyd, LUD, DCT, C2D, and Myo; series: scalar, vec only, split, Intel.]
Ongoing Work
Model synchronization in the Polyhedral Model.
Apply polyhedral optimizations (scheduling).
Improve the representation of non-affine parts.
Conclusion
OpenCL Programming Model
[Figure: OpenCL work, divided into work groups, each consisting of work items.]
Codelet Score
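\[
\mathrm{Score}_n(k) := \sum_{Q \in k,\ F \in F_Q}
\begin{cases}
w_{cons}\,\mathrm{Box}(\mathrm{Cons}_F(d_k)) + w_{uni}\,\mathrm{Box}(\mathrm{Uni}_F(d_k)) & \text{if } n \ge w \\
\dots & \text{otherwise}
\end{cases}
\]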
Access Splitting Predicate
\[
I^C_k := \bigcap_{Q \in k} \ \bigcap_{\substack{F \in F_Q \\ \text{s.t. } \mathrm{Cons}_F(d_k) \neq \emptyset}} \mathrm{Cons}_F(d_k)
\]
Full Tile Predicate
[Figure: a full tile and a partial tile over thread ids up to n. For a thread id and tile width w, the tile starts at id − (id mod w) and ends at id − (id mod w) + (w − 1).]