Polly - Polyhedral Optimizations for LLVM


Tobias Grosser - Hongbin Zheng - Raghesh Aloor - Andreas Simbürger - Armin Größlinger - Louis-Noël Pouchet
April 03, 2011


SLIDE 1

Polly Polyhedral Optimizations for LLVM

Tobias Grosser - Hongbin Zheng - Raghesh Aloor
Andreas Simbürger - Armin Größlinger - Louis-Noël Pouchet

April 03, 2011

Polly - Polyhedral Optimizations for LLVM April 03, 2011 1 / 27

SLIDE 2

Polyhedral today

Good polyhedral libraries
Good solutions to some problems (Parallelisation, Tiling, GPGPU)
Several successful research projects
First compiler integrations

but still limited IMPACT.

Can Polly help to change this?

SLIDE 3

Outline

1 LLVM
2 Polly - Concepts & Implementation
3 Experiments
4 Future Work + Conclusion

SLIDE 4

LLVM

Compiler Infrastructure
Low Level Intermediate Language
  ◮ SSA, Register Machine
  ◮ Language and Target Independent
  ◮ Integrated SIMD Support
Large Set of Analyses and Optimizations
Optimizations at Compile, Link, and Run Time
JIT Infrastructure
Very convenient to work with

SLIDE 5

Classical Compilers:
  ◮ clang → C/C++/Objective-C
  ◮ Mono → .Net
  ◮ OpenJDK → Java
  ◮ dragonegg → C/C++/Fortran/ADA/Go
  ◮ Others → Ruby/Python/Lua
GPGPU: PTX backend
OpenCL (NVIDIA, AMD, INTEL, Apple, Qualcomm, ...)
Graphics Rendering (VMWare Gallium3D/LLVMPipe/LunarGlass/Adobe Hydra)
Web
  ◮ ActionScript (Adobe)
  ◮ Google Native Client
HLS (C-To-Verilog, LegUp, UCLA - autoESL)
Source to Source: LLVM C-Backend

SLIDE 6

The Architecture

[Architecture diagram]
Pipeline: LLVM IR → SCoP Detection & LLVM to Poly → PSCoP → Code Generation → LLVM IR
Transformations (on the PSCoP):
  * Classical loop transformations (Blocking, Interchange, Fusion, ...)
  * Expose parallelism
  * Dead instruction elimination / Constant propagation
Dependency Analysis
JSCoP Import/Export: Manual Optimization / LooPo / Pluto / PoCC+Pluto / ...
Backends: SIMD Backend, OpenMP Backend, PTX Backend

SLIDE 7

The SCoP - Classical Definition

for i = 1 to (5n + 3)
  for j = n to (4i + 3n + 4)
    A[i-j] = A[i]
    if i < (n - 20)
      A[i+20] = j

Structured control flow
  ◮ Regular for loops
  ◮ Conditions
Affine expressions in:
  ◮ Loop bounds, conditions, access functions
Side effect free
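
Because every loop bound and condition in the example is affine in i, j and n, the set of executed statement instances is fixed once n is known. The sketch below renders the loop nest in C and counts those instances instead of touching the array. It assumes the `if` is nested inside the j loop (the indentation is not certain from the slide); `count_instances` and `struct instance_counts` are illustrative names, not part of Polly.

```c
#include <assert.h>

/* Number of executions of each statement in the slide's loop nest. */
struct instance_counts {
    long copy;   /* executions of A[i-j] = A[i] */
    long store;  /* executions of A[i+20] = j   */
};

/* Walks the iteration domain for parameter n and counts instances.
 * Assumption: the condition i < n - 20 sits inside the j loop. */
struct instance_counts count_instances(long n) {
    struct instance_counts c = {0, 0};
    for (long i = 1; i <= 5 * n + 3; i++) {
        for (long j = n; j <= 4 * i + 3 * n + 4; j++) {
            c.copy++;
            if (i < n - 20)
                c.store++;
        }
    }
    return c;
}
```

For n = 1 the outer loop runs for i = 1..8 and the inner loop has 4i + 7 iterations, so the first statement executes 200 times and the guarded one never.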

SLIDE 8

AST based frameworks

What about:
  Goto-based loops
  C++ iterators
  C++0x foreach loop

Common restrictions

Limited to subset of C/C++
Require explicit annotations
Only canonical code
Correct? (Integer overflow, Operator overloading, ...)
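
To make the "goto-based loops" case concrete, here is a minimal sketch of a loop written with labels and goto next to its canonical for form. Both compute the same result, yet a tool that matches only on the syntactic `for` construct would see just the second one. All names (`fill_goto`, `fill_for`, `goto_matches_for`, `LEN`) are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

#define LEN 64

/* Loop expressed with labels and goto instead of a for statement. */
void fill_goto(int *A, int n) {
    int i = 0;
header:
    if (i >= n)
        goto done;
    A[i] = 2 * i + 1;   /* affine access and affine value */
    i++;
    goto header;
done:
    return;
}

/* The same loop in canonical form. */
void fill_for(int *A, int n) {
    for (int i = 0; i < n; i++)
        A[i] = 2 * i + 1;
}

/* Returns 1 when both versions produce identical arrays. */
int goto_matches_for(int n) {
    int a[LEN], b[LEN];
    assert(n >= 0 && n <= LEN);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    fill_goto(a, n);
    fill_for(b, n);
    return memcmp(a, b, sizeof a) == 0;
}
```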

SLIDE 9

Semantic SCoP

Thanks to LLVM Analysis and Optimization Passes:

SCoP - The Polly way

Structured control flow
  ◮ Regular for loops → Anything that acts like a regular for loop
  ◮ Conditions
Affine expressions → Expressions that calculate an affine result
Side effect free → Side effects known
Memory accesses through arrays → Arrays + Pointers

SLIDE 10

Valid SCoPs

do..while loop

  i = 0;
  do {
    int b = 2 * i;
    int c = b * 3 + 5 * i;
    A[c] = i;
    i += 2;
  } while (i < N);

pointer loop

  int A[1024];

  void pointer_loop() {
    int *B = A;
    while (B < &A[1024]) {
      *B = 1;
      ++B;
    }
  }
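
Why the do..while loop qualifies can be seen by normalizing it by hand: c = 3 * (2i) + 5i = 11i, and i takes the values 0, 2, 4, ..., so with a canonical counter k (i = 2k) the access is A[22k] = 2k. The sketch below checks that both forms write the same data. It assumes N >= 1 (the do..while body always runs once) and a buffer of `SIZE` elements; the function names are illustrative only.

```c
#include <assert.h>
#include <string.h>

#define SIZE 1024

/* The do..while loop from the slide, with explicit parameters. */
void scop_do_while(int *A, int N) {
    int i = 0;
    assert(N >= 1 && 11 * (N - 1) < SIZE);
    do {
        int b = 2 * i;
        int c = b * 3 + 5 * i;   /* c = 11 * i */
        A[c] = i;
        i += 2;
    } while (i < N);
}

/* Hand-normalized form with a canonical induction variable k, i = 2 * k. */
void scop_canonical(int *A, int N) {
    int trip_count = (N + 1) / 2;   /* number of even i in [0, N) */
    for (int k = 0; k < trip_count; k++)
        A[22 * k] = 2 * k;
}

/* Returns 1 when both forms write identical array contents. */
int scop_forms_agree(int N) {
    int a[SIZE], b[SIZE];
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    scop_do_while(a, N);
    scop_canonical(b, N);
    return memcmp(a, b, sizeof a) == 0;
}
```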

SLIDE 11

Polyhedral Representation - SCoP

SCoP = (Context, [Statement])
Statement = (Domain, Schedule, [Access])
Access = (“read”|“write”|“may write”, Relation)

Interesting:
  Data structures are integer sets/maps
  Domain is read-only
  Schedule can be partially affine
  Access is a relation
  Access can be may write
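
The three tuples map naturally onto nested records. The following is a hypothetical C mirror, not Polly's actual classes: sets and maps are kept as strings in isl-like notation purely for illustration, where a real implementation would hold integer set/map objects. The example instance describes the loop nest `for (i < 32) for (j < 1000) A[i][j] += 1;`, and its access relations are an assumption written for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_MAY_WRITE };

struct access {
    enum access_kind kind;
    const char *relation;            /* statement instance -> array element */
};

struct statement {
    const char *domain;              /* set of statement instances */
    const char *schedule;            /* statement instance -> execution time */
    const struct access *accesses;
    size_t num_accesses;
};

struct scop {
    const char *context;             /* constraints on the parameters */
    const struct statement *statements;
    size_t num_statements;
};

/* A[i][j] += 1 reads and writes the same element. */
static const struct access example_accesses[] = {
    { ACCESS_READ,  "{ Stmt[i, j] -> A[i, j] }" },
    { ACCESS_WRITE, "{ Stmt[i, j] -> A[i, j] }" },
};

static const struct statement example_statements[] = {
    { "{ Stmt[i, j] : 0 <= i < 32 and 0 <= j < 1000 }",
      "{ Stmt[i, j] -> [i, j] }",
      example_accesses, 2 },
};

static const struct scop example = { "{ : }", example_statements, 1 };

/* Returns the example SCoP described above. */
const struct scop *example_scop(void) {
    return &example;
}

/* Counts the accesses of one kind over all statements of a SCoP. */
size_t count_accesses(const struct scop *s, enum access_kind kind) {
    size_t n = 0;
    for (size_t i = 0; i < s->num_statements; i++)
        for (size_t j = 0; j < s->statements[i].num_accesses; j++)
            if (s->statements[i].accesses[j].kind == kind)
                n++;
    return n;
}
```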

SLIDE 12

Applying transformations

D = {Stmt[i, j] : 0 <= i < 32 ∧ 0 <= j < 1000}
S = {Stmt[i, j] → [i, j]}
S′ = S

for (i = 0; i < 32; i++)
  for (j = 0; j < 1000; j++)
    A[i][j] += 1;

SLIDE 13

Applying transformations

D = {Stmt[i, j] : 0 <= i < 32 ∧ 0 <= j < 1000}
S = {Stmt[i, j] → [i, j]}
TInterchange = {[i, j] → [j, i]}
S′ = S ◦ TInterchange

for (j = 0; j < 1000; j++)
  for (i = 0; i < 32; i++)
    A[i][j] += 1;
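
A schedule assigns each statement instance a multi-dimensional time stamp, and instances execute in lexicographic order of those stamps. The sketch below evaluates S and S′ for two instances and compares the results, showing that interchange reverses the relative order of Stmt[0, 1] and Stmt[1, 0]. The function names are illustrative and not taken from Polly or isl.

```c
#include <assert.h>

typedef void (*schedule_fn)(int i, int j, int time[2]);

/* S = {Stmt[i, j] -> [i, j]} */
void schedule_original(int i, int j, int time[2]) {
    time[0] = i;
    time[1] = j;
}

/* S' = {Stmt[i, j] -> [j, i]}, i.e. S followed by TInterchange */
void schedule_interchanged(int i, int j, int time[2]) {
    time[0] = j;
    time[1] = i;
}

/* Strict lexicographic "earlier than" on two-dimensional time stamps. */
int lex_before(const int a[2], const int b[2]) {
    return a[0] < b[0] || (a[0] == b[0] && a[1] < b[1]);
}

/* Returns 1 if Stmt[i1, j1] executes before Stmt[i2, j2] under schedule s. */
int runs_before(schedule_fn s, int i1, int j1, int i2, int j2) {
    int t1[2], t2[2];
    s(i1, j1, t1);
    s(i2, j2, t2);
    return lex_before(t1, t2);
}
```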

SLIDE 14

Applying transformations

D = {Stmt[i, j] : 0 <= i < 32 ∧ 0 <= j < 1000}
S = {Stmt[i, j] → [i, j]}
TInterchange = {[i, j] → [j, i]}
TStripMine = {[i, j] → [i, jj, j] : jj mod 4 = 0 ∧ jj <= j < jj + 4}
S′ = S ◦ TInterchange ◦ TStripMine

for (j = 0; j < 1000; j++)
  for (ii = 0; ii < 32; ii+=4)
    for (i = ii; i < ii+4; i++)
      A[i][j] += 1;
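
A quick way to convince oneself that the transformed nest still visits every instance of D exactly once is to run both versions on zeroed arrays and compare. This is only a sanity-check sketch with made-up names (`run_original`, `run_transformed`); it relies on 32 being a multiple of the strip size 4, so no remainder loop is needed.

```c
#include <assert.h>
#include <string.h>

#define NI 32
#define NJ 1000

/* Original order: S = {Stmt[i, j] -> [i, j]} */
void run_original(int A[NI][NJ]) {
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            A[i][j] += 1;
}

/* Interchanged, then the i dimension strip-mined by 4. */
void run_transformed(int A[NI][NJ]) {
    for (int j = 0; j < NJ; j++)
        for (int ii = 0; ii < NI; ii += 4)
            for (int i = ii; i < ii + 4; i++)
                A[i][j] += 1;
}

/* Returns 1 if both orders leave identical array contents,
 * which here means every element was incremented exactly once. */
int transformed_matches_original(void) {
    static int a[NI][NJ], b[NI][NJ];
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    run_original(a);
    run_transformed(b);
    return memcmp(a, b, sizeof a) == 0 && b[NI - 1][NJ - 1] == 1;
}
```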

SLIDE 15

JSCoP - Exchange format

Specification:
  Representation of a SCoP
  Stored as JSON text file
  Integer Sets/Maps use ISL Representation
Benefits:
  Can express modern polyhedral representation
  Can be imported easily (JSON bindings readily available)
  Is already valid Python

SLIDE 16

JSCoP - Example

{
  "name": "body => loop.end",
  "context": "[N] -> { []: N >= 0 }",
  "statements": [{
    "name": "Stmt",
    "domain": "[N] -> { Stmt[i0, i1] : 0 <= i0, i1 <= N }",
    "schedule": "[N] -> { Stmt[i0, i1] -> scattering[i0, i1] }",
    "accesses": [{
      "kind": "read",
      "relation": "[N] -> { Stmt[i0, i1] -> A[o0] }"
    }, {
      "kind": "write",
      "relation": "[N] -> { Stmt[i0, i1] -> C[i0][i1] }"
    }]
  }]
}
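
Since a JSCoP file is ordinary JSON text with the sets and maps embedded as strings, an external tool can emit it with plain string formatting. The sketch below writes a single entry of the "accesses" list. `format_access` is a hypothetical helper, and it assumes the inputs contain no quote or backslash characters, so it performs no JSON escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one "accesses" entry into buf.
 * Assumption: kind and relation need no JSON escaping.
 * Returns 0 on success, -1 if buf is too small. */
int format_access(char *buf, size_t size,
                  const char *kind, const char *relation) {
    int written = snprintf(buf, size,
                           "{ \"kind\": \"%s\", \"relation\": \"%s\" }",
                           kind, relation);
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```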

SLIDE 17

Optimized Code Generation

Automatically detect parallelism, after code generation
Automatically transform it to:
  ◮ OpenMP, if loop
      ⋆ is parallel
      ⋆ is not surrounded by any other parallel loop
  ◮ Efficient SIMD instructions, if loop
      ⋆ is innermost
      ⋆ is parallel
      ⋆ has constant number of iterations
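
The two rule sets can be read as a small decision function over per-loop properties. The sketch below is an illustration of those rules, not Polly's code: `struct loop_info`, `enum backend` and `choose_backend` are invented names, and checking the SIMD rule before the OpenMP rule when a loop satisfies both is an assumption of this sketch.

```c
#include <assert.h>

enum backend { BACKEND_NONE, BACKEND_OPENMP, BACKEND_SIMD };

/* Properties of one generated loop (hypothetical record). */
struct loop_info {
    int is_parallel;             /* no loop-carried dependences */
    int inside_parallel_loop;    /* surrounded by another parallel loop */
    int is_innermost;
    int has_constant_trip_count;
};

/* Applies the slide's conditions; SIMD is tried first (assumed priority). */
enum backend choose_backend(const struct loop_info *loop) {
    if (!loop->is_parallel)
        return BACKEND_NONE;
    if (loop->is_innermost && loop->has_constant_trip_count)
        return BACKEND_SIMD;
    if (!loop->inside_parallel_loop)
        return BACKEND_OPENMP;
    return BACKEND_NONE;
}
```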

SLIDE 18

Generation of Parallel Code

for (i = 0; i < N; i++) {
  for (j = 0; j < N; j++)
    for (kk = 0; kk < 1024; kk += 4)
      for (k = kk; k < kk+4; k++)
        A[j][k] += 9;
  for (j = 0; j < M; j++)
    B[i] = B[i] * i;
}

SLIDE 19

Generation of Parallel Code

for (i = 0; i < N; i++) {
  for (j = 0; j < N; j++)
    for (kk = 0; kk < 1024; kk += 4)
      for (k = kk; k < kk+4; k++)
        A[j][k] += 9;
  for (j = 0; j < M; j++)
    B[i] = B[i] * i;
}

S = {[i, 0, j, ...] : 0 <= i, j < N}
S = {[i, 1, j, ...] : 0 <= i < N ∧ 0 <= j < M}

SLIDE 20

Generation of Parallel Code

for (i = 0; i < N; i++) {
  #pragma omp parallel for
  for (j = 0; j < N; j++)
    for (kk = 0; kk < 1024; kk += 4)
      for (k = kk; k < kk+4; k++)
        A[j][k] += 9;
  for (j = 0; j < M; j++)
    B[i] = B[i] * i;
}

SLIDE 22

Generation of Parallel Code

for (i = 0; i < N; i++) {
  #pragma omp parallel for
  for (j = 0; j < N; j++)
    for (kk = 0; kk < 1024; kk += 4)
      A[j][kk:kk+3] += [9,9,9,9];
  for (j = 0; j < M; j++)
    B[i] = B[i] * i;
}
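
As a hand-written counterpart to what the generated code is meant to achieve, the sketch below puts the annotated loop nest next to a plain sequential version and checks that both give the same result. The sizes `ROWS` and `REPS` stand in for N and M and are assumed values; the pragma is guarded so the file also builds without OpenMP, and the 4-iteration inner loop is left in scalar form for the compiler to vectorize rather than written with vector intrinsics.

```c
#include <assert.h>
#include <string.h>

#define ROWS 8      /* stands in for N (assumed value) */
#define REPS 4      /* stands in for M (assumed value) */
#define COLS 1024

/* Plain sequential reference. */
void kernel_sequential(int A[ROWS][COLS], int B[ROWS]) {
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < ROWS; j++)
            for (int k = 0; k < COLS; k++)
                A[j][k] += 9;
        for (int j = 0; j < REPS; j++)
            B[i] = B[i] * i;
    }
}

/* Outermost parallel loop annotated for OpenMP; the innermost loop has a
 * constant trip count of 4, which makes it a candidate for SIMD code. */
void kernel_annotated(int A[ROWS][COLS], int B[ROWS]) {
    for (int i = 0; i < ROWS; i++) {
#ifdef _OPENMP
#pragma omp parallel for
#endif
        for (int j = 0; j < ROWS; j++)
            for (int kk = 0; kk < COLS; kk += 4)
                for (int k = kk; k < kk + 4; k++)
                    A[j][k] += 9;
        for (int j = 0; j < REPS; j++)
            B[i] = B[i] * i;
    }
}

/* Returns 1 if both kernels produce identical A and B. */
int annotated_matches_sequential(void) {
    static int A1[ROWS][COLS], A2[ROWS][COLS];
    int B1[ROWS], B2[ROWS];
    memset(A1, 0, sizeof A1);
    memset(A2, 0, sizeof A2);
    for (int i = 0; i < ROWS; i++)
        B1[i] = B2[i] = 1;
    kernel_sequential(A1, B1);
    kernel_annotated(A2, B2);
    return memcmp(A1, A2, sizeof A1) == 0
        && memcmp(B1, B2, sizeof B1) == 0
        && A2[0][0] == 9 * ROWS;
}
```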

SLIDE 23

Optimizing Matrix Multiply

[Bar chart: Speedup (axis 1 to 9) for clang -O3, gcc -ffast-math -O3, icc -fast, Polly: Only LLVM -O3, Polly: + Strip mining, Polly: += Vectorization, Polly: += Hoisting, Polly: += Unrolling]

32x32 double, Transposed matrix Multiply, C[i][j] += A[k][i] * B[j][k];
Intel® Core® i5 @ 2.40GHz, polly and clang from 23. March 2011
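
For reference, the measured kernel can be written out as below, together with one plausible strip-mined arrangement. Which loop was actually strip-mined in the experiment is not stated on the slide, so `matmul_stripmined` (j split into strips of 4, with the strip loop moved innermost) is purely an assumption for illustration. Each C[i][j] still accumulates over k in the same order, so both versions give identical results.

```c
#include <assert.h>

#define DIM 32

/* Kernel as stated on the slide: C[i][j] += A[k][i] * B[j][k]. */
void matmul_plain(double C[DIM][DIM], double A[DIM][DIM], double B[DIM][DIM]) {
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                C[i][j] += A[k][i] * B[j][k];
}

/* Assumed variant: j strip-mined by 4, strip loop innermost so that four
 * consecutive C[i][j] elements are updated together. */
void matmul_stripmined(double C[DIM][DIM], double A[DIM][DIM], double B[DIM][DIM]) {
    for (int i = 0; i < DIM; i++)
        for (int jj = 0; jj < DIM; jj += 4)
            for (int k = 0; k < DIM; k++)
                for (int j = jj; j < jj + 4; j++)
                    C[i][j] += A[k][i] * B[j][k];
}

/* Returns 1 if both variants compute the same C on integer-valued inputs. */
int stripmined_matches_plain(void) {
    static double A[DIM][DIM], B[DIM][DIM], C1[DIM][DIM], C2[DIM][DIM];
    for (int r = 0; r < DIM; r++)
        for (int c = 0; c < DIM; c++) {
            A[r][c] = r + c;
            B[r][c] = r - c;
            C1[r][c] = 0.0;
            C2[r][c] = 0.0;
        }
    matmul_plain(C1, A, B);
    matmul_stripmined(C2, A, B);
    for (int r = 0; r < DIM; r++)
        for (int c = 0; c < DIM; c++)
            if (C1[r][c] != C2[r][c])
                return 0;
    return 1;
}
```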

SLIDE 24

Pluto Tiling on Polybench

Polybench 2.0 (large data set), Intel® Xeon® X5670 @ 2.93GHz
polly and clang from 23. March 2011

SLIDE 25

Current Status

[Architecture diagram with per-component status. Legend: Usable for experiments / Planned / Under Construction]
Pipeline: LLVM IR → SCoP Detection & LLVM to Poly → PSCoP → Code Generation → LLVM IR
Transformations (on the PSCoP):
  * Classical loop transformations (Blocking, Interchange, Fusion, ...)
  * Expose parallelism
  * Dead instruction elimination / Constant propagation
Dependency Analysis
JSCoP Import/Export: Manual Optimization / LooPo / Pluto / PoCC+Pluto / ...
Backends: SIMD Backend, OpenMP Backend, PTX Backend

SLIDE 26

Future Work

Increase general coverage
Expose more SIMDization opportunities
Modifiable Memory Access Functions
GPU code generation

SLIDE 27

Polly - Conclusion

Automatic SCoP Extraction
Non canonical SCoPs
Modern Polyhedral Representation
JSCoP - Connect External Optimizers
OpenMP/SIMD/PTX backends

What features do we miss to apply YOUR optimizations?

http://wiki.llvm.org/Polly
