SLIDE 1

GRAPHITE: Polyhedral Analyses and Optimizations for GCC

Sebastian Pop (1), Albert Cohen (2), Cédric Bastoul (2), Sylvain Girbal (2), Georges-André Silber (1), Nicolas Vasilache (2)

(1) CRI/ENSMP
(2) Alchemy/INRIA, LRI/Paris Sud 11 University

June, 2006

S.Pop, A.Cohen, C.Bastoul, S.Girbal, G.A.Silber, N.Vasilache GRAPHITE: Polyhedral Analyses and Optimizations for GCC

SLIDE 2

Architecture of GCC and Loop Nest Optimizer

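[Diagram: GCC compilation pipeline]

Front ends (C, C++, Java, F95, Ada) → GENERIC → GIMPLE → RTL → back ends (machine description: arm, ppc, x86)

The Loop Nest Optimizer (LNO) works on GIMPLE + CFG + SSA + Loops and uses these analyses:

  • aliasing
  • number of iterations
  • data dependences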

SLIDE 4

Problems with Classical LNO Transforms

  • “source to source”: each transform modifies the compiled program
  • difficult to undo
  • order of transforms fixed once and for all
  • invalidated data deps: ad-hoc correction or rebuild
  • difficult to compose

These problems were solved in WRaP-IT (from 2002 at INRIA, on ORC/Open64).

GRAPHITE = WRaP-IT for GCC

SLIDE 5

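GRAPHITE: Representation on Top of Gimple-SSA

Statements + parametric affine inequalities

  1. a domain = bounds of enclosing loops
  2. a list of access functions
  3. a schedule = execution time

for (i=0; i<m; i++)
  for (j=5; j<n; j++)
    A[2*i][j+1] = ...

Domain of the statement (one row per inequality; each row r means r · (i, j, m, n, 1) ≥ 0):

\[
\begin{array}{rrrrr|l}
i & j & m & n & \mathrm{cst} & \\
\hline
 1 &  0 & 0 & 0 &  0 & i \ge 0 \\
-1 &  0 & 1 & 0 & -1 & -i + m - 1 \ge 0 \\
 0 &  1 & 0 & 0 & -5 & j - 5 \ge 0 \\
 0 & -1 & 0 & 1 & -1 & -j + n - 1 \ge 0
\end{array}
\]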
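The domain of the loop nest can be checked mechanically. The sketch below is illustrative only and is not GRAPHITE code: it assumes the convention that each row r of the matrix encodes r · (i, j, m, n, 1) ≥ 0, and the names `domain` and `in_domain` are hypothetical.

```c
#include <stdbool.h>

/* Columns follow the slide: i, j, m, n, cst.
   Assumed convention: row r encodes r . (i, j, m, n, 1) >= 0. */
enum { NB_ROWS = 4, NB_COLS = 5 };

static const int domain[NB_ROWS][NB_COLS] = {
    { 1,  0, 0, 0,  0},  /*  i         >= 0   (i >= 0) */
    {-1,  0, 1, 0, -1},  /* -i + m - 1 >= 0   (i <  m) */
    { 0,  1, 0, 0, -5},  /*  j - 5     >= 0   (j >= 5) */
    { 0, -1, 0, 1, -1},  /* -j + n - 1 >= 0   (j <  n) */
};

/* Hypothetical helper: true when iteration (i, j) belongs to the
   domain for the parameter values m and n. */
bool in_domain(int i, int j, int m, int n)
{
    const int point[NB_COLS] = {i, j, m, n, 1};

    for (int r = 0; r < NB_ROWS; r++) {
        int value = 0;
        for (int c = 0; c < NB_COLS; c++)
            value += domain[r][c] * point[c];
        if (value < 0)
            return false;   /* one violated inequality is enough */
    }
    return true;
}
```

With m = 3 and n = 8, `in_domain(0, 5, 3, 8)` holds, while `in_domain(3, 5, 3, 8)` does not because i < m is violated.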


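Access functions of A[2*i][j+1] (one row per array subscript):

\[
\begin{array}{rrrrr|l}
i & j & m & n & \mathrm{cst} & \\
\hline
2 & 0 & 0 & 0 & 0 & 2 * i \\
0 & 1 & 0 & 0 & 1 & j + 1
\end{array}
\]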
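As a small illustration of how such an access matrix is read, the sketch below multiplies it by the vector (i, j, m, n, 1) to recover the subscripts of A[2*i][j+1]. The names `access_matrix` and `subscripts` are hypothetical and do not come from GCC.

```c
/* Columns: i, j, m, n, cst. Row k gives the k-th subscript of A. */
enum { NB_SUBSCRIPTS = 2, ACC_COLS = 5 };

static const int access_matrix[NB_SUBSCRIPTS][ACC_COLS] = {
    {2, 0, 0, 0, 0},  /* first subscript:  2*i   */
    {0, 1, 0, 0, 1},  /* second subscript: j + 1 */
};

/* Hypothetical helper: evaluates every subscript for iteration (i, j)
   and parameters (m, n), storing the results in out. */
void subscripts(int i, int j, int m, int n, int out[NB_SUBSCRIPTS])
{
    const int point[ACC_COLS] = {i, j, m, n, 1};

    for (int k = 0; k < NB_SUBSCRIPTS; k++) {
        out[k] = 0;
        for (int c = 0; c < ACC_COLS; c++)
            out[k] += access_matrix[k][c] * point[c];
    }
}
```

For iteration (i, j) = (3, 7) the function yields the subscripts 6 and 8, that is, the element A[6][8]; the parameters m and n do not influence this particular access because their columns are zero.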


GRAPHITE(1, 2, 3) extends LAMBDA(1, 2)

GRAPHITE: Gimple Represented As Polyhedra (with interchangeable envelopes)

SLIDE 8

GRAPHITE versus LAMBDA

  • common part: unimodular transform
  • data and iteration order transform
  • regions: extended from loops to SCoP
  • “static control parts”: sequences, affine conditions and loops
  • GRAPHITE knows about the sequence!
  • enables more loop transforms: fusion, fission, tiling, software pipelining, scheduling

SLIDE 11

Schedule: Operational Semantics (How Program Works)

build a scheduling function S[[stmt]] → time

  • sequence [[s1; s2]]: trivial
      S[[s1]] = t
      S[[s2]] = t + 1
  • loop [[loop1 s end1]]: add new dimensions
      S[[loop1]] = t
      S[[s]] = (t, i1, 0)
      i1 indexes loop1 iterations: dynamic time

SLIDE 17

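Schedule: Example

S0;
S1;
for (i=0; i<m; i++) {
  S2;
  for (j=5; j<n; j++)
    S3;
}
S4;

Scheduling matrices (columns i, j, m, n, cst; each row gives one time dimension):

\[
S[\![\mathtt{S0}]\!] = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\qquad
S[\![\mathtt{S1}]\!] = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\qquad
S[\![\mathtt{S4}]\!] = \begin{bmatrix} 0 & 0 & 0 & 0 & 3 \end{bmatrix}
\]

\[
S[\![\mathtt{S2}]\!] =
\begin{bmatrix}
0 & 0 & 0 & 0 & 2 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
S[\![\mathtt{S3}]\!] =
\begin{bmatrix}
0 & 0 & 0 & 0 & 2 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

Read as time stamps: S0 runs at (0), S1 at (1), S2 at (2, i, 0), S3 at (2, i, 1, j, 0), and S4 at (3).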
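Statement instances execute in lexicographic order of their time stamps. The sketch below illustrates that reading of the example; it is not taken from GRAPHITE. The function names are hypothetical, and padding the shorter time stamp with zeros is an assumption made here so that vectors of different lengths can be compared.

```c
#include <stddef.h>

/* Lexicographic comparison of two time stamps. Returns -1, 0 or 1.
   Assumption of this sketch: a missing trailing component counts as 0. */
int lex_compare(const int *a, size_t na, const int *b, size_t nb)
{
    size_t n = na > nb ? na : nb;

    for (size_t k = 0; k < n; k++) {
        int x = k < na ? a[k] : 0;
        int y = k < nb ? b[k] : 0;
        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0;
}

/* Time stamps read off the schedule matrices of the example:
   S2 runs at (2, i, 0) and S3 runs at (2, i, 1, j, 0). */
void time_S2(int i, int t[3])
{
    t[0] = 2; t[1] = i; t[2] = 0;
}

void time_S3(int i, int j, int t[5])
{
    t[0] = 2; t[1] = i; t[2] = 1; t[3] = j; t[4] = 0;
}
```

Comparing the stamps reproduces the source order: S2 at i = 0 precedes S3 at (i, j) = (0, 5), which precedes S2 at i = 1, and every instance inside the loop precedes S4 at (3).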

SLIDE 18

Schedule: Separation Example

scheduling matrix S[[S3]] (columns i, j, m, n, cst):

\[
S[\![\mathtt{S3}]\!] =
\begin{bmatrix}
0 & 0 & 0 & 0 & 2 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

separate static / dynamic schedules:

  • static scheduling vector (cst entries of the static rows 1, 3, 5): fusion, fission, code motion

\[
\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
\]

  • Parameter scheduling matrix (columns m, n, cst of the dynamic rows 2, 4): shifting

\[
\begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\]

  • Iteration scheduling matrix (columns i, j of the dynamic rows 2, 4): interchange, skewing, reversal

\[
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}
\]

SLIDE 23

Compose Transforms

Small set of primitives (basic operations on matrices)

  1. motion
  2. interchange
  3. strip-mine
  4. insert, delete
  5. shift
  6. skew, reversal, reindexing
  7. privatize

Composed transforms:

  • fission/fusion (1)
  • tiling (2 + 3)
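To make the composition "tiling (2 + 3)" concrete, the sketch below writes the result by hand on a plain C loop nest: the j loop is strip-mined into a tile loop jj and an intra-tile loop j, and the tile loop is then interchanged with i. The sizes N, M, TILE and the function names are arbitrary choices for illustration; the polyhedral framework would express the same steps as matrix operations rather than as rewritten source.

```c
enum { N = 6, M = 8, TILE = 4 };

/* Original nest: records at which step each (i, j) is visited. */
void original(int order[N][M])
{
    int t = 0;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            order[i][j] = t++;
}

/* Strip-mine j by TILE, then interchange i with the tile loop jj.
   The result is a nest tiled along j. */
void tiled(int order[N][M])
{
    int t = 0;

    for (int jj = 0; jj < M; jj += TILE)        /* tile loop, moved outward */
        for (int i = 0; i < N; i++)
            for (int j = jj; j < jj + TILE && j < M; j++)  /* one strip */
                order[i][j] = t++;
}
```

Both nests visit every (i, j) exactly once; only the order changes. For example, (i, j) = (1, 0) is visited at step 8 in the original nest and at step 4 in the tiled one.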

SLIDE 24

Optimal Transform?

  • Find sequences of transforms based on:
      • size of loops
      • cache misses
      • simulation
  • Automatic selection of transforms amounts to choosing a point in a vector space
  • hard part (open questions)
  • WRaP-IT uses directives
  • some transforms yield cool speedups . . .

SLIDE 25

Results From WRaP-IT on Top of PathScale EKOPath

swim from SPEC CPU2000

  • 32% speedup on AthlonXP wrt. peak EKOPath (V2.1)
  • 38% speedup for Athlon64 wrt. peak EKOPath (V2.1)
  • principal SCoP: 421 lines of code
  • apply 30 transforms to principal SCoP: fusion, tiling, peeling, unrolling, interchange, strip-mining
  • result: 2267 LOC
  • 39 sec source to assembly on AthlonXP 2.08GHz
      • 22 sec in the backend
      • 12 sec polyhedral data deps
      • 4 sec polyhedral code gen

SLIDE 26

GRAPHITE: Road Map

  1. select SCoPs: filter out difficult codes (Alexandru Plesco)
  2. extend LAMBDA: build schedule functions, GLooG
  3. cost models: more static analyzers, and transform selection
  4. array regions: improve data deps in interproc mode
  5. lib integration: PolyLib, PiPLib, Omega, lib-APRON

[Diagram: road-map steps 1 to 5 placed on the GRAPHITE pipeline. Box labels: From GIMPLE, Data Dependences, Array Regions, Cost Models, Transform Selection, GIMPLE Generation, GIMPLE, GRAPHITE, Numerical Domains Common Interface (PolyLib, Omega, PIPlib, Intervals, Congruences, Octagons).]

SLIDE 27

lib-APRON: interchange envelopes

  • limit computation complexity = restrict expressivity
  • use coarser representations

[Figure: three envelopes of the same region: Boxes (4 constraints), Octagons (8 constraints), Polyhedra (n constraints)]
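The trade-off between envelopes can be seen on a two-dimensional example. The sketch below does not use the APRON API; the types and functions are hypothetical and written only for illustration. It builds the smallest box (bounds on x and y, 4 constraints) and the smallest octagon (the box bounds plus bounds on x + y and x − y, 8 constraints) around a set of points.

```c
#include <stdbool.h>

typedef struct { int x, y; } point;

/* Box envelope: 4 constraints (lower and upper bounds on x and y). */
typedef struct { int xmin, xmax, ymin, ymax; } box;

/* Octagon envelope: 8 constraints (box bounds plus bounds on
   the sum x + y and the difference x - y). */
typedef struct { box b; int smin, smax, dmin, dmax; } octagon;

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Smallest octagon containing the n >= 1 given points; its field b is
   the smallest box containing them. */
octagon octagon_of(const point *p, int n)
{
    octagon o = {{p[0].x, p[0].x, p[0].y, p[0].y},
                 p[0].x + p[0].y, p[0].x + p[0].y,
                 p[0].x - p[0].y, p[0].x - p[0].y};

    for (int k = 1; k < n; k++) {
        o.b.xmin = min_int(o.b.xmin, p[k].x);
        o.b.xmax = max_int(o.b.xmax, p[k].x);
        o.b.ymin = min_int(o.b.ymin, p[k].y);
        o.b.ymax = max_int(o.b.ymax, p[k].y);
        o.smin = min_int(o.smin, p[k].x + p[k].y);
        o.smax = max_int(o.smax, p[k].x + p[k].y);
        o.dmin = min_int(o.dmin, p[k].x - p[k].y);
        o.dmax = max_int(o.dmax, p[k].x - p[k].y);
    }
    return o;
}

bool in_box(box b, point q)
{
    return b.xmin <= q.x && q.x <= b.xmax
        && b.ymin <= q.y && q.y <= b.ymax;
}

bool in_octagon(octagon o, point q)
{
    return in_box(o.b, q)
        && o.smin <= q.x + q.y && q.x + q.y <= o.smax
        && o.dmin <= q.x - q.y && q.x - q.y <= o.dmax;
}
```

For the triangle with vertices (0, 0), (4, 0), (0, 4), the point (3, 3) lies inside the box but outside the octagon, because the octagon also records x + y ≤ 4. The coarser envelope is cheaper to compute with but accepts points that the finer one rejects.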

SLIDE 28

Library integrations

  • proposed libs: PolyLib, PiPLib, Omega, Octagon, lib-APRON
  • public domain, or GPL, about 20 kLOC
  • integrate them in GCC, or make GCC depend on them?

SLIDE 29

Questions?
