Locus: A System and a Language for Program Optimization

Thiago Teixeira*, Corinne Ancourt+, David Padua*, William Gropp*

*Department of Computer Science, University of Illinois at Urbana-Champaign, USA

+MINES ParisTech, PSL University, France


slide-1
SLIDE 1


CGO - Washington, DC - Feb 2019

slide-7
SLIDE 7

Introduction

2

  • Very complex machines
  • The gap between the performance of hand-tuned and compiler-generated code has grown substantially
  • Platform-specific optimizations are required
  • Platforms change, and new ones are introduced
  • As platform-specific optimizations are added, the code becomes less and less maintainable and understandable

[Figure: Source Code]

slide-11
SLIDE 11

Goal

3

  • Improve performance automatically
  • Target multiple platforms
  • Keep the code maintainable in the long term
  • How?

We use empirical search: automatically generate a collection of optimized variants and evaluate them by executing them.
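A minimal sketch of the empirical-search idea in C, assuming two hand-written variants of the same computation (a matrix sum with the original and the interchanged loop order). The function names and the use of clock() for timing are illustrative choices, not part of Locus, which generates its variants from a Locus program rather than receiving them as function pointers.

```c
#include <stddef.h>
#include <time.h>

#define DIM 256  /* the matrix is DIM x DIM, stored row-major in a flat array */

/* Variant 1: j innermost, so accesses are unit-stride. */
static double sum_row_major(const double *a) {
    double s = 0.0;
    for (size_t i = 0; i < DIM; i++)
        for (size_t j = 0; j < DIM; j++)
            s += a[i * DIM + j];
    return s;
}

/* Variant 2: loops interchanged, so accesses have stride DIM. */
static double sum_col_major(const double *a) {
    double s = 0.0;
    for (size_t j = 0; j < DIM; j++)
        for (size_t i = 0; i < DIM; i++)
            s += a[i * DIM + j];
    return s;
}

typedef double (*variant_fn)(const double *a);

/* Empirical search: execute each variant `reps` times, measure CPU time
   with clock(), and return the index of the fastest variant. */
static size_t pick_fastest(const variant_fn *variants, size_t count,
                           const double *a, int reps) {
    size_t best = 0;
    double best_time = -1.0;
    for (size_t v = 0; v < count; v++) {
        volatile double sink = 0.0;  /* keeps the calls from being optimized away */
        clock_t start = clock();
        for (int r = 0; r < reps; r++)
            sink += variants[v](a);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        (void)sink;
        if (best_time < 0.0 || elapsed < best_time) {
            best_time = elapsed;
            best = v;
        }
    }
    return best;
}
```

The search only trusts measurements: both variants compute the same result, and the one that runs faster on the current machine wins, which is why the outcome can differ from platform to platform.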

slide-16
SLIDE 16

Challenges

4

1. How to describe a collection of optimized variants (the optimization space) concisely?

  • modify and extend the use of optimizations

2. Generate the variants automatically:

  • often requires multiple techniques
  • many tools are available
  • the tools are not designed to work with each other
  • composing a diverse set of transformations into final code is not trivial

3. Select relevant variants:

  • the optimization space is too large to be evaluated exhaustively

4. Manage platform-specific recipes of transformations:

  • how and where to store them
  • how to make them available to non-experts
slide-24
SLIDE 24

Optimization Space

5

  • Triple nested loop: for i, for j, for k
  • Interchange (all permutations): 6 variants
  • Unroll (factors 2, 4, 8): 18 variants
  • Tiling (sizes 2, 4, 8, 16, 32, 64, 128): 126 variants
  • Unroll-and-jam (factors 2, 4, 8, 16, 32, 64, 128): 882 variants

[Figure: example variants, such as "for j, for i, for k/4" after unrolling and "for t_i, for j, for i, for k/4" after tiling]
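The counts above grow multiplicatively: 3! = 6 loop orders, 6 × 3 = 18, 18 × 7 = 126, and 126 × 7 = 882. The sketch below reproduces that arithmetic, assuming (as the numbers suggest) that each stage applies one of its choices to every variant produced by the previous stage. The function names are illustrative.

```c
/* Interchange: every permutation of the loops, i.e. depth! orders. */
static long count_permutations(int depth) {
    long n = 1;
    for (int i = 2; i <= depth; i++)
        n *= i;
    return n;
}

/* Fill out[0..3] with the number of variants after each stage for a triple
   nested loop: interchange, unroll, tiling, unroll-and-jam. Each stage
   multiplies the previous count by the number of choices it offers. */
static void space_sizes(long out[4]) {
    const int unroll_factors[] = {2, 4, 8};
    const int tile_sizes[] = {2, 4, 8, 16, 32, 64, 128};
    const long n_unroll = (long)(sizeof unroll_factors / sizeof unroll_factors[0]);
    const long n_tile = (long)(sizeof tile_sizes / sizeof tile_sizes[0]);

    out[0] = count_permutations(3);  /* 3! = 6 */
    out[1] = out[0] * n_unroll;      /* 6 x 3 = 18 */
    out[2] = out[1] * n_tile;        /* 18 x 7 = 126 */
    out[3] = out[2] * n_tile;        /* 126 x 7 = 882, same factor list as tiling */
}
```

Adding one more transformation with seven choices would multiply the space by seven again, which is why exhaustive evaluation quickly becomes impractical.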

slide-28
SLIDE 28

Locus

6

Input: baseline code (for i, for j, for k) + Locus program

Locus system:

1. Selects variants (avoiding a combinatorial explosion)
2. Runs them
3. Determines the best variant

Output: Locus program with the steps to the best variant found
slide-29
SLIDE 29

Locus

7

  • Semi-automatic approach to assist performance experts and code developers in the performance optimization of programs in C, C++, and Fortran
  • Orchestrates the application of transformations to a baseline version of the code
  • Especially suited to optimizing complex, long-lived applications running in different environments

slide-33
SLIDE 33

Contributions

8

  • Defined the Locus language:

– concisely describes complex optimization spaces
– is agnostic of any specific traversal method
– decouples the performance-expert role from the application-expert role

  • Implemented a system with a flexible API for plugging in:

– different variant-selection techniques (optimization space traversal)
– collections of transformations developed internally and externally

  • Optimizer and interpreter for Locus programs:

– prunes the space automatically
– speeds up the empirical search

slide-34
SLIDE 34
Locus Approach

9

  • Baseline code: defined by the developer, with no platform- or compiler-specific optimizations
  • Annotated regions of interest (i.e., code regions)
  • The application of the optimizations is programmed for each code region

slide-39
SLIDE 39

Locus System

13

Annotated Source Code:

    #pragma @Locus loop = matmul
    for (i = 0; i < M; i++)
      for (j = 0; j < N; j++)
        for (k = 0; k < K; k++)
          C[i][j] = beta*C[i][j] + alpha*A[i][k]*B[k][j];

Locus Program:

    CodeReg matmul {
      tiledim = 4;
      tiletype = Tiling2D() OR Tiling3D();
      printstatus(tiletype);
      if (tiletype == "2D") {
        RoseLocus.Unroll(loop=innermost, factor=tiledim);
      }
    }

  • Optimizations are target-specific and region-specific
  • Optimizations are kept separate from the application's code
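To make the "2D" branch concrete, here is a hand-written C sketch of one variant it could produce. It assumes that Tiling2D() tiles the i and j loops by tiledim = 4 and that "innermost" is the k loop; the slide does not specify either, and the function names are hypothetical. The variant keeps the baseline statement exactly, including beta being applied on every k iteration, so both functions compute identical results.

```c
#include <stddef.h>

#define TILEDIM 4  /* tiledim = 4 in the Locus program */

/* Baseline region with flat row-major arrays: A is M x K, B is K x N, C is M x N. */
static void matmul_baseline(size_t M, size_t N, size_t K, double alpha, double beta,
                            const double *A, const double *B, double *C) {
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < K; k++)
                C[i*N + j] = beta * C[i*N + j] + alpha * A[i*K + k] * B[k*N + j];
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical "2D" variant: i and j tiled by TILEDIM, and the innermost k loop
   unrolled by 4 (= tiledim) with a remainder loop for K % 4 != 0. */
static void matmul_tiled2d_unrolled(size_t M, size_t N, size_t K, double alpha, double beta,
                                    const double *A, const double *B, double *C) {
    for (size_t ii = 0; ii < M; ii += TILEDIM)
        for (size_t jj = 0; jj < N; jj += TILEDIM)
            for (size_t i = ii; i < min_size(ii + TILEDIM, M); i++)
                for (size_t j = jj; j < min_size(jj + TILEDIM, N); j++) {
                    size_t k = 0;
                    for (; k + 4 <= K; k += 4) {  /* unrolled body: 4 iterations of k */
                        C[i*N + j] = beta * C[i*N + j] + alpha * A[i*K + k]     * B[k*N + j];
                        C[i*N + j] = beta * C[i*N + j] + alpha * A[i*K + k + 1] * B[(k + 1)*N + j];
                        C[i*N + j] = beta * C[i*N + j] + alpha * A[i*K + k + 2] * B[(k + 2)*N + j];
                        C[i*N + j] = beta * C[i*N + j] + alpha * A[i*K + k + 3] * B[(k + 3)*N + j];
                    }
                    for (; k < K; k++)  /* leftover iterations */
                        C[i*N + j] = beta * C[i*N + j] + alpha * A[i*K + k] * B[k*N + j];
                }
}
```

The application source keeps only the baseline loop; a variant like this one exists only as the output of applying the Locus program.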
slide-43
SLIDE 43

Locus Optimization Language

14

  • Optimization recipes for each code region (CodeReg, OptSeq)
  • Loops, if-then-else
  • Special search constructs:

– OR blocks and statements
– Optional statements
– enum, integer, permutation, poweroftwo…

slide-50
SLIDE 50

Locus Optimization Language

15

[Figure: Interchange, followed either by Tiling, Distribute, Unroll OR by Unroll-and-jam, Distribute, Unroll]

Distribute is optional in the second sequence (marked with *):

    CodeReg test {
      Interchange(…);
      {
        Tiling(…);
        Distribute(…);
        Unroll(…);
      } OR {
        Unroll-and-jam(…);
        *Distribute(…);
        Unroll(…);
      }
    }
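The OR block and the optional statement together describe three transformation sequences. The C sketch below enumerates them under the assumption that an OR contributes the sequences of each branch in turn and that an optional statement is either kept or dropped (doubling that branch's sequences). The Step type and the function names are hypothetical and ignore the elided arguments (…).

```c
#include <string.h>

/* One statement in a branch of the Locus program; `optional` marks a *statement. */
typedef struct {
    const char *name;
    int optional;
} Step;

/* Append every sequence produced by one branch to out[n..]. Each optional step
   is either kept or dropped, so a branch with m optional steps yields 2^m
   sequences. Assumes each sequence fits in 128 characters. Returns the new count. */
static int expand_branch(const char *prefix, const Step *steps, int n_steps,
                         char out[][128], int n, int max) {
    int n_opt = 0;
    for (int s = 0; s < n_steps; s++)
        n_opt += steps[s].optional;
    for (long mask = 0; mask < (1L << n_opt) && n < max; mask++) {
        int bit = 0;
        strcpy(out[n], prefix);
        for (int s = 0; s < n_steps; s++) {
            if (steps[s].optional && !((mask >> bit++) & 1))
                continue;  /* this optional step is left out of this sequence */
            strcat(out[n], ", ");
            strcat(out[n], steps[s].name);
        }
        n++;
    }
    return n;
}

/* CodeReg test: Interchange; { Tiling; Distribute; Unroll } OR
   { Unroll-and-jam; *Distribute; Unroll }. */
static int enumerate_test_region(char out[][128], int max) {
    const Step branch_a[] = {{"Tiling", 0}, {"Distribute", 0}, {"Unroll", 0}};
    const Step branch_b[] = {{"Unroll-and-jam", 0}, {"Distribute", 1}, {"Unroll", 0}};
    int n = expand_branch("Interchange", branch_a, 3, out, 0, max);
    return expand_branch("Interchange", branch_b, 3, out, n, max);
}
```

The sequences come out as "Interchange, Tiling, Distribute, Unroll", "Interchange, Unroll-and-jam, Unroll", and "Interchange, Unroll-and-jam, Distribute, Unroll"; each transformation's own parameters would multiply these further.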

slide-55
SLIDE 55

Modules Integration 1/3

16

  • Collaborative environment: reuse others' work
  • Locus defines an entire search space
  • Locus supports multiple search modules and multiple transformation modules
  • Given the search space, one must:

– decide which variants to evaluate (search module)
– use tools to generate code that follows each variant's transformation plan (transformation module)

slide-66
SLIDE 66

Modules Integration 2/3

17

  • Search modules (OpenTuner, HyperOpt):

– Convert the Locus space to the module's space

  • parameters, OR statements and blocks, conditionals

– For each point, convert it back to the Locus representation and invoke the interpreter
– Search: start the process

[Figure: search loop. Locus program → Locus's space → search module's opt space; the search module selects a point and converts it, the code generator produces the variant, the variant is evaluated, and a metric is returned to the search module]
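One common way to hand a structured space to a generic search module is to flatten it into independent integer dimensions and map each point back to per-parameter choices. The C sketch below assumes such a flattening (a mixed-radix encoding); it is not OpenTuner's or HyperOpt's API, nor Locus's actual conversion, and the dimension names are hypothetical. Conditionals, which make some parameters irrelevant for some points, are not modeled here.

```c
#include <stddef.h>

/* One search parameter (an OR choice, an integer range, a poweroftwo range, ...)
   flattened into a dimension with a fixed number of choices. */
typedef struct {
    const char *name;
    int n_choices;
} Dim;

/* Total number of points in the flattened space. */
static long space_size(const Dim *dims, size_t n_dims) {
    long size = 1;
    for (size_t d = 0; d < n_dims; d++)
        size *= dims[d].n_choices;
    return size;
}

/* Search-module point -> one choice index per dimension (mixed radix). */
static void decode_point(long point, const Dim *dims, size_t n_dims, int *choice) {
    for (size_t d = 0; d < n_dims; d++) {
        choice[d] = (int)(point % dims[d].n_choices);
        point /= dims[d].n_choices;
    }
}

/* Inverse mapping: per-dimension choices -> search-module point. */
static long encode_point(const int *choice, const Dim *dims, size_t n_dims) {
    long point = 0;
    for (size_t d = n_dims; d-- > 0;)
        point = point * dims[d].n_choices + choice[d];
    return point;
}

/* Back to a Locus-level value, e.g. poweroftwo(2..32): choice 0 -> 2, 1 -> 4, ... */
static int poweroftwo_value(int lo, int choice) { return lo << choice; }
```

For example, with dimensions of 3, 3, and 5 choices the flattened space has 45 points, and point 44 decodes to choices (2, 2, 4), i.e., the last OR branch, the third loop, and a factor of 32 for a poweroftwo(2..32) parameter.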

slide-68
SLIDE 68

Modules Integration 3/3

18

  • Transformation modules (Pips, RoseLocus, Pragmas, BuiltIn):

– Allow fine-grained selection

  • A different module can be picked for each transformation (e.g., Interchange, Tiling)

– Work at the code-region level
– Workflow:

  • Locus translates the code region into the module's notation
  • The module applies the optimization
  • Locus converts the resulting code back into its internal representation (AST and code region structure)

– Flexible enough to integrate other transformations if needed

slide-74
SLIDE 74

Optimizations for Pruning

19

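  • During conversion:

– Dead code elimination
– Constant folding
– Constant propagation

[Figure: the search loop (Locus program → Locus's space → search module's opt space → code generator → evaluate a variant → return a metric), with pruning applied while converting the Locus space]

    CodeReg test {
      perfect = IsPerfectLoopNest();
      if (perfect) {
        Interchange(…);
      }
      Tiling(…);
      Distribute(…);
      Unroll(…);
    }

In this example, IsPerfectLoopNest() evaluates to False, so the if block containing Interchange is dead code and is removed.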
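The payoff of this pruning is a smaller space for the search module. The sketch below assumes hypothetical domain sizes for the elided arguments (…) and shows how folding `perfect` to a constant removes the Interchange parameter when the guard is false; without pruning, the search would sample Interchange choices that all generate the same code.

```c
/* Hypothetical number of choices for each transformation's elided arguments (…). */
typedef struct {
    long interchange;
    long tiling;
    long distribute;
    long unroll;
} Domains;

/* No pruning: every parameter counts, including Interchange's, even though
   it only affects the generated code when `perfect` is true. */
static long unpruned_space_size(Domains d) {
    return d.interchange * d.tiling * d.distribute * d.unroll;
}

/* After constant propagation of `perfect`, folding of the if condition, and
   dead-code elimination, the Interchange parameter disappears when the guard
   is known to be false. */
static long pruned_space_size(int perfect, Domains d) {
    long size = d.tiling * d.distribute * d.unroll;
    if (perfect)
        size *= d.interchange;
    return size;
}
```

With, say, 6 interchange orders, 7 tile sizes, 2 distribution choices, and 3 unroll factors, the unpruned space has 252 points while the pruned one has 42 for a non-perfect loop nest.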

slide-75
SLIDE 75

Experimental Results

20

  • Intel Xeon E5-2660, 10 cores, 2.60 GHz
  • Compared to Pluto and Intel MKL

– default parameter values, no search

  • Examples:

– Matrix-matrix multiplication
– Stencil kernels
– Kripke
– Arbitrary loop nests

  • Generic enough to be applied to both known and previously unknown applications

slide-77
SLIDE 77

Matrix-Matrix Multiplication

21

  • Empirical search found very efficient variants
  • Performance comparable to Intel MKL

[Figure: speedup over sequential vs. number of CPU cores (1, 2, 4, 6, 8, 10) for Pluto, MKL, and Locus]

slide-87
SLIDE 87

Matrix-Matrix Multiplication

22

  • Large optimization space
  • 34,012,224 possible variants
  • On average, ~450 variants evaluated per setup
  • 80-minute search per setup

[Figure: Interchange, Tiling, Tiling, Parallel For; the Parallel For schedule is Static + chunk OR Dynamic + chunk]
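The "Static + chunk OR Dynamic + chunk" choice corresponds to OpenMP loop schedules. A sketch, assuming the parallel loop is the outermost loop of a plain (untiled) matrix multiplication and that the chunk size is one of the searched parameters; the function names are hypothetical. Compile with -fopenmp; without it the pragmas are ignored and both functions run sequentially with the same results.

```c
/* C = A * B (row-major; A is M x K, B is K x N). The rows of C are split into
   chunks of `chunk` iterations assigned to threads in a fixed round-robin order. */
static void matmul_omp_static(int M, int N, int K, const double *A, const double *B,
                              double *C, int chunk) {
    #pragma omp parallel for schedule(static, chunk)
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < K; k++)
                sum += A[i * K + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

/* Same loop nest, but idle threads grab the next chunk at run time. */
static void matmul_omp_dynamic(int M, int N, int K, const double *A, const double *B,
                               double *C, int chunk) {
    #pragma omp parallel for schedule(dynamic, chunk)
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < K; k++)
                sum += A[i * K + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}
```

Static scheduling has lower overhead when iterations cost the same, while dynamic scheduling can balance uneven work; which one wins, and with which chunk size, is exactly the kind of question the empirical search answers per platform.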

slide-93
SLIDE 93

Stencils

23

  • 6 different stencils
  • Skewed tiling across time and space
  • Found better tile shapes

[Figure: speedup of Pluto and Locus on Jacobi 1D, Jacobi 2D, Heat 1D, Heat 2D, Seidel 1D, and Seidel 2D]

slide-94
SLIDE 94

Kripke

24

  • Deterministic particle transport code and proxy app for the Ardra project, developed at LLNL
  • 5 kernels: LTimes, LPlusTimes, Scattering, Source, and Sweep
  • 6 hand-optimized versions: 6 data layouts of the angular flux, a 3D array indexed by direction D, group G, and zone Z
  • Goal: generate the 6 hand-optimized versions from a single source code using Locus
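The six versions differ in how the (D, G, Z) array is laid out in memory. The sketch below computes the linear offset of an element for a given layout name, assuming the name lists dimensions from slowest- to fastest-varying (so "DGZ" keeps zones contiguous). The function name and that reading of the layout string are assumptions, not taken from Kripke's source.

```c
#include <stddef.h>

/* Linear offset of element (d, g, z) in an nd x ng x nz array stored with the
   given layout, e.g. "DGZ" or "ZGD". Assumption: the first letter is the
   outermost (slowest-varying) dimension and the last letter is contiguous. */
static size_t flux_offset(const char *layout, size_t nd, size_t ng, size_t nz,
                          size_t d, size_t g, size_t z) {
    size_t offset = 0;
    for (int pos = 0; pos < 3; pos++) {
        switch (layout[pos]) {
        case 'D': offset = offset * nd + d; break;
        case 'G': offset = offset * ng + g; break;
        case 'Z': offset = offset * nz + z; break;
        default:  return (size_t)-1;  /* unknown layout letter */
        }
    }
    return offset;
}
```

For a 2 x 3 x 4 array, element (d, g, z) = (1, 0, 2) sits at offset 14 in "DGZ" but at offset 13 in "ZGD"; since the best loop order follows the layout, each layout needs its own loop nest, which is what the Locus program below generates.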

slide-95
SLIDE 95

Kripke

25

[Figure: execution time (sec) for the DGZ, DZG, GDZ, GZD, ZDG, and ZGD layouts; Hand-Optimized vs. Locus]

slide-97
SLIDE 97

Kripke - Scattering Kernel

27

    for (int nm = 0; nm < num_moments; ++nm)
      for (int g = 0; g < num_groups; ++g)
        for (int gp = 0; gp < num_groups; ++gp)
          for (int zone = 0; zone < num_zones; ++zone)
            for (int mix = z_mixed[zone]; mix < z_mixed[zone]+num_mixed[zone]; ++mix) {
              int material = mixed_material[mix];
              double fraction = mixed_fraction[mix];
              int n = moment_to_coeff[nm];
              #####
              # Address calculation to be included here.
              #####
              *phi_out += *sigs * *phi * fraction;
            }

    datalayout = enum("DZG","DGZ","GDZ","GZD","ZDG","ZGD");

    CodeReg Scattering {
      if (datalayout == "DGZ") {
        omploop = "0.0.0.0";
      } elif (datalayout == "GDZ") {
        looporder = [1,2,0,3,4];
        omploop = "0.0.0.0";
      } elif (datalayout == "GZD") {
        looporder = [1,2,3,4,0];
        omploop = "0.0.0";
      } elif (datalayout == "ZGD") {
        looporder = [3,4,1,2,0];
        omploop = "0";
      } elif (datalayout == "ZDG") {
        looporder = [3,4,0,1,2];
        omploop = "0";
      } elif (datalayout == "DZG") {
        looporder = [0,3,4,1,2];
        omploop = "0.0";
      }
      sourcepath = "scatter_" + datalayout + ".txt";
      BuiltIn.Altdesc(stmt="0.0.0.0.0.3", source=sourcepath);
      RoseLocus.Interchange(order=looporder);
      RoseLocus.LICM();
      RoseLocus.ScalarRepl();
      Pragma.OMPFor(loop=omploop);
    }
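RoseLocus.LICM() and RoseLocus.ScalarRepl() name two classic transformations: loop-invariant code motion and scalar replacement. The C sketch below shows their general effect on a simplified accumulation loop; it is a hypothetical kernel, not Kripke's Scattering code, whose address calculation comes from the scatter_*.txt files.

```c
#include <stddef.h>

/* Before: the output address row * n_cols + col does not change in the k loop
   but is recomputed every iteration, and out[] is read and written through
   memory on every iteration. */
static void accumulate_naive(size_t n_rows, size_t n_cols, size_t n_inner,
                             const double *sig, const double *phi,
                             double fraction, double *out) {
    for (size_t row = 0; row < n_rows; row++)
        for (size_t col = 0; col < n_cols; col++)
            for (size_t k = 0; k < n_inner; k++)
                out[row * n_cols + col] += sig[row * n_inner + k] * phi[k * n_cols + col] * fraction;
}

/* After: invariant address computations are hoisted out of the inner loops
   (LICM), and the running sum lives in a local variable that is loaded once
   and stored once (scalar replacement). */
static void accumulate_optimized(size_t n_rows, size_t n_cols, size_t n_inner,
                                 const double *sig, const double *phi,
                                 double fraction, double *out) {
    for (size_t row = 0; row < n_rows; row++) {
        const double *sig_row = sig + row * n_inner;      /* hoisted out of col and k */
        for (size_t col = 0; col < n_cols; col++) {
            double *out_elem = &out[row * n_cols + col];  /* hoisted out of k */
            double acc = *out_elem;
            for (size_t k = 0; k < n_inner; k++)
                acc += sig_row[k] * phi[k * n_cols + col] * fraction;
            *out_elem = acc;
        }
    }
}
```

Both versions add the same terms in the same order, so they produce identical results; the second simply gives the compiler less memory traffic and fewer address computations in the innermost loop.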


slide-103
SLIDE 103

Optimization of Arbitrary Loop Nests

33

  • Generic Locus program to optimize source code that is not known beforehand
  • Goal: reproduce the work of Gong Zhangxiaowen et al. [1] using Locus
  • Selected 856 loops from 16 benchmarks
  • Transformed the loops with all subsets of two sequences:

[Figure: the two transformation sequences, built from Interchange, Unroll-and-jam, Tiling, Unroll, and Distribution]

[1] Gong Zhangxiaowen et al. "An empirical study of the effect of source-level loop transformations on compiler stability".

slide-104
SLIDE 104

Optimization of Arbitrary Loop Nests

34

    CodeReg scop {
      perfect = BuiltIn.IsPerfectLoopNest();
      depth = BuiltIn.LoopNestDepth();
      if (RoseLocus.IsDepAvailable()) {
        if (perfect && depth > 1) {
          permorder = permutation(seq(0,depth));
          RoseLocus.Interchange(order=permorder);
        }
        {
          if (perfect) {
            indexT1 = integer(1..depth);
            T1fac = poweroftwo(2..32);
            RoseLocus.Tiling(loop=indexT1, factor=T1fac);
          }
        } OR {
          if (depth > 1) {
            indexUAJ = integer(1..depth-1);
            UAJfac = poweroftwo(2..4);
            RoseLocus.UnrollAndJam(loop=indexUAJ, factor=UAJfac);
          }
        } OR {
          None; # No tiling, interchange, or unroll and jam.
        }
        innerloops = BuiltIn.ListInnerLoops();
        *RoseLocus.Distribute(loop=innerloops);
      }
      innerloops = BuiltIn.ListInnerLoops();
      RoseLocus.Unroll(loop=innerloops, factor=poweroftwo(2..8));
    }
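Because the program adapts to each loop nest, the size of its space depends on the nest. The C sketch below counts it under several assumptions not stated on the slide: permutation(seq(0,depth)) yields depth! orders, integer(a..b) and poweroftwo(a..b) include both ends, the alternatives of an OR block add up, the optional *Distribute doubles the count, a branch whose guard is false still counts as one (empty) alternative, and the final Unroll uses a single factor for all inner loops. The function names are hypothetical.

```c
/* Number of powers of two p with lo <= p <= hi (assumed meaning of poweroftwo(lo..hi)). */
static long count_pow2(long lo, long hi) {
    long n = 0;
    for (long p = 1; p <= hi; p *= 2)
        if (p >= lo)
            n++;
    return n;
}

static long factorial(int n) {
    long f = 1;
    for (int i = 2; i <= n; i++)
        f *= i;
    return f;
}

/* Size of the space described by CodeReg scop for one loop nest, before any
   deduplication of variants that generate the same code. */
static long scop_space_size(int perfect, int depth, int dep_available) {
    long size = 1;
    if (dep_available) {
        if (perfect && depth > 1)
            size *= factorial(depth);                                      /* Interchange */
        long tiling = perfect ? depth * count_pow2(2, 32) : 1;             /* 1st OR branch */
        long unroll_jam = depth > 1 ? (depth - 1) * count_pow2(2, 4) : 1;  /* 2nd OR branch */
        size *= tiling + unroll_jam + 1;                                   /* + the None branch */
        size *= 2;                                                         /* *Distribute */
    }
    size *= count_pow2(2, 8);                                              /* final Unroll */
    return size;
}
```

Under these assumptions, a perfect nest of depth 3 with dependence information gives 6 × (15 + 4 + 1) × 2 × 3 = 720 points, an imperfect nest of the same depth gives 36, and a nest without dependence information only chooses the unroll factor (3 points).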

slide-109
SLIDE 109

Optimization of Arbitrary Loop Nests

35

Information about the code:

  • Perfect loop nest?
  • Loop nest depth
  • Dependence test available?
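The first two queries can be illustrated on a toy loop tree. The sketch below uses a hypothetical `Stmt` type rather than ROSE's AST. Its perfect-nest rule mirrors the check in the restructurer code shown later: a loop whose body does not consist of exactly one statement makes the nest imperfect only if that body also contains a loop. Dependence availability is not modeled, since it depends on the dependence analyzer.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical loop-tree node: either a loop with a body, or a plain statement.
struct Stmt {
  bool isLoop = false;
  std::vector<Stmt> body;  // only meaningful when isLoop is true
};

Stmt makeLoop(std::vector<Stmt> body) { return Stmt{true, std::move(body)}; }
Stmt makeStmt() { return Stmt{}; }

// Number of loops on the deepest chain of nested loops rooted at `s`.
int loopNestDepth(const Stmt &s) {
  if (!s.isLoop) return 0;
  int inner = 0;
  for (const Stmt &child : s.body) inner = std::max(inner, loopNestDepth(child));
  return 1 + inner;
}

// Perfect: every loop that encloses another loop contains exactly one
// statement (that inner loop); the innermost loop may hold anything.
bool isPerfectLoopNest(const Stmt &s) {
  if (!s.isLoop) return true;
  const bool enclosesLoop = std::any_of(s.body.begin(), s.body.end(),
                                        [](const Stmt &c) { return c.isLoop; });
  if (!enclosesLoop) return true;
  return s.body.size() == 1 && isPerfectLoopNest(s.body.front());
}
```

For example, `makeLoop({makeStmt(), makeLoop({makeStmt()})})` has depth 2 but is imperfect, because the outer loop holds a statement next to the inner loop. With the Locus program above, such a nest skips interchange and tiling.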
slide-114
SLIDE 114

Optimization of Arbitrary Loop Nests

40

#include <iostream> #include <sstream> #include <fstream> #include <vector> #include <map> #include <algorithm> #include <functional> #include <numeric> #include <cstdio> #include "rose.h" #include <CPPAstInterface.h> #include <ArrayAnnot.h> #include <ArrayRewrite.h> #include <AstInterface_ROSE.h> #include <LoopTransformInterface.h> #include <AnnotCollect.h> #include <OperatorAnnotation.h> #include <candl/candl.h> #include <scoplib/scop.h> #include <polyopt/PolyOpt.hpp> #include <polyopt/ScopExtractor.hpp> #include <polyopt/SageNodeAnnotation.hpp> #include <utils.hh> #include <transformation.hh> #include <dependence.hh> #include <analysis.hh> #include <database.hh> #include <config.hh> #include <staticfeature.hh> #include <boost/program_options.hpp> using namespace std; using namespace restructurer; namespace po = boost::program_options; int main(int argc, char* argv[]) { po::options_description description("restructurer usage"); description.add_options() ("help", "Display this help message") ("benchmark", po::value<string>(), "Specify the benchmark") ("version", po::value<string>(), "Specify the version of the benchmark") ("application", po::value<string>(), "Specify the application in the benchmark") ("file", po::value<string>(), "Specify the file that contains the loop") ("function", po::value<string>(), "Specify the function that contains the loop") ("line", po::value<string>(), "Specify the starting line number of the loop") ("skipinterchangetiling", "Do not perform interchange or tiling") ("nodb", "Do not write to database") ("dependenceonly", "Only output the dependence information of the original loop nest") ("extractstaticfeatures", "Only extract static features of the original loop nest"); string benchmark, version, application, file_name, func_name, line_no; bool dependenceonly = false; bool extractstaticfeatures = false; bool skip_interchange_tiling = false; try { po::variables_map vm; po::store(po::command_line_parser(argc, 
argv).options(description).allow_unregistered().run(), vm); po::notify(vm); if (vm.count("nodb")) { write_to_db = false; } else { if (!(vm.count("benchmark") && vm.count("version") && vm.count("application") && vm.count("file") && vm.count("function") && vm.count("line"))) { throw std::exception(); } benchmark = vm["benchmark"].as<string>(); version = vm["version"].as<string>(); application = vm["application"].as<string>(); file_name = vm["file"].as<string>(); func_name = vm["function"].as<string>(); line_no = vm["line"].as<string>(); } if (vm.count("dependenceonly")) { dependenceonly = true; } if (vm.count("extractstaticfeatures")) { extractstaticfeatures = true; } if (vm.count("skipinterchangetiling")) { skip_interchange_tiling = true; } } catch ( const std::exception& e ) { cerr << "Failed to process arguments " << e.what() << endl; return -1; } SgStringList args = CommandlineProcessing::generateArgListFromArgcArgv(argc, argv); SgProject* project = frontend(args); ROSE_ASSERT(project != NULL); SgFile &file = project->get_file(0); Sg_File_Info *file_info = file.get_file_info(); Database *db = Database::getInstance(); cout << "benchmark: " << benchmark << endl; cout << "version: " << version << endl; cout << "application: " << application << endl; cout << "file name: " << file_name << endl; cout << "function: " << func_name << endl; cout << "line: " << line_no << endl; db->init(benchmark, version, application, file_name, func_name, line_no); SageInterface::changeAllBodiesToBlocks(project); SgBasicBlock *body = NULL; VariantVector vv_func(V_SgFunctionDefinition); Rose_STL_Container<SgNode*> funcion_list = NodeQuery::queryMemoryPool(vv_func); for (Rose_STL_Container<SgNode*>::iterator f_itr = funcion_list.begin(); f_itr != funcion_list.end(); ++f_itr) { SgFunctionDefinition *cur_func = isSgFunctionDefinition(*f_itr); string name = cur_func->get_declaration()->get_name().getString(); SgBasicBlock *func_body = cur_func->get_body(); if (name == "loop" && func_body) { 
body = func_body; } } //project->unparse(); //cout << skip_interchange_tiling << endl; //return 0; if (dependenceonly) { DependenceGraph orig_dep_graph;
orig_dep_graph.printDepSet(cout);
//orig_dep_graph.printStmtDepSet(cout);
ofstream dot("dep.dot", ofstream::out);
orig_dep_graph.outputDot(dot);
dot.close(); return 0; } InheritedAttribute inh_attr; OuterLoopIdentifier first_pass; first_pass.traverseInputFiles(project, inh_attr); const loop_info_vec_t &outer_loop_vec = first_pass.getOuterLoopVec(); if (extractstaticfeatures) { for (loop_info_vec_t::const_iterator citr = outer_loop_vec.begin(); citr != outer_loop_vec.end(); ++citr) { OuterLoopInfo loop(*citr); if (!SageInterface::isAncestor(body, loop.loop_)) { continue; } int depth = loop.inner_loop_nest_depth_; StaticFeatureExtraction *extract = new StaticFeatureExtraction(loop.loop_, depth); extract->startExtractingFeatures(); extract->writeToDataBase(db); delete extract; } return 0; } string ref_out; //distributeInnerLoops(root, ref_out); //return 0; string file_path(file.get_sourceFileNameWithPath()); string bin_path(file_path + ".out"); if (verify_output) { printBanner("Generate Reference Output"); compile(file_path, bin_path); ref_out = exec_cmd(bin_path); cout << "Reference Output:" << endl << ref_out << endl; } printBanner("Calculate Original Dependence Vectors"); DependenceGraph orig_dep_graph;
orig_dep_graph.printDepSet(cout);
if (generate_output(ref_out, db->getMutationNumber())) { db->clearOldData(); db->addMutation(Database::trans_vec_t()); } dep_set_t orig_dep_set = orig_dep_graph.getDepSet(); SgBasicBlock *root = isSgBasicBlock(orig_dep_graph.getRoot()); if (root == NULL) { cout << "No proper SCoP is found in the input file." << endl; if (body) { cout << "Only doing unrolling." << endl; Database::trans_vec_t trans_vec; unrollInnerLoops(trans_vec, body, ref_out); } db->finalize(true); return 0; } /*vector<set<int> > disjoint_sets = orig_dep_graph.getDisjointStatmentSets(); for (vector<set<int> >::iterator v_itr = disjoint_sets.begin(); v_itr != disjoint_sets.end(); ++v_itr) { for (set<int>::iterator s_itr = v_itr->begin(); s_itr != v_itr->end(); ++s_itr) { cout << *s_itr << " "; } cout << endl; }*/ //NodePrinter np; //np.traverse(root); //return 0; //return 0; for (loop_info_vec_t::const_iterator citr = outer_loop_vec.begin(); citr != outer_loop_vec.end(); ++citr) { OuterLoopInfo loop(*citr); if (!SageInterface::isAncestor(root, loop.loop_)) { continue; } Database::trans_vec_t trans_vec; int depth = loop.inner_loop_nest_depth_; unrollInnerLoops(trans_vec, root, ref_out); if (!skip_interchange_tiling) { unrollAndJam(trans_vec, root, depth, ref_out, orig_dep_graph); } distributeInnerLoops(trans_vec, root, ref_out, orig_dep_graph); // Check if the loop nest is perfect. Skip interchange and tiling if the nest is not perfect. 
bool perfect = true; vector<SgForStatement *> loop_nest = SageInterface::querySubTree<SgForStatement>(loop.loop_, V_SgForStatement); for (vector<SgForStatement *>::iterator f_itr = loop_nest.begin(); f_itr != loop_nest.end(); ++f_itr) { SgForStatement* cur_loop = *f_itr; SgBasicBlock *cur_body = isSgBasicBlock(SageInterface::getLoopBody(cur_loop)); if (cur_body->get_numberOfTraversalSuccessors() != 1) { if (SageInterface::querySubTree<SgForStatement>(cur_body, V_SgForStatement).size() == 0) { continue; } perfect = false; break; } } if (!perfect) { cout << "Loop nest is imperfect. Skip Interchange and Tiling." << endl; continue; } if (!skip_interchange_tiling) { tileLoops(trans_vec, root, loop.loop_, depth, ref_out, orig_dep_graph); } // If depth is 1, skip interchange. if (depth == 1 || skip_interchange_tiling) { continue; } int num_order = 1; for (int i = 2; i <= depth; ++i) { num_order *= i; } for (int i = 1; i < num_order; ++i) { //! A helper function to return a permutation order for n elements based on a lexicographical order number. //! See also, Combinatorics::permute(), which is faster but does not use strict lexicographic ordering. size_t k = i; vector<size_t> s(depth); // initialize the permutation vector for (size_t j = 0; j < depth; ++j) { s[j]=j; } //compute (n- 1)! size_t factorial = 1; for (int j = 2; j <= depth - 1; ++j) { factorial *= j; } // Algorithm: //check each element of the array, excluding the right most one. //the goal is to find the right element for each s[j] from 0 to n-2 // method: each position is associated a factorial number // s[0] -> (n-1)! // s[1] -> (n-2)! ... // the input number k is divided by the factorial at each position (6, 3, 2, 1 for size =4) // so only big enough k can have non-zero value after division // 0 value means no change to the position for the current iteration // The non-zero value is further modular by the number of the right hand elements of the current element. 
// (mode on 4, 3, 2 to get offset 1-2-3, 1-2, 1 from the current position 0, 1, 2) // choose one of them to be moved to the current position, // shift elements between the current and the moved element to the right direction for one position for (size_t j = 0; j < depth - 1; ++j) { //calculates the next cell from the cells left //(the cells in the range [j, s.length - 1]) int tempj = (k / factorial) % (depth - j); //Temporarily saves the value of the cell needed // to add to the permutation this time int temps = s[j + tempj]; //shift all elements to "cover" the "missing" cell //shift them to the right for (size_t l = j + tempj; l > j; --l) { s[l] = s[l - 1]; //shift the chain right } // put the chosen cell in the correct spot s[j] = temps; // updates the factorial factorial = factorial / (depth - (j + 1)); } bool legal = orig_dep_graph.isPermutationLegal(s); if (!legal) { cout << "Permutation [ "; for (int i = 0; i < s.size(); ++i) { cout << s[i] << " "; } cout << "] is illegal" << endl; continue; } SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop.loop_)); SageInterface::replaceStatement(loop.loop_, new_loop); if (SageInterface::loopInterchange(new_loop, depth, i)) { cout << "Interchanged a loop nest of depth " << depth << " with permutation [ "; for (int j = 0; j < s.size(); ++j) { cout << s[j] << " "; } cout << "]" << endl; if (generate_output(ref_out, db->getMutationNumber())) { stringstream ss; ss << "depth=" << depth << ",order=" << i; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("interchange", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); DependenceGraph permu_dep_graph; unrollInnerLoops(tmp_vec, root, ref_out); unrollAndJam(tmp_vec, root, depth, ref_out, permu_dep_graph); distributeInnerLoops(tmp_vec, root, ref_out, permu_dep_graph); tileLoops(tmp_vec, root, new_loop, depth, ref_out, permu_dep_graph); } } SageInterface::replaceStatement(new_loop, loop.loop_); 
SageInterface::deepDelete(new_loop); } } db->finalize(true); delete project; return 0; } #include <iostream> #include <sstream> #include <fstream> #include <vector> #include <map> #include <algorithm> #include <functional> #include <numeric> #include <cstdio> #include <string> #include "rose.h" #include <AstInterface_ROSE.h> #include <LoopTransformInterface.h> #include <transformation.hh> #include <utils.hh> #include <analysis.hh> #include <database.hh> using namespace std; using namespace restructurer; void restructurer::unrollInnerLoops(const Database::trans_vec_t &trans_vec, SgStatement *base, const string &ref_out, bool if_distribute) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgStatement *old_copy = NULL; SgStatement *new_copy = base; for (int factor = 2; factor <= 8; factor *= 2) { //printBanner("Find Innermost Loops");
old_copy = new_copy;
new_copy = SageInterface::copyStatement(base); SageInterface::replaceStatement(old_copy, new_copy); if (old_copy != base) { SageInterface::deepDelete(old_copy); } InnerLoopIdentifier finder; finder.traverseInputFiles(project); //printBanner("Loop Unrolling"); const loop_vec_t &inner_loop_vec = finder.getInnerLoopVec(); bool if_changed = false; for (loop_vec_t::const_iterator citr = inner_loop_vec.begin(); citr != inner_loop_vec.end(); ++citr) { SgForStatement *loop = *citr; if (!SageInterface::isAncestor(new_copy, loop)) { continue; } SgVariableSymbol *for_ierator; SgExpression * lower_bound; SgExpression * upper_bound; SgExpression * stride; bool success = SageInterface::getForLoopInformations(loop, for_ierator, lower_bound, upper_bound, stride); if (success) { SgType *type = for_ierator->get_type(); if (type->stripType()->isUnsignedType()) { // Not doing unrolling on unsigned iterator cout << "Skip unrolling a loop with unsigned iterator" << endl; continue; } } int trip = getForLoopTripCount(loop); if (factor > 2 && trip < factor && trip != -1) { cout << "Skip unrolling a loop with trip count " << trip << " lower than unroll factor " << factor << endl; continue; } SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop)); SageInterface::replaceStatement(loop, new_loop); SageInterface::deepDelete(loop); if (SageInterface::loopUnrolling(new_loop, factor)) { cout << "Unrolled an inner most loop with factor " << factor << endl; if_changed = true; } else { cout << "Failed to unroll an inner most loop" << endl; } } if (if_changed) { generate_output(ref_out, db->getMutationNumber()); stringstream ss; ss << "factor=" << factor; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("unrolling", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); if (if_distribute) { //distributeInnerLoops(new_copy, ref_out, false); } } } if (new_copy != base) { SageInterface::replaceStatement(new_copy, base); 
SageInterface::deepDelete(new_copy); } } void restructurer::unrollAndJam(const Database::trans_vec_t &trans_vec, SgBasicBlock *base, int nest_level, const std::string &ref_out, const DependenceGraph &orig_dep_graph) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); for (int i = 0; i < nest_level - 1; ++i) { for (int factor = 2; factor <= 4; factor *= 2) { if (orig_dep_graph.isUnrollAndJamLegal(i, factor)) { SgBasicBlock *new_copy = isSgBasicBlock(SageInterface::copyStatement(base)); SageInterface::replaceStatement(base, new_copy); SgStatementPtrList stmts = new_copy->get_statements(); SgForStatement *loop = NULL; for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { loop = isSgForStatement(*itr); break; } } ROSE_ASSERT(loop != NULL); SgForStatement *target_loop = loop; SgBasicBlock *enclosing_body = new_copy; for (int j = 0; j < i; ++j) { enclosing_body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(enclosing_body != NULL); SgStatementPtrList stmts = enclosing_body->get_statements(); for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { target_loop = isSgForStatement(*itr); break; } } } size_t enclosing_size = enclosing_body->get_numberOfTraversalSuccessors(); SgBasicBlock *target_body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(target_body != NULL); if (target_body->get_numberOfTraversalSuccessors() == 1) { int trip = getForLoopTripCount(target_loop); if (factor == 2 || trip >= factor || trip == -1) { if (SageInterface::loopUnrolling(target_loop, factor)) { SgStatementPtrList stmts = target_body->get_statements(); SgForStatement *inner_loop = isSgForStatement(stmts[0]); ROSE_ASSERT(inner_loop != NULL); SgBasicBlock *inner_scope = isSgBasicBlock(inner_loop->get_loop_body()); ROSE_ASSERT(inner_scope != NULL); for (int j = 1; j < factor; ++j) { SgBasicBlock *scope = 
isSgBasicBlock(stmts[j]); ROSE_ASSERT(scope != NULL); SgForStatement *dup_loop = isSgForStatement(scope->get_traversalSuccessorByIndex(0)); ROSE_ASSERT(dup_loop != NULL); SgBasicBlock *dup_scope = isSgBasicBlock(dup_loop->get_loop_body()); ROSE_ASSERT(dup_scope != NULL); SgStatementPtrList dup_stmts = dup_scope->get_statements(); for (SgStatementPtrList::iterator itr2 = dup_stmts.begin(); itr2 != dup_stmts.end(); ++itr2) { SgStatement *new_stmt = SageInterface::copyStatement(*itr2); SageInterface::appendStatement(new_stmt, inner_scope); } SageInterface::removeStatement(scope); SageInterface::deepDelete(scope); } if (enclosing_body->get_numberOfTraversalSuccessors() == enclosing_size + 2) { // we have a residue loop SgForStatement *residue_loop = isSgForStatement(enclosing_body->get_traversalSuccessorByIndex(enclosing_body->get_childIndex(target_loop) + 1)); ROSE_ASSERT(residue_loop != NULL); residue_loop->set_for_init_stmt(SageBuilder::buildForInitStatement(SageBuilder::buildNullStatement())); // need to fix this because ROSE's unrolling has a bug } cout << "Unroll-jammed at level " << i << " with factor " << factor << endl; generate_output(ref_out, db->getMutationNumber()); stringstream ss; ss << "level=" << i << ",factor=" << factor; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("unrolljam", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); unrollInnerLoops(tmp_vec, new_copy, ref_out); } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } SageInterface::replaceStatement(new_copy, base); SageInterface::deepDelete(new_copy); } else { cout << "Unroll-jam at level " << i << " with factor " << factor << " is illegal" << endl; break; } } } } void restructurer::tileLoops(const Database::trans_vec_t 
&trans_vec, SgStatement *base, SgForStatement *loop, int nest_level, const string &ref_out, const DependenceGraph &orig_dep_graph) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgForStatement *target_loop = loop; vector<int> trips; for (int i = 0; i < nest_level - 1; ++i) { trips.push_back(getForLoopTripCount(target_loop)); SgBasicBlock *body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(body != NULL); SgStatementPtrList stmts = body->get_statements(); for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { target_loop = isSgForStatement(*itr); break; } } } trips.push_back(getForLoopTripCount(target_loop)); SgStatement *parent = isSgStatement(loop->get_parent()); size_t child_idx = parent->get_childIndex(loop); SgStatement *old_copy = NULL; SgStatement *new_copy = parent; for (int i = 1; i < nest_level + 1; ++i) { if (!orig_dep_graph.isTilingLegal(i - 1)) { cout << "Tiling at level " << i << " is illegal" << endl; continue; } int trip = trips[i - 1]; int bound = 32; if (trip < 32 && trip != -1) { bound = trip; } for (int tile_sz = 8; tile_sz <= bound; tile_sz *= 2) {
old_copy = new_copy;
new_copy = SageInterface::copyStatement(parent); SgForStatement *new_loop = isSgForStatement(new_copy->get_traversalSuccessorByIndex(child_idx)); //printBanner("Loop Tiling"); SageInterface::replaceStatement(old_copy, new_copy); if (old_copy != parent) { SageInterface::deepDelete(old_copy); } if (SageInterface::loopTiling(new_loop, i, tile_sz)) { cout << "Tiled a loop at level " << i << " with tile size " << tile_sz << endl; if (generate_output(ref_out, db->getMutationNumber())) { stringstream ss; ss << "level=" << i << ",size=" << tile_sz; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("tiling", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); unrollInnerLoops(tmp_vec, new_copy, ref_out); distributeInnerLoops(tmp_vec, new_copy, ref_out, orig_dep_graph); } } else { cout << "Failed to tile a loop" << endl; } } } if (new_copy != parent) { SageInterface::replaceStatement(new_copy, parent); SageInterface::deepDelete(new_copy); } } void restructurer::distributeInnerLoops(const Database::trans_vec_t &trans_vec, SgStatement *base, const std::string &ref_out, const DependenceGraph &orig_dep_graph, bool if_unroll) { static bool success_history = true; if (success_history == false) { return; } Rose_STL_Container<SgNode *> if_list = NodeQuery::querySubTree(base, V_SgIfStmt); if (if_list.size()) { return; } SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgStatement *new_copy = SageInterface::copyStatement(base); SageInterface::replaceStatement(base, new_copy); DependenceGraph dep_graph; InnerLoopIdentifier finder; finder.traverseInputFiles(project); //printBanner("Loop Fission"); const loop_vec_t &inner_loop_vec = finder.getInnerLoopVec(); //orig_dep_graph.printDepSet(cout); bool if_changed = false; for (loop_vec_t::const_iterator citr = inner_loop_vec.begin(); citr != inner_loop_vec.end(); ++citr) { SgForStatement *loop = *citr; if (!SageInterface::isAncestor(new_copy, loop)) { continue; } 
vector<set<int> > disjoint_sets = dep_graph.getDisjointStatmentSets(); /*for (int i = 0; i < disjoint_sets.size(); ++i) { set<int> &s = disjoint_sets[i]; for (set<int>::iterator sitr = s.begin(); sitr != s.end(); ++sitr) { cout << *sitr << " "; } cout << endl; }*/ if (disjoint_sets.size() == 1) { continue; } int cnt = 0; for (vector<set<int> >::iterator v_itr = disjoint_sets.begin(); v_itr != disjoint_sets.end(); ++v_itr) { vector<SgExprStatement *> expr_list; for (set<int>::iterator s_itr = v_itr->begin(); s_itr != v_itr->end(); ++s_itr) { SgExprStatement *expr = dep_graph.getStatementById(*s_itr); if (SageInterface::isAncestor(loop, expr)) { //cout << expr->unparseToString() << endl; expr_list.push_back(isSgExprStatement(SageInterface::copyStatement(expr))); } } if (expr_list.size()) { SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop)); SageInterface::insertStatement(loop, new_loop); SgStatement *old_body = new_loop->get_loop_body(); SgBasicBlock *new_body = SageBuilder::buildBasicBlock(); for (vector<SgExprStatement *>::iterator expr_itr = expr_list.begin(); expr_itr != expr_list.end(); ++expr_itr) { SageInterface::appendStatement(*expr_itr, new_body); } new_loop->set_loop_body(new_body); SageInterface::deepDelete(old_body); cnt++; } } if (cnt > 1) { cout << "Distributed a loop into " << cnt << " loops" << endl; if_changed = true; } SageInterface::removeStatement(loop); SageInterface::deepDelete(loop); } success_history = false; if (if_changed) { if (generate_output(ref_out, db->getMutationNumber())) { success_history = true; stringstream ss; ss << "allow_dep=" << 0; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("distribution", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); if (if_unroll) { unrollInnerLoops(tmp_vec, new_copy, ref_out, false); } } } SageInterface::replaceStatement(new_copy, base); SageInterface::deepDelete(new_copy); }
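The longest part of the interchange code above enumerates loop orders by decoding a lexicographic rank `i` into a permutation. It iterates `i` from 1 to depth! - 1, so it skips the identity order (rank 0). The sketch below is a compact, standalone version of that decoding, which uses the factorial number system (Lehmer code). It is a rewrite for illustration, not the restructurer's code, and the name `kthPermutation` is hypothetical. In Locus the same space is written as `permutation(seq(0,depth))`.

```cpp
#include <cstddef>
#include <vector>

// k-th permutation (0-based, lexicographic order) of {0, 1, ..., n-1},
// decoded from the factorial-number-system digits of k. Requires k < n!.
std::vector<std::size_t> kthPermutation(std::size_t n, std::size_t k) {
  std::vector<std::size_t> remaining(n);
  for (std::size_t i = 0; i < n; ++i) remaining[i] = i;

  std::size_t factorial = 1;  // (n-1)!
  for (std::size_t i = 2; i < n; ++i) factorial *= i;

  std::vector<std::size_t> perm;
  perm.reserve(n);
  for (std::size_t pos = 0; pos < n; ++pos) {
    const std::size_t digit = k / factorial;  // index among unused elements
    k %= factorial;
    perm.push_back(remaining[digit]);
    remaining.erase(remaining.begin() + static_cast<std::ptrdiff_t>(digit));
    if (pos + 1 < n) factorial /= (n - 1 - pos);  // (n-2)!, (n-3)!, ...
  }
  return perm;
}
```

For example, `kthPermutation(3, 3)` returns {1, 2, 0}, the fourth order in 012, 021, 102, 120, 201, 210. When all orders are needed in sequence, `std::next_permutation` from `<algorithm>` walks the same lexicographic order without computing factorials.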


slide-115
SLIDE 115


Optimization of Arbitrary Loop Nests

41

#include <iostream> #include <sstream> #include <fstream> #include <vector> #include <map> #include <algorithm> #include <functional> #include <numeric> #include <cstdio> #include "rose.h" #include <CPPAstInterface.h> #include <ArrayAnnot.h> #include <ArrayRewrite.h> #include <AstInterface_ROSE.h> #include <LoopTransformInterface.h> #include <AnnotCollect.h> #include <OperatorAnnotation.h> #include <candl/candl.h> #include <scoplib/scop.h> #include <polyopt/PolyOpt.hpp> #include <polyopt/ScopExtractor.hpp> #include <polyopt/SageNodeAnnotation.hpp> #include <utils.hh> #include <transformation.hh> #include <dependence.hh> #include <analysis.hh> #include <database.hh> #include <config.hh> #include <staticfeature.hh> #include <boost/program_options.hpp> using namespace std; using namespace restructurer; namespace po = boost::program_options; int main(int argc, char* argv[]) { po::options_description description("restructurer usage"); description.add_options() ("help", "Display this help message") ("benchmark", po::value<string>(), "Specify the benchmark") ("version", po::value<string>(), "Specify the version of the benchmark") ("application", po::value<string>(), "Specify the application in the benchmark") ("file", po::value<string>(), "Specify the file that contains the loop") ("function", po::value<string>(), "Specify the function that contains the loop") ("line", po::value<string>(), "Specify the starting line number of the loop") ("skipinterchangetiling", "Do not perform interchange or tiling") ("nodb", "Do not write to database") ("dependenceonly", "Only output the dependence information of the original loop nest") ("extractstaticfeatures", "Only extract static features of the original loop nest"); string benchmark, version, application, file_name, func_name, line_no; bool dependenceonly = false; bool extractstaticfeatures = false; bool skip_interchange_tiling = false; try { po::variables_map vm; po::store(po::command_line_parser(argc, 
argv).options(description).allow_unregistered().run(), vm); po::notify(vm); if (vm.count("nodb")) { write_to_db = false; } else { if (!(vm.count("benchmark") && vm.count("version") && vm.count("application") && vm.count("file") && vm.count("function") && vm.count("line"))) { throw std::exception(); } benchmark = vm["benchmark"].as<string>(); version = vm["version"].as<string>(); application = vm["application"].as<string>(); file_name = vm["file"].as<string>(); func_name = vm["function"].as<string>(); line_no = vm["line"].as<string>(); } if (vm.count("dependenceonly")) { dependenceonly = true; } if (vm.count("extractstaticfeatures")) { extractstaticfeatures = true; } if (vm.count("skipinterchangetiling")) { skip_interchange_tiling = true; } } catch ( const std::exception& e ) { cerr << "Failed to process arguments " << e.what() << endl; return -1; } SgStringList args = CommandlineProcessing::generateArgListFromArgcArgv(argc, argv); SgProject* project = frontend(args); ROSE_ASSERT(project != NULL); SgFile &file = project->get_file(0); Sg_File_Info *file_info = file.get_file_info(); Database *db = Database::getInstance(); cout << "benchmark: " << benchmark << endl; cout << "version: " << version << endl; cout << "application: " << application << endl; cout << "file name: " << file_name << endl; cout << "function: " << func_name << endl; cout << "line: " << line_no << endl; db->init(benchmark, version, application, file_name, func_name, line_no); SageInterface::changeAllBodiesToBlocks(project); SgBasicBlock *body = NULL; VariantVector vv_func(V_SgFunctionDefinition); Rose_STL_Container<SgNode*> funcion_list = NodeQuery::queryMemoryPool(vv_func); for (Rose_STL_Container<SgNode*>::iterator f_itr = funcion_list.begin(); f_itr != funcion_list.end(); ++f_itr) { SgFunctionDefinition *cur_func = isSgFunctionDefinition(*f_itr); string name = cur_func->get_declaration()->get_name().getString(); SgBasicBlock *func_body = cur_func->get_body(); if (name == "loop" && func_body) { 
body = func_body; } } //project->unparse(); //cout << skip_interchange_tiling << endl; //return 0; if (dependenceonly) { DependenceGraph orig_dep_graph;
orig_dep_graph.printDepSet(cout);
//orig_dep_graph.printStmtDepSet(cout);
ofstream dot("dep.dot", ofstream::out);
orig_dep_graph.outputDot(dot);
dot.close(); return 0; } InheritedAttribute inh_attr; OuterLoopIdentifier first_pass; first_pass.traverseInputFiles(project, inh_attr); const loop_info_vec_t &outer_loop_vec = first_pass.getOuterLoopVec(); if (extractstaticfeatures) { for (loop_info_vec_t::const_iterator citr = outer_loop_vec.begin(); citr != outer_loop_vec.end(); ++citr) { OuterLoopInfo loop(*citr); if (!SageInterface::isAncestor(body, loop.loop_)) { continue; } int depth = loop.inner_loop_nest_depth_; StaticFeatureExtraction *extract = new StaticFeatureExtraction(loop.loop_, depth); extract->startExtractingFeatures(); extract->writeToDataBase(db); delete extract; } return 0; } string ref_out; //distributeInnerLoops(root, ref_out); //return 0; string file_path(file.get_sourceFileNameWithPath()); string bin_path(file_path + ".out"); if (verify_output) { printBanner("Generate Reference Output"); compile(file_path, bin_path); ref_out = exec_cmd(bin_path); cout << "Reference Output:" << endl << ref_out << endl; } printBanner("Calculate Original Dependence Vectors"); DependenceGraph orig_dep_graph;
orig_dep_graph.printDepSet(cout);
if (generate_output(ref_out, db->getMutationNumber())) { db->clearOldData(); db->addMutation(Database::trans_vec_t()); } dep_set_t orig_dep_set = orig_dep_graph.getDepSet(); SgBasicBlock *root = isSgBasicBlock(orig_dep_graph.getRoot()); if (root == NULL) { cout << "No proper SCoP is found in the input file." << endl; if (body) { cout << "Only doing unrolling." << endl; Database::trans_vec_t trans_vec; unrollInnerLoops(trans_vec, body, ref_out); } db->finalize(true); return 0; } /*vector<set<int> > disjoint_sets = orig_dep_graph.getDisjointStatmentSets(); for (vector<set<int> >::iterator v_itr = disjoint_sets.begin(); v_itr != disjoint_sets.end(); ++v_itr) { for (set<int>::iterator s_itr = v_itr->begin(); s_itr != v_itr->end(); ++s_itr) { cout << *s_itr << " "; } cout << endl; }*/ //NodePrinter np; //np.traverse(root); //return 0; //return 0; for (loop_info_vec_t::const_iterator citr = outer_loop_vec.begin(); citr != outer_loop_vec.end(); ++citr) { OuterLoopInfo loop(*citr); if (!SageInterface::isAncestor(root, loop.loop_)) { continue; } Database::trans_vec_t trans_vec; int depth = loop.inner_loop_nest_depth_; unrollInnerLoops(trans_vec, root, ref_out); if (!skip_interchange_tiling) { unrollAndJam(trans_vec, root, depth, ref_out, orig_dep_graph); } distributeInnerLoops(trans_vec, root, ref_out, orig_dep_graph); // Check if the loop nest is perfect. Skip interchange and tiling if the nest is not perfect. 
bool perfect = true; vector<SgForStatement *> loop_nest = SageInterface::querySubTree<SgForStatement>(loop.loop_, V_SgForStatement); for (vector<SgForStatement *>::iterator f_itr = loop_nest.begin(); f_itr != loop_nest.end(); ++f_itr) { SgForStatement* cur_loop = *f_itr; SgBasicBlock *cur_body = isSgBasicBlock(SageInterface::getLoopBody(cur_loop)); if (cur_body->get_numberOfTraversalSuccessors() != 1) { if (SageInterface::querySubTree<SgForStatement>(cur_body, V_SgForStatement).size() == 0) { continue; } perfect = false; break; } } if (!perfect) { cout << "Loop nest is imperfect. Skip Interchange and Tiling." << endl; continue; } if (!skip_interchange_tiling) { tileLoops(trans_vec, root, loop.loop_, depth, ref_out, orig_dep_graph); } // If depth is 1, skip interchange. if (depth == 1 || skip_interchange_tiling) { continue; } int num_order = 1; for (int i = 2; i <= depth; ++i) { num_order *= i; } for (int i = 1; i < num_order; ++i) { //! A helper function to return a permutation order for n elements based on a lexicographical order number. //! See also, Combinatorics::permute(), which is faster but does not use strict lexicographic ordering. size_t k = i; vector<size_t> s(depth); // initialize the permutation vector for (size_t j = 0; j < depth; ++j) { s[j]=j; } //compute (n- 1)! size_t factorial = 1; for (int j = 2; j <= depth - 1; ++j) { factorial *= j; } // Algorithm: //check each element of the array, excluding the right most one. //the goal is to find the right element for each s[j] from 0 to n-2 // method: each position is associated a factorial number // s[0] -> (n-1)! // s[1] -> (n-2)! ... // the input number k is divided by the factorial at each position (6, 3, 2, 1 for size =4) // so only big enough k can have non-zero value after division // 0 value means no change to the position for the current iteration // The non-zero value is further modular by the number of the right hand elements of the current element. 
// (mode on 4, 3, 2 to get offset 1-2-3, 1-2, 1 from the current position 0, 1, 2) // choose one of them to be moved to the current position, // shift elements between the current and the moved element to the right direction for one position for (size_t j = 0; j < depth - 1; ++j) { //calculates the next cell from the cells left //(the cells in the range [j, s.length - 1]) int tempj = (k / factorial) % (depth - j); //Temporarily saves the value of the cell needed // to add to the permutation this time int temps = s[j + tempj]; //shift all elements to "cover" the "missing" cell //shift them to the right for (size_t l = j + tempj; l > j; --l) { s[l] = s[l - 1]; //shift the chain right } // put the chosen cell in the correct spot s[j] = temps; // updates the factorial factorial = factorial / (depth - (j + 1)); } bool legal = orig_dep_graph.isPermutationLegal(s); if (!legal) { cout << "Permutation [ "; for (int i = 0; i < s.size(); ++i) { cout << s[i] << " "; } cout << "] is illegal" << endl; continue; } SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop.loop_)); SageInterface::replaceStatement(loop.loop_, new_loop); if (SageInterface::loopInterchange(new_loop, depth, i)) { cout << "Interchanged a loop nest of depth " << depth << " with permutation [ "; for (int j = 0; j < s.size(); ++j) { cout << s[j] << " "; } cout << "]" << endl; if (generate_output(ref_out, db->getMutationNumber())) { stringstream ss; ss << "depth=" << depth << ",order=" << i; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("interchange", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); DependenceGraph permu_dep_graph; unrollInnerLoops(tmp_vec, root, ref_out); unrollAndJam(tmp_vec, root, depth, ref_out, permu_dep_graph); distributeInnerLoops(tmp_vec, root, ref_out, permu_dep_graph); tileLoops(tmp_vec, root, new_loop, depth, ref_out, permu_dep_graph); } } SageInterface::replaceStatement(new_loop, loop.loop_); 
SageInterface::deepDelete(new_loop); } } db->finalize(true); delete project; return 0; } #include <iostream> #include <sstream> #include <fstream> #include <vector> #include <map> #include <algorithm> #include <functional> #include <numeric> #include <cstdio> #include <string> #include "rose.h" #include <AstInterface_ROSE.h> #include <LoopTransformInterface.h> #include <transformation.hh> #include <utils.hh> #include <analysis.hh> #include <database.hh> using namespace std; using namespace restructurer; void restructurer::unrollInnerLoops(const Database::trans_vec_t &trans_vec, SgStatement *base, const string &ref_out, bool if_distribute) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgStatement *old_copy = NULL; SgStatement *new_copy = base; for (int factor = 2; factor <= 8; factor *= 2) { //printBanner("Find Innermost Loops");
old_copy = new_copy;
new_copy = SageInterface::copyStatement(base); SageInterface::replaceStatement(old_copy, new_copy); if (old_copy != base) { SageInterface::deepDelete(old_copy); } InnerLoopIdentifier finder; finder.traverseInputFiles(project); //printBanner("Loop Unrolling"); const loop_vec_t &inner_loop_vec = finder.getInnerLoopVec(); bool if_changed = false; for (loop_vec_t::const_iterator citr = inner_loop_vec.begin(); citr != inner_loop_vec.end(); ++citr) { SgForStatement *loop = *citr; if (!SageInterface::isAncestor(new_copy, loop)) { continue; } SgVariableSymbol *for_ierator; SgExpression * lower_bound; SgExpression * upper_bound; SgExpression * stride; bool success = SageInterface::getForLoopInformations(loop, for_ierator, lower_bound, upper_bound, stride); if (success) { SgType *type = for_ierator->get_type(); if (type->stripType()->isUnsignedType()) { // Not doing unrolling on unsigned iterator cout << "Skip unrolling a loop with unsigned iterator" << endl; continue; } } int trip = getForLoopTripCount(loop); if (factor > 2 && trip < factor && trip != -1) { cout << "Skip unrolling a loop with trip count " << trip << " lower than unroll factor " << factor << endl; continue; } SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop)); SageInterface::replaceStatement(loop, new_loop); SageInterface::deepDelete(loop); if (SageInterface::loopUnrolling(new_loop, factor)) { cout << "Unrolled an inner most loop with factor " << factor << endl; if_changed = true; } else { cout << "Failed to unroll an inner most loop" << endl; } } if (if_changed) { generate_output(ref_out, db->getMutationNumber()); stringstream ss; ss << "factor=" << factor; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("unrolling", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); if (if_distribute) { //distributeInnerLoops(new_copy, ref_out, false); } } } if (new_copy != base) { SageInterface::replaceStatement(new_copy, base); 
SageInterface::deepDelete(new_copy); } } void restructurer::unrollAndJam(const Database::trans_vec_t &trans_vec, SgBasicBlock *base, int nest_level, const std::string &ref_out, const DependenceGraph &orig_dep_graph) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); for (int i = 0; i < nest_level - 1; ++i) { for (int factor = 2; factor <= 4; factor *= 2) { if (orig_dep_graph.isUnrollAndJamLegal(i, factor)) { SgBasicBlock *new_copy = isSgBasicBlock(SageInterface::copyStatement(base)); SageInterface::replaceStatement(base, new_copy); SgStatementPtrList stmts = new_copy->get_statements(); SgForStatement *loop = NULL; for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { loop = isSgForStatement(*itr); break; } } ROSE_ASSERT(loop != NULL); SgForStatement *target_loop = loop; SgBasicBlock *enclosing_body = new_copy; for (int j = 0; j < i; ++j) { enclosing_body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(enclosing_body != NULL); SgStatementPtrList stmts = enclosing_body->get_statements(); for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { target_loop = isSgForStatement(*itr); break; } } } size_t enclosing_size = enclosing_body->get_numberOfTraversalSuccessors(); SgBasicBlock *target_body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(target_body != NULL); if (target_body->get_numberOfTraversalSuccessors() == 1) { int trip = getForLoopTripCount(target_loop); if (factor == 2 || trip >= factor || trip == -1) { if (SageInterface::loopUnrolling(target_loop, factor)) { SgStatementPtrList stmts = target_body->get_statements(); SgForStatement *inner_loop = isSgForStatement(stmts[0]); ROSE_ASSERT(inner_loop != NULL); SgBasicBlock *inner_scope = isSgBasicBlock(inner_loop->get_loop_body()); ROSE_ASSERT(inner_scope != NULL); for (int j = 1; j < factor; ++j) { SgBasicBlock *scope = 
isSgBasicBlock(stmts[j]); ROSE_ASSERT(scope != NULL); SgForStatement *dup_loop = isSgForStatement(scope->get_traversalSuccessorByIndex(0)); ROSE_ASSERT(dup_loop != NULL); SgBasicBlock *dup_scope = isSgBasicBlock(dup_loop->get_loop_body()); ROSE_ASSERT(dup_scope != NULL); SgStatementPtrList dup_stmts = dup_scope->get_statements(); for (SgStatementPtrList::iterator itr2 = dup_stmts.begin(); itr2 != dup_stmts.end(); ++itr2) { SgStatement *new_stmt = SageInterface::copyStatement(*itr2); SageInterface::appendStatement(new_stmt, inner_scope); } SageInterface::removeStatement(scope); SageInterface::deepDelete(scope); } if (enclosing_body->get_numberOfTraversalSuccessors() == enclosing_size + 2) { // we have a residue loop SgForStatement *residue_loop = isSgForStatement(enclosing_body->get_traversalSuccessorByIndex(enclosing_body->get_childIndex(target_loop) + 1)); ROSE_ASSERT(residue_loop != NULL); residue_loop->set_for_init_stmt(SageBuilder::buildForInitStatement(SageBuilder::buildNullStatement())); // need to fix this because ROSE's unrolling has a bug } cout << "Unroll-jammed at level " << i << " with factor " << factor << endl; generate_output(ref_out, db->getMutationNumber()); stringstream ss; ss << "level=" << i << ",factor=" << factor; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("unrolljam", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); unrollInnerLoops(tmp_vec, new_copy, ref_out); } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } SageInterface::replaceStatement(new_copy, base); SageInterface::deepDelete(new_copy); } else { cout << "Unroll-jam at level " << i << " with factor " << factor << " is illegal" << endl; break; } } } } void restructurer::tileLoops(const Database::trans_vec_t 
&trans_vec, SgStatement *base, SgForStatement *loop, int nest_level, const string &ref_out, const DependenceGraph &orig_dep_graph) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgForStatement *target_loop = loop; vector<int> trips; for (int i = 0; i < nest_level - 1; ++i) { trips.push_back(getForLoopTripCount(target_loop)); SgBasicBlock *body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(body != NULL); SgStatementPtrList stmts = body->get_statements(); for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { target_loop = isSgForStatement(*itr); break; } } } trips.push_back(getForLoopTripCount(target_loop)); SgStatement *parent = isSgStatement(loop->get_parent()); size_t child_idx = parent->get_childIndex(loop); SgStatement *old_copy = NULL; SgStatement *new_copy = parent; for (int i = 1; i < nest_level + 1; ++i) { if (!orig_dep_graph.isTilingLegal(i - 1)) { cout << "Tiling at level " << i << " is illegal" << endl; continue; } int trip = trips[i - 1]; int bound = 32; if (trip < 32 && trip != -1) { bound = trip; } for (int tile_sz = 8; tile_sz <= bound; tile_sz *= 2) {
old_copy = new_copy;
new_copy = SageInterface::copyStatement(parent); SgForStatement *new_loop = isSgForStatement(new_copy->get_traversalSuccessorByIndex(child_idx)); //printBanner("Loop Tiling"); SageInterface::replaceStatement(old_copy, new_copy); if (old_copy != parent) { SageInterface::deepDelete(old_copy); } if (SageInterface::loopTiling(new_loop, i, tile_sz)) { cout << "Tiled a loop at level " << i << " with tile size " << tile_sz << endl; if (generate_output(ref_out, db->getMutationNumber())) { stringstream ss; ss << "level=" << i << ",size=" << tile_sz; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("tiling", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); unrollInnerLoops(tmp_vec, new_copy, ref_out); distributeInnerLoops(tmp_vec, new_copy, ref_out, orig_dep_graph); } } else { cout << "Failed to tile a loop" << endl; } } } if (new_copy != parent) { SageInterface::replaceStatement(new_copy, parent); SageInterface::deepDelete(new_copy); } } void restructurer::distributeInnerLoops(const Database::trans_vec_t &trans_vec, SgStatement *base, const std::string &ref_out, const DependenceGraph &orig_dep_graph, bool if_unroll) { static bool success_history = true; if (success_history == false) { return; } Rose_STL_Container<SgNode *> if_list = NodeQuery::querySubTree(base, V_SgIfStmt); if (if_list.size()) { return; } SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgStatement *new_copy = SageInterface::copyStatement(base); SageInterface::replaceStatement(base, new_copy); DependenceGraph dep_graph; InnerLoopIdentifier finder; finder.traverseInputFiles(project); //printBanner("Loop Fission"); const loop_vec_t &inner_loop_vec = finder.getInnerLoopVec(); //orig_dep_graph.printDepSet(cout); bool if_changed = false; for (loop_vec_t::const_iterator citr = inner_loop_vec.begin(); citr != inner_loop_vec.end(); ++citr) { SgForStatement *loop = *citr; if (!SageInterface::isAncestor(new_copy, loop)) { continue; } 
vector<set<int> > disjoint_sets = dep_graph.getDisjointStatmentSets(); /*for (int i = 0; i < disjoint_sets.size(); ++i) { set<int> &s = disjoint_sets[i]; for (set<int>::iterator sitr = s.begin(); sitr != s.end(); ++sitr) { cout << *sitr << " "; } cout << endl; }*/ if (disjoint_sets.size() == 1) { continue; } int cnt = 0; for (vector<set<int> >::iterator v_itr = disjoint_sets.begin(); v_itr != disjoint_sets.end(); ++v_itr) { vector<SgExprStatement *> expr_list; for (set<int>::iterator s_itr = v_itr->begin(); s_itr != v_itr->end(); ++s_itr) { SgExprStatement *expr = dep_graph.getStatementById(*s_itr); if (SageInterface::isAncestor(loop, expr)) { //cout << expr->unparseToString() << endl; expr_list.push_back(isSgExprStatement(SageInterface::copyStatement(expr))); } } if (expr_list.size()) { SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop)); SageInterface::insertStatement(loop, new_loop); SgStatement *old_body = new_loop->get_loop_body(); SgBasicBlock *new_body = SageBuilder::buildBasicBlock(); for (vector<SgExprStatement *>::iterator expr_itr = expr_list.begin(); expr_itr != expr_list.end(); ++expr_itr) { SageInterface::appendStatement(*expr_itr, new_body); } new_loop->set_loop_body(new_body); SageInterface::deepDelete(old_body); cnt++; } } if (cnt > 1) { cout << "Distributed a loop into " << cnt << " loops" << endl; if_changed = true; } SageInterface::removeStatement(loop); SageInterface::deepDelete(loop); } success_history = false; if (if_changed) { if (generate_output(ref_out, db->getMutationNumber())) { success_history = true; stringstream ss; ss << "allow_dep=" << 0; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("distribution", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); if (if_unroll) { unrollInnerLoops(tmp_vec, new_copy, ref_out, false); } } } SageInterface::replaceStatement(new_copy, base); SageInterface::deepDelete(new_copy); }

37 lines of code
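Before it tries interchange or tiling, `main` checks that the nest is perfect. Every loop in the subtree whose body holds more than one statement must contain no further loop. The sketch below expresses that rule on a toy IR. The `Stmt` struct and the function names are hypothetical stand-ins for ROSE's `SgForStatement`/`SgBasicBlock` nodes, not part of the restructurer:

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR standing in for ROSE nodes: a statement is either a
// plain statement or a for-loop whose body is a list of statements.
struct Stmt {
  bool is_loop = false;
  std::vector<Stmt> body;
};

// True if any statement in the list is, or contains, a loop.
bool containsLoop(const std::vector<Stmt>& stmts) {
  for (const Stmt& s : stmts) {
    if (s.is_loop || containsLoop(s.body)) return true;
  }
  return false;
}

// Mirrors the rule in main(): visit every loop in the subtree. A loop whose
// body has more than one statement is allowed only if that body contains no
// nested loop, i.e. it is an innermost loop.
bool isPerfectNest(const Stmt& s) {
  if (s.is_loop && s.body.size() != 1 && containsLoop(s.body)) return false;
  for (const Stmt& child : s.body) {
    if (!isPerfectNest(child)) return false;
  }
  return true;
}
```

For example, an `i` loop whose only statement is a `j` loop with two assignments is perfect. Placing an extra statement beside the `j` loop makes the nest imperfect, and the restructurer then skips interchange and tiling for it.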
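The interchange search in `main` walks the ranks 1 to depth! - 1 and decodes each rank into a loop order using the factorial number system. At position j, the digit (k / (n-1-j)!) % (n-j) picks which remaining loop moves forward. The sketch below isolates that decoding with the same arithmetic. The names `permutationFromRank` and `numOrders` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the k-th permutation (0-based, lexicographic order) of {0, ..., n-1}.
// At each position j the chosen element is moved to the front of the
// remaining range, and the elements it skipped shift one slot to the right.
std::vector<std::size_t> permutationFromRank(std::size_t k, std::size_t n) {
  assert(n >= 1);
  std::vector<std::size_t> s(n);
  for (std::size_t j = 0; j < n; ++j) s[j] = j;

  std::size_t factorial = 1;  // (n-1)!
  for (std::size_t j = 2; j < n; ++j) factorial *= j;

  for (std::size_t j = 0; j + 1 < n; ++j) {
    std::size_t offset = (k / factorial) % (n - j);
    std::size_t chosen = s[j + offset];
    for (std::size_t l = j + offset; l > j; --l) s[l] = s[l - 1];
    s[j] = chosen;
    factorial /= (n - (j + 1));  // (n-2)!, (n-3)!, ..., 1
  }
  return s;
}

// Number of candidate loop orders for a nest of the given depth (depth!).
std::size_t numOrders(std::size_t depth) {
  std::size_t f = 1;
  for (std::size_t i = 2; i <= depth; ++i) f *= i;
  return f;
}
```

For a depth-3 nest, ranks 0 through 5 yield [0 1 2], [0 2 1], [1 0 2], [1 2 0], [2 0 1], [2 1 0]. The restructurer skips rank 0 (the original order), discards orders that `isPermutationLegal` rejects, and passes the rank itself to `SageInterface::loopInterchange`.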
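The restructurer tries a small, fixed set of parameters for each transformation. `unrollInnerLoops` uses unroll factors 2, 4 and 8. `tileLoops` uses tile sizes from 8 up to 32, or up to a known trip count below 32. The sketch below restates only that enumeration. Based on the `trip != -1` checks, it assumes that `getForLoopTripCount` returns -1 when the trip count is unknown. The function names are hypothetical:

```cpp
#include <cassert>
#include <vector>

// Unroll factors tried for an innermost loop: 2, 4, 8. Factors above 2 are
// skipped when the trip count is known and smaller than the factor.
// Assumption: trip == -1 means the trip count is unknown.
std::vector<int> unrollFactors(int trip) {
  std::vector<int> out;
  for (int factor = 2; factor <= 8; factor *= 2) {
    if (factor > 2 && trip != -1 && trip < factor) continue;
    out.push_back(factor);
  }
  return out;
}

// Tile sizes tried at one nest level: powers of two from 8, capped at 32,
// or at the trip count when it is known and below 32.
std::vector<int> tileSizes(int trip) {
  int bound = 32;
  if (trip != -1 && trip < 32) bound = trip;
  std::vector<int> out;
  for (int size = 8; size <= bound; size *= 2) out.push_back(size);
  return out;
}
```

Keeping the candidate sets this small bounds the number of variants that must be compiled, run and checked against the reference output. Each variant that passes is then recorded through `db->addMutation`.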
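`unrollAndJam` unrolls an outer loop and copies the bodies of the duplicated inner loops into the first inner loop. It also patches a residue loop for the leftover iterations. The effect is easiest to see on a hand-written case. The matrix-vector loop below is an illustrative assumption, not code from the restructurer. The transformation is legal here because different `i` iterations write different `y[i]`, which is the kind of fact `isUnrollAndJamLegal` has to establish from the dependence graph:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Reference loop nest: y[i] += A[i][j] * x[j].
std::vector<double> matvec(const Matrix& A, const std::vector<double>& x) {
  std::vector<double> y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      y[i] += A[i][j] * x[j];
  return y;
}

// Unroll-and-jam of the outer i loop by a factor of 2: the two copies of the
// inner j loop are jammed into one body, and a residue loop handles the last
// row when the number of rows is odd.
std::vector<double> matvecUnrollJam2(const Matrix& A, const std::vector<double>& x) {
  const std::size_t n = A.size();
  std::vector<double> y(n, 0.0);
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    for (std::size_t j = 0; j < x.size(); ++j) {
      y[i] += A[i][j] * x[j];
      y[i + 1] += A[i + 1][j] * x[j];
    }
  }
  for (; i < n; ++i) {  // residue loop
    for (std::size_t j = 0; j < x.size(); ++j)
      y[i] += A[i][j] * x[j];
  }
  return y;
}
```

Each row is still summed in the same `j` order, so both versions produce bit-identical results. The jammed body reuses each `x[j]` load for two rows.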
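`distributeInnerLoops` asks the dependence graph for "disjoint statement sets" and emits one loop per set. It skips loops with only one set and any region that contains an `if`. The implementation of `getDisjointStatmentSets` is not shown, so the sketch below assumes the sets are the connected components of the dependence graph treated as undirected. Under that assumption, no dependence crosses two groups, and the resulting loops can be emitted in any order. All names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <set>
#include <utility>
#include <vector>

// Union-find over statement ids.
struct DisjointSets {
  std::vector<std::size_t> parent;
  explicit DisjointSets(std::size_t n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }
  std::size_t find(std::size_t x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  }
  void unite(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

// Groups statements so that every dependence edge stays inside one group.
// Groups are returned sorted (lexicographically by their statement ids).
std::vector<std::set<std::size_t>> disjointStatementSets(
    std::size_t num_stmts,
    const std::vector<std::pair<std::size_t, std::size_t>>& deps) {
  DisjointSets ds(num_stmts);
  for (const auto& d : deps) ds.unite(d.first, d.second);
  std::map<std::size_t, std::set<std::size_t>> groups;
  for (std::size_t s = 0; s < num_stmts; ++s) groups[ds.find(s)].insert(s);
  std::vector<std::set<std::size_t>> out;
  for (const auto& g : groups) out.push_back(g.second);
  std::sort(out.begin(), out.end());
  return out;
}
```

With four statements and a single dependence between statements 0 and 2, the loop splits into three loops holding {0, 2}, {1} and {3}. A chain of dependences linking every statement leaves a single set, which matches the case the restructurer skips.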

isSgBasicBlock(stmts[j]); ROSE_ASSERT(scope != NULL); SgForStatement *dup_loop = isSgForStatement(scope->get_traversalSuccessorByIndex(0)); ROSE_ASSERT(dup_loop != NULL); SgBasicBlock *dup_scope = isSgBasicBlock(dup_loop->get_loop_body()); ROSE_ASSERT(dup_scope != NULL); SgStatementPtrList dup_stmts = dup_scope->get_statements(); for (SgStatementPtrList::iterator itr2 = dup_stmts.begin(); itr2 != dup_stmts.end(); ++itr2) { SgStatement *new_stmt = SageInterface::copyStatement(*itr2); SageInterface::appendStatement(new_stmt, inner_scope); } SageInterface::removeStatement(scope); SageInterface::deepDelete(scope); } if (enclosing_body->get_numberOfTraversalSuccessors() == enclosing_size + 2) { // we have a residue loop SgForStatement *residue_loop = isSgForStatement(enclosing_body->get_traversalSuccessorByIndex(enclosing_body->get_childIndex(target_loop) + 1)); ROSE_ASSERT(residue_loop != NULL); residue_loop->set_for_init_stmt(SageBuilder::buildForInitStatement(SageBuilder::buildNullStatement())); // need to fix this because ROSE's unrolling has a bug } cout << "Unroll-jammed at level " << i << " with factor " << factor << endl; generate_output(ref_out, db->getMutationNumber()); stringstream ss; ss << "level=" << i << ",factor=" << factor; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("unrolljam", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); unrollInnerLoops(tmp_vec, new_copy, ref_out); } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } SageInterface::replaceStatement(new_copy, base); SageInterface::deepDelete(new_copy); } else { cout << "Unroll-jam at level " << i << " with factor " << factor << " is illegal" << endl; break; } } } } void restructurer::tileLoops(const Database::trans_vec_t 
&trans_vec, SgStatement *base, SgForStatement *loop, int nest_level, const string &ref_out, const DependenceGraph &orig_dep_graph) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgForStatement *target_loop = loop; vector<int> trips; for (int i = 0; i < nest_level - 1; ++i) { trips.push_back(getForLoopTripCount(target_loop)); SgBasicBlock *body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(body != NULL); SgStatementPtrList stmts = body->get_statements(); for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { target_loop = isSgForStatement(*itr); break; } } } trips.push_back(getForLoopTripCount(target_loop)); SgStatement *parent = isSgStatement(loop->get_parent()); size_t child_idx = parent->get_childIndex(loop); SgStatement *old_copy = NULL; SgStatement *new_copy = parent; for (int i = 1; i < nest_level + 1; ++i) { if (!orig_dep_graph.isTilingLegal(i - 1)) { cout << "Tiling at level " << i << " is illegal" << endl; continue; } int trip = trips[i - 1]; int bound = 32; if (trip < 32 && trip != -1) { bound = trip; } for (int tile_sz = 8; tile_sz <= bound; tile_sz *= 2) {
  • ld_copy = new_copy;
new_copy = SageInterface::copyStatement(parent); SgForStatement *new_loop = isSgForStatement(new_copy->get_traversalSuccessorByIndex(child_idx)); //printBanner("Loop Tiling"); SageInterface::replaceStatement(old_copy, new_copy); if (old_copy != parent) { SageInterface::deepDelete(old_copy); } if (SageInterface::loopTiling(new_loop, i, tile_sz)) { cout << "Tiled a loop at level " << i << " with tile size " << tile_sz << endl; if (generate_output(ref_out, db->getMutationNumber())) { stringstream ss; ss << "level=" << i << ",size=" << tile_sz; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("tiling", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); unrollInnerLoops(tmp_vec, new_copy, ref_out); distributeInnerLoops(tmp_vec, new_copy, ref_out, orig_dep_graph); } } else { cout << "Failed to tile a loop" << endl; } } } if (new_copy != parent) { SageInterface::replaceStatement(new_copy, parent); SageInterface::deepDelete(new_copy); } } void restructurer::distributeInnerLoops(const Database::trans_vec_t &trans_vec, SgStatement *base, const std::string &ref_out, const DependenceGraph &orig_dep_graph, bool if_unroll) { static bool success_history = true; if (success_history == false) { return; } Rose_STL_Container<SgNode *> if_list = NodeQuery::querySubTree(base, V_SgIfStmt); if (if_list.size()) { return; } SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgStatement *new_copy = SageInterface::copyStatement(base); SageInterface::replaceStatement(base, new_copy); DependenceGraph dep_graph; InnerLoopIdentifier finder; finder.traverseInputFiles(project); //printBanner("Loop Fission"); const loop_vec_t &inner_loop_vec = finder.getInnerLoopVec(); //orig_dep_graph.printDepSet(cout); bool if_changed = false; for (loop_vec_t::const_iterator citr = inner_loop_vec.begin(); citr != inner_loop_vec.end(); ++citr) { SgForStatement *loop = *citr; if (!SageInterface::isAncestor(new_copy, loop)) { continue; } 
vector<set<int> > disjoint_sets = dep_graph.getDisjointStatmentSets(); /*for (int i = 0; i < disjoint_sets.size(); ++i) { set<int> &s = disjoint_sets[i]; for (set<int>::iterator sitr = s.begin(); sitr != s.end(); ++sitr) { cout << *sitr << " "; } cout << endl; }*/ if (disjoint_sets.size() == 1) { continue; } int cnt = 0; for (vector<set<int> >::iterator v_itr = disjoint_sets.begin(); v_itr != disjoint_sets.end(); ++v_itr) { vector<SgExprStatement *> expr_list; for (set<int>::iterator s_itr = v_itr->begin(); s_itr != v_itr->end(); ++s_itr) { SgExprStatement *expr = dep_graph.getStatementById(*s_itr); if (SageInterface::isAncestor(loop, expr)) { //cout << expr->unparseToString() << endl; expr_list.push_back(isSgExprStatement(SageInterface::copyStatement(expr))); } } if (expr_list.size()) { SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop)); SageInterface::insertStatement(loop, new_loop); SgStatement *old_body = new_loop->get_loop_body(); SgBasicBlock *new_body = SageBuilder::buildBasicBlock(); for (vector<SgExprStatement *>::iterator expr_itr = expr_list.begin(); expr_itr != expr_list.end(); ++expr_itr) { SageInterface::appendStatement(*expr_itr, new_body); } new_loop->set_loop_body(new_body); SageInterface::deepDelete(old_body); cnt++; } } if (cnt > 1) { cout << "Distributed a loop into " << cnt << " loops" << endl; if_changed = true; } SageInterface::removeStatement(loop); SageInterface::deepDelete(loop); } success_history = false; if (if_changed) { if (generate_output(ref_out, db->getMutationNumber())) { success_history = true; stringstream ss; ss << "allow_dep=" << 0; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("distribution", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); if (if_unroll) { unrollInnerLoops(tmp_vec, new_copy, ref_out, false); } } } SageInterface::replaceStatement(new_copy, base); SageInterface::deepDelete(new_copy); }

1200+ lines of code

CodeReg scop {
    perfect = BuiltIn.IsPerfectLoopNest();
    depth = BuiltIn.LoopNestDepth();
    if (RoseLocus.IsDepAvailable()) {
        if (perfect && depth > 1) {
            permorder = permutation(seq(0,depth));
            RoseLocus.Interchange(order=permorder);
        }
        {
            if (perfect) {
                indexT1 = integer(1..depth);
                T1fac = poweroftwo(2..32);
                RoseLocus.Tiling(loop=indexT1, factor=T1fac);
            }
        } OR {
            if (depth > 1) {
                indexUAJ = integer(1..depth-1);
                UAJfac = poweroftwo(2..4);
                RoseLocus.UnrollAndJam(loop=indexUAJ, factor=UAJfac);
            }
        } OR {
            None; # No tiling, interchange, or unroll and jam.
        }
        innerloops = BuiltIn.ListInnerLoops();
        *RoseLocus.Distribute(loop=innerloops);
    }
    innerloops = BuiltIn.ListInnerLoops();
    RoseLocus.Unroll(loop=innerloops, factor=poweroftwo(2..8));
}

37 lines of code

slide-117
SLIDE 117

Optimization of Arbitrary Loop Nests

43

#include <iostream> #include <sstream> #include <fstream> #include <vector> #include <map> #include <algorithm> #include <functional> #include <numeric> #include <cstdio> #include "rose.h" #include <CPPAstInterface.h> #include <ArrayAnnot.h> #include <ArrayRewrite.h> #include <AstInterface_ROSE.h> #include <LoopTransformInterface.h> #include <AnnotCollect.h> #include <OperatorAnnotation.h> #include <candl/candl.h> #include <scoplib/scop.h> #include <polyopt/PolyOpt.hpp> #include <polyopt/ScopExtractor.hpp> #include <polyopt/SageNodeAnnotation.hpp> #include <utils.hh> #include <transformation.hh> #include <dependence.hh> #include <analysis.hh> #include <database.hh> #include <config.hh> #include <staticfeature.hh> #include <boost/program_options.hpp> using namespace std; using namespace restructurer; namespace po = boost::program_options; int main(int argc, char* argv[]) { po::options_description description("restructurer usage"); description.add_options() ("help", "Display this help message") ("benchmark", po::value<string>(), "Specify the benchmark") ("version", po::value<string>(), "Specify the version of the benchmark") ("application", po::value<string>(), "Specify the application in the benchmark") ("file", po::value<string>(), "Specify the file that contains the loop") ("function", po::value<string>(), "Specify the function that contains the loop") ("line", po::value<string>(), "Specify the starting line number of the loop") ("skipinterchangetiling", "Do not perform interchange or tiling") ("nodb", "Do not write to database") ("dependenceonly", "Only output the dependence information of the original loop nest") ("extractstaticfeatures", "Only extract static features of the original loop nest"); string benchmark, version, application, file_name, func_name, line_no; bool dependenceonly = false; bool extractstaticfeatures = false; bool skip_interchange_tiling = false; try { po::variables_map vm; po::store(po::command_line_parser(argc, 
argv).options(description).allow_unregistered().run(), vm); po::notify(vm); if (vm.count("nodb")) { write_to_db = false; } else { if (!(vm.count("benchmark") && vm.count("version") && vm.count("application") && vm.count("file") && vm.count("function") && vm.count("line"))) { throw std::exception(); } benchmark = vm["benchmark"].as<string>(); version = vm["version"].as<string>(); application = vm["application"].as<string>(); file_name = vm["file"].as<string>(); func_name = vm["function"].as<string>(); line_no = vm["line"].as<string>(); } if (vm.count("dependenceonly")) { dependenceonly = true; } if (vm.count("extractstaticfeatures")) { extractstaticfeatures = true; } if (vm.count("skipinterchangetiling")) { skip_interchange_tiling = true; } } catch ( const std::exception& e ) { cerr << "Failed to process arguments " << e.what() << endl; return -1; } SgStringList args = CommandlineProcessing::generateArgListFromArgcArgv(argc, argv); SgProject* project = frontend(args); ROSE_ASSERT(project != NULL); SgFile &file = project->get_file(0); Sg_File_Info *file_info = file.get_file_info(); Database *db = Database::getInstance(); cout << "benchmark: " << benchmark << endl; cout << "version: " << version << endl; cout << "application: " << application << endl; cout << "file name: " << file_name << endl; cout << "function: " << func_name << endl; cout << "line: " << line_no << endl; db->init(benchmark, version, application, file_name, func_name, line_no); SageInterface::changeAllBodiesToBlocks(project); SgBasicBlock *body = NULL; VariantVector vv_func(V_SgFunctionDefinition); Rose_STL_Container<SgNode*> funcion_list = NodeQuery::queryMemoryPool(vv_func); for (Rose_STL_Container<SgNode*>::iterator f_itr = funcion_list.begin(); f_itr != funcion_list.end(); ++f_itr) { SgFunctionDefinition *cur_func = isSgFunctionDefinition(*f_itr); string name = cur_func->get_declaration()->get_name().getString(); SgBasicBlock *func_body = cur_func->get_body(); if (name == "loop" && func_body) { 
body = func_body; } } //project->unparse(); //cout << skip_interchange_tiling << endl; //return 0; if (dependenceonly) { DependenceGraph orig_dep_graph;
  • rig_dep_graph.printDepSet(cout);
//orig_dep_graph.printStmtDepSet(cout);
  • fstream dot("dep.dot", ofstream::out);
  • rig_dep_graph.outputDot(dot);
dot.close(); return 0; } InheritedAttribute inh_attr; OuterLoopIdentifier first_pass; first_pass.traverseInputFiles(project, inh_attr); const loop_info_vec_t &outer_loop_vec = first_pass.getOuterLoopVec(); if (extractstaticfeatures) { for (loop_info_vec_t::const_iterator citr = outer_loop_vec.begin(); citr != outer_loop_vec.end(); ++citr) { OuterLoopInfo loop(*citr); if (!SageInterface::isAncestor(body, loop.loop_)) { continue; } int depth = loop.inner_loop_nest_depth_; StaticFeatureExtraction *extract = new StaticFeatureExtraction(loop.loop_, depth); extract->startExtractingFeatures(); extract->writeToDataBase(db); delete extract; } return 0; } string ref_out; //distributeInnerLoops(root, ref_out); //return 0; string file_path(file.get_sourceFileNameWithPath()); string bin_path(file_path + ".out"); if (verify_output) { printBanner("Generate Reference Output"); compile(file_path, bin_path); ref_out = exec_cmd(bin_path); cout << "Reference Output:" << endl << ref_out << endl; } printBanner("Calculate Original Dependence Vectors"); DependenceGraph orig_dep_graph;
  • rig_dep_graph.printDepSet(cout);
if (generate_output(ref_out, db->getMutationNumber())) { db->clearOldData(); db->addMutation(Database::trans_vec_t()); } dep_set_t orig_dep_set = orig_dep_graph.getDepSet(); SgBasicBlock *root = isSgBasicBlock(orig_dep_graph.getRoot()); if (root == NULL) { cout << "No proper SCoP is found in the input file." << endl; if (body) { cout << "Only doing unrolling." << endl; Database::trans_vec_t trans_vec; unrollInnerLoops(trans_vec, body, ref_out); } db->finalize(true); return 0; } /*vector<set<int> > disjoint_sets = orig_dep_graph.getDisjointStatmentSets(); for (vector<set<int> >::iterator v_itr = disjoint_sets.begin(); v_itr != disjoint_sets.end(); ++v_itr) { for (set<int>::iterator s_itr = v_itr->begin(); s_itr != v_itr->end(); ++s_itr) { cout << *s_itr << " "; } cout << endl; }*/ //NodePrinter np; //np.traverse(root); //return 0; //return 0; for (loop_info_vec_t::const_iterator citr = outer_loop_vec.begin(); citr != outer_loop_vec.end(); ++citr) { OuterLoopInfo loop(*citr); if (!SageInterface::isAncestor(root, loop.loop_)) { continue; } Database::trans_vec_t trans_vec; int depth = loop.inner_loop_nest_depth_; unrollInnerLoops(trans_vec, root, ref_out); if (!skip_interchange_tiling) { unrollAndJam(trans_vec, root, depth, ref_out, orig_dep_graph); } distributeInnerLoops(trans_vec, root, ref_out, orig_dep_graph); // Check if the loop nest is perfect. Skip interchange and tiling if the nest is not perfect. 
bool perfect = true; vector<SgForStatement *> loop_nest = SageInterface::querySubTree<SgForStatement>(loop.loop_, V_SgForStatement); for (vector<SgForStatement *>::iterator f_itr = loop_nest.begin(); f_itr != loop_nest.end(); ++f_itr) { SgForStatement* cur_loop = *f_itr; SgBasicBlock *cur_body = isSgBasicBlock(SageInterface::getLoopBody(cur_loop)); if (cur_body->get_numberOfTraversalSuccessors() != 1) { if (SageInterface::querySubTree<SgForStatement>(cur_body, V_SgForStatement).size() == 0) { continue; } perfect = false; break; } } if (!perfect) { cout << "Loop nest is imperfect. Skip Interchange and Tiling." << endl; continue; } if (!skip_interchange_tiling) { tileLoops(trans_vec, root, loop.loop_, depth, ref_out, orig_dep_graph); } // If depth is 1, skip interchange. if (depth == 1 || skip_interchange_tiling) { continue; } int num_order = 1; for (int i = 2; i <= depth; ++i) { num_order *= i; } for (int i = 1; i < num_order; ++i) { //! A helper function to return a permutation order for n elements based on a lexicographical order number. //! See also, Combinatorics::permute(), which is faster but does not use strict lexicographic ordering. size_t k = i; vector<size_t> s(depth); // initialize the permutation vector for (size_t j = 0; j < depth; ++j) { s[j]=j; } //compute (n- 1)! size_t factorial = 1; for (int j = 2; j <= depth - 1; ++j) { factorial *= j; } // Algorithm: //check each element of the array, excluding the right most one. //the goal is to find the right element for each s[j] from 0 to n-2 // method: each position is associated a factorial number // s[0] -> (n-1)! // s[1] -> (n-2)! ... // the input number k is divided by the factorial at each position (6, 3, 2, 1 for size =4) // so only big enough k can have non-zero value after division // 0 value means no change to the position for the current iteration // The non-zero value is further modular by the number of the right hand elements of the current element. 
// (mode on 4, 3, 2 to get offset 1-2-3, 1-2, 1 from the current position 0, 1, 2) // choose one of them to be moved to the current position, // shift elements between the current and the moved element to the right direction for one position for (size_t j = 0; j < depth - 1; ++j) { //calculates the next cell from the cells left //(the cells in the range [j, s.length - 1]) int tempj = (k / factorial) % (depth - j); //Temporarily saves the value of the cell needed // to add to the permutation this time int temps = s[j + tempj]; //shift all elements to "cover" the "missing" cell //shift them to the right for (size_t l = j + tempj; l > j; --l) { s[l] = s[l - 1]; //shift the chain right } // put the chosen cell in the correct spot s[j] = temps; // updates the factorial factorial = factorial / (depth - (j + 1)); } bool legal = orig_dep_graph.isPermutationLegal(s); if (!legal) { cout << "Permutation [ "; for (int i = 0; i < s.size(); ++i) { cout << s[i] << " "; } cout << "] is illegal" << endl; continue; } SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop.loop_)); SageInterface::replaceStatement(loop.loop_, new_loop); if (SageInterface::loopInterchange(new_loop, depth, i)) { cout << "Interchanged a loop nest of depth " << depth << " with permutation [ "; for (int j = 0; j < s.size(); ++j) { cout << s[j] << " "; } cout << "]" << endl; if (generate_output(ref_out, db->getMutationNumber())) { stringstream ss; ss << "depth=" << depth << ",order=" << i; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("interchange", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); DependenceGraph permu_dep_graph; unrollInnerLoops(tmp_vec, root, ref_out); unrollAndJam(tmp_vec, root, depth, ref_out, permu_dep_graph); distributeInnerLoops(tmp_vec, root, ref_out, permu_dep_graph); tileLoops(tmp_vec, root, new_loop, depth, ref_out, permu_dep_graph); } } SageInterface::replaceStatement(new_loop, loop.loop_); 
SageInterface::deepDelete(new_loop); } } db->finalize(true); delete project; return 0; } #include <iostream> #include <sstream> #include <fstream> #include <vector> #include <map> #include <algorithm> #include <functional> #include <numeric> #include <cstdio> #include <string> #include "rose.h" #include <AstInterface_ROSE.h> #include <LoopTransformInterface.h> #include <transformation.hh> #include <utils.hh> #include <analysis.hh> #include <database.hh> using namespace std; using namespace restructurer; void restructurer::unrollInnerLoops(const Database::trans_vec_t &trans_vec, SgStatement *base, const string &ref_out, bool if_distribute) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgStatement *old_copy = NULL; SgStatement *new_copy = base; for (int factor = 2; factor <= 8; factor *= 2) { //printBanner("Find Innermost Loops");
  • ld_copy = new_copy;
new_copy = SageInterface::copyStatement(base); SageInterface::replaceStatement(old_copy, new_copy); if (old_copy != base) { SageInterface::deepDelete(old_copy); } InnerLoopIdentifier finder; finder.traverseInputFiles(project); //printBanner("Loop Unrolling"); const loop_vec_t &inner_loop_vec = finder.getInnerLoopVec(); bool if_changed = false; for (loop_vec_t::const_iterator citr = inner_loop_vec.begin(); citr != inner_loop_vec.end(); ++citr) { SgForStatement *loop = *citr; if (!SageInterface::isAncestor(new_copy, loop)) { continue; } SgVariableSymbol *for_ierator; SgExpression * lower_bound; SgExpression * upper_bound; SgExpression * stride; bool success = SageInterface::getForLoopInformations(loop, for_ierator, lower_bound, upper_bound, stride); if (success) { SgType *type = for_ierator->get_type(); if (type->stripType()->isUnsignedType()) { // Not doing unrolling on unsigned iterator cout << "Skip unrolling a loop with unsigned iterator" << endl; continue; } } int trip = getForLoopTripCount(loop); if (factor > 2 && trip < factor && trip != -1) { cout << "Skip unrolling a loop with trip count " << trip << " lower than unroll factor " << factor << endl; continue; } SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop)); SageInterface::replaceStatement(loop, new_loop); SageInterface::deepDelete(loop); if (SageInterface::loopUnrolling(new_loop, factor)) { cout << "Unrolled an inner most loop with factor " << factor << endl; if_changed = true; } else { cout << "Failed to unroll an inner most loop" << endl; } } if (if_changed) { generate_output(ref_out, db->getMutationNumber()); stringstream ss; ss << "factor=" << factor; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("unrolling", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); if (if_distribute) { //distributeInnerLoops(new_copy, ref_out, false); } } } if (new_copy != base) { SageInterface::replaceStatement(new_copy, base); 
SageInterface::deepDelete(new_copy); } } void restructurer::unrollAndJam(const Database::trans_vec_t &trans_vec, SgBasicBlock *base, int nest_level, const std::string &ref_out, const DependenceGraph &orig_dep_graph) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); for (int i = 0; i < nest_level - 1; ++i) { for (int factor = 2; factor <= 4; factor *= 2) { if (orig_dep_graph.isUnrollAndJamLegal(i, factor)) { SgBasicBlock *new_copy = isSgBasicBlock(SageInterface::copyStatement(base)); SageInterface::replaceStatement(base, new_copy); SgStatementPtrList stmts = new_copy->get_statements(); SgForStatement *loop = NULL; for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { loop = isSgForStatement(*itr); break; } } ROSE_ASSERT(loop != NULL); SgForStatement *target_loop = loop; SgBasicBlock *enclosing_body = new_copy; for (int j = 0; j < i; ++j) { enclosing_body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(enclosing_body != NULL); SgStatementPtrList stmts = enclosing_body->get_statements(); for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { target_loop = isSgForStatement(*itr); break; } } } size_t enclosing_size = enclosing_body->get_numberOfTraversalSuccessors(); SgBasicBlock *target_body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(target_body != NULL); if (target_body->get_numberOfTraversalSuccessors() == 1) { int trip = getForLoopTripCount(target_loop); if (factor == 2 || trip >= factor || trip == -1) { if (SageInterface::loopUnrolling(target_loop, factor)) { SgStatementPtrList stmts = target_body->get_statements(); SgForStatement *inner_loop = isSgForStatement(stmts[0]); ROSE_ASSERT(inner_loop != NULL); SgBasicBlock *inner_scope = isSgBasicBlock(inner_loop->get_loop_body()); ROSE_ASSERT(inner_scope != NULL); for (int j = 1; j < factor; ++j) { SgBasicBlock *scope = 
isSgBasicBlock(stmts[j]); ROSE_ASSERT(scope != NULL); SgForStatement *dup_loop = isSgForStatement(scope->get_traversalSuccessorByIndex(0)); ROSE_ASSERT(dup_loop != NULL); SgBasicBlock *dup_scope = isSgBasicBlock(dup_loop->get_loop_body()); ROSE_ASSERT(dup_scope != NULL); SgStatementPtrList dup_stmts = dup_scope->get_statements(); for (SgStatementPtrList::iterator itr2 = dup_stmts.begin(); itr2 != dup_stmts.end(); ++itr2) { SgStatement *new_stmt = SageInterface::copyStatement(*itr2); SageInterface::appendStatement(new_stmt, inner_scope); } SageInterface::removeStatement(scope); SageInterface::deepDelete(scope); } if (enclosing_body->get_numberOfTraversalSuccessors() == enclosing_size + 2) { // we have a residue loop SgForStatement *residue_loop = isSgForStatement(enclosing_body->get_traversalSuccessorByIndex(enclosing_body->get_childIndex(target_loop) + 1)); ROSE_ASSERT(residue_loop != NULL); residue_loop->set_for_init_stmt(SageBuilder::buildForInitStatement(SageBuilder::buildNullStatement())); // need to fix this because ROSE's unrolling has a bug } cout << "Unroll-jammed at level " << i << " with factor " << factor << endl; generate_output(ref_out, db->getMutationNumber()); stringstream ss; ss << "level=" << i << ",factor=" << factor; Database::trans_vec_t tmp_vec = trans_vec; Database::Transformation trans("unrolljam", ss.str()); tmp_vec.push_back(trans); db->addMutation(tmp_vec); unrollInnerLoops(tmp_vec, new_copy, ref_out); } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } } else { cout << "Failed to unroll-jam at level " << i << " with factor " << factor << endl; } SageInterface::replaceStatement(new_copy, base); SageInterface::deepDelete(new_copy); } else { cout << "Unroll-jam at level " << i << " with factor " << factor << " is illegal" << endl; break; } } } } void restructurer::tileLoops(const Database::trans_vec_t 
&trans_vec, SgStatement *base, SgForStatement *loop, int nest_level, const string &ref_out, const DependenceGraph &orig_dep_graph) { SgProject *project = SageInterface::getProject(); Database *db = Database::getInstance(); SgForStatement *target_loop = loop; vector<int> trips; for (int i = 0; i < nest_level - 1; ++i) { trips.push_back(getForLoopTripCount(target_loop)); SgBasicBlock *body = isSgBasicBlock(target_loop->get_loop_body()); ROSE_ASSERT(body != NULL); SgStatementPtrList stmts = body->get_statements(); for (SgStatementPtrList::iterator itr = stmts.begin(); itr != stmts.end(); ++itr) { if (isSgForStatement(*itr)) { target_loop = isSgForStatement(*itr); break; } } } trips.push_back(getForLoopTripCount(target_loop)); SgStatement *parent = isSgStatement(loop->get_parent()); size_t child_idx = parent->get_childIndex(loop); SgStatement *old_copy = NULL; SgStatement *new_copy = parent; for (int i = 1; i < nest_level + 1; ++i) { if (!orig_dep_graph.isTilingLegal(i - 1)) { cout << "Tiling at level " << i << " is illegal" << endl; continue; } int trip = trips[i - 1]; int bound = 32; if (trip < 32 && trip != -1) { bound = trip; } for (int tile_sz = 8; tile_sz <= bound; tile_sz *= 2) {
      old_copy = new_copy;
      new_copy = SageInterface::copyStatement(parent);
      SgForStatement *new_loop = isSgForStatement(new_copy->get_traversalSuccessorByIndex(child_idx));
      //printBanner("Loop Tiling");
      SageInterface::replaceStatement(old_copy, new_copy);
      if (old_copy != parent) {
        SageInterface::deepDelete(old_copy);
      }
      if (SageInterface::loopTiling(new_loop, i, tile_sz)) {
        cout << "Tiled a loop at level " << i << " with tile size " << tile_sz << endl;
        if (generate_output(ref_out, db->getMutationNumber())) {
          stringstream ss;
          ss << "level=" << i << ",size=" << tile_sz;
          Database::trans_vec_t tmp_vec = trans_vec;
          Database::Transformation trans("tiling", ss.str());
          tmp_vec.push_back(trans);
          db->addMutation(tmp_vec);
          unrollInnerLoops(tmp_vec, new_copy, ref_out);
          distributeInnerLoops(tmp_vec, new_copy, ref_out, orig_dep_graph);
        }
      } else {
        cout << "Failed to tile a loop" << endl;
      }
    }
  }
  if (new_copy != parent) {
    SageInterface::replaceStatement(new_copy, parent);
    SageInterface::deepDelete(new_copy);
  }
}

void restructurer::distributeInnerLoops(const Database::trans_vec_t &trans_vec, SgStatement *base,
                                        const std::string &ref_out, const DependenceGraph &orig_dep_graph,
                                        bool if_unroll) {
  static bool success_history = true;
  if (success_history == false) {
    return;
  }
  Rose_STL_Container<SgNode *> if_list = NodeQuery::querySubTree(base, V_SgIfStmt);
  if (if_list.size()) {
    return;
  }
  SgProject *project = SageInterface::getProject();
  Database *db = Database::getInstance();
  SgStatement *new_copy = SageInterface::copyStatement(base);
  SageInterface::replaceStatement(base, new_copy);
  DependenceGraph dep_graph;
  InnerLoopIdentifier finder;
  finder.traverseInputFiles(project);
  //printBanner("Loop Fission");
  const loop_vec_t &inner_loop_vec = finder.getInnerLoopVec();
  //orig_dep_graph.printDepSet(cout);
  bool if_changed = false;
  for (loop_vec_t::const_iterator citr = inner_loop_vec.begin(); citr != inner_loop_vec.end(); ++citr) {
    SgForStatement *loop = *citr;
    if (!SageInterface::isAncestor(new_copy, loop)) {
      continue;
    }

    vector<set<int> > disjoint_sets = dep_graph.getDisjointStatmentSets();
    /*for (int i = 0; i < disjoint_sets.size(); ++i) {
      set<int> &s = disjoint_sets[i];
      for (set<int>::iterator sitr = s.begin(); sitr != s.end(); ++sitr) {
        cout << *sitr << " ";
      }
      cout << endl;
    }*/
    if (disjoint_sets.size() == 1) {
      continue;
    }
    int cnt = 0;
    for (vector<set<int> >::iterator v_itr = disjoint_sets.begin(); v_itr != disjoint_sets.end(); ++v_itr) {
      vector<SgExprStatement *> expr_list;
      for (set<int>::iterator s_itr = v_itr->begin(); s_itr != v_itr->end(); ++s_itr) {
        SgExprStatement *expr = dep_graph.getStatementById(*s_itr);
        if (SageInterface::isAncestor(loop, expr)) {
          //cout << expr->unparseToString() << endl;
          expr_list.push_back(isSgExprStatement(SageInterface::copyStatement(expr)));
        }
      }
      if (expr_list.size()) {
        SgForStatement *new_loop = isSgForStatement(SageInterface::copyStatement(loop));
        SageInterface::insertStatement(loop, new_loop);
        SgStatement *old_body = new_loop->get_loop_body();
        SgBasicBlock *new_body = SageBuilder::buildBasicBlock();
        for (vector<SgExprStatement *>::iterator expr_itr = expr_list.begin(); expr_itr != expr_list.end(); ++expr_itr) {
          SageInterface::appendStatement(*expr_itr, new_body);
        }
        new_loop->set_loop_body(new_body);
        SageInterface::deepDelete(old_body);
        cnt++;
      }
    }
    if (cnt > 1) {
      cout << "Distributed a loop into " << cnt << " loops" << endl;
      if_changed = true;
    }
    SageInterface::removeStatement(loop);
    SageInterface::deepDelete(loop);
  }
  success_history = false;
  if (if_changed) {
    if (generate_output(ref_out, db->getMutationNumber())) {
      success_history = true;
      stringstream ss;
      ss << "allow_dep=" << 0;
      Database::trans_vec_t tmp_vec = trans_vec;
      Database::Transformation trans("distribution", ss.str());
      tmp_vec.push_back(trans);
      db->addMutation(tmp_vec);
      if (if_unroll) {
        unrollInnerLoops(tmp_vec, new_copy, ref_out, false);
      }
    }
  }
  SageInterface::replaceStatement(new_copy, base);
  SageInterface::deepDelete(new_copy);
}

1200+ lines of code
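In the restructurer code above, `distributeInnerLoops` performs loop fission: it asks the dependence graph for disjoint sets of statements (`getDisjointStatmentSets()`) and copies the loop once per set. A minimal, self-contained C++ sketch of that grouping step follows, under stated assumptions: statements are numbered 0..n-1 and dependences are given as plain pairs of statement ids, a simplified stand-in for ROSE's `DependenceGraph` rather than its actual interface. The names `DisjointSets` and `disjointStatementSets` are hypothetical. Union-find places any two statements connected by a dependence, in either direction, into the same set, so each resulting set can go into its own loop without separating dependent statements.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <set>
#include <utility>
#include <vector>

// Minimal union-find (disjoint-set) structure with path halving.
class DisjointSets {
 public:
  explicit DisjointSets(std::size_t n) : parent_(n) {
    // Each element starts as the root of its own set.
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
  }

  std::size_t find(std::size_t x) {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];  // path halving
      x = parent_[x];
    }
    return x;
  }

  void unite(std::size_t a, std::size_t b) {
    const std::size_t ra = find(a);
    const std::size_t rb = find(b);
    if (ra != rb) parent_[ra] = rb;
  }

 private:
  std::vector<std::size_t> parent_;
};

// Hypothetical helper (not the ROSE API): groups statement ids
// 0..num_stmts-1 so that no dependence edge connects two different groups.
// Edge direction is ignored. Groups are returned ordered by smallest id.
std::vector<std::set<int>> disjointStatementSets(
    int num_stmts, const std::vector<std::pair<int, int>>& deps) {
  DisjointSets ds(static_cast<std::size_t>(num_stmts));
  for (const auto& dep : deps) {
    ds.unite(static_cast<std::size_t>(dep.first),
             static_cast<std::size_t>(dep.second));
  }
  std::map<std::size_t, std::set<int>> by_root;  // root id -> member statements
  for (int s = 0; s < num_stmts; ++s) {
    by_root[ds.find(static_cast<std::size_t>(s))].insert(s);
  }
  std::vector<std::set<int>> groups;
  for (const auto& entry : by_root) groups.push_back(entry.second);
  // The sets are disjoint and non-empty, so lexicographic order sorts them
  // by their first (smallest) element.
  std::sort(groups.begin(), groups.end());
  return groups;
}
```

For example, with five statements and dependences (0, 2) and (2, 4), the sketch yields the sets {0, 2, 4}, {1} and {3}, so the loop could be split into three. Ignoring dependence direction is a conservative choice; distribution based on strongly connected components can split a loop further when dependences flow only one way.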

  • Reproduced the results of Gong Zhangxiaowen et al.
  • Much more concise and flexible

37 lines of code
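The tiling half of the restructurer delegates the transformation to ROSE's `SageInterface::loopTiling(new_loop, i, tile_sz)` and records each variant as `level=i,size=tile_sz`. As a hedged illustration of what such a transformation produces, here is a hand-written C++ sketch that tiles both loops of a two-deep nest. The matrix-transpose kernel and the names `transposeNaive` and `transposeTiled` are hypothetical examples, not code from the talk or from ROSE.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Row-major n x n matrix stored in a flat vector.
using Matrix = std::vector<double>;

// Untiled reference: b = transpose(a).
void transposeNaive(const Matrix& a, Matrix& b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      b[j * n + i] = a[i * n + j];
}

// Same computation with both loops tiled by `tile` (which must be > 0).
// The two outer loops step over tiles; the two inner loops walk the
// elements of one tile. std::min handles the partial tiles left over
// when n is not a multiple of tile.
void transposeTiled(const Matrix& a, Matrix& b, std::size_t n, std::size_t tile) {
  for (std::size_t ii = 0; ii < n; ii += tile)
    for (std::size_t jj = 0; jj < n; jj += tile)
      for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
        for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
          b[j * n + i] = a[i * n + j];
}
```

Because every tiled iteration writes exactly the element the untiled loop writes, both functions fill `b` identically for any positive `tile`. A search like the one in the restructurer would vary `tile` (and which loop levels are tiled) and time each variant.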

slide-118
SLIDE 118

Conclusions


  • Locus can represent complex optimization spaces for different code regions
  • Fine-grained optimizations are easy to apply to fine-grained code regions to improve performance

  • Share resulting optimization programs to amortize the search time
  • Keep the baseline version cleaner and simpler for the long term
  • Future work:
    – Use multiple search modules concurrently to speed up the search process
    – Help users design optimization sequences

slide-119
SLIDE 119

Acknowledgments

This project is part of the Center for Exascale Simulation of Plasma-Coupled Combustion (XPACC), xpacc.illinois.edu. This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374 and by the National Science Foundation under Award 1533912. We also gratefully acknowledge Gong Zhangxiaowen and Justin Szaday for their valuable help in setting up the experiments presented for optimizing arbitrary loop nests.


slide-120
SLIDE 120

Locus: A System and a Language for Program Optimization

Thiago Teixeira*, Corinne Ancourt+, David Padua*, William Gropp*

*Department of Computer Science, University of Illinois at Urbana-Champaign, USA

+MINES ParisTech, PSL University, France

CGO - Washington, DC - Feb 2019

tteixei2@illinois.edu

Thank you!