Open64/ORC compilers, Sébastian Pop, Université Louis Pasteur



slide-1
SLIDE 1

Open64/ORC compilers

Sébastian Pop Université Louis Pasteur Strasbourg, Project A3 INRIA FRANCE

Open64/ORC compilers – p.1

slide-8
SLIDE 8

Short history

  • 1994: Ragnarok compiler for MIPS R8000
      • designed for scientific applications
  • August 1994: start of the Mongoose compiler
      • scientific and non-scientific applications
      • fast and stable for day-to-day development
  • 1999: focus on IA-64: SGIpro 1.0 (alias osprey1.0)
  • 2001: Intel and ICT (Chinese Academy of Sciences): ORC (Open Research Compiler)

Open64/ORC compilers – p.2

slide-9
SLIDE 9

Compiler’s Structure

  • 1. FE (Front-ends)
  • 2. WHIRL (Intermediate Representation)
  • 3. IPA (Inter Procedural Analysis)
  • 4. LNO (Loop Nest Optimizer)
  • 5. WOPT (Global Optimizer)
  • 6. CG (Code Generator)
  • 7. ORC (Open Research Compiler)

Compiler’s Structure – p.3

slide-15
SLIDE 15

Front Ends

  • Use GCC’s C/C++ and Cray F90 front ends
  • Each front end has its own specific trees
  • Translation to WHIRL
  • Question: Is this translation valid?
      • Test suites were not GPL-ed, could use GCC test suites (inappropriate)
      • Bug database wasn’t GPL-ed.

Compiler’s Structure – p.4

slide-17
SLIDE 17

WHIRL

Winning Hierarchical Intermediate Representation Language

  • 5 levels: VH, H, M, L, VL
  • Lowering happens when needed
  • Each optimization performed at the right level

Compiler’s Structure – p.5

slide-18
SLIDE 18

WHIRL

Winning Hierarchical Intermediate Representation Language

  • whirl2c and whirl2f dump WHIRL into compilable C and Fortran files.
  • whirl2a dumps WHIRL in ASCII.

Compiler’s Structure – p.5
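
The "lowering happens when needed" idea can be sketched as follows. This is only an illustration, not Open64 code: the names `WhirlLevel`, `PHASE_LEVEL`, `lower_to` and `run_phases` are hypothetical, and the level assigned to CG is an assumption (the slides only state the levels used by LNO and WOPT).

```python
from enum import IntEnum


class WhirlLevel(IntEnum):
    """The five WHIRL levels named on the slide, highest first."""
    VH = 5  # Very High
    H = 4   # High
    M = 3   # Medium
    L = 2   # Low
    VL = 1  # Very Low


# Hypothetical table: the level each phase expects as input.
# LNO and WOPT follow the slides; the CG entry is an assumption.
PHASE_LEVEL = {
    "LNO": WhirlLevel.H,
    "WOPT": WhirlLevel.M,
    "CG": WhirlLevel.VL,
}


def lower_to(current, target, log):
    """Lower one level at a time until `target` is reached."""
    if target > current:
        raise ValueError("WHIRL can only be lowered, not raised")
    while current > target:
        current = WhirlLevel(current - 1)
        log.append(current.name)  # record each lowering step
    return current


def run_phases(phases, start=WhirlLevel.VH):
    """Run phases in order, lowering only when a phase needs it."""
    level, log = start, []
    for phase in phases:
        level = lower_to(level, PHASE_LEVEL[phase], log)
        log.append("run " + phase)
    return log
```

Calling `run_phases(["LNO", "WOPT", "CG"])` returns the interleaving of lowering steps and phases, so each optimization sees the representation at the level it was written for.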

slide-19
SLIDE 19

Inter Procedural Analysis

Suppose that we want to build a project containing 3 files (file1.c, file2.cxx and file3.f) and use the IPA for optimizing it.

Compiler’s Structure – p.6

slide-20
SLIDE 20

Inter Procedural Analysis

The first step invokes the right front-end:

  • file1.c: C front-end
  • file2.cxx: C++ front-end
  • file3.f: F90 front-end

Compiler’s Structure – p.6

slide-21
SLIDE 21

Inter Procedural Analysis

The compiler transforms front-end specific trees into WHIRL trees. This representation is then dumped by the WHIRL dumper into a .o file (file1.o, file2.o, file3.o). These .o files behave like normal relocatable code (i.e. can be put in archives, etc.)

Compiler’s Structure – p.6

slide-22
SLIDE 22

Inter Procedural Analysis

The linker is called as usual in the last step of compilation, on file1.o, file2.o, file3.o, lib1.a and lib2.so.

Archives (.a files) can contain WHIRL trees, as well as normal .o files.

Compiler’s Structure – p.6

slide-23
SLIDE 23

Inter Procedural Analysis

Some files contain WHIRL trees: the compilation is not complete, and the IPA is called.

[Diagram: the front-ends and WHIRL dumper produce file1.o, file2.o and file3.o; the Linker takes them with lib1.a and lib2.so, and Inter Procedural Analysis (IPA) is added as the next stage.]

Compiler’s Structure – p.6

slide-28
SLIDE 28

Inter Procedural Analysis

[Diagram, complete: file1.c, file2.cxx and file3.f go through the C, C++ and F90 front-ends and the WHIRL dumper to give file1.o, file2.o and file3.o. The Linker takes these with lib1.a and lib2.so. The stages Inter Procedural Analysis (IPA), Inter Procedural Optimizations (IPO), Loop Nest Optimizer (LNO), Main Optimizer (WOPT) and Code Generator (CG) then lead to the Executable file.]

Compiler’s Structure – p.6

slide-30
SLIDE 30

Inter Procedural Analysis

  • Idea: gather information over a whole project
  • Solution:
      • save WHIRL trees in .o files
      • build a global tree at link time
      • perform all optimizations
      • generate code

Compiler’s Structure – p.7
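
The link-time decision described above can be sketched as a small model. It is an illustration under assumptions, not the actual Open64 driver: `ObjectFile`, `BACKEND_STAGES` and `link` are hypothetical names, and the stage order is simply the order in which the stages appear in the diagram.

```python
from dataclasses import dataclass


@dataclass
class ObjectFile:
    name: str
    holds_whirl: bool  # True if the file carries WHIRL trees, not machine code


# Stage order as it appears in the slides' diagram (an assumption of this sketch).
BACKEND_STAGES = ["IPA", "IPO", "LNO", "WOPT", "CG"]


def link(objects):
    """Return the list of steps a link would take for `objects`."""
    whirl = [o.name for o in objects if o.holds_whirl]
    steps = []
    if whirl:
        # Compilation is not complete: build one global tree and optimize it.
        steps.append("merge WHIRL from " + ", ".join(whirl))
        steps.extend(BACKEND_STAGES)
    # With or without IPA, the final step is an ordinary link.
    steps.append("link executable")
    return steps
```

For example, `link([ObjectFile("file1.o", True), ObjectFile("lib2.so", False)])` goes through the whole back end, while a link of ordinary objects only returns `["link executable"]`.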

slide-31
SLIDE 31

Loop Nest Optimizer

LNO works on High-level WHIRL. Lowering has removed unstructured control flow (gotos, switch, ...)

Compiler’s Structure – p.8

slide-32
SLIDE 32

Loop Nest Optimizer

Analyses extract information from WHIRL and construct specific Intermediate Representations (IRs):

  • Array Dependence Graph
  • LEGO: for data distributions
  • Array and vector accesses
  • Vector space
  • Systems of equations
  • Polytope

Compiler’s Structure – p.8

slide-33
SLIDE 33

Loop Nest Optimizer

Main optimizers in LNO:

  • Loop unrolling
  • Hoist conditionals
  • Hoist varying lower bounds
  • Dead store elimination for arrays
  • Loop reversal / fission / fusion / tiling
  • Array scalarization
  • Prefetch
  • Inter-iteration Common Subexpression Elimination

Compiler’s Structure – p.8
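
As an illustration of one transformation from the list, here is a minimal sketch of loop tiling on a 2-deep nest. It only models the iteration order; the function names and the square tile shape are assumptions of this example, and a real optimizer would first check with the dependence information that the reordering is legal.

```python
def original_nest(n, m):
    """Iteration order of a plain 2-deep loop nest."""
    return [(i, j) for i in range(n) for j in range(m)]


def tiled_nest(n, m, tile):
    """Same nest after tiling both loops with square tiles of size `tile`."""
    order = []
    for ii in range(0, n, tile):                 # loops over tiles
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):      # loops inside one tile
                for j in range(jj, min(jj + tile, m)):
                    order.append((i, j))
    return order
```

Both functions visit exactly the same set of iterations; tiling only changes the order, so that iterations touching nearby data are executed close together.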

slide-34
SLIDE 34

Global Optimizer

WOPT works on Medium-level WHIRL (arrays lowered into load/store + offset, ...)

Compiler’s Structure – p.9

slide-35
SLIDE 35

Global Optimizer

Main intermediate representations:

  • CFG (Control Flow Graph)
  • SSA (Static Single Assignment)

Main optimizations:

  • SSA-PRE (Partial Redundancy Elimination)
  • DCE (Dead Code Elimination)
  • IVR (Induction Variable Recognition)
  • VNFRE (Value Numbering based Full Redundancy Elimination)
  • Copy propagation

Compiler’s Structure – p.9
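
To give a feeling for value-numbering based redundancy elimination, here is a deliberately simplified sketch. It is classic local value numbering on one straight-line block, not the SSA-based VNFRE of WOPT; the tuple format `(dest, op, arg1, arg2)` and the `"copy"` pseudo-opcode are assumptions of this example, and commutativity is ignored.

```python
def value_number(block):
    """Replace recomputed expressions by copies within one basic block."""
    var_vn = {}    # variable name -> value number it currently holds
    expr_vn = {}   # (op, vn1, vn2) -> value number of that expression
    vn_home = {}   # value number -> a variable known to hold it
    next_vn = 0
    out = []

    def vn_of(name):
        nonlocal next_vn
        if name not in var_vn:          # first use of an input variable
            var_vn[name] = next_vn
            vn_home[next_vn] = name
            next_vn += 1
        return var_vn[name]

    for dest, op, a, b in block:
        key = (op, vn_of(a), vn_of(b))
        vn = expr_vn.get(key)
        if vn is not None and vn in vn_home:
            # Same operator on the same values: reuse the earlier result.
            out.append((dest, "copy", vn_home[vn], None))
        else:
            vn = next_vn
            next_vn += 1
            expr_vn[key] = vn
            out.append((dest, op, a, b))
        # dest is overwritten, so it no longer holds its previous value.
        old = var_vn.get(dest)
        if old is not None and vn_home.get(old) == dest:
            del vn_home[old]
        var_vn[dest] = vn
        vn_home.setdefault(vn, dest)
    return out
```

In the block `t1 = x + y; t2 = x + y; x = t1 + t2; t3 = x + y`, only `t2` becomes a copy of `t1`: after `x` is redefined, `x + y` denotes a new value and must be recomputed.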

slide-36
SLIDE 36

Code Generator

Code Generator works on CGIR:

  • explicit CFG
  • each BB contains a list of instructions
  • each instruction has the form OP_result OP_code OP_opnd

This representation is close to assembler code.

Compiler’s Structure – p.10
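
The shape of such a representation can be sketched with two small classes. This is an illustration only: `Op`, `BasicBlock`, `dump`, and the opcode and register names in the usage note are hypothetical and do not reproduce the actual CGIR data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One instruction: results, opcode, operands."""
    results: list
    opcode: str
    operands: list

    def __str__(self):
        return f"{', '.join(self.results)} = {self.opcode} {', '.join(self.operands)}"


@dataclass
class BasicBlock:
    name: str
    ops: list = field(default_factory=list)     # instructions, in order
    succs: list = field(default_factory=list)   # explicit CFG edges (block names)


def dump(blocks):
    """Print-friendly listing, close to what assembler code looks like."""
    lines = []
    for bb in blocks:
        lines.append(f"{bb.name}: -> {', '.join(bb.succs) or 'exit'}")
        lines.extend("  " + str(op) for op in bb.ops)
    return "\n".join(lines)
```

A block built as `BasicBlock("BB1", [Op(["r3"], "add", ["r1", "r2"])], ["BB2"])` is listed by `dump` as a header line with its successors, followed by one line per instruction.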

slide-37
SLIDE 37

Code Generator

Main optimizers in CG are:

  • EBO: Extended Block Optimizer
  • GRA: Global Register Allocation
  • LRA: Local Register Allocation
  • GCM: Global Code Motion
  • SWP: Software Pipelining
  • CIO: Cross Iteration loop Optimizations
  • FREQ: execution frequencies of BBs and edges

Compiler’s Structure – p.10

slide-38
SLIDE 38

Open Research Compiler

ORC is an extension of the Code Generator. ORC added the following infrastructure:

  • IPFEC Regions: structures the CFG into a tree
  • If-conversion
  • PRDB: Predicate Relation DataBase
  • Microscheduler
  • Local/Global instruction scheduling

Compiler’s Structure – p.11

slide-39
SLIDE 39

Partial Redundancy Elimination

Compiler’s Structure – p.12

slide-40
SLIDE 40

Predicated code

IF_COND BB1 BB2

Compiler’s Structure – p.13

slide-42
SLIDE 42

Predicated code

IF_COND <BB1, p1> <BB2, p2>

If-conversion:

(p1, p2) = eval (IF_COND)
(p1) eval (BB1)
(p2) eval (BB2)

  • IF-conversion = converts control-dependences to data-dependences, replaces "if" constructs with guarded statements
  • Hyperblock = DAG of the CFG linearized into a single block with guarded stmts

Compiler’s Structure – p.13
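
The scheme above can be sketched on a single if/then/else diamond. This is a toy model, not ORC's implementation: statements are modeled as Python functions acting on an environment dictionary, and `assign`, `if_convert`, `run_hyperblock` and `run_branching` are hypothetical helper names. The predicate names follow the slides, with p0 standing for the always-true predicate.

```python
def assign(name, fn):
    """Build a statement `name = fn(env)` as a function of the environment."""
    def op(env):
        env[name] = fn(env)
    return op


def if_convert(cond, then_ops, else_ops):
    """Linearize a diamond into one block of (guard, statement) pairs."""
    def set_predicates(env):
        env["p1"] = bool(cond(env))   # (p1, p2) = eval(IF_COND)
        env["p2"] = not env["p1"]

    block = [("p0", set_predicates)]
    block += [("p1", op) for op in then_ops]   # (p1) eval(BB1)
    block += [("p2", op) for op in else_ops]   # (p2) eval(BB2)
    return block


def run_hyperblock(block, env):
    """Go through every statement in order; a false guard makes it a no-op."""
    env = dict(env, p0=True)
    for guard, op in block:
        if env[guard]:
            op(env)
    return env


def run_branching(cond, then_ops, else_ops, env):
    """Reference semantics: the original code with a real branch."""
    env = dict(env)
    for op in (then_ops if cond(env) else else_ops):
        op(env)
    return env
```

The hyperblock contains no branch: both arms are traversed, and the predicates decide which statements take effect, which is how the control dependence becomes a data dependence on p1 and p2.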

slide-43
SLIDE 43

Predicated code

[Figure: IF, THEN and ELSE blocks of instructions.]

4 instructions can be executed in parallel on IA-64 architectures.

ILP = Instruction Level Parallelism

Compiler’s Structure – p.13

slide-44
SLIDE 44

Predicated code

[Figure: the IF, THEN and ELSE blocks, with every instruction labeled by its predicate: p0, p1 or p2.]

Each instruction is predicated.

Compiler’s Structure – p.13

slide-45
SLIDE 45

Predicated code

[Figure: the predicated instructions (p0, p1, p2) merged into a single block.]

Create a hyperblock

Compiler’s Structure – p.13
