CS502: Compiler Design Intermediate Code Generation Manas Thakur



SLIDE 1

CS502: Compiler Design Intermediate Code Generation Manas Thakur

Fall 2020

SLIDE 2

Manas Thakur CS502: Compiler Design 2

Midway through the course!

[Figure: phases of a compiler]

Front end: Character stream -> Lexical Analyzer -> Token stream -> Syntax Analyzer -> Syntax tree -> Semantic Analyzer -> Syntax tree -> Intermediate Code Generator -> Intermediate representation

Back end: Intermediate representation -> Machine-Independent Code Optimizer -> Intermediate representation -> Code Generator -> Target machine code -> Machine-Dependent Code Optimizer -> Target machine code

Symbol Table (shown alongside the phases)

SLIDE 3

Roles of IR Generator

  • Act as a glue between front-end and back-end

– Or source and machine codes

  • Lower abstraction from source level

– To make life simple

  • Maintain some high-level information

– To keep life interesting

  • Make the dream of m+n components for m languages and n platforms look like a possibility

– Scala to Java Bytecode, for example

  • Enable machine-independent optimization

– Next phase

SLIDE 4

Intermediate Representations (IR)

  • IR design affects compiler speed and capabilities
  • Some important IR properties:

– Ease of generation, manipulation, optimization
– Size of the representation
– Level of abstraction: level of detail in the IR

  • How close is the IR to source code? To the machine?
  • What kinds of operations are represented?
  • Often, different IRs for different jobs:

– High-level IR: close to the source language
– Low-level IR: close to the assembly code
– Some compilers even have mid-level IRs!

SLIDE 5

Kinds of IRs

  • Structural

– Graph oriented
– Heavily used in IDEs, source-to-source translators
– Tend to be large
– Examples: ASTs, DAGs

  • Linear

– Pseudo-code for an abstract machine
– Level of abstraction varies
– Simple, compact data structures
– Examples: 3 address code, Bytecode (Stack machine)

  • Hybrid

– Combination of graphs and linear code
– Examples: Control-flow graphs, Ideal IR (HotSpot C2)

SLIDE 6

Abstract Syntax Tree (AST)

  • Parse tree with some intermediate nodes removed
  • Advantages:

– Easy to evaluate

  • Postfix form: x 2 y * -
  • Useful for interpretation

– Source code can be reconstructed

  • Helpful in program understanding

AST for x – 2 * y:

      -
     / \
    x   *
       / \
      2   y
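The two advantages above (easy evaluation, postfix form) can be sketched in a few lines. This is only an illustration: the class names Leaf and BinOp and the helper functions are made up for this example, and only the operators - and * are handled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    name: str          # a variable name or an integer literal such as "2"


@dataclass(frozen=True)
class BinOp:
    op: str            # operator symbol: "-" or "*" in this sketch
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinOp]


def postfix(node: Node) -> list[str]:
    """Post-order walk: both operands first, then the operator."""
    if isinstance(node, Leaf):
        return [node.name]
    return postfix(node.left) + postfix(node.right) + [node.op]


def evaluate(node: Node, env: dict[str, int]) -> int:
    """Interpret the tree directly; names are looked up in env."""
    if isinstance(node, Leaf):
        return env[node.name] if node.name in env else int(node.name)
    left, right = evaluate(node.left, env), evaluate(node.right, env)
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    raise ValueError(f"unsupported operator {node.op}")


# AST for x - 2 * y
tree = BinOp("-", Leaf("x"), BinOp("*", Leaf("2"), Leaf("y")))
print(" ".join(postfix(tree)))            # x 2 y * -
print(evaluate(tree, {"x": 10, "y": 3}))  # 4
```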

SLIDE 7

Directed Acyclic Graph (DAG)

  • AST with a unique node for each value
  • Advantages:

– Compact (reduces redundancy)
– Won’t have to evaluate the same expression twice

Example: a + a * (b – c) + (b – c) * d

[Figure: the AST for this expression becomes a DAG in which the leaf a and the subexpression b – c each appear only once.]
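One common way to get "a unique node for each value" is to look every node up in a table before creating it. The sketch below assumes that approach; the class DagBuilder and its methods are hypothetical names, not something the slides prescribe.

```python
class DagBuilder:
    """Creates expression nodes, reusing a node when an identical one exists."""

    def __init__(self) -> None:
        self.nodes: list[tuple] = []        # node id -> node key
        self.index: dict[tuple, int] = {}   # node key -> node id

    def _intern(self, key: tuple) -> int:
        # Create the node only if this exact key has not been seen before.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name: str) -> int:
        return self._intern(("leaf", name))

    def op(self, op: str, left: int, right: int) -> int:
        return self._intern((op, left, right))


dag = DagBuilder()
# a + a * (b - c) + (b - c) * d
first = dag.op("-", dag.leaf("b"), dag.leaf("c"))
lhs = dag.op("+", dag.leaf("a"), dag.op("*", dag.leaf("a"), first))
second = dag.op("-", dag.leaf("b"), dag.leaf("c"))   # same node as first
root = dag.op("+", lhs, dag.op("*", second, dag.leaf("d")))
print(first == second, len(dag.nodes))               # True 9
```

The corresponding AST has 13 nodes (7 leaves and 6 operators); the DAG built here has 9.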

SLIDE 8

Three Address Code (3AC or TAC)

  • At most

– Three addresses (names/constants) in the instruction
– One operator on the right hand side of assignment

  • General statement form: x = y op z
  • Longer expressions are simplified by introducing temporaries
  • Advantages:

– Easy to understand
– Names for intermediate values

z = x – 2 * y

becomes

t1 = 2 * y
t2 = x – t1
z = t2

or

t1 = 2 * y
z = x – t1

SLIDE 9

More about 3AC

  • Allows a variety of instructions:

– Assignments

  • x = y op z
  • x = op y
  • x = y
  • x = y[i] and x[i] = y
  • x = y.f and x.f = y

– Branches

  • goto L
  • if x goto L

– Procedure calls

  • param x1; param x2; ..., param xn; call p, n

– Pointer assignments

SLIDE 10

Classwork: Generate 3AC

  • r = a + a * (b – c) + (b – c) * d

t1 = b – c
t2 = t1 * d
t3 = b – c
t4 = a * t3
t5 = t4 + t2
r = a + t5

  • if (x < y) S1 else S2

t1 = x < y
if !t1 goto L1
S1
goto L2
L1: S2
L2:

  • while (x < 10) S1

L1: c = x < 10
t = !c
if t goto L2
S1
goto L1
L2:

SLIDE 11

3AC Representations

Assignment: a = b * -c + d * -e

3AC:

t1 = minus c
t2 = b * t1
t3 = minus e
t4 = d * t3
t5 = t2 + t4
a = t5

  • Quadruples

op     arg1   arg2   result
minus  c             t1
*      b      t1     t2
minus  e             t3
*      d      t3     t4
+      t2     t4     t5
=      t5            a

Instructions can be reordered easily.

  • Triples

     op     arg1   arg2
(0)  minus  c
(1)  *      b      (0)
(2)  minus  e
(3)  *      d      (2)
(4)  +      (1)    (3)
(5)  =      a      (4)

Instructions cannot be reordered easily.

SLIDE 12

3AC Representations (Cont.)

Assignment: a = b * -c + d * -e

3AC:

t1 = minus c
t2 = b * t1
t3 = minus e
t4 = d * t3
t5 = t2 + t4
a = t5

  • Triples

     op     arg1   arg2
(0)  minus  c
(1)  *      b      (0)
(2)  minus  e
(3)  *      d      (2)
(4)  +      (1)    (3)
(5)  =      a      (4)

Instructions cannot be reordered easily.

  • Indirect triples

A separate instruction list holds pointers to the triples:

(0) (1) (2) (3) (4) (5)

Instructions can be reordered easily by permuting this list, for example:

(2) (3) (0) (1) (4) (5)
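The three representations can be compared on the same assignment. The sketch below is an assumption-laden illustration: the tuple layouts and the interpreter run_indirect are invented for this example, and an int inside a triple stands for a reference such as (0).

```python
# a = b * -c + d * -e

quads = [                      # (op, arg1, arg2, result): results have names
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "e", None, "t3"),
    ("*", "d", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]

triples = [                    # (op, arg1, arg2): results are named by position
    ("minus", "c", None),      # (0)
    ("*", "b", 0),             # (1)
    ("minus", "e", None),      # (2)
    ("*", "d", 2),             # (3)
    ("+", 1, 3),               # (4)
    ("=", "a", 4),             # (5)
]


def run_indirect(order, table, env):
    """Execute triples in the order given by a list of pointers into table."""
    env = dict(env)
    value = {}                 # triple position -> computed value

    def val(x):
        return value[x] if isinstance(x, int) else env[x]

    for i in order:
        op, a1, a2 = table[i]
        if op == "minus":
            value[i] = -val(a1)
        elif op == "*":
            value[i] = val(a1) * val(a2)
        elif op == "+":
            value[i] = val(a1) + val(a2)
        elif op == "=":
            env[a1] = val(a2)
    return env


inputs = {"b": 2, "c": 3, "d": 4, "e": 5}
original = run_indirect([0, 1, 2, 3, 4, 5], triples, inputs)
reordered = run_indirect([2, 3, 0, 1, 4, 5], triples, inputs)
print(original["a"], reordered["a"])   # -26 -26
```

Only the pointer list changes between the two runs; the triples themselves, and the positions they refer to, stay put. With plain triples, moving an instruction would change its position and invalidate every reference to it.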

SLIDE 13

2 Address Code

  • Where have you seen them?

– Common in Assembly

  • Example:

z = x – 2 * y

becomes

MOV R1, y
MUL R1, 2
MOV R2, x
SUB R2, R1
MOV z, R2

  • Larger number of instructions compared to 3AC
  • Good for register allocation

SLIDE 14

1 Address Code

  • Stack-based computers
  • Example: Java Virtual Machines!
  • Advantages:

– Simple to generate and execute
– Compact form

  • There is a reason you find Java based systems popular in:

– Embedded systems
– Mobile phones (Android)
– Systems where code is transmitted (Internet)

x – 2 * y

becomes

push x
push 2
push y
multiply
subtract
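"Simple to execute" can be seen from how little an interpreter for this code needs. A minimal sketch, assuming the three instruction names used above and a made-up function run; it is not how a real JVM is structured.

```python
def run(program: list[str], env: dict[str, int]) -> int:
    """Execute stack code; operands live on the stack, not in the instructions."""
    stack: list[int] = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            name = args[0]
            stack.append(env[name] if name in env else int(name))
        elif op == "multiply":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "subtract":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack.pop()


program = ["push x", "push 2", "push y", "multiply", "subtract"]
print(run(program, {"x": 10, "y": 3}))   # 4
```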

SLIDE 15

What next?

  • More IRs (while learning CGO):

– Control-Flow Graph (CFG)
– Static Single Assignment (SSA)

  • Next class: IR generation

– Focus: 3AC. Why?

  • Comfortable and still affordable!
  • Offers a wide understanding of the involved challenges.
  • Assignment 3 would involve 3AC generation!

– But there is time for it.

SLIDE 16

CS502: Compiler Design Intermediate Code Generation (Cont.) Manas Thakur

Fall 2020

SLIDE 17

IR Generation

  • High level language is complex
  • Goal: Lower HLL code to a simpler form (3AC)
  • Constructs that we need to translate:

– Variable declarations
– Expressions
– Array accesses
– Control structures (conditionals, loops)
– Function calls
– Function bodies
– Classes and objects!

  • Approach: Syntax-directed translation from parse tree.

SLIDE 18

Variable declarations

  • Use symbol tables

– Maps from names to values

  • Take care of nested scopes

– What will you do at the entry to a new block?
– What to do at a function call?
– Function entry?
– Function exit?
– Need to push and pop the current environment.

  • Fields of a structure/class?

– We will study in detail when we learn translating objects.
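The push/pop idea for nested scopes can be sketched as a stack of dictionaries. The class SymbolTable and its method names are hypothetical, and the stored "info" is just a type name here; a real table would hold more.

```python
class SymbolTable:
    """A stack of scopes; the innermost scope is the last element."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, str]] = [{}]   # start with the global scope

    def enter_scope(self) -> None:
        self.scopes.append({})       # e.g. at block entry or function entry

    def exit_scope(self) -> None:
        self.scopes.pop()            # e.g. at block exit or function exit

    def declare(self, name: str, info: str) -> None:
        self.scopes[-1][name] = info

    def lookup(self, name: str) -> str:
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(f"undeclared name: {name}")


tab = SymbolTable()
tab.declare("x", "int")
tab.enter_scope()
tab.declare("x", "float")            # shadows the outer x
inner = tab.lookup("x")
tab.exit_scope()
outer = tab.lookup("x")
print(inner, outer)                  # float int
```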

SLIDE 19

Lowering scheme

  • Code template for each AST node

– Captures key semantics of each construct
– Has blanks for the node’s children
– Implemented in a function called gen

  • To fill in the template:

– Call the function gen recursively on children

  • Did anyone say “visitors”?

– Plug code into the blanks

  • How to stitch code together?

– gen stores the results into a temporary
– Emit code that combines the results for the syntactic construct represented by the current node

SLIDE 20

Translating expressions

Say E.addr is a synthesized attribute that denotes the temporary holding the value of E.

Construct: E -> E1 + E2
Translation:
    E.addr = newtemp();
    gen(E.addr '=' E1.addr '+' E2.addr)

In terms of our assignment:

Construct: E -> E1 + E2
visit() method:
    t1 = visit(E1);
    t2 = visit(E2);
    r = newtemp();
    System.out.println(r + " = " + t1 + " + " + t2);
    return r;

In terms of an attribute E.code:

Construct: E -> E1 + E2
Translation:
    E.addr = newtemp();
    E.code = E1.code || E2.code || gen(E.addr '=' E1.addr '+' E2.addr)

Here || denotes concatenation of code sequences.

SLIDE 21

Translating expressions (Cont.)

  • symTab is the symbol table of the current scope.

Construct        Translation
S -> id = E      gen(symTab.get(id.lexeme) '=' E.addr)
E -> -E1         E.addr = newtemp()
                 gen(E.addr '=' '-' E1.addr)
E -> (E1)        E.addr = E1.addr
E -> id          E.addr = symTab.get(id.lexeme)

SLIDE 22

Example

  • 3AC for a = b + -c, using the rules for S -> id = E, E -> E1 + E2, E -> -E1 and E -> id from the previous two slides:

t1 = - c
t2 = b + t1
a = t2
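The rules for S -> id = E, E -> E1 + E2, E -> -E1, E -> (E1) and E -> id can be run as a small recursive translator. This is a sketch under assumptions: the tuple-based AST, the class Translator and the dictionary used as symTab are all invented for illustration.

```python
class Translator:
    def __init__(self, sym_tab: dict[str, str]) -> None:
        self.sym_tab = sym_tab       # lexeme -> address (here simply the name)
        self.code: list[str] = []
        self.temps = 0

    def newtemp(self) -> str:
        self.temps += 1
        return f"t{self.temps}"

    def gen(self, text: str) -> None:
        self.code.append(text)

    def expr(self, e: tuple) -> str:
        """Emit code for E and return E.addr."""
        kind = e[0]
        if kind == "id":                         # E -> id
            return self.sym_tab[e[1]]
        if kind == "paren":                      # E -> (E1)
            return self.expr(e[1])
        if kind == "neg":                        # E -> -E1
            a1 = self.expr(e[1])
            addr = self.newtemp()
            self.gen(f"{addr} = - {a1}")
            return addr
        if kind == "add":                        # E -> E1 + E2
            a1 = self.expr(e[1])
            a2 = self.expr(e[2])
            addr = self.newtemp()
            self.gen(f"{addr} = {a1} + {a2}")
            return addr
        raise ValueError(f"unknown construct: {kind}")

    def assign(self, name: str, e: tuple) -> None:   # S -> id = E
        addr = self.expr(e)
        self.gen(f"{self.sym_tab[name]} = {addr}")


tr = Translator({"a": "a", "b": "b", "c": "c"})
tr.assign("a", ("add", ("id", "b"), ("neg", ("id", "c"))))   # a = b + -c
print("\n".join(tr.code))
```

Each call returns the address of its result, which is exactly the role E.addr plays in the rules.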

SLIDE 23

Translating array references

  • Each type has a width (e.g., int may have 4)
  • How do you get the relative address (from base) of the ith element of an array A, that is, A[i]?

– base + i * w

  • What about A[i1][i2]?

– base + i1 * w1 + i2 * w2, where w1 is the width of a row and w2 is the width of an element

  • In general for a k-dimension array:

– base + i1 * w1 + i2 * w2 + ... + ik * wk

  • Note: We are assuming row-major order.
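As a worked example of the formula, the widths w1 ... wk can be computed from the array dimensions and the element width. The function names below are made up for this sketch, and the result is the offset to be added to base.

```python
def widths(dims: list[int], elem_width: int) -> list[int]:
    """Row-major widths w1..wk: wj is the size of one step along dimension j."""
    result = []
    w = elem_width
    for n in reversed(dims):     # innermost dimension first
        result.append(w)
        w *= n
    return list(reversed(result))


def offset(indices: list[int], dims: list[int], elem_width: int) -> int:
    """i1 * w1 + i2 * w2 + ... + ik * wk (to be added to base)."""
    return sum(i * w for i, w in zip(indices, widths(dims, elem_width)))


# A 2 x 3 array of 4-byte integers: w1 = 12 (one row), w2 = 4 (one element)
print(widths([2, 3], 4))           # [12, 4]
print(offset([1, 2], [2, 3], 4))   # 1 * 12 + 2 * 4 = 20
```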

SLIDE 24

Translating array references (Cont.)

  • Say we have the following grammar rule for generating a possibly multidimensional array reference:

L -> L[E] | id [E]

  • Say we have the following attributes:

– L.addr: a temporary that holds the offset for the array reference
– L.array: pointer to the symTab entry for the array name
– L.array.base: actual location of the array reference
– L.type: type of the subarray generated by L
– t.width: width of type t
– t.elem: type of the elements of array type t

SLIDE 25

Translating array references (Cont.)

Construct     Translation
S -> L = E    gen(L.array.base '[' L.addr ']' '=' E.addr)
E -> L        E.addr = newtemp()
              gen(E.addr '=' L.array.base '[' L.addr ']')
L -> id[E]    L.array = symTab.get(id.lexeme)
              L.type = L.array.type.elem
              L.addr = newtemp()
              gen(L.addr '=' E.addr '*' L.type.width)
L -> L1[E]    L.array = L1.array
              L.type = L1.type.elem
              t = newtemp()
              L.addr = newtemp()
              gen(t '=' E.addr '*' L.type.width)
              gen(L.addr '=' L1.addr '+' t)

SLIDE 26

Example

  • 3AC for c + a[i][j],

– where type of a is array(2, array(3, integer)),
– and width of integer is 4.

t1 = i * 12
t2 = j * 4
t3 = t1 + t2
t4 = a[t3]
t5 = c + t4

The translation rules are the ones on the previous slide (S -> L = E, E -> L, L -> id[E], L -> L1[E]).

Section 6.4 (DB)
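The same example can be reproduced mechanically from the rules L -> id[E], L -> L1[E] and E -> L. In this sketch the type encoding ("array", n, elem), the class ArrayGen and the use of plain names for index expressions are all assumptions made for illustration.

```python
INT_WIDTH = 4


def width(t) -> int:
    """Width of a type: 4 for integer, n * width(elem) for array(n, elem)."""
    if t == "integer":
        return INT_WIDTH
    _, n, elem = t
    return n * width(elem)


class ArrayGen:
    def __init__(self, sym_tab: dict) -> None:
        self.sym_tab = sym_tab       # array name -> type
        self.code: list[str] = []
        self.temps = 0

    def newtemp(self) -> str:
        self.temps += 1
        return f"t{self.temps}"

    def gen(self, text: str) -> None:
        self.code.append(text)

    def array_ref(self, name: str, indices: list[str]) -> str:
        """Return L.addr, the temporary holding the offset of name[i1][i2]..."""
        typ = self.sym_tab[name]
        addr = None
        for idx in indices:
            typ = typ[2]                         # L.type = (enclosing type).elem
            if addr is None:                     # L -> id[E]
                addr = self.newtemp()
                self.gen(f"{addr} = {idx} * {width(typ)}")
            else:                                # L -> L1[E]
                t = self.newtemp()
                new_addr = self.newtemp()
                self.gen(f"{t} = {idx} * {width(typ)}")
                self.gen(f"{new_addr} = {addr} + {t}")
                addr = new_addr
        return addr

    def load(self, name: str, indices: list[str]) -> str:   # E -> L
        off = self.array_ref(name, indices)
        addr = self.newtemp()
        self.gen(f"{addr} = {name}[{off}]")
        return addr

    def add(self, a1: str, a2: str) -> str:                 # E -> E1 + E2
        addr = self.newtemp()
        self.gen(f"{addr} = {a1} + {a2}")
        return addr


g = ArrayGen({"a": ("array", 2, ("array", 3, "integer"))})
g.add("c", g.load("a", ["i", "j"]))              # c + a[i][j]
print("\n".join(g.code))
```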

SLIDE 27

CS502: Compiler Design Intermediate Code Generation (Cont.) Manas Thakur

Fall 2020

SLIDE 28

Control flow

  • By default straight line

– One statement after another

  • Conditionals

– if, if-else, switch

  • Loops

– while, do-while, for, repeat-until

  • But we first need to consider:

– Boolean expressions
– Jumps (gotos) and labels

SLIDE 29

Translating boolean expressions

  • B -> B || B | B && B | !B | E relop E | true | false
  • relop -> < | <= | > | >= | == | !=
  • How to optimize the evaluation of || and &&?

– Short-circuiting
– We need to keep that in mind in order to generate efficient 3AC.
– When can skipping short-circuit evaluation affect correctness?

SLIDE 30

Translating boolean expressions (Cont.)

Say apart from the synthesized attribute code, each Boolean expression has two inherited attributes true and false.

B -> B1 || B2
    B1.true = B.true
    B1.false = newlabel()
    B2.true = B.true
    B2.false = B.false
    B.code = B1.code || label(B1.false) || B2.code

B -> B1 && B2
    B1.true = newlabel()
    B1.false = B.false
    B2.true = B.true
    B2.false = B.false
    B.code = B1.code || label(B1.true) || B2.code

We will see how B.true and B.false are set after two slides.

SLIDE 31

Translating boolean expressions (Cont.)

B -> !B1
    B1.true = B.false
    B1.false = B.true
    B.code = B1.code

B -> E1 relop E2
    B.code = E1.code || E2.code
             || gen('if' E1.addr relop E2.addr 'goto' B.true)
             || gen('goto' B.false)

B -> true
    B.code = gen('goto' B.true)

B -> false
    B.code = gen('goto' B.false)

SLIDE 32

Translating control-flow expressions

S -> if (B) S1
    B.true = newlabel()
    B.false = S1.next = S.next
    S.code = B.code || label(B.true) || S1.code

S -> if (B) S1 else S2
    B.true = newlabel()
    B.false = newlabel()
    S1.next = S2.next = S.next
    S.code = B.code || label(B.true) || S1.code
             || gen('goto' S.next)
             || label(B.false) || S2.code

Notice that next is another inherited attribute with each statement.

SLIDE 33

Translating control-flow expressions (Cont.)

S -> while (B) S1
    begin = newlabel()
    B.true = newlabel()
    B.false = S.next
    S1.next = begin
    S.code = label(begin) || B.code || label(B.true) || S1.code || gen('goto' begin)

S -> S1 S2
    S1.next = newlabel()
    S2.next = S.next
    S.code = S1.code || label(S1.next) || S2.code

SLIDE 34

3AC for if (x < 100 || x > 200 && x != y) x = 0

Straightforward translation:

    if x < 100 goto L2
    goto L3
L3: if x > 200 goto L4
    goto L1
L4: if x != y goto L2
    goto L1
L2: x = 0
L1:

Another way:

    if x < 100 goto L2
    if x <= 200 goto L1
    if x == y goto L1
L2: x = 0
L1:

The straightforward version follows the rules for B -> B1 || B2, B -> B1 && B2, B -> E1 relop E2, S -> if (B) S1 and S -> S1 S2 from the preceding slides.
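The straightforward translation can be generated by passing B.true and B.false down as parameters, which is one way to realise inherited attributes in a recursive walk. The class JumpGen and the tuple encoding of conditions are assumptions of this sketch; labels are emitted on their own line.

```python
class JumpGen:
    def __init__(self) -> None:
        self.code: list[str] = []
        self.labels = 0

    def newlabel(self) -> str:
        self.labels += 1
        return f"L{self.labels}"

    def gen(self, text: str) -> None:
        self.code.append("    " + text)

    def label(self, name: str) -> None:
        self.code.append(f"{name}:")

    def boolean(self, b: tuple, true: str, false: str) -> None:
        """Emit jumping code for B; true/false play the role of B.true/B.false."""
        kind = b[0]
        if kind == "or":                         # B -> B1 || B2
            b1_false = self.newlabel()
            self.boolean(b[1], true, b1_false)
            self.label(b1_false)
            self.boolean(b[2], true, false)
        elif kind == "and":                      # B -> B1 && B2
            b1_true = self.newlabel()
            self.boolean(b[1], b1_true, false)
            self.label(b1_true)
            self.boolean(b[2], true, false)
        elif kind == "not":                      # B -> !B1
            self.boolean(b[1], false, true)
        elif kind == "rel":                      # B -> E1 relop E2
            _, op, lhs, rhs = b
            self.gen(f"if {lhs} {op} {rhs} goto {true}")
            self.gen(f"goto {false}")
        else:
            raise ValueError(f"unknown condition: {kind}")

    def if_stmt(self, cond: tuple, body: list[str], next_label: str) -> None:
        """S -> if (B) S1, with S.next supplied by the caller."""
        b_true = self.newlabel()
        self.boolean(cond, b_true, next_label)
        self.label(b_true)
        for line in body:
            self.gen(line)


g = JumpGen()
s_next = g.newlabel()                            # label after the statement
cond = ("or", ("rel", "<", "x", "100"),
        ("and", ("rel", ">", "x", "200"), ("rel", "!=", "x", "y")))
g.if_stmt(cond, ["x = 0"], s_next)               # if (x < 100 || x > 200 && x != y) x = 0
g.label(s_next)
print("\n".join(g.code))
```

The redundant "goto L3" directly before "L3:" is what the second, shorter version on this slide avoids.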

SLIDE 35

Backpatching

  • S -> if (B) S1 required us to pass labels for evaluating B.

– We did it using inherited attributes.

  • Alternatively, we could leave the label unspecified, and fill it in later.
  • Called backpatching.

– A general concept for one-pass code generation.
– Self reading: Section 6.7 of Dragon book.
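A very small illustration of the idea: jumps are emitted with an unknown target, remembered in a list, and filled in once the target is known. The class Emitter and its methods are hypothetical, and jump targets are instruction numbers rather than labels in this sketch.

```python
class Emitter:
    def __init__(self) -> None:
        self.code: list[str] = []
        self.targets: dict[int, object] = {}   # jump index -> target (None = unknown)

    def emit(self, text: str) -> int:
        self.code.append(text)
        return len(self.code) - 1

    def emit_jump(self, text: str) -> int:
        """Emit a jump whose target is not known yet; return its index."""
        i = self.emit(text)
        self.targets[i] = None
        return i

    def backpatch(self, holes: list[int], target: int) -> None:
        """Fill in the target of every jump listed in holes."""
        for i in holes:
            self.targets[i] = target

    def listing(self) -> list[str]:
        lines = []
        for i, text in enumerate(self.code):
            if i in self.targets:
                t = self.targets[i]
                text += " " + ("?" if t is None else str(t))
            lines.append(f"{i}: {text}")
        return lines


# One-pass translation of: if (x < y) x = 0
e = Emitter()
true_list = [e.emit_jump("if x < y goto")]     # target unknown so far
false_list = [e.emit_jump("goto")]             # target unknown so far
before = e.listing()
e.backpatch(true_list, len(e.code))            # the body starts at the next instruction
e.emit("x = 0")
e.backpatch(false_list, len(e.code))           # first instruction after the statement
print("\n".join(e.listing()))
```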

SLIDE 36

Translating break and continue

  • break and continue are special (disciplined?) gotos.
  • Their IR needs

– currently enclosing loop/switch.
– goto to a label just outside/before the enclosing block.

  • How to generate the 3AC for break?

– either pass on the enclosing block and label as inherited attributes,
– or use backpatching to fill-in the label of goto.

  • For continue?
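One possible answer, sketched with the first option (passing the enclosing loop's labels down): keep a stack of (continue label, break label) pairs while generating code. The class LoopGen, the tuple-based statements and the "if !c goto L" form for conditions are assumptions of this example.

```python
class LoopGen:
    def __init__(self) -> None:
        self.code: list[str] = []
        self.labels = 0
        self.loops: list[tuple[str, str]] = []   # (continue target, break target)

    def newlabel(self) -> str:
        self.labels += 1
        return f"L{self.labels}"

    def stmt(self, s: tuple) -> None:
        kind = s[0]
        if kind == "break":
            if not self.loops:
                raise SyntaxError("break outside a loop")
            self.code.append(f"goto {self.loops[-1][1]}")   # just outside the loop
        elif kind == "continue":
            if not self.loops:
                raise SyntaxError("continue outside a loop")
            self.code.append(f"goto {self.loops[-1][0]}")   # back to the loop test
        elif kind == "while":
            _, cond, body = s
            begin, exit_label = self.newlabel(), self.newlabel()
            self.code.append(f"{begin}:")
            self.code.append(f"if !{cond} goto {exit_label}")
            self.loops.append((begin, exit_label))
            for inner in body:
                self.stmt(inner)
            self.loops.pop()
            self.code.append(f"goto {begin}")
            self.code.append(f"{exit_label}:")
        elif kind == "if":
            _, cond, body = s
            skip = self.newlabel()
            self.code.append(f"if !{cond} goto {skip}")
            for inner in body:
                self.stmt(inner)
            self.code.append(f"{skip}:")
        elif kind == "raw":
            self.code.append(s[1])
        else:
            raise ValueError(f"unknown statement: {kind}")


# while (c) { if (d) break; if (e) continue; x = x + 1 }
g = LoopGen()
g.stmt(("while", "c", [
    ("if", "d", [("break",)]),
    ("if", "e", [("continue",)]),
    ("raw", "x = x + 1"),
]))
print("\n".join(g.code))
```

For a while loop, continue jumps to the label before the condition; a for loop would instead target its update step.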

SLIDE 37

Translating switch statements

switch (E) {
    case V1: S1
    case V2: S2
    ...
    case Vn-1: Sn-1
    default: Sn
}

  • Using nested if-else
  • Using a table of pairs <Vi, Si>
  • Using a hash-table

– when n is large (say >10)

  • Special case when Vis are consecutive integers

– Indexed array would be sufficient
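Two of these strategies can be sketched side by side: an indexed table when the case values are consecutive integers, and a chain of comparisons otherwise. The function lower_switch is hypothetical, it assumes at least one case, and the indirect jump "goto table[t]" is not one of the 3AC instructions listed earlier; it is used here only to show the idea.

```python
def lower_switch(expr: str, cases: list[tuple[int, str]], default_label: str):
    """Return (lines, table) that dispatch expr to the label of the matching case.

    cases is a non-empty list of (value, label) pairs. table is None when a
    comparison chain is used instead of an indexed array.
    """
    values = [v for v, _ in cases]
    lo, hi = min(values), max(values)
    if sorted(values) == list(range(lo, hi + 1)):
        # Consecutive integers: index an array of labels with expr - lo.
        table = [label for _, label in sorted(cases)]
        lines = [
            f"if {expr} < {lo} goto {default_label}",
            f"if {expr} > {hi} goto {default_label}",
            f"t = {expr} - {lo}",
            "goto table[t]",
        ]
        return lines, table
    # General case: one comparison per case value (nested if-else style).
    lines = [f"if {expr} == {v} goto {label}" for v, label in cases]
    lines.append(f"goto {default_label}")
    return lines, None


dense, dense_table = lower_switch("e", [(1, "L1"), (2, "L2"), (3, "L3")], "Ld")
sparse, sparse_table = lower_switch("e", [(1, "L1"), (5, "L5")], "Ld")
print("\n".join(dense))
print(dense_table)
print("\n".join(sparse))
```

Only the dispatch is shown; the code for S1 ... Sn would be placed at the corresponding labels.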

SLIDE 38

Where are we?

  • Learnt to generate 3AC for:

– Expressions
– Array references
– Control-flow statements

  • Key learning:

– Generate code for yourself; trust the family to patch-up

  • Next class (when?):

– Translating classes/structures, objects, object references.