CS502: Compiler Design Intermediate Code Generation Manas Thakur - PowerPoint PPT Presentation

  1. CS502: Compiler Design Intermediate Code Generation Manas Thakur Fall 2020

  2. Midway through the course!
     [Figure: phases of a compiler, grouped into front end and back end, with the symbol table alongside.]
     Character stream -> Lexical Analyzer -> Token stream -> Syntax Analyzer -> Syntax tree -> Semantic Analyzer -> Syntax tree -> Intermediate Code Generator -> Intermediate representation -> Machine-Independent Code Optimizer -> Intermediate representation -> Code Generator -> Target machine code -> Machine-Dependent Code Optimizer -> Target machine code

  3. Roles of IR Generator
     ● Act as a glue between front-end and back-end
       – Or source and machine codes
     ● Lower abstraction from source level
       – To make life simple
     ● Maintain some high-level information
       – To keep life interesting
     ● Make the dream of m+n components for m languages and n platforms look like a possibility
       – Scala to Java Bytecode, for example
     ● Enable machine-independent optimization
       – Next phase

  4. Intermediate Representations (IR)
     ● IR design affects compiler speed and capabilities
     ● Some important IR properties:
       – Ease of generation, manipulation, optimization
       – Size of the representation
       – Level of abstraction: level of detail in the IR
         ● How close is the IR to source code? To the machine?
         ● What kinds of operations are represented?
     ● Often, different IRs for different jobs:
       – High-level IR: close to the source language
       – Low-level IR: close to the assembly code
       – Some compilers even have mid-level IRs!

  5. Kinds of IRs
     ● Structural
       – Graph oriented
       – Examples: ASTs, DAGs
       – Heavily used in IDEs, source-to-source translators
       – Tend to be large
     ● Linear
       – Pseudo-code for an abstract machine
       – Examples: 3 address code, Bytecode (Stack machine)
       – Level of abstraction varies
       – Simple, compact data structures
     ● Hybrid
       – Combination of graphs and linear code
       – Examples: Control-flow graphs, Ideal IR (HotSpot C2)

  6. Abstract Syntax Tree (AST)
     ● Parse tree with some intermediate nodes removed
     ● Example: x - 2 * y

            -
           / \
          x   *
             / \
            2   y

     ● Advantages:
       – Easy to evaluate
         ● Postfix form: x 2 y * -
         ● Useful for interpretation
       – Source code can be reconstructed
         ● Helpful in program understanding
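
The postfix form above can be obtained by a post-order walk of the AST. The following is a minimal sketch; the `Leaf` and `BinOp` classes and the `postfix` function are illustrative names chosen here, not part of the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str            # identifier or constant, e.g. "x" or "2"


@dataclass
class BinOp:
    op: str              # operator symbol, e.g. "-" or "*"
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinOp]


def postfix(node: Node) -> str:
    """Post-order walk: both operands first, then the operator."""
    if isinstance(node, Leaf):
        return node.name
    return f"{postfix(node.left)} {postfix(node.right)} {node.op}"


# AST for x - 2 * y: '-' at the root, '*' as its right child.
ast = BinOp("-", Leaf("x"), BinOp("*", Leaf("2"), Leaf("y")))
print(postfix(ast))      # x 2 y * -
```

The same walk, with a value returned at each node instead of a string, is what makes the AST easy to interpret directly.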

  7. Directed Acyclic Graph (DAG)
     ● AST with a unique node for each value
     ● Example: a + a * (b - c) + (b - c) * d
       [Figure: the AST for this expression becomes a DAG in which the leaf a and the subexpression b - c each appear only once.]
     ● Advantages:
       – Compact (reduces redundancy)
       – Won’t have to evaluate the same expression twice
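
One common way to obtain "a unique node for each value" is to look up each (operator, left, right) combination in a table before creating a node. The sketch below assumes that approach; `DagBuilder` and its methods are hypothetical names used only for illustration.

```python
class DagBuilder:
    """Creates expression nodes, reusing an existing node whenever the
    same (op, left, right) combination has been built before."""

    def __init__(self):
        self.nodes = []      # node id -> (op, left id, right id)
        self.index = {}      # (op, left id, right id) -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self.node(name)


dag = DagBuilder()
# a + a * (b - c) + (b - c) * d, built bottom-up
a1 = dag.leaf("a")
a2 = dag.leaf("a")                                    # same node as a1
bc1 = dag.node("-", dag.leaf("b"), dag.leaf("c"))
prod1 = dag.node("*", a2, bc1)
sum1 = dag.node("+", a1, prod1)
bc2 = dag.node("-", dag.leaf("b"), dag.leaf("c"))     # same node as bc1
prod2 = dag.node("*", bc2, dag.leaf("d"))
root = dag.node("+", sum1, prod2)

print(len(dag.nodes))    # 9 nodes, versus 13 in the corresponding AST
```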

  8. Three Address Code (3AC or TAC)
     ● At most
       – Three addresses (names/constants) in the instruction
       – One operator on the right hand side of assignment
     ● General statement form: x = y op z
     ● Longer expressions are simplified by introducing temporaries
       – z = x - 2 * y becomes
           t1 = 2 * y
           t2 = x - t1
           z = t2
         or
           t1 = 2 * y
           z = x - t1
     ● Advantages:
       – Easy to understand
       – Names for intermediate values

  9. More about 3AC
     ● Allows variety of instructions:
       – Assignments
         ● x = y op z
         ● x = op y
         ● x = y
         ● x = y[i] and x[i] = y
         ● x = y.f and x.f = y
       – Branches
         ● goto L
         ● if x goto L
       – Procedure calls
         ● param x1; param x2; ...; param xn; call p, n
       – Pointer assignments

  10. Classwork: Generate 3AC
     ● r = a + a * (b - c) + (b - c) * d
         t1 = b - c
         t2 = t1 * d
         t3 = b - c
         t4 = a * t3
         t5 = t4 + t2
         r = a + t5
     ● if (x < y) S1 else S2
         t1 = x < y
         if !t1 goto L1
         S1
         goto L2
         L1: S2
         L2:
     ● while (x < 10) S1
         L1: c = x < 10
         t = !c
         if t goto L2
         S1
         goto L1
         L2:
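
The conditional and loop patterns can be written as code templates with fresh labels and temporaries. This is a rough sketch under the assumption that statement bodies are already lowered to lists of 3AC lines; the `Lowerer` class and its method names are made up for this example.

```python
class Lowerer:
    """Emits 3AC templates for if/else and while, with fresh names."""

    def __init__(self):
        self.temps = 0
        self.labels = 0

    def new_temp(self):
        self.temps += 1
        return f"t{self.temps}"

    def new_label(self):
        self.labels += 1
        return f"L{self.labels}"

    def if_else(self, cond, then_code, else_code):
        t = self.new_temp()
        l_else, l_end = self.new_label(), self.new_label()
        return ([f"{t} = {cond}", f"if !{t} goto {l_else}"]
                + then_code
                + [f"goto {l_end}", f"{l_else}:"]
                + else_code
                + [f"{l_end}:"])

    def while_loop(self, cond, body_code):
        l_top, l_exit = self.new_label(), self.new_label()
        c, t = self.new_temp(), self.new_temp()
        # Leave the loop when the negated condition holds.
        return ([f"{l_top}:", f"{c} = {cond}", f"{t} = !{c}",
                 f"if {t} goto {l_exit}"]
                + body_code
                + [f"goto {l_top}", f"{l_exit}:"])


print("\n".join(Lowerer().if_else("x < y", ["S1"], ["S2"])))
print("\n".join(Lowerer().while_loop("x < 10", ["S1"])))
```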

  11. 3AC Representations
     ● Quadruples: instructions can be reordered easily.
     ● Triples: instructions cannot be reordered easily.
     Assignment: a = b * -c + d * -e

     3AC              Quadruples                        Triples
                      op     arg1  arg2  result              op     arg1  arg2
     t1 = minus c     minus  c           t1             0    minus  c
     t2 = b * t1      *      b     t1    t2             1    *      b     (0)
     t3 = minus e     minus  e           t3             2    minus  e
     t4 = d * t3      *      d     t3    t4             3    *      d     (2)
     t5 = t2 + t4     +      t2    t4    t5             4    +      (1)   (3)
     a = t5           =      t5          a              5    =      a     (4)

  12. 3AC Representations (Cont.)
     ● Quadruples
     ● Triples: instructions cannot be reordered easily.
     ● Indirect triples can be reordered easily.
     Assignment: a = b * -c + d * -e

     Instruction list      Triples
                                op     arg1  arg2       3AC
     (0)                   0    minus  c                t1 = minus c
     (1)                   1    *      b     (0)        t2 = b * t1
     (2)                   2    minus  e                t3 = minus e
     (3)                   3    *      d     (2)        t4 = d * t3
     (4)                   4    +      (1)   (3)        t5 = t2 + t4
     (5)                   5    =      a     (4)        a = t5

     The instruction list holds pointers to the triples; reordering permutes the list only, and the triples themselves stay in place.
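
The three representations can be compared as plain data structures. The sketch below is one possible encoding, not the slides' own: tuples for quadruples and triples, a `("ref", i)` pair standing for the slides' `(i)` notation, and a hypothetical helper `quads_to_triples`.

```python
# Quadruples for a = b * -c + d * -e: (op, arg1, arg2, result)
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "e", None, "t3"),
    ("*", "d", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Drop the result field; refer to a temporary by the index of the
    triple that computes it."""
    defined_at = {}      # temporary name -> index of defining triple

    def ref(arg):
        return ("ref", defined_at[arg]) if arg in defined_at else arg

    triples = []
    for op, arg1, arg2, result in quads:
        if op == "=":
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            defined_at[result] = len(triples) - 1
    return triples


triples = quads_to_triples(quads)

# Indirect triples: a separate list of indices gives the execution order.
order = list(range(len(triples)))
# Triples 1 and 2 are independent, so they can be swapped by touching
# only the order list; no ("ref", i) inside the triples has to change.
order[1], order[2] = order[2], order[1]
schedule = [triples[i] for i in order]
```

With plain triples, the same swap would renumber the instructions and force every `("ref", i)` that mentions them to be rewritten, which is why they are hard to reorder.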

  13. 2 Address Code
     ● Where have you seen them?
       – Common in Assembly
     ● Example: z = x - 2 * y becomes
         MOV R1, y
         MUL R1, 2
         MOV R2, x
         SUB R2, R1
         MOV z, R2
     ● Larger number of instructions compared to 3AC
     ● Good for register allocation

  14. 1 Address Code
     ● Stack-based computers
     ● Example: Java Virtual Machines!
       – x - 2 * y becomes
           push x
           push 2
           push y
           multiply
           subtract
     ● Advantages:
       – Simple to generate and execute
       – Compact form
     ● There is a reason you find Java based systems popular in:
       – Embedded systems
       – Mobile phones (Android)
       – Systems where code is transmitted (Internet)
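
To see why such code is simple to execute, here is a small interpreter for the five-instruction example. It is only a sketch: the `run` function, the string instruction format, and the `env` dictionary of variable values are assumptions made for illustration, not a model of real JVM bytecode.

```python
def run(program, env):
    """Execute stack code; env maps variable names to integer values."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            arg = args[0]
            # Operand is a variable looked up in env, else an integer literal.
            stack.append(env[arg] if arg in env else int(arg))
        elif op == "multiply":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "subtract":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack.pop()


program = ["push x", "push 2", "push y", "multiply", "subtract"]
print(run(program, {"x": 10, "y": 3}))   # 10 - 2 * 3 = 4
```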

  15. What next?
     ● More IRs (while learning CGO):
       – Control-Flow Graph (CFG)
       – Static Single Assignment (SSA)
     ● Next class: IR generation
       – Focus: 3AC. Why?
         ● Comfortable and still affordable!
         ● Offers a wide understanding of the involved challenges.
         ● Assignment 3 would involve 3AC generation!
           – But there is time for it.

  16. CS502: Compiler Design Intermediate Code Generation (Cont.) Manas Thakur Fall 2020

  17. IR Generation
     ● High level language is complex
     ● Goal: Lower HLL code to a simpler form (3AC)
     ● Constructs that we need to translate:
       – Variable declarations
       – Expressions
       – Array accesses
       – Control structures (conditionals, loops)
       – Function calls
       – Function bodies
       – Classes and objects!
     ● Approach: Syntax-directed translation from parse tree.

  18. Variable declarations
     ● Use symbol tables
       – Maps from names to values
     ● Take care of nested scopes
       – What will you do at the entry to a new block?
       – What to do at a function call?
       – Function entry?
       – Function exit?
       – Need to push and pop the current environment.
     ● Fields of a structure/class?
       – We will study in detail when we learn translating objects.
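
Pushing and popping the current environment can be pictured as a stack of dictionaries, one per open scope. The following is a minimal sketch; the `SymbolTable` class and its method names are illustrative assumptions.

```python
class SymbolTable:
    """A stack of scopes; the innermost scope is the last list element."""

    def __init__(self):
        self.scopes = [{}]               # start with the global scope

    def enter_scope(self):               # block entry or function entry
        self.scopes.append({})

    def exit_scope(self):                # block exit or function exit
        self.scopes.pop()

    def declare(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(name)


tab = SymbolTable()
tab.declare("x", "int")
tab.enter_scope()
tab.declare("x", "float")                # shadows the outer x
inner = tab.lookup("x")
tab.exit_scope()
outer = tab.lookup("x")
print(inner, outer)                      # float int
```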

  19. Lowering scheme
     ● Code template for each AST node
       – Captures key semantics of each construct
       – Has blanks for the node’s children
       – Implemented in a function called gen
     ● To fill in the template:
       – Call the function gen recursively on children
         ● Did anyone say “visitors”?
       – Plug code into the blanks
     ● How to stitch code together?
       – gen stores the results into a temporary
       – Emit code that combines the results for the syntactic construct represented by the current node

  20. Translating expressions
     Say E.addr is a synthesized attribute that denotes the temporary holding the value of E.

     Construct        Translation
     E -> E1 + E2     E.addr = newtemp();
                      gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)

     In terms of an attribute E.code:

     Construct        Translation
     E -> E1 + E2     E.addr = newtemp();
                      E.code = E1.code || E2.code || gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)

     In terms of our assignment:

     Construct        visit() method
     E -> E1 + E2     t1 = visit(E1);
                      t2 = visit(E2);
                      r = newtemp();
                      System.out.println(r + " = " + t1 + " + " + t2);
                      return r;

  21. Translating expressions (Cont.)

     Construct        Translation
     S -> id = E      gen(symTab.get(id.lexeme) ‘=’ E.addr)
     E -> -E1         E.addr = newtemp()
                      gen(E.addr ‘=’ ‘-’ E1.addr)
     E -> (E1)        E.addr = E1.addr
     E -> id          E.addr = symTab.get(id.lexeme)

     ● symTab is the symbol table of the current scope.

  22. Example
     ● 3AC for a = b + -c :
         t1 = - c
         t2 = b + t1
         a = t2

     Construct        Translation
     S -> id = E      gen(symTab.get(id.lexeme) ‘=’ E.addr)
     E -> E1 + E2     E.addr = newtemp();
                      gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)
     E -> -E1         E.addr = newtemp()
                      gen(E.addr ‘=’ ‘-’ E1.addr)
     E -> (E1)        E.addr = E1.addr
     E -> id          E.addr = symTab.get(id.lexeme)
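
The translation rules in the table can be turned into a small recursive translator. This sketch makes several assumptions that are not in the slides: expressions arrive as nested tuples such as `("add", e1, e2)`, the symbol table is a plain dictionary from identifier to address, and `Translator` is a hypothetical class name. Parentheses need no case of their own because they leave no node in the tree.

```python
class Translator:
    def __init__(self, sym_tab):
        self.sym_tab = sym_tab       # identifier -> address (here, a name)
        self.code = []               # emitted 3AC lines
        self.temp_count = 0

    def newtemp(self):
        self.temp_count += 1
        return f"t{self.temp_count}"

    def gen(self, line):
        self.code.append(line)

    def expr(self, e):
        """Return E.addr, emitting the code for E as a side effect."""
        kind = e[0]
        if kind == "id":                         # E -> id
            return self.sym_tab[e[1]]
        if kind == "neg":                        # E -> -E1
            a1 = self.expr(e[1])
            addr = self.newtemp()
            self.gen(f"{addr} = - {a1}")
            return addr
        if kind == "add":                        # E -> E1 + E2
            a1 = self.expr(e[1])
            a2 = self.expr(e[2])
            addr = self.newtemp()
            self.gen(f"{addr} = {a1} + {a2}")
            return addr
        raise ValueError(f"unknown node kind: {kind}")

    def assign(self, name, e):                   # S -> id = E
        addr = self.expr(e)
        self.gen(f"{self.sym_tab[name]} = {addr}")


tr = Translator({"a": "a", "b": "b", "c": "c"})
tr.assign("a", ("add", ("id", "b"), ("neg", ("id", "c"))))   # a = b + -c
print("\n".join(tr.code))
```

Children are visited before the new temporary is allocated, which is what yields the numbering t1, t2 shown in the example.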

  23. Translating array references
     ● Each type has a width (e.g., int may have 4)
     ● How do you get the relative address (from base) of the i-th element of an array A, that is, A[i]?
       – base + i * w
     ● What about A[i][j]?
       – base + i_1 * w_1 + i_2 * w_2
     ● In general for a k-dimension array:
       – base + i_1 * w_1 + i_2 * w_2 + ... + i_k * w_k
     ● Note: We are assuming row-major order.
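
As a worked instance of the formula, the sketch below computes the per-dimension widths and the resulting address. It assumes zero-based indices and row-major layout, so that the width of the last dimension is the element width and each earlier width is the next width times the next dimension's size; the function names `widths` and `address` are chosen here for illustration.

```python
def widths(dims, elem_width):
    """Return [w_1, ..., w_k]: bytes spanned by one step of each index."""
    ws = [elem_width] * len(dims)
    # Walk from the second-to-last dimension back to the first.
    for k in range(len(dims) - 2, -1, -1):
        ws[k] = ws[k + 1] * dims[k + 1]
    return ws


def address(base, indices, dims, elem_width):
    """base + i_1 * w_1 + ... + i_k * w_k"""
    return base + sum(i * w for i, w in zip(indices, widths(dims, elem_width)))


# int A[2][3] with 4-byte ints: a row is 12 bytes, an element is 4 bytes.
print(widths([2, 3], 4))                  # [12, 4]
print(address(0, [1, 2], [2, 3], 4))      # 1 * 12 + 2 * 4 = 20
```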
