Compiler Construction, Chapter 1: Introduction


  1. Compiler Construction, Chapter 1: Introduction. Slides modified from the Louden book and Dr. Scherger.

  2. Terminology
     Compiler, Source Language, Interpreter, Target Language, Translator, Target Platform, Relocatable, Assembler, Macro substitution, Linker, Loader, IDE, Preprocessor, Cross Compiler, Editor, Disassembler, Front End, Debugger, Back End, Profiler
     Chapter 1: Introduction January, 2010

  3. Compiler Stages (figure)
     Source Code -> Scanner -> Tokens -> Parser -> Syntax Tree -> Semantic Analyzer -> Annotated Tree -> Source Code Optimizer -> Intermediate Code -> Code Generator -> Target Code -> Target Code Optimizer -> Target Code
     Shown alongside the stages: Literal Table, Symbol Table, Error Handler.
     The figure also labels the Analysis and Synthesis parts of the pipeline.

  4. Files Used by Compilers
     - A source code text file (.c, .cpp, .java, etc. file extensions).
     - Intermediate code files: transformations of source code during compilation, usually kept in temporary files rarely seen by the user.
     - An assembly code text file containing symbolic machine code, often produced as the output of a compiler (.asm, .s file extensions).

  5. Files Used by Compilers (cont.)
     - One or more binary object code files: machine instructions, not yet linked or executable (.obj, .o file extensions).
     - A binary executable file: linked, independently executable (well, not always…) code (.exe, .out extensions, or no extension).

  6. Compiler Execution  What is O() of a compiler?

  7. Extended Example
     - Source code:
         a[index] = 4 + 2
     - Tokens:
         ID Lbracket ID Rbracket AssignOp Num AddOp Num
     - Parse tree (syntax tree with all steps of the parser in gory detail):

  8. Parse Tree
       expression
         assign-expression
           expression
             subscript-expression
               expression
                 identifier a
               [
               expression
                 identifier index
               ]
           =
           expression
             additive-expression
               expression
                 number 4
               +
               expression
                 number 2

  9. Syntax Tree
     A "trimmed" version of the parse tree with only essential information:
       assign-expression
         subscript-expression
           identifier a
           identifier index
         additive-expression
           number 4
           number 2

  10. Annotated Syntax Tree (with attributes)
        assign-expression : integer
          subscript-expression : integer
            identifier a : array of integer
            identifier index : integer
          additive-expression : integer
            number 4 : integer
            number 2 : integer

  11. Intermediate Code
      - Syntax tree very abstract
      - Machine code too specific
      - Something in between may make optimization much easier
      - One such representation is three-address code
      - Has only up to three different variables (addresses)
          t = 4 + 2
          a[index] = t
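
As an illustration of the three-address form on this slide, here is a minimal sketch in C that walks a small expression tree and emits one instruction per operator. The `Expr` type and the functions `gen_expr` and `gen_assign` are hypothetical names made up for this sketch, and the assignment target is passed as plain text rather than being compiled as a subscript. Temporaries are numbered (`t1`, `t2`, ...) instead of using the slide's single `t`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression node: a leaf holds operand text, an interior node holds an operator. */
typedef struct Expr {
    char op;                    /* '+' for addition, 0 for a leaf */
    const char *leaf;           /* operand text when op == 0 */
    const struct Expr *left, *right;
} Expr;

/* Append three-address code for e to out; store the name holding the result in place. */
static void gen_expr(const Expr *e, char *out, size_t outsz, int *ntemps,
                     char *place, size_t placesz)
{
    assert(e != NULL);
    if (e->op == 0) {           /* a leaf is already an address */
        snprintf(place, placesz, "%s", e->leaf);
        return;
    }
    char l[32], r[32];
    gen_expr(e->left, out, outsz, ntemps, l, sizeof l);
    gen_expr(e->right, out, outsz, ntemps, r, sizeof r);
    snprintf(place, placesz, "t%d", ++*ntemps);     /* fresh temporary */
    size_t used = strlen(out);
    snprintf(out + used, outsz - used, "%s = %s %c %s\n", place, l, e->op, r);
}

/* Emit code for "target = e", where target is text such as "a[index]". */
void gen_assign(const char *target, const Expr *e, char *out, size_t outsz)
{
    assert(target != NULL && out != NULL && outsz > 0);
    char place[32];
    int ntemps = 0;
    out[0] = '\0';
    gen_expr(e, out, outsz, &ntemps, place, sizeof place);
    size_t used = strlen(out);
    snprintf(out + used, outsz - used, "%s = %s\n", target, place);
}
```

Calling `gen_assign("a[index]", &sum, buf, sizeof buf)` on a tree for `4 + 2` fills `buf` with the two instructions from the slide, using `t1` as the temporary.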

  12. Target Code
      - Source code:
          a[index] = 4 + 2
      - Tokens:
          ID Lbracket ID Rbracket AssignOp Num AddOp Num
      - Target code (edited & modified for this presentation):
          mov eax, 6
          mov ecx, DWORD PTR _index$[ebp]
          mov DWORD PTR _a$[ebp+ecx*4], eax
      (Note source level constant folding optimization.)
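
The constant folding noted on this slide (4 + 2 becoming 6 before code generation) can be sketched as a bottom-up rewrite of the syntax tree. The `Node` type and `fold` function below are hypothetical and handle only addition. Integer overflow is ignored, which a real compiler could not afford to do.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { N_NUM, N_ADD, N_ID } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                  /* meaningful only when kind == N_NUM */
    struct Node *left, *right;  /* children when kind == N_ADD */
} Node;

/* Fold constant subtrees bottom-up, rewriting nodes in place. */
void fold(Node *n)
{
    if (n == NULL || n->kind != N_ADD)
        return;                 /* leaves have nothing to fold */
    assert(n->left != NULL && n->right != NULL);
    fold(n->left);
    fold(n->right);
    if (n->left->kind == N_NUM && n->right->kind == N_NUM) {
        n->value = n->left->value + n->right->value;
        n->kind = N_NUM;        /* the addition node becomes a constant */
        n->left = n->right = NULL;   /* the caller still owns the old children */
    }
}
```

After `fold` runs on the tree for `4 + 2`, the root is a single number node holding 6, which is what lets the code generator emit `mov eax, 6`.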

  13. The Big Picture (figure: the compiler stages applied to the extended example)
      - Source code: a[index] = 4 + 2
      - Scanner -> tokens: ID Lbracket ID Rbracket AssignOp Num AddOp Num
      - Parser -> syntax tree: assign-expression over subscript-expression (identifiers a, index) and additive-expression (numbers 4, 2)
      - Semantic analyzer -> annotated tree: the same tree with types (a: array of integer; every other node: integer)
      - Source code optimizer -> intermediate code: t = 4 + 2, then a[index] = t
      - Code generator -> target code:
          mov eax, 6
          mov ecx, DWORD PTR _index$[ebp]
          mov DWORD PTR _a$[ebp+ecx*4], eax
      - Target code optimizer -> target code
      - Literal table, symbol table and error handler are shown alongside the stages.

  14. Algorithmic Tools
      - Tokens: defined using regular expressions. (Chapter 2)
      - Scanner: an implementation of a finite state machine (deterministic automaton) that recognizes the token regular expressions. (Chapter 2)
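
To make the finite state machine idea concrete, here is a small hand-written scanner sketch in C for the tokens of the extended example. The state names, the `TK_` token names, and the functions `next_token` and `scan_all` are assumptions of this sketch, not the TINY scanner from the text. It recognizes identifiers, unsigned integers, `[`, `]`, `=` and `+` only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds, named after the token stream on the Extended Example slide. */
typedef enum { TK_ID, TK_NUM, TK_LBRACKET, TK_RBRACKET,
               TK_ASSIGNOP, TK_ADDOP, TK_END, TK_ERROR } TokenKind;

/* DFA states: at the start of a token, inside an identifier, inside a number. */
typedef enum { S_START, S_INID, S_INNUM } State;

/* Recognize one token starting at src[*pos] and advance *pos past it. */
TokenKind next_token(const char *src, size_t *pos)
{
    assert(src != NULL && pos != NULL);
    State state = S_START;
    for (;;) {
        unsigned char c = (unsigned char)src[*pos];
        switch (state) {
        case S_START:
            if (c == '\0') return TK_END;
            (*pos)++;                               /* consume the character */
            if (isspace(c)) break;                  /* skip blanks, stay in START */
            if (isalpha(c)) { state = S_INID; break; }
            if (isdigit(c)) { state = S_INNUM; break; }
            if (c == '[') return TK_LBRACKET;
            if (c == ']') return TK_RBRACKET;
            if (c == '=') return TK_ASSIGNOP;
            if (c == '+') return TK_ADDOP;
            return TK_ERROR;
        case S_INID:
            if (isalnum(c)) { (*pos)++; break; }
            return TK_ID;                           /* lookahead is not consumed */
        case S_INNUM:
            if (isdigit(c)) { (*pos)++; break; }
            return TK_NUM;
        }
    }
}

/* Store up to max token kinds in out and return how many were stored. */
size_t scan_all(const char *src, TokenKind *out, size_t max)
{
    size_t pos = 0, n = 0;
    while (n < max) {
        TokenKind t = next_token(src, &pos);
        if (t == TK_END) break;
        out[n++] = t;
        if (t == TK_ERROR) break;                   /* stop at the first bad character */
    }
    return n;
}
```

Scanning `a[index] = 4 + 2` with `scan_all` yields eight tokens in the order listed on the slide: ID, Lbracket, ID, Rbracket, AssignOp, Num, AddOp, Num.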

  15. Algorithmic Tools (cont.)
      - Parser: a push-down automaton (i.e. uses a stack), based on grammar rules in a standard format (BNF, Backus-Naur Form). (Chapters 3, 4, 5)
      - Semantic Analyzer and Code Generator: recursive evaluators based on semantic rules for attributes (properties of language constructs). (Chapters 6, 7, 8)
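
A recursive attribute evaluator can be sketched as a post-order tree walk that computes each node's type from the types of its children, which is how the annotated tree for `a[index] = 4 + 2` gets its labels. The `Tree` layout, the enum names, and the typing rules below are simplifying assumptions made for this sketch. In a real compiler the declared type of an identifier would come from the symbol table.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_INT, T_ARRAY_OF_INT, T_ERROR } Type;
typedef enum { K_NUMBER, K_IDENT, K_SUBSCRIPT, K_ADD, K_ASSIGN } Kind;

typedef struct Tree {
    Kind kind;
    Type declared;              /* for K_IDENT: stands in for a symbol table lookup */
    struct Tree *left, *right;
    Type type;                  /* the attribute computed by annotate() */
} Tree;

/* Post-order traversal: children first, then the node's own semantic rule. */
Type annotate(Tree *t)
{
    assert(t != NULL);
    Type l = t->left ? annotate(t->left) : T_ERROR;
    Type r = t->right ? annotate(t->right) : T_ERROR;
    switch (t->kind) {
    case K_NUMBER:
        t->type = T_INT;
        break;
    case K_IDENT:
        t->type = t->declared;
        break;
    case K_SUBSCRIPT:           /* array of integer indexed by integer gives integer */
        t->type = (l == T_ARRAY_OF_INT && r == T_INT) ? T_INT : T_ERROR;
        break;
    case K_ADD:                 /* both operands must be integers */
    case K_ASSIGN:              /* both sides must be integers */
        t->type = (l == T_INT && r == T_INT) ? T_INT : T_ERROR;
        break;
    }
    return t->type;
}
```

A `T_ERROR` result at any node is where a semantic error such as `int x = "Hello, world!";` would be reported.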

  16. Other Phase Features
      - Parser and scanner together typically operate as a unit (parser calls scanner repeatedly to generate tokens).
      - Front end: parser, scanner, semantic analyzer and source code optimizer depend primarily on source language.
      - Back end: code generator and target code optimizer depend primarily on target language (machine architecture).

  17. Other Classifications
      - Logical unit: phase
      - Physical unit: separately compiled code file (see later)
      - Temporal unit: pass
      - Passes: trips through the source code (or intermediate code). A pass is not the same thing as a phase, although a compiler can be organized so that each phase is its own pass.

  18. Data Structure Tools
      - Syntax tree: see previous pictures.
      - Literal table: "Hello, world!", 3.141592653589793, etc.
        - If a literal is used more than once (as they often are in a program), we still want to store it only once.
        - So we use a table (almost always a hash table or table of hash tables).
      - Symbol table: all names (variables, functions, classes, typedefs, constants, namespaces).
        - Again, a hash table or set of hash tables is the most likely data structure.
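
The "store each literal only once" idea can be sketched as a chained hash table that interns strings. The names `LiteralTable`, `intern` and `hash_text`, the bucket count, and the hash function are choices made for this sketch. A symbol table would look similar but would attach attributes (type, scope, location) to each name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64             /* arbitrary size chosen for the sketch */

typedef struct Entry {
    char *text;
    struct Entry *next;         /* chaining resolves hash collisions */
} Entry;

typedef struct {
    Entry *bucket[NBUCKETS];
    size_t count;               /* number of distinct literals stored */
} LiteralTable;

/* Simple multiplicative string hash reduced to a bucket index. */
static unsigned hash_text(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Return the single stored copy of text, inserting it on first use.
   Returns NULL if memory runs out. */
const char *intern(LiteralTable *t, const char *text)
{
    assert(t != NULL && text != NULL);
    unsigned h = hash_text(text);
    for (Entry *e = t->bucket[h]; e != NULL; e = e->next)
        if (strcmp(e->text, text) == 0)
            return e->text;     /* already stored: reuse it */

    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->text = malloc(strlen(text) + 1);
    if (e->text == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e->text, text);
    e->next = t->bucket[h];     /* push onto the front of the chain */
    t->bucket[h] = e;
    t->count++;
    return e->text;
}
```

Interning `"Hello, world!"` twice returns the same pointer both times and leaves `count` at 1. The table must start zero-initialized, for example `LiteralTable t = {0};`.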

  19. Error Handler
      - One of the more difficult parts of a compiler to design.
      - Must handle a wide range of errors.
      - Must handle multiple errors.
      - Must not get stuck.
      - Must not get into an infinite loop (typical simple-minded strategy: count errors, stop if count gets too high).
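
The simple-minded strategy of counting errors and stopping can be sketched in a few lines of C. The `ErrorHandler` type, the `report_error` function and the limit of 10 are hypothetical. The caller is expected to abandon compilation as soon as the function returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_ERRORS 10           /* hypothetical limit */

typedef struct {
    int count;                  /* errors reported so far */
    bool gave_up;               /* set once the limit is reached */
} ErrorHandler;

/* Report one error. Returns false once the limit is reached, telling the caller to stop. */
bool report_error(ErrorHandler *eh, int line, const char *msg)
{
    assert(eh != NULL && msg != NULL);
    if (eh->gave_up)
        return false;           /* already stopped: stay silent */
    fprintf(stderr, "line %d: error: %s\n", line, msg);
    if (++eh->count >= MAX_ERRORS) {
        fprintf(stderr, "too many errors, stopping\n");
        eh->gave_up = true;
        return false;
    }
    return true;
}
```

Because the count only grows and the function refuses to continue past the limit, a parser that keeps failing on the same input cannot loop forever on error reports.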

  20. Kinds of Errors
      - Syntax:
          iff (x == 0) y + = z + r; }
      - Semantic:
          int x = "Hello, world!";
      - Runtime:
          int x = 2;
          ...
          double y = 3.14159 / (x - 2);

  21. Errors (cont.)
      - A compiler must handle syntax and semantic errors, but not runtime errors (whether a runtime error will occur is an undecidable question).
      - Sometimes a compiler is required to generate code to catch runtime errors and handle them in some graceful way (either with or without exception handling).
      - This, too, is often difficult.
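
As a rough picture of what generated code that catches a runtime error might amount to, the sketch below wraps the division from the Kinds of Errors slide in an explicit check. The `checked_div` function is a hypothetical stand-in for compiler-emitted code, and it reports failure through a return value rather than through exception handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Divide num by den; on a zero divisor, leave *out untouched and report failure. */
bool checked_div(double num, double den, double *out)
{
    assert(out != NULL);
    if (den == 0.0)
        return false;           /* a real runtime would raise an error here */
    *out = num / den;
    return true;
}
```

With `x` equal to 2, `checked_div(3.14159, x - 2, &y)` returns false instead of performing the division, so the surrounding code can handle the condition gracefully.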

  22. Sample Compilers in This Class ("Toys")
      - TINY: a 4-pass compiler for the TINY language, based on Pascal (see text, pages 22-26).
      - C-Minus: a project language given in the text (see text, pages 26-27 and Appendix A). Based on C.
      - SIL: Simple Island Language:

  23. TINY Example
        read x;
        if x > 0 then
          fact := 1;
          repeat
            fact := fact * x;
            x := x - 1
          until x = 0;
          write fact
        end

  24. C-Minus Example
        int fact( int x )
        {
          if (x > 1)
            return x * fact(x-1);
          else
            return 1;
        }

        void main( void )
        {
          int x;
          x = read();
          if (x > 0) write( fact(x) );
        }

  25. Structure of the TINY Compiler
        globals.h   main.c
        util.h      util.c
        scan.h      scan.c
        parse.h     parse.c
        symtab.h    symtab.c
        analyze.h   analyze.c
        code.h      code.c
        cgen.h      cgen.c

  26. Conditional Compilation Options
      - NO_PARSE: builds a scanner-only compiler.
      - NO_ANALYZE: builds a compiler that scans and parses only.
      - NO_CODE: builds a compiler that performs semantic analysis, but generates no code.
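
The way such options cut the pipeline short can be sketched with nested preprocessor conditionals. This is not the TINY compiler's `main.c`. It assumes the three flags are 0/1-valued macros, and the `phases_built` function is a hypothetical stand-in that records which phases a build would include instead of running them.

```c
#include <assert.h>
#include <string.h>

/* Set one of these to 1 to stop the pipeline early (flag names from the slide). */
#define NO_PARSE   0
#define NO_ANALYZE 0
#define NO_CODE    0

/* Write the names of the phases included in this build to out (at least 32 bytes). */
void phases_built(char *out, size_t outsz)
{
    assert(out != NULL && outsz >= 32);
    strcpy(out, "scan");            /* the scanner is always built */
#if !NO_PARSE
    strcat(out, " parse");
#if !NO_ANALYZE
    strcat(out, " analyze");
#if !NO_CODE
    strcat(out, " codegen");
#endif
#endif
#endif
}
```

The nesting reflects the dependency between phases: setting `NO_PARSE` to 1 removes parsing and everything after it, while setting only `NO_CODE` to 1 keeps scanning, parsing and semantic analysis.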

  27. Listing Options (built in - not flags)
      - EchoSource: echoes the TINY source program to the listing, together with line numbers.
      - TraceScan: displays information on each token as the scanner recognizes it.
      - TraceParse: displays the syntax tree in a linearized format.
      - TraceAnalyze: displays summary information on the symbol table and type checking.
      - TraceCode: prints code generation-tracing comments to the code file.

  28. Terminology Review
      Compiler, Source Language, Interpreter, Target Language, Translator, Target Platform, Relocatable, Assembler, Macro substitution, Linker, Loader, IDE, Preprocessor, Cross Compiler, Editor, Disassembler, Front End, Debugger, Back End, Profiler
