Compiler Construction - Hanspeter Mössenböck, University of Linz


  1. Compiler Construction
     Hanspeter Mössenböck, University of Linz
     http://ssw.jku.at/Misc/CC/
     Text Book: N. Wirth: Compiler Construction, Addison-Wesley 1996
     http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

  2. 1. Overview
     1.1 Motivation
     1.2 Structure of a Compiler
     1.3 Grammars
     1.4 Chomsky's Classification of Grammars
     1.5 The MicroJava Language

  3. Why should I learn about compilers?
     It's part of the general background of any software engineer
     • How do compilers work?
     • How do computers work? (instruction set, registers, addressing modes, run-time data structures, ...)
     • What machine code is generated for certain language constructs? (efficiency considerations)
     • What is good language design?
     • Opportunity for a non-trivial programming project
     Also useful for general software development
     • Reading syntactically structured command-line arguments
     • Reading structured data (e.g. XML files, part lists, image files, ...)
     • Searching in hierarchical namespaces
     • Interpretation of command codes
     • ...

  5. Dynamic Structure of a Compiler
     character stream: val = 10 * val + i
     ↓ lexical analysis (scanning)
     token stream (token name, token number, token value):
       ident   1  "val"
       assign  3
       number  2  10
       times   4
       ident   1  "val"
       plus    5
       ident   1  "i"
     ↓ syntax analysis (parsing)
     syntax tree:
       Statement  → ident = Expression
       Expression → Term + ident
       Term       → number * ident
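
The scanning step on this slide can be illustrated with a minimal sketch. It assumes the token numbers shown on the slide (ident = 1, number = 2, assign = 3, times = 4, plus = 5); the function name `scan` and the `(token number, token value)` tuple layout are choices made for this illustration only, not part of the course material.

```python
# Token numbers as they appear on the slide.
IDENT, NUMBER, ASSIGN, TIMES, PLUS = 1, 2, 3, 4, 5
SIMPLE = {"=": ASSIGN, "*": TIMES, "+": PLUS}

def scan(text):
    """Turn a character stream into a list of (token number, token value) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            # ident: a letter followed by letters or digits
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1
            tokens.append((IDENT, text[start:i]))
        elif ch.isdigit():
            # number: one or more digits
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append((NUMBER, int(text[start:i])))
        elif ch in SIMPLE:
            # simple tokens carry no value
            tokens.append((SIMPLE[ch], None))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens

tokens = scan("val = 10 * val + i")
```

Running it on the slide's input yields the token numbers 1 3 2 4 1 5 1, with values attached only to the ident and number tokens.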

  6. Dynamic Structure of a Compiler
     syntax tree (Statement, Expression, Term over ident = number * ident + ident)
     ↓ semantic analysis (type checking, ...)
     intermediate representation (syntax tree, symbol table, ...)
     ↓ optimization
     ↓ code generation
     machine code: const 10, load 1, mul, ...

  7. Compiler versus Interpreter
     Compiler: translates to machine code
       source code → scanner → parser → ... → code generator → machine code → loader
     Interpreter: executes source code "directly"
       source code → scanner → parser → interpretation
       • statements in a loop are scanned and parsed again and again
     Variant: interpretation of intermediate code
       source code → ... compiler ... → intermediate code (e.g. Java bytecode) → VM
       • source code is translated into the code of a virtual machine (VM)
       • VM interprets the code, simulating the physical machine
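
As a rough sketch of the "interpretation of intermediate code" variant, the following toy stack machine interprets a short instruction list. The opcodes `const`, `load` and `mul` appear on the code generation slide; `add`, `store`, the memory layout, and the addresses 1 and 2 are assumptions made for this example and are not taken from the course.

```python
def run(code, memory):
    """Interpret a list of (opcode, operand) pairs on a small stack machine."""
    stack = []
    for op, arg in code:
        if op == "const":        # push a constant
            stack.append(arg)
        elif op == "load":       # push the value stored at address arg
            stack.append(memory[arg])
        elif op == "store":      # pop a value into address arg (hypothetical opcode)
            memory[arg] = stack.pop()
        elif op in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory

# val = 10 * val + i, assuming val lives at address 1 and i at address 2.
code = [("const", 10), ("load", 1), ("mul", None),
        ("load", 2), ("add", None), ("store", 1)]
memory = {1: 4, 2: 2}
run(code, memory)
```

The loop in `run` plays the role of the VM: it simulates a machine in software instead of handing the code to real hardware.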

  8. Static Structure of a Compiler
     • parser & sem. analysis: "main program", directs the whole compilation
     • scanner: provides tokens from the source code
     • code generation: generates machine code
     • symbol table: maintains information about declared names and types
     (arrows in the figure: "uses" and "data flow")

  10. What is a grammar?
      Example
        Statement = "if" "(" Condition ")" Statement ["else" Statement].
      Four components
      • terminal symbols: are atomic, e.g. "if", ">=", ident, number, ...
      • nonterminal symbols: are decomposed into smaller units, e.g. Statement, Condition, Type, ...
      • productions: rules how to decompose nonterminals, e.g.
          Statement = Designator "=" Expr ";".
          Designator = ident ["." ident].
          ...
      • start symbol: topmost nonterminal, e.g. Java

  11. EBNF Notation
      Extended Backus-Naur form for writing grammars
      • John Backus: developed the first Fortran compiler
      • Peter Naur: edited the Algol60 report
      Productions
        Statement = "write" ident "," Expression ";" .
        left-hand side: Statement; right-hand side: everything after "="
        "write" is a literal, ident is a terminal symbol, Expression is a nonterminal symbol, and the final "." terminates a production
      By convention
      • terminal symbols start with lower-case letters
      • nonterminal symbols start with upper-case letters
      Metasymbols
        |      separates alternatives   a | b | c  ≡  a or b or c
        (...)  groups alternatives      a (b | c)  ≡  ab | ac
        [...]  optional part            [a] b      ≡  ab | b
        {...}  iterative part           {a} b      ≡  b | ab | aab | aaab | ...

  12. Example: Grammar for Arithmetic Expressions
      Productions
        Expr = ["+" | "-"] Term {("+" | "-") Term}.
        Term = Factor {("*" | "/") Factor}.
        Factor = ident | number | "(" Expr ")".
      Terminal symbols
        simple TS: "+", "-", "*", "/", "(", ")" (just 1 instance)
        terminal classes: ident, number (multiple instances)
      Nonterminal symbols
        Expr, Term, Factor
      Start symbol
        Expr
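
One common way to turn such a grammar into code is a recursive-descent parser with one method per nonterminal. The sketch below follows the three productions above and evaluates the expression while parsing. The class `Parser`, the helper `evaluate`, the `env` dictionary for identifier values, the `"<eof>"` sentinel, and the use of integer division for "/" are all assumptions made for this illustration.

```python
import re

EOF = "<eof>"  # sentinel; cannot be produced by the tokenizer below

def tokenize(text):
    """Split the input into numbers, identifiers and single-character symbols."""
    return re.findall(r"\d+|[A-Za-z]\w*|\S", text) + [EOF]

class Parser:
    def __init__(self, text, env=None):
        self.tokens = tokenize(text)
        self.pos = 0
        self.env = env or {}   # values of identifiers (hypothetical helper)

    @property
    def sym(self):
        return self.tokens[self.pos]

    def next(self):
        self.pos += 1

    def expect(self, expected):
        if self.sym != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.sym!r}")
        self.next()

    def expr(self):
        # Expr = ["+" | "-"] Term {("+" | "-") Term}.
        sign = 1
        if self.sym in ("+", "-"):
            sign = -1 if self.sym == "-" else 1
            self.next()
        value = sign * self.term()
        while self.sym in ("+", "-"):
            op = self.sym
            self.next()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # Term = Factor {("*" | "/") Factor}.
        value = self.factor()
        while self.sym in ("*", "/"):
            op = self.sym
            self.next()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):
        # Factor = ident | number | "(" Expr ")".
        tok = self.sym
        if tok.isdigit():
            self.next()
            return int(tok)
        if tok[0].isalpha():
            self.next()
            return self.env[tok]
        if tok == "(":
            self.next()
            value = self.expr()
            self.expect(")")
            return value
        raise SyntaxError(f"unexpected symbol {tok!r}")

def evaluate(text, env=None):
    parser = Parser(text, env)
    value = parser.expr()
    parser.expect(EOF)
    return value
```

For example, `evaluate("10 * val + i", {"val": 2, "i": 3})` parses the statement's right-hand side from the earlier slide; the optional `[...]` parts become `if` statements and the iterative `{...}` parts become `while` loops.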

  13. Terminal Start Symbols of Nonterminals
      What are the terminal symbols with which a nonterminal can start?
        Expr = ["+" | "-"] Term {("+" | "-") Term}.
        Term = Factor {("*" | "/") Factor}.
        Factor = ident | number | "(" Expr ")".
      First(Factor) = ident, number, "("
      First(Term) = First(Factor) = ident, number, "("
      First(Expr) = "+", "-", First(Term) = "+", "-", ident, number, "("
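
The First sets can also be computed mechanically. The sketch below is one possible approach: it encodes the EBNF productions as nested tuples (an encoding invented for this example) and iterates until the sets stop growing. The names `GRAMMAR` and `first_sets` are hypothetical.

```python
# EBNF encoded as nested tuples (an encoding assumed for this sketch):
#   ("seq", a, b, ...)  sequence      ("alt", a, b, ...)  alternatives
#   ("opt", a)          [a]           ("rep", a)          {a}
# Plain strings are symbols; a string that is a key of GRAMMAR is a nonterminal.
GRAMMAR = {
    "Expr": ("seq", ("opt", ("alt", "+", "-")), "Term",
             ("rep", ("seq", ("alt", "+", "-"), "Term"))),
    "Term": ("seq", "Factor", ("rep", ("seq", ("alt", "*", "/"), "Factor"))),
    "Factor": ("alt", "ident", "number", ("seq", "(", "Expr", ")")),
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    nullable = {nt: False for nt in grammar}

    def visit(node):
        """Return (terminal start symbols, can derive the empty string)."""
        if isinstance(node, str):
            if node in grammar:
                return set(first[node]), nullable[node]
            return {node}, False
        kind, *parts = node
        if kind in ("opt", "rep"):
            return visit(parts[0])[0], True
        result = set()
        if kind == "alt":
            empty = False
            for part in parts:
                f, e = visit(part)
                result |= f
                empty = empty or e
            return result, empty
        for part in parts:  # kind == "seq"
            f, e = visit(part)
            result |= f
            if not e:       # this part cannot vanish, so later parts are hidden
                return result, False
        return result, True

    changed = True
    while changed:          # repeat until nothing grows any more
        changed = False
        for nt, rhs in grammar.items():
            f, e = visit(rhs)
            if not f <= first[nt] or (e and not nullable[nt]):
                first[nt] |= f
                nullable[nt] = nullable[nt] or e
                changed = True
    return first

FIRST = first_sets(GRAMMAR)
```

For the expression grammar this reproduces the three sets listed on the slide.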

  14. Terminal Successors of Nonterminals
      Which terminal symbols can follow a nonterminal in the grammar?
        Expr = ["+" | "-"] Term {("+" | "-") Term}.
        Term = Factor {("*" | "/") Factor}.
        Factor = ident | number | "(" Expr ")".
      Where does Expr occur on the right-hand side of a production? What terminal symbols can follow there?
      Follow(Expr) = ")", eof
      Follow(Term) = "+", "-", Follow(Expr) = "+", "-", ")", eof
      Follow(Factor) = "*", "/", Follow(Term) = "*", "/", "+", "-", ")", eof
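
A possible way to compute the Follow sets is sketched below. It reuses a nested-tuple encoding of the EBNF productions (invented for this example), first computes which terminals each part can start with, and then propagates "what can come after" from right to left through every production. The names `GRAMMAR` and `follow_sets` and the `"eof"` marker string are assumptions of this sketch.

```python
# ("seq", ...) sequence, ("alt", ...) alternatives, ("opt", a) = [a], ("rep", a) = {a}
GRAMMAR = {
    "Expr": ("seq", ("opt", ("alt", "+", "-")), "Term",
             ("rep", ("seq", ("alt", "+", "-"), "Term"))),
    "Term": ("seq", "Factor", ("rep", ("seq", ("alt", "*", "/"), "Factor"))),
    "Factor": ("alt", "ident", "number", ("seq", "(", "Expr", ")")),
}

def follow_sets(grammar, start):
    first = {nt: set() for nt in grammar}
    nullable = {nt: False for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("eof")

    def first_of(node):
        """Return (terminal start symbols, can derive the empty string)."""
        if isinstance(node, str):
            if node in grammar:
                return set(first[node]), nullable[node]
            return {node}, False
        kind, *parts = node
        if kind in ("opt", "rep"):
            return first_of(parts[0])[0], True
        result = set()
        if kind == "alt":
            empty = False
            for part in parts:
                f, e = first_of(part)
                result |= f
                empty = empty or e
            return result, empty
        for part in parts:  # kind == "seq"
            f, e = first_of(part)
            result |= f
            if not e:
                return result, False
        return result, True

    def spread(node, after):
        """Record that the terminals in `after` may follow `node`."""
        if isinstance(node, str):
            if node in grammar and not after <= follow[node]:
                follow[node] |= after
                return True
            return False
        kind, *parts = node
        if kind == "opt":
            return spread(parts[0], after)
        if kind == "rep":   # the body may be followed by another iteration
            return spread(parts[0], after | first_of(parts[0])[0])
        if kind == "alt":
            return any([spread(part, after) for part in parts])
        changed = False
        for part in reversed(parts):  # kind == "seq", walked right to left
            changed |= spread(part, after)
            f, e = first_of(part)
            after = f | after if e else f
        return changed

    changed = True
    while changed:          # First sets and nullability, to a fixed point
        changed = False
        for nt, rhs in grammar.items():
            f, e = first_of(rhs)
            if not f <= first[nt] or (e and not nullable[nt]):
                first[nt] |= f
                nullable[nt] = nullable[nt] or e
                changed = True
    changed = True
    while changed:          # Follow sets, to a fixed point
        changed = False
        for nt, rhs in grammar.items():
            if spread(rhs, set(follow[nt])):
                changed = True
    return follow

FOLLOW = follow_sets(GRAMMAR, "Expr")
```

The result matches the slide: whatever follows Expr also follows Term, and whatever follows Term also follows Factor.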

  15. Strings and Derivations
      String
        A finite sequence of symbols from an alphabet.
        Alphabet: all terminal and nonterminal symbols of a grammar.
        Strings are denoted by Greek letters (α, β, γ, ...)
        e.g.: α = ident + number
              β = - Term + Factor * number
      Empty String
        The string that contains no symbol (denoted by ε).
      Derivation
        α ⇒ β (direct derivation): a nonterminal symbol NTS in α is replaced by the right-hand side of a production of NTS
          e.g.: Term + Factor * Factor ⇒ Term + ident * Factor
        α ⇒* β (indirect derivation): α ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn ⇒ β

  16. Recursion
      A production is recursive if X ⇒* ω1 X ω2
      Can be used to express repetitions and nested structures
      Direct recursion: X ⇒ ω1 X ω2
      • Left recursion: X = b | X a.
          X ⇒ X a ⇒ X a a ⇒ X a a a ⇒ b a a a a a ...
      • Right recursion: X = b | a X.
          X ⇒ a X ⇒ a a X ⇒ a a a X ⇒ ... a a a a a b
      • Central recursion: X = b | "(" X ")".
          X ⇒ (X) ⇒ ((X)) ⇒ (((X))) ⇒ (((... (b)...)))
      Indirect recursion: X ⇒* ω1 X ω2
        Example
          Expr = Term {"+" Term}.
          Term = Factor {"*" Factor}.
          Factor = id | "(" Expr ")".
        Expr ⇒ Term ⇒ Factor ⇒ "(" Expr ")"
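
Right and central recursion map directly onto recursive functions. The two small recognizers below are only a sketch; the function names are invented, and the symbols a, b, "(" and ")" are treated as single characters of a Python string.

```python
def right_recursive(s):
    """Recognize X = b | a X, i.e. any number of a's followed by one b."""
    if s == "b":
        return True
    return s.startswith("a") and right_recursive(s[1:])

def central_recursive(s):
    """Recognize X = b | "(" X ")", i.e. one b wrapped in balanced parentheses."""
    if s == "b":
        return True
    return len(s) >= 2 and s[0] == "(" and s[-1] == ")" and central_recursive(s[1:-1])
```

A left-recursive production cannot be coded this way, because the function for X would call itself before consuming any input.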

  17. How to Remove Left Recursion
      Left recursion cannot be handled in top-down parsing
        X = b | X a.
        Both alternatives start with b. The parser cannot decide which one to choose.
      Left recursion can always be transformed into iteration
        X ⇒ baaaa...a, thus X = b {a}.
      Another example
        E = T | E "+" T.
        What phrases can be derived?
          E ⇒ T
          E ⇒ E + T ⇒ T + T
          E ⇒ E + T ⇒ E + T + T ⇒ T + T + T
          E ⇒ E + T ⇒ E + T + T ⇒ E + T + T + T ⇒ ...
        Thus E = T {"+" T}.
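
The transformation from left recursion to iteration can be sketched as a small function. It assumes a production is given as a list of alternatives, each a non-empty list of symbol strings, and that only direct left recursion occurs; the function name `remove_left_recursion` and this representation are choices made for the example.

```python
def remove_left_recursion(nt, alternatives):
    """Rewrite  nt = beta | nt alpha  as the EBNF iteration  nt = beta {alpha}."""
    # Alternatives that start with nt itself; keep only what follows nt.
    tails = [alt[1:] for alt in alternatives if alt[0] == nt]
    # Alternatives that do not start with nt.
    bases = [alt for alt in alternatives if alt[0] != nt]

    def join(alts):
        return " | ".join(" ".join(alt) for alt in alts)

    if not tails:
        return f"{nt} = {join(alternatives)}."
    if not bases:
        raise ValueError("no non-recursive alternative: nothing can be derived")
    head = f"({join(bases)})" if len(bases) > 1 else join(bases)
    return f"{nt} = {head} {{{join(tails)}}}."
```

For instance, `remove_left_recursion("E", [["T"], ["E", '"+"', "T"]])` gives the production `E = T {"+" T}.` from the slide, which a top-down parser can handle with a simple loop.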

  19. Classification of Grammars
      Due to Noam Chomsky (1956)
      Grammars are sets of productions of the form α = β.
      • class 0: Unrestricted grammars (α and β arbitrary)
          e.g.: X = a X b | Y c Y.
                a Y c = d.
                d Y = b b.
                X ⇒ aXb ⇒ aYcYb ⇒ dYb ⇒ bbb
          Recognized by Turing machines
      • class 1: Context-sensitive grammars (|α| ≤ |β|)
          e.g.: a X = a b c.
          Recognized by linear bounded automata
      • class 2: Context-free grammars (α = NT, β ≠ ε)
          e.g.: X = a b c.
          Recognized by push-down automata
      • class 3: Regular grammars (α = NT, β = T or T NT)
          e.g.: X = b | b Y.
          Recognized by finite automata
      Only these two classes (2 and 3) are relevant in compiler construction

  21. Sample MicroJava Program
        program P                        // main program; no separate compilation
          final int size = 10;
          class Table {                  // classes (without methods)
            int[] pos;
            int[] neg;
          }
          Table val;                     // global variables
        {
          void main()
            int x, i;                    // local variables
          { //---------- initialize val ----------
            val = new Table;
            val.pos = new int[size];
            val.neg = new int[size];
            i = 0;
            while (i < size) {
              val.pos[i] = 0; val.neg[i] = 0; i = i + 1;
            }
            //---------- read values ----------
            read(x);
            while (x != 0) {
              if (x >= 0) val.pos[x] = val.pos[x] + 1;
              else if (x < 0) val.neg[-x] = val.neg[-x] + 1;
              read(x);
            }
          }
        }

  22. Lexical Structure of MicroJava
      Identifiers     ident = letter {letter | digit | '_'}.
      Numbers         number = digit {digit}.
                      all numbers are of type int
      Char constants  charConst = '\'' char '\''.
                      all character constants are of type char (may contain \r, \n, \t)
                      no strings
      Keywords        program class if else while read print return void final new
      Operators       + - * / % == != > >= < <= ( ) [ ] { } = ; , .
      Comments        // ... eol
      Types           int char arrays classes
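
These lexical rules translate fairly directly into regular expressions. The sketch below uses Python's `re` module with named groups; the token kind names, the `scan` function, and the decision to report keywords as a separate kind are assumptions of this example rather than the course's scanner design.

```python
import re

KEYWORDS = {"program", "class", "if", "else", "while", "read", "print",
            "return", "void", "final", "new"}

# Order matters: comments must be tried before the "/" operator,
# and two-character operators before their one-character prefixes.
TOKEN_SPEC = [
    ("comment",   r"//[^\n]*"),
    ("space",     r"\s+"),
    ("ident",     r"[A-Za-z][A-Za-z0-9_]*"),
    ("number",    r"[0-9]+"),
    ("charConst", r"'(?:\\[rnt]|[^'\\\n])'"),
    ("operator",  r"==|!=|>=|<=|[-+*/%><()\[\]{}=;,.]"),
    ("error",     r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Return a list of (kind, text) pairs, skipping white space and comments."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue
        if kind == "error":
            raise SyntaxError(f"illegal character {text!r} at offset {match.start()}")
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens
```

Calling `scan("while (i < size) { i = i + 1; } // loop")` returns the keyword, identifiers, numbers and operators of the loop and drops the trailing comment.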
