
CS502: Compiler Design Syntax Analysis Manas Thakur Fall 2020



  1. CS502: Compiler Design Syntax Analysis Manas Thakur Fall 2020

  2. Where are we?
     Character stream
       → Lexical Analyzer → token stream
       → Syntax Analyzer → syntax tree
       → Semantic Analyzer → syntax tree
       → Intermediate Code Generator → intermediate representation
       → Machine-Independent Code Optimizer → intermediate representation
       → Code Generator → target machine code
       → Machine-Dependent Code Optimizer → target machine code
     The diagram groups the phases into a front end and a back end, and also shows the symbol table.

  3. Roles of Parsing / Syntax analysis
     ● Read the specification given by the language implementor.
     ● Get help from the lexer to collect tokens.
     ● Check if the sequence of tokens matches the specification.
     ● Declare successful program structure or report errors in a useful manner.
     ● Later: Also identify some semantic errors.

  4. Specifying the syntax
     ● Regular expressions are not expressive enough for most syntactic constructs; for example, they cannot describe arbitrarily deep nesting of balanced parentheses.
     ● Syntactic constructs are specified using context-free grammars (CFGs).
     ● The corresponding language is called a context-free language (CFL).
     ● CFGs subsume REs.
       – Then why did we use REs for scanning?
     ● Right tool for the right job!

  5. Context-Free Grammar (CFG)
     A CFG consists of:
     1. A set of terminals called tokens.
        – Terminals are elementary symbols of the parsing language.
     2. A set of non-terminals called variables.
        – A non-terminal represents a set of strings of terminals.
     3. A set of productions.
        – They define the syntactic rules.
     4. A start symbol designated by a non-terminal.
     Example grammar:
       list → list + digit
       list → list - digit
       list → digit
       digit → 0 | 1 | ... | 8 | 9
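The four components above map directly onto a small data structure. The following is a minimal sketch in Python, not something taken from the slides: the class name `Grammar`, its field names, the helper `bodies`, and the constant `LIST_GRAMMAR` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar (illustrative representation only)."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # (head, body) pairs; each body is a tuple of symbols
    start: str

    def bodies(self, head):
        """Return every production body whose head is `head`."""
        return [body for h, body in self.productions if h == head]


DIGITS = tuple(str(d) for d in range(10))

# The list/digit grammar from the slide, one production per alternative.
LIST_GRAMMAR = Grammar(
    terminals=frozenset(DIGITS) | {"+", "-"},
    nonterminals=frozenset({"list", "digit"}),
    productions=(
        ("list", ("list", "+", "digit")),
        ("list", ("list", "-", "digit")),
        ("list", ("digit",)),
    ) + tuple(("digit", (d,)) for d in DIGITS),
    start="list",
)
```

Writing `digit → 0 | 1 | ... | 9` as ten separate productions is a representation choice; the `|` on the slide is shorthand for exactly that.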

  6. Productions
     All of the below are productions (or rules):
       list → list + digit
       list → list - digit
       list → digit
       digit → 0 | 1 | ... | 8 | 9
     In each production, the left side of the arrow is called the head and the right side is called the body.

  7. Derivations
     ● A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the body of a production for that non-terminal.
       list → list + digit
       list → list - digit
       list → digit
       digit → 0 | 1 | ... | 8 | 9
     ● The above grammar derives sentences like
       – 3+1-0+8-2+0+1+5
       – 0
     ● The set of all such strings forms the language specified by the above CFG.

  8. Practice
     ● Write a CFG to generate strings of the form 0^n 1^n (n ≥ 0).
       – S → 0S1
       – S → ε
       – Can also be written as: S → 0S1 | ε
     ● Homework:
       – Write a CFG for strings of the form w c w^R, where w^R is the reverse of w.
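One way to convince yourself that S → 0S1 | ε generates exactly the strings 0^n 1^n is to turn the two productions into a recursive check. This is only a sketch; the function name `matches_s` is a hypothetical name, not part of the course material.

```python
def matches_s(text: str) -> bool:
    """Return True if `text` can be derived from S -> 0S1 | ε."""
    if text == "":
        return True                    # S -> ε
    if len(text) >= 2 and text[0] == "0" and text[-1] == "1":
        return matches_s(text[1:-1])   # S -> 0S1: strip one 0 and one 1
    return False


print(matches_s("000111"))  # True
print(matches_s("0101"))    # False
```

Each recursive call mirrors one application of S → 0S1, and the base case mirrors S → ε.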

  9. Derivations (cont.)
     ● Given a CFG, we can derive strings in the associated CFL by successively replacing the non-terminals based on productions.
     Grammar:
       goal → expr
       expr → expr op expr | num | id
       id → a | b | ... | z
       num → 0 | 1 | ... | 9
       op → + | - | * | /
     Example derivation (x + 2 * y):
       goal → expr
            → expr op expr
            → id op expr
            → x op expr
            → x + expr
            → x + expr op expr
            → x + num op expr
            → x + 2 op expr
            → x + 2 * expr
            → x + 2 * id
            → x + 2 * y

  10. Leftmost derivations
     ● What did we do at each step in the previous derivation?
       – Replaced the leftmost non-terminal.
       – Called a leftmost derivation.
       – expr, expr op expr, id op expr, etc. are the leftmost sentential forms.
     Grammar:
       goal → expr
       expr → expr op expr | num | id
       id → a | b | ... | z
       num → 0 | 1 | ... | 9
       op → + | - | * | /
     Example derivation (x + 2 * y):
       goal → expr
            → expr op expr
            → id op expr
            → x op expr
            → x + expr
            → x + expr op expr
            → x + num op expr
            → x + 2 op expr
            → x + 2 * expr
            → x + 2 * id
            → x + 2 * y

  11. Rightmost derivations
     ● Replace the rightmost non-terminal at each step.
       – Called a rightmost derivation.
       – expr, expr op expr, expr op id, etc. are the rightmost sentential forms.
     Grammar:
       goal → expr
       expr → expr op expr | num | id
       id → a | b | ... | z
       num → 0 | 1 | ... | 9
       op → + | - | * | /
     Example derivation (x + 2 * y):
       goal → expr
            → expr op expr
            → expr op id
            → expr op y
            → expr * y
            → expr op expr * y
            → expr op num * y
            → expr op 2 * y
            → expr + 2 * y
            → id + 2 * y
            → x + 2 * y
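The leftmost and rightmost derivations of x + 2 * y can be replayed mechanically: at each step, find the leftmost (or rightmost) non-terminal in the current sentential form and replace it with a chosen production body. The sketch below assumes a dictionary encoding of the expression grammar; `GRAMMAR`, `derive`, and `LEFT_STEPS` are hypothetical names introduced for illustration.

```python
# The expression grammar: each non-terminal maps to its production bodies.
GRAMMAR = {
    "goal": [["expr"]],
    "expr": [["expr", "op", "expr"], ["num"], ["id"]],
    "id": [[c] for c in "abcdefghijklmnopqrstuvwxyz"],
    "num": [[d] for d in "0123456789"],
    "op": [["+"], ["-"], ["*"], ["/"]],
}


def derive(bodies, leftmost=True):
    """Replay a derivation that starts from `goal`.

    `bodies` lists, in order, the production body applied at each step.
    The symbol rewritten is always the leftmost (or rightmost)
    non-terminal. Returns every sentential form, start symbol first.
    """
    form = ["goal"]
    forms = [" ".join(form)]
    for body in bodies:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        if body not in GRAMMAR[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(body)} is not a production")
        form[i:i + 1] = body           # replace the non-terminal by the body
        forms.append(" ".join(form))
    return forms


# Production bodies used by the leftmost derivation of x + 2 * y.
LEFT_STEPS = [["expr"], ["expr", "op", "expr"], ["id"], ["x"], ["+"],
              ["expr", "op", "expr"], ["num"], ["2"], ["*"], ["id"], ["y"]]

for sentential_form in derive(LEFT_STEPS):
    print(sentential_form)
```

Passing `leftmost=False` with the bodies in the order used by the rightmost derivation reproduces the sentential forms of the rightmost derivation instead.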

  12. Formally
     ● →* denotes a derivation of zero or more steps.
     ● →+ denotes a derivation of one or more steps.
     ● If S →* β, then β is a sentential form of the associated grammar G.
     ● L(G) = {w | S →+ w and w consists only of terminals}; each w ∈ L(G) is called a sentence of G.
     ● The process of discovering a derivation is called parsing.
     ● The output is a parse tree, which we shall see tomorrow.

  13. CS502: Compiler Design Syntax Analysis (Cont.) Manas Thakur Fall 2020

  14. Parse Tree
     ● A pictorial representation of program derivation.
       expr → expr + expr | expr * expr | id
       id → a | b | ... | z
     ● A parse tree for x + y * z:

                 expr
               /  |  \
           expr   +   expr
             |       /  |  \
            id   expr   *   expr
             |     |          |
             x    id         id
                   |          |
                   y          z

  15. Precedence
     ● Another parse tree for x+y*z:

                     expr
                   /  |  \
               expr   *   expr
              / | \         |
          expr  +  expr     id
            |       |       |
           id      id       z
            |       |
            x       y

     ● Operator evaluation in a left-to-right tree walk gives: (x+y)*z
       – Wrong answer!
       – Should have been: x+(y*z)
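To see that the two parse trees really compute different values, each tree can be evaluated bottom-up. This is a small sketch under assumed conventions: trees are nested tuples `(left, op, right)`, leaves are identifier names, and the values bound to x, y, z are arbitrary sample values. The names `evaluate`, `tree_a`, `tree_b`, and `env` are hypothetical.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a parse tree bottom-up.

    A leaf is an identifier name looked up in `env`;
    an interior node is a (left, op, right) tuple.
    """
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


tree_a = ("x", "+", ("y", "*", "z"))   # multiplication nested under addition
tree_b = (("x", "+", "y"), "*", "z")   # addition nested under multiplication

env = {"x": 1, "y": 2, "z": 3}         # arbitrary sample values
print(evaluate(tree_a, env))  # 1 + (2 * 3) = 7
print(evaluate(tree_b, env))  # (1 + 2) * 3 = 9
```

Both trees have the same leaves in the same order, so the difference in the result comes only from the shape of the tree.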

  16. The precedence problem
     ● Our grammar has no notion of precedence or an implied order of evaluation.
     ● Ideally, multiplication should be performed before addition.
     First grammar:
       expr → expr + expr | expr * expr | id
       id → a | b | ... | z
     Second grammar:
       expr → expr + term | term
       term → term * factor | factor
       factor → id
       id → a | b | ... | z
     ● Will the second grammar generate all the strings that could be generated by the first grammar?
     ● Does it solve the problem?

  17. New derivation and parse tree
     Grammar:
       expr → expr + term | term
       term → term * factor | factor
       factor → id
       id → a | b | ... | z
     Rightmost derivation of x + y * z:
       expr → expr + term
            → expr + term * factor
            → expr + term * id
            → expr + term * z
            → expr + factor * z
            → expr + id * z
            → expr + y * z
            → term + y * z
            → factor + y * z
            → id + y * z
            → x + y * z
     Parse tree:

                     expr
                   /  |  \
               expr   +   term
                |        /  |  \
              term    term  *  factor
                |       |         |
             factor  factor      id
                |       |         |
               id      id         z
                |       |
                x       y

     Correct tree walk! The multiplication now sits below the addition, so the walk gives x+(y*z).
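The layered expr/term/factor grammar can be turned into a small hand-written parser with one function per non-terminal. The sketch below makes two assumptions that are not in the slides: it replaces the left-recursive rules with loops (a common way to hand-code such rules while still building left-associative trees), and it restricts identifiers to single lowercase letters. The name `parse_expr` and the tuple-based tree format are hypothetical.

```python
def parse_expr(text):
    """Parse single-letter identifiers joined by + and * into nested tuples."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():                      # factor -> id
        nonlocal pos
        tok = peek()
        if tok is None or not ("a" <= tok <= "z"):
            raise SyntaxError(f"expected an identifier at token {pos}")
        pos += 1
        return tok

    def term():                        # term -> term * factor | factor
        nonlocal pos
        node = factor()
        while peek() == "*":           # loop stands in for the left recursion
            pos += 1
            node = (node, "*", factor())
        return node

    def expr():                        # expr -> expr + term | term
        nonlocal pos
        node = term()
        while peek() == "+":           # loop stands in for the left recursion
            pos += 1
            node = (node, "+", term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expr("x + y * z"))  # ('x', '+', ('y', '*', 'z'))
```

Because `term` is called from inside `expr`, every `*` is grouped before the surrounding `+`, which is the precedence the grammar encodes.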

  18. Ambiguity
     ● रोको मत जाने दो (Hindi; depending on where the pause falls, it reads as either "Stop, don't let go" or "Don't stop, let go")
       – Whether to stop or let go.
     ● Sarah gave a bath to her dog wearing a pink t-shirt.
       – Who was wearing the pink t-shirt?

  19. Ambiguity in grammars
     ● If a grammar has more than one leftmost or rightmost derivation for a single sentential form, then it is ambiguous.
     ● Example (ambiguous grammar!):
       <stmt> → if <expr> then <stmt>
              | if <expr> then <stmt> else <stmt>
              | <other stmts>
     ● Try deriving the sentential form:
       – if E1 then if E2 then S1 else S2
     ● It has two parse trees, because the else can attach to either if:
       – if E1 then (if E2 then S1 else S2)
       – if E1 then (if E2 then S1) else S2
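The ambiguity can be checked by counting parse trees rather than drawing them. The sketch below is a memoized counter written specifically for this if-then-else grammar; it assumes, purely for illustration, that every token starting with `E` is an `<expr>` and every token starting with `S` is one of the `<other stmts>`. The function name `count_parses` is hypothetical.

```python
from functools import lru_cache


def count_parses(text):
    """Count parse trees of `text` under the ambiguous <stmt> grammar."""
    tokens = tuple(text.split())

    @lru_cache(maxsize=None)
    def stmt(i, j):
        """Number of ways tokens[i:j] can be derived from <stmt>."""
        n = 0
        if j - i == 1 and tokens[i].startswith("S"):
            n += 1                                    # <other stmts>
        if (j - i >= 4 and tokens[i] == "if"
                and tokens[i + 1].startswith("E") and tokens[i + 2] == "then"):
            n += stmt(i + 3, j)                       # if <expr> then <stmt>
            for k in range(i + 4, j - 1):             # candidate 'else' position
                if tokens[k] == "else":
                    # if <expr> then <stmt> else <stmt>
                    n += stmt(i + 3, k) * stmt(k + 1, j)
        return n

    return stmt(0, len(tokens))


print(count_parses("if E1 then if E2 then S1 else S2"))  # 2
```

A count above 1 for any input is enough to show that the grammar is ambiguous.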

  20. Resolving ambiguity
     ● Need to re-arrange the grammar.
     ● Match an else with the closest unmatched then:
       <stmt> → <matched>
              | <unmatched>
       <matched> → if <expr> then <matched> else <matched>
                 | <other stmts>
       <unmatched> → if <expr> then <stmt>
                   | if <expr> then <matched> else <unmatched>
     ● Check: if E1 then if E2 then S1 else S2
     ● Not a trivial task, but comes with practice.
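The rule "match an else with the closest unmatched then" can also be seen in a hand-written parser. The sketch below is not derived from the matched/unmatched grammar itself; it is a greedy recursive-descent parser that yields the same binding, because the innermost pending if gets the first chance to consume an else. It assumes whitespace-separated tokens and treats any token other than `if`, `then`, and `else` as an `<expr>` or an `<other stmts>` depending on position. The name `parse_stmt` and the tuple output format are hypothetical.

```python
def parse_stmt(text):
    """Parse if-then-else text, binding each else to the nearest open if."""
    tokens = text.split()
    pos = 0

    def stmt():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok in ("then", "else"):
            raise SyntaxError(f"unexpected {tok!r} at token {pos}")
        if tok != "if":
            pos += 1
            return tok                               # <other stmts>
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected 'if <expr> then'")
        cond = tokens[pos + 1]
        pos += 3
        then_branch = stmt()
        # The innermost if sees the else first, so it claims it.
        if pos < len(tokens) and tokens[pos] == "else":
            pos += 1
            return ("if", cond, then_branch, stmt())
        return ("if", cond, then_branch)

    tree = stmt()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_stmt("if E1 then if E2 then S1 else S2"))
# ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```

For the check sentence, the else ends up inside the inner if, which is the single parse the rearranged grammar allows.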
