
Defining syntax using CFGs



  1. Defining syntax using CFGs

  2. Roadmap
     Last time
     - Defined context-free grammars (CFGs)
     This time
     - CFGs for specifying a language's syntax
       • Language membership
       • List grammars
       • Resolving ambiguity

  3. CFG Review
     • G = (N, Σ, P, S)
     • ⇒⁺ means "derives in 1 or more steps"
     • A CFG generates a string by applying productions until no nonterminals remain
     Example: nested parens
       N = { Q }
       Σ = { ( , ) }
       P = Q → ( Q )
             | ε
       S = Q
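
The "apply productions until no nonterminals remain" step can be sketched in a few lines of Python. This is only an illustration for the nested-parens grammar; the names `PRODUCTIONS` and `derive` are made up for this sketch and do not come from the slides.

```python
# Nested-parens grammar from the slide: Q -> ( Q ) | epsilon
# (PRODUCTIONS and derive are illustrative names, not from the slides.)
PRODUCTIONS = {"Q": [["(", "Q", ")"], []]}

def derive(depth):
    """Apply Q -> ( Q ) `depth` times, then Q -> epsilon.

    Returns every sentential form along the way as a string.
    """
    form = ["Q"]
    steps = ["Q"]
    for i in range(depth + 1):
        rhs = PRODUCTIONS["Q"][0] if i < depth else PRODUCTIONS["Q"][1]
        pos = form.index("Q")                    # the only nonterminal in the form
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(" ".join(form) if form else "(empty string)")
    return steps

print(" => ".join(derive(2)))
# Q => ( Q ) => ( ( Q ) ) => ( ( ) )
```

The derivation stops exactly when the sentential form contains no `Q`, which is the point at which a terminal-only string has been generated.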

  4. Formal Definition of a CFG's Language
     Let G = (N, Σ, P, S) be a CFG. Then
       L(G) = { w | S ⇒⁺ w }
     where S is the start nonterminal of G, and w is a sequence that consists of (only) terminal symbols, or ε.
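
Language membership can be explored by enumerating L(G) up to a length bound. The sketch below does this for the nested-parens grammar by searching over sentential forms. The function name `language_up_to` and the length bound are assumptions of this sketch. It terminates here because every growing step adds terminals; grammars that can grow without adding terminals would need an extra bound.

```python
from collections import deque

# Nested-parens grammar from the CFG review: Q -> ( Q ) | epsilon
PRODUCTIONS = {"Q": [["(", "Q", ")"], []]}

def language_up_to(productions, start, max_len):
    """Collect every terminal string w with start =>+ w and len(w) <= max_len."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        nonterminal_positions = [i for i, sym in enumerate(form) if sym in productions]
        if not nonterminal_positions:
            found.add("".join(form))             # only terminals left: a word of L(G)
            continue
        i = nonterminal_positions[0]             # expand the leftmost nonterminal
        for rhs in productions[form[i]]:
            new_form = form[:i] + tuple(rhs) + form[i + 1:]
            terminal_count = sum(1 for sym in new_form if sym not in productions)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found

words = language_up_to(PRODUCTIONS, "Q", 4)
print(sorted(words, key=len))   # ['', '()', '(())']
print("(())" in words)          # True
print("(()" in words)           # False
```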

  5. A CFG Defines a Language
     CFG productions define the syntax of a language:
       1. Prog  → begin Stmts end
       2. Stmts → Stmts semicolon Stmt
       3.       | Stmt
       4. Stmt  → id assign Expr
       5. Expr  → id
       6.       | Expr plus id
     We call this notation "BNF" (for "Backus-Naur Form") or "extended BNF".
     HTTP grammar using BNF: http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html

  6. List Grammars
     • Useful to repeat a structure arbitrarily often
       Stmts → Stmts semicolon Stmt
             | Stmt
     [Parse tree: each Stmts node expands to Stmts ; Stmt, so the recursion runs down the left edge. The list skews left.]

  7. List Grammars
     • Useful to repeat a structure arbitrarily often
       Stmts → Stmt semicolon Stmts
             | Stmt
     [Parse tree: each Stmts node expands to Stmt ; Stmts, so the recursion runs down the right edge. The list skews right.]
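
The two skews can be made concrete by building the parse trees as nested tuples. This is a rough sketch: the tuple encoding and the function names `left_skew` and `right_skew` are assumptions for illustration only.

```python
def left_skew(stmts):
    """Parse tree shape for Stmts -> Stmts ; Stmt | Stmt."""
    tree = ("Stmts", stmts[0])
    for stmt in stmts[1:]:
        tree = ("Stmts", tree, ";", stmt)      # the old tree becomes the LEFT child
    return tree

def right_skew(stmts):
    """Parse tree shape for Stmts -> Stmt ; Stmts | Stmt."""
    tree = ("Stmts", stmts[-1])
    for stmt in reversed(stmts[:-1]):
        tree = ("Stmts", stmt, ";", tree)      # the old tree becomes the RIGHT child
    return tree

print(left_skew(["s1", "s2", "s3"]))
# ('Stmts', ('Stmts', ('Stmts', 's1'), ';', 's2'), ';', 's3')
print(right_skew(["s1", "s2", "s3"]))
# ('Stmts', 's1', ';', ('Stmts', 's2', ';', ('Stmts', 's3')))
```

Both grammars generate the same statement lists; only the nesting of the tree differs.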

  8. List Grammars
     • What if we allowed both "skews"?
       Stmts → Stmts semicolon Stmts
             | Stmt
     [Parse tree: Stmts nodes expand to Stmts ; Stmts on both the left and the right.]
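
One way to see the effect of allowing both skews is to count the parse trees for a list of n statements under Stmts → Stmts semicolon Stmts | Stmt. The sketch below assumes that each tree is determined by which semicolon sits at the root; `count_trees` is a hypothetical helper name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n):
    """Number of parse trees for a list of n statements under
    Stmts -> Stmts semicolon Stmts | Stmt."""
    if n == 1:
        return 1                               # Stmts -> Stmt
    # Pick the semicolon at the root: k statements on its left, n - k on its right.
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))

print([count_trees(n) for n in range(1, 6)])   # [1, 1, 2, 5, 14]
```

With three or more statements the count exceeds one, so this grammar no longer assigns a unique tree to each list.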

  9. Derivation Order
     • Leftmost derivation: always expand the leftmost nonterminal
     • Rightmost derivation: always expand the rightmost nonterminal
       1. Prog  → begin Stmts end
       2. Stmts → Stmts semicolon Stmt
       3.       | Stmt
       4. Stmt  → id assign Expr
       5. Expr  → id
       6.       | Expr plus id
     Example:
       Prog
       ⇒ begin Stmts end
       ⇒ begin Stmts semicolon Stmt end
     In the last line, a leftmost derivation expands Stmts next, and a rightmost derivation expands Stmt next.
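
The difference between the two orders can be traced mechanically. The sketch below replays a sequence of production numbers (1 to 6, as numbered on the slide) against either the leftmost or the rightmost nonterminal. The `RULES` table and the `derive` function are illustrative names, and the two production sequences were worked out by hand for this example.

```python
# Productions numbered as on the slide.
RULES = {
    1: ("Prog", ["begin", "Stmts", "end"]),
    2: ("Stmts", ["Stmts", "semicolon", "Stmt"]),
    3: ("Stmts", ["Stmt"]),
    4: ("Stmt", ["id", "assign", "Expr"]),
    5: ("Expr", ["id"]),
    6: ("Expr", ["Expr", "plus", "id"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}

def derive(choices, leftmost=True):
    """Apply the numbered productions in order, always to the leftmost
    (or rightmost) nonterminal. Returns the list of sentential forms."""
    form = ["Prog"]
    forms = [form]
    for number in choices:
        lhs, rhs = RULES[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule {number} does not apply to {form[i]}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

left = derive([1, 2, 3, 4, 5, 4, 5], leftmost=True)
right = derive([1, 2, 4, 5, 3, 4, 5], leftmost=False)
for form in left:
    print(" ".join(form))
print(left[-1] == right[-1])   # True: same string, different order of expansion
```

Both derivations produce `begin id assign id semicolon id assign id end` and correspond to the same parse tree; only the order in which nonterminals are expanded differs.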

  10. Ambiguity
      Even with a fixed derivation order, it is possible to derive the same string in multiple ways.
      For a grammar G and a string w, G is ambiguous if there is
      • more than one leftmost derivation of w,
      • more than one rightmost derivation of w, or
      • more than one parse tree for w
      (These three conditions are equivalent.)

  11. Exercise • Give a grammar G and a word w that has more than 1 left-most derivation in G

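  12. Example: Ambiguous Grammars
      Expr → intlit
           | Expr minus Expr
           | Expr times Expr
           | lparen Expr rparen
      Derive the string 4 - 7 * 3 (assume tokenization).
      Parse Tree 1: Expr → Expr times Expr, where the left Expr → Expr minus Expr covers 4 and 7 and the right Expr is intlit 3. This groups the string as (4 - 7) * 3.
      Parse Tree 2: Expr → Expr minus Expr, where the left Expr is intlit 4 and the right Expr → Expr times Expr covers 7 and 3. This groups the string as 4 - (7 * 3).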

  13. Why is Ambiguity Bad?

  14. Why is Ambiguity Bad?
      Eventually, we'll be using CFGs as the basis for our parser
      - Parsing is much easier when there is no ambiguity in the grammar
      - The parse tree may mismatch user understanding!
      Operator precedence: 4 - 7 * 3
      [Parse trees: the same two trees as in the previous example, one with times at the root, grouping (4 - 7) * 3, and one with minus at the root, grouping 4 - (7 * 3).]
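
To see why the mismatch matters, the sketch below enumerates every parse tree of 4 - 7 * 3 under the ambiguous grammar and evaluates each one. It assumes a pre-tokenized input without parentheses and encodes trees as nested tuples; `parses` and `evaluate` are names invented for this illustration.

```python
def parses(tokens):
    """All parse trees for a token list under
    Expr -> intlit | Expr minus Expr | Expr times Expr
    (the parenthesized alternative is left out of this sketch)."""
    if len(tokens) == 1:
        return [tokens[0]]                       # Expr -> intlit
    trees = []
    for i in range(1, len(tokens), 2):           # operators sit at odd positions
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees

def evaluate(tree):
    """Compute the value that a parse tree denotes."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a * b

for tree in parses([4, "-", 7, "*", 3]):
    print(tree, "=", evaluate(tree))
# (4, '-', (7, '*', 3)) = -17
# ((4, '-', 7), '*', 3) = -9
```

The same string yields two different values depending on the tree, so a parser built on the ambiguous grammar could compute a result the user did not intend.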

  15. Resolving Grammar Ambiguity: Precedence
      Expr → intlit
           | Expr minus Expr
           | Expr times Expr
           | lparen Expr rparen
      Intuitive problem
      - Nonterminals are the same for both operators
      To fix precedence
      - 1 nonterminal per precedence level
      - Parse lowest level first

  16. Resolving Grammar Ambiguity: Precedence
      - Lowest precedence level first
      - 1 nonterminal per precedence level
      Original grammar:
        Expr → intlit
             | Expr minus Expr
             | Expr times Expr
             | lparen Expr rparen
      Rewritten grammar:
        Expr   → Expr minus Expr
               | Term
        Term   → Term times Term
               | Factor
        Factor → intlit
               | lparen Expr rparen
      Derive the string 4 - 7 * 3.
      [Parse tree: Expr → Expr minus Expr. The left Expr derives Term → Factor → intlit 4. The right Expr derives Term, which expands to Term times Term, whose operands derive Factor → intlit 7 and Factor → intlit 3.]

  17. Resolving Grammar Ambiguity: Precedence
      Fixed grammar:
        Expr   → Expr minus Expr
               | Term
        Term   → Term times Term
               | Factor
        Factor → intlit
               | lparen Expr rparen
      Derive the string 4 - 7 * 3.
      Let's try to re-build the wrong parse tree: Expr → Term → Term times Term, with the right Term deriving Factor → intlit 3, would need the left Term to derive 4 - 7.
      We'll never be able to derive minus from Term without parens.

  18. Did we fix all ambiguity?
      Fixed grammar:
        Expr   → Expr minus Expr
               | Term
        Term   → Term times Term
               | Factor
        Factor → intlit
               | lparen Expr rparen
      Derive the string 4 - 7 - 3.
      [Parse tree: Expr → Expr minus Expr, where one operand Expr again expands to Expr minus Expr and every remaining Expr derives Term → Factor → intlit.]
      NO! These subtrees could have been swapped!

  19. Where we are so far
      Precedence
      - We want correct behavior on 4 - 7 * 9
      - A new nonterminal for each precedence level
      Associativity
      - We want correct behavior on 4 - 7 - 9
      - Minus should be left associative: a - b - c = (a - b) - c
      - Problem: the recursion in a rule like Expr → Expr minus Expr

  20. Definition: Recursion in Grammars
      • A grammar is recursive in (nonterminal) X if X ⇒⁺ αXγ for non-empty strings of symbols α and γ
      • A grammar is left-recursive in X if X ⇒⁺ Xγ for a non-empty string of symbols γ
      • A grammar is right-recursive in X if X ⇒⁺ αX for a non-empty string of symbols α
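
The definitions above allow recursion through any number of derivation steps. As a simplified illustration, the sketch below checks only for immediate (one-step) left and right recursion, meaning productions of the form X → Xγ or X → αX. Detecting indirect recursion would need more work and is not attempted here. The function name `immediate_recursion` and the dictionary encoding of the grammars are assumptions of this sketch.

```python
def immediate_recursion(productions):
    """For each nonterminal X, report whether some production has the
    form X -> X gamma ("left") or X -> alpha X ("right"), with gamma
    and alpha non-empty. Only one-step recursion is detected."""
    report = {}
    for lhs, alternatives in productions.items():
        kinds = set()
        for rhs in alternatives:
            if len(rhs) > 1 and rhs[0] == lhs:
                kinds.add("left")
            if len(rhs) > 1 and rhs[-1] == lhs:
                kinds.add("right")
        report[lhs] = kinds
    return report

# A rule like Expr -> Expr minus Expr recurses on both sides.
both_sides = {"Expr": [["Expr", "minus", "Expr"], ["Term"]]}
# Replacing the right operand with Term leaves only left recursion.
left_only = {"Expr": [["Expr", "minus", "Term"], ["Term"]]}

print(immediate_recursion(both_sides))  # Expr is both left- and right-recursive
print(immediate_recursion(left_only))   # {'Expr': {'left'}}
```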

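  21. Resolving Grammar Ambiguity: Associativity
      Recognize left-assoc operators with left-recursive productions
      Recognize right-assoc operators with right-recursive productions
        Expr   → Expr minus Expr     (right operand Expr becomes Term)
               | Term
        Term   → Term times Term     (right operand Term becomes Factor)
               | Factor
        Factor → intlit
               | lparen Expr rparen
      Example: 4 - 7 - 9
      [Parse tree, with E = Expr, T = Term, F = Factor: E → E - T, where the inner E → E - T derives 4 - 7 and the outer T derives F → intlit 9. This groups the string as (4 - 7) - 9.]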

  22. Resolving Grammar Ambiguity: Associativity
        Expr   → Expr minus Term
               | Term
        Term   → Term times Factor
               | Factor
        Factor → intlit
               | lparen Expr rparen
      Example: 4 - 7 - 9
      Let's try to re-build the wrong parse tree again (E = Expr, T = Term, F = Factor): E → E - T, with the left E deriving T → F → intlit 4, would need the right T to derive 7 - 9.
      We'll never be able to derive minus from T without parens.
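
A small parser for this final grammar shows precedence and associativity working together. Note the swapped-in technique: a naive recursive-descent function for Expr → Expr minus Term would call itself forever, so this sketch replaces each left-recursive rule with a loop that builds the same left-leaning tree. That substitution, the regex tokenizer, and all function names are assumptions of this sketch rather than anything from the slides.

```python
import re

def tokenize(text):
    # Integer literals, minus, times, and parentheses; other characters are skipped.
    return re.findall(r"\d+|[-*()]", text)

def parse(text):
    """Parse text into a nested-tuple tree following
    Expr -> Expr minus Term | Term
    Term -> Term times Factor | Factor
    Factor -> intlit | lparen Expr rparen
    with left recursion implemented as iteration."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        tree = term()
        while peek() == "-":
            advance()
            tree = (tree, "-", term())     # the tree so far becomes the left operand
        return tree

    def term():
        tree = factor()
        while peek() == "*":
            advance()
            tree = (tree, "*", factor())
        return tree

    def factor():
        token = peek()
        if token == "(":
            advance()
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if token is not None and token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a * b

print(parse("4 - 7 * 3"), evaluate(parse("4 - 7 * 3")))   # (4, '-', (7, '*', 3)) -17
print(parse("4 - 7 - 9"), evaluate(parse("4 - 7 - 9")))   # ((4, '-', 7), '-', 9) -12
```

Each input now has exactly one tree: times binds tighter than minus, and repeated minus groups to the left. Grouping the other way requires explicit parentheses, as in `(4 - 7) * 3`.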

  23. Example
      • Language of Boolean expressions
          bexp → TRUE
          bexp → FALSE
          bexp → bexp OR bexp
          bexp → bexp AND bexp
          bexp → NOT bexp
          bexp → LPAREN bexp RPAREN
      • Add nonterminals so that OR has lowest precedence, then AND, then NOT. Then change the grammar to reflect the fact that both AND and OR are left associative.
      • Draw a parse tree for the expression: true AND NOT true

  24. Another ambiguous example
      Stmt → if Cond then Stmt
           | if Cond then Stmt else Stmt
           | …
      Consider this word in this grammar:
        if a then if b then s else s2
      How would you derive it?
      [Parse tree sketches: one labeled "if then" and one labeled "if then else".]
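
The ambiguity in this word can be checked by enumerating every way to read it as a Stmt. The sketch below makes two assumptions that the slide does not state: the elided alternatives are stood in for by single-token statements such as `s` and `s2`, and a Cond is a single token. The name `parse_stmt` is illustrative.

```python
KEYWORDS = {"if", "then", "else"}

def parse_stmt(tokens, pos=0):
    """Yield (tree, next_pos) for every way to read a Stmt starting at pos.

    Assumption: any non-keyword token is a complete simple statement,
    and a Cond is exactly one token."""
    if pos >= len(tokens):
        return
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            return
        cond = tokens[pos + 1]
        for body, after_body in parse_stmt(tokens, pos + 3):
            # Stmt -> if Cond then Stmt
            yield ("if", cond, body), after_body
            # Stmt -> if Cond then Stmt else Stmt
            if after_body < len(tokens) and tokens[after_body] == "else":
                for alt, end in parse_stmt(tokens, after_body + 1):
                    yield ("if", cond, body, alt), end
    elif tokens[pos] not in KEYWORDS:
        yield tokens[pos], pos + 1

tokens = "if a then if b then s else s2".split()
complete = [tree for tree, end in parse_stmt(tokens) if end == len(tokens)]
for tree in complete:
    print(tree)
# ('if', 'a', ('if', 'b', 's'), 's2')
# ('if', 'a', ('if', 'b', 's', 's2'))
```

Two complete trees come out: in the first the else belongs to the outer if, in the second to the inner if.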

  25. Summary
      To understand how a parser works, we start by understanding context-free grammars, which are used to define the language recognized by the parser.
      - (non)terminal symbol
      - grammar rule (or production)
      - derivation (leftmost derivation, rightmost derivation)
      - parse (or derivation) tree
      - the language defined by a grammar
      - ambiguous grammar
