
Compiler Construction Lecture 6: Top-down parsing and LL(1) parser construction



  1. Compiler Construction Lecture 6: Top-down parsing and LL(1) parser construction 2020-01-24 Michael Engel Includes material by Jan Christian Meyer

  2. Overview
     • Ambiguity of grammars revisited
     • Elimination of left recursion
     • Top-down parsing
     • Recursive descent parsers: structure and implementation
     • Table-driven LL(1) parsers
     • Table generation
     Compiler Construction 06: Top-down, LL(1) parsing

  3. Ambiguity of grammars (Syntax analysis)
     • For the compiler, it is important that each sentence in the language defined by a context-free grammar has a unique rightmost (or leftmost) derivation
     • A grammar in which multiple rightmost (or leftmost) derivations exist for a sentence is called an ambiguous grammar
     • It can produce multiple derivations and multiple parse trees
     • Multiple parse trees imply multiple possible meanings for a single program!

  4. Ambiguity of grammars: example
     The "dangling else" problem in ALGOL-like languages (e.g. PASCAL): the "else" part is optional.
       1  Statement → if Expr then Statement else Statement
       2             | if Expr then Statement
       3             | Assignment
       4             | …other statements…
     This statement
       if Expr1 then if Expr2 then Assignment1 else Assignment2
     has two distinct rightmost derivations with different behaviors (parentheses only mark the grouping):
     • Rule 1 at the root, rule 2 below it: the else belongs to the outer if
         if Expr1 then (if Expr2 then Assignment1) else Assignment2
     • Rule 2 at the root, rule 1 below it: the else belongs to the inner if
         if Expr1 then (if Expr2 then Assignment1 else Assignment2)

  5. Removing ambiguity
     We can modify the grammar to include a rule that determines which if controls an else:
       1  Statement → if Expr then Statement
       2             | if Expr then WithElse else Statement
       3             | Assignment
       4  WithElse  → if Expr then WithElse else WithElse
       5             | Assignment
     This solution restricts the set of statements that can occur in the then part of an if-then-else construct
     • It accepts the same set of sentences as the original grammar
     • but ensures that each else has an unambiguous match to a specific if

  6. Removing ambiguity: example
     The modified grammar
       1  Statement → if Expr then Statement
       2             | if Expr then WithElse else Statement
       3             | Assignment
       4  WithElse  → if Expr then WithElse else WithElse
       5             | Assignment
     has only one rightmost derivation for the example
       if Expr1 then if Expr2 then Assignment1 else Assignment2

       Rule  Sentential form
             Statement
       1     if Expr then Statement
       2     if Expr then if Expr then WithElse else Statement
       3     if Expr then if Expr then WithElse else Assignment
       5     if Expr then if Expr then Assignment else Assignment
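The claim that the original grammar yields two parse trees for the example while the modified grammar yields one can be checked mechanically. The following is a minimal sketch, not part of the lecture: it counts parse trees by trying every way to split the token sequence among the symbols of a production. The names `AMBIGUOUS`, `MODIFIED`, and `count_parse_trees` are hypothetical, `E` and `A` stand in for Expr and Assignment as single tokens, and the approach assumes a grammar without empty productions and without cycles of unit productions.

```python
from functools import lru_cache

# Token-level versions of the two grammars. Assumption: "E" and "A" stand in
# for Expr and Assignment, which are treated as single tokens here.
AMBIGUOUS = {
    "Statement": (
        ("if", "E", "then", "Statement", "else", "Statement"),
        ("if", "E", "then", "Statement"),
        ("A",),
    ),
}
MODIFIED = {
    "Statement": (
        ("if", "E", "then", "Statement"),
        ("if", "E", "then", "WithElse", "else", "Statement"),
        ("A",),
    ),
    "WithElse": (
        ("if", "E", "then", "WithElse", "else", "WithElse"),
        ("A",),
    ),
}


def count_parse_trees(grammar, start, tokens):
    """Count the distinct parse trees with which `start` derives `tokens`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` derives exactly tokens[i:j].
        if symbol not in grammar:  # terminal: must match one token
            return int(j == i + 1 and tokens[i] == symbol)
        return sum(match(body, i, j) for body in grammar[symbol])

    @lru_cache(maxsize=None)
    def match(body, i, j):
        # Number of ways the symbol sequence `body` derives tokens[i:j].
        if not body:
            return int(i == j)
        head, rest = body[0], body[1:]
        total = 0
        # Every remaining symbol needs at least one token.
        for k in range(i + 1, j - len(rest) + 1):
            ways = derive(head, i, k)
            if ways:
                total += ways * match(rest, k, j)
        return total

    return derive(start, 0, len(tokens))


example = "if E then if E then A else A".split()
print(count_parse_trees(AMBIGUOUS, "Statement", example))
print(count_parse_trees(MODIFIED, "Statement", example))
```

Counting trees is equivalent to counting rightmost (or leftmost) derivations, since each parse tree corresponds to exactly one of each.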

  7. Order of derivations
       1  Expr → "(" Expr ")"
       2       | Expr Op name
       3       | name
       4  Op   → +
       5       | -
       6       | ×
       7       | ÷
     Rightmost: rewrite, at each step, the rightmost nonterminal

       Rule  Sentential form
             Expr
       2     Expr Op name
       6     Expr × name
       1     "(" Expr ")" × name
       2     "(" Expr Op name ")" × name
       4     "(" Expr + name ")" × name
       3     "(" name + name ")" × name

     Leftmost: rewrite, at each step, the leftmost nonterminal

       Rule  Sentential form
             Expr
       2     Expr Op name
       1     "(" Expr ")" Op name
       2     "(" Expr Op name ")" Op name
       3     "(" name Op name ")" Op name
       4     "(" name + name ")" Op name
       6     "(" name + name ")" × name

     The parse tree is identical for both!
     [Figure: parse tree for "(" name(a) + name(b) ")" × name(c)]

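The two derivation orders for the expression grammar can be replayed with a short script. This is an illustrative sketch only: the rule numbers follow the slide, while `GRAMMAR`, `NONTERMINALS`, and `derive` are names chosen here. Given a list of rule numbers, `derive` rewrites the leftmost or the rightmost nonterminal at each step and records every sentential form.

```python
# Rule number -> (head, body), numbered as on the slide.
GRAMMAR = {
    1: ("Expr", ["(", "Expr", ")"]),
    2: ("Expr", ["Expr", "Op", "name"]),
    3: ("Expr", ["name"]),
    4: ("Op", ["+"]),
    5: ("Op", ["-"]),
    6: ("Op", ["×"]),
    7: ("Op", ["÷"]),
}
NONTERMINALS = {"Expr", "Op"}


def derive(rule_numbers, leftmost):
    """Apply the rules in order, always rewriting the leftmost (or rightmost)
    nonterminal, and return the list of sentential forms."""
    form = ["Expr"]
    steps = [" ".join(form)]
    for number in rule_numbers:
        head, body = GRAMMAR[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != head:
            raise ValueError(f"rule {number} does not apply to {form[pos]}")
        form[pos:pos + 1] = body  # replace the nonterminal by the rule body
        steps.append(" ".join(form))
    return steps


rightmost_steps = derive([2, 6, 1, 2, 4, 3], leftmost=False)
leftmost_steps = derive([2, 1, 2, 3, 4, 6], leftmost=True)
for step in rightmost_steps:
    print(step)
```

Both rule sequences end in the same sentence; only the order in which the nonterminals are expanded differs.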
  8. Left factoring
     • Parsers (and scanners) only have a limited lookahead to upcoming tokens
     • Example: given a production
         A → a b c d e f X g h | a b c d e f Y g h
       the parser is unable to choose between the two productions if it can only look one character ahead
     • As with NFA → DFA conversion, it works if we can postpone the decision until it makes a difference
     • Rewriting the grammar as
         A  → a b c d e f A’
         A’ → X g h | Y g h
       preserves the language by adding one production to collect a common prefix shared by several other productions
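Left factoring can be sketched as a small grammar transformation. The function below is an assumption-laden illustration, not the lecture's algorithm: `left_factor` and `longest_common_prefix` are hypothetical names, alternatives are lists of symbols, and only a single pass is made (the newly introduced nonterminal is not factored again). Alternatives are grouped by their first symbol, and each group with more than one member has its longest common prefix pulled out into a new primed nonterminal.

```python
def longest_common_prefix(sequences):
    """Longest prefix shared by all symbol sequences."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(head, alternatives):
    """One pass of left factoring for the alternatives of `head`.

    Returns a dict mapping each nonterminal to its list of alternatives.
    An empty list as an alternative denotes the empty string.
    """
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None
        groups.setdefault(key, []).append(alt)

    result = {head: []}
    new_name = head
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)  # nothing to factor
            continue
        prefix = longest_common_prefix(group)
        new_name += "'"  # A', A'', ... for successive groups
        result[head].append(prefix + [new_name])
        result[new_name] = [alt[len(prefix):] for alt in group]
    return result


factored = left_factor("A", [list("abcdefXgh"), list("abcdefYgh")])
for nonterminal, alts in factored.items():
    print(nonterminal, "→", " | ".join(" ".join(alt) for alt in alts))
```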

  9. Left recursion
     • Let’s consider this grammar for a list of 'a's:
         A → A a | a
       which derives the following words:
         A → a
         A → A a → aa
         A → A a → A aa → aaa
         …
     • The production A → A a is left recursive: the head (the nonterminal A) appears as the leftmost symbol on the right-hand side of the production

  10. An equivalent grammar
     • The same sequences can be generated by this grammar:
         A  → a A’
         A’ → a A’ | ε
       Here ε is the empty string; choosing it returns from the A’ production and ends the recursion
     • It derives the following words:
         A → a A’ → a
         A → a A’ → aa A’ → aa
         A → a A’ → aa A’ → aaa A’ → aaa
         …
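One way to see why the right-recursive form suits a top-down parser is to write one function per nonterminal. The sketch below is illustrative only (the function names are chosen here). Each function consumes an 'a' before it recurses, so the recursion always makes progress; a direct transcription of A → A a would instead call itself first, without consuming any input, and never terminate.

```python
def parse_a_prime(text, pos):
    """A' → a A' | empty string: consume further 'a's while they are present."""
    if pos < len(text) and text[pos] == "a":
        return parse_a_prime(text, pos + 1)
    return pos  # empty alternative: return without consuming input


def parse_a(text, pos=0):
    """A → a A': one 'a' is mandatory, then A' handles the rest."""
    if pos < len(text) and text[pos] == "a":
        return parse_a_prime(text, pos + 1)
    raise ValueError(f"expected 'a' at position {pos}")


def accepts_a_list(text):
    """True if the whole text is a non-empty sequence of 'a's."""
    try:
        return parse_a(text) == len(text)
    except ValueError:
        return False


print(accepts_a_list("aaa"), accepts_a_list(""), accepts_a_list("ab"))
```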

  11. Eliminating left recursion
     • If a nonterminal has m productions that are left recursive and n productions that are not
         A → A 𝛽1 | A 𝛽2 | A 𝛽3 | … | A 𝛽m
         A → 𝛾1 | 𝛾2 | 𝛾3 | … | 𝛾n
       (Greek letters, except ε for the empty string, stand for arbitrary combinations of other (non-)terminals)
       we can introduce A’ and rewrite the productions as (see [1]):
         A  → 𝛾1 A’ | 𝛾2 A’ | 𝛾3 A’ | … | 𝛾n A’
         A’ → 𝛽1 A’ | 𝛽2 A’ | 𝛽3 A’ | … | 𝛽m A’ | ε
     • This generates the same language and removes (immediate) left recursion
     • “Immediate” because left recursion can also happen in several steps (indirectly), e.g. the productions
         A → B x and B → A y
       result in
         A → B x → A y x
       Here, A again shows up on the left when derived from A
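The rewriting rule for immediate left recursion translates almost line by line into code. This is a minimal sketch under stated assumptions: the function name is hypothetical, productions are lists of symbols, the empty list stands for the empty string, and indirect left recursion is not handled.

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Rewrite A → A b1 | … | A bm | g1 | … | gn as
    A → g1 A' | … | gn A'  and  A' → b1 A' | … | bm A' | empty string."""
    new_head = head + "'"
    # The "beta" parts: what follows the head in left-recursive alternatives.
    betas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # The "gamma" parts: alternatives that do not start with the head.
    gammas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not betas:
        return {head: alternatives}  # nothing to do
    return {
        head: [gamma + [new_head] for gamma in gammas],
        new_head: [beta + [new_head] for beta in betas] + [[]],
    }


print(eliminate_immediate_left_recursion("A", [["A", "a"], ["a"]]))
```

Applied to A → A a | a, this reproduces the grammar A → a A’ with A’ → a A’ or the empty string.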

  12. What can we do with CFGs now?
     • So far, we have encountered (see also [2])
       • Context-Free Grammars, their derivations and syntax trees
       • Ambiguous grammars, and mentioned that there’s no single, true way to disambiguate them (it depends on what we want them to stand for)
       • Left factoring, which always shortens the distance to the next nonterminal
       • Left recursion elimination, which always shifts a nonterminal to the right

  13. Recursive descent parsing
     • Example: grammar that models "if" and "while" statements:
         P → if COND then STATEMENT end
           | if COND then STATEMENT else STATEMENT end
           | while COND do STATEMENT end
     • Let’s make it a bit simpler:
         P → i C t S z | i C t S e S z | w C d S z
         C → c
         S → s
     • Let us parse the string "ictsesz"
     • A top-down parser begins at the start symbol P and chooses a production: P → ???

  14. Recursive descent: what next?
     • If we can only look ahead by one token and read an "i", we can choose between two productions:
         P → i C t S z
           | i C t S e S z
     • We cannot make this choice before seeing more of the token stream
     • Left factoring makes this problem decidable with only one character of lookahead
     • It generates the following grammar:
         P  → i C t S P’ | w C d S z
         P’ → z | e S z
         C  → c
         S  → s
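Whether one token of lookahead suffices can be checked by grouping each nonterminal's alternatives by their first token. The sketch below is a simplification that only works because every alternative in these two grammars starts with a terminal; `first_token_conflicts` is a hypothetical helper, alternatives are written as strings of one-character symbols, and `Q` is a stand-in for P’.

```python
def first_token_conflicts(grammar):
    """Return {(head, terminal): alternatives} for every lookahead token that
    selects more than one alternative of the same nonterminal.

    Assumption: every alternative starts with a terminal symbol.
    """
    conflicts = {}
    for head, alternatives in grammar.items():
        by_first = {}
        for alt in alternatives:
            by_first.setdefault(alt[0], []).append(alt)
        for terminal, alts in by_first.items():
            if len(alts) > 1:
                conflicts[(head, terminal)] = alts
    return conflicts


ORIGINAL = {"P": ["iCtSz", "iCtSeSz", "wCdSz"], "C": ["c"], "S": ["s"]}
# "Q" stands in for P' so that every symbol is a single character.
FACTORED = {"P": ["iCtSQ", "wCdSz"], "Q": ["z", "eSz"], "C": ["c"], "S": ["s"]}

print(first_token_conflicts(ORIGINAL))
print(first_token_conflicts(FACTORED))
```

The original grammar reports a conflict for P on the token "i"; the factored grammar reports none.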

  15. Recursive descent: what next?
     • Now we only have one production to choose from when reading an "i":
         P → i C t S P’
       in the grammar
         P  → i C t S P’ | w C d S z
         P’ → z | e S z
         C  → c
         S  → s
     • and we can generate the parse tree equivalent to the derivation: the root P with the children i, C, t, S, P’

  16. Recursive descent: going down…
     • Recursive descent implies that we follow the children of the current parse tree node down to the leaves (which must be terminal symbols)
         P  → i C t S P’ | w C d S z
         P’ → z | e S z
         C  → c
         S  → s
     • So let’s see if we can parse "ictsesz"
     • We follow the tree from P (children i, C, t, S, P’) to its first child
       The input token sequence: ictsesz, with the arrow (↑) under the first token, i
       (the arrow indicates the parser’s position in the token stream)
     • We have an "i" as lookahead ⇒ matches the first production for P!
     • Now, the remaining token stream is "ctsesz"

  17. Backtrack and repeat
         P  → i C t S P’ | w C d S z
         P’ → z | e S z
         C  → c
         S  → s
     • We have an "i" as lookahead ⇒ match!
     • Now, the remaining token stream is "ctsesz"
     • We return (backtrack) to P to continue parsing with its next child
       The input token sequence: ictsesz, with the arrow (↑) now under the second token, c
     • This gives us the nonterminal C
     • A nonterminal cannot match any token, so we need to pick another production, this time one for C

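The walk through the parse tree corresponds to a set of mutually recursive functions, one per nonterminal of the left-factored grammar. The following is a minimal sketch, not the lecture's implementation: the class and method names are chosen here, tokens are single characters, and the parser only recognizes input without building a tree. Because one token of lookahead selects each production, this version never has to undo a choice.

```python
class Parser:
    """Recursive descent recognizer for
    P → i C t S P' | w C d S z,  P' → z | e S z,  C → c,  S → s."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0  # the parser's position in the token stream

    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, terminal):
        # A terminal in the parse tree must match the next token.
        if self.lookahead() != terminal:
            raise ValueError(
                f"expected {terminal!r} at position {self.pos}, "
                f"found {self.lookahead()!r}"
            )
        self.pos += 1

    def parse_P(self):
        if self.lookahead() == "i":  # P → i C t S P'
            self.expect("i")
            self.parse_C()
            self.expect("t")
            self.parse_S()
            self.parse_P_prime()
        else:  # P → w C d S z
            self.expect("w")
            self.parse_C()
            self.expect("d")
            self.parse_S()
            self.expect("z")

    def parse_P_prime(self):
        if self.lookahead() == "e":  # P' → e S z
            self.expect("e")
            self.parse_S()
        self.expect("z")  # both alternatives of P' end in z

    def parse_C(self):
        self.expect("c")  # C → c

    def parse_S(self):
        self.expect("s")  # S → s


def accepts_program(text):
    """True if P derives the whole text."""
    parser = Parser(text)
    try:
        parser.parse_P()
    except ValueError:
        return False
    return parser.pos == len(text)


print(accepts_program("ictsesz"))
```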