
Compiler Construction
Lecture 6: Top-down parsing and LL(1) parser construction
2020-01-24
Michael Engel
Includes material by Jan Christian Meyer

Overview
• Ambiguity of grammars revisited
• Elimination of left recursion
• Top-down parsing
• Recursive descent parsers: structure and implementation
• Table-driven LL(1) parsers
• Table generation
Compiler Construction 06: Top-down, LL(1) parsing

Ambiguity of grammars
Syntax analysis
• For the compiler, it is important that each sentence in the language defined by a context-free grammar has a unique rightmost (or leftmost) derivation
• A grammar in which some sentence has more than one rightmost (or leftmost) derivation is called an ambiguous grammar
  • such a grammar can produce multiple derivations and multiple parse trees for the same sentence
• Multiple parse trees imply multiple possible meanings for a single program!

Ambiguity of grammars: example
The "dangling else" problem in ALGOL-like languages (e.g. PASCAL), where the "else" part is optional:
1 Statement → if Expr then Statement else Statement
2           | if Expr then Statement
3           | Assignment
4           | …other statements…
The statement
  if Expr1 then if Expr2 then Assignment1 else Assignment2
has two distinct rightmost derivations with different behaviors:
• Tree 1 (the else belongs to the outer if): Statement → if Expr1 then Statement else Statement, where the first inner Statement is if Expr2 then Assignment1 and the second is Assignment2
• Tree 2 (the else belongs to the inner if): Statement → if Expr1 then Statement, where the inner Statement is if Expr2 then Assignment1 else Assignment2
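A minimal Python sketch (not part of the original slides) makes the two readings concrete. Both parse trees are written out as nested tuples, using the hypothetical token names E1, E2, A1 and A2 for Expr1, Expr2, Assignment1 and Assignment2, and the hypothetical helper tree_yield confirms that the two different trees produce the same token sequence.

```python
# A parse tree is a tuple (nonterminal, child, child, ...); leaves are token strings.
# Expr and Assignment are treated as single tokens to keep the trees small.

# Reading 1: the else belongs to the OUTER if (rule 1 at the root, rule 2 inside).
outer_else = ("Statement", "if", "E1", "then",
              ("Statement", "if", "E2", "then", ("Statement", "A1")),
              "else", ("Statement", "A2"))

# Reading 2: the else belongs to the INNER if (rule 2 at the root, rule 1 inside).
inner_else = ("Statement", "if", "E1", "then",
              ("Statement", "if", "E2", "then", ("Statement", "A1"),
               "else", ("Statement", "A2")))


def tree_yield(node):
    """Return the leaf tokens of a parse tree from left to right."""
    if isinstance(node, str):
        return [node]
    tokens = []
    for child in node[1:]:  # node[0] is the nonterminal label
        tokens.extend(tree_yield(child))
    return tokens


print(" ".join(tree_yield(outer_else)))                  # if E1 then if E2 then A1 else A2
print(tree_yield(outer_else) == tree_yield(inner_else))  # True: same sentence...
print(outer_else == inner_else)                          # False: ...different trees
```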

Removing ambiguity
We can modify the grammar to include a rule that determines which if controls an else:
1 Statement → if Expr then Statement
2           | if Expr then WithElse else Statement
3           | Assignment
4 WithElse  → if Expr then WithElse else WithElse
5           | Assignment
This solution restricts the set of statements that can occur in the then part of an if-then-else construct
• It accepts the same set of sentences as the original grammar
• but ensures that each else has an unambiguous match to a specific if (the nearest preceding if that does not yet have an else)

Removing ambiguity: example
The modified grammar (rules 1 to 5) has only one rightmost derivation for the example
  if Expr1 then if Expr2 then Assignment1 else Assignment2
Rule  Sentential form
      Statement
1     if Expr then Statement
2     if Expr then if Expr then WithElse else Statement
3     if Expr then if Expr then WithElse else Assignment
5     if Expr then if Expr then Assignment else Assignment
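To check this claim mechanically, the following Python sketch (an illustration, not the lecture's method) counts the parse trees of the example sentence under both grammars by brute force over token spans. It treats Expr and Assignment as single tokens, and count_trees is a hypothetical helper that only works for grammars in which every symbol derives at least one token (no ε-productions, no cycles of unit productions), which holds for both grammars here.

```python
from functools import lru_cache

# Example sentence; Expr and Assignment are treated as single tokens.
TOKENS = ("if", "Expr", "then", "if", "Expr", "then", "Assignment", "else", "Assignment")

# Original (dangling-else) grammar, rules 1-3.
AMBIGUOUS = {
    "Statement": [("if", "Expr", "then", "Statement", "else", "Statement"),
                  ("if", "Expr", "then", "Statement"),
                  ("Assignment",)],
}

# Modified grammar, rules 1-5.
UNAMBIGUOUS = {
    "Statement": [("if", "Expr", "then", "Statement"),
                  ("if", "Expr", "then", "WithElse", "else", "Statement"),
                  ("Assignment",)],
    "WithElse": [("if", "Expr", "then", "WithElse", "else", "WithElse"),
                 ("Assignment",)],
}


def count_trees(grammar, start, tokens):
    """Count the distinct parse trees that derive `tokens` from `start`."""

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:  # terminal: must be exactly the token tokens[i]
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_body(body, i, j) for body in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_body(body, i, j):
        if not body:
            return 1 if i == j else 0
        first, rest = body[0], body[1:]
        # `first` covers tokens[i:k]; each remaining symbol needs at least one token.
        return sum(count_symbol(first, i, k) * count_body(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count_symbol(start, 0, len(tokens))


print(count_trees(AMBIGUOUS, "Statement", TOKENS))    # 2
print(count_trees(UNAMBIGUOUS, "Statement", TOKENS))  # 1
```

Counting trees this way takes polynomial time but is far too slow for real parsing; it is only meant to confirm that the rewritten grammar leaves exactly one tree for the sentence.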

Order of derivations
1 Expr → "(" Expr ")"
2      | Expr Op name
3      | name
4 Op   → +
5      | -
6      | ×
7      | ÷
Rightmost: rewrite, at each step, the rightmost nonterminal
Rule  Sentential form
      Expr
2     Expr Op name
6     Expr × name
1     "(" Expr ")" × name
2     "(" Expr Op name ")" × name
4     "(" Expr + name ")" × name
3     "(" name + name ")" × name
Leftmost: rewrite, at each step, the leftmost nonterminal
Rule  Sentential form
      Expr
2     Expr Op name
1     "(" Expr ")" Op name
2     "(" Expr Op name ")" Op name
3     "(" name Op name ")" Op name
4     "(" name + name ")" Op name
6     "(" name + name ")" × name
The parse tree is identical for both! Its root is Expr → Expr Op name(c) with Op = ×; the left Expr is "(" Expr ")", and inside the parentheses Expr → Expr Op name(b) with Op = + and Expr → name(a).
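Both derivations can be replayed from the one parse tree. In the following Python sketch (the helper names leaf, node and derivation are hypothetical, and the quotes around the parentheses are dropped), each step replaces the leftmost or rightmost nonterminal of the current sentential form by its children in the tree, which reproduces the two rule sequences listed on this slide.

```python
# A parse-tree node is (symbol, children); terminal leaves have children None.
def leaf(symbol):
    return (symbol, None)


def node(symbol, *children):
    return (symbol, list(children))


# Parse tree for ( name + name ) × name, built with rules 1-7.
TREE = node("Expr",                                    # rule 2
            node("Expr", leaf("("),                    # rule 1
                 node("Expr",                          # rule 2
                      node("Expr", leaf("name")),      # rule 3
                      node("Op", leaf("+")),           # rule 4
                      leaf("name")),
                 leaf(")")),
            node("Op", leaf("×")),                     # rule 6
            leaf("name"))


def derivation(tree, leftmost=True):
    """Replay `tree` as a derivation that always expands the leftmost
    (or rightmost) nonterminal; return the list of sentential forms."""
    form = [tree]  # current sentential form, as a list of tree nodes
    steps = []
    while True:
        steps.append(" ".join(symbol for symbol, _ in form))
        nonterminals = [i for i, (_, kids) in enumerate(form) if kids is not None]
        if not nonterminals:
            return steps
        i = nonterminals[0] if leftmost else nonterminals[-1]
        form[i:i + 1] = form[i][1]  # replace the node by its children


for sentential_form in derivation(TREE, leftmost=False):
    print(sentential_form)
```

Because both replays walk the same tree, they must end in the same sentence; only the order in which nonterminals are expanded differs.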

Left factoring
• Parsers (and scanners) have only a limited lookahead into the upcoming tokens
• Example: given the productions
  A → a b c d e f X g h | a b c d e f Y g h
  a parser that can only look one symbol ahead is unable to choose between the two productions
• As with the NFA → DFA conversion, the solution is to postpone the decision until it makes a difference
• Rewriting the grammar as
  A → a b c d e f A'
  A' → X g h | Y g h
  preserves the language: the prefix shared by both productions is matched once, and the new nonterminal A' collects the parts in which they differ
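One left-factoring step can be sketched in Python as follows. common_prefix, left_factor and show are hypothetical helpers; productions are lists of symbols, and an empty list stands for ε (needed when one alternative is exactly the common prefix).

```python
def common_prefix(alternatives):
    """Longest sequence of symbols that every alternative starts with."""
    prefix = []
    for column in zip(*alternatives):  # stops at the shortest alternative
        if all(symbol == column[0] for symbol in column):
            prefix.append(column[0])
        else:
            break
    return prefix


def left_factor(head, alternatives):
    """Rewrite head → α β1 | α β2 | … as head → α head' and head' → β1 | β2 | …
    Returns a dict of productions; an empty body [] stands for ε."""
    prefix = common_prefix(alternatives)
    if len(alternatives) < 2 or not prefix:
        return {head: alternatives}  # nothing to factor
    new_head = head + "'"
    return {head: [prefix + [new_head]],
            new_head: [alt[len(prefix):] for alt in alternatives]}


def show(grammar):
    for lhs, bodies in grammar.items():
        print(lhs, "→", " | ".join(" ".join(body) or "ε" for body in bodies))


show(left_factor("A", [list("abcdefXgh"), list("abcdefYgh")]))
# A → a b c d e f A'
# A' → X g h | Y g h
```

This sketch factors only a prefix shared by all alternatives; a complete pass would first group the alternatives by their common prefixes and repeat until no two alternatives of a nonterminal start with the same symbol.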

Left recursion
• Let's consider this grammar for a list of 'a's:
  A → A a | a
  which derives the following words:
  A ⇒ a
  A ⇒ A a ⇒ a a
  A ⇒ A a ⇒ A a a ⇒ a a a
  …
• The production A → A a is left recursive: its head (the nonterminal A) also appears as the leftmost symbol of its body

An equivalent grammar
• The same sequences can be generated by this grammar:
  A → a A'
  A' → a A' | ε
  where the production A' → ε (the empty string) ends the recursion
• It derives the following words:
  A ⇒ a A' ⇒ a
  A ⇒ a A' ⇒ a a A' ⇒ a a
  A ⇒ a A' ⇒ a a A' ⇒ a a a A' ⇒ a a a
  …
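The right-recursive grammar is one a top-down parser can use directly. In this Python sketch (parse_A and its inner functions are hypothetical names), each nonterminal becomes a function, and A' chooses a A' when the next token is "a" and ε otherwise. A function written the same way for A → A a would call itself before consuming any input and never terminate.

```python
def parse_A(tokens):
    """Recognize words of A → a A' ;  A' → a A' | ε  (tokens: a string or list)."""
    pos = 0

    def match(expected):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == expected:
            pos += 1
            return True
        return False

    def A():
        # A → a A'
        return match("a") and A_prime()

    def A_prime():
        # A' → a A' if the lookahead is "a"; otherwise A' → ε (consume nothing).
        if pos < len(tokens) and tokens[pos] == "a":
            return match("a") and A_prime()
        return True

    return A() and pos == len(tokens)


print(parse_A("aaa"))  # True
print(parse_A(""))     # False: A needs at least one a
print(parse_A("aab"))  # False: the b is left over
```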

Eliminating left recursion
• If a nonterminal A has m productions that are left recursive and n productions that are not:
  A → A β₁ | A β₂ | A β₃ | … | A βₘ
  A → γ₁ | γ₂ | γ₃ | … | γₙ
  (Greek letters other than ε stand for arbitrary sequences of terminals and nonterminals; no γᵢ starts with A)
  we can introduce a new nonterminal A' and rewrite the productions as (see [1]):
  A → γ₁ A' | γ₂ A' | γ₃ A' | … | γₙ A'
  A' → β₁ A' | β₂ A' | β₃ A' | … | βₘ A' | ε
• This generates the same language and removes (immediate) left recursion
• "Immediate" because left recursion can also occur indirectly, over several steps; e.g. the productions A → B x and B → A y result in
  A ⇒ B x ⇒ A y x
  Here, A again shows up at the left end of a sentential form derived from A
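The rewrite above can be sketched in Python. eliminate_immediate_left_recursion is a hypothetical helper that handles only immediate left recursion, assumes there is no production A → A, and uses [] for ε. Applied to A → A a | a, it yields exactly the grammar A → a A', A' → a A' | ε.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite  A → A β1 | … | A βm | γ1 | … | γn
       as       A → γ1 A' | … | γn A'   and   A' → β1 A' | … | βm A' | ε.
    Bodies are lists of symbols; [] stands for ε. Assumes no production A → A."""
    betas = [body[1:] for body in bodies if body and body[0] == head]
    gammas = [body for body in bodies if not body or body[0] != head]
    if not betas:
        return {head: bodies}  # no immediate left recursion
    new_head = head + "'"
    return {head: [gamma + [new_head] for gamma in gammas],
            new_head: [beta + [new_head] for beta in betas] + [[]]}


print(eliminate_immediate_left_recursion("A", [["A", "a"], ["a"]]))
# {'A': [['a', "A'"]], "A'": [['a', "A'"], []]}
```

The same helper turns the classic expression rule E → E + T | T into E → T E' and E' → + T E' | ε; indirect left recursion needs an additional substitution pass that this sketch does not perform.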

What can we do with CFGs now?
• So far, we have encountered (see also [2]):
  • Context-free grammars, their derivations and syntax trees
  • Ambiguous grammars, and noted that there is no single, true way to disambiguate them (it depends on what we want them to mean)
  • Left factoring, which always shortens the distance to the next nonterminal
  • Left recursion elimination, which always shifts a nonterminal to the right

Recursive descent parsing
• Example: a grammar that models "if" and "while" statements:
  P → if COND then STATEMENT end
    | if COND then STATEMENT else STATEMENT end
    | while COND do STATEMENT end
• Let's make it a bit simpler:
  P → i C t S z | i C t S e S z | w C d S z
  C → c
  S → s
• Let us parse the string "ictsesz"
• A top-down parser begins at the start symbol P and has to choose one of its productions. Which one?
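Without changes to the grammar, a top-down parser has to guess and undo wrong guesses. The Python sketch below is a naive backtracking parser, not the LL(1) technique this lecture builds towards; GRAMMAR and parse_backtracking are hypothetical names, and the approach assumes the grammar has no left recursion. It tries the alternatives of each nonterminal in order and returns how many production expansions it tried. For "ictsesz" it first follows P → i C t S z, fails at the "e", and has to start over with the second production of P.

```python
# Simplified if/while grammar; keys are nonterminals, everything else is a terminal.
GRAMMAR = {
    "P": [["i", "C", "t", "S", "z"],
          ["i", "C", "t", "S", "e", "S", "z"],
          ["w", "C", "d", "S", "z"]],
    "C": [["c"]],
    "S": [["s"]],
}


def parse_backtracking(tokens, grammar=GRAMMAR, start="P"):
    """Naive top-down parsing with backtracking. Returns the number of
    production expansions tried, or None if `tokens` is rejected."""
    expansions = 0

    def parse(symbol, pos):
        """Yield every position where a derivation of `symbol` starting at `pos` can end."""
        nonlocal expansions
        if symbol not in grammar:  # terminal: must match the current token
            if pos < len(tokens) and tokens[pos] == symbol:
                yield pos + 1
            return
        for body in grammar[symbol]:  # try the alternatives in order
            expansions += 1
            yield from parse_body(body, pos)

    def parse_body(body, pos):
        if not body:
            yield pos
            return
        for mid in parse(body[0], pos):  # on failure, fall back to the next option
            yield from parse_body(body[1:], mid)

    for end in parse(start, 0):
        if end == len(tokens):  # accept only if all input was consumed
            return expansions
    return None


print(parse_backtracking("ictsesz"))  # 7: the first production of P was abandoned at "e"
```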

Recursive descent: what next?
• If we can only look ahead by one token and read an "i", we can choose between two productions:
  P → i C t S z | i C t S e S z
• We cannot make this choice before seeing more of the token stream
• Left factoring makes the choice decidable with only one token of lookahead
• It generates the following grammar:
  P → i C t S P' | w C d S z
  P' → z | e S z
  C → c
  S → s
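Whether one token of lookahead suffices can be checked by comparing the sets of terminals that the alternatives of a nonterminal can start with (their FIRST sets). The Python sketch below uses the hypothetical helpers first_of and ll1_conflicts; because this grammar has no ε-productions, the FIRST set of an alternative is simply the FIRST set of its first symbol. Nonterminals are the dictionary keys, and P' is spelled "P'".

```python
def first_of(grammar, symbol):
    """FIRST set of `symbol`; assumes no ε-productions and no left recursion."""
    if symbol not in grammar:  # a terminal starts with itself
        return {symbol}
    result = set()
    for body in grammar[symbol]:
        result |= first_of(grammar, body[0])
    return result


def ll1_conflicts(grammar):
    """Map each nonterminal to the tokens that start more than one of its alternatives."""
    conflicts = {}
    for head, bodies in grammar.items():
        seen = set()
        for body in bodies:
            first = first_of(grammar, body[0])
            if seen & first:
                conflicts.setdefault(head, set()).update(seen & first)
            seen |= first
    return conflicts


ORIGINAL = {"P": [["i", "C", "t", "S", "z"], ["i", "C", "t", "S", "e", "S", "z"],
                  ["w", "C", "d", "S", "z"]],
            "C": [["c"]], "S": [["s"]]}
FACTORED = {"P": [["i", "C", "t", "S", "P'"], ["w", "C", "d", "S", "z"]],
            "P'": [["z"], ["e", "S", "z"]],
            "C": [["c"]], "S": [["s"]]}

print(ll1_conflicts(ORIGINAL))  # {'P': {'i'}}
print(ll1_conflicts(FACTORED))  # {}
```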

Recursive descent: what next?
• Now there is only one production to choose from when reading an "i":
  P → i C t S P'
• and we can generate the part of the parse tree that corresponds to this derivation step: the node P with the children i, C, t, S, P'

Recursive descent: going down…
• Recursive descent means that we follow the children of the current parse tree node down to the leaves (which must be terminal symbols)
• So let's see if we can parse "ictsesz"
• We follow the tree from P to its first child, i, while the parser is positioned at the first token of the input "ictsesz"
• The lookahead is "i" ⇒ it matches the first symbol of the production P → i C t S P'!
• Now, the remaining token stream is "ctsesz"

Backtrack and repeat
• The lookahead "i" matched, so the remaining token stream is "ctsesz"
• We return to P (moving back up the tree) to continue parsing with its next child; the parser is now positioned at the "c" of the input "ictsesz"
• This gives us the nonterminal C
• A nonterminal cannot match a token directly, so we need to expand it by picking one of its productions (here, the only one: C → c)
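The whole walkthrough can be condensed into a recursive-descent parser in which each nonterminal of the left-factored grammar becomes a method. In this Python sketch (Parser, ParseError and P_prime are hypothetical names), a terminal is matched against the current token, one token of lookahead selects the production for P and P', and returning to P after finishing a child happens automatically when the method for that child returns. Each method returns the subtree it recognized as a (label, children) tuple.

```python
class ParseError(Exception):
    """Raised when the input is not a sentence of the grammar."""


class Parser:
    """Recursive-descent parser for the left-factored grammar
         P → i C t S P' | w C d S z      P' → z | e S z      C → c      S → s
    Tokens are single characters; each method returns a (label, children) subtree."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0  # index of the lookahead token

    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, terminal):
        # A terminal in the chosen production must equal the current token.
        if self.lookahead() != terminal:
            raise ParseError(f"expected {terminal!r} at {self.pos}, got {self.lookahead()!r}")
        self.pos += 1
        return terminal

    def P(self):
        # One token of lookahead is enough to pick the production.
        if self.lookahead() == "i":
            return ("P", [self.match("i"), self.C(), self.match("t"), self.S(), self.P_prime()])
        if self.lookahead() == "w":
            return ("P", [self.match("w"), self.C(), self.match("d"), self.S(), self.match("z")])
        raise ParseError(f"no production for P starts with {self.lookahead()!r}")

    def P_prime(self):
        if self.lookahead() == "z":
            return ("P'", [self.match("z")])
        if self.lookahead() == "e":
            return ("P'", [self.match("e"), self.S(), self.match("z")])
        raise ParseError(f"no production for P' starts with {self.lookahead()!r}")

    def C(self):
        return ("C", [self.match("c")])  # C → c

    def S(self):
        return ("S", [self.match("s")])  # S → s

    def parse(self):
        tree = self.P()
        if self.lookahead() is not None:
            raise ParseError(f"unexpected input after position {self.pos}")
        return tree


print(Parser("ictsesz").parse())
```

Because every choice is made from the lookahead alone, this parser never has to undo work, unlike a backtracking parser for the unfactored grammar.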
