Compiling Techniques Lecture 5: Introduction to Parsing
Christophe Dubach

Overview
• Context-Free Grammars
• Derivations and Parse Trees
• Ambiguity
• Top-Down Parsing
• Left Recursion

Front End: Parser
Source code → Scanner → tokens → Parser → IR (both stages can report Errors)
• Checks the stream of words and their parts of speech (produced by the scanner) for grammatical correctness
• Determines if the input is syntactically well formed
• Guides checking at deeper levels than syntax
• Builds an IR representation of the code
Think of this as the mathematics of diagramming sentences.

The Study of Parsing
The process of discovering a derivation for some sentence
• Need a mathematical model of syntax: a grammar G
• Need an algorithm for testing membership in L(G)
• Need to keep in mind that our goal is building parsers, not studying the mathematics of arbitrary languages
Roadmap
• Context-free grammars and derivations
• Top-down parsing
 → Recursive descent parsers
 → LL(1): Left-to-right scan, Leftmost derivation, 1 token of lookahead
• Bottom-up parsing
 → Operator precedence parser
 → LR(1): Left-to-right scan, Rightmost derivation (built in reverse), 1 token of lookahead

Specifying Syntax with a Grammar
Context-free syntax is specified with a grammar:
1 SheepNoise → SheepNoise baa
2            | baa
This grammar defines the set of noises that a sheep makes under normal circumstances. It is written in a variant of Backus–Naur Form (BNF).
Formally, a grammar G = (S, N, T, P), where
• S is the start symbol
• N is a set of non-terminal symbols
• T is a set of terminal symbols or words
• P is a set of productions or rewrite rules, each of the form N → (N ∪ T)*
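To make the 4-tuple concrete, here is a minimal Python sketch that stores G = (S, N, T, P) as plain data and checks the constraints listed above: S ∈ N, every left-hand side is in N, and every right-hand-side symbol is in N ∪ T. The class name `Grammar` and the method `check` are illustrative assumptions, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar G = (S, N, T, P)."""
    start: str               # S: the start symbol
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    productions: tuple       # P: (lhs, rhs) pairs, rhs a tuple of symbols

    def check(self) -> list:
        """Return a list of problems; an empty list means G is well formed."""
        problems = []
        if self.start not in self.nonterminals:
            problems.append(f"start symbol {self.start!r} is not in N")
        if self.nonterminals & self.terminals:
            problems.append("N and T are not disjoint")
        vocabulary = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                problems.append(f"left-hand side {lhs!r} is not in N")
            for symbol in rhs:
                if symbol not in vocabulary:
                    problems.append(f"{symbol!r} in {lhs} -> {' '.join(rhs)} is not in N or T")
        return problems


SHEEP_NOISE = Grammar(
    start="SheepNoise",
    nonterminals=frozenset({"SheepNoise"}),
    terminals=frozenset({"baa"}),
    productions=(
        ("SheepNoise", ("SheepNoise", "baa")),  # rule 1
        ("SheepNoise", ("baa",)),               # rule 2
    ),
)

print(SHEEP_NOISE.check())  # []
```

Keeping P as a tuple of (lhs, rhs) pairs mirrors the numbered rule list on the slide, so a rule number can simply be an index into it.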

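Deriving Syntax
We can use the SheepNoise grammar to create sentences: use the productions as rewriting rules.
Rule   Sentential form
start  SheepNoise
2      baa

Rule   Sentential form
start  SheepNoise
1      SheepNoise baa
2      baa baa

Rule   Sentential form
start  SheepNoise
1      SheepNoise baa
1      SheepNoise baa baa
2      baa baa baa
And so on ...
While it is cute, this example quickly runs out of intellectual steam ...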

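A More Useful Grammar
1 Expr → Expr Op Expr
2      | num
3      | id
4 Op   → +
5      | -
6      | *
7      | /

Rule   Sentential form
start  Expr
1      Expr Op Expr
3      <id,x> Op Expr
5      <id,x> - Expr
1      <id,x> - Expr Op Expr
2      <id,x> - <num,2> Op Expr
6      <id,x> - <num,2> * Expr
3      <id,x> - <num,2> * <id,y>
This derivation represents x - 2 * y.
• Such a sequence of rewrites is called a derivation
• The process of discovering a derivation is called parsing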

Derivations
At each step, we choose a non-terminal to replace
• Different choices can lead to different derivations
Two derivations are of interest
• Leftmost derivation: replace the leftmost NT at each step
• Rightmost derivation: replace the rightmost NT at each step
These are the two systematic derivations (we don't care about randomly-ordered derivations!)
The example on the preceding slide was a leftmost derivation
• Of course, there is also a rightmost derivation
• Interestingly, it turns out to be different

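The Two Derivations for x – 2 * y
Leftmost derivation: rules 1, 3, 5, 1, 2, 6, 3, as shown earlier
Rightmost derivation:
Rule   Sentential form
start  Expr
1      Expr Op Expr
3      Expr Op <id,y>
6      Expr * <id,y>
1      Expr Op Expr * <id,y>
2      Expr Op <num,2> * <id,y>
5      Expr - <num,2> * <id,y>
3      <id,x> - <num,2> * <id,y>
In both cases, the result is id – num * id
• The two derivations produce different parse trees
• The parse trees imply different evaluation orders!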
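The two tables can be replayed mechanically. The Python sketch below applies a list of rule numbers to the start symbol, always rewriting either the leftmost or the rightmost non-terminal, and records each sentential form. The function name `derive` is hypothetical, and tokens are written by category (`id`, `num`) rather than as `<id,x>` pairs.

```python
# Ambiguous expression grammar: rule number -> (lhs, rhs)
RULES = {
    1: ("Expr", ["Expr", "Op", "Expr"]),
    2: ("Expr", ["num"]),
    3: ("Expr", ["id"]),
    4: ("Op", ["+"]),
    5: ("Op", ["-"]),
    6: ("Op", ["*"]),
    7: ("Op", ["/"]),
}
NONTERMINALS = {"Expr", "Op"}


def derive(start, rule_numbers, leftmost=True):
    """Rewrite the leftmost (or rightmost) non-terminal with each rule in turn.

    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    forms = [" ".join(form)]
    for n in rule_numbers:
        lhs, rhs = RULES[n]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule {n} rewrites {lhs}, but the chosen NT is {form[i]}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms


left = derive("Expr", [1, 3, 5, 1, 2, 6, 3], leftmost=True)
right = derive("Expr", [1, 3, 6, 1, 2, 5, 3], leftmost=False)
print(left[-1])   # id - num * id
print(right[-1])  # id - num * id
```

Both sequences end in the same sentence, yet the rule sequences differ; whether they also differ in parse tree is what the next two slides show.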

Derivations and Parse Trees
LEFTMOST DERIVATION
G
└─ E
   ├─ E: x
   ├─ Op: –
   └─ E
      ├─ E: 2
      ├─ Op: *
      └─ E: y
This evaluates as x – ( 2 * y )

Derivations and Parse Trees
RIGHTMOST DERIVATION
G
└─ E
   ├─ E
   │  ├─ E: x
   │  ├─ Op: –
   │  └─ E: 2
   ├─ Op: *
   └─ E: y
This evaluates as ( x – 2 ) * y
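The difference in evaluation order is easy to check numerically. In this sketch each parse tree is a nested `(left, op, right)` tuple and is evaluated bottom-up; the values x = 5 and y = 3 are arbitrary choices for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(tree, env):
    """Evaluate a tree that is a number, a variable name, or (left, op, right)."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]  # variable: look up its value
    return tree           # number literal


leftmost_tree = ("x", "-", (2, "*", "y"))   # x - (2 * y)
rightmost_tree = (("x", "-", 2), "*", "y")  # (x - 2) * y

env = {"x": 5, "y": 3}
print(evaluate(leftmost_tree, env))   # -1
print(evaluate(rightmost_tree, env))  # 9
```

Same sentence, two trees, two answers: this is why the tree a parser builds matters, not just whether the input is accepted.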

Derivations and Precedence
These two derivations point out a problem with the grammar: it has no notion of precedence, or implied order of evaluation.
To add precedence
• Create a non-terminal for each level of precedence
• Isolate the corresponding part of the grammar
• Force the parser to recognise high-precedence subexpressions first
For algebraic expressions
• Multiplication and division, first (level one)
• Subtraction and addition, next (level two)

Derivations and Precedence
1 Goal   → Expr
2 Expr   → Expr + Term     (level two)
3        | Expr - Term
4        | Term
5 Term   → Term * Factor   (level one)
6        | Term / Factor
7        | Factor
8 Factor → number
9        | id
This grammar is slightly larger
• Takes more rewriting to reach some of the terminal symbols
• Encodes expected precedence
• Produces the same parse tree under leftmost & rightmost derivations
Let's see how it parses x - 2 * y

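Derivations and Precedence
The rightmost derivation:
Rule   Sentential form
start  Goal
1      Expr
3      Expr - Term
5      Expr - Term * Factor
9      Expr - Term * <id,y>
7      Expr - Factor * <id,y>
8      Expr - <num,2> * <id,y>
4      Term - <num,2> * <id,y>
7      Factor - <num,2> * <id,y>
9      <id,x> - <num,2> * <id,y>
Its parse tree:
G
└─ E
   ├─ E
   │  └─ T
   │     └─ F: <id,x>
   ├─ –
   └─ T
      ├─ T
      │  └─ F: <num,2>
      ├─ *
      └─ F: <id,y>
This produces x – ( 2 * y ), along with an appropriate parse tree. Both the leftmost and rightmost derivations give the same expression, because the grammar directly encodes the desired precedence.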
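As a preview of the recursive-descent parsers on the roadmap, here is a Python sketch of a parser that follows the precedence grammar's levels: `expr` handles level two (+, -), `term` handles level one (*, /), `factor` handles number and id. One deliberate deviation, stated up front: rules 2, 3, 5 and 6 are left-recursive, so a function that called itself first would never terminate (the Left Recursion topic in the overview). The sketch therefore uses `while` loops that build left-associative trees instead. The names `tokenize`, `Parser` and `parse` are hypothetical.

```python
import re


def tokenize(text):
    """Split text into (category, lexeme) pairs, e.g. ('id', 'x'), ('num', '2'), ('-', '-')."""
    tokens = []
    for lexeme in re.findall(r"\d+|[A-Za-z_]\w*|[-+*/]", text):
        if lexeme.isdigit():
            tokens.append(("num", lexeme))
        elif lexeme in ("+", "-", "*", "/"):
            tokens.append((lexeme, lexeme))
        else:
            tokens.append(("id", lexeme))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Category of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        # Expr -> Expr + Term | Expr - Term | Term  (level two), as a loop
        tree = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()[0]
            tree = (tree, op, self.term())
        return tree

    def term(self):
        # Term -> Term * Factor | Term / Factor | Factor  (level one), as a loop
        tree = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()[0]
            tree = (tree, op, self.factor())
        return tree

    def factor(self):
        # Factor -> number | id
        category = self.peek()
        if category not in ("num", "id"):
            raise SyntaxError(f"expected number or id at token {self.pos}")
        _, lexeme = self.advance()
        return int(lexeme) if category == "num" else lexeme


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token at {parser.pos}")
    return tree


print(parse("x - 2 * y"))  # ('x', '-', (2, '*', 'y'))
```

Because `term` finishes every * and / before control returns to `expr`, the tree groups 2 * y first, matching the parse tree above.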

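Ambiguous Grammars
Our original expression grammar had other problems
1 Expr → Expr Op Expr
2      | num
3      | id
4 Op   → +
5      | -
6      | *
7      | /

Rule   Sentential form
start  Expr
1      Expr Op Expr
1      Expr Op Expr Op Expr        (different choice than the first time)
3      <id,x> Op Expr Op Expr
5      <id,x> - Expr Op Expr
2      <id,x> - <num,2> Op Expr
6      <id,x> - <num,2> * Expr
3      <id,x> - <num,2> * <id,y>
• This grammar allows multiple leftmost derivations for x - 2 * y
• Hard to automate derivation if > 1 choice
• The grammar is ambiguous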

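Two Leftmost Derivations for x – 2 * y
The difference: different productions chosen on the second step
• Original choice: rule 3 (Expr → id), giving <id,x> Op Expr
• New choice: rule 1 (Expr → Expr Op Expr), giving Expr Op Expr Op Expr
Both derivations succeed in producing x - 2 * y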
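Ambiguity can be checked by brute force on a single sentence: count its distinct parse trees. The sketch below is a memoised counter (in the spirit of chart parsing) that tries every production and every way of splitting the tokens among the right-hand-side symbols. It assumes, as holds for both expression grammars here, that there are no empty productions and no cycles of unit productions. The function name `count_parse_trees` is hypothetical, and `num` stands for the number token.

```python
from functools import lru_cache

# Ambiguous grammar, rules 1-7
AMBIGUOUS = {
    "Expr": [("Expr", "Op", "Expr"), ("num",), ("id",)],
    "Op": [("+",), ("-",), ("*",), ("/",)],
}

# Precedence grammar, rules 1-9
PRECEDENCE = {
    "Goal": [("Expr",)],
    "Expr": [("Expr", "+", "Term"), ("Expr", "-", "Term"), ("Term",)],
    "Term": [("Term", "*", "Factor"), ("Term", "/", "Factor"), ("Factor",)],
    "Factor": [("num",), ("id",)],
}


def count_parse_trees(grammar, start, tokens):
    """Number of distinct parse trees deriving tokens from start."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive_symbol(symbol, i, j):
        # Ways symbol derives tokens[i:j]
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(derive_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def derive_sequence(rhs, i, j):
        # Ways the symbols in rhs derive tokens[i:j], each covering >= 1 token
        if len(rhs) == 1:
            return derive_symbol(rhs[0], i, j)
        total = 0
        for k in range(i + 1, j - len(rhs) + 2):  # rhs[0] covers tokens[i:k]
            total += derive_symbol(rhs[0], i, k) * derive_sequence(rhs[1:], k, j)
        return total

    return derive_symbol(start, 0, len(tokens))


sentence = ["id", "-", "num", "*", "id"]
print(count_parse_trees(AMBIGUOUS, "Expr", sentence))   # 2
print(count_parse_trees(PRECEDENCE, "Goal", sentence))  # 1
```

A count above 1 proves the grammar ambiguous; a count of 1 only shows that this particular sentence is unambiguous, since no finite test can prove a whole grammar unambiguous.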

Ambiguous Grammars
• If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous
• If a grammar has more than one rightmost derivation for a single sentential form, the grammar is ambiguous
• The leftmost and rightmost derivations for a sentential form may differ, even in an unambiguous grammar
Classic example: the if-then-else problem
1 Stmt → if Expr then Stmt
2      | if Expr then Stmt else Stmt
3      | OtherStmt
This ambiguity is entirely grammatical in nature

Ambiguity
if E1 then if E2 then S1 else S2
This sentential form has two derivations:
• Production 1, then production 2: the else belongs to the inner if
  if E1 then ( if E2 then S1 else S2 )
• Production 2, then production 1: the else belongs to the outer if
  if E1 then ( if E2 then S1 ) else S2

Ambiguity
Removing the ambiguity
• Must rewrite the grammar to avoid generating the problem
• Match each else to the innermost unmatched if (common sense rule)
1 Stmt     → WithElse
2          | NoElse
3 WithElse → if Expr then WithElse else WithElse
4          | OtherStmt
5 NoElse   → if Expr then Stmt
6          | if Expr then WithElse else NoElse
Intuition: every if inside a WithElse has its else, and only a WithElse may appear between then and else; a NoElse always ends in an if without an else (the last if in its cascaded else-if chain)
With this grammar, the example has only one derivation

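Ambiguity
if E1 then if E2 then S1 else S2, derived with rules 1-6 of the rewritten grammar ("?" marks expansions of Expr, whose rules are not shown)
Rule   Sentential form
start  Stmt
2      NoElse
5      if Expr then Stmt
?      if E1 then Stmt
1      if E1 then WithElse
3      if E1 then if Expr then WithElse else WithElse
?      if E1 then if E2 then WithElse else WithElse
4      if E1 then if E2 then S1 else WithElse
4      if E1 then if E2 then S1 else S2
This binds the else controlling S2 to the inner if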
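A hand-written parser can also enforce the common-sense rule directly instead of encoding it in the grammar, which is one form of the "parsing techniques that do the right thing" approach. In this Python sketch, an else is attached greedily to the if whose then-part was just parsed, i.e. the innermost unmatched if. This is not a parser for the WithElse/NoElse grammar; it is a swapped-in technique that yields the same tree as that grammar's single derivation. As a simplifying assumption, conditions and other statements are single tokens such as E1 or S1, and the names `parse_stmt` and `parse_program` are hypothetical.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement at tokens[pos]; return (tree, next position).

    Trees are ("if", cond, then) or ("if-else", cond, then, else);
    any token other than "if" is treated as a one-token OtherStmt.
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1
    cond = tokens[pos + 1]
    if tokens[pos + 2] != "then":
        raise SyntaxError(f"expected 'then' at token {pos + 2}")
    then_stmt, pos = parse_stmt(tokens, pos + 3)
    # Greedy choice: an 'else' here belongs to the innermost unmatched if,
    # because the recursive call for any inner if has already tried to claim it.
    if pos < len(tokens) and tokens[pos] == "else":
        else_stmt, pos = parse_stmt(tokens, pos + 1)
        return ("if-else", cond, then_stmt, else_stmt), pos
    return ("if", cond, then_stmt), pos


def parse_program(text):
    tokens = text.split()
    tree, pos = parse_stmt(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return tree


print(parse_program("if E1 then if E2 then S1 else S2"))
# ('if', 'E1', ('if-else', 'E2', 'S1', 'S2'))
```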

Deeper Ambiguity
Ambiguity usually refers to confusion in the CFG (Context-Free Grammar)
Consider the following case: a = f(17)
• In Algol-like languages, f could be either a function or an array
In such cases, a context is required
• Need to track declarations
• Really a type issue, not context-free syntax
• Requires an extra-grammatical solution (not in the CFG)
• Must handle these with a different mechanism
 → Step outside the grammar rather than making it more complex
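One way to step outside the grammar is sketched below, under stated assumptions: the parser builds the same neutral node for f(17) whatever f is, and a later pass consults a table of declarations to decide between a call and an array index. The node tags ("apply", "call", "index"), the dictionary-based declaration table and the function name `resolve` are all hypothetical.

```python
def resolve(node, declarations):
    """Rewrite a syntactically ambiguous ("apply", name, args) node using declarations."""
    tag, name, args = node
    if tag != "apply":
        raise ValueError(f"expected an 'apply' node, got {tag!r}")
    kind = declarations.get(name)
    if kind == "function":
        return ("call", name, args)
    if kind == "array":
        return ("index", name, args)
    raise NameError(f"{name!r} is not declared")


node = ("apply", "f", [17])  # what the parser builds for f(17), whatever f is
print(resolve(node, {"f": "function"}))  # ('call', 'f', [17])
print(resolve(node, {"f": "array"}))     # ('index', 'f', [17])
```

The grammar stays small and context-free; the context-sensitive decision lives in the declaration lookup.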

Ambiguity - Final Word
Ambiguity arises from two distinct sources
• Confusion in the context-free syntax (if-then-else)
• Confusion that requires context to resolve (overloading)
Resolving ambiguity
• To remove context-free ambiguity, rewrite the grammar
• Handling context-sensitive ambiguity takes cooperation
 → Knowledge of declarations, types, …
 → Accept a superset of L(G) & check it by other means
 → This is a language design problem
Sometimes, the compiler writer accepts an ambiguous grammar
 → Parsing techniques that "do the right thing"
 → i.e., always select the same derivation

Parsing Techniques
Top-down parsers (LL(1), recursive descent)
• Start at the root of the parse tree and grow toward the leaves
• Pick a production & try to match the input
• Bad "pick" ⇒ may need to backtrack
• Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers (LR(1), operator precedence)
• Start at the leaves and grow toward the root
• As input is consumed, encode possibilities in an internal state
• Start in a state valid for legal first tokens
• Bottom-up parsers handle a large class of grammars

Top-Down Parsing
A top-down parser starts with the root of the parse tree
• The root node is labelled with the goal symbol of the grammar
Top-down parsing algorithm:
Construct the root node of the parse tree
Repeat until the fringe of the parse tree matches the input string
1 At a node labelled A, select a production with A on its lhs and, for each symbol on its rhs, construct the appropriate child
2 When a terminal symbol is added to the fringe and it doesn't match the input, backtrack
3 Find the next node to be expanded (label ∈ NT)
• The key is picking the right production in step 1
 → That choice should be guided by the input string

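Example
Using the precedence grammar (rules 1-9 from the Derivations and Precedence slide), let's try x – 2 * y, following a leftmost derivation and choosing productions in an order that exposes problems (↑ marks the current input position):
Rule   Sentential form      Input
start  Goal                 ↑x - 2 * y
1      Expr                 ↑x - 2 * y
2      Expr + Term          ↑x - 2 * y
4      Term + Term          ↑x - 2 * y
7      Factor + Term        ↑x - 2 * y
9      <id,x> + Term        ↑x - 2 * y
match  <id,x> + Term        x ↑- 2 * y
Partial parse tree: Goal → Expr → Expr + Term, with the inner Expr → Term → Factor → <id,x>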

This worked well, except that "–" doesn't match "+"
The parser must backtrack to the node where it chose rule 2 (Expr → Expr + Term) and try another production.
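The algorithm can be sketched as a depth-first search over leftmost derivations. The Python below keeps the unmatched part of the fringe as a tuple, always expands its leftmost symbol, tries productions in rule order, and backtracks when a terminal fails to match the input (step 2 of the algorithm). One addition is an assumption of this sketch rather than part of the slide: since no rule here has an empty right-hand side, every symbol covers at least one token, so any fringe longer than the remaining input is abandoned. That pruning is also what stops the left-recursive rules from expanding forever. Names such as `top_down_parse` are hypothetical, and `num` stands for the number token.

```python
# Precedence grammar, rules 1-9
RULES = [
    (1, "Goal", ("Expr",)),
    (2, "Expr", ("Expr", "+", "Term")),
    (3, "Expr", ("Expr", "-", "Term")),
    (4, "Expr", ("Term",)),
    (5, "Term", ("Term", "*", "Factor")),
    (6, "Term", ("Term", "/", "Factor")),
    (7, "Term", ("Factor",)),
    (8, "Factor", ("num",)),
    (9, "Factor", ("id",)),
]
NONTERMINALS = {lhs for _, lhs, _ in RULES}


def top_down_parse(tokens, start="Goal"):
    """Backtracking top-down parse.

    Returns (rule numbers of the leftmost derivation, or None; count of terminal mismatches).
    """
    tokens = tuple(tokens)
    mismatches = 0

    def expand(fringe, pos, used):
        nonlocal mismatches
        if not fringe:
            return used if pos == len(tokens) else None
        # Each symbol needs >= 1 token, so a too-long fringe cannot succeed.
        if len(fringe) > len(tokens) - pos:
            return None
        head, rest = fringe[0], fringe[1:]
        if head not in NONTERMINALS:
            if tokens[pos] == head:        # terminal matches: advance the input
                return expand(rest, pos + 1, used)
            mismatches += 1                # terminal mismatch: backtrack
            return None
        for number, lhs, rhs in RULES:     # try each production for head in order
            if lhs == head:
                result = expand(rhs + rest, pos, used + [number])
                if result is not None:
                    return result
        return None

    derivation = expand((start,), 0, [])
    return derivation, mismatches


derivation, mismatches = top_down_parse(["id", "-", "num", "*", "id"])
print(derivation)  # [1, 3, 4, 7, 9, 5, 7, 8, 9]
```

Along the way the search reaches exactly the situation on the slide (rule 2, then 4, 7, 9, then "–" against "+") and backs out of it. The number of dead ends grows quickly with input length, which is why the choice in step 1 should be guided by the input.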
