Compiling Techniques Lecture 5: Introduction to Parsing



slide-1
SLIDE 1

Compiling Techniques

Lecture 5: Introduction to Parsing

Christophe Dubach

slide-2
SLIDE 2

Overview

  • Context-Free Grammars
  • Derivations and Parse Trees
  • Ambiguity
  • Top-Down Parsing
  • Left Recursion

slide-3
SLIDE 3

Front End: Parser

  • Checks the stream of words and their parts of speech (produced by the scanner) for grammatical correctness
  • Determines if the input is syntactically well formed
  • Guides checking at deeper levels than syntax
  • Builds an IR representation of the code

Think of this as the mathematics of diagramming sentences

[Figure: Source code → Scanner → tokens → Parser → IR, with errors reported]

slide-4
SLIDE 4

The Study of Parsing

  • The process of discovering a derivation for some sentence
  • Need a mathematical model of syntax: a grammar G
  • Need an algorithm for testing membership in L(G)
  • Need to keep in mind that our goal is building parsers, not studying the mathematics of arbitrary languages

Roadmap

  • Context-free grammars and derivations
  • Top-down parsing: recursive descent parsers; LL(1) == Left-to-right, Leftmost derivation, 1 token of lookahead
  • Bottom-up parsing: operator precedence parser; LR(1) == Left-to-right, Rightmost derivation, 1 token of lookahead

slide-5
SLIDE 5

Specifying Syntax with a Grammar

  • Context-free syntax is specified with a grammar
  • This grammar defines the set of noises that a sheep makes under normal circumstances
  • It is written in a variant of Backus–Naur Form (BNF)

1  SheepNoise → SheepNoise baa
2             | baa

Formally, a grammar G = (S, N, T, P)

  • S is the start symbol
  • N is a set of non-terminal symbols
  • T is a set of terminal symbols or words
  • P is a set of productions or rewrite rules (P : N → (N ∪ T)*)

slide-6
SLIDE 6

Deriving Syntax

We can use the SheepNoise grammar to create sentences: use the productions as rewriting rules

And so on ...

While it is cute, this example quickly runs out of intellectual steam ...
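The rewriting process for SheepNoise can be sketched in a few lines of Python. This is only an illustration: the names `PRODUCTIONS` and `derive` are made up for this sketch, and the rule numbers follow the grammar on the previous slide.

```python
# Illustrative sketch: the SheepNoise grammar as numbered productions.
START = "SheepNoise"
PRODUCTIONS = {
    1: ("SheepNoise", ["SheepNoise", "baa"]),
    2: ("SheepNoise", ["baa"]),
}

def derive(rule_numbers):
    """Apply the numbered rules in order, rewriting the leftmost occurrence of each lhs."""
    form = [START]
    steps = [list(form)]
    for n in rule_numbers:
        lhs, rhs = PRODUCTIONS[n]
        i = form.index(lhs)      # raises ValueError if no non-terminal is left
        form[i:i + 1] = rhs      # replace the non-terminal by the rule's rhs
        steps.append(list(form))
    return steps

# Rules 1, 1, 2 produce the sentence "baa baa baa".
for step in derive([1, 1, 2]):
    print(" ".join(step))
```

Applying rule 1 k times and then rule 2 yields a sentence of k + 1 "baa" words.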

slide-7
SLIDE 7

A More Useful Grammar

  • Such a sequence of rewrites is called a derivation
  • The process of discovering a derivation is called parsing

1  Expr → Expr Op Expr
2       | num
3       | id
4  Op   → +
5       | -
6       | *
7       | /

This derivation represents x - 2 * y
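As a rough sketch, one leftmost derivation of x - 2 * y in this grammar can be replayed in Python. The rule sequence 1, 3, 5, 1, 2, 6, 3 is an assumption chosen for illustration (other sequences also work, which matters later), and `RULES` and `leftmost_derivation` are hypothetical names.

```python
# Illustrative sketch: replay a leftmost derivation in the expression grammar.
NONTERMINALS = {"Expr", "Op"}
RULES = {
    1: ("Expr", ["Expr", "Op", "Expr"]),
    2: ("Expr", ["num"]),
    3: ("Expr", ["id"]),
    4: ("Op", ["+"]),
    5: ("Op", ["-"]),
    6: ("Op", ["*"]),
    7: ("Op", ["/"]),
}

def leftmost_derivation(rule_numbers, start="Expr"):
    """Return every sentential form, always rewriting the leftmost non-terminal."""
    form = [start]
    steps = [list(form)]
    for n in rule_numbers:
        lhs, rhs = RULES[n]
        # Index of the leftmost non-terminal, or None if the form is a sentence.
        i = next((k for k, sym in enumerate(form) if sym in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"rule {n} cannot rewrite the leftmost non-terminal")
        form[i:i + 1] = rhs
        steps.append(list(form))
    return steps

for step in leftmost_derivation([1, 3, 5, 1, 2, 6, 3]):
    print(" ".join(step))
```

The last line printed is `id - num * id`, the token-level form of x - 2 * y.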

slide-8
SLIDE 8

Derivations

  • At each step, we choose a non-terminal to replace
  • Different choices can lead to different derivations

Two derivations are of interest

  • Leftmost derivation: replace leftmost NT at each step
  • Rightmost derivation: replace rightmost NT at each step

These are the two systematic derivations (we don’t care about randomly-ordered derivations!)

  • The example on the preceding slide was a leftmost derivation
  • Of course, there is also a rightmost derivation
  • Interestingly, it turns out to be different

slide-9
SLIDE 9

The Two Derivations for x – 2 * y

[Tables: leftmost derivation and rightmost derivation, side by side]

  • In both cases, the derivation yields id – num * id
  • The two derivations produce different parse trees
  • The parse trees imply different evaluation orders!

slide-10
SLIDE 10

Derivations and Parse Trees

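Leftmost derivation

[Figure: parse tree rooted at G; the top-level E expands to E Op E with x and – on the left, and its right E expands to E Op E for 2 * y]

This evaluates as x – ( 2 * y )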

slide-11
SLIDE 11

Derivations and Parse Trees

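Rightmost derivation

[Figure: parse tree rooted at G; the top-level E expands to E Op E with * and y on the right, and its left E expands to E Op E for x – 2]

This evaluates as ( x – 2 ) * y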
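To see that the two trees really disagree, the sketch below evaluates both shapes for sample values. The nested-tuple tree encoding, the `evaluate` helper, and the values x = 10, y = 3 are all assumptions made for illustration.

```python
import operator

# Illustrative sketch: a tree is either a leaf or a (left, op, right) tuple.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(tree, env):
    """Evaluate a tree bottom-up; names are looked up in env, numbers are literals."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree

leftmost_tree = ("x", "-", (2, "*", "y"))    # x - (2 * y)
rightmost_tree = (("x", "-", 2), "*", "y")   # (x - 2) * y

env = {"x": 10, "y": 3}
print(evaluate(leftmost_tree, env))   # 4
print(evaluate(rightmost_tree, env))  # 24
```

Same token sequence, two trees, two answers: the shape of the parse tree fixes the evaluation order.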

slide-12
SLIDE 12

Derivations and Precedence

These two derivations point out a problem with the grammar: it has no notion of precedence, or implied order of evaluation

To add precedence

  • Create a non-terminal for each level of precedence
  • Isolate the corresponding part of the grammar
  • Force the parser to recognise high-precedence subexpressions first

For algebraic expressions

  • Multiplication and division, first (level one)
  • Subtraction and addition, next (level two)

slide-13
SLIDE 13

Derivations and Precedence

This grammar is slightly larger

  • Takes more rewriting to reach some of the terminal symbols
  • Encodes expected precedence
  • Produces same parse tree under leftmost & rightmost derivations

Let’s see how it parses x - 2 * y

1  Goal   → Expr
2  Expr   → Expr + Term        (level two)
3         | Expr - Term
4         | Term
5  Term   → Term * Factor      (level one)
6         | Term / Factor
7         | Factor
8  Factor → number
9         | id

slide-14
SLIDE 14

Derivations and Precedence

The rightmost derivation

[Figure: the rightmost derivation and its parse tree, rooted at Goal, with leaves <id,x>, <num,2> and <id,y>]

This produces x – ( 2 * y ), along with an appropriate parse tree. Both the leftmost and rightmost derivations give the same expression, because the grammar directly encodes the desired precedence.

slide-15
SLIDE 15

Ambiguous Grammars

Our original expression grammar had other problems

  • This grammar allows multiple leftmost derivations for x - 2 * y
  • Hard to automate derivation if > 1 choice
  • The grammar is ambiguous

1  Expr → Expr Op Expr
2       | num
3       | id
4  Op   → +
5       | -
6       | *
7       | /

[Figure: a leftmost derivation annotated "different choice than the first time"]

slide-16
SLIDE 16

Two Leftmost Derivations for x – 2 * y

[Tables: original choice and new choice, side by side]

The difference:

  • Different productions chosen on the second step
  • Both derivations succeed in producing x - 2 * y

slide-17
SLIDE 17

Ambiguous Grammars

  • If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous
  • If a grammar has more than one rightmost derivation for a single sentential form, the grammar is ambiguous
  • The leftmost and rightmost derivations for a sentential form may differ, even in an unambiguous grammar

Classic example: the if-then-else problem

1  Stmt → if Expr then Stmt
2       | if Expr then Stmt else Stmt
3       | OtherStmt

This ambiguity is entirely grammatical in nature

slide-18
SLIDE 18

Ambiguity

This sentential form has two derivations: if E1 then if E2 then S1 else S2

  • Production 2, then production 1: the else belongs to the outer if, i.e. if E1 then ( if E2 then S1 ) else S2
  • Production 1, then production 2: the else belongs to the inner if, i.e. if E1 then ( if E2 then S1 else S2 )
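The ambiguity can be checked mechanically by counting parse trees. The sketch below is an illustrative counter for the three-production Stmt grammar, under two assumptions that are not part of the slides: any token starting with "E" stands for an Expr, and any token starting with "S" stands for an OtherStmt.

```python
from functools import lru_cache

def count_parses(tokens):
    """Count the parse trees that derive `tokens` from Stmt in the ambiguous grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def stmt(i, j):
        # Number of ways tokens[i:j] can be derived from Stmt.
        if j - i == 1:
            return 1 if tokens[i].startswith("S") else 0    # production 3
        if (j - i < 4 or tokens[i] != "if"
                or not tokens[i + 1].startswith("E") or tokens[i + 2] != "then"):
            return 0
        total = stmt(i + 3, j)                              # production 1
        for k in range(i + 4, j - 1):                       # production 2
            if tokens[k] == "else":
                total += stmt(i + 3, k) * stmt(k + 1, j)
        return total

    return stmt(0, len(tokens))

print(count_parses("if E1 then if E2 then S1 else S2".split()))  # 2
print(count_parses("if E1 then S1 else S2".split()))             # 1
```

A count above 1 for some sentence means the grammar is ambiguous; the nested if is the smallest such sentence here.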

slide-19
SLIDE 19

Ambiguity

Removing the ambiguity

  • Must rewrite the grammar to avoid generating the problem
  • Match each else to innermost unmatched if (common sense rule)

1  Stmt     → WithElse
2           | NoElse
3  WithElse → if Expr then WithElse else WithElse
4           | OtherStmt
5  NoElse   → if Expr then Stmt
6           | if Expr then WithElse else NoElse

Intuition: a NoElse always has no else on its last cascaded else if statement

With this grammar, the example has only one derivation
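The claim that the example now has a single derivation can be checked by counting parse trees for the rewritten grammar. This is a sketch only, with the same illustrative token convention as before (tokens starting with "E" are Expr, tokens starting with "S" are OtherStmt); the function names are hypothetical.

```python
from functools import lru_cache

def count_parses_unambiguous(tokens):
    """Count parse trees for `tokens` under the WithElse / NoElse grammar."""
    tokens = tuple(tokens)

    def starts_if(i, j):
        # tokens[i:j] begins with "if Expr then" and leaves room for a body.
        return (j - i >= 4 and tokens[i] == "if"
                and tokens[i + 1].startswith("E") and tokens[i + 2] == "then")

    def else_positions(i, j):
        # Candidate split points for an "else" belonging to the if at position i.
        return [k for k in range(i + 4, j - 1) if tokens[k] == "else"]

    @lru_cache(maxsize=None)
    def with_else(i, j):
        if j - i == 1:
            return 1 if tokens[i].startswith("S") else 0         # rule 4
        if not starts_if(i, j):
            return 0
        return sum(with_else(i + 3, k) * with_else(k + 1, j)     # rule 3
                   for k in else_positions(i, j))

    @lru_cache(maxsize=None)
    def no_else(i, j):
        if not starts_if(i, j):
            return 0
        total = stmt(i + 3, j)                                   # rule 5
        total += sum(with_else(i + 3, k) * no_else(k + 1, j)     # rule 6
                     for k in else_positions(i, j))
        return total

    def stmt(i, j):
        return with_else(i, j) + no_else(i, j)                   # rules 1 and 2

    return stmt(0, len(tokens))

print(count_parses_unambiguous("if E1 then if E2 then S1 else S2".split()))  # 1
```

The single surviving tree derives the outer if through NoElse (rule 5) and the inner if through WithElse (rule 3), so the else attaches to the inner if.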

slide-20
SLIDE 20

Ambiguity

if E1 then if E2 then S1 else S2

This binds the else controlling S2 to the inner if


slide-21
SLIDE 21

Deeper Ambiguity

Ambiguity usually refers to confusion in the CFG (Context-Free Grammar)

Consider the following case: a = f(17)

  • In Algol-like languages, f could be either a function or an array
  • In such cases, a context is required
  • Need to track declarations
  • Really a type issue, not context-free syntax

Requires an extra-grammatical solution (not in the CFG)

  • Must handle these with a different mechanism
  • Step outside the grammar rather than making it more complex

slide-22
SLIDE 22

Ambiguity - Final Word

Ambiguity arises from two distinct sources

  • Confusion in the context-free syntax (if-then-else)
  • Confusion that requires context to resolve (overloading)

Resolving ambiguity

  • To remove context-free ambiguity, rewrite the grammar
  • To handle context-sensitive ambiguity takes cooperation

→Knowledge of declarations, types, …
→Accept a superset of L(G) & check it by other means
→This is a language design problem

Sometimes, the compiler writer accepts an ambiguous grammar

Parsing techniques that “do the right thing”

→i.e., always select the same derivation

slide-23
SLIDE 23

Parsing Techniques

Top-down parsers (LL(1), recursive descent)

  • Start at the root of the parse tree and grow toward leaves
  • Pick a production & try to match the input
  • Bad “pick” ⇒ may need to backtrack
  • Some grammars are backtrack-free (predictive parsing)

Bottom-up parsers (LR(1), operator precedence)

  • Start at the leaves and grow toward root
  • As input is consumed, encode possibilities in an internal state
  • Start in a state valid for legal first tokens
  • Bottom-up parsers handle a large class of grammars

slide-24
SLIDE 24

Top-Down Parsing

A top-down parser starts with the root of the parse tree. The root node is labelled with the goal symbol of the grammar.

Top-down parsing algorithm:

Construct the root node of the parse tree
Repeat until the fringe of the parse tree matches the input string
  1 At a node labelled A, select a production with A on its lhs and, for each symbol on its rhs, construct the appropriate child
  2 When a terminal symbol is added to the fringe and it doesn’t match the input, backtrack
  3 Find the next node to be expanded (label ∈ NT)

  • The key is picking the right production in step 1

→That choice should be guided by the input string
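The algorithm above can be sketched as a backtracking recognizer built from Python generators. This is an illustration under stated assumptions, not the lecture's implementation: it only accepts or rejects (no tree is built), the function names are hypothetical, and `GRAMMAR` is a right-recursive variant of the expression grammar, chosen so that the search terminates.

```python
# Illustrative sketch: non-terminals are the dict keys; everything else is a terminal.
GRAMMAR = {
    "Goal":   [["Expr"]],
    "Expr":   [["Term", "+", "Expr"], ["Term", "-", "Expr"], ["Term"]],
    "Term":   [["Factor", "*", "Term"], ["Factor", "/", "Term"], ["Factor"]],
    "Factor": [["num"], ["id"]],
}

def parse(grammar, symbol, tokens, pos):
    """Yield each input position reachable after deriving `symbol` from tokens[pos:]."""
    if symbol not in grammar:                   # terminal: must match the input
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return                                  # mismatch: caller backtracks
    for rhs in grammar[symbol]:                 # step 1: try each production
        yield from parse_sequence(grammar, rhs, tokens, pos)

def parse_sequence(grammar, symbols, tokens, pos):
    """Yield positions reachable after deriving every symbol in `symbols`, in order."""
    if not symbols:
        yield pos
        return
    for mid in parse(grammar, symbols[0], tokens, pos):
        yield from parse_sequence(grammar, symbols[1:], tokens, mid)

def accepts(grammar, start, tokens):
    """True if some sequence of production choices consumes the whole input."""
    return any(end == len(tokens) for end in parse(grammar, start, tokens, 0))

print(accepts(GRAMMAR, "Goal", ["id", "-", "num", "*", "id"]))  # True
print(accepts(GRAMMAR, "Goal", ["id", "-"]))                    # False
```

Backtracking is implicit: when a generator yields nothing for one production, the enclosing loop moves on to the next alternative. On x – 2 * y the recognizer first tries the "+" production, fails to match "–", and falls back to the "-" production.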

slide-25
SLIDE 25

Example

Let’s try x – 2 * y :

[Figure: partial parse tree, Goal → Expr → Expr + Term, with the left Expr expanded through Term and Factor to <id,x>]

Leftmost derivation, choose productions in an order that exposes problems

1  Goal   → Expr
2  Expr   → Expr + Term
3         | Expr - Term
4         | Term
5  Term   → Term * Factor
6         | Term / Factor
7         | Factor
8  Factor → number
9         | id

slide-26
SLIDE 26

Example

Let’s try x – 2 * y :

[Figure: partial parse tree, Goal → Expr → Expr + Term, with the left Expr expanded through Term and Factor to <id,x>]

  • This worked well, except that “–” doesn’t match “+”
  • The parser must backtrack to here


slide-27
SLIDE 27

Example

Continuing with x – 2 * y :

[Figure: partial parse tree, Goal → Expr → Expr – Term, with the left Expr expanded through Term and Factor to <id,x>]

  • This time, “–” and “–” matched
  • We can advance past “–” to look at “2”


slide-28
SLIDE 28

Example

Trying to match the “2” in x – 2 * y :

[Figure: parse tree, Goal → Expr → Expr – Term, with the left Expr expanded to <id,x> and the right Term expanded through Factor to <num,2>]

Where are we?

  • “2” matches “2”
  • We have more input, but no NTs left to expand
  • The expansion terminated too soon

⇒ Need to backtrack


slide-29
SLIDE 29

Example

Trying again with “2” in x – 2 * y :

[Figure: complete parse tree, Goal → Expr → Expr – Term, where the right Term expands to Term * Factor, giving leaves <id,x>, <num,2> and <id,y>]

This time, we matched & consumed all the input ⇒ Success!


slide-30
SLIDE 30

Left Recursion

Top-down parsers cannot handle left-recursive grammars

Formally, a grammar is left recursive if ∃ A ∈ NT such that ∃ a derivation A ⇒+ Aα (one or more steps), for some string α ∈ (NT ∪ T)+

Our expression grammar is left recursive

  • This can lead to non-termination in a top-down parser
  • For a top-down parser, any recursion must be right recursion
  • We would like to convert the left recursion to right recursion

Non-termination is a bad property in any part of a compiler
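A tiny sketch shows why. If a recursive-descent routine for Expr tries the production Expr → Expr + Term first, it calls itself before consuming any input. The function name `parse_expr` is hypothetical; in Python the endless recursion surfaces as a RecursionError rather than a true infinite loop.

```python
def parse_expr(tokens, pos):
    # Production: Expr -> Expr + Term.
    # The first rhs symbol is Expr again, so the routine re-enters itself
    # at the same input position without consuming a token.
    return parse_expr(tokens, pos)

def left_recursion_diverges(tokens):
    """Return True if parsing exhausts the call stack instead of finishing."""
    try:
        parse_expr(tokens, 0)
    except RecursionError:
        return True
    return False

print(left_recursion_diverges(["id", "-", "num"]))  # True
```

The position argument never changes between calls, so no amount of input can stop the expansion.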

slide-31
SLIDE 31

Eliminating Left Recursion

To remove left recursion, we can transform the grammar

Consider a grammar fragment of the form

Fee → Fee β
    | α

where neither α nor β starts with Fee

We can rewrite this as

Fee → α Faa
Faa → β Faa
    | ε

where Faa is a new non-terminal

  • This accepts the same language, but uses only right recursion

Exercise: eliminate left recursion from previous grammar
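The transformation is mechanical, so it can be sketched as a small function. This is a minimal illustration for immediate left recursion only (indirect left recursion is not handled); the function name and the list-of-lists encoding of alternatives are assumptions, with the empty list standing for ε.

```python
def eliminate_left_recursion(name, alternatives, new_name):
    """Rewrite  name -> name β | α  as  name -> α new_name,  new_name -> β new_name | ε."""
    # β parts: alternatives that start with `name`, with that leading symbol removed.
    betas = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    # α parts: alternatives that do not start with `name`.
    alphas = [alt for alt in alternatives if not alt or alt[0] != name]
    if not betas:
        return {name: alternatives}       # nothing to do: no left recursion
    return {
        name: [alpha + [new_name] for alpha in alphas],
        new_name: [beta + [new_name] for beta in betas] + [[]],  # [] is ε
    }

result = eliminate_left_recursion("Fee", [["Fee", "β"], ["α"]], "Faa")
print(result)
```

For the fragment on the slide this returns `Fee → α Faa` and `Faa → β Faa | ε`, matching the rewrite given above.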

slide-32
SLIDE 32

Preview

  • Top-Down Parsing: Recursive Descent
  • LL(1) Property
  • Table-driven LL(1) parsers