CS453 Intro and PA1 1 Augmenting the grammar with End of File - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS453 Intro and PA1 1

CS453 Lecture Top-Down Predictive Parsers 1

Plan for Today

  • Ambiguous grammars
  • Disambiguating ambiguous grammars
  • Predictive parsing
  • FIRST and FOLLOW sets
  • Predictive parsing table

Ambiguous Grammars

Ambiguous grammar: more than one parse tree for a single sentence.

Expression grammar:

E → E * E
E → E + E
E → E - E
E → ( E )
E → ID
E → NUM

String: 42 + 7 * 6 (what about 42-7-6?)

Parse tree 1 (+ at the root): E( E(Num 42) + E( E(Num 7) * E(Num 6) ) )
Parse tree 2 (* at the root): E( E( E(Num 42) + E(Num 7) ) * E(Num 6) )

Goal: disambiguate the grammar

Cause: the grammar specifies neither the precedence nor the associativity of the operators +, -, *.

Two options:

  • Keep the ambiguous grammar, but add extra directives to the parser so that only one tree is formed (see PA0.cup for the Simple Expression Language).
  • Rewrite the grammar, making the precedence and associativity explicit in the grammar.

Unambiguous grammar for simple expressions

Grammar:

E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

String: 42+7*6

Parse tree: E( E( T( F(Num 42) ) ) + T( T( F(Num 7) ) * F(Num 6) ) )

How is the precedence encoded? How is the associativity encoded?


Augmenting the grammar with End of File

The grammar defines the syntactically valid strings; the parser recognizes them (the same relationship as between regular expressions and the scanner). To deal with end-of-file, we augment the grammar with an end-of-file symbol ($) and create a new start symbol:

S' → S $

Predictive Parsing

Predictive parsing, such as recursive descent parsing, creates the parse tree top down, starting at the start symbol. For each non-terminal N there is a method recognizing the strings that can be produced by N, with one (case) clause for each production. This worked well for a slightly changed version of our example from last lecture:

start -> stmts EOF
stmts -> ε | stmt stmts
stmt -> ifStmt | whileStmt | ID = NUM
ifStmt -> IF id { stmts }
whileStmt -> WHILE id { stmts }

because each clause could be uniquely identified by looking ahead one token. Let's predictively build the parse tree for

if t { while b { x = 6 }} $
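A minimal Python sketch of a recursive descent parser for the statement grammar above. The tokenizer, the tuple-shaped tree nodes, and all function names are assumptions made for illustration, not part of the lecture. Each non-terminal gets one method, and each method chooses its clause from a single lookahead token.

```python
import re

KEYWORDS = {"if": "IF", "while": "WHILE"}

def tokenize(text):
    """Hypothetical scanner: (kind, lexeme) pairs; '$' becomes EOF.
    Characters outside the pattern are silently skipped."""
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[{}=$]", text):
        if lexeme in KEYWORDS:
            kind = KEYWORDS[lexeme]
        elif lexeme == "$":
            kind = "EOF"
        elif lexeme.isdigit():
            kind = "NUM"
        elif lexeme in ("{", "}", "="):
            kind = lexeme
        else:
            kind = "ID"
        tokens.append((kind, lexeme))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Kind of the lookahead token, or None when input is exhausted."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        """Consume one token of the given kind and return its lexeme."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, saw {self.peek()}")
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def start(self):                      # start -> stmts EOF
        body = self.stmts()
        self.expect("EOF")
        return ("start", body)

    def stmts(self):                      # stmts -> ε | stmt stmts
        if self.peek() in ("IF", "WHILE", "ID"):   # FIRST(stmt)
            return [self.stmt()] + self.stmts()
        return []                         # ε clause

    def stmt(self):                       # stmt -> ifStmt | whileStmt | ID = NUM
        if self.peek() == "IF":
            return self.block("IF", "if")
        if self.peek() == "WHILE":
            return self.block("WHILE", "while")
        name = self.expect("ID")
        self.expect("=")
        return ("assign", name, int(self.expect("NUM")))

    def block(self, keyword, label):      # IF id { stmts } / WHILE id { stmts }
        self.expect(keyword)
        cond = self.expect("ID")
        self.expect("{")
        body = self.stmts()
        self.expect("}")
        return (label, cond, body)

def parse(text):
    return Parser(tokenize(text)).start()

tree = parse("if t { while b { x = 6 }} $")
```

Here `stmts` never needs to backtrack: seeing IF, WHILE, or ID selects the `stmt stmts` clause, and any other token selects ε.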

When Predictive Parsing works, when it does not

What about our expression grammar?

E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

The E method cannot decide, looking one token ahead, whether to predict E + T, E - T, or T. The same problem arises for T.

Predictive parsing works for grammars where the first terminal symbol of each subexpression provides enough information to decide which production to use.

First

Given a phrase γ of terminals and non-terminals (the rhs of a production), FIRST(γ) is the set of all terminals that can begin a string derived from γ.

FIRST(T * F) = ?
FIRST(F) = ?
FIRST(XYZ) = FIRST(X) ?

No! X could produce ε, and then FIRST(Y) comes into play. We must keep track of which non-terminals are NULLABLE.
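A small Python sketch of why FIRST(XYZ) is not simply FIRST(X). The FIRST sets and the nullable set below are made-up values for illustration only; the function walks the phrase left to right and stops at the first non-nullable symbol.

```python
def first_of_sequence(symbols, first, nullable):
    """FIRST of a phrase: union FIRST(sym) up to and including
    the first symbol that is not nullable."""
    result = set()
    for sym in symbols:
        # A symbol with no entry is treated as a terminal: FIRST(t) = {t}.
        result |= first.get(sym, {sym})
        if sym not in nullable:
            break
    return result

# Hypothetical sets, not derived from any grammar in the lecture.
first = {"X": {"a"}, "Y": {"b"}, "Z": {"c"}}

only_x_nullable = first_of_sequence(["X", "Y", "Z"], first, nullable={"X"})
x_and_y_nullable = first_of_sequence(["X", "Y", "Z"], first, nullable={"X", "Y"})
```

With only X nullable the result contains the members of FIRST(X) and FIRST(Y); once Y is nullable as well, FIRST(Z) joins in.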


Follow

It also turns out to be useful to determine which terminals can directly follow a non-terminal X (to decide whether parsing X is finished).

A terminal t is in FOLLOW(X) if there is any derivation containing Xt. This can also occur if the derivation contains XYZt and Y and Z are both nullable.

FIRST and FOLLOW sets

NULLABLE

– X is a nonterminal
– nullable(X) is true if X can derive the empty string

FIRST

– FIRST(z) = {z}, where z is a terminal
– FIRST(X) = union of all FIRST(rhs_i), where X is a nonterminal and X -> rhs_i
– FIRST(rhs_i) = union of all FIRST(sym) for symbols on the rhs, up to and including the first non-nullable symbol

FOLLOW(Y), only relevant when Y is a nonterminal

– look for Y in the rhs of rules (lhs -> rhs) and union all FIRST sets for symbols after Y, up to and including the first non-nullable symbol
– if all symbols after Y are nullable, then also union in FOLLOW(lhs)

Constructive Definition of nullable, first and follow

for each terminal t: FIRST(t) = {t}

Another transitive closure algorithm: keep doing STEP until nothing changes.

STEP: for each production X → Y1 Y2 … Yk
    if Y1 … Yk are all nullable (or k = 0)
        nullable(X) = true
    for each i from 1 to k, each j from i+1 to k
        1: if Y1 … Yi-1 are all nullable (or i = 1)
               FIRST(X) += FIRST(Yi)        // +=: union
        2: if Yi+1 … Yk are all nullable (or i = k)
               FOLLOW(Yi) += FOLLOW(X)
        3: if Yi+1 … Yj-1 are all nullable (or i+1 = j)
               FOLLOW(Yi) += FIRST(Yj)

We can compute nullable, then FIRST, and then FOLLOW.
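The STEP loop above can be sketched in Python as a fixed-point iteration. The grammar encoding (a dict from each nonterminal to a list of right-hand sides, with `[]` standing for ε) and the function names are assumptions for this sketch. It is run on the small grammar Z → d | X Y Z, X → a | Y, Y → c | ε, without an added end-of-file symbol.

```python
def union_into(target, source):
    """Add source to target in place; report whether target grew."""
    before = len(target)
    target |= source
    return len(target) > before

def compute_sets(grammar):
    nonterminals = set(grammar)
    symbols = {s for prods in grammar.values() for rhs in prods for s in rhs}
    nullable = set()
    # FIRST(t) = {t} for terminals; nonterminals start empty.
    first = {s: (set() if s in nonterminals else {s})
             for s in symbols | nonterminals}
    follow = {n: set() for n in nonterminals}

    def all_nullable(seq):
        return all(s in nullable for s in seq)   # true for the empty sequence

    changed = True
    while changed:                               # repeat STEP until stable
        changed = False
        for x, productions in grammar.items():
            for rhs in productions:
                if x not in nullable and all_nullable(rhs):
                    nullable.add(x)
                    changed = True
                for i, yi in enumerate(rhs):
                    if all_nullable(rhs[:i]):                    # rule 1
                        changed |= union_into(first[x], first[yi])
                    if yi not in nonterminals:
                        continue       # FOLLOW is only kept for nonterminals
                    if all_nullable(rhs[i + 1:]):                # rule 2
                        changed |= union_into(follow[yi], follow[x])
                    for j in range(i + 1, len(rhs)):             # rule 3
                        if all_nullable(rhs[i + 1:j]):
                            changed |= union_into(follow[yi], first[rhs[j]])
    return nullable, first, follow

grammar = {
    "Z": [["d"], ["X", "Y", "Z"]],
    "X": [["a"], ["Y"]],
    "Y": [["c"], []],
}
nullable, first, follow = compute_sets(grammar)
```

The sketch updates all three sets in the same loop; computing nullable first, then FIRST, then FOLLOW reaches the same fixed point.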

Class Exercise

Compute nullable, FIRST and FOLLOW for

Z → d | X Y Z
X → a | Y
Y → c | ε


Constructing the Predictive Parser Table

A predictive parse table has a row for each non-terminal X, and a column for each input token t. Entries table[X,t] contain productions:

for each production X -> gamma
    for each t in FIRST(gamma)
        table[X,t] = X->gamma
    if gamma is nullable
        for each t in FOLLOW(X)
            table[X,t] = X->gamma

Compute the predictive parse table for

Z → d | X Y Z
X → a | Y
Y → c | ε

        a               c               d
X       X→a, X→Y        X→Y             X→Y
Y       Y→ε             Y→ε, Y→c        Y→ε
Z       Z→XYZ           Z→XYZ           Z→XYZ, Z→d

Multiple entries in the Predictive parse table: Ambiguity

An ambiguous grammar will lead to multiple entries in the parse table. Our grammar IS ambiguous: e.g. Z ⇒ d, but also Z ⇒ X Y Z ⇒ Y Z ⇒ Z ⇒ d.

For grammars with no multiple entries in the table, we can use the table to produce one parse tree for each valid sentence. We call these grammars LL(1): Left-to-right parse, Leftmost derivation, 1 symbol of lookahead. A recursive descent parser examines the input left to right, expands the leftmost non-terminal first, and looks ahead 1 token.

Left recursion and Predictive parsing

What happens to the recursive descent parser if we have a left-recursive production rule, e.g. E → E + T | T? E calls E calls E, forever.

To eliminate left recursion we rewrite the grammar:

from:
E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

to:
E → T E'
E' → + T E' | - T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | ID | NUM

This replaces the left recursion X → X γ | α (where α does not start with X) by right recursion: X produces α γ*, which can also be produced right-recursively.

Now we can augment the grammar (S → E $), compute nullable, FIRST and FOLLOW, and produce an LL(1) predictive parse table; see Tiger Section 3.2.
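A Python sketch of the rewrite X → X γ | α into X → α X', X' → γ X' | ε. It handles only immediate left recursion within one non-terminal; the function name, the list-of-symbols encoding, and the choice of `X'` as the fresh name (assumed not to clash with an existing symbol) are assumptions of this sketch.

```python
def eliminate_left_recursion(name, alternatives):
    """Rewrite the productions of one nonterminal; [] stands for ε."""
    # The γ parts: what follows the leading self-reference.
    gammas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == name]
    # The α parts: alternatives that do not start with the nonterminal.
    alphas = [rhs for rhs in alternatives if not rhs or rhs[0] != name]
    if not gammas:
        return {name: alternatives}       # nothing to rewrite
    tail = name + "'"                     # fresh nonterminal, e.g. E'
    return {
        name: [alpha + [tail] for alpha in alphas],
        tail: [gamma + [tail] for gamma in gammas] + [[]],
    }

e_rules = eliminate_left_recursion("E", [["E", "+", "T"], ["E", "-", "T"], ["T"]])
t_rules = eliminate_left_recursion("T", [["T", "*", "F"], ["F"]])
f_rules = eliminate_left_recursion("F", [["(", "E", ")"], ["ID"], ["NUM"]])
```

Applied to the expression grammar, `e_rules` and `t_rules` correspond to the E, E', T, T' productions of the rewritten grammar, while F is returned unchanged.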

Left Factoring

Left recursion does not work for predictive parsing. Neither does a grammar that has a non-terminal with two productions that start with a common phrase, so we left factor the grammar. E.g., the if statement:

S → IF t THEN S ELSE S | IF t THEN S | o

becomes

S → IF t THEN S X | o
X → ELSE S | ε

When building the predictive parse table, there will be multiple entries. WHY?

S → α β1
S → α β2

Left factor:

S → α S'
S' → β1 | β2
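One round of left factoring can be sketched in Python as follows. Alternatives are grouped by their first symbol, and each group with more than one member has its longest common prefix α pulled out. The encoding and the primed fresh names are assumptions of this sketch (the if-statement example in the slides calls the new non-terminal X instead).

```python
def common_prefix(sequences):
    """Longest prefix shared by every sequence in the list."""
    prefix = []
    for column in zip(*sequences):
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return prefix

def left_factor(name, alternatives):
    """Factor S → α β1 | α β2 into S → α S', S' → β1 | β2; [] stands for ε."""
    groups = {}
    for rhs in alternatives:
        key = rhs[0] if rhs else None     # group by leading symbol
        groups.setdefault(key, []).append(rhs)
    result = {name: []}
    fresh = name
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[name].extend(group)    # nothing shared, keep as is
            continue
        alpha = common_prefix(group)
        fresh += "'"                      # S', S'', ... assumed unused
        result[name].append(alpha + [fresh])
        result[fresh] = [rhs[len(alpha):] for rhs in group]
    return result

factored = left_factor("S", [
    ["IF", "t", "THEN", "S", "ELSE", "S"],
    ["IF", "t", "THEN", "S"],
    ["o"],
])
```

A single round is enough for the if-statement example; a grammar whose remainders β1, β2 again share a prefix would need the function applied to the new non-terminal as well.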


Dangling else problem: ambiguity

Given

S → IF t THEN S X | o
X → ELSE S | ε

construct two parse trees for

IF t THEN IF t THEN o ELSE o

ELSE attached to the inner IF: S( IF t THEN S( IF t THEN S(o) X(ELSE S(o)) ) X(ε) )
ELSE attached to the outer IF: S( IF t THEN S( IF t THEN S(o) X(ε) ) X(ELSE S(o)) )

Which is the correct parse tree? (C, Java rules)

Dangling else disambiguation

The correct parse tree attaches the ELSE to the nearest (inner) IF:

S( IF t THEN S( IF t THEN S(o) X(ELSE S(o)) ) X(ε) )

We can get this parse tree by removing the X → ε rule from the multiple-entry slot in the parse table. See written homework 2.

One more time

Balanced parentheses grammar 1:

S à à ( S ) | SS | ε

  • 1. Augment the grammar
  • 2. Construct Nullable, First and Follow
  • 3. Build the predictive parse table, what happens?

CS453 Lecture Top-Down Predictive Parsers 19

One more time, but this time with feeling …

Balanced parentheses grammar 2:

S → ( S ) S | ε

1. Augment the grammar
2. Construct Nullable, First and Follow
3. Build the predictive parse table
4. Using the predictive parse table, construct the parse tree for ( ) ( ( ) ) $ and ( ) ( ) ( ) $
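A Python sketch of a table-driven LL(1) parser for S → ( S ) S | ε, augmented with S' → S $. The table is written out by hand under the assumption that FIRST(S) = { ( }, S is nullable, and FOLLOW(S) = { ), $ }; the stack-based driver and its names are illustrative. It returns the productions used, in leftmost-derivation order, from which the parse tree can be read off.

```python
NONTERMINALS = {"S'", "S"}

# Hand-built predictive parse table; [] stands for ε.
TABLE = {
    ("S'", "("): ["S", "$"],
    ("S'", "$"): ["S", "$"],
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],
    ("S", "$"): [],
}

def parse(tokens):
    """Return the (nonterminal, rhs) steps of the leftmost derivation."""
    stack = ["S'"]
    pos = 0
    steps = []
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            steps.append((top, rhs))
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == look:
            pos += 1                      # match a terminal
        else:
            raise SyntaxError(f"expected {top}, saw {look}")
    if pos != len(tokens):
        raise SyntaxError("input continues after end-of-file symbol")
    return steps

steps_nested = parse(list("()(())$"))
steps_flat = parse(list("()()()$"))
```

Every cell of this table holds at most one production, so the driver never has to choose between alternatives.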