Defining syntax using CFGs



SLIDE 1

Defining syntax using CFGs

SLIDE 2

Roadmap

Last time

– Defined context-free grammar

This time

– CFGs for specifying a language’s syntax

  • Language membership
  • List grammars
  • Resolving ambiguity
SLIDE 3

CFG Review

  • G = (N,Σ,P,S)
  • ⇒⁺ means “derives in 1 or more steps”
  • CFG generates a string by applying productions until no non-terminals remain

Example: Nested parens

N = { Q }
Σ = { ( , ) }
P = Q → ( Q ) | ε
S = Q
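As a minimal sketch of how this grammar generates a string, the Python below applies Q → ( Q ) a chosen number of times and then finishes with Q → ε, recording each sentential form along the way. The function name `derive_nested` and the use of plain strings for sentential forms are illustrative assumptions, not part of the slides.

```python
def derive_nested(depth):
    """Return the sentential forms of a derivation of `depth` nested parens.

    Applies Q -> ( Q ) `depth` times, then Q -> epsilon, so that no
    nonterminal remains in the final form.
    """
    form = "Q"                             # start symbol S = Q
    steps = [form]
    for _ in range(depth):
        form = form.replace("Q", "(Q)")    # production Q -> ( Q )
        steps.append(form)
    form = form.replace("Q", "")           # production Q -> epsilon
    steps.append(form)
    return steps


# Show the empty string as ε so the last step stays visible.
print(" => ".join(s if s else "ε" for s in derive_nested(2)))
```

Each call produces one derivation; varying `depth` gives the different strings the grammar can generate.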

SLIDE 4

Formal Definition of a CFG’s Language

Let G = (N,Σ,P,S) be a CFG. Then

L(G) = { w | S ⇒⁺ w }

where S is the start nonterminal of G, and w is a sequence that consists of (only) terminal symbols, or ε
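To make the definition concrete, here is a rough sketch that collects the members of L(G) up to a length bound by exploring sentential forms breadth-first. It assumes single-character symbols and a grammar, like the nested-parens one, in which every recursive production adds terminals (otherwise the search need not terminate). The name `language_up_to` is hypothetical.

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Collect every terminal string w with start =>+ w and len(w) <= max_len.

    `productions` maps each nonterminal (one character) to a list of
    right-hand sides; every other character is treated as a terminal.
    """
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        pos = next((i for i, c in enumerate(form) if c in productions), None)
        if pos is None:
            found.add(form)          # only terminals remain: form is in L(G)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Prune forms whose terminals alone already exceed the bound.
            terminals = sum(1 for c in new if c not in productions)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


nested = {"Q": ["(Q)", ""]}          # Q -> ( Q ) | epsilon
members = language_up_to(nested, "Q", 4)
print(sorted(members, key=len))
print("(()" in members)              # membership test for an unbalanced string
```

Language membership for short strings then reduces to a set lookup; a real parser answers the same question without enumerating the language.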

SLIDE 5

A CFG Defines a Language

CFG productions define the syntax of a language

We call this notation “BNF” (for “Backus-Naur Form”) or “extended BNF”

HTTP grammar using BNF:

– http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html

1. Prog → begin Stmts end
2. Stmts → Stmts semicolon Stmt
3.       | Stmt
4. Stmt → id assign Expr
5. Expr → id
6.      | Expr plus id
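One possible way to hold these six productions as data is sketched below; the dictionary layout and the names `GRAMMAR`, `nonterminals`, and `terminals` are assumptions made for illustration. The sets N and Σ fall out of the productions: a symbol with at least one production is a nonterminal, and every other symbol is a terminal.

```python
# Productions 1-6, keyed by left-hand side (layout is an illustrative choice).
GRAMMAR = {
    "Prog":  [["begin", "Stmts", "end"]],
    "Stmts": [["Stmts", "semicolon", "Stmt"], ["Stmt"]],
    "Stmt":  [["id", "assign", "Expr"]],
    "Expr":  [["id"], ["Expr", "plus", "id"]],
}

# N: symbols that appear on a left-hand side.
nonterminals = set(GRAMMAR)

# Sigma: every symbol on a right-hand side that is not a nonterminal.
terminals = {sym
             for alternatives in GRAMMAR.values()
             for rhs in alternatives
             for sym in rhs} - nonterminals

print(sorted(nonterminals))
print(sorted(terminals))
```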

SLIDE 6

List Grammars

  • Useful to repeat a structure arbitrarily often

Stmts → Stmts semicolon Stmt | Stmt

[Figure: parse tree for a statement list under this rule. List skews left …]

SLIDE 7

List Grammars

  • Useful to repeat a structure arbitrarily often

Stmts → Stmt semicolon Stmts | Stmt

[Figure: parse tree for a statement list under this rule. List skews right]
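The difference between the two list rules can be sketched by building simplified trees as nested tuples (the unit step Stmts → Stmt is collapsed, and the function names are hypothetical). The left-recursive rule nests earlier statements deeper; the right-recursive rule nests later ones deeper.

```python
from functools import reduce


def left_skewed(stmts):
    """Tree shape for Stmts -> Stmts semicolon Stmt | Stmt (nests to the left)."""
    return reduce(lambda tree, s: (tree, ";", s), stmts[1:], stmts[0])


def right_skewed(stmts):
    """Tree shape for Stmts -> Stmt semicolon Stmts | Stmt (nests to the right)."""
    if len(stmts) == 1:
        return stmts[0]
    return (stmts[0], ";", right_skewed(stmts[1:]))


print(left_skewed(["s1", "s2", "s3"]))    # (('s1', ';', 's2'), ';', 's3')
print(right_skewed(["s1", "s2", "s3"]))   # ('s1', ';', ('s2', ';', 's3'))
```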

SLIDE 8

List Grammars

  • What if we allowed both “skews”?

Stmts → Stmts semicolon Stmts | Stmt

[Figure: parse trees under this rule, in which either Stmts on the right-hand side can be expanded further]
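A quick way to see how many tree shapes the two-sided rule allows is to count them. The sketch below (the function name `count_trees` is an assumption) counts parse trees for a list of n statements by choosing how many statements fall under the left Stmts of the top-level production.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(n):
    """Number of parse trees deriving n semicolon-separated statements
    under Stmts -> Stmts semicolon Stmts | Stmt."""
    if n == 1:
        return 1                      # Stmts -> Stmt
    # k statements go to the left Stmts, n - k to the right Stmts.
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))


print([count_trees(n) for n in range(1, 6)])   # [1, 1, 2, 5, 14]
```

Any list of three or more statements has more than one parse tree under this rule, whereas each one-sided rule gives exactly one.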

SLIDE 9

Derivation Order

  • Leftmost Derivation: always expand the leftmost nonterminal
  • Rightmost Derivation: always expand the rightmost nonterminal
1. Prog → begin Stmts end
2. Stmts → Stmts semicolon Stmt
3.       | Stmt
4. Stmt → id assign Expr
5. Expr → id
6.      | Expr plus id

[Figure: partial parse tree for Prog ⇒ begin Stmts end ⇒ begin Stmts semicolon Stmt end. Leftmost expands Stmts next; rightmost expands Stmt next]
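The two derivation orders can be sketched in code. The helper below (its name `derive` and the rule-number encoding are illustrative assumptions) applies a given sequence of numbered productions, always at the leftmost or always at the rightmost nonterminal, and records every sentential form.

```python
RULES = {
    1: ("Prog",  ["begin", "Stmts", "end"]),
    2: ("Stmts", ["Stmts", "semicolon", "Stmt"]),
    3: ("Stmts", ["Stmt"]),
    4: ("Stmt",  ["id", "assign", "Expr"]),
    5: ("Expr",  ["id"]),
    6: ("Expr",  ["Expr", "plus", "id"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def derive(rule_numbers, leftmost=True):
    """Apply the numbered rules in order, always at the leftmost (or
    rightmost) nonterminal, and return every sentential form."""
    form = ["Prog"]
    forms = [" ".join(form)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"rule {number} does not apply to {form[pos]}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(" ".join(form))
    return forms


left = derive([1, 2, 3, 4, 5, 4, 5], leftmost=True)
right = derive([1, 2, 4, 5, 3, 4, 5], leftmost=False)
for step in left:
    print("L:", step)
for step in right:
    print("R:", step)
```

Both runs end in the same string of terminals; they differ only in the order in which nonterminals are expanded, starting at the third step.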

SLIDE 10

Ambiguity

Even with a fixed derivation order, it is possible to derive the same string in multiple ways

For grammar G and string w

– G is ambiguous if there is

  • > 1 leftmost derivation of w
  • > 1 rightmost derivation of w
  • > 1 parse tree for w
SLIDE 11

Exercise

  • Give a grammar G and a word w that has more than 1 left-most derivation in G

SLIDE 12

Example: Ambiguous Grammars

Expr → intlit
     | Expr minus Expr
     | Expr times Expr
     | lparen Expr rparen

Derive the string 4 - 7 * 3 (assume tokenization)

[Figure: Parse Tree 1 and Parse Tree 2 for 4 - 7 * 3. One tree has minus at the root, grouping 4 - (7 * 3); the other has times at the root, grouping (4 - 7) * 3]
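To see both parse trees and what each one computes, the sketch below enumerates every tree the ambiguous grammar allows for a token list and evaluates it. It assumes well-formed input without parentheses, and the names `parse_trees` and `evaluate` are illustrative.

```python
import operator

OPS = {"-": operator.sub, "*": operator.mul}


def parse_trees(tokens):
    """Yield every parse tree of `tokens` under
    Expr -> intlit | Expr minus Expr | Expr times Expr (parentheses omitted)."""
    if len(tokens) == 1:
        yield tokens[0]                        # Expr -> intlit
        return
    # Any operator position may serve as the root production.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                yield (left, tokens[i], right)


def evaluate(tree):
    """Evaluate a tree of nested (left, op, right) tuples."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


for tree in parse_trees([4, "-", 7, "*", 3]):
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to -17 and -9, so the choice of parse tree changes the meaning of the expression.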

SLIDE 13

Why is Ambiguity Bad?

SLIDE 14

Why is Ambiguity Bad?

Eventually, we’ll be using CFGs as the basis for our parser

– Parsing is much easier when there is no ambiguity in the grammar
– The parse tree may mismatch user understanding!

[Figure: the two parse trees for 4 - 7 * 3, labeled “Operator precedence”]

SLIDE 15

Resolving Grammar Ambiguity: Precedence

Intuitive problem

– Nonterminals are the same for both operators

To fix precedence

– 1 nonterminal per precedence level
– Parse lowest level first

Expr → intlit | Expr minus Expr | Expr times Expr | lparen Expr rparen

SLIDE 16

Resolving Grammar Ambiguity: Precedence

– lowest precedence level first
– 1 nonterminal per precedence level

Expr → intlit | Expr minus Expr | Expr times Expr | lparen Expr rparen

Expr → Expr minus Expr | Term
Term → Term times Term | Factor
Factor → intlit | lparen Expr rparen

Derive the string 4 - 7 * 3

[Figure: parse tree for 4 - 7 * 3 with Expr minus Expr at the root; 4 is derived through Term and Factor, and 7 times 3 is derived under a Term]

SLIDE 17

Resolving Grammar Ambiguity: Precedence

Fixed Grammar

Expr → Expr minus Expr | Term
Term → Term times Term | Factor
Factor → intlit | lparen Expr rparen

Derive the string 4 - 7 * 3

[Figure: parse tree for 4 - 7 * 3 with Expr minus Expr at the root; 4 is derived through Term and Factor, and 7 times 3 is derived under a Term]

Let’s try to re-build the wrong parse tree

[Figure: partial tree with Expr at the root expanding to Term times Term, whose right Term derives intlit 3 through Factor]

We’ll never be able to derive minus without parens

SLIDE 18

Did we fix all ambiguity?

Fixed Grammar

Expr → Expr minus Expr | Term
Term → Term times Term | Factor
Factor → intlit | lparen Expr rparen

Derive the string 4 - 7 - 3

[Figure: parse tree with Expr minus Expr at the root and a second Expr minus Expr beneath it; each intlit is derived through Term and Factor]

NO!

These subtrees could have been swapped!
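The remaining ambiguity can be checked mechanically. The sketch below counts parse trees for a token sequence under a grammar given as a dictionary; it assumes no ε-productions and no cycles of unit productions, which holds for the grammar on this slide. The names `count_parse_trees` and `FIXED` are hypothetical.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, tokens):
    """Count parse trees of `tokens` from `start`.

    Assumes no epsilon productions and no cycles of unit productions,
    so every recursive call works on a smaller span or a different symbol.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def symbol(sym, i, j):
        if sym not in grammar:                       # terminal symbol
            return int(j == i + 1 and tokens[i] == sym)
        return sum(sequence(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        if not rhs:
            return int(i == j)
        first, rest = rhs[0], rhs[1:]
        # Give `first` at least one token and leave one per remaining symbol.
        return sum(symbol(first, i, k) * sequence(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return symbol(start, 0, len(tokens))


FIXED = {
    "Expr":   [("Expr", "minus", "Expr"), ("Term",)],
    "Term":   [("Term", "times", "Term"), ("Factor",)],
    "Factor": [("intlit",), ("lparen", "Expr", "rparen")],
}

print(count_parse_trees(FIXED, "Expr", "intlit minus intlit times intlit".split()))
print(count_parse_trees(FIXED, "Expr", "intlit minus intlit minus intlit".split()))
```

Under these assumptions the first call reports one tree (precedence is settled) and the second reports two (associativity is not).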

SLIDE 19

Where we are so far

Precedence

– We want correct behavior on 4 – 7 * 9
– A new nonterminal for each precedence level

Associativity

– We want correct behavior on 4 – 7 – 9
– Minus should be left associative: a – b – c = (a – b) – c
– Problem: the recursion in a rule like Expr → Expr minus Expr
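As a small arithmetic check of why associativity matters for minus, the sketch below folds the operands 4, 7, 9 from the left and from the right. The variable names are illustrative.

```python
from functools import reduce

nums = [4, 7, 9]

# Left associative grouping: (4 - 7) - 9
left = reduce(lambda acc, x: acc - x, nums)

# Right associative grouping: 4 - (7 - 9), folding from the right-hand end
right = reduce(lambda acc, x: x - acc, reversed(nums))

print(left, right)   # -12 6
```

Only the left-associative grouping matches the usual meaning of 4 – 7 – 9.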

SLIDE 20

Definition: Recursion in Grammars

  • A grammar is recursive in (nonterminal) X if X ⇒⁺ αXγ for non-empty strings of symbols α and γ
  • A grammar is left-recursive in X if X ⇒⁺ Xγ for non-empty string of symbols γ
  • A grammar is right-recursive in X if X ⇒⁺ αX for non-empty string of symbols α
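A simplified check for these properties is sketched below. It only looks at single productions (immediate recursion), whereas the definitions cover recursion reached through one or more derivation steps, so treat it as a special case. The function name and grammar encoding are assumptions.

```python
def immediate_recursion(grammar, x):
    """Return (left, right): whether some production of x has the form
    x -> x γ (left) or x -> α x (right) with γ, α non-empty.

    Checks single-step recursion only; recursion reached through several
    derivation steps is not detected.
    """
    rules = [rhs for rhs in grammar[x] if len(rhs) > 1]
    left = any(rhs[0] == x for rhs in rules)
    right = any(rhs[-1] == x for rhs in rules)
    return left, right


two_sided = {"Expr": [["Expr", "minus", "Expr"], ["Term"]]}
one_sided = {"Expr": [["Expr", "minus", "Term"], ["Term"]]}

print(immediate_recursion(two_sided, "Expr"))   # (True, True)
print(immediate_recursion(one_sided, "Expr"))   # (True, False)
```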

SLIDE 21

Resolving Grammar Ambiguity: Associativity

Recognize left-assoc operators with left-recursive productions

Recognize right-assoc operators with right-recursive productions

Expr → Expr minus Expr | Term
Term → Term times Term | Factor
Factor → intlit | lparen Expr rparen

[Slide overlay labels: Term, Factor]

Example: 4 – 7 – 9

[Figure: parse tree for 4 – 7 – 9, with nodes abbreviated E, T, F]

SLIDE 22

Resolving Grammar Ambiguity: Associativity

Expr → Expr minus Term | Term
Term → Term times Factor | Factor
Factor → intlit | lparen Expr rparen

Example: 4 – 7 – 9

[Figure: partial parse tree for the example, with nodes abbreviated E, T, F]

Let’s try to re-build the wrong parse tree again

We’ll never be able to derive minus without parens
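One way to turn this grammar into code is sketched below. It is a hand-written recursive-descent parser in which each left-recursive rule is unrolled into a loop, a common technique that the slides do not prescribe. The tokenizer, the use of ASCII `-` and `*` for minus and times, and all function names are assumptions.

```python
import re


def tokenize(text):
    """Split into integer literals and the symbols - * ( )."""
    return re.findall(r"\d+|[-*()]", text)


def parse_expr(tokens, pos=0):
    """Expr -> Expr minus Term | Term, with the left recursion unrolled
    into a loop so that each new minus wraps the tree built so far."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "-":
        right, pos = parse_term(tokens, pos + 1)
        tree = (tree, "-", right)
    return tree, pos


def parse_term(tokens, pos):
    """Term -> Term times Factor | Factor, unrolled the same way."""
    tree, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_factor(tokens, pos + 1)
        tree = (tree, "*", right)
    return tree, pos


def parse_factor(tokens, pos):
    """Factor -> intlit | lparen Expr rparen."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        tree, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected )")
        return tree, pos + 1
    if tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")


def parse(text):
    """Parse a whole string and return its tree as nested tuples."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


print(parse("4 - 7 - 9"))    # ((4, '-', 7), '-', 9)
print(parse("4 - 7 * 3"))    # (4, '-', (7, '*', 3))
```

The loop in `parse_expr` produces the left-leaning tree for 4 - 7 - 9, and the only route to a minus beneath a times is through the parenthesized Factor.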

SLIDE 23

Example

  • Language of Boolean expressions

– bexp → TRUE
– bexp → FALSE
– bexp → bexp OR bexp
– bexp → bexp AND bexp
– bexp → NOT bexp
– bexp → LPAREN bexp RPAREN

  • Add nonterminals so that OR has lowest precedence, then AND, then NOT. Then change the grammar to reflect the fact that both AND and OR are left associative.
  • Draw a parse tree for the expression:

– true AND NOT true.

SLIDE 24
SLIDE 25

Another ambiguous example

Stmt → if Cond then Stmt
     | if Cond then Stmt else Stmt
     | …

Consider this word in this grammar:

if a then if b then s else s2

How would you derive it?
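The sketch below enumerates every way to parse that word. Because the remaining Stmt alternatives are elided on the slide, it makes two loudly stated assumptions: a Cond is a single token, and any single non-keyword token stands in for a simple statement. The function names are hypothetical.

```python
KEYWORDS = {"if", "then", "else"}


def parse_stmt(tokens, i):
    """Yield (tree, next_index) for every way to parse a Stmt at index i.

    Assumption: Cond is one token, and any non-keyword token is a
    stand-in for a simple statement (the real alternatives are elided).
    """
    if i >= len(tokens):
        return
    if tokens[i] == "if":
        if i + 2 >= len(tokens) or tokens[i + 2] != "then":
            return
        cond = tokens[i + 1]
        for body, j in parse_stmt(tokens, i + 3):
            yield ("if", cond, body), j                        # if Cond then Stmt
            if j < len(tokens) and tokens[j] == "else":
                for alt, k in parse_stmt(tokens, j + 1):
                    yield ("if", cond, body, "else", alt), k   # ... else Stmt
    elif tokens[i] not in KEYWORDS:
        yield tokens[i], i + 1                                 # stand-in statement


def all_parses(text):
    """Return every tree that consumes the whole input."""
    tokens = text.split()
    return [tree for tree, end in parse_stmt(tokens, 0) if end == len(tokens)]


for tree in all_parses("if a then if b then s else s2"):
    print(tree)
```

Two trees come out: in one the else belongs to the outer if, in the other to the inner if, which is why this grammar is ambiguous.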

SLIDE 26

Summary

To understand how a parser works, we start by understanding context-free grammars, which are used to define the language recognized by the parser.

– (non)terminal symbol
– grammar rule (or production)
– derivation (leftmost derivation, rightmost derivation)
– parse (or derivation) tree
– the language defined by a grammar
– ambiguous grammar