
Syntactical analysis

  • Context-free grammars
  • Derivations
  • Parse Trees
  • Left-recursive grammars
  • Top-down parsing
    • non-recursive predictive parsers
    • construction of parse tables
  • Bottom-up parsing
    • shift/reduce parsers
    • LR parsers
  • Generalized parsing
    • GLL parsers
    • GLR parsers
    • SGLR parsers

/ Faculteit Wiskunde en Informatica

PAGE 0 22-9-2011


A context-free grammar is a 4-tuple G = (N, Σ, P, S)

1. N is a set of non-terminals
2. Σ is a set of terminals (disjoint from N)
3. P is a subset of (N ∪ Σ)* × N. An element (α, A) ∈ P is called a production, written A ::= α or α → A
4. S ∈ N is the start symbol

The sets N, Σ, P are finite


A context-free grammar can be considered as a simple rewrite system:

αAβ ⇒ αγβ if γ → A ∈ P (α, β, γ ∈ (N ∪ Σ)*, A ∈ N)

Example: N = {E}, Σ = {+, *, (, ), -, a}, S = E,
P = {
  E + E → E
  E * E → E
  ( E ) → E
  - E → E
  a → E
}

Derivation: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(a+E) ⇒ -(a+a)


The language L(G) generated by the context-free grammar G = (N, Σ, P, S) is:

L(G) = {w ∈ Σ* | S ⇒+ w}

A sentence w ∈ L(G) contains only terminals.

A sentential form α is a string of terminals and non-terminals which can be derived from S: S ⇒* α with α ∈ (N ∪ Σ)*

A sentence in L(G) is a sentential form in which no non-terminals occur.


Left/right derivations

  • There are choices to be made for each derivation step:
    • which non-terminal must be replaced?
    • which alternative of the selected non-terminal must be applied?
  • Always selecting the leftmost non-terminal in the sentential form gives a leftmost derivation: ⇒lm
  • There exists also a rightmost derivation: ⇒rm
  • Consider the context-free grammar for expressions:
    • Leftmost derivation for -(a+a)

      E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(a+E) ⇒lm -(a+a)

    • Rightmost derivation for -(a+a)

      E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+a) ⇒rm -(a+a)
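The two derivations can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the alternative names ("neg", "paren", ...) and the helpers `step` and `derive` are made up for illustration. The same sequence of alternatives is applied once to the leftmost and once to the rightmost non-terminal.

```python
# Sentential forms are lists of symbols; "E" is the only non-terminal.
NONTERMINALS = {"E"}

# Alternatives of E as right-hand sides (the slides write "rhs → E").
# The keys are hypothetical labels used only in this sketch.
ALTERNATIVES = {
    "plus":  ["E", "+", "E"],
    "times": ["E", "*", "E"],
    "paren": ["(", "E", ")"],
    "neg":   ["-", "E"],
    "a":     ["a"],
}

def step(form, alt, leftmost=True):
    """Replace the leftmost (or rightmost) non-terminal in `form` by `alt`."""
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no non-terminal left to rewrite")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + ALTERNATIVES[alt] + form[i + 1:]

def derive(choices, leftmost=True):
    """Return every sentential form of the derivation as a string."""
    form = ["E"]
    forms = ["".join(form)]
    for alt in choices:
        form = step(form, alt, leftmost)
        forms.append("".join(form))
    return forms

choices = ["neg", "paren", "plus", "a", "a"]
left = derive(choices, leftmost=True)
right = derive(choices, leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in -(a+a); they differ only in the fifth sentential form, -(a+E) versus -(E+a).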


A parse tree for a context-free grammar G = (N, Σ, P, S) is a tree:

1. The root is labeled with S (the start non-terminal)
2. Each leaf is labeled with a terminal (∈ Σ) or ε
3. All other nodes are labeled with a non-terminal

If A is the label of a node and X1,…,Xn are the labels of the children (from left to right) then

X1 … Xn → A

must be a production rule in G (where each Xi is either a terminal or a non-terminal)

Special case: ε → A, a node with label A which has exactly one child with label ε

  • A is then also called a nullable non-terminal


Example:

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(a+E) ⇒ -(a+a)

[Figure: the sequence of partial parse trees built by this derivation, one per step, ending with the complete parse tree for -(a+a)]

The parse tree abstracts from the derivation order
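As an illustration, the parse tree for -(a+a) can be written down as a nested data structure. This is a sketch with assumed helper names (`leaf`, `node`, `tree_yield`, `productions`), none of which come from the slides. The tree stores which productions were used and where, but not the order in which a derivation applied them.

```python
# A node is a pair (label, children); a leaf has an empty child list.
def leaf(terminal):
    return (terminal, [])

def node(label, *children):
    return (label, list(children))

# Parse tree for -(a+a) under the expression grammar of the slides.
tree = node("E",
            leaf("-"),
            node("E",
                 leaf("("),
                 node("E",
                      node("E", leaf("a")),
                      leaf("+"),
                      node("E", leaf("a"))),
                 leaf(")")))

def tree_yield(t):
    """Concatenate the leaf labels from left to right."""
    label, children = t
    if not children:
        return label
    return "".join(tree_yield(c) for c in children)

def productions(t):
    """List the productions used in the tree, in 'rhs → lhs' notation."""
    label, children = t
    if not children:
        return []
    result = [" ".join(c[0] for c in children) + " → " + label]
    for c in children:
        result.extend(productions(c))
    return result

print(tree_yield(tree))
print(productions(tree))
```

The leftmost and the rightmost derivation of -(a+a) both correspond to this single tree.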


Acceptor and parser

For each grammar G there exists a decision procedure (acceptor) AG for L(G):

AG: STRING → {true, false}

such that

AG(w) = true ⇔ w ∈ L(G)

A parser is an acceptor which constructs a parse tree as well.

  • A top-down parser constructs the tree starting from the root
  • A bottom-up parser constructs the tree starting from the leaves


During parsing the following problems may occur:

  • The grammar is ambiguous
  • The grammar is left recursive
  • The grammar contains cycles

A grammar G is ambiguous if some word w ∈ L(G) has at least two parse trees

  • Expression grammar without associativities and priorities
  • Dangling else problem
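To make the first example concrete, the sketch below counts the parse trees of a token string under the expression grammar E + E → E, E * E → E, ( E ) → E, - E → E, a → E, which has no associativities or priorities. The function name `count_parse_trees` and the span-based counting scheme are choices made for this illustration, not something taken from the slides.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of `tokens` (single-character symbols) for E."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways in which E derives tokens[i:j].
        if j - i < 1:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "a":
            total += 1                                  # a → E
        if tokens[i] == "-":
            total += count(i + 1, j)                    # - E → E
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                # ( E ) → E
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # E + E → E or E * E → E with the operator at position k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("a+a"))     # one tree
print(count_parse_trees("a+a*a"))   # two trees: (a+a)*a and a+(a*a)
```

Any count above one for some sentence shows that the grammar is ambiguous. A count of zero means the string is not in L(G).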


  • A grammar is immediately left recursive if the grammar contains a rule of the form A α → A
  • A grammar is left recursive if there exists a non-terminal A and a string α ∈ (N ∪ Σ)* such that A ⇒+ A α
  • This means that after one or more steps in a derivation an occurrence of A reduces again to an occurrence of A without recognizing any terminal in the input sentence.


Examples of indirect left recursion

B α → A
A β → B

or worse

B α → A
D A β → B
ε → D
G → D

It is relatively easy to remove left recursion from a grammar


Elimination of left recursion

A α → A
β → A

where α ⇏ ε, produce the sentential forms: β αⁿ (n ≥ 0)

A set of equivalent (non left recursive) rules is

β A’ → A
α A’ → A’
ε → A’


Example:

(1) E + T → E (immediate left rec.)
(2) T → E
(3) T * F → T (immediate left rec.)
(4) F → T
(5) ( E ) → F
(6) a → F

Applying the left recursion elimination transformation:

(1) E α → E (with α = + T)
(2) β → E (with β = T)


Example:

(1’) T E’ → E
(2’) + T E’ → E’
(2’’) ε → E’

the same for:

(3’) F T’ → T
(4’) * F T’ → T’
(4’’) ε → T’
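The transformation used in this example can be sketched as a small Python function. The function name, the list-of-lists representation of alternatives, and the choice of appending an ASCII apostrophe for the new non-terminal are all assumptions of this sketch. An empty list stands for ε.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite the alternatives of `nt` so that none starts with `nt`.

    Alternatives of the form  nt α → nt  and  β → nt  become
    β nt' → nt,  α nt' → nt'  and  ε → nt'.
    Returns a dict that maps each non-terminal to its alternatives.
    """
    alphas = [rhs[1:] for rhs in alternatives if rhs[:1] == [nt]]
    betas = [rhs for rhs in alternatives if rhs[:1] != [nt]]
    if not alphas:
        return {nt: alternatives}            # not immediately left recursive
    if any(alpha == [] for alpha in alphas):
        # Only the literal cycle "nt → nt" is detected here; an α that
        # merely derives ε is not checked in this sketch.
        raise ValueError("cycle: the transformation does not apply")
    new_nt = nt + "'"                        # assumed to be a fresh name
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }

print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

Applied to the alternatives of E and T it yields the rules (1’) to (4’’) above. The alternatives of F are returned unchanged because F is not left recursive.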


Indirect left recursion elimination

  • Suppose we have a rule of the form

B α → A

and the rules for B are

β1 → B
β2 → B
…
βn → B

  • The rule B α → A is now transformed into:

β1 α → A
β2 α → A
…
βn α → A


This process is repeated until either

  • the rule has the form t γ → A, where t is a terminal; the process stops, or
  • the rule has the form A γ → A; the immediate left recursion elimination rule can be applied


Left factorization

  • In general it is efficient if the alternatives of a non-terminal differ as far to the left as possible: a parser can then choose between them by looking at their first symbols only
  • Productions of the form

α β1 → A
α β2 → A
…
α βn → A

  • are equivalent with

α A’ → A
β1 → A’
…
βn → A’


Example

if b then S else S → S
if b then S → S

Only at the occurrence of else can it be decided which alternative should have been selected.

An equivalent grammar is

if b then S S’ → S
else S → S’
ε → S’
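A single left-factoring step can be sketched as follows. The function names and the list representation are assumptions for illustration, and the sketch only factors a prefix shared by all alternatives of the non-terminal. A full implementation would group alternatives by prefix and repeat the step.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(nt, alternatives):
    """Factor out the prefix α shared by all alternatives of `nt`.

    α β1 → nt, ..., α βn → nt  becomes  α nt' → nt  and  βi → nt'.
    An empty list stands for ε.
    """
    alpha = common_prefix(alternatives)
    if len(alternatives) < 2 or not alpha:
        return {nt: alternatives}            # nothing to factor
    new_nt = nt + "'"                        # assumed to be a fresh name
    return {
        nt: [alpha + [new_nt]],
        new_nt: [rhs[len(alpha):] for rhs in alternatives],
    }

rules = left_factor("S", [
    ["if", "b", "then", "S", "else", "S"],
    ["if", "b", "then", "S"],
])
print(rules)
```

For the if-then-else example this produces if b then S S' → S with the alternatives else S → S' and ε → S'.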


  • Left recursion elimination and left factorization:
    • introduce new (extra) non-terminals
    • change the structure of the derivation tree
    • may influence semantic actions connected to grammar rules


Top-down parsing

  • A top-down parser “guesses” the next alternative to be recognized, and verifies whether this alternative can be recognized in the input. If not, another alternative will be tried
  • Constructs the parse tree starting at the root
  • Finds the leftmost derivation of the sentence
  • Alternative types of top-down parsers:
    • recursive descent parser with backtracking
    • recursive descent parser without backtracking (“predictive parser”)
    • non-recursive predictive parser (uses push-down automaton)
    • generalized parser


  • Recursive descent parser with backtracking

  • Grammar:

c A d → S
a → A
a b → A


  • Parser

bool proc S() {
  if input = ‘c’
  then inptr +:= 1;
       if A()
       then if input = ‘d’
            then inptr +:= 1;
                 if input = eof
                 then return(true)
                 else return(false)
                 fi
            fi
       fi
  fi;
  return(false)
}

bool proc A() {
  isave := inptr;
  if input = ‘a’
  then inptr +:= 1;
       if input = ‘b’
       then inptr +:= 1;
            return(true)
       fi
  fi;
  inptr := isave;
  if input = ‘a’
  then inptr +:= 1;
       return(true)
  else return(false)
  fi
}
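For readers who want to run the parser, here is a Python transliteration of the two procedures. It is a sketch: the class name `BacktrackingParser`, the method names and the helper `accepts` are invented here, and the end of the string plays the role of eof.

```python
class BacktrackingParser:
    """Acceptor for the grammar  c A d → S,  a → A,  a b → A."""

    def __init__(self, text):
        self.text = text
        self.pos = 0                         # plays the role of inptr

    def peek(self):
        """Current input symbol, or None at end of input (eof)."""
        return self.text[self.pos] if self.pos < len(self.text) else None

    def parse_S(self):
        if self.peek() == "c":
            self.pos += 1
            if self.parse_A() and self.peek() == "d":
                self.pos += 1
                return self.peek() is None   # input = eof
        return False

    def parse_A(self):
        saved = self.pos                     # isave := inptr
        if self.peek() == "a":
            self.pos += 1
            if self.peek() == "b":
                self.pos += 1
                return True                  # alternative a b → A
        self.pos = saved                     # backtrack: inptr := isave
        if self.peek() == "a":
            self.pos += 1
            return True                      # alternative a → A
        return False

def accepts(text):
    return BacktrackingParser(text).parse_S()

print(accepts("cad"), accepts("cabd"), accepts("cd"))
```

On the input cad the parser first tries the alternative a b, fails on d, restores the input pointer and then succeeds with the alternative a.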


Recursive descent without backtracking

  • For “some” context-free grammars a recursive descent parser without backtracking can be derived
  • Grammar:

T E’ → E
+ T E’ → E’
ε → E’
F T’ → T
* F T’ → T’
ε → T’
( E ) → F
a → F


  • Parser:

bool proc E() {
  T();
  E’();
  if input = eof
  then return(true)
  else return(false)
  fi
}

proc E’() {
  if input = ‘+’
  then inptr +:= 1;
       T();
       E’()
  fi
}

proc T() {
  F();
  T’()
}

proc T’() {
  if input = ‘*’
  then inptr +:= 1;
       F();
       T’()
  fi
}

proc F() {
  if input = ‘a’
  then inptr +:= 1
  else if input = ‘(’
       then inptr +:= 1;
            E();
            if input = ‘)’
            then inptr +:= 1
            else ERROR()
            fi
       else ERROR()
       fi
  fi
}
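A runnable Python version of this predictive parser might look as follows. The class and method names are assumptions of this sketch, and ERROR() is modelled by raising Python's built-in SyntaxError. Unlike the pseudocode, the eof test is done once in the wrapper `accepts` rather than inside E, because E is also called for parenthesized subexpressions.

```python
class PredictiveParser:
    """Recursive descent without backtracking for the grammar
    T E' → E, + T E' → E', ε → E', F T' → T, * F T' → T', ε → T',
    ( E ) → F, a → F."""

    def __init__(self, text):
        self.text = text
        self.pos = 0                         # plays the role of inptr

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def error(self):
        raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")

    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":               # + T E' → E'
            self.pos += 1
            self.T()
            self.E_prime()
        # otherwise ε → E': consume nothing

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.peek() == "*":               # * F T' → T'
            self.pos += 1
            self.F()
            self.T_prime()
        # otherwise ε → T': consume nothing

    def F(self):
        if self.peek() == "a":               # a → F
            self.pos += 1
        elif self.peek() == "(":             # ( E ) → F
            self.pos += 1
            self.E()
            if self.peek() != ")":
                self.error()
            self.pos += 1
        else:
            self.error()

def accepts(text):
    parser = PredictiveParser(text)
    try:
        parser.E()
    except SyntaxError:
        return False
    return parser.peek() is None             # all input must be consumed

print(accepts("a+a*a"), accepts("(a+a)*a"), accepts("a+"))
```

Each procedure decides which alternative to take by inspecting only the current input symbol, so no input pointer ever has to be restored.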


Non-recursive predictive parser

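[Figure: the predictive parser reads the input (for example a + a * a $), keeps a stack with top symbol X and bottom marker $, consults the parse table M[X,a] (X a non-terminal, a a terminal or $) and produces the output]

Inspects top of stack (X) and current input symbol (a)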


There are three cases:

1. X = a = $: successful parse
2. X = a ≠ $: pop X and increment input pointer
3. X is a non-terminal
   a. M[X, a] = error
   b. M[X, a] = {U V W → X}: pop X and put W, V and U on the stack (U on top)


Grammar:

T E’ → E
+ T E’ → E’
ε → E’
F T’ → T
* F T’ → T’
ε → T’
( E ) → F
a → F

Parse table (an entry α for row X stands for the production α → X; empty entries are errors):

       a        +          *          (        )     $
E      T E’                           T E’
E’              + T E’                         ε     ε
T      F T’                           F T’
T’              ε          * F T’              ε     ε
F      a                              ( E )

Stack       Input          Output
$E          a + a * a $
$E’T        a + a * a $    T E’ → E
$E’T’F      a + a * a $    F T’ → T
$E’T’a      a + a * a $    a → F
$E’T’       + a * a $
$E’         + a * a $      ε → T’
$E’T+       + a * a $      + T E’ → E’
$E’T        a * a $
$E’T’F      a * a $        F T’ → T
$E’T’a      a * a $        a → F
$E’T’       * a $
$E’T’F*     * a $          * F T’ → T’
$E’T’F      a $
$E’T’a      a $            a → F
$E’T’       $
$E’         $              ε → T’
$           $              ε → E’
                           accept
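The trace can be reproduced with a short table-driven parser. This is a sketch under assumptions: the dictionary `TABLE` encodes the parse table of the slides with ASCII apostrophes for the primed non-terminals, and the function name `ll1_parse` is made up. The function returns the productions in the order of the Output column.

```python
# Parse table: (non-terminal, lookahead) -> right-hand side ([] stands for ε).
TABLE = {
    ("E", "a"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "a"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "a"): ["a"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the list of productions applied, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                     # the top of the stack is stack[-1]
    pos = 0
    output = []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":
            return output                    # case 1: successful parse
        if X not in NONTERMINALS:
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()                      # case 2: pop X, advance input
            pos += 1
        else:
            rhs = TABLE.get((X, a))
            if rhs is None:                  # case 3a: error entry
                raise SyntaxError(f"no table entry for ({X}, {a})")
            stack.pop()                      # case 3b: replace X by its rhs,
            stack.extend(reversed(rhs))      # first symbol of rhs on top
            output.append(" ".join(rhs or ["ε"]) + " → " + X)

for production in ll1_parse("a+a*a"):
    print(production)
```

The recursion of the recursive descent parser is replaced here by an explicit stack, which is what makes the parser non-recursive.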


Construction of parse table

  • First(α): the set of all terminals with which strings derived from α can start; and: if α ⇒* ε then ε ∈ First(α)
  • Follow(A): the set of all terminals which can follow A immediately in a sentential form


First(X):

1. initially First(X) = {} for all X ∈ N
2. if X is a terminal: First(X) := {X}
3. if ε → X then: First(X) := First(X) ∪ {ε}
4. if Y1 ... Yi-1 Yi ... Yk → X and ε ∈ First(Yj), j = 1,…, i-1
   then First(X) := First(X) ∪ First(Y1)\{ε} ∪ … ∪ First(Yi-1)\{ε} ∪ First(Yi)\{ε}
   if Y1 ... Yk → X and ε ∈ First(Yj), j = 1,…, k
   then First(X) := First(X) ∪ {ε}
5. Repeat step 4 until no new elements are added to any First set
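The fixed-point computation can be sketched in Python as below. The grammar encoding (a list of `(lhs, rhs)` pairs with ASCII apostrophes for primes and `[]` for ε) and the function name `first_sets` are assumptions of this sketch. The example grammar is the left-recursion-free expression grammar used throughout the slides.

```python
EPS = "ε"

# Each production is (lhs, rhs); rhs == [] stands for ε.
GRAMMAR = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["a"]),
]

def first_sets(grammar):
    nonterminals = {lhs for lhs, _ in grammar}
    first = {nt: set() for nt in nonterminals}           # step 1
    for _, rhs in grammar:
        for sym in rhs:
            if sym not in nonterminals:
                first[sym] = {sym}                       # step 2
    changed = True
    while changed:                                       # step 5
        changed = False
        for lhs, rhs in grammar:
            new = set()
            for sym in rhs:                              # step 4
                new |= first[sym] - {EPS}
                if EPS not in first[sym]:
                    break                                # sym cannot vanish
            else:
                new.add(EPS)      # steps 3 and 4: the whole rhs can vanish
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = first_sets(GRAMMAR)
for nt in ["E", "E'", "T", "T'", "F"]:
    print(nt, sorted(FIRST[nt]))
```

The `for ... else` clause runs only when the loop was not left with `break`, that is, when every symbol of the right-hand side (possibly none) can derive ε.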


Follow(A):

1. for all A ≠ S (where S is the start symbol): Follow(A) = {}; Follow(S) = {$}
2. if there is a production α B β → A then Follow(B) := Follow(B) ∪ First(β)\{ε}
   if there is a production α B → A, or α B β → A with ε ∈ First(β), then Follow(B) := Follow(B) ∪ Follow(A)
3. repeat step 2 until no new elements are added to any Follow set


Grammar:

T E’ → E
+ T E’ → E’
ε → E’
F T’ → T
* F T’ → T’
ε → T’
( E ) → F
a → F

First(E) = First(T) = First(F) = {(, a}
First(E’) = {+, ε}
First(T’) = {*, ε}
Follow(E) = Follow(E’) = {), $}
Follow(T) = Follow(T’) = {+, ), $}
Follow(F) = {*, +, ), $}
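The Follow sets listed here can be recomputed with a small fixed-point loop. In this sketch the First sets are copied from the slide rather than computed, primes are written with ASCII apostrophes, and the names `first_of_string` and `follow_sets` are assumptions.

```python
EPS = "ε"

# Each production is (lhs, rhs); rhs == [] stands for ε.
GRAMMAR = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["a"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

# First sets of the non-terminals, copied from the slide.
FIRST = {"E": {"(", "a"}, "T": {"(", "a"}, "F": {"(", "a"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}

def first_of_string(symbols):
    """First(β) for a string β of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})    # a terminal is its own First set
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                          # β is empty or can vanish
    return result

def follow_sets(start="E"):
    follow = {nt: set() for nt in NONTERMINALS}          # step 1
    follow[start].add("$")
    changed = True
    while changed:                                       # step 3
        changed = False
        for lhs, rhs in GRAMMAR:                         # step 2
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                beta_first = first_of_string(rhs[i + 1:])
                new = beta_first - {EPS}
                if EPS in beta_first:
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

FOLLOW = follow_sets()
for nt in ["E", "E'", "T", "T'", "F"]:
    print(nt, sorted(FOLLOW[nt]))
```

Running it gives the same Follow sets as listed on the slide.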


Construction of parse table for top-down parser

Input: context-free grammar G
Output: parse table T for G

1. For all non-terminals A and terminals a: T[A,a] := {}
2. For every rule α → A in G:
   a. For every terminal a ∈ First(α): T[A,a] := T[A,a] ∪ {α → A}
   b. If ε ∈ First(α), then:
      i. For all b ∈ Follow(A): T[A,b] := T[A,b] ∪ {α → A}
      ii. If $ ∈ Follow(A), then: T[A,$] := T[A,$] ∪ {α → A}
3. Give all empty entries of T the value error.
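The construction can be sketched in Python as follows. To keep the example self-contained, the First and Follow sets are hard-coded from the values given on the earlier slide. The names `build_table` and `is_ll1` and the dictionary layout are assumptions of this sketch. Because $ is stored as an ordinary member of the Follow sets, steps 2b.i and 2b.ii collapse into one line.

```python
EPS = "ε"

# Each production is (lhs, rhs); rhs == [] stands for ε.
GRAMMAR = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["a"]),
]
# First and Follow sets copied from the slides (ASCII apostrophes for primes).
FIRST = {"E": {"(", "a"}, "T": {"(", "a"}, "F": {"(", "a"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"*", "+", ")", "$"}}

def first_of_string(symbols):
    """First(α) for a string α of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})    # a terminal is its own First set
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table():
    table = {}                               # a missing key is an error entry
    for lhs, rhs in GRAMMAR:                 # step 2
        rhs_first = first_of_string(rhs)
        lookaheads = rhs_first - {EPS}       # step 2a
        if EPS in rhs_first:
            lookaheads |= FOLLOW[lhs]        # step 2b (i and ii)
        for a in lookaheads:
            table.setdefault((lhs, a), []).append(rhs)
    return table

def is_ll1(table):
    """True if no table entry holds more than one production."""
    return all(len(entries) == 1 for entries in table.values())

TABLE = build_table()
print(len(TABLE), is_ll1(TABLE))
```

An entry with more than one production signals a conflict, which is how a violation of the LL(1) property shows up in the table.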


LL(1) condition: A grammar with a parse table which does not contain multiple entries can be parsed predictively. For each combination of non-terminal and input symbol there is at most one choice.

The property is the LL(1)-property: LL(1) = Left-to-right parsing, Left-most derivation, 1 symbol look-ahead


Alternative definition of LL(1): Grammar G is LL(1) if and only if for each pair of productions α → A and β → A holds:

1. First(α) ∩ First(β) = ∅ (meaning α and β cannot produce strings starting with the same terminal)
2. At most one of α and β may produce the empty string
3. If ε ∈ First(β), then holds: First(α) ∩ Follow(A) = ∅ (meaning if β produces the empty string, then α may not produce strings beginning with a terminal that follows A)
