Bottom-Up Parsing (A First Step): Cocke–Younger–Kasami (CYK) algorithm



slide-1
SLIDE 1

Bottom-Up Parsing

(A First Step)

Cocke–Younger–Kasami (CYK) algorithm and Chomsky Normal Form

1

slide-2
SLIDE 2

Last time

Showed how to use Java CUP for getting ASTs
But we never saw HOW the parser works

2

slide-3
SLIDE 3

This time

Dip our toe into parsing

– Approaches to parsing
– CFG transformations
  • Useless non-terminals
  • Chomsky Normal Form: a form of grammar that is easier to deal with
– CYK: powerful, heavyweight approach to parsing

3

slide-4
SLIDE 4

Approaches to Parsing

Top Down / “Goal driven”

– Begin with the start nonterminal
– Grow parse tree downward to match the string

Bottom Up / “Data Driven”

– Start at terminals
– Generate ever larger subtrees; the goal is to obtain a single tree whose root is the start nonterminal

4

[Figure: parse tree with Expr and Term nodes over the tokens id, plus, id]

slide-5
SLIDE 5

CYK: A General Approach to Parsing (Cocke–Younger–Kasami algorithm)

Operates in time O(n^3), where n is the length of the input
Works bottom-up
Requires the grammar to be in Chomsky Normal Form

– This turns out not to be a limitation: any context-free grammar can be converted into one in Chomsky Normal Form

5

slide-6
SLIDE 6

Chomsky Normal Form

All rules must be one of two forms:

X ⟶ t (t is a terminal)
X ⟶ A B (A and B are nonterminals)

The only nonterminal allowed to derive epsilon is the start symbol S
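
The two rule shapes and the epsilon restriction can be checked mechanically. The sketch below is only an illustration: the dictionary representation (nonterminal mapped to a list of right-hand sides, with `[]` standing for epsilon) and the name `is_cnf` are assumptions of this example, not something from the slides.

```python
def is_cnf(grammar, start):
    """Return True if every rule is X -> t, X -> A B, or start -> epsilon.

    grammar maps each nonterminal to a list of right-hand sides; each
    right-hand side is a list of symbols, and [] stands for epsilon.
    Any symbol that is not a key of grammar is treated as a terminal.
    """
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            if len(rhs) == 0:
                # Only the start symbol may derive epsilon.
                if lhs != start:
                    return False
            elif len(rhs) == 1:
                # A single symbol must be a terminal (no unit rules).
                if rhs[0] in grammar:
                    return False
            elif len(rhs) == 2:
                # A pair must consist of two nonterminals.
                if rhs[0] not in grammar or rhs[1] not in grammar:
                    return False
            else:
                return False
    return True


not_cnf = {"F": [["id", "(", "N", ")"]], "N": [["id"]]}
in_cnf = {"X": [["A", "B"]], "A": [["a"]], "B": [["b"]]}
print(is_cnf(not_cnf, "F"))  # False
print(is_cnf(in_cnf, "X"))   # True
```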

6

slide-7
SLIDE 7

What CNF buys CYK

  • The fact that non-terminals come in pairs allows you to think of a subtree as a subspan of the input
  • The fact that non-terminals are not nullable (except for start) means that each subspan has at least one character

7

s = s1 s2 s3 s4

slide-8
SLIDE 8

CYK: Dynamic Programming

X ⟶ t: form the leaves of the parse tree
X ⟶ A B: form binary interior nodes of the parse tree

8

[Figure: three ways to build a tree over s1 s2 s3 s4, each rooted at S1,4 with leaves S1,1 S2,2 S3,3 S4,4: via S1,2 and S3,4; via S3,4 and S2,4; via S1,2 and S1,3]

slide-9
SLIDE 9

Running CYK …

Track every viable subtree from leaf to root. Here are all the subspans for a string of 6 terminals:

9

Each cell is labeled start,end: the starting and ending position of the subspan

1,1  2,2  3,3  4,4  5,5  6,6    (single characters)
1,2  2,3  3,4  4,5  5,6
1,3  2,4  3,5  4,6
1,4  2,5  3,6
1,5  2,6
1,6                              (full string)
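
The order in which the subspans get filled in (shortest first) can be written as a small generator. This is a sketch only; the function name `subspans` and the 1-based, inclusive positions are choices made for the example.

```python
def subspans(n):
    """Yield (start, end) pairs, 1-based and inclusive, shortest spans first."""
    for length in range(1, n + 1):
        # A span of this length can start anywhere that still fits in n.
        for start in range(1, n - length + 2):
            yield (start, start + length - 1)


spans = list(subspans(6))
print(len(spans))  # 21
print(spans[:6])   # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
print(spans[-1])   # (1, 6)
```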

slide-10
SLIDE 10

CYK Example

10

Grammar:

F ⟶ I W
F ⟶ I Y
W ⟶ L X
X ⟶ N R
Y ⟶ L R
N ⟶ id
N ⟶ I Z
Z ⟶ C N
I ⟶ id
L ⟶ (
R ⟶ )
C ⟶ ,

Input: id ( id , id )

[Figure: CYK table for the input, cells labeled start,end as on slide 9. Single tokens: I,N  L  I,N  C  I,N  R. Longer spans: Z at 4,5; X at 5,6; N at 3,5; X at 3,6; W at 2,6; F at 1,6.]

In general, go up a column and down a diagonal
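
A minimal recognizer for the grammar on this slide might look like the sketch below. It is not the course's implementation: the two lookup dictionaries, the function names, the choice of F as start symbol, and the input `id ( id , id )` (read off the table's bottom row I,N L I,N C I,N R) are assumptions of this example.

```python
from itertools import product

# Grammar from the slide, split into the two CNF rule shapes.
TERMINAL_RULES = {          # X -> t, keyed by t
    "id": {"N", "I"},
    "(": {"L"},
    ")": {"R"},
    ",": {"C"},
}
BINARY_RULES = {            # X -> A B, keyed by (A, B)
    ("I", "W"): {"F"},
    ("I", "Y"): {"F"},
    ("L", "X"): {"W"},
    ("N", "R"): {"X"},
    ("L", "R"): {"Y"},
    ("I", "Z"): {"N"},
    ("C", "N"): {"Z"},
}


def cyk_table(tokens):
    """Map each 1-based inclusive span (start, end) to the nonterminals deriving it."""
    n = len(tokens)
    table = {}
    # Leaves: rules of the form X -> t.
    for i, tok in enumerate(tokens, start=1):
        table[(i, i)] = set(TERMINAL_RULES.get(tok, ()))
    # Interior nodes: rules of the form X -> A B, shortest spans first.
    for length in range(2, n + 1):
        for start in range(1, n - length + 2):
            end = start + length - 1
            cell = set()
            # Try every way of splitting the span into two shorter spans.
            for mid in range(start, end):
                left, right = table[(start, mid)], table[(mid + 1, end)]
                for pair in product(left, right):
                    cell |= BINARY_RULES.get(pair, set())
            table[(start, end)] = cell
    return table


def accepts(tokens, start_symbol="F"):
    if not tokens:
        return False  # this grammar has no epsilon rule
    return start_symbol in cyk_table(tokens)[(1, len(tokens))]


tokens = "id ( id , id )".split()
table = cyk_table(tokens)
print(table[(4, 5)], table[(3, 5)], table[(3, 6)], table[(2, 6)], table[(1, 6)])
# {'Z'} {'N'} {'X'} {'W'} {'F'}
print(accepts(tokens))  # True
```

The three nested loops over span length, start position, and split point are where the O(n^3) running time comes from.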

slide-11
SLIDE 11

CYK Example

11

[Figure: CYK table for id ( id , id ). Single tokens: I,N  L  I,N  C  I,N  R. Longer spans shown: Z at 4,5; N at 3,5; X at 3,6; W at 2,6; F at 1,6. Grammar as on slide 10.]

slide-12
SLIDE 12

CYK Example

12

[Figure: CYK table for id ( id , id ). Single tokens: I,N  L  I,N  C  N  R. Longer spans shown: Z at 4,5; N at 3,5; X at 3,6; W at 2,6; F at 1,6. Grammar as on slide 10.]

slide-13
SLIDE 13

CYK Example

13

[Figure: CYK table for id ( id , id ). Single tokens: I,N  L  I  C  N  R. Longer spans shown: Z at 4,5; N at 3,5; X at 3,6; W at 2,6; F at 1,6. Grammar as on slide 10.]

slide-14
SLIDE 14

CYK Example

14

[Figure: CYK table for id ( id , id ). Single tokens: I,N  L  I  C  N  R. Longer spans shown: Z at 4,5; N at 3,5; X at 3,6; W at 2,6; F at 1,6. Grammar as on slide 10.]

slide-15
SLIDE 15

CYK Example

15

[Figure: CYK table for id ( id , id ). Single tokens: I,N  L  I  C  N  R. Longer spans shown: Z at 4,5; N at 3,5; X at 3,6; W at 2,6; F at 1,6. Grammar as on slide 10.]

slide-16
SLIDE 16

CYK Example

16

[Figure: CYK table for id ( id , id ). Single tokens: I,N  L  I  C  N  R. Longer spans shown: Z at 4,5; N at 3,5; X at 3,6; W at 2,6; F at 1,6. Grammar as on slide 10.]

slide-17
SLIDE 17

Cleaning up our grammars

We want to avoid unnecessary work

– Remove useless rules

17

slide-18
SLIDE 18

Eliminating Useless Nonterminals

18

  1. If a nonterminal cannot derive a sequence of terminal symbols, then it is useless
  2. If a nonterminal cannot be derived from the start symbol, then it is useless

slide-19
SLIDE 19

Eliminate Useless Nonterminals

19

If a nonterminal cannot derive a sequence of terminal symbols, then it is useless

Mark all terminal symbols
Repeat
    If all symbols on the righthand side of a production are marked
        mark the lefthand side
Until no more non-terminals can be marked

slide-20
SLIDE 20

Example:

20

S ⟶ X | Y
X ⟶ ( )
Y ⟶ ( Y Y )
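
The marking procedure can be run on this example with a few lines of Python. This is a sketch under assumptions: the dictionary representation of the grammar and the name `productive_nonterminals` are introduced for illustration, and any symbol that is not a dictionary key is treated as a terminal (so terminals count as marked from the start).

```python
def productive_nonterminals(grammar):
    """Return the nonterminals that can derive some string of terminals."""
    marked = set()
    changed = True
    while changed:  # "Repeat ... until no more non-terminals can be marked"
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in marked:
                continue
            for rhs in alternatives:
                # Terminals (symbols that are not keys) count as already marked.
                if all(sym in marked or sym not in grammar for sym in rhs):
                    marked.add(lhs)
                    changed = True
                    break
    return marked


grammar = {
    "S": [["X"], ["Y"]],
    "X": [["(", ")"]],
    "Y": [["(", "Y", "Y", ")"]],
}
productive = productive_nonterminals(grammar)
print(sorted(productive))                 # ['S', 'X']
print(sorted(set(grammar) - productive))  # ['Y']
```

Y is never marked because its only production mentions Y itself, so Y and the alternative S ⟶ Y can be dropped.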

slide-21
SLIDE 21

Eliminate Useless Nonterminals

21

If a nonterminal cannot be derived from the start symbol, then it is useless

Mark the start symbol
Repeat
    If the lefthand side of a production is marked
        mark all non-terminals on its righthand side
Until no more non-terminals can be marked

slide-22
SLIDE 22

Example:

S ⟶ A B
A ⟶ + | - | ε
B ⟶ digit | B digit
C ⟶ . B
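
The same example can be checked with a short reachability pass. This sketch uses a worklist instead of the repeat-until loop, which marks the same set; the dictionary representation (with `[]` for ε) and the name `reachable_nonterminals` are assumptions of the example.

```python
def reachable_nonterminals(grammar, start):
    """Return the nonterminals that can appear in a derivation from start."""
    marked = {start}
    worklist = [start]
    while worklist:
        lhs = worklist.pop()
        for rhs in grammar.get(lhs, []):
            for sym in rhs:
                # Only nonterminals (dictionary keys) need marking.
                if sym in grammar and sym not in marked:
                    marked.add(sym)
                    worklist.append(sym)
    return marked


grammar = {
    "S": [["A", "B"]],
    "A": [["+"], ["-"], []],          # [] stands for epsilon
    "B": [["digit"], ["B", "digit"]],
    "C": [[".", "B"]],
}
reachable = reachable_nonterminals(grammar, "S")
print(sorted(reachable))                 # ['A', 'B', 'S']
print(sorted(set(grammar) - reachable))  # ['C']
```

C is never marked because no production reachable from S mentions it.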

22

slide-23
SLIDE 23

Chomsky Normal Form

4 Steps

– Eliminate epsilon rules
– Eliminate unit rules
– Fix productions with terminals on RHS
– Fix productions with > 2 nonterminals on RHS

23

slide-24
SLIDE 24

Eliminate (Most) Epsilon Productions

24

If a nonterminal A immediately derives epsilon

– Make copies of all rules with A on the RHS, one copy for each combination of deleted occurrences of A, then remove the rule A ⟶ ε

slide-25
SLIDE 25

Example 1

25

Before:
F ⟶ id ( A )
A ⟶ ε
A ⟶ N
N ⟶ id
N ⟶ id , N

After:
F ⟶ id ( A )
F ⟶ id ( )
A ⟶ N
N ⟶ id
N ⟶ id , N

slide-26
SLIDE 26

Example 2

26

Before:
X ⟶ A x A y A
A ⟶ ε
A ⟶ z

After:
X ⟶ A x A y A | A x A y | A x y A | x A y A | A x y | x A y | x y A | x y
A ⟶ z
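
Generating "all combinations" by hand is error-prone, so here is a sketch that enumerates them for one right-hand side. The name `expand_nullable` is made up for this example, and the sketch handles a single nullable nonterminal; a copy with every symbol deleted would be a new epsilon rule and is simply skipped here.

```python
from itertools import product


def expand_nullable(rhs, nullable):
    """Return every copy of rhs obtained by keeping or deleting each
    occurrence of the nullable nonterminal (duplicates removed)."""
    # Each occurrence of the nullable symbol has two options: keep or delete.
    choices = [([sym], []) if sym == nullable else ([sym],) for sym in rhs]
    variants = []
    for pick in product(*choices):
        variant = [sym for part in pick for sym in part]
        # Skip the empty copy (it would be an epsilon rule) and duplicates.
        if variant and variant not in variants:
            variants.append(variant)
    return variants


variants = expand_nullable(["A", "x", "A", "y", "A"], "A")
for variant in variants:
    print(" ".join(variant))
```

For X ⟶ A x A y A this yields eight right-hand sides, one for each subset of the three occurrences of A.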

slide-27
SLIDE 27

Eliminate Unit Productions

27

Productions of the form A ⟶ B are called unit productions
Place B anywhere A could have appeared and remove the unit production

slide-28
SLIDE 28

Example 1

28

Before:
F ⟶ id ( A )
F ⟶ id ( )
A ⟶ N
N ⟶ id
N ⟶ id , N

After:
F ⟶ id ( N )
F ⟶ id ( )
N ⟶ id
N ⟶ id , N
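
The substitution in this example can be sketched as follows. The function name `inline_unit_rule` is hypothetical, and the sketch covers only the simple case seen here, where the unit rule A ⟶ N is the only production for A; a nonterminal with other productions would need a more general treatment.

```python
def inline_unit_rule(grammar, a, b):
    """Remove the unit rule a -> b by putting b wherever a appeared.

    Assumes a -> b is the only production for a (as in the example).
    """
    if grammar[a] != [[b]]:
        raise ValueError("sketch only handles the case where a -> b is a's only rule")
    return {
        lhs: [[b if sym == a else sym for sym in rhs] for rhs in alternatives]
        for lhs, alternatives in grammar.items()
        if lhs != a  # a itself disappears from the grammar
    }


grammar = {
    "F": [["id", "(", "A", ")"], ["id", "(", ")"]],
    "A": [["N"]],
    "N": [["id"], ["id", ",", "N"]],
}
result = inline_unit_rule(grammar, "A", "N")
for lhs, alternatives in result.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs))
```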

slide-29
SLIDE 29

Fix RHS Terminals

29

For productions with terminals and something else on the RHS

– For each terminal t add the rule X ⟶ t, where X is a new non-terminal
– Replace t with X in the original rules

slide-30
SLIDE 30

Example

30

Before:
F ⟶ id ( N )
F ⟶ id ( )
N ⟶ id
N ⟶ id , N

After:
F ⟶ I L N R
F ⟶ I L R
N ⟶ id
N ⟶ I C N
I ⟶ id
L ⟶ (
R ⟶ )
C ⟶ ,
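
This step is mechanical enough to sketch in code. The function name `lift_terminals` and the `names` argument (which supplies the new nonterminal for each terminal, here I, L, R, C as on the slide) are assumptions of this example. Right-hand sides with a single symbol are left alone, since X ⟶ t is already an allowed form.

```python
def lift_terminals(grammar, names):
    """Replace terminals inside longer right-hand sides by new nonterminals.

    names maps each terminal to the new nonterminal that will stand for it.
    """
    result = {}
    used = {}  # terminals that actually needed a new nonterminal
    for lhs, alternatives in grammar.items():
        result[lhs] = []
        for rhs in alternatives:
            if len(rhs) < 2:
                result[lhs].append(list(rhs))  # X -> t is already fine
                continue
            new_rhs = []
            for sym in rhs:
                if sym in grammar:             # nonterminal: keep as is
                    new_rhs.append(sym)
                else:                          # terminal: swap in its stand-in
                    used[sym] = names[sym]
                    new_rhs.append(names[sym])
            result[lhs].append(new_rhs)
    # Add one rule X -> t per lifted terminal.
    for terminal, nonterminal in used.items():
        result[nonterminal] = [[terminal]]
    return result


grammar = {
    "F": [["id", "(", "N", ")"], ["id", "(", ")"]],
    "N": [["id"], ["id", ",", "N"]],
}
names = {"id": "I", "(": "L", ")": "R", ",": "C"}
lifted = lift_terminals(grammar, names)
for lhs, alternatives in lifted.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs))
```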

slide-31
SLIDE 31

Fix RHS Nonterminals

31

For productions with > 2 Nonterminals on the RHS

– Replace all but the first nonterminal with a new nonterminal
– Add a rule from the new nonterminal to the replaced nonterminal sequence
– Repeat

slide-32
SLIDE 32

Example

32

Start:
F ⟶ I L N R

After one round:
F ⟶ I W
W ⟶ L N R

After two rounds:
F ⟶ I W
W ⟶ L X
X ⟶ N R
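
The replace-and-repeat loop can be sketched as below. The function name `binarize` and the `fresh_names` argument are assumptions of this example; the caller supplies enough unused nonterminal names (W and X here, matching the slide).

```python
def binarize(grammar, fresh_names):
    """Split right-hand sides with more than two symbols into chains of pairs.

    fresh_names supplies unused nonterminal names; it must contain enough
    of them and must not clash with names already in the grammar.
    """
    fresh = iter(fresh_names)
    result = {}
    pending = [(lhs, rhs) for lhs, alts in grammar.items() for rhs in alts]
    while pending:
        lhs, rhs = pending.pop(0)
        if len(rhs) > 2:
            new = next(fresh)
            # The new nonterminal derives everything after the first symbol;
            # it is processed next, so it gets split again if still too long.
            pending.insert(0, (new, rhs[1:]))
            rhs = [rhs[0], new]
        result.setdefault(lhs, []).append(list(rhs))
    return result


binary = binarize({"F": [["I", "L", "N", "R"]]}, ["W", "X"])
for lhs, alternatives in binary.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs))
```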

slide-33
SLIDE 33

Parsing is Tough

CYK parses an arbitrary CFG, but

– O(n^3) time
– Too slow!

For special classes of grammars

– O(n) time
– Examples of such classes: LL(1) and LALR(1)

33

slide-34
SLIDE 34

Classes of Grammars

LL(1)

– Scans input from Left-to-right (first L)
– Builds a Leftmost Derivation (second L)
– Can peek (1) token ahead of the token being parsed
– Top-down “predictive parsers”

LALR(1)

– Uses special lookahead procedure (LA)
– Scans input from Left-to-right (second L)
– Rightmost derivation, built in reverse (R)
– Can also peek (1) token ahead

LALR(1) is strictly more powerful, but the algorithm is harder to understand
(Java CUP generates an LALR(1) parser)

34

slide-35
SLIDE 35

Summary

We covered

  • How to parse with the CYK algorithm (dynamic programming)
  • How to put a grammar into Chomsky Normal Form

35