10/3/17 1

CSCI-2320 Syntactic Analysis (Ch 3 & Wikipedia for CYK)

Mohammad T. Irfan

AKA "parser"

Stream of tokens → Parser → Parse tree / syntax error

Question: What is the grammar?

Parsing algorithms

• Predictive parser (as opposed to backtracking)
• Recursive descent (RD) parser: each nonterminal is a function that recognizes input derivable from that nonterminal
  • Top-down
  • LL(1): left-to-right scan, leftmost derivation, and 1 token of look-ahead

RD parser for assignment stmt

Assignment → Id = Expr ;
Expr → Term {AddOp Term}
AddOp → + | -
Term → Factor {MulOp Factor}
MulOp → * | /
Factor → [UnaryOp] Primary
UnaryOp → -
Primary → Id | IntLiteral | FloatLiteral | ( Expr )

... <Lexical syntax for Id, IntLiteral, FloatLiteral> ...
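
A minimal sketch of a recursive-descent recognizer for this grammar: each nonterminal becomes one method, each {...} repetition becomes a while loop, and each [...] option becomes an if. The slide elides the lexical syntax, so the token regex below (Id = letter or underscore followed by word characters, IntLiteral = digits, FloatLiteral = digits.digits) is an assumption, as are the names tokenize, Parser, and is_valid_assignment. It is not the course's parser code.

```python
import re

# ASSUMED lexical syntax: the slide elides the real rules for Id, IntLiteral, FloatLiteral.
TOKEN_RE = re.compile(
    r"(?P<Float>\d+\.\d+)|(?P<Int>\d+)|(?P<Id>[A-Za-z_]\w*)"
    r"|(?P<Op>[-+*/=;()])|(?P<Skip>\s+)|(?P<Bad>.)"
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, ending with an EOF token."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == "Skip":
            continue
        if m.lastgroup == "Bad":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        tokens.append((m.lastgroup, m.group()))
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """One method per nonterminal; raises SyntaxError at the first error."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def at_op(self, *ops):
        kind, lexeme = self.peek()
        return kind == "Op" and lexeme in ops

    def expect(self, kind, lexeme=None):
        k, v = self.peek()
        if k != kind or (lexeme is not None and v != lexeme):
            raise SyntaxError(f"expected {lexeme or kind}, got {v or 'EOF'!r}")
        self.advance()

    def assignment(self):  # Assignment -> Id = Expr ;
        self.expect("Id")
        self.expect("Op", "=")
        self.expr()
        self.expect("Op", ";")
        self.expect("EOF")  # the statement must consume all input

    def expr(self):  # Expr -> Term {AddOp Term}
        self.term()
        while self.at_op("+", "-"):
            self.advance()
            self.term()

    def term(self):  # Term -> Factor {MulOp Factor}
        self.factor()
        while self.at_op("*", "/"):
            self.advance()
            self.factor()

    def factor(self):  # Factor -> [UnaryOp] Primary
        if self.at_op("-"):
            self.advance()
        self.primary()

    def primary(self):  # Primary -> Id | IntLiteral | FloatLiteral | (Expr)
        kind, lexeme = self.peek()
        if kind in ("Id", "Int", "Float"):
            self.advance()
        elif self.at_op("("):
            self.advance()
            self.expr()
            self.expect("Op", ")")
        else:
            raise SyntaxError(f"unexpected {lexeme or 'EOF'!r}")


def is_valid_assignment(text):
    """Syntax check only: True iff text derives from Assignment."""
    try:
        Parser(text).assignment()
        return True
    except SyntaxError:
        return False
```

For example, `is_valid_assignment("x = 2 * (y - 3.5) / -z;")` returns True, while `"x = 2 +;"` and `"x = (1 + 2;"` are rejected. Because the choice at every step depends only on the next token, no backtracking is needed.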

Python code for smaller version

Expr → Term {(+|-) Term}
Term → Factor {(*|/) Factor}
Factor → IntLiteral

... <Lexical syntax for IntLiteral> ...

• Code is available on Blackboard under Assignment 2
• parser_v1.py: Only checks for syntactic correctness (expression evaluation comes later, when we do semantics)

Requirements for RD parser

  • 1. Remove left recursion (why?)
  • 2. Do "left factoring"
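
A hypothetical sketch of why requirement 1 matters (the function names and the single-digit Term are simplifications, not from the slides): a literal RD translation of the left-recursive rule Expr → Expr + Term calls itself before consuming any token, so it recurses until Python raises RecursionError. The EBNF form Expr → Term {+ Term}, used on the earlier slides, becomes a loop that consumes at least one token per iteration.

```python
def term(tokens, pos):
    """Term -> digit (simplified for this demo); returns the next position."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    raise SyntaxError(f"expected a digit at position {pos}")


def expr_left_recursive(tokens, pos):
    """Naive translation of Expr -> Expr + Term: recurses at the same pos forever."""
    pos = expr_left_recursive(tokens, pos)  # no token consumed before this call
    if pos < len(tokens) and tokens[pos] == "+":
        return term(tokens, pos + 1)
    return pos


def expr_loop(tokens, pos):
    """Expr -> Term {+ Term}: the left recursion rewritten as iteration."""
    pos = term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        pos = term(tokens, pos + 1)
    return pos


def never_terminates(parse, tokens):
    """True if parse(tokens, 0) blows the recursion limit."""
    try:
        parse(tokens, 0)
    except RecursionError:
        return True
    return False
```

Here `never_terminates(expr_left_recursive, ["1", "+", "2"])` returns True, while `expr_loop(["1", "+", "2"], 0)` returns 3, the position just past the last token.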

Removing left recursion

• Example [figure; annotated "No left recursion here"]
• Algorithm (assume no cycle; i.e., no A ⇒+ A)

Nonterminals: A1, A2, ..., An (ordered arbitrarily)
For each i from 1 to n:
    For each j < i:
        Let Aj → δ1 | δ2 | ... | δk
        Replace each production Ai → Aj γ by
            Ai → δ1 γ | δ2 γ | ... | δk γ
    Eliminate immediate left recursion from all Ai productions
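
The algorithm above can be sketched in Python as follows. Assumptions not taken from the slide: a grammar is a dict mapping each nonterminal to a list of productions, each production a tuple of symbols; dict insertion order gives A1, ..., An; the empty tuple stands for ε; and new nonterminals are named by appending a prime. As in the textbook statement of this algorithm, correctness is only guaranteed when the grammar has no cycles and no ε-productions.

```python
def eliminate_immediate(g, a):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
    A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | ε."""
    alphas = [p[1:] for p in g[a] if p and p[0] == a]
    if not alphas:
        return  # no immediate left recursion on a
    betas = [p for p in g[a] if not (p and p[0] == a)]
    new = a + "'"
    while new in g:  # pick an unused primed name
        new += "'"
    g[a] = [beta + (new,) for beta in betas]
    g[new] = [alpha + (new,) for alpha in alphas] + [()]  # () is ε


def eliminate_left_recursion(grammar):
    order = list(grammar)  # A1, ..., An
    g = {a: [tuple(p) for p in prods] for a, prods in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace each Ai -> Aj γ by Ai -> δ1 γ | ... | δk γ
            replaced = []
            for p in g[ai]:
                if p and p[0] == aj:
                    replaced.extend(delta + p[1:] for delta in g[aj])
                else:
                    replaced.append(p)
            g[ai] = replaced
        eliminate_immediate(g, ai)
    return g
```

For the indirect case S → A a | b, A → S c | d, substituting S into A gives A → A a c | b c | d, and the immediate step then yields A → b c A' | d A' and A' → a c A' | ε.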

Left factoring

• IfStmt → if Expr then Stmt
  IfStmt → if Expr then Stmt else Stmt
• Why can't an RD parser deal with it?
• Solution
  • Find the longest common prefix α and factor it out:

      A → α β1 | α β2

    becomes

      A → α A'
      A' → β1 | β2
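
One way to apply this rule mechanically, sketched with the same hypothetical dict-of-tuples grammar encoding (left_factor and common_prefix are illustrative names): group a nonterminal's alternatives by their first symbol, factor each group's longest common prefix α into a fresh primed nonterminal, and repeat until no two alternatives of any nonterminal start with the same symbol.

```python
def common_prefix(seqs):
    """Longest tuple that is a prefix of every sequence in seqs."""
    prefix = []
    for symbols in zip(*seqs):
        if any(s != symbols[0] for s in symbols):
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(grammar):
    # Copy the grammar, dropping duplicate alternatives (order preserved).
    g = {a: list(dict.fromkeys(tuple(p) for p in prods)) for a, prods in grammar.items()}
    worklist = list(g)
    while worklist:
        a = worklist.pop()
        groups = {}
        for p in g[a]:
            groups.setdefault(p[:1], []).append(p)  # key () for an ε alternative
        for first, group in groups.items():
            if not first or len(group) < 2:
                continue
            alpha = common_prefix(group)
            new = a + "'"
            while new in g:
                new += "'"
            g[new] = [p[len(alpha):] for p in group]  # A' -> β1 | β2 | ...
            g[a] = [p for p in g[a] if p not in group] + [alpha + (new,)]  # A -> α A'
            worklist += [a, new]  # both may still need factoring
            break  # g[a] changed, so regroup it on a later iteration
    return g
```

On the IfStmt rules this produces IfStmt → if Expr then Stmt IfStmt' and IfStmt' → ε | else Stmt, so the parser can defer the choice until it sees whether `else` follows.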

Literature review

• NP-hard: Given a CFG, is there an LL(1) parser?
• Impossibility example:

  $L_G = \{a^n 0 b^n \mid n \ge 1\} \cup \{a^n 1 b^{2n} \mid n \ge 1\}$

• Why is an LL(1) parser impossible?

Literature review

• Is there always a parser (not necessarily LL(1)) for any CFG?
• CYK algorithm: Cocke & Younger (1967) and Kasami (1965)
  • First parser for any CFG
  • Bottom-up parser
• Frost (2007): First top-down parser for any CFG; improved by Ridge (2014)

CYK Parsing Algorithm

https://en.wikipedia.org/wiki/CYK_algorithm

What it does

• Given (1) a CFG and (2) a string, verifies whether the string can be derived by this grammar
• Example
  • Detects syntactic errors in a given C program

Requirements

• CFG must be in Chomsky Normal Form (CNF); every production has one of the forms
  A → BC
  A → a
• No ε in any production
• OK to have left recursion!
• Left factoring is out of the question (why?)
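
A small check for the CNF requirement, sketched with the same hypothetical dict-of-tuples encoding (is_cnf is an illustrative name, and any symbol that is not a key of the dict is treated as a terminal): every production must be exactly two nonterminals or exactly one terminal, which also rules out ε (the empty tuple).

```python
def is_cnf(grammar):
    """True iff every production is A -> B C (two nonterminals) or A -> a (one terminal)."""
    nonterminals = set(grammar)
    for prods in grammar.values():
        for p in prods:
            binary = len(p) == 2 and all(s in nonterminals for s in p)
            terminal = len(p) == 1 and p[0] not in nonterminals
            if not (binary or terminal):
                return False  # e.g. A -> ε, A -> B, or A -> a B
    return True
```

With this encoding the class participation grammar (S → AX | AB, X → SB, A → 0, B → 1) passes, while a grammar containing Expr → Expr + Term or the unit production Expr → Term fails.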

Idea

• Bottom-up approach + dynamic programming
• Start with individual symbols of input string
• Combine multiple symbols together
  • 2 symbols
  • 3 symbols
  • ...
• Climb up the grammar hierarchy
• Yes answer to parsing if we can get to the start symbol

CYK example

• Input CFG
  Expr → Expr + Term | Expr – Term | Term
  Term → Term * Factor | Term / Factor | Factor
  Factor → 0 | 1 | ... | 9
• CNF
  Expr → Expr X
  X → AddOp Term
  AddOp → + | –
  Expr → Term Y      # Avoid bypassing Expr → Term → ...
  Term → Term Y
  Y → MultOp Factor
  MultOp → * | /
  Factor → 0 | 1 | ... | 9
  Term → 0 | 1 | ... | 9
  Expr → 0 | 1 | ... | 9
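
The comment about bypassing Expr → Term → ... refers to eliminating unit productions (A → B). A sketch of that step under the same hypothetical dict-of-tuples encoding (function names are illustrative, and ASCII "-" stands in for the slide's "–"): each nonterminal A inherits every non-unit production of each B reachable from A through unit productions. On the input CFG this yields, for example, Expr → Term * Factor, which the CNF above then binarizes as Expr → Term Y.

```python
DIGITS = [(d,) for d in "0123456789"]
EXPR_CFG = {  # the input CFG of the CYK example
    "Expr": [("Expr", "+", "Term"), ("Expr", "-", "Term"), ("Term",)],
    "Term": [("Term", "*", "Factor"), ("Term", "/", "Factor"), ("Factor",)],
    "Factor": DIGITS,
}


def unit_closure(grammar, a):
    """All B such that A =>* B using only unit productions (A -> B)."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for p in grammar[x]:
            if len(p) == 1 and p[0] in grammar and p[0] not in seen:
                seen.add(p[0])
                stack.append(p[0])
    return seen


def remove_unit_productions(grammar):
    result = {}
    for a in grammar:
        prods = []
        for b in sorted(unit_closure(grammar, a)):
            for p in grammar[b]:
                is_unit = len(p) == 1 and p[0] in grammar
                if not is_unit and p not in prods:
                    prods.append(p)
        result[a] = prods
    return result
```

Afterwards Expr and Term each derive digits directly, matching the slide's Term → 0 | ... | 9 and Expr → 0 | ... | 9.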

CYK example (cont...)

• Input string: 2 – 3 * 4

Length \ Start index j | 1                  | 2     | 3                  | 4      | 5
(token)                | 2                  | –     | 3                  | *      | 4
1                      | Expr, Term, Factor | AddOp | Expr, Term, Factor | MultOp | Expr, Term, Factor
2                      |                    | X     |                    | Y      |
3                      | Expr               |       | Term, Expr         |        |
4                      |                    | X     |                    |        |
5                      | Expr               |       |                    |        |

CYK Algorithm

Inputs: CNF grammar and n tokens
Fill in the row for length 1
For each length i from 2 to n:
    For each start index j from 1 to n-i+1:        (Is there A → BC?)
        For k = length of B from 1 to i-1:
            If there's a production A → BC s.t.
                B is in cell (j, k) and
                C is in cell (j+k, i-k):
                    Add A to cell (j, i)
Return True iff cell (1, n) contains the start symbol.
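
The pseudocode translates almost line for line into Python. This sketch keeps the slide's 1-based (start index j, length i) cell convention; the dict-of-tuples grammar encoding and the names cyk_table, cyk, and EXPR_CNF are illustrative assumptions, and ASCII "-" stands in for the slide's "–". The three nested loops over i, j, k give the usual O(n³·|G|) running time.

```python
def cyk_table(grammar, tokens):
    """Cell (j, i) = nonterminals deriving the i tokens that start at index j (1-based)."""
    n = len(tokens)
    binary = [(a, p[0], p[1]) for a, prods in grammar.items() for p in prods if len(p) == 2]
    table = {}
    # Row for length 1: A -> a
    for j, tok in enumerate(tokens, start=1):
        table[(j, 1)] = {a for a, prods in grammar.items() if (tok,) in prods}
    for i in range(2, n + 1):          # span length
        for j in range(1, n - i + 2):  # start index, 1 .. n-i+1
            cell = set()
            for k in range(1, i):      # length of B, 1 .. i-1
                left, right = table[(j, k)], table[(j + k, i - k)]
                cell.update(a for a, b, c in binary if b in left and c in right)
            table[(j, i)] = cell
    return table


def cyk(grammar, start, tokens):
    """True iff cell (1, n) contains the start symbol."""
    if not tokens:
        return False  # a CNF grammar without ε cannot derive the empty string
    return start in cyk_table(grammar, tokens)[(1, len(tokens))]


DIGITS = [(d,) for d in "0123456789"]
EXPR_CNF = {  # the CNF grammar from the CYK example
    "Expr": [("Expr", "X"), ("Term", "Y")] + DIGITS,
    "X": [("AddOp", "Term")],
    "AddOp": [("+",), ("-",)],
    "Term": [("Term", "Y")] + DIGITS,
    "Y": [("MultOp", "Factor")],
    "MultOp": [("*",), ("/",)],
    "Factor": DIGITS,
}
```

`cyk(EXPR_CNF, "Expr", "2 - 3 * 4".split())` returns True, and `cyk_table` on the same tokens reproduces the table on the example slide (e.g. cell (3, 3) is {Term, Expr} and cell (1, 5) is {Expr}); `"2 + 3 * /".split()` is rejected.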

Negative example

• Input string: 2 + 3 * /

Expr → Expr X
X → AddOp Term
AddOp → + | –
Expr → Term Y
Term → Term Y
Y → MultOp Factor
MultOp → * | /
Factor → 0 | 1 | ... | 9
Term → 0 | 1 | ... | 9
Expr → 0 | 1 | ... | 9

Class Participation 4

• CNF grammar
  S → AX | AB
  X → SB
  A → 0
  B → 1
• Parse the following strings using the CYK algorithm
  • 0011 ✔
  • 01010 ✗
• Collaboration level: 0 (work freely in groups)