10/3/17
CSCI-2320 Syntactic Analysis (Ch 3 & Wikipedia for CYK)
Mohammad T. Irfan
AKA "parser"
Stream of tokens → Parser → Parse tree / syntax error
Question: What is the grammar?
Parsing algorithms
• Predictive parser (as opposed to backtracking)
• Recursive descent (RD) parser: each nonterminal is a function that recognizes input derivable from that nonterminal
• Top-down
• LL(1): left-to-right scan, leftmost derivation, and 1 token of look-ahead
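To make the "each nonterminal is a function" idea concrete, here is a minimal sketch of a predictive RD recognizer. It is not the course's parser_v1.py. It assumes a hypothetical single-digit expression grammar, already free of left recursion: Expr → Term ExprRest, ExprRest → (+ | -) Term ExprRest | ε, Term → Factor TermRest, TermRest → (* | /) Factor TermRest | ε, Factor → 0 | ... | 9. The names RDParser, ParseError, and accepts are illustrative only.

```python
DIGITS = set("0123456789")


class ParseError(Exception):
    """Raised when the token stream is not derivable from Expr."""


class RDParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # The single token of look-ahead; None marks the end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        self.pos += 1

    def expr(self):       # Expr -> Term ExprRest
        self.term()
        self.expr_rest()

    def expr_rest(self):  # ExprRest -> (+ | -) Term ExprRest | epsilon
        if self.peek() in ("+", "-"):
            self.advance()
            self.term()
            self.expr_rest()
        # Any other look-ahead selects the epsilon alternative.

    def term(self):       # Term -> Factor TermRest
        self.factor()
        self.term_rest()

    def term_rest(self):  # TermRest -> (* | /) Factor TermRest | epsilon
        if self.peek() in ("*", "/"):
            self.advance()
            self.factor()
            self.term_rest()

    def factor(self):     # Factor -> 0 | 1 | ... | 9
        tok = self.peek()
        if tok in DIGITS:
            self.advance()
        else:
            raise ParseError(f"expected a digit at position {self.pos}, got {tok!r}")

    def parse(self):
        self.expr()
        if self.peek() is not None:
            raise ParseError(f"unexpected token {self.peek()!r} at position {self.pos}")
        return True


def accepts(tokens):
    """Return True if tokens form a valid expression, False on a syntax error."""
    try:
        return RDParser(tokens).parse()
    except ParseError:
        return False
```

Every decision is made by looking at one token only, so the parser never backtracks. For example, accepts(list("2-3*4")) succeeds, while accepts(list("2-*4")) reports a syntax error.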
... <Lexical syntax for Id, IntLiteral, FloatLiteral> ...
... <Lexical syntax for IntLiteral> ...
• Code is available on Blackboard under Assignment 2
• parser_v1.py: only checks for syntactic correctness (expression evaluation comes later, when we do semantics)
• Example
• Algorithm (assume no cycle; i.e., no A ⇒+ A)
  Nonterminals: A1, A2, ..., An (ordered arbitrarily)
  For each i
    For each j < i
      Let Aj → δ1 | δ2 | ... | δk
      Replace each Ai → Aj γ by Ai → δ1γ | δ2γ | ... | δkγ
    Eliminate left recursion from all Ai productions
No left recursion here
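The left-recursion elimination algorithm can be sketched in Python as follows. This is an illustrative sketch under assumptions, not code from the course: a grammar is a dict mapping each nonterminal to a list of productions, each production is a list of symbols, the empty list stands for ε, and the new nonterminal is named by appending a prime (assumed not to clash with an existing name). As in the algorithm, the input is assumed to have no cycles; it is also assumed to have no ε-productions.

```python
def eliminate_immediate(nt, productions):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | epsilon."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    others = [p for p in productions if not p or p[0] != nt]
    if not recursive:
        return {nt: productions}
    new_nt = nt + "'"  # hypothetical naming scheme for the helper nonterminal
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is epsilon
    }


def eliminate_left_recursion(grammar, order):
    """grammar: dict nonterminal -> list of productions; order: A1, ..., An."""
    g = {nt: [list(p) for p in prods] for nt, prods in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for p in g[ai]:
                if p and p[0] == aj:
                    # Replace Ai -> Aj gamma by Ai -> delta gamma for each Aj -> delta.
                    expanded.extend(delta + p[1:] for delta in g[aj])
                else:
                    expanded.append(p)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g
```

For instance, with S → A a | b and A → S c | d in the order S, A, the substitution step turns A → S c into A → A a c | b c, and the immediate step then yields A → b c A' | d A' and A' → a c A' | ε.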
• IfStmt → if Expr then Stmt
• IfStmt → if Expr then Stmt else Stmt
• Why can't an RD parser deal with it?
• Solution
  • Find the longest common prefix α and factor it out
    A → αβ1 | αβ2
    becomes
    A → αA'
    A' → β1 | β2
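The factoring step can be sketched in Python. This is an illustrative sketch under assumptions, not course code: productions are lists of symbols, the empty list stands for ε, alternatives are grouped by their first symbol, and each new nonterminal is named by appending primes. It performs a single pass, so the alternatives of a new nonterminal may need factoring again.

```python
def common_prefix(productions):
    """Longest sequence of symbols shared at the start of every production."""
    prefix = []
    for symbols in zip(*productions):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor(nt, productions):
    """One pass of left factoring over the productions of nonterminal nt."""
    groups = {}
    for p in productions:
        key = p[0] if p else None  # group alternatives by their first symbol
        groups.setdefault(key, []).append(p)

    result = {nt: []}
    count = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[nt].extend(group)  # nothing to factor
            continue
        alpha = common_prefix(group)
        count += 1
        new_nt = nt + "'" * count  # hypothetical naming scheme
        result[nt].append(alpha + [new_nt])                # A  -> alpha A'
        result[new_nt] = [p[len(alpha):] for p in group]   # A' -> beta1 | beta2
    return result
```

Applied to the two IfStmt productions, this gives IfStmt → if Expr then Stmt IfStmt' and IfStmt' → ε | else Stmt, so one token of look-ahead (else or not) is enough to choose an alternative.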
• NP-hard: Given a CFG, is there an LL(1) grammar for the same language?
• Impossibility example:
  L_G = {a^n 0 b^n | n >= 1} ∪ {a^n 1 b^(2n) | n >= 1}
• Why is an LL(1) grammar impossible?
• Is there always a parser (not necessarily LL(1)) for any CFG?
• CYK algorithm: Cocke & Younger (1967) and Kasami (1965)
  • First parser for any CFG
  • Bottom-up parser
• Frost (2007): First top-down parser for any CFG
https://en.wikipedia.org/wiki/CYK_algorithm
• Given (1) a CFG and (2) a string, verifies whether the CFG can generate the string
• Example
  • Detects syntactic errors in a given C program
• CFG must be in Chomsky Normal Form (CNF)
  • No ε in any production
  • OK to have left recursion!
  • Left factoring is out of the question (why?)
• Bottom-up approach + dynamic programming
• Start with individual symbols of input string
• Combine multiple symbols together
  • 2 symbols
  • 3 symbols
  • ...
• Climb up the grammar hierarchy
• Yes answer to parsing
• Input CFG
  Expr → Expr + Term | Expr – Term | Term
  Term → Term * Factor | Term / Factor | Factor
  Factor → 0 | 1 | ... | 9
• CNF
  Expr → Expr X
  X → AddOp Term
  AddOp → + | –
  Expr → Term Y    # Avoid bypassing Expr → Term → ...
  Term → Term Y
  Y → MultOp Factor
  MultOp → * | /
  Factor → 0 | 1 | ... | 9
  Term → 0 | 1 | ... | 9
  Expr → 0 | 1 | ... | 9
• Input string: 2 – 3 * 4
  CYK table: each cell lists the nonterminals that derive the substring of the given length starting at index j ("-" marks an empty cell).

  Length | j = 1              | j = 2 | j = 3              | j = 4  | j = 5
  1      | Expr, Term, Factor | AddOp | Expr, Term, Factor | MultOp | Expr, Term, Factor
  2      | -                  | X     | -                  | Y      |
  3      | Expr               | -     | Term, Expr         |        |
  4      | -                  | X     |                    |        |
  5      | Expr               |       |                    |        |
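The worked example for 2 – 3 * 4 can be reproduced with a short Python sketch of the CYK table fill. The rule encoding (terminal_rules, binary_rules) and the function names are assumptions made for illustration. The code uses 0-based start indices where the slides use 1-based ones, and the ASCII "-" in place of "–".

```python
def cyk_table(tokens, terminal_rules, binary_rules):
    """table[length][start] = set of nonterminals deriving tokens[start:start+length]."""
    n = len(tokens)
    table = [[set() for _ in range(n)] for _ in range(n + 1)]

    # Length 1: rules of the form A -> terminal.
    for start, tok in enumerate(tokens):
        table[1][start] = {lhs for lhs, terminal in terminal_rules if terminal == tok}

    # Longer substrings: try every split into a left part and a right part.
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            for split in range(1, length):
                left = table[split][start]
                right = table[length - split][start + split]
                for lhs, b, c in binary_rules:  # rule A -> B C
                    if b in left and c in right:
                        table[length][start].add(lhs)
    return table


def cyk_accepts(tokens, terminal_rules, binary_rules, start_symbol):
    n = len(tokens)
    if n == 0:
        return False  # CNF as used here has no epsilon productions
    return start_symbol in cyk_table(tokens, terminal_rules, binary_rules)[n][0]


# The CNF grammar from the slides.
TERMINAL_RULES = [(lhs, d) for lhs in ("Expr", "Term", "Factor") for d in "0123456789"]
TERMINAL_RULES += [("AddOp", "+"), ("AddOp", "-"), ("MultOp", "*"), ("MultOp", "/")]
BINARY_RULES = [
    ("Expr", "Expr", "X"),
    ("X", "AddOp", "Term"),
    ("Expr", "Term", "Y"),
    ("Term", "Term", "Y"),
    ("Y", "MultOp", "Factor"),
]

table = cyk_table(list("2-3*4"), TERMINAL_RULES, BINARY_RULES)
```

The string is accepted because the start symbol Expr appears in table[5][0], the cell covering the whole input. The three nested loops over length, start, and split give the usual cubic running time in the length of the input.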
• Input string:
  Expr → Expr X
  X → AddOp Term
  AddOp → + | –
  Term → Term Y
  Y → MultOp Factor
  MultOp → * | /
  Factor → 0 | 1 | ... | 9
  Term → 0 | 1 | ... | 9
  Expr → 0 | 1 | ... | 9
• CNF grammar
• Parse the following strings using the CYK algorithm
  • 0011 ✔
  • 01010 ✗
• Collaboration level: 0 (work freely in groups)