Parsing: CSCI 3130 Formal Languages and Automata Theory, Siu On CHAN

Fall 2018, Chinese University of Hong Kong

Context-free versus regular
Every regular language (described by a regular expression, NFA, or DFA) is context-free.
Write a CFG for the language (0 + 1)∗111:
S → U111
U → 0U | 1U | ε
Can you do so for every regular language?
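One way to sanity-check the grammar above is to enumerate every string it derives up to some length and compare the result against the regular expression. The sketch below is a hypothetical helper, not part of the slides: it assumes single-character symbols, treats any symbol that is not a key of the grammar dictionary as a terminal, and relies on the pruning being finite, which holds here because every sentential form contains at most one variable.

```python
import re
from itertools import product

# S → U111, U → 0U | 1U | ε   ("" stands for ε)
GRAMMAR = {"S": ["U111"], "U": ["0U", "1U", ""]}


def language_up_to(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    results, seen, stack = set(), set(), [start]
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        # Terminals never disappear in a derivation, so prune forms with too many.
        if sum(c not in grammar for c in form) > max_len:
            continue
        # Expand the leftmost variable, if there is one.
        i = next((k for k, c in enumerate(form) if c in grammar), None)
        if i is None:
            results.add(form)
            continue
        for body in grammar[form[i]]:
            stack.append(form[:i] + body + form[i + 1:])
    return results


def regex_language_up_to(pattern, alphabet, max_len):
    """Every string over alphabet of length <= max_len that fully matches pattern."""
    return {
        w
        for n in range(max_len + 1)
        for w in map("".join, product(alphabet, repeat=n))
        if re.fullmatch(pattern, w)
    }


print(language_up_to(GRAMMAR, "S", 6) == regex_language_up_to("[01]*111", "01", 6))
# → True
```

A bounded comparison like this is only evidence, not a proof, that the grammar and the expression describe the same language.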

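From regular to context-free
Convert a regular expression into a CFG case by case:
∅ ⇒ grammar with no rules
ε ⇒ S → ε
a (alphabet symbol) ⇒ S → a
E1 + E2 ⇒ S → S1 | S2
E1E2 ⇒ S → S1S2
E∗ ⇒ S → SS1 | ε
Here S1 and S2 are the start variables of the grammars already built for E1 and E2 (S1 for E in the last case), and S becomes the new start variable.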
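The case table above translates directly into a recursive function. This is a minimal sketch under stated assumptions: the regular expression is given as a small syntax tree of tuples (a hypothetical encoding, not from the slides), fresh variables are named S0, S1, …, and each rule body is a list of symbols, with [] standing for ε.

```python
from itertools import count


def regex_to_cfg(regex):
    """Build a CFG for a regex tree; returns (rules, start_variable).

    Hypothetical tree encoding: ("empty",), ("eps",), ("sym", a),
    ("union", r1, r2), ("concat", r1, r2), ("star", r).
    """
    rules = {}
    fresh = count()

    def build(node):
        var = f"S{next(fresh)}"  # the new start variable for this subexpression
        kind = node[0]
        if kind == "empty":
            rules[var] = []                      # ∅: no rules at all
        elif kind == "eps":
            rules[var] = [[]]                    # S → ε
        elif kind == "sym":
            rules[var] = [[node[1]]]             # S → a
        elif kind == "union":
            s1, s2 = build(node[1]), build(node[2])
            rules[var] = [[s1], [s2]]            # S → S1 | S2
        elif kind == "concat":
            s1, s2 = build(node[1]), build(node[2])
            rules[var] = [[s1, s2]]              # S → S1 S2
        elif kind == "star":
            s1 = build(node[1])
            rules[var] = [[var, s1], []]         # S → S S1 | ε
        else:
            raise ValueError(f"unknown regex node {kind!r}")
        return var

    start = build(regex)
    return rules, start


rules, start = regex_to_cfg(("star", ("sym", "a")))
print(start, rules)
# → S0 {'S1': [['a']], 'S0': [['S0', 'S1'], []]}
```

Each subexpression gets its own fresh variable, so the grammars for E1 and E2 never share variables, which is what makes the union and concatenation cases safe.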

Context-free versus regular
Is every context-free language regular? No.
L = {0ⁿ1ⁿ | n ≥ 0} is context-free but not regular:
S → 0S1 | ε
So the regular languages form a proper subset of the context-free languages.
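Because each use of S → 0S1 wraps the string in one more 0 and one more 1, membership in L can be decided by peeling those layers back off until only ε remains. A minimal sketch, with a hypothetical helper name `in_L`:

```python
def in_L(w):
    """Decide whether w is in {0^n 1^n | n >= 0} by following S → 0S1 | ε."""
    if w == "":
        return True  # S → ε
    # S → 0S1: strip one 0 from the front and one 1 from the back, then recurse
    return len(w) >= 2 and w[0] == "0" and w[-1] == "1" and in_L(w[1:-1])


print(in_L("0011"), in_L("0101"))
# → True False
```

The recursion depth grows with n, which fits the fact that no DFA, with its fixed finite memory, can recognize L.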

Ambiguity

Ambiguity
A CFG is ambiguous if some string has more than one parse tree.
E → E + E | E * E | (E) | N
N → 1 | 2
The string 1+2*2 has two parse trees: one groups it as 1+(2*2) = 5, the other as (1+2)*2 = 6 ✗.
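To see the ambiguity concretely, one can compute the value of every parse tree of a string. The sketch below is a hypothetical helper, assuming single-character tokens with no spaces; for each substring it tries every rule of E → E + E | E * E | (E) | N that could have produced it, so the length of the result equals the number of parse trees.

```python
from functools import lru_cache


def tree_values(expr):
    """Values of all parse trees of expr under E → E + E | E * E | (E) | N, N → 1 | 2."""

    @lru_cache(maxsize=None)
    def values(i, j):
        # Values of every parse tree deriving expr[i:j] from E.
        out = []
        if j - i == 1 and expr[i] in "12":                        # E → N
            out.append(int(expr[i]))
        if j - i >= 3 and expr[i] == "(" and expr[j - 1] == ")":  # E → (E)
            out.extend(values(i + 1, j - 1))
        for k in range(i + 1, j - 1):                             # E → E + E | E * E
            if expr[k] in "+*":
                for left in values(i, k):
                    for right in values(k + 1, j):
                        out.append(left + right if expr[k] == "+" else left * right)
        return tuple(out)

    return sorted(values(0, len(expr)))


print(tree_values("1+2*2"))
# → [5, 6]
```

Note that ambiguity does not always show up in the value: 1+2+1 also has two parse trees, but both evaluate to 4.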

Example
Is S → SS | x ambiguous?
Yes, because there are two ways to derive xxx: in S → SS, either the first S derives x and the second derives xx, or the first derives xx and the second derives x.
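The number of parse trees of xⁿ follows the same case split: either S → x (only when n = 1), or S → SS, where the first S covers some prefix xᵏ with 1 ≤ k < n. A minimal sketch with a hypothetical function name `trees`:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def trees(n):
    """Number of parse trees of x^n (n >= 1) under S → SS | x."""
    if n == 1:
        return 1  # S → x (S → SS would need a nonempty split)
    # S → SS: the first S derives x^k, the second derives x^(n-k)
    return sum(trees(k) * trees(n - k) for k in range(1, n))


print([trees(n) for n in range(1, 6)])
# → [1, 1, 2, 5, 14]
```

These are the Catalan numbers, so the ambiguity gets worse quickly as the string grows.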

Disambiguation
Sometimes we can rewrite the grammar to remove ambiguity:
S → SS | x ⇒ S → Sx | x
In the new grammar every string xⁿ has exactly one parse tree, which grows to the left.
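With S → Sx | x a parser never has a choice to make: the innermost S → x derives the first x, and each S → Sx adds one more x on the right. A sketch that builds this unique tree, with trees encoded as nested tuples (an assumption of this example, not the slides' notation):

```python
def parse(s):
    """Return the unique parse tree of s under S → Sx | x, or None if s is not x^n."""
    if not s or set(s) != {"x"}:
        return None
    tree = ("S", "x")            # S → x derives the leftmost x
    for _ in s[1:]:
        tree = ("S", tree, "x")  # S → Sx adds one more x on the right
    return tree


print(parse("xxx"))
# → ('S', ('S', ('S', 'x'), 'x'), 'x')
```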

Disambiguation
E → E + E | E * E | (E) | N
N → 1 | 2
Problem: in this grammar, + and * have the same precedence!
Decompose an expression into terms (T) and factors (F), for example 2 * (1 + 2 * 2).
[Figure: 2 * (1 + 2 * 2) split into terms T and factors F]

Disambiguation
Rewrite E → E + E | E * E | (E) | N, N → 1 | 2 as follows:
An expression is a sum of one or more terms.
Each term is a product of one or more factors.
Each factor is a parenthesized expression or a number.
E → T | E + T
T → F | T * F
F → (E) | 1 | 2
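The layered grammar maps naturally onto a recursive-descent evaluator with one function per variable. One assumption to flag: recursive descent cannot handle the left-recursive rule E → E + T literally (parse_E would call itself before consuming input), so this sketch reads E → T | E + T as "a T followed by zero or more + T", and likewise for T, which keeps the left-to-right grouping. Input is assumed to be single-character tokens without spaces; the function name `evaluate` is hypothetical.

```python
def evaluate(expr):
    """Evaluate expr using E → T | E + T, T → F | T * F, F → (E) | 1 | 2."""
    pos = 0  # index of the next unread character

    def peek():
        return expr[pos] if pos < len(expr) else None

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_E():
        # E → T | E + T, read as: T followed by zero or more "+ T"
        value = parse_T()
        while peek() == "+":
            eat("+")
            value += parse_T()
        return value

    def parse_T():
        # T → F | T * F, read as: F followed by zero or more "* F"
        value = parse_F()
        while peek() == "*":
            eat("*")
            value *= parse_F()
        return value

    def parse_F():
        # F → (E) | 1 | 2
        ch = peek()
        if ch == "(":
            eat("(")
            value = parse_E()
            eat(")")
            return value
        if ch in ("1", "2"):
            eat(ch)
            return int(ch)
        raise ValueError(f"unexpected {ch!r} at position {pos}")

    value = parse_E()
    if pos != len(expr):
        raise ValueError(f"unexpected {expr[pos]!r} at position {pos}")
    return value


print(evaluate("1+2*2"), evaluate("2*(1+2*2)"))
# → 5 10
```

Because parse_T is called from inside parse_E, every * is grouped before any +, which is exactly the precedence the rewritten grammar encodes.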

Parsing example
E → T | E + T
T → F | T * F
F → (E) | 1 | 2
[Figure: parse tree for 2+(1+1+2*2)+1; the parenthesized part (1+1+2*2) forms a single factor F → (E)]
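One way to check that the rewritten grammar gives this input a single parse tree is to count its parse trees with a span-based dynamic program, one function per variable. This is a sketch with a hypothetical helper `count_trees`, assuming single-character tokens; a count of 1 on particular inputs is evidence of unambiguity, not a proof.

```python
from functools import lru_cache


def count_trees(s):
    """Number of parse trees of s from E under E → T | E + T, T → F | T * F, F → (E) | 1 | 2."""

    @lru_cache(maxsize=None)
    def E(i, j):
        total = T(i, j)                       # E → T
        for k in range(i + 1, j - 1):         # E → E + T, with '+' at position k
            if s[k] == "+":
                total += E(i, k) * T(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def T(i, j):
        total = F(i, j)                       # T → F
        for k in range(i + 1, j - 1):         # T → T * F, with '*' at position k
            if s[k] == "*":
                total += T(i, k) * F(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def F(i, j):
        if j - i == 1:
            return 1 if s[i] in "12" else 0   # F → 1 | 2
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            return E(i + 1, j - 1)            # F → (E)
        return 0

    return E(0, len(s))


print(count_trees("2+(1+1+2*2)+1"), count_trees("1+2*2"))
# → 1 1
```

Every recursive call works on a strictly shorter span or moves down from E to T to F on the same span, so the recursion always terminates even though the grammar is left-recursive.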

Disambiguation
In programming languages, ambiguity comes from the precedence rules, and we can resolve it as in the example.
In English, ambiguity is sometimes a problem: "I look at the dog with one eye."
Disambiguation is not always possible because
there exist inherently ambiguous languages, and
there is no general procedure for disambiguation.

Parsing
S → 0S1 | 1S0S | T
T → S | ε
Input: 0011
Is 0011 ∈ L? If so, how can a program build a parse tree?

Parsing
Try all derivations? Input: 0011
S → 0S1 | 1S0S | T
T → S | ε
S ⇒ 0S1, 1S0S, or T
0S1 ⇒ 00S11, 01S0S1, 0T1, …
1S0S ⇒ 10S10S, …
T ⇒ S or ε
00S11 ⇒ 00T11 ⇒ 0011 ✓
This is (part of) the tree of all derivations, not the parse tree.

Problems
1. Trying all derivations may take too long.
2. If the input is not in the language, parsing will never stop.
Let's tackle the 2nd problem.
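Both problems show up in a direct implementation of "try all derivations". The sketch below searches leftmost derivations of the grammar S → 0S1 | 1S0S | T, T → S | ε breadth-first. The parameter `max_forms` is a hypothetical safety cap added only so the demo halts: without it the search returns True for strings in the language but never returns for strings outside it.

```python
from collections import deque

# S → 0S1 | 1S0S | T,  T → S | ε   ("" stands for ε)
GRAMMAR = {"S": ["0S1", "1S0S", "T"], "T": ["S", ""]}


def derives(target, grammar=GRAMMAR, start="S", max_forms=10_000):
    """Breadth-first search over leftmost derivations.

    Returns True if target is derived; gives up and returns False once more
    than max_forms distinct sentential forms have been generated.
    """
    seen = {start}
    queue = deque([start])
    while queue and len(seen) <= max_forms:
        form = queue.popleft()
        if form == target:
            return True
        i = next((k for k, c in enumerate(form) if c in grammar), None)
        if i is None:
            continue  # all terminals, but not the target
        for body in grammar[form[i]]:
            nxt = form[:i] + body + form[i + 1:]
            if nxt not in seen:  # skip forms already generated, e.g. S ⇒ T ⇒ S
                seen.add(nxt)
                queue.append(nxt)
    return False


print(derives("0011"))
# → True
```

For an input such as 1, which is not in the language, the queue never empties, so the only thing that stops the search is the artificial cap.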

When to stop
S → 0S1 | 1S0S | T
T → S | ε
Idea: stop when |derived string| > |input|.
Problems:
The derived string may shrink because of ε-productions: S ⇒ 0S1 ⇒ 0T1 ⇒ 01
The derivation may loop because of unit productions: S ⇒ T ⇒ S ⇒ T ⇒ …
Fix: remove ε- and unit productions.
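Once a grammar has no ε- or unit productions, the stopping rule becomes safe. The sketch below applies it to a hypothetical example grammar S → 0S1 | 01 (not from the slides), which has neither kind of production. Since no step can shrink a sentential form, discarding forms longer than the input never discards a successful derivation, and only finitely many forms are that short, so the search always halts.

```python
# Hypothetical example grammar with no ε- or unit productions: S → 0S1 | 01
GRAMMAR = {"S": ["0S1", "01"]}


def derives_bounded(target, grammar=GRAMMAR, start="S"):
    """Search leftmost derivations, pruning any form longer than target.

    Assumes the grammar has no ε-productions (so forms never shrink); then
    pruning is safe and the search halts because short forms are finite.
    """
    stack, seen = [start], {start}
    while stack:
        form = stack.pop()
        if form == target:
            return True
        i = next((k for k, c in enumerate(form) if c in grammar), None)
        if i is None:
            continue  # all terminals, but not the target
        for body in grammar[form[i]]:
            nxt = form[:i] + body + form[i + 1:]
            # The slide's idea: stop when |derived string| > |input|
            if len(nxt) <= len(target) and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


print(derives_bounded("000111"), derives_bounded("0101"))
# → True False
```

Unlike the unbounded search, this one returns False for strings outside the language without needing any artificial cap.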

Removing ε-productions
Goal: remove all A → ε rules for every non-start variable A.
If S is the start variable and the rule S → ε exists:
Add a new start variable T.
Add the rule T → S.
For every rule A → ε where A is not the (new) start variable:
1. Remove the rule A → ε.
2. If you see B → αAβ, add a new rule B → αβ.
Example:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
Removing B → ε: D → BC adds D → C.
Removing C → ε: S → ACD adds S → AD; D → BC adds D → B; D → C adds D → ε.
Removing D → ε: S → ACD adds S → AC; S → AD adds S → A; C → ED adds C → E.
Result (B has no rules left):
S → ACD | AD | AC | A
A → a
C → ED | E
D → BC | C | B | b
E → b

Eliminating ε-productions
For every A → ε rule where A is not the start variable:
1. Remove the rule A → ε.
2. If you see B → αAβ, add a new rule B → αβ.
Do step 2 every time A appears: B → αAβAγ yields B → αβAγ, B → αAβγ, and B → αβγ.
B → A becomes B → ε. If B → ε was removed earlier, don't add it back.
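The procedure above can be sketched as follows. Assumptions of this sketch: variables are single uppercase letters, each rule body is a string of symbols with "" standing for ε, and the fresh start variable is hard-coded as T (the function raises an error if that name is taken). Variables are processed in dictionary order, which for the example reproduces the slides' order B, C, D.

```python
from itertools import product


def remove_epsilon(grammar, start):
    """Remove A → ε rules for every non-start variable A.

    grammar maps each variable to a set of bodies ("" is ε).
    Returns (new_grammar, new_start).
    """
    g = {var: set(bodies) for var, bodies in grammar.items()}
    if "" in g[start]:
        # S → ε exists: add a new start variable T with T → S.
        if "T" in g:
            raise ValueError("fresh start variable name 'T' is already in use")
        g["T"] = {start}
        start = "T"
    removed = set()  # variables whose A → ε rule has already been removed
    while True:
        a = next((v for v in g if v != start and "" in g[v]), None)
        if a is None:
            return g, start
        g[a].discard("")  # 1. remove the rule A → ε
        removed.add(a)
        for head in g:
            new_bodies = set()
            for body in g[head]:
                spots = [i for i, sym in enumerate(body) if sym == a]
                # 2. every time A appears: add the body with any subset of its A's dropped
                for keep in product((True, False), repeat=len(spots)):
                    drop = {i for i, k in zip(spots, keep) if not k}
                    new = "".join(sym for i, sym in enumerate(body) if i not in drop)
                    if new == "" and head in removed:
                        continue  # B → ε was removed earlier: don't add it back
                    new_bodies.add(new)
            g[head] |= new_bodies


example = {"S": {"ACD"}, "A": {"a"}, "B": {""}, "C": {"ED", ""}, "D": {"BC", "b"}, "E": {"b"}}
g, start = remove_epsilon(example, "S")
print(sorted(g["S"]))
# → ['A', 'AC', 'ACD', 'AD']
```

Trying every subset of the occurrences of A is what implements "do step 2 every time A appears".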

Eliminating unit productions
A unit production is a production of the form A → B.
Grammar:
S → 0S1 | 1S0S | T
T → S | R | ε
R → 0SR
Unit production graph: S → T, T → S, T → R
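The slide stops at the unit production graph. A common way to finish the job, assumed here rather than taken from the slides, is: for each variable A, find every B reachable from A in the graph, give A all of B's non-unit rules, and then drop the unit productions. This sketch leaves ε-rules untouched (T → ε here), on the assumption that ε-productions are removed in a separate step; "" stands for ε and variables are single characters.

```python
# Grammar from the slide; "" stands for ε.
GRAMMAR = {"S": ["0S1", "1S0S", "T"], "T": ["S", "R", ""], "R": ["0SR"]}


def is_unit(body, grammar):
    """A body is a unit production body if it is a single variable."""
    return len(body) == 1 and body in grammar


def unit_graph(grammar):
    """Edges A → B for every unit production A → B."""
    return {a: {b for b in bodies if is_unit(b, grammar)} for a, bodies in grammar.items()}


def unit_reachable(grammar):
    """For each A, every B != A with A ⇒ ... ⇒ B using unit productions only."""
    graph = unit_graph(grammar)
    reach = {}
    for a in grammar:
        seen, stack = set(), [a]
        while stack:  # depth-first search in the unit production graph
            for b in graph[stack.pop()]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        reach[a] = seen - {a}
    return reach


def remove_units(grammar):
    """Give A every non-unit body of A and of each B it reaches; drop unit rules."""
    reach = unit_reachable(grammar)
    return {
        a: sorted({body for v in {a} | reach[a] for body in grammar[v]
                   if not is_unit(body, grammar)})
        for a in grammar
    }


print(sorted(unit_reachable(GRAMMAR)["S"]), remove_units(GRAMMAR)["S"])
# → ['R', 'T'] ['', '0S1', '0SR', '1S0S']
```

After this step the loop S ⇒ T ⇒ S ⇒ … from the earlier slide is gone, because no rule has a single variable as its body.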
