Parsing Parsing involves: determining if a string belongs to a - PowerPoint PPT Presentation

Parsing ◮ Parsing involves: ◮ determining if a string belongs to a language, and ◮ constructing structure of string if it belongs to language. ◮ Two approaches to constructing parsers: 1. Top down parsing: Our focus is on table-driven predictive parsing. Perform a left most derivation of string. 2. Bottom up: SLR(1), Canonical LR(1), and LALR(1). Perform a right-most derivation in reverse. ◮ Common theme: Finite state controller for stack automaton. In top down parsing, the states were implicit. In LR parsing the states are all explicit. 1 / 18

Top Down Parsing Introduction ◮ Top-down parser starts at the top of the tree, and tries to construct the tree that led to the given token string. Can be viewed as an attempt to find left-most derivation for an input string. ◮ Constrains on a top-down parser: 1. Start from the root, and construct tree solely from tokens and rules. 2. It must scan tokens left to right. ◮ Recursive descent as well as non-recursive predictive parsers. ◮ Approach for a table driven parser: ◮ Construct a CFG. CFG must be in a certain specific form. If not, apply transformations. (We will do these last). ◮ Construct a table that uniquely determines what productions to apply given a nonterminal and an input symbol. ◮ Use a parser driver to recognize strings. 2 / 18

Grammar Transformations ◮ Left recursion: E E T + E + T F E → E + T | T T T → T ∗ F | F F id F → id | ( E ) F id id ◮ Top down parser will start by expanding E to E + T , then expand E to E + T again. It may therefore keep expanding on it infinitely often. It could have expanded using rule E → T , but given the input string. Also, since it has not consumed any input, it must keep expanding using some rule, in this case the same rule. No top down parsers can handle left-recursive grammars. ◮ Solution: rewrite the grammar so that left recursion can be eliminated. Note: two kinds of left recursion: Immediate and indirect. A → B α | · · · B → A β | · · · 3 / 18

Behavior of Parser ◮ Parser may create the parse tree by applying all possible rules until a parse tree is constructed. ◮ Consider grammar G1 (shown below) and string bcde . S → ee | bAc | bAe A → d | cA ◮ Behavior of parser as it tries to build the parse tree: S ⇒ bAc ⇒ bcAc ⇒ bcdc ***Show the parse tree.** ◮ Parser must backtrack now. Here it must backtrack all the way up to the root and try rule S → bAe What kind of search? ◮ Problems: As the parser creates the parse tree, it uses up the tokens. When it backtracks, it must be able to go back to tokens it has already consumed. If scanner is under control of parser, scanner also must backtrack to produce the tokens again or parser must have a separate buffer. ◮ Backtracking slows down parsing and hence not an attractive approach. ◮ Chage the grammar: S → ee | bAQ Q → c | e A → d | cA Factor out the common prefix. Now parser can grow the tree without backtracking. 4 / 18

Predictive Parsers ◮ Compute the terminal symbols that a terminal can produce. Use this information to select a rule during derivation. ◮ Consider grammar: S → Ab | Bc A → Df | CA B → gA | e → dC | c C D → h | i ◮ For an input string gchfc , a simple parser may have to do great deal of backtracking before it finds the derivation. Backtracking can be avoided if parser can look ahead in grammar to anticipate what symbols are derivable. Consider possible leftmost derivation starting from S: hfb Dfb ifb Ab dCAb CAb cAb S gAc Bc eC Choose S → Ab if string begins with c , d , h , or i . Choose S → Bc if string begins with g , e . ◮ First: terminals that can begin strings derivable from a nonterminal. First( Ab ) = { c , d , h , i } First( Bc ) = { e , g } ◮ Parsers that use First are known as predictive parsers . 5 / 18

Non-recursive Predictive Parsers for LL(1) grammars ◮ Skip recursive descent predictive parsers (hopefully you wrote one in ECS 140A). ◮ Consists of a simple control procedure that runs off a table: 1. Input buffer 2. Stack 3. Parsing table, M[A, a]. 4. Output stream ◮ Constructing parse = ⇒ constructing table. We will look at it later. ◮ Table: for each nonterminal, specify the rule that should be used to expand the nonterminal for a given input symbol. ◮ Example table for Grammar: 1. E → TQ → 2. T FR 3. Q → + TQ | − TQ | ǫ 4. R → ∗ FR | / FT | ǫ 5. F → ( E ) | id id + − ∗ / ( ) $ E TQ TQ Q + TQ − TQ ǫ ǫ Blanks: error conditions. T FR FR R ǫ ǫ ∗ FR / FR ǫ ǫ F id ( E ) 6 / 18

Behavior of Parser ◮ Initially stack contains S $ with S at top, and input contains w $. ◮ Behavior of parser at each step: let X be at the top of stack, and a be a symbol: 1. X = a � = $: pop X off stack and advance to next symbol. 2. X is NT, consult table M [ X , a ]. If M [ X , a ] = Y 1 Y 2 · · · Y n : 2.1 pop X off 2.2 Push Y 1 Y 2 · · · Y n on stack with Y 1 on top. If no entry, issue error. At this step, parser has determined the rule that can be applied and uses that for derivation. 3. X = a = $ : Parser halts. That is, we have matched all symbols. 7 / 18

Stack Input Production $ E ( id + id ) ∗ id $ $ QT ( id + id ) ∗ id $ E → TQ $ QRF ( id + id ) ∗ id $ T → FR $ QR ) E ( ( id + id ) ∗ id $ → ( E ) F $ QR ) E id + id ) ∗ id $ Pop token $ QR ) QT id + id ) ∗ id $ E → TQ $ QR ) QRF id + id ) ∗ id $ T → FR $ QR ) QR id id + id ) ∗ id $ → id F $ QR ) QR + id ) ∗ id $ Pop token $ QR ) Q + id ) ∗ id $ R → ǫ $ QR ) QT + + id ) ∗ id $ Q → + TQ $ QR ) QT id ) ∗ id $ pop token $ QR ) QRF id ) ∗ id $ → T FR $ QR ) QR id id ) ∗ id $ F → id $ QR ) QR ) ∗ id $ Pop token $ QR ) Q ) ∗ id $ R → ǫ $ QR ) ) ∗ id $ → ǫ Q $ QR ∗ id $ Pop token $ QRF ∗ ∗ id $ R → ∗ FR $ QRF id $ Pop token $ QR id id $ → id F $ QR $ Pop token $ Q $ R → ǫ $ $ Q → ǫ Accept 8 / 18

LL(1) Parser Stack Input Production $ E ( id ∗ )$ $ QT ( id ∗ )$ E → TQ $ QRF ( id ∗ )$ → T FR $ QR ) E ( ( id ∗ )$ F → ( E ) ◮ Example: $ QR ) E id ∗ )$ Pop token $ QR ) QT id ∗ )$ E → TQ $ QR ) QRF id ∗ )$ → T FR $ QR ) QR id ) id ∗ )$ F → cfgId $ QR ) QR ∗ )$ Pop token $ QRF ∗ ∗ )$ T → + FR $ QRF )$ Error ◮ A correct, leftmost parse is guarateed. ◮ All grammars in the LL(1) class are unambiguous: ambiguity implies two or more distinct leftmost parse. This means more than one correct predictions possible. ◮ LL(1) parsers operate in linear time, and at most, linear space (relative to the length of the input being parsed). 9 / 18

First ◮ First: terminals that can begin strings derivable from a nonterminal. ◮ Algorithm for evaluating First( α ): 1) α is a single character or ǫ : ◮ terminal or ǫ = ⇒ First( α ) = α ◮ Nonterminal and α → β 1 | β 2 | · · · | β n = ⇒ First( α ) = ∪ k First( β k ) 2) α = X 1 X 2 · · · X n : First( α ) = {} ; j := 0; repeat j := j + 1; include First( X j ) in First( α ) until X j does not derive ǫ or j = n ; if X n derives ǫ , add { ǫ } to First( α ). ◮ Example: S → ABCd → e | f | ǫ A B → g | h | ǫ C → p | q First( ABCd ) = { e , f } ∪ { g , h } ∪ { p , q } = { e , f , g , h , p , q } 10 / 18

Follow ◮ Sometimes First does not contain enough information for the parser to choose the right rule for derivation, especially when grammar contains ǫ − productions: S → XY X → a | ǫ → Y c What does X do when it sees on input c? ◮ Follow tells us when to use ǫ productions: check whether the forthcoming token is in the First set. If it is not and there is a ǫ production, check if the token is in Follow . If it is, then use the ǫ production. Else there is a parsing error. ◮ Follow( A ): set of all terminals that can come right after A in any sentential form. ◮ Assume that end of string denoted by $. ◮ Algorithm for evaluating Follow( A ): 1. If A is starting symbol, put $ in Follow( A ). 2. For all productions of form Q → α A β : a) if β begins with a terminal a , add a to Follow( A ) otherwise Follow( A ) includes First( β ) −{ ǫ } . b) if β = ǫ or if β derives ǫ , add Follow( Q ) in Follow( A ). 11 / 18

Example: First and Follow ◮ Grammar: 1. E → TQ → 2. T FR → + TQ | − TQ | ǫ 3. Q R → ∗ FR | / FT | ǫ 4. 5. F → ( E ) | id ◮ First: First( E ) = First( T ) = First( F ) = { ( , id } First( Q ) = { + , − , ǫ } First( R ) = {∗ , /, ǫ } ◮ Follow: Follow( E ) = { $ , ) } Follow( Q ) = Follow( E ) Follow( T ) = { + , − , ) , $ } Follow( R ) = { + , − , ) , $ } Follow( F ) = { + , − , ) , ∗ , /, $ } 12 / 18

Parsing Parsing involves: determining if a string belongs to a - PowerPoint PPT Presentation

Parsing Parsing involves: determining if a string belongs to a language, and constructing structure of string if it belongs to language. Two approaches to constructing parsers: 1. Top down parsing: Our focus is on table-driven

Introduction to Bottom-Up Parsing Shift-reduce parsing The LR parsing algorithm

CSC 4181 Compiler Construction Parsing 1 1 Outline Top-down v.s. Bottom-up Top-down parsing

Robust Incremental Neural Semantic Graph Parsing Jan Buys and Phil Blunsom Dependency Parsing vs

Basic Parsing Algorithms Chart Parsing Seminar Recent Advances in Parsing Technology WS

Models of Human Parsing Experimental Data 2 Informatics 2A: Lecture 22 Eye-tracking Reading

Outline LR Parsing Review of bottom-up parsing LALR Parser Generators Computing the

Graph-Based Parsing Joakim Nivre Uppsala University Department of Linguistics and Philology

Dependency Parsing II CMSC 470 Marine Carpuat Graph-based Dependency Parsing Slides credit:

Generalised Parsing and Combinator Parsing A Happy Marriage? L. Thomas van Binsbergen

Parsing as Deduction Joseph K uhner March 24, 2007 Joseph K uhner Parsing as Deduction

Bottom-up parsing LR parsing Construct parse tree for input from leaves up LR( k ) parsing

Compilers Shift-Reduce Parsing Alex Aiken Shift-Reduce Parsing Important Fact #1 about

Parsing, Part I Jim Royer April 2, 2019 CIS 352 Parsing, Part I 1 Miss Teen South

Programming Languages: Parsing Onur Tolga S ehito glu Computer Engineering,METU 27 May

* 07/16/96 Plan for Today Shift-reduce parsing The problem with predictive top down parsing

Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing Workshop on

Neural Joint Model for Transition-based Chinese Syntactic Analysis Shuhei Kurita Daisuke Kawahara

Heterogeneous Logs Biplob Debnath and Will Dennis NEC Laboratories America, Inc .

OSSAMS Open Source Security Assessment Management System

A GENERAL FORMULA FOR OPTION PRICES IN A STOCHASTIC VOLATILITY MODEL Stephen Chin and Daniel

Practical Semantic Parsing for Spoken Language Understanding NAACL 2019 Marco Damonte 1 , Rahul

Backpropagating through Structured Argmax using a SPIGOT Hao Peng, Sam Thomson, Noah A. Smith

Supporting Analysis of SQL Queries in PHP AiR David Anderson and Mark Hills (@hillsma on Twitter)

TF-TRT BEST PRACTICE, EAST AS AN EXAMPLE Xiaowei Wang ( ), Dec 18 th , 2019