Compiler Design and Construction
Top-Down Parsing: Slides modified from Louden Book and Dr. Scherger

Top Down Parsing (COSC 4353)
- A top-down parsing algorithm parses an input string of tokens by tracing out the steps in a leftmost derivation.
- Such an algorithm is called top-down because the implied traversal of the parse tree is a preorder traversal.

Parsing
- A top-down parser "discovers" the parse tree by starting at the root (the start symbol) and expanding (predicting) downward in a depth-first manner.
  - It predicts the derivation before the matching is done.
- A bottom-up parser starts at the leaves (terminals) and determines which production generates them. It then determines the rules that generate their parents, and so on, until it reaches the root (S).

Parsing Example
Consider the following grammar:
    <program> → begin <stmts> end $
    <stmts> → SimpleStmt ; <stmts>
    <stmts> → begin <stmts> end ; <stmts>
    <stmts> → λ
Input: begin SimpleStmt; SimpleStmt; end $

Top-down Parsing Example
Input: begin SimpleStmt; SimpleStmt; end $
The parse tree is grown from the root, one production at a time:
    Step 1: <program>
    Step 2: begin <stmts> end $                             (<program> → begin <stmts> end $)
    Step 3: begin SimpleStmt ; <stmts> end $                (<stmts> → SimpleStmt ; <stmts>)
    Step 4: begin SimpleStmt ; SimpleStmt ; <stmts> end $   (<stmts> → SimpleStmt ; <stmts>)
    Step 5: begin SimpleStmt ; SimpleStmt ; end $           (<stmts> → λ)
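The derivation traced above can be reproduced by a small predictive recognizer. What follows is only a sketch in C, assuming a scanner has already produced an array of tokens that ends with `$`. The `Tok` enum and the names `parse_program`, `stmts` and `match` are illustrative and do not come from the slides.

```c
#include <stdbool.h>

/* Token kinds for the example grammar (hypothetical names). */
typedef enum { T_BEGIN, T_END, T_SIMPLE, T_SEMI, T_EOF } Tok;

static const Tok *cur;   /* the lookahead: next unread token */
static bool ok;          /* cleared on the first mismatch */

/* Consume the lookahead if it is the expected token, else record an error. */
static void match(Tok expect) {
    if (ok && *cur == expect) {
        if (expect != T_EOF) cur++;   /* never read past $ */
    } else {
        ok = false;
    }
}

/* <stmts> -> SimpleStmt ; <stmts> | begin <stmts> end ; <stmts> | lambda */
static void stmts(void) {
    if (!ok) return;
    switch (*cur) {
    case T_SIMPLE:
        match(T_SIMPLE); match(T_SEMI); stmts();
        break;
    case T_BEGIN:
        match(T_BEGIN); stmts(); match(T_END); match(T_SEMI); stmts();
        break;
    default:
        break;   /* predict lambda: consume nothing */
    }
}

/* <program> -> begin <stmts> end $ */
bool parse_program(const Tok *tokens) {
    cur = tokens;
    ok = true;
    match(T_BEGIN); stmts(); match(T_END); match(T_EOF);
    return ok;
}
```

One token of lookahead is enough here because each alternative of <stmts> starts with a different token, and the λ alternative is chosen for anything else.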

Two Kinds of Top Down Parsers
- Predictive parsers try to make decisions about the structure of the tree below a node based on a few lookahead tokens (usually one!).
  - This means that the next token (or the next k tokens) must select a single rule to expand.
  - This is a weakness, since little program structure has been seen before predictive decisions must be made.
- Backtracking parsers solve the lookahead problem by backtracking if one decision turns out to be wrong and making a different choice.
  - But such parsers are slow (exponential time in general).

Top Down Parsers (cont.)
- Fortunately, many practical techniques have been developed to overcome the predictive lookahead problem, and the version of predictive parsing called recursive-descent is still the method of choice for hand-coding, due to its simplicity.
- But because of the inherent weakness of top-down parsing, it is not a good choice for machine-generated parsers. Instead, more powerful bottom-up parsing methods should be used (Chapter 5).

Recursive Descent Parsing
- Simple, elegant idea:
  - Use the grammar rules as recipes for procedure code.
  - Each non-terminal (lhs) corresponds to a procedure.
  - Each appearance of a terminal in the rhs of a rule causes a token to be matched.
  - Each appearance of a non-terminal corresponds to a call of the associated procedure.

Recursive Descent Example
Grammar rule: factor → ( exp ) | number
Code:
    void factor(void)
    { if (token == number)
        match(number);
      else {
        match('(');
        exp();
        match(')');
      }
    }

Recursive Descent Example (cont.)
Note how lookahead is not a problem in this example: if the token is number, go one way; if the token is '(', go the other; and if the token is neither, declare an error:
    void match(Token expect)
    { if (token == expect)
        getToken();              /* advance to the next input token */
      else
        error(token, expect);
    }

Recursive Descent Example (cont.)
A recursive-descent procedure can also compute values or syntax trees:
    int factor(void)
    { if (token == number) {
        int temp = atoi(tokStr);   /* value of the number token */
        match(number);
        return temp;
      }
      else {
        match('(');
        int temp = exp();          /* value of the parenthesized expression */
        match(')');
        return temp;
      }
    }

Errors in Recursive Descent Are Tricky to Handle:
- If an error occurs, we must somehow gracefully exit possibly many recursive calls.
- Best solution: use exception handling to manage stack unwinding (which C doesn't have!).
- But there are worse problems:
  - left recursion doesn't work!
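On the stack-unwinding point: C has no exceptions, but the standard `setjmp`/`longjmp` pair can play a similar role in a small parser. The sketch below is one possible approach, not the slides' method. It parses a simplified grammar (expr → factor { + factor }) straight from a string, and the names `parse`, `syntax_error` and `expr` are made up for illustration (`expr` is used instead of `exp` to avoid clashing with the C math library).

```c
#include <ctype.h>
#include <setjmp.h>
#include <stdbool.h>

static const char *p;      /* cursor into the input string */
static jmp_buf on_error;   /* where syntax errors unwind to */

/* Abandon every pending recursive call and return to parse(). */
static void syntax_error(void) { longjmp(on_error, 1); }

static void skip_ws(void) { while (*p == ' ') p++; }

static void match(char expect) {
    skip_ws();
    if (*p != expect) syntax_error();
    p++;
}

static void expr(void);

/* factor -> ( expr ) | number */
static void factor(void) {
    skip_ws();
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
    } else {
        match('(');
        expr();
        match(')');
    }
}

/* expr -> factor { + factor }   (simplified for this sketch) */
static void expr(void) {
    factor();
    skip_ws();
    while (*p == '+') {
        p++;
        factor();
        skip_ws();
    }
}

/* Returns true if the whole string is a valid expression. */
bool parse(const char *input) {
    p = input;
    if (setjmp(on_error) != 0)
        return false;          /* reached via longjmp from any depth */
    expr();
    skip_ws();
    if (*p != '\0') syntax_error();
    return true;
}
```

A single `longjmp` leaves all nested calls at once, so none of the parsing procedures needs to check and propagate an error code. The cost is that `longjmp` runs no cleanup code, so memory allocated during the parse would have to be tracked separately.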

Left recursion is impossible!
exp → exp addop term | term
    void exp(void)
    { if (token == ??) {
        exp();     // uh, oh!! exp calls itself before consuming any input
        addop();
        term();
      }
      else term();
    }

Review on EBNF

Extra Notation:
- So far: Backus-Naur Form (BNF)
  - Metasymbols are | → ε
- Extended BNF (EBNF):
  - New metasymbols [...] and {...}
  - ε largely eliminated by these

EBNF Metasymbols:
- Brackets [...] mean "optional" (like ? in regular expressions):
  - exp → term '|' exp | term
    becomes: exp → term [ '|' exp ]
  - if-stmt → if ( exp ) stmt | if ( exp ) stmt else stmt
    becomes: if-stmt → if ( exp ) stmt [ else stmt ]
- Braces {...} mean "repetition" (like * in regexps - see next slide)

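Braces in EBNF
- Replace only left-recursive repetition:
  - exp → exp + term | term
    becomes: exp → term { + term }
- Left associativity still implied
- Watch out for choices:
  - exp → exp + term | exp - term | term
    is not the same as exp → term { + term } | term { - term }
  - The second form cannot mix + and - in one expression: it has no derivation for 1 + 2 - 3. Put the choice inside the braces through a nonterminal instead: exp → term { addop term }, addop → + | -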

Simple Expressions in EBNF
    exp → term { addop term }
    addop → + | -
    term → factor { mulop factor }
    mulop → *
    factor → ( exp ) | number

Left recursion is impossible!
exp → exp addop term | term
    void exp(void)
    { if (token == ??) {
        exp();     // uh, oh!! exp calls itself before consuming any input
        addop();
        term();
      }
      else term();
    }

EBNF to the rescue!
exp → term { addop term }
    void exp(void)
    { term();
      while (token == '+' || token == '-') {   /* token is an addop */
        addop();
        term();
      }
    }

This code can even left associate:
    int exp(void)
    { int temp = term();
      while (token == '+' || token == '-')
        if (token == '+') { match('+'); temp += term(); }
        else { match('-'); temp -= term(); }
      return temp;
    }
Left associativity tells us how to evaluate 5-7+2: is it -4 or 0? Grouping from the left gives (5-7)+2 = 0, which is what the loop computes.
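To see the left-associating loop run end to end, here is a self-contained sketch of an evaluator for the simple-expression EBNF grammar. It reads characters directly from a string instead of calling a scanner, which is an assumption made for brevity. The names `evaluate`, `peek` and `had_error` are illustrative, and `expr` stands in for the slides' `exp` to avoid clashing with the C math library.

```c
#include <ctype.h>

static const char *src;   /* cursor into the expression text */
static int had_error;     /* set to 1 on the first syntax error */

static int expr(void);

/* Skip blanks and return the next character without consuming it. */
static char peek(void) {
    while (*src == ' ') src++;
    return *src;
}

static void match(char expect) {
    if (peek() == expect) src++;
    else had_error = 1;
}

/* factor -> ( exp ) | number */
static int factor(void) {
    if (isdigit((unsigned char)peek())) {
        int value = 0;
        while (isdigit((unsigned char)*src))
            value = value * 10 + (*src++ - '0');
        return value;
    }
    match('(');
    int value = had_error ? 0 : expr();
    match(')');
    return value;
}

/* term -> factor { * factor } */
static int term(void) {
    int value = factor();
    while (peek() == '*') {
        match('*');
        value *= factor();
    }
    return value;
}

/* exp -> term { addop term }: the loop folds operands from the left. */
static int expr(void) {
    int value = term();
    while (peek() == '+' || peek() == '-') {
        if (peek() == '+') { match('+'); value += term(); }
        else               { match('-'); value -= term(); }
    }
    return value;
}

/* Evaluates text; stores 1 in *error if the input is malformed. */
int evaluate(const char *text, int *error) {
    src = text;
    had_error = 0;
    int value = expr();
    if (peek() != '\0') had_error = 1;
    *error = had_error;
    return value;
}
```

With this sketch, `evaluate("5-7+2", &err)` yields 0 rather than -4, because `value` accumulates 5, then 5-7, then (5-7)+2.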

Note that right recursion/assoc. is not a problem:
exp → term [ addop exp ]
    void exp(void)
    { term();
      if (token == '+' || token == '-') {   /* token is an addop */
        addop();
        exp();
      }
    }
Right associativity tells us how to group 5*2^2 (20 or 100?) and, after a = 5, what a = b = 2 leaves in a (2 or 5?). Grouping from the right gives 5*(2^2) = 20 and a = (b = 2), so a is 2.
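The same right-recursive shape can compute values. As a hedged illustration, the sketch below evaluates a small grammar with an exponent operator, power → number [ ^ power ] and product → power { * power }. This grammar is an assumption made for the example and is not taken from the slides, as are the names `eval_product`, `power` and `ipow`. It assumes well-formed input and does no error checking.

```c
#include <ctype.h>

static const char *src;   /* cursor into the expression text */

/* Skip blanks and return the next character without consuming it. */
static char peek(void) {
    while (*src == ' ') src++;
    return *src;
}

static long number(void) {
    long value = 0;
    peek();   /* skip leading blanks */
    while (isdigit((unsigned char)*src))
        value = value * 10 + (*src++ - '0');
    return value;
}

/* Integer exponentiation by repeated multiplication. */
static long ipow(long base, long exponent) {
    long result = 1;
    while (exponent-- > 0) result *= base;
    return result;
}

/* power -> number [ ^ power ]
   The recursive call parses everything to the right first,
   so ^ groups from the right. */
static long power(void) {
    long base = number();
    if (peek() == '^') {
        src++;
        return ipow(base, power());
    }
    return base;
}

/* product -> power { * power } */
static long product(void) {
    long value = power();
    while (peek() == '*') {
        src++;
        value *= power();
    }
    return value;
}

long eval_product(const char *text) {
    src = text;
    return product();
}
```

Here `eval_product("2^3^2")` groups as 2^(3^2) = 512, and `eval_product("5*2^2")` gives 20 because `power` is called below `product` in the grammar.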

Non-Recursive Top Down Parsing

Step 1: Make DFA-like Transition Diagrams
- One can represent the actions of a predictive parser with a transition diagram for each nonterminal of the grammar. For example, let's draw the diagrams for the following grammar:
    E --> T E'
    E' --> ε | + T E'
    T --> F T'
    T' --> ε | * F T'
    F --> id | ( E )
[Figure: one transition diagram per nonterminal. E uses states 0 to 2, E' states 3 to 6, T states 7 to 9, T' states 10 to 13, and F the remaining states.]

Top Down Parsing
- To traverse an edge labeled with a nonterminal, the parser goes to the starting state of the diagram for that nonterminal and returns to the original diagram when it has reached the end state of that nonterminal.
- The parser has a stack to keep track of these actions.
- For example, to traverse the T-edge from state 0 to state 1, the parser puts state 1 on the top of the stack, traverses the T-diagram from state 7 to state 9, and then goes to state 1 after popping it off the stack.

Top Down Parsing
- An edge labeled with a terminal can be traversed when the current input token equals that terminal:
  - When such an edge is traversed, the current input token is replaced with the next input token.
- For example, the +-edge from state 3 to state 4 can be traversed when the parser is in state 3 and the input token is +: traversing the edge will replace the + token with the next token.
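The stack discipline described above can be sketched in C. This version is a common variant, not the slides' exact machine: instead of pushing numbered return states, it pushes the grammar symbols that remain to be matched, which plays the same role as remembering where to resume in each diagram. The `Sym` enum and the function name `parse_tokens` are illustrative, and the input is assumed to be a token array terminated by `END`.

```c
#include <stdbool.h>

/* Terminals first, then nonterminals (hypothetical names). */
typedef enum {
    ID, PLUS, STAR, LPAREN, RPAREN, END,   /* terminals */
    NT_E, NT_EP, NT_T, NT_TP, NT_F         /* E, E', T, T', F */
} Sym;

#define STACK_MAX 256

bool parse_tokens(const Sym *input) {
    Sym stack[STACK_MAX];
    int top = 0;
    stack[top++] = END;    /* bottom marker */
    stack[top++] = NT_E;   /* start symbol */

    while (top > 0) {
        Sym x = stack[--top];
        Sym a = *input;    /* current input token */
        if (x < NT_E) {    /* terminal edge: token must match */
            if (x != a) return false;
            if (x == END) return top == 0;
            input++;       /* replace the token with the next one */
            continue;
        }
        if (top + 3 > STACK_MAX) return false;   /* too deeply nested for this sketch */
        switch (x) {       /* nonterminal edge: push its right-hand side reversed */
        case NT_E:         /* E -> T E' */
            if (a != ID && a != LPAREN) return false;
            stack[top++] = NT_EP; stack[top++] = NT_T;
            break;
        case NT_EP:        /* E' -> + T E' | epsilon */
            if (a == PLUS) {
                stack[top++] = NT_EP; stack[top++] = NT_T; stack[top++] = PLUS;
            } else if (a != RPAREN && a != END) return false;
            break;
        case NT_T:         /* T -> F T' */
            if (a != ID && a != LPAREN) return false;
            stack[top++] = NT_TP; stack[top++] = NT_F;
            break;
        case NT_TP:        /* T' -> * F T' | epsilon */
            if (a == STAR) {
                stack[top++] = NT_TP; stack[top++] = NT_F; stack[top++] = STAR;
            } else if (a != PLUS && a != RPAREN && a != END) return false;
            break;
        case NT_F:         /* F -> id | ( E ) */
            if (a == ID) {
                stack[top++] = ID;
            } else if (a == LPAREN) {
                stack[top++] = RPAREN; stack[top++] = NT_E; stack[top++] = LPAREN;
            } else return false;
            break;
        default:
            return false;
        }
    }
    return false;
}
```

No procedure calls itself here: the explicit stack does the bookkeeping that the call stack does in recursive descent.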
