top down parsing

Top-Down Parsing: Slides modified from Louden Book and Dr. Scherger



  1. Compiler Design and Construction: Top-Down Parsing. Slides modified from Louden Book and Dr. Scherger.

  2. Top Down Parsing. A top-down parsing algorithm parses an input string of tokens by tracing out the steps in a leftmost derivation. Such an algorithm is called top-down because the implied traversal of the parse tree is a preorder traversal. Top Down Parsing COSC 4353

  3. Parsing. A top-down parser “discovers” the parse tree by starting at the root (the start symbol) and expanding (predicting) downward in a depth-first manner. It predicts the derivation before the matching is done. A bottom-up parser starts at the leaves (terminals) and determines which production generates them. Then it determines the rules that generate their parents, and so on, until it reaches the root (S).

  4. Parsing Example. Consider the following grammar:
         <program> → begin <stmts> end $
         <stmts> → SimpleStmt ; <stmts>
         <stmts> → begin <stmts> end ; <stmts>
         <stmts> → λ
     Input: begin SimpleStmt; SimpleStmt; end $

  5. Top-down Parsing Example (grammar as on slide 4). Input: begin SimpleStmt; SimpleStmt; end $
     Start with the root of the parse tree:
         <program>

  6. Top-down Parsing Example. Input: begin SimpleStmt; SimpleStmt; end $
     Expand the root using <program> → begin <stmts> end $:
         <program>
           begin <stmts> end $

  7. Top-down Parsing Example. Input: begin SimpleStmt; SimpleStmt; end $
     Expand <stmts> using <stmts> → SimpleStmt ; <stmts>:
         <program>
           begin <stmts> end $
                   SimpleStmt ; <stmts>

  8. Top-down Parsing Example. Input: begin SimpleStmt; SimpleStmt; end $
     Expand the new <stmts> using <stmts> → SimpleStmt ; <stmts> again:
         <program>
           begin <stmts> end $
                   SimpleStmt ; <stmts>
                                  SimpleStmt ; <stmts>

  9. Top-down Parsing Example. Input: begin SimpleStmt; SimpleStmt; end $
     Finish by expanding the last <stmts> using <stmts> → λ:
         <program>
           begin <stmts> end $
                   SimpleStmt ; <stmts>
                                  SimpleStmt ; <stmts>
                                                 λ
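
The expansion steps traced on slides 5 to 9 can be reproduced by a small recognizer. The following is a minimal sketch, not the course's code: the scanner, the token names, and the entry point `parse_program` are all assumptions made for illustration. It picks a production for `<stmts>` by looking at one token, exactly as in the trace.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token kinds for the example grammar; T_ERROR marks anything unexpected. */
typedef enum { T_BEGIN, T_END, T_SIMPLE, T_SEMI, T_EOF, T_ERROR } Token;

static const char *cursor;  /* next unread character of the input */
static Token token;         /* current lookahead token */
static int ok;              /* cleared on the first syntax error */

/* Hypothetical scanner: words are letters only; ';' and '$' stand alone. */
static void next_token(void) {
    while (isspace((unsigned char)*cursor)) cursor++;
    if (*cursor == '\0') { token = T_ERROR; return; }
    if (*cursor == ';') { cursor++; token = T_SEMI; return; }
    if (*cursor == '$') { cursor++; token = T_EOF; return; }
    const char *start = cursor;
    while (isalpha((unsigned char)*cursor)) cursor++;
    size_t n = (size_t)(cursor - start);
    if (n == 5 && strncmp(start, "begin", 5) == 0) token = T_BEGIN;
    else if (n == 3 && strncmp(start, "end", 3) == 0) token = T_END;
    else if (n == 10 && strncmp(start, "SimpleStmt", 10) == 0) token = T_SIMPLE;
    else { token = T_ERROR; if (n == 0) cursor++; }
}

/* Consume the lookahead if it is the expected token, else record an error. */
static void match(Token expected) {
    if (ok && token == expected) next_token();
    else ok = 0;
}

static void stmts(void) {
    if (!ok) return;
    if (token == T_SIMPLE) {         /* <stmts> -> SimpleStmt ; <stmts> */
        match(T_SIMPLE); match(T_SEMI); stmts();
    } else if (token == T_BEGIN) {   /* <stmts> -> begin <stmts> end ; <stmts> */
        match(T_BEGIN); stmts(); match(T_END); match(T_SEMI); stmts();
    }
    /* any other lookahead: <stmts> -> lambda, consume nothing */
}

/* <program> -> begin <stmts> end $ */
static void program(void) {
    match(T_BEGIN); stmts(); match(T_END); match(T_EOF);
}

/* Returns 1 if src is a sentence of the grammar, 0 otherwise. */
int parse_program(const char *src) {
    assert(src != NULL);
    cursor = src;
    ok = 1;
    next_token();
    program();
    return ok;
}
```

Calling `parse_program("begin SimpleStmt; SimpleStmt; end $")` follows the same sequence of productions as the slides; dropping a semicolon makes it return 0.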

  10. Two Kinds of Top Down Parsers. Predictive parsers try to make decisions about the structure of the tree below a node based on a few lookahead tokens (usually one!). This means that the next 1 (or k) tokens must single out one rule for expanding the current nonterminal. This is a weakness, since little program structure has been seen before predictive decisions must be made. Backtracking parsers solve the lookahead problem by backtracking if one decision turns out to be wrong and making a different choice. But such parsers are slow (exponential time in general).

  11. Top Down Parsers (cont.) Fortunately, many practical techniques have been developed to overcome the predictive lookahead problem, and the version of predictive parsing called recursive-descent is still the method of choice for hand-coding, due to its simplicity. But because of the inherent weakness of top-down parsing, it is not a good choice for machine-generated parsers. Instead, more powerful bottom-up parsing methods should be used (Chapter 5).

  12. Recursive Descent Parsing. Simple, elegant idea: use the grammar rules as recipes for procedure code. Each nonterminal (left-hand side) corresponds to a procedure. Each appearance of a terminal in the right-hand side of a rule causes a token to be matched. Each appearance of a nonterminal corresponds to a call of the associated procedure.

  13. Recursive Descent Example
      Grammar rule: factor → ( exp ) | number
      Code:
          void factor(void) {
              if (token == number)
                  match(number);      /* factor → number */
              else {
                  match('(');         /* factor → ( exp ) */
                  exp();
                  match(')');
              }
          }

  14. Recursive Descent Example (cont.) Note how lookahead is not a problem in this example: if the token is number, go one way; if the token is '(', go the other; and if the token is neither, declare an error:
          void match(Token expect) {
              if (token == expect)
                  getToken();             /* advance to the next token */
              else
                  error(token, expect);   /* unexpected token */
          }

  15. Recursive Descent Example (cont.) A recursive-descent procedure can also compute values or syntax trees:
          int factor(void) {
              int temp;
              if (token == number) {
                  temp = atoi(tokStr);   /* value of the number token */
                  match(number);
              } else {
                  match('(');
                  temp = exp();          /* value of the parenthesized expression */
                  match(')');
              }
              return temp;
          }

  16. Errors in Recursive Descent Are Tricky to Handle. If an error occurs, we must somehow gracefully exit possibly many recursive calls. Best solution: use exception handling to manage stack unwinding (which C doesn’t have!). But there are worse problems: left recursion doesn’t work!
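
Although C has no exception handling, the standard `setjmp`/`longjmp` pair can unwind many nested parsing calls in a single jump. The following is a minimal sketch under simplifying assumptions: every token is a single character, a number is a single digit, the grammar is reduced to `exp → factor { + factor }`, and the names `check` and `expr` are made up for this example (`expr` is used instead of `exp` to avoid clashing with the C math library function).

```c
#include <assert.h>
#include <ctype.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf on_error;    /* where control resumes after a syntax error */
static const char *input;   /* next unread character; one character = one token */

/* Abandon all active parsing procedures at once. */
static void error(void) { longjmp(on_error, 1); }

static void match(char expected) {
    if (*input == expected) input++;
    else error();
}

static void expr(void);

/* factor -> ( exp ) | number   (number is a single digit in this sketch) */
static void factor(void) {
    if (isdigit((unsigned char)*input)) input++;
    else { match('('); expr(); match(')'); }
}

/* exp -> factor { + factor } */
static void expr(void) {
    factor();
    while (*input == '+') { match('+'); factor(); }
}

/* Returns 1 if src is a valid expression, 0 on any syntax error. */
int check(const char *src) {
    assert(src != NULL);
    input = src;
    if (setjmp(on_error) != 0) return 0;   /* reached via longjmp from error() */
    expr();
    if (*input != '\0') error();           /* leftover input is also an error */
    return 1;
}
```

For the input `((1+2`, the error is detected three calls deep, and `longjmp` returns control directly to `check` without each procedure having to test an error flag.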

  17. Left recursion is impossible!
      exp → exp addop term | term
          void exp(void) {
              if (token == ??) {
                  exp();      /* uh, oh!! calls itself before consuming any token */
                  addop();
                  term();
              }
              else term();
          }

  18. Review of EBNF

  19. Extra Notation. So far: Backus-Naur Form (BNF), whose metasymbols are |, → and ε. Extended BNF (EBNF): new metasymbols […] and {…}; ε is largely eliminated by these.

  20. EBNF Metasymbols. Brackets […] mean “optional” (like ? in regular expressions):
          exp → term ‘|’ exp | term
        becomes:
          exp → term [ ‘|’ exp ]
          if-stmt → if ( exp ) stmt | if ( exp ) stmt else stmt
        becomes:
          if-stmt → if ( exp ) stmt [ else stmt ]
      Braces {…} mean “repetition” (like * in regular expressions; see next slide).
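
An optional part in brackets maps naturally onto an `if` in a parsing procedure: the bracketed part is parsed only when its first token is the lookahead. The sketch below illustrates this for the if-stmt rule. It assumes a pre-tokenized input in which a whole expression is one `EXP` token and any non-if statement is one `OTHER` token; these token names and the function `parse_stmt` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified token set: EXP and OTHER stand for whole sub-phrases. */
typedef enum { IF, ELSE, LPAREN, RPAREN, EXP, OTHER, END } Token;

static const Token *tok;   /* current lookahead in an END-terminated array */
static int ok;             /* cleared on the first syntax error */

static void match(Token expected) {
    if (ok && *tok == expected) tok++;
    else ok = 0;
}

static void stmt(void);

/* if-stmt -> if ( exp ) stmt [ else stmt ] */
static void if_stmt(void) {
    match(IF); match(LPAREN); match(EXP); match(RPAREN);
    stmt();
    if (ok && *tok == ELSE) {   /* the [...] part: parsed only if else is next */
        match(ELSE);
        stmt();
    }
}

/* stmt -> if-stmt | other */
static void stmt(void) {
    if (!ok) return;
    if (*tok == IF) if_stmt();
    else match(OTHER);
}

/* Returns 1 if the END-terminated token array is a single valid statement. */
int parse_stmt(const Token *tokens) {
    assert(tokens != NULL);
    tok = tokens;
    ok = 1;
    stmt();
    match(END);
    return ok;
}
```

Because the check for `else` happens right after the inner statement, an `else` is attached to the nearest unmatched `if`.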

  21. Braces in EBNF. Replace only left-recursive repetition:
          exp → exp + term | term
        becomes:
          exp → term { + term }
      Left associativity is still implied. Watch out for choices:
          exp → exp + term | exp - term | term
        is not the same as
          exp → term { + term } | term { - term }
      The second form cannot mix + and - in one expression (for example term + term - term); use exp → term { addop term } with addop → + | - instead.

  22. Simple Expressions in EBNF
          exp → term { addop term }
          addop → + | -
          term → factor { mulop factor }
          mulop → *
          factor → ( exp ) | number

  23. Left recursion is impossible!
      exp → exp addop term | term
          void exp(void) {
              if (token == ??) {
                  exp();      /* uh, oh!! calls itself before consuming any token */
                  addop();
                  term();
              }
              else term();
          }

  24. EBNF to the rescue!
      exp → term { addop term }
          void exp(void) {
              term();
              while (token == '+' || token == '-') {   /* token is an addop */
                  addop();
                  term();
              }
          }

  25. This code can even left associate. Left associativity tells us that 5-7+2 = ? (-4 or 0): grouping from the left gives (5-7)+2 = 0, not 5-(7+2) = -4.
          int exp(void) {
              int temp = term();
              while (token == '+' || token == '-')
                  if (token == '+') { match('+'); temp += term(); }
                  else { match('-'); temp -= term(); }
              return temp;   /* the running value accumulates left to right */
          }
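
To see the left-associative loop produce 0 for 5-7+2, the slide's routine can be embedded in a complete evaluator for the grammar of slide 22. This is a sketch under assumptions: it reads characters directly instead of using the slides' scanner, performs no error reporting, and uses the names `evaluate` and `expr` (rather than `exp`, which would clash with the C math library).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;   /* next unread character of the input */

static void skip(void) { while (*p == ' ') p++; }

static int expr(void);

/* factor -> ( exp ) | number */
static int factor(void) {
    skip();
    if (*p == '(') {
        p++;
        int inner = expr();
        skip();
        if (*p == ')') p++;   /* no error reporting in this sketch */
        return inner;
    }
    int value = 0;
    while (isdigit((unsigned char)*p)) value = value * 10 + (*p++ - '0');
    return value;
}

/* term -> factor { mulop factor }, with mulop -> * */
static int term(void) {
    int value = factor();
    for (skip(); *p == '*'; skip()) {
        p++;
        value *= factor();
    }
    return value;
}

/* exp -> term { addop term }: the loop folds operands in from the left */
static int expr(void) {
    int value = term();
    for (skip(); *p == '+' || *p == '-'; skip()) {
        char op = *p++;
        int rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

/* Evaluates a well-formed expression string. */
int evaluate(const char *src) {
    assert(src != NULL);
    p = src;
    return expr();
}
```

With this sketch, `evaluate("5-7+2")` yields 0, because each operand is combined with the running value before the next operator is read.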

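  26. Note that right recursion/associativity is not a problem:
      exp → term [ addop exp ]
          void exp(void) {
              term();
              if (token == '+' || token == '-') {   /* token is an addop */
                  addop();
                  exp();      /* safe: a term has already been consumed */
              }
          }
      Right associativity answers questions such as: 5*2^2 = ? (20 or 100). Or, after a = 5; a = b = 2; is a equal to 2 or 5? (It is 2, since a = b = 2 groups as a = (b = 2).)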
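
The right-recursive rule groups operands from the right, which changes the value when the operator is subtraction. The following sketch makes that visible under loud simplifications: a term is a single digit, tokens are single characters with no spaces, and the names `evaluate_right` and `expr` are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;   /* next unread character; one character per token */

/* Simplified term: a single digit. */
static int term(void) {
    assert(isdigit((unsigned char)*p));
    return *p++ - '0';
}

/* exp -> term [ addop exp ]: the recursive call takes everything to the right */
static int expr(void) {
    int left = term();
    if (*p == '+') { p++; return left + expr(); }
    if (*p == '-') { p++; return left - expr(); }
    return left;
}

/* Evaluates a well-formed string of digits joined by + and -. */
int evaluate_right(const char *src) {
    assert(src != NULL);
    p = src;
    return expr();
}
```

Here `evaluate_right("5-7+2")` computes 5-(7+2) = -4, whereas the loop-based version of slide 25 computes (5-7)+2 = 0. The parsing procedure terminates in both cases; only the grouping differs.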

  27. Non-Recursive Top Down Parsing

  28. Step 1: Make DFA-like Transition Diagrams. One can represent the actions of a predictive parser with a transition diagram for each nonterminal of the grammar. For example, let's draw the diagrams for the following grammar:
          E --> T E'
          E' --> ε | + T E'
          T --> F T'
          T' --> ε | * F T'
          F --> id | ( E )
      [Figure: one transition diagram per nonterminal with numbered states; E uses states 0 to 2, E' states 3 to 6, T states 7 to 9, T' states 10 to 13, followed by the diagram for F.]

  29. Top Down Parsing. To traverse an edge labeled with a nonterminal, the parser goes to the starting state of the diagram for that nonterminal and returns to the original diagram when it has reached the end state of that nonterminal. The parser has a stack to keep track of these actions. For example, to traverse the T-edge from state 0 to state 1, the parser puts state 1 on the top of the stack, traverses the T-diagram from state 7 to state 9, and then goes to state 1 after popping it off the stack.

  30. Top Down Parsing. An edge labeled with a terminal can be traversed when the current input token equals that terminal. When such an edge is traversed, the current input token is replaced with the next input token. For example, the +-edge from state 3 to state 4 can be traversed when the parser is in state 3 and the input token is +; traversing the edge will replace the + token with the next token.
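
The two traversal rules above (push a return state for a nonterminal edge, consume a token for a terminal edge) are enough to drive a parser with an explicit stack and no recursion. The following is a sketch, not the course's implementation. The state numbers for E, E', T and T' follow the examples on the slides; the numbering of the F diagram (states 14 to 17) is an assumption, since the original figure is not recoverable here. Tokens are single characters, with `i` standing for id, and the function name `parse_with_diagrams` is made up.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DEPTH = 64 };   /* arbitrary limit on pending return states */

/* Returns 1 if src can be derived from E, 0 otherwise. */
int parse_with_diagrams(const char *src) {
    assert(src != NULL);
    int stack[MAX_DEPTH];   /* return states of nonterminal edges in progress */
    int top = 0;
    int state = 0;          /* start state of the E diagram */
    const char *tok = src;  /* current input token */

    for (;;) {
        int callee = -1, ret = -1;   /* set when a nonterminal edge is taken */
        switch (state) {
        case 0:  callee = 7;  ret = 1;  break;   /* E : 0 -T->  1 */
        case 1:  callee = 3;  ret = 2;  break;   /* E : 1 -E'-> 2 */
        case 3:  /* E': 3 -+-> 4, otherwise the epsilon edge 3 -> 6 */
            if (*tok == '+') { tok++; state = 4; } else state = 6;
            continue;
        case 4:  callee = 7;  ret = 5;  break;   /* E': 4 -T->  5 */
        case 5:  callee = 3;  ret = 6;  break;   /* E': 5 -E'-> 6 */
        case 7:  callee = 14; ret = 8;  break;   /* T : 7 -F->  8 */
        case 8:  callee = 10; ret = 9;  break;   /* T : 8 -T'-> 9 */
        case 10: /* T': 10 -*-> 11, otherwise the epsilon edge 10 -> 13 */
            if (*tok == '*') { tok++; state = 11; } else state = 13;
            continue;
        case 11: callee = 14; ret = 12; break;   /* T': 11 -F->  12 */
        case 12: callee = 10; ret = 13; break;   /* T': 12 -T'-> 13 */
        case 14: /* F (assumed numbering): 14 -id-> 17 or 14 -(-> 15 */
            if (*tok == 'i') { tok++; state = 17; continue; }
            if (*tok == '(') { tok++; state = 15; continue; }
            return 0;
        case 15: callee = 0;  ret = 16; break;   /* F : 15 -E-> 16 */
        case 16: /* F : 16 -)-> 17 */
            if (*tok == ')') { tok++; state = 17; continue; }
            return 0;
        default: /* end states 2, 6, 9, 13, 17: return to the calling diagram */
            if (top == 0) return *tok == '\0';
            state = stack[--top];
            continue;
        }
        if (top == MAX_DEPTH) return 0;   /* nesting too deep for this sketch */
        stack[top++] = ret;               /* remember where to resume */
        state = callee;                   /* jump to the callee's start state */
    }
}
```

On the input `i+i*i`, the first step pushes state 1 and jumps to state 7, which is the T-edge traversal described on slide 29. The parser accepts when it reaches end state 2 with an empty stack and no input left.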
