Compiler Construction Chapter 2: CFGs & Parsing (slides modified from Louden Book and Dr. Scherger)

Productions
• A production is for a nonterminal if that nonterminal appears on the left side of the production.
• A grammar derives a string of tokens by starting with the start symbol and repeatedly replacing nonterminals with the right sides of productions for those nonterminals.
• A parse tree is a convenient way of showing that a given token string can be derived from the start symbol of a grammar:
  • the root of the tree must be the start symbol,
  • the leaves must be the tokens in the token string, and
  • the children of each parent node must be the right side of some production for that parent node.
• For example, draw the parse tree for the token string 9 - 5 + 2.

Chapter 3: Context Free Grammars and Parsers, February 2010

Productions
• The language defined by a grammar is the set of all token strings that can be derived from its start symbol.
• The language defined by the example grammar contains all lists of digits separated by plus and minus signs.

Productions
• Epsilon, ε, on the right side of a production denotes the empty string.
• Consider the grammar for Pascal begin-end blocks:
  • a block does not need to contain any statements
      block --> begin opt_stmts end
      opt_stmts --> stmt_list | ε
      stmt_list --> stmt_list ; stmt | stmt

Ambiguity
• A grammar is ambiguous if some token string has two or more different parse trees.
• Grammars for compilers should be unambiguous, since different parse trees give a token string different meanings.

Ambiguity (cont.)
• Here is another example of a grammar for strings of digits separated by plus and minus signs:
    string --> string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• However, this grammar is ambiguous. Why?
  • Draw two different parse trees for the token string 9 - 5 + 2 that correspond to two different ways of parenthesizing the expression: (9 - 5) + 2 or 9 - (5 + 2)
  • The first parenthesization evaluates to 6 while the second evaluates to 2.

Sources Of Ambiguity:
• Associativity and precedence of operators
• Sequencing
• Extent of a substructure (dangling else)
• "Obscure" recursion (unusual)
  • exp → exp exp

Dealing With Ambiguity
• Disambiguating rules
• Change the grammar (but not the language!)
• Can all ambiguity be removed?
• Backtracking can handle it, but the expense is great

Associativity of Operators
• By convention, when an operand like 5 in the expression 9 - 5 + 2 has operators on both sides, it should be associated with the operator on the left:
  • In most programming languages, arithmetic operators like addition, subtraction, multiplication, and division are left-associative.
• In the C language the assignment operator, =, is right-associative:
  • The string a = b = c should be treated as though it were parenthesized a = (b = c).
• A grammar for a right-associative operator like = looks like:
    right --> letter = right | letter
    letter --> a | b | ... | z

Precedence of Operators
• Should the expression 9 + 5 * 2 be interpreted like (9 + 5) * 2 or 9 + (5 * 2)?
• The convention is to give multiplication and division higher precedence than addition and subtraction.
• When evaluating an arithmetic expression we perform operations of higher precedence before operations of lower precedence:
  • Only when we have operations of equal precedence (like addition and subtraction) do we apply the rules of associativity.

Syntax of Expressions
• An arithmetic expression is a string of terms separated by left-associative addition and subtraction operators.
• A term is a string of factors separated by left-associative multiplication and division operators.
• A factor is a single operand (like an id or num token) or an expression wrapped inside parentheses.
• Therefore, a grammar of arithmetic expressions looks like:
    expr --> expr + term | expr - term | term
    term --> term * factor | term / factor | factor
    factor --> id | num | ( expr )
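To see how the three grammar levels produce precedence and left associativity, here is a minimal evaluator sketch in C. It is an illustration and not part of the original slides: the names evaluate, skip, and the cursor p are hypothetical, factor accepts only unsigned integers and parenthesized expressions, and each left-recursive production is realized as a loop (which keeps the operators left-associative). Input is assumed to be well formed, with no division by zero.

```c
#include <ctype.h>

static const char *p;                 /* cursor into the input (assumption of this sketch) */

static int expr(void);

static void skip(void) { while (*p == ' ') p++; }

/* factor --> num | ( expr ) */
static int factor(void)
{
    skip();
    if (*p == '(') {
        p++;                          /* consume '(' */
        int inner = expr();
        skip();
        if (*p == ')') p++;           /* consume ')' */
        return inner;
    }
    int v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

/* term --> term * factor | term / factor | factor, written as a loop */
static int term(void)
{
    int v = factor();
    for (;;) {
        skip();
        if (*p == '*')      { p++; v *= factor(); }
        else if (*p == '/') { p++; v /= factor(); }
        else return v;
    }
}

/* expr --> expr + term | expr - term | term, written as a loop */
static int expr(void)
{
    int v = term();
    for (;;) {
        skip();
        if (*p == '+')      { p++; v += term(); }
        else if (*p == '-') { p++; v -= term(); }
        else return v;
    }
}

int evaluate(const char *s) { p = s; return expr(); }
```

Because term is called from inside expr, every * and / is applied before the surrounding + or -, which is exactly the precedence convention described on the previous slide.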

Syntax Directed Translation
• As mentioned earlier, modern compilers use syntax-directed translation to interleave the actions of the compiler phases.
• The syntax analyzer directs the whole process:
  • it calls the lexical analyzer whenever it wants another token, and it performs the actions of the semantic analyzer and the intermediate code generator as it parses the source code.

Syntax Directed Translation (cont.)
• The actions of the semantic analyzer and the intermediate code generator usually require the passage of information up and/or down the parse tree.
• We think of this information as attributes attached to the nodes of the parse tree, and of the parser as moving this information between parent nodes and child nodes as it applies the productions of the grammar.

Postfix Notation
• As an example of syntax-directed translation, a simple infix-to-postfix translator is developed here.
• Postfix notation (also called Reverse Polish Notation or RPN) places each binary arithmetic operator after its two source operands instead of between them:
  • The infix expression (9 - 5) + 2 becomes 9 5 - 2 + in postfix notation.
  • The infix expression 9 - (5 + 2) becomes 9 5 2 + - in postfix (postfix expressions do not need parentheses).
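One way to check the two postfix strings above is to evaluate them with a stack. The following C sketch is an illustration only: the function name eval_postfix is hypothetical, operands are single digits, only + and - are supported, and the input is assumed to be well formed.

```c
#include <ctype.h>

/* Evaluate a postfix expression of single-digit operands and the
   binary operators + and -. Blanks are skipped. Assumes well-formed input. */
int eval_postfix(const char *s)
{
    int stack[64];
    int top = 0;                          /* number of values on the stack */

    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s)) {
            stack[top++] = *s - '0';      /* operand: push its value */
        } else if (*s == '+' || *s == '-') {
            int right = stack[--top];     /* second operand was pushed last */
            int left  = stack[--top];
            stack[top++] = (*s == '+') ? left + right : left - right;
        }
    }
    return stack[top - 1];                /* the result is the only value left */
}
```

Evaluating "9 5 - 2 +" yields 6 and "9 5 2 + -" yields 2, matching the two parenthesizations of 9 - 5 + 2 discussed under ambiguity.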

Principle of Syntax-directed Semantics
• The parse tree will be used as the basic model;
  • semantic content will be attached to the tree;
  • thus the tree should reflect the structure of the eventual semantics (semantics-based syntax would be a better term).

Syntax Directed Definitions
• A syntax-directed definition uses a context-free grammar to specify the syntactic structure of the input, associates a set of attributes with each grammar symbol, and associates a set of semantic rules with each production of the grammar.
• As an example, suppose the grammar contains the production X --> Y Z, so node X in a parse tree has nodes Y and Z as children, and further suppose that nodes X, Y, and Z have associated attributes X.a, Y.a, and Z.a, respectively.

Syntax Directed Definitions
• As an example, suppose the grammar contains the production X --> Y Z, so node X in a parse tree has nodes Y and Z as children, and further suppose that nodes X, Y, and Z have associated attributes X.a, Y.a, and Z.a, respectively. An annotated parse tree looks like this:

          X (X.a)
         /       \
    Y (Y.a)     Z (Z.a)

• If the semantic rule { X.a := Y.a + Z.a } is associated with the X --> Y Z production, then the parser should add the a attributes of nodes Y and Z together and set the a attribute of node X to their sum.

Synthesized Attributes
• An attribute is synthesized if its value at a parent node can be determined from attributes at its children.
• Attribute a in the previous example is a synthesized attribute.
• Synthesized attributes can be evaluated by a single bottom-up traversal of the parse tree.

Example: Infix to Postfix Translation
• The following table shows the syntax-directed definition of an infix-to-postfix translator.
• Attribute t associated with each node is a character string, and the || operator denotes concatenation.
• Since the grammar symbol expr appears more than once in some productions, subscripts are used to differentiate between the tree nodes in the production and in the associated semantic rule.

    Production               Semantic Rule
    expr -> expr1 + term     expr.t := expr1.t || term.t || '+'
    expr -> expr1 - term     expr.t := expr1.t || term.t || '-'
    expr -> term             expr.t := term.t
    term -> 0                term.t := '0'
    term -> 1                term.t := '1'
    ...                      ...
    term -> 9                term.t := '9'

Example: Infix to Postfix Translation
• The figure shows how the input infix expression 9 - 5 + 2 is translated to the postfix expression 9 5 - 2 + at the root of the parse tree.

    expr.t = 95-2+
        expr.t = 95-
            expr.t = 9
                term.t = 9
                    9
            -
            term.t = 5
                5
        +
        term.t = 2
            2

Example: Robot Navigation
• Suppose a robot can be instructed to move one step east, north, west, or south from its current position.
• A sequence of such instructions is generated by the following grammar:
    seq -> seq instr | begin
    instr -> east | north | west | south
• Changes in the position of the robot on input begin west south east east east north north:
    [Figure: path of the robot starting at begin (0,0): west to (-1,0), south to (-1,-1), east three times to (2,-1), north twice to (2,1).]

Example: Robot Navigation
• Annotated parse tree for the prefix begin west south, using the rules seq.x = seq1.x + instr.dx and seq.y = seq1.y + instr.dy:

    seq (x = -1, y = -1)
        seq (x = -1, y = 0)
            seq (x = 0, y = 0)
                begin
            instr (dx = -1, dy = 0)
                west
        instr (dx = 0, dy = -1)
            south

Example: Robot Navigation

    Production           Semantic Rules
    seq -> begin         seq.x := 0
                         seq.y := 0
    seq -> seq1 instr    seq.x := seq1.x + instr.dx
                         seq.y := seq1.y + instr.dy
    instr -> east        instr.dx := 1
                         instr.dy := 0
    instr -> north       instr.dx := 0
                         instr.dy := 1
    instr -> west        instr.dx := -1
                         instr.dy := 0
    instr -> south       instr.dx := 0
                         instr.dy := -1
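The semantic rules in the table can be sketched directly as code. The C fragment below is illustrative only: the type Position and the function robot_position are hypothetical names, the instruction sequence is assumed to arrive as an array of words, and the rules are applied left to right, which matches the bottom-up evaluation of the synthesized attributes x and y.

```c
#include <string.h>

typedef struct { int x; int y; } Position;

/* seq -> begin sets (0,0); seq -> seq1 instr adds (instr.dx, instr.dy). */
Position robot_position(const char *const words[], int n)
{
    Position p = {0, 0};                          /* seq -> begin */
    for (int i = 0; i < n; i++) {
        int dx = 0, dy = 0;                       /* instr.dx, instr.dy */
        if      (strcmp(words[i], "east")  == 0) dx = 1;
        else if (strcmp(words[i], "north") == 0) dy = 1;
        else if (strcmp(words[i], "west")  == 0) dx = -1;
        else if (strcmp(words[i], "south") == 0) dy = -1;
        /* "begin" (and any unknown word) contributes (0,0) in this sketch */
        p.x += dx;                                /* seq.x := seq1.x + instr.dx */
        p.y += dy;                                /* seq.y := seq1.y + instr.dy */
    }
    return p;
}
```

For the input begin west south east east east north north, the function returns (2,1), the final position shown in the robot figure.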

Depth First Traversals
• A depth-first traversal of the parse tree is a convenient way of evaluating attributes.
• The traversal starts at the root, visits every child, returns to a parent after visiting each of its children, and eventually returns to the root.
• Synthesized attributes can be evaluated whenever the traversal goes from a node to its parent.
• Other attributes (like inherited attributes) can be evaluated whenever the traversal goes from a parent to its children.

    procedure visit(n: node)
    begin
        for each child m of n, from left to right do
            visit(m);
        evaluate semantic rules at node n
    end
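The visit procedure can be made concrete for the infix-to-postfix definition given earlier. The following C sketch rests on several assumptions that are not in the slides: the Node layout (fields op, digit, child, t) is hypothetical, only nonterminal children are stored, and the attribute t is kept in a small fixed-size buffer.

```c
#include <stdio.h>
#include <string.h>

typedef struct Node {
    char op;                  /* '+' or '-' for expr -> expr1 op term, else 0 */
    char digit;               /* the digit for term -> digit, else 0 */
    struct Node *child[2];    /* nonterminal children, left to right */
    char t[32];               /* synthesized attribute: postfix string */
} Node;

void visit(Node *n)
{
    /* visit every child first, from left to right */
    for (int i = 0; i < 2; i++)
        if (n->child[i] != NULL)
            visit(n->child[i]);

    /* then evaluate the semantic rule at node n */
    if (n->digit != 0) {                      /* term -> digit */
        n->t[0] = n->digit;
        n->t[1] = '\0';
    } else if (n->op != 0) {                  /* expr -> expr1 op term */
        snprintf(n->t, sizeof n->t, "%s%s%c",
                 n->child[0]->t, n->child[1]->t, n->op);
    } else {                                  /* expr -> term */
        strcpy(n->t, n->child[0]->t);
    }
}
```

Building the parse tree for 9 - 5 + 2 by hand and calling visit on the root leaves the string 95-2+ in the root's t attribute, because each rule runs only after the attributes of the children are known.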

Translation Schemes
• A translation scheme is another way of specifying a syntax-directed translation:
  • semantic actions (enclosed in braces) are embedded within the right sides of the productions of a context-free grammar.
• For example,
    rest --> + term { print('+') } rest1
• This indicates that a plus sign should be printed between the depth-first traversal of the term node and the depth-first traversal of the rest1 node of the parse tree.

Translation Schemes
• This figure shows the translation scheme for an infix-to-postfix translator:
    expr -> expr + term { print('+') }
    expr -> expr - term { print('-') }
    expr -> term
    term -> 0 { print('0') }
    term -> 1 { print('1') }
    ...
    term -> 9 { print('9') }

Translation Schemes
• The postfix expression is printed out as the parse tree is traversed, as shown in this figure.
• Note that it is not necessary to actually construct the parse tree.

    expr (expr.t = 95-2+)
        expr (expr.t = 95-)
            expr (expr.t = 9)
                term (term.t = 9)
                    9 {print('9')}
            -
            term (term.t = 5)
                5 {print('5')}
            {print('-')}
        +
        term (term.t = 2)
            2 {print('2')}
        {print('+')}

Parsing
• For a given input string of tokens we can ask, "Is this input syntactically valid?"
  • That is, can it be generated by our grammar?
• An algorithm that answers this question is a recognizer.
• If we also get the structure (derivation tree), we have a parser.
• For any language that can be described by a context-free grammar, a parser that parses a string of n tokens in O(n^3) time can be constructed.
• However, almost every programming language is simple enough that a parser requires just O(n) time with a single left-to-right scan over the input.

Parsing
• Most parsers are either top-down or bottom-up.
• A top-down parser "discovers" a parse tree by starting at the root (start symbol) and expanding (predicting) downward depth-first.
  • It predicts the derivation before the matching is done.
• A bottom-up parser builds a parse tree by starting at the leaves (terminals), determining the rule that generates them, and continuing to work up toward the root.
• Top-down parsers are usually easier to code by hand, but compiler-generating software tools usually generate bottom-up parsers because they can handle a wider class of context-free grammars.
• This course covers both top-down and bottom-up parsers, and the coding projects may give you the experience of coding both kinds.

Parsing Example
• Consider the following grammar:
    <program> → begin <stmts> end $
    <stmts> → SimpleStmt ; <stmts>
    <stmts> → begin <stmts> end ; <stmts>
    <stmts> → λ
• Input: begin SimpleStmt; SimpleStmt; end $

Top-down Parsing Example
• Input: begin SimpleStmt; SimpleStmt; end $
• Grammar:
    <program> → begin <stmts> end $
    <stmts> → SimpleStmt ; <stmts>
    <stmts> → begin <stmts> end ; <stmts>
    <stmts> → λ
• The tree grows from the root, one predicted production at a time:
    <program>
    ⇒ begin <stmts> end $                               using <program> → begin <stmts> end $
    ⇒ begin SimpleStmt ; <stmts> end $                  using <stmts> → SimpleStmt ; <stmts>
    ⇒ begin SimpleStmt ; SimpleStmt ; <stmts> end $     using <stmts> → SimpleStmt ; <stmts>
    ⇒ begin SimpleStmt ; SimpleStmt ; end $             using <stmts> → λ

Bottom-up Parsing Example
• Scan the input looking for any substrings that appear on the RHS of a rule!
  • We can do this left-to-right or right-to-left
  • Let's use left-to-right
• Replace that RHS with the LHS
• Repeat until left with the start symbol or an error

Bottom-up Parsing Example
• Input: begin SimpleStmt; SimpleStmt; end $
• Grammar:
    <program> → begin <stmts> end $
    <stmts> → SimpleStmt ; <stmts>
    <stmts> → begin <stmts> end ; <stmts>
    <stmts> → λ
• The tree grows from the leaves, one reduction at a time:
    begin SimpleStmt ; SimpleStmt ; end $
    begin SimpleStmt ; SimpleStmt ; <stmts> end $       reduce λ to <stmts>
    begin SimpleStmt ; <stmts> end $                    reduce SimpleStmt ; <stmts> to <stmts>
    begin <stmts> end $                                 reduce SimpleStmt ; <stmts> to <stmts>
    <program>                                           reduce begin <stmts> end $ to <program>

Top Down Parsing
• To introduce top-down parsing we consider the following context-free grammar:
    expr --> term rest
    rest --> + term rest | - term rest | ε
    term --> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• and show the construction of the parse tree for the input string 9 - 5 + 2.

Top Down Parsing
• Initialization: The root of the parse tree must be the starting symbol of the grammar, expr.
• Step 1: The only production for expr is expr --> term rest, so the root node must have a term node and a rest node as children.
• Step 2: The first token in the input is 9 and the only production in the grammar containing a 9 is term --> 9, so 9 must be a leaf with the term node as a parent.
• Step 3: The next token in the input is the minus sign and the only production in the grammar containing a minus sign is rest --> - term rest. The rest node must have a minus-sign leaf, a term node, and a rest node as children.
• Step 4: The next token in the input is 5 and the only production in the grammar containing a 5 is term --> 5, so 5 must be a leaf with a term node as a parent.
• Step 5: The next token in the input is the plus sign and the only production in the grammar containing a plus sign is rest --> + term rest. A rest node must have a plus-sign leaf, a term node, and a rest node as children.
• Step 6: The next token in the input is 2 and the only production in the grammar containing a 2 is term --> 2, so 2 must be a leaf with a term node as a parent.
• Step 7: The whole input has been absorbed but the parse tree still has a rest node with no children. The rest --> ε production must now be used to give the rest node the empty string as a child.
• The completed parse tree:

    expr
        term
            9
        rest
            -
            term
                5
            rest
                +
                term
                    2
                rest
                    ε

Parsing
• Only one possible derivation tree if the grammar is unambiguous
• Top-down parsers use leftmost derivations
  • Leftmost nonterminal expanded first
• Bottom-up parsers use rightmost derivations (traced in reverse)
  • Rightmost nonterminal expanded first
• The two most common types of parsers are LL and LR parsers
  • 1st letter: left-to-right token parsing
  • 2nd letter: derivation (leftmost, rightmost)
  • LL(n): n is the number of lookahead symbols

LL(1) Parsing
• How do we predict which production to use when expanding a nonterminal?
  • We can use the lookahead
  • However, if more than one rule can be predicted for a given lookahead, the grammar cannot be parsed by our LL(1) parser
• This means the "prediction" for top-down parsing is easy: just use the lookahead

Building an LL(1) Parser
• We need to determine some sets
  • First(n): the terminals that can start valid strings generated by n, where n ∈ V*
  • Follow(A): the set of terminals that can follow A in some legal derivation, where A is a nonterminal
  • Predict(prod): any token that can be the 1st symbol produced by the RHS of prod
      Predict(A → X1...Xm) = (First(X1...Xm) − {λ}) ∪ Follow(A)    if λ ∈ First(X1...Xm)
      Predict(A → X1...Xm) = First(X1...Xm)                        otherwise
• These sets are used to create a parse table

Parse Table
• A row for each nonterminal
• A column for each terminal
• Entries contain rule (production) numbers
  • The entry in row A, column T is the production to predict when nonterminal A is to be matched and terminal T is the lookahead.

Example: Micro
• On handout
• Predict(A → X1...Xm) =
      if λ ∈ First(X1...Xm)
          (First(X1...Xm) − {λ}) ∪ Follow(A)
      else
          First(X1...Xm)
• The parse table is filled in using:
      T(A, a) = A → X1...Xm    if a ∈ Predict(A → X1...Xm)
      T(A, a) = Error          otherwise
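Since the Micro table is on the handout, here is a hedged sketch of a table-driven LL(1) recognizer for the smaller begin/SimpleStmt grammar used in the parsing examples. The Predict sets were worked out for this illustration and are not taken from the slides: Predict(1) = {begin}, Predict(2) = {SimpleStmt}, Predict(3) = {begin}, and Predict(4) = Follow(<stmts>) = {end}. All identifiers (ll1_parse, table, rhs, the token codes) are hypothetical, and the input is assumed to be an array of valid token codes ending with EOF_.

```c
/* Terminals first, then nonterminals, so "x < NTERMS" tests for a terminal. */
enum { BEGIN_, END_, SIMPLE, SEMI, EOF_, NTERMS,
       PROGRAM = NTERMS, STMTS };

/* Right-hand sides by production number, each terminated by -1. */
static const int rhs[5][6] = {
    { -1 },                                   /* 0: unused               */
    { BEGIN_, STMTS, END_, EOF_, -1 },        /* 1: <program>            */
    { SIMPLE, SEMI, STMTS, -1 },              /* 2: <stmts>, simple stmt */
    { BEGIN_, STMTS, END_, SEMI, STMTS, -1 }, /* 3: <stmts>, nested block*/
    { -1 }                                    /* 4: <stmts> -> lambda    */
};

/* T(A, a): production to predict, or 0 for Error. */
static const int table[2][NTERMS] = {
    /*               begin end SimpleStmt ;  $ */
    /* <program> */ { 1,    0,  0,         0, 0 },
    /* <stmts>   */ { 3,    4,  2,         0, 0 }
};

/* Returns 1 if the token string is accepted, 0 on a syntax error. */
int ll1_parse(const int *input)
{
    int stack[128];
    int top = 0;
    stack[top++] = PROGRAM;

    while (top > 0) {
        int x = stack[--top];
        if (x < NTERMS) {                     /* terminal: match the lookahead */
            if (x != *input) return 0;
            input++;
        } else {                              /* nonterminal: consult the table */
            int prod = table[x - NTERMS][*input];
            if (prod == 0) return 0;
            int len = 0;
            while (rhs[prod][len] != -1) len++;
            if (top + len > 128) return 0;    /* stack would overflow */
            while (len > 0)                   /* push the RHS in reverse order */
                stack[top++] = rhs[prod][--len];
        }
    }
    return 1;
}
```

Each table entry is unique, so the parser never has to choose between two productions; that uniqueness is what makes this grammar LL(1).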

Making LL(1) Grammars
• This is not always an easy task
  • We must have a unique prediction for each (nonterminal, lookahead) pair
• Conflicts are usually caused by either
  • Left recursion
  • Common prefixes
• Often we can remove these conflicts
• Not all conflicts can be removed
  • Dangling else (Pascal) is one of them

LL(1) Grammars
• A grammar is LL(1) iff whenever A → α | β are two distinct productions, the following conditions hold:
  • There is no terminal a such that both α and β derive strings beginning with a.
  • At most one of α and β can derive the empty string.
  • If β derives the empty string, then α does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if α derives the empty string, then β does not derive any string beginning with a terminal in FOLLOW(A).
• LL(1) means we scan the input from left to right (first L) and produce a leftmost derivation (second L: the leftmost nonterminal is expanded first), using 1 lookahead symbol to decide which rule to expand.

Making LL(1) Grammars: Left Recursion
• Consider A → A b
  • Assume some lookahead symbol t causes the prediction of the above rule
  • This prediction causes A to be put on the parse stack
  • We have the same lookahead and the same symbol on the stack, so this rule will be predicted again, and again...

Eliminating Left Recursion
• Replace
    expr → expr + term | term
• by
    expr → term expr'
    expr' → + term expr' | ε

Making LL(1) Grammars: Factoring
• Consider
    <stmt> → if <expr> then <stmts> end if;
    <stmt> → if <expr> then <stmts> else <stmts> end if;
• The productions share a common prefix
  • The First sets of the two right-hand sides are not disjoint
• We can factor out the common prefix
    <stmt> → if <expr> then <stmts> <ifsfx>
    <ifsfx> → end if;
    <ifsfx> → else <stmts> end if;

Properties of LL(1) Parsers
• A correct leftmost parse is guaranteed
• All LL(1) grammars are unambiguous
• O(n) in time and space

Top Down Parsing
• In the previous example, the grammar made it easy for the parser to pick the correct production in each step of the parse.
• This is not true in general. Consider the following grammar:
    statement --> if expression then statement else statement
    statement --> if expression then statement
• When the input token is an if token, should a top-down parser use the first or the second production?
  • The parser would have to guess which one to use, continue parsing, and later on, if the guess is wrong, go back to the if token and try the other production.

Top Down Parsing
• Usually one can modify the grammar so a predictive top-down parser can be used:
  • The parser always picks the correct production in each step of the parse, so it never has to backtrack.
  • To allow the use of a predictive parser, one replaces the two productions above with:
      statement --> if expression then statement optional_else
      optional_else --> else statement | ε

Predictive Parsing
• A recursive-descent parser is a top-down parser that executes a set of recursive procedures to process the input:
  • there is a procedure for each nonterminal in the grammar.
• A predictive parser is a top-down parser where the current input token unambiguously determines the production to be applied at each step.
• Here, we show the code of a predictive parser for the following grammar:
    expr --> term rest
    rest --> + term rest | - term rest | ε
    term --> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Predictive Parsing
• We assume a global variable, lookahead, holding the current input token, and a procedure match(ExpectedToken) that loads the next token into lookahead if the current token is what is expected; otherwise match reports an error and halts.

    procedure match(t: token)
    begin
        if lookahead = t then
            lookahead := nexttoken
        else
            error
    end

Predictive Parsing
• This is a recursive-descent parser, so a procedure is written for each nonterminal of the grammar.
• Since there is only one production for expr, procedure expr is very simple.
• Since there are three productions for rest, procedure rest uses lookahead to select the correct production.
• If lookahead is neither + nor - then rest selects the ε-production and simply returns without any actions.

    expr()
    {
        term(); rest(); return;
    }

    rest()
    {
        if (lookahead == '+')
        {
            match('+'); term(); rest(); return;
        }
        else if (lookahead == '-')
        {
            match('-'); term(); rest(); return;
        }
        else
        {
            return;
        }
    }

  Grammar:
    expr --> term rest
    rest --> + term rest | - term rest | ε
    term --> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Predictive Parsing
• The procedure for term, called term, checks to make sure that lookahead is a digit:

    term()
    {
        if (isdigit(lookahead))
        {
            match(lookahead); return;
        }
        else
        {
            ReportErrorAndHalt();
        }
    }

  Grammar:
    expr --> term rest
    rest --> + term rest | - term rest | ε
    term --> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Predictive Parsing
• After loading lookahead with the first input token, this parser is started by calling expr (since expr is the starting symbol).
• If there are no syntax errors in the input, the parser conducts a depth-first traversal of the parse tree and returns to the caller through expr; otherwise it reports an error and halts.
• If there is an ε-production for a nonterminal, then the procedure for that nonterminal selects it whenever none of the other productions are suitable.
• If there is no ε-production for a nonterminal and none of its productions are suitable, then the procedure should report a syntax error.

Left Recursion
• A production like
    expr --> expr + term
  where the first symbol on the right side is the same as the symbol on the left side is said to be left-recursive.
• If one were to code this production in a recursive-descent parser, the parser would go into an infinite loop, calling the expr procedure repeatedly.

Left Recursion
• Fortunately, a left-recursive grammar can be easily modified to eliminate the left recursion.
• For example,
    expr --> expr + term | expr - term | term
  defines an expr to be either a single term or a sequence of terms separated by plus and minus signs.
• Another way of defining an expr (without left recursion) is:
    expr --> term rest
    rest --> + term rest | - term rest | ε

A Translator for Simple Expressions
• A translation scheme for converting simple infix expressions to postfix is:
    expr --> term rest
    rest --> + term { print('+'); } rest
    rest --> - term { print('-'); } rest
    rest --> ε
    term --> 0 { print('0'); }
    term --> 1 { print('1'); }
    ...
    term --> 9 { print('9'); }

A Translator for Simple Expressions

    expr()
    {
        term(); rest(); return;
    }

    rest()
    {
        if (lookahead == '+') {
            match('+'); term(); print('+'); rest(); return;
        }
        else if (lookahead == '-') {
            match('-'); term(); print('-'); rest(); return;
        }
        else {
            return;
        }
    }

    term()
    {
        if (isdigit(lookahead)) {
            print(lookahead); match(lookahead); return;
        }
        else {
            ReportErrorAndHalt();
        }
    }
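The translator above can be packaged as a self-contained function for experimentation. The following C sketch makes several assumptions that are not in the slides: the input is a string with no blanks, output goes to a caller-supplied buffer (at least as long as the input plus one byte) instead of print, and a failed flag replaces ReportErrorAndHalt so that errors are reported through the return value. The name infix_to_postfix and the variables in, out, and failed are hypothetical.

```c
#include <ctype.h>

static const char *in;     /* current input position                   */
static char *out;          /* next free position in the output buffer  */
static int lookahead;      /* current input token                      */
static int failed;         /* set on a syntax error instead of halting */

static void match(int t)
{
    if (lookahead == t) {
        in++;
        lookahead = (unsigned char)*in;   /* load the next token */
    } else {
        failed = 1;
    }
}

static void term(void)
{
    if (isdigit(lookahead)) {
        *out++ = (char)lookahead;         /* print(lookahead) */
        match(lookahead);
    } else {
        failed = 1;                       /* ReportErrorAndHalt() in the slides */
    }
}

static void rest(void)
{
    if (failed) return;
    if (lookahead == '+' || lookahead == '-') {
        int op = lookahead;
        match(op);
        term();
        *out++ = (char)op;                /* operator is printed after its operand */
        rest();
    }
    /* otherwise: rest --> epsilon, no action */
}

/* Translates src into dst; returns 1 on success, 0 on a syntax error. */
int infix_to_postfix(const char *src, char *dst)
{
    in = src;
    out = dst;
    failed = 0;
    lookahead = (unsigned char)*in;       /* load the first token */
    term();                               /* expr --> term rest   */
    rest();
    if (lookahead != '\0') failed = 1;    /* unconsumed input is an error */
    *out = '\0';
    return !failed;
}
```

For the input 9-5+2 the buffer receives 95-2+. The two branches of rest in the slides are merged into one here because they differ only in the operator character.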

Parse Trees
• Phrase: sequence of tokens descended from a nonterminal
• Simple phrase: phrase that contains no smaller phrase at the leaves
• Handle: the leftmost simple phrase

Parse Trees
    [Figure: parse tree rooted at E, with node labels Prefix, ( E ), F, V, Tail, + E, V, Tail, and λ.]

LR Parsing: Shift Reduce
• Use a parse stack
  • Initially empty, it contains symbols already parsed (terminals and nonterminals)
• Tokens are shifted onto the stack until the top of the stack contains the handle
• The handle is then reduced by replacing it on the stack with the nonterminal that is its parent in the derivation tree
• Success when there is no input left and the goal symbol is on the stack

Shift Reduce Parser: Useful Data Structures
• Action table: determines whether to shift, reduce, terminate with success, or report that an error has occurred
• Parse stack: contains parse states
  • They encode the shifted symbols and the handles that are being matched
• GoTo table: defines successor states after a token or LHS is matched and shifted

Shift Reduce Parser
    S: top parse stack state
    T: current input token

    push(S0)                          // start state
    loop forever
        case Action(S, T)
            error  => ReportSyntaxError()
            accept => CleanUpAndFinish()
            shift  => Push(GoTo(S, T))
                      Scanner(T)      // yylex()
            reduce => Assume X -> Y1...Ym
                      Pop(m)          // S' is new stack top
                      Push(GoTo(S', X))
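To make the driver loop concrete, here is a hedged C sketch for a deliberately tiny grammar, not the G0 grammar of the slides: production 1 is E → E + n and production 2 is E → n, with an end marker. The SLR tables were built by hand for this illustration. As a simplification, the shift target state is stored directly in the action table, so a separate GoTo lookup is needed only after a reduce. All names (lr_parse, action, goto_E, rhs_len) are hypothetical, and the input is assumed to be an array of valid token codes ending with END.

```c
enum { N, PLUS, END, NTOK };          /* terminals: operand n, '+', end marker */
enum { ERR = 0, ACC = 100 };          /* error and accept action codes         */

/* action[state][token]: > 0 shift to that state, < 0 reduce by that production */
static const int action[5][NTOK] = {
    /*             n    +    END */
    /* state 0 */ { 2,   ERR, ERR },
    /* state 1 */ { ERR, 3,   ACC },
    /* state 2 */ { ERR, -2,  -2  },  /* E -> n . */
    /* state 3 */ { 4,   ERR, ERR },
    /* state 4 */ { ERR, -1,  -1  }   /* E -> E + n . */
};

static const int goto_E[5]  = { 1, -1, -1, -1, -1 };  /* GoTo(state, E)        */
static const int rhs_len[3] = { 0, 3, 1 };            /* RHS length by rule no. */

/* Returns 1 if the token string is accepted, 0 on a syntax error. */
int lr_parse(const int *input)
{
    int stack[128];
    int top = 0;
    stack[top++] = 0;                         /* push the start state */

    for (;;) {
        int a = action[stack[top - 1]][*input];
        if (a == ERR) return 0;
        if (a == ACC) return 1;
        if (a > 0) {                          /* shift */
            if (top == 128) return 0;         /* stack would overflow */
            stack[top++] = a;
            input++;                          /* advance to the next token */
        } else {                              /* reduce by production -a */
            top -= rhs_len[-a];               /* pop one state per RHS symbol */
            int s = stack[top - 1];           /* S' is the new stack top */
            stack[top++] = goto_E[s];         /* push GoTo(S', E) */
        }
    }
}
```

On the token string n + n + n END the parser shifts n, reduces it to E, and then alternates shift, shift, reduce for each "+ n" until the accept action is reached in state 1.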

Shift Reduce Parser Example
• Consider the following grammar G0:
    <program> → begin <stmts> end $
    <stmts> → SimpleStmt ; <stmts>
    <stmts> → begin <stmts> end ; <stmts>
    <stmts> → λ
• Using the Action and GoTo tables for G0, what would the parse look like for the following input?
    begin SimpleStmt; SimpleStmt; end $

Shift Reduce Parser Example

    Parse Stack    Remaining Input                        Action
    0              begin SimpleStmt; SimpleStmt; end $    shift
    0,1            SimpleStmt; SimpleStmt; end $          shift


Compiler Construction Chapter 2: CFGs & Parsing Slides modified from Louden Book and Dr. Scherger Parsing The parser takes the compact representation (tokens) from the scanner and checks the structure It determines if it is
