Compiler Construction Lecture 9: Practical parsing issues and yacc


  1. Compiler Construction Lecture 9: Practical parsing issues and yacc intro 2020-02-04 Michael Engel

  2. Overview
     • Practical parsing issues
       • Error recovery
       • Unary operators
       • Handling context-sensitive ambiguity
       • Left versus right recursion
     • A quick yacc intro
       • Syntax of yacc grammar descriptions
       • yacc-lex interaction
       • Example
     Compiler Construction 09: Practical parsing, yacc

  3. Error recovery (Syntax analysis)
     • Syntax errors are common in program development
     • Our previous parsers have stopped parsing at the first error
       • Is this what a programmer would want? [2]
       • Prefer to find as many syntax errors as possible in each compilation
     • A mechanism for error recovery helps the parser move on to a state where it can continue parsing when it encounters an error
       • Select one or more words that the parser can use to synchronize the input with its internal state
       • When the parser encounters an error, it discards input symbols until it finds a synchronizing word and then resets its internal state to one consistent with the synchronizing word

  4. Error recovery
     • Consider a language using semicolons as statement separators
     • The semicolon can be used as a synchronizing element: when an error occurs, the parser calls the scanner repeatedly until it finds a semicolon
         foo = func ) 42;
         return foo;
     • Here, a recursive-descent parser can simply discard words until it finds a semicolon and return (fake) success [1]
     • This resynchronization is more complex in an LR(1) parser:
       • it discards input until it finds a semicolon
       • it scans back down the stack to find a state s with a valid Goto[s, Stmt] entry
       • the first such state on the stack represents the statement that contains the error
       • it discards the entries on the stack above that state, pushes the state Goto[s, Stmt] onto the stack, and resumes normal parsing
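
The recursive-descent case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy statement grammar (`name = operand ;` or `return operand ;`), the whitespace-separated input, and all function names are hypothetical and do not come from the lecture. The point is the `except` branch, which discards words up to the synchronizing semicolon so that later statements are still checked.

```python
class ParseError(Exception):
    """Raised when a statement does not match the toy grammar."""


def expect(tokens, pos, predicate, what):
    # Consume one word if it satisfies `predicate`, otherwise report an error.
    if pos < len(tokens) and predicate(tokens[pos]):
        return pos + 1
    found = tokens[pos] if pos < len(tokens) else "end of input"
    raise ParseError(f"expected {what}, found {found!r} at word {pos}")


def is_operand(word):
    return word.isidentifier() or word.isdigit()


def parse_statement(tokens, pos):
    # Hypothetical grammar: Stmt -> name "=" operand ";" | "return" operand ";"
    if pos < len(tokens) and tokens[pos] == "return":
        pos += 1
    else:
        pos = expect(tokens, pos, str.isidentifier, "a name")
        pos = expect(tokens, pos, lambda w: w == "=", "'='")
    pos = expect(tokens, pos, is_operand, "a name or number")
    return expect(tokens, pos, lambda w: w == ";", "';'")


def parse_program(tokens):
    """Return (number of well-formed statements, list of error messages)."""
    pos, good, errors = 0, 0, []
    while pos < len(tokens):
        try:
            pos = parse_statement(tokens, pos)
            good += 1
        except ParseError as err:
            errors.append(str(err))
            # Panic mode: discard words until the synchronizing word ';'.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1  # step over the ';' and resume with the next statement
    return good, errors


good, errors = parse_program("foo = func ) 42 ; return foo ; x = = 1 ; y = 2 ;".split())
```

With this input, both malformed statements are reported in a single run, and the two well-formed statements after them are still parsed.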

  5. Unary operators
     • Classic expression grammar includes binary operators only
     • Algebraic notation includes unary operators
       • e.g., unary minus and absolute value
     • Other unary operators:
       • autoincrement (i++)
       • autodecrement (i--)
       • address-of (&)
       • dereference (*)
       • boolean complement (!)
       • typecasts ((int)x)
     • Adding these to the expression grammar requires some care

  6. Unary operators
     Example: expression grammar with an absolute value operator ||x
       Start  → Expr
       Expr   → Expr + Term
              | Expr - Term
              | Term
       Term   → Term × Value
              | Term ÷ Value
              | Value
       Value  → "||" Factor
              | Factor
       Factor → "(" Expr ")"
              | num
              | name
     Parse tree for ||x - 3 (figure): Start → Expr → Expr "-" Term; the left Expr derives Term → Value → "||" Factor → <name,x>, and the right Term derives Value → Factor → <num,3>

  7. Unary operators
     Example: absolute value operator ||x
     • Absolute value should have higher precedence than either × or ÷
     • However, it needs lower precedence than Factor
       • this enforces evaluation of parenthetic expressions before application of ||
     • The example grammar is still LR(1)
       • but it does not allow writing || || x
       • Writing this doesn't make much sense
       • but it's a legal mathematical operation, so why not?
       • This would work: ||(|| x)
     • Problem for other operators like (dereferencing) *
       • **p is a common operation in C
       Start  → Expr
       Expr   → Expr + Term
              | Expr - Term
              | Term
       Term   → Term × Value
              | Term ÷ Value
              | Value
       Value  → "||" Factor
              | Factor
       Factor → "(" Expr ")"
              | num
              | name

  8. Unary operators
     • Problem for other operators like *
       • **p is a common operation in C
     • Solution:
       • add a dereference production for Value as well: Value → "*" Value
     • The resulting grammar is still LR(1)
       • even if we replace the "×" operator in Term → Term × Value with "*", overloading the operator "*" in the way that C does
     • The same approach works for unary minus
       Start  → Expr
       Expr   → Expr + Term
              | Expr - Term
              | Term
       Term   → Term "*" Value
              | Term ÷ Value
              | Value
       Value  → "*" Value
              | "||" Factor
              | Factor
       Factor → "(" Expr ")"
              | num
              | name
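
To see how the recursive production Value → "*" Value behaves, here is a small hand-written parser for this grammar in Python. It is a sketch under assumptions that are not part of the slides: the left-recursive Expr and Term rules are turned into loops (a recursive-descent parser cannot use them directly), "/" stands in for ÷, words are separated by whitespace (so `**p` is written `* * p`), and the tuple-based tree format is made up for the illustration.

```python
def parse(text):
    toks = text.split()
    node = parse_expr(toks)
    if toks:
        raise SyntaxError(f"unexpected word {toks[0]!r}")
    return node


def parse_expr(toks):
    # Expr -> Expr + Term | Expr - Term | Term, written as a loop
    node = parse_term(toks)
    while toks and toks[0] in ("+", "-"):
        op = toks.pop(0)
        node = (op, node, parse_term(toks))
    return node


def parse_term(toks):
    # Term -> Term "*" Value | Term "/" Value | Value, written as a loop
    node = parse_value(toks)
    while toks and toks[0] in ("*", "/"):
        op = toks.pop(0)
        node = (op, node, parse_value(toks))
    return node


def parse_value(toks):
    if toks and toks[0] == "*":
        toks.pop(0)
        return ("deref", parse_value(toks))  # Value -> "*" Value (recursive)
    if toks and toks[0] == "||":
        toks.pop(0)
        return ("abs", parse_factor(toks))   # Value -> "||" Factor (not recursive)
    return parse_factor(toks)


def parse_factor(toks):
    if not toks:
        raise SyntaxError("unexpected end of input")
    tok = toks.pop(0)
    if tok == "(":
        node = parse_expr(toks)
        if not toks or toks.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return node
    if tok.isidentifier() or tok.isdigit():
        return tok
    raise SyntaxError(f"unexpected word {tok!r}")


double_deref = parse("* * p")      # nested dereference is accepted
product = parse("a * * p")         # binary "*" followed by unary "*"
nested_abs = parse("|| ( || x )")  # abs only nests through parentheses
```

Because the absolute-value rule goes through Factor rather than Value, `parse("|| || x")` raises `SyntaxError` in this sketch, while the dereference rule accepts any number of leading `*`.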

  9. Handling context-sensitive ambiguity
     • Using one word to represent two different meanings can create a syntactic ambiguity
     • Common in early programming languages (FORTRAN, PL/I, Ada)
       • Parentheses used to enclose both the subscript expressions of an array reference and the argument list of a subroutine or function
     • For the input fee(i,j), the compiler cannot tell if fee is a two-dimensional array or a procedure that must be invoked
     • Differentiating between these two cases requires knowledge of fee's declared type
       • This information is not syntactically obvious
       • The scanner would classify fee as a name in either case

  10. Handling context-sensitive ambiguity
     • We can add productions that derive both subscript expressions and argument lists from Factor
     • Handling this in a classical expression grammar might look like this:
       Factor → FunctionReference
              | ArrayReference
              | "(" Expr ")"
              | num
              | name
       FunctionReference → name "(" ArgList ")"
       ArrayReference    → name "(" ArgList ")"
     • Since the last two productions have identical right-hand sides, this grammar is ambiguous, which creates a reduce-reduce conflict in an LR(1) table builder

  11. Handling context-sensitive ambiguity
     Our grammar results in an LR(1) reduce-reduce conflict
     • Resolving this ambiguity requires extra-syntactic knowledge
       • "Is name a function or an array?"
     • In a recursive-descent parser, the compiler writer can combine the code for FunctionReference and ArrayReference
       • add the extra code required to check the name's declared type
     • In a table-driven parser built with a parser generator, the solution must work within the framework provided by the tools
       Factor → FunctionReference
              | ArrayReference
              | "(" Expr ")"
              | num
              | name
       FunctionReference → name "(" ArgList ")"
       ArrayReference    → name "(" ArgList ")"
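
The recursive-descent option can be illustrated with a short Python sketch: a single routine parses name "(" ArgList ")" and then consults the name's declared type to decide what it has seen. Everything here is an assumption made for the example, including the `DECLARED` table, the restriction to flat argument lists of plain words, and the tuple results.

```python
# Hypothetical symbol table filled in while processing earlier declarations.
DECLARED = {"fee": "array", "fie": "function"}


def parse_reference(tokens, declared):
    """Parse: name "(" ArgList ")" and label it using the declared type."""
    if len(tokens) < 3 or tokens[1] != "(" or tokens[-1] != ")":
        raise SyntaxError("expected: name ( ArgList )")
    name = tokens[0]
    # Simplification: arguments are single words separated by commas.
    args = [word for word in tokens[2:-1] if word != ","]
    kind = declared.get(name)
    if kind == "function":
        return ("call", name, args)
    if kind == "array":
        return ("index", name, args)
    raise SyntaxError(f"{name!r} is not declared as a function or an array")


array_ref = parse_reference("fee ( i , j )".split(), DECLARED)
call = parse_reference("fie ( i , j )".split(), DECLARED)
```

The two inputs have the same shape, yet the first is labeled as an array reference and the second as a call, purely because of the declared types.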

  12. Handling context-sensitive ambiguity
     Two different approaches to solve this:
     • Rewrite the grammar to combine function invocation and array reference into a single production
       • issue is deferred until a later step in translation
       • there, it can be resolved with information from the declarations
       Factor → FunctionOrArrayReference
              | "(" Expr ")"
              | num
              | name
       FunctionOrArrayReference → name "(" ArgList ")"
     • Scanner can classify identifiers based on their declared types
       • requires handshaking between scanner and parser
       • works as long as the language has a define-before-use rule
       • Rewritten in this way, the grammar is unambiguous
       • Since the scanner returns a distinct syntactic category in each case, the parser can distinguish the two cases
       FunctionReference → function_name "(" ArgList ")"
       ArrayReference    → array_name "(" ArgList ")"
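
The second approach can be sketched as a scanner that looks each identifier up in a table of declarations before it picks a syntactic category. The Python code below is a minimal illustration, not the lecture's implementation: the `declared` dictionary stands in for the scanner-parser handshake, and the function names and whitespace-separated input are assumptions.

```python
PUNCTUATION = {"(", ")", ","}


def classify(word, declared):
    """Return a (category, lexeme) pair for one word."""
    if word in PUNCTUATION:
        return (word, word)
    if word.isdigit():
        return ("num", word)
    kind = declared.get(word)
    if kind == "function":
        return ("function_name", word)
    if kind == "array":
        return ("array_name", word)
    return ("name", word)  # undeclared or scalar identifier


def scan(text, declared):
    return [classify(word, declared) for word in text.split()]


# The parser would record declarations here as it processes them
# (define-before-use), and the scanner reads the same table.
declared = {"fee": "array"}
as_array = scan("fee ( i , 2 )", declared)

declared["fee"] = "function"
as_function = scan("fee ( i , 2 )", declared)
```

The same text yields `array_name` in the first scan and `function_name` in the second, so a parser that receives these categories never sees two productions with identical right-hand sides.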

  13. Left versus right recursion
     • Top-down parsers need right-recursive grammars
     • Bottom-up parsers can accommodate either left or right recursion
     • Compiler writers must choose between left and right recursion in writing the grammar for a bottom-up parser. How?
     Stack depth criterion
     • Left recursion can lead to smaller stack depths
     • Accordingly, lower memory use, fewer recursions
     Left recursive grammar:
       List → List elt
            | elt
     Right recursive grammar:
       List → elt List
            | elt

  14. Left versus right recursion: stack depth
     • The left-recursive grammar shifts elt1 onto its stack and immediately reduces it to List
     • Next, it shifts elt2 onto the stack and reduces it to List, and so on
     • It proceeds until it has shifted each of the five elt's onto the stack and reduced them to List
     • Thus, the stack reaches
       • a maximum depth of two
       • and an average depth of $\frac{10}{6} = 1\frac{2}{3}$
     • The stack depth of a left-recursive grammar depends on the grammar, not the input stream
       List → List elt
            | elt
     Figure (left recursion): derivation List ⇒ List elt5 ⇒ List elt4 elt5 ⇒ List elt3 elt4 elt5 ⇒ List elt2 elt3 elt4 elt5 ⇒ elt1 elt2 elt3 elt4 elt5
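
The stack behavior can be checked with a small simulation of the shift and reduce steps. The Python sketch below hard-codes the moves for the two List grammars from the previous slide instead of using real LR tables, so it is only an illustration; the function names are hypothetical, `n` is assumed to be at least 1, and only the maximum depth is measured.

```python
def max_stack_depth_left(n):
    """Shift-reduce parse of n elts with List -> List elt | elt."""
    stack, deepest = [], 0
    for i in range(1, n + 1):
        stack.append(f"elt{i}")      # shift the next elt
        deepest = max(deepest, len(stack))
        if stack[:-1] == ["List"]:
            stack[-2:] = ["List"]    # reduce: List elt -> List
        else:
            stack[-1:] = ["List"]    # reduce: elt -> List (first elt only)
    return deepest, stack


def max_stack_depth_right(n):
    """Shift-reduce parse of n elts with List -> elt List | elt."""
    stack, deepest = [], 0
    for i in range(1, n + 1):
        stack.append(f"elt{i}")      # no reduction is possible before the last elt
        deepest = max(deepest, len(stack))
    stack[-1:] = ["List"]            # reduce: last elt -> List
    while len(stack) > 1:
        stack[-2:] = ["List"]        # reduce: elt List -> List
    return deepest, stack


left_depth, left_stack = max_stack_depth_left(5)
right_depth, right_stack = max_stack_depth_right(5)
```

For five elements, the left-recursive version never holds more than two symbols, matching the slide, while the right-recursive version holds all five before its first reduction. Raising `n` leaves the left-recursive maximum at two.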
