
Compiler Construction
Lecture 9: Practical parsing issues and yacc intro
2020-02-04
Michael Engel

Overview
• Practical parsing issues
  • Error recovery
  • Unary operators
  • Handling context-sensitive ambiguity
  • Left versus right recursion
• A quick yacc intro
  • Syntax of yacc grammar descriptions
  • yacc-lex interaction
  • Example

Error recovery
• Syntax errors are common in program development
• Our previous parsers have stopped parsing at the first error
• Is this what a programmer would want? [2]
• Prefer to find as many syntax errors as possible in each compilation
• A mechanism for error recovery helps the parser move on to a state where it can continue parsing when it encounters an error
• Select one or more words that the parser can use to synchronize the input with its internal state
• When the parser encounters an error, it discards input symbols until it finds a synchronizing word and then resets its internal state to one consistent with the synchronizing word
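A minimal sketch of this panic-mode scheme in a recursive-descent parser, assuming the semicolon as the synchronizing word and a hypothetical toy statement language (`name = Expr ;` and `return Expr ;`, where `Expr` is a chain of names or numbers joined by `+` or `-`). The names `parse_program`, `ParseError` and the error-message format are illustrative, not taken from the lecture; for simplicity, each statement here ends with its own `;`.

```python
import re

class ParseError(Exception):
    """Raised by the statement parser when the input does not match."""

def tokenize(text):
    # Names, numbers and the punctuation of the toy language;
    # re.findall silently skips any other characters (fine for a sketch)
    return re.findall(r"\d+|[A-Za-z_]\w*|[=;()+-]", text)

def is_name(tok):
    return tok[0].isalpha() or tok[0] == "_"

def is_operand(tok):
    return tok.isdigit() or is_name(tok)

def parse_program(tokens):
    """Parse Stmt* and collect errors instead of stopping at the first one.

    Stmt -> name "=" Expr ";" | "return" Expr ";"
    Expr -> operand { ("+" | "-") operand }
    """
    pos, stmts, errors = 0, [], []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(pred, what):
        nonlocal pos
        tok = peek()
        if tok is None or not pred(tok):
            raise ParseError(f"expected {what} at token {pos}, got {tok!r}")
        pos += 1
        return tok

    def expr():
        node = expect(is_operand, "an operand")
        while peek() in ("+", "-"):
            op = expect(lambda t: True, "an operator")
            node = (op, node, expect(is_operand, "an operand"))
        return node

    def statement():
        if peek() == "return":
            expect(lambda t: t == "return", "'return'")
            stmt = ("return", expr())
        else:
            target = expect(is_name, "a name")
            expect(lambda t: t == "=", "'='")
            stmt = ("assign", target, expr())
        expect(lambda t: t == ";", "';'")
        return stmt

    while peek() is not None:
        try:
            stmts.append(statement())
        except ParseError as err:
            errors.append(str(err))
            # Panic mode: discard words up to the synchronizing ";" ...
            while peek() not in (None, ";"):
                pos += 1
            # ... consume it, and resume with the next statement
            if peek() == ";":
                pos += 1
    return stmts, errors
```

For the input `foo = func ) 42 ; return foo ;`, the parser reports one error at the `)`, skips to the `;`, and still parses `return foo ;`. An input with several broken statements yields one error per statement rather than stopping at the first.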

Error recovery
• Consider a language using semicolons as statement separators
• The semicolon can be used as the synchronizing word: when an error occurs, the parser calls the scanner repeatedly until it finds a semicolon
  foo = func ) 42 ; return foo ;
• Here, a recursive-descent parser can simply discard words until it finds a semicolon and return (fake) success [1]
• This resynchronization is more complex in an LR(1) parser:
  • it discards input until it finds a semicolon
  • it scans back down the stack to find a state s with a valid Goto[s, Stmt] entry
  • the first such state on the stack represents the statement that contains the error
  • it discards the entries on the stack above that state, pushes the state Goto[s, Stmt] onto the stack and resumes normal parsing
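The LR(1) variant can be sketched as a helper that a table-driven parser's driver would call when it detects an error. This assumes a hypothetical representation: the stack holds only state numbers (top at the end of the list) and the Goto table is a dict keyed by `(state, nonterminal)`. The state numbers in the usage example are made up and do not come from a real table; what happens to the semicolon afterwards is left to the normal parse actions.

```python
def lr_recover(stack, tokens, pos, goto, nonterminal="Stmt", sync=";"):
    """Panic-mode recovery for an LR(1) parser (sketch).

    stack  -- list of parser states, top of stack at the end (modified in place)
    tokens -- the input words; pos is the index of the offending word
    goto   -- dict mapping (state, nonterminal) -> state (hypothetical format)
    Returns the index of the synchronizing word, where normal parsing resumes.
    """
    # 1. Discard input until a synchronizing word is found
    while pos < len(tokens) and tokens[pos] != sync:
        pos += 1
    if pos == len(tokens):
        raise SyntaxError("no synchronizing word before end of input")
    # 2. Scan down the stack for the first state s with a Goto[s, Stmt] entry
    for depth in range(len(stack) - 1, -1, -1):
        s = stack[depth]
        if (s, nonterminal) in goto:
            # 3. Pop everything above s and push Goto[s, Stmt]
            del stack[depth + 1:]
            stack.append(goto[(s, nonterminal)])
            return pos
    raise SyntaxError("no state on the stack can accept a " + nonterminal)
```

With a stack `[0, 4, 7, 9]` where only states 0 and 4 have a `Stmt` entry, recovery pops 9 and 7, pushes `Goto[4, Stmt]`, and resumes at the semicolon.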

Unary operators
• The classic expression grammar includes binary operators only
• Algebraic notation includes unary operators
  • e.g., unary minus and absolute value
• Other unary operators:
  • autoincrement (i++)
  • autodecrement (i--)
  • address-of (&)
  • dereference (*)
  • boolean complement (!)
  • typecasts ((int)x)
• Adding these to the expression grammar requires some care

Unary operators
Example: expression grammar with an absolute value operator ||x

Start  → Expr
Expr   → Expr + Term
       | Expr - Term
       | Term
Term   → Term × Value
       | Term ÷ Value
       | Value
Value  → "||" Factor
       | Factor
Factor → "(" Expr ")"
       | num
       | name

[Figure: parse tree for ||x - 3: Expr → Expr "-" Term, where the left Expr derives Value → "||" Factor with Factor → <name,x>, and the right Term derives Value → Factor → <num,3>]
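To see the precedence levels at work, here is a recursive-descent sketch of this grammar in Python (not the LR(1) parser the lecture assumes). The left recursion in Expr and Term is replaced by loops, which keeps the operators left-associative; trees are nested tuples, with `("abs", ...)` standing for "||". The tokenizer and the tuple format are assumptions made for illustration.

```python
import re

# "||", numbers, names, and the binary operators written as on the slide
TOKEN_RE = re.compile(r"\s*(\|\||\d+|[A-Za-z_]\w*|[-+×÷()])")

def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def eat(self, tok=None):
        t = self.peek()
        if t is None or (tok is not None and t != tok):
            raise SyntaxError(f"expected {tok or 'a token'}, got {t!r}")
        self.i += 1
        return t

    def parse(self):                 # Start -> Expr
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def expr(self):                  # Expr -> Expr (+|-) Term | Term
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.term())
        return node

    def term(self):                  # Term -> Term (×|÷) Value | Value
        node = self.value()
        while self.peek() in ("×", "÷"):
            op = self.eat()
            node = (op, node, self.value())
        return node

    def value(self):                 # Value -> "||" Factor | Factor
        if self.peek() == "||":
            self.eat()
            return ("abs", self.factor())
        return self.factor()

    def factor(self):                # Factor -> "(" Expr ")" | num | name
        t = self.peek()
        if t == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        if t is not None and t.isdigit():
            return int(self.eat())
        if t is not None and (t[0].isalpha() or t[0] == "_"):
            return self.eat()
        raise SyntaxError(f"expected Factor, got {t!r}")
```

For example, `Parser("||x - 3").parse()` yields `('-', ('abs', 'x'), 3)`, matching the parse tree above, while `Parser("|| || x").parse()` raises `SyntaxError`: after one "||", Value → "||" Factor only admits a Factor, so a second "||" must be parenthesized, as in `||(||x)`.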

Unary operators
Example: absolute value operator ||x (grammar and parse tree for ||x - 3 as on the previous slide)
• Absolute value should have higher precedence than either × or ÷
• However, it needs lower precedence than Factor
  • this enforces evaluation of parenthesized expressions before application of ||
• The example grammar is still LR(1)
  • but it does not allow writing || || x
  • Writing this doesn't make much sense
  • but it's a legal mathematical operation, so why not?
  • This would work: ||(|| x)
• Problem for other operators like * (dereferencing)
  • **p is a common operation in C

Unary operators
Problem for other operators like *
• **p is a common operation in C
• Solution: add a dereference production for Value as well: Value → "*" Value
• The resulting grammar is still LR(1)
  • even if we replace the "×" operator in Term → Term × Value with "*", overloading the operator "*" in the way that C does
• The same approach works for unary minus

Start  → Expr
Expr   → Expr + Term
       | Expr - Term
       | Term
Term   → Term "*" Value
       | Term ÷ Value
       | Value
Value  → "*" Value
       | "||" Factor
       | Factor
Factor → "(" Expr ")"
       | num
       | name
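A recursive-descent sketch of the revised grammar in Python (again not the LR(1) parser the slide refers to) shows how the overloaded "*" is told apart: at the start of a Value it is unary, and between a Term and a Value it is binary. ÷ is written `/` here, and the unary-minus production Value → "-" Value is an assumed addition following the last bullet; the function names and the tuple tree format are illustrative.

```python
import re

TOKEN_RE = re.compile(r"\s*(\|\||\d+|[A-Za-z_]\w*|[-+*/()])")

def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(text):
    toks = tokenize(text) + [None]        # None marks the end of input
    pos = 0

    def peek():
        return toks[pos]

    def advance():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def expr():      # Expr -> Expr (+|-) Term | Term, left recursion as a loop
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():      # Term -> Term (*|/) Value | Value: here "*" is binary
        node = value()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, value())
        return node

    def value():     # Value -> "*" Value | "-" Value | "||" Factor | Factor
        if peek() == "*":                 # at the start of a Value, "*" is unary
            advance()
            return ("deref", value())
        if peek() == "-":                 # unary minus, same approach
            advance()
            return ("neg", value())
        if peek() == "||":
            advance()
            return ("abs", factor())
        return factor()

    def factor():    # Factor -> "(" Expr ")" | num | name
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(advance())
        if tok is not None and (tok[0].isalpha() or tok[0] == "_"):
            return advance()
        raise SyntaxError(f"expected Factor, got {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Because Value → "*" Value is right-recursive, `**p` nests as two dereferences, and in `a * *p` the first "*" is read as multiplication and the second as dereference.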

Handling context-sensitive ambiguity
• Using one word to represent two different meanings can create a syntactic ambiguity
• Common in early programming languages (FORTRAN, PL/I, Ada)
  • Parentheses were used to enclose both the subscript expressions of an array reference and the argument list of a subroutine or function
• For the input fee(i, j), the compiler cannot tell whether fee is a two-dimensional array or a procedure that must be invoked
  • Differentiating between these two cases requires knowledge of fee's declared type
  • This information is not syntactically obvious
  • The scanner would classify fee as a name in either case

Handling context-sensitive ambiguity
• We can add productions that derive both subscript expressions and argument lists from Factor
• Handling this in a classical expression grammar might look like this:

Factor            → FunctionReference
                  | ArrayReference
                  | "(" Expr ")"
                  | num
                  | name
FunctionReference → name "(" ArgList ")"
ArrayReference    → name "(" ArgList ")"

• Since the last two productions have identical right-hand sides, this grammar is ambiguous, which creates a reduce-reduce conflict in an LR(1) table builder

Handling context-sensitive ambiguity
Our grammar results in an LR(1) reduce-reduce conflict (FunctionReference and ArrayReference both derive name "(" ArgList ")")
• Resolving this ambiguity requires extra-syntactic knowledge: "Is name a function or an array?"
• In a recursive-descent parser, the compiler writer can combine the code for FunctionReference and ArrayReference
  • and add the extra code required to check the name's declared type
• In a table-driven parser built with a parser generator, the solution must work within the framework provided by the tools
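A minimal sketch of the combined recursive-descent code, assuming a hypothetical symbol table `symbols` (mapping a name to "function" or "array") that has already been filled from the declarations, and simplifying ArgList to a comma-separated list of Factors. One code path parses name "(" ArgList ")"; only afterwards does the declared type decide whether a call node or an index node is built.

```python
import re

def tokenize(text):
    return re.findall(r"\d+|[A-Za-z_]\w*|[(),]", text)

def parse_factor(tokens, pos, symbols):
    """Factor -> name "(" ArgList ")" | num | name; returns (node, next_pos)."""
    tok = tokens[pos] if pos < len(tokens) else None
    if tok is None:
        raise SyntaxError("unexpected end of input")
    if tok.isdigit():
        return ("num", int(tok)), pos + 1
    if not (tok[0].isalpha() or tok[0] == "_"):
        raise SyntaxError(f"unexpected {tok!r}")
    if pos + 1 < len(tokens) and tokens[pos + 1] == "(":
        # Shared code for FunctionReference and ArrayReference
        args, pos = parse_arglist(tokens, pos + 2, symbols)
        # Extra-syntactic check: consult the name's declared type
        kind = symbols.get(tok)
        if kind == "function":
            return ("call", tok, args), pos
        if kind == "array":
            return ("index", tok, args), pos
        raise SyntaxError(f"{tok!r} is not declared as a function or an array")
    return ("name", tok), pos + 1

def parse_arglist(tokens, pos, symbols):
    """ArgList -> Factor {"," Factor}, followed by the closing ")"."""
    args = []
    while True:
        arg, pos = parse_factor(tokens, pos, symbols)
        args.append(arg)
        tok = tokens[pos] if pos < len(tokens) else None
        if tok == ",":
            pos += 1
        elif tok == ")":
            return args, pos + 1
        else:
            raise SyntaxError(f"expected ',' or ')', got {tok!r}")
```

With `fee` declared as an array, `fee(i, j)` becomes an index node; with `fie` declared as a function, `fie(fee(1, 2))` becomes a call whose argument is an index node.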

Handling context-sensitive ambiguity
Two different approaches to solve this:
• Rewrite the grammar to combine function invocation and array reference into a single production

Factor                   → FunctionOrArrayReference
                         | "(" Expr ")"
                         | num
                         | name
FunctionOrArrayReference → name "(" ArgList ")"

  • the issue is deferred until a later step in translation
  • there, it can be resolved with information from the declarations
• The scanner can classify identifiers based on their declared types
  • requires handshaking between scanner and parser
  • works as long as the language has a define-before-use rule

FunctionReference → function_name "(" ArgList ")"
ArrayReference    → array_name "(" ArgList ")"

  • Rewritten in this way, the grammar is unambiguous
  • Since the scanner returns a distinct syntactic category in each case, the parser can distinguish the two cases
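The second approach can be sketched as a scanner that looks each identifier up in the declarations seen so far, with the parser recording a declaration before it asks for the next word. This assumes a hypothetical mini-language with declarations `function f;` and `array a;` and statements that consist of a single FunctionReference or ArrayReference; the categories `function_name` and `array_name` follow the grammar above, everything else is illustrative. Because the scanner is a lazily consumed generator and the parser holds one word of lookahead, a declaration only affects words scanned after it has been recorded.

```python
import re

def scan(text, symbols):
    """Yield (category, lexeme) pairs. Identifiers are classified by looking
    them up in `symbols`, which the parser updates as it reads declarations."""
    for lexeme in re.findall(r"\d+|[A-Za-z_]\w*|[(),;]", text):
        if lexeme in ("function", "array"):
            yield lexeme, lexeme                  # declaration keywords
        elif lexeme.isdigit():
            yield "num", lexeme
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            # function_name, array_name, or plain name if not (yet) declared
            yield symbols.get(lexeme, "name"), lexeme
        else:
            yield lexeme, lexeme                  # punctuation

def parse(text):
    symbols = {}                      # name -> "function_name" | "array_name"
    words = scan(text, symbols)
    tok = next(words, (None, None))   # one word of lookahead
    statements = []

    def advance():
        nonlocal tok
        current, tok = tok, next(words, (None, None))
        return current

    def expect(category):
        if tok[0] != category:
            raise SyntaxError(f"expected {category}, got {tok[1]!r}")
        return advance()[1]

    def reference():
        # FunctionReference -> function_name "(" ArgList ")"
        # ArrayReference    -> array_name "(" ArgList ")"
        if tok[0] == "function_name":
            kind = "call"
        elif tok[0] == "array_name":
            kind = "index"
        else:
            raise SyntaxError(f"{tok[1]!r} is not a declared function or array")
        name = advance()[1]
        expect("(")
        args = [argument()]           # ArgList -> Arg {"," Arg}
        while tok[0] == ",":
            advance()
            args.append(argument())
        expect(")")
        return (kind, name, args)

    def argument():
        if tok[0] in ("num", "name"):
            return advance()[1]
        return reference()

    while tok[0] is not None:
        if tok[0] in ("function", "array"):
            category = advance()[0] + "_name"
            name = expect("name")
            # Handshake: record the declaration before the word after ";" is scanned
            symbols[name] = category
        else:
            statements.append(reference())
        expect(";")
    return statements
```

Using a name before its declaration, as in `fee(1); array fee;`, is rejected, which is the define-before-use requirement from the slide.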

Left versus right recursion
• Top-down parsers need right-recursive grammars
• Bottom-up parsers can accommodate either left or right recursion
• Compiler writers must choose between left and right recursion when writing the grammar for a bottom-up parser. How?
Stack depth criterion
• Left recursion can lead to smaller stack depths
• Accordingly, lower memory use and fewer recursions

Left-recursive grammar:   List → List elt
                               | elt
Right-recursive grammar:  List → elt List
                               | elt
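Why a top-down parser needs the right-recursive form can be seen in a small Python sketch (function names and tree shape are illustrative): a recursive-descent routine for List → elt List | elt consumes an elt before recursing, while a naive transcription of List → List elt | elt calls itself at the same input position and never terminates, which Python reports as a `RecursionError`.

```python
def parse_list_right(tokens, pos=0):
    """List -> elt List | elt. Each call consumes one elt before recursing."""
    if pos >= len(tokens) or tokens[pos] != "elt":
        raise SyntaxError(f"expected elt at position {pos}")
    pos += 1
    # After the elt, the next word picks the alternative (left-factored choice)
    if pos < len(tokens) and tokens[pos] == "elt":
        rest, pos = parse_list_right(tokens, pos)
        return ("List", "elt", rest), pos
    return ("List", "elt"), pos

def parse_list_left(tokens, pos=0):
    """List -> List elt | elt, transcribed naively: the first alternative
    starts with List, so the routine calls itself without consuming input."""
    inner, pos = parse_list_left(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "elt":
        return ("List", inner, "elt"), pos + 1
    return inner, pos
```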

Left versus right recursion: stack depth
• The left-recursive grammar (List → List elt | elt) shifts elt1 onto its stack and immediately reduces it to List
• Next, it shifts elt2 onto the stack and reduces it to List, and so on…
• It proceeds until it has shifted each of the five elts onto the stack and reduced them to List
• Thus, the stack reaches
  • a maximum depth of two
  • and an average depth of 10/6 = 1⅔
• The stack depth of a left-recursive grammar depends on the grammar, not on the input stream

[Figure (Left recursion): stack contents and derivation steps for the input elt1 elt2 elt3 elt4 elt5]
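The shift/reduce sequence described above can be replayed in a short Python sketch that records the stack depth after every action. It is a hand-coded trace of the decisions an LR(1) parser makes for these two grammars, not a general LR parser; the right-recursive case is included for comparison and assumes the parser can first reduce elt → List only when the lookahead is the end of the input. Only the maximum depth is compared, since the average depends on which configurations are counted.

```python
def left_recursive_depths(n):
    """Stack depth after each shift and reduce for List -> List elt | elt."""
    stack, depths = [], []
    for i in range(1, n + 1):
        stack.append(f"elt{i}")            # shift elt_i
        depths.append(len(stack))
        if len(stack) == 2:
            stack[-2:] = ["List"]          # reduce List elt -> List
        else:
            stack[-1] = "List"             # reduce elt -> List (first elt only)
        depths.append(len(stack))
    return depths

def right_recursive_depths(n):
    """Stack depth after each shift and reduce for List -> elt List | elt.

    With lookahead elt the parser must keep shifting; it can first reduce
    elt -> List when the lookahead is the end of the input.
    """
    if n < 1:
        raise ValueError("need at least one elt")
    stack, depths = [], []
    for i in range(1, n + 1):
        stack.append(f"elt{i}")            # shift elt_i
        depths.append(len(stack))
    stack[-1] = "List"                     # reduce elt_n -> List
    depths.append(len(stack))
    while len(stack) > 1:
        stack[-2:] = ["List"]              # reduce elt List -> List
        depths.append(len(stack))
    return depths
```

For five elts, `max(left_recursive_depths(5))` is 2 and stays 2 for any longer list, while `max(right_recursive_depths(5))` is 5, growing with the length of the input.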
