
Compiler Construction Lecture 7: Bottom-up parsing (2020-01-28)

Compiler Construction Lecture 7: Bottom-up parsing, 2020-01-28, Michael Engel. Includes material by Jan Christian Meyer and Rich Maclin (UNM). Overview: top-down parsing revisited, bottom-up parsing, comparison to top-down parsing.


  1. Compiler Construction Lecture 7: Bottom-up parsing 2020-01-28 Michael Engel Includes material by Jan Christian Meyer and Rich Maclin (UNM)

  2. Overview
     • Top-down parsing revisited
     • Bottom-up parsing
     • Comparison to top-down parsing
     • Shift-reduce parsers
     • Conflict resolution
     Compiler Construction 07: Bottom-up parsing

  3. Types of languages and automata (Syntax analysis)
     • Context-free languages are a superset of regular languages
     • Regular languages can be recognized by DFAs/NFAs
     • DFAs and NFAs don't have a memory
     • Stack machines (also called pushdown automata) add memory by introducing the operations push and pop
     • These enable the stack machine to memorize (trace) the path it took to get to a state (and revert to a previous one)
     • More powerful than DFAs/NFAs
     [Figure: nested language classes: recursively enumerable (type 0), context-sensitive (type 1), context-free (type 2), regular languages (type 3). Stack machines are annotated at the context-free level, finite automata at the regular level.]
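To make the role of push and pop concrete, here is a minimal sketch (not from the slides) of a stack-based recognizer for nested brackets, a language that a finite automaton cannot recognize because the nesting depth is unbounded. The function name `balanced` and the limit `MAX_DEPTH` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 256   /* illustrative limit on nesting depth */

/* Returns true if every '(' or '{' in s is closed by the matching kind of
 * bracket in the right order. All other characters are ignored. */
bool balanced(const char *s)
{
    char stack[MAX_DEPTH];
    size_t top = 0;                     /* number of symbols on the stack */

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == '(' || *s == '{') {
            if (top == MAX_DEPTH)
                return false;           /* sketch: treat overflow as reject */
            stack[top++] = *s;          /* push the opening bracket */
        } else if (*s == ')' || *s == '}') {
            char open;
            if (top == 0)
                return false;           /* closing bracket with nothing open */
            open = stack[--top];        /* pop the most recent opening bracket */
            if ((*s == ')' && open != '(') || (*s == '}' && open != '{'))
                return false;           /* wrong kind of bracket */
        }
    }
    return top == 0;                    /* everything opened was closed */
}
```

The stack is what lets the recognizer "revert to a previous state": popping restores the bracket that was open before the current nesting level began.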

  4. Top-down parsing and the stack
     • We've seen LL(1) tables and manually built recursive descent parsers
     • Another simple example:

         A → x B | y C
         B → x B | ε
         C → y C | ε

       LL(1) table:

              x          y          EOF
         A    A → x B    A → y C
         B    B → x B               B → ε
         C               C → y C    C → ε

       Recursive descent code (sym is the current token; match, add_tree and error are helpers not shown on the slide):

         void parse_A() {
           switch (sym) {
             case 'x': add_tree(x, B);   /* A → x B */
                       match(x);
                       parse_B();
                       break;
             case 'y': add_tree(y, C);   /* A → y C */
                       match(y);
                       parse_C();
                       break;
             case EOF: error();          /* A cannot derive ε */
                       break;
           }
           return;
         }

         void parse_B() {
           switch (sym) {
             case 'x': add_tree(x, B);   /* B → x B */
                       match(x);
                       parse_B();
                       break;
             case 'y': error();
                       break;
             case EOF: return;           /* B → ε */
           }
           return;
         }

         void parse_C() {
           switch (sym) {
             case 'x': error();
                       break;
             case 'y': add_tree(y, C);   /* C → y C */
                       match(y);
                       parse_C();
                       break;
             case EOF: return;           /* C → ε */
           }
           return;
         }

  5. Tracing the recursive descent code
     Grammar: A → x B | y C,  B → x B | ε,  C → y C | ε
     • Which derivation do we get when parsing "y y y"?
         A ⇒ y C ⇒ yy C ⇒ yyy C ⇒ yyy
     • What is the related hierarchy of function calls?
     [Figure: call stack over time in two phases. Recur: parse_A calls match(y) and then parse_C; each parse_C in turn calls match(y) and a further parse_C, so the stack of active calls grows. Unwind: the parse_C calls return one after another until parse_A itself returns (finished).]
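The call hierarchy can also be observed by instrumenting the parser. The following is a minimal sketch under stated assumptions: the input is a plain C string whose terminating '\0' stands for EOF, tree construction is omitted, and the names `cursor`, `failed`, `enter`, `leave` and `trace_depth` are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *cursor;   /* next unread input character */
static bool failed;          /* set once a syntax error is seen */
static int depth;            /* number of currently active parse_* calls */
static int max_depth;        /* deepest nesting reached so far */

static void enter(void) { if (++depth > max_depth) max_depth = depth; }
static void leave(void) { depth--; }

static void match(char expected)
{
    if (*cursor == expected) cursor++;
    else failed = true;
}

static void parse_B(void)
{
    enter();
    if (*cursor == 'x') { match('x'); parse_B(); }   /* B -> x B */
    else if (*cursor != '\0') failed = true;         /* only EOF selects B -> ε */
    leave();
}

static void parse_C(void)
{
    enter();
    if (*cursor == 'y') { match('y'); parse_C(); }   /* C -> y C */
    else if (*cursor != '\0') failed = true;         /* only EOF selects C -> ε */
    leave();
}

static void parse_A(void)
{
    enter();
    if (*cursor == 'x') { match('x'); parse_B(); }       /* A -> x B */
    else if (*cursor == 'y') { match('y'); parse_C(); }  /* A -> y C */
    else failed = true;                                  /* A cannot derive ε */
    leave();
}

/* Parses input and returns the deepest nesting of parse_* calls,
 * or -1 if the input is not in the language. */
int trace_depth(const char *input)
{
    assert(input != NULL);
    cursor = input;
    failed = false;
    depth = 0;
    max_depth = 0;
    parse_A();
    return failed ? -1 : max_depth;
}
```

For the input "yyy" the deepest nesting is parse_A plus three parse_C calls; the innermost parse_C sees EOF and returns immediately, which starts the unwinding.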

  6. Memory in recursive descent code
     • Where is the memory hidden in our parser?
       • We do not explicitly store and retrieve state
       • The programming language hides it:
         • When calling (returning from) a function, state is pushed onto (popped from) the computer's stack automatically
         • This state includes the return address of the call site
     • We can also build LL(1) parsers using iteration
       • but then we have to implement our own stack…
     • The stack is needed to match beginnings and ends of productions
       • Any production of the form A → x B y where B can contain further instances of x and y, such as:
           Expression → ( Expression )
           Statement → { Statement }
           Comment → (* Comment *)
     [Figure: excerpt of the call stack diagram with parse_A, parse_C and match(y).]
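As a rough illustration of the iterative alternative, here is a sketch of a table-driven LL(1) recognizer with its own stack for the example grammar A → x B | y C, B → x B | ε, C → y C | ε. The encoding is an assumption of this sketch: uppercase letters are nonterminals, lowercase letters are terminals, the string terminator '\0' stands for EOF, and `parse_iterative` and `STACK_MAX` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_MAX 64   /* illustrative capacity of the explicit stack */

/* Returns true if input is derivable from A. */
bool parse_iterative(const char *input)
{
    char stack[STACK_MAX];
    size_t top = 0;

    assert(input != NULL);
    stack[top++] = 'A';                     /* start with the start symbol */

    while (top > 0) {
        char sym = stack[--top];            /* pop the symbol to process */
        char look = *input;                 /* one token of lookahead */

        if (sym == 'x' || sym == 'y') {     /* terminal: must match the input */
            if (look != sym)
                return false;
            input++;
            continue;
        }

        /* Nonterminal: pick a production as the LL(1) table would and
         * push its right-hand side in reverse order. */
        assert(top + 2 <= STACK_MAX);
        if ((sym == 'A' || sym == 'B') && look == 'x') {
            stack[top++] = 'B';             /* A -> x B  or  B -> x B */
            stack[top++] = 'x';
        } else if ((sym == 'A' || sym == 'C') && look == 'y') {
            stack[top++] = 'C';             /* A -> y C  or  C -> y C */
            stack[top++] = 'y';
        } else if (sym != 'A' && look == '\0') {
            /* B -> ε or C -> ε: push nothing */
        } else {
            return false;                   /* empty table entry: syntax error */
        }
    }
    return *input == '\0';                  /* all input must be consumed */
}
```

For this particular grammar the explicit stack stays shallow, because each nonterminal sits at the very end of its production. With a production such as Expression → ( Expression ), the pending closing parenthesis would remain on the stack while the inner expression is parsed, which is exactly the matching of beginnings and ends described above.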

  7. Top-down parsing and the syntax tree
     LL(1) parsers generate a parse tree from top to bottom:
     • τ: part of the syntax tree that has already been derived
     • β: current nonterminal (NT) symbol
     • At this point, the parser tries to find a derivation for β: u0 β u2 → u0 v u2
     • uR has to be derivable from u2 to complete parsing (otherwise: syntax error)
     [Figure: syntax tree with labels τ, β, γ, v, u0, v1, u2 and uR above the input token stream. u0 is the initial part of the input token stream that is already derived; uR is the input token stream remaining to be read.]

  8. Bottom-up parsing
     Can we also construct the parse tree from bottom to top?
     • We try to guess a production β → v1 v2
     [Figure: syntax tree with labels τ, β, u2, v1, v2, u0, u and uR above the input token stream. The initial part of the input is already reduced; the rest of the input token stream remains to be read.]

  9. General idea of bottom-up parsing
     • Bottom-up parsing starts from the input token stream (whereas top-down starts from the grammar start symbol)
     • It reduces a string to the start symbol by inverting productions
       • trying to find a production matching the right-hand side

         E → T + E | T                becomes    E ← T + E | T
         T → int × T | int | ε        becomes    T ← int × T | int | ε

     • Consider the input token stream int × int + int:

         sentential form        production used
         int × int + int        T → int
         int × T + int          T → int × T
         T + int                T → int
         T + T                  E → T
         T + E                  E → T + E
         E

     • Reading the productions in reverse (from bottom to top) gives a rightmost derivation

  10. The resulting parse tree
      • A bottom-up parser traces a rightmost derivation in reverse

          int × int + int
          int × T + int
          T + int
          T + T
          T + E
          E

      [Figure: parse tree for int × int + int, with E at the root and the tokens int × int + int at the leaves.]

  11. A simple bottom-up parsing algorithm
      • Idea: split the input string (token stream) into two substrings
        • The right substring (a string of terminal symbols) has not been examined so far
        • The left substring has terminals and nonterminals (generated by replacing the right side of a production by the left side)

          I = input string
          repeat
              select a non-empty substring γ of I
                  where X → γ is a production in the grammar
              if no such γ exists, backtrack
              replace one γ by X in I
          until I == "S"                            /* start symbol */
                or all other possibilities exhausted  /* error */
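The backtracking search can be sketched in C as a recursive function. This is an illustration under several assumptions, not the lecture's implementation: tokens are single characters ('i' for int, '*' for ×), the start symbol is E rather than S, the ε-production of T is left out (an empty right-hand side would match everywhere and the naive search would not terminate), and the names `grammar`, `reduces_to_start` and `MAX_LEN` are invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 64   /* illustrative bound on the length of a sentential form */

struct production { char lhs; const char *rhs; };

/* E -> T + E | T,  T -> int * T | int   (ε-production omitted, see above) */
static const struct production grammar[] = {
    { 'E', "T+E" }, { 'E', "T" }, { 'T', "i*T" }, { 'T', "i" },
};

/* Returns true if form can be reduced to the start symbol "E". */
bool reduces_to_start(const char *form)
{
    size_t len, p, pos;

    assert(form != NULL);
    len = strlen(form);
    if (len >= MAX_LEN)
        return false;                        /* sketch: reject overlong input */
    if (strcmp(form, "E") == 0)
        return true;                         /* arrived at the start symbol */

    for (p = 0; p < sizeof grammar / sizeof grammar[0]; p++) {
        size_t rlen = strlen(grammar[p].rhs);
        for (pos = 0; pos + rlen <= len; pos++) {
            char next[MAX_LEN];
            if (strncmp(form + pos, grammar[p].rhs, rlen) != 0)
                continue;                    /* γ does not occur at this position */
            /* Replace this occurrence of γ by X. */
            memcpy(next, form, pos);
            next[pos] = grammar[p].lhs;
            strcpy(next + pos + 1, form + pos + rlen);
            if (reduces_to_start(next))
                return true;
            /* Otherwise backtrack: try the next position or production. */
        }
    }
    return false;                            /* all possibilities exhausted */
}
```

The search terminates because every reduction either shortens the string or turns a T into an E at the same length, but it may explore many dead ends, which is the motivation for the more disciplined shift-reduce scheme.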

  12. Bottom-up parsing steps
      • Initially, all input is unexamined, written as: ↑ x1 x2 x3 … xn
      • Two kinds of operations:
        • Shift: move ↑ one place to the right

            ABC ↑ xyz   becomes   ABCx ↑ yz

        • Reduce: apply an inverse production at the right end of the left string
          • If A → xy is a production, then

            Cbxy ↑ ijk   becomes   CbA ↑ ijk

  13. Example with reductions only
      Grammar: E → T + E | T,  T → int × T | int | ε

          int × int ↑ + int      reduce T → int
          int × T ↑ + int        reduce T → int × T
          T + int ↑              reduce T → int
          T + T ↑                reduce E → T
          T + E ↑                reduce E → T + E

  14. Example with shift-reduce parsing
      Grammar: E → T + E | T,  T → int × T | int | ε

          ↑ int × int + int      shift
          int ↑ × int + int      shift
          int × ↑ int + int      shift
          int × int ↑ + int      reduce T → int
          int × T ↑ + int        reduce T → int × T
          T ↑ + int              shift
          T + ↑ int              shift
          T + int ↑              reduce T → int
          T + T ↑                reduce E → T
          T + E ↑                reduce E → T + E
          E                      (arrived at start symbol!)

  15. Implementing the memory
      Grammar: E → T + E | T,  T → int × T | int | ε
      Idea:
      • The left substring can be implemented by a stack
      • A shift operation pushes a terminal symbol onto the stack
      • A reduce operation pops zero or more symbols off the stack (the right-hand side of a production) and pushes a nonterminal symbol onto the stack (the left-hand side of that production)

          stack contents     input token stream     parser operation: stack operation(s)
          []                 ↑ int × int + int      shift: push [int]
          [int]              int ↑ × int + int      shift: push [×]
          [int, ×]           int × ↑ int + int      shift: push [int]
          [int, ×, int]      int × int ↑ + int      reduce T → int: pop int, push [T]
          [int, ×, T]        int × int ↑ + int      reduce T → int × T: pop int, ×, T; push [T]
          [T]                int × int ↑ + int      …
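The stack-based scheme can be sketched as a small recognizer. Everything beyond the shift and reduce operations themselves is an assumption of this sketch: tokens are single characters ('i' for int, '*' for ×), the ε-production is omitted, and the choice between shifting and reducing is made by peeking at the next input token, a rule picked here only because it happens to work for this grammar. The names `shift_reduce_parse`, `ends_with` and `STACK_MAX` are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STACK_MAX 64   /* illustrative capacity of the parse stack */

/* True if the top of the stack holds exactly the symbols in rhs. */
static bool ends_with(const char *stack, size_t top, const char *rhs)
{
    size_t n = strlen(rhs);
    return top >= n && memcmp(stack + top - n, rhs, n) == 0;
}

/* Recognizer for E -> T + E | T,  T -> int * T | int. */
bool shift_reduce_parse(const char *input)
{
    char stack[STACK_MAX];
    size_t top = 0;

    assert(input != NULL);
    for (;;) {
        char look = *input;                       /* next unread token */

        if (ends_with(stack, top, "i*T")) {
            top -= 3; stack[top++] = 'T';         /* reduce T -> int * T */
        } else if (ends_with(stack, top, "T+E")) {
            top -= 3; stack[top++] = 'E';         /* reduce E -> T + E */
        } else if (top > 0 && stack[top - 1] == 'i' && look != '*') {
            stack[top - 1] = 'T';                 /* reduce T -> int */
        } else if (top > 0 && stack[top - 1] == 'T' && look == '\0') {
            stack[top - 1] = 'E';                 /* reduce E -> T */
        } else if (look != '\0') {
            if (top == STACK_MAX)
                return false;                     /* sketch: reject on overflow */
            stack[top++] = *input++;              /* shift */
        } else {
            break;                                /* nothing left to do */
        }
    }
    return top == 1 && stack[0] == 'E';           /* accept iff only E remains */
}
```

On the input "i*i+i" this performs the same sequence of shifts and reductions as the trace on slide 14. The lookahead test in the third branch is what prevents the premature reduction of the first int.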

  16. Conflicts in parsing
      Problem:
      • How do we decide when to shift or reduce?
        • Consider the step int ↑ × int + int
        • We could reduce using T → int, giving T ↑ × int + int
        • A fatal mistake: there is no way to reduce to the start symbol E from here
      • Generic shift-reduce strategy:
        • If there is a matching pattern (handle) on the stack, reduce
        • Otherwise, shift
      • What if there is a choice (between two matching patterns)?
        • If it is legal to shift or reduce, there is a shift-reduce conflict
        • If it is legal to reduce by two different productions, there is a reduce-reduce conflict
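The fatal mistake can be reproduced with a deliberately naive sketch that reduces whenever any right-hand side matches the top of the stack and shifts only otherwise. The token encoding ('i' for int, '*' for ×), the omission of the ε-production, the order in which productions are tried, and the names `parse_eager`, `top_matches` and `STACK_MAX` are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STACK_MAX 64   /* illustrative capacity of the parse stack */

struct production { char lhs; const char *rhs; };

/* E -> T + E | T,  T -> int * T | int, tried in this order. */
static const struct production grammar[] = {
    { 'E', "T+E" }, { 'T', "i*T" }, { 'T', "i" }, { 'E', "T" },
};

/* True if the top of the stack holds exactly the symbols in rhs. */
static bool top_matches(const char *stack, size_t top, const char *rhs)
{
    size_t n = strlen(rhs);
    return top >= n && memcmp(stack + top - n, rhs, n) == 0;
}

/* Always reduces when some right-hand side is on top of the stack,
 * without looking at the remaining input. */
bool parse_eager(const char *input)
{
    char stack[STACK_MAX];
    size_t top = 0;

    assert(input != NULL);
    for (;;) {
        bool reduced = false;
        size_t p;

        for (p = 0; p < sizeof grammar / sizeof grammar[0]; p++) {
            if (top_matches(stack, top, grammar[p].rhs)) {
                top -= strlen(grammar[p].rhs);    /* pop the right-hand side */
                stack[top++] = grammar[p].lhs;    /* push the left-hand side */
                reduced = true;
                break;
            }
        }
        if (reduced)
            continue;
        if (*input == '\0')
            break;                                /* no reduction, no input left */
        if (top == STACK_MAX)
            return false;                         /* sketch: reject on overflow */
        stack[top++] = *input++;                  /* shift */
    }
    return top == 1 && stack[0] == 'E';
}
```

With this strategy the single token "i" is accepted, but "i*i+i" is rejected even though it is in the language: the first int is reduced to T and then to E before × is seen, and no production can combine E with × afterwards.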
