CSC 4181 Compiler Construction: Parsing

Outline

Top-down vs. bottom-up

Top-down parsing
• Recursive-descent parsing
• LL(1) parsing
• LL(1) parsing algorithm
• First and follow sets
• Constructing LL(1) parsing table
• Error recovery

Bottom-up parsing
• Shift-reduce parsers
• LR(0) parsing
• LR(0) items
• Finite automata of items
• LR(0) parsing algorithm
• LR(0) grammar
• SLR(1) parsing
• SLR(1) parsing algorithm
• SLR(1) grammar
• Parsing conflict

Introduction

Parsing is a process that constructs a syntactic structure (i.e. a parse tree) from the stream of tokens.
We already learned how to describe the syntactic structure of a language using a (context-free) grammar.
So, a parser only needs to do this?

[Figure: the parser takes a stream of tokens and a context-free grammar as input and produces a parse tree.]

Top-Down Parsing vs. Bottom-Up Parsing

Top-down parsing
• The parse tree is created from root to leaves.
• The traversal of the parse tree is a preorder traversal.
• Traces a leftmost derivation.
• Two types: backtracking parser and predictive parser.

Bottom-up parsing
• The parse tree is created from leaves to root.
• Nodes are built in postorder; read backwards, the sequence of steps is a rightmost derivation.
• Traces a rightmost derivation (in reverse).
• More powerful than top-down parsing.

Backtracking: try different structures and backtrack if a structure does not match the input.
Predictive: guess the structure of the parse tree from the next input.

Parse Trees and Derivations

Top-down parsing (leftmost derivation):
  E ⇒ E + E
    ⇒ id + E
    ⇒ id + E * E
    ⇒ id + id * E
    ⇒ id + id * id

Bottom-up parsing (rightmost derivation, discovered in reverse):
  E ⇒ E + E
    ⇒ E + E * E
    ⇒ E + E * id
    ⇒ E + id * id
    ⇒ id + id * id

[Figure: parse trees for id + id * id, grown from the root (top-down) and from the leaves (bottom-up).]

TOP-DOWN PARSING

Top-down Parsing

What does a parser need to decide?
• Which production rule is to be used at each point in time?

How to guess? What is the guess based on?
• What is the next token?
  Reserved word if, open parenthesis, etc.
• What is the structure to be built?
  If statement, expression, etc.

Top-down Parsing: Why is it difficult?

• Cannot decide until later
  Next token: if    Structure to be built: St
    St → MatchedSt | UnmatchedSt
    UnmatchedSt → if ( E ) St | if ( E ) MatchedSt else UnmatchedSt
    MatchedSt → if ( E ) MatchedSt else MatchedSt | ...

• Production with empty string
  Next token: id    Structure to be built: par
    par → parList | ε
    parList → exp , parList | exp

Recursive-Descent

• Write one procedure for each set of productions with the same nonterminal on the LHS.
• Each procedure recognizes a structure described by a nonterminal.
• A procedure calls other procedures if it needs to recognize other structures.
• A procedure calls the match procedure if it needs to recognize a terminal.

Recursive-Descent: Example

Original grammar:
  E → E O F | F
  O → + | -
  F → ( E ) | id

• We cannot decide which rule to use for E.
• If we choose E → E O F, it leads to infinite recursion:
    procedure E
    { E; O; F; }

Rewrite the grammar into EBNF:
  E ::= F {O F}
  O ::= + | -
  F ::= ( E ) | id

For this grammar:
    procedure E
    { F;
      while (token=+ or token=-)
      { O; F; }
    }

    procedure F
    { switch token
      { case (:  match('('); E; match(')');
        case id: match(id);
        default: error;
      }
    }

Match procedure

    procedure match(expTok)
    { if (token==expTok)
        then getToken
        else error
    }

The token is not consumed until getToken is executed.

Problems in Recursive-Descent

• It is difficult to convert grammars into EBNF.
• The parser cannot decide which production to use at each point.
• The parser cannot decide when to use an ε-production A → ε.
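The procedures E, O, F and match from the recursive-descent slides can be sketched in Python roughly as follows. This is a minimal illustration, not the course's reference code: the class name Parser, the exception ParseError, the helper parse, and the convention that the token stream is a list of strings ending in an added "$" marker are all assumptions made for this example.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar (hypothetical name)."""


class Parser:
    # Recognizer for the EBNF grammar:
    #   E ::= F {O F}    O ::= + | -    F ::= ( E ) | id
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def token(self):
        # Current lookahead token; it is not consumed until match() advances.
        return self.tokens[self.pos]

    def match(self, expected):
        if self.token == expected:
            self.pos += 1  # plays the role of getToken
        else:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")

    def E(self):
        self.F()
        while self.token in ("+", "-"):
            self.O()
            self.F()

    def O(self):
        if self.token in ("+", "-"):
            self.match(self.token)
        else:
            raise ParseError(f"expected '+' or '-', found {self.token!r}")

    def F(self):
        if self.token == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.token == "id":
            self.match("id")
        else:
            raise ParseError(f"unexpected token {self.token!r}")


def parse(tokens):
    """Return True if tokens form an E; raise ParseError otherwise."""
    parser = Parser(tokens)
    parser.E()
    parser.match("$")  # the whole input must be consumed
    return True
```

For example, parse(["id", "+", "(", "id", "-", "id", ")"]) succeeds, while parse(["id", "+"]) raises ParseError because F finds the end marker where an operand is required.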

LL(1) Parsing

LL(1)
• Read input from (L) left to right
• Simulate (L) leftmost derivation
• 1 lookahead symbol

Use a stack to simulate the leftmost derivation
• The part of the sentential form produced in the leftmost derivation that has not yet been matched is stored in the stack.
• The top of the stack is the leftmost symbol of that fragment of the sentential form.

Concept of LL(1) Parsing

• Simulate the leftmost derivation of the input.
• Keep part of the sentential form in the stack.
• If the symbol on the top of the stack is a terminal, try to match it with the next input token and pop it off the stack.
• If the symbol on the top of the stack is a nonterminal X, replace it with Y if we have a production rule X → Y.
  Which production will be chosen if there are both X → Y and X → Z?

Example of LL(1) Parsing

Input: ( n + ( n ) ) * n $

Productions used in this derivation:
  E → T X
  X → A T X | ε
  A → +
  T → F N
  N → M F N | ε
  M → *
  F → ( E ) | n

Leftmost derivation traced by the parser:
  E ⇒ TX
    ⇒ FNX
    ⇒ (E)NX
    ⇒ (TX)NX
    ⇒ (FNX)NX
    ⇒ (nNX)NX
    ⇒ (nX)NX
    ⇒ (nATX)NX
    ⇒ (n+TX)NX
    ⇒ (n+FNX)NX
    ⇒ (n+(E)NX)NX
    ⇒ (n+(TX)NX)NX
    ⇒ (n+(FNX)NX)NX
    ⇒ (n+(nNX)NX)NX
    ⇒ (n+(nX)NX)NX
    ⇒ (n+(n)NX)NX
    ⇒ (n+(n)X)NX
    ⇒ (n+(n))NX
    ⇒ (n+(n))MFNX
    ⇒ (n+(n))*FNX
    ⇒ (n+(n))*nNX
    ⇒ (n+(n))*nX
    ⇒ (n+(n))*n

[Figure: contents of the parsing stack after each step, ending with "Finished".]

LL(1) Parsing Algorithm

    Push $ and then the start symbol onto the stack
    WHILE the top of stack is not $ or the next input token is not $
      SWITCH (top of stack, next token)
        CASE (terminal a, a):
          Pop stack; Get next token
        CASE (nonterminal A, terminal a):
          IF the parsing table entry M[A, a] is not empty THEN
            Get A → X1 X2 ... Xn from the parsing table entry M[A, a]
            Pop stack; Push Xn ... X2 X1 onto the stack in that order
          ELSE Error
        OTHER: Error
    Accept    (the top of stack and the next token are both $)

Note: the loop must keep running while either the stack or the input still has symbols. Stopping as soon as the next token is $ would leave nonterminals such as N and X on the stack without applying their ε-productions.
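A table-driven version of the LL(1) algorithm can be sketched in Python as below. The parsing table TABLE is written by hand for the grammar of the example (E, X, A, T, N, M, F) and is an assumption of this sketch rather than something given on the slides; the names ll1_parse and ParseError are likewise hypothetical. The loop runs until both the stack top and the lookahead are $, so ε-productions can still be applied when the lookahead is $.

```python
class ParseError(Exception):
    """Raised on a syntax error (hypothetical name)."""


NONTERMINALS = {"E", "X", "A", "T", "N", "M", "F"}

# Hand-built table M[A, a] -> right-hand side; [] stands for an ε-production.
TABLE = {
    ("E", "("): ["T", "X"], ("E", "n"): ["T", "X"],
    ("X", "+"): ["A", "T", "X"], ("X", ")"): [], ("X", "$"): [],
    ("A", "+"): ["+"],
    ("T", "("): ["F", "N"], ("T", "n"): ["F", "N"],
    ("N", "*"): ["M", "F", "N"],
    ("N", "+"): [], ("N", ")"): [], ("N", "$"): [],
    ("M", "*"): ["*"],
    ("F", "("): ["(", "E", ")"], ("F", "n"): ["n"],
}


def ll1_parse(tokens, start="E"):
    """Return the list of productions used, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]          # top of the stack is the end of the list
    pos = 0
    used = []
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == "$" and tok == "$":
            return used           # CASE ($, $): accept
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                raise ParseError(f"no rule for ({top}, {tok})")
            stack.pop()
            stack.extend(reversed(rhs))   # push Xn ... X1, so X1 ends on top
            used.append((top, rhs))
        elif top == tok:
            stack.pop()           # CASE (terminal a, a): match and advance
            pos += 1
        else:
            raise ParseError(f"expected {top!r}, found {tok!r}")
```

Calling ll1_parse("( n + ( n ) ) * n".split()) returns one production per step of the leftmost derivation listed in the example, starting with E → T X.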

LL(1) Parsing Table

If the nonterminal N is on the top of the stack and the next token is t, which production rule should be used?

Choose a rule N → X such that
• X ⇒* tY, or
• X ⇒* ε and S ⇒* WNtY

[Figure: stack with N on top and input beginning with t; N is replaced by X.]

First Set

Let X be ε or be in V or T.
First(X) is the set of the first terminals in any sentential form derived from X.
• If X is a terminal or ε, then First(X) = {X}.
• If X is a nonterminal and X → X1 X2 ... Xn is a rule, then
  • First(X1) - {ε} is a subset of First(X)
  • First(Xi) - {ε} is a subset of First(X) if for all j<i First(Xj) contains ε
  • ε is in First(X) if for all j ≤ n First(Xj) contains ε

Examples of First Set

Grammar 1:
  st → ifst | other
  ifst → if ( exp ) st elsepart
  elsepart → else st | ε
  exp → 0 | 1

  First(exp) = {0, 1}
  First(elsepart) = {else, ε}
  First(ifst) = {if}
  First(st) = {if, other}

Grammar 2:
  exp → exp addop term | term
  addop → + | -
  term → term mulop factor | factor
  mulop → *
  factor → ( exp ) | num

  First(addop) = {+, -}
  First(mulop) = {*}
  First(factor) = {(, num}
  First(term) = {(, num}
  First(exp) = {(, num}

Algorithm for finding First(A)

Rules:
• If A is a terminal or ε, then First(A) = {A}.
• If A is a nonterminal, then for each rule A → X1 X2 ... Xn, First(A) contains First(X1) - {ε}.
• If also for some i<n, First(X1), First(X2), ..., and First(Xi) contain ε, then First(A) contains First(Xi+1) - {ε}.
• If First(X1), First(X2), ..., and First(Xn) contain ε, then First(A) also contains ε.

Algorithm:
    For all terminals a, First(a) = {a}
    For all nonterminals A, First(A) := {}
    While there are changes to any First(A)
      For each rule A → X1 X2 ... Xn
        For each Xi in {X1, X2, ..., Xn}
          If for all j<i First(Xj) contains ε,
          Then add First(Xi) - {ε} to First(A)
        If ε is in First(X1), First(X2), ..., and First(Xn)
        Then add ε to First(A)
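The fixed-point algorithm for First sets can be sketched in Python as follows, applied to Grammar 1 (the if-statement grammar) above. The dictionary encoding of the grammar, the use of an empty list for an ε-production, and the names GRAMMAR, EPS and first_sets are assumptions of this sketch; any symbol that is not a key of the dictionary is treated as a terminal.

```python
EPS = "ε"

# Each nonterminal maps to a list of right-hand sides; [] is an ε-production.
GRAMMAR = {
    "st": [["ifst"], ["other"]],
    "ifst": [["if", "(", "exp", ")", "st", "elsepart"]],
    "elsepart": [["else", "st"], []],
    "exp": [["0"], ["1"]],
}


def first_sets(grammar):
    """Iterate until no First set changes, as in the algorithm above."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                before = len(first[a])
                all_have_eps = True
                for x in rhs:
                    # First of a terminal is the terminal itself.
                    fx = first[x] if x in grammar else {x}
                    first[a] |= fx - {EPS}
                    if EPS not in fx:
                        all_have_eps = False
                        break  # later symbols cannot contribute
                if all_have_eps:
                    first[a].add(EPS)  # every Xi can derive ε (or rhs is empty)
                if len(first[a]) != before:
                    changed = True
    return first
```

With this encoding, first_sets(GRAMMAR)["st"] is {"if", "other"} and first_sets(GRAMMAR)["elsepart"] is {"else", "ε"}, matching the sets listed for Grammar 1.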

Finding First Set: An Example

  exp → term exp'
  exp' → addop term exp' | ε
  addop → + | -
  term → factor term'
  term' → mulop factor term' | ε
  mulop → *
  factor → ( exp ) | num

  Nonterminal   First
  exp           ( num
  exp'          + - ε
  addop         + -
  term          ( num
  term'         * ε
  mulop         *
  factor        ( num

Follow Set

Let $ denote the end of the input tokens.
• If A is the start symbol, then $ is in Follow(A).
• If there is a rule B → X A Y, then First(Y) - {ε} is in Follow(A).
• If there is a production B → X A Y and ε is in First(Y), then Follow(A) contains Follow(B).

Algorithm for Finding Follow(A)

Rules:
• If A is the start symbol, then $ is in Follow(A).
• If there is a rule A → Y X Z, then First(Z) - {ε} is in Follow(X).
• If there is a production B → X A Y and ε is in First(Y), then Follow(A) contains Follow(B).

Algorithm:
    Follow(S) = {$}
    FOR each A in V-{S}
      Follow(A) = {}
    WHILE change is made to some Follow sets
      FOR each production A → X1 X2 ... Xn
        FOR each nonterminal Xi
          Add First(Xi+1 Xi+2 ... Xn) - {ε} into Follow(Xi)
          (NOTE: If i=n, Xi+1 Xi+2 ... Xn = ε)
          IF ε is in First(Xi+1 Xi+2 ... Xn) THEN
            Add Follow(A) to Follow(Xi)

Finding Follow Set: An Example

  exp → term exp'
  exp' → addop term exp' | ε
  addop → + | -
  term → factor term'
  term' → mulop factor term' | ε
  mulop → *
  factor → ( exp ) | num

  Nonterminal   First     Follow
  exp           ( num     $ )
  exp'          + - ε     $ )
  addop         + -       ( num
  term          ( num     + - $ )
  term'         * ε       + - $ )
  mulop         *         ( num
  factor        ( num     * + - $ )
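The Follow algorithm can be sketched in Python on the same expression grammar. As before, this is an illustrative sketch under stated assumptions: the dictionary encoding of the grammar, the empty list for ε-productions, and the names first_of_seq, first_sets and follow_sets are not from the slides. The helper first_of_seq computes First(Xi+1 ... Xn) for a string of symbols and returns {ε} for the empty string, which covers the i=n case in the note.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["factor", "term'"]],
    "term'": [["mulop", "factor", "term'"], []],
    "mulop": [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}


def first_of_seq(seq, first, grammar):
    """First set of a string of symbols; {ε} if the string is empty."""
    out = set()
    for x in seq:
        fx = first[x] if x in grammar else {x}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)
    return out


def first_sets(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                new = first_of_seq(rhs, first, grammar)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {a: set() for a in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue  # Follow is only defined for nonterminals
                    rest = first_of_seq(rhs[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[a]  # what follows A can follow Xi
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow
```

Running follow_sets(GRAMMAR, "exp") reproduces the Follow column of the table above, for instance {"*", "+", "-", "$", ")"} for factor.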
