
Top-down Syntax Analysis
Sebastian Hack (based on slides by Reinhard Wilhelm and Mooly Sagiv)
http://compilers.cs.uni-saarland.de
Compiler Construction Core Course 2017, Saarland University

Top-Down Syntax Analysis
Input: a sequence of symbols (tokens)
Output: a syntax tree or an error message
• Read the input from left to right.
• Construct the syntax tree top-down, starting with a node labeled with the start symbol.
• Until the input is accepted (or an error is found), do one of:
  • Predict the expansion for the current leftmost nonterminal (possibly using some lookahead into the remaining input), or
  • Verify the predicted terminal symbol against the next symbol of the remaining input.
• The parser finds leftmost derivations.


Grammar for Arithmetic Expressions
Left-factored grammar G_2, with left recursion removed.
  S  → E
  E  → T E′         E generates T with a continuation E′
  E′ → + E | ε      E′ generates a possibly empty sequence of + T's
  T  → F T′         T generates F with a continuation T′
  T′ → ∗ T | ε      T′ generates a possibly empty sequence of ∗ F's
  F  → id | ( E )
G_2 defines the same language as G_0 and G_1.
But the parse tree is not so suitable as an abstract syntax tree!

Recursive Descent Parsing
• The parser is a program with a procedure X for each nonterminal X.
• Procedure X
  • parses words for nonterminal X,
  • starts with the first symbol already read (into variable nextsym),
  • ends with the following symbol read (into variable nextsym).
• The parser uses one symbol of lookahead into the remaining input.
• It uses the FiFo sets to make the expansion transitions deterministic:
  $\mathrm{FiFo}(N \to \alpha) = \mathrm{FIRST}_1(\alpha) \oplus_1 \mathrm{FOLLOW}_1(N)$

The FIRST_1 Sets
• A production N → α is applicable for symbols that “begin” α.
• Example: arithmetic expressions, grammar G_2
  • The production F → id is applied when the current symbol is id.
  • The production F → ( E ) is applied when the current symbol is (.
  • The production T → F T′ is applied when the current symbol is id or (.
• Formal definition:
  $\mathrm{FIRST}_1(\alpha) = \{\, 1{:}w \mid \alpha \Rightarrow^{*} w,\ w \in V_T^{*} \,\}$

The FOLLOW_1 Sets
• A production N → ε is applicable for symbols that “can follow” N in some derivation.
• Example: arithmetic expressions, grammar G_2
  • The production E′ → ε is applied for the symbols # and ).
  • The production T′ → ε is applied for the symbols #, ) and +.
• Formal definition:
  $\mathrm{FOLLOW}_1(N) = \{\, a \in V_T \mid \exists \alpha, \gamma : S \Rightarrow^{*} \alpha N a \gamma \,\}$

Definitions
Let k ≥ 1.
• k-prefix of a word w = a_1 … a_n:
  $k{:}w = \begin{cases} a_1 \dots a_n & \text{if } n \le k \\ a_1 \dots a_k & \text{otherwise} \end{cases}$
• k-concatenation $\oplus_k : V^{*} \times V^{*} \to V^{\le k}$, defined by $u \oplus_k v = k{:}uv$
• Both are extended to languages:
  $k{:}L = \{\, k{:}w \mid w \in L \,\}$
  $L_1 \oplus_k L_2 = \{\, x \oplus_k y \mid x \in L_1,\ y \in L_2 \,\}$
• $V^{\le k} = \bigcup_{i=1}^{k} V^{i}$ is the set of words of length at most k.
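The definitions above translate almost directly into code. The following Python sketch is not part of the slides; it assumes that words are represented as tuples of symbols, and the function names k_prefix, k_concat and k_concat_lang are made up for illustration.

```python
from itertools import product

def k_prefix(k, w):
    """k : w, the first k symbols of w (all of w if it has at most k symbols)."""
    return w[:k]

def k_concat(k, u, v):
    """u (+)_k v = k : uv."""
    return k_prefix(k, u + v)

def k_concat_lang(k, l1, l2):
    """L1 (+)_k L2 = { x (+)_k y | x in L1, y in L2 }."""
    return {k_concat(k, x, y) for x, y in product(l1, l2)}

# FiFo(E' → ε) = FIRST_1(ε) (+)_1 FOLLOW_1(E'), with the empty word written as ().
fifo_e_prime_eps = k_concat_lang(1, {()}, {("#",), (")",)})
```

With k = 1, the empty word in the first language lets the symbols of the second language show through, which is how the FOLLOW_1 set enters the FiFo set of an ε-production.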

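FIRST_k and FOLLOW_k
[Figure: syntax tree with a node X; labels “∈ FIRST_k(X)” and “∈ FOLLOW_k(X)”]
• FIRST_k(α) is the set of k-prefixes of terminal words for α:
  $\mathrm{FIRST}_k : (V_N \cup V_T)^{*} \to 2^{V_T^{\le k}}$
  $\mathrm{FIRST}_k(\alpha) = \{\, k{:}u \mid \alpha \Rightarrow^{*} u \,\}$
• FOLLOW_k(X) is the set of k-prefixes of terminal words that may immediately follow X:
  $\mathrm{FOLLOW}_k : V_N \to 2^{V_T^{\le k}\#}$
  $\mathrm{FOLLOW}_k(X) = \{\, w \mid S \Rightarrow^{*} \beta X \gamma \text{ and } w \in \mathrm{FIRST}_k(\gamma) \,\}$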
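For k = 1, these sets are commonly computed by a fixpoint iteration over the productions. The Python sketch below is one such computation for the grammar G_2; it is not taken from the slides. It assumes that the empty word is written as the empty string, that "#" marks the end of the input and is placed in the FOLLOW set of the start symbol, and that every dictionary key is a nonterminal.

```python
EPS = ""   # stands for the empty word ε (assumption of this sketch)
END = "#"  # end-of-input marker

# Grammar G_2: each nonterminal maps to its list of alternatives.
G2 = {
    "S":  [["E"]],
    "E":  [["T", "E'"]],
    "E'": [["+", "E"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "T"], []],
    "F":  [["id"], ["(", "E", ")"]],
}

def first_of_seq(seq, first):
    """FIRST_1 of a symbol sequence, given FIRST_1 of the nonterminals."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}  # a terminal begins itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # all symbols can vanish (or seq is empty)
    return out

def first_sets(grammar):
    first = {n: set() for n in grammar}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first)
                if not new.issubset(first[n]):
                    first[n] |= new
                    changed = True
    return first

def follow_sets(grammar, start, first):
    follow = {n: set() for n in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of_seq(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # what follows n may also follow sym
                        new |= follow[n]
                    if not new.issubset(follow[sym]):
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = first_sets(G2)
FOLLOW = follow_sets(G2, "S", FIRST)
```

For G_2 this yields the FOLLOW_1 sets used in the examples: # and ) for E′, and #, ) and + for T′.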

Parser for G_2

    program parser;
    var nextsym: string;
    proc scan;                     { reads next input symbol into nextsym }
    proc error(message: string);   { issues error message and stops parser }
    proc accept;                   { terminates successfully }

    proc S; begin E end;
    proc E; begin T; E' end;

    proc E';
    begin
      case nextsym in
        {"+"}: if nextsym = "+" then scan else error("+ expected") fi; E;
        otherwise ;                { E' → ε }
      endcase
    end;

    proc T; begin F; T' end;

    proc T';
    begin
      case nextsym in
        {"*"}: if nextsym = "*" then scan else error("* expected") fi; T;
        otherwise ;                { T' → ε }
      endcase
    end;

    proc F;
    begin
      case nextsym in
        {"("}: if nextsym = "(" then scan else error("( expected") fi;
               E;
               if nextsym = ")" then scan else error(") expected") fi;
        otherwise if nextsym = "id" then scan else error("id expected") fi;
      endcase
    end;

    begin
      scan; S;
      if nextsym = "#" then accept else error("# expected") fi
    end.
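The same parser can be written in a present-day language. The following Python sketch mirrors the procedures above one to one; it is an illustration, not the lecture's code. It assumes that the input is already a list of token strings such as "id", "+" or "(", and the names Parser, ParseError, expect and parse are introduced here for the example.

```python
class ParseError(Exception):
    """Raised where the pseudocode calls error(...)."""

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["#"]  # "#" marks the end of the input
        self.pos = 0

    @property
    def nextsym(self):
        return self.tokens[self.pos]

    def scan(self):
        self.pos += 1

    def expect(self, sym):
        # if nextsym = sym then scan else error("sym expected") fi
        if self.nextsym != sym:
            raise ParseError(f"{sym} expected, found {self.nextsym}")
        self.scan()

    def S(self):
        self.E()

    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.nextsym == "+":   # E' → + E
            self.expect("+")
            self.E()
        # otherwise: E' → ε, nothing is consumed

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.nextsym == "*":   # T' → * T
            self.expect("*")
            self.T()
        # otherwise: T' → ε

    def F(self):
        if self.nextsym == "(":   # F → ( E )
            self.expect("(")
            self.E()
            self.expect(")")
        else:                     # F → id
            self.expect("id")

def parse(tokens):
    """Return True if tokens form a word of G_2, raise ParseError otherwise."""
    p = Parser(tokens)
    p.S()
    if p.nextsym != "#":
        raise ParseError(f"# expected, found {p.nextsym}")
    return True
```

For example, parse(["id", "+", "id", "*", "(", "id", ")"]) succeeds, while parse(["id", "+"]) raises a ParseError because F finds the end marker instead of id.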

How to Construct such a Parser Program
• The code was automatically generated from the grammar and the FiFo sets.
• The program generating the parser has the functions:
    N_prog : V_N → code               (nonterminals)
    C_prog : (V_N ∪ V_T)* → code      (concatenations)
    S_prog : V_N ∪ V_T → code         (symbols)

Parser Schema

    program parser;
    var nextsym: symbol;
    proc scan;                     (* reads next input symbol into nextsym *)
    proc error(message: string);   (* issues error message and stops the parser *)
    proc accept;                   (* terminates parser successfully *)

    N_prog(X_0);                   (* X_0 start symbol *)
    N_prog(X_1);
    ...
    N_prog(X_n);

    begin
      scan;
      X_0;
      if nextsym = "#" then accept else error("...") fi
    end

The Non-terminal Procedures
N = Non-terminal, C = Concatenation, S = Symbol

    N_prog(X) =                    (* X → α_1 | α_2 | ··· | α_{k−1} | α_k *)
      proc X;
      begin
        case nextsym in
          FiFo(X → α_1):     C_prog(α_1);
          FiFo(X → α_2):     C_prog(α_2);
          ...
          FiFo(X → α_{k−1}): C_prog(α_{k−1});
          otherwise          C_prog(α_k);
        endcase
      end;

    C_prog(α_1 α_2 ··· α_k) = S_prog(α_1); S_prog(α_2); ...; S_prog(α_k);

    S_prog(a) = if nextsym = a then scan else error("a expected") fi     (* a terminal *)
    S_prog(Y) = Y                                                        (* Y nonterminal *)

The FiFo sets of the alternatives of each nonterminal have to be pairwise disjoint (LL(1) grammar).
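The three generator functions can be sketched as string-producing functions. The Python code below is a minimal illustration under stated assumptions, not the generator used in the lecture: the FiFo sets are passed in as a dictionary keyed by (nonterminal, alternative), and the lowercase names n_prog, c_prog and s_prog as well as the parameter names are chosen here for the example.

```python
def s_prog(sym, nonterminals):
    """Code for a single symbol: a procedure call or a terminal check."""
    if sym in nonterminals:
        return sym
    return f'if nextsym = "{sym}" then scan else error("{sym} expected") fi'

def c_prog(alpha, nonterminals):
    """Code for a concatenation: the symbol codes in sequence."""
    return "; ".join(s_prog(x, nonterminals) for x in alpha)

def n_prog(x, alternatives, fifo, nonterminals):
    """Code for the procedure of nonterminal x with the given alternatives."""
    lines = [f"proc {x};", "begin", "  case nextsym in"]
    *guarded, last = alternatives
    for alpha in guarded:
        guard = ", ".join(f'"{a}"' for a in sorted(fifo[(x, tuple(alpha))]))
        lines.append(f"    {{{guard}}}: {c_prog(alpha, nonterminals)};")
    # The last alternative is the default case, as in the schema above.
    lines.append(f"    otherwise {c_prog(last, nonterminals)};")
    lines += ["  endcase", "end;"]
    return "\n".join(lines)

NONTERMINALS = {"S", "E", "E'", "T", "T'", "F"}
code_for_e_prime = n_prog(
    "E'",
    [["+", "E"], []],                 # E' → + E | ε
    {("E'", ("+", "E")): {"+"}},      # FiFo(E' → + E) = {+}
    NONTERMINALS,
)
```

Printing code_for_e_prime gives a procedure with one guarded case for "+" and an empty otherwise branch, which has the same shape as the hand-written procedure for E′ in the parser for G_2.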

A Generative Solution
Generate the control of a deterministic PDA from the grammar and the FiFo sets.
• At compiler-generation time, construct a table M : V_N × V_T → P. M[N, a] is the production used to expand nonterminal N when the current symbol is a.
• For some grammars, report that the table cannot be constructed. The compiler writer can then decide to:
  • change the grammar (but not the language),
  • use a more general parser generator,
  • “patch” the table (manually or using some rules).

Creating the table
Input: cfg G, FIRST_1 and FOLLOW_1 for G.
Output: the parsing table M, or an indication that such a table cannot be constructed.
M is constructed as follows:
• For all X → α ∈ P and a ∈ FIRST_1(α), set M[X, a] = (X → α).
• If ε ∈ FIRST_1(α), then for all b ∈ FOLLOW_1(X), set M[X, b] = (X → α).
• Set all other entries of M to error.
The parser table cannot be constructed if at least one entry would be set twice, i.e. to two different productions. In that case G is not LL(1).
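The construction can be sketched in a few lines of Python. As in the description above, the sketch takes FIRST_1 and FOLLOW_1 as input; here they are written out by hand for G_2. The representation is an assumption of this example: productions are (left side, right side) pairs, the empty word is the empty string, missing dictionary entries play the role of error, and the name build_table is made up.

```python
EPS = ""  # stands for ε (assumption of this sketch)

G2 = [
    ("S", ("E",)),
    ("E", ("T", "E'")),
    ("E'", ("+", "E")), ("E'", ()),
    ("T", ("F", "T'")),
    ("T'", ("*", "T")), ("T'", ()),
    ("F", ("id",)), ("F", ("(", "E", ")")),
]
FIRST = {
    "S": {"id", "("}, "E": {"id", "("}, "E'": {"+", EPS},
    "T": {"id", "("}, "T'": {"*", EPS}, "F": {"id", "("},
}
FOLLOW = {
    "S": {"#"}, "E": {"#", ")"}, "E'": {"#", ")"},
    "T": {"+", "#", ")"}, "T'": {"+", "#", ")"}, "F": {"*", "+", "#", ")"},
}

def first_of_seq(seq, first):
    """FIRST_1 of a right-hand side, given FIRST_1 of the nonterminals."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def build_table(productions, first, follow):
    """Return M as a dict; raise ValueError if an entry would be set twice."""
    table = {}
    for lhs, rhs in productions:
        f = first_of_seq(rhs, first)
        lookahead = f - {EPS}
        if EPS in f:                      # rhs can derive ε: use FOLLOW_1(lhs)
            lookahead |= follow[lhs]
        for a in lookahead:
            if (lhs, a) in table:
                raise ValueError(f"not LL(1): conflict at M[{lhs}, {a}]")
            table[(lhs, a)] = (lhs, rhs)
    return table                          # missing entries mean error

M = build_table(G2, FIRST, FOLLOW)
```

A grammar with two alternatives that start with the same terminal, such as A → a | a b, makes build_table raise the ValueError, which corresponds to reporting that the table cannot be constructed.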

Example – arithmetic expressions

    nonterminal   symbol           production
    S             ( , id           S → E
    S             + , ∗ , ) , #    error
    E             ( , id           E → T E′
    E             + , ∗ , ) , #    error
    E′            +                E′ → + E
    E′            ) , #            E′ → ε
    E′            ( , ∗ , id       error
    T             ( , id           T → F T′
    T             + , ∗ , ) , #    error
    T′            ∗                T′ → ∗ T
    T′            + , ) , #        T′ → ε
    T′            ( , id           error
    F             id               F → id
    F             (                F → ( E )
    F             + , ∗ , )        error

LL-Parser Driver (interprets the table M)

    program parser;
    var nextsym: symbol;
    var st: stack of item;
    proc scan;                     (* reads next input symbol into nextsym *)
    proc error(message: string);   (* issues error message and stops the parser *)
    proc accept;                   (* terminates parser successfully *)
    proc reduce;                   (* replaces [X → β.Yγ][Y → α.] by [X → βY.γ] *)
    proc pop;                      (* removes topmost item from st *)
    proc push(i: item);            (* pushes i onto st *)
    proc replaceby(i: item);       (* replaces topmost item of st by i *)

    begin
      scan;
      push([S′ → .S]);
      while true do                (* accept and error terminate the parser; stopping at "#" would skip the final reductions *)
        case top in
          [X → β.aγ]: if nextsym = a then scan; replaceby([X → βa.γ]) else error fi;
          [X → β.Yγ]: if M[Y, nextsym] = (Y → α) then push([Y → .α]) else error fi;
          [X → α.]:   reduce;
          [S′ → S.]:  if nextsym = "#" then accept else error fi
        endcase
      od
    end.
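A table-driven parser can be sketched compactly in Python. Note the simplification: instead of the item stack used above, this sketch keeps a stack of grammar symbols that are still expected, which is a common variant of the LL(1) driver and not the one from the slides. The table is the one from the arithmetic-expression example, written as a dictionary from (nonterminal, symbol) to the right-hand side; missing entries mean error. The names ParseError and ll1_parse are made up for the example.

```python
class ParseError(Exception):
    pass

# Parsing table M for G_2; an empty tuple is the right-hand side ε.
M = {
    ("S", "("): ("E",),        ("S", "id"): ("E",),
    ("E", "("): ("T", "E'"),   ("E", "id"): ("T", "E'"),
    ("E'", "+"): ("+", "E"),   ("E'", ")"): (), ("E'", "#"): (),
    ("T", "("): ("F", "T'"),   ("T", "id"): ("F", "T'"),
    ("T'", "*"): ("*", "T"),
    ("T'", "+"): (), ("T'", ")"): (), ("T'", "#"): (),
    ("F", "id"): ("id",),      ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {n for n, _ in M}

def ll1_parse(tokens, start="S"):
    """Return the leftmost derivation as a list of (nonterminal, rhs) steps."""
    tokens = list(tokens) + ["#"]     # "#" marks the end of the input
    pos = 0
    stack = ["#", start]              # top of the stack is the end of the list
    derivation = []
    while stack:
        top = stack.pop()
        nextsym = tokens[pos]
        if top in NONTERMINALS:       # expand: predict a production via M
            rhs = M.get((top, nextsym))
            if rhs is None:
                raise ParseError(f"no entry M[{top}, {nextsym}]")
            derivation.append((top, rhs))
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == nextsym:          # verify: match a predicted terminal
            pos += 1
        else:
            raise ParseError(f"{top} expected, found {nextsym}")
    return derivation
```

For the input ["id"], the returned steps are S → E, E → T E′, T → F T′, F → id, T′ → ε, E′ → ε, which is the leftmost derivation of id.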

Explicit Stack: Deterministic Pushdown Automaton
[Figure: input tape w a v; control driven by the parser table M; stack with topmost item [X → α.Yβ] above ρ and bottom marker #; output: tree]

LL(k) Grammar
Goal: formalize our intuition of when the expand transitions of the item pushdown automaton can be made deterministic.
Means: k-symbol lookahead into the remaining input.

LL(k) Grammar
• Let G = (V_N, V_T, P, S) be a cfg and k be a natural number. G is an LL(k) grammar iff the following holds: if there exist two leftmost derivations
  $S \Rightarrow_{lm}^{*} uY\alpha \Rightarrow_{lm} u\beta\alpha \Rightarrow_{lm}^{*} ux$
  and
  $S \Rightarrow_{lm}^{*} uY\alpha \Rightarrow_{lm} u\gamma\alpha \Rightarrow_{lm}^{*} uy$
  and if $k{:}x = k{:}y$, then $\beta = \gamma$.
• The expansion of the leftmost nonterminal is always uniquely determined by
  • the consumed part of the input and
  • the next k symbols of the remaining input.

Example 1
Let G_1 be the cfg with the productions
  STAT → if id then STAT else STAT fi
       | while id do STAT od
       | begin STAT end
       | id := id
