top down syntax analysis

  1. Top-down Syntax Analysis
     Sebastian Hack (based on slides by Reinhard Wilhelm and Mooly Sagiv)
     http://compilers.cs.uni-saarland.de
     Compiler Construction Core Course 2017, Saarland University

  2. Top-Down Syntax Analysis
     input: a sequence of symbols (tokens)
     output: a syntax tree or an error message
     • Read the input from left to right.
     • Construct the syntax tree top-down, starting with a node labeled with the start symbol.
     • Until the input is accepted (or an error is found), do one of:
       • Predict the expansion for the current leftmost nonterminal (possibly using some lookahead into the remaining input), or
       • Verify the predicted terminal symbol against the next symbol of the remaining input.
     • Finds leftmost derivations.

  4. Grammar for Arithmetic Expressions
     Left-factored grammar $G_2$, i.e. left recursion removed.
       S  → E
       E  → T E′        E generates T with a continuation E′
       E′ → + E | ε     E′ generates a possibly empty sequence of + T s
       T  → F T′        T generates F with a continuation T′
       T′ → ∗ T | ε     T′ generates a possibly empty sequence of ∗ F s
       F  → id | ( E )
     $G_2$ defines the same language as $G_0$ and $G_1$.
     But the parse tree is not so suitable as an abstract syntax tree!

  5. Recursive Descent Parsing
     • The parser is a program with
     • a procedure X for each nonterminal X, which
       • parses words for nonterminal X,
       • starts with the first symbol already read (into variable nextsym),
       • ends with the following symbol read (into variable nextsym).
     • It uses one symbol of lookahead into the remaining input.
     • It uses the FiFo sets to make the expansion transitions deterministic:
       $\mathrm{FiFo}(N \to \alpha) = \mathrm{FIRST}_1(\alpha) \oplus_1 \mathrm{FOLLOW}_1(N)$

  6. The $\mathrm{FIRST}_1$ Sets
     • A production N → α is applicable for symbols that “begin” α.
     • Example: arithmetic expressions, grammar $G_2$
       • The production F → id is applied when the current symbol is id.
       • The production F → ( E ) is applied when the current symbol is (.
       • The production T → F T′ is applied when the current symbol is id or (.
     • Formal definition:
       $\mathrm{FIRST}_1(\alpha) = \{\, 1 : w \mid \alpha \Rightarrow^{*} w,\ w \in V_T^{*} \,\}$

  7. The $\mathrm{FOLLOW}_1$ Sets
     • A production N → ε is applicable for symbols that “can follow” N in some derivation.
     • Example: arithmetic expressions, grammar $G_2$
       • The production E′ → ε is applied for symbols # and ).
       • The production T′ → ε is applied for symbols #, ) and +.
     • Formal definition:
       $\mathrm{FOLLOW}_1(N) = \{\, a \in V_T \mid \exists \alpha, \gamma : S \Rightarrow^{*} \alpha N a \gamma \,\}$
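
The two definitions above can be turned into a small fixed-point computation. The following Python sketch is not part of the slides; the grammar encoding (tuples of symbols, the empty string for ε, `#` as end marker) and all function names are assumptions made for illustration.

```python
EPS = ""   # stands for the empty word ε (encoding chosen for this sketch)
END = "#"  # end-of-input marker, as used on the slides

# Grammar G2: nonterminal -> list of alternatives; () is the ε-alternative.
G2 = {
    "S": [("E",)],
    "E": [("T", "E'")],
    "E'": [("+", "E"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "T"), ()],
    "F": [("id",), ("(", "E", ")")],
}

def first_of(seq, first):
    """FIRST_1 of a symbol sequence, given FIRST_1 of every nonterminal."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}  # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in seq can derive ε
    return result

def compute_first(grammar):
    """Iterate until no FIRST_1 set grows any more."""
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start="S"):
    """Iterate until no FOLLOW_1 set grows any more."""
    follow = {n: set() for n in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[n]  # what follows n also follows sym
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(G2)
FOLLOW = compute_follow(G2, FIRST)
```

For $G_2$ this reproduces the examples on slides 6 and 7: `FOLLOW["E'"]` contains `#` and `)`, and `FOLLOW["T'"]` contains `#`, `)` and `+`.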

  8. Definitions
     Let $k \ge 1$.
     • k-prefix of a word $w = a_1 \ldots a_n$:
       $k : w = \begin{cases} a_1 \ldots a_n & \text{if } n \le k \\ a_1 \ldots a_k & \text{otherwise} \end{cases}$
     • k-concatenation $\oplus_k : V^{*} \times V^{*} \to V^{\le k}$, defined by $u \oplus_k v = k : uv$
     • extended to languages:
       $k : L = \{\, k : w \mid w \in L \,\}$
       $L_1 \oplus_k L_2 = \{\, x \oplus_k y \mid x \in L_1,\ y \in L_2 \,\}$
     $V^{\le k} = \bigcup_{i=1}^{k} V^{i}$ is the set of words of length at most k.
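
As a rough illustration of these definitions (not taken from the slides), the operators can be written in a few lines of Python. Words are modeled as tuples of symbols; that encoding and the function names are assumptions of this sketch.

```python
def k_prefix(k, w):
    """k : w, the first k symbols of w, or all of w if it is shorter."""
    return w[:k]

def k_concat(k, u, v):
    """u ⊕_k v = k : uv."""
    return k_prefix(k, u + v)

def k_prefix_lang(k, lang):
    """k : L, applied to every word of the language."""
    return {k_prefix(k, w) for w in lang}

def k_concat_lang(k, l1, l2):
    """L1 ⊕_k L2, the pairwise k-concatenation of two languages."""
    return {k_concat(k, x, y) for x in l1 for y in l2}

# Example with k = 1: the empty word () lets symbols of the second set through.
first_e1 = {("+",), ()}          # {+, ε}
follow_e1 = {("#",), (")",)}     # {#, )}
fifo = k_concat_lang(1, first_e1, follow_e1)
```

Here `fifo` evaluates to `{("+",), ("#",), (")",)}`: the word `+` blocks everything after it, while the empty word is replaced by the first symbol of the second operand.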

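  9. $\mathrm{FIRST}_k$ and $\mathrm{FOLLOW}_k$
     [Figure: a nonterminal X in a derivation, with regions labeled "∈ FIRST_k(X)" and "∈ FOLLOW_k(X)"]
     • Set of k-prefixes of terminal words for α:
       $\mathrm{FIRST}_k : (V_N \cup V_T)^{*} \to 2^{V_T^{\le k}}$
       $\mathrm{FIRST}_k(\alpha) = \{\, k : u \mid \alpha \Rightarrow^{*} u \,\}$
     • Set of k-prefixes of terminal words that may immediately follow X:
       $\mathrm{FOLLOW}_k : V_N \to 2^{V_{T\#}^{\le k}}$
       $\mathrm{FOLLOW}_k(X) = \{\, w \mid S \Rightarrow^{*} \beta X \gamma \text{ and } w \in \mathrm{FIRST}_k(\gamma) \,\}$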

  10. Parser for $G_2$
        program parser;
        var nextsym: string;
        proc scan;
          { reads next input symbol into nextsym }
        proc error (message: string);
          { issues error message and stops parser }
        proc accept;
          { terminates successfully }
        proc S;
          begin E end;
        proc E;
          begin T; E' end;

  11.
        proc E';
          begin
            case nextsym in
              {"+"}: if nextsym = "+" then scan else error("+ expected") fi;
                     E;
              otherwise ;
            endcase
          end;
        proc T;
          begin F; T' end;
        proc T';
          begin
            case nextsym in
              {"*"}: if nextsym = "*" then scan else error("* expected") fi;
                     T;
              otherwise ;
            endcase
          end;

  12.
        proc F;
          begin
            case nextsym in
              {"("}: if nextsym = "(" then scan else error("( expected") fi;
                     E;
                     if nextsym = ")" then scan else error(") expected") fi;
              otherwise if nextsym = "id" then scan else error("id expected") fi;
            endcase
          end;
        begin
          scan;
          S;
          if nextsym = "#" then accept else error("# expected") fi
        end.
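
For readers who want to run the parser, here is a Python transcription of the recursive descent scheme for $G_2$. It is a sketch under assumptions, not the authors' code: the class and method names (`Parser`, `expect`, `E_`, `T_`), the `ParseError` exception and the token-list input are all invented for this example.

```python
class ParseError(Exception):
    """Raised where the slide code calls error(...)."""

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["#"]  # append the end marker
        self.pos = -1
        self.nextsym = None
        self.scan()

    def scan(self):
        """Read the next input symbol into nextsym."""
        self.pos += 1
        self.nextsym = self.tokens[self.pos]

    def expect(self, sym):
        """S_prog for a terminal: verify it against nextsym, then scan."""
        if self.nextsym != sym:
            raise ParseError(f"{sym} expected, found {self.nextsym}")
        self.scan()

    def S(self):
        self.E()

    def E(self):
        self.T()
        self.E_()

    def E_(self):  # E' -> + E | ε
        if self.nextsym == "+":
            self.expect("+")
            self.E()
        # otherwise: the ε-alternative, nothing to do

    def T(self):
        self.F()
        self.T_()

    def T_(self):  # T' -> * T | ε
        if self.nextsym == "*":
            self.expect("*")
            self.T()

    def F(self):  # F -> ( E ) | id
        if self.nextsym == "(":
            self.expect("(")
            self.E()
            self.expect(")")
        else:
            self.expect("id")

def parse(tokens):
    """Return True if tokens form a word of G2, else raise ParseError."""
    parser = Parser(tokens)
    parser.S()
    if parser.nextsym != "#":
        raise ParseError(f"# expected, found {parser.nextsym}")
    return True
```

A call such as `parse(["id", "+", "id", "*", "(", "id", ")"])` succeeds, while `parse(["id", "+"])` raises `ParseError` because `F` finds the end marker instead of `id` or `(`.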

  13. How to Construct such a Parser Program
      • Code was automatically generated from the grammar and the FiFo sets.
      • The program generating the parser has the functions:
          N_prog : $V_N \to \text{code}$                  (nonterminals)
          C_prog : $(V_N \cup V_T)^{*} \to \text{code}$   (concatenations)
          S_prog : $V_N \cup V_T \to \text{code}$         (symbols)

  14. Parser Schema
        program parser;
        var nextsym: symbol;
        proc scan;
          (* reads next input symbol into nextsym *)
        proc error (message: string);
          (* issues error message and stops the parser *)
        proc accept;
          (* terminates parser successfully *)
        N_prog(X_0);   (* X_0 start symbol *)
        N_prog(X_1);
        ...
        N_prog(X_n);

  15.
        begin
          scan;
          X_0;
          if nextsym = "#" then accept else error("...") fi
        end

  16. The Nonterminal Procedures
      N = Nonterminal, C = Concatenation, S = Symbol
        N_prog(X) =   (* X → α_1 | α_2 | ··· | α_{k−1} | α_k *)
          proc X;
            begin
              case nextsym in
                FiFo(X → α_1) : C_prog(α_1);
                FiFo(X → α_2) : C_prog(α_2);
                ...
                FiFo(X → α_{k−1}) : C_prog(α_{k−1});
                otherwise C_prog(α_k);
              endcase
            end;

  17.
        C_prog(α_1 α_2 ··· α_k) =
          S_prog(α_1); S_prog(α_2); ... S_prog(α_k);
        S_prog(a) = if nextsym = a then scan else error("a expected") fi    (a a terminal)
        S_prog(Y) = Y                                                       (Y a nonterminal)
      The FiFo sets of the alternatives of a nonterminal have to be disjoint (LL(1) grammar).
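
The three generator functions can be sketched in Python as plain string builders that emit the pseudocode used on the slides. This is only an illustration: the function signatures, the way FiFo sets are passed in (a dictionary keyed by nonterminal and alternative) and the output layout are assumptions, not the generator used in the course.

```python
NONTERMINALS = {"S", "E", "E'", "T", "T'", "F"}

def s_prog(sym, nonterminals):
    """Code for one symbol: a call for a nonterminal, a check for a terminal."""
    if sym in nonterminals:
        return sym
    return f'if nextsym = "{sym}" then scan else error("{sym} expected") fi'

def c_prog(alpha, nonterminals):
    """Code for a concatenation: the symbol codes in sequence."""
    return "; ".join(s_prog(sym, nonterminals) for sym in alpha)

def n_prog(x, alternatives, fifo, nonterminals):
    """Code for nonterminal x; the last alternative becomes the otherwise branch.

    fifo maps (x, alternative) to its FiFo set (hypothetical encoding).
    """
    lines = [f"proc {x};", "begin", "  case nextsym in"]
    for alt in alternatives[:-1]:
        label = ", ".join(f'"{a}"' for a in sorted(fifo[(x, alt)]))
        lines.append(f"    {{{label}}} : {c_prog(alt, nonterminals)};")
    lines.append(f"    otherwise {c_prog(alternatives[-1], nonterminals)};")
    lines += ["  endcase", "end;"]
    return "\n".join(lines)

# Generate the procedure for E' -> + E | ε with FiFo(E' -> + E) = {+}.
code = n_prog("E'", [("+", "E"), ()], {("E'", ("+", "E")): {"+"}}, NONTERMINALS)
print(code)
```

The printed procedure has the same shape as `proc E'` on slide 11, including the redundant test of `nextsym` inside the `"+"` branch, which comes from applying `s_prog` mechanically.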

  18. A Generative Solution
      Generate the control of a deterministic PDA from the grammar and the FiFo sets.
      • At compiler-generation time, construct a table $M : V_N \times V_T \to P$.
        M[N, a] is the production used to expand nonterminal N when the current symbol is a.
      • For some grammars, report that the table cannot be constructed. The compiler writer can then decide to:
        • change the grammar (but not the language),
        • use a more general parser generator,
        • “patch” the table (manually or using some rules).

  19. Creating the table
      Input: a CFG G, $\mathrm{FIRST}_1$ and $\mathrm{FOLLOW}_1$ for G.
      Output: the parsing table M, or an indication that such a table cannot be constructed.
      M is constructed as follows:
      • For all X → α ∈ P and a ∈ $\mathrm{FIRST}_1(\alpha)$, set M[X, a] = (X → α).
      • If ε ∈ $\mathrm{FIRST}_1(\alpha)$, then for all b ∈ $\mathrm{FOLLOW}_1(X)$, set M[X, b] = (X → α).
      • Set all other entries of M to error.
      The parser table cannot be constructed if at least one entry is set twice. Then G is not LL(1).
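
The construction above can be sketched in Python as follows. The sketch is not from the slides: the $\mathrm{FIRST}_1$ and $\mathrm{FOLLOW}_1$ sets for $G_2$ are written out by hand here as an assumption (they agree with the examples on slides 6 and 7), missing dictionary entries play the role of error entries, and `NotLL1` is a name invented for this example.

```python
EPS = ""  # stands for ε in this sketch

# Productions of G2 as (left-hand side, right-hand side) pairs.
G2 = [
    ("S", ("E",)), ("E", ("T", "E'")),
    ("E'", ("+", "E")), ("E'", ()),
    ("T", ("F", "T'")),
    ("T'", ("*", "T")), ("T'", ()),
    ("F", ("id",)), ("F", ("(", "E", ")")),
]

# Hand-written FIRST_1 / FOLLOW_1 sets of the nonterminals (assumed input).
FIRST = {
    "S": {"id", "("}, "E": {"id", "("}, "E'": {"+", EPS},
    "T": {"id", "("}, "T'": {"*", EPS}, "F": {"id", "("},
}
FOLLOW = {
    "S": {"#"}, "E": {")", "#"}, "E'": {")", "#"},
    "T": {"+", ")", "#"}, "T'": {"+", ")", "#"},
    "F": {"*", "+", ")", "#"},
}

class NotLL1(Exception):
    """Raised when a table entry would be set twice."""

def first_of(alpha):
    """FIRST_1 of a right-hand side."""
    out = set()
    for sym in alpha:
        sym_first = FIRST.get(sym, {sym})  # a terminal begins with itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def build_table(productions):
    """Return M as a dict {(X, a): (X, alpha)}; absent keys mean error."""
    table = {}
    for x, alpha in productions:
        first_alpha = first_of(alpha)
        lookahead = first_alpha - {EPS}
        if EPS in first_alpha:
            lookahead |= FOLLOW[x]
        for a in lookahead:
            if (x, a) in table:
                raise NotLL1(f"conflict at M[{x}, {a}]")
            table[(x, a)] = (x, alpha)
    return table

M = build_table(G2)
```

For $G_2$ no entry is set twice, so the table exists; feeding two productions with overlapping lookahead, for example `F → id` twice, raises `NotLL1`.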

  20. Example – arithmetic expressions
      nonterminal   symbol        production
      S             (, id         S → E
      S             +, ∗, ), #    error
      E             (, id         E → T E′
      E             +, ∗, ), #    error
      E′            +             E′ → + E
      E′            ), #          E′ → ε
      E′            (, ∗, id      error
      T             (, id         T → F T′
      T             +, ∗, ), #    error
      T′            ∗             T′ → ∗ T
      T′            +, ), #       T′ → ε
      T′            (, id         error
      F             id            F → id
      F             (             F → ( E )
      F             +, ∗, )       error

  21. LL-Parser Driver (interprets the table M)
        program parser;
        var nextsym: symbol;
        var st: stack of item;
        proc scan;
          (* reads next input symbol into nextsym *)
        proc error (message: string);
          (* issues error message and stops the parser *)
        proc accept;
          (* terminates parser successfully *)
        proc reduce;
          (* replaces [X → β.Yγ][Y → α.] by [X → βY.γ] *)
        proc pop;
          (* removes topmost item from st *)
        proc push (i: item);
          (* pushes i onto st *)
        proc replaceby (i: item);
          (* replaces topmost item of st by i *)

  22.
        begin
          scan;
          push([S′ → .S]);
          while true do   (* the loop is left only via accept or error *)
            case top in
              [X → β.aγ]: if nextsym = a
                          then scan; replaceby([X → βa.γ])
                          else error fi;
              [X → β.Yγ]: if M[Y, nextsym] = (Y → α)
                          then push([Y → .α])
                          else error fi;
              [X → α.]:   reduce;
              [S′ → S.]:  if nextsym = "#" then accept else error fi
            endcase
          od
        end.
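
A runnable counterpart of the driver might look like the Python sketch below. It keeps items on the stack as on the slides, encoded as triples (left-hand side, right-hand side, dot position); that encoding, the hand-written table entries for $G_2$ and the choice to return `True` or `False` instead of calling `accept` and `error` are assumptions of this example.

```python
# Table M for G2, written as (nonterminal, lookahead symbols, right-hand side).
ENTRIES = [
    ("S", ["(", "id"], ("E",)),
    ("E", ["(", "id"], ("T", "E'")),
    ("E'", ["+"], ("+", "E")),
    ("E'", [")", "#"], ()),
    ("T", ["(", "id"], ("F", "T'")),
    ("T'", ["*"], ("*", "T")),
    ("T'", ["+", ")", "#"], ()),
    ("F", ["id"], ("id",)),
    ("F", ["("], ("(", "E", ")")),
]
M = {(n, a): (n, rhs) for n, lookaheads, rhs in ENTRIES for a in lookaheads}
NONTERMINALS = {n for n, _, _ in ENTRIES}

def parse(tokens, table=M, nonterminals=NONTERMINALS, start="S"):
    """Return True if tokens are accepted, False on a syntax error."""
    inp = list(tokens) + ["#"]
    pos = 0
    stack = [("S'", (start,), 0)]          # item [S' -> .S]
    while True:
        lhs, rhs, dot = stack[-1]
        nextsym = inp[pos]
        if dot == len(rhs):                # complete item [X -> α.]
            if lhs == "S'":
                return nextsym == "#"      # accept or error
            stack.pop()                    # reduce: drop [Y -> α.] ...
            l, r, d = stack.pop()
            stack.append((l, r, d + 1))    # ... and move the dot over Y
        elif rhs[dot] in nonterminals:     # expand via the table
            production = table.get((rhs[dot], nextsym))
            if production is None:
                return False
            stack.append((production[0], production[1], 0))
        elif rhs[dot] == nextsym:          # verify the predicted terminal
            pos += 1
            stack[-1] = (lhs, rhs, dot + 1)
        else:
            return False
```

Since `#` never occurs in a right-hand side, the input position cannot run past the end marker. `parse(["id", "+", "id", "*", "id"])` returns `True`, and `parse(["id", "id"])` returns `False` because M[T′, id] is an error entry.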

  23. Explicit Stack: Deterministic Pushdown Automaton
      [Figure: pushdown automaton with input w a v, a stack holding the top item [X → α.Yβ] above ρ and the bottom marker #, the control, the parser table M, and the output tree]

  24. LL(k) Grammar
      Goal: formalize our intuition of when the expand transitions of the item pushdown automaton can be made deterministic.
      Means: k-symbol lookahead into the remaining input.

  25. LL(k) Grammar
      • Let $G = (V_N, V_T, P, S)$ be a CFG and k be a natural number. G is an LL(k) grammar iff the following holds: if there exist two leftmost derivations
          $S \Rightarrow^{*}_{lm} uY\alpha \Rightarrow_{lm} u\beta\alpha \Rightarrow^{*}_{lm} ux$ and
          $S \Rightarrow^{*}_{lm} uY\alpha \Rightarrow_{lm} u\gamma\alpha \Rightarrow^{*}_{lm} uy$,
        and if $k : x = k : y$, then $\beta = \gamma$.
      • The expansion of the leftmost nonterminal is always uniquely determined by
        • the consumed part of the input and
        • the next k symbols of the remaining input.

  26. Example 1
      Let $G_1$ be the CFG with the productions
        STAT → if id then STAT else STAT fi
             | while id do STAT od
             | begin STAT end
             | id := id
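
One way to see how the LL(k) condition applies to $G_1$ with k = 1: every alternative of STAT is non-empty and starts with a different terminal, so a single lookahead symbol selects the production. The Python sketch below checks exactly that; it is an illustration added here, and the shortcut `first1` is only valid under the stated assumption that each alternative begins with a terminal.

```python
from itertools import combinations

# Alternatives of STAT in G1, as tuples of symbols.
STAT = [
    ("if", "id", "then", "STAT", "else", "STAT", "fi"),
    ("while", "id", "do", "STAT", "od"),
    ("begin", "STAT", "end"),
    ("id", ":=", "id"),
]

def first1(alternative):
    """FIRST_1 of an alternative that starts with a terminal (assumption)."""
    return {alternative[0]}

def lookahead_sets_disjoint(alternatives):
    """True if no two alternatives share a first symbol."""
    return all(
        first1(a).isdisjoint(first1(b))
        for a, b in combinations(alternatives, 2)
    )

g1_is_deterministic = lookahead_sets_disjoint(STAT)
```

`g1_is_deterministic` is `True` for $G_1$. Adding a hypothetical fifth alternative that also starts with `id`, such as a call statement, would make the check fail and would require more lookahead or a changed grammar.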
