bottom up parsing
play

Bottom Up Parsing Also known as Shift-Reduce parsing More powerful - PDF document

9/26/2012 Bottom Up Parsing Also known as Shift-Reduce parsing More powerful than top down Dont need left factored grammars Can handle left recursion Bottom Up Parsing Attempt to construct parse tree from an input string


  1. 9/26/2012 Bottom Up Parsing Also known as Shift-Reduce parsing More powerful than top down • Don’t need left factored grammars • Can handle left recursion Bottom Up Parsing Attempt to construct parse tree from an input string • beginning at leaves and working to top • Process of reducing strings to a non terminal – shift-reduce • Uses parse stack • Contains symbols already parsed • Shift until match RHS of production • Reduce to non-terminal on LHS • Eventually reduce to start symbol Shift and Reduce Sentential Form A sentential form is a member of (T  N)* that can be derived in a finite Shift: number of steps from the start symbol S. • Move the first input token to the top of the stack. A sentential form that contains no nonterminal symbols (i.e., is a member of T*) is Reduce: called a sentence . • Choose a grammar rule X → ɑ β γ • pop γ β ɑ from the top of the stack • push X onto the stack. Stack is initially empty and the parser is at the beginning of the input. Shifting $ is accepts . Handle Parse Tree S Intuition : reduce only if it leads to the start symbol S  b M b M  ( L M  a Handle has to b M b L  M a ) • match RHS of production and • lead to rightmost derivation, if reduced to LHS of some rule L  ) ( L Definition: Considering string: M a ) Let  w be a sentential form where: • b ( a a ) b  is an arbitrary string of symbols a X  is a production w is a string of terminals S  b M b  b ( L b  b ( M a ) b  b ( a a ) b Then  at  is a handle of  w if S   Xw   w by a rightmost derivation Try to find handles and then reduce from sentential form via rightmost derivation b ( a a ) b  b ( M a ) b  b ( L b  b M b  S Handles formalize the intuition (reduce  to X ), but doesn’t say how to • find the handle 1

  2. 9/26/2012 Bottom Up Parsing Issues We need to locate the handle in the right sentential Grammar Sentential form Handle Products E  E + E form and then decide what production to reduce it to – E  id id 1 + id 2 * id 3 id 1 which of the RHS of our grammar. E  E * E E  id E + id 2 * id 3 id 2 E  ( E ) E + E * id 3 id 3 E  id Notice in right-most derivation, where right sentential E  id S 1 form is: E + E * E E*E E  E*E E + E E+E E  E+E b M b 2 E ( L 3 Use • to indicate where we are in string: Parsing never has to guess about the middle of the string. The right side always contains terminals. M a ) 4 id 1 • + id 2 * id 3  E • + id 2 * id 3  E + • id 2 * id 3  E + E • * id 3  E + E * id 3 •  E + E * E •  Thus, we can discover the rightmost derivation in a E + E •  E reverse: 4 3 2 1 Bottom Up Parsing Viable Prefix Definition:  is a viable prefix if Consider our usual grammar and the problem of when to reduce: E  T + E | T • There is a w where  w is a right sentential form T  int * T | int | ( E ) •  • w is a configuration of a shift-reduced parser b ( a • a ) b  b ( M • a ) b  b ( L • b  b M • b  S • For the string: int * int + int E Alternatively, a prefix of a rightmost derived sentential form is viable if it does not Sentential form Production extend the right end of the handle. T  int T + E int * int + int T  int * T int * T + int A prefix is viable because it can be extended by adding terminals to form a valid T  int T + int int * T T (rightmost derived) sentential form E  T T + T E  T + E T + E int int As long as the parser has viable prefixes on the stack, no parsing error has been detected. E Parser Structure Parse S  b M b Stack Input Action … Input Tokens: $ M  ( L $ b ( a a ) b $ shift M  a Read head $ b ( a a ) b $ shift L  M a ) $ b ( a a ) b $ shift Top Output L  ) Parser Driver Syntax $ b ( a a ) b $ reduce Stack $ b ( M a ) b $ shift String: Parse table b ( a a ) b $ $ b ( M a ) b $ shift $ b ( M a ) b $ reduce $ $ b ( L b $ reduce $ b M b $ shift Operations $ b M b $ reduce 1. Shift – shift input symbol onto the stack $ Z $ accept 2. Reduce – RHS of a non-terminal handle is at the top of the stack. Decide which non-terminal to reduce it to 3. Accept – success 4. Error 2

  3. 9/26/2012 Ambiguous Grammars Ambiguity Conflicts arise with ambiguous grammars In the first step shown, we can either shift or reduce by E  E * E • Ambiguous grammars generate conflicts but so do other types of grammars • Choice because of precedence of + and * Example: • Same problem with association of * and + • Consider the ambiguous grammar E  E * E | E + E | ( E ) | int We can always rewrite ambiguous grammars of this sort to encode precedence and association in the grammar • Sometimes this results in convoluted grammars. Sentential form Actions Sentential form Actions • The tools we will use have other means to encode precedence and int * int + int shift int * int + int shift association … … … … E * E • + int reduce E  E * E E * E • + int shift We must get rid of conflicts ! E • + int shift E * E + • int shift • Know what a handle is but not clear how to detect it reduce E  int E + • int shift E * E + int • reduce E  int reduce E  E + E E + int • E * E + E • reduce E  E + E reduce E  E * E E + E • E * E • E • E • Properties about Bottom Up Parsing LR Parsers Handles always appear at the top of the stack LR family of parsers • Never in middle of stack • LR(k) • Justifies use of stack in shift–reduce parsing • L – left to right • R – rightmost derivation in reverse General shift–reduce strategy • k elements of look ahead • If there is no handle on the stack, shift • If there is a handle, reduce to the non-terminal Attractive • LR(k) is powerful – virtually all language constructs Conflicts • Efficient • If it is legal to either shift or reduce then there is a shift-reduce conflict . • LL(k)  LR(k) • If it is legal to reduce by two or more productions, then there is a • LR parsers can detect an error as soon as it is possible to do so reduce-reduce conflict. • Automatic technique to generate – YACC, Bison, Java CUP LR and LL Parsers Types of LR Parsers LR parser, each reduction needed for parse is detected on the basic of SLR – simple LR • Left context • Easiest to implement • Reducible phrase • Not as powerful • k terminals of look ahead Canonical LR LL parser • Most powerful • Left context • Expensive to implement • First k symbols of what right hand side derive (combined phrase and what is to right of phrase) LALR • Look ahead LR • In between the 2 previous ones in power and overhead Overall parsing algorithm is the same – table is different 3

  4. 9/26/2012 LR Parser Actions LR Parser Actions How does the LR parser know when to shift and when to reduce? Shift ( n ): • Advance input one token; push n on stack. By using a DFA! Reduce ( k ): The edges of the DFA are labeled by the symbols (terminals and non-terminals) • Pop stack as many times as the number of symbols on the right-hand that can appear on the stack. side of rule k • Let X be the left-hand-side symbol of rule k Five kinds of actions: • In the state now on top of stack, look up X to get “goto n ” Shift into state n ; • Push n on top of stack. 1. s n 2. g n Goto state n ; 3. r k Reduce by rule k ; Accept : Accept; • Stop parsing, report success . 4. a Error : 5. Error • Stop parsing, report failure . LR Parsers Notion of an LR(0) item Can tell handle by looking at stack top: An item is a production with a distinguished position on the right hand side. This position indicates how much of the production already seen. • (grammar symbol, state) and k input symbols index our FSA table • In practice, k<=1 Example: S  a B S is a production How to construct LR parse table from grammar: 1. First construct SLR parser Items for the production: 2. LR and LALR are augmented basic SLR techniques S  • a B S 3. 2 phases to construct table: S  a • B S I. Build deterministic finite state automation to go from state to state S  a B • S II. Build table from DFA S  a B S • Each state – how do we know from grammar where we are in the parse. Production already seen. Basic idea : Construct a DFA that recognizes the viable prefixes group items into sets Construction of LR(0) items LR(0) States Create augmented grammar G’ Start with our usual grammar: G: S   |  1.) E → T + E S’  S G’: 2.) T → int * T S   |  3.) T → ( E ) What else is needed: Add a special start symbol, S, that goes to our original start symbol and $: • A  c • d E 0.) S → E $ • Indicate a new state by consuming symbol d: need goto function • A  c d • E The LR(0) start state will be the set of LR(0) items: • What are all possible things to see – all possible derivations from E? Add strings derivable from E – closure function S → • E $ • A  c d E • – reduce to A and goto another state E → • T + E Compute functions closure and goto will be used to determine the action and T → • int * T goto parts of the parsing table T → • ( E ) • closure – essentially defines what is expected • goto – moves from one state to another by consuming symbol 4

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend