Compiler Design Spring 2018 3.3 Top-down parsing Thomas R. Gross - PowerPoint PPT Presentation


  1. Compiler Design Spring 2018 3.3 Top-down parsing Thomas R. Gross Computer Science Department ETH Zurich, Switzerland

  2. Overview § 3.1 Introduction § 3.2 Lexical analysis § 3.3 “Top down” parsing § 3.4 “Bottom up” parsing

  3. Is w ∈ L(G)? § Recall: given G and w, we want to know whether w ∈ L(G) § Approach: find a derivation S ⇒ α ⇒ … ⇒ w; if one exists, the answer is yes § Two principal approaches § Start with S (start symbol), work towards w § Guess which production will lead to w § “Top-down” parsing: S ⇒ … ⇒ … ⇒ w § Start with w and try to find a way to get back to S § Guess how w was generated § “Bottom-up” parsing: w ⇐ … ⇐ … ⇐ S

  4. 3.3 “Top down” parsing § Given w ∈ T* and a context-free grammar G(S, T, NT, P), is w ∈ L(G)? § Top-down: find a derivation S ⇒ … ⇒ w § Want to find a left-most derivation § Process input from left to right § Languages described by a context-free grammar can be recognized by a stack machine § w recognized ⇔ w ∈ L(G) § Get the derivation for free (the sequence of actions taken by the stack machine)

  5. Simple stack machine § Input string: a + b $ ($ is the end-of-input marker) § Figure labels: ip (input pointer), TOS (top of stack), sp, parser control, $

  6. Actions § Error: w ∉ L(G) § Accept: w ∈ L(G) § Match § Consume: remove from input, advance input pointer § Pop stack § Reduction § Use a production to expand/contract the top of the stack

  7. Parser decisions § Parser must decide based on top of stack and current input § Current input § Either the next token § Or some number k of remaining tokens

  8. Grammar G7 § Start symbol: S § Terminals: { Id, +, -, *, / } § Non-terminals: { S, Op } § Productions: S → Id Op Id (1) | - Id (2) § Op → + (3) | - (4) | * (5) | / (6)

  11. Parser decisions § Parser must decide based on top of stack and current input § Current input § Either the next token § Or some number k of remaining tokens § How can we control the parser? § If we say there is no derivation, we must be sure that w ∉ L(G)

  12. Grammars & words § Words are finite § Grammars are finite § Finite alphabets § Finite number of productions § Try until you succeed

  13. 3.3.1 Backtracking parsers

  14. Backtracking parsers § Basic idea (given grammar G, word w) § Start with S § Given the state of the stack and the rest of the input § Can we match, consume & pop a symbol? § Yes: do it § No: can we apply a production to the non-terminal X on top of the stack? § Yes: do it § No: stuck, continue with undo § Undo: undo the last step and try another production § Either for X, or (if there are no choices left) § For the non-terminal that was replaced in the previous step § May have to restore input

  15. Consider this grammar G8 § Start symbol: S § Terminals: {a, b, x} § Non-terminals: {A, B, S} § Productions: S → x A | x B § A → x A | a § B → x B | b § What is L(G)?

  16. Consider this grammar G8 § Start symbol: S § Terminals: {a, b, x} § Non-terminals: {A, B, S} § Productions: S → x A | x B § A → x A | a § B → x B | b § L(G) = { x^n a, x^n b | n > 0 }

  17. Backtracking on input xxxb
      Stack     Input      Action
      S $       x x x b    S → x A
      x A $     x x x b    Match
      A $       x x b      A → x A
      x A $     x x b      Match
      A $       x b        A → x A
      x A $     x b        Match
      A $       b          Undo
      x A $     x b        Undo
      …         …          …
      S $       x x x b    S → x B
      x B $     x x x b    Match
      B $       x x b      B → x B
      …

  18. Backtracking § Accept if the stack is empty and all input is consumed § Reject if there are no more choices to try § Signal error § Implementation is easy § May not be efficient, but fast enough in some settings § Can be used for any language
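
The try-and-undo strategy described on the backtracking slides can be sketched in a few lines of Python. This is a minimal illustration for grammar G8 only, not the course's implementation: the dictionary layout and the names `G8`, `parse`, and `accepts` are assumptions made for this sketch. Undo is expressed through recursion: returning `False` discards the current choice, and because the input position `pos` is passed by value, the input is restored automatically.

```python
# Grammar G8 from the slides. The dict-of-lists layout is an assumption of this sketch.
G8 = {
    "S": [["x", "A"], ["x", "B"]],
    "A": [["x", "A"], ["a"]],
    "B": [["x", "B"], ["b"]],
}

def parse(stack, tokens, pos=0, grammar=G8):
    """Return True if the stack (top of stack first) can derive tokens[pos:]."""
    if not stack:
        return pos == len(tokens)      # accept: stack empty and all input consumed
    top, rest = stack[0], stack[1:]
    if top not in grammar:             # terminal on top: match, consume, pop
        if pos < len(tokens) and tokens[pos] == top:
            return parse(rest, tokens, pos + 1, grammar)
        return False                   # stuck: the caller undoes its last choice
    for body in grammar[top]:          # non-terminal on top: try each production
        if parse(body + rest, tokens, pos, grammar):
            return True
    return False                       # no choices left for this non-terminal

def accepts(word):
    """Check membership of a string over {a, b, x} in L(G8)."""
    return parse(["S"], list(word))
```

For example, `accepts("xxxb")` first explores S → x A, fails when A meets b, and then succeeds through S → x B. The sketch terminates for G8 because every production starts with a terminal; a left-recursive grammar would make this naive recursion loop.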

  19. 3.3.2 Predictive top-down parsers § For some grammars, the first k tokens of the unprocessed input determine the parser’s action § LL(k) grammars § Left-to-right scan, left-most derivation, k symbols of look-ahead § Important subclass: LL(1) § Many programming languages have LL(1) grammars § Predictive parsing: the next k symbols determine everything

  20. Example: One token lookahead § Example production: stmt → if expr then stmt else stmt | while expr do stmt | begin stmt end § Guessing which production will lead to w is possible by looking at the first token
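
As a small illustration of one-token lookahead, the choice among the three stmt alternatives can be written as a lookup on the next token. The names `STMT_PRODUCTIONS` and `predict_stmt` are hypothetical and exist only in this sketch; the productions themselves are the ones on the slide.

```python
# The three stmt alternatives from the slide, keyed by the token that starts each one.
STMT_PRODUCTIONS = {
    "if": ["if", "expr", "then", "stmt", "else", "stmt"],
    "while": ["while", "expr", "do", "stmt"],
    "begin": ["begin", "stmt", "end"],
}

def predict_stmt(next_token):
    """Pick the stmt production using a single token of lookahead."""
    if next_token not in STMT_PRODUCTIONS:
        # No alternative starts with this token, so no derivation is possible.
        raise SyntaxError(f"no stmt production starts with {next_token!r}")
    return STMT_PRODUCTIONS[next_token]
```

The lookup works because each alternative starts with a different terminal, so no guess ever has to be undone.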

  21. Consider G8 (again) § Start symbol: S § Terminals: {a, b, x}, Non-terminals: {A, B, S} § Productions: S → x A | x B § A → x A | a § B → x B | b § Can we use predictive parsing for this grammar? Please justify your answer. You can work in teams. § Bored? How can we use a predictive parser for L(G)?

  23. 3.3.3 Construction of predictive parsers § Top-down § Predictive: for any combination of (top-of-stack, input), the parser knows how to move forward § Towards an “accept” or “reject” decision § Look again at the stack machine

  24. Simple stack machine § Input string: a + b $ ($ is the end-of-input marker) § Figure labels: ip (input pointer), TOS (top of stack), sp, predictive parser control, $

  25. Simple stack machine § Input string: a + b $ ($ is the end-of-input marker) § Figure labels: ip (input pointer), TOS (top of stack), sp, predictive parser control, $ § Parsing table M contains the rules: M[NT, T] = production NT → α

  26. Predictive parser § Two parts § 1. (Generic) controller § 2. (Grammar-specific) parsing table M § Start with S (start symbol on the stack) § Expand § Pop matching terminals § … until stack is empty § Fine print § Assume context-free grammar G(S, T, NT, P) § Add $ to mark bottom of stack, end of input § Goal: find left-most derivation

  27. Part #1: Parser control
      repeat {
        X = top of stack
        a: terminal pointed to by ip (input pointer)
        if (X ∈ T) {
          if (X == a) { pop X; ip++ }
        }
      } until (X == $) and (*ip == $);

  28. Slow motion: match, consume, pop § Grammar G11 with productions S → AB, A → a, B → b § Input w = a b § Assume we have this intermediate state (figure): stack with a on top and $ at the bottom, input a b $, input pointer ip

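  29. Slow motion: match, consume, pop § Grammar G11 with productions S → AB, A → a, B → b § Input w = a b § Assume we have this intermediate state (figure): stack with a on top and $ at the bottom, input a b $, input pointer ip § Figure also shows the state after match, consume, and pop, with remaining input b $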

  30. Part #2: Parsing table M § Controls specific operation steps of the parsing engine § Specific: for a grammar § Decides what to do if there is a non-terminal on top of the stack § Pick a production § Expand the non-terminal using that production

  31. Part #2: Parsing table M § (Again) grammar G11 with productions S → AB, A → a, B → b
      Rows: non-terminal; columns: input (terminal) symbol
      Non-terminal | a        | b
      S            |          |
      A            |          |
      B            |          |

  32. Part #2: Parsing table M § (Again) grammar G11 with productions S → AB, A → a, B → b
      Rows: non-terminal; columns: input (terminal) symbol
      Non-terminal | a        | b
      S            | S → AB   |
      A            | A → a    |
      B            |          | B → b

  33. Part #2: Parsing table M § (Again) grammar with productions S → AB, A → a, B → b § No entry: Error
      Rows: non-terminal; columns: input (terminal) symbol
      Non-terminal | a        | b        | $
      S            | S → AB   |          |
      A            | A → a    |          |
      B            |          | B → b    |
      $            |          |          | ACCEPT

  34. Part #2: Parsing table M § (Again) grammar with productions S → AB, A → a, B → b § No entry: Error
      Rows: non-terminal; columns: input (terminal) symbol
      Non-terminal | a        | b        | $
      S            | S → AB   | Error    | Error
      A            | A → a    | Error    | Error
      B            | Error    | B → b    | Error
      $            | Error    | Error    | ACCEPT

  35. Part #1 (parser control) revisited
      repeat {
        X = top of stack
        a: terminal pointed to by ip (input pointer)
        if (X ∈ T) {
          if (X == a) { pop X; ip++ }
        }
      } until (X == $) and (*ip == $);

  36. Part #1 (parser control) revisited
      repeat {
        X = top of stack
        a: terminal pointed to by ip (input pointer)
        if (X ∈ T) {
          if (X == a) { pop X; ip++ }
          else error();
        }
        else if (M[X, a] is error-entry) error();
        else if (M[X, a] == X → Y_1 Y_2 … Y_n) {
          pop X
          push Y_n … Y_2 Y_1 onto the stack
          record production X → Y_1 Y_2 … Y_n
        }
      } until (X == $) and (*ip == $);
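
The controller above translates almost line by line into Python. The following is a sketch under stated assumptions rather than the lecture's code: the table `M` is hand-written for grammar G11 (S → AB, A → a, B → b), missing entries stand for Error, and the names `M`, `TERMINALS`, and `ll1_parse` are chosen for this example.

```python
# Parsing table M for G11; a missing (non-terminal, input) pair means Error.
M = {
    ("S", "a"): ["A", "B"],
    ("A", "a"): ["a"],
    ("B", "b"): ["b"],
}
TERMINALS = {"a", "b"}

def ll1_parse(tokens, start="S"):
    """Return the productions used (a left-most derivation) or raise SyntaxError."""
    stack = ["$", start]              # the top of the stack is the end of the list
    tokens = list(tokens) + ["$"]     # $ marks the end of input
    ip = 0                            # input pointer
    derivation = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X == "$" and a == "$":
            return derivation         # ACCEPT
        if X in TERMINALS or X == "$":
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()               # match, consume, pop
            ip += 1
        elif (X, a) not in M:
            raise SyntaxError(f"no entry M[{X}, {a}]")
        else:
            body = M[(X, a)]
            stack.pop()                    # pop X
            stack.extend(reversed(body))   # push Y_n ... Y_1 so that Y_1 is on top
            derivation.append((X, body))   # record production X -> Y_1 ... Y_n
```

Calling `ll1_parse("ab")` records S → AB, A → a, B → b in that order. Only the table is grammar-specific; the loop itself does not mention G11.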

  37. Slow motion § Input string: w = a b
      Stack     Input    Action
      S $       a b $    S → AB
      A B $     a b $    A → a
      a B $     a b $    match, consume, pop
      B $       b $      B → b

  38. Slow motion § Input string: w = a b
      Stack     Input    Action
      B $       b $      B → b
      b $       b $      match, consume, pop
      $         $        ACCEPT

  39. Construction of parsing control table M § Table M[top-of-stack, next-input] is constructed from the grammar productions § Each entry contains one of the following § A production § Error § Accept § The grammar for such a table cannot be ambiguous § M defined ⇒ grammar not ambiguous

  40. Grammar G12 for expressions § Start symbol: E § Terminals: T = { ( , ) , * , + , Id } § Non-terminals: NT = { E, E’, F, T, T’ } § Productions: E → T E’ (1) § E’ → + T E’ (2) | ε (3) § T → F T’ (4) § T’ → * F T’ (5) | ε (6) § F → ( E ) (7) | Id (8)

  41. L(G) § Arithmetic expressions § Not ambiguous

  42. Setting M § Need to capture legal input for all non-terminals § Legal input for X: those strings that start a derivation from X § X ⇒* s α with s ∈ T⁺, α ∈ (T ∪ NT)* § M[X, r] § X on top of stack § r is the start of the (remaining) input w: use production § r is not the start of the (remaining) input w: X ⇏* r α, so error!

  43. Examples for G12 § Legal input for the following non-terminals § F → ? § T → ? § “)” not OK if either F or T is on top of the stack

  44. X on top of stack, input t § X ⇒* t § t ∈ T* § Need the start of the words w over T* that can be generated from X § How much of the words w do we want to look at? § For now: just 1 symbol (character) § Different productions P_1: X → α, P_2: X → β, … § P_1: Set_1 (of terminals) § P_2: Set_2 (of terminals) § … § Put the first symbol of w into Set_i
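
The idea of collecting, for each production, the first symbol of every word it can generate is commonly computed as a fixed point over the grammar (the usual FIRST sets). The slides up to this point only motivate the sets, so the sketch below is an assumption about where the construction leads, not the course's algorithm. It uses grammar G12, represents an ε production as an empty list, and the names `G12`, `EPS`, `first_of_body`, and `first_sets` are hypothetical.

```python
EPS = "ε"
# Grammar G12 from the slides; each body is a list of symbols, [] stands for ε.
G12 = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["Id"]],
}

def first_of_body(body, first):
    """Terminals that can start a word derived from one production body."""
    result = set()
    for sym in body:
        # A terminal starts only itself; a non-terminal contributes its current set.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result             # this symbol cannot vanish, so stop here
    result.add(EPS)                   # the body is empty or every symbol can vanish
    return result

def first_sets(grammar):
    """Repeat over all productions until no set grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_body(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

For G12 the result agrees with the earlier observation that “)” is not legal input when F or T is on top of the stack: both sets contain only ( and Id.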
