Compiler Design, Spring 2018: 3.3 Top-down parsing (Thomas R. Gross)



slide-1
SLIDE 1

Compiler Design

Spring 2018

3.3 Top-down parsing

1

Thomas R. Gross Computer Science Department ETH Zurich, Switzerland

slide-2
SLIDE 2

Overview

§ 3.1 Introduction
§ 3.2 Lexical analysis
§ 3.3 “Top down” parsing
§ 3.4 “Bottom up” parsing

slide-3
SLIDE 3

Is w ∈ L(G)?

§ Recall: given G and w, we want to know whether w ∈ L(G)
§ Approach: find a derivation
  § S ⇒ α ⇒ … ⇒ w
§ Two principal approaches
  § Start with S (the start symbol) and work towards w
    § Guess which production will lead to w
    § “Top-down” parsing: S ⇒ … ⇒ … ⇒ w
  § Start with w and try to find a way to get back to S
    § Guess how w was generated
    § “Bottom-up” parsing: w ⇐ … ⇐ … ⇐ S

Yes

slide-4
SLIDE 4

3.3 “Top down” parsing

§ Given w ∈ T* and a context-free grammar G(S, T, NT, P): is w ∈ L(G)?
§ Top-down: find a derivation S ⇒ … ⇒ w
  § Want to find a left-most derivation
  § Process the input from left to right
§ Languages described by a context-free grammar can be recognized by a stack machine
  § w recognized ⇔ w ∈ L(G)
  § Get the derivation for free (the sequence of actions taken by the stack machine)

slide-5
SLIDE 5

Simple stack machine

[Figure: parser control with a stack (top of stack TOS, stack pointer sp, bottom marker $) and the input string a + b $ with input pointer ip; $ is the end-of-input marker.]

slide-6
SLIDE 6

Actions

§ Error
  § w ∉ L(G)
§ Accept
  § w ∈ L(G)
§ Match
  § Consume: remove the symbol from the input, advance the input pointer
  § Pop the stack
§ Reduction
  § Use a production to expand/contract the top of the stack

slide-7
SLIDE 7

Parser decisions

§ The parser must decide based on the top of the stack and the current input
§ Current input
  § Either the next token
  § Or some number k of remaining tokens

slide-8
SLIDE 8

Grammar G7

§ Start symbol: S
§ Terminals: { Id, +, -, *, / }
§ Non-terminals: { S, Op }
§ Productions

S → Id Op Id (1) | - Id (2)
Op → + (3) | - (4) | * (5) | / (6)

slide-9
SLIDE 9

11

slide-10
SLIDE 10

12

slide-11
SLIDE 11

Parser decisions

§ The parser must decide based on the top of the stack and the current input
§ Current input
  § Either the next token
  § Or some number k of remaining tokens
§ How can we control the parser?
  § Must be sure that w ∉ L(G) if we say there is no derivation

slide-12
SLIDE 12

Grammars & words

§ Words are finite
§ Grammars are finite
  § Finite alphabets
  § Finite number of productions
§ Try until you succeed

slide-13
SLIDE 13

3.3.1 Backtracking parsers

slide-14
SLIDE 14

Backtracking parsers

§ Basic idea (given grammar G, word w)

§ Start with S
§ Given the state of the stack and the rest of the input:
  § Can we match, consume & pop a symbol?
    § Yes: do it
    § No: can we apply a production to the non-terminal X on top of the stack?
      § Yes: do it
      § No: stuck, continue with undo
§ Undo: undo the last step and try another production
  § Either for X, or (if there are no choices left)
  § For the non-terminal that was replaced in the previous step
  § May have to restore the input
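
The steps above can be sketched in a few lines of Python. This is only an illustration, not the lecture's implementation: the dictionary encoding of the grammar and the names `G8`, `parse` and `accepts` are assumptions made for this sketch. The grammar encoded is G8 from the slides that follow. "Undo" is realized by returning from a recursive call, which also restores the input position.

```python
# Hypothetical encoding of G8: each non-terminal maps to its alternatives.
G8 = {
    "S": [["x", "A"], ["x", "B"]],
    "A": [["x", "A"], ["a"]],
    "B": [["x", "B"], ["b"]],
}

def parse(grammar, stack, tokens, pos=0):
    """Return True if the stack (top first) can derive tokens[pos:]."""
    if not stack:
        return pos == len(tokens)      # accept: stack empty, input consumed
    top, rest = stack[0], stack[1:]
    if top not in grammar:             # terminal on top: match, consume, pop
        return (pos < len(tokens) and tokens[pos] == top
                and parse(grammar, rest, tokens, pos + 1))
    # Non-terminal on top: try every production in turn.
    # A failed recursive call is the "undo"; pos is restored automatically.
    return any(parse(grammar, alt + rest, tokens, pos) for alt in grammar[top])

def accepts(grammar, start, word):
    """Decide w ∈ L(G) by exhaustive search over derivations."""
    return parse(grammar, [start], list(word))
```

For example, `accepts(G8, "S", "xxxb")` first explores S → x A, fails at the final b, and then succeeds via S → x B. The sketch would not terminate on a left-recursive grammar, because a production could be applied again and again without consuming input.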

21

slide-15
SLIDE 15

Consider this grammar G8

§ Start symbol: S
§ Terminals: {a, b, x}
§ Non-terminals: {A, B, S}
§ Productions:

S → x A | x B
A → x A | a
B → x B | b

§ What is L(G)?

slide-16
SLIDE 16

Consider this grammar G8

§ Start symbol: S
§ Terminals: {a, b, x}
§ Non-terminals: {A, B, S}
§ Productions:

S → x A | x B
A → x A | a
B → x B | b

§ L(G) = { xⁿ a, xⁿ b | n > 0 }

slide-17
SLIDE 17

xxxb

Stack      Input      Action
S $        x x x b    S → x A
x A $      x x x b    Match
A $        x x b      A → x A
x A $      x x b      Match
A $        x b        A → x A
x A $      x b        Match
A $        b          Undo
x A $      x b        Undo
…          …          …
S $        x x x b    S → x B
x B $      x x x b    Match
B $        x x b      B → x B
…          …          …

slide-18
SLIDE 18

Backtracking

§ Accept if the stack is empty and all input has been consumed
§ Reject if there are no more choices to try
  § Signal an error
§ Easy to implement
§ May not be efficient, but fast enough in some settings
  § Can be used for any context-free language (given a grammar without left recursion)

slide-19
SLIDE 19

3.3.2 Predictive top-down parsers

§ For some grammars, the first k tokens of the unprocessed input determine the parser’s action
  § LL(k) grammars
  § Left-to-right scan, left-most derivation, k symbols of look-ahead
  § Important subclass: LL(1)
    § Many programming languages have LL(1) grammars
§ Predictive parsing: the next k symbols determine everything

slide-20
SLIDE 20

Example: One token lookahead

§ Example production

stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt end

§ The guess (which production will lead to w) is possible by looking at the first token
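
A minimal sketch of this one-token decision, assuming the three alternatives above: each alternative starts with a different keyword, so a lookup on the next token is enough. The names `STMT_ALTERNATIVES` and `predict_stmt` are made up for this illustration.

```python
# Hypothetical lookup: the first token of each alternative selects it.
STMT_ALTERNATIVES = {
    "if": ["if", "expr", "then", "stmt", "else", "stmt"],
    "while": ["while", "expr", "do", "stmt"],
    "begin": ["begin", "stmt", "end"],
}

def predict_stmt(lookahead):
    """Pick the stmt alternative from one token of look-ahead, or fail."""
    try:
        return STMT_ALTERNATIVES[lookahead]
    except KeyError:
        raise SyntaxError(
            f"no stmt alternative starts with {lookahead!r}") from None
```

No backtracking is needed here: a token that is not a key of the dictionary cannot start any stmt, so the parser can report an error immediately.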

32

slide-21
SLIDE 21

Consider G8 (again)

§ Start symbol: S
§ Terminals: {a, b, x}, Non-terminals: {A, B, S}
§ Productions:

S → x A | x B
A → x A | a
B → x B | b

Can we use predictive parsing for this grammar? Please justify your answer. You can work in teams.
Bored? How can we use a predictive parser for L(G)?

slide-22
SLIDE 22

39

slide-23
SLIDE 23

3.3.3 Construction of predictive parsers

§ Top-down
§ Predictive: for any combination of (top-of-stack, input) the parser knows how to move forward
  § Towards an “accept” or “reject” decision
§ Look again at the stack machine

slide-24
SLIDE 24

Simple stack machine

[Figure: predictive parser control with a stack (top of stack TOS, stack pointer sp, bottom marker $) and the input string a + b $ with input pointer ip; $ is the end-of-input marker.]

slide-25
SLIDE 25

Simple stack machine

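[Figure: predictive parser control with a stack (top of stack TOS, stack pointer sp, bottom marker $), the input string a + b $ with input pointer ip ($ is the end-of-input marker), and a parsing table M.]

Parsing table M contains rules M[NT, T] = production NT → α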

slide-26
SLIDE 26

Predictive parser

§ Two parts
  1. (Generic) controller
  2. (Grammar-specific) parsing table M
§ Start with S (start symbol on the stack)
  § Expand
  § Pop matching terminals
  … until the stack is empty
§ Fine print
  § Assume context-free grammar G(S, T, NT, P)
  § Add $ to mark the bottom of the stack and the end of the input
  § Goal: find a left-most derivation

slide-27
SLIDE 27

Part #1: Parser control

repeat {
    X = top of stack
    a = terminal pointed to by ip (input pointer)
    if (X ∈ T) {
        if (X == a) { pop X; ip++ }
    }
} until (X == $) and (*ip == $);

slide-28
SLIDE 28

Slow motion: Match, consume, pop

§ Grammar G11 with productions

S → A B
A → a
B → b

§ Input w = a b
§ Assume we have this intermediate state:

Stack: a $
Input: a b $ (ip points to the first symbol)

slide-29
SLIDE 29

Slow motion: Match, consume, pop

§ Grammar G11 with productions

S → A B
A → a
B → b

§ Input w = a b
§ Assume we have this intermediate state:

Before:  Stack: a $    Input: a b $
After:   Stack: $      Input: b $

slide-30
SLIDE 30

Part #2: Parsing table M

§ Controls the specific operation steps of the parsing engine
  § Specific: for a grammar
§ Decides what to do if there is a non-terminal on top of the stack
  § Pick a production
  § Expand the non-terminal using the production

slide-31
SLIDE 31

Part #2: Parsing table M

§ (Again) grammar G11 with productions

S → A B
A → a
B → b

Non-terminal   Input (terminal) symbol
               a           b
S
A
B

slide-32
SLIDE 32

Part #2: Parsing table M

§ (Again) grammar G11 with productions

S → A B
A → a
B → b

Non-terminal   Input (terminal) symbol
               a           b
S              S → A B
A              A → a
B                          B → b

slide-33
SLIDE 33

Part #2: Parsing table M

§ (Again) grammar with productions

S → A B
A → a
B → b

§ No entry: Error

Non-terminal   Input (terminal) symbol
               a           b           $
S              S → A B
A              A → a
B                          B → b
$                                      ACCEPT

slide-34
SLIDE 34

Part #2: Parsing table M

§ (Again) grammar with productions

S → A B
A → a
B → b

§ No entry: Error

Non-terminal   Input (terminal) symbol
               a           b           $
S              S → A B     Error       Error
A              A → a       Error       Error
B              Error       B → b       Error
$              Error       Error       ACCEPT

slide-35
SLIDE 35

Part #1 (parser control) revisited

repeat {
    X = top of stack
    a = terminal pointed to by ip (input pointer)
    if (X ∈ T) {
        if (X == a) { pop X; ip++ }
    }
} until (X == $) and (*ip == $);

slide-36
SLIDE 36

Part #1 (parser control) revisited

repeat {
    X = top of stack
    a = terminal pointed to by ip (input pointer)
    if (X ∈ T) {
        if (X == a) { pop X; ip++ }
        else error();
    }
    else if (M[X, a] is error-entry) error();
    else if (M[X, a] == X → Y1 Y2 … Yn) {
        pop X
        push Yn … Y2 Y1 onto the stack
        record production X → Y1 Y2 … Yn
    }
} until (X == $) and (*ip == $);
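
The controller can be turned into runnable code fairly directly. The following Python sketch is one possible rendering, not the lecture's implementation. It uses the table for G11 from the previous slides, and the names `M`, `NON_TERMINALS` and `ll1_parse` are assumptions made for this illustration. Missing dictionary entries play the role of the "Error" entries.

```python
# Parsing table M for G11 (S → A B, A → a, B → b); missing entries are errors.
M = {
    ("S", "a"): ["A", "B"],
    ("A", "a"): ["a"],
    ("B", "b"): ["b"],
}
NON_TERMINALS = {"S", "A", "B"}

def ll1_parse(tokens, table=M, start="S"):
    """Return the productions of a left-most derivation, or raise SyntaxError."""
    stack = ["$", start]              # top of stack is the end of the list
    tokens = list(tokens) + ["$"]     # $ marks the end of the input
    ip = 0
    derivation = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X == "$" and a == "$":
            return derivation         # accept
        if X not in NON_TERMINALS:    # terminal (or $) on top: must match
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()               # match, consume, pop
            ip += 1
        elif (X, a) not in table:
            raise SyntaxError(f"no entry M[{X}, {a}]")
        else:
            rhs = table[(X, a)]
            stack.pop()
            stack.extend(reversed(rhs))   # push Yn … Y1 so Y1 ends up on top
            derivation.append((X, rhs))   # record production X → Y1 … Yn
```

Calling `ll1_parse("ab")` records S → A B, A → a, B → b in that order, which is the left-most derivation of a b. Only the table depends on the grammar; the loop itself is generic.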

slide-37
SLIDE 37

Slow motion

Input string: w = a b

Stack      Input      Action
S $        a b $      S → A B
A B $      a b $      A → a
a B $      a b $      match, consume, pop
B $        b $        B → b

slide-38
SLIDE 38

Slow motion

Input string: w = a b

Stack      Input      Action
B $        b $        B → b
b $        b $        match, consume, pop
$          $          ACCEPT

slide-39
SLIDE 39

Construction of parsing control table M

§ Table M[top-of-stack, next-input] is constructed from the grammar productions
§ Each entry contains one of the following
  § A production
  § Error
  § Accept
§ The grammar for such a table cannot be ambiguous
  § M defined ⇒ grammar not ambiguous

slide-40
SLIDE 40

Grammar G12 for expressions

§ Start symbol: E
§ Terminals: T = { ( , ) , * , + , Id }
§ Non-terminals: NT = { E, E’, F, T, T’ }
§ Productions

E → T E’ (1)
E’ → + T E’ (2) | ε (3)
T → F T’ (4)
T’ → * F T’ (5) | ε (6)
F → ( E ) (7) | Id (8)

slide-41
SLIDE 41

L(G)

§ Arithmetic expressions
§ Not ambiguous

slide-42
SLIDE 42

Setting M

§ Need to capture the legal input for all non-terminals
§ Legal input for X: those strings that start a derivation from X
  § X ⇒* s α with s ∈ T⁺, α ∈ (T ∪ NT)*
§ M[X, r]
  § X on top of stack
  § r is the start of the (remaining) input w: use production
  § r is not the start of the (remaining) input w: X ⇏* r α, so error!

slide-43
SLIDE 43

Examples for G12

§ Legal input for the following non-terminals

  § F → ?
  § T → ?
  § “)” is not OK if either F or T is on top of the stack

slide-44
SLIDE 44

X on top of stack, input t

§ X ⇒* t
§ t ∈ T*
§ Need the start of the words w over T* that can be generated from X
  § How much of the words w do we want to look at?
  § For now: just 1 symbol (character)
§ Different productions P1: X → α, P2: X → β, …
  § P1: Set1 (of terminals)
  § P2: Set2 (of terminals)
  § …
  § Put the first symbol of w into Seti

slide-45
SLIDE 45

X on top of stack, input t

§ Different productions P1: X → a w’, P2: X → b w’, …
  § P1: Set1 = {‘a’}
  § P2: Set2 = {‘b’}
  § …
  § Put the first symbol of w into Seti
§ M is easy to construct if Setj ∩ Setk == ∅ for k ≠ j

slide-46
SLIDE 46

FIRST(α)

§ Given α ∈ (T ∪ NT)⁺
§ FIRST(α) = { t | (t ∈ T and there exists a derivation α ⇒* t β) or (t == ε and α ⇒* ε) }
§ Example

E’ → + T E’ (2) | ε (3)
FIRST(E’) = { ‘+’, ε }

slide-47
SLIDE 47

Computing FIRST(X)

Apply these three rules until no more terminals or ε can be added to any FIRST(X). Consider X ∈ T ∪ NT.

  1. If X ∈ T then FIRST(X) = { X }
  2. If X → ε is a production then add ε to FIRST(X)
  3. If X ∈ NT and X → Y1 Y2 … Yi … Yk is a production for k ≥ 1 then add terminal t to FIRST(X) if t ∈ FIRST(Yi) and Y1 Y2 … Yi-1 ⇒* ε

slide-48
SLIDE 48

Computing FIRST(X) (cont’d)

  3. If X ∈ NT and X → Y1 Y2 … Yi … Yk is a production for k ≥ 1 then add t to FIRST(X) if t ∈ FIRST(Yi) and Y1 Y2 … Yi-1 ⇒* ε

§ Y1 Y2 … Yi-1 ⇒* ε means
  § Y1 ⇒* ε
  § Y2 ⇒* ε
  § …
  § Yi-1 ⇒* ε
  § and Yi ⇒* t α
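
The three rules amount to a fixed-point iteration. The Python sketch below is one way to write it, shown for grammar G12. The dictionary encoding (an empty list for an ε-production) and the names `EPS`, `G12` and `compute_first` are assumptions of this sketch.

```python
EPS = "ε"
# G12; each right-hand side is a list of symbols, [] is the ε-production.
G12 = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["Id"]],
}

def compute_first(grammar):
    """Apply rules 1-3 until no FIRST set changes (a fixed point)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for X, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[X])
                for Y in rhs:
                    y_first = first[Y] if Y in grammar else {Y}   # rule 1
                    first[X] |= y_first - {EPS}                   # rule 3
                    if EPS not in y_first:
                        break          # Y cannot vanish: stop scanning rhs
                else:
                    first[X].add(EPS)  # rule 2, or every Yi ⇒* ε
                changed |= len(first[X]) != before
    return first
```

The `else` branch of the inner `for` loop runs only when the loop was not left via `break`, that is, when every symbol of the right-hand side can derive ε (trivially so for an empty right-hand side).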

slide-49
SLIDE 49

79

slide-50
SLIDE 50

FIRST(X1X2 … Xn)

§ Add to FIRST(X1 X2 … Xn) all non-ε symbols from FIRST(X1)
§ Add to FIRST(X1 X2 … Xn) all non-ε symbols from FIRST(X2) if ε ∈ FIRST(X1)
§ And so on
§ Add ε to FIRST(X1 X2 … Xn) if ε ∈ FIRST(Xi) for all i, 1 ≤ i ≤ n
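
As a small illustration of FIRST for a sequence of symbols, the following Python sketch takes the FIRST sets of the G12 non-terminals as given (they are listed later in the slides). The names `FIRST` and `first_of_sequence` are assumptions of this sketch; any symbol without an entry in `FIRST` is treated as a terminal.

```python
EPS = "ε"
# FIRST sets of G12's non-terminals, copied from the slides.
FIRST = {
    "E": {"(", "Id"}, "E'": {EPS, "+"},
    "T": {"(", "Id"}, "T'": {EPS, "*"},
    "F": {"(", "Id"},
}

def first_of_sequence(symbols, first=FIRST):
    """FIRST(X1 X2 … Xn); a symbol without an entry in `first` is a terminal."""
    result = set()
    for X in symbols:
        x_first = first.get(X, {X})
        result |= x_first - {EPS}
        if EPS not in x_first:
            return result      # X cannot vanish, later symbols are irrelevant
    result.add(EPS)            # every Xi can derive ε (also the empty sequence)
    return result
```

For instance, `first_of_sequence(["T'", "E'"])` contains “*”, “+” and ε, because both T’ and E’ can disappear, while `first_of_sequence(["T'", "E'", ")"])` contains “)” instead of ε.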

slide-51
SLIDE 51

83

slide-52
SLIDE 52

Setting M, continued

§ Is FIRST(…) enough to set up M?
§ Consider input Id + Id, derivation E ⇒ T E’ ⇒ F T’ E’ ⇒ Id T’ E’
  § Now the parser can match Id, advance ip to “+ Id”, pop Id
§ But what to do afterwards?
  § T’ cannot produce “+”
    § “+” ∉ FIRST(T’)
    § FIRST(T’) = { ε, “*” }
  § But ε ∈ FIRST(T’)!
    § So we can make T’ “disappear”
  § Question: could we use E’ to produce “+”?
    § E’ follows T’

slide-53
SLIDE 53

Setting M, continued

§ So we need to look at the terminals that can follow a non-terminal

slide-54
SLIDE 54

Computing FOLLOW(X)

§ X ∈ NT
§ FOLLOW(X) = { t | t ∈ T and there is a derivation S ⇒* α X t γ }

Apply these rules until nothing can be added to any FOLLOW set:

  1. Place $ into FOLLOW(S)
  2. For A → α B β: everything (but ε) in FIRST(β) is in FOLLOW(B)
  3. For A → α B or (A → α B β and β ⇒* ε): add FOLLOW(A) to FOLLOW(B)

§ β ⇒* ε means ε ∈ FIRST(β)
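
The FOLLOW rules can also be run as a fixed-point iteration. The Python sketch below does this for G12 and takes the FIRST sets as given. The grammar encoding and the names `first_of` and `compute_follow` are assumptions made for this illustration.

```python
EPS, END = "ε", "$"
G12 = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["Id"]],
}
FIRST = {
    "E": {"(", "Id"}, "E'": {EPS, "+"},
    "T": {"(", "Id"}, "T'": {EPS, "*"},
    "F": {"(", "Id"},
}

def first_of(symbols):
    """FIRST of a symbol sequence; symbols not in FIRST are terminals."""
    result = set()
    for X in symbols:
        x_first = FIRST.get(X, {X})
        result |= x_first - {EPS}
        if EPS not in x_first:
            return result
    return result | {EPS}

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                              # rule 1
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue                        # only non-terminals
                    beta_first = first_of(rhs[i + 1:])  # β is the rest of rhs
                    new = beta_first - {EPS}            # rule 2
                    if EPS in beta_first:
                        new |= follow[A]                # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow
```

`compute_follow(G12, "E")` ends with the same sets as the final iteration shown on the next slide; the order in which elements are discovered may differ from the slide's iterations.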

slide-55
SLIDE 55

FOLLOW for NT from G12

Productions:
E → T E’ (1)
E’ → + T E’ (2) | ε (3)
T → F T’ (4)
T’ → * F T’ (5) | ε (6)
F → ( E ) (7) | Id (8)

FIRST(E) = { “(”, Id }
FIRST(E’) = { ε, “+” }
FIRST(T) = { “(”, Id }
FIRST(T’) = { ε, “*” }
FIRST(F) = { “(”, Id }

Iteration 0:
  FOLLOW(E) = { $ }                      R1
  FOLLOW(T) = { “+” }                    R2(2)
  FOLLOW(F) = { “*” }                    R2(5)

Iteration 1:
  FOLLOW(E) = { $, “)” }                 R2(7)
  FOLLOW(T) = { “+” }
  FOLLOW(F) = { “*” }

Iteration 2:
  FOLLOW(E) = { $, “)” }
  FOLLOW(E’) = { $, “)” }                R3(1)
  FOLLOW(T) = { “+”, $, “)” }            R3(2)
  FOLLOW(T’) = { “+”, $, “)” }           R3(4)
  FOLLOW(F) = { “*”, “+”, $, “)” }       R3(5)

slide-56
SLIDE 56

Constructing M

§ Want to use these sets to construct M
§ Consider the stack and the symbol at ip: case 1

Stack: A XXXXXX $
Input: ip → a xxxxx $

Want to (eventually) use A → α if a ∈ FIRST(α)

Add A → α to M[A, a]

slide-57
SLIDE 57

Constructing M

§ Want to use these sets to construct M
§ Consider the stack and the symbol at ip: case 2

Stack: A XXXXXX $
Input: ip → b xxxxx $

Want to (eventually) use A → α if ε ∈ FIRST(α) and b ∈ FOLLOW(A)

Add A → α to M[A, b]

slide-58
SLIDE 58

Setting up table M

For all productions A → α we set up table M as follows:

  1. Add A → α to M[A, a] for all terminals a ∈ FIRST(α)
  2. If ε ∈ FIRST(α), add A → α to M[A, b] for all b ∈ FOLLOW(A)
     This rule also applies to b == $
  3. Add “accept” to M[$, $]
  4. All other entries are “error”
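
Rules 1 and 2 translate into a short loop over the productions. The Python sketch below builds the table for G12 from the FIRST and FOLLOW sets given in the slides. The encoding and the names `first_of` and `build_table` are assumptions of this sketch. Each table cell holds a list of productions so that conflicts (more than one entry) stay visible. Rule 3 is left to the parser loop, and rule 4 corresponds to absent keys.

```python
EPS = "ε"
G12 = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["Id"]],
}
FIRST = {
    "E": {"(", "Id"}, "E'": {EPS, "+"},
    "T": {"(", "Id"}, "T'": {EPS, "*"},
    "F": {"(", "Id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", "$", ")"}, "T'": {"+", "$", ")"},
    "F": {"*", "+", "$", ")"},
}

def first_of(symbols):
    """FIRST of a symbol sequence; symbols not in FIRST are terminals."""
    result = set()
    for X in symbols:
        x_first = FIRST.get(X, {X})
        result |= x_first - {EPS}
        if EPS not in x_first:
            return result
    return result | {EPS}

def build_table(grammar):
    """Map (non-terminal, terminal) to the list of productions placed there."""
    table = {}
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            alpha_first = first_of(rhs)
            lookaheads = alpha_first - {EPS}          # rule 1
            if EPS in alpha_first:
                lookaheads |= FOLLOW[A]               # rule 2 (includes $)
            for a in lookaheads:
                table.setdefault((A, a), []).append(rhs)
    return table                                      # rule 4: absent = error
```

A grammar is LL(1) in the sense of these slides when every cell ends up with exactly one production, which can be checked with `all(len(cell) == 1 for cell in build_table(G12).values())`.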

slide-59
SLIDE 59

Setting up table M for G12

E → T E’ (1)
E’ → + T E’ (2) | ε (3)
T → F T’ (4)
T’ → * F T’ (5) | ε (6)
F → ( E ) (7) | Id (8)

FIRST(E) = { “(”, Id }
FIRST(E’) = { ε, “+” }
FIRST(T) = { “(”, Id }
FIRST(T’) = { ε, “*” }
FIRST(F) = { “(”, Id }

FOLLOW(E) = { $, “)” }
FOLLOW(E’) = { $, “)” }
FOLLOW(T) = { “+”, $, “)” }
FOLLOW(T’) = { “+”, $, “)” }
FOLLOW(F) = { “*”, “+”, $, “)” }

slide-60
SLIDE 60

§ For all productions A → α
  1. Add A → α to M[A, a] for all terminals a ∈ FIRST(α)
  2. If ε ∈ FIRST(α), add A → α to M[A, b] for all b ∈ FOLLOW(A)
     This rule also applies to b == $

NT / T   Id      +       *       (       )       $
E
E’
T
T’
F

slide-61
SLIDE 61

Id + Id

Please use the parser table M to parse “Id + Id”

Find a derivation
Draw the corresponding parse tree

Alternatively: parse “Id + Id * Id” and see how the parser handles the ambiguity that caused a problem with grammar G from last week
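
One way to check a hand-made solution is to let a small program print the sentential forms of the left-most derivation. The Python sketch below is an assumption-laden illustration: the table `M` was written out by hand from the FIRST and FOLLOW sets of G12 using rules 1 and 2, and the name `leftmost_derivation` is made up for this example.

```python
# Parsing table for G12, written out by hand; [] stands for an ε-production.
M = {
    ("E", "Id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "Id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "Id"): ["Id"], ("F", "("): ["(", "E", ")"],
}
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}

def leftmost_derivation(tokens, start="E"):
    """Return the sentential forms of the left-most derivation of tokens."""
    stack, tokens, ip = ["$", start], list(tokens) + ["$"], 0
    matched, forms = [], [start]
    while stack[-1] != "$" or tokens[ip] != "$":
        X, a = stack.pop(), tokens[ip]
        if X in NON_TERMINALS:
            if (X, a) not in M:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.extend(reversed(M[(X, a)]))       # expand X
            # Sentential form = matched input + stack content (top first).
            forms.append(" ".join(matched + stack[:0:-1]) or "ε")
        elif X == a:
            matched.append(a)                       # match, consume, pop
            ip += 1
        else:
            raise SyntaxError(f"expected {X!r}, found {a!r}")
    return forms
```

For the token list `["Id", "+", "Id"]` the forms start with E, T E', F T' E', Id T' E' and end with Id + Id. Running it on `["Id", "+", "Id", "*", "Id"]` shows that the table leaves the parser exactly one choice at every step.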

slide-62
SLIDE 62

Id + Id

105

slide-63
SLIDE 63

Example

§ Consider grammar G8 with productions
  § S → x A | x B
  § A → x A | a
  § B → x B | b
§ Find FIRST(), FOLLOW(), construct the parsing table
  § M[S, x] contains S → x A and S → x B

slide-64
SLIDE 64

Example (cont’d)

FIRST(S) = {x}
FIRST(A) = {x, a}
FIRST(B) = {x, b}
FOLLOW(S) = {$}
FOLLOW(A) = {$}
FOLLOW(B) = {$}

§ There are two productions
  § S → x A and S → x B
  § Which one to pick to expand S?
  § Note that FIRST(x A) ∩ FIRST(x B) ≠ ∅

slide-65
SLIDE 65

Control table M

§ M[S, x] = { S → x A, S → x B }
§ Predictive parsing does not work as there is more than one entry in M[S, x]
§ (Finite) look-ahead k does not help to decide between the two productions

slide-66
SLIDE 66

LL(1) grammars

§ Grammars without conflict in the parsing table M as constructed are LL(1) grammars
  § Left-to-right scan, leftmost derivation, 1 input symbol
§ Important class, many programming languages have LL(1) grammars
§ What to do if G is not LL(1)?
  § Rewrite the grammar (e.g., S → x X; X → x X | a | b)
  § Increase the length of the input string considered (LL(k) grammar if k symbols are considered; use FIRSTk())
  § Use bottom-up parsing
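
To illustrate the first option, here is a sketch of a predictive recognizer for the rewritten grammar S → x X; X → x X | a | b, with one function per non-terminal. It is written as a recursive-descent parser rather than with an explicit table, which is a choice made for this illustration; the names `accepts`, `match`, `parse_S` and `parse_X` are likewise assumptions.

```python
def accepts(word):
    """Recognize L(G8) via the rewritten grammar, one symbol of look-ahead."""
    tokens = list(word) + ["$"]       # $ marks the end of the input
    ip = 0

    def match(expected):
        nonlocal ip
        if tokens[ip] != expected:
            raise SyntaxError(f"expected {expected!r}, found {tokens[ip]!r}")
        ip += 1

    def parse_S():                    # S → x X
        match("x")
        parse_X()

    def parse_X():                    # X → x X | a | b, chosen by next symbol
        if tokens[ip] == "x":
            match("x")
            parse_X()
        elif tokens[ip] in ("a", "b"):
            match(tokens[ip])
        else:
            raise SyntaxError(f"unexpected {tokens[ip]!r}")

    try:
        parse_S()
        match("$")                    # all input must be consumed
        return True
    except SyntaxError:
        return False
```

Each alternative of X starts with a different terminal, so the next input symbol alone selects the production and no undo step is ever needed. Very long inputs would hit Python's recursion limit; a loop over the leading x symbols would avoid that.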