Compiler Construction Lecture 8: Syntax Analysis IV (More on LL(1) & Bottom-Up Parsing)

Compiler Construction Lecture 8: Syntax Analysis IV (More on LL(1) & Bottom-Up Parsing). Thomas Noll, Lehrstuhl für Informatik 2 (Software Modeling and Verification), noll@cs.rwth-aachen.de


  1. Compiler Construction Lecture 8: Syntax Analysis IV (More on LL(1) & Bottom-Up Parsing)
     Thomas Noll, Lehrstuhl für Informatik 2 (Software Modeling and Verification)
     noll@cs.rwth-aachen.de
     http://moves.rwth-aachen.de/teaching/ss-14/cc14/
     Summer Semester 2014

  2. Outline
     1. Recap: LL(1) Parsing
     2. Transformation to LL(1)
     3. The Complexity of LL(1) Parsing
     4. Recursive-Descent Parsing
     5. Bottom-Up Parsing
     6. Nondeterministic Bottom-Up Parsing

  3. Characterization of LL(1)
     Theorem (Characterization of LL(1)): G ∈ LL(1) iff for all pairs of rules A → β | γ ∈ P (where β ≠ γ):
         la(A → β) ∩ la(A → γ) = ∅.
     Proof: on the board.
     Remark: the above theorem generally does not hold if k > 1 (cf. exercises).
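The disjointness test in the theorem can be tried out mechanically. The following is a minimal Python sketch, not the lecture's own algorithm: it assumes a grammar is given as a dictionary from nonterminals to lists of right-hand sides (tuples of symbols, with the empty tuple for ε), uses the hypothetical marker "$" for the end of the input (the slides write ε there), and computes la(A → β) as the first symbols of β, extended by the follow set of A when β can derive ε. All function names are illustrative.

```python
from itertools import combinations

EOF = "$"  # assumed end-of-input marker (the slides use ε for this case)


def first_of(seq, first, nullable):
    """Terminals that can start a word derived from seq, and whether seq ⇒* ε."""
    out = set()
    for sym in seq:
        if sym not in first:  # sym is a terminal
            out.add(sym)
            return out, False
        out |= first[sym]
        if sym not in nullable:
            return out, False
    return out, True


def lookahead_sets(grammar, start):
    """Compute la(A → β) for every production by fixed-point iteration."""
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    nullable = set()
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for rhs in alts:
                f, eps = first_of(rhs, first, nullable)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                if eps and a not in nullable:
                    nullable.add(a)
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym in grammar:  # nonterminal: update its follow set
                        f, eps = first_of(rhs[i + 1:], first, nullable)
                        new = f | (follow[a] if eps else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    la = {}
    for a, alts in grammar.items():
        for rhs in alts:
            f, eps = first_of(rhs, first, nullable)
            la[(a, rhs)] = f | (follow[a] if eps else set())
    return la


def is_ll1(grammar, start):
    """True iff the lookahead sets of all alternatives are pairwise disjoint."""
    la = lookahead_sets(grammar, start)
    return all(la[(a, b)].isdisjoint(la[(a, c)])
               for a, alts in grammar.items()
               for b, c in combinations(alts, 2))


# The two expression grammars used later in the lecture.
G_AE = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",), ("b",)],
}
G_AE_PRIME = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("a",), ("b",)],
}
```

With these definitions, `is_ll1(G_AE, "E")` is false because both E-alternatives share the lookahead symbols (, a and b, while `is_ll1(G_AE_PRIME, "E")` is true.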

  4. Deterministic Top-Down Parsing
     Approach: given G ∈ CFG_Σ,
     1. Verify that G ∈ LL(1) by computing the lookahead sets and checking alternatives for disjointness
     2. Start with the nondeterministic top-down parsing automaton NTA(G)
     3. Use 1-symbol lookahead to control the choice of expanding productions:
            (aw, Aα, z) ⊢ (aw, βα, zi)   if π_i = A → β and a ∈ la(π_i)
            (ε, Aα, z) ⊢ (ε, βα, zi)     if π_i = A → β and ε ∈ la(π_i)
        [matching steps as before: (aw, aα, z) ⊢ (w, α, z)]
     ⇒ deterministic top-down parsing automaton DTA(G)
     Remarks:
     DTA(G) is actually not a pushdown automaton (a is read but not consumed). But: this can be simulated using the finite control.
     The advantage of using lookahead is twofold:
         Removal of nondeterminism
         Earlier detection of syntax errors (in configurations (aw, Aα, z) where a ∉ ⋃ { la(A → β) | A → β ∈ P })
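The expansion and matching steps of DTA(G) can be simulated with an explicit stack. The sketch below is illustrative only: it hard-codes the grammar G′_AE from Example 8.5 with the rule numbering of Example 8.8, writes the lookahead sets out by hand, and uses the hypothetical marker "$" where the slides use ε for the exhausted input. The names `RULES`, `LA` and `dta_parse` are assumptions.

```python
EOF = "$"  # assumed end-of-input marker (the slides use ε for this case)

# Rules π_1 .. π_9 of G'_AE, numbered as in the recursive-descent example.
RULES = {
    1: ("E", ("T", "E'")),
    2: ("E'", ("+", "T", "E'")),
    3: ("E'", ()),
    4: ("T", ("F", "T'")),
    5: ("T'", ("*", "F", "T'")),
    6: ("T'", ()),
    7: ("F", ("(", "E", ")")),
    8: ("F", ("a",)),
    9: ("F", ("b",)),
}
# Lookahead sets la(π_i), written out by hand for this grammar.
LA = {
    1: {"(", "a", "b"}, 2: {"+"}, 3: {EOF, ")"},
    4: {"(", "a", "b"}, 5: {"*"}, 6: {"+", EOF, ")"},
    7: {"("}, 8: {"a"}, 9: {"b"},
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def dta_parse(word, start="E"):
    """Return the leftmost analysis z of word, or raise SyntaxError."""
    tokens = list(word) + [EOF]
    pos, stack, z = 0, [start], []
    while stack:
        top, a = stack.pop(), tokens[pos]
        if top in NONTERMINALS:
            # Expansion step: the lookahead a selects the rule, a is not consumed.
            choices = [i for i, (lhs, _) in RULES.items()
                       if lhs == top and a in LA[i]]
            if not choices:
                raise SyntaxError(f"unexpected {a!r} while expanding {top}")
            i = choices[0]  # unique because the grammar is LL(1)
            stack.extend(reversed(RULES[i][1]))
            z.append(i)
        elif top == a:
            pos += 1  # matching step: consume a
        else:
            raise SyntaxError(f"expected {top!r}, found {a!r}")
    if tokens[pos] != EOF:
        raise SyntaxError(f"unexpected {tokens[pos]!r} after end of expression")
    return z
```

For instance, `dta_parse("a+b")` yields the rule sequence 1, 4, 8, 6, 2, 4, 9, 6, 3, while `dta_parse("a+")` fails as soon as T has to be expanded with the end marker as lookahead.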

  6. Transformation to LL(1)
     Assume that G = ⟨N, Σ, P, S⟩ ∈ CFG_Σ \ LL(1)
     (i.e., there exist A → β | γ ∈ P such that la(A → β) ∩ la(A → γ) ≠ ∅).
     Two heuristics for transforming G into G′ ∈ LL(1) (used in parser-generating systems such as ANTLR):
     1. Removal of left recursion
     2. Left factorization
     Remarks:
     Transformations generally preserve the semantics (= generated language) of CFGs but not the syntactic structure of words (different syntax trees).
     Transformations cannot always yield an LL(1) grammar (since not every context-free language is generated by an LL grammar; details later).

  7. Left Recursion I
     Definition 8.1 (Left recursion): A grammar G = ⟨N, Σ, P, S⟩ ∈ CFG_Σ is called left recursive if there exist A ∈ N and α ∈ X* such that A ⇒+ Aα.
     Corollary 8.2: If G ∈ CFG_Σ is left recursive with A ⇒+ Aα, then there exists β ∈ X* such that A ⇒+_l Aβ.
     Example 8.3: The grammar (cf. Example 5.10)
         G_AE:  E → E + T | T
                T → T * F | F
                F → ( E ) | a | b
     is left recursive, and in Example 7.4 it was shown that G_AE ∉ LL(1).
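Definition 8.1 can be checked by computing which nonterminals can appear at the left end of a sentential form derived from A. The following Python sketch is one possible way to do this and is not taken from the lecture: it assumes the same dictionary representation of grammars as above (nonterminal to list of tuples, empty tuple for ε) and also looks past nullable symbols at the start of a right-hand side. The function name is illustrative.

```python
def left_recursive_nonterminals(grammar):
    """Return the set of all nonterminals A with A ⇒+ A α."""
    # Nonterminals that can derive ε.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            if a not in nullable and any(
                    all(s in nullable for s in rhs) for rhs in alts):
                nullable.add(a)
                changed = True

    # corner[A]: nonterminals that can stand leftmost after one step from A
    # (skipping over nullable symbols in front of them).
    corner = {a: set() for a in grammar}
    for a, alts in grammar.items():
        for rhs in alts:
            for sym in rhs:
                if sym not in grammar:  # terminal: nothing further to the right counts
                    break
                corner[a].add(sym)
                if sym not in nullable:
                    break

    # Transitive closure of the corner relation.
    changed = True
    while changed:
        changed = False
        for a in grammar:
            new = set().union(*(corner[b] for b in corner[a])) - corner[a]
            if new:
                corner[a] |= new
                changed = True

    return {a for a in grammar if a in corner[a]}


G_AE = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",), ("b",)],
}
```

For G_AE the function reports E and T as left recursive, matching Example 8.3; F is not, since every F-alternative starts with a terminal.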

  8. Left Recursion II
     Lemma 8.4: If G ∈ CFG_Σ is left recursive, then G ∉ ⋃_{k ∈ ℕ} LL(k).
     Proof (for k = 1):
     Assume that G ∈ LL(1) is left recursive with A ⇒+_l Aβ.
     Together with the reducedness of G this implies that
         S ⇒*_l vAα ⇒+_l vAβα ⇒+_l vw
     for some v, w ∈ Σ* and α ∈ X*.
     The corresponding computation of DTA(G) (Def. 7.6) starts with
         (vw, S, ε) ⊢* (w, Aα, ...) ⊢+ (w, Aβα, ...).
     But in the last state the behaviour of DTA(G) is determined by the same input (fi(w)) and stack symbol (A).
     Thus it enters a loop of the form
         (w, Aα, ...) ⊢+ (w, Aβα, ...) ⊢+ (w, Aββα, ...) ⊢+ ...
     and will never recognize w. Contradiction.

  9. Removing Direct Left Recursion
     Direct left recursion occurs in productions of the form
         A → A α_1 | ... | A α_m | β_1 | ... | β_n
     where α_i ≠ ε and β_j ≠ A... (i.e., no β_j begins with A).
     Transformation: replacement by right recursion
         A  → β_1 A′ | ... | β_n A′
         A′ → α_1 A′ | ... | α_m A′ | ε
     (with a new A′ ∈ N), which preserves L(G).
     Example 8.5:
         G_AE:  E → E + T | T
                T → T * F | F
                F → ( E ) | a | b
     is transformed into
         G′_AE: E  → T E′
                E′ → + T E′ | ε
                T  → F T′
                T′ → * F T′ | ε
                F  → ( E ) | a | b
     with G′_AE ∈ LL(1) (see Example 7.5).
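The replacement of direct left recursion by right recursion is mechanical enough to sketch in a few lines of Python. This is an illustration under assumptions rather than the lecture's implementation: grammars are dictionaries from nonterminals to lists of tuples (empty tuple for ε), the new nonterminal A′ is named by appending an apostrophe and is assumed not to clash with an existing name, and the precondition α_i ≠ ε from the slide is taken for granted.

```python
def remove_direct_left_recursion(grammar):
    """Replace A → A α_i | β_j by A → β_j A', A' → α_i A' | ε."""
    result = {}
    for a, alts in grammar.items():
        alphas = [rhs[1:] for rhs in alts if rhs[:1] == (a,)]  # α_1 .. α_m
        betas = [rhs for rhs in alts if rhs[:1] != (a,)]       # β_1 .. β_n
        if not alphas:
            result[a] = list(alts)  # no direct left recursion: keep as is
            continue
        fresh = a + "'"  # assumed to be a new nonterminal name
        result[a] = [beta + (fresh,) for beta in betas]
        result[fresh] = [alpha + (fresh,) for alpha in alphas] + [()]
    return result


G_AE = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",), ("b",)],
}
G_AE_PRIME = remove_direct_left_recursion(G_AE)
```

Applied to G_AE, this reproduces the productions of G′_AE from Example 8.5, with E′ and T′ spelled `E'` and `T'`.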

  10. Removing Indirect Left Recursion
      Indirect left recursion occurs in productions of the form (n ≥ 1)
          A       → A_1 α_1 | ...
          A_1     → A_2 α_2 | ...
          ...
          A_{n−1} → A_n α_n | ...
          A_n     → A β | ...
      Transformation: into Greibach Normal Form with productions of the form
          A → a B_1 ... B_n   (where n ∈ ℕ and each B_i ≠ S)   or   S → ε
      (cf. Formale Systeme, Automaten, Prozesse)

  11. Left Factorization
      Applies to productions of the form A → αβ | αγ, which are problematic if α is "at least as long as the lookahead".
      Transformation: delaying the decision by left factorization
          A  → α A′
          A′ → β | γ
      (with a new A′ ∈ N), which preserves L(G).
      Example 8.6:
          Statement → if Condition then Statement else Statement fi
                    | if Condition then Statement fi
      is transformed into
          Statement → if Condition then Statement S′
          S′        → else Statement fi | fi
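Left factorization can likewise be sketched in Python. The code below is a simplified illustration, not the lecture's procedure: it performs a single round per nonterminal, groups alternatives by their first symbol, factors out the longest common prefix α of each group, and names the new nonterminal by appending apostrophes (assumed not to clash with existing names). The remainders may still share prefixes, in which case the function would have to be applied again.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of tuples."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor(grammar):
    """One round of left factorization: A → αβ | αγ becomes A → αA', A' → β | γ."""
    result = {}
    for a, alts in grammar.items():
        # Group the alternatives of A by their first symbol.
        groups = {}
        for rhs in alts:
            groups.setdefault(rhs[:1], []).append(rhs)
        result[a] = []
        primes = 0
        for head, group in groups.items():
            if len(group) == 1 or head == ():
                result[a].extend(group)  # nothing to factor out
                continue
            alpha = common_prefix(group)
            primes += 1
            fresh = a + "'" * primes  # assumed to be a new nonterminal name
            result[a].append(alpha + (fresh,))
            result[fresh] = [rhs[len(alpha):] for rhs in group]
    return result


STATEMENT = {
    "Statement": [
        ("if", "Condition", "then", "Statement", "else", "Statement", "fi"),
        ("if", "Condition", "then", "Statement", "fi"),
    ],
}
FACTORED = left_factor(STATEMENT)
```

For the grammar of Example 8.6 the common prefix is `if Condition then Statement`, and the new nonterminal (called S′ on the slide, `Statement'` here) receives the alternatives `else Statement fi` and `fi`.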

  13. The Complexity of LL(1) Parsing I
      LL(1) parsing has time (and hence space) complexity O(|w|) (where w ∈ Σ* is the input word).
      Here: proof for ε-free grammars (i.e., A → α ∈ P ⟹ α ≠ ε)
      General case: see O. Mayer: Syntaxanalyse, p. 211ff
      Lemma 8.7: Let G = ⟨N, Σ, P, S⟩ ∈ LL(1) be ε-free. If (w, S, ε) ⊢^n (ε, ε, z) in DTA(G), then n ≤ (|w| + 1) · (|N| + 1).

  14. The Complexity of LL(1) Parsing II
      Proof. Let (w, S, ε) ⊢^n (ε, ε, z) in DTA(G). To show: n ≤ (|w| + 1) · (|N| + 1)
      1. Clear: the computation involves |w| matching steps.
      2. Since G is ε-free, every matching step is preceded (and followed) by k ≥ 0 expansion steps of the form
             (av, A_1 α_1, ...) ⊢ (av, A_2 α_2 α_1, ...)
                                ...
                                ⊢ (av, A_k α_k ... α_1, ...)
                                ⊢ (av, a α_{k+1} ... α_1, ...)
         where A_i → A_{i+1} α_{i+1} for each i ∈ [k − 1] and A_k → a α_{k+1}.
      3. This implies that A_i ≠ A_j for i ≠ j (by Lemma 8.4, G is not left recursive), and hence k ≤ |N|.
      4. Altogether: n ≤ (|w| + 1) · (|N| + 1).

  16. Recursive-Descent Parsing I
      Idea: avoid explicit use of the pushdown store (as in DTA(G)) by employing recursive procedures (with implicit runtime stack)
      Advantage: simple implementation
      Ingredients:
          variable token for the current token
          function next() for invoking the scanner
          procedure print(i) for displaying the leftmost analysis (or errors)
      Method: to every A ∈ N we assign a procedure A() which
          tests token with regard to the lookahead sets of the A-productions,
          prints the corresponding rule number, and
          evaluates the corresponding right-hand side as follows:
              for a ∈ Σ: match token; call next()
              for A ∈ N: call A()

  17. Recursive-Descent Parsing II
      Example 8.8 (Arithmetic expressions; cf. Example 8.5)

      proc main();
        token := next(); E()

      proc E();    (* E → T E′ *)
        if token in {'(','a','b'} then print(1); T(); E'()
        else print(error); stop fi

      proc E'();   (* E′ → + T E′ | ε *)
        if token = '+' then print(2); token := next(); T(); E'()
        elsif token in {EOF, ')'} then print(3)
        else print(error); stop fi

      proc T();    (* T → F T′ *)
        if token in {'(','a','b'} then print(4); F(); T'()
        else print(error); stop fi

      proc T'();   (* T′ → * F T′ | ε *)
        if token = '*' then print(5); token := next(); F(); T'()
        elsif token in {'+', EOF, ')'} then print(6)
        else print(error); stop fi

      proc F();    (* F → ( E ) | a | b *)
        if token = '(' then print(7); token := next(); E();
          if token = ')' then token := next()
          else print(error); stop fi
        elsif token = 'a' then print(8); token := next()
        elsif token = 'b' then print(9); token := next()
        else print(error); stop fi
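The procedures of Example 8.8 translate almost line by line into a runnable program. The following Python sketch is one such translation and makes a few assumptions that are not on the slide: tokens are single characters, the hypothetical marker "$" plays the role of EOF, rule numbers are collected in a list instead of being printed, errors raise SyntaxError instead of stopping, and `parse` additionally checks that the whole input has been consumed.

```python
EOF = "$"  # assumed end-of-input token


class Parser:
    """Recursive-descent parser for G'_AE; one method per nonterminal."""

    def __init__(self, text):
        self._chars = iter(text)
        self.rules = []           # leftmost analysis (rule numbers)
        self.token = self.next()  # current token

    def next(self):
        """Trivial scanner: every character is a token."""
        return next(self._chars, EOF)

    def error(self):
        raise SyntaxError(f"unexpected token {self.token!r}")

    def E(self):  # E → T E'
        if self.token in {"(", "a", "b"}:
            self.rules.append(1); self.T(); self.E_()
        else:
            self.error()

    def E_(self):  # E' → + T E' | ε
        if self.token == "+":
            self.rules.append(2); self.token = self.next(); self.T(); self.E_()
        elif self.token in {EOF, ")"}:
            self.rules.append(3)
        else:
            self.error()

    def T(self):  # T → F T'
        if self.token in {"(", "a", "b"}:
            self.rules.append(4); self.F(); self.T_()
        else:
            self.error()

    def T_(self):  # T' → * F T' | ε
        if self.token == "*":
            self.rules.append(5); self.token = self.next(); self.F(); self.T_()
        elif self.token in {"+", EOF, ")"}:
            self.rules.append(6)
        else:
            self.error()

    def F(self):  # F → ( E ) | a | b
        if self.token == "(":
            self.rules.append(7); self.token = self.next(); self.E()
            if self.token == ")":
                self.token = self.next()
            else:
                self.error()
        elif self.token == "a":
            self.rules.append(8); self.token = self.next()
        elif self.token == "b":
            self.rules.append(9); self.token = self.next()
        else:
            self.error()


def parse(text):
    """Counterpart of main(), plus a check (not on the slide) for leftover input."""
    parser = Parser(text)
    parser.E()
    if parser.token != EOF:
        parser.error()
    return parser.rules
```

Calling `parse("a+b")` returns the rule sequence 1, 4, 8, 6, 2, 4, 9, 6, 3, and `parse("a)")` raises SyntaxError because of the added end-of-input check.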
