
Compiler Construction Lecture 3: Lexical Analysis II (Extended Matching Problem)



  1. Compiler Construction, Lecture 3: Lexical Analysis II (Extended Matching Problem)
     Thomas Noll, Lehrstuhl für Informatik 2 (Software Modeling and Verification)
     noll@cs.rwth-aachen.de
     http://moves.rwth-aachen.de/teaching/ss-14/cc14/
     Summer Semester 2014

  2. Outline
     1. Recap: Lexical Analysis
     2. Complexity Analysis of Simple Matching
     3. The Extended Matching Problem
     4. First-Longest-Match Analysis
     5. Implementation of FLM Analysis

  3. Lexical Analysis
     Definition: The goal of lexical analysis is to decompose a source program into a sequence of lexemes and to transform these into a sequence of symbols. The corresponding program is called a scanner (or lexer).
     [Figure: Source program → Scanner → Parser, with "(token, [attribute])" passed from scanner to parser, "get next token" passed from parser to scanner, and a Symbol table]
     Example:
       … ␣x1␣:=y2+␣1␣;␣ …
       ⇓
       … (id, $p_1$)(gets, )(id, $p_2$)(plus, )(int, 1)(sem, ) …

  4. The DFA Method I
     Known from Formal Systems, Automata and Processes:
     Algorithm (DFA method)
     Input: regular expression $\alpha \in RE_\Omega$, input string $w \in \Omega^*$
     Procedure:
       1. using Kleene's Theorem, construct $A_\alpha \in NFA_\Omega$ such that $L(A_\alpha) = \llbracket \alpha \rrbracket$
       2. apply powerset construction (cf. Definition 2.12) to obtain $A'_\alpha = \langle Q', \Omega, \delta', q'_0, F' \rangle \in DFA_\Omega$ with $L(A'_\alpha) = L(A_\alpha) = \llbracket \alpha \rrbracket$
       3. solve the matching problem by deciding whether $\delta'^*(q'_0, w) \in F'$
     Output: "yes" or "no"

  5. The DFA Method II
     The powerset construction involves the following concept:
     Definition ($\varepsilon$-closure): Let $A = \langle Q, \Omega, \delta, q_0, F \rangle \in NFA_\Omega$. The $\varepsilon$-closure $\varepsilon(T) \subseteq Q$ of a subset $T \subseteq Q$ is defined by
       - $T \subseteq \varepsilon(T)$ and
       - if $q \in \varepsilon(T)$, then $\delta(q, \varepsilon) \subseteq \varepsilon(T)$
     Definition (Powerset construction): Let $A = \langle Q, \Omega, \delta, q_0, F \rangle \in NFA_\Omega$. The powerset automaton $A' = \langle Q', \Omega, \delta', q'_0, F' \rangle \in DFA_\Omega$ is defined by
       - $Q' := 2^Q$
       - $\forall T \subseteq Q, a \in \Omega: \delta'(T, a) := \varepsilon\left(\bigcup_{q \in T} \delta(q, a)\right)$
       - $q'_0 := \varepsilon(\{q_0\})$
       - $F' := \{T \subseteq Q \mid T \cap F \neq \emptyset\}$
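The two definitions above can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the dictionary encoding of $\delta$, the empty string as the $\varepsilon$ label, and the example NFA for `(a|b)*ab` are all assumptions. Unlike the definition, which takes $Q' := 2^Q$, the sketch only builds the subsets reachable from $q'_0$.

```python
from itertools import chain

EPS = ""  # label for epsilon-transitions (an assumption of this sketch)

def eps_closure(delta, states):
    """Least superset of `states` that is closed under epsilon-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), ()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return frozenset(closure)

def powerset_construction(alphabet, delta, q0, finals):
    """Build the part of the powerset automaton reachable from eps({q0})."""
    start = eps_closure(delta, {q0})
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        T = todo.pop()
        for a in alphabet:
            # delta'(T, a) := eps( union of delta(q, a) for q in T )
            moved = set(chain.from_iterable(delta.get((q, a), ()) for q in T))
            U = eps_closure(delta, moved)
            dfa_delta[(T, a)] = U
            if U not in seen:
                seen.add(U)
                todo.append(U)
    dfa_finals = {T for T in seen if T & set(finals)}
    return seen, dfa_delta, start, dfa_finals

def dfa_accepts(dfa_delta, start, dfa_finals, w):
    """Decide whether delta'*(q0', w) is a final state."""
    T = start
    for a in w:
        T = dfa_delta[(T, a)]
    return T in dfa_finals

# Hypothetical NFA for (a|b)*ab with one epsilon-transition from state 0 to 1.
NFA_DELTA = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
STATES, DFA_DELTA, START, FINALS = powerset_construction("ab", NFA_DELTA, 0, {3})
```

Calling `dfa_accepts(DFA_DELTA, START, FINALS, w)` then answers the matching problem for a word `w` over `{a, b}`.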


  7. Complexity of DFA Method
     1. in construction phase:
        - Kleene method: time and space $O(|\alpha|)$ (where $|\alpha|$ := length of $\alpha$)
        - Powerset construction: time and space $O(2^{|A_\alpha|}) = O(2^{|\alpha|})$ (where $|A_\alpha|$ := number of states of $A_\alpha$)
     2. at runtime:
        - Word problem: time $O(|w|)$ (where $|w|$ := length of $w$), space $O(1)$ (but $O(2^{|\alpha|})$ for storing the DFA)
     ⟹ nice runtime behavior, but memory requirements very high (and exponential time in construction phase)

  8. The NFA Method
     Idea: reduce memory requirements by applying the powerset construction at runtime, i.e., only "to the run of $w$ through $A_\alpha$"
     Algorithm 3.1 (NFA method)
     Input: automaton $A_\alpha = \langle Q, \Omega, \delta, q_0, F \rangle \in NFA_\Omega$, input string $w \in \Omega^*$
     Variables: $T \subseteq Q$, $a \in \Omega$
     Procedure:
       $T := \varepsilon(\{q_0\})$;
       while $w \neq \varepsilon$ do
         $a := head(w)$;
         $T := \varepsilon\left(\bigcup_{q \in T} \delta(q, a)\right)$;
         $w := tail(w)$
       od
     Output: if $T \cap F \neq \emptyset$ then "yes" else "no"
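Algorithm 3.1 translates almost line by line into Python. The sketch below assumes a dictionary encoding of $\delta$ with the empty string as the $\varepsilon$ label; the example NFA for `(a|b)*ab` is hypothetical and only serves as test data.

```python
EPS = ""  # label for epsilon-transitions (an assumption of this sketch)

def eps_closure(delta, states):
    """Least superset of `states` that is closed under epsilon-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), ()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure

def nfa_accepts(delta, q0, finals, w):
    """NFA method: track the set T of currently possible states."""
    T = eps_closure(delta, {q0})            # T := eps({q0})
    while w:                                # while w != eps do
        a, w = w[0], w[1:]                  #   a := head(w); w := tail(w)
        moved = set()
        for q in T:                         #   union of delta(q, a) for q in T
            moved |= delta.get((q, a), set())
        T = eps_closure(delta, moved)       #   T := eps(...)
    return bool(T & set(finals))            # "yes" iff T meets F

# Hypothetical NFA for (a|b)*ab with one epsilon-transition from state 0 to 1.
NFA_DELTA = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
```

Only the NFA and the current set `T` are stored, which is where the $O(|\alpha|)$ space bound comes from; each loop iteration touches up to $|T|$ states.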

  9. Complexity Analysis
     For NFA method at runtime:
       - Space: $O(|\alpha|)$ (for storing the NFA and $T$)
       - Time: $O(|\alpha| \cdot |w|)$ (in the loop's body, $|T|$ states need to be considered)
     Comparison:
       Method   Space               Time (for "$w \in \llbracket \alpha \rrbracket$?")
       DFA      $O(2^{|\alpha|})$   $O(|w|)$
       NFA      $O(|\alpha|)$       $O(|\alpha| \cdot |w|)$
     ⟹ trades exponential space for an increase in time
     In practice:
       - Exponential blowup of the DFA method usually does not occur in "real" applications (⟹ used in [f]lex)
       - Improvement of the NFA method: caching of transitions $\delta'(T, a)$ ⟹ combination of both methods
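The caching improvement mentioned on this slide can be pictured as a DFA that is built lazily while the NFA runs. The following is a sketch under assumptions: the class name `LazyDFA`, the dictionary encoding of $\delta$, and the example NFA are hypothetical and not taken from the lecture.

```python
EPS = ""  # label for epsilon-transitions (an assumption of this sketch)

def eps_closure(delta, states):
    """Least superset of `states` closed under epsilon-transitions (hashable)."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), ()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return frozenset(closure)

class LazyDFA:
    """NFA method with memoized transitions delta'(T, a)."""

    def __init__(self, delta, q0, finals):
        self.delta, self.finals = delta, frozenset(finals)
        self.start = eps_closure(delta, {q0})
        self.cache = {}  # (T, a) -> delta'(T, a), filled on demand

    def step(self, T, a):
        if (T, a) not in self.cache:
            moved = set()
            for q in T:
                moved |= self.delta.get((q, a), set())
            self.cache[(T, a)] = eps_closure(self.delta, moved)
        return self.cache[(T, a)]

    def accepts(self, w):
        T = self.start
        for a in w:
            T = self.step(T, a)
        return bool(T & self.finals)

# Hypothetical NFA for (a|b)*ab with one epsilon-transition from state 0 to 1.
NFA_DELTA = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
```

Only transitions that some input actually exercises are computed and stored, so the cache stays small unless the inputs really drive the automaton through many distinct subsets.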


  11. The Extended Matching Problem I
      Definition 3.2: Let $n \geq 1$ and $\alpha_1, \ldots, \alpha_n \in RE_\Omega$ with $\varepsilon \notin \llbracket \alpha_i \rrbracket \neq \emptyset$ for every $i \in [n]$ ($= \{1, \ldots, n\}$). Let $\Sigma := \{T_1, \ldots, T_n\}$ be an alphabet of corresponding tokens and $w \in \Omega^+$. If $w_1, \ldots, w_k \in \Omega^+$ such that $w = w_1 \ldots w_k$ and for every $j \in [k]$ there exists $i_j \in [n]$ such that $w_j \in \llbracket \alpha_{i_j} \rrbracket$, then $(w_1, \ldots, w_k)$ is called a decomposition and $(T_{i_1}, \ldots, T_{i_k})$ is called an analysis of $w$ w.r.t. $\alpha_1, \ldots, \alpha_n$.
      Problem 3.3 (Extended matching problem): Given $\alpha_1, \ldots, \alpha_n \in RE_\Omega$ and $w \in \Omega^+$, decide whether there exists a decomposition of $w$ w.r.t. $\alpha_1, \ldots, \alpha_n$ and determine a corresponding analysis.

  12. The Extended Matching Problem II
      Observation: neither the decomposition nor the analysis is uniquely determined
      Example 3.4:
        1. $\alpha = a^+$, $w = aa$ ⟹ two decompositions $(aa)$ and $(a, a)$ with respective (unique) analyses $(T_1)$ and $(T_1, T_1)$
        2. $\alpha_1 = a \mid b$, $\alpha_2 = a \mid c$, $w = a$ ⟹ unique decomposition $(a)$ but two analyses $(T_1)$ and $(T_2)$
      Goal: make both unique ⟹ deterministic scanning
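Both cases of Example 3.4 can be reproduced by enumerating every decomposition together with every analysis. The sketch below is only an illustration: it uses Python's `re.fullmatch` as a stand-in for the membership test $w_j \in \llbracket \alpha_i \rrbracket$, and the function name `analyses` and the token names `T1`, `T2`, … are assumptions.

```python
import re

def analyses(patterns, w):
    """Yield every (decomposition, analysis) pair of w w.r.t. the patterns."""
    if not w:
        yield (), ()
        return
    for end in range(1, len(w) + 1):        # non-empty first lexeme w[:end]
        head = w[:end]
        for i, pattern in enumerate(patterns, start=1):
            if re.fullmatch(pattern, head):  # head is in [[alpha_i]]
                for rest_dec, rest_an in analyses(patterns, w[end:]):
                    yield (head,) + rest_dec, (f"T{i}",) + rest_an

# Example 3.4 (1): two decompositions, each with a unique analysis.
CASE_1 = list(analyses(["a+"], "aa"))
# Example 3.4 (2): one decomposition with two analyses.
CASE_2 = list(analyses(["a|b", "a|c"], "a"))
```

The enumeration is exponential in general; it is meant to make the ambiguity visible, not to serve as a scanner.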


  14. Ensuring Uniqueness
      Two principles:
      1. Principle of the longest match ("maximal munch tokenization")
         - for uniqueness of decomposition
         - make lexemes as long as possible
         - motivated by applications: e.g., every (non-empty) prefix of an identifier is also an identifier
      2. Principle of the first match
         - for uniqueness of analysis
         - choose the first matching regular expression (in the given order)
         - therefore: arrange keywords before identifiers (if keywords are protected)

  15. Principle of the Longest Match
      Definition 3.5 (Longest-match decomposition): A decomposition $(w_1, \ldots, w_k)$ of $w \in \Omega^+$ w.r.t. $\alpha_1, \ldots, \alpha_n \in RE_\Omega$ is called a longest-match decomposition (LM decomposition) if, for every $i \in [k]$, $x \in \Omega^+$, and $y \in \Omega^*$,
        $w = w_1 \ldots w_i x y \implies$ there is no $j \in [n]$ such that $w_i x \in \llbracket \alpha_j \rrbracket$
      Corollary 3.6: Given $w$ and $\alpha_1, \ldots, \alpha_n$,
        - at most one LM decomposition of $w$ exists (clear by definition) and
        - it is possible that $w$ has a decomposition but no LM decomposition (see following example).
      Example 3.7: $w = aab$, $\alpha_1 = a^+$, $\alpha_2 = ab$ ⟹ $(a, ab)$ is a decomposition but no LM decomposition exists
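Definition 3.5 forces each lexeme to be the longest matching prefix of the remaining input, so an LM decomposition can be computed greedily, if it exists. The sketch below is an assumption-laden illustration: `re.fullmatch` stands in for the membership test, and the name `lm_decomposition` is hypothetical.

```python
import re

def lm_decomposition(patterns, w):
    """Return the LM decomposition of w as a tuple, or None if none exists."""
    result, pos = [], 0
    while pos < len(w):
        longest = 0                              # 0 means "no match found yet"
        for end in range(pos + 1, len(w) + 1):
            if any(re.fullmatch(p, w[pos:end]) for p in patterns):
                longest = end                    # later hits are longer
        if longest == 0:
            return None                          # greedy choice got stuck
        result.append(w[pos:longest])
        pos = longest
    return tuple(result)

# Example 3.7: (a, ab) is a decomposition of aab, but longest match first
# consumes "aa" and then cannot continue with "b".
EXAMPLE_3_7 = lm_decomposition(["a+", "ab"], "aab")
```

This version rescans every prefix and is therefore quadratic per lexeme; it is meant to mirror the definition, not to be efficient.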

  16. Principle of the First Match
      Problem: a (unique) LM decomposition can have several associated analyses (since $\llbracket \alpha_i \rrbracket \cap \llbracket \alpha_j \rrbracket \neq \emptyset$ with $i \neq j$ is possible; cf. keyword/identifier problem)
      Definition 3.8 (First-longest-match analysis): Let $(w_1, \ldots, w_k)$ be the LM decomposition of $w \in \Omega^+$ w.r.t. $\alpha_1, \ldots, \alpha_n \in RE_\Omega$. Its first-longest-match analysis (FLM analysis) $(T_{i_1}, \ldots, T_{i_k})$ is determined by
        $i_j := \min\{l \in [n] \mid w_j \in \llbracket \alpha_l \rrbracket\}$
      for every $j \in [k]$.
      Corollary 3.9: Given $w$ and $\alpha_1, \ldots, \alpha_n$, there is at most one FLM analysis of $w$. It exists iff the LM decomposition of $w$ exists.
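Definition 3.8 combines longest match with a minimum over rule indices, which resolves the keyword/identifier problem when keywords are listed first. The sketch below is an illustration only: `re.fullmatch` stands in for the membership test, and the rule set with the token names `if`, `id`, and `ws` is hypothetical.

```python
import re

def flm_analysis(rules, w):
    """Return [(token, lexeme), ...] for the FLM analysis of w, or None."""
    tokens, pos = [], 0
    while pos < len(w):
        best_end, best_token = pos, None
        for end in range(pos + 1, len(w) + 1):
            lexeme = w[pos:end]
            for token, pattern in rules:         # rules in priority order
                if re.fullmatch(pattern, lexeme):
                    best_end, best_token = end, token
                    break                        # first match: smallest index
        if best_token is None:
            return None                          # no LM decomposition exists
        tokens.append((best_token, w[pos:best_end]))
        pos = best_end
    return tokens

# Hypothetical rules: the keyword precedes the identifier pattern.
RULES = [("if", "if"), ("id", "[a-z][a-z0-9]*"), ("ws", " +")]
```

With these rules, the input `if ifx` is split so that `if` becomes a keyword (both `if` and `id` match, the first wins) while `ifx` becomes an identifier (longest match beats the keyword prefix).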


  18. Implementation of FLM Analysis
      Algorithm 3.10 (FLM analysis: overview)
      Input: expressions $\alpha_1, \ldots, \alpha_n \in RE_\Omega$, tokens $\{T_1, \ldots, T_n\}$, input word $w \in \Omega^+$
      Procedure:
        1. for every $i \in [n]$, construct $A_i \in DFA_\Omega$ such that $L(A_i) = \llbracket \alpha_i \rrbracket$ (see DFA method; Algorithm 2.9)
        2. construct the product automaton $A \in DFA_\Omega$ such that $L(A) = \bigcup_{i=1}^{n} \llbracket \alpha_i \rrbracket$
        3. partition the set of final states of $A$ to follow the first-match principle
        4. extend the resulting DFA to a backtracking DFA which implements the longest-match principle
        5. let the backtracking DFA run on $w$
      Output: FLM analysis of $w$ (if existing)

  19. (2) The Product Automaton
      Definition 3.11 (Product automaton): Let $A_i = \langle Q_i, \Omega, \delta_i, q_0^{(i)}, F_i \rangle \in DFA_\Omega$ for every $i \in [n]$. The product automaton $A = \langle Q, \Omega, \delta, q_0, F \rangle \in DFA_\Omega$ is defined by
        - $Q := Q_1 \times \ldots \times Q_n$
        - $q_0 := (q_0^{(1)}, \ldots, q_0^{(n)})$
        - $\delta((q^{(1)}, \ldots, q^{(n)}), a) := (\delta_1(q^{(1)}, a), \ldots, \delta_n(q^{(n)}, a))$
        - $(q^{(1)}, \ldots, q^{(n)}) \in F$ iff there exists $i \in [n]$ such that $q^{(i)} \in F_i$
      Lemma 3.12: The above construction yields $L(A) = \bigcup_{i=1}^{n} L(A_i)$ ($= \bigcup_{i=1}^{n} \llbracket \alpha_i \rrbracket$).
      Remark: similar construction for intersection ($F := F_1 \times \ldots \times F_n$)
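Definition 3.11 can be sketched directly with tuples as product states. The encoding of a DFA as a `(states, delta, q0, finals)` tuple with a total transition dictionary is an assumption, as are the two example DFAs for `a+` and `ab`. The helper `first_match` is one possible reading of how final states could be labeled by the first-match principle, not the lecture's definition.

```python
from itertools import product

def product_automaton(dfas, alphabet):
    """dfas: list of (states, delta, q0, finals); each delta must be total."""
    states = set(product(*(d[0] for d in dfas)))
    q0 = tuple(d[2] for d in dfas)
    delta = {
        (qs, a): tuple(d[1][(q, a)] for d, q in zip(dfas, qs))
        for qs in states
        for a in alphabet
    }
    # A product state is final iff some component is final in its own DFA.
    finals = {qs for qs in states if any(q in d[3] for d, q in zip(dfas, qs))}
    return states, delta, q0, finals

def first_match(dfas, qs):
    """Smallest index i (1-based) whose component of qs is final, else None."""
    for i, (d, q) in enumerate(zip(dfas, qs), start=1):
        if q in d[3]:
            return i
    return None

def run(delta, q0, w):
    """Follow the transitions of a DFA on the word w."""
    q = q0
    for a in w:
        q = delta[(q, a)]
    return q

# Hypothetical DFA for a+ : 0 start, 1 final, 2 sink.
A1 = ({0, 1, 2},
      {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 2,
       (2, "a"): 2, (2, "b"): 2},
      0, {1})
# Hypothetical DFA for ab : 0 start, 2 final, 3 sink.
A2 = ({0, 1, 2, 3},
      {(0, "a"): 1, (0, "b"): 3, (1, "a"): 3, (1, "b"): 2,
       (2, "a"): 3, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3},
      0, {2})
STATES, DELTA, Q0, FINALS = product_automaton([A1, A2], "ab")
```

Replacing `any` by `all` in the definition of `finals` gives the intersection variant from the remark.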
