
Compiler Construction Lecture 4: Lexical analysis in the real world

Compiler Construction Lecture 4: Lexical analysis in the real world. 2020-01-17, Michael Engel. Includes material by Jan Christian Meyer.


  1. Compiler Construction
     Lecture 4: Lexical analysis in the real world
     2020-01-17
     Michael Engel
     Includes material by Jan Christian Meyer

  2. Overview
     • NFA to DFA conversion
     • Subset construction algorithm
     • DFA state minimization:
     • Hopcroft's algorithm
     • Myhill-Nerode method
     • Using a scanner generator
     • lex syntax and usage
     • lex examples
     Compiler Construction 04: Lexical analysis in the real world

  3. What have we achieved so far?
     • We know a method to convert a regular expression, (all | and), into a nondeterministic finite automaton (NFA) using the McNaughton, Thompson and Yamada algorithm.
     [Figure: NFA for (all | and), with one branch labeled a, l, l and the other a, n, d]

  4. Overhead of constructed NFAs
     Let’s look at another example: a(b|c)*
     • Construct the simple NFAs for a, b and c
       [Figure: three two-state NFAs, s_0 -a-> s_1, s_2 -b-> s_3 and s_4 -c-> s_5]
     • Construct the NFA for b|c
       [Figure: NFA for b|c with states s_2 to s_7, joined by ε-transitions]

  5. Overhead of constructed NFAs
     • Now construct the NFA for (b|c)*
       [Figure: NFA for (b|c)* with states s_2 to s_9, joined by ε-transitions]
     • Looks pretty complex already? We're not even finished…

  6. Overhead of constructed NFAs
     • Finally, construct the NFA for a(b|c)*
       [Figure: NFA for a(b|c)* with states s_0 to s_9]
     • This NFA has many more states than a minimal human-built DFA:
       [Figure: two-state DFA, s_0 -a-> s_1, with a loop on s_1 labeled b,c]
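The state count can be checked with a small sketch of the construction steps used on slides 4 to 6. This is a minimal illustration, not the lecture's code: the class `NFA` and the helpers `literal`, `concat`, `alternate` and `star` are names invented here, and states are plain integers handed out in creation order.

```python
from itertools import count

EPS = None  # assumed label for ε-transitions


class NFA:
    """A fragment with exactly one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start = start
        self.accept = accept
        self.edges = edges  # list of (source, label, target)

    def states(self):
        return {s for (src, _, dst) in self.edges for s in (src, dst)}


_ids = count()  # hands out state numbers 0, 1, 2, ...


def literal(ch):
    s, t = next(_ids), next(_ids)
    return NFA(s, t, [(s, ch, t)])


def concat(x, y):
    # Link x's accepting state to y's start state with an ε-edge.
    return NFA(x.start, y.accept, x.edges + y.edges + [(x.accept, EPS, y.start)])


def alternate(x, y):
    # New start and accepting states, connected to both branches by ε-edges.
    s, t = next(_ids), next(_ids)
    edges = x.edges + y.edges + [
        (s, EPS, x.start), (s, EPS, y.start),
        (x.accept, EPS, t), (y.accept, EPS, t),
    ]
    return NFA(s, t, edges)


def star(x):
    s, t = next(_ids), next(_ids)
    edges = x.edges + [
        (s, EPS, x.start), (x.accept, EPS, t),
        (x.accept, EPS, x.start),  # repeat the body
        (s, EPS, t),               # zero repetitions
    ]
    return NFA(s, t, edges)


a, b, c = literal("a"), literal("b"), literal("c")
nfa = concat(a, star(alternate(b, c)))
print(len(nfa.states()))  # 10
print(len(nfa.edges))     # 12
```

Ten states and twelve transitions (nine of them ε-transitions) for a language that a two-state DFA recognizes is the overhead the slide refers to.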

  7. From NFA to DFA
     • An NFA is not really helpful, since its implementation is not obvious
     • We know: every DFA is also an NFA (without ε-transitions)
     • Every NFA can also be converted to an equivalent DFA (this can be proven by induction; we just show the construction)
     • The method to do this is called subset construction:
       NFA: (Q_N, Σ, δ_N, n_0, F_N)
       DFA: (Q_D, Σ, δ_D, d_0, F_D)
       The alphabet Σ stays the same. The set of states Q_N, transition function δ_N, start state n_0 and set of accepting states F_N are modified.
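Written out, the modified components look as follows. This is the usual textbook formulation of subset construction rather than something stated on the slide; in particular, the definition of F_D is an addition made here.

```latex
\begin{align*}
d_0 &= \varepsilon\text{-closure}(\{n_0\}) \\
Q_D &\subseteq 2^{Q_N} \\
\delta_D(q, c) &= \varepsilon\text{-closure}\Bigl(\bigcup_{n \in q} \delta_N(n, c)\Bigr) \\
F_D &= \{\, q \in Q_D \mid q \cap F_N \neq \emptyset \,\}
\end{align*}
```

Each DFA state is a set of NFA states, so Q_D can in the worst case contain up to 2^{|Q_N|} elements, although only the subsets reachable from d_0 are ever built.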

  8. Subset construction algorithm
     Idea of the algorithm: find sets of states that are equivalent (due to ε-transitions) and join these to form states of a DFA.
     ε-closure: contains a set of states S and any states in the NFA that can be reached from one of the states in S along paths that contain only ε-transitions (these are identical to a state in S).

         q_0 ← ε-closure({n_0});
         Q_D ← {q_0};
         WorkList ← {q_0};
         while (WorkList != ∅) do
             remove q from WorkList;
             for each character c ∈ Σ do
                 t ← ε-closure(δ_N(q, c));
                 δ_D[q, c] ← t;
                 if t ∉ Q_D then
                     add t to Q_D and to WorkList;
             end;
         end;
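The worklist algorithm translates almost line by line into Python. The sketch below is one possible rendering under stated assumptions: the function names, the dictionary encoding of δ_N, the empty string as ε label and the small NFA for a*b are all invented for illustration. It also skips the empty set instead of recording it as a DFA state, which is a deliberate deviation from the literal pseudocode.

```python
from collections import deque

EPS = ""  # assumed label for ε-transitions


def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, c, delta):
    """δ_N applied to every state of a set."""
    return {t for s in states for t in delta.get((s, c), ())}


def subset_construction(delta, n0, alphabet):
    q0 = epsilon_closure({n0}, delta)
    dfa_states = {q0}
    dfa_delta = {}
    worklist = deque([q0])
    while worklist:
        q = worklist.popleft()
        for c in alphabet:
            t = epsilon_closure(move(q, c, delta), delta)
            if not t:
                continue  # assumption: no DFA state for the empty set
            dfa_delta[(q, c)] = t
            if t not in dfa_states:
                dfa_states.add(t)
                worklist.append(t)
    return q0, dfa_states, dfa_delta


# Hypothetical four-state NFA for a*b (not from the slides).
delta = {
    (0, EPS): {1, 2},
    (1, "a"): {0},
    (2, "b"): {3},
}
q0, states, trans = subset_construction(delta, 0, "ab")
print(sorted(sorted(s) for s in states))  # [[0, 1, 2], [3]]
```

DFA states are stored as `frozenset` values so that they can serve as dictionary keys and set members.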

  9. Subset construction example
     [Figure: NFA for a(b|c)* with states n_0 to n_9]
     [Algorithm repeated from slide 8]
     Transition table δ_N:

         δ_N   a     b     c     ε
         n_0   n_1   –     –     –
         n_1   –     –     –     n_2
         n_2   –     –     –     n_3, n_9
         n_3   –     –     –     n_4, n_6
         n_4   –     n_5   –     –
         n_5   –     –     –     n_8
         n_6   –     –     n_7   –
         n_7   –     –     –     n_8
         n_8   –     –     –     n_3, n_9
         n_9   –     –     –     –

     Initialization:
         q_0 ← ε-closure({n_0}) = {n_0}
         Q_D ← {{n_0}};
         WorkList ← {{n_0}};

  10. Subset construction example
      [Figure, algorithm and δ_N table repeated from slide 9]
      while-loop iteration 1, c ← 'a':
          WorkList = {{n_0}};
          q ← {n_0};
          t ← ε-closure(δ_N(q, c))
            = ε-closure(δ_N(n_0, 'a'))
            = ε-closure({n_1})
            = {n_1, n_2, n_3, n_4, n_6, n_9}
          δ_D[{n_0}, 'a'] ← {n_1, n_2, n_3, n_4, n_6, n_9};
          Q_D ← {{n_0}, {n_1, n_2, n_3, n_4, n_6, n_9}};
          WorkList ← {{n_1, n_2, n_3, n_4, n_6, n_9}};
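The first iteration can be replayed mechanically. The sketch below encodes the δ_N table as a Python dictionary; the key "eps" for the ε column and the helper names `eps_closure` and `move` are choices made here, not part of the slides.

```python
# δ_N from the table; "eps" is an assumed key for the ε column.
DELTA_N = {
    "n0": {"a": {"n1"}},
    "n1": {"eps": {"n2"}},
    "n2": {"eps": {"n3", "n9"}},
    "n3": {"eps": {"n4", "n6"}},
    "n4": {"b": {"n5"}},
    "n5": {"eps": {"n8"}},
    "n6": {"c": {"n7"}},
    "n7": {"eps": {"n8"}},
    "n8": {"eps": {"n3", "n9"}},
    "n9": {},
}


def eps_closure(states):
    """Follow ε-edges until no new state is found."""
    closure, todo = set(states), list(states)
    while todo:
        for nxt in DELTA_N[todo.pop()].get("eps", ()):
            if nxt not in closure:
                closure.add(nxt)
                todo.append(nxt)
    return frozenset(closure)


def move(states, c):
    """States reachable from `states` on input character c."""
    return {t for s in states for t in DELTA_N[s].get(c, ())}


q0 = eps_closure({"n0"})          # n0 has no ε-edges
t = eps_closure(move(q0, "a"))    # closure of {n1}
print(sorted(q0))  # ['n0']
print(sorted(t))   # ['n1', 'n2', 'n3', 'n4', 'n6', 'n9']
```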

  11. Subset construction example
      [Figure, algorithm and δ_N table repeated from slide 9]
      while-loop iteration 1, c ← 'b', 'c':
          WorkList = {{n_0}} (at loop entry);
          q ← {n_0};
          t ← {}
          no change to Q_D, WorkList
      We will skip the iterations of the for loop that do not change Q_D from now on.

  12. Subset construction example
      [Figure, algorithm and δ_N table repeated from slide 9]
      while-loop iteration 2, c ← 'b':
          WorkList = {{n_1, n_2, n_3, n_4, n_6, n_9}};
          q ← {n_1, n_2, n_3, n_4, n_6, n_9};
          t ← ε-closure(δ_N(q, c))
            = ε-closure(δ_N(q, 'b'))
            = ε-closure({n_5})
            = {n_5, n_8, n_9, n_3, n_4, n_6}
          δ_D[q, 'b'] ← {n_5, n_8, n_9, n_3, n_4, n_6};
          Q_D ← {{n_0}, {n_1, n_2, n_3, n_4, n_6, n_9},
                 {n_5, n_8, n_9, n_3, n_4, n_6}};
          WorkList ← {{n_5, n_8, n_9, n_3, n_4, n_6}};

  13. Subset construction example
      [Figure, algorithm and δ_N table repeated from slide 9]
      while-loop iteration 2, c ← 'c':
          q ← {n_1, n_2, n_3, n_4, n_6, n_9};
          t ← ε-closure(δ_N(q, c))
            = ε-closure(δ_N(q, 'c'))
            = ε-closure({n_7})
            = {n_7, n_8, n_9, n_3, n_4, n_6}
          δ_D[q, 'c'] ← {n_7, n_8, n_9, n_3, n_4, n_6};
          Q_D ← {{n_0}, {n_1, n_2, n_3, n_4, n_6, n_9},
                 {n_5, n_8, n_9, n_3, n_4, n_6},
                 {n_7, n_8, n_9, n_3, n_4, n_6}};
          WorkList ← {{n_5, n_8, n_9, n_3, n_4, n_6},
                      {n_7, n_8, n_9, n_3, n_4, n_6}};
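Iteration 2 can be checked the same way. The sketch below starts from the algorithm's state at the beginning of that iteration and processes all three input characters; the edge dictionary, the "eps" marker and the helper names are assumptions made for this illustration. As in the walkthrough, the empty set produced for 'a' is skipped.

```python
EPS = "eps"  # assumed marker for ε
EDGES = {
    ("n0", "a"): {"n1"}, ("n1", EPS): {"n2"}, ("n2", EPS): {"n3", "n9"},
    ("n3", EPS): {"n4", "n6"}, ("n4", "b"): {"n5"}, ("n5", EPS): {"n8"},
    ("n6", "c"): {"n7"}, ("n7", EPS): {"n8"}, ("n8", EPS): {"n3", "n9"},
}


def closure(states):
    """ε-closure of a set of NFA states."""
    seen, todo = set(states), list(states)
    while todo:
        for nxt in EDGES.get((todo.pop(), EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return frozenset(seen)


def step(q, c):
    """ε-closure(δ_N(q, c)) for a DFA state q."""
    return closure({t for s in q for t in EDGES.get((s, c), ())})


# State of the algorithm at the start of iteration 2.
d0 = frozenset({"n0"})
d1 = frozenset({"n1", "n2", "n3", "n4", "n6", "n9"})
Q_D, worklist = {d0, d1}, [d1]

q = worklist.pop()
for c in "abc":
    t = step(q, c)
    if t and t not in Q_D:  # skip the empty set, as the walkthrough does
        Q_D.add(t)
        worklist.append(t)

for s in worklist:
    print(sorted(s))
# ['n3', 'n4', 'n5', 'n6', 'n8', 'n9']
# ['n3', 'n4', 'n6', 'n7', 'n8', 'n9']
```

After the iteration, Q_D holds four sets and the worklist holds the two new ones, which are processed in iterations 3 and 4.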

  14. Subset construction example
      [Figure, algorithm and δ_N table repeated from slide 9]
      while-loop iteration 3, c ← 'b', 'c':
          WorkList = {{n_5, n_8, n_9, n_3, n_4, n_6},
                      {n_7, n_8, n_9, n_3, n_4, n_6}};
          q ← {n_7, n_8, n_9, n_3, n_4, n_6};
          t ← ε-closure(δ_N(q, c))
            = ε-closure({n_5}) for c = 'b', ε-closure({n_7}) for c = 'c'
          // we ran around the graph once!
      No new states are added to Q_D in this and the following iteration!
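Once the worklist is empty, the result can be used directly as a table-driven recognizer. The sketch below hard-codes the four DFA states found in the walkthrough. The labels D0 to D3 and the function `accepts` are names chosen here, and treating n_9 as the NFA's only accepting state is an assumption based on the diagram layout, since the extracted text does not state F_N.

```python
# DFA states found by the walkthrough; the names D0..D3 are chosen here.
D0 = frozenset({"n0"})
D1 = frozenset({"n1", "n2", "n3", "n4", "n6", "n9"})
D2 = frozenset({"n3", "n4", "n5", "n6", "n8", "n9"})
D3 = frozenset({"n3", "n4", "n6", "n7", "n8", "n9"})

DELTA_D = {
    (D0, "a"): D1,
    (D1, "b"): D2, (D1, "c"): D3,
    (D2, "b"): D2, (D2, "c"): D3,
    (D3, "b"): D2, (D3, "c"): D3,
}

NFA_ACCEPTING = {"n9"}  # assumption: n9 is the NFA's accepting state
# A DFA state accepts if it contains at least one accepting NFA state.
ACCEPTING = {d for d in (D0, D1, D2, D3) if d & NFA_ACCEPTING}


def accepts(text):
    """Run the DFA over `text`; a missing transition means rejection."""
    state = D0
    for ch in text:
        state = DELTA_D.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


for word in ["a", "abcb", "", "ba", "abd"]:
    print(repr(word), accepts(word))
```

D1, D2 and D3 are all accepting and have identical outgoing transitions on b and c, so they cannot be told apart by any input. Merging them yields the two-state DFA from slide 6, which is the job of the DFA state minimization listed in the overview.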
