Compiler Construction Lecture 4: Lexical analysis in the real world

2020-01-17
Michael Engel
Includes material by Jan Christian Meyer

Overview
• NFA to DFA conversion
  • Subset construction algorithm
• DFA state minimization:
  • Hopcroft's algorithm
  • Myhill-Nerode method
• Using a scanner generator
  • lex syntax and usage
  • lex examples

What have we achieved so far?
• We know a method to convert a regular expression, (all | and), into a nondeterministic finite automaton (NFA) using the McNaughton, Thompson and Yamada algorithm.
[Figure: NFA for (all | and), with one branch spelling a, l, l and the other a, n, d]

Overhead of constructed NFAs
Let's look at another example: a(b|c)*
• Construct the simple NFAs for a, b and c
[Figure: three two-state NFAs, one each for a (states s0, s1), b (states s2, s3) and c (states s4, s5)]
• Construct the NFA for b|c
[Figure: NFA for b|c; new states s6 and s7 are joined to the b and c fragments by ε-transitions]

Overhead of constructed NFAs
• Now construct the NFA for (b|c)*
[Figure: NFA for (b|c)*; new states s8 and s9 wrap the b|c fragment with further ε-transitions]
• Looks pretty complex already? We're not even finished…

Overhead of constructed NFAs
• Finally, construct the NFA for a(b|c)*
[Figure: complete NFA for a(b|c)* with ten states, s0 to s9]
• This NFA has many more states than a minimal human-built DFA:
[Figure: two-state DFA; s0 moves to s1 on a, and s1 loops to itself on b and c]
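The state count of the constructed NFA can be checked with a small sketch of the construction steps shown on the previous slides. This is an illustration only: the class `ThompsonBuilder`, its method names and the use of the string "ε" as an edge label are assumptions made here, not part of the lecture material.

```python
EPS = "ε"  # label for ε-transitions (a convention chosen for this sketch)


class ThompsonBuilder:
    """Builds NFA fragments; each fragment is a (start, accept) pair."""

    def __init__(self):
        self.num_states = 0
        self.edges = []  # list of (source, label, target)

    def _new_state(self):
        state = self.num_states
        self.num_states += 1
        return state

    def symbol(self, ch):
        # Two states joined by one edge labeled with the character.
        start, accept = self._new_state(), self._new_state()
        self.edges.append((start, ch, accept))
        return start, accept

    def alternate(self, left, right):
        # New start and accept states, connected to both branches by ε.
        start, accept = self._new_state(), self._new_state()
        for frag_start, frag_accept in (left, right):
            self.edges.append((start, EPS, frag_start))
            self.edges.append((frag_accept, EPS, accept))
        return start, accept

    def star(self, inner):
        # New start and accept states, plus ε-edges for "repeat" and "skip".
        start, accept = self._new_state(), self._new_state()
        inner_start, inner_accept = inner
        self.edges.append((start, EPS, inner_start))
        self.edges.append((inner_accept, EPS, accept))
        self.edges.append((inner_accept, EPS, inner_start))
        self.edges.append((start, EPS, accept))
        return start, accept

    def concat(self, first, second):
        # Joins two fragments with a single ε-edge; no new states.
        self.edges.append((first[1], EPS, second[0]))
        return first[0], second[1]


builder = ThompsonBuilder()
a = builder.symbol("a")
b = builder.symbol("b")
c = builder.symbol("c")
nfa = builder.concat(a, builder.star(builder.alternate(b, c)))

print(builder.num_states, len(builder.edges))  # prints "10 12"
```

Ten states and twelve edges for a(b|c)*, compared with two states and three transitions in the hand-built DFA.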

From NFA to DFA
• An NFA is not really helpful
  …since its implementation is not obvious
• We know: every DFA is also an NFA (without ε-transitions)
• Every NFA can also be converted to an equivalent DFA
  (this can be proven by induction, we just show the construction)
• The method to do this is called subset construction:
  NFA: (Q_N, Σ, δ_N, n0, F_N)
  DFA: (Q_D, Σ, δ_D, d0, F_D)
  The alphabet Σ stays the same.
  The set of states Q_N, the transition function δ_N, the start state n0 and the set of accepting states F_N are modified.
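A DFA is the easier target because running it needs only one table lookup per input character. The following sketch simulates the two-state DFA for a(b|c)*. The function name `dfa_accepts` and the dictionary encoding of the transition function are assumptions for illustration.

```python
def dfa_accepts(delta, start, accepting, text):
    """Run a DFA over text; a missing table entry means the input is rejected."""
    state = start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting


# Two-state DFA for a(b|c)*: s0 moves to s1 on 'a', s1 loops on 'b' and 'c'.
DELTA = {
    ("s0", "a"): "s1",
    ("s1", "b"): "s1",
    ("s1", "c"): "s1",
}

for word in ["a", "abcb", "", "b", "aa"]:
    print(repr(word), dfa_accepts(DELTA, "s0", {"s1"}, word))
```

An NFA offers no such direct loop: at each step it may be in several states at once, which is exactly the bookkeeping that subset construction moves from run time to construction time.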

Subset construction algorithm
Idea of the algorithm:
Find sets of states that are equivalent (due to ε-transitions) and join these to form states of a DFA.

ε-closure:
contains a set of states S and any states in the NFA that can be reached from one of the states in S along paths that contain only ε-transitions (these are identical to a state in S)

  q0 ← ε-closure({n0});
  Q_D ← {q0};
  WorkList ← {q0};
  while (WorkList != ∅) do
    remove q from WorkList;
    for each character c ∈ Σ do
      t ← ε-closure(δ_N(q, c));
      δ_D[q, c] ← t;
      if t ∉ Q_D then
        add t to Q_D and to WorkList;
    end;
  end;
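The ε-closure defined above can be computed with a simple graph search. This is a minimal sketch; the function name `epsilon_closure`, the dictionary encoding of ε-edges and the state names p, q, r, s are hypothetical and chosen only for this example.

```python
def epsilon_closure(states, eps_edges):
    """Return all states reachable from `states` using only ε-transitions.

    eps_edges maps a state to the states reachable by one ε-transition.
    """
    closure = set(states)
    stack = list(states)
    while stack:
        current = stack.pop()
        for target in eps_edges.get(current, ()):
            if target not in closure:  # also guards against ε-cycles
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


# Hypothetical ε-edges: p -> q, q -> r, q -> p (a cycle); s is unconnected.
EPS_EDGES = {"p": ["q"], "q": ["r", "p"]}

print(sorted(epsilon_closure({"p"}, EPS_EDGES)))  # prints ['p', 'q', 'r']
print(sorted(epsilon_closure({"s"}, EPS_EDGES)))  # prints ['s']
```

Returning a `frozenset` is a deliberate choice: the closure can then be used as a member of Q_D and as a key of δ_D.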

Subset construction example
[Figure: example NFA with states n0 to n9; its transitions are listed in the table below]

Transition function δ_N of the NFA:

  state   a    b    c    ε
  n0      n1   –    –    –
  n1      –    –    –    n2
  n2      –    –    –    n3, n9
  n3      –    –    –    n4, n6
  n4      –    n5   –    –
  n5      –    –    –    n8
  n6      –    –    n7   –
  n7      –    –    –    n8
  n8      –    –    –    n3, n9
  n9      –    –    –    –

Initialization:
  q0 ← ε-closure({n0}) = {n0}
  Q_D ← {{n0}}
  WorkList ← {{n0}}

Subset construction example
while-loop iteration 1:
  WorkList = {{n0}}
  q ← {n0}
  c ← 'a':
    t ← ε-closure(δ_N(q, c))
      = ε-closure(δ_N({n0}, 'a'))
      = ε-closure({n1})
      = {n1, n2, n3, n4, n6, n9}
    δ_D[{n0}, 'a'] ← {n1, n2, n3, n4, n6, n9}
    Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}}
    WorkList ← {{n1, n2, n3, n4, n6, n9}}

Subset construction example
while-loop iteration 1 (continued):
  q = {n0}
  c ← 'b', 'c':
    t ← {}
    no change to Q_D, WorkList
We will skip the iterations of the for loop that do not change Q_D from now on.

Subset construction example
while-loop iteration 2:
  WorkList = {{n1, n2, n3, n4, n6, n9}}
  q ← {n1, n2, n3, n4, n6, n9}
  c ← 'b':
    t ← ε-closure(δ_N(q, c))
      = ε-closure(δ_N(q, 'b'))
      = ε-closure({n5})
      = {n5, n8, n9, n3, n4, n6}
    δ_D[q, 'b'] ← {n5, n8, n9, n3, n4, n6}
    Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}, {n5, n8, n9, n3, n4, n6}}
    WorkList ← {{n5, n8, n9, n3, n4, n6}}

Subset construction example
while-loop iteration 2 (continued):
  q = {n1, n2, n3, n4, n6, n9}
  c ← 'c':
    t ← ε-closure(δ_N(q, c))
      = ε-closure(δ_N(q, 'c'))
      = ε-closure({n7})
      = {n7, n8, n9, n3, n4, n6}
    δ_D[q, 'c'] ← {n7, n8, n9, n3, n4, n6}
    Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}, {n5, n8, n9, n3, n4, n6}, {n7, n8, n9, n3, n4, n6}}
    WorkList ← {{n5, n8, n9, n3, n4, n6}, {n7, n8, n9, n3, n4, n6}}

Subset construction example
while-loop iteration 3:
  remove q = {n7, n8, n9, n3, n4, n6} from WorkList
  c ← 'b', 'c':
    t ← ε-closure(δ_N(q, c))
      = ε-closure({n5}) for 'b', ε-closure({n7}) for 'c'
    // we ran around the graph once!
    Both resulting sets are already in Q_D.
No new states are added to Q_D in this and the following iteration!
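The whole worked example can be replayed with a short script. This is a sketch under stated assumptions, not the lecture's reference implementation: the names `MOVES`, `EPS` and `subset_construction` are invented here, the worklist is processed first-in first-out (so sets may be visited in a different order than on the slides), and transitions to the empty set are skipped instead of being recorded as a state.

```python
from collections import deque

SIGMA = "abc"

# Character transitions of the example NFA (columns a, b, c of the table).
MOVES = {("n0", "a"): {"n1"}, ("n4", "b"): {"n5"}, ("n6", "c"): {"n7"}}

# ε column of the table.
EPS = {
    "n1": {"n2"},
    "n2": {"n3", "n9"},
    "n3": {"n4", "n6"},
    "n5": {"n8"},
    "n7": {"n8"},
    "n8": {"n3", "n9"},
}


def epsilon_closure(states):
    """All NFA states reachable from `states` via ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        for target in EPS.get(stack.pop(), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def subset_construction(start):
    q0 = epsilon_closure({start})
    dfa_states = [q0]        # Q_D, in order of discovery
    delta_d = {}             # δ_D as a dict keyed by (state set, character)
    worklist = deque([q0])
    while worklist:
        q = worklist.popleft()
        for c in SIGMA:
            targets = set()
            for n in q:
                targets |= MOVES.get((n, c), set())
            if not targets:
                continue     # assumption: no entry instead of an empty-set state
            t = epsilon_closure(targets)
            delta_d[(q, c)] = t
            if t not in dfa_states:
                dfa_states.append(t)
                worklist.append(t)
    return dfa_states, delta_d


dfa_states, delta_d = subset_construction("n0")
names = {state: f"d{i}" for i, state in enumerate(dfa_states)}
for state in dfa_states:
    print(names[state], sorted(state))
for (state, c), target in delta_d.items():
    print(f"{names[state]} --{c}--> {names[target]}")
```

The run yields four DFA states, matching the sets collected in Q_D during the iterations above, and no further sets appear once the two sets containing n5 and n7 have been processed.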
