Lexical analysis
Faculteit Wiskunde en Informatica, 14-9-2011

NFA N for (a|b)*abb

[Figure: NFA N with states 0 to 10. Labelled edges: 2 -a-> 3, 4 -b-> 5, 7 -a-> 8, 8 -b-> 9, 9 -b-> 10; the remaining edges are ε-transitions.]

Subset construction (every move is followed by an ε-closure):
• A = {0, 1, 2, 4, 7} (= ε-closure(0))
• B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({0, 1, 2, 4, 7}, a)) = ε-closure(move(A, a)))
• C = {1, 2, 4, 5, 6, 7} (= ε-closure(move({0, 1, 2, 4, 7}, b)) = ε-closure(move(A, b)))
• B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({1, 2, 3, 4, 6, 7, 8}, a)) = ε-closure(move(B, a)))
• D = {1, 2, 4, 5, 6, 7, 9} (= ε-closure(move({1, 2, 3, 4, 6, 7, 8}, b)) = ε-closure(move(B, b)))
• B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({1, 2, 4, 5, 6, 7}, a)) = ε-closure(move(C, a)))
• C = {1, 2, 4, 5, 6, 7} (= ε-closure(move({1, 2, 4, 5, 6, 7}, b)) = ε-closure(move(C, b)))
• B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({1, 2, 4, 5, 6, 7, 9}, a)) = ε-closure(move(D, a)))
• E = {1, 2, 4, 5, 6, 7, 10} (= ε-closure(move({1, 2, 4, 5, 6, 7, 9}, b)) = ε-closure(move(D, b)))
• B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({1, 2, 4, 5, 6, 7, 10}, a)) = ε-closure(move(E, a)))
• C = {1, 2, 4, 5, 6, 7} (= ε-closure(move({1, 2, 4, 5, 6, 7, 10}, b)) = ε-closure(move(E, b)))

DFA

• Resulting DFA

  state   a   b
  A       B   C
  B       B   D
  C       B   C
  D       B   E
  E       B   C

minimal DFA (MDFA)

DFA = (Q, V, δ, q0, F)

Equivalence relation ~ on states; for all s, t ∈ Q:
  s ~ t ≡ (∀w : w ∈ V* : δ(s, w) ∈ F ≡ δ(t, w) ∈ F)

Definition (equivalence class of state q), written [q] here:
  [·] ∈ Q → P(Q)
  (∀q : q ∈ Q : [q] = { q' | q ~ q' })

Partitioning state set Q according to relation ~ yields state set Q', which is used in the minimal DFA (MDFA)
  (Q', V, δ', [q0], F')
where
  (∀s, a : s ∈ Q ∧ a ∈ V : δ'([s], a) = [δ(s, a)])
  F' = { [f] | f ∈ F }
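The computation of the sets A to E can be sketched in code. The following is a minimal sketch, not the course's implementation: the ε-edge list is an assumption (the usual Thompson-style automaton for (a|b)*abb, chosen to be consistent with the sets listed in this section), and the names `eps_closure`, `move` and `subset_construction` are illustrative.

```python
from collections import deque

# Assumed edge lists for NFA N (states 0..10); the epsilon edges are not
# given explicitly in the slides and are reconstructed from the sets A..E.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
STEP = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}
ALPHABET = "ab"
NFA_START, NFA_FINAL = 0, 10


def eps_closure(states):
    """All NFA states reachable from `states` using epsilon edges only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` by one edge labelled `symbol`."""
    return {t for s in states for t in STEP.get((s, symbol), ())}


def subset_construction():
    """Return (names, table, accepting); DFA states are named A, B, ..."""
    start = eps_closure({NFA_START})
    names = {start: "A"}
    table = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            target = eps_closure(move(current, symbol))
            if target not in names:
                # First time this subset is seen: give it the next letter.
                names[target] = chr(ord("A") + len(names))
                queue.append(target)
            table[(names[current], symbol)] = names[target]
    accepting = {name for subset, name in names.items() if NFA_FINAL in subset}
    return names, table, accepting
```

Under the assumed edge list, `subset_construction()` discovers five subsets in breadth-first order, so the letters A to E line up with the sets in this section, and E is the only subset containing NFA state 10.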
Minimizing states of a DFA

Input: DFA M = (Q, V, δ, q0, F)
Output: DFA M' with fewer states and the same behaviour

Algorithm (Π denotes the current partition of Q):
1. Construct the initial partition Π of the set of states with two groups: accepting states F and non-accepting states Q – F
2. Π_new := Π;
   FOR each group G of Π DO
     partition G in subgroups such that states s and t of G are in the same subgroup if and only if for all input symbols a, states s and t have transitions to states in the same group of Π;
     replace G in Π_new by the set of all subgroups formed
   END
   If Π_new = Π, Π_final := Π, else repeat step (2) with Π := Π_new
3. Choose one state in each group of Π_final as representative; the representatives are the states of the minimal DFA M'
4. Construct a new transition table using these representatives, where:
   • the start state of M' is the representative of the group containing q0 of M
   • the accepting states of M' are the representatives of the groups contained in F
5. If M' has a dead state, i.e. a state d that is not accepting and only has transitions to itself, then remove d from M';
   remove all states not reachable from the start state

Example

• Given DFA

[Figure: the DFA with states A, B, C, D, E obtained from the subset construction; E is the accepting state.]

• First partition: (E) accepting state and (ABCD) non-accepting states
  Π_new = (ABCD)(E)
• Consider (E): cannot be split (single state), so it is put into Π_new unchanged
• Consider (ABCD): all states of (ABCD) go on a to state B; on b states A, B and C go to (ABCD) and D goes to (E);
  thus Π_new = (ABC)(D)(E)
• Consider (ABC): all states of (ABC) go on a to state B; on b states A and C go to C and B goes to D;
  thus Π_new = (AC)(B)(D)(E)
• Consider (AC): all states of (AC) go on a to state B; all states of (AC) go on b to C;
  thus Π_new = (AC)(B)(D)(E), which is the final partition
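The refinement of (ABCD)(E) down to (AC)(B)(D)(E) can be sketched as follows. This is a minimal sketch under stated assumptions: it takes a complete transition table as a dictionary, picks the alphabetically smallest state of each group as representative, and leaves out the dead-state and unreachable-state clean-up; the names `minimize` and `build_minimal` are illustrative.

```python
# DFA for (a|b)*abb from the subset construction.
STATES = "ABCDE"
ALPHABET = "ab"
DELTA = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
START, ACCEPTING = "A", {"E"}


def minimize(states, alphabet, delta, accepting):
    """Refine the partition {F, Q - F} until no group can be split further."""
    final = frozenset(accepting)
    partition = [g for g in (final, frozenset(states) - final) if g]
    while True:
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        new_partition = []
        for group in partition:
            # States stay together iff, for every symbol, their successors
            # lie in the same group of the current partition.
            buckets = {}
            for s in sorted(group):
                key = tuple(group_of[delta[(s, a)]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(frozenset(b) for b in buckets.values())
        # Refinement only ever splits groups, so equal size means no change.
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition


def build_minimal(partition, alphabet, delta, start, accepting):
    """Build the transition table of the minimal DFA over representatives."""
    rep = {s: min(g) for g in partition for s in g}
    table = {
        (rep[s], a): rep[delta[(s, a)]]
        for s in set(rep.values())
        for a in alphabet
    }
    return rep[start], table, {rep[f] for f in accepting}
```

Calling `minimize(STATES, ALPHABET, DELTA, ACCEPTING)` yields the groups (AC), (B), (D) and (E); `build_minimal` then maps C onto its representative A, so the b-transitions of A and E both lead to A.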
• Transition table of the minimal DFA (A is the representative of group (AC))

  state    a   b
  A (AC)   B   A
  B        B   D
  D        B   E
  E        B   A

[Figure: minimal DFA with states A, B, D and E and the transitions of the table above.]

• See section 3.3.2 of
  http://www.win.tue.nl/~mvdbrand/courses/GLT/1112/papers/notes.pdf
  for an implementation scheme for scanners.

LEX

LEX is a scanner generator which transforms regular expressions into a finite automaton: r.e. → NFA → DFA

Specification:
  re_0   { action_0 }
  re_1   { action_1 }
  …
  re_k   { action_k }

NFA:
[Figure: a start state with ε-transitions to the initial states i_0, …, i_k of the NFAs for re_0, …, re_k, whose final states are f_0, …, f_k.]
  F = { f_0, …, f_k }

DFA:
  accepting states have the form {…, f_a, …, f_b, …, f_c, …} with corresponding action: action_min(a,b,c)

• LEX allows:
  • character classes, e.g. [0-9]+ in regular expressions
  • "semantic" actions with a regular expression:
      [0-9]+ { value = atoi(yytext); sum += value; }
  • yytext contains the recognized string
  • in an accepting state the semantic action is executed
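The rule that a DFA accepting state {…, f_a, …, f_b, …, f_c, …} runs action_min(a,b,c) can be illustrated with a small sketch. The NFA state numbers and the `rule_of_final` mapping below are hypothetical; only the "smallest rule index wins" selection comes from the slide.

```python
def action_index(dfa_state, rule_of_final):
    """Pick the rule whose action runs in a DFA state.

    dfa_state     : set of NFA states making up one DFA state
    rule_of_final : maps each NFA final state f_i to its rule index i
    Returns the smallest rule index among the final states contained in
    dfa_state, or None if the DFA state is not accepting.
    """
    indices = [rule_of_final[s] for s in dfa_state if s in rule_of_final]
    return min(indices) if indices else None


# Hypothetical numbering: NFA states 3, 7 and 12 are f_0, f_1 and f_2.
RULE_OF_FINAL = {3: 0, 7: 1, 12: 2}
```

With this numbering, a DFA state such as {2, 7, 12} contains f_1 and f_2, so `action_index` selects rule 1, while a DFA state without any f_i is not accepting and yields `None`.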
• Resolution of ambiguities
  • Longest match is preferred
  • If two alternatives recognize the same sequence of characters, the alternative occurring first in the specification is chosen

      BEGIN                        [ sym := beginsym ]
      IF                           [ sym := ifsym ]
      …
      letter.(letter | digit)*     [ sym := idsym ]
      digit.(digit)*               [ sym := intrepsym ]
      :=                           [ sym := becomessym ]

• Example of a (f)lex specification

    /*** Definition section ***/
    %{
    /* C code to be copied verbatim */
    #include <stdio.h>
    %}

    /* This tells flex to read only one input file */
    %option noyywrap

    %%
        /*** Rules section ***/

        /* [0-9]+ matches a string of one or more digits */
    [0-9]+  {
                /* yytext is a string containing the matched text. */
                printf("Saw an integer: %s\n", yytext);
            }

    .|\n    {   /* Ignore all other characters. */   }

    %%
    /*** C Code section ***/

    int main(void)
    {
        /* Call the lexer, then quit. */
        yylex();
        return 0;
    }
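The two disambiguation rules (longest match first, then the earliest rule in the specification) can be tried out with a small sketch. It is not how LEX works internally: instead of a generated DFA it uses Python's `re` module, the regular expressions for letter and digit are assumptions, and skipping whitespace is an added convenience. For these simple patterns a greedy match is also the longest match.

```python
import re

# Rules in specification order; the symbol names come from the slide,
# the concrete regex syntax is an assumption.
RULES = [
    ("beginsym", re.compile(r"BEGIN")),
    ("ifsym", re.compile(r"IF")),
    ("idsym", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("intrepsym", re.compile(r"[0-9][0-9]*")),
    ("becomessym", re.compile(r":=")),
]


def tokenize(text):
    """Split `text` into (symbol, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace handling is not part of the slide
            continue
        best = None  # (length, symbol) of the best match so far
        for symbol, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best, so on
            # equal length the rule listed first in RULES is kept.
            if m and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, symbol)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        length, symbol = best
        tokens.append((symbol, text[pos:pos + length]))
        pos += length
    return tokens
```

For the input `BEGIN` both the keyword rule and the identifier rule match five characters, and the keyword wins because it is listed first; for `BEGINNER` the identifier rule matches eight characters and wins as the longest match.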