Lexical analysis: NFA N for (a|b)*abb


Lexical analysis

NFA N for (a|b)*abb

[Figure: transition diagram of NFA N with states 0 to 10 and edges labelled a, b and ε]

/ Faculteit Wiskunde en Informatica

PAGE 0 14-9-2011


  • A = {0, 1, 2, 4, 7} (= ε-closure(0))
  • B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({0, 1, 2, 4, 7}, a)) = ε-closure(move(A, a)))
  • C = {1, 2, 4, 5, 6, 7} (= ε-closure(move({0, 1, 2, 4, 7}, b)) = ε-closure(move(A, b)))
  • B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({1, 2, 3, 4, 6, 7, 8}, a)) = ε-closure(move(B, a)))
  • D = {1, 2, 4, 5, 6, 7, 9} (= ε-closure(move({1, 2, 3, 4, 6, 7, 8}, b)) = ε-closure(move(B, b)))
  • B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({1, 2, 4, 5, 6, 7}, a)) = ε-closure(move(C, a)))
  • C = {1, 2, 4, 5, 6, 7} (= ε-closure(move({1, 2, 4, 5, 6, 7}, b)) = ε-closure(move(C, b)))
  • B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({1, 2, 4, 5, 6, 7, 9}, a)) = ε-closure(move(D, a)))
  • E = {1, 2, 4, 5, 6, 7, 10} (= ε-closure(move({1, 2, 4, 5, 6, 7, 9}, b)) = ε-closure(move(D, b)))
  • B = {1, 2, 3, 4, 6, 7, 8} (= ε-closure(move({1, 2, 4, 5, 6, 7, 10}, a)) = ε-closure(move(E, a)))
  • C = {1, 2, 4, 5, 6, 7} (= ε-closure(move({1, 2, 4, 5, 6, 7, 10}, b)) = ε-closure(move(E, b)))
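The state sets listed here can be recomputed mechanically. The following Python sketch of the subset construction is only an illustration: the edge list `NFA` is an assumption, reconstructed from the usual textbook layout of an NFA for (a|b)*abb with states 0 to 10 (the slide's diagram did not survive extraction), and the names `eps_closure`, `move` and `subset_construction` are chosen for this example.

```python
from collections import deque

# Assumed edges of NFA N for (a|b)*abb, states 0..10; None stands for ε.
NFA = {
    (0, None): {1, 7},
    (1, None): {2, 4},
    (2, "a"): {3},
    (3, None): {6},
    (4, "b"): {5},
    (5, None): {6},
    (6, None): {1, 7},
    (7, "a"): {8},
    (8, "b"): {9},
    (9, "b"): {10},
}

def eps_closure(states):
    """All NFA states reachable from `states` using ε-edges only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol):
    """NFA states reachable from `states` by one edge labelled `symbol`."""
    return {t for s in states for t in NFA.get((s, symbol), ())}

def subset_construction(start, alphabet):
    """Return the DFA start state and its transition dict (states are frozensets)."""
    d0 = eps_closure({start})
    dtran, todo, seen = {}, deque([d0]), {d0}
    while todo:
        T = todo.popleft()
        for a in alphabet:
            U = eps_closure(move(T, a))
            dtran[(T, a)] = U
            if U not in seen:
                seen.add(U)
                todo.append(U)
    return d0, dtran

start, dtran = subset_construction(0, "ab")
dfa_states = {T for (T, _) in dtran}
```

Under the assumed edge list, `start` is the set called A on the slide and `dfa_states` contains exactly the five sets A to E.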


  • Resulting DFA

state   a   b
A       B   C
B       B   D
C       B   C
D       B   E
E       B   C

[Figure: transition diagram of this DFA with states A, B, C, D and E]
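To see how such a table is used, here is a small Python sketch that runs the DFA on an input word. It assumes the table reads A: a→B, b→C; B: a→B, b→D; C: a→B, b→C; D: a→B, b→E; E: a→B, b→C, with start state A and accepting state E (the only set that contains NFA state 10). The function name `accepts` is invented for this example.

```python
# Transition table of the DFA: state -> (next state on a, next state on b)
DTRAN = {
    "A": ("B", "C"),
    "B": ("B", "D"),
    "C": ("B", "C"),
    "D": ("B", "E"),
    "E": ("B", "C"),
}
START, ACCEPTING = "A", {"E"}

def accepts(word):
    """Run the DFA over `word`; True if it ends in an accepting state."""
    state = START
    for ch in word:
        if ch not in "ab":
            return False  # symbol outside the alphabet {a, b}
        state = DTRAN[state]["ab".index(ch)]
    return state in ACCEPTING
```

For example, `accepts("babb")` follows A, C, B, D, E and succeeds, while `accepts("ab")` stops in D and fails.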


DFA → minimal DFA (MDFA)

DFA = (Q, V, δ, q0, F)

Equivalence relation ≡ on states; for all s, t ∈ Q:

s ≡ t ⇔ (∀w : w ∈ V* : δ(s,w) ∈ F ⇔ δ(t,w) ∈ F)

Definition (equivalence class of state q): [·] ∈ Q → P(Q)

(∀q : q ∈ Q : [q] = {q’ | q ≡ q’})

Partitioning state set Q according to relation ≡ yields state set Q’, which is used in the minimal DFA (MDFA) (Q’, V, δ’, [q0], F’) where

(∀s, a : s ∈ Q ∧ a ∈ V : δ’([s], a) = [δ(s,a)])
F’ = {[f] | f ∈ F}


Minimizing states of a DFA

Input: DFA M = (Q, V, δ, q0, F)
Output: DFA M’ with fewer states and the same behaviour
Algorithm:

1. Construct an initial partition Π of the set of states with two groups: the accepting states F and the non-accepting states Q – F

2. Πnew := Π;
FOR each group G of Πnew DO

partition G into subgroups such that states s and t of G are in the same subgroup if and only if, for all input symbols a, states s and t have transitions to states in the same group of Πnew; replace G in Πnew by the set of all subgroups formed

END

3. If Πnew = Π, then Πfinal := Π; otherwise repeat step (2) with Π := Πnew

4. Choose one state in each group of Πfinal as representative; the representatives are the states of the minimal DFA M’.

Construct the new transition table using these representatives, where:

  • the start state of M’ is the representative of the group containing q0 of M
  • the accepting states of M’ are the representatives that are in F

5. If M’ has a dead state, i.e. a state d that is not accepting and only has transitions to itself, then remove d from M’. Also remove all states not reachable from the start state.
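Steps 1 to 3 can be sketched in Python as follows. This is an illustrative variant, not the lecture's implementation: within one pass it compares transitions against the partition as it stood at the start of the pass, which reaches the same final partition. The function name `minimize_partition` and the dictionary encoding of δ are assumptions made for the example; the DFA is the five-state automaton for (a|b)*abb with accepting state E.

```python
def minimize_partition(states, alphabet, delta, accepting):
    """Refine {F, Q - F} until no group can be split; return the final groups."""
    accepting = frozenset(accepting)
    partition = [g for g in (accepting, frozenset(states) - accepting) if g]
    while True:
        # Map each state to the index of its group in the current partition.
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        new_partition = []
        for group in partition:
            # States with equal signatures reach the same groups on every symbol.
            buckets = {}
            for s in group:
                signature = tuple(group_of[delta[(s, a)]] for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            new_partition.extend(frozenset(b) for b in buckets.values())
        # Groups are only ever split, so an equal count means nothing changed.
        if len(new_partition) == len(partition):
            return set(new_partition)
        partition = new_partition

# DFA for (a|b)*abb: state -> (next state on a, next state on b)
TABLE = {"A": ("B", "C"), "B": ("B", "D"), "C": ("B", "C"),
         "D": ("B", "E"), "E": ("B", "C")}
DELTA = {(s, a): row["ab".index(a)] for s, row in TABLE.items() for a in "ab"}

groups = minimize_partition(TABLE.keys(), "ab", DELTA, {"E"})
```

On this input `groups` consists of the four groups (AC), (B), (D) and (E). Choosing representatives and removing dead or unreachable states (steps 4 and 5) are left out of the sketch.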


  • Given DFA

[Figure: transition diagram of the given DFA with states A, B, C, D and E]


  • First partition:

(E) accepting state and (ABCD) non-accepting states; Πnew = (ABCD)(E)

  • Consider (E): cannot be split (single state), so put it into Πnew
  • Consider (ABCD): on a, all states of (ABCD) go to state B;

on b, states A, B and C go to states in (ABCD) and D goes to (E); thus Πnew = (ABC)(D)(E)

  • Consider (ABC): on a, all states of (ABC) go to state B;

on b, states A and C go to C and B goes to D; thus Πnew = (AC)(B)(D)(E)

  • Consider (AC): on a, all states of (AC) go to state B;

on b, all states of (AC) go to C; (AC) cannot be split, thus Πnew = (AC)(B)(D)(E) is the final partition


  • Transition table

state    a   b
A (AC)   B   A
B        B   D
D        B   E
E        B   A

[Figure: transition diagram of the minimal DFA with states A, B, D and E]
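A quick sanity check on the minimized table is to compare it with the original five-state DFA on every word up to some length. The Python sketch below does this by brute force; it is a test, not a proof of equivalence, and the names `run` and `agree_up_to` as well as the bound of 8 are choices made for this example. Both tables are read as state -> (next state on a, next state on b), with start state A and accepting state E.

```python
from itertools import product

ORIGINAL = {"A": ("B", "C"), "B": ("B", "D"), "C": ("B", "C"),
            "D": ("B", "E"), "E": ("B", "C")}
MINIMAL = {"A": ("B", "A"), "B": ("B", "D"), "D": ("B", "E"), "E": ("B", "A")}

def run(table, word, start="A", accepting=("E",)):
    """Return True if the DFA given by `table` accepts `word`."""
    state = start
    for ch in word:
        state = table[state]["ab".index(ch)]
    return state in accepting

def agree_up_to(max_len):
    """Check that both DFAs give the same verdict on all words over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if run(ORIGINAL, word) != run(MINIMAL, word):
                return False
    return True

same_language_on_short_words = agree_up_to(8)
```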


  • See section 3.3.2 of

http://www.win.tue.nl/~mvdbrand/courses/GLT/1112/papers/notes.pdf

for an implementation scheme for scanners.


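LEX is a scanner generator which transforms regular expressions into a finite automaton: r.e. → NFA → DFA

re0 {action0}
re1 {action1}
…
rek {actionk}

[Figure: combined NFA in which a new start state has ε-transitions to the initial states i0, …, ik of the NFAs for re0, …, rek; their accepting states are f0, …, fk]

F = {f0, …, fk}

NFA → DFA: accepting states have the form {…, fa, …, fb, …, fc, …} with corresponding action: the action with index min(a,b,c), i.e. the action of the matching rule that occurs first in the specification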
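The choice of action for an accepting DFA state can be written down in a few lines. The Python sketch below is a minimal illustration under assumed names: `final_to_rule` maps each NFA accepting state fj to its rule index j, and `action_for` returns the smallest index present in a DFA state, or None if the state is not accepting. The state labels in the example are made up.

```python
def action_for(dfa_state, final_to_rule):
    """Return the lowest rule index among the NFA accepting states in `dfa_state`."""
    rules = [final_to_rule[s] for s in dfa_state if s in final_to_rule]
    return min(rules) if rules else None

# Hypothetical labels: f0, f1, f2 are the accepting states of rules 0, 1, 2.
final_to_rule = {"f0": 0, "f1": 1, "f2": 2}
chosen = action_for({"i0", "f2", "f1"}, final_to_rule)
```

Here `chosen` is 1, because rule 1 is the earliest rule whose accepting state occurs in the DFA state.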


  • LEX allows:
  • character classes, e.g. [0-9]+ in regular expressions
  • “semantic” actions with regular expressions

[0-9]+ { value = atoi(yytext); sum += value; }

  • yytext contains the recognized string
  • in an accepting state the semantic action is executed


  • Example of a (f)lex specification

/*** Definition section ***/

%{
/* C code to be copied verbatim */
#include <stdio.h>
%}

/* This tells flex to read only one input file */
%option noyywrap

%%
    /*** Rules section ***/

    /* [0-9]+ matches a string of one or more digits */
[0-9]+  {
            /* yytext is a string containing the matched text. */
            printf("Saw an integer: %s\n", yytext);
        }

.|\n    {   /* Ignore all other characters. */   }

%%
/*** C Code section ***/

int main(void)
{
    /* Call the lexer, then quit. */
    yylex();
    return 0;
}


  • Resolution of ambiguities
  • Longest match is preferred
  • If two alternatives recognize the same sequence of characters, the alternative occurring first in the specification is chosen

BEGIN [sym := beginsym]
IF [sym := ifsym]
…
letter.(letter | digit)* [sym := idsym]
digit.(digit)* [sym := intrepsym]
:= [sym := becomessym]
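Both rules can be demonstrated with a small hand-written scanner. The Python sketch below uses the standard `re` module instead of a generated DFA, so it only mimics what LEX does. The patterns are assumed renderings of the rules listed here (letter taken as [A-Za-z], digit as [0-9]), skipping whitespace is an added assumption, and the name `scan` is invented for the example.

```python
import re

# Rules in specification order; earlier rules win ties.
RULES = [
    ("beginsym", re.compile(r"BEGIN")),
    ("ifsym", re.compile(r"IF")),
    ("idsym", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("intrepsym", re.compile(r"[0-9][0-9]*")),
    ("becomessym", re.compile(r":=")),
]

def scan(text):
    """Split `text` into (symbol, lexeme) pairs using longest match, then first rule."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace only separates tokens
            pos += 1
            continue
        best_sym, best_len = None, 0
        for sym, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and len(m.group()) > best_len:
                best_sym, best_len = sym, len(m.group())
        if best_sym is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_sym, text[pos:pos + best_len]))
        pos += best_len
    return tokens

tokens = scan("BEGIN IF BEGINX x1 := 42")
```

In this input, BEGIN is matched equally long by the keyword rule and the identifier rule, so the keyword rule wins because it comes first; BEGINX is matched longer by the identifier rule, so it becomes an idsym.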
