NFA Example Input: a a - - PowerPoint PPT Presentation



slide-1
SLIDE 1
  • B. Ward — Spring 2014
  • NFA Example

Input: a a

slide-2
SLIDE 2
  • NFA Example

Input: a a

Epsilon transition: the NFA can move from State 1 to State 2 without consuming any input.

slide-3
SLIDE 3
  • NFA Example

Input: a a

Regular transition: the NFA moves from State 2 to State 3, consuming the first ‘a’.

slide-4
SLIDE 4
  • NFA Example

Input: a a

Epsilon transition: the NFA can move from State 3 back to State 2 without consuming any input.

slide-5
SLIDE 5
  • NFA Example

Input: a a

Regular transition: the NFA moves from State 2 to State 3, consuming the second ‘a’.

slide-6
SLIDE 6
  • NFA Example

Input: a a

Epsilon transition from State 3 to State 4: the end of the input has been reached, but the NFA can still take epsilon transitions.

slide-7
SLIDE 7
  • NFA Example

Input: a a

Input accepted: there exists a sequence of transitions that leaves the NFA in a final state at the end of the input.
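The walkthrough above can be replayed in code. The following is a minimal sketch, assuming the four-state NFA implied by the captions (1 to 2 and 3 to 2 and 3 to 4 on epsilon, 2 to 3 on ‘a’) and assuming State 4 is the only final state; the names `EPSILON`, `DELTA`, `epsilon_closure`, and `accepts` are illustrative, not from the slides. Rather than guessing one path, it tracks every state the NFA could be in.

```python
# Hypothetical encoding of the four-state NFA from the walkthrough.
EPSILON = {1: {2}, 3: {2, 4}}   # epsilon transitions (consume no input)
DELTA = {(2, "a"): {3}}         # regular transitions (consume one symbol)
START, FINAL = 1, {4}           # assumption: State 4 is the only final state


def epsilon_closure(states):
    """Return every state reachable from `states` via epsilon transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for nxt in EPSILON.get(state, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(text):
    """Track all states the NFA could be in after each input character."""
    current = epsilon_closure({START})
    for ch in text:
        moved = set()
        for state in current:
            moved |= DELTA.get((state, ch), set())
        current = epsilon_closure(moved)
    # Accept if at least one possible state is final at the end of the input.
    return bool(current & FINAL)


print(accepts("aa"))  # True
```

The input is accepted as soon as one of the tracked states is final, which mirrors "there exists a sequence of transitions" in the last slide of the walkthrough.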

slide-8
SLIDE 8
Equivalent DFA Construction

  • Constructing a DFA corresponding to a RE.

➡In theory, this requires two steps:
    1. From a RE to an equivalent NFA.
    2. From the NFA to an equivalent DFA.

  • To be practical, we require a third optimization step.

➡Large DFA to minimal DFA.

slide-9
SLIDE 9
  • Example

Regular expression: 0*1(1|0)*

[Figure: RE to NFA, then NFA to DFA, then optimization, yielding the final DFA.]

slide-10
SLIDE 10
Step 1: RE ➔ NFA

  • Every RE can be converted to an NFA by repeatedly applying four simple rules (recall the definition of a RE).

➡Base case: a single character.
➡Concatenation: joining two REs in sequence.
➡Alternation: joining two REs in parallel.
➡Kleene Closure: repeating a RE.


slide-12
SLIDE 12
The Four NFA Construction Rules

Rule 1, base case: ‘a’

[Figure: start state S with a single transition on ‘a’ to a final state.]

A simple two-state NFA (which is also a DFA).


slide-14
SLIDE 14
The Four NFA Construction Rules

Rule 2, concatenation: AB

[Figure: the NFA for A followed by the NFA for B yields the NFA for AB.]

A and B are not limited to two-state NFAs: each can be any NFA with a single final state.


slide-16
SLIDE 16
The Four NFA Construction Rules

Rule 3, alternation: “A|B”

[Figure: the NFAs for A and B joined in parallel by four ε transitions to form the NFA for A|B (A or B).]

Notice the epsilon transitions.


slide-18
SLIDE 18
The Four NFA Construction Rules

Rule 4, Kleene closure: “A*”

[Figure: the NFA for A wrapped in four ε transitions to form the NFA for A*.]

Notice the epsilon transitions.

slide-19
SLIDE 19
The Four NFA Construction Rules

Rule 4, Kleene closure: “A*”

[Figure: the NFA for A*, with its ε transitions labeled by purpose: zero occurrences, one occurrence, and repetition.]

slide-20
SLIDE 20
Overview

  • Four rules:

➡Create two-state NFAs for individual symbols, e.g., ‘a’.
➡Append consecutive NFAs, e.g., AB.
➡Alternate choices in parallel, e.g., A|B.
➡Repeat with the Kleene star, e.g., A*.
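The four rules can be sketched as four small functions that build NFA fragments. This is an illustrative sketch, not the construction exactly as drawn on the slides: the `Fragment` class, the state numbering, and the choice to link concatenated fragments with an extra ε edge (rather than merging states) are all assumptions. `None` is used as the label of an ε edge.

```python
from itertools import count

_ids = count()  # fresh state numbers; the numbering scheme is an assumption


class Fragment:
    """An NFA fragment with one start state and one final state."""
    def __init__(self, start, final, edges):
        # edges: list of (source, label, target); label None means epsilon
        self.start, self.final, self.edges = start, final, edges


def symbol(ch):        # Rule 1: two-state NFA for a single symbol
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, ch, f)])


def concat(a, b):      # Rule 2: A followed by B (linked by an epsilon edge here)
    return Fragment(a.start, b.final, a.edges + b.edges + [(a.final, None, b.start)])


def alternate(a, b):   # Rule 3: A or B in parallel, four epsilon edges
    s, f = next(_ids), next(_ids)
    extra = [(s, None, a.start), (s, None, b.start), (a.final, None, f), (b.final, None, f)]
    return Fragment(s, f, a.edges + b.edges + extra)


def star(a):           # Rule 4: zero occurrences, one occurrence, repetition
    s, f = next(_ids), next(_ids)
    extra = [(s, None, f), (s, None, a.start), (a.final, None, f), (a.final, None, a.start)]
    return Fragment(s, f, a.edges + extra)


def closure(frag, states):
    """States reachable from `states` via epsilon edges only."""
    seen, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for src, label, dst in frag.edges:
            if src == state and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def matches(frag, text):
    """Simulate the fragment on `text`, tracking all possible states."""
    current = closure(frag, {frag.start})
    for ch in text:
        moved = {dst for src, label, dst in frag.edges if src in current and label == ch}
        current = closure(frag, moved)
    return frag.final in current


# (a|b)(c|d)e*
nfa = concat(concat(alternate(symbol("a"), symbol("b")),
                    alternate(symbol("c"), symbol("d"))),
             star(symbol("e")))
print(matches(nfa, "adee"))  # True
```

Because every fragment keeps exactly one start and one final state, the rules compose freely, which is the property the concatenation slide relies on.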

slide-21
SLIDE 21
  • NFA Construction Example

Regular expression: (a|b)(c|d)e*

Apply Rule 1 to the individual symbols, then apply Rule 3 to obtain the NFAs for a|b and c|d.

slide-22
SLIDE 22
  • NFA Construction Example

Regular expression: (a|b)(c|d)e*

Apply Rule 2: concatenating the NFAs for a|b and c|d yields the NFA for (a|b)(c|d).

slide-23
SLIDE 23
  • NFA Construction Example

Regular expression: (a|b)(c|d)e*

Apply Rule 1 to the symbol e, then apply Rule 4 to obtain the NFA for e*.

slide-24
SLIDE 24
  • NFA Construction Example

Regular expression: (a|b)(c|d)e*

Apply Rule 2: concatenating the NFAs for (a|b)(c|d) and e* yields the NFA for (a|b)(c|d)e*.

slide-25
SLIDE 25
Step 2: NFA ➔ DFA

  • Simulating an NFA requires exploration of all paths.

➡Either in parallel (memory consumption!).
➡Or with backtracking (large trees!).
➡Both are impractical.

  • Instead, we derive a DFA that encodes all possible paths.

➡Instead of doing a specific parallel search each time that we simulate the NFA, we do it only once in general.

  • Key idea: for each input character, find sets of NFA states that can be reached.

➡These are the states that a parallel search would explore.
➡Create a DFA state + transitions for each such set.
➡Final states: a DFA state is a final state if its corresponding set of NFA states contains at least one final NFA state.
slide-26
SLIDE 26
NFA-to-DFA-CONVERSION:
    todo: stack of sets of NFA states
    push {NFA start state and all epsilon-reachable states} onto todo
    while (todo is not empty):
        curNFA: set of NFA states
        curDFA: a DFA state
        curNFA = todo.pop
        mark curNFA as done
        curDFA = find or create DFA state corresponding to curNFA
        reachableNFA: set of NFA states
        reachableDFA: a DFA state
        for each symbol x for which at least one state in curNFA has a transition:
            reachableNFA = find each state that is reachable from a state in curNFA
                           via one x transition and any number of epsilon transitions
            if (reachableNFA is not empty and not done):
                push reachableNFA onto todo
            reachableDFA = find or create DFA state corresponding to reachableNFA
            add transition on x from curDFA to reachableDFA
        end for
    end while
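The conversion procedure above can be sketched in Python as follows. Two points are assumptions rather than part of the slides: a set is recorded as seen when it is pushed (so it is never pushed twice), and the example NFA numbering for (a|b)(c|d)e* is inferred from the reachable sets quoted in the conversion example (start state 1, final state 14). DFA states are represented directly as frozensets of NFA states.

```python
def epsilon_closure(states, eps):
    """All NFA states reachable from `states` using only epsilon transitions."""
    seen, stack = set(states), list(states)
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def nfa_to_dfa(start, finals, delta, eps):
    """delta maps (state, symbol) -> set of states; eps maps state -> set of states."""
    start_set = epsilon_closure({start}, eps)
    todo, seen = [start_set], {start_set}
    transitions, dfa_finals = {}, set()
    while todo:
        cur = todo.pop()
        if cur & finals:                      # contains a final NFA state
            dfa_finals.add(cur)
        symbols = {sym for (state, sym) in delta if state in cur}
        for x in sorted(symbols):
            moved = set()
            for state in cur:
                moved |= delta.get((state, x), set())
            reachable = epsilon_closure(moved, eps)
            if reachable not in seen:         # newly discovered set: schedule it
                seen.add(reachable)
                todo.append(reachable)
            transitions[(cur, x)] = reachable  # added even if the state already exists
    return start_set, dfa_finals, transitions


# Assumed NFA for (a|b)(c|d)e*, numbering inferred from the example's state sets.
EPS = {1: {2, 4}, 3: {6}, 5: {6}, 6: {7, 9}, 8: {11}, 10: {11}, 11: {12, 14}, 13: {12, 14}}
DELTA = {(2, "a"): {3}, (4, "b"): {5}, (7, "c"): {8}, (9, "d"): {10}, (12, "e"): {13}}

start_set, dfa_finals, transitions = nfa_to_dfa(1, {14}, DELTA, EPS)
dfa_states = {start_set} | set(transitions.values())
print(sorted(sorted(s) for s in dfa_states))
```

Using frozensets as dictionary keys plays the role of "find or create DFA state corresponding to" a set in the pseudocode.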


slide-28
SLIDE 28
DFA Conversion Example

  • Regular expression: (a|b)(c|d)e*

First step (before any input is consumed): find all states that are reachable from the start state via epsilon transitions.

slide-29
SLIDE 29
DFA Conversion Example

First step (before any input is consumed): create the corresponding DFA start state.

slide-30
SLIDE 30
DFA Conversion Example

Next: find all input characters for which transitions exist in the start set. In this case, ‘a’ and ‘b’.

slide-31
SLIDE 31
DFA Conversion Example

For each such input character, determine the set of reachable states (including epsilon transitions). On a ‘b’, the NFA can reach states 5, 6, 7, and 9.

slide-32
SLIDE 32
DFA Conversion Example

Create DFA states for each distinct reachable set of states.

slide-33
SLIDE 33
DFA Conversion Example

On an ‘a’, the NFA can reach states 3, 6, 7, and 9.

slide-34
SLIDE 34
DFA Conversion Example

Create DFA states for each distinct reachable set of states.

slide-35
SLIDE 35
  • DFA Conversion Example

Repeat the process for each newly discovered set of states.

slide-36
SLIDE 36
  • DFA Conversion Example

Reachable states on a ‘c’: 8, 11, 12, 14.

slide-37
SLIDE 37
DFA Conversion Example

Create a state and transitions for the set of reachable states.


slide-39
SLIDE 39
DFA Conversion Example

Reachable states on a ‘d’: 10, 11, 12, 14.

Create a state and transitions for the set of reachable states.

slide-40
SLIDE 40
DFA Conversion Example

Note: both new DFA states are final states because their corresponding sets include NFA state 14, which is a final state.

slide-41
SLIDE 41
DFA Conversion Example

Repeat the process for State [3, 6, 7, 9].

slide-42
SLIDE 42
DFA Conversion Example

Reachable states on a ‘c’: 8, 11, 12, 14.

Reachable states on a ‘d’: 10, 11, 12, 14.
slide-43
SLIDE 43
DFA Conversion Example

There already exist DFA states corresponding to those sets! Just add transitions to these states.

slide-44
SLIDE 44
DFA Conversion Example

Repeat the process for State [10, 11, 12, 14]. Reachable states on an ‘e’: 12, 13, 14.
slide-45
SLIDE 45
DFA Conversion Example

Create a state and transitions for the set of reachable states.

slide-46
SLIDE 46
DFA Conversion Example

Repeat the process for State [8, 11, 12, 14]. Reachable states on an ‘e’: 12, 13, 14.

slide-47
SLIDE 47
DFA Conversion Example

The state already exists. Just create the transition.

slide-48
SLIDE 48
DFA Conversion Example

Repeat the process for State [12, 13, 14]. Reachable states on an ‘e’: 12, 13, 14 (itself!).
slide-49
SLIDE 49
DFA Conversion Example

There is no “escape” from the set of states [12, 13, 14] on an ‘e’. Thus, create a self-loop.
slide-50
SLIDE 50
DFA Conversion Example

The result: an equivalent DFA!

slide-52
SLIDE 52
NFA ➔ DFA Conversion

  • Any NFA can be converted into an equivalent DFA using this method.
  • However, the number of states can increase exponentially.
  • With careful syntax design, this problem can be avoided in practice.
  • Limitation: the resulting DFA is not necessarily optimal.

[Figure: two DFA states highlighted.] These two states are equivalent: for each input element, they both lead to the same state. Thus, having two states is unnecessary.


slide-55
SLIDE 55
Step 3: DFA Minimization

  • Goal: obtain minimal DFA.

➡For each RE, the minimal DFA is unique (ignoring simple renaming).
➡DFA minimization: merge states that are equivalent.

  • Key idea: it’s easier to split.

➡Start with two partitions: final and non-final states.
➡Repeatedly split partitions until all partitions contain only equivalent states.
➡Two states S1, S2 are equivalent if all their transitions “agree.” Conversely, if there exists an input symbol x on which the DFA transitions from S1 to a state in partition P1 and from S2 to a state in partition P2, with P1≠P2, then S1 and S2 are not equivalent.

[Figure: states A, B, and C with transitions into partitions Part. 1, Part. 2, and Part. 3.] A and B are equivalent. C is not equivalent to either A or B because it has a transition into Part. 3.
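The splitting idea can be sketched as a partition-refinement loop. This is a simple illustrative version (not an optimized algorithm such as Hopcroft's), and it assumes that a missing transition counts as "no transition" and must agree between equivalent states. The function name `minimize` and the short state names S, A, B, C, D, E are hypothetical stand-ins for the DFA states [1,2,4], [3,6,7,9], [5,6,7,9], [8,11,12,14], [10,11,12,14], and [12,13,14] of the (a|b)(c|d)e* example.

```python
def minimize(states, alphabet, delta, finals):
    """Return the partitions of equivalent states of a DFA.

    delta maps (state, symbol) -> state; a missing entry means "no transition".
    """
    # Start with two partitions: final and non-final states.
    partitions = [p for p in (set(finals), set(states) - set(finals)) if p]
    changed = True
    while changed:
        changed = False
        index = {s: i for i, part in enumerate(partitions) for s in part}
        refined = []
        for part in partitions:
            groups = {}
            for s in part:
                # Signature: which partition is reached on each symbol (None if none).
                sig = tuple(index.get(delta.get((s, x))) for x in alphabet)
                groups.setdefault(sig, set()).add(s)
            refined.extend(groups.values())   # states with different signatures are split
            changed |= len(groups) > 1
        partitions = refined
    return partitions


DELTA = {
    ("S", "a"): "A", ("S", "b"): "B",
    ("A", "c"): "C", ("A", "d"): "D",
    ("B", "c"): "C", ("B", "d"): "D",
    ("C", "e"): "E", ("D", "e"): "E", ("E", "e"): "E",
}
parts = minimize("SABCDE", "abcde", DELTA, {"C", "D", "E"})
print(sorted(sorted(p) for p in parts))  # [['A', 'B'], ['C', 'D', 'E'], ['S']]
```

Each resulting partition becomes one state of the minimal DFA, so this six-state DFA shrinks to three states.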


slide-57
SLIDE 57
DFA Minimization Example

  • Partition final and non-final states.

slide-58
SLIDE 58
DFA Minimization Example

  • Examine final states.

All final states are equivalent!

slide-59
SLIDE 59
DFA Minimization Example

  • [1,2,4] is not equivalent to any other state: it is the only state with a transition to the non-final partition.

slide-60
SLIDE 60
DFA Minimization Example

  • [5,6,7,9] and [3,6,7,9] are equivalent.

Thus, we are done.

slide-61
SLIDE 61
  • DFA Minimization Example
  • Create one state for each partition.

We have obtained a minimal DFA for (a|b)(c|d)e*.

slide-62
SLIDE 62
Recognizing Multiple Tokens

  • Construction up to this point can only recognize a single token type.

➡Results in Accept or Reject, but does not yield which token was seen.

  • Real lexical analysis must discern between multiple token types.
  • Solution: annotate final states with token type.

slide-64
SLIDE 64
Multi Token Construction

  • To build a DFA for N tokens:

➡Create an NFA for each token type RE as before.
➡Join all token NFAs as shown below:

[Figure: a new start state S with ε transitions to NFA 1, NFA 2, ..., NFA N, whose final states are labeled Type 1, Type 2, ..., Type N.]

This is similar to NFA construction rule 3. Key difference: we keep all final states.

slide-65
SLIDE 65
Token Precedence

Consider the following regular grammar.

➡Create a DFA to recognize identifiers and keywords.

identifier → letter (letter | digit | _)*
keyword → if | else | while
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
letter → a | b | c | … | z

Can you spot a problem?

slide-66
SLIDE 66
Token Precedence

All keywords are also identifiers! The grammar is ambiguous. Example: for the string ‘while’, there are two accepting states in the final NFA with different labels.


slide-68
SLIDE 68
Token Precedence

Solution

➡Assign precedence values to tokens (and labels).
➡In case of ambiguity, prefer the final state with the highest precedence value.

Note: during DFA optimization, two final states are not equivalent if they are labeled with different token types.
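The precedence rule can be illustrated with a small sketch. Here Python's `re` module stands in for the labeled DFA, and the precedence values (2 for keywords, 1 for identifiers), the table `TOKEN_TYPES`, and the function `classify` are all assumptions chosen for illustration.

```python
import re

# Hypothetical token table: (precedence, token type, pattern); higher precedence wins.
TOKEN_TYPES = [
    (2, "keyword", re.compile(r"if|else|while")),
    (1, "identifier", re.compile(r"[a-z][a-z0-9_]*")),
]


def classify(lexeme):
    """Return the highest-precedence token type that accepts the whole lexeme."""
    candidates = [(prec, name) for prec, name, pattern in TOKEN_TYPES
                  if pattern.fullmatch(lexeme)]
    if not candidates:
        return None          # no token type accepts this string
    return max(candidates)[1]


print(classify("while"))   # keyword
print(classify("whilst"))  # identifier
```

For ‘while’ both token types accept, which corresponds to a DFA state carrying two labels; the higher precedence value decides.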


slide-70
SLIDE 70
  • Multi Token Example

Java Integer Literals

Final states labeled with token type.

slide-71
SLIDE 71
  • Multi Token Example

Java Integer Literals

Final states can have transitions into non-final states.

slide-72
SLIDE 72
Extended Regular Expressions

Some commonly used abbreviations: +  ?  []  [^]  n

+ (Kleene plus): name → letter+ is the same as name → letter letter*

slide-73
SLIDE 73
Extended Regular Expressions

n (n times, written as a superscript): name → letter³ is the same as name → letter letter letter

slide-74
SLIDE 74
Extended Regular Expressions

? (optionally): ZIP → digit⁵ (-digit⁴)? is the same as ZIP → digit⁵ ( ε | -digit⁴ )

slide-75
SLIDE 75
Extended Regular Expressions

[] (one of): digit → [0123456789] is the same as digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


slide-77
SLIDE 77
Extended Regular Expressions

[^] (not one of): notADigit → [^0123456789] is the same as notADigit → A | B | C ...
Every character except those listed between [^ and ].
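These abbreviations can be tried out with Python's `re` module, which writes "n times" as `{n}` rather than as a superscript. The sketch below only compares each abbreviated pattern with its expansion on a handful of sample strings, so it illustrates the equivalences rather than proving them; `PAIRS`, `SAMPLES`, and `agree` are illustrative names.

```python
import re

# Each pair: (abbreviated form, expanded form) in Python `re` syntax.
PAIRS = [
    (r"[a-z]+", r"[a-z][a-z]*"),                          # Kleene plus
    (r"[a-z]{3}", r"[a-z][a-z][a-z]"),                    # n times
    (r"[0-9]{5}(-[0-9]{4})?", r"[0-9]{5}(|-[0-9]{4})"),   # optional part
    (r"[0123456789]", r"0|1|2|3|4|5|6|7|8|9"),            # one of
]
SAMPLES = ["", "a", "abc", "abcd", "7", "12345", "12345-6789", "12345-"]


def agree(short, long):
    """True if both patterns accept exactly the same strings among SAMPLES."""
    return all(bool(re.fullmatch(short, s)) == bool(re.fullmatch(long, s))
               for s in SAMPLES)


print(all(agree(short, long) for short, long in PAIRS))   # True
print(bool(re.fullmatch(r"[^0123456789]", "A")))          # True: not a digit
print(bool(re.fullmatch(r"[^0123456789]", "7")))          # False: a digit
```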

slide-78
SLIDE 78
Limitations of REs

  • Suppose we wanted to remove extraneous, balanced ‘(’ ‘)’ pairs around identifiers.

➡Example: report (sum), ((sum)) and (((sum))) simply as Identifier.
➡But not: ((sum)

  • One might try:

identifier → (ⁿ letter+ )ᵐ such that n = m

This cannot be expressed with regular expressions! It requires a recursive grammar: let the parser do it.
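A recursive rule handles the n = m constraint naturally. The following is a minimal sketch of a hypothetical rule identifier → ( identifier ) | letter+, written as a recursive Python function; the name `unwrap` and the restriction to lowercase ASCII letters are assumptions.

```python
def unwrap(text):
    """Strip balanced parentheses around an identifier.

    Returns the bare identifier, or None if the parentheses are unbalanced
    or the core is not a run of lowercase letters.
    """
    if len(text) >= 2 and text.startswith("(") and text.endswith(")"):
        return unwrap(text[1:-1])      # recursive case: ( identifier )
    if text.isascii() and text.isalpha() and text.islower():
        return text                    # base case: letter+
    return None


print(unwrap("(((sum)))"))  # sum
print(unwrap("((sum)"))     # None
```

Each recursive call removes one matching pair, so the numbers of opening and closing parentheses are forced to be equal, which is exactly what a finite automaton cannot track for unbounded n.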