Java II Finite Automata I Bernd Kiefer Bernd.Kiefer@dfki.de

SLIDE 1

Java II Finite Automata I

Bernd Kiefer

Bernd.Kiefer@dfki.de

Deutsches Forschungszentrum für künstliche Intelligenz

Finite Automata I – p.1/13

SLIDE 2

Processing Regular Expressions

  • We already learned about Java’s regular expression functionality
  • Now we get to know the machinery behind the Pattern and Matcher classes
  • Compiling a regular expression into a Pattern object produces a finite automaton
  • This automaton is then used to perform the matching tasks
  • We will see how to construct a finite automaton that recognizes an input string, i.e., tries to find a full match
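As a reminder of the API side, here is a minimal sketch of a full match versus a partial match with Pattern and Matcher. The class name FullMatchDemo and the regular expression (a|b)*ab are only illustrative choices, not taken from the slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullMatchDemo {
    // Compile the expression, then ask whether the ENTIRE input matches.
    static boolean fullMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        return matcher.matches();
    }

    // For contrast: find() succeeds as soon as some substring matches.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("(a|b)*ab", "abab"));    // true
        System.out.println(fullMatch("(a|b)*ab", "abba"));    // false
        System.out.println(partialMatch("(a|b)*ab", "abba")); // true
    }
}
```

The automata in the following slides correspond to the matches() case: the whole input has to be consumed and the automaton has to end in a final state.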

SLIDE 3

Definition: Finite Automaton

A finite automaton (FA) is a tuple A = ⟨Q, Σ, δ, q0, F⟩

  • Q: a finite non-empty set of states
  • Σ: a finite alphabet of input letters
  • δ: a (total) transition function Q × Σ → Q
  • q0 ∈ Q: the initial state
  • F ⊆ Q: the set of final (accepting) states

Transition graphs (diagrams):

[Figure: transition graph with states q0, q1, q2, q3 and labelled transitions, annotated with "initial state", "states", "transition", and "final state"]

SLIDE 4

Finite Automata: Matching

A finite automaton accepts a given input string s if there is a sequence of states p_1, p_2, …, p_{|s|+1} ∈ Q such that

  1. p_1 = q0, the start state
  2. δ(p_i, s_i) = p_{i+1} for 1 ≤ i ≤ |s|, where s_i is the i-th character in s
  3. p_{|s|+1} ∈ F, i.e., a final state

  • A string is successfully matched if we have found the appropriate sequence of states
  • Imagine the string on an input tape with a pointer that is advanced when using a δ transition
  • The set of strings accepted by an automaton is the accepted language, analogous to regular expressions
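The matching condition can be sketched as a loop that advances one state per input character. The automaton below is a hypothetical DFA over {a, b} that accepts strings ending in "ab"; its table, the class name DfaMatcher, and the array encoding are assumptions made for this example.

```java
class DfaMatcher {
    // DELTA[state][symbol] is the total transition function δ.
    // Column 0 is the letter 'a', column 1 is the letter 'b'.
    static final int[][] DELTA = {
        {1, 0},  // state 0: no useful prefix seen yet
        {1, 2},  // state 1: the last letter was 'a'
        {1, 0},  // state 2: the last two letters were "ab"
    };
    static final int START = 0;
    static final boolean[] IS_FINAL = {false, false, true};

    static boolean accepts(String s) {
        int state = START;                       // start in q0
        for (int i = 0; i < s.length(); i++) {   // advance the tape pointer
            char c = s.charAt(i);
            if (c != 'a' && c != 'b') {
                return false;                    // letter outside the alphabet Σ
            }
            state = DELTA[state][c - 'a'];       // exactly one successor: deterministic
        }
        return IS_FINAL[state];                  // the last state must be in F
    }

    public static void main(String[] args) {
        System.out.println(accepts("abab")); // true
        System.out.println(accepts("aba"));  // false
    }
}
```

For an input of length |s| the loop visits |s| + 1 states, counting the start state.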

SLIDE 5

(Non)deterministic Automata

  • In the definition of automata, δ was a total function ⇒ given an input string, the path through the automaton is uniquely determined
  • Those automata are therefore called deterministic
  • For nondeterministic FA, δ is a transition relation

    δ : Q × (Σ ∪ {ε}) → P(Q), where P(Q) is the powerset of Q

      • allows transitions from one state into several states with the same input symbol
      • need not be total
      • can have transitions labeled ε (not in Σ), which represents the empty string

SLIDE 6

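RegExps → Automata

Construct nondeterministic automata from regular expressions

(αβ)

[Figure: the automaton for α (q0α … qfα) followed by the automaton for β (q0β … qfβ)]

(α | β)

[Figure: new states q0 and qf, joined by four ε-transitions to the automata for α (q0α … qfα) and β (q0β … qfβ)]

(α)∗

[Figure: new states q0 and qf, joined by four ε-transitions to the automaton for α (q0α … qfα)]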
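The three constructions can be sketched in Java as operations on automaton fragments with one start and one end state. This follows the usual Thompson-style wiring with four ε-edges for alternation and for star. The exact wiring of the figures is not recoverable here, so the details are assumptions: in particular, concatenation links the two fragments with an extra ε-edge instead of merging states, and all class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class Thompson {
    static final int EPS = -1; // label used for ε-edges (an assumption of this sketch)

    static final class Edge {
        final int from, label, to;
        Edge(int from, int label, int to) { this.from = from; this.label = label; this.to = to; }
    }

    // A fragment is an NFA with exactly one start and one end state.
    static final class Frag {
        final int start, end;
        Frag(int start, int end) { this.start = start; this.end = end; }
    }

    final List<Edge> edges = new ArrayList<>();
    int stateCount = 0;

    int newState() { return stateCount++; }
    void edge(int from, int label, int to) { edges.add(new Edge(from, label, to)); }

    // Single letter c: start --c--> end
    Frag symbol(char c) {
        int s = newState(), e = newState();
        edge(s, c, e);
        return new Frag(s, e);
    }

    // (αβ): run α, then continue with β
    Frag concat(Frag a, Frag b) {
        edge(a.end, EPS, b.start);
        return new Frag(a.start, b.end);
    }

    // (α | β): new q0 and qf, four ε-edges
    Frag alt(Frag a, Frag b) {
        int s = newState(), e = newState();
        edge(s, EPS, a.start);
        edge(s, EPS, b.start);
        edge(a.end, EPS, e);
        edge(b.end, EPS, e);
        return new Frag(s, e);
    }

    // (α)*: new q0 and qf, four ε-edges (enter, skip, repeat, leave)
    Frag star(Frag a) {
        int s = newState(), e = newState();
        edge(s, EPS, a.start);
        edge(s, EPS, e);
        edge(a.end, EPS, a.start);
        edge(a.end, EPS, e);
        return new Frag(s, e);
    }

    public static void main(String[] args) {
        Thompson t = new Thompson();
        // Builds an NFA for (a|b)*ab
        Frag f = t.concat(t.concat(t.star(t.alt(t.symbol('a'), t.symbol('b'))),
                                   t.symbol('a')), t.symbol('b'));
        System.out.println("start=" + f.start + " end=" + f.end
                + " states=" + t.stateCount + " edges=" + t.edges.size());
    }
}
```

Each operation adds at most two states and four edges, which is why the resulting NFA stays linear in the length of the regular expression.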

SLIDE 7

NFA vs. DFA

  • Traversing a DFA is easy given the input string: the path is uniquely determined
  • In contrast, traversing an NFA requires keeping track of a set of (current) states, starting with the set {q0}
  • Processing the next input symbol means taking all possible outgoing transitions from this set and collecting the new set
  • From every NFA, an equivalent DFA (one that accepts the same language) can be computed
  • Basic idea: track the subsets that can be reached for every possible input

SLIDE 8

Traversing an NFA

[Figure: NFA with states 0 to 9 and transitions labelled ε, a, and b, traversed step by step on the input abab]

Sets of current states during the traversal:

  • start: {0, 1, 2, 4, 7}
  • after a: {1, 2, 3, 4, 6, 7, 8}
  • after b: {1, 2, 4, 5, 6, 7, 9}
  • after a: {1, 2, 3, 4, 6, 7, 8}
  • after b: {1, 2, 4, 5, 6, 7, 9}
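The traversal can be sketched as a loop over sets of states. The edge list below is an assumption: it is the textbook NFA for (a|b)*ab with states 0 to 9, which matches the number of ε-, a- and b-labels and the state sets in the figure, but the edges themselves cannot be read off the figure. Treating state 9 as the only final state is likewise an assumption.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

class NfaTraversal {
    static final int EPS = -1; // label used for ε-edges (an assumption of this sketch)
    // Assumed edges {from, label, to}, see the note above.
    static final int[][] EDGES = {
        {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4},
        {2, 'a', 3}, {4, 'b', 5}, {3, EPS, 6}, {5, EPS, 6},
        {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}, {8, 'b', 9},
    };
    static final int START = 0;
    static final int FINAL = 9;

    // All states reachable from the given set using only ε-edges.
    static Set<Integer> closure(Set<Integer> states) {
        Set<Integer> result = new TreeSet<>(states);
        Deque<Integer> toCheck = new ArrayDeque<>(states);
        while (!toCheck.isEmpty()) {
            int t = toCheck.pop();
            for (int[] e : EDGES) {
                if (e[0] == t && e[1] == EPS && result.add(e[2])) {
                    toCheck.push(e[2]); // newly found state: follow its ε-edges too
                }
            }
        }
        return result;
    }

    // All states reachable from the given set by one edge labelled c.
    static Set<Integer> move(Set<Integer> states, char c) {
        Set<Integer> result = new TreeSet<>();
        for (int[] e : EDGES) {
            if (e[1] == c && states.contains(e[0])) {
                result.add(e[2]);
            }
        }
        return result;
    }

    static boolean accepts(String input) {
        Set<Integer> current = closure(Collections.singleton(START));
        for (char c : input.toCharArray()) {
            current = closure(move(current, c));
        }
        return current.contains(FINAL);
    }

    public static void main(String[] args) {
        Set<Integer> current = closure(Collections.singleton(START));
        System.out.println("start: " + current);
        for (char c : "abab".toCharArray()) {
            current = closure(move(current, c));
            System.out.println("after " + c + ": " + current);
        }
    }
}
```

Under these assumptions the printed sets are the same as the ones listed for the input abab.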

SLIDE 14

NFA → DFA: Subset Construction

  • Simulate “in parallel” all possible moves the automaton can make
  • The states of the resulting DFA will represent sets of states of the NFA, i.e., elements of P(Q)
  • We use two operations on states/state-sets of the NFA

      • ε-closure(T): set of states reachable from any state s in T on ε-transitions
      • move(T, a): set of states to which there is a transition from one state in T on input symbol a

  • The final states of the DFA are those where the corresponding NFA subset contains a final state

SLIDE 15

Algorithm: Subset Construction

proc SubsetConstruction(s0) ≡
  DFAStates := { ε-closure({s0}) } (unmarked)
  while there is an unmarked state T in DFAStates do
    mark T
    for each input symbol a do
      U := ε-closure(move(T, a))
      DFADelta[T, a] := U
      if U ∉ DFAStates then add U as unmarked state to DFAStates

proc ε-closure(T) ≡
  ε-closure := T; to_check := T
  while to_check not empty do
    get (and remove) some state t from to_check
    for each state u with edge labeled ε from t to u do
      if u ∉ ε-closure then add u to ε-closure and to_check
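A minimal Java sketch of the subset construction follows. The NFA edge list is an assumption (the textbook NFA for (a|b)*ab with states 0 to 9, consistent with the state sets shown in the example slides), and the names SubsetConstruction, build, and EPS are made up for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SubsetConstruction {
    static final int EPS = -1; // label used for ε-edges (an assumption of this sketch)
    // Assumed NFA edges {from, label, to}.
    static final int[][] EDGES = {
        {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4},
        {2, 'a', 3}, {4, 'b', 5}, {3, EPS, 6}, {5, EPS, 6},
        {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}, {8, 'b', 9},
    };
    static final char[] ALPHABET = {'a', 'b'};

    static Set<Integer> closure(Set<Integer> t) {
        Set<Integer> result = new TreeSet<>(t);
        Deque<Integer> toCheck = new ArrayDeque<>(t);
        while (!toCheck.isEmpty()) {
            int s = toCheck.pop();
            for (int[] e : EDGES) {
                if (e[0] == s && e[1] == EPS && result.add(e[2])) {
                    toCheck.push(e[2]);
                }
            }
        }
        return result;
    }

    static Set<Integer> move(Set<Integer> t, char a) {
        Set<Integer> result = new TreeSet<>();
        for (int[] e : EDGES) {
            if (e[1] == a && t.contains(e[0])) {
                result.add(e[2]);
            }
        }
        return result;
    }

    // Returns DFADelta: DFA state (a set of NFA states) -> input symbol -> DFA state.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> build(int start) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> delta = new LinkedHashMap<>();
        Deque<Set<Integer>> unmarked = new ArrayDeque<>();
        Set<Integer> s0 = closure(Collections.singleton(start));
        delta.put(s0, new TreeMap<>());
        unmarked.add(s0);
        while (!unmarked.isEmpty()) {
            Set<Integer> t = unmarked.remove(); // taking T off the queue marks it
            for (char a : ALPHABET) {
                Set<Integer> u = closure(move(t, a));
                delta.get(t).put(a, u);
                if (!delta.containsKey(u)) {    // U is not yet a DFA state
                    delta.put(u, new TreeMap<>());
                    unmarked.add(u);
                }
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        build(0).forEach((state, row) -> System.out.println(state + " -> " + row));
    }
}
```

The sets are never modified after they are stored, which is what makes them safe to use as map keys. If move(T, a) is empty, the empty set becomes an ordinary DFA state in this sketch and acts as a dead state; that does not happen for the assumed NFA.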

SLIDE 16

Example: Subset Construction

[Figure: the NFA from "Traversing an NFA" (states 0 to 9, transitions labelled ε, a, and b), with the DFA built up step by step next to it]

DFA states constructed, in the order they appear:

  • {0, 1, 2, 4, 7}: the start state
  • {1, 2, 3, 4, 6, 7, 8}: reached on a
  • {1, 2, 4, 5, 6, 7}: reached on b
  • {1, 2, 4, 5, 6, 7, 9}: reached on b

The remaining a- and b-transitions lead back to these four sets, so no further DFA states are added.

SLIDE 25

Time/Space Considerations

  • DFA traversal is linear in the length of the input string x
  • An NFA needs O(n) space (states + transitions), where n is the length of the regular expression
  • NFA traversal may need time n × |x|, so why use NFAs?
  • There are DFAs that have at least 2^n states!
  • Solution 1: “Lazy” construction of the DFA: construct DFA states on the fly up to a certain amount and cache them
  • Solution 2: Try to minimize the DFA: there is a unique (modulo state names) minimal automaton for a regular language!
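Solution 1 can be sketched as an NFA traversal that remembers every DFA transition it has already computed. The NFA edges (textbook NFA for (a|b)*ab, final state 9), the cap MAX_CACHED_STATES, and the policy of simply emptying the cache when it is full are all assumptions chosen to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LazyDfa {
    static final int EPS = -1; // label used for ε-edges (an assumption of this sketch)
    // Assumed NFA edges {from, label, to}.
    static final int[][] EDGES = {
        {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4},
        {2, 'a', 3}, {4, 'b', 5}, {3, EPS, 6}, {5, EPS, 6},
        {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}, {8, 'b', 9},
    };
    static final int START = 0;
    static final int FINAL = 9;
    static final int MAX_CACHED_STATES = 1000; // hypothetical cache limit

    // DFA transitions computed so far: DFA state -> symbol -> DFA state.
    final Map<Set<Integer>, Map<Character, Set<Integer>>> cache = new HashMap<>();

    static Set<Integer> closure(Set<Integer> t) {
        Set<Integer> result = new TreeSet<>(t);
        Deque<Integer> toCheck = new ArrayDeque<>(t);
        while (!toCheck.isEmpty()) {
            int s = toCheck.pop();
            for (int[] e : EDGES) {
                if (e[0] == s && e[1] == EPS && result.add(e[2])) {
                    toCheck.push(e[2]);
                }
            }
        }
        return result;
    }

    // The expensive NFA step: ε-closure(move(t, a)).
    static Set<Integer> nfaStep(Set<Integer> t, char a) {
        Set<Integer> moved = new TreeSet<>();
        for (int[] e : EDGES) {
            if (e[1] == a && t.contains(e[0])) {
                moved.add(e[2]);
            }
        }
        return closure(moved);
    }

    // One DFA step: use the cached transition if present, else compute and store it.
    Set<Integer> step(Set<Integer> state, char a) {
        if (!cache.containsKey(state) && cache.size() >= MAX_CACHED_STATES) {
            cache.clear(); // simplest possible policy: throw everything away
        }
        return cache.computeIfAbsent(state, k -> new HashMap<>())
                    .computeIfAbsent(a, k -> nfaStep(state, a));
    }

    boolean accepts(String input) {
        Set<Integer> current = closure(Collections.singleton(START));
        for (char c : input.toCharArray()) {
            current = step(current, c);
        }
        return current.contains(FINAL);
    }

    public static void main(String[] args) {
        LazyDfa dfa = new LazyDfa();
        System.out.println(dfa.accepts("abab") + ", cached DFA states: " + dfa.cache.size());
    }
}
```

Only the DFA states that an input actually visits are ever built, so the exponential worst case is paid only if the inputs force it.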

SLIDE 29

Minimization Algorithm by Hopcroft

proc Minimize() ≡
  B1 = F; B2 = Q \ F
  E = {B1, B2}
  k = 3
  for a ∈ Σ do
    a(i) = {s ∈ Q | s ∈ Bi ∧ ∃t : δ(t, a) = s}
    L = the smaller of the a(i)
  while L ≠ ∅ do
    take some i ∈ L and delete it
    for j < k s.th. ∃t ∈ Bj
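The pseudocode breaks off at this point, so the sketch below does not try to complete Hopcroft's algorithm. It shows a simpler partition refinement in the style of Moore instead: it starts from the same split into final and non-final states and reaches the same coarsest partition, but it re-examines every state in each round and is therefore slower than Hopcroft's worklist approach. The DFA in main is a hypothetical four-state automaton; all names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MooreMinimizer {
    // delta[s][k] is the successor of state s on the k-th input symbol.
    // Returns, for every state, the number of its block in the final partition.
    static int[] refine(int[][] delta, boolean[] isFinal) {
        int n = delta.length;
        int[] block = new int[n];
        for (int s = 0; s < n; s++) {
            block[s] = isFinal[s] ? 1 : 0;   // initial partition: F and Q \ F
        }
        int blockCount = 0;
        while (true) {
            // Two states stay together only if they are in the same block
            // and all their successors are in the same blocks.
            Map<List<Integer>, Integer> ids = new HashMap<>();
            int[] next = new int[n];
            for (int s = 0; s < n; s++) {
                List<Integer> signature = new ArrayList<>();
                signature.add(block[s]);
                for (int target : delta[s]) {
                    signature.add(block[target]);
                }
                Integer id = ids.get(signature);
                if (id == null) {
                    id = ids.size();
                    ids.put(signature, id);
                }
                next[s] = id;
            }
            if (ids.size() == blockCount) {
                return next;                 // no block was split: partition is stable
            }
            blockCount = ids.size();
            block = next;
        }
    }

    public static void main(String[] args) {
        // Hypothetical DFA over {a, b}; column 0 is 'a', column 1 is 'b'.
        int[][] delta = {{1, 2}, {1, 3}, {1, 2}, {1, 2}};
        boolean[] isFinal = {false, false, false, true};
        System.out.println(Arrays.toString(refine(delta, isFinal))); // [0, 1, 0, 2]
    }
}
```

In the example, states 0 and 2 end up in the same block, so the minimal automaton has three states. Unreachable states are not removed by this sketch.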
