SLIDE 1

Chrobak normal form revisited, with applications

Paweł Gawrychowski

Institute of Computer Science, University of Wrocław

July 20, 2011

Paweł Gawrychowski Chrobak normal form... July 20, 2011 1 / 18

SLIDE 2

What is Chrobak normal form?

We focus on nondeterministic finite automata over Σ = {a}.

[Figure: an NFA with initial state q0.]

Without losing (much) generality, there is exactly one final state qf.

SLIDE 3

Such an NFA is just a directed graph, so there is not much structure we can assume when trying to prove properties of the language in question.

[Figure: the normal form. A path of $O(n^2)$ states starting at q0 leads to a nondeterministic choice among disjoint cycles of lengths $c_1, c_2, \ldots, c_\ell$.]

Chrobak normal form theorem

For any NFA on n states, there exists an equivalent automaton in the above normal form, with $\sum_i c_i \le n$.
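To make the shape of the normal form concrete, here is a minimal Python sketch of how membership of a^k could be tested for an automaton of this kind. The representation (a path given by its length and its final positions, plus a list of cycles entered at position 0) and the name `accepts` are assumptions made for illustration; they are not taken from the talk.

```python
def accepts(k, tail_len, tail_accepting, cycles):
    """Decide whether a^k is accepted by an automaton in Chrobak normal form.

    tail_len       -- number of states on the initial path (hypothetical field)
    tail_accepting -- set of positions 0..tail_len-1 on the path that are final
    cycles         -- list of (c, offsets): a cycle of length c and the set of
                      its final positions, the cycle being entered at position 0
    """
    if k < tail_len:
        # The word ends while we are still on the deterministic path.
        return k in tail_accepting
    j = k - tail_len  # letters read after leaving the path
    # Nondeterminism only picks the cycle; inside a cycle the run is forced.
    return any(j % c in offsets for c, offsets in cycles)


# Path of 2 states (position 1 final), then cycles of lengths 2 and 3.
example = dict(tail_len=2, tail_accepting={1}, cycles=[(2, {0}), (3, {1})])
print([k for k in range(10) if accepts(k, **example)])
```

With this representation a membership query costs time proportional to the number of cycles, independent of k.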

SLIDE 4

The original proof by Marek Chrobak was not concerned with a polynomial-time construction.

  • M. Chrobak. Finite automata and unary languages. Theor. Comput. Sci., 47:149–158, November 1986

Later a polynomial-time version was given by Martinez, but the complexity was around $O(n^5)$.

  • A. Martinez. Efficient computation of regular expressions from unary NFAs. DCFS ’02, pages 174–187, 2002

Both proofs contained a minor flaw, corrected by To.

  • A. W. To. Unary finite automata vs. arithmetic progressions. Information Processing Letters, 109(17):1010–1014, 2009

Recently Sawa improved the complexity to $O(n^4)$.

  • Z. Sawa. Efficient construction of semilinear representations of languages accepted by unary NFA. In Proceedings of the 4th international conference on Reachability problems, RP’10, pages 176–182, 2010

SLIDE 5

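I will briefly sketch the idea behind a very simple O(nm) algorithm, where n is the number of states and m the number of transitions. Then we will see how to apply its more sophisticated versions to get some nice results on converting a unary NFA into a regular expression (RE) or a context-free grammar (CFG).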

SLIDE 6

We want to describe all paths from q0 to qf. We split them into two categories:

  • acyclic: all vertices on the path belong to trivial strongly connected components.
  • cyclic: there is a cycle of length d ≥ 1 intersecting the path.

The first case is trivial, as such a path never repeats a vertex and so its length cannot exceed n. The second case is more involved, though. For each v such that there exists a cycle through v we describe all paths q0 → v → qf.
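The split into acyclic and cyclic paths depends on knowing which vertices lie on a cycle. A minimal sketch of one way to find them is given below; it runs one BFS per vertex, so it takes O(nm) time. The function name and the adjacency-dictionary format are assumptions for illustration, and a strongly connected components algorithm would do the same job faster.

```python
from collections import deque


def vertices_on_cycles(adj):
    """Return the set of vertices that lie on some cycle.

    adj -- dict mapping every vertex to the list of its successors
           (assumed format; every vertex must appear as a key)
    """
    on_cycle = set()
    for v in adj:
        # Search from the successors of v: v lies on a cycle
        # exactly when the search comes back to v.
        seen = set(adj[v])
        queue = deque(seen)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        if v in seen:
            on_cycle.add(v)
    return on_cycle


# 0 -> 1 -> 2 -> 1 and 2 -> 3: only 1 and 2 lie on a cycle.
print(vertices_on_cycles({0: [1], 1: [2], 2: [1, 3], 3: []}))
```

A path is acyclic in the sense above exactly when it avoids every vertex in the returned set.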

SLIDE 7

[Figure: a path from q0 to qf passing through a vertex v that lies on a cycle of length d.]

Observation

If there is an accepting path of length ℓ through v, then there are accepting paths of length ℓ + d, ℓ + 2d, ℓ + 3d, . . . as well.

Hence to completely describe all those paths, we only need to compute, for each r = 0, 1, 2, . . . , d − 1, the shortest such path whose length is of the form αd + r. We call its length t(r).
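The observation means that the values t(0), . . . , t(d − 1) describe the whole set of path lengths through v: a length ℓ is realised exactly when t(ℓ mod d) ≤ ℓ. A small sketch, assuming t is stored as a Python list with `math.inf` for residues that are never reached (the function name is hypothetical):

```python
import math


def has_path_of_length(length, t, d):
    """True iff some path through v has exactly this length.

    t -- list of length d; t[r] is the shortest path length congruent
         to r modulo d, or math.inf if there is none (assumed encoding)
    """
    # length - t[r] is then a non-negative multiple of d, so going
    # around the cycle of length d makes up the difference.
    return t[length % d] <= length


t = [6, 4, math.inf]  # d = 3: lengths 6, 9, 12, ... and 4, 7, 10, ...
print([k for k in range(13) if has_path_of_length(k, t, 3)])
```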

SLIDE 8

To compute all t(0), t(1), . . . , t(d − 1) we construct a new graph. For each vertex u of the original graph we create u(0), u(1), . . . , u(d − 1) and u′(0), u′(1), . . . , u′(d − 1). The intuition is that

1. there exists a path from q0(0) to u(ℓ mod d) of length ℓ iff there is a path from q0 to u of length ℓ,
2. there exists a path from q0(0) to u′(ℓ mod d) of length ℓ iff there is a path from q0 to v and then to u of length ℓ.

The construction is straightforward: for any edge x → y and r = 0, 1, . . . , d − 1 we add edges:

1. x(r) → y((r + 1) mod d),
2. x′(r) → y′((r + 1) mod d),
3. if y = v, x(r) → y′((r + 1) mod d).
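The three edge rules translate almost literally into code. The sketch below builds the edge list of the new graph; encoding u(r) as the triple `(u, r, False)` and u′(r) as `(u, r, True)` is an assumption made here for illustration, as is the function name.

```python
def build_layered_graph(edges, v, d):
    """Return the edges of the new graph as pairs of nodes.

    edges -- list of pairs (x, y), the edges of the original graph
    v, d  -- the chosen vertex and the length of a cycle through it
    A node (u, r, primed) stands for u(r) if primed is False, u'(r) otherwise.
    """
    new_edges = []
    for x, y in edges:
        for r in range(d):
            s = (r + 1) % d
            new_edges.append(((x, r, False), (y, s, False)))    # rule 1
            new_edges.append(((x, r, True), (y, s, True)))      # rule 2
            if y == v:
                new_edges.append(((x, r, False), (y, s, True)))  # rule 3
    return new_edges


layered = build_layered_graph([(0, 1), (1, 1), (1, 2)], v=1, d=1)
print(len(layered))
```

Every original edge yields at most 3d new edges, so the new graph has at most 3md edges on 2nd nodes.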

SLIDE 9

What is this new graph for?

It turns out that the values t(r) we are looking for are exactly the distances from q0(0) to qf′(r)! The new graph has 2nd ≤ $2n^2$ vertices, so t(r) is either infinite or at most $2n^2$.

Hence we can run BFS to compute all t(r) in time linear in the size of the new graph, which is O(nm). Then we can represent the paths by a prefix of length $2n^2$ followed by a cycle of length d. Of course we must repeat the computation for each possible v, but we can share the prefix among all choices!

This is not enough to get the claimed bounds, though. The running time is $O(n^2 m)$ and we do not know that the combined size of all cycles is at most n. This can be easily fixed with a small modification: we process all vertices in a single simple cycle at once.
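A minimal sketch of the BFS for a single vertex v follows. It explores the new graph implicitly instead of building it first. The adjacency-dictionary format, the node encoding `(u, r, primed)` and the function name are assumptions for illustration; the choice to start in the primed copy when q0 = v is also an assumption, since the slides do not discuss that corner case.

```python
import math
from collections import deque


def shortest_by_residue(adj, q0, qf, v, d):
    """Return the list t, where t[r] is the distance from q0(0) to qf'(r).

    adj -- dict mapping every vertex to the list of its successors
    A node (u, r, primed) is u(r) or u'(r); primed means v was visited.
    """
    start = (q0, 0, q0 == v)  # assumption: q0 == v starts in the primed copy
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        u, r, primed = node
        s = (r + 1) % d
        for w in adj[u]:
            targets = [(w, s, primed)]         # rules 1 and 2
            if not primed and w == v:
                targets.append((w, s, True))   # rule 3
            for nxt in targets:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return [dist.get((qf, r, True), math.inf) for r in range(d)]


# Cycle 1 -> 2 -> 1 of length d = 2 through v = 1.
adj = {0: [1, 2], 1: [2], 2: [1, 3], 3: []}
print(shortest_by_residue(adj, q0=0, qf=3, v=1, d=2))
```

In this example the shortest paths through vertex 1 are 0 → 1 → 2 → 3 (odd length) and 0 → 2 → 1 → 2 → 3 (even length).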

SLIDE 10

But... what is this form for?

Now we focus on two possible applications of the above idea (and the Chrobak normal form). Say we have an NFA on n states. We would like to construct a small regular expression describing the same language. Size $O(n^2)$ can be achieved by an application of the Chrobak normal form, as observed by Martinez. It looks like a natural bound, right?

Theorem

For any NFA on n states an equivalent RE of size $O(n^2/\log n)$ exists (and can be found in polynomial time).

SLIDE 12

[Figure: the range of word lengths, with marks at $n$, $n^2/\log n$ and $n^2$; the regions are labelled "anything possible", "not periodic, but..." and "periodic".]

The rightmost part can be represented as a union of a few periodic sets (with small periods). What happens in the middle? It turns out that the red part can be represented as a union of two fairly regular sets.

SLIDE 13

We split all accepting paths into a few types:

  • acyclic: as before.
  • strongly cyclic: there is a cycle of length $d \le n^2/\log n$ intersecting the path.
  • weakly cyclic: not acyclic, but all intersected cycles are long.

The difficult part of the red fragment corresponds to weakly cyclic paths. We further split them into two subtypes:

  • thin: they can reach at most O(log n) different simple cycle lengths.
  • fat: they can reach more different simple cycle lengths.

SLIDE 14

Both subtypes have succinct representations by regular expressions, but for different reasons.

1. For thin paths we do some kind of (almost) brute-force power set construction.
2. For fat paths we cannot afford to do that. Nevertheless, it turns out that we can apply a number-theoretic bound on the Frobenius number to show that those paths are not terribly complicated.
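The slides do not say which bound on the Frobenius number is used, so the sketch below only illustrates the quantity itself. For cycle lengths with greatest common divisor 1, the Frobenius number is the largest integer that is not a non-negative combination of them; beyond it, every total length can be made up from the cycles. The function name and the plain dynamic programming are choices made here for illustration.

```python
from functools import reduce
from math import gcd


def frobenius_number(lengths):
    """Largest integer that is not a non-negative combination of `lengths`.

    Assumes gcd(lengths) == 1 and min(lengths) >= 2, so that the answer
    exists and is positive. Brute-force dynamic programming, for illustration.
    """
    assert reduce(gcd, lengths) == 1 and min(lengths) >= 2
    a = min(lengths)
    reachable = [True]   # reachable[k]: k is a combination (0 is the empty one)
    run = 0              # number of consecutive reachable values seen so far
    k = 0
    largest_gap = -1
    # Once a consecutive values are reachable, adding copies of a
    # reaches everything above them, so the search can stop.
    while run < a:
        k += 1
        ok = any(k >= c and reachable[k - c] for c in lengths)
        reachable.append(ok)
        if ok:
            run += 1
        else:
            run = 0
            largest_gap = k
    return largest_gap


print(frobenius_number([3, 5]))
```

For two coprime lengths a and b the value is ab − a − b, which gives a quick sanity check of the sketch.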

SLIDE 15

Now assume that we would like to construct a small context-free grammar describing the same language. By small we mean having as few nonterminals as possible, with the grammar in Chomsky normal form (so all productions are of the form A → a or A → BC). It is known that $O(n^{2/3})$ nonterminals are enough.

  • M. Domaratzki, G. Pighizzini, and J. Shallit. Simulating finite automata with context-free grammars. Information Processing Letters, 84(6):339–344, 2002

It also looks like a natural bound, right?

Theorem

For any NFA on n states there exists an equivalent CFG in Chomsky normal form with $O(\sqrt{n \log n})$ nonterminals (and it can be found in polynomial time).

SLIDE 16

It turns out that the difficulty is in describing all paths of length at most $2n^2$ which intersect only long cycles (where long means longer than b). We cut each such path into fragments of length b and try to construct shortcuts.

[Figure: a path from q0 to qf cut into consecutive blocks of length b.]

Choose a subset of vertices so that at least one vertex in each block is marked. Then add a shortcut of length ≤ 2b between each pair of neighbouring blocks. The states within a single block are all different, since a repeated state would close a cycle of length at most b.

SLIDE 20

The smaller the set of chosen vertices, the better. This looks like a simple combinatorial problem, right? Given a collection of t sets A1, . . . , At ⊆ U, each of size ≥ s, find a small B ⊆ U such that Ai ∩ B ≠ ∅ for all i.

Lemma

$|B| = \frac{|U|}{s} \ln t$ is possible.
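One standard way to obtain a bound of this shape is the greedy strategy: repeatedly pick the element that hits the most sets not yet hit. Among t′ remaining sets of size ≥ s some element lies in at least t′s/|U| of them, so each pick shrinks the number of remaining sets by a factor of (1 − s/|U|), and about (|U|/s) ln t picks suffice. The sketch below is an illustration of that argument, not necessarily the proof used in the talk; the function name is hypothetical and the input sets are assumed to be non-empty.

```python
import math


def greedy_hitting_set(sets):
    """Greedily choose B so that every given set contains an element of B.

    sets -- iterable of non-empty collections (assumed input format)
    """
    remaining = [set(a) for a in sets]
    chosen = set()
    while remaining:
        # Count, for every element, how many un-hit sets contain it.
        counts = {}
        for a in remaining:
            for x in a:
                counts[x] = counts.get(x, 0) + 1
        best = max(counts, key=counts.get)
        chosen.add(best)
        remaining = [a for a in remaining if best not in a]
    return chosen


blocks = [{1, 2}, {2, 3}, {3, 4}, {4, 5}]
B = greedy_hitting_set(blocks)
print(sorted(B), all(a & B for a in blocks))
```

Here |U| = 5, s = 2 and t = 4, so the bound allows up to ⌈2.5 · ln 4⌉ = 4 chosen elements.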

SLIDE 21

Open problems

1. $O(n^2/\log n)$ for converting into RE does not look like the true answer.

2. Neither does $O(\sqrt{n \log n})$ for converting into CFG. Maybe the true answer is $O(\sqrt{n})$? The best known lower bound is $\Omega(n^{1/3})$ for the DFA → CFG conversion.

3. Both conversions work in polynomial time. $O(n^{3+\epsilon})$ seems possible, but can we get O(nm)?

4. Speaking of which, is O(nm) the best possible bound for computing the Chrobak normal form? It seems that even for acyclic paths a substantial improvement would be nontrivial.

SLIDE 22

Thank you for your attention!
