
Chrobak normal form revisited, with applications
Paweł Gawrychowski
Institute of Computer Science, University of Wrocław
July 20, 2011

What is Chrobak normal form?

We focus on nondeterministic finite automata over $\Sigma = \{a\}$ with initial state $q_0$. Without losing (much) generality, there is exactly one final state $q_f$.

Such an NFA is just a directed graph, hence there is not much structure you can assume when trying to prove some properties of the language in question.

[Figure: the normal form. A path of $O(n^2)$ states starting at $q_0$, followed by disjoint cycles of lengths $c_1, c_2, \ldots, c_\ell$.]

Chrobak normal form theorem. For any NFA on $n$ states, there exists an equivalent automaton in the above normal form, with $\sum_i c_i \le n$.
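To make the shape of the normal form concrete, here is a minimal sketch of membership testing for an automaton that is already in this form. The class name `ChrobakNF` and its two fields are hypothetical, chosen only for illustration: the path is stored as a list of accept flags, and each cycle as its length together with the offsets (counted from the point where the cycle is entered) at which it accepts.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class ChrobakNF:
    # Hypothetical representation, not taken from the slides.
    # tail_final[k] is True iff a^k is accepted while still on the initial path.
    tail_final: List[bool]
    # Each cycle is (length c_i, set of accepting offsets in 0..c_i-1).
    cycles: List[Tuple[int, Set[int]]]

    def accepts(self, k: int) -> bool:
        """Return True iff the word a^k is accepted."""
        m = len(self.tail_final)
        if k < m:
            return self.tail_final[k]
        # After m letters we sit at offset 0 of every cycle simultaneously;
        # the only nondeterministic choice is which cycle to enter.
        return any((k - m) % c in offsets for c, offsets in self.cycles)


# A path of length 2 with no accepting states, then cycles of length 2 and 3.
nf = ChrobakNF(tail_final=[False, False], cycles=[(2, {0}), (3, {1})])
accepted = [k for k in range(10) if nf.accepts(k)]
```

Membership costs one modular reduction per cycle, which is the kind of structure the theorem buys compared with an arbitrary directed graph.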

The original proof by Marek Chrobak was not concerned with a polynomial time construction.
  M. Chrobak. Finite automata and unary languages. Theor. Comput. Sci., 47:149–158, November 1986.

Later a polynomial time version was given by Martinez, but the complexity was around $O(n^5)$.
  A. Martinez. Efficient computation of regular expressions from unary NFAs. DCFS '02, pages 174–187, 2002.

Both proofs contained a minor flaw, corrected by To.
  A. W. To. Unary finite automata vs. arithmetic progressions. Information Processing Letters, 109(17):1010–1014, 2009.

And recently Sawa improved the complexity to $O(n^4)$.
  Z. Sawa. Efficient construction of semilinear representations of languages accepted by unary NFA. In Proceedings of the 4th International Conference on Reachability Problems, RP '10, pages 176–182, 2010.

I will briefly sketch the idea behind a very simple $O(nm)$ algorithm, where $m$ is the number of transitions. Then we will see how to apply its more sophisticated versions to get some nice results on converting a unary NFA into an RE or a CFG.

We want to describe all paths from $q_0$ to $q_f$. We split them into two categories:
  acyclic: all vertices on the path belong to trivial strongly connected components.
  cyclic: there is a cycle of length $d \ge 1$ intersecting the path.

The first case is trivial, as the length cannot exceed $n$. The second case is more involved, though. For each $v$ such that there exists a cycle through $v$, we describe all paths $q_0 \to v \to q_f$.

[Figure: a path from $q_0$ to $q_f$ passing through a vertex $v$ that lies on a cycle of length $d$.]

Observation. If there is an accepting path of length $\ell$ through $v$, then there are accepting paths of lengths $\ell + d, \ell + 2d, \ell + 3d, \ldots$ through $v$ as well.

Hence to completely describe all those paths, we only need to compute, for each $r = 0, 1, 2, \ldots, d-1$, the length of the shortest such path whose length has the form $\alpha d + r$. We call this length $t(r)$.
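Written out as a formula, and taking $t(r) = \infty$ when no accepting path through $v$ has length congruent to $r$ modulo $d$ (a convention assumed here), the observation says that the values $t(r)$ determine the whole set of lengths:

```latex
\[
  \{\, \ell \;:\; \text{there is an accepting path of length } \ell \text{ through } v \,\}
  \;=\;
  \bigcup_{r \,:\, t(r) < \infty} \{\, t(r) + \alpha d \;:\; \alpha = 0, 1, 2, \ldots \,\}
\]
```

For instance, with $d = 3$, $t(0) = 6$, $t(1) = \infty$ and $t(2) = 5$, the set is $\{5, 8, 11, \ldots\} \cup \{6, 9, 12, \ldots\}$.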

To compute all $t(0), t(1), \ldots, t(d-1)$ we construct a new graph. For each vertex $u$ of the original graph we create $u(0), u(1), \ldots, u(d-1)$ and $u'(0), u'(1), \ldots, u'(d-1)$. The intuition is that:
  1. there exists a path from $q_0(0)$ to $u(\ell \bmod d)$ of length $\ell$ iff there is a path from $q_0$ to $u$ of length $\ell$,
  2. there exists a path from $q_0(0)$ to $u'(\ell \bmod d)$ of length $\ell$ iff there is a path from $q_0$ to $v$ and then to $u$ of length $\ell$.

The construction is straightforward: for any edge $x \to y$ and $r = 0, 1, \ldots, d-1$ we add edges:
  1. $x(r) \to y((r+1) \bmod d)$,
  2. $x'(r) \to y'((r+1) \bmod d)$,
  3. if $y = v$, $x(r) \to y'((r+1) \bmod d)$.

What is this new graph for?

It turns out that the values $t(r)$ we are looking for are exactly the distances from $q_0(0)$ to $q_f'(r)$! The new graph has $2nd \le 2n^2$ vertices, so $t(r)$ is either infinite or at most $2n^2$. Hence we can run BFS to compute all $t(r)$ in time linear in the size of the new graph, which is $O(nm)$. Then we can represent the paths by a prefix of length $2n^2$ followed by a cycle of length $d$. Of course we must repeat the computation for each possible $v$, but we can share the prefix among all choices!

This is not enough to get the claimed bounds, though. The running time is $O(n^2 m)$ and we do not know that the combined size of all cycles is at most $n$. This can be easily fixed with a small modification: we process all vertices in a single simple cycle at once.
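The BFS step can be sketched as follows. This is a minimal illustration under a few assumptions, not the talk's implementation: the function name `shortest_by_residue` and the adjacency-dict input are hypothetical, the layered graph is explored implicitly rather than built, the primed and unprimed copies are encoded as a boolean flag "already passed through v", and the case $q_0 = v$ is treated as having passed through $v$ from the start.

```python
from collections import deque


def shortest_by_residue(adj, q0, qf, v, d):
    """Return [t(0), ..., t(d-1)], where t(r) is the length of the shortest
    walk q0 -> v -> qf whose length is congruent to r modulo d, or None if
    no such walk exists.

    adj maps each vertex to a list of its successors (hypothetical format).
    """
    # A node of the layered graph is (vertex, length mod d, passed through v?).
    start = (q0, 0, q0 == v)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        x, r, seen_v = node
        for y in adj.get(x, ()):
            nxt = (y, (r + 1) % d, seen_v or y == v)
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    # Only copies that have passed through v count (the primed vertices).
    return [dist.get((qf, r, True)) for r in range(d)]


# Vertex 1 lies on the cycle 1 -> 2 -> 1 of length d = 2.
adj = {0: [1, 2], 1: [2, 3], 2: [1]}
t = shortest_by_residue(adj, q0=0, qf=3, v=1, d=2)
```

Here `t` comes out as `[2, 3]`: the walk 0, 1, 3 has even length 2 and the walk 0, 2, 1, 3 has odd length 3, so every length from 2 upwards is realised by some walk through vertex 1.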

But... what is this form for?

Now we focus on two possible applications of the above idea (and the Chrobak normal form). Say we have an NFA on $n$ states. We would like to construct a small regular expression describing the same language. Size $O(n^2)$ can be achieved by an application of the Chrobak normal form, as observed by Martinez. It looks like a natural bound, right?

Theorem. For any NFA on $n$ states an equivalent RE of size $O(n^2 / \log n)$ exists (and can be found in polynomial time).

[Figure: the accepted lengths on a number line, in three parts: "anything possible", "not periodic, but..." (the red middle part), and "periodic" (the rightmost part).]

The rightmost part can be represented as a union of a few periodic sets (with small periods). What happens in the middle? It turns out that the red part can be represented as a union of two fairly regular sets.

We split all accepting paths into a few types:
  acyclic: as before.
  strongly cyclic: there is a cycle of length $d \le \log n$ intersecting the path.
  weakly cyclic: not acyclic, but all intersected cycles are long.

The difficult part of the red fragment corresponds to weakly cyclic paths. We further split them into two subtypes:
  thin: they can reach at most $O(\log n)$ different simple cycle lengths.
  fat: they can reach more different simple cycle lengths.

Both subtypes have succinct representations by regular expressions, but for different reasons.
  1. For thin paths we do some kind of (almost) brute force power set construction.
  2. For fat paths we cannot afford to do that. Nevertheless, it turns out that we can apply a number-theoretic bound on the Frobenius number to show that those paths are not terribly complicated.
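As a reminder of what the Frobenius number measures: for cycle lengths with greatest common divisor 1, it is the largest total length that cannot be obtained by going around the cycles some non-negative number of times. The sketch below computes it by plain dynamic programming. It only illustrates the quantity and is not the bound used in the talk, which the slides do not state; the function name `frobenius_number` is hypothetical.

```python
from functools import reduce
from math import gcd


def frobenius_number(lengths):
    """Largest integer that is not a non-negative integer combination of
    `lengths` (returns -1 if every non-negative integer is representable)."""
    if reduce(gcd, lengths) != 1:
        raise ValueError("gcd must be 1, otherwise infinitely many values are missed")
    smallest = min(lengths)
    reachable = [True]   # reachable[k]: k is representable; 0 always is
    largest_gap = -1
    run = 1              # consecutive representable values ending at k
    k = 0
    # Once `smallest` consecutive values are representable, every later value
    # is too: keep adding `smallest` to one of them.
    while run < smallest:
        k += 1
        ok = any(k >= c and reachable[k - c] for c in lengths)
        reachable.append(ok)
        if ok:
            run += 1
        else:
            run = 0
            largest_gap = k
    return largest_gap


g = frobenius_number([3, 5])
```

With cycle lengths 3 and 5 the result is 7, so every total length from 8 upwards can be realised. Beyond the Frobenius number the set of lengths has no gaps at all.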

Now assume that we would like to construct a small context-free grammar describing the same language. By small we actually mean with as few nonterminals as possible, and in Chomsky normal form (so all productions are of the form $A \to a$ or $A \to BC$). It is known that $O(n^{2/3})$ nonterminals are enough.
  M. Domaratzki, G. Pighizzini, and J. Shallit. Simulating finite automata with context-free grammars. Information Processing Letters, 84(6):339–344, 2002.

It also looks like a natural bound, right?

Theorem. For any NFA on $n$ states there exists an equivalent CFG in Chomsky normal form on $O(\sqrt{n \log n})$ nonterminals (and it can be found in polynomial time).
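To experiment with nonterminal counts, it helps to be able to check which lengths a unary grammar in Chomsky normal form generates. The following sketch does this by fixpoint iteration up to a length bound. The grammar encoding (a dict from nonterminal to a list of right-hand sides, each either the string "a" or a pair of nonterminals) and the function name `generated_lengths` are assumptions made for this illustration, not part of the talk.

```python
def generated_lengths(grammar, start, max_len):
    """Set of k <= max_len such that `start` derives a^k.

    grammar: dict mapping each nonterminal to a list of right-hand sides,
    each either "a" (rule A -> a) or a tuple (B, C) (rule A -> BC).
    """
    lengths = {A: set() for A in grammar}
    changed = True
    while changed:  # terminates: the sets only grow and are bounded by max_len
        changed = False
        for A, rules in grammar.items():
            for rhs in rules:
                if rhs == "a":
                    new = {1} if max_len >= 1 else set()
                else:
                    B, C = rhs
                    new = {i + j
                           for i in lengths[B]
                           for j in lengths[C]
                           if i + j <= max_len}
                if not new <= lengths[A]:
                    lengths[A] |= new
                    changed = True
    return lengths[start]


# Repeated doubling: 4 nonterminals suffice to derive exactly a^8.
doubling = {
    "A0": ["a"],
    "A1": [("A0", "A0")],
    "A2": [("A1", "A1")],
    "A3": [("A2", "A2")],
}
result = generated_lengths(doubling, "A3", 20)
```

The doubling example shows why the number of nonterminals can be far smaller than the lengths involved: each extra nonterminal can double the length of the derived word.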

It turns out that the difficulty is in describing all paths of length at most $2n^2$ which intersect just long cycles (where long means longer than $b$). We cut each such path into fragments of length $b$ and try to construct shortcuts.

[Figure: a path from $q_0$ to $q_f$ cut into consecutive blocks of length $b$.]

Choose a subset of vertices so that at least one vertex in each block is marked. Then add a shortcut of length $\le 2b$ between each pair of neighbouring blocks. States in a single block are different (otherwise the path would intersect a cycle of length at most $b$)!
