  1. Words and Automata, Lecture 4: Ergodic sources and compression. Dominique Perrin, 20 October 2012. Outline: Ergodic sources; Statistics on words.

  2. Ergodic sources. Consider a source X = (X_1, X_2, ..., X_n, ...) on the alphabet A associated to a probability distribution π. Given a word w = a_1 ··· a_n on A, denote by f_N(w) the frequency of occurrences of the word w in the first N terms of the sequence X. We say that the source X is ergodic if for any word w, the sequence f_N(w) tends almost surely to π(w). An ergodic source is stationary. The converse is not true, as shown by the following example. Example: let us consider again the distribution of the first Example. This distribution is stationary. We have f_N(b) = 1 when the source outputs only b's, although the probability of b is 1/2. Thus, this source is not ergodic.
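
As an illustration of the definition, here is a minimal Python sketch that estimates f_N(w) on a sample; the helper name empirical_frequency and the fair-coin source on {a, b} are illustrative choices, not part of the lecture.

    import random

    def empirical_frequency(sequence, w):
        # fraction of starting positions at which the word w occurs in the sample
        n, k = len(sequence), len(w)
        if n < k:
            return 0.0
        hits = sum(1 for i in range(n - k + 1) if sequence[i:i + k] == w)
        return hits / (n - k + 1)

    # X: first N outputs of a Bernoulli source with pi(a) = pi(b) = 1/2
    X = "".join(random.choice("ab") for _ in range(100000))
    print(empirical_frequency(X, "ab"))   # close to pi(ab) = 1/4 for large N

For an ergodic source this estimate converges almost surely to π(w) as N grows.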

  3. Example: consider the distribution of the second Example (Thue–Morse). This source is ergodic. Indeed, the definition of π implies that the frequency f_N(w) of any factor w in the Thue–Morse word tends to π(w).
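
The convergence can be observed directly on a long prefix. The sketch below is my own illustration; it generates the Thue–Morse word by iterating the morphism a → ab, b → ba and measures factor frequencies.

    def thue_morse_prefix(n_iter):
        # iterate the Thue-Morse morphism a -> ab, b -> ba starting from a
        w = "a"
        for _ in range(n_iter):
            w = "".join("ab" if c == "a" else "ba" for c in w)
        return w

    t = thue_morse_prefix(16)               # prefix of length 2^16
    for w in ("a", "ab", "aa"):
        hits = sum(1 for i in range(len(t) - len(w) + 1) if t[i:i + len(w)] == w)
        print(w, hits / (len(t) - len(w) + 1))
    # the estimates are close to 1/2, 1/3 and 1/6, the limit frequencies of these factors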

  4. It can be proved that any Bernoulli source is ergodic. This implies in particular the statement known as the strong law of large numbers: if the sequence X = (X_1, X_2, ..., X_n, ...) is independent and identically distributed then, setting S_n = X_1 + ··· + X_n, the sequence (1/n) S_n converges almost surely to the common value E(X_i). More generally, any irreducible Markov chain equipped with its stationary distribution as initial distribution is an ergodic source.
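
The strong law of large numbers is easy to observe numerically; a small sketch, assuming a Bernoulli source of 0/1 values with an arbitrarily chosen parameter p:

    import random

    p, n = 0.3, 200000                      # illustrative parameters
    S_n = sum(1 if random.random() < p else 0 for _ in range(n))
    print(S_n / n)                          # almost surely close to E(X_i) = p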

  5. Ergodic sources have the important property that typical messages of the same length have approximately the same probability, which is 2^{-nH} where H is the entropy of the source. Let us give a more precise formulation of this property, known as the asymptotic equirepartition property. Let (X_1, X_2, ...) be an ergodic source with entropy H. Then for any ε > 0 there is an N such that for all n ≥ N, the set of words of length n is the union of two sets R and T satisfying (i) π(R) < ε, and (ii) for each w ∈ T, 2^{-n(H+ε)} < π(w) < 2^{-n(H-ε)}, where π denotes the probability distribution on A^n defined by π(a_1 a_2 ··· a_n) = P(X_1 = a_1, ..., X_n = a_n).
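
The statement can be checked numerically for a Bernoulli source on A = {a, b}, since π(w) depends only on the number of a's in w. In the sketch below the parameter p, the tolerance eps and the test lengths are arbitrary illustrative choices; it measures the mass of the atypical set R and counts the typical words T.

    from math import exp, lgamma, log2

    p, eps = 0.3, 0.1                                 # illustrative source and tolerance
    H = -p * log2(p) - (1 - p) * log2(1 - p)          # entropy, about 0.881 bits per symbol

    def split(n):
        # return pi(R) and Card(T) when A^n is split into atypical (R) and typical (T) words
        R_mass, T_count = 0.0, 0
        for k in range(n + 1):                        # k = number of a's in the word
            log2_pw = k * log2(p) + (n - k) * log2(1 - p)   # log2 pi(w), same for all such w
            n_words = exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1))  # binomial(n, k)
            if -n * (H + eps) < log2_pw < -n * (H - eps):
                T_count += n_words
            else:
                R_mass += n_words * 2 ** log2_pw
        return R_mass, T_count

    for n in (50, 200, 1000):
        R_mass, T_count = split(n)
        print(n, round(R_mass, 4), T_count <= 2 ** (n * (H + eps)))
    # pi(R) shrinks toward 0 as n grows, and Card(T) stays below 2^{n(H+eps)}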

  6. Thus the set of messages of length n is partitioned into a set R of negligible probability and a set T of “typical” messages, all having probability approximately 2^{-nH}. Since π(w) ≥ 2^{-n(H+ε)} for w ∈ T, the number of typical messages satisfies Card(T) ≤ 2^{n(H+ε)}. This observation allows us to see that the entropy gives a lower bound for the compression of a text. Indeed, if the messages of length n are coded unambiguously by binary messages of average length ℓ, then ℓ/n ≥ H − ε, since otherwise two different messages would have the same coding. On the other hand, any coding assigning distinct binary words of length n(H+ε) to the typical messages and arbitrary values to the other messages gives a coding of compression rate approximately equal to H.

  7. It is interesting in practice to have compression methods which are universal, in the sense that they do not depend on a particular source. Some of these methods nevertheless achieve asymptotically, for all ergodic sources, the theoretical lower bound given by the entropy. We sketch here one of these methods among many, the Ziv–Lempel encoding algorithm. We consider for a word w the factorization w = x_1 x_2 ··· x_m u where (1) for each i = 1, ..., m, the word x_i is chosen as short as possible and not in the set {x_0, x_1, x_2, ..., x_{i-1}}, with the convention x_0 = ε, and (2) the word u is a prefix of some x_i. This factorization is called the Ziv–Lempel factorization of w.
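
A direct transcription of this definition in Python (a sketch of my own) simply scans for the shortest new factor; the possible leftover u is returned as the last element when the text ends inside an already known factor.

    def zl_factorize(w):
        # repeatedly take the shortest prefix of the remaining text that is not yet a factor
        factors, seen, i = [], {""}, 0
        while i < len(w):
            j = i + 1
            while j <= len(w) and w[i:j] in seen:
                j += 1
            x = w[i:j]              # shortest new word, or the final leftover u
            factors.append(x)
            seen.add(x)
            i = j
        return factors

    print(zl_factorize("abaababaabaababaababa"))    # a prefix of the Fibonacci word
    # ['a', 'b', 'aa', 'ba', 'baa', 'baab', 'ab', 'aab', 'aba']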

  8. For example, the Fibonacci word has the factorization (a)(b)(aa)(ba)(baa)(baab)(ab)(aab)(aba)··· The coding of the word w is the sequence (n_1, a_1), (n_2, a_2), ..., (n_m, a_m) where n_1 = 0 and x_1 = a_1, and for each i = 2, ..., m, we have x_i = x_{n_i} a_i, with n_i < i and a_i a letter. Writing each integer n_i in binary gives a coding of length approximately m log m bits. It can be shown that for any ergodic source, the quantity (m log m)/n tends almost surely to the entropy of the source. Thus this coding is an optimal universal coding.
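
The pairs can be produced in the same scan, since each new factor x_i is a previously produced factor extended by one letter. In this sketch of mine the leftover u is simply dropped, as it is treated separately on the slides.

    def zl_pairs(w):
        index = {"": 0}                     # factor -> its number i, with x_0 = epsilon
        pairs, i, count = [], 0, 0
        while i < len(w):
            j = i + 1
            while j <= len(w) and w[i:j] in index:
                j += 1
            if j > len(w):                  # leftover prefix u of some x_k: not coded
                break
            x = w[i:j]
            count += 1
            index[x] = count
            pairs.append((index[x[:-1]], x[-1]))    # (n_i, a_i) with x_i = x_{n_i} a_i
            i = j
        return pairs

    print(zl_pairs("abaababaabaababaababa"))
    # [(0, 'a'), (0, 'b'), (1, 'a'), (2, 'a'), (4, 'a'), (5, 'b'), (1, 'b'), (3, 'b'), (7, 'a')]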

  9. Practically, the coding of a word w uses a set D called the dictionary to maintain the set of words {x_1, ..., x_i}. We use a trie to represent the set D. We also suppose that the word ends with a final symbol, to avoid coding the last factor u.

     ZLencoding(w)
         ⊲ returns the Ziv–Lempel encoding c of w
         T ← NewTrie()
         (c, i) ← (ε, 0)
         while i < |w| do
             (ℓ, p) ← LongestPrefixInTrie(w, i)
             a ← w[i + ℓ]
             q ← NewVertex()
             Next(p, a) ← q                ⊲ updates the trie T
             c ← c · (p, a)                ⊲ appends (p, a) to c
             i ← i + ℓ + 1
         return c

     The result is a linear-time algorithm.
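
A Python rendering of this procedure, as a sketch: the end marker '$' and the representation of the trie as a dictionary mapping (vertex, letter) pairs to vertices are implementation choices of mine, not prescribed by the slide.

    def zl_encode(w):
        w = w + "$"                          # assumed end marker, never occurring in w
        next_vertex = {}                     # Next(p, a) -> q; vertex 0 is the root
        c, i, vertices = [], 0, 0
        while i < len(w):
            p, ell = 0, 0                    # LongestPrefixInTrie(w, i)
            while i + ell < len(w) and (p, w[i + ell]) in next_vertex:
                p = next_vertex[(p, w[i + ell])]
                ell += 1
            a = w[i + ell]
            vertices += 1                    # NewVertex()
            next_vertex[(p, a)] = vertices   # update the trie
            c.append((p, a))                 # append (p, a) to the code c
            i += ell + 1
        return c

On the Fibonacci prefix used above, zl_encode returns the same pairs as zl_pairs, followed by one final pair for the end marker.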

  10. The decoding is also simple. The important point is that there is no need to transmit the dictionary. Indeed, one builds it in the same way as it was built in the encoding phase. It is convenient this time to represent the dictionary as an array of strings.

     ZLdecoding(c)
         (w, i) ← (ε, 0)
         D[i] ← ε
         while c ≠ ε do
             (p, a) ← Current()            ⊲ returns the current pair in c
             Advance()
             y ← D[p]
             i ← i + 1
             D[i] ← ya                     ⊲ adds ya to the dictionary
             w ← wya
         return w
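
A matching decoder in the same sketch style: it rebuilds the dictionary D exactly as the encoder built the trie, so only the pair sequence is needed.

    def zl_decode(c):
        D = {0: ""}                          # D[0] = epsilon
        out, i = [], 0
        for p, a in c:                       # Current() / Advance() over the tokens
            y = D[p]
            i += 1
            D[i] = y + a                     # add ya to the dictionary
            out.append(y + a)                # w <- w y a
        return "".join(out).rstrip("$")      # drop the assumed end marker

    pairs = [(0, 'a'), (0, 'b'), (1, 'a'), (2, 'a'), (0, '$')]
    print(zl_decode(pairs))                  # abaaba

Combined with the zl_encode sketch above, zl_decode(zl_encode(w)) returns w for any word w that does not contain the chosen end marker.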

  11. The functions Current() and Advance() manage the sequence c, considering each pair as a token. The practical details of the implementation are delicate. In particular, it is advisable not to let the dictionary grow too large. One strategy consists in limiting the size of the input, encoding it by blocks. Another is to reset the dictionary once it has exceeded some prescribed size. In either case, the decoding algorithm must of course follow the same strategy.
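
One possible shape of the block strategy, reusing the zl_encode and zl_decode sketches above (the block size is an arbitrary illustrative value): both sides restart from an empty dictionary at every block boundary.

    def zl_encode_blocks(w, block_size=1 << 16):
        # encode each block independently so the dictionary never exceeds the block size
        return [zl_encode(w[i:i + block_size]) for i in range(0, len(w), block_size)]

    def zl_decode_blocks(blocks):
        return "".join(zl_decode(c) for c in blocks)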

  12. Unique ergodicity. We have seen that in some cases, given a formal language S, there exists a unique invariant measure with entropy equal to the topological entropy of the set S. In particular, this is true in the case of a regular set S recognized by an automaton with a strongly connected graph. In this case, the measure is also ergodic, since it is the invariant measure corresponding to an irreducible Markov chain. There are even cases in which there is a unique invariant measure supported by S. This is the so-called property of unique ergodicity. We will see below that this situation arises for the factors of fixed points of primitive morphisms.

  13. The Example of the Thue–Morse word is one illustration of this case. We got the result by an elementary computation. In the general case, one considers a morphism f : A* → A* that admits a fixed point u ∈ A^ω. Let M be the A × A matrix defined by M_{a,b} = |f(a)|_b, where |x|_a is the number of occurrences of the symbol a in the word x. We suppose the morphism f to be primitive, which by definition means that the matrix M itself is primitive. It is easy to verify that, for any n, the entry (M^n)_{a,b} is the number of occurrences of b in the word f^n(a).
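
The last claim is easy to check on a small example. The sketch below is my own verification code; it uses the Thue–Morse morphism a → ab, b → ba and compares the entries of M^n with letter counts in f^n(a).

    f = {"a": "ab", "b": "ba"}              # a primitive morphism, with M = [[1, 1], [1, 1]]
    A = sorted(f)

    def iterate(f, w, n):
        for _ in range(n):
            w = "".join(f[c] for c in w)
        return w

    def mat_mul(M, N):
        d = len(A)
        return [[sum(M[i][k] * N[k][j] for k in range(d)) for j in range(d)] for i in range(d)]

    M = [[f[a].count(b) for b in A] for a in A]         # M[a][b] = |f(a)|_b
    P = [[int(i == j) for j in range(len(A))] for i in range(len(A))]   # identity matrix
    for n in range(1, 7):
        P = mat_mul(P, M)                               # P = M^n
        assert P[0][1] == iterate(f, "a", n).count("b") # (M^n)_{a,b} = |f^n(a)|_b
    print("verified for n = 1, ..., 6")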
