  1. T-61.182 Information Theory and Machine Learning: Data Compression (Chapters 4-6), presented by Tapani Raiko, Feb 26, 2004

  2. Contents (Data Compression)
     Chap. 4: data = block,  lossy,    result = Shannon's source coding theorem
     Chap. 5: data = symbol, lossless, result = Huffman coding algorithm
     Chap. 6: data = stream, lossless, result = arithmetic coding algorithm

  3. Weighing Problem (What is information?)
     • 12 balls, all equal in weight except for one
     • A two-pan balance to use
     • Determine which is the odd ball and whether it is heavier or lighter
     • Use the balance as few times as possible!
     • The outcome of a random experiment is guaranteed to be most informative if the probability distribution over outcomes is uniform

  4. [Figure: an optimal weighing strategy as a decision tree. The first weighing puts balls 1, 2, 3, 4 against 5, 6, 7, 8; each subsequent weighing is chosen so that its three possible outcomes split the remaining hypotheses (1+, 1−, ..., 12+, 12−) as evenly as possible, and three weighings always identify the odd ball and whether it is heavier or lighter.]

  5. Definitions
     • Shannon information content: h(x = a_i) ≡ log_2 (1 / p_i)
     • Entropy: H(X) = Σ_i p_i log_2 (1 / p_i)
     • Both are additive for independent variables

       p        h(p) = log_2 (1/p)   H_2(p)
       0.001    10.0                 0.011
       0.01     6.6                  0.081
       0.1      3.3                  0.47
       0.2      2.3                  0.72
       0.5      1.0                  1.0

     [Plots: h(p) and the binary entropy H_2(p) as functions of p]
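
As a quick check of these definitions (not part of the original slides), the following Python sketch computes the information content h(p) and the binary entropy H_2(p) and reproduces the table above:

```python
import math

def info_content(p):
    """Shannon information content of an outcome with probability p, in bits."""
    return math.log2(1.0 / p)

def binary_entropy(p):
    """Entropy H_2(p) of a binary variable with P(1) = p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))

for p in (0.001, 0.01, 0.1, 0.2, 0.5):
    print(f"p = {p:<6}  h(p) = {info_content(p):5.2f}  H_2(p) = {binary_entropy(p):6.3f}")
```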

  6. Game of Submarine
     • One player hides a submarine in one square of an 8 by 8 grid
     • The other player tries to hit it
     [Figure: the 8 by 8 grid (columns 1-8, rows A-H) after a long run of misses (×) and the final hit on the submarine S]

       move #       1        2        32       48       49
       question     G3       B1       E5       F3       H3
       outcome      x = n    x = n    x = n    x = n    x = y
       P(x)         63/64    62/63    32/33    16/17    1/16
       h(x)         0.0227   0.0230   0.0443   0.0874   4.0
       Total info.  0.0227   0.0458   1.0      2.0      6.0

     • Compare to asking 6 yes/no questions about the location
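
A small Python sketch (not from the slides) replays such a game: 48 misses in a row followed by a hit with 16 squares still in play. The accumulated information content telescopes to exactly log_2 64 = 6 bits, the same as answering 6 yes/no questions:

```python
from math import log2

squares = 64
total_bits = 0.0
for move in range(1, 49):                 # 48 misses in a row
    remaining = squares - (move - 1)      # squares still possible before this shot
    p_miss = (remaining - 1) / remaining  # probability the submarine is elsewhere
    total_bits += log2(1 / p_miss)        # information gained from the miss
# 49th shot: a hit, with 16 candidate squares left (probability 1/16)
total_bits += log2(16)

print(f"total information after the hit: {total_bits:.4f} bits")   # 6.0000
```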

  7. Raw Bit Content
     • A binary name is given to each outcome of a random variable X
     • The length of the names would be log_2 |A_X| (assuming |A_X| happens to be a power of 2)
     • Define: the raw bit content of X is H_0(X) = log_2 |A_X|
     • Simply counts the possible outcomes; no compression yet
     • Additive: H_0(X, Y) = H_0(X) + H_0(Y)
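
A tiny illustration (not in the slides) of the raw bit content and its additivity, using a hypothetical second variable Y with two outcomes:

```python
from math import log2

def raw_bit_content(alphabet):
    """H_0(X) = log_2 |A_X|: bits needed to give every outcome its own binary name."""
    return log2(len(alphabet))

A_X = list("abcdefgh")                     # 8 outcomes
A_Y = ["heads", "tails"]                   # hypothetical second variable, 2 outcomes
pairs = [(x, y) for x in A_X for y in A_Y]

print(raw_bit_content(A_X))                # 3.0
print(raw_bit_content(A_Y))                # 1.0
print(raw_bit_content(pairs))              # 4.0 = H_0(X) + H_0(Y)
```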

  8. Lossy Compression
     • Let A_X = {a, b, c, d, e, f, g, h} and
       P_X = {1/4, 1/4, 1/4, 3/16, 1/64, 1/64, 1/64, 1/64}
     • The raw bit content is 3 bits (8 binary names)
     • If we are willing to run a risk of δ = 1/16 of not having a name for x, then we can get by with 2 bits (4 names)

       δ = 0            δ = 1/16
       x    c(x)        x    c(x)
       a    000         a    00
       b    001         b    01
       c    010         c    10
       d    011         d    11
       e    100         e    -
       f    101         f    -
       g    110         g    -
       h    111         h    -
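
The 2-bit code can be constructed mechanically: keep the four most probable outcomes and accept the dropped probability mass as the risk. A minimal Python sketch (not from the slides):

```python
from fractions import Fraction as F

P = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 4), "d": F(3, 16),
     "e": F(1, 64), "f": F(1, 64), "g": F(1, 64), "h": F(1, 64)}

# Keep the 4 most probable outcomes and give them 2-bit names;
# the risk delta is the probability mass left without a name.
kept = sorted(P, key=P.get, reverse=True)[:4]
code = {x: format(i, "02b") for i, x in enumerate(kept)}
risk = 1 - sum(P[x] for x in kept)

print(code)   # {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
print(risk)   # 1/16
```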

  9. [Figure: the outcomes of X ranked by their probability on a log_2 P(x) axis, showing the smallest sufficient subsets S_0 (all of a, ..., h) and S_1/16 (only a, b, c, d)]

  10. Essential Bit Content
     • Allow an error with probability δ
     • Choose the smallest sufficient subset S_δ such that P(x ∈ S_δ) ≥ 1 − δ (arrange the elements of A_X in order of decreasing probability and take enough from the beginning)
     • Define: the essential bit content of X is H_δ(X) = log_2 |S_δ|
     • Note that the raw bit content H_0 is the special case H_δ with δ = 0
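
A direct Python sketch of this definition (not part of the slides), applied to the eight-outcome ensemble from the lossy compression example:

```python
from math import log2

def essential_bit_content(probs, delta):
    """H_delta(X) = log_2 |S_delta|, where S_delta is the smallest set of
    outcomes whose total probability is at least 1 - delta."""
    total, size = 0.0, 0
    for p in sorted(probs, reverse=True):     # most probable outcomes first
        if total >= 1 - delta:
            break
        total += p
        size += 1
    return log2(size)

P = [1/4, 1/4, 1/4, 3/16, 1/64, 1/64, 1/64, 1/64]
print(essential_bit_content(P, 0))            # 3.0 (raw bit content)
print(essential_bit_content(P, 1/16))         # 2.0
```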

  11. [Figure: the essential bit content H_δ(X) as a function of the allowed probability of error δ; a staircase that drops from 3 bits (S_δ = {a, ..., h}) near δ = 0 down to 0 bits (S_δ = {a}) as δ grows]

  12. Extended Ensembles (Blocks)
     • Consider a tuple of N i.i.d. random variables
     • Denote by X^N the ensemble (X_1, X_2, ..., X_N)
     • Entropy is additive: H(X^N) = N H(X)
     • Example: N flips of a bent coin with p_0 = 0.9, p_1 = 0.1
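
A small Python check of the additivity claim (not in the slides), enumerating all outcomes of X^4 for the bent coin:

```python
from math import log2
from itertools import product

p = {"0": 0.9, "1": 0.1}                       # bent coin: p_0 = 0.9, p_1 = 0.1

def entropy(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return sum(q * log2(1 / q) for q in probs if q > 0)

N = 4
block_probs = []
for bits in product("01", repeat=N):           # all 2^N outcomes of X^N
    q = 1.0
    for b in bits:
        q *= p[b]
    block_probs.append(q)

print(entropy(p.values()))      # H(X)   ~ 0.469 bits
print(entropy(block_probs))     # H(X^4) ~ 1.877 bits = 4 * H(X)
```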

  13. [Figure: outcomes of the bent coin ensemble X^4 ranked by their probability on a log_2 P(x) axis, with the smallest sufficient subsets S_0.01 and S_0.1 marked; 0000 is the most probable string and 1111 the least probable]

  14. [Figure: the essential bit content H_δ(X^4) of the bent coin ensemble as a function of δ, for N = 4]

  15. [Figure: the essential bit content H_δ(X^10) of the bent coin ensemble as a function of δ, for N = 10]

  16. [Figure: the essential bit content per toss, (1/N) H_δ(X^N), as a function of δ for N = 10, 210, 410, 610, 810, 1010; as N grows the curves flatten out over most of the range of δ]

  17. Shannon's Source Coding Theorem
     Given ε > 0 and 0 < δ < 1, there exists a positive integer N_0 such that for N > N_0,
        | (1/N) H_δ(X^N) − H(X) | < ε.
     • Proof involves
        – Law of large numbers
        – Chebyshev's inequality
     [Figure: schematic of (1/N) H_δ(X^N) as a function of δ, starting at H_0(X) for δ = 0 and lying within the band between H(X) − ε and H(X) + ε for 0 < δ < 1 once N is large]
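
The theorem can be checked numerically for the bent coin. In the following sketch (not from the slides), H_δ(X^N) is computed exactly by grouping strings by their number of 1s, since all strings with r ones share the same probability:

```python
from math import comb, log2, ceil

def essential_bits_bent_coin(N, p1, delta):
    """H_delta(X^N) for N flips of a coin with P(1) = p1 < 0.5.
    Strings with fewer 1s are more probable, so fill S_delta in order of r."""
    target = 1 - delta
    mass, count = 0.0, 0
    for r in range(N + 1):                        # r = number of 1s in the string
        p_string = p1**r * (1 - p1)**(N - r)      # probability of one such string
        n_strings = comb(N, r)
        if mass + n_strings * p_string >= target:
            needed = ceil((target - mass) / p_string)   # partial group suffices
            return log2(count + needed)
        mass += n_strings * p_string
        count += n_strings
    return log2(count)

p1, delta = 0.1, 0.05
H = p1 * log2(1 / p1) + (1 - p1) * log2(1 / (1 - p1))    # H(X) ~ 0.469 bits
for N in (10, 100, 1000):
    print(N, essential_bits_bent_coin(N, p1, delta) / N, "approaches", H)
```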

  18. [Figure: fifteen random samples from X^100 (the bent coin, p_1 = 0.1) shown as strings of dots and 1s, with log_2 P(x) ranging from about −37 to −66; also shown are the all-0 string (log_2 P(x) = −15.2) and the all-1 string (log_2 P(x) = −332.1). Compare to H(X^100) = 46.9 bits.]

  19. Typicality
     • A string contains r 1s and N − r 0s
     • Consider r as a random variable (binomial distribution)
     • Mean and std: r ∼ N p_1 ± √(N p_1 (1 − p_1))
     • A typical string is one with r ≃ N p_1
     • In general, a typical string has information content within N [H(X) ± β]:
       log_2 (1 / P(x)) ≃ N Σ_i p_i log_2 (1 / p_i) = N H(X)
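
The following Python sketch (not from the slides) draws a few strings from the bent coin with N = 1000 and compares their information content to N H(X); typical strings have r close to N p_1 and log_2 1/P(x) close to N H(X):

```python
import random
from math import log2, sqrt

p1, N = 0.1, 1000
mean_r = N * p1
std_r = sqrt(N * p1 * (1 - p1))
H = p1 * log2(1 / p1) + (1 - p1) * log2(1 / (1 - p1))     # H(X) ~ 0.469 bits

random.seed(0)
for _ in range(5):
    r = sum(random.random() < p1 for _ in range(N))        # number of 1s in the string
    info = r * log2(1 / p1) + (N - r) * log2(1 / (1 - p1)) # log_2 1/P(x)
    print(f"r = {r:3d} (expect {mean_r:.0f} +- {std_r:.1f}),  "
          f"log_2 1/P(x) = {info:6.1f},  N*H(X) = {N * H:.1f}")
```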

  20. [Figure: anatomy of the typical set T, for N = 100 and N = 1000. Plotted against r: the number of strings n(r) = C(N, r), the information content log_2 P(x), and the total probability n(r) P(x) = C(N, r) p_1^r (1 − p_1)^(N − r), which concentrates around r ≃ N p_1]

  21. [Figure: outcomes of X^N ranked by their probability on a log_2 P(x) axis; the typical set T_Nβ is the band of width N β around −N H(X), lying between the single most probable string 000...0 and very improbable strings such as 111...1]
