Data Compression (Chapters 4-6), presented by Tapani Raiko, Feb 26, 2004



SLIDE 1

T-61.182 Information Theory and Machine Learning

Data Compression (Chapters 4-6)

presented by Tapani Raiko Feb 26, 2004

SLIDE 2

Contents (Data Compression)

      Chap.   Data     Lossy?     Result
      4       Block    Lossy      Shannon's source coding theorem
      5       Symbol   Lossless   Huffman coding algorithm
      6       Stream   Lossless   Arithmetic coding algorithm

SLIDE 3

Weighting Problem (What is information?)

  • 12 balls, all equal in weight except for one
  • Two-pan balance to use
  • Determine which is the odd ball and whether it is heavier or lighter
  • As few uses of the balance as possible!
  • The outcome of a random experiment is guaranteed to be most informative if the probability distribution over outcomes is uniform

SLIDE 4

[Figure: the weighing-problem solution as a three-level decision tree. Each node lists the hypotheses still possible (1+ meaning "ball 1 is heavy", 1− meaning "ball 1 is light") and which balls to weigh; the three branches of each weighing are left pan heavier, right pan heavier, and balance. The first weighing is 1 2 3 4 against 5 6 7 8, and every odd-ball hypothesis is identified in exactly three weighings.]

SLIDE 5

Definitions

  • Shannon information content:

    h(x = ai) ≡ log2(1 / pi)

  • Entropy:

    H(X) = Σi pi log2(1 / pi)

  • Both are additive for independent variables

      p      h(p)   H2(p)
      0.001  10.0   0.011
      0.01    6.6   0.081
      0.1     3.3   0.47
      0.2     2.3   0.72
      0.5     1.0   1.0

[Plots: h(p) = log2(1/p) and the binary entropy H2(p) as functions of p.]
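The table's values can be reproduced with a few lines of Python (a quick sketch, not part of the slides):

```python
import math

def h(p):
    """Shannon information content of an outcome of probability p, in bits."""
    return math.log2(1 / p)

def H2(p):
    """Binary entropy: H2(p) = p log2(1/p) + (1 - p) log2(1/(1 - p))."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

for p in (0.001, 0.01, 0.1, 0.2, 0.5):
    print(f"p = {p:<6}  h(p) = {h(p):5.1f}  H2(p) = {H2(p):.3f}")
```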

SLIDE 6

Game of Submarine

  • Player hides a submarine in one square of an 8 by 8 grid
  • Another player tries to hit it

[Figure: an 8×8 grid with columns A-H and rows 1-8; × marks each missed shot and S the square hiding the submarine.]

      move #       1        2        32       48       49
      question     G3       B1       E5       F3       H3
      outcome      x = n    x = n    x = n    x = n    x = y
      P(x)         63/64    62/63    32/33    16/17    1/16
      h(x)         0.0227   0.0230   0.0443   0.0874   4.0
      Total info.  0.0227   0.0458   1.0      2.0      6.0

  • Compare to asking 6 yes/no questions about the location
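The "Total info." row telescopes: multiplying the outcome probabilities of 48 misses and the final hit gives exactly 1/64, so the game yields the same 6 bits as the yes/no questions. A quick check (an illustration, not from the slides):

```python
import math

total = 0.0
squares_left = 64
for move in range(48):                  # 48 misses: outcome probability (n-1)/n
    total += math.log2(squares_left / (squares_left - 1))
    squares_left -= 1
total += math.log2(squares_left)        # the hit: probability 1/16, worth 4 bits
print(total)                            # ≈ 6 bits in total
```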
SLIDE 7

Raw Bit Content

  • A binary name is given to each outcome of a random variable X
  • The length of the names would be log2 |AX|
    (assuming |AX| happens to be a power of 2)

  • Define: The raw bit content of X is

    H0(X) = log2 |AX|

  • Simply counts the possible outcomes - no compression yet
  • Additive: H0(X, Y ) = H0(X) + H0(Y )
SLIDE 8

Lossy Compression

  • Let

    AX = {a, b, c, d, e, f, g, h}
    PX = {1/4, 1/4, 1/4, 3/16, 1/64, 1/64, 1/64, 1/64}

  • The raw bit content is 3 bits (8 binary names)

  • If we are willing to run a risk of δ = 1/16 of not having a name for x, then we can get by with 2 bits (4 names)

      δ = 0          δ = 1/16
      x   c(x)       x   c(x)
      a   000        a   00
      b   001        b   01
      c   010        c   10
      d   011        d   11
      e   100        e   −
      f   101        f   −
      g   110        g   −
      h   111        h   −

SLIDE 9

[Figure: log2 P(x) = −2 for a, b, c; −2.4 for d; −6 for e, f, g, h; the subsets S0 (all eight outcomes) and S1/16 (the top four) are marked.]

The outcomes of X ranked by their probability

SLIDE 10

Essential Bit Content

  • Allow an error with probability δ
  • Choose the smallest sufficient subset Sδ such that

    P(x ∈ Sδ) ≥ 1 − δ

    (arrange the elements of AX in order of decreasing probability and take enough from the beginning)

  • Define: The essential bit content of X is

Hδ(X) = log2 |Sδ|

  • Note that the raw bit content H0 is a special case of Hδ
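Hδ can be computed directly for the eight-outcome ensemble of the previous slides (a sketch, not from the slides; probabilities kept as exact fractions):

```python
import math
from fractions import Fraction

# The example ensemble: P = {1/4, 1/4, 1/4, 3/16, 1/64, 1/64, 1/64, 1/64}
P = [Fraction(1, 4)] * 3 + [Fraction(3, 16)] + [Fraction(1, 64)] * 4

def H_delta(P, delta):
    """log2 of the size of the smallest subset with probability >= 1 - delta."""
    target = 1 - Fraction(delta)
    total, size = Fraction(0), 0
    for p in sorted(P, reverse=True):       # take outcomes in decreasing probability
        if total >= target:
            break
        total += p
        size += 1
    return math.log2(size)

print(H_delta(P, 0))                  # 3.0: the raw bit content H0
print(H_delta(P, Fraction(1, 16)))    # 2.0: four names suffice
```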
SLIDE 11

Hδ(X)

[Plot: Hδ(X) is a staircase in δ, dropping from 3 bits toward 0; each step is labeled with the subset in use, from {a,b,c,d,e,f,g,h} down to {a}.]

The essential bit content as the function of allowed probability of error

SLIDE 12

Extended Ensembles (Blocks)

  • Consider a tuple of N i.i.d. random variables
  • Denote by XN the ensemble (X1, X2, . . . , XN)
  • Entropy is additive: H(XN) = NH(X)
  • Example: N flips of a bent coin: p0 = 0.9, p1 = 0.1
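Entropy additivity can be checked numerically for the bent coin by brute force over the 2^N outcomes of X^N (a sketch, not from the slides):

```python
import math

p1, N = 0.1, 4

def H2(p):
    """Binary entropy in bits."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Entropy of X^N computed directly from the joint distribution.
H_joint = 0.0
for x in range(2 ** N):
    r = bin(x).count("1")                   # number of 1s in this outcome
    P = p1 ** r * (1 - p1) ** (N - r)
    H_joint += P * math.log2(1 / P)

print(H_joint, N * H2(p1))                  # equal: H(X^N) = N H(X)
```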
SLIDE 13

[Figure: the outcomes of X^4 (0000; strings with one 1; with two 1s; with three 1s; 1111) ranked by log2 P(x), with the subsets S0.01 and S0.1 marked.]

Outcomes of the bent coin ensemble X4

SLIDE 14

Hδ(X4)

[Plot: Hδ(X^4) as a staircase function of δ, starting at 4 bits for δ = 0.]

Essential bit content of the bent coin ensemble X4

SLIDE 15

Hδ(X10)

[Plot: Hδ(X^10) as a staircase function of δ, starting at 10 bits for δ = 0.]

Essential bit content of the bent coin ensemble X10

SLIDE 16

(1/N) Hδ(X^N)

[Plot: (1/N) Hδ(X^N) against δ for N = 10, 210, 410, 610, 810, 1010; as N grows the staircases flatten toward the horizontal line at H(X).]

Essential bit content per toss

SLIDE 17

Shannon’s Source Coding Theorem

Given ε > 0 and 0 < δ < 1, there exists a positive integer N0 such that for N > N0,

    | (1/N) Hδ(X^N) − H(X) | < ε.

[Figure: for large N, the curve (1/N) Hδ(X^N) lies between H − ε and H + ε for all δ ∈ (0, 1); at δ = 0 it starts near H0(X).]

  • Proof involves
    – Law of large numbers
    – Chebyshev’s inequality
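The convergence can be seen numerically for the bent coin (p1 = 0.1). All strings with r ones are equiprobable, so the smallest δ-sufficient subset is built by taking r = 0, 1, 2, . . . in turn; a sketch, not from the slides:

```python
import math

p1, delta = 0.1, 0.1
H = p1 * math.log2(1 / p1) + (1 - p1) * math.log2(1 / (1 - p1))   # ≈ 0.469 bits

def Hdelta_per_symbol(N):
    """(1/N) H_delta(X^N) for the bent coin, taking strings in order of
    decreasing probability, i.e. increasing number of 1s (since p1 < 0.5)."""
    cum, size = 0.0, 0
    for r in range(N + 1):
        if cum >= 1 - delta:
            break
        size += math.comb(N, r)
        cum += math.comb(N, r) * p1 ** r * (1 - p1) ** (N - r)
    return math.log2(size) / N

for N in (10, 100, 1000):
    print(N, Hdelta_per_symbol(N))          # approaches H as N grows
```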

SLIDE 18

x (100 symbols)                                                                                       log2 P(x)

...1...................1.....1....1.1.......1........1...........1.....................1.......11...  −50.1
......................1.....1.....1.......1....1.........1.....................................1....  −37.3
........1....1..1...1....11..1.1.........11.........................1...1.1..1...1................1.  −65.9
1.1...1................1.......................11.1..1............................1.....1..1.11.....  −56.4
...11...........1...1.....1.1......1..........1....1...1.....1............1.........................  −53.2
..............1......1.........1.1.......1..........1............1...1......................1.......  −43.7
.....1........1.......1...1............1............1...........1......1..11........................  −46.8
.....1..1..1...............111...................1...............1.........1.1...1...1.............1  −56.4
.........1..........1.....1......1..........1....1..............................................1...  −37.3
......1........................1..............1.....1..1.1.1..1...................................1.  −43.7
1.......................1..........1...1...................1....1....1........1..11..1.1...1........  −56.4
...........11.1.........1................1......1.....................1.............................  −37.3
.1..........1...1.1.............1.......11...........1.1...1..............1.............11..........  −56.4
......1...1..1.....1..11.1.1.1...1.....................1............1.............1..1..............  −59.5
............11.1......1....1..1............................1.......1..............1.......1.........  −46.8
....................................................................................................  −15.2
1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111  −332.1

Some samples from X100. Compare to H(X100) = 46.9 bits.

SLIDE 19

Typicality

  • A string contains r 1s and N − r 0s
  • Consider r as a random variable (binomial distribution)
  • Mean and std: r ∼ N p1 ± √(N p1 (1 − p1))
  • A typical string is a one with r ≃ Np1
  • In general, the information content is within N [H(X) ± β]:

    log2(1 / P(x)) ≃ N Σi pi log2(1 / pi) = N H(X)
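The total probability of the typical set can be computed by summing over r (a sketch for the bent coin, not from the slides):

```python
import math

def prob_typical(N, p1, beta):
    """P(x in T_Nbeta): total probability of strings whose information
    content per symbol is within beta of the entropy H(X)."""
    H = p1 * math.log2(1 / p1) + (1 - p1) * math.log2(1 / (1 - p1))
    total = 0.0
    for r in range(N + 1):
        info = r * math.log2(1 / p1) + (N - r) * math.log2(1 / (1 - p1))
        if abs(info / N - H) <= beta:
            total += math.comb(N, r) * p1 ** r * (1 - p1) ** (N - r)
    return total

print(prob_typical(100, 0.1, 0.1))      # ≈ 0.76
print(prob_typical(1000, 0.1, 0.1))     # ≈ 1: typicality sharpens as N grows
```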

SLIDE 20

[Figure: four rows of panels for N = 100 (left) and N = 1000 (right). Top: the number of strings with r 1s, n(r) = C(N, r). Second: log2 P(x) against r. Third: the typical-set indicator T. Bottom: the total probability of strings with r 1s, n(r) P(x) = C(N, r) p1^r (1 − p1)^(N−r), which concentrates around r = N p1.]

Anatomy of the typical set T

SLIDE 21

[Figure: outcomes of X^N ranked by log2 P(x); the typical set TNβ clusters around log2 P(x) = −N H(X). Example outcomes:]

  • 0000000000000. . . 00000000000
  • 0001000000000. . . 00000000000
  • 0100000001000. . . 00010000000
  • 0000100000010. . . 00001000010
  • 1111111111110. . . 11111110111

Outcomes of XN ranked by their probability and the typical set TNβ

SLIDE 22

Shannon’s source coding theorem (verbal statement)

N i.i.d. random variables each with entropy H(X) can be compressed into more than NH(X) bits with negligible risk of information loss, as N → ∞; conversely if they are compressed into fewer than NH(X) bits it is virtually certain that information will be lost.

SLIDE 23

End of Chapter 4

      Chap.   Data     Lossy?     Result
      4       Block    Lossy      Shannon's source coding theorem
      5       Symbol   Lossless   Huffman coding algorithm
      6       Stream   Lossless   Arithmetic coding algorithm

SLIDE 24

Contents, Chap. 5: Symbol Codes

  • Lossless coding: shorter encodings to the more probable outcomes and longer encodings to the less probable
  • Practical to decode?
  • Best achievable compression?
  • Source coding theorem (symbol codes): the expected length L(C, X) ∈ [H(X), H(X) + 1)

  • Huffman coding algorithm
SLIDE 25

Definitions

  • A (binary) symbol code is a mapping from AX to {0, 1}+
  • c(x) is the codeword of x and l(x) its length
  • Extended code: c+(x1 x2 . . . xN) = c(x1) c(x2) . . . c(xN) (no punctuation)
  • A code C(X) is uniquely decodeable if no two distinct strings have the same encoding
  • A symbol code is called a prefix code if no codeword is a prefix of any other codeword (constraining to prefix codes doesn’t lose any performance)

SLIDE 26

Examples

AX = {a, b, c, d},  PX = {1/2, 1/4, 1/8, 1/8}

  • Using C0: c+(acdba) = 1000 0010 0001 0100 1000 = 10000010000101001000
  • Code C1 = {0, 101} is a prefix code, so it can be represented as a tree
  • Code C2 = {1, 101} is not a prefix code because 1 is a prefix of 101

      C0: ai  c(ai)  li
          a   1000   4
          b   0100   4
          c   0010   4
          d   0001   4

[Figure: the binary tree for C1 = {0, 101}.]

SLIDE 27

Expected length

  • Expected length L(C, X) of a symbol code C for ensemble X is

    L(C, X) = Σ_{x∈AX} P(x) l(x)

  • Bounded below by H(X) (for uniquely decodeable codes)
  • Equal to H(X) only if the codelengths equal the Shannon information contents: li = log2(1/pi)
  • Codelengths implicitly define a probability distribution {qi}:

    qi ≡ 2^(−li)
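For the running example ensemble PX = {1/2, 1/4, 1/8, 1/8} the codelengths {1, 2, 3, 3} match the Shannon information contents exactly, so L equals H (a quick check; the codewords of C3 are assumed to be {0, 10, 110, 111}, consistent with the next slide):

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
C3 = {"a": "0", "b": "10", "c": "110", "d": "111"}

H = sum(pi * math.log2(1 / pi) for pi in p.values())
L = sum(p[x] * len(C3[x]) for x in p)
q = {x: 2 ** -len(C3[x]) for x in p}    # implicit distribution q_i = 2^-l_i

print(H, L)                             # both 1.75: q matches p exactly
```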

SLIDE 28

Examples

  • L(C3, X) = 1.75 = H(X)
  • L(C4, X) = 2 > H(X)
  • L(C5, X) = 1.25 < H(X)

      C3: ai  c(ai)  pi   h(pi)  li
          a   0      1/2  1.0    1
          b   10     1/4  2.0    2
          c   110    1/8  3.0    3
          d   111    1/8  3.0    3

      C4 = {00, 01, 10, 11}    C5 = {0, 1, 00, 11}

SLIDE 29

Example

      C6: ai  c(ai)  pi   h(pi)  li
          a   0      1/2  1.0    1
          b   01     1/4  2.0    2
          c   011    1/8  3.0    3
          d   111    1/8  3.0    3

  • L(C6, X) = 1.75 = H(X)
  • C6 is not a prefix code but is in fact uniquely decodable
SLIDE 30

Kraft Inequality

  • If a code is uniquely decodeable, its lengths must satisfy

    Σi 2^(−li) ≤ 1

  • For any lengths satisfying the Kraft inequality, there exists a prefix code with those lengths

[Figure: the binary strings 0, 1; 00 . . . 11; 000 . . . 111; 0000 . . . 1111 drawn as a tree; a codeword of length l uses up a fraction 2^(−l) of the budget.]

The total symbol code budget
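Both directions are easy to check in code: compute the Kraft sum, and build a prefix code from any satisfying lengths by handing out intervals of the code budget in order of increasing length (a sketch, not from the slides):

```python
def kraft_sum(lengths):
    return sum(2 ** -l for l in lengths)

def prefix_code(lengths):
    """Construct a prefix code with the given lengths, assuming they satisfy
    the Kraft inequality: walk down the [0, 1) code budget, giving each
    codeword the next free interval of size 2^-l."""
    assert kraft_sum(lengths) <= 1
    codewords, f = [], 0.0
    for l in sorted(lengths):
        codewords.append(format(int(f * 2 ** l), f"0{l}b"))  # first l binary digits of f
        f += 2 ** -l
    return codewords

print(kraft_sum([1, 2, 3, 3]))      # 1.0: a complete code, no budget left over
print(prefix_code([1, 2, 3, 3]))    # ['0', '10', '110', '111']
```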

SLIDE 31

Source coding theorem for symbol codes

  • By setting

    li = ⌈log2(1/pi)⌉,

    where ⌈l⌉ denotes the smallest integer greater than or equal to l, we get (with Kraft’s inequality):

  • There exists a prefix code C with

    H(X) ≤ L(C, X) < H(X) + 1

  • Relative entropy DKL(p||q) measures how many bits per symbol are wasted:

    L(C, X) = Σi pi log2(1/qi) = H(X) + DKL(p||q)
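The theorem can be checked for any distribution by computing the Shannon codelengths directly (a sketch, not from the slides):

```python
import math

ps = [0.25, 0.25, 0.2, 0.15, 0.15]
ls = [math.ceil(math.log2(1 / p)) for p in ps]   # Shannon codelengths

assert sum(2 ** -l for l in ls) <= 1             # Kraft holds: a prefix code exists
H = sum(p * math.log2(1 / p) for p in ps)
L = sum(p * l for p, l in zip(ps, ls))
print(ls, H, L)                                  # H <= L < H + 1
```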

SLIDE 32

Huffman Coding Algorithm

  • 1. Take two least probable symbols in the alphabet
  • 2. Give them the longest codewords differing only in the last digit
  • 3. Combine them into a single symbol and repeat

[Figure: the Huffman procedure on the example below; the probabilities merge as 0.15 + 0.15 = 0.3, 0.25 + 0.2 = 0.45, 0.25 + 0.3 = 0.55, 0.45 + 0.55 = 1.0 over steps 1-4.]

      ai  pi    h(pi)  li  c(ai)
      a   0.25  2.0    2   00
      b   0.25  2.0    2   10
      c   0.2   2.3    2   11
      d   0.15  2.7    3   010
      e   0.15  2.7    3   011
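A minimal Huffman implementation with a heap (a sketch, not the slides' code; d and e are merged first, reproducing the table's codelengths):

```python
import heapq
import itertools

def huffman(probs):
    """Huffman code: repeatedly pop the two least probable entries and merge
    them, prepending a distinguishing bit to each side's codewords."""
    tie = itertools.count()     # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.15}
code = huffman(probs)
L = sum(probs[s] * len(code[s]) for s in probs)
print(code, L)      # codelengths 2, 2, 2, 3, 3; expected length 2.3 bits
```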

SLIDE 33

Optimality

  • Huffman coding is optimal in two senses:
    – Smallest expected codelength of uniquely decodeable symbol codes
    – Prefix code → easy to decode
  • But:
    – The overhead of between 0 and 1 bits per symbol is important if H(X) is small → compress blocks of symbols to make H(X) larger
    – Does not take context into account (symbol code vs. stream code)

SLIDE 34

End of Chapter 5

      Chap.   Data     Lossy?     Result
      4       Block    Lossy      Shannon's source coding theorem
      5       Symbol   Lossless   Huffman coding algorithm
      6       Stream   Lossless   Arithmetic coding algorithm

SLIDE 35

Guessing Game

  • Human was asked to guess a sentence character by character
  • The numbers of guesses are listed below each character

T H E R E - I S - N O - R E V E R S E - O N - A - M O T O R C Y C L E -

1 1 1 5 1 1 2 1 1 2 1 1 15 1 17 1 1 1 2 1 3 2 1 2 2 7 1 1 1 1 4 1 1 1 1 1

  • One could encode only the string 1, 1, 1, 5, 1, . . .
  • Decoding requires an identical twin who also plays the guessing game

SLIDE 36

Arithmetic Coding (1/2)

  • Human predictor is replaced by a probabilistic model of the source
  • The model supplies a predictive distribution over the next symbol
  • It can handle complex adaptive models (context-dependent)
  • Binary strings define real intervals within the real line [0, 1)
  • The string 01 corresponds to [0.01, 0.10) in binary, or [0.25, 0.50) in base ten

[Figure: the line [0, 1) marked at 0.00, 0.25, 0.50, 0.75, 1.00, with the intervals of the strings 1, 01, and 01101.]

SLIDE 37

Arithmetic Coding (2/2)

  • Divide the real line [0, 1) into I intervals of lengths equal to the probabilities P(x1 = ai):

    a1 gets [0.00, P(x1 = a1)),
    a2 gets [P(x1 = a1), P(x1 = a1) + P(x1 = a2)),
    . . .
    aI gets [P(x1 = a1) + . . . + P(x1 = aI−1), 1.0)

[Figure 6.2: the partition of [0, 1) into the intervals for a1, a2, . . . , aI; the interval of a2 is subdivided in the same proportions into a2a1, . . . , a2aI.]

  • Pick an interval and subdivide it (and iterate)
  • Send a binary string whose interval lies within that interval
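The interval arithmetic of the steps above can be sketched in a few lines (fixed, non-adaptive symbol probabilities here for simplicity; not the slides' code):

```python
def interval_for(string, probs):
    """Shrink [0, 1) symbol by symbol: at each step keep the sub-interval
    assigned to the observed symbol, with sub-interval widths proportional
    to the symbol probabilities."""
    lo, hi = 0.0, 1.0
    for s in string:
        width = hi - lo
        cum = 0.0
        for sym, p in probs.items():
            if sym == s:
                lo, hi = lo + cum * width, lo + (cum + p) * width
                break
            cum += p
    return lo, hi

# With P(a) = P(b) = 0.5 the source interval of 'ab' is [0.25, 0.50),
# the same as the binary-string interval of '01' on the previous slide.
print(interval_for("ab", {"a": 0.5, "b": 0.5}))
```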
SLIDE 38

Example: Bent Coin (1/3)

  • Coin sides are a and b, and the ’end of file’ symbol is ✷
  • Use a Bayesian model with a uniform prior over probabilities of outcomes

      Context (sequence thus far)   Probability of next symbol
      (empty)                       P(a) = 0.425         P(b) = 0.425         P(✷) = 0.15
      b                             P(a | b) = 0.28      P(b | b) = 0.57      P(✷ | b) = 0.15
      bb                            P(a | bb) = 0.21     P(b | bb) = 0.64     P(✷ | bb) = 0.15
      bbb                           P(a | bbb) = 0.17    P(b | bbb) = 0.68    P(✷ | bbb) = 0.15
      bbba                          P(a | bbba) = 0.28   P(b | bbba) = 0.57   P(✷ | bbba) = 0.15
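The table's numbers are consistent with Laplace's rule scaled by a fixed end-of-file probability, i.e. P(a | s) = (1 − 0.15) · (Fa + 1)/(Fa + Fb + 2), where Fa and Fb count occurrences so far. This reconstruction is an assumption, but it reproduces every row (Python sketch):

```python
def predictive(F_a, F_b, p_eof=0.15):
    """Assumed model: Laplace's rule for the next coin face, with the
    remaining 0.15 of probability reserved for the end-of-file symbol."""
    p_a = (F_a + 1) / (F_a + F_b + 2)
    return ((1 - p_eof) * p_a, (1 - p_eof) * (1 - p_a), p_eof)

print(predictive(0, 0))   # (0.425, 0.425, 0.15): the empty context
print(predictive(0, 1))   # ≈ (0.283, 0.567, 0.15): table's 0.28 / 0.57 after b
print(predictive(0, 2))   # ≈ (0.2125, 0.6375, 0.15): 0.21 / 0.64 after bb
print(predictive(1, 3))   # ≈ (0.283, 0.567, 0.15): 0.28 / 0.57 after bbba
```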

SLIDE 39

[Figure 6.4. Illustration of the arithmetic coding process as the sequence bbba✷ is transmitted: the source interval is narrowed through b, bb, bbb, bbba to bbba✷, and the binary string 100111101, whose interval lies inside it, is sent.]

SLIDE 40

[Figure: the intervals of all source strings up to length four (a, b, ✷; aa, ab, a✷; . . . ; bbbb) laid alongside the intervals of all binary strings up to length five (0, 1; 00 . . . 11; . . . ; 11111).]

SLIDE 41

On Arithmetic Coding

  • Computationally efficient
  • Length of a string closely matches the Shannon information content
  • Overhead required to terminate a message is never more than 2 bits
    ⇒ Finding a good coding is equivalent to finding a good probabilistic model!
  • Flexible:
    – any source alphabet and any encoded alphabet
    – alphabets can change with time
    – probabilities are context-dependent
  • Can be used to generate random samples from random bits economically

SLIDE 42

Lempel-Ziv Coding

  • Used in gzip etc.

      source substrings   λ     1      0      11      01      010      00       10
      s(n)                000   001    010    011     100     101      110      111
      (pointer, bit)            (, 1)  (0, 0) (01, 1) (10, 1) (100, 0) (010, 0) (001, 0)

  • Asymptotically compresses down to the entropy of the source (not in practice)
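A minimal LZ78-style parser (a sketch; the source string 1011010100010 is an assumption, chosen so the parse matches the table above):

```python
def lz_parse(bits):
    """Split the source into the shortest substrings not seen before; each
    new substring is emitted as (index of its longest known prefix, final
    bit), which is all the decoder needs to rebuild the dictionary."""
    index = {"": 0}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur not in index:
            index[cur] = len(index)
            out.append((index[cur[:-1]], cur[-1]))
            cur = ""
    return out

print(lz_parse("1011010100010"))
# [(0, '1'), (0, '0'), (1, '1'), (2, '1'), (4, '0'), (2, '0'), (1, '0')]
```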

SLIDE 43

Summary (1/2)

  • Fixed-length block codes (Chapter 4)
    – Only a tiny fraction of source strings are given an encoding
    – Identify entropy as the measure of compressibility
    – No practical use
  • Symbol codes (Chapter 5)
    – Variable code lengths allow lossless compression
    – Expected code length is H + DKL (between the source distribution and the code’s implicit distribution)
    – DKL can be made smaller than 1 bit per symbol
    – Huffman code is the optimal symbol code

SLIDE 44

Summary (2/2)

  • Stream codes (Chapter 6)
    – Arithmetic coding combines a probabilistic model with an encoding algorithm
    – Lempel-Ziv memorises strings that have already occurred
    – If any of the bits is altered by noise, the rest of the encoding fails

SLIDE 45

Exercises

  • 6.19 (entropy and information)
  • 4.16 (Shannon source coding theorem)
  • 6.16 (Huffman coding)
  • 6.7 (Arithmetic coding)