Coding and Data Compression - Mathias Winther Madsen




SLIDE 1

Coding and Data Compression

Mathias Winther Madsen mathias.winther@gmail.com

Institute for Logic, Language, and Computation University of Amsterdam

March 2015

SLIDE 2

Information Theory

M   E

$2^{H_y(x) T}$ REASONABLE CAUSES FOR EACH E

$2^{H_x(y) T}$ REASONABLE EFFECTS FOR EACH M

Claude Shannon: “A Mathematical Theory of Communication,” Bell System Technical Journal, 1948.

SLIDE 3

Information Theory

THE COIEF DIFFIOULTY ALOCE FOUOD OT FIRST WAS IN OAOAGING HER FLAOINGO: SHE SUCCEODEO ON GO OTIOG IOS BODY OUOKEO AOAO, COMFOROABLY EOOOGO, UNDER OER O OM, WITO OTS O O OS HANGIOG DOO O, BOT OENEOAO OY, OUST AS SO O HOD OOT OTS O OCK NOCEO O SOROIGHTEOEO O OT, ANO WOS O O ONG TO OIOE TO O HEDGEHOG O OLOW WOTH ITS O OAD, O O WOULO TWOST O OSEOF OOUO O ANO O O OK OP IN HOR OACO, O OTO OUO O A O O OZOED EO OREOSOOO O O O O SHO COUOD O O O O O O O O O OSO O OG O O O OAO OHO O O: AOD WHON O O O OAO OOO O O O O O O O DOO O, O OD O OS GOIOG O O BO O ON O O OIO, O O O OS O O OY O OOOOO O O O O O O O O O O O OT TO O OEOGO O O O O OD O OROLO O O O O O O OF, O O O O O O O O OHO O O O O O O O O O O O O O O O O O O

SLIDE 4

The Hartley Measure

Definition: The Hartley Measure of Uncertainty

H = log2 |Ω|.

Ralph V. L. Hartley: “Transmission of Information,” Bell System Technical Journal, 1928.

SLIDE 5

The Hartley Measure

♠♣♥♦ ♣♠♥♦ ♠♣♦♥ ♣♠♦♥ ♠♥♣♦ ♣♥♠♦ ♠♦♣♥ ♣♦♠♥ ♠♥♦♣ ♣♥♦♠ ♠♦♥♣ ♣♦♥♠ ♠♣♦♥ ♣♠♦♥ ♠♣♥♦ ♣♠♥♦ ♠♦♣♥ ♣♦♠♥ ♠♥♣♦ ♣♥♠♦ ♠♦♥♣ ♣♦♥♠ ♠♥♦♣ ♣♥♦♠

H = log2 24 = 4.58

SLIDE 6

The Hartley Measure

00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111

H = log2 24 = 4.58
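A quick check of the codewords on this slide: with a fixed-length binary code, 24 equally likely outcomes need ⌈log2 24⌉ = 5 binary digits, slightly more than the Hartley measure of 4.58. The sketch below is only an illustration; the names `n_outcomes`, `width`, and `codewords` are assumptions, not part of the slides.

```python
from math import ceil, log2

n_outcomes = 24                      # size of the outcome set, |Ω|
hartley = log2(n_outcomes)           # Hartley measure in bits
width = ceil(hartley)                # digits needed by a fixed-length binary code

# Number the outcomes 0..23 and write each number with `width` binary digits.
codewords = [format(i, f"0{width}b") for i in range(n_outcomes)]

print(f"H = {hartley:.2f} bits, fixed-length codewords use {width} digits")
print(codewords[0], "...", codewords[-1])
```

Only 24 of the 32 possible five-digit words are used, which is the gap between 4.58 and 5.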

SLIDE 7

The Hartley Measure

♠♣♥♦ ♣♠♥♦ ♠♣♦♥ ♣♠♦♥ ♠♥♣♦ ♣♥♠♦ ♠♦♣♥ ♣♦♠♥ ♠♥♦♣ ♣♥♦♠ ♠♦♥♣ ♣♦♥♠ ♠♣♦♥ ♣♠♦♥ ♠♣♥♦ ♣♠♥♦ ♠♦♣♥ ♣♦♠♥ ♠♥♣♦ ♣♥♠♦ ♠♦♥♣ ♣♦♥♠ ♠♥♦♣ ♣♥♦♠

H = log2 24 = 4.58

SLIDE 8

The Hartley Measure

♠♣♥♦  –  –  –
♠♥♣♦  –  –  –
♠♥♦♣  –  –  –
♠♣♦♥  –  –  –
♠♦♣♥  –  –  –
♠♦♥♣  –  –  –

H = log2 6 = 2.58

SLIDE 9

The Hartley Measure

000  –  –  –
001  –  –  –
010  –  –  –
011  –  –  –
100  –  –  –
101  –  –  –

H = log2 6 = 2.58

SLIDE 10

The Hartley Measure

♠♣♥♦  –  –  –
♠♥♣♦  –  –  –
♠♥♦♣  –  –  –
♠♣♦♥  –  –  –
♠♦♣♥  –  –  –
♠♦♥♣  –  –  –

H = log2 6 = 2.58

SLIDE 11

The Hartley Measure

–     –  –  –
♠♥♣♦  –  –  –
♠♥♦♣  –  –  –
–     –  –  –
–     –  –  –
–     –  –  –

H = log2 2 = 1.00

SLIDE 12

The Hartley Measure

–     –  –  –
♠♥♣♦  –  –  –
–     –  –  –
–     –  –  –
–     –  –  –
–     –  –  –

H = log2 1 = 0.00
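The sequence of slides above can be replayed numerically: each revealed card removes the orderings that are inconsistent with it, and the Hartley measure drops from 4.58 to 2.58 to 1.00 to 0.00. This is a minimal sketch under the assumption that the cards are revealed left to right; the function name `hartley_given_prefix` is hypothetical.

```python
from itertools import permutations
from math import log2

SUITS = "♠♣♥♦"

def hartley_given_prefix(prefix, suits=SUITS):
    """Hartley measure of the orderings still consistent with a known prefix."""
    consistent = [
        "".join(p) for p in permutations(suits) if "".join(p).startswith(prefix)
    ]
    return log2(len(consistent))

# Reveal the cards of the ordering ♠♥♣♦ one at a time.
for known in ["", "♠", "♠♥", "♠♥♣"]:
    print(f"known prefix {known!r}: H = {hartley_given_prefix(known):.2f}")
```

The third reveal already determines the fourth card, so the uncertainty reaches zero before the last card is shown.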

SLIDE 13

The Hartley Measure

H = log k ? H = log(∞) ?

SLIDE 14

Entropy

The Shannon Entropy

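$$H = E\left[\log \frac{1}{p(X)}\right] = \sum_x p(x) \log \frac{1}{p(x)}.$$

[Figure: two plots over x = 1, 2, 3 with vertical ticks at 0.2, 0.4, 0.6; the first shows p(x), the second is labeled −log p(x) p(x).]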
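As a sketch of the definition, the entropy can be computed directly as the expected value of log 1/p(X), here with base-2 logarithms. The function name `entropy` and the example distributions are assumptions chosen for illustration; they are not the distribution plotted on the slide.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: sum of p * log2(1/p) over outcomes with p > 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
uniform_24 = [1 / 24] * 24      # equally likely outcomes
certain = [1.0]

print(entropy(fair_coin))       # one bit of uncertainty
print(entropy(uniform_24))      # agrees with the Hartley measure log2(24)
print(entropy(certain))         # no uncertainty
```

For equally likely outcomes the sum reduces to log2 |Ω|, so the Hartley measure is the special case of a uniform distribution.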

SLIDE 15

Entropy

[Figure: plot of the entropy H, with axis ticks at 0.5 and 1.]

SLIDE 16

Entropy

SLIDE 17

Entropy

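[Figure: plot of H against p, with p from 0 to 1 and H ticks at 2, 4, 6, next to a diagram of states 1, 2, 3, . . . with branches labeled p and 1 − p.]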

SLIDE 18

Entropy

Properties of the entropy

1. Non-negative: H ≥ 0.
2. Decomposes: H(X × Y) = H(X) + H(Y | X).
3. Reduced (on average) by information: H(X) ≥ H(X | Y).

Definition: Conditional Entropy

$$H(X \mid Y) = E_Y\left[ H(X \mid Y = y) \right] = \sum_y p(y)\, H(X \mid Y = y)$$
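The decomposition and the "reduced on average" property can be checked numerically on a small joint distribution. The sketch below uses a made-up joint table `joint` over pairs (x, y); the table and the helper names `marginal` and `conditional_entropy` are assumptions for illustration only.

```python
from collections import defaultdict
from math import log2

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal distribution of coordinate `axis` (0 for X, 1 for Y)."""
    m = defaultdict(float)
    for pair, p in joint.items():
        m[pair[axis]] += p
    return dict(m)

def conditional_entropy(joint, given_axis):
    """Entropy of the other coordinate given `given_axis`: sum_g p(g) H(. | g)."""
    total = 0.0
    for g, p_g in marginal(joint, given_axis).items():
        if p_g > 0:
            cond = [p / p_g for pair, p in joint.items() if pair[given_axis] == g]
            total += p_g * entropy(cond)
    return total

# Hypothetical joint distribution p(x, y) in which X and Y are correlated.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

H_xy = entropy(joint.values())
H_x = entropy(marginal(joint, 0).values())
H_y_given_x = conditional_entropy(joint, given_axis=0)
H_x_given_y = conditional_entropy(joint, given_axis=1)

print(f"H(X x Y) = {H_xy:.4f},  H(X) + H(Y | X) = {H_x + H_y_given_x:.4f}")
print(f"H(X) = {H_x:.4f} >= H(X | Y) = {H_x_given_y:.4f}")
```

Note that the inequality holds for the average over y; for a particular value y, H(X | Y = y) can exceed H(X).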

SLIDE 19

Huffman Coding

x            a     b     c     d     e
Pr{X = x}    .05   .15   .20   .25   .35

David A. Huffman: “A Method for the Construction of Minimum-Redundancy Codes,” Proceedings of the Institute of Radio Engineers, 1952.
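A Huffman code for the five-symbol distribution on this slide can be built by repeatedly merging the two least probable groups. The following is a minimal sketch of that procedure, not necessarily the construction shown in the talk; the function name `huffman_code` and the 0/1 labeling of branches are assumptions, so the exact codewords may differ from the slides even though the codeword lengths agree with any Huffman code for this distribution.

```python
import heapq
from itertools import count
from math import log2

def huffman_code(probs):
    """Return a dict symbol -> binary codeword built by Huffman's merging rule."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # least probable group
        p1, _, group1 = heapq.heappop(heap)   # second least probable group
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.05, "b": 0.15, "c": 0.20, "d": 0.25, "e": 0.35}
code = huffman_code(probs)

expected_length = sum(probs[s] * len(w) for s, w in code.items())
entropy = sum(p * log2(1 / p) for p in probs.values())

for sym in sorted(code):
    print(sym, code[sym])
print(f"expected length = {expected_length:.2f} bits, entropy = {entropy:.2f} bits")
```

The two rarest symbols, a and b, receive three-digit codewords and the rest receive two digits, so the expected length stays within one bit of the entropy.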

SLIDE 20

Huffman Coding

SLIDE 21

Huffman Coding

x    Code           p       − log p    k
A    1001           .0634    3.98       4
B    011101         .0135    6.21       6
C    00011          .0242    5.37       5
D    10100          .0321    4.96       5
E    001            .0980    3.35       3
F    101111         .0174    5.84       6
G    101011         .0165    5.92       6
H    11011          .0438    4.51       5
I    0110           .0552    4.18       4
J    011100000      .0009   10.17       9
K    0111001        .0061    7.35       7
L    10110          .0336    4.89       5
M    101110         .0174    5.85       6
N    0101           .0551    4.18       4
O    1000           .0622    4.01       4
P    110100         .0180    5.80       6
Q    0111000100     .0008   10.33      10
R    0000           .0470    4.41       4
S    0100           .0502    4.32       4
T    1100           .0729    3.78       4
U    00010          .0234    5.42       5
V    0111110        .0075    7.06       7
W    011110         .0156    6.00       6
X    011100001      .0014    9.46       9
Y    101010         .0160    5.97       6
Z    01110001011    .0005   11.04      11
¶    0111111        .0084    6.89       7
_    111            .1741    2.52       3
’    011100011      .0019    9.06       9
,    1101011        .0117    6.42       7
.    1101010        .0109    6.52       7
?    01110001010    .0003   11.56      11
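The table can be checked mechanically: the codewords should form a prefix code, and the average codeword length Σ p·k should be at least the entropy Σ p·(−log2 p) and less than one bit above it. The sketch below copies the symbol, codeword, and probability columns from the table; the variable names (`TABLE`, `kraft`, `avg_len`, `H`) are assumptions, and the probabilities are used as rounded on the slide.

```python
from math import log2

# Symbol, codeword, probability, copied from the table above.
TABLE = """
A 1001 .0634
B 011101 .0135
C 00011 .0242
D 10100 .0321
E 001 .0980
F 101111 .0174
G 101011 .0165
H 11011 .0438
I 0110 .0552
J 011100000 .0009
K 0111001 .0061
L 10110 .0336
M 101110 .0174
N 0101 .0551
O 1000 .0622
P 110100 .0180
Q 0111000100 .0008
R 0000 .0470
S 0100 .0502
T 1100 .0729
U 00010 .0234
V 0111110 .0075
W 011110 .0156
X 011100001 .0014
Y 101010 .0160
Z 01110001011 .0005
¶ 0111111 .0084
_ 111 .1741
’ 011100011 .0019
, 1101011 .0117
. 1101010 .0109
? 01110001010 .0003
"""

rows = [line.split() for line in TABLE.strip().splitlines()]
codes = {sym: word for sym, word, _ in rows}
probs = {sym: float(p) for sym, _, p in rows}

# No codeword may be a prefix of another codeword.
words = list(codes.values())
prefix_free = all(not v.startswith(u) for u in words for v in words if u != v)

kraft = sum(2.0 ** -len(w) for w in words)              # Kraft sum, at most 1
avg_len = sum(probs[s] * len(codes[s]) for s in codes)   # bits per character
H = sum(p * log2(1 / p) for p in probs.values())         # entropy in bits

print(f"prefix-free: {prefix_free}, Kraft sum: {kraft}")
print(f"average length = {avg_len:.3f} bits, entropy = {H:.3f} bits")
```

A Kraft sum of exactly 1 means the code tree has no unused leaves, which is what one expects from a Huffman construction.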