

slide-1
SLIDE 1

Information Theory and Communications

CSM25 Secure Information Hiding

Dr Hans Georg Schaathun

University of Surrey

Spring 2007

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 1 / 44

slide-2
SLIDE 2

Learning Outcomes

become familiar with fundamental concepts in communications: Entropy and Redundancy; Error-control coding; Compression

be able to link communications fundamentals to steganography

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 2 / 44

slide-3
SLIDE 3

Communications essentials Communications and Redundancy

Outline

1. Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction

2. Compression: Recollection; Huffman Coding; Huffman Steganography

3. Grammars

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 3 / 44

slide-4
SLIDE 4

Communications essentials Communications and Redundancy

The communications problem

Alice --m--> [Noisy channel] --m̂--> Bob

Bob's problem: estimate m, given the (partly) random output m̂ from the channel.

How much (un)certainty does Bob have about m? Information theory and Shannon entropy.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 4 / 44

slide-8
SLIDE 8

Communications essentials Communications and Redundancy

The communications problem

Alice --m--> [Enc.] --c--> [Noisy channel] --r--> [Dec.] --m̂--> Bob

Bob's problem: estimate m, given the (partly) random output m̂ from the channel.

How much (un)certainty does Bob have about m? Information theory and Shannon entropy.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 4 / 44

slide-11
SLIDE 11

Communications essentials Communications and Redundancy

Redundancy of English

Fact: The English language is more than 50% redundant.

t** p*oce*s o**hid**g *ata**nsid* o*her**ata. For ex*****, a **xt f*le c**ld*** hid*** "in**de"****im*ge or***s**nd *ile* By look****at t*e im*g***or list***** to th**s**nd,*yo* w*u*d n*t *no**that***ere is *x*ra info******* *r*sent.
(from http://www.cdt.org/crypto/glossary.shtml)

Message destroyed on the channel.

Redundancy allows Bob to determine the original m.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 5 / 44

slide-13
SLIDE 13

Communications essentials Communications and Redundancy

Redundancy of English

Fact: The English language is more than 50% redundant.

t*e p*oce*s o* hid**g *ata*insid* o*her*data. For ex*m***, a t*xt f*le c**ld*b* hidd** "ind*de" a**im*ge or*a*s*und *ile* By look**g*at t*e im*g*,*or list**in* to th* s**nd,*yo* w*uld n*t *no**that *here is *x*ra info*****on *r*sent.
(from http://www.cdt.org/crypto/glossary.shtml)

Message destroyed on the channel.

Redundancy allows Bob to determine the original m.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 5 / 44

slide-14
SLIDE 14

Communications essentials Communications and Redundancy

Redundancy of English

Fact: The English language is more than 50% redundant.

the process of hiding data inside other data. For example, a text file could be hidden "inside" an image or a sound file. By looking at the image, or listening to the sound, you would not know that there is extra information present.
(from http://www.cdt.org/crypto/glossary.shtml)

Message destroyed on the channel.

Redundancy allows Bob to determine the original m.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 5 / 44
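
The >50% figure can be made concrete with a small calculation. The sketch below is an editorial illustration, not part of the slides: it estimates the entropy of the empirical single-letter distribution of the glossary text above and compares it with log2(26) ≈ 4.70 bits. Single-letter statistics ignore dependence between letters, so this only yields a crude lower bound on the redundancy; Shannon's >50% figure requires higher-order statistics.

```python
import math
from collections import Counter

def letter_entropy(text):
    """Entropy (bits/letter) of the empirical single-letter distribution."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

sample = ("the process of hiding data inside other data for example a text "
          "file could be hidden inside an image or a sound file")
H1 = letter_entropy(sample)
# Redundancy relative to a uniform 26-letter alphabet; a crude lower bound,
# since dependence between consecutive letters adds further redundancy.
print(H1, 1 - H1 / math.log2(26))
```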

slide-15
SLIDE 15

Communications essentials Communications and Redundancy

Benefits of redundancy

Crossword puzzles
Understanding foreigners with imperfect pronunciation
(How much would you understand of a lecture without redundancy?)
Hearing in a noisy environment
Reading bad handwriting
(How could I mark exam scripts without redundancy?)
Cryptanalysis? Steganalysis?

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 6 / 44

slide-22
SLIDE 22

Communications essentials Communications and Redundancy

What if there were no redundancy?

No use for steganography! Any text would be meaningful; in particular, ciphertext would be meaningful.

Simple encryption would give a stegogramme indistinguishable from cover-text.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 7 / 44

slide-23
SLIDE 23

Communications essentials Communications and Redundancy

Problems in natural language

Natural languages are arbitrary: some words/sentences have a lot of redundancy, others have very little.

Unstructured: hard to automate correction.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 8 / 44

slide-27
SLIDE 27

Communications essentials Digital Communications

Outline

1. Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction

2. Compression: Recollection; Huffman Coding; Huffman Steganography

3. Grammars

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 9 / 44

slide-28
SLIDE 28

Communications essentials Digital Communications

Coding

Channel and source coding

Source coding (aka compression): remove redundancy; make a compact representation.

Channel coding (aka error-control coding): add mathematically structured redundancy; computationally efficient error correction; optimised (low error rate, small space).

Two aspects of information theory.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 10 / 44
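
A minimal channel-coding sketch (editorial, not from the slides): the 3-fold repetition code adds structured redundancy and corrects any single error per block by majority vote. Real codes achieve the same effect far more efficiently.

```python
def encode(bits, n=3):
    """Channel coding at its simplest: repeat each bit n times."""
    return [b for b in bits for _ in range(n)]

def decode(received, n=3):
    """Majority vote per block corrects up to (n - 1) // 2 errors."""
    return [int(sum(received[i:i + n]) > n // 2)
            for i in range(0, len(received), n)]

c = encode([1, 0, 1])   # [1, 1, 1, 0, 0, 0, 1, 1, 1]
c[1] ^= 1               # one channel error in the first block
print(decode(c))        # [1, 0, 1]: the error is corrected
```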

slide-34
SLIDE 34

Communications essentials Digital Communications

Channel and Source Coding

Message --> [Comp.] --> [Encrypt.] --> [Enc.] --> Channel --> [Dec.] --> [Decrypt.] --> [Decom.] --> Message

Compression removes redundancy; the channel encoder adds redundancy; encryption scrambles the compressed data in between.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 11 / 44

slide-35
SLIDE 35

Communications essentials Shannon Entropy

Outline

1. Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction

2. Compression: Recollection; Huffman Coding; Huffman Steganography

3. Grammars

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 12 / 44

slide-36
SLIDE 36

Communications essentials Shannon Entropy

Uncertainty

Shannon Entropy

m and r are stochastic variables (drawn at random from a distribution).

How much uncertainty about the message m?

Uncertainty is measured by entropy: H(m) before any message is received; H(m|r), the conditional entropy, after receipt of the message.

Mutual information is derived from entropy:

I(m; r) = H(m) − H(m|r)

I(m; r) is the amount of information contained in r about m, and I(m; r) = I(r; m).

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 13 / 44
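
A small numerical sketch of these quantities (editorial; the numbers are illustrative, not from the slides): for a binary symmetric channel with uniform input and 10% crossover probability, I(m; r) = H(m) − H(m|r) comes out at about 0.53 bits, computing H(m|r) from the joint distribution as H(m, r) − H(r).

```python
import math

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Joint distribution of (m, r): uniform binary input, 10% crossover probability.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
pm = {0: 0.5, 1: 0.5}
pr = {0: 0.5, 1: 0.5}

H_m_given_r = H(joint) - H(pr)   # H(m|r) = H(m, r) - H(r)
I = H(pm) - H_m_given_r          # I(m; r) = H(m) - H(m|r)
print(I)                         # about 0.531 bits of information in r about m
```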

slide-39
SLIDE 39

Communications essentials Shannon Entropy

Shannon entropy

Definition

For a random variable X taking values in a set X:

Hq(X) = − Σx∈X Pr(X = x) logq Pr(X = x)

Usually q = 2, giving entropy in bits; q = e (natural logarithm) gives entropy in nats.

If Pr(X = xi) = pi for x1, x2, . . . ∈ X, we write H(X) = h(p1, p2, . . .).

Example: one Yes/No question Q with 50-50 probability:

H(Q) = −2 · (1/2) log2(1/2) = 1

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 14 / 44
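
As a sketch (an editorial illustration, not from the slides), the definition translates directly into a few lines of Python:

```python
import math

def entropy(probs, q=2):
    """H_q of a distribution given as a list of probabilities (zero terms skipped)."""
    return -sum(p * math.log(p, q) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # 1.0: one 50-50 yes/no question = 1 bit
print(entropy([0.5, 0.5], math.e))  # about 0.693 nats, the same uncertainty
```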

slide-43
SLIDE 43

Communications essentials Shannon Entropy

Shannon entropy

Properties

1. Additivity: if X and Y are independent, then H(X, Y) = H(X) + H(Y).
If you are uncertain about two completely different questions, the entropy is the sum of the uncertainties for each question.

2. If X is uniformly distributed, then H(X) increases when the size of X increases.
The more possibilities, the more uncertainty.

3. Continuity: h(p1, p2, . . .) is continuous in each pi.

Shannon entropy is a measure in mathematical terms.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 15 / 44
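
The additivity property is easy to check numerically; a quick sketch with arbitrary illustrative distributions:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

px = [0.5, 0.5]
py = [0.25, 0.75]
# Independence: Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y)
joint = [p * r for p in px for r in py]
print(H(joint), H(px) + H(py))  # both about 1.8113: H(X, Y) = H(X) + H(Y)
```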

slide-44
SLIDE 44

Communications essentials Shannon Entropy

What it tells us

Shannon entropy

Consider a message X of entropy k = H(X) (in bits). The average size of a file F describing X is at least k bits.

If the size of F is exactly k bits on average, then we have found a perfect compression: each message bit contains one bit of information on average.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 16 / 44

slide-46
SLIDE 46

Communications essentials Shannon Entropy

A banal example

A single bit may contain more than one bit of information, e.g. image compression:

0: Mona Lisa
10: Lenna
110: Baboon
11100: Peppers
11110: F-16
11101: Che Guevara
11111. . . : other images

However, on average, the maximum information in one bit is one bit (most of the time it is less).

The example is based on Huffman coding.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 17 / 44

slide-54
SLIDE 54

Communications essentials Security

Outline

1. Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction

2. Compression: Recollection; Huffman Coding; Huffman Steganography

3. Grammars

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 18 / 44

slide-55
SLIDE 55

Communications essentials Security

Cryptography

Alice --m--> [Encrypt] --c (ciphertext)--> [Decrypt] --m--> Bob

Eve seeks information about m, observing c.

If I(m; c) > 0, then Eve succeeds in theory; likewise if I(k; c) > 0, where k is the key.

If H(m|c) = H(m), then the system is absolutely secure.

The above are strong statements: even if Eve has information I(m; c) > 0, she may be unable to make sense of it.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 19 / 44
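
The condition H(m|c) = H(m) holds for the one-time pad, and a short calculation confirms I(m; c) = 0 there. A sketch (editorial; the one-bit message distribution is arbitrary):

```python
import math
from itertools import product

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

pm = {0: 0.7, 1: 0.3}   # arbitrary message distribution
pk = {0: 0.5, 1: 0.5}   # key: uniform and independent of the message

pc, joint = {0: 0.0, 1: 0.0}, {}
for m, k in product((0, 1), repeat=2):
    c = m ^ k           # one-time pad on one bit: c = m XOR k
    p = pm[m] * pk[k]
    pc[c] += p
    joint[(m, c)] = joint.get((m, c), 0.0) + p

# I(m; c) = H(m) + H(c) - H(m, c); zero means Eve learns nothing from c.
print(H(pm) + H(pc) - H(joint))  # 0.0: absolutely secure
```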

slide-56
SLIDE 56

Communications essentials Security

Steganalysis

Question: does Alice send secret information to Bob? Answer: X ∈ {yes, no}. What is the uncertainty H(X)?

Eve intercepts a message S. Is there any information I(X; S)?

If H(X|S) = H(X), then the system is absolutely secure.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 20 / 44

slide-61
SLIDE 61

Communications essentials Prediction

Outline

1. Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction

2. Compression: Recollection; Huffman Coding; Huffman Steganography

3. Grammars

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 21 / 44

slide-62
SLIDE 62

Communications essentials Prediction

Random sequences

Text is a sequence of random samples (letters):

(l1, l2, l3, . . .); li ∈ A = {A, B, . . . , Z}

Each letter has a probability distribution P(l), l ∈ A.

Statistical dependence (aka redundancy): P(li|li−1) ≠ P(li), and H(li|li−1) < H(li). Letter i − 1 contains information about li; use this information to guess li.

The more letters li−j, . . . , li−1 we have seen, the more reliably we can predict li.

Wayner (Ch. 6.1) gives examples of first-, second-, . . . , fifth-order prediction, using j = 0, 1, 2, 3, 4.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 22 / 44
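
A sketch of such order-k prediction in code (editorial; the tiny sample corpus is a stand-in for Wayner's examples): tabulate the conditional distribution of the next letter given the previous k letters from sample text, then sample from the fitted model.

```python
import random
from collections import Counter, defaultdict

def build_model(text, order=4):
    """Tabulate counts approximating P(l_i | l_{i-order}, ..., l_{i-1})."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, seed, length, order=4):
    """Sample letter by letter from the fitted conditional distribution."""
    out = seed
    for _ in range(length):
        counts = model[out[-order:]]
        if not counts:
            break  # context never seen in the sample text
        letters, weights = zip(*counts.items())
        out += random.choices(letters, weights=weights)[0]
    return out

corpus = "the process of hiding data inside other data " * 20
model = build_model(corpus)
print(generate(model, "the ", 60))  # fourth-order 'natural-looking' text
```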

slide-63
SLIDE 63

Communications essentials Prediction

First-order prediction

Example from Wayner

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 23 / 44

slide-64
SLIDE 64

Communications essentials Prediction

Second-order prediction

Example from Wayner

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 24 / 44

slide-65
SLIDE 65

Communications essentials Prediction

Third-order prediction

Example from Wayner

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 25 / 44

slide-66
SLIDE 66

Communications essentials Prediction

Fourth-order prediction

Example from Wayner

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 26 / 44

slide-67
SLIDE 67

Compression Recollection

Outline

1. Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction

2. Compression: Recollection; Huffman Coding; Huffman Steganography

3. Grammars

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 27 / 44

slide-68
SLIDE 68

Compression Recollection

Compression

F∗ is the set of binary strings of arbitrary length.

Definition: A compression system is a function c : F∗ → F∗ such that E(length(m)) > E(length(c(m))) when m is drawn from F∗. The compressed string is expected to be shorter than the original.

Definition: A compression c is perfect if all target strings are used, i.e. if for any m ∈ F∗, c−1(m) is a sensible file (cover-text). Decompress a random string, and it makes sense!

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 28 / 44

slide-69
SLIDE 69

Compression Recollection

Steganography by Perfect Compression

Anderson and Petitcolas 1998

Assume a perfect compression scheme and a secure cipher.

Embedding: Message --(Encrypt, Key)--> C --(Decompress)--> S
Extraction: S --(Compress)--> C --(Decrypt, Key)--> Message

Steganography without data hiding.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 29 / 44

slide-71
SLIDE 71

Compression Huffman Coding

Outline

1. Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction

2. Compression: Recollection; Huffman Coding; Huffman Steganography

3. Grammars

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 30 / 44

slide-72
SLIDE 72

Compression Huffman Coding

Huffman Coding

Short codewords for frequent quantities; long codewords for unusual quantities. Each symbol (bit) should be equally probable.

[Huffman tree diagram with branch probabilities 50%, 25%, 25%; the exact layout is not recoverable from the transcript]

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 31 / 44
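
A compact sketch of Huffman's construction (an editorial illustration, not the module's own code): repeatedly merge the two least probable subtrees; symbols in the low-probability subtrees end up with longer codewords.

```python
import heapq
from itertools import count

def huffman(freqs):
    """Huffman code as {symbol: bitstring}; frequent symbols get short codewords."""
    tick = count()  # tie-breaker so the heap never compares the code dicts
    heap = [[f, next(tick), {sym: ""}] for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)  # the two least probable subtrees
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, [f0 + f1, next(tick), merged])
    return heap[0][2]

print(huffman({"a": 0.5, "b": 0.25, "c": 0.25}))
# {'a': '0', 'b': '10', 'c': '11'}: each bit is (roughly) equally probable
```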

slide-73
SLIDE 73

Compression Huffman Coding

Example

[Huffman tree diagram assigning codewords to symbols with probabilities including 25% and 12 1/2%; the exact tree is not recoverable from the transcript]

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 32 / 44

slide-74
SLIDE 74

Compression Huffman Coding

Decoding

Huffman codes are prefix-free: no codeword is the prefix of another. This simplifies the decoding.

This is expressed in the Huffman tree: follow an edge for each coded bit; (only) a leaf node resolves to a message symbol.

When a message symbol is recovered, start over for the next symbol.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 33 / 44
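
Prefix-freeness makes the decoder a one-pass scan; a sketch, reusing the hypothetical code from the construction example above:

```python
def decode(bits, code):
    """Scan the bit stream; the first codeword match is the only possible one."""
    inverse = {w: s for s, w in code.items()}
    out, word = [], ""
    for b in bits:
        word += b            # follow one edge of the Huffman tree
        if word in inverse:  # reached a leaf: a message symbol
            out.append(inverse[word])
            word = ""        # start over for the next symbol
    return out

code = {"a": "0", "b": "10", "c": "11"}
print(decode("011010", code))  # ['a', 'c', 'a', 'b']
```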

slide-75
SLIDE 75

Compression Huffman Coding

Ideal Huffman code

Each branch equally likely: P(bi|bi−1, bi−2, . . .) = 1/2, i.e. maximum entropy H(Bi|Bi−1, Bi−2, . . .) = 1.

A uniform distribution of compressed files implies perfect compression.

In practice, the probabilities are rarely powers of 1/2, hence the Huffman code is imperfect.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 34 / 44

slide-76
SLIDE 76

Compression Huffman Steganography

Outline

1. Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction

2. Compression: Recollection; Huffman Coding; Huffman Steganography

3. Grammars

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 35 / 44

slide-77
SLIDE 77

Compression Huffman Steganography

Reverse Huffman

Core reading: Peter Wayner, Disappearing Cryptography, Ch. 6-7.

Stego-encoder: Huffman decompression. Stego-decoder: Huffman compression.

Is this similar to Anderson & Petitcolas' Steganography by Perfect Compression?

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 36 / 44

slide-78
SLIDE 78

Compression Huffman Steganography

The Stegogramme

The stegogramme looks like random text: use a probability distribution based on sample text; higher-order statistics make it look natural.

Fifth-order statistics are reasonable; higher order will look more natural.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 37 / 44

slide-80
SLIDE 80

Compression Huffman Steganography

Example

Fifth order

For each 5-tuple of letters A0, A1, A2, A3, A4: let li−4, . . . , li be consecutive letters in natural text and tabulate P(li = A0 | li−j = Aj, j = 1, 2, 3, 4).

For each 4-tuple A1, A2, A3, A4, make an (approximate) Huffman code for A0. We may omit some values of A0, or have non-unique codewords.

We encode a message by Huffman decompression, using the Huffman code depending on the last four stegogramme symbols, obtaining a fifth-order random text.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 38 / 44

slide-81
SLIDE 81

Compression Huffman Steganography

Example

Fifth order

Consider four preceding letters: "comp". The next letter may be:

letter        r      e      l      a      o
probability   40%    12%    22%    18%    8%
combined      52% (r/e)     22% (l)      26% (a/o)
rounded       50%           25%          25%

Rounding to powers of 1/2: combining several letters reduces the rounding error.

The example is arbitrary and fictitious.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 39 / 44

slide-82
SLIDE 82

Compression Huffman Steganography

Example

The Huffman code

A Huffman code based on the fifth-order conditional probabilities:

[Huffman tree over the groups r/e, l, and a/o; with the rounded probabilities 50/25/25, r/e gets a one-bit codeword and l and a/o two-bit codewords]

When two letters are possible, choose at random (according to the probabilities in natural text): decoding (compression) is still unique; encoding (decompression) is not unique.

This evens out the statistics in the stegogramme.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 40 / 44
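
A sketch of the embedding step using the hypothetical "comp" table above (editorial illustration): message bits steer Huffman decompression, and where a codeword covers two letters the encoder picks one at random according to the natural-text weights. Extraction is the ordinary, unique, compression direction.

```python
import random

# Hypothetical code for the context "comp", from the table above:
# codeword -> (letter group, natural-text weights within the group).
code = {"0": (["r", "e"], [40, 12]),
        "10": (["l"], [22]),
        "11": (["a", "o"], [18, 8])}

def embed(bits):
    """Stego-encoding = Huffman decompression: the message bits pick branches."""
    out, word = "", ""
    for b in bits:
        word += b
        if word in code:
            letters, weights = code[word]
            # Random choice within a group evens out the letter statistics.
            out += random.choices(letters, weights=weights)[0]
            word = ""
    return out

def extract(text):
    """Stego-decoding = Huffman compression: letters map uniquely to codewords."""
    inverse = {l: w for w, (group, _) in code.items() for l in group}
    return "".join(inverse[l] for l in text)

s = embed("0110")     # e.g. 'rar', 'ear', or 'eoe', depending on the coin flips
print(s, extract(s))  # extraction always recovers '0110'
```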

slide-83
SLIDE 83

Compression Huffman Steganography

Is this practical?

Exercise (to be discussed in groups of 2-4):

How would you steganalyse a potential Huffman-based stegogramme?
How practical is the steganalysis?
How would you implement Huffman-based steganography?
Which implementation issues/challenges do you foresee?

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 41 / 44

slide-84
SLIDE 84

Grammars

Grammar

A grammar describes the structure of a language. A simple grammar:

sentence → noun verb
noun → Mr. Brown | Miss Scarlet
verb → eats | drinks

Each choice can map to a message symbol:

0: Mr. Brown, eats
1: Miss Scarlet, drinks

Two messages can be stego-encrypted. No cover-text is input.

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 42 / 44
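
A sketch of the idea in code (editorial; it slightly generalises the slide's 0/1 mapping by letting each independent grammar choice carry one message bit):

```python
RULES = {  # the slide's simple grammar; each alternative encodes one bit
    "noun": ["Mr. Brown", "Miss Scarlet"],
    "verb": ["eats", "drinks"],
}

def embed(bits):
    """Expand sentence -> noun verb, letting message bits pick the alternatives."""
    return f"{RULES['noun'][bits[0]]} {RULES['verb'][bits[1]]}"

def extract(sentence):
    """Parse the sentence back into the choices that produced it."""
    noun, verb = sentence.rsplit(" ", 1)
    return [RULES["noun"].index(noun), RULES["verb"].index(verb)]

s = embed([1, 0])
print(s)           # Miss Scarlet eats
print(extract(s))  # [1, 0]: no cover-text was ever input
```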

slide-85
SLIDE 85

Grammars

More complex grammar

sentence → noun verb addition
noun → Mr. Brown | Miss Scarlet | . . . | Mrs. White
verb → eats | drinks | celebrates | . . . | cooks
addition → addition term | ∅
term → on Monday | in March | with Mr. Green | . . . | in Alaska | at home
general → sentence | question
question → Does noun verb addition ?
xgeneral → general | sentence, because sentence

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 43 / 44

slide-88
SLIDE 88

Grammars

Discussion

How practical is a grammar-based stego-system?
Which implementation issues do you foresee?
Can you visualise a grammar-variant for images?

Dr Hans Georg Schaathun Information Theory and Communications Spring 2007 44 / 44