Information Theory and Synthetic Steganography. CSM25 Secure Information Hiding. Dr Hans Georg Schaathun, University of Surrey, Spring 2008.


  1. Communications and Redundancy: what if there were no redundancy? There would be no use for steganography: any string would be meaningful, so in particular ciphertext would be meaningful, and simple encryption would already give a steganogram indistinguishable from cover-text.

  2. Outline: 1 Communications essentials (Communications and Redundancy; Anderson and Petitcolas 1999; Digital Communications; Shannon Entropy; Security; Prediction); 2 Compression (Huffman Coding; Huffman Steganography); 3 Miscellanea (Synthesis by Grammar; Redundancy in Images).

  3. Perfect compression (Anderson and Petitcolas 1999). Compression removes redundancy: it minimises the average string length (file size) while retaining the information content. Decompression restores the redundancy and recovers the original (lossless compression). Perfect means that no redundancy remains in the compressed string; consequently all strings are used as compressed outputs, so any random string can be decompressed and yields sensible output.

  4. Steganography by Perfect Compression (Anderson and Petitcolas 1998). Ingredients: a perfect compression scheme and a secure cipher. [Diagram: Message → Encrypt (with Key) → ciphertext C → Decompress → stegotext S; receiver: S → Compress → C → Decrypt (with Key) → Message.] Steganography without data hiding.
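
The scheme can be illustrated end to end with a toy source for which a perfect compressor actually exists. The sketch below is our own illustration, not part of the lecture: it assumes a cover source over {a, b, c, d} with dyadic probabilities 1/2, 1/4, 1/8, 1/8, so that the Huffman code {a: 0, b: 10, c: 110, d: 111} compresses it perfectly, and it uses a one-time-pad XOR as the secure cipher; all names are ours.

    import secrets

    # Toy "perfect" compressor: Huffman code for a source with dyadic probabilities
    # P(a)=1/2, P(b)=1/4, P(c)=1/8, P(d)=1/8.  Because every probability is a power
    # of 1/2, compressed bits are uniform and every bit string decompresses to a
    # plausible cover string.
    CODE = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

    def decompress(bits):
        """Map a bit string to a cover string (stego-encoder direction)."""
        out, node = [], ''
        for b in bits:
            node += b
            if node in CODE.values():
                out.append(next(s for s, cw in CODE.items() if cw == node))
                node = ''
        while node:                       # pad an unfinished codeword with zeros
            node += '0'
            if node in CODE.values():
                out.append(next(s for s, cw in CODE.items() if cw == node))
                node = ''
        return ''.join(out)

    def compress(cover):
        """Map a cover string back to bits (stego-decoder direction)."""
        return ''.join(CODE[sym] for sym in cover)

    def xor(bits, key):
        return ''.join(str(int(b) ^ int(k)) for b, k in zip(bits, key))

    message = '10110010'                       # secret bits
    key = ''.join(str(secrets.randbelow(2)) for _ in message)
    stegotext = decompress(xor(message, key))  # encrypt, then "decompress"
    recovered = xor(compress(stegotext), key)[:len(message)]
    assert recovered == message
    print(stegotext)

Because the compressed bits of this source are uniformly distributed, the XOR-encrypted message decompresses to a string with the same statistics as a genuine cover string; the receiver simply compresses and then decrypts.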

  5. Outline (repeated; now at Communications essentials: Digital Communications).

  6. Problems in natural language: how efficient is the redundancy? Natural languages are arbitrary; some words and sentences carry a lot of redundancy, others very little. Because the redundancy is unstructured, it is hard to automate error correction. Structured redundancy is necessary for digital communications; this is the subject of coding theory.

  7. Coding: channel and source coding. Source coding (a.k.a. compression) removes redundancy to make a compact representation. Channel coding (a.k.a. error-control coding) adds mathematically structured redundancy, allowing computationally efficient error correction, optimised for low error rate and small overhead. These are two aspects of information theory.

  8. Channel and source coding. [Diagram: Message → Compress (remove redundancy) → Encrypt (scramble) → Encode (add redundancy) → Channel → Decode → Decrypt → Decompress → Message.]

  9. Outline (repeated; now at Communications essentials: Shannon Entropy).

  10. Uncertainty and Shannon entropy. m and r are stochastic variables (drawn at random from a distribution). How much uncertainty is there about the message m? Uncertainty is measured by entropy: H(m) before anything is received, and the conditional entropy H(m | r) after receipt of r. Mutual information is derived from entropy: I(m; r) = H(m) − H(m | r) is the amount of information contained in r about m, and I(m; r) = I(r; m).

  11. Shannon entropy: definition. For a random variable X ∈ 𝒳, H_q(X) = −Σ_{x ∈ 𝒳} Pr(X = x) log_q Pr(X = x). Usually q = 2, giving entropy in bits; q = e (natural logarithm) gives entropy in nats. If Pr(X = x_i) = p_i for x_1, x_2, … ∈ 𝒳, we write H(X) = h(p_1, p_2, …). Example: a single question Q whose yes/no answer has 50-50 probability gives H(Q) = −2 · (1/2) log_2(1/2) = 1 bit.
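
A quick numeric check of the definition (a minimal sketch, not from the slides; the helper name entropy is ours):

    from math import log2

    def entropy(probs):
        """Shannon entropy in bits: H = -sum p*log2(p), ignoring zero terms."""
        return -sum(p * log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # one fair yes/no question: 1.0 bit
    print(entropy([0.25] * 4))   # two independent fair questions: 2.0 bits (additivity)
    print(entropy([0.9, 0.1]))   # a biased question: about 0.47 bits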

  12. Example. Alice has a 1-bit message m with a 50-50 distribution, so the entropy (Bob's uncertainty) is H(m) = 1 bit. The binary symmetric channel has an error rate of 25%, i.e. a 25% risk that Alice's bit is flipped. Alice's uncertainty about the received bit is H(r | 1) = H(r | 0) = −0.25 log 0.25 − 0.75 log 0.75 ≈ 0.811, so H(r | m) = 0.5 H(r | 0) + 0.5 H(r | 1) ≈ 0.811. The information received by Bob is I(m; r) = H(m) − H(m | r) = H(r) − H(r | m) = 1 − 0.811 = 0.189 bits. What if the error rate is 50%? Or 10%?
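
The closing question can be answered directly: for a binary symmetric channel with a uniform input bit, I(m; r) = 1 − h(p), where h is the binary entropy function. A small sketch (ours):

    from math import log2

    def h(p):
        """Binary entropy function (in bits)."""
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    for p in (0.25, 0.50, 0.10):
        print(f"error rate {p:.2f}: I(m;r) = {1 - h(p):.3f} bits")
    # 0.25 -> 0.189 bits, 0.50 -> 0.000 bits (the channel is useless), 0.10 -> 0.531 bits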

  13. Shannon entropy: properties. (1) Additivity: if X and Y are independent, then H(X, Y) = H(X) + H(Y); if you are uncertain about two completely unrelated questions, the entropy is the sum of the uncertainties for each question. (2) If X is uniformly distributed, then H(X) increases when the size of 𝒳 increases: the more possibilities, the more uncertainty. (3) Continuity: h(p_1, p_2, …) is continuous in each p_i. In mathematical terms, Shannon entropy is a measure.

  14. What Shannon entropy tells us. Consider a message X of entropy k = H(X) (in bits). The average size of a file F describing X is at least k bits. If the size of F is exactly k bits on average, then we have found a perfect compression of X, and each file bit carries one bit of information on average.

  15. A trivial example. A single bit may contain more than one bit of information, e.g. in image compression with codewords 0: Mona Lisa, 10: Lenna, 110: Baboon, 11100: Peppers, 11110: F-16, 11101: Che Guevara, 11111…: other images. However, on average the maximum information in one bit is one bit (and most of the time it is less). The example is based on Huffman coding.

  16. Outline (repeated; now at Communications essentials: Security).

  17. Cryptography. [Diagram: Alice sends the message m as ciphertext c to Bob; Eve observes c.] Eve seeks information about m by observing c. If I(m; c) > 0, or if I(k; c) > 0 for the key k, then Eve succeeds in theory. If H(m | c) = H(m), then the system is absolutely secure. These are strong statements: even if Eve has information, I(m; c) > 0, she may be unable to make sense of it.

  18. Steganalysis. Question: does Alice send secret information to Bob? The answer is X ∈ {yes, no}; what is the uncertainty H(X)? Eve intercepts a message S: is there any information I(X; S)? If H(X | S) = H(X), then the system is absolutely secure.

  19. Outline (repeated; now at Communications essentials: Prediction).

  20. Random sequences. Text is a sequence of random samples (letters) (l_1, l_2, l_3, …), with l_i ∈ A = {A, B, …, Z}. Each letter has a probability distribution P(l), l ∈ A. Statistical dependence implies redundancy: P(l_i | l_{i−1}) ≠ P(l_i), and H(l_i | l_{i−1}) < H(l_i), i.e. letter i − 1 contains information about l_i, which can be used to guess l_i. The more letters l_{i−j}, …, l_{i−1} we have seen, the more reliably we can predict l_i. Wayner (Ch. 6.1) gives examples of first-, second-, …, fifth-order prediction, using j = 0, 1, 2, 3, 4.
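
A small sketch (ours, with a deliberately tiny sample text) that estimates these statistics and checks H(l_i | l_{i−1}) < H(l_i); a real estimate would of course use a much longer corpus:

    from collections import Counter
    from math import log2

    def H(counts):
        """Entropy (in bits) of the empirical distribution given by a Counter."""
        total = sum(counts.values())
        return -sum(c / total * log2(c / total) for c in counts.values())

    sample = ("the quick brown fox jumps over the lazy dog and then "
              "the dog chases the fox over the lazy brown hill").replace(" ", "")

    marginal = Counter(sample[1:])               # distribution of l_i
    joint = Counter(zip(sample, sample[1:]))     # distribution of (l_{i-1}, l_i)
    previous = Counter(sample[:-1])              # distribution of l_{i-1}
    conditional = H(joint) - H(previous)         # H(l_i | l_{i-1})

    print(f"H(l_i)         = {H(marginal):.3f} bits")
    print(f"H(l_i | l_i-1) = {conditional:.3f} bits  (smaller, so the previous letter helps)")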

  21. First-order prediction: example from Wayner.

  22. Second-order prediction: example from Wayner.

  23. Third-order prediction: example from Wayner.

  24. Fourth-order prediction: example from Wayner.

  25. Markov models. A Markov source is a sequence M_1, M_2, … of stochastic (random) variables. An n-th order Markov source is completely described by the probability distributions P[M_1, M_2, …, M_n] and P[M_i | M_{i−n}, …, M_{i−1}] (identical for all i). This is a finite-state machine (automaton): the state of the source, i.e. the last n symbols M_{i−n}, …, M_{i−1}, determines the probability distribution of the next symbol. The random texts from Wayner are generated using 1st-, 2nd-, 3rd-, and 4th-order Markov models.
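
A sketch of how such an order-n model can be fitted to sample text and then sampled, in the spirit of Wayner's examples (our own code; the file name sample.txt is just a placeholder for any training text):

    import random
    from collections import Counter, defaultdict

    def fit(text, n):
        """Tabulate next-letter counts for every length-n context in the text."""
        model = defaultdict(Counter)
        for i in range(len(text) - n):
            model[text[i:i + n]][text[i + n]] += 1
        return model

    def generate(model, n, length, seed=0):
        """Sample random text whose order-n statistics mimic the training text."""
        rng = random.Random(seed)
        state = rng.choice(list(model))
        out = list(state)
        for _ in range(length):
            counts = model.get(state)
            if not counts:                      # context never seen: restart
                state = rng.choice(list(model))
                out.extend(state)
                continue
            letters, weights = zip(*counts.items())
            nxt = rng.choices(letters, weights=weights)[0]
            out.append(nxt)
            state = (state + nxt)[-n:]
        return ''.join(out)

    training = open('sample.txt', encoding='utf-8').read().lower()  # any plain-text sample
    print(generate(fit(training, 3), 3, 300))   # third-order random text, Wayner-style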

  26. A related example. A group of MIT students wrote software that generates random 'science' papers; one such random paper was accepted for WMSCI 2005. You can generate your own paper on-line at http://pdos.csail.mit.edu/scigen/ and the source code (SCIgen) is available. If you are brave, modify SCIgen for steganography as a poster topic, or maybe for your dissertation if you have a related topic you can tweak.

  27. Outline (repeated; now at Compression: Huffman Coding).

  28. Compression. F* is the set of binary strings of arbitrary length. Definition: a compression system is a function c : F* → F* such that E(length(m)) > E(length(c(m))) when m is drawn from F*; the compressed string is expected to be shorter than the original. Definition: a compression c is perfect if all target strings are used, i.e. if for any m ∈ F*, c⁻¹(m) is a sensible file (cover-text). Decompress a random string, and it makes sense!

  29. Huffman coding. Short codewords for frequent symbols, long codewords for unusual symbols; each code symbol (bit) should be equally probable. [Figure: a binary code tree; branch 0 from the root leads to a leaf of probability 50%, branch 1 to a node whose branches 0 and 1 lead to two leaves of 25% each.]

  30. Example. [Figure: a larger Huffman code tree over symbols with probabilities 25%, 25%, 25%, 12.5% and smaller; the rarer the symbol, the longer its codeword.]

  31. Decoding. Huffman codes are prefix-free: no codeword is the prefix of another, which simplifies decoding. This is expressed in the Huffman tree: follow one edge for each coded bit; only a leaf node resolves to a message symbol, and when a message symbol is recovered, start over at the root for the next symbol.
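
A compact sketch of both construction and prefix-free decoding (our own code, reusing the illustrative next-letter frequencies that appear in the fifth-order example later in the deck):

    import heapq
    from itertools import count

    def huffman_code(freqs):
        """Build a Huffman code {symbol: bitstring} from symbol frequencies."""
        tick = count()                                 # tie-breaker for equal frequencies
        heap = [(f, next(tick), sym) for sym, f in freqs.items()]
        heapq.heapify(heap)
        code = {sym: '' for sym in freqs}
        while len(heap) > 1:
            f0, _, left = heapq.heappop(heap)          # merge the two least likely subtrees
            f1, _, right = heapq.heappop(heap)
            for sym in left:
                code[sym] = '0' + code[sym]
            for sym in right:
                code[sym] = '1' + code[sym]
            heapq.heappush(heap, (f0 + f1, next(tick), left + right))
        return code

    def decode(bits, code):
        """Prefix-free decoding: extend the buffer until it equals a codeword."""
        inverse = {cw: sym for sym, cw in code.items()}
        out, buf = [], ''
        for b in bits:
            buf += b
            if buf in inverse:
                out.append(inverse[buf])
                buf = ''
        return ''.join(out)

    code = huffman_code({'r': 40, 'e': 12, 'l': 22, 'a': 18, 'o': 8})
    bits = ''.join(code[s] for s in 'real')
    assert decode(bits, code) == 'real'
    print(code)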

  32. Ideal Huffman code. Each branch is equally likely: P(b_i | b_{i−1}, b_{i−2}, …) = 1/2, giving maximum entropy H(B_i | B_{i−1}, B_{i−2}, …) = 1; a uniform distribution of compressed files implies perfect compression. In practice the probabilities are rarely powers of 1/2, hence the Huffman code is imperfect.

  33. Outline (repeated; now at Compression: Huffman Steganography).

  34. Reverse Huffman. Core reading: Peter Wayner, Disappearing Cryptography, Ch. 6-7. Use a Huffman code for each state in the Markov model: the stego-encoder performs Huffman decompression and the stego-decoder performs Huffman compression. Is this similar to Anderson & Petitcolas' steganography by perfect compression?

  35. The steganogram. The steganogram looks like random text: use a probability distribution based on sample text, where higher-order statistics make it look natural. Fifth-order statistics are reasonable; higher orders will look more natural.

  36. Example: fifth order. For each 5-tuple of letters A_0, A_1, A_2, A_3, A_4, let l_{i−4}, …, l_i be consecutive letters in natural text and tabulate P(l_i = A_0 | l_{i−j} = A_j, j = 1, 2, 3, 4). For each 4-tuple A_1, A_2, A_3, A_4, make an (approximate) Huffman code for A_0; we may omit some values of A_0, or have non-unique codewords. We encode a message by Huffman decompression, using the Huffman code that depends on the last four steganogram symbols, and obtain a fifth-order random text.

  37. Example: fifth order (continued). Consider the four preceding letters "comp". The next letter may be:
        letter        r      e      l      a      o
        probability   40%    12%    22%    18%    8%
        combined      52% (r/e)     22% (l)       26% (a/o)
        rounded       50%           25%           25%
      Probabilities are rounded to powers of 1/2; combining several letters reduces the rounding error. The example is arbitrary and fictitious.

  38. Example: the Huffman code. A Huffman code based on the fifth-order conditional probabilities. [Figure: code tree in which bit 0 leads to the combined leaf r/e (50%), and bit 1 leads to a node whose branches 0 and 1 give the leaves a/o and l (25% each).] When two letters are possible, choose one at random (according to their probabilities in natural text): decoding (compression) is still unique, while encoding (decompression) is not. This evens out the statistics in the steganogram.
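
Putting the pieces together, here is a heavily simplified sketch of the mimic scheme (ours, not Wayner's implementation): an order-1 model with a hand-made dyadic code per context, an encoder that "decompresses" message bits into letters, and a decoder that "compresses" them back.

    import random

    # Toy order-1 mimic model.  For each context (the previous letter) we give a
    # complete prefix-free code over the possible next letters; codewords mapped
    # to two letters mirror the combined leaves on the slide (r/e, a/o).
    CODES = {
        'a': {'0': ['b'], '10': ['c'], '11': ['a']},
        'b': {'0': ['a', 'c'], '1': ['b']},
        'c': {'0': ['a'], '1': ['b', 'c']},
    }

    def embed(bits, start='a', seed=1):
        """Stego-encoder = 'Huffman decompression' of the message into letters."""
        rng = random.Random(seed)
        text, ctx, i = [], start, 0
        while i < len(bits):
            buf = ''
            while buf not in CODES[ctx]:
                buf += bits[i] if i < len(bits) else str(rng.randrange(2))  # pad tail
                i += 1
            letter = rng.choice(CODES[ctx][buf])   # non-unique encoding: pick at random
            text.append(letter)
            ctx = letter
        return ''.join(text)

    def extract(text, start='a'):
        """Stego-decoder = 'Huffman compression' of the letters back into bits."""
        bits, ctx = [], start
        for letter in text:
            for codeword, letters in CODES[ctx].items():
                if letter in letters:
                    bits.append(codeword)   # decoding is unique even if encoding is not
                    break
            ctx = letter
        return ''.join(bits)

    message = '1101001001'
    stego = embed(message)
    assert extract(stego).startswith(message)   # tail padding may add a few extra bits
    print(stego)

In a real system the per-context codes would be built from fifth-order statistics of a large sample text, as on the previous slides, with probabilities rounded to powers of 1/2 automatically.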

  39. Outline (repeated; now at Miscellanea: Synthesis by Grammar).

  40. Grammar. A grammar describes the structure of a language. A simple grammar: sentence → noun verb; noun → Mr. Brown | Miss Scarlet; verb → eats | drinks. Each choice can map to a message symbol, e.g. 0: Mr. Brown eats; 1: Miss Scarlet drinks. Two messages can be stego-encrypted; no cover-text is input.

  41. More complex grammar.
        sentence → noun verb addition
        noun → Mr. Brown | Miss Scarlet | … | Mrs. White
        verb → eats | drinks | celebrates | … | cooks
        addition → addition term | ∅
        term → on Monday | in March | with Mr. Green | … | in Alaska | at home
        general → sentence | question
        question → Does noun verb addition ?
        xgeneral → general | sentence , because sentence
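
A sketch of how message bits can drive such a grammar (our own code, on a cut-down version of the grammar above; the bit-per-choice mapping and the power-of-two alternative counts are our simplification):

    # Every choice among 2^b alternatives encodes b message bits.  Tokens that
    # appear as keys are nonterminals; everything else is literal text.
    GRAMMAR = {
        'sentence': [['noun', 'verb', 'term'], ['noun', 'verb']],
        'noun':     [['Mr. Brown'], ['Miss Scarlet'], ['Mr. Green'], ['Mrs. White']],
        'verb':     [['eats'], ['drinks'], ['celebrates'], ['cooks']],
        'term':     [['on Monday'], ['in March'], ['at home'], ['in Alaska']],
    }

    def embed(bits, symbol='sentence'):
        """Expand the grammar, letting message bits select each alternative."""
        out, pos = [], 0

        def expand(sym):
            nonlocal pos
            alts = GRAMMAR[sym]
            width = (len(alts) - 1).bit_length()            # bits carried by this choice
            chunk = bits[pos:pos + width].ljust(width, '0')  # zero-pad the tail
            pos += width
            for token in alts[int(chunk, 2)]:
                expand(token) if token in GRAMMAR else out.append(token)

        expand(symbol)
        return ' '.join(out)

    def extract(words, symbol='sentence'):
        """Parse the sentence and recover the bits behind every choice."""

        def parse(sym, i):
            alts = GRAMMAR[sym]
            width = (len(alts) - 1).bit_length()
            for k, alt in enumerate(alts):
                bits, j, ok = [format(k, f'0{width}b')], i, True
                for token in alt:
                    if token in GRAMMAR:
                        sub = parse(token, j)
                        if sub is None:
                            ok = False
                            break
                        bits += sub[0]
                        j = sub[1]
                    else:
                        n = len(token.split())
                        if words[j:j + n] != token.split():
                            ok = False
                            break
                        j += n
                if ok:
                    return bits, j
            return None

        result = parse(symbol, 0)
        return ''.join(result[0]) if result else None

    stego = embed('0110100')
    print(stego)                       # "Mrs. White drinks on Monday"
    print(extract(stego.split()))      # "0110100" -- the embedded bits

A full encoder would keep emitting sentences (using the general and xgeneral rules) until the whole message is consumed, and would have to handle alternative counts that are not powers of two.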

  42. Is this practical? Exercise: choose either the reverse Huffman or the grammar-based steganography technique, and write a short critique (approx. 1 page) answering some of the following questions. How can you do steganalysis? Under what conditions will it be secure? Is the system practical? Useful? Which implementation issues do you foresee? How could it be implemented? Could the technique extend to images?

  43. Outline (repeated; now at Miscellanea: Redundancy in Images).
