
The Method of Types and Its Application to Information Hiding - PowerPoint PPT Presentation



1. The Method of Types and Its Application to Information Hiding
Pierre Moulin, University of Illinois at Urbana-Champaign
www.ifp.uiuc.edu/~moulin/talks/eusipco05-slides.pdf
EUSIPCO, Antalya, September 7, 2005

2. Outline
• Part I: General Concepts
  – Introduction
  – Definitions
  – What is it useful for?
• Part II: Application to Information Hiding
  – Performance guarantees against an omnipotent attacker?
  – Steganography, Watermarking, Fingerprinting

3. Part I: General Concepts

4. Reference Materials
• I. Csiszár, “The Method of Types,” IEEE Trans. Information Theory, Oct. 1998 (commemorative Shannon issue)
• A. Lapidoth and P. Narayan, “Reliable Communication under Channel Uncertainty,” same issue
• Application areas:
  – capacity analyses
  – computation of error probabilities (exponential behavior)
  – universal coding/decoding
  – hypothesis testing

5. Basic Notation
• Discrete alphabets $\mathcal{X}$ and $\mathcal{Y}$
• Random variables $X, Y$ with joint pmf $p(x, y)$
• The entropy of $X$ is $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$ (will sometimes be denoted by $H(p_X)$)
• Joint entropy: $H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)$
• The conditional entropy of $Y$ given $X$ is $H(Y|X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y|x) = H(X, Y) - H(X)$

6. • The mutual information between $X$ and $Y$ is $I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = H(Y) - H(Y|X)$
• The Kullback-Leibler divergence between pmf's $p$ and $q$ is $D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$
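These quantities are easy to evaluate numerically. The following minimal Python sketch (not part of the original slides; the joint pmf is a made-up example) computes entropy, mutual information, and KL divergence for small discrete distributions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in nats; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint pmf given as a 2-D array."""
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

def kl_divergence(p, q):
    """D(p||q); assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

p_xy = np.array([[0.4, 0.1],      # made-up joint pmf p(x, y)
                 [0.1, 0.4]])
print(mutual_information(p_xy))               # ~0.193 nats
print(kl_divergence([0.5, 0.5], [0.4, 0.6]))  # ~0.020 nats
```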

7. Types
• Deterministic notion
• Given a length-$n$ sequence $\mathbf{x} \in \mathcal{X}^n$, count the frequency of occurrence of each letter of the alphabet $\mathcal{X}$
• Example: $\mathcal{X} = \{0, 1\}$, $n = 12$, $\mathbf{x} = 110100101110$ contains 5 zeroes and 7 ones ⇒ the sequence $\mathbf{x}$ has type $\hat{p}_{\mathbf{x}} = (\tfrac{5}{12}, \tfrac{7}{12})$
• $\hat{p}_{\mathbf{x}}$ is also called the empirical pmf. It may be viewed as a pmf over $\mathcal{X}$
• Each $\hat{p}_{\mathbf{x}}(x)$ is a multiple of $\tfrac{1}{n}$
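The type is just a normalized histogram of the sequence. A minimal Python sketch (not from the slides) reproducing the example above:

```python
from collections import Counter
from fractions import Fraction

def type_of(seq, alphabet):
    """Empirical pmf (type) of a sequence: letter counts divided by n."""
    n = len(seq)
    counts = Counter(seq)
    return {a: Fraction(counts.get(a, 0), n) for a in alphabet}

x = "110100101110"
print(type_of(x, alphabet="01"))   # {'0': Fraction(5, 12), '1': Fraction(7, 12)}
```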

8. Joint Types
• Given two length-$n$ sequences $\mathbf{x} \in \mathcal{X}^n$ and $\mathbf{y} \in \mathcal{Y}^n$, count the frequency of occurrence of each pair $(x, y) \in \mathcal{X} \times \mathcal{Y}$
• Example: $\mathbf{x} = 110100101110$, $\mathbf{y} = 111100101110$
• $(\mathbf{x}, \mathbf{y})$ have joint type $\hat{p}_{\mathbf{xy}} = \begin{pmatrix} 4/12 & 1/12 \\ 0 & 7/12 \end{pmatrix}$
• Empirical pmf over $\mathcal{X} \times \mathcal{Y}$

9. Conditional Types
• By analogy with Bayes' rule, define the conditional type of $\mathbf{y}$ given $\mathbf{x}$ as $\hat{p}_{\mathbf{y}|\mathbf{x}}(y|x) = \dfrac{\hat{p}_{\mathbf{xy}}(x, y)}{\hat{p}_{\mathbf{x}}(x)}$, which is an empirical conditional pmf
• Example: $\mathbf{x} = 110100101110$, $\mathbf{y} = 111100101110$ ⇒ $\hat{p}_{\mathbf{y}|\mathbf{x}} = \begin{pmatrix} 4/5 & 1/5 \\ 0 & 1 \end{pmatrix}$
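Extending the previous sketch, the joint and conditional types of the example pair can be computed as follows (again an illustrative sketch, not part of the slides):

```python
from collections import Counter
from fractions import Fraction

def joint_type(x, y, ax, ay):
    """Empirical pmf of the pairs (x_i, y_i), keyed by (a, b)."""
    n = len(x)
    counts = Counter(zip(x, y))
    return {(a, b): Fraction(counts.get((a, b), 0), n) for a in ax for b in ay}

def conditional_type(x, y, ax, ay):
    """p_hat(y|x) = p_hat(x, y) / p_hat(x), defined wherever p_hat(x) > 0."""
    p_xy = joint_type(x, y, ax, ay)
    p_x = {a: sum(p_xy[(a, b)] for b in ay) for a in ax}
    return {(a, b): p_xy[(a, b)] / p_x[a] for a in ax for b in ay if p_x[a] > 0}

x, y = "110100101110", "111100101110"
print(joint_type(x, y, "01", "01"))        # (0,0): 1/3, (0,1): 1/12, (1,0): 0, (1,1): 7/12
print(conditional_type(x, y, "01", "01"))  # (0,0): 4/5, (0,1): 1/5, (1,0): 0, (1,1): 1
```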

10. Type Classes
• The type class $T_{\mathbf{x}}$ is the set of all sequences that have the same type as $\mathbf{x}$. Example: all sequences with 5 zeroes and 7 ones
• The joint type class $T_{\mathbf{xy}}$ is the set of all sequence pairs that have the same joint type as $(\mathbf{x}, \mathbf{y})$
• The conditional type class $T_{\mathbf{y}|\mathbf{x}}$ is the set of all sequences $\mathbf{y}'$ that have the same conditional type given $\mathbf{x}$ as $\mathbf{y}$

11. Information Measures
• Any type may be represented by a dummy sequence
• Can define empirical information measures:
  – $H(\mathbf{x}) \triangleq H(\hat{p}_{\mathbf{x}})$
  – $H(\mathbf{y}|\mathbf{x}) \triangleq H(\hat{p}_{\mathbf{y}|\mathbf{x}})$
  – $I(\mathbf{x}; \mathbf{y}) \triangleq I(X; Y)$ for $(X, Y) \sim \hat{p}_{\mathbf{xy}}$
• Will be useful to design universal decoders

12. Typicality
• Consider a pmf $p$ over $\mathcal{X}$
• Length-$n$ sequence $\mathbf{x} \sim$ i.i.d. $p$. Notation: $\mathbf{x} \sim p^n$
• Example: $\mathcal{X} = \{0, 1\}$, $n = 12$, $\mathbf{x} = 110100101110$
• For large $n$, all typical sequences have approximately composition $p$
• This can be measured in various ways:
  – Entropy $\epsilon$-typicality: $\left| -\tfrac{1}{n} \log p^n(\mathbf{x}) - H(X) \right| < \epsilon$
  – Strong $\epsilon$-typicality: $\max_{x \in \mathcal{X}} |\hat{p}_{\mathbf{x}}(x) - p(x)| < \epsilon$
  Both define sets of typical sequences
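A numerical check of the two definitions (a sketch, not from the slides; pmf, sequence length, and tolerance are made-up choices): for large $n$, a sequence drawn i.i.d. from $p$ passes both tests with high probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_entropy_typical(x, p, eps):
    """Entropy eps-typicality: | -(1/n) log p^n(x) - H(p) | < eps (natural log)."""
    log_pn_x = np.log(p[x]).sum()          # log p^n(x) for an i.i.d. source
    H = -np.sum(p * np.log(p))
    return abs(-log_pn_x / len(x) - H) < eps

def is_strongly_typical(x, p, eps):
    """Strong eps-typicality: max_a | p_hat(a) - p(a) | < eps."""
    p_hat = np.bincount(x, minlength=len(p)) / len(x)
    return np.max(np.abs(p_hat - p)) < eps

p = np.array([5/12, 7/12])                 # source pmf over {0, 1}
x = rng.choice(len(p), size=10_000, p=p)   # x ~ p^n with n = 10000
print(is_entropy_typical(x, p, eps=0.02), is_strongly_typical(x, p, eps=0.02))
```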

13. Application to Channel Coding
• Channel input $\mathbf{x} = (x_1, \cdots, x_n) \in \mathcal{X}^n$, output $\mathbf{y} = (y_1, \cdots, y_n) \in \mathcal{Y}^n$
• Discrete Memoryless Channel (DMC): $p^n(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} p(y_i|x_i)$
• Many fundamental coding theorems can be proven using the concept of entropy typicality. Examples:
  – Shannon's coding theorem (capacity of the DMC)
  – Rate-distortion bound for memoryless sources
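For concreteness, the product form of the DMC likelihood can be evaluated directly; the binary symmetric channel below is a made-up example, not one used in the talk.

```python
import numpy as np

# Hypothetical DMC: binary symmetric channel with crossover probability 0.1.
# p_channel[a, b] = p(y = b | x = a)
p_channel = np.array([[0.9, 0.1],
                      [0.1, 0.9]])

def dmc_likelihood(x, y, p_channel):
    """p^n(y | x) = prod_i p(y_i | x_i) for a memoryless channel."""
    return float(np.prod(p_channel[x, y]))

x = np.array([0, 1, 1, 0, 1])
y = np.array([0, 1, 0, 0, 1])               # one symbol flipped
print(dmc_likelihood(x, y, p_channel))       # 0.9**4 * 0.1 ≈ 0.0656
```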

14. • Many fundamental coding theorems cannot be proved using the concept of entropy typicality. Examples:
  – precise calculations of error log-probability
  – various kinds of unknown channels
• So let's derive some useful facts about types:
  – Number of types $\le (n+1)^{|\mathcal{X}|}$ (polynomial in $n$)
  – Size of the type class $T_{\mathbf{x}}$: $(n+1)^{-|\mathcal{X}|} e^{nH(\hat{p}_{\mathbf{x}})} \le |T_{\mathbf{x}}| \le e^{nH(\hat{p}_{\mathbf{x}})}$
  Ignoring polynomial terms, we write $|T_{\mathbf{x}}| \doteq e^{nH(\hat{p}_{\mathbf{x}})}$
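These bounds can be verified on the running binary example, where $|T_{\mathbf{x}}|$ is just a binomial coefficient (a small sketch, not part of the slides):

```python
import math

def entropy_nats(p):
    """H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n, k = 12, 7                                  # the running example: 5 zeroes, 7 ones
size = math.comb(n, k)                        # |T_x| for a binary type class
H = entropy_nats([(n - k) / n, k / n])
lower = (n + 1) ** (-2) * math.exp(n * H)     # (n+1)^{-|X|} e^{nH}, with |X| = 2
upper = math.exp(n * H)
print(lower <= size <= upper)                 # True
print(lower, size, upper)                     # ~20.5, 792, ~3465
```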

15. • Probability of $\mathbf{x}$ under distribution $p^n$:
  $p^n(\mathbf{x}) = \prod_{x \in \mathcal{X}} p(x)^{n \hat{p}_{\mathbf{x}}(x)} = \exp\Big\{ n \sum_{x \in \mathcal{X}} \hat{p}_{\mathbf{x}}(x) \log p(x) \Big\} = e^{-n [H(\hat{p}_{\mathbf{x}}) + D(\hat{p}_{\mathbf{x}} \| p)]}$
  (the same for all $\mathbf{x}$ in the same type class)
• Probability of the type class $T_{\mathbf{x}}$ under distribution $p^n$:
  $P^n(T_{\mathbf{x}}) = |T_{\mathbf{x}}|\, p^n(\mathbf{x}) \doteq e^{-n D(\hat{p}_{\mathbf{x}} \| p)}$
• Similarly: $|T_{\mathbf{y}|\mathbf{x}}| \doteq e^{n H(\hat{p}_{\mathbf{y}|\mathbf{x}})}$ and $P^n_{Y|X}(T_{\mathbf{y}|\mathbf{x}} \mid \mathbf{x}) \doteq e^{-n D(\hat{p}_{\mathbf{xy}} \| p_{Y|X}\, \hat{p}_{\mathbf{x}})}$
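The first identity is exact and the second holds up to a polynomial factor; both can be checked on the running binary example (a sketch assuming a fair-coin source, not part of the slides):

```python
import math

def entropy_nats(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_nats(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

n, k = 12, 7
p = [0.5, 0.5]                        # assumed true source pmf (fair coin)
p_hat = [(n - k) / n, k / n]          # type of the example sequence

prob_seq = p[0] ** (n - k) * p[1] ** k            # p^n(x), same for every x in T_x
prob_class = math.comb(n, k) * prob_seq           # P^n(T_x) = |T_x| p^n(x)
print(prob_seq, math.exp(-n * (entropy_nats(p_hat) + kl_nats(p_hat, p))))  # identical
print(prob_class, math.exp(-n * kl_nats(p_hat, p)))    # equal up to a poly(n) factor
```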

16. Constant-Composition Codes
• All codewords have the same type $\hat{p}_{\mathbf{x}}$
• Random coding: generate codewords $\mathbf{x}_m$, $m \in \mathcal{M}$, randomly and independently from the uniform pmf on the type class $T_{\mathbf{x}}$
• Note that channel outputs have different types in general
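Sampling uniformly from a type class is easy: randomly permute any fixed sequence of the desired composition. A minimal sketch with hypothetical parameters (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def constant_composition_codebook(num_codewords, counts):
    """Codewords drawn independently and uniformly from the type class with the
    given letter counts: randomly permute one fixed-composition sequence."""
    base = np.repeat(np.arange(len(counts)), counts)   # counts=(5, 7) -> 5 zeros, 7 ones
    return [rng.permutation(base) for _ in range(num_codewords)]

codebook = constant_composition_codebook(num_codewords=4, counts=(5, 7))
for x_m in codebook:
    print(x_m, np.bincount(x_m))   # every codeword has type (5/12, 7/12)
```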

17. Unknown DMC's – Universal Codes
• The channel $p_{Y|X}$ is revealed neither to the encoder nor to the decoder ⇒ neither the encoding rule nor the decoding rule may depend on $p_{Y|X}$:
  $C = \max_{p_X} \min_{p_{Y|X}} I(X; Y)$
• Universal codes: same error exponent as in the known-$p_{Y|X}$ case (existence?)
• Encoder: select $T_{\mathbf{x}}$, use constant-composition codes
• Decoder: uses the Maximum Mutual Information (MMI) rule
  $\hat{m} = \arg\max_{m \in \mathcal{M}} I(\mathbf{x}_m; \mathbf{y}) = \arg\min_{m \in \mathcal{M}} H(\mathbf{y}|\mathbf{x}_m)$
• Note: the GLRT decoder is in general not universal (GLRT: first estimate $p_{Y|X}$, then plug into the ML decoding rule)
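The MMI rule only needs the joint type of each codeword with the received sequence, never the channel law. A toy Python sketch (the codebook and noise pattern are made up for illustration):

```python
import numpy as np

def empirical_mutual_information(x, y, ax_size=2, ay_size=2):
    """I(X;Y) for (X,Y) distributed according to the joint type of (x, y), in nats."""
    n = len(x)
    p_xy = np.zeros((ax_size, ay_size))
    for a, b in zip(x, y):
        p_xy[a, b] += 1.0 / n
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / np.outer(p_x, p_y)[nz])))

def mmi_decode(codebook, y):
    """MMI rule: pick the codeword whose joint type with y has the largest
    empirical mutual information; no knowledge of p(y|x) is used."""
    scores = [empirical_mutual_information(x_m, y) for x_m in codebook]
    return int(np.argmax(scores))

# Toy constant-composition codebook (both codewords have the same type)
codebook = [np.array([0, 1, 0, 1, 0, 1, 0, 1]),
            np.array([0, 0, 1, 1, 0, 0, 1, 1])]
y = np.array([0, 0, 1, 1, 0, 0, 1, 0])   # codeword 1 with its last symbol flipped
print(mmi_decode(codebook, y))            # 1
```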

18. Key idea in the proof
• Denote by $D_m \subset \mathcal{Y}^n$ the decoding region for message $m$
• Polynomial number of type classes, forming a partition of $\mathcal{Y}^n$
• Given that $m$ was transmitted, partition the error event $\mathbf{y} \in \mathcal{Y}^n \setminus D_m$ into a union over type classes:
  $\bigcup_{T_{\mathbf{y}|\mathbf{x}_m}} \big( T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \big)$

19. • The probability of the error event is therefore given by
  $\Pr[\text{error} \mid m] = \Pr\Big[ \bigcup_{T_{\mathbf{y}|\mathbf{x}_m}} \big( T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \big) \Big]$
  $\le \sum_{T_{\mathbf{y}|\mathbf{x}_m}} \Pr\big[ T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \big]$
  $\doteq \max_{T_{\mathbf{y}|\mathbf{x}_m}} \Pr\big[ T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \big]$
  $= \max_{T_{\mathbf{y}|\mathbf{x}_m}} \Pr\big[ T_{\mathbf{y}|\mathbf{x}_m} \big] \, \frac{|T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m|}{|T_{\mathbf{y}|\mathbf{x}_m}|}$
  $\doteq \max_{T_{\mathbf{y}|\mathbf{x}_m}} e^{-n D(\hat{p}_{\mathbf{x}_m \mathbf{y}} \| p_{Y|X}\, \hat{p}_{\mathbf{x}_m})} \, \frac{|T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m|}{|T_{\mathbf{y}|\mathbf{x}_m}|}$
  ⇒ the worst conditional type class dominates the error probability
• The calculation mostly involves combinatorics: finding $|T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m|$

20. Extensions
• Channels with memory
• “Arbitrarily Varying” Channels ⇒ randomized codes
• Continuous alphabets (difficult!)

21. Part II: Applications to WM

22. Reference Materials
[SM'03] A. Somekh-Baruch and N. Merhav, “On the Error Exponent and Capacity Games of Private Watermarking Systems,” IEEE Trans. Information Theory, March 2003
[SM'04] A. Somekh-Baruch and N. Merhav, “On the Capacity Game of Public Watermarking Systems,” IEEE Trans. Information Theory, March 2004
[MO'03] P. Moulin and J. O'Sullivan, “Information-Theoretic Analysis of Information Hiding,” IEEE Trans. Information Theory, March 2003
[MW'04] P. Moulin and Y. Wang, “Error Exponents for Channel Coding with Side Information,” preprint, Sep. 2004

23. Communication Model for Data Hiding
[Block diagram: the message $m$, the host $\mathbf{s} \sim p(\mathbf{s})$, and the key $\mathbf{k}$ enter the encoder $\mathbf{x} = f(\mathbf{s}, m, \mathbf{k})$; the attack channel $p(\mathbf{y}|\mathbf{x})$ outputs $\mathbf{y}$; the decoder $\hat{M} = g(\mathbf{y}, \mathbf{k})$ recovers the message]
• Memoryless host sequence $\mathbf{s}$
• Message $M$ uniformly distributed over $\{1, 2, \cdots, 2^{nR}\}$
• Unknown attack channel $p(\mathbf{y}|\mathbf{x})$
• Randomization via a secret key sequence $\mathbf{k}$, arbitrary alphabet $\mathcal{K}$

24. Attack Channel Model
• The first IT formulations of this problem assumed a fixed attack channel (e.g., AWGN) or a family of memoryless channels (1998-1999)
• The memoryless assumption was later relaxed (2001)
• We'll just require the following distortion constraint:
  $d^n(\mathbf{x}, \mathbf{y}) \triangleq \sum_{i=1}^{n} d(x_i, y_i) \le D_2 \quad \forall\, \mathbf{x}, \mathbf{y}$ (w.p. 1)
  ⇒ unknown channel with arbitrary memory
• Similarly, the following embedding constraint will be assumed:
  $d^n(\mathbf{s}, \mathbf{x}) \le D_1 \quad \forall\, \mathbf{s}, \mathbf{k}, m, \mathbf{x}$ (w.p. 1)
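As a concrete illustration (not from the slides; the squared-error letter distortion, the sequences, and the budgets $D_1$, $D_2$ are hypothetical), checking these constraints on a single realization is a one-liner:

```python
def total_distortion(a, b, d=lambda u, v: (u - v) ** 2):
    """d^n(a, b) = sum_i d(a_i, b_i); squared-error letter distortion by default."""
    return sum(d(ai, bi) for ai, bi in zip(a, b))

s = [0, 1, 1, 0, 1, 0, 0, 1]    # host sequence
x = [0, 1, 1, 1, 1, 0, 0, 1]    # watermarked sequence (one symbol changed)
y = [0, 1, 0, 1, 1, 0, 0, 1]    # attacked sequence (one more symbol changed)
D1, D2 = 1, 2                   # hypothetical distortion budgets
print(total_distortion(s, x) <= D1)   # embedding constraint d^n(s, x) <= D1 -> True
print(total_distortion(x, y) <= D2)   # attack constraint d^n(x, y) <= D2 -> True
```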

25. Data-Hiding Capacity [SM'04]
• Single-letter formula:
  $C(D_1, D_2) = \sup_{p(x,u|s) \in \mathcal{Q}(D_1)} \; \min_{p(y|x) \in \mathcal{A}(D_2)} \; \big[ I(U; Y) - I(U; S) \big]$
  where $U$ is an auxiliary random variable,
  $\mathcal{Q}(D_1) = \{ p_{XU|S} : \sum_{x,u,s} p(x, u|s)\, p(s)\, d(s, x) \le D_1 \}$
  $\mathcal{A}(D_2) = \{ p_{Y|X} : \sum_{x,y} p(y|x)\, p(x)\, d(x, y) \le D_2 \}$
• Same capacity formula as in [MO'03], where $p(y|x)$ was constrained to belong to the family $\mathcal{A}^n(D_2)$ of memoryless channels
• Why?
