Lecture 12 Review
S. Cheng (OU-Tulsa), November 1, 2017

Previously...

  • Joint typical sequences
  • Covering and Packing Lemmas
  • Channel Coding Theorem
  • Capacity of Gaussian channel
  • Capacity of additive white Gaussian channel
  • Forward proof of Channel Coding Theorem

Lecture 12 Overview

This time

  • Converse proof of Channel Coding Theorem
  • Non-white Gaussian channel
  • Rate-distortion problems
  • Rate-distortion Theorem

Lecture 12 Converse proof of Channel Coding Theorem

Converse proof

  • We want to show that whenever the code rate is larger than the capacity, the probability of error will be non-zero
  • Equivalently... as long as the probability of error is 0, the rate R of the code can be no larger than the capacity
  • To continue the converse proof, we will need to introduce a simple result due to Fano

Fano's inequality

Denote Pr(error) = P_e = Pr(M ≠ M̂). Then

    H(M|Y^N) ≤ 1 + P_e H(M)

Intuitively, if P_e → 0, on average we will know M for certain given Y^N, and thus (1/N) H(M|Y^N) → 0.

Proof: Let E = 1(M ≠ M̂), the indicator of a decoding error. Then

    H(M|Y^N) = H(M, E|Y^N) − H(E|Y^N, M)
             = H(M, E|Y^N)
             = H(E|Y^N) + H(M|Y^N, E)
             ≤ H(E) + H(M|Y^N, E)
             ≤ 1 + P(E = 0) H(M|Y^N, E = 0) + P(E = 1) H(M|Y^N, E = 1)
             = 1 + 0 + P_e H(M|Y^N, E = 1)
             ≤ 1 + P_e H(M)
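The inequality is easy to check numerically. The sketch below (my own illustration, not from the slides) builds a toy setting with a uniform message M observed through a hypothetical noisy channel, computes H(M|Y) directly, and compares it against 1 + P_e·H(M) for the MAP decoder.

```python
import numpy as np

# Toy setting (hypothetical numbers, for illustration only): M is uniform over
# 4 messages and Y is M observed through a noisy channel W[m, y] = p(y|m).
num_m = 4
W = np.full((num_m, num_m), 0.05)
np.fill_diagonal(W, 0.85)
P = W / num_m                         # joint pmf P[m, y] with uniform P(m) = 1/4

def H(p):
    """Entropy in bits of a pmf (zero entries ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P_Y = P.sum(axis=0)
H_M = H(P.sum(axis=1))                # = log2(4) = 2 bits since M is uniform
H_M_given_Y = sum(P_Y[y] * H(P[:, y] / P_Y[y]) for y in range(num_m))

# MAP decoder M_hat(y) = argmax_m P(m, y) and its error probability
map_est = P.argmax(axis=0)
P_e = 1 - sum(P[map_est[y], y] for y in range(num_m))

print(f"P_e = {P_e:.3f}")
print(f"H(M|Y) = {H_M_given_Y:.4f}  <=  1 + P_e*H(M) = {1 + P_e * H_M:.4f}")
```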

Converse proof (continued)

Assume M is uniform over the 2^{NR} messages, so H(M) = NR. Then

    R = H(M)/N
      = (1/N) [ I(M; Y^N) + H(M|Y^N) ]
      ≤ (1/N) [ I(X^N; Y^N) + H(M|Y^N) ]                        (data processing, M → X^N → Y^N)
      = (1/N) [ H(Y^N) − H(Y^N|X^N) + H(M|Y^N) ]
      = (1/N) [ H(Y^N) − Σ_i H(Y_i|X^N, Y^{i−1}) + H(M|Y^N) ]
      = (1/N) [ H(Y^N) − Σ_i H(Y_i|X_i) + H(M|Y^N) ]            (memoryless channel)
      ≤ (1/N) [ Σ_i H(Y_i) − Σ_i H(Y_i|X_i) + H(M|Y^N) ]
      = (1/N) Σ_i I(X_i; Y_i) + (1/N) H(M|Y^N)
      = I(X; Y) + (1/N) H(M|Y^N)
      → I(X; Y)  as N → ∞, by Fano's inequality

Since I(X; Y) ≤ C, a vanishing error probability forces R ≤ C, as claimed.
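Rearranging the chain above together with Fano's inequality, H(M|Y^N) ≤ 1 + P_e·NR, gives the usual weak-converse bound P_e ≥ 1 − C/R − 1/(NR) whenever R > C. The snippet below (my own addition; the crossover probability and rates are made-up values) evaluates this bound for a binary symmetric channel, whose capacity is C = 1 − H(p).

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def pe_lower_bound(R, C, N):
    """Weak-converse bound P_e >= 1 - C/R - 1/(N R), clipped at 0."""
    return max(0.0, 1.0 - C / R - 1.0 / (N * R))

p = 0.1                      # BSC crossover probability (illustrative value)
C = 1 - h2(p)                # BSC capacity, about 0.531 bits/use
for R in (0.6, 0.7, 0.9):    # rates above capacity
    print(f"R = {R}: P_e >= {pe_lower_bound(R, C, 1000):.3f}")
```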

Lecture 12 Capacity of non-white Gaussian channels

Colored channels

  • We looked into the capacity of the white Gaussian channel last time
  • But sometimes the noise power is different in different bands; this gives a "colored" channel
  • Intuitively, we should assign different amounts of power to different bands. Hence, we have an allocation problem
  • Without loss of generality, let's consider the discrete approximation: parallel Gaussian channels

Parallel Gaussian channels

  • Consider K parallel channels (K bands) with corresponding noise powers σ_1², σ_2², ..., σ_K²
  • Say we can allocate a total power P across the channels. The powers assigned to the channels are P_1, P_2, ..., P_K, so we need Σ_{k=1}^K P_k ≤ P
  • Therefore, on the k-th channel we can transmit (1/2) log(1 + P_k/σ_k²) bits per channel use
  • So our goal is to choose P_1, ..., P_K ≥ 0 (with Σ_{k=1}^K P_k ≤ P) such that the total capacity

        Σ_{k=1}^K (1/2) log(1 + P_k/σ_k²)

    is maximized

KKT conditions

Let's list all the KKT conditions for the optimization problem

    max  Σ_{k=1}^K (1/2) log(1 + P_k/σ_k²)   such that   P_1, ..., P_K ≥ 0,   Σ_{k=1}^K P_k ≤ P

  • Stationarity:
        ∂/∂P_i [ Σ_{k=1}^K (1/2) log(1 + P_k/σ_k²) + Σ_{k=1}^K λ_k P_k − µ (Σ_{k=1}^K P_k − P) ] = 0
  • Dual feasibility:  µ, λ_1, ..., λ_K ≥ 0
  • Primal feasibility:  P_1, ..., P_K ≥ 0,   Σ_{k=1}^K P_k ≤ P
  • Complementary slackness:  µ (Σ_{k=1}^K P_k − P) = 0  and  λ_k P_k = 0 for all k

Capacity of parallel channels

From the stationarity condition

    ∂/∂P_i [ Σ_{k=1}^K (1/2) log(1 + P_k/σ_k²) + Σ_{k=1}^K λ_k P_k − µ (Σ_{k=1}^K P_k − P) ] = 0

    ⇒  (1/2) · 1/(P_i + σ_i²) = µ − λ_i
    ⇒  P_i + σ_i² = 1 / (2(µ − λ_i))

  • Since λ_i P_i = 0, for P_i > 0 we have λ_i = 0 and thus P_i + σ_i² = 1/(2µ) = constant
  • This suggests that µ > 0 and thus Σ_{k=1}^K P_k = P

Water-filling interpretation

From P_i + σ_i² = constant, power can be allocated intuitively like pouring water into a pond (hence "water-filling"): each channel is filled up to a common water level, and channels whose noise floor σ_i² sits above the level get no power. A numerical sketch follows the examples below.

Example allocations (presumably the same noise profile with an increasing total power budget; the original figure is not reproduced here):
  • P_1 = 0, P_2 = 0.3, P_3 = 0.6, P_4 = 0, P_5 = 0
  • P_1 = 0, P_2 = 0.8, P_3 = 1.1, P_4 = 0.3, P_5 = 0
  • P_1 = 0.5, P_2 = 1.5, P_3 = 1.8, P_4 = 1, P_5 = 0
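The water level ν = 1/(2µ) can be found numerically, for example by bisection, because the total allocated power Σ_k max(0, ν − σ_k²) increases with ν. Below is a minimal sketch of this (my own illustration); the noise powers and total power are hypothetical values, chosen so that the resulting allocation resembles the second example above.

```python
import numpy as np

def water_filling(noise_powers, total_power, iters=100):
    """Allocate P_k = max(0, level - sigma_k^2) so that sum(P_k) = total_power."""
    sigma2 = np.asarray(noise_powers, dtype=float)
    lo, hi = sigma2.min(), sigma2.max() + total_power   # the water level lies in this range
    for _ in range(iters):                              # bisection on the water level
        level = 0.5 * (lo + hi)
        if np.maximum(level - sigma2, 0.0).sum() > total_power:
            hi = level
        else:
            lo = level
    P = np.maximum(level - sigma2, 0.0)
    capacity = 0.5 * np.log2(1.0 + P / sigma2).sum()    # bits per (vector) channel use
    return P, capacity

# Hypothetical example: 5 bands with unequal noise floors, total power 2.2
sigma2 = [1.5, 0.7, 0.4, 1.2, 2.5]
P, C = water_filling(sigma2, 2.2)
print("power allocation:", np.round(P, 3))
print("capacity (bits/channel use):", round(C, 3))
```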

Lecture 12 Rate-distortion problem

Rate-distortion problem

    X^N ∼ p(x) → Encoder → m → Decoder → X̂^N

  • We know that H(X) bits are needed on average to represent each sample of a source X
  • If X is continuous, there is no way to recover X precisely
  • Say we are satisfied as long as we can recover X up to a certain fidelity; how many bits are needed per sample?
  • There is an apparent trade-off between rate (bits per sample) and distortion (fidelity). We expect the needed rate to be smaller if we allow a lower fidelity (higher distortion). What we are really interested in is the rate-distortion function

Rate-distortion function

    X^N ∼ p(x) → Encoder → m ∈ {1, 2, ..., M} → Decoder → X̂^N

  • R = (log₂ M)/N,   D = E[d(X̂^N, X^N)],  where  d(x̂^N, x^N) = (1/N) Σ_{i=1}^N d(x̂_i, x_i)
  • Maybe you can guess at this point: for given X and X̂, the required rate is simply I(X; X̂)
  • How is it related to the distortion, though? Note that we have the freedom to pick p(x̂|x) such that E[d(X̂^N, X^N)] is (less than or) equal to the desired D
  • Therefore, given D, the rate-distortion function is simply

        R(D) = min_{p(x̂|x)} I(X̂; X)   such that   E[d(X̂^N, X^N)] ≤ D

Binary symmetric source

  • Let's try to compress the outcome of a fair coin toss
  • We know that we need 1 bit to compress the outcome losslessly; what if we have only 0.5 bit per sample?
  • In this case, we can't losslessly recover the outcome. But how well can we do?
  • We need to introduce a distortion measure first. Note that we have two types of errors: taking head as tail and taking tail as head. A natural measure just weights both errors equally:

        d(X = H, X̂ = T) = d(X = T, X̂ = H) = 1
        d(X = H, X̂ = H) = d(X = T, X̂ = T) = 0

  • If the rate is at least 1 bit, we know that the distortion can be 0. How about a rate of 0: what should the distortion be?
  • If the decoder knows nothing, the best bet is just to always output head (or tail). Then D = E[d(X, H)] = 0.5

Binary symmetric source (continued)

For 0 < D < 0.5, denote by Z the prediction error such that X = X̂ ⊕ Z (addition modulo 2). Note that Pr(Z = 1) ≤ D by the distortion constraint. Then

    R(D) = min_{p(x̂|x)} I(X̂; X)
         = min_{p(x̂|x)} [ H(X) − H(X|X̂) ]
         = min_{p(x̂|x)} [ H(X) − H(X̂ ⊕ Z | X̂) ]
         = min_{p(x̂|x)} [ H(X) − H(Z|X̂) ]
         ≥ min_{p(x̂|x)} [ H(X) − H(Z) ]          (conditioning reduces entropy)
         ≥ 1 − H(D)

and the bound is achievable (with a symmetric test channel of crossover probability D), so R(D) = 1 − H(D).

[Plot: R(D) = 1 − H(D) versus D for 0 ≤ D ≤ 0.5, dropping from R(0) = 1 to R(0.5) = 0.]
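A small helper (my own addition) evaluates R(D) = 1 − H(D) for the fair-coin source and numerically inverts it; for instance, it answers the earlier question of how well one can do with only 0.5 bit per sample (roughly D ≈ 0.11).

```python
import numpy as np

def h2(p):
    """Binary entropy H(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def rate_binary(D):
    """R(D) = 1 - H(D) for a Bernoulli(1/2) source with Hamming distortion."""
    return max(0.0, 1.0 - h2(min(D, 0.5)))

print(f"R(0.1) = {rate_binary(0.1):.3f} bits/sample")

# Numerically invert R(D) to find the best distortion achievable at 0.5 bit/sample
Ds = np.linspace(0.0, 0.5, 100001)
rates = np.array([rate_binary(d) for d in Ds])
D_at_half_bit = Ds[np.argmin(np.abs(rates - 0.5))]
print(f"At R = 0.5 bit/sample the best achievable distortion is about D = {D_at_half_bit:.3f}")
```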

Gaussian source

Consider X ∼ N(0, σ_X²). To determine the rate-distortion function, we first need to decide on a distortion measure. An intuitive one is just the squared error, d(X̂, X) = (X̂ − X)².

Given E[d(X̂, X)] = D, what is the minimum rate required? As before, denote by Z = X − X̂ the prediction error. Note that E[Z²] = D. Then

    R(D) = min_{p(x̂|x)} I(X̂; X)
         = min_{p(x̂|x)} [ h(X) − h(X|X̂) ]
         = min_{p(x̂|x)} [ h(X) − h(Z + X̂ | X̂) ]
         = min_{p(x̂|x)} [ h(X) − h(Z|X̂) ]
         ≥ min_{p(x̂|x)} [ h(X) − h(Z) ]                         (conditioning reduces entropy)
         ≥ (1/2) log(2πe σ_X²) − (1/2) log(2πe D)                (a Gaussian maximizes h(Z) for a given second moment)
         = (1/2) log(σ_X² / D)

and the bound is achieved by a Gaussian test channel, so R(D) = (1/2) log(σ_X²/D) for 0 < D ≤ σ_X².
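For reference, the closed form is easy to evaluate (my own addition below); the inverse form D(R) = σ_X² · 2^{−2R} shows that every additional bit per sample cuts the mean squared error by a factor of 4 (about 6 dB).

```python
import numpy as np

def rate_gaussian(D, var_x):
    """R(D) = 1/2 log2(var_x / D) bits/sample for D < var_x, else 0."""
    return 0.0 if D >= var_x else 0.5 * np.log2(var_x / D)

def distortion_gaussian(R, var_x):
    """Inverse function: D(R) = var_x * 2**(-2R)."""
    return var_x * 2.0 ** (-2.0 * R)

var_x = 1.0
for R in (0.5, 1.0, 2.0):
    print(f"R = {R} bit/sample -> D = {distortion_gaussian(R, var_x):.4f} (MSE)")
print(f"To reach D = 0.01 you need R = {rate_gaussian(0.01, var_x):.2f} bits/sample")
```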

Lecture 12 Rate-distortion Theorem

Forward proof

Forward statement: Given a distortion constraint D, we can find a scheme such that the required rate is no bigger than R(D) = min_{p(x̂|x)} I(X; X̂), where the X̂ induced by p(x̂|x) should satisfy E[d(X, X̂)] ≤ D

Codebook construction: Let's say p*(x̂|x) is the distribution that achieves the rate-distortion optimization problem. Randomly construct 2^{NR} codewords as follows:
  • Sample X from the source and pass X through p*(x̂|x) to obtain X̂
  • Repeat this N times to get a length-N codeword
  • Store the i-th codeword as C(i)

Note that the code rate is (log₂ 2^{NR})/N = R as desired

Covering lemma and distortion typical sequences

  • We say jointly typical sequences x^N and x̂^N are distortion typical, written (x^N, x̂^N) ∈ A^N_{d,ε}, if |d(x^N, x̂^N) − E[d(X, X̂)]| ≤ ε
  • By the LLN, every pair of sequences sampled from the joint source will virtually always be distortion typical
  • Consequently, (1 − δ) 2^{N(H(X,X̂)−ε)} ≤ |A^N_{d,ε}| ≤ 2^{N(H(X,X̂)+ε)} as before
  • For two independently drawn sequences X̂^N and X^N, the probability that they are distortion typical is essentially the same as before. In particular,

        (1 − δ) 2^{−N(I(X;X̂)+3ε)} ≤ Pr((X^N, X̂^N) ∈ A^N_{d,ε}(X, X̂))

Covering lemma for distortion typical sequences

For a codebook of M = 2^{NR} independently drawn codewords X̂^N(1), ..., X̂^N(M),

    Pr((X^N, X̂^N(m)) ∉ A^(N)_{d,ε}(X, X̂) for all m)
      = ∏_{m=1}^M Pr((X^N, X̂^N(m)) ∉ A^(N)_{d,ε}(X, X̂))          (independent codewords)
      = ∏_{m=1}^M [ 1 − Pr((X^N, X̂^N(m)) ∈ A^(N)_{d,ε}(X, X̂)) ]
      ≤ (1 − (1 − δ) 2^{−N(I(X̂;X)+3ε)})^M
      ≤ exp(−M (1 − δ) 2^{−N(I(X̂;X)+3ε)})                          (using 1 − x ≤ e^{−x})
      = exp(−(1 − δ) 2^{−N(I(X̂;X)−R+3ε)})
      → 0  as N → ∞, provided R > I(X; X̂) + 3ε (and hence, since ε is arbitrary, whenever R > I(X; X̂))

[Plot: 1 − x versus e^{−x}, illustrating 1 − x ≤ e^{−x}.]
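To get a feel for how quickly the failure probability vanishes, the snippet below (an illustration of the final bound, not from the slides) evaluates exp(−(1 − δ)·2^{−N(I − R + 3ε)}) for some made-up values of I(X; X̂), R, ε, and δ.

```python
import numpy as np

def covering_failure_bound(N, R, I, eps=0.01, delta=0.05):
    """Upper bound exp(-(1-delta) * 2^{-N(I - R + 3 eps)}) on the probability
    that none of the 2^{NR} codewords is distortion typical with X^N."""
    exponent = -N * (I - R + 3 * eps)
    return np.exp(-(1 - delta) * 2.0 ** exponent)

I_target = 0.531          # hypothetical I(X; X_hat), e.g. R(D) of a fair coin at D = 0.1
R = 0.6                   # any rate strictly above I_target + 3*eps
for N in (50, 100, 200, 400):
    print(f"N = {N}: failure probability bound = {covering_failure_bound(N, R, I_target):.3e}")
```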

Forward proof

Encoding: Given the input X^N, find among the codewords one that is jointly (distortion) typical with X^N. Say that codeword is C(i); output the index i to the decoder

Decoding: Upon receiving the index i, simply output C(i)

Performance analysis
  • First of all, the only point of failure lies in encoding, that is, when the encoder cannot find a codeword jointly typical with X^N
  • By the covering lemma, the encoding failure probability is negligible as long as R > I(X; X̂)
  • If encoding is successful, C(i) and X^N are distortion typical. Therefore, E[d(C(i), X^N)] ≈ E[d(X̂, X)] ≤ D as desired
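The whole forward argument can be simulated for the fair-coin source: draw a random codebook whose entries are i.i.d. Bernoulli(1/2) (the output marginal of the optimal test channel), encode each source block with the closest codeword, and watch the average Hamming distortion approach the D solving R = 1 − H(D) (about 0.11 at R = 0.5). The block lengths, rate, and trial count below are arbitrary choices of mine, and minimum-distance encoding is used instead of a typicality check for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_distortion(N, R, trials=500):
    """Random-codebook compression of a fair-coin source (illustration only).
    Codewords are i.i.d. Bernoulli(1/2); the encoder picks the closest codeword."""
    M = 2 ** int(round(N * R))                       # 2^{NR} codewords
    codebook = rng.integers(0, 2, size=(M, N))
    total = 0.0
    for _ in range(trials):
        x = rng.integers(0, 2, size=N)
        dists = np.count_nonzero(codebook != x, axis=1)   # Hamming distances
        total += dists.min() / N
    return total / trials

# R = 0.5 bit/sample; the asymptotic limit is the D with 1 - H(D) = 0.5, i.e. D ~ 0.11
for N in (8, 16, 24):
    print(f"N = {N:2d}: average distortion ~ {avg_distortion(N, 0.5):.3f}")
```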

Converse proof

Converse statement: If the rate is smaller than R(D), the distortion will be larger than D

Alternative statement: If the distortion is less than or equal to D, the rate must be at least R(D)

In the proof, we need to use the convexity of R(D), that is,

    R(aD_1 + (1 − a)D_2) ≤ aR(D_1) + (1 − a)R(D_2)

So we will digress a little bit to show this convexity first

Log-sum inequality

Log-sum inequality: For any a_1, ..., a_n ≥ 0 and b_1, ..., b_n ≥ 0, we have

    Σ_i a_i log₂(a_i/b_i) ≥ (Σ_i a_i) log₂( (Σ_i a_i) / (Σ_i b_i) )

Proof: Define two distributions p(x) and q(x) with p(x_i) = a_i / Σ_i a_i and q(x_i) = b_i / Σ_i b_i. Since p(x) and q(x) are both non-negative and sum up to 1, they are indeed valid probability mass functions. Then we have

    0 ≤ KL(p(x) ‖ q(x)) = Σ_i p(x_i) log₂( p(x_i) / q(x_i) )
                        = Σ_i (a_i / Σ_j a_j) [ log₂(a_i/b_i) − log₂( (Σ_j a_j) / (Σ_j b_j) ) ]

and rearranging gives the claimed inequality.
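A quick numerical spot-check of the inequality (my own addition), using arbitrary non-negative vectors:

```python
import numpy as np

a = np.array([0.2, 1.5, 3.0, 0.7])
b = np.array([1.0, 0.4, 2.5, 0.1])

lhs = np.sum(a * np.log2(a / b))                        # sum_i a_i log2(a_i/b_i)
rhs = np.sum(a) * np.log2(np.sum(a) / np.sum(b))        # (sum a_i) log2(sum a_i / sum b_i)
print(f"{lhs:.4f} >= {rhs:.4f}: {lhs >= rhs}")
```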

Convexity of KL divergence

For any four distributions p_1(·), p_2(·), q_1(·), and q_2(·), we have

    λ_1 KL(p_1 ‖ q_1) + λ_2 KL(p_2 ‖ q_2) ≥ KL(λ_1 p_1 + λ_2 p_2 ‖ λ_1 q_1 + λ_2 q_2),

where λ_1, λ_2 ≥ 0 and λ_1 + λ_2 = 1.

Proof:

    λ_1 KL(p_1 ‖ q_1) + λ_2 KL(p_2 ‖ q_2)
      = λ_1 Σ_{x∈X} p_1(x) log( p_1(x)/q_1(x) ) + λ_2 Σ_{x∈X} p_2(x) log( p_2(x)/q_2(x) )
      = Σ_{x∈X} [ λ_1 p_1(x) log( λ_1 p_1(x) / (λ_1 q_1(x)) ) + λ_2 p_2(x) log( λ_2 p_2(x) / (λ_2 q_2(x)) ) ]
      ≥ Σ_{x∈X} ( λ_1 p_1(x) + λ_2 p_2(x) ) log( (λ_1 p_1(x) + λ_2 p_2(x)) / (λ_1 q_1(x) + λ_2 q_2(x)) )     (by the log-sum inequality)
      = KL(λ_1 p_1 + λ_2 p_2 ‖ λ_1 q_1 + λ_2 q_2)

Convexity of I(X; Y) with respect to p(y|x)

For any random variables X and Y, I(X; Y) is a convex function of p(y|x) for a fixed p(x)

Remark: I(X; Y) is concave with respect to p(x) for fixed p(y|x), though. A proof is given in Cover and Thomas and will be omitted here

Proof: Let us write

    I(X; Y) = KL( p(x, y) ‖ p(x)p(y) ) = KL( p(x)p(y|x) ‖ p(x) Σ_{x'} p(x')p(y|x') ) ≜ f(p(y|x))

We want to show

    λ f(p_1(y|x)) + (1 − λ) f(p_2(y|x)) ≥ f(λ p_1(y|x) + (1 − λ) p_2(y|x))

Proof (continued)

Continuing from the previous slide, we have

    λ f(p_1(y|x)) + (1 − λ) f(p_2(y|x))
      = λ KL( p(x)p_1(y|x) ‖ p(x) Σ_{x'} p(x')p_1(y|x') ) + (1 − λ) KL( p(x)p_2(y|x) ‖ p(x) Σ_{x'} p(x')p_2(y|x') )
      ≥ KL( λ p(x)p_1(y|x) + (1 − λ) p(x)p_2(y|x) ‖ λ p(x) Σ_{x'} p(x')p_1(y|x') + (1 − λ) p(x) Σ_{x'} p(x')p_2(y|x') )     (convexity of KL divergence)
      = KL( p(x)[λ p_1(y|x) + (1 − λ) p_2(y|x)] ‖ p(x) Σ_{x'} p(x')[λ p_1(y|x') + (1 − λ) p_2(y|x')] )
      = f(λ p_1(y|x) + (1 − λ) p_2(y|x))
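A numerical illustration of the statement just proved (my own addition): fix an input pmf p(x), take two arbitrary channels p_1(y|x) and p_2(y|x), and verify that mixing the channels never yields more mutual information than the corresponding mixture of the two mutual informations.

```python
import numpy as np

def mutual_information(px, W):
    """I(X;Y) in bits for input pmf px and channel matrix W with W[x, y] = p(y|x)."""
    joint = px[:, None] * W
    py = joint.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (px[:, None] * py[None, :]))
    return np.nansum(terms)                  # 0 * log 0 terms contribute 0

px = np.array([0.3, 0.7])
W1 = np.array([[0.9, 0.1], [0.2, 0.8]])      # arbitrary channel 1
W2 = np.array([[0.6, 0.4], [0.5, 0.5]])      # arbitrary channel 2

lam = 0.35
lhs = lam * mutual_information(px, W1) + (1 - lam) * mutual_information(px, W2)
rhs = mutual_information(px, lam * W1 + (1 - lam) * W2)
print(f"{lhs:.4f} >= {rhs:.4f}: {lhs >= rhs}")
```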

Convexity of R(D)

Recall that R(D) = min_{p(x̂|x)} I(X̂; X) subject to E[d(X, X̂)] ≤ D. We want to show that

    R(λD_1 + (1 − λ)D_2) ≤ λR(D_1) + (1 − λ)R(D_2)

Proof: Let p*_1(x̂|x) and p*_2(x̂|x) be the distributions that achieve R(D_1) and R(D_2). Let's time-share between the two distributions: use p*_1(x̂|x) for a λ fraction of the time and p*_2(x̂|x) for a (1 − λ) fraction of the time. The resulting distortion is λD_1 + (1 − λ)D_2. Therefore, writing f(p(x̂|x)) = I(X̂; X) as before,

    λR(D_1) + (1 − λ)R(D_2) = λ I(X̂_1; X) + (1 − λ) I(X̂_2; X)
      = λ f(p*_1(x̂|x)) + (1 − λ) f(p*_2(x̂|x))
      ≥ f(λ p*_1(x̂|x) + (1 − λ) p*_2(x̂|x))          (convexity of I in the test channel)
      = I(X̃; X)
      ≥ R(λD_1 + (1 − λ)D_2),

where X̃ is generated by the mixture channel, i.e. X̃ = X̂_1 for a λ fraction of the time and X̂_2 for a (1 − λ) fraction of the time; its distortion is at most λD_1 + (1 − λ)D_2, so it is feasible for R(λD_1 + (1 − λ)D_2).
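As a sanity check (my own addition), for the fair-coin source, where R(D) = 1 − H(D) on 0 ≤ D ≤ 1/2, the chord between any two points indeed lies above the curve:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def R(D):
    """R(D) = 1 - H(D) for the binary symmetric source with Hamming distortion."""
    return max(0.0, 1.0 - h2(min(D, 0.5)))

D1, D2 = 0.05, 0.35
for lam in (0.25, 0.5, 0.75):
    Dmix = lam * D1 + (1 - lam) * D2
    print(f"R({Dmix:.3f}) = {R(Dmix):.4f} <= {lam * R(D1) + (1 - lam) * R(D2):.4f}")
```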

Converse proof (continued)

    X^N ∼ p(x) → Encoder → m → Decoder → X̂^N

Since the message m takes at most 2^{NR} values,

    NR ≥ H(M)
       ≥ H(M) − H(M|X^N) = I(M; X^N)
       ≥ I(X̂^N; X^N) = H(X^N) − H(X^N|X̂^N)                     (X̂^N is a function of M)
       = Σ_{i=1}^N H(X_i) − Σ_{i=1}^N H(X_i|X̂^N, X^{i−1})       (memoryless source)
       ≥ Σ_{i=1}^N H(X_i) − Σ_{i=1}^N H(X_i|X̂_i)
       = Σ_{i=1}^N I(X_i; X̂_i)
       ≥ Σ_{i=1}^N R(E[d(X_i, X̂_i)])
       = N · (1/N) Σ_{i=1}^N R(E[d(X_i, X̂_i)])
       ≥ N R( (1/N) Σ_{i=1}^N E[d(X_i, X̂_i)] )                  (convexity of R(D) and Jensen's inequality)
       = N R( E[ (1/N) Σ_{i=1}^N d(X_i, X̂_i) ] )
       = N R( E[d(X^N, X̂^N)] )
       ≥ N R(D)                                                  (R(·) is non-increasing and the scheme meets distortion D)