Information Theory, Lecture 4: Discrete channels, codes and capacity
Mikael Skoglund

Information Theory Lecture 4

• Discrete channels, codes and capacity: CT7
• Channels: CT7.1–2
• Capacity and the coding theorem: CT7.3–7 and CT7.9
• Combining source and channel coding: CT7.13

Discrete Channels

[Figure: a discrete channel $p(y \mid x)$ with input $X$ and output $Y$.]

• Let $\mathcal{X}$ and $\mathcal{Y}$ be finite sets.
• A discrete channel is a random mapping $p(y \mid x) : \mathcal{X} \to \mathcal{Y}$.
• The $n$th extension of the discrete channel is a random mapping $p(y_1^n \mid x_1^n) : \mathcal{X}^n \to \mathcal{Y}^n$, defined for all $n \geq 1$, $x_1^n \in \mathcal{X}^n$ and $y_1^n \in \mathcal{Y}^n$.
• A pmf $p(x_1^n)$ induces a pmf $p(y_1^n)$ via the channel,
  $$p(y_1^n) = \sum_{x_1^n} p(y_1^n \mid x_1^n)\, p(x_1^n)$$
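As a small aside, the induced output pmf in the last bullet is just a matrix-vector product once the channel is written as a transition matrix. A minimal sketch (my own illustration, not from the slides), assuming numpy:

```python
# The channel is represented as a transition matrix W with W[x, y] = p(y | x).
import numpy as np

def output_pmf(p_x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Compute p(y) = sum_x p(y | x) p(x) for a discrete channel."""
    return p_x @ W

# Example: a binary symmetric channel with crossover probability 0.1
W_bsc = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
p_x = np.array([0.3, 0.7])       # input pmf
print(output_pmf(p_x, W_bsc))    # -> [0.34 0.66]
```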

• The channel is stationary if for any $n$
  $$p(y_1^n \mid x_1^n) = p(y_{1+k}^{n+k} \mid x_{1+k}^{n+k}), \quad k = 1, 2, \ldots$$
• A stationary channel is memoryless if
  $$p(y_m \mid x_1^m, y_1^{m-1}) = p(y_m \mid x_m), \quad m = 2, 3, \ldots$$
  That is, the channel output at time $m$ does not depend on past inputs or outputs.
• Furthermore, if the channel is used without feedback,
  $$p(y_1^n \mid x_1^n) = \prod_{m=1}^{n} p(y_m \mid x_m), \quad n = 2, 3, \ldots$$
  That is, each time the channel is used its effect on the output is independent of previous and future uses.

• A discrete memoryless channel (DMC) is completely described by the triple $(\mathcal{X}, p(y \mid x), \mathcal{Y})$.
• The binary symmetric channel (BSC) with crossover probability $\varepsilon$:
  • a DMC with $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ and $p(1 \mid 0) = p(0 \mid 1) = \varepsilon$
  [Figure: BSC transition diagram; $0 \to 0$ and $1 \to 1$ with probability $1 - \varepsilon$, $0 \to 1$ and $1 \to 0$ with probability $\varepsilon$.]
• The binary erasure channel (BEC) with erasure probability $\varepsilon$:
  • a DMC with $\mathcal{X} = \{0, 1\}$, $\mathcal{Y} = \{0, 1, e\}$ and $p(e \mid 0) = p(e \mid 1) = \varepsilon$
  [Figure: BEC transition diagram; $0 \to 0$ and $1 \to 1$ with probability $1 - \varepsilon$, $0 \to e$ and $1 \to e$ with probability $\varepsilon$.]
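A minimal simulation sketch of the two example channels (my own illustration; coding the erasure symbol $e$ as -1 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def bsc(x: np.ndarray, eps: float) -> np.ndarray:
    """Binary symmetric channel: each bit is flipped independently w.p. eps."""
    flips = rng.random(x.shape) < eps
    return np.bitwise_xor(x, flips.astype(int))

def bec(x: np.ndarray, eps: float) -> np.ndarray:
    """Binary erasure channel: each bit is erased (mapped to -1) w.p. eps."""
    y = x.copy()
    y[rng.random(x.shape) < eps] = -1   # -1 stands for the erasure symbol e
    return y

x = rng.integers(0, 2, size=20)
print(bsc(x, 0.1))
print(bec(x, 0.1))
```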

A Block Channel Code

[Figure: $\omega \to$ encoder $\alpha(\cdot) \to x_1^n(\omega) \to$ channel $p(y \mid x) \to Y_1^n \to$ decoder $\beta(\cdot) \to \hat{\omega}$.]

• An $(M, n)$ block channel code for a DMC $(\mathcal{X}, p(y \mid x), \mathcal{Y})$ is defined by:
  1. An index set $\mathcal{I}_M \triangleq \{1, \ldots, M\}$.
  2. An encoder mapping $\alpha : \mathcal{I}_M \to \mathcal{X}^n$. The set $\mathcal{C}_n \triangleq \{x_1^n : x_1^n = \alpha(i), \ \forall i \in \mathcal{I}_M\}$ of codewords is called the codebook.
  3. A decoder mapping $\beta : \mathcal{Y}^n \to \mathcal{I}_M$.
• The rate of the code is
  $$R \triangleq \frac{\log M}{n} \ \text{[bits per channel use]}$$

Why?

• $M$ different codewords $\{x_1^n(1), \ldots, x_1^n(M)\}$ can convey $\log M$ bits of information per codeword, or $R$ bits per channel use.
• Consider $M = 2^k$, $|\mathcal{X}| = 2$, and assume that $k < n$. Then $k$ "information bits" are mapped into $n > k$ "coded bits." Introduces redundancy; can be employed by the decoder to correct channel errors.
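For concreteness, a minimal sketch (my own toy example, not from the slides) of an $(M, n)$ code with $M = 2$, $n = 3$: the binary repetition code, rate $R = 1/3$, so $k = 1$ information bit is mapped to $n = 3$ coded bits:

```python
import numpy as np

M, n = 2, 3
codebook = {1: np.array([0, 0, 0]),   # alpha(1)
            2: np.array([1, 1, 1])}   # alpha(2)

def encode(omega: int) -> np.ndarray:
    """Encoder alpha: I_M -> X^n."""
    return codebook[omega]

def decode(y: np.ndarray) -> int:
    """Decoder beta: Y^n -> I_M, here by majority vote (minimum Hamming distance)."""
    return 2 if y.sum() > n / 2 else 1

rate = np.log2(M) / n
print(f"rate R = {rate:.3f} bits per channel use")
print(decode(np.array([0, 1, 0])))   # single bit error is corrected -> 1
```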

Error Probabilities

• Information symbol $\omega \in \mathcal{I}_M$, with $p(i) = \Pr(\omega = i)$. Then, for a given DMC and a given code,
  $$\omega \longrightarrow x_1^n = \alpha(\omega) \longrightarrow Y_1^n \longrightarrow \hat{\omega} = \beta(Y_1^n)$$
• Define:
  1. The conditional error probability: $\lambda_i = \Pr(\hat{\omega} \neq i \mid \omega = i)$
  2. The maximal error probability: $\lambda^{(n)} = \max\{\lambda_1, \ldots, \lambda_M\}$
  3. The average error probability:
     $$P_e^{(n)} = \Pr(\hat{\omega} \neq \omega) = \sum_{i=1}^{M} \lambda_i \, p(i)$$

Jointly Typical Sequences

The set $A_\varepsilon^{(n)}$ of jointly typical sequences with respect to a pmf $p(x, y)$ is the set $\{(x_1^n, y_1^n)\}$ of sequences for which
  $$\left| -\frac{1}{n} \log p(x_1^n) - H(X) \right| < \varepsilon, \quad
    \left| -\frac{1}{n} \log p(y_1^n) - H(Y) \right| < \varepsilon, \quad
    \left| -\frac{1}{n} \log p(x_1^n, y_1^n) - H(X, Y) \right| < \varepsilon$$
where
  $$p(x_1^n, y_1^n) = \prod_{m=1}^{n} p(x_m, y_m), \quad
    p(x_1^n) = \sum_{y_1^n} p(x_1^n, y_1^n), \quad
    p(y_1^n) = \sum_{x_1^n} p(x_1^n, y_1^n)$$
and where the entropies are computed based on $p(x, y)$.
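A minimal numerical check of the three typicality conditions (my own example: an arbitrary joint pmf on $\{0,1\}^2$, logarithms in base 2):

```python
import numpy as np

rng = np.random.default_rng(1)

# Joint pmf p(x, y) on {0,1} x {0,1}
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
H = lambda p: -np.sum(p * np.log2(p))
H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy.flatten())

def jointly_typical(x, y, eps):
    """Check whether (x_1^n, y_1^n) lies in A_eps^(n) w.r.t. p(x, y)."""
    n = len(x)
    lp_x = np.sum(np.log2(p_x[x]))
    lp_y = np.sum(np.log2(p_y[y]))
    lp_xy = np.sum(np.log2(p_xy[x, y]))
    return (abs(-lp_x / n - H_X) < eps and
            abs(-lp_y / n - H_Y) < eps and
            abs(-lp_xy / n - H_XY) < eps)

# Draw (X_1^n, Y_1^n) i.i.d. from p(x, y) and test typicality.
n = 2000
idx = rng.choice(4, size=n, p=p_xy.flatten())
x, y = idx // 2, idx % 2
print(jointly_typical(x, y, eps=0.05))   # True with high probability
```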

The joint AEP

Let $(X_1^n, Y_1^n)$ be drawn according to $p(x_1^n, y_1^n) = \prod_{m=1}^{n} p(x_m, y_m)$. Then:
• $\Pr\big( (X_1^n, Y_1^n) \in A_\varepsilon^{(n)} \big) > 1 - \varepsilon$ for sufficiently large $n$.
• $|A_\varepsilon^{(n)}| \leq 2^{n(H(X,Y) + \varepsilon)}$.
• $|A_\varepsilon^{(n)}| \geq (1 - \varepsilon)\, 2^{n(H(X,Y) - \varepsilon)}$ for sufficiently large $n$.
• If $\tilde{X}_1^n$ and $\tilde{Y}_1^n$ are drawn independently according to $p(x_1^n) = \sum_{y_1^n} p(x_1^n, y_1^n)$ and $p(y_1^n) = \sum_{x_1^n} p(x_1^n, y_1^n)$, then
  $$\Pr\big( (\tilde{X}_1^n, \tilde{Y}_1^n) \in A_\varepsilon^{(n)} \big) \leq 2^{-n(I(X;Y) - 3\varepsilon)}$$
  and for sufficiently large $n$
  $$\Pr\big( (\tilde{X}_1^n, \tilde{Y}_1^n) \in A_\varepsilon^{(n)} \big) \geq (1 - \varepsilon)\, 2^{-n(I(X;Y) + 3\varepsilon)}$$
  with $I(X; Y)$ computed for the pmf $p(x, y)$.

Channel Capacity

• For a fixed $n$, a code can convey more information for large $M$ $\Rightarrow$ we would like to maximize the rate $R = \frac{1}{n} \log M$ without sacrificing performance.
• What is the largest $R$ that allows for a (very) low $P_e^{(n)}$?
• For a given channel we say that the rate $R$ is achievable if there exists a sequence of $(M, n)$ codes, with $M = \lceil 2^{nR} \rceil$, such that the maximal probability of error $\lambda^{(n)} \to 0$ as $n \to \infty$. The capacity $C$ of a channel is the supremum of all rates that are achievable over the channel.
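The lower and upper bounds developed on the following slides identify $C$ with $\max_{p(x)} I(X; Y)$ for a DMC. As a quick numerical preview (my own sketch), a grid search over Bernoulli input pmfs recovers the familiar closed-form BSC capacity $1 - H_b(\varepsilon)$:

```python
import numpy as np

def H_b(p):
    """Binary entropy in bits (with 0 log 0 treated as 0 via clipping)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mutual_information_bsc(q, eps):
    """I(X; Y) for input pmf (1-q, q) on a BSC(eps): H(Y) - H(Y|X)."""
    p_y1 = q * (1 - eps) + (1 - q) * eps    # Pr(Y = 1)
    return H_b(p_y1) - H_b(eps)

eps = 0.1
qs = np.linspace(0, 1, 1001)
I_vals = mutual_information_bsc(qs, eps)
print("max_q I(X;Y) =", I_vals.max())     # ~0.531, attained at q = 1/2
print("1 - H_b(eps) =", 1 - H_b(eps))     # closed-form capacity of the BSC
```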

Random Code Design

• Choose a joint pmf $p(x_1^n)$ on $\mathcal{X}^n$.
• Random code design: Draw $M$ codewords $x_1^n(i)$, $i = 1, \ldots, M$, i.i.d. according to $p(x_1^n)$ and let these define a codebook $\mathcal{C}_n = \{x_1^n(1), \ldots, x_1^n(M)\}$.
• Note: The interpretation here is that the codebook is "designed" in a random fashion. When the resulting code is then used, the codebook must, of course, be fixed and known...

A Lower Bound for C of a DMC

• A DMC $(\mathcal{X}, p(y \mid x), \mathcal{Y})$.
• Fix a pmf $p(x)$ for $x \in \mathcal{X}$. Generate $\mathcal{C}_n = \{x_1^n(1), \ldots, x_1^n(M)\}$ using $p(x_1^n) = \prod p(x_m)$.
• A data symbol $\omega$ is generated according to a uniform distribution on $\mathcal{I}_M$, and $x_1^n(\omega)$ is transmitted.
• The channel produces a corresponding output sequence $Y_1^n$.
• Let $A_\varepsilon^{(n)}$ be the typical set w.r.t. $p(x, y) = p(y \mid x)\, p(x)$.
At the receiver, the decoder then uses the following decision rule. Index $\hat{\omega}$ was sent if:
• $(x_1^n(\hat{\omega}), Y_1^n) \in A_\varepsilon^{(n)}$ for some small $\varepsilon$;
• no other $\omega$ corresponds to a jointly typical $(x_1^n(\omega), Y_1^n)$.
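A minimal random-coding experiment (my own sketch): codewords drawn i.i.d. Bernoulli(1/2) for a BSC, with the error probability averaged over the codebook, the message and the channel, in the spirit of $\pi_n$ on the next slide. For the small block lengths that are feasible to simulate, the typicality decision rule above is replaced by maximum-likelihood (minimum Hamming distance) decoding; this is a deliberate simplification, not the rule from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_coding_error(n, R, eps, trials=1000):
    """Estimate Pr(decoded message != sent message), averaged over the
    random codebook, the uniformly drawn message and the BSC(eps) noise."""
    M = int(np.ceil(2 ** (n * R)))
    errors = 0
    for _ in range(trials):
        codebook = rng.integers(0, 2, size=(M, n))       # i.i.d. Bernoulli(1/2) codewords
        omega = rng.integers(M)                          # uniform message index
        y = codebook[omega] ^ (rng.random(n) < eps)      # BSC output
        dist = np.count_nonzero(codebook != y, axis=1)   # Hamming distance to each codeword
        errors += (np.argmin(dist) != omega)
    return errors / trials

# R = 0.3 is below the BSC(0.1) capacity ~0.53, so the estimate shrinks as n grows.
for n in (16, 24, 32):
    print(n, random_coding_error(n, R=0.3, eps=0.1))
```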

Now study $\pi_n = \Pr(\hat{\omega} \neq \omega)$, where "Pr" is over the random codebook selection, the data variable $\omega$ and the channel.
• Symmetry $\Rightarrow$ $\pi_n = \Pr(\hat{\omega} \neq 1 \mid \omega = 1)$
• Let $E_i = \{(x_1^n(i), Y_1^n) \in A_\varepsilon^{(n)}\}$; then for a sufficiently large $n$,
  $$\pi_n = P(E_1^c \cup E_2 \cup \cdots \cup E_M) \leq P(E_1^c) + \sum_{i=2}^{M} P(E_i)
        \leq \varepsilon + (M - 1)\, 2^{-n(I(X;Y) - 3\varepsilon)}
        \leq \varepsilon + 2^{-n(I(X;Y) - R - 3\varepsilon)}$$
  because of the union bound and the joint AEP.

• Note that
  $$I(X; Y) = \sum_{x, y} p(y \mid x)\, p(x) \log \frac{p(y \mid x)}{p(y)}$$
  with $p(y) = \sum_x p(y \mid x)\, p(x)$, where $p(x)$ generated the random codebook and $p(y \mid x)$ is given by the channel.
• Let $\mathcal{C}_{\mathrm{tot}}$ be the set of all possible codebooks that can be generated by $p(x_1^n) = \prod p(x_m)$; then at least one $\mathcal{C}_n \in \mathcal{C}_{\mathrm{tot}}$ must give
  $$P_e^{(n)} \leq \pi_n \leq \varepsilon + 2^{-n(I(X;Y) - R - 3\varepsilon)}$$
  $\Rightarrow$ as long as $R < I(X;Y) - 3\varepsilon$ there exists at least one $\mathcal{C}_n \in \mathcal{C}_{\mathrm{tot}}$, say $\mathcal{C}_n^*$, that can give $P_e^{(n)} \to 0$ as $n \to \infty$.
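A minimal numerical sketch (my own numbers) of the two quantities on these slides: $I(X;Y)$ computed from $p(x)$ and the channel $p(y \mid x)$, and the union-bound estimate $\varepsilon + 2^{-n(I(X;Y) - R - 3\varepsilon)}$, whose second term decays exponentially in $n$ once $R < I(X;Y) - 3\varepsilon$:

```python
import numpy as np

def mutual_information(p_x: np.ndarray, W: np.ndarray) -> float:
    """I(X;Y) = sum_{x,y} p(y|x) p(x) log2( p(y|x) / p(y) ), with W[x, y] = p(y|x)."""
    p_y = p_x @ W
    p_xy = p_x[:, None] * W
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2((W / p_y)[mask])))

W = np.array([[0.9, 0.1],       # BSC with crossover 0.1
              [0.1, 0.9]])
p_x = np.array([0.5, 0.5])
I = mutual_information(p_x, W)  # ~0.531 bits per channel use
R, eps = 0.3, 0.02
for n in (100, 500, 1000):
    print(n, eps + 2 ** (-n * (I - R - 3 * eps)))
```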

• Order the codewords in $\mathcal{C}_n^*$ according to the corresponding $\lambda_i$'s and throw away the worst half $\Rightarrow$
  • new rate $R' = R - n^{-1}$
  • for the remaining codewords, $\lambda^{(n)} \leq 2\big[ \varepsilon + 2^{-n(I(X;Y) - R - 3\varepsilon)} \big]$
$\Rightarrow$ for any $p(x)$, all rates $R < I(X;Y) - 3\varepsilon$ achievable
$\Rightarrow$ all rates $R < \max_{p(x)} I(X;Y) - 3\varepsilon$ achievable
$\Rightarrow$ $C \geq \max_{p(x)} I(X;Y)$

An Upper Bound for C of a DMC

• Let $\mathcal{C}_n = \{x_1^n(1), \ldots, x_1^n(M)\}$ be any sequence of codes that can achieve $\lambda^{(n)} \to 0$ at a fixed rate $R = \frac{1}{n} \log M$.
• Note that $\lambda^{(n)} \to 0$ $\Rightarrow$ $P_e^{(n)} \to 0$ for any $p(\omega)$. We can assume $\mathcal{C}_n$ encodes equally probable $\omega \in \mathcal{I}_M$.
• Fano's inequality $\Rightarrow$
  $$R \leq \frac{1}{n} + P_e^{(n)} R + \frac{1}{n} I\big(x_1^n(\omega); Y_1^n\big) \leq \frac{1}{n} + P_e^{(n)} R + \max_{p(x)} I(X; Y)$$
  That is, for any fixed achievable $R$,
  $$\lambda^{(n)} \to 0 \;\Rightarrow\; R \leq \max_{p(x)} I(X; Y) \;\Rightarrow\; C \leq \max_{p(x)} I(X; Y)$$
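Rearranging the Fano bound above gives the weak converse in quantitative form: for rates above $\max_{p(x)} I(X;Y)$ the error probability cannot vanish. A minimal sketch (my own rearrangement and example numbers):

```python
# From R <= 1/n + P_e^(n) R + C, solving for P_e^(n) gives
#   P_e^(n) >= 1 - C/R - 1/(n R),
# so for any code with rate R above capacity the error probability stays
# bounded away from zero as n grows.
C = 0.531          # capacity of a BSC(0.1), i.e. 1 - H_b(0.1)
R = 0.75           # attempted rate above capacity
for n in (10, 100, 1000):
    print(n, max(0.0, 1 - C / R - 1 / (n * R)))
# n = 1000 -> P_e^(n) >= ~0.29, no matter how the (M, n) code is designed
```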
