slide-1
SLIDE 1

INFORMATION THEORY FUNDAMENTALS AND MULTIPLE USER APPLICATIONS

July 2018

Max H. M. Costa Unicamp

LAWCI – Unicamp

slide-2
SLIDE 2

Summary

  • Introduction
  • Entropy, K-L Divergence, Mutual Information
  • Asymptotic Equipartition Property (AEP)
  • 1. Data Compression (Source Coding)
  • 2. Transmission over Unreliable Channels (Channel Coding)
  • Differential Entropy, Gaussian Channels
  • Multiple User Information Theory:
  • Multiple Access, Broadcast, Interference, Relay Channels
  • Remarks: Applications in Biology, Economics,...
slide-3
SLIDE 3

Some References (Textbooks):

[1] T. Cover and J. Thomas, Elements of Information Theory, Wiley, 2nd ed., 2006 (1991).

[2] R. Ash, Information Theory, Dover, 1990.

[3] R. Gallager, Information Theory and Reliable Communication, Wiley, 1968.

[4] A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge, 2011.

slide-4
SLIDE 4

Claude Elwood Shannon – 1916-2001

slide-5
SLIDE 5

The Information Theory Landscape

[Diagram: IT at the center, linked to Communications Theory, Probability, Statistics, Mathematics, Economics, Biology, Physics, and Computer Science]

slide-6
SLIDE 6

H(X) = Entropy of X

• Let X be a discrete random variable taking values in {x1, x2, ..., xM} with probabilities p = {p1, p2, ..., pM}.

• Definition: H(X) = H(p) = Σ_{i=1}^{M} p(x_i) log₂ ( 1 / p(x_i) ) (bits) = E[ log₂ ( 1 / p(X) ) ] bits

H(X) is a measure of the uncertainty of X.
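As an illustration (not from the original slides), a minimal Python sketch of this definition; the helper name entropy_bits is ours:

```python
# Minimal sketch: H(X) in bits for a finite pmf, with the convention 0*log(1/0) = 0.
import math

def entropy_bits(p):
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

# A fair coin has 1 bit of entropy; a fair die has log2(6) ~ 2.585 bits.
print(entropy_bits([0.5, 0.5]))    # 1.0
print(entropy_bits([1/6] * 6))     # ~2.585
```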

slide-7
SLIDE 7

How can H(X) arise naturally?

• Let X1, X2, ... be independent and identically distributed (i.i.d.) according to p(x).

• Then p(x1, x2, ..., xn) = p(x1) p(x2) ... p(xn) = Π_{i=1}^{n} p(x_i) = 2^( Σ_{i=1}^{n} log₂ p(x_i) ) = 2^( n · (1/n) Σ_{i=1}^{n} log₂ p(x_i) ) ≈ 2^( −n H(X) )

Asymptotic Equipartition Property
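As an illustration (ours, with an arbitrary three-symbol pmf), an empirical check that −(1/n) log₂ p(X1, …, Xn) concentrates around H(X):

```python
# Minimal sketch: the per-symbol log-probability of a long i.i.d. sequence
# concentrates around the entropy H(X).
import math, random

p = {'a': 0.5, 'b': 0.25, 'c': 0.25}
H = sum(q * math.log2(1 / q) for q in p.values())     # 1.5 bits

n = 10_000
sample = random.choices(list(p), weights=list(p.values()), k=n)
per_symbol = -sum(math.log2(p[s]) for s in sample) / n

print(f"H(X) = {H:.4f} bits, empirical -(1/n) log2 p = {per_symbol:.4f}")
```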

slide-8
SLIDE 8

Change of base

• H_b(X) = E[ log_b ( 1 / p(X) ) ] = (log_b a) · H_a(X)

• Units of Entropy:
  • Base 2 → bits
  • Base 10 → dits or Hartleys
  • Base e → nats
  • Base 3 → trits

slide-9
SLIDE 9

Examples

• Ex. 1) X ∈ {0,1}, p(X=0) = 0, p(X=1) = 1
  H(X) = − 0 log 0 − 1 log 1 = 0
• Note: lim_{p→0} p log p = 0, by l'Hôpital's rule

No uncertainty! X is deterministic.

slide-10
SLIDE 10

Examples (continued)

• Ex. 2) X ∈ {0,1}, p(X=0) = p, p(X=1) = 1−p
  H(X) = − p log p − (1−p) log (1−p) = h(p)

[Plot: h(p) versus p, equal to 0 at p = 0 and p = 1 and peaking at 1 bit for p = 1/2]

h(p) is the binary entropy function.
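A minimal sketch (ours) of h(p):

```python
# Minimal sketch: the binary entropy function h(p), maximized at p = 1/2 where h = 1 bit.
import math

def h(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.9):
    print(f"h({p}) = {h(p):.4f}")
```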

slide-11
SLIDE 11

Lemma

• ln x ≤ x − 1, for x > 0
• Proof: Taylor series with remainder.

[Plot: the line x − 1 lying above ln x, touching it at x = 1]

slide-12
SLIDE 12

Relative Entropy (Kullback-Leibler divergence)

• Let p(x) and q(x) be two probability mass functions defined on alphabet X.

• The K-L divergence of p w.r.t. q is

  D(p ‖ q) = Σ_{x ∈ X} p(x) log [ p(x) / q(x) ]
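A minimal sketch (ours) of D(p ‖ q) in bits, for pmfs on a common alphabet:

```python
# Minimal sketch: K-L divergence D(p || q). Terms with p(x) = 0 contribute 0;
# if p(x) > 0 while q(x) = 0, the divergence is infinite.
import math

def kl_divergence(p, q):
    d = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf
            d += px * math.log2(px / qx)
    return d

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))   # > 0
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # 0.0
```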

slide-13
SLIDE 13

Proposition: Information Inequality

D(p ‖ q) ≥ 0, with equality if and only if (iff) p ≡ q

• Proof: Let A = {x : p(x) > 0}
• We have ln x ≤ x − 1 (Lemma)
• Thus ln [ q(x)/p(x) ] ≤ q(x)/p(x) − 1
• Multiply by p(x) and sum over x ∈ A:
  Σ p(x) log [ q(x)/p(x) ] ≤ Σ p(x) ( q(x)/p(x) − 1 ) ≤ 0
• So −D(p ‖ q) ≤ 0, i.e., D(p ‖ q) ≥ 0
• Equality iff p = q on A, i.e., p ≡ q. QED

slide-14
SLIDE 14

Remark

• The K-L Divergence is very useful in IT, but it is not a metric.
• It is not symmetric and does not satisfy the triangle inequality.

slide-15
SLIDE 15

Application

• Let q be the uniform distribution: q_i = 1/n for i = 1, ..., n, and let p = {p1, p2, ..., pn}.
• Then D(p ‖ q) ≥ 0, i.e., Σ p_i log ( p_i / q_i ) ≥ 0
• Σ p_i log p_i ≥ Σ p_i log q_i = Σ p_i log (1/n)
• Thus H(p) ≤ log n.
• The uniform distribution has maximum entropy.

slide-16
SLIDE 16

Joint, marginal and conditional distributions

• Joint Distribution:

[Table: M × N array of joint probabilities p(x_i, y_j), with marginals p(x_i) and p(y_j) along the borders]

Marginal distributions:
  p(x_i) = Σ_{y_j} p(x_i, y_j)
  p(y_j) = Σ_{x_i} p(x_i, y_j)

slide-17
SLIDE 17

Conditional Distributions:

• p(y_j | x_i) = p(x_i, y_j) / p(x_i)

• p(x_i | y_j) = p(x_i, y_j) / p(y_j)

The joint distribution determines the marginal and the conditional distributions. Note: The marginals do not determine the joint distribution.
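As an aside (not in the slides), a small Python sketch recovering marginals and conditionals from a toy joint pmf:

```python
# Minimal sketch: marginals and conditionals from a joint pmf stored as
# {(x, y): p(x, y)} (hypothetical toy values).
from collections import defaultdict

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

px, py = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    px[x] += p          # p(x) = sum over y of p(x, y)
    py[y] += p          # p(y) = sum over x of p(x, y)

# Conditional p(x | y) = p(x, y) / p(y)
p_x_given_y = {(x, y): p / py[y] for (x, y), p in joint.items()}
print(dict(px), dict(py))
print(p_x_given_y)
```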

slide-18
SLIDE 18

Conditional probabilities may lead to non-intuitive results

• 3 cards: [figure showing the front and back of each of the three cards]

slide-19
SLIDE 19

Joint Entropy

• H(X,Y) = H(p(·,·)) = Σ_{x_i, y_j} p(x_i, y_j) log [ 1 / p(x_i, y_j) ]

slide-20
SLIDE 20

Conditional Entropy

• H(X|Y) = Σ_{x_i, y_j} p(x_i, y_j) log [ 1 / p(x_i | y_j) ] = − E[ log p(X|Y) ]

• H(Y|X) = Σ_{x_i, y_j} p(x_i, y_j) log [ 1 / p(y_j | x_i) ] = − E[ log p(Y|X) ]

slide-21
SLIDE 21

Chain Rule (like peeling an onion):

H(X,Y) = H(X) + H(Y|X)

= H(Y) + H(X|Y)
• Proof: Do it for homework. Simple algebraic manipulation.
• Corollary (conditional form):
  H(X,Y|Z) = H(X|Z) + H(Y|X,Z) = H(Y|Z) + H(X|Y,Z)

slide-22
SLIDE 22

Mutual Information

• The Mutual Information between X and Y is the K-L divergence of the joint distribution p(x,y) and the product of the marginals p(x) p(y).

• I(X;Y) = D( p(x,y) ‖ p(x) p(y) ) = Σ_{x, y} p(x,y) log [ p(x,y) / ( p(x) p(y) ) ]
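A small sketch (ours, not the slides') computing I(X;Y) directly from a toy joint pmf as the K-L divergence above:

```python
# Minimal sketch: I(X;Y) = D( p(x,y) || p(x) p(y) ) in bits, for a small joint pmf.
import math
from collections import defaultdict

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px, py = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)
print(f"I(X;Y) = {I:.4f} bits")   # 0 iff X and Y are independent
```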

slide-23
SLIDE 23

Properties of I(X;Y)

• 1) Non-negativity: I(X;Y) ≥ 0, with equality iff X and Y are independent.
  Proof: I(X;Y) is a K-L divergence.
• 2) Symmetry: I(X;Y) = I(Y;X)
  Proof: Trivial (p(x) p(y) = p(y) p(x)).

slide-24
SLIDE 24

Mutual Information and Entropy

• I(X;Y) = Σ_{x, y} p(x,y) log [ p(x,y) / ( p(x) p(y) ) ]
• = H(X) + H(Y) − H(X,Y) (from above)
• = H(X) − H(X|Y) (from chain rule)
• = H(Y) − H(Y|X) (alternative form)
• Note: The Mutual Information between two random variables is the reduction in uncertainty about one r.v. when the other is revealed.

slide-25
SLIDE 25

A Venn Diagram

 Works well for two random variables

[Venn diagram: two overlapping circles of areas H(X) and H(Y); the overlap is I(X;Y), the parts outside the overlap are H(X|Y) and H(Y|X)]

slide-26
SLIDE 26

Information can’t hurt

• Conditioning reduces entropy: H(X|Y) ≤ H(X)
• Proof: I(X;Y) = H(X) − H(X|Y) ≥ 0
• On average, knowledge of Y cannot increase the uncertainty about X.

slide-27
SLIDE 27

Entropy as self-information

• I(X;X) = H(X) − H(X|X) = H(X)
• The entropy is the amount of information that a random variable conveys about itself.

slide-28
SLIDE 28

Passing on Information

• Let X and Y be dependent r.v.'s
• Proposition: I(X;Y) ≥ I(X; g(Y))
• Proof: I(X;Y) = H(X) − H(X|Y)
  = H(X) − H(X|Y, g(Y))
  ≥ H(X) − H(X|g(Y)) = I(X; g(Y))   (conditioning reduces entropy)
• This is a simple form of the Data Processing Inequality.

[Diagram: X → random mechanism → Y → g(·) → g(Y)]

slide-29
SLIDE 29

Convexity – quick review

• Convex sets: [figures]
• Non-convex sets: [figures]

slide-30
SLIDE 30

Convex Functions:

• A function f(x) is convex if the set of points above its graph is convex.

• Examples: f(x) = x², f(x) = eˣ

Mnemonic: The exponential function is convex

slide-31
SLIDE 31

Concave functions

• f(x) is concave if {−f(x)} is convex.
• Examples: f(x) = log(x), f(x) = −|x|

slide-32
SLIDE 32

Some are neither, some are both

• Neither: f(x) = x³
• Both: f(x) = ax + b

slide-33
SLIDE 33

Jensen’s Inequality

• Let X be a random variable and f(x) a convex function. Then E[f(X)] ≥ f(EX)

[Plot: convex f(x) with E f(X) lying above f(EX)]  Mnemonic: The chord is above the arc.

Proofs: Induction, or Taylor series (when f''(x) exists).

slide-34
SLIDE 34

Concavity of H(p)

• Proposition: H(p) is a concave function of p.
• Proof: Let X1 be distributed as p1 and X2 as p2.
• Let index θ ∈ {1,2} with probabilities (λ, 1−λ).
• Let Z = X_θ. Then Z is distributed as λ p1 + (1−λ) p2.
• Now, since conditioning reduces uncertainty, H(Z) ≥ H(Z|θ). Equivalently
  H( λ p1 + (1−λ) p2 ) ≥ λ H(p1) + (1−λ) H(p2),
  showing that H(·) is a concave function.
• Note: Mixing two gases of equal entropy results in a gas with higher entropy.

slide-35
SLIDE 35

Fano’s Inequality:

[Diagram: X → channel p(y|x) → Y → decoder → X̂ = g(Y)]

Pe ≥ ( H(X | Y) − 1 ) / log |X|

where Pe = Prob( X̂ ≠ X )

slide-36
SLIDE 36

Asymptotic Equipartition Property

• Let X1, X2, …, Xn be i.i.d. according to p(x)

Sample space = set of all sequences (x1, x2, …, xn);  A = set of typical sequences

A.E.P.:
  1) Pr{A} ≥ 1 − ε
  2) p(x) ≈ 2^(−nH(X)) for each typical sequence x
  3) |A| ≈ 2^(nH(X))

This is the DNA of IT!

slide-37
SLIDE 37

Examples of typical sequences

• Let X be a biased coin with P(Head) = 0.9 and P(Tail) = 0.1.
• Consider the set of 1000-long sequences of coin tosses.
• Typical sequences are those that have approximately 900 Heads and 100 Tails.
• Note: The most likely sequence, namely the one with 1000 Heads, is not typical!
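A small sketch (ours) with the numbers above, contrasting the single most likely sequence with the total weight of the near-typical ones:

```python
# Minimal sketch: 1000-toss coin with P(Head) = 0.9.
import math

n, p = 1000, 0.9

def log2_prob_of_sequence(k_heads):
    return k_heads * math.log2(p) + (n - k_heads) * math.log2(1 - p)

# Most likely single sequence: all Heads.
print("log2 P(all Heads) =", log2_prob_of_sequence(n))                 # ~ -152

# One particular 900-Head sequence is individually far less likely ...
print("log2 P(one 900-Head sequence) =", log2_prob_of_sequence(900))   # ~ -469 ~ -n*H(0.9)

# ... but there are a huge number of such sequences, so together they carry
# almost all of the probability.
total_900ish = sum(math.comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
                   for k in range(880, 921))
print("P(880 <= #Heads <= 920) =", total_900ish)                       # close to 1
```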

slide-38
SLIDE 38

Conclusion

• Better bet on A!

slide-39
SLIDE 39

Rn also tends to be non-intuitive

• Example: Sphere "inscribed" in the cube of side 1, radius 1/2.

[Figures: the inscribed disk in R² and the inscribed ball in R³]

• Question: What happens in Rⁿ? Does Vn → 0, 1, or ∞?
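A quick numeric check of the question above (ours, using the standard n-ball volume formula):

```python
# Minimal sketch: volume of the radius-1/2 ball inscribed in the unit cube of R^n,
# V_n = pi^(n/2) / Gamma(n/2 + 1) * (1/2)^n.
import math

def inscribed_ball_volume(n):
    return (math.pi ** (n / 2)) / math.gamma(n / 2 + 1) * (0.5 ** n)

for n in (2, 3, 5, 10, 20):
    print(n, inscribed_ball_volume(n))
# n=2: ~0.785, n=3: ~0.524, n=10: ~0.0025, n=20: ~2.5e-8  ->  V_n -> 0
```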

slide-40
SLIDE 40

Transmission over Unreliable Channels

• The Channel Coding Problem:
• W ∈ {1, 2, …, 2^(nR)} = message set of rate R
• X = (x1 x2 … xn) = codeword input to channel
• Y = (y1 y2 … yn) = codeword output from channel
• Ŵ = decoded message;  P(error) = P{ Ŵ ≠ W }

[Diagram: W → Channel Encoder → X → Channel p(y|x) → Y → Channel Decoder → Ŵ]

slide-41
SLIDE 41

Shannon’s Channel Coding Theorem

• Using the channel n times:

[Diagram: Xⁿ → channel → Yⁿ]

slide-42
SLIDE 42

Simple examples

• Noiseless typewriter:

[Diagram: inputs X ∈ {1,2,3,4} mapped noiselessly to outputs Y ∈ {1,2,3,4}]

Can transmit R = log₂ 4 = 2 bits/transmission. Number of noise-free symbols = 4.

slide-43
SLIDE 43

Simple examples

• Noisy typewriter (type 1):

[Diagram: 4 inputs and 4 outputs, with 0.5 transition probabilities]

Can transmit R = log₂ 2 = 1 bit/transmission. Number of noise-free symbols = 2.

slide-44
SLIDE 44

Simple examples

• Noisy typewriter (type 2):

[Diagram: 4 inputs and 4 outputs, with 0.5 transition probabilities]

Can transmit R ≥ log₂ 2 = 1 bit/transmission. Number of noise-free symbols = 2 (apparently; surprise later).

slide-45
SLIDE 45

Simple examples

• Noisy typewriter (type 3):

[Diagram: 4 inputs and 4 outputs, each input reaching two outputs with probability 0.5]

Can transmit R = log₂ 2 = 1 bit/transmission. Number of noise-free symbols = 2. Use X = 1 and X = 3.

slide-46
SLIDE 46

Simple examples

• A tricky typewriter:

[Diagram: 5 inputs and 5 outputs arranged in a cycle; each input reaches itself and the next output with probability 0.5]

How many noise free symbols? Clearly at least 2, hopefully more.

slide-47
SLIDE 47

Simple examples

• Consider the n = 2 extension of the channel:

[Diagram: 5 × 5 grid of input pairs (X1, X2)]  Which code squares to pick?

slide-48
SLIDE 48

Simple examples

• Consider the n = 2 extension of the channel:

[Diagram: 5 × 5 grid of input pairs (X1, X2)]  Let {X1, X2} be {(1,1), (2,3), (3,5), (4,2), (5,4)}

slide-49
SLIDE 49

Reminder of the channel

• A tricky typewriter:

[Diagram: 5 inputs and 5 outputs arranged in a cycle; each input reaches itself and the next output with probability 0.5]

How many noise free symbols? Clearly at least 2, hopefully more.

slide-50
SLIDE 50

Simple examples

• Looking at the outputs:

[Diagram: 5 × 5 grid of output pairs (Y1, Y2); the five codewords produce non-overlapping sets of possible outputs]  Let {X1, X2} be {(1,1), (2,3), (3,5), (4,2), (5,4)}

slide-51
SLIDE 51

Simple examples - observations

• Note that we get 5 noise-free symbols in n = 2 transmissions.

• Thus we achieve rate (log₂ 5)/2 = 1.16 bits/transmission with P(error) = 0.

• For arbitrarily small P(error) we can use long codes (n → ∞) to achieve log₂(5/2) = 1.32 bits/transmission, the channel capacity.

slide-52
SLIDE 52

The Binary Symmetric Channel (BSC)

• How many noise-free symbols?

[Diagram: BSC with inputs {0,1}, crossover probability ε and correct-transmission probability 1−ε]

A.: Clearly for n = 1 there are none. How about using n large?

slide-53
SLIDE 53

Shannon’s Second Theorem

• Using the channel n times:

[Diagram: Xⁿ → channel → Yⁿ]

slide-54
SLIDE 54

Shannon’s Second Theorem

• The Information Channel Capacity of a discrete memoryless channel is

  C = max_{p(x)} I(X; Y).

• Note: I(X; Y) is a function of p(x, y) = p(x) p(y|x).
• But p(y|x) is fixed by the channel.

slide-55
SLIDE 55

Shannon’s Second Theorem

• Direct Part: If R < C, there exists a code with P(error) → 0.
• Converse Part: If R > C, communication with P(error) → 0 is not possible.

slide-56
SLIDE 56

Shannon’s Second Theorem

• Theorem: For a discrete memoryless channel, all rates R below the information channel capacity C are achievable with maximum probability of error arbitrarily small. Conversely, if the rate is above C, the probability of error is bounded away from zero.

• Proof (Achievability): Use random coding to generate codes with a particular p(x) distribution on the codewords. Then show that the average P(error) tends to zero with n if R < C. Then expurgate bad codewords to get a code with small maximum P(error).

slide-57
SLIDE 57

Shannon’s Second Theorem

 Proof of Converse (sketch using AEP):

[Diagram: Xⁿ → channel → Yⁿ. The output space contains ≈ 2^(nH(Y)) typical sequences, grouped into typical "balls" of ≈ 2^(nH(Y|X)) sequences around each codeword]

slide-58
SLIDE 58

Shannon’s Second Theorem

• Proof of Converse (sketch using AEP): Recall the sphere packing problem.

Maximum number of non-overlapping balls is bounded by

  2^(nR) ≤ 2^(nH(Y)) / 2^(nH(Y|X)) = 2^(nI(X;Y)) ≤ 2^(nC)

• Thus 2^(nR) ≤ 2^(nC) and R ≤ C.
• A formal proof uses Fano's inequality.

slide-59
SLIDE 59

Example: The Binary Symmetric Channel

• C = max ( H(Y) − H(Y|X) ) = 1 − h(ε) bits/transmission
• Note: C = 0 for ε = ½.

[Diagram: BSC with crossover probability ε; plot of C(ε) = 1 − h(ε), equal to 1 at ε = 0 and ε = 1, and 0 at ε = ½]

slide-60
SLIDE 60

Example: The Binary Erasure Channel

• C = max ( H(Y) − H(Y|X) ) = 1 − α bits/transmission

[Diagram: BEC with erasure probability α and erasure symbol E; plot of C(α) = 1 − α]

Note: C = 0 for α = 1. Capacity is achieved with p(X=0) = p(X=1) = ½.

slide-61
SLIDE 61

Example: The Z Channel

• C = max ( H(Y) − H(Y|X) ) = log₂ 5 − 2 = 0.322 bits/tr.

• Note: the maximizing input distribution has p(X=1) = 2/5.

• Homework: Obtain this capacity.

[Diagram: Z channel; X = 0 gives Y = 0 with probability 1, X = 1 gives Y = 1 or Y = 0 with probability ½ each]
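As a numeric cross-check (ours, not a substitute for the analytic homework), a brute-force sweep over p(X=1):

```python
# Minimal sketch: maximize I(X;Y) = H(Y) - H(Y|X) for the Z channel above
# (X=0 always received correctly, X=1 flipped to 0 with probability 1/2).
import math

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_info(p1):                 # p1 = P(X = 1)
    py1 = p1 / 2                     # P(Y = 1) = p1 * 1/2
    return h(py1) - p1 * h(0.5)      # H(Y) - H(Y|X), with H(Y|X) = p1 * h(1/2) = p1

best = max((mutual_info(p1), p1) for p1 in [i / 10000 for i in range(10001)])
print(best)                          # ~ (0.3219, 0.4): C = log2(5) - 2 at P(X=1) = 2/5
```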

slide-62
SLIDE 62

Changing Z channel into BEC

• Show that the code {01, 10} can transform a Z channel into a BEC.

• What is a lower bound to the capacity of the Z channel?

slide-63
SLIDE 63

Typewriter type 2:

Sum channel: 2^C = 2^(C1) + 2^(C2), where C1 = C2 = 0.322
C = 1.322 bits/channel use
How many noise-free symbols?

[Diagram: the type-2 noisy typewriter seen as two Z channels with crossover probability ½]

slide-64
SLIDE 64

Example: Noisy typewriter

• C = max ( H(Y) − H(Y|X) ) = log₂ 26 − log₂ 2 = log₂ 13 bits/transmission
• Achieved with uniform distribution on the inputs.

[Diagram: 26-letter noisy typewriter; each input letter goes to itself or to the next letter with probability ½]

slide-65
SLIDE 65

Remark:

• For this example, we can also achieve C = log₂ 13 bits/transmission with P(error) = 0 and n = 1 by transmitting alternating input symbols, i.e., X ∈ {A, C, E, …, Y}.

slide-66
SLIDE 66

Differential Entropy

• Let X be a continuous random variable with density f(x) and support S. The differential entropy of X is

  h(X) = − ∫_S f(x) log f(x) dx

  (if it exists).

Note: Also written as h(f).

slide-67
SLIDE 67

Examples: Uniform distribution

• Let X be uniform in the interval (0, a). Then f(x) = 1/a inside the interval and f(x) = 0 outside.

• h(X) = − ∫₀ᵃ (1/a) log (1/a) dx = log a

• Note that h(X) can be negative, for a < 1.
• However, 2^h(f) = 2^(log a) = a is the size of the support set, which is non-negative.

slide-68
SLIDE 68

Example: Gaussian distribution

• Let X ~ φ(x) = (1/√(2πσ²)) exp( −x²/(2σ²) )

• Then h(X) = h(φ) = − ∫ φ(x) [ −x²/(2σ²) − ln √(2πσ²) ] dx

• = E[X²]/(2σ²) + ½ ln (2πσ²)

• = ½ ln (2πeσ²) nats

• Changing the base, we have h(X) = ½ log₂ (2πeσ²) bits

slide-69
SLIDE 69

Relation of Differential and Discrete Entropies

• Consider a quantization of X, denoted by X^Δ.
• Let X^Δ = x_i inside the i-th interval of width Δ, so that p_i ≈ f(x_i) Δ.

Then H(X^Δ) = − Σ_i p_i log p_i
            = − Σ_i f(x_i) Δ log f(x_i) − log Δ
            ≈ h(f) − log Δ

slide-70
SLIDE 70

Differential Entropy

• So the two entropies differ by the log of the quantization level Δ.

• We can define joint differential entropy, conditional differential entropy, K-L divergence and mutual information, with some care to avoid infinite differential entropies.

slide-71
SLIDE 71

K-L divergence and Mutual Information

• D(f ‖ g) = ∫ f log ( f / g )

• I(X; Y) = ∫∫ f(x,y) log [ f(x,y) / ( f(x) f(y) ) ] dx dy

• Thus, I(X;Y) = h(X) + h(Y) − h(X,Y).

Note: h(X) can be negative, but I(X;Y) is always ≥ 0.

slide-72
SLIDE 72

Differential entropy of a Gaussian vector

• Theorem: Let X be a Gaussian n-dimensional vector with mean μ and covariance matrix K. Then

  h(X) = ½ log( (2πe)ⁿ |K| )

• where |K| denotes the determinant of K.
• Proof: Algebraic manipulation.

slide-73
SLIDE 73

The Gaussian Channel

[Diagram: W → Channel Encoder → X → (+) → Y → Channel Decoder → Ŵ, with additive noise Z ~ N(0, N·I)]

Power constraint: E[X²] ≤ P

slide-74
SLIDE 74

The Gaussian Channel

• W ∈ {1, 2, …, 2^(nR)} = message set of rate R
• X = (x1 x2 … xn) = codeword input to channel
• Y = (y1 y2 … yn) = codeword output from channel
• Ŵ = decoded message;  P(error) = P{ Ŵ ≠ W }

[Diagram: W → Channel Encoder → X → (+) → Y → Channel Decoder → Ŵ, with additive noise Z ~ N(0, N·I)]

Power constraint: E[X²] ≤ P

slide-75
SLIDE 75

The Gaussian Channel

• Using the channel n times:

[Diagram: Xⁿ → channel → Yⁿ]

slide-76
SLIDE 76

The Gaussian Channel

• Capacity C = max I(X; Y)
• I(X; Y) = h(Y) − h(Y|X) = h(Y) − h(X + Z | X)
  = h(Y) − h(Z) ≤ ½ log 2πe(P + N) − ½ log 2πeN
  = ½ log ( 1 + P/N ) bits/transmission
• The maximization is over f(x) with E[X²] ≤ P.

slide-77
SLIDE 77

The Gaussian Channel

• The capacity of the discrete-time additive Gaussian channel:

  C = ½ log ( 1 + P/N ) bits/transmission

• achieved with X ~ N(0, P).
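A minimal sketch (ours) of C = ½ log₂(1 + P/N) as a function of SNR:

```python
# Minimal sketch: discrete-time AWGN channel capacity, C = 1/2 log2(1 + P/N).
import math

def awgn_capacity(snr):              # snr = P / N (linear, not dB)
    return 0.5 * math.log2(1.0 + snr)

for snr_db in (0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)
    print(f"{snr_db:>2} dB SNR -> C = {awgn_capacity(snr):.3f} bits/transmission")
```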

slide-78
SLIDE 78

Bandlimited Gaussian Channel

• Consider the channel with continuous waveform inputs x(t), with power constraint (1/T) ∫₀ᵀ x²(t) dt ≤ P, and bandwidth limited to W. The channel has white Gaussian noise with power spectral density N₀/2 watt/Hz.

• In the interval (0, T) we can specify the code waveform by 2WT samples (Nyquist criterion). We can transmit these samples over discrete-time Gaussian channels with noise variance N₀/2. This gives

  C = W log₂ ( 1 + P/(N₀ W) ) bit/second

slide-79
SLIDE 79

Bandlimited Gaussian Channel

• C = W log₂ ( 1 + P/(N₀ W) ) bit/second

• Note: If W → ∞, we have C = (P/N₀) log₂ e bits/second.
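A small sketch (ours, with illustrative P and N₀ values) showing C(W) approaching (P/N₀) log₂ e:

```python
# Minimal sketch: bandlimited AWGN capacity C(W) = W log2(1 + P/(N0*W)) and its
# W -> infinity limit, P/N0 * log2(e).  P and N0 below are assumed example values.
import math

P, N0 = 1.0, 1e-2                       # watts, watts/Hz

def capacity(W):                        # W in Hz, C in bit/s
    return W * math.log2(1.0 + P / (N0 * W))

for W in (10, 100, 1000, 10000):
    print(f"W = {W:>5} Hz -> C = {capacity(W):8.2f} bit/s")

print("Limit P/N0 * log2(e) =", (P / N0) * math.log2(math.e), "bit/s")   # ~144.27
```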

slide-80
SLIDE 80

Bandlimited Gaussian Channel

• Let r = R/W be the spectral efficiency, in bits per second per Hertz. Also let P = E_b R, where E_b is the available energy per information bit.

• We get r = R/W ≤ C/W = log₂ ( 1 + E_b R / (N₀ W) ) bits/second per Hertz.

• Thus E_b/N₀ ≥ (2^r − 1) / r.

This relation defines the so-called Shannon Bound.

slide-81
SLIDE 81

The Shannon Bound

E_b/N₀ ≥ (2^r − 1) / r

  r = R/W     E_b/N₀    E_b/N₀ (dB)
  0 (limit)   0.69      −1.59
  0.1         0.718     −1.44
  0.25        0.757     −1.21
  0.5         0.828     −0.82
  1           1         0
  2           1.5       1.76
  4           3.75      5.74
  8           31.87     15.03

[Plot: required E_b/N₀ (dB) versus spectral efficiency r, the Shannon Bound curve, approaching −1.59 dB as r → 0]
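A short sketch (ours) reproducing the table above from E_b/N₀ ≥ (2^r − 1)/r:

```python
# Minimal sketch: minimum Eb/N0 for spectral efficiency r = R/W; the r -> 0 limit is ln 2.
import math

def min_ebn0(r):
    return (2.0 ** r - 1.0) / r if r > 0 else math.log(2.0)

for r in (0.0, 0.1, 0.25, 0.5, 1, 2, 4, 8):
    lin = min_ebn0(r)
    print(f"r = {r:<5} Eb/N0 = {lin:7.3f}  ({10 * math.log10(lin):6.2f} dB)")
```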
slide-82
SLIDE 82

Shannon’s Water Filling Solution

slide-83
SLIDE 83

Parallel Gaussian Channels

[Figure: parallel Gaussian channels with noise levels 2, 1 and 3, and water level 2.5]

slide-84
SLIDE 84

Example of Water Filling

• Channels with noise levels 2, 1 and 3. Available power = 2.
• Capacity = ½ log (1 + 0.5/2) + ½ log (1 + 1.5/1) + ½ log (1 + 0/3)
• Water level (noise level + signal power) = 2.5
• No power is allocated to the third channel.
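A minimal water-filling sketch (ours, using bisection on the water level) that reproduces the allocation above:

```python
# Minimal sketch: water filling over parallel Gaussian channels with noise
# levels [2, 1, 3] and total power 2, matching the example above.
def water_fill(noise, total_power, iters=100):
    lo, hi = min(noise), max(noise) + total_power
    for _ in range(iters):                       # bisect on the water level
        mid = (lo + hi) / 2
        used = sum(max(mid - n, 0.0) for n in noise)
        if used < total_power:
            lo = mid
        else:
            hi = mid
    level = (lo + hi) / 2
    return level, [max(level - n, 0.0) for n in noise]

level, powers = water_fill([2, 1, 3], 2.0)
print(level, powers)         # ~2.5 and allocation ~[0.5, 1.5, 0.0]
```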

slide-85
SLIDE 85

Parallel Gaussian Channels

[Figure: parallel Gaussian channels with noise levels 2, 1 and 3, and water level 2.5]

slide-86
SLIDE 86

Differential capacity

Discrete memoryless channel as a band limited channel

slide-87
SLIDE 87

Multiplex strategies (TDMA, FDMA)


slide-88
SLIDE 88

Multiplex strategies (non-orthogonal CDMA)

Discrete memoryless channel as a band limited channel

Aggregate capacity:

  C = Σ_{j=1}^{M} ½ log ( 1 + P / (N + (j−1) P) ) = ½ log ( 1 + M P / N )

slide-89
SLIDE 89

TDMA or FDMA versus CDMA

[Plot comparing orthogonal schemes with non-orthogonal CDMA: aggregate rate versus number of users and aggregate power. Orthogonal schemes face the bandwidth limitation (2WT dimensions); for non-orthogonal CDMA the log has no cap]

slide-90
SLIDE 90

Multiple User Information Theory

• Building Blocks:
  • Multiple Access Channels (MACs)
  • Broadcast Channels (BCs)
  • Interference Channels (IFCs)
  • Relay Channels (RCs)
• Note: These channels have their discrete memoryless and Gaussian versions. For simplicity we will look at the Gaussian models.

slide-91
SLIDE 91

Multiple Access Channel (MAC)

slide-92
SLIDE 92

Gaussian Broadcast Channel

slide-93
SLIDE 93

Superposition coding

N2 (1-)P P 1

slide-94
SLIDE 94

Superposition coding

N2 (1-)P P 1

slide-95
SLIDE 95

Standard Gaussian Interference Channel

[Diagram: two transmitters with powers P1 and P2 send messages W1 and W2; the cross links have gains a and b; the receivers decode Ŵ1 and Ŵ2]

slide-96
SLIDE 96

Symmetric Gaussian Interference Channel

[Diagram: symmetric Gaussian interference channel; both transmitters have power P]

slide-97
SLIDE 97

Z-Gaussian Interference Channel

slide-98
SLIDE 98

Interference Channel: Strategies

Things that we can do with interference:

1. Ignore (take interference as noise (IAN))

2. Avoid (divide the signal space (TDM/FDM))

3. Partially decode both interfering signals

4. Partially decode one, fully decode the other

5. Fully decode both (only good for strong interference, a ≥ 1)

slide-99
SLIDE 99

Interference Channel: Brief history

• Carleial (1975): Very strong interference does not reduce capacity (a² ≥ 1 + P)

• Sato (1981), Han and Kobayashi (1981): Strong interference (a² ≥ 1): IFC behaves like 2 MACs

• Motahari, Khandani (2007), Shang, Kramer and Chen (2007), Annapureddy, Veeravalli (2007): Very weak interference (2a(1 + a²P) ≤ 1): Treat interference as noise (IAN)

slide-100
SLIDE 100

Interference Ch.: History (continued)

• Sason (2004): Symmetrical superposition to beat TDM
• Etkin, Tse, Wang (2008): capacity to within 1 bit
• C. (2011): Noisebergs to get Gaussian H+K for Z IFCs
• C., Nair (2012, 2013, 2016, 2017): Some progress on the achievable region of symmetric Gaussian IFCs
• Polyanskiy and Wu (2016): Corner points established.

slide-101
SLIDE 101

Relay Channel

• The least understood.
• Upper bound: Cut set bound
• Lower bound:

[Diagram: relay channel with source input X, relay receiving Y1 and transmitting X1, and destination output Y]

slide-102
SLIDE 102

Relay Channel

• The relay channel is said to be physically degraded if p(y, y1 | x, x1) = p(y1 | x, x1) p(y | y1, x1).

• So Y is a degradation of the relay signal Y1.
• Theorem: C = sup_{p(x,x1)} min { I(X, X1; Y), I(X; Y1 | X1) }

[Diagram: relay channel with source X, relay (Y1 : X1), and destination Y]

slide-103
SLIDE 103

Applications to Biology

• BCH error-correcting codes have been found in DNA sequences, identified as codewords of BCH codes over GF(4).

• L.C.B. Faria, A.S.L. Rocha, J.H. Kleinschmidt, R. Palazzo Jr. and M.C. Silva-Filho

• "The question raised by researchers in the field of mathematical biology regarding the existence of error-correcting codes in the structure of the DNA sequences is answered positively. It is shown, for the first time, that DNA sequences such as proteins, targeting sequences and internal sequences are identified as codewords of BCH codes over Galois fields."

• Electronics Letters, vol. 46, no. 3, 4 Feb. 2010

slide-104
SLIDE 104

Applications to Economics

• Stock Market:
• Portfolio b = (b1 b2 … bm), bi ≥ 0, Σ bi = 1
• Stock vector X = (x1, x2, …, xm), with stocks xi ≥ 0, i = 1, 2, …, m.
• xi represents the relative final price w.r.t. the initial price for that day. For example, xi = 1.03 represents a 3% variation that day.
• The wealth after n days using portfolio b is

  Sn = Π_{i=1}^{n} bᵀ Xi

slide-105
SLIDE 105

Optimal portfolio

• Def.: The growth rate of a stock portfolio b w.r.t. a stock market distribution F(x) is
  W(b, F) = E log bᵀX.
• Def.: The optimal growth rate W*(F) is
  W*(F) = max_b W(b, F)
• Theorem: The optimal wealth after n days behaves as Sn* ≈ 2^(nW*) with probability 1.

slide-106
SLIDE 106

Proof

• By the strong Law of Large Numbers,

  (1/n) log Sn* = (1/n) Σ_{i=1}^{n} log b*ᵀ Xi → W* with probability 1.

• Thus Sn* ≈ 2^(nW*) with probability 1.
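As an illustration (ours, with made-up market numbers), a Monte Carlo sketch of the growth rate W(b, F) = E log₂ bᵀX for a hypothetical two-asset market:

```python
# Minimal sketch: estimated growth rate for a few portfolios; wealth after n days ~ 2^(n W).
import math, random

def simulate_price_relatives():
    # Hypothetical two-asset market: a volatile stock and cash (price relative 1.0).
    stock = random.choice([1.8, 0.5])
    return [stock, 1.0]

def growth_rate(b, n=100_000):
    random.seed(0)                    # same draws for every portfolio, for comparability
    total = 0.0
    for _ in range(n):
        x = simulate_price_relatives()
        total += math.log2(sum(bi * xi for bi, xi in zip(b, x)))
    return total / n

for b in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    print(b, round(growth_rate(b), 4))
# The mixed portfolio has the highest growth rate in this toy market.
```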

slide-107
SLIDE 107

Some new fronts

• Joint source and channel coding
• Coding for channels with side information
• Distributed source coding
• Network strategies
• Merging of Network Coding and Multi-User IT

slide-108
SLIDE 108

Many thanks!

max@fee.unicamp.br