
Capacity of Continuous Channels with Memory via Directed Information Neural Estimator

Ziv Aharoni¹, Dor Tsur¹, Ziv Goldfeld², Haim H. Permuter¹

¹Ben-Gurion University of the Negev, ²Cornell University

International Symposium on Information Theory, June 21st, 2020


Communication Channel

[Block diagram: message M → Encoder → Channel → Decoder → M̂; a delay ∆ feeds the previous output Y_{i−1} back to the encoder.]

• Continuous alphabet
• Time-invariant channel with memory
• The channel is unknown


Capacity

Feedback is not present:
$$C_{\mathrm{FF}} = \lim_{n\to\infty} \, \sup_{P_{X^n}} \, \frac{1}{n} I(X^n; Y^n)$$
Feedback is present:
$$C_{\mathrm{FB}} = \lim_{n\to\infty} \, \sup_{P_{X^n \| Y^{n-1}}} \, \frac{1}{n} I(X^n \to Y^n)$$
where $I(X^n \to Y^n)$ is the directed information (DI).


Both limits can be written with DI alone:
$$C_{\mathrm{FF}} = \lim_{n\to\infty} \, \sup_{P_{X^n}} \, \frac{1}{n} I(X^n \to Y^n), \qquad C_{\mathrm{FB}} = \lim_{n\to\infty} \, \sup_{P_{X^n \| Y^{n-1}}} \, \frac{1}{n} I(X^n \to Y^n)$$
DI is a unifying measure for feed-forward (FF) and feedback (FB) capacity.
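For completeness (the slide does not spell it out), Massey's directed information is defined as
$$I(X^n \to Y^n) \;=\; \sum_{i=1}^{n} I\!\left(X^i;\, Y_i \,\middle|\, Y^{i-1}\right)$$
that is, each term conditions only on the past outputs and the inputs up to time $i$, which is what makes DI the right quantity in the presence of feedback.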


Talk Outline

• Directed Information Neural Estimator (DINE)
• Neural Distribution Transformer (NDT)
• Capacity estimation

[The communication-system diagram from the previous slide, now with a gradient arrow indicating the estimation loop; it is revisited throughout the talk.]


Preliminaries - Donsker-Varadhan

Theorem (Donsker-Varadhan Representation)

The KL divergence between probability measures P and Q can be represented as
$$D_{\mathrm{KL}}(P \,\|\, Q) = \sup_{T:\Omega\to\mathbb{R}} \; \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}\!\left[e^{T}\right]$$
where T is measurable and both expectations are finite. For mutual information:
$$I(X;Y) = \sup_{T:\Omega\to\mathbb{R}} \; \mathbb{E}_{P_{XY}}[T] - \log \mathbb{E}_{P_X \otimes P_Y}\!\left[e^{T}\right]$$
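As a sanity check (not part of the talk), a minimal NumPy sketch: with the optimal critic $T^* = \log \frac{dP}{dQ}$ the DV bound recovers the KL divergence exactly, and any other measurable T gives a lower bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# P = N(mu, 1), Q = N(0, 1); true KL(P||Q) = mu^2 / 2 = 0.5 for mu = 1.
mu = 1.0
p_samples = rng.normal(mu, 1.0, size=100_000)
q_samples = rng.normal(0.0, 1.0, size=100_000)

def critic(x):
    # Optimal critic for these Gaussians: T*(x) = log dP/dQ(x) = mu*x - mu^2/2.
    return mu * x - mu**2 / 2

# Donsker-Varadhan bound: E_P[T] - log E_Q[exp(T)]
dv_bound = critic(p_samples).mean() - np.log(np.exp(critic(q_samples)).mean())
print(dv_bound)  # ≈ 0.5, the true KL divergence
```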


MINE (Y. Bengio Keynote, ISIT ’19)

Mutual Information Neural Estimator: given samples $\{(x_i, y_i)\}_{i=1}^{n}$.
Approximation:
$$\hat{I}(X;Y) = \sup_{\theta\in\Theta} \; \mathbb{E}_{P_{XY}}[T_\theta] - \log \mathbb{E}_{P_X \otimes P_Y}\!\left[e^{T_\theta}\right]$$
Estimation:
$$\hat{I}_n(X;Y) = \sup_{\theta\in\Theta} \; \frac{1}{n}\sum_{i=1}^{n} T_\theta(x_i, y_i) - \log \frac{1}{n}\sum_{i=1}^{n} e^{T_\theta(x_i, \bar{y}_i)}$$
where $\bar{y}_i$ are samples from the product of the marginals (in practice, shuffled $y$'s).
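A toy end-to-end sketch of MINE, assuming PyTorch; the gradient-bias correction of the original MINE paper is omitted for brevity, and the architecture and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: (X, Y) jointly Gaussian with correlation rho.
# Ground truth: I(X;Y) = -0.5 * log(1 - rho^2) ≈ 0.144 nats for rho = 0.5.
rho, n = 0.5, 20_000
x = torch.randn(n, 1)
y = rho * x + (1.0 - rho**2) ** 0.5 * torch.randn(n, 1)

critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

log_n = torch.log(torch.tensor(float(n)))
for step in range(1000):
    y_bar = y[torch.randperm(n)]                  # samples from P_X ⊗ P_Y
    t_joint = critic(torch.cat([x, y], dim=1))
    t_prod = critic(torch.cat([x, y_bar], dim=1))
    # DV bound: mean T on joint samples minus log-mean-exp on product samples.
    dv = t_joint.mean() - (torch.logsumexp(t_prod.squeeze(1), dim=0) - log_n)
    opt.zero_grad()
    (-dv).backward()                              # gradient ascent on the bound
    opt.step()

print(dv.item())                                  # ≈ 0.14 nats after training
```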


Estimator Derivation

DI as a difference of entropies:
$$I(X^n \to Y^n) = h(Y^n) - h(Y^n \| X^n), \qquad h(Y^n \| X^n) = \sum_{i=1}^{n} h\!\left(Y_i \mid X^i, Y^{i-1}\right)$$
Using a reference measure:
$$I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D_{\mathrm{KL}}\!\left(P_{Y^n \| X^n} \,\middle\|\, P_{Y^{n-1} \| X^{n-1}} \otimes P_{\tilde{Y}} \,\middle|\, P_{X^n}\right)}_{D^{(n)}_{Y \| X}} - \underbrace{D_{\mathrm{KL}}\!\left(P_{Y^n} \,\middle\|\, P_{Y^{n-1}} \otimes P_{\tilde{Y}}\right)}_{D^{(n)}_{Y}}$$
$P_{\tilde{Y}}$ is a uniform i.i.d. reference measure over the support of the dataset.

DI rate as a difference of KL divergences:
$$I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D^{(n)}_{Y \| X} - D^{(n)}_{Y}}_{\text{increment in information at step } n}$$
$$D^{(n)}_{Y \| X} - D^{(n)}_{Y} \; \xrightarrow{\; n\to\infty \;} \; I(X \to Y)$$
The limit exists for stationary ergodic processes.
The goal: estimate $D^{(n)}_{Y \| X}$ and $D^{(n)}_{Y}$.

Directed Information Neural Estimator

Apply the DV formula to $D^{(n)}_{Y \| X}$ and $D^{(n)}_{Y}$:
$$D^{(n)}_{Y} = \sup_{T:\Omega\to\mathbb{R}} \; \mathbb{E}_{P_{Y^n}}\!\left[T(Y^n)\right] - \log \mathbb{E}_{P_{Y^{n-1}} \otimes P_{\tilde{Y}}}\!\left[\exp\!\left(T(Y^{n-1}, \tilde{Y})\right)\right]$$
where the optimal solution is $T^* = \log \frac{P_{Y_n \mid Y^{n-1}}}{P_{\tilde{Y}}}$.

Approximate T with a recurrent neural network (RNN):
$$D^{(n)}_{Y} = \sup_{\theta_Y} \; \mathbb{E}_{P_{Y^n}}\!\left[T_{\theta_Y}(Y^n)\right] - \log \mathbb{E}_{P_{Y^{n-1}} \otimes P_{\tilde{Y}}}\!\left[\exp\!\left(T_{\theta_Y}(Y^{n-1}, \tilde{Y})\right)\right]$$
Estimate the expectations with empirical means:
$$\hat{D}^{(n)}_{Y} = \sup_{\theta_Y} \; \frac{1}{n}\sum_{i=1}^{n} T_{\theta_Y}\!\left(y_i \mid y^{i-1}\right) - \log \frac{1}{n}\sum_{i=1}^{n} e^{T_{\theta_Y}\left(\tilde{y}_i \mid y^{i-1}\right)}$$
where the second sum runs over reference samples $\tilde{y}_i \sim P_{\tilde{Y}}$. Finally,
$$\hat{I}^{(n)}(X \to Y) = \hat{D}^{(n)}_{Y \| X} - \hat{D}^{(n)}_{Y}$$
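To make the empirical objective concrete, a minimal PyTorch sketch (function names are illustrative, not the authors' code) that turns per-step critic scores into the two DV terms and their difference; how an RNN produces those scores is sketched in the Implementation section below.

```python
import torch

def dv_estimate(t_true, t_ref):
    """Empirical DV bound: (1/n) sum_i T(y_i|y^{i-1}) - log (1/n) sum_i exp T(y~_i|y^{i-1}).
    t_true, t_ref: 1-D tensors of per-step scores on true and reference samples."""
    n = t_ref.numel()
    return t_true.mean() - (torch.logsumexp(t_ref, dim=0) - torch.log(torch.tensor(float(n))))

def dine_estimate(t_true_yx, t_ref_yx, t_true_y, t_ref_y):
    """DINE: difference of the two DV terms, I^(n)(X -> Y) = D_{Y||X} - D_Y."""
    return dv_estimate(t_true_yx, t_ref_yx) - dv_estimate(t_true_y, t_ref_y)
```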


Consistency

Theorem (DINE consistency)

Let $\{X_i, Y_i\}_{i=1}^{\infty} \sim P$ be jointly stationary ergodic stochastic processes. Then there exist RNNs $F_1 \in \mathrm{RNN}_{d_y,1}$ and $F_2 \in \mathrm{RNN}_{d_{xy},1}$ such that the DINE $\hat{I}_n(F_1, F_2)$ is a strongly consistent estimator of $I(X \to Y)$, i.e.,
$$\lim_{n\to\infty} \hat{I}_n(F_1, F_2) \overset{\mathrm{a.s.}}{=} I(X \to Y)$$

Sketch of proof:
• Represent the optimal solution $T^*$ by a dynamical system.
• Universally approximate that dynamical system with RNNs.
• Estimate the expectations with empirical means.

Implementation

The same empirical objective $\hat{D}^{(n)}_{Y}$ is optimized in practice. The RNN is adjusted to process both inputs at each step while carrying only the state generated by the true samples.

[Unrolled diagram: at each step $i$, the cell F evaluates both the true sample $Y_i$ and the reference sample $\tilde{Y}_i$ against the carried state $S_{i-1}$; only the state produced by the true sample is propagated.]

Complete system layout for the calculation of $\hat{D}^{(n)}_{Y}$:

[Diagram: the input $Y_i$ and a reference generator producing $\tilde{Y}_i$ feed a modified LSTM layer; its states $S_i$ feed dense layers computing $T_{\theta_Y}(Y_i \mid Y^{i-1})$ and $T_{\theta_Y}(\tilde{Y}_i \mid Y^{i-1})$, which enter the DV objective $\hat{D}_Y(\theta_Y, \mathcal{D}_n)$.]

A sketch of this layout in code follows below.
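One way to realize the modified LSTM in PyTorch, as a sketch for the $D_Y$ branch (class name, layer sizes, and the scalar-output assumption are illustrative, not the authors' implementation); the $D_{Y\|X}$ branch is analogous with $(x_i, y_i)$ pairs as inputs.

```python
import torch
import torch.nn as nn

class ModifiedLSTMCritic(nn.Module):
    """At each step the cell scores both the true sample y_i and a reference
    sample y~_i against the carried state, but only the state generated by
    the TRUE sample is propagated to step i+1."""
    def __init__(self, dim=1, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(dim, hidden)
        self.head = nn.Linear(hidden, 1)   # dense layer producing T_theta

    def forward(self, y, y_ref):
        # y, y_ref: (batch, T, dim); returns per-step scores, each (batch, T).
        batch, T, _ = y.shape
        h = torch.zeros(batch, self.cell.hidden_size)
        c = torch.zeros(batch, self.cell.hidden_size)
        t_true, t_ref = [], []
        for i in range(T):
            h_ref, _ = self.cell(y_ref[:, i], (h, c))  # reference state discarded
            h, c = self.cell(y[:, i], (h, c))          # true-sample state carried
            t_true.append(self.head(h).squeeze(-1))    # T(y_i | y^{i-1})
            t_ref.append(self.head(h_ref).squeeze(-1)) # T(y~_i | y^{i-1})
        return torch.stack(t_true, dim=1), torch.stack(t_ref, dim=1)
```

The flattened scores can be fed directly to the `dv_estimate` helper sketched earlier.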


NDT

Neural Distribution Transformer (NDT)

[System diagram from the outline, now highlighting the NDT block between the noise source and the channel.]

Model the message M as i.i.d. Gaussian noise $\{N_i\}_{i\in\mathbb{Z}}$. The NDT is a mapping:
• w/o feedback: $\mathrm{NDT}: N_i \mapsto X_i$
• w/ feedback: $\mathrm{NDT}: (N_i, Y^{i-1}) \mapsto X_i$
The NDT is modeled by an RNN (a code sketch follows below).

[Architecture: $(N_i, Y_{i-1})$ enter an LSTM followed by dense layers; a power-constraint layer produces the channel input $X_i$.]
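A minimal PyTorch sketch of an NDT with feedback, under the assumptions of scalar signals and a simple batch-normalizing power-constraint layer (one possible choice; names and sizes are illustrative).

```python
import torch
import torch.nn as nn

class NDT(nn.Module):
    """Maps noise N_i and the previous channel output Y_{i-1} to a channel
    input X_i obeying an average power constraint P."""
    def __init__(self, hidden=32, power=1.0):
        super().__init__()
        self.lstm = nn.LSTM(2, hidden, batch_first=True)  # input: (N_i, Y_{i-1})
        self.dense = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(),
                                   nn.Linear(32, 1))
        self.power = power

    def forward(self, noise, y_prev):
        # noise, y_prev: (batch, T, 1)
        h, _ = self.lstm(torch.cat([noise, y_prev], dim=-1))
        x = self.dense(h)
        # Power-constraint layer: rescale the batch to average power P.
        scale = torch.sqrt(self.power / x.pow(2).mean().clamp_min(1e-8))
        return x * scale
```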


Capacity Estimation

Iterate between DINE and NDT updates.

[System diagram: noise $N_i$ → NDT (RNN) → $X_i$ → channel $P_{Y_i \mid X^i, Y^{i-1}}$ → $Y_i$ → DINE (RNN) → $\hat{I}_n(X \to Y)$; the gradient of the estimate flows back to the NDT, and a delay ∆ feeds $Y_{i-1}$ to the NDT in the feedback case.]
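A self-contained toy of the alternating scheme, with MINE standing in for DINE and a feedforward NDT on a memoryless AWGN channel (so the target is $\tfrac{1}{2}\log(1+P)$); the full method replaces both networks with the recurrent versions sketched above. Everything here is illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
P, n = 1.0, 4096                       # power constraint and batch size

def channel(x):
    # Black-box memoryless AWGN channel: Y = X + Z, Z ~ N(0, 1).
    return x + torch.randn_like(x)

ndt = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))     # N_i -> X_i
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # MINE critic
opt_ndt = torch.optim.Adam(ndt.parameters(), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def mi_estimate(x, y):
    t_joint = critic(torch.cat([x, y], dim=1))
    t_prod = critic(torch.cat([x, y[torch.randperm(len(y))]], dim=1))
    return t_joint.mean() - (torch.logsumexp(t_prod.squeeze(1), dim=0)
                             - torch.log(torch.tensor(float(len(y)))))

for step in range(3000):
    noise = torch.randn(n, 1)
    x = ndt(noise)
    x = x * torch.sqrt(P / x.pow(2).mean().clamp_min(1e-8))  # power constraint
    y = channel(x)

    # (1) Estimator step: tighten the DV bound for the current input distribution.
    mi = mi_estimate(x, y)
    opt_critic.zero_grad(); (-mi).backward(retain_graph=True); opt_critic.step()
    # (2) NDT step: ascend the estimated rate to improve the input distribution.
    mi = mi_estimate(x, y)
    opt_ndt.zero_grad(); (-mi).backward(); opt_ndt.step()

print(mi.item())   # should approach 0.5 * log(1 + P) ≈ 0.347 nats
```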


Results

Channel: MA(1) additive Gaussian noise (AGN),
$$Z_i = \alpha U_{i-1} + U_i, \qquad Y_i = X_i + Z_i$$
where $U_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0,1)$, $X_i$ is the channel input sequence subject to the power constraint $\mathbb{E}[X_i^2] \le P$, and $Y_i$ is the channel output.
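Sampling this channel is straightforward; a short sketch (the value of $\alpha$ is illustrative, the talk does not fix it here):

```python
import torch

def ma1_agn_channel(x, alpha=0.5):
    """Sample Y_i = X_i + Z_i with MA(1) noise Z_i = alpha*U_{i-1} + U_i, U_i ~ N(0,1).
    x: (batch, T) inputs already satisfying E[X_i^2] <= P."""
    u = torch.randn(x.shape[0], x.shape[1] + 1)  # one extra column for the lag
    z = alpha * u[:, :-1] + u[:, 1:]             # correlated noise sequence
    return x + z
```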


MA(1) AGN Results

Estimation performance on the MA(1) AGN channel:

[Figure: estimated rate curves for (a) feed-forward capacity and (b) feedback capacity.]


Conclusion and Future Work

Conclusions:
• An estimation method for both FF and FB capacity.
• Pros: mild assumptions on the channel.
• Cons: lack of provable bounds.

Future work:
• Generalize to more complex scenarios (e.g., multi-user).
• Obtain provable bounds on fundamental limits.

Thank You!