

SLIDE 1

Characterization of Conditional Independence and Weak Realizations of Multivariate Gaussian Random Variables: Applications to Networks

Charalambos D. Charalambous and Jan H. van Schuppen

2020 IEEE International Symposium on Information Theory 21-26 June 2020 Los Angeles, California, USA

SLIDE 2

Motivation Notation The Gray and Wyner Rate Region Main Concepts Our Results Concluding Remarks

Problems of Multi-User Communication

• Problem. We consider the Gray and Wyner¹ lossy network coding for a tuple of multivariate jointly Gaussian random variables (RVs):

$(X_1^N, X_2^N) = \{(X_{1,i}, X_{2,i}) : i = 1, 2, \ldots, N\}$, (1)

$X_{1,i} : \Omega \to \mathbb{R}^{p_1} = \mathbb{X}_1$, $X_{2,i} : \Omega \to \mathbb{R}^{p_2} = \mathbb{X}_2$, $\forall i$, (2)

$P_{X_{1,i},X_{2,i}} = P_{X_1,X_2}$ jointly Gaussian, and $(X_{1,i},X_{2,i})$ independent of $(X_{1,j},X_{2,j})$, $\forall i \neq j$, (3)

with average square-error distortions at the two decoders

$E\{D_{X_i}(X_i^N, \hat{X}_i^N)\} \le \Delta_i$, $\Delta_i \in [0,\infty]$, $i = 1,2$, (4)

$D_{X_i}(x_i^N, \hat{x}_i^N) \triangleq \frac{1}{N} \sum_{j=1}^{N} \|x_{i,j} - \hat{x}_{i,j}\|^2_{\mathbb{R}^{p_i}}$, $i = 1,2$. (5)

• ¹ R. M. Gray and A. D. Wyner, "Source coding for a simple network", Bell System Technical Journal, 1974.
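The per-block distortion of eq. (5) can be sketched numerically. This is a minimal illustration with toy numbers; the function name `avg_distortion` and the arrays are illustrative, not from the paper.

```python
import numpy as np

# Sketch of eq. (5): D_Xi(x^N, xhat^N) = (1/N) * sum_j ||x_j - xhat_j||^2,
# the average squared Euclidean error over the N source samples.
def avg_distortion(x, xhat):
    """x, xhat: arrays of shape (N, p_i)."""
    x, xhat = np.asarray(x, float), np.asarray(xhat, float)
    return np.sum((x - xhat) ** 2) / x.shape[0]

# Toy check: N = 2 samples in R^2, each with squared error 1.0
x    = np.array([[1.0, 0.0], [0.0, 1.0]])
xhat = np.zeros((2, 2))
assert np.isclose(avg_distortion(x, xhat), 1.0)
```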

SLIDE 3

Problems of Multi-User Communication (cont.)

Main results of the ISIT paper:

• Calculation of the lossy common information, i.e., the minimum common message rate on the Gray and Wyner rate region, when the sum rate equals the joint rate distortion function (RDF)
• Parametrization of the Gray and Wyner rate region and of rates that lie on the Pangloss plane
• Weak stochastic realizations of RVs that achieve these rates and induce the optimal test channels of the conditional, marginal, and joint RDFs

SLIDE 4

Problems of Multi-User Communication

Technical tools, three concepts:

• Hotelling's geometric approach to Gaussian RVs
• van Putten's and van Schuppen's parametrization of the family of all Gaussian distributions $P_{X_1,X_2,W}$ by an auxiliary Gaussian RV $W : \Omega \to \mathbb{R}^k = \mathbb{W}$ that makes $X_1$ and $X_2$ conditionally independent
• Weak stochastic realization of $(X_1, X_2, W)$

Additional applications not discussed:

• Capacity of the multiple access channel
• Secrecy capacity, etc.

SLIDE 5

Notation

An $\mathbb{R}^n$-valued Gaussian RV $X : \Omega \to \mathbb{R}^n$ is denoted by $X \in G(m_X, Q_X)$, with parameters the mean value $m_X \in \mathbb{R}^n$ and the variance $Q_X \in \mathbb{R}^{n \times n}$, $Q_X = Q_X^{\mathrm{T}} \succeq 0$.

The effective dimension of the RV $X \in G(0, Q_X)$ is denoted by $\dim(X) = \mathrm{rank}(Q_X)$. An $n \times n$ identity matrix is denoted by $I_n$. A tuple of Gaussian RVs $(X_1, X_2) \in G(0, Q_{(X_1,X_2)})$ is denoted this way, rather than by $\binom{X_1}{X_2}$, to save space, with

$Q_{(X_1,X_2)} = \begin{pmatrix} Q_{X_1} & Q_{X_1,X_2} \\ Q_{X_1,X_2}^{\mathrm{T}} & Q_{X_2} \end{pmatrix} \in \mathbb{R}^{(p_1+p_2) \times (p_1+p_2)}$.
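The notation above can be checked numerically. This is an illustrative sketch with toy numbers (the covariance entries are not from the paper): assemble the joint covariance and verify the stated properties.

```python
import numpy as np

# Assemble Q_(X1,X2) = [[Q_X1, Q_X1X2], [Q_X1X2^T, Q_X2]] and check
# symmetry, positive semidefiniteness, and the effective dimension.
Q_X1   = np.array([[1.0, 0.2],
                   [0.2, 1.0]])        # Q_X1 in R^{2x2}
Q_X2   = np.array([[1.0]])             # Q_X2 in R^{1x1}
Q_X1X2 = np.array([[0.5],
                   [0.3]])             # cross-covariance in R^{2x1}

Q = np.block([[Q_X1,     Q_X1X2],
              [Q_X1X2.T, Q_X2]])       # (p1+p2) x (p1+p2)

assert np.allclose(Q, Q.T)                      # Q = Q^T
assert np.all(np.linalg.eigvalsh(Q) >= -1e-12)  # Q >= 0
dim_X = np.linalg.matrix_rank(Q)                # dim(X) = rank(Q)
```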

SLIDE 6

The Gray and Wyner Lossy Rate Region

$\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ denotes the Gray and Wyner operational rate region, achieved by a coding scheme that uses the auxiliary RV $W : \Omega \to \mathbb{W}$. Define the family of probability distributions

$\mathcal{P} \triangleq \{ P_{X_1,X_2,W},\ x_1 \in \mathbb{X}_1,\ x_2 \in \mathbb{X}_2,\ w \in \mathbb{W} \mid P_{X_1,X_2,W}(x_1, x_2, \infty) = P_{X_1,X_2} \}$

Theorem 8 in Gray and Wyner 1974 (under conditions). For each $P_{X_1,X_2,W} \in \mathcal{P}$ and $\Delta_1 \ge 0$, $\Delta_2 \ge 0$, define

$\mathcal{R}_{GW}^{P_{X_1,X_2,W}}(\Delta_1, \Delta_2) = \{(R_0, R_1, R_2) : R_0 \ge I(X_1,X_2;W),\ R_1 \ge R_{X_1|W}(\Delta_1),\ R_2 \ge R_{X_2|W}(\Delta_2)\}$ (6)

where $R_{X_i|W}(\Delta_i)$ is the conditional RDF of $X_i$, conditioned on $W$.

SLIDE 7

The Gray and Wyner Lossy Rate Region

Let

$\mathcal{R}^{*}_{GW}(\Delta_1,\Delta_2) \triangleq \Big( \bigcup_{P_{X_1,X_2,W} \in \mathcal{P}} \mathcal{R}_{GW}^{P_{X_1,X_2,W}}(\Delta_1,\Delta_2) \Big)^{c}$ (7)

where $(\cdot)^{c}$ denotes closure. The achievable rate region is

$\mathcal{R}_{GW}(\Delta_1,\Delta_2) = \mathcal{R}^{*}_{GW}(\Delta_1,\Delta_2)$. (8)

Theorem 6 in Gray and Wyner 1974. If $(R_0,R_1,R_2) \in \mathcal{R}_{GW}(\Delta_1,\Delta_2)$, then

$R_0 + R_1 + R_2 \ge R_{X_1,X_2}(\Delta_1,\Delta_2)$, called the Pangloss bound, (9)

$R_0 + R_1 \ge R_{X_1}(\Delta_1), \quad R_0 + R_2 \ge R_{X_2}(\Delta_2)$. (10)

Pangloss plane: the set of triples $(R_0,R_1,R_2) \in \mathcal{R}_{GW}(\Delta_1,\Delta_2)$ that satisfy $R_0 + R_1 + R_2 = R_{X_1,X_2}(\Delta_1,\Delta_2)$.

SLIDE 8

Wyner’s Common Information

Wyner's characterization of common information² (for finite-valued RVs):

$C_W(X_1,X_2) = \inf_{P_{X_1,X_2,W} :\ P_{X_1,X_2|W} = P_{X_1|W} P_{X_2|W}} I(X_1,X_2;W)$ (11)

$C_W(X_1,X_2)$ does not have an operational meaning for continuous-valued RVs, such as Gaussian RVs. $C_W(X_1,X_2)$ is a single point on the Gray and Wyner rate region (for finite-valued RVs).

² A. D. Wyner, "The common information of two dependent random variables", IEEE Trans. Inf. Theory, 1975.

SLIDE 9

Wyner’s Lossy Common Information

Wyner's characterization of lossy common information³: Let $C_{GW}(X_1,X_2;\Delta_1,\Delta_2)$ denote the minimum common message rate $R_0$ on $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$, with sum rate not exceeding the joint RDF $R_{X_1,X_2}(\Delta_1,\Delta_2)$. Then

$C_{GW}(X_1,X_2;\Delta_1,\Delta_2) = \inf I(X_1,X_2;W)$, such that (12)

$R_{X_1|W}(\Delta_1) + R_{X_2|W}(\Delta_2) + I(X_1,X_2;W) = R_{X_1,X_2}(\Delta_1,\Delta_2)$ (13)

The infimum is over all RVs $W$ in $\mathbb{W}$ which parametrize the source distribution via $P_{X_1,X_2,W}$, have the fixed $\mathbb{X}_1 \times \mathbb{X}_2$-marginal source distribution $P_{X_1,X_2}$, and induce joint distributions $P_{W,X_1,X_2,\hat{X}_1,\hat{X}_2}$.

• ³ K. B. Viswanatha, E. Akyol, K. Rose, "The lossy common information of correlated sources", IEEE Trans. Inf. Theory, 2014.
• G. Xu, W. Liu, B. Chen, "A lossy source coding interpretation of Wyner's common information", IEEE Trans. Inf. Theory, 2016.

SLIDE 10

Common, Correlated, Private Parts of (X1,X2)

Hotelling's 1936 geometric approach to Gaussian RVs: The geometric object of a Gaussian RV $Y : \Omega \to \mathbb{R}^p$ is the $\sigma$-algebra $F^{Y}$ generated by $Y$. For $(X_1,X_2) \in G(0, Q_{(X_1,X_2)})$, a basis transformation consists of a non-singular matrix $S$,

$S = \mathrm{Block\text{-}diag}(S_1, S_2), \quad X_1^c \triangleq S_1 X_1, \quad X_2^c \triangleq S_2 X_2$, (14)

$F^{X_1} = F^{S_1 X_1}, \quad F^{X_2} = F^{S_2 X_2}$. (15)

$S$ maps $(X_1,X_2)$ into the "canonical variable form" of the tuple of RVs (full specification⁴), as interpreted in the table below.

• ⁴ C. D. Charalambous and J. H. van Schuppen, "A New Approach to Lossy Network Compression of a Tuple of Correlated Multivariate Gaussian RVs", 29 May 2019, https://arxiv.org/abs/1905.12695

SLIDE 11

Canonical variable decomposition, interpreted component by component:

$X_{11}^c = X_{21}^c$ a.s.: identical information of $X_1^c$ and $X_2^c$
$X_{12}^c$: correlated information of $X_1^c$ w.r.t. $X_2^c$
$X_{13}^c$: private information of $X_1^c$ w.r.t. $X_2^c$
$X_{21}^c = X_{11}^c$ a.s.: identical information of $X_1^c$ and $X_2^c$
$X_{22}^c$: correlated information of $X_2^c$ w.r.t. $X_1^c$
$X_{23}^c$: private information of $X_2^c$ w.r.t. $X_1^c$

$X_{ij}^c : \Omega \to \mathbb{R}^{p_{ij}}$, $i = 1,2$, $j = 1,2,3$, $p_{11} = p_{21}$, $p_{12} = p_{22} = n$, (16)

$p_1 = p_{11} + p_{12} + p_{13}$, $p_2 = p_{21} + p_{22} + p_{23}$, (17)

$S_1 X_1 = (X_{11}^c, X_{12}^c, X_{13}^c)$, $S_2 X_2 = (X_{21}^c, X_{22}^c, X_{23}^c)$, (18)

$X_{11}^c = X_{21}^c$ a.s., $X_{11}^c, X_{21}^c \in G(0, I_{p_{11}})$, (19)

$X_{13}^c \in G(0, I_{p_{13}})$ and $X_{23}^c \in G(0, I_{p_{23}})$ are independent, (20)

$X_{12}^c \in G(0, I_{p_{12}})$ and $X_{22}^c \in G(0, I_{p_{22}})$ are correlated, (21)

$E[X_{12}^c (X_{22}^c)^{\mathrm{T}}] = D = \mathrm{Diag}(d_1, \ldots, d_{p_{12}})$, $d_i \in (0,1)$, $\forall i$. (22)

The entries of $D$ are called the canonical correlation coefficients. (23)
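The canonical correlation coefficients of eq. (22) can be computed by the standard CCA recipe. This is a sketch of that standard computation, not the paper's construction; the function name and the toy covariances are illustrative.

```python
import numpy as np

# The canonical correlations d_1 >= ... >= d_n of (X1, X2) are the
# singular values of Q_X1^{-1/2} Q_X1X2 Q_X2^{-1/2}; the basis
# transformation S = Block-diag(S1, S2) would be assembled from the
# corresponding singular vectors.
def canonical_correlations(Q_X1, Q_X1X2, Q_X2):
    def inv_sqrt(Q):
        w, U = np.linalg.eigh(Q)          # Q symmetric positive definite
        return U @ np.diag(w ** -0.5) @ U.T
    C = inv_sqrt(Q_X1) @ Q_X1X2 @ inv_sqrt(Q_X2)
    return np.linalg.svd(C, compute_uv=False)

# Scalar sanity check: var(X1) = 4, var(X2) = 1, cov = 1.2 => d = 0.6
d = canonical_correlations(np.array([[4.0]]),
                           np.array([[1.2]]),
                           np.array([[1.0]]))
assert np.isclose(d[0], 0.6)
```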

SLIDE 12

van Putten's and van Schuppen's parametrization of conditional independence

van Putten's and van Schuppen's parametrization⁵: Define the family of all jointly Gaussian probability distributions $P_{X_1,X_2,W}$ parametrized by a Gaussian RV $W : \Omega \to \mathbb{R}^k = \mathbb{W}$ that makes $X_1$ and $X_2$ conditionally independent:

$\mathcal{P}^{CIG} \triangleq \{ P_{X_1,X_2,W} \mid P_{X_1,X_2|W} = P_{X_1|W} P_{X_2|W}$, the $\mathbb{X}_1 \times \mathbb{X}_2$-marginal dist. of $P_{X_1,X_2,W}$ is the fixed dist. $P_{X_1,X_2}$, and $(X_1,X_2,W)$ is jointly Gaussian$\}$

$\mathcal{P}^{CIG}_{\min} \subseteq \mathcal{P}^{CIG}$: additionally, the dimension of the RV $W$ is minimal.

• ⁵ C. van Putten and J. H. van Schuppen, "The weak and strong Gaussian probabilistic realization problem", J. Multivariate Anal., 1985.

SLIDE 13

van Putten's and van Schuppen's parametrization of conditional independence (cont.)

Theorem 4.2 in van Putten and van Schuppen: Consider a tuple $(X_1,X_2)$ of Gaussian RVs in the canonical variable form. Restrict attention to the correlated parts:

$(X_1,X_2) \in G(0, Q_{(X_1,X_2)}) = P_0$, $X_1, X_2 : \Omega \to \mathbb{R}^n$, (24)

$Q_{(X_1,X_2)} = \begin{pmatrix} I & D \\ D & I \end{pmatrix}$, $p_{11} = p_{21} = 0$, $p_{13} = p_{23} = 0$, (25)

$D = \mathrm{Diag}(d_1, \ldots, d_n) \in \mathbb{R}^{n \times n}$, $1 > d_1 \ge \ldots \ge d_n > 0$. (26)

(a) There exists a probability measure $P_1$, and a triple of Gaussian RVs $X_1, X_2, W : \Omega \to \mathbb{R}^n$ defined on it, such that (i) $P_1|(X_1,X_2) = P_0$ and (ii) $X_1$ and $X_2$ are conditionally independent given $W$, with $W$ having minimal dimension.

SLIDE 14

van Putten's and van Schuppen's parametrization of conditional independence (cont.)

(b) There exists a family of Gaussian measures, denoted by $\mathcal{P}^{ci} \subseteq \mathcal{P}^{CIG}_{\min}$, that satisfy (i) and (ii) of (a); moreover, this family is parametrized by the matrices and sets:

$G(0, Q_s(Q_W))$, $Q_W \in \mathcal{Q}_W$, (27)

$Q_s(Q_W) = \begin{pmatrix} I & D & D^{1/2} \\ D & I & D^{1/2} Q_W \\ D^{1/2} & Q_W D^{1/2} & Q_W \end{pmatrix}$, (28)

$\mathcal{Q}_W = \{ Q_W \in \mathbb{R}^{n \times n} \mid Q_W = Q_W^{\mathrm{T}},\ 0 < D \le Q_W \le D^{-1} \}$, (29)

$\mathcal{P}^{ci} = \{ G(0, Q_s(Q_W)) \text{ on } (\mathbb{R}^{3n}, \mathcal{B}(\mathbb{R}^{3n})) \mid Q_W \in \mathcal{Q}_W \}$ and $\mathcal{P}^{ci} \subseteq \mathcal{P}^{CIG}_{\min}$.
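The conditional-independence property behind eq. (28) can be verified numerically. This is a sketch with an illustrative diagonal choice of $Q_W$ (the numbers are not from the paper): for the Gaussian $(X_1, X_2, W)$ with covariance $Q_s(Q_W)$, the conditional cross-covariance of $X_1$ and $X_2$ given $W$ vanishes.

```python
import numpy as np

# Blocks of Q_s(Q_W) from eq. (28): cov(X1, W) = D^{1/2},
# cov(X2, W) = D^{1/2} Q_W.  For jointly Gaussian vectors,
# cov(X1, X2 | W) = cov(X1, X2) - cov(X1, W) Q_W^{-1} cov(X2, W)^T,
# which here equals D - D^{1/2} Q_W^{-1} Q_W D^{1/2} = 0.
n   = 3
D   = np.diag([0.8, 0.5, 0.3])
Dh  = np.sqrt(D)                         # D^{1/2}
Q_W = np.diag([0.9, 0.7, 0.6])           # admissible: D <= Q_W <= D^{-1}

Q_X1W = Dh                               # block (1,3) of Q_s(Q_W)
Q_X2W = Dh @ Q_W                         # block (2,3) of Q_s(Q_W)

cond_cross = D - Q_X1W @ np.linalg.inv(Q_W) @ Q_X2W.T
assert np.allclose(cond_cross, np.zeros((n, n)))
```

The identity holds for every admissible $Q_W$, which is exactly why eq. (28) parametrizes the whole family $\mathcal{P}^{ci}$.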

SLIDE 15

Calculation of the Gray and Wyner Rate Region

Next we give, for the tuple of multivariate Gaussian RVs $(X_1,X_2)$ and square-error distortion functions:

1. Wyner's common information (information definition)
2. Parametrization of rate triples $(R_0,R_1,R_2) \in \mathcal{R}_{GW}(\Delta_1,\Delta_2)$
3. Wyner's lossy common information.

The derivations are outlined in the ISIT paper.⁶

⁶ An extensive elaboration is found in C. D. Charalambous and J. H. van Schuppen, "A New Approach to Lossy Network Compression of a Tuple of Correlated Multivariate Gaussian RVs", 29 May 2019, https://arxiv.org/abs/1905.12695

SLIDE 16

Wyner’s Common Information

Theorem 1. (Charalambous and van Schuppen 2019) Consider the jointly Gaussian RVs $X_1 : \Omega \to \mathbb{R}^{p_1}$, $X_2 : \Omega \to \mathbb{R}^{p_2}$, $(X_1,X_2) \in G(0, Q_{(X_1,X_2)})$, and without loss of generality assume $S = \mathrm{Block\text{-}diag}(S_1,S_2)$ produces a canonical variable form such that $p_{11} = p_{21} = 0$. For any dist. $P_{X_1,X_2,W}$ parametrized by an arbitrary RV $W : \Omega \to \mathbb{R}^k$ with fixed marginal dist. $P_{X_1,X_2} = G(0, Q_{(X_1,X_2)})$:

(a) The mutual information $I(X_1,X_2;W)$ satisfies

$I(X_1,X_2;W) = I(X_{12}^c, X_{22}^c; W)$, $p_{12} = p_{22} = n$ (30)

$\ge H(X_{12}^c, X_{22}^c) - H(X_{12}^c|W) - H(X_{22}^c|W)$ (31)

$= \frac{1}{2} \sum_{i=1}^{n} \ln(1 - d_i^2) - \frac{1}{2} \ln\big(\det([I - D^{1/2} Q_W^{-1} D^{1/2}][I - D^{1/2} Q_W D^{1/2}])\big)$ (32)

SLIDE 17

Wyner’s Common Information (cont)

where the lower bound is parametrized by $Q_W \in \mathcal{Q}_W$,

$\mathcal{Q}_W = \{ Q_W \in \mathbb{R}^{n \times n} \mid Q_W = Q_W^{\mathrm{T}},\ 0 < D \le Q_W \le D^{-1} \}$ (33)

and such that $P_{X_{12}^c, X_{22}^c, W}$ is jointly Gaussian. (34)

(b) The lower bound in part (a) is achieved if $P_{X_1,X_2,W}$ is jointly Gaussian and $P_{X_{12}^c, X_{22}^c|W} = P_{X_{12}^c|W} P_{X_{22}^c|W}$, $W : \Omega \to \mathbb{R}^n$; a realization of the RVs $(X_{12}^c, X_{22}^c)$ which achieves this bound is

$X_{12}^c = Q_{X_{12}^c,W}\, Q_W^{-1} W + Z_1$, (35)

$Q_{X_{12}^c,W} = D^{1/2}$, $Z_1 \in G(0,\ I - D^{1/2} Q_W^{-1} D^{1/2})$, (36)

$X_{22}^c = Q_{X_{22}^c,W}\, Q_W^{-1} W + Z_2$, (37)

$Q_{X_{22}^c,W} = D^{1/2} Q_W$, $Z_2 \in G(0,\ I - D^{1/2} Q_W D^{1/2})$, (38)

$d_1 > d_2 > \ldots > d_n$, and $(Z_1, Z_2, W)$ are independent. (39)
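The realization in eqs. (35)-(38) can be checked in closed form: it must reproduce the canonical covariance $\begin{pmatrix} I & D \\ D & I \end{pmatrix}$. This is a numerical sketch with illustrative toy matrices, not the paper's code.

```python
import numpy as np

# Check Theorem 1(b): with W ~ G(0, Q_W), Z1 ~ G(0, I - D^{1/2} Q_W^{-1} D^{1/2}),
# Z2 ~ G(0, I - D^{1/2} Q_W D^{1/2}), the realization
#   X12 = D^{1/2} Q_W^{-1} W + Z1,   X22 = (D^{1/2} Q_W) Q_W^{-1} W + Z2
# has cov(X12) = cov(X22) = I and cov(X12, X22) = D.
n   = 2
D   = np.diag([0.7, 0.4]); Dh = np.sqrt(D)
Q_W = np.diag([0.9, 0.8])                  # admissible: D <= Q_W <= D^{-1}
I   = np.eye(n)

A1 = Dh @ np.linalg.inv(Q_W)               # gain of X12 on W
A2 = Dh @ Q_W @ np.linalg.inv(Q_W)         # gain of X22 on W (= D^{1/2})
cov_X12 = A1 @ Q_W @ A1.T + (I - Dh @ np.linalg.inv(Q_W) @ Dh)
cov_X22 = A2 @ Q_W @ A2.T + (I - Dh @ Q_W @ Dh)
cross   = A1 @ Q_W @ A2.T                  # (Z1, Z2, W) independent

assert np.allclose(cov_X12, I)
assert np.allclose(cov_X22, I)
assert np.allclose(cross, D)
```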

SLIDE 18

Wyner’s Common Information (cont)

(c) A further lower bound on part (a) occurs if $Q_W = Q_{W^*} \in \mathcal{Q}_W$ is diagonal, i.e.,

$Q_{W^*} = \mathrm{Diag}(Q_{W^*_1}, \ldots, Q_{W^*_n})$, $d_i \le Q_{W^*_i} \le d_i^{-1}$, $\forall i$, (40)

and is achieved by the realization of part (b), with $Q_W = Q_{W^*}$.

(d) Wyner's common information (information definition) is given by

$C_W(X_1,X_2) = \begin{cases} \frac{1}{2} \sum_{i=1}^{n} \ln\big(\frac{1+d_i}{1-d_i}\big) \in (0,\infty) & \text{if } n > 0 \\ 0 & \text{if } n = 0 \end{cases}$ (41)

and it is achieved by a Gaussian RV $W = W^* \in G(0, Q_{W^*})$, $W^* : \Omega \to \mathbb{R}^n$, with $Q_{W^*} = I_n$, an $n \times n$ identity covariance matrix, and the realization of part (b) with $Q_W = I_n$.
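Eq. (41) is simple to evaluate directly from the canonical correlations. A minimal sketch (the function name is illustrative):

```python
import numpy as np

# Wyner's common information, eq. (41):
# C_W(X1, X2) = (1/2) * sum_i ln((1 + d_i) / (1 - d_i)), achieved at Q_W = I_n.
def wyner_common_information(d):
    d = np.asarray(d, float)          # canonical correlations, 0 < d_i < 1
    return 0.5 * np.sum(np.log((1 + d) / (1 - d)))

# Scalar sanity check: d = 0.5 gives (1/2) ln 3 nats
assert np.isclose(wyner_common_information([0.5]), 0.5 * np.log(3.0))
```

Note that $C_W$ grows without bound as any $d_i \to 1$, consistent with the identical-information case being excluded ($p_{11} = p_{21} = 0$).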

SLIDE 19

Joint RDF and Test Channel Realizations

Theorem 2. (Charalambous and van Schuppen 2019) Consider the statement of Theorem 1, with $(X_1,X_2)$ in canonical variable form. Then the following hold.

(a) The joint RDF $R_{X_1,X_2}(\Delta_1,\Delta_2)$, with joint decoding of $(X_{12}^c, X_{22}^c)$ by $(\hat{X}_{12}^c, \hat{X}_{22}^c)$ with respect to the average square-error distortions, is characterized by

$R_{X_1,X_2}(\Delta_1,\Delta_2) = \inf_{\sum_{j=1}^{n} \Delta_{1,j} \le \Delta_1,\ \sum_{j=1}^{n} \Delta_{2,j} \le \Delta_2} \Big\{ \frac{1}{2} \sum_{j=1}^{n} \log\Big( \frac{1 - d_j^2}{\Delta_{1,j} \Delta_{2,j}} \Big) \Big\}$ (42)

$E\|X_{i2}^c - \hat{X}_{i2}^c\|^2_{\mathbb{R}^n} = \sum_{j=1}^{n} \Delta_{i,j} \le \Delta_i$, $i = 1,2$. (43)
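The objective inside the infimum of eq. (42) can be evaluated for a given per-component distortion allocation. This sketch evaluates only the bracketed objective (the infimum over allocations is not solved here); names and numbers are illustrative.

```python
import numpy as np

# Bracketed objective of the joint RDF, eq. (42):
# (1/2) * sum_j log((1 - d_j^2) / (Delta_{1,j} * Delta_{2,j}))
def joint_rdf_objective(d, delta1, delta2):
    d      = np.asarray(d, float)       # canonical correlations
    delta1 = np.asarray(delta1, float)  # per-component distortions, decoder 1
    delta2 = np.asarray(delta2, float)  # per-component distortions, decoder 2
    return 0.5 * np.sum(np.log((1 - d**2) / (delta1 * delta2)))

# n = 1, d = 0.6, Delta_{1,1} = Delta_{2,1} = 0.4:
# (1/2) log(0.64 / 0.16) = (1/2) log 4 = log 2
assert np.isclose(joint_rdf_objective([0.6], [0.4], [0.4]), np.log(2.0))
```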

SLIDE 20

Joint RDF and Test Channel Realizations (cont)

(b) The representations of the reproductions $(\hat{X}_{12}^c, \hat{X}_{22}^c)$ of $(X_{12}^c, X_{22}^c)$ that achieve the joint RDF of part (a) are

$\hat{X}_{12}^c = X_{12}^c - \Lambda_1 (I - D^{1/2} Q_W^{-1} D^{1/2})^{-1} Z_1 + V_1$, (44)

$\hat{X}_{22}^c = X_{22}^c - \Lambda_2 (I - D^{1/2} Q_W D^{1/2})^{-1} Z_2 + V_2$, (45)

$V_1 \in G(0, \Lambda_1 A_1^{\mathrm{T}})$, $V_2 \in G(0, \Lambda_2 A_2^{\mathrm{T}})$, (46)

$A_1 = I - \Lambda_1 (I - D^{1/2} Q_W^{-1} D^{1/2})^{-1}$, (47)

$\Lambda_i = \mathrm{Diag}(\Delta_{i,1}, \Delta_{i,2}, \ldots, \Delta_{i,n}) \in \mathbb{R}^{n \times n}$, $i = 1,2$, (48)

$A_2 = I - \Lambda_2 (I - D^{1/2} Q_W D^{1/2})^{-1}$, (49)

$(V_1, V_2, Z_1, Z_2, W)$ mutually independent. (50)

Further, the joint distribution satisfies

$P_{X_{12}^c, X_{22}^c, \hat{X}_{12}^c, \hat{X}_{22}^c, W} = P_{X_{12}^c|\hat{X}_{12}^c}\, P_{X_{22}^c|\hat{X}_{22}^c}\, P_{\hat{X}_{12}^c|W}\, P_{\hat{X}_{22}^c|W}\, P_W$. (51)
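The test-channel realization in eqs. (44)-(50) meets the distortion with equality, and this can be checked in closed form. This is a numerical sketch with illustrative matrices: the error $X_{12}^c - \hat{X}_{12}^c = \Lambda_1 M_1^{-1} Z_1 - V_1$ (with $M_1 = I - D^{1/2} Q_W^{-1} D^{1/2}$) has covariance exactly $\Lambda_1$.

```python
import numpy as np

# Error covariance of the reproduction in eq. (44):
#   cov(L1 M1^{-1} Z1 - V1) = L1 M1^{-1} L1 + L1 A1^T = L1,
# since A1 = I - L1 M1^{-1} and all matrices here are diagonal.
# Hence E||X12 - Xhat12||^2 = trace(L1) = sum_j Delta_{1,j}.
n   = 2
D   = np.diag([0.7, 0.4]); Dh = np.sqrt(D)
Q_W = np.diag([0.9, 0.8])
L1  = np.diag([0.1, 0.2])                        # Lambda_1 allocation
M1  = np.eye(n) - Dh @ np.linalg.inv(Q_W) @ Dh   # cov(Z1)
A1  = np.eye(n) - L1 @ np.linalg.inv(M1)

G = L1 @ np.linalg.inv(M1)
err_cov = G @ M1 @ G.T + L1 @ A1.T               # Z1, V1 independent
assert np.allclose(err_cov, L1)                  # covariance equals Lambda_1
assert np.isclose(np.trace(err_cov), 0.3)        # total distortion Delta_1
```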

SLIDE 21

Parametrization of Gray and Wyner Rate Region

Proposition 1. (Charalambous and van Schuppen 2019) Consider the statement of Theorem 1, with $(X_1,X_2)$ in canonical variable form. Then the following hold.

(a) The region $\mathcal{R}_{GW}(\Delta_1,\Delta_2)$ is determined from

$T(\alpha_1,\alpha_2) = \inf_{Q_W} \{ I(X_1,X_2;W) + \alpha_1 R_{X_1|W}(\Delta_1) + \alpha_2 R_{X_2|W}(\Delta_2) \}$,

$0 \le \alpha_i \le 1$, $i = 1,2$, $\alpha_1 + \alpha_2 \ge 1$, and the infimum occurs at the diagonal $Q_W = Q_{W^*} \in \mathcal{Q}_W$ of Theorem 1, part (c).

SLIDE 22

Parametrization of Gray and Wyner Rate Region (cont)

(b) The conditional RDF $R_{X_1|W}(\Delta_1)$ is characterized by

$R_{X_1|W}(\Delta_1) = \inf_{\sum_{j=1}^{n} \Delta_{1,j} = \Delta_1} \frac{1}{2} \sum_{j=1}^{n} \log\Big( \Big[ \frac{1 - d_j/Q_{W^*_j}}{\Delta_{1,j}} \Big]^{+} \Big)$ (52)

where $(\cdot)^{+} \triangleq \max\{1, \cdot\}$, $E\|X_{12}^c - \hat{X}_{12}^c\|^2_{\mathbb{R}^n} = \sum_{j=1}^{n} \Delta_{1,j} = \Delta_1$, and the water-filling equations hold:

$\Delta_{1,j} = \begin{cases} \lambda, & \lambda < 1 - d_j/Q_{W^*_j} \\ 1 - d_j/Q_{W^*_j}, & \lambda \ge 1 - d_j/Q_{W^*_j} \end{cases}$, $\Delta_1 \in (0,\infty)$. (53)

Similarly for $R_{X_2|W}(\Delta_2)$.
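The reverse water-filling of eq. (53) can be sketched with a bisection on the water level $\lambda$. This is an illustrative implementation under the assumption that eq. (53) amounts to $\Delta_{1,j} = \min(\lambda, \sigma_j^2)$, with $\sigma_j^2$ the conditional variance of component $j$ given $W$; the function name and numbers are not from the paper.

```python
import numpy as np

# Bisect on lambda until the per-component allocations
# Delta_{1,j} = min(lambda, sigma_j^2) sum to the budget Delta_1.
def water_fill(sigma2, Delta1, iters=200):
    sigma2 = np.asarray(sigma2, float)
    lo, hi = 0.0, float(np.max(sigma2))
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(np.minimum(lam, sigma2)) < Delta1:
            lo = lam
        else:
            hi = lam
    return np.minimum(lam, sigma2)

sigma2 = np.array([0.5, 0.3, 0.1])     # e.g. sigma_j^2 = 1 - d_j / Q*_{W,j}
alloc  = water_fill(sigma2, Delta1=0.5)
assert np.isclose(np.sum(alloc), 0.5)  # budget met with equality
assert np.isclose(alloc[2], 0.1)       # smallest component saturates
```

Components whose conditional variance falls below the water level are reproduced losslessly up to that variance; the rest share the common level $\lambda$.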

SLIDE 23

Wyner’s Lossy Common Information

Theorem 3. (Charalambous and van Schuppen 2019) Consider the statement of Theorem 1, with $(X_1,X_2)$ in canonical variable form. Then

$C_{GW}(X_1,X_2;\Delta_1,\Delta_2) = C_W(X_1,X_2)$ (54)

$= \frac{1}{2} \sum_{j=1}^{n} \ln\Big( \frac{1 + d_j}{1 - d_j} \Big)$, $(\Delta_1,\Delta_2) \in \mathcal{D}_W$, (55)

$\mathcal{D}_W = \{ (\Delta_1,\Delta_2) \in [0,\infty] \times [0,\infty] \mid 0 \le \Delta_1 \le n(1-d_1),\ 0 \le \Delta_2 \le n(1-d_1) \}$, $d_1 \in (0,1)$. (56)

Formulas (55) and (56) generalize the formula for a tuple of jointly Gaussian scalar RVs $(X_1,X_2)$, i.e., $n = 1$, initially derived by Gray and Wyner.

SLIDE 24

Conclusions and Future Directions

Conclusions:

• Calculated rates on the Gray and Wyner 1974 lossy rate region of a tuple of jointly Gaussian RVs, $X_1 : \Omega \to \mathbb{R}^{p_1}$, $X_2 : \Omega \to \mathbb{R}^{p_2}$, with square-error fidelity at the two decoders
• Constructed stochastic realizations of $(X_1, X_2, \hat{X}_1, \hat{X}_2, W)$ that achieve the optimal test channels of rates that lie on the Gray and Wyner rate region and on the Pangloss plane

Future Directions:

• Much remains to be done to exploit the new approach in other multi-user problems.