SLIDE 1

A Conditional Information Inequality and Its Combinatorial Applications

Nikolay Vereshchagin¹, based on the joint paper with Tarik Kaced and Andrey Romashchenko

¹Moscow State University, NRU Higher School of Economics, and Yandex

MIPT 2019

SLIDE 2

Shannon entropy

H(A) = −∑_a P[A = a] · log2 P[A = a]

H(A|B) = −∑_{a,b} P[A = a, B = b] · log2 P[A = a|B = b]

Theorem

H(A) ≤ log2(the number of outcomes of A) and H(A) = log2(the number of outcomes of A) iff A has the uniform distribution.
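The maximum-entropy bound can be checked numerically; a minimal sketch in Python (the distributions and the helper `H` are illustrative, not from the slides):

```python
import math

def H(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# A uniform distribution over 8 outcomes attains the bound log2(8) = 3.
uniform = [1/8] * 8
print(H(uniform))  # 3.0

# Any non-uniform distribution over the same outcomes has strictly smaller entropy.
skewed = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/128]
print(H(skewed) < 3.0)  # True
```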

SLIDE 3

Information inequalities

Definition (Basic inequalities)

The chain rule: H(A, B) = H(A) + H(B|A), H(A, B|C) = H(A|C) + H(B|A, C).
Sub-additivity: H(A, B) ≤ H(A) + H(B), H(A, B|C) ≤ H(A|C) + H(B|C).

Definition

Nonnegative linear combinations of basic inequalities are called Shannon-type inequalities.

Example

H(B|A) ≤ H(B), H(B|A, C) ≤ H(B|C).

SLIDE 4

Combinatorial applications of information inequalities (an example)

Theorem (Shearer’s inequality)

2 · H(A, B, C) ≤ H(A, B) + H(A, C) + H(B, C)

SLIDE 5

Theorem (Shearer’s inequality)

2 · H(A, B, C) ≤ H(A, B) + H(A, C) + H(B, C)

Proof.

Add the following inequalities:

H(A, B, C) = H(A, B) + H(C|A, B)
H(A, B, C) ≤ H(A) + H(B, C)
H(C|A, B) ≤ H(C|A)
H(A) + H(C|A) = H(A, C)
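Shearer's inequality can be spot-checked on a random joint distribution; a Python sketch (the distribution and helper names are illustrative, not from the slides):

```python
import itertools
import math
import random

def H(p):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marginal(joint, coords):
    """Marginalize a joint distribution over tuples onto the given coordinates."""
    m = {}
    for outcome, q in joint.items():
        key = tuple(outcome[i] for i in coords)
        m[key] = m.get(key, 0.0) + q
    return m

random.seed(0)
outcomes = list(itertools.product(range(3), repeat=3))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

lhs = 2 * H(joint)
rhs = (H(marginal(joint, (0, 1))) + H(marginal(joint, (0, 2)))
       + H(marginal(joint, (1, 2))))
print(lhs <= rhs + 1e-9)  # True
```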

SLIDE 6

Theorem (Loomis–Whitney inequality)

The volume V of a 3-dimensional body is at most the square root of the product of the areas S1, S2, S3 of its three 2-dimensional projections: V² ≤ S1 · S2 · S3.

SLIDE 7

Proof of the discrete version of Loomis–Whitney inequality.

Let (A, B, C) be a random pixel in the body (chosen uniformly). Then

H(A, B, C) = log2 V
H(A, B) ≤ log2 S1
H(A, C) ≤ log2 S2
H(B, C) ≤ log2 S3

Plug these values into Shearer’s inequality 2 · H(A, B, C) ≤ H(A, B) + H(A, C) + H(B, C).
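The discrete statement can also be tested directly on a set of voxels; a Python sketch (the 6×6×6 grid and the random body are made up for illustration):

```python
import random

random.seed(1)
# A random "body": a set of voxels in a 6x6x6 grid (illustrative data).
body = {(random.randrange(6), random.randrange(6), random.randrange(6))
        for _ in range(60)}

V = len(body)
S1 = len({(a, b) for a, b, c in body})  # projection onto the (x, y) plane
S2 = len({(a, c) for a, b, c in body})  # projection onto the (x, z) plane
S3 = len({(b, c) for a, b, c in body})  # projection onto the (y, z) plane

print(V * V <= S1 * S2 * S3)  # True
```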

SLIDE 8

Mutual information

Definition (mutual information)

I(A : B) = H(A) + H(B) − H(A, B)
I(A : B|C) = H(A|C) + H(B|C) − H(A, B|C)

Theorem

I(A : B) = 0 iff A, B are independent. I(A : B|C) = 0 iff A, B are conditionally independent given C.
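Both characterizations are easy to probe numerically; a Python sketch with two illustrative joint distributions (independent fair bits vs. perfectly correlated bits):

```python
import math

def H(p):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def I(joint):
    """I(A : B) computed from a joint distribution over pairs (a, b)."""
    pa, pb = {}, {}
    for (a, b), q in joint.items():
        pa[a] = pa.get(a, 0.0) + q
        pb[b] = pb.get(b, 0.0) + q
    return H(pa) + H(pb) - H(joint)

# Independent fair bits: product distribution, so I(A : B) = 0.
indep = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
print(abs(I(indep)) < 1e-9)  # True

# Fully correlated bits: A = B, so I(A : B) = H(A) = 1 bit.
equal = {(0, 0): 0.5, (1, 1): 0.5}
print(I(equal))  # 1.0
```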

SLIDE 9

Conditional inequalities (an example)

Proposition

The inequality I(A : B|C) ≤ I(A : B) is false for some A, B, C (e.g., take A, B independent fair bits and C = A ⊕ B).
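The XOR counterexample is small enough to compute exactly; a Python sketch (the entropy helper and marginalization are illustrative code, while the distribution is the one from the proposition):

```python
import math

def H(p):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

# A, B independent fair bits; C = A XOR B (the counterexample above).
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

def marg(coords):
    m = {}
    for o, q in joint.items():
        k = tuple(o[i] for i in coords)
        m[k] = m.get(k, 0.0) + q
    return m

# I(A : B) = H(A) + H(B) - H(A, B)
i_ab = H(marg((0,))) + H(marg((1,))) - H(marg((0, 1)))
# I(A : B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)
i_ab_c = H(marg((0, 2))) + H(marg((1, 2))) - H(joint) - H(marg((2,)))

print(i_ab)    # 0.0
print(i_ab_c)  # 1.0
```

Conditioning on C creates one full bit of mutual information between two independent bits, so I(A : B|C) ≤ I(A : B) fails.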

Proposition

However, I(B : C|A) = 0 ⇒ I(A : B|C) ≤ I(A : B).
Moreover, I(A : B|C) ≤ I(A : B) + I(B : C|A) for all A, B, C.

SLIDE 10

Proof of the inequality I(A : B|C) ≤ I(A : B) + I(B : C|A)

Add the inequalities:

H(A, B) = H(A) + H(B|A)
H(B, C|A) = H(C|A) + H(B|A, C)
H(A|C) + H(B|A, C) = H(A, B|C)
H(B|C) ≤ H(B)
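The resulting unconditional inequality I(A : B|C) ≤ I(A : B) + I(B : C|A) can be spot-checked on random three-variable distributions; an illustrative Python sketch (helper names and the random search are not from the slides):

```python
import itertools
import math
import random

def H(p):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marg(joint, coords):
    m = {}
    for o, q in joint.items():
        k = tuple(o[i] for i in coords)
        m[k] = m.get(k, 0.0) + q
    return m

random.seed(3)
A, B, C = 0, 1, 2
outcomes = list(itertools.product(range(2), repeat=3))
for _ in range(100):
    w = [random.random() for _ in outcomes]
    joint = {o: v / sum(w) for o, v in zip(outcomes, w)}
    # I(A:B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
    i_ab_c = (H(marg(joint, (A, C))) + H(marg(joint, (B, C)))
              - H(joint) - H(marg(joint, (C,))))
    # I(A:B) = H(A) + H(B) - H(A,B)
    i_ab = H(marg(joint, (A,))) + H(marg(joint, (B,))) - H(marg(joint, (A, B)))
    # I(B:C|A) = H(A,B) + H(A,C) - H(A,B,C) - H(A)
    i_bc_a = (H(marg(joint, (A, B))) + H(marg(joint, (A, C)))
              - H(joint) - H(marg(joint, (A,))))
    assert i_ab_c <= i_ab + i_bc_a + 1e-9
print("ok")
```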

SLIDE 11

More evidence that mutual information is not material

Theorem (folklore)

H(C) ≤ H(C|X) + H(C|Y) + I(X : Y) for all C, X, Y.
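After expanding conditional entropies and the mutual information, the folklore inequality rearranges to H(C) + H(X, Y) ≤ H(C, X) + H(C, Y), which is easy to test on random distributions; an illustrative Python sketch:

```python
import itertools
import math
import random

def H(p):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marg(joint, coords):
    m = {}
    for o, q in joint.items():
        k = tuple(o[i] for i in coords)
        m[k] = m.get(k, 0.0) + q
    return m

random.seed(2)
outcomes = list(itertools.product(range(2), repeat=3))  # (C, X, Y)
w = [random.random() for _ in outcomes]
joint = {o: v / sum(w) for o, v in zip(outcomes, w)}

# H(C) <= H(C|X) + H(C|Y) + I(X:Y) rearranges to
# H(C) + H(X,Y) <= H(C,X) + H(C,Y).
lhs = H(marg(joint, (0,))) + H(marg(joint, (1, 2)))
rhs = H(marg(joint, (0, 1))) + H(marg(joint, (0, 2)))
print(lhs <= rhs + 1e-9)  # True
```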

Theorem (Matúš, Romashchenko)

The inequality I(A : B) ≤ I(A : B|X) + I(A : B|Y) + I(X : Y) (Ingleton inequality) is false for some A, B, X, Y.

SLIDE 12

A non-Shannon-type conditional inequality

Example (Zhang and Yeung (1997))

I(X : Y|A) = I(X : Y) = 0 ⇒ I(A : B) ≤ I(A : B|X) + I(A : B|Y) + I(X : Y).

Remark

This inequality is essentially conditional: the inequality I(A : B) ≤ I(A : B|X) + I(A : B|Y) + I(X : Y) + c · (I(X : Y) + I(X : Y|A)) is false in general for every constant c.

SLIDE 13

A non-Shannon-type unconditional inequality

Theorem (Makarychev, Makarychev, Romashchenko, V., 2002)

I(A : B) ≤ I(A : B|X) + I(A : B|Y ) + I(X : Y ) + I(A : B|C) + I(B : C|A) + I(C : A|B)

SLIDE 14

Another non Shannon-type conditional inequality

Example (Kaced and Romashchenko (2013))

I(X : Y|A) = H(A|X, Y) = 0 ⇒ I(A : B) ≤ I(A : B|X) + I(A : B|Y) + I(X : Y).

A reformulation: I(X : Y|A) = H(A|X, Y) = 0 ⇒ H(A|X, B) + H(A|Y, B) ≤ H(A|B).

This talk: we “demystify” Kaced and Romashchenko’s inequality and present its combinatorial application.

SLIDE 15

Theorem

The inequality H(A|X, B) + H(A|Y, B) ≤ H(A|B) holds provided the supports of the distributions of the pairs (A, X) and (A, Y) have the following property:

P[A = a, X = x] > 0, P[A = a, Y = y] > 0, P[A = a′, X = x] > 0, P[A = a′, Y = y] > 0 ⇒ a = a′

(Figure: two outcomes a, a′ of A sharing both an X-value x and a Y-value y, the configuration the condition forbids unless a = a′.)

Remark

1. The condition here is weaker than that of Kaced and Romashchenko.
2. The condition here relativizes.
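A small sanity check of the theorem (with trivial B): taking A = (X, Y) satisfies the support condition, since knowing both x and y pins down a single a; the inequality then amounts to H(X, Y) ≤ H(X) + H(Y). An illustrative Python sketch (the distribution is made up):

```python
import itertools
import math
import random

def H(p):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

random.seed(4)
# Random joint distribution of (X, Y); take A = (X, Y), so the support
# condition holds.
pairs = list(itertools.product(range(3), repeat=2))
w = [random.random() for _ in pairs]
pxy = {p: v / sum(w) for p, v in zip(pairs, w)}

h_xy = H(pxy)  # H(A) = H(X, Y)
h_x = H({x: sum(q for (u, v), q in pxy.items() if u == x) for x in range(3)})
h_y = H({y: sum(q for (u, v), q in pxy.items() if v == y) for y in range(3)})
# H(A|X) = H(Y|X) = H(X,Y) - H(X), and symmetrically for Y.
print((h_xy - h_x) + (h_xy - h_y) <= h_xy + 1e-9)  # True
```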

SLIDE 16

Proof

Step 1. The general case reduces to the case of trivial B: H(A|X) + H(A|Y) ≤ H(A).

Step 2. The general case reduces further to the case when X, Y are independent given A.

Proof.

Define new random variables A′, X′, Y′ so that the marginal distributions of (A′, X′) and (A′, Y′) are the same as the marginal distributions of (A, X) and (A, Y), but X′, Y′ are independent given A′.

Step 3. We prove the Shannon-type inequality H(A|X) + H(A|Y) ≤ H(A) + H(A|X, Y) + I(X : Y|A).
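Step 3's inequality is Shannon-type and can be verified by random search; expanding all conditional terms in fact reduces it to sub-additivity H(X, Y) ≤ H(X) + H(Y). An illustrative Python sketch:

```python
import itertools
import math
import random

def H(p):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marg(joint, coords):
    m = {}
    for o, q in joint.items():
        k = tuple(o[i] for i in coords)
        m[k] = m.get(k, 0.0) + q
    return m

random.seed(5)
A, X, Y = 0, 1, 2
outcomes = list(itertools.product(range(2), repeat=3))
for _ in range(100):
    w = [random.random() for _ in outcomes]
    joint = {o: v / sum(w) for o, v in zip(outcomes, w)}
    ha = H(marg(joint, (A,)))
    hax = H(marg(joint, (A, X)))
    hay = H(marg(joint, (A, Y)))
    hxy = H(marg(joint, (X, Y)))
    hx, hy = H(marg(joint, (X,))), H(marg(joint, (Y,)))
    haxy = H(joint)
    lhs = (hax - hx) + (hay - hy)                      # H(A|X) + H(A|Y)
    rhs = ha + (haxy - hxy) + (hax + hay - haxy - ha)  # H(A) + H(A|X,Y) + I(X:Y|A)
    assert lhs <= rhs + 1e-9
print("ok")
```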

SLIDE 17

A combinatorial application of the inequality H(A|X) + H(A|Y ) ≤ H(A)

Theorem

Assume that a finite family F of pairwise disjoint squares is given, each square being a subset of [0, 1] × [0, 1]. Assume that each vertical line inside [0, 1] × [0, 1] intersects at least L squares in F and, similarly, each horizontal line intersects at least R squares in F. Then |F| ≥ LR.
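An n × n grid of disjoint squares tiling the board shows the bound can be attained; a discrete (pixel) Python sketch, where the grid parameters are a hypothetical instance, not from the slides:

```python
# An n x n grid of disjoint k x k pixel squares tiling a W x W board
# (hypothetical instance; the theorem guarantees |F| >= L * R in general).
n, k = 4, 5
W = n * k
squares = [(i * k, j * k) for i in range(n) for j in range(n)]  # top-left corners

def squares_on_column(x):
    """Number of squares meeting the vertical pixel column x."""
    return sum(1 for (sx, sy) in squares if sx <= x < sx + k)

def squares_on_row(y):
    """Number of squares meeting the horizontal pixel row y."""
    return sum(1 for (sx, sy) in squares if sy <= y < sy + k)

L = min(squares_on_column(x) for x in range(W))
R = min(squares_on_row(y) for y in range(W))
print(L, R, len(squares))     # 4 4 16
print(len(squares) >= L * R)  # True
```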

SLIDE 18

The proof of a discrete version of the theorem (each square consists of pixels)

Let A = S × T be a randomly chosen square from F, where the probability of each square is proportional to its side length |S| = |T| (not its area!). Let (X, Y) be a random pair from A (chosen with the uniform distribution). The conditions of the inequality are fulfilled, hence H(A|X) + H(A|Y) ≤ H(A). One can show that the conditional distributions of A given X and of A given Y are uniform (over the squares meeting the corresponding vertical or horizontal line), hence H(A|X) ≥ log L and H(A|Y) ≥ log R. It follows that log L + log R ≤ H(A). As H(A) ≤ log |F|, the theorem follows.

SLIDE 19

Thank you for your attention!
