slide-1
SLIDE 1

Optimum Source Resolvability Rate with Respect to f-Divergences Using the Smooth Rényi Entropy

Ryo Nomura (Waseda University, Japan), Hideki Yagi (The University of Electro-Communications, Japan), ISIT2020, June 21-26, 2020

slide-2
SLIDE 2

Outline

  • Preliminaries
  • 1. Source resolvability
  • 2. f-divergence
  • 3. Smooth Rényi entropy
  • Main results (optimum D-achievable resolvability rate)
  • Specifications
  • Conclusion

1

slide-3
SLIDE 3

Preliminaries

slide-4
SLIDE 4
  • 1. Preliminaries

We focus on the source resolvability problem. Given an arbitrary target source, we approximate it by using a discrete random variable which is uniformly distributed.

Mapping: φ : U_M := {1, 2, . . . , M} → X

Uniform random number U_M:
  i    : 1    2    3    · · ·  M
  P(i) : 1/M  1/M  1/M  · · ·  1/M

φ(U_M) should approximate the given target source X:
  x    : 1    2    3    · · ·
  P(x) : 1/3  1/8  1/9  · · ·

2

slide-6
SLIDE 6
  • 1. Preliminaries
  • Distance d(φ(UM), X): requested to be small
  • Size M (rate: log M): requested to be small

3

slide-8
SLIDE 8
  • 1. Preliminaries
  • Distance d(φ(UM), X): requested to be small (≤ D)
  • Size M (rate: log M): requested to be as small as possible

4

slide-9
SLIDE 9
  • 1. Preliminaries

Table 1: Previous results

  Approximation measure (or distance) | Information spectrum                                    | Rényi entropy
  ------------------------------------+---------------------------------------------------------+--------------
  Variational distance                | Han & Verdú '93; Steinberg & Verdú '96; Yagi & Han '17  | Uyematsu '10
  KL divergence                       | Steinberg & Verdú '96; Nomura '18                       | —
  f-divergence                        | Nomura, ISIT '19                                        | —

5

slide-10
SLIDE 10
  • 1. Preliminaries

Table 2: Previous results

  Approximation measure (or distance) | Information spectrum                                    | Rényi entropy
  ------------------------------------+---------------------------------------------------------+--------------
  Variational distance                | Han & Verdú '93; Steinberg & Verdú '96; Yagi & Han '17  | Uyematsu '10
  KL divergence                       | Steinberg & Verdú '96; Nomura '18                       | —
  f-divergence                        | Nomura, ISIT '19                                        | This study

6

slide-11
SLIDE 11
  • 1. Preliminaries

Notation

  • X = {X^n}_{n=1}^∞: general source with values in countable sets X^n
  • P_{X^n}: probability distribution of X^n
  • U_M: random variable uniformly distributed on U_M := {1, 2, · · · , M}, with P_{U_M}(i) = 1/M, 1 ≤ i ≤ M

Assumption:

  H(X) := sup{ R | lim_{n→∞} Pr{ (1/n) log(1/P_{X^n}(X^n)) ≥ R } = 1 } < +∞
7

slide-13
SLIDE 13
  • 1. Preliminaries

We define a class of f-divergences between P_Z and P_Z̄. Let f(t) be a convex function defined for t > 0 with f(1) = 0.

Definition ([Csiszár and Shields, '04]) Let P_Z and P_Z̄ denote probability distributions over a finite set Z. The f-divergence between P_Z and P_Z̄ is defined by

  D_f(Z||Z̄) := Σ_{z∈Z} P_Z̄(z) f( P_Z(z) / P_Z̄(z) ),

where we set 0·f(0/0) = 0, f(0) = lim_{t↓0} f(t), and 0·f(a/0) = a · lim_{u→∞} f(u)/u.

We next give some examples of f-divergences.

8
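The definition above, together with its boundary conventions, is easy to exercise numerically. Below is a minimal sketch (our own illustration, not part of the talk; the names `f_divergence`, `f_at_zero`, and `slope_at_inf` are our choices):

```python
import math

def f_at_zero(f, eps=1e-12):
    """Numerical stand-in for f(0) := lim_{t -> 0} f(t)."""
    return f(eps)

def f_divergence(f, P, Q, slope_at_inf=0.0):
    """D_f(Z||Z~) = sum_z P_Z~(z) f(P_Z(z)/P_Z~(z)) over dict-valued pmfs P, Q,
    with the conventions 0*f(0/0) = 0 and 0*f(a/0) = a * lim_{u->inf} f(u)/u
    (the caller supplies that limit as slope_at_inf)."""
    total = 0.0
    for z in set(P) | set(Q):
        p, q = P.get(z, 0.0), Q.get(z, 0.0)
        if q > 0:
            total += q * f(p / q) if p > 0 else q * f_at_zero(f)
        elif p > 0:
            total += p * slope_at_inf  # 0 * f(a/0) convention
    return total
```

With f(t) = t log t this reproduces the KL divergence of the next slides; the default `slope_at_inf = 0.0` matches condition C3) introduced later.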

slide-17
SLIDE 17
  • 1. Preliminaries

Definition

  D_f(Z||Z̄) := Σ_{z∈Z} P_Z̄(z) f( P_Z(z) / P_Z̄(z) )

Examples [Csiszár and Shields, '04][Sason and Verdú, '16]

  • f(t) = t log t (KL divergence):
      D_f(Z||Z̄) = Σ_{z∈Z} P_Z(z) log( P_Z(z) / P_Z̄(z) ) =: D(Z||Z̄)

  • f(t) = − log t (reverse KL divergence):
      D_f(Z||Z̄) = Σ_{z∈Z} P_Z̄(z) log( P_Z̄(z) / P_Z(z) ) = D(Z̄||Z)

9

slide-18
SLIDE 18
  • 1. Preliminaries

Definition

  D_f(Z||Z̄) := Σ_{z∈Z} P_Z̄(z) f( P_Z(z) / P_Z̄(z) )

Examples [Csiszár and Shields, '04][Sason and Verdú, '16]

  • f(t) = 1 − √t (Hellinger distance):
      D_f(Z||Z̄) = 1 − Σ_{z∈Z} √( P_Z(z) P_Z̄(z) )

  • f(t) = (1 − t)+ := max{1 − t, 0} (variational distance):
      D_f(Z||Z̄) = (1/2) Σ_{z∈Z} |P_Z(z) − P_Z̄(z)|

10
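The two closed forms above can be checked against the defining sum on random distributions. A quick self-contained sanity check (our own, not from the slides):

```python
import math
import random

random.seed(0)

def rand_dist(k):
    """A random probability vector of length k."""
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

P, Q = rand_dist(5), rand_dist(5)

def Df(f):
    # D_f(Z||Z~) = sum_z P_Z~(z) f(P_Z(z)/P_Z~(z)); all masses positive here
    return sum(q * f(p / q) for p, q in zip(P, Q))

# f(t) = 1 - sqrt(t): Hellinger distance 1 - sum sqrt(P_Z P_Z~)
lhs = Df(lambda t: 1 - math.sqrt(t))
rhs = 1 - sum(math.sqrt(p * q) for p, q in zip(P, Q))
assert abs(lhs - rhs) < 1e-12

# f(t) = max(1 - t, 0): variational distance (1/2) sum |P_Z - P_Z~|
lhs = Df(lambda t: max(1 - t, 0))
rhs = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))
assert abs(lhs - rhs) < 1e-12
```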

slide-19
SLIDE 19
  • 1. Preliminaries

Definition

  D_f(Z||Z̄) := Σ_{z∈Z} P_Z̄(z) f( P_Z(z) / P_Z̄(z) )

Examples [Csiszár and Shields, '04][Sason and Verdú, '16]

  • f(t) = (t − γ)+ (Eγ-divergence): for any given γ ≥ 1,
      D_f(Z||Z̄) = Σ_{z∈Z: P_Z(z) > γP_Z̄(z)} ( P_Z(z) − γP_Z̄(z) ) =: Eγ(Z||Z̄)

  • It is not difficult to check that f(t) = (γ − t)+ + 1 − γ also leads to the Eγ-divergence.
  • γ = 1: variational distance

11
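The claim that both f(t) = (t − γ)+ and f(t) = (γ − t)+ + 1 − γ induce the same Eγ-divergence can be verified numerically. A small check of our own (the distributions and the value of γ are arbitrary choices):

```python
import random

random.seed(1)

def rand_dist(k):
    """A random probability vector of length k."""
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

P, Q = rand_dist(6), rand_dist(6)
gamma = 1.5

def Df(f):
    # D_f(Z||Z~) = sum_z P_Z~(z) f(P_Z(z)/P_Z~(z)); all masses positive here
    return sum(q * f(p / q) for p, q in zip(P, Q))

# direct definition of the E_gamma-divergence
e_gamma = sum(p - gamma * q for p, q in zip(P, Q) if p > gamma * q)

assert abs(Df(lambda t: max(t - gamma, 0)) - e_gamma) < 1e-12
assert abs(Df(lambda t: max(gamma - t, 0) + 1 - gamma) - e_gamma) < 1e-12
```

The two choices of f differ by γ − t − (t − γ)+ + ... terms whose expectation under P_Z̄ is the constant γ − 1, which is exactly what the additive correction 1 − γ cancels.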

slide-21
SLIDE 21
  • 1. Preliminaries

In this study, we assume the following conditions on the function f.

C1) The function f(t) is a strictly decreasing function of t for t ∈ (0, 1] and a decreasing function for t ≥ 1.
C2) For any pair of positive numbers (a, b), it holds that lim_{n→∞} f(e^{−nb}) / e^{na} = 0.
C3) For any number a ∈ [0, 1], it holds that 0·f(a/0) = 0.

Note: We only consider f-divergences whose function f satisfies C1)–C3).

12
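A small numerical illustration of the conditions (our own, not from the slides): C2) holds for f(t) = − log t because f(e^{−nb}) grows only linearly in n, while f(t) = t log t violates C1) because it is increasing on (1/e, 1]:

```python
import math

a = b = 1.0
# C2): f(e^{-nb}) / e^{na} -> 0; for f(t) = -log t this ratio is n*b / e^{n*a}
vals = [(n * b) / math.exp(n * a) for n in (5, 10, 20)]
assert vals[0] > vals[1] > vals[2] and vals[2] < 1e-6

# C1) fails for f(t) = t log t: it is increasing on (1/e, 1], not decreasing
f = lambda t: t * math.log(t)
assert f(0.9) > f(0.5)
```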

slide-26
SLIDE 26
  • 1. Preliminaries

Note: f(t) = − log t, f(t) = 1 − √t, and f(t) = (1 − t)+ satisfy all three conditions, while f(t) = t log t does not satisfy C1).

14

slide-27
SLIDE 27
  • 1. Preliminaries

Note: f(t) = (γ − t)+ + 1 − γ satisfies all three conditions, while f(t) = (t − γ)+ does not satisfy C1).

15

slide-28
SLIDE 28
  • 1. Preliminaries

Key property

  Σ_{z∈Z′} b(z) f( a(z)/b(z) ) ≥ ( Σ_{z∈Z′} b(z) ) · f( Σ_{z∈Z′} a(z) / Σ_{z∈Z′} b(z) )

Together with the fact that f(1) = 0, we have D_f(Z||Z̄) ≥ 0.

  • This is an analogue of the log-sum inequality for the KL divergence.

16
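The key property is Jensen's inequality applied with weights b(z), so it can be spot-checked for the convex functions used in this talk. A check of our own with arbitrary positive vectors a, b:

```python
import math
import random

random.seed(2)
a = [random.uniform(0.1, 1.0) for _ in range(5)]
b = [random.uniform(0.1, 1.0) for _ in range(5)]

# sum_z b(z) f(a(z)/b(z)) >= (sum_z b(z)) * f(sum_z a(z) / sum_z b(z))
for f in (lambda t: t * math.log(t),   # KL-type
          lambda t: -math.log(t),      # reverse-KL-type
          lambda t: max(1 - t, 0)):    # variational-type
    lhs = sum(bi * f(ai / bi) for ai, bi in zip(a, b))
    rhs = sum(b) * f(sum(a) / sum(b))
    assert lhs >= rhs - 1e-12
```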

slide-29
SLIDE 29
  • 1. Preliminaries

Definition: Smooth Rényi entropy of order α ([Renner and Wolf, '04])

  H_α(δ|X^n) := (1/(1 − α)) · inf_{P̄_{X^n} ∈ B^δ(P_{X^n})} log Σ_{x∈X^n} P̄_{X^n}(x)^α,

where

  B^δ(P_{X^n}) := { P̄_{X^n} ∈ P_n : (1/2) Σ_{x∈X^n} |P_{X^n}(x) − P̄_{X^n}(x)| ≤ δ }.

Theorem ([Uyematsu, '10])

  H_0(δ|X^n) = min_{A_n ⊂ X^n, Pr{X^n ∈ A_n} ≥ 1−δ} log |A_n|.

H_0(δ|X^n) is also called the smooth max entropy of the source.

17
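Uyematsu's characterization makes H_0(δ|X^n) directly computable for small alphabets: sort the probabilities in decreasing order and keep the fewest outcomes whose total mass reaches 1 − δ. A sketch of our own (the function name `smooth_max_entropy` is our choice):

```python
import math

def smooth_max_entropy(probs, delta):
    """H_0(delta | X) = min_{A : Pr{A} >= 1 - delta} log |A|  ([Uyematsu, '10]):
    greedily keep the most probable outcomes until their mass reaches 1 - delta.
    Assumes 0 <= delta < 1."""
    mass, size = 0.0, 0
    for p in sorted(probs, reverse=True):
        if mass >= 1 - delta:
            break
        mass += p
        size += 1
    return math.log(size)
```

Smoothing with δ > 0 discards a low-probability tail, which can shrink the optimal support: for the distribution (1/2, 1/4, 1/8, 1/8), H_0 drops from log 4 at δ = 0 to log 2 at δ = 0.3.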

slide-31
SLIDE 31

Main results

slide-32
SLIDE 32
  • 2. Main Result

Theorem (Direct Theorem) Under conditions C1)–C3), for any γ > 0 and any M_n satisfying

  (1/n) log M_n ≥ (1/n) H_0(1 − f^{−1}(D)|X^n) + γ,

there exists a mapping φ_n which satisfies D_f(X^n||φ_n(U_{M_n})) ≤ D + γ for sufficiently large n.

H_0(δ|X^n): smooth Rényi entropy of order 0

18

slide-33
SLIDE 33
  • 2. Main result

Theorem (Converse Theorem) Under conditions C1) and C2), for any mapping φ_n satisfying D_f(X^n||φ_n(U_{M_n})) ≤ D, it holds that

  (1/n) log M_n ≥ (1/n) H_0(1 − f^{−1}(D)|X^n).

19

slide-34
SLIDE 34
  • 2. Main result

The optimum D-achievable rate with respect to a class of f-divergences:

Definition R is said to be D-achievable with the given f-divergence if there exists a mapping φ_n : U_{M_n} → X^n such that

  lim sup_{n→∞} D_f(X^n||φ_n(U_{M_n})) ≤ D,   lim sup_{n→∞} (1/n) log M_n ≤ R.

We want here to make R as small as possible.

Definition (Optimum D-achievable rate)

  S_f(D|X) := inf { R | R is D-achievable with the given f-divergence }.

Our main objective is to determine S_f(D|X).

20

slide-37
SLIDE 37
  • 2. Main result

Then, we have the main theorem:

Theorem Under conditions C1)–C3), it holds that

  S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D + ν)|X^n)
           = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D) + ν|X^n).

Proof: omitted.

21

slide-38
SLIDE 38
  • 3. Specifications
slide-39
SLIDE 39
  • 3. Specifications

We apply our general formula to three cases:

  • f(t) = (1 − t)+ → variational distance:
      D_f(Z||Z̄) = (1/2) Σ_{z∈Z} |P_Z(z) − P_Z̄(z)|

  • f(t) = − log t → reverse KL divergence:
      D_f(X^n||φ_n(U_{M_n})) = Σ_{x∈X^n} P_{φ_n(U_{M_n})}(x) log( P_{φ_n(U_{M_n})}(x) / P_{X^n}(x) )

  • f(t) = (γ − t)+ + 1 − γ → Eγ-divergence:
      D_f(Z||Z̄) = Σ_{z∈Z: P_Z(z) > γP_Z̄(z)} ( P_Z(z) − γP_Z̄(z) )

22

slide-42
SLIDE 42
  • 3. Specifications

Theorem (Optimum D-achievable rate)

  S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D + ν)|X^n).

Variational distance: f(t) = (1 − t)+ ⟺ f^{−1}(D) = 1 − D (0 ≤ D ≤ 1).

Corollary For f(t) = (1 − t)+, it holds that

  S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(D + ν|X^n).

This result coincides with the result given by [Uyematsu, '10].

23
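For an i.i.d. source, the corollary suggests that (1/n)H_0(D + ν|X^n) should approach the Shannon entropy rate as n grows, by the AEP. Below is a rough numerical illustration of our own for a Bernoulli(p) source (the helper `h0_rate_bernoulli` is hypothetical, and the 0.05 tolerance is an empirical choice, not a bound from the talk):

```python
import math

def h0_rate_bernoulli(p, n, delta):
    """(1/n) H_0(delta | X^n) for X^n i.i.d. Bernoulli(p): log-size of the
    smallest set of sequences with probability >= 1 - delta (the smooth-max-
    entropy characterization), grouping the 2^n sequences by type."""
    # per-sequence probability of a sequence with k ones, and how many there are
    types = [((p ** k) * ((1 - p) ** (n - k)), math.comb(n, k)) for k in range(n + 1)]
    types.sort(reverse=True)  # most probable sequences first
    mass, size = 0.0, 0
    for prob, count in types:
        need = (1 - delta) - mass
        if need <= 0:
            break
        take = min(count, math.ceil(need / prob))  # partial use of a type
        mass += take * prob
        size += take
    return math.log(size) / n

# the rate should be close to the Shannon entropy H(p) for moderately large n
H = -(0.3 * math.log(0.3) + 0.7 * math.log(0.7))
rate = h0_rate_bernoulli(0.3, 200, 0.5)
assert abs(rate - H) < 0.05
```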

slide-45
SLIDE 45
  • 3. Specifications

Theorem (Optimum D-achievable rate)

  S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D + ν)|X^n).

Reverse KL divergence: f(t) = − log t ⟺ f^{−1}(D) = e^{−D}.

Corollary For f(t) = − log t, it holds that

  S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − e^{−(D+ν)}|X^n).

24

slide-47
SLIDE 47
  • 3. Specifications

Theorem (Optimum D-achievable rate)

  S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D + ν)|X^n).

Eγ-divergence: f(t) = (γ − t)+ + 1 − γ ⟺ f^{−1}(D) = 1 − D (γ ≥ 1).

Noticing that γ ≥ 1, substituting f(t) = (γ − t)+ + 1 − γ yields

  ( γ − Pr{ (1/n) log(1/P_{X^n}(X^n)) ≤ R } )+ + 1 − γ = Pr{ (1/n) log(1/P_{X^n}(X^n)) > R }.

25
slide-49
SLIDE 49
  • 3. Specifications

Thus, we have:

Corollary For f(t) = (γ − t)+ + 1 − γ, it holds that

  S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(D + ν|X^n).

Remark S_f(D|X) with the Eγ-divergence does not depend on γ; it coincides with S_f(D|X) for the variational distance.

26

slide-50
SLIDE 50
  • 4. Conclusion

We have considered the source resolvability problem with respect to f-divergences.

Contributions

  • We have characterized the optimum D-achievable rate by a general formula using the smooth Rényi entropy and the function f.
  • The optimum D-achievable rate for any specified function f is then easy to derive from the general formula.

27

slide-51
SLIDE 51

Thank you!
