Optimum Source Resolvability Rate with Respect to f-Divergences Using the Smooth Rényi Entropy
Ryo Nomura (Waseda University, Japan)
Hideki Yagi (The University of Electro-Communications, Japan)
ISIT 2020, June 21-26, 2020

Outline
- Preliminaries
- 1. Source resolvability
- 2. f-divergence
- 3. Smooth Rényi entropy
- Main results (optimum D-achievable resolvability rate)
- Specifications
- Conclusion
Preliminaries
- 1. Preliminaries

We focus on the source resolvability problem. Given an arbitrary target source, we approximate it using a discrete random variable that is uniformly distributed.

Mapping φ : U_M := {1, 2, . . . , M} → X

Uniform random number U_M: P_{U_M}(i) = 1/M for i = 1, 2, . . . , M

Target source X: P_X(1) = 1/3, P_X(2) = 1/8, P_X(3) = 1/9, . . .
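As a quick illustration (not part of the talk), the setup above can be mimicked numerically: map the M uniform points onto target symbols roughly in proportion to P_X and watch the approximation improve as M grows. The construction and all names below are a hypothetical sketch, not the mapping used in the paper.

```python
# Toy sketch: assign the M uniform points of U_M to target symbols in
# proportion to P_X, then measure how close phi(U_M) is to X.
def make_mapping(p_x, M):
    """Assign the M uniform points to symbols, roughly in proportion to P_X."""
    mapping = []
    for x, p in p_x.items():
        mapping.extend([x] * round(p * M))
    while len(mapping) < M:          # fix a rounding shortfall, if any
        mapping.append(max(p_x, key=p_x.get))
    return mapping[:M]

def variational_distance(p, q, alphabet):
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in alphabet)

p_x = {"a": 0.5, "b": 0.3, "c": 0.2}
for M in (4, 10, 100):
    phi = make_mapping(p_x, M)
    induced = {x: phi.count(x) / M for x in p_x}
    d = variational_distance(p_x, induced, p_x)
    print(M, round(d, 3))
```

The distance shrinks as M (and hence the rate log M) grows, which is exactly the trade-off the resolvability problem formalizes.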
- 1. Preliminaries

- Distance d(φ(U_M), X): required to be small
- Size M (rate: log M): required to be small
- 1. Preliminaries

- Distance d(φ(U_M), X): required to be small (≤ D)
- Size M (rate: log M): required to be as small as possible
- 1. Preliminaries

Table 1: Previous results (characterization tool vs. approximation measure)

- Information spectrum:
  - Variational distance: Han & Verdú '93; Steinberg & Verdú '96; Yagi & Han '17
  - KL divergence: Steinberg & Verdú '96; Nomura '18
  - f-divergence: Nomura, ISIT '19
- Smooth Rényi entropy:
  - Variational distance: Uyematsu '10
- 1. Preliminaries

Table 2: Previous results (characterization tool vs. approximation measure)

- Information spectrum:
  - Variational distance: Han & Verdú '93; Steinberg & Verdú '96; Yagi & Han '17
  - KL divergence: Steinberg & Verdú '96; Nomura '18
  - f-divergence: Nomura, ISIT '19
- Smooth Rényi entropy:
  - Variational distance: Uyematsu '10
  - f-divergence: This study
- 1. Preliminaries

Notation
- X = {X^n}_{n=1}^∞: general source taking values in countable sets X^n
- P_{X^n}: probability distribution of X^n
- U_M: random variable uniformly distributed on U_M := {1, 2, . . . , M}, i.e., P_{U_M}(i) = 1/M for 1 ≤ i ≤ M

Assumption:

H(X) := sup{ R | lim_{n→∞} Pr{ (1/n) log( 1/P_{X^n}(X^n) ) ≥ R } = 1 } < +∞
- 1. Preliminaries

We define a class of f-divergences between P_Z and P_Z̄. Let f(t) be a convex function defined for t > 0 with f(1) = 0.

Definition ([Csiszár and Shields, '04]): Let P_Z and P_Z̄ denote probability distributions over a finite set Z. The f-divergence between P_Z and P_Z̄ is defined by

D_f(Z‖Z̄) := Σ_{z∈Z} P_Z̄(z) f( P_Z(z)/P_Z̄(z) ),

where we set 0·f(0/0) = 0, f(0) = lim_{t↓0} f(t), and 0·f(a/0) = a · lim_{u→∞} f(u)/u.

We next give some examples of f-divergences.
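The definition above, including its edge-case conventions, can be evaluated numerically. The helper below is a minimal sketch (names are my own, not from the talk); the limit in the 0·f(a/0) convention is approximated at a large argument.

```python
import math

# Sketch of the f-divergence D_f(Z||Zbar) = sum_z P_Zbar(z) f(P_Z(z)/P_Zbar(z))
# with the conventions 0*f(0/0) = 0 and 0*f(a/0) = a * lim_{u->inf} f(u)/u.
def f_divergence(f, p_z, p_zbar, alphabet, big=1e12):
    total = 0.0
    for z in alphabet:
        p, q = p_z.get(z, 0.0), p_zbar.get(z, 0.0)
        if q > 0:
            total += q * f(p / q)
        elif p > 0:
            total += p * f(big) / big   # 0*f(a/0), approximated numerically
        # p == q == 0 contributes 0 by convention
    return total

# Sanity check: f(t) = t log t recovers the KL divergence D(Z||Zbar).
f_kl = lambda t: t * math.log(t) if t > 0 else 0.0
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.75, "b": 0.25}
kl_direct = sum(p[z] * math.log(p[z] / q[z]) for z in p)
print(abs(f_divergence(f_kl, p, q, p) - kl_direct) < 1e-9)
```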
- 1. Preliminaries

Definition: D_f(Z‖Z̄) := Σ_{z∈Z} P_Z̄(z) f( P_Z(z)/P_Z̄(z) )

Examples [Csiszár and Shields, '04][Sason and Verdú, '16]
- f(t) = t log t (KL divergence):
  D_f(Z‖Z̄) = Σ_{z∈Z} P_Z(z) log( P_Z(z)/P_Z̄(z) ) =: D(Z‖Z̄)
- f(t) = − log t (reverse KL divergence):
  D_f(Z‖Z̄) = Σ_{z∈Z} P_Z̄(z) log( P_Z̄(z)/P_Z(z) ) = D(Z̄‖Z)
- 1. Preliminaries

Definition: D_f(Z‖Z̄) := Σ_{z∈Z} P_Z̄(z) f( P_Z(z)/P_Z̄(z) )

Examples [Csiszár and Shields, '04][Sason and Verdú, '16]
- f(t) = 1 − √t (Hellinger distance):
  D_f(Z‖Z̄) = 1 − Σ_{z∈Z} √( P_Z(z) P_Z̄(z) )
- f(t) = (1 − t)+ := max{1 − t, 0} (variational distance; f(t) = (t − 1)+ yields the same value):
  D_f(Z‖Z̄) = (1/2) Σ_{z∈Z} |P_Z(z) − P_Z̄(z)|
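The two examples above can be checked directly against their closed forms (a small sketch with an arbitrary pair of distributions, with floating-point tolerances):

```python
import math

p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}

# f(t) = 1 - sqrt(t): sum_z q(z) f(p(z)/q(z)) = 1 - sum_z sqrt(p(z) q(z))
f_hel = lambda t: 1.0 - math.sqrt(t)
df_hel = sum(q[z] * f_hel(p[z] / q[z]) for z in p)
hel = 1.0 - sum(math.sqrt(p[z] * q[z]) for z in p)
print(abs(df_hel - hel) < 1e-12)

# f(t) = max(1 - t, 0): sum_z q(z) f(p(z)/q(z)) = (1/2) sum_z |p(z) - q(z)|
f_var = lambda t: max(1.0 - t, 0.0)
df_var = sum(q[z] * f_var(p[z] / q[z]) for z in p)
vd = 0.5 * sum(abs(p[z] - q[z]) for z in p)
print(abs(df_var - vd) < 1e-12)
```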
- 1. Preliminaries

Definition: D_f(Z‖Z̄) := Σ_{z∈Z} P_Z̄(z) f( P_Z(z)/P_Z̄(z) )

Examples [Csiszár and Shields, '04][Sason and Verdú, '16]
- f(t) = (t − γ)+ (E_γ-divergence): for any given γ ≥ 1,
  D_f(Z‖Z̄) = Σ_{z∈Z: P_Z(z) > γP_Z̄(z)} ( P_Z(z) − γP_Z̄(z) ) =: E_γ(Z‖Z̄)
- It is not difficult to check that f(t) = (γ − t)+ + 1 − γ also leads to the E_γ-divergence.
- γ = 1: variational distance
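Both claims on this slide, that f(t) = (γ − t)+ + 1 − γ yields the same E_γ-divergence and that γ = 1 recovers the variational distance, can be verified numerically (a small sketch with an arbitrary example pair):

```python
# E_gamma(Z||Zbar) = sum over z with p(z) > gamma*q(z) of (p(z) - gamma*q(z))
def e_gamma(p, q, gamma, alphabet):
    return sum(p[z] - gamma * q[z] for z in alphabet if p[z] > gamma * q[z])

p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}

# gamma = 1 recovers the variational distance
vd = 0.5 * sum(abs(p[z] - q[z]) for z in p)
print(abs(e_gamma(p, q, 1.0, p) - vd) < 1e-12)

# the f-divergence form f(t) = max(gamma - t, 0) + 1 - gamma agrees with E_gamma
gamma = 1.5
f_eg = lambda t: max(gamma - t, 0.0) + 1.0 - gamma
df = sum(q[z] * f_eg(p[z] / q[z]) for z in p)
print(abs(df - e_gamma(p, q, gamma, p)) < 1e-12)
```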
- 1. Preliminaries

In this study, we assume the following conditions on the function f.

C1) f(t) is strictly decreasing for t ∈ (0, 1] and decreasing (non-increasing) for t ≥ 1.
C2) For any pair of positive numbers (a, b), it holds that lim_{n→∞} f(e^{−nb}) / e^{na} = 0.
C3) For any number a ∈ [0, 1], it holds that 0·f(a/0) = 0.

Note: We only consider f-divergences whose function f satisfies C1)–C3).
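Condition C2) says that f may blow up at 0, but only subexponentially. For f(t) = −log t we have f(e^{−nb}) = nb, so the ratio is nb / e^{na}; a quick numerical sketch (parameter values are arbitrary choices):

```python
import math

# C2) for f(t) = -log t: f(e^{-nb}) / e^{na} = nb / e^{na} -> 0 for any a, b > 0.
# We use nb directly instead of -log(exp(-n*b)) to avoid floating underflow.
a, b = 0.1, 2.0
vals = [(n * b) / math.exp(n * a) for n in (1, 10, 100, 1000)]
print(vals)  # tends to 0 as n grows (not necessarily monotonically at first)
```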
- 2. Main result

Note: f(t) = − log t, f(t) = 1 − √t, and f(t) = (1 − t)+ satisfy the three conditions, while f(t) = t log t does not satisfy C1).
- 1. Preliminaries

Note: f(t) = (γ − t)+ + 1 − γ satisfies the three conditions, while f(t) = (t − γ)+ does not satisfy C1).
- 1. Preliminaries

Key property (an analogue of the log-sum inequality for the KL divergence):

Σ_{z∈Z′} b(z) f( a(z)/b(z) ) ≥ ( Σ_{z∈Z′} b(z) ) f( Σ_{z∈Z′} a(z) / Σ_{z∈Z′} b(z) )

Together with the fact that f(1) = 0, this yields D_f(Z‖Z̄) ≥ 0.
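The key property follows from Jensen's inequality applied to the convex f with weights b(z)/Σb(z). A numerical spot check for f(t) = −log t on arbitrary positive weights (the particular numbers are just an example):

```python
import math

# Generalized log-sum inequality:
# sum_z b(z) f(a(z)/b(z)) >= (sum_z b(z)) * f(sum_z a(z) / sum_z b(z))
f = lambda t: -math.log(t)
a = [0.2, 0.5, 0.1]
b = [0.4, 0.3, 0.3]

lhs = sum(bz * f(az / bz) for az, bz in zip(a, b))
rhs = sum(b) * f(sum(a) / sum(b))
print(lhs >= rhs)  # guaranteed by convexity of f
```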
- 1. Preliminaries

Definition: Smooth Rényi entropy of order α ([Renner and Wolf, '04])

H_α(δ|X^n) := (1/(1 − α)) inf_{P̄_{X^n} ∈ B^δ(P_{X^n})} log Σ_{x∈X^n} P̄_{X^n}(x)^α,

where B^δ(P_{X^n}) := { P̄_{X^n} ∈ P_n : (1/2) Σ_{x∈X^n} |P_{X^n}(x) − P̄_{X^n}(x)| ≤ δ }.

Theorem ([Uyematsu, '10])

H_0(δ|X^n) = min_{A_n ⊂ X^n: Pr{X^n ∈ A_n} ≥ 1−δ} log |A_n|.

H_0(δ|X^n) is also called the smooth max entropy of the source.
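The [Uyematsu, '10] characterization makes H_0 easy to compute for a finite distribution: greedily collect the highest-probability symbols until mass 1 − δ is covered, then take the log of the set size. A minimal sketch (helper name is my own):

```python
import math

# H_0(delta | X) = log of the size of the smallest set A with Pr{X in A} >= 1 - delta
def h0(probs, delta):
    probs = sorted(probs, reverse=True)     # greedy: most likely symbols first
    mass, size = 0.0, 0
    for p in probs:
        if mass >= 1.0 - delta - 1e-12:     # small slack for floating point
            break
        mass += p
        size += 1
    return math.log(size)

probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]
print(h0(probs, 0.0))   # all 5 symbols needed: log 5
print(h0(probs, 0.1))   # a tail of mass <= 0.1 can be dropped: log 4
```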
Main results

- 2. Main result

Theorem (Direct theorem): Under conditions C1)–C3), for any γ > 0 and any M_n satisfying

(1/n) log M_n ≥ (1/n) H_0(1 − f^{−1}(D)|X^n) + γ,

there exists a mapping φ_n which satisfies D_f(X^n‖φ_n(U_{M_n})) ≤ D + γ for sufficiently large n. Here H_0(δ|X^n) is the smooth Rényi entropy of order 0.
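The sufficient rate in the direct theorem can be evaluated at finite block lengths. A hedged numeric sketch, assuming an i.i.d. Bernoulli(0.3) source and the reverse KL divergence (both illustrative choices of mine, not from the talk), for which f(t) = −log t gives f^{−1}(D) = e^{−D}; `h0` implements the smooth max entropy via the [Uyematsu, '10] characterization:

```python
import math
from itertools import product

# Smooth max entropy: log-size of the smallest set covering mass 1 - delta.
def h0(probs, delta):
    probs = sorted(probs, reverse=True)
    mass, size = 0.0, 0
    for p in probs:
        if mass >= 1.0 - delta - 1e-12:
            break
        mass += p
        size += 1
    return math.log(size)

n, q, D = 10, 0.3, 0.1
# Probabilities of all 2^n binary blocks under i.i.d. Bernoulli(q).
block_probs = [math.prod(q if bit else 1.0 - q for bit in bits)
               for bits in product((0, 1), repeat=n)]
delta = 1.0 - math.exp(-D)          # 1 - f^{-1}(D) for f(t) = -log t
rate = h0(block_probs, delta) / n   # sufficient (1/n) log M_n, up to gamma
print(round(rate, 3))
```

Any rate at or above this value (plus an arbitrarily small γ) suffices at this block length, per the theorem.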
- 2. Main result

Theorem (Converse theorem): Under conditions C1) and C2), any mapping φ_n satisfying D_f(X^n‖φ_n(U_{M_n})) ≤ D must satisfy

(1/n) log M_n ≥ (1/n) H_0(1 − f^{−1}(D)|X^n).
- 2. Main result

The optimum D-achievable rate with respect to a class of f-divergences:

Definition: R is said to be D-achievable with the given f-divergence if there exists a mapping φ_n : U_{M_n} → X^n such that

lim sup_{n→∞} D_f(X^n‖φ_n(U_{M_n})) ≤ D and lim sup_{n→∞} (1/n) log M_n ≤ R.

We want to make R as small as possible.

Definition (Optimum D-achievable rate): S_f(D|X) := inf{ R | R is D-achievable with the given f-divergence }.

Our main objective is to determine S_f(D|X).
- 2. Main result

Then, we have the main theorem:

Theorem: Under conditions C1)–C3), it holds that

S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D + ν)|X^n)
         = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D) + ν|X^n).

Proof: omitted.
- 3. Specifications

We apply our general formula to three cases:

- f(t) = (1 − t)+ → variational distance:
  D_f(Z‖Z̄) = (1/2) Σ_{z∈Z} |P_Z(z) − P_Z̄(z)|
- f(t) = − log t → reverse KL divergence:
  D_f(X^n‖φ_n(U_{M_n})) = Σ_{x∈X^n} P_{φ_n(U_{M_n})}(x) log( P_{φ_n(U_{M_n})}(x) / P_{X^n}(x) )
- f(t) = (γ − t)+ + 1 − γ → E_γ-divergence:
  D_f(Z‖Z̄) = Σ_{z∈Z: P_Z(z) > γP_Z̄(z)} ( P_Z(z) − γP_Z̄(z) )
- 3. Specifications

Theorem (Optimum D-achievable rate): S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D + ν)|X^n).

Variational distance: f(t) = (1 − t)+ ⟺ f^{−1}(D) = 1 − D (0 ≤ D ≤ 1).

Corollary: For f(t) = (1 − t)+, it holds that

S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(D + ν|X^n).

This result coincides with the one given by [Uyematsu, '10].
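The inversion step behind this corollary can be spot-checked numerically (a trivial sketch; the tolerance guards against floating-point rounding):

```python
# For f(t) = max(1 - t, 0) and D in [0, 1], f(1 - D) = D, i.e. f^{-1}(D) = 1 - D,
# so the smoothing parameter 1 - f^{-1}(D) in the general formula becomes D.
f = lambda t: max(1.0 - t, 0.0)
for D in (0.0, 0.3, 0.7, 1.0):
    assert abs(f(1.0 - D) - D) < 1e-12
print("f^{-1}(D) = 1 - D checked on sample points")
```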
- 3. Specifications

Theorem (Optimum D-achievable rate): S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D + ν)|X^n).

Reverse KL divergence: f(t) = − log t ⟺ f^{−1}(D) = e^{−D}.

Corollary: For f(t) = − log t, it holds that

S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − e^{−(D+ν)}|X^n).
- 3. Specifications

Theorem (Optimum D-achievable rate): S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(1 − f^{−1}(D + ν)|X^n).

E_γ-divergence: f(t) = (γ − t)+ + 1 − γ ⟺ f^{−1}(D) = 1 − D (γ ≥ 1).

Noticing that γ ≥ 1, substituting f(t) = (γ − t)+ + 1 − γ yields

( γ − Pr{ (1/n) log( 1/P_{X^n}(X^n) ) ≤ R } )+ + 1 − γ = Pr{ (1/n) log( 1/P_{X^n}(X^n) ) > R }.
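The substitution step rests on a simple identity: since γ ≥ 1 and a probability never exceeds 1, the positive part is never clipped, so (γ − p)+ + 1 − γ = 1 − p. A quick numerical check (sample values are arbitrary):

```python
# For gamma >= 1 and any probability p in [0, 1]:
# max(gamma - p, 0) + 1 - gamma = (gamma - p) + 1 - gamma = 1 - p,
# because gamma - p >= 0 always holds in that range.
for gamma in (1.0, 1.5, 3.0):
    for p in (0.0, 0.25, 0.9, 1.0):
        assert abs(max(gamma - p, 0.0) + 1.0 - gamma - (1.0 - p)) < 1e-12
print("identity verified")
```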
- 3. Specifications

Thus, we have:

Corollary: For f(t) = (γ − t)+ + 1 − γ, it holds that

S_f(D|X) = lim_{ν↓0} lim sup_{n→∞} (1/n) H_0(D + ν|X^n).

Remark: S_f(D|X) with the E_γ-divergence does not depend on γ, and it coincides with S_f(D|X) for the variational distance.
- 4. Conclusion

We have considered the source resolvability problem with respect to f-divergences.

Contributions
- We have characterized a general formula for the optimum D-achievable rate using the smooth Rényi entropy, with the function f entering through its inverse.
- From the general formula, it is easy to derive the optimum D-achievable rate for each specified function f.
Thank you!