SLIDE 1
Tight bounds for communication-assisted agreement distillation
Jaikumar Radhakrishnan, Tata Institute of Fundamental Research, Mumbai
Joint work with Venkat Guruswami, Carnegie Mellon University
SLIDE 3
Agreement distillation
Alice — Input: X ∈ {0,1}^N; Output: f_A(X) ∈ {0,1}^k
Bob — Input: Y ∈ {0,1}^N; Output: f_B(Y) ∈ {0,1}^k
(X, Y) ∼ BSC(ε): Pr[X_i ≠ Y_i] = ε
Goal: f_A(X) uniformly distributed in {0,1}^k; Pr[f_A(X) = f_B(Y)] close to 1
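Below is a minimal sketch of sampling from this correlated source, assuming Python with numpy; the name sample_bsc_pair and the parameter values are illustrative, not from the talk.

```python
import numpy as np

# Sketch of the BSC(eps) source: X is uniform on {0,1}^N and each bit of
# Y independently disagrees with the corresponding bit of X w.p. eps.
rng = np.random.default_rng(0)

def sample_bsc_pair(N, eps):
    X = rng.integers(0, 2, size=N)                   # Alice's uniform input
    flips = (rng.random(N) < eps).astype(np.int64)   # independent noise
    Y = X ^ flips                                    # Bob's noisy copy
    return X, Y

X, Y = sample_bsc_pair(N=1000, eps=0.1)
print("empirical disagreement rate:", np.mean(X != Y))  # should be ~ eps
```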
SLIDE 4
Naive protocol: no communication
Alice — Input: X ∈ {0,1}^N; Output: f_A(X) = X_1 X_2 . . . X_k
Bob — Input: Y ∈ {0,1}^N; Output: f_B(Y) = Y_1 Y_2 . . . Y_k
Success probability: Pr[f_A(X) = f_B(Y)] = (1 − ε)^k ≈ exp(−εk)
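A quick Monte Carlo check of this success probability, again a Python/numpy sketch with illustrative parameters:

```python
import numpy as np

# Estimate the naive protocol's agreement probability and compare with
# the exact value (1 - eps)^k.
rng = np.random.default_rng(1)
N, k, eps, trials = 100, 10, 0.1, 200_000

X = rng.integers(0, 2, size=(trials, N))
Y = X ^ (rng.random((trials, N)) < eps).astype(np.int64)
agree = np.all(X[:, :k] == Y[:, :k], axis=1)     # f_A(X) == f_B(Y)?
print("empirical:", agree.mean())
print("exact    :", (1 - eps) ** k)              # ~ exp(-eps k) for small eps
```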
SLIDE 6
Can we do better?
Yes, a little better (Bogdanov & Mossel 2011): Alice and Bob can agree with probability at least ≈ 2^(−(ε/(1−ε))k).
But no better (Bogdanov & Mossel 2011): Alice and Bob can agree with probability at most ≈ 2^(−(ε/(1−ε))k).
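Both expressions hold only up to lower-order factors; the following Python sketch just compares the two zero-communication rates at one illustrative setting:

```python
# Zero-communication rates at eps = 0.1, k = 20: the naive protocol
# achieves (1 - eps)^k, while the Bogdanov-Mossel protocol achieves
# roughly 2^(-(eps/(1-eps)) k), which is noticeably larger.
eps, k = 0.1, 20
naive = (1 - eps) ** k
bm = 2 ** (-(eps / (1 - eps)) * k)
print(f"naive ~ {naive:.4f}, Bogdanov-Mossel ~ {bm:.4f}")
```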
SLIDE 9
How much can communication help?
Alice — Input: X ∈ {0,1}^N; Output: f_A(X) ∈ {0,1}^k; sends message M ⟹
Bob — Input: Y ∈ {0,1}^N and M; Output: f_B(Y, M) ∈ {0,1}^k
How many bits must Alice send Bob to ensure agreement with constant probability? What is the trade-off between communication and probability of agreement?
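The one-way model can be pinned down as function signatures; this is a sketch only, and run_protocol and the type names are illustrative:

```python
from typing import Callable, Tuple

Bits = Tuple[int, ...]

def run_protocol(alice: Callable[[Bits], Tuple[Bits, Bits]],
                 bob: Callable[[Bits, Bits], Bits],
                 X: Bits, Y: Bits) -> bool:
    """One-way model: Alice outputs g_A(X) plus a c-bit message M;
    Bob, who never sees X, outputs g_B(Y, M)."""
    out_a, message = alice(X)
    out_b = bob(Y, message)
    return out_a == out_b          # success = exact agreement
```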
SLIDE 14
The trade-off
Definition: C_{BSC(ε)}(k, η) is the minimum number of bits Alice transmits to Bob in a protocol where g_A(X) is uniformly distributed in {0,1}^k and Pr[g_A(X) = g_B(Y, M)] ≥ η.
Probability of agreement = 2^(−γk); communication = ck.
BM '10: If c = 0, then γ = ε/(1 − ε).
This work: If c = 4ε(1 − ε), then γ → 0.
With B := 4ε(1 − ε), the trade-off curve is c = B(1 − γ) − 2√(B(1 − B)γ).
[Plot: c against γ; the curve meets the c-axis at B and the γ-axis at ε/(1 − ε).]
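A small numerical sanity check of this curve and its two endpoints, assuming Python with numpy:

```python
import numpy as np

# Trade-off curve c(gamma) = B(1 - gamma) - 2 sqrt(B(1-B) gamma) with
# B = 4 eps (1-eps).  It should give c = B at gamma = 0 and cross
# c = 0 exactly at gamma = eps/(1-eps), the Bogdanov-Mossel exponent.
eps = 0.1
B = 4 * eps * (1 - eps)

def c(gamma):
    return B * (1 - gamma) - 2 * np.sqrt(B * (1 - B) * gamma)

print("c(0)           =", c(0.0))              # equals B
print("c(eps/(1-eps)) =", c(eps / (1 - eps)))  # ~ 0
```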
SLIDE 15
Related work
Communication complexity: Canonne, Guruswami, Meka, and Sudan (2015) used capacity-achieving codes to ensure agreement with high probability with (h(ε) + o(1))k bits of communication.
Information theory: The case k = 1 is the subject of a recent conjecture of Courtade and Kumar (2014): the function g_A(X) = X_1 maximizes I[g_A(X) : Y]. Chandar and Tchamkerten (2014) showed that the corresponding conjecture is false for large k.
SLIDE 16
The protocol of Bogdanov and Mossel
View {+1, −1}^N as points on an N-dimensional sphere. Pick 2^k well-separated vectors, labelled by {0,1}^k.
Alice: f_A(X) = label of the vector closest to X. Bob: f_B(Y) = label of the vector closest to Y.
Proof idea: The projections along the various directions are Gaussian and approximately independent.
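A toy simulation of this protocol, assuming Python with numpy; random sign vectors stand in for the well-separated vectors (they are nearly orthogonal when N >> k), and all parameters are illustrative:

```python
import numpy as np

# Nearest-codeword protocol: 2^k random sign vectors label the outputs;
# each party outputs the label of the codeword with the largest inner
# product with their own string.
rng = np.random.default_rng(2)
N, k, eps, trials = 400, 4, 0.05, 2000

codewords = rng.choice([-1, 1], size=(2 ** k, N))

def closest(v):
    return int(np.argmax(codewords @ v))

wins = 0
for _ in range(trials):
    X = rng.choice([-1, 1], size=N)
    Y = X * np.where(rng.random(N) < eps, -1, 1)   # BSC noise as sign flips
    wins += closest(X) == closest(Y)
print("empirical agreement:", wins / trials)
```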
SLIDE 22
Alice’s view
The ambient space is {+1, −1}^N. The space is partitioned into disks. When X falls in a disk, Alice reports the label of its center. Each disk has volume ≈ 2^(−k).
SLIDE 23
Bob’s view
The disks are bigger and overlap. Bogdanov and Mossel '10: About 2^(−(ε/(1−ε))k) of the volume is covered by only one disk.
SLIDE 25
Alice’s message
The space is partitioned into disks. The disks are colored using 2^c colors. When X falls in a disk, Alice reports the label of its center. Alice sends Bob the color of the disk (c bits).
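Extending the earlier toy simulation with a random coloring gives a sketch of the color-assisted protocol (again Python with numpy, parameters illustrative):

```python
import numpy as np

# Color-assisted decoding: each disk label gets one of 2^c random
# colors; Alice sends her disk's color, and Bob decodes to the closest
# codeword among those of that color only.
rng = np.random.default_rng(3)
N, k, c, eps, trials = 400, 6, 3, 0.2, 2000

codewords = rng.choice([-1, 1], size=(2 ** k, N))
colors = rng.integers(0, 2 ** c, size=2 ** k)      # random coloring

wins = 0
for _ in range(trials):
    X = rng.choice([-1, 1], size=N)
    Y = X * np.where(rng.random(N) < eps, -1, 1)
    a = int(np.argmax(codewords @ X))              # Alice's disk
    same = np.flatnonzero(colors == colors[a])     # disks of Alice's color
    b = same[np.argmax(codewords[same] @ Y)]       # Bob's pick
    wins += (a == b)
print("agreement with", c, "bits of help:", wins / trials)
```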
SLIDE 28
Bob’s view
Again, the disks are bigger and overlap. But most points are covered by only one disk of a given color.
Bob uniquely identifies the disk (and its center). How many colors must Alice use?
SLIDE 33
The calculation
Suppose Alice chooses (1, 1, . . . , 1): vol(disk) ≈ 2^(−k). Then for a typical x in Alice's disk, bias(x) = t, where exp(−t²) ≈ 2^(−k); bias(y) = (1 − 2ε)t. Then vol(expanded disk) ≈ 2^(−(1−2ε)²k). So a typical point is in 2^k · 2^(−(1−2ε)²k) = 2^(4ε(1−ε)k) disks. If Alice uses ≈ 2^(4ε(1−ε)k) colors, then most points will be covered by at most one disk of any color.
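The only algebra used above is the identity 1 − (1 − 2ε)² = 4ε(1 − ε); a one-line check in Python:

```python
# Verify the exponent identity behind
# 2^k * 2^(-(1-2eps)^2 k) = 2^(4 eps (1-eps) k).
for eps in (0.05, 0.1, 0.25):
    lhs = 1 - (1 - 2 * eps) ** 2
    rhs = 4 * eps * (1 - eps)
    assert abs(lhs - rhs) < 1e-12
    print(f"eps = {eps}: exponent coefficient {rhs}")
```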
SLIDE 41
The lower bound
Theorem: C_{BSC(ε)}(k, 0.5) ≥ 4ε(1 − ε)k − o(k).
Idea: The "expansion of the discs" seen by Bob is inevitable. This is formalized using a hypercontractivity inequality of the following form. For f_Alice : {+1, −1}^N → R, let f_Bob(y) := E[f_Alice(X) | Y = y]. Then, for q > 1 and p = 1 + (1 − 2ε)²(q − 1),
‖f_Bob‖_q ≤ ‖f_Alice‖_p,
where for α, β : {+1, −1}^N → R, ‖α‖_p = E_X[|α(X)|^p]^(1/p) and ‖β‖_q = E_Y[|β(Y)|^q]^(1/q).
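For small N the inequality can be checked by brute force; the following Python/numpy sketch (N, eps, q illustrative) uses the fact that Pr[X = x | Y = y] = ε^d (1 − ε)^(N−d), where d is the Hamming distance between x and y:

```python
import itertools
import numpy as np

# Verify ||f_Bob||_q <= ||f_Alice||_p with p = 1 + (1-2eps)^2 (q-1)
# for a random test function on {+1,-1}^N.
N, eps, q = 6, 0.2, 3.0
p = 1 + (1 - 2 * eps) ** 2 * (q - 1)

pts = np.array(list(itertools.product([-1, 1], repeat=N)))
rng = np.random.default_rng(4)
f_alice = rng.random(len(pts))                 # arbitrary test function

d = (pts[:, None, :] != pts[None, :, :]).sum(axis=2)   # Hamming distances
cond = eps ** d * (1 - eps) ** (N - d)         # cond[y, x] = Pr[X=x | Y=y]
f_bob = cond @ f_alice                         # f_Bob(y) = E[f_Alice(X) | Y=y]

norm_p = (np.abs(f_alice) ** p).mean() ** (1 / p)
norm_q = (np.abs(f_bob) ** q).mean() ** (1 / q)
print(norm_q, "<=", norm_p, ":", norm_q <= norm_p + 1e-12)
```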
SLIDE 42
The analysis
Suppose Bob receives y. Let β(z|y) = Pr[g_A(X) = z | Y = y]. Without a message, the best strategy for Bob is to output the z for which β(z|y) is maximum:
Pr[Success | Y = y] ≤ max_z β(z|y).
Suppose Alice sends c-bit messages; so there are at most t = 2^c possible transcripts. Then
Pr[Success | Y = y] ≤ Σ_{i=1}^{2^c} β(z_i|y),
where z_1, z_2, . . . , z_t ∈ {0,1}^k give the top t values of β(z|y).
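For the naive encoder g_A(X) = X_1 . . . X_k, β(z|y) factorizes bitwise, so the top-t sum can be computed exactly for small k; a Python sketch with illustrative parameters:

```python
import itertools
import numpy as np

# Conditioned on Y = y, each bit of X disagrees with y independently
# w.p. eps, so beta(z|y) = eps^d (1-eps)^(k-d), d = dist(z, y_1..k).
# Summing the t = 2^c largest values bounds Pr[Success | Y = y].
k, eps, c = 10, 0.1, 3
t = 2 ** c
y = np.zeros(k, dtype=int)                     # wlog, by symmetry

zs = np.array(list(itertools.product([0, 1], repeat=k)))
d = (zs != y).sum(axis=1)                      # disagreements with y
beta = eps ** d * (1 - eps) ** (k - d)
print("top-t bound:", np.sort(beta)[::-1][:t].sum())
```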
SLIDE 46
The final bound
Claim:
Pr[Success] ≤ (Σ_z E_Y[β(z|Y)^q])^(1/q) · t^(1−1/q).
Using the hypercontractivity inequality with q = 1 + δ (so p = 1 + (1 − 2ε)²δ), we obtain
t ≥ Pr[Success]^((1+δ)/δ) · 2^(4ε(1−ε)k/(1+(1−2ε)²δ)).
Set Pr[Success] = 2^(−γk). Choose δ > 0 optimally for each choice of γ.
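Unwinding the last inequality: with Pr[Success] = 2^(−γk) and B = 4ε(1 − ε), every δ > 0 gives c ≥ B/(1 + (1 − B)δ) − γ(1 + δ)/δ, and optimizing over δ should recover the closed-form curve B(1 − γ) − 2√(B(1 − B)γ). A grid-search check in Python with numpy:

```python
import numpy as np

# Maximize the lower bound over delta and compare with the closed form.
eps, gamma = 0.1, 0.05
B = 4 * eps * (1 - eps)

deltas = np.linspace(1e-4, 50, 200_000)
bound = B / (1 + (1 - B) * deltas) - gamma * (1 + deltas) / deltas
closed = B * (1 - gamma) - 2 * np.sqrt(B * (1 - B) * gamma)
print("grid max   :", bound.max())
print("closed form:", closed)     # the two should agree closely
```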
SLIDE 49
The result
Probability of agreement = 2^(−γk); communication = ck.
With B := 4ε(1 − ε), the curve is c = B(1 − γ) − 2√(B(1 − B)γ).
[Plot: c against γ; the curve meets the c-axis at B and the γ-axis at ε/(1 − ε).]
SLIDE 50
The Binary Erasure Channel
Alice — Input: X ∈ {0,1}^N; Output: f_A(X) ∈ {0,1}^k
Bob — Input: Y ∈ {0, 1, ⋆}^N; Output: f_B(Y) ∈ {0,1}^k
(X, Y) ∼ BEC(ε): Pr[Y_i = ⋆ | X_i = x] = ε; Pr[Y_i = x | X_i = x] = 1 − ε.
Probability of agreement = 2^(−γk); communication = ck. The curve is c = ε(1 − γ) − 2√(ε(1 − ε)γ).
The lower bound is based on a new hypercontractivity inequality due to Nair and Wang (2015).
[Plot: c against γ; the curve meets the c-axis at ε and the γ-axis at (1 − √(1 − ε))/(1 + √(1 − ε)).]
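The same endpoint check works for the BEC curve, again in Python:

```python
import math

# BEC trade-off: c(gamma) = eps (1 - gamma) - 2 sqrt(eps (1-eps) gamma).
# It gives c = eps at gamma = 0 and crosses c = 0 exactly at
# gamma0 = (1 - sqrt(1-eps)) / (1 + sqrt(1-eps)).
eps = 0.3
s = math.sqrt(1 - eps)
gamma0 = (1 - s) / (1 + s)

def c(gamma):
    return eps * (1 - gamma) - 2 * math.sqrt(eps * (1 - eps) * gamma)

print("c(0)      =", c(0.0))      # equals eps
print("c(gamma0) =", c(gamma0))   # ~ 0
```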
SLIDE 53
Open problems
Our lower bounds apply to communication over multiple rounds, if we assume that Alice's output depends on her input alone. We do not know if the same lower bound holds without this assumption.
We examined the two well-studied channels, BSC(ε) and BEC(ε), for agreement probability parametrized as 2^(−γk). We do not know if a general bound applicable to all channels can be stated in terms of some information-theoretic parameter of the channel. For agreement close to 1 (error close to zero), the works of Zhao & Chia (2011) and Anantharam et al. (2013) address this question.
Thank you!