SLIDE 1
Tight bounds for communication-assisted agreement distillation
Jaikumar Radhakrishnan, Tata Institute of Fundamental Research, Mumbai
Joint work with Venkat Guruswami, Carnegie Mellon University
SLIDE 3
Agreement distillation
Alice — Input: X ∈ {0,1}^N; Output: f_A(X) ∈ {0,1}^k
Bob — Input: Y ∈ {0,1}^N; Output: f_B(Y) ∈ {0,1}^k
(X, Y) ∼ BSC(ε): Pr[X_i ≠ Y_i] = ε
Goal: f_A(X) uniformly distributed in {0,1}^k; Pr[f_A(X) = f_B(Y)] close to 1
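Below is a minimal sketch of sampling from this correlated source, assuming Python with numpy; the name sample_bsc_pair and the parameter values are illustrative, not from the talk.

```python
import numpy as np

# Sketch of the BSC(eps) source: X is uniform on {0,1}^N and each bit of
# Y independently disagrees with the corresponding bit of X w.p. eps.
rng = np.random.default_rng(0)

def sample_bsc_pair(N, eps):
    X = rng.integers(0, 2, size=N)                   # Alice's uniform input
    flips = (rng.random(N) < eps).astype(np.int64)   # independent noise
    Y = X ^ flips                                    # Bob's noisy copy
    return X, Y

X, Y = sample_bsc_pair(N=1000, eps=0.1)
print("empirical disagreement rate:", np.mean(X != Y))  # should be ~ eps
```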
SLIDE 4
Naive protocol: no communication
Alice — Input: X ∈ {0,1}^N; Output: f_A(X) = X_1 X_2 . . . X_k
Bob — Input: Y ∈ {0,1}^N; Output: f_B(Y) = Y_1 Y_2 . . . Y_k
Success probability: Pr[f_A(X) = f_B(Y)] = (1 − ε)^k ≈ exp(−εk)
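A quick Monte Carlo check of this success probability, again a Python/numpy sketch with illustrative parameters:

```python
import numpy as np

# Estimate the naive protocol's agreement probability and compare with
# the exact value (1 - eps)^k.
rng = np.random.default_rng(1)
N, k, eps, trials = 100, 10, 0.1, 200_000

X = rng.integers(0, 2, size=(trials, N))
Y = X ^ (rng.random((trials, N)) < eps).astype(np.int64)
agree = np.all(X[:, :k] == Y[:, :k], axis=1)     # f_A(X) == f_B(Y)?
print("empirical:", agree.mean())
print("exact    :", (1 - eps) ** k)              # ~ exp(-eps k) for small eps
```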
SLIDE 6
Can we do better?
Yes, a little better (Bogdanov & Mossel 2011): Alice and Bob can agree with probability at least ≈ 2^(−(ε/(1−ε))k).
But no better (Bogdanov & Mossel 2011): Alice and Bob can agree with probability at most ≈ 2^(−(ε/(1−ε))k).
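Both expressions hold only up to lower-order factors; the following Python sketch just compares the two zero-communication rates at one illustrative setting:

```python
# Zero-communication rates at eps = 0.1, k = 20: the naive protocol
# achieves (1 - eps)^k, while the Bogdanov-Mossel protocol achieves
# roughly 2^(-(eps/(1-eps)) k), which is noticeably larger.
eps, k = 0.1, 20
naive = (1 - eps) ** k
bm = 2 ** (-(eps / (1 - eps)) * k)
print(f"naive ~ {naive:.4f}, Bogdanov-Mossel ~ {bm:.4f}")
```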
SLIDE 9
How much can communication help?
Alice — Input: X ∈ {0,1}^N; Output: f_A(X) ∈ {0,1}^k; sends message M ⟹
Bob — Input: Y ∈ {0,1}^N and M; Output: f_B(Y, M) ∈ {0,1}^k
How many bits must Alice send Bob to ensure agreement with constant probability? What is the trade-off between communication and probability of agreement?
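The one-way model can be pinned down as function signatures; this is a sketch only, and run_protocol and the type names are illustrative:

```python
from typing import Callable, Tuple

Bits = Tuple[int, ...]

def run_protocol(alice: Callable[[Bits], Tuple[Bits, Bits]],
                 bob: Callable[[Bits, Bits], Bits],
                 X: Bits, Y: Bits) -> bool:
    """One-way model: Alice outputs g_A(X) plus a c-bit message M;
    Bob, who never sees X, outputs g_B(Y, M)."""
    out_a, message = alice(X)
    out_b = bob(Y, message)
    return out_a == out_b          # success = exact agreement
```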
SLIDE 14
The trade-off
Definition: C_{BSC(ε)}(k, η) is the minimum number of bits Alice transmits to Bob in a protocol where g_A(X) is uniformly distributed in {0,1}^k and Pr[g_A(X) = g_B(Y, M)] ≥ η.
Probability of agreement = 2^(−γk); communication = ck.
BM '10: If c = 0, then γ = ε/(1 − ε).
This work: If c = 4ε(1 − ε), then γ → 0.
With B := 4ε(1 − ε), the trade-off curve is c = B(1 − γ) − 2√(B(1 − B)γ).
[Plot: c against γ; the curve meets the c-axis at B and the γ-axis at ε/(1 − ε).]
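A small numerical sanity check of this curve and its two endpoints, assuming Python with numpy:

```python
import numpy as np

# Trade-off curve c(gamma) = B(1 - gamma) - 2 sqrt(B(1-B) gamma) with
# B = 4 eps (1-eps).  It should give c = B at gamma = 0 and cross
# c = 0 exactly at gamma = eps/(1-eps), the Bogdanov-Mossel exponent.
eps = 0.1
B = 4 * eps * (1 - eps)

def c(gamma):
    return B * (1 - gamma) - 2 * np.sqrt(B * (1 - B) * gamma)

print("c(0)           =", c(0.0))              # equals B
print("c(eps/(1-eps)) =", c(eps / (1 - eps)))  # ~ 0
```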
SLIDE 15
Related work
Communication complexity: Canonne, Guruswami, Meka, and Sudan (2015) used capacity-achieving codes to ensure agreement with high probability with (h(ε) + o(1))k bits of communication.
Information theory: The case k = 1 is the subject of a recent conjecture of Courtade and Kumar (2014): the function g_A(X) = X_1 maximizes I[g_A(X) : Y]. Chandar and Tchamkerten (2014) showed that the corresponding conjecture is false for large k.
SLIDE 16
The protocol of Bogdanov and Mossel
View {+1, −1}^N as points on an N-dimensional sphere. Pick 2^k well-separated vectors, labelled by {0,1}^k.
Alice: f_A(X) = label of the vector closest to X. Bob: f_B(Y) = label of the vector closest to Y.
Proof idea: The projections along the various directions are Gaussian and approximately independent.
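A toy simulation of this protocol, assuming Python with numpy; random sign vectors stand in for the well-separated vectors (they are nearly orthogonal when N >> k), and all parameters are illustrative:

```python
import numpy as np

# Nearest-codeword protocol: 2^k random sign vectors label the outputs;
# each party outputs the label of the codeword with the largest inner
# product with their own string.
rng = np.random.default_rng(2)
N, k, eps, trials = 400, 4, 0.05, 2000

codewords = rng.choice([-1, 1], size=(2 ** k, N))

def closest(v):
    return int(np.argmax(codewords @ v))

wins = 0
for _ in range(trials):
    X = rng.choice([-1, 1], size=N)
    Y = X * np.where(rng.random(N) < eps, -1, 1)   # BSC noise as sign flips
    wins += closest(X) == closest(Y)
print("empirical agreement:", wins / trials)
```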
SLIDE 22
Alice’s view
The ambient space is {+1, −1}^N. The space is partitioned into disks. When X falls in a disk, Alice reports the label of its center. Each disk has volume ≈ 2^(−k).
SLIDE 23
Bob’s view
The disks are bigger and overlap. Bogdanov and Mossel '10: About 2^(−(ε/(1−ε))k) of the volume is covered by only one disk.
SLIDE 25
Alice’s message
The space is partitioned into disks. The disks are colored using 2^c colors. When X falls in a disk, Alice reports the label of its center. Alice sends Bob the color of the disk (c bits).
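Extending the earlier toy simulation with a random coloring gives a sketch of the color-assisted protocol (again Python with numpy, parameters illustrative):

```python
import numpy as np

# Color-assisted decoding: each disk label gets one of 2^c random
# colors; Alice sends her disk's color, and Bob decodes to the closest
# codeword among those of that color only.
rng = np.random.default_rng(3)
N, k, c, eps, trials = 400, 6, 3, 0.2, 2000

codewords = rng.choice([-1, 1], size=(2 ** k, N))
colors = rng.integers(0, 2 ** c, size=2 ** k)      # random coloring

wins = 0
for _ in range(trials):
    X = rng.choice([-1, 1], size=N)
    Y = X * np.where(rng.random(N) < eps, -1, 1)
    a = int(np.argmax(codewords @ X))              # Alice's disk
    same = np.flatnonzero(colors == colors[a])     # disks of Alice's color
    b = same[np.argmax(codewords[same] @ Y)]       # Bob's pick
    wins += (a == b)
print("agreement with", c, "bits of help:", wins / trials)
```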
SLIDE 28
Bob’s view
Again, the disks are bigger and overlap. But most points are covered by only one disk of a given color.
Bob uniquely identifies the disk (and its center). How many colors must Alice use?
SLIDE 33
The calculation
Suppose Alice chooses (1, 1, . . . , 1): vol(disk) ≈ 2^(−k). Then for a typical x in Alice's disk, bias(x) = t, where exp(−t²) ≈ 2^(−k); bias(y) = (1 − 2ε)t. Then vol(expanded disk) ≈ 2^(−(1−2ε)²k). So a typical point is in 2^k · 2^(−(1−2ε)²k) = 2^(4ε(1−ε)k) disks. If Alice uses ≈ 2^(4ε(1−ε)k) colors, then most points will be covered by at most one disk of any color.
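The only algebra used above is the identity 1 − (1 − 2ε)² = 4ε(1 − ε); a one-line check in Python:

```python
# Verify the exponent identity behind
# 2^k * 2^(-(1-2eps)^2 k) = 2^(4 eps (1-eps) k).
for eps in (0.05, 0.1, 0.25):
    lhs = 1 - (1 - 2 * eps) ** 2
    rhs = 4 * eps * (1 - eps)
    assert abs(lhs - rhs) < 1e-12
    print(f"eps = {eps}: exponent coefficient {rhs}")
```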
SLIDE 41
The lower bound
Theorem: C_{BSC(ε)}(k, 0.5) ≥ 4ε(1 − ε)k − o(k).
Idea: The "expansion of the discs" seen by Bob is inevitable. This is formalized using a hypercontractivity inequality of the following form. For f_Alice : {+1, −1}^N → R, let f_Bob(y) := E[f_Alice(X) | Y = y]. Then, for q > 1 and p = 1 + (1 − 2ε)²(q − 1),
‖f_Bob‖_q ≤ ‖f_Alice‖_p,
where for α, β : {+1, −1}^N → R, ‖α‖_p = E_X[|α(X)|^p]^(1/p) and ‖β‖_q = E_Y[|β(Y)|^q]^(1/q).
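For small N the inequality can be checked by brute force; the following Python/numpy sketch (N, eps, q illustrative) uses the fact that Pr[X = x | Y = y] = ε^d (1 − ε)^(N−d), where d is the Hamming distance between x and y:

```python
import itertools
import numpy as np

# Verify ||f_Bob||_q <= ||f_Alice||_p with p = 1 + (1-2eps)^2 (q-1)
# for a random test function on {+1,-1}^N.
N, eps, q = 6, 0.2, 3.0
p = 1 + (1 - 2 * eps) ** 2 * (q - 1)

pts = np.array(list(itertools.product([-1, 1], repeat=N)))
rng = np.random.default_rng(4)
f_alice = rng.random(len(pts))                 # arbitrary test function

d = (pts[:, None, :] != pts[None, :, :]).sum(axis=2)   # Hamming distances
cond = eps ** d * (1 - eps) ** (N - d)         # cond[y, x] = Pr[X=x | Y=y]
f_bob = cond @ f_alice                         # f_Bob(y) = E[f_Alice(X) | Y=y]

norm_p = (np.abs(f_alice) ** p).mean() ** (1 / p)
norm_q = (np.abs(f_bob) ** q).mean() ** (1 / q)
print(norm_q, "<=", norm_p, ":", norm_q <= norm_p + 1e-12)
```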
SLIDE 42
The analysis
Suppose Bob receives y. Let β(z|y) = Pr[g_A(X) = z | Y = y]. Without a message, the best strategy for Bob is to output the z for which β(z|y) is maximum:
Pr[Success | Y = y] ≤ max_z β(z|y).
Suppose Alice sends c-bit messages; so there are at most t = 2^c possible transcripts. Then
Pr[Success | Y = y] ≤ Σ_{i=1}^{2^c} β(z_i|y),
where z_1, z_2, . . . , z_t ∈ {0,1}^k give the top t values of β(z|y).
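For the naive encoder g_A(X) = X_1 . . . X_k, β(z|y) factorizes bitwise, so the top-t sum can be computed exactly for small k; a Python sketch with illustrative parameters:

```python
import itertools
import numpy as np

# Conditioned on Y = y, each bit of X disagrees with y independently
# w.p. eps, so beta(z|y) = eps^d (1-eps)^(k-d), d = dist(z, y_1..k).
# Summing the t = 2^c largest values bounds Pr[Success | Y = y].
k, eps, c = 10, 0.1, 3
t = 2 ** c
y = np.zeros(k, dtype=int)                     # wlog, by symmetry

zs = np.array(list(itertools.product([0, 1], repeat=k)))
d = (zs != y).sum(axis=1)                      # disagreements with y
beta = eps ** d * (1 - eps) ** (k - d)
print("top-t bound:", np.sort(beta)[::-1][:t].sum())
```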
SLIDE 46
The final bound
Claim:
Pr[Success] ≤ (Σ_z E_Y[β(z|Y)^q])^(1/q) · t^(1−1/q).
Using the hypercontractivity inequality with q = 1 + δ (so p = 1 + (1 − 2ε)²δ), we obtain
t ≥ Pr[Success]^((1+δ)/δ) · 2^(4ε(1−ε)k/(1+(1−2ε)²δ)).
Set Pr[Success] = 2^(−γk). Choose δ > 0 optimally for each choice of γ.
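Unwinding the last inequality: with Pr[Success] = 2^(−γk) and B = 4ε(1 − ε), every δ > 0 gives c ≥ B/(1 + (1 − B)δ) − γ(1 + δ)/δ, and optimizing over δ should recover the closed-form curve B(1 − γ) − 2√(B(1 − B)γ). A grid-search check in Python with numpy:

```python
import numpy as np

# Maximize the lower bound over delta and compare with the closed form.
eps, gamma = 0.1, 0.05
B = 4 * eps * (1 - eps)

deltas = np.linspace(1e-4, 50, 200_000)
bound = B / (1 + (1 - B) * deltas) - gamma * (1 + deltas) / deltas
closed = B * (1 - gamma) - 2 * np.sqrt(B * (1 - B) * gamma)
print("grid max   :", bound.max())
print("closed form:", closed)     # the two should agree closely
```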
SLIDE 49
The result
Probability of agreement = 2^(−γk); communication = ck.
With B := 4ε(1 − ε), the curve is c = B(1 − γ) − 2√(B(1 − B)γ).
[Plot: c against γ; the curve meets the c-axis at B and the γ-axis at ε/(1 − ε).]
SLIDE 50
The Binary Erasure Channel
Alice — Input: X ∈ {0,1}^N; Output: f_A(X) ∈ {0,1}^k
Bob — Input: Y ∈ {0, 1, ⋆}^N; Output: f_B(Y) ∈ {0,1}^k
(X, Y) ∼ BEC(ε): Pr[Y_i = ⋆ | X_i = x] = ε; Pr[Y_i = x | X_i = x] = 1 − ε.
Probability of agreement = 2^(−γk); communication = ck. The curve is c = ε(1 − γ) − 2√(ε(1 − ε)γ).
The lower bound is based on a new hypercontractivity inequality due to Nair and Wang (2015).
[Plot: c against γ; the curve meets the c-axis at ε and the γ-axis at (1 − √(1 − ε))/(1 + √(1 − ε)).]
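The same endpoint check works for the BEC curve, again in Python:

```python
import math

# BEC trade-off: c(gamma) = eps (1 - gamma) - 2 sqrt(eps (1-eps) gamma).
# It gives c = eps at gamma = 0 and crosses c = 0 exactly at
# gamma0 = (1 - sqrt(1-eps)) / (1 + sqrt(1-eps)).
eps = 0.3
s = math.sqrt(1 - eps)
gamma0 = (1 - s) / (1 + s)

def c(gamma):
    return eps * (1 - gamma) - 2 * math.sqrt(eps * (1 - eps) * gamma)

print("c(0)      =", c(0.0))      # equals eps
print("c(gamma0) =", c(gamma0))   # ~ 0
```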
SLIDE 53
Open problems
Our lower bounds apply to communication over multiple rounds, if we assume that Alice's output depends on her input alone. We do not know if the same lower bound holds without this assumption.
We examined the two well-studied channels, BSC(ε) and BEC(ε), for agreement probability parametrized as 2^(−γk). We do not know if a general bound applicable to all channels can be stated in terms of some information-theoretic parameter of the channel. For agreement close to 1 (error close to zero), the works of Zhao & Chia (2011) and Anantharam et al. (2013) address this question.
Thank you!