slide-1
SLIDE 1

alice and bob show distribution testing lower bounds

They don’t talk to each other anymore.

Clément Canonne (Columbia University) July 9, 2017

Joint work with Eric Blais (UWaterloo) and Tom Gur (Weizmann Institute → UC Berkeley)

slide-2
SLIDE 2

“distribution testing?”

slide-3
SLIDE 3

why?

Property testing of probability distributions: sublinear, approximate, randomized algorithms that take random samples.

∙ Big Dataset: too big
∙ Expensive access: pricey data
∙ “Model selection”: many options

Need to infer information – one bit – from the data: fast, or with very few samples.

2


slide-13
SLIDE 13

how?

(Property) Distribution Testing: in an (egg)shell.

4



slide-19
SLIDE 19

how?

∙ Known domain (here [n] = {1, . . . , n})
∙ Property P ⊆ ∆([n])
∙ Independent samples from unknown p ∈ ∆([n])
∙ Distance parameter ε ∈ (0, 1]

Must decide: p ∈ P, or ℓ1(p, P) > ε? (and be correct on any p with probability at least 2/3)

5
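To make the definition concrete, here is a minimal collision-based uniformity tester, a classical example of a distribution tester (not from these slides); the acceptance threshold constant is illustrative rather than the optimized one from the literature.

```python
import random
from collections import Counter

def collision_uniformity_tester(samples, n, eps):
    """Accept iff the number of pairwise collisions among the samples is close
    to what the uniform distribution on [n] would produce.

    Under uniformity each pair of samples collides with probability exactly 1/n;
    a distribution eps-far from uniform in l1 has noticeably higher collision
    probability. The 1 + eps^2/2 threshold factor is an illustrative constant.
    """
    m = len(samples)
    pairs = m * (m - 1) / 2
    collisions = sum(c * (c - 1) / 2 for c in Counter(samples).values())
    return collisions <= (1 + eps**2 / 2) * pairs / n

rng = random.Random(42)
n, m, eps = 50, 500, 0.5
print(collision_uniformity_tester([rng.randrange(n) for _ in range(m)], n, eps))
print(collision_uniformity_tester([0] * m, n, eps))  # point mass: far from uniform
```

With m ≈ √n·poly(1/ε) samples the collision statistic already separates the two cases, which is exactly the √n scaling the deck keeps returning to.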

slide-20
SLIDE 20

and?

Many results on many properties:

∙ Uniformity [GR00, BFR+00, Pan08]
∙ Identity* [BFF+01, VV14]
∙ Equivalence [BFR+00, Val11, CDVV14]
∙ Independence [BFF+01, LRR13]
∙ Monotonicity [BKR04]
∙ Poisson Binomial Distributions [AD14]
∙ Generic approaches for classes [CDGR15, ADK15]
∙ and more…

6


slide-29
SLIDE 29

but?

Lower bounds… are quite tricky. We want more methods: generic if possible, applying to many problems at once.

7


slide-34
SLIDE 34

“communication complexity?”

slide-35
SLIDE 35

what now?

f(x, y)

9


slide-40
SLIDE 40

what now?

But communicating is hard.

10

slide-41
SLIDE 41

was that a toilet?

∙ f known by all parties
∙ Alice gets x, Bob gets y
∙ Private randomness

Goal: minimize communication (worst case over x, y, randomness) to compute f(x, y).

11

slide-42
SLIDE 42

also…

…in our setting, Alice and Bob do not get to communicate.

∙ f known by all parties
∙ Alice gets x, Bob gets y
∙ Both send one-way messages to a referee
∙ Private randomness

SMP: Simultaneous Message Passing model.

12
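The SMP model is easy to pin down in code. Below is a minimal sketch of the interface (the function names are my own, not from the talk): each player maps its input and private coins to a message, and the referee decides from the two messages alone. The trivial protocol shown just forwards the inputs; the whole game is how much shorter the messages can be.

```python
import random

def run_smp(alice, bob, referee, x, y):
    """One execution of a private-coin SMP protocol: Alice and Bob each map
    (input, private coins) to a message; the referee sees only the messages."""
    msg_a = alice(x, random.Random())  # Alice's private coins
    msg_b = bob(y, random.Random())    # Bob's private coins
    return referee(msg_a, msg_b)

# Trivial protocol for Equality: each player forwards its whole n-bit input.
# Nontrivial protocols compress this; with private coins, Theta(sqrt(n))-bit
# messages suffice and are necessary [NS96].
alice = lambda x, coins: x
bob = lambda y, coins: y
referee = lambda ma, mb: ma == mb

print(run_smp(alice, bob, referee, "0110", "0110"))  # True
print(run_smp(alice, bob, referee, "0110", "0111"))  # False
```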


slide-45
SLIDE 45

referee model (smp).

13

slide-46
SLIDE 46

referee model (smp).

Upshot: SMP(Eqn) = Ω(√n). (Only O(log n) with one-way communication!)

14

slide-47
SLIDE 47

well, sure, but why?

slide-48
SLIDE 48

testing lower bounds via communication complexity

∙ Introduced by Blais, Brody, and Matulef [BBM12] for Boolean functions
∙ Elegant reductions, generic framework
∙ Carry over very strong communication complexity lower bounds

Can we… have the same for distribution testing?

16


slide-50
SLIDE 50

distribution testing via comm. compl.

slide-51
SLIDE 51

the title should make sense now.

18

slide-52
SLIDE 52

rest of the talk

  • 1. The general methodology.
  • 2. Application: testing uniformity, and the struggle for Equality.
  • 3. Testing identity, an unexpected connection:
    ∙ The [VV14] result and the 2/3-pseudonorm
    ∙ Our reduction, p-weighted codes, and the K-functional
    ∙ Wait, what is this thing?
  • 4. Conclusion

19

slide-53
SLIDE 53

the methodology

Theorem
Let ε > 0, and let Ω be a domain of size n. Fix a property Π ⊆ ∆(Ω) and f : {0, 1}k × {0, 1}k → {0, 1}. Suppose there exists a mapping p : {0, 1}k × {0, 1}k → ∆(Ω) that satisfies the following conditions.

  • 1. Decomposability: for all x, y ∈ {0, 1}k, there exist α = α(x), β = β(y) ∈ [0, 1] and pA(x), pB(y) ∈ ∆(Ω) such that p(x, y) = (α/(α + β)) · pA(x) + (β/(α + β)) · pB(y), and α, β can each be encoded with O(log n) bits.
  • 2. Completeness: for every (x, y) ∈ f−1(1), it holds that p(x, y) ∈ Π.
  • 3. Soundness: for every (x, y) ∈ f−1(0), it holds that p(x, y) is ε-far from Π in ℓ1 distance.

Then, every ε-tester for Π needs Ω(SMP(f)/log n) samples.

20
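Decomposability is what makes the reduction tick: knowing only α, β and having sample access to pA(x) and pB(y), one can simulate samples from the mixture p(x, y). A minimal sketch of that resampling step (all concrete distributions below are illustrative, not from the slides):

```python
import random
from collections import Counter

def sample_mixture(alpha, beta, draw_a, draw_b, rng):
    """Draw one sample from p = (alpha/(alpha+beta)) * pA + (beta/(alpha+beta)) * pB,
    given only the weights alpha, beta and oracles draw_a, draw_b for pA, pB."""
    if rng.random() < alpha / (alpha + beta):
        return draw_a()
    return draw_b()

rng = random.Random(0)
# Illustrative components: pA uniform on {0, 1}, pB uniform on {2, 3}, alpha = beta.
draw_a = lambda: rng.randrange(0, 2)
draw_b = lambda: rng.randrange(2, 4)
counts = Counter(sample_mixture(1.0, 1.0, draw_a, draw_b, rng) for _ in range(40000))
print({k: round(v / 40000, 2) for k, v in sorted(counts.items())})
# Each of 0, 1, 2, 3 should carry roughly a quarter of the empirical mass.
```

The O(log n)-bit encodability of α and β is what lets the players ship these weights to the referee inside their short SMP messages.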

slide-54
SLIDE 54

application: testing uniformity

Take the “equality” predicate Eqk as f:

Theorem (Newman and Szegedy [NS96])
For every k ∈ N, SMP(Eqk) = Ω(√k).

Goal: Will (re)prove an Ω̃(√n) lower bound on testing uniformity.

21


slide-56
SLIDE 56

application: testing uniformity

22

slide-57
SLIDE 57

testing identity

Statement
Explicit description of p ∈ ∆([n]), parameter ε ∈ (0, 1]. Given samples from (unknown) q ∈ ∆([n]), decide p = q vs. ℓ1(p, q) > ε.

Theorem
Identity testing requires Ω(√n/ε²) samples (and we just proved Ω̃(√n)). Actually…

Theorem ([VV14])
Identity testing requires Ω(∥p−max−ε∥2/3 / ε²) samples (and this is “tight”).

23


slide-60
SLIDE 60

application: testing identity

An issue: how to interpret this ∥p−max−ε∥2/3?

Goal: Will prove an(other) Ω̃(Φ(p, ε)) lower bound on testing identity, via communication complexity. (And it will be “tight” as well.)

24


slide-63
SLIDE 63

application: testing identity

∙ p-weighted codes: distp(x, y) := ∑i∈[n] p(i) · |xi − yi| (for x, y ∈ {0, 1}n). A p-weighted code has a distance guarantee w.r.t. this p-distance: distp(C(x), C(y)) > γ for all distinct x, y ∈ {0, 1}k.
∙ Volume of the p-ball: VolFn2,distp(ε) := |{ w ∈ Fn2 : distp(w, 0n) ≤ ε }|.

25
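Both definitions are easy to spell out by brute force for tiny n (purely illustrative):

```python
from itertools import product

def dist_p(p, x, y):
    """p-weighted Hamming distance: total p-mass of the coordinates where x and y differ."""
    return sum(pi for pi, xi, yi in zip(p, x, y) if xi != yi)

def ball_volume(p, gamma):
    """|{ w in F_2^n : dist_p(w, 0^n) <= gamma }|, by enumerating all 2^n words."""
    n = len(p)
    zero = (0,) * n
    return sum(1 for w in product((0, 1), repeat=n) if dist_p(p, w, zero) <= gamma)

p = [0.4, 0.3, 0.2, 0.1]     # a distribution on [4]
print(ball_volume(p, 0.25))  # words whose flipped coordinates carry <= 0.25 mass → 3
```

For the usual Hamming distance (uniform p), this ball volume is the familiar sum of binomial coefficients; weighting by p is what adapts the code to the target distribution.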

slide-64
SLIDE 64

application: testing identity

Lemma (Balanced p-weighted codes exist)
Fix p ∈ ∆([n]) and ε. There exists a p-weighted (nearly) balanced code C : {0, 1}k → {0, 1}n with relative distance ε such that k = Ω(n − log VolFn2,distp(ε)).

(Sphere packing bound: must have k ≤ n − log VolFn2,distp(ε/2).)

Recall
Our reduction will give a lower bound of Ω(√k / log n): so we need to analyze VolFn2,distp(ε).

26


slide-67
SLIDE 67

enter concentration inequalities

VolFn2,distp(γ) = |{ w ∈ Fn2 : ∑i∈[n] pi wi ≤ γ }|
= 2^n · PrY∼{0,1}n[ ∑i∈[n] pi Yi ≤ γ ]
= 2^n · PrX∼{−1,1}n[ ∑i∈[n] pi Xi ≥ 1 − 2γ ]

Concentration inequalities for weighted sums of Rademacher r.v.’s?

27
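The last equality is just the substitution Xi = 1 − 2Yi. A brute-force sanity check on a small example, using integer weights (an unnormalized p) so the arithmetic is exact:

```python
from itertools import product

def ball_count(p, gamma):
    """|{ w in {0,1}^n : sum_i p_i * w_i <= gamma }|, by enumeration."""
    return sum(1 for w in product((0, 1), repeat=len(p))
               if sum(pi * wi for pi, wi in zip(p, w)) <= gamma)

def tail_count(p, s):
    """|{ x in {-1,1}^n : sum_i p_i * x_i >= s }|, by enumeration."""
    return sum(1 for x in product((-1, 1), repeat=len(p))
               if sum(pi * xi for pi, xi in zip(p, x)) >= s)

# With integer weights of total W = sum(p), substituting X_i = 1 - 2*Y_i gives
# sum_i p_i X_i = W - 2 * sum_i p_i Y_i, so the two sets below are in bijection;
# for a distribution (W = 1) the threshold W - 2*gamma is the slide's 1 - 2*gamma.
p, gamma = [4, 3, 2, 1], 3
print(ball_count(p, gamma), tail_count(p, sum(p) - 2 * gamma))  # 5 5
```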


slide-69
SLIDE 69

the k-functional [Pee68]

Definition (K-functional)
Fix any two Banach spaces (X0, ∥·∥0), (X1, ∥·∥1). The K-functional between X0 and X1 is the function KX0,X1 : (X0 + X1) × (0, ∞) → [0, ∞) defined by

KX0,X1(x, t) := inf{ ∥x0∥0 + t∥x1∥1 : (x0, x1) ∈ X0 × X1, x0 + x1 = x }.

For a ∈ ℓ1 + ℓ2 = ℓ2, we write κa for the function t ↦ Kℓ1,ℓ2(a, t).

28
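For the (ℓ1, ℓ2) pair, κa(t) has a classical closed-form approximation due to Holmstedt: up to universal constant factors, it is the ℓ1 norm of the ⌊t²⌋ largest entries of a plus t times the ℓ2 norm of the remaining tail. A sketch of that approximation, for intuition only (constants not tracked):

```python
import math

def kappa_approx(a, t):
    """Holmstedt-style approximation of kappa_a(t) = K(a, t; l1, l2):
    l1 norm of the floor(t^2) largest |a_i|, plus t times the l2 norm of
    the rest. Accurate up to universal constant factors only."""
    s = sorted((abs(x) for x in a), reverse=True)
    k = int(t * t)
    head = sum(s[:k])
    tail = math.sqrt(sum(x * x for x in s[k:]))
    return head + t * tail

a = [0.5, 0.25, 0.125, 0.125]
for t in (1.0, 1.5, 2.0):
    print(t, round(kappa_approx(a, t), 4))
# Once t^2 >= len(a) the tail is empty, and the value is exactly ||a||_1.
```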

slide-70
SLIDE 70

the connection

Theorem ([MS90])
Let (Xi)i≥1 be a sequence of independent Rademacher random variables, i.e. uniform on {−1, 1}. Then, for any a ∈ ℓ2 and t > 0,

Pr[ ∑∞i=1 ai Xi ≥ κa(t) ] ≤ e−t²/2, (1)

and

Pr[ ∑∞i=1 ai Xi ≥ (1/2) · κa(t) ] ≥ e−(2 ln 24)t². (2)

29
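Inequality (1) has the same flavor as the classical Hoeffding bound Pr[∑ ai Xi ≥ s] ≤ exp(−s²/(2∥a∥2²)), which is easy to sanity-check by simulation. The check below uses Hoeffding as the yardstick, not the sharper [MS90] statement:

```python
import math
import random

def rademacher_tail(a, s, trials, rng):
    """Empirical Pr[sum_i a_i X_i >= s] over independent Rademacher X_i."""
    hits = 0
    for _ in range(trials):
        if sum(ai * rng.choice((-1, 1)) for ai in a) >= s:
            hits += 1
    return hits / trials

rng = random.Random(7)
a = [1 / math.sqrt(50)] * 50  # chosen so that ||a||_2 = 1
for s in (1.0, 1.5, 2.0):
    empirical = rademacher_tail(a, s, 20000, rng)
    bound = math.exp(-s * s / 2)  # Hoeffding with ||a||_2 = 1
    print(f"s={s}: empirical {empirical:.4f}, Hoeffding bound {bound:.4f}")
```

The point of [MS90] is that κa(t) gives matching upper and lower tail estimates, which is exactly what the volume computation on the previous slide needs.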

slide-71
SLIDE 71

putting it together

Theorem ([BCG16])
Identity testing to p ∈ ∆([n]) requires Ω(tε / log n) samples, where tε := κp−1(1 − 2ε).

But… is it tight?

30


slide-73
SLIDE 73

now that communication complexity paved the way…

Theorem ([BCG16])
Identity testing to p ∈ ∆([n]) can be done with O(tε/18 / ε²) samples, and requires Ω(tε / ε) of them, where tδ := κp−1(1 − 2δ).

Upper bound established by a new connection between the K-functional and “effective support size.”

31


slide-75
SLIDE 75

k-functional, q-norm, and sparsity

Theorem ([Ast10, MS90])
For arbitrary a ∈ ℓ2 and t ∈ N, define the norm

∥a∥Q(t) := sup{ ∑tj=1 (∑i∈Aj ai²)^{1/2} : (Aj)1≤j≤t partition of N }.

Then, for any a ∈ ℓ2 and t > 0 such that t² ∈ N, we have

∥a∥Q(t²) ≤ κa(t) ≤ √2 · ∥a∥Q(t²). (3)

Lemma ([BCG16])
For any a ∈ ℓ2 and t such that t² ∈ N, we have

∥a∥Q(t²) ≤ κa(t) ≤ ∥a∥Q(2t²). (4)

32
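For small vectors the Q(t)-norm can be computed by brute force over all assignments of coordinates to at most t groups, which makes (3) and (4) easy to poke at numerically (illustrative only; the cost is t^n):

```python
import math
from itertools import product

def q_norm(a, t):
    """||a||_Q(t): maximum, over assignments of the coordinates to at most t
    groups, of the sum of the groups' l2 norms. Brute force: t**len(a) assignments."""
    best = 0.0
    for assignment in product(range(t), repeat=len(a)):
        squares = [0.0] * t
        for i, g in enumerate(assignment):
            squares[g] += a[i] ** 2
        best = max(best, sum(math.sqrt(s) for s in squares))
    return best

a = [0.4, 0.3, 0.2, 0.1]
print(q_norm(a, 1))  # one group: the plain l2 norm of a
print(q_norm(a, 4))  # t >= len(a): each coordinate alone is optimal, giving the l1 norm
```

The two endpoint cases are the whole story in miniature: Q(1) is ℓ2, Q(n) is ℓ1, and κa(t) interpolates between them as t grows.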

slide-76
SLIDE 76

k-functional, q-norm, and sparsity

Lemma (Sparsity Lemma)
If ∥p∥Q(T) ≥ 1 − 2ε, then there is a subset S of T elements such that p(S) ≥ 1 − 6ε. Directly implies the upper bound, using T := 2tO(ε)².

Proof idea. By monotonicity, ∑Tj=1 (∑i∈Aj pi²)^{1/2} ≤ ∑Tj=1 ∑i∈Aj pi = ∥p∥1 = 1. So we have

1 − 2ε ≤ ∑Tj=1 (∑i∈Aj pi²)^{1/2} ≤ 1,

which (morally) implies that p is “close to a singleton” on each Aj.

33


slide-79
SLIDE 79

conclusion

∙ New framework to prove distribution testing lower bounds: reduction from communication complexity
∙ Clean and simple
∙ Leads to new insights: “instance-optimal” identity testing, revisited
∙ Unexpected connection to interpolation theory
∙ Codes are great!

34


slide-84
SLIDE 84

Thank you

35

slide-85
SLIDE 85

Jayadev Acharya and Constantinos Daskalakis. Testing Poisson Binomial Distributions. In Proceedings of SODA, pages 1829–1840, 2014.

Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. Optimal testing for properties of distributions. arXiv, abs/1507.05952, July 2015.

Sergey V. Astashkin. Rademacher functions in symmetric spaces. Journal of Mathematical Sciences, 169(6):725–886, 2010.

Eric Blais, Joshua Brody, and Kevin Matulef. Property testing lower bounds via communication complexity. Computational Complexity, 21(2):311–358, 2012.

Eric Blais, Clément L. Canonne, and Tom Gur. Alice and Bob Show Distribution Testing Lower Bounds (They don’t talk to each other anymore.). Electronic Colloquium on Computational Complexity (ECCC), 23:168, 2016.

Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In Proceedings of FOCS, pages 442–451, 2001.

Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In Proceedings of FOCS, pages 189–197, 2000.

Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of STOC, pages 381–390, 2004.

Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing Shape Restrictions of Discrete Distributions. arXiv, abs/1507.03558, July 2015.

Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of SODA, pages 1193–1203, 2014.

Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Electronic Colloquium on Computational Complexity (ECCC), 7:20, 2000.

Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9:295–347, 2013.

Stephen J. Montgomery-Smith. The distribution of Rademacher sums. Proceedings of the American Mathematical Society, 109(2):517–522, 1990.

Ilan Newman and Mario Szegedy. Public vs. private coin flips in one round communication games. In Proceedings of STOC, pages 561–570, 1996.

Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750–4755, 2008.

Jaak Peetre. A theory of interpolation of normed spaces. Notas de Matemática, No. 39. Instituto de Matemática Pura e Aplicada, Conselho Nacional de Pesquisas, Rio de Janeiro, 1968.

Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40(6):1927–1968, 2011.

Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. In Proceedings of FOCS, 2014.