Communication and Memory Efficient Testing of Discrete Distributions - PowerPoint PPT Presentation



SLIDE 1

Communication and Memory Efficient Testing of Discrete Distributions Themis Gouleakis

USC→MPI

July 21, 2019 Joint work with: Ilias Diakonikolas (USC), Daniel Kane (UCSD) and Sankeerth Rao (UCSD)

SLIDE 2

MOTIVATION

◮ Datasets are growing → too many samples needed!
◮ Can we do property testing in a distributed way?
◮ Insufficient memory!
◮ Design low-memory algorithms!

2 / 15

SLIDE 3

Is the lottery fair?

◮ We can learn the distribution: Ω(n) samples.
◮ Centralized sampling / unbounded memory: we can test (uniform vs. ε-far) with Θ(√n/ε²) samples.
◮ What if we have memory constraints, or centralized sampling is unavailable?


SLIDE 4

DEFINITION AND (CENTRALIZED) PRIOR WORK


Uniformity testing problem

Given samples from a probability distribution p over [n], distinguish p = U_n from ‖p − U_n‖₁ > ε with success probability at least 2/3.

◮ Sample complexity: Θ(√n/ε²) [Goldreich, Ron 00], [Batu, Fischer, Fortnow, Kumar, Rubinfeld, White 01], [Paninski 08], [Chan, Diakonikolas, Valiant, Valiant 14], [Diakonikolas, G., Peebles, Price 17]


SLIDE 5

PRIOR/RELATED WORK

Distributed learning

◮ Parameter estimation: [ZDJW13], [GMN14], [BGMNW16], [JLY16], [HOW18]
◮ Non-parametric: [DGLNOS17], [HMOW18]

Distributed testing

◮ Single sample per machine with sublogarithmic-size messages: [Acharya, Canonne, Tyagi 18]
◮ Two-party setting: [Andoni, Malkin, Nosatzki 18]
◮ LOCAL and CONGEST models: [Fischer, Meir, Oshman 18]


SLIDE 6

CENTRALIZED COLLISION-BASED ALGORITHM

[GOLDREICH, RON 00], [BATU, FISCHER, FORTNOW, KUMAR, RUBINFELD, WHITE 01]

Problem: Given distribution p over [n], distinguish p = U_n from ‖p − U_n‖₁ ≥ ε.

◮ m samples.
◮ Node labels: i.i.d. samples from p.
◮ Edges: {i, j} ∈ E iff L(i) = L(j)
◮ Define statistic Z = #edges ⇒ E[Z] = (m choose 2) · ‖p‖₂²
◮ Minimized for p = U_n
◮ Idea: Draw enough samples and compare Z to some threshold.
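To make the statistic concrete, here is a minimal Python sketch of a collision-based tester; the threshold constant and the sample size chosen below are illustrative assumptions, not the tuned values from these papers.

```python
import random
from collections import Counter

def collision_tester(samples, n, eps):
    """Collision-based uniformity tester (sketch).

    Z = number of colliding pairs among the m samples, so
    E[Z] = C(m, 2) * ||p||_2^2, which is minimized (= C(m, 2)/n) at p = U_n.
    The acceptance threshold below is illustrative, not the tuned constant.
    """
    m = len(samples)
    counts = Counter(samples)
    # Z = sum over labels of C(count, 2) colliding pairs
    z = sum(c * (c - 1) // 2 for c in counts.values())
    threshold = (1 + eps**2 / 2) * m * (m - 1) / (2 * n)
    return z <= threshold  # True -> "looks uniform"

random.seed(0)
n, eps, m = 1000, 0.5, 4000   # m = O(sqrt(n)/eps^2) suffices in theory
uniform = [random.randrange(n) for _ in range(m)]
biased = [random.randrange(n // 4) for _ in range(m)]  # L1-distance 1.5 >= eps
print(collision_tester(uniform, n, eps))  # True
print(collision_tester(biased, n, eps))   # False
```

The key point the code makes visible: Z is a single integer computed from label multiplicities, so no per-sample bookkeeping beyond a histogram is needed.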



SLIDE 8

GENERIC BIPARTITE TESTING ALGORITHM

ℓ SAMPLES PER MACHINE

Problem: Given distribution p over [n], distinguish p = U_n from ‖p − U_n‖₁ ≥ ε.

◮ ℓ samples per machine; the samples are split into two sides S₁ and S₂.
◮ Node labels: i.i.d. samples from p.
◮ Edges: {i, j} ∈ E iff (i ∈ S₁) ∧ (j ∈ S₂) ∧ (L(i) = L(j))
◮ Define statistic Z = #edges ⇒ E[Z] = |S₁| · |S₂| · ‖p‖₂²
◮ Minimized for p = U_n
◮ Remark: Suboptimal sample complexity, but can lead to optimal communication complexity in certain cases.
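A short Python sketch of the cross-collision statistic (the sizes used are illustrative):

```python
import random
from collections import Counter

def cross_collisions(s1, s2):
    """Z = #{(i, j) : i in S1, j in S2, L(i) = L(j)} (cross collisions only).

    E[Z] = |S1| * |S2| * ||p||_2^2, minimized (= |S1||S2|/n) at p = U_n.
    """
    c1, c2 = Counter(s1), Counter(s2)
    return sum(c1[label] * c2[label] for label in c1)

random.seed(1)
n = 500
s1 = [random.randrange(n) for _ in range(2000)]
s2 = [random.randrange(n) for _ in range(2000)]
# For uniform p, E[Z] = 2000 * 2000 / 500 = 8000
print(cross_collisions(s1, s2))
```

Compared with counting all pairs, only pairs straddling the bipartition are counted, which is what makes the distributed implementation on the next slides cheap.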


SLIDE 9

COMMUNICATION MODEL

◮ Unbounded number of players.
◮ Players can broadcast on the blackboard.
◮ The referee asks questions to players and receives replies.
◮ Goal: Minimize the total number of bits of communication.


SLIDE 10

A COMMUNICATION EFFICIENT ALGORITHM

◮ Idea: Statistic Z = sum of degrees on one side.
◮ Only the opposite side needs to reveal its samples exactly.
◮ Broadcast samples: ℓ · |S₁| = √(n/ℓ)/(ε² √log n)
◮ Not enough for testing on their own.
◮ And the samples on the right?
◮ Only the degrees d_k are sent to the referee.
◮ O(1) bits/message w.l.o.g.
◮ Communication complexity: O(√(n/ℓ) · √(log n)/ε²) bits.
◮ Matching lower bound of Ω(√(n/ℓ) · √(log n)/ε²) bits for small ℓ.
◮ Better than the naive O(√n · log n/ε²) bits.
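A toy simulation of this communication pattern (machine counts and sizes are illustrative assumptions, not the parameters from the analysis):

```python
import random
from collections import Counter

def referee_statistic(left_machines, right_machines):
    """Simulate the communication pattern (sketch, not the authors' pseudocode).

    Machines on the left side S1 broadcast their samples exactly (log n bits
    each); each machine k on the right side S2 sends only its degree
    d_k = number of collisions between its own samples and the broadcast
    multiset. The referee outputs Z = sum_k d_k, the sum of degrees.
    """
    broadcast = Counter(s for machine in left_machines for s in machine)
    degrees = [sum(broadcast[s] for s in machine) for machine in right_machines]
    return sum(degrees)

random.seed(2)
n, ell = 200, 50
left = [[random.randrange(n) for _ in range(ell)] for _ in range(8)]
right = [[random.randrange(n) for _ in range(ell)] for _ in range(40)]
# Uniform p: E[Z] = (8*50) * (40*50) / 200 = 4000 cross collisions
print(referee_statistic(left, right))
```

Note that the sum of right-side degrees equals the cross-collision statistic Z exactly, while the right side communicates only one small number per machine.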


SLIDE 11

COMMUNICATION EFFICIENT IMPLEMENTATION

TWO ALGORITHMS

Case I: ℓ = Õ(n^{1/3}/ε^{4/3}) samples/machine

◮ Use cross collisions (bipartite graph).
◮ Communication complexity: O(√(n/ℓ) · √(log n)/ε²) bits.

Case II: ℓ = Ω̃(n^{1/3}/ε^{4/3}) samples/machine

◮ Each machine sends its number of local collisions to the referee.
◮ The referee computes the total sum Z of the collisions.
◮ E[Z] = (ℓ choose 2) · ‖p‖₂² collisions expected per machine.
◮ Threshold: (1 + ε²) · E[Z] (expectation under the uniform distribution).
◮ Communication complexity: O(n log n/(ℓ² ε⁴)) bits.
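A Python sketch of Case II (the threshold constant and parameters are illustrative assumptions):

```python
import random
from collections import Counter

def local_collisions(samples):
    """Number of colliding pairs among one machine's own samples."""
    counts = Counter(samples)
    return sum(c * (c - 1) // 2 for c in counts.values())

def case_two_test(machines, n, eps):
    """Case II sketch: each machine sends only its local collision count
    (an O(log)-bit number); the referee sums them into Z and compares
    against a threshold near (1 + eps^2) * E_uniform[Z], where
    E_uniform[Z] = #machines * C(ell, 2) / n.
    """
    z = sum(local_collisions(m) for m in machines)
    num, ell = len(machines), len(machines[0])
    threshold = (1 + eps**2) * num * ell * (ell - 1) / (2 * n)
    return z <= threshold  # True -> "looks uniform"

random.seed(3)
n, ell, num, eps = 400, 100, 200, 0.8
uniform = [[random.randrange(n) for _ in range(ell)] for _ in range(num)]
biased = [[random.randrange(n // 4) for _ in range(ell)] for _ in range(num)]
print(case_two_test(uniform, n, eps))  # True
print(case_two_test(biased, n, eps))   # False
```

Here no raw samples cross the network at all: once ℓ is large, each machine has enough internal collisions for its local count to carry signal.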


SLIDE 12

MEMORY EFFICIENT IMPLEMENTATION

IN THE ONE-PASS STREAMING MODEL

Model:

One-pass streaming algorithm: The samples arrive in a stream and the algorithm can access them only once.

Memory constraint: At most m bits, for some m ≥ log n/ε⁶.

◮ Use N₁ = m/(2 log n) samples to get the multiset of labels S₁.
◮ Use collision information from N₂ = Θ(n log n/(m ε⁴)) further samples (i.e., the multiset of labels S₂).

Remarks:
◮ We can store the partial sums Σ_{k=1}^{r} d_k, 1 ≤ r ≤ N₂, in a single pass.
◮ For m = Ω(√n log n/ε²), we simply run the classical collision-based tester using the first O(√n/ε²) samples.
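The one-pass pattern can be sketched as follows (phase sizes are illustrative; a real implementation would also compare the final sum against a threshold):

```python
import random
from collections import Counter

def one_pass_degree_sum(stream, n1):
    """One-pass streaming sketch.

    Phase 1: store the first n1 labels (the multiset S1), costing about
    n1 * log n bits. Phase 2: for each later sample, add its degree
    against S1 to a running sum -- only O(log)-many extra bits of state,
    so the partial sums sum_{k<=r} d_k are available in a single pass.
    """
    s1 = Counter()
    z = n2 = 0
    for t, x in enumerate(stream):
        if t < n1:
            s1[x] += 1      # phase 1: memorize S1
        else:
            z += s1[x]      # phase 2: keep only the running degree sum
            n2 += 1
    return z, n2

random.seed(4)
n, n1 = 300, 150
stream = (random.randrange(n) for _ in range(1000))  # each sample seen once
z, n2 = one_pass_degree_sum(stream, n1)
# Uniform p: E[z] = n1 * n2 / n = 150 * 850 / 300 = 425
print(z, n2)
```

The stream is a generator, so each sample really is accessed exactly once, matching the one-pass model.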


SLIDE 13

SUMMARY OF RESULTS

Sample Complexity Bounds with Memory Constraints

Uniformity:
◮ Upper bound: O(n log n/(m ε⁴)), for n^{0.9} ≫ m ≫ log(n)/ε²
◮ Lower bound 1: Ω(n log n/(m ε⁴)), for m = Ω̃(n^{0.34}/ε^{8/3} + n^{0.1}/ε⁴)
◮ Lower bound 2: Ω(n/(m ε²)), unconditional

Closeness:
◮ Upper bound: O(n √(log n)/(√m ε²)), for Θ̃(min(n, n^{2/3}/ε^{4/3})) ≫ m ≫ log(n)

Communication Complexity Bounds

Uniformity:
◮ Upper bound 1: O(√(n log(n)/ℓ)/ε²), for ℓ = Õ(n^{1/3}/ε^{4/3})
◮ Upper bound 2: O(n log(n)/(ℓ² ε⁴)), for ℓ = Ω̃(n^{1/3}/ε^{4/3})
◮ Lower bound 1: Ω(√(n log(n)/ℓ)/ε²), for ε⁸ n log n ≫ ℓ ≫ ε⁻⁴ n^{0.9}
◮ Lower bound 2: Ω(√(n/ℓ)/ε), for ℓ ≪ ε^{4/3} n^{0.3}
◮ Lower bound 3: Ω(n/(ℓ² ε² log n)), for ℓ ≪ √n/ε²

Closeness:
◮ Upper bound: O(n^{2/3} log^{1/3}(n)/(ℓ^{2/3} ε^{4/3})), for n ε⁴/log(n) ≫ ℓ
SLIDE 14

LOWER BOUNDS (ONE PASS)

k SAMPLES, m BITS OF MEMORY, ℓ SAMPLES PER MACHINE

1. Memory:

◮ k · m = Ω(n/ε²)
◮ Under technical assumptions: k · m = Ω(n log n/ε⁴)

Reduction (low communication ⇒ low memory):
◮ samples/machine: ℓ
◮ bits of communication: t
◮ Store the samples of the next player only ⇒ (t + ℓ log n)-bit memory.

2. Communication, ℓ = O(n^{1/3}/(ε^{4/3} (log n)^{1/3})), one pass:

◮ Ω(√(n/ℓ)/ε) samples.
◮ Under assumptions: Ω(√(n log n/ℓ)/ε²)

3. Communication, ℓ = Ω(n^{1/3}/(ε^{4/3} (log n)^{1/3})), one pass:

◮ Ω(n/(ℓ² ε² log n)) samples.


SLIDE 15

SUMMARY-OPEN PROBLEMS

◮ We described a bipartite collision-based algorithm for uniformity.

◮ Then applied it to memory constrained and distributed settings.

◮ Showed matching lower bounds for certain parameter regimes.

◮ An asymptotically optimal algorithm becomes (provably) suboptimal as ℓ grows.

Open Problems:
◮ Do the lower bounds still hold if multiple passes are allowed?
◮ Is there an algorithm with a better communication-sample complexity trade-off?
