

SLIDE 1

Communication Complexity of Learning Discrete Distributions

Krzysztof Onak

IBM T.J. Watson Research Center

Joint work with Ilias Diakonikolas, Elena Grigorescu, and Abhiram Natarajan.

Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 1 / 20

SLIDES 2-4

Distribution Learning and Testing

Input: Stream of independent samples from an unknown distribution D

x1, x2, x3, x4, . . .

Goal: Learn the distribution,

  • or test a property
  • or estimate a parameter

  • Small total variation distance error is acceptable
  • Traditional focus: sample complexity

SLIDES 5-6

Learning Discrete Distributions

D = probability distribution on {1, . . . , n}

Input: Independent samples from D: x1, x2, x3, x4, . . .

Goal: Output a distribution D′ such that ‖D − D′‖₁ < ε

Sample complexity: Θ(n/ε²)
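As a concrete illustration of the Θ(n/ε²) sample bound on this slide, here is a minimal Python sketch (my own, not part of the talk) that learns a distribution by taking the empirical distribution of O(n/ε²) samples and checks the ℓ₁ error:

```python
import random
from collections import Counter

def learn_empirical(samples, n):
    """Empirical distribution of the samples over {0, ..., n-1}."""
    counts = Counter(samples)
    return [counts[i] / len(samples) for i in range(n)]

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

random.seed(0)
n, eps = 100, 0.25
m = int(10 * n / eps**2)   # O(n / eps^2) samples
D = [1.0 / n] * n          # the unknown distribution (uniform here, for the demo)
samples = random.choices(range(n), weights=D, k=m)
D_prime = learn_empirical(samples, n)
print(l1_distance(D, D_prime) < eps)  # expect True: empirical error well below eps
```

With m = Θ(n/ε²) samples the ℓ₁ error of the empirical distribution concentrates well below ε, matching the bound on the slide.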

SLIDES 7-8

Communication Complexity

Distributed data: samples held by different players. Example: samples in different data centers.

How much do the players have to communicate to solve the problem? Is sublinear communication possible?

SLIDES 9-10

“Survey” Complexity

This talk will focus on the simplest setting:

  • Each player has one sample and sends a single message to a referee
  • The referee outputs the solution

[Figure: Players 1, 2, 3, . . . , p each hold one sample and send a single message to the Referee, who produces the output]

  • Each sample is Θ(log n) bits
  • Can the average communication be made o(log n)?
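The trivial protocol in this model simply has every player transmit its sample verbatim, at ⌈log₂ n⌉ bits per player; the question on the slide is whether this can be beaten. A small Python sketch of that baseline (function names are illustrative, not from the talk):

```python
import math
import random
from collections import Counter

def player_message(sample, n):
    """Trivial protocol: each player sends its sample in ceil(log2 n) bits."""
    width = math.ceil(math.log2(n))
    return format(sample, f"0{width}b")

def referee(messages, n):
    """The referee decodes all messages and outputs the empirical distribution."""
    decoded = [int(msg, 2) for msg in messages]
    counts = Counter(decoded)
    return [counts[i] / len(decoded) for i in range(n)]

random.seed(1)
n, p = 8, 1000
samples = [random.randrange(n) for _ in range(p)]
messages = [player_message(s, n) for s in samples]
D_prime = referee(messages, n)
print(len(messages[0]))  # 3: each message costs ceil(log2 8) = 3 bits
```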

SLIDES 11-12

Related Work

A lot of recent interest in communication-efficient learning: DAW12, ZDW13, ZX15, GMN14, KVW14, LBKW14, SSZ14, DJWZ14, LSLT15, BGMNW15

  • Both upper and lower bounds.
  • Usually more continuous problems.
  • Sample problem: estimating the mean of a Gaussian distribution.

See Mark Braverman’s talk tomorrow.

SLIDES 13-14

Outline

1. O(n/ε²) Sample Complexity Review
2. Communication Complexity Lower Bound
3. Quick Distribution Testing Example

SLIDES 15-17

Upper Bound Review

Solution: D′ = empirical distribution of O(n/ε²) samples

Why this works:

  • For every subset of {1, . . . , n}, the probabilities under D and D′ are within ε/2 of each other with probability 1 − 2^(−2n)
  • Union bound over all 2^n subsets: ‖D − D′‖₁ ≤ ε with probability 1 − o(1)

SLIDES 18-22

Lower Bound Review

Fact: Hoeffding’s inequality is optimal

  • ε-biased coin, determine the direction of the bias
  • Ω(ε⁻²) coin tosses needed

Construction:

[Figure: domain {1, . . . , 8} split into pairs (2j − 1, 2j); within pair j, element 2j − 1 is biased by +10δjε and element 2j by −10δjε; shown with δ1 = 1, δ2 = −1, δ3 = 1, δ4 = 1]

  • Each pair randomly biased by 10ε
  • Need to predict the bias of more than a 9/10 fraction of the pairs (via averaging/Markov’s bound)
  • This requires Ω(n/ε²) samples
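To make the paired-bias construction concrete, here is a Python sketch of a sampler for it (my own rendering; indices are 0-based, and the ±10δjε bias is applied within each pair as in the figure):

```python
import random

def hard_distribution(n, eps, deltas):
    """Paired-bias hard instance on {0, ..., n-1}.

    Pair j consists of elements 2j and 2j+1, with deltas[j] in {-1, +1}:
    element 2j gets mass (1 + 10*deltas[j]*eps)/n and
    element 2j+1 gets mass (1 - 10*deltas[j]*eps)/n.
    Requires 10*eps < 1 so that all masses stay positive.
    """
    D = []
    for j in range(n // 2):
        D.append((1 + 10 * deltas[j] * eps) / n)
        D.append((1 - 10 * deltas[j] * eps) / n)
    return D

random.seed(0)
n, eps = 8, 0.05
deltas = [random.choice([-1, 1]) for _ in range(n // 2)]
D = hard_distribution(n, eps, deltas)
sample = random.choices(range(n), weights=D, k=1)[0]
print(abs(sum(D) - 1.0) < 1e-9)  # expect True: still a probability distribution
```

Each pair keeps total mass 2/n; only the split between its two elements depends on the hidden sign δj, which is what a learner must recover.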

SLIDE 23

Outline

1. O(n/ε²) Sample Complexity Review
2. Communication Complexity Lower Bound
3. Quick Distribution Testing Example

SLIDES 24-25

Our Claim

No protocol with o((n/ε²) · log n) communication on average succeeds in learning the distribution with probability 99/100.

(Can assume at most O((n/ε²) · log n) players in the proof.)

SLIDES 26-27

Hard Distribution

Reuse the hard distribution for sampling:

[Figure: the paired-bias construction from the lower bound review: pairs (2j − 1, 2j) with biases ±10δjε, shown with δ1 = 1, δ2 = −1, δ3 = 1, δ4 = 1]

Can assume the protocol is deterministic:

  • Slight loss in the probability of success
  • Expected communication goes up by a constant factor

SLIDES 28-31

The Proof Plan

  • Assume an o(nε⁻² log n)-communication protocol
  • For a random i, show that:
    • The messages reveal very little about δi (even if the referee knows all the other δ’s)
    • So the referee can predict δi with probability only 1/2 + o(1)
  • Hence the original protocol is correct on only a 1/2 + o(1) fraction of the δi’s most of the time

CONTRADICTION!!!

SLIDES 32-34

Messages of a Single Player

Modify the protocol for each pair 2j − 1 and 2j:

  • Before: x sent for sample 2j − 1 and y sent for sample 2j
  • After: send xy for 2j − 1 and yx for 2j

[Figure: domain {1, . . . , 8}; before the change the pair’s messages are x and y; after, both elements of the pair send the concatenations xy and yx]

Result:

  • Communication complexity only doubles.
  • This partitions the pairs: each message reveals the bias on a specific subset of pairs.
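A toy Python sketch of this message modification (my own; the msg_table lookup is a stand-in for an arbitrary original protocol): both elements of a pair send the same two messages, concatenated in opposite orders, so a message no longer identifies which element of the pair was sampled:

```python
def symmetrized_message(sample, msg_table):
    """Send xy for the left element of a pair and yx for the right one.

    Pair j (0-indexed) consists of samples 2j and 2j+1, whose original
    messages are x = msg_table[2j] and y = msg_table[2j+1].
    """
    j = sample // 2
    x, y = msg_table[2 * j], msg_table[2 * j + 1]
    return x + y if sample % 2 == 0 else y + x

# Toy original protocol on {0, ..., 7}: message = sample in binary.
msg_table = {s: format(s, "03b") for s in range(8)}
print(symmetrized_message(4, msg_table))  # "100101" = "100" + "101"
print(symmetrized_message(5, msg_table))  # "101100" = "101" + "100"
# Communication exactly doubles: 6 bits instead of 3.
```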

SLIDES 35-38

Messages of a Single Player

Three cases for a pair 2i − 1 and 2i and the corresponding messages xy and yx:

1. |xy| > (log n)/100
  • Happens for at most o(n/ε²) of the players
  • Can assume the message reveals the sample
  • I(message; δi) ≤ I(sample; δi) = O(ε²/n)

2. |xy| ≤ (log n)/100 and at most √n pairs share these messages
  • For a random i, this happens with probability at most n^0.01 · √n / n
  • Can assume the message reveals the sample
  • I(message; δi) ≤ I(sample; δi) = O(ε²/n)

3. |xy| ≤ (log n)/100 and more than √n pairs share these messages
  • Can always happen
  • δi has little impact on the probabilities of xy and yx
  • I(message; δi) = O(ε²/(n · #pairs)) = O(ε²/n^1.5)

SLIDES 39-43

Total Information about δi

Mj = message of the j-th player, M = (M1, M2, . . . , Mp)

For all but an o(1) fraction of the i’s:

Σj I(δi; Mj) = o(n/ε²) · O(ε²/n) + O(n^0.52/ε²) · O(ε²/n) + O((n log n)/ε²) · O(ε²/n^1.5) = o(1)

Then I(δi; M) = o(1):

  • The messages Mj are independent once δi is fixed
  • This implies that I(δi; M) ≤ Σj I(δi; Mj)

And H(δi | M) = H(δi) − I(δi; M) = 1 − o(1), so the algorithm is correct about δi with probability only 1/2 + o(1).
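The subadditivity step above, I(δi; M) ≤ Σj I(δi; Mj) for messages that are independent given δi, can be verified exactly on a toy joint distribution. A Python sketch of my own, with two binary messages that each agree with δ with probability 0.6:

```python
from itertools import product
from math import log2

def mutual_information(joint):
    """Exact I(X; Y) in bits, computed from a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

q = 0.6  # probability that each message agrees with delta
joint_pair, joint1, joint2 = {}, {}, {}
for d, m1, m2 in product([-1, 1], repeat=3):
    # delta uniform on {-1, +1}; messages conditionally independent given delta
    p = 0.5 * (q if m1 == d else 1 - q) * (q if m2 == d else 1 - q)
    joint_pair[(d, (m1, m2))] = p
    joint1[(d, m1)] = joint1.get((d, m1), 0.0) + p
    joint2[(d, m2)] = joint2.get((d, m2), 0.0) + p

lhs = mutual_information(joint_pair)
rhs = mutual_information(joint1) + mutual_information(joint2)
print(lhs <= rhs + 1e-12)  # expect True: subadditivity holds
```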

SLIDE 44

Outline

1. O(n/ε²) Sample Complexity Review
2. Communication Complexity Lower Bound
3. Quick Distribution Testing Example

SLIDES 45-47

Uniformity Testing

Problem:

  • Distinguish D = U vs. ‖D − U‖₁ ≥ ε
  • Sample complexity: Θ(√n/ε²)

Communication complexity bound:

  • Assume the lengths of all messages are o(log n)
  • The methods presented here imply:
    • The referee likely learns only an n^(−Ω(1)) fraction of the samples
    • The other messages provide little information
    • Not enough to distinguish the hard instances
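For reference, the classical Θ(√n/ε²)-sample uniformity tester counts collisions among the samples; the talk cites the bound but does not spell the tester out, so the sketch below (and its threshold) is a simplified textbook version of my own:

```python
import random
from itertools import combinations

def collision_test(samples, n, eps):
    """Accept (True) iff the empirical collision rate looks uniform.

    Under U the collision probability is exactly 1/n. Any D with
    ||D - U||_1 >= eps has collision probability
    sum D_i^2 = 1/n + ||D - U||_2^2 >= (1 + eps^2)/n,
    so we threshold halfway, at (1 + eps^2/2)/n.
    """
    m = len(samples)
    collisions = sum(1 for a, b in combinations(samples, 2) if a == b)
    rate = collisions / (m * (m - 1) / 2)
    return rate <= (1 + eps**2 / 2) / n

random.seed(0)
n, eps, m = 50, 0.5, 2000
uniform_samples = [random.randrange(n) for _ in range(m)]
far_samples = [random.randrange(n // 2) for _ in range(m)]  # ||D - U||_1 = 1 >= eps
print(collision_test(uniform_samples, n, eps))  # expect True
print(collision_test(far_samples, n, eps))      # expect False
```

Here m is taken generously large for a clean demo; the optimal analysis achieves the Θ(√n/ε²) sample complexity stated on the slide.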

SLIDES 48-50

This talk:

  • Communication lower bounds
  • Players have to essentially transmit their samples

Longer-term goals:

  • Reinterpret known distribution testing and learning results in this framework
  • Design non-trivial protocols with a sublinear amount of communication

Questions?