  1. Communication Complexity of Learning Discrete Distributions
Krzysztof Onak, IBM T.J. Watson Research Center
Joint work with Ilias Diakonikolas, Elena Grigorescu, and Abhiram Natarajan
Krzysztof Onak (IBM Research), slide 1 / 20

  2-4. Distribution Learning and Testing
Input: stream of independent samples x₁, x₂, x₃, x₄, … from an unknown distribution D
Goal: learn the distribution, test a property, or estimate a parameter
• Small total variation distance error acceptable
• Traditional focus: sample complexity

  5-6. Learning Discrete Distributions
D = probability distribution on {1, …, n}
Input: independent samples x₁, x₂, x₃, x₄, … from D
Goal: output a distribution D′ such that ‖D − D′‖₁ < ε
Sample complexity: Θ(n/ε²)
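The learning goal above can be made concrete with a short simulation. The sketch below is my own illustration, not code from the talk: it draws m = Θ(n/ε²) samples (with an arbitrary constant of 10) and checks that the empirical distribution is ε-close in L1.

```python
# Illustrative sketch (not from the talk): the empirical learner.
# The constant 10 in m = 10*n/eps^2 is an arbitrary choice for this demo.
import random
from collections import Counter

def learn_empirical(samples, n):
    """Empirical distribution D' of the samples over {1, ..., n}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(1, n + 1)]

def l1_distance(p, q):
    """L1 distance sum_i |p_i - q_i| between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

random.seed(0)
n, eps = 20, 0.2
D = [1.0 / n] * n                     # the unknown distribution (uniform here)
m = int(10 * n / eps ** 2)            # Theta(n / eps^2) samples
samples = random.choices(range(1, n + 1), weights=D, k=m)
D_prime = learn_empirical(samples, n)
err = l1_distance(D, D_prime)         # should come out well below eps
```

With roughly 5000 samples for n = 20 and ε = 0.2, the realized L1 error is typically far below ε, consistent with the Θ(n/ε²) bound.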

  7-8. Communication Complexity
Distributed data: samples held by different players (e.g., samples in different data centers)
How much do the players have to communicate to solve the problem? Is sublinear communication possible?

  9-10. “Survey” Complexity
This talk focuses on the simplest setting:
• Each player holds one sample and sends a single message to a referee
• The referee outputs the solution
[Figure: players 1, 2, 3, …, p each hold a sample and send one message to the referee, which produces the output]
• Each sample is Θ(log n) bits
• Can the average communication per player be made o(log n)?
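To make the o(log n) question concrete, here is a back-of-the-envelope sketch (my own illustration, not from the talk) of what the naive protocol costs when each player simply forwards its sample verbatim:

```python
# Illustration (not from the talk): cost of the naive "forward your sample"
# protocol, the baseline against which o(log n) bits per player is asked.
import math

def naive_protocol_cost(n, eps, c=1):
    """(players, bits per message, total bits) when each of ~c*n/eps^2
    players encodes its sample from {1, ..., n} in full."""
    players = math.ceil(c * n / eps ** 2)       # one sample per player
    bits_per_message = math.ceil(math.log2(n))  # Theta(log n) bits per sample
    return players, bits_per_message, players * bits_per_message

players, per_msg, total = naive_protocol_cost(n=1024, eps=0.1)
# 102400 players, 10 bits each: the talk asks whether the average
# message length can instead be made o(log n).
```

For n = 1024 and ε = 0.1 this is about 10⁵ messages of 10 bits each; any sublinear-in-log-n protocol must beat the 10-bit-per-player baseline on average.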

  11-12. Related Work
A lot of recent interest in communication-efficient learning: DAW12, ZDW13, ZX15, GMN14, KVW14, LBKW14, SSZ14, DJWZ14, LSLT15, BGMNW15
• Both upper and lower bounds
• Usually more continuous problems
• Sample problem: estimating the mean of a Gaussian distribution
See Mark Braverman’s talk tomorrow.

  13. Outline
1. O(n/ε²) Sample Complexity Review
2. Communication Complexity Lower Bound
3. Quick Distribution Testing Example


  15-17. Upper Bound Review
Solution: D′ = empirical distribution of O(n/ε²) samples
Why this works:
• For every fixed subset S ⊆ {1, …, n}, the probabilities D(S) and D′(S) are within ε/2 of each other with probability 1 − 2^(−2n) (Chernoff–Hoeffding bound)
• Union bound over all 2ⁿ subsets: since ‖D − D′‖₁ = 2·max_S (D(S) − D′(S)), we get ‖D − D′‖₁ ≤ ε with probability 1 − o(1)
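The key identity in the union-bound argument — that the L1 error is exactly twice the discrepancy on the worst subset S = {i : D(i) > D′(i)} — can be checked numerically. This sketch is my own, with arbitrary constants:

```python
# Sketch (not from the talk): the union-bound argument's key quantities.
# The L1 error equals twice the discrepancy on the worst subset S.
import random

random.seed(1)
n, eps = 16, 0.25
m = int(4 * n / eps ** 2)          # O(n / eps^2) samples; the 4 is arbitrary
D = [1.0 / n] * n
counts = [0] * n
for x in random.choices(range(n), weights=D, k=m):
    counts[x] += 1
D_prime = [c / m for c in counts]  # empirical distribution

# Worst subset: exactly where D exceeds D'.  Since the deviations
# D(i) - D'(i) sum to zero, ||D - D'||_1 = 2 * (D(S) - D'(S)) for this S.
S = [i for i in range(n) if D[i] > D_prime[i]]
discrepancy = sum(D[i] - D_prime[i] for i in S)
l1 = sum(abs(D[i] - D_prime[i]) for i in range(n))
```

Bounding the discrepancy by ε/2 on every subset simultaneously is what the 1 − 2^(−2n) per-subset probability buys via the union bound over the 2ⁿ subsets.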

  18-22. Lower Bound Review
Fact: Hoeffding’s inequality is optimal
• Given an ε-biased coin, determining the direction of the bias requires Ω(ε⁻²) coin tosses
Construction:
[Figure: the domain {1, …, 8} split into pairs; pair i has its two probabilities shifted by ±10δᵢε, with signs δ₁ = 1, δ₂ = −1, δ₃ = 1, δ₄ = 1]
• Each pair is randomly biased by 10ε
• A successful learner must predict the bias of more than 9/10 of the pairs (via averaging/Markov’s inequality)
• This requires Ω(n/ε²) samples
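The paired construction can be written down directly. The code below is my own parameterization of the slide’s picture: pair i receives probabilities (1 ± 10δᵢε)/n for a uniformly random sign δᵢ.

```python
# Sketch (my own code) of the paired hard distribution from the slide:
# n/2 pairs, pair i biased by a random sign delta_i times 10*eps.
import random

def hard_distribution(n, eps, rng):
    """Distribution on {1, ..., n}: pair i has masses (1 +/- 10*delta_i*eps)/n."""
    assert n % 2 == 0 and 10 * eps <= 1
    deltas = [rng.choice([-1, 1]) for _ in range(n // 2)]
    D = []
    for d in deltas:
        D.append((1 + 10 * d * eps) / n)
        D.append((1 - 10 * d * eps) / n)
    return D, deltas

rng = random.Random(2)
D, deltas = hard_distribution(n=8, eps=0.05, rng=rng)
# Conditioned on landing in pair i, a sample is a Theta(eps)-biased coin
# toss for the sign delta_i, so Omega(1/eps^2) samples hitting each pair
# are needed, i.e. Omega(n/eps^2) samples overall.
```

Conditioned on a sample landing in pair i, its two outcomes have probabilities 1/2 ± 5δᵢε, which is exactly the biased-coin problem from the fact above.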


  24-25. Our Claim
No protocol that uses o((n/ε²)·log n) communication in expectation succeeds in learning the distribution with probability 99/100.
(In the proof we can assume there are at most O((n/ε²)·log n) players.)

  26-27. Hard Distribution
Reuse the hard distribution from the sampling lower bound:
[Figure: pairs on {1, …, 8} biased by ±10δᵢε as before, with δ₁ = 1, δ₂ = −1, δ₃ = 1, δ₄ = 1]
We can assume the protocol is deterministic:
• Slight loss in the probability of success
• Expected communication goes up by a constant factor

  28-29. The Proof Plan
• Assume an o(nε⁻² log n)-communication protocol
• For a random pair index i, show that:
  • The messages reveal very little about δᵢ (even if the referee knows all the other signs δⱼ)
  • Hence the referee can predict δᵢ only with probability 1/2 + o(1), contradicting the requirement of predicting more than 9/10 of the signs
