

SLIDE 1

Communication Complexity of Learning Discrete Distributions

Krzysztof Onak

IBM T.J. Watson Research Center

Joint work with Ilias Diakonikolas, Elena Grigorescu, and Abhiram Natarajan.

Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 1 / 20

SLIDES 2-4

Distribution Learning and Testing

Input: Stream of independent samples from an unknown distribution D

x1, x2, x3, x4, . . .

Goal: Learn the distribution,

  • or test a property
  • or estimate a parameter

  • Small total variation distance error is acceptable
  • Traditional focus: sample complexity

SLIDES 5-6

Learning Discrete Distributions

D = probability distribution on {1, . . . , n}

Input: Independent samples from D: x1, x2, x3, x4, . . .

Goal: Output a distribution D′ such that ‖D − D′‖₁ < ε

Sample complexity: Θ(n/ε²)
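As a concrete illustration of the Θ(n/ε²) sample bound on this slide, here is a minimal Python sketch (my own, not part of the talk) that learns a distribution by taking the empirical distribution of O(n/ε²) samples and checks the ℓ₁ error:

```python
import random
from collections import Counter

def learn_empirical(samples, n):
    """Empirical distribution of the samples over {0, ..., n-1}."""
    counts = Counter(samples)
    return [counts[i] / len(samples) for i in range(n)]

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

random.seed(0)
n, eps = 100, 0.25
m = int(10 * n / eps**2)   # O(n / eps^2) samples
D = [1.0 / n] * n          # the unknown distribution (uniform here, for the demo)
samples = random.choices(range(n), weights=D, k=m)
D_prime = learn_empirical(samples, n)
print(l1_distance(D, D_prime) < eps)  # expect True: empirical error well below eps
```

With m = Θ(n/ε²) samples the ℓ₁ error of the empirical distribution concentrates well below ε, matching the bound on the slide.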

SLIDES 7-8

Communication Complexity

Distributed data: samples held by different players. Example: samples in different data centers.

How much do the players have to communicate to solve the problem? Is sublinear communication possible?

SLIDES 9-10

“Survey” Complexity

This talk will focus on the simplest setting:

  • Each player has one sample and sends a single message to a referee
  • The referee outputs the solution

[Figure: Players 1, 2, 3, . . . , p each hold one sample and send a single message to the Referee, who produces the output]

  • Each sample is Θ(log n) bits
  • Can the average communication be made o(log n)?
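The trivial protocol in this model simply has every player transmit its sample verbatim, at ⌈log₂ n⌉ bits per player; the question on the slide is whether this can be beaten. A small Python sketch of that baseline (function names are illustrative, not from the talk):

```python
import math
import random
from collections import Counter

def player_message(sample, n):
    """Trivial protocol: each player sends its sample in ceil(log2 n) bits."""
    width = math.ceil(math.log2(n))
    return format(sample, f"0{width}b")

def referee(messages, n):
    """The referee decodes all messages and outputs the empirical distribution."""
    decoded = [int(msg, 2) for msg in messages]
    counts = Counter(decoded)
    return [counts[i] / len(decoded) for i in range(n)]

random.seed(1)
n, p = 8, 1000
samples = [random.randrange(n) for _ in range(p)]
messages = [player_message(s, n) for s in samples]
D_prime = referee(messages, n)
print(len(messages[0]))  # 3: each message costs ceil(log2 8) = 3 bits
```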

SLIDES 11-12

Related Work

A lot of recent interest in communication-efficient learning: DAW12, ZDW13, ZX15, GMN14, KVW14, LBKW14, SSZ14, DJWZ14, LSLT15, BGMNW15

  • Both upper and lower bounds.
  • Usually more continuous problems.
  • Sample problem: estimating the mean of a Gaussian distribution.

See Mark Braverman’s talk tomorrow.

SLIDES 13-14

Outline

1. O(n/ε²) Sample Complexity Review
2. Communication Complexity Lower Bound
3. Quick Distribution Testing Example

SLIDES 15-17

Upper Bound Review

Solution: D′ = empirical distribution of O(n/ε²) samples

Why this works:

  • For every subset of {1, . . . , n}, the probabilities under D and D′ are within ε/2 of each other with probability 1 − 2^(−2n)
  • Union bound over all 2^n subsets: ‖D − D′‖₁ ≤ ε with probability 1 − o(1)

SLIDES 18-22

Lower Bound Review

Fact: Hoeffding’s inequality is optimal

  • ε-biased coin, determine the direction of the bias
  • Ω(ε⁻²) coin tosses needed

Construction:

[Figure: domain {1, . . . , 8} split into pairs (2j − 1, 2j); within pair j, element 2j − 1 is biased by +10δjε and element 2j by −10δjε; shown with δ1 = 1, δ2 = −1, δ3 = 1, δ4 = 1]

  • Each pair randomly biased by 10ε
  • Need to predict the bias of more than a 9/10 fraction of the pairs (via averaging/Markov’s bound)
  • This requires Ω(n/ε²) samples
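To make the paired-bias construction concrete, here is a Python sketch of a sampler for it (my own rendering; indices are 0-based, and the ±10δjε bias is applied within each pair as in the figure):

```python
import random

def hard_distribution(n, eps, deltas):
    """Paired-bias hard instance on {0, ..., n-1}.

    Pair j consists of elements 2j and 2j+1, with deltas[j] in {-1, +1}:
    element 2j gets mass (1 + 10*deltas[j]*eps)/n and
    element 2j+1 gets mass (1 - 10*deltas[j]*eps)/n.
    Requires 10*eps < 1 so that all masses stay positive.
    """
    D = []
    for j in range(n // 2):
        D.append((1 + 10 * deltas[j] * eps) / n)
        D.append((1 - 10 * deltas[j] * eps) / n)
    return D

random.seed(0)
n, eps = 8, 0.05
deltas = [random.choice([-1, 1]) for _ in range(n // 2)]
D = hard_distribution(n, eps, deltas)
sample = random.choices(range(n), weights=D, k=1)[0]
print(abs(sum(D) - 1.0) < 1e-9)  # expect True: still a probability distribution
```

Each pair keeps total mass 2/n; only the split between its two elements depends on the hidden sign δj, which is what a learner must recover.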

SLIDE 23

Outline

1. O(n/ε²) Sample Complexity Review
2. Communication Complexity Lower Bound
3. Quick Distribution Testing Example

SLIDES 24-25

Our Claim

No protocol with o((n/ε²) · log n) communication on average succeeds in learning the distribution with probability 99/100.

(Can assume at most O((n/ε²) · log n) players in the proof.)

SLIDES 26-27

Hard Distribution

Reuse the hard distribution for sampling:

[Figure: the paired-bias construction from the lower bound review: pairs (2j − 1, 2j) with biases ±10δjε, shown with δ1 = 1, δ2 = −1, δ3 = 1, δ4 = 1]

Can assume the protocol is deterministic:

  • Slight loss in the probability of success
  • Expected communication goes up by a constant factor

SLIDES 28-31

The Proof Plan

  • Assume an o(nε⁻² log n)-communication protocol
  • For a random i, show that:
    • The messages reveal very little about δi (even if the referee knows all the other δ’s)
    • So the referee can predict δi with probability only 1/2 + o(1)
  • Hence the original protocol is correct on only a 1/2 + o(1) fraction of the δi’s most of the time

CONTRADICTION!!!

SLIDES 32-34

Messages of a Single Player

Modify the protocol for each pair 2j − 1 and 2j:

  • Before: x sent for sample 2j − 1 and y sent for sample 2j
  • After: send xy for 2j − 1 and yx for 2j

[Figure: domain {1, . . . , 8}; before the change the pair’s messages are x and y; after, both elements of the pair send the concatenations xy and yx]

Result:

  • Communication complexity only doubles.
  • This partitions the pairs: each message reveals the bias on a specific subset of pairs.
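A toy Python sketch of this message modification (my own; the msg_table lookup is a stand-in for an arbitrary original protocol): both elements of a pair send the same two messages, concatenated in opposite orders, so a message no longer identifies which element of the pair was sampled:

```python
def symmetrized_message(sample, msg_table):
    """Send xy for the left element of a pair and yx for the right one.

    Pair j (0-indexed) consists of samples 2j and 2j+1, whose original
    messages are x = msg_table[2j] and y = msg_table[2j+1].
    """
    j = sample // 2
    x, y = msg_table[2 * j], msg_table[2 * j + 1]
    return x + y if sample % 2 == 0 else y + x

# Toy original protocol on {0, ..., 7}: message = sample in binary.
msg_table = {s: format(s, "03b") for s in range(8)}
print(symmetrized_message(4, msg_table))  # "100101" = "100" + "101"
print(symmetrized_message(5, msg_table))  # "101100" = "101" + "100"
# Communication exactly doubles: 6 bits instead of 3.
```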

SLIDES 35-38

Messages of a Single Player

Three cases for a pair 2i − 1 and 2i and the corresponding messages xy and yx:

1. |xy| > (log n)/100
  • Happens for at most o(n/ε²) of the players
  • Can assume the message reveals the sample
  • I(message; δi) ≤ I(sample; δi) = O(ε²/n)

2. |xy| ≤ (log n)/100 and at most √n pairs share these messages
  • For a random i, this happens with probability at most n^0.01 · √n / n
  • Can assume the message reveals the sample
  • I(message; δi) ≤ I(sample; δi) = O(ε²/n)

3. |xy| ≤ (log n)/100 and more than √n pairs share these messages
  • Can always happen
  • δi has little impact on the probabilities of xy and yx
  • I(message; δi) = O(ε²/(n · #pairs)) = O(ε²/n^1.5)

SLIDES 39-43

Total Information about δi

Mj = message of the j-th player, M = (M1, M2, . . . , Mp)

For all but an o(1) fraction of the i’s:

Σj I(δi; Mj) = o(n/ε²) · O(ε²/n) + O(n^0.52/ε²) · O(ε²/n) + O((n log n)/ε²) · O(ε²/n^1.5) = o(1)

Then I(δi; M) = o(1):

  • The messages Mj are independent once δi is fixed
  • This implies that I(δi; M) ≤ Σj I(δi; Mj)

And H(δi | M) = H(δi) − I(δi; M) = 1 − o(1), so the algorithm is correct about δi with probability only 1/2 + o(1).
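The subadditivity step above, I(δi; M) ≤ Σj I(δi; Mj) for messages that are independent given δi, can be verified exactly on a toy joint distribution. A Python sketch of my own, with two binary messages that each agree with δ with probability 0.6:

```python
from itertools import product
from math import log2

def mutual_information(joint):
    """Exact I(X; Y) in bits, computed from a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

q = 0.6  # probability that each message agrees with delta
joint_pair, joint1, joint2 = {}, {}, {}
for d, m1, m2 in product([-1, 1], repeat=3):
    # delta uniform on {-1, +1}; messages conditionally independent given delta
    p = 0.5 * (q if m1 == d else 1 - q) * (q if m2 == d else 1 - q)
    joint_pair[(d, (m1, m2))] = p
    joint1[(d, m1)] = joint1.get((d, m1), 0.0) + p
    joint2[(d, m2)] = joint2.get((d, m2), 0.0) + p

lhs = mutual_information(joint_pair)
rhs = mutual_information(joint1) + mutual_information(joint2)
print(lhs <= rhs + 1e-12)  # expect True: subadditivity holds
```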

SLIDE 44

Outline

1. O(n/ε²) Sample Complexity Review
2. Communication Complexity Lower Bound
3. Quick Distribution Testing Example

SLIDES 45-47

Uniformity Testing

Problem:

  • Distinguish D = U vs. ‖D − U‖₁ ≥ ε
  • Sample complexity: Θ(√n/ε²)

Communication complexity bound:

  • Assume the lengths of all messages are o(log n)
  • The methods presented here imply:
    • The referee likely learns only an n^(−Ω(1)) fraction of the samples
    • The other messages provide little information
    • Not enough to distinguish the hard instances
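For reference, the classical Θ(√n/ε²)-sample uniformity tester counts collisions among the samples; the talk cites the bound but does not spell the tester out, so the sketch below (and its threshold) is a simplified textbook version of my own:

```python
import random
from itertools import combinations

def collision_test(samples, n, eps):
    """Accept (True) iff the empirical collision rate looks uniform.

    Under U the collision probability is exactly 1/n. Any D with
    ||D - U||_1 >= eps has collision probability
    sum D_i^2 = 1/n + ||D - U||_2^2 >= (1 + eps^2)/n,
    so we threshold halfway, at (1 + eps^2/2)/n.
    """
    m = len(samples)
    collisions = sum(1 for a, b in combinations(samples, 2) if a == b)
    rate = collisions / (m * (m - 1) / 2)
    return rate <= (1 + eps**2 / 2) / n

random.seed(0)
n, eps, m = 50, 0.5, 2000
uniform_samples = [random.randrange(n) for _ in range(m)]
far_samples = [random.randrange(n // 2) for _ in range(m)]  # ||D - U||_1 = 1 >= eps
print(collision_test(uniform_samples, n, eps))  # expect True
print(collision_test(far_samples, n, eps))      # expect False
```

Here m is taken generously large for a clean demo; the optimal analysis achieves the Θ(√n/ε²) sample complexity stated on the slide.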

SLIDES 48-50

This talk:

  • Communication lower bounds
  • Players have to essentially transmit their samples

Longer-term goals:

  • Reinterpret known distribution testing and learning results in this framework
  • Design non-trivial protocols with a sublinear amount of communication

Questions?