Learning Additive Noise Channels: Generalization Bounds and - PowerPoint PPT Presentation

Learning Additive Noise Channels: Generalization Bounds and Algorithms Nir Weinberger Massachusetts Institute of Technology, MA, USA IEEE International Symposium on Information Theory June 2020 1/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . ✞ ☎ Can we learn to efficiently communicate from ( Z 1 ,...,Z n ) i . i . d . ∼ µ ? ✝ ✆ 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . ✞ ☎ Can we learn to efficiently communicate from ( Z 1 ,...,Z n ) i . i . d . ∼ µ ? ✝ ✆ Generalization bounds for 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . ✞ ☎ Can we learn to efficiently communicate from ( Z 1 ,...,Z n ) i . i . d . ∼ µ ? ✝ ✆ Generalization bounds for learning under error probability loss 1 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . ✞ ☎ Can we learn to efficiently communicate from ( Z 1 ,...,Z n ) i . i . d . ∼ µ ? ✝ ✆ Generalization bounds for learning under error probability loss 1 Applies to empirical risk minimization (ERM). 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . ✞ ☎ Can we learn to efficiently communicate from ( Z 1 ,...,Z n ) i . i . d . ∼ µ ? ✝ ✆ Generalization bounds for learning under error probability loss 1 Applies to empirical risk minimization (ERM). learning under a surrogate error probability loss. 2 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . ✞ ☎ Can we learn to efficiently communicate from ( Z 1 ,...,Z n ) i . i . d . ∼ µ ? ✝ ✆ Generalization bounds for learning under error probability loss 1 Applies to empirical risk minimization (ERM). learning under a surrogate error probability loss. 2 New alternating optimization algorithm. 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . ✞ ☎ Can we learn to efficiently communicate from ( Z 1 ,...,Z n ) i . i . d . ∼ µ ? ✝ ✆ Generalization bounds for learning under error probability loss 1 Applies to empirical risk minimization (ERM). learning under a surrogate error probability loss. 2 New alternating optimization algorithm. a “codeword-expurgating” Gibbs learning algorithm. 3 2/22

In an nutshell An additive noise channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Z ∼ µ , but µ is unknown and non-parametric . ✞ ☎ Can we learn to efficiently communicate from ( Z 1 ,...,Z n ) i . i . d . ∼ µ ? ✝ ✆ Generalization bounds for learning under error probability loss 1 Applies to empirical risk minimization (ERM). learning under a surrogate error probability loss. 2 New alternating optimization algorithm. a “codeword-expurgating” Gibbs learning algorithm. 3 Caveat: a distilled learning-theoretic framework. 2/22

Motivation Why? 3/22

Motivation Why? Justification of learning-based methods: 3/22

Motivation Why? Justification of learning-based methods: 1 The success of deep neural networks (DNN) [OH17; Gru+17]. 3/22

Motivation Why? Justification of learning-based methods: 1 The success of deep neural networks (DNN) [OH17; Gru+17]. 2 Avoid channel modeling [Wan+17; OH17; FG17; Shl+19]. 3/22

Motivation Why? Justification of learning-based methods: 1 The success of deep neural networks (DNN) [OH17; Gru+17]. 2 Avoid channel modeling [Wan+17; OH17; FG17; Shl+19]. Interference, jamming signals, non-linearities [Sch08], finite-resolution quantization. 3/22

Motivation Why? Justification of learning-based methods: 1 The success of deep neural networks (DNN) [OH17; Gru+17]. 2 Avoid channel modeling [Wan+17; OH17; FG17; Shl+19]. Interference, jamming signals, non-linearities [Sch08], finite-resolution quantization. High-dimensional parameter, e.g. massive MIMO. 3/22

Motivation Why? Justification of learning-based methods: 1 The success of deep neural networks (DNN) [OH17; Gru+17]. 2 Avoid channel modeling [Wan+17; OH17; FG17; Shl+19]. Interference, jamming signals, non-linearities [Sch08], finite-resolution quantization. High-dimensional parameter, e.g. massive MIMO. 3 Existing theory on learning-based quantizer design [LLZ94; LLZ97; BLL98; Lin02]. 3/22

Motivation Why? Justification of learning-based methods: 1 The success of deep neural networks (DNN) [OH17; Gru+17]. 2 Avoid channel modeling [Wan+17; OH17; FG17; Shl+19]. Interference, jamming signals, non-linearities [Sch08], finite-resolution quantization. High-dimensional parameter, e.g. massive MIMO. 3 Existing theory on learning-based quantizer design [LLZ94; LLZ97; BLL98; Lin02]. 4 Exploit efficient optimization methods, e.g., for the design of low latency codes [Kim+18; Jia+19]. 3/22

Outline Learning to Minimize Error Probability 1 Learning to Minimize a Surrogate to the Error Probability 2 Learning by Codebook Expurgation 3 4/22

Model Channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise 5/22

Model Channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Encoder: A codebook C = { x j } j ∈ [ m ] ⊂ C ⊆ ( R d ) m . 5/22

Model Channel ⊥ X ∈ R d Y = X + Z , Z ⊥ �� output input noise Encoder: A codebook C = { x j } j ∈ [ m ] ⊂ C ⊆ ( R d ) m . Decoder: minimal (Mahalanobis) distance decoder ˆ j ( y ) ∈ argmin � x − y � S j ∈ [ m ] w.r.t. inverse covariance matrix S ∈ S ⊆ S d + . 5/22

✶ Expected and empirical error probability Expected average error probability: m p µ ( C,S ) := 1 � p µ ( C,S | j ) , m j =1 with � � �� p µ ( C,S | j ) := E µ j ′ ∈ [ m ] ,j ′ � = j � x j + Z − x j ′ � S < � Z � S min . ✶ 6/22

✶ Expected and empirical error probability Expected average error probability: m p µ ( C,S ) := 1 � p µ ( C,S | j ) , m j =1 with � � �� p µ ( C,S | j ) := E µ j ′ ∈ [ m ] ,j ′ � = j � x j + Z − x j ′ � S < � Z � S min . ✶ Ultimate goal: find argmin C,S p µ ( C,S ) . 6/22

Expected and empirical error probability Expected average error probability: m p µ ( C,S ) := 1 � p µ ( C,S | j ) , m j =1 with � � �� p µ ( C,S | j ) := E µ j ′ ∈ [ m ] ,j ′ � = j � x j + Z − x j ′ � S < � Z � S min . ✶ Ultimate goal: find argmin C,S p µ ( C,S ) . Empirical average error probability: replace n E µ [ ℓ ( Z )] → E Z [ ℓ ( Z )] := 1 � ℓ ( Z i ) n i =1 so that � � �� m p Z ( C,S ) := 1 � E Z ✶ j ′ ∈ [ m ] \ j � x j + Z − x j ′ � S < � Z � S min . m j =1 6/22

Uniform error bound and ERM Theorem Assume that n ≥ d +1 . With probability of at least 1 − δ sup | p µ ( C,S ) − p Z ( C,S ) | C ⊂ ( R d ) m , S ∈ S d + � � � � � en � 2( d +1)log � 2log(2 /δ ) d +1 ≤ 4 m + . n n 7/22

Uniform error bound and ERM Theorem Assume that n ≥ d +1 . With probability of at least 1 − δ sup | p µ ( C,S ) − p Z ( C,S ) | C ⊂ ( R d ) m , S ∈ S d + � � � � � en � 2( d +1)log � 2log(2 /δ ) d +1 ≤ 4 m + . n n Holds for the output ( C Z ,S Z ) of any learning algorithm. 7/22

Uniform error bound and ERM Theorem Assume that n ≥ d +1 . With probability of at least 1 − δ sup | p µ ( C,S ) − p Z ( C,S ) | C ⊂ ( R d ) m , S ∈ S d + � � � � � en � 2( d +1)log � 2log(2 /δ ) d +1 ≤ 4 m + . n n Holds for the output ( C Z ,S Z ) of any learning algorithm. Specifically, for ERM ( C Z ,S Z ) ERM ∈ argmin p Z ( C,S ) , O ( m 2 d +log(1 /δ ) n = ˜ ) samples guarantees ǫ 2 p µ ( C Z ,S Z ) ERM ) ≤ inf C,S p µ ( C,S )+ ǫ. 7/22

Uniform error bound and ERM - cont. Open questions: The term ˜ O ( log(1 /δ ) ) can be shown to be minimax tight. ǫ 2 8/22

Learning Additive Noise Channels: Generalization Bounds and - PowerPoint PPT Presentation

Learning Additive Noise Channels: Generalization Bounds and Algorithms Nir Weinberger Massachusetts Institute of Technology, MA, USA IEEE International Symposium on Information Theory June 2020 1/22 In an nutshell An additive noise channel

Generalization Bounds and Stability Lorenzo Rosasco Tomaso Poggio 9.520 Class 6 February, 23

Module-2c: Two Port Noise Modelling 20 July 2018 16:40 Shot Noise vs. Flicker Noise Simple

On the Influence of Input Noise On the Influence of Input Noise on a Generalization Error

Circuit Lower-bounds Lecture 24 Weak circuits are indeed weak 1 Circuit Lower-bounds 2

Multiple Input and Output Channels Multiple Input and Output Channels Multiple Input Channels In

Side Channels and Covert Channels Daniel Bosk Department of Information and Communication Systems

Visioning Committee Air Quality and Noise January 23, 2020 Noise Data Noise is evaluated on

Lecture 19- ECE 240a Laser Phase Noise 1 ECE 240a Lasers - Fall 2019 Lecture 19 Phase Noise

Making Polynomials Robust to Noise Alexander Sherstov U C L A Noise in computation 2 Noise in

Johnson Noise: Determinations of k and Absolute Zero Edwin Ng | 12 December 2011 Nyquists

Noises Jaanus Jaggo Noise Noise is a function: noise(coordinate) -> value Pseudo-random:

Noises Jaanus Jaggo Noise Noise is a function: noise(coordinate) -> value Pseudo-random:

Michael Spece Departments of Machine Learning and Statistics Carnegie Mellon University June 11,

Non-additive measures and integrals Vicen c Torra March, 2014 IIIA-CSIC (joint work with

Generalized Additive Models September 10, 2019 Generalized Additive Models September 10, 2019 1

Deep learning: Challenges in learning and generalization Tomas Mikolov, Facebook AI What is

Bounds on Permutation Channel Capacity Anuran Makur Department of Electrical Engineering and

Permutation Channels Anuran Makur Department of Electrical Engineering and Computer Science

Higher-order Coreference Resolution with Coarse-to-fine Inference Kenton Lee * Luheng He Luke

How to Writ e a SIGGRAPH Paper Dani Lischinski The Hebrew University of Jerusalem, Israel

Finite-Blocklength and Error-Exponent Analyses for LDPC Codes in Point-to-Point and Multiple

Programming in Time Tuesday, December 10, 13 Algorave? Tuesday, December 10, 13 Tuesday,

Tukey classes of local bases in compacta David Milovich 16th Boise Extravaganza in Set Theory

Every Child a Mover! Every Child a Talker! Early Years & Childcare Service Being Active....

Sambuz

Useful Links

Newsletter

Mail Us

Learning Additive Noise Channels: Generalization Bounds and - PowerPoint PPT Presentation

Learning Additive Noise Channels: Generalization Bounds and Algorithms Nir Weinberger Massachusetts Institute of Technology, MA, USA IEEE International Symposium on Information Theory June 2020 1/22 In an nutshell An additive noise channel

Generalization Bounds and Stability Lorenzo Rosasco Tomaso Poggio 9.520 Class 6 February, 23

Module-2c: Two Port Noise Modelling 20 July 2018 16:40 Shot Noise vs. Flicker Noise Simple

On the Influence of Input Noise On the Influence of Input Noise on a Generalization Error

Circuit Lower-bounds Lecture 24 Weak circuits are indeed weak 1 Circuit Lower-bounds 2

Multiple Input and Output Channels Multiple Input and Output Channels Multiple Input Channels In

Side Channels and Covert Channels Daniel Bosk Department of Information and Communication Systems

Visioning Committee Air Quality and Noise January 23, 2020 Noise Data Noise is evaluated on

Lecture 19- ECE 240a Laser Phase Noise 1 ECE 240a Lasers - Fall 2019 Lecture 19 Phase Noise

Making Polynomials Robust to Noise Alexander Sherstov U C L A Noise in computation 2 Noise in

Johnson Noise: Determinations of k and Absolute Zero Edwin Ng | 12 December 2011 Nyquists

Noises Jaanus Jaggo Noise Noise is a function: noise(coordinate) -&gt; value Pseudo-random:

Noises Jaanus Jaggo Noise Noise is a function: noise(coordinate) -&gt; value Pseudo-random:

Michael Spece Departments of Machine Learning and Statistics Carnegie Mellon University June 11,

Non-additive measures and integrals Vicen c Torra March, 2014 IIIA-CSIC (joint work with

Generalized Additive Models September 10, 2019 Generalized Additive Models September 10, 2019 1

Deep learning: Challenges in learning and generalization Tomas Mikolov, Facebook AI What is

Bounds on Permutation Channel Capacity Anuran Makur Department of Electrical Engineering and

Permutation Channels Anuran Makur Department of Electrical Engineering and Computer Science

Higher-order Coreference Resolution with Coarse-to-fine Inference Kenton Lee * Luheng He Luke

How to Writ e a SIGGRAPH Paper Dani Lischinski The Hebrew University of Jerusalem, Israel

Finite-Blocklength and Error-Exponent Analyses for LDPC Codes in Point-to-Point and Multiple

Programming in Time Tuesday, December 10, 13 Algorave? Tuesday, December 10, 13 Tuesday,

Tukey classes of local bases in compacta David Milovich 16th Boise Extravaganza in Set Theory

Every Child a Mover! Every Child a Talker! Early Years &amp; Childcare Service Being Active....

Sambuz

Useful Links

Newsletter

Mail Us

Noises Jaanus Jaggo Noise Noise is a function: noise(coordinate) -> value Pseudo-random:

Noises Jaanus Jaggo Noise Noise is a function: noise(coordinate) -> value Pseudo-random:

Every Child a Mover! Every Child a Talker! Early Years & Childcare Service Being Active....