SLIDE 1

Gaussian Approximation of Quantization Error for Inference from Compressed Data

Alon Kipnis (Stanford) and Galen Reeves (Duke), ISIT, July 2019

SLIDE 2

Table of Contents

◮ Introduction: Motivation; Contribution
◮ Main results: Non-asymptotic; Asymptotic
◮ Examples / Applications: Standard Source Coding; Quantized Compressed Sensing

SLIDE 3

Motivation

Inference from Compressed Data

θ → X ∼ P_{X|θ} (data) → inference → θ̂

SLIDE 4

Motivation

Inference from Compressed Data

θ → X ∼ P_{X|θ} (data) → compression (bit limitation) → Y → inference → θ̂

SLIDE 5

Motivation

Inference from Compressed Data

θ → X ∼ P_{X|θ} (data) → compression (bit limitation) → Y → inference → θ̂

◮ Indirect rate-distortion [Dobrushin & Tsybakov ’62], [Berger ’71]
◮ Quantized compressed sensing [Kamilov et al. ’12], [Xu et al. ’14], [Kipnis et al. ’17, ’18]
◮ Estimation under communication constraints [Han ’87], [Zhang & Berger ’88], [Duchi et al. ’17], [Duchi & Kipnis ’17], [Barnes et al. ’18], [Han et al. ’18]
◮ Task-oriented quantization [Kassam ’77], [Picinbono & Duvaut ’88], [Gersho ’96], [Misra et al. ’08], [Shlezinger et al. ’18]

SLIDE 6

Motivation

Inference from Compressed Data

θ → X ∼ P_{X|θ} (data) → compression (bit limitation) → Y → inference → θ̂

◮ Indirect rate-distortion [Dobrushin & Tsybakov ’62], [Berger ’71]
◮ Quantized compressed sensing [Kamilov et al. ’12], [Xu et al. ’14], [Kipnis et al. ’17, ’18]
◮ Estimation under communication constraints [Han ’87], [Zhang & Berger ’88], [Duchi et al. ’17], [Duchi & Kipnis ’17], [Barnes et al. ’18], [Han et al. ’18]
◮ Task-oriented quantization [Kassam ’77], [Picinbono & Duvaut ’88], [Gersho ’96], [Misra et al. ’08], [Shlezinger et al. ’18]

Challenge: combining estimation theory and quantization

SLIDE 7

Lossy Compression vs AWGN Channel

θ → X ∼ P_{X|θ}
Quantization path: X → Enc → {1, . . . , 2^{nR}} → Dec → Y
AWGN path: Z = X + W, W ∼ N(0, (1/snr) I)

SLIDE 8

Lossy Compression vs AWGN Channel

θ → X ∼ P_{X|θ}
Quantization path: X → Enc → {1, . . . , 2^{nR}} → Dec → Y
AWGN path: Z = X + W, W ∼ N(0, (1/snr) I)

◮ Plenty of pitfalls and non-rigorous work [Gray]
◮ Some rigorous “high bit resolution” results [Lee & Neuhoff ’96], [Viswanathan & Zamir ’01], [Marco & Neuhoff ’05]

SLIDE 9

Lossy Compression vs AWGN Channel

θ → X ∼ P_{X|θ}
Quantization path: X → Enc → {1, . . . , 2^{nR}} → Dec → Y
AWGN path: Z = X + W, W ∼ N(0, (1/snr) I)

◮ Plenty of pitfalls and non-rigorous work [Gray]
◮ Some rigorous “high bit resolution” results [Lee & Neuhoff ’96], [Viswanathan & Zamir ’01], [Marco & Neuhoff ’05]

This talk: if X is encoded using a random spherical code, then Wass₂(Y, Z | X) ≈ const, where snr = 2^{2R} − 1.

SLIDE 10

Geometric Interpretation of Gaussian Source Coding

[Sakrison ’68], [Wyner ’68]

SLIDE 11

Geometric Interpretation of Gaussian Source Coding

[Sakrison ’68], [Wyner ’68]

X lies on the input sphere of radius √n.

SLIDE 12

Geometric Interpretation of Gaussian Source Coding

[Sakrison ’68], [Wyner ’68]

X lies on the input sphere of radius √n; its representation Ȳ lies on a representation sphere of radius r, at angle α to X, with sin α → 2^{−R}.
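The step from the representation angle back to the classical Gaussian distortion-rate function is worth one line; a sketch of the arithmetic (my reconstruction, assuming X concentrates on the radius-√n sphere):

```latex
% X sits near the sphere of radius \sqrt{n}; its representation \bar{Y}
% makes angle \alpha with X, so the error length is about \sqrt{n}\sin\alpha.
\|X - \bar{Y}\| \approx \sqrt{n}\,\sin\alpha
\quad\Longrightarrow\quad
\frac{1}{n}\,\mathbb{E}\|X - \bar{Y}\|^{2} \;\to\; 2^{-2R} = D_{\mathrm{Gauss}}(R)
\qquad \text{as } \sin\alpha \to 2^{-R}.
```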

SLIDE 13

Geometric Interpretation of Gaussian Source Coding

[Sakrison ’68], [Wyner ’68]

X lies on the input sphere of radius √n; its representation Ȳ lies on a representation sphere of radius r, at angle α to X, with sin α → 2^{−R}. The quantization error X − Ȳ lies on an error sphere.

SLIDE 14

Geometric Interpretation of Gaussian Source Coding

[Sakrison ’68], [Wyner ’68]

X lies on the input sphere of radius √n; its representation Ȳ lies on a representation sphere of radius r, at angle α to X, with sin α → 2^{−R}. The quantization error X − Ȳ lies on an error sphere.

This talk: X on the sphere of radius √n and Y on a representation sphere of radius ρ√n, at angle α to X.

SLIDE 15

Overview of Contributions

Z = X + W, W ∼ N(0, (1/snr) I), with snr = 2^{2R} − 1; X lies on the sphere of radius √n and Y is its rate-R spherical-code representation.

SLIDE 16

Overview of Contributions

Z = X + W, W ∼ N(0, (1/snr) I), with snr = 2^{2R} − 1; X lies on the sphere of radius √n and Y is its rate-R spherical-code representation.

◮ Strong equivalence between quantization error using rate-R random spherical coding and AWGN with SNR 2^{2R} − 1

SLIDE 17

Overview of Contributions

Z = X + W, W ∼ N(0, (1/snr) I), with snr = 2^{2R} − 1; X lies on the sphere of radius √n and Y is its rate-R spherical-code representation.

◮ Strong equivalence between quantization error using rate-R random spherical coding and AWGN with SNR 2^{2R} − 1
◮ Applications to inference from compressed data

SLIDE 18

Table of Contents

◮ Introduction: Motivation; Contribution
◮ Main results: Non-asymptotic; Asymptotic
◮ Examples / Applications: Standard Source Coding; Quantized Compressed Sensing

SLIDE 19

Main Result (Non-Asymptotic)

Gaussian Approximation of Quantization Error

θ → X ∼ P_{X|θ} → rate-R spherical code → Y, compared with Z = X + W, W ∼ N(0, σ²I)

Theorem

For P_X with finite second moments, let

ρ = E‖X‖ / √(n(1 − 2^{−2R})),  σ = E‖X‖ / √(n(2^{2R} − 1)) = ρ · 2^{−R}.

Then

Wass₂²(Y, Z | X) ≤ var(‖X‖) + 2σ² + C_R · E‖X‖² · log²(n) / n².
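The theorem's message can be checked numerically at small block length. The sketch below is my own illustration, not the paper's construction: a unit-power Gaussian source with n = 12 and R = 1, encoded by maximum correlation onto 2^{nR} random codewords on a sphere of radius ρ√n; the per-coordinate quantization-error variance should come out close to the AWGN variance 1/snr = 1/(2^{2R} − 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, R = 12, 1                     # small n so the 2^{nR}-word codebook is enumerable
M = 2 ** (n * R)                 # codebook size
snr = 2 ** (2 * R) - 1           # AWGN SNR predicted by the theorem
rho = 1.0 / np.sqrt(1 - 2.0 ** (-2 * R))   # representation-sphere scale (unit-power source)

# Random spherical code: M iid points uniform on the sphere of radius rho*sqrt(n).
C = rng.standard_normal((M, n))
C *= rho * np.sqrt(n) / np.linalg.norm(C, axis=1, keepdims=True)

# Encode 200 source vectors by maximum correlation and collect the errors Y - X.
errs = []
for _ in range(200):
    x = rng.standard_normal(n)
    y = C[np.argmax(C @ x)]
    errs.append(y - x)
errs = np.concatenate(errs)

emp_var = errs.var()
print(f"empirical error variance {emp_var:.3f} vs AWGN variance {1 / snr:.3f}")
```

At this tiny block length the match is only approximate (finite-n effects shrink the angle slightly), which is exactly the gap the non-asymptotic bound quantifies.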

SLIDE 20

Wasserstein Distance and Lipschitz Continuity

Definition (quadratic Wasserstein distance)

Wass₂(Y, Z) ≜ inf over couplings P_{Y,Z} of √(E‖Y − Z‖²), with the marginals P_Y, P_Z fixed.

(a.k.a. Kantorovich, Kantorovich-Rubinstein, “transportation”, ρ̄, “earth mover’s”, Gini, Fréchet, Vallender, Mallows, ...)
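In one dimension the infimum over couplings is attained by the monotone (quantile) coupling, so the empirical Wass₂ distance is just a sorted-sample match. A minimal sketch (my own illustration), checked on two Gaussians whose true distance is the mean shift:

```python
import numpy as np

rng = np.random.default_rng(1)

def wass2_1d(y, z):
    # In 1-D the optimal coupling matches sorted samples (quantile coupling),
    # so the empirical quadratic Wasserstein distance is a sorted-sample RMS.
    return np.sqrt(np.mean((np.sort(y) - np.sort(z)) ** 2))

y = rng.standard_normal(50_000)          # samples from N(0, 1)
z = 0.5 + rng.standard_normal(50_000)    # samples from N(0.5, 1)
print(wass2_1d(y, z))                    # close to 0.5, the mean shift
```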

SLIDE 21

Wasserstein Distance and Lipschitz Continuity

Definition (quadratic Wasserstein distance)

Wass₂(Y, Z) ≜ inf over couplings P_{Y,Z} of √(E‖Y − Z‖²), with the marginals P_Y, P_Z fixed.

(a.k.a. Kantorovich, Kantorovich-Rubinstein, “transportation”, ρ̄, “earth mover’s”, Gini, Fréchet, Vallender, Mallows, ...)

Fact

For any L-Lipschitz f:

| √(E‖θ − f(Y)‖²) − √(E‖θ − f(Z)‖²) | ≤ L · Wass₂(Y, Z | X)
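A quick sanity check of the unconditional analogue of this Fact (all choices here are my own illustration: θ = 0, f = tanh which is 1-Lipschitz, Y ∼ N(0, 1), Z ∼ N(0.3, 1), so Wass₂(Y, Z) = 0.3):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 200_000
theta = 0.0
y = rng.standard_normal(m)          # Y ~ N(0, 1)
z = 0.3 + rng.standard_normal(m)    # Z ~ N(0.3, 1), so Wass2(Y, Z) = 0.3

f = np.tanh                          # a 1-Lipschitz estimator
rmse_y = np.sqrt(np.mean((theta - f(y)) ** 2))
rmse_z = np.sqrt(np.mean((theta - f(z)) ** 2))
gap = abs(rmse_y - rmse_z)
print(gap)                           # must be at most L * Wass2 = 0.3
```

The observed gap is far below the bound, as expected: the Fact only promises that Lipschitz estimators cannot amplify the Wasserstein discrepancy.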

SLIDE 22

Main Result (Asymptotic)

Asymptotic Squared Error

θ ∈ R^{d_n} → X ∼ P_{X|θ} → rate-R spherical code → Y, compared with Z = X + W, W ∼ N(0, σ²I)

Corollary

If (1/d_n) E‖θ − θ̂_n(Z)‖² = M(snr) + o(1), then

(1/d_n) E‖θ − θ̂_n(Y)‖² = M(2^{2R} − 1) + o(1),

provided:

◮ var(‖X‖) = O(1)
◮ Lip(θ̂_n) = o(√d_n)

SLIDE 23

Table of Contents

◮ Introduction: Motivation; Contribution
◮ Main results: Non-asymptotic; Asymptotic
◮ Examples / Applications: Standard Source Coding; Quantized Compressed Sensing

SLIDE 24

Examples / Applications

θ → X ∼ P_{X|θ} → Enc → {1, . . . , 2^{nR}} → Dec → θ̂

◮ Standard source coding: X = θ
◮ Quantized compressed sensing: X = Aθ + W

Not in this talk...

◮ Parametric estimation under bit constraints
◮ Optimization with gradient compression
◮ Data compression in latent space using a generative model

SLIDE 25

Example I: Standard Source Coding

X = θ, with θ_i iid ∼ P_θ and E[θ_1²] = 1

θ → Enc → Dec → θ̂ at R bits/symbol

SLIDE 26

Example I: Standard Source Coding

X = θ, with θ_i iid ∼ P_θ and E[θ_1²] = 1

θ → Enc → Dec → θ̂ at R bits/symbol; surrogate channel: Z = θ + W/√snr

SLIDE 27

Example I: Standard Source Coding

X = θ, with θ_i iid ∼ P_θ and E[θ_1²] = 1

θ → Enc → Dec → θ̂ at R bits/symbol; surrogate channel: Z = θ + W/√snr

1) Estimator: θ̂(z) = E[θ_1 | Z_1 = z]
2) MSE function: M(snr) = mmse(θ_1 | Z_1)

SLIDE 28

Example I: Standard Source Coding

X = θ, with θ_i iid ∼ P_θ and E[θ_1²] = 1

θ → Enc → Dec → θ̂ at R bits/symbol; surrogate channel: Z = θ + W/√snr

1) Estimator: θ̂(z) = E[θ_1 | Z_1 = z]
2) MSE function: M(snr) = mmse(θ_1 | Z_1)

Corollary

D_sp(R) = mmse(θ_1 | θ_1 + W/√(2^{2R} − 1)) is achievable with random spherical coding.

SLIDE 29

Example I: Standard Source Coding

X = θ, with θ_i iid ∼ P_θ and E[θ_1²] = 1

θ → Enc → Dec → θ̂ at R bits/symbol; surrogate channel: Z = θ + W/√snr

1) Estimator: θ̂(z) = E[θ_1 | Z_1 = z]
2) MSE function: M(snr) = mmse(θ_1 | Z_1)

Corollary

D_sp(R) = mmse(θ_1 | θ_1 + W/√(2^{2R} − 1)) is achievable with random spherical coding.

Note: D_sp(R) ≤ D_Gauss(R) = 2^{−2R}

◮ Compare to [Sakrison ’68], [Lapidoth ’97]
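For the equiprobable binary source of the next slide, D_sp(R) can be evaluated by Monte Carlo. A sketch (my own, with the illustrative choice R = 1), using the standard fact that for a ±1 prior under Gaussian noise the conditional mean is E[θ | z] = tanh(z/σ²):

```python
import numpy as np

rng = np.random.default_rng(3)
R = 1
snr = 2 ** (2 * R) - 1
sigma = 1.0 / np.sqrt(snr)

theta = rng.choice([-1.0, 1.0], size=1_000_000)   # equiprobable binary source
z = theta + sigma * rng.standard_normal(theta.size)

# Conditional mean E[theta | Z = z] = tanh(z / sigma^2) for a +/-1 prior.
est = np.tanh(z / sigma ** 2)
D_sp = np.mean((theta - est) ** 2)    # mmse(theta_1 | theta_1 + W/sqrt(snr))
D_gauss = 2.0 ** (-2 * R)
print(D_sp, D_gauss)                  # D_sp(R) falls below D_Gauss(R)
```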

SLIDE 30

Standard Source Coding (cont’d)

Illustration: Equiprobable Binary

P_θ = Unif({−1, 1}), i = 1, . . . , n

[Figure: MSE vs. rate R, comparing D_sp(R), D_Gauss(R), and D_Shannon(R)]

SLIDE 31

Example II: Quantized Compressed Sensing

X = Aθ + εW → {1, . . . , 2^{nR}} → θ̂, with A ∈ R^{n×d_n} and n/d_n → δ > 0

SLIDE 32

Example II: Quantized Compressed Sensing

X = Aθ + εW → {1, . . . , 2^{nR}} → θ̂, with A ∈ R^{n×d_n} and n/d_n → δ > 0

Surrogate channel: Z = Aθ + √(ε² + σ²) W′, where σ² = (1/snr) · E‖X‖²/n

SLIDE 33

Example II: Quantized Compressed Sensing

X = Aθ + εW → {1, . . . , 2^{nR}} → θ̂, with A ∈ R^{n×d_n} and n/d_n → δ > 0

Surrogate channel: Z = Aθ + √(ε² + σ²) W′, where σ² = (1/snr) · E‖X‖²/n

1) θ^T_AMP(Z) = T iterations of Approximate Message Passing [Donoho et al. ’09]
2) M^T_AMP(snr) = T iterations of the state evolution recursion [Bayati & Montanari ’11]

SLIDE 34

Example II: Quantized Compressed Sensing

X = Aθ + εW → {1, . . . , 2^{nR}} → θ̂, with A ∈ R^{n×d_n} and n/d_n → δ > 0

Surrogate channel: Z = Aθ + √(ε² + σ²) W′, where σ² = (1/snr) · E‖X‖²/n

1) θ^T_AMP(Z) = T iterations of Approximate Message Passing [Donoho et al. ’09]
2) M^T_AMP(snr) = T iterations of the state evolution recursion [Bayati & Montanari ’11]

Corollary

(1/d_n) E‖θ − θ^T_AMP(Y)‖² → M^T_AMP(2^{2R} − 1)
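The AMP / state-evolution pair on a Gaussian surrogate channel can be sketched in a few lines. Everything below is an illustration, not the talk's setup: a Bernoulli-Gaussian prior, a soft-threshold denoiser, and arbitrary sizes and noise level; the point is only that T iterations of AMP on the channel are tracked by T iterations of the scalar state-evolution recursion.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, T = 1000, 2000, 15           # measurements, signal dimension, iterations
delta = n / d
sparsity, lam = 0.05, 2.0          # Bernoulli-Gaussian prior, threshold multiplier
noise_std = 0.2                    # surrogate-channel noise std, sqrt(eps^2 + sigma^2)

def soft(u, t):
    # Soft-threshold denoiser eta(u; t).
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

# Problem instance: Z = A theta + noise (the Gaussian surrogate channel).
theta = (rng.random(d) < sparsity) * rng.standard_normal(d)
A = rng.standard_normal((n, d)) / np.sqrt(n)
z_obs = A @ theta + noise_std * rng.standard_normal(n)

# AMP with soft-threshold denoiser and Onsager correction.
x, r = np.zeros(d), z_obs.copy()
for _ in range(T):
    tau = np.sqrt(np.mean(r ** 2))             # empirical effective noise level
    pseudo = x + A.T @ r
    x = soft(pseudo, lam * tau)
    onsager = np.mean(np.abs(pseudo) > lam * tau) / delta
    r = z_obs - A @ x + onsager * r
mse_amp = np.mean((x - theta) ** 2)

# State evolution: the same recursion on scalar Monte Carlo averages.
tau2 = noise_std ** 2 + np.mean(theta ** 2) / delta
for _ in range(T):
    th = (rng.random(400_000) < sparsity) * rng.standard_normal(400_000)
    est = soft(th + np.sqrt(tau2) * rng.standard_normal(400_000), lam * np.sqrt(tau2))
    mse_se = np.mean((est - th) ** 2)
    tau2 = noise_std ** 2 + mse_se / delta

print(mse_amp, mse_se)   # empirical AMP MSE vs state-evolution prediction
```

The corollary then says that running the same AMP iterations on the spherical-code output Y gives asymptotically the state-evolution MSE evaluated at snr = 2^{2R} − 1.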

SLIDE 35

Summary

◮ Random spherical coding: strong equivalence between rate-R quantization error and AWGN with SNR 2^{2R} − 1
◮ Leads to a user-friendly tool for evaluating the performance of estimators from compressed data

Z = X + W, W ∼ N(0, (1/snr) I), with snr = 2^{2R} − 1; X lies on the sphere of radius √n and Y is its rate-R spherical-code representation.