
Generalization Error Analysis of Quantized Compressive Learning (NeurIPS 2019 Presentation)



  1. Generalization Error Analysis of Quantized Compressive Learning
     Xiaoyun Li, Ping Li
     Department of Statistics, Rutgers University; Cognitive Computing Lab, Baidu Research USA
     NeurIPS 2019

  2. Random Projection (RP) Method
     - Data matrix X ∈ R^{n×d}, normalized to unit norm (all samples lie on the unit sphere).
     - Save storage with k random projections: X_R = X R, where R ∈ R^{d×k} is a random matrix with i.i.d. N(0,1) entries, so X_R ∈ R^{n×k}.
     - J-L lemma: approximate distance preservation ⇒ many applications: clustering, classification, compressed sensing, dimensionality reduction, etc.
     - "Projection + quantization" for further storage savings: apply an (entry-wise) scalar quantization function Q(·), giving X_Q = Q(X_R).
     - More applications: MaxCut, SimHash, 1-bit compressive sensing, etc.
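
As a minimal illustration of the projection-plus-quantization pipeline above (a sketch, not the authors' code), the snippet below draws a Gaussian R, projects unit-norm rows of X to k dimensions, and applies an entry-wise uniform b-bit quantizer. The clipping range [-3, 3] and midpoint reconstruction are assumptions made for this example.

```python
import numpy as np

def random_project(X, k, rng=None):
    """Project rows of X (n x d, assumed unit-norm) to k dims with an i.i.d. N(0,1) matrix R."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    R = rng.standard_normal((d, k))              # R in R^{d x k}
    return X @ R, R                              # X_R in R^{n x k}

def uniform_quantize(Z, bits=3, lo=-3.0, hi=3.0):
    """Entry-wise uniform b-bit scalar quantizer: 2^b levels on [lo, hi], midpoint reconstruction."""
    M = 2 ** bits
    delta = (hi - lo) / M                        # bin width between borders
    idx = np.clip(np.floor((Z - lo) / delta), 0, M - 1)
    return lo + (idx + 0.5) * delta

# toy usage
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 512))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # samples on the unit sphere
X_R, R = random_project(X, k=64, rng=1)
X_Q = uniform_quantize(X_R, bits=3)              # "projection + quantization"
```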

  3. Compressive Learning + Quantization
     - We can apply learning models to the projected data (X_R, Y), where Y is the response or label ⇒ learning in the projected space S_R. This is called compressive learning.
     - Learning in the projected space has been shown to give satisfactory performance while substantially reducing computational cost, especially for high-dimensional data.
     - We go one step further: learning with quantized random projections (X_Q, Y) ⇒ learning in the quantized projected space S_Q. This is called quantized compressive learning.
     - A relatively new topic, but practical in applications that require data compression.
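
To make "learning in S_R versus S_Q" concrete, here is a hedged sketch using scikit-learn's LogisticRegression as one possible downstream model. Passing quantize=None trains an ordinary compressive learner; passing an entry-wise quantizer (e.g. the uniform_quantize sketch above) trains its quantized counterpart. All function names here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compressive_fit(X_train, y_train, k, quantize=None, seed=0):
    """Fit a classifier in the projected space S_R (quantize=None)
    or in the quantized projected space S_Q (quantize = entry-wise quantizer)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X_train.shape[1], k))   # shared projection matrix
    Z = X_train @ R
    if quantize is not None:
        Z = quantize(Z)
    clf = LogisticRegression(max_iter=1000).fit(Z, y_train)
    return clf, R

def compressive_predict(clf, R, X_test, quantize=None):
    """Project (and quantize) test points with the SAME R used at training time."""
    Z = X_test @ R
    if quantize is not None:
        Z = quantize(Z)
    return clf.predict(Z)
```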

  4. Paper Summary
     We provide generalization error bounds (for a test sample x ∈ X) for three quantized compressive learning models:
     - Nearest neighbor classifier
     - Linear classifier (logistic regression, linear SVM, etc.)
     - Linear regression
     Applications: we identify the factors that affect the generalization performance of each model, which gives recommendations on the choice of the quantizer Q in practice. Experiments are conducted to verify the theory.

  5. Background
     - A b-bit quantizer Q_b separates the real line into M = 2^b regions.
     - Distortion: D_{Q_b} = E[(Q_b(X) − X)^2], which is minimized by the Lloyd-Max (LM) quantizer.
     - Maximal gap of Q on an interval [a, b]: the largest gap between two consecutive borders of Q on [a, b].
     - The inner product between two samples x_1 and x_2 can be estimated through the estimator ρ̂_Q(x_1, x_2) = Q(x_1^T R) Q(R^T x_2) / k, which might be biased. We define the debiased variance of a quantizer Q as the variance of ρ̂_Q after debiasing.
     - Idea: connect the generalization of the three models to these inner product estimates.
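
The quantities above can all be estimated numerically. The sketch below (an illustration under the assumption of N(0,1) projected entries, not the authors' implementation) approximates the Lloyd-Max levels with 1-D Lloyd iterations on Gaussian samples, Monte Carlo estimates the distortion D_Q, and forms the (possibly biased) inner product estimate ρ̂_Q.

```python
import numpy as np

def lloyd_max_levels(bits, n_samples=200_000, iters=50, seed=0):
    """Approximate Lloyd-Max reconstruction levels for N(0,1) via 1-D Lloyd iterations on samples."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    levels = np.quantile(z, (np.arange(2 ** bits) + 0.5) / 2 ** bits)   # quantile init
    for _ in range(iters):
        borders = (levels[:-1] + levels[1:]) / 2          # nearest-level cell borders
        idx = np.digitize(z, borders)
        levels = np.array([z[idx == j].mean() for j in range(2 ** bits)])  # assumes no empty cell
    return levels

def quantize_to_levels(Z, levels):
    """Entry-wise quantizer defined by a sorted array of reconstruction levels."""
    borders = (levels[:-1] + levels[1:]) / 2
    return levels[np.digitize(Z, borders)]

def distortion(levels, n_samples=200_000, seed=1):
    """Monte Carlo estimate of D_Q = E[(Q(X) - X)^2] for X ~ N(0,1)."""
    z = np.random.default_rng(seed).standard_normal(n_samples)
    return np.mean((quantize_to_levels(z, levels) - z) ** 2)

def inner_product_estimate(x1, x2, R, levels):
    """rho_hat_Q(x1, x2) = Q(R^T x1)^T Q(R^T x2) / k  (may be biased)."""
    k = R.shape[1]
    q1 = quantize_to_levels(R.T @ x1, levels)
    q2 = quantize_to_levels(R.T @ x2, levels)
    return q1 @ q2 / k
```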

  6. Quantized Compressive 1-NN Classifier
     - We are interested in the risk of a classifier h: L(h) = E[1{h(x) ≠ y}]. Assume (x, y) ~ D, with conditional probability η(x) = P(y = 1 | x). The Bayes classifier h*(x) = 1{η(x) > 1/2} has the minimal risk.
     - The quantized compressive 1-NN classifier is h_Q(x) = y_Q^(1), where (x_Q^(1), y_Q^(1)) is the nearest neighbor of x in the quantized space S_Q together with its label.
     Theorem (Generalization of 1-NN Classifier). Suppose (x, y) is a test sample and Q is a uniform quantizer with gap Δ between borders and maximal gap g_Q. Under some technical conditions and with some constants c_1, c_2, with high probability,
       E_{X,Y}[L(h_Q(x))] ≤ 2 L(h*(x)) + c_1 ( Δ √((1+ω)/(1−ω)) + g_Q √(k/(1−ω)) ) + c_2 Δ^(k/(k+1)) k^(1/(k+1)) (ne)^(−1/(k+1)).
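
For intuition, a quantized compressive 1-NN prediction takes only a few lines. This sketch (illustrative, with a user-supplied entry-wise quantizer) exploits the unit-norm assumption, under which the nearest neighbor is the training point with the largest estimated inner product.

```python
import numpy as np

def quantized_1nn_predict(X_train_Q, y_train, x_test, R, quantize):
    """1-NN rule h_Q in the quantized projected space S_Q.
    X_train_Q: quantized projections Q(X R) of the training set (n x k).
    For unit-norm data, the largest quantized inner product identifies x_Q^(1)."""
    q_test = quantize(R.T @ x_test)      # quantized projection of the test point
    sims = X_train_Q @ q_test            # proportional to rho_hat_Q for each training sample
    return y_train[np.argmax(sims)]      # label y_Q^(1) of the quantized-space nearest neighbor
```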

  7. Quantized Compressive 1-NN Classifier: Asymptotics
     Theorem (Asymptotic Error of 1-NN Classifier). Let the cosine estimator be ρ̂_Q = Q(x_1^T R) Q(R^T x_2) / k, and assume E[ρ̂_Q(x_1, x_2)] = α ρ_{x_1,x_2} for some α > 0 and all x_1, x_2. As k → ∞, we have
       E_{X,Y,R}[L(h_Q(x))] ≤ E_{X,Y}[L(h_S(x))] + r_k,
       r_k = Σ_{i: x_i ∈ G} E[ Φ( √k (cos(x, x_i) − cos(x, x^(1))) / √( ξ²_{x,x_i} + ξ²_{x,x^(1)} − 2 Corr(ρ̂_Q(x, x_i), ρ̂_Q(x, x^(1))) ξ_{x,x_i} ξ_{x,x^(1)} ) ) ],
     where ξ²_{x,y}/k is the debiased variance of ρ̂_Q(x, y), G = X \ {x^(1)}, L(h_S(x)) is the risk of the data-space NN classifier, and Φ(·) is the CDF of N(0, 1).
     Takeaway: let x^(1) be the nearest neighbor of a test sample x. Under mild conditions, a smaller debiased variance around ρ = cos(x, x^(1)) leads to a smaller generalization error.
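
The Φ term inside r_k is, roughly, the probability that quantization noise makes some other training point x_i look closer than the true nearest neighbor x^(1). That probability can be checked directly by Monte Carlo over draws of R, as in the hedged sketch below (an illustration of the mechanism, not the theorem's exact expression; flip_probability and its arguments are hypothetical names).

```python
import numpy as np

def flip_probability(x, x_nn, x_i, k, quantize, n_rep=500, seed=0):
    """Monte Carlo estimate of P( rho_hat_Q(x, x_i) > rho_hat_Q(x, x_nn) ) over R,
    i.e. the event that quantization flips the nearest-neighbor ordering.
    The theorem approximates this kind of probability with the Gaussian CDF term in r_k."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    flips = 0
    for _ in range(n_rep):
        R = rng.standard_normal((d, k))
        qx = quantize(R.T @ x)
        if quantize(R.T @ x_i) @ qx > quantize(R.T @ x_nn) @ qx:
            flips += 1
    return flips / n_rep
```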

  8. Quantized Compressive Linear Classifier with (0,1)-Loss
     - H separates the space by a hyperplane: H(x) = 1{h^T x > 0}. ERM classifiers: Ĥ(x) = 1{ĥ^T x > 0} and Ĥ_Q(x) = 1{ĥ_Q^T Q(R^T x) > 0}.
     Theorem (Generalization of the linear classifier). Under some technical conditions, with probability (1 − 2δ),
       Pr[Ĥ_Q(x) ≠ y] ≤ L̂^(0,1)(S, ĥ) + (1/n) Σ_{i=1}^n f_{k,Q}(ρ_i) + C_{k,n,δ},
     where f_{k,Q}(ρ_i) = Φ(−√k |ρ_i| / ξ_{ρ_i}), with ρ_i the cosine between training sample x_i and the ERM classifier ĥ in the data space, and ξ²_{ρ_i}/k the debiased variance of ρ̂_Q = Q(x_1^T R) Q(R^T x_2)/k at ρ_i.
     Takeaway: a small debiased variance around ρ = 0 lowers the bound.
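
Given the cosines ρ_i between the training points and the data-space ERM hyperplane, the middle term of this bound is easy to evaluate once the debiased-variance profile of the chosen quantizer is known. The sketch below assumes a user-supplied function xi_of_rho (for example estimated by Monte Carlo, as in the earlier sketches); it is an illustration, not the paper's code.

```python
import numpy as np
from scipy.stats import norm

def misclassification_penalty(rho, k, xi_of_rho):
    """Middle term of the linear-classifier bound: (1/n) * sum_i Phi( -sqrt(k) |rho_i| / xi(rho_i) ).
    rho holds the cosines between each training sample and the data-space ERM hyperplane;
    xi_of_rho maps a cosine to the (scaled) debiased standard deviation of rho_hat_Q there."""
    rho = np.asarray(rho, dtype=float)
    xi = np.array([xi_of_rho(r) for r in rho])
    return np.mean(norm.cdf(-np.sqrt(k) * np.abs(rho) / xi))
```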

  9. Quantized Compressive Least Squares (QCLS) Regression
     - Fixed design: Y = Xβ + ε, with the x_i fixed and ε_i i.i.d. N(0, γ).
     - Population risks: L(β) = (1/n) E_Y[‖Y − Xβ‖²], L_Q(β_Q) = (1/n) E_{Y,R}[‖Y − Q(XR) β_Q‖²].
     - Empirical risks (given R): L̂(β) = (1/n) ‖Y − Xβ‖², L̂_Q(β_Q) = (1/n) ‖Y − (1/√k) Q(XR) β_Q‖².
     Theorem (Generalization of QCLS). Let β̂* = argmin_{β ∈ R^d} L̂(β) and β̂*_Q = argmin_{β ∈ R^k} L̂_Q(β). Let Σ = X^T X / k with k < n, and let D_Q be the distortion of Q. Then
       E_{Y,R}[L_Q(β̂*_Q)] − L(β*) ≤ γ k / n + (1/k) ‖β*‖²_Ω,   (1)
     where Ω = [ξ²_{2,2} / (1 − D_Q)² − 1] Σ + ((1 + D_Q)/(1 − D_Q)) I_d, and ‖w‖_Ω = √(w^T Ω w) is the Mahalanobis norm.
     Takeaway: a smaller distortion lowers the error bound.
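
A minimal QCLS fit matching the empirical risk L̂_Q above (ordinary least squares on the features Q(XR)/√k) might look like the following sketch; the quantizer is passed in as a callable and all names are illustrative.

```python
import numpy as np

def qcls_fit_predict(X_train, y_train, X_test, k, quantize, seed=0):
    """Quantized compressive least squares: OLS on the features Q(X R) / sqrt(k)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X_train.shape[1], k))      # i.i.d. N(0,1) projection matrix
    Z_train = quantize(X_train @ R) / np.sqrt(k)
    Z_test = quantize(X_test @ R) / np.sqrt(k)
    beta_q, *_ = np.linalg.lstsq(Z_train, y_train, rcond=None)
    return Z_test @ beta_q                              # predictions; compare to y_test via MSE
```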

  10. Implications
     - 1-NN classification: in most applications, choose the quantizer whose inner product estimator ρ̂_Q = Q(R^T x)^T Q(R^T y) / k has small debiased variance in the high-similarity region. ⇒ Normalizing the quantized random projections X_Q may help; see Xiaoyun Li and Ping Li, "Random Projections with Asymmetric Quantization", NeurIPS 2019.
     - Linear classification: choose the quantizer whose inner product estimator ρ̂_Q has small debiased variance around ρ = 0. ⇒ First choice: Lloyd-Max quantizer.
     - Linear regression: choose the quantizer with small distortion D_Q. ⇒ First choice: Lloyd-Max quantizer.
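
For the 1-NN recommendation, "normalizing the quantized random projections" can be implemented as a simple row normalization of X_Q before the nearest-neighbor search, as in this small sketch. This is one plausible realization of the suggestion; see the cited paper for the actual asymmetric-quantization estimators.

```python
import numpy as np

def row_normalize(X_Q, eps=1e-12):
    """Normalize quantized projections to unit norm before 1-NN (cosine-style matching)."""
    norms = np.linalg.norm(X_Q, axis=1, keepdims=True)
    return X_Q / np.maximum(norms, eps)
```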

  11. Experiments
     Dataset    | # samples | # features | # classes | Mean 1-NN ρ
     BASEHOCK   | 1993      | 4862       | 2         | 0.6
     orlraws10P | 100       | 10304      | 10        | 0.9
     Figure 1: Empirical debiased variance of three quantizers (LM b=1, LM b=3, uniform b=3) and the full-precision baseline, as a function of ρ (roughly 0.2 to 1). Mean 1-NN ρ is the estimated cos(x, x^(1)) from the training set.
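
A curve like Figure 1 can be reproduced approximately by Monte Carlo: fix two unit vectors with cosine ρ, repeatedly draw R, form ρ̂_Q, and take the variance after correcting the multiplicative bias. The sketch below is one such interpretation of "debiased variance" at a fixed k (rescaling by the empirical bias factor); it is illustrative, not the authors' exact estimator.

```python
import numpy as np

def debiased_variance(rho, k, quantize, n_rep=2000, seed=0):
    """Monte Carlo estimate of the debiased variance of rho_hat_Q at cosine rho (requires rho != 0).
    Uses two unit vectors in the plane with angle arccos(rho); only their 2-D span matters."""
    assert rho != 0, "debiasing by the empirical bias factor needs rho != 0"
    rng = np.random.default_rng(seed)
    x1 = np.array([1.0, 0.0])
    x2 = np.array([rho, np.sqrt(1.0 - rho ** 2)])
    est = np.empty(n_rep)
    for t in range(n_rep):
        R = rng.standard_normal((2, k))                      # i.i.d. N(0,1) projection
        est[t] = quantize(R.T @ x1) @ quantize(R.T @ x2) / k
    alpha = est.mean() / rho                                 # empirical multiplicative bias factor
    return np.var(est / alpha)                               # variance after debiasing
```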

  12. Quantized Compressive 1-NN Classification
     Claim: a smaller debiased variance around ρ = cos(x, x^(1)) is better.
     Figure 2: Quantized compressive 1-NN classification. Test accuracy vs. number of projections (2^6 to 2^12) on orlraws10P and BASEHOCK, comparing full-precision, LM b=1, LM b=3, and uniform b=3.
     The target ρ is around:
     - BASEHOCK: 0.6, where the 1-bit quantizer has the largest debiased variance.
     - orlraws10P: 0.9, where the 1-bit quantizer has the smallest debiased variance.
     ⇒ A 1-bit quantizer may generalize better than using more bits!

  13. Quantized Compressive Linear SVM
     Claim: a smaller debiased variance at ρ = 0 is better.
     Figure 3: Quantized compressive linear SVM. Test accuracy vs. number of projections (2^6 to 2^12) on BASEHOCK and orlraws10P, comparing full-precision, LM b=1, LM b=3, and uniform b=3.
     At ρ = 0, the red quantizer has much larger debiased variance than the others ⇒ lowest test accuracy on both datasets.

  14. Quantized Compressive Linear Regression
     Claim: a smaller distortion is better.
     Figure 4: Test MSE of QCLS vs. number of projections (200 to 1000). Blue: uniform quantizers. Red: Lloyd-Max (LM) quantizers.
     The LM quantizer always outperforms the uniform quantizer, and the order of the test errors agrees with the order of the distortions.
