Generalization Error of Generalized Linear Models in High Dimensions


  1. Generalization Error of Generalized Linear Models in High Dimensions. Melika Emami¹, Mojtaba Sahraee-Ardakan¹·², Parthe Pandit¹·², Sundeep Rangan³, Alyson K. Fletcher¹·². ¹ECE, UCLA; ²STAT, UCLA; ³ECE, NYU. ICML 2020.

  2. Overview
  • Generalization error: performance on new data
  • A fundamental question in modern systems: why is generalization error low despite over-parameterization? [BHMM19]
  • This work: exact calculation of the generalization error of GLMs
    – high-dimensional regime
    – double descent phenomenon

  3. Overview
  (Slide figure: a single-layer network diagram mapping $x$ through the weights $w^0$ and the link $\phi_{\rm out}(\cdot)$, with noise $d$, to the output $y$.)
  • Generalized linear models (GLMs): $y = \phi_{\rm out}(\langle x, w^0 \rangle, d)$
  • Regularized ERM: $\hat{w} = \mathrm{argmin}_w \; F_{\rm out}(y, Xw) + F_{\rm in}(w)$
  • Generalization error: $\mathcal{E}_{\rm ts} = \mathbb{E}\, f_{\rm ts}(y_{\rm ts}, \hat{y}_{\rm ts})$  (1)
    – Test sample: $(x_{\rm ts}, y_{\rm ts})$ with $y_{\rm ts} = \phi_{\rm out}(\langle x_{\rm ts}, w^0 \rangle, d_{\rm ts})$ and $\hat{y}_{\rm ts} = \phi(\langle x_{\rm ts}, \hat{w} \rangle)$ (see the sketch below)
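As a concrete instance of this pipeline, here is a minimal numpy sketch for the linear link $\phi_{\rm out}(p, d) = p + d$ with a ridge penalty; all dimensions, noise levels, and the regularization weight are illustrative choices, not values from the paper.

```python
import numpy as np

# GLM data: y = phi_out(<x, w0>, d) with the linear link phi_out(p, d) = p + d.
rng = np.random.default_rng(0)
n, p, sigma_d, lam = 500, 200, 0.1, 1e-2      # illustrative sizes/constants

w0 = rng.normal(size=p) / np.sqrt(p)          # true weight vector
X = rng.normal(size=(n, p))                   # i.i.d. Gaussian covariates
y = X @ w0 + sigma_d * rng.normal(size=n)     # noisy linear responses

# Regularized ERM with F_out = squared loss, F_in = ridge (closed form).
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Generalization error (1) with f_ts = squared loss, via Monte Carlo.
X_ts = rng.normal(size=(10_000, p))
y_ts = X_ts @ w0 + sigma_d * rng.normal(size=10_000)
print("test MSE ~", np.mean((y_ts - X_ts @ w_hat) ** 2))
```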

  4. Overview
  • Prior work:
    – understanding generalization in deep neural nets [BMM18, BHX19, BLLT19, NLB+18, ZBH+16, AS17]
    – linear models [MRSY19, DKT19, MM19, HMRT19, GAK20]
    – GLMs with uncorrelated features [BKM+19]
  • Our contribution: a procedure for characterizing the generalization error (1), covering
    – general test metrics, training losses, regularizers, and link functions
    – correlated covariates
    – train-test distributional mismatch
    – both over-parameterized and under-parameterized regimes

  5. Outline
  • Main result: scalar equivalent system; main theorem
  • Examples: linear regression; logistic regression; non-linear regression
  • Proof technique: multi-layer VAMP
  • Future directions

  6. Scalar Equivalent System
  (Slide figure: the high-dimensional "true vector system" $w^0 \mapsto y = \phi_{\rm out}(Xw^0, d)$ followed by the estimator, shown equivalent to a scalar system in which $\hat{W}$ is a denoiser applied to $W^0$ plus $\mathcal{N}(0, \tau)$ noise. High-dimensional: hard to analyze; scalar: easy to analyze.)
  • Regularized ERM: $\hat{w} = \mathrm{argmin}_w \; F_{\rm out}(y, Xw) + F_{\rm in}(w)$  (2)
  • Key tool: the approximate message passing (AMP) framework [DMM09, BM11, RSF19, FRS18, PSAR+20]
    – used as a constructive proof technique
    – performance of the estimates is governed by deterministic recursive equations: the state evolution (SE)

  7. Main Result
  (Slide figure: the same true-vector vs. scalar-equivalent diagram as slide 6.)

  Theorem (Generalization error of GLMs)
  (a) Under some regularity conditions on $f_{\rm ts}, \phi, \phi_{\rm out}$, the convergence to the scalar equivalent system is rigorous:
  $$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} f(w^0_i, \hat{w}_i) = \mathbb{E}\, f(W^0, \hat{W}) \quad \text{a.s.},$$
  where $\hat{W} = \mathrm{prox}_{f_{\rm in}/\gamma}(W^0 + Q)$ and $Q \sim \mathcal{N}(0, \tau)$ is independent of $W^0$.
  (b) Generalization error:
  $$\mathcal{E}_{\rm ts} = \mathbb{E}\, f_{\rm ts}\big(\phi_{\rm out}(Z_{\rm ts}, D), \phi(\hat{Z}_{\rm ts})\big), \qquad (Z_{\rm ts}, \hat{Z}_{\rm ts}) \sim \mathcal{N}(0_2, M),$$
  where $\tau$, $\gamma$, and $M$ are computed from the SE equations, and $D \perp (Z_{\rm ts}, \hat{Z}_{\rm ts})$. (A scalar Monte Carlo sketch of part (a) follows below.)
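Part (a) reduces any per-coordinate metric to a one-dimensional expectation. A minimal sketch of that reduction, assuming a ridge penalty $f_{\rm in}(w) = \lambda w^2/2$, whose prox has the closed form $v/(1 + \lambda/\gamma)$, and placeholder values for $\tau$ and $\gamma$, which in the actual theorem come out of the SE recursion:

```python
import numpy as np

# Scalar equivalent system: W_hat = prox_{f_in/gamma}(W0 + Q), Q ~ N(0, tau).
# tau and gamma below are placeholders; the theorem computes them via SE.
rng = np.random.default_rng(0)
tau, gamma, lam = 0.5, 2.0, 1e-2
N = 1_000_000

W0 = rng.normal(size=N)                   # samples of the true-weight law
Q = np.sqrt(tau) * rng.normal(size=N)     # Gaussian noise independent of W0
W_hat = (W0 + Q) / (1.0 + lam / gamma)    # prox of the ridge penalty

# Any metric f(W0, W_hat) is now a scalar expectation, e.g. squared error:
print("E[(W0 - W_hat)^2] ~", np.mean((W0 - W_hat) ** 2))
```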

  8. Example Setting
  • Train-test distributional mismatch:
    – $x_{\rm train} \sim \mathcal{N}(0, \Sigma_{\rm tr})$, $x_{\rm test} \sim \mathcal{N}(0, \Sigma_{\rm ts})$, with $\Sigma_{\rm tr}$ and $\Sigma_{\rm ts}$ commuting
    – i.i.d. log-normal eigenvalues (see the sampling sketch below):
      $$\begin{pmatrix} \log(S^2_{\rm tr}) \\ \log(S^2_{\rm ts}) \end{pmatrix} \overset{\rm i.i.d.}{\sim} \mathcal{N}\left( 0, \; \sigma \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \quad \forall i$$
  • Three cases:
    (i) uncorrelated features ($\sigma = 0$): $\Sigma_{\rm tr} = \Sigma_{\rm ts} = I$
    (ii) correlated features ($\sigma > 0$, $\rho = 1$): $\Sigma_{\rm tr} = \Sigma_{\rm ts} \neq I$
    (iii) mismatched features ($\sigma > 0$, $\rho < 1$): $\Sigma_{\rm tr} \neq \Sigma_{\rm ts}$
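A sketch of sampling from this eigenvalue model; reading the covariance as the correlation matrix scaled by $\sigma$ is my interpretation of the slide, and the values of $p$, $\sigma$, $\rho$ are illustrative.

```python
import numpy as np

# Per-coordinate (log s_tr^2, log s_ts^2) pairs: bivariate Gaussian with
# correlation rho, scaled by sigma (parameterization assumed from the slide).
rng = np.random.default_rng(0)
p, sigma, rho = 1000, 1.0, 0.5                  # rho < 1: mismatched case (iii)

cov = sigma * np.array([[1.0, rho], [rho, 1.0]])
logs = rng.normal(size=(p, 2)) @ np.linalg.cholesky(cov).T
s2_tr, s2_ts = np.exp(logs[:, 0]), np.exp(logs[:, 1])

# Sigma_tr and Sigma_ts share eigenvectors (they commute) but carry these
# different eigenvalues; sigma = 0 gives case (i), rho = 1 gives case (ii).
print("eigenvalue correlation ~", np.corrcoef(logs[:, 0], logs[:, 1])[0, 1])
```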

  9. Example: Linear Regression
  • Under-regularized linear regression:
    – $\phi_{\rm out}(p, d) = p + d$ with $d \sim \mathcal{N}(0, \sigma^2_d)$
    – MSE output loss
    – exhibits the double descent phenomenon, recovering the result of [HMRT19] (see the simulation sketch below)
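The double descent curve is easy to reproduce empirically: sweep the over-parameterization ratio $p/n$ with the minimum-norm least-squares solution, i.e. the ridgeless limit. This Monte Carlo sketch uses illustrative sizes and noise; it is not the paper's exact-asymptotics computation.

```python
import numpy as np

# Ridgeless linear regression across the interpolation threshold p = n.
rng = np.random.default_rng(0)
n, sigma_d, n_test = 100, 0.5, 2000

for p in [20, 50, 90, 100, 110, 200, 400]:
    w0 = rng.normal(size=p) / np.sqrt(p)
    X = rng.normal(size=(n, p))
    y = X @ w0 + sigma_d * rng.normal(size=n)
    w_hat = np.linalg.pinv(X) @ y            # OLS for p < n, min-norm for p > n
    X_ts = rng.normal(size=(n_test, p))
    y_ts = X_ts @ w0 + sigma_d * rng.normal(size=n_test)
    print(f"p/n = {p/n:.1f}   test MSE = {np.mean((y_ts - X_ts @ w_hat)**2):.3f}")
```

The test MSE spikes near $p/n = 1$ and descends again in the over-parameterized regime, which is the double descent shape the slide refers to.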

  10. Example: Logistic Regression
  • Logistic regression (see the simulation sketch below):
    – logistic output $P(y = 1) = 1/(1 + e^{-p})$
    – binary cross-entropy loss with $\ell_2$ regularization
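A minimal simulation of this setting: labels drawn from the logistic link, fit by $\ell_2$-regularized binary cross-entropy with plain gradient descent. Step size, iteration count, and $\lambda$ are illustrative; the paper evaluates the resulting error exactly rather than by simulation.

```python
import numpy as np

# Logistic GLM: P(y = 1 | x) = 1 / (1 + exp(-<x, w0>)).
rng = np.random.default_rng(0)
n, p, lam = 400, 100, 1e-2

w0 = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ w0)))).astype(float)

# Gradient descent on binary cross-entropy + (lam/2)||w||^2.
w_hat = np.zeros(p)
for _ in range(2000):
    z = 1 / (1 + np.exp(-(X @ w_hat)))       # model probabilities
    w_hat -= 0.5 * (X.T @ (z - y) / n + lam * w_hat)

# Generalization error with the 0/1 test metric.
X_ts = rng.normal(size=(5000, p))
y_ts = (rng.uniform(size=5000) < 1 / (1 + np.exp(-(X_ts @ w0)))).astype(float)
print("test 0/1 error ~", np.mean((X_ts @ w_hat > 0) != (y_ts > 0.5)))
```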

  11. Example: Non-linear Regression
  • Non-linear regression (see the simulation sketch below):
    – $\phi_{\rm out}(p, d) = \tanh(p) + d$ with $d \sim \mathcal{N}(0, \sigma^2_d)$
    – $f_{\rm out}(y, p) = \frac{1}{2\sigma^2_d}\,(y - \tanh(p))^2$
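A corresponding sketch for the tanh model. It minimizes the mean squared residual plus a ridge term directly, which matches $f_{\rm out}$ up to the constant factor $1/\sigma_d^2$; that constant only rescales the effective ridge weight. Sizes and the step size are illustrative.

```python
import numpy as np

# Non-linear GLM: y = tanh(<x, w0>) + d, d ~ N(0, sigma_d^2).
rng = np.random.default_rng(0)
n, p, sigma_d, lam = 400, 100, 0.1, 1e-3

w0 = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = np.tanh(X @ w0) + sigma_d * rng.normal(size=n)

# Gradient descent on mean (y - tanh(Xw))^2 / 2 + (lam/2)||w||^2.
w_hat = np.zeros(p)
for _ in range(3000):
    t = np.tanh(X @ w_hat)
    grad_p = (t - y) * (1 - t ** 2)          # d/dp of (y - tanh(p))^2 / 2
    w_hat -= 0.2 * (X.T @ grad_p / n + lam * w_hat)

print("residual MSE ~", np.mean((y - np.tanh(X @ w_hat)) ** 2))
```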

  12. Proof Technique: Multi-Layer Representation
  (Slide figure: the map $z^0_0 = w^0 \mapsto z^0_3 = y$ drawn as a chain through $\Sigma_{\rm tr}^{1/2}$, $U$, and $\phi_{\rm out}(\cdot)$.)
  • Represent the mapping $w^0 \mapsto y$ as a multi-layer network: $y = \phi_{\rm out}(Xw^0, d)$
  • Decompose the Gaussian training data $X$ with covariance $\Sigma_{\rm tr}$: $X = U\,\Sigma_{\rm tr}^{1/2}$, with $U$ i.i.d. Gaussian
  • Use the SVD of $U$ and the eigendecomposition of $\Sigma_{\rm tr}$:
    $$\Sigma_{\rm tr} = \frac{1}{p}\, V_0^T\, \mathrm{diag}(s^2_{\rm tr})\, V_0, \qquad U = V_2\, S_{\rm mp}\, V_1$$
  • $V_0, V_1, V_2$: Haar-distributed
  • $S_{\rm mp}$: singular values of $U$, converging in distribution to the Marchenko-Pastur law (see the numerical check below)
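The Marchenko-Pastur claim in the last bullet is easy to check numerically: the singular values of $U/\sqrt{n}$ for an i.i.d. Gaussian $U$ concentrate on the interval $[1 - \sqrt{p/n},\, 1 + \sqrt{p/n}]$. A small sketch with illustrative dimensions:

```python
import numpy as np

# Singular values of a scaled i.i.d. Gaussian matrix vs. the MP bulk edges.
rng = np.random.default_rng(0)
n, p = 2000, 1000

s = np.linalg.svd(rng.normal(size=(n, p)) / np.sqrt(n), compute_uv=False)
beta = p / n
print("empirical edges:", s.min(), s.max())
print("MP edges:       ", 1 - np.sqrt(beta), 1 + np.sqrt(beta))
```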

  13. Proof Technique: Multi-Layer VAMP
  (Slide figure: the multi-layer chain $z^0_0 = w^0 \to V_0 \to S_{\rm tr} \to V_1 \to S_{\rm mp} \to V_2 \to \phi_{\rm out}(\cdot) \to z^0_3 = y$, with intermediate signal $p^0_2 = Xw^0$.)
  • An algorithm for solving inference problems in deep neural networks
  • Similar in structure to the ADMM algorithm for optimization
  • Statistical guarantees: the joint distribution of $(W^0, \hat{W})$ and the other hidden signals

  14. Proof Technique: Generalization Error
  (Slide figure: the same multi-layer chain with the test-side singular values $S_{\rm ts}$ in place of $S_{\rm tr}$.)
  • ML-VAMP gives the joint distribution of $(W^0, \hat{W})$ (part (a) of the theorem)
  • Given test data: $x_{\rm ts}^T = u^T\, \mathrm{diag}(s_{\rm ts})\, V_0$
  • Find the joint distribution of $(P^0_2, \hat{P}_2)$ for the test data (part (b) of the theorem): $(P^0_2, \hat{P}_2) \sim \mathcal{N}(0_2, M)$
  • Obtain the generalization error: $\mathcal{E}_{\rm ts} = \mathbb{E}\, f_{\rm ts}\big(\phi_{\rm out}(P^0_2, D), \phi(\hat{P}_2)\big)$

  15. Future Directions
  • Generalize the results to:
    – non-Gaussian covariates
    – multitask GLMs, using multi-layer matrix-valued VAMP
    – deeper models such as two-layer neural networks
    – non-asymptotic regimes
  • Use the results to obtain generalization errors in reproducing kernel Hilbert spaces, such as the NTK space

  16. References
  [AS17] Madhu S. Advani and Andrew M. Saxe. High-dimensional dynamics of generalization error in neural networks. arXiv:1710.03667, 2017.
  [BHMM19] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. National Academy of Sciences, 116(32):15849–15854, 2019.
  [BHX19] Mikhail Belkin, Daniel Hsu, and Ji Xu. Two models of double descent for weak features. arXiv:1903.07571, 2019.
  [BKM+19] Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. Optimal errors and phase transitions in high-dimensional generalized linear models. Proc. National Academy of Sciences, 116(12):5451–5460, March 2019.
  [BLLT19] Peter L. Bartlett, Philip M. Long, Gábor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. arXiv:1906.11300, 2019.
  [BM11] M. Bayati and A. Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inform. Theory, 57(2):764–785, February 2011.
  [BMM18] Mikhail Belkin, Siyuan Ma, and Soumik Mandal. To understand deep learning we need to understand kernel learning. arXiv:1802.01396, 2018.
  [DKT19] Zeyu Deng, Abla Kammoun, and Christos Thrampoulidis. A model of double descent for high-dimensional binary linear classification. arXiv:1911.05822, 2019.
  [DMM09] David L. Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proc. National Academy of Sciences, 106(45):18914–18919, 2009.
  [FRS18] Alyson K. Fletcher, Sundeep Rangan, and P. Schniter. Inference in deep networks in high dimensions. Proc. IEEE Int. Symp. Information Theory, 2018.
  [GAK20] Cédric Gerbelot, Alia Abbara, and Florent Krzakala. Asymptotic errors for convex penalized linear regression beyond Gaussian matrices. arXiv:2002.04372, 2020.
  [HMRT19] Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. Surprises in high-dimensional ridgeless least squares interpolation. arXiv:1903.08560, 2019.
  [MM19] Song Mei and Andrea Montanari. The generalization error of random features regression: Precise asymptotics and double descent curve. arXiv:1908.05355, 2019.
