Generalization Error of Generalized Linear Models in High Dimensions


  1. Generalization Error of Generalized Linear Models in High Dimensions. Melika Emami¹, Mojtaba Sahraee-Ardakan¹·², Parthe Pandit¹·², Sundeep Rangan³, Alyson K. Fletcher¹·². ¹ECE, UCLA; ²STAT, UCLA; ³ECE, NYU. ICML 2020.

  2. Overview
  • Generalization error: performance on new data
  • A fundamental question in modern systems: why is generalization error low despite over-parameterization? [BHMM19]
  • This work: exact calculation of the generalization error of GLMs
    – high-dimensional regime
    – double descent phenomenon

  3. Overview
  (Slide figure: a single-layer network diagram mapping $x$ through the weights $w^0$ and the link $\phi_{\rm out}(\cdot)$, with noise $d$, to the output $y$.)
  • Generalized linear models (GLMs): $y = \phi_{\rm out}(\langle x, w^0 \rangle, d)$
  • Regularized ERM: $\hat{w} = \mathrm{argmin}_w \; F_{\rm out}(y, Xw) + F_{\rm in}(w)$
  • Generalization error: $\mathcal{E}_{\rm ts} = \mathbb{E}\, f_{\rm ts}(y_{\rm ts}, \hat{y}_{\rm ts})$  (1)
    – Test sample: $(x_{\rm ts}, y_{\rm ts})$ with $y_{\rm ts} = \phi_{\rm out}(\langle x_{\rm ts}, w^0 \rangle, d_{\rm ts})$ and $\hat{y}_{\rm ts} = \phi(\langle x_{\rm ts}, \hat{w} \rangle)$ (see the sketch below)
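As a concrete instance of this pipeline, here is a minimal numpy sketch for the linear link $\phi_{\rm out}(p, d) = p + d$ with a ridge penalty; all dimensions, noise levels, and the regularization weight are illustrative choices, not values from the paper.

```python
import numpy as np

# GLM data: y = phi_out(<x, w0>, d) with the linear link phi_out(p, d) = p + d.
rng = np.random.default_rng(0)
n, p, sigma_d, lam = 500, 200, 0.1, 1e-2      # illustrative sizes/constants

w0 = rng.normal(size=p) / np.sqrt(p)          # true weight vector
X = rng.normal(size=(n, p))                   # i.i.d. Gaussian covariates
y = X @ w0 + sigma_d * rng.normal(size=n)     # noisy linear responses

# Regularized ERM with F_out = squared loss, F_in = ridge (closed form).
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Generalization error (1) with f_ts = squared loss, via Monte Carlo.
X_ts = rng.normal(size=(10_000, p))
y_ts = X_ts @ w0 + sigma_d * rng.normal(size=10_000)
print("test MSE ~", np.mean((y_ts - X_ts @ w_hat) ** 2))
```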

  4. Overview
  • Prior work:
    – understanding generalization in deep neural nets [BMM18, BHX19, BLLT19, NLB+18, ZBH+16, AS17]
    – linear models [MRSY19, DKT19, MM19, HMRT19, GAK20]
    – GLMs with uncorrelated features [BKM+19]
  • Our contribution: a procedure for characterizing the generalization error (1), covering
    – general test metrics, training losses, regularizers, and link functions
    – correlated covariates
    – train-test distributional mismatch
    – both over-parameterized and under-parameterized regimes

  5. Outline
  • Main result: scalar equivalent system; main theorem
  • Examples: linear regression; logistic regression; non-linear regression
  • Proof technique: multi-layer VAMP
  • Future directions

  6. Scalar Equivalent System
  (Slide figure: the high-dimensional "true vector system" $w^0 \mapsto y = \phi_{\rm out}(Xw^0, d)$ followed by the estimator, shown equivalent to a scalar system in which $\hat{W}$ is a denoiser applied to $W^0$ plus $\mathcal{N}(0, \tau)$ noise. High-dimensional: hard to analyze; scalar: easy to analyze.)
  • Regularized ERM: $\hat{w} = \mathrm{argmin}_w \; F_{\rm out}(y, Xw) + F_{\rm in}(w)$  (2)
  • Key tool: the approximate message passing (AMP) framework [DMM09, BM11, RSF19, FRS18, PSAR+20]
    – used as a constructive proof technique
    – performance of the estimates is governed by deterministic recursive equations: the state evolution (SE)

  7. Main Result
  (Slide figure: the same true-vector vs. scalar-equivalent diagram as slide 6.)

  Theorem (Generalization error of GLMs)
  (a) Under some regularity conditions on $f_{\rm ts}, \phi, \phi_{\rm out}$, the convergence to the scalar equivalent system is rigorous:
  $$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} f(w^0_i, \hat{w}_i) = \mathbb{E}\, f(W^0, \hat{W}) \quad \text{a.s.},$$
  where $\hat{W} = \mathrm{prox}_{f_{\rm in}/\gamma}(W^0 + Q)$ and $Q \sim \mathcal{N}(0, \tau)$ is independent of $W^0$.
  (b) Generalization error:
  $$\mathcal{E}_{\rm ts} = \mathbb{E}\, f_{\rm ts}\big(\phi_{\rm out}(Z_{\rm ts}, D), \phi(\hat{Z}_{\rm ts})\big), \qquad (Z_{\rm ts}, \hat{Z}_{\rm ts}) \sim \mathcal{N}(0_2, M),$$
  where $\tau$, $\gamma$, and $M$ are computed from the SE equations, and $D \perp (Z_{\rm ts}, \hat{Z}_{\rm ts})$. (A scalar Monte Carlo sketch of part (a) follows below.)
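Part (a) reduces any per-coordinate metric to a one-dimensional expectation. A minimal sketch of that reduction, assuming a ridge penalty $f_{\rm in}(w) = \lambda w^2/2$, whose prox has the closed form $v/(1 + \lambda/\gamma)$, and placeholder values for $\tau$ and $\gamma$, which in the actual theorem come out of the SE recursion:

```python
import numpy as np

# Scalar equivalent system: W_hat = prox_{f_in/gamma}(W0 + Q), Q ~ N(0, tau).
# tau and gamma below are placeholders; the theorem computes them via SE.
rng = np.random.default_rng(0)
tau, gamma, lam = 0.5, 2.0, 1e-2
N = 1_000_000

W0 = rng.normal(size=N)                   # samples of the true-weight law
Q = np.sqrt(tau) * rng.normal(size=N)     # Gaussian noise independent of W0
W_hat = (W0 + Q) / (1.0 + lam / gamma)    # prox of the ridge penalty

# Any metric f(W0, W_hat) is now a scalar expectation, e.g. squared error:
print("E[(W0 - W_hat)^2] ~", np.mean((W0 - W_hat) ** 2))
```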

  8. Example Setting
  • Train-test distributional mismatch:
    – $x_{\rm train} \sim \mathcal{N}(0, \Sigma_{\rm tr})$, $x_{\rm test} \sim \mathcal{N}(0, \Sigma_{\rm ts})$, with $\Sigma_{\rm tr}$ and $\Sigma_{\rm ts}$ commuting
    – i.i.d. log-normal eigenvalues (see the sampling sketch below):
      $$\begin{pmatrix} \log(S^2_{\rm tr}) \\ \log(S^2_{\rm ts}) \end{pmatrix} \overset{\rm i.i.d.}{\sim} \mathcal{N}\left( 0, \; \sigma \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \quad \forall i$$
  • Three cases:
    (i) uncorrelated features ($\sigma = 0$): $\Sigma_{\rm tr} = \Sigma_{\rm ts} = I$
    (ii) correlated features ($\sigma > 0$, $\rho = 1$): $\Sigma_{\rm tr} = \Sigma_{\rm ts} \neq I$
    (iii) mismatched features ($\sigma > 0$, $\rho < 1$): $\Sigma_{\rm tr} \neq \Sigma_{\rm ts}$
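A sketch of sampling from this eigenvalue model; reading the covariance as the correlation matrix scaled by $\sigma$ is my interpretation of the slide, and the values of $p$, $\sigma$, $\rho$ are illustrative.

```python
import numpy as np

# Per-coordinate (log s_tr^2, log s_ts^2) pairs: bivariate Gaussian with
# correlation rho, scaled by sigma (parameterization assumed from the slide).
rng = np.random.default_rng(0)
p, sigma, rho = 1000, 1.0, 0.5                  # rho < 1: mismatched case (iii)

cov = sigma * np.array([[1.0, rho], [rho, 1.0]])
logs = rng.normal(size=(p, 2)) @ np.linalg.cholesky(cov).T
s2_tr, s2_ts = np.exp(logs[:, 0]), np.exp(logs[:, 1])

# Sigma_tr and Sigma_ts share eigenvectors (they commute) but carry these
# different eigenvalues; sigma = 0 gives case (i), rho = 1 gives case (ii).
print("eigenvalue correlation ~", np.corrcoef(logs[:, 0], logs[:, 1])[0, 1])
```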

  9. Example: Linear Regression
  • Under-regularized linear regression:
    – $\phi_{\rm out}(p, d) = p + d$ with $d \sim \mathcal{N}(0, \sigma^2_d)$
    – MSE output loss
    – exhibits the double descent phenomenon, recovering the result of [HMRT19] (see the simulation sketch below)
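The double descent curve is easy to reproduce empirically: sweep the over-parameterization ratio $p/n$ with the minimum-norm least-squares solution, i.e. the ridgeless limit. This Monte Carlo sketch uses illustrative sizes and noise; it is not the paper's exact-asymptotics computation.

```python
import numpy as np

# Ridgeless linear regression across the interpolation threshold p = n.
rng = np.random.default_rng(0)
n, sigma_d, n_test = 100, 0.5, 2000

for p in [20, 50, 90, 100, 110, 200, 400]:
    w0 = rng.normal(size=p) / np.sqrt(p)
    X = rng.normal(size=(n, p))
    y = X @ w0 + sigma_d * rng.normal(size=n)
    w_hat = np.linalg.pinv(X) @ y            # OLS for p < n, min-norm for p > n
    X_ts = rng.normal(size=(n_test, p))
    y_ts = X_ts @ w0 + sigma_d * rng.normal(size=n_test)
    print(f"p/n = {p/n:.1f}   test MSE = {np.mean((y_ts - X_ts @ w_hat)**2):.3f}")
```

The test MSE spikes near $p/n = 1$ and descends again in the over-parameterized regime, which is the double descent shape the slide refers to.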

  10. Example: Logistic Regression
  • Logistic regression (see the simulation sketch below):
    – logistic output $P(y = 1) = 1/(1 + e^{-p})$
    – binary cross-entropy loss with $\ell_2$ regularization
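A minimal simulation of this setting: labels drawn from the logistic link, fit by $\ell_2$-regularized binary cross-entropy with plain gradient descent. Step size, iteration count, and $\lambda$ are illustrative; the paper evaluates the resulting error exactly rather than by simulation.

```python
import numpy as np

# Logistic GLM: P(y = 1 | x) = 1 / (1 + exp(-<x, w0>)).
rng = np.random.default_rng(0)
n, p, lam = 400, 100, 1e-2

w0 = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ w0)))).astype(float)

# Gradient descent on binary cross-entropy + (lam/2)||w||^2.
w_hat = np.zeros(p)
for _ in range(2000):
    z = 1 / (1 + np.exp(-(X @ w_hat)))       # model probabilities
    w_hat -= 0.5 * (X.T @ (z - y) / n + lam * w_hat)

# Generalization error with the 0/1 test metric.
X_ts = rng.normal(size=(5000, p))
y_ts = (rng.uniform(size=5000) < 1 / (1 + np.exp(-(X_ts @ w0)))).astype(float)
print("test 0/1 error ~", np.mean((X_ts @ w_hat > 0) != (y_ts > 0.5)))
```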

  11. Example: Non-linear Regression
  • Non-linear regression (see the simulation sketch below):
    – $\phi_{\rm out}(p, d) = \tanh(p) + d$ with $d \sim \mathcal{N}(0, \sigma^2_d)$
    – $f_{\rm out}(y, p) = \frac{1}{2\sigma^2_d}\,(y - \tanh(p))^2$
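A corresponding sketch for the tanh model. It minimizes the mean squared residual plus a ridge term directly, which matches $f_{\rm out}$ up to the constant factor $1/\sigma_d^2$; that constant only rescales the effective ridge weight. Sizes and the step size are illustrative.

```python
import numpy as np

# Non-linear GLM: y = tanh(<x, w0>) + d, d ~ N(0, sigma_d^2).
rng = np.random.default_rng(0)
n, p, sigma_d, lam = 400, 100, 0.1, 1e-3

w0 = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = np.tanh(X @ w0) + sigma_d * rng.normal(size=n)

# Gradient descent on mean (y - tanh(Xw))^2 / 2 + (lam/2)||w||^2.
w_hat = np.zeros(p)
for _ in range(3000):
    t = np.tanh(X @ w_hat)
    grad_p = (t - y) * (1 - t ** 2)          # d/dp of (y - tanh(p))^2 / 2
    w_hat -= 0.2 * (X.T @ grad_p / n + lam * w_hat)

print("residual MSE ~", np.mean((y - np.tanh(X @ w_hat)) ** 2))
```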

  12. Proof Technique: Multi-Layer Representation
  (Slide figure: the map $z^0_0 = w^0 \mapsto z^0_3 = y$ drawn as a chain through $\Sigma_{\rm tr}^{1/2}$, $U$, and $\phi_{\rm out}(\cdot)$.)
  • Represent the mapping $w^0 \mapsto y$ as a multi-layer network: $y = \phi_{\rm out}(Xw^0, d)$
  • Decompose the Gaussian training data $X$ with covariance $\Sigma_{\rm tr}$: $X = U\,\Sigma_{\rm tr}^{1/2}$, with $U$ i.i.d. Gaussian
  • Use the SVD of $U$ and the eigendecomposition of $\Sigma_{\rm tr}$:
    $$\Sigma_{\rm tr} = \frac{1}{p}\, V_0^T\, \mathrm{diag}(s^2_{\rm tr})\, V_0, \qquad U = V_2\, S_{\rm mp}\, V_1$$
  • $V_0, V_1, V_2$: Haar-distributed
  • $S_{\rm mp}$: singular values of $U$, converging in distribution to the Marchenko-Pastur law (see the numerical check below)
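The Marchenko-Pastur claim in the last bullet is easy to check numerically: the singular values of $U/\sqrt{n}$ for an i.i.d. Gaussian $U$ concentrate on the interval $[1 - \sqrt{p/n},\, 1 + \sqrt{p/n}]$. A small sketch with illustrative dimensions:

```python
import numpy as np

# Singular values of a scaled i.i.d. Gaussian matrix vs. the MP bulk edges.
rng = np.random.default_rng(0)
n, p = 2000, 1000

s = np.linalg.svd(rng.normal(size=(n, p)) / np.sqrt(n), compute_uv=False)
beta = p / n
print("empirical edges:", s.min(), s.max())
print("MP edges:       ", 1 - np.sqrt(beta), 1 + np.sqrt(beta))
```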

  13. Proof Technique: Multi-Layer VAMP
  (Slide figure: the multi-layer chain $z^0_0 = w^0 \to V_0 \to S_{\rm tr} \to V_1 \to S_{\rm mp} \to V_2 \to \phi_{\rm out}(\cdot) \to z^0_3 = y$, with intermediate signal $p^0_2 = Xw^0$.)
  • An algorithm for solving inference problems in deep neural networks
  • Similar in structure to the ADMM algorithm for optimization
  • Statistical guarantees: the joint distribution of $(W^0, \hat{W})$ and the other hidden signals

  14. Proof Technique: Generalization Error
  (Slide figure: the same multi-layer chain with the test-side singular values $S_{\rm ts}$ in place of $S_{\rm tr}$.)
  • ML-VAMP gives the joint distribution of $(W^0, \hat{W})$ (part (a) of the theorem)
  • Given test data: $x_{\rm ts}^T = u^T\, \mathrm{diag}(s_{\rm ts})\, V_0$
  • Find the joint distribution of $(P^0_2, \hat{P}_2)$ for the test data (part (b) of the theorem): $(P^0_2, \hat{P}_2) \sim \mathcal{N}(0_2, M)$
  • Obtain the generalization error: $\mathcal{E}_{\rm ts} = \mathbb{E}\, f_{\rm ts}\big(\phi_{\rm out}(P^0_2, D), \phi(\hat{P}_2)\big)$

  15. Future Directions
  • Generalize the results to:
    – non-Gaussian covariates
    – multitask GLMs, using multi-layer matrix-valued VAMP
    – deeper models such as two-layer neural networks
    – non-asymptotic regimes
  • Use the results to obtain generalization errors in reproducing kernel Hilbert spaces, such as the NTK space

  16. References
  [AS17] Madhu S. Advani and Andrew M. Saxe. High-dimensional dynamics of generalization error in neural networks. arXiv:1710.03667, 2017.
  [BHMM19] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. National Academy of Sciences, 116(32):15849–15854, 2019.
  [BHX19] Mikhail Belkin, Daniel Hsu, and Ji Xu. Two models of double descent for weak features. arXiv:1903.07571, 2019.
  [BKM+19] Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. Optimal errors and phase transitions in high-dimensional generalized linear models. Proc. National Academy of Sciences, 116(12):5451–5460, March 2019.
  [BLLT19] Peter L. Bartlett, Philip M. Long, Gábor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. arXiv:1906.11300, 2019.
  [BM11] M. Bayati and A. Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inform. Theory, 57(2):764–785, February 2011.
  [BMM18] Mikhail Belkin, Siyuan Ma, and Soumik Mandal. To understand deep learning we need to understand kernel learning. arXiv:1802.01396, 2018.
  [DKT19] Zeyu Deng, Abla Kammoun, and Christos Thrampoulidis. A model of double descent for high-dimensional binary linear classification. arXiv:1911.05822, 2019.
  [DMM09] David L. Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proc. National Academy of Sciences, 106(45):18914–18919, 2009.
  [FRS18] Alyson K. Fletcher, Sundeep Rangan, and P. Schniter. Inference in deep networks in high dimensions. Proc. IEEE Int. Symp. Information Theory, 2018.
  [GAK20] Cédric Gerbelot, Alia Abbara, and Florent Krzakala. Asymptotic errors for convex penalized linear regression beyond Gaussian matrices. arXiv:2002.04372, 2020.
  [HMRT19] Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. Surprises in high-dimensional ridgeless least squares interpolation. arXiv:1903.08560, 2019.
  [MM19] Song Mei and Andrea Montanari. The generalization error of random features regression: Precise asymptotics and double descent curve. arXiv:1908.05355, 2019.
