  1. On Coresets for Regularized Regression. ICML 2020. Rachit Chhaya, Anirban Dasgupta and Supratim Shit, IIT Gandhinagar. June 15, 2020.

  2. Motivation
  ◮ Coresets: small summaries of data that serve as a proxy for the original data with respect to some cost function
  ◮ [ACW17] showed that ridge regression admits coresets smaller than those for least squares regression
  ◮ Coresets for regularized regression under a general p-norm have not been studied

  3. Our Contributions
  ◮ No coreset for $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r + \lambda \|x\|_q^s$ with $r \neq s$ can be smaller than a coreset for $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r$
  ◮ This implies that no coreset for Lasso can be smaller than one for least squares regression
  ◮ We introduce the modified Lasso and build a smaller coreset for it
  ◮ Coresets for $\ell_p$ regression with $\ell_p$ regularization, with an extension to multiple response regression
  ◮ Empirical evaluations

  4. Coresets
  Definition. For $\epsilon > 0$, a dataset $A$, a non-negative function $f$ and a query space $Q$, $C$ is an $\epsilon$-coreset of $A$ if $\forall q \in Q$, $|f_q(A) - f_q(C)| \leq \epsilon\, f_q(A)$.
  We construct coresets that are rescaled subsamples of the original data (a toy numeric check follows).
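To make the definition concrete, here is a minimal numeric sketch (my own illustration, not from the slides): a uniform, rescaled subsample checked against the relative-error condition for a few random queries. The paper's constructions use importance sampling instead, which is covered next.

```python
import numpy as np

# Toy cost: f_q(A) = ||Ax - b||_2^2 with query q = x.
rng = np.random.default_rng(0)
n, d, m = 20_000, 5, 800
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Rescaled subsample: scaling rows by sqrt(n/m) makes the subsampled
# cost an unbiased estimate of the full cost.
idx = rng.choice(n, size=m, replace=False)
C_A, C_b = np.sqrt(n / m) * A[idx], np.sqrt(n / m) * b[idx]

for _ in range(3):                       # a few random queries q = x
    x = rng.standard_normal(d)
    fA = np.linalg.norm(A @ x - b) ** 2
    fC = np.linalg.norm(C_A @ x - C_b) ** 2
    print(abs(fA - fC) / fA)             # empirical epsilon for this query
```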

  5. Sensitivity [LS10]
  Definition. The sensitivity of the $i$-th point of a dataset $X$ for a function $f$ and query space $Q$ is $s_i = \sup_{q \in Q} \frac{f_q(x_i)}{\sum_{x' \in X} f_q(x')}$.
  ◮ Captures the highest fractional contribution of a point to the cost function
  ◮ Can be used to create coresets; the coreset size is a function of the sum of the sensitivities and the dimension of the query space
  ◮ Upper bounds on the sensitivities suffice [FL11, BFL16] (see the sampling sketch below)
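A minimal sketch of sensitivity-based sampling for the $p = 2$ case, where the leverage scores of $[A\ b]$ are standard upper bounds on the sensitivities; the sample size and the use of plain leverage scores are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def leverage_scores(M):
    """Row leverage scores of M via a thin QR factorization."""
    Q, _ = np.linalg.qr(M)
    return np.sum(Q * Q, axis=1)

def sensitivity_sample(A, b, m, rng):
    """Sample m rows with probability proportional to sensitivity upper
    bounds and rescale so the sampled cost is unbiased."""
    s = leverage_scores(np.column_stack([A, b]))   # upper bounds for p = 2
    p = s / s.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])                  # rescaling weights
    return w[:, None] * A[idx], w * b[idx]

rng = np.random.default_rng(1)
n, d = 20_000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

A_c, b_c = sensitivity_sample(A, b, m=500, rng=rng)
x_c = np.linalg.lstsq(A_c, b_c, rcond=None)[0]     # solve on the coreset
x_f = np.linalg.lstsq(A, b, rcond=None)[0]         # solve on the full data
ratio = (np.linalg.norm(A @ x_c - b) / np.linalg.norm(A @ x_f - b)) ** 2
print("cost of coreset solution / optimal cost:", ratio)
```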

  6. Coresets for Regularized Regression
  ◮ Regularization is important to prevent overfitting, improve numerical stability, induce sparsity, etc.
  We are interested in the following problem: for $\lambda > 0$,
  $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r + \lambda \|x\|_q^s$
  for $p, q \geq 1$ and $r, s > 0$.
  A coreset for this problem is a pair $(\tilde{A}, \tilde{b})$ such that $\forall x \in \mathbb{R}^d$ and $\forall \lambda > 0$,
  $\|\tilde{A}x - \tilde{b}\|_p^r + \lambda \|x\|_q^s \in (1 \pm \epsilon)\,(\|Ax - b\|_p^r + \lambda \|x\|_q^s)$
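Note that the regularizer $\lambda \|x\|_q^s$ depends only on $x$, so it is computed exactly on both sides of the guarantee; only the data term needs approximating. A short numeric illustration of this (my own, for the ridge case $p = r = q = s = 2$, with a uniform rescaled subsample standing in for $(\tilde{A}, \tilde{b})$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 10_000, 8, 500
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Uniform rescaled subsample standing in for (A~, b~) (illustrative only).
idx = rng.choice(n, size=m, replace=False)
At, bt = np.sqrt(n / m) * A[idx], np.sqrt(n / m) * b[idx]

x = rng.standard_normal(d)
for lam in [1.0, 1e3, 1e5]:
    reg = lam * np.linalg.norm(x) ** 2   # exact on both sides
    full = np.linalg.norm(A @ x - b) ** 2 + reg
    core = np.linalg.norm(At @ x - bt) ** 2 + reg
    # As lam grows, the exact regularizer dominates and the error shrinks.
    print(lam, abs(full - core) / full)
```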

  7. Main Question
  ◮ Coresets for unregularized regression also work for the regularized counterpart
  ◮ [ACW17] showed a coreset for ridge regression, built from ridge leverage scores, that is smaller than coresets for least squares regression
  ◮ Intuition: regularization imposes a constraint on the solution space
  ◮ Can we expect every regularized problem to have a smaller coreset than its unregularized version, e.g. Lasso?

  8. Our Main Result
  Theorem. Given a matrix $A \in \mathbb{R}^{n \times d}$ and $\lambda > 0$, any coreset for the problem $\|Ax\|_p^r + \lambda \|x\|_q^s$, where $r \neq s$, $p, q \geq 1$ and $r, s > 0$, is also a coreset for $\|Ax\|_p^r$.
  Implication: smaller coresets for the regularized problem cannot be obtained when $r \neq s$.
  The popular Lasso problem falls in this category and hence does not have a coreset smaller than one for least squares regression.
  The proof is by contradiction.

  9. Modified Lasso
  $\min_{x \in \mathbb{R}^d} \|Ax - b\|_2^2 + \lambda \|x\|_1^2$
  ◮ The constrained version is the same as Lasso
  ◮ Empirically shown to induce sparsity, like Lasso
  ◮ Admits a smaller coreset than least squares regression (a solver sketch follows)
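A minimal solver sketch I wrote for illustration (not the paper's algorithm): the squared $\ell_1$ penalty is nonsmooth, but the standard split $x = u - v$ with $u, v \geq 0$ makes the objective smooth and convex, so a generic bound-constrained solver applies.

```python
import numpy as np
from scipy.optimize import minimize

def modified_lasso(A, b, lam):
    """Minimize ||Ax - b||_2^2 + lam * ||x||_1^2 via the split
    x = u - v with u, v >= 0, which makes the objective smooth."""
    d = A.shape[1]

    def obj(z):
        u, v = z[:d], z[d:]
        r = A @ (u - v) - b
        t = np.sum(u + v)              # equals ||x||_1 at an optimum
        return r @ r + lam * t * t

    def grad(z):
        u, v = z[:d], z[d:]
        g = 2 * A.T @ (A @ (u - v) - b)
        pen = 2 * lam * np.sum(u + v)
        return np.concatenate([g + pen, -g + pen])

    res = minimize(obj, np.zeros(2 * d), jac=grad, method="L-BFGS-B",
                   bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

rng = np.random.default_rng(3)
A = rng.standard_normal((200, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]                      # sparse ground truth
b = A @ x_true + 0.05 * rng.standard_normal(200)

x_hat = modified_lasso(A, b, lam=5.0)
print("nonzeros (|x_i| > 1e-3):", int(np.sum(np.abs(x_hat) > 1e-3)))
```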

  10. Coreset for Modified Lasso
  Theorem. Given a matrix $A \in \mathbb{R}^{n \times d}$ and a corresponding vector $b \in \mathbb{R}^n$, any coreset for the function $\|Ax - b\|_p^p + \lambda \|x\|_p^p$ is also a coreset for the function $\|Ax - b\|_p^p + \lambda \|x\|_q^p$, where $q \leq p$ and $p, q \geq 1$.
  ◮ Implication: coresets for ridge regression also work for the modified Lasso
  ◮ A coreset of size $O\!\left(\frac{sd_\lambda(A)}{\epsilon^2} \log sd_\lambda(A)\right)$ for the modified Lasso, with high probability
  ◮ $sd_\lambda(A) = \sum_{j \in [d]} \frac{1}{1 + \lambda/\sigma_j^2} \leq d$, the statistical dimension (a computation sketch follows)
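A minimal sketch computing the statistical dimension from the singular values, alongside the ridge leverage scores that drive the sampling; both definitions are standard, and the check that the scores sum to $sd_\lambda(A)$ is illustrative.

```python
import numpy as np

def statistical_dimension(A, lam):
    """sd_lambda(A) = sum_j 1 / (1 + lam / sigma_j^2), always <= d."""
    sig = np.linalg.svd(A, compute_uv=False)
    return np.sum(1.0 / (1.0 + lam / sig ** 2))

def ridge_leverage_scores(A, lam):
    """tau_i = a_i^T (A^T A + lam I)^{-1} a_i; these sum to sd_lambda(A)."""
    d = A.shape[1]
    G_inv = np.linalg.inv(A.T @ A + lam * np.eye(d))
    return np.einsum("ij,jk,ik->i", A, G_inv, A)

rng = np.random.default_rng(4)
A = rng.standard_normal((5000, 30))
lam = 100.0
sd = statistical_dimension(A, lam)
tau = ridge_leverage_scores(A, lam)
print(sd, tau.sum())      # the two quantities agree
print(sd <= A.shape[1])   # sd_lambda(A) <= d
```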

  11. Coresets for $\ell_p$ Regression with $\ell_p$ Regularization
  The $\ell_p$ regression with $\ell_p$ regularization problem is
  $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^p + \lambda \|x\|_p^p$
  Coresets for $\ell_p$ regression are constructed using a well-conditioned basis.
  Well-Conditioned Basis [DDH+09]. A matrix $U$ is an $(\alpha, \beta, p)$ well-conditioned basis for $A$ if $\|U\|_p \leq \alpha$ and $\forall x \in \mathbb{R}^d$, $\|x\|_q \leq \beta \|Ux\|_p$, where $\frac{1}{p} + \frac{1}{q} = 1$.
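As a small illustration of how a well-conditioned basis yields sampling scores: for $p = 2$, an orthonormal basis $Q$ from a QR factorization is a $(\sqrt{d}, 1, 2)$ well-conditioned basis, and the row scores $\|U_i\|_p^p$ reduce to the usual leverage scores. For general $p$ the basis is built differently, so this sketch shows only the $p = 2$ instance of the recipe.

```python
import numpy as np

def wcb_row_scores(A, p=2):
    """Sampling scores ||U_i||_p^p from a well-conditioned basis U of A.
    For p = 2 an orthonormal basis from QR works; for general p a
    different (e.g. ellipsoid-rounded) basis would be required."""
    U, _ = np.linalg.qr(A)
    return np.sum(np.abs(U) ** p, axis=1)

rng = np.random.default_rng(5)
A = rng.standard_normal((1000, 10))
scores = wcb_row_scores(A)
print(scores.sum())   # sums to d = 10 for p = 2 (these are leverage scores)
```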
