SLIDE 1

On Coresets For Regularized Regression

ICML 2020

Rachit Chhaya, Anirban Dasgupta and Supratim Shit

IIT Gandhinagar

June 15, 2020

SLIDES 2–5

Motivation

◮ Coresets: a small summary of the data that serves as a proxy for the original data with respect to some cost function.
◮ Smaller coresets for ridge regression were shown by [ACW17].
◮ Coresets for regularized regression under a general p-norm have not been studied before.

SLIDES 6–9

Our Contributions

◮ No coreset for $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r + \lambda \|x\|_q^s$, where $r \neq s$, is smaller in size than one for $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r$.
◮ This implies no coreset for Lasso smaller in size than one for least squares regression.
◮ We introduce the modified Lasso and build a smaller coreset for it.
◮ Coresets for ℓp regression with ℓp regularization, with an extension to multiple-response regression.
◮ Empirical evaluations.

SLIDE 10

Coresets

Definition: For $\epsilon > 0$, a dataset $A$, a non-negative function $f$, and a query space $Q$, $C$ is an $\epsilon$-coreset of $A$ if $\forall q \in Q$

$$|f_q(A) - f_q(C)| \leq \epsilon \, f_q(A)$$

We construct coresets which are (rescaled) subsamples of the original data.
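To make the definition concrete, here is a minimal sketch in code (our own illustration, not the paper's construction; the data, sizes, and uniform sampling scheme are all hypothetical): a uniformly sampled, rescaled coreset for the least squares cost, checked against the $\epsilon$-coreset guarantee at a few random queries.

```python
import numpy as np

# Minimal sketch (illustration only, not the paper's construction): build a
# uniformly sampled, rescaled coreset for the least squares cost
# f_x(A, b) = ||Ax - b||_2^2 and check the epsilon-coreset guarantee.
rng = np.random.default_rng(0)
n, d, m = 10000, 5, 500                    # data size, dimension, coreset size
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

idx = rng.choice(n, size=m, replace=True)  # uniform sampling, p_i = 1/n
scale = np.sqrt(n / m)                     # rescaling keeps the cost unbiased
C, c = scale * A[idx], scale * b[idx]

for _ in range(3):
    x = rng.standard_normal(d)             # a random query
    full = np.sum((A @ x - b) ** 2)
    core = np.sum((C @ x - c) ** 2)
    print(f"|f_x(A) - f_x(C)| / f_x(A) = {abs(full - core) / full:.4f}")
```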
SLIDES 11–14

Sensitivity [LS10]

Definition: The sensitivity of the i-th point of a dataset $X$ for a function $f$ and query space $Q$ is defined as

$$s_i = \sup_{q \in Q} \frac{f_q(x_i)}{\sum_{x' \in X} f_q(x')}$$

◮ Determines the highest fractional contribution of a point to the cost function.
◮ Can be used to create coresets: the coreset size is a function of the sum of sensitivities and the dimension of the query space (see the sketch below).
◮ Upper bounds on the sensitivities suffice [FL11, BFL16].
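A sketch of sensitivity-based construction (our own illustration, not the paper's exact algorithm): for the least squares cost, the leverage scores of $[A, b]$ upper-bound the sensitivities, so sampling rows proportionally to them and rescaling yields a coreset.

```python
import numpy as np

# Sketch of sensitivity-based sampling (our own illustration): for
# f_x(a_i, b_i) = (a_i^T x - b_i)^2 the leverage scores of [A, b] are
# sensitivity upper bounds; sample rows proportionally and rescale by
# 1 / sqrt(m * p_i) so that costs stay unbiased.
rng = np.random.default_rng(1)
n, d, m = 10000, 5, 400
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

Ab = np.hstack([A, b[:, None]])
U, _, _ = np.linalg.svd(Ab, full_matrices=False)   # orthonormal basis of [A, b]
lev = np.sum(U ** 2, axis=1)                       # leverage score of row i
p = lev / lev.sum()                                # sampling distribution

idx = rng.choice(n, size=m, replace=True, p=p)
w = 1.0 / np.sqrt(m * p[idx])
C, c = w[:, None] * A[idx], w * b[idx]

x = np.linalg.lstsq(A, b, rcond=None)[0]           # query at the full-data optimum
full = np.sum((A @ x - b) ** 2)
core = np.sum((C @ x - c) ** 2)
print(f"relative error: {abs(full - core) / full:.4f}")
```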

SLIDES 15–17

Coresets for Regularized Regression

◮ Regularization is important to prevent overfitting, improve numerical stability, induce sparsity, etc.

We are interested in the following problem: for $\lambda > 0$,

$$\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r + \lambda \|x\|_q^s$$

for $p, q \geq 1$ and $r, s > 0$. A coreset for this problem is a pair $(\tilde{A}, \tilde{b})$ such that $\forall x \in \mathbb{R}^d$ and $\forall \lambda > 0$,

$$\|\tilde{A}x - \tilde{b}\|_p^r + \lambda \|x\|_q^s \in (1 \pm \epsilon) \left( \|Ax - b\|_p^r + \lambda \|x\|_q^s \right)$$

SLIDES 18–21

Main Question

◮ Coresets for unregularized regression also work for the regularized counterpart.
◮ [ACW17] showed a coreset for ridge regression using ridge leverage scores, smaller than coresets for least squares regression.
◮ Intuition: regularization imposes a constraint on the solution space.
◮ Can we expect all regularized problems to have smaller coresets than their unregularized versions, e.g. Lasso?

SLIDES 22–25

Our Main Result

Theorem: Given a matrix $A \in \mathbb{R}^{n \times d}$ and $\lambda > 0$, any coreset for the problem $\|Ax\|_p^r + \lambda \|x\|_q^s$, where $r \neq s$, $p, q \geq 1$ and $r, s > 0$, is also a coreset for $\|Ax\|_p^r$.

Implication: smaller coresets for the regularized problem are not obtained when $r \neq s$. The popular Lasso problem falls in this category and hence does not have a coreset smaller than one for least squares regression.

Proof by contradiction.

SLIDES 26–28

Modified Lasso

$$\min_{x \in \mathbb{R}^d} \|Ax - b\|_2^2 + \lambda \|x\|_1^2$$

◮ Its constrained version is the same as that of Lasso.
◮ Empirically shown to induce sparsity like Lasso (a toy demo follows below).
◮ Allows a smaller coreset than least squares regression.
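A rough demo of the modified Lasso objective (our own sketch; the slides prescribe no solver, and the data, step sizes, and thresholds here are all hypothetical): subgradient descent on $\|Ax - b\|_2^2 + \lambda \|x\|_1^2$, which is convex since the squared ℓ1 penalty is convex.

```python
import numpy as np

# Rough demo (our own sketch, not the paper's method): subgradient descent
# on the modified Lasso objective ||Ax - b||_2^2 + lam * ||x||_1^2.
rng = np.random.default_rng(2)
n, d, lam = 500, 20, 5.0
x_true = np.zeros(d)
x_true[:3] = [2.0, -1.5, 1.0]                     # sparse ground truth
A = rng.standard_normal((n, d))
b = A @ x_true + 0.1 * rng.standard_normal(n)

L = np.linalg.norm(A, 2) ** 2                     # spectral norm squared
x = np.zeros(d)
for t in range(1, 5001):
    grad = 2 * A.T @ (A @ x - b)                  # gradient of the data term
    sub = 2 * lam * np.abs(x).sum() * np.sign(x)  # subgradient of lam * ||x||_1^2
    x -= (grad + sub) * 0.5 / (L * np.sqrt(t))    # decaying step size

obj = np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum() ** 2
print(f"objective = {obj:.3f}, coefficients below 1e-2: {(np.abs(x) < 1e-2).sum()}")
```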

SLIDES 29–31

Coreset for Modified Lasso

Theorem: Given a matrix $A \in \mathbb{R}^{n \times d}$ and a corresponding vector $b \in \mathbb{R}^n$, any coreset for the function $\|Ax - b\|_p^p + \lambda \|x\|_p^p$ is also a coreset for the function $\|Ax - b\|_p^p + \lambda \|x\|_q^p$ where $q \leq p$ and $p, q \geq 1$.

◮ Implication: coresets for ridge regression also work for the modified Lasso.
◮ A coreset of size $O\left(\frac{\mathrm{sd}_\lambda(A) \log \mathrm{sd}_\lambda(A)}{\epsilon^2}\right)$ for the modified Lasso, with high probability.
◮ $\mathrm{sd}_\lambda(A) = \sum_{j \in [d]} \frac{1}{1 + \lambda / \sigma_j^2} \leq d$ (computed below).
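A short sketch of these quantities (our own illustration; the ridge leverage score formula follows the definition in [ACW17]): compute the scores via the SVD and check that their sum equals $\mathrm{sd}_\lambda(A)$.

```python
import numpy as np

# Sketch (illustration only): ridge leverage scores
# tau_i = a_i^T (A^T A + lam * I)^{-1} a_i via the SVD, and the statistical
# dimension sd_lam(A), which equals their sum and is at most d.
rng = np.random.default_rng(3)
n, d, lam = 5000, 10, 2.0
A = rng.standard_normal((n, d)) * np.arange(1, d + 1)     # non-uniform columns

U, S, _ = np.linalg.svd(A, full_matrices=False)
tau = np.sum(U ** 2 * (S ** 2 / (S ** 2 + lam)), axis=1)  # ridge leverage scores
sd_lam = np.sum(1.0 / (1.0 + lam / S ** 2))               # statistical dimension

print(f"sum of ridge leverage scores = {tau.sum():.3f}")
print(f"sd_lambda(A) = {sd_lam:.3f}  (<= d = {d})")
```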

SLIDE 32

Coresets for ℓp Regression with ℓp Regularization

The ℓp regression with ℓp regularization problem is

$$\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^p + \lambda \|x\|_p^p$$

Coresets for ℓp regression are constructed using a well-conditioned basis.

Well-Conditioned Basis [DDH+09]: A matrix $U$ is an $(\alpha, \beta, p)$ well-conditioned basis for $A$ if $\|U\|_p \leq \alpha$ and $\forall x \in \mathbb{R}^d$, $\|x\|_q \leq \beta \|Ux\|_p$, where $\frac{1}{p} + \frac{1}{q} = 1$.
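A quick numerical illustration of the definition for the simplest case $p = 2$ (our own example, not from the slides): an orthonormal basis $U$ of $A$'s column space is a $(\sqrt{d}, 1, 2)$ well-conditioned basis, since its entrywise 2-norm is $\sqrt{d}$ and $\|x\|_2 = \|Ux\|_2$ for all $x$.

```python
import numpy as np

# Numerical check (illustration only): for p = 2, an orthonormal basis U of
# A's column space is (sqrt(d), 1, 2) well-conditioned.
rng = np.random.default_rng(6)
d = 8
A = rng.standard_normal((1000, d))
U, _, _ = np.linalg.svd(A, full_matrices=False)  # orthonormal columns

alpha = np.sqrt(np.sum(U ** 2))                  # entrywise 2-norm = sqrt(d)
x = rng.standard_normal(d)
print(f"alpha = {alpha:.3f} (sqrt(d) = {np.sqrt(d):.3f})")
print(f"||x||_2 = {np.linalg.norm(x):.3f}, ||Ux||_2 = {np.linalg.norm(U @ x):.3f}")
```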

SLIDES 33–34

◮ Sampling using the p-th power of the ℓp norms of the rows of an $(\alpha, \beta, p)$ well-conditioned basis of $[A, b]$, we can obtain a coreset of size $\tilde{O}((\alpha\beta)^p)$ with high probability for ℓp regression (see the sketch below).
◮ For ℓp regression with ℓp regularization we bound the sensitivities by
$$s_i \leq \frac{\beta^p \|u_i\|_p^p}{1 + \frac{\lambda}{\|A'\|_{(p)}^p}} + \frac{1}{n}$$
◮ The sum of sensitivities is bounded by
$$S \leq \frac{(\alpha\beta)^p}{1 + \frac{\lambda}{\|A'\|_{(p)}^p}} + 1$$
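A sketch of this sampling scheme for $p = 1$ (our own simplification with several assumptions: we use $Q = A'R^{-1}$ from a QR factorization as a cheap surrogate for a well-conditioned basis, drop the constant $\beta^p$ factor, and keep the regularization term exact while subsampling only the data term):

```python
import numpy as np

# Sketch for p = 1 (our own simplified variant, not the paper's algorithm):
# sample rows proportionally to sensitivity upper bounds built from a
# QR-based surrogate basis; the regularization term is kept exact.
rng = np.random.default_rng(4)
n, d, m, lam = 20000, 8, 500, 10.0
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

Ab = np.hstack([A, b[:, None]])
Q, _ = np.linalg.qr(Ab)                 # Q = A' R^{-1}, surrogate basis
row_norms = np.abs(Q).sum(axis=1)       # ||u_i||_1 for each row

denom = 1.0 + lam / np.abs(Ab).sum()    # the (1 + lam / ||A'||_(1)) factor
s_bound = row_norms / denom + 1.0 / n   # sensitivity upper bounds
prob = s_bound / s_bound.sum()

idx = rng.choice(n, size=m, replace=True, p=prob)
w = 1.0 / (m * prob[idx])               # rescaling keeps the l1 cost unbiased

x = rng.standard_normal(d)
full = np.abs(A @ x - b).sum() + lam * np.abs(x).sum()
core = (w * np.abs(A[idx] @ x - b[idx])).sum() + lam * np.abs(x).sum()
print(f"relative error: {abs(full - core) / full:.4f}")
```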

SLIDES 35–37

◮ The coreset size is $O\left(\frac{(\alpha\beta)^p \, d \log\frac{1}{\epsilon}}{\left(1 + \frac{\lambda}{\|A'\|_{(p)}^p}\right) \epsilon^2}\right)$ with high probability.
◮ The coreset size is decreasing in $\lambda$ (illustrated below).
◮ Specifically, for the regularized least absolute deviation (RLAD) problem we get a coreset of size $O\left(\frac{d^{5/2} \log\frac{1}{\epsilon}}{\left(1 + \frac{\lambda}{\|A'\|_{(1)}}\right) \epsilon^2}\right)$.
◮ The results also extend to multiple-response regularized regression.
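The $\lambda$-dependence is easy to see numerically (all constants below are hypothetical; we are only evaluating the shape of the bound):

```python
# Shape of the size bound (hypothetical constants): it scales as
# C / (1 + lam / ||A'||_(p)^p), which is decreasing in lam.
A_norm = 1000.0    # hypothetical value of ||A'||_(p)^p
C = 5e4            # hypothetical (alpha * beta)^p * d * log(1/eps) / eps^2
for lam in (0.0, 1.0, 10.0, 100.0, 1000.0):
    print(f"lambda = {lam:7.1f}  ->  size bound ~ {C / (1 + lam / A_norm):8.1f}")
```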
SLIDE 38

Empirical Results

Sparsity Induced by Modified Lasso

[Figure: number of zero coefficients (0–30) versus λ (0.001 to 50) for Modified Lasso, Lasso, and ridge regression.]

SLIDE 39

Comparison with Uniform Sampling

Matrix size: 100000 × 30, a matrix with non-uniform leverage scores [YMM15].

Table 1: Relative error at different coreset sizes for Modified Lasso, λ = 0.5

  Sample Size   Ridge Leverage Score Sampling   Uniform Sampling
  30            0.059                           0.8289
  50            0.044                           0.8289
  100           0.031                           0.8286
  150           0.028                           0.8286
  200           0.013                           0.8287

SLIDE 40

Table 2: Relative error at different coreset sizes for RLAD, λ = 0.5

  Sample Size   Sensitivity-based Sampling   Uniform Sampling
  30            0.69                         385.99
  50            0.65                         112.70
  100           0.34                         98.53
  150           0.19                         96.09
  200           0.17                         27.49
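A sketch in the spirit of these tables (a synthetic stand-in built on our own assumptions, not the paper's data or protocol): compare ridge leverage score sampling against uniform sampling on a matrix with a few heavy rows, for the modified Lasso cost at the full-data ridge solution.

```python
import numpy as np

# Synthetic stand-in for the Table 1 comparison (our own setup): relative
# error of ridge-leverage-score vs uniform sampling for the modified Lasso
# cost; the regularization term is kept exact, only the data term is sampled.
rng = np.random.default_rng(5)
n, d, lam = 100000, 30, 0.5
A = rng.standard_normal((n, d))
A[:50] *= 100.0                                # heavy rows -> non-uniform leverage
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
tau = np.sum(U ** 2 * (S ** 2 / (S ** 2 + lam)), axis=1)   # ridge leverage scores
x = Vt.T @ (S / (S ** 2 + lam) * (U.T @ b))                # ridge solution query
reg = lam * np.abs(x).sum() ** 2
full = np.sum((A @ x - b) ** 2) + reg

def rel_err(p, m):
    idx = rng.choice(n, size=m, replace=True, p=p)
    core = np.sum((A[idx] @ x - b[idx]) ** 2 / (m * p[idx])) + reg
    return abs(full - core) / full

for m in (30, 50, 100, 200):
    print(m, f"leverage: {rel_err(tau / tau.sum(), m):.4f}",
          f"uniform: {rel_err(np.full(n, 1.0 / n), m):.4f}")
```

On such inputs uniform sampling keeps missing the few heavy rows, which mirrors why its relative error stays large in the tables above.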

SLIDE 41

Conclusion and Future Work

◮ We present the first work on coresets for regularized regression under a general p-norm.

Open Questions
◮ Tighter bounds on the sensitivity scores.
◮ Coresets for other models with regularization and/or constraints.

SLIDES 42–44

References

[ACW17] Haim Avron, Kenneth L. Clarkson, and David P. Woodruff. Sharper bounds for regularized data fitting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.

[BFL16] Vladimir Braverman, Dan Feldman, and Harry Lang. New frameworks for offline and streaming coreset constructions. arXiv preprint arXiv:1612.00889, 2016.

[DDH+09] Anirban Dasgupta, Petros Drineas, Boulos Harb, Ravi Kumar, and Michael W. Mahoney. Sampling algorithms and coresets for ℓp regression. SIAM Journal on Computing 38(5):2060–2078, 2009.

[DMM06] Petros Drineas, Michael W. Mahoney, and S. Muthukrishnan. Sampling algorithms for ℓ2 regression and applications. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1127–1136. SIAM, 2006.

[FL11] Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, pp. 569–578. ACM, 2011.

[Hau95] David Haussler. Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory, Series A 69(2):217–232, 1995.

[LS10] Michael Langberg and Leonard J. Schulman. Universal ε-approximators for integrals. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 598–607. SIAM, 2010.

[PW15] Mert Pilanci and Martin J. Wainwright. Randomized sketches of convex programs with sharp guarantees. IEEE Transactions on Information Theory 61(9):5096–5115, 2015.

[YMM15] Jiyan Yang, Xiangrui Meng, and Michael W. Mahoney. Implementing randomized matrix algorithms in parallel and distributed environments. Proceedings of the IEEE 104(1):58–92, 2015.
SLIDE 45

More references in the paper. Thank you! We hope to get your feedback and answer your questions at the live chat session. Take care!