Optimization - MS Maths Big Data
Alexandre Gramfort
alexandre.gramfort@telecom-paristech.fr
Telecom ParisTech - M2 Maths Big Data
Plan
1. Notations
2. Ridge regression and quadratic forms
3. SVD
4. Woodbury
5. Dense Ridge
6. Sparse Ridge
Optimization problem
Definition (Optimization problem (P))
min f(x), x ∈ C, where
- f : Rⁿ → R ∪ {+∞} is called the objective function
- C = {x ∈ Rⁿ : g(x) ≤ 0 and h(x) = 0} is the feasible set
- g(x) ≤ 0 represents the inequality constraints; g(x) = (g1(x), . . . , gp(x)), so p constraints
- h(x) = 0 represents the equality constraints; h(x) = (h1(x), . . . , hq(x)), so q constraints
- an element x ∈ C is said to be feasible
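As a concrete illustration, here is a tiny hand-made instance of (P) solved numerically; the function, constraints and all names below are illustrative choices, not taken from the course material, and assume scipy is available:

```python
# A minimal sketch of an instance of (P):
# f(x) = (x1 - 2)^2 + (x2 - 2)^2, g(x) = x1 + x2 - 1 <= 0, h(x) = x1 - x2 = 0.
# The feasible minimizer is x = (1/2, 1/2).
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2) ** 2 + (x[1] - 2) ** 2
constraints = [
    {"type": "ineq", "fun": lambda x: -(x[0] + x[1] - 1)},  # scipy expects fun(x) >= 0
    {"type": "eq", "fun": lambda x: x[0] - x[1]},           # h(x) = 0
]
res = minimize(f, x0=np.zeros(2), constraints=constraints)
print(res.x)  # approximately [0.5, 0.5]
```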
Taylor at order 2
Assuming f is twice differentiable, the Taylor expansion at order 2 of f at x reads:
∀h ∈ Rⁿ, f(x + h) = f(x) + ∇f(x)ᵀh + ½ hᵀ∇²f(x)h + o(‖h‖²)
- ∇f(x) ∈ Rⁿ is the gradient
- ∇²f(x) ∈ Rⁿ×ⁿ is the Hessian matrix
Remark: this provides a local quadratic approximation of f around x.
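A quick numerical sanity check of this expansion (an illustrative sketch; the function f below is an arbitrary smooth choice, not from the slides):

```python
# Check that the order-2 Taylor approximation error vanishes faster
# than ||h||^2 for the smooth f(x) = sum_i x_i^4 + ||x||^2.
import numpy as np

def f(x):
    return np.sum(x ** 4) + x @ x

def grad_f(x):
    return 4 * x ** 3 + 2 * x          # gradient of f

def hess_f(x):
    return np.diag(12 * x ** 2 + 2)    # Hessian of f (diagonal here)

rng = np.random.default_rng(0)
x, d = rng.standard_normal(3), rng.standard_normal(3)
for eps in [1e-1, 1e-2, 1e-3]:
    h = eps * d
    taylor2 = f(x) + grad_f(x) @ h + 0.5 * h @ hess_f(x) @ h
    print(eps, abs(f(x + h) - taylor2))  # error shrinks like ||h||^3 here
```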
Ridge regression
We consider problems with n samples (observations) and p features (variables).
Definition (Ridge regression)
Let y ∈ Rⁿ be the n targets to predict and (xi)i the n samples in Rᵖ. Ridge regression consists in solving the following problem:
min over w, b of ½‖y − Xw − b1‖² + (λ/2)‖w‖², λ > 0
where w ∈ Rᵖ is called the weight vector, b ∈ R is the intercept (a.k.a. bias), 1 ∈ Rⁿ is the all-ones vector, and the ith row of X is xi.
Remark: note that the intercept is not penalized by λ.
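In practice this estimator is readily available; a minimal usage sketch on synthetic data, assuming scikit-learn is installed (scikit-learn's Ridge drops the ½ factors, which only amounts to matching alpha with λ):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 5))                   # n = 100 samples, p = 5 features
w_true = np.array([1., 2., 0., 0., -1.])
y = X @ w_true + 0.5 + 0.1 * rng.standard_normal(100)

model = Ridge(alpha=1.0)                            # alpha plays the role of lambda
model.fit(X, y)
print(model.coef_, model.intercept_)                # w_hat, and b_hat (not penalized)
```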
Taking care of the intercept
There are different ways to deal with the intercept.
Option 1: center the target y and each feature column. After centering the problem reads:
min over w of ½‖y − Xw‖² + (λ/2)‖w‖², λ > 0
Option 2: add a column of ones to X and try not to penalize it (too much).
Exercise
Denote by ȳ ∈ R the mean of y and by x̄ ∈ Rᵖ the vector of column means of X. Show that b̂ = −x̄ᵀŵ + ȳ.
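A numerical check of the exercise, as a sketch on synthetic data (all variable names are illustrative):

```python
# Verify that after centering, b_hat = y_bar - x_bar^T w_hat.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 3, 1.0
X = rng.standard_normal((n, p)) + 2.0               # deliberately non-centered
y = X @ np.array([1., -1., 0.5]) + 3.0 + 0.1 * rng.standard_normal(n)

x_bar, y_bar = X.mean(axis=0), y.mean()
Xc, yc = X - x_bar, y - y_bar                       # Option 1: center y and X
w_hat = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
b_hat = y_bar - x_bar @ w_hat                       # the formula to prove
print(b_hat)                                        # close to the true intercept 3.0
```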
Ridge regression
Definition (Quadratic form)
A quadratic form reads f(x) = ½ xᵀAx + bᵀx + c, where x ∈ Rᵖ, A ∈ Rᵖ×ᵖ, b ∈ Rᵖ and c ∈ R.
Ridge regression
Questions
- Show that ridge regression boils down to the minimization of a quadratic form.
- Propose a closed-form solution.
- Show that the solution is obtained by solving a linear system.
- Is the objective function strongly convex?
- Assuming n < p, what is the value of the strong convexity constant? (See the numerical sketch after this list.)
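The sketch below checks these claims numerically on synthetic data; it is a companion to the exercise, not a substitute for the derivation:

```python
# Ridge (no intercept) is the quadratic form with A = X^T X + lam I and
# b = -X^T y; the minimizer solves the linear system A w = X^T y.
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 10, 30, 0.5                             # n < p on purpose
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

A = X.T @ X + lam * np.eye(p)                       # Hessian of the objective
w_closed = np.linalg.inv(A) @ (X.T @ y)             # closed-form solution
w_solve = np.linalg.solve(A, X.T @ y)               # same solution, via a linear system
print(np.allclose(w_closed, w_solve))               # True

# With n < p, X^T X is rank-deficient and the smallest eigenvalue of A,
# i.e. the strong convexity constant, equals lam.
print(np.linalg.eigvalsh(A).min())                  # approximately 0.5
```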
Singular value decomposition (SVD)
SVD is a factorization of a (here real) matrix: M = UΣVᵀ, where M ∈ Rⁿ×ᵖ, U ∈ Rⁿ×ⁿ, Σ ∈ Rⁿ×ᵖ, V ∈ Rᵖ×ᵖ
- UᵀU = UUᵀ = In (orthogonal matrix)
- VᵀV = VVᵀ = Ip (orthogonal matrix)
- Σ is a rectangular diagonal matrix; the Σi,i are called the singular values
- the columns of U are the left-singular vectors
- the columns of V are the right-singular vectors
- U contains the eigenvectors of MMᵀ associated with the eigenvalues Σi,i²
- V contains the eigenvectors of MᵀM associated with the eigenvalues Σi,i²
- we assume here Σi,i = 0 for min(n, p) < i ≤ max(n, p)
- the SVD is particularly useful to find the rank, null space, image and pseudo-inverse of a matrix, as checked in the sketch below
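These facts are easy to verify with NumPy (a sketch on a random matrix):

```python
# Check M = U Sigma V^T and the link between singular values and the
# eigenvalues of M^T M; also recover rank and pseudo-inverse from the SVD.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
M = rng.standard_normal((n, p))

U, s, Vt = np.linalg.svd(M)                 # U: (n, n), s: (min(n,p),), Vt: (p, p)
Sigma = np.zeros((n, p))
Sigma[:min(n, p), :min(n, p)] = np.diag(s)  # rectangular diagonal Sigma
print(np.allclose(M, U @ Sigma @ Vt))       # True

eigvals = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
print(np.allclose(eigvals, s ** 2))         # eigenvalues of M^T M are Sigma_{i,i}^2

print(np.linalg.matrix_rank(M))             # rank, computed from the SVD
print(np.linalg.pinv(M).shape)              # pseudo-inverse, also SVD-based
```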
Matrix inversion lemma
Proposition (Matrix inversion lemma)
Also known as the Sherman–Morrison–Woodbury formula, it states that:
(A + UCV)⁻¹ = A⁻¹ − A⁻¹U(C⁻¹ + VA⁻¹U)⁻¹VA⁻¹,
where A ∈ Rⁿ×ⁿ, U ∈ Rⁿ×ᵏ, C ∈ Rᵏ×ᵏ, V ∈ Rᵏ×ⁿ.
Matrix inversion lemma (proof)
Just check that (A + UCV) times the right-hand side of the Woodbury identity gives the identity matrix:
(A + UCV)(A⁻¹ − A⁻¹U(C⁻¹ + VA⁻¹U)⁻¹VA⁻¹)
= I + UCVA⁻¹ − (U + UCVA⁻¹U)(C⁻¹ + VA⁻¹U)⁻¹VA⁻¹
= I + UCVA⁻¹ − UC(C⁻¹ + VA⁻¹U)(C⁻¹ + VA⁻¹U)⁻¹VA⁻¹
= I + UCVA⁻¹ − UCVA⁻¹
= I
Questions
Using the matrix inversion lemma, show that if n < p, the ridge regression problem can be solved by inverting a matrix of size n × n rather than p × p.
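Before using the identity, one can also sanity-check it numerically (a sketch with random, well-conditioned matrices):

```python
# Check (A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}.
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant, hence invertible
U = rng.standard_normal((n, k))
C = rng.standard_normal((k, k)) + k * np.eye(k)
V = rng.standard_normal((k, n))

lhs = np.linalg.inv(A + U @ C @ V)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ai @ U) @ V @ Ai
print(np.allclose(lhs, rhs))                      # True
```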
Primal and dual implementation
The solution of the ridge regression problem (without intercept) is obtained by solving the problem in the primal form:
ŵ = (XᵀX + λIp)⁻¹Xᵀy
or in the dual form:
ŵ = Xᵀ(XXᵀ + λIn)⁻¹y
In the dual formulation the matrix to invert is in Rⁿ×ⁿ. What if X is sparse, n is 1e5 and p is 1e6?
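A sketch comparing both formulas on a small dense synthetic problem (with n < p the dual solve is the cheaper one):

```python
# Primal (p x p system) and dual (n x n system) ridge solutions agree.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 20, 50, 1.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

w_primal = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
w_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
print(np.allclose(w_primal, w_dual))    # True
```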
Conjugate gradient: Solve Ax = b, A ∈ Rn×n and b ∈ Rn
1: x0 ∈ Rⁿ, g0 = Ax0 − b
2: for k = 0 to n do
3:   if gk = 0 then
4:     break
5:   end if
6:   if k = 0 then
7:     wk = g0
8:   else
9:     αk = −⟨gk, Awk−1⟩ / ⟨wk−1, Awk−1⟩
10:    wk = gk + αk wk−1
11:  end if
12:  ρk = ⟨gk, wk⟩ / ⟨wk, Awk⟩
13:  xk+1 = xk − ρk wk
14:  gk+1 = Axk+1 − b
15: end for
16: return xk+1
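A direct NumPy transcription of the pseudocode above (a sketch; in practice one would rather call scipy.sparse.linalg.cg):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve A x = b for a symmetric positive definite A."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    g = A @ x - b                                   # g_0 = A x_0 - b
    w = g.copy()                                    # w_0 = g_0
    for k in range(n):                              # at most n steps in exact arithmetic
        if np.linalg.norm(g) <= tol:                # g_k = 0, up to rounding errors
            break
        if k > 0:
            alpha = -(g @ (A @ w)) / (w @ (A @ w))  # alpha_k
            w = g + alpha * w                       # w_k = g_k + alpha_k w_{k-1}
        rho = (g @ w) / (w @ (A @ w))               # rho_k (exact line search)
        x = x - rho * w                             # x_{k+1}
        g = A @ x - b                               # g_{k+1}
    return x

A = np.array([[4., 1.], [1., 3.]])                  # small SPD test matrix
b = np.array([1., 2.])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))  # both ~ [0.0909, 0.6364]
```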
Sparse ridge with CG
- cf. Notebook
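A possible shape for the notebook's core computation, as a hedged sketch (names and sizes are illustrative): solve the primal system (XᵀX + λI)w = Xᵀy with scipy's CG and a LinearOperator, so that XᵀX is never formed explicitly and only sparse matrix-vector products are used:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, p, lam = 1000, 5000, 1.0
X = sp.random(n, p, density=0.001, format="csr", random_state=0)  # sparse design
y = rng.standard_normal(n)

def matvec(w):
    return X.T @ (X @ w) + lam * w      # (X^T X + lam I) w: two sparse products

A = LinearOperator((p, p), matvec=matvec)
w_hat, info = cg(A, X.T @ y)            # info == 0 means CG converged
print(info, w_hat.shape)
```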
Logistic regression with CG
- cf. Notebook
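The notebook itself is not reproduced here; as a hedged sketch, one standard way to combine logistic regression with CG is scipy's Newton-CG solver, which only needs the gradient (Hessian-vector products are then approximated internally from it):

```python
# L2-penalized logistic regression with labels in {-1, +1}, synthetic data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, lam = 200, 10, 1.0
X = rng.standard_normal((n, p))
y = np.sign(rng.standard_normal(n))

def loss(w):
    # sum_i log(1 + exp(-y_i x_i^T w)) + (lam / 2) ||w||^2
    return np.logaddexp(0, -y * (X @ w)).sum() + 0.5 * lam * (w @ w)

def grad(w):
    s = -y / (1 + np.exp(y * (X @ w)))  # per-sample derivative of the logistic loss
    return X.T @ s + lam * w

res = minimize(loss, np.zeros(p), jac=grad, method="Newton-CG")
print(res.success, res.x[:3])
```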
Warm starts and paths
In machine learning it is common to solve a problem that is very similar to one solved before:
- you train a model every day and just need to "update" yesterday's model
- you look for the best hyperparameter by evaluating the model on a grid of values, for example a grid of λ (see the sketch after this list)
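A sketch of the second situation on synthetic data: ridge solutions along a grid of λ values, where each CG solve is warm-started from the previous solution via the x0 argument of scipy's cg (the hand-written conjugate_gradient above would work the same way):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
Xty = X.T @ y

w = np.zeros(p)                              # cold start only for the first lambda
for lam in np.logspace(2, -2, 20):           # path from strong to weak penalty
    A = X.T @ X + lam * np.eye(p)
    w, info = cg(A, Xty, x0=w)               # warm start from the previous solution
    print(f"lambda={lam:.3g}  converged={info == 0}")
```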