

SLIDE 1

Regularization via Spectral Filtering

Lorenzo Rosasco

MIT, 9.520 Class 7

SLIDE 2

About this class

Goal: to discuss how a class of regularization methods, originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms. These algorithms are kernel methods that can be easily implemented and have a common derivation, but different computational and theoretical properties.

SLIDE 3

Plan

- From ERM to Tikhonov regularization.
- Linear ill-posed problems and stability.
- Spectral Regularization and Filtering.
- Examples of Algorithms.

SLIDE 4

Basic Notation

Training set $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$. $X$ is the $n \times d$ input matrix. $Y = (y_1, \ldots, y_n)$ is the output vector. $k$ denotes the kernel function, $K$ the $n \times n$ kernel matrix with entries $K_{ij} = k(x_i, x_j)$, and $\mathcal{H}$ the RKHS with kernel $k$. The RLS estimator solves

$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}}^2.$$

SLIDE 5

Representer Theorem

We have seen that the RKHS allows us to write the RLS estimator in the form

$$f_S^\lambda(x) = \sum_{i=1}^{n} c_i k(x, x_i)$$

with

$$(K + n\lambda I)c = Y,$$

where $c = (c_1, \ldots, c_n)$.
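To make the linear system concrete, here is a minimal numpy sketch; the Gaussian kernel and the names gaussian_kernel, rls_fit, rls_predict are illustrative choices, not part of the slides.

```python
import numpy as np

def gaussian_kernel(A, B, width=1.0):
    # Pairwise Gaussian kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * width**2))

def rls_fit(X, Y, lam, width=1.0):
    # Representer coefficients: solve (K + n*lam*I) c = Y.
    n = X.shape[0]
    K = gaussian_kernel(X, X, width)
    return np.linalg.solve(K + n * lam * np.eye(n), Y)

def rls_predict(X_train, X_test, c, width=1.0):
    # f(x) = sum_i c_i k(x, x_i), evaluated at all test points at once.
    return gaussian_kernel(X_test, X_train, width) @ c
```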

SLIDE 6

The Role of Regularization

We observed that adding a penalization term can be interpreted as a way to control smoothness and avoid overfitting:

$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 \;\Rightarrow\; \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}}^2.$$

SLIDE 7

Empirical risk minimization

Similarly, we can prove that the solution of empirical risk minimization,

$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2,$$

can be written as

$$f_S(x) = \sum_{i=1}^{n} c_i k(x, x_i),$$

where the coefficients satisfy $Kc = Y$.

SLIDE 8

The Role of Regularization

Now we can observe that adding a penalty also has an effect from a numerical point of view:

$$Kc = Y \;\Rightarrow\; (K + n\lambda I)c = Y;$$

it stabilizes a possibly ill-conditioned matrix inversion problem. This is the point of view of regularization for (ill-posed) inverse problems.

SLIDE 9

Ill-posed Inverse Problems

Hadamard introduced the definition of ill-posedness. Ill-posed problems are typically inverse problems. Given $g \in \mathcal{G}$ and $f \in \mathcal{F}$, with $\mathcal{G}, \mathcal{F}$ Hilbert spaces, and a linear, continuous operator $L$, consider the equation

$$g = Lf.$$

The direct problem is to compute $g$ given $f$; the inverse problem is to compute $f$ given the data $g$. The inverse problem of finding $f$ is well-posed when the solution exists, is unique, and is stable, that is, it depends continuously on the initial data $g$. Otherwise the problem is ill-posed.

SLIDE 12

Linear System for ERM

In the finite dimensional case the main problem is numerical stability. For example, in the learning setting the kernel matrix can be decomposed as $K = Q\Sigma Q^T$, with $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$, and $q_1, \ldots, q_n$ the corresponding eigenvectors. Then

$$c = K^{-1}Y = Q\Sigma^{-1}Q^T Y = \sum_{i=1}^{n} \frac{1}{\sigma_i} \langle q_i, Y \rangle q_i.$$

In correspondence of small eigenvalues, small perturbations of the data can cause large changes in the solution. The problem is ill-conditioned.
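A small numpy experiment makes the instability visible; the data set, kernel width, and perturbation size below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = np.sort(rng.uniform(0, 1, n))
K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.2**2))  # Gaussian kernel matrix

# K = Q Sigma Q^T; np.linalg.eigh returns the eigenvalues in increasing order.
sigma, Q = np.linalg.eigh(K)

Y = np.sin(2 * np.pi * x)
Y_pert = Y + 1e-8 * rng.standard_normal(n)  # tiny perturbation of the outputs

# c = sum_i (1/sigma_i) <q_i, Y> q_i: near-zero eigenvalues amplify the noise.
c = Q @ ((Q.T @ Y) / sigma)
c_pert = Q @ ((Q.T @ Y_pert) / sigma)
print(np.linalg.norm(c - c_pert) / np.linalg.norm(c))  # huge, despite a 1e-8 change
```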

SLIDE 14

Regularization as a Filter

For Tikhonov regularization,

$$c = (K + n\lambda I)^{-1}Y = Q(\Sigma + n\lambda I)^{-1}Q^T Y = \sum_{i=1}^{n} \frac{1}{\sigma_i + n\lambda} \langle q_i, Y \rangle q_i.$$

Regularization filters out the undesired components:

For $\sigma_i \gg \lambda n$, $\frac{1}{\sigma_i + n\lambda} \sim \frac{1}{\sigma_i}$.
For $\sigma_i \ll \lambda n$, $\frac{1}{\sigma_i + n\lambda} \sim \frac{1}{\lambda n}$.
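Continuing the sketch above (reusing K, Q, sigma, Y, Y_pert, and n from it), one can check that the Tikhonov filter damps the unstable directions; lam is again an arbitrary choice.

```python
# Tikhonov replaces 1/sigma_i with 1/(sigma_i + n*lam), bounding every coefficient.
lam = 1e-3
c_tik = Q @ ((Q.T @ Y) / (sigma + n * lam))
c_tik_pert = Q @ ((Q.T @ Y_pert) / (sigma + n * lam))
print(np.linalg.norm(c_tik - c_tik_pert) / np.linalg.norm(c_tik))  # now small
```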

SLIDE 15

Matrix Function

Note that we can look at a scalar function $G_\lambda(\sigma)$ as a function on the kernel matrix. Using the eigendecomposition of $K$ we can define $G_\lambda(K) = Q G_\lambda(\Sigma) Q^T$, meaning

$$G_\lambda(K)Y = \sum_{i=1}^{n} G_\lambda(\sigma_i) \langle q_i, Y \rangle q_i.$$

For Tikhonov,

$$G_\lambda(\sigma) = \frac{1}{\sigma + n\lambda}.$$
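In code, the matrix function is one eigendecomposition plus a rescaling of the coefficients; spectral_filter and tikhonov_filter below are illustrative helper names reused in the remaining sketches.

```python
import numpy as np

def spectral_filter(K, Y, g):
    # G(K) Y = Q g(Sigma) Q^T Y = sum_i g(sigma_i) <q_i, Y> q_i.
    sigma, Q = np.linalg.eigh(K)
    return Q @ (g(sigma) * (Q.T @ Y))

def tikhonov_filter(n, lam):
    # Tikhonov filter: g(sigma) = 1 / (sigma + n * lam).
    return lambda sigma: 1.0 / (sigma + n * lam)
```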

SLIDE 16

Regularization in Inverse Problems

In the inverse problems literature many algorithms are known besides Tikhonov regularization. Each algorithm is defined by a suitable filter function $G_\lambda$. This class of algorithms is known collectively as spectral regularization. The algorithms are not necessarily based on penalized empirical risk minimization.

SLIDE 17

Algorithms

- Gradient Descent, a.k.a. Landweber Iteration, a.k.a. L2 Boosting.
- ν-method, accelerated Landweber.
- Iterated Tikhonov.
- Truncated Singular Value Decomposition (TSVD) / Principal Component Regression (PCR).

The spectral filtering perspective leads to a unified framework.

SLIDE 18

Properties of Spectral Filters

Not every scalar function defines a regularization scheme. Roughly speaking, a good filter function must have the following properties:

- As $\lambda$ goes to 0, $G_\lambda(\sigma) \to 1/\sigma$, so that $G_\lambda(K) \to K^{-1}$.
- $\lambda$ controls the magnitude of the (smaller) eigenvalues of $G_\lambda(K)$.

SLIDE 19

Spectral Regularization for Learning

We can define a class of kernel methods as follows.

Spectral Regularization: we look for estimators

$$f_S^\lambda(x) = \sum_{i=1}^{n} c_i k(x, x_i),$$

where $c = G_\lambda(K)Y$.
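With the helpers from the earlier sketches (and the K, Y, n of the running example), the whole scheme is one line; any other filter function can be swapped in for the Tikhonov one.

```python
# Generic spectral regularization: c = G_lambda(K) Y, then f(x) = sum_i c_i k(x, x_i).
c = spectral_filter(K, Y, tikhonov_filter(n, lam=1e-3))
```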

SLIDE 20

Gradient Descent

Consider the (Landweber) iteration:

Gradient descent:
set $c_0 = 0$
for $i = 1, \ldots, t-1$: $c_i = c_{i-1} + \eta (Y - K c_{i-1})$

If the largest eigenvalue of $K$ is smaller than $n$, the above iteration converges if we choose the step size $\eta = 2/n$. The above iteration can be seen as minimization of the empirical risk

$$\frac{1}{n} \|Y - Kc\|_2^2$$

via gradient descent.
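A direct transcription of the iteration as a numpy sketch; the default step size follows the slide's assumption that the largest eigenvalue of K is below n.

```python
import numpy as np

def landweber(K, Y, t, eta=None):
    # c_i = c_{i-1} + eta * (Y - K c_{i-1}), starting from c_0 = 0.
    n = K.shape[0]
    if eta is None:
        eta = 2.0 / n  # valid when the largest eigenvalue of K is smaller than n
    c = np.zeros(n)
    for _ in range(t):
        c = c + eta * (Y - K @ c)
    return c
```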

SLIDE 21

Gradient Descent as Spectral Filtering

Note that

$c_0 = 0$,
$c_1 = \eta Y$,
$c_2 = \eta Y + \eta (I - \eta K) Y$,
$c_3 = \eta Y + \eta (I - \eta K) Y + \eta \big(Y - K(\eta Y + \eta (I - \eta K) Y)\big) = \eta Y + \eta (I - \eta K) Y + \eta (I - 2\eta K + \eta^2 K^2) Y$.

One can prove by induction that the solution at the $t$-th iteration is given by

$$c = \eta \sum_{i=0}^{t-1} (I - \eta K)^i Y.$$

The filter function is

$$G_\lambda(\sigma) = \eta \sum_{i=0}^{t-1} (1 - \eta \sigma)^i.$$
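A quick numerical sanity check (reusing K, Y, n and the helpers from the earlier sketches) that the iteration and the closed-form filter agree:

```python
# t Landweber steps should match G(sigma) = eta * sum_{i<t} (1 - eta*sigma)^i.
t, eta = 50, 2.0 / n
g = lambda s: eta * np.sum((1 - eta * s[:, None]) ** np.arange(t), axis=1)
print(np.allclose(landweber(K, Y, t, eta), spectral_filter(K, Y, g)))  # True
```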

SLIDE 22

Landweber iteration

Note that the geometric series $\sum_{i \ge 0} x^i = 1/(1-x)$ (for $|x| < 1$) also holds replacing $x$ with a matrix. If we consider the kernel matrix (or rather $I - \eta K$) we get

$$K^{-1} = \eta \sum_{i=0}^{\infty} (I - \eta K)^i \sim \eta \sum_{i=0}^{t-1} (I - \eta K)^i.$$

The filter function of the Landweber iteration corresponds to a truncated power expansion of $K^{-1}$.

SLIDE 23

Early Stopping

The regularization parameter is the number of iterations. Roughly speaking, $t \sim 1/\lambda$. Large values of $t$ correspond to minimization of the empirical risk and tend to overfit. Small values of $t$ tend to oversmooth (recall we start from $c = 0$). Early stopping of the iteration has a regularization effect.

SLIDE 24

Gradient Descent at Work

[Figure (repeated on slides 24-26): gradient descent fits on a one-dimensional toy data set; X axis from 0.1 to 1, Y axis from −2 to 2.]

SLIDE 27

Connection to L2 Boosting

Landweber iteration (or gradient descent) has been rediscovered in statistics under the name of L2 Boosting.

Boosting: the name denotes a large class of methods building estimators as linear (convex) combinations of weak learners. Many boosting algorithms can be seen as gradient descent minimization of the empirical risk on the linear span of some basis functions. For Landweber iteration, the weak learners are $k(x_i, \cdot)$, $i = 1, \ldots, n$.

SLIDE 28

ν-method

One can consider an accelerated version of gradient descent. The method is implemented by the following iteration.

ν-method:
set $c_0 = 0$, $\omega_1 = (4\nu + 2)/(4\nu + 1)$
$c_1 = c_0 + \frac{\omega_1}{n}(Y - Kc_0)$
for $i = 2, \ldots, t-1$: $c_i = c_{i-1} + u_i(c_{i-1} - c_{i-2}) + \frac{\omega_i}{n}(Y - Kc_{i-1})$, with

$$u_i = \frac{(i-1)(2i-3)(2i+2\nu-1)}{(i+2\nu-1)(2i+4\nu-1)(2i+2\nu-3)}, \qquad \omega_i = 4\,\frac{(2i+2\nu-1)(i+\nu-1)}{(i+2\nu-1)(2i+4\nu-1)}.$$

We need $\sqrt{t}$ iterations to get the same solution that gradient descent would get after $t$ iterations.
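A direct transcription of the iteration as a numpy sketch; nu_method is an illustrative name and nu=1 an arbitrary default.

```python
import numpy as np

def nu_method(K, Y, t, nu=1.0):
    # Accelerated Landweber: momentum term u_i plus gradient step omega_i / n.
    n = K.shape[0]
    c_prev = np.zeros(n)
    omega = (4 * nu + 2) / (4 * nu + 1)
    c = c_prev + (omega / n) * (Y - K @ c_prev)
    for i in range(2, t):
        u = ((i - 1) * (2 * i - 3) * (2 * i + 2 * nu - 1)) / (
            (i + 2 * nu - 1) * (2 * i + 4 * nu - 1) * (2 * i + 2 * nu - 3))
        omega = 4 * ((2 * i + 2 * nu - 1) * (i + nu - 1)) / (
            (i + 2 * nu - 1) * (2 * i + 4 * nu - 1))
        c, c_prev = c + u * (c - c_prev) + (omega / n) * (Y - K @ c), c
    return c
```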

SLIDE 29

Truncated Singular Value Decomposition

This method is one of the oldest regularization techniques and is also called spectral cut-off.

TSVD: given the eigendecomposition $K = Q\Sigma Q^T$, a regularized inverse of the kernel matrix is built by discarding all the eigenvalues below the prescribed threshold $\lambda n$. It is described by the filter function $G_\lambda(\sigma) = 1/\sigma$ if $\sigma \ge \lambda n$, and $0$ otherwise.
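As a filter for the spectral_filter helper from the earlier sketch; a minimal version, assuming the threshold $\lambda n$ of the slide.

```python
def tsvd_filter(n, lam):
    # Spectral cut-off: 1/sigma above the threshold lam*n, zero below it.
    def g(sigma):
        out = np.zeros_like(sigma)
        keep = sigma >= lam * n
        out[keep] = 1.0 / sigma[keep]
        return out
    return g

# Usage: c = spectral_filter(K, Y, tsvd_filter(n, lam))
```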

SLIDE 30

Dimensionality Reduction and Generalization

Interestingly enough, one can show that TSVD is equivalent to the following procedure:

- (unsupervised) projection of the data using (kernel) PCA;
- empirical risk minimization on the projected data, without any regularization.

The only free parameter is the number of components we retain for the projection.

SLIDE 31

Dimensionality Reduction and Generalization (cont.)

Projection regularizes! Doing KPCA and then RLS is redundant. If the data are centered, spectral regularization (also Tikhonov) can be seen as a filtered projection on the principal components.

SLIDE 32

Comments on Complexity and Parameter Choice

Iterative methods perform a matrix-vector multiplication, $O(n^2)$, at each iteration, and the regularization parameter is the number of iterations itself. There is no closed form for the leave-one-out error. Parameter tuning differs from method to method.

Compared to RLS, in iterative and projected methods the regularization parameter is naturally discrete. TSVD has a natural range for the search of the regularization parameter. For TSVD, the regularization parameter can be interpreted in terms of dimensionality reduction.

SLIDE 33

Filtering, Regularization and Learning

The idea of using regularization from inverse problems in statistics (see Wahba) and machine learning (see Poggio and Girosi) is now well known. Ideas coming from inverse problems regarded mostly the use of Tikhonov regularization. The notion of filter function was studied in machine learning and gave a connection to function approximation in signal processing and approximation theory. The work of Poggio and Girosi highlighted the relation between neural networks, radial basis functions, and regularization. Filtering was typically used to define a penalty for Tikhonov regularization; in the following, it is used to define algorithms different from, though similar to, Tikhonov regularization.

SLIDE 34

Final remarks

Many different principles lead to regularization: penalized minimization, iterative optimization, projection. The common intuition is that they enforce the stability of the solution. All the methods are implicitly based on the use of the square loss. For other loss functions, different notions of stability can be used.

SLIDE 35

Appendices

Appendix 1: Other examples of filters: accelerated Landweber and Iterated Tikhonov.
Appendix 2: TSVD and PCA.
Appendix 3: Some thoughts about generalization of spectral methods.

SLIDE 36

Appendix 1: ν-method

The so-called ν-method, or accelerated Landweber iteration, can be thought of as an accelerated version of gradient descent. The filter function is $G_t(\sigma) = p_t(\sigma)$, with $p_t$ a polynomial of degree $t - 1$. The regularization parameter (think of $1/\lambda$) is $\sqrt{t}$ (rather than $t$): fewer iterations are needed to attain a solution.

SLIDE 37

ν-method (cont.)

The method is implemented by the following iteration.

ν-method:
set $c_0 = 0$, $\omega_1 = (4\nu + 2)/(4\nu + 1)$
$c_1 = c_0 + \frac{\omega_1}{n}(Y - Kc_0)$
for $i = 2, \ldots, t-1$: $c_i = c_{i-1} + u_i(c_{i-1} - c_{i-2}) + \frac{\omega_i}{n}(Y - Kc_{i-1})$, with

$$u_i = \frac{(i-1)(2i-3)(2i+2\nu-1)}{(i+2\nu-1)(2i+4\nu-1)(2i+2\nu-3)}, \qquad \omega_i = 4\,\frac{(2i+2\nu-1)(i+\nu-1)}{(i+2\nu-1)(2i+4\nu-1)}.$$

SLIDE 38

Iterated Tikhonov

The following method can be seen as a combination of Tikhonov regularization and gradient descent.

Iterated Tikhonov:
set $c_0 = 0$
for $i = 1, \ldots, t$: $(K + n\lambda I)c_i = Y + n\lambda c_{i-1}$

The filter function is

$$G_\lambda(\sigma) = \frac{(\sigma + n\lambda)^t - (n\lambda)^t}{\sigma(\sigma + n\lambda)^t}.$$
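A sketch of the iteration; note that every step solves a system with the same matrix $K + n\lambda I$.

```python
import numpy as np

def iterated_tikhonov(K, Y, lam, t):
    # Solve (K + n*lam*I) c_i = Y + n*lam*c_{i-1}, starting from c_0 = 0.
    n = K.shape[0]
    A = K + n * lam * np.eye(n)
    c = np.zeros(n)
    for _ in range(t):
        c = np.linalg.solve(A, Y + n * lam * c)
    return c
```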

SLIDE 39

Iterated Tikhonov (cont.)

Both the number of iterations and $\lambda$ can be seen as regularization parameters. Iterated Tikhonov can be used to enforce more smoothness on the solution. Ordinary Tikhonov regularization suffers from a saturation effect: it cannot exploit the regularity of the solution beyond a certain critical value.

SLIDE 40

Appendix 2: TSVD and Connection to PCA

Principal Component Analysis is a well-known dimensionality reduction technique, often used as preprocessing in learning.

PCA: assuming centered data, $X^T X$ is the covariance matrix and its eigenvectors $(V^j)_{j=1}^{d}$ are the principal components. PCA amounts to mapping each example $x_i$ to

$$\tilde{x}_i = (x_i^T V^1, \ldots, x_i^T V^m),$$

where $m < \min\{n, d\}$.

Notation: $x_i^T$ is the transpose of the $i$-th row (example) of $X$.

SLIDE 41

PCA (cont.)

The above algorithm can be written using only the linear kernel matrix $XX^T$ and its eigenvectors $(U^j)_{j=1}^{n}$. The (nonzero) eigenvalues of $XX^T$ and $X^T X$ are the same, and

$$V^j = \frac{1}{\sqrt{\sigma_j}} X^T U^j.$$

Then

$$\tilde{x}_i = \left( \frac{1}{\sqrt{\sigma_1}} \sum_{j=1}^{n} U^1_j \, x_i^T x_j, \;\ldots,\; \frac{1}{\sqrt{\sigma_m}} \sum_{j=1}^{n} U^m_j \, x_i^T x_j \right).$$

Note that $x_i^T x_j = k(x_i, x_j)$.

SLIDE 42

Kernel PCA

We can perform a nonlinear principal component analysis, namely KPCA, by choosing nonlinear kernel functions. Using $K = Q\Sigma Q^T$ we can rewrite the projection in vector notation. If we let $\Sigma_m = \mathrm{diag}(\sigma_1, \ldots, \sigma_m, 0, \ldots, 0)$, then the projected data matrix $\tilde{X}$ is

$$\tilde{X} = K Q \Sigma_m^{-1/2},$$

where the inverse square root is taken on the nonzero entries only.
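A numpy sketch of the projection formula; kpca_project is an illustrative name, and the inverse square root is applied only to the m retained eigenvalues.

```python
import numpy as np

def kpca_project(K, m):
    # Projected data matrix X_tilde = K Q Sigma_m^{-1/2}, keeping the top m components.
    sigma, Q = np.linalg.eigh(K)        # ascending order
    sigma, Q = sigma[::-1], Q[:, ::-1]  # reorder so sigma_1 >= ... >= sigma_n
    return (K @ Q[:, :m]) / np.sqrt(sigma[:m])
```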

SLIDE 43

Principal Component Regression

ERM on the projected data,

$$\min_{\beta \in \mathbb{R}^m} \frac{\|Y - \tilde{X}\beta\|^2}{n},$$

is equivalent to performing truncated singular value decomposition on the original problem. The Representer Theorem tells us that

$$\beta^T \tilde{x}_i = \sum_{j=1}^{n} \tilde{x}_j^T \tilde{x}_i \, c_j,$$

with $c = (\tilde{X}\tilde{X}^T)^{-1} Y$.

SLIDE 44

Dimensionality Reduction and Generalization

Using $\tilde{X} = K Q \Sigma_m^{-1/2}$ we get

$$\tilde{X}\tilde{X}^T = Q\Sigma Q^T Q \Sigma_m^{-1/2} \Sigma_m^{-1/2} Q^T Q \Sigma Q^T = Q \Sigma_m Q^T,$$

so that

$$c = Q \Sigma_m^{-1} Q^T Y = G_\lambda(K) Y,$$

where $G_\lambda$ is the filter function of TSVD. The two procedures are equivalent. The regularization parameter is the eigenvalue threshold in one case and the number of components kept in the other.
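A numerical check of the claimed equivalence, reusing K, Y, n and the helpers above; the pseudo-inverse is needed because $\tilde{X}\tilde{X}^T$ has rank m, and the check assumes the m-th and (m+1)-th eigenvalues are distinct.

```python
m = 5
Xt = kpca_project(K, m)
c_pcr = np.linalg.pinv(Xt @ Xt.T) @ Y     # PCR coefficients
sig = np.linalg.eigvalsh(K)[::-1]         # eigenvalues, decreasing
c_tsvd = spectral_filter(K, Y, tsvd_filter(n, lam=sig[m - 1] / n))
print(np.allclose(c_pcr, c_tsvd))         # True up to numerical error
```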

SLIDE 45

Appendix 3: Why Should These Methods Learn?

We have seen that $G_\lambda(K) \to K^{-1}$ as $\lambda \to 0$. Anyway, usually we DON'T want to solve $Kc = Y$, since that would simply correspond to an over-fitting solution. Stability vs. generalization: how can we show that stability ensures generalization?

SLIDE 46

Population Case

It is useful to consider what happens if we know the true distribution.

Integral operator: for $n$ large enough,

$$\frac{1}{n} K \sim L_k, \qquad L_k f(s) = \int_X k(x, s) f(x) p(x)\, dx.$$

The ideal problem: for $n$ large enough we have

$$Kc = Y \;\sim\; L_k f = L_k f_\rho,$$

where $f_\rho$ is the regression (target) function, defined by

$$f_\rho(x) = \int_Y y\, p(y|x)\, dy.$$
SLIDE 47

Regularization in the Population Case

It can be shown that this is the least squares problem associated to $L_k f = L_k f_\rho$. Tikhonov regularization in this case is simply

$$(L_k + \lambda I) f^\lambda = L_k f_\rho,$$

or equivalently

$$f^\lambda = (L_k + \lambda I)^{-1} L_k f_\rho.$$

SLIDE 48

Fourier Decomposition of the Regression Function

Fourier decomposition of $f_\rho$ and $f^\lambda$: if we diagonalize $L_k$ to get the eigensystem $(t_i, \phi_i)_i$, we can write

$$f_\rho = \sum_i \langle f_\rho, \phi_i \rangle \phi_i.$$

Perturbations affect the high-order components. Tikhonov regularization can be written as

$$f^\lambda = \sum_i \frac{t_i}{t_i + \lambda} \langle f_\rho, \phi_i \rangle \phi_i.$$

Sampling IS a perturbation: by stabilizing the problem with respect to random discretization (sampling), we can recover $f_\rho$.
