Girosi, Jones, and Poggio: Regularization theory and neural network architectures



SLIDE 1

Girosi, Jones, and Poggio: Regularization theory and neural network architectures

Presented by Hsin-Hao Yu, Department of Cognitive Science, October 4, 2001

SLIDE 2

Learning as function approximation

  • Goal: Given sparse, noisy samples of a function f, how do we recover f as accurately as possible?
  • Why is it hard? Infinitely many curves pass through the samples. This problem is ill-posed.
  • Prior knowledge about the function must be introduced to make the solution unique. Regularization is a theoretical framework for doing this.

SLIDE 3

Constraining the solution with “stabilizers”

Let (x1, y1), ..., (xN, yN) be the input data. In order to recover the underlying function, we regularize the ill-posed problem by choosing the function f that minimizes the functional H:

H[f] = E[f] + λφ[f]

where λ ∈ R is a user-chosen constant, E[f] represents the “fidelity” of the approximation,

E[f] = (1/2) Σ_{i=1}^N (f(x_i) − y_i)²

and φ[f] represents a constraint on the “smoothness” of f. φ is called the stabilizer.
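As a concrete illustration (not part of the original slides), here is a minimal numerical sketch of evaluating H[f] for a candidate function sampled on a grid; the function name and the finite-difference approximation of a second-derivative stabilizer are illustrative choices, not the paper's implementation.

import numpy as np

def H_functional(f_grid, x_grid, x_data, y_data, lam):
    """Numerically evaluate H[f] = E[f] + lam * phi[f] for a candidate
    function given by its values f_grid on the uniform grid x_grid.
    phi[f] approximates the integral of (f'')^2 by finite differences."""
    # Fidelity term: E[f] = 1/2 * sum_i (f(x_i) - y_i)^2
    f_at_data = np.interp(x_data, x_grid, f_grid)
    E = 0.5 * np.sum((f_at_data - y_data) ** 2)
    # Smoothness term: integral of (f'')^2 dx, via second differences
    h = x_grid[1] - x_grid[0]
    f_second = np.diff(f_grid, 2) / h ** 2
    phi = np.sum(f_second ** 2) * h
    return E + lam * phi

Sweeping lam from very small to very large reproduces the fidelity vs. smoothness trade-off illustrated on the next slide.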

SLIDE 4

The fidelity vs. smoothness trade-off

[Figure: fits obtained with a very small λ, an intermediate λ, and a very big λ]

SLIDE 5

Math review: Calculus of variations

Calculus: In order to find a number x̄ such that the function f(x) is an extremum at x̄, we first calculate the derivative of f, then solve for

df/dx = 0

Calculus of variations: In order to find a function f̄ such that the functional H[f] is an extremum at f̄, we first calculate the functional derivative of H, then solve for

δH/δf = 0

The analogy in summary:

  • Object for optimization: a function (calculus) vs. a functional (calculus of variations)
  • Solution: a number vs. a function
  • Equation to solve: df/dx = 0 vs. δH/δf = 0

SLIDE 6

An example of regularization

Consider a one-dimensional case. Given input data (x1, y1), ..., (xN, yN), we want to minimize the functional H[f] = E[f] + λφ[f], where

E[f] = (1/2) Σ_{i=1}^N (f(x_i) − y_i)²

φ[f] = ∫ (d²f/dx²)² dx

To proceed, we compute δH/δf = δE/δf + λ δφ/δf.

SLIDE 7

Regularization continued

δE/δf = (1/2) δ/δf Σ_{i=1}^N (f(x_i) − y_i)²
      = (1/2) δ/δf ∫ Σ_{i=1}^N (f(x) − y_i)² δ(x − x_i) dx
      = (1/2) ∫ δ/δf Σ_{i=1}^N (f(x) − y_i)² δ(x − x_i) dx
      = ∫ Σ_{i=1}^N (f(x) − y_i) δ(x − x_i) dx

δφ/δf = δ/δf ∫ (d²f/dx²)² dx
      = ∫ d⁴f/dx⁴ dx

δH/δf = δE/δf + λ δφ/δf
      = ∫ [ Σ_{i=1}^N (f(x) − y_i) δ(x − x_i) + λ d⁴f/dx⁴ ] dx

SLIDE 8

Regularization continued

To minimize H[f], we set δH/δf = 0:

Σ_{i=1}^N (f(x) − y_i) δ(x − x_i) + λ d⁴f/dx⁴ = 0

⇒ d⁴f/dx⁴ = (1/λ) Σ_{i=1}^N (y_i − f(x)) δ(x − x_i)

To solve this differential equation, we calculate the Green’s function G(x, ξ):

d⁴G(x, ξ)/dx⁴ = δ(x − ξ)  ⇒  G(x, ξ) = |x − ξ|³ (plus lower-order terms)

We are almost there...

SLIDE 9

Regularization continued

The solution to

d⁴f/dx⁴ = (1/λ) Σ_{i=1}^N (y_i − f(x)) δ(x − x_i)

can now be constructed from the Green’s function:

f(x) = (1/λ) ∫ Σ_{i=1}^N (y_i − f(ξ)) δ(ξ − x_i) G(x, ξ) dξ
     = (1/λ) ∫ Σ_{i=1}^N (y_i − f(ξ)) δ(ξ − x_i) |x − ξ|³ dξ
     = (1/λ) Σ_{i=1}^N (y_i − f(x_i)) |x − x_i|³

The solution turns out to be the cubic spline! Oh, one more thing: we need to consider the null space of φ:

Null(φ) = {ψ1, ψ2} = {1, x}   (k = 2)

f(x) = Σ_{i=1}^N [(y_i − f(x_i))/λ] G(x, x_i) + Σ_{α=1}^k d_α ψ_α(x)

SLIDE 10

Solving for the weights

The general solution for minimizing H[f] = E[f] + λφ[f] is:

f(x) = Σ_{i=1}^N w_i G(x, x_i) + Σ_{α=1}^k d_α ψ_α(x),   with   w_i = (y_i − f(x_i))/λ   (∗)

where G is the Green’s function of the differential operator associated with the stabilizer φ, k is the dimension of the null space of φ, and the ψ_α span the null space.

But how do we calculate the w_i?

(∗) ⇒ λw_i = y_i − f(x_i) ⇒ y_i = f(x_i) + λw_i
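To make the form of this solution concrete, here is a minimal sketch (the function name is mine, not from the slides) that evaluates f(x) = Σ_i w_i G(x, x_i) + Σ_α d_α ψ_α(x) for the one-dimensional cubic case, with G(x, x_i) = |x − x_i|³ and null-space basis {1, x}, assuming the weights w and coefficients d are already known:

import numpy as np

def rn_predict(x, centers, w, d):
    """Evaluate the regularized solution at the points x, given the data
    centers x_i, kernel weights w_i, and null-space coefficients d = (d1, d2)
    for the basis {1, x} of the cubic-spline case."""
    x = np.atleast_1d(x)
    # Kernel part: sum_i w_i * |x - x_i|^3
    kernel_part = (w * np.abs(x[:, None] - centers[None, :]) ** 3).sum(axis=1)
    # Null-space part: d1 * 1 + d2 * x
    return kernel_part + d[0] + d[1] * x

How the weights themselves are obtained is the subject of the next slides.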

SLIDE 11

Computing w_i continued

Start from y_i = f(x_i) + λw_i and stack the equations for i = 1, ..., N:

y_j = Σ_{i=1}^N w_i G(x_j, x_i) + (Ψᵀd)_j + λw_j,   j = 1, ..., N

In matrix notation, with the Gram matrix (G)_{ji} = G(x_j, x_i):

y = Gw + Ψᵀd + λw

SLIDE 12

Computing w_i continued

The last statement in matrix form:

y = (G + λI)w + Ψᵀd
0 = Ψw

or, written as a single block system:

[ G + λI   Ψᵀ ] [ w ]   [ y ]
[ Ψ        0  ] [ d ] = [ 0 ]

In the special case when the null space is empty (such as for the Gaussian kernel),

w = (G + λI)⁻¹y
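A minimal numerical sketch of this computation (function and variable names are illustrative, and the Gaussian kernel in the usage example uses a unit width): it solves the block system when a null space is present, and falls back to w = (G + λI)⁻¹y when it is empty.

import numpy as np

def fit_regularization_network(x, y, lam, G, null_basis=()):
    """Solve for the weights of the regularized solution.
    G(a, b) is the Green's function / kernel, evaluated elementwise;
    null_basis is a tuple of functions spanning the null space of the
    stabilizer (empty for the Gaussian kernel, {1, x} for the cubic case)."""
    N = len(x)
    Gmat = G(x[:, None], x[None, :])              # N x N matrix G(x_j, x_i)
    A = Gmat + lam * np.eye(N)
    if not null_basis:
        # Empty null space: w = (G + lam*I)^{-1} y
        return np.linalg.solve(A, y), np.zeros(0)
    k = len(null_basis)
    Psi = np.array([[psi(xi) for xi in x] for psi in null_basis])   # k x N
    # Block system:  [ G + lam*I  Psi^T ] [w]   [y]
    #                [ Psi        0     ] [d] = [0]
    top = np.hstack([A, Psi.T])
    bottom = np.hstack([Psi, np.zeros((k, k))])
    sol = np.linalg.solve(np.vstack([top, bottom]),
                          np.concatenate([y, np.zeros(k)]))
    return sol[:N], sol[N:]

# Example use with a Gaussian kernel (empty null space):
# gauss = lambda a, b: np.exp(-(a - b) ** 2)
# w, _ = fit_regularization_network(x_data, y_data, lam=0.1, G=gauss)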

12

slide-13
SLIDE 13

Interpretations of regularization

The regularized solutions can be understood as:

  • 1. Interpolation with kernels
  • 2. Neural networks (Regularization networks)
  • 3. Data smoothing (equivalent kernels as convolution filters)

SLIDE 14

More stabilizers

Various interpolation methods and neural networks can be derived from regularization theory:

  • If we require that φ[f(x)] = φ[f(Rx)], where R is a rotation matrix, then G is radially symmetric. This is the Radial Basis Function (RBF) case. It reflects the a priori assumption that all variables have the same relevance and that there are no privileged directions.

  • If

    φ[f] = ∫ e^{|s|²/β} |f̃(s)|² ds

    (f̃ denotes the Fourier transform of f), we get Gaussian kernels.

  • Thin plate splines, polynomial splines, multiquadric kernels, etc., can be derived in the same way (a small sketch of some of these kernels follows below).
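A small sketch of radial kernels of this kind, where r stands for the distance |x − x_i|; the scale parameters beta and c are illustrative defaults, not values from the paper:

import numpy as np

# Radial kernels associated with different stabilizers (illustrative forms).
kernels = {
    "gaussian":     lambda r, beta=1.0: np.exp(-(r ** 2) / beta),
    "multiquadric": lambda r, c=1.0: np.sqrt(r ** 2 + c ** 2),
    "cubic_spline": lambda r: np.abs(r) ** 3,
    "thin_plate":   lambda r: np.where(r > 0, r ** 2 * np.log(np.maximum(r, 1e-300)), 0.0),
}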

SLIDE 15

The probabilistic interpretation of RN

Suppose that g is a set of random samples drawn from the function f, in the presence of noise.

  • P[f|g] is the probability of the function f given the examples g.

  • P[g|f] is the model of noise. We assume Gaussian noise, so

    P[g|f] ∝ e^{−(1/2σ²) Σ_i (y_i − f(x_i))²}

  • P[f] is the a priori probability of f. This embodies our a priori knowledge of the function. Let P[f] ∝ e^{−αφ[f]}.

SLIDE 16

Probabilistic interpretation cont.

By Bayes’ rule,

P[f|g] ∝ P[g|f] P[f] ∝ e^{−(1/2σ²) [ Σ_i (y_i − f(x_i))² + 2ασ²φ[f] ]}

The MAP estimate of f is therefore the minimizer of:

H[f] = Σ_i (y_i − f(x_i))² + λφ[f]

where λ = 2σ²α. It determines the trade-off between the level of noise and the strength of the a priori assumption about the solution.
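As a small sanity check (a sketch with illustrative names, not from the slides): up to an additive constant and the overall factor 1/(2σ²), the negative log-posterior under these assumptions is exactly H[f] with λ = 2σ²α.

import numpy as np

def neg_log_posterior(f_at_data, y, sigma, alpha, phi_value):
    """-log P[f|g] up to an additive constant, assuming Gaussian noise with
    variance sigma^2 and prior P[f] ~ exp(-alpha * phi[f]); phi_value is the
    stabilizer phi[f] already evaluated for the candidate f."""
    lam = 2.0 * sigma ** 2 * alpha
    H = np.sum((y - f_at_data) ** 2) + lam * phi_value     # this is H[f]
    return H / (2.0 * sigma ** 2)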

SLIDE 17

Generalized Regularization Networks

The exact regularized solution requires w = (G + λI)⁻¹y, but computing (G + λI)⁻¹ can be costly if the number of data points is large. Generalized Regularization Networks approximate the regularized solution by using fewer kernel functions.
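A simplified sketch of this idea (names are illustrative; the paper derives specific equations for the GRN weights, which are replaced here by a ridge-regularized least-squares fit over M chosen centers):

import numpy as np

def fit_grn(x, y, centers, lam, G):
    """Expand f over M << N chosen centers and solve a regularized
    least-squares problem for the M weights, instead of inverting the
    full N x N matrix (G + lam*I)."""
    Gmat = G(x[:, None], centers[None, :])        # N x M design matrix
    M = len(centers)
    # Regularized normal equations: (Gmat^T Gmat + lam*I) w = Gmat^T y
    return np.linalg.solve(Gmat.T @ Gmat + lam * np.eye(M), Gmat.T @ y)

def grn_predict(xq, centers, w, G):
    """Evaluate the GRN approximation at query points xq."""
    return G(np.atleast_1d(xq)[:, None], centers[None, :]) @ w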

SLIDE 18

Applications in early vision

Edge detection, optical flow, surface reconstruction, stereo, etc.
