9.54 class 8, fall semester 2014
Supervised learning: optimization, regularization, kernels
Shimon Ullman + Tomaso Poggio
Danny Harari + Daniel Zysman + Darren Seibert
The Regularization Kingdom

Loss functions and empirical risk
Math

Empirical risk:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)$$
Cucker and Smale, AMS 2001
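As a concrete illustration (mine, not from the slides), a minimal NumPy sketch of the empirical risk $I_S[f]$ with the square loss; the function names and the toy data are arbitrary choices.

```python
import numpy as np

def square_loss(fx, y):
    # V(f(x), y) = (f(x) - y)^2
    return (fx - y) ** 2

def empirical_risk(f, X, y, V=square_loss):
    # I_S[f] = (1/n) * sum_i V(f(x_i), y_i)
    return np.mean(V(f(X), y))

# Toy example: n = 4 points in R^2 and a fixed linear predictor f(x) = w.x
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 0.5, 1.5, 2.5])
w = np.array([1.0, 1.0])
f = lambda X: X @ w
print(empirical_risk(f, X, y))  # mean square loss over the training set S
```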
Hypothesis spaces:

- parametric: $f(x) = \sum_{j=1}^{p} x_j w_j$
- semi-parametric: $f(x) = \sum_{j=1}^{p} \Phi(x)_j w_j$
- non-parametric: $f(x) = \sum_{j \geq 1} \Phi(x)_j w_j = \sum_{i \geq 1} K(x, x_i) \alpha_i$

$K(x, x')$ is a symmetric positive definite function called a reproducing kernel.
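The parametric and kernel forms agree whenever $K$ is the kernel induced by the feature map, $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$. A small numerical check of this (my own sketch; the polynomial feature map $\Phi$ and all names are illustrative choices, not from the slides):

```python
import numpy as np

# Hypothetical feature map Phi: R -> R^3, Phi(x) = (1, x, x^2)
def Phi(x):
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

def K(x, xp):
    # Kernel induced by Phi: K(x, x') = <Phi(x), Phi(x')>
    return Phi(x) @ Phi(xp).T

x_train = np.array([-1.0, 0.0, 2.0])
alpha = np.array([0.5, -1.0, 0.25])

# Parametric view: w = sum_i alpha_i Phi(x_i) lives in feature space
w = Phi(x_train).T @ alpha

x_test = np.linspace(-2.0, 3.0, 5)
f_parametric = Phi(x_test) @ w           # f(x) = sum_j Phi(x)_j w_j
f_kernel = K(x_test, x_train) @ alpha    # f(x) = sum_i K(x, x_i) alpha_i
print(np.allclose(f_parametric, f_kernel))  # True: the two forms agree
```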
Empirical risk minimization (ERM):
$$\min_{f \in \mathcal{H}} I_S[f] = \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)$$

With the square loss:
$$\min_{f \in \mathcal{H}} E_S[f] = \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$$
[Diagram: a training set feeds a learning algorithm, which outputs a hypothesis $h$; a new input $x$ (living area of a house) is mapped by $h$ to a predicted $y$ (predicted price).]
Ill-posed problems often arise when one tries to infer general laws from few data: the hypothesis space is too large and there are not enough data.

In general, ERM leads to ill-posed solutions, because the solution:
- may be too complex
- may not be unique
- may change radically when one sample is left out (a numerical sketch follows below)
Jacques Hadamard (who introduced the notion of well-posed problems)
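A minimal numerical sketch of this (my own illustration, not from the slides): with more parameters than data points ($p > n$ linear models), unregularized ERM has infinitely many zero-error solutions that can differ arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 10                       # fewer data (n) than parameters (p)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Minimum-norm interpolant: achieves exactly zero empirical risk
w1 = np.linalg.pinv(X) @ y

# Add any null-space direction of X: empirical risk is still exactly zero,
# so the ERM solution is not unique
_, _, Vt = np.linalg.svd(X)
w2 = w1 + 5.0 * Vt[-1]             # Vt[-1] lies in the null space of X
print(np.allclose(X @ w1, y), np.allclose(X @ w2, y))  # True True
print(np.linalg.norm(w1 - w2))     # yet the two solutions are far apart
```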
The mathematical foundation of regularized learning algorithms (Cucker and Smale; Vapnik and Chervonenkis) draws on Computer Science (Algorithms, Complexity) and Mathematics (Optimization, Probability, Statistics). A key goal is understanding the conditions under which ERM can solve the learning problem.
Tikhonov regularization over the training set $S$:
$$\min_{f \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \lambda R(f) \right\}$$
where $\lambda$ is the regularization parameter and $R(f)$ is the regularizer.
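The scheme works for any loss $V$. As a sketch (my own, not the slides'), plain subgradient descent on the Tikhonov objective with the hinge loss swapped in for $V$ and $R(f) = w^T w$; all names, the optimizer, and its settings are my choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5
X = rng.standard_normal((n, p))
y = np.sign(rng.standard_normal(n))      # binary labels for the hinge loss

lam = 0.1                                # regularization parameter lambda

def objective(w):
    margins = y * (X @ w)
    V = np.maximum(0.0, 1.0 - margins)   # hinge loss V(f(x), y)
    return V.mean() + lam * w @ w        # data fit + lambda * R(f)

def subgradient(w):
    margins = y * (X @ w)
    active = margins < 1.0               # samples where the hinge is active
    gV = -(active * y)[:, None] * X      # per-sample subgradient of the hinge
    return gV.mean(axis=0) + 2 * lam * w

w = np.zeros(p)
for t in range(500):                     # plain subgradient descent
    w -= 0.05 * subgradient(w)
print(objective(w))
```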
Regularized least squares: with $f(x) = \sum_{j=1}^{p} w_j x_j$ and $R(f) = w^T w$, Tikhonov regularization with the square loss becomes
$$\min_{f \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|^2 \right\}$$

Math
$$w^T = Y X^T (X X^T + \lambda I)^{-1}$$
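A quick numerical check of this closed form (a sketch under my own naming; I take $X$ to be $p \times n$ with the points $x_i$ as columns, $Y$ a row of labels, and $\lambda$ absorbing the $1/n$ factor of the risk):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 20
X = rng.standard_normal((p, n))   # p x n data matrix, columns are the points x_i
Y = rng.standard_normal(n)        # labels y_1 ... y_n
lam = 0.5

# w^T = Y X^T (X X^T + lambda I)^{-1}  <=>  (X X^T + lambda I) w = X Y^T
w = np.linalg.solve(X @ X.T + lam * np.eye(p), X @ Y)

# Sanity check: the gradient of sum_i (y_i - w.x_i)^2 + lam * w.w vanishes at w
grad = 2 * X @ (X.T @ w - Y) + 2 * lam * w
print(np.allclose(grad, 0.0))     # True
```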
- linear model: $f(x) = \sum_{j=1}^{p} w_j x_j$
- generalized linear model: $f(x) = \sum_{j=1}^{p} w_j \Phi(x)_j$
Math

From the closed form, $w$ is a linear combination of the training points, $w_j = \sum_{i=1}^{n} \alpha_i (x_i)_j$. Hence
$$f(x) = \sum_{j=1}^{p} w_j x_j = \sum_{i=1}^{n} \alpha_i \sum_{j=1}^{p} (x_i)_j \, x_j = \sum_{i=1}^{n} \alpha_i \, x_i^T x$$
and, with a feature map,
$$f(x) = \sum_{j \geq 1} w_j \Phi(x)_j = \sum_{i=1}^{n} \alpha_i \sum_{j \geq 1} \Phi(x_i)_j \Phi(x)_j$$
Math

Define the kernel matrix $(K_n)_{i,j} = x_i^T x_j$, or more generally
$$(K_n)_{i,j} = K(x_i, x_j) = \sum_{k \geq 1} \Phi(x_i)_k \, \Phi(x_j)_k$$
so that the solution takes the form $f(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i)$.
Math

Example (Gaussian kernel):
$$K(x, x') = \exp\left( -\frac{\sum_{j=1}^{p} (x_j - x'_j)^2}{\sigma^2} \right)$$
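Putting the pieces together: a minimal kernel ridge regression sketch with the Gaussian kernel above. This is my own illustration; the coefficient formula $c = (K_n + \lambda n I)^{-1} y$ is the standard dual solution of the $1/n$-scaled regularized least squares problem, and all variable names are mine.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / sigma^2), computed for all pairs of rows
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

rng = np.random.default_rng(3)
n, p = 30, 2
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
lam = 1e-2

# Dual solution: c = (K_n + lambda*n*I)^{-1} y, then f(x) = sum_i c_i K(x, x_i)
Kn = gaussian_kernel(X, X)
c = np.linalg.solve(Kn + lam * n * np.eye(n), y)

X_new = rng.standard_normal((5, p))
f_new = gaussian_kernel(X_new, X) @ c   # predictions at new points
print(f_new)
```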
Note: An RKHS is equivalently defined as a Hilbert space where the evaluation functionals are continuous.
Given $K$, there exists a unique Hilbert space of functions $(\mathcal{H}, \langle \cdot, \cdot \rangle)$ such that $K(\cdot, x) \in \mathcal{H}$ and $f(x) = \langle f, K(\cdot, x) \rangle$ for every $f \in \mathcal{H}$ (the reproducing property).
The norm of a function $f(x) = \sum_{i=1}^{n} K(x, x_i) c_i$ is given by
$$\|f\|^2 = \sum_{i,j=1}^{n} K(x_j, x_i) c_i c_j$$
and is a natural complexity measure.
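In matrix form this is the quadratic form $c^T K_n c$; a short check in NumPy (my own sketch; the Gaussian kernel and the points are arbitrary choices):

```python
import numpy as np

def K(a, b, sigma=1.0):
    # Gaussian reproducing kernel, one of the standard examples above
    return np.exp(-np.subtract.outer(a, b) ** 2 / sigma ** 2)

x = np.array([-1.0, 0.0, 1.5])     # centers x_i
c = np.array([0.5, -0.2, 0.3])     # coefficients c_i

# ||f||^2 = sum_{i,j} K(x_j, x_i) c_i c_j for f(x) = sum_i K(x, x_i) c_i
norm_sq = c @ K(x, x) @ c
print(norm_sq)                     # non-negative, since K is positive definite
```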
Math

For most loss functions, the solution of Tikhonov regularization is of the form
$$f(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i)$$
minimizing
$$\underbrace{\frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)}_{\text{data fit term}} + \underbrace{\lambda \|f\|^2}_{\text{complexity/smoothness term}}$$
Hebbian mechanisms can be used for biological supervised learning (Knudsen, 1990)