9.54 Review
Levels + Biophysics + Supervised learning
Shimon Ullman + Tomaso Poggio
Danny Harari + Daniel Zysman + Darren Seibert
9.54, fall semester 2014
Vision
A Computational Investigation into the Human Representation and Processing of Visual Information
David Marr. Foreword by Shimon Ullman, afterword by Tomaso Poggio.

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood… In Marr's framework, the process of vision constructs a set of representations… A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level. Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception.
Poggio, T. The Levels of Understanding Framework, Revised. MIT-CSAIL-TR-2012-014, CBCL-308, 2012.
Shunting inhibition on a dendritic tree can implement AND-NOT logic: an excitatory input e is transmitted unless an on-path inhibitory input i vetoes it. Two example dendritic trees then compute

[e3 ANDNOT (i1 OR i2 OR i3)] OR [e2 ANDNOT (i1 OR i2)] OR (e1 ANDNOT i1)

(e1 ANDNOT i1) OR (e2 ANDNOT i2) OR {[(e3 ANDNOT i3) OR (e4 ANDNOT i4) OR (e5 ANDNOT i5) OR (e6 ANDNOT i6)] ANDNOT i7}
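As a quick illustration (not from the slides), the two expressions can be evaluated as boolean circuits in Python; the names `andnot`, `cell_a`, and `cell_b` are hypothetical, chosen only for this sketch:

```python
# Sketch: the two dendritic AND-NOT expressions above as boolean circuits.
# Inputs e1..e6 (excitatory) and i1..i7 (inhibitory) are booleans.

def andnot(e, i):
    """e ANDNOT i: the excitatory input passes unless vetoed by inhibition."""
    return e and not i

def cell_a(e, i):
    # [e3 ANDNOT (i1 OR i2 OR i3)] OR [e2 ANDNOT (i1 OR i2)] OR (e1 ANDNOT i1)
    return (andnot(e[3], i[1] or i[2] or i[3])
            or andnot(e[2], i[1] or i[2])
            or andnot(e[1], i[1]))

def cell_b(e, i):
    # (e1 ANDNOT i1) OR (e2 ANDNOT i2) OR
    # {[(e3 ANDNOT i3) OR ... OR (e6 ANDNOT i6)] ANDNOT i7}
    distal = any(andnot(e[k], i[k]) for k in (3, 4, 5, 6))
    return (andnot(e[1], i[1]) or andnot(e[2], i[2])
            or andnot(distal, i[7]))

# Example: e3 is active, but the shared on-path inhibition i7 vetoes the branch.
e = {k: False for k in range(1, 7)}; e[3] = True
i = {k: False for k in range(1, 8)}; i[7] = True
print(cell_b(e, i))  # False
```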
Thus $Y - MX = 0$. More generally, look for $M$ such that

$$\min_M \|Y - MX\|^2$$

For a single output, write $M = w^T$. Setting the gradient to zero,

$$\nabla V(w) = -2(Y - w^T X)X^T = 0$$

yields $Y X^T = w^T X X^T$ and $w^T = Y X^T (X X^T)^{-1}$.

Now look for $w$ such that

$$\min_w \|Y - w^T X\|^2 + \lambda \|w\|^2$$

Setting the gradient to zero,

$$\nabla V(w) = -2(Y - w^T X)X^T + 2\lambda w^T = 0$$

yields $Y X^T = w^T X X^T + \lambda w^T$ and $w^T = Y X^T (X X^T + \lambda I)^{-1}$.
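A minimal numpy sketch of these closed-form solutions on synthetic data (the data and values here are illustrative assumptions, not from the slides); $X$ is $D \times n$ with one column per example and $Y$ is $1 \times n$, matching the convention above:

```python
import numpy as np

rng = np.random.default_rng(0)
D, n, lam = 5, 100, 0.1
X = rng.standard_normal((D, n))                 # columns are the examples x_i
w_true = rng.standard_normal((1, D))
Y = w_true @ X + 0.01 * rng.standard_normal((1, n))

# Ordinary least squares: w^T = Y X^T (X X^T)^{-1}
wT_ols = Y @ X.T @ np.linalg.inv(X @ X.T)

# Ridge regression: w^T = Y X^T (X X^T + lambda I)^{-1}
wT_ridge = Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(D))

print(np.allclose(wT_ols, w_true, atol=0.05))   # recovers the generating weights
```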
Example: representer theorem in the linear case
Math

$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$$

In the linear case $f(x) = \sum_{j=1}^{D} w_j x^j$, and $R(f) = w^T w$, so the regularized problem is

$$\min_{f \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|^2 \right\}$$

with solution

$$w^T = Y X^T (X X^T + \lambda I)^{-1}$$

and kernel matrix $(K_n)_{i,j} = x_i^T x_j$.
Math

Writing $f(x) = \sum_i c_i K(x_i, x)$ for $K(x, y) = \langle x, y \rangle$ gives

$$\sum_{i=1}^{N} c_i \langle x, x_i \rangle = \sum_{i=1}^{N} c_i \sum_{j=1}^{D} x^j x_i^j = \sum_{j=1}^{D} w_j x^j, \qquad w_j = \sum_{i=1}^{N} c_i x_i^j$$

and

$$\|f\|_K^2 = c^T K c = c^T X^T X c = w^T w$$
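A quick numerical check of this equivalence (a sketch on synthetic data, using the same $X$, $Y$ conventions as before): the kernel solution $c$ with $K = X^T X$ reproduces the primal ridge solution $w$ and its norm.

```python
import numpy as np

rng = np.random.default_rng(1)
D, n, lam = 5, 50, 0.1
X = rng.standard_normal((D, n))
Y = rng.standard_normal((1, n))

wT = Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(D))   # primal solution
K = X.T @ X                                               # (K)_{ij} = x_i^T x_j
c = np.linalg.inv(K + lam * np.eye(n)) @ Y.T              # dual coefficients

print(np.allclose(wT.T, X @ c))                # w = sum_i c_i x_i
print(np.allclose(c.T @ K @ c, wT @ wT.T))     # ||f||_K^2 = c^T K c = w^T w
```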
Thus $Y - MX = 0$. More generally, look for $M$ such that

$$\min_M \|Y - MX\|^2$$

The solution is given by setting the gradient to zero:

$$\nabla V(M) = -2(Y - MX)X^T = 0$$

yielding $Y X^T = M X X^T$, that is $M = Y X^T (X X^T)^{-1}$.

How could minimization be done in general, in practice, by the brain? Probably not by an analytic solution. The gradient offers a general way to compute a solution to a minimization problem: descending along $-\nabla V(M) = 2(Y - MX)X^T$ finds the elements of $M$ which correspond to a minimum of $V$.
As an example let us look again at the linear problem. Using scalar outputs $y_i$ and $M = w^T$, the problem becomes

$$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2$$

yielding

$$\nabla V(w^T) = -\frac{2}{n} \sum_{i=1}^{n} (y_i - w^T x_i) x_i^T$$

and thus the gradient flow

$$\frac{d w_t^T}{dt} = -\gamma_t \nabla V(w_t^T) = \frac{2\gamma_t}{n} \sum_{i=1}^{n} (y_i - w_t^T x_i) x_i^T$$

Discretizing time we obtain

$$w_{t+1}^T = w_t^T + \frac{2\gamma_t}{n} \sum_{i=1}^{n} (y_i - w_t^T x_i) x_i^T$$
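A minimal sketch of this discretized update in numpy (step size, iteration count, and data are illustrative assumptions; rows of `X` are the examples $x_i$):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, gamma = 3, 200, 0.05
X = rng.standard_normal((n, d))               # rows are the examples x_i
w_true = rng.standard_normal(d)
y = X @ w_true

w = np.zeros(d)
for t in range(500):
    residuals = y - X @ w                     # (y_i - w^T x_i) for all i
    w = w + (2 * gamma / n) * X.T @ residuals # w_{t+1} = w_t - gamma * grad V(w_t)
print(np.allclose(w, w_true, atol=1e-3))      # converges to the generating weights
```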
Gradient descent has several nice properties but it is still not “biological”…
The update can be written as a sum of per-example gradients,

$$\frac{d w_t^T}{dt} = -\frac{\gamma_t}{n} \sum_{i=1}^{n} \nabla V_i(w_t), \qquad V_i(w) = (y_i - w^T x_i)^2$$

which requires visiting all $n$ examples at every step.
Stochastic gradient descent instead uses, at each step, the gradient of a single example $i_t$ drawn at random:

$$w_{t+1}^T = w_t^T - \gamma_t \nabla V_{i_t}(w_t^T)$$
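A minimal sketch of the resulting algorithm on the same least squares problem (the decaying step size $1/(10+t)$ and the data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 200
X = rng.standard_normal((n, d))               # rows are the examples x_i
w_true = rng.standard_normal(d)
y = X @ w_true

w = np.zeros(d)
for t in range(1, 20001):
    i = rng.integers(n)                       # pick one example at random
    gamma_t = 1.0 / (10.0 + t)                # decaying step size
    grad_i = -2 * (y[i] - w @ X[i]) * X[i]    # grad of V_i(w) = (y_i - w^T x_i)^2
    w = w - gamma_t * grad_i
print(np.linalg.norm(w - w_true))             # small: SGD approaches w_true
```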