9.54 Review: Levels + Biophysics + Supervised Learning


SLIDE 1

9.54 Review

Levels + Biophysics + Supervised learning

Shimon Ullman + Tomaso Poggio

Danny Harari + Daniel Zysman + Darren Seibert

9.54, fall semester 2014

SLIDE 2

SLIDE 3

Vision
A Computational Investigation into the Human Representation and Processing of Visual Information
David Marr
Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood… In Marr's framework, the process of vision constructs a set of representations… A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level. Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception…
SLIDE 4

Levels of Understanding (1977)
  • Computation
  • Algorithms
  • Wetware, hardware, circuits and components

SLIDE 5

Levels of Understanding

A case study of levels of understanding: the fly's visual system

SLIDE 6

Levels of Understanding (1977–2012)
  • Evolution
  • Learning/Development
  • Computation
  • Algorithms
  • Wetware, hardware, circuits and components

Poggio, T., The Levels of Understanding Framework, Revised. MIT-CSAIL-TR-2012-014, CBCL-308, 2012.

SLIDE 7

SLIDE 8

SLIDE 9

SLIDE 10

SLIDE 11

SLIDE 12

[e3 ANDNOT (i1 OR i2 OR i3)] OR [e2 ANDNOT (i1 OR i2)] OR (e1 ANDNOT i1)

(e1 ANDNOT i1) OR (e2 ANDNOT i2) OR {[(e3 ANDNOT i3) OR (e4 ANDNOT i4) OR (e6 ANDNOT i6)] ANDNOT i7}
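A minimal Python sketch of these AND-NOT operations, assuming (in the spirit of Koch, Poggio and Torre's dendritic-logic reading of such formulas) that each e_k is an excitatory input and each i_k a shunting inhibitory input that vetoes it; the indexing follows the second expression above, and all names are mine:

```python
# Sketch: each (e ANDNOT i) pair is an excitatory input e vetoed by a
# co-localized shunting inhibition i; indices follow the slide's formula.
def and_not(e: bool, i: bool) -> bool:
    return e and not i

def cell_output(e: dict, i: dict) -> bool:
    # (e1 ANDNOT i1) OR (e2 ANDNOT i2) OR
    # {[(e3 ANDNOT i3) OR (e4 ANDNOT i4) OR (e6 ANDNOT i6)] ANDNOT i7}
    distal = and_not(e[3], i[3]) or and_not(e[4], i[4]) or and_not(e[6], i[6])
    return and_not(e[1], i[1]) or and_not(e[2], i[2]) or and_not(distal, i[7])

# Example: distal excitation e3 fires, but the late veto i7 silences it.
e = {k: k == 3 for k in (1, 2, 3, 4, 6)}
i = {k: k == 7 for k in (1, 2, 3, 4, 6, 7)}
print(cell_output(e, i))  # False
```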

SLIDE 13

SLIDE 14

Thus $Y - MX = 0$. More generally, look for $M$ such that

$$\min \|Y - MX\|^2.$$

If $M = w^T$, setting the gradient to zero,

$$\nabla V(w) = -2(Y - w^T X)X^T = 0,$$

yields $Y X^T = w^T X X^T$ and

$$w^T = Y X^T (X X^T)^{-1}.$$

Now look for $w$ such that

$$\min_w \|Y - w^T X\|^2 + \lambda \|w\|^2.$$

Here

$$\nabla V(w) = -2(Y - w^T X)X^T + 2\lambda w^T = 0$$

yields $Y X^T = w^T X X^T + \lambda w^T$ and

$$w^T = Y X^T (X X^T + \lambda I)^{-1}.$$
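As a concrete check, here is a small NumPy sketch of both closed-form solutions, assuming as on the slide that the columns of $X$ are the examples and $Y$ is a row vector of targets (variable names and data are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100
X = rng.standard_normal((d, n))            # columns are examples
w_true = rng.standard_normal((1, d))
Y = w_true @ X + 0.1 * rng.standard_normal((1, n))

# Least squares: w^T = Y X^T (X X^T)^{-1}
wT_ls = Y @ X.T @ np.linalg.inv(X @ X.T)

# Regularized (ridge): w^T = Y X^T (X X^T + lam I)^{-1}
lam = 0.1
wT_ridge = Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))
print(wT_ls, wT_ridge)   # both close to w_true for this well-conditioned X
```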

SLIDE 15

Example: the representer theorem in the linear case.

Some simple linear algebra shows that

$$w^T = Y X^T (X X^T)^{-1} = Y (X^T X)^{-1} X^T = C X^T, \quad \text{since} \quad X^T (X X^T)^{-1} = (X^T X)^{-1} X^T,$$

so that

$$f(x) = w^T x = C X^T x = \sum_{i=1}^{n} c_i\, x_i^T x.$$

We can compute $C_n$ or $w_n$ depending on whether $n \le p$. The above result is the most basic form of the representer theorem.
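A quick numerical illustration of this primal/dual equivalence (a sketch, with my own setup; the regularized inverses are used so that both matrices are invertible even when $n \ne p$):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, lam = 10, 6, 1e-3
X = rng.standard_normal((d, n))            # columns are examples
Y = rng.standard_normal((1, n))

# Primal weights and dual coefficients:
wT = Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))   # 1 x d
C = Y @ np.linalg.inv(X.T @ X + lam * np.eye(n))          # 1 x n

# f(x) = w^T x equals sum_i c_i x_i^T x for any new x:
x_new = rng.standard_normal(d)
assert np.allclose(wT @ x_new, C @ (X.T @ x_new))
```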

SLIDE 16

Stability and (Tikhonov) Regularization

Empirical risk minimization,

$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2,$$

can be unstable. Consider $f(x) = w^T x = \sum_{j=1}^{p} w_j x_j$ and $R(f) = w^T w$, and add this penalty to the empirical risk:

$$\min_{f \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|^2 \right\},$$

where the penalty $\lambda \|f\|^2$ is $\lambda \|w\|^2$ in the case of linear functions. The solution changes from $w^T = Y X^T (X X^T)^{-1}$ to

$$w^T = Y X^T (X X^T + \lambda I)^{-1}.$$
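To see the stability the heading refers to, here is a small sketch with two nearly collinear features (the setup is mine): the unregularized solution blows up, the Tikhonov solution does not:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.standard_normal(n)
X = np.vstack([x1, x1 + 1e-6 * rng.standard_normal(n)])   # nearly collinear rows
Y = (x1 + 0.01 * rng.standard_normal(n)).reshape(1, n)

wT_unreg = Y @ X.T @ np.linalg.inv(X @ X.T)               # enormous weights
wT_ridge = Y @ X.T @ np.linalg.inv(X @ X.T + 0.1 * np.eye(2))
print(np.linalg.norm(wT_unreg), np.linalg.norm(wT_ridge))
```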

SLIDE 17

From Linear to Nonparametric Models

We have

$$C_n = (X_n X_n^T + \lambda_n I)^{-1} Y_n, \qquad (X_n X_n^T)_{i,j} = x_i^T x_j,$$

which generalizes to

$$C_n = (K_n + \lambda_n I)^{-1} Y_n, \qquad (K_n)_{i,j} = K(x_i, x_j).$$

We can now consider a truly nonparametric model:

$$f(x) = \sum_{j \ge 1} w_j \Phi(x)_j = \sum_{i=1}^{n} c_i \underbrace{\sum_{j \ge 1} \Phi(x_i)_j \Phi(x)_j}_{K(x,\, x_i)}.$$
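A sketch of the nonparametric case with a Gaussian kernel (the kernel choice, data, and names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

def gauss_kernel(A, B, sigma=0.5):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); rows of A, B are examples
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

x = rng.uniform(-1, 1, (30, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 1e-3
K = gauss_kernel(x, x)
c = np.linalg.solve(K + lam * np.eye(len(x)), y)   # c_n = (K_n + lam I)^{-1} Y_n

x_test = np.linspace(-1, 1, 5).reshape(-1, 1)
f_test = gauss_kernel(x_test, x) @ c               # f(x) = sum_i c_i K(x, x_i)
print(f_test)
```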

SLIDE 18

$$f(x) = \sum_i c_i K(x_i, x) \quad \text{for} \quad K(x, y) = \langle x, y \rangle$$

gives

$$\sum_{i}^{N} c_i \langle x, x_i \rangle = \sum_{i}^{N} c_i \sum_{j}^{D} x_j x_j^i = \sum_{j}^{D} w_j x_j, \quad \text{with} \quad w_j = \sum_{i}^{N} c_i x_j^i,$$

thus $w = Xc$. For linear kernels,

$$\|f\|_K^2 = c^T K c = c^T X^T X c = w^T w.$$
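The identities $w = Xc$ and $\|f\|_K^2 = c^T K c = w^T w$ are easy to verify numerically (a sketch; data and names are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
D, N = 4, 6
X = rng.standard_normal((D, N))       # columns are the examples x_i
c = rng.standard_normal(N)

w = X @ c                             # w_j = sum_i c_i x_j^i
K = X.T @ X                           # linear kernel, K_ij = <x_i, x_j>
assert np.isclose(c @ K @ c, w @ w)   # c^T K c = c^T X^T X c = w^T w
```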

SLIDE 19

SLIDE 20

Thus $Y - MX = 0$. More generally, look for $M$ such that

$$\min \|Y - MX\|^2.$$

The solution is given by setting the gradient to zero:

$$\nabla V(M) = -2(Y - MX)X^T = 0,$$

yielding $Y X^T = M X X^T$, that is, $M = Y X^T (X X^T)^{-1}$.


SLIDE 21

$$\min \|Y - MX\|^2$$

How could this minimization be done in general, in practice, by the brain? Probably not by the analytic solution… The gradient offers a general way to compute a solution to a minimization problem: following

$$\frac{dM}{dt} = -\gamma \nabla V(M)$$

finds the elements of $M$ which correspond to $\min V(M)$. As an example, let us look again at the problem above, using

$$\nabla V(M) = -2(Y - MX)X^T.$$

SLIDE 22

Let us make the example more specific. Assume that the $y_i$ are scalar and $M = w^T$. Then

$$\min_{m_{i,j}} \|MX - Y\|^2$$

becomes

$$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2,$$

yielding

$$\nabla V(M) = \nabla V(w^T) = -\frac{2}{n} \sum_{i=1}^{n} (y_i - w^T x_i)\, x_i^T$$

and thus

$$\frac{dw_t^T}{dt} = -\gamma_t \nabla V(w_t^T) = \frac{2\gamma_t}{n} \sum_{i=1}^{n} (y_i - w_t^T x_i)\, x_i^T.$$

SLIDE 23

Discretizing time in

$$\frac{dw_t^T}{dt} = \frac{2\gamma_t}{n} \sum_{i=1}^{n} (y_i - w_t^T x_i)\, x_i^T,$$

we obtain the gradient descent update

$$w_{t+1}^T = w_t^T + \frac{2\gamma_t}{n} \sum_{i=1}^{n} (y_i - w_t^T x_i)\, x_i^T.$$
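Run as code, the discretized update is just a training loop; a sketch with synthetic data (names, step size, and data are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 200, 3
X = rng.standard_normal((d, n))                  # columns are examples
w_true = np.array([1.0, -2.0, 0.5])
y = w_true @ X + 0.05 * rng.standard_normal(n)

w = np.zeros(d)
gamma = 0.05                                     # constant step size gamma_t
for t in range(500):
    residual = y - w @ X                         # y_i - w^T x_i for all i at once
    w = w + (2 * gamma / n) * (X @ residual)     # the update above, vectorized
print(w)                                         # close to w_true
```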

SLIDE 24

Gradient descent has several nice properties, but it is still not "biological"…

$$w_{t+1}^T = w_t^T + \frac{2\gamma_t}{n} \sum_{i=1}^{n} (y_i - w_t^T x_i)\, x_i^T$$

can be written as

$$\frac{dw^T}{dt} = -\gamma_t \sum_{i=1}^{n} \nabla V_i(w), \qquad V_i(w) = (y_i - w^T x_i)^2.$$

Stochastic gradient descent instead uses one example at a time:

$$\frac{dw^T}{dt} = -\gamma_t \nabla V_i(w), \qquad i = 1, \cdots, n.$$
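A matching sketch of the stochastic version, updating on one randomly drawn example per step with a decaying step size (the schedule is one common choice, not from the slides; data and names are mine):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 200, 3
X = rng.standard_normal((d, n))
w_true = np.array([1.0, -2.0, 0.5])
y = w_true @ X + 0.05 * rng.standard_normal(n)

w = np.zeros(d)
for t in range(5000):
    i = rng.integers(n)                          # one example at a time
    gamma_t = 0.05 / (1 + 0.01 * t)              # decaying step size
    err = y[i] - w @ X[:, i]                     # y_i - w^T x_i
    w = w + 2 * gamma_t * err * X[:, i]          # step along -grad V_i(w)
print(w)                                         # close to w_true
```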