  1. 9.54 class 4: Supervised learning. Shimon Ullman + Tomaso Poggio. Danny Harari + Daniel Zysman + Darren Seibert. 9.54, fall semester 2014

  2. Intro 9.54, fall semester 2014

  3. An old and simple model of supervised learning: associate $b$ to $a$ and store $\phi_{b,a}(x) = (b * a)(x) = \int b(\xi)\, a(x - \xi)\, d\xi$. Retrieve output $b$ from input $a$: if $a \star a \approx \delta$, then $(a \star \phi_{b,a})(x) = \int a(\tau)\, \phi_{b,a}(\tau + x)\, d\tau \approx b(x)$. 9.54, fall semester 2014

  4. An old and simple model of supervised learning: when $\phi(x) = \sum_i b_i * a_i$ and $a_j \star a_i \approx \delta_{i,j}$, retrieve output $b_j$ from input $a_j$: $a_j \star \phi \approx b_j$. It is a special case… 9.54, fall semester 2014
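
A minimal numerical sketch of this store/retrieve scheme, assuming circular convolution/correlation on random high-dimensional vectors as the discrete analogue of the slides' integrals (the dimension, the number of stored pairs, and the FFT-based implementation are illustrative choices, not from the slides):

```python
# Store phi = sum_i b_i * a_i, retrieve b_j via a_j (star) phi (slides 3-4).
import numpy as np

rng = np.random.default_rng(0)
n, pairs = 2048, 5

def cconv(a, b):
    # circular convolution b * a via the FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    # circular correlation a (star) b via the FFT
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

# random keys: autocorrelation ~ delta, cross-correlations ~ 0
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(pairs, n))
B = rng.normal(0.0, 1.0 / np.sqrt(n), size=(pairs, n))

phi = sum(cconv(B[i], A[i]) for i in range(pairs))   # store all pairs

retrieved = ccorr(A[0], phi)                          # a_0 (star) phi ~ b_0
cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos(retrieved, B[0]))   # large: the stored output, plus crosstalk noise
print(cos(retrieved, B[1]))   # near 0: an unrelated output
```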

  5. Linear 9.54, fall semester 2014

  6. “Linear” learning: suppose $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, $i = 1, \cdots, N$. Define $(x_1, \cdots, x_N) = X$ and $(y_1, \cdots, y_N) = Y$. Find a linear operator $M$ (e.g., a matrix) such that $MX = Y$. 9.54, fall semester 2014

  7. “Linear” learning: if $X^{-1}$ exists, then $MX = Y \Rightarrow M = Y X^{-1}$. If $X^{-1}$ does not exist, then $MX = Y \Rightarrow M = Y X^{\dagger}$, where the pseudoinverse is the solution of $\min_M \| MX - Y \|_F$ with $\| A \|_F = \left( \sum_{i,j} |a_{i,j}|^2 \right)^{1/2}$, and $X^{\dagger} = (X^T X)^{-1} X^T$ if $X$ is full column rank. 9.54, fall semester 2014
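
A minimal sketch of the pseudoinverse solution, assuming NumPy and synthetic data (the names $X$, $Y$, $M$ follow the slides; `np.linalg.pinv` computes the Moore–Penrose pseudoinverse, covering both rank cases):

```python
# Solve MX = Y in the least-squares sense: M = Y X^+ (slide 7).
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 4, 2, 10                 # input dim, output dim, number of examples

X = rng.normal(size=(n, N))        # columns are the x_i
M_true = rng.normal(size=(m, n))
Y = M_true @ X                     # columns are the y_i

M = Y @ np.linalg.pinv(X)          # minimizes ||MX - Y||_F
print(np.allclose(M @ X, Y))       # True: here the fit is exact
print(np.allclose(M, M_true))      # True: X has full row rank, so M is recovered
```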

  8. “Linear” learning is linear regression: if, e.g., the output $y$ is scalar, then $m = 1$ and $Mx = y \Rightarrow y = m^T x = \sum_i m_i x_i$, with $M = Y X^{-1}$. 9.54, fall semester 2014

  9. Nonlinear 9.54, fall semester 2014

  10. Nonlinear learning: suppose $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, $i = 1, \cdots, N$. Define $(x_1, \cdots, x_N) = X$ and $(y_1, \cdots, y_N) = Y$. Find an operator $N$ such that $N \circ X = Y$. In general impossible, but… assume $N$ is in the class of polynomial mappings of degree $k$ on the vector space $V$ (over the real field), e.g., $N$ has a convergent Taylor series expansion. The Weierstrass theorem ensures approximation of any continuous function.

  11. Nonlinear learning: $Y = L_0 + L_1(X) + L_2(X, X) + \ldots + L_k(X, \ldots, X)$. $f(x)$ is a polynomial with all monomials, as in this 2D example: $y = a_1 x_1 + a_2 x_2 + b_1 x_1^2 + b_{12} x_1 x_2 + \cdots$
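
A sketch of fitting such a polynomial map by linear regression on monomial features, assuming NumPy and a synthetic degree-2 target in 2D (the feature list and target coefficients are illustrative):

```python
# Degree-2 polynomial regression in 2D: linear in the monomial features.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(200, 2))                    # rows are samples (x1, x2)

def monomials(x):
    # all monomials up to degree 2: [1, x1, x2, x1^2, x1*x2, x2^2]
    x1, x2 = x[:, 0], x[:, 1]
    return np.stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2], axis=1)

# synthetic target with known coefficients
y = 1.0 + 2.0 * x[:, 0] - 1.0 * x[:, 1] + 0.5 * x[:, 0]**2 + 3.0 * x[:, 0] * x[:, 1]

coef, *_ = np.linalg.lstsq(monomials(x), y, rcond=None)
print(np.round(coef, 3))                         # ~[1, 2, -1, 0.5, 3, 0]
```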

  12. Classification and Regression 9.54, fall semester 2014

  13. (figure)

  14. $y = \mathrm{sign}(Mx)$

  15. In our language: is $L_1$ enough?

  16. XOR function: $y = \mathrm{sign}(L_1 x + L_2(x, x)) = \mathrm{sign}(a_1 u_1 + a_2 u_2 + b\, u_1 u_2)$; $\mathrm{sign}(u_1 u_2)$ is in fact enough. This corresponds to a universal, one-hidden-layer network: input variables feed a hidden layer of all monomials, which feeds the output layer.
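
A minimal check of this claim, assuming inputs coded in $\{-1, +1\}$ (the coding choice is an assumption; with it, the single monomial $u_1 u_2$ separates the two classes):

```python
# XOR is not linearly separable, but one second-order monomial solves it.
import numpy as np

U = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])   # inputs in {-1, +1}
y = np.array([1, -1, -1, 1])                         # +1 iff u1 and u2 agree
                                                     # (flip the sign for XOR proper)
pred = np.sign(U[:, 0] * U[:, 1])                    # sign(u1 * u2)
print(np.array_equal(pred, y))                       # True
```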

  17. A few non-standard remarks:
  • Regression is king; Gauss knew everything…
  • Perhaps no need of multiple layers… are 2 layers universal?
  • An interesting junction here: RBFs vs. MLPs

  18. Radial Basis Functions 9.54, fall semester 2014

  19. Nonlinear learning: later we will see that RBF expansions are a good approximation of functions in high dimensions: $\sum_{k=1}^{N} c_k\, e^{-\| x_k - x \|^2}$
  • RBF can be written as a 1-hidden-layer network
  • RBF is a rewriting of our polynomial (infinite radius of convergence): $e^{\| \hat{x}_k - x \|^2} = \sum_{n=0}^{\infty} \frac{\| \hat{x}_k - x \|^{2n}}{n!}$
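
A quick numerical check of the series claim, using a truncated Taylor series of the exponential (the truncation order and test range are illustrative assumptions):

```python
# The exponential series converges for every argument, so the Gaussian unit
# is an (infinite) polynomial in r2 = ||x_k - x||^2.
import numpy as np
from math import factorial

r2 = np.linspace(0.0, 4.0, 9)                        # sample values of r2
series = sum(r2**n / factorial(n) for n in range(30))
print(np.max(np.abs(series - np.exp(r2))))           # tiny: series matches e^{r2}
```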

  20. Memory-based computation: $f(x) = \sum_i c_i\, G(x, x_i) = \sum_i c_i\, e^{-\| x - x_i \|^2 / 2\sigma^2}$. The training set is $(x_1, \cdots, x_N) = X$ and $(y_1, \cdots, y_N) = Y$. Suppose now that $e^{-\| x - x_i \|^2 / 2\sigma^2} \to \delta(x - x_i)$: then it is a memory, a lookup table: $f(x) = y_i$ if $x = x_i$, and $f(x) = 0$ if $x \neq x_i$. 9.54, fall semester 2014
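
A sketch of this limit in 1D, assuming NumPy and interpolation coefficients fit on a toy training set (the target function, grid, and widths are illustrative):

```python
# Gaussian RBF expansion f(x) = sum_i c_i exp(-(x - x_i)^2 / (2 sigma^2)).
# For small sigma the kernel approaches a delta and f becomes a lookup table.
import numpy as np

X = np.linspace(0.0, 1.0, 9)             # stored inputs x_i (0.5 is stored)
Y = np.cos(2 * np.pi * X)                # stored outputs y_i

def rbf_predict(x, sigma):
    G = np.exp(-(X[:, None] - X[None, :])**2 / (2 * sigma**2))
    c = np.linalg.solve(G, Y)            # interpolation: f(x_i) = y_i
    g = np.exp(-(x[:, None] - X[None, :])**2 / (2 * sigma**2))
    return g @ c

x = np.array([0.5, 0.55])                # one stored point, one new point
print(rbf_predict(x, 0.2))               # smooth: both close to cos(2 pi x)
print(rbf_predict(x, 1e-3))              # lookup table: ~[-1, 0]; zero off the grid
```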

  21. Memory-based computation: of course learning is much more than memory, but in this model the difference is between a Gaussian and a delta function. 9.54, fall semester 2014

  22. $f(x) = \sum_i c_i\, G(x, x_i) = \sum_i c_i\, e^{-\| x - x_i \|^2 / 2\sigma^2}$. From Learning-from-Examples to View-based Networks for Object Recognition. [Figure: view-tuned units combined by a $\Sigma$ output unit; response plotted against view angle.] Poggio & Edelman, Nature, 1990.

  23. Recording Sites in Anterior IT (Logothetis, Pauls, and Poggio, 1995)

  24. Garfield 9.54, fall semester 2014

  25. Image Analysis ⇒ Bear (0° view) ⇒ Bear (45° view)

  26. Image Synthesis (unconventional graphics): Θ = 0° view ⇒ Θ = 45° view

  27. (figure)

  28. HyperBF 9.54, fall semester 2014

  29. (figure)

  30. Cartoon male 9.54, fall semester 2014

  31. A toy problem: Gender Classification

  32. Brunelli, Poggio ’91 (IRST, MIT)

  33. An example: HyperBF and gender classification. Some of the geometrical features (white) used in the gender classification experiments.

  34. HyperBF and gender classification Typical stimuli used in the (informal!) psychophysical experiments of gender classification (about 90% correct)

  35. Figure 3: Feature weights for gender classification as computed by the HyperBF networks

  36. Radial Basis Functions and MLPs 9.54, fall semester 2014

  37. Sigmoidal units are radial basis functions (for normalized inputs). Since $\| x - w \|^2 = \| x \|^2 + \| w \|^2 - 2(x \cdot w)$, if $\| x \| = 1$ then $(x \cdot w) = \frac{1 + \| w \|^2 - \| x - w \|^2}{2}$. Consider the MLP units $\sigma(x \cdot w - \theta) = \frac{1}{1 + e^{-(x \cdot w - \theta)}}$; thus $\sigma(w \cdot x + b)$ is a radial function.

  38. Sigmoidal units are radial basis functions (for normalized inputs). The corresponding radial function is $\sigma\!\left( \frac{1 + \| w \|^2 - \| x - w \|^2}{2} - \theta \right)$, a function of $\| x - w \|$ alone.

  39. Sigmoidal units are radial basis functions (for normalized inputs)
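
A quick numerical check of this identity, assuming NumPy (the dimension, weight vector, and threshold are arbitrary choices):

```python
# On the unit sphere, sigma(x . w - theta) depends on x only through ||x - w||.
import numpy as np

rng = np.random.default_rng(4)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

w, theta = rng.normal(size=5), 0.3
x = rng.normal(size=5)
x /= np.linalg.norm(x)                            # normalize: ||x|| = 1

lhs = sigmoid(x @ w - theta)                      # the MLP unit
r2 = np.sum((x - w)**2)                           # ||x - w||^2
rhs = sigmoid((1 + w @ w - r2) / 2 - theta)       # the radial form of slide 38
print(np.isclose(lhs, rhs))                       # True
```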
