Introduction to Deep Learning

A. G. Schwing & S. Fidler, University of Toronto, 2014 (CSC420: Intro to Image Understanding)



  1. Introduction to Deep Learning. A. G. Schwing & S. Fidler, University of Toronto, 2014. CSC420: Intro to Image Understanding.

  2. Outline: 1) Universality of Neural Networks, 2) Learning Neural Networks, 3) Deep Learning, 4) Applications, 5) References.

  3. What are neural networks? Let's ask. [Figures: a biological neural network, and a computational one: a feed-forward network with an input layer (Input #1 to Input #4), a hidden layer, and an output layer producing a single output.]

  4. What are neural networks? "Neural networks (NNs) are computational models inspired by biological neural networks [...] and are used to estimate or approximate functions." [Wikipedia]

  5. What are neural networks? The people behind them. Origins: traced back to threshold logic [W. McCulloch and W. Pitts, 1943] and the perceptron [F. Rosenblatt, 1958]. More recently: G. Hinton (UofT), Y. LeCun (NYU), A. Ng (Stanford), Y. Bengio (University of Montreal), J. Schmidhuber (IDSIA, Switzerland), R. Salakhutdinov (UofT), H. Lee (University of Michigan), R. Fergus (NYU), and many more who have made significant contributions; apologies for not being able to mention everyone, including all the students behind this work.

  6. What are neural networks? Use cases: classification, playing video games, CAPTCHAs, and the Neural Turing Machine (e.g., learning how to sort) [Alex Graves; http://www.technologyreview.com/view/532156/googles-secretive-deepmind-startup-unveils-a-neural-turing-machine/].

  7. What are neural networks? Example: input x ∈ R, parameters w1, w2 and b ∈ R. [Diagram: x is multiplied by w1 and combined with the bias b in a hidden unit h1; h1 is multiplied by w2 to produce the output f.]

  8. How to compute the function? Forward propagation/pass (inference, prediction): given the input x and the parameters w, b, compute the latent variables/intermediate results in a feed-forward manner until we obtain the output function f. [Same diagram as slide 7.]

  10. How to compute the function? Example: input x ∈ R, parameters w1, w2, b ∈ R. h1 = σ(w1 · x + b), f = w2 · h1. Sigmoid function: σ(z) = 1/(1 + exp(−z)). For x = ln 2, b = ln 3, w1 = 2, w2 = 2: h1 = ?, f = ?
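
As a quick check of the numbers on this slide, here is a minimal Python sketch of the forward pass; the values x = ln 2, b = ln 3, w1 = w2 = 2 come from the slide, everything else is just illustrative code:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z)), as defined on the slide
    return 1.0 / (1.0 + np.exp(-z))

# Parameters from the slide
x, b, w1, w2 = np.log(2), np.log(3), 2.0, 2.0

h1 = sigmoid(w1 * x + b)   # sigma(2*ln 2 + ln 3) = sigma(ln 12) = 12/13, about 0.923
f = w2 * h1                # 24/13, about 1.846

print(h1, f)
```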

  11. How to compute the function? Given the parameters, what is f for x = 0, x = 1, x = 2, ...? f = w2 · σ(w1 · x + b). [Plot: f versus x on [−5, 5].]

  12. Let's mess with the parameters: h1 = σ(w1 · x + b), f = w2 · h1, σ(z) = 1/(1 + exp(−z)). [Plots: f versus x, on the left for w1 ∈ {0, 0.5, 1.0, 100} with b = 0, on the right for b ∈ {−2, 0, 2} with w1 = 1.0.] Keep in mind the step function.
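
To see why the slide says to keep the step function in mind, here is a small sketch (my own illustration, not from the slides): as w1 grows, the sigmoid approaches a step centred at x = −b/w1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])
b = 0.0
for w1 in (0.5, 1.0, 100.0):
    # Larger w1 sharpens the transition around x = -b/w1;
    # w1 = 100 is already nearly indistinguishable from a step function.
    print(w1, np.round(sigmoid(w1 * x + b), 3))
```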

  13. How to use neural networks for binary classification? Feature/measurement: x. Output: how likely is the input to be a cat? [Plot: the sigmoid output f versus x.]


  17. How to use neural networks for binary classification? Shifted feature/measurement: x. Output: how likely is the input to be a cat? [Plots: the output f versus x for the previous features and for the shifted features.] Learning/training means finding the right parameters.
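
The slides do not show a training procedure at this point, so the following is only a hedged sketch of what "finding the right parameters" could look like for this one-unit model: a few gradient-descent steps on a squared error over made-up 1-D cat/non-cat data. The data, loss choice, learning rate, and step count are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up features: small x = not a cat (label 0), large x = cat (label 1)
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b = 0.1, 0.0          # arbitrary initial parameters
lr = 1.0                 # learning rate (illustrative choice)

for step in range(500):
    f = sigmoid(w * x + b)                      # forward pass
    err = f - y                                 # residual
    # Gradients of the mean squared error, using sigma' = f * (1 - f)
    grad_w = np.mean(2 * err * f * (1 - f) * x)
    grad_b = np.mean(2 * err * f * (1 - f))
    w -= lr * grad_w
    b -= lr * grad_b

# After training, the outputs are close to the 0/1 labels
print(w, b, np.round(sigmoid(w * x + b), 2))
```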

  18. So far we are able to scale and translate sigmoids. How well can we approximate an arbitrary function? With this simple model we are obviously not going to get very far. [Plots: a simple classifier suffices when the features are good; a more complex classifier is needed when the features are noisy.] How can we generalize?

  19. Let's use more hidden variables: h1 = σ(w1 · x + b1), h2 = σ(w3 · x + b2), f = w2 · h1 + w4 · h2. Combining two step functions gives a bump. [Plot: f versus x for w1 = −100, b1 = 40, w3 = 100, b2 = 60, w2 = 1, w4 = 1.]
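
A small sketch reproducing the bump with the slide's parameters: the two sigmoids act as near-step functions, and their weighted sum is about 1 where only one unit is active and about 2 where both are.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters from the slide
w1, b1 = -100.0, 40.0    # h1 steps *down* around x = 0.4
w3, b2 = 100.0, 60.0     # h2 steps *up*  around x = -0.6
w2, w4 = 1.0, 1.0

def f(x):
    h1 = sigmoid(w1 * x + b1)
    h2 = sigmoid(w3 * x + b2)
    return w2 * h1 + w4 * h2

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    # f is ~1 outside roughly [-0.6, 0.4] (one unit active) and ~2 inside (both active)
    print(x, round(f(x), 3))
```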

  20. So let's simplify: we simplify a pair of hidden nodes to a "bump" function Bump(x1, x2, h) feeding into f. The bump starts at x1, ends at x2, and has height h. [Diagram: the two-hidden-unit network from slide 19 collapsed into a single Bump(x1, x2, h) node.]

  21. Now we can represent "bumps" very well. How can we generalize? Feed f with Bump(0.0, 0.2, h1), Bump(0.2, 0.4, h2), Bump(0.4, 0.6, h3), Bump(0.6, 0.8, h4), Bump(0.8, 1.0, h5). [Plot: a target function on [0, 1] and its bump approximation.] More bumps give a more accurate approximation. This corresponds to a single-layer network.
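
Below is a hedged sketch of this construction. The slide does not fix the target function, the number of bumps, or how the heights are chosen, so those are my own illustrative choices (heights set to the target's value at each interval midpoint); each bump is built from two steep sigmoids, as on slide 19.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bump(x, x1, x2, h, k=100.0):
    # Bump(x1, x2, h): roughly h on [x1, x2] and 0 elsewhere, from two steep sigmoids
    return h * (sigmoid(k * (x - x1)) - sigmoid(k * (x - x2)))

def target(x):
    # Illustrative target on [0, 1] (not specified on the slide)
    return 0.2 + 0.4 * np.sin(2 * np.pi * x) ** 2

x = np.linspace(0.0, 1.0, 200)
n_bumps = 5                                   # more bumps -> better approximation
edges = np.linspace(0.0, 1.0, n_bumps + 1)
approx = np.zeros_like(x)
for x1, x2 in zip(edges[:-1], edges[1:]):
    h = target((x1 + x2) / 2)                 # height = target at the interval midpoint
    approx += bump(x, x1, x2, h)

print(np.max(np.abs(approx - target(x))))     # worst-case gap shrinks as n_bumps grows
```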

  22. Universality: theoretically we can approximate an arbitrary function, so we can learn a really complex cat classifier. Where is the catch?

  23. Universality: theoretically we can approximate an arbitrary function, so we can learn a really complex cat classifier. Where is the catch? Complexity: we might need quite a few hidden units. Overfitting: we might simply memorize the training data.

  24. Generalizations are possible: include more input dimensions, capture more output dimensions, and employ multiple layers for more efficient representations. See http://neuralnetworksanddeeplearning.com/chap4.html for a great read!

  25. How do we find the parameters that give a good approximation, and how do we tell a computer to do that? Intuitive explanation: compute the approximation error at the output, then propagate the error back by computing the individual contributions of the parameters to that error. [Fig. from H. Lee]

  26. Intuitive example. Target function: 5x². Approximation: f_w(x). Domain of interest: x ∈ [0, 1]. Error: e(w) = ∫₀¹ (5x² − f_w(x))² dx. Program of interest: min_w e(w) = min_w ∫₀¹ (5x² − f_w(x))² dx.
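
As a concrete, hedged illustration of this error, the sketch below discretizes the integral with a simple Riemann sum for a one-hidden-unit f_w (the model form from slide 11). Only the target 5x² and the domain [0, 1] come from the slide; the grid and the example parameter settings are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f_w(x, w1, w2, b):
    # One-hidden-unit approximator: f = w2 * sigma(w1*x + b)
    return w2 * sigmoid(w1 * x + b)

def error(w1, w2, b, n=1000):
    # e(w) = integral over [0, 1] of (5x^2 - f_w(x))^2 dx, via a midpoint Riemann sum
    x = (np.arange(n) + 0.5) / n
    return np.mean((5 * x**2 - f_w(x, w1, w2, b)) ** 2)

print(error(1.0, 1.0, 0.0))   # error of one arbitrary parameter setting
print(error(4.0, 5.0, -3.0))  # a hand-picked better setting gives a lower error
```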

  27. The chain rule is important. For w1, w2, x ∈ R, assume e(w1, w2) = f(w2, h(w1, x)). The derivatives are ∂e(w1, w2)/∂w2 = ∂f(w2, h(w1, x))/∂w2 and ∂e(w1, w2)/∂w1 = ∂f(w2, h(w1, x))/∂w1 = (∂f/∂h) · (∂h/∂w1), by the chain rule.
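
Here is a small sketch of this chain rule in action, with concrete choices that are my own assumptions: h(w1, x) = σ(w1 · x) and f(w2, h) = (w2 · h − y)². The analytic gradients are checked against finite differences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 0.7, 1.5            # assumed input and target
w1, w2 = 0.3, -0.8         # assumed parameters

# Forward pass: e(w1, w2) = f(w2, h(w1, x)) with h = sigma(w1*x), f = (w2*h - y)^2
h = sigmoid(w1 * x)

# Backward pass via the chain rule
de_dw2 = 2 * (w2 * h - y) * h        # direct derivative with respect to w2
de_dh = 2 * (w2 * h - y) * w2        # df/dh
dh_dw1 = h * (1 - h) * x             # dh/dw1, using sigma' = sigma * (1 - sigma)
de_dw1 = de_dh * dh_dw1              # chain rule: (df/dh) * (dh/dw1)

# Finite-difference check
eps = 1e-6
def e_of(w1_, w2_):
    return (w2_ * sigmoid(w1_ * x) - y) ** 2

print(de_dw1, (e_of(w1 + eps, w2) - e_of(w1 - eps, w2)) / (2 * eps))
print(de_dw2, (e_of(w1, w2 + eps) - e_of(w1, w2 - eps)) / (2 * eps))
```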

  28. Back-propagation does not work well for deep networks: diffusion of the gradient signal (multiplication of many small numbers); the pull of many local minima (a random initialization is very far from good points); it requires a lot of training samples; and it needs significant computational power. Solution: a two-step approach of greedy layer-wise pre-training followed by full fine-tuning at the end.

  29. Why go deep? Representation efficiency (fewer computational units for the same function), hierarchical representation (non-local generalization), combinatorial sharing (re-use of earlier computation), and it works very well. [Fig. from H. Lee]
