Lecture 5: Backprop and intro to Neural Nets (Fei-Fei Li & Andrej Karpathy, 21 Jan 2015)



  1. Administrative: A1 is due today (midnight); you can use up to 3 late days. A2 will be up this Friday and is due the Wednesday after next (Feb 4). The project proposal is due next Friday at midnight (roughly one paragraph, 200-400 words, sent by email).

  2. Lecture 5: Backprop and intro to Neural Nets

  3. Linear Classification recap: the SVM loss and the Softmax loss (the formulas are shown on the slide).
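
The slide names the two losses without showing text for the formulas; as a hedged sketch (not the slide's code), here is one way to write both in numpy for a single example with class scores s = W.dot(x):

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    # Multiclass SVM (hinge) loss: sum of margins that exceed the correct-class score.
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0                      # the correct class contributes no loss
    return np.sum(margins)

def softmax_loss(scores, y):
    # Softmax (cross-entropy) loss: negative log-probability of the correct class.
    shifted = scores - np.max(scores)   # shift scores for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[y])

scores = np.array([3.2, 5.1, -1.7])     # illustrative class scores
print(svm_loss(scores, y=0), softmax_loss(scores, y=0))
```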

  4. Optimization Landscape

  5. Gradient Descent. Numerical gradient: slow :(, approximate :(, easy to write :). Analytic gradient: fast :), exact :), error-prone :(. In practice: derive the analytic gradient, then check your implementation against the numerical gradient (a gradient check).
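
A minimal sketch of that gradient check, assuming a toy objective f(x) = sum(x^2) and a centered-difference step h (both are illustrative, not from the slide):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        fplus = f(x)
        x.flat[i] = old - h
        fminus = f(x)
        x.flat[i] = old                       # restore the original value
        grad.flat[i] = (fplus - fminus) / (2 * h)   # centered difference
    return grad

f = lambda x: np.sum(x ** 2)                  # toy objective
x = np.random.randn(5)
analytic = 2 * x                              # gradient derived by hand
numeric = numerical_gradient(f, x)
rel_error = np.abs(analytic - numeric) / np.maximum(1e-8, np.abs(analytic) + np.abs(numeric))
print(rel_error.max())                        # should be tiny (~1e-7 or less)
```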

  6. This class: Becoming a backprop ninja

  7. (figure only)

  8. Example: x = 4, y = -3 => f(x, y) = -12. Partial derivatives and the gradient (expressions on the slide).

  9. Example: x = 4, y = -3 => f(x, y) = -12. Partial derivatives and the gradient. Question: if I increase x by h, how would the output of f change?
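
The numbers on the slide are consistent with f(x, y) = x * y, so assume that form here; then df/dx = y, df/dy = x, and increasing x by h changes the output by roughly h * df/dx:

```python
# Assumed form: f(x, y) = x * y (consistent with f(4, -3) = -12 on the slide).
x, y = 4.0, -3.0
f = x * y                  # -12.0
dfdx, dfdy = y, x          # analytic partials: -3 and 4

# "If I increase x by h, how does f change?"  To first order, by h * df/dx.
h = 0.0001
print((x + h) * y - f)     # ~= -0.0003, i.e. h * dfdx = 0.0001 * (-3)
```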

  10. Compound expressions:

  11. Compound expressions: Chain rule:

  12. Compound expressions: Chain rule:

  13. Compound expressions: Chain rule:
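
A sketch of the chain rule on a compound expression. The expression and its values live in the slide figure, so assume the standard worked example f(x, y, z) = (x + y) * z with illustrative inputs:

```python
x, y, z = -2.0, 5.0, -4.0    # illustrative inputs, not necessarily the slide's

# forward pass, one intermediate at a time
q = x + y                    # q = 3
f = q * z                    # f = -12

# backward pass: chain rule from the output back to the inputs
dfdq = z                     # d(q*z)/dq
dfdz = q                     # d(q*z)/dz
dfdx = dfdq * 1.0            # dq/dx = 1, chained with dfdq
dfdy = dfdq * 1.0            # dq/dy = 1, chained with dfdq
print(dfdx, dfdy, dfdz)      # -4.0 -4.0 3.0
```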

  14. Another example:

  15. Another example: -1/(1.37^2) = -0.53

  16. Another example: [local gradient] x [its gradient]: [1] x [-0.53] = -0.53

  17. Another example: [local gradient] x [its gradient]: [e^(-1)] x [-0.53] = -0.20

  18. Another example: [local gradient] x [its gradient]: [-1] x [-0.2] = 0.2

  19. Another example: [local gradient] x [its gradient]: [1] x [0.2] = 0.2 and [1] x [0.2] = 0.2 (both inputs!)

  20. Another example: [local gradient] x [its gradient]: x0: [2] x [0.2] = 0.4, w0: [-1] x [0.2] = -0.2
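
A hedged reconstruction of this worked example. The circuit is f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))), and the input values below are assumptions chosen so the intermediates match the numbers quoted on slides 15 through 20:

```python
import math

# Assumed inputs (chosen to reproduce 1.37, -0.53, -0.20, 0.2, 0.4 from the slides).
w0, w1, w2 = 2.0, -3.0, -3.0
x0, x1 = -1.0, -2.0

# forward pass, gate by gate
dot = w0*x0 + w1*x1 + w2         # 1.0
neg = -dot                       # -1.0
e = math.exp(neg)                # 0.37
denom = 1.0 + e                  # 1.37
f = 1.0 / denom                  # 0.73

# backward pass: [local gradient] x [gradient flowing in from above]
ddenom = -1.0 / denom**2         # -0.53   (1/u gate)
de = 1.0 * ddenom                # -0.53   (+1 gate passes the gradient through)
dneg = math.exp(neg) * de        # -0.20   (exp gate)
ddot = -1.0 * dneg               # 0.2     (*-1 gate)
dw0, dx0 = x0 * ddot, w0 * ddot  # -0.2, 0.4   (mul gate)
dw1, dx1 = x1 * ddot, w1 * ddot  # -0.4, -0.6
dw2 = 1.0 * ddot                 # 0.2     (add gate distributes the gradient)
```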

  21. A gate hanging out in a circuit. During backprop, every gate computes, for all of its inputs: [LOCAL GRADIENT] x [GATE GRADIENT]. The local gradient can be computed right away, even during the forward pass; the gate gradient is what the gate receives from above during backpropagation.
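
A minimal sketch of such a gate (the class name and API are illustrative, not from the slide): it caches what it needs for the local gradient during the forward pass, then multiplies [local gradient] x [gate gradient] during backprop:

```python
class MultiplyGate:
    def forward(self, x, y):
        self.x, self.y = x, y        # cache inputs; the local gradients are y and x
        return x * y

    def backward(self, dz):
        # dz is the gradient flowing into the gate from above (the "gate gradient")
        dx = self.y * dz             # [local gradient wrt x] x [gate gradient]
        dy = self.x * dz             # [local gradient wrt y] x [gate gradient]
        return dx, dy

gate = MultiplyGate()
out = gate.forward(3.0, -4.0)        # -12.0
print(gate.backward(1.0))            # (-4.0, 3.0)
```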

  22. sigmoid function

  23. sigmoid function (0.73) * (1 - 0.73) = 0.2

  24. sigmoid function

  25. sigmoid function
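
A small sketch of the simplification these slides point at: the sigmoid's derivative is sigma(z) * (1 - sigma(z)), which is where the 0.73 * (1 - 0.73) = 0.2 on slide 23 comes from, so the whole sigmoid can be treated as a single gate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 1.0
s = sigmoid(z)                         # 0.73
dsdz = s * (1 - s)                     # 0.73 * (1 - 0.73) = 0.2, matching the slide

# quick numerical check of the shortcut
h = 1e-5
print(dsdz, (sigmoid(z + h) - sigmoid(z - h)) / (2 * h))
```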

  26. We are ready:

  27. We are ready:

  28. forward pass was: (this heading repeats on slides 28 through 35, which step through the code shown in the slide images)
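
A hedged reconstruction of the code these slides step through; the variable names and input values are assumptions consistent with the 2D sigmoid-neuron example above:

```python
import math

w = [2.0, -3.0, -3.0]
x = [-1.0, -2.0]

# forward pass
dot = w[0]*x[0] + w[1]*x[1] + w[2]
f = 1.0 / (1.0 + math.exp(-dot))              # sigmoid

# backward pass, staged: use the sigmoid simplification, then chain
ddot = (1.0 - f) * f                          # gradient on 'dot' via sigma * (1 - sigma)
dx = [w[0] * ddot, w[1] * ddot]               # backprop through the multiplies into x
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot]   # ... and into w (the bias picks up 1 * ddot)
print(dx, dw)
```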

  36. Patterns in backward flow. add gate: gradient distributor. max gate: gradient router. mul gate: gradient... "switcher"?
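
As a sketch, here are the three patterns for scalar inputs a and b with upstream gradient dout (the function names are illustrative):

```python
def add_backward(a, b, dout):
    return dout, dout                                # distributor: both inputs get the gradient unchanged

def max_backward(a, b, dout):
    return (dout, 0.0) if a > b else (0.0, dout)     # router: only the larger input receives it

def mul_backward(a, b, dout):
    return b * dout, a * dout                        # "switcher": each input gets the other input's value

print(add_backward(2.0, 3.0, 1.0), max_backward(2.0, 3.0, 1.0), mul_backward(2.0, 3.0, 1.0))
```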

  37. Gradients for vectorized code

  38. Gradients for vectorized code: X is [10 x 3], dD is [5 x 3]; dW must be [5 x 10] and dX must be [10 x 3].

  39. Gradients for vectorized code
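
A sketch matching the shapes quoted on slide 38, assuming the forward operation is D = W.dot(X); the useful trick is that the shapes of dW and dX only work out one way:

```python
import numpy as np

W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
D = W.dot(X)                          # [5 x 3]

dD = np.random.randn(*D.shape)        # upstream gradient on D, [5 x 3]
dW = dD.dot(X.T)                      # [5 x 3] . [3 x 10] -> [5 x 10], matches W
dX = W.T.dot(dD)                      # [10 x 5] . [5 x 3] -> [10 x 3], matches X
print(dW.shape, dX.shape)
```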

  40. In summary: in practice it is rarely necessary to derive long gradients by hand on pen and paper. Structure your code in stages (layers) where you can derive the local gradients, then chain the gradients during backprop. Caveat: sometimes gradients simplify (e.g. for sigmoid, also softmax); group these into a single stage.
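
A minimal sketch of the staged (layer) structure being recommended; the class names and API are illustrative, not from the slide:

```python
import numpy as np

class Affine:
    def forward(self, X, W):
        self.X, self.W = X, W
        return X.dot(W)
    def backward(self, dout):
        return dout.dot(self.W.T), self.X.T.dot(dout)    # dX, dW

class Sigmoid:
    def forward(self, Z):
        self.out = 1.0 / (1.0 + np.exp(-Z))
        return self.out
    def backward(self, dout):
        return dout * self.out * (1.0 - self.out)        # grouped sigmoid gradient

X, W = np.random.randn(4, 10), np.random.randn(10, 3)
affine, sig = Affine(), Sigmoid()
out = sig.forward(affine.forward(X, W))                  # chain the forward passes
dX, dW = affine.backward(sig.backward(np.ones_like(out)))   # chain the backward passes
print(dX.shape, dW.shape)                                # (4, 10) (10, 3)
```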

  41. NEURAL NETWORKS

  42. (figure only)

  43. (figure only)

  44. (figure only)

  45. sigmoid activation function

  46. (figure only)

  47. A single neuron can be used as a binary linear classifier. Regularization has the interpretation of "gradual forgetting".
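
A hedged sketch of a single sigmoid neuron trained as a binary linear classifier with the logistic loss; the data and hyperparameters are made up, and the L2 term is what produces the "gradual forgetting" (weights decaying toward zero):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # toy, linearly separable labels

w, b = np.zeros(2), 0.0
lr, reg = 0.1, 1e-3                            # illustrative learning rate and L2 strength
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X.dot(w) + b)))  # neuron output = P(y = 1 | x)
    dscores = (p - y) / len(y)                 # gradient of mean cross-entropy wrt scores
    dw = X.T.dot(dscores) + reg * w            # data gradient + L2 pull of w toward zero
    db = dscores.sum()
    w -= lr * dw
    b -= lr * db

p = 1.0 / (1.0 + np.exp(-(X.dot(w) + b)))
print(((p > 0.5) == y).mean())                 # training accuracy
```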

  48. Be very careful with your brain analogies. Biological neurons: many different types; dendrites can perform complex nonlinear computations; synapses are not a single weight but a complex nonlinear dynamical system; a rate code may not be adequate. [Dendritic Computation. London and Hausser]

  49. Activation Functions
