BBM406 Fundamentals of Machine Learning, Lecture 12: Computational Graph & Backpropagation


  1. BBM406 Fundamentals of Machine Learning, Lecture 12: Computational Graph & Backpropagation. Aykut Erdem // Hacettepe University // Fall 2019. (Illustration: 3Blue1Brown)

  2. Last time… Multilayer Perceptron
 • Layer representation: $y_i = W_i x_i$, $x_{i+1} = \sigma(y_i)$ [figure: layer stack $x_1 \to W_1 \to x_2 \to W_2 \to x_3 \to W_3 \to x_4 \to W_4 \to y$]
 • (typically) iterate between a linear mapping $Wx$ and a nonlinear function
 • Loss function $l(y, y_i)$ to measure the quality of the estimate so far
 slide by Alex Smola

  3. Last time… Forward Pass
 • Output of the network can be written as:
   $h_j(x) = f\big(v_{j0} + \sum_{i=1}^{D} x_i v_{ji}\big)$
   $o_k(x) = g\big(w_{k0} + \sum_{j=1}^{J} h_j(x)\, w_{kj}\big)$
   (j indexing hidden units, k indexing the output units, D the number of inputs)
 • Activation functions f, g: sigmoid/logistic, tanh, or rectified linear (ReLU)
   $\sigma(z) = \frac{1}{1 + \exp(-z)}$, $\tanh(z) = \frac{\exp(z) - \exp(-z)}{\exp(z) + \exp(-z)}$, $\mathrm{ReLU}(z) = \max(0, z)$
 slide by Raquel Urtasun, Richard Zemel, Sanja Fidler

  4. Last time… Forward Pass in Python
 • Example code for a forward pass for a 3-layer network in Python (a sketch follows below)
 • Can be implemented efficiently using matrix operations
 • Example above: $W_1$ is a matrix of size 4 × 3, $W_2$ is 4 × 4. What about the biases and $W_3$?
 slide by Raquel Urtasun, Richard Zemel, Sanja Fidler [http://cs231n.github.io/neural-networks-1/]
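 The code on the slide is an image; below is a minimal sketch in the spirit of the cs231n snippet it cites, assuming a sigmoid nonlinearity and the 4 × 3 / 4 × 4 shapes mentioned above (W3, the biases, and the random initialization are illustrative assumptions, not the slide's exact values):

    import numpy as np

    f = lambda z: 1.0 / (1.0 + np.exp(-z))   # sigmoid activation, applied elementwise

    x = np.random.randn(3, 1)                               # input column vector (3x1)
    W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)   # first layer
    W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)   # second layer
    W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)   # output layer (assumed 1x4)

    h1 = f(W1 @ x + b1)    # first hidden layer activations (4x1)
    h2 = f(W2 @ h1 + b2)   # second hidden layer activations (4x1)
    out = W3 @ h2 + b3     # output neuron (1x1)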

  5. Backpropagation

  6. Recap: Loss function/Optimization
 TODO:
 1. Define a loss function that quantifies our unhappiness with the scores across the training data.
 2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization)
 We defined a (linear) score function. [figure: example images with their per-class scores]
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  7.–16. Softmax Classifier (Multinomial Logistic Regression): ten image-only build slides stepping through the softmax loss (a numpy sketch follows below). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
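 The build slides are images; a minimal numpy sketch of the loss they construct, the softmax (cross-entropy) loss for a single example, is below. The score values and the correct-class index are illustrative, not necessarily the slides' exact numbers:

    import numpy as np

    def softmax_loss(s, y):
        """L_i = -log( e^{s_y} / sum_j e^{s_j} ) for one example with scores s."""
        s = s - np.max(s)                     # shift scores for numerical stability
        p = np.exp(s) / np.sum(np.exp(s))     # class probabilities
        return -np.log(p[y])

    scores = np.array([3.2, 5.1, -1.7])       # unnormalized class scores
    print(softmax_loss(scores, y=0))          # ~2.04 if class 0 is the correct one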

  17. Optimization

  18. Gradient Descent. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  19. Mini-batch Gradient Descent • only use a small portion of the training set to compute the gradient. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  20. Mini-batch Gradient Descent • only use a small portion of the training set to compute the gradient • there are also more fancy update formulas (momentum, Adagrad, RMSProp, Adam, …); a sketch of the vanilla loop follows below. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
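 A minimal, self-contained sketch of vanilla minibatch gradient descent on a toy problem (fitting y = 2x with a single scalar weight; all numbers are illustrative):

    import numpy as np

    X = np.random.randn(1000)    # toy inputs
    Y = 2.0 * X                  # targets: y = 2x
    w, step_size = 0.0, 0.1

    for _ in range(100):
        idx = np.random.choice(len(X), 256)      # sample a minibatch of 256 examples
        xb, yb = X[idx], Y[idx]
        grad = np.mean(2 * (w * xb - yb) * xb)   # gradient of the mean squared error
        w += -step_size * grad                   # vanilla parameter update

    print(w)   # approaches 2.0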

  21. The effects of different update formulas (a momentum sketch follows below). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson (image credits to Alec Radford)
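 The comparison on the slide is an animation; as a rough sketch of why the fancier formulas behave differently, here is vanilla SGD next to SGD with momentum on a 1-D quadratic (the learning rate and momentum coefficient are illustrative choices):

    # minimize f(w) = w^2: vanilla SGD vs. SGD with momentum
    grad = lambda w: 2.0 * w
    w_sgd, w_mom, v = 5.0, 5.0, 0.0
    lr, mu = 0.1, 0.9

    for _ in range(50):
        w_sgd -= lr * grad(w_sgd)       # vanilla update
        v = mu * v - lr * grad(w_mom)   # momentum: velocity accumulates past gradients
        w_mom += v

    print(w_sgd, w_mom)                 # both approach the minimum at 0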

  22.–23. [image-only slides] slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  24. Computational Graph [figure: inputs x and W feed a multiply node (*) producing scores s; a hinge-loss node and a regularizer R feed an add node (+) producing the loss L]. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
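 Written out, the graph composes a linear score function with the multiclass hinge loss and a regularizer; a sketch assuming the standard formulation from these lectures (any regularization weight on R is omitted here):

    $$ s = Wx, \qquad L = \sum_{j \neq y} \max(0,\; s_j - s_y + 1) \;+\; R(W) $$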

  25. Convolutional Network (AlexNet) [figure: computational graph from the input image and weights to the loss]. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  26. Neural Turing Machine [figure: computational graph from the input tape to the loss]. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  27.–36. A worked example on a small computational graph: e.g. x = −2, y = 5, z = −4. Want: the gradients of the output with respect to x, y, and z, computed node by node on the slides. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  37.–39. Chain rule: the same example (x = −2, y = 5, z = −4), with each input's gradient obtained by multiplying local and upstream gradients (a worked derivation follows below). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
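 The function itself is an image on these slides; they are credited to the cs231n lecture, whose corresponding walkthrough uses $f(x, y, z) = (x + y)\,z$. Assuming that function and treating $q = x + y$ as an intermediate node, the chain rule gives:

    $$ q = x + y = 3, \qquad f = q \cdot z = -12 $$
    $$ \frac{\partial f}{\partial z} = q = 3, \qquad \frac{\partial f}{\partial q} = z = -4 $$
    $$ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial x} = (-4)(1) = -4, \qquad \frac{\partial f}{\partial y} = (-4)(1) = -4 $$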

  40.–45. [figure: a generic gate f with input activations; in the backward pass the gate multiplies the gradient arriving from above by its “local gradient” to produce the gradients on its inputs] slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  46.–56. Another example: a larger expression backpropagated gate by gate across the slides; one intermediate step: (−1) · (−0.20) = 0.20. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  57. Another example: [local gradient] × [its gradient] = [1] × [0.2] = 0.2 for both inputs (the add gate passes the gradient through to both!). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  58.–59. Another example: [local gradient] × [its gradient]: x0: [2] × [0.2] = 0.4; w0: [−1] × [0.2] = −0.2. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  60.–61. sigmoid function and sigmoid gate: the whole gate can be backpropagated in one step, because $\sigma(z) = \frac{1}{1 + e^{-z}}$ has the local gradient $\frac{d\sigma}{dz} = (1 - \sigma(z))\,\sigma(z)$, which depends only on the gate's output: here (0.73) · (1 − 0.73) ≈ 0.2 (the derivation follows below). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
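 The one-step shortcut works because the sigmoid's derivative can be rewritten in terms of its own output:

    $$ \sigma(z) = \frac{1}{1 + e^{-z}} \;\Rightarrow\; \frac{d\sigma}{dz} = \frac{e^{-z}}{(1 + e^{-z})^2} = \Big( \frac{1 + e^{-z} - 1}{1 + e^{-z}} \Big) \Big( \frac{1}{1 + e^{-z}} \Big) = \big(1 - \sigma(z)\big)\,\sigma(z) $$

 With the forward value 0.73 from the slide: (1 − 0.73) · 0.73 ≈ 0.197 ≈ 0.2, as computed.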

  62. Patterns in backward flow (a sketch of all three follows below)
 • add gate: gradient distributor
 • max gate: gradient router
 • mul gate: gradient… “switcher”?
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
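 A minimal sketch of the three patterns as backward functions, assuming an upstream gradient dz flows into each two-input gate (function names are illustrative):

    def add_backward(x, y, dz):
        # distributor: both inputs receive the upstream gradient unchanged
        return dz, dz

    def max_backward(x, y, dz):
        # router: only the input that won the max receives the gradient
        return (dz, 0.0) if x > y else (0.0, dz)

    def mul_backward(x, y, dz):
        # "switcher": each input's gradient is the upstream gradient scaled by the OTHER input
        return y * dz, x * dz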

  63. Gradients add at branches (+): when a value feeds several parts of the graph, the gradients flowing back along the branches are summed at the fork. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  64. Implementation: forward/backward API, organized around a Graph (or Net) object. (Rough pseudocode.) slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  65.–66. Implementation: forward/backward API, illustrated on a multiply gate z = x * y (x, y, z are scalars), with a forward and a backward method (a sketch follows below). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
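 The slide's code is an image; a minimal sketch of one such gate object in the forward/backward style described above, assuming scalar inputs (the class and attribute names are illustrative):

    class MultiplyGate:
        """z = x * y; backward turns dL/dz into dL/dx and dL/dy."""
        def forward(self, x, y):
            self.x, self.y = x, y    # cache the inputs for the backward pass
            return x * y

        def backward(self, dz):
            dx = self.y * dz         # local gradient dz/dx = y
            dy = self.x * dz         # local gradient dz/dy = x
            return dx, dy

    gate = MultiplyGate()
    z = gate.forward(-2.0, 5.0)      # z = -10.0
    dx, dy = gate.backward(1.0)      # dx = 5.0, dy = -2.0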
