CS 4803 / 7643: Deep Learning. Topics: Backpropagation, Vector/Matrix/Tensor math, Deriving vectorized gradients for ReLU. Zsolt Kira, Georgia Tech.


  1. CS 4803 / 7643: Deep Learning Topics: – Backpropagation – Vector/Matrix/Tensor math – Deriving vectorized gradients for ReLU Zsolt Kira Georgia Tech

  2. Administrivia • PS1/HW1 out • Start thinking about project topics/teams

  3. Do the Readings!

  4. Recap from last time

  5. Gradient Descent Pseudocode:

    for i in {0, …, num_epochs}:
        for x, y in data:
            z = f(x; w)         (forward pass: compute prediction)
            L = loss(z, y)      (compute loss)
            g = ∂L/∂w           (compute gradient of loss wrt parameters)
            w := w − α·g        (update parameters)

  Some design decisions: • How many examples to use to calculate the gradient per iteration? • What should alpha (the learning rate) be? • Should it be constant throughout? • How many epochs to run? (A runnable sketch follows below.)
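A minimal runnable sketch of this loop, assuming a linear model f(x; w) = w·x and squared-error loss (both chosen here for illustration; the slide leaves the model and loss abstract):

```python
import numpy as np

# Hypothetical toy data: targets generated by w_true = 3.0 (illustration only).
rng = np.random.default_rng(0)
xs = rng.normal(size=100)
ys = 3.0 * xs + rng.normal(scale=0.1, size=100)

w = 0.0          # parameter to learn
alpha = 0.05     # learning rate (a design decision, as the slide notes)
num_epochs = 20  # number of passes over the data (another design decision)

for epoch in range(num_epochs):
    for x, y in zip(xs, ys):
        z = w * x                 # forward pass: prediction
        # squared-error loss L = (z - y)^2; its gradient wrt w is 2*(z - y)*x
        grad = 2.0 * (z - y) * x
        w = w - alpha * grad      # gradient descent update

print(w)  # should be close to 3.0
```

This version uses one example per update (pure SGD); using a mini-batch per update is exactly the first design decision listed above.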

  6. How to Simplify? • Calculating gradients for large functions is complicated • Step 1: Decompose the function and compute local gradients for each part! • Step 2: Apply a generic algorithm that computes gradients locally and uses the chain rule to propagate them across the computation graph (see the sketch below)
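A concrete illustration of these two steps, using f(w) = sigmoid(w·x + b) as the function to decompose (this particular function and the helper name are my choice for illustration, not from the slide):

```python
import math

def sigmoid_grad_wrt_w(w, x, b):
    # Step 1: decompose f = 1 / (1 + exp(-(w*x + b))) into primitive ops,
    # each with a trivial local gradient.
    a = w * x              # mul node:   local grad da/dw = x
    s = a + b              # add node:   local grad ds/da = 1
    n = -s                 # neg node:   local grad dn/ds = -1
    e = math.exp(n)        # exp node:   local grad de/dn = exp(n)
    d = 1.0 + e            # add node:   local grad dd/de = 1
    f = 1.0 / d            # recip node: local grad df/dd = -1/d**2

    # Step 2: chain rule, multiplying local gradients from the output back to w
    df_dd = -1.0 / d**2
    dd_de = 1.0
    de_dn = math.exp(n)
    dn_ds = -1.0
    ds_da = 1.0
    da_dw = x
    return df_dd * dd_de * de_dn * dn_ds * ds_da * da_dw

print(sigmoid_grad_wrt_w(2.0, -1.0, 0.5))  # matches sigma*(1-sigma)*x analytically
```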

  7. Computational Graph Any DAG of differentiable modules is allowed! Slide Credit: Marc'Aurelio Ranzato
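One subtlety the DAG view implies: when a value feeds more than one downstream node, the gradients arriving along each backward edge are summed. A minimal hand-worked sketch, with f(x) = x² + 3x chosen purely for illustration:

```python
# f(x) = x*x + 3*x : the node x feeds two consumers (a mul and a scale),
# so its total gradient is the sum over incoming backward edges.
def f_and_grad(x):
    a = x * x          # consumer 1: local grads da/dx = x, once per input edge
    b = 3.0 * x        # consumer 2: local grad db/dx = 3
    f = a + b          # add node: passes the upstream gradient to a and b
    df_da, df_db = 1.0, 1.0
    # accumulate along all backward paths reaching x
    dx = df_da * (x + x) + df_db * 3.0
    return f, dx

print(f_and_grad(2.0))   # f = 10.0, df/dx = 2*2 + 3 = 7.0
```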

  8. Key Computation: Forward-Prop Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  9. Key Computation: Back-Prop Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
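A minimal sketch of both passes over a chain of differentiable modules. The forward/backward interface below is illustrative, not the course's actual code: forward caches what backward needs, and backward multiplies the upstream gradient by the module's local gradient.

```python
class Square:
    def forward(self, x):
        self.x = x                      # cache input for the backward pass
        return x * x
    def backward(self, grad_out):
        return grad_out * 2 * self.x    # upstream * local gradient

class Scale:
    def __init__(self, c):
        self.c = c
    def forward(self, x):
        return self.c * x
    def backward(self, grad_out):
        return grad_out * self.c

modules = [Square(), Scale(3.0)]        # composed function f(x) = 3 * x^2

# Forward-prop: run modules left to right
h = 2.0
for m in modules:
    h = m.forward(h)

# Back-prop: run modules right to left, starting from df/df = 1
g = 1.0
for m in reversed(modules):
    g = m.backward(g)

print(h, g)   # f(2) = 12.0, df/dx = 6*x = 12.0
```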

  10–16. Neural Network Training (shown as a progressive build across slides 10–16) • Step 1: Compute Loss on mini-batch [F-Pass] • Step 2: Compute gradients wrt parameters [B-Pass] • Step 3: Use gradient to update parameters Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
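A sketch of the three steps on one mini-batch, standing in a single linear layer with squared-error loss for the network in the figure (the shapes and data here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 2))   # parameters: 3 inputs -> 2 outputs
X = rng.normal(size=(8, 3))              # mini-batch of 8 examples
Y = rng.normal(size=(8, 2))              # targets (random, illustration only)
alpha = 0.1

for step in range(100):
    # Step 1: compute loss on mini-batch [F-Pass]
    Z = X @ W                            # predictions, shape (8, 2)
    loss = np.mean((Z - Y) ** 2)
    # Step 2: compute gradients wrt parameters [B-Pass]
    dZ = 2.0 * (Z - Y) / Z.size          # dLoss/dZ
    dW = X.T @ dZ                        # dLoss/dW via the chain rule
    # Step 3: use gradient to update parameters
    W -= alpha * dW

print(loss)  # should have decreased toward the least-squares residual
```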

  17. Backpropagation: a simple example f(x, y, z) = (x + y)·z, e.g. x = -2, y = 5, z = -4. Want: ∂f/∂x, ∂f/∂y, ∂f/∂z. With the intermediate q = x + y, the chain rule gives ∂f/∂x = (∂f/∂q)(∂q/∂x): each node multiplies its upstream gradient by its local gradient. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
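The same example in code, computing each gradient as upstream × local:

```python
# f(x, y, z) = (x + y) * z, evaluated at the slide's values
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y          # q = 3
f = q * z          # f = -12

# backward pass: upstream gradient * local gradient at each node
df_df = 1.0
df_dq = df_df * z          # mul gate: local grad wrt q is z  -> -4
df_dz = df_df * q          # mul gate: local grad wrt z is q  ->  3
df_dx = df_dq * 1.0        # add gate: local grad is 1        -> -4
df_dy = df_dq * 1.0        # add gate: local grad is 1        -> -4

print(f, df_dx, df_dy, df_dz)   # -12.0 -4.0 -4.0 3.0
```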

  18. Patterns in backward flow • add gate: gradient distributor • max gate: gradient router • mul gate: gradient switcher Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
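These three patterns evaluated numerically (the input values are arbitrary):

```python
upstream = 2.0   # gradient arriving from above
a, b = 3.0, -1.0

# add gate "distributes": both inputs receive the upstream gradient unchanged
d_add_a, d_add_b = upstream * 1.0, upstream * 1.0        # (2.0, 2.0)

# max gate "routes": only the larger input receives the gradient
d_max_a = upstream if a > b else 0.0                     # 2.0
d_max_b = upstream if b > a else 0.0                     # 0.0

# mul gate "switches": each input gets the upstream gradient
# scaled by the *other* input's value
d_mul_a, d_mul_b = upstream * b, upstream * a            # (-2.0, 6.0)

print(d_add_a, d_add_b, d_max_a, d_max_b, d_mul_a, d_mul_b)
```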

  19. Summary • We will have a composed non-linear function as our model – Several portions will have parameters • We will use (stochastic/mini-batch) gradient descent with a loss function to define our objective • Rather than analytically derive gradients for complex functions, we will modularize computation – Backpropagation = Gradient Descent + Chain Rule • Now: – Work through the mathematical view – Vectors, matrices, and tensors – Next time: Can the computer do this for us automatically? • Read: – https://explained.ai/matrix-calculus/index.html – https://www.cc.gatech.edu/classes/AY2020/cs7643_fall/slides/L5_gradients_notes.pdf

  20. Matrix/Vector Derivatives Notation • Read: – https://explained.ai/matrix-calculus/index.html – https://www.cc.gatech.edu/classes/AY2020/cs7643_fall/slides/L5_gradients_notes.pdf • Matrix/Vector Derivatives Notation • Vector Derivative Example • Extension to Tensors • Chain Rule: Composite Functions – Scalar Case – Vector Case – Jacobian view – Graphical view – Tensors • Logistic Regression Derivatives
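Tying this outline back to the lecture's ReLU goal: because y = max(0, x) acts elementwise, its Jacobian is diagonal with entries 1[x > 0], so the vectorized backward pass reduces to masking the upstream gradient rather than forming a full Jacobian product. A NumPy sketch (the array values are made up):

```python
import numpy as np

x = np.array([[1.5, -2.0], [0.3, -0.1]])
dL_dy = np.array([[1.0, 2.0], [3.0, 4.0]])   # upstream gradient (illustration)

y = np.maximum(0.0, x)     # forward: elementwise ReLU

# The Jacobian dy/dx is diagonal with entries 1[x > 0], so instead of
# building an (n x n) matrix we just mask the upstream gradient.
dL_dx = dL_dy * (x > 0)

print(dL_dx)   # [[1. 0.] [3. 0.]]
```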

