CS 4803 / 7643: Deep Learning. Topics: (Finish) Computing Gradients

  1. CS 4803 / 7643: Deep Learning. Topics: – (Finish) Computing Gradients – Backprop in Conv Layers – Forward mode vs Reverse mode AD – Modern CNN Architectures. Zsolt Kira, Georgia Tech

  2. Administrivia • HW1 due date moved! – Due: 02/18, 11:55pm • Project topic submissions – Submit by 02/21 to get comments – Form filled out with: • Members identified • One paragraph describing the problem, and another on what has been done in the literature and your approach (note: the approach can be selected from an existing paper and reimplemented) • Description of what each member will do • Link – A project idea: ICLR reproducibility challenge (https://reproducibility-challenge.github.io/iclr_2019/) • The official submission deadline was in January, but you can still do it and submit later! (C) Dhruv Batra and Zsolt Kira

  3. • Google Cloud credits are out! (see Piazza for details) • Clouderizer ties in with Google Cloud Platform (GCP)

  4. Matrix/Vector Derivatives Notation

  7. Plan for Today • Topics: – (Finish) Computing Gradients – Backprop in Conv Layers – Forward mode vs Reverse mode AD – Modern CNN Architectures

  8. Backprop in Convolutional Layers

  9. How do we compute gradients? • Analytic or “Manual” Differentiation • Symbolic Differentiation • Numerical Differentiation • Automatic Differentiation – Forward mode AD – Reverse mode AD • aka “backprop”

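  10. Computational Graph (figure: x and W feed a multiply node (*) that produces the scores s; s feeds the hinge loss; the hinge loss and the regularizer R(W) are summed (+) into the total loss L). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n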
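The graph on this slide can be traced in code: a forward pass through each node, then a backward pass that applies the chain rule node by node in reverse. A minimal NumPy sketch, assuming a multiclass SVM hinge loss with margin 1 for a single example and an L2 regularizer R(W) = reg * sum(W**2); the function name `svm_loss_and_grad` and the `reg` parameter are hypothetical, not taken from the slides.

```python
import numpy as np

def svm_loss_and_grad(W, x, y, reg=0.1):
    """Forward and backward pass through L = hinge(W @ x) + R(W).

    W: (C, D) weights, x: (D,) input, y: index of the correct class.
    Assumes margin 1 and an L2 regularizer R(W) = reg * sum(W**2).
    """
    # Forward: the multiply node produces the scores s.
    s = W @ x
    # Hinge loss node: sum over incorrect classes of max(0, s_j - s_y + 1).
    margins = np.maximum(0.0, s - s[y] + 1.0)
    margins[y] = 0.0
    data_loss = margins.sum()
    # Regularizer node R(W); the + node adds it to give the total loss L.
    reg_loss = reg * np.sum(W * W)
    L = data_loss + reg_loss

    # Backward through the hinge node: each active margin contributes +1 to
    # its class score and -1 to the correct class score (subgradient 0 at a kink).
    ds = (margins > 0).astype(float)
    ds[y] = -ds.sum()
    # Multiply node: s = W x, so dL/dW = outer(dL/ds, x).
    dW = np.outer(ds, x)
    # W also feeds R, so the gradient from that branch is added.
    dW += 2.0 * reg * W
    return L, dW
```

Note that W appears in two branches of the graph (the multiply node and R), which is why the two contributions to `dW` are summed.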

  11. Computational Graph: any DAG of differentiable modules is allowed! Slide Credit: Marc'Aurelio Ranzato

  12. Directed Acyclic Graphs (DAGs) • Exactly what the name suggests – Directed edges – No (directed) cycles – Underlying undirected cycles okay

  13. Directed Acyclic Graphs (DAGs) • Concept – Topological Ordering

  14. Directed Acyclic Graphs (DAGs)
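A topological ordering lists the nodes so that every node comes after all of its inputs; a forward pass can visit nodes in that order and a backward pass in the reverse order. A minimal sketch of Kahn's algorithm, assuming the graph is given as a dict mapping each node to the nodes it feeds; the function name `topological_order` and the example node names are hypothetical.

```python
from collections import deque

def topological_order(edges):
    """Return the nodes so that every edge u -> v has u before v (Kahn's algorithm).

    edges: dict mapping node -> list of successor nodes.
    Raises ValueError if the graph contains a directed cycle (i.e. is not a DAG).
    """
    # Collect every node, including ones that only appear as successors.
    nodes = set(edges)
    for succs in edges.values():
        nodes.update(succs)
    # Count the incoming edges of each node.
    indegree = {n: 0 for n in nodes}
    for succs in edges.values():
        for v in succs:
            indegree[v] += 1
    # Start from nodes with no inputs; sorted() keeps the result deterministic.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        # Removing u may leave some successors with no unprocessed inputs.
        for v in edges.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a directed cycle")
    return order

# x1 feeds both a multiply node and a sin node, which both feed an add node.
graph = {"x1": ["mul", "sin"], "x2": ["mul"], "mul": ["add"], "sin": ["add"]}
print(topological_order(graph))  # ['x1', 'x2', 'sin', 'mul', 'add']
```

The example graph contains an undirected cycle (x1 reaches add through both mul and sin) but no directed cycle, so it is still a valid DAG.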

  15. Numerical vs Analytic Gradients • Numerical gradient: slow :(, approximate :(, easy to write :) • Analytic gradient: fast :), exact :), error-prone :( • In practice: derive the analytic gradient, then check your implementation against the numerical gradient. This is called a gradient check. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
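A gradient check compares a hand-derived gradient against a finite-difference estimate. A minimal sketch, assuming a centered difference with step h = 1e-5 and a relative-error comparison; the test function f(x) = x0*x1 + sin(x0) and the helper names are illustrative choices, not from the slides.

```python
import numpy as np

def f(x):
    # Example scalar function of a 2-vector.
    return x[0] * x[1] + np.sin(x[0])

def analytic_grad(x):
    # Hand-derived gradient of f.
    return np.array([x[1] + np.cos(x[0]), x[0]])

def numerical_grad(func, x, h=1e-5):
    """Centered-difference estimate of the gradient of func at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

def relative_error(a, b, eps=1e-12):
    # Scale-invariant comparison; eps avoids division by zero.
    return np.max(np.abs(a - b) / np.maximum(eps, np.abs(a) + np.abs(b)))

x = np.array([0.5, -1.3])
err = relative_error(analytic_grad(x), numerical_grad(f, x))
print(f"max relative error: {err:.2e}")
```

A very small relative error (here far below 1e-7) is what a correct analytic gradient typically produces; a derivation bug, such as dropping the cos term, pushes it to order 1.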

  16. How do we compute gradients? • Analytic or “Manual” Differentiation • Symbolic Differentiation • Numerical Differentiation • Automatic Differentiation – Forward mode AD – Reverse mode AD • aka “backprop”

  17. Forward mode vs Reverse Mode • Key Computations

  18. Forward mode AD

  19. Reverse mode AD
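The key computations for one step v = g(u) inside a larger function can be summarized as below. A hedged math sketch, assuming scalar intermediates, an input x_i of interest for forward mode, and a scalar output L for reverse mode; the dot (tangent) and bar (adjoint) notation is a common convention, not necessarily the one used on the slides.

$$
\begin{aligned}
\text{Forward mode:}\quad & \dot{v} = \frac{\partial g}{\partial u}\,\dot{u}, & \dot{u} &= \frac{\partial u}{\partial x_i} \\
\text{Reverse mode:}\quad & \bar{u} \mathrel{+}= \bar{v}\,\frac{\partial g}{\partial u}, & \bar{v} &= \frac{\partial L}{\partial v}
\end{aligned}
$$

Forward mode carries tangents from the inputs toward the output alongside the forward pass. Reverse mode first runs the forward pass, then carries adjoints from the output back toward the inputs, accumulating (+=) whenever a value feeds more than one node.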

  20–23. Example: Forward mode AD on $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$ (graph: $x_1$ and $x_2$ feed a $*$ node, $x_1$ also feeds a $\sin(\cdot)$ node, and both results feed a $+$ node)
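Forward mode AD can be sketched with dual numbers: every value carries its derivative with respect to one chosen input, and each operation updates both. A minimal sketch, assuming the example function f(x1, x2) = x1*x2 + sin(x1) and implementing only the +, * and sin operations it needs; the `Dual` class and the `dsin`/`forward_mode_grad` helpers are hypothetical names.

```python
import math

class Dual:
    """A value paired with its derivative (tangent) w.r.t. one chosen input."""
    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule: (u * v)' = u' v + u v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def dsin(u):
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(u.val), math.cos(u.val) * u.dot)

def f(x1, x2):
    return x1 * x2 + dsin(x1)

def forward_mode_grad(x1, x2):
    # One forward pass per input: seed that input's tangent with 1, the other with 0.
    d_dx1 = f(Dual(x1, 1.0), Dual(x2, 0.0)).dot
    d_dx2 = f(Dual(x1, 0.0), Dual(x2, 1.0)).dot
    return d_dx1, d_dx2

print(forward_mode_grad(2.0, 3.0))  # equals (3 + cos(2), 2)
```

Note that the full gradient took two passes, one per input.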

  24–25. Example: Reverse mode AD on $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$
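Reverse mode AD can be sketched by having each node record its parents and local derivatives during the forward pass, then pushing adjoints from the output back to the inputs. A minimal sketch, assuming the example function f(x1, x2) = x1*x2 + sin(x1) and supporting only +, * and sin; the `Var` class and the `vsin`/`backward` helpers are hypothetical names.

```python
import math

class Var:
    """A scalar node that remembers its parents and local derivatives."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuple of (parent Var, d self / d parent)
        self.grad = 0.0         # adjoint dL/d(self), filled in by backward()

    def __add__(self, other):
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

def vsin(u):
    return Var(math.sin(u.val), ((u, math.cos(u.val)),))

def backward(output):
    """Propagate adjoints from output to all its ancestors."""
    order, seen = [], set()

    def visit(node):
        # Depth-first search: append a node only after all of its parents.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Reverse topological order: a node's adjoint is complete before it is used.
    for node in reversed(order):
        for parent, local in node.parents:
            # Chain rule; += accumulates when a value feeds several nodes.
            parent.grad += node.grad * local

x1, x2 = Var(2.0), Var(3.0)
y = x1 * x2 + vsin(x1)
backward(y)
print(x1.grad, x2.grad)  # equals 3 + cos(2), then 2.0
```

A single backward pass yields the derivative with respect to both inputs, at the cost of keeping every node (and its cached values) alive until the backward pass finishes.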

  26. Forward Pass vs Forward mode AD vs Reverse Mode AD (figure: the graph of $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$ drawn three times, side by side)

  27. Forward mode vs Reverse Mode • What are the differences? (figure: the forward-mode and reverse-mode graphs side by side)

  28. Forward mode vs Reverse Mode • What are the differences? • Which one is faster to compute? – Forward or backward?

  29. Forward mode vs Reverse Mode • What are the differences? • Which one is faster to compute? – Forward or backward? • Which one is more memory efficient (less storage)? – Forward or backward?
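One way to see the speed trade-off: for a chain of linear maps the Jacobian is a matrix product, and forward mode builds it one column at a time (one Jacobian-vector product per input) while reverse mode builds it one row at a time (one vector-Jacobian product per output). A minimal NumPy sketch, assuming a random linear chain f(x) = A3 A2 A1 x with 100 inputs and 1 output; the sizes, seed and pass counters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, hidden, n_out = 100, 50, 1
# Layer Jacobians of the linear chain f(x) = A3 @ A2 @ A1 @ x.
A1 = rng.standard_normal((hidden, n_in))
A2 = rng.standard_normal((hidden, hidden))
A3 = rng.standard_normal((n_out, hidden))
layers = [A1, A2, A3]

def forward_mode_jacobian(layers, n_in):
    """Build the Jacobian column by column: one JVP pass per input."""
    cols, passes = [], 0
    for i in range(n_in):
        v = np.zeros(n_in)
        v[i] = 1.0                  # seed the tangent for input i
        for A in layers:            # push the tangent from input to output
            v = A @ v
        cols.append(v)
        passes += 1
    return np.stack(cols, axis=1), passes

def reverse_mode_jacobian(layers, n_out):
    """Build the Jacobian row by row: one VJP pass per output."""
    rows, passes = [], 0
    for j in range(n_out):
        u = np.zeros(n_out)
        u[j] = 1.0                  # seed the adjoint for output j
        for A in reversed(layers):  # pull the adjoint from output to input
            u = u @ A
        rows.append(u)
        passes += 1
    return np.stack(rows, axis=0), passes

J_fwd, fwd_passes = forward_mode_jacobian(layers, n_in)
J_rev, rev_passes = reverse_mode_jacobian(layers, n_out)
print(fwd_passes, rev_passes)         # 100 1
print(np.allclose(J_fwd, J_rev))      # True
```

For a scalar loss of many parameters, the typical deep learning case, reverse mode needs one pass where forward mode needs one per parameter. The price, for nonlinear layers, is memory: reverse mode must store the forward-pass intermediates to evaluate local derivatives later, whereas forward mode can discard them as it goes.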

  30. Patterns in backward flow Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  31. Patterns in backward flow Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  32. Patterns in backward flow add gate: gradient distributor Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  33. Patterns in backward flow add gate: gradient distributor Q: What is a max gate? Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  34. Patterns in backward flow add gate: gradient distributor max gate: gradient router Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  35. Patterns in backward flow add gate: gradient distributor max gate: gradient router Q: What is a mul gate? Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  36. Patterns in backward flow add gate: gradient distributor max gate: gradient router mul gate: gradient switcher Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
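The three patterns can be checked directly on scalar gates. A minimal sketch, assuming two scalar inputs per gate and an upstream gradient `dout`; the function names are hypothetical.

```python
def add_backward(x, y, dout):
    # Add gate (distributor): passes the upstream gradient unchanged to both inputs.
    return dout, dout

def max_backward(x, y, dout):
    # Max gate (router): the larger input gets the full gradient, the other gets zero.
    # Ties are sent to x here, which is an arbitrary choice.
    return (dout, 0.0) if x >= y else (0.0, dout)

def mul_backward(x, y, dout):
    # Mul gate (switcher): each input's gradient is dout times the *other* input.
    return dout * y, dout * x

print(add_backward(3.0, -4.0, 2.0))  # (2.0, 2.0)
print(max_backward(3.0, -4.0, 2.0))  # (2.0, 0.0)
print(mul_backward(3.0, -4.0, 2.0))  # (-8.0, 6.0)
```

The mul gate earns the name "switcher" because the input values swap roles: x's gradient scales with y, and y's with x.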

  37. Gradients add at branches. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  38. Duality in Fprop and Bprop: a SUM (+) in FPROP becomes a COPY in BPROP, and a COPY in FPROP becomes a SUM (+) in BPROP.
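Slides 37 and 38 can be seen together in a few lines: when a value is copied to several consumers in the forward pass, the backward pass sums the gradients arriving from each copy, and a sum in the forward pass copies its upstream gradient to every input. A minimal sketch, assuming the illustrative function f(x) = x*x + 3x, in which x is used three times; the function name is hypothetical.

```python
def f_and_grad(x):
    """f(x) = x * x + 3 * x, with x copied into three uses."""
    # Forward pass.
    a = x * x          # mul node: both inputs are copies of x
    b = 3.0 * x        # scale node: a third copy of x
    out = a + b        # sum node

    # Backward pass, starting from d(out)/d(out) = 1.
    dout = 1.0
    # Sum in fprop -> copy in bprop: both inputs receive dout unchanged.
    da, db = dout, dout
    # Mul gate: each copy of x gets da times the other input (also x).
    dx_from_mul = da * x + da * x
    dx_from_scale = db * 3.0
    # Copy in fprop -> sum in bprop: gradients from every branch add up.
    dx = dx_from_mul + dx_from_scale
    return out, dx

print(f_and_grad(2.0))  # (10.0, 7.0), matching f'(x) = 2x + 3
```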

  39. Modularized implementation: forward / backward API. Graph (or Net) object (rough pseudocode). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  40. Modularized implementation: forward / backward API. Example gate: $z = x \cdot y$ (x, y, z are scalars). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  41. Modularized implementation: forward / backward API. Example gate: $z = x \cdot y$ (x, y, z are scalars). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
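The forward/backward API can be sketched in plain Python: each gate caches what it needs in `forward` and returns input gradients from `backward`, and a graph runs the gates in topological order, then in reverse. A minimal sketch in the spirit of the slides' rough pseudocode, assuming scalar gates wired by hand; the `MultiplyGate` and `AddGate` classes and the input values are hypothetical.

```python
class MultiplyGate:
    """z = x * y for scalars, with a forward/backward interface."""
    def forward(self, x, y):
        # Cache the inputs: backward needs them for the local gradients.
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # Chain rule: dL/dx = dL/dz * dz/dx = dz * y, and symmetrically for y.
        dx = dz * self.y
        dy = dz * self.x
        return dx, dy

class AddGate:
    """z = x + y for scalars."""
    def forward(self, x, y):
        return x + y

    def backward(self, dz):
        # Local gradients are both 1, so dz passes through unchanged.
        return dz, dz

# f(x, y, w) = x * y + w, evaluated gate by gate in topological order.
mul, add = MultiplyGate(), AddGate()
x, y, w = -2.0, 5.0, -4.0
q = mul.forward(x, y)       # q = -10.0
f = add.forward(q, w)       # f = -14.0
# Backward pass visits the gates in reverse order, starting from df/df = 1.
dq, dw = add.backward(1.0)
dx, dy = mul.backward(dq)
print(f, dx, dy, dw)        # -14.0 5.0 -2.0 1.0
```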

  42. Example: Caffe layers. Caffe is licensed under BSD 2-Clause. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  43. Caffe Sigmoid Layer: the backward pass multiplies the local sigmoid gradient by top_diff (chain rule). Caffe is licensed under BSD 2-Clause. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
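The sigmoid layer's backward pass uses σ'(x) = σ(x)(1 − σ(x)), multiplied by the upstream gradient (`top_diff` in Caffe's naming). A minimal NumPy sketch of the same computation, not Caffe's actual C++ code; the `SigmoidLayer` class and its `forward`/`backward` method names are hypothetical.

```python
import numpy as np

class SigmoidLayer:
    """Elementwise sigmoid with a forward/backward interface."""
    def forward(self, bottom):
        # Cache the output: the local gradient can be written in terms of it.
        self.top = 1.0 / (1.0 + np.exp(-bottom))
        return self.top

    def backward(self, top_diff):
        # Local gradient sigma * (1 - sigma), times top_diff (chain rule).
        return top_diff * self.top * (1.0 - self.top)

layer = SigmoidLayer()
out = layer.forward(np.array([0.0, 2.0]))
bottom_diff = layer.backward(np.ones(2))
print(bottom_diff[0])  # 0.25
```

Caching the output rather than the input means the backward pass needs no second exponential.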

  44. Figure Credit: Andrea Vedaldi
