CS 4803 / 7643: Deep Learning Topics: (Finish) Computing Gradients

CS 4803 / 7643: Deep Learning Topics: – (Finish) Computing Gradients – Backprop in Conv Layers – Forward mode vs Reverse mode AD – Modern CNN Architectures Zsolt Kira Georgia Tech

Administrivia • HW1 due date moved! – Due: 02/18, 11:55pm • Project topic submissions – Submit by 02/21 to get comments – Form filled out with: • Members identified • Paragraph of problem and another paragraph of what has been done in the literature and approach (note: the approach can be selected from an existing paper and reimplemented) • Description of what each member will do • Link – A project idea: ICLR reproducibility challenge (https://reproducibility-challenge.github.io/iclr_2019/) • Official submission Jan but can still do it and submit later! (C) Dhruv Batra and Zsolt Kira 2

• Google cloud credits out! (see piazza for details) • Clouderizer ties in with Google Cloud Platform (GCP) (C) Dhruv Batra and Zsolt Kira 3

Matrix/Vector Derivatives Notation (C) Dhruv Batra and Zsolt Kira 4

Plan for Today • Topics: – (Finish) Computing Gradients – Backprop in Conv Layers – Forward mode vs Reverse mode AD – Modern CNN Architectures (C) Dhruv Batra and Zsolt Kira 7

Backprop in Convolutional Layers (C) Dhruv Batra and Zsolt Kira 8

How do we compute gradients? • Analytic or “Manual” Differentiation • Symbolic Differentiation • Numerical Differentiation • Automatic Differentiation – Forward mode AD – Reverse mode AD • aka “backprop” (C) Dhruv Batra and Zsolt Kira 9

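Computational Graph (figure: x and W feed a * node that produces the scores s; s feeds the hinge loss; W also feeds the regularizer R; a + node adds the hinge loss and R to give the total loss L) Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n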

Computational Graph Any DAG of differentiable modules is allowed! (C) Dhruv Batra and Zsolt Kira 11 Slide Credit: Marc'Aurelio Ranzato

Directed Acyclic Graphs (DAGs) • Exactly what the name suggests – Directed edges – No (directed) cycles – Underlying undirected cycles okay (C) Dhruv Batra and Zsolt Kira 12

Directed Acyclic Graphs (DAGs) • Concept – Topological Ordering (C) Dhruv Batra and Zsolt Kira 13
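A topological ordering is what lets a graph run its forward pass in dependency order and its backward pass in the reverse of that order. The following is a minimal sketch using Kahn's algorithm. The dictionary-of-successors representation and the node names (borrowed loosely from the hinge-loss graph above) are assumptions made for illustration, not something the slides specify.

```python
from collections import deque

def topological_order(successors):
    """Return the nodes so that every edge u -> v has u before v (Kahn's algorithm)."""
    # Collect every node, including ones that only appear as a successor.
    nodes = set(successors)
    for outs in successors.values():
        nodes.update(outs)

    # Count incoming edges for each node.
    indegree = {n: 0 for n in nodes}
    for outs in successors.values():
        for v in outs:
            indegree[v] += 1

    # Nodes with no incoming edges can be emitted immediately.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in successors.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # If some node was never emitted, the graph contains a directed cycle.
    if len(order) != len(nodes):
        raise ValueError("graph has a directed cycle")
    return order

# Hypothetical graph: x, W -> scores -> hinge -> L, and W -> R -> L.
graph = {"x": ["scores"], "W": ["scores", "R"],
         "scores": ["hinge"], "hinge": ["L"], "R": ["L"]}
order = topological_order(graph)
```

Iterating over `order` gives a valid forward schedule; iterating over `reversed(order)` gives a valid backward schedule.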

Directed Acyclic Graphs (DAGs) (C) Dhruv Batra and Zsolt Kira 14

Numerical vs Analytic Gradients Numerical gradient : slow :(, approximate :(, easy to write :) Analytic gradient : fast :), exact :), error-prone :( In practice: Derive analytic gradient, check your implementation with numerical gradient. This is called a gradient check. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
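A gradient check can be sketched as follows, assuming NumPy and a centered finite difference. The toy function `f`, the step size `h`, and the relative-error formula are illustrative choices, not values taken from the slides.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered finite-difference estimate of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        f_plus = f(x)
        x.flat[i] = old - h
        f_minus = f(x)
        x.flat[i] = old                      # restore the original entry
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

def f(x):
    # Hypothetical toy objective used only for this check.
    return np.sum(x ** 2) + np.sin(x[0])

def analytic_gradient(x):
    # Hand-derived gradient of f: 2x everywhere, plus cos(x[0]) on the first entry.
    g = 2 * x
    g[0] += np.cos(x[0])
    return g

x = np.array([0.5, -1.2, 2.0])
num = numerical_gradient(f, x)
ana = analytic_gradient(x)

# Relative error; small values suggest the analytic gradient is implemented correctly.
rel_error = np.max(np.abs(num - ana) / np.maximum(1e-8, np.abs(num) + np.abs(ana)))
```

The numerical version calls `f` twice per input dimension, which is why it is slow, and it only approximates the derivative, which is why it is used to check the analytic gradient rather than to train.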

How do we compute gradients? • Analytic or “Manual” Differentiation • Symbolic Differentiation • Numerical Differentiation • Automatic Differentiation – Forward mode AD – Reverse mode AD • aka “backprop” (C) Dhruv Batra and Zsolt Kira 16

Forward mode vs Reverse Mode • Key Computations (C) Dhruv Batra and Zsolt Kira 17

Forward mode AD g 18

Reverse mode AD g 19

Example: Forward mode AD (graph nodes: +, sin( ), *; inputs: x1, x2) (C) Dhruv Batra and Zsolt Kira 20
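Forward mode AD can be sketched with dual numbers: each value carries its derivative with respect to one chosen input, and both are pushed through the graph together. The extracted slide only shows the nodes +, sin( ), * and the inputs x1, x2, so the concrete function below, f(x1, x2) = x1 * x2 + sin(x1), is an assumption. The `Dual` class and the `sin` helper are hypothetical names for illustration.

```python
import math

class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        # Sum rule: derivatives add.
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(a):
    # Chain rule through sin.
    return Dual(math.sin(a.val), math.cos(a.val) * a.dot)

def f(x1, x2):
    # Assumed example function: x1 * x2 + sin(x1).
    return x1 * x2 + sin(x1)

# One forward sweep per input: seed dot = 1 on the input being differentiated.
df_dx1 = f(Dual(2.0, 1.0), Dual(3.0, 0.0)).dot   # x2 + cos(x1)
df_dx2 = f(Dual(2.0, 0.0), Dual(3.0, 1.0)).dot   # x1
```

Each sweep yields the derivative with respect to a single input, so a function with n inputs needs n sweeps to produce the full gradient.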

Example: Reverse mode AD (graph nodes: +, sin( ), *; inputs: x1, x2) (C) Dhruv Batra and Zsolt Kira 24
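Reverse mode AD runs the forward pass once, stores the intermediates, and then sweeps backward from the output. The sketch below is hand-written for an assumed function f(x1, x2) = x1 * x2 + sin(x1); the slide only shows the nodes +, sin( ), * and the inputs x1, x2. The function name `f_and_grad` is hypothetical.

```python
import math

def f_and_grad(x1, x2):
    # Forward pass: compute and keep the intermediate values.
    a = x1 * x2            # * node
    b = math.sin(x1)       # sin node
    f = a + b              # + node

    # Backward pass: start from df/df = 1 and walk the graph in reverse.
    df = 1.0
    da = df                # + node passes the gradient to both inputs
    db = df
    # x1 feeds both the * node and the sin node, so its gradients add.
    dx1 = da * x2 + db * math.cos(x1)
    dx2 = da * x1
    return f, (dx1, dx2)

value, (dx1, dx2) = f_and_grad(2.0, 3.0)
```

A single backward sweep produces the derivatives with respect to every input, at the cost of keeping the forward intermediates in memory until the sweep has used them.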

Forward Pass vs Forward mode AD vs Reverse Mode AD (figure: three copies of the same graph with nodes +, sin( ), * and inputs x1, x2) (C) Dhruv Batra and Zsolt Kira 26

Forward mode vs Reverse Mode • What are the differences? • Which one is faster to compute? – Forward or backward? • Which one is more memory efficient (less storage)? – Forward or backward? (C) Dhruv Batra and Zsolt Kira 29

Patterns in backward flow • add gate: gradient distributor • max gate: gradient router • mul gate: gradient switcher Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
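The three gate patterns can be written as small backward functions. This is a sketch for scalar inputs; the function names are hypothetical, and the max gate breaks ties in favor of the first input, which is one convention among several.

```python
def add_backward(upstream):
    # Distributor: both inputs receive the upstream gradient unchanged.
    return upstream, upstream

def max_backward(x, y, upstream):
    # Router: only the input that was the max receives the gradient.
    return (upstream, 0.0) if x >= y else (0.0, upstream)

def mul_backward(x, y, upstream):
    # Switcher: each input receives the upstream gradient scaled by the other input.
    return upstream * y, upstream * x

print(add_backward(2.0))             # (2.0, 2.0)
print(max_backward(3.0, -1.0, 2.0))  # (2.0, 0.0)
print(mul_backward(3.0, -4.0, 2.0))  # (-8.0, 6.0)
```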

Gradients add at branches + Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Duality in Fprop and Bprop FPROP BPROP SUM + COPY + (C) Dhruv Batra and Zsolt Kira 38
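The duality can be seen in a tiny example: a value that is copied to two consumers in the forward pass receives the sum of their gradients in the backward pass, and a sum in the forward pass copies its upstream gradient to both inputs. The function below, f(x) = 3x + x², is a hypothetical example chosen only because x branches to two nodes.

```python
def forward(x):
    # x is copied to two consumers (a branch in the graph).
    a = 3.0 * x
    b = x * x
    # The two branches are summed.
    return a + b

def backward(x, upstream=1.0):
    # SUM in fprop becomes COPY in bprop: both branches get the upstream gradient.
    da = upstream
    db = upstream
    # COPY in fprop becomes SUM in bprop: gradients from both branches add at x.
    dx_from_a = da * 3.0
    dx_from_b = db * 2.0 * x
    return dx_from_a + dx_from_b

print(forward(2.0))   # 10.0
print(backward(2.0))  # 7.0
```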

Modularized implementation: forward / backward API Graph (or Net) object (rough pseudocode) Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Modularized implementation: forward / backward API • Example: multiply gate z = x * y (x, y, z are scalars) Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
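A multiply gate with a forward / backward API might look like the sketch below. The code on the original slide did not survive extraction, so the class name `MultiplyGate` and the argument name `dz` (the upstream gradient dL/dz) are assumptions, not a transcription.

```python
class MultiplyGate:
    """Scalar gate z = x * y with a forward / backward API."""

    def forward(self, x, y):
        # Cache the inputs: the backward pass needs them.
        self.x = x
        self.y = y
        return x * y

    def backward(self, dz):
        # Chain rule: local gradient times upstream gradient.
        dx = self.y * dz   # dz/dx = y
        dy = self.x * dz   # dz/dy = x
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)
dx, dy = gate.backward(2.0)
print(z, dx, dy)  # -12.0 -8.0 6.0
```

Caching the inputs in `forward` is the design choice that makes `backward` cheap, and it is also why reverse mode needs memory for every intermediate value.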

Example: Caffe layers Caffe is licensed under BSD 2-Clause 42 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Caffe Sigmoid Layer: the backward pass multiplies the local sigmoid gradient by top_diff (chain rule) Caffe is licensed under BSD 2-Clause Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
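The same forward / backward structure can be sketched for a sigmoid layer in NumPy. This is not Caffe's C++ implementation: the class `SigmoidLayer` is hypothetical, and only the names `bottom`, `top`, and `top_diff` follow Caffe's naming convention for a layer's input, output, and upstream gradient.

```python
import numpy as np

class SigmoidLayer:
    """Elementwise sigmoid with a cached output for the backward pass."""

    def forward(self, bottom):
        # top = 1 / (1 + exp(-bottom))
        self.top = 1.0 / (1.0 + np.exp(-bottom))
        return self.top

    def backward(self, top_diff):
        # Local gradient of the sigmoid is top * (1 - top);
        # multiply by top_diff (chain rule) to get the gradient for bottom.
        return top_diff * self.top * (1.0 - self.top)

layer = SigmoidLayer()
top = layer.forward(np.array([0.0, 2.0, -2.0]))
bottom_diff = layer.backward(np.ones(3))
```

Because the sigmoid's derivative can be written in terms of its output, the layer only needs to cache `top`, not `bottom`.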

(C) Dhruv Batra and Zsolt Kira 44 Figure Credit: Andrea Vedaldi
