

SLIDE 1

Optimization and Backpropagation

1 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 2

Lecture 3 Recap

2 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 3

Neural Network

  • Linear score function f = Wx

On CIFAR-10 / On ImageNet

Credit: Li/Karpathy/Johnson

3 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 4

Neural Network

  • Linear score function f = Wx
  • A neural network is a nesting of 'functions' (a quick code sketch follows after this slide)

– 2 layers: f = W_2 max(0, W_1 x)
– 3 layers: f = W_3 max(0, W_2 max(0, W_1 x))
– 4 layers: f = W_4 tanh(W_3 max(0, W_2 max(0, W_1 x)))
– 5 layers: f = W_5 σ(W_4 tanh(W_3 max(0, W_2 max(0, W_1 x))))
– … up to hundreds of layers

4 I2DL: Prof. Niessner, Prof. Leal-Taixé
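As a quick illustration of the 2-layer case above, here is a minimal NumPy sketch of the score function f = W_2 max(0, W_1 x). The layer sizes and random values are made up for illustration and are not from the slides.

```python
import numpy as np

def two_layer_forward(W1, W2, x):
    """Score function f = W2 * max(0, W1 * x): linear, ReLU, linear."""
    h = np.maximum(0, W1 @ x)   # hidden activations
    return W2 @ h

# Tiny example with made-up shapes: 4 inputs, 3 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))
W2 = rng.standard_normal((2, 3))
x = rng.standard_normal(4)
print(two_layer_forward(W1, W2, x))
```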

SLIDE 5

Neural Network

[Figure: network diagram with input layer, hidden layer, output layer]

Credit: Li/Karpathy/Johnson

5 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 6

Neural Network

[Figure: deeper network with input layer, hidden layers 1-3, output layer; depth = number of layers, width = neurons per layer]

6 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 7

Activation Functions

Sigmoid: σ(x) = 1 / (1 + e^(−x))
tanh: tanh(x)
ReLU: max(0, x)
Leaky ReLU: max(0.1x, x)
Parametric ReLU: max(αx, x)
Maxout: max(w_1ᵀx + b_1, w_2ᵀx + b_2)
ELU: f(x) = x if x > 0, α(eˣ − 1) if x ≤ 0

7 I2DL: Prof. Niessner, Prof. Leal-Taixé
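For reference, most of these activations are one-liners in NumPy. A minimal sketch; the α parameters for PReLU and ELU below are arbitrary example values, not values from the slides.

```python
import numpy as np

def sigmoid(x):       return 1.0 / (1.0 + np.exp(-x))
def relu(x):          return np.maximum(0.0, x)
def leaky_relu(x):    return np.maximum(0.1 * x, x)
def prelu(x, a=0.25): return np.maximum(a * x, x)              # a is learned in practice
def elu(x, a=1.0):    return np.where(x > 0, x, a * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), sigmoid(x), np.tanh(x))
```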

SLIDE 8

Loss Functions

  • Measure the goodness of the predictions (or equivalently, the network's performance)

  • Regression loss
    – L1 loss: L(y, ŷ; θ) = (1/n) Σ_i |y_i − ŷ_i|
    – MSE loss: L(y, ŷ; θ) = (1/n) Σ_i ‖y_i − ŷ_i‖²₂

  • Classification loss (for multi-class classification)
    – Cross-entropy loss: E(y, ŷ; θ) = − Σ_i Σ_k ( y_{i,k} · log ŷ_{i,k} )

8 I2DL: Prof. Niessner, Prof. Leal-Taixé
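A compact NumPy sketch of these losses. The cross-entropy version assumes one-hot targets and already-normalized class probabilities; that is an assumption of this sketch, not something stated on the slide.

```python
import numpy as np

def l1_loss(y, y_hat):
    return np.mean(np.abs(y - y_hat))        # (1/n) * sum |y_i - ŷ_i|

def mse_loss(y, y_hat):
    return np.mean((y - y_hat) ** 2)         # (1/n) * sum (y_i - ŷ_i)^2

def cross_entropy(y_onehot, y_hat_probs, eps=1e-12):
    # y_onehot: (n, K) one-hot targets, y_hat_probs: (n, K) predicted probabilities
    return -np.sum(y_onehot * np.log(y_hat_probs + eps))

y, y_hat = np.array([1.0, 0.0]), np.array([0.8, 0.3])
print(l1_loss(y, y_hat), mse_loss(y, y_hat))
print(cross_entropy(np.array([[0.0, 1.0]]), np.array([[0.2, 0.8]])))
```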

SLIDE 9

Computational Graphs

  • A neural network is a computational graph

– It has compute nodes
– It has edges that connect nodes
– It is directional
– It is organized in 'layers'

9 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 10

Backprop

10 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 11

The Importance of Gradients

  • Our optimization schemes are based on computing gradients
  • One can compute gradients analytically, but what if our function is too complex?
  • Break down the gradient computation

Backpropagation

∇_θ L(θ)

Rumelhart 1986

11 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 12

Backprop: Forward Pass

  • f(x, y, z) = (x + y) · z

Initialization: x = 1, y = −3, z = 4
sum node:  d = x + y = −2
mult node: f = d · z = −8

12 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 13

Backprop: Backward Pass

with x = 1, y = −3, z = 4
sum: d = x + y = −2,  mult: f = d · z = −8

f(x, y, z) = (x + y) · z
d = x + y   →  ∂d/∂x = 1, ∂d/∂y = 1
f = d · z   →  ∂f/∂d = z, ∂f/∂z = d

What is ∂f/∂x, ∂f/∂y, ∂f/∂z?

13 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 14

Backprop: Backward Pass

Same setup. Start at the output: ∂f/∂f = 1.

14 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 15

Backprop: Backward Pass

Same setup. ∂f/∂z = d = −2.

15 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 16

Backprop: Backward Pass

Same setup. ∂f/∂d = z = 4.

16 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 17

Backprop: Backward Pass

Same setup. Chain rule: ∂f/∂y = ∂f/∂d · ∂d/∂y  →  ∂f/∂y = 4 · 1 = 4.

17 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 18

Backprop: Backward Pass

Same setup. Chain rule: ∂f/∂x = ∂f/∂d · ∂d/∂x  →  ∂f/∂x = 4 · 1 = 4.

18 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 19

Compute Graphs -> Neural Networks

  • x_k : input variables
  • w_{l,m,n} : network weights (note the 3 indices)
    – l : which layer
    – m : which neuron in the layer
    – n : which weight of the neuron
  • ŷ_i : computed output (i = 1 … n_out output dimensions)
  • y_i : ground-truth targets
  • L : loss function

19 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 20

Compute Graphs -> Neural Networks

[Figure: compute graph. Inputs x_0, x_1 are multiplied by the weights w_0, w_1 (unknowns!) and summed to give the output ŷ_0; the target y_0 (e.g., class label / regression target) is subtracted and the result squared: an L2 loss function giving the loss/cost]

20 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 21

Compute Graphs -> Neural Networks

[Figure: same compute graph. Inputs x_0, x_1, weights w_0, w_1 (unknowns!), sum, then a ReLU activation max(0, x), then the L2 loss function against the target y_0]

We want to compute gradients w.r.t. all weights W

(btw. I'm not arguing this is the right choice here)

21 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 22

Compute Graphs -> Neural Networks

[Figure: compute graph with three outputs ŷ_0, ŷ_1, ŷ_2. Inputs x_0, x_1; weights w_{0,0}, w_{0,1}, w_{1,0}, w_{1,1}, w_{2,0}, w_{2,1}; for each output a sum, a subtraction of the target y_0, y_1, y_2, and a square, giving one loss/cost per output]

We want to compute gradients w.r.t. all weights W

22 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 23

Compute Graphs -> Neural Networks

[Figure: input layer x_0 … x_k, output layer ŷ_0, ŷ_1, …, with targets y_0, y_1, …]

ŷ_i = A(b_i + Σ_k x_k · w_{i,k})        (A: activation function, b_i: bias)

Goal: we want to compute gradients of the loss function L w.r.t. all weights W (AND all biases b)

L: sum over the loss per sample, e.g. L2 loss  →  simply sum up the squares:
L = Σ_i L_i,   L_i = (ŷ_i − y_i)²

∂L_i/∂w_{i,k} = ∂L_i/∂ŷ_i · ∂ŷ_i/∂w_{i,k}   →  use the chain rule to compute the partials

23 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 24

NNs as Computational Graphs

  • We can express any kind of function as a computational graph, e.g.
    f(w, x) = 1 / (1 + e^(−(w_0·x_0 + w_1·x_1 + b)))

[Figure: graph with nodes w_0·x_0 and w_1·x_1, a sum, addition of b, then ·(−1), exp(·), +1, and finally 1/(·)]

Sigmoid function: σ(x) = 1 / (1 + e^(−x))

24 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 25

NNs as Computational Graphs

  • f(w, x) = 1 / (1 + e^(−(w_0·x_0 + w_1·x_1 + b)))

Forward pass with w_0 = 2, x_0 = −1, w_1 = −3, x_1 = −2, b = −3:
w_0·x_0 = −2,  w_1·x_1 = 6,  sum = 4,  + b = 1,  ·(−1) = −1,  exp = 0.37,  +1 = 1.37,  1/(·) = 0.73

25 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 26

NNs as Computational Graphs

Local derivative rules used in the backward pass:
f(x) = 1/x      ⇒  ∂f/∂x = −1/x²
f_a(x) = a + x  ⇒  ∂f/∂x = 1
f(x) = eˣ       ⇒  ∂f/∂x = eˣ
f_a(x) = a·x    ⇒  ∂f/∂x = a

The backward pass starts with gradient 1 at the output.
1/(·) node:  1 · (−1/1.37²) = −0.53

26 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 27

NNs as Computational Graphs

+1 node:  −0.53 · 1 = −0.53

27 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 28

NNs as Computational Graphs

exp(·) node:  −0.53 · e^(−1) = −0.2

28 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 29

NNs as Computational Graphs

·(−1) node:  −0.2 · (−1) = 0.2

29 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 30

NNs as Computational Graphs

The add nodes pass the gradient 0.2 through unchanged; the multiply nodes swap their inputs:
∂f/∂w_0 = 0.2 · x_0 = −0.2,   ∂f/∂x_0 = 0.2 · w_0 = 0.4
∂f/∂w_1 = 0.2 · x_1 = −0.4,   ∂f/∂x_1 = 0.2 · w_1 = −0.6
∂f/∂b = 0.2

30 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 31

Gradient Descent

31 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 32

Gradient Descent

[Figure: loss surface with an initialization point and the optimum]

x* = arg min f(x)

32 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 33

Gradient Descent

  • From derivative to gradient:  df(x)/dx  →  ∇_x f(x)
  • Gradient steps in the direction of the negative gradient

∇_x f(x) : direction of greatest increase of the function

x' = x − α ∇_x f(x)        (α: learning rate)

33 I2DL: Prof. Niessner, Prof. Leal-Taixé
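A minimal sketch of the update rule x' = x − α ∇f(x) on a toy 1-D function; both the function f(x) = x² and the learning rate below are illustrative choices, not values from the slides:

```python
def grad_f(x):
    return 2.0 * x          # derivative of f(x) = x**2

x, alpha = 5.0, 0.1         # initialization and learning rate
for _ in range(50):
    x = x - alpha * grad_f(x)   # x' = x - alpha * grad f(x)
print(x)                    # close to the optimum x* = 0
```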

SLIDE 34

Gradient Descent for Neural Networks

[Figure: input layer, hidden layers 1-3, output layer; n neurons per layer, m layers]

34 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 35

Gradient Descent for Neural Networks

For a given training pair {x, y}, we want to update all weights, i.e., we need to compute the derivatives w.r.t. all weights:

∇_W f_{x,y}(W) = [ ∂f/∂w_{0,0,0}  …  ∂f/∂w_{l,m,n} ]

Gradient step:
W' = W − α ∇_W f_{x,y}(W)

[Figure: input layer, hidden layers 1-3, output layer; m layers, n neurons]

35 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 36

NNs can Become Quite Complex…

  • These graphs can be huge!

[Szegedy et al., CVPR'15] Going Deeper with Convolutions

36 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 37

The Flow of the Gradients

  • Many, many, many, many of these nodes form a neural network
  • Each one has its own work to do

[Figure: neurons, forward and backward pass]

37 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 38

The Flow of the Gradients

[Figure: a single compute node f with inputs x, y and output z = f(x, y); activations flow forward, gradients flow backward]

∂L/∂x = ∂L/∂z · ∂z/∂x        (upstream gradient ∂L/∂z, local gradients ∂z/∂x, ∂z/∂y)

38 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 39

Gradient Descent for Neural Networks

[Figure: network with inputs x_0, x_1, x_2, hidden units h_0 … h_3, outputs ŷ_0, ŷ_1, targets y_0, y_1]

h_j = A(b_{0,j} + Σ_k x_k · w_{0,j,k})
ŷ_i = A(b_{1,i} + Σ_j h_j · w_{1,i,j})

Loss function: L_i = (ŷ_i − y_i)²
Just a simple activation: A(x) = max(0, x)

39 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 40

Gradient Descent for Neural Networks

[Figure: same network with inputs x_0, x_1, x_2, hidden units h_0 … h_3, outputs ŷ_0, ŷ_1, targets y_0, y_1]

ŷ_i = A(b_{1,i} + Σ_j h_j · w_{1,i,j}),   h_j = A(b_{0,j} + Σ_k x_k · w_{0,j,k}),   L_i = (ŷ_i − y_i)²

Backpropagation: just go through it layer by layer

∂L_i/∂w_{1,i,j} = ∂L_i/∂ŷ_i · ∂ŷ_i/∂w_{1,i,j}
∂L_i/∂w_{0,j,k} = ∂L_i/∂ŷ_i · ∂ŷ_i/∂h_j · ∂h_j/∂w_{0,j,k}

∂L_i/∂ŷ_i = 2(ŷ_i − y_i)
∂ŷ_i/∂w_{1,i,j} = h_j if the pre-activation > 0, else 0

…

40 I2DL: Prof. Niessner, Prof. Leal-Taixé
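A small NumPy sketch of this layer-by-layer backward pass for the 3-4-2 network above. The input values are made up; the gradient formulas follow the slide, with a boolean ReLU mask playing the role of the "if the pre-activation > 0" case:

```python
import numpy as np

rng = np.random.default_rng(0)
x  = rng.standard_normal(3)                            # inputs x_k
W0 = rng.standard_normal((4, 3)); b0 = np.zeros(4)     # w_{0,j,k}, b_{0,j}
W1 = rng.standard_normal((2, 4)); b1 = np.zeros(2)     # w_{1,i,j}, b_{1,i}
y  = rng.standard_normal(2)                            # targets y_i

# Forward: h_j = A(b_{0,j} + sum_k x_k w_{0,j,k}),  ŷ_i = A(b_{1,i} + sum_j h_j w_{1,i,j})
s0 = W0 @ x + b0;  h     = np.maximum(0, s0)
s1 = W1 @ h + b1;  y_hat = np.maximum(0, s1)
L  = np.sum((y_hat - y) ** 2)

# Backward, layer by layer
dL_dyhat = 2 * (y_hat - y)            # ∂L_i/∂ŷ_i
dL_ds1   = dL_dyhat * (s1 > 0)        # ReLU: gradient passes only where pre-activation > 0
dL_dW1   = np.outer(dL_ds1, h)        # ∂L/∂w_{1,i,j}
dL_dh    = W1.T @ dL_ds1
dL_ds0   = dL_dh * (s0 > 0)
dL_dW0   = np.outer(dL_ds0, x)        # ∂L/∂w_{0,j,k}
print(L, dL_dW1.shape, dL_dW0.shape)
```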

SLIDE 41

Gradient Descent for Neural Networks

[Figure: same network with inputs x_0, x_1, x_2, hidden units h_0 … h_3, outputs ŷ_0, ŷ_1, targets y_0, y_1]

ŷ_i = A(b_{1,i} + Σ_j h_j · w_{1,i,j}),   h_j = A(b_{0,j} + Σ_k x_k · w_{0,j,k}),   L_i = (ŷ_i − y_i)²

How many unknown weights?
  • Output layer: 2 · 4 + 2
  • Hidden layer: 4 · 3 + 4

#neurons · #input channels + #biases. Note that some activations also have weights.

41 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 42

Derivatives of Cross Entropy Loss

[Figure: network with inputs x_0, x_1, x_2, hidden units h_0 … h_3, outputs ŷ_0, ŷ_1, targets y_0, y_1]

Binary cross-entropy loss:
L = − Σ_{i=1}^{n_out} ( y_i · log ŷ_i + (1 − y_i) · log(1 − ŷ_i) )

ŷ_i = 1 / (1 + e^(−s_i)),   s_i = Σ_j h_j · w_{j,i}      (s_i: output scores)

Gradients of the weights of the last layer:
∂L_i/∂w_{j,i} = ∂L_i/∂ŷ_i · ∂ŷ_i/∂s_i · ∂s_i/∂w_{j,i}

∂L_i/∂ŷ_i = −y_i/ŷ_i + (1 − y_i)/(1 − ŷ_i) = (ŷ_i − y_i) / (ŷ_i·(1 − ŷ_i))
∂ŷ_i/∂s_i = ŷ_i·(1 − ŷ_i),   ∂s_i/∂w_{j,i} = h_j

⟹  ∂L_i/∂s_i = ŷ_i − y_i   and   ∂L_i/∂w_{j,i} = (ŷ_i − y_i) · h_j

42 I2DL: Prof. Niessner, Prof. Leal-Taixé
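The compact result ∂L_i/∂w_{j,i} = (ŷ_i − y_i) · h_j is easy to check numerically; a minimal sketch with made-up values, comparing the analytic gradient against central finite differences:

```python
import numpy as np

def loss(W, h, y):
    s = h @ W                                  # scores s_i = sum_j h_j * w_{j,i}
    y_hat = 1.0 / (1.0 + np.exp(-s))           # sigmoid
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

rng = np.random.default_rng(0)
h = rng.standard_normal(4)                     # hidden activations h_j
W = rng.standard_normal((4, 2))                # last-layer weights w_{j,i}
y = np.array([1.0, 0.0])                       # binary targets

y_hat = 1.0 / (1.0 + np.exp(-(h @ W)))
analytic = np.outer(h, y_hat - y)              # (ŷ_i − y_i) · h_j

numeric = np.zeros_like(W); eps = 1e-6
for idx in np.ndindex(*W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += eps; Wm[idx] -= eps
    numeric[idx] = (loss(Wp, h, y) - loss(Wm, h, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # True
```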

SLIDE 43

Derivatives of Cross Entropy Loss

Gradients of the weights of the first layer:

∂L/∂h_j = Σ_{i=1}^{n_out} ∂L/∂ŷ_i · ∂ŷ_i/∂s_i · ∂s_i/∂h_j
        = Σ_{i=1}^{n_out} ∂L/∂ŷ_i · ŷ_i(1 − ŷ_i) · w_{j,i}
        = Σ_{i=1}^{n_out} (ŷ_i − y_i) · w_{j,i}

∂L/∂s_j⁽¹⁾ = Σ_{i=1}^{n_out} ∂L/∂s_i · ∂s_i/∂h_j · ∂h_j/∂s_j⁽¹⁾
           = Σ_{i=1}^{n_out} (ŷ_i − y_i) · w_{j,i} · h_j(1 − h_j)

∂L/∂w_{k,j}⁽¹⁾ = Σ_{i=1}^{n_out} ∂L/∂s_j⁽¹⁾ · ∂s_j⁽¹⁾/∂w_{k,j}⁽¹⁾
              = Σ_{i=1}^{n_out} (ŷ_i − y_i) · w_{j,i} · h_j(1 − h_j) · x_k

43 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 44

Back to Compute Graphs & NNs

  • Inputs x and targets y
  • Two-layer NN for regression with ReLU activation
  • Function we want to optimize:

Σ_{i=1}^n ( w_2 · max(0, w_1 · x_i) − y_i )²

[Figure: compute graph. x · w_1 = z, then the ReLU output σ = max(0, z), then σ · w_2 = ŷ, compared with the target y in the loss L]

44 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 45

Gradient Descent for Neural Networks

Initialize x = 1, y = 0, w_1 = 1/3, w_2 = 2

L(y, ŷ; θ) = (1/n) Σ_i ‖ŷ_i − y_i‖²₂

In our case n = d = 1:  L = (ŷ − y)²  ⇒  ∂L/∂ŷ = 2(ŷ − y)
ŷ = w_2 · σ  ⇒  ∂ŷ/∂w_2 = σ

Forward pass: z = x · w_1 = 1/3,  σ = max(0, z) = 1/3,  ŷ = w_2 · σ = 2/3,  L = (2/3 − 0)² = 4/9

Backpropagation: ∂L/∂w_2 = ∂L/∂ŷ · ∂ŷ/∂w_2

45 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 46

Gradient Descent for Neural Networks

Same setup. ∂L/∂ŷ = 2 · 2/3 = 4/3

46 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 47

Gradient Descent for Neural Networks

Same setup. ∂L/∂w_2 = ∂L/∂ŷ · ∂ŷ/∂w_2 = 4/3 · 1/3 = 4/9

47 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 48

Gradient Descent for Neural Networks

Same setup. Now for w_1:

Backpropagation: ∂L/∂w_1 = ∂L/∂ŷ · ∂ŷ/∂σ · ∂σ/∂z · ∂z/∂w_1

ŷ = w_2 · σ     ⇒  ∂ŷ/∂σ = w_2
σ = max(0, z)   ⇒  ∂σ/∂z = 1 if z > 0, else 0
z = x · w_1     ⇒  ∂z/∂w_1 = x

48 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 49

Gradient Descent for Neural Networks

Same setup. ∂L/∂ŷ = 2 · 2/3 = 4/3

49 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 50

Gradient Descent for Neural Networks

Same setup. ∂L/∂σ = 4/3 · w_2 = 4/3 · 2 = 8/3

50 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 51

Gradient Descent for Neural Networks

Same setup. ∂L/∂z = 8/3 · 1 = 8/3    (the ReLU passes the gradient, since z = 1/3 > 0)

51 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 52

Gradient Descent for Neural Networks

Same setup. ∂L/∂w_1 = 8/3 · x = 8/3 · 1 = 8/3

52 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 53

Gradient Descent for Neural Networks

  • Function we want to optimize:  f(x, w) = Σ_{i=1}^n ( w_2 · max(0, w_1 · x_i) − y_i )²
  • Computed gradients w.r.t. the weights w_1 and w_2
  • Now: update the weights

w' = w − α · ∇_w f  =  [w_1, w_2] − α · [∂f/∂w_1, ∂f/∂w_2]  =  [1/3, 2] − α · [8/3, 4/9]

But: how do we choose a good learning rate α?

53 I2DL: Prof. Niessner, Prof. Leal-Taixé
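The worked example of slides 45-53 in a few lines of Python; the learning rate below is an arbitrary placeholder, since how to choose it is exactly the open question on the slide:

```python
# Two-layer ReLU regression example from the slides: x = 1, y = 0, w1 = 1/3, w2 = 2
x, y = 1.0, 0.0
w1, w2 = 1.0 / 3.0, 2.0

# Forward pass
z     = x * w1               # 1/3
sigma = max(0.0, z)          # 1/3 (ReLU)
y_hat = w2 * sigma           # 2/3
L     = (y_hat - y) ** 2     # 4/9

# Backward pass
dL_dyhat = 2.0 * (y_hat - y)                    # 4/3
dL_dw2   = dL_dyhat * sigma                     # 4/9
dL_dsig  = dL_dyhat * w2                        # 8/3
dL_dz    = dL_dsig * (1.0 if z > 0 else 0.0)    # 8/3
dL_dw1   = dL_dz * x                            # 8/3

# Gradient step (alpha chosen arbitrarily here)
alpha = 0.1
w1, w2 = w1 - alpha * dL_dw1, w2 - alpha * dL_dw2
print(L, dL_dw1, dL_dw2, w1, w2)
```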

SLIDE 54

Gradient Descent

  • How to pick a good learning rate?
  • How to compute the gradient for a single training pair?
  • How to compute the gradient for a large training set?
  • How to speed things up? More to see in the next lectures…

54 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 55

Regularization

55 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 56

Recap: Basic Recipe for ML

  • Split your data

[Figure: data split into train 60%, validation 20%, test 20%; the validation set is used to find your hyperparameters]

Other splits are also possible (e.g., 80%/10%/10%)

56 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 57

Over- and Underfitting

[Figure: three fits, labeled underfitted, appropriate, overfitted]

Source: Deep Learning by Adam Gibson, Josh Patterson, O'Reilly Media Inc., 2017

57 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 58

Training a Neural Network

  • Training / validation curve

[Figure: training vs. validation error curves, annotated where the training error is too high and where the generalization gap is too big]

How can we prevent our model from overfitting? Regularization

Credits: Deep Learning, Goodfellow et al.

58 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 59

Regularization

  • Loss function: L(y, ŷ; θ) = Σ_{i=1}^n (ŷ_i − y_i)² + λ·R(θ)
    (add a regularization term to the loss function)

  • Regularization techniques
    – L2 regularization
    – L1 regularization
    – Max norm regularization
    – Dropout
    – Early stopping
    – ...

More details later

59 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 60

Regularization: Example

  • Input: 3 features x = [1, 2, 1]
  • Two linear classifiers that give the same result:
    – θ_1 = [0, 0.75, 0]          (ignores 2 features)
    – θ_2 = [0.25, 0.5, 0.25]     (takes information from all features)

60 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 61

Regularization: Example

  • Loss: L(y, ŷ; θ) = Σ_i (x_i·θ_i − y_i)² + λ·R(θ)
  • L2 regularization: R(θ) = Σ_j θ_j²

x = [1, 2, 1],  θ_1 = [0, 0.75, 0],  θ_2 = [0.25, 0.5, 0.25]

θ_1:  0 + 0.75² + 0 = 0.5625
θ_2:  0.25² + 0.5² + 0.25² = 0.375    ← minimization prefers θ_2

61 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 62

Regularization: Example

  • Loss: L(y, ŷ; θ) = Σ_i (x_i·θ_i − y_i)² + λ·R(θ)
  • L1 regularization: R(θ) = Σ_j |θ_j|

x = [1, 2, 1],  θ_1 = [0, 0.75, 0],  θ_2 = [0.25, 0.5, 0.25]

θ_1:  0 + 0.75 + 0 = 0.75    ← minimization prefers θ_1
θ_2:  0.25 + 0.5 + 0.25 = 1

62 I2DL: Prof. Niessner, Prof. Leal-Taixé
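A tiny numeric check of the example on slides 60-62; the weight vectors and input are taken from the slides:

```python
import numpy as np

theta1 = np.array([0.0, 0.75, 0.0])
theta2 = np.array([0.25, 0.5, 0.25])
x = np.array([1.0, 2.0, 1.0])

print(x @ theta1, x @ theta2)                          # 1.5 1.5   -> same prediction
print(np.sum(theta1**2), np.sum(theta2**2))            # 0.5625 0.375 -> L2 prefers theta2
print(np.sum(np.abs(theta1)), np.sum(np.abs(theta2)))  # 0.75 1.0     -> L1 prefers theta1
```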

SLIDE 63

Regularization: Example

  • Input: 3 features x = [1, 2, 1]
  • Two linear classifiers that give the same result:
    – θ_1 = [0, 0.75, 0]          (ignores 2 features)
    – θ_2 = [0.25, 0.5, 0.25]     (takes information from all features)

63 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 64

Regularization: Example

  • Input: 3 features x = [1, 2, 1]
  • Two linear classifiers that give the same result:
    – θ_1 = [0, 0.75, 0]          (L1 regularization enforces sparsity)
    – θ_2 = [0.25, 0.5, 0.25]     (takes information from all features)

64 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 65

Regularization: Example

  • Input: 3 features x = [1, 2, 1]
  • Two linear classifiers that give the same result:
    – θ_1 = [0, 0.75, 0]          (L1 regularization enforces sparsity)
    – θ_2 = [0.25, 0.5, 0.25]     (L2 regularization enforces that the weights have similar values)

65 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 66

Regularization: Effect

  • A dog classifier takes different inputs: furry, has two eyes, has a tail, has paws, has two ears

L1 regularization will focus all the attention on a few key features

66 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 67

Regularization: Effect

  • A dog classifier takes different inputs: furry, has two eyes, has a tail, has paws, has two ears

L2 regularization will take all information into account to make decisions

67 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 68

Regularization for Neural Networks

Combining nodes: network output + L2 loss + regularization

[Figure: compute graph. x·w_1, ReLU max(0,·), ·w_2, L2 loss against y, plus a regularization node λ·R(w_1, w_2) added to the loss L]

Σ_{i=1}^n ( w_2 · max(0, w_1 · x_i) − y_i )² + λ·R(w_1, w_2)

68 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 69

Regularization for Neural Networks

Combining nodes: network output + L2 loss + regularization

Σ_{i=1}^n ( w_2 · max(0, w_1 · x_i) − y_i )² + λ·‖[w_1, w_2]‖²₂

69 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 70

Regularization for Neural Networks

Combining nodes: network output + L2 loss + regularization

Σ_{i=1}^n ( w_2 · max(0, w_1 · x_i) − y_i )² + λ·(w_1² + w_2²)

70 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 71

Regularization

[Figure: decision boundaries for increasing regularization strength, λ = 0, λ = 0.00001, λ = 0.001, λ = 1, λ = 10]

What is the goal of regularization? What happens to the training error?

Credits: University of Washington

71 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 72

Regularization

  • Any strategy that aims to lower the validation error, possibly at the cost of increasing the training error

72 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 73

Next Lecture

  • This week:
    – Check exercises
    – Check office hours ☺

  • Next lecture:
    – Optimization of Neural Networks
    – In particular, introduction to SGD (our main method!)

73 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 74

See you next week ☺

74 I2DL: Prof. Niessner, Prof. Leal-Taixé

SLIDE 75

Further Reading

  • Backpropagation
    – Chapter 6.5 (6.5.1 - 6.5.3) in http://www.deeplearningbook.org/contents/mlp.html
    – Chapter 5.3 in Bishop, Pattern Recognition and Machine Learning
    – http://cs231n.github.io/optimization-2/

  • Regularization
    – Chapter 7.1 (esp. 7.1.1 & 7.1.2) in http://www.deeplearningbook.org/contents/regularization.html
    – Chapter 5.5 in Bishop, Pattern Recognition and Machine Learning

75 I2DL: Prof. Niessner, Prof. Leal-Taixé