Neural Networks: Backpropagation (Machine Learning) - PowerPoint PPT Presentation



SLIDE 1

Machine Learning

Neural Networks: Backpropagation

Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

SLIDE 2

This lecture

  • What is a neural network?
  • Predicting with a neural network
  • Training neural networks
    – Backpropagation
  • Practical concerns

SLIDE 3

Training a neural network

  • Given
    – A network architecture (layout of neurons, their connectivity and activations)
    – A dataset of labeled examples: S = {(x_i, y_i)}
  • The goal: Learn the weights of the neural network
  • Remember: For a fixed architecture, a neural network is a function parameterized by its weights
    – Prediction: y = NN(x, w)

SLIDE 4

Recall: Learning as loss minimization

We have a classifier NN that is completely defined by its weights. Learn the weights by minimizing a loss L, perhaps with a regularizer:

min_w Σ_i L(NN(x_i, w), y_i)
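
For concreteness, the objective with an explicit regularizer can be written as below; the squared-L2 penalty shown is only one common choice, not something these slides commit to:

    \min_{w} \sum_{i} L\big(NN(x_i, w),\, y_i\big) + \lambda R(w),
    \qquad \text{e.g. } R(w) = \tfrac{1}{2}\lVert w \rVert_2^2,\ \lambda \ge 0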

So far, we saw that this strategy worked for:

  1. Logistic Regression
  2. Support Vector Machines
  3. Perceptron
  4. LMS regression

All of these are linear models. The same idea works for non-linear models too!

Each of these minimizes a different loss function.

SLIDE 5

Back to our running example

(Figure: the running example network, with inputs x_1 and x_2, hidden units z_1 and z_2, and output y)

Given an input x, how is the output predicted?

y = w_01^o + w_11^o z_1 + w_21^o z_2
z_2 = σ(w_02^h + w_12^h x_1 + w_22^h x_2)
z_1 = σ(w_01^h + w_11^h x_1 + w_21^h x_2)

SLIDE 6

Back to our running example

Given an input x, how is the output predicted?

y = w_01^o + w_11^o z_1 + w_21^o z_2
z_2 = σ(w_02^h + w_12^h x_1 + w_22^h x_2)
z_1 = σ(w_01^h + w_11^h x_1 + w_21^h x_2)

Suppose the true label for this example is a number y_i. We can write the square loss for this example as:

L = ½ (y − y_i)²
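
Below is a minimal Python sketch of this forward pass and squared loss. The function and variable names, the weight values, and the example input and label are made up for illustration; sigma is the logistic function used on the slides.

    import math

    def sigma(s):
        # logistic activation
        return 1.0 / (1.0 + math.exp(-s))

    def forward(x1, x2, w_h, w_o):
        # w_h[j] = (w_0j^h, w_1j^h, w_2j^h); w_o = (w_01^o, w_11^o, w_21^o)
        z1 = sigma(w_h[0][0] + w_h[0][1] * x1 + w_h[0][2] * x2)
        z2 = sigma(w_h[1][0] + w_h[1][1] * x1 + w_h[1][2] * x2)
        y = w_o[0] + w_o[1] * z1 + w_o[2] * z2
        return y, z1, z2

    # made-up weights, input, and label, purely for illustration
    w_h = [(0.1, -0.2, 0.3), (0.0, 0.5, -0.4)]
    w_o = (0.2, 0.7, -0.1)
    y, z1, z2 = forward(1.0, 2.0, w_h, w_o)
    loss = 0.5 * (y - 1.5) ** 2   # L = 1/2 (y - y_i)^2 with true label y_i = 1.5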

SLIDE 7

Learning as loss minimization

We have a classifier NN that is completely defined by its weights. Learn the weights by minimizing a loss L, perhaps with a regularizer:

min_w Σ_i L(NN(x_i, w), y_i)

How do we solve the optimization problem?

SLIDE 8

Stochastic gradient descent

Goal: min_w Σ_i L(NN(x_i, w), y_i)

Given a training set S = {(x_i, y_i)}, x ∈ ℝ^d:

  1. Initialize parameters w
  2. For epoch = 1 … T:
     1. Shuffle the training set
     2. For each training example (x_i, y_i) ∈ S:
        • Treat this example as the entire dataset
        • Compute the gradient of the loss: ∇L(NN(x_i, w), y_i)
        • Update: w ← w − γ_t ∇L(NN(x_i, w), y_i)
  3. Return w

γ_t: learning rate; many tweaks possible. The objective is not convex, so initialization can be important.

Have we solved everything?
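
A minimal Python sketch of this loop. The function grad_loss and the other names here are stand-ins (the gradient could come, for example, from backpropagation, covered next), and the constant learning rate is just the simplest choice of γ_t:

    import random

    def sgd(train, w, grad_loss, lr=0.1, epochs=10):
        # train: list of (x, y) pairs; w: list of parameters
        # grad_loss(x, y, w): gradient of the per-example loss w.r.t. w
        for epoch in range(epochs):
            random.shuffle(train)        # 1. shuffle the training set
            for x, y in train:           # 2. one example at a time
                g = grad_loss(x, y, w)   # gradient on this single example
                w = [wi - lr * gi for wi, gi in zip(w, g)]  # w <- w - gamma_t * grad
        return w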

SLIDE 15

The derivative of the loss function?

If the neural network is a differentiable function, we can find the gradient
  – Or maybe its sub-gradient
  – This is decided by the activation functions and the loss function

It was easy for SVMs and logistic regression
  – Only one layer

But how do we find the sub-gradient of a more complex function?
  – E.g., a recent paper used a ~150 layer neural network for image classification!

We need an efficient algorithm to compute ∇L(NN(x_i, w), y_i): Backpropagation


SLIDE 18

Checkpoint

Where are we?

  • If we have a neural network (structure, activations, and weights), we can make a prediction for an input.
  • If we have the true label of the input, then we can define the loss for that example.
  • If we can take the derivative of the loss with respect to each of the weights, we can take a gradient step in SGD.

Questions?


SLIDE 22

Reminder: Chain rule for derivatives

  – If z is a function of y, and y is a function of x
      • Then z is a function of x as well
  – Question: how do we find dz/dx?

Slide courtesy Richard Socher

SLIDE 23

Reminder: Chain rule for derivatives

  – If z = (a function of y_1) + (a function of y_2), and the y_i's are functions of x
      • Then z is a function of x as well
  – Question: how do we find dz/dx?

Slide courtesy Richard Socher

SLIDE 24

Reminder: Chain rule for derivatives

  – If z is a sum of functions of the y_i's, and the y_i's are functions of x
      • Then z is a function of x as well
  – Question: how do we find dz/dx?

Slide courtesy Richard Socher
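
Written out, the simple case and the sum case these reminders build toward are the usual chain rule and its sum form:

    \frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}
    \qquad\text{and}\qquad
    \frac{dz}{dx} = \sum_i \frac{\partial z}{\partial y_i}\,\frac{dy_i}{dx}
    \quad\text{when } z = \sum_i f_i(y_i) \text{ and each } y_i \text{ is a function of } x.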

SLIDE 25

Backpropagation

L = ½ (y − y*)²
y = w_01^o + w_11^o z_1 + w_21^o z_2
z_2 = σ(w_02^h + w_12^h x_1 + w_22^h x_2)
z_1 = σ(w_01^h + w_11^h x_1 + w_21^h x_2)

We want to compute ∂L/∂w_ij^o and ∂L/∂w_ij^h

Applying the chain rule to compute the gradient (and remembering partial computations along the way to speed things up)

SLIDE 28

Output layer

L = ½ (y − y*)²
y = w_01^o + w_11^o z_1 + w_21^o z_2

∂L/∂w_01^o = (∂L/∂y) · (∂y/∂w_01^o)

∂L/∂y = y − y*
∂y/∂w_01^o = 1
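
In code this derivative is just the two factors multiplied together; the values below are stand-ins for whatever a forward pass and the label produced:

    y, y_star = 0.8, 1.0          # stand-in network output and true label
    dL_dy = y - y_star            # dL/dy for L = 1/2 (y - y*)^2
    dy_dw01_o = 1.0               # the bias weight w_01^o enters y directly
    dL_dw01_o = dL_dy * dy_dw01_o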

SLIDE 32

Output layer

L = ½ (y − y*)²
y = w_01^o + w_11^o z_1 + w_21^o z_2

∂L/∂w_11^o = (∂L/∂y) · (∂y/∂w_11^o)

∂L/∂y = y − y*
∂y/∂w_11^o = z_1

We have already computed ∂L/∂y for the previous case. Cache it to speed things up!
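
The same pattern for w_11^o, again with stand-in values; note that dL_dy is reused rather than recomputed:

    y, y_star, z1 = 0.8, 1.0, 0.6   # stand-in output, label, and hidden activation
    dL_dy = y - y_star              # cached: identical to the previous case
    dL_dw11_o = dL_dy * z1          # dy/dw_11^o = z_1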

SLIDE 37

Hidden layer derivatives

L = ½ (y − y*)²
y = w_01^o + w_11^o z_1 + w_21^o z_2
z_2 = σ(w_02^h + w_12^h x_1 + w_22^h x_2)
z_1 = σ(w_01^h + w_11^h x_1 + w_21^h x_2)

We want ∂L/∂w_22^h:

∂L/∂w_22^h = (∂L/∂y) · (∂y/∂w_22^h)
           = (∂L/∂y) · ∂/∂w_22^h (w_01^o + w_11^o z_1 + w_21^o z_2)
           = (∂L/∂y) · (w_11^o ∂z_1/∂w_22^h + w_21^o ∂z_2/∂w_22^h)
           = (∂L/∂y) · w_21^o · (∂z_2/∂w_22^h)              [z_1 is not a function of w_22^h]
           = (∂L/∂y) · w_21^o · (∂z_2/∂s) · (∂s/∂w_22^h)    [call the argument of σ in z_2 "s"]

Each of these partial derivatives is easy:

∂L/∂y = y − y*
∂z_2/∂s = z_2 (1 − z_2)      (because z_2 = σ(s) is the logistic function we have already seen)
∂s/∂w_22^h = x_2

More important: we have already computed many of these partial derivatives, because we are proceeding from top to bottom (i.e. backwards).
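
And the hidden-layer derivative from this slide, with stand-in values; the cached dL_dy plus the forward-pass quantities are all that is needed:

    y, y_star = 0.8, 1.0            # stand-in output and label
    z2, x2, w21_o = 0.7, 2.0, -0.1  # stand-in activation, input, and output weight
    dL_dy = y - y_star              # cached from the output layer
    dz2_ds = z2 * (1.0 - z2)        # derivative of the logistic function
    ds_dw22_h = x2                  # s = w_02^h + w_12^h x_1 + w_22^h x_2
    dL_dw22_h = dL_dy * w21_o * dz2_ds * ds_dw22_h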

SLIDE 53

The Backpropagation Algorithm

The same algorithm works for multiple layers: repeated application of the chain rule for partial derivatives.

  – First perform a forward pass from the inputs to the output
  – Compute the loss
  – From the loss, proceed backwards to compute partial derivatives using the chain rule
  – Cache partial derivatives as you compute them
      • They will be used for lower layers
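
Putting the pieces together for the running two-input, two-hidden-unit example, here is a sketch of one forward pass plus a backward pass that caches dL/dy and the logistic derivatives, following the derivatives worked out above. The weight layout, names, and example values are my own, not from the slides:

    import math

    def sigma(s):
        return 1.0 / (1.0 + math.exp(-s))

    def backprop(x, y_star, w_h, w_o):
        # w_h[j] = [w_0j^h, w_1j^h, w_2j^h]; w_o = [w_01^o, w_11^o, w_21^o]
        # forward pass
        z = [sigma(wj[0] + wj[1] * x[0] + wj[2] * x[1]) for wj in w_h]
        y = w_o[0] + w_o[1] * z[0] + w_o[2] * z[1]
        loss = 0.5 * (y - y_star) ** 2
        # backward pass: compute dL/dy once and reuse it everywhere below
        dL_dy = y - y_star
        g_o = [dL_dy * 1.0, dL_dy * z[0], dL_dy * z[1]]   # output-layer gradients
        g_h = []
        for j in range(2):                                # hidden unit j
            dz_ds = z[j] * (1.0 - z[j])                   # logistic derivative
            delta = dL_dy * w_o[j + 1] * dz_ds            # shared cached factor
            g_h.append([delta * 1.0, delta * x[0], delta * x[1]])  # bias, x1, x2
        return loss, g_o, g_h

    # made-up weights and example, purely for illustration
    w_h = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]]
    w_o = [0.2, 0.7, -0.1]
    loss, g_o, g_h = backprop([1.0, 2.0], 1.5, w_h, w_o)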

SLIDE 54

Mechanizing learning

  • Backpropagation gives you the gradient that will be used for gradient descent
    – SGD gives us a generic learning algorithm
    – Backpropagation is a generic method for computing partial derivatives
  • A recursive algorithm that proceeds from the top of the network to the bottom
  • Modern neural network libraries implement automatic differentiation using backpropagation
    – Allows easy exploration of network architectures
    – Don't have to keep deriving the gradients by hand each time
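
As an illustration of the last point, a sketch (assuming the PyTorch library is available; the names and values are mine) of how an automatic-differentiation library computes the same gradients without deriving anything by hand:

    import torch

    x = torch.tensor([1.0, 2.0])
    y_star = torch.tensor(1.5)
    W_h = torch.randn(2, 3, requires_grad=True)  # hidden weights, bias in column 0
    w_o = torch.randn(3, requires_grad=True)     # output weights, bias first

    z = torch.sigmoid(W_h[:, 0] + W_h[:, 1] * x[0] + W_h[:, 2] * x[1])
    y = w_o[0] + w_o[1] * z[0] + w_o[2] * z[1]
    loss = 0.5 * (y - y_star) ** 2

    loss.backward()             # backpropagation, done by the library
    print(W_h.grad, w_o.grad)   # dL/dW_h and dL/dw_o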

SLIDE 55

Stochastic gradient descent

Goal: min_w Σ_i L(NN(x_i, w), y_i)

Given a training set S = {(x_i, y_i)}, x ∈ ℝ^d:

  1. Initialize parameters w
  2. For epoch = 1 … T:
     1. Shuffle the training set
     2. For each training example (x_i, y_i) ∈ S:
        • Treat this example as the entire dataset
        • Compute the gradient of the loss ∇L(NN(x_i, w), y_i) using backpropagation
        • Update: w ← w − γ_t ∇L(NN(x_i, w), y_i)
  3. Return w

γ_t: learning rate; many tweaks possible. The objective is not convex, so initialization can be important.