CZECH TECHNICAL UNIVERSITY IN PRAGUE
Faculty of Electrical Engineering
Department of Cybernetics

Neural Networks

Petr Pošík
Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Cybernetics

P. Pošík © 2017, Artificial Intelligence – 1 / 32
Introduction and Rehearsal

Outline:
■ Intro: Notation • Multiple regression • Logistic regression • Gradient descent • Ex: Grad. for MR • Ex: Grad. for LR • Relations to NN
■ Multilayer FFN
■ Gradient Descent
■ Regularization
■ Other NNs
■ Summary
Notation

In supervised learning, we work with
■ an observation described by a vector $x = (x_1, \dots, x_D)$,
■ the corresponding true value of the dependent variable $y$, and
■ the prediction of a model $\hat{y} = f_w(x)$, where the model parameters are in vector $w$.
■ Very often, we use homogeneous coordinates and matrix notation, and represent the whole training data set as $T = (X, y)$, where

    $X = \begin{pmatrix} x^{(1)} & 1 \\ \vdots & \vdots \\ x^{(|T|)} & 1 \end{pmatrix}$, and $y = \begin{pmatrix} y^{(1)} \\ \vdots \\ y^{(|T|)} \end{pmatrix}$.

Learning then amounts to finding such model parameters $w^*$ which minimize a certain loss (or energy) function:

    $w^* = \arg\min_w J(w, T)$
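The homogeneous-coordinate convention above (a constant 1 appended to every observation so that the bias becomes an ordinary weight) can be sketched in plain Python; the data values below are made up for illustration:

```python
# Build the design matrix X in homogeneous coordinates: each row is an
# observation x^(i) with a constant 1 appended, so the bias term is
# handled by an ordinary weight w_{D+1}.

def design_matrix(observations):
    """Append a homogeneous coordinate 1 to every observation vector."""
    return [list(x) + [1.0] for x in observations]

observations = [(2.0, 3.0), (1.0, 0.5), (4.0, 1.0)]  # three 2-D points
X = design_matrix(observations)
# X == [[2.0, 3.0, 1.0], [1.0, 0.5, 1.0], [4.0, 1.0, 1.0]]
```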
Multiple linear regression

Multiple linear regression model:

    $\hat{y} = f_w(x) = w_1 x_1 + w_2 x_2 + \dots + w_D x_D = x w^T$

The minimum of

    $J_{MSE}(w) = \frac{1}{|T|} \sum_{i=1}^{|T|} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$

is given by

    $w^* = (X^T X)^{-1} X^T y$,

or found by numerical optimization.

Multiple regression as a linear neuron:
[Figure: inputs $x_1, x_2, x_3$ weighted by $w$ feed a single linear unit that outputs $\hat{y}$.]
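A minimal sketch of the closed-form solution, in plain Python (no numpy): rather than forming the inverse explicitly, we solve the equivalent linear system $(X^T X)\,w = X^T y$ by Gaussian elimination. The data are made up, generated noise-free from $y = 2x_1 - x_2 + 0.5$ so the recovered weights are known:

```python
# Fit multiple linear regression via the normal equation
# w* = (X^T X)^{-1} X^T y, solved as the system (X^T X) w = X^T y.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Homogeneous coordinates: last column of 1s plays the role of the bias.
X = [[1.0, 1.0, 1.0], [2.0, 0.0, 1.0], [0.0, 3.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.5, 4.5, -2.5, 0.5]          # generated from y = 2*x1 - x2 + 0.5

XtX = matmul(transpose(X), X)
Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in transpose(X)]
w = solve(XtX, Xty)                # recovers approximately [2.0, -1.0, 0.5]
```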
Logistic regression

Logistic regression model:

    $\hat{y} = f_w(x) = g(x w^T)$,

where

    $g(z) = \frac{1}{1 + e^{-z}}$

is the sigmoid (a.k.a. logistic) function.

■ There is no explicit equation for the optimal weights.
■ The only option is to find the optimum numerically, usually by some form of gradient descent.

Logistic regression as a non-linear neuron:
[Figure: inputs $x_1, x_2, x_3$ weighted by $w$ feed a unit computing $g(x w^T)$, which outputs $\hat{y}$.]
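The sigmoid is easy to sketch directly from its definition; note that it squashes any real input into $(0, 1)$, so the output can be read as a class-1 probability with $g(0) = 0.5$ at the decision boundary:

```python
import math

# The sigmoid (logistic) function g(z) = 1 / (1 + exp(-z)).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# g maps the whole real line into the open interval (0, 1):
assert sigmoid(0.0) == 0.5
assert 0.0 < sigmoid(-10.0) < 0.5 < sigmoid(10.0) < 1.0
```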
Gradient descent algorithm

■ Given a function $J(w)$ that should be minimized,
■ start with a guess of $w$, and change it so that $J(w)$ decreases, i.e.,
■ update our current guess of $w$ by taking a step in the direction opposite to the gradient:

    $w \leftarrow w - \eta \nabla J(w)$, i.e.,
    $w_d \leftarrow w_d - \eta \frac{\partial}{\partial w_d} J(w)$,

where all $w_d$ are updated simultaneously and $\eta$ is a learning rate (step size).
■ For cost functions given as a sum across the training examples,

    $J(w) = \sum_{i=1}^{|T|} E(w, x^{(i)}, y^{(i)})$,

we can concentrate on a single training example, because

    $\frac{\partial}{\partial w_d} J(w) = \sum_{i=1}^{|T|} \frac{\partial}{\partial w_d} E(w, x^{(i)}, y^{(i)})$,

and we can drop the indices over the training data set: $E = E(w, x, y)$.
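The update rule above can be sketched as a short loop. The objective here, $J(w) = (w_1 - 3)^2 + (w_2 + 1)^2$ with its hand-derived gradient, and the choices of $\eta$ and the iteration count, are illustrative assumptions, not from the slides:

```python
# Generic gradient descent: repeatedly step against the gradient.

def grad_J(w):
    # Gradient of J(w) = (w1 - 3)^2 + (w2 + 1)^2, minimized at [3, -1].
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

def gradient_descent(grad, w, eta=0.1, steps=100):
    for _ in range(steps):
        g = grad(w)
        # Update all coordinates simultaneously: w_d <- w_d - eta * dJ/dw_d
        w = [wd - eta * gd for wd, gd in zip(w, g)]
    return w

w_star = gradient_descent(grad_J, [0.0, 0.0])  # converges toward [3, -1]
```

With a learning rate that is too large the iterates diverge; too small and convergence is needlessly slow, which is why $\eta$ is treated as a tunable hyperparameter.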
Example: Gradient for multiple regression and squared loss

[Figure: linear neuron with inputs $x_1, x_2, x_3$, weights $w$, and output $\hat{y}$.]

Assuming the squared error loss

    $E(w, x, y) = \frac{1}{2}(y - \hat{y})^2 = \frac{1}{2}(y - x w^T)^2$,

we can compute the derivatives using the chain rule as

    $\frac{\partial E}{\partial w_d} = \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_d}$, where

    $\frac{\partial E}{\partial \hat{y}} = \frac{\partial}{\partial \hat{y}} \frac{1}{2}(y - \hat{y})^2 = -(y - \hat{y})$, and

    $\frac{\partial \hat{y}}{\partial w_d} = \frac{\partial (x w^T)}{\partial w_d} = x_d$,

and thus

    $\frac{\partial E}{\partial w_d} = \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_d} = -(y - \hat{y})\, x_d$.
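The derived gradient $-(y - \hat{y})\,x_d$ can be sanity-checked against a numerical finite-difference derivative; the example numbers below are made up:

```python
# Analytic gradient of the squared loss E = 0.5 * (y - x.w)^2 for one
# training example, checked against central finite differences.

def predict(w, x):
    return sum(wd * xd for wd, xd in zip(w, x))

def loss(w, x, y):
    return 0.5 * (y - predict(w, x)) ** 2

def grad(w, x, y):
    err = y - predict(w, x)          # (y - yhat)
    return [-err * xd for xd in x]   # dE/dw_d = -(y - yhat) * x_d

w, x, y = [0.5, -0.2, 0.1], [1.0, 2.0, 3.0], 1.0
analytic = grad(w, x, y)

eps = 1e-6
numeric = []
for d in range(len(w)):
    wp = w[:]; wp[d] += eps
    wm = w[:]; wm[d] -= eps
    numeric.append((loss(wp, x, y) - loss(wm, x, y)) / (2 * eps))
# analytic and numeric agree to within finite-difference error
```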
Example: Gradient for logistic regression and cross-entropy loss

[Figure: non-linear neuron with inputs $x_1, x_2, x_3$, activation $a = x w^T$, and output $\hat{y} = g(a)$.]

Nonlinear activation function:

    $g(a) = \frac{1}{1 + e^{-a}}$

Note that $g'(a) = g(a)(1 - g(a))$.

Assuming the cross-entropy loss

    $E(w, x, y) = -y \log \hat{y} - (1 - y) \log(1 - \hat{y})$, where $\hat{y} = g(a) = g(x w^T)$,

we can compute the derivatives using the chain rule as

    $\frac{\partial E}{\partial w_d} = \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial a} \frac{\partial a}{\partial w_d}$, where

    $\frac{\partial E}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} = -\frac{y - \hat{y}}{\hat{y}(1 - \hat{y})}$,

    $\frac{\partial \hat{y}}{\partial a} = \hat{y}(1 - \hat{y})$, and $\frac{\partial a}{\partial w_d} = \frac{\partial (x w^T)}{\partial w_d} = x_d$,

and thus

    $\frac{\partial E}{\partial w_d} = \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial a} \frac{\partial a}{\partial w_d} = -(y - \hat{y})\, x_d$.
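The remarkable outcome above, that the cross-entropy gradient of the sigmoid unit collapses to the same $-(y - \hat{y})\,x_d$ form as the linear neuron with squared loss, can also be verified numerically. The example numbers are made up:

```python
import math

# Analytic gradient of the cross-entropy loss for a sigmoid unit,
# checked against central finite differences.

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def cross_entropy(w, x, y):
    yhat = sigmoid(sum(wd * xd for wd, xd in zip(w, x)))
    return -y * math.log(yhat) - (1 - y) * math.log(1 - yhat)

def grad(w, x, y):
    yhat = sigmoid(sum(wd * xd for wd, xd in zip(w, x)))
    return [-(y - yhat) * xd for xd in x]   # same form as the linear neuron

w, x, y = [0.2, -0.4, 0.1], [1.0, 0.5, 1.0], 1.0
analytic = grad(w, x, y)

eps = 1e-6
numeric = []
for d in range(len(w)):
    wp = w[:]; wp[d] += eps
    wm = w[:]; wm[d] -= eps
    numeric.append((cross_entropy(wp, x, y) - cross_entropy(wm, x, y)) / (2 * eps))
# analytic and numeric agree to within finite-difference error
```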
Relations to neural networks

■ Above, we derived training algorithms (based on gradient descent) for a linear regression model and a linear classification model.
■ Note the similarity with the perceptron algorithm ("just add a certain part of a misclassified training example to the weight vector").
■ Units like those above are used as building blocks for more complex/flexible models!