Chapter 10: Artificial Neural Networks Dr. Xudong Liu Assistant - PowerPoint PPT Presentation

Chapter 10: Artificial Neural Networks
Dr. Xudong Liu
Assistant Professor
School of Computing
University of North Florida
Monday, 9/30/2019

Overview
1. Artificial neuron: linear threshold unit (LTU)
2. Perceptron
3. Multi-Layer Perceptron (MLP), Deep Neural Networks (DNN)
4. Learning ANNs: Error Backpropagation

Differential Calculus
In this course, I shall stick to Leibniz's notation.
The derivative of a function at an input point, when it exists, is the slope of the tangent line to the graph of the function at that point. Let $y = f(x) = x^2 + 2x + 1$. Then $f'(x) = \frac{dy}{dx} = 2x + 2$.
A partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant. Let $z = f(x, y) = x^2 + xy + y^2$. Then $\frac{\partial z}{\partial x} = 2x + y$.
The chain rule is a formula for computing the derivative of the composition of two or more functions. Let $z = f(y)$ and $y = g(x)$. Then $\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = f'(y) \cdot g'(x)$.
A nice property of the sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$: $f'(x) = f(x)\,(1 - f(x))$.
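The derivative examples and the sigmoid identity can be checked numerically with a central finite difference. The sketch below is a minimal illustration. The helper names `numeric_derivative`, `poly`, `z_of` and `composed` are hypothetical. So are the step size `h` and the choice $g(x) = 3x$ in the chain-rule example.

```python
import math
from typing import Callable


def sigmoid(x: float) -> float:
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def numeric_derivative(g: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of dg/dx at x."""
    return (g(x + h) - g(x - h)) / (2 * h)


def poly(x: float) -> float:
    """y = x^2 + 2x + 1, whose derivative is 2x + 2."""
    return x**2 + 2 * x + 1


def z_of(x: float, y: float) -> float:
    """z = x^2 + xy + y^2, whose partial derivative w.r.t. x is 2x + y."""
    return x**2 + x * y + y**2


def composed(x: float) -> float:
    """z = sigmoid(y) with the hypothetical y = g(x) = 3x."""
    return sigmoid(3 * x)


x0, y0 = 3.0, 5.0
print(numeric_derivative(poly, x0), 2 * x0 + 2)
# Hold y fixed at y0 and differentiate with respect to x only.
print(numeric_derivative(lambda x: z_of(x, y0), x0), 2 * x0 + y0)
# Chain rule: dz/dx = f'(y) * g'(x) = f(y) * (1 - f(y)) * 3, with y = 3 * 0.4.
s = sigmoid(3 * 0.4)
print(numeric_derivative(composed, 0.4), s * (1 - s) * 3)
```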

Artificial Neuron
An artificial neuron, also called a linear threshold unit (LTU), was proposed by McCulloch and Pitts in 1943. It takes one or more numeric inputs, computes a weighted sum of them, applies an activation function to that sum, and outputs the result.
Common activation functions: the step function and the sigmoid function.
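A single artificial neuron can be sketched in a few lines of Python. This is a minimal illustration that assumes the convention step(z) = 1 for z ≥ 0. The function names and the example inputs and weights are hypothetical. No bias is included here, matching the plain "weighted sum, then activation" description.

```python
import math
from typing import Callable, Sequence


def step(z: float) -> float:
    """Step activation: 1 when z >= 0 (assumed convention), else 0."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z: float) -> float:
    """Sigmoid activation 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x: Sequence[float], w: Sequence[float],
           activation: Callable[[float], float]) -> float:
    """Weighted sum of the inputs, passed through the activation function."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return activation(z)


x = [1.0, 0.5, -2.0]   # hypothetical inputs
w = [0.4, 1.0, 0.3]    # hypothetical weights; z = 0.4 + 0.5 - 0.6 = 0.3
print(neuron(x, w, step), neuron(x, w, sigmoid))
```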

Linear Threshold Unit (LTU)
An LTU whose activation function is the step function computes the weighted sum $z = \sum_i w_i x_i$ of its inputs and outputs $\mathrm{step}(z)$.

Perceptrons
A perceptron, introduced by Rosenblatt in 1957, is composed of two layers of neurons. The input layer consists of special passthrough neurons, which simply output their input, and the output layer consists of LTUs. A bias neuron, which always outputs 1, is added to the input layer so that each LTU can compute a full linear function with an offset, not only one passing through the origin.
Rosenblatt proved that, if the training examples are linearly separable, a perceptron can always be trained to correctly classify all training examples.
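The convergence result above is stated without the learning rule itself. The sketch below uses the classic perceptron learning rule $w_i \leftarrow w_i + \eta\,(y - \hat{y})\,x_i$, with the bias neuron's weight stored as `w[0]`. It is an illustration under stated assumptions, not necessarily the exact formulation used in the course. The assumptions are learning rate 1, zero initialization, the step(z) = 1 for z ≥ 0 convention, and the linearly separable AND and OR truth tables as toy data.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]


def step(z: float) -> int:
    """Step activation: 1 if z >= 0 (assumed convention), else 0."""
    return 1 if z >= 0 else 0


def predict(w: Sequence[float], x: Sequence[int]) -> int:
    # w[0] is the weight from the bias neuron, which always outputs 1.
    return step(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_perceptron(data: List[Example], eta: float = 1.0,
                     max_epochs: int = 100) -> List[float]:
    n = len(data[0][0])
    w = [0.0] * (n + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            y_hat = predict(w, x)
            if y_hat != y:
                errors += 1
                w[0] += eta * (y - y_hat)          # bias input is 1
                for i, xi in enumerate(x):
                    w[i + 1] += eta * (y - y_hat) * xi
        if errors == 0:  # every training example is classified correctly
            break
    return w


and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_data)
print(w, [predict(w, x) for x, _ in and_data])
```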

Perceptrons
For instance, perceptrons can implement logical conjunction, disjunction and negation. Consider a perceptron of one LTU that uses the step function as its activation. It computes $y = \mathrm{step}(w_1 x_1 + w_2 x_2 - \theta)$, where $\mathrm{step}(z) = 1$ if $z \ge 0$ and $0$ otherwise.
$x_1 \land x_2$: $w_1 = w_2 = 1$, $\theta = 2$
$x_1 \lor x_2$: $w_1 = w_2 = 1$, $\theta = 0.5$
$\lnot x_1$: $w_1 = -0.6$, $w_2 = 0$, $\theta = -0.5$
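The three gates can be checked by enumerating their truth tables. This sketch assumes the LTU computes $y = \mathrm{step}(w_1 x_1 + w_2 x_2 - \theta)$ with step(z) = 1 for z ≥ 0. Under that convention, the thresholds that realize AND, OR and NOT are θ = 2, θ = 0.5 and θ = −0.5. The function names are hypothetical.

```python
from typing import List, Tuple


def step(z: float) -> int:
    """Step activation: 1 when z >= 0 (assumed convention), else 0."""
    return 1 if z >= 0 else 0


def ltu(x1: int, x2: int, w1: float, w2: float, theta: float) -> int:
    """y = step(w1*x1 + w2*x2 - theta)."""
    return step(w1 * x1 + w2 * x2 - theta)


inputs: List[Tuple[int, int]] = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_out = [ltu(a, b, 1, 1, 2) for a, b in inputs]
or_out = [ltu(a, b, 1, 1, 0.5) for a, b in inputs]
not_out = [ltu(a, 0, -0.6, 0, -0.5) for a in (0, 1)]  # x2 is ignored (w2 = 0)
print("AND", and_out)
print("OR ", or_out)
print("NOT", not_out)
```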

Perceptrons
However, perceptrons cannot solve some trivial problems that are not linearly separable, such as the exclusive OR (XOR) classification problem. This was shown by Minsky and Papert in 1969. It turned out that stacking multiple perceptrons lifts this limitation: a multi-layer network can solve problems that are not linearly separable, XOR included.
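Here is a minimal sketch of the stacking idea. Two hidden LTUs compute OR and AND, and an output LTU fires when OR holds but AND does not, which is exactly XOR. The weights and thresholds are hand-picked for illustration and are not from the slides. The convention y = step(Σ w_i x_i − θ) with step(z) = 1 for z ≥ 0 is assumed. A brute-force search over a small grid also finds no single LTU that reproduces XOR. That is consistent with, though not a proof of, Minsky and Papert's result.

```python
from itertools import product
from typing import Sequence


def step(z: float) -> int:
    """Step activation: 1 if z >= 0 (assumed convention), else 0."""
    return 1 if z >= 0 else 0


def ltu(xs: Sequence[float], ws: Sequence[float], theta: float) -> int:
    """LTU computing step(sum_i w_i x_i - theta)."""
    return step(sum(w * x for w, x in zip(ws, xs)) - theta)


def xor_mlp(x1: int, x2: int) -> int:
    h_or = ltu((x1, x2), (1, 1), 0.5)        # hidden unit 1: x1 OR x2
    h_and = ltu((x1, x2), (1, 1), 1.5)       # hidden unit 2: x1 AND x2
    return ltu((h_or, h_and), (1, -1), 0.5)  # output: OR but not AND


inputs = list(product((0, 1), repeat=2))
print([xor_mlp(a, b) for a, b in inputs])

# Brute-force search over (w1, w2, theta) for a single LTU that computes XOR.
grid = [k / 2 for k in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5
single_ltu_works = any(
    all(ltu((a, b), (w1, w2), t) == (a ^ b) for a, b in inputs)
    for w1, w2, t in product(grid, repeat=3)
)
print(single_ltu_works)
```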

Multi-Layer Perceptrons
A Multi-Layer Perceptron (MLP) is composed of one passthrough input layer, one or more layers of LTUs called hidden layers, and one final layer of LTUs called the output layer. Again, every layer except the output layer includes a bias neuron and is fully connected to the next layer. When an MLP has two or more hidden layers, it is called a deep neural network (DNN).
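A forward pass through a fully connected MLP with any number of hidden layers can be sketched as follows. This minimal illustration makes several assumptions: sigmoid units everywhere, and each bias written as a threshold subtracted from the weighted sum (equivalent to a bias neuron whose weight is −θ). The layer sizes `[3, 4, 4, 2]` and the uniform(−1, 1) initialization with a fixed seed are hypothetical choices. With two hidden layers, this network counts as a DNN by the definition above.

```python
import math
import random
from typing import List, Sequence, Tuple

# One layer = (weights[j][i] from unit i to unit j, thresholds[j]).
Layer = Tuple[List[List[float]], List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_mlp(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Randomly initialize an MLP; sizes = [inputs, hidden..., outputs]."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        thresholds = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, thresholds))
    return layers


def forward(layers: List[Layer], x: Sequence[float]) -> List[float]:
    """Feed x through the passthrough input layer and every layer of sigmoid units."""
    a = list(x)
    for weights, thresholds in layers:
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a)) - t)
             for row, t in zip(weights, thresholds)]
    return a


sizes = [3, 4, 4, 2]  # hypothetical: 3 inputs, two hidden layers of 4, 2 outputs
net = init_mlp(sizes)
out = forward(net, [0.5, -1.0, 2.0])
n_hidden = len(sizes) - 2
print(out, "DNN" if n_hidden >= 2 else "MLP with one hidden layer")
```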

Learning Multi-Layer Perceptrons: Error Backpropagation
Error backpropagation is so far the most successful learning algorithm for training MLPs (Learning Internal Representations by Error Propagation, Rumelhart, Hinton and Williams, 1986).
Idea: we start with a fixed network structure, then repeatedly update each edge weight $v$ using $v \leftarrow v + \Delta v$. Here $\Delta v = -\eta \frac{\partial E}{\partial v}$ is a gradient-descent step on some error (objective) function $E$, with learning rate $\eta$.
Before learning, let's take a look at a few possible activation functions and their derivatives.
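The update $v \leftarrow v + \Delta v$ with $\Delta v = -\eta\,\partial E/\partial v$ can be seen in isolation on a one-variable function. The sketch below reuses $x^2 + 2x + 1$ from the calculus slide as a stand-in error function, whose minimum is at $v = -1$. The starting point 3.0, the learning rate 0.1 and the 100 iterations are arbitrary choices.

```python
def f(v: float) -> float:
    """Stand-in error function E(v) = v^2 + 2v + 1, minimized at v = -1."""
    return v**2 + 2 * v + 1


def df(v: float) -> float:
    """dE/dv = 2v + 2."""
    return 2 * v + 2


v = 3.0    # arbitrary starting weight (assumption)
eta = 0.1  # learning rate (assumption)
for _ in range(100):
    delta_v = -eta * df(v)  # gradient-descent step
    v = v + delta_v         # v <- v + delta_v
print(v, f(v))
```

Each step multiplies the distance to the minimum, $v + 1$, by $1 - 2\eta = 0.8$. So the iterate approaches $-1$ geometrically.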

Error Backpropagation
Now I explain the error backpropagation algorithm for an MLP with $d$ passthrough input neurons, one hidden layer of $q$ LTUs, and an output layer of $l$ LTUs. I assume that the activation function $f$ at all $l + q$ LTUs is the sigmoid function.
Note that the bias neuron is gone from the picture: the bias is now embedded in each LTU's activation, $y_j = f\left(\sum_i w_i x_i - \theta_j\right)$.

Error Backpropagation
So our training set is $\{(x_1, y_1), \ldots, (x_m, y_m)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}^l$. We want an MLP with this architecture to fit the data set; namely, we want to compute the $l \cdot q + d \cdot q$ weights and the $l + q$ biases.
To predict on a new example $x$, we feed it to the input neurons and collect the outputs. The predicted class can be the $j$ whose output $\hat{y}_j$ has the highest score. If you want class probabilities, consider softmax.
An MLP can be used for regression too, for example with a single output neuron.
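The parameter count and the prediction step can be made concrete. In this sketch, the sizes d = 4, q = 5, l = 3 and the output scores are hypothetical. Softmax is applied directly to the l output scores for illustration. In practice it is often applied to the output layer's raw inputs rather than to sigmoid outputs.

```python
import math
from typing import List


def num_parameters(d: int, q: int, l: int) -> int:
    """Weights (d*q + q*l) plus biases (q + l) of a d-q-l MLP."""
    return d * q + q * l + q + l


def predict_class(scores: List[float]) -> int:
    """Index j of the output neuron with the highest score."""
    return max(range(len(scores)), key=lambda j: scores[j])


def softmax(scores: List[float]) -> List[float]:
    """Scores -> probabilities summing to 1 (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.2, 0.7, 0.4]  # hypothetical outputs of l = 3 output neurons
print(num_parameters(d=4, q=5, l=3))
print(predict_class(scores), softmax(scores))
```

Softmax preserves the ordering of the scores, so the argmax class is the same with or without it.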

Error Backpropagation
I use $\theta_j$ to mean the bias of the $j$-th neuron in the output layer, and $\gamma_h$ the bias of the $h$-th neuron in the hidden layer.
I use $\beta_j = \sum_{h=1}^{q} w_{hj} b_h$ to mean the input to the $j$-th neuron in the output layer, and $\alpha_h = \sum_{i=1}^{d} v_{ih} x_i$ the input to the $h$-th neuron in the hidden layer. Here $b_h = f(\alpha_h - \gamma_h)$ is the output of the $h$-th hidden neuron.
Take a training example $(x, y)$. I use $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_l)$ to mean the predicted output, where each $\hat{y}_j = f(\beta_j - \theta_j)$.
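The notation maps directly onto a forward pass. In this minimal sketch, $f$ is the sigmoid. The index conventions `v[i][h]` (input i to hidden h) and `w[h][j]` (hidden h to output j) are assumptions. The tiny 2-2-1 network's numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: Sequence[float], v: List[List[float]], gamma: List[float],
            w: List[List[float]], theta: List[float]
            ) -> Tuple[List[float], List[float], List[float], List[float]]:
    d, q, l = len(x), len(gamma), len(theta)
    # alpha_h = sum_i v_ih x_i: input to hidden neuron h
    alpha = [sum(v[i][h] * x[i] for i in range(d)) for h in range(q)]
    # b_h = f(alpha_h - gamma_h): output of hidden neuron h
    b = [sigmoid(alpha[h] - gamma[h]) for h in range(q)]
    # beta_j = sum_h w_hj b_h: input to output neuron j
    beta = [sum(w[h][j] * b[h] for h in range(q)) for j in range(l)]
    # y_hat_j = f(beta_j - theta_j): predicted output j
    y_hat = [sigmoid(beta[j] - theta[j]) for j in range(l)]
    return alpha, b, beta, y_hat


# Hypothetical 2-2-1 network (d = 2, q = 2, l = 1).
x = [1.0, 0.0]
v = [[0.5, -0.5], [0.25, 0.75]]  # v[i][h]
gamma = [0.0, 0.5]
w = [[1.0], [-1.0]]              # w[h][j]
theta = [0.0]
alpha, b, beta, y_hat = forward(x, v, gamma, w, theta)
print(alpha, b, beta, y_hat)
```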

Error Backpropagation
Error function (objective function): $E = \frac{1}{2} \sum_{j=1}^{l} (\hat{y}_j - y_j)^2$.
This error function is a composition of differentiable functions of all the weights and biases. So we can compute its gradient with respect to each of them and use gradient descent to update the weights and biases.
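A worked example of $E$ for one training example follows, with made-up values $\hat{y} = (0.8, 0.3)$ and $y = (1, 0)$. It also includes a finite-difference check that $\partial E / \partial \hat{y}_j = \hat{y}_j - y_j$, which is the reason for the factor $\frac{1}{2}$.

```python
from typing import Sequence


def squared_error(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """E = 1/2 * sum_j (y_hat_j - y_j)^2 for one training example."""
    return 0.5 * sum((yh - yt) ** 2 for yh, yt in zip(y_hat, y))


y_hat = [0.8, 0.3]  # hypothetical prediction
y = [1.0, 0.0]      # hypothetical target
E = squared_error(y_hat, y)  # 0.5 * (0.2^2 + 0.3^2) = 0.065

# Central finite difference for dE/dy_hat_1; expected y_hat_1 - y_1 = -0.2.
h = 1e-6
dE = (squared_error([0.8 + h, 0.3], y) - squared_error([0.8 - h, 0.3], y)) / (2 * h)
print(E, dE)
```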

The Error Backpropagation Algorithm
Given a training set $D = \{(x, y)\}$ and a learning rate $\eta$, we want to learn the weights and biases of the MLP.
1. Initialize all the weights and biases in the MLP with random values from the interval $[0, 1]$.
2. Repeat the following until some stopping criterion is met. For every $(x, y) \in D$:
   2.1. Calculate $\hat{y}$;
   2.2. Calculate the gradient-descent updates $\Delta w_{hj}$ and $\Delta \theta_j$ for the neurons in the output layer;
   2.3. Calculate the gradient-descent updates $\Delta v_{ih}$ and $\Delta \gamma_h$ for the neurons in the hidden layer;
   2.4. Update the weights $w_{hj}$ and $v_{ih}$, and the biases $\theta_j$ and $\gamma_h$.
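A compact sketch of the whole loop follows, under these assumptions:
- the stopping criterion is a fixed number of epochs;
- updates are per example;
- the sigmoid is used at every LTU;
- the learning rate is η = 0.5, with q = 3 hidden neurons;
- the AND truth table serves as a toy training set.

The updates are $\Delta w_{hj} = \eta g_j b_h$, $\Delta\theta_j = -\eta g_j$, $\Delta v_{ih} = \eta e_h x_i$ and $\Delta\gamma_h = -\eta e_h$. Here $g_j = (y_j - \hat{y}_j)\hat{y}_j(1 - \hat{y}_j)$ and $e_h = b_h(1 - b_h)\sum_j w_{hj} g_j$. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], Sequence[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, v, gamma, w, theta):
    """Hidden outputs b_h and predictions y_hat_j of a d-q-l sigmoid MLP."""
    q, l = len(gamma), len(theta)
    b = [sigmoid(sum(v[i][h] * x[i] for i in range(len(x))) - gamma[h]) for h in range(q)]
    y_hat = [sigmoid(sum(w[h][j] * b[h] for h in range(q)) - theta[j]) for j in range(l)]
    return b, y_hat


def total_error(data: List[Example], params) -> float:
    """Sum over the training set of E = 1/2 * sum_j (y_hat_j - y_j)^2."""
    total = 0.0
    for x, y in data:
        _, y_hat = forward(x, *params)
        total += 0.5 * sum((yh - yt) ** 2 for yh, yt in zip(y_hat, y))
    return total


def train_bp(data: List[Example], q: int, eta: float = 0.5,
             epochs: int = 2000, seed: int = 0):
    rng = random.Random(seed)
    d, l = len(data[0][0]), len(data[0][1])
    # Step 1: random initialization from [0, 1).
    v = [[rng.random() for _ in range(q)] for _ in range(d)]  # v[i][h]
    gamma = [rng.random() for _ in range(q)]
    w = [[rng.random() for _ in range(l)] for _ in range(q)]  # w[h][j]
    theta = [rng.random() for _ in range(l)]
    # Step 2: the stopping criterion is a fixed number of epochs (assumption).
    for _ in range(epochs):
        for x, y in data:
            b, y_hat = forward(x, v, gamma, w, theta)  # 2.1
            # 2.2: output-layer terms g_j
            g = [(y[j] - y_hat[j]) * y_hat[j] * (1 - y_hat[j]) for j in range(l)]
            # 2.3: hidden-layer terms e_h, computed with w before it is updated
            e = [b[h] * (1 - b[h]) * sum(w[h][j] * g[j] for j in range(l))
                 for h in range(q)]
            # 2.4: apply all updates
            for h in range(q):
                for j in range(l):
                    w[h][j] += eta * g[j] * b[h]
            for j in range(l):
                theta[j] -= eta * g[j]
            for i in range(d):
                for h in range(q):
                    v[i][h] += eta * e[h] * x[i]
            for h in range(q):
                gamma[h] -= eta * e[h]
    return v, gamma, w, theta


and_data = [((0, 0), (0,)), ((0, 1), (0,)), ((1, 0), (0,)), ((1, 1), (1,))]
initial_error = total_error(and_data, train_bp(and_data, q=3, epochs=0))
params = train_bp(and_data, q=3)
final_error = total_error(and_data, params)
print(initial_error, final_error)
print([round(forward(x, *params)[1][0], 2) for x, _ in and_data])
```

Calling `train_bp` with `epochs=0` returns the initial random parameters for the same seed. This makes it easy to compare the error before and after training.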

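The Error Backpropagation Algorithm: $\Delta w_{hj}$
1. $\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}}$
2. $\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial \hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial \beta_j} \cdot \frac{\partial \beta_j}{\partial w_{hj}}$
3. We know $\frac{\partial \beta_j}{\partial w_{hj}} = b_h$, for we have $\beta_j = \sum_{h=1}^{q} w_{hj} b_h$.
4. We know $\frac{\partial E}{\partial \hat{y}_j} = \hat{y}_j - y_j$, for we have $E = \frac{1}{2} \sum_{j=1}^{l} (\hat{y}_j - y_j)^2$.
5. We also know $\frac{\partial \hat{y}_j}{\partial \beta_j} = f'(\beta_j - \theta_j) = \hat{y}_j (1 - \hat{y}_j)$, for we know $f$ is the sigmoid.
6. Bullets 3, 4 and 5 together solve $\Delta w_{hj}$ in bullet 1: $\Delta w_{hj} = \eta\, g_j\, b_h$, where $g_j = (y_j - \hat{y}_j)\, \hat{y}_j (1 - \hat{y}_j)$.
7. Computing $\Delta \theta_j$ is similar. Since $\frac{\partial \hat{y}_j}{\partial \theta_j} = -\hat{y}_j (1 - \hat{y}_j)$, it gives $\Delta \theta_j = -\eta\, g_j$.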
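The closed-form output-layer updates $\Delta w_{hj} = \eta\, g_j b_h$ and $\Delta\theta_j = -\eta\, g_j$, with $g_j = (y_j - \hat{y}_j)\hat{y}_j(1 - \hat{y}_j)$, can be checked against $-\eta$ times a central finite difference of $E$. This sketch uses a hypothetical 3-4-2 network, random parameters from a fixed seed, and a made-up training example. The index conventions `v[i][h]` and `w[h][j]` are assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, v, gamma, w, theta):
    """Hidden outputs b_h and predictions y_hat_j of a d-q-l sigmoid MLP."""
    q, l = len(gamma), len(theta)
    b = [sigmoid(sum(v[i][h] * x[i] for i in range(len(x))) - gamma[h]) for h in range(q)]
    y_hat = [sigmoid(sum(w[h][j] * b[h] for h in range(q)) - theta[j]) for j in range(l)]
    return b, y_hat


def error(x, y, v, gamma, w, theta):
    _, y_hat = forward(x, v, gamma, w, theta)
    return 0.5 * sum((yh - yt) ** 2 for yh, yt in zip(y_hat, y))


# Hypothetical 3-4-2 network with random parameters and one training example.
rng = random.Random(1)
d, q, l = 3, 4, 2
v = [[rng.random() for _ in range(q)] for _ in range(d)]
gamma = [rng.random() for _ in range(q)]
w = [[rng.random() for _ in range(l)] for _ in range(q)]
theta = [rng.random() for _ in range(l)]
x, y, eta = [0.2, -0.4, 1.0], [1.0, 0.0], 0.1

# Closed-form output-layer updates.
b, y_hat = forward(x, v, gamma, w, theta)
g = [(y[j] - y_hat[j]) * y_hat[j] * (1 - y_hat[j]) for j in range(l)]
delta_w = [[eta * g[j] * b[h] for j in range(l)] for h in range(q)]
delta_theta = [-eta * g[j] for j in range(l)]


def numeric_delta_w(h: int, j: int, eps: float = 1e-6) -> float:
    """-eta * dE/dw_hj by central finite difference."""
    w_plus = [row[:] for row in w]
    w_minus = [row[:] for row in w]
    w_plus[h][j] += eps
    w_minus[h][j] -= eps
    diff = error(x, y, v, gamma, w_plus, theta) - error(x, y, v, gamma, w_minus, theta)
    return -eta * diff / (2 * eps)


def numeric_delta_theta(j: int, eps: float = 1e-6) -> float:
    """-eta * dE/dtheta_j by central finite difference."""
    t_plus, t_minus = theta[:], theta[:]
    t_plus[j] += eps
    t_minus[j] -= eps
    diff = error(x, y, v, gamma, w, t_plus) - error(x, y, v, gamma, w, t_minus)
    return -eta * diff / (2 * eps)


max_gap = max(
    [abs(delta_w[h][j] - numeric_delta_w(h, j)) for h in range(q) for j in range(l)]
    + [abs(delta_theta[j] - numeric_delta_theta(j)) for j in range(l)]
)
print(max_gap)
```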

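The Error Backpropagation Algorithm: $\Delta v_{ih}$
1. $\Delta v_{ih} = -\eta \frac{\partial E}{\partial v_{ih}}$
2. $\frac{\partial E}{\partial v_{ih}} = \frac{\partial E}{\partial b_h} \cdot \frac{\partial b_h}{\partial \alpha_h} \cdot \frac{\partial \alpha_h}{\partial v_{ih}}$
3. (Long story short) $\frac{\partial \alpha_h}{\partial v_{ih}} = x_i$. Since $b_h = f(\alpha_h - \gamma_h)$, $\frac{\partial b_h}{\partial \alpha_h} = b_h (1 - b_h)$. Finally, $\frac{\partial E}{\partial b_h} = -\sum_{j=1}^{l} w_{hj} g_j$.
4. $\Delta v_{ih} = \eta\, x_i\, b_h (1 - b_h) \sum_{j=1}^{l} w_{hj} g_j$, where $g_j = (y_j - \hat{y}_j)\, \hat{y}_j (1 - \hat{y}_j)$. Similarly, $\Delta \gamma_h = -\eta\, b_h (1 - b_h) \sum_{j=1}^{l} w_{hj} g_j$.
5. So we compute $\Delta w_{hj}$ and $\Delta \theta_j$ for the output layer first, then $\Delta v_{ih}$ and $\Delta \gamma_h$ for the hidden layer, reusing the output-layer terms $g_j$.
6. This is why it is called backpropagation: the error signal flows backward from the output layer to the hidden layer.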
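The hidden-layer updates can be checked the same way. They are $\Delta v_{ih} = \eta\, x_i\, b_h(1 - b_h)\sum_j w_{hj} g_j$ and $\Delta\gamma_h = -\eta\, b_h(1 - b_h)\sum_j w_{hj} g_j$, with $g_j = (y_j - \hat{y}_j)\hat{y}_j(1 - \hat{y}_j)$. Each is compared with $-\eta$ times a central finite difference of $E$. As before, the 3-4-2 network, the seed, the training example and the index conventions `v[i][h]` and `w[h][j]` are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, v, gamma, w, theta):
    """Hidden outputs b_h and predictions y_hat_j of a d-q-l sigmoid MLP."""
    q, l = len(gamma), len(theta)
    b = [sigmoid(sum(v[i][h] * x[i] for i in range(len(x))) - gamma[h]) for h in range(q)]
    y_hat = [sigmoid(sum(w[h][j] * b[h] for h in range(q)) - theta[j]) for j in range(l)]
    return b, y_hat


def error(x, y, v, gamma, w, theta):
    _, y_hat = forward(x, v, gamma, w, theta)
    return 0.5 * sum((yh - yt) ** 2 for yh, yt in zip(y_hat, y))


rng = random.Random(1)
d, q, l = 3, 4, 2
v = [[rng.random() for _ in range(q)] for _ in range(d)]
gamma = [rng.random() for _ in range(q)]
w = [[rng.random() for _ in range(l)] for _ in range(q)]
theta = [rng.random() for _ in range(l)]
x, y, eta = [0.2, -0.4, 1.0], [1.0, 0.0], 0.1

# Closed-form hidden-layer updates, reusing the output-layer terms g_j.
b, y_hat = forward(x, v, gamma, w, theta)
g = [(y[j] - y_hat[j]) * y_hat[j] * (1 - y_hat[j]) for j in range(l)]
e = [b[h] * (1 - b[h]) * sum(w[h][j] * g[j] for j in range(l)) for h in range(q)]
delta_v = [[eta * x[i] * e[h] for h in range(q)] for i in range(d)]
delta_gamma = [-eta * e[h] for h in range(q)]


def numeric_delta_v(i: int, h: int, eps: float = 1e-6) -> float:
    """-eta * dE/dv_ih by central finite difference."""
    v_plus = [row[:] for row in v]
    v_minus = [row[:] for row in v]
    v_plus[i][h] += eps
    v_minus[i][h] -= eps
    diff = error(x, y, v_plus, gamma, w, theta) - error(x, y, v_minus, gamma, w, theta)
    return -eta * diff / (2 * eps)


def numeric_delta_gamma(h: int, eps: float = 1e-6) -> float:
    """-eta * dE/dgamma_h by central finite difference."""
    g_plus, g_minus = gamma[:], gamma[:]
    g_plus[h] += eps
    g_minus[h] -= eps
    diff = error(x, y, v, g_plus, w, theta) - error(x, y, v, g_minus, w, theta)
    return -eta * diff / (2 * eps)


max_gap = max(
    [abs(delta_v[i][h] - numeric_delta_v(i, h)) for i in range(d) for h in range(q)]
    + [abs(delta_gamma[h] - numeric_delta_gamma(h)) for h in range(q)]
)
print(max_gap)
```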
