Neural Networks
(Reading: Kuncheva Section 2.5)
Introduction
Inspired by biology, but as used in pattern recognition research, neural networks have little relation to real neural systems (as studied in neurology and neuroscience).
Represents a function f : R^n → R^c, where n is the dimensionality of the input space and c that of the output space.
Outputs are discriminant functions: choose the class with the maximum discriminant value.
Can also be used for function approximation (e.g., to fit the sine function; see the Bishop text).
Training minimizes error on the outputs (i.e., maximizes the quality of the function approximation) over a training set, most often the squared error:
$$E = \frac{1}{2} \sum_{j=1}^{N} \sum_{i=1}^{c} \left\{ g_i(\mathbf{z}_j) - I\big(\omega_i, l(\mathbf{z}_j)\big) \right\}^2 \qquad (2.77)$$

where $I\big(\omega_i, l(\mathbf{z}_j)\big)$ is 1 if the label $l(\mathbf{z}_j)$ of $\mathbf{z}_j$ is $\omega_i$, and 0 otherwise.
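For concreteness, here is a minimal NumPy sketch of Eq. (2.77); the function name and array layout are my own choices, not from the text:

```python
import numpy as np

def squared_error(G, labels):
    """Eq. (2.77): G is an (N, c) array whose row j holds the network
    outputs g_i(z_j); labels[j] in {0, ..., c-1} encodes l(z_j)."""
    c = G.shape[1]
    T = np.eye(c)[labels]   # indicator I(omega_i, l(z_j)) as one-hot rows
    return 0.5 * np.sum((G - T) ** 2)
```

For example, squared_error(np.array([[0.9, 0.1]]), np.array([0])) gives 0.5 * (0.1^2 + 0.1^2) = 0.01.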
A set of interacting elements ('neurons' or nodes) maps input values to output values through a structured series of interactions.
Training data can alter NN behavior significantly.
A validation set is often used to decide when to stop training.
NNs can approximate any function to a specified precision.
Using Squared Error for Learning Classification Functions: for infinite data, the set of discriminant functions learned by a network approaches the true posterior probabilities for each class (for multi-layer perceptrons (MLPs) and radial basis function (RBF) networks):
Note: this result applies to any classifier that can approximate an arbitrary discriminant function with a specified precision; it is not specific to NNs.
$$\lim_{N \to \infty} g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}), \qquad \mathbf{x} \in \mathbb{R}^n \qquad (2.78)$$
Let $\mathbf{u} = [u_0, \ldots, u_q]^T \in \mathbb{R}^{q+1}$ be the input vector to the node and $v \in \mathbb{R}$ be its output. We call $\mathbf{w} = [w_0, \ldots, w_q]^T \in \mathbb{R}^{q+1}$ a vector of synaptic weights. The processing element implements the function

$$v = \phi(\xi); \qquad \xi = \sum_{i=0}^{q} w_i u_i \qquad (2.79)$$

where $\phi : \mathbb{R} \to \mathbb{R}$ is the activation function and $\xi$ is the net sum. Typical choices for $\phi$:
The NN processing unit.
The threshold function (2.80): $\phi(\xi) = \begin{cases} 1, & \text{if } \xi \ge 0, \\ 0, & \text{otherwise.} \end{cases}$

The sigmoid function: $\phi(\xi) = \dfrac{1}{1 + \exp(-\xi)}$, whose derivative has the convenient form $\phi'(\xi) = \phi(\xi)\,[1 - \phi(\xi)]$.

The identity function: $\phi(\xi) = \xi$ (used for input nodes).
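A short NumPy sketch of the processing unit (2.79) and these activation functions; the function names are mine:

```python
import numpy as np

def threshold(xi):
    """Threshold activation (2.80): 1 if xi >= 0, else 0."""
    return np.where(xi >= 0, 1.0, 0.0)

def sigmoid(xi):
    """Sigmoid activation: 1 / (1 + exp(-xi))."""
    return 1.0 / (1.0 + np.exp(-xi))

def sigmoid_derivative(xi):
    """The convenient form phi'(xi) = phi(xi) * [1 - phi(xi)]."""
    s = sigmoid(xi)
    return s * (1.0 - s)

def node_output(w, u, phi=sigmoid):
    """Processing element (2.79): v = phi(xi), xi = sum_{i=0}^{q} w_i * u_i."""
    return phi(np.dot(w, u))
```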
The weight $w_0$ acts as a bias: the corresponding input $u_0$ is fixed to 1. Equation (2.79) can be rewritten as

$$v = \phi[\zeta - (-w_0)] = \phi\left[\sum_{i=1}^{q} w_i u_i - (-w_0)\right] \qquad (2.83)$$

where $\zeta$ is now the weighted sum of the inputs from 1 to $q$. Geometrically, the equation

$$\sum_{i=1}^{q} w_i u_i - (-w_0) = 0 \qquad (2.84)$$

defines a hyperplane in $\mathbb{R}^q$. A node with a threshold activation function (2.80) responds with value 1 to all inputs $[u_1, \ldots, u_q]^T$ on the one side of the hyperplane, and value 0 to all inputs on the other side.
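A tiny sketch of this geometry: a threshold node in R^2 whose weights (hypothetical values, chosen for illustration) define the line u1 + u2 = 1; points on either side of the line get outputs 1 and 0:

```python
import numpy as np

w = np.array([-1.0, 1.0, 1.0])   # [w0 (bias), w1, w2]; hypothetical values

def threshold_node(u1, u2):
    xi = np.dot(w, [1.0, u1, u2])    # u0 is fixed to 1 (Eq. 2.79)
    return 1.0 if xi >= 0 else 0.0   # threshold activation (2.80)

# The hyperplane (2.84) is u1 + u2 - 1 = 0; outputs differ across it:
print(threshold_node(1.0, 1.0))   # 1.0 (on/above the line)
print(threshold_node(0.0, 0.0))   # 0.0 (below the line)
```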
Rosenblatt [8] defined the so-called perceptron and its famous training algorithm. The perceptron is implemented as Eq. (2.79),

$$v = \phi(\xi); \qquad \xi = \sum_{i=0}^{q} w_i u_i \qquad (2.79)$$

with a threshold activation function

$$\phi(\xi) = \begin{cases} 1, & \text{if } \xi \ge 0, \\ -1, & \text{otherwise.} \end{cases}$$

Learning algorithm (update rule): when the perceptron misclassifies $\mathbf{z}_j$, update

$$\mathbf{w} \leftarrow \mathbf{w} - v \eta \mathbf{z}_j \qquad (2.86)$$

where $v$ is the output of the perceptron for $\mathbf{z}_j$ and $\eta$ is a parameter specifying the learning rate. Besides its simplicity, perceptron training has the following interesting property: if the classes are linearly separable, the algorithm converges in a finite number of steps.
(a) Uniformly distributed two-class data and the boundary found by the perceptron training algorithm. (b) The “evolution” of the class boundary.
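A runnable sketch of Rosenblatt's training rule (2.86), assuming labels coded as -1/+1 and inputs augmented with a constant 1 for the bias; the initialization and epoch cap are my own choices:

```python
import numpy as np

def perceptron_train(Z, y, eta=0.1, max_epochs=100, seed=0):
    """Perceptron training: on a misclassified z_j with output v,
    apply w <- w - v * eta * z_j (Eq. 2.86). y holds labels in {-1, +1}."""
    Za = np.hstack([np.ones((len(Z), 1)), Z])   # prepend u0 = 1 (bias)
    w = np.random.default_rng(seed).uniform(-1, 1, Za.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for zj, yj in zip(Za, y):
            v = 1.0 if np.dot(w, zj) >= 0 else -1.0   # threshold output
            if v != yj:                               # misclassified
                w -= v * eta * zj                     # update rule (2.86)
                errors += 1
        if errors == 0:   # converged (guaranteed if linearly separable)
            break
    return w
```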
A generic model of an MLP classifier.
All hidden and output nodes have the same activation function (threshold or sigmoid); input nodes use the identity function.
Forward pass: compute activations one layer at a time, from input to output; decide ω_i for the maximum g_i(x), as in the sketch below.
Backpropagation: update the weights from the output layer back toward the input layer.
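A sketch of the forward pass for such a generic MLP: identity activation at the inputs, sigmoid at the hidden and output layers, and the max-g_i decision rule; the layout of the weight matrices is an assumption of mine:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, weights):
    """weights is a list of (n_out, n_in + 1) arrays, one per layer,
    whose first column holds the bias weights. Activations are computed
    one layer at a time, input to output."""
    a = np.asarray(x, dtype=float)                   # identity fn at input
    for W in weights:
        a = sigmoid(W @ np.concatenate(([1.0], a)))  # bias input u0 = 1
    return a                                         # discriminants g_i(x)

def classify(x, weights):
    """Decide omega_i for the maximum discriminant value g_i(x)."""
    return int(np.argmax(mlp_forward(x, weights)))
```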
Approximating Classification Regions
An MLP like the one shown on the previous slide, with threshold nodes, can approximate any classification regions in R^n to a specified precision.
Approximating Any Function
It was later found that an MLP with one hidden layer and threshold nodes can approximate any function to a specified precision.
In Practice...
These results tell us what is possible, but not how to achieve it (network structure and training algorithms).
Possible classification regions for an MLP with one, two, and three layers of threshold nodes. (The configurations shown illustrate the number of layers, not the number of nodes needed to produce the regions in the column "An example".)
Backpropagation MLP training
Initialization: pick small random values for all weights (including biases) of the NN. Pick the learning rate η > 0, the maximal number of epochs T, and the error goal ε > 0. Then repeat steps (a)-(g) until the stopping criterion below is met:

(a) Submit z_j as the next training example.
(b) Calculate the output of every node of the NN with the current weights (forward propagation).
(c) Calculate the error term δ at each node at the output layer by (2.91).
(d) Calculate recursively all error terms at the nodes of the hidden layers using (2.95) (backward propagation).
(e) For each hidden and each output node, update the weights by
$$w_{\text{new}} = w_{\text{old}} - \eta \delta u \qquad (2.98)$$
using the respective δ and u.
(f) Calculate E using the current weights and Eq. (2.77).
(g) If j = N (a whole pass through Z, i.e., an epoch, is completed), then set t = t + 1 and j = 0. Else, set j = j + 1.
Stopping criterion: error E less than ε, OR number of epochs exceeds T.
Output/hidden activation: sigmoid function.
Note: this is online training, also called stochastic (weights updated after each example), as opposed to batch training (updates at the end of each epoch).
(2.77) (squared error):
$$E = \frac{1}{2} \sum_{j=1}^{N} \sum_{i=1}^{c} \left\{ g_i(\mathbf{z}_j) - I\big(\omega_i, l(\mathbf{z}_j)\big) \right\}^2$$

(2.91) (output node error):
$$\delta_i^o = \frac{\partial E}{\partial \xi_i^o} = \left[ g_i(\mathbf{x}) - I\big(\omega_i, l(\mathbf{x})\big) \right] g_i(\mathbf{x}) \left[ 1 - g_i(\mathbf{x}) \right]$$

(2.96) (hidden node error):
$$\delta_k^h = \frac{\partial E}{\partial \xi_k^h} = \left( \sum_{i=1}^{c} \delta_i^o w_{ik}^o \right) v_k^h \left( 1 - v_k^h \right)$$
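Putting the pieces together, here is a compact online-backpropagation sketch for a single hidden layer, using (2.91), (2.96), (2.98), and the stopping rule above. For simplicity it evaluates E once per epoch rather than after every example, and all names are my own:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_train(Z, labels, n_hidden=3, eta=0.1, T=1000, eps=0.0, seed=0):
    """Online backpropagation for an n : n_hidden : c MLP with sigmoid
    hidden/output nodes. Z: (N, n) array; labels: int array in {0, ..., c-1}."""
    rng = np.random.default_rng(seed)
    N, n = Z.shape
    c = int(labels.max()) + 1
    Tgt = np.eye(c)[labels]                     # indicator I(omega_i, l(z_j))
    W_h = rng.uniform(0, 1, (n_hidden, n + 1))  # input -> hidden (+ bias col)
    W_o = rng.uniform(0, 1, (c, n_hidden + 1))  # hidden -> output (+ bias col)
    for t in range(T):
        for zj, tj in zip(Z, Tgt):
            u = np.concatenate(([1.0], zj))     # (b) forward propagation
            v_h = sigmoid(W_h @ u)
            u_h = np.concatenate(([1.0], v_h))
            g = sigmoid(W_o @ u_h)
            d_o = (g - tj) * g * (1.0 - g)      # (c) output errors (2.91)
            d_h = (W_o[:, 1:].T @ d_o) * v_h * (1.0 - v_h)  # (d) (2.96)
            W_o -= eta * np.outer(d_o, u_h)     # (e) updates (2.98)
            W_h -= eta * np.outer(d_h, u)
        # (f) squared error (2.77) over the whole set, once per epoch
        A = sigmoid(W_h @ np.vstack([np.ones(N), Z.T]))
        G = sigmoid(W_o @ np.vstack([np.ones(N), A])).T
        E = 0.5 * np.sum((G - Tgt) ** 2)
        if E <= eps:                            # stopping criterion
            return W_h, W_o
    return W_h, W_o
```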
A 2 : 3 : 2 MLP configuration. Bias nodes are depicted outside the layers and are not counted as separate nodes. Nodes 3 and 7 are the bias nodes (they always output 1). Activations: identity at the input nodes, sigmoid at the hidden and output nodes.

TABLE 2.5 (a) Random set of weights for a 2 : 3 : 2 MLP NN; (b) updated weights through backpropagation for a single training example.

(a)
Neuron   Incoming weights
1        w31 = 0.4300   w41 = 0.0500   w51 = 0.7000   w61 = 0.7500
2        w32 = 0.6300   w42 = 0.5700   w52 = 0.9600   w62 = 0.7400
4        w74 = 0.5500   w84 = 0.8200   w94 = 0.9600
5        w75 = 0.2600   w85 = 0.6700   w95 = 0.0600
6        w76 = 0.6000   w86 = 1.0000   w96 = 0.3600

(b)
Neuron   Incoming weights
1        w31 = 0.4191   w41 = 0.0416   w51 = 0.6910   w61 = 0.7402
2        w32 = 0.6305   w42 = 0.5704   w52 = 0.9604   w62 = 0.7404
4        w74 = 0.5500   w84 = 0.8199   w94 = 0.9600
5        w75 = 0.2590   w85 = 0.6679   w95 = 0.0610
6        w76 = 0.5993   w86 = 0.9986   w96 = 0.3607
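To make the table concrete: reading it as output neurons 1-2 (fed by bias node 3 and hidden nodes 4-6) and hidden neurons 4-6 (fed by bias node 7 and input nodes 8-9), which is my inference from the weight subscripts, the initial weights in (a) map onto two matrices. The test input below is hypothetical, not from the book:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hidden layer (neurons 4, 5, 6); columns: [bias 7, input 8, input 9]
W_h = np.array([[0.55, 0.82, 0.96],     # w74, w84, w94
                [0.26, 0.67, 0.06],     # w75, w85, w95
                [0.60, 1.00, 0.36]])    # w76, w86, w96

# Output layer (neurons 1, 2); columns: [bias 3, hidden 4, 5, 6]
W_o = np.array([[0.43, 0.05, 0.70, 0.75],    # w31, w41, w51, w61
                [0.63, 0.57, 0.96, 0.74]])   # w32, w42, w52, w62

x = np.array([0.3, 0.7])                          # hypothetical input
v_h = sigmoid(W_h @ np.concatenate(([1.0], x)))   # bias nodes output 1
g = sigmoid(W_o @ np.concatenate(([1.0], v_h)))
print(g)                                          # discriminants g_1, g_2
```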
Squared error and the apparent error rate versus the number of epochs for the backpropagation training of a 2 : 3 : 2 MLP on the banana data.
2 : 3 : 2 MLP (see previous slide)
Batch training (updates at the end of each epoch)
Max epochs: 1000; η = 0.1; error goal: 0
Initial weights: random, in [0, 1]
Final training error: 4%; final test error: 9%
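A hypothetical re-run of this experiment using the backprop_train sketch above. Since the banana data is not bundled here, scikit-learn's make_moons stands in as a similarly shaped two-class set (an assumption), and the sketch trains online rather than in batch mode:

```python
import numpy as np
from sklearn.datasets import make_moons   # stand-in for the banana data

Z, labels = make_moons(n_samples=200, noise=0.2, random_state=0)
W_h, W_o = backprop_train(Z, labels, n_hidden=3, eta=0.1, T=1000, eps=0.0)

# Apparent (training) error rate
A = 1.0 / (1.0 + np.exp(-(W_h @ np.vstack([np.ones(len(Z)), Z.T]))))
G = 1.0 / (1.0 + np.exp(-(W_o @ np.vstack([np.ones(len(Z)), A]))))
print("training error:", np.mean(np.argmax(G, axis=0) != labels))
```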