Neural Networks - Greg Mori - CMPT 419/726 - Bishop PRML Ch. 5

SLIDE 1


Neural Networks

Greg Mori - CMPT 419/726 Bishop PRML Ch. 5

SLIDE 2

Neural Networks

  • Neural networks arise from attempts to model human/animal brains

  • Many models, many claims of biological plausibility
  • We will focus on multi-layer perceptrons
  • Mathematical properties rather than plausibility
SLIDE 3

Applications of Neural Networks

  • Many success stories for neural networks, old and new
    • Credit card fraud detection
    • Hand-written digit recognition
    • Face detection
    • Autonomous driving (CMU ALVINN)
    • Object recognition
    • Speech recognition
SLIDE 4

Outline

  • Feed-forward Networks
  • Network Training
  • Error Backpropagation
  • Deep Learning

SLIDE 6

Feed-forward Networks

  • We have looked at generalized linear models of the form:

    $y(\mathbf{x}, \mathbf{w}) = f\left(\sum_{j=1}^{M} w_j \phi_j(\mathbf{x})\right)$

    for fixed non-linear basis functions $\phi(\cdot)$

  • We now extend this model by allowing adaptive basis functions, and learning their parameters

  • In feed-forward networks (a.k.a. multi-layer perceptrons) we let each basis function be another non-linear function of a linear combination of the inputs:

    $\phi_j(\mathbf{x}) = f\left(\sum \ldots\right)$

SLIDE 8

Feed-forward Networks

  • Starting with input $\mathbf{x} = (x_1, \ldots, x_D)$, construct linear combinations:

    $a_j = \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0}$

    These $a_j$ are known as activations

  • Pass through an activation function $h(\cdot)$ to get output $z_j = h(a_j)$

  • Model of an individual neuron

    [figure: a single neuron, from Russell and Norvig, AIMA2e]

SLIDE 11

Activation Functions

  • Can use a variety of activation functions
  • Sigmoidal (S-shaped)
    • Logistic sigmoid $1/(1 + \exp(-a))$ (useful for binary classification)
    • Hyperbolic tangent $\tanh$
  • Radial basis function $z_j = \sum_i (x_i - w_{ji})^2$
  • Softmax
    • Useful for multi-class classification
  • Identity
    • Useful for regression
  • Threshold
  • Max, ReLU, Leaky ReLU, . . .
  • Needs to be differentiable* for gradient-based learning (later)

  • Can use different activation functions in each unit
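
To make the list above concrete, here is a minimal NumPy sketch of a few of these activation functions; the function names and test values are illustrative, not from the course code.

```python
import numpy as np

def logistic(a):
    # Logistic sigmoid 1 / (1 + exp(-a)): squashes activations into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):
    # Rectified linear unit: max(0, a)
    return np.maximum(0.0, a)

def softmax(a):
    # Softmax over the last axis; subtract the max for numerical stability
    e = np.exp(a - np.max(a, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

a = np.array([-2.0, 0.0, 3.0])
print(logistic(a), np.tanh(a), relu(a), softmax(a))
```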
SLIDE 12

Feed-forward Networks

[network diagram: inputs $x_0, x_1, \ldots, x_D$, hidden units $z_0, z_1, \ldots, z_M$, outputs $y_1, \ldots, y_K$, with first-layer weights $w^{(1)}$ and second-layer weights $w^{(2)}$]

  • Connect together a number of these units into a feed-forward network (DAG)

  • Above shows a network with one layer of hidden units

  • Implements function:

    $y_k(\mathbf{x}, \mathbf{w}) = h\left(\sum_{j=1}^{M} w^{(2)}_{kj}\, h\left(\sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0}\right) + w^{(2)}_{k0}\right)$

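As a concrete illustration of the function above, here is a hedged NumPy sketch of the forward pass for one hidden layer; the variable names (W1, b1, W2, b2), the tanh hidden activation, and the identity output are assumptions for this example, not notation fixed by the slides.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer feed-forward network:
    y_k = sum_j W2[k, j] * h(sum_i W1[j, i] * x[i] + b1[j]) + b2[k]."""
    a = W1 @ x + b1        # first-layer activations a_j
    z = np.tanh(a)         # hidden unit outputs z_j = h(a_j)
    y = W2 @ z + b2        # output activations (identity output, e.g. regression)
    return y

rng = np.random.default_rng(0)
D, M, K = 3, 5, 2          # number of inputs, hidden units, outputs
W1, b1 = rng.normal(size=(M, D)), np.zeros(M)
W2, b2 = rng.normal(size=(K, M)), np.zeros(K)
print(forward(rng.normal(size=D), W1, b1, W2, b2))
```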
SLIDE 14

Network Training

  • Given a specified network structure, how do we set its parameters (weights)?

  • As usual, we define a criterion to measure how well our network performs, and optimize against it

  • For regression, training data are $(\mathbf{x}_n, t_n)$, $t_n \in \mathbb{R}$

  • Squared error naturally arises:

    $E(\mathbf{w}) = \sum_{n=1}^{N} \{y(\mathbf{x}_n, \mathbf{w}) - t_n\}^2$

  • For binary classification, this is another discriminative model; maximum likelihood (ML) gives:

    $p(\mathbf{t}|\mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}$

    $E(\mathbf{w}) = -\sum_{n=1}^{N} \{t_n \ln y_n + (1 - t_n) \ln(1 - y_n)\}$

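A small sketch of the two error functions above, assuming the network outputs y_n have already been computed; the arrays below are made-up toy values.

```python
import numpy as np

def squared_error(y, t):
    # Sum-of-squares error for regression: sum_n (y_n - t_n)^2
    return np.sum((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    # Negative log-likelihood for binary classification:
    # E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ]
    y = np.clip(y, eps, 1 - eps)     # avoid log(0)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

y = np.array([0.9, 0.2, 0.7])        # network outputs
t = np.array([1.0, 0.0, 1.0])        # targets
print(squared_error(y, t), cross_entropy(y, t))
```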
SLIDE 17

Parameter Optimization

[figure: error surface $E(\mathbf{w})$ over two weights $w_1, w_2$, showing points $\mathbf{w}_A$, $\mathbf{w}_B$, $\mathbf{w}_C$ and the gradient $\nabla E$]

  • For either of these problems, the error function E(w) is nasty

  • Nasty = non-convex
  • Non-convex = has local minima
SLIDE 18

Descent Methods

  • The typical strategy for optimization problems of this sort is a descent method:

    $\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \Delta\mathbf{w}^{(\tau)}$

  • As we've seen before, these come in many flavours
    • Gradient descent $\nabla E(\mathbf{w}^{(\tau)})$
    • Stochastic gradient descent $\nabla E_n(\mathbf{w}^{(\tau)})$
    • Newton-Raphson (second order) $\nabla^2$
  • All of these can be used here; stochastic gradient descent is particularly effective
    • Redundancy in training data, escaping local minima
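A hedged sketch of stochastic gradient descent in this generic form, stepping against the per-example gradient ∇En one training case at a time; the toy one-parameter error and the learning rate are illustrative assumptions, not part of the slides.

```python
import numpy as np

def sgd(w, grad_En, data, eta=0.1, epochs=20, seed=0):
    """Stochastic gradient descent: w <- w - eta * grad E_n(w),
    visiting the training examples in random order each epoch."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for n in rng.permutation(len(data)):
            w = w - eta * grad_En(w, data[n])
    return w

# Toy example: fit a scalar w to minimize E_n(w) = (w * x_n - t_n)^2
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
grad_En = lambda w, d: 2.0 * (w * d[0] - d[1]) * d[0]
print(sgd(0.0, grad_En, data))   # approaches roughly 2
```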
SLIDE 21

Computing Gradients

  • The function $y(\mathbf{x}_n, \mathbf{w})$ implemented by a network is complicated

  • It isn't obvious how to compute error function derivatives with respect to weights

  • Numerical method for calculating error derivatives, use finite differences:

    $\dfrac{\partial E_n}{\partial w_{ji}} \approx \dfrac{E_n(w_{ji} + \epsilon) - E_n(w_{ji} - \epsilon)}{2\epsilon}$

  • How much computation would this take with W weights in the network?
    • $O(W)$ per derivative, $O(W^2)$ total per gradient descent step
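A quick sketch of the central-difference approximation above, here used as a gradient check on a toy error function; note the cost, two error evaluations (each O(W)) per weight, hence O(W^2) per full gradient.

```python
import numpy as np

def finite_difference_grad(E, w, eps=1e-6):
    """Approximate dE/dw_i by (E(w + eps*e_i) - E(w - eps*e_i)) / (2*eps).
    Needs two full error evaluations per weight: O(W) each, O(W^2) total."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (E(w_plus) - E(w_minus)) / (2 * eps)
    return grad

E = lambda w: np.sum((w - np.array([1.0, -2.0, 3.0])) ** 2)   # toy error function
print(finite_difference_grad(E, np.zeros(3)))   # ~[-2, 4, -6], matches 2*(w - target)
```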
SLIDE 26

Error Backpropagation

  • Backprop is an efficient method for computing error derivatives $\partial E_n / \partial w_{ji}$
    • $O(W)$ to compute derivatives wrt all weights

  • First, feed training example $\mathbf{x}_n$ forward through the network, storing all activations $a_j$

  • Calculating derivatives for weights connected to output nodes is easy
    • e.g. for linear output nodes $y_k = \sum_i w_{ki} z_i$:

      $\dfrac{\partial E_n}{\partial w_{ki}} = \dfrac{\partial}{\partial w_{ki}} \dfrac{1}{2}\left(y_{n,k} - t_{n,k}\right)^2 = \left(y_{n,k} - t_{n,k}\right) z_{n,i}$

  • For hidden layers, propagate error backwards from the output nodes
SLIDE 29

Chain Rule for Partial Derivatives

  • A “reminder”
  • For f(x, y), with f differentiable wrt x and y, and x and y differentiable wrt u:

    $\dfrac{\partial f}{\partial u} = \dfrac{\partial f}{\partial x}\dfrac{\partial x}{\partial u} + \dfrac{\partial f}{\partial y}\dfrac{\partial y}{\partial u}$

SLIDE 30

Error Backpropagation

  • We can write

    $\dfrac{\partial E_n}{\partial w_{ji}} = \dfrac{\partial}{\partial w_{ji}} E_n(a_{j_1}, a_{j_2}, \ldots, a_{j_m})$

    where $\{j_i\}$ are the indices of the nodes in the same layer as node j

  • Using the chain rule:

    $\dfrac{\partial E_n}{\partial w_{ji}} = \dfrac{\partial E_n}{\partial a_j}\dfrac{\partial a_j}{\partial w_{ji}} + \sum_k \dfrac{\partial E_n}{\partial a_k}\dfrac{\partial a_k}{\partial w_{ji}}$

    where k runs over all other nodes k in the same layer as node j

  • Since $a_k$ does not depend on $w_{ji}$, all terms in the summation go to 0:

    $\dfrac{\partial E_n}{\partial w_{ji}} = \dfrac{\partial E_n}{\partial a_j}\dfrac{\partial a_j}{\partial w_{ji}}$

SLIDE 33

Error Backpropagation cont.

  • Introduce error $\delta_j \equiv \dfrac{\partial E_n}{\partial a_j}$:

    $\dfrac{\partial E_n}{\partial w_{ji}} = \delta_j \dfrac{\partial a_j}{\partial w_{ji}}$

  • Other factor is:

    $\dfrac{\partial a_j}{\partial w_{ji}} = \dfrac{\partial}{\partial w_{ji}} \sum_k w_{jk} z_k = z_i$

SLIDE 34

Error Backpropagation cont.

  • Error $\delta_j$ can also be computed using chain rule:

    $\delta_j \equiv \dfrac{\partial E_n}{\partial a_j} = \sum_k \underbrace{\dfrac{\partial E_n}{\partial a_k}}_{\delta_k} \dfrac{\partial a_k}{\partial a_j}$

    where k runs over all nodes k in the layer after node j

  • Eventually:

    $\delta_j = h'(a_j) \sum_k w_{kj} \delta_k$

  • A weighted sum of the later error "caused" by this weight
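Putting the forward pass, the output-layer deltas, and the recursion δj = h'(aj) Σk wkj δk together, here is a hedged NumPy sketch for one hidden layer with tanh units, linear outputs, and the squared error E_n = 0.5 * ||y - t||^2; the matrix shapes and names are assumptions for this example, not code from the course.

```python
import numpy as np

def backprop(x, t, W1, b1, W2, b2):
    """Gradients of E_n = 0.5 * ||y - t||^2 for a one-hidden-layer network.
    Forward: a1 = W1 x + b1, z = tanh(a1), y = W2 z + b2 (linear outputs)."""
    # Forward pass, storing activations
    a1 = W1 @ x + b1
    z = np.tanh(a1)
    y = W2 @ z + b2
    # Output-layer errors: delta_k = y_k - t_k (linear outputs, squared error)
    delta2 = y - t
    # Hidden-layer errors: delta_j = h'(a_j) * sum_k W2[k, j] * delta_k
    delta1 = (1.0 - z ** 2) * (W2.T @ delta2)    # tanh'(a) = 1 - tanh(a)^2
    # dE_n/dw_ji = delta_j * z_i (inputs x_i for the first layer)
    return np.outer(delta1, x), delta1, np.outer(delta2, z), delta2

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), np.zeros(M)
W2, b2 = rng.normal(size=(K, M)), np.zeros(K)
gW1, gb1, gW2, gb2 = backprop(rng.normal(size=D), rng.normal(size=K), W1, b1, W2, b2)
print(gW1.shape, gb1.shape, gW2.shape, gb2.shape)   # (4, 3) (4,) (2, 4) (2,)
```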
SLIDE 37

Deep Learning

  • Collection of important techniques to improve performance:
    • Multi-layer networks
    • Convolutional networks, parameter tying
    • Hinge activation functions (ReLU) for steeper gradients
    • Momentum
    • Drop-out regularization
    • Sparsity
    • Auto-encoders for unsupervised feature learning
    • ...
  • Scalability is key: can use lots of data, since stochastic gradient descent is memory-efficient and can be parallelized

SLIDE 38

Hand-written Digit Recognition

  • MNIST - standard dataset for hand-written digit recognition
  • 60000 training, 10000 test images
SLIDE 39

LeNet-5, circa 1998

[architecture diagram: INPUT 32x32 → C1: 6 feature maps 28x28 (convolutions) → S2: 6 maps 14x14 (subsampling) → C3: 16 maps 10x10 (convolutions) → S4: 16 maps 5x5 (subsampling) → C5: 120 units → F6: 84 units (full connections) → OUTPUT 10 (Gaussian connections)]

  • LeNet developed by Yann LeCun et al.
  • Convolutional neural network
  • Local receptive fields (5x5 connectivity)
  • Subsampling (2x2)
  • Shared weights (reuse same 5x5 “filter”)
  • Breaking symmetry
SLIDE 40

ImageNet

  • ImageNet - standard dataset for object recognition in images (Russakovsky et al.)
  • 1000 image categories, ≈1.2 million training images (ILSVRC 2013)

SLIDE 41

GoogLeNet, circa 2014

  • GoogLeNet developed by Szegedy et al., CVPR 2015
  • Modern deep network
  • ImageNet top-5 error rate of 6.67% (later versions even better)
  • Comparable to human performance (especially for fine-grained categories)

SLIDE 42

ResNet, circa 2015

[architecture diagram: ResNet layer stack of repeated bottleneck blocks of 1x1, 3x3, 1x1 convolutions, with occasional stride-2 downsampling, starting from a 7x7 convolution and pooling]

“Deep Residual Learning for Image Recognition”. arXiv 2015.

  • ResNet developed by He et al., ICCV 2015
  • 152 layers
  • ImageNet top-5 error rate of 3.57%
  • Better than human performance (especially for fine-grained categories)

SLIDE 43

Key Component 1: Convolutional Filters

  • Share parameters across network
  • Reduce total number of parameters
  • Provide translation invariance, useful for visual recognition

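A hedged sketch of the parameter-sharing idea: a single small filter slid over the whole input (a valid cross-correlation with stride 1); the image and filter values are made up, and no deep-learning library is assumed.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one shared kernel over the image (valid cross-correlation).
    The same few kernel weights are reused at every spatial position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.arange(36.0).reshape(6, 6)
edge_filter = np.array([[1.0, -1.0]])     # 2 parameters shared across all positions
print(conv2d(image, edge_filter).shape)   # (6, 5)
```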
SLIDE 45

Key Component 2: Rectified Linear Units (ReLUs)

  • Vanishing gradient problem
    • If derivatives very small, no/little progress via stochastic gradient descent
    • Occurs with sigmoid function when activation is large in absolute value
  • ReLU: $h(a_j) = \max(0, a_j)$
    • Non-saturating, linear gradients (as long as non-negative activation on some training data)
    • Sparsity inducing
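
A tiny sketch contrasting sigmoid and ReLU derivatives, to illustrate why ReLU gradients do not saturate for large positive activations; the test values are purely illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)          # vanishes for |a| large

def relu_grad(a):
    return (a > 0).astype(float)  # 1 wherever the unit is active, 0 otherwise

a = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(a))   # ~[0.00005, 0.25, 0.00005]: tiny at the extremes
print(relu_grad(a))      # [0., 0., 1.]: constant gradient for positive activations
```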
SLIDE 46

Key Component 3: Many, Many Layers

  • ResNet: ≈152 layers ("shortcut connections")
  • GoogLeNet: ≈27 layers ("Inception" modules)
  • VGG Net: 16-19 layers (Simonyan and Zisserman, 2014)
  • Supervision: 8 layers (Krizhevsky et al., 2012)

SLIDE 47

Key Component 4: Momentum

  • Trick to escape plateaus / local minima

  • Take exponential average of previous gradients:

    $\left(\dfrac{\partial E_n}{\partial w_{ji}}\right)^{\tau} = \dfrac{\partial E_n}{\partial w_{ji}}\bigg|_{\tau} + \alpha \left(\dfrac{\partial E_n}{\partial w_{ji}}\right)^{\tau - 1}$

  • Maintains progress in previous direction
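
A brief sketch of one common way to implement this: keep an exponentially decaying average of past gradients and step along it; the decay constant alpha, the learning rate, and the toy problem are assumptions, the slide only gives the averaging recursion.

```python
import numpy as np

def sgd_momentum(w, grad_En, data, eta=0.01, alpha=0.9, epochs=50, seed=0):
    """SGD with momentum: v <- grad E_n(w) + alpha * v, then w <- w - eta * v.
    The running average v keeps progress moving in the previous direction."""
    rng = np.random.default_rng(seed)
    v = 0.0
    for _ in range(epochs):
        for n in rng.permutation(len(data)):
            v = grad_En(w, data[n]) + alpha * v
            w = w - eta * v
    return w

# Same toy problem as in the earlier SGD sketch: E_n(w) = (w * x_n - t_n)^2
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
grad_En = lambda w, d: 2.0 * (w * d[0] - d[1]) * d[0]
print(sgd_momentum(0.0, grad_En, data))   # settles near 2
```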
SLIDE 48

Key Component 5: Asynchronous Stochastic Gradient Descent

  • Big models won’t fit in memory
  • Want to use compute clusters (e.g. 1000s of machines) to run stochastic gradient descent
  • How to parallelize computation?
    • Ignore synchronization across machines
    • Just let each machine compute its own gradients and pass to a server storing current parameters
    • Ignore the fact that these updates are inconsistent
    • Seems to just work (e.g. Dean et al. NIPS 2012)

SLIDE 49

Key Component 6: Learning Rate Schedule

  • How to set learning rate η?

    $\mathbf{w}^{\tau} = \mathbf{w}^{\tau-1} + \eta \nabla_{\mathbf{w}}$

  • Option 1: Run until validation error plateaus. Drop learning rate by x%

  • Option 2: Adagrad, adaptive gradient. Per-element learning rate set based on local geometry (Duchi et al. 2010)
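
A hedged sketch of the Adagrad-style per-element learning rate (option 2): accumulate squared gradients for each weight and divide that weight's step by the square root of the accumulated sum; the constants and the toy quadratic are illustrative, not values from Duchi et al.

```python
import numpy as np

def adagrad_step(w, g, G, eta=1.0, eps=1e-8):
    """One Adagrad update: per-element step size eta / sqrt(sum of squared grads).
    Weights that have seen large gradients so far take smaller steps."""
    G = G + g ** 2                          # accumulated squared gradients, per weight
    w = w - eta * g / (np.sqrt(G) + eps)
    return w, G

w, G = np.zeros(3), np.zeros(3)
target = np.array([1.0, -2.0, 3.0])
for _ in range(100):
    g = 2.0 * (w - target)                  # gradient of the toy error ||w - target||^2
    w, G = adagrad_step(w, g, G)
print(w)                                    # close to [1, -2, 3]
```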

SLIDE 50

Key Component 7: Batch Norm

  • Normalize data at each layer by whitening
  • Ioffe and Szegedy 2015
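
A minimal sketch of the idea (Ioffe and Szegedy 2015): standardize each unit's activations over the current mini-batch, then apply a learned scale and shift; gamma and beta stand in for those learned parameters, and the batch here is random toy data.

```python
import numpy as np

def batch_norm(A, gamma, beta, eps=1e-5):
    """A: (batch_size, num_units) activations at one layer.
    Standardize each unit over the mini-batch, then scale and shift."""
    mu = A.mean(axis=0)                     # per-unit mean over the batch
    var = A.var(axis=0)                     # per-unit variance over the batch
    A_hat = (A - mu) / np.sqrt(var + eps)   # zero mean, unit variance per unit
    return gamma * A_hat + beta

A = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(8, 4))
out = batch_norm(A, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # ~0 and ~1 per unit
```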
SLIDE 51

Key Component 8: Data Augmentation

  • Augment data with additional synthetic variants (10x amount of data)
  • Or just use synthetic data, e.g. Sintel animated movie (Butler et al. 2012)
SLIDE 52

Key Component 9: Data and Compute

  • Get lots of data (e.g. ImageNet)
  • Get lots of compute (e.g. CPU cluster, GPUs)
  • Cross-validate like crazy, train models for 2-3 weeks on a GPU
  • Researcher gradient descent (RGD) or Graduate student descent (GSD): get 100s of researchers to each do this, trying different network structures

SLIDE 53

More information

  • https://sites.google.com/site/deeplearningsummerschool
  • http://tutorial.caffe.berkeleyvision.org/
  • ufldl.stanford.edu/eccv10-tutorial
  • http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
  • Project ideas
    • Long short-term memory (LSTM) models for temporal data
    • Learning embeddings (word2vec, FaceNet)
    • Structured output (multiple outputs from a network)
    • Zero-shot learning (learning to recognize new concepts without training data)
    • Transfer learning (use data from one domain/task, adapt to another)
    • Network compression / run-time / power optimization
    • Distillation
SLIDE 54

Conclusion

  • Readings: Ch. 5.1, 5.2, 5.3
  • Feed-forward networks can be used for regression or classification
    • Similar to linear models, except with adaptive non-linear basis functions
    • These allow us to do more than e.g. linear decision boundaries
    • Different error functions
  • Learning is more difficult, error function not convex
    • Use stochastic gradient descent, obtain (good?) local minimum
  • Backpropagation for efficient gradient computation