  1. Artificial Neural Networks
  Oliver Schulte - CMPT 726

  2. Neural Networks
  • Neural networks arise from attempts to model human/animal brains
  • Many models, many claims of biological plausibility
  • We will focus on multi-layer perceptrons: mathematical properties rather than biological plausibility
  • For biologically plausible models, see Prof. Hadley's CMPT 418

  3. Uses of Neural Networks
  • Pros
    • Good for continuous input variables
    • General continuous function approximators
    • Highly non-linear
    • Learn feature functions
    • Good to use in continuous domains with little knowledge: when you don't know good features, or you don't know the form of a good functional model
  • Cons
    • Not interpretable: a "black box"
    • Learning is slow
    • Good generalization can require many data points

  4. Applications
  There are many, many applications.
  • World-champion backgammon player: http://en.wikipedia.org/wiki/TD-Gammon, http://en.wikipedia.org/wiki/Backgammon
  • "No Hands Across America" tour: http://www.cs.cmu.edu/afs/cs/usr/tjochem/www/nhaa/nhaa_home_page.html
  • Digit recognition with 99.26% accuracy
  • ...

  5. Outline
  • Feed-forward Networks
  • Network Training
  • Error Backpropagation
  • Applications

  7. Feed-forward Networks
  • We have looked at generalized linear models of the form:
    y(x, w) = f( Σ_{j=1}^M w_j φ_j(x) )
  for fixed non-linear basis functions φ(·)
  • We now extend this model by allowing adaptive basis functions, and learning their parameters
  • In feed-forward networks (a.k.a. multi-layer perceptrons) we let each basis function be another non-linear function of a linear combination of the inputs:
    φ_j(x) = f( ... )

  9. Feed-forward Networks
  • Starting with input x = (x_1, ..., x_D), construct linear combinations:
    a_j = Σ_{i=1}^D w^(1)_{ji} x_i + w^(1)_{j0}
  These a_j are known as activations
  • Pass through an activation function h(·) to get output z_j = h(a_j)
  • Model of an individual neuron (from Russell and Norvig, AIMA 2e)
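
  A minimal NumPy sketch of this computation (the names W1, b1 and the choice of tanh as h are illustrative assumptions, not taken from the slide):

```python
import numpy as np

def hidden_layer(x, W1, b1, h=np.tanh):
    """Compute hidden-unit outputs z_j = h(a_j) for one example x.

    a_j = sum_i W1[j, i] * x[i] + b1[j], where b1[j] plays the role of w^(1)_{j0}.
    """
    a = W1 @ x + b1          # activations a_j, one per hidden unit
    return h(a)              # hidden-unit outputs z_j

# Example: D = 3 inputs, M = 4 hidden units (illustrative sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=4)
z = hidden_layer(x, W1, b1)
```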

  12. Activation Functions
  • Can use a variety of activation functions
    • Sigmoidal (S-shaped)
      • Logistic sigmoid 1 / (1 + exp(-a)) (useful for binary classification)
      • Hyperbolic tangent tanh(a)
    • Radial basis function z_j = Σ_i (x_i - w_ji)^2
    • Softmax (useful for multi-class classification)
    • Hard threshold
    • ...
  • Should be differentiable for gradient-based learning (later)
  • Can use different activation functions in each unit
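
  For concreteness, a small sketch of a few of these activation functions in NumPy (the function names are ours; the max-subtraction in softmax is a standard numerical-stability detail, not from the slide):

```python
import numpy as np

def logistic_sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))        # squashes to (0, 1); binary classification

def hyperbolic_tangent(a):
    return np.tanh(a)                       # squashes to (-1, 1)

def softmax(a):
    e = np.exp(a - np.max(a))               # subtract max for numerical stability
    return e / e.sum()                      # outputs sum to 1; multi-class classification

def hard_threshold(a):
    return np.where(a >= 0, 1.0, -1.0)      # non-differentiable step

a = np.array([-2.0, 0.0, 3.0])
print(logistic_sigmoid(a), softmax(a))
```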

  13. Feed-forward Networks
  [Figure: network diagram with inputs x_1, ..., x_D, hidden units z_1, ..., z_M, outputs y_1, ..., y_K, first-layer weights w^(1), second-layer weights w^(2), and bias units x_0, z_0]
  • Connect together a number of these units into a feed-forward network (a DAG)
  • The figure above shows a network with one layer of hidden units
  • It implements the function:
    y_k(x, w) = h( Σ_{j=1}^M w^(2)_{kj} h( Σ_{i=1}^D w^(1)_{ji} x_i + w^(1)_{j0} ) + w^(2)_{k0} )
  • See http://aispace.org/neural/
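
  A sketch of the full forward pass for such a one-hidden-layer network (illustrative only; the variable names and the use of tanh for both layers are assumptions, not prescribed by the slide):

```python
import numpy as np

def forward(x, W1, b1, W2, b2, h=np.tanh, out=np.tanh):
    """One-hidden-layer feed-forward network: y_k(x, w)."""
    z = h(W1 @ x + b1)        # hidden-unit outputs z_j
    y = out(W2 @ z + b2)      # network outputs y_k
    return y

# Illustrative sizes: D = 3 inputs, M = 5 hidden units, K = 2 outputs
rng = np.random.default_rng(1)
D, M, K = 3, 5, 2
W1, b1 = rng.normal(size=(M, D)), np.zeros(M)
W2, b2 = rng.normal(size=(K, M)), np.zeros(K)
print(forward(rng.normal(size=D), W1, b1, W2, b2))
```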

  14. A General Network
  [Figure: a fully connected network with input layer x_1, ..., x_d, hidden layer y_1, ..., y_{n_H} connected by weights w_ji, output layer z_1, ..., z_c connected by weights w_kj, and targets t_1, ..., t_c]

  15. The XOR Problem Revisited
  [Figure: the XOR input space in (x_1, x_2), with region R_1 (z = +1) and region R_2 (z = -1) occupying opposite corners; no single linear boundary separates the two classes]

  16. The XOR Problem Solved
  [Figure: a two-layer network over inputs x_1, x_2 with two hidden units y_1, y_2 (weights w_ji plus biases) and one output unit z_k (weights w_kj plus bias); each hidden unit carves out a half-plane, and their combination produces the XOR decision regions]
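
  As a sketch of the idea, here is a two-layer hard-threshold network with hand-set weights that computes XOR (these weights are illustrative and not necessarily the ones shown on the slide):

```python
import numpy as np

def step(a):
    return (a >= 0).astype(float)   # hard-threshold activation

# Illustrative hand-set weights: hidden unit 1 computes OR, hidden unit 2
# computes AND, and the output fires for OR-but-not-AND, i.e. XOR.
W1 = np.array([[1.0, 1.0],     # OR unit
               [1.0, 1.0]])    # AND unit
b1 = np.array([-0.5, -1.5])
W2 = np.array([[1.0, -1.0]])
b2 = np.array([-0.5])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = step(W1 @ np.array(x) + b1)   # hidden-unit outputs
    y = step(W2 @ z + b2)             # network output
    print(x, "->", int(y[0]))         # prints 0, 1, 1, 0
```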

  17. Hidden Units Compute Basis Functions
  [Figure: red dots = network function; dashed lines = hidden-unit activation functions; blue dots = data points]
  • The network function is roughly the sum of the hidden units' activation functions.

  18. Hidden Units as Feature Extractors
  [Figure: sample training patterns and the learned input-to-hidden weights, visualized as images]
  • 64 input nodes
  • 2 hidden units
  • Learned weight matrix at the hidden units

  19. Outline
  • Feed-forward Networks
  • Network Training
  • Error Backpropagation
  • Applications

  20. Network Training
  • Given a specified network structure, how do we set its parameters (weights)?
  • As usual, we define a criterion that measures how well our network performs, and optimize against it
  • For regression, the training data are (x_n, t_n), with t_n ∈ ℝ
  • The squared error arises naturally:
    E(w) = Σ_{n=1}^N { y(x_n, w) - t_n }^2
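
  A small sketch of this error function in NumPy (y_fn stands in for any single-output network y(x, w); the helper name and toy data are ours):

```python
import numpy as np

def sum_of_squares_error(y_fn, X, t):
    """E(w) = sum_n (y(x_n, w) - t_n)^2 for a single-output network y_fn."""
    return sum((y_fn(x) - tn) ** 2 for x, tn in zip(X, t))

# Tiny illustrative example: y_fn is any callable mapping an input vector to a scalar
X = [np.array([0.0, 1.0]), np.array([1.0, 1.0])]
t = [1.0, 0.0]
y_fn = lambda x: np.tanh(x.sum())          # stand-in for a trained network
print(sum_of_squares_error(y_fn, X, t))
```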

  22. Parameter Optimization
  [Figure: a non-convex error surface E(w) over weights (w_1, w_2), with stationary points w_A and w_B and a point w_C at which the gradient ∇E is shown]
  • For either of these problems, the error function E(w) is nasty
  • Nasty = non-convex
  • Non-convex = has local minima
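
  One concrete reason E(w) has many equivalent minima, going slightly beyond what the slide states: permuting the hidden units (together with their outgoing weights) leaves the network function, and hence E(w), unchanged, so any minimum is duplicated at several distinct weight vectors. A minimal check of this symmetry:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    z = np.tanh(W1 @ x + b1)
    return np.tanh(W2 @ z + b2)

rng = np.random.default_rng(3)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)

# Swap hidden units 0 and 1 everywhere they appear
perm = np.array([1, 0, 2, 3])
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=D)
print(np.allclose(forward(x, W1, b1, W2, b2),
                  forward(x, W1p, b1p, W2p, b2)))   # True: same function, different weights
```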

  23. Descent Methods
  • The typical strategy for optimization problems of this sort is a descent method:
    w^(τ+1) = w^(τ) + Δw^(τ)
  • These come in many flavours, differing in the choice of update Δw^(τ):
    • Gradient descent: uses ∇E(w^(τ))
    • Stochastic gradient descent: uses ∇E_n(w^(τ)), the gradient for a single training example
    • Newton-Raphson: second order, uses ∇²E
  • All of these can be used here; stochastic gradient descent is particularly effective
    • Redundancy in the training data, escaping local minima
  A sketch of the batch and stochastic update rules follows below.
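
  The sketch uses a simple linear least-squares model rather than the network, since its gradient can be written down directly before backpropagation is introduced; the learning rate, toy data, and epoch counts are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))                 # toy data: 100 examples, 3 features
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true + 0.1 * rng.normal(size=100)

def grad_E(w, X, t):
    """Gradient of E(w) = sum_n (w·x_n - t_n)^2 over the whole dataset."""
    return 2 * X.T @ (X @ w - t)

def grad_E_n(w, x_n, t_n):
    """Gradient contributed by a single example n (used by SGD)."""
    return 2 * (w @ x_n - t_n) * x_n

eta = 0.001
w = np.zeros(3)
# Batch gradient descent: each update uses the full-data gradient
for _ in range(200):
    w = w - eta * grad_E(w, X, t)

w_sgd = np.zeros(3)
# Stochastic gradient descent: one update per training example, several passes
for _ in range(20):
    for n in rng.permutation(len(X)):
        w_sgd = w_sgd - eta * grad_E_n(w_sgd, X[n], t[n])

print(w, w_sgd)    # both should approach w_true
```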
