

  1. Lecture 11: Supervised Learning, Artificial Neural Networks. Marco Chiarandini, Department of Mathematics & Computer Science, University of Southern Denmark. Slides by Stuart Russell and Peter Norvig.

  2. Course Overview
     ✔ Introduction
       ✔ Artificial Intelligence
       ✔ Intelligent Agents
     ✔ Search
       ✔ Uninformed Search
       ✔ Heuristic Search
     ✔ Uncertain knowledge and Reasoning
       ✔ Probability and Bayesian approach
       ✔ Bayesian Networks
       ✔ Hidden Markov Chains
       ✔ Kalman Filters
     Learning
       Supervised: Decision Trees, Neural Networks
       Learning Bayesian Networks
       Unsupervised: EM Algorithm
       Reinforcement Learning
     Games and Adversarial Search
       Minimax search and Alpha-beta pruning
       Multiagent search
     Knowledge representation and Reasoning
       Propositional logic
       First order logic
       Inference
     Planning

  3. Outline
     1. Neural Networks
        Feedforward Networks
        Single-layer perceptrons
        Multi-layer perceptrons
     2. Other Methods and Issues

  4. A neuron in a living biological system
     [Figure: anatomy of a biological neuron, with labels: axonal arborization, axon from another cell, synapse, dendrite, axon, nucleus, synapses, cell body or soma]
     Signals are noisy "spike trains" of electrical potential.

  5. In the brain: > 20 types of neurons with 10^14 synapses (compare with the world population of 7 × 10^9).
     Additionally, the brain is parallel and reorganizing, while computers are serial and static.
     The brain is fault tolerant: neurons can be destroyed.

  6. Observations from neuroscience
     Neuroscientists view brains as a web of clues to the biological mechanisms of cognition.
     Engineers view the brain as an example solution to the problem of cognitive computing.

  7. Applications
     supervised learning: regression and classification
     associative memory
     optimization: grammatical induction (aka grammatical inference), e.g. in natural language processing
     noise filtering
     simulation of biological brains

  8. Artificial Neural Networks
     "The neural network" does not exist. There are different paradigms for neural networks, for how they are trained, and for where they are used.
     Artificial neuron: each input is multiplied by a weighting factor. The output is 1 if the sum of weighted inputs exceeds the threshold value; 0 otherwise.
     A network is programmed by adjusting weights using feedback from examples.

  9. McCulloch–Pitts "unit" (1943)
     Output is a function of weighted inputs:
       a_i = g(in_i) = g( ∑_j W_{j,i} a_j )
     [Figure: a single unit; input links carry activations a_j weighted by W_{j,i}, plus a bias weight W_{0,i} on the fixed input a_0 = -1; the input function sums them into in_i, the activation function g produces the output a_i on the output links]
     A gross oversimplification of real neurons, but its purpose is to develop an understanding of what networks of simple units can do.
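
     A minimal sketch of the unit computation above (my own illustration, not from the slides), assuming NumPy and leaving the activation function g generic:

        import numpy as np

        def unit_output(weights, activations, g):
            # a_i = g(in_i) = g(sum_j W_{j,i} * a_j); activations include the
            # fixed bias input a_0 = -1, weights include the bias weight W_{0,i}
            in_i = np.dot(weights, activations)
            return g(in_i)

        step = lambda x: 1.0 if x > 0 else 0.0   # hard threshold as g
        a = np.array([-1.0, 0.0, 1.0])           # a_0 = -1 (bias), a_1, a_2
        W = np.array([0.5, 1.0, 1.0])            # W_{0,i} = 0.5, then W_{1,i}, W_{2,i}
        print(unit_output(W, a, step))           # 1.0, since -0.5 + 0 + 1 > 0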

  10. Activation functions
     Non-linear activation functions g(in_i):
     [Figure: two plots of g(in_i) against in_i, panels (a) and (b)]
     (a) is a step function or threshold function (mostly used in theoretical studies)
     (b) is a continuous activation function, e.g. the sigmoid function 1/(1 + e^{-x}) (mostly used in practical applications)
     Changing the bias weight W_{0,i} moves the threshold location.
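
     Both activation functions can be written down directly; a small sketch (function names are mine):

        import numpy as np

        def step(in_i):
            # (a) threshold function: 1 once the weighted input exceeds 0
            return np.where(in_i > 0, 1.0, 0.0)

        def sigmoid(in_i):
            # (b) continuous activation: 1 / (1 + e^(-x)), squashes into (0, 1)
            return 1.0 / (1.0 + np.exp(-in_i))

        x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
        print(step(x))      # [0. 0. 0. 1. 1.]
        print(sigmoid(x))   # approx [0.018 0.269 0.5   0.731 0.982]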

  11. Implementing logical functions
     AND: W_0 = 1.5, W_1 = 1,  W_2 = 1
     OR:  W_0 = 0.5, W_1 = 1,  W_2 = 1
     NOT: W_0 = -0.5, W_1 = -1
     McCulloch and Pitts: every (basic) Boolean function can be implemented (eventually by connecting a large number of units in networks, possibly recurrent, of arbitrary depth).
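
     A quick check (my own sketch, not from the slides) that the weights above implement AND, OR and NOT when plugged into a threshold unit with the fixed bias input a_0 = -1:

        def threshold_unit(w0, weights, inputs):
            # McCulloch-Pitts unit: output 1 iff sum_j W_j a_j - W_0 > 0
            total = sum(w * a for w, a in zip(weights, inputs)) - w0
            return 1 if total > 0 else 0

        # weights taken from the slide; W_0 acts as the threshold via a_0 = -1
        AND = lambda a, b: threshold_unit(1.5, [1, 1], [a, b])
        OR  = lambda a, b: threshold_unit(0.5, [1, 1], [a, b])
        NOT = lambda a:    threshold_unit(-0.5, [-1], [a])

        print([AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
        print([OR(a, b)  for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 1]
        print([NOT(a) for a in (0, 1)])                                  # [1, 0]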

  12. Network structures
     Architecture: definition of the number of nodes, the interconnection structure, and the activation functions g, but not the weights.
     Feed-forward networks: no cycles in the connection graph
       single-layer perceptrons (no hidden layers)
       multi-layer perceptrons (one or more hidden layers)
     Feed-forward networks implement functions and have no internal state.
     Recurrent networks:
       Hopfield networks have symmetric weights (W_{i,j} = W_{j,i}), g(x) = sign(x), a_i ∈ {1, 0}; associative memory
       recurrent neural nets have directed cycles with delays ⇒ they have internal state (like flip-flops), can oscillate, etc.

  13. Use
     Neural networks are used in classification and regression.
     Boolean classification:
       value over 0.5: one class
       value below 0.5: other class
     k-way classification:
       divide the single output into k portions, or
       use k separate output units
     Continuous output:
       identity activation function in the output unit

  14. Single-layer NN (perceptrons)
     [Figure: input units connected directly to output units by weights W_{j,i}; 3-D plot of the perceptron output as a function of the inputs x_1 and x_2]
     Output units all operate separately; no shared weights.
     Adjusting weights moves the location, orientation, and steepness of the cliff.

  15. Expressiveness of perceptrons
     Consider a perceptron with g = step function (Rosenblatt, 1957, 1960).
     The output is 1 when:
       ∑_j W_j x_j > 0   or   W · x > 0
     Hence, it represents a linear separator in input space:
       a hyperplane in multidimensional space
       a line in 2 dimensions
     Minsky & Papert (1969) pricked the neural network balloon.

  16. Perceptron learning
     Learn by adjusting weights to reduce error on the training set.
     The squared error for an example with input x and true output y is
       E = (1/2) Err² ≡ (1/2) (y − h_W(x))²
     Find local optima for the minimization of the function E(W) in the vector of variables W by gradient methods. Note that the function E depends on the constant values x that are the inputs to the perceptron.
     The function E depends on h, which is non-convex; hence the optimization problem cannot be solved just by solving ∇E(W) = 0.

  17. Digression: Gradient methods
     Gradient methods are iterative approaches:
       find a descent direction with respect to the objective function E
       move W in that direction by a step size
     The descent direction can be computed by various methods, such as gradient descent, the Newton-Raphson method, and others. The step size can be computed either exactly or loosely by solving a line search problem.
     Example: gradient descent
       1. Set the iteration counter t = 0 and make an initial guess W_0 for the minimum
       2. Repeat:
       3.   Compute a descent direction p_t = ∇E(W_t)
       4.   Choose α_t to minimize f(α) = E(W_t − α p_t) over α ∈ R⁺
       5.   Update W_{t+1} = W_t − α_t p_t, and t = t + 1
       6. Until ‖∇E(W_t)‖ < tolerance
     Step 4 can be solved 'loosely' by taking a fixed, small enough value α > 0.
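
     A minimal sketch of the loop above with a fixed step size α in place of the exact line search (the quadratic objective used for the demonstration is made up):

        import numpy as np

        def gradient_descent(grad_E, W0, alpha=0.1, tol=1e-6, max_iter=10_000):
            # steps 1-6 with a fixed step: W_{t+1} = W_t - alpha * grad_E(W_t)
            W = np.asarray(W0, dtype=float)
            for t in range(max_iter):
                p = grad_E(W)                  # step 3: descent direction = gradient
                if np.linalg.norm(p) < tol:    # step 6: stopping criterion
                    break
                W = W - alpha * p              # step 5: move against the gradient
            return W

        # illustrative quadratic E(W) = ||W - (1, -2)||^2, minimized at (1, -2)
        grad = lambda W: 2.0 * (W - np.array([1.0, -2.0]))
        print(gradient_descent(grad, W0=[0.0, 0.0]))   # approx [ 1. -2.]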

  18. Perceptron learning
     In the specific case of the perceptron, the descent direction is computed by the gradient:
       ∂E/∂W_j = Err · ∂Err/∂W_j = Err · ∂/∂W_j ( y − g(∑_{j=0}^{n} W_j x_j) ) = −Err · g′(in) · x_j
     and the weight update rule (perceptron learning rule) in step 5 becomes:
       W_j^{t+1} = W_j^t + α · Err · g′(in) · x_j
     For the threshold perceptron, g′(in) is undefined: the original perceptron learning rule (Rosenblatt, 1957) simply omits g′(in).

  19. Perceptron learning contd.
     function Perceptron-Learning(examples, network) returns perceptron weights
       inputs: examples, a set of examples, each with input x = x_1, ..., x_n and output y
               network, a perceptron with weights W_j, j = 0, ..., n, and activation function g
       repeat
         for each e in examples do
           in ← ∑_{j=0}^{n} W_j x_j[e]
           Err ← y[e] − g(in)
           W_j ← W_j + α · Err · g′(in) · x_j[e]
         end
       until all examples correctly predicted or stopping criterion is reached
       return network
     The perceptron learning rule converges to a consistent function for any linearly separable data set.
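
     A runnable Python sketch of Perceptron-Learning (my own, not from the slides), assuming a sigmoid unit for g and a fixed number of passes as the stopping criterion:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def perceptron_learning(X, y, alpha=0.5, epochs=2000):
            # X: (m, n) inputs, y: (m,) targets in {0, 1}; returns weights W_0..W_n
            m, n = X.shape
            Xb = np.hstack([-np.ones((m, 1)), X])     # prepend the bias input a_0 = -1
            W = np.zeros(n + 1)
            for _ in range(epochs):
                for x_e, y_e in zip(Xb, y):
                    in_ = W @ x_e
                    err = y_e - sigmoid(in_)          # Err <- y[e] - g(in)
                    g_prime = sigmoid(in_) * (1.0 - sigmoid(in_))
                    W += alpha * err * g_prime * x_e  # W_j <- W_j + alpha * Err * g'(in) * x_j[e]
            return W

        # e.g. learn OR, which is linearly separable
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
        y = np.array([0.0, 1.0, 1.0, 1.0])
        W = perceptron_learning(X, y)
        Xb = np.hstack([-np.ones((4, 1)), X])
        print((sigmoid(Xb @ W) > 0.5).astype(int))    # [0 1 1 1]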

  20. Numerical Example
     The (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables petal length and width, respectively, for 50 flowers from each of 2 species of iris. The species are "Iris setosa" and "versicolor".
     > head(iris.data)
         Sepal.Length Sepal.Width    Species id
      6           5.4         3.9     setosa -1
      4           4.6         3.1     setosa -1
      84          6.0         2.7 versicolor  1
      31          4.8         3.1     setosa -1
      77          6.8         2.8 versicolor  1
      15          5.8         4.0     setosa -1
     [Figure: "Petal Dimensions in Iris Blossoms", a scatter plot of Width against Length; S = Setosa petals, V = Versicolor petals]
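
     The same experiment can be sketched in Python (the slide uses R; loading via scikit-learn, the two sepal columns shown in the printed table, and the Rosenblatt threshold rule that omits g′(in) are my assumptions):

        import numpy as np
        from sklearn.datasets import load_iris

        iris = load_iris()
        mask = iris.target < 2                        # keep setosa (0) and versicolor (1) only
        X = iris.data[mask][:, :2]                    # sepal length, sepal width (cm)
        y = np.where(iris.target[mask] == 0, -1, 1)   # id: setosa = -1, versicolor = 1

        Xb = np.hstack([-np.ones((len(X), 1)), X])    # bias input a_0 = -1
        W = np.zeros(Xb.shape[1])
        alpha = 0.1
        for _ in range(100):                          # threshold rule: g'(in) omitted
            for x_e, y_e in zip(Xb, y):
                err = y_e - np.sign(W @ x_e)
                W += alpha * err * x_e

        print(np.mean(np.sign(Xb @ W) == y))          # fraction correctly classified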
