  1. Neural Networks (Philipp Koehn, 14 April 2020)

  2. Supervised Learning
     ● Examples described by attribute values (Boolean, discrete, continuous, etc.)
     ● E.g., situations where I will/won't wait for a table:

       Example | Alt | Bar | Fri | Hun | Pat  | Price | Rain | Res | Type    | Est   | WillWait
       X1      | T   | F   | F   | T   | Some | $$$   | F    | T   | French  | 0–10  | T
       X2      | T   | F   | F   | T   | Full | $     | F    | F   | Thai    | 30–60 | F
       X3      | F   | T   | F   | F   | Some | $     | F    | F   | Burger  | 0–10  | T
       X4      | T   | F   | T   | T   | Full | $     | F    | F   | Thai    | 10–30 | T
       X5      | T   | F   | T   | F   | Full | $$$   | F    | T   | French  | >60   | F
       X6      | F   | T   | F   | T   | Some | $$    | T    | T   | Italian | 0–10  | T
       X7      | F   | T   | F   | F   | None | $     | T    | F   | Burger  | 0–10  | F
       X8      | F   | F   | F   | T   | Some | $$    | T    | T   | Thai    | 0–10  | T
       X9      | F   | T   | T   | F   | Full | $     | T    | F   | Burger  | >60   | F
       X10     | T   | T   | T   | T   | Full | $$$   | F    | T   | Italian | 10–30 | F
       X11     | F   | F   | F   | F   | None | $     | F    | F   | Thai    | 0–10  | F
       X12     | T   | T   | T   | T   | Full | $     | F    | F   | Burger  | 30–60 | T

     ● Classification of examples is positive (T) or negative (F)

  3. Naive Bayes Models
     ● Bayes rule: p(C|A) = (1/Z) p(A|C) p(C)
     ● Independence assumption: p(A|C) = p(a_1, a_2, a_3, ..., a_n | C) ≃ ∏_i p(a_i|C)
     ● Weights: p(A|C) = ∏_i p(a_i|C)^λ_i

  4. Naive Bayes Models
     ● Linear model: p(A|C) = ∏_i p(a_i|C)^λ_i = exp ∑_i λ_i log p(a_i|C)
     ● Probability distributions as features: h_i(A,C) = log p(a_i|C), h_0(A,C) = log p(C)
     ● Linear model with features: p(C|A) ∝ ∑_i λ_i h_i(A,C)

  5. Linear Model
     ● Weighted linear combination of feature values h_j and weights λ_j for example d_i:
       score(λ, d_i) = ∑_j λ_j h_j(d_i)
     ● Such models can be illustrated as a "network"
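
     The score is just a dot product between the weight vector and the feature vector. A minimal sketch in Python (the feature values and weights below are made-up illustration numbers, not from the slides):

     ```python
     def score(weights, features):
         """Weighted linear combination: score(lambda, d_i) = sum_j lambda_j * h_j(d_i)."""
         return sum(w * h for w, h in zip(weights, features))

     # Hypothetical example: three feature values for one example d_i and their weights
     features = [1.0, 0.5, -2.0]   # h_1(d_i), h_2(d_i), h_3(d_i)
     weights  = [0.3, 1.2,  0.4]   # lambda_1, lambda_2, lambda_3
     print(score(weights, features))  # 0.3*1.0 + 1.2*0.5 + 0.4*(-2.0) = 0.1
     ```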

  6. Limits of Linearity
     ● We can give each feature a weight
     ● But we cannot model more complex relationships between values, e.g.,
       – any value in the range [0;5] is equally good
       – values over 8 are bad
       – higher than 10 is not worse

  7. XOR
     ● Linear models cannot model XOR
       [Figure: the four XOR input points on a 2×2 grid, labeled good, bad, bad, good]
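
     A small sanity check (a sketch added for illustration, not from the slides): exhaustively trying linear threshold classifiers on the four XOR points shows that none of them gets all four right.

     ```python
     import itertools

     # The four XOR inputs and their labels (1 = "good", 0 = "bad")
     points = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

     def linear_classifier(w1, w2, b):
         """Predict 1 if w1*x1 + w2*x2 + b > 0, else 0."""
         return lambda x1, x2: int(w1 * x1 + w2 * x2 + b > 0)

     # Grid-search a range of weights and biases; no setting classifies all four correctly
     best = 0
     for w1, w2, b in itertools.product([v / 2 for v in range(-10, 11)], repeat=3):
         clf = linear_classifier(w1, w2, b)
         correct = sum(clf(*x) == y for x, y in points.items())
         best = max(best, correct)
     print(best)  # 3: at most three of the four XOR points can be classified correctly
     ```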

  8. Multiple Layers
     ● Add an intermediate ("hidden") layer of processing (each arrow is a weight)
     ● Have we gained anything so far?

  9. Non-Linearity
     ● Instead of computing a linear combination
       score(λ, d_i) = ∑_j λ_j h_j(d_i)
     ● Add a non-linear function
       score(λ, d_i) = f(∑_j λ_j h_j(d_i))
     ● Popular choices: sigmoid(x) = 1 / (1 + e^(−x)) and tanh(x)
       (sigmoid is also called the "logistic function")
       [Figure: plots of the sigmoid and tanh curves]
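
     A minimal sketch of these two activation functions in Python (math.tanh is Python's built-in; the sigmoid follows the formula above):

     ```python
     import math

     def sigmoid(x):
         """Logistic function 1 / (1 + e^(-x)): squashes any input into (0, 1)."""
         return 1.0 / (1.0 + math.exp(-x))

     def tanh(x):
         """Hyperbolic tangent: squashes any input into (-1, 1)."""
         return math.tanh(x)

     print(sigmoid(0.0), tanh(0.0))   # 0.5 0.0
     print(sigmoid(2.2))              # ~0.90, matches the worked example below
     ```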

  10. Deep Learning
      ● More layers = deep learning

  11. example

  12. Simple Neural Network
      [Figure: a feed-forward network with two input units, two hidden units, one output unit, and bias units; input→hidden weights 3.7, 3.7, 2.9, 2.9, hidden→output weights 4.5, −5.2, and bias weights −1.5, −4.5, −2.0]
      ● One innovation: bias units (no inputs, always value 1)

  13. Sample Input
      [Figure: the same network with input values 1.0 and 0.0 applied]
      ● Try out two input values
      ● Hidden unit computation
        sigmoid(1.0 × 3.7 + 0.0 × 3.7 + 1 × −1.5) = sigmoid(2.2) = 1 / (1 + e^(−2.2)) = 0.90
        sigmoid(1.0 × 2.9 + 0.0 × 2.9 + 1 × −4.5) = sigmoid(−1.6) = 1 / (1 + e^(1.6)) = 0.17

  14. Computed Hidden
      [Figure: the network with computed hidden values 0.90 and 0.17]
      ● Try out two input values
      ● Hidden unit computation
        sigmoid(1.0 × 3.7 + 0.0 × 3.7 + 1 × −1.5) = sigmoid(2.2) = 1 / (1 + e^(−2.2)) = 0.90
        sigmoid(1.0 × 2.9 + 0.0 × 2.9 + 1 × −4.5) = sigmoid(−1.6) = 1 / (1 + e^(1.6)) = 0.17

  15. Compute Output
      [Figure: the network with hidden values 0.90 and 0.17 feeding into the output unit]
      ● Output unit computation
        sigmoid(0.90 × 4.5 + 0.17 × −5.2 + 1 × −2.0) = sigmoid(1.17) = 1 / (1 + e^(−1.17)) = 0.76

  16. Computed Output
      [Figure: the network with computed output value 0.76]
      ● Output unit computation
        sigmoid(0.90 × 4.5 + 0.17 × −5.2 + 1 × −2.0) = sigmoid(1.17) = 1 / (1 + e^(−1.17)) = 0.76
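
      The whole forward pass of slides 12 to 16 fits in a few lines. A sketch in Python, using the weights read off the example network:

      ```python
      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      # Weights of the example network (hidden biases -1.5 and -4.5, output bias -2.0)
      x = [1.0, 0.0]                      # the two input values
      w_hidden = [[3.7, 3.7, -1.5],       # weights into hidden unit 1 (last entry: bias)
                  [2.9, 2.9, -4.5]]       # weights into hidden unit 2 (last entry: bias)
      w_output = [4.5, -5.2, -2.0]        # weights into the output unit (last entry: bias)

      # Hidden layer: weighted sum of inputs plus bias, then sigmoid
      h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
      # Output layer: weighted sum of hidden values plus bias, then sigmoid
      y = sigmoid(w_output[0] * h[0] + w_output[1] * h[1] + w_output[2])

      print([round(v, 2) for v in h], round(y, 2))   # [0.9, 0.17] 0.76
      ```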

  17. why "neural" networks?

  18. Neuron in the Brain
      ● The human brain is made up of about 100 billion neurons
        [Figure: a neuron with its dendrites, soma, nucleus, axon, and axon terminals labeled]
      ● Neurons receive electric signals at the dendrites and send them to the axon

  19. The Brain vs. Artificial Neural Networks
      ● Similarities
        – Neurons, connections between neurons
        – Learning = change of connections, not change of neurons
        – Massively parallel processing
      ● But artificial neural networks are much simpler
        – computation within a neuron is vastly simplified
        – discrete time steps
        – typically some form of supervised learning with a massive number of stimuli

  20. back-propagation training

  21. Error
      [Figure: the example network with computed output 0.76]
      ● Computed output: y = 0.76
      ● Correct output: t = 1.0
      ⇒ How do we adjust the weights?
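
      Using the L2 error defined a few slides later, the error for this example works out to E = 1/2 (t − y)² = 1/2 (1.0 − 0.76)² ≈ 0.029.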

  22. Key Concepts
      ● Gradient descent
        – error is a function of the weights
        – we want to reduce the error
        – gradient descent: move towards the error minimum
        – compute gradient → get direction to the error minimum
        – adjust weights towards direction of lower error
      ● Back-propagation
        – first adjust last set of weights
        – propagate error back to each previous layer
        – adjust their weights

  23. Gradient Descent
      [Figure: error(λ) plotted over a single weight λ, with the gradient at the current λ pointing towards the optimal λ]

  24. Gradient Descent
      [Figure: error surface over two weights w1 and w2, showing the current point, the gradient for w1, the gradient for w2, the combined gradient, and the optimum]

  25. Derivative of Sigmoid
      ● Sigmoid: sigmoid(x) = 1 / (1 + e^(−x))
      ● Reminder: quotient rule
        (f(x) / g(x))′ = (f′(x) g(x) − f(x) g′(x)) / g(x)²
      ● Derivative
        d sigmoid(x)/dx = d/dx [ 1 / (1 + e^(−x)) ]
                        = ( 0 × (1 + e^(−x)) − 1 × (−e^(−x)) ) / (1 + e^(−x))²
                        = 1/(1 + e^(−x)) × e^(−x)/(1 + e^(−x))
                        = 1/(1 + e^(−x)) × (1 − 1/(1 + e^(−x)))
                        = sigmoid(x) (1 − sigmoid(x))
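
      As a quick numerical sanity check (a sketch, not part of the slides), the identity can be verified against a finite-difference approximation:

      ```python
      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      x, eps = 0.7, 1e-6
      numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)   # finite-difference slope
      analytic = sigmoid(x) * (1 - sigmoid(x))                      # sigmoid(x)(1 - sigmoid(x))
      print(numeric, analytic)   # both ~0.2217, agreeing to about 6 decimal places
      ```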

  26. Final Layer Update
      ● Linear combination of weights: s = ∑_k w_k h_k
      ● Activation function: y = sigmoid(s)
      ● Error (L2 norm): E = 1/2 (t − y)²
      ● Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k

  27. Final Layer Update (1)
      ● Linear combination of weights: s = ∑_k w_k h_k
      ● Activation function: y = sigmoid(s)
      ● Error (L2 norm): E = 1/2 (t − y)²
      ● Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k
      ● Error E is defined with respect to y:
        dE/dy = d/dy [ 1/2 (t − y)² ] = −(t − y)

  28. Final Layer Update (2)
      ● Linear combination of weights: s = ∑_k w_k h_k
      ● Activation function: y = sigmoid(s)
      ● Error (L2 norm): E = 1/2 (t − y)²
      ● Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k
      ● y with respect to s is sigmoid(s):
        dy/ds = d sigmoid(s)/ds = sigmoid(s)(1 − sigmoid(s)) = y(1 − y)

  29. Final Layer Update (3)
      ● Linear combination of weights: s = ∑_k w_k h_k
      ● Activation function: y = sigmoid(s)
      ● Error (L2 norm): E = 1/2 (t − y)²
      ● Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k
      ● s is the weighted linear combination of hidden node values h_k:
        ds/dw_k = d/dw_k ∑_k w_k h_k = h_k

  30. Putting it All Together
      ● Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k
                = −(t − y) · y(1 − y) · h_k
        with −(t − y) the error and y(1 − y) the derivative of the sigmoid (y′)
      ● Weight adjustment will be scaled by a fixed learning rate μ:
        Δw_k = μ (t − y) y′ h_k
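
      Plugging the numbers from the running example into this update rule gives concrete weight adjustments. A sketch in Python (the learning rate μ = 1.0 below is an arbitrary choice for illustration):

      ```python
      # Final-layer update for the running example: target t = 1.0, computed output y = 0.76
      t, y = 1.0, 0.76
      h = [0.90, 0.17, 1.0]    # hidden unit values; the last entry is the bias unit
      mu = 1.0                 # learning rate (arbitrary illustrative value)

      y_prime = y * (1 - y)    # derivative of the sigmoid at the output, y(1 - y)
      # Delta w_k = mu * (t - y) * y' * h_k  (moving against the gradient -(t - y) y' h_k)
      delta_w = [mu * (t - y) * y_prime * h_k for h_k in h]

      print([round(d, 4) for d in delta_w])   # approx [0.0394, 0.0074, 0.0438]
      ```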
