Introduction to Neural Networks — Philipp Koehn, 24 September 2020


  1. Introduction to Neural Networks
     Philipp Koehn — 24 September 2020

  2. Linear Models
     • We used before a weighted linear combination of feature values h_j and weights λ_j:
       score(λ, d_i) = Σ_j λ_j h_j(d_i)
     • Such models can be illustrated as a "network"
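
A minimal sketch of such a linear scoring model in Python; the particular weights and feature values are made up for illustration, only the formula comes from the slide:

```python
# weighted linear combination: score(λ, d_i) = Σ_j λ_j * h_j(d_i)
def score(weights, feature_values):
    return sum(w * h for w, h in zip(weights, feature_values))

# illustrative numbers only
weights = [0.5, -1.2, 2.0]        # λ_j
features = [1.0, 0.3, 0.7]        # h_j(d_i) for some candidate d_i
print(score(weights, features))   # 0.5 - 0.36 + 1.4 = 1.54
```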

  3. Limits of Linearity
     • We can give each feature a weight
     • But we cannot express more complex value relationships, e.g.:
       – any value in the range [0;5] is equally good
       – values over 8 are bad
       – higher than 10 is not worse

  4. XOR
     • Linear models cannot model XOR
       [Figure: the four binary input points plotted in the plane, labeled good/bad in the XOR pattern — no single line separates the good points from the bad ones]

  5. Multiple Layers
     • Add an intermediate ("hidden") layer of processing (each arrow is a weight)
       [Figure: network with input layer x, hidden layer h, and output layer y]

  6. • Have we gained anything so far?

  7. Non-Linearity
     • Instead of computing a linear combination
       score(λ, d_i) = Σ_j λ_j h_j(d_i)
     • Add a non-linear function
       score(λ, d_i) = f( Σ_j λ_j h_j(d_i) )
     • Popular choices: tanh(x),  sigmoid(x) = 1 / (1 + e^(−x)),  relu(x) = max(0, x)
       (sigmoid is also called the "logistic function")
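
A minimal sketch of these three activation functions in Python (NumPy is assumed purely for convenience):

```python
import numpy as np

def sigmoid(x):
    # logistic function: squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes values into (-1, 1)
    return np.tanh(x)

def relu(x):
    # rectified linear unit: zero for negative inputs, identity otherwise
    return np.maximum(0.0, x)
```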

  8. Deep Learning
     • More layers = deep learning

  9. What Depth Holds
     • Each layer is a processing step
     • Having multiple processing steps allows complex functions
     • Metaphor: NN and computing circuits
       – computer = sequence of Boolean gates
       – neural computer = sequence of layers
     • Deep neural networks can implement complex functions, e.g., sorting on input values

  10. example

  11. Simple Neural Network
      [Figure: a small feed-forward network with two input units, two hidden units, one output unit, and bias units; each arrow is labeled with its weight (e.g. 3.7, 2.9, 4.5, −5.2, −2.0)]
      • One innovation: bias units (no inputs, always value 1)

  12. Sample Input
      [Figure: the same network, with input values 1.0 and 0.0 shown at the input units]
      • Try out two input values
      • Hidden unit computation:
        sigmoid(1.0 × 3.7 + 0.0 × 3.7 + 1 × −1.5) = sigmoid(2.2) = 1 / (1 + e^(−2.2)) = 0.90
        sigmoid(1.0 × 2.9 + 0.0 × 2.9 + 1 × −4.5) = sigmoid(−1.6) = 1 / (1 + e^(1.6)) = 0.17

  13. Computed Hidden
      [Figure: the same network, now with hidden unit values 0.90 and 0.17 filled in]
      • Try out two input values
      • Hidden unit computation:
        sigmoid(1.0 × 3.7 + 0.0 × 3.7 + 1 × −1.5) = sigmoid(2.2) = 1 / (1 + e^(−2.2)) = 0.90
        sigmoid(1.0 × 2.9 + 0.0 × 2.9 + 1 × −4.5) = sigmoid(−1.6) = 1 / (1 + e^(1.6)) = 0.17

  14. Compute Output
      [Figure: the same network, with hidden values 0.90 and 0.17]
      • Output unit computation:
        sigmoid(0.90 × 4.5 + 0.17 × −5.2 + 1 × −2.0) = sigmoid(1.17) = 1 / (1 + e^(−1.17)) = 0.76

  15. Computed Output
      [Figure: the same network, now with output value 0.76 filled in]
      • Output unit computation:
        sigmoid(0.90 × 4.5 + 0.17 × −5.2 + 1 × −2.0) = sigmoid(1.17) = 1 / (1 + e^(−1.17)) = 0.76

  16. Output for all Binary Inputs

      Input x0 | Input x1 | Hidden h0 | Hidden h1 | Output y
      ---------|----------|-----------|-----------|----------
         0     |    0     |   0.12    |   0.02    | 0.18 → 0
         0     |    1     |   0.88    |   0.27    | 0.74 → 1
         1     |    0     |   0.73    |   0.12    | 0.74 → 1
         1     |    1     |   0.99    |   0.73    | 0.33 → 0

      • Network implements XOR
        – hidden node h0 is OR
        – hidden node h1 is AND
        – final layer operation is h0 − h1 (the output fires when h0 is on and h1 is off)
      • Power of deep neural networks: chaining of processing steps,
        just as more Boolean circuits make more complex computations possible
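
A short Python sketch of this forward pass. The weights are read off the worked example above (the biases −1.5, −4.5, −2.0 are implied by the slide's computations); it reproduces the 0.90 / 0.17 / 0.76 example and then evaluates all binary inputs. The table above was produced with the slide's own exact weights, so the enumerated values here may differ slightly from it:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# weights read off the worked example on the slides
W_hidden = np.array([[3.7, 3.7, -1.5],     # weights into h0: from x0, x1, bias
                     [2.9, 2.9, -4.5]])    # weights into h1: from x0, x1, bias
w_output = np.array([4.5, -5.2, -2.0])     # weights into y: from h0, h1, bias

def forward(x0, x1):
    x = np.array([x0, x1, 1.0])                 # append the bias unit (always 1)
    h = sigmoid(W_hidden @ x)                   # hidden layer
    y = sigmoid(w_output @ np.append(h, 1.0))   # output layer
    return h, y

# reproduces the worked example: h ≈ (0.90, 0.17), y ≈ 0.76
print(forward(1.0, 0.0))

# behaviour on all binary inputs (an XOR-like pattern)
for x0 in (0, 1):
    for x1 in (0, 1):
        h, y = forward(x0, x1)
        print(x0, x1, np.round(h, 2), round(float(y), 2))
```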

  17. why "neural" networks?

  18. Neuron in the Brain
      • The human brain is made up of about 100 billion neurons
        [Figure: a neuron, with dendrites, soma, nucleus, axon, and axon terminals labeled]
      • Neurons receive electric signals at the dendrites and send them to the axon

  19. Neural Communication
      • The axon of the neuron is connected to the dendrites of many other neurons
        [Figure: a synapse, with axon terminal, synaptic vesicles, neurotransmitters and their transporters, voltage-gated Ca++ channel, synaptic cleft, receptors, postsynaptic density, and dendrite labeled]

  20. The Brain vs. Artificial Neural Networks
      • Similarities
        – neurons, connections between neurons
        – learning = change of connections, not change of neurons
        – massive parallel processing
      • But artificial neural networks are much simpler
        – computation within neuron vastly simplified
        – discrete time steps
        – typically some form of supervised learning with a massive number of stimuli

  21. back-propagation training

  22. Error
      [Figure: the example network with its computed values]
      • Computed output: y = 0.76
      • Correct output: t = 1.0
      ⇒ How do we adjust the weights?

  23. Key Concepts
      • Gradient descent
        – error is a function of the weights
        – we want to reduce the error
        – gradient descent: move towards the error minimum
        – compute gradient → get direction to the error minimum
        – adjust weights towards direction of lower error
      • Back-propagation
        – first adjust last set of weights
        – propagate error back to each previous layer
        – adjust their weights
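
A minimal sketch of the gradient-descent idea on its own, independent of the network above; the toy error function and learning rate are illustrative assumptions, not from the slides:

```python
def gradient_descent(grad, w, learning_rate=0.1, steps=100):
    """Repeatedly move the weight vector in the direction of lower error."""
    for _ in range(steps):
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad(w))]
    return w

# toy error(w) = (w0 - 3)^2 + (w1 + 1)^2, so gradient = (2(w0 - 3), 2(w1 + 1))
grad = lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)]
print(gradient_descent(grad, [0.0, 0.0]))   # converges towards the minimum (3, -1)
```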

  24. Gradient Descent
      [Figure: error(λ) plotted against a single weight λ, with the gradient at the current λ pointing towards the optimal λ]

  25. Gradient Descent
      [Figure: error surface over two weights, showing the current point, the gradient for w1, the gradient for w2, the combined gradient, and the optimum]

  26. Derivative of Sigmoid
      • Sigmoid: sigmoid(x) = 1 / (1 + e^(−x))
      • Reminder: quotient rule  ( f(x) / g(x) )′ = ( g(x) f′(x) − f(x) g′(x) ) / g(x)²
      • Derivative:
        d sigmoid(x) / dx = d/dx [ 1 / (1 + e^(−x)) ]
                          = ( 0 × (1 + e^(−x)) − 1 × (−e^(−x)) ) / (1 + e^(−x))²
                          = ( 1 / (1 + e^(−x)) ) × ( e^(−x) / (1 + e^(−x)) )
                          = ( 1 / (1 + e^(−x)) ) × ( 1 − 1 / (1 + e^(−x)) )
                          = sigmoid(x) (1 − sigmoid(x))
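
A quick numerical sanity check of this identity; the finite-difference comparison is an assumption of the sketch, not part of the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.7
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)   # finite-difference slope
analytic = sigmoid(x) * (1.0 - sigmoid(x))                    # identity from the slide
print(numeric, analytic)   # both ≈ 0.2217
```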

  27. Final Layer Update
      • Linear combination of weights: s = Σ_k w_k h_k
      • Activation function: y = sigmoid(s)
      • Error (L2 norm): E = ½ (t − y)²
      • Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k

  28. Final Layer Update (1)
      • Linear combination of weights: s = Σ_k w_k h_k
      • Activation function: y = sigmoid(s)
      • Error (L2 norm): E = ½ (t − y)²
      • Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k
      • Error E is defined with respect to y:
        dE/dy = d/dy [ ½ (t − y)² ] = −(t − y)

  29. Final Layer Update (2)
      • Linear combination of weights: s = Σ_k w_k h_k
      • Activation function: y = sigmoid(s)
      • Error (L2 norm): E = ½ (t − y)²
      • Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k
      • y with respect to s is sigmoid(s):
        dy/ds = d sigmoid(s)/ds = sigmoid(s) (1 − sigmoid(s)) = y (1 − y)

  30. Final Layer Update (3)
      • Linear combination of weights: s = Σ_k w_k h_k
      • Activation function: y = sigmoid(s)
      • Error (L2 norm): E = ½ (t − y)²
      • Derivative of error with regard to one weight w_k:
        dE/dw_k = dE/dy · dy/ds · ds/dw_k
      • s is the weighted linear combination of hidden node values h_k:
        ds/dw_k = d/dw_k [ Σ_k w_k h_k ] = h_k
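
Putting the three factors together gives dE/dw_k = −(t − y) · y(1 − y) · h_k. A minimal sketch of one gradient-descent update for the output-layer weights, using the hidden values and target from the worked example; the learning rate of 0.1 is an illustrative assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = np.array([0.90, 0.17, 1.0])      # hidden values from the example, plus bias unit
w = np.array([4.5, -5.2, -2.0])      # output-layer weights from the example
t = 1.0                              # correct output from the slides

s = w @ h                            # linear combination
y = sigmoid(s)                       # computed output, ≈ 0.76

# dE/dw_k = dE/dy * dy/ds * ds/dw_k = -(t - y) * y * (1 - y) * h_k
grad = -(t - y) * y * (1.0 - y) * h

learning_rate = 0.1                  # illustrative choice, not from the slides
w_new = w - learning_rate * grad     # move against the gradient to reduce the error
print(grad, w_new)
```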
