SLIDE 1

Neural Networks

Philipp Koehn 14 April 2020

SLIDE 2

Supervised Learning

  • Examples described by attribute values (Boolean, discrete, continuous, etc.)
  • E.g., situations where I will/won’t wait for a table:

Example  Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1       T    F    F    T    Some  $$$    F     T    French   0–10   T
X2       T    F    F    T    Full  $      F     F    Thai     30–60  F
X3       F    T    F    F    Some  $      F     F    Burger   0–10   T
X4       T    F    T    T    Full  $      F     F    Thai     10–30  T
X5       T    F    T    F    Full  $$$    F     T    French   >60    F
X6       F    T    F    T    Some  $$     T     T    Italian  0–10   T
X7       F    T    F    F    None  $      T     F    Burger   0–10   F
X8       F    F    F    T    Some  $$     T     T    Thai     0–10   T
X9       F    T    T    F    Full  $      T     F    Burger   >60    F
X10      T    T    T    T    Full  $$$    F     T    Italian  10–30  F
X11      F    F    F    F    None  $      F     F    Thai     0–10   F
X12      T    T    T    T    Full  $      F     F    Burger   30–60  T

(Alt through Est are the attributes; WillWait is the target)

  • Classification of examples is positive (T) or negative (F)

SLIDE 3

Naive Bayes Models

  • Bayes rule

$$p(C \mid A) = \frac{1}{Z}\, p(A \mid C)\, p(C)$$

  • Independence assumption

$$p(A \mid C) = p(a_1, a_2, a_3, \dots, a_n \mid C) \simeq \prod_i p(a_i \mid C)$$

  • Weights

$$p(A \mid C) = \prod_i p(a_i \mid C)^{\lambda_i}$$

SLIDE 4

Naive Bayes Models

  • Linear model

$$p(A \mid C) = \prod_i p(a_i \mid C)^{\lambda_i} = \exp \sum_i \lambda_i \log p(a_i \mid C)$$

  • Probability distribution as features

$$h_i(A, C) = \log p(a_i \mid C) \qquad h_0(A, C) = \log p(C)$$

  • Linear model with features

$$p(C \mid A) \propto \exp \sum_i \lambda_i\, h_i(A, C)$$

SLIDE 5

Linear Model

  • Weighted linear combination of feature values $h_j$ and weights $\lambda_j$ for example $d_i$

$$\text{score}(\lambda, d_i) = \sum_j \lambda_j\, h_j(d_i)$$

  • Such models can be illustrated as a "network"

SLIDE 6

Limits of Linearity

  • We can give each feature a weight
  • But we cannot model more complex value relationships, e.g.,

– any value in the range [0;5] is equally good
– values over 8 are bad
– higher than 10 is not worse

SLIDE 7

XOR

  • Linear models cannot model XOR

(figure: the four XOR points, labeled bad/good/good/bad, cannot be separated by a single line)

SLIDE 8

Multiple Layers

  • Add an intermediate ("hidden") layer of processing

(figure: network with an added hidden layer; each arrow is a weight)

  • Have we gained anything so far?

SLIDE 9

Non-Linearity

  • Instead of computing a linear combination

$$\text{score}(\lambda, d_i) = \sum_j \lambda_j\, h_j(d_i)$$

  • Add a non-linear function

$$\text{score}(\lambda, d_i) = f\Big(\sum_j \lambda_j\, h_j(d_i)\Big)$$

  • Popular choices

$$\tanh(x) \qquad \text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

(figure: plots of tanh and sigmoid)

(sigmoid is also called the "logistic function")

SLIDE 10

Deep Learning

  • More layers = deep learning

SLIDE 11

example

SLIDE 12

Simple Neural Network

(figure: feed-forward network with input nodes A, B and bias unit C, hidden nodes D, E and bias unit F, output node G; weights A,B → D: 3.7, 3.7 with bias −1.5; A,B → E: 2.9, 2.9 with bias −4.5; D,E → G: 4.5, −5.2 with bias −2.0)

  • One innovation: bias units (no inputs, always value 1)

SLIDE 13

Sample Input

(figure: the network with input values A = 1.0, B = 0.0)

  • Try out two input values
  • Hidden unit computation

$$\text{sigmoid}(1.0 \times 3.7 + 0.0 \times 3.7 + 1 \times -1.5) = \text{sigmoid}(2.2) = \frac{1}{1 + e^{-2.2}} = 0.90$$
$$\text{sigmoid}(1.0 \times 2.9 + 0.0 \times 2.9 + 1 \times -4.5) = \text{sigmoid}(-1.6) = \frac{1}{1 + e^{1.6}} = 0.17$$

SLIDE 14

Computed Hidden

(figure: the network with computed hidden values D = .90, E = .17)

  • Try out two input values
  • Hidden unit computation

$$\text{sigmoid}(1.0 \times 3.7 + 0.0 \times 3.7 + 1 \times -1.5) = \text{sigmoid}(2.2) = \frac{1}{1 + e^{-2.2}} = 0.90$$
$$\text{sigmoid}(1.0 \times 2.9 + 0.0 \times 2.9 + 1 \times -4.5) = \text{sigmoid}(-1.6) = \frac{1}{1 + e^{1.6}} = 0.17$$

SLIDE 15

Compute Output

(figure: the network with hidden values .90, .17 and the bias unit feeding output node G)

  • Output unit computation

$$\text{sigmoid}(.90 \times 4.5 + .17 \times -5.2 + 1 \times -2.0) = \text{sigmoid}(1.17) = \frac{1}{1 + e^{-1.17}} = 0.76$$

SLIDE 16

Computed Output

(figure: the network with computed output G = .76)

  • Output unit computation

$$\text{sigmoid}(.90 \times 4.5 + .17 \times -5.2 + 1 \times -2.0) = \text{sigmoid}(1.17) = \frac{1}{1 + e^{-1.17}} = 0.76$$

SLIDE 17

why "neural" networks?

SLIDE 18

Neuron in the Brain

  • The human brain is made up of about 100 billion neurons

(figure: neuron diagram labeling soma, nucleus, dendrites, axon, and axon terminal)

  • Neurons receive electric signals at the dendrites and send them to the axon

SLIDE 19

The Brain vs. Artificial Neural Networks

  • Similarities

– Neurons, connections between neurons
– Learning = change of connections, not change of neurons
– Massive parallel processing

  • But artificial neural networks are much simpler

– computation within a neuron is vastly simplified
– discrete time steps
– typically some form of supervised learning with a massive number of stimuli

SLIDE 20

back-propagation training

SLIDE 21

Error

(figure: the network with input 1.0, 0.0, hidden values .90, .17, and computed output .76)

  • Computed output: y = .76
  • Correct output: t = 1.0

⇒ How do we adjust the weights?

SLIDE 22

Key Concepts

  • Gradient descent

– error is a function of the weights
– we want to reduce the error
– gradient descent: move towards the error minimum
– compute gradient → get direction to the error minimum
– adjust weights towards direction of lower error

  • Back-propagation

– first adjust last set of weights
– propagate error back to each previous layer
– adjust their weights

SLIDE 23

Gradient Descent

(figure: error(λ) curve; the gradient at the current λ points the way downhill towards the optimal λ)

SLIDE 24

Gradient Descent

(figure: two-dimensional error surface; the gradients for w1 and w2 combine into one gradient pointing from the current point towards the optimum)

SLIDE 25

Derivative of Sigmoid

  • Sigmoid

$$\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

  • Reminder: quotient rule

$$\left(\frac{f(x)}{g(x)}\right)' = \frac{g(x) f'(x) - f(x) g'(x)}{g(x)^2}$$

  • Derivative

$$\frac{d\,\text{sigmoid}(x)}{dx} = \frac{d}{dx} \frac{1}{1 + e^{-x}} = \frac{0 \times (1 + e^{-x}) - (-e^{-x})}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \left(\frac{e^{-x}}{1 + e^{-x}}\right) = \frac{1}{1 + e^{-x}} \left(1 - \frac{1}{1 + e^{-x}}\right) = \text{sigmoid}(x)\big(1 - \text{sigmoid}(x)\big)$$

SLIDE 26

Final Layer Update

  • Linear combination of weights: $s = \sum_k w_k h_k$
  • Activation function: $y = \text{sigmoid}(s)$
  • Error (L2 norm): $E = \frac{1}{2}(t - y)^2$
  • Derivative of error with regard to one weight $w_k$

$$\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k}$$

SLIDE 27

Final Layer Update (1)

  • Linear combination of weights: $s = \sum_k w_k h_k$
  • Activation function: $y = \text{sigmoid}(s)$
  • Error (L2 norm): $E = \frac{1}{2}(t - y)^2$
  • Derivative of error with regard to one weight $w_k$

$$\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k}$$

  • Error $E$ is defined with respect to $y$

$$\frac{dE}{dy} = \frac{d}{dy}\, \frac{1}{2}(t - y)^2 = -(t - y)$$

SLIDE 28

Final Layer Update (2)

  • Linear combination of weights: $s = \sum_k w_k h_k$
  • Activation function: $y = \text{sigmoid}(s)$
  • Error (L2 norm): $E = \frac{1}{2}(t - y)^2$
  • Derivative of error with regard to one weight $w_k$

$$\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k}$$

  • $y$ with respect to $s$ is $\text{sigmoid}(s)$

$$\frac{dy}{ds} = \frac{d\,\text{sigmoid}(s)}{ds} = \text{sigmoid}(s)\big(1 - \text{sigmoid}(s)\big) = y(1 - y)$$

SLIDE 29

Final Layer Update (3)

  • Linear combination of weights: $s = \sum_k w_k h_k$
  • Activation function: $y = \text{sigmoid}(s)$
  • Error (L2 norm): $E = \frac{1}{2}(t - y)^2$
  • Derivative of error with regard to one weight $w_k$

$$\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k}$$

  • $s$ is the weighted linear combination of hidden node values $h_k$

$$\frac{ds}{dw_k} = \frac{d}{dw_k} \sum_k w_k h_k = h_k$$

SLIDE 30

Putting it All Together

  • Derivative of error with regard to one weight $w_k$

$$\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k} = -(t - y)\; y(1 - y)\; h_k$$

– $(t - y)$: error
– $y(1 - y)$: derivative of sigmoid, $y'$

  • Weight adjustment will be scaled by a fixed learning rate $\mu$

$$\Delta w_k = \mu\, (t - y)\, y'\, h_k$$

SLIDE 31

Multiple Output Nodes

  • Our example only had one output node
  • Typically neural networks have multiple output nodes
  • Error is computed over all j output nodes

$$E = \sum_j \frac{1}{2} (t_j - y_j)^2$$

  • Weights k → j are adjusted according to the node they point to

$$\Delta w_{j \leftarrow k} = \mu\, (t_j - y_j)\, y_j'\, h_k$$

SLIDE 32

Hidden Layer Update

  • In a hidden layer, we do not have a target output value
  • But we can compute how much each node contributed to downstream error
  • Definition of error term of each node

$$\delta_j = (t_j - y_j)\, y_j'$$

  • Back-propagate the error term

(why this way? there is math to back it up...)

$$\delta_i = \Big(\sum_j w_{j \leftarrow i}\, \delta_j\Big)\, y_i'$$

  • Universal update formula

$$\Delta w_{j \leftarrow k} = \mu\, \delta_j\, h_k$$

SLIDE 33

Our Example

(figure: the example network with nodes labeled A, B, C (input layer with bias), D, E, F (hidden layer with bias), G (output); input 1.0, 0.0, hidden values .90, .17, output .76)

  • Computed output: y = .76
  • Correct output: t = 1.0
  • Final layer weight updates (learning rate µ = 10)

– $\delta_G = (t - y)\, y' = (1 - .76) \times 0.181 = .0434$
– $\Delta w_{G \leftarrow D} = \mu\, \delta_G\, h_D = 10 \times .0434 \times .90 = .391$
– $\Delta w_{G \leftarrow E} = \mu\, \delta_G\, h_E = 10 \times .0434 \times .17 = .074$
– $\Delta w_{G \leftarrow F} = \mu\, \delta_G\, h_F = 10 \times .0434 \times 1 = .434$

SLIDE 34

Our Example

(figure: the same network with updated final-layer weights: D→G = 4.891, E→G = −5.126, F→G = −1.566)
  • Computed output: y = .76
  • Correct output: t = 1.0
  • Final layer weight updates (learning rate µ = 10)

– $\delta_G = (t - y)\, y' = (1 - .76) \times 0.181 = .0434$
– $\Delta w_{G \leftarrow D} = \mu\, \delta_G\, h_D = 10 \times .0434 \times .90 = .391$
– $\Delta w_{G \leftarrow E} = \mu\, \delta_G\, h_E = 10 \times .0434 \times .17 = .074$
– $\Delta w_{G \leftarrow F} = \mu\, \delta_G\, h_F = 10 \times .0434 \times 1 = .434$

SLIDE 35

Hidden Layer Updates

(figure: the network with updated final-layer weights 4.891, −5.126, −1.566)
  • Hidden node D

– $\delta_D = \big(\sum_j w_{j \leftarrow D}\, \delta_j\big)\, y_D' = w_{G \leftarrow D}\, \delta_G\, y_D' = 4.5 \times .0434 \times .0898 = .0175$
– $\Delta w_{D \leftarrow A} = \mu\, \delta_D\, h_A = 10 \times .0175 \times 1.0 = .175$
– $\Delta w_{D \leftarrow B} = \mu\, \delta_D\, h_B = 10 \times .0175 \times 0.0 = 0$
– $\Delta w_{D \leftarrow C} = \mu\, \delta_D\, h_C = 10 \times .0175 \times 1 = .175$

  • Hidden node E

– $\delta_E = \big(\sum_j w_{j \leftarrow E}\, \delta_j\big)\, y_E' = w_{G \leftarrow E}\, \delta_G\, y_E' = -5.2 \times .0434 \times 0.1411 = -.0318$
– $\Delta w_{E \leftarrow A} = \mu\, \delta_E\, h_A = 10 \times -.0318 \times 1.0 = -.318$
– etc.

SLIDE 36

Connectionist Semantic Cognition

  • Hidden layer representations for concepts and concept relationships

SLIDE 37

some additional aspects

SLIDE 38

Problems with Gradient Descent Training

(figure: error(λ) curve; a too high learning rate makes the updates overshoot the minimum)

SLIDE 39

Problems with Gradient Descent Training

(figure: error(λ) curve illustrating a bad initialization, far from the minimum)

SLIDE 40

Problems with Gradient Descent Training

(figure: error(λ) curve with a local optimum and the global optimum; gradient descent may get stuck in the local optimum)

SLIDE 41

Initialization of Weights

  • Weights are initialized randomly

e.g., uniformly from interval [−0.01,0.01]

  • Glorot and Bengio (2010) suggest

– for shallow neural networks: $\left[-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}\right]$, where $n$ is the size of the previous layer
– for deep neural networks: $\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}, \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]$, where $n_j$ is the size of the previous layer and $n_{j+1}$ the size of the next layer

SLIDE 42

Neural Networks for Classification

  • Predict class: one output node per class
  • Training data output: "one-hot vector", e.g., $\vec{y} = (0, 0, 1)^T$
  • Prediction

– predicted class is the output node $y_i$ with the highest value
– obtain a posterior probability distribution by softmax: $\text{softmax}(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$

SLIDE 43

Speedup: Momentum Term

  • Updates may move a weight slowly in one direction
  • To speed this up, we can keep a memory of prior updates

$$\Delta w_{j \leftarrow k}(n - 1)$$

  • ... and add these to any new updates (with decay factor ρ)

$$\Delta w_{j \leftarrow k}(n) = \mu\, \delta_j\, h_k + \rho\, \Delta w_{j \leftarrow k}(n - 1)$$

SLIDE 44

computational aspects

SLIDE 45

Vector and Matrix Multiplications

  • Forward computation: $\vec{s} = W \vec{h}$
  • Activation function: $\vec{y} = \text{sigmoid}(\vec{s})$
  • Error term: $\vec{\delta} = (\vec{t} - \vec{y}) \cdot \text{sigmoid}'(\vec{s})$
  • Propagation of error term: $\vec{\delta}_i = W \vec{\delta}_{i+1} \cdot \text{sigmoid}'(\vec{s})$
  • Weight updates: $\Delta W = \mu\, \vec{\delta}\, \vec{h}^T$

SLIDE 46

GPU

  • Neural network layers may have, say, 200 nodes
  • Computations such as $W \vec{h}$ require $200 \times 200 = 40{,}000$ multiplications

  • Graphics Processing Units (GPU) are designed for such computations

– image rendering requires such vector and matrix operations
– massively multi-core, but lean processing units
– example: NVIDIA Tesla K20c GPU provides 2496 thread processors

  • Extensions to C to support programming of GPUs, such as CUDA

SLIDE 47

Toolkits

  • TensorFlow (Google)
  • PyTorch (Facebook)
  • MXNet (Amazon)
