SLIDE 1

Introduction to Neural Networks

Philipp Koehn 24 September 2020

SLIDE 2

Linear Models

  • We previously used a weighted linear combination of feature values hj and weights λj

score(λ, di) = Σj λj hj(di)

  • Such models can be illustrated as a "network"
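As a quick illustration of the weighted linear combination above, a minimal Python sketch (the feature values and weights below are made-up, purely for illustration):

```python
# hypothetical feature values h_j(d_i) and weights lambda_j for one candidate d_i
feature_values = [0.2, 1.0, 3.5]   # h_1(d_i), h_2(d_i), h_3(d_i)
weights        = [1.5, -0.4, 0.3]  # lambda_1, lambda_2, lambda_3

# score(lambda, d_i) = sum_j lambda_j * h_j(d_i)
score = sum(l * h for l, h in zip(weights, feature_values))
print(score)  # 0.3 - 0.4 + 1.05 = 0.95
```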

SLIDE 3

Limits of Linearity

  • We can give each feature a weight
  • But not more complex value relationships, e.g.,

– any value in the range [0;5] is equally good
– values over 8 are bad
– higher than 10 is not worse

SLIDE 4

XOR

  • Linear models cannot model XOR

[Figure: the four XOR input points plotted in the plane, labeled bad, good, good, bad]

SLIDE 5

Multiple Layers

  • Add an intermediate ("hidden") layer of processing

[Figure: network with input layer x, hidden layer h, and output layer y; each arrow is a weight]

SLIDE 6

  • Have we gained anything so far?

SLIDE 7

Non-Linearity

  • Instead of computing a linear combination

score(λ, di) = Σj λj hj(di)

  • Add a non-linear function

score(λ, di) = f( Σj λj hj(di) )

  • Popular choices

tanh(x)
sigmoid(x) = 1 / (1 + e^(−x))
relu(x) = max(0, x)

(sigmoid is also called the "logistic function")
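These three activation functions can be written directly in Python; a small sketch using the standard math module:

```python
import math

def sigmoid(x):
    # logistic function: maps any real value into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # hyperbolic tangent: maps any real value into (-1, 1)
    return math.tanh(x)

def relu(x):
    # rectified linear unit: zero for negative inputs, identity otherwise
    return max(0.0, x)

print(sigmoid(2.2), tanh(2.2), relu(-1.0))
```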

SLIDE 8

Deep Learning

  • More layers = deep learning

SLIDE 9

What Depth Holds

  • Each layer is a processing step
  • Having multiple processing steps allows complex functions
  • Metaphor: NN and computing circuits

– computer = sequence of Boolean gates
– neural computer = sequence of layers

  • Deep neural networks can implement complex functions

e.g., sorting on input values

SLIDE 10

example

SLIDE 11

Simple Neural Network

[Figure: feed-forward network with two input nodes, one hidden layer of two nodes, and one output node, plus bias units (value 1) for the hidden and output layers. Input-to-hidden weights: 3.7, 3.7 and 2.9, 2.9 with bias weights −1.5 and −4.5; hidden-to-output weights: 4.5 and −5.2 with bias weight −2.0]

  • One innovation: bias units (no inputs, always value 1)

SLIDE 12

Sample Input

[Figure: the same network with input values 1.0 and 0.0 applied; bias units fixed at 1]

  • Try out two input values
  • Hidden unit computation

sigmoid(1.0 × 3.7 + 0.0 × 3.7 + 1 × −1.5) = sigmoid(2.2) = 1 / (1 + e^(−2.2)) = 0.90
sigmoid(1.0 × 2.9 + 0.0 × 2.9 + 1 × −4.5) = sigmoid(−1.6) = 1 / (1 + e^(1.6)) = 0.17

SLIDE 13

Computed Hidden

[Figure: the same network with computed hidden values 0.90 and 0.17]

  • Try out two input values
  • Hidden unit computation

sigmoid(1.0 × 3.7 + 0.0 × 3.7 + 1 × −1.5) = sigmoid(2.2) = 1 / (1 + e^(−2.2)) = 0.90
sigmoid(1.0 × 2.9 + 0.0 × 2.9 + 1 × −4.5) = sigmoid(−1.6) = 1 / (1 + e^(1.6)) = 0.17

SLIDE 14

Compute Output

[Figure: the same network with hidden values 0.90 and 0.17 feeding into the output node]

  • Output unit computation

sigmoid(0.90 × 4.5 + 0.17 × −5.2 + 1 × −2.0) = sigmoid(1.17) = 1 / (1 + e^(−1.17)) = 0.76

SLIDE 15

Computed Output

[Figure: the same network with computed output value 0.76]

  • Output unit computation

sigmoid(0.90 × 4.5 + 0.17 × −5.2 + 1 × −2.0) = sigmoid(1.17) = 1 / (1 + e^(−1.17)) = 0.76
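The forward pass on the last few slides can be reproduced with a few lines of Python; a minimal sketch using the weights of the example network (printed values match 0.90, 0.17 and 0.76 up to rounding):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# input values and bias unit
x0, x1, bias = 1.0, 0.0, 1.0

# hidden layer (weights from the example network)
h0 = sigmoid(x0 * 3.7 + x1 * 3.7 + bias * -1.5)   # sigmoid(2.2)  ~ 0.90
h1 = sigmoid(x0 * 2.9 + x1 * 2.9 + bias * -4.5)   # sigmoid(-1.6) ~ 0.17

# output layer
y = sigmoid(h0 * 4.5 + h1 * -5.2 + bias * -2.0)   # sigmoid(~1.18) ~ 0.76

print(round(h0, 2), round(h1, 2), round(y, 2))
```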

SLIDE 16

Output for all Binary Inputs

Input x0   Input x1   Hidden h0   Hidden h1   Output y0
   0          0          0.12        0.02      0.18 → 0
   0          1          0.88        0.27      0.74 → 1
   1          0          0.73        0.12      0.74 → 1
   1          1          0.99        0.73      0.33 → 0

  • Network implements XOR

– hidden node h0 is OR
– hidden node h1 is AND
– final layer operation is h0 − h1

  • Power of deep neural networks: chaining of processing steps

just as: more Boolean circuits → more complex computations possible

SLIDE 17

why "neural" networks?

SLIDE 18

Neuron in the Brain

  • The human brain is made up of about 100 billion neurons

[Figure: neuron anatomy: soma, nucleus, dendrites, axon, axon terminal]

  • Neurons receive electric signals at the dendrites and send them to the axon

SLIDE 19

Neural Communication

  • The axon of the neuron is connected to the dendrites of many other neurons

[Figure: synapse: axon terminal with synaptic vesicles, neurotransmitter and neurotransmitter transporter, voltage-gated Ca++ channel, synaptic cleft, receptor and postsynaptic density on the dendrite]

SLIDE 20

The Brain vs. Artificial Neural Networks

  • Similarities

– Neurons, connections between neurons
– Learning = change of connections, not change of neurons
– Massive parallel processing

  • But artificial neural networks are much simpler

– computation within neuron vastly simplified
– discrete time steps
– typically some form of supervised learning with massive number of stimuli

SLIDE 21

back-propagation training

SLIDE 22

Error

[Figure: the example network with hidden values 0.90 and 0.17 and computed output 0.76]

  • Computed output: y = .76
  • Correct output: t = 1.0

⇒ How do we adjust the weights?

SLIDE 23

Key Concepts

  • Gradient descent

– error is a function of the weights
– we want to reduce the error
– gradient descent: move towards the error minimum
– compute gradient → get direction to the error minimum
– adjust weights towards direction of lower error

  • Back-propagation

– first adjust last set of weights
– propagate error back to each previous layer
– adjust their weights
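To make the gradient descent idea above concrete, here is a minimal sketch on a made-up one-dimensional error function error(λ) = (λ − 3)², purely illustrative and not from the slides:

```python
# toy error function and its gradient
def error(lam):
    return (lam - 3.0) ** 2

def gradient(lam):
    return 2.0 * (lam - 3.0)

lam = 0.0            # arbitrary starting point
learning_rate = 0.1
for _ in range(50):
    lam -= learning_rate * gradient(lam)   # step towards lower error

print(round(lam, 3), round(error(lam), 6))  # lam approaches the optimum at 3.0
```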

SLIDE 24

Gradient Descent

[Figure: error(λ) plotted over a single weight λ; the gradient at the current λ points towards the optimal λ at the error minimum]

SLIDE 25

Gradient Descent

[Figure: error surface over two weights; the gradients for w1 and w2 combine into a single gradient pointing from the current point towards the optimum]

SLIDE 26

Derivative of Sigmoid

  • Sigmoid

sigmoid(x) = 1 / (1 + e^(−x))

  • Reminder: quotient rule

( f(x) / g(x) )′ = ( g(x) f′(x) − f(x) g′(x) ) / g(x)²

  • Derivative

d sigmoid(x)/dx = d/dx [ 1 / (1 + e^(−x)) ]
                = ( 0 × (1 + e^(−x)) − (−e^(−x)) ) / (1 + e^(−x))²
                = ( 1 / (1 + e^(−x)) ) × ( e^(−x) / (1 + e^(−x)) )
                = ( 1 / (1 + e^(−x)) ) × ( 1 − 1 / (1 + e^(−x)) )
                = sigmoid(x) (1 − sigmoid(x))
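A quick numerical sanity check of this derivative, using a finite-difference approximation (illustrative only, not part of the slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x, eps = 0.7, 1e-6
numeric  = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # finite difference
analytic = sigmoid(x) * (1 - sigmoid(x))                      # derived formula

print(round(numeric, 6), round(analytic, 6))  # both approximately 0.2217
```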

SLIDE 27

Final Layer Update

  • Linear combination of weights: s = Σk wk hk
  • Activation function: y = sigmoid(s)
  • Error (L2 norm): E = ½ (t − y)²
  • Derivative of error with regard to one weight wk

dE/dwk = dE/dy × dy/ds × ds/dwk

SLIDE 28

Final Layer Update (1)

  • Linear combination of weights: s = Σk wk hk
  • Activation function: y = sigmoid(s)
  • Error (L2 norm): E = ½ (t − y)²
  • Derivative of error with regard to one weight wk

dE/dwk = dE/dy × dy/ds × ds/dwk

  • Error E is defined with respect to y

dE/dy = d/dy [ ½ (t − y)² ] = −(t − y)

SLIDE 29

Final Layer Update (2)

  • Linear combination of weights: s = Σk wk hk
  • Activation function: y = sigmoid(s)
  • Error (L2 norm): E = ½ (t − y)²
  • Derivative of error with regard to one weight wk

dE/dwk = dE/dy × dy/ds × ds/dwk

  • y with respect to s is sigmoid(s)

dy/ds = d sigmoid(s)/ds = sigmoid(s)(1 − sigmoid(s)) = y(1 − y)

SLIDE 30

Final Layer Update (3)

  • Linear combination of weights: s = Σk wk hk
  • Activation function: y = sigmoid(s)
  • Error (L2 norm): E = ½ (t − y)²
  • Derivative of error with regard to one weight wk

dE/dwk = dE/dy × dy/ds × ds/dwk

  • s is the weighted linear combination of hidden node values hk

ds/dwk = d/dwk [ Σk wk hk ] = hk

SLIDE 31

Putting it All Together

  • Derivative of error with regard to one weight wk

dE/dwk = dE/dy × dy/ds × ds/dwk = −(t − y) × y(1 − y) × hk

– error: (t − y)
– derivative of sigmoid: y′ = y(1 − y)

  • Weight adjustment will be scaled by a fixed learning rate µ

∆wk = µ (t − y) y′ hk
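This update rule can be written as a small helper function; a sketch in Python (the function name and parameter names are mine, not from the slides):

```python
def update_final_layer_weight(w_k, h_k, y, t, mu):
    """One gradient descent step for a single final-layer weight.

    w_k : current weight, h_k : hidden node value feeding this weight,
    y   : computed output, t : target output, mu : learning rate
    """
    y_prime = y * (1.0 - y)                # derivative of sigmoid at the output
    delta_w = mu * (t - y) * y_prime * h_k
    return w_k + delta_w
```

For the running example, update_final_layer_weight(4.5, 0.90, 0.76, 1.0, 10) comes out at roughly 4.89, in line with the 4.891 shown a few slides later.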

SLIDE 32

Multiple Output Nodes

  • Our example only had one output node
  • Typically neural networks have multiple output nodes
  • Error is computed over all j output nodes

E = Σj ½ (tj − yj)²

  • Weights k → j are adjusted according to the node they point to

∆wj←k = µ (tj − yj) y′j hk

SLIDE 33

Hidden Layer Update

  • In a hidden layer, we do not have a target output value
  • But we can compute how much each node contributed to downstream error
  • Definition of error term of each node

δj = (tj − yj) y′j

  • Back-propagate the error term

(why this way? there is math to back it up...)

δi = ( Σj wj←i δj ) y′i

  • Universal update formula

∆wj←k = µ δj hk

SLIDE 34

Our Example

[Figure: the example network with nodes labeled A, B (inputs), C (input-layer bias), D, E (hidden), F (hidden-layer bias) and G (output); values 0.90, 0.17 and 0.76 as computed before]

  • Computed output: y = .76
  • Correct output: t = 1.0
  • Final layer weight updates (learning rate µ = 10)

– δG = (t − y) y′ = (1 − 0.76) × 0.181 = 0.0434
– ∆wGD = µ δG hD = 10 × 0.0434 × 0.90 = 0.391
– ∆wGE = µ δG hE = 10 × 0.0434 × 0.17 = 0.074
– ∆wGF = µ δG hF = 10 × 0.0434 × 1 = 0.434

SLIDE 35

Our Example

[Figure: the same labeled network, with the final-layer weights updated to 4.891, −5.126 and −1.566]
  • Computed output: y = .76
  • Correct output: t = 1.0
  • Final layer weight updates (learning rate µ = 10)

– δG = (t − y) y′ = (1 − 0.76) × 0.181 = 0.0434
– ∆wGD = µ δG hD = 10 × 0.0434 × 0.90 = 0.391
– ∆wGE = µ δG hE = 10 × 0.0434 × 0.17 = 0.074
– ∆wGF = µ δG hF = 10 × 0.0434 × 1 = 0.434

SLIDE 36

Hidden Layer Updates

[Figure: the same labeled network with updated final-layer weights 4.891, −5.126 and −1.566]
  • Hidden node D

– δD = ( Σj wj←i δj ) y′D = wGD δG y′D = 4.5 × 0.0434 × 0.0898 = 0.0175
– ∆wDA = µ δD hA = 10 × 0.0175 × 1.0 = 0.175
– ∆wDB = µ δD hB = 10 × 0.0175 × 0.0 = 0
– ∆wDC = µ δD hC = 10 × 0.0175 × 1 = 0.175

  • Hidden node E

– δE = ( Σj wj←i δj ) y′E = wGE δG y′E = −5.2 × 0.0434 × 0.2055 = −0.0464
– ∆wEA = µ δE hA = 10 × −0.0464 × 1.0 = −0.464
– etc.
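The numbers on the last three slides can be reproduced end to end; a minimal Python sketch (intermediate values are kept unrounded here, so the printed numbers can differ slightly from the slide's rounded figures):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

mu = 10.0                      # learning rate from the example
xA, xB, hC = 1.0, 0.0, 1.0     # inputs A, B and input-layer bias C

# forward pass
hD = sigmoid(3.7 * xA + 3.7 * xB - 1.5)           # ~0.90
hE = sigmoid(2.9 * xA + 2.9 * xB - 4.5)           # ~0.17
hF = 1.0                                          # hidden-layer bias
y  = sigmoid(4.5 * hD - 5.2 * hE - 2.0 * hF)      # ~0.76
t  = 1.0                                          # target output

# final layer: error term and weight updates
delta_G = (t - y) * y * (1 - y)                   # ~0.042
dw_GD, dw_GE, dw_GF = mu * delta_G * hD, mu * delta_G * hE, mu * delta_G * hF

# hidden layer: back-propagated error terms and one example update
delta_D = 4.5  * delta_G * hD * (1 - hD)          # w_GD * delta_G * y'_D
delta_E = -5.2 * delta_G * hE * (1 - hE)          # w_GE * delta_G * y'_E
dw_DA   = mu * delta_D * xA

print(round(delta_G, 3), round(dw_GD, 3), round(delta_D, 4), round(dw_DA, 3))
```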

SLIDE 37

some additional aspects

SLIDE 38

Initialization of Weights

  • Weights are initialized randomly

e.g., uniformly from interval [−0.01, 0.01]

  • Glorot and Bengio (2010) suggest

– for shallow neural networks: weights uniformly from the interval

[ −1/√n , 1/√n ]

where n is the size of the previous layer

– for deep neural networks: weights uniformly from the interval

[ −√6 / √(nj + nj+1) , √6 / √(nj + nj+1) ]

where nj is the size of the previous layer and nj+1 the size of the next layer
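A sketch of these initialization schemes in Python with numpy (function names are mine, not from the slides):

```python
import numpy as np

def uniform_init(n_in, n_out, scale=0.01):
    # simple baseline: uniform in [-scale, scale]
    return np.random.uniform(-scale, scale, size=(n_out, n_in))

def glorot_shallow(n_in, n_out):
    # Glorot & Bengio (2010), shallow networks: uniform in [-1/sqrt(n), 1/sqrt(n)]
    limit = 1.0 / np.sqrt(n_in)
    return np.random.uniform(-limit, limit, size=(n_out, n_in))

def glorot_deep(n_in, n_out):
    # deep networks: uniform in [-sqrt(6)/sqrt(n_j + n_{j+1}), +sqrt(6)/sqrt(n_j + n_{j+1})]
    limit = np.sqrt(6.0) / np.sqrt(n_in + n_out)
    return np.random.uniform(-limit, limit, size=(n_out, n_in))

W = glorot_deep(200, 200)   # e.g. a 200 x 200 hidden layer
```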

SLIDE 39

Neural Networks for Classification

  • Predict class: one output node per class
  • Training data output: "one-hot vector", e.g., y = (0, 0, 1)ᵀ
  • Prediction

– predicted class is the output node yi with the highest value
– obtain a posterior probability distribution by softmax

softmax(yi) = e^(yi) / Σj e^(yj)
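Softmax over the output nodes in a few lines of numpy; a small sketch (subtracting the max is a standard numerical-stability trick, not mentioned on the slide):

```python
import numpy as np

def softmax(y):
    # exponentiate and normalize so the outputs sum to 1
    e = np.exp(y - np.max(y))      # subtract max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 2.0, 0.1])     # raw output node values (made-up)
print(softmax(scores))                 # posterior probability distribution
print(int(np.argmax(scores)))          # predicted class: node with highest value
```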

SLIDE 40

Problems with Gradient Descent Training

[Figure: error(λ) curve over weight λ]

Too high learning rate

SLIDE 41

Problems with Gradient Descent Training

[Figure: error(λ) curve over weight λ]

Bad initialization

SLIDE 42

Problems with Gradient Descent Training

[Figure: error(λ) curve over weight λ, with a local optimum and the global optimum]

Local optimum

SLIDE 43

Speedup: Momentum Term

  • Updates may move a weight slowly in one direction
  • To speed this up, we can keep a memory of prior updates

∆wj←k(n − 1)

  • ... and add these to any new updates (with decay factor ρ)

∆wj←k(n) = µ δj hk + ρ∆wj←k(n − 1)
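A minimal sketch of the momentum update for a single weight (variable names and the numeric values in the loop are mine, purely illustrative):

```python
def momentum_update(delta_j, h_k, prev_update, mu=0.001, rho=0.9):
    # new update = plain gradient step plus a decayed memory of the previous update
    return mu * delta_j * h_k + rho * prev_update

# across training steps, keep the previous update around:
prev = 0.0
for delta_j, h_k in [(0.04, 0.9), (0.03, 0.8), (0.05, 0.7)]:   # made-up values
    prev = momentum_update(delta_j, h_k, prev)
    # apply: w_jk += prev
```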

SLIDE 44

Adagrad

  • Typically reduce the learning rate µ over time

– at the beginning, things have to change a lot
– later, just fine-tuning

  • Adapting learning rate per parameter
  • Adagrad update

based on the gradient of the error E with respect to the weight w at time step t: gt = dE/dw

∆wt = ( µ / √( Σ τ=1..t gτ² ) ) × gt
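A sketch of the Adagrad update for a single weight; the running sum of squared gradients is the per-parameter state (the small eps term avoids division by zero and is my addition, not on the slide):

```python
import math

class AdagradWeight:
    def __init__(self, w, mu=0.1, eps=1e-8):
        self.w, self.mu, self.eps = w, mu, eps
        self.sum_sq = 0.0                    # sum of g_tau^2 over all time steps

    def step(self, g_t):
        # accumulate the squared gradient, then scale the learning rate per parameter
        self.sum_sq += g_t * g_t
        self.w -= self.mu / (math.sqrt(self.sum_sq) + self.eps) * g_t
        return self.w
```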

SLIDE 45

Dropout

  • A general problem of machine learning: overfitting to training data

(very good on train, bad on unseen test)

  • Solution: regularization, e.g., keeping weights from having extreme values
  • Dropout: randomly remove some hidden units during training

– mask: set of hidden units dropped
– randomly generate, say, 10–20 masks
– alternate between the masks during training

  • Why does that work?

→ bagging, ensemble, ...
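A sketch of dropout on a vector of hidden unit values, using numpy (the dropout rate is illustrative; this only does the masking described on the slide):

```python
import numpy as np

def dropout(hidden, drop_rate=0.2, training=True):
    # randomly zero out hidden units during training; keep all units at test time
    if not training:
        return hidden
    mask = (np.random.rand(hidden.shape[0]) >= drop_rate).astype(hidden.dtype)
    return hidden * mask

h = np.array([0.90, 0.17, 0.45, 0.80])
print(dropout(h))   # some entries randomly set to 0
```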

SLIDE 46

Mini Batches

  • Each training example yields a set of weight updates ∆wi.
  • Batch up several training examples

– sum up their updates
– apply sum to model

  • Mostly done for speed reasons

SLIDE 47

computational aspects

SLIDE 48

Vector and Matrix Multiplications

  • Forward computation: s = W h
  • Activation function: y = sigmoid(s)
  • Error term: δ = (t − y) · sigmoid′(s)
  • Propagation of error term: δi = W δi+1 · sigmoid′(s)
  • Weight updates: ∆W = µ δ hᵀ
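These operations map directly onto matrix and vector code; a minimal numpy sketch of the forward step and the weight update for one layer (the error term values here are made up, just to show the shapes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

mu = 0.1                                   # learning rate (illustrative)
W = np.array([[3.7, 3.7], [2.9, 2.9]])     # weight matrix of the example's hidden layer
b = np.array([-1.5, -4.5])                 # bias weights
x = np.array([1.0, 0.0])                   # input vector

# forward computation and activation
s = W @ x + b                  # s = W h (plus bias)
h = sigmoid(s)                 # ~[0.90, 0.17]

# given an error term delta for this layer, the weight update is an outer product
delta = np.array([0.02, -0.01])            # made-up error term values
dW = mu * np.outer(delta, x)               # Delta W = mu * delta * h^T, with h the layer input
```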

SLIDE 49

GPU

  • Neural network layers may have, say, 200 nodes
  • Computations such as W h require 200 × 200 = 40,000 multiplications

  • Graphics Processing Units (GPU) are designed for such computations

– image rendering requires such vector and matrix operations
– massively multi-core but lean processing units
– example: NVIDIA Tesla K20c GPU provides 2496 thread processors

  • Extensions to C to support programming of GPUs, such as CUDA

SLIDE 50

Toolkits

  • Theano
  • TensorFlow (Google)
  • PyTorch (Facebook)
  • MXNet (Amazon)
  • DyNet
