CS480/680 Lecture 9 (June 5, 2019): Perceptrons, Neural Networks


SLIDE 1

CS480/680 Lecture 9: June 5, 2019

Perceptrons, Neural Networks
Readings: [D] Chapt. 4, [HTF] Chapt. 11, [B] Sec. 4.1.7, 5.1, [M] Sec. 8.5.4, [RN] Sec. 18.7

CS480/680 Spring 2019, Pascal Poupart, University of Waterloo

SLIDE 2

Outline

  • Neural networks

    – Perceptron
    – Supervised learning algorithms for neural networks

SLIDE 3

Brain

  • Seat of human intelligence
  • Where memory/knowledge resides
  • Responsible for thoughts and decisions
  • Can learn
  • Consists of nerve cells called neurons

SLIDE 4

Neuron

SLIDE 5

Comparison

  • Brain
    – Network of neurons
    – Nerve signals propagate in a neural network
    – Parallel computation
    – Robust (neurons die every day without any impact)

  • Computer
    – Bunch of gates
    – Electrical signals directed by gates
    – Sequential and parallel computation
    – Fragile (if a gate stops working, the computer crashes)

SLIDE 6

Artificial Neural Networks

  • Idea: mimic the brain to do computation
  • Artificial neural network:

    – Nodes (a.k.a. units) correspond to neurons
    – Links correspond to synapses

  • Computation:

    – Numerical signals transmitted between nodes correspond to chemical signals between neurons
    – A node modifying its numerical signal corresponds to a neuron’s firing rate

SLIDE 7

ANN Unit

  • For each unit $j$:
  • Weights: $\bar{w}$
    – $w_{ji}$ is the strength of the link from unit $i$ to unit $j$
    – Input signals $x_i$ are weighted by $w_{ji}$ and linearly combined: $a_j = \sum_i w_{ji} x_i + w_0 = \bar{w}^T \bar{x}$
  • Activation function: $h$
    – Numerical signal produced: $z_j = h(a_j)$
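A minimal sketch of this computation in Python (the names unit_output and h are assumptions for illustration, not from the slides):

```python
import numpy as np

def unit_output(w, w0, x, h):
    """Output of one unit: pre-activation a = w^T x + w_0, then z = h(a)."""
    a = np.dot(w, x) + w0     # a_j = sum_i w_ji * x_i + w_0
    return h(a)               # z_j = h(a_j)

# Example with a sigmoid activation:
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
z = unit_output(np.array([0.5, -0.3]), 0.1, np.array([1.0, 2.0]), sigmoid)
```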

SLIDE 8

ANN Unit

  • Picture

SLIDE 9

Activation Function

  • Should be nonlinear

– Otherwise network is just a linear function

  • Often chosen to mimic firing in neurons

    – Unit should be “active” (output near 1) when fed with the “right” inputs
    – Unit should be “inactive” (output near 0) when fed with the “wrong” inputs

SLIDE 10

Common Activation Functions

  • Threshold
  • Sigmoid
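As a quick sketch, these two activations can be written as follows (assumed numpy-based implementations):

```python
import numpy as np

def threshold(a):
    """Hard threshold: outputs 1 when the pre-activation a is positive, else 0."""
    return np.where(a > 0, 1.0, 0.0)

def sigmoid(a):
    """Smooth, differentiable approximation of the threshold: 1 / (1 + e^-a)."""
    return 1.0 / (1.0 + np.exp(-a))
```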

SLIDE 11

Logic Gates

  • McCulloch and Pitts (1943)
    – Designed ANNs to represent Boolean functions
  • What should the weights of the following units be to code AND, OR, NOT? (one possible assignment is sketched below)
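One possible weight assignment with threshold units (these particular values are an assumption, not the unique solution; any weights with the same sign pattern and separating thresholds work):

```python
import numpy as np

def threshold_unit(w, w0, x):
    """Threshold perceptron: outputs 1 iff w^T x + w0 > 0."""
    return 1.0 if np.dot(w, x) + w0 > 0 else 0.0

# Assumed weights coding the three Boolean functions:
AND = lambda x: threshold_unit(np.array([1.0, 1.0]), -1.5, x)  # fires only when both inputs are 1
OR  = lambda x: threshold_unit(np.array([1.0, 1.0]), -0.5, x)  # fires when at least one input is 1
NOT = lambda x: threshold_unit(np.array([-1.0]), 0.5, x)       # inverts its single input

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), AND(np.array([a, b])), OR(np.array([a, b])))
```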

SLIDE 12

Network Structures

  • Feed-forward network
    – Directed acyclic graph
    – No internal state
    – Simply computes outputs from inputs

  • Recurrent network
    – Directed cyclic graph
    – Dynamical system with internal states
    – Can memorize information

SLIDE 13

Feed-forward network

  • Simple network with two inputs, one hidden layer of two units, and one output unit

SLIDE 14

Perceptron

  • Single-layer feed-forward network

SLIDE 15

Supervised Learning

  • Given a list of $(\bar{x}, y)$ pairs
  • Train a feed-forward ANN
    – To compute the proper outputs $y$ when fed with inputs $\bar{x}$
    – Consists of adjusting the weights $w_{ji}$
  • Simple learning algorithm for threshold perceptrons

SLIDE 16

Threshold Perceptron Learning

  • Learning is done separately for each unit $j$
    – Since units do not share weights
  • Perceptron learning for unit $j$ (a runnable sketch follows this list):
    – For each $(\bar{x}, y)$ pair do:
      • Case 1: correct output produced
        $\forall i \;\; w_{ji} \leftarrow w_{ji}$
      • Case 2: output produced is 0 instead of 1
        $\forall i \;\; w_{ji} \leftarrow w_{ji} + x_i$
      • Case 3: output produced is 1 instead of 0
        $\forall i \;\; w_{ji} \leftarrow w_{ji} - x_i$
    – Until correct output for all training instances
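A minimal sketch of this procedure for a single unit (assuming each input vector carries a leading 1 for the bias weight $w_0$; names are illustrative):

```python
import numpy as np

def threshold_perceptron_learning(X, y, max_epochs=100):
    """Threshold perceptron learning for a single unit.

    X: (n, d) inputs, each prefixed with a 1 for the bias weight w_0.
    y: (n,) target outputs in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        all_correct = True
        for x_n, y_n in zip(X, y):
            out = 1.0 if np.dot(w, x_n) > 0 else 0.0
            if out == y_n:
                continue        # Case 1: correct output, weights unchanged
            elif y_n == 1:
                w = w + x_n     # Case 2: produced 0 instead of 1
            else:
                w = w - x_n     # Case 3: produced 1 instead of 0
            all_correct = False
        if all_correct:         # correct output for all training instances
            return w
    return w                    # may not terminate correctly if data is not linearly separable
```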

SLIDE 17

Threshold Perceptron Learning

  • Dot products: $\bar{x}^T \bar{x} \ge 0$ and $-\bar{x}^T \bar{x} \le 0$
  • Perceptron computes
    – 1 when $\bar{w}^T \bar{x} = \sum_i w_i x_i + w_0 > 0$
    – 0 when $\bar{w}^T \bar{x} = \sum_i w_i x_i + w_0 < 0$
  • If the output should be 1 instead of 0, then
    $\bar{w} \leftarrow \bar{w} + \bar{x}$ since $(\bar{w} + \bar{x})^T \bar{x} \ge \bar{w}^T \bar{x}$
  • If the output should be 0 instead of 1, then
    $\bar{w} \leftarrow \bar{w} - \bar{x}$ since $(\bar{w} - \bar{x})^T \bar{x} \le \bar{w}^T \bar{x}$
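As a worked check with assumed numbers: take $\bar{w} = (0.2, -1)$ and $\bar{x} = (1, 1)$ with target output 1. Then $\bar{w}^T \bar{x} = -0.8 < 0$, so the perceptron outputs 0. After the update, $(\bar{w} + \bar{x})^T \bar{x} = \bar{w}^T \bar{x} + \bar{x}^T \bar{x} = -0.8 + 2 = 1.2$, i.e., the dot product increases by exactly $\bar{x}^T \bar{x} \ge 0$ and this example is now classified correctly.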

SLIDE 18

Alternative Approach

  • Let $y_n \in \{-1, +1\} \;\; \forall n$
  • Let $M = \{(\bar{x}_n, y_n)\}$ be the set of misclassified examples
    – i.e., $y_n \bar{w}^T \bar{x}_n < 0$
  • Find $\bar{w}$ that minimizes the misclassification error
    $E(\bar{w}) = -\sum_{(\bar{x}_n, y_n) \in M} y_n \bar{w}^T \bar{x}_n$
  • Algorithm: gradient descent
    $\bar{w} \leftarrow \bar{w} - \eta \nabla E$
    where the learning rate $\eta$ is the step length (see the sketch below)
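A small numpy sketch of this criterion and its gradient (the function name is an assumption for illustration):

```python
import numpy as np

def perceptron_criterion(w, X, y):
    """E(w) = -sum of y_n * w^T x_n over misclassified examples, with y_n in {-1, +1}."""
    margins = y * (X @ w)           # y_n * w^T x_n for every example
    M = margins < 0                 # the misclassified set
    E = -np.sum(margins[M])
    grad = -(X[M].T @ y[M])         # dE/dw = -sum over M of y_n * x_n
    return E, grad
```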

SLIDE 19

Sequential Gradient Descent

  • Gradient: $\nabla E = -\sum_{(\bar{x}_n, y_n) \in M} y_n \bar{x}_n$
  • Sequential gradient descent:
    – Adjust $\bar{w}$ based on one example $(\bar{x}, y)$ at a time
      $\bar{w} \leftarrow \bar{w} + \eta \, y \, \bar{x}$
  • When $\eta = 1$, we recover the threshold perceptron learning algorithm
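In code, the per-example update might look like this (a sketch; with eta=1 it coincides with the threshold perceptron rule above):

```python
import numpy as np

def sequential_update(w, x_n, y_n, eta=1.0):
    """One step of sequential gradient descent on the perceptron criterion."""
    if y_n * np.dot(w, x_n) < 0:    # only misclassified examples change w
        w = w + eta * y_n * x_n
    return w
```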

SLIDE 20

Threshold Perceptron Hypothesis Space

  • Hypothesis space $h_{\bar{w}}$:
    – All binary classifications with parameters $\bar{w}$ s.t.
      $\bar{w}^T \bar{x} > 0 \rightarrow +1$
      $\bar{w}^T \bar{x} < 0 \rightarrow -1$
  • Since $\bar{w}^T \bar{x}$ is linear in $\bar{w}$, the perceptron is called a linear separator
  • Theorem: threshold perceptron learning converges iff the data is linearly separable

SLIDE 21

Linear Separability

  • Examples (figures): a linearly separable dataset vs. a non-linearly separable dataset

SLIDE 22

Sigmoid Perceptron

  • Represent “soft” linear separators
  • Same hypothesis space as logistic regression

SLIDE 23

Sigmoid Perceptron Learning

  • Possible objectives
    – Minimum squared error:
      $E(\bar{w}) = \frac{1}{2} \sum_n E_n(\bar{w})^2 = \frac{1}{2} \sum_n \left( y_n - \sigma(\bar{w}^T \bar{x}_n) \right)^2$
    – Maximum likelihood
      • Same algorithm as for logistic regression
    – Maximum a posteriori hypothesis
    – Bayesian learning
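For concreteness, the squared-error objective can be written as follows (a sketch; the function name is assumed):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def squared_error(w, X, y):
    """E(w) = 1/2 * sum_n (y_n - sigma(w^T x_n))^2."""
    return 0.5 * np.sum((y - sigmoid(X @ w)) ** 2)
```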

SLIDE 24

Gradient

  • Gradient:
    $\frac{\partial E}{\partial w_i} = \sum_n E_n \frac{\partial E_n}{\partial w_i}$
    $= -\sum_n E_n \, \sigma'(\bar{w}^T \bar{x}_n) \, x_{ni}$
    $= -\sum_n E_n \, \sigma(\bar{w}^T \bar{x}_n) \left( 1 - \sigma(\bar{w}^T \bar{x}_n) \right) x_{ni}$
    Recall that $\sigma' = \sigma(1 - \sigma)$

SLIDE 25

Sequential Gradient Descent

  • Perceptron-Learning(examples, network)
    – Repeat
      • For each $(\bar{x}_n, y_n)$ in examples do:
        $E_n \leftarrow y_n - \sigma(\bar{w}^T \bar{x}_n)$
        $\bar{w} \leftarrow \bar{w} + \eta \, E_n \, \sigma(\bar{w}^T \bar{x}_n) \left( 1 - \sigma(\bar{w}^T \bar{x}_n) \right) \bar{x}_n$
    – Until some stopping criterion is satisfied
    – Return the learnt network
  • N.B. $\eta$ is a learning rate corresponding to the step size in gradient descent (a runnable sketch follows below)
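A minimal sketch of this loop (a fixed epoch budget stands in for “some stopping criterion”; names are illustrative and inputs are assumed to carry a leading 1 for the bias):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_perceptron_learning(X, y, eta=0.1, epochs=100):
    """Sequential gradient descent on the squared error of a sigmoid perceptron.

    X: (n, d) inputs with a leading 1 column for the bias; y: (n,) targets in [0, 1].
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                        # stopping criterion: fixed epoch budget
        for x_n, y_n in zip(X, y):
            s = sigmoid(np.dot(w, x_n))
            E_n = y_n - s                          # per-example error
            w = w + eta * E_n * s * (1 - s) * x_n  # uses sigma' = sigma * (1 - sigma)
    return w
```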

SLIDE 26

Multilayer Networks

  • Adding two sigmoid units with parallel but opposite “cliffs” produces a ridge

SLIDE 27

Multilayer Networks

  • Adding two intersecting ridges (and thresholding) produces a bump, as sketched below
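A small sketch of both constructions (the slopes, offsets, and threshold are assumed values chosen to make the cliffs steep, not values from the slides):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Two parallel but opposite "cliffs" add up to a ridge along one input dimension.
ridge = lambda x1: sigmoid(10 * (x1 + 1)) + sigmoid(-10 * (x1 - 1)) - 1

# Two intersecting ridges (one per input dimension), thresholded, produce a localized bump.
bump = lambda x1, x2: 1.0 if ridge(x1) + ridge(x2) > 1.5 else 0.0

print(ridge(0.0), ridge(5.0))           # ~1 inside the ridge, ~0 outside
print(bump(0.0, 0.0), bump(5.0, 0.0))   # 1 on the bump, 0 elsewhere
```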

SLIDE 28

Multilayer Networks

  • By tiling bumps of various heights together, we can approximate any function
  • Training algorithm:
    – Back-propagation
    – Essentially sequential gradient descent performed by propagating errors backward into the network
    – Derivation next class

SLIDE 29

Neural Net Applications

  • Neural nets can approximate any function, hence millions of applications:
    – Speech recognition
    – Word embeddings
    – Machine translation
    – Vision-based object recognition
    – Vision-based autonomous driving
    – Etc.
