Neural Networks

July 7, 2005, CS 486/686, University of Waterloo

CS486/686 Lecture Slides (c) 2005 P. Poupart


Outline

  • Neural networks
    – Perceptron
    – Supervised learning algorithms for neural networks
  • Reading: R&N Ch 20.5


Brain

  • Seat of human intelligence
  • Where memory/knowledge resides
  • Responsible for thoughts and decisions
  • Can learn
  • Consists of nerve cells called neurons


Neuron

[Figure: anatomy of a neuron: cell body (soma) with nucleus, dendrites, axon with axonal arborization, and synapses with axons from other cells]


Comparison

  • Brain
    – Network of neurons
    – Nerve signals propagate in a neural network
    – Parallel computation
    – Robust (neurons die every day without any impact)
  • Computer
    – Bunch of gates
    – Electrical signals directed by gates
    – Sequential computation
    – Fragile (if a gate stops working, the computer crashes)


Artificial Neural Networks

  • Idea: mimic the brain to do computation
  • Artificial neural network:
    – Nodes (a.k.a. units) correspond to neurons
    – Links correspond to synapses
  • Computation:
    – Numerical signals transmitted between nodes correspond to chemical signals between neurons
    – Nodes modifying numerical signals correspond to neurons' firing rates


ANN Unit

  • For each unit i:
  • Weights Wj,i
    – Strength of the link from unit j to unit i
    – Input signals aj are weighted by Wj,i and linearly combined: in_i = Σj Wj,i aj
  • Activation function g
    – Numerical signal produced: ai = g(in_i)
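The unit computation above can be sketched in a few lines of Python (a minimal illustration; using the sigmoid as g and the example weights below are assumptions, not values from the slides):

```python
import math

def unit_output(weights, inputs):
    # in_i = sum_j Wj,i * aj : weighted linear combination of the input signals
    in_i = sum(w * a for w, a in zip(weights, inputs))
    # ai = g(in_i), here with the sigmoid as the activation function g
    return 1.0 / (1.0 + math.exp(-in_i))

# two input signals aj = 1.0 and 0.2, with illustrative weights
out = unit_output([0.5, -1.0], [1.0, 0.2])
```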


ANN Unit

[Figure: a single unit i. Input links carry signals aj, each weighted by Wj,i; the input function computes in_i = Σj Wj,i aj; the activation function g produces the output ai = g(in_i), sent along the output links. A bias weight W0,i multiplies a fixed input a0 = −1.]


Activation Function

  • Should be nonlinear
    – Otherwise the network is just a linear function
  • Often chosen to mimic firing in neurons
    – Unit should be “active” (output near 1) when fed with the “right” inputs
    – Unit should be “inactive” (output near 0) when fed with the “wrong” inputs


Common Act ivat ion Funct ions

(a) (b) +1

ini g( ) ini

+1

ini g( ) ini

Thr eshold Sigmoid g(x) = 1/ (1+e-x)
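Both activation functions can be written directly from their definitions (a quick Python sketch):

```python
import math

def threshold(x):
    # hard threshold: output jumps from 0 to 1 at x = 0
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    # g(x) = 1 / (1 + e^-x): a smooth, differentiable version of the threshold
    return 1.0 / (1.0 + math.exp(-x))
```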


Logic Gates

  • McCulloch and Pitts (1943)
    – Designed ANNs to represent Boolean fns
  • What should be the weights of the following units to code AND, OR, NOT?

[Figure: three threshold units: one with inputs a1, a2 (AND), one with inputs a1, a2 (OR), one with a single input a1 (NOT)]
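One standard choice of weights (an assumed answer to the question above, using the bias input a0 = −1 from the unit diagram and a threshold at 0):

```python
def threshold_unit(weights, inputs):
    # prepend the fixed bias input a0 = -1; fire (output 1) when the
    # weighted sum exceeds 0
    a = [-1.0] + list(inputs)
    return 1 if sum(w * x for w, x in zip(weights, a)) > 0 else 0

# bias weight first: fires iff a1 + a2 > 1.5 (AND), a1 + a2 > 0.5 (OR), a1 < 0.5 (NOT)
AND = lambda a1, a2: threshold_unit([1.5, 1.0, 1.0], [a1, a2])
OR  = lambda a1, a2: threshold_unit([0.5, 1.0, 1.0], [a1, a2])
NOT = lambda a1:     threshold_unit([-0.5, -1.0], [a1])
```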


Network Structures

  • Feed-forward network
    – Directed acyclic graph
    – No internal state
    – Simply computes outputs from inputs
  • Recurrent network
    – Directed cyclic graph
    – Dynamical system with internal states
    – Can memorize information


Feed-forward network

  • Simple network with two inputs, one hidden layer of two units, one output unit

a5 = g(W3,5 a3 + W4,5 a4)
   = g(W3,5 g(W1,3 a1 + W2,3 a2) + W4,5 g(W1,4 a1 + W2,4 a2))

[Figure: units 1 and 2 (inputs) feed units 3 and 4 (hidden) via weights W1,3, W1,4, W2,3, W2,4; units 3 and 4 feed unit 5 (output) via weights W3,5, W4,5]
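The nested expression for a5 can be checked with a short sketch (the weight values below are illustrative assumptions, not from the slides):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(a1, a2, W):
    # units 1, 2: inputs; units 3, 4: hidden; unit 5: output
    a3 = g(W[1, 3] * a1 + W[2, 3] * a2)
    a4 = g(W[1, 4] * a1 + W[2, 4] * a2)
    return g(W[3, 5] * a3 + W[4, 5] * a4)

# illustrative weights, keyed by (j, i)
W = {(1, 3): 1.0, (2, 3): -1.0, (1, 4): -1.0, (2, 4): 1.0, (3, 5): 2.0, (4, 5): 2.0}
a5 = forward(0.5, -0.5, W)
```

With these weights a3 = g(1.0) and a4 = g(−1.0), which sum to exactly 1, so a5 = g(2.0).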


Perceptron

  • Single-layer feed-forward network

[Figure: input units connected directly to output units by weights Wj,i]


Supervised Learning

  • Given a list of <input, output> pairs
  • Train a feed-forward ANN
    – To compute the proper outputs when fed with the inputs
    – Consists of adjusting the weights Wj,i
  • Simple learning algorithm for threshold perceptrons


Threshold Perceptron Learning

  • Learning is done separately for each unit
    – Since units do not share weights
  • Perceptron learning for unit i:
    – For each <inputs, output> pair do:
      • Case 1: correct output produced
        – ∀j Wj,i ← Wj,i
      • Case 2: output produced is 0 instead of 1
        – ∀j Wj,i ← Wj,i + aj
      • Case 3: output produced is 1 instead of 0
        – ∀j Wj,i ← Wj,i − aj
    – Until correct output for all training instances
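The three cases translate directly into code. Below is a sketch that learns AND (the dataset and the epoch cap are assumptions for illustration; each input vector starts with the bias input −1):

```python
def predict(W, a):
    # threshold unit: output 1 iff the weighted sum is positive
    return 1 if sum(wj * aj for wj, aj in zip(W, a)) > 0 else 0

def train_threshold_perceptron(examples, n_weights, max_epochs=100):
    W = [0.0] * n_weights
    for _ in range(max_epochs):
        all_correct = True
        for a, y in examples:
            if predict(W, a) == y:
                continue                      # case 1: leave W unchanged
            all_correct = False
            step = 1 if y == 1 else -1        # case 2: add a; case 3: subtract a
            W = [wj + step * aj for wj, aj in zip(W, a)]
        if all_correct:                       # correct on all training instances
            return W
    return W

data = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]
W = train_threshold_perceptron(data, 3)
```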


Threshold Perceptron Learning

  • Dot products: a●a ≥ 0 and −a●a ≤ 0
  • Perceptron computes
    – 1 when a●W = Σj aj Wj,i > 0
    – 0 when a●W = Σj aj Wj,i < 0
  • If the output should be 1 instead of 0, then
    – W ← W + a, since a●(W+a) = a●W + a●a ≥ a●W
  • If the output should be 0 instead of 1, then
    – W ← W − a, since a●(W−a) = a●W − a●a ≤ a●W


Threshold Perceptron Hypothesis Space

  • Hypothesis space hW:
    – All binary classifications with parameters W s.t.
      • a●W > 0 → 1
      • a●W < 0 → 0
  • Since a●W is linear in W, the perceptron is called a linear separator


Threshold Perceptron Hypothesis Space

  • Are all Boolean gates linearly separable?

[Figure: the four points (I1, I2) ∈ {0,1}² in the plane: (a) I1 and I2, separable by a line; (b) I1 or I2, separable by a line; (c) I1 xor I2, not separable by any line]
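A brute-force search over a small grid of weights illustrates the answer (an illustration, not a proof; the grid and step size are arbitrary choices): AND admits separating weights, while XOR admits none:

```python
from itertools import product

def separates(W, data):
    # threshold unit with bias input -1: output 1 iff the weighted sum > 0
    return all((sum(w * x for w, x in zip(W, [-1] + list(a))) > 0) == bool(y)
               for a, y in data)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

grid = [k / 2 for k in range(-10, 11)]        # weights -5.0, -4.5, ..., 5.0
and_hits = [W for W in product(grid, repeat=3) if separates(W, AND_DATA)]
xor_hits = [W for W in product(grid, repeat=3) if separates(W, XOR_DATA)]
```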


Sigmoid Perceptron

  • Represents “soft” linear separators


Sigmoid Perceptron Learning

  • Formulate learning as an optimization search in weight space
    – Since g is differentiable, use gradient descent
  • Minimize the squared error:
    – E = 0.5 Err² = 0.5 (y − hW(x))²
      • x: input
      • y: target output
      • hW(x): computed output


Perceptron Error Gradient

  • E = 0.5 Err² = 0.5 (y − hW(x))²
  • ∂E/∂Wj = Err × ∂Err/∂Wj
           = Err × ∂(y − g(Σj Wj xj))/∂Wj
           = −Err × g′(Σj Wj xj) × xj
  • When g is the sigmoid fn, then g′ = g(1 − g)
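The derivation can be sanity-checked numerically by comparing the analytic gradient against a central finite difference (the example values of W, x, y below are assumptions):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def error(W, x, y):
    # E = 0.5 (y - g(sum_j Wj xj))^2
    return 0.5 * (y - g(sum(wj * xj for wj, xj in zip(W, x)))) ** 2

def analytic_grad(W, x, y, j):
    # dE/dWj = -Err * g'(in) * xj, with g' = g (1 - g) for the sigmoid
    in_ = sum(wj * xj for wj, xj in zip(W, x))
    err = y - g(in_)
    return -err * g(in_) * (1.0 - g(in_)) * x[j]

def numeric_grad(W, x, y, j, h=1e-6):
    Wp, Wm = list(W), list(W)
    Wp[j] += h
    Wm[j] -= h
    return (error(Wp, x, y) - error(Wm, x, y)) / (2.0 * h)

W, x, y = [0.3, -0.8], [1.0, 0.5], 1.0
```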


Perceptron Learning Algorithm

  • Perceptron-Learning(examples, network)
    – Repeat
      • For each e in examples do
        – in ← Σj Wj xj[e]
        – Err ← y[e] − g(in)
        – Wj ← Wj + α × Err × g′(in) × xj[e]
    – Until some stopping criterion is satisfied
    – Return learnt network
  • N.B. α is a learning rate corresponding to the step size in gradient descent
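The pseudocode translates almost line for line into Python (learning OR is an assumed example; the values of α and the epoch count are arbitrary choices, and each input starts with the bias input −1):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def perceptron_learning(examples, n_weights, alpha=0.5, epochs=1000):
    W = [0.0] * n_weights
    for _ in range(epochs):                    # stopping criterion: fixed epoch count
        for x, y in examples:
            in_ = sum(wj * xj for wj, xj in zip(W, x))
            err = y - g(in_)
            gprime = g(in_) * (1.0 - g(in_))   # g' = g (1 - g) for the sigmoid
            W = [wj + alpha * err * gprime * xj for wj, xj in zip(W, x)]
    return W

data = [([-1, 0, 0], 0), ([-1, 0, 1], 1), ([-1, 1, 0], 1), ([-1, 1, 1], 1)]
W = perceptron_learning(data, 3)
```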


Multilayer Feed-forward Neural Networks

  • A perceptron can only represent (soft) linear separators
    – Because it has a single layer
  • With multiple layers, what fns can be represented?
    – Virtually any function!


Multilayer Networks

  • Adding two sigmoid units with parallel but opposite “cliffs” produces a ridge

[Figure: network output plotted over x1, x2 ∈ [−4, 4], forming a ridge with height near 1 along one direction]
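A one-dimensional sketch of the construction (the slope and offset values are assumptions): summing two sigmoid cliffs with opposite slopes and subtracting 1 yields an output near 1 between the cliffs and near 0 outside:

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def ridge(x1, slope=5.0, offset=1.0):
    # cliff rising at x1 = -offset plus cliff falling at x1 = +offset;
    # the -1 shifts the sum so the plateau sits near 1 and the tails near 0
    return g(slope * (x1 + offset)) + g(-slope * (x1 - offset)) - 1.0
```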


Multilayer Networks

  • Adding two intersecting ridges (and thresholding) produces a bump

[Figure: network output plotted over x1, x2 ∈ [−4, 4], forming a localized bump with height near 1]


Multilayer Networks

  • By tiling bumps of various heights together, we can approximate any function
  • Training algorithm:
    – Back-propagation
    – Essentially gradient descent performed by propagating errors backward into the network
    – See textbook for derivation
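A minimal back-propagation sketch for a 2-2-1 network like the one in the earlier feed-forward slide (the random initial weights and step size are assumptions; see the textbook for the full derivation):

```python
import math, random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W, x):
    # 2 inputs -> 2 hidden units -> 1 output; bias input -1 prepended at each layer
    Wh, Wo = W
    xb = [-1.0] + x
    h = [g(sum(w * xi for w, xi in zip(row, xb))) for row in Wh]
    return h, g(sum(w * hi for w, hi in zip(Wo, [-1.0] + h)))

def backprop_update(W, x, y, alpha=0.1):
    # one gradient step on E = 0.5 (y - out)^2, errors propagated backward
    Wh, Wo = W
    h, out = forward(W, x)
    delta_o = (y - out) * out * (1.0 - out)            # output-layer error term
    delta_h = [hj * (1.0 - hj) * Wo[j + 1] * delta_o   # hidden errors, back through Wo
               for j, hj in enumerate(h)]
    xb, hb = [-1.0] + x, [-1.0] + h
    Wo = [w + alpha * delta_o * hi for w, hi in zip(Wo, hb)]
    Wh = [[w + alpha * dj * xi for w, xi in zip(row, xb)]
          for row, dj in zip(Wh, delta_h)]
    return Wh, Wo

random.seed(0)
W = ([[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
     [random.uniform(-1, 1) for _ in range(3)])
```

One update step moves the weights downhill on the squared error for that example.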


Neural Net Applications

  • Neural nets can approximate any function, hence 1000's of applications
    – NETtalk for pronouncing English text
    – Character recognition
    – Paint-quality inspection
    – Vision-based autonomous driving
    – Etc.


Neural Net Drawbacks

  • Common problems:
    – How should we interpret units?
    – How many layers and units should a network have?
    – How do we avoid local optima while training with gradient descent?


Next Class

  • Ensemble learning
  • Russell and Norvig Sect. 18.4