Learning From Data, Lecture 20: Multilayer Perceptron

SLIDE 1

Learning From Data Lecture 20 Multilayer Perceptron

Multiple layers Universal Approximation The Neural Network

M. Magdon-Ismail

CSCI 4100/6100

SLIDE 2

Recap: Unsupervised Learning

k-Means Clustering: a ‘hard’ partition of the inputs x into k clusters.
Gaussian Mixture Model: a ‘soft’ probability density estimate P(x).

SLIDE 3

The Neural Network - Biologically Inspired

Engineering success may start with biological inspiration, but then take a totally different path.

SLIDE 4

Planes Don’t Flap Wings to Fly

Engineering success may start with biological inspiration, but then take a totally different path.

SLIDE 5

xor: A Limitation of the Linear Model

[Figure: the xor target on inputs x1, x2. Diagonally opposite regions carry labels +1 and −1, so no single linear separator can realize it.]

SLIDE 6

Decomposing xor

[Figure: the xor target on x1, x2, decomposed using two linear separators h1 and h2; each separator splits the plane into a +1 and a −1 half-plane.]

f = h1h̄2 + h̄1h2

h1(x) = sign(w1ᵀx)
h2(x) = sign(w2ᵀx)
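A quick truth-table check of the decomposition (a minimal Python sketch; with the ±1 convention, negating a hypothesis is just h̄ = −h):

# Truth-table check of f = h1*h̄2 + h̄1*h2: boolean OR of two ANDs,
# where each h takes values in {+1, -1} and h̄ = -h.
for h1 in (+1, -1):
    for h2 in (+1, -1):
        f = +1 if (h1 == +1 and h2 == -1) or (h1 == -1 and h2 == +1) else -1
        print(h1, h2, '->', f)   # f = +1 exactly when h1 != h2, i.e. xor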

SLIDE 7

Perceptrons for or and and

or(x1, x2) = sign(x1 + x2 + 1.5)

and(x1, x2) = sign(x1 + x2 − 1.5)

[Figure: two perceptron diagrams, each with inputs 1, x1, x2: the or node uses weights (1.5, 1, 1); the and node uses weights (−1.5, 1, 1).]
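Both gates are easy to verify directly; a minimal sketch using the slide's weights on ±1 inputs:

import numpy as np

def sign(s):
    return np.where(s >= 0, 1.0, -1.0)

# Bias +1.5 makes OR false only when both inputs are -1;
# bias -1.5 makes AND true only when both inputs are +1.
def or_(x1, x2):  return sign(x1 + x2 + 1.5)
def and_(x1, x2): return sign(x1 + x2 - 1.5)

for a in (+1.0, -1.0):
    for b in (+1.0, -1.0):
        print(a, b, '| or:', or_(a, b), ' and:', and_(a, b))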

SLIDE 8

Representing f Using or and and

f = h1h̄2 + h̄1h2

[Figure: a single or perceptron computing f from inputs 1, h1h̄2, h̄1h2 with weights (1.5, 1, 1).]

SLIDE 9

Representing f Using or and and

f = h1h̄2 + h̄1h2

[Figure: the ands expanded: two and perceptrons take h1 and h2 as inputs, one computing h1h̄2 with weights (−1.5, 1, −1) and the other h̄1h2 with weights (−1.5, −1, 1); their outputs feed the or node with weights (1.5, 1, 1) to give f.]

SLIDE 10

Representing f Using or and and

f = h1h̄2 + h̄1h2

[Figure: the full construction: inputs 1, x1, x2 feed h1 = sign(w1ᵀx) and h2 = sign(w2ᵀx); these feed the two and nodes (weights −1.5, 1, −1 and −1.5, −1, 1), whose outputs feed the or node (weights 1.5, 1, 1) to produce f.]

SLIDE 11

The Multilayer Perceptron (MLP)

[Figure: the xor network from the previous slide, shown next to a single perceptron node with inputs 1, x1, x2, weights w0, w1, w2, computing sign(wᵀx): the network is just perceptrons feeding perceptrons.]

More layers allow us to implement f

These additional layers are called hidden layers
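Assembling the pieces gives xor as three layers of perceptrons. A sketch: the second- and third-layer weights are the ones built above, while the first-layer separators w1, w2 are left generic on the slides, so the choice h1(x) = sign(x1), h2(x) = sign(x2) here is an illustrative assumption:

import numpy as np

def sign(s):
    return np.where(s >= 0, 1.0, -1.0)

def layer(W, x):
    x = np.concatenate(([1.0], x))   # prepend the constant bias input 1
    return sign(W.T @ x)             # one column of W per node in the layer

# Layer 1 (assumed separators h1 = sign(x1), h2 = sign(x2)); row 0 is the bias weight.
W1 = np.array([[0.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0]])
# Layer 2: the two and nodes computing h1*h̄2 and h̄1*h2.
W2 = np.array([[-1.5, -1.5],
               [ 1.0, -1.0],
               [-1.0,  1.0]])
# Layer 3: the or node combining them into f.
W3 = np.array([[1.5],
               [1.0],
               [1.0]])

def mlp(x):
    return layer(W3, layer(W2, layer(W1, x)))[0]

for x in ([-1, -1], [-1, 1], [1, -1], [1, 1]):
    print(x, '->', mlp(np.array(x, dtype=float)))   # +1 exactly when x1 != x2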

SLIDE 12

Universal Approximation

Any target function f that can be decomposed into linear separators can be implemented by a 3-layer MLP.

SLIDE 13

Universal Approximation

A sufficiently smooth separator can “essentially” be decomposed into linear separators.

[Figure: a circular target region (+ inside, − outside) approximated by 8 perceptrons and by 16 perceptrons; the polygonal boundary tightens toward the circle as more linear separators are used.]
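One concrete way to play with this (a hypothetical construction, not necessarily the one behind the slide's figure): take the target to be the unit disc and approximate it by the and of m perceptrons whose boundaries are tangent to the circle, then estimate the disagreement by sampling:

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.5, 1.5, size=(200_000, 2))                # random test points
target = np.where(np.hypot(X[:, 0], X[:, 1]) <= 1.0, 1.0, -1.0)

def polygon_approx(X, m):
    # m perceptrons, each tangent to the unit circle: h_i(x) = sign(1 - n_i . x).
    angles = 2 * np.pi * np.arange(m) / m
    N = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # unit normals
    H = np.where(X @ N.T <= 1.0, 1.0, -1.0)                  # each h_i on each point
    return np.where(H.sum(axis=1) >= m - 1, 1.0, -1.0)       # and of all m h_i

for m in (8, 16):
    disagree = np.mean(polygon_approx(X, m) != target)
    print(m, 'perceptrons: disagreement with the circle =', disagree)

Doubling the number of perceptrons roughly quarters the disagreement, in line with the tightening boundary in the figure.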

SLIDE 14

Approximation Versus Generalization

The size of the MLP controls the approximation-generalization tradeoff.

More nodes per hidden layer ⇒ approximation ↑ and generalization ↓

SLIDE 15

Minimizing Ein

Minimizing Ein is a combinatorial problem, even harder for the MLP than for the single perceptron. Ein is not smooth (because of the sign function), so we cannot apply gradient descent directly. Approximating sign(x) ≈ tanh(x) makes the error smooth, and gradient descent can then be used to minimize Ein.
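A minimal sketch of why the tanh surrogate helps: tanh is close to sign away from zero but differentiable everywhere, so a smoothed Ein admits gradient descent (the toy data, squared error, and step size here are illustrative assumptions):

import numpy as np

print(np.tanh([-5.0, -1.0, -0.1, 0.1, 1.0, 5.0]))    # saturates toward +/-1, smooth at 0

# Gradient descent on a single smoothed perceptron:
# E_in(w) = mean_n (tanh(w.x_n) - y_n)^2 on a linearly separable toy set.
rng = np.random.default_rng(1)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]    # bias coordinate x0 = 1
y = np.sign(X @ np.array([0.3, 1.0, -1.0]))           # hypothetical target weights

w, eta = np.zeros(3), 0.1
for _ in range(1000):
    out = np.tanh(X @ w)
    grad = 2 * X.T @ ((out - y) * (1 - out**2)) / len(y)   # tanh'(s) = 1 - tanh(s)^2
    w -= eta * grad

print('in-sample classification error:', np.mean(np.sign(X @ w) != y))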

SLIDE 16

The Neural Network

[Figure: the neural network: the input layer ℓ = 0 holds 1, x1, x2, …, xd; each node in the hidden layers 0 < ℓ < L applies the soft threshold θ(s) to its input signal s (each hidden layer also has a constant node 1); the output layer ℓ = L produces h(x).]

SLIDE 17

Zooming into a Hidden Node

[Figure: the same network, zoomed in on one hidden node in layer ℓ: the node sums its inputs from layer (ℓ − 1) and applies θ; W(ℓ) carries the signals into layer ℓ and W(ℓ+1) carries its outputs forward to layer (ℓ + 1).]

Layer ℓ parameters:

signals in: s(ℓ), a d(ℓ)-dimensional input vector
outputs: x(ℓ), a (d(ℓ) + 1)-dimensional output vector
weights in: W(ℓ), a (d(ℓ−1) + 1) × d(ℓ) matrix
weights out: W(ℓ+1), a (d(ℓ) + 1) × d(ℓ+1) matrix

Layers run ℓ = 0, 1, 2, …, L; layer ℓ has “dimension” d(ℓ) ⇒ d(ℓ) + 1 nodes.

W(ℓ) = [ w1(ℓ)  w2(ℓ)  ⋯  w_{d(ℓ)}(ℓ) ]    (column j is the weight vector into node j of layer ℓ)
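This notation translates directly into forward propagation. A minimal sketch, assuming θ = tanh and hypothetical layer sizes; before each layer, the constant bias node 1 is prepended to x(ℓ−1):

import numpy as np

def forward(x, Ws, theta=np.tanh):
    # Ws = [W(1), ..., W(L)]; W(l) is (d(l-1) + 1) x d(l),
    # with one column per node of layer l.
    for W in Ws:
        x = np.concatenate(([1.0], x))   # x(l-1) including its constant node
        s = W.T @ x                      # signals in: s(l) = W(l)^T x(l-1)
        x = theta(s)                     # outputs:    x(l) = theta(s(l))
    return x                             # h(x), the output of layer L

# Hypothetical dimensions d(0) = 2, d(1) = 3, d(2) = 1, with random weights:
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 3)), rng.normal(size=(4, 1))]
print(forward(np.array([0.5, -1.0]), Ws))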

SLIDE 18

The Neural Network

Biology −→ Engineering

[Figure: the final neural network, as on slide 16: input layer ℓ = 0 with 1, x1, …, xd; hidden layers 0 < ℓ < L of soft-threshold θ nodes; output layer ℓ = L producing h(x).]
