Introduction to Machine Learning: Multilayer Perceptron. Barnabás Póczos (PowerPoint PPT presentation)


SLIDE 1

Introduction to Machine Learning

Multilayer Perceptron

Barnabás Póczos

SLIDE 2

The Multilayer Perceptron

SLIDE 3

Multilayer Perceptron
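As a minimal sketch of what a multilayer perceptron computes, here is a layer-by-layer forward pass in plain Python. The 2-3-1 architecture and all weights below are made-up illustrations, not values from the slides:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(x, layers):
    """layers: list of (weight_matrix, bias_vector) pairs, one per layer."""
    h = x
    for W, b in layers:
        # each unit: sigmoid of the weighted sum of the previous layer plus bias
        h = [sigmoid(sum(w_ij * h_i for w_ij, h_i in zip(row, h)) + b_j)
             for row, b_j in zip(W, b)]
    return h

# A 2-3-1 network with arbitrary example weights:
layers = [
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.0, -1.0, 0.5]], [0.2]),                                 # output layer
]
y = mlp_forward([1.0, 0.0], layers)
```

Each layer is just an affine map followed by an elementwise nonlinearity; stacking layers is what gives the MLP its representational power.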

SLIDE 4

ALVINN: AN AUTONOMOUS LAND VEHICLE IN A NEURAL NETWORK

Dean A. Pomerleau, Carnegie Mellon University, 1989

Training: using a simulated road generator

SLIDE 5

Gradient Descent

We want to solve:

SLIDE 6

Starting Point

SLIDE 7

Starting Point

SLIDE 8

Fixed step size can be too big

SLIDE 9

Fixed step size can be too small
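The step-size behavior on these slides can be reproduced with a minimal sketch. The quadratic objective f(w) = (w - 3)^2 and the particular step sizes below are made-up illustrations, not from the slides:

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is f'(w) = 2*(w - 3).
def gradient_descent(step_size, w0=0.0, iters=50):
    w = w0
    for _ in range(iters):
        grad = 2.0 * (w - 3.0)
        w = w - step_size * grad
    return w

w_good  = gradient_descent(0.1)    # converges close to the minimum at w = 3
w_small = gradient_descent(0.001)  # too small: barely moves in 50 steps
w_big   = gradient_descent(1.1)    # too big: overshoots and diverges
```

Each update multiplies the error (w - 3) by the factor (1 - 2*step_size), so the iteration converges exactly when that factor has magnitude below 1.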

SLIDE 10

SLIDE 11

SLIDE 12

Character Recognition with MLP

Matlab: appcr1

SLIDE 13

The network

Noise-free input: 26 different letters of size 7x5

SLIDE 14

Noisy inputs

SLIDE 15

Matlab MLP Training

% Create MLP
hiddenlayers = [10, 25];
net1 = feedforwardnet(hiddenlayers);
net1 = configure(net1, X, T);
% View
view(net1);
% Train
net1 = train(net1, X, T);
% Test
Y1 = net1(Xtest);

SLIDE 16

Prediction errors

▪ Network 1 was trained on clean images.
▪ Network 2 was trained on noisy images: 30 noisy copies of each letter are created.

SLIDE 17

The Backpropagation Algorithm

SLIDE 18

Multilayer Perceptron

SLIDE 19

The gradient of the error

SLIDE 20

Notation

SLIDE 21

Some observations

SLIDE 22

The backpropagated error

SLIDE 23

The backpropagated error

Lemma

SLIDE 24

The backpropagated error

Therefore,

SLIDE 25

The backpropagation algorithm
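A minimal sketch of the algorithm, assuming one hidden layer, sigmoid units, and squared error E = 0.5*(y - t)^2. This particular setup and notation are assumptions for illustration, not the slides' exact formulation:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W1, b1, W2, b2):
    # hidden activations, then scalar output
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y = sigmoid(sum(W2[j] * h[j] for j in range(len(h))) + b2)
    return h, y

def backprop(x, t, W1, b1, W2, b2):
    """Return gradients of E = 0.5*(y - t)^2 w.r.t. all parameters."""
    h, y = forward(x, W1, b1, W2, b2)
    # output-layer error term, using sigmoid'(a) = y*(1 - y)
    delta_out = (y - t) * y * (1.0 - y)
    gW2 = [delta_out * h[j] for j in range(len(h))]
    gb2 = delta_out
    # backpropagated error for each hidden unit
    delta_h = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    gW1 = [[delta_h[j] * x[i] for i in range(len(x))] for j in range(len(h))]
    gb1 = delta_h
    return gW1, gb1, gW2, gb2
```

The key point of the lemma above is visible in `delta_h`: each hidden unit's error is the output error weighted by the connecting weight and scaled by the unit's own activation derivative.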

SLIDE 26

SLIDE 27

SLIDE 28

SLIDE 29

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

What functions can multilayer perceptrons represent?

SLIDE 34

What functions can multilayer perceptrons represent?

Perceptrons cannot represent the XOR function (shown here in its negated form, XNOR):

f(0,0)=1, f(1,1)=1, f(0,1)=0, f(1,0)=0
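A one-hidden-layer network with threshold units does represent XOR. Here is a minimal sketch using the textbook OR/NAND/AND construction; the weights are chosen by hand for illustration, not taken from the slides:

```python
def step(a):
    # threshold (Heaviside) unit
    return 1 if a >= 0 else 0

def perceptron_xor(x1, x2):
    # Hidden layer: an OR unit and a NAND unit
    h_or = step(x1 + x2 - 0.5)
    h_nand = step(-x1 - x2 + 1.5)
    # Output: AND of the two hidden units gives XOR
    return step(h_or + h_nand - 1.5)
```

No single threshold unit can separate these four points, but two layers can; `1 - perceptron_xor(x1, x2)` gives the negated (XNOR) variant.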

SLIDE 35

Hilbert’s 13th Problem

1902: Hilbert’s list of the 23 “most important” problems in mathematics.

The 13th problem: “Solve the 7th-degree equation using continuous functions of two parameters.” Conjecture: it can’t be solved.

Related conjecture: Let f be a function of 3 arguments (e.g., the root of the 7th-degree equation as a function of its three coefficients). Prove that f cannot be rewritten as a composition of finitely many functions of two arguments. Another rewritten form: prove that there is a nonlinear continuous function of three variables that cannot be decomposed into finitely many functions of two variables.

SLIDE 36

Function decompositions

f(x,y,z) = Φ1(ψ1(x), ψ2(y)) + Φ2(c1·ψ3(y) + c2·ψ4(z), x)

[Network diagram: inputs x, y, z feed the units ψ1…ψ4; their outputs (with weights c1, c2 and a Σ node) feed Φ1 and Φ2, and a final Σ node outputs f(x,y,z).]

SLIDE 37

1957: Arnold disproves Hilbert’s conjecture (the Kolmogorov-Arnold representation theorem).

Function decompositions

SLIDE 38

Function decompositions

Corollary:

Issues: This statement is not constructive.

SLIDE 39

Universal Approximators

Kurt Hornik, Maxwell Stinchcombe and Halbert White: “Multilayer feedforward networks are universal approximators”, Neural Networks, Vol. 2(3), 359-366, 1989

Definition: ΣN(g): neural networks with 1 hidden layer and activation function g.

Theorem: these networks are universal approximators: they can approximate any continuous function on a compact set arbitrarily well.

SLIDE 40

Universal Approximators

Theorem (Blum & Li, 1991)

Definition:

Formal statement:

SLIDE 41

Proof

GOAL: approximate f by a finite sum of simple indicator (step) functions.

Integral approximation in 1-dim: partition the domain at points xi and approximate f by a step function that is constant on each interval.

Integral approximation in 2-dim: analogously, partition the domain into polygons Xi.

SLIDE 42

Proof

The indicator function of the polygon Xi can be learned by this neural network: output 1 if x is in Xi, and 0 otherwise.

GOAL: the weighted linear combination of these indicator functions will be a good approximation of the original function f.
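This construction can be sketched in one dimension with threshold units: two units detect the interval ends, an AND unit forms the interval's indicator, and a weighted sum of such indicators approximates f. The interval width, target function, and evaluation point below are made-up illustrations:

```python
def step(a):
    # threshold (Heaviside) unit
    return 1.0 if a >= 0 else 0.0

def interval_indicator(x, lo, hi):
    # Two threshold units detect x >= lo and x <= hi; an AND unit combines them,
    # giving the indicator function of the interval [lo, hi].
    return step(step(x - lo) + step(hi - x) - 1.5)

def approximate(f, x, lo=0.0, hi=1.0, n=100):
    # One-hidden-layer step network: a weighted sum of n interval indicators,
    # each weighted by f evaluated at the interval's left endpoint.
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        a = lo + i * width
        total += f(a) * interval_indicator(x, a, a + width)
    return total
```

Shrinking the intervals (larger n) tightens the approximation, which is exactly the Riemann-sum idea behind the proof sketch.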

SLIDE 43

Proof

The resulting linear system of equations (for the combination weights) can also be solved.

slide-44
SLIDE 44

44

Thanks for your attention!