

SLIDE 1

Introduction to Machine Learning

Perceptron

Barnabás Póczos

SLIDE 2

Contents

• History of Artificial Neural Networks
• Definitions: Perceptron, Multi-Layer Perceptron
• Perceptron algorithm

SLIDE 3

Short History of Artificial Neural Networks

SLIDE 4

Progression (1943-1960)

  • First mathematical model of neurons
    ▪ Pitts & McCulloch (1943)
  • Beginning of artificial neural networks
  • Perceptron, Rosenblatt (1958)
    ▪ A single neuron for classification
    ▪ Perceptron learning rule
    ▪ Perceptron convergence theorem

Degression (1960-1980)

  • Perceptron can’t even learn the XOR function
  • We don’t know how to train MLPs
  • 1963 Backpropagation… but not much attention…

Bryson, A.E.; W.F. Denham; S.E. Dreyfus. Optimal programming problems with inequality constraints. I: Necessary conditions for extremal solutions. AIAA J. 1, 11 (1963) 2544-2550

Short History

SLIDE 5

Progression (1980-)

  • 1986 Backpropagation reinvented:

▪ Rumelhart, Hinton, Williams:

Learning representations by back-propagating errors. Nature, 323, 533–536, 1986

  • Successful applications:

▪ Character recognition, autonomous cars,…

  • Open questions: Overfitting? Network structure?

Neuron number? Layer number? Bad local minimum points? When to stop training?

  • Hopfield nets (1982), Boltzmann machines,…

Short History

SLIDE 6

Degression (1993-)

  • SVM: Vapnik and his co-workers developed the Support Vector Machine (1993). It is a shallow architecture.
  • SVMs and graphical models almost killed ANN research.
  • Training deeper networks consistently yielded poor results.
  • Exception: deep convolutional neural networks, Yann LeCun 1998 (a discriminative model).

Short History

SLIDE 7

Short History

Deep Belief Networks (DBN)

  • Hinton, G. E, Osindero, S., and Teh, Y. W. (2006).

A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.

  • Generative graphical model
  • Based on restricted Boltzmann machines
  • Can be trained efficiently

Deep Autoencoder based networks
  • Bengio, Y., Lamblin, P., Popovici, P., Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19.

Convolutional neural networks running on GPUs
  • Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton. Advances in Neural Information Processing Systems, 2012.

Progression (2006-)

SLIDE 8

The Neuron

SLIDE 9

The Neuron

– Each neuron has a body, an axon, and many dendrites.
– A neuron can fire or rest.
– If the sum of weighted inputs is larger than a threshold, the neuron fires.
– Synapses: the gaps between axons and other neurons’ dendrites; they determine the weights in the sum.

SLIDE 10

The Mathematical Model of a Neuron
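The model itself appears as a figure on the slide; in the standard formulation, a neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function. A minimal sketch in Python (the weights, bias, and inputs below are illustrative, not from the slides):

```python
import numpy as np

def neuron(x, w, b, activation):
    """A single artificial neuron: weighted sum of inputs plus a bias,
    passed through an activation function."""
    return activation(np.dot(w, x) + b)

def threshold(s):
    # Fires (outputs 1) when the weighted sum exceeds 0, otherwise rests (0)
    return 1 if s > 0 else 0

# Example: two inputs with hypothetical weights 0.5 and -0.3, bias 0.1
out = neuron(np.array([1.0, 1.0]), np.array([0.5, -0.3]), 0.1, threshold)
```

With the threshold activation this is exactly the perceptron discussed later; swapping in a different activation gives the other neuron variants on the following slides.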

SLIDE 11

  • Identity function
  • Threshold function

(perceptron)

  • Ramp function

Typical activation functions

SLIDE 12

  • Logistic function

Typical activation functions

  • Hyperbolic tangent function
SLIDE 13

Typical activation functions

  • Rectified Linear Unit (ReLU)
  • Exponential Linear Unit
  • Softplus function

(This is a smooth approximation of ReLU)

  • Leaky ReLU
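The formulas for these activation functions are on the slide images; the standard definitions can be sketched as follows (the default slope and scale parameters are the common choices, not taken from the slides):

```python
import math

def logistic(s):            # 1 / (1 + e^{-s}), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-s))

def tanh(s):                # hyperbolic tangent, output in (-1, 1)
    return math.tanh(s)

def relu(s):                # Rectified Linear Unit: max(0, s)
    return max(0.0, s)

def leaky_relu(s, a=0.01):  # small slope a instead of 0 for negative inputs
    return s if s > 0 else a * s

def elu(s, a=1.0):          # Exponential Linear Unit: smooth for negatives
    return s if s > 0 else a * (math.exp(s) - 1.0)

def softplus(s):            # ln(1 + e^s), a smooth approximation of ReLU
    return math.log1p(math.exp(s))
```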
SLIDE 14

SLIDE 15

SLIDE 16

Structure of Neural Networks

SLIDE 17

Input neurons, Hidden neurons, Output neurons

Fully Connected Neural Network

SLIDE 18

Layers, Feedforward neural networks

Convention: The input layer is Layer 0.

SLIDE 19

  • Multilayer perceptron: connections only between Layer i and Layer i+1

  • The most popular architecture.

Multilayer Perceptron
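A forward pass through such a layered network can be sketched as follows. The layer sizes, weights, and choice of tanh activation below are hypothetical, purely to illustrate the Layer i → Layer i+1 structure:

```python
import numpy as np

def mlp_forward(x, layers, activation=np.tanh):
    """Forward pass through a multilayer perceptron: each layer's output
    feeds only into the next layer (Layer i to Layer i+1)."""
    for W, b in layers:
        x = activation(W @ x + b)
    return x

# Hypothetical 2-3-1 network (the input layer is Layer 0, per the convention above)
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 2)), np.zeros(3)),   # Layer 0 -> Layer 1
          (rng.standard_normal((1, 3)), np.zeros(1))]   # Layer 1 -> Layer 2
y = mlp_forward(np.array([0.5, -1.0]), layers)
```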

SLIDE 20

SLIDE 21

Recurrent Neural Networks

Recurrent NN: there are connections backwards too.

SLIDE 22

The Perceptron

SLIDE 23

The Training Set

SLIDE 24

The Perceptron

SLIDE 25


The Perceptron

SLIDE 26

Matlab: opengl hardwarebasic, nnd4pr

SLIDE 27

Matlab demos: nnd3pc

SLIDE 28

The Perceptron Algorithm

SLIDE 29

The Perceptron algorithm

The perceptron learning algorithm

SLIDE 30

The perceptron algorithm

Observation

SLIDE 31

The Perceptron Algorithm

How can we remember this rule? An interesting property: we do not require the learning rate to go to zero!
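The update rule itself appears on the slide images. In its standard form, whenever an example is misclassified, the weight vector is moved toward (positive examples) or away from (negative examples) that point. A sketch, assuming the usual ±1 labels and a fixed learning rate (which, as the slide notes, need not go to zero); the function and data below are illustrative:

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, max_epochs=100):
    """Perceptron learning rule: whenever y_i * <w, x_i> <= 0
    (example misclassified), update w <- w + lr * y_i * x_i."""
    X = np.hstack([X, np.ones((len(X), 1))])   # absorb the bias into w
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:        # wrong side of (or on) the boundary
                w += lr * yi * xi
                mistakes += 1
        if mistakes == 0:                      # no errors left: converged
            break
    return w

# Linearly separable toy data: label is the sign of the first coordinate
X = np.array([[2.0, 1.0], [1.0, -1.0], [-1.5, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y)
```

On linearly separable data such as this, the loop stops after finitely many updates, which is the content of the convergence theorem proved on the following slides.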

SLIDE 32

The Perceptron Algorithm

SLIDE 33

Perceptron Convergence

SLIDE 34

Perceptron Convergence

SLIDE 35

Perceptron Convergence

Lemma

Using this notation, the update rule can be written as

Proof

SLIDE 36

Perceptron Convergence

Lemma

SLIDE 37

Perceptron Convergence

SLIDE 38

Lower bound

SLIDE 39

Upper bound

Therefore,

SLIDE 40

Upper bound

Therefore,
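The equations of this proof live on the slide images; the standard argument, under the usual separability assumptions, combines the two bounds as follows:

```latex
% Assumptions (standard): the data are linearly separable with margin
% \gamma > 0 by some unit vector w^*, and every example is bounded,
% \|x_i\| \le R. Let k be the number of mistake-driven updates.

% Lower bound: each update w_{k+1} = w_k + y_i x_i gains at least \gamma
% in the direction of w^*:
\langle w_k, w^* \rangle \;\ge\; k \gamma

% Upper bound: each update grows the squared norm by at most R^2:
\| w_k \|^2 \;\le\; k R^2

% Combining the two via Cauchy--Schwarz:
k \gamma \;\le\; \langle w_k, w^* \rangle \;\le\; \| w_k \|
        \;\le\; \sqrt{k}\, R
\qquad \Longrightarrow \qquad k \;\le\; \frac{R^2}{\gamma^2}
```

The perceptron therefore makes at most (R/γ)² mistakes; notably, the bound is independent of the learning rate, matching the earlier remark that the rate need not go to zero.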

SLIDE 41

The Perceptron Algorithm

SLIDE 42

Take me home!

• History of Neural Networks
• Mathematical model of the neuron
• Activation Functions
• Perceptron definition
• Perceptron algorithm
• Perceptron Convergence Theorem
