Machine Learning
Neural Networks: Introduction
Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
Where are we?
- General learning principles
- Minimization
- Learning algorithms
These learning algorithms produce linear classifiers.
Prediction: features → dot product → threshold:
$\mathrm{sgn}(\mathbf{w}^\top \mathbf{x} + b) = \mathrm{sgn}\big(\sum_i w_i x_i + b\big)$
Learning: various algorithms (perceptron, SVM, logistic regression, ...); in general, minimize a loss.
But where do these input features come from? What if the features were outputs of another classifier?
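To make the prediction rule concrete, here is a minimal sketch in Python/NumPy; the weights, bias, and feature values below are hypothetical, chosen only for illustration:

```python
import numpy as np

def predict(w, b, x):
    """Linear classifier: sign of the dot product plus bias."""
    return np.sign(np.dot(w, x) + b)

# Hypothetical weights, bias, and feature vector for illustration.
w = np.array([0.5, -1.0, 2.0])
b = -0.25
x = np.array([1.0, 0.0, 0.5])

print(predict(w, b, x))  # 1.0: this example lands on the positive side
```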
Each of these connections has its own weight as well.
This is a two-layer feed-forward neural network.
This is a two-layer feed-forward neural network, with an input layer, a hidden layer, and an output layer. Think of the hidden layer as learning a good representation of the inputs.
The dot product followed by the threshold constitutes a neuron. There are five neurons in this picture: four in the hidden layer and one in the output layer. This is a two-layer feed-forward neural network.
What if the inputs were themselves the outputs of a classifier? We can make a three-layer network, and so on.
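As a sketch of how such a network computes its output (my own minimal NumPy illustration; the layer sizes and random weights are arbitrary):

```python
import numpy as np

def layer(W, b, x):
    """One layer: dot products followed by threshold activations."""
    return np.sign(W @ x + b)

# Hypothetical two-layer network: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input -> hidden
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # hidden -> output

x = np.array([1.0, -0.5, 2.0])
h = layer(W1, b1, x)   # hidden representation of the input
y = layer(W2, b2, h)   # output prediction
# Stacking more layers (a three-layer network, and so on) repeats this step.
```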
Functions that very loosely mimic a biological neuron
$\text{output} = \text{activation}(\mathbf{w}^\top \mathbf{x} + b)$
The dot product is followed by a threshold activation. Other activations are possible.
Name of the neuron              Activation function $\text{activation}(z)$
Linear unit                     $z$
Threshold/sign unit             $\mathrm{sgn}(z)$
Sigmoid unit                    $\frac{1}{1+\exp(-z)}$
Rectified linear unit (ReLU)    $\max(0, z)$
Tanh unit                       $\tanh(z)$
$\text{output} = \text{activation}(\mathbf{w}^\top \mathbf{x} + b)$
Many more activation functions exist (sinusoid, sinc, Gaussian, polynomial, ...). Activation functions are also called transfer functions.
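The activations in the table above are easy to write down; here is a short Python/NumPy sketch (the function names are mine, chosen for the example):

```python
import numpy as np

def linear(z):  return z                          # linear unit
def sign(z):    return np.sign(z)                 # threshold/sign unit
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))   # sigmoid unit
def relu(z):    return np.maximum(0.0, z)         # rectified linear unit
def tanh(z):    return np.tanh(z)                 # tanh unit

z = np.array([-2.0, 0.0, 3.0])
for f in (linear, sign, sigmoid, relu, tanh):
    print(f.__name__, f(z))
```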
[Figure: a feed-forward network with an input layer, a hidden layer, and an output layer; weights $w_{ij}^{1}$ connect the input layer to the hidden layer, and weights $w_{ij}^{2}$ connect the hidden layer to the output layer.]
Called the architecture: the structure of the network (the layers, the units in each, and how they are connected) is typically predefined, part of the design of the classifier. The weights $w_{ij}^{1}$ and $w_{ij}^{2}$, on the other hand, are learned from data.
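A small sketch of the distinction, assuming a hypothetical design with three inputs, four hidden units, and one output:

```python
import numpy as np

# Architecture: fixed by the designer (layer sizes, connectivity).
architecture = [3, 4, 1]   # 3 inputs, 4 hidden units, 1 output

# Weights: one matrix and one bias vector per layer, learned from data.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(m, n)) for n, m in zip(architecture, architecture[1:])]
biases  = [rng.normal(size=m) for m in architecture[1:]]

# Training would adjust `weights` and `biases`; the architecture stays fixed.
for W, b in zip(weights, biases):
    print(W.shape, b.shape)   # (4, 3) (4,), then (1, 4) (1,)
```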
See also: http://people.idsia.ch/~juergen/deep-learning-overview.html
In general, convex polygons: intersections of halfspaces, one per hidden threshold unit.
Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014
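As an illustration (my own construction, not from the slides): three threshold units, one per halfspace, followed by an AND-like output unit recognize points inside a triangle, a convex polygon:

```python
import numpy as np

def unit(w, b, x):
    """Threshold unit: +1 inside the halfspace w.x + b >= 0, else -1."""
    return 1.0 if np.dot(w, x) + b >= 0 else -1.0

# Three halfspaces whose intersection is the triangle with corners
# (0,0), (1,0), (0,1): x >= 0, y >= 0, x + y <= 1.
halfspaces = [(np.array([1.0, 0.0]), 0.0),
              (np.array([0.0, 1.0]), 0.0),
              (np.array([-1.0, -1.0]), 1.0)]

def in_triangle(x):
    # The output unit computes an AND of the hidden units: with outputs
    # in {-1, +1}, the sum reaches 3 only when all three fire.
    hidden = [unit(w, b, x) for w, b in halfspaces]
    return unit(np.ones(3), -2.5, np.array(hidden))  # sum(hidden) >= 2.5

print(in_triangle(np.array([0.2, 0.2])))  # +1.0: inside
print(in_triangle(np.array([0.9, 0.9])))  # -1.0: outside
```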
In general, unions of convex polygons.
Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014
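Continuing in the same spirit (again an illustrative construction of mine): an OR-like output unit over two convex-region detectors recognizes their union:

```python
import numpy as np

def unit(w, b, x):
    """Threshold unit: +1 if w.x + b >= 0, else -1."""
    return 1.0 if np.dot(w, x) + b >= 0 else -1.0

# region_a and region_b stand in for two convex-polygon sub-networks as
# in the previous sketch; stubbed out with axis-aligned squares for brevity.
def region_a(x):  # unit square [0,1] x [0,1]
    return 1.0 if (0 <= x[0] <= 1 and 0 <= x[1] <= 1) else -1.0

def region_b(x):  # square [2,3] x [2,3]
    return 1.0 if (2 <= x[0] <= 3 and 2 <= x[1] <= 3) else -1.0

def in_union(x):
    # OR of two +/-1 inputs: fires as soon as at least one is +1.
    z = np.array([region_a(x), region_b(x)])
    return unit(np.ones(2), 1.5, z)  # sum(z) + 1.5 >= 0 iff some z_i = +1

print(in_union(np.array([0.5, 0.5])))  # +1.0: in region A
print(in_union(np.array([2.5, 2.5])))  # +1.0: in region B
print(in_union(np.array([1.5, 1.5])))  # -1.0: in neither
```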
[DasGupta et al 1993] give upper and lower bounds on the size of the network needed.
- Exercise: Prove this
Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness of the network.
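A quick numerical check of the idea behind the exercise, as a NumPy sketch with arbitrary matrices: two stacked linear layers collapse into one.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # first linear layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # second linear layer
x = rng.normal(size=3)

# Two stacked linear layers...
two_layer = W2 @ (W1 @ x + b1) + b2

# ...equal a single linear layer with W = W2 W1 and b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True
```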