Machine Learning Basics Lecture 3: Perceptron
Princeton University COS 495 Instructor: Yingyu Liang
[Figure: linear classification. The hyperplane $(w^*)^T x = 0$ separates Class +1, where $(w^*)^T x > 0$, from Class -1, where $(w^*)^T x < 0$.]
Hypothesis class: linear functions $f_w(x) = w^T x$. Prediction: $y = \text{sign}(f_w(x)) = \text{sign}(w^T x)$.
Perceptron: figure from the lecture notes of Nina Balcan
The Perceptron update: on a mistake on a positive example, set $w_{t+1} = w_t + x$; on a mistake on a negative example, set $w_{t+1} = w_t - x$.

Why this helps (assuming $\|x\| = 1$):
Mistake on a positive example: $w_{t+1}^T x = (w_t + x)^T x = w_t^T x + x^T x = w_t^T x + 1$, so the score on $x$ moves up toward positive.
Mistake on a negative example: $w_{t+1}^T x = (w_t - x)^T x = w_t^T x - x^T x = w_t^T x - 1$, so the score moves down toward negative.
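A minimal runnable sketch of this update rule in Python (the function name, data layout, and stopping criterion are illustrative assumptions, not from the slides):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Online Perceptron. X: (n, d) array of examples, y: labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_t, y_t in zip(X, y):
            if y_t * (w @ x_t) <= 0:   # mistake (wrong sign or on the boundary)
                w = w + y_t * x_t      # +x_t on a positive, -x_t on a negative
                mistakes += 1
        if mistakes == 0:              # a full pass with no mistakes: done
            return w
    return w
```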
Theorem (Perceptron mistake bound): Assume the data points $(x_i, y_i)$ lie in the unit ball, and the margin from any example to the decision boundary of a unit-norm $w^*$ is $\gamma = \min_i |(w^*)^T x_i|$. Then the Perceptron makes at most $\frac{1}{\gamma^2}$ mistakes.
Remarks: the data need not be i.i.d.! The bound does not depend on $n$, the length of the data sequence!
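As an illustrative check of the bound (the data construction below is an assumption, not from the slides), run a single online pass over separable data and count mistakes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Separable data: points in the unit ball, labeled by a unit-norm w*,
# with points closer than 0.1 to the boundary removed so gamma >= 0.1.
w_star = np.array([1.0, 0.0])
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
X = X[np.linalg.norm(X, axis=1) <= 1.0]
m = X @ w_star
X, m = X[np.abs(m) > 0.1], m[np.abs(m) > 0.1]
y = np.sign(m)
gamma = np.abs(m).min()

# One online pass, counting mistakes.
w, mistakes = np.zeros(2), 0
for x_t, y_t in zip(X, y):
    if y_t * (w @ x_t) <= 0:
        w = w + y_t * x_t
        mistakes += 1

print(f"{mistakes} mistakes <= 1/gamma^2 = {1 / gamma**2:.1f}")
```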
Claim 1: $w_{t+1}^T w^* \ge w_t^T w^* + \gamma$.

Proof: if the mistake is on a positive example, $w_{t+1}^T w^* = (w_t + x)^T w^* = w_t^T w^* + x^T w^* \ge w_t^T w^* + \gamma$. If the mistake is on a negative example, $w_{t+1}^T w^* = (w_t - x)^T w^* = w_t^T w^* - x^T w^* \ge w_t^T w^* + \gamma$.
Claim 2: $\|w_{t+1}\|^2 \le \|w_t\|^2 + 1$.

Proof (mistake on a positive example): $\|w_{t+1}\|^2 = \|w_t + x\|^2 = \|w_t\|^2 + \|x\|^2 + 2 w_t^T x \le \|w_t\|^2 + 1$, since $\|x\|^2 \le 1$ and the cross term $2 w_t^T x$ is negative because we made a mistake on $x$. The negative case is symmetric.
Combining the claims: after $M$ mistakes (starting from $w_1 = 0$),
$w_{M+1}^T w^* \ge \gamma M$ (by Claim 1, applied once per mistake),
$\|w_{M+1}\| \le \sqrt{M}$ (by Claim 2),
$w_{M+1}^T w^* \le \|w_{M+1}\|$ (by Cauchy-Schwarz, since $\|w^*\| \le 1$).

So $\gamma M \le \sqrt{M}$, and thus $M \le \frac{1}{\gamma^2}$.
Intuition: Claim 1 ($w_{t+1}^T w^* \ge w_t^T w^* + \gamma$) says the correlation with $w^*$ gets larger with every mistake. This could be because:
1. the angle between $w_{t+1}$ and $w^*$ gets smaller, or
2. $w_{t+1}$ gets much longer.
Claim 2 ($\|w_{t+1}\|^2 \le \|w_t\|^2 + 1$) rules out the bad case "2. $w_{t+1}$ gets much longer".
Figure from Pattern Recognition and Machine Learning, Bishop
Motivation: models of computation built on connections between neurons (i.e., artificial neural networks)
Example from Machine learning lecture notes by Tom Mitchell
Figure from Pattern Recognition and Machine Learning, Bishop
Neuron/perceptron
Our mental faculties compel us to think of the mind as a machine for rule-based manipulation of highly structured arrays of symbols. What we know of the brain compels us to think of human information processing in terms of manipulation of a large unstructured set of numbers, the activity levels of interconnected neurons.
Stochastic gradient descent (SGD): the hypothesis $f_\theta$ is parametrized by $\theta$. We would like to minimize $\hat{L}(\theta) = \frac{1}{n} \sum_{t=1}^{n} l(\theta, x_t, y_t)$, but at time $t$ we only know $l(\theta, x_t, y_t)$. SGD therefore takes a gradient step on the current example: $\theta_{t+1} = \theta_t - \eta_t \nabla l(\theta_t, x_t, y_t)$.
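A generic sketch of this update in Python (the function name and fixed step size are illustrative assumptions):

```python
import numpy as np

def sgd(grad_l, theta0, examples, lr=0.1):
    """theta_{t+1} = theta_t - eta_t * grad l(theta_t, x_t, y_t), eta_t fixed here."""
    theta = np.array(theta0, dtype=float)
    for x_t, y_t in examples:
        theta = theta - lr * grad_l(theta, x_t, y_t)
    return theta
```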
Example: linear regression. Find $f_w(x) = w^T x$ that minimizes $\hat{L}(f_w) = \frac{1}{n} \sum_{t=1}^{n} (w^T x_t - y_t)^2$. The per-example loss is $l(w, x_t, y_t) = \frac{1}{n} (w^T x_t - y_t)^2$, so the SGD update is
$w_{t+1} = w_t - \eta_t \nabla l(w_t, x_t, y_t) = w_t - \frac{2 \eta_t}{n} (w_t^T x_t - y_t) x_t$.
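A sketch of this update (hypothetical function name; the $1/n$ factor is folded into the step size):

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=10):
    """SGD on the squared loss; the 1/n factor is folded into the step size."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            # gradient of (w^T x_t - y_t)^2 with respect to w
            w = w - lr * 2 * (w @ x_t - y_t) * x_t
    return w
```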
Example: logistic regression. The loss is
$\hat{L}(w) = -\frac{1}{n} \sum_{y_t = 1} \log \sigma(w^T x_t) - \frac{1}{n} \sum_{y_t = -1} \log[1 - \sigma(w^T x_t)]$,
which, using $1 - \sigma(z) = \sigma(-z)$, can be written compactly as
$\hat{L}(w) = -\frac{1}{n} \sum_t \log \sigma(y_t w^T x_t)$.
The per-example loss is $l(w, x_t, y_t) = -\frac{1}{n} \log \sigma(y_t w^T x_t)$, so the SGD update is
$w_{t+1} = w_t - \eta_t \nabla l(w_t, x_t, y_t) = w_t + \frac{\eta_t}{n} \frac{\sigma(a)(1 - \sigma(a))}{\sigma(a)} y_t x_t$, where $a = y_t w_t^T x_t$.
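A sketch of this update under the same conventions (hypothetical names; $1/n$ folded into the step size):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic_regression(X, y, lr=0.1, epochs=10):
    """SGD on -log sigma(y_t w^T x_t), labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            a = y_t * (w @ x_t)
            # d/dw [-log sigma(a)] = -(1 - sigma(a)) * y_t * x_t
            w = w + lr * (1.0 - sigmoid(a)) * y_t * x_t
    return w
```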
Example: the Perceptron as SGD. Take the per-example loss $l(w, x_t, y_t) = -y_t w^T x_t \, \mathbb{I}[\text{mistake on } x_t]$, so that $\hat{L}(w) = -\sum_t y_t w^T x_t \, \mathbb{I}[\text{mistake on } x_t]$. The SGD update is
$w_{t+1} = w_t - \eta_t \nabla l(w_t, x_t, y_t) = w_t + \eta_t y_t x_t \, \mathbb{I}[\text{mistake on } x_t]$.
With $\eta_t = 1$, this is exactly the Perceptron rule: on a mistake on a positive example, $w_{t+1} = w_t + y_t x_t = w_t + x$; on a mistake on a negative example, $w_{t+1} = w_t + y_t x_t = w_t - x$.
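This view can be made concrete by plugging the per-example gradient into the generic sgd() sketch above (illustrative, not from the slides):

```python
import numpy as np

def perceptron_grad(w, x_t, y_t):
    """Gradient of -y_t w^T x_t * I[mistake]: -y_t x_t on a mistake, else 0."""
    if y_t * (w @ x_t) <= 0:       # mistake on x_t
        return -y_t * x_t
    return np.zeros_like(w)

# Reusing the sgd() sketch above with lr=1.0 reproduces the Perceptron update:
# w = sgd(perceptron_grad, np.zeros(X.shape[1]), zip(X, y), lr=1.0)
```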
Pros: each SGD step is cheap, using only a single example, and it fits the online setting.
Cons: the single-example gradient is a noisy estimate of the full gradient.
Mini-batch stochastic gradient descent: average the gradient over a small batch of $b$ examples $(x_{tb+1}, y_{tb+1}), \ldots, (x_{tb+b}, y_{tb+b})$:
$\theta_{t+1} = \theta_t - \eta_t \nabla \left[ \frac{1}{b} \sum_{1 \le i \le b} l(\theta_t, x_{tb+i}, y_{tb+i}) \right]$
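A sketch of the mini-batch update (hypothetical names; sequential batches, no shuffling):

```python
import numpy as np

def minibatch_sgd(grad_l, theta0, X, y, batch_size=32, lr=0.1, epochs=10):
    """Average the per-example gradients over each batch of b examples."""
    theta = np.array(theta0, dtype=float)
    n = len(X)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            grads = [grad_l(theta, x_t, y_t) for x_t, y_t in zip(xb, yb)]
            theta = theta - lr * np.mean(grads, axis=0)
    return theta
```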
http://www.cs.princeton.edu/courses/archive/spring16/cos495/
exercise by 10%.
Before the solution to each question, list all the people you discussed that particular question with.