Machine Learning Basics Lecture 4: SVM I
Princeton University COS 495 Instructor: Yingyu Liang
Review: machine learning basics

Math formulation: given training data $\{(x_i, y_i) : 1 \le i \le n\}$ drawn i.i.d. from a distribution $D$, find $y = f(x) \in \mathcal{H}$ that minimizes the empirical loss

$$\hat{L}(f) = \frac{1}{n} \sum_{i=1}^{n} l(f, x_i, y_i)$$

such that the expected loss is small:

$$L(f) = \mathbb{E}_{(x, y) \sim D}\left[l(f, x, y)\right]$$
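A minimal pure-Python sketch of the empirical loss $\hat{L}(f)$. It assumes a fixed linear scorer and uses the 0-1 loss as one possible choice of $l$, since the lecture leaves $l$ generic. The names `empirical_loss` and `zero_one_loss`, the scorer, and the toy data are all hypothetical. The expected loss $L(f)$ cannot be computed this way because $D$ is unknown. $\hat{L}(f)$ is its sample estimate.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]
Loss = Callable[[Scorer, Vector, float], float]


def empirical_loss(f: Scorer, loss: Loss, data: Sequence[Tuple[Vector, float]]) -> float:
    """L_hat(f) = (1/n) * sum_{i=1}^n l(f, x_i, y_i) over the training set."""
    return sum(loss(f, x, y) for x, y in data) / len(data)


def zero_one_loss(f: Scorer, x: Vector, y: float) -> float:
    """Hypothetical choice of l: 1 if sign(f(x)) disagrees with the label y, else 0."""
    prediction = 1.0 if f(x) > 0 else -1.0
    return 0.0 if prediction == y else 1.0


# Hypothetical toy training set of (x_i, y_i) pairs with labels in {+1, -1}.
data = [((2.0, 1.0), 1.0), ((1.0, 3.0), 1.0), ((-1.0, -2.0), -1.0), ((0.5, -1.0), 1.0)]


def f(x: Vector) -> float:
    """A fixed linear scorer f(x) = x_1 + x_2 (illustrative only)."""
    return x[0] + x[1]


train_loss = empirical_loss(f, zero_one_loss, data)  # the last point is misclassified
```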
[Figure: the hyperplane $(w^*)^T x = 0$ with normal vector $w^*$, separating Class +1, where $(w^*)^T x > 0$, from Class -1, where $(w^*)^T x < 0$.]
Assume perfect separation between the two classes.
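Hypothesis: $y = \mathrm{sign}(f_w(x)) = \mathrm{sign}(w^T x)$, i.e., $y = +1$ if $w^T x > 0$ and $y = -1$ if $w^T x < 0$.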
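A minimal sketch of the linear hypothesis $\mathrm{sign}(w^T x)$ in pure Python. The helper names `dot` and `predict` and the weight vector are hypothetical. Returning 0 for a point exactly on the hyperplane is an assumption made here. The perfect-separation assumption means this case does not arise for the training points.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Linear classifier y = sign(w^T x) with labels in {+1, -1}.

    Returns 0 for a point exactly on the hyperplane, a case the
    perfect-separation assumption rules out for the training data.
    """
    score = dot(w, x)
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


w = (1.0, -1.0)  # hypothetical weight vector
labels = [predict(w, x) for x in [(2.0, 1.0), (1.0, 3.0), (1.0, 1.0)]]
```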
[Figure: Class +1 and Class -1 separated by three different hyperplanes $w_1$, $w_2$, $w_3$.]
All three have the same empirical loss but different test/expected loss.
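To make this concrete, here is a hypothetical 2-D example. Two weight vectors both have zero empirical 0-1 loss, but they disagree on a new test point. The data, the weights `w_a` and `w_b`, and the helper names are illustrative assumptions. They do not come from the lecture's figure.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def sign(s: float) -> int:
    """Label in {+1, -1}; ties go to -1 (they never occur in this example)."""
    return 1 if s > 0 else -1


def training_errors(w, data) -> int:
    """Number of training points misclassified by sign(w^T x)."""
    return sum(1 for x, y in data if sign(dot(w, x)) != y)


# Hypothetical training set: one point per class.
train = [((2.0, 0.0), 1), ((-2.0, 0.0), -1)]
w_a = (1.0, 0.0)  # boundary x_1 = 0
w_b = (1.0, 3.0)  # a tilted boundary that also separates the training data

# Both hyperplanes have zero empirical 0-1 loss...
errors = (training_errors(w_a, train), training_errors(w_b, train))

# ...but they disagree on a hypothetical new test point.
x_test = (1.0, -1.0)
test_predictions = (sign(dot(w_a, x_test)), sign(dot(w_b, x_test)))
```

Which of the two predictions is "right" depends on the unknown distribution. The point is only that empirical loss alone cannot distinguish them.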
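[Figure: hyperplane $w_1$ between Class +1 and Class -1, with new test data.]
[Figure: hyperplane $w_3$ between Class +1 and Class -1, with new test data.]
[Figure: hyperplane $w_2$ between Class +1 and Class -1, with new test data.]
[Figure: hyperplane $w_2$ between Class +1 and Class -1, with a large margin.]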
A point $x$ has distance $\frac{|f_w(x)|}{\|w\|}$ to the hyperplane $f_w(x) = w^T x = 0$.
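Proof: $w$ is orthogonal to the hyperplane, so the unit normal direction is $\frac{w}{\|w\|}$. Because the hyperplane passes through the origin, the signed distance of $x$ is its projection onto this direction:
$$\left(\frac{w}{\|w\|}\right)^T x = \frac{f_w(x)}{\|w\|}$$
Its absolute value is the distance $\frac{|f_w(x)|}{\|w\|}$.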
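A small numeric check of the lemma, as a pure-Python sketch. The values $w = (3, 4)$ and $x = (2, 1)$ are hypothetical, and so is the helper name `distance_to_hyperplane`. The cross-check removes the component of $x$ along $w$. It confirms that what remains lies on the hyperplane and that the removed piece has the claimed length.

```python
import math


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def distance_to_hyperplane(w, x) -> float:
    """Distance |f_w(x)| / ||w|| from x to the hyperplane w^T x = 0."""
    return abs(dot(w, x)) / math.sqrt(dot(w, w))


w = (3.0, 4.0)  # ||w|| = 5
x = (2.0, 1.0)  # w^T x = 10
d = distance_to_hyperplane(w, x)

# Cross-check via the projection: subtracting the component of x along w
# leaves a point on the hyperplane, and the removed piece has length d.
scale = dot(w, x) / dot(w, w)
foot = tuple(xi - scale * wi for xi, wi in zip(x, w))
removed_length = math.sqrt(dot(w, w)) * abs(scale)
```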
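With a bias $b$: $w$ is orthogonal to the hyperplane $f_{w,b}(x) = w^T x + b = 0$.
Proof: pick any $x_1$ and $x_2$ on the hyperplane. Then $w^T x_1 + b = 0$ and $w^T x_2 + b = 0$, so $w^T(x_1 - x_2) = 0$.
The origin $0$ has (signed) distance $\frac{-b}{\|w\|}$ to the hyperplane $w^T x + b = 0$.
Proof: pick any $x_1$ on the hyperplane and project it onto the unit direction $\frac{w}{\|w\|}$ to get the distance:
$$\left(\frac{w}{\|w\|}\right)^T x_1 = \frac{-b}{\|w\|}$$
since $w^T x_1 + b = 0$.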
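Both claims can be checked numerically on a hypothetical hyperplane $3x_1 + 4x_2 - 10 = 0$. This is a pure-Python sketch. The two points were chosen by hand to lie on that hyperplane, and every value here is illustrative.

```python
import math


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


w, b = (3.0, 4.0), -10.0  # hypothetical hyperplane 3*x_1 + 4*x_2 - 10 = 0
x1 = (2.0, 1.0)   # 6 + 4 - 10 = 0, on the hyperplane
x2 = (-2.0, 4.0)  # -6 + 16 - 10 = 0, on the hyperplane
assert dot(w, x1) + b == 0 and dot(w, x2) + b == 0

# Claim 1: w is orthogonal to any direction lying in the hyperplane.
inner = dot(w, tuple(a - c for a, c in zip(x1, x2)))

# Claim 2: projecting a point of the hyperplane onto w/||w|| gives -b/||w||.
norm_w = math.sqrt(dot(w, w))
projection = dot(w, x1) / norm_w
origin_distance = -b / norm_w
```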
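A point $x$ has distance $\frac{|f_{w,b}(x)|}{\|w\|}$ to the hyperplane $f_{w,b}(x) = w^T x + b = 0$.
Proof: write $x = x_\perp + r \frac{w}{\|w\|}$, where $x_\perp$ is the projection of $x$ onto the hyperplane. Then $|r|$ is the distance. Multiply both sides by $w^T$ and add $b$:
Left-hand side: $w^T x + b = f_{w,b}(x)$
Right-hand side: $w^T x_\perp + r \frac{w^T w}{\|w\|} + b = 0 + r\|w\|$
Hence $r = \frac{f_{w,b}(x)}{\|w\|}$.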
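A pure-Python sketch of the decomposition used in the proof, with a hypothetical hyperplane and point. It computes $r = f_{w,b}(x)/\|w\|$ and then recovers $x_\perp = x - r\frac{w}{\|w\|}$. It also confirms that $x_\perp$ lies on the hyperplane. The function names `f` and `signed_distance` are illustrative.

```python
import math


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def f(w, b, x) -> float:
    """f_{w,b}(x) = w^T x + b."""
    return dot(w, x) + b


def signed_distance(w, b, x) -> float:
    """r = f_{w,b}(x) / ||w||; its absolute value is the distance to the hyperplane."""
    return f(w, b, x) / math.sqrt(dot(w, w))


w, b = (3.0, 4.0), -10.0  # hypothetical hyperplane
x = (5.0, 5.0)            # f_{w,b}(x) = 25, ||w|| = 5
r = signed_distance(w, b, x)

# Recover x_perp from the decomposition x = x_perp + r * w / ||w||
# and confirm it lies on the hyperplane, as the proof assumes.
norm_w = math.sqrt(dot(w, w))
x_perp = tuple(xi - r * wi / norm_w for xi, wi in zip(x, w))
```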
[Figure from Pattern Recognition and Machine Learning, Bishop.]
The notation in the figure is $y(x) = w^T x + w_0$, so $w_0$ plays the role of $b$.
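Margin over all training data points:
$$\gamma = \min_i \frac{|f_{w,b}(x_i)|}{\|w\|}$$
Since we only want a correct $f_{w,b}$, and recall $y_i \in \{+1, -1\}$, we have
$$\gamma = \min_i \frac{y_i f_{w,b}(x_i)}{\|w\|}$$
If $f_{w,b}$ is incorrect on some $x_i$, the margin is negative.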
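A pure-Python sketch of the signed margin $\gamma = \min_i y_i f_{w,b}(x_i)/\|w\|$. The toy data and the helper name `margin` are hypothetical. Adding a single point on the wrong side of the hyperplane shows the margin becoming negative, as stated above.

```python
import math


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def margin(w, b, data) -> float:
    """gamma = min_i y_i (w^T x_i + b) / ||w||; negative if some point is misclassified."""
    norm_w = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) for x, y in data) / norm_w


# Hypothetical separable training set with labels in {+1, -1}.
data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, 0.0), -1)]
gamma = margin((1.0, 1.0), 0.0, data)  # closest point is (-2, 0): 2 / sqrt(2)

# Adding a point on the wrong side makes the margin negative.
gamma_bad = margin((1.0, 1.0), 0.0, data + [((0.5, -1.0), 1)])
```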
Maximize the margin over all training data points:
$$\max_{w,b} \gamma = \max_{w,b} \min_i \frac{y_i f_{w,b}(x_i)}{\|w\|} = \max_{w,b} \min_i \frac{y_i (w^T x_i + b)}{\|w\|}$$
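The max-min objective can be illustrated by brute force, though not solved efficiently that way. This pure-Python sketch does a hypothetical grid search in 2-D over unit-norm directions $w = (\cos\theta, \sin\theta)$ and offsets $b$. It assumes that restricting to $\|w\| = 1$ is harmless because the margin is unchanged by positive rescaling of $(w, b)$. The data, the grid, and the name `grid_search_max_margin` are illustrative only. The lecture's route is the quadratic program, not a grid search.

```python
import math


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def margin(w, b, data) -> float:
    """gamma = min_i y_i (w^T x_i + b) / ||w||."""
    norm_w = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) for x, y in data) / norm_w


def grid_search_max_margin(data, angle_steps=360, b_values=None):
    """Approximate max_{w,b} min_i y_i (w^T x_i + b) / ||w|| over a coarse grid."""
    if b_values is None:
        b_values = [k / 10 for k in range(-20, 21)]  # b in [-2, 2], step 0.1
    best = (-math.inf, None, None)
    for step in range(angle_steps):
        theta = 2 * math.pi * step / angle_steps
        w = (math.cos(theta), math.sin(theta))  # unit-norm direction
        for b in b_values:
            gamma = margin(w, b, data)
            if gamma > best[0]:
                best = (gamma, w, b)
    return best


# Hypothetical data, symmetric about the origin.
data = [((1.0, 0.0), 1), ((2.0, 1.0), 1), ((-1.0, 0.0), -1), ((-2.0, -1.0), -1)]
best_gamma, best_w, best_b = grid_search_max_margin(data)
```

The cost grows with the grid size, and the grid only approximates the optimum. That is why a formulation an optimizer can solve directly is needed.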
When $(w, b)$ is scaled by a factor $c > 0$, the margin is unchanged:
$$\frac{y_i (c w^T x_i + c b)}{\|c w\|} = \frac{y_i (w^T x_i + b)}{\|w\|}$$
So we can fix the scale such that
$$y_{i^*}(w^T x_{i^*} + b) = 1$$
where $x_{i^*}$ is the point closest to the hyperplane.
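A quick numeric check of the scaling observation, as a pure-Python sketch. The parameters and the helper `scaled` are hypothetical. The check also shows why the factor should be positive: with $c < 0$ every prediction flips, so the margin changes sign.

```python
import math


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def margin(w, b, data) -> float:
    """gamma = min_i y_i (w^T x_i + b) / ||w||."""
    norm_w = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) for x, y in data) / norm_w


def scaled(w, b, c):
    """Return (c * w, c * b)."""
    return tuple(c * wi for wi in w), c * b


data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, 0.0), -1)]
w, b = (1.0, 1.0), 0.5  # hypothetical parameters
base = margin(w, b, data)
positive = [margin(*scaled(w, b, c), data) for c in (0.1, 2.0, 37.5)]
flipped = margin(*scaled(w, b, -1.0), data)  # c < 0 flips every prediction
```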
Then $y_i(w^T x_i + b) \ge 1$ for all $i$, and equality holds for at least one $i$. The margin is therefore $\frac{1}{\|w\|}$.
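A pure-Python sketch of the canonical scaling. It divides $(w, b)$ by the smallest value of $y_i(w^T x_i + b)$, so that the closest point attains exactly 1. It then checks that $1/\|w\|$ equals the margin. The data and the name `canonicalize` are hypothetical. The sketch assumes $(w, b)$ already classifies every point correctly.

```python
import math


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def canonicalize(w, b, data):
    """Rescale (w, b) so that min_i y_i (w^T x_i + b) = 1 (assumes all points are correct)."""
    smallest = min(y * (dot(w, x) + b) for x, y in data)
    if smallest <= 0:
        raise ValueError("(w, b) must classify every training point correctly")
    c = 1.0 / smallest  # positive factor, so the margin is unchanged
    return tuple(c * wi for wi in w), c * b


data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, 0.0), -1)]
w_c, b_c = canonicalize((1.0, 1.0), 0.0, data)
functional = [y * (dot(w_c, x) + b_c) for x, y in data]
margin_from_norm = 1.0 / math.sqrt(dot(w_c, w_c))  # equals gamma for (w, b) = ((1, 1), 0)
```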
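Maximizing the margin $\frac{1}{\|w\|}$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$, which gives the quadratic program
$$\min_{w,b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \ \forall i$$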
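This pure-Python sketch evaluates candidate solutions of the quadratic program. It checks the constraints and computes the objective $\frac{1}{2}\|w\|^2$. It does not solve the program; an actual solution would come from a dedicated QP solver. The candidates, data, and helper names are hypothetical. For the canonically scaled candidate, the objective equals $\frac{1}{2\gamma^2}$, which here is $\frac{1}{2 \cdot 2} = 0.25$.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def is_feasible(w, b, data, tol=1e-12) -> bool:
    """Check the constraints y_i (w^T x_i + b) >= 1 for every i."""
    return all(y * (dot(w, x) + b) >= 1 - tol for x, y in data)


def objective(w) -> float:
    """The primal objective (1/2) ||w||^2."""
    return 0.5 * dot(w, w)


data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, 0.0), -1)]
candidates = {
    "canonical": ((0.5, 0.5), 0.0),    # feasible, margin sqrt(2)
    "too small": ((0.25, 0.25), 0.0),  # lower objective but violates the constraints
    "larger": ((1.0, 1.0), 0.0),       # feasible but higher objective
}
report = {name: (is_feasible(w, b, data), objective(w)) for name, (w, b) in candidates.items()}
```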
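What is $w^*$? Fix a bound $R$ and ask whether some $(w, b)$ satisfies
$$\frac{1}{2}\|w\|^2 \le R, \qquad y_i(w^T x_i + b) \ge 1, \ \forall i$$
$w^*$ is the best weight (i.e., the one satisfying the smallest $R$).
[Figure: $\hat{w}^*$.]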
$\hat{w}^*$ corresponds to the hypothesis class
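$$\min_{w,b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \ \forall i$$
$$\mathcal{L}(w, b, \boldsymbol{\alpha}) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[y_i(w^T x_i + b) - 1\right]$$
where $\boldsymbol{\alpha}$ is the Lagrange multiplier, with one $\alpha_i \ge 0$ per constraint.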
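A pure-Python sketch that evaluates the Lagrangian $\mathcal{L}(w, b, \boldsymbol{\alpha})$ at a given point. The multipliers, data, and the name `lagrangian` are hypothetical. The assertion illustrates one property: for feasible $(w, b)$ and $\alpha_i \ge 0$, every bracket $y_i(w^T x_i + b) - 1$ is nonnegative. So $\mathcal{L}$ never exceeds the primal objective. How the multipliers are actually found is beyond this sketch.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def lagrangian(w, b, alpha, data) -> float:
    """L(w, b, alpha) = (1/2)||w||^2 - sum_i alpha_i [y_i (w^T x_i + b) - 1]."""
    penalty = sum(a * (y * (dot(w, x) + b) - 1) for a, (x, y) in zip(alpha, data))
    return 0.5 * dot(w, w) - penalty


data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, 0.0), -1)]
w, b = (0.5, 0.5), 0.0        # a feasible point of the primal problem
alpha = (0.1, 0.2, 0.3, 0.4)  # hypothetical nonnegative multipliers
value = lagrangian(w, b, alpha, data)

# For feasible (w, b) and alpha >= 0, every bracket is >= 0,
# so the Lagrangian never exceeds the objective (1/2)||w||^2.
assert value <= 0.5 * dot(w, w)
```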
http://www.cs.princeton.edu/courses/archive/spring16/cos495/