Machine Learning Basics Lecture 2: Linear Classification
Princeton University COS 495 Instructor: Yingyu Liang
Review: machine learning basics

Math formulation: given training data $(x_i, y_i)$, $1 \le i \le n$, i.i.d. from distribution $D$.
Find $f \in \mathcal{H}$ that minimizes the empirical loss
$\hat{L}(f) = \frac{1}{n}\sum_{i=1}^{n} l(f, x_i, y_i)$
such that the expected loss is small:
$L(f) = \mathbb{E}_{(x,y)\sim D}[\, l(f, x, y) \,]$
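As a concrete illustration, here is a minimal sketch (my addition, not from the slides) of the empirical loss as an average of per-example losses; all names and data are illustrative:

```python
# Minimal sketch of the empirical loss L_hat(f) = 1/n * sum_i l(f, x_i, y_i).
import numpy as np

def empirical_loss(f, loss, X, y):
    """Average the per-example loss l(f, x_i, y_i) over the training data."""
    return np.mean([loss(f, x_i, y_i) for x_i, y_i in zip(X, y)])

# Example: squared loss with a linear predictor f(x) = w^T x.
sq_loss = lambda f, x, y: (f(x) - y) ** 2
w = np.array([1.0, -2.0])
f = lambda x: w @ x
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, -2.0, -1.0])
print(empirical_loss(f, sq_loss, X, y))  # 0.0: f fits these points exactly
```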
Experience: the training data. Prior knowledge: the hypothesis class and the loss.
Example: linear regression. Find $f_w(x) = w^T x$ that minimizes $\hat{L}(f_w) = \frac{1}{n}\sum_{i=1}^{n} (w^T x_i - y_i)^2$ (the $l_2$ loss).
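This least-squares problem has a closed-form solution; a quick sketch (not from the slides) using numpy's solver, with made-up data:

```python
# Fit w to minimize 1/n * sum_i (w^T x_i - y_i)^2 with a least-squares solver.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # rows are the x_i
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)    # noisy linear targets

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # argmin_w ||Xw - y||^2
print(w_hat)                                   # close to w_true
```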
Linear model: hypothesis class $\mathcal{H}$ of linear functions.
Question: where does the $l_2$ loss come from? Maximum likelihood estimation provides one justification.
Maximum likelihood estimation (MLE):
Likelihood of $\theta$ given one data point: $p_\theta(x_i, y_i)$
Likelihood of $\theta$ given the data set: $p_\theta(\{x_i, y_i\}) = \prod_i p_\theta(x_i, y_i)$
$\theta_{ML} = \arg\max_{\theta \in \Theta} \prod_i p_\theta(x_i, y_i)$
Since $\log$ is monotone: $\theta_{ML} = \arg\max_{\theta \in \Theta} \log\big[\prod_i p_\theta(x_i, y_i)\big] = \arg\max_{\theta \in \Theta} \sum_i \log p_\theta(x_i, y_i)$
MLE as loss minimization: $l(p_\theta, x_i, y_i) = -\log p_\theta(x_i, y_i)$, so $\hat{L}(p_\theta) = -\sum_i \log p_\theta(x_i, y_i)$
Conditional likelihood: $\theta_{ML} = \arg\max_{\theta \in \Theta} \sum_i \log p_\theta(y_i \mid x_i)$, with $l(p_\theta, x_i, y_i) = -\log p_\theta(y_i \mid x_i)$ and $\hat{L}(p_\theta) = -\sum_i \log p_\theta(y_i \mid x_i)$
Only care about predicting $y$ from $x$; do not care about $p(x)$.
Modeling $P(y|x)$: discriminative. Modeling $P(x, y)$: generative.
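A small numeric check (my addition) of the log step above: the product of likelihoods and the sum of log-likelihoods peak at the same parameter. The unit-variance Gaussian-mean model here is a hypothetical choice:

```python
# Verify argmax_theta prod_i p_theta(y_i) == argmax_theta sum_i log p_theta(y_i)
# on a grid, for a unit-variance Gaussian with unknown mean theta.
import numpy as np

def gauss_pdf(y, mu):
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, size=30)
grid = np.linspace(-3.0, 3.0, 601)

lik = np.array([np.prod(gauss_pdf(data, m)) for m in grid])
loglik = np.array([np.sum(np.log(gauss_pdf(data, m))) for m in grid])
assert np.argmax(lik) == np.argmax(loglik)
print(grid[np.argmax(loglik)])  # near the sample mean, as MLE predicts
```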
Example: the $l_2$ loss as MLE. Find $f(x)$ that minimizes $\hat{L}(f) = \frac{1}{n}\sum_{i=1}^{n} (f(x_i) - y_i)^2$.
Model the conditional distribution as Gaussian: $y \sim \mathcal{N}(f(x), \sigma^2)$. Then
$\log p(y_i \mid x_i) = -\frac{1}{2\sigma^2}\big(f(x_i) - y_i\big)^2 - \log(\sigma) - \frac{1}{2}\log(2\pi)$,
so maximizing the conditional likelihood is equivalent to minimizing
$\frac{1}{n}\sum_{i=1}^{n} \big(f(x_i) - y_i\big)^2$.
$l_2$ loss: Normal + MLE.
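Spelling out the step across all $n$ points (a routine expansion; the terms in $\sigma$ are constants and do not affect the argmax):

$$
\theta_{ML} = \arg\max_{\theta} \sum_{i=1}^{n} \log p_\theta(y_i \mid x_i)
= \arg\max_{\theta} \left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \big(f(x_i) - y_i\big)^2 - n\log\sigma - \frac{n}{2}\log(2\pi) \right\}
= \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \big(f(x_i) - y_i\big)^2 .
$$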
[Figure: example classification task; images labeled "indoor".]
Example: spam detection. Represent each email by counts of selected words:

              #"$"   #"Mr."   #"sale"   ...   Spam?
  Email 1       2       1        1             Yes
  Email 2               1                      No
  Email 3       1       1        1             Yes
  ...
  Email n                                      No
  New email     1                              ??
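A sketch of how such count features could be computed (my addition; the vocabulary and emails are made up):

```python
# Represent each email as a vector of word counts over a fixed vocabulary,
# matching the feature columns of the table above.
vocab = ["$", "Mr.", "sale"]

def count_features(email):
    words = email.split()
    return [words.count(w) for w in vocab]

emails = ["$ 100 off in our sale , Mr. Smith , send $ now",
          "meeting notes for Mr. Chan"]
for e in emails:
    print(count_features(e))  # [2, 1, 1] and [0, 1, 0]
```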
[Figure: a linear decision boundary $w^T x = 0$ separating Class 1 ($w^T x > 0$) from Class 0 ($w^T x < 0$).]
Linear classification: $f_w(x) = w^T x$; predict $y = \text{step}(f_w(x)) = \text{step}(w^T x)$, where $\text{step}(z) = \mathbb{I}[z > 0]$. The hypothesis class is the linear model $\mathcal{H}$.
Find $f_w(x) = w^T x$ to minimize $\hat{L}(f_w) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{I}\big[\text{step}(w^T x_i) \ne y_i\big]$ (the 0-1 loss).
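In code, the 0-1 objective is just an error rate (a minimal sketch, assuming numpy and labels in $\{0, 1\}$):

```python
# 0-1 loss of the linear classifier: the fraction of training points
# where step(w^T x_i) disagrees with the label y_i.
import numpy as np

def zero_one_loss(w, X, y):
    preds = (X @ w > 0).astype(int)  # step(w^T x_i) for each row x_i
    return np.mean(preds != y)       # 1/n * sum_i I[step(w^T x_i) != y_i]
```

Note that this objective is piecewise constant in $w$, so its gradient is zero almost everywhere; that is what motivates the smoother surrogate losses tried next.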
Find $f_w(x) = w^T x$ that minimizes $\hat{L}(f_w) = \frac{1}{n}\sum_{i=1}^{n} (w^T x_i - y_i)^2$.
Reduce to linear regression; ignore the fact that $y \in \{0, 1\}$.
Figure borrowed from Pattern Recognition and Machine Learning, Bishop
Drawback: not robust to "outliers".
[Figure: the linear output $y = w^T x$ plotted against $w^T x$, compared with the target $y = \text{step}(w^T x)$.]
[Figure: the sigmoid function $\sigma(a) = \frac{1}{1+\exp(-a)}$. Figure borrowed from Pattern Recognition and Machine Learning, Bishop.]
Sigmoid: $\sigma(w^T x) = \frac{1}{1 + \exp(-w^T x)}$; find $w$ that minimizes $\hat{L}(f_w) = \frac{1}{n}\sum_{i=1}^{n} \big(\sigma(w^T x_i) - y_i\big)^2$.
Model the conditional probability with the sigmoid:
$P_w(y = 1 \mid x) = \sigma(w^T x) = \frac{1}{1 + \exp(-w^T x)}$
$P_w(y = 0 \mid x) = 1 - P_w(y = 1 \mid x) = 1 - \sigma(w^T x)$
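A short sketch (mine, not from the slides) of the model's predicted probabilities:

```python
# P_w(y=1|x) = sigma(w^T x); P_w(y=0|x) = 1 - sigma(w^T x) = sigma(-w^T x).
import numpy as np

def predict_proba(w, X):
    p1 = 1.0 / (1.0 + np.exp(-(X @ w)))   # P_w(y=1|x) for each row x
    return np.column_stack([1 - p1, p1])  # columns: P_w(y=0|x), P_w(y=1|x)
```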
MLE with this model gives the loss
$\hat{L}(w) = -\frac{1}{n}\sum_{i=1}^{n} \log P_w(y_i \mid x_i) = -\frac{1}{n}\sum_{i:\, y_i = 1} \log \sigma(w^T x_i) - \frac{1}{n}\sum_{i:\, y_i = 0} \log\big[1 - \sigma(w^T x_i)\big]$
Logistic regression: MLE with sigmoid.
No closed-form solution; need to use gradient descent.
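A minimal gradient-descent sketch for this loss (my addition, not the course's reference code; the step size, iteration count, and toy data are illustrative). It uses the standard gradient $\nabla \hat{L}(w) = \frac{1}{n}\sum_i (\sigma(w^T x_i) - y_i)\, x_i$:

```python
# Batch gradient descent for logistic regression with the loss L_hat(w) above.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logistic_nll(w, X, y):
    """L_hat(w) = -1/n * sum_i log P_w(y_i | x_i), labels y_i in {0, 1}."""
    p = sigmoid(X @ w)
    eps = 1e-12  # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def gradient(w, X, y):
    """Gradient of the loss: 1/n * sum_i (sigmoid(w^T x_i) - y_i) x_i."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def gradient_descent(X, y, lr=0.1, steps=1000):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * gradient(w, X, y)  # step against the gradient
    return w

# Toy usage: two Gaussian clusters with labels 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, size=(50, 2)), rng.normal(1, 1, size=(50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]
w = gradient_descent(X, y)
print(logistic_nll(w, X, y))  # well below log(2), the loss at w = 0
```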
Properties of the sigmoid:
$\sigma(a) = \frac{1}{1 + \exp(-a)} \in (0, 1)$
$1 - \sigma(a) = \frac{\exp(-a)}{1 + \exp(-a)} = \frac{1}{\exp(a) + 1} = \sigma(-a)$
$\sigma'(a) = \frac{\exp(-a)}{(1 + \exp(-a))^2} = \sigma(a)\big(1 - \sigma(a)\big)$
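A quick numeric check (my addition) of both identities, using a centered finite difference for the derivative:

```python
# Check 1 - sigma(a) = sigma(-a) and sigma'(a) = sigma(a) * (1 - sigma(a)).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-5.0, 5.0, 11)
assert np.allclose(1 - sigmoid(a), sigmoid(-a))

h = 1e-6
fd = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)  # finite-difference slope
assert np.allclose(fd, sigmoid(a) * (1 - sigmoid(a)), atol=1e-8)
print("sigmoid identities verified")
```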