Learning From Data Lecture 9 Logistic Regression and Gradient Descent
Logistic Regression
Gradient Descent
- M. Magdon-Ismail
CSCI 4100/6100
recap: Linear Classification and Regression
The linear signal: s = w^T x. Good features are important.
[Figure: sample data with inputs x1, x2 and target y]
Predicting a probability
What is θ?
[Figure: the logistic function θ(s) versus s, increasing from 0 to 1]
θ(s) = e^s / (1 + e^s) = 1 / (1 + e^{−s}),    θ(−s) = e^{−s} / (1 + e^{−s}) = 1 / (1 + e^s) = 1 − θ(s).
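As a quick check of these identities, a minimal sketch in Python (NumPy assumed; the helper name theta is ours). np.logaddexp(0, −s) = ln(1 + e^{−s}) gives a numerically stable evaluation:

    import numpy as np

    def theta(s):
        # theta(s) = 1 / (1 + e^{-s}); np.logaddexp(0, -s) = ln(1 + e^{-s})
        # avoids overflow for large |s|.
        return np.exp(-np.logaddexp(0.0, -s))

    # Verify the identity theta(-s) = 1 - theta(s) on a grid of points.
    s = np.linspace(-10.0, 10.0, 21)
    assert np.allclose(theta(-s), 1.0 - theta(s))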
Data is binary ±1
f is noisy
When is h good?
Cross entropy error
Verify: y_n = +1 encourages w^T x_n ≫ 0, so θ(w^T x_n) ≈ 1; y_n = −1 encourages w^T x_n ≪ 0, so θ(w^T x_n) ≈ 0.
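In code, the cross-entropy error E_in(w) = (1/N) Σ_n ln(1 + e^{−y_n w^T x_n}) is a one-liner; a sketch, assuming X is the N×d data matrix (bias coordinate included) and y holds ±1 labels:

    import numpy as np

    def cross_entropy_error(w, X, y):
        # E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n * w^T x_n)),
        # with np.logaddexp(0, z) = ln(1 + e^z) for stability.
        signals = y * (X @ w)          # y_n * w^T x_n for all n at once
        return np.mean(np.logaddexp(0.0, -signals))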
Probabilistic interpretation
1 − θ(s) = θ(−s)
Simplify to one equation
The likelihood
Maximize the likelihood
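Filling in the step these slides walk through: with P(y | x) = θ(y w^T x), maximizing the likelihood of N independent data points is equivalent to minimizing the cross-entropy error, because maximizing a product is minimizing −(1/N) times its log:

    max_w ∏_{n=1}^N θ(y_n w^T x_n)
      ≡ min_w −(1/N) Σ_{n=1}^N ln θ(y_n w^T x_n)
      = min_w (1/N) Σ_{n=1}^N ln(1 + e^{−y_n w^T x_n})     (using 1/θ(s) = 1 + e^{−s})
      = min_w E_in(w).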
How to minimize E_in(w)?
Hill analogy
this is called a local minimum
Our E_in is convex
(So, who cares if it looks ugly!)
How to roll down?
← what’s the best direction to take the step?
The gradient

E_in(w(t) + η v̂) − E_in(w(t)) = η ∇E_in(w(t))^T v̂ + O(η²)    (Taylor's approximation)
                              ≥ −η ‖∇E_in(w(t))‖,

with equality when v̂ = −∇E_in(w(t)) / ‖∇E_in(w(t))‖, the negative normalized gradient.
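A small numeric sketch of that claim; the surface E below is a made-up smooth stand-in for E_in (all names here are illustrative, not from the lecture). Stepping along v̂ decreases E by ≈ η‖∇E‖, matching the first-order Taylor term:

    import numpy as np

    def E(w):
        # Hypothetical smooth error surface standing in for E_in(w).
        return 0.5 * (w @ w) + np.log(1.0 + np.exp(-w[0]))

    def grad_E(w):
        g = w.copy()
        g[0] -= 1.0 / (1.0 + np.exp(w[0]))   # d/dw0 of log(1 + e^{-w0}) = -1/(1 + e^{w0})
        return g

    w = np.array([0.5, -1.0])
    g = grad_E(w)
    v_hat = -g / np.linalg.norm(g)           # the best unit direction from the slide

    eta = 1e-4
    print(E(w + eta * v_hat) - E(w))         # ~ -eta * ||grad||, the Taylor prediction
    print(-eta * np.linalg.norm(g))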
Iterate the gradient
What step size?
[Figure: In-sample Error E_in versus Weights w, comparing a small η with a large η]
Fixed learning rate gradient descent
‖∇E_in(w(t))‖ → 0 as w(t) gets closer to the minimum, so even with a fixed η the effective step η‖∇E_in(w(t))‖ shrinks automatically near the minimum.
1: Initialize at step t = 0 to w(0).
2: for t = 0, 1, 2, . . . do
3:    Compute the gradient g_t = ∇E_in(w(t)).
4:    Set the direction to move, v_t = −g_t.
5:    Update the weights: w(t + 1) = w(t) + η v_t.
6:    Iterate until it is time to stop.
7: end for
8: Return the final weights.

For logistic regression, the gradient in step 3 is
∇E_in(w) = −(1/N) Σ_{n=1}^N y_n x_n / (1 + e^{y_n w^T x_n}).   ← logistic regression
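A runnable sketch of this algorithm specialized to logistic regression; the defaults for η, the iteration cap, and the stopping tolerance are illustrative choices, not from the slides:

    import numpy as np

    def gradient_descent_logreg(X, y, eta=0.1, max_iter=10000, tol=1e-6):
        # X: N x d data matrix (bias coordinate included); y: labels in {-1, +1}.
        N, d = X.shape
        w = np.zeros(d)                                  # 1: initialize w(0)
        for t in range(max_iter):                        # 2: for t = 0, 1, 2, ...
            s = y * (X @ w)                              #    signals y_n * w^T x_n
            g = -(X.T @ (y / (1.0 + np.exp(s)))) / N     # 3: g_t = grad E_in(w(t))
            if np.linalg.norm(g) < tol:                  #    stop once ||g_t|| is ~ 0
                break
            w = w - eta * g                              # 4-5: v_t = -g_t; w(t+1) = w(t) + eta*v_t
        return w                                         # 8: return the final weights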
Stochastic gradient descent
N times cheaper per step: each update uses the gradient of one random data point instead of the full sum over all N points;
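A matching sketch of the SGD variant (again, the learning rate and epoch count are illustrative): each update uses the gradient of a single point's error ln(1 + e^{−y_n w^T x_n}), so one step touches one data point instead of all N:

    import numpy as np

    def sgd_logreg(X, y, eta=0.1, n_epochs=100, seed=0):
        N, d = X.shape
        w = np.zeros(d)
        rng = np.random.default_rng(seed)
        for _ in range(n_epochs):
            for n in rng.permutation(N):                 # one randomized pass over the data
                s = y[n] * (X[n] @ w)
                g_n = -y[n] * X[n] / (1.0 + np.exp(s))   # gradient of the n-th point's error
                w = w - eta * g_n
        return w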
GD versus SGD, a picture