Machine Learning
Logistic Regression
Hamid R. Rabiee
Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1/
Sharif University of Technology, Computer Engineering Department, Machine Learning Course
Agenda
- Probabilistic Classification
- Introduction to Logistic regression
- Binary logistic regression
- Logistic regression: Decision surface
- Logistic regression: ML estimation
- Logistic regression: Gradient descent
- Logistic regression: multi-class
- Logistic Regression: Regularization
- Logistic Regression vs. Naïve Bayes
Probabilistic Classification
Generative probabilistic classification (previous lecture):
- Motivation: assume a distribution for each class and try to find the parameters of those distributions.
- Cons: need to assume distributions; need to fit many parameters.
Discriminative approach: Logistic regression (focus of today):
- Motivation: like least squares, but assume a logistic model y(x) = σ(wᵀx); classify based on whether y(x) > 0.5.
- Technique: gradient descent.
Introduction to Logistic regression
Logistic regression represents the probability of category i using a linear function of the input variables. The name comes from the logit transformation:
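A sketch of the relation, assuming the standard binary form (the slide's own formulas are not in the extracted text): the model puts a linear function of x inside the sigmoid, which is equivalent to saying the logit (log-odds) is linear in x.

```latex
P(Y=1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^{\top}\mathbf{x}}},
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \mathbf{w}^{\top}\mathbf{x}
```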
Binary logistic regression
Logistic Regression assumes a parametric form for the distribution P(Y|X), then directly estimates its parameters from the training data. The parametric model assumed by Logistic Regression in the case where Y is boolean is:

P(Y=1 \mid X) = \frac{\exp(w_0 + \sum_{i=1}^{n} w_i X_i)}{1 + \exp(w_0 + \sum_{i=1}^{n} w_i X_i)} \quad (1)

P(Y=0 \mid X) = \frac{1}{1 + \exp(w_0 + \sum_{i=1}^{n} w_i X_i)} \quad (2)

Notice that equation (2) follows directly from equation (1), because the sum of these two probabilities must equal 1.
Binary logistic regression
We only need one set of parameters, since P(Y=0|X) = 1 − P(Y=1|X). The model is built on the sigmoid (logistic) function: σ(z) = 1 / (1 + e^{−z}).
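A minimal sketch of the sigmoid and the resulting class-1 probability, using only the standard library (the function names `sigmoid` and `prob_y1` are illustrative, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_y1(w, x):
    """P(Y=1 | x) under the logistic model.

    w and x are equal-length lists; x is assumed to start with a
    constant 1 so that w[0] plays the role of the bias term w0."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)
```

With all weights zero the score is 0 and the model outputs probability 0.5, i.e. maximal uncertainty.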
Adapted from slides of John Whitehead
Logistic regression vs. Linear regression
Logistic regression: Decision surface
Given a trained logistic regression with weights w and an input x:
- Decision surface: g(w; x) = constant
- Decision surfaces are linear functions of x
- Decision making on Y: predict Y = 1 when P(Y=1|x) > 0.5, i.e. when wᵀx > 0
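The threshold rule can be sketched without ever computing the sigmoid, since σ(wᵀx) > 0.5 exactly when wᵀx > 0 (the `predict` name is illustrative):

```python
def predict(w, x):
    """Hard decision for binary logistic regression.

    P(Y=1|x) > 0.5 exactly when the linear score w.x > 0, so the
    decision boundary {x : w.x = 0} is a hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else 0
```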
Computing the likelihood in detail
We can re-express the log of the conditional likelihood as:

l(\mathbf{w}) = \sum_{l} \ln P(y^l \mid \mathbf{x}^l, \mathbf{w})
             = \sum_{l} \Big[ y^l \ln P(Y^l{=}1 \mid \mathbf{x}^l, \mathbf{w}) + (1 - y^l) \ln P(Y^l{=}0 \mid \mathbf{x}^l, \mathbf{w}) \Big]
             = \sum_{l} \Big[ y^l \big( w_0 + \textstyle\sum_i w_i x_i^l \big) - \ln\big( 1 + \exp( w_0 + \textstyle\sum_i w_i x_i^l ) \big) \Big]
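The final form above is easy to evaluate directly; a sketch (the `log_likelihood` name is illustrative, and each x^l is assumed to start with a constant 1 for the bias):

```python
import math

def log_likelihood(w, X, y):
    """Conditional log-likelihood l(w) = sum_l [ y^l * z^l - ln(1 + e^{z^l}) ],
    where z^l = w . x^l and each x^l includes a leading 1 for the bias w0."""
    total = 0.0
    for x_l, y_l in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x_l))
        total += y_l * z - math.log(1.0 + math.exp(z))
    return total
```

At w = 0 every example contributes −ln 2, the log-likelihood of a coin flip.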
Logistic regression: ML estimation
l(w) is concave in w. There is no closed-form solution. What are concave and convex functions?
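For reference, the standard definitions (f is concave exactly when −f is convex):

```latex
f \text{ concave:} \quad f(\lambda a + (1-\lambda) b) \;\ge\; \lambda f(a) + (1-\lambda) f(b), \qquad \forall a, b,\; \lambda \in [0,1]

f \text{ convex:} \quad f(\lambda a + (1-\lambda) b) \;\le\; \lambda f(a) + (1-\lambda) f(b)
```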
Optimizing concave/convex functions
Maximum of a concave function = minimum of a convex function.
Gradient ascent (concave) / Gradient descent (convex).
Gradient ascent / Gradient descent
For a function f(w):
- If f is concave: gradient ascent rule
- If f is convex: gradient descent rule
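The standard update rules with step size η, a sketch since the slide's own equations are not in the extracted text:

```latex
\text{ascent (maximize concave } f\text{):} \quad \mathbf{w} \leftarrow \mathbf{w} + \eta \, \nabla f(\mathbf{w})

\text{descent (minimize convex } f\text{):} \quad \mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla f(\mathbf{w})
```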
Logistic regression: Gradient descent
Iteratively updating the weights in this fashion increases the likelihood each round, so we eventually reach the maximum. We are near the maximum when changes in the weights are small; thus, we can stop when the sum of the absolute values of the weight differences is less than some small threshold.
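The full loop can be sketched as follows, under the assumptions used so far (each x^l starts with a constant 1; `train_logistic` and its parameters `eta`, `tol`, `max_iter` are illustrative names, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, eta=0.1, tol=1e-6, max_iter=10000):
    """Gradient ascent on the conditional log-likelihood.

    Update rule: w_i <- w_i + eta * sum_l x_i^l * (y^l - P(Y=1|x^l, w)).
    Stops when the sum of absolute weight changes drops below tol,
    i.e. the stopping criterion described above."""
    w = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad = [0.0] * len(w)
        for x_l, y_l in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_l)))
            for i, xi in enumerate(x_l):
                grad[i] += xi * (y_l - p)
        new_w = [wi + eta * gi for wi, gi in zip(w, grad)]
        if sum(abs(nw - ow) for nw, ow in zip(new_w, w)) < tol:
            return new_w
        w = new_w
    return w
```

On a tiny separable dataset the learned weights push the positive example's probability above 0.5 and the negative example's below it.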
Logistic regression: multi-class
In the two-class case we used the logistic sigmoid. For multiclass, we work with the soft-max (softmax) function instead of the logistic sigmoid.
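A sketch of the softmax, which turns a vector of per-class scores w_k·x into probabilities P(Y=k|x) = exp(z_k) / Σ_j exp(z_j) (the max-subtraction trick is a standard numerical-stability detail, not from the slides):

```python
import math

def softmax(scores):
    """Softmax over a list of class scores z_k = w_k . x.

    Returns P(Y=k|x) = exp(z_k) / sum_j exp(z_j). Subtracting
    max(scores) first keeps exp() from overflowing without changing
    the result, since the shift cancels in the ratio."""
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With two classes, softmax over the scores (wᵀx, 0) reduces to the logistic sigmoid of wᵀx.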
Logistic Regression: Regularization
Overfitting the training data is a problem that can arise in Logistic Regression, especially when the data is very high-dimensional and sparse. One approach to reducing overfitting is regularization, in which we maximize a modified "penalized log-likelihood function" that penalizes large values of w. The derivative of this penalized log-likelihood is similar to our earlier derivative, with one additional penalty term, which gives us the modified gradient ascent rule:
\mathbf{w} \leftarrow \arg\max_{\mathbf{w}} \sum_{l} \ln P(y^l \mid \mathbf{x}^l, \mathbf{w}) - \frac{\lambda}{2} \|\mathbf{w}\|^2

\frac{\partial l(\mathbf{w})}{\partial w_i} = \sum_{l} x_i^l \big( y^l - \hat{P}(Y^l{=}1 \mid \mathbf{x}^l, \mathbf{w}) \big) - \lambda w_i

w_i \leftarrow w_i + \eta \sum_{l} x_i^l \big( y^l - \hat{P}(Y^l{=}1 \mid \mathbf{x}^l, \mathbf{w}) \big) - \eta \lambda w_i
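One step of the regularized update can be sketched as follows (λ is the penalty strength, η the step size; `regularized_step` and the default values are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_step(w, X, y, eta=0.1, lam=0.01):
    """One gradient-ascent step on the L2-penalized log-likelihood:

    w_i <- w_i + eta * sum_l x_i^l * (y^l - P(Y=1|x^l, w)) - eta * lam * w_i

    The -eta*lam*w_i term shrinks every weight toward zero each step."""
    grad = [0.0] * len(w)
    for x_l, y_l in zip(X, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_l)))
        for i, xi in enumerate(x_l):
            grad[i] += xi * (y_l - p)
    return [wi + eta * (gi - lam * wi) for wi, gi in zip(w, grad)]
```

With no data at all, the step reduces to pure weight decay, which shows the penalty's shrinking effect in isolation.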
Logistic Regression vs. Naïve Bayes
In general, NB and LR make different assumptions:
- NB: features independent given the class → an assumption on P(X|Y)
- LR: functional form of P(Y|X), no assumption on P(X|Y)
LR is a linear classifier:
- the decision rule is a hyperplane
LR is optimized by conditional likelihood:
- no closed-form solution
- concave → global optimum with gradient ascent
Logistic Regression vs. Naïve Bayes
Consider Y and the Xi boolean, X = <X1, ..., Xn>.
Number of parameters:
- NB: 2n + 1
- LR: n + 1
Estimation method:
- NB parameter estimates are uncoupled
- LR parameter estimates are coupled
Logistic Regression vs. Gaussian Naive Bayes
When the GNB modeling assumptions do not hold, Logistic Regression and GNB typically learn different classifier functions. While Logistic Regression is consistent with the Naïve Bayes assumption that the input features Xi are conditionally independent given Y, it is not rigidly tied to this assumption as Naive Bayes is. GNB parameter estimates converge toward their asymptotic values in order log n examples; Logistic Regression parameter estimates converge more slowly, requiring order n examples.
Summary
Logistic Regression learns the conditional probability distribution P(y|x).
It is a local search: it begins with an initial weight vector and modifies it iteratively to maximize an objective function.
The objective function is the conditional log-likelihood of the data, so the algorithm seeks the probability distribution P(y|x) that is most likely given the data.
Any Question?
Spring 2015
http://ce.sharif.edu/courses/93-94/2/ce717-1/