

SLIDE 1

Machine Learning

Logistic Regression

Hamid R. Rabiee

Spring 2015

http://ce.sharif.edu/courses/93-94/2/ce717-1/

SLIDE 2

Agenda

• Probabilistic classification
• Introduction to logistic regression
• Binary logistic regression
• Logistic regression: decision surface
• Logistic regression: ML estimation
• Logistic regression: gradient descent
• Logistic regression: multi-class
• Logistic regression: regularization
• Logistic regression vs. Naïve Bayes

SLIDE 3

Probabilistic Classification

• Generative probabilistic classification (previous lecture)
  • Motivation: assume a distribution for each class and try to find the parameters of those distributions.
  • Cons: need to assume distributions; need to fit many parameters.

• Discriminative approach: logistic regression (focus of today)
  • Motivation: like least squares, but assume the logistic form y(x) = σ(wᵀx); classify based on whether y(x) > 0.5.
  • Technique: gradient descent.

SLIDE 4

Introduction to Logistic Regression

• Logistic regression represents the probability of category i using a linear function of the input variables.
• The name comes from the logit transformation:
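As a sketch of that transformation for the binary case, writing p = P(Y = 1 | x):

$$\operatorname{logit}(p) \;=\; \ln\frac{p}{1-p} \;=\; w_0 + \sum_{i=1}^{n} w_i x_i$$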

SLIDE 5

Binary Logistic Regression

• Logistic regression assumes a parametric form for the distribution P(Y|X), then directly estimates its parameters from the training data. The parametric model assumed in the case where Y is boolean is:

$$P(Y=1 \mid X) \;=\; \frac{\exp\!\big(w_0 + \sum_{i=1}^{n} w_i X_i\big)}{1 + \exp\!\big(w_0 + \sum_{i=1}^{n} w_i X_i\big)} \qquad (1)$$

$$P(Y=0 \mid X) \;=\; \frac{1}{1 + \exp\!\big(w_0 + \sum_{i=1}^{n} w_i X_i\big)} \qquad (2)$$

• Notice that equation (2) follows directly from equation (1), because the sum of these two probabilities must equal 1.

SLIDE 6

Binary Logistic Regression

• We only need one set of parameters, w = ⟨w0, w1, ..., wn⟩, since P(Y=0|X) = 1 − P(Y=1|X).
• Sigmoid (logistic) function: σ(z) = 1 / (1 + e^(−z)).
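A minimal Python sketch of this model (the names `sigmoid` and `predict_proba` are illustrative, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """P(Y=1 | x, w) = sigmoid(w0 + sum_i w_i x_i), matching equation (1);
    w = [w0, w1, ..., wn], x = [x1, ..., xn]."""
    return sigmoid(w[0] + np.dot(w[1:], x))
```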

SLIDE 7

Adapted from slides of John Whitehead

Logistic Regression vs. Linear Regression

SLIDE 8

Logistic Regression: Decision Surface

• Given a logistic regression with weights w and an input x:
• Decision surface: P(Y=1 | x, w) = constant, i.e. w0 + Σi wi xi = constant.
• Decision surfaces are therefore linear functions of x.
• Decision making on Y: predict Y = 1 when P(Y=1 | x, w) > 0.5, i.e. when w0 + Σi wi xi > 0; predict Y = 0 otherwise.
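A minimal sketch of this decision rule in Python (`predict_label` is an illustrative name):

```python
import numpy as np

def predict_label(w, x):
    """Predict Y = 1 exactly when P(Y=1 | x, w) > 0.5; since the sigmoid
    crosses 0.5 at z = 0, this reduces to the sign of the linear score."""
    score = w[0] + np.dot(w[1:], x)   # w0 + sum_i w_i x_i
    return 1 if score > 0 else 0
```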

SLIDE 9

Computing the Likelihood in Detail

 We can re-express the log of the conditional likelihood as:

$$
\begin{aligned}
l(\mathbf{w}) &= \sum_{l}\Big[\, y^{l}\,\ln P(y^{l}=1 \mid \mathbf{x}^{l}, \mathbf{w}) + (1-y^{l})\,\ln P(y^{l}=0 \mid \mathbf{x}^{l}, \mathbf{w}) \Big] \\
&= \sum_{l}\Big[\, y^{l}\,\ln \frac{P(y^{l}=1 \mid \mathbf{x}^{l}, \mathbf{w})}{P(y^{l}=0 \mid \mathbf{x}^{l}, \mathbf{w})} + \ln P(y^{l}=0 \mid \mathbf{x}^{l}, \mathbf{w}) \Big] \\
&= \sum_{l}\Big[\, y^{l}\,\mathbf{w}^{\top}\mathbf{x}^{l} - \ln\!\big(1 + \exp(\mathbf{w}^{\top}\mathbf{x}^{l})\big) \Big]
\end{aligned}
$$

(with the bias absorbed by setting x0 = 1, so that wᵀx = w0 + Σi wi xi)
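A sketch of this expression in Python (illustrative names; X holds one example per row, with the bias handled separately, and y holds 0/1 labels):

```python
import numpy as np

def log_likelihood(w, X, y):
    """Conditional log-likelihood l(w) = sum_l [ y_l * z_l - ln(1 + exp(z_l)) ],
    where z_l = w0 + w . x_l."""
    z = w[0] + X @ w[1:]
    # np.logaddexp(0, z) computes ln(1 + exp(z)) in a numerically stable way
    return np.sum(y * z - np.logaddexp(0, z))
```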

SLIDE 10

Logistic regression: ML estimation

• The conditional log likelihood l(w) is concave in w.
• There is no closed-form solution.
• What are concave and convex functions, and how do we optimize them?

SLIDE 11

Optimizing Concave/Convex Functions

• Maximizing a concave function f is the same as minimizing the convex function −f.
• Gradient ascent (concave) / gradient descent (convex).

SLIDE 12

Gradient Ascent / Gradient Descent

• For a function f(w):
  • If f is concave: gradient ascent rule.
  • If f is convex: gradient descent rule.
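These rules take the standard form, with learning rate η:

$$\mathbf{w} \leftarrow \mathbf{w} + \eta\,\nabla_{\mathbf{w}} f(\mathbf{w}) \quad \text{(gradient ascent, concave } f)$$

$$\mathbf{w} \leftarrow \mathbf{w} - \eta\,\nabla_{\mathbf{w}} f(\mathbf{w}) \quad \text{(gradient descent, convex } f)$$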

SLIDE 13

Logistic regression: Gradient descent

• Iteratively updating the weights in this fashion increases the likelihood each round.
• We eventually reach the maximum.
• We are near the maximum when changes in the weights are small.
• Thus, we can stop when the sum of the absolute values of the weight differences is less than some small number (see the sketch below).
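A minimal sketch of this procedure, gradient ascent on the concave log-likelihood with the stopping rule just described (`fit_logistic`, `eta`, and `tol` are illustrative names):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.1, tol=1e-6, max_iter=10_000):
    """Gradient ascent on the conditional log-likelihood.
    X: (m, n) matrix of examples, y: 0/1 labels of length m."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend a bias column of 1s
    w = np.zeros(n + 1)                    # initial weight vector
    for _ in range(max_iter):
        p = sigmoid(Xb @ w)                # P(Y=1 | x, w) for every example
        grad = Xb.T @ (y - p)              # gradient of the log-likelihood
        w_new = w + eta * grad             # ascent step (likelihood is concave)
        if np.sum(np.abs(w_new - w)) < tol:  # weight changes are small: stop
            return w_new
        w = w_new
    return w
```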

SLIDE 14

Logistic regression: multi-class

• In the two-class case we used the logistic sigmoid.
• For multiclass, we work with the soft-max function (a.k.a. softmax) instead of the logistic sigmoid:

$$P(Y=k \mid \mathbf{x}) \;=\; \frac{\exp(\mathbf{w}_k^{\top}\mathbf{x})}{\sum_{j}\exp(\mathbf{w}_j^{\top}\mathbf{x})}$$
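A small Python sketch of the soft-max (not from the slides); subtracting the maximum score is a common trick to keep exp() from overflowing:

```python
import numpy as np

def softmax(scores):
    """Turns a vector of class scores (w_k . x for each class k)
    into the probabilities P(Y = k | x)."""
    shifted = scores - np.max(scores)  # shift for numerical stability
    e = np.exp(shifted)
    return e / np.sum(e)
```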

SLIDE 15

Logistic Regression: Regularization

• Overfitting the training data is a problem that can arise in logistic regression, especially when the data has very high dimension and is sparse.
• One approach to reducing overfitting is regularization, in which we create a modified "penalized log likelihood function" that penalizes large values of w.
• The derivative of this penalized log likelihood function is similar to our earlier derivative, with one additional penalty term, which gives us the modified gradient ascent rule:

$$\mathbf{w} \leftarrow \arg\max_{\mathbf{w}} \;\sum_{l} \ln P(y^{l} \mid \mathbf{x}^{l}, \mathbf{w}) \;-\; \frac{\lambda}{2}\,\lVert\mathbf{w}\rVert^{2}$$

$$\frac{\partial l(\mathbf{w})}{\partial w_i} \;=\; \sum_{l} x_i^{l}\,\big(y^{l} - \hat{P}(y^{l}=1 \mid \mathbf{x}^{l}, \mathbf{w})\big) \;-\; \lambda w_i$$

$$w_i \leftarrow w_i \;+\; \eta \sum_{l} x_i^{l}\,\big(y^{l} - \hat{P}(y^{l}=1 \mid \mathbf{x}^{l}, \mathbf{w})\big) \;-\; \eta\,\lambda\, w_i$$
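A sketch of one such modified update step in Python (illustrative names; `lam` stands in for λ, and whether the bias w0 is penalized is a design choice):

```python
import numpy as np

def regularized_step(w, X, y, eta=0.1, lam=0.01):
    """One gradient step: the usual likelihood gradient
    plus the -eta * lam * w term from the L2 penalty."""
    p = 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))  # P(Y=1 | x, w)
    grad = np.concatenate([[np.sum(y - p)], X.T @ (y - p)])
    return w + eta * grad - eta * lam * w  # here the bias is penalized too
```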

SLIDE 16

Logistic Regression vs. Naïve Bayes

• In general, NB and LR make different assumptions:
  • NB: features independent given the class, i.e. an assumption on P(X|Y).
  • LR: a functional form for P(Y|X); no assumption on P(X|Y).
• LR is a linear classifier:
  • its decision rule is a hyperplane.
• LR is optimized by conditional likelihood:
  • no closed-form solution;
  • concave, so gradient ascent finds the global optimum.

SLIDE 17

Logistic Regression vs. Naïve Bayes

• Consider Y and all Xi boolean, X = ⟨X1, ..., Xn⟩.
• Number of parameters:
  • NB: 2n + 1
  • LR: n + 1
  • (For example, with n = 100 features, NB estimates 201 parameters while LR estimates 101.)
• Estimation method:
  • NB parameter estimates are uncoupled.
  • LR parameter estimates are coupled.

SLIDE 18

Logistic Regression vs. Gaussian Naïve Bayes

• When the GNB modeling assumptions do not hold, logistic regression and GNB typically learn different classifier functions.
• While logistic regression is consistent with the Naïve Bayes assumption that the input features Xi are conditionally independent given Y, it is not rigidly tied to this assumption as Naïve Bayes is.
• GNB parameter estimates converge toward their asymptotic values in order log(n) examples, where n is the dimension of X. Logistic regression parameter estimates converge more slowly, requiring order n examples.

SLIDE 19

Summary

• Logistic regression learns the conditional probability distribution P(y|x).
• Local search: it begins with an initial weight vector and modifies it iteratively to maximize an objective function.
• The objective function is the conditional log likelihood of the data, so the algorithm seeks the probability distribution P(y|x) that is most likely given the data.

SLIDE 20

Any Questions?

End of Lecture 9

Thank you!

Spring 2015

http://ce.sharif.edu/courses/93-94/2/ce717-1/