

SLIDE 1

Machine Learning

Logistic Regression

1

SLIDE 2

Where are we?

We have seen the following ideas

– Linear models
– Learning as loss minimization
– Bayesian learning criteria (MAP and MLE estimation)
– The Naïve Bayes classifier

2

SLIDE 3

This lecture

  • Logistic regression
  • Connection to Naïve Bayes
  • Training a logistic regression classifier
  • Back to loss minimization

3


SLIDE 5

Logistic Regression: Setup

  • The setting

– Binary classification
– Inputs: Feature vectors x ∈ ℝd
– Labels: y ∈ {-1, +1}

  • Training data

– S = {(xi, yi)}, m examples

5

SLIDE 6

Classification, but…

The output y is discrete valued (-1 or +1). Instead of predicting the output, let us try to predict P(y = 1 | x). Expand the hypothesis space to functions whose output is in [0, 1].

  • Original problem: ℝd → {-1, +1}
  • Modified problem: ℝd → [0, 1]
  • Effectively make the problem a regression problem

Many hypothesis spaces possible

6


SLIDE 8

The Sigmoid function

The hypothesis space for logistic regression: all functions of the form σ(wTx). That is, a linear function composed with the sigmoid (logistic) function σ, where σ(z) = 1 / (1 + exp(-z)).

What is the domain and the range of the sigmoid function?

This is a reasonable choice. We will see why later.

8
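For concreteness, here is a minimal Python sketch of this hypothesis space (the helper names `sigmoid` and `predict_prob` are my own, not from the course); the comments also answer the domain/range question above:

```python
import numpy as np

def sigmoid(z):
    """The logistic function sigma(z) = 1 / (1 + exp(-z)).
    Domain: all real z. Range: the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_prob(w, x):
    """A logistic regression hypothesis: P(y = 1 | x; w) = sigma(w^T x)."""
    return sigmoid(np.dot(w, x))
```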


SLIDE 11

The Sigmoid function

[Figure: plot of the sigmoid function σ(z) as a function of z]

11

SLIDE 12

The Sigmoid function

12

What is its derivative with respect to z?
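The answer is not in the extracted text, so for completeness, the standard identity (easily verified from the definition of σ):

```latex
\frac{d\sigma}{dz}
  = \frac{d}{dz}\,\bigl(1 + e^{-z}\bigr)^{-1}
  = \frac{e^{-z}}{\bigl(1 + e^{-z}\bigr)^{2}}
  = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```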


SLIDE 14

Predicting probabilities

According to the logistic regression model, we have P(y = 1 | x; w) = σ(wTx) = 1 / (1 + exp(-wTx)).

14



SLIDE 18

Predicting probabilities

According to the logistic regression model, we have P(y = 1 | x; w) = σ(wTx). Or equivalently, for a label y ∈ {-1, +1}: P(y | x; w) = σ(y wTx) = 1 / (1 + exp(-y wTx)).

18

Note that we are directly modeling P(y | x) rather than P(x | y) and P(y).
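A small sketch of these two equivalent forms (building on the hypothetical `sigmoid` helper above; y is assumed to be -1 or +1):

```python
def prob_of_label(w, x, y):
    """P(y | x; w) = sigma(y * w^T x) for y in {-1, +1}.
    For y = +1 this is sigma(w^T x); for y = -1 it equals 1 - sigma(w^T x)."""
    return sigmoid(y * np.dot(w, x))
```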


SLIDE 20

Predicting a label with logistic regression

  • Compute P(y =1 | x; w)
  • If this is greater than half, predict 1 else predict -1

– What does this correspond to in terms of wTx?
– Prediction = sgn(wTx) (see the sketch below)

20
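A quick sketch of the prediction rule (again using the hypothetical helpers above), showing why thresholding the probability at 1/2 is the same as taking the sign of wTx:

```python
def predict_label(w, x):
    """Predict +1 if P(y = 1 | x; w) > 0.5, else -1.
    Since sigma(w^T x) > 0.5 exactly when w^T x > 0, this is sgn(w^T x)."""
    return 1 if predict_prob(w, x) > 0.5 else -1
```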

SLIDE 21

This lecture

  • Logistic regression
  • Connection to Naïve Bayes
  • Training a logistic regression classifier
  • Back to loss minimization

21

SLIDE 22

Naïve Bayes and Logistic regression

Remember that the naïve Bayes decision is a linear function. Here, the P's represent the naïve Bayes posterior distribution, and w can be used to calculate the priors and the likelihoods. That is, P(y = 1 | w, x) is computed using P(x | y = 1, w) and P(y = 1 | w).

22

log [P(y = -1 | x, w) / P(y = +1 | x, w)] = -wTx

SLIDE 23

Naïve Bayes and Logistic regression

But we also know that P(y = +1 | x, w) = 1 - P(y = -1 | x, w).

23


SLIDE 24

Naïve Bayes and Logistic regression

Substituting in the above expression, we get

24

P(y = +1 | x, w) = σ(wTx) = 1 / (1 + exp(-wTx))
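Spelling out the substitution (my working; it simply rearranges the two displayed equations):

```latex
\log\frac{1 - P(y = +1 \mid \mathbf{x}, \mathbf{w})}{P(y = +1 \mid \mathbf{x}, \mathbf{w})} = -\mathbf{w}^{T}\mathbf{x}
\;\;\Rightarrow\;\;
\frac{1}{P(y = +1 \mid \mathbf{x}, \mathbf{w})} = 1 + e^{-\mathbf{w}^{T}\mathbf{x}}
\;\;\Rightarrow\;\;
P(y = +1 \mid \mathbf{x}, \mathbf{w}) = \sigma(\mathbf{w}^{T}\mathbf{x})
```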

SLIDE 25

Naïve Bayes and Logistic regression


25


That is, both naïve Bayes and logistic regression try to compute the same posterior distribution over the outputs. Naïve Bayes is a generative model; logistic regression is the discriminative version.

SLIDE 26

This lecture

  • Logistic regression
  • Connection to Naïve Bayes
  • Training a logistic regression classifier

– First: Maximum likelihood estimation
– Then: Adding priors → Maximum a Posteriori estimation

  • Back to loss minimization

26

SLIDE 27

Maximum likelihood estimation

Let’s get back to the problem of learning

  • Training data

– S = {(xi, yi)}, m examples

  • What we want

– Find a w such that P(S | w) is maximized
– We know that our examples are drawn independently and are identically distributed (i.i.d.)
– How do we proceed?

27

SLIDE 28

Maximum likelihood estimation

28

argmax_w P(S | w) = argmax_w ∏_{i=1}^{m} P(yi | xi, w)

The usual trick: Convert products to sums by taking the log. Recall that this works only because log is an increasing function, so the maximizer will not change.

SLIDE 29

Maximum likelihood estimation

29

Equivalent to solving

max_w Σ_{i=1}^{m} log P(yi | xi, w)

SLIDE 30

Maximum likelihood estimation

30

But (by definition) we know that

P(yi | xi, w) = σ(yi wTxi) = 1 / (1 + exp(-yi wTxi))

SLIDE 31

Maximum likelihood estimation

31

Equivalent to solving

max_w Σ_{i=1}^{m} -log(1 + exp(-yi wTxi))

SLIDE 32

Maximum likelihood estimation

32

The goal: Maximum likelihood training of a discriminative probabilistic classifier under the logistic model for the posterior distribution.

SLIDE 33

Maximum likelihood estimation

33

Equivalent to: Training a linear classifier by minimizing the logistic loss.
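As a sketch (not the course's code), the objective on SLIDE 31 can be written down directly; `X` is an m-by-d matrix of feature vectors and `y` a vector of labels in {-1, +1}, both illustrative names:

```python
def log_likelihood(w, X, y):
    """Sum over examples of log P(y_i | x_i, w) = -log(1 + exp(-y_i * w^T x_i)).
    Maximizing this is the same as minimizing the summed logistic loss."""
    margins = y * (X @ w)                       # y_i * w^T x_i for every example
    return -np.sum(np.log1p(np.exp(-margins)))
```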

SLIDE 34

Maximum a posteriori estimation

34

We could also add a prior on the weights. Suppose each weight in the weight vector is drawn independently from the normal distribution with zero mean and standard deviation σ:

p(w) = ∏_{j=1}^{d} p(wj) = ∏_{j=1}^{d} (1 / (σ √(2π))) exp(-wj² / σ²)

SLIDE 35

MAP estimation for logistic regression

35

Let us work through this procedure again to see what changes.

SLIDE 36

MAP estimation for logistic regression

36

What is the goal of MAP estimation? (In maximum likelihood, we maximized the likelihood of the data.)

SLIDE 37

MAP estimation for logistic regression

37

To maximize the posterior probability of the model given the data (i.e., to find the most probable model, given the data):

P(w | S) ∝ P(S | w) P(w)

SLIDE 38

MAP estimation for logistic regression

38

Learning by solving

argmax_w P(w | S) = argmax_w P(S | w) P(w)

SLIDE 39

MAP estimation for logistic regression

39

Take log to simplify:

max_w [ log P(S | w) + log P(w) ]

SLIDE 40

MAP estimation for logistic regression

40

We have already expanded out the first term:

log P(S | w) = Σ_{i=1}^{m} -log(1 + exp(-yi wTxi))

SLIDE 41

MAP estimation for logistic regression

41

Expand the log prior:

log p(w) = Σ_{j=1}^{d} -wj² / σ² + constants

SLIDE 42

MAP estimation for logistic regression

42

max_w Σ_{i=1}^{m} -log(1 + exp(-yi wTxi)) + Σ_{j=1}^{d} -wj² / σ² + constants

SLIDE 43

MAP estimation for logistic regression

43

max_w Σ_{i=1}^{m} -log(1 + exp(-yi wTxi)) - (1/σ²) wTw

SLIDE 44

MAP estimation for logistic regression

44

max_w Σ_{i=1}^{m} -log(1 + exp(-yi wTxi)) - (1/σ²) wTw

Maximizing a negative function is the same as minimizing the function.

SLIDE 45

Learning a logistic regression classifier

45

Learning a logistic regression classifier is equivalent to solving

min_w Σ_{i=1}^{m} log(1 + exp(-yi wTxi)) + (1/σ²) wTw

SLIDE 46

Learning a logistic regression classifier

46

Where have we seen this before?

SLIDE 47

Learning a logistic regression classifier

47

The first question in the homework: Write down the stochastic gradient descent algorithm for this (a sketch follows below). Historically, other training algorithms also exist; in particular, you might run into LBFGS.
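A minimal stochastic gradient descent sketch for this objective; this is my own illustration under simple assumptions (fixed learning rate, the regularizer's gradient split evenly across the m examples, `sigmoid` as defined earlier), not the homework solution:

```python
def sgd_logistic_regression(X, y, sigma2=1.0, lr=0.1, epochs=100, seed=0):
    """SGD for: min_w  sum_i log(1 + exp(-y_i w^T x_i)) + (1/sigma2) * w^T w."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(m):
            margin = y[i] * (X[i] @ w)
            grad_loss = -y[i] * X[i] * (1.0 - sigmoid(margin))  # gradient of log(1 + exp(-margin))
            grad_reg = (2.0 / sigma2) * w / m                    # per-example share of the l2 term's gradient
            w -= lr * (grad_loss + grad_reg)
    return w
```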

SLIDE 48

Logistic regression is…

  • A classifier that predicts the probability that the label is +1 for a particular input
  • The discriminative counterpart of the naïve Bayes classifier
  • A discriminative classifier that can be trained via MAP or MLE estimation
  • A discriminative classifier that minimizes the logistic loss over the training set

48

SLIDE 49

This lecture

  • Logistic regression
  • Connection to Naïve Bayes
  • Training a logistic regression classifier
  • Back to loss minimization

49

SLIDE 50

Learning as loss minimization

  • The setup

– Examples x drawn from a fixed, unknown distribution D
– Hidden oracle classifier f labels examples
– We wish to find a hypothesis h that mimics f

  • The ideal situation

– Define a function L that penalizes bad hypotheses
– Learning: Pick a function h ∈ H to minimize expected loss (written out below)

  • Instead, minimize empirical loss on the training set

50

But distribution D is unknown
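Written out (my notation, since the slide's formulas are not in the extracted text), the two criteria referenced above are:

```latex
\text{Expected loss:}\quad \min_{h \in H}\; \mathbb{E}_{x \sim D}\bigl[\,L\bigl(h(x), f(x)\bigr)\,\bigr]
\qquad\qquad
\text{Empirical loss:}\quad \min_{h \in H}\; \frac{1}{m}\sum_{i=1}^{m} L\bigl(h(x_i), y_i\bigr)
```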

SLIDE 51

Empirical loss minimization

Learning = minimize empirical loss on the training set

51

Is there a problem here?

SLIDE 52

Empirical loss minimization

Learning = minimize empirical loss on the training set

We need something that biases the learner towards simpler hypotheses

  • Achieved using a regularizer, which penalizes complex hypotheses

52

Is there a problem here?

Overfitting!

SLIDE 53

Regularized loss minimization

  • Learning: minimize regularized empirical loss (sketched below)
  • With linear classifiers: the loss depends on w through yi wTxi (sketched below)
  • What is a loss function?

– Loss functions should penalize mistakes
– We are minimizing average loss over the training data

  • What is the ideal loss function for classification?

53

(using l2 regularization)
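A hedged sketch of the objectives referenced above (my notation; λ is a regularization weight, and the l2 regularizer wTw matches the one derived on the MAP slides):

```latex
\min_{h \in H}\; \frac{1}{m}\sum_{i=1}^{m} L\bigl(h(x_i), y_i\bigr) + \lambda\,\mathrm{reg}(h)
\qquad\text{with linear classifiers:}\qquad
\min_{\mathbf{w}}\; \frac{1}{m}\sum_{i=1}^{m} L\bigl(y_i\,\mathbf{w}^{T}\mathbf{x}_i\bigr) + \lambda\,\mathbf{w}^{T}\mathbf{w}
```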

SLIDE 54

The 0-1 loss

Penalize classification mistakes between true label y and prediction y’

  • For linear classifiers, the prediction y’ = sgn(wTx)

– Mistake if y wTx ≤ 0

Minimizing 0-1 loss is intractable. Need surrogates

54

SLIDE 55

The loss function zoo

Many loss functions exist

– Perceptron loss
– Hinge loss (SVM)
– Exponential loss (AdaBoost)
– Logistic loss (logistic regression)

(These are compared as functions of the margin y wTx in the sketch below.)

55
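A small sketch (mine, not from the slides) of these surrogates, each written as a function of the margin z = y wTx, with the 0-1 loss included for reference:

```python
def zero_one_loss(z):
    """1 if the margin z = y * w^T x is non-positive (a mistake), else 0."""
    return np.where(z <= 0, 1.0, 0.0)

def perceptron_loss(z):
    return np.maximum(0.0, -z)           # penalizes only misclassified points, linearly

def hinge_loss(z):
    return np.maximum(0.0, 1.0 - z)      # SVM: also penalizes correct but low-margin points

def exponential_loss(z):
    return np.exp(-z)                    # AdaBoost

def logistic_loss(z):
    return np.log1p(np.exp(-z))          # logistic regression
```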

SLIDES 56-63

The loss function zoo

[Figure: the zero-one, perceptron, hinge (SVM), exponential (AdaBoost), and logistic regression losses plotted together, shown progressively and then zoomed out, and zoomed out even more.]