SLIDE 1

CSE 802 Spring 2017 Logistic Regression

Inci M. Baytas
Computer Science, Michigan State University
March 29, 2017


SLIDE 2

Introduction

◮ Consider a two-class classification problem; the posterior probability of class C1 can be written as:

$$p(C_1 \mid \Phi) = y(\Phi) = \sigma(w^T \Phi) \qquad (1)$$

◮ σ(·) is the logistic sigmoid function.
◮ p(C2|Φ) = 1 − p(C1|Φ)
◮ Φ is a feature vector, a non-linear transformation of the original observation space x.

◮ The model in Eq. 1 is called Logistic Regression in the terminology of statistics.
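A minimal NumPy sketch of evaluating the posterior in Eq. 1; the feature vector Φ and the weights w below are made-up values for illustration:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

# Made-up 3-dimensional feature vector Phi = Phi(x) and weight vector w.
Phi = np.array([1.0, 0.5, -1.2])
w = np.array([0.3, -0.8, 1.1])

p_C1 = sigmoid(w @ Phi)   # p(C1 | Phi) = sigma(w^T Phi), Eq. (1)
p_C2 = 1.0 - p_C1         # p(C2 | Phi) = 1 - p(C1 | Phi)
print(p_C1, p_C2)         # the two posteriors sum to one
```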


SLIDE 3

Logistic Regression I

◮ A classification model rather than a regression model.
◮ A probabilistic discriminative model.
◮ We estimate the parameters w directly.
◮ Comparison of logistic regression and a generative model in an M-dimensional feature space Φ:
  ◮ Logistic regression: M adjustable parameters.
  ◮ Generative model: assume we fit Gaussian class-conditional densities using maximum likelihood; M(M + 5)/2 + 1 parameters in total = means: 2M + shared covariance: M(M + 1)/2 + prior p(C1): 1 (a numeric check follows after this list).
◮ Maximum likelihood is used to determine the parameters of the logistic regression model.
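A quick numeric check of the parameter counts above, computed directly from the formulas on this slide:

```python
def logistic_param_count(M):
    """Logistic regression: M adjustable parameters."""
    return M

def generative_param_count(M):
    """Gaussian class-conditionals with a shared covariance:
    means (2M) + shared covariance (M(M+1)/2) + prior p(C1) (1)."""
    return 2 * M + M * (M + 1) // 2 + 1   # equals M(M + 5)/2 + 1

for M in (10, 100, 1000):
    print(M, logistic_param_count(M), generative_param_count(M))
# M = 100: 100 vs. 5251 -- the generative count grows quadratically in M.
```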


SLIDE 4

Logistic Regression II

◮ Definition and properties of the logistic sigmoid function:

$$\sigma(a) = \frac{1}{1 + \exp(-a)}, \qquad \sigma(-a) = 1 - \sigma(a), \qquad \frac{d\sigma}{da} = \sigma(1 - \sigma) \qquad (2)$$
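A short NumPy check of the identities in Eq. 2; the finite-difference step eps is an arbitrary choice for the derivative test:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-5.0, 5.0, 11)
s = sigmoid(a)

# Symmetry: sigma(-a) = 1 - sigma(a)
assert np.allclose(sigmoid(-a), 1.0 - s)

# Derivative: d sigma / da = sigma (1 - sigma), checked by central differences
eps = 1e-6   # arbitrary finite-difference step
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2.0 * eps)
assert np.allclose(numeric, s * (1.0 - s))
```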


SLIDE 5

Logistic Regression III - How to Estimate w

◮ For a training data set {Φn, tn}, where tn ∈ {0, 1} and Φn = Φ(xn), with n = 1, ..., N, the likelihood can be written as:

$$p(\mathbf{t} \mid w) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n} \qquad (3)$$

where t = (t1, ..., tN)^T and yn = p(C1|Φn).

◮ The error function is the negative logarithm of the likelihood, known as the cross-entropy error function:

$$E(w) = -\ln p(\mathbf{t} \mid w) = -\sum_{n=1}^{N} \{t_n \ln y_n + (1 - t_n) \ln(1 - y_n)\} \qquad (4)$$

where yn = σ(an) and an = w^T Φn.
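A sketch of Eq. 4 in NumPy; the small eps added inside the logarithms is an implementation guard against log(0), not part of the slide's formula:

```python
import numpy as np

def cross_entropy_error(w, Phi, t):
    """E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)], Eq. (4).

    Phi: N x M design matrix (row n is Phi_n); t: N-vector of 0/1 targets.
    """
    a = Phi @ w                        # activations a_n = w^T Phi_n
    y = 1.0 / (1.0 + np.exp(-a))       # y_n = sigma(a_n)
    eps = 1e-12                        # numerical guard against log(0)
    return -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))
```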


SLIDE 6

Logistic Regression IV - How to Estimate w

◮ There is no analytical (closed-form) solution.
◮ The cross-entropy loss is a convex function.
◮ Hence there is a global minimum.
◮ We can use an iterative approach.

◮ Calculate the gradient with respect to w:

$$\nabla E(w) = \sum_{n=1}^{N} (y_n - t_n)\, \Phi_n \qquad (5)$$

◮ Use gradient descent (batch or online):

$$w^{\tau+1} = w^{\tau} - \eta \nabla E(w^{\tau}) \qquad (6)$$
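A minimal batch gradient-descent loop implementing Eqs. 5 and 6; the learning rate eta and iteration count are hypothetical defaults that would need tuning in practice:

```python
import numpy as np

def fit_logistic_gd(Phi, t, eta=0.01, n_iters=1000):
    """Batch gradient descent for logistic regression, Eqs. (5)-(6).

    Phi: N x M design matrix (row n is Phi_n); t: N-vector of 0/1 targets.
    eta and n_iters are illustrative defaults, not tuned values.
    """
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        y = 1.0 / (1.0 + np.exp(-(Phi @ w)))   # y_n = sigma(w^T Phi_n)
        grad = Phi.T @ (y - t)                 # Eq. (5): sum_n (y_n - t_n) Phi_n
        w = w - eta * grad                     # Eq. (6): w_{tau+1} = w_tau - eta * grad
    return w
```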


SLIDE 7

Logistic Regression V - How to Estimate w

◮ Newton-Raphson algorithm:

$$w^{(\mathrm{new})} = w^{(\mathrm{old})} - H^{-1} \nabla E(w) \qquad (7)$$

◮ It uses a local quadratic approximation to the cross-entropy error function to update w iteratively.
◮ The Newton-Raphson algorithm is also known as iteratively reweighted least squares (IRLS).
◮ Convexity: H is positive definite (the eigenvalues of H are strictly positive).
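A sketch of the Newton-Raphson (IRLS) update of Eq. 7. The Hessian form H = Φ^T R Φ with R = diag(y_n(1 − y_n)) is the standard result from Bishop (Sec. 4.3.3) and is assumed here, since the slide does not derive it:

```python
import numpy as np

def fit_logistic_newton(Phi, t, n_iters=10):
    """Newton-Raphson (IRLS) updates for logistic regression, Eq. (7)."""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        y = 1.0 / (1.0 + np.exp(-(Phi @ w)))
        grad = Phi.T @ (y - t)              # gradient, Eq. (5)
        r = y * (1.0 - y)                   # diagonal of R = diag(y_n (1 - y_n))
        H = Phi.T @ (r[:, None] * Phi)      # Hessian H = Phi^T R Phi (Bishop 4.3.3)
        w = w - np.linalg.solve(H, grad)    # w_new = w_old - H^{-1} grad E, Eq. (7)
    return w
```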


SLIDE 8

Multi-class Logistic Regression

◮ Cross-entropy for the multi-class classification problem:

$$E(w_1, \ldots, w_K) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_{nk} \qquad (8)$$

where

$$y_k(\Phi) = p(C_k \mid \Phi) = \frac{\exp(w_k^T \Phi)}{\sum_j \exp(w_j^T \Phi)}$$

is called the softmax function.

◮ Use maximum likelihood to estimate the parameters.
◮ Use an iterative approach such as Newton-Raphson.
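A NumPy sketch of the softmax posterior and the multi-class cross-entropy of Eq. 8; the max-subtraction inside the softmax and the 1e-12 guard are standard numerical-stability tricks, not part of the slide:

```python
import numpy as np

def softmax(A):
    """Row-wise softmax; subtracting the row maximum is a stability trick."""
    A = A - A.max(axis=1, keepdims=True)
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

def multiclass_cross_entropy(W, Phi, T):
    """E(w_1, ..., w_K) = -sum_n sum_k t_nk ln y_nk, Eq. (8).

    W: M x K matrix whose column k is w_k; Phi: N x M; T: N x K one-hot targets.
    """
    Y = softmax(Phi @ W)                     # y_nk = p(C_k | Phi_n)
    return -np.sum(T * np.log(Y + 1e-12))    # 1e-12 guards against log(0)
```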


SLIDE 9

Over-fitting in Logistic Regression

◮ Maximum likelihood can suffer from severe over-fitting.
◮ This can be overcome by finding a MAP solution for w (Bayesian treatment).
◮ Another alternative is to use regularization: add a regularizer to the loss function, giving a regularized log-likelihood (a sketch follows after this list).
  ◮ ℓ2 norm
  ◮ ℓ1 norm (Lasso)
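A sketch of a regularized cross-entropy loss and its (sub)gradient; the penalty strength lam is a hypothetical value, and the ℓ1 branch uses a subgradient since |w| is not differentiable at zero:

```python
import numpy as np

def regularized_loss_grad(w, Phi, t, lam=0.1, norm="l2"):
    """Cross-entropy error, Eq. (4), plus an l2 or l1 penalty on w.

    lam is a hypothetical regularization strength chosen for illustration.
    """
    y = 1.0 / (1.0 + np.exp(-(Phi @ w)))
    loss = -np.sum(t * np.log(y + 1e-12) + (1 - t) * np.log(1 - y + 1e-12))
    grad = Phi.T @ (y - t)
    if norm == "l2":
        loss += lam * np.sum(w ** 2)     # l2 penalty: lam * ||w||_2^2
        grad += 2.0 * lam * w
    else:
        loss += lam * np.sum(np.abs(w))  # l1 (Lasso) penalty: lam * ||w||_1
        grad += lam * np.sign(w)         # subgradient; |w_i| not differentiable at 0
    return loss, grad
```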

SLIDE 10

References

◮ Classification lecture of Dr. Jiayu Zhou.
◮ Christopher Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer-Verlag New York, 2006.
