CS109A Introduction to Data Science
Pavlos Protopapas, Kevin Rader, and Chris Tanner
Advanced Section #5: Generalized Linear Models: Logistic Regression and Beyond
Nick Stern

Outline
Motivation
1. Limitations of linear regression
2.
Linear regression model: $y_i = x_i^T\beta + \epsilon_i$

1. Linearity: linear relationship between the expected value and the predictors
2. Normality: residuals are normally distributed about the expected value
3. Homoskedasticity: residuals have constant variance $\sigma^2$
4. Independence: observations are independent of one another
1. Linearity: $E[y_i] = x_i^T\beta$
2. Normality: $y_i \sim \mathcal{N}(x_i^T\beta,\ \sigma^2)$
3. Homoskedasticity: constant variance $\sigma^2$ (instead of $\sigma_i^2$)
4. Independence: $p(y_i \mid y_j) = p(y_i)$ for $i \neq j$
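As a sketch (NumPy only, with illustrative parameter and sample-size choices), we can simulate data that satisfies all four assumptions and recover $\beta$ with ordinary least squares:

```python
# Sketch: simulate data satisfying the four OLS assumptions, then
# recover beta with ordinary least squares. beta_true and sigma are
# illustrative choices, not values from the lecture.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + predictors
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
sigma = 1.0                                  # constant variance (homoskedasticity)
eps = rng.normal(0.0, sigma, size=n)         # normal, independent residuals
y = X @ beta_true + eps                      # linear in the predictors

# OLS estimate: beta_hat = argmin ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

With $n = 500$ observations the estimate lands close to the generating coefficients, as the Gauss-Markov setup predicts.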
Violation and remedy:
- Nonlinearity: transform X or Y (polynomial regression)
- Heteroskedasticity: weight observations (WLS regression)
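A minimal sketch of the WLS remedy, assuming inverse-variance weights and an illustrative noise model where the spread grows with the predictor:

```python
# Sketch: weighted least squares for heteroskedastic data.
# Observations with variance sigma_i^2 get weight w_i = 1/sigma_i^2;
# the closed form is beta_hat = (X^T W X)^{-1} X^T W y.
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 2.0])
sigma_i = 0.5 * x                             # noise grows with x (heteroskedastic)
y = X @ beta_true + rng.normal(0.0, sigma_i)

W = np.diag(1.0 / sigma_i**2)                 # inverse-variance weights
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(np.round(beta_wls, 2))
```

Down-weighting the noisy observations restores efficient estimates; plain OLS would still be unbiased here, but with larger standard errors.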
The exponential family form:

$p(y \mid \theta, \phi) = \exp\!\left(\frac{y\theta - b(\theta)}{\phi} + c(y, \phi)\right)$

- $\theta$: "canonical parameter"
- $\phi$: "dispersion parameter"
- $b(\theta)$: "cumulant function"
- $c(y, \phi)$: "normalization factor"
PDF of a Bernoulli random variable:

$p(y_i \mid \pi_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}$

Taking the log and then exponentiating (to cancel each other out) gives:

$p(y_i \mid \pi_i) = \exp\!\big(y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i)\big)$

Rearranging terms:

$p(y_i \mid \pi_i) = \exp\!\Big(y_i \log \frac{\pi_i}{1 - \pi_i} + \log(1 - \pi_i)\Big)$
Comparing the Bernoulli form to the exponential family form:

$p(y_i \mid \pi_i) = \exp\!\Big(y_i \log \frac{\pi_i}{1 - \pi_i} + \log(1 - \pi_i)\Big)$

$p(y_i \mid \theta_i) = \exp\!\left(\frac{y_i\theta_i - b(\theta_i)}{\phi_i} + c(y_i, \phi_i)\right)$

Matching terms:

$\theta_i = \log \frac{\pi_i}{1 - \pi_i}, \quad \phi_i = 1, \quad b(\theta_i) = \log(1 + e^{\theta_i}), \quad c(y_i, \phi_i) = 0$
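A quick numerical sanity check (a NumPy sketch; the helper names are ours) that this parameterization really does reproduce the Bernoulli PMF:

```python
# Sketch: verify that the exponential-family form with
# theta = log(pi/(1-pi)), b(theta) = log(1+e^theta), phi = 1, c = 0
# equals the Bernoulli PMF pi^y (1-pi)^(1-y).
import numpy as np

def bernoulli_pmf(y, pi):
    return pi**y * (1 - pi)**(1 - y)

def exp_family(y, pi):
    theta = np.log(pi / (1 - pi))        # canonical parameter (logit)
    b = np.log(1 + np.exp(theta))        # cumulant function
    return np.exp(y * theta - b)         # phi = 1, c(y, phi) = 0

for pi in (0.2, 0.5, 0.9):
    for y in (0, 1):
        assert np.isclose(bernoulli_pmf(y, pi), exp_family(y, pi))
print("exponential-family form matches the Bernoulli PMF")
```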
(the proofs for these identities are in the notes)
The link function $g$ connects the mean to the linear predictor:

$g(\mu_i) = x_i^T\beta \equiv \eta_i$

*For the Bernoulli distribution, the link function is the "logit" function (hence "logistic" regression)
The link function is typically increasing, so that $\eta$ increases with $\mu$.

Example: the logit function for the Bernoulli distribution, $\eta = \log \frac{\mu}{1 - \mu}$
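The logit link and its inverse (the sigmoid) can be sketched as follows; the helper names are illustrative:

```python
# Sketch: the logit link maps the Bernoulli mean mu in (0,1) to the
# whole real line; its inverse (the sigmoid) maps back.
import numpy as np

def logit(mu):
    return np.log(mu / (1 - mu))       # eta = g(mu), increasing in mu

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))  # mu = g^{-1}(eta)

mu = np.array([0.1, 0.5, 0.9])
eta = logit(mu)
assert np.allclose(sigmoid(eta), mu)                             # round trip recovers mu
assert np.all(np.diff(logit(np.linspace(0.01, 0.99, 50))) > 0)   # monotone increasing
```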
(the derivative of the cumulant function, $\mu = b'(\theta)$, must be invertible)
Distribution $p(y_i \mid \theta_i)$ | Mean Function $\mu_i = b'(\theta_i)$ | Canonical Link $\theta_i = g(\mu_i)$
Normal                              | $\theta_i$                            | $\mu_i$
Bernoulli/Binomial                  | $e^{\theta_i}/(1 + e^{\theta_i})$     | $\log \frac{\mu_i}{1 - \mu_i}$
Poisson                             | $e^{\theta_i}$                        | $\log(\mu_i)$
Gamma                               | $-1/\theta_i$                         | $-1/\mu_i$
Inverse Gaussian                    | $(-2\theta_i)^{-1/2}$                 | $-1/(2\mu_i^2)$
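As a sketch, the $\mu = b'(\theta)$ relationship can be checked with a finite-difference derivative (shown for the Bernoulli and Poisson rows; the helper names are ours):

```python
# Sketch: numerically confirm that the mean function is the derivative
# of the cumulant function, mu = b'(theta), via centered differences.
import numpy as np

def num_deriv(b, theta, h=1e-6):
    return (b(theta + h) - b(theta - h)) / (2 * h)

b_bern = lambda th: np.log(1 + np.exp(th))     # Bernoulli cumulant function
b_pois = lambda th: np.exp(th)                 # Poisson cumulant function

theta = 0.7                                    # arbitrary test point
mu_bern = np.exp(theta) / (1 + np.exp(theta))  # table: e^theta / (1 + e^theta)
mu_pois = np.exp(theta)                        # table: e^theta

assert np.isclose(num_deriv(b_bern, theta), mu_bern)
assert np.isclose(num_deriv(b_pois, theta), mu_pois)
print("b'(theta) matches the tabulated mean functions")
```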
The log-likelihood of the exponential family:

$\ell(\theta \mid y) = \sum_{i=1}^{n}\left[\frac{y_i\theta_i - b(\theta_i)}{\phi} + c(y_i, \phi)\right]$

To maximize, differentiate with respect to $\beta$ and set the result equal to zero:

$\frac{\partial \ell}{\partial \beta} = 0$
In order to find the $\beta$ that maximizes the log-likelihood $\ell(y \mid \beta)$, iterate the Newton-Raphson (Fisher scoring) update:

$\beta_{t+1} = \beta_t - \frac{\ell'(\beta_t)}{E[\ell''(\beta_t)]}$

In words: perform gradient ascent with a learning rate inversely proportional to the expected curvature of the function at that point.
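This update can be sketched for logistic regression, where the canonical link makes the observed and expected curvature coincide (a NumPy sketch with illustrative simulated data; function names are ours):

```python
# Sketch: Newton-Raphson / Fisher scoring for logistic regression.
# With the canonical (logit) link, the Hessian of the log-likelihood
# is -X^T W X with W = diag(mu_i (1 - mu_i)), so each step is
# beta <- beta - H^{-1} grad.
import numpy as np

def fit_logistic(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted probabilities
        grad = X.T @ (y - mu)                    # score (gradient of log-lik)
        W = mu * (1 - mu)                        # Var(y_i): diagonal weights
        H = -(X.T * W) @ X                       # Hessian = -X^T diag(W) X
        beta = beta - np.linalg.solve(H, grad)   # Newton step (ascent)
    return beta

# Illustrative check on simulated data
rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))
beta_hat = fit_logistic(X, y)
print(np.round(beta_hat, 2))
```

This is the same iteration statsmodels and R's `glm` use under the hood (there phrased as iteratively reweighted least squares).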