

  1. Advanced Section #5: Generalized Linear Models: Logistic Regression and Beyond
     Nick Stern
     CS109A Introduction to Data Science
     Pavlos Protopapas, Kevin Rader, and Chris Tanner

  2. Outline
     1. Motivation
        • Limitations of linear regression
     2. Anatomy
        • Exponential Dispersion Family (EDF)
        • Link function
     3. Maximum Likelihood Estimation for GLMs
        • Fisher scoring

  3. Motivation

  4. Motivation
     Linear regression framework: $y_i = x_i^{\top}\beta + \epsilon_i$
     Assumptions:
     1. Linearity: linear relationship between expected value and predictors
     2. Normality: residuals are normally distributed about the expected value
     3. Homoskedasticity: residuals have constant variance $\sigma^2$
     4. Independence: observations are independent of one another

  5. Motivation
     Expressed mathematically ...
     • Linearity: $\mathbb{E}[y_i] = x_i^{\top}\beta$
     • Normality: $y_i \sim \mathcal{N}(x_i^{\top}\beta,\ \sigma^2)$
     • Homoskedasticity: $\sigma^2$ (instead of $\sigma_i^2$)
     • Independence: $p(y_i \mid y_j) = p(y_i)$ for $i \neq j$

  6. Motivation
     What happens when our assumptions break down?

  7. Motivation
     We have options within the framework of linear regression:
     • Nonlinearity: transform X or Y (polynomial regression)
     • Heteroskedasticity: weight observations (WLS regression)
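To make the two remedies concrete, here is a minimal sketch (my own addition, not from the slides) using synthetic data and the statsmodels OLS/WLS interfaces; the variable names and simulated values are illustrative only:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)

# Nonlinearity: transform X (add a quadratic term) and fit ordinary least squares.
y_curved = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0, 1, size=x.size)
X_poly = sm.add_constant(np.column_stack([x, x**2]))
poly_fit = sm.OLS(y_curved, X_poly).fit()

# Heteroskedasticity: weight each observation by the inverse of its noise variance.
noise_sd = 0.5 + 0.5 * x                      # noise grows with x
y_noisy = 2.0 + 1.5 * x + rng.normal(0, noise_sd)
X_lin = sm.add_constant(x)
wls_fit = sm.WLS(y_noisy, X_lin, weights=1.0 / noise_sd**2).fit()

print(poly_fit.params, wls_fit.params)
```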

  8. Motivation
     But assuming Normality can be pretty limiting ... Consider modeling the following random variables:
     • Whether a coin flip is heads or tails (Bernoulli)
     • Counts of species in a given area (Poisson)
     • Time between stochastic events that occur with constant rate (gamma)
     • Vote counts for multiple candidates in a poll (multinomial)

  9. Motivation
     We can extend the framework for linear regression. Enter: Generalized Linear Models.
     Relaxes:
     • Normality assumption
     • Homoskedasticity assumption

  10. Motivation

  11. Anatomy

  12. Anatomy
     Two adjustments must be made to turn LM into GLM:
     1. Assume the response variable comes from a family of distributions called the exponential dispersion family (EDF).
     2. The relationship between expected value and predictors is expressed through a link function.

  13. Anatomy – EDF Family
     The EDF family contains: Normal, Poisson, gamma, and more! The probability density function looks like this:
     $f(y_i \mid \theta_i) = \exp\!\left(\frac{y_i\theta_i - b(\theta_i)}{\phi_i} + c(y_i, \phi_i)\right)$
     where
     • $\theta$: "canonical parameter"
     • $\phi$: "dispersion parameter"
     • $b(\theta)$: "cumulant function"
     • $c(y, \phi)$: "normalization factor"

  14. Anatomy – EDF Family
     Example: representing the Bernoulli distribution in EDF form.
     PDF of a Bernoulli random variable:
     $f(y_i \mid p_i) = p_i^{\,y_i}(1 - p_i)^{\,1 - y_i}$
     Taking the log and then exponentiating (to cancel each other out) gives:
     $f(y_i \mid p_i) = \exp\big(y_i \log p_i + (1 - y_i)\log(1 - p_i)\big)$
     Rearranging terms ...
     $f(y_i \mid p_i) = \exp\!\left(y_i \log\frac{p_i}{1 - p_i} + \log(1 - p_i)\right)$

  15. Anatomy – EDF Family
     Comparing:
     $f(y_i \mid \theta_i) = \exp\!\left(\frac{y_i\theta_i - b(\theta_i)}{\phi_i} + c(y_i, \phi_i)\right)$  vs.  $f(y_i \mid p_i) = \exp\!\left(y_i \log\frac{p_i}{1 - p_i} + \log(1 - p_i)\right)$
     Choosing:
     $\theta_i = \log\frac{p_i}{1 - p_i}$,  $b(\theta_i) = \log(1 + e^{\theta_i})$,  $\phi_i = 1$,  $c(y_i, \phi_i) = 0$
     And we recover the EDF form of the Bernoulli distribution.
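As a quick numerical sanity check (my own addition, not part of the slides), the sketch below compares the familiar Bernoulli PMF with the EDF form above, using the choices $\theta_i = \log\frac{p_i}{1 - p_i}$, $b(\theta) = \log(1 + e^{\theta})$, $\phi_i = 1$, $c = 0$:

```python
import numpy as np

def bernoulli_pmf(y, p):
    # Standard form: p^y * (1 - p)^(1 - y)
    return p**y * (1 - p)**(1 - y)

def bernoulli_edf(y, p):
    # EDF form with theta = logit(p), b(theta) = log(1 + e^theta), phi = 1, c = 0
    theta = np.log(p / (1 - p))
    b = np.log(1 + np.exp(theta))
    return np.exp(y * theta - b)

for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert np.isclose(bernoulli_pmf(y, p), bernoulli_edf(y, p))
```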

  16. Anatomy – EDF Family
     The EDF family has some useful properties. Namely:
     1. $\mathbb{E}[y_i] \equiv \mu_i = b'(\theta_i)$
     2. $\mathrm{Var}(y_i) = \phi_i\, b''(\theta_i)$
     (the proofs for these identities are in the notes)
     Plugging in the values we obtained for Bernoulli, we get back:
     $\mathbb{E}[y_i] = p_i$,  $\mathrm{Var}(y_i) = p_i(1 - p_i)$
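A short symbolic check (again my own, assuming sympy is available) that differentiating the Bernoulli cumulant $b(\theta) = \log(1 + e^{\theta})$ reproduces the mean and variance stated above:

```python
import sympy as sp

theta, p = sp.symbols('theta p', positive=True)
b = sp.log(1 + sp.exp(theta))
logit_p = sp.log(p / (1 - p))

# Mean: b'(theta) evaluated at theta = logit(p) should simplify to p
mean = sp.diff(b, theta)
print(sp.simplify(mean.subs(theta, logit_p)))      # expect: p

# Variance (phi = 1): b''(theta) at theta = logit(p) should give p*(1 - p)
var = sp.diff(b, theta, 2)
print(sp.simplify(var.subs(theta, logit_p)))       # expect: p*(1 - p), up to rearrangement
```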

  17. Anatomy – Link Function
     Time to talk about the link function.

  18. Anatomy – Link Function
     Recall from linear regression that: $\mu_i = x_i^{\top}\beta$
     Does this work for the Bernoulli distribution? $\mu_i = p_i = x_i^{\top}\beta$? No: $p_i$ must lie in $(0, 1)$, while $x_i^{\top}\beta$ can be any real number.
     Solution: wrap the expectation in a function called the link function:
     $g(\mu_i) = x_i^{\top}\beta \equiv \eta_i$
     *For the Bernoulli distribution, the link function is the "logit" function (hence "logistic" regression)

  19. Anatomy – Link Function
     Link functions are a choice, not a property. A good choice is:
     1. Differentiable (implies "smoothness")
     2. Monotonic (guarantees invertibility), typically increasing so that $\mu$ increases with $\eta$
     3. Expands the range of $\mu$ to the entire real line
     Example: the logit function for the Bernoulli:
     $g(\mu_i) = g(p_i) = \log\frac{p_i}{1 - p_i}$
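A small sketch (helper names are my own, not from the slides) of the logit link and its inverse, the sigmoid, illustrating properties 2 and 3: it is monotonic, invertible, and maps $(0, 1)$ onto the whole real line:

```python
import numpy as np

def logit(p):
    # Link: maps probabilities in (0, 1) to the entire real line
    return np.log(p / (1 - p))

def inv_logit(eta):
    # Inverse link (sigmoid): maps any real number back into (0, 1)
    return 1 / (1 + np.exp(-eta))

p = np.array([0.001, 0.25, 0.5, 0.75, 0.999])
print(logit(p))               # roughly [-6.9, -1.1, 0.0, 1.1, 6.9]
print(inv_logit(logit(p)))    # recovers p, confirming invertibility
```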

  20. Anatomy – Link Function
     The logit function for the Bernoulli looks familiar ...
     $g(p_i) = \log\frac{p_i}{1 - p_i} = \theta_i$
     Choosing the link function by setting $\theta_i = \eta_i$ gives us what is known as the "canonical link function."
     Note: $\mu_i = b'(\theta_i) \;\Rightarrow\; \theta_i = b'^{-1}(\mu_i)$ (the derivative of the cumulant function must be invertible)
     This choice of link, while not always effective, has some nice properties. Take STAT 149 to find out more!

  21. Anatomy – Link Function
     Here are some more examples (fun exercises at home). For each distribution $f(y_i \mid \theta_i)$: the mean function $\mu_i = b'(\theta_i)$ and the canonical link $\theta_i = g(\mu_i)$.
     • Normal: mean $\mu_i = \theta_i$; canonical link $\mu_i$
     • Bernoulli/Binomial: mean $\mu_i = \frac{e^{\theta_i}}{1 + e^{\theta_i}}$; canonical link $\log\frac{\mu_i}{1 - \mu_i}$
     • Poisson: mean $\mu_i = e^{\theta_i}$; canonical link $\log(\mu_i)$
     • Gamma: mean $\mu_i = -\theta_i^{-1}$; canonical link $-\mu_i^{-1}$
     • Inverse Gaussian: mean $\mu_i = (-2\theta_i)^{-1/2}$; canonical link $-\frac{1}{2\mu_i^{2}}$

  22. Maximum Likelihood Estimation

  23. Maximum Likelihood Estimation
     Recall from linear regression: we can estimate our parameters, $\theta$, by choosing those that maximize the likelihood, $L(y \mid \theta)$, of the data, where:
     $L(y \mid \theta) = \prod_{i} p(y_i \mid \theta_i)$
     In words: the likelihood is the probability of observing a set of $N$ independent data points, given our assumptions about the generative process.

  24. Maximum Likelihood Estimation
     For GLMs we can plug in the PDF of the EDF family:
     $L(y \mid \theta) = \prod_{i=1}^{N}\exp\!\left(\frac{y_i\theta_i - b(\theta_i)}{\phi_i} + c(y_i, \phi_i)\right)$
     How do we maximize this? Differentiate w.r.t. $\theta$ and set equal to 0. Taking the log first simplifies our life:
     $\ell(y \mid \theta) = \sum_{i=1}^{N}\frac{y_i\theta_i - b(\theta_i)}{\phi_i} + \sum_{i=1}^{N} c(y_i, \phi_i)$
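For the Bernoulli/logit case this log-likelihood is easy to write down in code. The sketch below is my own (not the course demo), assuming the canonical link so that $\theta_i = x_i^{\top}\beta$, with $\phi_i = 1$ and $c = 0$; it uses a numerically stable form of $b(\theta) = \log(1 + e^{\theta})$:

```python
import numpy as np

def log_likelihood(beta, X, y):
    # Bernoulli GLM with canonical (logit) link: theta_i = x_i^T beta, phi_i = 1, c = 0
    theta = X @ beta
    # b(theta) = log(1 + e^theta); np.logaddexp(0, theta) computes this stably
    return np.sum(y * theta - np.logaddexp(0, theta))
```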

  25. Maximum Likelihood Estimation
     Through lots of calculus & algebra (see notes), we can obtain the following form for the derivative of the log-likelihood:
     $\ell'(y \mid \theta) = \sum_{i=1}^{N}\frac{1}{\mathrm{Var}(y_i)}\,\frac{\partial \mu_i}{\partial \beta}\,(y_i - \mu_i)$
     Setting this sum equal to 0 gives us the generalized estimating equations:
     $\sum_{i=1}^{N}\frac{1}{\mathrm{Var}(y_i)}\,\frac{\partial \mu_i}{\partial \beta}\,(y_i - \mu_i) = 0$

  26. Maximum Likelihood Estimation
     When we use the canonical link, this simplifies to the normal equations:
     $\sum_{i=1}^{N}\frac{x_i\,(y_i - \mu_i)}{\phi_i} = 0$
     Let's attempt to solve the normal equations for the Bernoulli distribution. Plugging in $\mu_i$ and $\phi_i$ we get:
     $\sum_{i=1}^{N} x_i\left(y_i - \frac{e^{x_i^{\top}\beta}}{1 + e^{x_i^{\top}\beta}}\right) = 0$
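In matrix form the left-hand side is just $X^{\top}(y - \mu)$ with $\mu_i = e^{x_i^{\top}\beta}/(1 + e^{x_i^{\top}\beta})$; a one-function sketch (mine, not from the slides) of evaluating this score:

```python
import numpy as np

def score(beta, X, y):
    # Gradient of the Bernoulli log-likelihood under the canonical (logit) link:
    # sum_i x_i * (y_i - mu_i), written compactly as X^T (y - mu)
    mu = 1 / (1 + np.exp(-(X @ beta)))
    return X.T @ (y - mu)
```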

  27. Maximum Likelihood Estimation
     Sad news: we can't isolate $\beta$ analytically.

  28. Maximum Likelihood Estimation
     Good news: we can approximate it numerically. One choice of algorithm is the Fisher Scoring algorithm.
     In order to find the $\theta$ that maximizes the log-likelihood, $\ell(y \mid \theta)$:
     1. Pick a starting value for our parameter, $\theta_0$.
     2. Iteratively update this value as follows:
     $\theta_{t+1} = \theta_t - \frac{\ell'(\theta_t)}{\mathbb{E}\big[\ell''(\theta_t)\big]}$
     In words: perform gradient ascent with a learning rate inversely proportional to the expected curvature of the function at that point.
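The course demo referenced on the next slide is not reproduced on this page; as a stand-in, here is a hedged sketch of Fisher scoring for logistic regression. It relies on the standard fact that, with the canonical link, the expected information is $X^{\top} W X$ with $W = \mathrm{diag}\big(\mu_i(1 - \mu_i)\big)$, so each update solves a small linear system. All names and the synthetic data are illustrative only.

```python
import numpy as np

def fisher_scoring_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by Fisher scoring (sketch, not the course demo).

    With the canonical (logit) link the expected information is X^T W X,
    where W = diag(mu_i * (1 - mu_i)), so each update is beta + I^{-1} * score.
    """
    beta = np.zeros(X.shape[1])              # step 1: starting value
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-(X @ beta)))   # mu_i = e^eta / (1 + e^eta)
        W = mu * (1 - mu)                    # Var(y_i) for Bernoulli, phi_i = 1
        score = X.T @ (y - mu)               # gradient of the log-likelihood
        info = X.T @ (W[:, None] * X)        # expected (Fisher) information
        step = np.linalg.solve(info, score)  # step 2: iterative update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical usage on synthetic data (values are illustrative only)
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])    # intercept + one predictor
true_beta = np.array([-0.5, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta))))
print(fisher_scoring_logistic(X, y))         # should land near [-0.5, 2.0]
```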

  29. Maximum Likelihood Estimation
     Here are the results of implementing the Fisher Scoring algorithm for simple logistic regression in Python: DEMO

  30. Questions?
