Advanced Section #5: Generalized Linear Models: Logistic Regression and Beyond


SLIDE 1: Advanced Section #5: Generalized Linear Models: Logistic Regression and Beyond

Nick Stern

CS109A Introduction to Data Science
Pavlos Protopapas, Kevin Rader, and Chris Tanner

SLIDE 2: Outline

1. Motivation
  • Limitations of linear regression
2. Anatomy
  • Exponential Dispersion Family (EDF)
  • Link function
3. Maximum Likelihood Estimation for GLMs
  • Fisher Scoring

SLIDE 3: Motivation

SLIDE 4: Motivation

Linear regression framework: $y_i = x_i^T \beta + \epsilon_i$

Assumptions:

1. Linearity: Linear relationship between the expected value and the predictors
2. Normality: Residuals are normally distributed about the expected value
3. Homoskedasticity: Residuals have constant variance $\sigma^2$
4. Independence: Observations are independent of one another

SLIDE 5: Motivation

Expressed mathematically…

  • Linearity: $\mathbb{E}[y_i] = x_i^T \beta$
  • Normality: $y_i \sim \mathcal{N}(x_i^T \beta, \sigma^2)$
  • Homoskedasticity: $\sigma^2$ (instead of $\sigma_i^2$)
  • Independence: $p(y_j \mid y_k) = p(y_j)$ for $j \neq k$

SLIDE 6: Motivation

What happens when our assumptions break down?

SLIDE 7: Motivation

We have options within the framework of linear regression:

  • Nonlinearity → transform X or Y (polynomial regression)
  • Heteroskedasticity → weight observations (WLS regression)

SLIDE 8: Motivation

But assuming Normality can be pretty limiting… Consider modeling the following random variables:

  • Whether a coin flip is heads or tails (Bernoulli)
  • Counts of species in a given area (Poisson)
  • Time between stochastic events that occur with constant rate (gamma)
  • Vote counts for multiple candidates in a poll (multinomial)

SLIDE 9: Motivation

We can extend the framework of linear regression. Enter: Generalized Linear Models. A GLM relaxes:

  • the Normality assumption
  • the Homoskedasticity assumption

SLIDE 10: Motivation

SLIDE 11: Anatomy

SLIDE 12: Anatomy

Two adjustments must be made to turn an LM into a GLM:

1. Assume the response variable comes from a family of distributions called the exponential dispersion family (EDF).
2. The relationship between the expected value and the predictors is expressed through a link function.

SLIDE 13: Anatomy – EDF Family

The EDF family contains: Normal, Poisson, gamma, and more! The probability density function looks like this:

$$f(y_i \mid \theta_i) = \exp\left(\frac{y_i \theta_i - b(\theta_i)}{\phi_i} + c(y_i, \phi_i)\right)$$

where

  • $\theta$ – “canonical parameter”
  • $\phi$ – “dispersion parameter”
  • $b(\theta)$ – “cumulant function”
  • $c(y, \phi)$ – “normalization factor”

SLIDE 14: Anatomy – EDF Family

Example: representing the Bernoulli distribution in EDF form.

PDF of a Bernoulli random variable:

$$f(y_i \mid p_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}$$

Taking the log and then exponentiating (to cancel each other out) gives:

$$f(y_i \mid p_i) = \exp\big(y_i \log p_i + (1 - y_i) \log(1 - p_i)\big)$$

Rearranging terms…

$$f(y_i \mid p_i) = \exp\left(y_i \log\frac{p_i}{1 - p_i} + \log(1 - p_i)\right)$$

SLIDE 15: Anatomy – EDF Family

Comparing:

$$f(y_i \mid p_i) = \exp\left(y_i \log\frac{p_i}{1 - p_i} + \log(1 - p_i)\right)$$

vs.

$$f(y_i \mid \theta_i) = \exp\left(\frac{y_i \theta_i - b(\theta_i)}{\phi_i} + c(y_i, \phi_i)\right)$$

Choosing:

$$\theta_i = \log\frac{p_i}{1 - p_i}, \qquad \phi_i = 1, \qquad b(\theta_i) = \log(1 + e^{\theta_i}), \qquad c(y_i, \phi_i) = 0$$

And we recover the EDF form of the Bernoulli distribution.
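
A quick numeric sanity check of this identity (a sketch of my own, not from the slides):

```python
import numpy as np

def bernoulli_pmf(y, p):
    """Standard Bernoulli PMF: p^y * (1 - p)^(1 - y)."""
    return p**y * (1 - p)**(1 - y)

def bernoulli_edf(y, p):
    """EDF form with theta = logit(p), b(theta) = log(1 + e^theta),
    phi = 1, and c(y, phi) = 0."""
    theta = np.log(p / (1 - p))
    return np.exp(y * theta - np.log1p(np.exp(theta)))

for p in [0.1, 0.5, 0.9]:
    for y in [0, 1]:
        assert np.isclose(bernoulli_pmf(y, p), bernoulli_edf(y, p))
print("EDF form matches the Bernoulli PMF")
```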

SLIDE 16: Anatomy – EDF Family

The EDF family has some useful properties. Namely:

1. $\mathbb{E}[y_i] \equiv \mu_i = b'(\theta_i)$
2. $\mathrm{Var}(y_i) = \phi_i \, b''(\theta_i)$

(the proofs for these identities are in the notes)

Plugging in the values we obtained for the Bernoulli, we get back: $\mathbb{E}[y_i] = p_i$, $\mathrm{Var}(y_i) = p_i(1 - p_i)$.
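
These identities can also be checked symbolically; a minimal sketch, assuming SymPy is available:

```python
import sympy as sp

theta = sp.symbols("theta", real=True)

# Cumulant function for the Bernoulli in EDF form
b = sp.log(1 + sp.exp(theta))

mu = sp.diff(b, theta)       # E[y] = b'(theta)
var = sp.diff(b, theta, 2)   # Var(y) = phi * b''(theta), with phi = 1

# b'(theta) is the sigmoid (i.e. p), and b''(theta) = p * (1 - p)
print(sp.simplify(mu - sp.exp(theta) / (1 + sp.exp(theta))))  # 0
print(sp.simplify(var - mu * (1 - mu)))                       # 0
```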

SLIDE 17: Anatomy – Link Function

Time to talk about the link function.

SLIDE 18: Anatomy – Link Function

Recall from linear regression that:

$$\mu_i = x_i^T \beta$$

Does this work for the Bernoulli distribution?

$$\mu_i = p_i = x_i^T \beta \,?$$

No: $p_i$ must lie in $(0, 1)$, while $x_i^T \beta$ can be any real number.

Solution: wrap the expectation in a function called the link function:

$$g(\mu_i) = x_i^T \beta \equiv \eta_i$$

*For the Bernoulli distribution, the link function is the “logit” function (hence “logistic” regression).
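
As a concrete illustration (a sketch of my own, not from the slides), the logit maps $(0, 1)$ onto the whole real line, and its inverse, the sigmoid, maps it back:

```python
import numpy as np

def logit(mu):
    """Link function g: maps a mean in (0, 1) to the real line."""
    return np.log(mu / (1 - mu))

def sigmoid(eta):
    """Inverse link g^{-1}: maps a linear predictor back to (0, 1)."""
    return 1 / (1 + np.exp(-eta))

mu = np.array([0.1, 0.5, 0.9])
print(logit(mu))           # [-2.197...  0.  2.197...]
print(sigmoid(logit(mu)))  # [0.1 0.5 0.9], the round trip recovers mu
```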

SLIDE 19: Anatomy – Link Function

Link functions are a choice, not a property. A good choice is:

1. Differentiable (implies “smoothness”)
2. Monotonic (guarantees invertibility)
  • Typically increasing, so that $\mu$ increases with $\eta$
3. Expands the range of $\mu$ to the entire real line

Example: the logit function for the Bernoulli:

$$g(\mu_i) = g(p_i) = \log\frac{p_i}{1 - p_i}$$

SLIDE 20: Anatomy – Link Function

The logit function for the Bernoulli looks familiar…

$$g(p_i) = \log\frac{p_i}{1 - p_i} = \theta_i$$

Choosing the link function by setting $\theta_i = \eta_i$ gives us what is known as the “canonical link function.” Note:

$$\mu_i = b'(\theta_i) \;\rightarrow\; \theta_i = b'^{-1}(\mu_i)$$

(the derivative of the cumulant function must be invertible)

This choice of link, while not always effective, has some nice properties. Take STAT 149 to find out more!
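
To make the inversion concrete, here is the worked step for the Bernoulli, using the definitions above:

$$\mu = b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} \;\Longrightarrow\; e^{\theta} = \frac{\mu}{1 - \mu} \;\Longrightarrow\; \theta = b'^{-1}(\mu) = \log\frac{\mu}{1 - \mu}$$

which is exactly the logit, confirming that the logit is the canonical link for the Bernoulli.
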
SLIDE 21: Anatomy – Link Function

Here are some more examples (fun exercises at home):

| Distribution $f(y_i \mid \theta_i)$ | Mean Function $\mu_i = b'(\theta_i)$ | Canonical Link $\theta_i = g(\mu_i)$ |
|-------------------------------------|--------------------------------------|--------------------------------------|
| Normal                              | $\theta_i$                           | $\mu_i$                              |
| Bernoulli/Binomial                  | $e^{\theta_i}/(1 + e^{\theta_i})$    | $\log\big(\mu_i/(1 - \mu_i)\big)$    |
| Poisson                             | $e^{\theta_i}$                       | $\log(\mu_i)$                        |
| Gamma                               | $-1/\theta_i$                        | $-1/\mu_i$                           |
| Inverse Gaussian                    | $(-2\theta_i)^{-1/2}$                | $-1/(2\mu_i^2)$                      |

SLIDE 22: Maximum Likelihood Estimation

SLIDE 23: Maximum Likelihood Estimation

Recall from linear regression: we can estimate our parameters, $\theta$, by choosing those that maximize the likelihood, $L(y \mid \theta)$, of the data, where:

$$L(y \mid \theta) = \prod_{i=1}^{N} p(y_i \mid \theta_i)$$

In words: the likelihood is the probability of observing a set of $N$ independent data points, given our assumptions about the generative process.

SLIDE 24: Maximum Likelihood Estimation

For GLMs we can plug in the PDF of the EDF family:

$$L(y \mid \theta) = \prod_{i=1}^{N} \exp\left(\frac{y_i \theta_i - b(\theta_i)}{\phi_i} + c(y_i, \phi_i)\right)$$

How do we maximize this? Differentiate w.r.t. $\theta$ and set equal to 0. Taking the log first simplifies our life:

$$\ell(y \mid \theta) = \sum_{i=1}^{N} \frac{y_i \theta_i - b(\theta_i)}{\phi_i} + \sum_{i=1}^{N} c(y_i, \phi_i)$$
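
For the Bernoulli case ($\phi_i = 1$, $b(\theta) = \log(1 + e^{\theta})$, $c = 0$, and $\theta_i = x_i^T \beta$ under the canonical link), the log-likelihood fits in a few lines. This is a sketch of my own, not the course code:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Bernoulli GLM log-likelihood with the canonical (logit) link:
    theta_i = x_i^T beta, b(theta) = log(1 + e^theta), phi_i = 1,
    and the c(y_i, phi_i) term is identically zero."""
    theta = X @ beta
    return np.sum(y * theta - np.log1p(np.exp(theta)))
```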

SLIDE 25: Maximum Likelihood Estimation

Through lots of calculus & algebra (see notes), we can obtain the following form for the derivative of the log-likelihood:

$$\ell'(y \mid \theta) = \sum_{i=1}^{N} \frac{1}{\mathrm{Var}(y_i)} \frac{\partial \mu_i}{\partial \beta} (y_i - \mu_i)$$

Setting this sum equal to 0 gives us the generalized estimating equations:

$$\sum_{i=1}^{N} \frac{1}{\mathrm{Var}(y_i)} \frac{\partial \mu_i}{\partial \beta} (y_i - \mu_i) = 0$$

SLIDE 26: Maximum Likelihood Estimation

When we use the canonical link, this simplifies to the normal equations:

$$\sum_{i=1}^{N} \frac{(y_i - \mu_i) \, x_i^T}{\phi_i} = 0$$

Let’s attempt to solve the normal equations for the Bernoulli distribution. Plugging in $\mu_i$ and $\phi_i$ we get:

$$\sum_{i=1}^{N} \left(y_i - \frac{e^{x_i^T \beta}}{1 + e^{x_i^T \beta}}\right) x_i^T = 0$$
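
In code, the left-hand side is just the score (gradient) of the log-likelihood sketched earlier; again my own illustration under the same assumptions:

```python
import numpy as np

def score(beta, X, y):
    """Score of the Bernoulli log-likelihood under the canonical link:
    sum_i (y_i - sigmoid(x_i^T beta)) * x_i, stacked as a vector."""
    mu = 1 / (1 + np.exp(-(X @ beta)))
    return X.T @ (y - mu)
```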

SLIDE 27: Maximum Likelihood Estimation

Sad news: we can’t isolate $\beta$ analytically.

SLIDE 28: Maximum Likelihood Estimation

Good news: we can approximate it numerically. One choice of algorithm is the Fisher Scoring algorithm.

In order to find the $\theta$ that maximizes the log-likelihood $\ell(y \mid \theta)$:

1. Pick a starting value for our parameter, $\theta_0$.
2. Iteratively update this value as follows:

$$\theta_{t+1} = \theta_t - \frac{\ell'(\theta_t)}{\mathbb{E}[\ell''(\theta_t)]}$$

In words: perform gradient ascent with a learning rate inversely proportional to the expected curvature of the function at that point.

SLIDE 29: Maximum Likelihood Estimation

Here are the results of implementing the Fisher Scoring algorithm for simple logistic regression in Python:

DEMO
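
The original demo is not reproduced in this transcript; a stand-in on synthetic data might look like this, reusing `fisher_scoring` from the sketch above (names and numbers are illustrative, not the course's):

```python
import numpy as np

# Illustrative synthetic data, not the course demo
rng = np.random.default_rng(0)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # intercept + 1 feature
beta_true = np.array([-0.5, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

beta_hat = fisher_scoring(X, y)
print(beta_hat)  # should land close to [-0.5, 2.0]
```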

SLIDE 30: Questions?