Advanced Section #5: Generalized Linear Models: Logistic Regression and Beyond



SLIDE 1

Advanced Section #5: Generalized Linear Models: Logistic Regression and Beyond

Marios Mattheakis and Pavlos Protopapas

CS109A Introduction to Data Science
Pavlos Protopapas and Kevin Rader

SLIDE 2

CS109A, PROTOPAPAS, RADER

Outline

  • 1. Generalized Linear Models (GLMs):
       a. Motivation.
       b. Linear Regression Model (recap): the jumping-off point.
       c. Generalizing the Linear Model:
          i. Generalization of the random component (error distribution).
          ii. Generalization of the systematic component (link function).
  • 2. Maximum Likelihood Estimation in this general framework:
       a. Canonical links.
       b. General links.

SLIDE 3

Motivation

Ordinary Least Squares (OLS) regression is a great model, but it cannot describe every situation. OLS assumes:
➢ Normally distributed observations.
➢ An expectation that depends linearly on the predictors.
Many real-world observations violate these assumptions, e.g.:
➢ Binary data: Bernoulli or Binomial distributions.
➢ Positive data: Exponential or Gamma distributions.

SLIDE 4

GLMs formulations: Overview

[Diagram] A regression model combines an error distribution (Normal, Poisson, Bernoulli, and more exponential-family distributions) with a link function to form a Generalized Linear Model.

SLIDE 5

Regression Models

Suppose a dataset with n training points, {(x_i, y_i)}, i = 1, …, n.

In a regression model we are looking for

    y = f(x) + ε

where:
➢ f is some fixed but unknown function.
➢ ε is a random error term.

SLIDE 6

Linear Regression Model

The observations are independently distributed about a linear predictor, with a Normal distribution.

Linear model:

    y_i = x_i^T β + ε_i,   ε_i ~ N(0, σ²)

SLIDE 7

Linear Regression Model

The distribution conditional on the predictors:

    y | x ~ N(x^T β, σ²),   i.e.   E[y | x] = μ = x^T β
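As a quick numerical illustration of this model (a minimal NumPy sketch, not part of the original slides; the variable names are illustrative), we can simulate data from y | x ~ N(x^T β, σ²) and recover β by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the linear model: y | x ~ N(x^T beta, sigma^2)
n, p = 500, 3
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
sigma = 0.3
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# OLS estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With 500 points and small noise, the estimate lands close to the true coefficients.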

SLIDE 8

GLMs formulation

SLIDE 9

GLMs formulation

This will be a two-step generalization of simple linear regression.

  • 1. Random Component: the observations y_i follow a distribution from the exponential family of distributions.
  • 2. Systematic Component: a link function g connects the expectation to the linear predictor, g(μ_i) = x_i^T β.
SLIDE 10

Exponential Family of Distributions

A wide range of distributions that includes as special cases the Normal, Exponential, Gamma, Poisson, Bernoulli, Binomial, and many others:

    f(y; θ, φ) = exp{ (yθ − b(θ)) / a(φ) + c(y, φ) }

➢ θ: canonical parameter, the parameter of interest.
➢ a(φ): dispersion parameter, a scale parameter related to the variance.
➢ b(θ): cumulant function; it completely characterizes the distribution.
➢ c(y, φ): normalization factor.
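This form can be checked directly in code. The sketch below (illustrative, not from the slides) evaluates the exponential-family density with the Normal choices θ = μ, a(φ) = σ², b(θ) = θ²/2, c(y, φ) = −y²/(2σ²) − ½ log(2πσ²), and compares it with the usual Normal formula:

```python
import math

def expfam_pdf(y, theta, a_phi, b, c):
    """Exponential-family density: exp{(y*theta - b(theta))/a(phi) + c(y)}."""
    return math.exp((y * theta - b(theta)) / a_phi + c(y))

# Normal(mu, sigma^2) in exponential-family form
mu, sigma2 = 1.5, 0.8
b = lambda t: t * t / 2
c = lambda y: -y * y / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)

y = 0.7
pdf_expfam = expfam_pdf(y, mu, sigma2, b, c)

# Direct Normal density for comparison
pdf_direct = math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```

The two expressions agree term by term after expanding −(y − μ)²/(2σ²).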

SLIDE 11

Likelihood and Score function

Likelihood:

    L(θ) = ∏_{i=1}^{n} f(y_i; θ, φ)

Log-likelihood (easier and numerically more stable):

    ℓ(θ) = log L(θ) = Σ_{i=1}^{n} log f(y_i; θ, φ)

Score function:

    s(θ) = ∂ℓ(θ)/∂θ

SLIDE 12

Two General Identities

Identity I:    E[s(θ)] = 0

Identity II:   E[s(θ)²] = −E[∂s(θ)/∂θ] ≡ F(θ)

F(θ) is called the Fisher information matrix; E_ν denotes the ν-th moment.
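Both identities can be probed by Monte Carlo for a single Bernoulli observation, where (as derived in the following slides) the score is s(θ) = y − b'(θ) when a(φ) = 1. This is a rough numerical sketch, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bernoulli in canonical form: a(phi) = 1, b(theta) = log(1 + e^theta)
theta = 0.4
p = 1 / (1 + np.exp(-theta))   # b'(theta) = E[y]
fisher = p * (1 - p)           # b''(theta) = Fisher information of one observation

# Score of each simulated observation: s(theta) = y - b'(theta)
y = rng.binomial(1, p, size=200_000)
s = y - p

mean_score = s.mean()          # Identity I:  E[s] = 0
var_score = (s ** 2).mean()    # Identity II: E[s^2] = b''(theta)
```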

SLIDE 13

Some derivatives before the proofs

First derivative of the log-likelihood (one observation):

    ∂ℓ/∂θ = [y − b'(θ)] / a(φ)

Second derivative of the log-likelihood:

    ∂²ℓ/∂θ² = −b''(θ) / a(φ)

SLIDE 14

Some useful relations before the proofs

The ν-th moment of an arbitrary function h(y):

    E_ν[h(y)] = ∫ h(y)^ν f(y; θ, φ) dy

Since the observations are assumed independent of each other:

    f(y_1, …, y_n) = ∏_{i=1}^{n} f(y_i)

For a well-defined probability density:

    ∫ f(y; θ, φ) dy = 1

SLIDE 15

Proof of Identity I

Proof:

    E[s(θ)] = ∫ (∂ℓ/∂θ) f dy = ∫ (1/f)(∂f/∂θ) f dy = ∫ (∂f/∂θ) dy = ∂/∂θ ∫ f dy = ∂(1)/∂θ = 0

where the regularity condition allows taking the derivative out of the integral.

SLIDE 16

Proof of Identity II

Proof:

Differentiate E[s(θ)] = ∫ s f dy = 0 once more with respect to θ:

    0 = ∂/∂θ ∫ s f dy = ∫ (∂s/∂θ) f dy + ∫ s (∂f/∂θ) dy

1st term:  ∫ (∂s/∂θ) f dy = E[∂s/∂θ]
2nd term:  ∫ s (∂f/∂θ) dy = ∫ s (s f) dy = E[s²]

Hence E[s²] = −E[∂s/∂θ].

SLIDE 17

Mean & Variance Formulas in the Exponential Family

    E[y] = μ = b'(θ)

    Var[y] = a(φ) b''(θ)

where primes denote derivatives w.r.t. the canonical parameter θ. b(θ) is called the cumulant function of the distribution, since it completely determines the first two moments (cumulants).
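These formulas can be checked by finite differences in the Bernoulli case, where b(θ) = log(1 + e^θ) and a(φ) = 1 (a numerical sketch with illustrative names, not from the slides):

```python
import math

# Bernoulli cumulant function: b(theta) = log(1 + e^theta)
b = lambda t: math.log1p(math.exp(t))

theta, h = -0.3, 1e-5
# Central finite differences approximating b'(theta) and b''(theta)
b1 = (b(theta + h) - b(theta - h)) / (2 * h)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2

p = 1 / (1 + math.exp(-theta))
mean_formula = p             # E[y]   = b'(theta) = p
var_formula = p * (1 - p)    # Var[y] = a(phi) b''(theta) = p(1 - p)
```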

SLIDE 18

Some derivatives before the proofs

SLIDE 19

Proof of mean formula

Proof: By Identity I, E[s(θ)] = 0. Using s(θ) = [y − b'(θ)] / a(φ):

    E[(y − b'(θ)) / a(φ)] = 0   ⟹   E[y] = b'(θ)

SLIDE 20

Proof of Variance formula

Proof: By Identity II, E[s(θ)²] = −E[∂s(θ)/∂θ]. With s(θ) = [y − b'(θ)] / a(φ):

    E[(y − b'(θ))²] / a(φ)² = b''(θ) / a(φ)   ⟹   Var[y] = a(φ) b''(θ)

SLIDE 21

Normal Distribution: Example

Probability density of the Normal distribution:

    f(y; μ, σ²) = (1/√(2πσ²)) exp{ −(y − μ)² / (2σ²) }
                = exp{ (yμ − μ²/2)/σ² − y²/(2σ²) − ½ log(2πσ²) }

which is in exponential-family form with θ = μ, b(θ) = θ²/2, a(φ) = σ², and c(y, φ) = −y²/(2σ²) − ½ log(2πσ²).

SLIDE 22

Bernoulli distribution: Example

It is a discrete probability distribution of a random binary variable y ∈ {0, 1}:

    f(y; p) = p^y (1 − p)^{1−y} = exp{ y log(p/(1−p)) + log(1 − p) }

which is in exponential-family form with θ = log(p/(1−p)), b(θ) = log(1 + e^θ), and a(φ) = 1.

SLIDE 23

Second step of GLMs formulation: Link Function

Systematic component:

    g(μ_i) = η_i = x_i^T β

SLIDE 24

Link Function

A link function g is a one-to-one differentiable transformation that makes the transformed expectation linear in the predictors:

    g(μ) = η = x^T β

η = x^T β is called the linear predictor. The link transforms the expectation, NOT the observations: for instance, for the log link, g(μ) = log μ = η (not log y). Since g is one-to-one, we can invert to get

    μ = g⁻¹(η) = g⁻¹(x^T β)
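For the logit link (used below for the Bernoulli case), the transformation and its inverse can be sketched as follows (illustrative code, not from the slides):

```python
import math

# Logit link: eta = g(mu) = log(mu / (1 - mu)); maps (0, 1) onto the real line
def logit(mu):
    return math.log(mu / (1 - mu))

# Inverse link: mu = g^{-1}(eta) = 1 / (1 + e^{-eta}); maps back into (0, 1)
def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

mu = 0.73
eta = logit(mu)            # the link transforms the expectation, not the data
mu_back = inv_logit(eta)   # one-to-one, so the transformation is invertible
```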

SLIDE 25

Canonical Links

A canonical link makes the linear predictor equal to the canonical parameter:

    η = θ,   i.e.   g(μ) = (b')⁻¹(μ)

A canonical transformation is therefore determined by the cumulant function, so the derivative of the cumulant function, b', must be invertible.

SLIDE 26

Normal and Bernoulli distributions: Examples

Normal distribution: we found earlier that μ = b'(θ) = θ. Hence the canonical link is the identity:

    g(μ) = μ = x^T β

Bernoulli distribution: we found earlier that μ = b'(θ) = 1/(1 + e^{−θ}). Hence the canonical link is the logit:

    g(μ) = log( μ / (1 − μ) ) = x^T β

SLIDE 27

Data Distribution and Canonical Links

Distribution    Canonical link
Normal          Identity:  g(μ) = μ
Poisson         Log:       g(μ) = log μ
Bernoulli       Logit:     g(μ) = log(μ/(1−μ))
Gamma           Inverse:   g(μ) = −1/μ

SLIDE 28

GLMs: A general framework

We found that linear, logistic, and other regression models are special cases of GLMs.

Working in such a general framework is a great advantage: there is a general theory that can then be applied to any specific distribution and regression model. For instance, from the general likelihood we can derive general equations that maximize the likelihood.

SLIDE 29

Maximum Likelihood Estimation (MLE)

SLIDE 30

Maximum Likelihood Estimation (MLE)

Likelihood in the exponential family:

    L(β) = ∏_{i=1}^{n} exp{ (y_i θ_i − b(θ_i)) / a(φ) + c(y_i, φ) }

Log-likelihood in the exponential family:

    ℓ(β) = Σ_{i=1}^{n} [ (y_i θ_i − b(θ_i)) / a(φ) + c(y_i, φ) ]

SLIDE 31

log-likelihood is a strictly concave function

hence it has at most one maximum, which can be found by setting the score function to zero.
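The concavity can be probed numerically for the Bernoulli log-likelihood with the canonical logit link: along any line in coefficient space, second differences of ℓ are non-positive. A rough sketch with simulated data (not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# Bernoulli log-likelihood with the canonical (logit) link:
# l(beta) = sum_i [ y_i * eta_i - log(1 + e^{eta_i}) ],  eta_i = x_i^T beta
X = rng.normal(size=(300, 2))
beta_true = np.array([0.8, -1.2])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def loglik(beta):
    eta = X @ beta
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

# Evaluate l along an arbitrary line; concavity makes second differences <= 0
direction = np.array([1.0, 0.5])
ts = np.linspace(-2, 2, 41)
vals = np.array([loglik(beta_true + t * direction) for t in ts])
second_diffs = vals[2:] - 2 * vals[1:-1] + vals[:-2]
```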

SLIDE 32

MLE for Canonical Links

Normal equations for the MLE (canonical link):

    Σ_{i=1}^{n} (y_i − μ_i) x_i = 0,   i.e.   X^T y = X^T μ

Solving the normal equations, we estimate the coefficients β.
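For the logit link there is no closed form, but the normal equations X^T (y − μ) = 0 can be solved by Newton-Raphson (equivalently, iteratively reweighted least squares). A minimal NumPy sketch on simulated data (illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated logistic-regression data
n, p = 2000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -1.0, 1.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

# Newton-Raphson on the score equations X^T (y - mu) = 0
beta = np.zeros(p)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))    # b'(theta_i): fitted means
    W = mu * (1 - mu)                   # b''(theta_i): Fisher weights
    grad = X.T @ (y - mu)               # score vector
    H = X.T @ (X * W[:, None])          # Fisher information matrix
    beta = beta + np.linalg.solve(H, grad)

# At the MLE the normal equations hold: the score vanishes
score_at_mle = X.T @ (y - 1 / (1 + np.exp(-X @ beta)))
```

Strict concavity of the log-likelihood guarantees this iteration converges to the unique maximizer.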

SLIDE 33

MLE Examples

Normal distribution (link = identity): μ_i = x_i^T β, so the normal equations

    X^T (y − Xβ) = 0

have the closed-form solution β̂ = (X^T X)⁻¹ X^T y (linear regression).

Bernoulli distribution (link = logit): μ_i = 1/(1 + e^{−x_i^T β}), so

    Σ_{i=1}^{n} ( y_i − 1/(1 + e^{−x_i^T β}) ) x_i = 0

must be solved numerically (logistic regression).

SLIDE 34

MLE for General Links

Sometimes we may use non-canonical links, for instance for algorithmic purposes, such as in Bayesian probit regression.

Generalized estimating equations:

    Σ_{i=1}^{n} [ (y_i − μ_i) / Var[y_i] ] (∂μ_i/∂η_i) x_i = 0

SLIDE 35

Summary

  • Generalized Linear Models:

1. Motivation: OLS cannot describe everything; a good jumping-off point.
2. Formulation:
   ➢ Generalization of the random component (error distribution).
   ➢ Generalization of the systematic component (link function).
3. Normal & Bernoulli distributions: examples.

  • Maximum Likelihood Estimation (MLE)

1. General framework: one theory for many regression models.
2. Normal equations for MLE (canonical links).
   ➢ Linear & logistic regression examples.
3. Generalized estimating equations (general links).

SLIDE 36

Questions?

Office hours for Adv. Sec.: Monday 6:00–7:30 pm, Tuesday 6:30–8:00 pm.

SLIDE 37

General Equations: Proof

Using the chain rule:

    ∂ℓ_i/∂β = (∂ℓ_i/∂θ_i)(∂θ_i/∂μ_i)(∂μ_i/∂η_i)(∂η_i/∂β)
            = [ (y_i − μ_i) / a(φ) ] · [ 1 / b''(θ_i) ] · (∂μ_i/∂η_i) · x_i

hence, using Var[y_i] = a(φ) b''(θ_i), summing over observations and setting the score to zero gives

    Σ_{i=1}^{n} [ (y_i − μ_i) / Var[y_i] ] (∂μ_i/∂η_i) x_i = 0