SLIDE 1

Count data and Poisson distribution

GENERALIZED LINEAR MODELS IN PYTHON

Ita Cirovic Donev

Data Science Consultant

SLIDE 2

Count data

Count the number of occurrences in a specified unit of time, distance, area, or volume.

Examples:
  • Goals in a soccer match
  • Number of earthquakes
  • Number of crab satellites
  • Number of awards won by a person
  • Number of bike crossings over the bridge

SLIDE 3

Poisson random variable

Events occur independently and randomly

Poisson distribution

$P(y) = \frac{\lambda^{y} e^{-\lambda}}{y!}$

  • λ: mean and variance
  • y = 0, 1, 2, 3, ...
  • Always non-negative
  • Discrete (not continuous)
  • Lower bound at zero, but no upper bound
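As a quick numerical check of the Poisson probability formula, the sketch below evaluates it with the standard library only and confirms that the probabilities sum to 1 and that the mean and variance both equal λ. The helper name `poisson_pmf` and the value λ = 3 are illustrative assumptions, not part of the slides.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability P(y) = lam**y * exp(-lam) / y! for y = 0, 1, 2, ..."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

lam = 3.0  # illustrative value, not taken from the slides

# Truncate the infinite support at 60; the remaining tail is negligible for lam = 3
probs = [poisson_pmf(y, lam) for y in range(60)]

total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

print(total, mean, variance)
```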

SLIDE 4

Understanding the parameter of the Poisson distribution

SLIDE 5

Visualizing the response

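import seaborn as sns

# Pass the response column itself, not the string 'y'.
# distplot is deprecated in recent seaborn releases; histplot is its replacement.
sns.histplot(my_data['y'])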

SLIDE 6

Poisson regression

Response variable

$y \sim \mathrm{Poisson}(\lambda)$

Mean of the response

$E(y) = \lambda$

Poisson regression model

$\log(\lambda) = \beta_0 + \beta_1 x_1$

SLIDE 7

Explanatory variables

  • Continuous and/or categorical → Poisson regression model
  • Categorical → log-linear model

SLIDE 8

GLM with Poisson in Python

import statsmodels.api as sm
from statsmodels.formula.api import glm

glm('y ~ x', data = my_data, family = sm.families.Poisson())

SLIDE 9

Let's practice!

SLIDE 10

Interpreting model fit

Ita Cirovic Donev

Data Science Consultant

SLIDE 11

Parameter estimation

  • Maximum likelihood estimation (MLE)
  • Iteratively reweighted least squares (IRLS)
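The two ideas are linked: for a Poisson model with log link, IRLS is one way to reach the maximum likelihood estimate. The following is a rough sketch of that loop in NumPy, not the statsmodels implementation. The function name `poisson_irls`, the starting values, and the simulated data are assumptions for illustration.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Sketch of IRLS for a Poisson GLM with log link.

    X: (n, p) design matrix that already includes an intercept column.
    y: (n,) array of non-negative counts.
    """
    mu = y + 0.5                 # starting values for the means (avoids log(0))
    eta = np.log(mu)             # linear predictor under the log link
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu  # working response
        w = mu                   # working weights for the Poisson/log-link case
        # Weighted least squares step: solve (X'WX) beta = X'Wz
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta

# Simulated data with illustrative coefficients 0.5 and 0.3
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=2000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x))

beta_hat = poisson_irls(X, y)

# At the MLE the score X'(y - mu) is zero, up to numerical error
score = X.T @ (y - np.exp(X @ beta_hat))
print(beta_hat, score)
```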

SLIDE 12

The response function

Poisson regression model

$\log(\lambda) = \beta_0 + \beta_1 x_1$

The response function:

$\lambda = \exp(\beta_0 + \beta_1 x_1)$

or

$\lambda = \exp(\beta_0) \times \exp(\beta_1 x_1)$

SLIDE 14

Interpretation of parameters

$\exp(\beta_0)$

The effect on the mean λ when $x_1 = 0$

$\exp(\beta_1)$

The multiplicative effect on the mean λ for a 1-unit increase in $x_1$

SLIDE 15

Interpreting coefficient effect

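If $\beta_1 > 0$

  • $\exp(\beta_1) > 1$
  • λ is $\exp(\beta_1)$ times larger for each 1-unit increase in $x_1$

If $\beta_1 < 0$

  • $\exp(\beta_1) < 1$
  • λ is multiplied by $\exp(\beta_1)$, so it is smaller, for each 1-unit increase in $x_1$

If $\beta_1 = 0$

  • $\exp(\beta_1) = 1$
  • $\lambda = \exp(\beta_0)$
  • Multiplicative factor is 1
  • y and $x_1$ are not related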

SLIDE 16

Example

model = glm('sat ~ weight', data = crab, family = sm.families.Poisson()).fit()

Generalized Linear Model Regression Results (print cut)
=============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------
Intercept     -0.4284      0.179     -2.394      0.017      -0.779      -0.078
weight         0.5893      0.065      9.064      0.000       0.462       0.717
=============================================================================

SLIDE 17

Example - interpretation of beta

Extract model coefficients

model.params

Intercept   -0.428405
weight       0.589304

Compute the effect

np.exp(0.589304)

1.803
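To make the "multiplicative effect" concrete, the sketch below plugs the coefficients printed by `model.params` into the response function and compares the predicted means at two weights one unit apart. The helper `mean_sat` and the weights 2.0 and 3.0 are illustrative choices, not part of the slides.

```python
import math

# Coefficients as printed by model.params for 'sat ~ weight'
b0, b1 = -0.428405, 0.589304

def mean_sat(weight: float) -> float:
    """Predicted mean under the response function: lambda = exp(b0 + b1 * weight)."""
    return math.exp(b0 + b1 * weight)

# Ratio of predicted means for a 1-unit increase in weight (2.0 -> 3.0 is arbitrary)
ratio = mean_sat(3.0) / mean_sat(2.0)

# The same quantity computed directly from the slope
effect = math.exp(b1)

print(round(ratio, 3), round(effect, 3))
```

The ratio does not depend on the starting weight, which is why a single number summarizes the effect of the slope.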

SLIDE 18

Confidence interval for $\beta_1$

print(model.conf_int())

                  0         1
Intercept -0.779112 -0.077699
weight     0.461873  0.716735

The multiplicative effect on the mean

print(np.exp(model.conf_int()))

                  0         1
Intercept  0.458813  0.925243
weight     1.587044  2.047737

SLIDE 19

Let's practice!

SLIDE 20

The Problem of Overdispersion

Ita Cirovic Donev

Data Science Consultant

SLIDE 21

Understanding the data

# mean of y
y_mean = crab['sat'].mean()
2.919

# variance of y
y_variance = crab['sat'].var()
9.912

SLIDE 22

Mean not equal to variance

  • variance > mean → overdispersion
  • variance < mean → underdispersion

Consequences of ignoring overdispersion:
  • Standard errors that are too small
  • p-values that are too small

SLIDE 23

How to check for overdispersion?

SLIDE 24

Compute estimated overdispersion

ratio = crab_fit.pearson_chi2 / crab_fit.df_resid
print(ratio)

3.134

  • Ratio = 1 → approximately Poisson
  • Ratio < 1 → underdispersion
  • Ratio > 1 → overdispersion

SLIDE 25

Negative Binomial Regression

$E(y) = \lambda$

$\mathrm{Var}(y) = \lambda + \alpha\lambda^{2}$

α: dispersion parameter
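A simulation can make the negative binomial variance formula tangible. The sketch below assumes the usual mapping to NumPy's `negative_binomial(n, p)` parameters, n = 1/α and p = n / (n + λ), and compares the sample variance with λ + αλ². The values α = 0.5 and λ = 3 are illustrative, not from the slides.

```python
import numpy as np

alpha = 0.5   # dispersion parameter (illustrative)
lam = 3.0     # mean (illustrative)

# Assumed mapping to NumPy's parameterization: n successes, success probability p
size = 1.0 / alpha
p = size / (size + lam)

rng = np.random.default_rng(2)
draws = rng.negative_binomial(size, p, size=200_000)

sample_mean = draws.mean()
sample_var = draws.var()
theory_var = lam + alpha * lam ** 2   # variance function of the model

print(sample_mean, sample_var, theory_var)
```

Setting α closer to 0 shrinks the extra term and moves the variance back toward the Poisson case, where it equals the mean.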

SLIDE 26

GLM with negative binomial in Python

import statsmodels.api as sm
from statsmodels.formula.api import glm

model = glm('y ~ x', data = my_data,
            family = sm.families.NegativeBinomial(alpha = 1)).fit()

SLIDE 27

Let's practice!

SLIDE 28

Plotting a regression model

Ita Cirovic Donev

Data Science Consultant

SLIDE 29

Import libraries

import seaborn as sns
import matplotlib.pyplot as plt

Crab model 'sat ~ width' is saved as model

SLIDE 30

Plot data points

# Adjust figure size
plt.subplots(figsize = (8, 5))

# Plot data points (recent seaborn releases require x and y as keyword arguments)
sns.regplot(x = 'width', y = 'sat', data = crab, fit_reg = False)

SLIDE 31

Add jitter

# Jitter the counts vertically so overlapping points become visible
sns.regplot(x = 'width', y = 'sat', data = crab,
            fit_reg = False, y_jitter = 0.3)

SLIDE 32

Add linear fit

sns.regplot(x = 'width', y = 'sat', data = crab,
            y_jitter = 0.3,
            fit_reg = True,
            line_kws = {'color':'green', 'label':'LM fit'})

SLIDE 33

Add Poisson GLM estimated values

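# Store the fitted means from the Poisson GLM
crab['fit_values'] = model.fittedvalues

# Overlay the fitted values on the existing plot
sns.scatterplot(x = 'width', y = 'fit_values', data = crab,
                color = 'red', label = 'Poisson')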

SLIDE 34

Predictions

SLIDE 37

Predictions

import pandas as pd

new_data = pd.DataFrame({'width':[24, 28, 32]})
model.predict(new_data)

0    1.881981
1    3.627360
2    6.991433
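Because the model is linear on the log scale, predictions at equally spaced widths should grow by a constant factor. The sketch below checks this using only the three predicted values printed for widths 24, 28 and 32, and backs out the slope they imply. The variable names are illustrative, and the implied slope is derived from the rounded output rather than read from the fitted model.

```python
import math

widths = [24, 28, 32]
preds = [1.881981, 3.627360, 6.991433]   # values printed by model.predict(new_data)

# Ratio of consecutive predictions; the widths are 4 units apart each time
ratio_1 = preds[1] / preds[0]
ratio_2 = preds[2] / preds[1]

# If lambda = exp(b0 + b1 * width), each ratio equals exp(4 * b1)
step = widths[1] - widths[0]
implied_slope = math.log(ratio_1) / step

print(ratio_1, ratio_2, implied_slope)
```

The two ratios agree up to rounding in the printed output, which is the multiplicative behavior the response function predicts.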

SLIDE 38

Let's practice!