Count data and Poisson distribution


  1. Count data and Poisson distribution
     GENERALIZED LINEAR MODELS IN PYTHON
     Ita Cirovic Donev, Data Science Consultant

  2. Count data
     - Count the number of occurrences in a specified unit of time, distance, area or volume.
     - Examples:
       - Goals in a soccer match
       - Number of earthquakes
       - Number of crab satellites
       - Number of awards won by a person
       - Number of bike crossings over the bridge

  3. Poisson random variable
     - Events occur independently and randomly.
     - Poisson distribution: $P(y) = \dfrac{\lambda^{y} e^{-\lambda}}{y!}$, for $y = 0, 1, 2, 3, \ldots$
     - $\lambda$: both the mean and the variance
     - Always non-negative
     - Discrete (not continuous)
     - Lower bound at zero, but no upper bound
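
A minimal sketch of the probability mass function on this slide, using only the Python standard library. The helper name `poisson_pmf` and the value λ = 3 are illustrative assumptions, not part of the slides. The sketch checks numerically that the probabilities sum to 1 and that the mean and the variance both equal λ.

```python
import math

def poisson_pmf(y, lam):
    """P(y) = lam^y * exp(-lam) / y! for a non-negative integer y."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

lam = 3.0  # illustrative value of the parameter

# Truncate the infinite support at 60; the remaining tail mass is negligible for lam = 3
support = range(60)
probs = [poisson_pmf(y, lam) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"sum of probabilities: {total:.6f}")
print(f"mean: {mean:.6f}, variance: {variance:.6f}")
```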

  4. Understanding the parameter of the Poisson distribution
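
The original slide is a figure that did not survive extraction. As a rough stand-in, the sketch below simulates Poisson counts for a few values of λ and compares the sample mean and variance with λ. The chosen λ values, the sample size and the seed are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

summary = {}
for lam in (1, 5, 10):  # illustrative parameter values
    sample = rng.poisson(lam, size=100_000)
    summary[lam] = (sample.mean(), sample.var())
    print(f"lambda = {lam:>2}: sample mean = {sample.mean():.3f}, "
          f"sample variance = {sample.var():.3f}")
```

As λ grows, both the centre and the spread of the distribution grow with it, which is why a single parameter describes the whole shape.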

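  5. Visualizing the response

         import seaborn as sns

         # my_data is a placeholder DataFrame holding the count response y;
         # histplot is used because distplot is deprecated in recent seaborn releases
         sns.histplot(my_data['y'])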

  6. Poisson regression
     - Response variable: $y \sim \text{Poisson}(\lambda)$
     - Mean of the response: $E(y) = \lambda$
     - Poisson regression model: $\log(\lambda) = \beta_0 + \beta_1 x_1$

  7. Explanatory variables
     - Continuous and/or categorical → Poisson regression model
     - Categorical only → log-linear model

  8. GLM with Poisson in Python

         import statsmodels.api as sm
         from statsmodels.formula.api import glm

         # Specify a Poisson GLM; call .fit() on the result to estimate it
         glm('y ~ x', data = my_data, family = sm.families.Poisson())

  9. Let's practice!

  10. Interpreting model fit

  11. Parameter estimation
      - Maximum likelihood estimation (MLE)
      - Iteratively reweighted least squares (IRLS)
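
To make the IRLS idea concrete, here is a minimal NumPy sketch for a Poisson model with a log link. It is a textbook-style illustration under stated assumptions (the first column of `X` is an intercept, data are simulated, the function name `poisson_irls` is hypothetical), not a description of how statsmodels implements its fitting internally.

```python
import numpy as np

def poisson_irls(X, y, n_iter=25, tol=1e-8):
    """Fit a log-link Poisson regression by iteratively reweighted least squares.

    Assumes the first column of X is the intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta                      # linear predictor
        mu = np.exp(eta)                    # current fitted means
        z = eta + (y - mu) / mu             # working response
        XtW = X.T * mu                      # X' W, with weights W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Simulated data with assumed coefficients 0.5 and 0.8
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=5000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x))

beta_hat = poisson_irls(X, y)
score = X.T @ (y - np.exp(X @ beta_hat))    # gradient of the log-likelihood
print("estimates:", beta_hat)
print("score at the estimate:", score)
```

At convergence the score is numerically zero, which is exactly the condition that defines the maximum likelihood estimate; this is the link between the two bullet points on the slide.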

  12. The response function
      - Poisson regression model: $\log(\lambda) = \beta_0 + \beta_1 x_1$
      - The response function: $\lambda = \exp(\beta_0 + \beta_1 x_1)$
      - or, equivalently: $\lambda = \exp(\beta_0) \times \exp(\beta_1 x_1)$

  14. Interpretation of parameters
      - $\exp(\beta_0)$: the mean $\lambda$ when $x = 0$
      - $\exp(\beta_1)$: the multiplicative effect on the mean $\lambda$ for a 1-unit increase in $x$

  15. Interpreting coefficient effect
      - If $\beta_1 > 0$: $\exp(\beta_1) > 1$, so $\lambda$ becomes $\exp(\beta_1)$ times larger for each 1-unit increase in $x$.
      - If $\beta_1 < 0$: $\exp(\beta_1) < 1$, so $\lambda$ shrinks by the factor $\exp(\beta_1)$ for each 1-unit increase in $x$.
      - If $\beta_1 = 0$: $\exp(\beta_1) = 1$ and $\lambda = \exp(\beta_0)$; the multiplicative factor is 1, so $y$ and $x$ are not related.
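
The multiplicative reading can be checked with a few lines of plain Python. The coefficients below are the rounded estimates reported for the crab example in this deck and are used here purely as illustrative numbers; the helper `mean_count` is a hypothetical name.

```python
import math

# Rounded crab-example estimates, used only as illustrative values
b0, b1 = -0.4284, 0.5893

def mean_count(x):
    """Response function: lambda = exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)

# The ratio of means one unit apart is the same wherever x starts
ratios = [mean_count(x + 1) / mean_count(x) for x in (0.0, 1.5, 3.0)]

print("mean at x = 0:", mean_count(0.0), "= exp(b0) =", math.exp(b0))
print("ratios for a 1-unit increase:", ratios)
print("exp(b1):", math.exp(b1))
```

Because the ratio does not depend on the starting value of x, exp(β1) is a single summary of the effect across the whole range of the predictor.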

  16. Example

          model = glm('sat ~ weight', data = crab, family = sm.families.Poisson()).fit()
          print(model.summary())

      Generalized Linear Model Regression Results (print cut)

          =============================================================================
                         coef    std err          z      P>|z|      [0.025      0.975]
          -----------------------------------------------------------------------------
          Intercept   -0.4284      0.179     -2.394      0.017      -0.779      -0.078
          weight       0.5893      0.065      9.064      0.000       0.462       0.717
          =============================================================================

  17. Example: interpretation of beta
      - Extract the model coefficients:

            model.params

            Intercept   -0.428405
            weight       0.589304

      - Compute the effect:

            np.exp(0.589304)

            1.803

      - A 1-unit increase in weight multiplies the mean number of satellites by about 1.8.

  18. Confidence interval for $\beta_1$
      - On the coefficient scale:

            print(model.conf_int())

                              0         1
            Intercept -0.779112 -0.077699
            weight     0.461873  0.716735

      - The multiplicative effect on the mean:

            print(np.exp(model.conf_int()))

                              0         1
            Intercept  0.458813  0.925243
            weight     1.587044  2.047737
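
As a rough cross-check of the intervals above, a 95% Wald interval can be rebuilt by hand from the rounded coefficient and standard error shown in the regression table. This sketch uses only the standard library; because the inputs are rounded, the results agree with the printed values only to about three decimals.

```python
import math
from statistics import NormalDist

# Rounded values for `weight` from the regression table in this deck
coef, std_err = 0.5893, 0.065

z = NormalDist().inv_cdf(0.975)      # two-sided 95% critical value, about 1.96

lower = coef - z * std_err           # interval on the coefficient (log) scale
upper = coef + z * std_err

effect = math.exp(coef)              # multiplicative effect on the mean
effect_ci = (math.exp(lower), math.exp(upper))

print(f"beta_1 interval: ({lower:.4f}, {upper:.4f})")
print(f"exp(beta_1) = {effect:.3f}, interval: ({effect_ci[0]:.3f}, {effect_ci[1]:.3f})")
```

The interval for the multiplicative effect is obtained by exponentiating the end points, not by recomputing a standard error on the exponentiated scale.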

  19. Let's practice!

  20. The Problem of Overdispersion

  21. Understanding the data

          # mean of y
          y_mean = crab['sat'].mean()      # 2.919

          # variance of y
          y_variance = crab['sat'].var()   # 9.912

  22. Mean not equal to variance
      - variance > mean → overdispersion
      - variance < mean → underdispersion
      - Consequences of overdispersion: standard errors are underestimated, so p-values come out too small.

  23. How to check for overdispersion?

  24. Compute estimated overdispersion

          # crab_fit is the fitted Poisson GLM
          ratio = crab_fit.pearson_chi2 / crab_fit.df_resid
          print(ratio)

          3.134

      - Ratio close to 1 → approximately Poisson
      - Ratio < 1 → underdispersion
      - Ratio > 1 → overdispersion
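
For intuition about what the ratio measures, the sketch below computes it by hand: the Pearson chi-square statistic is the sum of squared differences between observed and fitted counts, each divided by the fitted mean, and the residual degrees of freedom are the number of observations minus the number of estimated parameters. The function name and the small data set are hypothetical. For an intercept-only model (an assumption made here to keep the example free of any fitting code) the fitted mean is the sample mean, so the ratio reduces to the sample variance divided by the sample mean.

```python
import statistics

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical counts, not taken from the crab data
y = [0, 0, 1, 2, 3, 5, 8, 0, 4, 7]

# Intercept-only Poisson model: every fitted value equals the sample mean
y_bar = statistics.mean(y)
mu = [y_bar] * len(y)

ratio = pearson_dispersion(y, mu, n_params=1)
print(f"dispersion ratio: {ratio:.3f}")
print(f"variance / mean:  {statistics.variance(y) / y_bar:.3f}")
```

A ratio well above 1, as here, points to more variability than the Poisson assumption allows.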

  25. Negative Binomial Regression
      - $E(y) = \lambda$
      - $Var(y) = \lambda + \alpha \lambda^2$
      - $\alpha$: dispersion parameter
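
A small worked example of the variance formula, using the sample mean and variance of `sat` reported earlier in this deck (2.919 and 9.912). The method-of-moments estimate of α shown here is only a rough back-of-the-envelope device and an assumption of this sketch; the slides do not estimate α this way.

```python
def nb_variance(lam, alpha):
    """Negative Binomial variance: Var(y) = lam + alpha * lam^2."""
    return lam + alpha * lam ** 2

mean_y, var_y = 2.919, 9.912   # sample mean and variance of sat from the slides

# With alpha = 0 the formula falls back to the Poisson case Var(y) = lam
poisson_var = nb_variance(mean_y, alpha=0.0)

# Solve var_y = mean_y + alpha * mean_y^2 for alpha (method of moments)
alpha_mom = (var_y - mean_y) / mean_y ** 2

print(f"variance implied by Poisson: {poisson_var:.3f}")
print(f"variance with alpha = 1:     {nb_variance(mean_y, 1.0):.3f}")
print(f"method-of-moments alpha:     {alpha_mom:.3f}")
```

The extra term α·λ² is what lets the model accommodate a variance larger than the mean.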

  26. GLM negative Binomial in Python

          import statsmodels.api as sm
          from statsmodels.formula.api import glm

          # alpha is the dispersion parameter, fixed here at 1
          model = glm('y ~ x', data = my_data,
                      family = sm.families.NegativeBinomial(alpha = 1)).fit()

  27. Let's practice!

  28. Plotting a regression model

  29. Import libraries

          import seaborn as sns
          import matplotlib.pyplot as plt

      The crab model 'sat ~ width' is saved as model.

  30. Plot data points

          # Adjust figure size
          plt.subplots(figsize = (8, 5))

          # Plot data points (x and y are passed by keyword, as recent seaborn versions require)
          sns.regplot(x = 'width', y = 'sat', data = crab, fit_reg = False)

  31. Add jitter

          # Jitter the counts vertically so overlapping points become visible
          sns.regplot(x = 'width', y = 'sat', data = crab, fit_reg = False, y_jitter = 0.3)

  32. Add linear fit

          # Overlay an ordinary linear regression line for comparison
          sns.regplot(x = 'width', y = 'sat', data = crab, y_jitter = 0.3, fit_reg = True,
                      line_kws = {'color':'green', 'label':'LM fit'})

  33. Add Poisson GLM estimated values

          # Store the fitted means from the Poisson GLM and plot them against width
          crab['fit_values'] = model.fittedvalues
          sns.scatterplot(x = 'width', y = 'fit_values', data = crab,
                          color = 'red', label = 'Poisson')

  37. Predictions

          import pandas as pd

          # Predicted mean number of satellites for three new widths
          new_data = pd.DataFrame({'width':[24, 28, 32]})
          model.predict(new_data)

          0    1.881981
          1    3.627360
          2    6.991433
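
The three predictions above can be used to illustrate the log-linear structure of the model. The fitted coefficients of 'sat ~ width' are not shown in the slides, so this sketch backs the slope out of the printed predictions instead; the variable names are hypothetical. Since the widths are equally spaced, consecutive predictions should differ by a constant factor.

```python
import math

widths = [24, 28, 32]
preds = [1.881981, 3.627360, 6.991433]   # predicted means printed on the slide

# Equal steps in width should multiply the predicted mean by a constant factor
ratio_1 = preds[1] / preds[0]
ratio_2 = preds[2] / preds[1]

# That factor equals exp(beta_1 * step), so the slope can be recovered from it
step = widths[1] - widths[0]
beta_1 = math.log(ratio_1) / step

print(f"ratio 24 -> 28: {ratio_1:.4f}")
print(f"ratio 28 -> 32: {ratio_2:.4f}")
print(f"implied slope per unit of width: {beta_1:.4f}")
print(f"implied multiplicative effect per unit of width: {math.exp(beta_1):.4f}")
```

The two ratios agree, which is what the response function λ = exp(β0 + β1 x) implies: predictions grow geometrically, not linearly, in the predictor.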

  38. Let's practice!
