Binary data and logistic regression

  1. Binary data and logistic regression. GENERALIZED LINEAR MODELS IN PYTHON. Ita Cirovic Donev, Data Science Consultant

  2. Binary response data. Two-class response → 0/1. Examples: credit scoring → "Default"/"Non-Default"; passing a test → "Pass"/"Fail"; fraud detection → "Fraud"/"No-Fraud"; choice of a product → "Product ABC"/"Product XYZ".

  3. Binary data. UNGROUPED: a single event (flip one coin) with two possible outcomes, 0/1: Bernoulli(p), or equivalently Binomial(n = 1, p). GROUPED: multiple events (flip multiple coins), counting the number of successes in a given number n of trials: Binomial(n, p).
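
The ungrouped/grouped distinction can be simulated with NumPy. This is a minimal sketch: the success probability p = 0.3, the 10 flips per group, and the random seed are illustrative assumptions, not values from the slides. It illustrates that summing n Bernoulli(p) outcomes yields one Binomial(n, p) count.

```python
import numpy as np

# Fixed seed so the simulation is reproducible
rng = np.random.default_rng(seed=0)
p = 0.3          # assumed success probability (illustrative)
n_trials = 10    # assumed number of coin flips per group (illustrative)

# Ungrouped: each entry is a single 0/1 outcome, i.e. Bernoulli(p) = Binomial(n=1, p)
ungrouped = rng.binomial(n=1, p=p, size=n_trials)

# Grouped: collapse the flips into one count of successes out of n_trials
grouped_from_flips = ungrouped.sum()

# Drawing grouped counts directly from Binomial(n, p): five groups of n_trials flips
grouped_direct = rng.binomial(n=n_trials, p=p, size=5)

print(ungrouped)           # ten 0/1 outcomes
print(grouped_from_flips)  # number of successes among those ten flips
print(grouped_direct)      # five counts, each between 0 and n_trials
```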

  4. Logistic function

  5. Logistic function. Test outcome: PASS = 1 or FAIL = 0. We want to model $P(y = 1) = \beta_0 + \beta_1 x_1$, e.g. $P(\text{Pass}) = \beta_0 + \beta_1 \times \text{Hours of study}$.

  6. Logistic function (continued). A linear model for $P(y = 1)$ can produce values outside [0, 1], so use the logistic function $f(z) = \frac{1}{1 + \exp(-z)}$, which maps any real $z$ into (0, 1).
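
A short NumPy sketch of the logistic function from slide 6. The input values are arbitrary and only illustrate that the outputs stay between 0 and 1, increase with z, and equal 0.5 at z = 0.

```python
import numpy as np

def logistic(z):
    """Logistic function f(z) = 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

# Hypothetical linear-predictor values beta_0 + beta_1 * x
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(logistic(z))  # increasing values between 0 and 1, with f(0) = 0.5
```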

  7. Odds and odds ratio. $\text{odds} = \frac{\text{event occurring}}{\text{event NOT occurring}}$; $\text{odds ratio} = \frac{\text{odds}_1}{\text{odds}_2}$.

  8. Odds example: in 4 games the event occurs 3 times and does not occur once, so the odds are 3 to 1.

  9. Odds and probabilities. Odds ≠ probability: $\text{odds} = \frac{\text{probability}}{1 - \text{probability}}$ and $\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$.
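
The two conversions on slide 9, plus the odds ratio from slide 7, as plain Python helpers. This sketch assumes the slide 8 example means the event occurs in 3 of the 4 games (p = 3/4); the second probability used for the odds ratio (0.5) is a hypothetical value.

```python
def prob_to_odds(p):
    """odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)

def odds_to_prob(odds):
    """probability = odds / (1 + odds), defined for odds >= 0."""
    return odds / (1 + odds)

def odds_ratio(p1, p2):
    """Odds ratio odds_1 / odds_2 for two probabilities."""
    return prob_to_odds(p1) / prob_to_odds(p2)

# Event occurs in 3 of 4 games
p_event = 3 / 4
print(prob_to_odds(p_event))     # 3.0, i.e. odds of 3 to 1
print(odds_to_prob(3.0))         # 0.75
print(odds_ratio(p_event, 0.5))  # 3.0: odds of 3 versus odds of 1
```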

  10. From probability model to logistic regression.
      Step 1. Probability model: $E(y) = \mu = P(y = 1) = \beta_0 + \beta_1 x_1$.
      Step 2. Logistic function: $f(z) = \frac{1}{1 + \exp(-z)}$.
      Step 3. Apply the logistic function → INVERSE-LOGIT: $\mu = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x_1))} = \frac{\exp(\beta_0 + \beta_1 x_1)}{1 + \exp(\beta_0 + \beta_1 x_1)}$, so $1 - \mu = \frac{1}{1 + \exp(\beta_0 + \beta_1 x_1)}$.

  11. From probability model to logistic regression (continued). Probability → odds: $\text{odds} = \frac{\mu}{1 - \mu} = \exp(\beta_0 + \beta_1 x_1)$. Log transformation → LOGISTIC REGRESSION: $\operatorname{logit}(\mu) = \log\left(\frac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 x_1$.
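
The derivation on slides 10 and 11 can be checked numerically. This sketch uses hypothetical coefficients (beta_0 = -1.0, beta_1 = 0.5) and predictor values chosen only for illustration; it confirms that the two forms of the inverse logit agree, that the odds equal exp of the linear predictor, and that the logit undoes the inverse logit.

```python
import numpy as np

def inv_logit(eta):
    """Inverse logit: mu = 1 / (1 + exp(-eta)) = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + np.exp(-eta))

def logit(mu):
    """Logit: log(mu / (1 - mu)), the log odds."""
    return np.log(mu / (1.0 - mu))

# Hypothetical coefficients and predictor values, for illustration only
beta_0, beta_1 = -1.0, 0.5
x_1 = np.array([0.0, 1.0, 2.0, 4.0])

eta = beta_0 + beta_1 * x_1   # linear predictor
mu = inv_logit(eta)           # Step 3: probability P(y = 1)
odds = mu / (1.0 - mu)        # probability -> odds

print(np.allclose(mu, np.exp(eta) / (1 + np.exp(eta))))  # True: both inverse-logit forms agree
print(np.allclose(odds, np.exp(eta)))                    # True: odds = exp(beta_0 + beta_1 * x_1)
print(np.allclose(logit(mu), eta))                       # True: logit(mu) recovers the linear predictor
```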

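  12. Logistic regression in Python. Function glm() from statsmodels.formula.api:
      import statsmodels.api as sm
      from statsmodels.formula.api import glm
      model_GLM = glm(formula='y ~ x', data=my_data, family=sm.families.Binomial()).fit()
      Input: the response y can be coded as 0/1, e.g. y = [0, 1, 1, 0, ...], or as two-level labels, e.g. y = ['No', 'Yes', 'Yes', ...] or y = ['Fail', 'Pass', 'Pass', ...].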
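
When the response arrives as labels, one option is to recode it to 0/1 yourself before fitting, so it is explicit which level counts as the event (y = 1) instead of relying on how the formula interface orders string levels. This is a minimal NumPy sketch; the label array is hypothetical and "Pass" is assumed to be the event of interest.

```python
import numpy as np

# Hypothetical response labels, as in the 'Fail'/'Pass' example
y_labels = np.array(['Fail', 'Pass', 'Pass', 'Fail', 'Pass'])

# Code the event of interest ("Pass") as 1 and the other level as 0
y = (y_labels == 'Pass').astype(int)
print(y)  # [0 1 1 0 1]
```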

  13. Let's practice!

  14. Interpreting coefficients

  15. Model coefficients

  16. Coefficient beta: $\beta_1 > 0$ → ascending curve; $\beta_1 < 0$ → descending curve.
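
The effect of the sign of the slope coefficient can be seen by evaluating the logistic curve on a grid. The coefficient values (beta_0 = 0, beta_1 = ±1.2) are hypothetical and chosen only to show the direction of the curve.

```python
import numpy as np

def fitted_prob(x, beta_0, beta_1):
    """Logistic curve mu(x) = 1 / (1 + exp(-(beta_0 + beta_1 * x)))."""
    return 1.0 / (1.0 + np.exp(-(beta_0 + beta_1 * x)))

x = np.linspace(-5, 5, 11)
up = fitted_prob(x, beta_0=0.0, beta_1=1.2)     # beta_1 > 0 (hypothetical value)
down = fitted_prob(x, beta_0=0.0, beta_1=-1.2)  # beta_1 < 0 (hypothetical value)

print(np.all(np.diff(up) > 0))    # True: ascending curve
print(np.all(np.diff(down) < 0))  # True: descending curve
```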

  17. Linear vs logistic.
      LINEAR MODEL: glm('y ~ weight', data=crab, family=sm.families.Gaussian()).fit() gives $\mu = -0.14 + 0.32 \times \text{weight}$. For every one-unit increase in weight, the estimated probability increases by 0.32.
      LOGIT MODEL: glm('y ~ weight', data=crab, family=sm.families.Binomial()).fit() gives $\log(\text{odds}) = -3.69 + 1.8 \times \text{weight}$. For every one-unit increase in weight, the log odds increase by 1.8.

  18. Log odds interpretation. Logistic model: $\log\left(\frac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 x_1$. Increase $x_1$ by one unit: $\log\left(\frac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 (x_1 + 1)$.

  19. Log odds interpretation (continued). Increasing $x_1$ by one unit gives $\log\left(\frac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 (x_1 + 1) = \beta_0 + \beta_1 x_1 + \beta_1$. Taking the exponential: $\frac{\mu}{1 - \mu} = \exp(\beta_0 + \beta_1 x_1)\exp(\beta_1)$. Conclusion → the odds are multiplied by $\exp(\beta_1)$.

  20. Log odds interpretation. Crab model y ~ weight: $\log\left(\frac{\mu}{1 - \mu}\right) = -3.6947 + 1.8151 \times \text{weight}$. The odds of a crab having a satellite multiply by $\exp(1.8151) = 6.14$ for a unit increase in weight.

  21. Log odds interpretation (continued). The intercept coefficient of −3.6947 is the baseline log odds: $\exp(-3.6947) \approx 0.0249$ are the odds when weight = 0.
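
The multiplicative interpretation on slides 19 to 21 can be checked with the crab coefficients reported on the slides. This sketch recomputes the odds from the fitted equation at a few arbitrary starting weights and shows that a one-unit step always multiplies them by exp(1.8151).

```python
import numpy as np

# Crab model coefficients as reported on the slides
beta_0, beta_1 = -3.6947, 1.8151

def odds(weight):
    """Odds mu / (1 - mu) = exp(beta_0 + beta_1 * weight) under the fitted model."""
    return np.exp(beta_0 + beta_1 * weight)

# The ratio of odds one unit of weight apart does not depend on the starting weight
for w in [1.5, 2.0, 3.0]:
    print(round(odds(w + 1) / odds(w), 2))  # 6.14 each time

print(round(np.exp(beta_1), 2))  # 6.14: multiplicative change in odds per unit of weight
print(round(np.exp(beta_0), 4))  # 0.0249: baseline odds at weight = 0
```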

  22. Probability vs logistic fit

  23. Probability vs logistic fit

  24. Probability vs logistic fit: the slope of the fitted probability curve at a point is $\beta_1 \mu (1 - \mu)$.

  25. Probability vs logistic fit (continued): slope → $\beta_1 \mu (1 - \mu)$.

  26. Compute change in estimated probability
      import numpy as np
      # Choose x (weight) and extract model coefficients
      x = 1.5
      intercept, slope = model_GLM.params
      # Compute estimated probability
      est_prob = np.exp(intercept + slope * x) / (1 + np.exp(intercept + slope * x))  # 0.2744
      # Compute incremental change in estimated probability given x
      ic_prob = slope * est_prob * (1 - est_prob)  # 0.3614

  27. Rate of change in probability for every x: $\operatorname{logit}(\mu) = -3.6947 + 1.8151 \times \text{weight}$.
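
The rate of change $\beta_1 \mu (1 - \mu)$ can be evaluated for every weight on a grid. This sketch uses the slide coefficients; the grid from 1.0 to 3.5 is an arbitrary illustrative range. It checks the analytic slope against a central finite difference and shows that the slope is largest, at $\beta_1 / 4$, where $\mu = 0.5$, i.e. at weight $= -\beta_0 / \beta_1$.

```python
import numpy as np

beta_0, beta_1 = -3.6947, 1.8151   # crab model coefficients from the slides

def mu(weight):
    """Estimated probability from logit(mu) = beta_0 + beta_1 * weight."""
    return 1.0 / (1.0 + np.exp(-(beta_0 + beta_1 * weight)))

def slope(weight):
    """Rate of change of the estimated probability: beta_1 * mu * (1 - mu)."""
    m = mu(weight)
    return beta_1 * m * (1.0 - m)

weights = np.linspace(1.0, 3.5, 251)   # illustrative grid of weights
rates = slope(weights)

# Central finite difference versus the analytic slope at weight = 1.5
h = 1e-6
fd = (mu(1.5 + h) - mu(1.5 - h)) / (2 * h)
print(round(slope(1.5), 4), round(fd, 4))

# The slope peaks where mu = 0.5, i.e. weight = -beta_0 / beta_1, with value beta_1 / 4
print(round(-beta_0 / beta_1, 3), round(beta_1 / 4, 4))
print(round(weights[rates.argmax()], 2), round(rates.max(), 4))
```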

  28. Let's practice!

  29. Interpreting model inference

  30. Estimation of beta coefficient. Maximum likelihood estimation (MLE): the estimated coefficient $\hat{\beta}$ is the value at which the log-likelihood takes on its maximum value.

  31. Estimation of beta coefficient: iteratively reweighted least squares (IRLS).
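
A bare-bones NumPy sketch of IRLS for the logit link, written for illustration and not taken from statsmodels' implementation. Each iteration solves a weighted least-squares problem with weights $\mu(1 - \mu)$ and a "working response"; at convergence the score $X^T(y - \mu)$ is zero, which is the maximum-likelihood condition from slide 30. The inverse of $X^T W X$ at the solution is the usual estimate of the variance-covariance matrix of $\hat{\beta}$. The simulated data and the true coefficients (-1.0, 2.0) are assumptions for the demo.

```python
import numpy as np

def logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Fit logistic regression by iteratively reweighted least squares (IRLS).

    X: (n, p) design matrix including a column of ones for the intercept.
    y: (n,) array of 0/1 responses.
    Returns the coefficient vector and its estimated covariance matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                    # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))   # current fitted probabilities
        w = mu * (1.0 - mu)               # IRLS weights (Bernoulli variance)
        z = eta + (y - mu) / w            # working response
        # Weighted least-squares step: solve (X^T W X) beta = X^T W z
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Covariance of beta-hat: inverse of X^T W X evaluated at the final estimate
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv((X.T * w) @ X)
    return beta, cov

# Simulated data with assumed true coefficients (-1.0, 2.0), for illustration only
rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, size=500)
X = np.column_stack([np.ones_like(x), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x))))

beta_hat, cov_hat = logistic_irls(X, y)
print(beta_hat)                   # estimated intercept and slope
print(np.sqrt(np.diag(cov_hat)))  # their standard errors
```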

  32. Significance testing

  33. Standard error (SE). Flatter peak of the log-likelihood → location of the maximum harder to define → larger SE. Sharper peak → location of the maximum more clearly defined → smaller SE.

  34. Computation of the standard error
      import numpy as np
      # Extract variance-covariance matrix
      print(model_GLM.cov_params())
                 Intercept    weight
      Intercept   0.774762 -0.325087
      weight     -0.325087  0.141903
      # Compute standard error for weight
      std_error = np.sqrt(0.141903)  # 0.3767
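
Rather than taking the square root of one entry by hand, all standard errors can be read off the diagonal of the variance-covariance matrix at once. This sketch copies the matrix printed on slide 34 into a NumPy array.

```python
import numpy as np

# Variance-covariance matrix as printed on the slide (rows/columns: Intercept, weight)
cov = np.array([[0.774762, -0.325087],
                [-0.325087, 0.141903]])

# Standard errors are the square roots of the diagonal entries (the variances)
std_errors = np.sqrt(np.diag(cov))
print(std_errors.round(4))  # [0.8802 0.3767]
```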

  35. Significance testing: z-statistic $z = \hat{\beta} / SE$. A large $|z|$ ⇒ coefficient ≠ 0 ⇒ variable significant. Rule of thumb: cut-off value of 2. Example, horseshoe crab model y ~ weight: $z = 1.8151 / 0.3767 \approx 4.82$, so weight is significant.
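
The z-statistic and a matching two-sided p-value can be computed with the standard library alone. This sketch assumes the usual large-sample standard normal reference distribution for the Wald z-statistic; the coefficient and standard error are the slide values for weight.

```python
import math

beta_hat, se = 1.8151, 0.3767   # weight coefficient and its standard error (slides)

z = beta_hat / se
# Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2))      # 4.82
print(abs(z) > 2)       # True: passes the rule-of-thumb cut-off of 2
print(p_value < 0.001)  # True
```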

  36. Confidence intervals for beta. Uncertainty of the estimates: the 95% confidence interval for $\beta$ is [lower, upper] = $[\hat{\beta} - 1.96 \times SE, \hat{\beta} + 1.96 \times SE]$.

  37. Computing confidence intervals. Example: horseshoe crab model
                    coef   std err
      ----------------------------
      Intercept  -3.6947     0.880
      weight      1.8151     0.377
      $[1.8151 - 1.96 \times 0.377, 1.8151 + 1.96 \times 0.377] = [1.07618, 2.55402]$
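
The hand computation on slide 37 extends to every coefficient with array arithmetic. This sketch uses the rounded coefficients and standard errors from the summary table, so the bounds differ slightly from what conf_int() prints on the next slides; the gap most likely reflects that rounding.

```python
import numpy as np

coef = np.array([-3.6947, 1.8151])   # Intercept, weight (from the summary table)
se = np.array([0.880, 0.377])        # rounded standard errors from the same table

# Wald 95% interval: estimate +/- 1.96 * SE
lower = coef - 1.96 * se
upper = coef + 1.96 * se
for name, lo, hi in zip(["Intercept", "weight"], lower, upper):
    print(f"{name}: [{lo:.5f}, {hi:.5f}]")
```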

  38. Extract confidence intervals
      print(model_GLM.conf_int())
                        0         1
      Intercept -5.419897 -1.969555
      weight     1.076826  2.553463

  39. Extract confidence intervals: column 0 holds the lower bound.

  40. Extract confidence intervals: column 1 holds the upper bound.

  41. Confidence intervals for odds. 1. Extract confidence intervals for $\beta$. 2. Exponentiate the endpoints.
      print(np.exp(model_GLM.conf_int()))
                       0          1
      Intercept 0.004428   0.139519
      weight    2.935348  12.851533
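
Because exp is increasing, exponentiating both endpoints of an interval for $\beta$ gives an interval for the odds multiplier $\exp(\beta)$. This sketch copies the conf_int() values printed on slide 38 into an array instead of calling statsmodels.

```python
import numpy as np

# 95% confidence intervals for beta as printed by conf_int() on the slides
ci_beta = np.array([[-5.419897, -1.969555],   # Intercept
                    [ 1.076826,  2.553463]])  # weight

# Exponentiate both endpoints to get intervals on the odds scale
ci_odds = np.exp(ci_beta)
print(ci_odds.round(4))
```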

  42. Let's practice!

  43. Computing and describing predictions

  44. Computing predictions. After obtaining the model fit: 1. fitted values for the original x values.

  45. Computing predictions (continued): 2. predicted values for new values of x.

  46. Computing predictions. Horseshoe crab model y ~ weight: $\mu = \frac{\exp(-3.6947 + 1.8151 \times \text{weight})}{1 + \exp(-3.6947 + 1.8151 \times \text{weight})}$. New measurement, weight = 2.85: $\mu = \frac{\exp(-3.6947 + 1.8151 \times 2.85)}{1 + \exp(-3.6947 + 1.8151 \times 2.85)} = 0.814$.
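
The prediction on slide 46 is a direct inverse-logit calculation, reproduced here with the slide coefficients. A fitted model's predict() method returns this kind of probability; this sketch only does the arithmetic by hand.

```python
import numpy as np

beta_0, beta_1 = -3.6947, 1.8151   # crab model coefficients from the slides
weight_new = 2.85                  # new measurement

eta = beta_0 + beta_1 * weight_new       # linear predictor (log odds)
mu = np.exp(eta) / (1.0 + np.exp(eta))   # inverse logit -> estimated probability
print(round(mu, 3))                      # 0.814
```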

  47. Predictions in Python. Compute model predictions for the dataset new_data:
      # Compute model predictions
      model_GLM.predict(exog=new_data)

  48. From probabilities to classes
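
One common way to turn predicted probabilities into classes is to compare them with a cut-off. This sketch assumes a cut-off of 0.5, which the slides do not state; other cut-offs can be chosen depending on the relative cost of the two kinds of error. The probabilities are hypothetical stand-ins for the output of model_GLM.predict().

```python
import numpy as np

# Hypothetical predicted probabilities, standing in for model_GLM.predict(...) output
probs = np.array([0.12, 0.48, 0.50, 0.81, 0.93])

# Assumed cut-off of 0.5: predict class 1 when the probability reaches the cut-off
cutoff = 0.5
classes = (probs >= cutoff).astype(int)
print(classes)  # [0 0 1 1 1]
```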
