Binary data and logistic regression: Generalized Linear Models in Python



slide-1
SLIDE 1

Binary data and logistic regression

GENERALIZED LINEAR MODELS IN PYTHON

Ita Cirovic Donev

Data Science Consultant

slide-2
SLIDE 2

GENERALIZED LINEAR MODELS IN PYTHON

Binary response data

Two-class response → 0, 1

Examples:

  • Credit scoring → "Default"/"Non-Default"
  • Passing a test → "Pass"/"Fail"
  • Fraud detection → "Fraud"/"No-Fraud"
  • Choice of a product → "Product ABC"/"Product XYZ"

slide-3
SLIDE 3

Binary data

UNGROUPED

  • Single event
  • Flip one coin
  • Two possible outcomes: 0/1
  • Bernoulli(p) or Binomial(n = 1, p)

GROUPED

  • Multiple events
  • Flip multiple coins
  • Number of successes in a given number n of trials
  • Binomial(n, p)
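
The difference between ungrouped and grouped binary data can be sketched with simulated coin flips. This is an illustrative sketch rather than part of the original slides; the success probability `p`, the number of flips `n_flips` and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
p = 0.5                              # assumed probability of success (a fair coin)
n_flips = 10                         # assumed number of trials

# Ungrouped: each entry is one Bernoulli(p) trial recorded as 0 or 1
ungrouped = rng.binomial(n=1, p=p, size=n_flips)

# Grouped: the same trials summarised as the number of successes in n trials
grouped_successes = int(ungrouped.sum())

# A single Binomial(n, p) draw produces a grouped count directly
binomial_draw = int(rng.binomial(n=n_flips, p=p))

print(ungrouped, grouped_successes, binomial_draw)
```

Summing the ungrouped 0/1 records gives a count of the same kind as a grouped Binomial(n, p) observation.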

slide-4
SLIDE 4

Logistic function

slide-6
SLIDE 6

Logistic function

Test outcome: PASS = 1 or FAIL = 0

Want to model

$P(y = 1) = \beta_0 + \beta_1 x_1$

$P(\text{Pass}) = \beta_0 + \beta_1 \times \text{Hours of study}$

Use logistic function

$f(z) = \dfrac{1}{1 + \exp(-z)}$
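
A minimal sketch of why the logistic function is applied to the linear predictor: a straight line can leave the interval [0, 1], while f(z) maps it back inside. The coefficients `beta_0` and `beta_1` and the study hours below are hypothetical values chosen only for illustration.

```python
import math

def logistic(z):
    """Logistic function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration
beta_0, beta_1 = -0.5, 0.2

for hours in [0, 5, 10]:
    linear = beta_0 + beta_1 * hours   # straight line: can fall outside [0, 1]
    squashed = logistic(linear)        # mapped into the interval (0, 1)
    print(hours, round(linear, 2), round(squashed, 3))
```

With these assumed values the linear predictor exceeds 1 at 10 hours, which is not a valid probability, whereas the logistic transform of the same value is.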

slide-7
SLIDE 7

Odds and odds ratio

$\text{ODDS} = \dfrac{\text{event occurring}}{\text{event NOT occurring}}$

$\text{ODDS RATIO} = \dfrac{\text{odds}_1}{\text{odds}_2}$
slide-8
SLIDE 8

Odds example

4 games

Odds are 3 to 1

slide-9
SLIDE 9

Odds and probabilities

odds ≠ probability

$\text{odds} = \dfrac{\text{probability}}{1 - \text{probability}}$

$\text{probability} = \dfrac{\text{odds}}{1 + \text{odds}}$
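
The two conversions between odds and probability can be sketched as follows. The helper names are illustrative, and the example assumes that odds of 3 to 1 correspond to 3 favourable outcomes in 4 games; the second probability in the odds ratio is hypothetical.

```python
def prob_to_odds(p):
    """odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """probability = odds / (1 + odds)."""
    return odds / (1.0 + odds)

# Assumption: odds of 3 to 1 correspond to 3 favourable outcomes in 4 games
p_win = 3 / 4
odds_win = prob_to_odds(p_win)

# Odds ratio of two events: odds1 / odds2 (the second probability is hypothetical)
odds_ratio = prob_to_odds(0.75) / prob_to_odds(0.5)

print(odds_win, odds_to_prob(odds_win), odds_ratio)
```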
slide-10
SLIDE 10

From probability model to logistic regression

Step 1. Probability model

$E(y) = \mu = P(y = 1) = \beta_0 + \beta_1 x_1$

Step 2. Logistic function

$f(z) = \dfrac{1}{1 + \exp(-z)}$

Step 3. Apply logistic function → INVERSE-LOGIT

$\mu = \dfrac{1}{1 + \exp(-(\beta_0 + \beta_1 x_1))} = \dfrac{\exp(\beta_0 + \beta_1 x_1)}{1 + \exp(\beta_0 + \beta_1 x_1)}$

$1 - \mu = \dfrac{1}{1 + \exp(\beta_0 + \beta_1 x_1)}$

slide-11
SLIDE 11

From probability model to logistic regression

Probability → odds

$\text{ODDS} = \dfrac{\mu}{1 - \mu} = \exp(\beta_0 + \beta_1 x_1)$

Log transformation → LOGISTIC REGRESSION

$\text{LOGIT}(\mu) = \log\left(\dfrac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 x_1$
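
The chain from linear predictor to probability, odds and back to log odds can be checked numerically. This is a sketch under assumed values: `beta_0`, `beta_1` and `x_1` are hypothetical and serve only to show that the odds equal the exponentiated linear predictor and that the logit recovers it.

```python
import math

def inverse_logit(z):
    """mu = exp(z) / (1 + exp(z)), written in the equivalent form 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(mu):
    """Log odds: log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

beta_0, beta_1 = -1.0, 0.5   # hypothetical coefficients
x_1 = 3.0                    # hypothetical predictor value

z = beta_0 + beta_1 * x_1    # linear predictor
mu = inverse_logit(z)        # probability via the inverse logit
odds = mu / (1.0 - mu)       # probability -> odds

print(mu, odds, math.exp(z), logit(mu))
```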

slide-12
SLIDE 12

Logistic regression in Python

Function - glm()

import statsmodels.api as sm
from statsmodels.formula.api import glm

model_GLM = glm(formula = 'y ~ x', data = my_data, family = sm.families.Binomial()).fit()

Input

y = [0,1,1,0,...]
y = ['No','Yes','Yes',...]
y = ['Fail','Pass','Pass',...]

slide-14
SLIDE 14

Interpreting coefficients

slide-15
SLIDE 15

Model coefficients

slide-16
SLIDE 16

Coefficient beta

  • β > 0 → ascending curve
  • β < 0 → descending curve

slide-17
SLIDE 17

Linear vs logistic

LINEAR MODEL

glm('y ~ weight', data = crab, family = sm.families.Gaussian())

μ = −0.14 + 0.32 ∗ weight

For every one-unit increase in weight, the estimated probability increases by 0.32.

LOGIT MODEL

glm('y ~ weight', data = crab, family = sm.families.Binomial())

log(odds) = −3.69 + 1.8 ∗ weight

For every one-unit increase in weight, the log(odds) increase by 1.8.
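
To see how the two fits behave differently, the fitted equations quoted above can be evaluated side by side. This sketch only plugs in the rounded coefficients from the slide; the weights in the loop are picked for illustration and are not taken from the crab data.

```python
import math

def linear_fit(weight):
    # Gaussian-family fit from the slide: mu = -0.14 + 0.32 * weight
    return -0.14 + 0.32 * weight

def logit_fit(weight):
    # Binomial-family fit from the slide: log(odds) = -3.69 + 1.8 * weight
    log_odds = -3.69 + 1.8 * weight
    return 1.0 / (1.0 + math.exp(-log_odds))

for weight in [1.0, 2.0, 3.0, 4.0]:   # weights picked for illustration
    print(weight, round(linear_fit(weight), 3), round(logit_fit(weight), 3))
```

The linear fit adds the same 0.32 per unit of weight and eventually exceeds 1, while the logit fit adds a constant amount on the log-odds scale and keeps the probability inside (0, 1).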

slide-19
SLIDE 19

Log odds interpretation

Logistic model

$\log\left(\dfrac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 x_1$

Increase x by one unit

$\log\left(\dfrac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 (x_1 + 1) = \beta_0 + \beta_1 x_1 + \beta_1$

Take the exponential

$\dfrac{\mu}{1 - \mu} = \exp(\beta_0 + \beta_1 x_1)\exp(\beta_1)$

Conclusion → the odds are multiplied by $\exp(\beta_1)$

slide-21
SLIDE 21

Log odds interpretation

Crab model y ~ weight

$\log\left(\dfrac{\mu}{1 - \mu}\right) = -3.6947 + 1.8151 \times \text{weight}$

  • The odds of a satellite crab multiply by exp(1.8151) ≈ 6.14 for a one-unit increase in weight
  • The intercept coefficient of −3.6947 denotes the baseline log odds
  • exp(−3.6947) ≈ 0.0249 are the odds when weight = 0
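
The multiplicative interpretation can be verified directly from the quoted estimates. The sketch below uses only the rounded coefficients from the slide; the helper `odds` and the comparison weights 2 and 3 are illustrative choices.

```python
import math

intercept, slope = -3.6947, 1.8151   # rounded estimates quoted on the slide

def odds(weight):
    """Odds implied by the fitted log-odds equation."""
    return math.exp(intercept + slope * weight)

odds_multiplier = math.exp(slope)    # factor applied to the odds per unit of weight
baseline_odds = odds(0.0)            # odds when weight = 0

# The ratio of odds one unit apart equals exp(slope), whatever the starting weight
ratio_at_2 = odds(3.0) / odds(2.0)

print(round(odds_multiplier, 2), round(baseline_odds, 4), round(ratio_at_2, 2))
```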

slide-22
SLIDE 22

Probability vs logistic fit

slide-24
SLIDE 24

Probability vs logistic fit

slope → β × μ(1 − μ)

slide-26
SLIDE 26

Compute change in estimated probability

import numpy as np

# Choose x (weight) and extract model coefficients
x = 1.5
intercept, slope = model_GLM.params

# Compute estimated probability
est_prob = np.exp(intercept + slope * x)/(1 + np.exp(intercept + slope * x))

0.2744

# Compute incremental change in estimated probability given x
ic_prob = slope * est_prob * (1 - est_prob)

0.3614

slide-27
SLIDE 27

Rate of change in probability for every x

logit = −3.6947 + 1.8151 ∗ weight
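
The rate of change β × μ(1 − μ) can be evaluated over a range of weights to see where the curve is steepest. This is a sketch using the rounded coefficients above; the grid of weights from 0 to 6 is an assumption made for illustration.

```python
import numpy as np

intercept, slope = -3.6947, 1.8151        # rounded estimates quoted on the slide

weights = np.linspace(0.0, 6.0, 601)      # illustrative grid of weights
mu = 1.0 / (1.0 + np.exp(-(intercept + slope * weights)))

# Rate of change of the estimated probability at each weight
rate = slope * mu * (1.0 - mu)

steepest = weights[np.argmax(rate)]       # weight at which the curve is steepest
print(round(float(rate.max()), 4), round(float(steepest), 2))
```

The rate peaks where μ = 0.5, that is at weight = −intercept/slope, and its maximum value there is slope/4.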

slide-29
SLIDE 29

Interpreting model inference

slide-30
SLIDE 30

Estimation of beta coefficient

  • Maximum likelihood estimation (MLE)
  • Estimated coefficient $\hat{\beta}$: the value at which the log-likelihood takes on its maximum

slide-31
SLIDE 31

Estimation of beta coefficient

Iteratively reweighted least squares (IRLS)
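
A bare-bones sketch of IRLS for logistic regression, written with NumPy only, may help connect it to maximum likelihood. It is not the statsmodels implementation: the simulated data, the "true" coefficients, the seed and the stopping rule are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)               # arbitrary seed
n = 500
x = rng.normal(size=n)
true_beta = np.array([-1.0, 2.0])                 # hypothetical "true" coefficients
X = np.column_stack([np.ones(n), x])              # design matrix with intercept
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

def log_likelihood(beta):
    """Bernoulli log-likelihood with a logit link."""
    eta = X @ beta
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)                           # IRLS weights
    z = eta + (y - mu) / w                        # working response
    # Weighted least squares step: solve (X' W X) beta = X' W z
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    converged = np.max(np.abs(beta_new - beta)) < 1e-10
    beta = beta_new
    if converged:
        break

# Gradient of the log-likelihood; close to zero at the maximum
score = X.T @ (y - 1.0 / (1.0 + np.exp(-X @ beta)))
print(beta, log_likelihood(beta), score)
```

Each pass solves a weighted least squares problem, and at convergence the gradient of the log-likelihood is numerically zero, which is the sense in which the estimate maximizes the likelihood.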

slide-32
SLIDE 32

Significance testing

slide-33
SLIDE 33

Standard error (SE)

Flatter peak

  → Location of maximum harder to define
  → Larger SE

Sharper peak

  → Location of maximum more clearly defined
  → Smaller SE

slide-34
SLIDE 34

Computation of the standard error

# Extract variance-covariance matrix
print(model_GLM.cov_params())

Variance-covariance matrix

           Intercept    weight
Intercept   0.774762 -0.325087
weight     -0.325087  0.141903

# Compute standard error for weight
std_error = np.sqrt(0.141903)

0.3767
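
Rather than typing one variance by hand, both standard errors can be read off the diagonal of the matrix. The sketch below re-enters the printed values as a plain NumPy array, so it runs without a fitted model; the variable names are illustrative.

```python
import numpy as np

# Variance-covariance matrix as printed on the slide (rows/columns: Intercept, weight)
cov = np.array([[ 0.774762, -0.325087],
                [-0.325087,  0.141903]])

# Standard errors are the square roots of the diagonal entries (the variances)
std_errors = np.sqrt(np.diag(cov))
se_intercept, se_weight = std_errors

print(np.round(std_errors, 4))
```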

slide-35
SLIDE 35

Significance testing

z-statistic

$z = \hat{\beta} / SE$

  • z large ⇒ coefficient ≠ 0 ⇒ variable significant
  • Rule of thumb: cut-off value of 2

Example: horseshoe crab model y ~ weight

z = 1.8151/0.377 = 4.819
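
The z-statistic and the rule-of-thumb check can be sketched in a few lines. The inputs are the rounded weight estimate and the weight variance 0.141903 quoted in the deck, so the result may differ in the last digits from a value computed on unrounded estimates.

```python
import math

coef = 1.8151                      # estimate for weight, as quoted on the slide
std_error = math.sqrt(0.141903)    # square root of the weight variance

z = coef / std_error
is_significant = abs(z) > 2        # rule-of-thumb cut-off from the slide

print(round(z, 2), is_significant)
```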

slide-36
SLIDE 36

Confidence intervals for beta

  • Uncertainty of the estimates
  • 95% confidence intervals for β

$[\text{lower}, \text{upper}] = [\hat{\beta} - 1.96 \times SE,\ \hat{\beta} + 1.96 \times SE]$

slide-37
SLIDE 37

Computing confidence intervals

Example: horseshoe crab model

              coef   std err
Intercept  -3.6947     0.880
weight      1.8151     0.377

[1.8151 − 1.96 × 0.377, 1.8151 + 1.96 × 0.377]

[1.07618, 2.55402]

slide-38
SLIDE 38

Extract confidence intervals

print(model_GLM.conf_int())

                  0         1
Intercept -5.419897 -1.969555
weight     1.076826  2.553463

Column 0 holds the lower bounds and column 1 the upper bounds.

slide-41
SLIDE 41

Confidence intervals for odds

  1. Extract confidence intervals for β
  2. Exponentiate endpoints

print(np.exp(model_GLM.conf_int()))

                  0          1
Intercept  0.004428   0.139519
weight     2.935348  12.851533
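
The two steps can also be reproduced by hand from the rounded summary values for weight. This is a sketch that assumes the usual normal-approximation interval with 1.96; because the inputs are rounded, the exponentiated endpoints agree with the printed output only to about two decimals.

```python
import math

coef, std_error = 1.8151, 0.377   # weight row of the summary table on the slides
z_crit = 1.96                     # normal quantile for a 95% interval

# Step 1: interval on the log-odds scale
lower = coef - z_crit * std_error
upper = coef + z_crit * std_error

# Step 2: exponentiate the endpoints to get an interval for the odds multiplier
odds_lower, odds_upper = math.exp(lower), math.exp(upper)

print(round(lower, 5), round(upper, 5))
print(round(odds_lower, 2), round(odds_upper, 2))
```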

slide-43
SLIDE 43

Computing and describing predictions

slide-45
SLIDE 45

Computing predictions

After obtaining model fit

  1. Fitted values for original x values
  2. New values of x for predicted values
slide-46
SLIDE 46

Computing predictions

Horseshoe crab model y ~ weight

$\mu = \dfrac{\exp(-3.6947 + 1.8151 \times \text{weight})}{1 + \exp(-3.6947 + 1.8151 \times \text{weight})}$

New measurement: weight = 2.85

$\mu = \dfrac{\exp(-3.6947 + 1.8151 \times 2.85)}{1 + \exp(-3.6947 + 1.8151 \times 2.85)} = 0.814$
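
The same prediction can be computed by hand as a check on the arithmetic. The function name `predict_probability` is illustrative; the coefficients are the rounded estimates quoted above.

```python
import math

intercept, slope = -3.6947, 1.8151   # estimates quoted on the slide

def predict_probability(weight):
    """Estimated probability mu = exp(eta) / (1 + exp(eta))."""
    eta = intercept + slope * weight
    return math.exp(eta) / (1.0 + math.exp(eta))

mu_new = predict_probability(2.85)   # new measurement from the slide
print(round(mu_new, 3))
```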

slide-47
SLIDE 47

Predictions in Python

Compute model predictions for dataset new_data

# Compute model predictions
model_GLM.predict(exog = new_data)

slide-48
SLIDE 48

From probabilities to classes

slide-49
SLIDE 49

Computing class predictions

# Extract fitted probabilities from model
crab['fitted'] = model_GLM.fittedvalues.values

# Define cut-off value
cut_off = 0.4

# Compute class predictions
crab['pred_class'] = np.where(crab['fitted'] > cut_off, 1, 0)

slide-50
SLIDE 50

Computing class predictions

# Count occurrences for each class
crab['pred_class'].value_counts()

1    151
0     22

Cut-off     ŷ = 1   ŷ = 0
μ = 0.4       151      22
μ = 0.5       126      47
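
How the cut-off shifts observations between classes can be sketched on a small array. The fitted probabilities below are made up for illustration and are not the crab model's fitted values.

```python
import numpy as np

# Hypothetical fitted probabilities, made up for illustration
fitted = np.array([0.15, 0.38, 0.42, 0.47, 0.55, 0.81])

for cut_off in (0.4, 0.5):
    # Class 1 when the fitted probability exceeds the cut-off, else class 0
    pred_class = np.where(fitted > cut_off, 1, 0)
    n_ones = int(pred_class.sum())
    n_zeros = int(pred_class.size - n_ones)
    print(cut_off, pred_class, n_ones, n_zeros)

ones_at_04 = int(np.sum(fitted > 0.4))
ones_at_05 = int(np.sum(fitted > 0.5))
```

Raising the cut-off can only move observations from class 1 to class 0, which matches the direction of the change in the table above.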

slide-51
SLIDE 51

Confusion matrix

slide-52
SLIDE 52

Confusion matrix - True Negatives

slide-53
SLIDE 53

Confusion matrix - True Positives

slide-54
SLIDE 54

Confusion matrix - False Positives

slide-55
SLIDE 55

Confusion matrix - False Negatives

slide-56
SLIDE 56

Confusion matrix in Python

print(pd.crosstab(y_actual, y_predicted, rownames=['Actual'], colnames=['Predicted'], margins = True))

Predicted   0    1  All
Actual
0          15   47   62
1           7  104  111
All        22  151  173
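
The four cells named on the preceding slides can be read off the crosstab by position. The sketch below re-enters the printed counts as a NumPy array, so it runs without the crab data; the variable names are illustrative.

```python
import numpy as np

# Counts copied from the crosstab on the slide: rows = actual, columns = predicted
confusion = np.array([[15,  47],
                      [ 7, 104]])

true_negatives  = int(confusion[0, 0])   # actual 0, predicted 0
false_positives = int(confusion[0, 1])   # actual 0, predicted 1
false_negatives = int(confusion[1, 0])   # actual 1, predicted 0
true_positives  = int(confusion[1, 1])   # actual 1, predicted 1

predicted_totals = confusion.sum(axis=0)  # column margins ("All" row)
actual_totals = confusion.sum(axis=1)     # row margins ("All" column)

print(true_negatives, false_positives, false_negatives, true_positives)
print(predicted_totals, actual_totals, int(confusion.sum()))
```

The column margins reproduce the 22 and 151 predicted-class counts obtained earlier with a cut-off of 0.4.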
