Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H V0H - - PDF document

logistic regression mle vs ols3 in excel2013 25 aug 2016
SMART_READER_LITE
LIVE PREVIEW

Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H V0H - - PDF document

Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 1 Schield MLE vs. OLS3-Based Logistic Excel 2013 2 Logistic Regression: Background & Goals MLE vs. OLS3 in Excel 2013


slide-1
SLIDE 1

Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H 2014-Schield-Logistic-MLE-OLS3-Excel2013-Slides.pdf 1

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H 1

by Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy Project Director, W. M. Keck Statistical Literacy Project

Slides and data at: www.StatLit.org/

pdf/2014-Schield-Logistic-MLE-OLS3-Excel2013-slides.pdf Excel/2014-Schield-Logistic-MLE-OLS3-Excel2013.xlsx

Logistic Regression: MLE vs. OLS3 in Excel 2013

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H 2

Background & Goals

Doing logistic regression properly requires MLE. Doing MLE in Excel is not easy. See Schield 2014a Schield has identified three OLS shortcuts: OLS1: Model Ln(Odds(p)) where p is near 0 or 1 OLS2: Model Ln(Odds(p)) where p is grouped data OLS3: Use OLS to estimate logistic parameters. These slides compare OLS3 with MLE. Schield (2014b) presents the OLS3-based approach.

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Source: Minitab Pulse dataset

3

Predict Gender using Height

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H 4

Model Gender by Height (OLS) Must use logistic regression

This trend-line does not satisfy the least-squares assumptions and it goes outside the valid range. but it does pass through the joint mean.

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H 5

1a: MLE Logistic Regression

  • f Gender by Height

0.25 0.5 0.75 1 60 62 64 66 68 70 72 74 76 Probability (Male) Height (inches)

Gender by Height

Pulse.xls MLE Logistic Schield Excel 2013 Women Men MLE Logistic regression P(Y|X) = 1 / {1 + Exp[‐(a + bX)]} a = ‐53.32 ; b = 0.7905. Xo = ‐a/b = 67.4529 Slope(X=Xo) = b/4 = 0.1976 X = Xo if P(Y) = 0.5

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Different slopes but they intersect near P(Y) = 0.5

6

1b: MLE Logistic Regression vs OLS(Y|X) of Gender on Height

y = 0.0953x ‐ 5.9282 R² = 0.5102

0% 25% 50% 75% 100% 60 62 64 66 68 70 72 74 76

Probability (Male) Height (inches)

Gender by Height

Excel 2013 MLE Logistic and OLS Gender on Height Pulse.xls Schield 0.5 = 0.0953*Xc ‐ 5.982 Xc = (0.5 + 5.982)/ 0.0953 Xc = 67.4626 Slope = 0.0953 Based on MLE logistic regression P(Male) = 0.5 when X = 67.4529 Slope (X = 67.45) = 0.1976

slide-2
SLIDE 2

Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H 2014-Schield-Logistic-MLE-OLS3-Excel2013-Slides.pdf 2

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Similar slopes near P(Y) = 0.5

7

1c: MLE Logistic Regression vs OLS(X|Y) of Height on Gender

0.2 0.4 0.6 0.8 1 60 62 64 66 68 70 72 74 76 Probability (Male) Height (inches)

Gender by Height

Pulse.xls MLE Logistic and OLS

  • f Height by Gender

Schield Excel 2013 Women Men Ave Height of Men: 70.75" Ave Ht of All: 68.72" 62% are men Ave Height of Women 65.40" MLE logistic regression P(Male) = 0.5 when X = 67.4529 Slope (X = 67.45) = 0.1976 OLS2 Height|Gender Means difference: 5.35 Slope = 1/5.35 = 0.1868

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Very nice estimate!

8

1d: Intersection from OLS(Y|X) Get slope from OLS(X|Y)

0.25 0.5 0.75 1 60 62 64 66 68 70 72 74 76 Probability (Male) Height (inches)

Logistic: Gender by Height: MLE vs OLS‐Based

Pulse.xls Schield Excel 2013 Women Men Estimate: Dashed line MLE: Solid line P = 1/{1+exp[4(Xo-X)/(M2-M1]} P = 1/{1+exp[4(67.45-X)/(5.35)]}

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

.

9

2a: MLE Logistic Regression

  • f Gender by Weight.

0.25 0.5 0.75 1 90 115 140 165 190 215 Probability (Male) Weight (pounds)

Gender by Weight

Pulse.xls MLE Logistic Schield Excel 2013 Women Men MLE Logistic regression P(Y|X) = 1 / {1 + Exp[‐(a + bX)]} a = ‐21.48 ; b = 0.1577. Xo = ‐a/b = 136.218 Slope(X=Xo) = b/4 = 0.0394 X = Xo if P(X) = 0.5

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Different slopes but they intersect near P(Y) = 0.5

10

2b: MLE Logistic Regression vs OLS(Y|X) of Gender on Weight

y = 0.0146x ‐ 1.4956 R² = 0.5022 0% 25% 50% 75% 100% 100 110 120 130 140 150 160 170

Probability (Male) Weight (pounds)

Gender by Weight

Excel 2013 MLE Logistic and OLS Gender on Weight Pulse.xls Schield 0.5 = 0.0146*Xc ‐ 1.4956 Xc = (0.5 + 1.4956)/ 0.0146 Xc = 136.7 OLS Slope = 0.0146 MLE logistic regression P(Male) = 0.5 when X = 136.22# Slope (X=136.22) = 0.0394

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Similar slopes near P(Y) = 0.5

11

2c: MLE Logistic Regression vs OLS(X|Y) of Weight on Gender

0.00 0.25 0.50 0.75 1.00 110 120 130 140 150 160 170 Probability (Male) Weight (pounds)

Gender by Weight

Pulse.xls MLE Logistic and OLS

  • f Weight by Gender

Schield Excel 2013 Women Men Ave Weight Men: 158.26 Ave Weight of Women 123.80# Ave Wt of All: 145.15# 62% are men OLS Weight|Gender Means difference: 34.46 Slope = 1/34.46 = 0.0290 MLE logistic regression P(Male) = 0.5 when X = 136.22# Slope (X=136.22) = 0.0394

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Fairly good estimate

12

2d: Intersection from OLS(Y|X) Get slope from OLS(X|Y)

0.25 0.5 0.75 1 100 110 120 130 140 150 160 170 180 Probability (Male) Weight (pounds)

Logistic: Gender by Weight: MLE vs OLS‐Based

Pulse.xls Schield Excel 2013 Women Men Estimate: Dashed line MLE: Solid line

slide-3
SLIDE 3

Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H 2014-Schield-Logistic-MLE-OLS3-Excel2013-Slides.pdf 3

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Almost flat! Discriminatory power is weak.

13

3a: MLE Logistic Regression

  • f Gender by Rest Pulse

0.25 0.5 0.75 1 40 50 60 70 80 90 100 110 120 Probability (Male) Rest Pulse (bpm)

Gender by Rest Pulse

Pulse.xls MLE Logistic Schield Excel 2013 Women Men MLE Logistic regression P(Y|X) = 1 / {1 + Exp[‐(a + bX)]} a = 4.61; b = ‐0.0561. Slope(X=Xo) = b/4 = ‐0.0140 Xo = ‐a/b = 82.22 bpm X = Xo if P(X) = 0.5

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

MLE and OLS1 match near P(Y|X) = 0.5

y = ‐0.0127x + 1.5417 R² = 0.0814

0% 25% 50% 75% 100% 40 50 60 70 80 90 100 110 120

Probability (Male) Rest Pulse (bpm)

Gender by Rest Pulse

Excel 2013 MLE Logistic and OLS Gender on Pulse1 Pulse.xls Schield P(Y) = 0.5 if X = 82.32 Slope = ‐0.0127 MLE logistic regression P(Male) = 0.5 when Xo = 82.22 bpm Slope = ‐0.0140

14

3b: MLE Logistic Regression vs OLS(Y|X) of Gender on Pulse1

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Very dissimilar slopes near P(Y) = 0.5

15

3c: MLE Logistic Regression vs OLS(X|Y) of Gender on Pulse1

0.2 0.4 0.6 0.8 1 50 60 70 80 90 100 Probability (Male) Rest Pulse (bpm)

Gender by Rest Pulse

Pulse.xls MLE Logistic and OLS

  • f Rest Pulse by Gender

Schield Excel 2013 Women Men Ave rest pulse of Women 76.86 bpm Ave rest pulse of Men: 70.42 Ave rest pulse of All: 72.87 bpm 62% are men MLE logistic regression Slope = ‐0.0140 OLS Pulse1|Gender Means difference: ‐6.44 Slope = 1/(‐6.44) = ‐0.1544

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Very bad estimate

16

3d: Intersection from OLS(Y|X); Get slope from OLS(X|Y)

0.25 0.5 0.75 1 40 50 60 70 80 90 100 110 120 Probability (Male) Weight (pounds)

Logistic: Gender by Pulse1: MLE vs OLS‐Based

Pulse.xls Schield Excel 2013 Women Men Estimate: Dashed line MLE: Solid line P = 1/{1+exp[4(Xo-X)/(M2-M1]} P = 1/{1+exp[4(82.32-X)/(-6.44)]}

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

#1: R2 = 51% by Height. “Good estimate” #2: R2 = 50% by Weight. “Fair estimate” #3: R2 = 8% by Rest Pulse. “Bad estimate” Conclusion #1: Using OLS(Y|X) for Xo and using OLS(X|Y) for the associated slope works fairly well when the overlap is small or moderate: OLS(Y|X). R-squared is high: > 0.5 Conclusion #2: Must use MLE when the overlap is large: OLS(Y|X) R-squared is low (< 0.5). But why bother if the model explains so little?

17

Analysis and Conclusion Model Gender:

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Schield, Milo (2014a). www.statlit.org/pdf/ Model-Logistic-MLE1A-Excel2013-Slides.pdf Model-Logistic-MLE1C-Excel2013-Slides.pdf Schield, Milo (2014b). www.statlit.org/pdf/ 2014-Schield-Logistic-OLS3-Excel2013-Slides.pdf Ln(Odds) = a+bX. P/(1-P) = exp(a+bX) P = 1 / [1 + exp(-a –bX)]. If P=1/2 at X=Xo, a = -bXo. P = 1 / {1 + exp[b(Xo – X)]}. b = 4*(dp/dx)|X=Xo. P = 1 / {1 + exp[4*(Xo-X)*Slope]} If slope = 1/(M2-M1), P = 1/{1+exp[4(Xo-X)/(M2-M1]}

18

References & Derivation

slide-4
SLIDE 4

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H 1

by Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy Project Director, W. M. Keck Statistical Literacy Project

Slides and data at: www.StatLit.org/

pdf/2014-Schield-Logistic-MLE-OLS3-Excel2013-slides.pdf Excel/2014-Schield-Logistic-MLE-OLS3-Excel2013.xlsx

Logistic Regression: MLE vs. OLS3 in Excel 2013

slide-5
SLIDE 5

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H 2

Background & Goals

Doing logistic regression properly requires MLE. Doing MLE in Excel is not easy. See Schield 2014a Schield has identified three OLS shortcuts: OLS1: Model Ln(Odds(p)) where p is near 0 or 1 OLS2: Model Ln(Odds(p)) where p is grouped data OLS3: Use OLS to estimate logistic parameters. These slides compare OLS3 with MLE. Schield (2014b) presents the OLS3-based approach.

slide-6
SLIDE 6

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Source: Minitab Pulse dataset

3

Predict Gender using Height

slide-7
SLIDE 7

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H 4

Model Gender by Height (OLS) Must use logistic regression

This trend-line does not satisfy the least-squares assumptions and it goes outside the valid range. but it does pass through the joint mean.

slide-8
SLIDE 8

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H 5

1a: MLE Logistic Regression

  • f Gender by Height

0.25 0.5 0.75 1 60 62 64 66 68 70 72 74 76

Probability (Male) Height (inches)

Gender by Height

Pulse.xls MLE Logistic Schield Excel 2013 Women Men MLE Logistic regression P(Y|X) = 1 / {1 + Exp[-(a + bX)]} a = -53.32 ; b = 0.7905. Xo = -a/b = 67.4529 Slope(X=Xo) = b/4 = 0.1976 X = Xo if P(Y) = 0.5

slide-9
SLIDE 9

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Different slopes but they intersect near P(Y) = 0.5

6

1b: MLE Logistic Regression vs OLS(Y|X) of Gender on Height

y = 0.0953x - 5.9282 R² = 0.5102

0% 25% 50% 75% 100% 60 62 64 66 68 70 72 74 76

Probability (Male) Height (inches)

Gender by Height

Excel 2013 MLE Logistic and OLS Gender on Height Pulse.xls Schield 0.5 = 0.0953*Xc - 5.982 Xc = (0.5 + 5.982)/ 0.0953 Xc = 67.4626 Slope = 0.0953 Based on MLE logistic regression P(Male) = 0.5 when X = 67.4529 Slope (X = 67.45) = 0.1976

slide-10
SLIDE 10

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Similar slopes near P(Y) = 0.5

7

1c: MLE Logistic Regression vs OLS(X|Y) of Height on Gender

0.2 0.4 0.6 0.8 1 60 62 64 66 68 70 72 74 76

Probability (Male) Height (inches)

Gender by Height

Pulse.xls MLE Logistic and OLS

  • f Height by Gender

Schield Excel 2013 Women Men Ave Height of Men: 70.75" Ave Ht of All: 68.72" 62% are men Ave Height of Women 65.40" MLE logistic regression P(Male) = 0.5 when X = 67.4529 Slope (X = 67.45) = 0.1976 OLS2 Height|Gender Means difference: 5.35 Slope = 1/5.35 = 0.1868

slide-11
SLIDE 11

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Very nice estimate!

8

1d: Intersection from OLS(Y|X) Get slope from OLS(X|Y)

0.25 0.5 0.75 1 60 62 64 66 68 70 72 74 76

Probability (Male) Height (inches)

Logistic: Gender by Height: MLE vs OLS-Based

Pulse.xls Schield Excel 2013 Women Men Estimate: Dashed line MLE: Solid line

P = 1/{1+exp[4(Xo-X)/(M2-M1]} P = 1/{1+exp[4(67.45-X)/(5.35)]}

slide-12
SLIDE 12

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

.

9

2a: MLE Logistic Regression

  • f Gender by Weight.

0.25 0.5 0.75 1 90 115 140 165 190 215

Probability (Male) Weight (pounds)

Gender by Weight

Pulse.xls MLE Logistic Schield Excel 2013 Women Men MLE Logistic regression P(Y|X) = 1 / {1 + Exp[-(a + bX)]} a = -21.48 ; b = 0.1577. Xo = -a/b = 136.218 Slope(X=Xo) = b/4 = 0.0394 X = Xo if P(X) = 0.5

slide-13
SLIDE 13

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Different slopes but they intersect near P(Y) = 0.5

10

2b: MLE Logistic Regression vs OLS(Y|X) of Gender on Weight

y = 0.0146x - 1.4956 R² = 0.5022 0% 25% 50% 75% 100% 100 110 120 130 140 150 160 170

Probability (Male) Weight (pounds)

Gender by Weight

Excel 2013 MLE Logistic and OLS Gender on Weight Pulse.xls Schield 0.5 = 0.0146*Xc - 1.4956 Xc = (0.5 + 1.4956)/ 0.0146 Xc = 136.7 OLS Slope = 0.0146 MLE logistic regression P(Male) = 0.5 when X = 136.22# Slope (X=136.22) = 0.0394

slide-14
SLIDE 14

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Similar slopes near P(Y) = 0.5

11

2c: MLE Logistic Regression vs OLS(X|Y) of Weight on Gender

0.00 0.25 0.50 0.75 1.00 110 120 130 140 150 160 170

Probability (Male) Weight (pounds)

Gender by Weight

Pulse.xls MLE Logistic and OLS

  • f Weight by Gender

Schield Excel 2013 Women Men Ave Weight Men: 158.26 Ave Weight of Women 123.80# Ave Wt of All: 145.15# 62% are men OLS Weight|Gender Means difference: 34.46 Slope = 1/34.46 = 0.0290 MLE logistic regression P(Male) = 0.5 when X = 136.22# Slope (X=136.22) = 0.0394

slide-15
SLIDE 15

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Fairly good estimate

12

2d: Intersection from OLS(Y|X) Get slope from OLS(X|Y)

0.25 0.5 0.75 1 100 110 120 130 140 150 160 170 180

Probability (Male) Weight (pounds)

Logistic: Gender by Weight: MLE vs OLS-Based

Pulse.xls Schield Excel 2013 Women Men Estimate: Dashed line MLE: Solid line

slide-16
SLIDE 16

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Almost flat! Discriminatory power is weak.

13

3a: MLE Logistic Regression

  • f Gender by Rest Pulse

0.25 0.5 0.75 1 40 50 60 70 80 90 100 110 120

Probability (Male) Rest Pulse (bpm)

Gender by Rest Pulse

Pulse.xls MLE Logistic Schield Excel 2013 Women Men MLE Logistic regression P(Y|X) = 1 / {1 + Exp[-(a + bX)]} a = 4.61; b = -0.0561. Slope(X=Xo) = b/4 = -0.0140 Xo = -a/b = 82.22 bpm X = Xo if P(X) = 0.5

slide-17
SLIDE 17

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

MLE and OLS1 match near P(Y|X) = 0.5

y = -0.0127x + 1.5417 R² = 0.0814

0% 25% 50% 75% 100% 40 50 60 70 80 90 100 110 120

Probability (Male) Rest Pulse (bpm)

Gender by Rest Pulse

Excel 2013 MLE Logistic and OLS Gender on Pulse1 Pulse.xls Schield P(Y) = 0.5 if X = 82.32 Slope = -0.0127 MLE logistic regression P(Male) = 0.5 when Xo = 82.22 bpm Slope = -0.0140

14

3b: MLE Logistic Regression vs OLS(Y|X) of Gender on Pulse1

slide-18
SLIDE 18

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Very dissimilar slopes near P(Y) = 0.5

15

3c: MLE Logistic Regression vs OLS(X|Y) of Gender on Pulse1

0.2 0.4 0.6 0.8 1 50 60 70 80 90 100

Probability (Male) Rest Pulse (bpm)

Gender by Rest Pulse

Pulse.xls MLE Logistic and OLS

  • f Rest Pulse by Gender

Schield Excel 2013 Women Men Ave rest pulse of Women 76.86 bpm Ave rest pulse of Men: 70.42 Ave rest pulse of All: 72.87 bpm 62% are men MLE logistic regression Slope = -0.0140 OLS Pulse1|Gender Means difference: -6.44 Slope = 1/(-6.44) = -0.1544

slide-19
SLIDE 19

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Very bad estimate

16

3d: Intersection from OLS(Y|X); Get slope from OLS(X|Y)

0.25 0.5 0.75 1 40 50 60 70 80 90 100 110 120

Probability (Male) Weight (pounds)

Logistic: Gender by Pulse1: MLE vs OLS-Based

Pulse.xls Schield Excel 2013 Women Men Estimate: Dashed line MLE: Solid line

P = 1/{1+exp[4(Xo-X)/(M2-M1]} P = 1/{1+exp[4(82.32-X)/(-6.44)]}

slide-20
SLIDE 20

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

#1: R2 = 51% by Height. “Good estimate” #2: R2 = 50% by Weight. “Fair estimate” #3: R2 = 8% by Rest Pulse. “Bad estimate” Conclusion #1: Using OLS(Y|X) for Xo and using OLS(X|Y) for the associated slope works fairly well when the overlap is small or moderate: OLS(Y|X). R-squared is high: > 0.5 Conclusion #2: Must use MLE when the overlap is large: OLS(Y|X) R-squared is low (< 0.5). But why bother if the model explains so little?

17

Analysis and Conclusion Model Gender:

slide-21
SLIDE 21

Schield MLE vs. OLS3-Based Logistic Excel 2013

V0H

Schield, Milo (2014a). www.statlit.org/pdf/ Model-Logistic-MLE1A-Excel2013-Slides.pdf Model-Logistic-MLE1C-Excel2013-Slides.pdf Schield, Milo (2014b). www.statlit.org/pdf/ 2014-Schield-Logistic-OLS3-Excel2013-Slides.pdf Ln(Odds) = a+bX. P/(1-P) = exp(a+bX) P = 1 / [1 + exp(-a –bX)]. If P=1/2 at X=Xo, a = -bXo. P = 1 / {1 + exp[b(Xo – X)]}. b = 4*(dp/dx)|X=Xo. P = 1 / {1 + exp[4*(Xo-X)*Slope]} If slope = 1/(M2-M1), P = 1/{1+exp[4(Xo-X)/(M2-M1]}

18

References & Derivation