SLIDE 1

Logistic Regression for Survey Data Professor Ron Fricker Naval Postgraduate School Monterey, California

SLIDE 2

Goals for this Lecture

  • Introduction to logistic regression

– Discuss when and why it is useful
– Interpret output

  • Odds and odds ratios

– Illustrate use with examples

  • Show how to run in JMP
  • Discuss other software for fitting linear and logistic regression models to complex survey data

SLIDE 3

Logistic Regression

  • Logistic regression

– Response (Y) is binary, representing an event or not
– Model, where pi = Pr(Yi = 1):

      ln( pi / (1 − pi) ) = β0 + β1·X1i + β2·X2i + ⋯ + βk·Xki

  • In surveys, useful for modeling:

– Probability respondent says “yes” (or “no”)

  • Can also dichotomize other questions

– Probability respondent is in a (binary) class

SLIDE 4

Why Logistic Regression?

  • Some reasons:

– Resulting “S” curve fits many observed phenomena
– Model follows the same general principles as linear regression

  • Can estimate probability p of binary outcome

– Estimates of p bounded between 0 and 1

      p̂ = exp(β̂0 + β̂1·x1 + β̂2·x2 + ⋯ + β̂k·xk) / [ 1 + exp(β̂0 + β̂1·x1 + β̂2·x2 + ⋯ + β̂k·xk) ]

SLIDE 5

Linear Regression with Binary Ys

  • Example: modeling presence or absence of coronary heart disease (CHD) as a function of age

  • Data looks like this:

– 100 obs
– min age = 20
– max age = 69
– 43 w/ CHD

ID   Age   CHD
1    20    0
2    23    0
3    24    0
4    25    0
5    25    1
6    26    0
7    26    0
8    28    0
…    …     …

SLIDE 6

Modeling CHD Existence

  • Imagine each subject flips a coin:

Heads = CHD
Tails = no CHD

  • Each coin has a different probability of heads, related to the subject’s age

  • Only observe existence of CHD

– y=1, has CHD; y=0, does not

  • We want to model the chance of getting CHD as a function of age

SLIDE 7

Proportion with CHD by Age

Age Group    n    CHD Absent   CHD Present   Proportion
20–29       10        9             1           0.10
30–34       15       13             2           0.13
35–39       12        9             3           0.25
40–44       15       10             5           0.33
45–49       13        7             6           0.46
50–54        8        3             5           0.63
55–59       17        4            13           0.76
60–69       10        2             8           0.80
Total      100       57            43           0.43

SLIDE 8

Plotting the Proportions

[Figure: proportion with CHD plotted against mean group age]

SLIDE 9

Interpreting Model Results

[Figure: fitted logistic curve, p(CHD) vs. age]

If age is 50 years then the probability of CHD is about 0.56

SLIDE 10

Logistic Regression: The Picture

[Figure: fitted logistic curve p(age) overlaid on the binary CHD data; probability of CHD vs. age]

SLIDE 11

Where Logistic Regression Fits

Response \ Predictor    Continuous             Categorical
Continuous              Linear regression      Linear reg. w/ dummy variables
Categorical             Logistic regression    Logistic reg. w/ dummy variables

SLIDE 12

Logistic Regression in JMP

  • Fit much like multiple regression:

Analyze > Fit Model

– Fill in Y with the nominal binary dependent variable
– Put Xs in the model by highlighting and then clicking “Add”

  • Use “Remove” to take out Xs

– Click “Run Model” when done

  • Takes care of missing values and non-numeric data automatically

SLIDE 13

Estimating the Parameters

  • JMP estimates βs via maximum likelihood
  • Given the estimated βs, probabilities are estimated as

  • Calculating probabilities in JMP is easy

– After Fit Model, red triangle > Save Probability Formula

      p̂ = exp(β̂0 + β̂1·x1 + β̂2·x2 + ⋯ + β̂k·xk) / [ 1 + exp(β̂0 + β̂1·x1 + β̂2·x2 + ⋯ + β̂k·xk) ]

SLIDE 14

Probability, Odds, and Log Odds

  • Probability (p)

– Number between 0 and 1
– Example: Pr(Red Sox win next World Series) = 5/8 = 0.625

  • Odds: p/(1−p)

– Any number > 0
– Example: Odds Red Sox win World Series are 5/3 ≈ 1.667

  • Log odds: ln(p/(1−p))

– Any number from −∞ to +∞
– Log odds is sometimes called the “logit”
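The three scales can be checked with a few lines of Python, using the slide’s Red Sox numbers:

```python
import math

p = 5 / 8                  # Pr(Red Sox win next World Series), from the slide
odds = p / (1 - p)         # = 5/3
log_odds = math.log(odds)  # the "logit"

print(round(odds, 3), round(log_odds, 3))
```

Note that odds of 1.667 and a log odds of about 0.51 both say the same thing as p = 0.625, just on different scales.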

SLIDE 15

Interpreting the βs

[JMP output: “slope” estimate on Age, its p-value, and the fitted log odds of having CHD]

  • Slope is positive and significant

– Increasing age means a higher probability of coronary heart disease
– Increase Age by 1 year and the log odds of CHD increases by 0.11
– No t-test; a χ²-test instead

  • p-value still means the same thing
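A quick way to see what the 0.11 slope means on the odds scale: exponentiating the slope gives the per-year odds multiplier.

```python
import math

slope = 0.11                        # increase in log odds of CHD per year of age (from the slide)
odds_multiplier = math.exp(slope)   # each extra year multiplies the odds of CHD by this factor
print(round(odds_multiplier, 2))
```

So each additional year of age multiplies the odds of CHD by roughly 1.12.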

SLIDE 16

Final Model and Results

Age can be any (positive) number and the answer still makes sense

[Figure: fitted curve p̂(CHD) vs. age]

      p̂(CHD) = exp(−5.31 + 0.111·x_age) / [ 1 + exp(−5.31 + 0.111·x_age) ]
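A short check of the fitted model, using the −5.31 and 0.111 coefficients from the slide (the function name is mine):

```python
import math

def p_chd(age):
    """Fitted model from the slide: log odds = -5.31 + 0.111 * age."""
    eta = -5.31 + 0.111 * age
    return math.exp(eta) / (1.0 + math.exp(eta))

print(round(p_chd(50), 2))   # agrees with the ~0.56 read off the earlier plot
```

The probability increases smoothly with age and stays inside (0, 1) for any age.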

SLIDE 17

Odds Ratios – An Example

  • An odds ratio is, literally, the ratio of two odds

– Example from some recent (non-survey) work:

  • Odds IAer retained = 2.01
  • Odds non-IAer retained = 1.55
  • Odds ratio = 1.30
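The arithmetic behind the example:

```python
odds_ia = 2.01         # odds an IAer is retained (from the slide)
odds_non_ia = 1.55     # odds a non-IAer is retained
odds_ratio = odds_ia / odds_non_ia
print(round(odds_ratio, 2))
```

So IAers had about 1.30 times the odds of being retained compared with non-IAers.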

SLIDE 18

Interpreting the Slope of an Indicator Variable

  • Let x1 be an indicator variable

– Say, x1 = 1 means male and x1 = 0 means female

  • Consider the ratio of two logistic regression models, one for males and one for females:

      ln( pi|male / (1 − pi|male) )   = β0 + β1 + β2·X2i + ⋯ + βk·Xki
      ln( pi|female / (1 − pi|female) ) = β0 + β2·X2i + ⋯ + βk·Xki

  • Exponentiate numerator and denominator:

      O.R. = [ exp(β0)·exp(β1)·exp(β2·X2i) ⋯ exp(βk·Xki) ] / [ exp(β0)·exp(β2·X2i) ⋯ exp(βk·Xki) ] = exp(β1)
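A numeric sanity check of the cancellation. The coefficient values here (b0, b1, rest) are hypothetical, chosen only to illustrate that the shared factors drop out:

```python
import math

b0, b1 = -1.0, 0.4     # hypothetical intercept and indicator coefficient
rest = 0.25            # hypothetical contribution of the remaining terms b2*x2 + ... + bk*xk

odds_male = math.exp(b0 + b1 + rest)    # x1 = 1
odds_female = math.exp(b0 + rest)       # x1 = 0
odds_ratio = odds_male / odds_female    # every shared factor cancels

print(round(odds_ratio, 4), round(math.exp(b1), 4))
```

Whatever values the other terms take, the ratio of the two odds equals exp(β1).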

SLIDE 19

Example: Using Logistic Regression in NPS New Student Survey

  • Dichotomize Q1 into “satisfied” (4 or 5) and “not satisfied” (1, 2, or 3)

  • Model satisfied on Gender and Type Student
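A sketch of the dichotomization step; the response values here are made up for illustration, not NPS survey data:

```python
# Hypothetical 1-5 responses to Q1
responses = [5, 3, 4, 1, 2, 4]

# 4 or 5 -> "satisfied" (1); 1, 2, or 3 -> "not satisfied" (0)
satisfied = [1 if r >= 4 else 0 for r in responses]
print(satisfied)   # [1, 0, 1, 0, 0, 1]
```

The resulting 0/1 variable is then a valid binary response for logistic regression.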

SLIDE 20

Compare the Output to Raw Data

SLIDE 21

Regression in Complex Surveys

  • Parameters are fit to minimize the sum of squared errors over the population:

      SSE = Σ_{i=1..N} [ yi − (B0 + B1·xi) ]²

  • Resulting estimators:

      B̂1 = [ Σ_{i∈S} wi·xi·yi − (Σ_{i∈S} wi·yi)(Σ_{i∈S} wi·xi) / Σ_{i∈S} wi ] / [ Σ_{i∈S} wi·xi² − (Σ_{i∈S} wi·xi)² / Σ_{i∈S} wi ]

      and

      B̂0 = Σ_{i∈S} wi·yi / Σ_{i∈S} wi − B̂1 · Σ_{i∈S} wi·xi / Σ_{i∈S} wi

  • Still need to estimate standard errors…
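The survey-weighted least-squares estimators can be sketched as follows; `weighted_slr` is an illustrative helper I wrote, not code from the lecture:

```python
def weighted_slr(xs, ys, ws):
    """Survey-weighted simple linear regression estimators B1-hat and B0-hat."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    b1 = (swxy - swy * swx / sw) / (swxx - swx ** 2 / sw)
    b0 = swy / sw - b1 * swx / sw
    return b0, b1

# With equal weights this reduces to ordinary least squares; here y = 2x exactly
b0, b1 = weighted_slr([1, 2, 3, 4], [2, 4, 6, 8], [1, 1, 1, 1])
print(round(b0, 6), round(b1, 6))
```

With unequal weights the same formulas give the design-weighted fit; the standard errors, as the slide notes, still need a separate (design-based) estimate.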

SLIDE 22

Using SAS for Regression

  • SAS procedures for regression assuming SRS:

– PROC REG
– PROC LOGISTIC

  • In SAS v9.1, for complex surveys:

– PROC SURVEYREG
– PROC SURVEYLOGISTIC

  • See http://support.sas.com/onlinedoc/913/docMainpage.jsp

SLIDE 23

Using Stata for Regression

  • Stata 9: SVY procedures for regression include

– svy:regress
– svy:logistic
– svy:logit

  • See www.stata.com/stata9/svy.html for more detail

SLIDE 24

Using R / S+ for Regression

  • ‘survey’ package by Thomas Lumley

– Must install as a library for S+ or R
– A copy is up on Blackboard

  • Has svyglm for generalized linear models
  • If it works like the usual glm in S+, it can do linear and logistic modeling

– But I need to look more closely at it…

  • See http://faculty.washington.edu/tlumley/survey/

SLIDE 25

What We Have Just Learned

  • Introduced logistic regression

– Discussed when and why it is useful
– Interpreted output

  • Odds and odds ratios

– Illustrated use with examples

  • Showed how to run in JMP
  • Discussed other software for fitting linear and logistic regression models to complex survey data
