
SLIDE 1

Dongmei Li Department of Public Health Sciences Office of Public Health Studies University of Hawai’i at Mānoa


SLIDE 2

Outline

Chi-square test Logistic regression

2

SLIDE 3

Chi-square test


SLIDE 4

Chi-Square Test - Example

Data below reveal a negative association between smoking and education level. Let us test H0: no association in the population vs. Ha: association in the population.


SLIDE 5

χ², Expected Frequencies

Expected frequencies:

$$E_i = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
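The expected frequency of a cell is its row total times its column total, divided by the table total. A minimal sketch of that calculation follows; the counts are hypothetical (they are not the smoking and education data), and the function name `expected_counts` is introduced here only for illustration.

```python
# Hypothetical 2x3 table of observed counts (NOT the smoking/education data).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]


def expected_counts(table):
    """Return (row total * column total) / table total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print(row)
```

Both rows of this table have the same total, so both rows get the same expected counts.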

SLIDE 6

Chi-Square Test of Association

A. Hypotheses. H0: no association in population versus Ha: association in population

B. Test statistic.

$$X^2_{\text{stat}} = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ = observed count in cell $i$ and $E_i$ = expected count in cell $i$, calculated as

$$E_i = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

with $df = (R - 1)(C - 1)$ for a table with $R$ rows and $C$ columns.

C. P-value. Convert the $X^2_{\text{stat}}$ to a P-value with Table E or a software program.
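The test statistic and its degrees of freedom can be computed directly from a table of counts. The sketch below uses hypothetical counts (not the data from the slides), and `chi_square_stat` is an illustrative name rather than a function from any library.

```python
# Hypothetical 2x3 table of observed counts (NOT the smoking/education data).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]


def chi_square_stat(table):
    """Return (Pearson chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R - 1)(C - 1)
    return stat, df


stat, df = chi_square_stat(observed)
print(stat, df)
```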

SLIDE 7

Chi-Square Statistic - Example


SLIDE 8

Chi-Square Test, P-value

• $X^2_{\text{stat}}$ = 13.20 with 4 df
• Using the chi-square table, find the row for 4 df
• Find the chi-square values in this row that bracket 13.20
• Bracketing values are 11.14 (P = .025) and 13.28 (P = .01)
• Thus, .01 < P < .025 (closer to .01)

Probability in right tail

df   0.975  0.25   0.20   0.15   0.10   0.05   0.025  0.01   0.005
4    0.48   5.39   5.99   6.74   7.78   9.49   11.14  13.28  14.86

SLIDE 9

Illustrative example: $X^2_{\text{stat}}$ = 13.20 with 4 df

The P-value = area under the curve (AUC) in the tail beyond $X^2_{\text{stat}}$
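The tail area can also be computed exactly rather than bracketed from a table. For an even number of degrees of freedom, the chi-square right-tail probability has a closed form, so a minimal sketch needs only the standard library. The function name below is hypothetical, and the closed form covers even df only; statistical software handles the general case.

```python
import math


def chi2_right_tail_even_df(x, df):
    """Right-tail area of the chi-square distribution for even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form applies to positive even df only")
    half = x / 2.0
    k = df // 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


p_value = chi2_right_tail_even_df(13.20, 4)
print(round(p_value, 4))
```

The result falls between the bracketing probabilities .01 and .025 read from the table row for 4 df.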

SLIDE 10

Yates’ Continuity Corrected Chi-Square Statistic

• Two different chi-square statistics are used in practice
• Pearson’s chi-square statistic (covered) is

$$X^2_{\text{stat}} = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}$$

• Yates’ continuity-corrected chi-square statistic is

$$X^2_{\text{stat},c} = \sum_{\text{all cells}} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i}$$

• The continuity-corrected method produces smaller chi-square statistics and larger P-values.
• Both chi-square statistics are used in practice.
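A small sketch comparing the two statistics on one table follows. The 2x2 counts are hypothetical, and the helper names are introduced here for illustration. With 1 df, the right-tail probability equals `erfc(sqrt(x / 2))`, which keeps the example within the standard library.

```python
import math

# Hypothetical 2x2 table of observed counts, for illustration only.
observed = [
    [12, 8],
    [6, 14],
]


def chi_square(table, continuity_correction=False):
    """Pearson chi-square, optionally with Yates' continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            diff = abs(o - e)
            if continuity_correction:
                diff -= 0.5  # Yates: subtract 1/2 from |O - E| before squaring
            stat += diff ** 2 / e
    return stat


def p_value_1df(x):
    """Right-tail chi-square probability for 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


pearson = chi_square(observed)
yates = chi_square(observed, continuity_correction=True)
print(pearson, p_value_1df(pearson))
print(yates, p_value_1df(yates))
```

On this table the corrected statistic is the smaller of the two and its P-value the larger, as the slide states.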

SLIDE 11

Chi-Square test using JMP

• Data set: Presentation4_chisqtest.jmp

SLIDE 12

Results from JMP

• The P-value from both the likelihood ratio test and the Pearson chi-square test is 0.0103
• Significant association between education level and smoking status

SLIDE 13

Chi-Square, cont.

1. How the chi-square works. When observed values = expected values, the chi-square statistic is 0. As the observed values move further from the expected values, the chi-square statistic gets large and evidence against H0 mounts.

2. Avoid chi-square tests in small samples. Do not use a chi-square test when more than 20% of the cells have expected values that are less than 5.

3. Supplement chi-squares with measures of association. Chi-square statistics do not measure the strength of association. Use descriptive statistics or Relative Risks to quantify “strength”.
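The small-sample rule in point 2 is easy to check once the expected counts are known. A minimal sketch, with hypothetical expected counts and an illustrative function name:

```python
def too_sparse_for_chi_square(expected, min_expected=5.0, max_fraction=0.20):
    """True when more than max_fraction of cells have expected count < min_expected."""
    cells = [e for row in expected for e in row]
    n_small = sum(1 for e in cells if e < min_expected)
    return n_small / len(cells) > max_fraction


# Hypothetical expected counts, for illustration only.
ok_table = [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
sparse_table = [[3.0, 7.0], [9.0, 21.0]]  # 1 of 4 cells (25%) is below 5

print(too_sparse_for_chi_square(ok_table))
print(too_sparse_for_chi_square(sparse_table))
```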

SLIDE 14

Logistic Regression


SLIDE 15

Logistic regression example

• Surviving third-degree burns
• The data in Presentation4_Burn.jmp refer to 435 adults who were treated for third-degree burns by the University of Southern California General Hospital Burn Center. The patients were grouped according to the area of third-degree burns on the body. (The groups are identified as midpoints of set intervals of log(area + 1).) For each patient, it was recorded whether or not they survived, and the area of their burn was recorded as the midpoint of the group corresponding to their burn.
• Source: http://statmaster.sdu.dk/courses/st111/module14/index.html

SLIDE 16

Logistic regression example

• Variable description
• Midpoint: Midpoint of the group corresponding to the patient's burn.
• survive: Binary variable: survived=1, died=0
• A first idea might be to model the relationship between the probability of success (that the patient survives) and the explanatory variable ‘log(area + 1)’ as a simple linear regression model.

SLIDE 17

Logistic regression example

• However, the scatterplot of the proportions of patients surviving a third-degree burn against the explanatory variable shows a distinctly curved relationship between the two variables, rather than a linear one. A transformation of the data seems to be in order.

SLIDE 18

Logistic regression example

• The curved relationship is typical for many situations where the response variable is binary.
• Some examples of the curved relationship

SLIDE 19

Logistic regression example

• The following scatterplot shows the logit-transformed proportions of patients surviving a third-degree burn against the explanatory variable ‘log(area + 1)’.

SLIDE 20

The simple logistic regression model

• The simple logistic regression model relates $p_x$ to $x$ through the following equation:

$$p_x = P(D \mid X = x) = \frac{1}{1 + e^{-(a + bx)}}$$

• Alternatively, it can be written as

$$\log\left(\frac{p_x}{1 - p_x}\right) = \log(\text{Odds for } D \mid X = x) = a + bx$$
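The two forms of the model are equivalent, which can be checked by solving the first form for the odds. A short derivation, using only the symbols on this slide:

```latex
\begin{align*}
p_x &= \frac{1}{1 + e^{-(a + bx)}} \\
1 - p_x &= \frac{e^{-(a + bx)}}{1 + e^{-(a + bx)}} \\
\frac{p_x}{1 - p_x} &= \frac{1}{e^{-(a + bx)}} = e^{a + bx} \\
\log\left(\frac{p_x}{1 - p_x}\right) &= a + bx
\end{align*}
```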

SLIDE 21

Fit logistic regression in JMP

• Data set: Presentation4_Burn.jmp

SLIDE 22

Fit logistic regression in JMP

• Estimated logistic regression:

$$\text{logit}(p) = -22.71 + 10.66 \times \text{midpoint}$$
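A fitted logit equation can be turned back into a probability with the inverse logit. The sketch below assumes the estimated model reads logit(p) = -22.71 + 10.66 × midpoint, with p taken as the probability of death. Both the signs and that reading of p are assumptions, inferred from the interpretation of 10.66 as a log odds ratio of death. The midpoints in the loop are illustrative values, not rows from the data set.

```python
import math

# ASSUMPTION: coefficient signs, and p = P(death), are reconstructed, not confirmed.
A = -22.71  # intercept
B = 10.66   # slope for midpoint, i.e. for log(area + 1)


def inverse_logit(z):
    """Map a log odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(midpoint, a=A, b=B):
    """Probability implied by logit(p) = a + b * midpoint."""
    return inverse_logit(a + b * midpoint)


for m in (1.5, 2.0, 2.5):  # illustrative midpoints only
    print(m, round(predicted_probability(m), 4))

# Midpoint at which the predicted probability is exactly one half.
print(-A / B)
```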

SLIDE 23

Interpretation of logistic regression parameters

• If X has several discrete levels or is measured on a continuous scale, there is no change in the interpretation of a (log odds of D when X=0)
• The log odds ratio comparing two exposure groups is

$$
\begin{aligned}
\log(OR) &= \log\left[\frac{\text{Odds for } D \mid X = x + 1}{\text{Odds for } D \mid X = x}\right]
= \log\left[\frac{p_{x+1}/(1 - p_{x+1})}{p_x/(1 - p_x)}\right] \\
&= \log[p_{x+1}/(1 - p_{x+1})] - \log[p_x/(1 - p_x)] \\
&= [a + b(x + 1)] - [a + bx] = b
\end{aligned}
$$

• b is the log odds ratio associated with a unit increase in X
• 10.66 is the log odds ratio of death associated with a unit increase in midpoint (the midpoints of set intervals of log(area + 1)).
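Because b is a log odds ratio per unit of X, the odds ratio for any increase delta in X is exp(b × delta), whatever the starting value of X. The sketch below checks this numerically. The slope 10.66 is taken from the slides; the intercept sign is an assumption (it cancels out of the odds ratio), and the step of 0.1 and the starting midpoints are hypothetical.

```python
import math

B = 10.66   # slope reported on the slides
A = -22.71  # intercept; sign ASSUMED, and it cancels out of every odds ratio below


def probability(x, a=A, b=B):
    """Logistic model: P(D | X = x) = 1 / (1 + exp(-(a + b x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def odds(p):
    return p / (1.0 - p)


def odds_ratio(x_from, x_to):
    """Odds of D at x_to divided by odds of D at x_from."""
    return odds(probability(x_to)) / odds(probability(x_from))


delta = 0.1  # hypothetical step in midpoint, chosen for illustration
or_low = odds_ratio(1.8, 1.8 + delta)
or_high = odds_ratio(2.0, 2.0 + delta)
or_formula = math.exp(B * delta)
print(or_low, or_high, or_formula)
```

The two starting points give the same odds ratio, which is the sense in which b does not depend on x.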

SLIDE 24

Example of logistic regression model

• Consider a study of the analgesic effects of treatments on elderly patients with neuralgia. Two test treatments and a placebo are compared. The response variable is whether the patient reported pain or not. Researchers recorded age and gender of the patients and the duration of complaint before the treatment began. The data, consisting of 60 patients, are contained in the data set Presentation4_logistic.jmp.
• Look at the difference between male and female on pain
• Look at the treatment effect on pain

SLIDE 25

Logistic regression in JMP

• Data set: Presentation4_logistic.jmp
• Analyze > Fit Model

SLIDE 26

Logistic regression in JMP

• Logistic regression results:

$$\text{logit}(p) = 0.37 + 0.63 \times I(\text{sex} = F)$$

SLIDE 27

Interpretation of logistic regression parameters

• Suppose the exposure variable X only takes on two values (1 is exposed and 0 is unexposed)
• When X=0, then log(p0/(1-p0)) = a + b*0 = a
• So, a is the log odds of D amongst the unexposed.
• The slope parameter b is just the log Odds Ratio:

$$
\begin{aligned}
\log(OR) &= \log\left[\frac{\text{Odds for } D \mid X = 1}{\text{Odds for } D \mid X = 0}\right]
= \log\left[\frac{p_1/(1 - p_1)}{p_0/(1 - p_0)}\right] \\
&= \log[p_1/(1 - p_1)] - \log[p_0/(1 - p_0)] \\
&= (a + b \cdot 1) - (a + b \cdot 0) = b
\end{aligned}
$$

• 0.63 is the log odds ratio of No Pain comparing females vs. males.
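For a two-level exposure coded 0/1, a and b can be read straight off a 2x2 table: a is the log odds among the unexposed and b is the log odds ratio. A minimal sketch with hypothetical counts (not the neuralgia data):

```python
import math

# Hypothetical 2x2 counts, for illustration only (NOT the neuralgia data).
exposed_with_d, exposed_without_d = 30, 10
unexposed_with_d, unexposed_without_d = 20, 10

odds_exposed = exposed_with_d / exposed_without_d        # odds of D when X = 1
odds_unexposed = unexposed_with_d / unexposed_without_d  # odds of D when X = 0

a = math.log(odds_unexposed)     # intercept: log odds of D among the unexposed
b = math.log(odds_exposed) - a   # slope: log odds ratio, exposed vs. unexposed
odds_ratio = math.exp(b)

print(a, b, odds_ratio)
```

Here a + b recovers the log odds among the exposed, matching the model with X = 1.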

SLIDE 28

Odds ratio from JMP

• Odds ratio for sex
• The odds ratio of reporting no pain comparing females vs. males is 3.60. Based on the observed data, with 95% confidence the odds ratio could be as low as 1.25 or as high as 11.09.

SLIDE 29

Odds ratio from JMP

• Calculate the odds ratio of no pain comparing treatment A or B vs. placebo.

SLIDE 30

Exercise

• A study is conducted to examine the effect of age on coronary heart disease (CHD). The data include the ID and the age of the subject and whether the subject has CHD or not.
• 1. Fit a logistic regression to examine the effect of age on CHD.
• 2. Fit a logistic regression to examine the effect of age group on CHD.
• Data set: Presentation4_logisticCHD.jmp
