Intro to Contingency Tables


  1. Intro to Contingency Tables
     Author: Nicholas Reich
     Course: Categorical Data Analysis (BIOSTATS 743)
     Made available under the Creative Commons Attribution-ShareAlike 4.0 International License.

  2. Independence
     Definition: Two categorical variables are independent iff
     $\pi_{ij} = \pi_{i+}\,\pi_{+j}$ for all $i \in \{1, 2, \ldots, I\}$ and $j \in \{1, 2, \ldots, J\}$,
     or equivalently $P(X = i, Y = j) = P(X = i)\,P(Y = j)$.
     Independence implies that each conditional distribution reduces to the corresponding marginal distribution:
     $\pi_{j \mid i} = \dfrac{\pi_{ij}}{\pi_{i+}} = \dfrac{\pi_{i+}\,\pi_{+j}}{\pi_{i+}} = \pi_{+j}$,
     or, under the independence assumption, $P(Y = j \mid X = i) = P(Y = j)$.
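
     A minimal sketch in Python (not part of the original slides): it builds a hypothetical joint table from its margins and checks numerically that, under independence, the conditional distribution of Y given X equals the column marginals.

     ```python
     # Checking pi_ij = pi_{i+} * pi_{+j} on a hypothetical 2x3 joint table.
     import numpy as np

     # Hypothetical margins; the joint table is independent by construction.
     row_marginals = np.array([0.4, 0.6])             # pi_{i+}
     col_marginals = np.array([0.2, 0.3, 0.5])        # pi_{+j}
     joint = np.outer(row_marginals, col_marginals)   # pi_ij = pi_{i+} pi_{+j}

     # Conditional distribution of Y given X = i equals the marginal of Y.
     conditional = joint / joint.sum(axis=1, keepdims=True)  # pi_{j|i}
     print(np.allclose(conditional, col_marginals))          # True
     ```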

  3. Testing for independence (two-way contingency table)
     ◮ Under $H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i, j$, the expected cell counts are $\mu_{ij} = n\,\pi_{i+}\pi_{+j}$.
     ◮ Usually $\pi_{i+}$ and $\pi_{+j}$ are unknown. Their MLEs are $\hat\pi_{i+} = n_{i+}/n$ and $\hat\pi_{+j} = n_{+j}/n$.
     ◮ The estimated expected cell counts are $\hat\mu_{ij} = n\,\hat\pi_{i+}\hat\pi_{+j} = n_{i+}n_{+j}/n$.
     ◮ Pearson $\chi^2$ statistic: $X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \dfrac{(n_{ij} - \hat\mu_{ij})^2}{\hat\mu_{ij}}$
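
     A minimal sketch in Python (not part of the original slides): it computes the estimated expected counts, the Pearson $X^2$ statistic, its degrees of freedom, and a p-value for a small hypothetical table; the counts are made up for illustration and SciPy is assumed to be available.

     ```python
     # Pearson chi-squared test of independence for a hypothetical two-way table.
     import numpy as np
     from scipy.stats import chi2

     observed = np.array([[30, 10, 20],
                          [20, 30, 40]])              # n_ij (hypothetical counts)
     n = observed.sum()
     row_tot = observed.sum(axis=1, keepdims=True)    # n_{i+}
     col_tot = observed.sum(axis=0, keepdims=True)    # n_{+j}

     expected = row_tot @ col_tot / n                 # mu_hat_ij = n_{i+} n_{+j} / n
     X2 = ((observed - expected) ** 2 / expected).sum()
     df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
     p_value = chi2.sf(X2, df)

     # Per-cell contributions (O - E)^2 / E flag where independence fits poorly.
     contributions = (observed - expected) ** 2 / expected
     print(X2, df, p_value)
     ```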

  4. ◮ $\hat\mu_{ij}$ requires estimating $\pi_{i+}$ and $\pi_{+j}$, which carry $I - 1$ and $J - 1$ degrees of freedom, respectively. Notice the constraints $\sum_i \pi_{i+} = \sum_j \pi_{+j} = 1$.
     ◮ The degrees of freedom are therefore $(IJ - 1) - (I - 1) - (J - 1) = (I - 1)(J - 1)$.
     ◮ $X^2$ is asymptotically $\chi^2_{(I-1)(J-1)}$.
     ◮ It is helpful to look at the residual contributions $(O - E)^2 / E$; they indicate where the independence model fits well and where it does not.

  5. Measures of Diagnostic Tests

     Disease Status | Diagnosis: + | Diagnosis: −
     $D$            | $\pi_{11}$   | $\pi_{12}$
     $\bar{D}$      | $\pi_{21}$   | $\pi_{22}$

     ◮ Sensitivity: $P(+ \mid D) = \pi_{11}/\pi_{1+}$
     ◮ Specificity: $P(- \mid \bar{D}) = \pi_{22}/\pi_{2+}$
     ◮ An ideal diagnostic test has both high sensitivity and high specificity.

  6. Example:

     Disease Status | Diagnosis: + | Diagnosis: −
     $D$            | 0.86         | 0.14
     $\bar{D}$      | 0.12         | 0.88

     ◮ Sensitivity = 0.86
     ◮ Specificity = 0.88
     However, from a clinical point of view, sensitivity and specificity alone do not tell a patient how likely disease is given a test result. So we introduce the positive predictive value and the negative predictive value.

  7. ◮ Positive predictive value (PPV) $= P(D \mid +) = \pi_{11}/\pi_{+1}$
     ◮ Negative predictive value (NPV) $= P(\bar{D} \mid -) = \pi_{22}/\pi_{+2}$
     ◮ Relationship between PPV and sensitivity:
       $$\mathrm{PPV} = P(D \mid +) = \frac{P(D \cap +)}{P(+)}
         = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \bar{D})\,P(\bar{D})}
         = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + (1 - P(- \mid \bar{D}))\,P(\bar{D})}$$
       $$= \frac{\text{Sensitivity} \times \text{Prevalence}}{\text{Sensitivity} \times \text{Prevalence} + (1 - \text{Specificity}) \times (1 - \text{Prevalence})}$$
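
     A minimal sketch in Python (not part of the original slides): a small helper that applies the PPV identity above; the example call uses the sensitivity, specificity, and prevalence from the next slide.

     ```python
     # PPV from sensitivity, specificity, and prevalence via Bayes' rule.
     def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
         true_pos = sensitivity * prevalence               # P(+ | D) P(D)
         false_pos = (1 - specificity) * (1 - prevalence)  # P(+ | not D) P(not D)
         return true_pos / (true_pos + false_pos)

     print(ppv(0.86, 0.88, 0.02))   # ~0.128, i.e., roughly 13%
     ```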

  8. The same example:

     Disease Status | Diagnosis: + | Diagnosis: −
     $D$            | 0.86         | 0.14
     $\bar{D}$      | 0.12         | 0.88

     ◮ If the prevalence is $P(D) = 0.02$, then
       $\mathrm{PPV} = \dfrac{0.86 \times 0.02}{0.86 \times 0.02 + 0.12 \times 0.98} \approx 13\%$
     ◮ Notice: $\mathrm{PPV} \neq \dfrac{\pi_{11}}{\pi_{11} + \pi_{21}}$
     ◮ That equality holds only when $n_1/(n_1 + n_2)$ equals the disease prevalence.

  9. Comparing two groups
     We first consider 2 × 2 tables. Suppose the response variable Y has two categories, success and failure, and the explanatory variable X has two categories, group 1 and group 2, with fixed sample sizes in each group.

     Explanatory X | Response Y: Success | Response Y: Failure  | Row Total
     group 1       | $n_{11} = x_1$      | $n_{12} = n_1 - x_1$ | $n_1$
     group 2       | $n_{21} = x_2$      | $n_{22} = n_2 - x_2$ | $n_2$

     The goal is to compare the probability of a success on Y across the two levels of X. Assume $X_1 \sim \mathrm{bin}(n_1, \pi_1)$ and $X_2 \sim \mathrm{bin}(n_2, \pi_2)$. Three common comparisons are:
     ◮ difference of proportions
     ◮ relative risk
     ◮ odds ratio

  10. Difference of Proportions

     Explanatory X | Response Y: Success | Response Y: Failure  | Row Total
     group 1       | $n_{11} = x_1$      | $n_{12} = n_1 - x_1$ | $n_1$
     group 2       | $n_{21} = x_2$      | $n_{22} = n_2 - x_2$ | $n_2$

     ◮ The difference of the proportions of successes is $\pi_1 - \pi_2$.
     ◮ A comparison of failures is equivalent to a comparison of successes: $(1 - \pi_1) - (1 - \pi_2) = \pi_2 - \pi_1$.
     ◮ The difference of proportions takes values in $[-1, 1]$.

  11. ◮ The estimate of $\pi_1 - \pi_2$ is $\hat\pi_1 - \hat\pi_2 = n_{11}/n_1 - n_{21}/n_2$.
     ◮ The estimate of its asymptotic standard error is
       $$\hat\sigma(\hat\pi_1 - \hat\pi_2) = \left[\frac{\hat\pi_1(1 - \hat\pi_1)}{n_1} + \frac{\hat\pi_2(1 - \hat\pi_2)}{n_2}\right]^{1/2}$$
     ◮ The statistic for testing $H_0: \pi_1 = \pi_2$ vs. $H_a: \pi_1 \neq \pi_2$ is
       $$Z = \frac{\hat\pi_1 - \hat\pi_2}{\hat\sigma(\hat\pi_1 - \hat\pi_2)},$$
       which is asymptotically standard normal (a difference of two asymptotically normal estimators is itself asymptotically normal).
     ◮ The CI is $(\hat\pi_1 - \hat\pi_2) \pm Z_{\alpha/2}\,\hat\sigma(\hat\pi_1 - \hat\pi_2)$.
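
     A minimal sketch in Python (not part of the original slides): the Wald test and CI for $\pi_1 - \pi_2$ with hypothetical success counts and sample sizes; SciPy is assumed for the normal quantiles.

     ```python
     # Wald test and 95% CI for the difference of two proportions.
     from math import sqrt
     from scipy.stats import norm

     x1, n1 = 45, 100     # hypothetical successes / sample size, group 1
     x2, n2 = 30, 100     # hypothetical successes / sample size, group 2

     p1, p2 = x1 / n1, x2 / n2
     se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
     z = (p1 - p2) / se
     p_value = 2 * norm.sf(abs(z))            # two-sided H_a: pi_1 != pi_2

     z_crit = norm.ppf(0.975)                 # Z_{alpha/2} for a 95% interval
     ci = ((p1 - p2) - z_crit * se, (p1 - p2) + z_crit * se)
     print(z, p_value, ci)
     ```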

  12. Relative Risk
     ◮ Definition: $r = \pi_1/\pi_2$
     ◮ Motivation: the difference between $\pi_1 = 0.010$ and $\pi_2 = 0.001$ is more noteworthy than the difference between $\pi_1 = 0.410$ and $\pi_2 = 0.401$. The relative risk ($0.010/0.001 = 10$ vs. $0.410/0.401 \approx 1.02$) is more informative here than the difference of proportions (0.009 in both cases).
     ◮ The estimate of $r$ is $\hat r = \hat\pi_1/\hat\pi_2$.

  13. ◮ The estimator converges to normality faster on the log scale.
     ◮ The estimator of $\log r$ is $\log\hat r = \log\hat\pi_1 - \log\hat\pi_2$. The asymptotic standard error of $\log\hat r$ is
       $$\hat\sigma(\log\hat r) = \left(\frac{1 - \pi_1}{\pi_1 n_1} + \frac{1 - \pi_2}{\pi_2 n_2}\right)^{1/2}$$
     ◮ Delta method: if $\sqrt{n}(\hat\beta - \beta_0) \to N(0, \sigma^2)$, then $\sqrt{n}(f(\hat\beta) - f(\beta_0)) \to N(0, [f'(\beta_0)]^2\sigma^2)$ for any function $f$ such that $f'(\beta_0)$ exists (and is nonzero).
     ◮ Here $\beta = \pi_1$ or $\pi_2$ and $f(\beta) = \log(\pi_1)$ or $\log(\pi_2)$.

  14. ◮ The CI for $\log r$ is $[\log\hat r - Z_{1-\alpha/2}\,\hat\sigma(\log\hat r),\ \log\hat r + Z_{1-\alpha/2}\,\hat\sigma(\log\hat r)]$.
     ◮ The CI for $r$ is $[\exp\{\log\hat r - Z_{1-\alpha/2}\,\hat\sigma(\log\hat r)\},\ \exp\{\log\hat r + Z_{1-\alpha/2}\,\hat\sigma(\log\hat r)\}]$.
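
     A minimal sketch in Python (not part of the original slides): the log-scale CI for the relative risk, exponentiated back to the risk-ratio scale, using the same hypothetical counts as the earlier sketch.

     ```python
     # CI for the relative risk, built on the log scale and exponentiated.
     from math import exp, log, sqrt
     from scipy.stats import norm

     x1, n1 = 45, 100          # hypothetical successes / sample size, group 1
     x2, n2 = 30, 100          # hypothetical successes / sample size, group 2

     p1, p2 = x1 / n1, x2 / n2
     log_rr = log(p1) - log(p2)
     se = sqrt((1 - p1) / (p1 * n1) + (1 - p2) / (p2 * n2))

     z = norm.ppf(0.975)
     ci_log = (log_rr - z * se, log_rr + z * se)
     ci_rr = (exp(ci_log[0]), exp(ci_log[1]))
     print(exp(log_rr), ci_rr)
     ```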

  15. Odds Ratio
     ◮ Odds in group 1: $\phi_1 = \dfrac{\pi_1}{1 - \pi_1}$
     ◮ Interpretation: $\phi_1 = 3$ means a success is three times as likely as a failure in group 1.
     ◮ Odds ratio: $\theta = \dfrac{\phi_1}{\phi_2} = \dfrac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)}$
     ◮ Interpretation: $\theta = 4$ means the odds of success in group 1 are four times the odds of success in group 2.
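
     A minimal sketch in Python (not part of the original slides): odds and the odds ratio computed from hypothetical group probabilities, matching the interpretations above.

     ```python
     # Odds and odds ratio for hypothetical success probabilities.
     pi1, pi2 = 0.75, 0.50

     odds1 = pi1 / (1 - pi1)    # phi_1 = 3: success 3x as likely as failure
     odds2 = pi2 / (1 - pi2)    # phi_2 = 1
     theta = odds1 / odds2      # odds in group 1 are 3x the odds in group 2
     print(odds1, odds2, theta)
     ```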

  16. ◮ The estimate is $\hat\theta = \dfrac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$.
     ◮ $\log(\hat\theta)$ converges to normality much faster than $\hat\theta$.
     ◮ An estimate of the asymptotic standard error of $\log(\hat\theta)$ is
       $$\hat\sigma(\log\hat\theta) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$

  17. This formula can be derived using the Delta method. Recall
     $$\log\hat\theta = \log(\hat\pi_1) - \log(1 - \hat\pi_1) - \log(\hat\pi_2) + \log(1 - \hat\pi_2).$$
     First take $f(\beta) = \log(\pi_1) - \log(1 - \pi_1)$ with $\sigma^2 = \dfrac{\pi_1(1 - \pi_1)}{n_1}$. Then
     $$f'(\beta) = \frac{1}{\pi_1} + \frac{1}{1 - \pi_1}, \qquad
       [f'(\beta)]^2\sigma^2 = \frac{1}{n_1\pi_1} + \frac{1}{n_1(1 - \pi_1)},$$
     which is estimated by $\dfrac{1}{n_{11}} + \dfrac{1}{n_{12}}$. Similarly, taking $f(\beta) = \log(\pi_2) - \log(1 - \pi_2)$ gives the estimated contribution $\dfrac{1}{n_{21}} + \dfrac{1}{n_{22}}$.

  18. ◮ The Wald CI for $\log\theta$ is $\log\hat\theta \pm Z_{\alpha/2}\,\hat\sigma(\log\hat\theta)$.
     ◮ Exponentiating the endpoints gives a confidence interval for $\theta$.
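
     A minimal sketch in Python (not part of the original slides): the sample odds ratio, its log-scale standard error, and the Wald CI, computed from hypothetical 2 × 2 cell counts.

     ```python
     # Sample odds ratio with a Wald CI built on the log scale.
     from math import exp, log, sqrt
     from scipy.stats import norm

     n11, n12, n21, n22 = 45, 55, 30, 70     # hypothetical cell counts

     theta_hat = (n11 * n22) / (n12 * n21)
     se_log = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

     z = norm.ppf(0.975)
     ci_log = (log(theta_hat) - z * se_log, log(theta_hat) + z * se_log)
     ci = (exp(ci_log[0]), exp(ci_log[1]))
     print(theta_hat, ci)
     ```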

  19. Relationship between Odds Ratio and Relative Risk
     ◮ A large odds ratio does not by itself imply a large relative risk.
     ◮ From the definitions of relative risk and odds ratio,
       $$\theta = \frac{\pi_1}{\pi_2} \cdot \frac{1 - \pi_2}{1 - \pi_1} = \text{relative risk} \times \frac{1 - \pi_2}{1 - \pi_1}.$$
     ◮ When the probabilities $\pi_1$ and $\pi_2$ (the risk in each row group) are both very small, the second ratio above is $\approx 1$, so odds ratio $\approx$ relative risk.
     ◮ This means that when the relative risk is not directly estimable, e.g., in case-control studies, and $\pi_1$ and $\pi_2$ are both very small, the relative risk can be approximated by the odds ratio. A numerical illustration follows below.
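
     A minimal sketch in Python (not part of the original slides): the odds ratio is close to the relative risk when both probabilities are small, and noticeably larger otherwise, using the probabilities from the relative-risk motivation slide.

     ```python
     # Odds ratio vs. relative risk for small and for moderate probabilities.
     def odds_ratio(p1: float, p2: float) -> float:
         return (p1 / (1 - p1)) / (p2 / (1 - p2))

     def relative_risk(p1: float, p2: float) -> float:
         return p1 / p2

     print(relative_risk(0.010, 0.001), odds_ratio(0.010, 0.001))  # 10.0 vs ~10.09
     print(relative_risk(0.410, 0.401), odds_ratio(0.410, 0.401))  # ~1.02 vs ~1.04
     ```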

  20. Case-Control Studies and the Odds Ratio
     Consider the case-control study of lung cancer:

     Smoker | Lung Cancer: Cases | Lung Cancer: Controls
     Yes    | 688                | 650
     No     | 21                 | 59
     Total  | 709                | 709

     ◮ People are recruited based on lung cancer status, therefore $P(Y = j)$ is known. However, $P(X = i)$ is unknown.
     ◮ The conditional probabilities $P(X = i \mid Y = j)$ can be estimated.
     ◮ The conditional probabilities $P(Y = j \mid X = i)$ cannot be estimated.
     ◮ The relative risk and the difference of proportions therefore cannot be estimated.

  21. ◮ Odds can be estimated:
       $$\text{Odds of lung cancer among smokers}
         = \frac{P(\text{Case} \mid \text{Smoker})}{P(\text{Control} \mid \text{Smoker})}
         = \frac{P(\text{Case} \cap \text{Smoker}) / P(\text{Smoker})}{P(\text{Control} \cap \text{Smoker}) / P(\text{Smoker})}
         = \frac{P(\text{Case} \cap \text{Smoker})}{P(\text{Control} \cap \text{Smoker})}
         = 688/650 = 1.06$$
     ◮ The odds do not depend on the probability of being a smoker.
     ◮ The odds ratio can also be estimated:
       $$\theta = \frac{P(X = 1 \mid Y = 1)\,P(X = 2 \mid Y = 2)}{P(X = 1 \mid Y = 2)\,P(X = 2 \mid Y = 1)} = 2.97$$
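
     A minimal sketch in Python (not part of the original slides): reproducing the odds and the odds ratio from the lung cancer table above.

     ```python
     # Odds within each smoking group and the odds ratio for the table above.
     n11, n12 = 688, 650    # smokers: cases, controls
     n21, n22 = 21, 59      # non-smokers: cases, controls

     odds_smokers = n11 / n12          # 688/650 ~ 1.06
     odds_nonsmokers = n21 / n22       # 21/59  ~ 0.36
     theta_hat = (n11 * n22) / (n12 * n21)
     print(odds_smokers, odds_nonsmokers, theta_hat)   # theta_hat ~ 2.97
     ```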

  22. Supplementary: Review of the Delta Method
     The Delta method builds on the Central Limit Theorem to let us examine the limiting distribution of a function g of a random variable X. It is not too complicated to derive the Delta method in the univariate case. We need Slutsky's Theorem along the way, so it is helpful to first review the notions of convergence in order to see where Slutsky's Theorem fits into the derivation.

  23. Delta Method: Convergence of Random Variables
     Consider a sequence of random variables $X_1, X_2, \ldots, X_n$, where the distribution of $X_i$ may be a function of $i$.
     ◮ Let $F_n(x)$ be the CDF of $X_n$ and $F(x)$ the CDF of $X$. We say that $X_n$ converges in distribution to $X$, written $X_n \to_d X$, if $\lim_{n \to \infty}[F_n(x) - F(x)] = 0$ at every $x$ where $F(x)$ is continuous.
     ◮ We say that $X_n$ converges in probability to $X$, written $X_n \to_p X$, if $\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0$ for every $\epsilon > 0$. Note that if $X_n \to_p X$, then $X_n \to_d X$, since $F_n(x) = P(X_n \le x)$ and $F(x) = P(X \le x)$. (This is not a proof, but an intuition; the Wikipedia article on convergence of random variables has a nice proof.)
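
     A minimal sketch in Python (not part of the original slides): a small Monte Carlo check of the Delta method for $f(\hat p) = \log(\hat p)$, where $\hat p$ is a sample proportion and the Delta-method variance is $(1 - p)/(np)$; the sample size, probability, and replication count are arbitrary choices for illustration.

     ```python
     # Monte Carlo check of the delta-method variance for log(p_hat).
     import numpy as np

     rng = np.random.default_rng(0)
     n, p, reps = 500, 0.3, 20_000

     p_hat = rng.binomial(n, p, size=reps) / n      # simulated sample proportions
     empirical_var = np.log(p_hat).var()            # simulated variance of log(p_hat)
     delta_var = (1 - p) / (n * p)                  # [f'(p)]^2 * p(1-p)/n with f = log
     print(empirical_var, delta_var)                # the two should be close
     ```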
