Intro to Contingency Tables
Author: Nicholas Reich
Course: Categorical Data Analysis (BIOSTATS 743)
Made available under the Creative Commons Attribution-ShareAlike 4.0 International License.
◮ Under $H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$, $\forall\, i, j$, the expected cell counts are $\mu_{ij} = n\,\pi_{i+}\pi_{+j}$
◮ Usually $\pi_{i+}$ and $\pi_{+j}$ are unknown. Their MLEs are $\hat\pi_{i+} = n_{i+}/n$ and $\hat\pi_{+j} = n_{+j}/n$
◮ Estimated expected cell counts are $\hat\mu_{ij} = n\,\hat\pi_{i+}\hat\pi_{+j} = n_{i+}n_{+j}/n$
◮ Pearson $\chi^2$ statistic: $X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(n_{ij} - \hat\mu_{ij})^2}{\hat\mu_{ij}}$
◮ Because of the constraints $\sum_i \hat\pi_{i+} = 1$ and $\sum_j \hat\pi_{+j} = 1$, only $(I-1) + (J-1)$ parameters are estimated
◮ The degrees of freedom is $(IJ - 1) - (I-1) - (J-1) = (I-1)(J-1)$
◮ $X^2$ is asymptotically $\chi^2_{(I-1)(J-1)}$
◮ It is helpful to look at the residuals
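The steps above can be computed directly from a table of counts. The 2×3 table below is a hypothetical example, not data from the lecture; a minimal sketch in plain Python:

```python
import math

# Hypothetical 2x3 table of observed counts n_ij (illustrative, not from the lecture)
n = [[20, 30, 50],
     [30, 40, 30]]

I, J = len(n), len(n[0])
total = sum(sum(row) for row in n)
row_tot = [sum(row) for row in n]                              # n_{i+}
col_tot = [sum(n[i][j] for i in range(I)) for j in range(J)]   # n_{+j}

# Estimated expected counts under independence: mu_hat_ij = n_{i+} n_{+j} / n
mu = [[row_tot[i] * col_tot[j] / total for j in range(J)] for i in range(I)]

# Pearson chi-square statistic and its degrees of freedom
X2 = sum((n[i][j] - mu[i][j]) ** 2 / mu[i][j] for i in range(I) for j in range(J))
df = (I - 1) * (J - 1)

# Pearson residuals show which cells drive the statistic
resid = [[(n[i][j] - mu[i][j]) / math.sqrt(mu[i][j]) for j in range(J)] for i in range(I)]
```

Comparing `X2` to a $\chi^2_{df}$ reference distribution gives the p-value for the independence test.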
◮ Sensitivity: $P(+\mid D) = \pi_{11}/\pi_{1+}$
◮ Specificity: $P(-\mid \bar D) = \pi_{22}/\pi_{2+}$
◮ An ideal diagnostic test has high sensitivity and high specificity
◮ Sensitivity = 0.86
◮ Specificity = 0.88
◮ Positive predictive value (PPV) $= P(D\mid +) = \pi_{11}/\pi_{+1}$
◮ Negative predictive value (NPV) $= P(\bar D\mid -) = \pi_{22}/\pi_{+2}$
◮ Relationship between PPV and sensitivity (Bayes' rule): $P(D\mid +) = \frac{P(+\mid D)P(D)}{P(+\mid D)P(D) + P(+\mid \bar D)P(\bar D)}$
◮ If the prevalence is $P(D) = 0.02$: PPV $= \frac{0.86 \times 0.02}{0.86 \times 0.02 + 0.12 \times 0.98} \approx 13\%$
◮ Notice: even with high sensitivity and specificity, the PPV is low when the disease is rare
◮ Estimating PPV directly from the table is only valid when $n_1/(n_1 + n_2)$ equals the disease prevalence
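The Bayes'-rule calculation above is easy to check numerically; the sketch below uses the sensitivity (0.86), specificity (0.88), and prevalence (0.02) quoted in the slides:

```python
# Figures from the slides: sensitivity, specificity, prevalence
sens, spec, prev = 0.86, 0.88, 0.02

# PPV via Bayes' rule: P(D|+) = P(+|D)P(D) / [P(+|D)P(D) + P(+|not D)P(not D)]
ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# NPV analogously: P(not D|-) = P(-|not D)P(not D) / [P(-|not D)P(not D) + P(-|D)P(D)]
npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
```

With a rare disease the denominator of the PPV is dominated by false positives among the healthy, which is why `ppv` comes out near 13% despite the accurate test.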
◮ difference of proportions
◮ relative risk
◮ odds ratio
◮ The difference of proportions of successes is: $\pi_1 - \pi_2$
◮ Comparison on failures is equivalent to comparison on successes: $(1-\pi_1) - (1-\pi_2) = -(\pi_1 - \pi_2)$
◮ Difference of proportions takes values in $[-1, 1]$
◮ The estimate of $\pi_1 - \pi_2$ is $\hat\pi_1 - \hat\pi_2 = \frac{n_{11}}{n_1} - \frac{n_{21}}{n_2}$
◮ The estimate of the asymptotic standard error: $\widehat{SE} = \sqrt{\frac{\hat\pi_1(1-\hat\pi_1)}{n_1} + \frac{\hat\pi_2(1-\hat\pi_2)}{n_2}}$
◮ The statistic for testing $H_0: \pi_1 = \pi_2$ vs. $H_a: \pi_1 \neq \pi_2$ is $z = \frac{\hat\pi_1 - \hat\pi_2}{\widehat{SE}}$
◮ The CI is given by $(\hat\pi_1 - \hat\pi_2) \pm z_{\alpha/2}\,\widehat{SE}$
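A short sketch of the Wald test and CI above; the counts (45/100 vs. 30/100 successes) are hypothetical, not from the lecture:

```python
import math

# Hypothetical counts: y_i successes out of n_i trials in group i
y1, n1 = 45, 100
y2, n2 = 30, 100

p1, p2 = y1 / n1, y2 / n2
diff = p1 - p2

# Estimated asymptotic standard error of the difference
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = diff / se                                  # Wald statistic for H0: pi1 = pi2
ci = (diff - 1.96 * se, diff + 1.96 * se)      # approximate 95% Wald CI
```

If `ci` excludes 0, the two-sided test rejects $H_0$ at the 5% level.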
◮ Definition: the relative risk is $r = \pi_1/\pi_2$
◮ Motivation: The difference between $\pi_1 = 0.010$ and a much smaller probability, e.g. $\pi_2 = 0.001$, is tiny in absolute terms (0.009), yet the ratio $r = 10$ shows group 1 is at ten times the risk
◮ The estimate of $r$ is $\hat r = \hat\pi_1/\hat\pi_2$
◮ The estimator converges to normality faster on the log scale.
◮ The estimator of $\log r$ is $\log\hat r = \log\hat\pi_1 - \log\hat\pi_2$
◮ Delta method: If $\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2)$, then $\sqrt{n}\left(f(\hat\beta) - f(\beta)\right) \xrightarrow{d} N\!\left(0, \sigma^2 [f'(\beta)]^2\right)$
◮ Here $\beta = \pi_1$ or $\pi_2$ and $f(\beta) = \log(\pi_1)$ or $\log(\pi_2)$
◮ The CI for $\log\hat r$ is $\log\hat r \pm z_{\alpha/2}\sqrt{\frac{1-\hat\pi_1}{n_1\hat\pi_1} + \frac{1-\hat\pi_2}{n_2\hat\pi_2}}$
◮ The CI for $\hat r$ is obtained by exponentiating the endpoints
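The log-scale CI for the relative risk can be sketched as follows; the 2×2 counts are hypothetical, not from the lecture:

```python
import math

# Hypothetical 2x2 counts: y_i successes out of n_i trials in row i
y1, n1 = 30, 100
y2, n2 = 15, 100

p1, p2 = y1 / n1, y2 / n2
rr = p1 / p2                                   # estimated relative risk

# Delta-method SE of log(rr)
se_log = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))

# Build the 95% CI on the log scale, then exponentiate the endpoints
ci = (math.exp(math.log(rr) - 1.96 * se_log),
      math.exp(math.log(rr) + 1.96 * se_log))
```

Working on the log scale keeps the back-transformed interval inside $(0, \infty)$, which a symmetric interval around $\hat r$ would not guarantee.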
◮ Odds in group 1: $\phi_1 = \frac{\pi_1}{1 - \pi_1}$
◮ Interpretation: $\phi_1 = 3$ means a success is three times as likely as a failure in group 1
◮ Odds ratio: $\theta = \frac{\phi_1}{\phi_2} = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$
◮ Interpretation: $\theta = 4$ means the odds of success in group 1 are four times the odds of success in group 2
◮ The estimate is $\hat\theta = \frac{n_{11}n_{22}}{n_{12}n_{21}}$
◮ $\log(\hat\theta)$ converges to normality faster than $\hat\theta$
◮ An estimate of the asymptotic standard error for $\log(\hat\theta)$ is $\widehat{SE} = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$
◮ The Wald CI for $\log\hat\theta$ is $\log\hat\theta \pm z_{\alpha/2}\,\widehat{SE}$
◮ Exponentiation of the endpoints provides a confidence interval for $\theta$
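The odds-ratio CI follows the same log-scale recipe; the table below is a hypothetical example:

```python
import math

# Hypothetical 2x2 table of counts (not from the lecture)
n11, n12 = 30, 70
n21, n22 = 15, 85

theta = (n11 * n22) / (n12 * n21)              # sample odds ratio

# SE of log(theta): sqrt of the sum of reciprocal cell counts
se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

# 95% Wald CI for log(theta), then exponentiate the endpoints
ci = (math.exp(math.log(theta) - 1.96 * se_log),
      math.exp(math.log(theta) + 1.96 * se_log))
```

Note the SE formula breaks down when any cell count is 0; a common fix is to add 0.5 to every cell before estimating.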
◮ A large relative risk does not imply a large odds ratio
◮ From the definitions of relative risk and odds ratio, we have $\theta = r \times \frac{1-\pi_2}{1-\pi_1}$
◮ When the probabilities $\pi_1$ and $\pi_2$ (the risk in each row group) are small, $\theta \approx r$
◮ This means that when the relative risk is not directly estimable, e.g., in a case-control study, the odds ratio can serve as an approximation to it, provided the outcome is rare
◮ People are recruited based on lung cancer status; the case/control (column) totals are therefore fixed by design
◮ Conditional probabilities $P(X = i\mid Y = j)$ can be estimated
◮ Conditional probabilities $P(Y = j\mid X = i)$ cannot be estimated
◮ Relative risk and difference of proportions cannot be estimated
◮ Odds can be estimated, e.g., the odds of being a smoker among cases: $\frac{P(X=1\mid Y=1)}{P(X=2\mid Y=1)}$
◮ These odds condition on disease status, so they do not involve the unknown marginal probability of being a smoker
◮ The odds ratio can also be estimated: $\hat\theta = \frac{n_{11}n_{22}}{n_{12}n_{21}}$
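The reason the odds ratio survives the case-control sampling scheme is its symmetry: the cross-product ratio is identical whether you condition on rows or on columns. A quick check with hypothetical counts (not real smoking data):

```python
# Hypothetical counts: rows = smoker yes/no, columns = case/control
n = [[60, 20],
     [40, 80]]

# Odds ratio computed from the table as given (conditioning on columns)
theta_rows = (n[0][0] * n[1][1]) / (n[0][1] * n[1][0])

# Odds ratio from the transposed table (conditioning on the other variable)
t = [[n[0][0], n[1][0]],
     [n[0][1], n[1][1]]]
theta_cols = (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

# The cross-product ratio n11*n22/(n12*n21) is unchanged by transposition
assert theta_rows == theta_cols
```

This is exactly why $\hat\theta$ is estimable even though the row-wise risks $P(Y\mid X)$ are not.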
◮ Let $F_n(x)$ be the CDF for $X_n$ and $F(x)$ be the CDF for $X$. It is said that $X_n$ converges in distribution to $X$, written $X_n \xrightarrow{d} X$, if $\lim_{n\to\infty} F_n(x) = F(x)$ at every point $x$ where $F$ is continuous
◮ It is said that $X_n$ converges in probability to $X$, written $X_n \xrightarrow{p} X$, if for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$
◮ Recall that Slutsky's Theorem tells us that if some random variables satisfy $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$ for a constant $c$, then $X_n + Y_n \xrightarrow{d} X + c$, $X_n Y_n \xrightarrow{d} cX$, and $X_n/Y_n \xrightarrow{d} X/c$ (for $c \neq 0$)
◮ Recall that the first-order Taylor approximation of a function $g$ about $u$ is $g(x) = g(u) + g'(u)(x-u) + \sum_{i=2}^{\infty} \frac{g^{(i)}(u)(x-u)^i}{i!}$, where the sum is the higher-order remainder
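The delta-method variance used earlier, $\mathrm{Var}(\log\hat\pi) \approx (1-\pi)/(n\pi)$ since $f'(\pi) = 1/\pi$, can be sanity-checked by simulation. The sample size, probability, and replication count below are arbitrary choices for illustration:

```python
import math
import random

random.seed(1)
n, p = 500, 0.3
reps = 4000

# Delta-method SE of log(p_hat): sqrt((1-p)/(n*p))
delta_se = math.sqrt((1 - p) / (n * p))

# Monte Carlo: empirical SD of log(p_hat) over repeated binomial samples
logs = []
for _ in range(reps):
    y = sum(random.random() < p for _ in range(n))   # Binomial(n, p) draw
    logs.append(math.log(y / n))
mean = sum(logs) / reps
mc_se = math.sqrt(sum((x - mean) ** 2 for x in logs) / (reps - 1))
```

With these settings `mc_se` lands close to `delta_se`, illustrating the first-order approximation at work.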
◮ http://www.stat.rice.edu/~dobelman/notes_papers/math/
◮ https:
◮ http://www.stat.cmu.edu/~larry/=stat325.01/chapter5.pdf
◮ https://en.wikipedia.org/wiki/Slutsky%27s_theorem
◮ http://www.math.mcgill.ca/dstephens/OldCourses/556-2007/