SLIDE 1
On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables
Tyler J. VanderWeele
Departments of Epidemiology and Biostatistics
Harvard T.H. Chan School of Public Health
Regressions with Race
SLIDE 2
SLIDE 3
Related Literature and Motivation
The current presentation is based on a paper (VanderWeele and Robinson, 2014) and work in progress (with John Jackson). Related literature includes:
(1) Countless studies in sociology, social epidemiology, social policy, etc.
(2) The position in the statistical causal inference literature that race is an immutable characteristic that does not allow counterfactuals (Holland, 1986; Greiner and Rubin, 2011)
(3) The exchange in AJE between Kaufman and Cooper (1999, 2000) and Krieger and Davey Smith (2000) on the meaning of adjusting for SES
(4) Berkman (2004) on reframing counterfactual questions in social epidemiology in terms of interventions not involving race
(5) Other recent work on the topic from a counterfactual perspective (Blank et al., 2004; Marcellesi, 2013; Sen and Wasow, 2015)
(6) The Oaxaca-Blinder decomposition in economics (Blinder 1973; Oaxaca 1973) and mediation analysis (Pearl, 2001; Imai et al., 2010; VanderWeele, 2015)
SLIDE 4
Other Motivation
Analyses by Neal and Johnson (1996) and Fryer (2011), using NLSY data, indicate that for black-white racial inequalities in income, unemployment, incarceration, and self-reported health, adjustment for a standardized test measure of educational achievement (AFQT) at ages 15-18 eliminates:
- 72% of the gap in wages
- 75% of the gap in unemployment
- 69% of the gap in incarceration rates
- 100%+ of the gap in self-reported physical health
How do we interpret this? Is this the right analysis? What do we learn from it?
SLIDE 5
Definitions of Race
The results presented are purely formal. For any given definition of race, the results relate:
(i) the results of an analysis (under that definition)
to (ii) an interpretation (with respect to that same definition).
The results are applicable irrespective of how race is defined. They apply if race is defined, e.g., by:
- self-report
- genealogy, ancestry, genetic analysis, etc.
The results are essentially agnostic to how race is defined. The interpretations the results provide are always relative to the definition of race used in the analysis.
SLIDE 6
Associational Interpretation
We might simply interpret the race coefficient, β1, in an associational or predictive manner.
Interpretation: For persons with the same value of covariates X (e.g. age; or age and childhood SES; or age, childhood SES, and educational attainment) but who differ in race (e.g. black versus white), what is the expected difference in outcomes?
Assumptions: The regression model is correctly specified.
Covariates: The interpretation is relative to the covariates X, but it is the same type of predictive/associational interpretation irrespective of what the covariates are.
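A minimal sketch of this associational reading, assuming a linear model E[Y | R, X] = β0 + β1·R + β2·X with a single covariate X. The data are synthetic and hypothetical (built so that β1 = -3 exactly), and the `ols` helper is a hypothetical pure-Python stand-in for any regression routine.

```python
# Hypothetical illustration: the race coefficient beta1 read as the expected
# difference in outcomes between R=1 and R=0 at the same value of X.

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for i in range(k):
            if i != col:
                f = a[i][col]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[col])]
    return [a[i][k] for i in range(k)]

# Synthetic data: R = 1 (black), R = 0 (white); X is one covariate (e.g. age).
r = [0, 0, 0, 1, 1, 1, 0, 1]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0, 5.0, 3.0]
y = [2.0 - 3.0 * ri + 1.5 * xi for ri, xi in zip(r, x)]  # beta1 = -3 by construction

design = [[1.0, ri, xi] for ri, xi in zip(r, x)]  # intercept, race, covariate
b0, b1, b2 = ols(design, y)
print(f"beta1 (expected difference at the same X): {b1:.3f}")
```

Nothing here is causal: the coefficient only summarizes the conditional difference in the (correctly specified) model.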
SLIDE 7
Associational Interpretation
We might simply interpret the race coefficient, β1, in an associational or predictive manner
Use: The interpretation is straightforward (descriptive), but it is not causal, and it is not clear how we would use it to reduce disparities.
If we find an association with race, we do not know if it is due to:
- Discrimination
- Physical or genetic characteristics
- Unequal educational or economic opportunities
- A common cause of race and the outcome
- A common cause of a covariate X affected by race and the outcome
SLIDE 8
Equalizing SES Distributions
Suppose instead we consider the race coefficient with and without control for individual and neighborhood SES in childhood
Y – outcome
R – race variable
SES0 – SES in childhood
NSES0 – neighborhood SES in childhood
H – complex historical process giving rise to associations
[Figure: causal diagram with nodes H, R, SES0, NSES0, Y]
SLIDE 9
Equalizing SES Distributions
We fit the model twice: once with X empty; once with X = (SES0, NSES0).
Interpretation:
When X is empty, the race coefficient is just the average difference in outcomes comparing black and white individuals.
When X = (SES0, NSES0), the race coefficient is the inequality that would remain if the distributions of individual and neighborhood SES during childhood in the black population had been set equal to those of the white population.
The difference between the two is how much of the inequality we could eliminate by equalizing the SES distributions.
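The "fit twice and compare" procedure can be sketched as follows, assuming linear models with no race-by-SES interaction (under which the adjusted coefficient equals the remaining inequality, given the assumptions on the following slides). All data are synthetic and hypothetical, and `ols` is the same hypothetical pure-Python helper.

```python
# Hypothetical sketch: race coefficient with X empty vs. X = (SES0, NSES0).

def ols(rows, y):
    """Least-squares coefficients via the normal equations, Gauss-Jordan solve."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for i in range(k):
            if i != col:
                f = a[i][col]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[col])]
    return [a[i][k] for i in range(k)]

r = [1, 1, 1, 1, 0, 0, 0, 0]                       # 1 = black, 0 = white
ses0 = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0]    # childhood SES (made up)
nses0 = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]   # childhood neighborhood SES (made up)
# Outcome constructed so the race coefficient given (SES0, NSES0) is -1.
y = [10.0 - 1.0 * ri + 2.0 * s + 1.0 * n for ri, s, n in zip(r, ses0, nses0)]

# X empty: beta1 equals the difference in mean outcomes between the groups.
b_empty = ols([[1.0, ri] for ri in r], y)
# X = (SES0, NSES0): beta1 is the inequality remaining after equalizing SES.
b_ses = ols([[1.0, ri, s, n] for ri, s, n in zip(r, ses0, nses0)], y)

reduction = b_empty[1] - b_ses[1]  # portion eliminated by equalizing SES
print(f"initial {b_empty[1]:.2f}, remaining {b_ses[1]:.2f}, eliminated {reduction:.2f}")
```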
SLIDE 10
Equalizing SES Distributions
We fit the model twice: once with X empty; once with X = (SES0, NSES0).
Covariates: If we include in X a covariate that is one of the components of race (e.g. childhood individual or neighborhood SES), then the race coefficient only picks up the remaining components.
SLIDE 11
Equalizing SES Distributions
Let $E[Y \mid R=1]$ denote the average outcome for black individuals.
Let $E[Y \mid R=0]$ denote the average outcome for white individuals.
Let $Y_x$ be the outcome that would have been observed if the SES variable(s) had been set to x.
Let $G_0$ be a random draw of the SES variable(s) from the distribution in the white population.
$E[Y \mid R=1] - E[Y_{G_0} \mid R=1]$ is the portion of the inequality eliminated if we equalized white-black SES distributions.
$E[Y_{G_0} \mid R=1] - E[Y \mid R=0]$ is the inequality that would remain if we equalized white-black SES distributions.
Under the assumptions described below:
$$E[Y_{G_0} \mid R=1] = \sum_x E[Y \mid R=1, x]\, P(x \mid R=0)$$
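The standardization formula can be sketched for a discrete SES variable with two hypothetical levels. The conditional means and distributions are made up, covariates C are ignored, and the unconfoundedness assumption stated on the next slide is taken as given.

```python
# Hypothetical sketch of E[Y_{G0} | R=1] = sum_x E[Y | R=1, x] P(x | R=0).

mean_y = {  # E[Y | R, x], keyed by (race, SES level); made-up values
    (1, "low"): 10.0, (1, "high"): 20.0,
    (0, "low"): 12.0, (0, "high"): 22.0,
}
p_ses = {  # P(x | R); made-up values
    1: {"low": 0.7, "high": 0.3},
    0: {"low": 0.3, "high": 0.7},
}

def observed_mean(race):
    """E[Y | R=race] = sum_x E[Y | R=race, x] P(x | R=race)."""
    return sum(mean_y[(race, x)] * p for x, p in p_ses[race].items())

def standardized_black_mean():
    """E[Y_{G0} | R=1]: black outcomes under the white SES distribution."""
    return sum(mean_y[(1, x)] * p for x, p in p_ses[0].items())

total = observed_mean(1) - observed_mean(0)
eliminated = observed_mean(1) - standardized_black_mean()
remaining = standardized_black_mean() - observed_mean(0)
print(f"total {total:.2f} = eliminated {eliminated:.2f} + remaining {remaining:.2f}")
```

The two pieces always add back to the total inequality, since the standardized mean is simply added and subtracted.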
SLIDE 12
Equalizing SES Distributions
Note: In the interpretation here we are not talking about the effect of race, but really about the effects of SES: how much the racial inequality would be reduced by intervening on early childhood SES.
Assumptions:
The effects of childhood individual and neighborhood SES on the outcome Y should be unconfounded conditional on the race variable and other covariates (e.g. we may want to control for C = age), i.e. $Y_x \perp\!\!\!\perp X \mid (R, C)$.
Caveat: In any given study we will only have specific measures of childhood individual and neighborhood SES (so we see how the inequality would be reduced if we equalized our actual SES measures).
[Figure: causal diagram with nodes H, R, SES0, NSES0, Age, Y]
SLIDE 13
Equalizing SES Distributions
Covariates: What if we control for covariates in X that may themselves be affected by race e.g. years of education?
[Figure: causal diagram with nodes H, R, SES0, NSES0, SES1, Y]
SLIDE 14
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES0, NSES0); once with X = (SES0, NSES0, SES1).
Interpretation: When X = (SES0, NSES0), the race coefficient is the racial inequality for those with the same childhood individual and neighborhood SES.
When we control for (SES0, NSES0, SES1), the race coefficient is the inequality, for those with a given (SES0, NSES0), that would remain if the distribution of adult SES1 in the black population had been set equal to that of the white population.
The difference between the two is how much of the racial inequality we could eliminate (for those of the same childhood SES) by equalizing the adult SES1 distributions.
SLIDE 15
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES0, NSES0); once with X = (SES0, NSES0, SES1).
Assumptions: The assumption required is that the effects of the adult measure M = SES1 on the outcome Y are unconfounded conditional on race, covariates C, and X = childhood individual and neighborhood SES, i.e. $Y_m \perp\!\!\!\perp M \mid (R, C, X)$.
It is thus important to control for childhood SES (otherwise childhood SES confounds the effect, and the adult SES coefficient also picks up childhood SES effects).
Note again that we are not interpreting the race coefficient itself causally. Our causal intervention is on adult SES1.
SLIDE 16
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES0, NSES0); once with X = (SES0, NSES0, SES1).
Method: Adding adult SES1 is somewhat analogous to mediation. Analytic approaches in the causal mediation analysis literature (Pearl, 2001; Imai et al., 2010; Valeri and VanderWeele, 2013; VanderWeele, 2015) in fact empirically coincide with the analytic approach described here. The assumptions differ, but the analytic methods are the same.
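One concrete point of contact with the mediation literature, sketched under the assumption of linear models fit by OLS on the same sample with no interaction: the change in the race coefficient when adult SES1 (M) is added equals the "product method" estimate, i.e. the coefficient of R in a model for M times the coefficient of M in the outcome model. All data are made up, and `ols` is a hypothetical pure-Python helper.

```python
# Hypothetical sketch: difference method vs. product method under OLS.

def ols(rows, y):
    """Least-squares coefficients via the normal equations, Gauss-Jordan solve."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for i in range(k):
            if i != col:
                f = a[i][col]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[col])]
    return [a[i][k] for i in range(k)]

r = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]                       # race indicator
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 2.0, 3.0, 5.0, 4.0]   # childhood SES (made up)
m = [2.0, 1.0, 4.0, 3.0, 1.0, 1.0, 3.0, 2.0, 5.0, 2.0]   # adult SES1 (made up)
y = [5.0, 4.0, 9.0, 8.0, 3.0, 2.0, 6.0, 5.0, 12.0, 6.0]  # outcome (made up)

beta = ols([[1.0, ri, xi] for ri, xi in zip(r, x)], y)                # Y ~ R + X
theta = ols([[1.0, ri, xi, mi] for ri, xi, mi in zip(r, x, m)], y)   # Y ~ R + X + M
alpha = ols([[1.0, ri, xi] for ri, xi in zip(r, x)], m)               # M ~ R + X

difference = beta[1] - theta[1]  # change in race coefficient when M is added
product = alpha[1] * theta[3]    # (R -> M) coefficient times (M -> Y) coefficient
print(f"difference method: {difference:.6f}, product method: {product:.6f}")
```

The equality is an algebraic property of least squares (the omitted-variable formula), not an empirical finding; it breaks once interactions or non-linear models are used.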
SLIDE 17
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES0, NSES0); once with X = (SES0, NSES0, SES1).
Method: Adding adult SES1 is somewhat analogous to mediation. This also allows us to use more complex models; we can even do this non-parametrically.
E.g. we could include an interaction between race and adult SES1 (cf. VanderWeele and Vansteelandt, 2009; Imai et al., 2010; VanderWeele, 2015) and still estimate “mediated/reduced” and “direct effect/remainder” inequality measures:
Reduction: $\sum_m E[Y \mid R=1, x, m, c]\,\{P(m \mid R=1, x, c) - P(m \mid R=0, x, c)\}$
Remainder: $\sum_m \{E[Y \mid R=1, x, m, c] - E[Y \mid R=0, x, m, c]\}\, P(m \mid R=0, x, c)$
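The reduction and remainder measures can be sketched within a single stratum (x, c) for a binary adult SES measure M. The conditional means are made up and deliberately include a race-by-M interaction; the check confirms the two measures sum to the total inequality in that stratum.

```python
# Hypothetical sketch of the reduction / remainder decomposition in one (x, c) stratum.

e_y = {(1, 0): 10.0, (1, 1): 16.0,   # E[Y | R=1, m], made up
       (0, 0): 12.0, (0, 1): 20.0}   # E[Y | R=0, m], made up
p_m = {1: {0: 0.6, 1: 0.4},          # P(m | R=1), made up
       0: {0: 0.3, 1: 0.7}}          # P(m | R=0), made up
levels = [0, 1]

# Reduction: sum_m E[Y|R=1,m] {P(m|R=1) - P(m|R=0)}
reduction = sum(e_y[(1, m)] * (p_m[1][m] - p_m[0][m]) for m in levels)
# Remainder: sum_m {E[Y|R=1,m] - E[Y|R=0,m]} P(m|R=0)
remainder = sum((e_y[(1, m)] - e_y[(0, m)]) * p_m[0][m] for m in levels)
# Total inequality in the stratum: E[Y|R=1] - E[Y|R=0]
total = (sum(e_y[(1, m)] * p_m[1][m] for m in levels)
         - sum(e_y[(0, m)] * p_m[0][m] for m in levels))
print(f"total {total:.2f} = reduction {reduction:.2f} + remainder {remainder:.2f}")
```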
SLIDE 18
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES0, NSES0); once with X = (SES0, NSES0, SES1).
Caveat: We will only have specific measures of adult SES. The interpretation is thus how the inequality would be reduced if we equalized our actual SES measure(s).
Use: This caveat may in fact be useful in informing how to reduce racial inequalities. We might compare several different SES measures, or measures of education, to see which might most reduce racial inequalities.
SLIDE 19
Illustration: Add Health
Outcome: Income (follow-up visit at age ~24-32)
“Mediator”: Verbal Test Scores at baseline (age 15.5 on average)
All regressions adjusted for age.
Estimate (95% Confidence Interval), P-value:
Initial Disparity: -8836 (-11121, -6551), P<0.001
Adjusting for Maternal Education and NSES: -7132 (-9100, -5165), P<0.001
Adjusted for Test Scores: -4842 (-6854, -2830), P<0.001
Mediated by Test Scores: -2290 (-3117, -1463), P<0.001
About 32% of the inequality in income would be eliminated by equalizing adolescent verbal test scores.
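The arithmetic behind the 32% can be checked directly from the estimates on this slide. This assumes (as the matching numbers suggest) that the test-score model also adjusts for maternal education and NSES, so the 32% is relative to the SES-adjusted disparity rather than the initial one.

```python
# Arithmetic check using the Add Health estimates reported on the slide.
initial = -8836         # initial disparity (age-adjusted)
ses_adjusted = -7132    # adjusting for maternal education and NSES
score_adjusted = -4842  # adjusted for test scores
mediated = -2290        # reported "mediated by test scores"

coef_difference = ses_adjusted - score_adjusted  # change in race coefficient
share_of_ses_adjusted = mediated / ses_adjusted  # share for those with the same childhood SES
share_of_initial = mediated / initial            # share of the initial disparity
print(coef_difference, f"{share_of_ses_adjusted:.1%}", f"{share_of_initial:.1%}")
```

Relative to the initial disparity the same reduction would be about 26%, so it matters which denominator is meant.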
SLIDE 20
Illustration: Add Health
Outcome: Income (follow-up visit at age ~24-32)
“Mediator”: Years Education
All regressions adjusted for age.
Estimate (95% Confidence Interval), P-value:
Initial Disparity: -8836 (-11121, -6551), P<0.001
Adjusting for Maternal Education and NSES: -7132 (-9100, -5165), P<0.001
Adjusted for Years Education: -6620 (-8262, -4977), P<0.001
Mediated by Years Education: -539 (-1256, 177), P=0.14
Only about 8% of the inequality in income would be eliminated by equalizing years of education.
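A quick check of the 8% figure from the slide's estimates. Note that the simple difference in race coefficients (-512) is close to, but not the same as, the reported mediated estimate (-539); the slide does not say how the mediated estimate was computed, so this sketch just shows that both give roughly 7-8%.

```python
# Arithmetic check using the Add Health years-of-education estimates on the slide.
ses_adjusted = -7132   # adjusting for maternal education and NSES
edu_adjusted = -6620   # adjusted for years of education
mediated = -539        # reported "mediated by years education"

coef_difference = ses_adjusted - edu_adjusted  # simple change in race coefficient
share_reported = mediated / ses_adjusted       # share based on the reported estimate
share_difference = coef_difference / ses_adjusted
print(coef_difference, f"{share_reported:.1%}", f"{share_difference:.1%}")
```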
SLIDE 21
Equalizing SES Distributions
What if we only equalize the distribution of SES1?
Let X = (SES0, NSES0), M = SES1, and C = other covariates (e.g. age).
The formulas become more complicated (and do not correspond to any standard comparison employed in the literature). They combine how much of the effect of X is blocked by equalizing the distribution of M = SES1, the portion of the effect of X on Y that is not through M, and how much of the disparity is reduced by equalizing X alone.
This approach can also be generalized to non-linear models.
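The slide's own formulas are not recoverable from this text, so the following is only one way to write the quantity, under stated assumptions: M is drawn from the white population's marginal distribution while each black individual keeps their own X; the effect of M on Y is unconfounded given (R, X); C is omitted. Then $E[Y_{G_M} \mid R=1] = \sum_x P(x \mid R=1) \sum_m E[Y \mid R=1, x, m]\, P(m \mid R=0)$. All numbers below are made up.

```python
# Hypothetical sketch: equalize only the distribution of M = SES1 (not X).

p_x_black = {0: 0.6, 1: 0.4}               # P(x | R=1), made up
p_m_black = {0: {0: 0.7, 1: 0.3},          # P(m | R=1, x), made up
             1: {0: 0.4, 1: 0.6}}
p_m_white = {0: 0.35, 1: 0.65}            # P(m | R=0), marginal over x, made up
e_y_black = {(0, 0): 10.0, (0, 1): 14.0,   # E[Y | R=1, x, m], made up
             (1, 0): 13.0, (1, 1): 18.0}

# Observed black mean: M follows its own conditional distribution given X.
observed = sum(px * sum(p_m_black[x][m] * e_y_black[(x, m)] for m in (0, 1))
               for x, px in p_x_black.items())
# M replaced by a draw from the white marginal distribution; X left unchanged.
m_equalized = sum(px * sum(p_m_white[m] * e_y_black[(x, m)] for m in (0, 1))
                  for x, px in p_x_black.items())
reduction = observed - m_equalized  # portion of the disparity removed (negative here)
print(f"E[Y|R=1] = {observed:.2f}, M equalized = {m_equalized:.2f}, reduction = {reduction:.2f}")
```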
SLIDE 22
Illustration: NLSY
Analyses by Neal and Johnson (1996) and Fryer (2011) using the National Longitudinal Survey of Youth (NLSY) indicate, for racial gaps in income, unemployment, incarceration, and self-reported health between white and black men, that adjustment for a standardized test measure of educational achievement (AFQT), more comprehensive than the one we used, eliminates:
- 72% of the gap in wages
But… these analyses did not control for childhood SES. However, some measures of early SES were available in their data: maternal education, net family income in childhood, and poverty level.
SLIDE 23
Illustration: NLSY
We use National Longitudinal Survey of Youth (NLSY 1979) data to examine how inequality changes after adjustment for test scores.
We replicated analyses by Fryer (2011) for wages, unemployment, incarceration, and health; in all cases the numbers were very close (<<5% difference).
We also included analyses that adjusted for early SES measures (maternal education, net family income in childhood, poverty level), to get some sense of how much of the inequality could be eliminated by equalizing either early SES circumstances or later education.
SLIDE 24
Illustration: NLSY Income
Outcome: Income (log wages)
Fryer reported: 72% of the gap in wages is eliminated controlling for test scores
Initial Disparity: -0.41 (-0.50, -0.33)
Adjusting for Early SES: -0.28 (-0.38, -0.18)
Adjusted also for Test Scores: -0.10 (-0.19, 0.01)
About 32% (= 0.13/0.41) is eliminated by equalizing early SES.
About 76% (= 0.31/0.41) is eliminated by equalizing early SES and test scores.
For those with the same early SES, equalizing test scores eliminates (0.18/0.28) = 64% of the disparity.
If we only equalize test scores without equalizing early SES, the reduction is (0.27/0.41) = 66% of the disparity.
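The percentages on this slide follow from the reported estimates; the 0.27 reduction for equalizing test scores alone is taken as given from the slide, since the formula behind it is not shown.

```python
# Arithmetic check using the NLSY log-wage estimates reported on the slide.
initial = -0.41           # initial disparity
early_ses = -0.28         # adjusting for early SES
both = -0.10              # adjusted also for test scores
m_only_reduction = 0.27   # reported reduction from equalizing test scores only

pct = {
    "early SES": (initial - early_ses) / initial,
    "early SES + test scores": (initial - both) / initial,
    "test scores, same early SES": (early_ses - both) / early_ses,
    "test scores only": m_only_reduction / abs(initial),
}
for label, share in pct.items():
    print(f"{label}: {share:.0%}")
```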
SLIDE 25
Illustration: NLSY Income
Note: (1) In the Fryer analysis, early SES was not adjusted for
- The estimate does not correspond to any of the causal interpretations above
(2) If what is of primary interest is how much of the disparity would be eliminated if we equalized only test score distributions, then…
- We need to use the slightly more complicated formulas for this
Fryer reported: 72% of the gap in wages is eliminated controlling for test scores. Here the estimates themselves don’t seem that different. Does this all ever matter?
SLIDE 26
Illustration: NLSY Health
Outcome: Health (inequality measured in continuous score)
Fryer reported: 100%+ of the gap in self-reported health is eliminated
Initial Disparity: -0.14 (-0.24, -0.04)
Adjusting for Early SES: -0.02 (-0.13, 0.09)
Adjusted for Test Scores:
About 86% is eliminated by equalizing early SES.
The remainder is eliminated by equalizing test scores as well.
About 50% of the effect of early SES on health is blocked by test scores.
Equalizing test scores alone would thus give a reduction of approximately -0.02 + 50%*(-0.12) = -0.08 (out of -0.14).
This would still perhaps reduce the disparity by slightly over 50%, but equalizing test scores alone would be unlikely to completely eliminate the disparity.
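A check of the health arithmetic from the slide's estimates. This reads -0.08 as the amount of the disparity removed (the reading under which "slightly over 50%" follows), assuming, as the slide states, that the remaining -0.02 is eliminated by test scores and that about half of the early-SES portion is blocked.

```python
# Arithmetic check using the NLSY health estimates reported on the slide.
initial = -0.14       # initial disparity
early_ses = -0.02     # adjusting for early SES
blocked_share = 0.5   # slide: ~50% of the early-SES effect is blocked by test scores

share_early_ses = (initial - early_ses) / initial   # share eliminated by early SES
ses_driven = initial - early_ses                    # portion attributable to early SES
reduction_scores_only = early_ses + blocked_share * ses_driven
share_scores_only = reduction_scores_only / initial
print(f"{share_early_ses:.0%}, {reduction_scores_only:.2f}, {share_scores_only:.0%}")
```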
SLIDE 27
Limitations
(1) We are assuming unconfoundedness of our SES measures.
(2) Various SES measures are all correlated; it may be difficult to determine which potential interventions are more important.
(3) We may often not have sufficiently rich temporal data on various SES or educational achievement measures to determine when interventions are most effective in reducing inequalities.
(4) Any actual practical intervention will likely not correspond to whatever it is we are measuring with our SES measures.
Might this still be helpful?
SLIDE 28
Informing Interventions
(1) The analyses can help generate hypotheses about what might be most effective.
(2) We don’t know whether our effect estimates are reasonable until we attempt to intervene / run a trial.
(3) But some decision has to be made as to where to intervene first. Analyses can inform this decision: we can choose what looks most effective.
(4) We must of course also consider what we can actually practically intervene on. Can we change the SES target?
This is perhaps the best we can do.
SLIDE 29
Concluding Remarks
- Regressions with race can be interpreted in different ways under different assumptions
- It is important to be clear about: What is the interpretation being given? What assumptions are being made? How are the analyses to be used?
- Arguably, an interpretation which considers the extent to which a racial inequality can be reduced is of greatest use
- Analytically this is similar to the Oaxaca-Blinder decomposition in economics (Blinder 1973; Oaxaca 1973), which focuses more on description (discrimination) than on causal interpretation; when it has been interpreted causally (Fortin, 2011), it has been with respect to race (or gender, or union membership), not with respect to equalizing covariates (SES)
SLIDE 30
Concluding Remarks
- Without careful definitions, we may not estimate the right quantities
- When interpreted causally with respect to SES/education, such analyses can inform where best to intervene. Careful analysis and interpretation may lead to better decisions on where to try to intervene first
- Reducing disparities via education seems promising, but we need to be realistic about what this can accomplish if the early SES context is left unchanged
SLIDE 31
OXFORD UNIVERSITY PRESS
Explanation in Causal Inference: Methods for Mediation and Interaction
2015 │ Hardcover │ ISBN: 9780199325870 │ Available on Amazon or from Oxford University Press