SLIDE 1

On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables

Tyler J. VanderWeele Departments of Epidemiology and Biostatistics Harvard T.H. Chan School of Public Health

SLIDE 2

Regressions with Race

Researchers often fit regression models of some outcome (Y) on race (R) and other covariates (X), e.g. $E[Y \mid R, X] = \beta_0 + \beta_1 R + \beta_2' X$, where β1 is the race coefficient:
(1) What are the different interpretations of the race coefficient?
(2) Under what assumptions does each interpretation hold?
(3) Does it matter what is included in the covariates X? Does it matter if some covariates in X are affected by race?
(4) How might any of this be useful in trying to reduce disparities?

SLIDE 3

Related Literature and Motivation

The current presentation is based on a paper (VanderWeele and Robinson, 2014) and on work in progress (with John Jackson). Related literature includes:
(1) Countless studies in sociology, social epidemiology, social policy, etc.
(2) The position in the statistical causal inference literature that race is an immutable characteristic that does not allow for counterfactuals (Holland, 1986; Greiner and Rubin, 2011)
(3) The exchange in AJE between Kaufman and Cooper (1999, 2000) and Krieger and Davey Smith (2000) on the meaning of adjusting for SES
(4) The discussion by Berkman (2004) on reframing counterfactual questions in social epidemiology in terms of interventions not involving race
(5) Other recent work on the topic from a counterfactual perspective (Blank et al., 2004; Marcellesi, 2013; Sen and Wasow, 2015)
(6) The Oaxaca-Blinder decomposition in economics (Blinder, 1973; Oaxaca, 1973) and mediation analysis (Pearl, 2001; Imai et al., 2010; VanderWeele, 2015)

SLIDE 4

Other Motivation

Analyses by Neal and Johnson (1996) and Fryer (2011), using NLSY data, indicate that for black-white racial inequalities in income, unemployment, incarceration, and self-reported health, adjustment for a standardized test measure of educational achievement (AFQT) at ages 15-18 eliminates:

  • 72% of the gap in wages
  • 75% of the gap in unemployment
  • 69% of the gap in incarceration rates
  • 100%+ of the gap in self-reported physical health

How do we interpret this? Is this the right analysis? What do we learn from it?

SLIDE 5

Definitions of Race

The results presented are purely formal. For any given definition of race, the results relate:
(i) the results of an analysis (under that definition)
to (ii) an interpretation (with respect to that same definition).

The results are applicable irrespective of how race is defined. They are applicable if race is defined, e.g., by

  • self-report
  • genealogy, ancestry, genetic analysis, etc.

The results are essentially agnostic to how race is defined. The interpretations the results provide are always relative to the definition of race used in the analysis.

SLIDE 6

Associational Interpretation

We might simply interpret the race coefficient, β1, in an associational or predictive manner.

Interpretation: For persons with the same value of the covariates X (e.g. age; or age and childhood SES; or age, childhood SES, and educational attainment) but who differ in race (e.g. black versus white), what is the expected difference in outcomes?

Assumptions: The regression model is correctly specified.

Covariates: The interpretation is relative to the covariates X, but it is the same type of predictive/associational interpretation irrespective of what the covariates are.
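A minimal sketch of the associational reading, assuming a single discrete covariate X and hypothetical toy records (not data from any study): within each stratum of X, compare the mean outcomes of the two race groups. If the model with no race-by-X interaction is correctly specified, every stratum difference equals β1.

```python
from collections import defaultdict

# Hypothetical toy records (race, x, y): race 1 = black, 0 = white; x is a discrete covariate stratum.
records = [
    (1, "low", 10.0), (1, "low", 12.0), (1, "high", 20.0), (1, "high", 22.0),
    (0, "low", 13.0), (0, "low", 15.0), (0, "high", 23.0), (0, "high", 25.0),
]


def conditional_means(rows):
    """Return {(race, x): mean outcome} for every observed (race, x) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for race, x, y in rows:
        sums[(race, x)] += y
        counts[(race, x)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def stratum_differences(rows):
    """E[Y | R=1, X=x] - E[Y | R=0, X=x] for each stratum x observed in both groups."""
    means = conditional_means(rows)
    strata = sorted({x for (_, x) in means})
    return {x: means[(1, x)] - means[(0, x)]
            for x in strata if (1, x) in means and (0, x) in means}


print(stratum_differences(records))
```

Here both strata give -3.0, which is what β1 would estimate in this toy example. If the differences varied across strata, the no-interaction model would be misspecified. Either way the number is descriptive: it says nothing about why the groups differ.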

SLIDE 7

Associational Interpretation

We might simply interpret the race coefficient, β1, in an associational or predictive manner.

Use: The interpretation is straightforward (descriptive), but it is not causal, and it is not clear how we would use it to reduce disparities.

If we find an association with race, we do not know whether it is due to:

  • Discrimination
  • Physical or genetic characteristics
  • Unequal educational or economic opportunities
  • A common cause of race and the outcome
  • A common cause of a covariate in X that is affected by race, and the outcome
SLIDE 8

Equalizing SES Distributions

Suppose instead we consider the race coefficient with and without control for individual and neighborhood SES in childhood.

Y – outcome
R – race variable
SES0 – SES in childhood
NSES0 – neighborhood SES in childhood
H – complex historical process giving rise to associations

[Diagram: causal diagram with nodes H, R, SES0, NSES0, Y]

SLIDE 9

Equalizing SES Distributions

We fit the model twice: once with X empty, and once with X = (SES0, NSES0).

Interpretation:

When X is empty, the race coefficient is just the average difference in outcomes comparing black and white individuals.

When X = (SES0, NSES0), the race coefficient is the inequality that would remain if the distributions of individual and neighborhood SES during childhood in the black population had been set equal to those of the white population.

The difference between the two is how much of the inequality we could eliminate by equalizing the SES distributions.

SLIDE 10

Equalizing SES Distributions

We fit the model twice: once with X empty, and once with X = (SES0, NSES0).

Covariates: If we include in X a covariate that is one of the components of race (e.g. childhood individual or neighborhood SES), then the race coefficient only picks up the remaining components.

SLIDE 11

Equalizing SES Distributions

Let $E[Y \mid R=1]$ denote the average outcome for black individuals.
Let $E[Y \mid R=0]$ denote the average outcome for white individuals.
Let $Y_x$ be the outcome that would have been observed if the SES variable(s) had been set to $x$.
Let $G_0$ be a random draw of the SES variable(s) from their distribution in the white population.

$E[Y \mid R=1] - E[Y_{G_0} \mid R=1]$ is the portion of the inequality eliminated if we equalized white-black SES distributions.

$E[Y_{G_0} \mid R=1] - E[Y \mid R=0]$ is the inequality that would remain if we equalized white-black SES distributions.

Under the assumptions described below: $E[Y_{G_0} \mid R=1] = \sum_x E[Y \mid R=1, x] \, P(x \mid R=0)$
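To make the standardization formula concrete, here is a minimal Python sketch with one discrete SES variable and hypothetical cell means and proportions (all numbers made up for illustration). It computes $E[Y \mid R=1]$, $E[Y \mid R=0]$, and the standardized mean $E[Y_{G_0} \mid R=1]$, then splits the observed inequality into the part eliminated and the part remaining.

```python
# Hypothetical cell-level inputs for one discrete SES variable x (all numbers made up).
# mean_y[(r, x)] = E[Y | R=r, X=x]; p_x[r][x] = P(X=x | R=r); r = 1 black, r = 0 white.
mean_y = {(1, "low"): 10.0, (1, "high"): 20.0,
          (0, "low"): 12.0, (0, "high"): 22.0}
p_x = {1: {"low": 0.7, "high": 0.3},
       0: {"low": 0.3, "high": 0.7}}


def observed_mean(r):
    """E[Y | R=r] = sum_x E[Y | R=r, x] P(x | R=r)."""
    return sum(mean_y[(r, x)] * p for x, p in p_x[r].items())


def equalized_mean_black():
    """E[Y_G0 | R=1] = sum_x E[Y | R=1, x] P(x | R=0); valid only under the no-confounding assumption."""
    return sum(mean_y[(1, x)] * p for x, p in p_x[0].items())


e1, e0, eg = observed_mean(1), observed_mean(0), equalized_mean_black()
reduction = e1 - eg   # portion of the inequality eliminated by equalizing SES
remainder = eg - e0   # inequality that would remain after equalizing SES
print(f"total {e1 - e0:.2f} = reduction {reduction:.2f} + remainder {remainder:.2f}")
```

In this toy example the within-stratum gap is -2 in both strata, so the remainder equals that common gap; that is what the race coefficient with X = SES estimates when the model without interaction is correct.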

SLIDE 12

Equalizing SES Distributions

Note: In the interpretation here we are not talking about the effect of race, but really about the effects of SES: how much the racial inequality would be reduced by intervening on early childhood SES.

Assumptions:

The effects of childhood individual and neighborhood SES on the outcome Y should be unconfounded conditional on the race variable and other covariates (e.g. we may want to control for C = age), i.e. $Y_x \perp\!\!\!\perp X \mid (R, C)$.

Caveat: In any given study we will only have specific measures of childhood individual and neighborhood SES, so we see how the inequality would be reduced if we equalized our actual SES measures.

[Diagram: causal diagram with nodes H, R, SES0, NSES0, Age, Y]

SLIDE 13

Equalizing SES Distributions

Covariates: What if we control for covariates in X that may themselves be affected by race, e.g. years of education?

[Diagram: causal diagram with nodes H, R, SES0, NSES0, SES1, Y]

SLIDE 14

Equalizing SES Distributions

We could again fit the model twice: once with X = (SES0, NSES0), and once with X = (SES0, NSES0, SES1).

Interpretation:

When X = (SES0, NSES0), the race coefficient is the racial inequality among those with the same childhood individual and neighborhood SES.

When we control for (SES0, NSES0, SES1), the race coefficient is the inequality, among those with a given (SES0, NSES0), that would remain if the distribution of adult SES1 in the black population had been set equal to that of the white population.

The difference between the two is how much of the racial inequality we could eliminate (among those with the same childhood SES) by equalizing the adult SES1 distributions.

SLIDE 15

Equalizing SES Distributions

We could again fit the model twice: once with X = (SES0, NSES0), and once with X = (SES0, NSES0, SES1).

Assumptions: The assumption required is that the effects of the adult measure M = SES1 on the outcome Y are unconfounded conditional on race, covariates C, and X = childhood individual and neighborhood SES, i.e. $Y_m \perp\!\!\!\perp M \mid (R, C, X)$.

It is thus important to control for childhood SES (otherwise the effect of adult SES is confounded: adult SES then also picks up the effects of childhood SES).

Note again that we are not interpreting the race coefficient itself causally. Our causal intervention is on adult SES1.

SLIDE 16

Equalizing SES Distributions

We could again fit the model twice: once with X = (SES0, NSES0), and once with X = (SES0, NSES0, SES1).

Method: Adding adult SES1 is somewhat analogous to mediation. Analytic approaches in the causal mediation analysis literature (Pearl, 2001; Imai et al., 2010; Valeri and VanderWeele, 2013; VanderWeele, 2015) in fact coincide empirically with the analytic approach described here. The assumptions differ, but the analytic methods are the same.
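For linear models the coincidence can be checked numerically: with ordinary least squares on the same sample, the difference between the race coefficients with and without SES1 (the "difference method") equals the race-to-SES1 coefficient times the SES1-to-Y coefficient (the "product method"), by the omitted-variable formula. A minimal pure-Python sketch on simulated data; the data-generating values, and the single childhood-SES variable standing in for X = (SES0, NSES0), are assumptions made for illustration.

```python
import random


def ols(y, columns):
    """Return OLS coefficients [intercept, coef_1, ..., coef_k] by solving the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(row[p] * row[q] for row in rows) for q in range(k)]
           + [sum(row[p] * yi for row, yi in zip(rows, y))]
           for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


random.seed(2015)
n = 2000
race = [random.randint(0, 1) for _ in range(n)]                              # R (hypothetical coding)
ses0 = [random.gauss(0, 1) - 0.5 * r for r in race]                          # X: childhood SES
ses1 = [0.8 * x - 0.4 * r + random.gauss(0, 1) for x, r in zip(ses0, race)]  # M: adult SES
y = [1.0 - 0.3 * r + 0.5 * x + 0.7 * m + random.gauss(0, 1)
     for r, x, m in zip(race, ses0, ses1)]

total = ols(y, [race, ses0])[1]           # race coefficient with X = SES0
direct = ols(y, [race, ses0, ses1])[1]    # race coefficient with X = (SES0, SES1)
a = ols(ses1, [race, ses0])[1]            # race -> SES1, given SES0
b = ols(y, [race, ses0, ses1])[3]         # SES1 -> Y, given race and SES0
print(f"difference method: {total - direct:.4f}")
print(f"product method:    {a * b:.4f}")
```

The two printed numbers agree up to floating-point error; what differs between the mediation reading and the "equalizing SES1" reading is the interpretation and its assumptions, not the arithmetic.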

SLIDE 17

Equalizing SES Distributions

We could again fit the model twice: once with X = (SES0, NSES0), and once with X = (SES0, NSES0, SES1).

Method: Adding adult SES1 is somewhat analogous to mediation. This also allows us to use more complex models; we can even do this non-parametrically.

E.g. we could include an interaction between race and adult SES1 (cf. VanderWeele and Vansteelandt, 2009; Imai et al., 2010; VanderWeele, 2015). We can still estimate “mediated/reduced” and “direct effect/remainder” inequality measures:

Reduction: $\sum_m E[Y \mid R=1, x, m, c] \, \{P(m \mid R=1, x, c) - P(m \mid R=0, x, c)\}$

Remainder: $\sum_m \{E[Y \mid R=1, x, m, c] - E[Y \mid R=0, x, m, c]\} \, P(m \mid R=0, x, c)$
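A minimal sketch of these two measures within a single (x, c) stratum, assuming a discrete adult-SES variable m and hypothetical cell means that include a race-by-SES1 interaction (the race gap is -1 at m = 0 and -3 at m = 1). The two pieces add up to the observed within-stratum gap; averaging over the distribution of (x, c) is left out of the sketch.

```python
# Hypothetical inputs for ONE stratum (x, c); m is a discrete adult-SES level (0 or 1).
# mean_y[(r, m)] = E[Y | R=r, x, m, c]; p_m[r][m] = P(m | R=r, x, c); r = 1 black, r = 0 white.
# The race gap is -1 at m=0 but -3 at m=1, i.e. a race-by-SES1 interaction.
mean_y = {(1, 0): 8.0, (1, 1): 14.0, (0, 0): 9.0, (0, 1): 17.0}
p_m = {1: {0: 0.6, 1: 0.4}, 0: {0: 0.4, 1: 0.6}}
levels = (0, 1)


def reduction():
    """sum_m E[Y|R=1,x,m,c] {P(m|R=1,x,c) - P(m|R=0,x,c)}: gap eliminated by equalizing SES1."""
    return sum(mean_y[(1, m)] * (p_m[1][m] - p_m[0][m]) for m in levels)


def remainder():
    """sum_m {E[Y|R=1,x,m,c] - E[Y|R=0,x,m,c]} P(m|R=0,x,c): gap remaining after equalizing."""
    return sum((mean_y[(1, m)] - mean_y[(0, m)]) * p_m[0][m] for m in levels)


def observed_gap():
    """E[Y|R=1,x,c] - E[Y|R=0,x,c] within the stratum."""
    e1 = sum(mean_y[(1, m)] * p_m[1][m] for m in levels)
    e0 = sum(mean_y[(0, m)] * p_m[0][m] for m in levels)
    return e1 - e0


print(f"gap {observed_gap():.2f} = reduction {reduction():.2f} + remainder {remainder():.2f}")
```

With these numbers the reduction is -1.2 and the remainder -2.2, summing to the observed gap of -3.4. No single race coefficient is needed, which is why the interaction poses no problem.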

SLIDE 18

Equalizing SES Distributions

We could again fit the model twice: once with X = (SES0, NSES0), and once with X = (SES0, NSES0, SES1).

Caveat: We will only have specific measures of adult SES. The interpretation is thus how the inequality would be reduced if we equalized our actual SES measure(s).

Use: This caveat may in fact be useful in informing how to reduce racial inequalities. We might compare several different SES measures, or measures of education, to see which might most reduce racial inequalities.

SLIDE 19

Illustration: Add Health

Outcome: Income (follow-up visit at age ~24-32)
“Mediator”: Verbal test scores at baseline (age 15.5 on average)
All regressions adjusted for age. Estimates with 95% confidence intervals:

  • Initial disparity: -8836 (-11121, -6551); P<0.001
  • Adjusted for maternal education (Mat Ed) and NSES: -7132 (-9100, -5165); P<0.001
  • Adjusted also for test scores: -4842 (-6854, -2830); P<0.001
  • Mediated by test scores: -2290 (-3117, -1463); P<0.001

About 32% of the inequality in income would be eliminated by equalizing adolescent verbal test scores.

SLIDE 20

Illustration: Add Health

Outcome: Income (follow-up visit at age ~24-32)
“Mediator”: Years of education
All regressions adjusted for age. Estimates with 95% confidence intervals:

  • Initial disparity: -8836 (-11121, -6551); P<0.001
  • Adjusted for maternal education (Mat Ed) and NSES: -7132 (-9100, -5165); P<0.001
  • Adjusted also for years of education: -6620 (-8262, -4977); P<0.001
  • Mediated by years of education: -539 (-1256, 177); P=0.14

Only about 8% of the inequality in income would be eliminated by equalizing years of education.
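The 32% and 8% figures in these two Add Health illustrations appear to use the disparity after adjustment for maternal education and NSES (-7132) as the denominator, not the initial disparity (-8836). A quick check of the arithmetic from the reported estimates:

$$\frac{-2290}{-7132} \approx 0.32, \qquad \frac{-539}{-7132} \approx 0.076 \approx 8\%, \qquad \text{whereas} \quad \frac{-2290}{-8836} \approx 0.26.$$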

SLIDE 21

Equalizing SES Distributions

What if we only equalize the distribution of SES1? Let X = (SES0, NSES0), M = SES1, and C = other covariates (e.g. age). The formulas become more complicated (and do not correspond to any standard comparison employed in the literature). Equalizing the distribution of M = SES1 blocks some of the effect of X. The key quantities are:

  • the portion of the effect of X on Y that is not through M
  • how much of the disparity is reduced by equalizing X alone

This approach can also be generalized to non-linear models.
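One way to make "equalizing SES1 alone" concrete, which is an assumption of this sketch rather than the slide's own formula: replace each black individual's M with a random draw $G_M$ from the white population's marginal distribution of M, leaving X at its black-population distribution. Under the no-confounding assumption for M this gives $E[Y_{G_M} \mid R=1] = \sum_x \sum_m E[Y \mid R=1, x, m] \, P(m \mid R=0) \, P(x \mid R=1)$ (covariates C omitted). The sketch uses hypothetical discrete distributions and a linear, no-interaction outcome model, and checks that the reduction from equalizing M alone equals the joint (X and M) reduction minus the portion of X's effect that does not go through M.

```python
# Hypothetical model for E[Y | R=r, x, m]: linear with no interactions, so the
# coefficient on x (2.0) is the effect of X on Y that does not go through M.
def y_mean(r, x, m):
    return 10.0 - 1.0 * r + 2.0 * x + 3.0 * m


# Hypothetical distributions (x = childhood SES level, m = adult SES level, both 0/1).
p_x = {1: {0: 0.6, 1: 0.4}, 0: {0: 0.3, 1: 0.7}}                # P(x | R=r)
p_m_given_x = {1: {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}},   # P(m | R=r, x)
               0: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}}


def p_m_marginal(r):
    """P(m | R=r) = sum_x P(m | R=r, x) P(x | R=r)."""
    return {m: sum(p_m_given_x[r][x][m] * p_x[r][x] for x in (0, 1)) for m in (0, 1)}


def black_mean(px, pm_given_x):
    """sum_x sum_m E[Y | R=1, x, m] * pm_given_x(x)[m] * px[x]."""
    return sum(y_mean(1, x, m) * pm_given_x(x)[m] * px[x]
               for x in (0, 1) for m in (0, 1))


observed = black_mean(p_x[1], lambda x: p_m_given_x[1][x])  # E[Y | R=1]
joint = black_mean(p_x[0], lambda x: p_m_given_x[0][x])     # equalize X and M jointly
m_alone = black_mean(p_x[1], lambda x: p_m_marginal(0))     # equalize M only (assumed estimand)

reduction_joint = observed - joint
reduction_m_alone = observed - m_alone
x_effect_not_through_m = 2.0 * (0.4 - 0.7)  # coefficient on x times (E[X|R=1] - E[X|R=0])
print(round(reduction_joint, 2), round(reduction_m_alone, 2),
      round(reduction_joint - x_effect_not_through_m, 2))
```

The last two printed numbers agree: equalizing M alone removes the joint reduction except for the part of X's effect that bypasses M. With interactions or non-linear models the same estimand still applies, but the simple subtraction no longer holds in general.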

SLIDE 22

Illustration: NLSY

Analyses by Neal and Johnson (1996) and Fryer (2011) using the National Longitudinal Survey of Youth (NLSY) indicate, for racial gaps between white and black men in income, unemployment, incarceration, and self-reported health, that adjustment for a standardized test measure of educational achievement (AFQT), more comprehensive than the one we used, eliminates:

72% of the gap in wages

But… these analyses did not control for childhood SES. However, some measures of early SES were available in their data: maternal education, net family income in childhood, and poverty level.

SLIDE 23

Illustration: NLSY

We use National Longitudinal Survey of Youth (NLSY 1979) data and examine how the inequality changes after adjustment for test scores.
We replicated the analyses by Fryer (2011) for wages, unemployment, incarceration, and health; in all cases the numbers were very close (differences well under 5%).
We also included analyses that adjusted for early SES measures (maternal education, net family income in childhood, poverty level), to get some sense of how much of the inequality could be eliminated by equalizing either early SES circumstances or later education.

SLIDE 24

Illustration: NLSY Income

Outcome: Income (log wages). Fryer reported that 72% of the gap in wages is eliminated by controlling for test scores.

  • Initial disparity: -0.41 (-0.50, -0.33)
  • Adjusted for early SES: -0.28 (-0.38, -0.18)
  • Adjusted also for test scores: -0.10 (-0.19, 0.01)

About 32% (= 0.13/0.41) is eliminated by equalizing early SES.
About 76% (= 0.31/0.41) is eliminated by equalizing early SES and test scores.
For those with the same early SES, equalizing test scores eliminates 64% (= 0.18/0.28) of the disparity.
If we only equalize test scores, without equalizing early SES, the disparity is reduced by 66% (= 0.27/0.41).

SLIDE 25

Illustration: NLSY Income

Note:
(1) In the Fryer analysis, early SES was not adjusted for.
  • The estimate does not correspond to any of the causal interpretations.
(2) If what is of primary interest is how much of the disparity would be eliminated if we equalized only the test score distributions, then…
  • We need to use the slightly more complicated formulas for this.

Fryer reported that 72% of the gap in wages is eliminated by controlling for test scores. Here the estimates themselves do not seem that different. Does this ever matter?

SLIDE 26

Illustration: NLSY Health

Outcome: Health (inequality measured as a continuous score). Fryer reported that 100%+ of the gap in self-reported health is eliminated.

  • Initial disparity: -0.14 (-0.24, -0.04)
  • Adjusted for early SES: -0.02 (-0.13, 0.09)
  • Adjusted also for test scores:

About 86% (= 0.12/0.14) is eliminated by equalizing early SES. The remainder is eliminated by equalizing test scores as well.
About 50% of the effect of early SES on health is blocked by test scores. Equalizing test scores alone would thus give a disparity of approximately -0.02 + 50% × (-0.12) = -0.08.
This would still perhaps reduce the disparity substantially, to slightly over 50% of its initial size (0.08/0.14 ≈ 57%), but equalizing test scores alone would be unlikely to completely eliminate the disparity.

SLIDE 27

Limitations

(1) We are assuming unconfoundedness of our SES measures.
(2) The various SES measures are all correlated, so it may be difficult to determine which potential interventions are more important.
(3) We may often not have sufficiently rich temporal data on the various SES or educational achievement measures to determine when interventions are most effective in reducing inequalities.
(4) Any actual practical intervention will likely not correspond to whatever it is we are measuring with our SES measures. Might this still be helpful?

SLIDE 28

Informing Interventions

(1) The analyses can help generate hypotheses about what might be most effective.
(2) We do not know whether our effect estimates are reasonable until we attempt to intervene / run a trial.
(3) But some decision has to be made as to where to intervene first. Analyses can inform this decision: we can choose what looks most effective.
(4) We must of course also consider what we can actually intervene on in practice: can we change the SES target?
This is perhaps the best we can do.

SLIDE 29

Concluding Remarks

Ø Regressions with race can be interpreted in different ways under different assumptions Ø It is important to be clear about: What is the interpretation that is being given? What assumptions are being made? How the analyses are to be used? Ø Arguably, an interpretation which considers the extent to which a racial inequality can be reduced is of greatest use

Ø Analytically this is similar to the Oaxaca-Blinder decomposition in economics (Blinder 1973; Oaxaca 1973), which focuses more on description (discrimination) rather than causal interpretation; when interpreted causally (Fortin, 2011) it has been wrt race (or gender, or union membership) not equalizing covariates (SES)

SLIDE 30

Concluding Remarks

Ø Without careful definitions, we may not estimate the right quantities Ø When interpreted causally with respect to SES/education, such analyses can inform where best to intervene Careful analysis, and interpretation, may lead to better decisions on where to try to intervene first Ø Reducing disparities via education seems promising but we need to be realistic on what this can accomplish if early SES context is left unchanged

SLIDE 31


OXFORD UNIVERSITY PRESS

Explanation in Causal Inference: Methods for Mediation and Interaction

2015 | Hardcover | ISBN: 9780199325870. Available on Amazon or from Oxford University Press.