SLIDE 1

Independent and conditionally independent counterfactuals

Marcin Wolski

European Investment Bank M.Wolski@eib.org

CRoNoS & MDA 2019 Limassol April 15, 2019

Views expressed in this study are those of the author only, and do not necessarily represent the position of the European Investment Bank.

Marcin Wolski (EIB) Independent counterfactuals April 15, 2019 1 / 24

SLIDE 2

Overview

1. Introduction: Motivation
2. Framework: Unconditional dependence; Conditional setup
3. Numerics (unconditional): Monte Carlo setup (unconditional)
4. Empirical application: Understanding the (Granger) dependence in the US grain market
5. Conclusions: The main take-aways; References
6. Supplementary materials

SLIDE 3

Counterfactuals

The goal of counterfactual analysis is to compare what actually happened with what would have happened under an alternative scenario. How can alternative scenarios be defined? Through an exogenous policy change (Rothe, 2010), a treatment group (Chernozhukov et al., 2013), or by filtering out the dependence between variables (this paper).

SLIDE 4

Quick literature overview

The vast majority of impact evaluation studies focus on parametric models: treatment effect models (Heckman, 1978), propensity score matching (Rosenbaum and Rubin, 1983), matching estimators (Abadie and Imbens, 2002), and OLS and diff-in-diff estimators (Gertler et al., 2010). Non-parametric methods include the propensity score estimated through a nonparametric regression model (Heckman et al., 1997, 1998) and mixed non-parametric/parametric methods (Chernozhukov et al., 2013).

Under an assumption called conditional exogeneity, counterfactual effects can be interpreted as causal effects.

Fully nonparametric approach:

total effects (Rothe, 2010), partial effects (Rothe, 2012).

SLIDE 5

This paper

Theoretical contribution: a fully non-parametric dependence filtering framework for

unconditional dependence and conditional dependence,

with consistent inference methods:

Gaussian and bootstrap confidence bounds,

smooth estimates (improved MSE performance) and numerical verification.

Empirical contribution: a link to hypothesis testing, and a better understanding of the dependence structure in the US grain market.

SLIDE 6

Unconditional setup (I)

General assumptions: $Y$ is the (one-dimensional) outcome variable with CDF $F_Y(y)$; $X$ is a covariate (vector) with CDF $F_X(x)$ and PDF $f_X(x)$; we observe an i.i.d. sample $\{(Y_i, X_i) : i = 1, \dots, n\}$.

Variable dependence: $F_{Y|X}(y|x) \neq F_Y(y)$ for some $x, y$.

The dependence filtering idea: construct a counterfactual outcome variable $Y'$ with realizations $y'$ such that $F_{Y'|X}(y|x) = F_Y(y)$ for all $x, y$, and estimate the realizations $y'$.

SLIDE 7

Unconditional setup (II)

Filtering through data sharpening (Hall and Minnotte, 2002): assume $Y' = \varphi(Y, X = x) \equiv \varphi(Y)$, so that $y' = \varphi(y, x) \equiv \varphi(y)$, where $\varphi : \mathbb{R}^{d_X+1} \to \mathbb{R}$ is locally invertible. Then the plug-in estimator of the conditional distribution function of $Y'$ given $X = x$ becomes

$$\hat F_{Y'|X}(y|x) = \frac{n^{-1}\sum_{i=1}^{n} \bar K_{H_Y}\big(y - \varphi(Y_i)\big)\, K_{H_X}(x - X_i)}{n^{-1}\sum_{i=1}^{n} K_{H_X}(x - X_i)} = F_Y(y), \qquad (1)$$

where $H_Y$ and $H_X$ are bandwidth matrices, $\bar K_{H_Y}$ is a cumulative kernel and $K_{H_X}$ is a scaled (multivariate) kernel function, both satisfying the standard regularity conditions (Wand and Jones, 1995).
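A minimal sketch of the plug-in estimator in (1) for scalar $Y$ and $X$, taking $\varphi$ as the identity so that it estimates $F_{Y|X}(y|x)$ itself. The Gaussian kernel, its CDF as the cumulative kernel, and the normal-scale (Silverman-type) rule of thumb are illustrative assumptions, not choices stated on the slide; the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # Gaussian kernel K (pdf) and cumulative kernel Kbar (cdf)


def cond_cdf(y, x, ys, xs, h_y, h_x):
    """Smooth kernel estimate of F_{Y|X}(y|x) for scalar Y and X.

    Numerator: sum of Kbar((y - Y_i)/h_y) weighted by K((x - X_i)/h_x);
    denominator: sum of the same weights (a Nadaraya-Watson-type ratio).
    """
    num = 0.0
    den = 0.0
    for yi, xi in zip(ys, xs):
        w = _STD_NORMAL.pdf((x - xi) / h_x)  # the 1/h_x scaling cancels in the ratio
        num += _STD_NORMAL.cdf((y - yi) / h_y) * w
        den += w
    return num / den


def normal_scale_bandwidth(values):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (assumed selector)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


if __name__ == "__main__":
    # Symmetric toy data: the estimate at (y, x) = (0, 0) is 0.5 by symmetry.
    xs = [i / 10 for i in range(-20, 21)]
    ys = list(xs)
    h = normal_scale_bandwidth(xs)
    print(cond_cdf(0.0, 0.0, ys, xs, h, h))
```

With $d_X > 1$ the weight would come from a multivariate kernel $K_{H_X}$ with a bandwidth matrix; the univariate case keeps the sketch short.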

SLIDE 8

Unconditional setup (III)

Theorem

Suppose that we have an i.i.d. sample $\{(Y_i, X_i) : i = 1, \dots, n\}$ from a continuous distribution with well-defined and sufficiently smooth PDFs. Then the counterfactual random variable $Y'$ with realizations $y'$, satisfying the independence condition $F_{Y'|X}(y|x) = F_Y(y)$, follows

$$F_Y(y') = F_{Y|X}(y|x), \qquad (2)$$

where $F_{Y|X}$ is the conditional distribution function of $Y$ given $X = x$, for any $y$ and $x$ in the support of $(Y, X)$.

SLIDE 9

Unconditional setup (estimation)

The estimator of the independent counterfactuals is

$$\hat y' = \hat F_Y^{-1}\big(\hat F_{Y|X}(y|x)\big). \qquad (3)$$

Theorem

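Suppose that Assumptions 1-4 hold (see Appendix). Then

$$\sqrt{n}\,\big(\hat y' - y'\big) \xrightarrow{d} N(0, \sigma^2), \qquad (4)$$

conditional on the data, where $\sigma^2$ is given by

$$\sigma^2 = \frac{F_{Y|X}(y|x)\big(1 - F_{Y|X}(y|x)\big)}{f_Y\big(F_Y^{-1}(F_{Y|X}(y|x))\big)} + \frac{\int K(u)^2\,du}{f_X(x)\prod_{j=1}^{d_X} h_{jX}}\;\frac{F_{Y|X}(y|x)\big(1 - F_{Y|X}(y|x)\big)}{f_Y\big(F_Y^{-1}(F_{Y|X}(y|x))\big)}. \qquad (5)$$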
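A sketch of estimator (3), assuming scalar $X$: $\hat F_{Y|X}$ is a Gaussian-kernel estimator with fixed bandwidths, and $\hat F_Y^{-1}$ is the inverse of the empirical CDF (a smooth quantile estimator could be swapped in). Kernels, bandwidths and all names are illustrative assumptions. The demo draws from the mean-dependent process $y = a x + \sqrt{1-a^2}\,\varepsilon$ with $a = 0.8$; filtering should remove most, though not all, of the correlation with $X$, because local-constant smoothing leaves some bias.

```python
import math
import random
from statistics import NormalDist

_PHI = NormalDist()  # standard normal: Gaussian kernel and its CDF


def cond_cdf(y, x, ys, xs, h_y, h_x):
    """Kernel estimate of F_{Y|X}(y|x) with Gaussian kernels (scalar X)."""
    w = [_PHI.pdf((x - xi) / h_x) for xi in xs]
    num = sum(wi * _PHI.cdf((y - yi) / h_y) for wi, yi in zip(w, ys))
    return num / sum(w)


def empirical_quantile(sorted_ys, p):
    """Inverse empirical CDF: smallest order statistic y with F_n(y) >= p."""
    n = len(sorted_ys)
    k = min(max(math.ceil(p * n), 1), n)  # clamp to a valid index
    return sorted_ys[k - 1]


def independent_counterfactuals(ys, xs, h_y, h_x):
    """y'_i = F_Y^{-1}(F_{Y|X}(y_i|x_i)) for every observation, cf. (3)."""
    sorted_ys = sorted(ys)
    return [empirical_quantile(sorted_ys, cond_cdf(yi, xi, ys, xs, h_y, h_x))
            for yi, xi in zip(ys, xs)]


def corr(u, v):
    """Sample Pearson correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    su = math.sqrt(sum((ui - mu) ** 2 for ui in u))
    sv = math.sqrt(sum((vi - mv) ** 2 for vi in v))
    return cov / (su * sv)


if __name__ == "__main__":
    rng = random.Random(1)
    a = 0.8
    xs = [rng.gauss(0, 1) for _ in range(300)]
    ys = [a * x + math.sqrt(1 - a * a) * rng.gauss(0, 1) for x in xs]
    cf = independent_counterfactuals(ys, xs, h_y=0.3, h_x=0.3)
    print("corr(Y, X)  =", round(corr(ys, xs), 3))
    print("corr(Y', X) =", round(corr(cf, xs), 3))
```

Using the empirical quantile means every $\hat y'_i$ is one of the observed $y$ values; this keeps the marginal distribution of the counterfactuals close to $\hat F_Y$ by construction.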

SLIDE 10

Unconditional setup (consistency)

Consistency of $\hat y'$ is achieved under uniform convergence of the estimators. This is satisfied for a 1-dimensional $X$, while higher dimensions require a lower estimation bias (higher-order kernels).

Assumption (Bandwidths of conditional CDF)

As $n \to \infty$,

(i) $n^{1/2} h_Y / (\log n)^{1/2} + n^{1/2} h_Y^r \to 0$,

(ii) $n^{1/2} h_X / \log n + n^{1/2} h_X^r \to 0$,

where $r$ is the kernel order.

SLIDE 11

Unconditional setup (example)

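Consider a stylized mean-dependent process with $X \sim N(0, 1)$ and $\varepsilon_t \sim N(0, 1)$ independent of $x_t$ (so that $Y \sim N(0, 1)$):

$$y_t = a x_t + \sqrt{1 - a^2}\,\varepsilon_t, \qquad a \in (0, 1). \qquad (6)$$

Independent counterfactuals are equal to the error term:

$$y'_t \equiv \varphi(y_t, x_t) = F_Y^{-1}\big(F_{Y|X}(y_t|x_t)\big) = \sqrt{2}\,\mathrm{erf}^{-1}\!\left(\mathrm{erf}\!\left(\frac{y_t - a x_t}{\sqrt{2 - 2a^2}}\right)\right) = \frac{y_t - a x_t}{\sqrt{1 - a^2}} = \varepsilon_t. \qquad (7)$$

More generally, under error exogeneity for a nonseparable model $y_t = m(x_t, \varepsilon_t)$,

$$y'_t = F_Y^{-1}\big(F_\varepsilon(\varepsilon_t)\big). \qquad (8)$$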
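A quick numerical check of (7), assuming $\varepsilon_t \sim N(0,1)$ independent of $x_t$, so that $Y \sim N(0,1)$ and $Y|X = x \sim N(ax, 1 - a^2)$: with the true distribution functions plugged in, $F_Y^{-1}(F_{Y|X}(y_t|x_t))$ recovers $\varepsilon_t$ up to floating-point error. Helper names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

F_Y = NormalDist()  # Var(a x + sqrt(1 - a^2) eps) = a^2 + (1 - a^2) = 1


def counterfactual(y, x, a):
    """y' = F_Y^{-1}(F_{Y|X}(y|x)) with Y | X = x ~ N(a x, 1 - a^2)."""
    f_y_given_x = NormalDist(mu=a * x, sigma=math.sqrt(1 - a * a))
    return F_Y.inv_cdf(f_y_given_x.cdf(y))


def simulate(n, a, seed=0):
    """Draw (x_t, y_t, eps_t) from process (6)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    eps = [rng.gauss(0, 1) for _ in range(n)]
    ys = [a * x + math.sqrt(1 - a * a) * e for x, e in zip(xs, eps)]
    return xs, ys, eps


if __name__ == "__main__":
    a = 0.7
    xs, ys, eps = simulate(1000, a)
    gap = max(abs(counterfactual(y, x, a) - e) for x, y, e in zip(xs, ys, eps))
    print("max |y' - eps| =", gap)  # floating-point noise only
```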

SLIDE 12

Conditional setup (I)

General assumptions: $Y$ is the outcome variable with CDF $F_Y(y)$; $Q$ is a conditioning variable (or vector of variables) with CDF $F_Q(q)$ and PDF $f_Q(q)$; $X$ is a covariate (vector) with CDF $F_X(x)$ and PDF $f_X(x)$; we observe an i.i.d. sample $\{(Y_i, Q_i, X_i) : i = 1, \dots, n\}$.

Variable dependence (CDF version of Diks and Panchenko, 2006): $F_{Y|Q,X}(y|q, x) \neq F_{Y|Q}(y|q)$ for some $y, q, x$.

The filtering idea: construct a counterfactual outcome variable $Y''$ with realizations $y''$ such that $F_{Y''|Q,X}(y|q, x) = F_{Y|Q}(y|q)$ for all $y, q, x$, and estimate the realizations $y''$.

SLIDE 13

Conditional setup (II)

Theorem

Suppose that we have an i.i.d. sample $\{(Y_i, Q_i, X_i) : i = 1, \dots, n\}$ from a continuous distribution with well-defined and sufficiently smooth PDFs. Then the counterfactual random variable $Y''$, satisfying the conditional independence condition $F_{Y''|Q,X}(y|q, x) = F_{Y|Q}(y|q)$, asymptotically follows

$$F_{Y|Q}(y''|q) = F_{Y|Q,X}(y|q, x). \qquad (9)$$
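A closed-form illustration of (9) under a hypothetical linear Gaussian model $Y = bQ + aX + s\varepsilon$ with $Q, X, \varepsilon$ i.i.d. $N(0,1)$, chosen only because both conditional distribution functions are known: $Y|Q,X \sim N(bq + ax, s^2)$ and $Y|Q \sim N(bq, a^2 + s^2)$. Solving (9) for $y''$ gives $y'' = bq + \sqrt{a^2 + s^2}\,\varepsilon$, which keeps the dependence on $Q$ and removes the dependence on $X$. The model and function names are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def conditional_counterfactual(y, q, x, a, b, s):
    """y'' = F_{Y|Q}^{-1}(F_{Y|Q,X}(y|q,x) | q) for the linear Gaussian model."""
    u = NormalDist(b * q + a * x, s).cdf(y)  # F_{Y|Q,X}(y|q,x)
    return NormalDist(b * q, math.sqrt(a * a + s * s)).inv_cdf(u)  # F_{Y|Q}^{-1}(u|q)


def draw(rng, a, b, s):
    """One draw (q, x, eps, y) from Y = b Q + a X + s eps."""
    q, x, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    return q, x, e, b * q + a * x + s * e


if __name__ == "__main__":
    rng = random.Random(7)
    a, b, s = 0.6, 0.5, 0.8
    q, x, e, y = draw(rng, a, b, s)
    print(conditional_counterfactual(y, q, x, a, b, s),
          b * q + math.sqrt(a * a + s * s) * e)  # the two values agree
```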

SLIDE 14

Monte Carlo setup

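Process specification (Diks and Wolski, 2016):

$$X_i \sim N(0, 1), \qquad Y_i \sim N\big(0,\; c + a X_i^2\big), \qquad (10)$$

with $c > 0$ and $0 < a < 1$.

The filtering Mean Squared Error (MSE) is given by

$$\mathrm{MSE}(\hat y') = n^{-1}\sum_{i=1}^{n}\Big(\hat F_Y^{-1}\big(\hat F^{-i}_{Y|X}(y_i|x_i)\big) - F_Y^{-1}\big(F_{Y|X}(y_i|x_i)\big)\Big)^2,$$

where $\hat F^{-i}_{Y|X}$ is the leave-one-out estimator computed without observation $i$.

Technicalities: compare step-wise and smooth kernel estimators (normal-scale, process-driven and LS-CV bandwidths); 1000 replications.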
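A compact sketch of one Monte Carlo replication for process (10), assuming scalar $X$, Gaussian kernels with fixed bandwidths, and illustrative values $c = 1$, $a = 0.5$ (the slide does not fix $c$, $a$ or the bandwidths). The true counterfactual needs the marginal $F_Y(y) = E_X[\Phi(y/\sqrt{c + aX^2})]$, approximated here by a trapezoid rule over $X \in [-6, 6]$ and inverted by bisection; the estimate combines the leave-one-out conditional CDF with the empirical quantile of $Y$. All names are hypothetical, and the sketch is not intended to reproduce the reported MSE figures.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()
# Trapezoid nodes/weights for E_X[g(X)], X ~ N(0, 1), truncated to [-6, 6].
_STEP = 0.06
_NODES = [-6.0 + k * _STEP for k in range(201)]
_WEIGHTS = [(0.5 if k in (0, 200) else 1.0) * _STEP * PHI.pdf(x)
            for k, x in enumerate(_NODES)]


def simulate(n, c, a, rng):
    """Process (10): X ~ N(0, 1), Y | X ~ N(0, c + a X^2)."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, math.sqrt(c + a * x * x)) for x in xs]
    return xs, ys


def true_cond_cdf(y, x, c, a):
    return PHI.cdf(y / math.sqrt(c + a * x * x))


def true_marginal_cdf(y, c, a):
    """F_Y(y) = E_X[F_{Y|X}(y|X)], approximated by quadrature."""
    return sum(w * true_cond_cdf(y, x, c, a) for w, x in zip(_WEIGHTS, _NODES))


def true_marginal_quantile(p, c, a, lo=-30.0, hi=30.0, iters=45):
    """Invert the increasing function F_Y by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if true_marginal_cdf(mid, c, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def loo_cond_cdf(i, ys, xs, h_y, h_x):
    """Leave-one-out kernel estimate of F_{Y|X}(y_i | x_i)."""
    num = den = 0.0
    for j, (yj, xj) in enumerate(zip(ys, xs)):
        if j != i:
            w = PHI.pdf((xs[i] - xj) / h_x)
            num += w * PHI.cdf((ys[i] - yj) / h_y)
            den += w
    return num / den


def filtering_mse(xs, ys, c, a, h_y, h_x):
    """MSE between estimated and true independent counterfactuals."""
    n = len(ys)
    sorted_ys = sorted(ys)
    total = 0.0
    for i in range(n):
        p_hat = loo_cond_cdf(i, ys, xs, h_y, h_x)
        y_hat = sorted_ys[min(max(math.ceil(p_hat * n), 1), n) - 1]  # empirical quantile
        y_true = true_marginal_quantile(true_cond_cdf(ys[i], xs[i], c, a), c, a)
        total += (y_hat - y_true) ** 2
    return total / n


if __name__ == "__main__":
    xs, ys = simulate(100, c=1.0, a=0.5, rng=random.Random(2019))
    print("MSE:", filtering_mse(xs, ys, 1.0, 0.5, h_y=0.4, h_x=0.4))
```

Repeating the last two lines over many seeds and taking the median would mirror the replication design described above.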

SLIDE 15

MSE performance

Table: Median MSE estimates of independent counterfactuals.

Bandwidth selector   n=50   n=100  n=200  n=500  n=1000
no smoothing         0.584  0.406  0.274  0.169  0.107
smoothing            0.292  0.232  0.178  0.116  0.080

Notes: Medians taken over 1000 Monte Carlo results for the ARCH process. Bandwidth selectors: 'no smoothing' for step-wise estimators and 'smoothing' for the normal-scale bandwidth selector.

SLIDE 16

Granger causality in the US grain market

Null hypothesis: $\{X_t\}$ is not Granger causing $\{Y_t\}$.

Diks & Wolski (2016) framework: $Y_{t+1}|(X_t, Y_t) \sim Y_{t+1}|Y_t$, with test statistic (an implication of the null), where $X = X_t$, $Y = Y_t$ and $Z = Y_{t+1}$:

$$q \equiv E\big[f_{X,Y,Z}(X, Y, Z)\, f_Y(Y) - f_{X,Y}(X, Y)\, f_{Y,Z}(Y, Z)\big] = 0.$$

Conditionally independent counterfactuals framework:

$$F_{Y''_{t+1}|Y_t, X_t}(y_{t+1}|y_t, x_t) = F_{Y_{t+1}|Y_t}(y_{t+1}|y_t),$$

with test statistic

$$z \equiv \frac{\hat y''_{t+1} - y_{t+1}}{\hat\sigma}.$$
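Given a counterfactual $\hat y''_{t+1}$ and a standard-error estimate $\hat\sigma$ (both taken as inputs here), the statistic $z$ can be turned into observation-wise p-values. The two-sided standard-normal reference is an assumption, motivated by the asymptotic normality result in the unconditional setup; the function names are hypothetical.

```python
from statistics import NormalDist


def counterfactual_z(y_cf_next, y_next, sigma_hat):
    """z = (y''_{t+1} - y_{t+1}) / sigma_hat."""
    if sigma_hat <= 0:
        raise ValueError("sigma_hat must be positive")
    return (y_cf_next - y_next) / sigma_hat


def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def p_values(y_cf, y, sigma):
    """Observation-wise p-values for aligned sequences y''_{t+1}, y_{t+1}, sigma_hat."""
    return [two_sided_p_value(counterfactual_z(c, o, s))
            for c, o, s in zip(y_cf, y, sigma)]
```

Under the null, filtering out $X_t$ should leave $Y_{t+1}$ unchanged, so large $|z|$ (small p-values) indicate time points where the dependence on $X_t$ matters.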

SLIDE 17

Granger causality in the US grain market

Table: US grain market results (Diks & Wolski, 2016).

Variables Linear Granger Causality Nonlinear Granger Causality (N) X Y Raw data VAR residuals Raw data VAR residuals X→Y Y→X X→Y Y→X X→Y Y→X X→Y Y→X Corn Wheat *** *** *** *** Corn Beans * Beans Wheat

Notes: causality results for the pairwise relations of the log returns on the US grain market. (*), (**), (***) denote statistical significance at 10%, 5% and 1%. Period: 09/01/2010–03/06/2013. Nonlinear tests are performed on standardized data, transformed to (N)ormal marginals. The number of lags, $l_X = l_Y = 1$, is selected by the Bayesian Information Criterion.

SLIDE 18

Granger causality in the US grain market

Figure: Wheat counterfactuals independent of corn (left) and beans (right).

[Figure: two panels; x-axis: Range, y-axis: Density; legend: P-value bins from 0-0.005 to 0.095-0.1.]

Notes: causality results for the pairwise relations of the log returns for corn → wheat (left panel) and beans → wheat (right panel). Period: 09/01/2010–03/06/2013. Nonlinear tests are performed on standardized data, transformed to (N)ormal marginals. The number of lags, $l_X = l_Y = 1$, is selected by the Bayesian Information Criterion.

SLIDE 19

The main take-aways

Theory: a fully non-parametric dependence filtering framework,

covering unconditional and conditional dependence,

with standard and bootstrap confidence bounds, the desired MSE performance on non-linear processes, and good finite-sample properties.

Practice: framework flexibility and hypothesis testing,

further insights into the US grain market dependence structure.

In the future: a panel data extension and a causal interpretation (yes, under error exogeneity).

SLIDE 20

Selected references

Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B. and Vermeersch, C. M. J. (2010). Impact Evaluation in Practice. World Bank Training.

Rothe, C. (2010). Nonparametric estimation of distributional policy effects. Journal of Econometrics 155, pp. 56-70.

Chernozhukov, V., Fernández-Val, I. and Melly, B. (2013). Inference on counterfactual distributions. Econometrica 81(6), pp. 2205-2268.

Diks, C. and Wolski, M. (2019, forthcoming). New nonparametric measures for instantaneous and Granger-causality tail co-dependence.

SLIDE 21

The End

SLIDE 22

Estimation assumptions (1)

Assumption (1)

Data $\{W_i : i = 1, \dots, n\}$, where $W_i = (W_{1i}, \dots, W_{d_W i})$, are i.i.d. draws from a $d_W$-variate smooth continuous distribution $F_W(w)$ with a well-defined PDF $f_W(w)$ and derivatives up to a finite order $r$, which are finite, continuous and uniformly bounded on the support.

SLIDE 23

Estimation assumptions (2)

Assumption (2)

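The kernel function $K : \mathbb{R}^{d_W} \to \mathbb{R}$ satisfies

$$\int K(w)\,dw = 1, \qquad \int K(w)\,w^c\,dw = 0 \;\text{ for } c = 1, \dots, r-1, \qquad \int K(w)\,w^c\,dw = \kappa_r I_{d_W} < \infty \;\text{ for } c = r, \qquad (11)$$

and $K(w)$ is $r$-times differentiable, where $I_{d_W}$ is the $d_W \times d_W$ identity matrix.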
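Condition (11) states that $K$ is a kernel of order $r$. A small numeric check for $d_W = 1$, using the standard Gaussian kernel (order 2) and the fourth-order Gaussian kernel $K_4(u) = \tfrac{1}{2}(3 - u^2)\phi(u)$ as examples; these kernels, the trapezoid rule and the tolerance are assumptions for illustration, not choices taken from the slides.

```python
import math


def gaussian_kernel(u):
    """Second-order Gaussian kernel (r = 2, kappa_2 = 1)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gaussian_kernel_order4(u):
    """Fourth-order Gaussian kernel (r = 4): (3 - u^2)/2 * phi(u)."""
    return 0.5 * (3.0 - u * u) * gaussian_kernel(u)


def moment(kernel, c, bound=12.0, m=4800):
    """Trapezoid approximation of int K(u) u^c du over [-bound, bound]."""
    h = 2.0 * bound / m
    total = 0.0
    for k in range(m + 1):
        u = -bound + k * h
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * kernel(u) * u ** c
    return total * h


def kernel_order(kernel, max_order=8, tol=1e-8):
    """Smallest r >= 1 with a non-zero r-th moment, after checking int K = 1."""
    if abs(moment(kernel, 0) - 1.0) > tol:
        raise ValueError("kernel does not integrate to one")
    for r in range(1, max_order + 1):
        if abs(moment(kernel, r)) > tol:
            return r
    return None


if __name__ == "__main__":
    for name, kern in [("Gaussian", gaussian_kernel), ("4th-order", gaussian_kernel_order4)]:
        r = kernel_order(kern)
        print(name, "order", r, "kappa_r", round(moment(kern, r), 6))
```

For $K_4$ the fourth moment is $\kappa_4 = (3 \cdot 3 - 15)/2 = -3$; a higher-order kernel trades a non-negative shape for a smaller bias term $h^r$.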

SLIDE 24

Estimation assumptions (3/4)

Assumption (3)

As $n \to \infty$,

(i) $n^{1/2} h_0 / (\log n)^{1/2} + n^{1/2} h_0^r \to 0$,

(ii) $n^{1/2} \det H^{1/2} / \log n + n^{1/2} \max H^{r/2} \to 0$.

Assumption (4)

We assume that (i) the distribution functions $F_Y$ and $F_{Y|X}$ are Hadamard differentiable, (ii) $F_Y^{-1}$ is uniformly Lipschitz and bounded by $[a, b] \subset \mathbb{R}$, (iii) $Y$ is supported on a compact interval $J \subset \mathbb{R}$ on which $F_{Y|X}(y|x)$ is uniformly bounded by $[p_1, p_2] \subset (0, 1)$.
