SLIDE 1

Causality

Actions, Confounders and Interventions

Christos Dimitrakakis

October 30, 2019

  • C. Dimitrakakis

Causality October 30, 2019 1 / 22

SLIDE 2

Introduction

▶ Introduction
▶ Decision diagrams
▶ Common structural assumptions
▶ Interventions
▶ Policy evaluation and optimisation
▶ Individual effects and counterfactuals

SLIDE 3

Introduction

Headaches and aspirins

Example 1 (Population effects)

Figure: Investigating the response of the population to various doses of the drug.
(a) Dose-response curve: response (cured, side-effects) against dose.
(b) Response distribution: response against sensitivity, for high and low doses.

▶ Is aspirin an effective cure for headaches?
▶ Does having a headache lead to aspirin-taking?

SLIDE 4

Introduction

Example 2 (Individual effects)

▶ Effects of Causes: Will my headache pass if I take an aspirin?
▶ Causes of Effects: Would my headache have passed if I had not taken an aspirin?

SLIDE 5

Introduction

Overview

Inferring causal models

We can distinguish different models from observational or experimental data.

Inferring individual effects

The effect of a possible intervention on an individual is not generally determinable. We usually require strong assumptions.

Decision-theoretic view

There are many competing approaches to causality. We will remain within the decision-theoretic framework, which allows us to crisply define both our knowledge and assumptions.
SLIDE 6

Introduction

What causes what?

Example 3

Figure: Two graphical models over θ, at and xt. (a) Independence of at. (b) Independence of xt.

Suppose we have data xt, at where

▶ xt: lung cancer
▶ at: smoking

Does smoking cause lung cancer or does lung cancer make people smoke? Can we compare the two models above to determine it?

$$P_\theta(D) = \prod_t P_\theta(x_t, a_t) = \prod_t P_{\theta'}(x_t \mid a_t) \, P_{\theta'}(a_t) = \prod_t P_{\theta''}(a_t \mid x_t) \, P_{\theta''}(x_t).$$
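The identity above says that both factorisations describe the same joint distribution, so observational data alone cannot favour one over the other. The following is a minimal sketch of that point, assuming binary variables and maximum-likelihood (frequency) estimates; the data set and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math

# Hypothetical observational data: pairs (x_t, a_t),
# with x = lung cancer and a = smoking.
data = [(1, 1)] * 30 + [(0, 1)] * 20 + [(1, 0)] * 10 + [(0, 0)] * 40

n = len(data)
joint = Counter(data)                    # counts of (x, a)
count_x = Counter(x for x, _ in data)    # counts of x
count_a = Counter(a for _, a in data)    # counts of a


def loglik_a_then_x(data):
    """Log-likelihood under P(x | a) P(a), with frequency estimates."""
    return sum(
        math.log(joint[(x, a)] / count_a[a]) + math.log(count_a[a] / n)
        for x, a in data
    )


def loglik_x_then_a(data):
    """Log-likelihood under P(a | x) P(x), with frequency estimates."""
    return sum(
        math.log(joint[(x, a)] / count_x[x]) + math.log(count_x[x] / n)
        for x, a in data
    )


ll_ax = loglik_a_then_x(data)
ll_xa = loglik_x_then_a(data)
print(ll_ax, ll_xa)  # the two values agree up to floating-point error
```

Both log-likelihoods reduce to the log-likelihood of the joint frequencies, which is why comparing the two models on this kind of data cannot settle the direction of causation.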

SLIDE 8

Introduction Decision diagrams

Figure: A typical decision diagram (nodes θ, xt, yt, at, π, U), where xt: individual information, yt: individual result, at: action, π: policy


Example 4 (Taking an aspirin)

▶ Individual t
▶ Individual information xt
▶ at = 1 if t takes an aspirin, and 0 otherwise.
▶ yt = 1 if the headache is cured in 30 minutes, 0 otherwise.
▶ π: intervention policy.


Example 4 (A recommendation system)

▶ xt: User information (random variable)
▶ at: System action (random variable)
▶ yt: Click (random variable)
▶ π: recommendation policy (decision variable).

SLIDE 12

Introduction Decision diagrams

Conditional distributions and decision variables.

$$P(A \mid B) \triangleq \frac{P(A \cap B)}{P(B)}.$$

The conditional distribution of decisions

$$\pi(a) \equiv P^\pi(a) \equiv P(a \mid \pi).$$

$$P^\pi_\theta(a) \equiv P(a \mid \theta, \pi).$$

SLIDE 13

Introduction Common structural assumptions

Basic causal structures

Non-cause

Figure: π does not cause yt (diagram over π, at, yt and θ).

No confounding

Figure: No confounding: π causes yt (diagram over π, at, yt and θ).

SLIDE 15

Introduction Common structural assumptions

Covariates

Sufficient covariate

Figure: Sufficient covariate xt (diagram over π, at, yt, xt and θ).

Instrumental variables and confounders

Figure: Instrumental variable zt (diagram over π, at, yt, xt, zt and θ).

SLIDE 17

Interventions

Modelling interventions

▶ Observational data D.
▶ Policy space Π.

Default policy

The space of policies Π includes a default policy π0, under which the data was collected.

Intervention policies

Apart from π0, each policy π ∈ Π represents a different intervention, specifying a distribution π(at | xt).

▶ Direct interventions.
▶ Indirect interventions and non-compliance.

SLIDE 18

Interventions

Example 5 (Weight loss)

Figure: Model of non-compliance as a confounder (diagram over θ, zt, xt, yt, at, π and U).

SLIDE 20

Policy evaluation and optimisation

The value of an observed policy

Figure: Basic decision diagram (nodes θ, xt, yt, at, π, U).

$$\begin{aligned}
\hat{E}_D(U \mid a) &\triangleq \frac{1}{|\{t \mid a_t = a\}|} \sum_{t : a_t = a} U(a_t, y_t) && (3.1) \\
&\approx E^{\pi_0}_\theta(U \mid a), \qquad (a_t, y_t) \sim P^{\pi_0}_\theta. && (3.2)
\end{aligned}$$

$$\hat{a}^*_D \in \arg\max_a \hat{E}_D(U \mid a).$$
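The empirical estimate (3.1) and the resulting action choice can be sketched in a few lines. This is only an illustration: the logged data and the helper name `empirical_utility` are hypothetical, and the utility is assumed to be U(a, y) = y.

```python
from collections import defaultdict


def empirical_utility(actions, outcomes, utility):
    """Return {a: average of U(a_t, y_t) over the steps t with a_t == a}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for a, y in zip(actions, outcomes):
        totals[a] += utility(a, y)
        counts[a] += 1
    return {a: totals[a] / counts[a] for a in counts}


# Hypothetical data logged under the default policy pi_0.
actions = [0, 0, 0, 1, 1, 1, 1, 0]
outcomes = [0, 1, 0, 1, 1, 0, 1, 0]

est = empirical_utility(actions, outcomes, utility=lambda a, y: y)
best_action = max(est, key=est.get)  # empirical arg max over actions
print(est, best_action)
```

Note that this approximates the conditional expectation under π0, as in (3.2); it describes what was observed when an action was taken, which need not match what would happen if the action were imposed.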

SLIDE 22

Policy evaluation and optimisation

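$$\begin{aligned}
x_t \mid \theta &\sim P_\theta(x) \\
y_t \mid \theta, x_t, a_t &\sim P_\theta(y \mid x_t, a_t) \\
a_t \mid x_t, \pi &\sim \pi(a \mid x_t).
\end{aligned}$$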

The value of a policy

$$E^\pi_\theta(U) = \int_X dP_\theta(x) \sum_{a \in A} \pi(a \mid x) \sum_{y \in Y} P_\theta(y \mid x, a) \, U(a, y).$$

The optimal policy under a known parameter θ is simply the one attaining

$$\max_{\pi \in \Pi} E^\pi_\theta(U),$$

where Π is the set of allowed policies.
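For finite X, A and Y the value of a policy is a triple sum, which makes it easy to compute directly. The sketch below assumes a small hypothetical model θ (the tables `p_x` and `p_y`), utility U(a, y) = y, and a policy set Π that contains every mapping from x to a distribution over actions; under that last assumption, picking the best action separately for each x attains the maximum.

```python
def policy_value(p_x, p_y, policy, utility):
    """Sum over x, a, y of P(x) * pi(a | x) * P(y | x, a) * U(a, y)."""
    value = 0.0
    for x, px in p_x.items():
        for a, pa in policy[x].items():
            for y, py in p_y[(x, a)].items():
                value += px * pa * py * utility(a, y)
    return value


def greedy_policy(p_x, p_y, actions, utility):
    """Deterministic policy choosing, for each x, the action with the
    highest expected utility. Optimal only if Pi is unrestricted."""
    policy = {}
    for x in p_x:
        def expected_utility(a, x=x):
            return sum(py * utility(a, y) for y, py in p_y[(x, a)].items())
        policy[x] = {max(actions, key=expected_utility): 1.0}
    return policy


# Hypothetical known model theta.
p_x = {0: 0.5, 1: 0.5}                     # P(x)
p_y = {                                    # P(y | x, a), keyed by (x, a)
    (0, 0): {0: 0.8, 1: 0.2},
    (0, 1): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.3, 1: 0.7},
    (1, 1): {0: 0.5, 1: 0.5},
}


def utility(a, y):
    return y


always_treat = {0: {1: 1.0}, 1: {1: 1.0}}
v_treat = policy_value(p_x, p_y, always_treat, utility)
best = greedy_policy(p_x, p_y, actions=(0, 1), utility=utility)
v_best = policy_value(p_x, p_y, best, utility)
print(v_treat, v_best)
```

When Π is restricted (for example, to policies that ignore x), the per-x maximisation no longer applies and the maximum has to be taken over the allowed policies only.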

SLIDE 23

Policy evaluation and optimisation

Monte-Carlo estimation

Importance sampling (also known as propensity scoring)

We can obtain an unbiased estimate of the utility in a model-free manner through importance sampling:

$$E^\pi_\theta(U) = \int_X dP_\theta(x) \sum_a E_\theta(U \mid a, x) \, \pi(a \mid x) \approx \frac{1}{T} \sum_{t=1}^{T} U_t \, \frac{\pi(a_t \mid x_t)}{\pi_0(a_t \mid x_t)}.$$
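The importance-sampling estimate can be sketched as follows. Everything in the simulation is a hypothetical assumption made for illustration: a uniform logging policy π0, a target policy π that sets at = xt, and a success probability of 0.8 when the action matches the covariate and 0.3 otherwise. The estimator requires π0(a | x) > 0 wherever π(a | x) > 0.

```python
import random


def importance_sampling_value(data, pi, pi0):
    """Average of U_t * pi(a_t | x_t) / pi0(a_t | x_t) over logged triples."""
    return sum(u * pi(a, x) / pi0(a, x) for x, a, u in data) / len(data)


def pi0(a, x):
    """Hypothetical logging policy: uniform over two actions."""
    return 0.5


def pi(a, x):
    """Hypothetical target policy: always choose a = x."""
    return 1.0 if a == x else 0.0


def simulate(T, rng):
    """Generate T triples (x_t, a_t, U_t) with actions drawn from pi0."""
    data = []
    for _ in range(T):
        x = rng.randint(0, 1)
        a = rng.randint(0, 1)              # uniform, i.e. drawn from pi0
        p_success = 0.8 if a == x else 0.3
        u = 1.0 if rng.random() < p_success else 0.0
        data.append((x, a, u))
    return data


rng = random.Random(0)
data = simulate(20000, rng)
estimate = importance_sampling_value(data, pi, pi0)
print(estimate)  # should be close to 0.8, the value of pi in this simulation
```

The estimate uses only the logged utilities and the two policies, with no model of Pθ, which is what "model-free" refers to here.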

SLIDE 24

Policy evaluation and optimisation

Bayesian estimation

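If π0 is given, we can calculate the utility of any policy to whatever degree of accuracy we wish.

$$\xi(\theta \mid D, \pi_0) \propto \prod_t P^{\pi_0}_\theta(x_t, y_t, a_t)$$

$$\begin{aligned}
E^\pi_\xi(U \mid D) &= \int_\Theta E^\pi_\theta(U) \, d\xi(\theta \mid D) \\
&= \int_\Theta \int_X dP_\theta(x) \sum_{t=1}^{T} \sum_a E_\theta(U \mid a, x) \, \pi(a \mid x) \, d\xi(\theta \mid D).
\end{aligned}$$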

SLIDE 25

Policy evaluation and optimisation

Causal inference and policy optimisation

Example 6

Figure: Simple decision problem (nodes θ, yt, at, π, U).

Let at, yt ∈ {0, 1}, θ ∈ [0, 1]² and

$$y_t \mid a_t = a \sim \mathrm{Bernoulli}(\theta_a).$$

Then, by estimating θ, we can predict the effect of any action.
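One way to estimate θ in this Bernoulli model is a conjugate Beta posterior for each action. The sketch below assumes an independent Beta(1, 1) prior on each θa, which the slides do not specify; the data and the function name are hypothetical.

```python
def beta_posterior(actions, outcomes, prior=(1.0, 1.0)):
    """Return {a: (alpha, beta)}: Beta posterior parameters for theta_a.

    Assumes y_t | a_t = a ~ Bernoulli(theta_a) and the same Beta prior
    for both actions (an assumption made for this sketch).
    """
    post = {0: list(prior), 1: list(prior)}
    for a, y in zip(actions, outcomes):
        post[a][0] += y          # count of y = 1 under action a
        post[a][1] += 1 - y      # count of y = 0 under action a
    return {a: tuple(params) for a, params in post.items()}


# Hypothetical observations.
actions = [0, 0, 0, 0, 1, 1, 1, 1]
outcomes = [0, 1, 0, 0, 1, 1, 0, 1]

post = beta_posterior(actions, outcomes)
post_mean = {a: alpha / (alpha + beta) for a, (alpha, beta) in post.items()}
print(post, post_mean)
```

With U = yt, the posterior mean of θa is the expected utility of action a, so the two means can be compared directly.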

SLIDE 26

Policy evaluation and optimisation

Causal inference and policy optimisation

Example 6

Figure: Decision problem with covariates (nodes θ, yt, at, π, U, xt).

Let at, xt ∈ {0, 1}, yt ∈ ℝ, θ ∈ ℝ⁴ and

$$y_t \mid a_t = a, x_t = x \sim \mathrm{Bernoulli}(\theta_{a,x}).$$

Then, by estimating θ, we can predict the effect of any action.

SLIDE 27

Individual effects and counterfactuals Disturbances and structural equation models

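Figure: Decision diagram with exogenous disturbances ω (nodes θ, xt, yt, at, π, U, ωt,y, ωt,x, ωt,a).

Example 7 (Structural equation model for Figure 12)

$$\begin{aligned}
\theta &\sim \mathcal{N}(0_4, I_4), \\
x_t &= \theta_0 \, \omega_{t,x}, & \omega_{t,x} &\sim \mathrm{Bernoulli}(0.5), \\
y_t &= \theta_1 y_t + \theta_2 x_t + \theta_3 a_t + \omega_{t,y}, & \omega_{t,y} &\sim \mathcal{N}(0, 1), \\
a_t &= \pi(x_t) + \omega_{t,a} \bmod |A|, & \omega_{t,a} &\sim 0.1 \, \mathcal{D}(0) + 0.9 \, \mathrm{Unif}(A).
\end{aligned}$$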

SLIDE 28

Individual effects and counterfactuals Disturbances and structural equation models

Treatment-unit additivity

Figure: Decision diagram for treatment-unit additivity (nodes θ, yt, at, π, U, ωt,y, ωt,a).

Assumption 1 (TUA)

For any given treatment a ∈ A, the response variable satisfies

$$y_t = g(a_t) + \omega_{t,y}.$$
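A short worked consequence may help show why this assumption is useful for individual effects. The notation y_t(a), for the response unit t would have under treatment a, is introduced here for illustration and does not appear in the slides; the step also assumes the disturbance ω_{t,y} of a unit is the same whichever treatment is applied.

```latex
\begin{aligned}
y_t(a) - y_t(a')
  &= \bigl[g(a) + \omega_{t,y}\bigr] - \bigl[g(a') + \omega_{t,y}\bigr] \\
  &= g(a) - g(a').
\end{aligned}
```

Under these assumptions the difference between two treatments is the same for every unit t, so the individual effect coincides with the population-level difference g(a) − g(a′).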

SLIDE 29

Individual effects and counterfactuals Example: Learning instrumental variables

Example 8 (Pricing model)

Figure: Graph of the structural equation model for airport pricing policy π (nodes xt, yt, zt, at, π, ωt): at is the actual price, zt are fuel costs, xt is the customer type, yt is the amount of sales, ωt is whether there is a conference. The dependency on θ is omitted for clarity.

Assumption 2 (Relevance)

at depends on zt.

Assumption 3 (Exclusion)

$$z_t \perp\!\!\!\perp y_t \mid x_t, a_t, \omega_t.$$

Assumption 4 (Unconfounded instrument)

$$z_t \perp\!\!\!\perp \omega_t \mid x_t.$$
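To see what these assumptions buy, here is a minimal simulation sketch. It uses the classical instrumental-variable ratio estimator cov(z, y) / cov(z, a), which the slides do not name, and assumes a linear model with no customer covariate xt; all coefficients, including the true price effect of -1.5, are hypothetical.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v)) / len(u)


rng = random.Random(1)
TRUE_EFFECT = -1.5   # hypothetical effect of price on sales

z, a, y = [], [], []
for _ in range(20000):
    z_t = rng.gauss(0, 1)                        # instrument: fuel cost
    w_t = 1.0 if rng.random() < 0.5 else 0.0     # unobserved: conference
    a_t = z_t + 2.0 * w_t + rng.gauss(0, 1)      # price depends on z and w
    # Sales depend on price and on the conference, but not directly on z.
    y_t = TRUE_EFFECT * a_t + 3.0 * w_t + rng.gauss(0, 1)
    z.append(z_t)
    a.append(a_t)
    y.append(y_t)

naive = cov(a, y) / cov(a, a)   # regression slope of y on a, confounded by w
iv = cov(z, y) / cov(z, a)      # ratio estimator using the instrument
print(naive, iv)
```

In this simulation the naive slope is pulled towards zero because conferences raise both prices and sales, while the ratio estimator recovers a value close to the assumed effect. The simulation satisfies relevance (at depends on zt), exclusion (zt affects yt only through at) and an unconfounded instrument (zt is independent of ωt) by construction.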

SLIDE 30

Individual effects and counterfactuals Example: Learning instrumental variables

Prediction tasks

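$$y_t = g_\theta(a_t, x_t) + \omega_t, \qquad E_\theta \, \omega_t = 0, \quad \forall \theta \in \Theta \tag{4.1}$$

Standard prediction

$$E^\pi_\theta(y_t \mid x_t, a_t),$$

$$E^\pi_\theta(y_t \mid x_t, a_t) = g_\theta(a_t, x_t) + E^\pi_\theta(\omega_t \mid x_t, a_t).$$

Counterfactual prediction

$$E^\pi_\theta(y_t \mid x_t, z_t) = \int_A \underbrace{\bigl[ g(a_t \mid x_t, z_t) + E_\theta(\omega \mid x_t) \bigr]}_{h(a_t, x_t)} \, d\pi(a_t \mid x_t)$$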

SLIDE 31

Individual effects and counterfactuals Discussion

Further reading

▶ Pearl, Causality.

In the following exercises, we are taking actions at and obtaining outcomes yt. Our utility function is simply U = yt.