What does 'strong causal influence' mean? Joint work with David Balduzzi, Moritz Grosse-Wentrup and Bernhard Schölkopf (PowerPoint PPT presentation)


SLIDE 1

What does 'strong causal influence' mean?

Joint work with David Balduzzi, Moritz Grosse-Wentrup and Bernhard Schölkopf

Dominik Janzing
Max Planck Institute for Intelligent Systems, Tübingen, Germany
SLIDE 2

Quantifying strength of an arrow: quantify the strength of Xi→Xj

[Figure: causal DAG over X1, X2, X3, X4]

Given:

  • causally sufficient set of variables X1, . . . , Xn
  • causal DAG G
  • all causal conditionals P(xj|paj), even for values paj with probability zero

(more than just knowing P(X1, . . . , Xn))

SLIDE 3

Motivation:

[Figure: two causal DAGs over Z, X, Y, W]

Maybe the true causal DAG is always complete if we also account for weak interactions. Which ones are so weak that we can neglect them?
SLIDE 4

Strength of a set of arrows

Idea:

  • the strength of an arrow measures its relevance for understanding the behavior of the system under interventions
  • the strength of a set of arrows measures their relevance for understanding the behavior of the system under interventions
  • even if each arrow in S is irrelevant, S could still be relevant
SLIDE 5

[Figure: two causal DAGs over Z, X, Y, W]

Note:

this picture is misleading because, for a set S of arrows,

  • each element may have negligible strength
  • but jointly they are not negligible
  • our causal strength will not be subadditive over the edges!
SLIDE 6

Information theoretic approach

We don't consider approaches that involve expectations, variances, etc. (ANOVA, ACE, . . . )

advantages of information theory

  • variables may have different domains
  • quantities are invariant under rescaling
  • related to thermodynamics
  • better for non-statistical generalizations
SLIDE 7

Some related work

  • Avin, Shpitser, Pearl: Identifiability of path-specific effects, 2005.
  • Pearl: Direct and indirect effects, 2001.
  • Robins, Greenland: Identifiability and exchangeability of direct and indirect effects, 1992.
  • Holland: Causal inference, path analysis, and recursive structural equation models, 1988.

These do not achieve our goal because:

  • they measure the impact on Y of switching X from x to x′ for one particular pair (x, x′), when other paths are blocked
  • we want an overall score of the strength of X → Y without referring to particular pairs

SLIDE 8

Axiomatic approach: Let S be a set of arrows.

  • Let CS denote its strength.
  • Postulate desired properties of CS.
SLIDE 9

Postulate 0 Causal Markov condition:

[Figure: DAG G over Z, X, Y, and DAG GS obtained by removing the arrows in S]

if CS = 0 then P is also Markov w.r.t. GS (after removing all arrows in S)

SLIDE 10

Postulate 1 Mutual information:

[Figure: DAG X → Y]

for this simple DAG we postulate CX→Y = I(X; Y )

(all the dependences are due to the influence of X on Y , hence the strength of dependences can be a measure of the strength of the influence)
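Postulate 1 is easy to check numerically. A minimal sketch (my own illustration, not from the slides) computing I(X; Y) for a two-node DAG X → Y in which Y is a noisy copy of X:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, from a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# X uniform; Y copies X with probability 0.9 and flips it otherwise,
# so C_{X->Y} = I(X;Y) = 1 - H(0.1), about 0.531 bits.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(round(mutual_information(joint), 3))  # 0.531
```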

SLIDE 11

[Figure: DAG X → Y]

Alternative option:

CX→Y := capacity of the information channel P(Y |do(X)) = P(Y |X) defined by maximizing I(X; Y ) over all possible input distributions Q(X)

  • requires knowing P(Y |x) also for x-values that never/seldom occur
  • quantifies the potential influence rather than the actual one
  • nevertheless an interesting option
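This alternative can be evaluated with the standard Blahut-Arimoto algorithm (my addition; the slides do not prescribe any algorithm). A sketch for a binary symmetric channel P(Y |do(X)):

```python
import math

def channel_capacity(P, iters=200):
    """Blahut-Arimoto: C = max_Q I(X;Y) for a channel P[x][y] = P(y|x)."""
    nx, ny = len(P), len(P[0])
    r = [1.0 / nx] * nx                    # current input distribution
    for _ in range(iters):
        # posterior q(x|y) induced by r and the channel
        q = [[r[x] * P[x][y] for x in range(nx)] for y in range(ny)]
        for y in range(ny):
            s = sum(q[y])
            q[y] = [v / s for v in q[y]]
        # re-estimate r(x) proportional to exp(sum_y P(y|x) log q(x|y))
        r = [math.exp(sum(P[x][y] * math.log(q[y][x])
                          for y in range(ny) if P[x][y] > 0))
             for x in range(nx)]
        z = sum(r)
        r = [v / z for v in r]
    return sum(r[x] * P[x][y] * math.log2(q[y][x] / r[x])
               for x in range(nx) for y in range(ny) if P[x][y] > 0)

bsc = [[0.9, 0.1], [0.1, 0.9]]  # binary symmetric channel, flip prob. 0.1
print(round(channel_capacity(bsc), 3))  # 0.531
```

For the symmetric channel the maximizing input distribution is uniform, so the capacity coincides with I(X; Y) under uniform input, 1 - H(0.1), about 0.531 bits.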
SLIDE 12

Potential strength vs actual strength

Assume a medical study shows that

  • changing cholesterol within the range of values occurring in humans has no impact on life expectancy
  • increasing it by 10 times compared to the highest observed value had a strong impact

Which statement would you prefer:

  • “cholesterol has a strong impact on life expectancy”
  • “cholesterol would have a strong impact on life expectancy if it was much higher than it is”

SLIDE 13

Postulate 2 (Locality):

[Figure: two DAGs over Y, X, Z]

Z is irrelevant in both cases: CX→Y is determined by P(Y |PAY ) and P(PAY )

SLIDE 14

Postulate 3 (Quantitative causal Markov condition): CX→Y ≥ I(X; Y | PA_Y^X)

where PA_Y^X denotes the parents of Y without X.

[Figure: DAG with X → Y and the remaining parents PA_Y^X of Y]

No other arrow can generate the non-zero dependence I(X; Y | PA_Y^X).

Idea: removing X → Y would imply I(X; Y | PA_Y^X) = 0

SLIDE 15

Postulate 4 (Heredity): subsets of irrelevant sets of arrows are irrelevant: if T ⊃ S then CT = 0 ⇒ CS = 0

SLIDE 16

Apart from the postulates. . . Consider a simple communication scenario for which we might agree on how C should read...

SLIDE 17

Toy model with partial copy operations:

  • each variable Xj consists of kj bits
  • some of the bits are set uniformly at random
  • the remaining ones are copied from parents

i.e. a structural equation model Xj = fj(PAj, Uj) where

  • every Xj and Uj is a vector of bits
  • every fj is a restriction map
SLIDE 18

Example with X → Y:

  1. X sets all its bits randomly
  2. Y copies some of them
  3. Y sets the remaining ones randomly

[Figure: bit patterns of X and Y after each of the three steps]
SLIDE 19

Do we agree that. . .

. . . CX→Y should be the number of bits that Y takes from X?

(for the simple DAG X → Y this number equals I(X; Y ))
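This can be checked for one concrete instance of the toy model (the sizes are my own choice, not from the slides): X has 3 uniform bits, Y copies 2 of them and draws its last bit at random, and I(X; Y) comes out as exactly the number of copied bits.

```python
import itertools, math

def mutual_information(joint):
    """I(X;Y) in bits, from a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

joint = {}
for x in itertools.product([0, 1], repeat=3):   # X: 3 uniform bits
    for b in [0, 1]:                            # Y's own random bit
        y = (x[0], x[1], b)                     # Y copies 2 bits of X
        joint[(x, y)] = joint.get((x, y), 0.0) + 1 / 16

print(mutual_information(joint))  # 2.0: exactly the number of copied bits
```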

SLIDE 20

Why I(X; Y) is an inappropriate measure for general DAGs

[Figure: DAGs a) and b) over X, Z, Y]

I(X; Y) doesn't account for the fact that part of the dependences are due to a) the confounder Z, or b) the indirect influence via Z

SLIDE 21

[Figure: the same DAGs a) and b) over X, Z, Y]

First guess: I(X; Y |Z)

  • qualitatively, it behaves correctly: conditioning screens off the path involving Z
  • quantitatively wrong because. . .
SLIDE 22

Fails even for a simple copy scenario

[Figure: four panels showing single bits copied among Z, X and Y]

  • I(X; Y |Z) = 0 because X and Y are constants when conditioned on Z
  • we would like to have CX→Y = 1
SLIDE 23

Why I(X; Y |Z) is inappropriate

[Figure: DAGs a) and b) over X, Z, Y]

weakening Z → Y converts a) into b), where CX→Y = I(X; Y)

SLIDE 24

Idea: measure the strength of X on Y by the impact of interventions on X (while adjusting other variables)

  • formalized by Ay & Polani (2006) in terms of Pearl’s do-calculus
  • defines a family of information theoretic quantities called “Information Flow”

SLIDE 25

But Information Flow does not solve our problem:

  • Ay and Polani’s Information Flow measures an interesting quantity (something related to causality)
  • we don’t consider it a good measure for the strength of an arrow
  • arguments follow
SLIDE 26

First attempt:

[Figure: DAG a) over X, Z, Y]

The strength of X → Y is the mutual information I(X; Y) in a scenario where

  • X is subjected to a randomized intervention
SLIDE 27

Fails because...

[Figure: DAG over X, Z, Y]

  • X, Y, Z binary
  • P(Z) uniform
  • Y = X ⊕ Z

X and Y are independent both with respect to the

  • observed distribution
  • distribution obtained by randomizing X
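A quick numerical replay of this failure (a sketch under the slide's assumptions):

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, from a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Randomized intervention: X uniform, independent of the uniform Z.
joint = {}
for x in [0, 1]:
    for z in [0, 1]:
        y = x ^ z                              # Y = X XOR Z
        joint[(x, y)] = joint.get((x, y), 0.0) + 0.25
print(mutual_information(joint))  # 0.0: X and Y stay independent
```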
SLIDE 28

Second attempt:

[Figure: DAG a) over X, Z, Y]

The strength of X → Y is the conditional mutual information I(X; Y |Z) in a scenario where

  • X is subjected to a randomized intervention

Question: X is randomized according to which distribution?
SLIDE 29

Second attempt, Version I:

[Figure: DAG a) over X, Z, Y]

The strength of X → Y is the conditional mutual information I(X; Y |Z) in a scenario where

  • X is subjected to a randomized intervention
  • X is distributed according to P(X|Z)
SLIDE 30

Fails because. . .

[Figure: DAG over X, Z, Y, where X is a copy of Z]

If X is a copy of Z,

  • given Z, X is a constant
  • I(X; Y |Z) = 0 also for the post-interventional distribution
SLIDE 31

Second attempt, Version II:

[Figure: DAG a) over X, Z, Y]

The strength of X → Y is the conditional mutual information I(X; Y |Z) in a scenario where

  • X is subjected to a randomized intervention
  • X is distributed according to P(X)
SLIDE 32

[Figure: DAG a) over X, Z, Y]

Violates Postulate 3: there is a contrived example where the strength of X → Y would be smaller than I(X; Y |Z)

SLIDE 33

Violates Postulate 3:

[Figure: DAG over Z, X, Y]

  • Z is a random bit
  • X consists of k bits: randomized for Z = 1, set to zero for Z = 0
  • Y consists of k bits: copied from X for Z = 1, set to 1 for Z = 0

I(X; Y |Z) = k/2, because k bits are copied in half of the cases; for X and Z independent, copying occurs only in 1/4 of the cases

SLIDE 34

Hence. . .

  • defining the strength of an arrow by intervening on nodes seems difficult
  • we now define the strength by intervening on edges
SLIDE 35

Our approach: measure the impact of ‘deleting arrows’

[Figure: DAG over X, Z, Y; the edges in S are cut and the open ends are fed with P(X) and P(Z)]

To define the strength of S, cut every edge in S and feed the open end with an independent copy. This defines the new distribution

PS(x, y, z) := P(x, z) · Σ_{x′,z′} P(y|x′, z′) P(x′) P(z′)

CS := D(P || PS)

SLIDE 36

Idea of ‘edge deletion’:

[Figure: DAG over X, Z, Y with cut wires fed by P(X) and P(Z)]

  • edges are electrical wires
  • an attacker cuts some wires
  • and feeds the open ends with random input
  • the distribution of the input is chosen like the observed marginal distribution
  • the only distribution that is locally accessible
SLIDE 37

Why the product distribution?

[Figure: our edge deletion feeds the open ends with P(X)P(Z); ‘source exclusion’ by Ay & Krakauer (2006) feeds them with P(X, Z)]

Compared to our edge deletion, ‘source exclusion’ by Ay & Krakauer (2006)

  • is not accessible to a local attacker
  • violates Postulate 4
SLIDE 38

Applying our measure to the toy model

[Figure: toy-model DAGs over X, Z, Y with the arrows in S cut]

D(P || PS) = number of corrupted bits (in agreement with what we expect)
slide-39
SLIDE 39

Quantifying the impact of a vaccine

[Figure: DAG with Age influencing ‘vaccinated (or not)’ and ‘infected (or not)’]

PS corresponds to an experiment where

  • the vaccine is randomly redistributed regardless of Age (keeping the fraction of treated subjects)
  • the random variable ‘vaccinated’ is reinterpreted as ‘intention to get vaccinated’

SLIDE 40

XOR-Example: P(Z) uniform, X = Z, Y = X ⊕ Z

[Figure: DAG over X, Z, Y]

  • Y is always 0
  • Y is uniformly distributed after deleting X → Y
  • Y remains independent of X
  • I(X; Y) = 0 and I(X; Y |Z) = 0
  • CX→Y = 1
  • Ay and Krakauer’s definition yields zero strength
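The value CX→Y = 1 can be verified directly from the edge-deletion definition (my own sketch of the computation; only the arrow X → Y is cut and fed with an independent copy x′ ~ P(X)):

```python
import math

P, PS = {}, {}
for z in [0, 1]:                       # Z uniform
    x = z                              # X is a copy of Z
    P[(x, x ^ z, z)] = P.get((x, x ^ z, z), 0.0) + 0.5
    for xp in [0, 1]:                  # open end fed with x' ~ P(X)
        PS[(x, xp ^ z, z)] = PS.get((x, xp ^ z, z), 0.0) + 0.5 * 0.5

kl = sum(p * math.log2(p / PS[k]) for k, p in P.items())
print(kl)  # 1.0: C_{X->Y} = 1 bit, although I(X;Y) = I(X;Y|Z) = 0
```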
SLIDE 41

Failure of subadditivity

Redundancy code: bit E is copied to all Bj; D = majority of the Bj

[Figure: DAG with E → B1, . . . , B2k+1 and Bj → D]

  • removing fewer than half of the arrows Bj → D has no impact
  • each arrow has strength zero
  • all arrows together have strength 1
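This failure of subadditivity can be reproduced exactly for the smallest case k = 1, i.e. three copies (a sketch; my own implementation of the edge-deletion measure restricted to the (E, D) marginal, which suffices here because the Bj are deterministic copies of E):

```python
import itertools, math

def majority(bits):
    return int(sum(bits) > len(bits) // 2)

def joint_ED(cut):
    """Distribution of (E, D) when the arrows Bj -> D listed in `cut`
    are severed and fed with fresh uniform bits."""
    dist = {}
    for e in [0, 1]:                                   # E uniform
        for noise in itertools.product([0, 1], repeat=len(cut)):
            repl = dict(zip(sorted(cut), noise))
            bits = [repl.get(j, e) for j in range(3)]  # copies or fresh bits
            d = majority(bits)
            dist[(e, d)] = dist.get((e, d), 0.0) + 0.5 * 0.5 ** len(cut)
    return dist

def kl(P, Q):
    """D(P || Q) in bits."""
    return sum(p * math.log2(p / Q[k]) for k, p in P.items() if p > 0)

P = joint_ED(set())                     # nothing cut: D = E
print(kl(P, joint_ED({2})))             # 0.0: one arrow is irrelevant
print(kl(P, joint_ED({0, 1, 2})))       # 1.0: jointly they carry one bit
```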
SLIDE 42

Application to time series

Why time series also require a new measure of causal strength

SLIDE 43

Granger causality

[Figure: two coupled time series Xt and Yt]

measures

  • the relevance of the past Xt−1, Xt−2, . . . for predicting Yt
  • if the past Yt−1, Yt−2, . . . is known
SLIDE 44

Transfer Entropy: the information theoretic version

[Figure: two coupled time series Xt and Yt]

I(Yt ; Xt−1, Xt−2, . . . | Yt−1, Yt−2, . . . )

SLIDE 45

Criticizing Granger causality and Transfer Entropy

(Ay & Polani 2006)

[Figure: two coupled time series Xt and Yt]

assume perfect copy

  • past of Y allows for predicting Y without X
  • Granger causality and TE are zero
  • interventions on X clearly change Y
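The zero transfer entropy can be reproduced with a minimal chain (my own toy instantiation, not from the slides: a single random bit copied forever, so the past of Y already predicts Yt perfectly even though X drives Y):

```python
import math

def cond_mutual_information(joint):
    """I(A;B|C) in bits, from a joint distribution {(a, b, c): p}."""
    pc, pac, pbc = {}, {}, {}
    for (a, b, c), p in joint.items():
        pc[c] = pc.get(c, 0.0) + p
        pac[(a, c)] = pac.get((a, c), 0.0) + p
        pbc[(b, c)] = pbc.get((b, c), 0.0) + p
    return sum(p * math.log2(p * pc[c] / (pac[(a, c)] * pbc[(b, c)]))
               for (a, b, c), p in joint.items() if p > 0)

# One shared random bit b: X_t = b for all t, and Y_t copies X_{t-1},
# so Y_t = b as well.  Keys are (y_t, x_{t-1}, y_{t-1}).
joint = {(b, b, b): 0.5 for b in [0, 1]}
print(cond_mutual_information(joint))  # 0.0: transfer entropy vanishes
```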
SLIDE 46

Applying our measure to time series

[Figure: time-series DAG; S is the set of arrows from the X-chain into Yt+1]

CS quantifies the effect of all X on Yt+1

(applying this to the example of Ay & Polani yields a reasonable result)

SLIDE 47

Conclusions:

  • none of the existing measures appeared to be conceptually right for measuring the strength of sets of edges
  • our measure satisfies our postulates
  • it relies on interventions on edges
  • it has a clear operational meaning (it does not refer to counterfactuals)
  • definitions that rely on interventions on nodes failed although they seem more straightforward
  • replacing Transfer Entropy (Granger causality) with our measure seems reasonable

SLIDE 48

Thank you for listening!

Reference:

  • D. Janzing, D. Balduzzi, M. Grosse-Wentrup, B. Schölkopf: Quantifying causal influences, to appear in Annals of Statistics.