
SLIDE 1

Bayesian Causal Induction

Pedro A. Ortega

Sensorimotor Learning and Decision-Making Group MPI for Biological Cybernetics/Intelligent Systems

17th December 2011

SLIDE 5

Introduction

Causal Induction (AKA Causal Discovery):

◮ One of the oldest philosophical problems:

◮ Aristotle, Kant, Hume, . . .

◮ The generalization from particular causal instances to abstract causal laws.

◮ Example:

◮ ‘I had a bad fall on a wet floor.’
◮ ‘Therefore, it is dangerous to ride a bike on ice.’
◮ (‘Because I learned that a slippery floor can cause a fall.’)

◮ Two important aspects:

◮ Infer the causal link from experience.
◮ Extrapolate it to future experience.

◮ We all do this in our everyday lives—but how?

SLIDE 9

Causal Graphical Model

◮ A pair of (binary) random variables X and Y

◮ Two candidate causal hypotheses {h, ¬h} (having identical joint distributions)

◮ How do we express the problem of causal induction using the language of graphical models alone?

◮ Do we have to introduce a meta-level for H?

SLIDE 10

Probability Trees

[Probability tree figure: the root resolves H (h, ¬h: each 1/2); under h, X is resolved first (x, ¬x: each 1/2), then Y given X (y|x = 3/4, y|¬x = 1/4); under ¬h, Y is resolved first (y, ¬y: each 1/2), then X given Y (x|y = 3/4, x|¬y = 1/4).]

◮ Node: a mechanism, possibly history dependent

◮ e.g. P(y|h, ¬x) = 1/4 and P(¬y|h, ¬x) = 3/4

◮ Path: a causal realization of the mechanisms
◮ Tree: the causal realizations, possibly heterogeneous
◮ All random variables are first class citizens!
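To make the tree concrete, here is a minimal Python sketch (the representation and names are my own, not from the slides): each node maps an outcome to a (probability, subtree) pair, and the probability of a root-to-leaf path is the product of the mechanism probabilities along it.

```python
# A probability tree as nested dicts: each node maps an outcome of the
# variable it resolves to a (probability, subtree) pair; leaves are None.
# Under h the mechanism order is H -> X -> Y; under ~h it is H -> Y -> X.
TREE = {
    "h": (0.5, {                     # P(h) = 1/2
        "x":  (0.5, {"y": (0.75, None), "~y": (0.25, None)}),
        "~x": (0.5, {"y": (0.25, None), "~y": (0.75, None)}),
    }),
    "~h": (0.5, {                    # P(~h) = 1/2
        "y":  (0.5, {"x": (0.75, None), "~x": (0.25, None)}),
        "~y": (0.5, {"x": (0.25, None), "~x": (0.75, None)}),
    }),
}

def path_prob(tree, path):
    """Multiply the mechanism probabilities along one root-to-leaf path."""
    p, node = 1.0, tree
    for outcome in path:
        q, node = node[outcome]
        p *= q
    return p

print(path_prob(TREE, ["h", "~x", "y"]))  # P(h, ~x, y) = 1/2 * 1/2 * 1/4 = 0.0625
```

Note that the two hypotheses induce identical joint distributions: for example P(x, y|h) = 1/2 · 3/4 = P(y|¬h) · P(x|¬h, y) = P(x, y|¬h), which is exactly why observation alone cannot tell them apart.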

SLIDE 14

Inferring the Causal Direction

◮ We observe X = x, then we observe Y = y.
◮ What is the probability of H = h?
◮ Calculate the posterior probability:

P(h|x, y) = P(y|h, x)P(x|h)P(h) / [P(y|h, x)P(x|h)P(h) + P(x|¬h, y)P(y|¬h)P(¬h)]
          = (3/4 · 1/2 · 1/2) / (3/4 · 1/2 · 1/2 + 3/4 · 1/2 · 1/2)
          = 1/2 = P(h)!

◮ We haven’t learned anything!
◮ To extract new causal information, we have to supply old causal information:

◮ “no causes in, no causes out”
◮ “to learn what happens if you kick the system, you have to kick the system”
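A quick numeric check of this observational posterior (plain Python, with the mechanism probabilities read off the probability tree):

```python
# Observational posterior P(h | x, y) via Bayes' rule.
p_h = 0.5              # prior P(h)
p_y_given_h_x = 0.75   # P(y | h, x): under h, X causes Y
p_x_given_h = 0.5      # P(x | h)
p_x_given_nh_y = 0.75  # P(x | ~h, y): under ~h, Y causes X
p_y_given_nh = 0.5     # P(y | ~h)

num = p_y_given_h_x * p_x_given_h * p_h
den = num + p_x_given_nh_y * p_y_given_nh * (1 - p_h)
posterior = num / den
print(posterior)  # 0.5, identical to the prior: observing alone teaches nothing
```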

SLIDE 15

Interventions in a Probability Tree

Set X = x:

[Probability tree figure, before the intervention: the tree from before, with the joint P(X, Y |H) written at the leaves: 3/8, 1/8, 1/8, 3/8 under h, and 3/8, 1/8, 1/8, 3/8 under ¬h.]

SLIDE 16

Interventions in a Probability Tree

Set X = x:

[Probability tree figure, after the intervention: each node resolving X is replaced by the delta “X = x” (probability 1); the leaf values of P(X, Y |H) become 3/4, 1/4 under h, and 1/2, 1/2 under ¬h.]

◮ Replace all mechanisms resolving X with the delta “X = x”.
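The rule above can be sketched in Python (a toy tree representation of my own, not the authors' code): walk the tree and replace every node that resolves X with the delta “X = x”, keeping the subtree under the chosen branch.

```python
# Toy probability tree: node = (var_name, {outcome: (prob, child)}), leaf = None.
# Under h the order is X then Y; under ~h it is Y then X.
tree = ("H", {
    "h":  (0.5, ("X", {"x":  (0.5, ("Y", {"y": (0.75, None), "~y": (0.25, None)})),
                       "~x": (0.5, ("Y", {"y": (0.25, None), "~y": (0.75, None)}))})),
    "~h": (0.5, ("Y", {"y":  (0.5, ("X", {"x": (0.75, None), "~x": (0.25, None)})),
                       "~y": (0.5, ("X", {"x": (0.25, None), "~x": (0.75, None)}))})),
})

def intervene(node, var, value):
    """Replace every mechanism resolving `var` with the delta `var = value`."""
    if node is None:
        return None
    name, branches = node
    if name == var:
        # Keep only the chosen branch, with probability 1.
        _, child = branches[value]
        return (name, {value: (1.0, intervene(child, var, value))})
    return (name, {o: (p, intervene(c, var, value)) for o, (p, c) in branches.items()})

intervened = intervene(tree, "X", "x")
```

After the call, the X node under h carries the single branch x with probability 1 (so the Y mechanism 3/4, 1/4 is reached with certainty), while under ¬h the Y mechanism is untouched and only the downstream X nodes become deltas, reproducing the leaf values on the slide.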

SLIDE 19

Inferring the Causal Direction—2nd Attempt

◮ We set X = x, then we observe Y = y.
◮ What is the probability of H = h?
◮ Calculate the posterior probability:

P(h|x̂, y) = P(y|h, x̂)P(x̂|h)P(h) / [P(y|h, x̂)P(x̂|h)P(h) + P(x̂|¬h, y)P(y|¬h)P(¬h)]
          = (3/4 · 1 · 1/2) / (3/4 · 1 · 1/2 + 1 · 1/2 · 1/2)
          = 3/5 ≠ P(h).

◮ We have acquired evidence for “X → Y ”!
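The same Bayes computation with the intervened probabilities (plain Python; the values are those on the slide):

```python
# Interventional posterior P(h | do(x), y).
# Under h (X causes Y), do(x) sets P(x^|h) = 1 and leaves P(y|h,x^) = 3/4.
# Under ~h (Y causes X), do(x) makes x certain: P(x^|~h,y) = 1, P(y|~h) = 1/2.
p_h = 0.5
num = 0.75 * 1.0 * p_h             # P(y|h,x^) P(x^|h) P(h)
den = num + 1.0 * 0.5 * (1 - p_h)  # + P(x^|~h,y) P(y|~h) P(~h)
posterior = num / den
print(posterior)  # 0.6, i.e. 3/5: the intervention yields evidence for X -> Y
```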

SLIDE 20

Conclusions

◮ Causal induction can be done using purely Bayesian techniques plus a description that allows multiple causal explanations of an experiment.

◮ Probability trees provide a clean & simple way to encode causal probabilistic information.

◮ The purpose of an intervention is to introduce statistical asymmetries.

◮ The causal information that we can acquire is limited by the interventions we can apply to the system.

◮ In this approach, the causal dependencies are not “in the data”; rather, they arise from the data and the hypotheses that the reasoner “imprints” on them.