
SLIDE 1

Two Optimal Strategies for Active Learning of Causal Models from Interventions

Alain Hauser, Peter Bühlmann

Seminar für Statistik, ETH Zürich

PGM 2012, Granada

Alain Hauser (ETH Zürich), Active learning of causal models, PGM 2012, Granada

SLIDE 2

Causal model: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

SLIDE 3

Causal model: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

Directed acyclic graph (DAG) of causal dependencies:

[Figure: DAG on vertices 1, 2, 3, 4]

SLIDE 4

Causal model: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

Directed acyclic graph (DAG) of causal dependencies:

[Figure: DAG on vertices 1, 2, 3, 4]

Factorization of density:

f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)

f has the Markov property with respect to D.
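The factorization can be made concrete with a small numeric sketch. The probabilities below are hypothetical (the slides attach no numbers); the point is that the product of local conditionals defines a proper joint density that satisfies the Markov property of the DAG:

```python
from itertools import product

# Hypothetical binary example of the factorization above:
# f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)
f1 = {0: 0.7, 1: 0.3}                                    # f(x1)
f2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}          # f(x2|x1)
f3 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}          # f(x3|x1)
f4 = {(x2, x3): {0: 0.5, 1: 0.5}                         # f(x4|x2, x3)
      for x2 in (0, 1) for x3 in (0, 1)}

def joint(x1, x2, x3, x4):
    """Joint density via the Markov factorization of the DAG."""
    return f1[x1] * f2[x1][x2] * f3[x1][x3] * f4[(x2, x3)][x4]

# The factorization defines a proper density ...
total = sum(joint(*x) for x in product((0, 1), repeat=4))

# ... and encodes the Markov property: e.g. X2 and X3 are
# conditionally independent given X1.
p23_given_1 = sum(joint(0, 1, 1, x4) for x4 in (0, 1)) / f1[0]
```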

SLIDE 5

Intervention: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

[Figure: true DAG D on vertices 1, 2, 3, 4]

Observational density:

f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)

SLIDE 6

Intervention: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

Intervention at X2: waking Jonas

[Figure: DAG on vertices 1, 2, 3, 4]

Observational density:

f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)

SLIDE 7

Intervention: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

[Figure: intervention DAG D({2}) on vertices 1, 2, 3, 4]

Observational density:

f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)

Interventional density:

f(x | do(X2 = U)) = f(x1) f̃(x2) f(x3|x1) f(x4|x2, x3)
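The truncated factorization can also be checked numerically. A minimal sketch with hypothetical probabilities: the intervention at X2 replaces the factor f(x2|x1) by an intervention density f̃(x2), and the marginal of X1 (a non-descendant of X2) is unaffected:

```python
from itertools import product

# Hypothetical numbers illustrating the truncated factorization above.
f1 = {0: 0.7, 1: 0.3}
f2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # f(x2|x1), replaced under do()
f3 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}
f4 = {(x2, x3): {0: 0.5, 1: 0.5} for x2 in (0, 1) for x3 in (0, 1)}
ftilde2 = {0: 0.0, 1: 1.0}                        # f~(x2): Jonas is woken up

def f_obs(x1, x2, x3, x4):
    return f1[x1] * f2[x1][x2] * f3[x1][x3] * f4[(x2, x3)][x4]

def f_do(x1, x2, x3, x4):
    # intervention at X2: the factor f(x2|x1) is replaced by f~(x2)
    return f1[x1] * ftilde2[x2] * f3[x1][x3] * f4[(x2, x3)][x4]

# The interventional density is still normalized ...
total_do = sum(f_do(*x) for x in product((0, 1), repeat=4))

# ... and the marginal of X1, a non-descendant of X2, is unchanged:
p1_do = sum(f_do(1, x2, x3, x4) for x2, x3, x4 in product((0, 1), repeat=3))
```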

SLIDE 8

Markov equivalence

A probability density in general obeys the Markov properties of several DAGs; those DAGs are called Markov equivalent. Consequence: limited identifiability under observational data.

[Figure: Markov equivalent DAGs D, D1, D2, each on vertices 1, 2, 3, 4]

SLIDE 9

Markov equivalence

A probability density in general obeys the Markov properties of several DAGs; those DAGs are called Markov equivalent. Consequence: limited identifiability under observational data.

[Figure: Markov equivalent DAGs D, D1, D2, each on vertices 1, 2, 3, 4]

On the other hand, intervention effects do depend on the DAG. Consequence: improved identifiability of causal models under interventional data.
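Observational Markov equivalence can be tested with the classical Verma–Pearl criterion: two DAGs are Markov equivalent iff they have the same skeleton and the same v-structures. A small sketch, with the example DAGs written as edge lists (orientations of D1 and the non-equivalent variant are assumed, since the figure is not recoverable):

```python
def skeleton(dag):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in dag}

def v_structures(dag):
    """Colliders a -> c <- b with a and b non-adjacent."""
    adj = skeleton(dag)
    return {(a, c, b)
            for (a, c) in dag for (b, c2) in dag
            if c == c2 and a < b and frozenset((a, b)) not in adj}

def markov_equivalent(d1, d2):
    # Verma & Pearl: equal skeletons and equal v-structures
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

D    = [(1, 2), (1, 3), (2, 4), (3, 4)]   # true DAG of the running example
D1   = [(2, 1), (1, 3), (2, 4), (3, 4)]   # edge 1-2 reversed: still equivalent
Dbad = [(2, 1), (3, 1), (2, 4), (3, 4)]   # new collider 2 -> 1 <- 3: not equivalent
```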

SLIDE 10

Interventional Markov equivalence

Assume an experiment in which different interventions at targets I1, I2, . . . are performed, summarized as a family of targets I = {I1, I2, . . .}. Note: the observational case corresponds to the special family I = {∅}.

Definition (I-Markov equivalence; Hauser and Bühlmann, 2012)

Given a family of targets I, two DAGs D1 and D2 are called I-Markov equivalent if they produce the same class of tuples of interventional densities.

SLIDE 11

Interventional Markov equivalence

Assume an experiment in which different interventions at targets I1, I2, . . . are performed, summarized as a family of targets I = {I1, I2, . . .}. Note: the observational case corresponds to the special family I = {∅}.

Definition (I-Markov equivalence; Hauser and Bühlmann, 2012)

Given a family of targets I, two DAGs D1 and D2 are called I-Markov equivalent if they produce the same class of tuples of interventional densities. In words: two DAGs D1 and D2 are I-Markov equivalent if they are statistically indistinguishable from data produced by interventions at the targets in I.

SLIDE 12

Interventional essential graph

Definition

Let I be a family of targets. The I-essential graph of a DAG D is defined as

E_I(D) := ⋃_{D′ ∼_I D} D′

In words: E_I(D) is a partially directed graph having the same skeleton as D,
  • with a directed edge where the corresponding arrows of all DAGs I-equivalent to D have the same orientation,
  • with an undirected edge where the orientation of the corresponding arrow is not common to all DAGs I-equivalent to D.

SLIDE 13

Interventional essential graph

Definition

Let I be a family of targets. The I-essential graph of a DAG D is defined as

E_I(D) := ⋃_{D′ ∼_I D} D′

In words: E_I(D) is a partially directed graph having the same skeleton as D,
  • with a directed edge where the corresponding arrows of all DAGs I-equivalent to D have the same orientation,
  • with an undirected edge where the orientation of the corresponding arrow is not common to all DAGs I-equivalent to D.

Properties:
  • unique representation of an I-Markov equivalence class
  • chain graph with chordal chain components (Hauser and Bühlmann, 2012)
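The definition as a union is easy to sketch when the member DAGs of an equivalence class are given explicitly: keep an edge directed only when every member orients it the same way. Below, the class is the observational class of the running example, with assumed orientations for D1 and D2:

```python
def essential_graph(dags):
    """Union of an equivalence class: a skeleton edge stays directed
    only if every DAG in the class orients it the same way."""
    directed, undirected = set(), set()
    for edge in {frozenset(e) for e in dags[0]}:
        a, b = sorted(edge)
        orientations = {(a, b) if (a, b) in set(d) else (b, a) for d in dags}
        if len(orientations) == 1:
            directed.add(next(iter(orientations)))
        else:
            undirected.add((a, b))
    return directed, undirected

# Observational ({∅}-) equivalence class of the running example:
D  = [(1, 2), (1, 3), (2, 4), (3, 4)]
D1 = [(2, 1), (1, 3), (2, 4), (3, 4)]
D2 = [(1, 2), (3, 1), (2, 4), (3, 4)]
directed, undirected = essential_graph([D, D1, D2])
```

The directed part is exactly the v-structure 2 → 4 ← 3; the edges at vertex 1 remain undirected.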

SLIDE 14

Interventional Markov equivalence: example

[Figure: DAGs D, D1, D2 and essential graph E_{∅}(D), each on vertices 1, 2, 3, 4]

Observational Markov equivalence class of D with the corresponding essential graph E_{∅}(D).

SLIDE 15

Interventional Markov equivalence: example

[Figure: DAGs D, D1 and interventional essential graph E_{∅,{2}}(D), each on vertices 1, 2, 3, 4]

Interventional Markov equivalence class of D for the family of targets I = {∅, {2}}. Corresponds to an experiment which measures
  • observational data (I = ∅)
  • interventional data from an intervention at X2 (I = {2})

SLIDE 16

Active learning: overview

Up to now: given list of interventions; characterization of identifiability via interventional essential graphs

SLIDE 17

Active learning: overview

Up to now: given list of interventions; characterization of identifiability via interventional essential graphs

Problem

[Figure: interventional essential graph on vertices 1, 2, 3, 4, 5]

Given the list of interventions performed so far and the corresponding interventional essential graph, find an “optimal” intervention target for maximal improvement of identifiability of causal models.

SLIDE 18

Active learning: overview

Up to now: given list of interventions; characterization of identifiability via interventional essential graphs

Problem

[Figure: interventional essential graph on vertices 1, 2, 3, 4, 5]

Given the list of interventions performed so far and the corresponding interventional essential graph, find an “optimal” intervention target for maximal improvement of identifiability of causal models.

Objectives: assessing identifiability

  • Number of edges orientable after one (single-vertex) intervention: OptSingle

SLIDE 19

Active learning: overview

Up to now: given list of interventions; characterization of identifiability via interventional essential graphs

Problem

[Figure: interventional essential graph on vertices 1, 2, 3, 4, 5]

Given the list of interventions performed so far and the corresponding interventional essential graph, find an “optimal” intervention target for maximal improvement of identifiability of causal models.

Objectives: assessing identifiability

  • Number of edges orientable after one (single-vertex) intervention: OptSingle
  • Number of interventions (at arbitrary targets) needed for full identifiability: OptUnb

SLIDE 20

OptSingle: overview

  • Yields the single-vertex intervention that maximizes the number of orientable edges in the worst case
  • Implementation: local algorithm that finds the optimal intervention target in a “local” fashion, only considering the neighborhood of candidate vertices
  • Complexity: exponential in the worst case, depending on the clique number of the I-essential graph
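The talk's algorithm is local; as an illustration of the *objective* only, here is a brute-force sketch that assumes the equivalence class is small enough to enumerate explicitly (the 4-vertex class from the earlier slides, with assumed orientations). Intervening at a vertex reveals the orientation of all edges at that vertex; the class splits accordingly, and the worst case is the sub-class leaving the most edges ambiguous:

```python
def common_orientations(dags):
    """Edges oriented identically in every DAG of the class."""
    common = set(dags[0])
    for d in dags[1:]:
        common &= set(d)
    return common

def opt_single(dags, vertices):
    """Brute-force OptSingle objective: the vertex maximizing the
    worst-case number of newly orientable edges after intervening."""
    base = len(common_orientations(dags))
    best = (None, -1)
    for v in vertices:
        # Group DAGs by the orientation of the edges incident to v,
        # i.e. by the outcome an intervention at v would reveal.
        groups = {}
        for d in dags:
            key = frozenset(e for e in d if v in e)
            groups.setdefault(key, []).append(d)
        # Worst case over outcomes: smallest gain in oriented edges.
        gain = min(len(common_orientations(g)) - base for g in groups.values())
        if gain > best[1]:
            best = (v, gain)
    return best

D  = [(1, 2), (1, 3), (2, 4), (3, 4)]
D1 = [(2, 1), (1, 3), (2, 4), (3, 4)]
D2 = [(1, 2), (3, 1), (2, 4), (3, 4)]
best_vertex, worst_case_gain = opt_single([D, D1, D2], [1, 2, 3, 4])
```

Here an intervention at vertex 1 separates all three DAGs, guaranteeing both remaining edges become orientable; intervening at vertex 4 guarantees nothing new.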

SLIDE 21

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

SLIDE 22

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

OptSingle: find the vertex that guarantees orientability of a maximum of edges after intervention.

SLIDE 23

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

Intervention at vertex 2.

SLIDE 24

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

SLIDE 25

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

OptSingle: find the vertex that guarantees orientability of a maximum of edges after intervention.

SLIDE 26

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

Intervention at vertex 3.

SLIDE 27

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

SLIDE 28

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

OptSingle: find the vertex that guarantees orientability of a maximum of edges after intervention.

SLIDE 29

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

Intervention at vertex 4.

SLIDE 30

OptUnb: overview

  • Yields an intervention target (of arbitrary size) that maximally reduces the clique number of the interventional essential graph
  • Iterative application of OptUnb yields a minimum set of intervention targets that guarantees full identifiability for all causal models in the interventional Markov equivalence class
  • Implementation: based on LexBFS (Rose, 1970) and greedy coloring; exploits chordality of chain components
  • Complexity: linear-time algorithm
  • Proof of optimality settles a conjecture of Eberhardt (2008) concerning the number of interventions necessary and sufficient for full identifiability
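The coloring step can be sketched in a few lines. This is not the authors' implementation, only a plain-Python sketch on a hypothetical path-shaped chain component: partition-refinement LexBFS followed by greedy coloring in visit order, which is optimal on chordal graphs because each vertex's earlier neighbors form a clique. The intervention target is then taken, OptUnb-style, as the vertices in the lower half of the colors:

```python
def lex_bfs(adj):
    """Lexicographic BFS (Rose, 1970), via partition refinement."""
    order, parts = [], [list(adj)]
    while parts:
        v = parts[0].pop(0)          # take the first vertex of the first part
        if not parts[0]:
            parts.pop(0)
        order.append(v)
        new_parts = []
        for p in parts:              # split each part: neighbors of v first
            inside = [u for u in p if u in adj[v]]
            outside = [u for u in p if u not in adj[v]]
            if inside:
                new_parts.append(inside)
            if outside:
                new_parts.append(outside)
        parts = new_parts
    return order

def greedy_coloring(adj, order):
    """Greedy coloring along the given vertex order."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = min(c for c in range(len(adj)) if c not in used)
    return color

# Hypothetical chordal chain component (a path on vertices 1..5):
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
color = greedy_coloring(adj, lex_bfs(adj))
k = max(color.values()) + 1                       # number of colors used
target = {v for v, c in color.items() if c < (k + 1) // 2}   # "lower half"
```

Intervening at roughly half of the colors halves the clique number in each round, which is where the logarithmic number of rounds for full identifiability comes from.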

SLIDE 31

OptUnb: worst case example

[Figure: chordal chain component on vertices 1, 2, 3, 4, 5]

SLIDE 32

OptUnb: worst case example

[Figure: colored chain component on vertices 1, 2, 3, 4, 5]

Optimal coloring using greedy coloring on a LexBFS ordering (Rose, 1970).

SLIDE 33

OptUnb: worst case example

[Figure: colored chain component on vertices 1, 2, 3, 4, 5]

Intervention at the lower half of the colors.

SLIDE 34

OptUnb: worst case example

[Figure: chordal chain component on vertices 1, 2, 3, 4, 5]

SLIDE 35

OptUnb: worst case example

[Figure: colored chain component on vertices 1, 2, 3, 4, 5]

Optimal coloring using greedy coloring on a LexBFS ordering (Rose, 1970).

SLIDE 36

OptUnb: worst case example

[Figure: colored chain component on vertices 1, 2, 3, 4, 5]

Intervention at the lower half of the colors.

SLIDE 37

Evaluating active learning algorithms: simulation results

[Figure: percentage of DAGs not fully identified (y-axis, 25–100%) vs. number of interventions (x-axis, 2–14), for the strategies Rand, RandAdv, OptSingle, OptUnb (T), OptUnb (V)]

Number of intervention steps needed for full identifiability of DAGs (p = 40), measured in targets (T) or intervened variables (V). Thin lines: Kaplan–Meier estimates; colored bands: 95% confidence regions.

SLIDE 38

Conclusions

  • Causal models are not fully identifiable from observational data
  • Interventional data improves identifiability; the information gain depends on the intervention target

SLIDE 39

Conclusions

  • Causal models are not fully identifiable from observational data
  • Interventional data improves identifiability; the information gain depends on the intervention target
  • Two new active learning algorithms for proposing valuable intervention targets:
      ◮ OptSingle: finds the single-vertex target that maximizes the number of orientable edges after intervention
      ◮ OptUnb: finds the target that maximally reduces the clique size of the interventional essential graph; iterative application guarantees full identifiability with a minimum number of interventions

SLIDE 40

Conclusions

  • Causal models are not fully identifiable from observational data
  • Interventional data improves identifiability; the information gain depends on the intervention target
  • Two new active learning algorithms for proposing valuable intervention targets:
      ◮ OptSingle: finds the single-vertex target that maximizes the number of orientable edges after intervention
      ◮ OptUnb: finds the target that maximally reduces the clique size of the interventional essential graph; iterative application guarantees full identifiability with a minimum number of interventions
  • Both strategies lead to significantly faster identification of causal models than randomly chosen interventions

SLIDE 41

References

  • F. Eberhardt. Almost optimal intervention sets for causal discovery. In UAI, pages 161–168, 2008.
  • A. Hauser and P. Bühlmann. Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. JMLR, 13:2409–2464, 2012.
  • D. J. Rose. Triangulated graphs and the elimination process. Journal of Mathematical Analysis and Applications, 32(3):597–609, 1970.

Many thanks! (¡Muchas gracias!)
