
SLIDE 1

Two Optimal Strategies for Active Learning of Causal Models from Interventions

Alain Hauser, Peter Bühlmann

Seminar für Statistik, ETH Zürich

PGM 2012, Granada

Alain Hauser (ETH Zürich), Active learning of causal models, PGM 2012, Granada

SLIDE 2

Causal model: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

SLIDE 3

Causal model: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

Directed acyclic graph (DAG) of causal dependencies:

[Figure: DAG on vertices 1, 2, 3, 4]

SLIDE 4

Causal model: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

Directed acyclic graph (DAG) of causal dependencies:

[Figure: DAG on vertices 1, 2, 3, 4]

Factorization of density:

f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)

f has the Markov property with respect to D.
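The factorization can be made concrete with a small numeric sketch. The probabilities below are hypothetical (the slides attach no numbers); the point is that the product of local conditionals defines a proper joint density that satisfies the Markov property of the DAG:

```python
from itertools import product

# Hypothetical binary example of the factorization above:
# f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)
f1 = {0: 0.7, 1: 0.3}                                    # f(x1)
f2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}          # f(x2|x1)
f3 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}          # f(x3|x1)
f4 = {(x2, x3): {0: 0.5, 1: 0.5}                         # f(x4|x2, x3)
      for x2 in (0, 1) for x3 in (0, 1)}

def joint(x1, x2, x3, x4):
    """Joint density via the Markov factorization of the DAG."""
    return f1[x1] * f2[x1][x2] * f3[x1][x3] * f4[(x2, x3)][x4]

# The factorization defines a proper density ...
total = sum(joint(*x) for x in product((0, 1), repeat=4))

# ... and encodes the Markov property: e.g. X2 and X3 are
# conditionally independent given X1.
p23_given_1 = sum(joint(0, 1, 1, x4) for x4 in (0, 1)) / f1[0]
```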

SLIDE 5

Intervention: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

[Figure: true DAG D on vertices 1, 2, 3, 4]

Observational density:

f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)

SLIDE 6

Intervention: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

Intervention at X2: waking Jonas

[Figure: DAG on vertices 1, 2, 3, 4]

Observational density:

f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)

SLIDE 7

Intervention: example

Random variables:
  • X1: taxis honking
  • X2: Jonas awake
  • X3: Alain awake
  • X4: watermelons eaten

[Figure: intervention DAG D({2}) on vertices 1, 2, 3, 4]

Observational density:

f(x) = f(x1) f(x2|x1) f(x3|x1) f(x4|x2, x3)

Interventional density:

f(x | do(X2 = U)) = f(x1) f̃(x2) f(x3|x1) f(x4|x2, x3)
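The truncated factorization can also be checked numerically. A minimal sketch with hypothetical probabilities: the intervention at X2 replaces the factor f(x2|x1) by an intervention density f̃(x2), and the marginal of X1 (a non-descendant of X2) is unaffected:

```python
from itertools import product

# Hypothetical numbers illustrating the truncated factorization above.
f1 = {0: 0.7, 1: 0.3}
f2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # f(x2|x1), replaced under do()
f3 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}
f4 = {(x2, x3): {0: 0.5, 1: 0.5} for x2 in (0, 1) for x3 in (0, 1)}
ftilde2 = {0: 0.0, 1: 1.0}                        # f~(x2): Jonas is woken up

def f_obs(x1, x2, x3, x4):
    return f1[x1] * f2[x1][x2] * f3[x1][x3] * f4[(x2, x3)][x4]

def f_do(x1, x2, x3, x4):
    # intervention at X2: the factor f(x2|x1) is replaced by f~(x2)
    return f1[x1] * ftilde2[x2] * f3[x1][x3] * f4[(x2, x3)][x4]

# The interventional density is still normalized ...
total_do = sum(f_do(*x) for x in product((0, 1), repeat=4))

# ... and the marginal of X1, a non-descendant of X2, is unchanged:
p1_do = sum(f_do(1, x2, x3, x4) for x2, x3, x4 in product((0, 1), repeat=3))
```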

SLIDE 8

Markov equivalence

A probability density in general obeys the Markov properties of several DAGs; those DAGs are called Markov equivalent. Consequence: limited identifiability under observational data.

[Figure: Markov equivalent DAGs D, D1, D2, each on vertices 1, 2, 3, 4]

SLIDE 9

Markov equivalence

A probability density in general obeys the Markov properties of several DAGs; those DAGs are called Markov equivalent. Consequence: limited identifiability under observational data.

[Figure: Markov equivalent DAGs D, D1, D2, each on vertices 1, 2, 3, 4]

On the other hand, intervention effects do depend on the DAG. Consequence: improved identifiability of causal models under interventional data.
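Observational Markov equivalence can be tested with the classical Verma–Pearl criterion: two DAGs are Markov equivalent iff they have the same skeleton and the same v-structures. A small sketch, with the example DAGs written as edge lists (orientations of D1 and the non-equivalent variant are assumed, since the figure is not recoverable):

```python
def skeleton(dag):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in dag}

def v_structures(dag):
    """Colliders a -> c <- b with a and b non-adjacent."""
    adj = skeleton(dag)
    return {(a, c, b)
            for (a, c) in dag for (b, c2) in dag
            if c == c2 and a < b and frozenset((a, b)) not in adj}

def markov_equivalent(d1, d2):
    # Verma & Pearl: equal skeletons and equal v-structures
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

D    = [(1, 2), (1, 3), (2, 4), (3, 4)]   # true DAG of the running example
D1   = [(2, 1), (1, 3), (2, 4), (3, 4)]   # edge 1-2 reversed: still equivalent
Dbad = [(2, 1), (3, 1), (2, 4), (3, 4)]   # new collider 2 -> 1 <- 3: not equivalent
```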

SLIDE 10

Interventional Markov equivalence

Assume an experiment in which different interventions at targets I1, I2, . . . are performed, summarized as a family of targets I = {I1, I2, . . .}. Note: the observational case corresponds to the special family I = {∅}.

Definition (I-Markov equivalence; Hauser and Bühlmann, 2012)

Given a family of targets I, two DAGs D1 and D2 are called I-Markov equivalent if they produce the same class of tuples of interventional densities.

SLIDE 11

Interventional Markov equivalence

Assume an experiment in which different interventions at targets I1, I2, . . . are performed, summarized as a family of targets I = {I1, I2, . . .}. Note: the observational case corresponds to the special family I = {∅}.

Definition (I-Markov equivalence; Hauser and Bühlmann, 2012)

Given a family of targets I, two DAGs D1 and D2 are called I-Markov equivalent if they produce the same class of tuples of interventional densities. In words: two DAGs D1 and D2 are I-Markov equivalent if they are statistically indistinguishable from data produced by interventions at the targets in I.

SLIDE 12

Interventional essential graph

Definition

Let I be a family of targets. The I-essential graph of a DAG D is defined as

E_I(D) := ⋃_{D′ ∼_I D} D′

In words: E_I(D) is a partially directed graph having the same skeleton as D,
  • with a directed edge where the corresponding arrows of all DAGs I-equivalent to D have the same orientation,
  • with an undirected edge where the orientation of the corresponding arrow is not common to all DAGs I-equivalent to D.

SLIDE 13

Interventional essential graph

Definition

Let I be a family of targets. The I-essential graph of a DAG D is defined as

E_I(D) := ⋃_{D′ ∼_I D} D′

In words: E_I(D) is a partially directed graph having the same skeleton as D,
  • with a directed edge where the corresponding arrows of all DAGs I-equivalent to D have the same orientation,
  • with an undirected edge where the orientation of the corresponding arrow is not common to all DAGs I-equivalent to D.

Properties:
  • unique representation of an I-Markov equivalence class
  • chain graph with chordal chain components (Hauser and Bühlmann, 2012)
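The definition as a union is easy to sketch when the member DAGs of an equivalence class are given explicitly: keep an edge directed only when every member orients it the same way. Below, the class is the observational class of the running example, with assumed orientations for D1 and D2:

```python
def essential_graph(dags):
    """Union of an equivalence class: a skeleton edge stays directed
    only if every DAG in the class orients it the same way."""
    directed, undirected = set(), set()
    for edge in {frozenset(e) for e in dags[0]}:
        a, b = sorted(edge)
        orientations = {(a, b) if (a, b) in set(d) else (b, a) for d in dags}
        if len(orientations) == 1:
            directed.add(next(iter(orientations)))
        else:
            undirected.add((a, b))
    return directed, undirected

# Observational ({∅}-) equivalence class of the running example:
D  = [(1, 2), (1, 3), (2, 4), (3, 4)]
D1 = [(2, 1), (1, 3), (2, 4), (3, 4)]
D2 = [(1, 2), (3, 1), (2, 4), (3, 4)]
directed, undirected = essential_graph([D, D1, D2])
```

The directed part is exactly the v-structure 2 → 4 ← 3; the edges at vertex 1 remain undirected.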

SLIDE 14

Interventional Markov equivalence: example

[Figure: DAGs D, D1, D2 and essential graph E_{∅}(D), each on vertices 1, 2, 3, 4]

Observational Markov equivalence class of D with the corresponding essential graph E_{∅}(D).

SLIDE 15

Interventional Markov equivalence: example

[Figure: DAGs D, D1 and interventional essential graph E_{∅,{2}}(D), each on vertices 1, 2, 3, 4]

Interventional Markov equivalence class of D for the family of targets I = {∅, {2}}. Corresponds to an experiment which measures
  • observational data (I = ∅)
  • interventional data from an intervention at X2 (I = {2})

SLIDE 16

Active learning: overview

Up to now: given list of interventions; characterization of identifiability via interventional essential graphs

SLIDE 17

Active learning: overview

Up to now: given list of interventions; characterization of identifiability via interventional essential graphs

Problem

[Figure: interventional essential graph on vertices 1, 2, 3, 4, 5]

Given the list of interventions performed so far and the corresponding interventional essential graph, find an “optimal” intervention target for maximal improvement of identifiability of causal models.

SLIDE 18

Active learning: overview

Up to now: given list of interventions; characterization of identifiability via interventional essential graphs

Problem

[Figure: interventional essential graph on vertices 1, 2, 3, 4, 5]

Given the list of interventions performed so far and the corresponding interventional essential graph, find an “optimal” intervention target for maximal improvement of identifiability of causal models.

Objectives: assessing identifiability

  • Number of edges orientable after one (single-vertex) intervention: OptSingle

SLIDE 19

Active learning: overview

Up to now: given list of interventions; characterization of identifiability via interventional essential graphs

Problem

[Figure: interventional essential graph on vertices 1, 2, 3, 4, 5]

Given the list of interventions performed so far and the corresponding interventional essential graph, find an “optimal” intervention target for maximal improvement of identifiability of causal models.

Objectives: assessing identifiability

  • Number of edges orientable after one (single-vertex) intervention: OptSingle
  • Number of interventions (at arbitrary targets) needed for full identifiability: OptUnb

SLIDE 20

OptSingle: overview

  • Yields the single-vertex intervention that maximizes the number of orientable edges in the worst case
  • Implementation: local algorithm that finds the optimal intervention target in a “local” fashion, only considering the neighborhood of candidate vertices
  • Complexity: exponential in the worst case, depending on the clique number of the I-essential graph
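The talk's algorithm is local; as an illustration of the *objective* only, here is a brute-force sketch that assumes the equivalence class is small enough to enumerate explicitly (the 4-vertex class from the earlier slides, with assumed orientations). Intervening at a vertex reveals the orientation of all edges at that vertex; the class splits accordingly, and the worst case is the sub-class leaving the most edges ambiguous:

```python
def common_orientations(dags):
    """Edges oriented identically in every DAG of the class."""
    common = set(dags[0])
    for d in dags[1:]:
        common &= set(d)
    return common

def opt_single(dags, vertices):
    """Brute-force OptSingle objective: the vertex maximizing the
    worst-case number of newly orientable edges after intervening."""
    base = len(common_orientations(dags))
    best = (None, -1)
    for v in vertices:
        # Group DAGs by the orientation of the edges incident to v,
        # i.e. by the outcome an intervention at v would reveal.
        groups = {}
        for d in dags:
            key = frozenset(e for e in d if v in e)
            groups.setdefault(key, []).append(d)
        # Worst case over outcomes: smallest gain in oriented edges.
        gain = min(len(common_orientations(g)) - base for g in groups.values())
        if gain > best[1]:
            best = (v, gain)
    return best

D  = [(1, 2), (1, 3), (2, 4), (3, 4)]
D1 = [(2, 1), (1, 3), (2, 4), (3, 4)]
D2 = [(1, 2), (3, 1), (2, 4), (3, 4)]
best_vertex, worst_case_gain = opt_single([D, D1, D2], [1, 2, 3, 4])
```

Here an intervention at vertex 1 separates all three DAGs, guaranteeing both remaining edges become orientable; intervening at vertex 4 guarantees nothing new.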

SLIDE 21

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

SLIDE 22

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

OptSingle: find the vertex that guarantees orientability of a maximum of edges after intervention.

SLIDE 23

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

Intervention at vertex 2.

SLIDE 24

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

SLIDE 25

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

OptSingle: find the vertex that guarantees orientability of a maximum of edges after intervention.

SLIDE 26

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

Intervention at vertex 3.

SLIDE 27

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

SLIDE 28

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

OptSingle: find the vertex that guarantees orientability of a maximum of edges after intervention.

SLIDE 29

OptSingle: worst case example

[Figure: essential graph on vertices 1, 2, 3, 4, 5]

Intervention at vertex 4.

SLIDE 30

OptUnb: overview

  • Yields an intervention target (of arbitrary size) that maximally reduces the clique number of the interventional essential graph
  • Iterative application of OptUnb yields a minimum set of intervention targets that guarantees full identifiability for all causal models in the interventional Markov equivalence class
  • Implementation: based on LexBFS (Rose, 1970) and greedy coloring; exploits chordality of chain components
  • Complexity: linear-time algorithm
  • Proof of optimality settles a conjecture of Eberhardt (2008) concerning the number of interventions necessary and sufficient for full identifiability
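The coloring step can be sketched in a few lines. This is not the authors' implementation, only a plain-Python sketch on a hypothetical path-shaped chain component: partition-refinement LexBFS followed by greedy coloring in visit order, which is optimal on chordal graphs because each vertex's earlier neighbors form a clique. The intervention target is then taken, OptUnb-style, as the vertices in the lower half of the colors:

```python
def lex_bfs(adj):
    """Lexicographic BFS (Rose, 1970), via partition refinement."""
    order, parts = [], [list(adj)]
    while parts:
        v = parts[0].pop(0)          # take the first vertex of the first part
        if not parts[0]:
            parts.pop(0)
        order.append(v)
        new_parts = []
        for p in parts:              # split each part: neighbors of v first
            inside = [u for u in p if u in adj[v]]
            outside = [u for u in p if u not in adj[v]]
            if inside:
                new_parts.append(inside)
            if outside:
                new_parts.append(outside)
        parts = new_parts
    return order

def greedy_coloring(adj, order):
    """Greedy coloring along the given vertex order."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = min(c for c in range(len(adj)) if c not in used)
    return color

# Hypothetical chordal chain component (a path on vertices 1..5):
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
color = greedy_coloring(adj, lex_bfs(adj))
k = max(color.values()) + 1                       # number of colors used
target = {v for v, c in color.items() if c < (k + 1) // 2}   # "lower half"
```

Intervening at roughly half of the colors halves the clique number in each round, which is where the logarithmic number of rounds for full identifiability comes from.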

SLIDE 31

OptUnb: worst case example

[Figure: chordal chain component on vertices 1, 2, 3, 4, 5]

SLIDE 32

OptUnb: worst case example

[Figure: colored chain component on vertices 1, 2, 3, 4, 5]

Optimal coloring using greedy coloring on a LexBFS ordering (Rose, 1970).

SLIDE 33

OptUnb: worst case example

[Figure: colored chain component on vertices 1, 2, 3, 4, 5]

Intervention at the lower half of the colors.

SLIDE 34

OptUnb: worst case example

[Figure: chordal chain component on vertices 1, 2, 3, 4, 5]

SLIDE 35

OptUnb: worst case example

[Figure: colored chain component on vertices 1, 2, 3, 4, 5]

Optimal coloring using greedy coloring on a LexBFS ordering (Rose, 1970).

SLIDE 36

OptUnb: worst case example

[Figure: colored chain component on vertices 1, 2, 3, 4, 5]

Intervention at the lower half of the colors.

SLIDE 37

Evaluating active learning algorithms: simulation results

[Figure: percentage of DAGs not fully identified (y-axis, 25–100%) vs. number of interventions (x-axis, 2–14), for the strategies Rand, RandAdv, OptSingle, OptUnb (T), OptUnb (V)]

Number of intervention steps needed for full identifiability of DAGs (p = 40), measured in targets (T) or intervened variables (V). Thin lines: Kaplan–Meier estimates; colored bands: 95% confidence regions.

SLIDE 38

Conclusions

  • Causal models are not fully identifiable from observational data
  • Interventional data improves identifiability; the information gain depends on the intervention target

SLIDE 39

Conclusions

  • Causal models are not fully identifiable from observational data
  • Interventional data improves identifiability; the information gain depends on the intervention target
  • Two new active learning algorithms for proposing valuable intervention targets:
      ◮ OptSingle: finds the single-vertex target that maximizes the number of orientable edges after intervention
      ◮ OptUnb: finds the target that maximally reduces the clique size of the interventional essential graph; iterative application guarantees full identifiability with a minimum number of interventions

SLIDE 40

Conclusions

  • Causal models are not fully identifiable from observational data
  • Interventional data improves identifiability; the information gain depends on the intervention target
  • Two new active learning algorithms for proposing valuable intervention targets:
      ◮ OptSingle: finds the single-vertex target that maximizes the number of orientable edges after intervention
      ◮ OptUnb: finds the target that maximally reduces the clique size of the interventional essential graph; iterative application guarantees full identifiability with a minimum number of interventions
  • Both strategies lead to significantly faster identification of causal models than randomly chosen interventions

SLIDE 41

References

  • F. Eberhardt. Almost optimal intervention sets for causal discovery. In UAI, pages 161–168, 2008.
  • A. Hauser and P. Bühlmann. Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. JMLR, 13:2409–2464, 2012.
  • D. J. Rose. Triangulated graphs and the elimination process. Journal of Mathematical Analysis and Applications, 32(3):597–609, 1970.

Many thanks! (¡Muchas gracias!)
