SLIDE 1

Predicting perturbation effects in large-scale systems from observational data

Marloes Maathuis
Seminar für Statistik, ETH Zürich, Switzerland

SLIDE 2

Joint work with

Marloes Maathuis, ETH Zürich 2 / 29

Peter Bühlmann, Diego Colombo, Markus Kalisch

SLIDE 4

Research question

  • In short: Can we learn perturbation effects without doing perturbation experiments?
  • Concretely: Can we learn the gene regulatory network of yeast from observational data?
    • Predict perturbation effects between all pairs of genes
    • Identify pairs of genes between which there is a large effect

SLIDE 6

Why use observational data?

  • Thousands of perturbation experiments are needed to estimate all perturbation effects ⇒ time consuming and expensive
  • Questions:
    • Do observational data provide some information on perturbation effects?
    • Can this information be used to guide and prioritize perturbation experiments?

SLIDE 10

Definition of perturbation effect

  • Consider the effect of gene i on gene j. Let $X_i$ and $X_j$ be the expression levels of the genes.
  • If we experimentally change $X_i$, what happens to $X_j$?
  • Hypothetical experiment: genetically modified such that $X_i \approx a$, versus genetically modified such that $X_i \approx a + 1$. In do-notation: $do(X_i = a)$ and $do(X_i = a + 1)$.
  • Perturbation effect of gene i on gene j:

$$E(X_j \mid do(X_i = a + 1)) - E(X_j \mid do(X_i = a))$$

(the value of $a$ drops out if the system is linear)
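
A short worked case may help to see why the value of $a$ drops out. It assumes, purely for illustration, that the post-intervention mean is linear in the intervention value; the intercept $\mu_j$ and slope $\beta_{ij}$ are hypothetical symbols that do not appear on the slide.

```latex
\begin{align*}
E(X_j \mid do(X_i = x)) &= \mu_j + \beta_{ij}\, x, \\
E(X_j \mid do(X_i = a + 1)) - E(X_j \mid do(X_i = a))
  &= \bigl(\mu_j + \beta_{ij}(a + 1)\bigr) - \bigl(\mu_j + \beta_{ij}\, a\bigr)
   = \beta_{ij}.
\end{align*}
```

Under this assumption the perturbation effect is the single number $\beta_{ij}$, whatever the value of $a$.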

SLIDE 13

Estimating perturbation effects from observational data

  • It is easy to estimate associations from observational data. But association is not causation!
  • Pearl (2003):
    • “An associational concept is any relationship that can be defined in terms of the joint distribution of observed variables.”
    • “A causal concept [such as a perturbation effect] is any relationship that cannot be defined from the distribution alone (...) Any claim invoking causal concepts must be traced to some premises that invoke such concepts; it cannot be inferred or derived from statistical associations alone.”
  • An assumption that is often made: the data were generated by a known directed acyclic graph (DAG)

SLIDE 14

Directed acyclic graph (DAG)

[Figure: DAG on the nodes $X_1$, $X_2$, $X_3$]

  • Nodes represent random variables and edges represent conditional independence relationships
  • The DAG encodes causal assumptions:
    • Edge $X_2 \to X_1$: $X_2$ may have a direct causal effect on $X_1$
    • No edge $X_1 \to X_3$: $X_1$ cannot have a direct causal effect on $X_3$ (but $X_1$ and $X_3$ will be correlated!)

SLIDE 17

Pearl’s intervention-calculus / do-calculus

[Figure: DAG on the nodes $X_1$, $X_2$, $X_3$]

  • The perturbation effect of $X_1$ on $X_3$:

$$E(X_3 \mid do(X_1 = a + 1)) - E(X_3 \mid do(X_1 = a))$$

  • The do-operator stands for a hypothetical experiment. So $E(X_3 \mid do(X_1 = a))$ is not the usual conditional expectation! In the example:
    • $E(X_3 \mid X_1 = a) \neq E(X_3)$
    • $E(X_3 \mid do(X_1 = a)) = E(X_3)$
  • Pearl’s do-calculus uses the DAG to write expressions involving the do-operator in terms of pre-intervention conditional distributions
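
The gap between association and intervention can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a linear Gaussian system in which $X_2$ is a common cause of $X_1$ and $X_3$ (the edge $X_2 \to X_3$ and the coefficients 0.8 and 1.5 are made up for the example), and it uses the standard result that, in such a system, the causal effect of $X_1$ on $X_3$ is the coefficient of $X_1$ when $X_3$ is regressed on $X_1$ and the parents of $X_1$. The helper names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols_coef_adjusted(x, z, y):
    """Coefficient of x in the least-squares regression of y on x and z."""
    n = len(x)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    xc = [a - mx for a in x]
    zc = [a - mz for a in z]
    yc = [a - my for a in y]
    sxx = sum(a * a for a in xc)
    szz = sum(a * a for a in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    szy = sum(a * b for a, b in zip(zc, yc))
    # Solve the 2x2 normal equations for the coefficient of x (Cramer's rule).
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)

rng = random.Random(0)
n = 20000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * v + rng.gauss(0, 1) for v in x2]  # X2 -> X1
x3 = [1.5 * v + rng.gauss(0, 1) for v in x2]  # X2 -> X3 (assumed for this sketch)

naive = ols_slope(x1, x3)                 # association between X1 and X3
adjusted = ols_coef_adjusted(x1, x2, x3)  # adjusts for X2, the parent of X1
print(f"association (X3 on X1):       {naive:.2f}")
print(f"adjusted for parent X2 of X1: {adjusted:.2f}")
```

With these made-up coefficients the association should come out near 1.2 / 1.64 ≈ 0.73, while the adjusted coefficient should be close to 0, in line with $E(X_3 \mid do(X_1 = a)) = E(X_3)$.
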

SLIDE 18

Pearl’s intervention-calculus / do-calculus

[Figure: DAG on the nodes $X_1$, $X_2$, $X_3$]

  • Summary: If the DAG is given, one can estimate perturbation effects (or causal effects) from observational data

SLIDE 19

Main points in this talk

  • Present IDA (Intervention calculus when the DAG is Absent)
  • Requires observational data
    • generated from an unknown DAG
    • multivariate Gaussian
    • no hidden confounders
    • potentially high-dimensional system
  • Returns (summary measures of) estimated set of possible causal effects
  • Consistent in sparse high-dimensional settings
  • Validation on yeast data

SLIDE 23

What to do when the DAG is unknown?

  • A DAG encodes conditional independence relationships
  • So given all conditional independence relationships of the data, can we infer the DAG?
  • Almost... several DAGs can encode the same conditional independence relationships. They form an equivalence class, described by a CPDAG (completed partially directed acyclic graph).
  • One can estimate this CPDAG, for example using the PC-algorithm of Peter Spirtes and Clark Glymour (Spirtes et al., 2000)
    • Fast implementation in the R-package pcalg
    • Consistent in sparse high-dimensional settings (Kalisch and Bühlmann, JMLR 2007)
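
To make the equivalence class concrete, the sketch below checks whether two small DAGs are Markov equivalent. It relies on the known characterization that two DAGs encode the same conditional independence relationships exactly when they share the same skeleton and the same v-structures. The function names and the three-node graphs are illustrative assumptions; this is not the PC-algorithm and not part of pcalg.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a DAG given as a list of (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}

def v_structures(edges):
    """All triples (p, child, q) with p -> child <- q and p, q not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for p, q in combinations(sorted(pars), 2):
            if frozenset((p, q)) not in skel:
                found.add((p, child, q))
    return found

def markov_equivalent(edges_a, edges_b):
    """Same skeleton and same v-structures."""
    return (skeleton(edges_a) == skeleton(edges_b)
            and v_structures(edges_a) == v_structures(edges_b))

chain = [("X", "Y"), ("Y", "Z")]     # X -> Y -> Z
fork = [("Y", "X"), ("Y", "Z")]      # X <- Y -> Z
collider = [("X", "Y"), ("Z", "Y")]  # X -> Y <- Z

print(markov_equivalent(chain, fork))      # same equivalence class
print(markov_equivalent(chain, collider))  # different equivalence class
```

The chain and the fork cannot be told apart from conditional independence relationships alone, which is why observational data identify a CPDAG rather than a single DAG.
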

SLIDE 24

IDA (oracle version)

[Diagram: oracle → PC-algorithm → CPDAG → DAG 1, DAG 2, ..., DAG m → do-calculus → effect 1, effect 2, ..., effect m → multi-set $\Theta$]

SLIDE 28

The multi-set $\Theta$

  • Why a multi-set instead of a unique value?
  • Recall the quote of Pearl. We make only “weak” causal assumptions, so the DAG is identified only up to its equivalence class:
    • The data are generated from an unknown DAG
    • There are no hidden confounders
  • What information does $\Theta$ provide? Examples:
    • $\Theta = \{1.5\}$ ⇒ causal effect is 1.5
    • $\Theta = \{1.5, 0.5, 3.1\}$ ⇒ causal effect is positive
    • $\Theta = \{1.5, 1.5, -1\}$ ⇒ absolute value of causal effect ≥ 1
  • Hence:
    • The true causal effect is always contained in $\Theta$
    • The minimum absolute value of $\Theta$ is a lower bound on the size of the true causal effect
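
The three examples can be turned into a small summary routine. This is only a sketch: `summarize` and its output keys are hypothetical names, and the multi-sets are the ones from the slide.

```python
def summarize(theta):
    """Summaries of a multi-set of possible causal effects."""
    min_abs = min(abs(t) for t in theta)  # lower bound on the effect size
    if all(t > 0 for t in theta):
        sign = "positive"
    elif all(t < 0 for t in theta):
        sign = "negative"
    else:
        sign = "undetermined"
    identified = len(set(theta)) == 1     # a single distinct value
    return {"min_abs": min_abs, "sign": sign, "identified": identified}

for theta in ([1.5], [1.5, 0.5, 3.1], [1.5, 1.5, -1]):
    print(theta, summarize(theta))
```

Even when the sign is undetermined, as in the third example, the minimum absolute value still gives a usable lower bound.
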
SLIDE 29

Scalability

  • Finding all DAGs in an equivalence class is computationally very intensive
  • Hence, the method only works for small graphs (< 15 nodes)
  • Solution: local method

SLIDE 31

IDA (local oracle version)

[Diagram: oracle → PC-algorithm → CPDAG → do-calculus → effect 1, effect 2, ..., effect q → multi-set $\Theta^L$]

SLIDE 33

Comparison of $\Theta$ and $\Theta^L$

  • Multiplicities of elements in $\Theta$ and $\Theta^L$ may differ
  • Distinct elements of $\Theta$ and $\Theta^L$ are identical
  • Example: $\Theta = \{1.5, 1.5, -1\}$, $\Theta^L = \{1.5, -1\}$
  • Minimum absolute values of $\Theta^L$ and $\Theta$ are identical
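
The three statements can be checked directly on the example. A minimal sketch, with variable names chosen for illustration:

```python
from collections import Counter

theta = [1.5, 1.5, -1]   # global multi-set from the example
theta_local = [1.5, -1]  # local multi-set from the example

same_distinct = set(theta) == set(theta_local)
same_min_abs = min(map(abs, theta)) == min(map(abs, theta_local))
same_multiplicities = Counter(theta) == Counter(theta_local)
print(same_distinct, same_min_abs, same_multiplicities)  # prints: True True False
```
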
SLIDE 34

Sample version

  • In practice there is no oracle...
  • Use the sample version of the PC-algorithm to obtain an estimated CPDAG $\hat{G}(\alpha)$, where $\alpha$ is a tuning parameter
  • Replace all causal effects by their estimated versions (least squares regression)
  • Denote the results by $\hat{\Theta}(\alpha)$ and $\hat{\Theta}^L(\alpha)$
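
The slide does not spell out how the tuning parameter enters. In the Gaussian setting a common choice, and the one used when pcalg's `pc` function is run with `gaussCItest`, is to take $\alpha$ as the significance level of tests for zero partial correlation. The sketch below shows one such test based on Fisher's z-transform; the function name and the numbers are illustrative assumptions, not taken from the talk.

```python
from math import atanh, sqrt
from statistics import NormalDist

def judged_independent(r, n, k, alpha):
    """Fisher z-test of zero partial correlation.

    r: sample partial correlation, n: sample size,
    k: size of the conditioning set, alpha: significance level.
    Returns True if the test does not reject independence.
    """
    stat = sqrt(n - k - 3) * abs(atanh(r))
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    return stat <= cutoff

# Same data, two values of alpha (made-up numbers).
print(judged_independent(r=0.2, n=100, k=1, alpha=0.05))
print(judged_independent(r=0.2, n=100, k=1, alpha=0.01))
```

A smaller $\alpha$ makes the test declare independence more readily, so more edges are removed and the estimated CPDAG becomes sparser.
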

SLIDE 37

IDA (local sample version)

[Diagram: data → PC-algorithm → CPDAG → do-calculus → effect 1, effect 2, ..., effect q → multi-set $\Theta^L$; requires tuning parameter $\alpha$]

SLIDE 39

High dimensional setting

  • We allow the underlying graph to grow as n grows:
    • DAG $G = G_n$
    • Number of variables $p = p_n$
    • Distribution $P = P_n$
    • Causal sets $\Theta_{nij}$ and $\Theta^L_{nij}$ containing the effect of $X_{ni}$ on $X_{nj}$
  • Assume:
    • $P_n$ is multivariate Gaussian and faithful to the true unknown causal DAG $G_n$
    • High-dimensionality and sparseness:
      • $p_n = O(n^a)$, for some $0 \le a < \infty$
      • Maximum number of neighbors in $G_n$ is of order $O(n^{1-b})$, for some $0 < b \le 1$
    • Some regularity conditions on partial correlations and conditional variances

SLIDE 41

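Consistency in high dimensional setting

  • Uniform consistency of $\hat{\Theta}_{nij}$ and $\hat{\Theta}^L_{nij}$: there exists a sequence $\alpha_n$ such that

$$\sup_{i \neq j \in \{1,\dots,p_n\}} d\bigl(\hat{\Theta}_{nij}(\alpha_n), \Theta_{nij}\bigr) \to_p 0 \quad \text{as } n \to \infty,$$

$$\sup_{i \neq j \in \{1,\dots,p_n\}} d\bigl(\hat{\Theta}^L_{nij}(\alpha_n), \Theta^L_{nij}\bigr) \to_p 0 \quad \text{as } n \to \infty.$$

  • Corollary: the minimum absolute value of $\Theta_{nij}$ can be consistently estimated by the local algorithm, i.e., there exists a sequence $\alpha_n$ such that

$$\sup_{i \neq j \in \{1,\dots,p_n\}} \Bigl| \min\{|\hat{\theta}| : \hat{\theta} \in \hat{\Theta}^L_{nij}(\alpha_n)\} - \min\{|\theta| : \theta \in \Theta_{nij}\} \Bigr| \to_p 0.$$
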
SLIDE 42

Validation: overview

[Diagram: a complex system yields experimental data and observational data. From the experimental data: compute and rank causal effects. From the observational data: apply IDA and rank the effects. Then compare the rankings.]

SLIDE 45

Validation: data and methods

  • Yeast gene expression data (Hughes et al., Cell 2000):
    • Experimental data: expression levels of 5361 genes for 234 single gene deletion strains
    • Observational data: expression levels of 5361 genes for 63 wild-type cultures
  • Experimental data:
    • Compute causal effects of the knock-out genes on all remaining genes (234 × 5360 ≈ 1.25 million effects)
  • Observational data:
    • Apply IDA
    • Apply other methods: random guessing, Lasso and Elastic-net

SLIDE 48

Evaluation: comparing the rankings

  • For the effects based on experimental data:
    • Define the largest 10% as the target set
  • For the effects based on the observational data (for 4 methods):
    • Rank them and take the top q effects
    • Compute the number of true positives: effects in the target set
    • Compute the number of false positives: effects not in the target set
  • Create ROC curve
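
The counting step can be sketched as follows. The function name, the gene labels and the numbers are made up for illustration, and ranking by absolute effect size is an assumption about what "largest" means here.

```python
def top_q_counts(estimated, target, q):
    """Count true and false positives among the q largest estimated effects.

    estimated: dict mapping (i, j) gene pairs to an estimated effect.
    target: set of (i, j) pairs forming the target set.
    """
    ranked = sorted(estimated, key=lambda pair: abs(estimated[pair]), reverse=True)
    top = ranked[:q]
    true_pos = sum(1 for pair in top if pair in target)
    return true_pos, len(top) - true_pos

# Toy illustration with made-up numbers (not the yeast data).
estimated = {("g1", "g2"): 2.0, ("g1", "g3"): -1.2,
             ("g2", "g3"): 0.1, ("g3", "g1"): 0.7}
target = {("g1", "g2"), ("g3", "g1")}

for q in (1, 2, 3, 4):
    print(q, top_q_counts(estimated, target, q))
```

Plotting the true positives against the false positives as q increases traces out the ROC-type curve described above.
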
SLIDE 52

ROC curve

[Figure: ROC curve; x-axis: false positives (up to 4000), y-axis: true positives (up to 1000)]

Consider the top q = 1000 effects:

Method            TP    FP
Random guessing   100   900
Lasso / E-net     130   870
IDA               425   575
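
A few lines of arithmetic put these counts in perspective. Since the target set is the largest 10% of the experimental effects, random guessing is expected to hit about 100 of the top 1000, which matches the reported count. The variable names below are illustrative; the counts are the ones reported above.

```python
# TP / FP counts among the top q = 1000 effects, as reported in the talk.
results = {
    "Random guessing": (100, 900),
    "Lasso / E-net": (130, 870),
    "IDA": (425, 575),
}
baseline_tp = results["Random guessing"][0]

for method, (tp, fp) in results.items():
    tp_share = tp / (tp + fp)  # share of the top q that lies in the target set
    gain = tp / baseline_tp    # improvement over random guessing
    print(f"{method:16s} TP share = {tp_share:.3f}, x{gain:.2f} vs random")
```
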

SLIDE 56

Summary

  • Problem:
    • Learning perturbation effects without doing perturbation experiments
  • Existing work:
    • Estimating causal effects when the DAG is known (do-calculus)
    • Estimating the equivalence class of DAGs (PC-algorithm)
  • New contributions:
    • Put these two pieces together to estimate sets of possible perturbation effects ($\Theta$)
    • Fast local method that correctly finds the distinct values of $\Theta$
    • Consistency in sparse high-dimensional settings
    • Validation of the method on yeast data
    • New computational tool for the design of experiments
  • Current work: allow for hidden variables

SLIDE 58

References

  • Main references:
    • MAATHUIS, KALISCH AND BÜHLMANN (2009). “Estimating high-dimensional intervention effects from observational data”. Annals of Statistics 37, 3133-3164.
    • MAATHUIS, COLOMBO, KALISCH AND BÜHLMANN (2010). “Predicting causal effects in large-scale systems from observational data”. Nature Methods 7, 247-248.
    • R-package pcalg (available on CRAN)
  • Contact info:
    • http://www.stat.math.ethz.ch/~maathuis
    • maathuis@stat.math.ethz.ch

Thanks!