Predicting perturbation effects in large-scale systems from observational data
Marloes Maathuis
Seminar für Statistik, ETH Zürich, Switzerland
Joint work with
Peter Bühlmann, Diego Colombo, Markus Kalisch
Research question
- In short: Can we learn perturbation effects without doing perturbation experiments?
- Concretely: Can we learn the gene regulatory network of yeast from observational data?
  - Predict perturbation effects between all pairs of genes
  - Identify pairs of genes between which there is a large effect
Why use observational data?
- Thousands of perturbation experiments are needed to estimate all perturbation effects ⇒ time-consuming and expensive
- Questions:
  - Does observational data provide some information on perturbation effects?
  - Can this information be used to guide and prioritize perturbation experiments?
Definition of perturbation effect
- Consider the effect of gene i on gene j. Let $X_i$ and $X_j$ be the expression levels of the genes.
- If we experimentally change $X_i$, what happens to $X_j$?
- Hypothetical experiment: genetically modified such that $X_i \approx a$, written $do(X_i = a)$, versus genetically modified such that $X_i \approx a + 1$, written $do(X_i = a + 1)$
- Perturbation effect of gene i on gene j:
$$E(X_j \mid do(X_i = a + 1)) - E(X_j \mid do(X_i = a))$$
(the value of a drops out if the system is linear)
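To see why the value of a drops out, here is a short worked step. It assumes, purely for illustration, that the interventional mean is linear in the intervened value; the intercept $c$ and slope $\beta_{ij}$ are notation introduced here and do not appear on the slides.

```latex
\begin{align*}
E(X_j \mid do(X_i = x)) &= c + \beta_{ij}\, x, \\
E(X_j \mid do(X_i = a + 1)) - E(X_j \mid do(X_i = a))
  &= \bigl(c + \beta_{ij}(a + 1)\bigr) - \bigl(c + \beta_{ij}\, a\bigr) = \beta_{ij}.
\end{align*}
```

Under this assumption the perturbation effect is the single number $\beta_{ij}$, whatever baseline a is chosen.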
Estimating perturbation effects from observational data
- It is easy to estimate associations from observational data. But association is not causation!
- Pearl (2003):
  - “An associational concept is any relationship that can be defined in terms of the joint distribution of observed variables.”
  - “A causal concept [such as a perturbation effect] is any relationship that cannot be defined from the distribution alone (...) Any claim invoking causal concepts must be traced to some premises that invoke such concepts; it cannot be inferred or derived from statistical associations alone.”
- An assumption that is often made: the data were generated by a known directed acyclic graph (DAG)
Directed acyclic graph (DAG)
[Figure: example DAG on the nodes $X_1$, $X_2$ and $X_3$]
- Nodes represent random variables; the edges (and the missing edges) encode conditional independence relationships
- The DAG encodes causal assumptions:
  - Edge $X_2 \to X_1$: $X_2$ may have a direct causal effect on $X_1$
  - No edge between $X_1$ and $X_3$: $X_1$ cannot have a direct causal effect on $X_3$ (but $X_1$ and $X_3$ will be correlated!)
Pearl’s intervention-calculus / do-calculus
- The perturbation effect of $X_1$ on $X_3$:
$$E(X_3 \mid do(X_1 = a + 1)) - E(X_3 \mid do(X_1 = a))$$
- The do-operator stands for a hypothetical experiment. So $E(X_3 \mid do(X_1 = a))$ is not the usual conditional expectation! In the example DAG:
  - $E(X_3 \mid X_1 = a) \neq E(X_3)$
  - $E(X_3 \mid do(X_1 = a)) = E(X_3)$
- Pearl’s do-calculus uses the DAG to write expressions involving the do-operator in terms of pre-intervention conditional distributions
- Summary: If the DAG is given, one can estimate perturbation effects (or causal effects) from observational data
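The gap between conditioning and intervening can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the structure X1 <- X2 -> X3, the coefficients 0.8 and 0.5, and unit-variance Gaussian noise are all hypothetical choices made here, not values from the talk.

```python
import random


def simulate(n=20000, do_x1=None, seed=0):
    """Draw n samples from a hypothetical linear Gaussian system X1 <- X2 -> X3.

    If do_x1 is given, X1 is set to that value by intervention, which
    removes the influence of X2 on X1 but leaves the mechanism for X3 untouched.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x2 = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)  # drawn even under intervention, so noise is shared
        e3 = rng.gauss(0.0, 1.0)
        x1 = 0.8 * x2 + e1 if do_x1 is None else do_x1
        x3 = 0.5 * x2 + e3  # X1 does not appear: no causal effect of X1 on X3
        samples.append((x1, x2, x3))
    return samples


def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


obs = simulate()
association = slope([s[0] for s in obs], [s[2] for s in obs])

# The same seed is used for both interventions, so both runs share their noise.
x3_do_a = mean([s[2] for s in simulate(do_x1=1.0)])
x3_do_a1 = mean([s[2] for s in simulate(do_x1=2.0)])
perturbation_effect = x3_do_a1 - x3_do_a

print(f"observational slope of X3 on X1: {association:.3f}")
print(f"perturbation effect of X1 on X3: {perturbation_effect:.3f}")
```

In this toy system the observational slope is clearly nonzero because X2 drives both variables, while the difference between the two interventional means vanishes.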
Main points in this talk
- Present IDA (Intervention calculus when the DAG is Absent)
  - Requires observational data
    - generated from an unknown DAG
    - multivariate Gaussian
    - no hidden confounders
    - potentially high-dimensional system
  - Returns (summary measures of) the estimated set of possible causal effects
  - Consistent in sparse high-dimensional settings
- Validation on yeast data
What to do when the DAG is unknown?
- A DAG encodes conditional independence relationships
- So given all conditional independence relationships of the data, can we infer the DAG?
- Almost... several DAGs can encode the same conditional independence relationships. They form an equivalence class, described by a CPDAG (completed partially directed acyclic graph).
- One can estimate this CPDAG, for example using the PC-algorithm of Peter Spirtes and Clark Glymour (Spirtes et al., 2000)
  - Fast implementation in the R-package pcalg
  - Consistent in sparse high-dimensional settings (Kalisch and Bühlmann, JMLR 2007)
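Why several DAGs can share the same conditional independence relationships can be checked on three nodes. The sketch below relies on the standard characterization (due to Verma and Pearl) that two DAGs are Markov equivalent exactly when they have the same skeleton and the same v-structures. The function names and the example graphs are illustrative choices, not part of pcalg.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a set of directed edges (parent, child)."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """All colliders a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


chain = {("X1", "X2"), ("X2", "X3")}      # X1 -> X2 -> X3
reverse = {("X3", "X2"), ("X2", "X1")}    # X1 <- X2 <- X3
fork = {("X2", "X1"), ("X2", "X3")}       # X1 <- X2 -> X3
collider = {("X1", "X2"), ("X3", "X2")}   # X1 -> X2 <- X3

print(markov_equivalent(chain, reverse))   # True
print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain, its reversal and the fork form one equivalence class, whose CPDAG is the undirected path X1 - X2 - X3; the collider is in a class of its own.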
IDA (oracle version)
[Diagram: oracle → (PC-algorithm) → CPDAG → DAG 1, DAG 2, ..., DAG m → (do-calculus) → effect 1, effect 2, ..., effect m → multi-set Θ]
The multi-set Θ
- Why multi-set instead of a unique value?
- Recall the quote of Pearl. We make “weak” causal assumptions:
  - The data are generated from an unknown DAG
  - There are no hidden confounders
- What information does Θ provide? Examples:
  - Θ = {1.5} ⇒ causal effect is 1.5
  - Θ = {1.5, 0.5, 3.1} ⇒ causal effect is positive
  - Θ = {1.5, 1.5, −1} ⇒ absolute value of causal effect ≥ 1
- Hence:
  - The true causal effect is always contained in Θ
  - The minimum absolute value of Θ is a lower bound on the size of the true causal effect
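The three example multi-sets above can be summarized mechanically. The following is a small sketch; the function name `summarize` and its output keys are made up for illustration.

```python
def summarize(theta):
    """Summaries of a multi-set of possible causal effects (given as a list)."""
    return {
        # Only one distinct value: the causal effect is identified.
        "unique": len(set(theta)) == 1,
        # All values share a sign: the direction of the effect is known.
        "sign_known": all(t > 0 for t in theta) or all(t < 0 for t in theta),
        # Lower bound on the size of the true causal effect.
        "min_abs": min(abs(t) for t in theta),
    }


for theta in ([1.5], [1.5, 0.5, 3.1], [1.5, 1.5, -1.0]):
    print(theta, summarize(theta))
```

The minimum absolute value is the summary used later for ranking: it is large only if every DAG in the equivalence class agrees that the effect is large.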
Scalability
- Finding all DAGs in an equivalence class is computationally very intensive
- Hence, the method only works for small graphs (< 15 nodes)
- Solution: local method
IDA (local oracle version)
[Diagram: oracle → (PC-algorithm) → CPDAG → (do-calculus) → effect 1, effect 2, ..., effect q → multi-set $\Theta^L$]
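One way the local step could avoid listing every DAG is to enumerate only the possible parent sets of the intervened node in the CPDAG. The sketch below assumes that an orientation of the undirected edges at a node is acceptable exactly when it creates no new v-structure with that node as the collider; the slides do not spell out the criterion, so treat this as an assumption. All names are hypothetical.

```python
from itertools import combinations


def local_parent_sets(parents, siblings, adjacent):
    """Enumerate candidate parent sets of one node in a CPDAG.

    parents:  nodes p with a directed edge p -> node in the CPDAG
    siblings: nodes s with an undirected edge s - node in the CPDAG
    adjacent: function (a, b) -> True if a and b are joined by any edge

    A subset S of the siblings is oriented into the node (the rest out of it).
    S is kept only if every member of S is adjacent to every other candidate
    parent, so that no new v-structure with the node as collider appears.
    """
    result = []
    sibs = sorted(siblings)
    for k in range(len(sibs) + 1):
        for subset in combinations(sibs, k):
            candidate = set(parents) | set(subset)
            ok = all(adjacent(s, other)
                     for s in subset for other in candidate if other != s)
            if ok:
                result.append(candidate)
    return result


# Hypothetical CPDAG: the undirected path X1 - X2 - X3 (X1 and X3 not adjacent).
undirected = {frozenset(("X1", "X2")), frozenset(("X2", "X3"))}


def adjacent(a, b):
    return frozenset((a, b)) in undirected


sets_for_x2 = local_parent_sets(parents=set(), siblings={"X1", "X3"},
                                adjacent=adjacent)
print(sets_for_x2)  # [set(), {'X1'}, {'X3'}]
```

For X2 in this example the parent set {X1, X3} is rejected because it would create the collider X1 -> X2 <- X3, leaving q = 3 candidate parent sets and hence three effects in the local multi-set.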
Comparison of $\Theta$ and $\Theta^L$
- Multiplicities of elements in $\Theta$ and $\Theta^L$ may differ
- Distinct elements of $\Theta$ and $\Theta^L$ are identical
- Example: $\Theta = \{1.5, 1.5, -1\}$, $\Theta^L = \{1.5, -1\}$
- Minimum absolute values of $\Theta^L$ and $\Theta$ are identical
Sample version
- In practice there is no oracle...
- Use the sample version of the PC-algorithm to obtain an estimated CPDAG $\hat{G}(\alpha)$, where $\alpha$ is a tuning parameter (the significance level of the conditional independence tests)
- Replace all causal effects by their estimated versions (least squares regression)
- Denote the results by $\hat\Theta(\alpha)$ and $\hat\Theta^L(\alpha)$
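The least squares step can be sketched as follows. The sketch assumes the usual adjustment for a linear Gaussian model: the effect of X_i on X_j is estimated by the coefficient of X_i when X_j is regressed on X_i and the parents of X_i. The simulated system, its coefficients and the helper names are hypothetical.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns plus an intercept.

    Solves the normal equations by Gauss-Jordan elimination with partial
    pivoting and returns [intercept, coef_1, ..., coef_k].
    """
    design = [[1.0] + [col[r] for col in columns] for r in range(len(y))]
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in design) for j in range(k)]
         + [sum(row[i] * yr for row, yr in zip(design, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def causal_effect(x_i, x_j, parent_columns):
    """Coefficient of X_i when regressing X_j on X_i and the parents of X_i."""
    return ols([x_i] + parent_columns, x_j)[1]


# Hypothetical system: X2 -> X1, X2 -> X3, X1 -> X3 with true effect 0.7.
rng = random.Random(1)
x1, x2, x3 = [], [], []
for _ in range(5000):
    v2 = rng.gauss(0, 1)
    v1 = 0.8 * v2 + rng.gauss(0, 1)
    v3 = 0.7 * v1 + 0.5 * v2 + rng.gauss(0, 1)
    x1.append(v1)
    x2.append(v2)
    x3.append(v3)

adjusted = causal_effect(x1, x3, [x2])   # parents of X1 are {X2}
unadjusted = causal_effect(x1, x3, [])   # ignores the common cause X2
print(f"adjusted: {adjusted:.2f}, unadjusted: {unadjusted:.2f}")
```

Adjusting for the parent X2 recovers a value close to the true effect 0.7, while the unadjusted regression is biased upward by the common cause. Repeating the regression for each candidate parent set gives one estimated effect per set.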
IDA (local sample version)
[Diagram: data → (PC-algorithm) → estimated CPDAG → (do-calculus) → estimated effect 1, effect 2, ..., effect q → multi-set $\hat\Theta^L$; requires tuning parameter $\alpha$]
High dimensional setting
- We allow the underlying graph to grow as n grows:
  - DAG $G = G_n$
  - Number of variables $p = p_n$
  - Distribution $P = P_n$
  - Causal sets $\Theta_{nij}$ and $\Theta^L_{nij}$ containing the effect of $X_{ni}$ on $X_{nj}$
- Assume:
  - $P_n$ is multivariate Gaussian and faithful to the true unknown causal DAG $G_n$
  - High-dimensionality and sparseness:
    - $p_n = O(n^a)$ for some $0 \le a < \infty$
    - Maximum number of neighbors in $G_n$ is of order $O(n^{1-b})$ for some $0 < b \le 1$
  - Some regularity conditions on partial correlations and conditional variances
Consistency in high dimensional setting
- Uniform consistency of $\hat\Theta_{nij}$ and $\hat\Theta^L_{nij}$: there exists a sequence $\alpha_n$ such that
$$\sup_{i \neq j \in \{1,\dots,p_n\}} d\bigl(\hat\Theta_{nij}(\alpha_n), \Theta_{nij}\bigr) \xrightarrow{p} 0 \quad \text{as } n \to \infty,$$
$$\sup_{i \neq j \in \{1,\dots,p_n\}} d\bigl(\hat\Theta^L_{nij}(\alpha_n), \Theta^L_{nij}\bigr) \xrightarrow{p} 0 \quad \text{as } n \to \infty.$$
- Corollary: the minimum absolute value of $\Theta_{nij}$ can be consistently estimated by the local algorithm, i.e., there exists a sequence $\alpha_n$ such that
$$\sup_{i \neq j \in \{1,\dots,p_n\}} \Bigl| \min\{|\hat\theta| : \hat\theta \in \hat\Theta^L_{nij}(\alpha_n)\} - \min\{|\theta| : \theta \in \Theta_{nij}\} \Bigr| \xrightarrow{p} 0.$$
Validation: overview
- Complex system → experimental data → compute and rank causal effects
- Complex system → observational data → apply IDA and rank the effects
- Compare the two rankings
Validation: data and methods
- Yeast gene expression data (Hughes et al., Cell 2000):
  - Experimental data: expression levels of 5361 genes for 234 single-gene deletion strains
  - Observational data: expression levels of 5361 genes for 63 wild-type cultures
- Experimental data:
  - Compute causal effects of the knock-out genes on all remaining genes (234 × 5360 ≈ 1.25 million effects)
- Observational data:
  - Apply IDA
  - Apply other methods: random guessing, Lasso and Elastic-net
Evaluation: comparing the rankings
- For the effects based on experimental data:
  - Define the largest 10% as the target set
- For the effects based on the observational data (for 4 methods):
  - Rank them and take the top q effects
  - Compute the number of true positives: effects in the target set
  - Compute the number of false positives: effects not in the target set
- Create ROC curve
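The counting behind each point of the curve can be sketched in a few lines. The gene names and rankings below are toy data made up for illustration, not the yeast results, and `roc_points` is a hypothetical helper.

```python
def roc_points(target_set, ranked_effects, qs):
    """Count false and true positives among the top q ranked effects.

    target_set:     set of (i, j) gene pairs with the largest experimental effects
    ranked_effects: list of (i, j) pairs, sorted from strongest to weakest prediction
    qs:             cut-offs q at which the ranking is evaluated
    """
    points = []
    for q in qs:
        top = ranked_effects[:q]
        tp = sum(1 for pair in top if pair in target_set)
        fp = len(top) - tp
        points.append((fp, tp))
    return points


# Toy data (illustrative only).
target = {("g1", "g7"), ("g2", "g5"), ("g3", "g9")}
ranking = [("g1", "g7"), ("g4", "g8"), ("g2", "g5"), ("g6", "g2"), ("g3", "g9")]
print(roc_points(target, ranking, qs=[1, 3, 5]))  # [(0, 1), (1, 2), (2, 3)]
```

Each pair is one point of the curve, plotted in counts (false positives on the x-axis, true positives on the y-axis) rather than in rates.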
ROC curve
[Figure: ROC curve; x-axis: false positives (0 to 4000), y-axis: true positives (0 to 1000)]
Consider the top q = 1000 effects:
Method            TP    FP
Random guessing   100   900
Lasso / E-net     130   870
IDA               425   575
Summary
- Problem:
  - Learning perturbation effects without doing perturbation experiments
- Existing work:
  - Estimating causal effects when the DAG is known (do-calculus)
  - Estimating the equivalence class of DAGs (PC-algorithm)
- New contributions:
  - Put these two pieces together to estimate sets of possible perturbation effects (Θ)
  - Fast local method that correctly finds the distinct values of Θ
  - Consistency in sparse high-dimensional settings
  - Validation of the method on yeast data
  - New computational tool for the design of experiments
- Current work: allow for hidden variables
References
- Main references:
  - Maathuis, Kalisch and Bühlmann (2009). “Estimating high-dimensional intervention effects from observational data”. Annals of Statistics 37, 3133-3164.
  - Maathuis, Colombo, Kalisch and Bühlmann (2010). “Predicting causal effects in large-scale systems from observational data”. Nature Methods 7, 247-248.
  - R-package pcalg (available on CRAN)
- Contact info:
  - http://www.stat.math.ethz.ch/~maathuis
  - maathuis@stat.math.ethz.ch