Predicting perturbation effects in large-scale systems from observational data
Marloes Maathuis
Seminar für Statistik, ETH Zürich, Switzerland

Joint work with Peter Bühlmann, Diego Colombo, and Markus Kalisch

Research question
• In short: Can we learn perturbation effects without doing perturbation experiments?
• Concretely: Can we learn the gene regulatory network of yeast from observational data?
  • Predict perturbation effects between all pairs of genes
  • Identify pairs of genes between which there is a large effect

Why use observational data?
• Thousands of perturbation experiments are needed to estimate all perturbation effects ⇒ time consuming and expensive
• Questions:
  • Do observational data provide some information on perturbation effects?
  • Can this information be used to guide and prioritize perturbation experiments?

Definition of perturbation effect
• Consider the effect of gene $i$ on gene $j$. Let $X_i$ and $X_j$ be the expression levels of the genes.
• If we experimentally change $X_i$, what happens to $X_j$?
• Hypothetical experiment: compare an organism genetically modified such that $X_i \approx a$ with one genetically modified such that $X_i \approx a + 1$. In do-notation: $\mathrm{do}(X_i = a)$ versus $\mathrm{do}(X_i = a + 1)$.
• Perturbation effect of gene $i$ on gene $j$: $E(X_j \mid \mathrm{do}(X_i = a + 1)) - E(X_j \mid \mathrm{do}(X_i = a))$ (the value of $a$ drops out if the system is linear)
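
A short worked step may help show why the value of a drops out. It assumes, purely for illustration, that the interventional mean is linear in the intervention value, with a hypothetical intercept $c_{ij}$ and slope $\beta_{ij}$ that do not appear on the slides:

```latex
\begin{align*}
E(X_j \mid \mathrm{do}(X_i = a)) &= c_{ij} + \beta_{ij}\, a, \\
E(X_j \mid \mathrm{do}(X_i = a + 1)) - E(X_j \mid \mathrm{do}(X_i = a))
  &= \bigl(c_{ij} + \beta_{ij}(a + 1)\bigr) - \bigl(c_{ij} + \beta_{ij}\, a\bigr)
   = \beta_{ij}.
\end{align*}
```

Under this assumption the perturbation effect is the single number $\beta_{ij}$, whatever baseline value a is chosen.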

Estimating perturbation effects from observational data
• It is easy to estimate associations from observational data. But association is not causation!
• Pearl (2003):
  • “An associational concept is any relationship that can be defined in terms of the joint distribution of observed variables.”
  • “A causal concept [such as a perturbation effect] is any relationship that cannot be defined from the distribution alone (...) Any claim invoking causal concepts must be traced to some premises that invoke such concepts; it cannot be inferred or derived from statistical associations alone.”
• An assumption that is often made: data were generated by a known directed acyclic graph (DAG)

Directed acyclic graph (DAG)
[Figure: DAG on the nodes $X_1$, $X_2$, $X_3$]
• Nodes represent random variables and edges represent conditional independence relationships
• The DAG encodes causal assumptions:
  • Edge $X_2 \to X_1$: $X_2$ may have a direct causal effect on $X_1$
  • No edge $X_1 \to X_3$: $X_1$ cannot have a direct causal effect on $X_3$ (but $X_1$ and $X_3$ will be correlated!)

Pearl’s intervention-calculus / do-calculus
[Figure: DAG on the nodes $X_1$, $X_2$, $X_3$]
• The perturbation effect of $X_1$ on $X_3$: $E(X_3 \mid \mathrm{do}(X_1 = a + 1)) - E(X_3 \mid \mathrm{do}(X_1 = a))$
• The do-operator stands for a hypothetical experiment. So $E(X_3 \mid \mathrm{do}(X_1 = a))$ is not the usual conditional expectation! In the example:
  • $E(X_3 \mid X_1 = a) \neq E(X_3)$
  • $E(X_3 \mid \mathrm{do}(X_1 = a)) = E(X_3)$
• Pearl’s do-calculus uses the DAG to write expressions involving the do-operator in terms of pre-intervention conditional distributions
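
The gap between conditioning and intervening can be checked in a small simulation. This is only a sketch under assumptions that are not on the slides: the figure is taken to show $X_2 \to X_1$ and $X_2 \to X_3$, the model is linear Gaussian, and the coefficients 0.8 and 1.5 are made up. The slope of $X_3$ on $X_1$ alone measures association; the coefficient of $X_1$ after adjusting for its parent $X_2$ is the quantity that matches the do-expression in this DAG.

```python
import random


def simulate(n=20000, seed=0):
    """Draw n samples from a linear Gaussian model with X2 -> X1 and X2 -> X3."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x2 = rng.gauss(0, 1)
        x1 = 0.8 * x2 + rng.gauss(0, 1)  # X1 depends on X2 only (made-up coefficient)
        x3 = 1.5 * x2 + rng.gauss(0, 1)  # X3 depends on X2 only, never on X1
        rows.append((x1, x2, x3))
    return rows


def cov(u, v):
    """Sample covariance of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def slopes(rows):
    """Return (association slope, slope adjusted for X2) of X3 on X1."""
    x1, x2, x3 = (list(col) for col in zip(*rows))
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s13, s23 = cov(x1, x3), cov(x2, x3)
    # Association: simple regression of X3 on X1.
    assoc = s13 / s11
    # Adjustment: coefficient of X1 in the regression of X3 on (X1, X2).
    adjusted = (s13 * s22 - s12 * s23) / (s11 * s22 - s12 ** 2)
    return assoc, adjusted


assoc, adjusted = slopes(simulate())
print(f"association slope: {assoc:.3f}, adjusted slope: {adjusted:.3f}")
```

In this setup the association slope is clearly nonzero, while the adjusted slope is close to zero, in line with $E(X_3 \mid \mathrm{do}(X_1 = a)) = E(X_3)$.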

Pearl’s intervention-calculus / do-calculus
[Figure: DAG on the nodes $X_1$, $X_2$, $X_3$]
• Summary: If the DAG is given, one can estimate perturbation effects (or causal effects) from observational data

Main points in this talk
• Present IDA (Intervention calculus when the DAG is Absent)
• Requires observational data
  • generated from an unknown DAG
  • multivariate Gaussian
  • no hidden confounders
  • potentially high-dimensional system
• Returns (summary measures of) estimated set of possible causal effects
• Consistent in sparse high-dimensional settings
• Validation on yeast data

What to do when the DAG is unknown?
• A DAG encodes conditional independence relationships
• So given all conditional independence relationships of the data, can we infer the DAG?
• Almost... several DAGs can encode the same conditional independence relationships. They form an equivalence class, described by a CPDAG.
• One can estimate this CPDAG, for example using the PC-algorithm of Peter Spirtes and Clark Glymour (Spirtes et al., 2000)
  • Fast implementation in the R-package pcalg
  • Consistent in sparse high-dimensional settings (Kalisch and Bühlmann, JMLR 2007)
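
To make the equivalence class concrete, the sketch below tests whether two small DAGs encode the same conditional independence relationships. It relies on a standard graphical criterion that the slides do not state: two DAGs are equivalent when they share the same skeleton and the same v-structures (colliders whose two parents are not adjacent). The function names and the three-node examples are hypothetical, and this is not the PC-algorithm or the pcalg API.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(dag1, dag2):
    """Same skeleton and same v-structures (DAGs assumed to share their nodes)."""
    return (skeleton(dag1) == skeleton(dag2)
            and v_structures(dag1) == v_structures(dag2))


chain = [("X1", "X2"), ("X2", "X3")]          # X1 -> X2 -> X3
reverse_chain = [("X3", "X2"), ("X2", "X1")]  # X1 <- X2 <- X3
fork = [("X2", "X1"), ("X2", "X3")]           # X1 <- X2 -> X3
collider = [("X1", "X2"), ("X3", "X2")]       # X1 -> X2 <- X3

print(markov_equivalent(chain, fork))      # same equivalence class
print(markov_equivalent(chain, collider))  # different equivalence class
```

The chain, the reversed chain, and the fork all fall into one class, which is why observational data alone cannot orient their edges; the collider stands apart.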

IDA (oracle version)
[Diagram: oracle → (PC-algorithm) → CPDAG → DAG 1, DAG 2, ..., DAG m → (do-calculus) → effect 1, effect 2, ..., effect m → multi-set Θ]
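
The pipeline in the diagram can be traced by hand on a toy case. The sketch below is an illustration under several assumptions that are not on the slide: the CPDAG is the undirected chain X1 - X2 - X3, whose class contains three DAGs; the covariance matrix S is a made-up population covariance; and the effect of $X_i$ on $X_j$ within one DAG is computed by regressing $X_j$ on $X_i$ and the parents of $X_i$ (a back-door adjustment for linear Gaussian models), with effect 0 when $X_j$ is itself a parent of $X_i$. The helper handles at most one parent to stay short.

```python
def effect(S, i, j, parents_of_i):
    """Effect of X_i on X_j in one DAG, from covariance S and the parents of X_i.

    Illustrative helper (not a library function); supports zero or one parent.
    """
    if j in parents_of_i:
        return 0.0  # X_j is a cause of X_i, so X_i has no effect on X_j
    if not parents_of_i:
        return S[i][j] / S[i][i]  # simple regression of X_j on X_i
    (k,) = parents_of_i  # exactly one parent X_k
    # Coefficient of X_i in the regression of X_j on (X_i, X_k).
    return ((S[i][j] * S[k][k] - S[i][k] * S[k][j])
            / (S[i][i] * S[k][k] - S[i][k] ** 2))


# Made-up covariance of (X1, X2, X3); indices 0, 1, 2.
S = [
    [1.0, 0.5, 1.0],
    [0.5, 1.25, 2.5],
    [1.0, 2.5, 6.0],
]

# Parents of X2 in each DAG of the class of X1 - X2 - X3:
#   X1 -> X2 -> X3 : {X1};  X1 <- X2 -> X3 : {};  X1 <- X2 <- X3 : {X3}
parent_sets = [{0}, set(), {2}]

# Multi-set of possible effects of X2 on X3, one value per DAG.
theta = [effect(S, 1, 2, pa) for pa in parent_sets]
print(theta)
```

Two of the three DAGs agree on the effect and the third gives zero, so the output is a multi-set with a repeated value rather than a single number.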

The multi-set Θ
• Why a multi-set instead of a unique value?
• Recall the quote of Pearl. We make “weak” causal assumptions:
  • The data are generated from an unknown DAG
  • There are no hidden confounders
• What information does Θ provide? Examples:
  • Θ = {1.5} ⇒ causal effect is 1.5
  • Θ = {1.5, 0.5, 3.1} ⇒ causal effect is positive
  • Θ = {1.5, 1.5, −1} ⇒ absolute value of causal effect ≥ 1
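
The three readings of Θ above can be computed mechanically. The following is a minimal sketch; the function name and the returned keys are hypothetical, chosen only to mirror the examples on the slide.

```python
def summarize(theta):
    """Summarize a multi-set of possible causal effects (illustrative helper)."""
    values = list(theta)
    if all(v > 0 for v in values):
        sign = "positive"
    elif all(v < 0 for v in values):
        sign = "negative"
    else:
        sign = "undetermined"
    return {
        # A unique effect is identified only if all values coincide.
        "unique_effect": values[0] if len(set(values)) == 1 else None,
        "sign": sign,
        # Lower bound on the absolute value of the true effect.
        "min_abs_effect": min(abs(v) for v in values),
    }


for theta in ([1.5], [1.5, 0.5, 3.1], [1.5, 1.5, -1]):
    print(theta, summarize(theta))
```

The minimum absolute value is a conservative summary: it is large only when every DAG in the equivalence class agrees that the effect is large.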
