Causal Models for Scientific Discovery: Research Challenges and Opportunities (PowerPoint PPT Presentation)

David Jensen, College of Information and Computer Sciences, Computational Social Science Institute, Center for Data Science, University of Massachusetts Amherst


SLIDE 1

Causal Models for Scientific Discovery


Research Challenges and Opportunities

David Jensen


College of Information and Computer Sciences
Computational Social Science Institute
Center for Data Science
University of Massachusetts Amherst
Symposium on Accelerating Science
18 November 2016

SLIDE 2

Sources: The Guardian, July 2005; Wallace Kirkland, for Time

SLIDE 3

Sources: Wikipedia (pile); Argonne National Laboratory (Fermi)

SLIDE 4

Main points

  • Representing and reasoning about causality is central to science and scientific discovery.
  • Understanding of causal inference has advanced tremendously in the past 25 years through the work of several disparate research communities.
  • Several emerging opportunities and challenges exist:
    • Expressiveness — Combining data and knowledge from multiple sources to understand complex phenomena
    • Critique — Inferring errors in modeling assumptions or problem construction
    • Empirical evaluation — Providing realistic empirical tests of methods for causal modeling
SLIDE 5

Causality is central to science

SLIDE 6

Explanation ⇒ Causality

  • Explanation is a central activity in science. Effective theories explain previously unexplained phenomena.
  • Effective explanations generally take the form of a counterfactual (“What would have happened if conditions had been different?”).
  • “…explanatory relationships are relationships that are potentially exploitable for purposes of manipulation and control.”

SLIDE 7

Control & design ⇒ Causality

Sources: Wikipedia (pile)

SLIDE 8

Models

  • Because of this, “models” in most scientific fields have causal implications: they can be used to infer how a system would behave under intervention.
  • In contrast, most “models” in machine learning and statistics have been defined as having only associational semantics.
  • This leads to substantial confusion among researchers from other fields when first encountering machine learning methods.

SLIDE 9

Progress in causal modeling

  • An explicit theory of causal inference has been worked out over the past 20 years by a small group of computer scientists, philosophers, and statisticians.
  • The theory uses directed graphical models to represent causal dependence among variables.
  • The theory provides a formal correspondence between causal models and their observable statistical implications. This correspondence has been exploited to produce a number of algorithms for reasoning with causal graphical models (CGMs).

(Pearl 2000, 2009; Spirtes, Glymour, and Scheines 1993, 2001)

SLIDE 10

Key concepts

  • Only statistical dependence is directly observable in data. Causal dependence is not observable.
  • Statistical dependence underdetermines causal dependence (“correlation is not causation”).
  • The observable statistical consequences of a given causal model can be inferred from its structure (d-separation).
  • Multiple causal structures produce the same observed statistical dependencies (Markov equivalence).
  • However, some combinations of conditional independence and known causal dependence imply constraints on the space of causal structures, and some uniquely identify causal structures.
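These concepts can be illustrated with a small simulation (a sketch added here, not from the slides; the variable names and coefficients are illustrative). A chain A → B → C and a fork A ← B → C are Markov equivalent: both imply that A and C are marginally dependent but independent given B, so no observational test can tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def partial_corr(x, y, z):
    # Correlation of x and y after linearly regressing z out of each.
    slope_x = np.cov(z, x)[0, 1] / np.var(z)
    slope_y = np.cov(z, y)[0, 1] / np.var(z)
    return np.corrcoef(x - slope_x * z, y - slope_y * z)[0, 1]

# Chain: A -> B -> C
A = rng.normal(size=n)
B = 0.8 * A + rng.normal(size=n)
C = 0.8 * B + rng.normal(size=n)
chain = (np.corrcoef(A, C)[0, 1], partial_corr(A, C, B))

# Fork: A <- B -> C  (Markov equivalent to the chain)
B2 = rng.normal(size=n)
A2 = 0.8 * B2 + rng.normal(size=n)
C2 = 0.8 * B2 + rng.normal(size=n)
fork = (np.corrcoef(A2, C2)[0, 1], partial_corr(A2, C2, B2))

# In both structures, corr(A, C) is clearly nonzero while the
# partial correlation given B is approximately zero.
```

Both tuples show the same dependence pattern, which is exactly why statistical dependence underdetermines causal structure.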

SLIDE 11

Main points

  • Representing and reasoning about causality is central to science and scientific discovery.
  • Understanding of causal inference has advanced tremendously in the past 25 years through the work of several disparate research communities.
  • Several emerging opportunities and challenges exist:
    • Expressiveness — Combining data and knowledge from multiple sources to understand complex phenomena
    • Critique — Inferring errors in modeling assumptions or problem construction
    • Empirical evaluation — Providing realistic empirical tests of methods for causal modeling
SLIDE 12

Expressiveness

SLIDE 13

Source: Honavar, Hill, & Yelick (2016), Accelerating Science: A Computing Research Agenda

SLIDE 14

Source: Honavar, Hill, & Yelick (2016), Accelerating Science: A Computing Research Agenda

SLIDE 15

Manual Scientific Practice: Rarely searches large spaces of formally represented models

Machine Learning: Rarely analyzes causal dependence

Causal Discovery: Rarely discovers relational, temporal, or spatial models

[Diagram labels: Causal Analysis; Automated Discovery; Relational, Temporal and Spatial Models]

SLIDE 16

Causal models of independent outcomes

[Diagram: a causal process generating independent outcome variables A, B, …, Z]

SLIDE 17

Causal models of independent outcomes

[Diagram: a causal graphical model over variables A–J]

SLIDE 18

Key assumption of simple CGMs

[Diagram: a causal process generating independent outcome variables A, B, …, Z]

SLIDE 19

Key assumption of simple CGMs

[Diagram: a causal process generating multiple dependent outcomes]

SLIDE 20

Causal models of independent outcomes

[Diagram: a causal graphical model over variables A–J]

SLIDE 21

Causal models of dependent outcomes

(Friedman, Getoor, Koller, & Pfeffer 1999; Heckerman, Meek, & Koller 2007; Maier, Marazopoulou, and Jensen 2013)

[Diagram: linked causal graphical models over variables A–J and K–T, with dependence across outcomes]

SLIDE 22

(Maier, Marazopoulou, and Jensen 2013)

SLIDE 23

(Maier, Marazopoulou, and Jensen 2013)

SLIDE 24

(Maier, Marazopoulou, and Jensen 2013)

SLIDE 25

Causal models of general processes

Causal Process

    bool c1, c2;
    int count = 0;
    c1 = Bernoulli(0.5);
    if (c1 == true) then
        count = count + 1;
    c2 = Bernoulli(0.5);
    if (c2 == true) then
        count = count + 1;
    observe(c1 == true || c2 == true);
    return(count);

Probabilistic Program
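The semantics of this program can be sketched in ordinary Python via rejection sampling (an illustrative reading added here, not the inference algorithm the slides assume): runs that violate the observe condition are discarded, and the surviving runs define the posterior over count.

```python
import random

def run():
    """One forward execution of the probabilistic program above."""
    count = 0
    c1 = random.random() < 0.5   # Bernoulli(0.5)
    if c1:
        count += 1
    c2 = random.random() < 0.5   # Bernoulli(0.5)
    if c2:
        count += 1
    # observe(c1 || c2): reject executions that violate the condition.
    if not (c1 or c2):
        return None
    return count

random.seed(0)
samples = [c for c in (run() for _ in range(100_000)) if c is not None]
posterior_mean = sum(samples) / len(samples)
# Conditioning removes the (False, False) branch, so the three surviving
# outcomes (T,T), (T,F), (F,T) are equally likely and
# E[count | c1 or c2] = (2 + 1 + 1) / 3 = 4/3.
```

The posterior mean of count converges to 4/3, illustrating how an `observe` statement turns a forward simulation into conditional inference.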

SLIDE 26

Critique

SLIDE 27

“[To support science, we would expect] 
 that two different kinds of inferential process 
 would be required to put it into effect. The first, used in estimating parameters from data conditional on the truth of some tentative model, 
 is appropriately called Estimation. 
 The second, used in checking whether, in the light of the data, any model of the kind proposed is plausible, has been aptly named…Criticism.”

— George Box (emphasis added)

SLIDE 28

Example assumptions

  • Faithfulness
  • Causal Markov assumption
  • Definitions of variables, entities, relationships, etc.
  • Measurement process
  • Temporal granularity of measurement
  • Latent variables, entities, relationships, etc.
  • Structural form of causal dependence
  • Functional form of probabilistic dependence
  • Compositional form
  • Closed world (or form of open world)
  • …and many others
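The first assumption above can be made concrete with a small simulation (an illustrative sketch added here, not from the slides; names and coefficients are invented). A faithfulness violation occurs when two causal paths cancel exactly, so a genuine causal dependence leaves no statistical trace in observational data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X causes Y directly (+1) and indirectly through Z (1 * -1 = -1).
# The two paths cancel, so corr(X, Y) is approximately 0 even though
# X genuinely causes Y.
X = rng.normal(size=n)
Z = X + rng.normal(size=n)
Y = 1.0 * X - 1.0 * Z + rng.normal(size=n)

observational_corr = np.corrcoef(X, Y)[0, 1]

# Under do(Z): setting Z independently of X severs the cancelling path,
# and the direct effect of X on Y reappears.
Z_do = rng.normal(size=n)
Y_do = 1.0 * X - 1.0 * Z_do + rng.normal(size=n)
interventional_corr = np.corrcoef(X, Y_do)[0, 1]
```

A structure-learning algorithm that assumes faithfulness would wrongly conclude that X and Y are causally unrelated here, which is why such assumptions are natural targets for automated criticism.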
SLIDE 29

Empirical evaluation

SLIDE 30

Goals for Empirical Evaluation Approaches

  • Empirical — A pre-existing system created by someone other than the researchers.
  • Stochastic — Produces non-deterministic experimental results.
  • Identifiable — Amenable to direct experimental investigation to estimate interventional distributions.
  • Recoverable — Lacks memory or irreversible effects, which enables complete state recovery during experiments.
  • Efficient — Generates large amounts of data with relatively few resources.
  • Reproducible — Fairly easy to recreate nearly identical data sets without access to one-of-a-kind hardware or software.

SLIDE 31

Simple example: Database configuration

SLIDE 32

ML for database configuration (setup)

  • Assume a fixed database and DB server hardware.
  • Questions:
    • For a given query, what is the expected performance under each set of configuration parameters?
    • For a given query, which configuration will give the best performance?
  • Data:
    • 11,252 queries actually run against the Stack Exchange Data Explorer
    • Each query run under one of many different joint settings of the configuration parameters using Postgres 9.2.2

(Garant & Jensen 2016)

SLIDE 33

CGM for database configuration

[Diagram variables: Indexing, Page Cost, Memory Level, Block Writes to RAM, Year Created, Join Count, Group-by Count, Block Reads from RAM, Runtime, Retrieved Row Count, Table Count, Total Row Count, Length, Total Queries by User, Block Reads from Disk, Block Hits in Cache]

SLIDE 34

CGM for database configuration

[Diagram: the same variables, grouped by entity: Database, Query, Processing, User]

SLIDE 35

CGM for database configuration

[Diagram: the same variables, grouped by entity: Database, Query, Processing, User]

SLIDE 36

CGM for database configuration

[Diagram: the same variables, grouped by entity: Database, Query, Processing, User]

SLIDE 37

CGM for database configuration

[Diagram: the same variables, grouped by entity: Database, Query, Processing, User]

SLIDE 38

Comparing associational and causal models

  • Compare a state-of-the-art associational model (a random forest) to a CGM constructed using greedy equivalence search (GES) (Chickering & Meek 2002).
  • Evaluate by comparing to “ground truth” (experimental results for all queries obtained using a specific joint setting of the configuration parameters).

[Plot: Cache Hits]

(Garant & Jensen 2016)
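The gap this comparison probes can be sketched in miniature (an illustrative toy added here, not the slides' actual experiment; the variables and coefficients are invented). When a latent factor confounds a configuration variable and the outcome, the best associational predictor misestimates what an intervention on the configuration would do:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = 1.0      # true causal effect of "configuration" X on "runtime" Y

# Observational regime: latent workload factor U confounds X and Y.
U = rng.normal(size=n)
X = U + rng.normal(size=n)
Y = beta * X + 2.0 * U + rng.normal(size=n)

# Associational estimate: slope of the best linear predictor E[Y | X].
# The confounding path through U inflates it to about 2.0.
assoc_slope = np.cov(X, Y)[0, 1] / np.var(X)

# Interventional regime: X set by experiment, independent of U.
# The slope now recovers the true causal effect, about 1.0.
X_do = rng.normal(size=n)
Y_do = beta * X_do + 2.0 * U + rng.normal(size=n)
causal_slope = np.cov(X_do, Y_do)[0, 1] / np.var(X_do)
```

An associational model can predict observational data well yet badly mispredict the effect of setting a configuration parameter, which is why the evaluation compares both model types against experimentally obtained ground truth.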

SLIDE 39

Comparing associational and causal models

[Plot: Cache Hits]

(Garant & Jensen 2016)

SLIDE 40

Comparing associational and causal models

[Plot: Disk Reads]

(Garant & Jensen 2016)

SLIDE 41

Comparing associational and causal models

[Plot: Disk Reads]

(Garant & Jensen 2016)

SLIDE 42

Comparing associational and causal models

[Plot: Runtime]

(Garant & Jensen 2016)

SLIDE 43

Comparing associational and causal models

[Plot: Runtime]

(Garant & Jensen 2016)

SLIDE 44

Main points

  • Representing and reasoning about causality is central to science and scientific discovery.
  • Understanding of causal inference has advanced tremendously in the past 25 years through the work of several disparate research communities.
  • Several emerging opportunities and challenges exist:
    • Expressiveness — Combining data and knowledge from multiple sources to understand complex phenomena
    • Critique — Inferring errors in modeling assumptions or problem construction
    • Empirical evaluation — Providing realistic empirical tests of methods for causal modeling
SLIDE 45

Thanks

David Arbour — Recent developments in learning causal dependence from bivariate joint distributions in relational data (UAI & KDD 2016)
Dan Garant — Empirical evaluation of algorithms for learning causal models (UAI 2016)
Amanda Gentzel — Granger causality methods and empirical evaluation
Katerina Marazopoulou — Extending causal semantics to temporal models (UAI 2015; 2016)
Kaleigh Clary — Additive noise models for learning causal dependence from bivariate joint distributions

SLIDE 46

jensen@cs.umass.edu
kdl.cs.umass.edu
cs.umass.edu/~jensen/

All opinions are mine and not those of any company, agency of the US Government, or the University of Massachusetts Amherst.