Cause-Effect Pairs Challenge - Isabelle Guyon, ChaLearn

SLIDE 1

Cause-Effect Pairs Challenge

Isabelle Guyon ChaLearn

SLIDE 2

Challenges in Machine Learning http://chalearn.org

Thanks

  • Initial impulse: Joris Mooij, Dominik Janzing, and Bernhard Schölkopf, from the Max Planck.
  • Examples of algorithms and data: Povilas Daniušis, Arthur Gretton, Patrik O. Hoyer, Dominik Janzing, Antti Kerminen, Joris Mooij, Jonas Peters, Bernhard Schölkopf, Shohei Shimizu, Oliver Stegle, Kun Zhang, and Jakob Zscheischler.
  • Datasets and result analysis: Isabelle Guyon + Mehreen Saeed + {Mikael Henaff, Sisi Ma, and Alexander Statnikov}, from NYU.
  • Website and sample code: Isabelle Guyon + Ben Hamner (Kaggle).
  • Review, testing: Marc Boullé, Hugo Jair Escalante, Frederick Eberhardt, Seth Flaxman, Patrik Hoyer, Dominik Janzing, Richard Kennaway, Vincent Lemaire, Joris Mooij, Jonas Peters, Florin, Peter Spirtes, Ioannis Tsamardinos, Jianxin Yin, Kun Zhang.

SLIDE 3

Causal discovery without overfitting?

  • Gene networks: 100,000 genes
  • Neural networks: 100 billion neurons
  • Small networks: influence diagrams

SLIDE 4

Causation coefficient

Causality Workbench clopinet.com/causality

Scale of the causation coefficient C: negative for A <- B, near zero for A – B or A | B, positive for A -> B.

C can be used to:

  • RANK pairs of variables and prioritize experiments
  • Orient edges in degenerate causal graphs

SLIDE 5

ROC curves for A->B
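
The ROC analysis for A -> B can be illustrated with a small sketch. It assumes that each pair carries a ground-truth label (+1 for A -> B, -1 for A <- B, 0 otherwise) and a predicted causation coefficient C. The function names and the toy numbers are hypothetical and are not taken from the challenge data.

```python
import numpy as np

def auc(scores, positives):
    """Area under the ROC curve via pairwise comparisons (ties count as 1/2)."""
    scores = np.asarray(scores, dtype=float)
    positives = np.asarray(positives, dtype=bool)
    pos, neg = scores[positives], scores[~positives]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative pair")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def auc_a_causes_b(c, labels):
    """AUC for separating A -> B (label +1) from all other pairs."""
    return auc(c, np.asarray(labels) == 1)

# Toy example (made-up values): three A -> B pairs among six.
labels = np.array([1, 1, -1, 0, -1, 1])
c = np.array([0.9, 0.4, -0.7, 0.1, 0.5, 0.3])
print(auc_a_causes_b(c, labels))  # 7 of the 9 positive/negative comparisons are ordered correctly
```

The same function applied to -C with label -1 as the positive class would give the corresponding curve for A <- B.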

SLIDE 6

Winners

  • 1. ProtoML (Rank 1): Diogo Moitinho de Almeida.
  • 2. Jarfo (Rank 2): José Adrián Rodríguez Fonollosa.
  • 3. FirfID (Rank 4): Spyridon Samothrakis.

SLIDE 7

Data

SLIDE 8

Cause-effect pairs method

Test whether A -> B is a better explanation than A <- B by comparing two hypotheses:

  • B = f (A, noise)
  • A = f (B, noise)
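
One way to compare the two hypotheses is sketched below, in the spirit of an additive noise model: fit a regression in each direction and check in which direction the residual looks more independent of the input. This is only an illustration under stated assumptions, not the challenge's reference code: the cubic polynomial fit, the biased HSIC estimate with a Gaussian kernel, and all function names are choices made for this example.

```python
import numpy as np

def _standardize(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def _gram(v):
    """Gaussian kernel matrix with a median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    bandwidth2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * bandwidth2))

def hsic(x, y):
    """Biased HSIC estimate; larger values indicate stronger dependence."""
    n = len(x)
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    k_centered = centering @ _gram(x) @ centering
    return float(np.sum(k_centered * _gram(y)) / n ** 2)

def residual_dependence(cause, effect, degree=3):
    """Fit effect = f(cause) + noise and measure dependence of residual on cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coeffs, cause)
    return hsic(cause, _standardize(residual))

def anm_coefficient(a, b, degree=3):
    """Positive when A -> B leaves a more independent residual than A <- B."""
    a, b = _standardize(a), _standardize(b)
    return residual_dependence(b, a, degree) - residual_dependence(a, b, degree)

# Synthetic pair generated as B = f(A) + noise (assumed mechanism).
rng = np.random.default_rng(0)
a = rng.uniform(-2.0, 2.0, size=300)
b = a ** 3 + a + rng.normal(size=300)
score = anm_coefficient(a, b)
print(score)
```

The sign convention matches a causation coefficient: swapping the arguments flips the sign of the score.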

SLIDE 9

Setting of the challenge

Cases shown in the diagram (Z, ZA, ZB denote other variables):

  • A -> B
  • A <- B
  • A <- Z -> B (~ A - B)
  • A | B

SLIDE 10

Setting

  • No feedback loops.
  • No explicit time information.
  • A variable can be thought of as an aggregate statistic, like life expectancy of a population, or a measurement like temperature.
  • We consider pairs of variables {A, B} for which A->B means B = f (A, noise).
  • Pairs are independent of each other.

SLIDE 11

Data provided

SLIDE 12

Example: Best fit: A -> B

Figure panels: A -> B; A <- B

SLIDE 13

Large dataset

  • Real data (18%):
      – Altitude -> Temperature
      – Age -> Wages
      – Car color -> Price
      – Country -> Infant mortality
  • Artificial data (82%): B = f(A, noise)
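
As an illustration of what an artificial pair of the form B = f(A, noise) could look like, here is a minimal sketch. The specific mechanism (a random cubic polynomial squashed by tanh, plus additive Gaussian noise) and the function names are assumptions made for this example; the slides do not specify how the challenge's artificial data were generated.

```python
import numpy as np

def make_pair(n, rng, noise_scale=0.3):
    """Draw A, then compute B = f(A) + noise with a randomly chosen f (assumed form)."""
    a = rng.normal(size=n)
    weights = rng.normal(size=4)            # random cubic polynomial coefficients
    f_of_a = np.tanh(np.polyval(weights, a))
    b = f_of_a + noise_scale * rng.normal(size=n)
    return a, b

def make_labeled_pair(n, rng):
    """Return (x, y, label) with label +1 if x -> y and -1 if x <- y."""
    a, b = make_pair(n, rng)
    if rng.random() < 0.5:
        return a, b, 1
    return b, a, -1                          # present the pair in reversed order

rng = np.random.default_rng(1)
x, y, label = make_labeled_pair(200, rng)
print(label, x[:3], y[:3])
```

Randomly swapping the presentation order keeps the label balanced between A -> B and A <- B, so a method cannot succeed by always guessing one direction.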

SLIDE 14

Real variables

  • Demographics: Sex -> Height; Age -> Wages; Native country -> Education; Latitude -> Infant mortality
  • Ecology: City elevation -> Temperature; Water level -> Algal frequency; Elevation -> Vegetation type; Distance to hydrology -> Fire
  • Econometrics: Mileage -> Car resale price; Number of rooms -> House price; Trade price last day -> Trade price
  • Medicine: Cancer volume -> Recurrence; Metastasis -> Prognosis; Age -> Blood pressure
  • Genomics (mRNA level): Transcription factor -> Protein induced
  • Engineering: Car model year -> Horsepower; Number of cylinders -> MPG; Cache memory -> Compute power; Roof area -> Heating load; Cement used -> Compressive strength

SLIDE 15

Real variables

Diagram labels:

2N manually curated pairs
N A -> B
N A <- B
N A | B
N A <-> B
N artificial A <-> B

  • Var. random permutations
  • Rank preserving var. substitution

SLIDE 16

Artificial data

Diagram labels: F(A, Z); real variables; mix of categorical + continuous; nodes A, B, Z.

SLIDE 17

Data browser and sample code

SLIDE 18

Result analysis

SLIDE 19

Model-based methods

  • Additive Noise Model (ANM): Best fit, compare independence of input and residual.
  • Latent variable models (LiNGAM): Enforce independence of input and residual, compare model weights.
  • Complexity-based models: Select simplest explanation of the data (GPI and IGCI).
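
To make the complexity-based idea concrete, the sketch below shows a slope-based IGCI-style score with a uniform reference measure: after rescaling both variables to [0, 1], the direction with the smaller average log-slope is preferred. This is a simplified illustration, not the baseline code used in the challenge; the function names and the noiseless toy data are assumptions for this example.

```python
import numpy as np

def igci_slope(x, y):
    """Slope-based estimate of C_{x->y}: mean log |dy/dx| over sorted x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())   # uniform reference: rescale to [0, 1]
    y = (y - y.min()) / (y.max() - y.min())
    order = np.argsort(x)
    dx = np.diff(x[order])
    dy = np.diff(y[order])
    keep = (dx != 0) & (dy != 0)              # skip ties to avoid log(0) and division by 0
    return float(np.mean(np.log(np.abs(dy[keep] / dx[keep]))))

def igci_coefficient(a, b):
    """Positive when A -> B is preferred, i.e. when C_{A->B} < C_{B->A}."""
    return igci_slope(b, a) - igci_slope(a, b)

# Noiseless toy pair: A uniform, B a nonlinear invertible function of A.
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, size=500)
b = a ** 3
print(igci_coefficient(a, b))
```

This estimator is designed for nearly deterministic, invertible relations; with substantial noise its answers become less reliable.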

http://webdav.tuebingen.mpg.de/causality/

SLIDE 20

Empirical methods

  • 267 teams and 4578 entries.
  • All baseline methods outperformed!
  • Code of 3 winners available.

SLIDE 21

No overfitting

SLIDE 22

Result comparison

SLIDE 23

Statistical significance

SLIDE 24

Causation coefficient distribution

SLIDE 25

Causation coefficient distribution

SLIDE 26

Post-challenge verifications

3648 cause-effect pairs from GeneNetWeaver 3.0 (http://gnw.sourceforge.net/), based on the E. coli transcriptional regulatory network.

  • Experiment 1: no retraining.
  • Experiment 2: train on ½, test on ½.

Alexander Statnikov and Sisi Ma

SLIDE 27

Survey (27 responses)

SLIDE 28

Preprocessing

SLIDE 29

Feature extraction

SLIDE 30

Dimensionality reduction

SLIDE 31

Recognition

SLIDE 32

Classifier

SLIDE 33

Implementation

SLIDE 34

Time spent

SLIDE 35

Cause-Effect Pairs Challenge

http://clopinet.com/causality