Open-access datasets for time series causality discovery validation - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Open-access datasets for time series causality discovery validation

  • I. Guyon, C. Aliferis, G. Cooper,
  • A. Elisseeff, O. Guyon, J.-P. Pellet,
  • A. Statnikov, P. Spirtes

http://clopinet.com/causality/ causality@clopinet.com

slide-2
SLIDE 2

The challenges of causality discovery

What affects…

…your health? …climate changes? …the economy?

and… which actions will have beneficial effects?

slide-3
SLIDE 3

Causality and time

  • Everyday notion of causality involves time:

The causes precede their effects

  • Is that always true?

    – Delayed/weak measurements; reverse causation
    – Final cause (objective)

  • Time does not resolve:

    – Variability
    – Confounding
    – Sample bias

  • Other difficulties:

    – Non-i.i.d. samples: redundancy; correlation misleading
    – Seasonality
    – Censored data
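
As a hedged illustration of why time order alone does not resolve confounding, the sketch below simulates a hidden variable Z that drives X with a one-step delay and Y with a two-step delay. X then precedes and predicts Y, yet setting X by experiment leaves Y unchanged. The function names, the lags, and the noise level are assumptions made for this example and are not taken from the slides.

```python
import random


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def simulate(n=5000, intervene_on_x=False, seed=0):
    """Hidden Z drives X (lag 1) and Y (lag 2); X never causes Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x, y = [], []
    for t in range(2, n):
        if intervene_on_x:
            # Experiment: X is set at random, independently of Z.
            x_t = rng.gauss(0, 1)
        else:
            # Observation: X follows the hidden cause with a one-step delay.
            x_t = z[t - 1] + 0.3 * rng.gauss(0, 1)
        # Y follows the same hidden cause with a two-step delay.
        y_t = z[t - 2] + 0.3 * rng.gauss(0, 1)
        x.append(x_t)
        y.append(y_t)
    return x, y


def lagged_corr(x, y, lag=1):
    """Correlation between x at time t and y at time t + lag."""
    return pearson(x[:-lag], y[lag:])


if __name__ == "__main__":
    x_obs, y_obs = simulate()
    x_exp, y_exp = simulate(intervene_on_x=True)
    print("observational lagged correlation:", round(lagged_corr(x_obs, y_obs), 3))
    print("experimental lagged correlation: ", round(lagged_corr(x_exp, y_exp), 3))
```

In the observational run the lagged correlation is strong although X has no effect on Y; in the experimental run it vanishes, which is the kind of evidence an intervention provides and temporal precedence does not.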

slide-4
SLIDE 4

Experimenting is usually needed to determine cause-effect relationships but …

Experimenting is needed…

slide-5
SLIDE 5

but…

  • Experiments are often:

    – Costly
    – Unethical
    – Infeasible

  • Non-experimental “observational” data is abundant and costs less.

slide-6
SLIDE 6

The Causality Workbench

Our goal:

Identify algorithms that are both

  • efficient at identifying causes
  • cost-effective

slide-7
SLIDE 7

The Causality Workbench

Our challenges:

  • Finding adequate data

    – Ground truth of causal relationships
    – Experimental data
    – Large sample size

  • Conducting “live” experiments

    – Costly
    – Impractical in a challenge setting

slide-8
SLIDE 8

The Causality Workbench

Our methodology:

  • Collecting donations of real data
  • Acquiring or designing good simulators of real systems

    – Trained with real data
    – Used in the field to simulate systems, or
    – Including real data + artificial “probe” variables

  • Defining tasks with well-defined objectives

slide-9
SLIDE 9

To benchmark algorithms, we built a …

http://clopinet.com/causality

slide-10
SLIDE 10

Models of systems

[Figure: QUERIES and ANSWERS exchanged with a Database of system models. Variables of the example model: Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, Fatigue.]

slide-11
SLIDE 11

What we can do for you:

  • Let you intervene on the system

    – Perform virtual experiments

  • Serve you the data you want

    – For a virtual cash fee

  • Include

    – Real data
    – Semi-artificial data
    – Simulated data
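
The services listed on this slide (virtual experiments, data served for a virtual cash fee) can be pictured with a toy sketch. The class `VirtualLab`, its prices, and its probabilities are hypothetical and chosen only for illustration; this is not the Causality Workbench interface. The variable names are borrowed from the example model on the "Models of systems" slide.

```python
import random


class VirtualLab:
    """Hypothetical toy lab: charges virtual cash for samples from a small model."""

    OBSERVE_COST = 1      # arbitrary price per observational sample
    INTERVENE_COST = 10   # arbitrary price per experimental sample

    def __init__(self, budget, seed=0):
        self.budget = budget
        self.rng = random.Random(seed)

    def _charge(self, cost):
        # Refuse the query when the remaining budget cannot cover it.
        if cost > self.budget:
            raise RuntimeError("not enough virtual cash")
        self.budget -= cost

    def _draw(self, name, prob, do):
        # A variable under intervention takes the forced value;
        # otherwise it is sampled from its (made-up) probability.
        if name in do:
            return do[name]
        return self.rng.random() < prob

    def _sample(self, do):
        # Assumed toy structure: smoking -> yellow_fingers, smoking -> lung_cancer.
        smoking = self._draw("smoking", 0.3, do)
        yellow = self._draw("yellow_fingers", 0.8 if smoking else 0.05, do)
        cancer = self._draw("lung_cancer", 0.4 if smoking else 0.05, do)
        return {"smoking": smoking, "yellow_fingers": yellow, "lung_cancer": cancer}

    def observe(self, n):
        """Buy n observational samples."""
        self._charge(n * self.OBSERVE_COST)
        return [self._sample({}) for _ in range(n)]

    def intervene(self, do, n):
        """Buy n samples with the variables in `do` forced to given values."""
        self._charge(n * self.INTERVENE_COST)
        return [self._sample(do) for _ in range(n)]
```

For instance, `VirtualLab(budget=1000).intervene({"yellow_fingers": True}, 50)` would spend 500 units and return 50 samples in which yellow fingers are forced on. In this toy model the lung cancer rate would not respond to that intervention, since yellow fingers is an effect of smoking and not a cause of cancer.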

slide-12
SLIDE 12

Causation and Prediction challenge

Toy datasets Challenge datasets

slide-13
SLIDE 13

Pot-Luck challenge

Task              Views
CYTO              609
LOCANET           1372
PROMO             862
SIGNET            918
TIED              551
CauseEffectPairs  580
Stemmatology      372

Other columns: Type (real, artif, self eval); Time dep.

slide-14
SLIDE 14

Other donated datasets

Task     Views
WebLogs  272
MIDS     232
NOISE    247
SECOM    297
SEFTI    280

Other columns: Type (real, artif, self eval); Time dep.

http://clopinet.com/causality

slide-15
SLIDE 15

Active Learning Challenge

http://clopinet.com/al

slide-16
SLIDE 16

Next: Causality and Time Series

With your help:

  • Get more datasets

    – of practical and scientific interest

  • Get good simulators of real systems

    – paired with the real datasets

  • Define tasks and objectives

    – and practical challenge protocols