
Benchmarks, wikis, and open-source causal discovery (Patrik O. Hoyer)

  1. Benchmarks, wikis, and open-source causal discovery. Patrik O. Hoyer, Univ. of Helsinki, Finland. NIPS*08 workshop on causality, Dec 12, 2008. Beware! Not a technical talk!

  2. The causal discovery problem • Unknown data-generating (‘causal’) system • We have some non-experimental and/or experimental data, from which we seek to infer the causal system... ...this is an extremely difficult ‘inverse’ problem, which requires good assumptions/priors to succeed! [Figure: a causal system over variables x, y, z, w, with mechanisms p(x), p(y | x), ..., generates a table of data; from that data, the causal structure and mechanisms (marked ‘?’) must be inferred.]

  3. How well can we actually do it? • Lots of different methods have been proposed • Testing these methods on real data requires... - ...a set of non-experimental / experimental data - ...auxiliary knowledge of the true causal system! ...so testing causal discovery methods is considerably more complicated than testing methods for regression, classification, or density estimation! [Figure: regression, classification, density estimation, causal discovery, causal inference.]

  4. Causal discovery repository? • Both problems and solutions... • Task: - Real or simulated data - Precisely defined: what should the algorithm do? an exact scoring procedure; reasonable assumptions about the data • Algorithm: - What does it do? - Well-defined input and output - Open source (if possible) or an executable available, so anyone can run it on any dataset... ...and anyone can evaluate how any algorithm performs on any task. • Ongoing discussion on tasks and algorithms! [Figure: Tasks A, B, C, D alongside Algorithms 1, 2, 3, 4.]
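Slide 4 asks for tasks with an exact scoring procedure and algorithms with a well-defined input-output contract. The Python sketch below is one hypothetical way to pin that down. The names `Task`, `Algorithm`, and `evaluate`, the simulated chain x -> y -> z, and the use of structural Hamming distance (SHD) as the score are all illustrative assumptions, not something the talk or the Causality Workbench prescribes. An algorithm here is any function from (variable names, data) to a set of directed edges, so every algorithm can be run and scored on every task.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge: (cause, effect)
Dataset = List[Dict[str, float]]  # one dict of variable -> value per sample
Algorithm = Callable[[List[str], Dataset], Set[Edge]]  # hypothetical I/O contract


def structural_hamming_distance(truth: Set[Edge], guess: Set[Edge], variables: List[str]) -> int:
    """Count variable pairs whose edge status (absent, a->b, b->a) differs."""
    def status(edges: Set[Edge], a: str, b: str):
        if (a, b) in edges:
            return (a, b)
        if (b, a) in edges:
            return (b, a)
        return None

    return sum(
        status(truth, a, b) != status(guess, a, b)
        for i, a in enumerate(variables)
        for b in variables[i + 1:]
    )


@dataclass
class Task:
    """A benchmark task: data, the true causal graph, and an exact scoring rule."""
    name: str
    variables: List[str]
    data: Dataset
    true_edges: Set[Edge]

    def score(self, predicted: Set[Edge]) -> int:
        # Lower is better; 0 means the true graph was recovered exactly.
        return structural_hamming_distance(self.true_edges, predicted, self.variables)


def simulated_chain_task(n: int = 500, seed: int = 0) -> Task:
    """Simulate a linear chain x -> y -> z with Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.8 * x + rng.gauss(0.0, 1.0)
        z = -0.5 * y + rng.gauss(0.0, 1.0)
        data.append({"x": x, "y": y, "z": z})
    return Task("chain-xyz", ["x", "y", "z"], data, {("x", "y"), ("y", "z")})


def empty_graph(variables: List[str], data: Dataset) -> Set[Edge]:
    """Trivial baseline: predicts no causal edges at all."""
    return set()


def complete_ordered_dag(variables: List[str], data: Dataset) -> Set[Edge]:
    """Trivial baseline: every earlier variable causes every later one."""
    return {(a, b) for i, a in enumerate(variables) for b in variables[i + 1:]}


def evaluate(algorithms: Dict[str, Algorithm], tasks: List[Task]) -> Dict[Tuple[str, str], int]:
    """Run every algorithm on every task and record its score."""
    return {
        (alg_name, task.name): task.score(alg(task.variables, task.data))
        for alg_name, alg in algorithms.items()
        for task in tasks
    }


if __name__ == "__main__":
    results = evaluate(
        {"empty": empty_graph, "complete": complete_ordered_dag},
        [simulated_chain_task()],
    )
    for (alg_name, task_name), shd in sorted(results.items()):
        print(f"{alg_name:>8} on {task_name}: SHD = {shd}")
```

With these two baselines, the complete ordered DAG scores SHD 1 (one spurious x -> z edge) and the empty graph scores SHD 2 (two missing edges). SHD assumes the target is a single DAG; a task asking for, say, an equivalence class or a causal effect would need a different, equally explicit score.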

  5. Causality workbench! (Isabelle Guyon et al.) • 15 datasets (≈ ‘tasks’) already collected! • The two challenges have spawned numerous approaches to solutions (≈ ‘algorithms’), although these are not (at least yet) collected online for anyone to evaluate • ...all in all, an excellent effort and a great start!

  6. The nature of the project • Volunteer-based collaborative effort • The repository needs to become self-sustaining • Look at successful examples for good ideas and principles... - Open-source software development (Linux, GNU) - Wikipedia (obviously a slightly different scale of project, but the basic principles are still pretty much the same...) • What follows are my humble thoughts on what some of these principles are

  7. 1. All material – all versions – permanently available • Store everything on-site rather than just as links - Eliminates broken-link problems (e.g. the ‘ICA Central’ repository: in a quick test, 5 of 10 links were broken) - Enables full versioning of all material (particularly important in a scientific context, where priority is often an issue) - Easy complete download of all material (ensuring that all material is permanently available) (Thus, we need storage capacity and bandwidth, and must consider licensing issues)

  8. 2. As few technical restrictions as possible • With full versioning (and easy reverting) there is no need for technical restrictions; instead, one can rely on socially agreed-upon rules (registration may still be required to make changes/updates) [assuming the collaborators are trying to help the project, technical restrictions are more annoying than helpful] • Anyone can help with any part of the project: - Contributing new tasks and solutions, and developing improved solutions based on earlier ones - Writing documentation - Clarifying the rules and conventions of the repository - Graphics and design
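Slide 8's argument rests on every change being recorded and cheaply undone. A minimal sketch of that idea in Python, assuming a simple in-memory, append-only revision history: the `Page` and `Revision` classes are hypothetical and do not describe how any particular wiki engine stores pages. Editing and reverting both only ever add revisions, so an unhelpful edit can be undone without anything being lost.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Revision:
    """One immutable saved version of a page."""
    number: int
    author: str
    text: str
    comment: str


@dataclass
class Page:
    """Append-only page history: edits and reverts only ever add revisions."""
    title: str
    history: List[Revision] = field(default_factory=list)

    def edit(self, author: str, text: str, comment: str = "") -> Revision:
        rev = Revision(len(self.history) + 1, author, text, comment)
        self.history.append(rev)
        return rev

    def current(self) -> str:
        # An empty page has no revisions yet.
        return self.history[-1].text if self.history else ""

    def revert(self, author: str, to_number: int) -> Revision:
        # Reverting deletes nothing: it re-saves an old revision's text as a
        # new revision, so the unwanted edit also stays in the history.
        if not 1 <= to_number <= len(self.history):
            raise ValueError(f"no revision {to_number} of {self.title!r}")
        old = self.history[to_number - 1]
        return self.edit(author, old.text, f"revert to revision {to_number}")
```

For example, after `page.edit("alice", ...)`, a bad edit, and `page.revert("bob", 1)`, the page shows Alice's text again while all three revisions remain in `page.history`; keeping every revision is also what makes a complete download of all versions possible.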

  9. 3. Emergence of structure • It may be difficult to predict and impose the appropriate structure from the outset... ...so it is often useful to let the structure emerge as the project develops (and to allow anyone to help construct the structure that best fits the changing needs) • ‘Wiki’ software: free-form and/or structured collaborative webpages, file uploads, categories, templates, full versioning with easy reverts, complete downloads; all text and structure is collaboratively edited by all users, plus lots of other options

  10. Summary • Might be worthwhile to consider... - ...storing all material on-site rather than as links - ...using full versioning of all material - ...relying on social rules rather than technical restrictions to keep the repository in order - ...making it possible for everyone to work on the structure as well as the content • These features/aspects may be easiest to implement using freely available wiki software • I hope (and believe) that together we can make the repository an extremely useful tool for benchmarking existing causal discovery methods and developing new ones.
