Causal Data Science. Roman Kern, Knowledge Discovery and Data Mining.




www.tugraz.at

Causal Data Science

Roman Kern Knowledge Discovery and Data Mining 2 (Version 1.0.4)


Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 1

> Motivation: With purely observational data we are not able to answer many questions that one would expect data science to deliver. Taking the causal perspective, one may (with assumptions or domain knowledge) answer these questions. > Goal: Understand the importance of the data generation process and its implications for how to tackle a data science analysis.


Causal Data Science

Outline

1. Overview & Motivation
2. Correlation without Reason
3. Potential Outcomes
4. Structural Causal Model
5. Causal Graph
6. Causal Inference
7. Causal Discovery
8. Conclusions


> This lecture can only scratch the surface of causality, so large sections of research are left out.


Overview & Motivation

Gentle introduction to causality, and how we ended up here...


Overview & Motivation

Root Cause Analytics - T-Shirts


> Imagine a factory that produces t-shirts. > Problem: some of the t-shirts have defects. > Task: Root cause analytics to find out what part of the production process steps is associated (i.e., causally related) with these faults. > Data to solve this task: longitudinal data (mostly time series data) from around the shop floor. > Spoiler alert: we need domain knowledge to better understand the data generation process (e.g., the causal effects). > We need domain knowledge just to correctly segment our data.


Overview & Motivation

Root Cause Analytics - T-Shirts


> Each shirt is produced in multiple steps, each step may have multiple (semi-)identical machines, and each machine provides a number of data streams (e.g., time series data). > The arrows represent the path a t-shirt takes throughout the production process; this may already be the basis for what we will later call a causal graph. > And already we can use time to our advantage: the root cause always needs to precede the effect. > Knowing the production process will immensely help us in our task!


Overview & Motivation

Starting Point
Correlation does not imply causation
Post hoc ergo propter hoc


> We all learnt that we cannot jump to conclusions about the true nature of things, just given observations. > “Since event Y followed event X, event Y must have been caused by event X”. > In the 20th century we learnt to avoid phrases like “X causes Y”, and go for the more vague/safe phrase “X is associated with Y”. > The “Book of Why” by Judea Pearl gives a nice history lesson. > Today, we have progressed and better understand when (exactly) we are allowed to state “X causes Y” given just observational data.


Overview & Motivation

Regression to the Mean
The magazine “Sports Illustrated” features successful athletes on its cover.
But once they appear on the cover, their performance drops.

→ “The Sports Illustrated Cover Jinx”

It can be explained by regression to the mean.
Or via reverse causation, i.e., good performance caused the cover, and the cover did not cause bad performance.


> The Sports Illustrated curse! > There appears to be a solid causation (title page followed by a dip in performance), but in fact the good performance prior to the title page caused the title page. > There is even a hashtag on Instagram:

https://www.instagram.com/explore/tags/sicurse/

> And it is mentioned in Kahneman’s book, Thinking, Fast and Slow. > Initial insight: > Correlation is symmetric, causation is directed.


Overview & Motivation

Role of Causality in Data Science
The gold standard to measure effects are randomised controlled experiments
In practice they often cannot be conducted
A-B testing is a form of such an experiment

Make use, if possible

Data-driven causal inference as next best option


> Randomised controlled trial (RCT): > - Want to study the impact of a treatment > - Have a (large) number of people > - Assign people randomly into 2 groups: gets treatment, doesn't get treatment (without them knowing) > - Measure the difference > Since the only difference is the treatment, any change can be attributed to the treatment. > There are many reasons why randomised controlled trials cannot be conducted: ethical, financial, practical. > One needs many participants (instances, e.g., t-shirts). > Data-driven causal inference = causal inference from observational data.


Overview & Motivation

Nomenclature

causality (also: causal relation, causation): causal relation between variables
causal effect: the strength of a causal relation
instance (also: unit, sample, example): an independent unit of the population
features (also: covariates, observables, pre-treatment variables): variables describing instances
learning causal effects (also: forward causal inference, forward causal reasoning): identification and estimation of causal effects
learning causal relations (also: causal discovery, causal learning, causal search): inferring causal graphs from data
causal graph (also: causal diagram): a graph with variables as nodes and causality as edges
confounder (also: confounding variable): a variable that causally influences both treatment and outcome


> See: Guo, R. et al. (2020) ‘A Survey of Learning Causality with Data’, ACM Computing Surveys, 53(4), pp. 1–37. doi: 10.1145/3397269. > In data science, we are mostly interested in learning causal effects, i.e., we know (via domain knowledge) the causal relationships, and with observational data we estimate the strength of a relationship (instead of conducting a randomised controlled experiment). > Often, the cause is called treatment and the effect is called outcome; this is for historic reasons (as causality mostly progressed in these areas). > Features are often also called independent variables, especially in a setting where one wants to predict the dependent variable (also called target). > Relationship to classical statistics: seeing if there is an effect (statistical hypothesis testing, e.g., via p-values) → causal discovery; measuring the strength of the effect (effect size, e.g., via correlation) → causal inference.


Overview & Motivation

Main Approaches Potential Outcomes by Donald Rubin Structural Causal Models (SCMs) by Judea Pearl


> Two frameworks for causal learning. > See also: https://blog.methodsconsultants.com/posts/pearl-causality/ > SCMs are often preferred when learning causal relations among a set of variables, and PO for learning the strength of relations.


Overview & Motivation

Recommended Literature Suggested reading sequence

1. Glymour, M. M. and Greenland, S. (2008) ‘Causal diagrams’, Modern Epidemiology. Lippincott Williams & Wilkins, Philadelphia, PA, 3, pp. 183–209.
2. Guo, R. et al. (2020) ‘A Survey of Learning Causality with Data’, ACM Computing Surveys, 53(4), pp. 1–37. doi: 10.1145/3397269.
3. Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
4. Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. John Wiley & Sons.


> A good match for practical settings: > Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. > https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/


Overview & Motivation

Recommended Resources
Introduction to Causal Inference by Brady Neal, https://www.bradyneal.com/causal-inference-course
Causal Data Science by Adam Kelleher, https://medium.com/causal-data-science/causal-data-science-721ed63a4027
Causal Data Science with Directed Acyclic Graphs by Paul Hünermund, https://www.udemy.com/course/causal-data-science/


> Also interesting, the causal inference tutorial: https://github.com/amit-sharma/causal-inference-tutorial/
> Also a good starting point, a four-part lecture on YouTube by Jonas Peters: https://www.youtube.com/watch?v=zvrcyqcN9Wo


Correlation without Reason

When do we observe correlations that we would not expect?


Correlation without Reason

Motivation
Correlation analysis is a central part of data science
... but are there cases where correlations exist without proper reason?


> In data science, correlation analysis (e.g., pairwise correlation of all variables, including the target variable) is often one of the first steps of the exploratory data analysis phase. Often with the goal to gain a better understanding of the dataset, or to already select (or ignore) certain variables (feature selection). > With correlation analysis we also include notions like conditional probability. > Example: In a production environment one wants to identify defective items and understand the root causes, i.e., what sensor data correlates with the defects. > Even humans see correlations (make associations) where there are no real reasons for these correlations, e.g., clouds just happen to look like a horse, dog, etc. Note: Even worse, humans often make causal assumptions starting from purely observational data (correlations).


Correlation without Reason

Overview

1. Spurious Correlation
2. Confounders
3. Berkson’s Paradox
4. Simpson’s Paradox


> Here we will look at some key scenarios that lead to correlations in observational data (i.e., a data set) which do not represent the data generation process.


Correlation without Reason

Spurious Correlations

http://www.tylervigen.com/spurious-correlations


> To clarify, there is no reason to believe that “oil imports” and “killed drivers” are somehow connected. > There are many documented cases where correlations just happen due to random chance. In other words, the correlation we see is just bad luck. > There is a connection to statistical tests: if the p-value falls below a previously defined α-value, we may only assess that the probability of observing a certain phenomenon due to randomness is below our chosen threshold. > When multiple hypotheses are considered (i.e., each correlation between two variables is considered a hypothesis, thus for n variables there are on the order of n² hypotheses), the chance of observing at least a single spurious correlation rises quadratically.
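This multiple-testing effect is easy to reproduce in a minimal simulation (a sketch assuming NumPy; all numbers are illustrative): every variable is independent by construction, yet several pairwise correlations exceed the usual significance threshold purely by chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_obs = 40, 30
X = rng.normal(size=(n_obs, n_vars))      # all variables independent by construction

corr = np.corrcoef(X, rowvar=False)       # pairwise Pearson correlations
r_crit = 1.96 / np.sqrt(n_obs)            # rough critical |r| for p < 0.05

iu = np.triu_indices(n_vars, k=1)
n_tests = len(iu[0])                      # n*(n-1)/2 pairwise hypotheses
n_spurious = int(np.sum(np.abs(corr[iu]) > r_crit))
print(n_tests, n_spurious)                # several "significant" pairs by chance alone
```

With 40 variables there are already 780 pairwise hypotheses, so at a 5% threshold dozens of spurious hits are expected.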


Correlation without Reason

Spurious Correlations
In big data settings one often combines different data sets
→ might be a source of spurious correlations
Example: Different settings of how the data sets have been collected; one data set with only bad quality, and a second data set from only the night shift → it will appear as if the night shift produces better quality!


> Besides purely random reasons for spurious correlations, there are cases of systematically introduced correlations. > In fact, some authors consider the fusion of multiple data sets (data sources) a key component of the definition of big data. > Datasets that differ in their sampling bias will often cause spurious correlations (as the distributions of many variables will differ). > In practice, often “special datasets” are collected with “special properties” (e.g., many outliers). > This type of spurious correlation is similar to Berkson’s paradox (= Berkson’s bias, collider bias). > To detect such correlations: introduce a new “synthetic” variable with the name of the dataset; if there are now correlations with this variable → indicator for sampling bias.
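The dataset-indicator trick from the last note can be sketched as follows (variable names and distributions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Dataset A: day shift, low quality scores; Dataset B: night shift, high scores
quality_a = rng.normal(loc=0.3, scale=0.1, size=500)
quality_b = rng.normal(loc=0.8, scale=0.1, size=500)

quality = np.concatenate([quality_a, quality_b])
dataset_id = np.concatenate([np.zeros(500), np.ones(500)])  # synthetic variable

r = np.corrcoef(dataset_id, quality)[0, 1]
print(round(r, 2))   # strong correlation -> indicator for sampling bias
```

A strong correlation between the synthetic dataset variable and a substantive variable flags the merged datasets as differently sampled.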


Correlation without Reason

Spurious Correlations

https://www.google.org/flutrends/about/data/flu/at/data.txt


> In 2008, researchers at Google made a nowcast of the flu based on search query terms (the more people search with specific terms, the more flu infections are assumed to be there). > The idea was published in Nature, claiming to be able to accurately predict the flu 2 weeks before the official statistics (based on input from doctors). > The data was calibrated using ground truth (from the health organisations) to match the past, but due to the high number of search queries some were considered to be highly predictive without being related to the target (flu).


Correlation without Reason

Spurious Correlations
Failed in 2013 by being 140% off!
Reason: overfitting to spurious data
Spurious correlations: Among the predictive search queries are seasonal terms like “high school basketball”


> A critical analysis of the Google Flu Trends was then published in Science, which shows that the Google model did overfit on the data: https://www.wired.com/2015/10/can-learn-epic-failure-google-flu-trends/


Correlation without Reason

Spurious Correlations
More data (more instances): e.g., held-out data to confirm found correlations
Less data (fewer variables): e.g., feature selection based on input from domain experts
More knowledge: e.g., validation of found patterns


> So, how can we now prevent (or minimize the risk of) spurious correlations, what strategies are there? > More data should also mean keeping the distributions the same (across multiple datasets, if they are merged). > Also, more diversity helps, e.g., in the example of the defective items, multiple root causes in the data may help to prevent spurious correlations. > Here fewer variables (= features) are equivalent to fewer hypotheses. > For example, one could look at all the search terms of the Google Flu prediction and sort out all non-health related queries. > A variation of more data is: more (diverse) root causes, to increase the variability of the phenomenon to study and hence decrease the chance of spurious correlations.


Correlation without Reason

Confounding Factors
Example: People with a healthy lifestyle ...
... tend to eat more healthily
... exercise more
... smoke less
... weigh less (lower BMI)
→ correlation between less smoking and low BMI.


> ... and all other variables influenced by healthy lifestyle. > Many people assume that smoking is associated with lower BMI, but even if this is not true, one would still assume smoking and BMI to be independent. > Why do we see a positive correlation here, if they are independent? > ... because they have a common cause.
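A common cause can be simulated in a few lines (a sketch; the coefficients are made up): neither variable influences the other, yet they correlate because both depend on the hidden lifestyle variable.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
lifestyle = rng.normal(size=n)                  # unobserved confounder

# Smoking and BMI both depend on lifestyle, not on each other
smoking = -0.8 * lifestyle + rng.normal(size=n)
bmi = -0.6 * lifestyle + rng.normal(size=n)

r = np.corrcoef(smoking, bmi)[0, 1]
print(round(r, 2))   # clearly positive, although neither causes the other
```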


Correlation without Reason

Confounding Factors
When is this a problem?
... if one is interested in the root cause of low/high BMI
... and healthy lifestyle is not in the data
→ In this setting, healthy lifestyle is a confounding factor for the relationship between smoking and BMI.


> In short, based on the data one would assume that quitting smoking might lower the BMI. > Technically, “healthy lifestyle” is a confounding factor even if it is observed (in that case it would be far easier to identify the common cause). > If it is not observed, it is hard to obtain the true relationship between smoking and BMI. > This also applies to many cases in industry. While one would like to find/identify variables that correlate with e.g. bad quality, such confounding factors imply relationships that do not exist (and occlude true relationships). > Confounders are also called lurking variables (if not observed). > The correlation between smoking and BMI is also called partial correlation.


Correlation without Reason

Confounders
More data (more variables): e.g., include all potential confounders in the dataset
More data (more instances): as we need to control for confounders, e.g., split into healthy/non-healthy groups
Less data (fewer variables): e.g., reduce the collinearity
More knowledge: e.g., known confounders (and their influence)


> So, how can we now prevent (or minimize the risk of) spurious correlations, what strategies are there? > More data relates to including all possible influence factors (confounders). > We need to control for each value of the confounder, e.g., healthy and non-healthy instances individually as bins (i.e., creating two results, which might be combined via a weighted average, where the weighting needs to be based on the proportion in the population (not in the sample)), see the adjustment formula. > Conditioning on confounders → smaller bins → skewed data sets, i.e., controlling for variables may create skewed datasets.
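The controlling/binning idea can be sketched with the (hypothetical) healthy-lifestyle example; the weighted average over strata is exactly the adjustment formula mentioned in the note:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
healthy = rng.integers(0, 2, size=n)           # confounder (0/1)

# Healthy people smoke less and have lower BMI; no direct smoking->BMI effect
smoker = (rng.random(n) < np.where(healthy == 1, 0.1, 0.5)).astype(int)
bmi = 23.0 + 4.0 * (1 - healthy) + rng.normal(scale=1.0, size=n)

# Naive (confounded) difference in mean BMI: smokers vs non-smokers
naive = bmi[smoker == 1].mean() - bmi[smoker == 0].mean()

# Adjusted: stratify on the confounder, weight by stratum proportion
adjusted = sum(
    (bmi[(smoker == 1) & (healthy == h)].mean()
     - bmi[(smoker == 0) & (healthy == h)].mean()) * np.mean(healthy == h)
    for h in (0, 1)
)
print(round(naive, 2), round(adjusted, 2))  # naive is biased, adjusted ~ 0
```

The naive comparison suggests a smoking effect on BMI, the stratified estimate (correctly) does not.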


Correlation without Reason

Berkson’s Paradox


> Typically, if one has read a really good book, then its movie version is disappointing. Vice versa, there are many cases of a really good film where the book is just mediocre. But there are seemingly only few examples of a good book and a good film! > This is called Berkson’s paradox. > Also called Berkson’s bias or collider bias.


Correlation without Reason

Berkson’s Paradox
The selection of books to make movies from is not random!
... because we rarely observe the combination of bad book and bad film
→ creating a skewed distribution


> We do not observe the full population (we did not sample it, or it does not end up in our dataset), hence creating an artificial negative association between variables! > Can be seen as the opposite of the confounder example: instead of one common cause and multiple effects, we observe multiple causes and a single effect (= the selection). > This type of selection bias is common in many real-world datasets. > Another classic example is a wet sidewalk (pavement), due to rain or a sprinkler. > Also present in cases of multiple root causes within a dataset, each having an internal correlation structure, which due to the sampling bias is also mixed up (correlation between indicators of different root causes).
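Berkson's selection effect is easy to reproduce (a sketch with made-up quality scores and a made-up selection rule): book and film quality are independent overall, but selecting only "notable" pairs induces a negative correlation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
book = rng.normal(size=n)
film = rng.normal(size=n)            # independent of book quality

r_all = np.corrcoef(book, film)[0, 1]

selected = (book + film) > 1.0       # e.g., only "notable" pairs get observed
r_sel = np.corrcoef(book[selected], film[selected])[0, 1]
print(round(r_all, 2), round(r_sel, 2))  # ~0 overall, clearly negative after selection
```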


Correlation without Reason

Berkson’s Paradox
More data (more instances): including all combinations (if possible), e.g., also an equal amount of bad books/films
More knowledge: document all sampling strategies, e.g., identify potential colliders, constraints (not plausible)
More data (more instances): to allow for controlling → binning, e.g., treat the respective other variable as a confounder


> Often, fair sampling is not possible. > Then, the other (spuriously correlating) variable(s) can be treated as confounders and need to be controlled for. Again, by binning, with the risk of small bins and skewed datasets. > Another piece of domain knowledge might be the plausibility check, e.g., it does not make sense that the better the book, the worse the film! > In many cases this only allows us to detect implausible correlations, but not to correct for them. > Another example: Multiple root causes in the data cause a collider, causing the root causes to appear correlated.


Correlation without Reason

Simpson’s Paradox


> What do you see? A vase or two faces on the left side, and dolphins or a couple on the right side? > Both are correct, but it depends on the interpretation. > The same (or at least a similar) phenomenon we can observe in data, but in data (with the help of additional information) we can even answer which is (more) correct.


Correlation without Reason

Simpson’s Paradox


> Question: Is the CFR higher in Italy than in China? I.e., is the total with the blue background correct, or are the individual age groups with the yellow background? (CFR = Case Fatality Rate = what fraction of people die after being diagnosed with Covid-19) > First approach: It depends on the question! > For the question: I am Italian, are my chances better than a Chinese person's, the answer is no. > For the question: I am Italian and 33 years old, are my chances better than a Chinese person's of equal age, the answer is yes. > We cannot answer the question: I am Italian and 33 years old, would my chances be better if I lived in China. > With the help of domain knowledge we can answer the question: To answer the original question, do we need to control for age?


Correlation without Reason

Simpson’s Paradox
Explanation: Both variables (country, age) have an influence on CFR. The relations between the variables and their strengths determine what we see.
Solution: Country influences age more ... the total (blue = Italy worse than China) is correct.


> The difference is whether the data generating process conforms to a mediator or confounder. > The solution is actually an assumption, i.e., age is not a confounder. > It would be vice versa if age were more influential, i.e., if old people decided to move to Italy.
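The reversal can be reproduced with a tiny made-up table (illustrative numbers, not the real Covid-19 data): country A has the lower fatality rate in every age group, yet the higher rate overall, because its cases are concentrated in the old (high-risk) group.

```python
# Deaths and cases per age group for two hypothetical countries A and B
deaths = {"A": {"young": 1, "old": 90},
          "B": {"young": 20, "old": 100}}
cases = {"A": {"young": 1_000, "old": 1_000},
         "B": {"young": 10_000, "old": 1_000}}

def cfr(country, groups=("young", "old")):
    # case fatality rate over the given age groups
    d = sum(deaths[country][g] for g in groups)
    c = sum(cases[country][g] for g in groups)
    return d / c

per_group = {g: (cfr("A", (g,)), cfr("B", (g,))) for g in ("young", "old")}
total = (cfr("A"), cfr("B"))
print(per_group, total)   # A better in each group, yet worse in total
```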


Correlation without Reason

Simpson’s Paradox Observations

1. Correlation, where there should be none
2. No correlation, where there should be
3. Reversal of outcomes


> Not only limited to correlations, but also applies to other types of associations (e.g., Italy better than China, treatment A better than treatment B, ...)


Correlation without Reason

Simpson’s Paradox
More knowledge: Understand/document data generation, e.g., identify potential mediators, confounders, etc.
More data: More instances per variable value, e.g., enough people per age group


> The domain knowledge is most important here.


Correlation without Reason

Summary
Reasons: Randomness (e.g., too many variables); Data generation (e.g., confounder); Data collection (e.g., sampling); Data processing (e.g., fusion of datasets)
Solutions: Domain knowledge (e.g., implausible dependencies); More data (e.g., fair sampling, more (controlled) experiments); Assumptions (e.g., smoothness, complete dataset); Constraints (e.g., time)


> Assumptions are then typically made by the data scientist. > The assumption of a complete dataset (there are no unobserved confounders) is also called sufficiency in the literature.


Potential Outcomes

Causal Framework proposed by Donald Rubin


Potential Outcomes

Motivation
Based on the notion of treatment and outcome
With the treatment T_i ∈ {0, 1}
... and i indicating the instance, e.g., patient
Then the outcome y_i^T consists of
y_i^0 ... outcome, if not receiving the treatment
y_i^1 ... outcome, if received the treatment


> https://blog.methodsconsultants.com/posts/pearl-causality/ > For example, the treatment could be a drug (potential cure) a patient receives (vs. a placebo). > The outcome here is whether the patient recovered. > Recommended literature: > Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469), 322-331.


Potential Outcomes

Definition
Potential outcome
Given the treatment and outcome: t, y
The potential outcome for instance i: y_i^t
The outcome one would have observed, if i had received treatment t


> Potential outcomes are modelled after randomised controlled experiments. > Hard part: isolate the individual effect of the treatment.


Potential Outcomes

Average Treatment Effect
Causal effect of intervention
Difference in outcome(s)
Individual Treatment Effect (ITE)
τ_i = y_i^1 − y_i^0
Average Treatment Effect (ATE)
ATE = E[τ_i] = E[y_i^1 − y_i^0]

> Since we want to measure how well our drug performs. > ITE is on patient level, as the difference between potential outcomes of a certain instance under two different treatments. > ATE is defined on population level, since we cannot administer a drug to a patient and not do so at the same time. > There is also the conditional average treatment effect (CATE) for analysis of specific sub-populations. > Please note, the ATE is not specific to potential outcomes.
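In a simulation, where, unlike in real data, both potential outcomes of every instance are known, ITE and ATE are direct to compute (a sketch with made-up outcome distributions):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
y0 = rng.normal(loc=5.0, scale=1.0, size=n)     # potential outcome without treatment
y1 = y0 + 2.0 + rng.normal(scale=0.5, size=n)   # potential outcome with treatment

ite = y1 - y0            # individual treatment effect tau_i
ate = ite.mean()         # ATE = E[tau_i]
print(round(ate, 1))     # close to the true effect of 2.0
```

In observational data only one of y0/y1 is observed per instance (the "fundamental problem of causal inference"), which is why the assumptions on the following slides are needed.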


Potential Outcomes

Assumptions
Stable unit treatment value assumption (SUTVA)
Well-defined treatment levels: same treatment value → same treatment
No interference: the potential outcome is not influenced by other instances’ treatment


> SUTVA can be split into two assumptions


Potential Outcomes

Assumptions (cont.) Consistency Outcome is independent of treatment assignment process


> Additional assumption that needs to hold.


Potential Outcomes

Assumption - Ignorability
Treatment assignment should be independent of the potential outcomes
Y_i^0, Y_i^1 ⊥⊥ T_i
This assumption is called ignorability (unconfoundedness)
There are different ways to achieve this
Randomised controlled experiment
Propensity score matching
Regression discontinuity
Instrumental variables
...


> We should not select the treatment based on the patient (or her condition); see the Wikipedia example on Simpson's Paradox on kidney stones! > Also called unconfoundedness.


Potential Outcomes

Matching methods
Divide the data into groups (strata, bins)
Grouping is defined via a function f(x)
... with x being the features
... to create homogeneous groups, i.e., they differ just in treatment and potential outcome
Each group is treated as a randomised controlled experiment
Compute the ATE between groups


> Propensity score matching being a special case of matching methods.


Potential Outcomes

Propensity score matching
The propensity score is such a grouping function: f(x) := P(t | x)
The probability of receiving a treatment
Needs to be estimated, e.g., via Logistic Regression

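A minimal end-to-end sketch of propensity score matching: the data generating process and the hand-rolled logistic regression below are illustrative, not part of the slides; in practice a library estimator would be used.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4_000
x = rng.normal(size=n)                                        # a single confounding feature
t = (rng.random(n) < 1 / (1 + np.exp(-1.5 * x))).astype(int)  # treatment depends on x
y = 2.0 * t + 1.0 * x + rng.normal(size=n)                    # true treatment effect is 2.0

# Estimate the propensity score P(t|x) via logistic regression
# (plain gradient ascent on the log-likelihood)
w, b = 0.0, 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(w * x + b)))
    w += 0.5 * np.mean((t - p) * x)
    b += 0.5 * np.mean(t - p)
ps = 1 / (1 + np.exp(-(w * x + b)))

# Match each treated unit to the control unit with the closest score
treated = np.where(t == 1)[0]
control = np.where(t == 0)[0]
matches = control[np.abs(ps[control][None, :] - ps[treated][:, None]).argmin(axis=1)]
att = np.mean(y[treated] - y[matches])   # effect on the treated, ~ true effect
print(round(att, 1))
```

A naive comparison of treated vs. untreated means would be biased upwards here, because units with high x are both more likely to be treated and have higher outcomes.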

Structural Causal Model

Causal Framework proposed by Judea Pearl


Structural Causal Model

Structural Equation Models (SEM)
e.g., Z = b_0 + b_1 X + b_2 Y
Structural Causal Models (SCM)
Without assuming a functional form
Consider random variables Z_1, ..., Z_n, and for each: Z_i = f_i(PA_i, U_i)
where PA_i are the direct parents, and U_i is a noise term


> SEMs are often assumed to be linear and parametric (there are important exceptions). > SEMs have been used in statistics for a long time, becoming more popular in the 1970s. > The Z_i can also be seen as observables. > Noise/unexplained terms U_i are often omitted for brevity, but typically assumed to be there. > Furthermore, the U_i are jointly independent from each other. > There is a single noise term for each variable, which represents all influences outside of the model (confounders, measurement noise, ...). > The dependencies given by the structural causal model (i.e., which parents each node Z_i has) can also be represented by a graph, the causal graph. > For computer scientists: An SCM is a program to generate data (following the respective distributions).
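The "SCM as a program" view can be made literal (a sketch for a chain X → Y → Z with made-up linear functions; each variable is computed from its parents plus an independent noise term):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000

u_x, u_y, u_z = rng.normal(size=(3, n))   # jointly independent noise terms
x = u_x                                   # X := f_X(U_X)
y = 2.0 * x + u_y                         # Y := f_Y(X, U_Y)
z = -1.0 * y + u_z                        # Z := f_Z(Y, U_Z)

r = np.corrcoef(x, z)[0, 1]
print(round(r, 2))   # clearly negative: X influences Z through Y
```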


Structural Causal Model

Pearl’s do() Notation
Conditional distribution for intervention on X: P(Z | do(X = x))
If we can compute this, we can compute the causal effect:
P(Z = z | do(X = 1)) − P(Z = z | do(X = 0))
Average treatment effect:
ATE = E[y | do(t = 1)] − E[y | do(t = 0)]


> The do() operator represents an intervention on the variable, > e.g., in a dataset of smokers and non-smokers, artificially set all to non-smokers. > So, we only need to be able to compute P(Z | do(X)). > ... it turns out that this is possible (in certain cases). > Important takeaway: the interventional distribution P(Z | do(X = x)) is not the same as the conditional distribution P(Z | X = x) (even if there are cases where they are identical).


Structural Causal Model

Pearl’s do() Notation
We need to map the do() operator to something we can compute: P(Z | do(X = x))
It might be
P(Z | do(X = x)) = P(Z), or
P(Z | do(X = x)) = P(Z | X = x), or
P(Z | do(X = x)) = Σ_{y ∈ Y} P(Z | X = x, Y = y) P(Y = y), or
... even more complex
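The third mapping (the adjustment formula) can be checked on a tiny discrete example (all probabilities made up; Y is a confounder of X and Z): the interventional and the ordinary conditional distribution come out differently.

```python
p_y = {0: 0.5, 1: 0.5}                      # P(Y = y)
p_x1_given_y = {0: 0.8, 1: 0.2}             # P(X = 1 | Y = y)
p_z1_given_xy = {(0, 0): 0.1, (0, 1): 0.5,  # P(Z = 1 | X = x, Y = y)
                 (1, 0): 0.4, (1, 1): 0.8}

def p_x_given_y(x, y):
    return p_x1_given_y[y] if x == 1 else 1 - p_x1_given_y[y]

def p_z1_do_x(x):
    # adjustment formula: sum_y P(Z=1 | X=x, Y=y) P(Y=y)
    return sum(p_z1_given_xy[(x, y)] * p_y[y] for y in (0, 1))

def p_z1_given_x(x):
    # ordinary conditioning: sum_y P(Z=1 | X=x, Y=y) P(Y=y | X=x)
    p_x = sum(p_x_given_y(x, y) * p_y[y] for y in (0, 1))
    return sum(p_z1_given_xy[(x, y)] * p_x_given_y(x, y) * p_y[y]
               for y in (0, 1)) / p_x

print(round(p_z1_do_x(1), 2), round(p_z1_given_x(1), 2))  # 0.6 vs 0.48
```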


Structural Causal Model

Pearl’s do() Notation How to map? The mapping of the interventional space to the observational space, i.e., the realisation of P(Z|do(X = x)), depends on the causal structure!


> Additionally, Pearl suggests preferring causal graphs over SEMs/SCMs, since humans cope better with graphical representations.

www.tugraz.at

Causal Graph

Simple graphical language to capture (relevant aspects of) the data generation process

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 47 www.tugraz.at

Causal Graph

Causal Graph Causal graph Extension of Bayesian networks Nodes represent variables/observables/... Edges represent causal relationships With an arrow pointing from the cause to the effect

→ Directed acyclic graphs (DAG)

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 48

> Often, unobservable and observable variables are both included; typically the ones we can measure are grey. > The graphs do not need to be acyclic, but in practice it is hard to model cyclic dependencies. > Note: in contrast to Bayesian networks, causal graphs can be manipulated via interventions. > Note: The causal graph might be part of a causal model, but a causal graph alone is not a complete causal model.

slide-13
SLIDE 13

www.tugraz.at

Causal Graph

Causal Graph X Z UZ Represents the SCM: Z := fZ(X, UZ) The combination of X and UZ causes Z

→ X ⊥̸⊥ Z (X and Z are dependent)

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 49

> Typically, the noise term is omitted (and in the following slides it will not be shown). > To be more clear: Any change in X will likely effect changes in Z, but not vice versa (as X and UZ are independent, X ⊥⊥ UZ). > If we had a dataset that conforms to the causal graph, we would (faithfully) expect X and Z to correlate, but purely from the data alone we could not infer a causal relationship (in general).

www.tugraz.at

Causal Graph

Causal Graph - Chain X Y Z Chain (cascade): X causes Y, Y causes Z

→ X ⊥̸⊥ Z, but X ⊥⊥ Z | Y

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 50

> No coincidence this looks like a Markov chain. > In fact, P(X, Y, Z) = P(X)P(Y | X)P(Z | Y). > X and Z are not independent, but once we “know” Y, we no longer need X, since all we could learn about Z (from X) is already “contained” in Y. > Another relationship is the data processing inequality; here X might be the raw data, Y the preprocessed dataset, and Z the processing results. > The data processing inequality states that Y cannot “invent” new information about X that would be helpful for Z. > In data science, from a purely theoretical standpoint, we can only lose information while (pre-)processing the data.
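The chain's (conditional) independence pattern can be checked by simulation; a stdlib-only sketch, using residualization as a stand-in for partial correlation (coefficients are illustrative):

```python
import random

def corr(a, b):
    n = len(a); ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

def residualize(t, c):
    """Residuals of t after a simple linear regression on c."""
    n = len(t); mt, mc = sum(t) / n, sum(c) / n
    beta = sum((x - mc) * (y - mt) for x, y in zip(c, t)) \
         / sum((x - mc) ** 2 for x in c)
    return [y - mt - beta * (x - mc) for x, y in zip(c, t)]

rng = random.Random(1)
X = [rng.gauss(0, 1) for _ in range(20000)]
Y = [x + rng.gauss(0, 1) for x in X]    # X -> Y
Z = [y + rng.gauss(0, 1) for y in Y]    # Y -> Z

print(round(corr(X, Z), 2))             # clearly non-zero: X and Z correlate
# partial correlation given Y: correlate the residuals after regressing out Y
print(round(corr(residualize(X, Y), residualize(Z, Y)), 2))   # ~ 0
```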

www.tugraz.at

Causal Graph

Causal Graph - Fork X Y Z Fork, or common cause: X causes Y and Z

→ Y ⊥̸⊥ Z (spurious), Y ⊥⊥ Z | X

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 51

> One cause, with multiple effects. > With X as the common cause, Y and Z are no longer independent (i.e., we will expect them to correlate). > But conditioned on X, they will be independent (i.e., knowing X will render them independent); in the linear case this corresponds to a vanishing partial correlation. > For example, a healthy lifestyle causes exercise and causes a healthy diet. > See also Reichenbach’s common cause principle, which combines the fork with time: > ”If an improbable coincidence has occurred, there must exist a common cause”

www.tugraz.at

Causal Graph

Causal Graph - Confounder X Y Z Confounder: X causes Y and Z, and Y causes Z Assuming Z is the dependent variable, X is a confounder for the relationship between Y and Z

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 52

> In this case, Y might be the treatment and Z the effect. > The presence of X modifies the relationship between treat- ment and effect. > For example: X are genes, Y is smoking, and Z is lung cancer. > If we want to compute the influence of smoking on lung can- cer, we have to remove the influence of genes.

slide-14
SLIDE 14

www.tugraz.at

Causal Graph

Causal Graph - Mediator X Y Z Mediator: Y causes Z directly and via X

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 53

> For example: Y is smoking, X is tar in the lungs, and Z is lung cancer. > For completeness’ sake, there is also a moderator (similar to the mediator), as well as mediated moderation and moderated mediation (can be seen as part of the noise variable, but then the noise is no longer independent from the causal parent).

www.tugraz.at

Causal Graph

Causal Graph - Mediator X Y Z Mediator: Y causes Z directly and indirectly via X

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 54

> If we want to compute the influence of smoking on lung cancer, we have to inspect two causal pathways. > In practice this is often challenging to estimate.

www.tugraz.at

Causal Graph

Causal Graph - Collider X Y Z Collider: Y and Z cause X.

→ Y ⊥⊥ Z, but Y ⊥̸⊥ Z | X

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 55

> For example: Y is smoking, Z is air pollution by cars, and X is lung cancer. > Smoking and air pollution are independent, but once we observe lung cancer, they no longer are. > i.e., conditioning on the collider will cause the causes to correlate. > Note: If Y and Z were not independent, we would need an edge between them. > Same as the example with the good film, good book. > Collider bias is created via sample selection, stratification, or covariate adjustments.
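Collider bias via sample selection can be reproduced in a few lines; a stdlib-only sketch (variable roles follow the smoking/pollution/lung-cancer example, all coefficients illustrative):

```python
import random

def corr(a, b):
    n = len(a); ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(7)
Y = [rng.gauss(0, 1) for _ in range(20000)]              # e.g. smoking
Z = [rng.gauss(0, 1) for _ in range(20000)]              # e.g. air pollution
X = [y + z + rng.gauss(0, 0.5) for y, z in zip(Y, Z)]    # collider, e.g. lung cancer

print(round(corr(Y, Z), 2))   # ~ 0: the causes are independent

# conditioning on the collider (here via sample selection on high X)
sel = [(y, z) for y, z, x in zip(Y, Z, X) if x > 1.0]
selY, selZ = [y for y, _ in sel], [z for _, z in sel]
print(round(corr(selY, selZ), 2))   # clearly negative within the selection
```

Among "lung cancer cases" the two causes become negatively correlated, exactly the selection/stratification bias described in the note.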

www.tugraz.at

Causal Graph

Causal Graph - Variations X Y Z L Q C Some additional notation: explicit connection for collider causes, unobserved variable via dotted line, conditioned variable via boxes, observed variables via grey nodes

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 56

> There is currently no universal agreement on the actual graphical notation.

slide-15
SLIDE 15

www.tugraz.at

Causal Graph

d-Separation
The d-separation helps to identify all influence factors
For example: building a prediction model

e.g., We want to predict y, and x is d-separated → we do not need to include x in the model

Its counterpart is the d-connectedness

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 57

> For the example, the two sets might be (i) the dependent variable (which we would like to predict), and (ii) all independent variables (with the goal to find those that actually have an influence). > Applications: feature selection, parsimonious models. > See also: http://bayes.cs.ucla.edu/BOOK-2K/d-sep.html

www.tugraz.at

Causal Graph

d-Separation Given two nodes (or set of nodes): A, B

  • 1. Identify all paths between the nodes, ignoring the direction
  • 2. Identify all nodes on the paths that are conditioned on, add to set C
  • 3. Use the direction to identify colliders

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 58

> All independent variables that are d-separated can be ignored.

www.tugraz.at

Causal Graph

d-Separation d-sep(A, B, C) - the covariances b/w A and B will be zero, given C (i.e., A and B are conditionally independent given C)

  • 1. A node in C separates, but
  • 2. A collider in C does not, but
  • 3. A collider not in C does separate
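The three rules can be turned into a small checker. A sketch for tiny DAGs given as child-adjacency dicts (path enumeration, so only suitable for small graphs; no cycle or input validation):

```python
def descendants(dag, node):
    """All nodes reachable from `node` via directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def undirected_paths(dag, a, b):
    """All simple paths between a and b in the skeleton (directions ignored)."""
    nbrs = {}
    for u, children in dag.items():
        for v in children:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    paths, stack = [], [[a]]
    while stack:
        path = stack.pop()
        for v in nbrs.get(path[-1], ()):
            if v in path:
                continue
            (paths.append if v == b else stack.append)(path + [v])
    return paths

def d_separated(dag, a, b, cond):
    """True iff every path between a and b is blocked given the set `cond`."""
    for path in undirected_paths(dag, a, b):
        blocked = False
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            is_collider = node in dag.get(prev, ()) and node in dag.get(nxt, ())
            if is_collider:
                # rules 2/3: a collider blocks unless it, or a descendant, is in C
                if node not in cond and not (descendants(dag, node) & cond):
                    blocked = True
                    break
            elif node in cond:   # rule 1: a chain/fork node in C blocks
                blocked = True
                break
        if not blocked:
            return False
    return True

chain = {'X': {'Y'}, 'Y': {'Z'}}
print(d_separated(chain, 'X', 'Z', set()), d_separated(chain, 'X', 'Z', {'Y'}))
collider = {'Y': {'X'}, 'Z': {'X'}}
print(d_separated(collider, 'Y', 'Z', set()), d_separated(collider, 'Y', 'Z', {'X'}))
```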

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 59

> In short, conditioned/observed nodes do separate (recall conditioning on the “middle” variable in the causal chain) and unconditioned nodes do not. > For colliders it is just the opposite. > e.g., in a regression, the coefficients of variables d-separated given C will be zero once the variables in C are included (leave one out).

www.tugraz.at

Causal Graph

Causal Graph
Causal graphs and causal discovery: given some observational data, can one infer the causal graph?
→ only up to a point, i.e., equivalence classes, and only with some (strong) assumptions
Different graphs, but they cannot be distinguished

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 60

> This directly addresses the question of how the observational data (e.g., conditional probabilities, correlations) and the causal graph are related. > e.g., If two variables correlate, do we expect them to be connected via an edge in the causal graph? > e.g., If two variables are connected in the causal graph, do we expect them to correlate in the observations? > Cannot distinguish chains and forks.

slide-16
SLIDE 16

www.tugraz.at

Causal Graph

Causal Graph - Summary Causal graph represents the data generation process One part of a causal model (e.g., SCM) Intuitive for humans In many cases sufficient to conduct causal inference Allows to assess the “causal influences”

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 61

> The d-separation guides us, which variables are expected to be independent (assuming we already observe others).

www.tugraz.at

Causal Inference

Given data and a causal model, how do we estimate the causal effects?

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 62 www.tugraz.at

Causal Inference

Motivation Want to measure effect of treatment, T, on the outcome Z Depending on the causal structure12 ... it will be easy ... it will be possible ... it will be impossible

1Given sufficiently many observations 2Given some assumptions

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 63

> For example, what is the impact of smoking on lung cancer? > What is the impact of the pressure of the printing machine on the quality of the t-shirts? > What is the impact of enlarging the “buy” button on my shopping web site on the purchase behaviour? > Recall the causal effect: P(Z = z | do(X = 1)) − P(Z = z | do(X = 0))

www.tugraz.at

Causal Inference

Trivial Case If there is no connection and there are no confounders P(Z|do(T = t)) = P(Z), i.e., there is no effect T Z No edge/path between T and Z, no confounders between T and Z

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 64

> For example, an intervention on smoking (getting a smoker to quit smoking) is not expected to change the quality in the t-shirt factory. > While this may sound trivial, the causal graph and d-sep() guide us to infer which relations are independent.

slide-17
SLIDE 17

www.tugraz.at

Causal Inference

Simple Case If there are no confounders and no causal parents P(Z|do(T = t)) = P(Z|T = t), only the direct effect T Z No incoming edges on T, no confounders between T and Z

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 65

> Then our observations directly match the effects. > There might even be some mediators on the path between T and Z. > Or there might be many other outgoing edges from T, which we can all ignore here. > Note: Variables which have no incoming edges are also called exogenous variables.

www.tugraz.at

Causal Inference

Backdoor Case
If there are no confounders, but causal parents
We close/block the backdoor of PA onto T:
P(Z | do(T = t)) = Σ_{pa∈PA} P(Z | T = t, PA = pa) P(PA = pa)
T PA Z: incoming edges on T from its causal parents PA, the backdoor

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 66

> Since we are interested in the isolated effect of the treatment, not in the combined influences that go into the treatment, we need to remove (debias) their influence. > If all peers smoke, this influences the decision on smoking, but one is not interested in learning the influence of the peers on lung cancer. > Note: Consider a more complex causal graph, where PA also has a cause - we can close the backdoor with any variable on the “backdoor path” (i.e., the grandparents).

www.tugraz.at

Causal Inference

Backdoor Collider Case If there is a backdoor which is a collider We do not close/block the backdoor of the collider W P(Z|do(T = t)) = P(Z|T = t), since the unconditioned collider blocks the path T W Z C Collider W creates an additional path between T and Z (T → W ← C → Z)

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 67

> If, in this case, we controlled for the collider, we would introduce an unwanted bias.

www.tugraz.at

Causal Inference

Confounded Case
If there are observed confounders (not blocked)
For a single confounder C:
P(Z | do(T = t)) = Σ_{c∈C} P(Z | T = t, C = c) P(C = c)
T C Z: C is a confounder influencing both the treatment T and the outcome Z

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 68

> This formula is also known as adjustment formula. > We adjust for the bias introduced by the confounder, i.e., we seek to remove its influence. > Classical example: effect of smoking on lung cancer, with the genes being the confounder.
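The adjustment formula can be applied directly to stratified counts; a stdlib-only simulation sketch with hypothetical probabilities (the true effect of T is +0.4 by construction):

```python
import random

rng = random.Random(3)
rows = []
for _ in range(100_000):
    c = rng.random() < 0.3                      # confounder, e.g. genes
    t = rng.random() < (0.7 if c else 0.2)      # treatment depends on the confounder
    z = rng.random() < (0.2 + 0.4 * t + 0.3 * c)  # outcome; causal effect of t is +0.4
    rows.append((c, t, z))

def p_z_given(t, c=None):
    """Empirical P(Z=1 | T=t [, C=c])."""
    sub = [r for r in rows if r[1] == t and (c is None or r[0] == c)]
    return sum(r[2] for r in sub) / len(sub)

def adjusted(t):
    """Adjustment formula: sum_c P(Z=1 | T=t, C=c) P(C=c)."""
    return sum(p_z_given(t, c) * sum(r[0] == c for r in rows) / len(rows)
               for c in (False, True))

naive_effect = p_z_given(True) - p_z_given(False)   # biased by the confounder
adj_effect = adjusted(True) - adjusted(False)       # close to the true +0.4
print(round(naive_effect, 2), round(adj_effect, 2))
```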

slide-18
SLIDE 18

www.tugraz.at

Causal Inference

Instrumental Variables If there are unobserved confounders Idea: introduce variation independent from confounders T Y

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 69

> See also: https://p-hunermund.com/2018/10/30/you-cant-test-instrument-validity/ > The IV can be seen as an experiment, also called a surrogate experiment (with a surrogate variable).

www.tugraz.at

Causal Inference

Instrumental Variables (IV)
An instrumental variable IV satisfies:
IV ⊥̸⊥ T | X (the instrument influences the treatment)
IV ⊥⊥ Y | X, do(T = t) (no effect on the outcome except through the treatment)
with T being the treatment, Y the outcome, X the features, and C unobserved confounders
Causal graph: IV → T → Y, with features X1...n and unobserved confounder C between T and Y

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 70

> The IV influences the outcome only via the treatment: > ... no direct path > ... no unobserved confounders between IV and outcome > The shown causal graph is just an example; the relation b/w IV and T could also be a chain, or a conditioned collider. > The second assumption is important, since we use IVs to compute the effect of T on Y. > Note: There is no way to judge if the assumptions for IV are fulfilled by the data alone; we require domain knowledge (a single unobserved confounder may render our results useless). > Tools like DAGitty allow to automatically find IVs: https://cran.r-project.org/web/packages/dagitty/vignettes/dagitty4semusers.html

www.tugraz.at

Causal Inference

Instrumental Variables Instrumental variables allow to estimate the local average treatment effect (LATE) of T on the outcome Assumption: the relationship between IV and T needs to be monotone Specific to the chosen IV An estimator is needed, e.g., the Wald estimator for binary treatment and instrument

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 71

> Not always clear if the LATE is representative for the full population.
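A sketch of the Wald estimator on simulated data with an unobserved confounder (all coefficients hypothetical; the treatment effect is a constant 2.0 by construction, so LATE = ATE here):

```python
import random

rng = random.Random(5)
true_effect = 2.0
data = []
for _ in range(200_000):
    c = rng.gauss(0, 1)                   # unobserved confounder
    iv = 1 if rng.random() < 0.5 else 0   # binary instrument, e.g. randomized encouragement
    t = 1 if 0.8 * iv + c + rng.gauss(0, 1) > 0.4 else 0   # monotone in iv
    y = true_effect * t + 1.5 * c + rng.gauss(0, 1)
    data.append((iv, t, y))

def mean(vs): return sum(vs) / len(vs)

# Wald estimator: (E[Y|IV=1] - E[Y|IV=0]) / (E[T|IV=1] - E[T|IV=0])
dy = mean([y for iv, t, y in data if iv]) - mean([y for iv, t, y in data if not iv])
dt = mean([t for iv, t, y in data if iv]) - mean([t for iv, t, y in data if not iv])
wald = dy / dt

# naive treated-vs-untreated comparison is biased by the confounder
naive = mean([y for _, t, y in data if t]) - mean([y for _, t, y in data if not t])
print(round(wald, 2), round(naive, 2))   # wald ~ 2.0, naive clearly larger
```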

www.tugraz.at

Causal Inference

Front-Door Case
If there are unobserved backdoors and confounders
The front-door F blocks all directed paths b/w T and Z:
P(Z | do(T = t)) = Σ_f P(f | t) Σ_{t′} P(z | t′, f) P(t′)
T F Z C: all backdoor paths (of F) are blocked by T, no unblocked backdoor path b/w T and F

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 72

> For example, T is smoking, C are genes (unobservable confounder), Z is lung cancer, and our front-door F would be tar deposits in the lung. > We assume that genes do not play a role in tar deposition. > This is also called the front-door adjustment.
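The front-door adjustment can be verified exactly on a toy binary model: compute the observational joint, apply the formula, and compare with the ground truth obtained by mutilating the SCM (all CPTs below are hypothetical):

```python
from itertools import product

# Hypothetical CPTs for the graph C -> T, T -> F, F -> Z, C -> Z
pC = {0: 0.6, 1: 0.4}
pT1 = {0: 0.3, 1: 0.8}                      # P(T=1 | C=c)
pF1 = {0: 0.2, 1: 0.75}                     # P(F=1 | T=t)
def pZ1(f, c): return 0.1 + 0.5 * f + 0.3 * c   # P(Z=1 | F=f, C=c)

def pr(v, p1): return p1 if v else 1.0 - p1

# full observational joint P(c, t, f, z)
joint = {(c, t, f, z): pC[c] * pr(t, pT1[c]) * pr(f, pF1[t]) * pr(z, pZ1(f, c))
         for c, t, f, z in product((0, 1), repeat=4)}

def marg(**fix):
    """Marginal probability of the fixed values, summing out the rest."""
    keys = ('c', 't', 'f', 'z')
    return sum(p for vals, p in joint.items()
               if all(vals[keys.index(k)] == v for k, v in fix.items()))

def front_door(t):
    """P(Z=1 | do(T=t)) from observational quantities only (never touches C)."""
    return sum(marg(t=t, f=f) / marg(t=t) *
               sum(marg(t=t2, f=f, z=1) / marg(t=t2, f=f) * marg(t=t2)
                   for t2 in (0, 1))
               for f in (0, 1))

def truth(t):
    """Ground truth via the mutilated SCM: cut C -> T and force T=t."""
    return sum(pC[c] * pr(f, pF1[t]) * pZ1(f, c) for c in (0, 1) for f in (0, 1))

for t in (0, 1):
    print(t, round(front_door(t), 6), round(truth(t), 6))   # the two agree
```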

slide-19
SLIDE 19

www.tugraz.at

Causal Inference

Simpson’s Paradox Recall the Covid-19 case

A - age, C - country, CFR - case fatality rate A C CFR Age (A) is a confounder A C CFR Age (A) is a mediator We assumed age to be a mediator → CFR is higher in Italy (and the total causal effect (TCE) is the difference in CFRs).

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 73

> Here the intervention can be seen as changing the country from China (no treatment) to Italy (treatment). > What is the correct question if we want to find out: what is the effect of the country (Italy) on the CFR? > Possible questions: > - What is the average effect of the country? (mediator) > - What is the age-group effect of the country? (confounder) > In our case we assume age to be the mediator, as (in this model) the country causes people to be older → we now know which is the right question to ask.

www.tugraz.at

Causal Inference

Causal Effects Total causal effect (TCE)

“What would be the effect on mortality of changing the country from China to Italy?”

Controlled direct effect (CDE)

“For 50–59 year-olds, is it safer to get the disease in China or in Italy?” Controlling for a value of the mediator (i.e., different for each age group)

Natural direct effect (NDE)

“For the Chinese case demographic, would the Italian approach have been better?”

Natural indirect effect (NIE)

“How would the overall CFR in China change if the case demographic had instead been that from Italy, while keeping all else (i.e., the CFR’s of each age group) the same?”

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 74

> Measuring the causal effect in various ways to learn about the causal implications. > Mediation analysis splits the total causal effect into direct and indirect effects. > In real-world scenarios it is often difficult or even impossible to control both the treatment and the mediator. > Much more, e.g., Sample Average Treatment Effect (SATE), Population Average Treatment Effect (PATE), Population Average Treatment Effect for the Treated (PATT), Conditional Average Treatment Effect, ... > For mediation analysis also see: https://david-salazar.github.io/2020/08/26/causality-mediation-analysis/

www.tugraz.at

Causal Inference

Causal Effects

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 75

Evolution of TCE, NDE, and NIE of changing country from China to Italy on total CFR over time. We compare static data from China [27] with different snapshots from Italy reported by [10]. The direct effect initially was negative, meaning that age-specific mortality in Italy was lower; however, it changed sign around mid-March, when an overloaded health system in northern Italy was reported [1]. The indirect effect remains mostly constant at a substantial +3–3.5%. > von Kügelgen, J., Gresele, L. and Schölkopf, B. (2020) ‘Simpson’s paradox in Covid-19 case fatality rates: a mediation analysis of age-related causal effects’, pp. 10–19.

www.tugraz.at

Causal Inference

Simpson’s Paradox #2 Example: Kidney Stone Treatment

Stone size     Treatment A      Treatment B      Interpretation
Small stones   93% (81/87)      87% (234/270)    A > B
Large stones   73% (192/263)    69% (55/80)      A > B
Both           78% (273/350)    83% (289/350)    A < B

Patients who suffer from kidney stones receive either treatment A or B, and then the success of the treatment is measured; over multiple patients a success rate can be computed. We are interested to know: which treatment is better?

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 76

> Example taken from Wikipedia: https://en.wikipedia.org/wiki/Simpson’s_paradox
> There are two treatments (A, B) for kidney stones, where the stones have different sizes (small, large). > The outcome is the success rate of the treatment (in percent). > When conditioned on the stone size, treatment A appears to be better than B (for both small and large stones), but in total the direction is reversed. > Note: The numbers in brackets specify the sizes of the groups, where we can observe a skewed distribution (while there are as many patients receiving each of the two treatments, in this case 350 per treatment).
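The reversal and its resolution via the adjustment formula can be reproduced from the table's counts:

```python
# successes and totals per (treatment, stone size), from the table above
data = {('A', 'small'): (81, 87),   ('A', 'large'): (192, 263),
        ('B', 'small'): (234, 270), ('B', 'large'): (55, 80)}

def rate(tr, size=None):
    cells = [v for (t, sz), v in data.items() if t == tr and size in (None, sz)]
    return sum(s for s, n in cells) / sum(n for s, n in cells)

print(rate('A'), rate('B'))   # crude rates: 0.78 < ~0.83, B looks better overall

# adjustment formula: weight each stratum's rate by the overall P(stone size)
n_total = sum(n for _, n in data.values())
def adjusted(tr):
    return sum(rate(tr, sz) *
               sum(n for (_, s2), (_, n) in data.items() if s2 == sz) / n_total
               for sz in ('small', 'large'))

print(adjusted('A'), adjusted('B'))   # ~0.83 > ~0.78: A better once size is adjusted for
```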

slide-20
SLIDE 20

www.tugraz.at

Causal Inference

Simpson’s Paradox #2 Size Treatment Success

Stone size is a confounder, i.e., A > B

Size Treatment Success

Stone size is a mediator, i.e., A < B Since the doctors already assume treatment A to be better, they assign more severe cases (i.e., larger stones) to treatment A (and less severe cases to treatment B) → the size of the stone has a causal effect on the treatment (stone size is a confounder), A > B.

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 77

> Crucially, the size has an influence on the outcome as well; in fact, in this case we expect that the “influence” of the stone size is bigger than the influence of the treatment alone, known as Cornfield’s conditions: > P(success | small stone) − P(success | large stone) > P(success | treatment B) − P(success | treatment A) > In this case: 0.16 > 0.05 > Schield, M. (1999) ‘Simpson’s paradox and Cornfield’s conditions’, ASA Proceedings of the Section on Statistical Education, 1999, pp. 106–111. > This is a classical example of bias in data science; we often assume the treatment to be randomised, while in practice it often is not. > e.g., if the workers/engineers in the t-shirt manufacturing plant already assume a certain machine to provide better/worse results, this may bias the results. > Practical problem of data science: how do we find these potential confounders?

www.tugraz.at

Causal Inference

Causal Inference - Summary We need to study the causal relationships Which we assume to be given (and correct) We study, which variables are observed (conditioned) We select the approach and derive the matching formula Finally, apply on the data

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 78

> In many cases it may happen that we require binning. > It is important that the bins are sufficiently large, i.e., enough data points/instances available per bin.

www.tugraz.at

Causal Discovery

Given data, can we infer the causal model?

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 79 www.tugraz.at

Causal Discovery

Assumptions Sufficiency: there are no hidden confounders Markov assumption: an event is independent from its non-descendants, if conditioned on its parents (Weak) faithfulness: we might observe a correlation if there is a causal relationship Faithfulness: we expect to observe a correlation if there is a causal relationship

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 80

> Often, faithfulness is required because, if due to spurious reasons we do not observe a correlation even though the data generation process (causality) may indicate one, there is little chance to successfully recover the correct causal structure.

slide-21
SLIDE 21

www.tugraz.at

Causal Discovery

Additive Noise Model Recall SCM: Yi = fi(Xi, Ui) For the Additive Noise Model (ANM) we assume Yi = fi(Xi) + Ui, with Xi ⊥⊥ Ui

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 81 www.tugraz.at

Causal Discovery

Additive Noise Model We further assume a non-linear function and a “bounded” noise

→ we expect the noise on Y to be independent from X

While, vice versa, we do not expect this behaviour

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 82

> Image taken from: Lopez-Paz, D. et al. (2017) ‘Discovering causal signals in images’, Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp. 58–66. doi: 10.1109/CVPR.2017.14

www.tugraz.at

Causal Discovery

Additive Noise Model Application Regress Y on X Non-linear, good fit Compute residuals E If X and E are independent, then X causes Y (if they are not, test the reverse direction)
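A sketch of this procedure, with bin-mean regression as a stand-in for a proper non-linear regressor and correlation of residual magnitudes as a crude proxy for an independence test (real implementations use kernel independence tests such as HSIC; all coefficients are illustrative):

```python
import random

def bin_regress(x, y, bins=40):
    """Nonparametric fit of E[y|x] via bin means; returns the residuals of y."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    sums, counts = [0.0] * bins, [0] * bins
    idx = [min(int((v - lo) / width), bins - 1) for v in x]
    for i, v in zip(idx, y):
        sums[i] += v; counts[i] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [v - means[i] for i, v in zip(idx, y)]

def corr(a, b):
    n = len(a); ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(11)
X = [rng.uniform(-2, 2) for _ in range(20000)]
Y = [x + 0.2 * x**3 + rng.uniform(-0.5, 0.5) for x in X]   # additive noise, X -> Y

# dependence of residual *magnitude* on the regressor, as a crude proxy
dep_fwd  = abs(corr([abs(e) for e in bin_regress(X, Y)], [abs(v) for v in X]))
dep_back = abs(corr([abs(e) for e in bin_regress(Y, X)], [abs(v) for v in Y]))
print(dep_fwd, dep_back)   # forward small, backward clearly larger -> X causes Y
```

The asymmetry arises because in the anti-causal direction the residual spread varies with the value of Y, so the residuals cannot be independent of the regressor.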

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 83 www.tugraz.at

Causal Discovery

Granger Causality Identify causal relationships in multivariate time series Intuition If Xi is uniquely helpful to predict future values of Xj, in the presence of other predictive time series Xk (may be multiple) ... then we assume Xi → Xj, i.e., Xi forecasts Xj Conditional ignorability assumption is not satisfied

→ assumes no hidden confounders

e.g., for stock market prediction need to know influence of other stocks, economy, ...

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 84

> Extended intuition: ablation study to single out the predictive power of Xi on Xj > e.g., compare prediction using P(Xj | Xk) vs P(Xj | Xk, Xi) > Quote: “it cannot be used to discover real causality”, as “the values of both treatment variable X and control variable Y may be driven by a third variable” > Tsapeli, F., Musolesi, M. and Tino, P. (2017) ‘Non-parametric causality detection: An application to social media and financial data’, Physica A: Statistical Mechanics and its Applications, 483, pp. 139–155. doi: 10.1016/j.physa.2017.04.101 > “Transfer entropy is a model-free equivalent of Granger causality.” > Does not detect causality, but mere temporally related phenomena.

slide-22
SLIDE 22

www.tugraz.at

Causal Discovery

Granger Causality Typically solved via (linear) regression Individual coefficients for lagged causality i.e., the causal relationship may manifest itself only after some observations

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 85

> The lag is typically assumed to be constant and independent from other factors. > Another typical assumption is stationarity of the time series.
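The intuition can be sketched with a single lagged regressor (a real Granger test also conditions on the target's own past and uses an F-test; the lag and all coefficients here are illustrative):

```python
import random

rng = random.Random(2)
n = 5000
X = [rng.gauss(0, 1) for _ in range(n)]
# Z is driven by X two steps earlier, plus its own noise
Z = [0.0, 0.0] + [0.8 * X[t - 2] + rng.gauss(0, 1) for t in range(2, n)]

def var(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

def resid_var(target, predictor, lag):
    """Residual variance of the simple regression target[t] ~ predictor[t-lag]."""
    ys, xs = target[lag:], predictor[:-lag]
    my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
    beta = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) \
         / sum((a - mx) ** 2 for a in xs)
    return var([b - beta * a for a, b in zip(xs, ys)])

print(resid_var(Z, X, 2) / var(Z))   # clearly < 1: past X helps forecast Z
print(resid_var(X, Z, 2) / var(X))   # ~ 1: past Z does not help forecast X
```

The lagged predictor reduces the residual variance only in the direction that matches the data generation process, which is the "uniquely helpful for prediction" criterion from the slide.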

www.tugraz.at

Conclusions

Practical Aspects and Conclusion

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 86 www.tugraz.at

Conclusions

Causal Data Science Process Gain an understanding of the domain e.g., causal graph, what happens when, five whys, Ishikawa diagram, FMEA Gain an understanding of the data e.g., correlation analysis, visual tools, dimensionality reduction, EDA Formulate questions/hypotheses Find answers (by following the causal pathways) Iterate! (e.g., refine questions, gather more data)

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 87 www.tugraz.at

Conclusions

Causal Data Science Process Best practice Check, if the causal relationships (from the domain expert) hold true in the data (faithfulness) Collect constraints not available as causal relationship Keep held-out data to validate findings If possible, run controlled experiments to experimentally validate findings Be aware of assumptions and their implications

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 88

slide-23
SLIDE 23

www.tugraz.at

Conclusions

Practical Aspects Confounders Unobserved confounding nonetheless remains a major obstacle in practice

→ include more data (more variables) into the dataset

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 89

> https://fairmlbook.org/causal.html.

www.tugraz.at

Conclusions

Practical Aspects Building causal graphs is not trivial3 Important: the causal graph is the statistical, causal interpretation of the underlying data generation process When to merge/split nodes? What confounders realistically exist? Which can simply be ignored? Which causal relationships are transitive? Mediator vs. moderator?

3Even with domain knowledge

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 90

> The data scientist should draw the causal graph, not the domain expert.

www.tugraz.at

Conclusions

Practical Aspects Split the event/variable into an unobservable and an observable part Often it is assumed that a measurement equals the treatment T T∗ Y T is the true treatment (not observed), T∗ is a measurement (observed)

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 91 www.tugraz.at

Conclusions

Takeaway Message Causality is a powerful tool Guides the data scientist to ask the correct questions ... and to correctly answer them (e.g., unbiased) But, purely data-driven causal inference is not possible ... domain knowledge (e.g., via causal graphs) is always needed ... if not available, avoid jumping to (causal) conclusions

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 92

slide-24
SLIDE 24

www.tugraz.at

Conclusions

The End

Thank you for your attention!

Roman Kern, ISDS, TU Graz Knowledge Discovery and Data Mining 2 (Version 1.0.4) 93