SLIDE 1

Entropic Causal Inference

Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath and Babak Hassibi

University of Texas at Austin

November 28, 2019

Presented by Amirkasra Jalaldoust

SLIDES 2–9

Outline

Problem Definition
Approach
Background and Notation
Identifiability (H0)
Identifiability (H1)
Greedy Entropy Minimization
Experiments

SLIDES 10–15

Problem Definition

Pair of random variables: (X, Y) ∼ p_{X,Y}
Causal discovery: decide the direction, X → Y or Y → X
Structural causal model: E ∼ p_E, Y = f(X, E)
Causal sufficiency: X ⊥⊥ E
Example (additive noise): f(X, E) = f(X) + E; linear causal mechanism: f(X) = A·X + µ
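
To make the setup concrete, here is a minimal sketch (mine, not from the slides) of such a discrete structural causal model: X and E are drawn independently, Y = f(X, E) is deterministic, and together they induce a joint distribution p_{X,Y}. The state counts and the function table are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                          # states of X and of the exogenous E

p_X = rng.dirichlet(np.ones(n))      # distribution of the cause X
p_E = rng.dirichlet(np.ones(m))      # distribution of the noise E
f = rng.integers(0, n, size=(n, m))  # deterministic mechanism f: [n] x [m] -> [n]

# Causal sufficiency: X and E are sampled independently.
x = rng.choice(n, size=10_000, p=p_X)
e = rng.choice(m, size=10_000, p=p_E)
y = f[x, e]                          # Y = f(X, E)

# Empirical joint distribution p_{X,Y} induced by the model.
p_XY = np.zeros((n, n))
np.add.at(p_XY, (x, y), 1.0 / len(x))
print(np.round(p_XY, 3))
```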

SLIDES 16–18

Approach

The use of information theory as a tool for causal discovery, e.g. Granger causality, directed information, etc.


SLIDE 21

Approach

Key assumption: the exogenous noise E is "simple" in the correct causal direction.
Occam's razor: there should not be much complexity left outside the causal model.

SLIDE 22

Approach

Focus on discrete random variables, i.e. categorical variables: p_X(i) = P(X = i).
Notion of simplicity: Rényi entropy,
    H_a(X) = 1/(1 − a) · log(Σ_i p_X(i)^a)
This work emphasizes two special cases:
    Shannon entropy: H_1
    Cardinality: H_0
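
A quick numerical illustration (mine, not the slides'): H_0 is the log of the support size, and H_1 recovers Shannon entropy as a → 1.

```python
import numpy as np

def renyi_entropy(p, a):
    """Rényi entropy H_a(p) = log(sum_i p_i^a) / (1 - a), in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if a == 0:                        # H_0: log of the support size
        return np.log2(len(p))
    if a == 1:                        # limit a -> 1: Shannon entropy
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p ** a)) / (1 - a)

p = [0.5, 0.25, 0.25]
print(renyi_entropy(p, 0))            # log2(3) ≈ 1.585 bits
print(renyi_entropy(p, 1))            # 1.5 bits (Shannon)
```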

SLIDE 23

Approach

Objective: find the minimum H(E) such that Y = f(X, E) is feasible, i.e. such that some deterministic f and some E ⊥⊥ X induce the observed p_{X,Y}.

SLIDE 24

Identifiability (H0)

Causal model: M = ({X, Y}, E, f, X → Y, p_{X,E})
Independent, identically distributed samples: {(x_i, y_i)}_i ∼ p_{X,Y}
Task: decide X → Y or Y → X, given the joint distribution p_{X,Y}.

SLIDE 25

Identifiability (H0)

Throughout, both X and Y have cardinality n, and E has cardinality m.

Definition (Conditional distribution matrix). The n × n matrix Y|X with Y|X(i, j) := P(Y = i | X = j). The vector vec(Y|X), with vec(Y|X)(i + (j − 1)n) = Y|X(i, j), is called the conditional distribution vector.

Definition (Block partition matrices). Consider a matrix M ∈ {0, 1}^(n² × m). Let m_{i,j} denote the (i + (j − 1)n)-th row of M, and let S_{i,j} = {k ∈ [m] : m_{i,j}(k) = 1}. The matrix M is called a block partition matrix if it belongs to
    C := {M ∈ {0, 1}^(n² × m) : ∪_{i ∈ [n]} S_{i,j} = [m] and S_{i,j} ∩ S_{l,j} = ∅ for all i ≠ l}.
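
For concreteness, a small sketch (mine) of both objects, computed from a random joint distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
p_XY = rng.dirichlet(np.ones(n * n)).reshape(n, n)  # p_XY[j, i] = P(X=j, Y=i)

# Conditional distribution matrix: Y|X(i, j) = P(Y = i | X = j).
p_X = p_XY.sum(axis=1)
cond = (p_XY / p_X[:, None]).T      # rows i (values of Y), columns j (values of X)

# Conditional distribution vector: stack the columns, so the entry at
# i + (j - 1)n (1-indexed) equals Y|X(i, j).
vec = cond.T.reshape(-1)            # column-major stacking of `cond`
assert np.isclose(vec[0 + 1 * n], cond[0, 1])   # i = 1, j = 2 in 1-indexed terms
```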

SLIDE 26

Identifiability (H0)

Equivalent condition for the existence of a causal mechanism:

Lemma (Lemma 1). Given discrete random variables X, Y with distribution p_{X,Y}, there exists a causal model M = ({X, Y}, E, f, X → Y, p_{X,E}) with E of cardinality m if and only if there exist M ∈ C and e ∈ R^m_+ with Σ_i e(i) = 1 that satisfy vec(Y|X) = Me.
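
The "if" direction can be seen computationally. In this toy sketch of mine (arbitrary n = 2, m = 3), each column block of M partitions the states of E among the values of Y, which is exactly a deterministic f(x, e); the induced conditional is then vec(Y|X) = Me.

```python
import numpy as np

n, m = 2, 3
# For each x, assign each state of E to a value of Y: this table is f(x, e).
f = np.array([[0, 0, 1],    # f(x=0, e) for e = 0, 1, 2
              [1, 0, 1]])   # f(x=1, e)

# Build the block partition matrix M: row i + j*n (0-indexed) has a 1 in
# column k iff f(j, k) = i, i.e. S_{i,j} = {e : f(j, e) = i}.
M = np.zeros((n * n, m))
for j in range(n):
    for k in range(m):
        M[f[j, k] + j * n, k] = 1

e = np.array([0.5, 0.3, 0.2])       # distribution of E, sums to 1
vec = M @ e                          # vec(Y|X), columns stacked
print(vec.reshape(n, n).T)           # Y|X: each column P(Y = . | X = j) sums to 1
```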

SLIDE 27

Identifiability (H0)

Lemma (Upper bound on the minimum cardinality of E). Let X, Y be two random variables with joint distribution p_{X,Y}(x, y), each with n states. Then there exists a causal model Y = f(X, E), X ⊥⊥ E, that induces p_{X,Y} with m = |E| ≤ n(n − 1) + 1. Moreover, if the columns of Y|X are uniformly sampled points in the (n − 1)-dimensional simplex, then n(n − 1) states are necessary for E.
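
The slides do not include the proof, but one standard construction achieving this bound (sketched here under that assumption) couples all columns through a single E uniform on [0, 1]: cut [0, 1] at the CDF breakpoints of every column of Y|X, and let f(x, ·) map each resulting interval to the corresponding value of Y. Each column has at most n − 1 interior breakpoints, so at most n(n − 1) cuts, i.e. n(n − 1) + 1 intervals (states of E).

```python
import numpy as np

def exogenous_states(cond, tol=1e-12):
    """States needed by the coupled E: intervals of [0, 1] obtained by cutting
    at the CDF breakpoints of every column of cond (cond[i, j] = P(Y=i|X=j));
    f(x, e) is the value of Y whose interval in column x contains e."""
    n = cond.shape[1]
    cuts = set()
    for j in range(n):
        cdf = np.cumsum(cond[:, j])[:-1]      # up to n-1 interior breakpoints
        cuts.update(float(c) for c in cdf if tol < c < 1 - tol)
    return len(cuts) + 1                      # intervals = interior cuts + 1

rng = np.random.default_rng(2)
n = 4
cond = rng.dirichlet(np.ones(n), size=n).T    # columns: random simplex points
print(exogenous_states(cond), "<=", n * (n - 1) + 1)
```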

SLIDE 28

Identifiability (H0)

True causal direction: Y = f(X, E), X ⊥⊥ E
Wrong causal direction: X = g(Y, Ẽ), Ẽ ⊥⊥ Y
Under mild assumptions on the generation of f, X, and E (rather than on Y|X directly), the same lower bound holds.

SLIDE 29

Identifiability (H0)

Definition (Generic function). Let Y = f(X, E), where the variables X, Y, E have supports 𝒳, 𝒴, ℰ, respectively. Let S_{y,x} = f_x^{−1}(y) ⊆ ℰ be the inverse map, i.e. S_{y,x} = {e ∈ ℰ : y = f(x, e)}. A function f is called "generic" if for each triple (x_1, x_2, y) with x_1 ≠ x_2, f_{x_1}^{−1}(y) ≠ f_{x_2}^{−1}(y), and for every pair (x, y), f_x^{−1}(y) ≠ ∅.

A randomly chosen causal mechanism f is generic almost surely(!)
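
The definition can be checked mechanically; a toy checker of mine, with f stored as a table f[x, e]:

```python
import numpy as np

def is_generic(f, n_y):
    """Check genericity of f[x, e]: every preimage f_x^{-1}(y) must be
    nonempty, and preimages must differ across distinct x for every y."""
    n_x = f.shape[0]
    pre = [[frozenset(np.flatnonzero(f[x] == y)) for y in range(n_y)]
           for x in range(n_x)]
    for y in range(n_y):
        for x1 in range(n_x):
            if not pre[x1][y]:                    # empty preimage
                return False
            for x2 in range(x1 + 1, n_x):
                if pre[x1][y] == pre[x2][y]:      # identical preimages
                    return False
    return True

rng = np.random.default_rng(3)
f = rng.integers(0, 3, size=(3, 9))               # random f: [3] x [9] -> [3]
print(is_generic(f, 3))                           # True with high probability
```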

SLIDE 30

Identifiability (H0)

Theorem (Identifiability). Consider the causal model M = ({X, Y}, E, f, X → Y, p_{X,E}), where the random variables X, Y have n states, E ⊥⊥ X has θ states, and f is a generic function. If the distributions of X and E are uniformly randomly selected from the (n − 1)- and (θ − 1)-dimensional simplices, then with probability 1, any Ẽ ⊥⊥ Y that satisfies X = g(Y, Ẽ) for some deterministic function g has cardinality at least n(n − 1).

SLIDE 31

Identifiability (H0)

Assume we have an algorithm A that, given the joint distribution of X and Y, outputs E and f such that Y = f(X, E) with E of minimum cardinality.

Corollary

The causal direction can be recovered with probability 1 if the original exogenous variable E has cardinality less than n(n − 1), the causal mechanism f is generic, and the distributions of X and E are selected uniformly randomly from the appropriate simplices.

SLIDE 32

Identifiability (H0)

Proposition (Inference algorithm). Suppose X → Y. Let X ∈ 𝒳, Y ∈ 𝒴, |𝒳| = n, |𝒴| = m. Assume that A is the algorithm that finds the exogenous variables E, Ẽ of minimum cardinality in the two directions. Then, if the underlying exogenous variable has cardinality less than n(m − 1), with probability 1 we have
    H_0(X) + H_0(E) < H_0(Y) + H_0(Ẽ).

Unfortunately, it turns out that no efficient algorithm A exists unless P = NP:

Definition (Subset sum problem). For a given set of integers V and an integer a, decide whether there exists a subset S ⊆ V such that Σ_{u ∈ S} u = a.

SLIDE 33

Identifiability (H1)

THE EXACT SAME STORY! (now with Shannon entropy H_1 in place of the cardinality H_0)

SLIDE 34

Identifiability (H1)

Theorem (Minimum entropy causal model). Assume there exists an algorithm A that, given n random variables {Z_i}, i ∈ [n], with distributions p_i, each with n states, outputs the minimum-entropy joint distribution over the Z_i consistent with the given marginals. Then A can be used to find the causal model with minimum input entropy, given any joint distribution p_{X,Y}.

However, finding the causal model with minimum entropy of the exogenous variable that induces a given distribution p_{X,Y} is NP-hard!

SLIDE 35

Identifiability (H1)

Conjecture. Consider the causal model with random variables X, Y having n states and E ⊥⊥ X having θ states. If the distribution of X is uniformly randomly selected from the (n − 1)-dimensional simplex, the distribution of E is uniformly selected from the distributions that satisfy H_1(E) ≤ log(n) + O(1), and f is randomly selected from all functions f : [n] × [θ] → [n], then with high probability, any Ẽ ⊥⊥ Y that satisfies X = g(Y, Ẽ) for some deterministic g entails
    H_1(X) + H_1(E) < H_1(Y) + H_1(Ẽ).

SLIDE 36

Greedy Entropy Minimization
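
The slide's content (the algorithm figure) is not in the transcript. Below is a sketch of the greedy idea as I understand it from the paper: given the marginals of the {Z_i}, repeatedly take the largest remaining mass in each marginal, assign their minimum to the joint outcome formed by those argmax states, and subtract it from every marginal. This yields a feasible, low-entropy (not necessarily minimum-entropy) joint distribution, and hence an exogenous E via the theorem on slide 34.

```python
import numpy as np

def greedy_min_entropy_joint(marginals, tol=1e-12):
    """Greedily build a low-entropy joint distribution consistent with the
    given marginals by repeatedly matching the largest remaining masses."""
    marginals = [np.asarray(p, dtype=float).copy() for p in marginals]
    outcomes, masses = [], []
    while True:
        idx = [int(np.argmax(p)) for p in marginals]  # largest mass per marginal
        mass = min(p[i] for p, i in zip(marginals, idx))
        if mass <= tol:
            break
        outcomes.append(tuple(idx))                   # one joint state of (Z_1, ..., Z_n)
        masses.append(mass)
        for p, i in zip(marginals, idx):
            p[i] -= mass                              # consume the assigned mass
    return outcomes, np.array(masses)

def shannon(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

outcomes, q = greedy_min_entropy_joint([[0.5, 0.3, 0.2], [0.6, 0.4]])
print(q, shannon(q))   # joint masses [0.5 0.3 0.1 0.1] and their entropy in bits
```

Each iteration zeroes out at least one marginal entry, so the loop terminates after at most the total number of marginal states.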

SLIDE 37

Experiments
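
The experiment figures are not in the transcript. For completeness, here is a hedged sketch of the entropic decision rule that the experiments presumably evaluate (the H_1 variant): estimate the conditionals in both directions, greedily build a low-entropy exogenous variable for each via the theorem on slide 34, and declare the direction with the smaller total entropy H(cause) + H(noise). It reuses greedy_min_entropy_joint and shannon from the previous sketch and assumes all marginal probabilities are positive.

```python
def direction_score(p_XY):
    """H(X) + H(E~) for the candidate model Y = f(X, E): the marginals fed
    to the greedy coupler are the conditionals p(Y | X = j), one per value
    of X (rows of p_XY index X)."""
    p_X = p_XY.sum(axis=1)
    conds = [p_XY[j] / p_X[j] for j in range(len(p_X))]   # assumes p_X[j] > 0
    _, e = greedy_min_entropy_joint(conds)
    return shannon(p_X) + shannon(e)

def infer_direction(p_XY):
    # Smaller total entropy H(cause) + H(noise) wins.
    return "X -> Y" if direction_score(p_XY) < direction_score(p_XY.T) else "Y -> X"
```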
