SLIDE 1

High-dimensional consistency in score-based and hybrid structure learning

Marloes Maathuis, joint work with Preetam Nandy and Alain Hauser

SLIDE 2

Structure learning

◮ We consider random variables (X1, . . . , Xp) with distribution F0,
  where F0 is multivariate Gaussian (or nonparanormal)

◮ We assume that F0 has a perfect map G0

◮ Based on n i.i.d. observations from F0, we want to learn the
  CPDAG of G0

SLIDE 9

Terminology...

◮ We consider directed acyclic graphs (DAGs), where each node
  represents a random variable

◮ A DAG encodes d-separations (Pearl). Example:
  X1 → X2 → X3 encodes that X1 and X3 are d-separated by X2.

◮ A DAG G is a perfect map of a distribution F if
  {d-separations in G} = {conditional independencies in F}

◮ Examples:
  ◮ (X1, X2, X3) with X1 ⊥⊥ X3 | X2: 3 perfect maps:
    X1 → X2 → X3, X1 ← X2 ← X3, X1 ← X2 → X3
  ◮ (X1, X2, X3) with X1 ⊥⊥ X3: 1 perfect map:
    X1 → X2 ← X3 (v-structure)
  ◮ (X1, X2, X3, X4) with X1 ⊥⊥ X3 and X2 ⊥⊥ X4: no perfect map

◮ We consider distributions that have a perfect map
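As a quick numerical illustration of the chain example above (a sketch, not part of the original slides; the coefficients and sample size are arbitrary): in the Gaussian case, the d-separation of X1 and X3 by X2 shows up as a vanishing partial correlation, while the marginal correlation stays clearly nonzero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Chain X1 -> X2 -> X3: X1 and X3 are d-separated by X2,
# so their partial correlation given X2 should vanish.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)   # residual of a on c
    rb = b - np.polyval(np.polyfit(c, b, 1), c)   # residual of b on c
    return np.corrcoef(ra, rb)[0, 1]

marg = np.corrcoef(x1, x3)[0, 1]   # clearly nonzero (about 0.45 here)
part = partial_corr(x1, x3, x2)    # approximately zero
```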

SLIDE 13

Markov equivalence classes and CPDAGs

◮ DAGs that encode the same set of d-separations form a Markov
  equivalence class. Example:
  X1 → X2 → X3, X1 ← X2 ← X3, X1 ← X2 → X3

◮ All DAGs in a Markov equivalence class share the same skeleton
  and the same v-structures

◮ A Markov equivalence class can be described uniquely by a CPDAG.
  We want to learn the CPDAG.
  [Figure: a CPDAG on X1, . . . , X4 and the three DAGs (DAG 1, DAG 2,
  DAG 3) in its Markov equivalence class]

SLIDE 14

Possible applications of DAGs/CPDAGs

◮ Efficient estimation/computation using the factorization

  f(x1, . . . , xp) = ∏_{j=1}^p f(xj | pa(xj, G))

◮ Probabilistic reasoning in expert systems
◮ Causal inference
◮ ...
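As a numeric sanity check of the factorization (a sketch using a hypothetical two-node DAG X1 → X2 with X2 = 0.5 X1 + ε, not an example from the slides), the joint Gaussian density indeed equals the product of the per-node conditionals:

```python
import numpy as np

def gauss_pdf(x, mean=0.0, var=1.0):
    """Univariate Gaussian density."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

b = 0.5
# Implied joint covariance of (X1, X2) for X1 ~ N(0,1), X2 = b*X1 + N(0,1)
cov = np.array([[1.0, b], [b, b * b + 1.0]])
x = np.array([0.3, -1.2])

# Bivariate Gaussian density, evaluated directly
inv = np.linalg.inv(cov)
joint = np.exp(-0.5 * x @ inv @ x) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

# Markov factorization over the DAG: f(x1, x2) = f(x1) * f(x2 | pa(x2))
factored = gauss_pdf(x[0]) * gauss_pdf(x[1], mean=b * x[0], var=1.0)
```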

SLIDE 21

CPDAG versus conditional independence graph

◮ A conditional independence graph (CIG) is an undirected graph, where
  Xi and Xj are adjacent ⇔ Xi ⊥⊥ Xj | S does not hold for
  S = {all remaining variables}

◮ A CPDAG is a partially directed graph, where Xi and Xj are adjacent
  ⇔ Xi ⊥⊥ Xj | S does not hold for all S ⊆ {all remaining variables}

◮ The skeleton of the CPDAG is a subgraph of the CIG

◮ The CIG can be obtained from the CPDAG by “moralization”:
  marry unmarried parents and then make all edges undirected

◮ Example: [Figure: a CPDAG on X1, . . . , X4, the same graph after
  marrying unmarried parents, and the resulting CIG]
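The moralization step above can be sketched in a few lines (a minimal illustration, not from the slides; the graph representation as a parent dictionary is an assumption of this sketch). The example DAG is X3 ← X1, X3 ← X2, X4 ← X2, X4 ← X3, so moralization adds the edge X1 – X2 for the unmarried parents of X3:

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG: marry unmarried parents, drop directions.
    `parents` maps each node to the set of its parents."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                                  # undirect original edges
            edges.add(frozenset((p, child)))
        for p, q in combinations(sorted(pa), 2):      # marry co-parents
            edges.add(frozenset((p, q)))
    return edges

# Example DAG: X3 <- X1, X3 <- X2, X4 <- X2, X4 <- X3
dag = {1: set(), 2: set(), 3: {1, 2}, 4: {2, 3}}
cig = moralize(dag)   # contains the extra married edge {1, 2}
```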

SLIDE 22

Summary of problem definition

◮ We consider random variables (X1, . . . , Xp) with distribution F0,
  where F0 is multivariate Gaussian (or nonparanormal)

◮ We assume that F0 has a perfect map G0

◮ Based on n i.i.d. observations from F0, we want to learn the
  CPDAG of G0

SLIDE 23

Three main approaches for structure learning

◮ Constraint-based:
  ◮ Conditional independencies in the data impose constraints on the
    CPDAG
  ◮ Example: PC algorithm (Spirtes et al. ’93)

◮ Score-based:
  ◮ A score function is optimized over the space of DAGs/CPDAGs
  ◮ Example: greedy equivalence search (GES) (Chickering ’02)

◮ Hybrid:
  ◮ A score function is optimized over a restricted space of
    DAGs/CPDAGs, where the restricted space is determined using
    conditional independence constraints
  ◮ Examples: Max-Min Hill Climbing (MMHC) (Tsamardinos et al. ’06),
    restricted GES (RGES: GES restricted to an estimated CIG)

SLIDE 27

Constraint-based: PC-algorithm

◮ Algorithm:
  ◮ Start with a complete undirected graph
  ◮ Determine the skeleton by conducting many conditional independence
    tests in a clever order, removing an edge if conditional
    independence cannot be rejected at level α (tuning parameter)
  ◮ Determine v-structures
  ◮ Determine the orientation of the remaining edges

◮ High-dimensional consistency results
  (Kalisch & Bühlmann ’08, Harris & Drton ’13, Colombo & M ’14)

◮ Scales well to large graphs: polynomial time complexity in the
  number of nodes for sparse graphs

◮ Frequently applied in high-dimensional settings
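The skeleton phase can be sketched as follows (a toy version, not the actual PC implementation: the test ordering is naive and the level is hard-coded via a normal quantile; function names are illustrative). Each edge is removed as soon as some conditional independence cannot be rejected by a Fisher z-test:

```python
import numpy as np
from itertools import combinations

def partial_corr(C, i, j, S):
    """Partial correlation of Xi, Xj given {Xs : s in S}, read off the
    inverted submatrix of the correlation matrix C."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(C[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

def pc_skeleton(C, n, crit=3.29):
    """Toy PC skeleton phase: start from the complete graph and remove an
    edge when some CI test is not rejected. crit = 3.29 is the two-sided
    normal quantile for level alpha ~ 0.001 (Fisher z-test)."""
    p = C.shape[0]
    adj = {(i, j) for i, j in combinations(range(p), 2)}
    for size in range(p - 1):                 # growing conditioning sets
        for i, j in sorted(adj):
            others = [k for k in range(p) if k not in (i, j)]
            for S in combinations(others, size):
                r = partial_corr(C, i, j, S)
                z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - size - 3)
                if abs(z) < crit:             # CI not rejected: drop edge
                    adj.discard((i, j))
                    break
    return adj

# Chain X1 -> X2 -> X3: the estimated skeleton should be {(0,1), (1,2)}
rng = np.random.default_rng(1)
n = 20_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
skel = pc_skeleton(np.corrcoef([x1, x2, x3]), n)
```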


Score-based: GES

◮ Aims to optimize the ℓ0-penalized likelihood score:

  −2 loglikelihood + λ (nr of edges)

◮ Conducts a greedy search over the space of CPDAGs.
  It can move from CPDAG C to CPDAG C′ if some DAG in C and some DAG
  in C′ differ by a single edge addition/deletion.
  [Figure: DAGs G1, . . . , G4 in C and G′1, G′2, G′3 in C′, linked by
  a single edge addition/deletion]

  Hence, the skeletons of C and C′ differ by one edge, but the
  orientations may differ for many edges.

SLIDE 33

Score-based: GES

◮ Aims to optimize the ℓ0-penalized likelihood score:

  −2 loglikelihood + λ (nr of edges)

◮ Conducts a greedy search over the space of CPDAGs:
  ◮ Forward phase: choose the best possible moves with single edge
    additions until the score can no longer be improved
  ◮ Backward phase: choose the best possible moves with single edge
    deletions until the score can no longer be improved

◮ Not guaranteed to find the global optimum for finite samples.
  But, despite the greedy search, it is consistent (for fixed p)
  (Chickering ’02)

◮ No high-dimensional consistency result:
  the global optimum of the ℓ0-penalized likelihood score is consistent
  (Van de Geer & Bühlmann ’13), but there is no algorithm to find the
  global optimum

SLIDE 36

Hybrid: RGES

◮ Idea:
  ◮ The skeleton of the CPDAG is a subgraph of the CIG
  ◮ Estimation of the CIG is well-behaved and well-understood
  ◮ We can use an estimated CIG as a restricted search space

◮ Algorithm:
  ◮ Estimate the CIG, e.g., using neighborhood selection with tuning
    parameter γ (Meinshausen & Bühlmann ’06)
  ◮ Apply GES with tuning parameter λ, only allowing edges from the
    estimated CIG in the forward phase. The backward phase remains
    unchanged.

◮ Consistency unknown even for fixed p
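A low-dimensional sanity check of the restricted search space idea (a sketch, not from the slides): for a Gaussian distribution the CIG can be read off the inverse covariance matrix, since Xi and Xj are CIG-adjacent exactly when the (i, j) precision entry is nonzero; neighborhood selection is the regression-based counterpart of this that also works in high dimensions. The four-node DAG below has moral edge X1 – X2, so the CIG has five edges while the CPDAG skeleton has four:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
e = rng.normal(size=(4, n))

# Example DAG: X3 <- X1, X3 <- X2, X4 <- X2, X4 <- X3
x1, x2 = e[0], e[1]
x3 = 1.4 * x1 + 1.3 * x2 + e[2]
x4 = 1.2 * x2 + 0.9 * x3 + e[3]

# CIG edges = nonzero off-diagonal entries of the precision matrix.
# Marrying the parents X1, X2 of X3 creates the extra edge X1 - X2,
# while X1 and X4 stay non-adjacent (precision entry ~ 0).
P = np.linalg.inv(np.cov([x1, x2, x3, x4]))
cig = {(i, j) for i in range(4) for j in range(i + 1, 4)
       if abs(P[i, j]) > 0.05}   # threshold swallows sampling noise
```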

SLIDE 37

Simulation results: average ROC curves

  n    100   200   300   400
  pn   300   600   1200  2400
  en   2     2.8   3.5   4

[Figure: average ROC curves (true positive rate vs false positive rate)
for GES, RGES and PC, for the skeleton and the directed part, at
p = 300, 600, 1200 and 2400]

ROC curves were obtained by varying the tuning parameters α and λ;
γ was fixed at a sufficiently small value.

SLIDE 38

Simulation results: computation time

  n    100   200   300   400
  pn   300   600   1200  2400
  en   2     2.8   3.5   4

[Figure: average runtime in seconds (log scale, roughly 1–1000 s)
versus the number of variables (300–2400) for GES, PC and RGES]

Tuning parameters α and λ were chosen to get roughly the right
sparsity; γ was fixed at a sufficiently small value.

SLIDE 39

Summary

         speed   estimation    fixed p       high-dimensional
                 performance   consistency   consistency
  PC     ✓       ✗             ✓             ✓
  GES    ✗       ✓             ✓             ?
  RGES   ✓       ✓             ?             ?

SLIDE 40

Example

Consider the following structural equation model:

  X1 ← ε1
  X2 ← ε2
  X3 ← 1.4 X1 + 1.3 X2 + ε3
  X4 ← 1.2 X2 + 0.9 X3 + ε4

where the error variables ε1, . . . , ε4 are i.i.d. N(0, 1).
[Figure: the corresponding DAG with edge weights 1.4, 1.3, 1.2, 0.9]
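A quick simulation of this model (a sketch; the seed and sample size are arbitrary) confirms the implied covariances, e.g. cov(X1, X4) = 0.9 · 1.4 = 1.26 via the directed path X1 → X3 → X4:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
eps = rng.normal(size=(4, n))   # i.i.d. N(0, 1) errors

# The structural equations from the slide
x1 = eps[0]
x2 = eps[1]
x3 = 1.4 * x1 + 1.3 * x2 + eps[2]
x4 = 1.2 * x2 + 0.9 * x3 + eps[3]

# Path X1 -> X3 -> X4 implies cov(X1, X4) = 0.9 * 1.4 = 1.26
c14 = np.cov(x1, x4)[0, 1]
```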

SLIDE 44

Example: oracle output

RGES is inconsistent! The same example shows the inconsistency of MMHC.

SLIDE 49

Visualization

[Figure: the full search space, the restricted search space, and the
true CPDAG]

The true CPDAG lies within the restricted space defined by the CIG, but
the search path of GES may need to leave this space to get there.

SLIDE 50

Summary

         speed   estimation    fixed p       high-dimensional
                 performance   consistency   consistency
  PC     ✓       ✗             ✓             ✓
  GES    ✗       ✓             ✓             ?
  RGES   ✓       ✓             ✗             ✗

SLIDE 54

Solution: adaptively restricted GES (ARGES)

Use an adaptive restriction on the search space:
edges in the estimated CIG + shields of v-structures of the current
state

Intuition: RGES can get stuck at v-structures

SLIDE 56

Adaptively restricted GES (ARGES)

◮ Use an adaptive restriction on the search space:
  edges in the estimated CIG + shields of v-structures of the current
  state

◮ ARGES is consistent for fixed p (Nandy et al. ’16).
  The proof uses a new characterization of independence maps.

◮ We only need a “small” modification to achieve consistency.
  In other words, RGES is “almost consistent”.

◮ The speed and estimation performance of RGES and ARGES
  are very similar

SLIDE 57

Summary

          speed   estimation    fixed p       high-dimensional
                  performance   consistency   consistency
  PC      ✓       ✗             ✓             ✓
  GES     ✗       ✓             ✓             ?
  RGES    ✓       ✓             ✗             ✗
  ARGES   ✓       ✓             ✓             ?

SLIDE 60

Single step score difference and partial correlations

◮ Assume multivariate Gaussianity and the normalized score

  S(G) = −(1/n) loglikelihood + λn (nr of edges)

◮ If G′ = G ∪ {Xi → Xk}, then

  S(G′) − S(G) = (1/2) log(1 − ρ̂²_{ik|pa(k,G)}) + λn

◮ Hence, a move in the forward phase of GES is made only if

  ρ̂²_{ik|pa(k,G)} > 1 − exp(−2λn)
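This score difference and the add-edge threshold fit in a few lines (a sketch; the partial correlation is read off an inverted submatrix of the correlation matrix, and the function name is illustrative):

```python
import numpy as np

def score_gain(C, i, k, pa_k, lam):
    """S(G') - S(G) for adding the edge Xi -> Xk, expressed through the
    sample partial correlation rho_{ik | pa(k, G)} (Gaussian score)."""
    idx = [i, k] + list(pa_k)
    P = np.linalg.inv(C[np.ix_(idx, idx)])
    rho = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
    return 0.5 * np.log(1 - rho ** 2) + lam, rho

# Two standardized variables with correlation 0.5, empty parent set:
C = np.array([[1.0, 0.5], [0.5, 1.0]])
lam = 0.05
gain, rho = score_gain(C, 0, 1, [], lam)

# The edge improves the score (gain < 0) exactly when
# rho^2 > 1 - exp(-2 * lam), matching the threshold above
adds_edge = rho ** 2 > 1 - np.exp(-2 * lam)
```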

SLIDE 62

Connection between (AR)GES and PC

◮ Close connection between (AR)GES and PC:
  ◮ PC starts with the full graph and deletes edges if
    |partial correlation| is small
  ◮ The forward phase of (AR)GES starts with the empty graph and adds
    edges if |partial correlation| is large. The backward phase of
    (AR)GES removes edges if |partial correlation| is small.

◮ Differences between (AR)GES and PC:
  ◮ (AR)GES can recover from mistakes in the forward phase
  ◮ (AR)GES makes optimal single edge moves
  ◮ (AR)GES considers skeleton and orientations simultaneously
  ◮ (AR)GES is not limited to triples of nodes when orienting edges

SLIDE 68

High-dimensional consistency of (AR)GES

◮ In the sample version, we cannot guarantee that (AR)GES makes
  the optimal oracle moves with high probability

◮ But optimal moves are not needed for soundness of the oracle
  version: any move that improves the score is OK

◮ We can guarantee that the sample version of (AR)GES makes a
  “δ-optimal oracle move” with high probability

◮ Consider a high-dimensional scenario like the one considered for PC,
  with the additional assumptions that:
  ◮ P(Î_n ⊇ I_0) → 1 as n → ∞ (for ARGES only)
  ◮ The maximum neighborhood size of the forward phase of any
    δn-optimal oracle (AR)GES version is O(n^{1−b}), 0 < b ≤ 1,
    where δn → 0

◮ Then (AR)GES considers a sub-exponential number of partial
  correlations, and consistency follows as for PC

◮ We do not yet know distributional assumptions that guarantee the
  second assumption

SLIDE 69

Simulation results: average ROC curves

  n    100   200   300   400
  pn   300   600   1200  2400
  en   2     2.8   3.5   4

[Figure: average ROC curves (true positive rate vs false positive rate)
for ARGES*, GES, RGES, ARGES, MMHC and PC, for the skeleton and the
directed part, at p = 300, 600, 1200 and 2400]

ARGES* uses the true CIG. Its good performance shows potential for
improvement.

SLIDE 70

Simulation results: computation time

  n    100   200   300   400
  pn   300   600   1200  2400
  en   2     2.8   3.5   4

[Figure: average runtime in seconds (log scale, roughly 1–5000 s)
versus the number of variables (300–2400) for GES, MMHC, PC, ARGES,
RGES and ARGES*]

SLIDE 71

Summary

          speed   estimation    fixed p       high-dimensional
                  performance   consistency   consistency
  PC      ✓       ✗             ✓             ✓
  GES     ✗       ✓             ✓             ✓
  RGES    ✓       ✓             ✗             ✗
  ARGES   ✓       ✓             ✓             ✓
  MMHC    ✗       ∼             ✗             ✗

SLIDE 72

Summary

◮ Hybrid methods are often a good middle ground between scaling
  properties and estimation performance

◮ Little was known about the (in)consistency of hybrid methods

◮ We showed that MMHC and RGES are inconsistent

◮ We showed that RGES (and variations of it) can be made consistent
  with small adaptations

SLIDE 73

Summary

◮ Score-based and constraint-based methods are usually seen as two
  distinct approaches. We showed a close connection between the two.

◮ This connection allows:
  ◮ insight into the reasons for differences in estimation performance
  ◮ the first high-dimensional consistency proof for score-based and
    hybrid algorithms
  ◮ possibilities for new families of methods, such as GES using rank
    correlations. In fact, we also proved high-dimensional consistency
    of (AR)GES for nonparanormal distributions (cf. Harris & Drton ’13
    for the PC algorithm).

SLIDE 74

Summary

◮ One can also use the skeleton of the CPDAG or the marginal
  independence graph as the restricted search space. Then one has to
  allow shields of all unshielded triples of the current CPDAG in the
  forward phase.

◮ Our work can be combined with other recent work on GES:
  ◮ Selective GES (SGES) (Chickering & Meek, 2015) has a modified
    backward phase that is of polynomial time complexity for sparse
    graphs. One can combine the forward phase of ARGES with the
    backward phase of SGES.
  ◮ There are fast parallel implementations of GES (Ramsey, 2015).
    One can make similar implementations for ARGES.

SLIDE 75

The non-paranormal trick

◮ (g1(X1), . . . , gp(Xp)) has a nonparanormal distribution if
  (X1, . . . , Xp) has a multivariate Gaussian distribution and
  g1, . . . , gp are strictly increasing (or strictly decreasing)
  functions

◮ The sample (Spearman’s or Kendall’s) rank correlation matrix R̂ is
  an estimator of the underlying Gaussian rank correlation matrix R

◮ sin(π R̂ / 2) is a consistent estimator of the underlying Gaussian
  correlation matrix, even in high-dimensional settings
  (Liu et al., 2012)

◮ We showed that the high-dimensional consistency results continue to
  hold in this setting, with sin(π R̂ / 2) as the input for (AR)GES

SLIDE 76

Application to causality: joint-IDA

SLIDE 77

Simulation results: Joint-IDA with ARGES

  # variables      1000
  average degree   4
  sample size      1000

Target set: all triples (i, j, k) for which θ^{(i,j)}_{ik} ≠ 0

SLIDE 79

References

Chickering (2002). Optimal structure identification with greedy search. JMLR.
Chickering & Meek (2015). Selective greedy equivalence search: Finding optimal Bayesian networks using a polynomial number of score evaluations. UAI.
Harris & Drton (2013). PC algorithm for nonparanormal graphical models. JMLR.
Kalisch & Bühlmann (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. JMLR.
Nandy, Hauser & Maathuis (2016). High-dimensional consistency in score-based and hybrid structure learning. arXiv:1507.02608.
Nandy, Maathuis & Richardson (2016). Estimating the effect of joint interventions from observational data in sparse high-dimensional settings. Ann. Statist.
Ramsey (2015). Scaling up greedy equivalence search for continuous variables. arXiv:1507.07749.
Van de Geer & Bühlmann (2013). ℓ0-penalized maximum likelihood for sparse directed acyclic graphs. Ann. Statist.

SLIDE 80

Characterization of independence maps

Theorem
A DAG H is not an independence map of G0 if and only if
1. skeleton(G0) ⊈ skeleton(H), or
2. there exists a triple of nodes (Xi, Xj, Xk) such that Xi and Xk are
   non-adjacent in H, πH(Xi, Xj, Xk) is a non-collider path, and
   πG0(Xi, Xj, Xk) is a v-structure, or
3. there exists a triple of nodes (Xi, Xj, Xk) such that πH(Xi, Xj, Xk)
   is a v-structure and Xi and Xk are not d-separated in G0 given
   PaH(Xk), where without loss of generality we assume Xi ∈ NdH(Xk).

Corollary
We need to allow the following edges in the forward phase to achieve
consistency: (i) edges that exist in G0, (ii) shields of v-structures
in G0, and (iii) shields of v-structures in the current CPDAG.

SLIDE 81

Conditions for high-dimensional consistency

(A1) (Gaussianity) The distribution of Xn is multivariate Gaussian for
     all n.
(A2) (High-dimensional setting) pn = O(n^a) for some 0 ≤ a < ∞.
(A3) (Sparsity condition) Let qn = max_{1≤i≤pn} |Adj_{Cn0}(Xni)| be
     the maximum degree in Cn0. Then qn = O(n^{1−b1}) for some
     0 < b1 ≤ 1.
(A4) (High-dimensionally consistent estimators of the CIG or the
     CPDAG-skeleton) There exists a sequence of estimators Î_n (for
     ARGES-CIG) or a sequence of estimators Û_n (for ARGES-skeleton)
     such that P(Î_n ⊇ I_{n0}) → 1 or P(Û_n ⊇ skeleton(Cn0)) → 1,
     as n → ∞.

SLIDE 82

Conditions for high-dimensional consistency

(A5) (Bounds on the growth of oracle versions) The maximum degree in
     the output of the forward phase of every δn-optimal oracle
     version of (AR)GES based on Î_n (for ARGES-CIG), based on Û_n
     (for ARGES-skeleton), or based on the complete undirected graph
     (for GES) is bounded by Kn qn, for some sequences
     δn^{−1} = O(n^{d1}) and Kn = O(n^{b1−b2}) such that
     0 ≤ d1 < b2/2 and 0 < b2 ≤ b1, where qn and b1 are given by (A3).
(A6) (Bounds on partial correlations) The partial correlations
     ρn,ij|S between Xni and Xnj given {Xnr : r ∈ S} satisfy the
     following upper and lower bounds for all n, uniformly over
     i, j ∈ {1, . . . , pn} and S ⊆ {1, . . . , pn} \ {i, j} with
     |S| ≤ Kn qn (where Kn and qn are from (A3) and (A5)):

       sup_{i≠j, S} |ρn,ij|S| ≤ M < 1,  and
       inf_{i,j,S} {|ρn,ij|S| : ρn,ij|S ≠ 0} ≥ cn,

     with cn^{−1} = O(n^{d2}) for some 0 ≤ d2 < b2/2, where
     0 < b2 ≤ b1 ≤ 1 are as in (A5) and (A3).