

  1. High-dimensional consistency in score-based and hybrid structure learning
  Marloes Maathuis, joint work with Preetam Nandy and Alain Hauser

  2. Structure learning
  ◮ We consider random variables (X1, ..., Xp) with distribution F0, where F0 is multivariate Gaussian (or nonparanormal)
  ◮ We assume that F0 has a perfect map G0
  ◮ Based on n i.i.d. observations from F0, we want to learn the CPDAG of G0 (a simulation sketch of such a data-generating process follows below)
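To make the setup concrete, here is a minimal sketch of one common way such an F0 arises: a linear structural equation model with independent Gaussian errors, whose distribution is multivariate Gaussian and (for generic weights) has the underlying DAG as a perfect map. The graph and edge weights below are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_gaussian_dag(B, n, rng):
    """Draw n i.i.d. samples from the linear SEM X = B X + eps,
    where B[j, k] != 0 encodes an edge X_k -> X_j in the DAG and
    eps has independent N(0, 1) components."""
    p = B.shape[0]
    eps = rng.standard_normal((n, p))
    # Solve X = B X + eps, i.e. X = (I - B)^{-1} eps, row-wise
    return eps @ np.linalg.inv(np.eye(p) - B).T

# Hypothetical DAG: X1 -> X2 -> X3 and X1 -> X3 (illustrative weights)
B = np.zeros((3, 3))
B[1, 0] = 0.8   # X1 -> X2
B[2, 1] = -0.5  # X2 -> X3
B[2, 0] = 0.3   # X1 -> X3

X = simulate_gaussian_dag(B, n=1000, rng=rng)  # n x p data matrix
```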

  3–9. Terminology
  ◮ We consider directed acyclic graphs (DAGs), where each node represents a random variable
  ◮ A DAG encodes d-separations (Pearl). Example: X1 → X2 → X3 encodes that X1 and X3 are d-separated by X2 (an algorithmic check is sketched below)
  ◮ A DAG G is a perfect map of a distribution F if {d-separations in G} = {conditional independencies in F}
  ◮ Examples:
    ◮ (X1, X2, X3) with X1 ⊥⊥ X3 | X2: 3 perfect maps: X1 → X2 → X3, X1 ← X2 ← X3, X1 ← X2 → X3
    ◮ (X1, X2, X3) with X1 ⊥⊥ X3: 1 perfect map: X1 → X2 ← X3 (v-structure)
    ◮ (X1, X2, X3, X4) with X1 ⊥⊥ X3 and X2 ⊥⊥ X4: no perfect map
  ◮ We consider distributions that have a perfect map
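D-separation can be decided algorithmically by the classical reduction: restrict the DAG to the ancestors of the queried nodes, moralize, and test separation in the resulting undirected graph. A minimal self-contained sketch, with DAGs represented as a hypothetical node-to-parents dictionary:

```python
from collections import deque

def ancestors(dag, nodes):
    """All nodes with a directed path into `nodes` (including `nodes`).
    `dag` maps each node to the set of its parents."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for pa in dag[stack.pop()]:
            if pa not in seen:
                seen.add(pa)
                stack.append(pa)
    return seen

def d_separated(dag, x, y, S):
    """True iff x and y are d-separated by the set S in the DAG."""
    keep = ancestors(dag, {x, y} | set(S))
    # Moralize the ancestral subgraph: marry co-parents, drop directions.
    und = {v: set() for v in keep}
    for v in keep:
        pas = [p for p in dag[v] if p in keep]
        for p in pas:
            und[v].add(p); und[p].add(v)
        for i in range(len(pas)):
            for j in range(i + 1, len(pas)):
                und[pas[i]].add(pas[j]); und[pas[j]].add(pas[i])
    # BFS from x avoiding S; d-separated iff y is unreachable.
    seen, queue = {x}, deque([x])
    while queue:
        for w in und[queue.popleft()]:
            if w == y:
                return False
            if w not in seen and w not in S:
                seen.add(w); queue.append(w)
    return True

# Chain X1 -> X2 -> X3, in the parents representation:
dag = {1: set(), 2: {1}, 3: {2}}
print(d_separated(dag, 1, 3, {2}))    # True: X2 blocks the chain
print(d_separated(dag, 1, 3, set()))  # False
```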

  10–13. Markov equivalence classes and CPDAGs
  ◮ DAGs that encode the same set of d-separations form a Markov equivalence class. Example: X1 → X2 → X3, X1 ← X2 ← X3, X1 ← X2 → X3
  ◮ All DAGs in a Markov equivalence class share the same skeleton and the same v-structures (a check based on this characterization is sketched below)
  ◮ A Markov equivalence class can be described uniquely by a CPDAG. We want to learn the CPDAG.
  [Figure: a CPDAG on X1, X2, X3, X4 together with DAG 1, DAG 2, and DAG 3 of its equivalence class]
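The characterization above (same skeleton and same v-structures; Verma and Pearl) gives a direct way to test whether two DAGs are Markov equivalent. A minimal sketch, reusing the node-to-parents representation from the previous example:

```python
def skeleton(dag):
    """Undirected edge set of the DAG, as frozensets {child, parent}."""
    return {frozenset((v, p)) for v in dag for p in dag[v]}

def v_structures(dag):
    """Triples (a, c, b) with a -> c <- b and a, b non-adjacent."""
    skel, out = skeleton(dag), set()
    for c, parents in dag.items():
        pas = sorted(parents)
        for i in range(len(pas)):
            for j in range(i + 1, len(pas)):
                a, b = pas[i], pas[j]
                if frozenset((a, b)) not in skel:
                    out.add((a, c, b))
    return out

def markov_equivalent(d1, d2):
    """Verma-Pearl criterion: same skeleton and same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

chain    = {1: set(), 2: {1}, 3: {2}}       # X1 -> X2 -> X3
fork     = {1: {2}, 2: set(), 3: {2}}       # X1 <- X2 -> X3
collider = {1: set(), 2: {1, 3}, 3: set()}  # X1 -> X2 <- X3

print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False: collider adds a v-structure
```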

  14. Possible applications of DAGs/CPDAGs
  ◮ Efficient estimation/computation using the factorization
    f(x1, ..., xp) = ∏_{j=1}^{p} f(xj | pa(xj, G))
  ◮ Probabilistic reasoning in expert systems
  ◮ Causal inference
  ◮ ...
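The factorization can be verified numerically in the linear Gaussian case: the product of the node-wise conditionals equals the joint normal density with the covariance implied by the SEM. A small sketch with hypothetical parameters:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Chain DAG X1 -> X2 -> X3 with unit error variances (illustrative weights)
b21, b32 = 0.8, -0.5
x = np.array([0.3, -1.0, 0.7])  # an arbitrary evaluation point

# Factorized density: f(x1, x2, x3) = f(x1) f(x2 | x1) f(x3 | x2)
log_f = (norm.logpdf(x[0], 0, 1)
         + norm.logpdf(x[1], b21 * x[0], 1)
         + norm.logpdf(x[2], b32 * x[1], 1))

# Joint covariance implied by the SEM: Sigma = (I - B)^{-1} (I - B)^{-T}
B = np.array([[0, 0, 0], [b21, 0, 0], [0, b32, 0]], dtype=float)
A = np.linalg.inv(np.eye(3) - B)
log_f_joint = multivariate_normal.logpdf(x, mean=np.zeros(3), cov=A @ A.T)

assert np.isclose(log_f, log_f_joint)  # the two densities agree
```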

  15–21. CPDAG versus conditional independence graph
  ◮ A conditional independence graph (CIG) is an undirected graph in which Xi and Xj are adjacent ⇔ Xi and Xj are conditionally dependent given S = {all remaining variables}
  ◮ A CPDAG is a partially directed graph in which Xi and Xj are adjacent ⇔ Xi and Xj are conditionally dependent given S, for all S ⊆ {all remaining variables}
  ◮ The skeleton of the CPDAG is a subgraph of the CIG
  ◮ The CIG can be obtained from the CPDAG by "moralization": marry unmarried parents and then make all edges undirected (sketched below)
  [Figure: CPDAG on X1, ..., X4 → graph after marrying → CIG]
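A minimal sketch of the moralization step as stated above, for a partially directed graph given by hypothetical directed- and undirected-edge sets:

```python
def moralize(directed, undirected):
    """CIG edge set from a CPDAG given as directed edges (a, b),
    meaning a -> b, plus undirected edges as frozensets {a, b}."""
    # Start from the skeleton: every edge, directions dropped.
    adj = {frozenset(e) for e in directed} | set(undirected)
    # Marry unmarried parents: connect every pair of parents
    # (tails of directed edges) pointing into a common child.
    parents = {}
    for a, b in directed:
        parents.setdefault(b, set()).add(a)
    for pas in parents.values():
        pas = list(pas)
        for i in range(len(pas)):
            for j in range(i + 1, len(pas)):
                adj.add(frozenset((pas[i], pas[j])))
    return adj

# Hypothetical CPDAG: X1 - X2 undirected, and the v-structure X1 -> X3 <- X4
directed = {(1, 3), (4, 3)}
undirected = {frozenset((1, 2))}
print(moralize(directed, undirected))
# contains the married edge {1, 4} in addition to the CPDAG's skeleton
```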

  22. Summary of problem definition
  ◮ We consider random variables (X1, ..., Xp) with distribution F0, where F0 is multivariate Gaussian (or nonparanormal)
  ◮ We assume that F0 has a perfect map G0
  ◮ Based on n i.i.d. observations from F0, we want to learn the CPDAG of G0

  23. Three main approaches for structure learning
  ◮ Constraint-based:
    ◮ Conditional independencies in the data impose constraints on the CPDAG
    ◮ Example: PC algorithm (Spirtes et al. '93)
  ◮ Score-based:
    ◮ A score function is optimized over the space of DAGs/CPDAGs
    ◮ Example: greedy equivalence search (GES) (Chickering '02)
  ◮ Hybrid:
    ◮ A score function is optimized over a restricted space of DAGs/CPDAGs, where the restricted space is determined using conditional independence constraints
    ◮ Examples: Max-Min Hill Climbing (MMHC) (Tsamardinos et al. '06), restricted GES (RGES: GES restricted to the estimated CIG)
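Both the constraint-based step of PC and the CIG restriction used by hybrid methods such as RGES require a conditional independence test. For Gaussian data the textbook choice is Fisher's z-transform of the partial correlation; a minimal sketch (not the exact implementation of any of the cited algorithms):

```python
import numpy as np
from scipy.stats import norm

def fisher_z_test(X, i, j, S, alpha=0.01):
    """Test X_i independent of X_j given X_S for Gaussian data, via the
    partial correlation read off the inverse covariance of the relevant
    submatrix. Returns True if independence is NOT rejected."""
    n = X.shape[0]
    idx = [i, j] + list(S)
    P = np.linalg.inv(np.cov(X[:, idx], rowvar=False))
    r = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])  # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r))        # Fisher z-transform
    stat = np.sqrt(n - len(S) - 3) * abs(z)    # ~ N(0, 1) under H0
    return 2 * (1 - norm.cdf(stat)) > alpha

# Usage with the simulated data X from the first sketch:
# fisher_z_test(X, 0, 2, [1]) tests X1 independent of X3 given X2
```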
