High-Dimensional Graphical Model Selection
Anima Anandkumar, U.C. Irvine (PowerPoint PPT presentation)
Joint work with Vincent Tan (U. Wisc.) and Alan Willsky (MIT).


  1. High-Dimensional Graphical Model Selection. Anima Anandkumar, U.C. Irvine. Joint work with Vincent Tan (U. Wisc.) and Alan Willsky (MIT).

  2-5. Graphical Models: Definition
  Conditional independence: X_A ⊥⊥ X_B | X_S.
  [Figure: a graph whose nodes are baseball teams (Dodgers, Red Sox, Yankees, Mets, Phillies) and soccer teams (Everton, Chelsea, Arsenal, Manchester United), with the two sets A and B separated by S.]
  Factorization:
  P(x) ∝ exp( Σ_{(i,j)∈G} Ψ_{i,j}(x_i, x_j) ).
  Tree-structured graphical models (figure: a 4-node star with node 1 joined to nodes 2, 3, 4):
  P(x) = Π_{i∈V} P_i(x_i) · Π_{(i,j)∈E} P_{i,j}(x_i, x_j) / ( P_i(x_i) P_j(x_j) )
       = P_1(x_1) P_{2|1}(x_2 | x_1) P_{3|1}(x_3 | x_1) P_{4|1}(x_4 | x_1).
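The tree factorization above can be checked numerically. The sketch below (stdlib Python; the edge-potential values are made up for illustration, not from the talk) builds a 4-node star model, computes the joint by brute force, and verifies that it equals P_1(x_1) Π_i P_{i|1}(x_i | x_1):

```python
# Numerical check of the tree factorization for a hypothetical 4-node star
# (edges 0-1, 0-2, 0-3) over binary variables; potentials are illustrative.
import itertools
import math

edges = [(0, 1), (0, 2), (0, 3)]  # star rooted at node 0
psi = {e: {(-1, -1): 0.9, (-1, 1): -0.3, (1, -1): -0.3, (1, 1): 0.9} for e in edges}

# Joint P(x) ∝ exp( sum_{(i,j) in G} Ψ_ij(x_i, x_j) ), normalized by brute force.
states = list(itertools.product([-1, 1], repeat=4))
weights = {x: math.exp(sum(psi[(i, j)][(x[i], x[j])] for (i, j) in edges))
           for x in states}
Z = sum(weights.values())
P = {x: w / Z for x, w in weights.items()}

def marginal(var, val):
    return sum(p for x, p in P.items() if x[var] == val)

def conditional(var, val, root_val):
    num = sum(p for x, p in P.items() if x[var] == val and x[0] == root_val)
    return num / marginal(0, root_val)

# Verify P(x) = P_1(x_1) * prod_i P_{i|1}(x_i | x_1) for every configuration.
for x in states:
    rhs = marginal(0, x[0])
    for i in (1, 2, 3):
        rhs *= conditional(i, x[i], x[0])
    assert abs(P[x] - rhs) < 1e-12
print("tree factorization verified")
```

For a star, the leaves are conditionally independent given the root, so the factorization holds exactly.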

  6-7. Structure Learning of Graphical Models
  Given a graphical model on p nodes and n i.i.d. samples from the multivariate distribution, output the estimated structure Ĝⁿ.
  Structural consistency: lim_{n→∞} P( Ĝⁿ ≠ G ) = 0.
  Challenge: high dimensionality (the “data-poor” regime): large p, small n (p ≫ n). Sample complexity: the number of samples required to achieve consistency.
  Challenge: computational complexity.
  Goal: address the above challenges and provide provable guarantees.

  8-12. Tree Graphical Models: Tractable Learning
  Maximum-likelihood learning of tree structure, proposed by Chow and Liu (1968): a maximum-weight spanning tree.
  T̂_ML = argmax_T Σ_{k=1}^{n} log P(x_V^{(k)}),
  T̂_ML = argmax_T Σ_{(i,j)∈T} Iⁿ(X_i; X_j).
  Pairwise statistics suffice for maximum likelihood.
  With n samples and p nodes, the sample complexity satisfies (log p)/n = O(1): n need only grow logarithmically in p.
  What other classes of graphical models are tractable for learning?
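The Chow-Liu procedure above can be sketched in a few lines: estimate the pairwise mutual informations Iⁿ(X_i; X_j) empirically, then take a maximum-weight spanning tree over those weights. This is a stdlib-only illustrative sketch; the function names and the Kruskal/union-find choice are ours, not from the talk.

```python
# Minimal Chow-Liu sketch: empirical mutual information + max-weight spanning tree.
import math
from collections import Counter

def empirical_mi(samples, i, j):
    """Plug-in estimate of I^n(X_i; X_j) from a list of tuples in {-1,1}^p."""
    n = len(samples)
    pij = Counter((x[i], x[j]) for x in samples)
    pi = Counter(x[i] for x in samples)
    pj = Counter(x[j] for x in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(samples, p):
    """Kruskal's algorithm on MI weights, largest first (union-find for cycles)."""
    weights = sorted(((empirical_mi(samples, i, j), i, j)
                      for i in range(p) for j in range(i + 1, p)), reverse=True)
    parent = list(range(p))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Because only pairwise statistics enter, the estimator is consistent on tree-structured data in the sense of the earlier slide: the recovered edge set converges to the true tree as n grows.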

  13-16. Learning Graphical Models Beyond Trees
  Challenges:
  Presence of cycles ◮ pairwise statistics no longer suffice ◮ the likelihood function is not tractable:
  P(x) = (1/Z) exp( Σ_{(i,j)∈G} Ψ_{i,j}(x_i, x_j) ).
  Presence of high-degree nodes ◮ brute-force search is not tractable.
  Can we provide learning guarantees under the above conditions?
  Our perspective: tractable graph families. Characterize the class of tractable families; incorporate all the above challenges; relevant for real datasets, e.g., social-network data.

  17. Related Work in Structure Learning
  Algorithms for structure learning: Chow and Liu (68); Meinshausen and Buehlmann (06); Bresler, Mossel and Sly (09); Ravikumar, Wainwright and Lafferty (10); and others.
  Approaches employed: EM/search approaches; combinatorial/greedy approaches; convex relaxation; and others.

  18. Outline
  1. Introduction
  2. Tractable Graph Families
  3. Structure Estimation in Graphical Models
  4. Method and Guarantees
  5. Conclusion

  19-23. Intuitions: Conditional Mutual Information Test
  Separators in graphical models: X_i ⊥⊥ X_j | X_S  ⟺  I(X_i; X_j | X_S) = 0.
  Observations: Δ-separators exist for graphs with maximum degree Δ.
  ◮ Brute-force search for the separator: argmin_{|S| ≤ Δ} I(X_i; X_j | X_S).
  ◮ Computational complexity scales as O(p^Δ).
  Approximate separators in general graphs? Even when X_i ⊥̸⊥ X_j | X_S, we can have I(X_i; X_j | X_S) ≈ 0.
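The brute-force test above can be sketched directly: given the joint distribution, evaluate I(X_i; X_j | X_S) for every conditioning set S with |S| ≤ Δ and return the minimizer. A stdlib Python sketch (function names are ours; the search is exponential in Δ, matching the O(p^Δ) bound on the slide):

```python
# Brute-force conditional-mutual-information separator search, illustrative only.
import itertools
import math

def cond_mi(P, i, j, S):
    """I(X_i; X_j | X_S) for a joint distribution P given as {tuple: prob}."""
    def key(x, idx):
        return tuple(x[k] for k in idx)
    ps, pis, pjs, pijs = {}, {}, {}, {}
    for x, p in P.items():
        s = key(x, S)
        ps[s] = ps.get(s, 0.0) + p
        pis[(x[i], s)] = pis.get((x[i], s), 0.0) + p
        pjs[(x[j], s)] = pjs.get((x[j], s), 0.0) + p
        pijs[(x[i], x[j], s)] = pijs.get((x[i], x[j], s), 0.0) + p
    mi = 0.0
    # I(X;Y|Z) = sum p(x,y,z) log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
    for (a, b, s), p in pijs.items():
        if p > 0:
            mi += p * math.log(p * ps[s] / (pis[(a, s)] * pjs[(b, s)]))
    return mi

def best_separator(P, i, j, p, delta):
    """argmin over all S with |S| <= delta of I(X_i; X_j | X_S)."""
    others = [k for k in range(p) if k not in (i, j)]
    candidates = [S for r in range(delta + 1)
                  for S in itertools.combinations(others, r)]
    return min(candidates, key=lambda S: cond_mi(P, i, j, S))
```

On a Markov chain X_0 - X_1 - X_2, for example, the search returns S = (1,), since conditioning on the middle node drives the conditional mutual information to zero.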

  24. Tractable Graph Families: Local Separation
  γ-local separator S_γ(i, j): the minimal vertex separator of i and j with respect to paths of length less than γ.
  (η, γ)-local separation property for graph G: |S_γ(i, j)| ≤ η for all (i, j) ∉ G.
  Examples: locally tree-like graphs (Erdős-Rényi graphs, power-law/scale-free graphs), small-world graphs (Watts-Strogatz model), hybrid/augmented graphs.
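On small graphs a γ-local separator can be found by brute force: enumerate every i-j path with fewer than γ edges, then search for a smallest vertex set hitting all of them. An exponential-time stdlib sketch (the helper names are ours; it assumes (i, j) ∉ G, as in the definition):

```python
# Brute-force gamma-local separator on a small adjacency-list graph.
import itertools

def short_paths(adj, i, j, gamma):
    """All simple paths from i to j with fewer than gamma edges."""
    paths, stack = [], [(i, [i])]
    while stack:
        u, path = stack.pop()
        if u == j:
            paths.append(path)
            continue
        if len(path) - 1 >= gamma - 1:  # one more edge would reach length gamma
            continue
        for v in adj[u]:
            if v not in path:
                stack.append((v, path + [v]))
    return paths

def local_separator(adj, i, j, gamma):
    """Smallest vertex set hitting the interior of every short i-j path."""
    interiors = [set(p[1:-1]) for p in short_paths(adj, i, j, gamma)]
    others = [v for v in adj if v not in (i, j)]
    for r in range(len(others) + 1):
        for S in itertools.combinations(others, r):
            if all(set(S) & p for p in interiors):
                return set(S)
```

On a 4-cycle 0-1-2-3-0, for instance, S_3(0, 2) = {1, 3} (so η = 2 for γ = 3), while S_2(0, 2) is empty because no path of length less than 2 connects the non-adjacent pair.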

  25. Outline
  1. Introduction
  2. Tractable Graph Families
  3. Structure Estimation in Graphical Models
  4. Method and Guarantees
  5. Conclusion

  26-30. Setup: Ising and Gaussian Graphical Models
  n i.i.d. samples are available for structure estimation.
  Ising model:    P(x) ∝ exp( (1/2) xᵀ J_G x + hᵀ x ),  x ∈ {−1, 1}^p.
  Gaussian model: f(x) ∝ exp( −(1/2) xᵀ J_G x + hᵀ x ),  x ∈ ℝ^p.
  For (i, j) ∈ G: J_min ≤ |J_{i,j}| ≤ J_max.
  The graph G satisfies the (η, γ)-local separation property.
  There is a tradeoff between η, γ, J_min, and J_max for tractable learning.
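The Ising model in this setup can be simulated with Gibbs sampling: with J symmetric and zero-diagonal, the conditional log-odds of x_i = +1 given the rest are 2(Σ_j J_{ij} x_j + h_i). A stdlib sketch (the function names are ours, and J and h would be user-supplied, with edge magnitudes in [J_min, J_max] as on the slide):

```python
# Gibbs-sampling sketch for the Ising model P(x) ∝ exp(½ xᵀ J x + hᵀ x),
# x in {-1, 1}^p, J symmetric with zero diagonal.
import math
import random

def cond_prob_plus(J, h, x, i):
    """P(x_i = +1 | x_rest); the log-odds are 2 * (sum_j J[i][j] x_j + h[i])."""
    m = sum(J[i][j] * x[j] for j in range(len(x)) if j != i) + h[i]
    return 1.0 / (1.0 + math.exp(-2.0 * m))

def gibbs_sample(J, h, n_sweeps, rng):
    """Run n_sweeps of systematic-scan Gibbs updates from a random start."""
    p = len(h)
    x = [rng.choice([-1, 1]) for _ in range(p)]
    for _ in range(n_sweeps):
        for i in range(p):
            x[i] = 1 if rng.random() < cond_prob_plus(J, h, x, i) else -1
    return x
```

The bound J_max < J* on the next slide limits how strong these couplings may be: above the phase-transition threshold, the conditional distributions lose the uniqueness the learning guarantees rely on.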

  31. Regime of Tractable Learning
  Efficient learning under approximate separation: the maximum edge potential J_max of the Ising model satisfies J_max < J*, where J* is the threshold of the phase transition for conditional uniqueness.
