
Beyond the graphical Lasso: Structure learning via inverse covariance estimation - PowerPoint PPT Presentation



  1. Beyond the graphical Lasso: Structure learning via inverse covariance estimation. Po-Ling Loh, UC Berkeley Department of Statistics. ICML Workshop on Covariance Selection and Graphical Model Structure Learning, June 26, 2014. Joint work with Martin Wainwright (UC Berkeley) & Peter Bühlmann (ETH Zürich).

  2. Outline: 1. Introduction; 2. Generalized inverse covariances; 3. Linear structural equation models; 4. Corrupted data.

  3. Outline: 1. Introduction; 2. Generalized inverse covariances; 3. Linear structural equation models; 4. Corrupted data.

  4. Undirected graphical models. Undirected graph G = (V, E); joint distribution of (X_1, ..., X_p), where |V| = p. [Figure: undirected graph on nodes X_1, X_2, X_3, ..., X_p]

  5. Undirected graphical models. Undirected graph G = (V, E); joint distribution of (X_1, ..., X_p), where |V| = p. Markov property: (s, t) ∉ E ⟹ X_s ⊥⊥ X_t | X_{V∖{s,t}}. [Figure: undirected graph on nodes X_1, X_2, X_3, ..., X_p]

  6. Undirected graphical models. Undirected graph G = (V, E); joint distribution of (X_1, ..., X_p), where |V| = p. More generally, X_A ⊥⊥ X_B | X_S whenever S ⊆ V separates A from B. [Figure: node subsets A and B separated by a set S in the graph]

  7. Directed graphical models. Directed acyclic graph G = (V, E). Markov property: X_j ⊥⊥ X_{Nondesc(j)} | X_{Pa(j)} for all j. [Figure: DAG on nodes X_1, X_2, X_3, ..., X_p]

  8. Structure learning. Goal: edge recovery from n samples {(X_1^(i), X_2^(i), ..., X_p^(i))}, i = 1, ..., n.

  9. Structure learning. Goal: edge recovery from n samples {(X_1^(i), X_2^(i), ..., X_p^(i))}, i = 1, ..., n. High-dimensional setting: p ≫ n, assume deg(G) ≤ d.

  10. Structure learning. Goal: edge recovery from n samples {(X_1^(i), X_2^(i), ..., X_p^(i))}, i = 1, ..., n. High-dimensional setting: p ≫ n, assume deg(G) ≤ d. Sources of corruption: non-i.i.d. observations, contamination by noise/missing data.

  11. Structure learning. Goal: edge recovery from n samples {(X_1^(i), X_2^(i), ..., X_p^(i))}, i = 1, ..., n. High-dimensional setting: p ≫ n, assume deg(G) ≤ d. Sources of corruption: non-i.i.d. observations, contamination by noise/missing data. Note: structure learning is generally harder for directed graphs (topological order unknown).

  12. Graphical Lasso. When (X_1, ..., X_p) ∼ N(0, Σ), well-known fact: (Σ⁻¹)_st = 0 ⟺ (s, t) ∉ E.

  13. Graphical Lasso. When (X_1, ..., X_p) ∼ N(0, Σ), well-known fact: (Σ⁻¹)_st = 0 ⟺ (s, t) ∉ E. This establishes statistical consistency of the graphical Lasso (Yuan & Lin '07): Θ̂ ∈ arg min_{Θ ≻ 0} { trace(Σ̂ Θ) − log det Θ + λ Σ_{s≠t} |Θ_st| }.
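The penalized objective above can be solved with off-the-shelf software. Below is a minimal sketch using scikit-learn's GraphicalLasso on synthetic chain-structured data; this is an independent illustration, not code from the talk, and the chain precision matrix is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Toy data: chain-structured true precision matrix, so the true graph is a path.
p, n = 10, 500
Theta_true = np.eye(p) + np.diag(0.4 * np.ones(p - 1), 1) + np.diag(0.4 * np.ones(p - 1), -1)
Sigma_true = np.linalg.inv(Theta_true)
X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=n)

# Penalized Gaussian MLE: trace(Sigma_hat Theta) - log det(Theta) + alpha * sum_{s != t} |Theta_st|
model = GraphicalLasso(alpha=0.1, max_iter=200).fit(X)
Theta_hat = model.precision_
print(np.round(Theta_hat, 2))  # entries off the chain should be (near) zero
```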

  14. Some observations.

  15. Some observations. The only sample-based quantity is Σ̂: Θ̂ ∈ arg min_{Θ ≻ 0} { trace(Σ̂ Θ) − log det Θ + λ Σ_{s≠t} |Θ_st| }.

  16. Some observations. The only sample-based quantity is Σ̂: Θ̂ ∈ arg min_{Θ ≻ 0} { trace(Σ̂ Θ) − log det Θ + λ Σ_{s≠t} |Θ_st| }. Although the graphical Lasso is a penalized Gaussian MLE, it can always be used to estimate Θ̂ from Σ̂, since (Σ*)⁻¹ = arg min_Θ { trace(Σ* Θ) − log det Θ }.
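A quick numerical illustration (my own sanity check, not from the slides) that the population objective trace(Σ*Θ) − log det Θ is indeed minimized at Θ = (Σ*)⁻¹, regardless of Gaussianity:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # a generic positive definite "covariance"
Theta_star = np.linalg.inv(Sigma)

def objective(Theta):
    # trace(Sigma Theta) - log det(Theta); +inf when Theta is not positive definite
    sign, logdet = np.linalg.slogdet(Theta)
    return np.trace(Sigma @ Theta) - logdet if sign > 0 else np.inf

base = objective(Theta_star)
# Random symmetric perturbations never decrease the objective.
for _ in range(1000):
    E = rng.standard_normal((p, p)) * 0.05
    assert objective(Theta_star + (E + E.T) / 2) >= base - 1e-9
print("Sigma^{-1} minimizes trace(Sigma Theta) - log det(Theta)")
```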

  17. Some observations. The only sample-based quantity is Σ̂: Θ̂ ∈ arg min_{Θ ≻ 0} { trace(Σ̂ Θ) − log det Θ + λ Σ_{s≠t} |Θ_st| }. Although the graphical Lasso is a penalized Gaussian MLE, it can always be used to estimate Θ̂ from Σ̂, since (Σ*)⁻¹ = arg min_Θ { trace(Σ* Θ) − log det Θ }. We extend the graphical Lasso to discrete-valued data (undirected case) and linear structural equation models (directed case).

  18. Theory for graphical Lasso. If ‖Σ̂ − Σ*‖_max ≲ √(log p / n) and λ ≳ √(log p / n), then ‖Θ̂ − Θ*‖_max ≲ √(log p / n) + λ.

  19. Theory for graphical Lasso. If ‖Σ̂ − Σ*‖_max ≲ √(log p / n) and λ ≳ √(log p / n), then ‖Θ̂ − Θ*‖_max ≲ √(log p / n) + λ. The deviation condition holds w.h.p. for various ensembles (e.g., sub-Gaussian).

  20. Theory for graphical Lasso. If ‖Σ̂ − Σ*‖_max ≲ √(log p / n) and λ ≳ √(log p / n), then ‖Θ̂ − Θ*‖_max ≲ √(log p / n) + λ. The deviation condition holds w.h.p. for various ensembles (e.g., sub-Gaussian). Thresholding Θ̂ at level √(log p / n) yields the correct support.
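As a concrete illustration of the support-recovery step, here is a small helper (a sketch of my own; the theory fixes the threshold only up to constants, so the scale factor is left as a tuning parameter) that thresholds an estimated precision matrix at roughly √(log p / n):

```python
import numpy as np

def edges_by_thresholding(Theta_hat, n, scale=1.0):
    """Return estimated edges {(s, t): |Theta_hat[s, t]| > scale * sqrt(log p / n)}."""
    p = Theta_hat.shape[0]
    tau = scale * np.sqrt(np.log(p) / n)
    keep = np.abs(Theta_hat) > tau
    np.fill_diagonal(keep, False)  # edges correspond to off-diagonal entries only
    return [(s, t) for s in range(p) for t in range(s + 1, p) if keep[s, t]]
```

Applied to the graphical Lasso output from the earlier sketch, edges_by_thresholding(model.precision_, n) would return the estimated edge set.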

  21. Outline: 1. Introduction; 2. Generalized inverse covariances; 3. Linear structural equation models; 4. Corrupted data.

  22. Non-Gaussian distributions (Liu et al. '09, '12): (X_1, ..., X_p) follows a nonparanormal distribution if (f_1(X_1), ..., f_p(X_p)) ∼ N(0, Σ) for some monotone, differentiable functions f_j.

  23. Non-Gaussian distributions (Liu et al. '09, '12): (X_1, ..., X_p) follows a nonparanormal distribution if (f_1(X_1), ..., f_p(X_p)) ∼ N(0, Σ) for some monotone, differentiable functions f_j. Then (i, j) ∉ E iff Θ_ij = 0.
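In practice, the latent correlation Σ of a nonparanormal model can be estimated without learning the transforms f_j, e.g. via rank correlations. The sketch below uses the Kendall's-tau identity Σ_jk = sin(πτ_jk / 2), exact for Gaussian copulas and in the spirit of the rank-based estimators of Liu et al. '12; the function name and details are my own, not from the talk.

```python
import numpy as np
from scipy.stats import kendalltau

def nonparanormal_correlation(X):
    """Rank-based estimate of the latent Gaussian correlation matrix.

    The result can be plugged into the graphical Lasso in place of the
    sample correlation matrix.
    """
    n, p = X.shape
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau, _ = kendalltau(X[:, j], X[:, k])
            S[j, k] = S[k, j] = np.sin(np.pi / 2 * tau)
    return S
```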

  24. Non-Gaussian distributions (Liu et al. '09, '12): (X_1, ..., X_p) follows a nonparanormal distribution if (f_1(X_1), ..., f_p(X_p)) ∼ N(0, Σ) for some monotone, differentiable functions f_j. Then (i, j) ∉ E iff Θ_ij = 0. In the general non-Gaussian setting, the relationship between the entries of Θ = Σ⁻¹ and the edges of G is unknown.

  25. Discrete graphical models. Assume the X_i's take values in a discrete set {0, 1, ..., m − 1}.

  26. Discrete graphical models. Assume the X_i's take values in a discrete set {0, 1, ..., m − 1}. Our results: establish a relationship between augmented inverse covariance matrices and edge structure; new algorithms for structure learning in discrete graphs.

  27. An illustrative example. Binary Ising model: P_θ(x_1, ..., x_p) ∝ exp( Σ_{s ∈ V} θ_s x_s + Σ_{(s,t) ∈ E} θ_st x_s x_t ).

  28. An illustrative example. Binary Ising model: P_θ(x_1, ..., x_p) ∝ exp( Σ_{s ∈ V} θ_s x_s + Σ_{(s,t) ∈ E} θ_st x_s x_t ), where θ ∈ ℝ^(p + (p choose 2)) and (x_1, ..., x_p) ∈ {0, 1}^p.

  29. An illustrative example. Ising models with θ_s = 0.1, θ_st = 2. [Figure: a chain graph X_1 - X_2 - X_3 - X_4 and a loop (4-cycle) graph on the same nodes]

  30. An illustrative example. Ising models with θ_s = 0.1, θ_st = 2, on the chain X_1 - X_2 - X_3 - X_4 and the loop X_1 - X_2 - X_3 - X_4 - X_1:
      Θ_chain =
        [  9.80  -3.59   0      0    ]
        [ -3.59  34.30  -4.77   0    ]
        [  0     -4.77  34.30  -3.59 ]
        [  0      0     -3.59   9.80 ]
      Θ_loop =
        [ 51.37  -5.37  -0.17  -5.37 ]
        [ -5.37  51.37  -5.37  -0.17 ]
        [ -0.17  -5.37  51.37  -5.37 ]
        [ -5.37  -0.17  -5.37  51.37 ]

  31. An illustrative example. Ising models with θ_s = 0.1, θ_st = 2, on the chain X_1 - X_2 - X_3 - X_4 and the loop X_1 - X_2 - X_3 - X_4 - X_1:
      Θ_chain =
        [  9.80  -3.59   0      0    ]
        [ -3.59  34.30  -4.77   0    ]
        [  0     -4.77  34.30  -3.59 ]
        [  0      0     -3.59   9.80 ]
      Θ_loop =
        [ 51.37  -5.37  -0.17  -5.37 ]
        [ -5.37  51.37  -5.37  -0.17 ]
        [ -0.17  -5.37  51.37  -5.37 ]
        [ -5.37  -0.17  -5.37  51.37 ]
      Θ is graph-structured for the chain, but not for the loop.
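For a graph this small, the matrices can be checked by brute force. The sketch below (my own code, assuming the parameterization from the previous slides with x ∈ {0,1}^4, θ_s = 0.1, θ_st = 2) enumerates all 2^4 states, computes the exact covariance, and inverts it; it should reproduce matrices like Θ_chain and Θ_loop above up to rounding.

```python
import itertools
import numpy as np

def ising_inverse_covariance(edges, p=4, theta_s=0.1, theta_st=2.0):
    """Exact inverse covariance of a small Ising model via enumeration of all 2^p states."""
    states = np.array(list(itertools.product([0, 1], repeat=p)), dtype=float)
    energy = theta_s * states.sum(axis=1)
    for (s, t) in edges:
        energy += theta_st * states[:, s] * states[:, t]
    probs = np.exp(energy)
    probs /= probs.sum()                  # normalize to get P_theta(x)
    mean = probs @ states
    centered = states - mean
    cov = centered.T @ (centered * probs[:, None])
    return np.linalg.inv(cov)

chain = [(0, 1), (1, 2), (2, 3)]
loop = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(np.round(ising_inverse_covariance(chain), 2))  # exact zeros at non-edges of the chain
print(np.round(ising_inverse_covariance(loop), 2))   # no exact zeros for the loop
```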
