  1. Learning Linear Bayesian Networks with Latent Variables
     Adel Javanmard, Stanford University
     Joint work with Anima Anandkumar (University of California, Irvine), Daniel Hsu and Sham Kakade (Microsoft Research, New England)
     Adel Javanmard (Stanford University), Linear Bayesian Networks, 1 / 22

  2. Modern data
     - Lots of high-dimensional data, but highly structured.
     - Learning the underlying structure is central to: modeling; dimensionality reduction / summarizing data; prediction.
     This talk: learning the hidden (unobserved) variables that pervade the data.

  4. Example: document modeling
     [Slide shows excerpts of New York Times articles: "In One Day, 11,000 Flee Syria as War and Hardship Worsen"; "Hurricane Exposed Flaws in Protection of Tunnels"; "Nursing Home Is Faulted Over Care After Storm"; "Obama to Insist on Tax Increase for the Wealthy"; "Behind New York Gas Lines, Warnings and Crossed Fingers".]
     Observations: words. Hidden variables: topics.

  5. Topics
     - genome, molecular, sequence, DNA, human, genetics, map, project
     - disease, tuberculosis, pneumonia, control, doctor, weak, resistance, fatal
     - software, system, parallel, hardware, cyber, network, data, program

  6. Example: social network modeling
     Observations: social interactions. Hidden variables: communities, relationships.

  7. Example: bio-informatics
     Observations: gene expressions. Hidden variables: gene regulators.

  8. Linear Bayesian Network
     Markov relationship on a DAG:
     - PA_i: parents of node i.
     - P_θ(z) = ∏_{i=1}^n P_θ(z_i | z_{PA_i}).
     Linear model with latent nodes:
     - Observed variables {x_i} and hidden variables {h_i}.
     - Linear relations: x_i = Σ_{j ∈ PA_i} a_{ij} h_j + ε_i, with uncorrelated noise variables ε_i.
     [Figure: three hidden nodes h_1, h_2, h_3 linked through the coefficient matrix A to eight observed nodes x_1, ..., x_8.]
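The linear model above is easy to simulate. The sketch below (not from the talk; the matrix A, the sparsity level, and all dimensions are illustrative choices) draws samples from x = Ah + ε and checks that the empirical second moment matches A E[hh^T] A^T plus a diagonal noise term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 3 hidden nodes, 8 observed nodes (as in the figure).
k, p, n = 3, 8, 500_000

# Sparse coefficient matrix A: each observed node has a few hidden parents.
A = np.where(rng.random((p, k)) < 0.5, rng.normal(size=(p, k)), 0.0)

h = rng.normal(size=(k, n))           # hidden variables, E[hh^T] = I
eps = rng.normal(size=(p, n)) * 0.1   # uncorrelated noise, E[eps eps^T] = 0.01 I
x = A @ h + eps                       # x_i = sum_{j in PA_i} a_ij h_j + eps_i

# Empirical second moment vs. the model prediction A E[hh^T] A^T + E[eps eps^T]
S_emp = x @ x.T / n
S_model = A @ A.T + 0.01 * np.eye(p)
print(np.max(np.abs(S_emp - S_model)))  # small sampling error
```

With half a million samples the entrywise deviation is dominated by sampling noise of order 1/sqrt(n).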

  10. Learning latent models
     Goal: given the observed data, learn the structure and parameters of the model.
     Challenges:
     - Identifiability: many models can explain the observed data! Restricted settings where it holds:
       - ICA: no edges between hidden nodes
       - LDA: hidden variables drawn from a Dirichlet distribution
       - latent trees, graphical models with long cycles [Anandkumar et al. 2011, Choi et al. 2011, Daskalakis et al. 2006]
     - Tractable learning algorithms:
       - maximum likelihood (tractable on trees, NP-hard in general)
       - expectation maximization [Redner, Walker 1984], Gibbs sampling [Asuncion et al. 2011]
       - local tests [Bresler et al. 2008, Anandkumar et al. 2012]
       - convex relaxations (e.g. Lasso) [Meinshausen, Bühlmann 2006; Ravikumar, Wainwright 2010]

  13. An example
     The hidden variables are themselves related through a linear DAG: h = Λh + η, where Λ = (λ_ij) and η_1, η_2, η_3 are independent noise variables; the observed variables are x = Ah + ε with A = (a_ij).
     Solving for h gives h = (I − Λ)^{-1} η, so
         x = A(I − Λ)^{-1} η + ε.
     The model thus reduces to one with the effective mixing matrix A(I − Λ)^{-1}.
     A prudent restriction on the model ⇒ broadly applicable, tractable learning methods.
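The reduction in this example can be verified numerically. A minimal sketch (dimensions and the strictly lower-triangular choice of Λ, which encodes a DAG ordering over the hidden nodes, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
k, p = 3, 8

# Strictly lower-triangular Lambda encodes a DAG over the hidden nodes.
Lam = np.tril(rng.normal(size=(k, k)), k=-1)
A = rng.normal(size=(p, k))
eta = rng.normal(size=k)
eps = rng.normal(size=p) * 0.1

# Two-stage generation: h = Lam h + eta  =>  h = (I - Lam)^{-1} eta, x = A h + eps
h = np.linalg.solve(np.eye(k) - Lam, eta)
x_two_stage = A @ h + eps

# Collapsed form from the slide: x = A (I - Lam)^{-1} eta + eps
x_collapsed = A @ np.linalg.solve(np.eye(k) - Lam, eta) + eps

print(np.allclose(x_two_stage, x_collapsed))  # True
```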

  17. Sufficient conditions for identifiability
     Task: recover A.
     Structural condition (additive graph expansion):
         |N(S)| ≥ |S| + d_max,  for all S ⊆ H,
     where N(S) denotes the set of observed neighbors of the hidden subset S.
     Parametric condition (generic parameters):
         ‖Av‖_0 > |N_A(supp(v))| − |supp(v)|.
     Identifiability result: under the above conditions, A can be uniquely recovered from E[xx^T].
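The expansion condition can be checked by brute force on small graphs. In the sketch below (an illustration, not from the talk), d_max is taken to be the maximum degree of a hidden node and the condition is tested for subsets with |S| ≥ 2; the slide leaves both details unspecified, so treat them as assumptions:

```python
from itertools import combinations

import numpy as np

def satisfies_expansion(B):
    """Check |N(S)| >= |S| + d_max for all subsets S of hidden nodes with
    |S| >= 2.  B is the p x k biadjacency matrix: B[i, j] != 0 iff observed
    node i is a child of hidden node j.  d_max here is the maximum degree
    of a hidden node (an assumption)."""
    B = np.asarray(B) != 0
    k = B.shape[1]
    d_max = B.sum(axis=0).max()
    for r in range(2, k + 1):
        for S in combinations(range(k), r):
            # N(S): observed nodes adjacent to at least one hidden node in S
            if B[:, list(S)].any(axis=1).sum() < r + d_max:
                return False
    return True

# A small bipartite graph: 3 hidden nodes, 8 observed nodes.
B = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1],
              [0, 0, 1],
              [0, 0, 1],
              [1, 0, 1]])
print(satisfies_expansion(B))  # True
```

By contrast, two hidden nodes with identical child sets immediately violate expansion, matching the intuition that such nodes cannot be told apart.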

  19. Intuition
     - Denoising the moment:
           E[xx^T] = A E[hh^T] A^T + E[εε^T],
       where A E[hh^T] A^T is low-rank and E[εε^T] is diagonal.
     - For non-degenerate E[hh^T], this yields Col(A), the column space of A.
     - Under the above conditions, the sparsest vectors in Col(A) are the columns of A. [Spielman, Wang, Wright 2012]
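The last point can be illustrated numerically. The sketch below (an illustration with a hand-picked sparse A; it demonstrates the claim, not the recovery algorithm, which in Spielman, Wang, Wright 2012 is based on ℓ1 minimization) shows that mixing two or more columns of a sparse, expanding A generically unions their supports, so single columns are the sparsest vectors in Col(A):

```python
import numpy as np

rng = np.random.default_rng(2)

# Sparse mixing matrix A (8 x 3) with generic nonzero entries; the columns
# overlap only partially, so combinations of >= 2 columns are strictly denser.
A = np.zeros((8, 3))
A[[0, 1, 2, 7], 0] = rng.normal(size=4)
A[[2, 3, 4], 1] = rng.normal(size=3)
A[[4, 5, 6, 7], 2] = rng.normal(size=4)

def nnz(v, tol=1e-10):
    return int(np.sum(np.abs(v) > tol))

col_sparsity = [nnz(A[:, j]) for j in range(3)]

# Vectors in Col(A) mixing >= 2 columns: supports add up (generically no
# cancellation), so they have more nonzeros than any single column.
mixed = [nnz(A @ v) for v in ([1, 1, 0], [0, 1, 1], [1, 1, 1])]
print(col_sparsity, mixed)  # [4, 3, 4] [6, 6, 8]
```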
