Learning Linear Bayesian Networks with Latent Variables (PowerPoint PPT presentation)


SLIDE 1

Learning Linear Bayesian Networks with Latent Variables

Adel Javanmard

Stanford University

joint work with Anima Anandkumar*, Daniel Hsu†, Sham Kakade†

* University of California, Irvine   † Microsoft Research, New England

Adel Javanmard (Stanford University) Linear Bayesian Networks 1 / 22

SLIDE 2

Modern data

- Lots of high-dimensional data, but highly structured.
- Learning the underlying structure is central to:
  - Modeling
  - Dimensionality reduction / summarizing data
  - Prediction

This talk: Learning hidden (unobserved) variables that pervade the data.


SLIDE 4

Example: document modeling

Nursing Home Is Faulted Over Care After Storm
By MICHAEL POWELL and SHERI FINK
Amid the worst hurricane to hit New York City in nearly 80 years, officials have claimed that the Promenade Rehabilitation and Health Care Center failed to provide the most basic care to its patients.

In One Day, 11,000 Flee Syria as War and Hardship Worsen
By RICK GLADSTONE and NEIL MacFARQUHAR
The United Nations reported that 11,000 Syrians fled on Friday, the vast majority of them clambering for safety over the Turkish border.

Obama to Insist on Tax Increase for the Wealthy
By HELENE COOPER and JONATHAN WEISMAN
Amid talk of compromise, President Obama and Speaker John A. Boehner both indicated unchanged stances on this issue, long a point of contention.

Hurricane Exposed Flaws in Protection of Tunnels
By ELISABETH ROSENTHAL
Nearly two weeks after Hurricane Sandy struck, the vital arteries that bring cars, trucks and subways into New York City's transportation network have recovered, with one major exception: the Brooklyn-Battery Tunnel remains closed.

Behind New York Gas Lines, Warnings and Crossed Fingers
By DAVID W. CHEN, WINNIE HU and CLIFFORD KRAUSS
The return of 1970s-era gas lines to the five boroughs of New York City was not the result of a single miscalculation, but a combination of ignored warnings and indecisiveness.

Observations: words. Hidden variables: topics.

SLIDE 5

Topics

genome      disease       software
molecular   tuberculosis  system
sequence    pneumonia     parallel
DNA         control       hardware
human       doctor        cyber
genetics    weak          network
map         resistance    data
project     fatal         program

SLIDE 6

Example: social network modeling

Observations: social interactions. Hidden: communities, relationships.

SLIDE 7

Example: bio-informatics

Observations: gene expressions. Hidden variables: gene regulators.


SLIDE 9

Linear Bayesian Network

[Diagram: hidden nodes $h_1, h_2, h_3$ above observed nodes $x_1, \dots, x_8$]

Markov relationship on a DAG

- $\mathrm{PA}_i$: parents of node $i$.
- $P_\theta(z) = \prod_{i=1}^n P_\theta(z_i \mid z_{\mathrm{PA}_i})$.

Linear model with latent nodes

- Observed variables $\{x_i\}$ and hidden variables $\{h_i\}$.
- Linear relations: $x_i = \sum_{j \in \mathrm{PA}_i} a_{ij} h_j + \varepsilon_i$.
- Uncorrelated noise variables $\varepsilon_i$.

SLIDE 10

Learning latent models

Goal: Given the observed data, learn the structure and parameters of the model.

Challenges:

- Identifiability: many models can explain the observed data!
  - ICA: no edges between hidden nodes
  - LDA: hidden variables are drawn from a Dirichlet distribution
  - Latent trees, graphical models with long cycles
    [Anandkumar et al. 2011, Choi et al. 2011, Daskalakis et al. 2006]
- Tractable learning algorithms:
  - Maximum likelihood (tractable on trees, NP-hard in general)
  - Expectation maximization [Redner, Walker 1984], Gibbs sampling [Asuncion et al. 2011]
  - Local tests [Bresler et al. 2008, Anandkumar et al. 2012]
  - Convex relaxations (e.g. Lasso) [Meinshausen, Bühlmann 2006; Ravikumar, Wainwright 2010]



SLIDE 14

An example

[Diagram: mixing matrix $A = (a_{ij})$, hidden-layer matrix $\Lambda = (\lambda_{ij})$, combined map $A(I - \Lambda)^{-1}$ driven by $\eta_1, \eta_2, \eta_3$]

$x = Ah + \varepsilon, \quad h = \Lambda h + \eta \;\Longrightarrow\; x = A(I - \Lambda)^{-1}\eta + \varepsilon$


SLIDE 16

An example

[Diagram: $A = (a_{ij})$, $\Lambda = (\lambda_{ij})$, combined map $A(I - \Lambda)^{-1}$ driven by $\eta_1, \eta_2, \eta_3$]

A prudent restriction on the model

+ broadly applicable tractable learning methods

SLIDE 17

Sufficient conditions for identifiability

Task: Recover $A$.

Structural condition: (additive) graph expansion
$|N(S)| \ge |S| + d_{\max}$, for all $S \subseteq H$.

Parametric condition: generic parameters
$\|Av\|_0 > |N_A(\mathrm{supp}(v))| - |\mathrm{supp}(v)|$

[Diagram: a hidden subset $S$ and its observed neighborhood $N(S)$]

Identifiability result

Under the above conditions, $A$ can be uniquely recovered from $\mathbb{E}[xx^T]$.
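On small graphs, the expansion condition above can be verified by brute force. A minimal sketch, assuming the bipartite graph is given as a dict from hidden nodes to their sets of observed neighbors and taking $d_{\max}$ as the maximum hidden-node degree; subsets are restricted to $|S| \ge 2$, since a singleton can never satisfy $|N(S)| \ge |S| + d_{\max}$ (the function name and input format are illustrative, not from the talk):

```python
from itertools import combinations

def has_additive_expansion(nbrs):
    """Check |N(S)| >= |S| + d_max for every hidden subset S with |S| >= 2.

    nbrs: dict mapping each hidden node to the set of its observed neighbors.
    Exponential in the number of hidden nodes -- a demo, not the real test.
    """
    hidden = list(nbrs)
    d_max = max(len(v) for v in nbrs.values())
    for r in range(2, len(hidden) + 1):
        for subset in combinations(hidden, r):
            neighborhood = set().union(*(nbrs[h] for h in subset))
            if len(neighborhood) < len(subset) + d_max:
                return False
    return True

# Three hidden nodes whose neighborhoods overlap only pairwise: expands.
good = {"h1": {1, 2, 3, 4}, "h2": {4, 5, 6, 7}, "h3": {7, 8, 9, 10}}
# Two hidden nodes sharing the same two observed children: does not expand.
bad = {"h1": {1, 2}, "h2": {1, 2}}
```

The failing example also shows why expansion matters: two hidden nodes with identical children cannot be told apart from observed correlations alone.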



SLIDE 20

Intuition

- Denoising the moment:
  $\mathbb{E}[xx^T] = \underbrace{A\,\mathbb{E}[hh^T]A^T}_{\text{low rank}} + \underbrace{\mathbb{E}[\varepsilon\varepsilon^T]}_{\text{diagonal}}$
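The denoising step rests on a simple fact: because the noise is uncorrelated, $\mathbb{E}[\varepsilon\varepsilon^T]$ is diagonal and leaves every off-diagonal entry of $\mathbb{E}[xx^T]$ untouched. A numerical sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3                                  # observed / hidden sizes (arbitrary)
A = rng.normal(size=(n, k))
G = rng.normal(size=(k, k))
M_h = G @ G.T + np.eye(k)                    # a non-degenerate E[hh^T]
D = np.diag(rng.uniform(0.5, 1.0, size=n))   # uncorrelated noise => diagonal E[eps eps^T]

low_rank = A @ M_h @ A.T                     # rank-k term, carries Col(A)
sigma = low_rank + D                         # E[xx^T]

# Off-diagonal entries of E[xx^T] already equal those of the rank-k term,
# so only the diagonal has to be "denoised" away.
off_diag = ~np.eye(n, dtype=bool)
```

Recovering the diagonal is then a low-rank-plus-diagonal decomposition problem, which is well posed when $k \ll n$.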


SLIDE 24

Intuition

- Denoising the moment: $A\,\mathbb{E}[hh^T]A^T$
- For non-degenerate $\mathbb{E}[hh^T]$, we know $\mathrm{Col}(A)$.
- Under the above conditions, the sparsest vectors in $\mathrm{Col}(A)$ are the columns of $A$.
  [Spielman, Wang, Wright 2012]

SLIDE 25

Intuition

- Denoising the moment: $A\,\mathbb{E}[hh^T]A^T$
- For non-degenerate $\mathbb{E}[hh^T]$, we know $\mathrm{Col}(A)$.
- Under the above conditions, the sparsest vectors in $\mathrm{Col}(A)$ are the columns of $A$.

Exhaustive search

1. Let $U = \mathrm{Col}(A\,\mathbb{E}[hh^T]A^T)$.
2. Solve $\min_{z \ne 0} \|Uz\|_0$.
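The $\ell_0$ search can be made concrete at toy sizes: a vector in the span of an $n \times k$ basis that vanishes on $k-1$ generic coordinates is determined up to scale, so one can enumerate all $(k-1)$-subsets of rows. A brute-force sketch (the function and the example matrix are illustrative, not from the talk):

```python
from itertools import combinations
import numpy as np

def sparsest_in_span(U, tol=1e-8):
    """Return (v, nnz): a sparsest nonzero vector in Col(U) and its support size.

    U: n x k basis matrix. Generically a sparsest span vector vanishes on some
    k-1 coordinates, so enumerate those coordinate sets (exponential cost).
    """
    n, k = U.shape
    best, best_nnz = None, n + 1
    for rows in combinations(range(n), k - 1):
        _, _, vt = np.linalg.svd(U[list(rows), :])
        z = vt[-1]                      # null direction of the selected rows
        v = U @ z
        nnz = int(np.sum(np.abs(v) > tol))
        if 0 < nnz < best_nnz:
            best, best_nnz = v, nnz
    return best, best_nnz

# Columns of A have disjoint supports, so they are the sparsest span vectors.
A = np.zeros((6, 3))
A[[0, 1], 0] = [1.0, -2.0]
A[[2, 3], 1] = [0.5, 1.5]
A[[4, 5], 2] = [2.0, 1.0]
rng = np.random.default_rng(1)
U = A @ rng.normal(size=(3, 3))         # an unknown invertible mixing of the columns
v, nnz = sparsest_in_span(U)            # recovers a scaled column of A
```

The exponential enumeration is exactly why the next slide replaces $\ell_0$ with an $\ell_1$ relaxation.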

SLIDE 26

A tractable algorithm

Task: Recover $A$ from $U = \mathrm{Col}(A\,\mathbb{E}[hh^T]A^T)$.

TWMLearn

1. Let $U = \mathrm{Col}(A\,\mathbb{E}[hh^T]A^T) \in \mathbb{R}^{n \times k}$.
2. For each $i$, solve $\min_z \|Uz\|_1$ subject to $(e_i^T U)z = 1$.
3. Set $s_i = Uz$, and $S = \{s_1, \dots, s_n\}$.
4. Return a maximal full-rank subset of $S$.

Under "reasonable" conditions, the above program exactly recovers $A$.
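Step 2 is a linear program: introduce slacks $t \ge |Uz|$ and minimize $\sum_i t_i$. A toy rendition using SciPy's `linprog` (the example matrices and sizes are made up; only the $\ell_1$ program and the full-rank filtering follow the slide):

```python
import numpy as np
from scipy.optimize import linprog

def twm_learn(U, tol=1e-6):
    """Toy TWMLearn: for each row i solve min ||Uz||_1 s.t. (e_i^T U) z = 1,
    then keep a maximal linearly independent subset of the solutions s_i = Uz."""
    n, k = U.shape
    sols = []
    for i in range(n):
        # Variables (z, t): minimize sum(t) with -t <= Uz <= t and U[i] @ z = 1.
        c = np.concatenate([np.zeros(k), np.ones(n)])
        a_ub = np.block([[U, -np.eye(n)], [-U, -np.eye(n)]])
        a_eq = np.concatenate([U[i], np.zeros(n)])[None, :]
        res = linprog(c, A_ub=a_ub, b_ub=np.zeros(2 * n),
                      A_eq=a_eq, b_eq=[1.0],
                      bounds=[(None, None)] * (k + n))
        if res.success:
            sols.append(U @ res.x[:k])
    basis = []
    for s in sols:
        if np.linalg.matrix_rank(np.array(basis + [s]), tol=tol) == len(basis) + 1:
            basis.append(s)
    return np.array(basis).T      # columns of A, up to scaling and permutation

# Disjoint column supports: the l1 minimizer provably picks a single column.
A = np.zeros((6, 3))
A[[0, 1], 0] = [1.0, -2.0]
A[[2, 3], 1] = [0.5, 1.5]
A[[4, 5], 2] = [2.0, 1.0]
rng = np.random.default_rng(2)
G = rng.normal(size=(3, 3))
M = G @ G.T + np.eye(3)                       # a non-degenerate E[hh^T]
basisU, _, _ = np.linalg.svd(A @ M @ A.T)
A_hat = twm_learn(basisU[:, :3])              # orthonormal basis for Col(A)
```

Each row constraint normalizes one coordinate to 1, so columns are recovered only up to scale; the rank filter then discards the $n - k$ redundant rescalings.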

SLIDE 27

Learning latent space parameters

Recall so far: recovered $A$

- from the second-order moment $\mathbb{E}[xx^T]$
- under no assumption on the hidden variables!

What hidden structures can be learnt from low-order observed moments?


SLIDE 29

Multi-level DAGs

[Diagram: observed $x$, matrix $A$, hidden layer $h$, matrix $\tilde{A}$, deeper hidden layer $\tilde{h}$]

$\mathbb{E}[xx^T] = A\,\mathbb{E}[hh^T]A^T + \mathbb{E}[\varepsilon\varepsilon^T]$

SLIDE 30

Multi-level DAGs

Denoise to keep the low-rank part: $A\,\mathbb{E}[hh^T]A^T$

SLIDE 32

Multi-level DAGs

Peel off the first layer with the pseudo-inverse: $A^{\dagger}\,A\,\mathbb{E}[hh^T]A^T(A^{\dagger})^T = \mathbb{E}[hh^T]$

SLIDE 33

Multi-level DAGs

Recurse on the hidden layer: $\mathbb{E}[hh^T] = \tilde{A}\,\mathbb{E}[\tilde{h}\tilde{h}^T]\tilde{A}^T + \mathbb{E}[\tilde{\varepsilon}\tilde{\varepsilon}^T]$

SLIDE 34

Multi-level DAGs

Denoise again: $\tilde{A}\,\mathbb{E}[\tilde{h}\tilde{h}^T]\tilde{A}^T$

SLIDE 36

Multi-level DAGs

$\tilde{A}^{\dagger}\,\tilde{A}\,\mathbb{E}[\tilde{h}\tilde{h}^T]\tilde{A}^T(\tilde{A}^{\dagger})^T$

SLIDE 37

Multi-level DAGs

$\mathbb{E}[\tilde{h}\tilde{h}^T]$
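The peeling step can be checked numerically: when $A$ has full column rank, $A^{\dagger}A = I$, so conjugating the denoised moment by the pseudo-inverse recovers the hidden-layer moment exactly. A small sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 4                                  # observed / hidden sizes (arbitrary)
A = rng.normal(size=(n, k))                   # full column rank with probability 1
G = rng.normal(size=(k, k))
M_h = G @ G.T + np.eye(k)                     # E[hh^T] for the hidden layer

denoised = A @ M_h @ A.T                      # low-rank part of E[xx^T]
A_pinv = np.linalg.pinv(A)                    # A^dagger, so A^dagger @ A = I_k
M_h_recovered = A_pinv @ denoised @ A_pinv.T  # equals E[hh^T]; now recurse on it
```

With `M_h_recovered` in hand, the same denoise-then-peel step applies one level up, which is exactly the recursion pictured on the slides.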

SLIDE 38

Linear structural equations

- Recall $x = Ah + \varepsilon$.
- Now additionally $A$ is full rank (each hidden node has at least one observed neighbor).
- Linear dependence among hidden nodes: $h_j = \sum_{i \in \mathrm{PA}_j} \lambda_{ji} h_i + \eta_j$ (in matrix form, $h = \Lambda h + \eta$).
- Noise variables $\eta_j$ are uncorrelated.

[Diagram: $\Lambda$ within the hidden layer $h$, matrix $A$ down to the observed layer $x$]

Spectral approach for learning



SLIDE 44

Learning $\Lambda$: idea

[Diagram: hidden nodes $h_1, \dots, h_4$ connected by $A$ to observed nodes $x_1, \dots, x_5$]

$x = Ah + \varepsilon, \quad h = \Lambda h + \eta$

[Diagram: the same network redrawn with independent drivers $\eta_1, \dots, \eta_4$ and combined map $A(I-\Lambda)^{-1}$]

$x = A(I - \Lambda)^{-1}\eta + \varepsilon$
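The two displayed forms are algebraically identical, which is easy to confirm by simulation: draw $\eta$ and $\varepsilon$, solve $h = \Lambda h + \eta$ with $\Lambda$ strictly lower triangular (a DAG in topological order), and compare. Sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
k, n = 4, 8
# Strictly lower-triangular Lambda: hidden nodes form a DAG in topological order.
Lam = np.tril(rng.normal(scale=0.5, size=(k, k)), k=-1)
A = rng.normal(size=(n, k))
eta = rng.normal(size=k)                       # hidden-layer noise
eps = rng.normal(scale=0.1, size=n)            # observation noise

h = np.linalg.solve(np.eye(k) - Lam, eta)      # solves h = Lam @ h + eta
x_structural = A @ h + eps                     # x = A h + eps
x_reduced = A @ np.linalg.inv(np.eye(k) - Lam) @ eta + eps
```

The reduced form is what makes the spectral approach applicable: $x$ is a linear mixture $A(I-\Lambda)^{-1}$ of the uncorrelated drivers $\eta$, plus diagonal noise.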


SLIDE 46

Learning $\Lambda$: idea

- Employ the spectral approach to learn $A(I - \Lambda)^{-1}$.
  - Second-order moment:
    $\mathbb{E}[xx^T] = A(I-\Lambda)^{-1}\,\mathbb{E}[\eta\eta^T]\,(A(I-\Lambda)^{-1})^T + \mathbb{E}[\varepsilon\varepsilon^T]$
  - Third-order moment:
    $\mathbb{E}[xx^T\langle \xi, x\rangle] = A(I-\Lambda)^{-1}\,\mathbb{E}[\eta\eta^T\langle \eta, A^T\xi\rangle]\,(A(I-\Lambda)^{-1})^T + \mathbb{E}[\varepsilon\varepsilon^T\langle \xi, \varepsilon\rangle]$
- Simultaneous diagonalization of the moments (through SVD or tensor decompositions)
  [Anandkumar, Foster, Hsu, Kakade, Liu 2012] [Anandkumar, Ge, Hsu, Kakade 2012]
- "$\mathrm{Col}(A) = \mathrm{Col}(A(I-\Lambda)^{-1})$" + "expansion property" $\Longrightarrow$ $A$ and $\Lambda$
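The simultaneous-diagonalization step can be illustrated in the simplified square, invertible case: if $M_2 = B D_1 B^T$ and $M_3 = B D_2 B^T$ with diagonal $D_1, D_2$, then $M_3 M_2^{-1} = B (D_2 D_1^{-1}) B^{-1}$, whose eigenvectors are the columns of $B$. Here $B$ stands in for $A(I-\Lambda)^{-1}$; this is a sketch of the idea, not the paper's full algorithm:

```python
import numpy as np

rng = np.random.default_rng(5)
k = 4
B = rng.normal(size=(k, k))                 # stands in for A(I - Lambda)^{-1}
# Diagonals chosen with well-separated ratios d2/d1, so eigenvalues are distinct.
d1 = np.array([1.0, 1.2, 1.5, 2.0])         # plays E[eta eta^T]
d2 = np.array([2.0, 1.0, 1.8, 1.1])         # plays the weighted third-order diagonal
M2 = B @ np.diag(d1) @ B.T
M3 = B @ np.diag(d2) @ B.T

# M3 @ inv(M2) = B diag(d2/d1) inv(B): its eigenvectors recover B's columns.
_, vecs = np.linalg.eig(M3 @ np.linalg.inv(M2))
vecs = np.real(vecs)
vecs = vecs / np.linalg.norm(vecs, axis=0)
cols = B / np.linalg.norm(B, axis=0)
alignment = np.abs(cols.T @ vecs).max(axis=0)   # ~1 when each eigvec matches a column
```

Distinct eigenvalue ratios are what make the eigenvectors identifiable, which is the role the third-order moment plays on the slide.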

SLIDE 47

Experiment

- $k = 25$ hidden nodes and $n = 150$ observed nodes.
- Bernoulli-Gaussian model ($p = 0.3$); total number of edges = 1177.
- Noise variables distributed as exponential, Poisson, chi-squared, and Gaussian, with mean zero and variances chosen randomly in $[0.5, 1]$.

SLIDE 48

Number of samples = 25,000

[Scatter plots: estimated $\tilde{\lambda}_{ij}$ vs. true $\lambda_{ij}$, and estimated $\tilde{a}_{ij}$ vs. true $a_{ij}$]

SLIDE 49

Number of samples = 100,000

[Scatter plots: $\tilde{\lambda}_{ij}$ vs. $\lambda_{ij}$, and $\tilde{a}_{ij}$ vs. $a_{ij}$]

SLIDE 50

Number of samples = 400,000

[Scatter plots: $\tilde{\lambda}_{ij}$ vs. $\lambda_{ij}$, and $\tilde{a}_{ij}$ vs. $a_{ij}$]

SLIDE 51

Conclusion

- Considered learning latent models with arbitrary hidden-variable dependencies.
- Constraints on the model: expansion of the bipartite graph from the hidden to the observed layer, generic parameters, and non-degeneracy.
- Established identifiability of $A$ under no assumption but non-degeneracy of the hidden variables!
- Recovering $A$ through $\ell_1$ optimization.
- Can be used to learn the topic-word matrix under the expansion constraint and arbitrary topic dependencies.
- Learning the hidden-space parameters and structure for multi-level DAGs and linear structural equations.

SLIDE 52

You are welcome to visit our poster presentation (Paper ID: 146)! Thanks!