Learning Linear Bayesian Networks with Latent Variables (PowerPoint PPT presentation)


SLIDE 1

Learning Linear Bayesian Networks with Latent Variables

Adel Javanmard

Stanford University

joint work with Anima Anandkumar*, Daniel Hsu†, Sham Kakade†

* University of California, Irvine   † Microsoft Research, New England

Adel Javanmard (Stanford University) Linear Bayesian Networks 1 / 22

SLIDE 2

Modern data

- Lots of high-dimensional data, but highly structured.
- Learning the underlying structure is central to:
  - Modeling
  - Dimensionality reduction / summarizing data
  - Prediction

This talk: Learning hidden (unobserved) variables that pervade the data.


SLIDE 4

Example: document modeling

Nursing Home Is Faulted Over Care After Storm
By MICHAEL POWELL and SHERI FINK
Amid the worst hurricane to hit New York City in nearly 80 years, officials have claimed that the Promenade Rehabilitation and Health Care Center failed to provide the most basic care to its patients.

In One Day, 11,000 Flee Syria as War and Hardship Worsen
By RICK GLADSTONE and NEIL MacFARQUHAR
The United Nations reported that 11,000 Syrians fled on Friday, the vast majority of them clambering for safety over the Turkish border.

Obama to Insist on Tax Increase for the Wealthy
By HELENE COOPER and JONATHAN WEISMAN
Amid talk of compromise, President Obama and Speaker John A. Boehner both indicated unchanged stances on this issue, long a point of contention.

Hurricane Exposed Flaws in Protection of Tunnels
By ELISABETH ROSENTHAL
Nearly two weeks after Hurricane Sandy struck, the vital arteries that bring cars, trucks and subways into New York City's transportation network have recovered, with one major exception: the Brooklyn-Battery Tunnel remains closed.

Behind New York Gas Lines, Warnings and Crossed Fingers
By DAVID W. CHEN, WINNIE HU and CLIFFORD KRAUSS
The return of 1970s-era gas lines to the five boroughs of New York City was not the result of a single miscalculation, but a combination of ignored warnings and indecisiveness.

Observations: words. Hidden variables: topics.

SLIDE 5

Topics

genome      disease       software
molecular   tuberculosis  system
sequence    pneumonia     parallel
DNA         control       hardware
human       doctor        cyber
genetics    weak          network
map         resistance    data
project     fatal         program

SLIDE 6

Example: social network modeling

Observations: social interactions. Hidden: communities, relationships.

SLIDE 7

Example: bio-informatics

Observations: gene expressions. Hidden variables: gene regulators.


SLIDE 9

Linear Bayesian Network

[Diagram: hidden nodes $h_1, h_2, h_3$ above observed nodes $x_1, \dots, x_8$]

Markov relationship on a DAG

- $\mathrm{PA}_i$: parents of node $i$.
- $P_\theta(z) = \prod_{i=1}^n P_\theta(z_i \mid z_{\mathrm{PA}_i})$.

Linear model with latent nodes

- Observed variables $\{x_i\}$ and hidden variables $\{h_i\}$.
- Linear relations: $x_i = \sum_{j \in \mathrm{PA}_i} a_{ij} h_j + \varepsilon_i$.
- Uncorrelated noise variables $\varepsilon_i$.

SLIDE 10

Learning latent models

Goal: Given the observed data, learn the structure and parameters of the model.

Challenges:

- Identifiability: many models can explain the observed data!
  - ICA: no edges between hidden nodes
  - LDA: hidden variables are drawn from a Dirichlet distribution
  - Latent trees, graphical models with long cycles
    [Anandkumar et al. 2011, Choi et al. 2011, Daskalakis et al. 2006]
- Tractable learning algorithms:
  - Maximum likelihood (tractable on trees, NP-hard in general)
  - Expectation maximization [Redner, Walker 1984], Gibbs sampling [Asuncion et al. 2011]
  - Local tests [Bresler et al. 2008, Anandkumar et al. 2012]
  - Convex relaxations (e.g. Lasso) [Meinshausen, Bühlmann 2006; Ravikumar, Wainwright 2010]



SLIDE 14

An example

[Diagram: mixing matrix $A = (a_{ij})$, hidden-layer matrix $\Lambda = (\lambda_{ij})$, combined map $A(I - \Lambda)^{-1}$ driven by $\eta_1, \eta_2, \eta_3$]

$x = Ah + \varepsilon, \quad h = \Lambda h + \eta \;\Longrightarrow\; x = A(I - \Lambda)^{-1}\eta + \varepsilon$


SLIDE 16

An example

[Diagram: $A = (a_{ij})$, $\Lambda = (\lambda_{ij})$, combined map $A(I - \Lambda)^{-1}$ driven by $\eta_1, \eta_2, \eta_3$]

A prudent restriction on the model

+ broadly applicable tractable learning methods

SLIDE 17

Sufficient conditions for identifiability

Task: Recover $A$.

Structural condition: (additive) graph expansion
$|N(S)| \ge |S| + d_{\max}$, for all $S \subseteq H$.

Parametric condition: generic parameters
$\|Av\|_0 > |N_A(\mathrm{supp}(v))| - |\mathrm{supp}(v)|$

[Diagram: a hidden subset $S$ and its observed neighborhood $N(S)$]

Identifiability result

Under the above conditions, $A$ can be uniquely recovered from $\mathbb{E}[xx^T]$.
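On small graphs, the expansion condition above can be verified by brute force. A minimal sketch, assuming the bipartite graph is given as a dict from hidden nodes to their sets of observed neighbors and taking $d_{\max}$ as the maximum hidden-node degree; subsets are restricted to $|S| \ge 2$, since a singleton can never satisfy $|N(S)| \ge |S| + d_{\max}$ (the function name and input format are illustrative, not from the talk):

```python
from itertools import combinations

def has_additive_expansion(nbrs):
    """Check |N(S)| >= |S| + d_max for every hidden subset S with |S| >= 2.

    nbrs: dict mapping each hidden node to the set of its observed neighbors.
    Exponential in the number of hidden nodes -- a demo, not the real test.
    """
    hidden = list(nbrs)
    d_max = max(len(v) for v in nbrs.values())
    for r in range(2, len(hidden) + 1):
        for subset in combinations(hidden, r):
            neighborhood = set().union(*(nbrs[h] for h in subset))
            if len(neighborhood) < len(subset) + d_max:
                return False
    return True

# Three hidden nodes whose neighborhoods overlap only pairwise: expands.
good = {"h1": {1, 2, 3, 4}, "h2": {4, 5, 6, 7}, "h3": {7, 8, 9, 10}}
# Two hidden nodes sharing the same two observed children: does not expand.
bad = {"h1": {1, 2}, "h2": {1, 2}}
```

The failing example also shows why expansion matters: two hidden nodes with identical children cannot be told apart from observed correlations alone.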



SLIDE 20

Intuition

- Denoising the moment:
  $\mathbb{E}[xx^T] = \underbrace{A\,\mathbb{E}[hh^T]A^T}_{\text{low rank}} + \underbrace{\mathbb{E}[\varepsilon\varepsilon^T]}_{\text{diagonal}}$
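The denoising step rests on a simple fact: because the noise is uncorrelated, $\mathbb{E}[\varepsilon\varepsilon^T]$ is diagonal and leaves every off-diagonal entry of $\mathbb{E}[xx^T]$ untouched. A numerical sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3                                  # observed / hidden sizes (arbitrary)
A = rng.normal(size=(n, k))
G = rng.normal(size=(k, k))
M_h = G @ G.T + np.eye(k)                    # a non-degenerate E[hh^T]
D = np.diag(rng.uniform(0.5, 1.0, size=n))   # uncorrelated noise => diagonal E[eps eps^T]

low_rank = A @ M_h @ A.T                     # rank-k term, carries Col(A)
sigma = low_rank + D                         # E[xx^T]

# Off-diagonal entries of E[xx^T] already equal those of the rank-k term,
# so only the diagonal has to be "denoised" away.
off_diag = ~np.eye(n, dtype=bool)
```

Recovering the diagonal is then a low-rank-plus-diagonal decomposition problem, which is well posed when $k \ll n$.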


SLIDE 24

Intuition

- Denoising the moment: $A\,\mathbb{E}[hh^T]A^T$
- For non-degenerate $\mathbb{E}[hh^T]$, we know $\mathrm{Col}(A)$.
- Under the above conditions, the sparsest vectors in $\mathrm{Col}(A)$ are the columns of $A$.
  [Spielman, Wang, Wright 2012]

SLIDE 25

Intuition

- Denoising the moment: $A\,\mathbb{E}[hh^T]A^T$
- For non-degenerate $\mathbb{E}[hh^T]$, we know $\mathrm{Col}(A)$.
- Under the above conditions, the sparsest vectors in $\mathrm{Col}(A)$ are the columns of $A$.

Exhaustive search

1. Let $U = \mathrm{Col}(A\,\mathbb{E}[hh^T]A^T)$.
2. Solve $\min_{z \ne 0} \|Uz\|_0$.
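The $\ell_0$ search can be made concrete at toy sizes: a vector in the span of an $n \times k$ basis that vanishes on $k-1$ generic coordinates is determined up to scale, so one can enumerate all $(k-1)$-subsets of rows. A brute-force sketch (the function and the example matrix are illustrative, not from the talk):

```python
from itertools import combinations
import numpy as np

def sparsest_in_span(U, tol=1e-8):
    """Return (v, nnz): a sparsest nonzero vector in Col(U) and its support size.

    U: n x k basis matrix. Generically a sparsest span vector vanishes on some
    k-1 coordinates, so enumerate those coordinate sets (exponential cost).
    """
    n, k = U.shape
    best, best_nnz = None, n + 1
    for rows in combinations(range(n), k - 1):
        _, _, vt = np.linalg.svd(U[list(rows), :])
        z = vt[-1]                      # null direction of the selected rows
        v = U @ z
        nnz = int(np.sum(np.abs(v) > tol))
        if 0 < nnz < best_nnz:
            best, best_nnz = v, nnz
    return best, best_nnz

# Columns of A have disjoint supports, so they are the sparsest span vectors.
A = np.zeros((6, 3))
A[[0, 1], 0] = [1.0, -2.0]
A[[2, 3], 1] = [0.5, 1.5]
A[[4, 5], 2] = [2.0, 1.0]
rng = np.random.default_rng(1)
U = A @ rng.normal(size=(3, 3))         # an unknown invertible mixing of the columns
v, nnz = sparsest_in_span(U)            # recovers a scaled column of A
```

The exponential enumeration is exactly why the next slide replaces $\ell_0$ with an $\ell_1$ relaxation.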

SLIDE 26

A tractable algorithm

Task: Recover $A$ from $U = \mathrm{Col}(A\,\mathbb{E}[hh^T]A^T)$.

TWMLearn

1. Let $U = \mathrm{Col}(A\,\mathbb{E}[hh^T]A^T) \in \mathbb{R}^{n \times k}$.
2. For each $i$, solve $\min_z \|Uz\|_1$ subject to $(e_i^T U)z = 1$.
3. Set $s_i = Uz$, and $S = \{s_1, \dots, s_n\}$.
4. Return a maximal full-rank subset of $S$.

Under "reasonable" conditions, the above program exactly recovers $A$.
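Step 2 is a linear program: introduce slacks $t \ge |Uz|$ and minimize $\sum_i t_i$. A toy rendition using SciPy's `linprog` (the example matrices and sizes are made up; only the $\ell_1$ program and the full-rank filtering follow the slide):

```python
import numpy as np
from scipy.optimize import linprog

def twm_learn(U, tol=1e-6):
    """Toy TWMLearn: for each row i solve min ||Uz||_1 s.t. (e_i^T U) z = 1,
    then keep a maximal linearly independent subset of the solutions s_i = Uz."""
    n, k = U.shape
    sols = []
    for i in range(n):
        # Variables (z, t): minimize sum(t) with -t <= Uz <= t and U[i] @ z = 1.
        c = np.concatenate([np.zeros(k), np.ones(n)])
        a_ub = np.block([[U, -np.eye(n)], [-U, -np.eye(n)]])
        a_eq = np.concatenate([U[i], np.zeros(n)])[None, :]
        res = linprog(c, A_ub=a_ub, b_ub=np.zeros(2 * n),
                      A_eq=a_eq, b_eq=[1.0],
                      bounds=[(None, None)] * (k + n))
        if res.success:
            sols.append(U @ res.x[:k])
    basis = []
    for s in sols:
        if np.linalg.matrix_rank(np.array(basis + [s]), tol=tol) == len(basis) + 1:
            basis.append(s)
    return np.array(basis).T      # columns of A, up to scaling and permutation

# Disjoint column supports: the l1 minimizer provably picks a single column.
A = np.zeros((6, 3))
A[[0, 1], 0] = [1.0, -2.0]
A[[2, 3], 1] = [0.5, 1.5]
A[[4, 5], 2] = [2.0, 1.0]
rng = np.random.default_rng(2)
G = rng.normal(size=(3, 3))
M = G @ G.T + np.eye(3)                       # a non-degenerate E[hh^T]
basisU, _, _ = np.linalg.svd(A @ M @ A.T)
A_hat = twm_learn(basisU[:, :3])              # orthonormal basis for Col(A)
```

Each row constraint normalizes one coordinate to 1, so columns are recovered only up to scale; the rank filter then discards the $n - k$ redundant rescalings.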

SLIDE 27

Learning latent space parameters

Recall so far: recovered $A$

- from the second-order moment $\mathbb{E}[xx^T]$
- under no assumption on the hidden variables!

What hidden structures can be learnt from low-order observed moments?


SLIDE 29

Multi-level DAGs

[Diagram: observed $x$, matrix $A$, hidden layer $h$, matrix $\tilde{A}$, deeper hidden layer $\tilde{h}$]

$\mathbb{E}[xx^T] = A\,\mathbb{E}[hh^T]A^T + \mathbb{E}[\varepsilon\varepsilon^T]$

SLIDE 30

Multi-level DAGs

Denoise to keep the low-rank part: $A\,\mathbb{E}[hh^T]A^T$

SLIDE 32

Multi-level DAGs

Peel off the first layer with the pseudo-inverse: $A^{\dagger}\,A\,\mathbb{E}[hh^T]A^T(A^{\dagger})^T = \mathbb{E}[hh^T]$

SLIDE 33

Multi-level DAGs

Recurse on the hidden layer: $\mathbb{E}[hh^T] = \tilde{A}\,\mathbb{E}[\tilde{h}\tilde{h}^T]\tilde{A}^T + \mathbb{E}[\tilde{\varepsilon}\tilde{\varepsilon}^T]$

SLIDE 34

Multi-level DAGs

Denoise again: $\tilde{A}\,\mathbb{E}[\tilde{h}\tilde{h}^T]\tilde{A}^T$

SLIDE 36

Multi-level DAGs

$\tilde{A}^{\dagger}\,\tilde{A}\,\mathbb{E}[\tilde{h}\tilde{h}^T]\tilde{A}^T(\tilde{A}^{\dagger})^T$

SLIDE 37

Multi-level DAGs

$\mathbb{E}[\tilde{h}\tilde{h}^T]$
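The peeling step can be checked numerically: when $A$ has full column rank, $A^{\dagger}A = I$, so conjugating the denoised moment by the pseudo-inverse recovers the hidden-layer moment exactly. A small sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 4                                  # observed / hidden sizes (arbitrary)
A = rng.normal(size=(n, k))                   # full column rank with probability 1
G = rng.normal(size=(k, k))
M_h = G @ G.T + np.eye(k)                     # E[hh^T] for the hidden layer

denoised = A @ M_h @ A.T                      # low-rank part of E[xx^T]
A_pinv = np.linalg.pinv(A)                    # A^dagger, so A^dagger @ A = I_k
M_h_recovered = A_pinv @ denoised @ A_pinv.T  # equals E[hh^T]; now recurse on it
```

With `M_h_recovered` in hand, the same denoise-then-peel step applies one level up, which is exactly the recursion pictured on the slides.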

SLIDE 38

Linear structural equations

- Recall $x = Ah + \varepsilon$.
- Now additionally $A$ is full rank (each hidden node has at least one observed neighbor).
- Linear dependence among hidden nodes: $h_j = \sum_{i \in \mathrm{PA}_j} \lambda_{ji} h_i + \eta_j$ (in matrix form, $h = \Lambda h + \eta$).
- Noise variables $\eta_j$ are uncorrelated.

[Diagram: $\Lambda$ within the hidden layer $h$, matrix $A$ down to the observed layer $x$]

Spectral approach for learning



SLIDE 44

Learning $\Lambda$: idea

[Diagram: hidden nodes $h_1, \dots, h_4$ connected by $A$ to observed nodes $x_1, \dots, x_5$]

$x = Ah + \varepsilon, \quad h = \Lambda h + \eta$

[Diagram: the same network redrawn with independent drivers $\eta_1, \dots, \eta_4$ and combined map $A(I-\Lambda)^{-1}$]

$x = A(I - \Lambda)^{-1}\eta + \varepsilon$
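The two displayed forms are algebraically identical, which is easy to confirm by simulation: draw $\eta$ and $\varepsilon$, solve $h = \Lambda h + \eta$ with $\Lambda$ strictly lower triangular (a DAG in topological order), and compare. Sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
k, n = 4, 8
# Strictly lower-triangular Lambda: hidden nodes form a DAG in topological order.
Lam = np.tril(rng.normal(scale=0.5, size=(k, k)), k=-1)
A = rng.normal(size=(n, k))
eta = rng.normal(size=k)                       # hidden-layer noise
eps = rng.normal(scale=0.1, size=n)            # observation noise

h = np.linalg.solve(np.eye(k) - Lam, eta)      # solves h = Lam @ h + eta
x_structural = A @ h + eps                     # x = A h + eps
x_reduced = A @ np.linalg.inv(np.eye(k) - Lam) @ eta + eps
```

The reduced form is what makes the spectral approach applicable: $x$ is a linear mixture $A(I-\Lambda)^{-1}$ of the uncorrelated drivers $\eta$, plus diagonal noise.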


SLIDE 46

Learning $\Lambda$: idea

- Employ the spectral approach to learn $A(I - \Lambda)^{-1}$.
  - Second-order moment:
    $\mathbb{E}[xx^T] = A(I-\Lambda)^{-1}\,\mathbb{E}[\eta\eta^T]\,(A(I-\Lambda)^{-1})^T + \mathbb{E}[\varepsilon\varepsilon^T]$
  - Third-order moment:
    $\mathbb{E}[xx^T\langle \xi, x\rangle] = A(I-\Lambda)^{-1}\,\mathbb{E}[\eta\eta^T\langle \eta, A^T\xi\rangle]\,(A(I-\Lambda)^{-1})^T + \mathbb{E}[\varepsilon\varepsilon^T\langle \xi, \varepsilon\rangle]$
- Simultaneous diagonalization of the moments (through SVD or tensor decompositions)
  [Anandkumar, Foster, Hsu, Kakade, Liu 2012] [Anandkumar, Ge, Hsu, Kakade 2012]
- "$\mathrm{Col}(A) = \mathrm{Col}(A(I-\Lambda)^{-1})$" + "expansion property" $\Longrightarrow$ $A$ and $\Lambda$
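The simultaneous-diagonalization step can be illustrated in the simplified square, invertible case: if $M_2 = B D_1 B^T$ and $M_3 = B D_2 B^T$ with diagonal $D_1, D_2$, then $M_3 M_2^{-1} = B (D_2 D_1^{-1}) B^{-1}$, whose eigenvectors are the columns of $B$. Here $B$ stands in for $A(I-\Lambda)^{-1}$; this is a sketch of the idea, not the paper's full algorithm:

```python
import numpy as np

rng = np.random.default_rng(5)
k = 4
B = rng.normal(size=(k, k))                 # stands in for A(I - Lambda)^{-1}
# Diagonals chosen with well-separated ratios d2/d1, so eigenvalues are distinct.
d1 = np.array([1.0, 1.2, 1.5, 2.0])         # plays E[eta eta^T]
d2 = np.array([2.0, 1.0, 1.8, 1.1])         # plays the weighted third-order diagonal
M2 = B @ np.diag(d1) @ B.T
M3 = B @ np.diag(d2) @ B.T

# M3 @ inv(M2) = B diag(d2/d1) inv(B): its eigenvectors recover B's columns.
_, vecs = np.linalg.eig(M3 @ np.linalg.inv(M2))
vecs = np.real(vecs)
vecs = vecs / np.linalg.norm(vecs, axis=0)
cols = B / np.linalg.norm(B, axis=0)
alignment = np.abs(cols.T @ vecs).max(axis=0)   # ~1 when each eigvec matches a column
```

Distinct eigenvalue ratios are what make the eigenvectors identifiable, which is the role the third-order moment plays on the slide.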

SLIDE 47

Experiment

- $k = 25$ hidden nodes and $n = 150$ observed nodes.
- Bernoulli-Gaussian model ($p = 0.3$); total number of edges = 1177.
- Noise variables distributed as exponential, Poisson, chi-squared, and Gaussian, with mean zero and variances chosen randomly in $[0.5, 1]$.

SLIDE 48

Number of samples = 25,000

[Scatter plots: estimated $\tilde{\lambda}_{ij}$ vs. true $\lambda_{ij}$, and estimated $\tilde{a}_{ij}$ vs. true $a_{ij}$]

SLIDE 49

Number of samples = 100,000

[Scatter plots: $\tilde{\lambda}_{ij}$ vs. $\lambda_{ij}$, and $\tilde{a}_{ij}$ vs. $a_{ij}$]

SLIDE 50

Number of samples = 400,000

[Scatter plots: $\tilde{\lambda}_{ij}$ vs. $\lambda_{ij}$, and $\tilde{a}_{ij}$ vs. $a_{ij}$]

SLIDE 51

Conclusion

- Considered learning latent models with arbitrary hidden-variable dependencies.
- Constraints on the model: expansion of the bipartite graph from the hidden to the observed layer, generic parameters, and non-degeneracy.
- Established identifiability of $A$ under no assumption but non-degeneracy of the hidden variables!
- Recovering $A$ through $\ell_1$ optimization.
- Can be used to learn the topic-word matrix under the expansion constraint and arbitrary topic dependencies.
- Learning the hidden-space parameters and structure for multi-level DAGs and linear structural equations.

SLIDE 52

You are welcome to visit our poster presentation (Paper ID: 146)! Thanks!