Stochastic Blockmodels meet Graph Neural Networks


SLIDE 1

Stochastic Blockmodels meet Graph Neural Networks

Nikhil Mehta Lawrence Carin Piyush Rai

International Conference on Machine Learning (ICML) 2019, Long Beach, CA

SLIDE 2

Problem Statement

➢ Goal: Learn sparse node embeddings for graphs.
➢ Motivation:

▪ Can be used for downstream machine learning tasks – link/edge prediction, node classification, community discovery.

➢ Some notation:

▪ Consider a graph with an associated adjacency matrix: A ∈ {0,1}^{N×N}
▪ Additional side information associated with each node: X ∈ ℝ^{N×D}

SLIDE 3

Some Existing Work

  • Probabilistic Methods:
  • A simple class of models: Stochastic Block Models (SBM) [Nowicki & Snijders, 2001]

    z_i ∼ Multinoulli(π);  A_{ij} ∼ Bernoulli(z_i^T W z_j)

  • Overlapping SBM (OSBM) [Miller et al., 2009] – participation in multiple communities.
  • Latent Feature Relational Model (LFRM): z_i ∈ {0,1}^K, K → ∞

    z ∼ IBP(α);  λ_{k,k'} ∼ N(0, σ_λ²);  A_{ij} ∼ Bernoulli(σ(z_i^T Λ z_j))

  • Can handle uncertainty & missing data better. ☺
  • Interpretability can be achieved by a suitable choice of prior. ☺
  • Use iterative inference methods (MCMC, VB); not easy to scale. ☹
  • What about the Variational Graph Autoencoder (VGAE) [Kipf & Welling, 2016]?
  • Encoder – Graph Convolutional Network (GCN)
  • Decoder – link prediction: σ(z_i^T z_j), or node classification: softmax(g(z))
  • Fast and scalable. ☺
  • Generative method + deep NN = best of both worlds? No:
  • Embeddings are often not interpretable. ☹
  • What should be the size of the latent space? ☹
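The SBM generative process above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the paper: `sample_sbm` is a hypothetical helper, and symmetrizing to an undirected simple graph is our assumption.

```python
import numpy as np

def sample_sbm(n_nodes, pi, W, rng=None):
    """Sample a graph from a vanilla SBM (illustrative sketch).

    pi : (K,) community probabilities; W : (K, K) edge probabilities per
    community pair.
    """
    rng = np.random.default_rng(rng)
    K = len(pi)
    # Community assignment z_i ~ Multinoulli(pi) for each node.
    z = rng.choice(K, size=n_nodes, p=pi)
    # Edge probability for pair (i, j) is W[z_i, z_j]; A_ij ~ Bernoulli(.).
    P = W[z[:, None], z[None, :]]
    A = (rng.random((n_nodes, n_nodes)) < P).astype(int)
    # Symmetrize and drop self-loops for an undirected simple graph.
    A = np.triu(A, 1)
    A = A + A.T
    return A, z
```

Nodes in the same block pair share a single edge probability, which is exactly the "block" structure the latent-feature models on this slide generalize.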

SLIDE 4

Deep Generative LFRM

  • We propose DGLFRM – a Deep Generative Latent Feature Relational Model for graphs.
  • Unification: the interpretability of SBMs + fast inference via a Graph Neural Network.
  • The node embedding z_n is the element-wise product of two other latent variables: z_n = b_n ⊙ r_n.
  • b_n ∈ {0,1}^K defines the node-community memberships (cluster assignments). This allows the model to infer the “active communities” for a given truncation K.
  • r_n ∈ ℝ^K defines the node-community membership strengths.

SLIDE 5

Deep Generative LFRM

Generative Story

  • Membership vector (b_n ∈ {0,1}^K):
  • Stick-breaking IBP:
  • v_k ∼ Beta(α, 1), k = 1, 2, …, K
  • π_k = ∏_{j=1}^{k} v_j;  b_{nk} ∼ Bernoulli(π_k)
  • Membership strength (r_n ∈ ℝ^K):
  • r_n ∼ N(0, I)
  • Node embedding: z_n = b_n ⊙ r_n
  • f_n = f(z_n), where f is a multi-layered perceptron.
  • p(A_{nm} | f_n, f_m) = σ(f_n^T f_m)
  • Posterior: p(v, b, r | A, X)
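The generative story can be sketched end-to-end in NumPy. This is a minimal illustration under stated assumptions: `sample_dglfrm_prior` is a hypothetical helper, the IBP is truncated at a finite K, and the decoder uses the identity in place of the MLP f to keep the sketch short.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_dglfrm_prior(n_nodes, K, alpha=2.0, rng=None):
    """Sample from the (truncated) generative story sketched above.

    Returns memberships b, strengths r, embeddings z = b * r, and edge
    probabilities sigma(z_n^T z_m) (identity in place of the MLP f).
    """
    rng = np.random.default_rng(rng)
    v = rng.beta(alpha, 1.0, size=K)                    # v_k ~ Beta(alpha, 1)
    pi = np.cumprod(v)                                  # pi_k = prod_{j<=k} v_j
    b = (rng.random((n_nodes, K)) < pi).astype(float)   # b_nk ~ Bernoulli(pi_k)
    r = rng.standard_normal((n_nodes, K))               # r_n ~ N(0, I)
    z = b * r                                           # sparse node embedding
    probs = sigmoid(z @ z.T)                            # p(A_nm = 1 | z_n, z_m)
    return b, r, z, probs
```

Because pi_k is a product of terms in (0, 1), later communities are active with geometrically decreasing probability, which is what lets the model switch off unused dimensions of the latent space.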

SLIDE 6

Deep Generative LFRM

Inference Network

  • Full mean-field approximation: approximate the true posterior with the variational posterior.
  • q_φ(v, b, r) = ∏_{k=1}^{K} ∏_{n=1}^{N} q_φ(v_{nk}) q_φ(b_{nk}) q_φ(r_{nk})
  • q_φ(v_{nk}) = Kumaraswamy(v_{nk} | c_k, d_k)
  • q_φ(b_{nk}) = Bernoulli(b_{nk} | π_k)
  • q_φ(r_n) = N(μ_n, diag(σ_n²))
  • The Kumaraswamy can be re-parameterized and acts as a reasonable approximation to the Beta. For the Bernoulli, we use a continuous relaxation (the Concrete distribution).
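Both reparameterizations can be sketched directly. These are hypothetical helpers using standard identities, not the paper's code: the Kumaraswamy sample uses its closed-form inverse CDF, and the binary Concrete follows the usual logistic-noise construction.

```python
import numpy as np

def sample_kumaraswamy(a, b, size, rng=None):
    """Reparameterized Kumaraswamy(a, b) sample via the inverse CDF:
    u ~ Uniform(0, 1), x = (1 - (1 - u)^(1/b))^(1/a)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(1e-6, 1 - 1e-6, size=size)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def sample_binary_concrete(logits, temperature, rng=None):
    """Relaxed Bernoulli (binary Concrete): a smooth sample in (0, 1)
    that approaches a hard 0/1 draw as temperature -> 0."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(logits))
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))
```

In both cases the randomness enters only through the uniform noise u, so gradients with respect to (a, b) or the logits pass through the deterministic transform, which is what makes the inference network trainable end-to-end.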

SLIDE 7

Deep Generative LFRM

Learning

  • Since vanilla mean-field ignores the posterior dependencies among the latent variables, we considered a structured mean-field approximation: q_φ(v, b, r) = ∏_{k=1}^{K} q_φ(v_k) ∏_{n=1}^{N} q_φ(b_{nk} | v) q_φ(r_{nk})
  • The only difference from the mean-field approximation is that v is now a global variable (the same for all nodes); b_{nk} | v ∼ Bernoulli(π_k).
  • We can maximize the following ELBO:

ELBO = ∑_{n=1}^{N} ∑_{m=1}^{N} 𝔼[log p_θ(A_{nm} | z_n, z_m)] + ∑_{n=1}^{N} 𝔼[log p_θ(X_n | z_n)] − ∑_{n=1}^{N} ( KL[q_φ(b_n | v_n) ‖ p_θ(b_n | v_n)] + KL[q_φ(r_n) ‖ p_θ(r_n)] + KL[q_φ(v_n) ‖ p(v_n)] )
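Two of the KL terms in the ELBO have closed forms that are cheap to evaluate. The sketch below uses standard identities (it is not code from the paper): the Gaussian KL against a standard-normal prior for the r_n term, and the Bernoulli KL for the membership term.

```python
import numpy as np

def kl_gaussian_standard(mu, sigma2):
    """Closed-form KL( N(mu, diag(sigma2)) || N(0, I) ):
    0.5 * sum(sigma2 + mu^2 - log sigma2 - 1)."""
    return 0.5 * np.sum(sigma2 + mu**2 - np.log(sigma2) - 1.0)

def kl_bernoulli(q, p, eps=1e-8):
    """KL( Bernoulli(q) || Bernoulli(p) ), elementwise then summed,
    with clipping to avoid log(0)."""
    q = np.clip(q, eps, 1 - eps)
    p = np.clip(p, eps, 1 - eps)
    return np.sum(q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p)))
```

The expectation terms over the likelihood, by contrast, have no closed form and are estimated by Monte Carlo using the reparameterized samples from the previous slide.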

SLIDE 8

Results

[Figures: train network with masked edges vs. generated network, and the learned sparse latent space. Table: performance on the link-prediction task on five datasets.]

SLIDE 9

Thank you

Please come to our poster @ 06:30 PM, Pacific Ballroom #180.
