SLIDE 1

Variational methods for overlapping and non-overlapping stochastic block models

Pierre Latouche

Université Paris 1 Panthéon-Sorbonne, Laboratoire SAMM, MSTGA 2012

Pierre Latouche 1

SLIDE 2

Real networks

◮ Many scientific fields:

  ◮ World Wide Web
  ◮ Biology, sociology, physics

◮ Nature of data under study:

  ◮ Interactions between $N$ objects
  ◮ $O(N^2)$ possible interactions

◮ Network topology:

  ◮ Describes the way nodes interact, structure/function relationship

Sample of 250 blogs (nodes) with their links (edges) of the French political blogosphere.

SLIDE 4

In Biology

The metabolic network of bacteria Escherichia coli (Lacroix et al., 2006).

SLIDE 5

In Biology

Subset of the yeast transcriptional regulatory network (Milo et al., 2002).

SLIDE 6

Real networks

◮ Properties:

  ◮ Sparsity: $m = O(N)$
  ◮ Existence of a giant component
  ◮ Heterogeneity
  ◮ Preferential attachment
  ◮ Small world

→ Topological structure (groups of vertices)

SLIDE 8

Graph clustering

◮ Existing methods look for:

  ◮ Community structure
  ◮ Disassortative mixing
  ◮ Heterogeneous structure

SLIDE 12

Stochastic Block Model (SBM)

◮ Nowicki and Snijders (2001)

◮ Earlier work : Govaert et al. (1977)

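◮ $Z_i$ independent hidden variables:

  ◮ $Z_i \sim \mathcal{M}\big(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)\big)$
  ◮ $Z_{ik} = 1$: vertex $i$ belongs to class $k$

◮ $X \mid Z$ edges drawn independently:

$$X_{ij} \mid \{Z_{ik} Z_{jl} = 1\} \sim \mathcal{B}(\pi_{kl})$$

◮ A mixture model for graphs:

$$X_{ij} \sim \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \, \mathcal{B}(\pi_{kl})$$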
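The generative process of the SBM can be sketched in a few lines. This is only an illustration under stated assumptions: the graph is taken to be undirected without self-loops, and the function name `sample_sbm` and the numerical values of `alpha` and `pi` are hypothetical, not taken from the slides.

```python
import random


def sample_sbm(n, alpha, pi, seed=0):
    """Draw class labels z and an adjacency matrix x from an SBM.

    Assumption (not stated in the slides): undirected graph, no self-loops.
    """
    rng = random.Random(seed)
    k = len(alpha)
    # Z_i ~ M(1, alpha): one class per vertex
    z = rng.choices(range(k), weights=alpha, k=n)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # X_ij | {Z_ik Z_jl = 1} ~ B(pi_kl)
            edge = 1 if rng.random() < pi[z[i]][z[j]] else 0
            x[i][j] = x[j][i] = edge
    return z, x


# Hypothetical parameters: three communities with dense within-class links.
alpha = [0.5, 0.3, 0.2]
pi = [[0.80, 0.05, 0.05],
      [0.05, 0.70, 0.05],
      [0.05, 0.05, 0.90]]
z, x = sample_sbm(30, alpha, pi)
print(sum(map(sum, x)) // 2, "edges")
```

Setting large off-diagonal and small diagonal entries in `pi` would instead produce disassortative mixing, which is one reason the SBM covers several of the structures listed earlier.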

SLIDE 13

[Figure: example network on vertices 1 to 10, with connection probabilities labelled $\pi_{\bullet\bullet}$.]

SLIDE 14

Maximum likelihood estimation

◮ Log-likelihoods of the model:

  ◮ Observed-data: $\log p(X \mid \alpha, \Pi) = \log \big\{ \sum_{Z} p(X, Z \mid \alpha, \Pi) \big\}$

    → $K^N$ terms

◮ The Expectation Maximization (EM) algorithm requires the knowledge of $p(Z \mid X, \alpha, \Pi)$

Problem

$p(Z \mid X, \alpha, \Pi)$ is not tractable (no conditional independence)

Variational EM

Daudin et al. (2008)
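To make the $K^N$ count concrete, the observed-data log-likelihood can be computed by brute force on a toy graph. The sketch below assumes an undirected graph without self-loops; the function name and the toy values are hypothetical. It is only usable for very small $N$, which is exactly the point.

```python
import itertools
import math


def log_likelihood_bruteforce(x, alpha, pi):
    """Sum p(X, Z | alpha, Pi) over every assignment Z (K**N terms)."""
    n, k = len(x), len(alpha)
    total, n_terms = 0.0, 0
    for z in itertools.product(range(k), repeat=n):
        p = 1.0
        for i in range(n):
            p *= alpha[z[i]]                      # p(Z | alpha)
        for i in range(n):
            for j in range(i + 1, n):             # p(X | Z, Pi)
                q = pi[z[i]][z[j]]
                p *= q if x[i][j] else 1.0 - q
        total += p
        n_terms += 1
    return math.log(total), n_terms


# Toy graph: a triangle plus one isolated vertex, K = 2 classes.
x = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]
alpha = [0.5, 0.5]
pi = [[0.9, 0.1], [0.1, 0.9]]
loglik, n_terms = log_likelihood_bruteforce(x, alpha, pi)
print(n_terms)  # 16 = 2**4
```

With $N = 100$ and $K = 4$ the same loop would need $4^{100}$ iterations, hence the variational approximation.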

SLIDE 17

Model selection

Criteria

Since $\log p(X \mid \alpha, \Pi)$ is not tractable, we cannot rely on:

◮ $\mathrm{AIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - C$

◮ $\mathrm{BIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - \frac{C}{2} \log \frac{N(N-1)}{2}$

ICL

Biernacki et al. (2000) → Daudin et al. (2008)

Variational Bayes EM → ILvb

Latouche et al. (2012)

SLIDE 19

Bayesian framework

◮ Conjugate prior distributions:

  ◮ $p\big(\alpha \mid n^0 = \{n^0_1, \dots, n^0_K\}\big) = \mathrm{Dir}(\alpha; n^0)$
  ◮ $p\big(\Pi \mid \eta^0 = (\eta^0_{kl}), \zeta^0 = (\zeta^0_{kl})\big) = \prod_{k \le l} \mathrm{Beta}(\pi_{kl}; \eta^0_{kl}, \zeta^0_{kl})$

◮ Non-informative Jeffreys prior:

  ◮ $n^0_k = 1/2$
  ◮ $\eta^0_{kl} = \zeta^0_{kl} = 1/2$

SLIDE 20

Variational Bayes EM

Latouche et al. (2009)

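◮ $p(Z, \alpha, \Pi \mid X)$ not tractable

Decomposition

$$\log p(X) = \mathcal{L}(q) + \mathrm{KL}\big(q(\cdot) \,\|\, p(\cdot \mid X)\big)$$

where

$$\mathcal{L}(q) = \sum_{Z} \int\!\!\int q(Z, \alpha, \Pi) \log \frac{p(X, Z, \alpha, \Pi)}{q(Z, \alpha, \Pi)} \, d\alpha \, d\Pi$$

Factorization

$$q(Z, \alpha, \Pi) = q(\alpha)\, q(\Pi)\, q(Z) = q(\alpha)\, q(\Pi) \prod_{i=1}^{N} q(Z_i)$$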

SLIDE 21

Variational Bayes EM

Latouche et al. (2009)

E-step

◮ $q(Z_i) = \mathcal{M}\big(Z_i; 1, \tau_i = \{\tau_{i1}, \dots, \tau_{iK}\}\big)$

M-step

◮ $q(\alpha) = \mathrm{Dir}(\alpha; n)$
◮ $q(\Pi) = \prod_{k \le l}^{K} \mathrm{Beta}(\pi_{kl}; \eta_{kl}, \zeta_{kl})$

SLIDE 22

A new model selection criterion : ILvb

Latouche et al. (2012)

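◮ $\log p(X \mid K) = \mathcal{L}(q) + \mathrm{KL}(\dots)$
◮ After convergence, use $\mathcal{L}(q)$ as an approximation of $\log p(X \mid K)$

ILvb

$$\mathrm{IL}_{vb} = \log \left\{ \frac{\Gamma\big(\sum_{k=1}^{K} n^0_k\big) \prod_{k=1}^{K} \Gamma(n_k)}{\Gamma\big(\sum_{k=1}^{K} n_k\big) \prod_{k=1}^{K} \Gamma(n^0_k)} \right\} + \sum_{k \le l}^{K} \log \left\{ \frac{\Gamma(\eta^0_{kl} + \zeta^0_{kl})\, \Gamma(\eta_{kl})\, \Gamma(\zeta_{kl})}{\Gamma(\eta_{kl} + \zeta_{kl})\, \Gamma(\eta^0_{kl})\, \Gamma(\zeta^0_{kl})} \right\} - \sum_{i=1}^{N} \sum_{k=1}^{K} \tau_{ik} \log \tau_{ik}$$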
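The criterion is a closed-form function of the variational parameters, so it can be evaluated directly once the algorithm has converged. The sketch below transcribes the ILvb formula term by term; the function name `ilvb` and the toy inputs are hypothetical, and the fitted values `n`, `eta`, `zeta`, `tau` are assumed to come from a variational Bayes EM run that is not shown here.

```python
import math


def ilvb(n0, n, eta0, zeta0, eta, zeta, tau):
    """Evaluate ILvb from prior (n0, eta0, zeta0) and fitted (n, eta, zeta, tau)."""
    lg = math.lgamma
    k = len(n)
    # Dirichlet term for the class proportions alpha
    term_alpha = lg(sum(n0)) - lg(sum(n))
    term_alpha += sum(lg(n[a]) - lg(n0[a]) for a in range(k))
    # Beta terms for the connection probabilities pi_kl, k <= l
    term_pi = 0.0
    for a in range(k):
        for b in range(a, k):
            term_pi += (lg(eta0[a][b] + zeta0[a][b]) + lg(eta[a][b]) + lg(zeta[a][b])
                        - lg(eta[a][b] + zeta[a][b]) - lg(eta0[a][b]) - lg(zeta0[a][b]))
    # Entropy of q(Z); terms with tau_ik = 0 contribute 0
    entropy = -sum(t * math.log(t) for row in tau for t in row if t > 0.0)
    return term_alpha + term_pi + entropy


# Sanity check with Jeffreys priors: if the posterior equals the prior and
# tau is one-hot, every term cancels and the criterion is 0.
half = [[0.5, 0.5], [0.5, 0.5]]
value = ilvb([0.5, 0.5], [0.5, 0.5], half, half, half, half, [[1.0, 0.0], [0.0, 1.0]])
print(value)
```

For model selection, one would run the inference for each candidate $K$ and keep the value of $K$ with the largest criterion.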

SLIDE 23

Overlaps in networks

Palla et al. (2006)

Problem

The stochastic block model (SBM) and most existing methods assume that each vertex belongs to a single class

SLIDE 25

Stochastic Block Model (SBM)

◮ Nowicki and Snijders (2001)
◮ $Z_i$ independent hidden variables:

$$Z_i \sim \mathcal{M}\big(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)\big)$$

SLIDE 26

Overlapping Stochastic Block model (OSBM)

◮ Latouche et al. (2011)
◮ $Z_{ik}$ independent hidden variables:

$$Z_i \sim \prod_{k=1}^{K} \mathcal{B}(Z_{ik}; \alpha_k) = \prod_{k=1}^{K} \alpha_k^{Z_{ik}} (1 - \alpha_k)^{1 - Z_{ik}}$$

SLIDE 27

Overlapping Stochastic Block model (OSBM)

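◮ Latouche et al. (2011)
◮ $X \mid Z$ edges drawn independently:

$$X_{ij} \mid Z_i, Z_j \sim \mathcal{B}\big(X_{ij}; \Pi_{Z_i, Z_j}\big)$$

◮ $\Pi_{Z_i, Z_j} = g\big(a_{Z_i, Z_j}\big)$
◮ $a_{Z_i, Z_j} = \underbrace{Z_i^{\top} W Z_j}_{i \leftrightarrow j} + \underbrace{Z_i^{\top} U}_{i \to ?} + \underbrace{V^{\top} Z_j}_{? \to j} + \underbrace{W^{*}}_{\text{bias}}$
◮ $g(t) = 1 / (1 + \exp(-t))$ is the logistic function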

SLIDE 28

OSBM

◮ $\tilde{Z}_i = (Z_i, 1)^{\top}$

◮ $\tilde{W} = \begin{pmatrix} W & U \\ V^{\top} & W^{*} \end{pmatrix}$

◮ $a_{Z_i, Z_j} = \tilde{Z}_i^{\top} \tilde{W} \tilde{Z}_j$

◮ Parameter set: $(\alpha, \tilde{W})$
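The compact form $\tilde{Z}_i^{\top} \tilde{W} \tilde{Z}_j$ is the four-term expression for $a_{Z_i, Z_j}$ rewritten with an appended constant 1. A small numerical check, with hypothetical function names and toy values for $W$, $U$, $V$ and $W^{*}$, may make this easier to see:

```python
import math


def g(t):
    """Logistic function g(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def a_expanded(zi, zj, w, u, v, w_star):
    """Z_i' W Z_j + Z_i' U + V' Z_j + W*."""
    k = len(zi)
    quad = sum(zi[p] * w[p][q] * zj[q] for p in range(k) for q in range(k))
    return quad + sum(zi[p] * u[p] for p in range(k)) + sum(v[q] * zj[q] for q in range(k)) + w_star


def a_augmented(zi, zj, w, u, v, w_star):
    """Same quantity through the augmented vectors and the block matrix W~."""
    k = len(zi)
    w_tilde = [w[p] + [u[p]] for p in range(k)] + [v + [w_star]]
    zti, ztj = zi + [1], zj + [1]
    return sum(zti[p] * w_tilde[p][q] * ztj[q] for p in range(k + 1) for q in range(k + 1))


# Toy values: vertex i in class 1 only, vertex j in both classes.
w = [[4.0, -1.0], [-1.0, 4.0]]
u = [1.0, 1.0]
v = [1.0, 1.0]
w_star = -5.5
zi, zj = [1, 0], [1, 1]
a1 = a_expanded(zi, zj, w, u, v, w_star)
a2 = a_augmented(zi, zj, w, u, v, w_star)
print(a1, a2, g(a1))
```

Here the quadratic term is $4 - 1 = 3$, the sender term is 1, the receiver term is 2 and the bias is $-5.5$, so both forms give $a = 0.5$.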

SLIDE 29

Bayesian framework

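◮ Conjugate prior distributions:

  ◮ $p(\alpha) = \prod_{k=1}^{K} \mathrm{Beta}(\alpha_k; \eta^0_k, \zeta^0_k)$
  ◮ $p(\tilde{W}^{vec}) = \mathcal{N}(\tilde{W}^{vec}; \tilde{W}^{vec}_0, S_0)$

◮ The vec operator: if

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad \text{then} \quad A^{vec} = \begin{pmatrix} A_{11} \\ A_{21} \\ A_{12} \\ A_{22} \end{pmatrix}$$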

SLIDE 31

Bayesian framework

◮ $x^{\top} A y = (y \otimes x)^{\top} A^{vec}$
◮ In practice: set $\tilde{W}^{vec}_0 = 0$ and $S_0 = \frac{I}{\beta}$

Problem

$p(Z, \alpha, \tilde{W} \mid X)$ not tractable
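The identity $x^{\top} A y = (y \otimes x)^{\top} A^{vec}$ is what turns $a_{Z_i, Z_j}$ into a linear function of $\tilde{W}^{vec}$, which is convenient with a Gaussian prior. A quick check on a $2 \times 2$ example, with hypothetical helper names:

```python
def vec(a):
    """Stack the columns of a matrix (column-major order)."""
    rows, cols = len(a), len(a[0])
    return [a[i][j] for j in range(cols) for i in range(rows)]


def kron(y, x):
    """Kronecker product of two vectors, y (x) x."""
    return [yj * xi for yj in y for xi in x]


def bilinear(x, a, y):
    """Direct evaluation of x' A y."""
    return sum(x[i] * a[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


a = [[1, 2], [3, 4]]
x, y = [1, 2], [3, 4]
lhs = bilinear(x, a, y)
rhs = sum(c * d for c, d in zip(kron(y, x), vec(a)))
print(vec(a), lhs, rhs)
```

`vec(a)` returns `[1, 3, 2, 4]`, matching the ordering $(A_{11}, A_{21}, A_{12}, A_{22})$, and both sides of the identity evaluate to 61.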

SLIDE 32

q Transformation

Decomposition

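$$\log p(X) = \mathcal{L}(r) + \mathrm{KL}(r \,\|\, p)$$

where

$$\mathcal{L}(r) = \sum_{Z} \int\!\!\int r(Z, \alpha, \tilde{W}) \log \frac{p(X \mid Z, \tilde{W})\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W})}{r(Z, \alpha, \tilde{W})} \, d\alpha \, d\tilde{W}$$

Lower bound

$$\log p(X) \ge \mathcal{L}(r)$$

Problem

$\mathcal{L}(r)$ has too complex a form → no variational Bayes EM algorithm ??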

SLIDE 33

Local bound

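◮ Use the bound of Jaakkola and Jordan (2000) for Bayesian logistic regression:

$$\log p(X \mid Z, \tilde{W}) \ge \log h(Z, \tilde{W}, \xi), \quad \forall\, \xi \in \mathbb{R}^{N \times N}$$

where

$$\log h(Z, \tilde{W}, \xi) = \sum_{i \neq j}^{N} \Big\{ \big(X_{ij} - \tfrac{1}{2}\big)\, a_{Z_i, Z_j} - \frac{\xi_{ij}}{2} + \log g(\xi_{ij}) - \lambda(\xi_{ij}) \big(a_{Z_i, Z_j}^2 - \xi_{ij}^2\big) \Big\}$$

and

$$\lambda(\xi) = \frac{1}{4\xi} \tanh\Big(\frac{\xi}{2}\Big) = \frac{1}{2\xi} \Big\{ g(\xi) - \frac{1}{2} \Big\}$$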
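The bound can be checked numerically on a single edge term. The sketch below compares the exact Bernoulli log-probability, written as $x a + \log g(-a)$, with the corresponding term of $\log h$; the function names and the grid of test values are hypothetical.

```python
import math


def g(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def lam(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi), defined for xi != 0."""
    return math.tanh(xi / 2.0) / (4.0 * xi)


def log_bernoulli(x, a):
    """Exact log p(x | a) = x a + log g(-a) for x in {0, 1}."""
    return x * a + math.log(g(-a))


def log_h_term(x, a, xi):
    """One term of log h: quadratic in a, hence tractable under a Gaussian."""
    return (x - 0.5) * a - xi / 2.0 + math.log(g(xi)) - lam(xi) * (a * a - xi * xi)


# The bound holds for every xi and is tight at xi = |a|.
gaps = [log_bernoulli(x, a) - log_h_term(x, a, xi)
        for x in (0, 1) for a in (-3.0, -0.5, 0.7, 2.5) for xi in (0.1, 1.0, 4.0)]
tight = [log_bernoulli(x, a) - log_h_term(x, a, abs(a))
         for x in (0, 1) for a in (-3.0, -0.5, 0.7, 2.5)]
print(min(gaps), max(abs(t) for t in tight))
```

Because each term is quadratic in $a_{Z_i, Z_j}$, and therefore in $\tilde{W}^{vec}$, the Gaussian prior on $\tilde{W}^{vec}$ becomes conjugate once $\xi$ is fixed.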

SLIDE 34

ξ Transformation

Lower Bound

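$$\log p(X) = \log \Big\{ \sum_{Z} \int\!\!\int p(X \mid Z, \tilde{W})\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W}) \, d\alpha \, d\tilde{W} \Big\} \ge \mathcal{L}(\xi)$$

where

$$\mathcal{L}(\xi) = \log \Big\{ \sum_{Z} \int\!\!\int h(Z, \tilde{W}, \xi)\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W}) \, d\alpha \, d\tilde{W} \Big\}$$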

SLIDE 35

ξ Transformation

Decomposition

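$$\mathcal{L}(\xi) = \mathcal{L}(r; \xi) + \mathrm{KL}(r \,\|\, p)$$

where

$$\mathcal{L}(r; \xi) = \sum_{Z} \int\!\!\int r(Z, \alpha, \tilde{W}) \log \frac{h(Z, \tilde{W}, \xi)\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W})}{r(Z, \alpha, \tilde{W})} \, d\alpha \, d\tilde{W}$$

Lower bound

$$\log p(X) \ge \mathcal{L}(\xi) \ge \mathcal{L}(r; \xi)$$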

SLIDE 36

Inference

Local optimization

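◮ $\xi = \operatorname{argmax}_{\xi} \mathcal{L}(r; \xi)$

E-step

◮ $r(Z_{ik}) = \mathcal{B}(Z_{ik}; \tau_{ik})$

M-step

◮ $r(\alpha) = \prod_{k=1}^{K} \mathrm{Beta}(\alpha_k; \eta^N_k, \zeta^N_k)$
◮ $r(\tilde{W}^{vec}) = \mathcal{N}(\tilde{W}^{vec}; \tilde{W}^{vec}_N, S_N)$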

SLIDE 37

Model selection

◮ After convergence, use $\mathcal{L}(\hat{r}; \hat{\xi})$ as an approximation of $\log p(X \mid K)$

ILosbm

$$\mathrm{IL}_{osbm} = \mathcal{L}(\hat{r}; \hat{\xi})$$

SLIDE 38

L2 regularization

$$p(\tilde{W}^{vec}) = \mathcal{N}\Big(\tilde{W}^{vec}; 0, \frac{I}{\beta}\Big)$$

◮ $\beta$ too small → overfit
◮ $\beta$ too large → ILosbm maximized for very large values of $K$

Question

Can we estimate $\beta$ from the data?

SLIDE 39

Bayesian framework

◮ Conjugate prior distributions :

  ◮ $p(\tilde{W}^{vec}) = \mathcal{N}\big(\tilde{W}^{vec}; 0, \frac{I}{\beta}\big)$
  ◮ $p(\beta) = \mathrm{Gamma}(\beta; a_0, b_0)$

SLIDE 40

Inference

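◮ Use a variational Bayes EM algorithm to maximize:

$$\mathcal{L}(r; \xi) = \sum_{Z} \int\!\!\int\!\!\int r(Z, \alpha, \tilde{W}, \beta) \log \frac{h(Z, \tilde{W}, \xi)\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W})\, p(\beta)}{r(Z, \alpha, \tilde{W}, \beta)} \, d\alpha \, d\tilde{W} \, d\beta$$

◮ $r(\beta) = \mathrm{Gamma}(\beta; a_N, b_N)$, where

$$a_N = a_0 + \frac{(K+1)^2}{2} \quad \text{and} \quad b_N = b_0 + \frac{1}{2} \Big\{ \mathrm{Tr}(S_N) + (\tilde{W}^{vec}_N)^{\top} \tilde{W}^{vec}_N \Big\}$$

Criterion

$$\mathrm{IL}_{osbm} = \mathcal{L}(\hat{r}; \hat{\xi})$$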
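The update of $r(\beta)$ only needs the current Gaussian factor $(\tilde{W}^{vec}_N, S_N)$. A minimal sketch follows, assuming $\mathrm{Gamma}(\beta; a, b)$ is parameterized by shape $a$ and rate $b$ (the slides do not state the parameterization), with a hypothetical function name and toy inputs:

```python
def update_beta_posterior(a0, b0, k, s_n, w_n):
    """Return (a_N, b_N) of r(beta) = Gamma(beta; a_N, b_N).

    s_n is the (K+1)^2 x (K+1)^2 covariance S_N, w_n the mean vector W~vec_N.
    """
    dim = (k + 1) ** 2
    if len(w_n) != dim or len(s_n) != dim:
        raise ValueError("expected dimension (K + 1)^2")
    a_n = a0 + dim / 2.0
    trace = sum(s_n[i][i] for i in range(dim))
    # Tr(S_N) + w_N' w_N is the expected squared norm of W~vec under r
    b_n = b0 + 0.5 * (trace + sum(w * w for w in w_n))
    return a_n, b_n


# Toy case: K = 1, so W~ is 2 x 2 and W~vec has 4 entries.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a_n, b_n = update_beta_posterior(1.0, 1.0, 1, identity, [1.0, 0.0, 0.0, 0.0])
print(a_n, b_n)  # 3.0 3.5
```

Under the shape/rate assumption the posterior mean of $\beta$ is $a_N / b_N$, so large weights or a diffuse $S_N$ pull the estimated precision down.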

SLIDE 42

ILosbm

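$$\begin{aligned}
\mathrm{IL}_{osbm} ={}& \sum_{i \neq j}^{N} \Big\{ \log g(\xi_{ij}) - \frac{\xi_{ij}}{2} + \lambda(\xi_{ij})\, \xi_{ij}^2 \Big\} \\
&+ \sum_{k=1}^{K} \log \left\{ \frac{\Gamma(\eta^0_k + \zeta^0_k)\, \Gamma(\eta^N_k)\, \Gamma(\zeta^N_k)}{\Gamma(\eta^0_k)\, \Gamma(\zeta^0_k)\, \Gamma(\eta^N_k + \zeta^N_k)} \right\} \\
&+ \log \frac{\Gamma(a_N)}{\Gamma(a_0)} + a_0 \log b_0 + a_N \Big( 1 - \frac{b_0}{b_N} - \log b_N \Big) \\
&+ \frac{1}{2} (\tilde{W}^{vec}_N)^{\top} S_N^{-1} \tilde{W}^{vec}_N + \frac{1}{2} \log |S_N| \\
&- \sum_{i=1}^{N} \sum_{k=1}^{K} \big\{ \tau_{ik} \log \tau_{ik} + (1 - \tau_{ik}) \log(1 - \tau_{ik}) \big\}.
\end{aligned}$$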

SLIDE 43

Experiments on simulated data

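◮ Two topological structures:

  ◮ Community structures (affiliation):

$$W = \begin{pmatrix} \lambda & -\epsilon & \cdots & -\epsilon \\ -\epsilon & \lambda & & \vdots \\ \vdots & & \ddots & -\epsilon \\ -\epsilon & \cdots & -\epsilon & \lambda \end{pmatrix}$$

  ◮ Community structures and stars:

$$W = \begin{pmatrix}
\lambda & \lambda & -\epsilon & \cdots & \cdots & \cdots & -\epsilon \\
-\epsilon & -\lambda & -\epsilon & \cdots & \cdots & \cdots & -\epsilon \\
-\epsilon & -\epsilon & \lambda & \lambda & -\epsilon & \cdots & -\epsilon \\
-\epsilon & -\epsilon & -\epsilon & -\lambda & -\epsilon & \cdots & -\epsilon \\
\vdots & & & & \ddots & & \vdots \\
-\epsilon & \cdots & \cdots & \cdots & -\epsilon & \lambda & \lambda \\
-\epsilon & \cdots & \cdots & \cdots & \cdots & -\epsilon & -\lambda
\end{pmatrix}$$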

SLIDE 44

Community structures

Example of an overlapping stochastic block model (OSBM) network with community structures.

SLIDE 45

Community structures and stars

Example of an overlapping stochastic block model (OSBM) network with community structures and stars.

SLIDE 47

Experiments on simulated data

◮ $N = 100$
◮ $\lambda = 4$
◮ $\epsilon = 1$
◮ $W^{*} = -5.5$
◮ $U = V = (\epsilon, \dots, \epsilon)^{\top}$
◮ $\alpha_k = 0.25$
◮ $K = 4$
◮ 100 simulations
◮ 4 graph clustering methods:

  ◮ CFinder (Palla et al. 2006)
  ◮ Stochastic Block Model (SBM)
  ◮ Mixed Membership Stochastic Block Model (MMSB) (Airoldi et al. 2008)
  ◮ Overlapping Stochastic Block Model (OSBM)
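One network of the community-structure experiment can be simulated directly from these settings. The sketch below is an illustration, not the code used for the slides: it assumes a directed graph without self-loops, and the helper names `community_w` and `sample_osbm` are hypothetical.

```python
import math
import random


def g(t):
    """Logistic function g(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def community_w(k, lam, eps):
    """Affiliation matrix W: lam on the diagonal, -eps elsewhere."""
    return [[lam if p == q else -eps for q in range(k)] for p in range(k)]


def sample_osbm(n, alpha, w, u, v, w_star, seed=0):
    """Draw binary membership vectors z and a directed adjacency matrix x."""
    rng = random.Random(seed)
    k = len(alpha)
    # Z_ik ~ B(alpha_k), independently: a vertex may belong to 0, 1 or several classes
    z = [[1 if rng.random() < alpha[q] else 0 for q in range(k)] for _ in range(n)]
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a = w_star
            for p in range(k):
                a += z[i][p] * u[p] + v[p] * z[j][p]
                for q in range(k):
                    a += z[i][p] * w[p][q] * z[j][q]
            x[i][j] = 1 if rng.random() < g(a) else 0
    return z, x


K, N = 4, 100
z, x = sample_osbm(N, [0.25] * K, community_w(K, 4.0, 1.0),
                   [1.0] * K, [1.0] * K, -5.5)
n_outliers = sum(1 for zi in z if sum(zi) == 0)
n_overlap = sum(1 for zi in z if sum(zi) > 1)
print(n_outliers, "outliers,", n_overlap, "vertices in several classes")
```

With $W^{*} = -5.5$, two outliers connect with probability $g(-5.5) \approx 0.004$, which is why vertices with $Z_i = 0$ appear almost isolated.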

SLIDE 48

How to compare the methods ?

◮ CFinder and OSBM can deal with outliers ($Z_i = 0$)
◮ SBM and MMSB are run with $K + 1$ classes

  → identify the class of outliers

◮ Compute $P = Z Z^{\top}$ and $\hat{P} = \hat{Z} \hat{Z}^{\top}$:

  ◮ invariant to column permutations of $Z$ and $\hat{Z}$
  ◮ number of shared clusters between each pair of vertices

◮ Compute the L2 distance $d(P, \hat{P})$
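The comparison can be sketched as follows. The slides do not spell out the distance, so this example assumes $d$ is the Euclidean (Frobenius) distance between the two matrices; the function names and the toy membership matrices are hypothetical.

```python
import math


def shared_clusters(z):
    """P = Z Z': entry (i, j) counts the clusters shared by vertices i and j."""
    n = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)] for i in range(n)]


def l2_distance(p, p_hat):
    """Euclidean distance between two matrices of the same size."""
    return math.sqrt(sum((p[i][j] - p_hat[i][j]) ** 2
                         for i in range(len(p)) for j in range(len(p))))


z_true = [[1, 0], [1, 1], [0, 0]]      # vertex 2 overlaps, vertex 3 is an outlier
z_swapped = [[0, 1], [1, 1], [0, 0]]   # same clustering, columns permuted
z_hat = [[1, 0], [1, 0], [0, 0]]       # overlap of vertex 2 missed

p = shared_clusters(z_true)
d_perm = l2_distance(p, shared_clusters(z_swapped))
d_miss = l2_distance(p, shared_clusters(z_hat))
print(d_perm, d_miss)  # 0.0 1.0
```

Relabelling the clusters leaves the distance at 0, so no matching between estimated and true labels is needed, while the missed overlap changes only the entry $P_{22}$ (from 2 to 1).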

SLIDE 49

Community structures

[Figure: distribution of the L2 distance for each method (CFinder, SBM, MMSB, OSBM).]

L2 distance $d(P, \hat{P})$ over the 100 samples of networks with community structures for CFinder, SBM, MMSB and OSBM.

SLIDE 50

Community structures and stars

[Figure: distribution of the L2 distance for each method (CFinder, SBM, MMSB, OSBM).]

L2 distance $d(P, \hat{P})$ over the 100 samples of networks with community structures and stars for CFinder, SBM, MMSB and OSBM.

SLIDE 51

Model selection

◮ Community structure
◮ $N = 100$
◮ $\epsilon = 1$
◮ $W^{*} = -5.5$
◮ $\alpha_k = 1/K$
◮ $K_{True} \in \{3, \dots, 7\}$
◮ $K \in \{2, \dots, 8\}$
◮ 100 simulations

SLIDE 52

Results

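Table: $K_{True} \backslash K_{ILosbm}$ ($p_{intra} \approx 0.92$)

Columns ($K_{ILosbm}$): 2, 3, 4, 5, 6, 7, 8. Non-empty cells of each row, left to right:

$K_{True} = 3$: 99, 1
$K_{True} = 4$: 99, 1
$K_{True} = 5$: 93, 5, 2
$K_{True} = 6$: 7, 64, 22, 7
$K_{True} = 7$: 16, 47, 37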

SLIDE 53

Results

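Table: $K_{True} \backslash K_{ILosbm}$ ($p_{intra} \approx 0.62$)

Columns ($K_{ILosbm}$): 2, 3, 4, 5, 6, 7, 8. Non-empty cells of each row, left to right:

$K_{True} = 3$: 99, 1
$K_{True} = 4$: 85, 9, 5, 1
$K_{True} = 5$: 4, 53, 26, 9, 8
$K_{True} = 6$: 18, 34, 27, 21
$K_{True} = 7$: 4, 18, 30, 48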

SLIDE 54

The French blogosphere network

Columns: cluster 1, cluster 2, cluster 3, cluster 4, outliers. Non-empty cells of each row, left to right:

UMP: 30 + 3, 2 + 3, 5
UDF: 0 + 1, 29 + 1, 0 + 2, 1
liberal: 24, 1
PS: 40, 17
analysts: 0 + 1, 1 + 3, 1 + 1, 0 + 4, 5
others: 1, 30

Classification of the blogs into K = 4 clusters using OSBM. 196 vertices, 2864 edges.

SLIDE 55

Conclusion

◮ Computational cost: $O(K^4 N^2) \neq O(K^2 N^2)$
◮ New model selection criterion: ILosbm
◮ R package OSBM soon available on the CRAN
◮ Can be used to analyze SBM networks

SLIDE 56

References

◮ K. Nowicki and T.A.B. Snijders (2001), Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96, 1077-1087

◮ E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing (2008), Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9, 1981-2014

◮ J-J. Daudin, F. Picard and S. Robin (2008), A mixture model for random graphs. Statistics and Computing, 18, 2, 151-171

◮ P. Latouche, E. Birmelé, C. Ambroise (2011), Overlapping stochastic block models with application to the French political blogosphere network. Annals of Applied Statistics, 5, 1, 309-336

◮ P. Latouche, E. Birmelé, C. Ambroise (2012), Variational Bayesian inference and complexity control for stochastic block models. Statistical Modelling, 12, 1, 93-115

Contents

◮ Introduction
◮ Real networks
◮ Graph clustering
◮ Stochastic block models
◮ Model selection
◮ The overlapping stochastic block model
◮ Model selection
◮ Bayesian framework
◮ Inference
◮ The regularization term β
◮ Model selection
◮ Experiments
◮ Simulated data
◮ The French blogosphere network
