

slide-1
SLIDE 1

Variational methods for overlapping and non-overlapping stochastic block models

Pierre Latouche

Université Paris 1 Panthéon-Sorbonne, Laboratoire SAMM. MSTGA 2012

Pierre Latouche 1

slide-2
SLIDE 2

Contents

◮ Introduction
  ◮ Real networks
  ◮ Graph clustering
  ◮ Stochastic block models
  ◮ Model selection
◮ The overlapping stochastic block model
◮ Model selection
  ◮ Bayesian framework
  ◮ Inference
  ◮ The regulation term β
  ◮ Model selection
◮ Experiments
  ◮ Simulated data
  ◮ The French blogosphere network

slide-3
SLIDE 3

Real networks

◮ Many scientific fields:
  ◮ World Wide Web
  ◮ Biology, sociology, physics
◮ Nature of data under study:
  ◮ Interactions between $N$ objects
  ◮ $O(N^2)$ possible interactions
◮ Network topology:
  ◮ Describes the way nodes interact, structure/function relationship

[Figure: Sample of 250 blogs (nodes) with their links (edges) of the French political blogosphere.]

slide-4
SLIDE 4

In Biology

The metabolic network of bacteria Escherichia coli (Lacroix et al., 2006).


slide-5
SLIDE 5

In Biology

Subset of the yeast transcriptional regulatory network (Milo et al., 2002).


slide-6
SLIDE 6

Real networks

◮ Properties:
  ◮ Sparsity: $m = O(N)$
  ◮ Existence of a giant component
  ◮ Heterogeneity
  ◮ Preferential attachment
  ◮ Small world

↪ Topological structure (groups of vertices)

slide-8
SLIDE 8

Graph clustering

◮ Existing methods look for:
  ◮ Community structure
  ◮ Disassortative mixing
  ◮ Heterogeneous structure

slide-12
SLIDE 12

Stochastic Block Model (SBM)

◮ Nowicki and Snijders (2001)
  ◮ Earlier work: Govaert et al. (1977)
◮ $Z_i$ independent hidden variables:
  ◮ $Z_i \sim \mathcal{M}(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K))$
  ◮ $Z_{ik} = 1$: vertex $i$ belongs to class $k$
◮ $X \mid Z$ edges drawn independently:

$$X_{ij} \mid \{Z_{ik} Z_{jl} = 1\} \sim \mathcal{B}(\pi_{kl})$$

◮ A mixture model for graphs:

$$X_{ij} \sim \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \, \mathcal{B}(\pi_{kl})$$
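The generative process of the SBM can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is taken to be undirected without self-loops, and the function name `sample_sbm` and the numerical values of `alpha` and `pi` are hypothetical, not taken from the talk.

```python
import random


def sample_sbm(n, alpha, pi, seed=0):
    """Draw class labels and an undirected adjacency matrix from an SBM."""
    rng = random.Random(seed)
    k = len(alpha)
    # Z_i ~ M(1, alpha): one class label per vertex.
    z = rng.choices(range(k), weights=alpha, k=n)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # X_ij | {Z_ik Z_jl = 1} ~ B(pi_kl)
            edge = 1 if rng.random() < pi[z[i]][z[j]] else 0
            x[i][j] = x[j][i] = edge  # assumption: undirected graph
    return z, x


# Hypothetical parameters: three classes with dense within-class connections.
alpha = [0.5, 0.3, 0.2]
pi = [[0.80, 0.05, 0.05],
      [0.05, 0.70, 0.05],
      [0.05, 0.05, 0.90]]
z, x = sample_sbm(30, alpha, pi)
```

Choosing large diagonal values in `pi` gives community structure; large off-diagonal values would instead give disassortative mixing.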

slide-13
SLIDE 13

[Figure: example network with vertices numbered 1 to 10 and connection probabilities labelled π••.]

slide-14
SLIDE 14

Maximum likelihood estimation

◮ Log-likelihoods of the model:
  ◮ Observed-data: $\log p(X \mid \alpha, \Pi) = \log \{ \sum_{Z} p(X, Z \mid \alpha, \Pi) \}$ ↪ $K^N$ terms
◮ Expectation Maximization (EM) algorithm requires the knowledge of $p(Z \mid X, \alpha, \Pi)$

Problem

$p(Z \mid X, \alpha, \Pi)$ is not tractable (no conditional independence)

Variational EM

Daudin et al. (2008)
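To see why the observed-data log-likelihood is out of reach, the sum over Z can be written as a brute-force enumeration. The sketch below is an illustration only, assuming an undirected graph without self-loops; the function name and the toy parameters are hypothetical. It is usable for a handful of vertices, since the loop runs over K^N label assignments.

```python
import itertools
import math


def log_likelihood_bruteforce(x, alpha, pi):
    """Compute log p(X | alpha, Pi) by summing over all K^N assignments Z."""
    n, k = len(x), len(alpha)
    total, n_terms = 0.0, 0
    for z in itertools.product(range(k), repeat=n):
        # p(Z | alpha): product of class proportions.
        p = 1.0
        for i in range(n):
            p *= alpha[z[i]]
        # p(X | Z, Pi): independent Bernoulli edges given the labels.
        for i in range(n):
            for j in range(i + 1, n):
                p_edge = pi[z[i]][z[j]]
                p *= p_edge if x[i][j] else 1.0 - p_edge
        total += p
        n_terms += 1
    return math.log(total), n_terms


# Toy graph: a triangle plus one isolated vertex (hypothetical data).
x = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]
loglik, n_terms = log_likelihood_bruteforce(x, [0.6, 0.4], [[0.8, 0.1], [0.1, 0.7]])
```

With N = 4 and K = 2 the loop already visits 16 assignments; at N = 100 and K = 4 the count is 4^100, which is why a variational approximation is used instead.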

slide-17
SLIDE 17

Model selection

Criteria

Since $\log p(X \mid \alpha, \Pi)$ is not tractable, we cannot rely on:

◮ $\mathrm{AIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - C$
◮ $\mathrm{BIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - \frac{C}{2} \log \frac{N(N-1)}{2}$

ICL

Biernacki et al. (2000) ↪ Daudin et al. (2008)

Variational Bayes EM ↪ ILvb

Latouche et al. (2012)

slide-19
SLIDE 19

Bayesian framework

◮ Conjugate prior distributions:
  ◮ $p(\alpha \mid n^0 = \{n^0_1, \dots, n^0_K\}) = \mathrm{Dir}(\alpha; n^0)$
  ◮ $p(\Pi \mid \eta^0 = (\eta^0_{kl}), \zeta^0 = (\zeta^0_{kl})) = \prod_{k \le l} \mathrm{Beta}(\pi_{kl}; \eta^0_{kl}, \zeta^0_{kl})$
◮ Non-informative Jeffreys prior:
  ◮ $n^0_k = 1/2$
  ◮ $\eta^0_{kl} = \zeta^0_{kl} = 1/2$

slide-20
SLIDE 20

Variational Bayes EM

Latouche et al. (2009)

◮ p(Z, α, Π | X) not tractable

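Decomposition

$$\log p(X) = \mathcal{L}(q) + \mathrm{KL}(q(\cdot) \,\|\, p(\cdot \mid X))$$

where

$$\mathcal{L}(q) = \sum_{Z} \int\!\!\int q(Z, \alpha, \Pi) \log \frac{p(X, Z, \alpha, \Pi)}{q(Z, \alpha, \Pi)} \, d\alpha \, d\Pi$$

Factorization

$$q(Z, \alpha, \Pi) = q(\alpha) q(\Pi) q(Z) = q(\alpha) q(\Pi) \prod_{i=1}^{N} q(Z_i)$$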
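The decomposition of the log evidence into a lower bound plus a Kullback-Leibler term can be checked numerically on a toy model. The sketch below assumes a single discrete latent variable with three states and made-up joint probabilities; it has nothing of the SBM structure and only illustrates the identity.

```python
import math

# Toy joint p(x, z) for one fixed observation x and a latent z in {0, 1, 2}
# (hypothetical numbers).
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                       # p(x)
posterior = [p / evidence for p in joint]   # p(z | x)
q = [0.5, 0.3, 0.2]                         # arbitrary variational distribution

# L(q) = sum_z q(z) log( p(x, z) / q(z) )
lower_bound = sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint))
# KL(q || p(. | x)) = sum_z q(z) log( q(z) / p(z | x) )
kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

gap = math.log(evidence) - (lower_bound + kl)
```

Because the KL term is non-negative, maximizing the lower bound over q is the same as minimizing the KL divergence to the true posterior.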

slide-21
SLIDE 21

Variational Bayes EM

Latouche et al. (2009)

E-step

◮ $q(Z_i) = \mathcal{M}(Z_i; 1, \tau_i = \{\tau_{i1}, \dots, \tau_{iK}\})$

M-step

◮ $q(\alpha) = \mathrm{Dir}(\alpha; n)$
◮ $q(\Pi) = \prod_{k \le l}^{K} \mathrm{Beta}(\pi_{kl}; \eta_{kl}, \zeta_{kl})$

slide-22
SLIDE 22

A new model selection criterion : ILvb

Latouche et al. (2012)

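◮ $\log p(X \mid K) = \mathcal{L}(q) + \mathrm{KL}(\dots)$
◮ After convergence, use $\mathcal{L}(q)$ as an approximation of $\log p(X \mid K)$

ILvb

$$\mathrm{IL}_{vb} = \log \left\{ \frac{\Gamma(\sum_{k=1}^{K} n^0_k) \prod_{k=1}^{K} \Gamma(n_k)}{\Gamma(\sum_{k=1}^{K} n_k) \prod_{k=1}^{K} \Gamma(n^0_k)} \right\} + \sum_{k \le l}^{K} \log \left\{ \frac{\Gamma(\eta^0_{kl} + \zeta^0_{kl}) \Gamma(\eta_{kl}) \Gamma(\zeta_{kl})}{\Gamma(\eta_{kl} + \zeta_{kl}) \Gamma(\eta^0_{kl}) \Gamma(\zeta^0_{kl})} \right\} - \sum_{i=1}^{N} \sum_{k=1}^{K} \tau_{ik} \log \tau_{ik}$$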
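Once the variational parameters are available, the criterion is a sum of log-Gamma terms plus an entropy term. The sketch below is a minimal illustration with several assumptions: the function name `ilvb` and its argument layout are hypothetical, the priors are symmetric (Jeffreys values 1/2 by default), and the last double sum is taken with a minus sign, that is, as the entropy of q(Z).

```python
from math import lgamma, log


def ilvb(tau, n_post, eta, zeta, n0=0.5, eta0=0.5, zeta0=0.5):
    """Evaluate ILvb from the converged variational parameters.

    tau    : N x K list of class probabilities tau_ik
    n_post : length-K list of Dirichlet parameters n_k
    eta, zeta : K x K lists of Beta parameters (entries with k <= l are used)
    """
    k = len(n_post)
    # Dirichlet term: log of the ratio of normalizing constants.
    term_alpha = (lgamma(k * n0) - k * lgamma(n0)
                  + sum(lgamma(v) for v in n_post) - lgamma(sum(n_post)))
    # Beta terms, one per pair of classes k <= l.
    term_pi = 0.0
    for a in range(k):
        for b in range(a, k):
            term_pi += (lgamma(eta0 + zeta0) - lgamma(eta0) - lgamma(zeta0)
                        + lgamma(eta[a][b]) + lgamma(zeta[a][b])
                        - lgamma(eta[a][b] + zeta[a][b]))
    # Entropy of q(Z); terms with tau_ik = 0 contribute nothing.
    entropy = -sum(t * log(t) for row in tau for t in row if t > 0.0)
    return term_alpha + term_pi + entropy


# Sanity check: if the posterior parameters equal the prior ones and tau is
# one-hot, every term cancels and the criterion is zero.
k = 2
value_at_prior = ilvb(tau=[[1.0, 0.0], [0.0, 1.0]],
                      n_post=[0.5] * k,
                      eta=[[0.5] * k for _ in range(k)],
                      zeta=[[0.5] * k for _ in range(k)])
```

In practice the function would be evaluated once per candidate K after running the variational Bayes EM algorithm, and the K with the largest value retained.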

slide-24
SLIDE 24

Overlaps in networks

Palla et al. (2006)

Problem

The stochastic block model (SBM) and most existing methods assume that each vertex belongs to a single class


slide-25
SLIDE 25

Stochastic Block Model (SBM)

◮ Nowicki and Snijders (2001)
◮ $Z_i$ independent hidden variables:

$$Z_i \sim \mathcal{M}(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K))$$

slide-26
SLIDE 26

Overlapping Stochastic Block model (OSBM)

◮ Latouche et al. (2011)
◮ $Z_{ik}$ independent hidden variables:

$$Z_i \sim \prod_{k=1}^{K} \mathcal{B}(Z_{ik}; \alpha_k) = \prod_{k=1}^{K} \alpha_k^{Z_{ik}} (1 - \alpha_k)^{1 - Z_{ik}}$$
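Replacing the multinomial draw by K independent Bernoulli draws is what allows a vertex to belong to several classes, or to none. A small sketch, with a hypothetical function name and using the values K = 4 and α_k = 0.25 that appear in the simulated experiments:

```python
import random


def sample_memberships(n, alpha, seed=0):
    """Draw Z_ik ~ B(alpha_k) independently for each vertex i and class k."""
    rng = random.Random(seed)
    return [[1 if rng.random() < a else 0 for a in alpha] for _ in range(n)]


z = sample_memberships(100, [0.25] * 4)
# Vertices with an all-zero row belong to no class (outliers);
# rows with more than one 1 are overlaps.
n_outliers = sum(1 for row in z if sum(row) == 0)
n_overlaps = sum(1 for row in z if sum(row) > 1)
```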

slide-27
SLIDE 27

Overlapping Stochastic Block model (OSBM)

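◮ Latouche et al. (2011)
◮ $X \mid Z$ edges drawn independently:

$$X_{ij} \mid Z_i, Z_j \sim \mathcal{B}(X_{ij}; \pi_{Z_i, Z_j})$$

◮ $\pi_{Z_i, Z_j} = g(a_{Z_i, Z_j})$
◮ $a_{Z_i, Z_j} = \underbrace{Z_i^\top W Z_j}_{i \leftrightarrow j} + \underbrace{Z_i^\top U}_{i \rightarrow ?} + \underbrace{V^\top Z_j}_{? \rightarrow j} + \underbrace{W^*}_{\text{bias}}$
◮ $g(t) = 1 / (1 + \exp(-t))$ is the logistic function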

slide-28
SLIDE 28

OSBM

◮ $\tilde{Z}_i = (Z_i, 1)^\top$, $\tilde{W} = \begin{pmatrix} W & U \\ V^\top & W^* \end{pmatrix}$
◮ $a_{Z_i, Z_j} = \tilde{Z}_i^\top \tilde{W} \tilde{Z}_j$
◮ Parameter set: $(\alpha, \tilde{W})$
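The compact form of the OSBM edge probability is a bilinear form followed by the logistic function. The sketch below is an illustration only: the function names are hypothetical, and the numerical matrix uses K = 2 with λ = 4, ε = 1 and W* = −5.5, the values quoted later for the simulations.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def edge_probability(zi, zj, w_tilde):
    """Return g(Z~_i^T W~ Z~_j) for binary membership vectors zi and zj."""
    zi_t = list(zi) + [1]   # Z~_i = (Z_i, 1)
    zj_t = list(zj) + [1]   # Z~_j = (Z_j, 1)
    a = sum(zi_t[q] * w_tilde[q][l] * zj_t[l]
            for q in range(len(zi_t)) for l in range(len(zj_t)))
    return logistic(a)


# W~ = [[W, U], [V^T, W*]] with W = [[4, -1], [-1, 4]], U = V = (1, 1), W* = -5.5.
w_tilde = [[4.0, -1.0, 1.0],
           [-1.0, 4.0, 1.0],
           [1.0, 1.0, -5.5]]

p_outliers = edge_probability((0, 0), (0, 0), w_tilde)    # only the bias: g(-5.5)
p_same_class = edge_probability((1, 0), (1, 0), w_tilde)  # 4 + 1 + 1 - 5.5 = 0.5
```

Appending the constant 1 to each membership vector folds the terms in U, V and W* into a single matrix, so the whole parameter can be handled as one vector in the inference step.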

slide-30
SLIDE 30

Bayesian framework

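◮ Conjugate prior distributions:
  ◮ $p(\alpha) = \prod_{k=1}^{K} \mathrm{Beta}(\alpha_k; \eta^0_k, \zeta^0_k)$
  ◮ $p(\tilde{W}^{vec}) = \mathcal{N}(\tilde{W}^{vec}; \tilde{W}^{vec}_0, S_0)$
◮ The vec operator: if

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad \text{then} \quad A^{vec} = \begin{pmatrix} A_{11} \\ A_{21} \\ A_{12} \\ A_{22} \end{pmatrix}$$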

slide-31
SLIDE 31

Bayesian framework

◮ $x^\top A y = (y \otimes x)^\top A^{vec}$
◮ In practice: set $\tilde{W}^{vec}_0 = 0$ and $S_0 = I / \beta$

Problem

$p(Z, \alpha, \tilde{W} \mid X)$ not tractable
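The identity between the bilinear form and the vectorized form is what turns the OSBM likelihood into a logistic regression in the stacked parameter vector. It can be checked on a 2 × 2 example; the helper names `vec` and `kron` and the numbers are illustrative only.

```python
def vec(a):
    """Stack the columns of matrix a into one list (column-major order)."""
    rows, cols = len(a), len(a[0])
    return [a[i][j] for j in range(cols) for i in range(rows)]


def kron(y, x):
    """Kronecker product of two vectors, y (outer index) and x (inner index)."""
    return [yj * xi for yj in y for xi in x]


a = [[1.0, 2.0],
     [3.0, 4.0]]
x = [5.0, 6.0]
y = [7.0, 8.0]

# Left-hand side: x^T A y
bilinear = sum(x[i] * a[i][j] * y[j] for i in range(2) for j in range(2))
# Right-hand side: (y ⊗ x)^T A^vec
via_vec = sum(k * v for k, v in zip(kron(y, x), vec(a)))
```

Both sides evaluate to 433 here, and `vec(a)` returns the entries in the order A11, A21, A12, A22 given on the slide.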

slide-32
SLIDE 32

q Transformation

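Decomposition

$$\log p(X) = \mathcal{L}(r) + \mathrm{KL}(r \,\|\, p)$$

where

$$\mathcal{L}(r) = \sum_{Z} \int\!\!\int r(Z, \alpha, \tilde{W}) \log \frac{p(X \mid Z, \tilde{W}) \, p(Z \mid \alpha) \, p(\alpha) \, p(\tilde{W})}{r(Z, \alpha, \tilde{W})} \, d\alpha \, d\tilde{W}$$

Lower bound

$$\log p(X) \ge \mathcal{L}(r)$$

Problem

$\mathcal{L}(r)$ has too complex a form ↪ no variational Bayes EM algorithm?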

slide-33
SLIDE 33

Local bound

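◮ Use the bound of Jaakkola and Jordan (2000) for Bayesian logistic regression:

$$\log p(X \mid Z, \tilde{W}) \ge \log h(Z, \tilde{W}, \xi), \quad \forall \xi \in \mathbb{R}^{N \times N}$$

where

$$\log h(Z, \tilde{W}, \xi) = \sum_{i \ne j}^{N} \left\{ \left(X_{ij} - \frac{1}{2}\right) a_{Z_i, Z_j} - \frac{\xi_{ij}}{2} + \log g(\xi_{ij}) - \lambda(\xi_{ij}) \left(a_{Z_i, Z_j}^2 - \xi_{ij}^2\right) \right\}$$

and

$$\lambda(\xi) = \frac{1}{4\xi} \tanh\left(\frac{\xi}{2}\right) = \frac{1}{2\xi} \left\{ g(\xi) - \frac{1}{2} \right\}$$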
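The local bound can be verified numerically for a single edge. The sketch below compares one term of log h with the exact Bernoulli log-likelihood x·a − log(1 + e^a); the function names and the grid of test values are illustrative assumptions.

```python
import math


def g(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def lam(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi), defined here for xi != 0."""
    return math.tanh(xi / 2.0) / (4.0 * xi)


def bernoulli_loglik(x, a):
    """Exact log-likelihood of one edge: log B(x; g(a))."""
    return x * a - math.log(1.0 + math.exp(a))


def jj_lower_bound(x, a, xi):
    """One term of log h: the quadratic lower bound at variational point xi."""
    return (x - 0.5) * a - xi / 2.0 + math.log(g(xi)) - lam(xi) * (a * a - xi * xi)


# Largest violation of "bound <= exact value" over a small grid (should be <= 0).
worst_violation = max(
    jj_lower_bound(x, a, xi) - bernoulli_loglik(x, a)
    for x in (0, 1)
    for a in (-3.0, -0.5, 0.0, 1.2, 4.0)
    for xi in (0.1, 1.0, 2.5, 6.0)
)
# The bound is tight when xi equals |a|.
gap_at_optimum = bernoulli_loglik(1, 1.2) - jj_lower_bound(1, 1.2, 1.2)
```

The bound is quadratic in a, which is what makes a Gaussian variational distribution on the stacked parameter vector tractable.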

slide-34
SLIDE 34

ξ Transformation

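Lower bound

$$\log p(X) = \log \left\{ \sum_{Z} \int\!\!\int p(X \mid Z, \tilde{W}) \, p(Z \mid \alpha) \, p(\alpha) \, p(\tilde{W}) \, d\alpha \, d\tilde{W} \right\} \ge \mathcal{L}(\xi)$$

where

$$\mathcal{L}(\xi) = \log \left\{ \sum_{Z} \int\!\!\int h(Z, \tilde{W}, \xi) \, p(Z \mid \alpha) \, p(\alpha) \, p(\tilde{W}) \, d\alpha \, d\tilde{W} \right\}$$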

slide-35
SLIDE 35

ξ Transformation

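Decomposition

$$\mathcal{L}(\xi) = \mathcal{L}(r; \xi) + \mathrm{KL}(r \,\|\, p)$$

where

$$\mathcal{L}(r; \xi) = \sum_{Z} \int\!\!\int r(Z, \alpha, \tilde{W}) \log \frac{h(Z, \tilde{W}, \xi) \, p(Z \mid \alpha) \, p(\alpha) \, p(\tilde{W})}{r(Z, \alpha, \tilde{W})} \, d\alpha \, d\tilde{W}$$

Lower bound

$$\log p(X) \ge \mathcal{L}(\xi) \ge \mathcal{L}(r; \xi)$$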

slide-36
SLIDE 36

Inference

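Local optimization

◮ $\hat{\xi} = \arg\max_{\xi} \mathcal{L}(r; \xi)$

E-step

◮ $r(Z_{ik}) = \mathcal{B}(Z_{ik}; \tau_{ik})$

M-step

◮ $r(\alpha) = \prod_{k=1}^{K} \mathrm{Beta}(\alpha_k; \eta^N_k, \zeta^N_k)$
◮ $r(\tilde{W}^{vec}) = \mathcal{N}(\tilde{W}^{vec}; \tilde{W}^{vec}_N, S_N)$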

slide-37
SLIDE 37

Model selection

◮ After convergence, use $\mathcal{L}(\hat{r}; \hat{\xi})$ as an approximation of $\log p(X \mid K)$

ILosbm

$$\mathrm{IL}_{osbm} = \mathcal{L}(\hat{r}; \hat{\xi})$$

slide-38
SLIDE 38

L2 regularization

$$p(\tilde{W}^{vec}) = \mathcal{N}(\tilde{W}^{vec}; 0, I / \beta)$$

◮ β too small ↪ overfit
◮ β too large ↪ ILosbm maximized for very large values of K

Question

Can we estimate β from the data?

slide-39
SLIDE 39

Bayesian framework

◮ Conjugate prior distributions:
  ◮ $p(\tilde{W}^{vec}) = \mathcal{N}(\tilde{W}^{vec}; 0, I / \beta)$
  ◮ $p(\beta) = \mathrm{Gamma}(\beta; a_0, b_0)$

slide-40
SLIDE 40

Inference

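◮ Use a variational Bayes EM algorithm to maximize:

$$\mathcal{L}(r; \xi) = \sum_{Z} \int\!\!\int\!\!\int r(Z, \alpha, \tilde{W}, \beta) \log \frac{h(Z, \tilde{W}, \xi) \, p(Z \mid \alpha) \, p(\alpha) \, p(\tilde{W}) \, p(\beta)}{r(Z, \alpha, \tilde{W}, \beta)} \, d\alpha \, d\tilde{W} \, d\beta$$

◮ $r(\beta) = \mathrm{Gamma}(\beta; a_N, b_N)$, where

$$a_N = a_0 + \frac{(K + 1)^2}{2} \quad \text{and} \quad b_N = b_0 + \frac{1}{2} \left\{ \mathrm{Tr}(S_N) + (\tilde{W}^{vec}_N)^\top \tilde{W}^{vec}_N \right\}$$

Criterion

$$\mathrm{IL}_{osbm} = \mathcal{L}(\hat{r}; \hat{\xi})$$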
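The update of r(β) only needs the current Gaussian factor on the stacked weights. The sketch below is an illustration under stated assumptions: the trace is read as applying to S_N, with the squared norm of the posterior mean added to it (the usual expectation of the squared weight norm under a Gaussian), and the function name, argument layout and numbers are hypothetical.

```python
def update_beta_posterior(a0, b0, w_mean, s_n, k):
    """Return (a_N, b_N) for r(beta) = Gamma(beta; a_N, b_N).

    w_mean : posterior mean of the stacked weights, length (K + 1)^2
    s_n    : posterior covariance, (K + 1)^2 x (K + 1)^2 nested list
    """
    dim = (k + 1) ** 2
    if len(w_mean) != dim or len(s_n) != dim:
        raise ValueError("w_mean and s_n must have dimension (K + 1)^2")
    a_n = a0 + dim / 2.0
    trace = sum(s_n[d][d] for d in range(dim))
    squared_norm = sum(w * w for w in w_mean)
    b_n = b0 + 0.5 * (trace + squared_norm)
    return a_n, b_n


# Toy example with K = 1, so the stacked weight vector has 4 entries.
w_mean = [1.0, 0.0, -1.0, 2.0]
s_n = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]
a_n, b_n = update_beta_posterior(a0=1.0, b0=1.0, w_mean=w_mean, s_n=s_n, k=1)
beta_mean = a_n / b_n   # posterior mean of the precision beta
```

Here a_N = 1 + 4/2 = 3 and b_N = 1 + (2 + 6)/2 = 5, so the expected precision is 0.6; large posterior weights push the expected precision down, which weakens the regularization.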

slide-42
SLIDE 42

ILosbm

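$$\begin{aligned}
\mathrm{IL}_{osbm} = {} & \sum_{i \ne j}^{N} \left\{ \log g(\xi_{ij}) - \frac{\xi_{ij}}{2} + \lambda(\xi_{ij}) \xi_{ij}^2 \right\} + \sum_{k=1}^{K} \log \left\{ \frac{\Gamma(\eta^0_k + \zeta^0_k) \Gamma(\eta^N_k) \Gamma(\zeta^N_k)}{\Gamma(\eta^0_k) \Gamma(\zeta^0_k) \Gamma(\eta^N_k + \zeta^N_k)} \right\} \\
& + \log \frac{\Gamma(a_N)}{\Gamma(a_0)} + a_0 \log b_0 + a_N \left(1 - \frac{b_0}{b_N} - \log b_N\right) + \frac{1}{2} (\tilde{W}^{vec}_N)^\top S_N^{-1} \tilde{W}^{vec}_N \\
& + \frac{1}{2} \log |S_N| - \sum_{i=1}^{N} \sum_{k=1}^{K} \left\{ \tau_{ik} \log \tau_{ik} + (1 - \tau_{ik}) \log(1 - \tau_{ik}) \right\}.
\end{aligned}$$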

slide-43
SLIDE 43

Experiments on simulated data

◮ Two topological structures:
  ◮ Community structures (affiliation):

$$W = \begin{pmatrix} \lambda & -\epsilon & \cdots & -\epsilon \\ -\epsilon & \lambda & \ddots & \vdots \\ \vdots & \ddots & \ddots & -\epsilon \\ -\epsilon & \cdots & -\epsilon & \lambda \end{pmatrix}$$

  ◮ Community structures and stars:

$$W = \begin{pmatrix} \lambda & \lambda & -\epsilon & \cdots & \cdots & -\epsilon \\ -\epsilon & -\lambda & -\epsilon & \cdots & \cdots & -\epsilon \\ \vdots & & \ddots & & & \vdots \\ -\epsilon & \cdots & \cdots & -\epsilon & \lambda & \lambda \\ -\epsilon & \cdots & \cdots & -\epsilon & -\epsilon & -\lambda \end{pmatrix}$$
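The affiliation matrix is easy to build programmatically. A minimal sketch, with a hypothetical function name, using λ = 4 and ε = 1 as in the simulation settings:

```python
def affiliation_matrix(k, lam, eps):
    """K x K matrix with lam on the diagonal and -eps everywhere else."""
    return [[lam if i == j else -eps for j in range(k)] for i in range(k)]


w = affiliation_matrix(4, lam=4.0, eps=1.0)
```

A positive diagonal raises the connection probability between vertices that share a class, while the negative off-diagonal entries lower it for vertices in different classes.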

slide-44
SLIDE 44

Community structures

Example of an overlapping stochastic block model (OSBM) network with community structures.


slide-45
SLIDE 45

Community structures and stars

Example of an overlapping stochastic block model (OSBM) network with community structures and stars.


slide-47
SLIDE 47

Experiments on simulated data

◮ $N = 100$
◮ $\lambda = 4$
◮ $\epsilon = 1$
◮ $W^* = -5.5$
◮ $U = V = (\epsilon, \dots, \epsilon)^\top$
◮ $\alpha_k = 0.25$
◮ $K = 4$
◮ 100 simulations
◮ 4 graph clustering methods:
  ◮ CFinder (Palla et al. 2006)
  ◮ Stochastic Block Model (SBM)
  ◮ Mixed Membership Stochastic Block Model (MMSB) (Airoldi et al. 2008)
  ◮ Overlapping Stochastic Block Model (OSBM)

slide-48
SLIDE 48

How to compare the methods ?

◮ CFinder and OSBM can deal with outliers ($Z_i = 0$)
◮ SBM and MMSB are run with $K + 1$ classes ↪ identify the class of outliers
◮ Compute $P = Z Z^\top$ and $\hat{P} = \hat{Z} \hat{Z}^\top$:
  ◮ invariant to column permutations of $Z$ and $\hat{Z}$
  ◮ number of shared clusters between each pair of vertices
◮ Compute L2 distance $d(P, \hat{P})$
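The comparison measure can be sketched as follows. The function names and the toy membership matrices are illustrative, and the L2 distance is assumed here to be the Frobenius norm of the difference between the two matrices, which the slides do not spell out.

```python
def shared_clusters(z):
    """P = Z Z^T: entry (i, j) counts the clusters shared by vertices i and j."""
    n = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)]
            for i in range(n)]


def l2_distance(p, p_hat):
    """Frobenius norm of P - P_hat (assumed meaning of the L2 distance)."""
    return sum((a - b) ** 2
               for row_p, row_q in zip(p, p_hat)
               for a, b in zip(row_p, row_q)) ** 0.5


z_true = [[1, 0], [1, 1], [0, 0]]       # vertex 2 overlaps, vertex 3 is an outlier
z_permuted = [[0, 1], [1, 1], [0, 0]]   # same clustering, cluster labels swapped
z_other = [[1, 0], [0, 1], [0, 0]]      # a different clustering

d_permuted = l2_distance(shared_clusters(z_true), shared_clusters(z_permuted))
d_other = l2_distance(shared_clusters(z_true), shared_clusters(z_other))
```

Swapping the columns leaves P unchanged, so the distance is zero without any label matching step; the different clustering gives a distance of √3 on this toy example.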

slide-49
SLIDE 49

Community structures

[Figure: L2 distance $d(P, \hat{P})$ over the 100 samples of networks with community structures, for CFinder, SBM, MMSB and OSBM; distance axis ticks from 50 to 300.]

slide-50
SLIDE 50

Community structures and stars

[Figure: L2 distance $d(P, \hat{P})$ over the 100 samples of networks with community structures and stars, for CFinder, SBM, MMSB and OSBM; distance axis ticks from 50 to 550.]

slide-51
SLIDE 51

Model selection

◮ Community structure
◮ $N = 100$
◮ $\epsilon = 1$
◮ $W^* = -5.5$
◮ $\alpha_k = 1/K$
◮ $K_{True} \in \{3, \dots, 7\}$
◮ $K \in \{2, \dots, 8\}$
◮ 100 simulations

slide-52
SLIDE 52

Results

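Table: $K_{True}$ (rows) versus $K_{IL_{osbm}}$ (columns 2 to 8), $p_{intra} \approx 0.92$. Each row lists its non-empty cells in column order.

$K_{True} = 3$: 99, 1
$K_{True} = 4$: 99, 1
$K_{True} = 5$: 93, 5, 2
$K_{True} = 6$: 7, 64, 22, 7
$K_{True} = 7$: 16, 47, 37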

slide-53
SLIDE 53

Results

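Table: $K_{True}$ (rows) versus $K_{IL_{osbm}}$ (columns 2 to 8), $p_{intra} \approx 0.62$. Each row lists its non-empty cells in column order.

$K_{True} = 3$: 99, 1
$K_{True} = 4$: 85, 9, 5, 1
$K_{True} = 5$: 4, 53, 26, 9, 8
$K_{True} = 6$: 18, 34, 27, 21
$K_{True} = 7$: 4, 18, 30, 48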

slide-54
SLIDE 54

The French blogosphere network

Columns: cluster 1, cluster 2, cluster 3, cluster 4, outliers. Each row lists its non-empty cells in column order.

UMP: 30 + 3, 2 + 3, 5
UDF: 0 + 1, 29 + 1, 0 + 2, 1
liberal: 24, 1
PS: 40, 17
analysts: 0 + 1, 1 + 3, 1 + 1, 0 + 4, 5
others: 1, 30

Classification of the blogs into K = 4 clusters using OSBM. 196 vertices, 2864 edges.

slide-55
SLIDE 55

Conclusion

◮ Computational cost: $O(K^4 N^2) = O(K^2 N^2)$
◮ New model selection criterion: ILosbm
◮ R package OSBM soon available on the CRAN
◮ Can be used to analyze SBM networks

slide-56
SLIDE 56

References

◮ K. Nowicki and T.A.B. Snijders (2001), Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96, 1077-1087
◮ E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing (2008), Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9, 1981-2014
◮ J-J. Daudin, F. Picard and S. Robin (2008), A mixture model for random graphs. Statistics and Computing, 18, 2, 151-171
◮ P. Latouche, E. Birmelé, C. Ambroise (2011), Overlapping stochastic block models with application to the French political blogosphere network. Annals of Applied Statistics, 5, 1, 309-336
◮ P. Latouche, E. Birmelé, C. Ambroise (2012), Variational Bayesian inference and complexity control for stochastic block models. Statistical Modelling, 12, 1, 93-115