Limiting Spectral Distribution of Stochastic Block Model, Yizhe Zhu (PowerPoint presentation)

SLIDE 1

Limiting Spectral Distribution of Stochastic Block Model

Yizhe Zhu

University of Washington

December 30, 2016, SJTU

Joint work with Ioana Dumitriu

SLIDE 2

Overview

1. Semicircle Law
2. Erdős-Rényi random graph
3. Stochastic Block Model
4. Proof of Semicircle Law for Erdős-Rényi
5. Spectral Distribution for SBM

SLIDE 3

Wigner Semicircle Law

Figure: Eugene Wigner, Nobel Prize in Physics (1963)

This law was first observed by Wigner (1955) for certain special classes of random matrices arising in quantum mechanical investigations.

SLIDE 4

Wigner Semicircle Law

Let $\{Z_{ij}\}_{1\le i<j}$ and $\{Y_i\}_{1\le i}$ be two independent families of i.i.d. zero-mean, real-valued random variables with $E[Z_{12}^2] = 1$ and $\max\{E|Z_{12}|^k, E|Y_1|^k\} < \infty$ for all $k \ge 1$. Define the symmetric matrix $X_N$ by

$$X_N(i,j) = X_N(j,i) = \begin{cases} Z_{ij}/\sqrt{N}, & i < j,\\ Y_i/\sqrt{N}, & i = j.\end{cases}$$

Let $\lambda_1^N \le \lambda_2^N \le \cdots \le \lambda_N^N$ denote the eigenvalues of $X_N$, and define the empirical distribution of the eigenvalues as the probability measure

$$F_N = \frac{1}{N}\sum_{i=1}^{N}\delta_{\lambda_i^N}.$$

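SLIDE 5

Semicircle distribution density: $\sigma(x) = \frac{1}{2\pi}\sqrt{4-x^2}\,\mathbf{1}_{|x|\le 2}$.

Theorem (Wigner)

For a Wigner matrix, the empirical measure $F_N$ converges weakly in probability to the standard semicircle distribution:

$$\lim_{N\to\infty} P\big(|\langle F_N, f\rangle - \langle \sigma, f\rangle| > \epsilon\big) = 0 \quad \text{for all } f \in C_b(\mathbb{R}),\ \epsilon > 0.$$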
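A quick numerical illustration of the theorem, as a minimal sketch assuming standard Gaussian entries for both $Z_{ij}$ and $Y_i$ (any distribution meeting the moment conditions would do): build $X_N$, compute its eigenvalues, and measure the Kolmogorov distance between $F_N$ and the semicircle CDF. The helper names `wigner_matrix`, `semicircle_cdf` and `kolmogorov_distance` are hypothetical.

```python
import numpy as np

def wigner_matrix(N, rng):
    """Symmetric X_N with Z_ij ~ N(0,1) above the diagonal and Y_i ~ N(0,1) on it."""
    upper = np.triu(rng.standard_normal((N, N)), k=1)   # Z_ij for i < j
    diag = np.diag(rng.standard_normal(N))              # Y_i on the diagonal
    return (upper + upper.T + diag) / np.sqrt(N)

def semicircle_cdf(x):
    """CDF of the standard semicircle law with density sqrt(4 - x^2) / (2 pi)."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def kolmogorov_distance(eigs):
    """sup_x |F_N(x) - F_sc(x)|, checked on both sides of each jump of F_N."""
    lam = np.sort(eigs)
    N = len(lam)
    Fsc = semicircle_cdf(lam)
    after = np.arange(1, N + 1) / N - Fsc    # F_N at each eigenvalue
    before = Fsc - np.arange(0, N) / N       # F_N just before each eigenvalue
    return max(after.max(), before.max())

rng = np.random.default_rng(0)
eigs = np.linalg.eigvalsh(wigner_matrix(1000, rng))
print("second moment:", np.mean(eigs**2))            # should be close to 1
print("Kolmogorov distance:", kolmogorov_distance(eigs))
```

The second moment equals $\frac1N\operatorname{Tr}(X_N^2)$, so it is a cheap first check before looking at the whole distribution.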

SLIDE 6

Erdős-Rényi random graph

$G(n,p)$: $n$ vertices; each pair $i \sim j$ is joined independently with probability $p$.

Theorem (Tran, Vu and Wang, 2011)

Let $A_n$ be the adjacency matrix of $G(n,p)$ and $\sigma^2 = p(1-p)$. For $p = \omega(1/n)$, the empirical spectral distribution of the matrix $\frac{1}{\sqrt{n}\,\sigma}A_n$ converges in distribution to the semicircle distribution, which has density $\rho_{sc}(x)$ supported on $[-2,2]$:

$$\rho_{sc}(x) := \frac{1}{2\pi}\sqrt{4-x^2}.$$
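To see this numerically, here is a sketch assuming $\sigma^2 = p(1-p)$ and the arbitrary parameters $n = 1000$, $p = 0.05$ (so $np = 50$). Apart from one outlier near $\sqrt{n}\,p/\sigma$, caused by the nonzero mean of $A_n$, the eigenvalues of $A_n/(\sqrt{n}\sigma)$ fill out $[-2, 2]$. The function name `erdos_renyi_adjacency` is hypothetical.

```python
import numpy as np

def erdos_renyi_adjacency(n, p, rng):
    """Adjacency matrix of G(n, p): symmetric, zero diagonal, i.i.d. Bernoulli(p) edges."""
    coins = rng.random((n, n)) < p
    upper = np.triu(coins, k=1).astype(float)
    return upper + upper.T

n, p = 1000, 0.05
sigma = np.sqrt(p * (1 - p))
rng = np.random.default_rng(1)
A = erdos_renyi_adjacency(n, p, rng)
eigs = np.linalg.eigvalsh(A / (np.sqrt(n) * sigma))   # ascending order

bulk = eigs[:-1]   # drop the single outlier created by the mean of A_n
print("largest eigenvalue (outlier):", eigs[-1])
print("fraction of bulk in [-2.1, 2.1]:", np.mean(np.abs(bulk) <= 2.1))
```

The single outlier does not affect the limiting distribution, since one eigenvalue carries mass $1/n$.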
SLIDE 7

How to generalize the Erdős-Rényi model?

Definition (random graph with given expected degrees, Chung-Lu-Vu model)

$G(w)$: given a sequence $w = (w_1, w_2, \dots, w_n)$, edges are assigned independently to each pair of vertices $(i,j)$ with probability $w_i w_j \rho$, where $\rho = 1/\sum_{i=1}^{n} w_i$.

$G(n,p)$ can be viewed as $G(w)$ with $w = (pn, pn, \dots, pn)$.
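A minimal sampler for $G(w)$, assuming $\max_i w_i^2 \le \sum_i w_i$ so that every $w_i w_j\rho \le 1$, and ignoring self-loops (a simplification for illustration). The function names are hypothetical. With $w = (pn, \dots, pn)$ the off-diagonal probabilities reduce to the constant $p$, matching the $G(n,p)$ remark.

```python
import numpy as np

def chung_lu_probabilities(w):
    """Matrix of edge probabilities w_i * w_j * rho with rho = 1 / sum(w)."""
    w = np.asarray(w, dtype=float)
    rho = 1.0 / w.sum()
    P = np.outer(w, w) * rho
    if P.max() > 1.0:
        raise ValueError("need max_i w_i^2 <= sum_i w_i so probabilities stay <= 1")
    np.fill_diagonal(P, 0.0)   # no self-loops in this sketch
    return P

def sample_chung_lu(w, rng):
    """Symmetric 0/1 adjacency matrix with independent edges over pairs i < j."""
    P = chung_lu_probabilities(w)
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(float)
    return upper + upper.T

n, p = 200, 0.1
P = chung_lu_probabilities(np.full(n, p * n))
off_diag = P[~np.eye(n, dtype=bool)]
print("G(n, p) special case:", np.allclose(off_diag, p))

A = sample_chung_lu(np.linspace(1, 10, n), np.random.default_rng(2))
print("mean observed degree:", A.sum(axis=1).mean())   # roughly mean(w) = 5.5
```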

SLIDE 8

Stochastic Block Model

Consider a network with $n$ nodes and $d$ communities $\Omega_1, \dots, \Omega_d$ of sizes $n_1, \dots, n_d$, with $\sum_{i=1}^{d} n_i = n$. If two nodes belong to different communities, connect them independently with probability $p_0$. If two nodes are in the same community $\Omega_m$, connect them independently with probability $p_m$, $1 \le m \le d$.

Statistical task: community detection / recovery.
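A minimal SBM sampler under this parametrization (probability $p_0$ across communities, $p_m$ inside $\Omega_m$). Communities are laid out as consecutive index ranges, and the names `sbm_probabilities` and `sample_sbm` are hypothetical.

```python
import numpy as np

def sbm_probabilities(sizes, p0, p_within):
    """n x n edge-probability matrix: p_within[m] inside block m, p0 across blocks."""
    labels = np.repeat(np.arange(len(sizes)), sizes)      # community of each node
    same = labels[:, None] == labels[None, :]
    P = np.where(same, np.asarray(p_within, dtype=float)[labels][:, None], p0)
    np.fill_diagonal(P, 0.0)                               # no self-loops
    return P, labels

def sample_sbm(sizes, p0, p_within, rng):
    """Symmetric adjacency matrix with independent Bernoulli(P_ij) edges for i < j."""
    P, labels = sbm_probabilities(sizes, p0, p_within)
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(float)
    return upper + upper.T, labels

A, labels = sample_sbm([300, 200, 100], p0=0.02, p_within=[0.2, 0.3, 0.4],
                       rng=np.random.default_rng(3))
for m in range(3):
    idx = np.flatnonzero(labels == m)
    k = len(idx)
    density = A[np.ix_(idx, idx)].sum() / (k * (k - 1))
    print(f"community {m}: empirical within-density {density:.3f}")
```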

SLIDE 9

Graphon

A graphon is a symmetric measurable function $W: [0,1]^2 \to [0,1]$. Graphons are limit objects for graph sequences in the dense case.

Generate a graph in the following way:

1. Each vertex $j$ of the graph is assigned an independent random value $u_j \sim U[0,1]$.
2. Edge $(i,j)$ is independently included in the graph with probability $W(u_i, u_j)$.

Erdős-Rényi: $W \equiv p$ for some constant $p \in [0,1]$.
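The two-step construction translates directly into code. In this sketch (all names hypothetical) `W` is any vectorized Python function on $[0,1]^2$; a constant `W` gives Erdős-Rényi, and a step function that is constant on rectangles gives a stochastic block model.

```python
import numpy as np

def sample_from_graphon(W, n, rng):
    """W-random graph: u_j ~ U[0,1], then edge (i, j) with probability W(u_i, u_j)."""
    u = rng.random(n)                                              # step 1
    P = np.broadcast_to(W(u[:, None], u[None, :]), (n, n)).copy()
    np.fill_diagonal(P, 0.0)
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(float)     # step 2
    return upper + upper.T, u

def erdos_renyi_graphon(x, y):
    return 0.1            # constant graphon W = p with p = 0.1

def two_block_graphon(x, y):
    """Step graphon: cut point 0.6, p_1 = 0.5, p_2 = 0.3, p_0 = 0.05 (made-up values)."""
    bx, by = x < 0.6, y < 0.6
    return np.where(bx & by, 0.5, np.where(~bx & ~by, 0.3, 0.05))

rng = np.random.default_rng(4)
A_er, _ = sample_from_graphon(erdos_renyi_graphon, 500, rng)
A_sbm, u = sample_from_graphon(two_block_graphon, 500, rng)
print("ER edge density:", A_er.sum() / (500 * 499))
```

Note that here community sizes are random (about $0.6n$ and $0.4n$), which is how a step graphon encodes the proportions $n_k/n$.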

SLIDE 13

Measure theory: constant function → step function → measurable function

Random graph theory: Erdős-Rényi model → stochastic block model → graphon

SLIDE 14

Proof of Semicircle Law for Erdős-Rényi random graph

SLIDE 17

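Note that $E[A_n] = p(J_n - I_n)$, so $M_n = \sigma^{-1}(A_n - pJ_n)$ is centered off the diagonal; its constant diagonal $-p/\sigma$ only shifts the spectrum of $M_n/\sqrt{n}$ by $-p/(\sigma\sqrt{n}) \to 0$.

Lemma (Rank Inequality)

For $n \times n$ Hermitian matrices $A$ and $B$,

$$\sup_x |F^A(x) - F^B(x)| \le \frac{\operatorname{rank}(A-B)}{n}.$$

Since $\operatorname{rank}(pJ_n) = 1$, it is sufficient to show the semicircle law holds for $M_n/\sqrt{n}$. A standard way is the moment method. The $k$th moment of the empirical spectral distribution of a matrix $W_n$ is $\int x^k\,dF^{W_n}(x) = \frac{1}{n}\operatorname{Trace}(W_n^k)$, with expectation $\frac{1}{n}E[\operatorname{Trace}(W_n^k)]$.

Since the semicircle law has compact support, it is determined by its moments, so convergence of moments implies convergence in distribution. We need to show

$$\frac{1}{n}E[\operatorname{Trace}(W_n^k)] \to \int_{-2}^{2} x^k\rho_{sc}(x)\,dx.$$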
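A numerical check of the rank inequality, as a sketch with arbitrary parameters: for $A = A_n/(\sqrt{n}\sigma)$ and $B = M_n/\sqrt{n}$ the difference $pJ_n/(\sqrt{n}\sigma)$ has rank one, so the sup-distance between the two empirical spectral distributions should be at most $1/n$. The helper `esd_sup_distance` is hypothetical.

```python
import numpy as np

def esd_sup_distance(a_eigs, b_eigs):
    """sup_x |F^A(x) - F^B(x)| for two empirical spectral distributions.
    The difference of the two step functions is constant between jump points,
    so it suffices to evaluate both right-continuous CDFs at every eigenvalue."""
    a, b = np.sort(a_eigs), np.sort(b_eigs)
    grid = np.concatenate([a, b])
    Fa = np.searchsorted(a, grid, side="right") / len(a)
    Fb = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

n, p = 600, 0.1
sigma = np.sqrt(p * (1 - p))
rng = np.random.default_rng(5)
upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
A_n = upper + upper.T
M_n = (A_n - p * np.ones((n, n))) / sigma        # M_n = sigma^{-1}(A_n - p J_n)

d = esd_sup_distance(np.linalg.eigvalsh(A_n / (np.sqrt(n) * sigma)),
                     np.linalg.eigvalsh(M_n / np.sqrt(n)))
print(d, "<=", 1 / n, d <= 1 / n + 1e-12)
```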

SLIDE 19

For $k = 2m+1$: $\int_{-2}^{2} x^k\rho_{sc}(x)\,dx = 0$.

For $k = 2m$: $\int_{-2}^{2} x^k\rho_{sc}(x)\,dx = \frac{1}{m+1}\binom{2m}{m}$ ← Catalan number.
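The moment formula can be checked numerically. This sketch substitutes $x = 2\cos\theta$, which turns $\int_{-2}^{2} x^k\rho_{sc}(x)\,dx$ into $\frac{2}{\pi}\int_0^\pi (2\cos\theta)^k\sin^2\theta\,d\theta$, evaluates it with a midpoint rule, and compares with $\frac{1}{m+1}\binom{2m}{m}$. The function names are hypothetical.

```python
import math
import numpy as np

def semicircle_moment(k, num_points=4096):
    """k-th moment of rho_sc via x = 2 cos(theta) and the midpoint rule on [0, pi].
    The integrand is a trigonometric polynomial, so the rule converges very fast."""
    theta = (np.arange(num_points) + 0.5) * np.pi / num_points
    integrand = (2.0 * np.cos(theta)) ** k * np.sin(theta) ** 2
    return (2.0 / np.pi) * integrand.sum() * (np.pi / num_points)

def catalan(m):
    return math.comb(2 * m, m) // (m + 1)

for m in range(6):
    print(2 * m, semicircle_moment(2 * m), catalan(m),
          "| odd moment", 2 * m + 1, round(semicircle_moment(2 * m + 1), 12))
```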
SLIDE 21

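Let $W_n = \frac{1}{\sqrt{n}}M_n$ and let $\eta_{ij}$ be the $(i,j)$ entry of $M_n$. We have the following expansion:

$$\frac{1}{n}E[\operatorname{Trace}(W_n^k)] = \frac{1}{n^{1+k/2}}E[\operatorname{Trace}(M_n^k)] = \frac{1}{n^{1+k/2}}\sum_{1\le i_1,\dots,i_k\le n} E\big[\eta_{i_1i_2}\eta_{i_2i_3}\cdots\eta_{i_ki_1}\big].$$

Each term (sequence of indices) corresponds to a closed walk of length $k$ on the complete graph $K_n$. Since the off-diagonal entries are independent and centered, a term can be nonzero only if each edge in the closed walk appears at least twice; we call such a walk a good walk.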
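To make good walks concrete, the brute-force sketch below enumerates all index sequences $(i_1,\dots,i_k)$ on a small complete graph, skipping loops $i_j = i_{j+1}$ (diagonal entries are taken to be zero here, an assumption of the sketch), and counts good walks as well as those with exactly $m = k/2$ distinct edges. To leading order in $n$ the latter number is $n(n-1)\cdots(n-m)\cdot\frac{1}{m+1}\binom{2m}{m}$; for $k = 4$ the two agree exactly. The cost is exponential in $k$, so this is only for tiny cases, and `count_walks` is a hypothetical name.

```python
import itertools
import math
from collections import Counter

def walk_edges(walk):
    """Edges {i_j, i_{j+1}} of the closed walk i_1 -> i_2 -> ... -> i_k -> i_1."""
    k = len(walk)
    return [frozenset((walk[j], walk[(j + 1) % k])) for j in range(k)]

def count_walks(n, k):
    """Return (#good walks, #good walks with exactly k/2 distinct edges) on K_n."""
    good = m_edge = 0
    for walk in itertools.product(range(n), repeat=k):
        if any(walk[j] == walk[(j + 1) % k] for j in range(k)):
            continue                                  # skip loops (zero diagonal)
        mult = Counter(walk_edges(walk))
        if min(mult.values()) >= 2:                   # every edge used at least twice
            good += 1
            if 2 * len(mult) == k:
                m_edge += 1
    return good, m_edge

n, k = 5, 4
m = k // 2
good, m_edge = count_walks(n, k)
predicted = math.prod(range(n - m, n + 1)) * math.comb(2 * m, m) // (m + 1)
print(good, m_edge, predicted)
```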

SLIDE 24

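Consider a good walk that uses $l$ different edges $e_1, \dots, e_l$ with multiplicities $m_1, \dots, m_l$; since every edge appears at least twice, $l \le m$ (for $k = 2m$ or $k = 2m+1$). A bound for the number of good walks with $l$ different edges is $n^{l+1} \times l^k$.

When $k = 2m+1$,

$$\frac{1}{n}E[\operatorname{Trace}(W_n^k)] = \frac{1}{n^{1+k/2}}\sum_{l=1}^{m}\ \sum_{\text{good walks with } l \text{ edges}} E\big[\eta_{e_1}^{m_1}\cdots\eta_{e_l}^{m_l}\big] = O\Big(\frac{1}{\sqrt{np}}\Big).$$

When $k = 2m$, classify good walks into two types. The first type uses $l \le m-1$ different edges; the contribution of these terms is $O\big(\frac{1}{np}\big)$. The second type uses exactly $l = m$ different edges, each exactly twice, so each term has the form $E[\eta_{e_1}^2\cdots\eta_{e_m}^2] = 1$. The number of good walks of the second type is

$$n^{m+1}\big(1 + O(n^{-1})\big)\frac{1}{m+1}\binom{2m}{m}.$$

Dividing by $n^{1+m}$, the second type contributes $\frac{1}{m+1}\binom{2m}{m} + O(n^{-1})$, and the conclusion follows.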
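As a sanity check of this conclusion, the sketch below estimates $\frac{1}{n}\operatorname{Trace}(W_n^k)$ from one sample of $W_n = M_n/\sqrt{n}$, with $\eta_{ij} = (a_{ij} - p)/\sigma$ off the diagonal and the diagonal set to zero (a simplifying assumption), and compares with the predicted limits: $0$ for odd $k$ and the Catalan number for $k = 2m$. The parameter values are arbitrary.

```python
import math
import numpy as np

n, p = 1000, 0.1
sigma = np.sqrt(p * (1 - p))
rng = np.random.default_rng(6)
upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
A_n = upper + upper.T
M_n = (A_n - p) / sigma          # eta_ij = (a_ij - p) / sigma
np.fill_diagonal(M_n, 0.0)       # zero diagonal, as assumed in this sketch
eigs = np.linalg.eigvalsh(M_n / np.sqrt(n))

for k in range(1, 7):
    moment = np.mean(eigs ** k)  # (1/n) Trace(W_n^k)
    limit = math.comb(k, k // 2) // (k // 2 + 1) if k % 2 == 0 else 0
    print(k, round(moment, 3), limit)
```

The finite-$n$ deviations are of the order $1/(np)$ predicted above for the first type of walk.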

SLIDE 26

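Spectral Distribution for SBM

We consider an $n \times n$ random matrix with rectangular blocks, which we can write as

$$A_n = \sum_{k,l=1}^{d} E_{kl} \otimes A^{(k,l)},$$

that is, the $(k,l)$ block of $A_n$ is $A^{(k,l)}$ ($E_{kl}$ denotes the $d \times d$ matrix unit), where $A^{(k,l)}$, $1 \le k \le l \le d$, are independent $n_k \times n_l$ rectangular random matrices. We use $a^{(k,l)}_{rs}$ to denote the entries of $A^{(k,l)}$ and make the following assumptions (Assumptions 1):

1. $a^{(k,l)}_{rs} = a^{(l,k)}_{sr}$ for all $r = 1, \dots, n_k$, $s = 1, \dots, n_l$, $1 \le k \le l \le d$, and $n_k/n \to \alpha_k \in [0, \infty)$, $1 \le k \le d$.
2. $\{a^{(k,l)}_{rs} : 1 \le r \le n_k,\ 1 \le s \le n_l,\ k \le l\}$ are independent random variables with mean zero and variance $\sigma_{kl}^2$, identically distributed within each block, $1 \le k \le l \le d$.
3. Let $\sigma^2 = \max_{k,l}\sigma_{kl}^2$; we have $\lim_{n\to\infty}\sigma_{kl}^2/\sigma^2 = s_{kl}$.
4. Let $M_n = A_n/\sigma = (c^{(k,l)}_{rs})$; for every $\eta > 0$,

$$\lim_{n\to\infty}\frac{1}{n^2}\sum_{k,l}\sum_{r,s} E\Big[|c^{(k,l)}_{rs}|^2\,\mathbf{1}\big(|c^{(k,l)}_{rs}| \ge \eta\sqrt{n}\big)\Big] = 0.$$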
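A sketch of the block construction under assumptions (1) and (2): independent blocks $A^{(k,l)}$ for $k \le l$, the lower blocks filled by $a^{(l,k)}_{sr} = a^{(k,l)}_{rs}$, and symmetric diagonal blocks. Entries are taken to be Gaussian with block variances $\sigma_{kl}^2$ purely for illustration; the block sizes and variances are made-up values, and `block_random_matrix` is a hypothetical name.

```python
import numpy as np

def block_random_matrix(sizes, var, rng):
    """Symmetric n x n matrix whose (k, l) block A^(k,l) has i.i.d. N(0, var[k][l]) entries."""
    d = len(sizes)
    blocks = [[None] * d for _ in range(d)]
    for k in range(d):
        for l in range(k, d):
            G = rng.standard_normal((sizes[k], sizes[l])) * np.sqrt(var[k][l])
            if k == l:
                G = np.triu(G) + np.triu(G, 1).T   # symmetric diagonal block
            blocks[k][l] = G
            blocks[l][k] = G.T                      # a^(l,k)_sr = a^(k,l)_rs
    return np.block(blocks)

sizes = [300, 500, 200]                  # n_k, so alpha_k = n_k / n
var = np.array([[1.0, 0.2, 0.5],
                [0.2, 0.8, 0.1],
                [0.5, 0.1, 0.6]])        # sigma_kl^2 (symmetric)
rng = np.random.default_rng(7)
A_n = block_random_matrix(sizes, var, rng)
n, sigma = A_n.shape[0], np.sqrt(var.max())
eigs = np.linalg.eigvalsh(A_n / (np.sqrt(n) * sigma))   # spectrum of M_n / sqrt(n)
print(A_n.shape, np.allclose(A_n, A_n.T))
```

For large $n$, the second moment of this spectrum is close to $\sum_{k,l}\alpha_k\alpha_l\sigma_{kl}^2/\sigma^2$.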
SLIDE 29

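Theorem (Ding, 2015)

If $d$ is fixed, then under Assumptions 1 (1)-(4), with probability 1 the empirical spectral distribution $F_n$ of the random matrix

$$\frac{M_n}{\sqrt{n}} = \frac{A_n}{\sqrt{n}\,\sigma} = \Big(\frac{c^{(k,l)}_{rs}}{\sqrt{n}}\Big)$$

converges to a probability distribution $F$.

Lemma (Ding, 2015)

To prove the theorem above, it suffices to prove it under the following Assumptions 2:

1. $c^{(k,k)}_{rr} = 0$; $\{a^{(k,l)}_{rs} : 1 \le r \le n_k,\ 1 \le s \le n_l,\ k \le l\}$ are independent random variables with mean zero and variance $\sigma_{kl}^2$, $1 \le k \le l \le d$; and $\lim_{n\to\infty}\sigma_{kl}^2 = s_{kl} \le 1$.
2. $|c^{(k,l)}_{rs}| \le \eta_n\sqrt{n}$ for some positive sequence $\eta_n \to 0$.

For the SBM, $p_i = \omega(1/n)$ satisfies Assumptions 2 (2), but the adjacency matrix is not centered. The rank inequality helps: $\sup_x|F^A(x) - F^B(x)| \le \operatorname{rank}(A-B)/n$. Off the diagonal, the mean of the SBM adjacency matrix is constant on each block, and subtracting a block-constant matrix (rank at most $d$) changes the ESD by at most $d/n \to 0$ for fixed $d$.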

SLIDE 32

What is $F$? Is there a formula?
slide-33
SLIDE 33

For a probability distribution $G$, its Stieltjes transform $s_G(z)$ is defined as

$$s_G(z) = \int \frac{1}{x - z}\,dG(x), \quad z \in \mathbb{C}^+.$$

Theorem (Far, Oraby, Bryc and Speicher, 2006)

If all entries are Gaussian, $s_F(z)$ is determined by the following equations:

$$s(z) = \sum_{k=1}^{d}\alpha_k g_k(z), \qquad -z\,g_k(z) = 1 + \sum_{l=1}^{d}\alpha_l\sigma_{kl}^2\,g_l(z)\,g_k(z), \quad 1 \le k \le d.$$

Tool: free probability (algebraic). Gaussian random variables satisfy assumption (2).
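For Gaussian blocks, the equations can be solved numerically. This sketch uses damped fixed-point iteration on the rearranged form $g_k = -1/\big(z + \sum_l \alpha_l\sigma_{kl}^2 g_l\big)$ for $z \in \mathbb{C}^+$, a common numerical approach that is assumed here, not taken from Far et al.; the density of $F$ is then approximately $\frac{1}{\pi}\operatorname{Im} s(x + i\varepsilon)$ for small $\varepsilon > 0$. As a check, with all $\sigma_{kl}^2 = 1$ the solution must coincide with the semicircle transform $s(z) = \frac{-z + \sqrt{z^2-4}}{2}$. Function names and the two-block profile are hypothetical.

```python
import numpy as np

def block_stieltjes(z, alpha, var, iters=20000, damping=0.5, tol=1e-13):
    """Solve -z g_k = 1 + sum_l alpha_l var_kl g_l g_k by damped fixed-point iteration.
    Returns s(z) = sum_k alpha_k g_k(z); requires Im z > 0. Convergence is slow near the real axis."""
    alpha = np.asarray(alpha, dtype=float)
    var = np.asarray(var, dtype=float)
    g = np.full(len(alpha), -1.0 / z, dtype=complex)    # large-|z| behaviour g ~ -1/z
    for _ in range(iters):
        g_new = -1.0 / (z + var @ (alpha * g))
        g_next = damping * g + (1 - damping) * g_new
        converged = np.max(np.abs(g_next - g)) < tol
        g = g_next
        if converged:
            break
    return alpha @ g

def semicircle_stieltjes(z):
    """s(z) = (-z + sqrt(z^2 - 4)) / 2 on the branch with s(z) ~ -1/z at infinity."""
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

z = 0.5 + 0.5j
print(block_stieltjes(z, [0.3, 0.7], np.ones((2, 2))), semicircle_stieltjes(z))

# Approximate density of F for a two-block variance profile (made-up values)
alpha, var = [0.4, 0.6], [[1.0, 0.3], [0.3, 0.5]]
xs = np.linspace(-2.5, 2.5, 11)
density = [block_stieltjes(x + 1e-2j, alpha, var).imag / np.pi for x in xs]
print(np.round(density, 3))
```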

SLIDE 36

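$d \to \infty$?

Graphon

Theorem

Under Assumptions 2, if $d \to \infty$, $\alpha_k \to 0$ for all $1 \le k \le d$, and all off-diagonal blocks have variance $\sigma_0^2$, then the ESD of $M_n/\sqrt{n}$ converges to the semicircle law with variance $s_0 = \lim_{n\to\infty}\sigma_0^2/\sigma^2$, that is, to $\rho_{\sqrt{s_0}}$ in the notation below.

Semicircle law with parameter $\sigma$:

$$\rho_\sigma(x) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - x^2}\,\mathbf{1}_{|x|\le 2\sigma}.$$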
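A rough numerical illustration of this theorem, assuming equal blocks and Rademacher ($\pm 1$) entries scaled to the block standard deviations (bounded, so the boundedness condition of Assumptions 2 holds trivially); all numbers are arbitrary. With $d = 40$ blocks of size 25, diagonal blocks of variance $1$ and off-diagonal blocks of variance $s_0 = 0.5$, the ESD of $M_n/\sqrt{n}$ should already be close to the semicircle law of variance $s_0$ (support $[-2\sqrt{s_0}, 2\sqrt{s_0}]$). The helper `semicircle_cdf` is hypothetical.

```python
import numpy as np

def semicircle_cdf(x, var=1.0):
    """CDF of the semicircle law with variance var (support [-2 sqrt(var), 2 sqrt(var)])."""
    t = np.clip(np.asarray(x) / np.sqrt(var), -2.0, 2.0)
    return 0.5 + t * np.sqrt(4.0 - t**2) / (4.0 * np.pi) + np.arcsin(t / 2.0) / np.pi

d, block, s0 = 40, 25, 0.5
n = d * block
labels = np.repeat(np.arange(d), block)
S = np.where(labels[:, None] == labels[None, :], 1.0, s0)    # variance profile s_kl
rng = np.random.default_rng(8)
signs = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)   # zero diagonal
C = np.sqrt(S) * (signs + signs.T)                           # entries c_rs of M_n
eigs = np.sort(np.linalg.eigvalsh(C / np.sqrt(n)))

ks = np.max(np.abs(np.arange(1, n + 1) / n - semicircle_cdf(eigs, s0)))
print("second moment:", np.mean(eigs**2), "| sup |F_n - F_sc(s0)| ~", ks)
```

The diagonal blocks make up only a $1/d$ fraction of each row, which is why the second moment ($0.5115$ here) is already close to $s_0$.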

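SLIDE 37

Theorem

Under Assumptions 2, if $d \to \infty$ as $n \to \infty$ and $\sum_{k=1}^{\infty}\alpha_k = 1$, the empirical spectral distribution of $A_n/(\sigma\sqrt{n})$ converges with probability 1.

Corollary

The Stieltjes transform $S(z)$ of the limiting ESD (here with the convention $S(z) = \int \frac{dF(x)}{z - x}$) satisfies the following equations:

$$S(z) = \int_0^1 a(z,x)\,dx, \qquad a(z,x)^{-1} = z - \int_0^1 \sigma^2(y,x)\,a(z,y)\,dy.$$

Example: $\alpha_i = 1/2^i$.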

SLIDE 39

$\sum_{i=1}^{\infty}\alpha_i = c < 1$

Given $\sum_{i=1}^{\infty}\alpha_i = c$ with $c > \alpha_1 \ge \alpha_2 \ge \cdots > 0$, let $n_i = \lfloor n\alpha_i \rfloor$. We generate large blocks of size $n_i \times n_i$ with parameter $p_i$ until $n_i = 0$; the remaining vertices form one block with noise parameter $p_0$. Let $k(n) = \sup\{k : \alpha_k \ge 1/n\}$. The last block has size $n_{k(n)+1} = n - \sum_{i=1}^{k(n)} n_i$ and parameter $p_0$.

Lemma

Assume $\sum_{i=1}^{\infty}\alpha_i = c$ and $c > \alpha_1 \ge \alpha_2 \ge \cdots > 0$. Let $k(n) = \sup\{k : \alpha_k \ge 1/n\}$; then $k(n)/n \to 0$.

Under Assumptions 2, the limiting distribution exists.
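The block-size bookkeeping is easy to compute. A sketch with the hypothetical choice $\alpha_i = c\,2^{-i}$, $c = 3/4$ (so $\sum_i \alpha_i = c$ and $c > \alpha_1$), showing $k(n)$, the block sizes $n_i = \lfloor n\alpha_i \rfloor$ and the size of the remaining $p_0$ block. For this choice $k(n) = \lfloor \log_2(cn) \rfloor$, so $k(n)/n \to 0$ in line with the lemma. The function name `block_sizes` is hypothetical.

```python
from fractions import Fraction

def block_sizes(n, c=Fraction(3, 4)):
    """Sizes n_i = floor(n * alpha_i) for alpha_i = c / 2**i, i = 1..k(n), plus the remainder.
    k(n) = sup{k : alpha_k >= 1/n}; exact rational arithmetic avoids rounding issues."""
    sizes = []
    i = 1
    while c / 2**i >= Fraction(1, n):        # alpha_i >= 1/n  <=>  n_i >= 1
        sizes.append(int(n * c / 2**i))       # int() floors a positive Fraction
        i += 1
    remainder = n - sum(sizes)                # last block, parameter p_0
    return sizes, remainder

for n in [10**3, 10**4, 10**5, 10**6]:
    sizes, rest = block_sizes(n)
    print(n, "k(n) =", len(sizes), "k(n)/n =", len(sizes) / n, "remaining block:", rest)
```

Because $c < 1$, the remaining $p_0$ block keeps a positive fraction (about $1 - c$) of the vertices.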

SLIDE 40

Rate of Convergence

Local semicircle law

Theorem (Tao, Vu, 2011)

For any $\epsilon, \delta > 0$, any random Hermitian matrix $M_n = (\xi_{ij})_{1\le i,j\le n}$ whose upper-triangular entries are independent with mean zero and variance 1 and satisfy $|\xi_{ij}| \le K$ almost surely for all $i, j$ and some $1 \le K \le n^{1/2-\epsilon}$, and any interval $I$ in $[-2+\epsilon, 2-\epsilon]$ of width $|I| \ge \frac{K^2\log^{20} n}{n}$, the number $N_I$ of eigenvalues of $W_n = \frac{1}{\sqrt{n}}M_n$ in $I$ obeys the concentration estimate

$$\Big|N_I - n\int_I \rho_{sc}(x)\,dx\Big| \le \delta n|I|$$

with overwhelming probability.
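A small experiment in the spirit of the theorem, for a Rademacher Wigner matrix ($K = 1$): compare the eigenvalue count $N_I$ of $W_n$ in a few intervals $I$ with $n\int_I \rho_{sc}$. At this matrix size the width condition $|I| \ge K^2\log^{20} n/n$ is far from satisfied, so this is only a heuristic illustration, not a check of the theorem; the matrix size and intervals are arbitrary choices.

```python
import numpy as np

def semicircle_cdf(x):
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

n = 1000
rng = np.random.default_rng(9)
signs = np.triu(rng.choice([-1.0, 1.0], size=(n, n)))   # independent +-1 on and above diagonal
M_n = signs + np.triu(signs, k=1).T                      # real symmetric (Hermitian)
eigs = np.linalg.eigvalsh(M_n / np.sqrt(n))

for a, b in [(-0.5, 0.5), (1.0, 1.2), (-1.8, -1.7)]:
    N_I = np.count_nonzero((eigs >= a) & (eigs <= b))
    expected = n * (semicircle_cdf(b) - semicircle_cdf(a))
    print(f"I = [{a}, {b}]: N_I = {N_I}, n * int_I rho_sc = {expected:.1f}, "
          f"|diff| / (n|I|) = {abs(N_I - expected) / (n * (b - a)):.4f}")
```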

SLIDE 41

References

Anderson G W, Guionnet A, Zeitouni O. (2010). An introduction to random matrices. Cambridge University Press.

Rácz M Z. (2016). Basic models and questions in statistical network analysis. arXiv:1609.03511.

Chung F, Lu L, Vu V. (2003). Spectra of random graphs with given expected degrees. Proceedings of the National Academy of Sciences, 100(11): 6313-6318.

Tran L V, Vu V H, Wang K. (2013). Sparse random graphs: Eigenvalues and eigenvectors. Random Structures & Algorithms, 42(1): 110-134.

Ding X. (2014). Spectral analysis of large block random matrices with rectangular blocks. Lithuanian Mathematical Journal, 54(2): 115-126.

SLIDE 42

References

Avrachenkov K, Cottatellucci L, Kadavankandy A. (2015). Spectral properties of random matrices for stochastic block model. 13th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt). IEEE, 2015: 537-544.

Far R R, Oraby T, Bryc W, et al. (2006). Spectra of large block matrices. arXiv preprint cs/0610045.

Tao T, Vu V. (2011). Random matrices: universality of local eigenvalue statistics. Acta Mathematica, 206(1): 127-204.