SLIDE 1
Limiting Spectral Distribution of Stochastic Block Model
Yizhe Zhu
University of Washington
December 30, 2016, SJTU
Joint work with Ioana Dumitriu
SLIDE 2 Overview
1. Semicircle Law
2. Erdős–Rényi random graph
3. Stochastic Block Model
4. Proof of Semicircle Law for Erdős–Rényi
5. Spectral Distribution for SBM
SLIDE 3
Wigner Semicircle Law
Figure: Eugene Wigner, Nobel Prize in Physics (1963)
This law was first observed by Wigner (1955) for certain special classes of random matrices arising in quantum mechanical investigations.
SLIDE 4 Wigner Semicircle Law
Let $\{Z_{ij}\}_{1\le i<j}$ and $\{Y_i\}_{1\le i}$ be two independent families of i.i.d., zero-mean, real-valued random variables with $E[Z_{1,2}^2]=1$ and $\max\{E|Z_{1,2}|^k, E|Y_1|^k\}<\infty$ for all $k\ge1$. Define the Wigner matrix $X_N$ by
$$X_N(i,j)=X_N(j,i)=\begin{cases}Z_{ij}/\sqrt N, & i<j,\\ Y_i/\sqrt N, & i=j.\end{cases}$$
Let $\lambda_1^N\le\lambda_2^N\le\cdots\le\lambda_N^N$ denote the eigenvalues of $X_N$, and define the empirical distribution of the eigenvalues as the probability measure
$$F_N=\frac1N\sum_{i=1}^N\delta_{\lambda_i^N}.$$
SLIDE 5
Semicircle distribution density: $\sigma(x)=\frac{1}{2\pi}\sqrt{4-x^2}\,\mathbf 1_{|x|\le2}$.
Theorem (Wigner)
For a Wigner matrix, the empirical measure $F_N$ converges weakly in probability to the standard semicircle distribution:
$$\lim_{N\to\infty}P\big(|\langle F_N,f\rangle-\langle\sigma,f\rangle|>\epsilon\big)=0\quad\text{for all } f\in C_b(\mathbb R),\ \epsilon>0.$$
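The theorem can be checked numerically with a minimal sketch, assuming NumPy and taking the $Z_{ij}$ and $Y_i$ to be standard Gaussian (an illustrative choice that meets the moment conditions, not one made on the slides). The helper names `wigner_matrix`, `semicircle_cdf` and `empirical_cdf` are hypothetical. The sketch compares $F_N$ with the semicircle CDF on a grid; for $N=1000$ the gap should be small, though its exact value depends on the seed.

```python
import numpy as np

def wigner_matrix(N, rng):
    """Symmetric X_N with X(i,j) = Z_ij/sqrt(N) for i < j and X(i,i) = Y_i/sqrt(N).
    Assumption of this sketch: Z_ij and Y_i are standard Gaussian."""
    upper = np.triu(rng.standard_normal((N, N)), k=1)   # Z_ij for i < j
    Y = rng.standard_normal(N)                          # diagonal family Y_i
    return (upper + upper.T + np.diag(Y)) / np.sqrt(N)

def semicircle_cdf(x):
    """CDF of the standard semicircle law on [-2, 2] (0 to the left, 1 to the right)."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + (x * np.sqrt(4 - x**2) / 2 + 2 * np.arcsin(x / 2)) / (2 * np.pi)

def empirical_cdf(eigs, x):
    """F_N(x) = (1/N) #{i : lambda_i <= x}."""
    return np.searchsorted(np.sort(eigs), x, side="right") / len(eigs)

rng = np.random.default_rng(0)
eigs = np.linalg.eigvalsh(wigner_matrix(1000, rng))
grid = np.linspace(-2.5, 2.5, 11)
gap = np.max(np.abs(empirical_cdf(eigs, grid) - semicircle_cdf(grid)))
print(f"max |F_N - F_sc| on grid: {gap:.3f}")
```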
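SLIDE 6 Erdős–Rényi random graph
$G(n,p)$: $n$ vertices, and each pair $i\sim j$ is connected independently with probability $p$. Let $A_n$ be its adjacency matrix and $\sigma^2=p(1-p)$ the variance of an off-diagonal entry.
Theorem (Tran, Vu and Wang, 2011)
For $p=\omega(1/n)$, the empirical spectral distribution of the matrix $\frac{1}{\sqrt n\,\sigma}A_n$ converges in distribution to the semicircle distribution, which has density $\rho_{sc}(x)$ supported on $[-2,2]$:
$$\rho_{sc}(x):=\frac{1}{2\pi}\sqrt{4-x^2}.$$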
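A similar sketch for $G(n,p)$, assuming NumPy, with illustrative values $n=1000$, $p=0.1$ (so $np=100$) and a hypothetical helper `erdos_renyi_adjacency`. The uncentered matrix has one large outlier eigenvalue coming from $E[A_n]$, but a single eigenvalue moves the ESD by only $1/n$, so the bulk still matches the semicircle.

```python
import numpy as np

def erdos_renyi_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of G(n, p) with no self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T

def semicircle_cdf(x):
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + (x * np.sqrt(4 - x**2) / 2 + 2 * np.arcsin(x / 2)) / (2 * np.pi)

n, p = 1000, 0.1
rng = np.random.default_rng(1)
A = erdos_renyi_adjacency(n, p, rng)
sigma = np.sqrt(p * (1 - p))                  # variance of an entry is p(1 - p)
eigs = np.sort(np.linalg.eigvalsh(A / (np.sqrt(n) * sigma)))
grid = np.linspace(-2.5, 2.5, 11)
F_n = np.searchsorted(eigs, grid, side="right") / n
gap = np.max(np.abs(F_n - semicircle_cdf(grid)))
print(f"largest eigenvalue: {eigs[-1]:.1f}")  # the single outlier caused by E[A_n]
print(f"max |F_n - F_sc| on grid: {gap:.3f}")
```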
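SLIDE 7 How to generalize the Erdős–Rényi model?
Definition (random graph with given expected degrees, Chung–Lu–Vu model)
$G(w)$: for a sequence $w=(w_1,w_2,\dots,w_n)$, each pair of vertices $(i,j)$ is joined by an edge independently with probability $w_iw_j\rho$, where $\rho=\frac{1}{\sum_{i=1}^n w_i}$ (assuming $\max_i w_i^2<\sum_i w_i$, so that these are probabilities).
$G(n,p)$ can be viewed as the special case $w=(pn,pn,\dots,pn)$, since then $w_iw_j\rho=\frac{(pn)^2}{pn\cdot n}=p$.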
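A sketch of a $G(w)$ sampler, assuming NumPy; `chung_lu_adjacency` is a hypothetical name, and skipping self-loops is a simplification of this sketch. The usage lines check the reduction to $G(n,p)$ stated above.

```python
import numpy as np

def chung_lu_adjacency(w, rng):
    """Sample G(w): edge {i, j}, i < j, present independently with prob. w_i w_j rho.
    Self-loops are not sampled (a simplification of this sketch)."""
    w = np.asarray(w, dtype=float)
    rho = 1.0 / w.sum()
    P = np.outer(w, w) * rho
    if P.max() > 1:
        raise ValueError("need max_i w_i^2 <= sum_i w_i so that w_i w_j rho <= 1")
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(int)
    return upper + upper.T

n, p = 200, 0.3
w = np.full(n, p * n)              # G(n, p) as the special case w = (pn, ..., pn)
P = np.outer(w, w) / w.sum()
print(np.allclose(P, p))           # every edge probability equals p
A = chung_lu_adjacency(w, np.random.default_rng(0))
print(A.sum(axis=1).mean())        # average degree, close to p(n - 1)
```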
SLIDE 8
Stochastic Block Model
Consider a network with $n$ nodes and $d$ communities $\Omega_1,\dots,\Omega_d$ of sizes $n_1,\dots,n_d$, with $\sum_{i=1}^d n_i=n$. If two nodes belong to different communities, connect them independently with probability $p_0$. If two nodes are in the same community $\Omega_m$, connect them independently with probability $p_m$, $1\le m\le d$.
Statistical task: community detection / recovery.
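A minimal SBM sampler in the same spirit, assuming NumPy; `sbm_adjacency` and the example sizes and probabilities are illustrative, not from the slides.

```python
import numpy as np

def sbm_adjacency(sizes, p_in, p0, rng):
    """Sample an SBM: communities of the given sizes, within-community probability
    p_in[m] for community m, between-community probability p0, no self-loops."""
    labels = np.repeat(np.arange(len(sizes)), sizes)        # community of each node
    same = labels[:, None] == labels[None, :]
    # within a community use p_in of that community, otherwise p0
    P = np.where(same, np.asarray(p_in)[labels][:, None], p0)
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(int)
    return upper + upper.T, labels

rng = np.random.default_rng(0)
A, labels = sbm_adjacency([100, 150, 50], [0.5, 0.4, 0.6], 0.05, rng)
print(A[:100, :100].sum() / (100 * 99))     # edge density inside Omega_1, about 0.5
print(A[:100, 100:250].sum() / (100 * 150))  # density between Omega_1 and Omega_2, about 0.05
```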
SLIDE 9 Graphon
A graphon is a symmetric measurable function $W:[0,1]^2\to[0,1]$. Graphons are the limit objects for graph sequences in the dense case.
Generate a graph in the following way:
1. Each vertex $j$ of the graph is assigned an independent random value $u_j\sim U[0,1]$.
2. Edge $(i,j)$ is independently included in the graph with probability $W(u_i,u_j)$.
Erdős–Rényi: $W\equiv p$ for some constant $p\in[0,1]$.
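The two-step construction can be sketched as follows, assuming NumPy. `sample_w_random_graph`, `erdos_renyi_graphon` and `step_graphon` are hypothetical names. The step graphon, constant on products of intervals, is one way to see the stochastic block model as a special case, with interval lengths playing the role of community proportions.

```python
import numpy as np

def sample_w_random_graph(n, W, rng):
    """W-random graph: u_j ~ U[0,1] i.i.d., then edge (i, j) w.p. W(u_i, u_j)."""
    u = rng.random(n)
    P = W(u[:, None], u[None, :])                 # matrix of W(u_i, u_j)
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(int)
    return upper + upper.T, u

def erdos_renyi_graphon(p):
    """Constant graphon W = p."""
    return lambda x, y: np.full(np.broadcast(x, y).shape, p)

def step_graphon(cuts, B):
    """Step function: W = B[k][l] on I_k x I_l, where the interior cut points
    `cuts` split [0, 1] into intervals I_0, I_1, ... (an SBM in graphon form)."""
    B = np.asarray(B)
    def W(x, y):
        kx = np.searchsorted(cuts, x, side="right")   # interval index of x
        ky = np.searchsorted(cuts, y, side="right")   # interval index of y
        return B[kx, ky]
    return W

A1, _ = sample_w_random_graph(400, erdos_renyi_graphon(0.2), np.random.default_rng(1))
print(A1.sum() / (400 * 399))      # edge density, about 0.2
A2, u = sample_w_random_graph(400, step_graphon([0.5], [[0.6, 0.1], [0.1, 0.4]]),
                              np.random.default_rng(2))
print(A2.sum() / (400 * 399))      # about 0.25*0.6 + 0.5*0.1 + 0.25*0.4 = 0.3
```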
SLIDE 13 Measure Theory: constant function → step function → measurable function
Random Graph Theory: Erdős–Rényi model → stochastic block model → graphon
SLIDE 14 Proof of Semicircle Law for Erdős–Rényi random graph
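SLIDE 17 Note that $E[A_n]=pJ_n$, where $J_n$ is the all-ones matrix, so $M_n=\sigma^{-1}(A_n-pJ_n)$ is centered.
Lemma (Rank Inequality)
For $n\times n$ Hermitian matrices $A$ and $B$ with empirical spectral distributions $F^A$ and $F^B$,
$$\|F^A-F^B\|_\infty\le\frac{\operatorname{rank}(A-B)}{n}.$$
Since $\operatorname{rank}(pJ_n)=1$, the ESDs of $A_n/(\sqrt n\,\sigma)$ and $M_n/\sqrt n$ differ by at most $1/n$, so it is sufficient to show that the semicircle law holds for $M_n$. A standard way is the moment method. The $k$th moment of the empirical spectral distribution of a matrix $W_n$ is
$$\int x^k\,dF^{W_n}(x)=\frac1n\operatorname{Trace}(W_n^k),$$
and we work with its expectation $\frac1nE[\operatorname{Trace}(W_n^k)]$.
On a compact set, convergence in distribution is the same as convergence of moments, so the goal is to show that for every $k\ge1$
$$\frac1nE[\operatorname{Trace}(W_n^k)]\to\int_{-2}^{2}x^k\rho_{sc}(x)\,dx.$$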
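A quick numerical look at the moment method, assuming NumPy and illustrative values $n=1000$, $p=0.2$: build $M_n=\sigma^{-1}(A_n-pJ_n)$ as above and compare $\frac1n\operatorname{Trace}(W_n^k)$ for one sample, computed from eigenvalues, with the limiting moments. A single sample only approximates the expectation, and the odd moments carry small finite-$n$ corrections.

```python
import numpy as np
from math import comb

n, p = 1000, 0.2
rng = np.random.default_rng(2)
upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
A = upper + upper.T                              # adjacency matrix of G(n, p)
sigma = np.sqrt(p * (1 - p))
M = (A - p * np.ones((n, n))) / sigma            # M_n = sigma^{-1}(A_n - p J_n)
eigs = np.linalg.eigvalsh(M / np.sqrt(n))        # spectrum of W_n = M_n / sqrt(n)

# (1/n) Trace(W_n^k) equals the mean of lambda^k over the eigenvalues
moments = {k: float(np.mean(eigs**k)) for k in range(1, 7)}
for k, value in moments.items():
    m = k // 2
    target = comb(2 * m, m) // (m + 1) if k % 2 == 0 else 0
    print(k, round(value, 3), target)
```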
SLIDE 18
For $k=2m+1$: $\int_{-2}^{2}x^k\rho_{sc}(x)\,dx=0$.
For $k=2m$: $\int_{-2}^{2}x^k\rho_{sc}(x)\,dx=\frac{1}{m+1}\binom{2m}{m}$, the $m$th Catalan number.
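These values can be verified numerically with a short sketch (standard library only; the function names are hypothetical). Substituting $x=2\cos\theta$ turns the integral into $\frac{2}{\pi}\int_0^\pi(2\cos\theta)^k\sin^2\theta\,d\theta$, a trigonometric polynomial on which the midpoint rule is essentially exact.

```python
import math

def semicircle_moment(k, steps=2000):
    """Integral of x^k rho_sc(x) over [-2, 2], via x = 2 cos(t) and the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += (2 * math.cos(t)) ** k * math.sin(t) ** 2
    return 2 / math.pi * total * h

def catalan(m):
    return math.comb(2 * m, m) // (m + 1)

for k in range(9):
    exact = catalan(k // 2) if k % 2 == 0 else 0
    print(k, round(semicircle_moment(k), 10), exact)
```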
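SLIDE 21 Let $W_n=\frac{1}{\sqrt n}M_n$ and let $\eta_{ij}$ be the $(i,j)$ entry of $M_n$. We have the following expansion for $W_n^k$:
$$\frac1nE[\operatorname{Trace}(W_n^k)]=\frac{1}{n^{1+k/2}}E[\operatorname{Trace}(M_n^k)]=\frac{1}{n^{1+k/2}}\sum_{i_1,\dots,i_k}E[\eta_{i_1i_2}\eta_{i_2i_3}\cdots\eta_{i_ki_1}].$$
Each term (sequence of indices) corresponds to a closed walk of length $k$ on the complete graph $K_n$. Since the off-diagonal $\eta_{ij}$ are independent (up to symmetry) with mean zero, a term is nonzero only if each edge in the closed walk appears at least twice; we call such a walk a good walk.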
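To see where the Catalan numbers come from, here is a brute-force sketch (standard library only, feasible only for tiny $n$ and $k$; `count_tree_walks` is a hypothetical name). It counts closed walks of length $2m$ on $K_n$ that use exactly $m$ distinct edges, each twice, and visit $m+1$ distinct vertices, i.e. walks that trace a tree. These walks produce the leading term for even $k$, and their number should equal $\frac{1}{m+1}\binom{2m}{m}\,n(n-1)\cdots(n-m)$.

```python
from itertools import product
from math import comb, perm

def catalan(m):
    return comb(2 * m, m) // (m + 1)

def count_tree_walks(n, m):
    """Count index sequences (i_1, ..., i_2m) forming closed walks on K_n
    (consecutive vertices distinct, including i_2m -> i_1) that use exactly
    m distinct edges, each exactly twice, on m + 1 distinct vertices."""
    k = 2 * m
    count = 0
    for walk in product(range(n), repeat=k):
        steps = [(walk[t], walk[(t + 1) % k]) for t in range(k)]
        if any(a == b for a, b in steps):        # K_n has no self-loops
            continue
        mult = {}                                # multiplicity of each undirected edge
        for a, b in steps:
            edge = (min(a, b), max(a, b))
            mult[edge] = mult.get(edge, 0) + 1
        if len(mult) == m and all(c == 2 for c in mult.values()) and len(set(walk)) == m + 1:
            count += 1
    return count

for n, m in [(4, 1), (4, 2), (5, 3)]:
    # perm(n, m + 1) = n (n - 1) ... (n - m)
    print(n, m, count_tree_walks(n, m), catalan(m) * perm(n, m + 1))
```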
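SLIDE 24 Consider a good walk that uses $l$ different edges $e_1,\dots,e_l$ with multiplicities $m_1,\dots,m_l$. Since each $m_i\ge2$ and $m_1+\cdots+m_l=k$, we have $l\le m$ when $k=2m$ or $k=2m+1$. A bound for the number of good walks with $l$ different edges is $n^{l+1}\times l^k$.
When $k=2m+1$,
$$\frac1nE[\operatorname{Trace}(W_n^k)]=\frac{1}{n^{1+k/2}}\sum_{l=1}^{m}\ \sum_{\text{good walks with } l \text{ edges}}E\big[\eta_{e_1}^{m_1}\cdots\eta_{e_l}^{m_l}\big]=O\Big(\frac{1}{\sqrt{np}}\Big).$$
When $k=2m$, classify good walks into two types. The first type uses $l\le m-1$ different edges; the contribution of these terms is $O\big(\frac{1}{np}\big)$.
The second type uses exactly $l=m$ different edges, and each term has the form $E[\eta_{e_1}^2\cdots\eta_{e_m}^2]=1$. The number of good walks of the second type is
$$n^{m+1}\big(1+O(n^{-1})\big)\frac{1}{m+1}\binom{2m}{m}.$$
Then the conclusion follows: $\frac1nE[\operatorname{Trace}(W_n^{2m})]\to\frac{1}{m+1}\binom{2m}{m}$, and the odd moments tend to 0.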
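The two estimates above can be made explicit with a short worked bound. It assumes, beyond what the slide states, that the off-diagonal entries satisfy $|\eta_{ij}|\le\sigma^{-1}$ and $E\eta_{ij}^2=1$ (true here, since $|a_{ij}-p|\le1$ and $\sigma^2=p(1-p)$ is the entry variance), so that $E|\eta_{ij}|^q\le\sigma^{2-q}$ for $q\ge2$. Using independence of distinct edges and $m_1+\cdots+m_l=k$:

$$\Big|E\big[\eta_{e_1}^{m_1}\cdots\eta_{e_l}^{m_l}\big]\Big|\le\prod_{i=1}^{l}\sigma^{2-m_i}=\sigma^{2l-k},
\qquad
\frac{1}{n^{1+k/2}}\cdot n^{l+1}l^k\cdot\sigma^{2l-k}=l^k\,(n\sigma^2)^{\,l-k/2}.$$

For $k=2m+1$ and $l\le m$ the exponent $l-k/2$ is at most $-1/2$; for $k=2m$ and $l\le m-1$ it is at most $-1$. Summing over the finitely many values of $l$ gives $O\big((n\sigma^2)^{-1/2}\big)$ and $O\big((n\sigma^2)^{-1}\big)$, which are $O(1/\sqrt{np})$ and $O(1/(np))$ when $p$ stays bounded away from 1.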
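SLIDE 26 Spectral Distribution for SBM
We consider an $n\times n$ random matrix with rectangular blocks, which we can write as
$$A_n=\sum_{k,l=1}^{d}E_{kl}\otimes A^{(k,l)},$$
where $E_{kl}$ is the $d\times d$ matrix unit (so the sum places $A^{(k,l)}$ in the $(k,l)$ block) and $A^{(k,l)}$, $1\le k\le l\le d$, are independent $n_k\times n_l$ rectangular random matrices. We use $a^{(k,l)}_{rs}$ to denote the entries of the matrix $A^{(k,l)}$ and make the following assumptions (Assumptions 1):
1. $a^{(k,l)}_{rs}=a^{(l,k)}_{sr}$ for all $r=1,\dots,n_k$, $s=1,\dots,n_l$, $1\le k\le l\le d$, and $n_k/n\to\alpha_k\in[0,\infty)$, $1\le k\le d$.
2. $\{a^{(k,l)}_{rs}: 1\le r\le n_k,\ 1\le s\le n_l,\ k\le l\}$ are i.i.d. random variables with mean zero and variance $\sigma_{kl}^2$, $1\le k\le l\le d$.
3. Let $\sigma^2=\max_{k,l}\sigma_{kl}^2$; then $\lim_{n\to\infty}\sigma_{kl}^2/\sigma^2=s_{kl}$.
4. Let $M_n=A_n/\sigma=(c^{(k,l)}_{rs})$; then for every $\eta>0$,
$$\lim_{n\to\infty}\frac{1}{n^2}\sum_{k,l}\sum_{r,s}E\Big[|c^{(k,l)}_{rs}|^2\,I\big(|c^{(k,l)}_{rs}|\ge\eta\sqrt n\big)\Big]=0.$$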
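SLIDE 29 Theorem (Ding, 2015)
If $d$ is fixed, then under Assumptions 1 (1)–(4), with probability 1, the empirical spectral distribution $F_n$ of the random matrix
$$\frac{M_n}{\sqrt n}=\frac{A_n}{\sqrt n\,\sigma},\qquad M_n=(c^{(k,l)}_{rs}),$$
converges to a probability distribution $F$.
Lemma (Ding, 2015)
To prove the theorem above, it suffices to verify that it holds under the following assumptions (Assumptions 2):
1. $c^{(k,k)}_{rr}=0$; $\{a^{(k,l)}_{rs}: 1\le r\le n_k,\ 1\le s\le n_l,\ k\le l\}$ are i.i.d. random variables with mean zero and variance $\sigma_{kl}^2$, $1\le k\le l\le d$; and $\lim_{n\to\infty}\sigma_{kl}^2=s_{kl}\le1$.
2. $|c^{(k,l)}_{rs}|\le\eta_n\sqrt n$ for some positive sequence $\eta_n\to0$.
For the SBM, $p_i=\omega(1/n)$ satisfies Assumption 2 (2), but the adjacency matrix is not centered. The rank inequality helps: $\|F^A-F^B\|_\infty\le\frac{\operatorname{rank}(A-B)}{n}$. The mean matrix is constant on each block (the analogue of $pJ_n$ for $G(n,p)$), so it has rank at most $d$, and centering changes the ESD by at most $d/n\to0$.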
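The rank inequality can be checked on a sampled SBM with a sketch assuming NumPy; the block sizes, probabilities and the helper `esd_sup_distance` are illustrative. Here `P` is the block-constant matrix of connection probabilities (rank $d$) and `A - P` is centered off the diagonal, so the distance between the two ESDs should not exceed $d/n$.

```python
import numpy as np

def esd_sup_distance(A, B):
    """sup_x |F^A(x) - F^B(x)| for symmetric matrices A and B of the same size."""
    ea = np.sort(np.linalg.eigvalsh(A))
    eb = np.sort(np.linalg.eigvalsh(B))
    grid = np.concatenate([ea, eb])         # both ESDs are step functions jumping only here
    Fa = np.searchsorted(ea, grid, side="right") / len(ea)
    Fb = np.searchsorted(eb, grid, side="right") / len(eb)
    return np.max(np.abs(Fa - Fb))

sizes, p_in, p0 = [150, 100, 50], [0.5, 0.4, 0.3], 0.1
d, n = len(sizes), sum(sizes)
rng = np.random.default_rng(3)
labels = np.repeat(np.arange(d), sizes)
P = np.where(labels[:, None] == labels[None, :], np.array(p_in)[labels][:, None], p0)
upper = np.triu(rng.random((n, n)) < P, k=1).astype(float)
A = upper + upper.T                          # SBM adjacency matrix, not centered
dist = esd_sup_distance(A, A - P)            # rank(A - (A - P)) = rank(P) = d
print(np.linalg.matrix_rank(P), dist, d / n)
```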
SLIDE 32
What is $F$? Is there a formula?
SLIDE 33 For a probability distribution $G$, its Stieltjes transform $s_G(z)$ is defined as follows:
$$s_G(z)=\int\frac{1}{x-z}\,dG(x),\qquad z\in\mathbb C^+.$$
Theorem (Far, Oraby, Bryc and Speicher, 2006)
If all entries are Gaussian, $s_F(z)$ is determined by the following equations:
$$s(z)=\sum_{k=1}^{d}\alpha_kg_k(z),\qquad -zg_k(z)=1+\sum_{l=1}^{d}\alpha_l\sigma_{kl}^2g_l(z)g_k(z),\quad 1\le k\le d.$$
Tool: free probability (algebraic). Gaussian random variables satisfy Assumption (2).
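A hedged numerical sketch of these equations, assuming NumPy: we rewrite them as the fixed point $g_k=-1/\big(z+\sum_l\alpha_l\sigma_{kl}^2g_l\big)$ and solve by damped iteration, which is our choice of solver and not necessarily the method of Far et al. The density is then read off approximately as $\frac1\pi\operatorname{Im}s(x+i\varepsilon)$ for a small $\varepsilon$. The function names and parameters (`s2` for the matrix $(\sigma_{kl}^2)$, `eps`, `damping`) are hypothetical. With $d=1$ the system reduces to $s^2+zs+1=0$, the semicircle case, which the usage lines check.

```python
import math
import numpy as np

def block_stieltjes(z, alpha, s2, iters=10_000, damping=0.5, tol=1e-13):
    """Approximate s(z) = sum_k alpha_k g_k(z), where the g_k solve
    -z g_k = 1 + sum_l alpha_l s2[k][l] g_l g_k, rewritten as the fixed point
    g_k = -1 / (z + sum_l alpha_l s2[k][l] g_l) and iterated with damping."""
    alpha = np.asarray(alpha, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    g = np.full(alpha.shape, 1j, dtype=complex)    # start in the upper half-plane
    for _ in range(iters):
        g_new = damping * g - (1 - damping) / (z + s2 @ (alpha * g))
        if np.max(np.abs(g_new - g)) < tol:
            return complex(np.dot(alpha, g_new))
        g = g_new
    return complex(np.dot(alpha, g))

def density(x, alpha, s2, eps=1e-3):
    """Approximate density of F at x via (1/pi) Im s(x + i eps)."""
    return block_stieltjes(x + 1j * eps, alpha, s2).imag / math.pi

# d = 1, alpha = (1), sigma^2 = 1: the semicircle, rho_sc(0) = 1/pi
print(density(0.0, [1.0], [[1.0]]), 1 / math.pi)
# two equal blocks with equal variances: again the semicircle, rho_sc(1) = sqrt(3)/(2 pi)
print(density(1.0, [0.5, 0.5], [[1.0, 1.0], [1.0, 1.0]]), math.sqrt(3) / (2 * math.pi))
```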
SLIDE 36 d → ∞?
Graphon
Theorem
Under Assumptions 2, if $d\to\infty$, $\alpha_k\to0$ for all $1\le k\le d$, and all off-diagonal blocks have variance $\sigma_0^2$, then the ESD of $M_n/\sqrt n$ converges to the semicircle law with parameter $s_0=\lim_{n\to\infty}\sigma_0^2/\sigma^2$.
Semicircle law with parameter $\sigma$:
$$\rho_\sigma(x)=\frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2-x^2}\,\mathbf 1_{|x|\le2\sigma}.$$
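A small check (standard library only, hypothetical function names) that $\rho_\sigma$ is a probability density with support $|x|\le2\sigma$, variance $\sigma^2$ and fourth moment $2\sigma^4$. The substitution $x=2\sigma\cos t$ makes the integrand smooth, so the midpoint rule is very accurate.

```python
import math

def rho(x, sigma):
    """Semicircle density with parameter sigma, supported on [-2 sigma, 2 sigma]."""
    if abs(x) >= 2 * sigma:
        return 0.0
    return math.sqrt(4 * sigma**2 - x**2) / (2 * math.pi * sigma**2)

def moment(k, sigma, steps=4000):
    """Integral of x^k rho_sigma(x) dx, via x = 2 sigma cos(t) and the midpoint rule in t."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        x = 2 * sigma * math.cos(t)
        total += x**k * rho(x, sigma) * 2 * sigma * math.sin(t)   # |dx| = 2 sigma sin(t) dt
    return total * h

for s in (0.5, 1.5):
    print(s, moment(0, s), moment(2, s), moment(4, s))   # 1, s^2, 2 s^4
```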
SLIDE 37
Theorem
Under Assumptions 2, if $d\to\infty$ as $n\to\infty$ and $\sum_{k=1}^\infty\alpha_k=1$, the empirical spectral distribution of $A_n/(\sigma\sqrt n)$ converges with probability 1.
Corollary
The S-transform of the ESD satisfies the following equations:
$$S(z)=\int_0^1a(z,x)\,dx,\qquad a(z,x)^{-1}=z-\int_0^1\sigma^2(y,x)\,a(z,y)\,dy.$$
Example: $\alpha_i=\frac{1}{2^i}$.
SLIDE 39
$\sum_{i=1}^\infty\alpha_i=c<1$
Given $\sum_{i=1}^\infty\alpha_i=c$ and $c>\alpha_1\ge\alpha_2\ge\cdots>0$, let $n_i=\lfloor n\alpha_i\rfloor$. Then we can generate large blocks of size $n_i\times n_i$ with parameter $p_i$ until $n_i=0$, and we have a remaining block with noise $p_0$. Let $k(n)=\sup\{k:\alpha_k\ge1/n\}$. We generate the last block, of size $n_{k(n)+1}=n-\sum_{i=1}^{k(n)}n_i$, with parameter $p_0$.
Lemma
Assume $\sum_{i=1}^\infty\alpha_i=c$ and $c>\alpha_1\ge\alpha_2\ge\cdots>0$. Let $k(n)=\sup\{k:\alpha_k\ge1/n\}$; then $k(n)/n\to0$.
Under Assumptions 2, the limiting distribution exists.
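A sketch of this construction (standard library only; `block_sizes` is a hypothetical helper, and $\alpha_i=c\,2^{-i}$ with $c=0.6$ is only an example satisfying $c>\alpha_1$). Note that $\alpha_k\ge1/n$ is the same as $\lfloor n\alpha_k\rfloor\ge1$, so $k(n)$ counts the nonempty large blocks; for this geometric choice $k(n)$ grows like $\log_2(cn)$, consistent with the lemma.

```python
import math

def block_sizes(n, alpha):
    """Hypothetical helper: sizes n_i = floor(n * alpha_i) for i = 1, ..., k(n),
    where k(n) = sup{k : alpha_k >= 1/n}, plus the size of the remaining p0 block.
    `alpha` maps i to alpha_i and is assumed positive and nonincreasing."""
    sizes = []
    i = 1
    while n * alpha(i) >= 1:          # alpha_i >= 1/n, i.e. floor(n * alpha_i) >= 1
        sizes.append(math.floor(n * alpha(i)))
        i += 1
    k_n = len(sizes)                  # monotonicity makes this the supremum
    return k_n, sizes, n - sum(sizes)

def alpha(i, c=0.6):
    """Example weights alpha_i = c / 2^i: they sum to c, and alpha_1 = c/2 < c."""
    return c / 2**i

for n in (10**3, 10**6, 10**9):
    k_n, sizes, rest = block_sizes(n, alpha)
    print(n, k_n, k_n / n, rest / n)   # k(n)/n -> 0; the p0 block keeps about 1 - c of n
```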
SLIDE 40 Rate of Convergence
Local semicircle law
Theorem (Tao, Vu, 2011)
For any $\epsilon,\delta>0$, any random Hermitian matrix $M_n=(\xi_{ij})_{1\le i,j\le n}$ whose upper-triangular entries are independent with mean zero and variance 1 and satisfy $|\xi_{ij}|\le K$ almost surely for all $i,j$ and some $1\le K\le n^{1/2-\epsilon}$, and any interval $I$ in $[-2+\epsilon,2-\epsilon]$ of width $|I|\ge\frac{K^2\log^{20}n}{n}$, the number $N_I$ of eigenvalues of $W_n=\frac{1}{\sqrt n}M_n$ in $I$ obeys the concentration estimate
$$\Big|N_I-n\int_I\rho_{sc}(x)\,dx\Big|\le\delta n|I|$$
with overwhelming probability.
SLIDE 41 References
Anderson G W, Guionnet A, Zeitouni O. (2010). An introduction to random matrices. Cambridge University Press.
Rácz M Z, Bubeck S. (2016). Basic models and questions in statistical network analysis. arXiv:1609.03511.
Chung F, Lu L, Vu V. (2003). Spectra of random graphs with given expected degrees. Proceedings of the National Academy of Sciences, 100(11): 6313–6318.
Tran L V, Vu V H, Wang K. (2013). Sparse random graphs: Eigenvalues and eigenvectors. Random Structures & Algorithms, 42(1): 110–134.
Ding X. (2014). Spectral analysis of large block random matrices with rectangular blocks. Lithuanian Mathematical Journal, 54(2): 115–126.
SLIDE 42
References
Avrachenkov K, Cottatellucci L, Kadavankandy A. (2015). Spectral properties of random matrices for stochastic block model. Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), 2015 13th International Symposium on. IEEE: 537–544.
Far R R, Oraby T, Bryc W, et al. (2006). Spectra of large block matrices. arXiv preprint cs/0610045.
Tao T, Vu V. (2011). Random matrices: universality of local eigenvalue statistics. Acta Mathematica, 206(1): 127–204.