[PPT] - Spectral properties of Google matrix Lecture 3 Klaus Frahm PowerPoint Presentation

SLIDE 1

1
0.5

0.5 1

1
0.5

0.5 1

λ

Wikipedia

1
0.5

0.5 1

1
0.5

0.5 1

Physical Review

λ

Spectral properties of Google matrix

Lecture 3 Klaus Frahm

Quantware MIPS Center Universit´ e Paul Sabatier Laboratoire de Physique Th´ eorique, UMR 5152, IRSAMC

A. D. Chepelianskii, Y. H. Eom, L. Ermann, B. Georgeot, D. L. Shepelyansky

Network analysis and applications Luchon, June 21 - July 5, 2014

SLIDE 2

Random Perron-Frobenius matrices . . . . . . . . . . . . . 3 Poisson statistics of PageRank . . . . . . . . . . . . . . . . 6 Physical Review network . . . . . . . . . . . . . . . . . . . 8 Triangular approximation . . . . . . . . . . . . . . . . . . . 11 Full Physical Review network . . . . . . . . . . . . . . . . . 14 Fractal Weyl law . . . . . . . . . . . . . . . . . . . . . . . 21 ImpactRank for influence propagation . . . . . . . . . . . . 22 Integer network . . . . . . . . . . . . . . . . . . . . . . . . 23 References . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2

SLIDE 3

Random Perron-Frobenius matrices

Construct random matrix ensembles Gij such that:

Gij ≥ 0
Gij are (approximately) non-correlated and distributed with the

same distribution P(Gij) (of finite variance σ2).

j Gij = 1

⇒ Gij = 1/N

⇒ average of G has one eigenvalue λ1 = 1 (⇒ “flat” PageRank)

and other eigenvalues λj = 0 (for j = 1).

degenerate perturbation theory for the fluctuations ⇒ circular

eigenvalue density with R =

√ Nσ and one unit eigenvalue.

3

SLIDE 4

Different variants of the model:

uniform full: P(G) = N/2 for 0 ≤ G ≤ 2/N

⇒ R = 1/ √ 3N

uniform sparse with Q non-zero elements per column:

P(G) = Q/2 for 0 ≤ G ≤ 2/Q with probability Q/N

and G = 0 with probability 1 − Q/N

⇒ R = 2/√3Q

constant sparse with Q non-zero elements per column:

G = 1/Q with probability Q/N

and G = 0 with probability 1 − Q/N

⇒ R = 1/√Q

powerlaw with p(G) = D(1 + aG)−b for 0 ≤ G ≤ 1 and

2 < b < 3: ⇒ R = C(b) N 1−b/2 , C(b) = (b − 2)(b−1)/2

b−1 3−b

4

SLIDE 5

Numerical verification: uniform full:

N = 400

uniform sparse:

N = 400, Q = 20

power law:

b = 2.5

1
0.5

0.5 1

1
0.5

0.5 1

λ

1
0.5

0.5 1

1
0.5

0.5 1

λ

0.02

0.02

0.02

0.02

λ

1
0.5

0.5 1

1
0.5

0.5 1

λ

1
0.5

0.5 1

1
0.5

0.5 1

λ

0.1 0.2 100 1000 R N R = 0.67 N-0.22

triangular random and average constant sparse:

N = 400, Q = 20

power law case:

Rth ∼ N −0.25

5

SLIDE 6

Poisson statistics of PageRank

0.2 0.4 0.6 0.8 1 1 2 3 4 p(s) s

Twitter

riginal data

pPois(s) pWig(s) 0.2 0.4 0.6 0.8 1 1 2 3 4 p(s) s

Wikipedia

riginal data

pPois(s) pWig(s)

Identify PageRank values to “energy-levels”:

P(i) = exp(−Ei/T)/Z

with Z =

i exp(−Ei/T) and an effective temperature T (can be

choosen: T = 1). 6

SLIDE 7

7.5 8 8.5 9 9.5 10 0.8 0.85 0.9 Ei α 6.5 7 7.5 0.8 0.85 0.9 Ei α 9.5 9.6 9.7 9.8 0.8 0.85 0.9 Ei α 7.2 7.3 7.4 7.5 0.8 0.85 0.9 Ei α

Twitter Wikipedia

Parameter dependance of Ei = − ln(Pi) on the damping factor α. 7

SLIDE 8

Physical Review network

N = 463347 nodes and Nℓ = 4691015 links.

Coarse-grained matrix structure (500 × 500 cells): left: time ordered right: journal and then time ordered “11” Journals of Physical Review: (Phys. Rev. Series I), Phys. Rev., Phys. Rev. Lett., (Rev. Mod. Phys.), Phys. Rev. A, B, C, D, E, (Phys. Rev. STAB and

Phys. Rev. STPER).

8

SLIDE 9

⇒ nearly triangular matrix structure of adjacency matrix: most

citations links t → t′ are for t > t′ (“past citations”) but there is small number (12126 = 2.6 × 10−3Nℓ) of links t → t′ with t ≤ t′ corresponding to future citations. Spectrum by “double-precision” Arnoldi method with nA = 8000:

1
0.5

0.5 1

1
0.5

0.5 1

λ

1
0.5

0.5 1

1
0.5

0.5 1

λ Numerical problem: eigenvalues with |λ| < 0.3 − 0.4 are not reliable! Reason: large Jordan subspaces associated to the eigenvalue λ = 0. 9

SLIDE 10

“very bad” Jordan perturbation theory: Consider a “perturbed” Jordan block of size D:

      0 1 · · · 0 0 0 0 · · · 0 0

. . . . . . ... . . . . . .

0 0 · · · 0 1 ε 0 · · · 0 0      

characteristic polynomial: λD − (−1)Dε

ε = 0 ⇒ λ = 0 ε = 0 ⇒ λj = −ε1/D exp(2πij/D)

for D ≈ 102 and ε = 10−16

⇒

“Jordan-cloud” of artifical eigenvalues due to rounding errors in the region |λ| < 0.3 − 0.4. 10

SLIDE 11

Triangular approximation

Remove the small number of links due to “future citations”. Semi-analytical diagonalization is possible:

S = S0 + e d T/N

where en = 1 for all nodes n, dn = 1 for dangling nodes n and

dn = 0 otherwise. S0 is the pure link matrix which is nil-potent: Sl

0 = 0

with l = 352. Let ψ be an eigenvector of S with eigenvalue λ and C = d Tψ.

If C = 0 ⇒ ψ eigenvector of S0 ⇒ λ = 0 since S0 nil-potent.

These eigenvectors belong to large Jordan blocks and are responsible for the numerical problems. Note: Similar situation as in network of integer numbers where l = [log2(N)] and numerical instability for |λ| < 0.01.

11

SLIDE 12

If C = 0 ⇒ λ = 0 since the equation S0ψ = −C e/N does not

have a solution ⇒ λ1 − S0 invertible.

⇒ ψ = C (λ1 − S0)−1 e/N = C λ

l−1

j=0

S0 λ j e/N . From λl = (d Tψ/C)λl ⇒ Pr(λ) = 0

with the reduced polynomial of degree l = 352 :

Pr(λ) = λl −

l−1

j=0

λl−1−j cj = 0 , cj = d T Sj

0 e/N .

⇒ at most l = 352 eigenvalues λ = 0 which can be numerically

determined as the zeros of Pr(λ). However: still numerical problems:

cl−1 ≈ 3.6 × 10−352
alternate sign problem with a strong loss of significance.
big sensitivity of eigenvalues on cj

12

SLIDE 13

Solution:

Using the multi precision library GMP with 256 binary digits the zeros of Pr(λ) can be determined with accuracy ∼

10−18.

Furthermore the Arnoldi method can also be implemented with higher precision.

red crosses: zeros of Pr(λ) from 256 binary digits calculation blue squares: eigenvalues from Arnoldi method with 52, 256, 512, 1280 binary digits. In the last case: ⇒ break off at nA = 352 with vanishing coupling element.

0.5

0.5

0.5

0.5 1

λ

0.4
0.2

0.2 0.4

0.4
0.2

0.2 0.4

0.5

0.5

0.5

0.5 1

λ

0.5

0.5

0.5

0.5 1

λ

0.5

0.5

0.5

0.5 1

λ

0.2
0.1

0.1 0.2

0.2
0.1

0.1 0.2

0.2
0.1

0.1 0.2

0.2
0.1

0.1 0.2

0.2
0.1

0.1 0.2

0.2
0.1

0.1 0.2

13

SLIDE 14

Full Physical Review network

High precision Arnoldi method for full Physical Review network (including the “future citations”) for 52, 256, 512, 768 binary digits and

nA = 2000:

0.5

0.5

0.5

0.5 1

λ

0.2
0.1

0.1 0.2

0.2
0.1

0.1 0.2

0.4
0.2

0.2 0.4

0.4
0.2

0.2 0.4

0.1
0.05

0.05 0.1

0.1
0.05

0.05 0.1

14

SLIDE 15

Degeneracies

0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 100 200 300 |λj| j 52 binary digits nA=1000 nA=2000 nA=4000 nA=8000 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 100 200 300 |λj| j nA=2000 768 binary digits 512 binary digits 256 binary digits 52 binary digits

High precision in Arnoldi method is “bad” to count the degeneracy of certain degenerate eigenvalues. In theory the Arnoldi method cannot find several eigenvectors for degenerate eigenvalues, a shortcoming which is (partly) “repaired” by rounding errors.

Q: How are highly degenerate core space eigenvalues possible ? 15

SLIDE 16

Semi-analytical argument for the full PR network:

S = S0 + e d T/N

There are two groups of eigenvectors ψ with: Sψ = λψ

1. Those with d Tψ = 0

⇒ ψ is also an eigenvector of S0.

Generically an arbitrary eigenvector of S0 is not an eigenvector of

S unless the eigenvalue is degenerate with degeneracy m > 1.

Using linear combinations of different eigenvectors for the same eigenvalue one can construct m − 1 eigenvectors ψ respecting

d Tψ = 0 which are therefore eigenvectors of S.

Pratically: determine degenerate subspace eigenvalues of S0 (and also of ST

0 ) which are of the form: λ = ±1/√n with

n = 1, 2, 3, . . . due to 2 × 2-blocks:

1/n1

1/n2

⇒

λ = ± 1 √n1 n2 .

16

SLIDE 17

2. Those with d Tψ = 0

⇒ R(λ) = 0 with the rational function: R(λ) = 1 − dT 1 λ 1 − S0 e/N = 1 −

j,q

Cjq (λ − ρj)q

Here Cjq and ρj are unknown, except for

ρ1 = 2 Re [(9 + i √ 119)1/3]/(135)1/3 ≈ 0.9024 and ρ2,3 = ±1/ √ 2 ≈ ±0.7071.

Idea: Expand the geometric matrix series ⇒

R(λ) = 1 −

∞

j=0

cjλ−1−j , cj = d T Sj

0 e/N

which converges for |λ| > ρ1 ≈ 0.9024 since cj ∼ ρj

1 for j → ∞.

Problem: How to determine the zeros of R(λ) with |λ| < ρ1 ? 17

SLIDE 18

Analytic continuation by rational interpolation: Use the series to evaluate R(z) at nS support points

zj = exp(2πij/nS) with a given precision of p binary digits and

determine the rational function RI(z) which interpolates R(z) at these support points. Two cases:

nS = 2nR + 1 ⇒ RI(z) = PnR(z) QnR(z) nS = 2nR + 2 ⇒ RI(z) = PnR(z) QnR+1(z)

The nR zeros of PnR(z) are approximations of the eigenvalues of

S (of the 2nd group).

For a given precision, e. g. p = 1024 binary digits one can obtain a certain number of reliable eigenvalues, e. g. nR = 300. The method can be pushed up to p = 16384 and nR = 2500 which is better than the high precision Arnoldi method with nA = 2000. 18

SLIDE 19

Examples:

Some “artificial zeros” for nR = 340 and p = 1024 (left top and middle panels) where both variants of the method differ. For nR = 300 and p = 1024 most zeros coincide with HP Arnoldi method (right top and middle panels) and both variants of the method coincide. Lower panels: comparison for nR =

2000, p = 12288 (left) or for nR = 2500, p = 16384 with HP Arnoldi

method.

0.4
0.2

0.2 0.4

0.4
0.2

0.2 0.4

0.4
0.2

0.2 0.4

0.4
0.2

0.2 0.4

1
0.5

0.5 1

1
0.5

0.5 1

λ

1
0.5

0.5 1

1
0.5

0.5 1

λ

0.1

0.1

0.1

0.1

0.1

0.1

0.1

0.1

19

SLIDE 20

Accurate eigenvalue spectrum for the full Physical Review network by the rational interpolation method (left) and the HP Arnoldi method (right):

1
0.5

0.5 1

1
0.5

0.5 1

λ

1
0.5

0.5 1

1
0.5

0.5 1

λ

0.4
0.2

0.2 0.4

0.4
0.2

0.2 0.4

0.4
0.2

0.2 0.4

0.4
0.2

0.2 0.4

20

SLIDE 21

Fractal Weyl law

103 104 105 106 1920 1940 1960 1980 2000 Nt t t0 = 1791 τ = 11.4 Nt 2(t-t0)/τ 100 101 102 103 103 104 105 106 Nλ Nt a = 0.32 b = 0.51 λc = 0.50 Nλ = a (Nt)b nA = 4000 nA = 2000 100 101 102 103 103 104 105 106 Nλ Nt a = 0.24 b = 0.47 λc = 0.65 Nλ = a (Nt)b nA = 4000 nA = 2000 0.5 0.6 0.7 0.8 0.2 0.4 0.6 0.8 1 b λc

(b) (a) (c) (d)

Nλ = number of complex eigenvalues with λc ≤ |λ| ≤ 1. Nt = reduced network size of Physical Review at time t. Nλ = aN b

t

21

SLIDE 22

ImpactRank for influence propagation

10-8 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 100 101 102 103 104 105 106

P K

(a)

BCS Anderson Napoleon 10-8 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 100 101 102 103 104 105 106

P* K*

(b)

BCS Anderson Napoleon

vf = 1 − γ 1 − γG v0 , v∗

f =

1 − γ 1 − γG∗ v0 vf = “PageRank” of ˜ G with: ˜ G = γ G + (1 − γ) v0 eT

22

SLIDE 23

Integer network

Consider the integers n ∈ {1, . . . , N} and construct an adjacency matrix by Amn = k where k is the largest integer such that mk is a divisor of n if 1 < m < n and Amn = 0 if m = 1 or m = n (note

Amn = k = 0 if m is not a divisor of n). Construct S and G in the

usual way from A.

10-6 10-4 10-2 100 102 104 106 P K ~K −1 103 104 105 106 107

23

SLIDE 24

PageRank

10-8 10-6 10-4 10-2 100 102 104 106 108 P K ~K −1 107 108 109 107, PM 10-8 10-6 10-4 10-2 100 102 104 106 108 P K ~K −1 107 108 109 10-6 10-4 10-2 100 102 104 106 P n ~n −1 107 106 10-2 100 100 102 104 106 P n n 107 106

24

SLIDE 25

Dependence of n on K-index

3x106 2x106 106 1 106 5x105 1 n K 5x105 4x105 3x105 2x105 105 1 105 5x104 1 n K 5x105 4x105 3x105 2x105 105 1 105 5x104 1 n K 1.2x105 1.1x105 105 2.6x104 2.4x104 2.2x104 n K

red: N = 107 blue: N = 106 “New order” of integers: K = 1, 2, . . . , 32 ⇒ n = 2, 3, 5, 7, 4, 11,

13, 17, 6, 19, 9, 23, 29, 8, 31, 10, 37, 41, 43, 14, 47, 15, 53, 59, 61, 25, 67, 12, 71, 73, 22, 21.

25

SLIDE 26

Semi-analytical determination of spectrum, PageRank and eigenvectors

Matrix structure:

S = S0 + v d T

where v = e/N, dj = 1 for dangling nodes (primes and 1) and

dj = 0 otherwise. S0 is the pure link matrix which is nil-potent: Sl

0 = 0

with l = [log2(N)] ≪ N

⇒

same theory as for the Phys.-Rev. Network. 26

SLIDE 27

Spectrum I

1
0.5

0.5 1

1
0.5

0.5 1

λ

106

0.04
0.02

0.02 0.04

0.04
0.02

0.02 0.04

λ

106

1
0.5

0.5 1

1
0.5

0.5 1

λ

107

0.04
0.02

0.02 0.04

0.04
0.02

0.02 0.04

λ

107

blue dots: semi-analytical eigenvalues as zeros from Pr(λ) (or eigenvalues of ¯

S).

red crosses: Arnoldi method with random initial vector and nA = 1000. light blue boxes: Arnoldi method with constant initial vector v = e/N and nA = 1000.

27

SLIDE 28

Spectrum II

5 10 15 5 10 15 20 25 30 γj j 109 108 107 106 105 104 103 1 1.5 2 0.05 0.1 0.15 γ1 1/ln(N) γ1=1.020+7.15/ln(N)

γj = −2 ln |λj|

Large N limit of γ1 with the scaling parameter: 1/ ln(N). Note:

c0 = d Tv = 1 N

N

j=1

dj = 1 + π(N) N ≈ 1 ln(N)

where π(N) is the number of primes below N. 28

SLIDE 29

References

1. K. M. Frahm, A. D. Chepelianskii and D. L. Shepelyansky,

PageRank of integers, Phys. A: Math. Theor. 45, 405101 (2012).

2. K. M. Frahm, and D. L. Shepelyansky, Poisson statistics of

PageRank probabilities of Twitter and Wikipedia networks,

Eur. Phys. J. B, 87, 93 (2014).
3. K. M. Frahm, Y. H. Eom, and D. L. Shepelyansky, Google matrix
f the citation network of Physical Review, Phys. Rev. E 89,

Spectral properties of Google matrix

Lecture 3 Klaus Frahm

Contents

Random Perron-Frobenius matrices

Construct random matrix ensembles Gij such that:

same distribution P(Gij) (of finite variance σ2).

⇒ Gij = 1/N

and other eigenvalues λj = 0 (for j = 1).

eigenvalue density with R =

√ Nσ and one unit eigenvalue.

3

Different variants of the model:

⇒ R = 1/ √ 3N

P(G) = Q/2 for 0 ≤ G ≤ 2/Q with probability Q/N

and G = 0 with probability 1 − Q/N

⇒ R = 2/√3Q

G = 1/Q with probability Q/N

and G = 0 with probability 1 − Q/N

⇒ R = 1/√Q

2 < b < 3: ⇒ R = C(b) N 1−b/2 , C(b) = (b − 2)(b−1)/2

4

Numerical verification: uniform full:

N = 400

uniform sparse:

N = 400, Q = 20

power law:

b = 2.5

triangular random and average constant sparse:

N = 400, Q = 20

power law case:

Rth ∼ N −0.25

5

Poisson statistics of PageRank

Identify PageRank values to “energy-levels”:

P(i) = exp(−Ei/T)/Z

with Z =

choosen: T = 1). 6

Twitter Wikipedia

Parameter dependance of Ei = − ln(Pi) on the damping factor α. 7

Physical Review network

N = 463347 nodes and Nℓ = 4691015 links.

8

⇒ nearly triangular matrix structure of adjacency matrix: most

citations links t → t′ are for t > t′ (“past citations”) but there is small number (12126 = 2.6 × 10−3Nℓ) of links t → t′ with t ≤ t′ corresponding to future citations. Spectrum by “double-precision” Arnoldi method with nA = 8000:

λ

λ Numerical problem: eigenvalues with |λ| < 0.3 − 0.4 are not reliable! Reason: large Jordan subspaces associated to the eigenvalue λ = 0. 9

“very bad” Jordan perturbation theory: Consider a “perturbed” Jordan block of size D:

      0 1 · · · 0 0 0 0 · · · 0 0

. . . . . . ... . . . . . .

0 0 · · · 0 1 ε 0 · · · 0 0      

characteristic polynomial: λD − (−1)Dε

ε = 0 ⇒ λ = 0 ε = 0 ⇒ λj = −ε1/D exp(2πij/D)

for D ≈ 102 and ε = 10−16

⇒

“Jordan-cloud” of artifical eigenvalues due to rounding errors in the region |λ| < 0.3 − 0.4. 10

Triangular approximation

Remove the small number of links due to “future citations”. Semi-analytical diagonalization is possible:

S = S0 + e d T/N

where en = 1 for all nodes n, dn = 1 for dangling nodes n and

dn = 0 otherwise. S0 is the pure link matrix which is nil-potent: Sl

with l = 352. Let ψ be an eigenvector of S with eigenvalue λ and C = d Tψ.

11

have a solution ⇒ λ1 − S0 invertible.

⇒ ψ = C (λ1 − S0)−1 e/N = C λ

S0 λ j e/N . From λl = (d Tψ/C)λl ⇒ Pr(λ) = 0

with the reduced polynomial of degree l = 352 :

Pr(λ) = λl −

λl−1−j cj = 0 , cj = d T Sj

⇒ at most l = 352 eigenvalues λ = 0 which can be numerically

determined as the zeros of Pr(λ). However: still numerical problems:

12

Solution:

Using the multi precision library GMP with 256 binary digits the zeros of Pr(λ) can be determined with accuracy ∼

10−18.

Furthermore the Arnoldi method can also be implemented with higher precision.

13

Full Physical Review network

High precision Arnoldi method for full Physical Review network (including the “future citations”) for 52, 256, 512, 768 binary digits and

nA = 2000:

14