Google matrix analysis of directed networks Lecture 3 Klaus Frahm - - PowerPoint PPT Presentation

google matrix analysis of directed networks
SMART_READER_LITE
LIVE PREVIEW

Google matrix analysis of directed networks Lecture 3 Klaus Frahm - - PowerPoint PPT Presentation

Google matrix analysis of directed networks Lecture 3 Klaus Frahm Quantware MIPS Center Universit e Paul Sabatier Laboratoire de Physique Th eorique, UMR 5152, IRSAMC A. D. Chepelianskii, Y. H. Eom, L. Ermann, B. Georgeot, D. L.


slide-1
SLIDE 1

Google matrix analysis of directed networks

Lecture 3 Klaus Frahm

Quantware MIPS Center Universit´ e Paul Sabatier Laboratoire de Physique Th´ eorique, UMR 5152, IRSAMC

  • A. D. Chepelianskii, Y. H. Eom, L. Ermann, B. Georgeot, D. L. Shepelyansky

Networks and data mining Luchon, June 27 - July 11, 2015

slide-2
SLIDE 2

Contents

Random Perron-Frobenius matrices . . . . . . . . . . . . . 3 Poisson statistics of PageRank . . . . . . . . . . . . . . . . 6 Physical Review network . . . . . . . . . . . . . . . . . . . 8 Triangular approximation . . . . . . . . . . . . . . . . . . . 11 Full Physical Review network . . . . . . . . . . . . . . . . . 14 Fractal Weyl law . . . . . . . . . . . . . . . . . . . . . . . 17 ImpactRank for influence propagation . . . . . . . . . . . . 18 Integer network . . . . . . . . . . . . . . . . . . . . . . . . 19 References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Appendix: Rational interpolation method . . . . . . . . . . . 26 2

slide-3
SLIDE 3

Random Perron-Frobenius matrices

Construct random matrix ensembles Gij such that:

  • Gij ≥ 0
  • Gij are (approximately) non-correlated and distributed with the

same distribution P(Gij) (of finite variance σ2).

j Gij = 1

⇒ Gij = 1/N

  • ⇒ average of G has one eigenvalue λ1 = 1 (⇒ “flat” PageRank)

and other eigenvalues λj = 0 (for j = 1).

  • degenerate perturbation theory for the fluctuations ⇒ circular

eigenvalue density with R =

√ Nσ and one unit eigenvalue.

3

slide-4
SLIDE 4

Different variants of the model:

  • uniform full: P(G) = N/2 for 0 ≤ G ≤ 2/N

⇒ R = 1/ √ 3N

  • uniform sparse with Q non-zero elements per column:

P(G) = Q/2 for 0 ≤ G ≤ 2/Q with probability Q/N

and G = 0 with probability 1 − Q/N

⇒ R = 2/√3Q

  • constant sparse with Q non-zero elements per column:

G = 1/Q with probability Q/N

and G = 0 with probability 1 − Q/N

⇒ R = 1/√Q

  • powerlaw with p(G) = D(1 + aG)−b for 0 ≤ G ≤ 1 and

2 < b < 3: ⇒ R = C(b) N 1−b/2 , C(b) = (b − 2)(b−1)/2

b−1 3−b

4

slide-5
SLIDE 5

Numerical verification: uniform full:

N = 400

uniform sparse:

N = 400, Q = 20

power law:

b = 2.5

triangular random and average constant sparse:

N = 400, Q = 20

power law case:

Rth ∼ N −0.25

5

slide-6
SLIDE 6

Poisson statistics of PageRank

Identify PageRank values to “energy-levels”:

P(i) = exp(−Ei/T)/Z

with Z =

i exp(−Ei/T) and an effective temperature T (can be

choosen: T = 1). 6

slide-7
SLIDE 7

Parameter dependance of Ei = − ln(Pi) on the damping factor α. 7

slide-8
SLIDE 8

Physical Review network

N = 463347 nodes and Nℓ = 4691015 links.

Coarse-grained matrix structure (500 × 500 cells): left: time ordered right: journal and then time ordered “11” Journals of Physical Review: (Phys. Rev. Series I), Phys. Rev., Phys. Rev. Lett., (Rev. Mod. Phys.), Phys. Rev. A, B, C, D, E, (Phys. Rev. STAB and

  • Phys. Rev. STPER).

8

slide-9
SLIDE 9

⇒ nearly triangular matrix structure of adjacency matrix: most

citations links t → t′ are for t > t′ (“past citations”) but there is small number (12126 = 2.6 × 10−3Nℓ) of links t → t′ with t ≤ t′ corresponding to future citations. Spectrum by “double-precision” Arnoldi method with nA = 8000: Numerical problem: eigenvalues with |λ| < 0.3 − 0.4 are not reliable! Reason: large Jordan subspaces associated to the eigenvalue λ = 0. 9

slide-10
SLIDE 10

“very bad” Jordan perturbation theory: Consider a “perturbed” Jordan block of size D:

    

0 1 · · · 0 0 0 0 · · · 0 0

. . . . . . ... . . . . . .

0 0 · · · 0 1 ε 0 · · · 0 0

    

characteristic polynomial: λD − (−1)Dε

ε = 0 ⇒ λ = 0 ε = 0 ⇒ λj = −ε1/D exp(2πij/D)

for D ≈ 102 and ε = 10−16

“Jordan-cloud” of artifical eigenvalues due to rounding errors in the region |λ| < 0.3 − 0.4. 10

slide-11
SLIDE 11

Triangular approximation

Remove the small number of links due to “future citations”. Semi-analytical diagonalization is possible:

S = S0 + e d T/N

where en = 1 for all nodes n, dn = 1 for dangling nodes n and

dn = 0 otherwise. S0 is the pure link matrix which is nil-potent: Sl

0 = 0

with l = 352. Let ψ be an eigenvector of S with eigenvalue λ and C = d Tψ.

  • If C = 0 ⇒ ψ eigenvector of S0 ⇒ λ = 0 since S0 nil-potent.

These eigenvectors belong to large Jordan blocks and are responsible for the numerical problems. Note: Similar situation as in network of integer numbers where l = [log2(N)] and numerical instability for |λ| < 0.01.

11

slide-12
SLIDE 12
  • If C = 0 ⇒ λ = 0 since the equation S0ψ = −C e/N does not

have a solution ⇒ λ1 − S0 invertible.

⇒ ψ = C (λ1 − S0)−1 e/N = C λ

l−1

  • j=0
  • S0

λ

j

e/N . From λl = (d Tψ/C)λl ⇒ Pr(λ) = 0

with the reduced polynomial of degree l = 352 :

Pr(λ) = λl −

l−1

  • j=0

λl−1−j cj = 0 , cj = d T Sj

0 e/N .

⇒ at most l = 352 eigenvalues λ = 0 which can be numerically

determined as the zeros of Pr(λ). However: still numerical problems:

  • cl−1 ≈ 3.6 × 10−352
  • alternate sign problem with a strong loss of significance.
  • big sensitivity of eigenvalues on cj

12

slide-13
SLIDE 13

Solution:

Using the multi precision library GMP with 256 binary digits the zeros of Pr(λ) can be determined with accuracy ∼

10−18.

Furthermore the Arnoldi method can also be implemented with higher precision.

red crosses: zeros of Pr(λ) from 256 binary digits calculation blue squares: eigenvalues from Arnoldi method with 52, 256, 512, 1280 binary digits. In the last case: ⇒ break off at nA = 352 with vanishing coupling element.

13

slide-14
SLIDE 14

Full Physical Review network

Complications due to small number of “future citations” which break the triangular structure ⇒ two groups of eigenvectors ψ:

  • 1. d Tψ = 0

common eigenvector/eigenvalue of S and S0, essentially : λ = ±1/√n with n = 1, 2, 3, . . . and large degeneracies.

  • 2. d Tψ = 0

⇒ R(λ) = 0 with a rational function: R(λ) = 1 −

  • j=0

cjλ−1−j , cj = d T Sj

0 e/N

with convergence for |λ| > ρ1 ≈ 0.9024. The zeros of R(λ) with

|λ| < ρ1 can be determined by a rational interpolation using many

support points with |zj| = 1 where the series to evaluate R(zi) converges well

rational interpolation method (requires also high precision computations, details in Appendix). 14

slide-15
SLIDE 15

Accurate eigenvalue spectrum for the full Physical Review network by the rational interpolation method (left) and the HP Arnoldi method (right):

15

slide-16
SLIDE 16

Degeneracies

High precision in Arnoldi method is “bad” to count the degeneracy of certain degenerate eigenvalues (of first group). In theory the Arnoldi method cannot find several eigenvectors for degenerate eigenvalues, a shortcoming which is (partly) “repaired” by rounding errors.

16

slide-17
SLIDE 17

Fractal Weyl law

Nλ = number of complex eigenvalues with λc ≤ |λ| ≤ 1. Nt = reduced network size of Physical Review at time t. Nλ = aN b

t

17

slide-18
SLIDE 18

ImpactRank for influence propagation

vf = 1 − γ 1 − γG v0 , v∗

f =

1 − γ 1 − γG∗ v0 vf = “PageRank” of ˜ G with: ˜ G = γ G + (1 − γ) v0 eT

18

slide-19
SLIDE 19

Integer network

Consider the integers n ∈ {1, . . . , N} and construct an adjacency matrix by Amn = k where k is the largest integer such that mk is a divisor of n if 1 < m < n and Amn = 0 if m = 1 or m = n (note

Amn = k = 0 if m is not a divisor of n). Construct S and G in the

usual way from A. 19

slide-20
SLIDE 20

PageRank

20

slide-21
SLIDE 21

Dependence of n on K-index

red: N = 107 blue: N = 106 “New order” of integers: K = 1, 2, . . . , 32 ⇒ n = 2, 3, 5, 7, 4, 11,

13, 17, 6, 19, 9, 23, 29, 8, 31, 10, 37, 41, 43, 14, 47, 15, 53, 59, 61, 25, 67, 12, 71, 73, 22, 21.

21

slide-22
SLIDE 22

Semi-analytical determination of spectrum, PageRank and eigenvectors

Matrix structure:

S = S0 + v d T

where v = e/N, dj = 1 for dangling nodes (primes and 1) and

dj = 0 otherwise. S0 is the pure link matrix which is nil-potent: Sl

0 = 0

with l = [log2(N)] ≪ N

same theory as for the Phys.-Rev. Network. 22

slide-23
SLIDE 23

Spectrum I

blue dots: semi-analytical eigenvalues as zeros from Pr(λ) (or eigenvalues of ¯

S).

red crosses: Arnoldi method with random initial vector and nA = 1000. light blue boxes: Arnoldi method with constant initial vector v = e/N and nA = 1000.

23

slide-24
SLIDE 24

Spectrum II

γj = −2 ln |λj|

Large N limit of γ1 with the scaling parameter: 1/ ln(N). Note:

c0 = d Tv = 1 N

N

  • j=1

dj = 1 + π(N) N ≈ 1 ln(N)

where π(N) is the number of primes below N. 24

slide-25
SLIDE 25

References

  • 1. K. M. Frahm, A. D. Chepelianskii and D. L. Shepelyansky,

PageRank of integers, Phys. A: Math. Theor. 45, 405101 (2012).

  • 2. K. M. Frahm, and D. L. Shepelyansky, Poisson statistics of

PageRank probabilities of Twitter and Wikipedia networks,

  • Eur. Phys. J. B, 87, 93 (2014).
  • 3. K. M. Frahm, Y. H. Eom, and D. L. Shepelyansky, Google matrix
  • f the citation network of Physical Review, Phys. Rev. E 89,

052814 (2014). 25

slide-26
SLIDE 26

Appendix: Rational interpolation method

High precision Arnoldi method for full Physical Review network (including the “future citations”) for 52, 256, 512, 768 binary digits and

nA = 2000:

26

slide-27
SLIDE 27

Semi-analytical argument for the full PR network:

S = S0 + e d T/N

There are two groups of eigenvectors ψ with: Sψ = λψ

  • 1. Those with d Tψ = 0

⇒ ψ is also an eigenvector of S0.

Generically an arbitrary eigenvector of S0 is not an eigenvector of

S unless the eigenvalue is degenerate with degeneracy m > 1.

Using linear combinations of different eigenvectors for the same eigenvalue one can construct m − 1 eigenvectors ψ respecting

d Tψ = 0 which are therefore eigenvectors of S.

Pratically: determine degenerate subspace eigenvalues of S0 (and also of ST

0 ) which are of the form: λ = ±1/√n with

n = 1, 2, 3, . . . due to 2 × 2-blocks:

  • 1/n1

1/n2

λ = ± 1 √n1 n2 .

27

slide-28
SLIDE 28
  • 2. Those with d Tψ = 0

⇒ R(λ) = 0 with the rational function: R(λ) = 1 − dT 1 λ 1 − S0 e/N = 1 −

  • j,q

Cjq (λ − ρj)q

Here Cjq and ρj are unknown, except for

ρ1 = 2 Re [(9 + i √ 119)1/3]/(135)1/3 ≈ 0.9024 and ρ2,3 = ±1/ √ 2 ≈ ±0.7071.

Idea: Expand the geometric matrix series ⇒

R(λ) = 1 −

  • j=0

cjλ−1−j , cj = d T Sj

0 e/N

which converges for |λ| > ρ1 ≈ 0.9024 since cj ∼ ρj

1 for j → ∞.

Problem: How to determine the zeros of R(λ) with |λ| < ρ1 ? 28

slide-29
SLIDE 29

Analytic continuation by rational interpolation: Use the series to evaluate R(z) at nS support points

zj = exp(2πij/nS) with a given precision of p binary digits and

determine the rational function RI(z) which interpolates R(z) at these support points. Two cases:

nS = 2nR + 1 ⇒ RI(z) = PnR(z) QnR(z) nS = 2nR + 2 ⇒ RI(z) = PnR(z) QnR+1(z)

The nR zeros of PnR(z) are approximations of the eigenvalues of

S (of the 2nd group).

For a given precision, e. g. p = 1024 binary digits one can obtain a certain number of reliable eigenvalues, e. g. nR = 300. The method can be pushed up to p = 16384 and nR = 2500 which is better than the high precision Arnoldi method with nA = 2000. 29

slide-30
SLIDE 30

Examples:

Some “artificial zeros” for nR = 340 and p = 1024 (left top and middle panels) where both variants of the method differ. For nR = 300 and p = 1024 most zeros coincide with HP Arnoldi method (right top and middle panels) and both variants of the method coincide. Lower panels: comparison for nR =

2000, p = 12288 (left) or for nR = 2500, p = 16384 with HP Arnoldi

method.

30