Google matrix analysis of directed networks Lecture 3 Klaus Frahm - PowerPoint PPT Presentation

Google matrix analysis of directed networks
Lecture 3
Klaus Frahm
Quantware MIPS Center, Université Paul Sabatier, Laboratoire de Physique Théorique, UMR 5152, IRSAMC
A. D. Chepelianskii, Y. H. Eom, L. Ermann, B. Georgeot, D. L. Shepelyansky
Networks and data mining, Luchon, June 27 - July 11, 2015

Contents
• Random Perron-Frobenius matrices
• Poisson statistics of PageRank
• Physical Review network
• Triangular approximation
• Full Physical Review network
• Fractal Weyl law
• ImpactRank for influence propagation
• Integer network
• References
• Appendix: Rational interpolation method

Random Perron-Frobenius matrices
Construct random matrix ensembles $G_{ij}$ such that:
• $G_{ij} \ge 0$
• the $G_{ij}$ are (approximately) uncorrelated and identically distributed with distribution $P(G_{ij})$ of finite variance $\sigma^2$.
• $\sum_j G_{ij} = 1 \Rightarrow \langle G_{ij} \rangle = 1/N$
• ⇒ the average of $G$ has one eigenvalue $\lambda_1 = 1$ (⇒ “flat” PageRank) and other eigenvalues $\lambda_j = 0$ (for $j \ne 1$).
• degenerate perturbation theory for the fluctuations ⇒ circular eigenvalue density with radius $R = \sqrt{N}\,\sigma$ and one unit eigenvalue.

Different variants of the model:
• uniform full: $P(G) = N/2$ for $0 \le G \le 2/N$ ⇒ $R = 1/\sqrt{3N}$
• uniform sparse with $Q$ non-zero elements per column: $P(G) = Q/2$ for $0 \le G \le 2/Q$ with probability $Q/N$ and $G = 0$ with probability $1 - Q/N$ ⇒ $R = 2/\sqrt{3Q}$
• constant sparse with $Q$ non-zero elements per column: $G = 1/Q$ with probability $Q/N$ and $G = 0$ with probability $1 - Q/N$ ⇒ $R = 1/\sqrt{Q}$
• power law with $p(G) = D\,(1 + aG)^{-b}$ for $0 \le G \le 1$ and $2 < b < 3$ ⇒ $R = C(b)\, N^{1 - b/2}$, $C(b) = (b-2)^{(b-1)/2} \sqrt{\dfrac{b-1}{3-b}}$
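The “uniform full” prediction $R = 1/\sqrt{3N}$ can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, normalizes columns to unit sum (the usual Google-matrix convention; the spectrum is the same for the transposed convention), and uses a fixed random seed chosen only for reproducibility.

```python
import numpy as np

def uniform_full_google_matrix(N, rng):
    """Random matrix with entries uniform in [0, 2/N], columns rescaled to sum to 1."""
    G = rng.uniform(0.0, 2.0 / N, size=(N, N))
    # Column sums are already close to 1, so the rescaling factor is close to 1
    # and barely changes the variance of the entries.
    return G / G.sum(axis=0, keepdims=True)

def leading_and_bulk_modulus(G):
    """Return the largest eigenvalue modulus and the second-largest one."""
    moduli = np.sort(np.abs(np.linalg.eigvals(G)))
    return moduli[-1], moduli[-2]

N = 400
rng = np.random.default_rng(0)          # seed is an arbitrary illustrative choice
G = uniform_full_google_matrix(N, rng)
lam_max, r_numeric = leading_and_bulk_modulus(G)
r_theory = 1.0 / np.sqrt(3.0 * N)       # R = sqrt(N) * sigma with sigma^2 = 1/(3 N^2)

print(f"leading eigenvalue modulus: {lam_max:.6f}")
print(f"bulk radius: numeric {r_numeric:.4f}, theory {r_theory:.4f}")
```

The second-largest modulus is only a finite-size estimate of the edge of the circular density, so agreement with $R$ is expected to within some ten percent at $N = 400$, not exactly.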

Numerical verification (figure panels): uniform full, $N = 400$; triangular random and average; uniform sparse, $N = 400$, $Q = 20$; constant sparse, $N = 400$, $Q = 20$; power law, $b = 2.5$; power law case: $R_{\rm th} \sim N^{-0.25}$.

Poisson statistics of PageRank
Identify PageRank values with “energy levels”: $P(i) = \exp(-E_i/T)/Z$ with $Z = \sum_i \exp(-E_i/T)$ and an effective temperature $T$ (which can be chosen freely, e.g. $T = 1$).
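A minimal sketch of this mapping, assuming NumPy. With a normalized PageRank vector one can take $Z = 1$, so that $E_i = -T \ln P(i)$. The spacing statistic below (nearest-neighbor spacings divided by their global mean) is an assumption about how a comparison with a Poisson distribution $\exp(-s)$ would be set up; a careful analysis would unfold the levels locally. The PageRank vector used in the demonstration is hypothetical.

```python
import numpy as np

def pagerank_to_levels(P, T=1.0):
    """Map PageRank probabilities to sorted 'energy levels' E_i = -T ln P(i) (Z = 1)."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum()                 # make sure the probabilities are normalized
    return np.sort(-T * np.log(P))

def normalized_spacings(E):
    """Nearest-neighbor level spacings in units of the global mean spacing."""
    s = np.diff(E)
    return s / s.mean()

# Hypothetical PageRank vector, for illustration only.
P_demo = np.array([0.4, 0.25, 0.15, 0.1, 0.06, 0.04])
E_demo = pagerank_to_levels(P_demo)
print("levels:  ", E_demo)
print("spacings:", normalized_spacings(E_demo))
```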

Parameter dependence of $E_i = -\ln(P_i)$ on the damping factor $\alpha$ (figure).

Physical Review network
$N = 463347$ nodes and $N_\ell = 4691015$ links.
Coarse-grained matrix structure ($500 \times 500$ cells, figure); left: time ordered, right: ordered by journal and then by time.
“11” journals of Physical Review: (Phys. Rev. Series I), Phys. Rev., Phys. Rev. Lett., (Rev. Mod. Phys.), Phys. Rev. A, B, C, D, E, (Phys. Rev. STAB and Phys. Rev. STPER).

⇒ nearly triangular structure of the adjacency matrix: most citation links $t \to t'$ are for $t > t'$ (“past citations”), but there is a small number ($12126 = 2.6 \times 10^{-3} N_\ell$) of links $t \to t'$ with $t \le t'$ corresponding to future citations.
Spectrum by “double-precision” Arnoldi method with $n_A = 8000$ (figure).
Numerical problem: eigenvalues with $|\lambda| < 0.3 - 0.4$ are not reliable!
Reason: large Jordan subspaces associated with the eigenvalue $\lambda = 0$.

“Very bad” Jordan perturbation theory:
Consider a “perturbed” Jordan block of size $D$:
$$\begin{pmatrix} 0 & 1 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ \varepsilon & 0 & \cdots & 0 & 0 \end{pmatrix}$$
characteristic polynomial: $\lambda^D - (-1)^D \varepsilon$
• $\varepsilon = 0 \Rightarrow \lambda = 0$
• $\varepsilon \ne 0 \Rightarrow \lambda_j = -\varepsilon^{1/D} \exp(2\pi i j/D)$
For $D \approx 10^2$ and $\varepsilon = 10^{-16}$ ⇒ “Jordan cloud” of artificial eigenvalues due to rounding errors in the region $|\lambda| < 0.3 - 0.4$.
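The $\varepsilon^{1/D}$ scaling of the eigenvalue moduli is easy to reproduce. The sketch below assumes NumPy; the pair $D = 16$, $\varepsilon = 10^{-8}$ is a hypothetical choice made so that the perturbation is still well above double-precision rounding and the numerical eigenvalues can be trusted.

```python
import numpy as np

def perturbed_jordan_block(D, eps):
    """D x D Jordan block for eigenvalue 0 with an extra entry eps in the lower-left corner."""
    A = np.diag(np.ones(D - 1), k=1)   # ones on the first superdiagonal
    A[D - 1, 0] = eps
    return A

D_demo, eps_demo = 16, 1e-8            # illustrative values, not taken from the slides
moduli = np.abs(np.linalg.eigvals(perturbed_jordan_block(D_demo, eps_demo)))
expected = eps_demo ** (1.0 / D_demo)
print(f"numerical moduli in [{moduli.min():.6f}, {moduli.max():.6f}], eps**(1/D) = {expected:.6f}")

# Analytic radius eps**(1/D) for a rounding-error sized perturbation and growing block size.
for D in (4, 16, 64):
    print(D, 1e-16 ** (1.0 / D))
```

All $D$ eigenvalues sit on a circle whose radius approaches 1 as $D$ grows, which is why a perturbation of the size of a rounding error is enough to spoil a sizeable part of the spectrum.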

Triangular approximation
Remove the small number of links due to “future citations”. Semi-analytical diagonalization is possible:
$S = S_0 + e\, d^T/N$
where $e_n = 1$ for all nodes $n$, $d_n = 1$ for dangling nodes $n$ and $d_n = 0$ otherwise. $S_0$ is the pure link matrix, which is nilpotent: $S_0^{\,l} = 0$ with $l = 352$.
Let $\psi$ be an eigenvector of $S$ with eigenvalue $\lambda$ and $C = d^T \psi$.
• If $C = 0$ ⇒ $\psi$ is an eigenvector of $S_0$ ⇒ $\lambda = 0$ since $S_0$ is nilpotent. These eigenvectors belong to large Jordan blocks and are responsible for the numerical problems.
Note: the situation is similar in the network of integer numbers, where $l = [\log_2(N)]$ and the numerical instability occurs for $|\lambda| < 0.01$.

• If $C \ne 0$ ⇒ $\lambda \ne 0$ since the equation $S_0 \psi = -C\, e/N$ does not have a solution ⇒ $\lambda \mathbb{1} - S_0$ is invertible
⇒ $\psi = C\, (\lambda \mathbb{1} - S_0)^{-1} e/N = \dfrac{C}{\lambda} \sum_{j=0}^{l-1} \left( \dfrac{S_0}{\lambda} \right)^{j} e/N$.
From $\lambda^l = (d^T \psi / C)\, \lambda^l$ ⇒ $P_r(\lambda) = 0$ with the reduced polynomial of degree $l = 352$:
$P_r(\lambda) = \lambda^l - \sum_{j=0}^{l-1} \lambda^{l-1-j}\, c_j = 0$, $\quad c_j = d^T S_0^{\,j}\, e/N$.
⇒ at most $l = 352$ eigenvalues $\lambda \ne 0$, which can be numerically determined as the zeros of $P_r(\lambda)$.
However, there are still numerical problems:
• $c_{l-1} \approx 3.6 \times 10^{-352}$
• alternate sign problem with a strong loss of significance.
• strong sensitivity of the eigenvalues to the $c_j$
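The construction can be tried on a very small example. The sketch below assumes NumPy and a hypothetical six-node citation network in which every link points from a later node to an earlier one, so that $S_0$ is strictly triangular and hence nilpotent. It builds $S = S_0 + e\,d^T/N$, computes the coefficients $c_j$, and compares the zeros of $P_r(\lambda)$ with the eigenvalues of $S$.

```python
import numpy as np

# Hypothetical toy network: (citing, cited) pairs with cited < citing ("past citations" only).
links = [(1, 0), (2, 0), (3, 1), (3, 2), (4, 1), (5, 2), (5, 0)]
N = 6

# Pure link matrix S0: column j holds 1/outdegree(j) at the rows cited by node j.
S0 = np.zeros((N, N))
for citing, cited in links:
    S0[cited, citing] = 1.0
out_degree = S0.sum(axis=0)
d = (out_degree == 0).astype(float)          # d_n = 1 for dangling nodes, else 0
has_links = out_degree > 0
S0[:, has_links] /= out_degree[has_links]

e = np.ones(N)
S = S0 + np.outer(e, d) / N                  # dangling columns become uniform 1/N

# c_j = d^T S0^j e / N, iterated until S0^j e vanishes; the number of steps is l.
c = []
v = e / N
while np.any(v):
    c.append(d @ v)
    v = S0 @ v
l = len(c)

# P_r(lambda) = lambda^l - sum_j c_j lambda^(l-1-j), highest power first for np.roots.
coeffs = np.concatenate(([1.0], -np.array(c)))
roots = np.roots(coeffs)
eigs = np.linalg.eigvals(S)

print("l =", l, " c_j =", c)
print("zeros of P_r:     ", np.sort_complex(roots))
print("eigenvalues of S: ", np.sort_complex(eigs))
```

In this example $l = 3$, so only three of the six eigenvalues are non-zero; they coincide with the zeros of $P_r$, and the remaining eigenvalues of $S$ are zero up to rounding.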

Solution:
Using the multi-precision library GMP with 256 binary digits, the zeros of $P_r(\lambda)$ can be determined with accuracy $\sim 10^{-18}$. Furthermore, the Arnoldi method can also be implemented with higher precision.
Figure legend: red crosses: zeros of $P_r(\lambda)$ from the calculation with 256 binary digits; blue squares: eigenvalues from the Arnoldi method with 52, 256, 512, 1280 binary digits. In the last case: ⇒ break off at $n_A = 352$ with vanishing coupling element.
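One reason why double precision cannot work here is visible without any linear algebra: a coefficient of order $10^{-352}$ lies below the smallest positive double-precision number and underflows to zero. The sketch below uses Python's standard `decimal` module purely as a stand-in for GMP; the precision of 77 decimal digits is chosen to roughly match 256 binary digits ($256 \log_{10} 2 \approx 77$).

```python
import sys
from decimal import Decimal, getcontext

getcontext().prec = 77                 # about 256 binary digits of mantissa

c_last_text = "3.6e-352"               # order of magnitude of c_{l-1} quoted above
as_double = float(c_last_text)         # underflows: below the double-precision range
as_decimal = Decimal(c_last_text)      # representable with an extended exponent range

print("smallest normal double:", sys.float_info.min)
print("double precision value:", as_double)
print("multi-precision value: ", as_decimal)
```

Underflow is only part of the problem; the loss of significance from alternating signs also calls for a longer mantissa, which is what the 256-digit computation provides.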

Full Physical Review network
Complications due to the small number of “future citations” which break the triangular structure ⇒ two groups of eigenvectors $\psi$:
1. $d^T \psi = 0$ ⇒ common eigenvector/eigenvalue of $S$ and $S_0$, essentially: $\lambda = \pm 1/\sqrt{n}$ with $n = 1, 2, 3, \ldots$ and large degeneracies.
2. $d^T \psi \ne 0$ ⇒ $R(\lambda) = 0$ with a rational function:
$R(\lambda) = 1 - \sum_{j=0}^{\infty} c_j\, \lambda^{-1-j}$, $\quad c_j = d^T S_0^{\,j}\, e/N$
with convergence for $|\lambda| > \rho_1 \approx 0.9024$. The zeros of $R(\lambda)$ with $|\lambda| < \rho_1$ can be determined by a rational interpolation using many support points with $|z_j| = 1$, where the series used to evaluate $R(z_j)$ converges well ⇒ rational interpolation method (also requires high-precision computations, details in Appendix).
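For completeness, the condition $R(\lambda) = 0$ follows from the same steps as in the triangular case, except that the geometric series no longer terminates. The derivation below is a sketch; it assumes that $\rho_1$ denotes the spectral radius of $S_0$, which is what makes the series converge for $|\lambda| > \rho_1$.

```latex
\begin{align}
S\psi = \lambda\psi,\quad S = S_0 + e\,d^T/N
  &\;\Rightarrow\; (\lambda\mathbb{1} - S_0)\,\psi = C\,e/N, \qquad C = d^T\psi \neq 0, \\
  &\;\Rightarrow\; \psi = C\,(\lambda\mathbb{1} - S_0)^{-1}\, e/N
     = C \sum_{j=0}^{\infty} \lambda^{-1-j}\, S_0^{\,j}\, e/N, \\
  &\;\Rightarrow\; C = d^T\psi = C \sum_{j=0}^{\infty} c_j\,\lambda^{-1-j}
  \;\Rightarrow\; R(\lambda) = 1 - \sum_{j=0}^{\infty} c_j\,\lambda^{-1-j} = 0 .
\end{align}
```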

Accurate eigenvalue spectrum for the full Physical Review network by the rational interpolation method (left) and the high-precision (HP) Arnoldi method (right) (figure).

Degeneracies
High precision in the Arnoldi method is “bad” for counting the degeneracy of certain degenerate eigenvalues (of the first group). In theory the Arnoldi method cannot find several eigenvectors for a degenerate eigenvalue, a shortcoming which is (partly) “repaired” by rounding errors.

Fractal Weyl law
$N_\lambda$ = number of complex eigenvalues with $\lambda_c \le |\lambda| \le 1$.
$N_t$ = reduced network size of Physical Review at time $t$.
$N_\lambda = a\, N_t^{\,b}$

ImpactRank for influence propagation
$v_f = \dfrac{1 - \gamma}{1 - \gamma G}\, v_0$, $\quad v_f^* = \dfrac{1 - \gamma}{1 - \gamma G^*}\, v_0$
$v_f$ = “PageRank” of $\tilde{G}$ with: $\tilde{G} = \gamma\, G + (1 - \gamma)\, v_0\, e^T$
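The equivalence between the resolvent formula and the PageRank of $\tilde{G}$ can be verified directly. The sketch below assumes NumPy, a hypothetical random column-stochastic matrix in place of a real Google matrix, an initial vector $v_0$ localized on a single (arbitrarily chosen) node, and $\gamma = 0.7$ as an illustrative value.

```python
import numpy as np

def impact_rank(G, v0, gamma):
    """Solve (1 - gamma G) v_f = (1 - gamma) v_0 for v_f."""
    N = G.shape[0]
    return (1.0 - gamma) * np.linalg.solve(np.eye(N) - gamma * G, v0)

rng = np.random.default_rng(1)               # arbitrary seed for reproducibility
N = 50
G = rng.random((N, N))
G /= G.sum(axis=0, keepdims=True)            # column-stochastic stand-in for a Google matrix

v0 = np.zeros(N)
v0[3] = 1.0                                  # initial vector localized on one node
gamma = 0.7

vf = impact_rank(G, v0, gamma)
G_tilde = gamma * G + (1.0 - gamma) * np.outer(v0, np.ones(N))

print("sum of v_f:", vf.sum())
print("max |G_tilde v_f - v_f|:", np.max(np.abs(G_tilde @ vf - vf)))
```

Passing the Google matrix $G^*$ of the network with inverted link directions to the same function gives $v_f^*$.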

Integer network
Consider the integers $n \in \{1, \ldots, N\}$ and construct an adjacency matrix by $A_{mn} = k$, where $k$ is the largest integer such that $m^k$ is a divisor of $n$, if $1 < m < n$, and $A_{mn} = 0$ if $m = 1$ or $m = n$ (note $A_{mn} = k = 0$ if $m$ is not a divisor of $n$). Construct $S$ and $G$ in the usual way from $A$.
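A minimal sketch of this construction for small $N$, assuming NumPy. It builds $A$ by brute force, normalizes the non-empty columns to obtain the pure link matrix (called `S0` here, following the notation of the triangular approximation), and counts how many applications of `S0` are needed to annihilate the vector $e$. Function names are illustrative.

```python
import numpy as np

def integer_adjacency(N):
    """A[m-1, n-1] = largest k with m**k dividing n, for 1 < m < n; zero otherwise."""
    A = np.zeros((N, N), dtype=int)
    for n in range(2, N + 1):
        for m in range(2, n):
            k, q = 0, n
            while q % m == 0:                # count how often m divides n
                q //= m
                k += 1
            A[m - 1, n - 1] = k
    return A

def link_matrix_and_dangling(A):
    """Column-normalize A; columns without links (dangling nodes) stay zero."""
    col_sum = A.sum(axis=0)
    d = (col_sum == 0).astype(float)
    S0 = np.zeros(A.shape)
    nonzero = col_sum > 0
    S0[:, nonzero] = A[:, nonzero] / col_sum[nonzero]
    return S0, d

def nilpotency_index(S0):
    """Smallest l with S0^l e = 0 (equivalently S0^l = 0, since S0 >= 0)."""
    v = np.ones(S0.shape[0])
    l = 0
    while np.any(v):
        v = S0 @ v
        l += 1
    return l

A = integer_adjacency(100)
S0, d = link_matrix_and_dangling(A)
print("A[2,8] =", A[1, 7], " A[4,8] =", A[3, 7])
print("dangling nodes:", int(d.sum()), " nilpotency index:", nilpotency_index(S0))
```

The dangling nodes are exactly 1 and the primes, and the nilpotency index is set by the longest divisor chain $2, 4, 8, \ldots$ that fits below $N$.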

PageRank (figure)

Dependence of $n$ on the $K$-index (figure); red: $N = 10^7$, blue: $N = 10^6$.
“New order” of integers: $K = 1, 2, \ldots, 32$ ⇒ $n$ = 2, 3, 5, 7, 4, 11, 13, 17, 6, 19, 9, 23, 29, 8, 31, 10, 37, 41, 43, 14, 47, 15, 53, 59, 61, 25, 67, 12, 71, 73, 22, 21.

Semi-analytical determination of spectrum, PageRank and eigenvectors
Matrix structure: $S = S_0 + v\, d^T$
where $v = e/N$, $d_j = 1$ for dangling nodes (primes and 1) and $d_j = 0$ otherwise. $S_0$ is the pure link matrix, which is nilpotent: $S_0^{\,l} = 0$ with $l = [\log_2(N)] \ll N$ ⇒ same theory as for the Physical Review network.

Spectrum I (figure)
blue dots: semi-analytical eigenvalues as zeros of $P_r(\lambda)$ (or eigenvalues of $\bar{S}$).
red crosses: Arnoldi method with random initial vector and $n_A = 1000$.
light blue boxes: Arnoldi method with constant initial vector $v = e/N$ and $n_A = 1000$.

Spectrum II
$\gamma_j = -2 \ln |\lambda_j|$
Large $N$ limit of $\gamma_1$ with the scaling parameter $1/\ln(N)$ (figure).
Note: $c_0 = d^T v = \dfrac{1}{N} \sum_{j=1}^{N} d_j = \dfrac{1 + \pi(N)}{N} \approx \dfrac{1}{\ln(N)}$
where $\pi(N)$ is the number of primes below $N$.
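The estimate $c_0 \approx 1/\ln(N)$ is the prime number theorem in disguise and converges slowly. A small sketch using only the Python standard library; it counts primes $p \le N$ with a sieve of Eratosthenes (the function names are illustrative).

```python
import math

def prime_count(N):
    """Number of primes p <= N, computed with a sieve of Eratosthenes."""
    if N < 2:
        return 0
    sieve = bytearray([1]) * (N + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(N) + 1):
        if sieve[p]:
            # Strike out all multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))
    return sum(sieve)

def c0(N):
    """c_0 = (1 + pi(N)) / N: fraction of dangling nodes (1 and the primes)."""
    return (1 + prime_count(N)) / N

for exponent in range(2, 7):
    N = 10 ** exponent
    print(f"N = 10^{exponent}:  c_0 = {c0(N):.5f},  1/ln(N) = {1.0 / math.log(N):.5f}")
```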

References
1. K. M. Frahm, A. D. Chepelianskii and D. L. Shepelyansky, PageRank of integers, J. Phys. A: Math. Theor. 45, 405101 (2012).
2. K. M. Frahm and D. L. Shepelyansky, Poisson statistics of PageRank probabilities of Twitter and Wikipedia networks, Eur. Phys. J. B 87, 93 (2014).
3. K. M. Frahm, Y. H. Eom and D. L. Shepelyansky, Google matrix of the citation network of Physical Review, Phys. Rev. E 89, 052814 (2014).

Appendix: Rational interpolation method
High-precision Arnoldi method for the full Physical Review network (including the “future citations”) for 52, 256, 512, 768 binary digits and $n_A = 2000$ (figure).
