Numerical Methods for Rapid Computation of PageRank Gene H. Golub - PowerPoint PPT Presentation

Numerical Methods for Rapid Computation of PageRank Gene H. Golub Stanford University Stanford, CA USA Joint work with Chen Greif

Outline
1. Markov Chains and PageRank
   Definition
2. Acceleration Techniques
   Sequence extrapolation
   Adaptive Computation
   Other Techniques
3. Arnoldi Based Methods
   A refined Arnoldi algorithm
   Sensitivity
   Numerical experiments

Stationary Distribution Vector of a Transition Probability Matrix
We seek a row vector $\pi^T$ that satisfies $\pi^T = \pi^T P$, where $P$ is a square stochastic matrix: its entries lie between 0 and 1, and $Pe = e$, where $e$ is the vector of all ones.
Theorem (Perron 1907, Frobenius 1912): A nonnegative irreducible matrix has a simple real eigenvalue equal to its spectral radius, and all entries of the associated eigenvector are nonnegative.
What happens when $P$ is stochastic and possibly reducible?

What Is PageRank?
Definition: Given a Webpage database, the PageRank of the $i$th Webpage is the $i$th element $\pi_i$ of the stationary distribution vector $\pi$ that satisfies $\pi^T P = \pi^T$, where $P$ is a matrix of weights of webpages that indicate their importance.
Difficulties
1. $P$ is too large (size possibly in the billions) for forming any of our favorite decompositions.
2. $P$ could be reducible, contain zero rows, and present other difficulties of this sort.
How do we modify $P$ so that there is a unique solution?

Links determine the importance of a webpage
The fundamental idea of Brin & Page: the importance of a webpage is determined not by its contents but by which pages link to it.
Apply the power method to a web link graph.

Some issues with web link graphs
Difficulties
1. Dangling nodes (each corresponds to an all-zero row in the matrix): very important pages may have no outlinks (e.g. the U.S. Constitution!).
2. Periodicity: a cyclic path in the Webgraph (e.g. you point only to your mom's webpage and she points only to yours). Simple example: $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.
Solution
Set $M(c) = cP + (1 - c)E$, where $E$ is a positive rank-1 matrix.
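The effect of the rank-1 correction on the periodic two-page example can be checked numerically. The sketch below is illustrative only: it assumes the uniform choice $E = ee^T/n$ and $c = 0.85$, and the helper name `google_matrix` is hypothetical.

```python
import numpy as np

def google_matrix(P, c=0.85):
    """Return M(c) = c*P + (1-c)*E, assuming E = e e^T / n (uniform rank-1 matrix)."""
    n = P.shape[0]
    E = np.ones((n, n)) / n
    return c * P + (1.0 - c) * E

# Two pages that link only to each other: a chain with period 2.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Iterating with P alone just swaps the two entries at every step.
plain = [np.array([1.0, 0.0])]
for _ in range(4):
    plain.append(P @ plain[-1])

# Iterating with M(c) damps the oscillation, since M(c) > 0.
M = google_matrix(P, c=0.85)
y = np.array([1.0, 0.0])
for _ in range(200):
    y = M @ y

print(plain[3], plain[4])  # the iterates keep alternating
print(y)                   # both entries approach 1/2
```

With $P$ alone the iterates never settle, while with $M(c)$ they approach the uniform vector, which is the stationary vector of this symmetric example.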

The matrix $M(c)$
We have $M(c) > 0$, which yields a unique solution. But what is the significance of the stationary probability vector?
$M(c)$ is the transition matrix of a Markov chain with positive entries, and $M(c) z(c) = z(c)$. Therefore, for $c < 1$, $z(c)$ is unique (under proper scaling).

Simple example (Glynn and G.)
For the identity matrix, $P = I$, there is no unique stationary probability distribution, but for $M(c) = cI + (1 - c)ee^T/n$ we converge to $z(c) = \frac{1}{n} e$.

The significance of the parameter $c$
$c$ is the probability that a surfer will follow an outlink (as opposed to jumping randomly to another Webpage).
$c = 0.85$ was the choice in the Brin & Page model.
Like regularization: a small value of $c$ leads to a more stable computation, but to a solution further away from the true one.

Brin & Page's Strategy: Apply Power Method
For Google, it all boiled down originally to solving the eigenvalue problem $x = Mx$ using the power method $x^{(k+1)} = M x^{(k)}$.
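A minimal sketch of this iteration follows. It is not the production method: it assumes a small dense column-stochastic $P$ with no dangling nodes, a uniform vector $v$ for the rank-1 part $ve^T$, and a 1-norm stopping test. The function name and tolerances are illustrative. The rank-1 term is applied without forming $M$ explicitly.

```python
import numpy as np

def pagerank_power(P, c=0.85, v=None, tol=1e-10, max_iter=1000):
    """Power method for x = M x with M = c*P + (1-c)*v*e^T (P column-stochastic)."""
    n = P.shape[0]
    if v is None:
        v = np.full(n, 1.0 / n)          # assumed uniform teleportation vector
    x = v.copy()
    for k in range(1, max_iter + 1):
        # M x = c P x + (1-c) v (e^T x); M itself is never formed.
        x_new = c * (P @ x) + (1.0 - c) * v * x.sum()
        x_new /= x_new.sum()             # keep the 1-norm equal to 1
        if np.abs(x_new - x).sum() < tol:
            return x_new, k
        x = x_new
    return x, max_iter

# Hypothetical 3-page web; column j holds the outlink weights of page j.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
x, iters = pagerank_power(P)
print(x, iters)
```

Each step costs one matrix-vector product with $P$ plus a vector update, which is why the method scales to very large sparse link graphs.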

Discussion
Let $M z_i = \lambda_i z_i$. For $|\lambda_i| \neq |\lambda_j|$ we have $x^{(0)} = \sum_i \alpha_i z_i$ and $x^{(k)} = \sum_i \alpha_i \lambda_i^k z_i$, with $\|x^{(k)}\|_1 = 1$ and $x \geq 0$.
After normalization, for $\lambda_1 = 1$ we have
$$x^{(k)} = z_1 + \sum_{j=2}^{n} \beta_j \lambda_j^k z_j.$$

The Eigenvalues of the PageRank Matrix
Theorem (elegant proof due to Eldén): Let $P$ be a column-stochastic matrix with eigenvalues $\{1, \lambda_2, \lambda_3, \ldots, \lambda_n\}$. Then the eigenvalues of $M(c) = cP + (1 - c) v e^T$, where $0 < c < 1$ and $v$ is a nonnegative vector with $e^T v = 1$, are $\{1, c\lambda_2, c\lambda_3, \ldots, c\lambda_n\}$.
This implies that the eigenvalues of $M(c)$ satisfy $|\lambda_j| / |\lambda_1| \leq c$ for $j \geq 2$, so the power method converges at least as fast as $c^k$.
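The statement of the theorem can be checked on a small random example. This is only a numerical illustration under assumed inputs (a random positive column-stochastic $P$, a random $v$, $c = 0.85$), not a proof; the helper `same_spectrum` is hypothetical.

```python
import numpy as np

def same_spectrum(a, b, tol=1e-8):
    """True if every value in a has a partner in b within tol, and vice versa."""
    d = np.abs(a[:, None] - b[None, :])
    return bool(d.min(axis=1).max() < tol and d.min(axis=0).max() < tol)

rng = np.random.default_rng(0)
n, c = 6, 0.85

# Random positive column-stochastic P and a nonnegative v with e^T v = 1.
P = rng.random((n, n))
P /= P.sum(axis=0, keepdims=True)
v = rng.random(n)
v /= v.sum()

M = c * P + (1.0 - c) * np.outer(v, np.ones(n))   # M(c) = cP + (1-c) v e^T

eig_P = np.linalg.eigvals(P)
eig_M = np.linalg.eigvals(M)

# Predicted spectrum: eigenvalue 1 stays, every other eigenvalue is scaled by c.
predicted = c * eig_P
predicted[np.argmin(np.abs(eig_P - 1.0))] = 1.0

moduli = np.sort(np.abs(eig_M))[::-1]
print(same_spectrum(predicted, eig_M), moduli[:2])
```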

Quadratic Extrapolation (Kamvar, Haveliwala, Manning, G.)
Slowly convergent series can be replaced by series that converge to the same limit at a much faster rate.
Idea: estimate the components of the current iterate in the directions of the second and third eigenvectors, and eliminate them.

Quadratic Extrapolation
Suppose $M$ has three distinct eigenvalues. The minimal polynomial is given by $P_M(\lambda) = \gamma_0 + \gamma_1 \lambda + \gamma_2 \lambda^2 + \gamma_3 \lambda^3$. By the Cayley-Hamilton theorem, $P_M(M) = 0$. Hence for any vector $z$,
$$P_M(M) z = (\gamma_0 + \gamma_1 M + \gamma_2 M^2 + \gamma_3 M^3) z = 0.$$

Quadratic Extrapolation (cont.)
Set $z = x^{(k-3)}$ and use the fact that $x^{(k-2)} = M x^{(k-3)}$ and so on. Since $\lambda = 1$ is an eigenvalue of $M$, $P_M(1) = 0$, so $\gamma_0 = -(\gamma_1 + \gamma_2 + \gamma_3)$. Thus,
$$(x^{(k-2)} - x^{(k-3)}) \gamma_1 + (x^{(k-1)} - x^{(k-3)}) \gamma_2 + (x^{(k)} - x^{(k-3)}) \gamma_3 = 0.$$
Defining $y^{(k-j)} = x^{(k-j)} - x^{(k-3)}$, $j = 0, 1, 2$, and setting $\gamma_3 = 1$ (to avoid getting the trivial solution $\gamma = 0$), we get
$$\begin{pmatrix} y^{(k-2)} & y^{(k-1)} \end{pmatrix} \begin{pmatrix} \gamma_1 & \gamma_2 \end{pmatrix}^T = -y^{(k)}.$$
Now, since $M$ has more than three eigenvalues, solve a least squares problem.
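The least squares step can be sketched on a toy matrix. The example below assumes a symmetric test matrix built to have exactly the three distinct eigenvalues 1, 0.5 and 0.2 (it is not a PageRank matrix), so the fitted coefficients should match the minimal polynomial $(\lambda - 1)(\lambda - 0.5)(\lambda - 0.2) = \lambda^3 - 1.7\lambda^2 + 0.8\lambda - 0.1$. It only illustrates how $\gamma_1, \gamma_2$ are obtained, not the full extrapolation update.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Toy matrix with exactly three distinct eigenvalues (assumed for illustration).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.array([1.0, 0.5, 0.5, 0.2, 0.2, 0.2])
M = Q @ np.diag(lam) @ Q.T

# Four successive iterates x^(k-3), x^(k-2), x^(k-1), x^(k).
x0 = rng.standard_normal(n)
x1 = M @ x0
x2 = M @ x1
x3 = M @ x2

# Differences y^(k-2), y^(k-1), y^(k) relative to x^(k-3).
Y = np.column_stack([x1 - x0, x2 - x0])
rhs = -(x3 - x0)

# Solve (y^(k-2) y^(k-1)) [gamma1 gamma2]^T = -y^(k) in the least squares sense.
gamma, *_ = np.linalg.lstsq(Y, rhs, rcond=None)
gamma1, gamma2 = gamma
gamma3 = 1.0
gamma0 = -(gamma1 + gamma2 + gamma3)   # from P_M(1) = 0

# Roots of gamma0 + gamma1*t + gamma2*t^2 + gamma3*t^3.
roots = np.sort(np.roots([gamma3, gamma2, gamma1, gamma0]).real)
print(gamma1, gamma2, roots)
```

For a real PageRank matrix the three-eigenvalue assumption does not hold, so the least squares residual is nonzero and the fitted polynomial is only an approximation.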

The dynamic nature of the web
This problem involves a matrix that changes over time. The number of states grows and shrinks, i.e. new websites are introduced and old websites die. Websites are continually changing. $M$ is a function of time, and so is its dimension.

Adaptive Computation (joint with Kamvar and Haveliwala)
Most pages converge rapidly. Basic idea: when the PageRank of a page has converged, stop recomputing it:
$$x_N^{(k+1)} = M_N x^{(k)}, \qquad x_C^{(k+1)} = x_C^{(k)},$$
where $N$ indexes the pages that have not yet converged and $C$ those that have.
Use the previous vector as a start vector. Nice speedup, but not great. Why? The old pages converge quickly, but the new pages still take long to converge.
The Web constantly changes: pages are added, deleted, and modified. But if you use Adaptive PageRank, you save the computation for the old pages.
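One way to read the update above is sketched below. The freezing rule (relative change below `freeze_rel`), the thresholds, and the function name are assumptions made for illustration; the slides do not specify them. Frozen pages are never reactivated, so the result is an approximation of the PageRank vector.

```python
import numpy as np

def adaptive_power(M, freeze_rel=1e-6, tol=1e-12, max_iter=1000):
    """Power iteration that stops updating entries once they appear converged."""
    n = M.shape[0]
    x = np.full(n, 1.0 / n)
    active = np.ones(n, dtype=bool)        # set N: pages not yet converged
    rows_computed = 0
    for _ in range(max_iter):
        x_new = x.copy()                   # x_C^(k+1) = x_C^(k)
        x_new[active] = M[active] @ x      # x_N^(k+1) = M_N x^(k)
        rows_computed += int(active.sum())
        change = np.abs(x_new - x)
        x = x_new
        if change.sum() < tol:
            break
        # Move pages whose relative change is tiny from N to C (assumed rule).
        active &= change > freeze_rel * np.abs(x)
    return x / x.sum(), rows_computed

# Hypothetical test problem: random column-stochastic P, uniform rank-1 part.
rng = np.random.default_rng(2)
n, c = 20, 0.85
P = rng.random((n, n))
P /= P.sum(axis=0, keepdims=True)
M = c * P + (1.0 - c) * np.ones((n, n)) / n

x_adaptive, rows_computed = adaptive_power(M)
print(rows_computed)
```

The counter `rows_computed` tracks how many row products were actually evaluated, which is the quantity the adaptive scheme tries to reduce.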

Example: Stanford-Berkeley, $n \approx 700000$

Other Effective Approaches
Aggregation/Disaggregation (Stewart, Langville & Meyer, ...).
Approaches related to permutations of the Google matrix (Del Corso et al., Kamvar et al.).
Linear system formulation (Arasu et al.).
And more...
Survey paper: A survey of eigenvector methods of Web information retrieval, by Amy Langville and Carl Meyer.
Stability and convergence analysis: Ipsen & Kirkland.

Using the Arnoldi method for PageRank (joint with Chen Greif)
Arnoldi method: the Arnoldi method is generally used for generating a small upper Hessenberg matrix whose eigenvalues approximate some of the eigenvalues of the original matrix.
When $Q$ is orthogonal, $Q^T M Q (Q^T x) = \lambda (Q^T x)$.
1. Find $H = Q^T M Q$ upper Hessenberg, then perform the computations for $H$ instead of $M$.
2. $M$ is $n$-by-$n$ and is huge, but we terminate the process after $k$ steps. The resulting $H$ is $(k+1)$-by-$k$.

Computational Cost
1. Main cost: one matrix-vector product (with the original large matrix) per iteration.
2. Inner products and norm computations.
3. The power method is cheaper, but not by much if matrix-vector products dominate.

An Arnoldi/SVD algorithm for computing PageRank
Similar to computing refined Ritz vectors (Jia, Stewart), but pretend the largest eigenvalue stays 1 in the smaller space, i.e. we do not compute any Ritz values.
Set an initial guess $q$ and $k$, the number of Arnoldi steps
Repeat
    [Q, H] = Arnoldi(A, q, k)
    Compute H − [I; 0] = U Σ V^T
    Set v = V(:, k)
    Set q = Qv
Until σ_min(H − [I; 0]) < ε
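The loop above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes a small dense test matrix, modified Gram-Schmidt orthogonalization, a simple breakdown check, and it takes $Q$ in the step $q = Qv$ to mean the first $k$ Arnoldi vectors. The function names and the restart cap are hypothetical.

```python
import numpy as np

def arnoldi(A, q, k):
    """k Arnoldi steps: returns Q (n x (k+1)) and H ((k+1) x k) with A Q[:, :k] = Q H."""
    n = q.size
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        w = A @ Q[:, j]                      # the one large matrix-vector product
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:              # breakdown: invariant subspace found
            return Q[:, :j + 2], H[:j + 2, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

def arnoldi_pagerank(A, k=8, eps=1e-10, max_restarts=100):
    """Restarted Arnoldi/SVD iteration with the shift fixed at 1."""
    n = A.shape[0]
    q = np.full(n, 1.0 / np.sqrt(n))
    sigma = np.inf
    for it in range(1, max_restarts + 1):
        Q, H = arnoldi(A, q, k)
        m = H.shape[1]
        S = H - np.eye(m + 1, m)             # H - [I; 0]
        _, s, Vt = np.linalg.svd(S, full_matrices=False)
        q = Q[:, :m] @ Vt[-1]                # right singular vector of sigma_min
        sigma = s[-1]
        if sigma < eps:
            break
    if q.sum() < 0:                          # fix the sign; q keeps unit 2-norm
        q = -q
    return q, sigma, it

# Hypothetical test problem: M(c) built from a random column-stochastic P.
rng = np.random.default_rng(3)
n, c = 30, 0.85
P = rng.random((n, n))
P /= P.sum(axis=0, keepdims=True)
A = c * P + (1.0 - c) * np.ones((n, n)) / n

q, sigma, restarts = arnoldi_pagerank(A)
pagerank = q / q.sum()                       # rescale to a probability vector
print(sigma, restarts)
```

Because $AQ_k = Q_{k+1}H$ and $Q_{k+1}$ has orthonormal columns, the residual $\|Aq - q\|_2$ of the unit vector $q = Q_k v$ equals the smallest singular value, so the stopping test needs no extra matrix-vector product.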

Advantages
Orthogonalization achieves effective separation of eigenvectors.
Take advantage of knowing the largest eigenvalue.
The largest Ritz value could be complex, but if we set the shift to 1 then there is no risk of complex arithmetic.
The smallest singular value converges smoothly to zero (more smoothly than the largest Ritz value converges to 1).
Stopping criterion with no computational overhead: $\|Aq - q\|_2 = \sigma_{\min}(H - [I; 0])$.
