PageRank (PR) Q: What makes a web page important? A: many important - PowerPoint PPT Presentation

PageRank (PR)

Q: What makes a web page important?
A: Many important pages contain links to it; however, a page containing many links has reduced impact on the importance of the pages it links to. This is the basic idea in PageRank for ranking graph nodes.

PageRank as a random surfer process: Start surfing from a random node and keep following links with probability $\mu$, restarting with probability $1 - \mu$; the node for restarting is selected based on a personalization vector $v$. The ranking value $x_i$ of a node $i$ is the probability of visiting this node during surfing.

PR can also be cast in power series representation as $x = (1 - \mu) \sum_{j=0}^{k} \mu^j S^j v$, where $S$ encodes column-stochastic adjacencies.

Functional rankings: A general method to assign ranking values to graph nodes as $x = \sum_{j=0}^{k} \zeta_j S^j v$. PR is a functional ranking with $\zeta_j = (1 - \mu)\mu^j$. Terms are attenuated by outdegrees in $S$ and by the damping coefficients $\zeta_j$.

March 25, 2013
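The power series form can be evaluated directly by accumulating $\zeta_j S^j v$ term by term. The following is a minimal sketch, not the authors' implementation: the three-node toy graph, the helper names `mat_vec` and `functional_ranking`, and the values $\mu = 0.85$, $k = 50$ are all assumptions chosen for illustration.

```python
def mat_vec(S, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def functional_ranking(S, v, zeta):
    """Return sum_j zeta[j] * S^j v for the given list of coefficients."""
    x = [0.0] * len(v)
    term = list(v)  # holds S^j v, starting with j = 0
    for z in zeta:
        x = [x_i + z * t_i for x_i, t_i in zip(x, term)]
        term = mat_vec(S, term)
    return x


# Toy graph (an assumption): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
# Column j holds 1/outdegree(j) in the rows of the nodes that j links to.
S = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
v = [1 / 3, 1 / 3, 1 / 3]  # uniform personalization vector
mu, k = 0.85, 50           # illustrative values, not taken from the slides

# PageRank as a functional ranking: zeta_j = (1 - mu) * mu^j
pagerank_zeta = [(1 - mu) * mu**j for j in range(k + 1)]
x = functional_ranking(S, v, pagerank_zeta)
print(x)
```

Because $S$ is column-stochastic and $v$ sums to 1, the truncated series sums to $1 - \mu^{k+1}$ rather than exactly 1. Passing a different coefficient list to `functional_ranking` gives other functional rankings.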

Q: Is there a way to encode functional rankings as surfing processes?
A: Multidamping.

Computing $\mu_j$ in multidamping: Simulate a functional ranking by random surfers following emanating links with probability $\mu_j$ at step $j$, given by

$$\mu_j = 1 - \frac{1}{1 + \dfrac{\rho_{k-j+1}}{1 - \mu_{j-1}}}, \quad j = 1, \dots, k,$$

where $\mu_0 = 0$ and $\rho_{k-j+1} = \zeta_{k-j+1} / \zeta_{k-j}$.

[Figure: diagram with edges labeled $\mu_1, \mu_2, \dots, \mu_\kappa$ and $1 - \mu_1, 1 - \mu_2, \dots, 1 - \mu_\kappa$.]

Examples:
LinearRank (LR): $x^{LR} = \sum_{j=0}^{k} \frac{2(k+1-j)}{(k+1)(k+2)} S^j v$, with $\mu_j = \frac{j}{j+2}$, $j = 1, \dots, k$.
TotalRank (TR): $x^{TR} = \sum_{j=0}^{\infty} \frac{1}{(j+1)(j+2)} S^j v$, with $\mu_j = \frac{k-j+1}{k-j+2}$, $j = 1, \dots, k$.

Advantages of multidamping:
Reduced computational cost in approximating functional rankings using the Monte Carlo approach. A random surfer terminates with probability $1 - \mu_j$ at step $j$.
Inherently parallel and synchronization-free computation.
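The recurrence for $\mu_j$ can be checked numerically against the LinearRank closed form $\mu_j = j/(j+2)$. This is a small sketch under assumptions: the function name `multidamping_probabilities` and the choice $k = 5$ are illustrative, and exact rational arithmetic is used only to make the comparison exact.

```python
from fractions import Fraction


def multidamping_probabilities(zeta):
    """Compute mu_1..mu_k from zeta_0..zeta_k using the recurrence
    mu_j = 1 - 1 / (1 + rho_{k-j+1} / (1 - mu_{j-1})), with mu_0 = 0."""
    k = len(zeta) - 1
    mu = [Fraction(0)]  # mu_0 = 0
    for j in range(1, k + 1):
        rho = zeta[k - j + 1] / zeta[k - j]  # rho_{k-j+1}
        mu.append(1 - 1 / (1 + rho / (1 - mu[j - 1])))
    return mu[1:]


k = 5  # illustrative truncation length
# LinearRank coefficients: zeta_j = 2(k+1-j) / ((k+1)(k+2)), j = 0..k
linear_zeta = [Fraction(2 * (k + 1 - j), (k + 1) * (k + 2)) for j in range(k + 1)]
mu = multidamping_probabilities(linear_zeta)
print(mu)
```

For these coefficients the recurrence gives $\mu_1 = 1/3$, $\mu_2 = 1/2$, and in general $j/(j+2)$, matching the LinearRank example.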

[Figure, left panel: TotalRank, Kendall tau (y-axis) vs. step (x-axis) for TopK=1000 nodes (uk-2005); curves: iterations, surfers.]
[Figure, right panel: Personalized LinearRank, number of shared nodes (max=30, y-axis) vs. microstep (x-axis) (in-2004); curves: iterations, surfers. For the seed node, 20% of the nodes have better ranking in the Non-Personalized run.]

Approximate ranking [left]: Run $n$ surfers to completion for graph size $n$. How well does the computed ranking capture the "reference" ordering for the top-$k$ nodes (Kendall $\tau$, y-axis), in comparison to the one calculated by standard iteration (for a number of steps, x-axis) of equivalent computational cost/number of operations?

Approximate personalized ranking [right]: Run fewer than $n$ surfers to completion (each called a microstep, x-axis), but only from a selected node (personalized). How well can we capture the "reference" top-$k$ nodes, i.e. how many of them are shared (y-axis), compared to the iterative approach of equivalent computational load?

Datasets: uk-2005: 39,459,925 nodes, 936,364,282 edges. in-2004: 1,382,908 nodes, 16,917,053 edges.
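The two quality measures used in these plots, Kendall $\tau$ between orderings and the number of shared top-$k$ nodes, can be sketched as follows. This is a plain $O(n^2)$ tau-a without tie handling, which may differ from the variant used for the reported curves; the function names and the score vectors are made up for illustration.

```python
from itertools import combinations


def kendall_tau(ref_scores, approx_scores):
    """Kendall tau-a between two score lists over the same nodes (ties ignored)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(ref_scores)), 2):
        sign = (ref_scores[i] - ref_scores[j]) * (approx_scores[i] - approx_scores[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(ref_scores) * (len(ref_scores) - 1) // 2
    return (concordant - discordant) / pairs


def shared_top_k(ref_scores, approx_scores, k):
    """Count nodes that appear in the top-k of both rankings."""
    def top(scores):
        order = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
        return set(order[:k])
    return len(top(ref_scores) & top(approx_scores))


# Hypothetical scores for four nodes: reference vs. approximate ranking.
reference = [0.40, 0.30, 0.20, 0.10]
approximate = [0.35, 0.38, 0.17, 0.10]

tau = kendall_tau(reference, approximate)
shared = shared_top_k(reference, approximate, k=2)
print(tau, shared)
```

Here only the pair of nodes 0 and 1 is ordered differently, so five of the six pairs are concordant and one is discordant.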

Node similarity: Two nodes are similar if they are linked by other similar node pairs. By pairing similar nodes, the two graphs become aligned.

In IsoRank, a state-of-the-art graph alignment method, first a matrix $X$ of similarity scores between the two sets of nodes is computed and then maximum-weight bipartite matching approaches extract the most similar pairs.

Let $\tilde{A}$, $\tilde{B}$ be the adjacencies $A^T$, $B^T$ of the two graphs normalized by columns (network data), $H_{ij}$ independently known similarity scores (preferences matrix) between nodes $i \in V_B$ and $j \in V_A$, and $\mu$ the percentage of contribution of network data in the algorithm. To compute $X$, IsoRank iterates:

$$X \leftarrow \mu \tilde{B} X \tilde{A}^T + (1 - \mu) H$$
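A dense, pure-Python sketch of this iteration is shown below. It is illustrative only: the helper names, the two toy graphs (a 3-node path for $B$ and a single edge for $A$), the uniform $H$, and the starting point $X^{(0)} = H$ are assumptions, and a practical implementation would use sparse matrices.

```python
def mat_mul(P, Q):
    """Dense matrix product of P (m x r) and Q (r x n)."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def isorank(B_tilde, A_tilde, H, mu, steps):
    """Iterate X <- mu * B_tilde X A_tilde^T + (1 - mu) * H, starting from X = H."""
    X = [row[:] for row in H]
    A_t = transpose(A_tilde)
    for _ in range(steps):
        BXA = mat_mul(mat_mul(B_tilde, X), A_t)
        X = [[mu * BXA[i][j] + (1 - mu) * H[i][j] for j in range(len(H[0]))]
             for i in range(len(H))]
    return X


# Toy inputs (assumptions): graph B is the path 0 - 1 - 2, graph A is one edge 0 - 1.
B_tilde = [[0.0, 0.5, 0.0],
           [1.0, 0.0, 1.0],
           [0.0, 0.5, 0.0]]   # column-normalized adjacency of B
A_tilde = [[0.0, 1.0],
           [1.0, 0.0]]        # column-normalized adjacency of A
H = [[1 / 6, 1 / 6] for _ in range(3)]  # uniform preferences, entries sum to 1

X = isorank(B_tilde, A_tilde, H, mu=0.8, steps=20)
print(X)
```

With column-normalized inputs and an $H$ whose entries sum to 1, the entries of $X$ keep summing to 1 at every step, which is a convenient sanity check.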

Network Similarity Decomposition (NSD)

We reformulate the IsoRank iteration and gain speedup and parallelism. In $n$ steps of the iteration we reach

$$X^{(n)} = (1 - \mu) \sum_{k=0}^{n-1} \mu^k \tilde{B}^k H (\tilde{A}^T)^k + \mu^n \tilde{B}^n H (\tilde{A}^T)^n$$

Assume for a moment that $H = uv^T$ (1 component). Two phases for $X$:
1. $u^{(k)} = \tilde{B}^k u$ and $v^{(k)} = \tilde{A}^k v$ (preprocess/compute iterates)
2. $X^{(n)} = (1 - \mu) \sum_{k=0}^{n-1} \mu^k u^{(k)} v^{(k)T} + \mu^n u^{(n)} v^{(n)T}$ (construct $X$)

This idea extends to $s$ components, $H \sim \sum_{i=1}^{s} w_i z_i^T$.

NSD computes matrix-vector iterates and builds $X$ as a sum of outer products of vectors; these are much cheaper than triple matrix products. We can then apply Primal Dual Matching (PDM) or Greedy Matching (GM, a 1/2-approximation) to extract the actual node pairs.

[Diagram: IsoRank takes the networks and the elemental similarities as a matrix and produces matches; NSD takes the networks and the elemental similarities as component vectors and produces matches through PDM or GM.]
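The two phases for a single component $H = uv^T$ can be sketched as below. This is a minimal dense illustration rather than the authors' code: the function name `nsd`, the toy graphs (a 3-node path for $B$, a single edge for $A$) and the uniform $u$, $v$ are assumptions.

```python
def mat_vec(M, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def nsd(B_tilde, A_tilde, u, v, mu, n):
    """Build X^(n) for H = u v^T as a weighted sum of outer products
    of the iterates u^(k) = B_tilde^k u and v^(k) = A_tilde^k v."""
    X = [[0.0] * len(v) for _ in u]
    u_k, v_k = list(u), list(v)
    for k in range(n + 1):
        # Terms k < n carry weight (1 - mu) * mu^k; the last term carries mu^n.
        weight = mu**n if k == n else (1 - mu) * mu**k
        for i, u_i in enumerate(u_k):
            for j, v_j in enumerate(v_k):
                X[i][j] += weight * u_i * v_j
        # Advance both iterates with one matrix-vector product each.
        u_k, v_k = mat_vec(B_tilde, u_k), mat_vec(A_tilde, v_k)
    return X


# Toy inputs (assumptions): graph B is the path 0 - 1 - 2, graph A is one edge 0 - 1.
B_tilde = [[0.0, 0.5, 0.0],
           [1.0, 0.0, 1.0],
           [0.0, 0.5, 0.0]]
A_tilde = [[0.0, 1.0],
           [1.0, 0.0]]
u = [1 / 3, 1 / 3, 1 / 3]
v = [1 / 2, 1 / 2]

X = nsd(B_tilde, A_tilde, u, v, mu=0.8, n=20)
print(X)
```

Only matrix-vector products appear in the loop; no product of $\tilde{B}$, $X$ and $\tilde{A}^T$ is ever formed. For $s$ components, the same routine would be run once per pair $(w_i, z_i)$ and the results summed.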

Species             Nodes   Edges
celeg (worm)         2805    4572
dmela (fly)          7518   25830
ecoli (bacterium)    1821    6849
hpylo (bacterium)     706    1414
hsapi (human)        9633   36386
mmusc (mouse)         290     254
scere (yeast)        5499   31898

Species pair   NSD (secs)   PDM (secs)   GM (secs)   IsoRank (secs)
celeg-dmela       3.15        152.12        7.29         783.48
celeg-hsapi       3.28        163.05        9.54        1209.28
celeg-scere       1.97        127.70        4.16         949.58
dmela-ecoli       1.86         86.80        4.78         807.93
dmela-hsapi       8.61        590.16       28.10        7840.00
dmela-scere       4.79        182.91       12.97        4905.00
ecoli-hsapi       2.41         79.23        4.76        2029.56
ecoli-scere       1.49         69.88        2.60        1264.24
hsapi-scere       6.09        181.17       15.56        6714.00

We computed the similarity matrices $X$ for various possible pairs of species using Protein-Protein Interaction (PPI) networks: $\mu = 0.80$, uniform initial conditions (outer product of suitably normalized 1's for each pair), 20 iterations, one component. Then we extracted node matches using PDM and GM.

3 orders of magnitude speedup of NSD-based approaches compared to IsoRank ones.

Parallelization: NSD has also been ported to parallel/distributed platforms:
We have aligned up to million-node graph instances using up to 3,072 cores in a supercomputer installation.
We have managed to process graph pairs of over a billion nodes and twenty billion edges each, over MapReduce-based platforms.
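The Greedy Matching (GM) step used to extract node pairs from $X$ can be sketched as follows. This is the textbook greedy heuristic for maximum-weight bipartite matching, assumed here to be what GM refers to; the function name and the small similarity matrix are made up for illustration.

```python
def greedy_matching(X):
    """Repeatedly pick the highest remaining score whose row and column
    are both still unmatched. Returns a list of (row, column) pairs."""
    entries = sorted(
        ((X[i][j], i, j) for i in range(len(X)) for j in range(len(X[0]))),
        reverse=True,
    )
    used_rows, used_cols, pairs = set(), set(), []
    for score, i, j in entries:
        if i not in used_rows and j not in used_cols:
            pairs.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return pairs


# Hypothetical 2 x 2 similarity matrix.
X = [[0.9, 0.8],
     [0.7, 0.1]]
pairs = greedy_matching(X)
total = sum(X[i][j] for i, j in pairs)
print(pairs, total)
```

On this matrix the greedy choice takes the 0.9 entry first and is then forced to take 0.1, for a total of 1.0, while the optimal matching (0.8 + 0.7) scores 1.5. The greedy total still stays above half of the optimum, consistent with the 1/2-approximation guarantee.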
