PageRank: Ranking of nodes in graphs

Gonzalo Mateos
Dept. of ECE and Goergen Institute for Data Science
University of Rochester
gmateosb@ece.rochester.edu
http://www.ece.rochester.edu/~gmateosb/
October 15, 2019

Introduction to Random Processes: Ranking of nodes in graphs

PageRank: Random walk

  Ranking of nodes in graphs: Random walk
  Ranking of nodes in graphs: Probability propagation

Graphs

[Figure: example directed graph with nodes 1 to 6]

◮ Graph ⇒ A set $V$ of vertices or nodes $j = 1, \dots, J$
  ⇒ Connected by a set of edges $E$ defined as ordered pairs $(i, j)$
◮ In figure ⇒ Nodes are $V = \{1, 2, 3, 4, 5, 6\}$
  ⇒ Edges $E = \{(1,2), (1,5), (2,3), (2,5), (3,4), (3,6), (4,5), (4,6), (5,4)\}$
◮ Ex. 1: Websites and hyperlinks ⇒ World Wide Web (WWW)
◮ Ex. 2: People and friendship ⇒ Social network

How well connected are the nodes?

[Figure: example directed graph with nodes 1 to 6]

◮ Q: Which node is the most connected? A: Define most connected
  ⇒ Can define “most connected” in different ways
◮ Two important connectivity indicators
  1) How many links point to a node (outgoing links irrelevant)
  2) How important are the links that point to a node
◮ Node rankings to measure website relevance, social influence

Connectivity ranking

◮ Key insight: There is information in the structure of the network
◮ Knowledge is distributed through the network
  ⇒ The network (not the nodes) knows the rankings
◮ Idea exploited by Google’s PageRank© to rank webpages
  ... by social scientists to study trust & reputation in social networks
  ... by ISI to rank scientific papers, transactions & magazines ...

[Figure: example directed graph with nodes 1 to 6]

◮ No one points to 1
◮ Only 1 points to 2
◮ Only 2 points to 3, but 2 is more important than 1
◮ 4 ranks as high as 5 with fewer links
◮ Links to 5 have lower rank
◮ Same for 6

Preliminary definitions

◮ Graph $G = (V, E)$ ⇒ vertices $V = \{1, 2, \dots, J\}$ and edges $E$

[Figure: example directed graph with nodes 1 to 6]

◮ Outgoing neighborhood of $i$ is the set of nodes $j$ to which $i$ points
    $n(i) := \{ j : (i, j) \in E \}$
◮ Incoming neighborhood $n^{-1}(i)$ is the set of nodes that point to $i$
    $n^{-1}(i) := \{ j : (j, i) \in E \}$
◮ Strongly connected $G$ ⇒ directed path joining any pair of nodes
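
The two neighborhood definitions can be sketched in a few lines of Python for the six-node example. The names `EDGES`, `out_neighborhood` and `in_neighborhood` are illustrative choices, not part of the slides; the edge list is the set $E$ given for the figure.

```python
# Edge set E of the six-node example graph, as ordered pairs (i, j).
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (3, 6), (4, 5), (4, 6), (5, 4)]


def out_neighborhood(i, edges):
    """n(i): the set of nodes j such that (i, j) is an edge."""
    return {j for (a, j) in edges if a == i}


def in_neighborhood(i, edges):
    """n^{-1}(i): the set of nodes j such that (j, i) is an edge."""
    return {j for (j, b) in edges if b == i}


if __name__ == "__main__":
    for node in range(1, 7):
        print(node, sorted(out_neighborhood(node, EDGES)),
              sorted(in_neighborhood(node, EDGES)))
```

Under these definitions node 6 has an empty outgoing neighborhood (a dead end) and node 1 has an empty incoming neighborhood, so the unmodified example graph is not strongly connected.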

Definition of rank

◮ Agent A chooses node $i$, e.g., web page, at random for initial visit
◮ Next visit randomly chosen between links in the neighborhood $n(i)$
  ⇒ All neighbors chosen with equal probability
◮ If the agent reaches a dead end because node $i$ has no neighbors
  ⇒ Choose next visit at random equiprobably among all nodes
◮ Redefine graph $G = (V, E)$ adding edges from dead ends to all nodes
  ⇒ Restrict attention to connected (modified) graphs

[Figure: example directed graph with nodes 1 to 6]

◮ Rank of node $i$ is the average number of visits of agent A to $i$

Equiprobable random walk

◮ Formally, let $A_n$ be the node visited at time $n$
◮ Define transition probability $P_{ij}$ from node $i$ into node $j$
    $P_{ij} := \mathrm{P}\left( A_{n+1} = j \mid A_n = i \right)$
◮ Next visit equiprobable among $i$’s $N_i := |n(i)|$ neighbors
    $P_{ij} = \frac{1}{|n(i)|} = \frac{1}{N_i}$, for all $j \in n(i)$

[Figure: example graph with edges labeled by transition probabilities: 1/2 on the edges leaving nodes 1 to 4, 1 on edge (5, 4), and 1/5 on the added edges from dead end 6 to each of nodes 1 to 5]

◮ Still have a graph
◮ But also a MC
◮ Red (not blue) circles
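
A minimal sketch of how the transition probabilities $P_{ij} = 1/N_i$ could be assembled for the example graph. The function name `transition_matrix` is hypothetical. One assumption is made explicit in the code: following the 1/5 labels in the figure, a dead end is connected to every other node (not to itself); connecting it to all $J$ nodes is an equally reasonable reading of the text.

```python
from fractions import Fraction

# Edge set E of the six-node example graph, as ordered pairs (i, j).
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (3, 6), (4, 5), (4, 6), (5, 4)]


def transition_matrix(edges, num_nodes):
    """Return P as a list of rows; P[i-1][j-1] = 1/N_i if j is in n(i), else 0."""
    nodes = range(1, num_nodes + 1)
    neighbors = {i: sorted(j for (a, j) in edges if a == i) for i in nodes}
    for i in nodes:
        if not neighbors[i]:
            # Assumption: a dead end jumps to each of the other nodes.
            neighbors[i] = [j for j in nodes if j != i]
    P = [[Fraction(0)] * num_nodes for _ in nodes]
    for i in nodes:
        for j in neighbors[i]:
            P[i - 1][j - 1] = Fraction(1, len(neighbors[i]))
    return P


if __name__ == "__main__":
    for row in transition_matrix(EDGES, 6):
        print([str(p) for p in row])
```

Exact fractions are used so that each row can be checked to sum to exactly 1, which is the defining property of a Markov chain transition matrix.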

Formal definition of rank

◮ Def: Rank $r_i$ of $i$-th node is the time average of number of visits
    $r_i := \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \mathbb{I}\{ A_m = i \}$
  ⇒ Define vector of ranks $r := [r_1, r_2, \dots, r_J]^T$
◮ Rank $r_i$ can be approximated by average $r_{ni}$ at time $n$
    $r_{ni} := \frac{1}{n} \sum_{m=1}^{n} \mathbb{I}\{ A_m = i \}$
  ⇒ Since $\lim_{n \to \infty} r_{ni} = r_i$, it holds $r_{ni} \approx r_i$ for $n$ sufficiently large
  ⇒ Define vector of approximate ranks $r_n := [r_{n1}, r_{n2}, \dots, r_{nJ}]^T$
◮ If modified graph is connected, rank independent of initial visit

Ranking algorithm

Output: Vector r(i) with ranking of node i
Input:  Scalar n indicating maximum number of iterations
Input:  Vector N(i) containing number of neighbors of i
Input:  Matrix N(i, j) containing indices j of neighbors of i

  m = 1; r = zeros(J,1);                  % Initialize time and ranks
  A_0 = random('unid', J);                % Draw first visit uniformly at random
  while m <= n do                         % n jumps, so that r / n is an average over n visits
    jump = random('unid', N(A_{m-1}));    % Neighbor uniformly at random
    A_m = N(A_{m-1}, jump);               % Jump to selected neighbor
    r(A_m) = r(A_m) + 1;                  % Update ranking for A_m
    m = m + 1;
  end
  r = r / n;                              % Normalize by number of iterations n
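
The pseudocode above can be sketched as runnable Python using only the standard library. This is an illustrative translation, not the author's implementation: `NEIGHBORS`, `random_walk_ranks` and the `seed` parameter are assumed names, and the neighbor lists encode the modified example graph under the assumption that dead end 6 jumps to each of nodes 1 to 5.

```python
import random

# Outgoing neighborhoods n(i) of the modified six-node example graph.
# Assumption: dead end 6 has been given edges to nodes 1..5.
NEIGHBORS = {1: [2, 5], 2: [3, 5], 3: [4, 6], 4: [5, 6], 5: [4], 6: [1, 2, 3, 4, 5]}


def random_walk_ranks(neighbors, n, seed=0):
    """Approximate ranks r_n: fraction of the n visits A_1..A_n spent at each node."""
    rng = random.Random(seed)                      # seeded for reproducibility
    nodes = sorted(neighbors)
    visits = {i: 0 for i in nodes}
    current = rng.choice(nodes)                    # A_0 drawn uniformly at random
    for _ in range(n):
        current = rng.choice(neighbors[current])   # jump to a neighbor uniformly at random
        visits[current] += 1                       # count the visit to A_m
    return [visits[i] / n for i in nodes]          # normalize by number of iterations


if __name__ == "__main__":
    for node, rank in enumerate(random_walk_ranks(NEIGHBORS, 100_000), start=1):
        print(node, round(rank, 4))
```

Because each of the `n` jumps is counted exactly once, the returned ranks sum to 1. Changing `seed` gives a different sample path; the estimates fluctuate from run to run.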

Social graph example

◮ Asked probability students about homework collaboration
◮ Created (crude) graph of the social network of students in the class
  ⇒ Used ranking algorithm to understand connectedness
◮ Ex: I want to know how well students are coping with the class
  ⇒ Best to ask people with higher connectivity ranking
◮ 2009 data from “UPenn’s ECE440”

Ranked class graph

[Figure: ranked graph of the class social network; the 40 nodes are labeled with student names]

Convergence metrics

◮ Recall $r$ is vector of ranks and $r_n$ of rank iterates
◮ By definition $\lim_{n \to \infty} r_n = r$. How fast does $r_n$ converge to $r$ ($r$ given)?
◮ Can measure by $\ell_2$ distance between $r$ and $r_n$
    $\zeta_n := \| r - r_n \|_2 = \left( \sum_{i=1}^{J} (r_{ni} - r_i)^2 \right)^{1/2}$
◮ If interest is only in highest ranked nodes, e.g., a web search
  ⇒ Denote $r_{(i)}$ as the index of the $i$-th highest ranked node
  ⇒ Let $r_{n(i)}$ be the index of the $i$-th highest ranked node at time $n$
◮ First element wrongly ranked at time $n$
    $\xi_n := \arg\min_i \{ r_{(i)} \neq r_{n(i)} \}$
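
A short sketch of how the two metrics could be computed from a reference rank vector `r` and an iterate `r_n`. The function names are illustrative. Two conventions below are assumptions not stated on the slide: ties are broken by node index, and when every position agrees (so the arg min is over an empty set) the function returns $J + 1$.

```python
import math


def l2_distance(r, r_n):
    """zeta_n: Euclidean distance between rank vector r and its iterate r_n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r_n, r)))


def ranking_order(r):
    """Node indices (0-based) sorted from highest to lowest rank; ties by index."""
    return sorted(range(len(r)), key=lambda i: (-r[i], i))


def first_wrongly_ranked(r, r_n):
    """xi_n: first position (1-based) where the two orderings disagree.

    Assumption: returns len(r) + 1 when the two orderings coincide.
    """
    for position, (a, b) in enumerate(zip(ranking_order(r), ranking_order(r_n)), start=1):
        if a != b:
            return position
    return len(r) + 1


if __name__ == "__main__":
    r = [0.5, 0.3, 0.2]
    r_n = [0.45, 0.2, 0.35]
    print(l2_distance(r, r_n), first_wrongly_ranked(r, r_n))
```

In the small usage example the top-ranked node agrees but the second and third are swapped, so the first wrongly ranked position is 2.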

Evaluation of convergence metrics

[Figure: “Distance”; $\ell_2$ distance on a logarithmic axis versus time (n), for n from 0 to 10000]
[Figure: “First element wrongly ranked”; number of correctly ranked nodes versus time (n), for n from 0 to 10000]

◮ Distance close to $10^{-2}$ in $\approx 5 \times 10^3$ iterations
◮ Bad: Two highest ranks in $\approx 4 \times 10^3$ iterations
◮ Awful: Six best ranks in $\approx 8 \times 10^3$ iterations
◮ (Very) slow convergence

When does this algorithm converge?

◮ Cannot confidently claim convergence until $10^5$ iterations
  ⇒ Beyond particular case, slow convergence inherent to algorithm

[Figure: number of correctly ranked nodes (0 to 40) versus time (n), for n from 0 to $2 \times 10^5$]

◮ Example has 40 nodes, want to use in network with $10^9$ nodes!
  ⇒ Leverage properties of MCs to obtain a faster algorithm

PageRank: Probability propagation

  Ranking of nodes in graphs: Random walk
  Ranking of nodes in graphs: Probability propagation

Limit probabilities

◮ Recall definition of rank ⇒ $r_i := \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \mathbb{I}\{ A_m = i \}$
◮ Rank is time average of number of state visits in a MC
  ⇒ Can be as well obtained from limiting probabilities
◮ Recall transition probabilities ⇒ $P_{ij} = \frac{1}{N_i}$, for all $j \in n(i)$
◮ Stationary distribution $\pi = [\pi_1, \pi_2, \dots, \pi_J]^T$ solution of
    $\pi_i = \sum_{j \in n^{-1}(i)} P_{ji} \pi_j = \sum_{j \in n^{-1}(i)} \frac{\pi_j}{N_j}$ for all $i$
  ⇒ Plus normalization equation $\sum_{i=1}^{J} \pi_i = 1$
◮ As per ergodicity of MC (strongly connected $G$) ⇒ $r = \pi$
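
As a worked example, the stationary equations can be written out for the six-node graph. This is a sketch under one assumption: dead end 6 has been given edges to each of nodes 1 to 5 with probability 1/5 each, as the figure labels suggest.

```latex
% Balance equations \pi_i = \sum_{j \in n^{-1}(i)} \pi_j / N_j for the modified example graph
\begin{align*}
\pi_1 &= \tfrac{1}{5}\pi_6, \\
\pi_2 &= \tfrac{1}{2}\pi_1 + \tfrac{1}{5}\pi_6, \\
\pi_3 &= \tfrac{1}{2}\pi_2 + \tfrac{1}{5}\pi_6, \\
\pi_4 &= \tfrac{1}{2}\pi_3 + \pi_5 + \tfrac{1}{5}\pi_6, \\
\pi_5 &= \tfrac{1}{2}\pi_1 + \tfrac{1}{2}\pi_2 + \tfrac{1}{2}\pi_4 + \tfrac{1}{5}\pi_6, \\
\pi_6 &= \tfrac{1}{2}\pi_3 + \tfrac{1}{2}\pi_4 .
\end{align*}
% Set \pi_1 = a. Then \pi_6 = 5a, \pi_2 = 3a/2, \pi_3 = 7a/4,
% \pi_4 = 2(5a - 7a/8) = 33a/4 and \pi_5 = 33a/4 - 7a/8 - a = 51a/8.
% Normalization: (8 + 12 + 14 + 66 + 51 + 40)\,a/8 = 191a/8 = 1, so a = 8/191.
\[
\pi = \tfrac{1}{191}\,[\,8,\ 12,\ 14,\ 66,\ 51,\ 40\,]^T
\]
```

Under this assumption node 4 ranks highest, followed by 5, 6, 3, 2 and 1, which agrees with the qualitative observations made about the figure (4 ranks as high as 5 despite having fewer incoming links).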

Matrix notation, eigenvalue problem

◮ As always, can define matrix $P$ with elements $P_{ij}$
    $\pi_i = \sum_{j \in n^{-1}(i)} P_{ji} \pi_j = \sum_{j=1}^{J} P_{ji} \pi_j$ for all $i$
◮ Right hand side is just definition of a matrix product leading to
    $\pi = P^T \pi$, $\pi^T \mathbf{1} = 1$
  ⇒ Also added normalization equation
◮ Idea: solve system of linear equations or eigenvalue problem on $P^T$
  ⇒ Requires matrix $P$ available at a central location
  ⇒ Computationally costly (sparse matrix $P$ with $10^{18}$ entries)
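
For a graph small enough to hold $P$ in one place, the linear system $\pi = P^T \pi$, $\pi^T \mathbf{1} = 1$ can be solved directly. The sketch below uses exact Gauss-Jordan elimination with standard-library fractions; `stationary_distribution` and `NEIGHBORS` are illustrative names, and the neighbor lists assume that dead end 6 jumps to each of nodes 1 to 5. One balance equation is redundant (the rows of $P^T - I$ sum to zero), so it is replaced by the normalization equation.

```python
from fractions import Fraction

# Outgoing neighborhoods of the modified example graph (assumption: 6 -> 1..5).
NEIGHBORS = {1: [2, 5], 2: [3, 5], 3: [4, 6], 4: [5, 6], 5: [4], 6: [1, 2, 3, 4, 5]}
P = [[Fraction(1, len(NEIGHBORS[i])) if j in NEIGHBORS[i] else Fraction(0)
      for j in range(1, 7)] for i in range(1, 7)]


def stationary_distribution(P):
    """Solve (P^T - I) pi = 0 together with sum(pi) = 1 by Gauss-Jordan elimination."""
    J = len(P)
    # Augmented matrix [P^T - I | 0]; row i encodes the balance equation for pi_i.
    M = [[P[j][i] - (1 if i == j else 0) for j in range(J)] + [Fraction(0)]
         for i in range(J)]
    # Replace the last (redundant) balance equation by the normalization equation.
    M[-1] = [Fraction(1)] * J + [Fraction(1)]
    for col in range(J):
        pivot = next(row for row in range(col, J) if M[row][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]          # scale pivot row to 1
        for row in range(J):
            if row != col and M[row][col] != 0:
                factor = M[row][col]
                M[row] = [x - factor * y for x, y in zip(M[row], M[col])]
    return [M[i][J] for i in range(J)]


if __name__ == "__main__":
    print([str(p) for p in stationary_distribution(P)])
```

For the example this yields $\pi = [8, 12, 14, 66, 51, 40]^T / 191$. Dense elimination costs on the order of $J^3$ operations and needs all of $P$ at once, which illustrates why this route is not viable for a web-scale graph.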
