Damping Effect on PageRank Distribution

IEEE High Performance Extreme Computing, Waltham, MA, USA, September 26, 2018
Tiancheng Liu, Yuchen Qian, Xi Chen, Xiaobai Sun
Department of Computer Science, Duke University, USA

Outline
⋄ Personalized PageRank model: invention by Brin and Page (1998), in need of innovative extension
⋄ The PageRank model family: an analytic apparatus with increased descriptive power and scope
⋄ Analysis: damping effects on PageRank distributions
⋄ Algorithm: exploiting structures of the personalized, stochastic Krylov (PSK) space
⋄ Findings: by experiments on real-world network data

Sparse graphs in sparse matrix representations
⋄ Link graph $G(V, E)$: directed edge $(u, v) \in E$
⋄ Adjacency matrix $A$: $A(v, u) = 1$ for each $(u, v) \in E$; $d_{\text{in}}$: in-degrees, $d_{\text{out}}$: out-degrees
⋄ Probability transition matrix $P = A \cdot \operatorname{diag}(1 ./ d_{\text{out}})$, kept in factor form in storage
[Figure: a 20-node example ($x_1, \dots, x_{20}$) shown as link graph, adjacency matrix $A$, and probability transition matrix $P$.]
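The factor form $P = A \cdot \operatorname{diag}(1 ./ d_{\text{out}})$ can be sketched in a few lines. This is only an illustration under stated assumptions: the 4-node edge list is hypothetical, the helper name `apply_P` is not from the slides, and every node is assumed to have at least one out-link.

```python
import numpy as np

# Hypothetical 4-node link graph as directed edges (u, v): u links to v.
edges = np.array([(0, 1), (0, 2), (1, 2), (2, 0), (3, 2), (3, 0)])
n = 4
src, dst = edges[:, 0], edges[:, 1]

# Out-degree of each node: number of links leaving it.
d_out = np.bincount(src, minlength=n).astype(float)

def apply_P(x):
    """Compute P @ x in factor form, without ever forming P.

    A(v, u) = 1 for each edge (u, v), so
    (P x)[v] = sum over edges u -> v of x[u] / d_out[u].
    Assumes no dangling nodes (d_out > 0 everywhere).
    """
    y = np.zeros(n)
    np.add.at(y, dst, x[src] / d_out[src])  # scatter-add along the edges
    return y

# Dense P, built only to compare against the factor form.
A = np.zeros((n, n))
A[dst, src] = 1.0
P = A / d_out  # broadcasting divides column u by d_out[u]

x = np.array([0.1, 0.2, 0.3, 0.4])
print(apply_P(x), P @ x)
```

Keeping only the edge list and `d_out` stores one integer pair per link plus one float per node, which is the point of the factor form for large sparse graphs.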

Precursor: Personalized PageRank
Web surfing modeled as a random walk on $M_\alpha(v)$, a Markov chain with a personalized term $S$:
$$M_\alpha(v) = \alpha P + (1 - \alpha) S, \qquad S = v e^T$$
with damping factor $\alpha$, link graph $P$, personalized vector $v$, and gathering vector $e^T$.
⋄ Bernoulli decision at each click: follow $P$-links with probability $\alpha \in (0, 1)$, a.k.a. the damping factor, or $S$-links with probability $1 - \alpha$
⋄ The personalized term $S$: direct links to $v$-nodes (yellow); gathering/broadcasting; rank-1, stochastic
[Figure: the personalized Markov chain as the link graph weighted by $\alpha$ plus the personalized direct links weighted by $1 - \alpha$, in graph form and in matrix form; the matrix example uses weights 0.85 and 0.15.]
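A minimal sketch of the personalized Markov chain and of one Bernoulli click follows. The 4-node matrix `P`, the vector `v`, and the helper `click` are hypothetical examples, not taken from the slides; $\alpha = 0.85$ matches the weights shown in the figure.

```python
import numpy as np

# Hypothetical column-stochastic link matrix: column u holds the
# probabilities of moving from node u to each node.
P = np.array([[0.0, 0.0, 1.0, 0.5],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])
n = P.shape[0]
v = np.array([0.5, 0.0, 0.0, 0.5])   # personalized vector (assumed)
alpha = 0.85                          # damping factor

e = np.ones(n)                        # gathering vector
S = np.outer(v, e)                    # S = v e^T: every column equals v
M = alpha * P + (1 - alpha) * S       # personalized Markov chain

def click(u, rng):
    """One click from node u: a Bernoulli choice between P-links and S-links."""
    if rng.random() < alpha:
        return int(rng.choice(n, p=P[:, u]))  # follow a link of the graph
    return int(rng.choice(n, p=v))            # jump to a v-node

rng = np.random.default_rng(0)
walk = [0]
for _ in range(10):
    walk.append(click(walk[-1], rng))
print(walk)
```

Because every column of `S` is the same vector `v`, the matrix has rank 1, and `M` stays column-stochastic for any $\alpha \in (0, 1)$.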

Equivalent expressions of PageRank distribution vector
Purpose: multi-aspect investigation for interpretation and computational analysis
1. Steady state distribution of $M_\alpha$: $M_\alpha x = x$, reached by the power method, $\left[\alpha P + (1 - \alpha) v e^T\right]^k x_0 \to x$. Asymptotic walk on $M_\alpha$, memoryless of $x_0$.
2. Solution to sparse linear system: $(I - \alpha P)\, x = (1 - \alpha)\, v$. Many iterative solution methods.
3. Explicit representation in Neumann series with $P$, $v$, $\alpha$: $x = (1 - \alpha) \sum_k \alpha^k (P^k v)$. Cumulative propagation of $v$ on $P$.
4. Differential transition equation: $\dot{x}(\alpha) = \left[P (I - \alpha P)^{-1} - (1 - \alpha)^{-1} I\right] x(\alpha)$. Spectrum-based method.
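The four expressions can be checked against each other numerically. The sketch below assumes a small hypothetical column-stochastic `P` and personalized vector `v`; the dense solver, the iteration counts, and the finite-difference step are illustrative choices, not the methods used in the talk.

```python
import numpy as np

# Hypothetical 4-node example (column-stochastic P, stochastic v).
P = np.array([[0.0, 0.0, 1.0, 0.5],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])
v = np.array([0.5, 0.0, 0.0, 0.5])
alpha = 0.85
I = np.eye(4)

def pagerank(a):
    """Expression 2: solve (I - a P) x = (1 - a) v (dense solve for brevity)."""
    return np.linalg.solve(I - a * P, (1 - a) * v)

# Expression 1: power method on M_alpha, starting from a uniform x0.
M = alpha * P + (1 - alpha) * np.outer(v, np.ones(4))
x_power = np.full(4, 0.25)
for _ in range(300):
    x_power = M @ x_power

x_solve = pagerank(alpha)

# Expression 3: truncated Neumann series (1 - alpha) * sum_k alpha^k P^k v.
x_series = np.zeros(4)
term = (1 - alpha) * v
for _ in range(300):
    x_series += term
    term = alpha * (P @ term)

# Expression 4: derivative with respect to alpha, checked against a
# central finite difference of the linear-system solution.
x_dot = (P @ np.linalg.inv(I - alpha * P) - I / (1 - alpha)) @ x_solve
h = 1e-6
x_dot_fd = (pagerank(alpha + h) - pagerank(alpha - h)) / (2 * h)

print(x_power, x_solve, x_series)
print(x_dot, x_dot_fd)
```

The derivative formula follows from differentiating $x(\alpha) = (1 - \alpha)(I - \alpha P)^{-1} v$ and using that $P$ commutes with $(I - \alpha P)^{-1}$.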


PageRank model family: characterizing various propagation patterns
Model description in equivalent expressions:
⋄ Propagation kernel functions: propagation patterns
⋄ Cumulative propagation on $P$
⋄ Linear systems
⋄ Differential transitions: PageRank distribution response to damping variation
[Figure: a few particular subfamilies of propagation kernel functions, plotted over steps 0 to 20. Panels: geometric kernels (Brin-Page); Poisson kernels (Chung); Conway-Maxwell-Poisson kernels (slow); Conway-Maxwell-Poisson kernels (fast); Negative Binomial kernels; logarithmic kernels.]

Propagation kernel functions
Propagation kernel function $f_\rho(\lambda)$, with graph eigenvalue $\lambda$ and discrete pmf $\{w_k(\rho)\}$:
$$f_\rho(\lambda) = \sum_k w_k(\rho)\, \lambda^k$$
PageRank vector (model solution) with particular network $P$ and personalized distribution vector $v$:
$$x = f_\rho(P)\, v = \sum_k w_k(\rho) \cdot P^k v$$
where $w_k(\rho)$ is the damping on the $k$-th step and $P^k v$ is the $k$-th step of propagation.
$\{w_k(\rho)\}$: any probability mass function (pmf) with variable $\rho$, with or without additional parameters.
[Figure: PageRank distributions of 3 propagation patterns with $P$ for the link graph Twitter (www) (H. Kwak et al., 2009); histograms of number of nodes (bin counts).]
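As a sketch of $x = \sum_k w_k(\rho)\, P^k v$, the function below accumulates the weighted propagation steps for any truncated weight sequence. The name `kernel_pagerank`, the 4-node example, and the truncation length are assumptions made for illustration.

```python
import numpy as np

def kernel_pagerank(P, v, w):
    """Return sum_k w[k] * P^k v for a finite (truncated) weight sequence w."""
    x = np.zeros(len(v))
    p = np.asarray(v, dtype=float).copy()  # p holds P^k v, starting at k = 0
    for wk in w:
        x += wk * p      # damping on the k-th step
        p = P @ p        # advance to the (k+1)-th propagation step
    return x

# Hypothetical 4-node example (column-stochastic P, stochastic v).
P = np.array([[0.0, 0.0, 1.0, 0.5],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])
v = np.array([0.5, 0.0, 0.0, 0.5])

# Geometric weights w_k = (1 - alpha) alpha^k, truncated at 300 terms.
alpha = 0.85
w_geo = (1 - alpha) * alpha ** np.arange(300)
x_geo = kernel_pagerank(P, v, w_geo)

# With geometric weights the result should match the linear-system form.
x_ref = np.linalg.solve(np.eye(4) - alpha * P, (1 - alpha) * v)
print(x_geo, x_ref)
```

Swapping `w_geo` for another pmf changes the propagation pattern without touching the propagation loop, which is the sense in which the kernel separates damping from the graph.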

Propagation pattern kernels: CMP sub-family
Conway-Maxwell-Poisson (CMP):
$$w_k(\rho, \nu) = \frac{\rho^k}{(k!)^{\nu}\, Z}$$
with damping variable $\rho$, damping speed $\nu$, and normalization constant $Z$.
Damping speed parameter $\nu \ge 0$, including the Brin-Page model and Chung's model:
$$\nu \begin{cases} = 0, & \text{geometric (Brin-Page, 1998)} \\ = 1, & \text{Poisson (Chung, 2007)} \\ < 1, & \text{slow decaying with } k \\ > 1, & \text{fast decaying with } k \end{cases}$$
[Figure: slow and fast propagation patterns of the CMP distribution. Panels: slow damping speed, $0 \le \nu \le 1$ ($\rho = 0.9$); fast damping speed, $\nu \ge 1$ ($\rho = 5$).]
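A short sketch of the CMP weights, computed in log space and normalized over a finite number of steps. The function name `cmp_weights` and the truncation length `K` are assumptions; the truncated sum stands in for the normalization constant $Z$.

```python
import math
import numpy as np

def cmp_weights(rho, nu, K=400):
    """Truncated Conway-Maxwell-Poisson weights w_k for k = 0..K-1.

    log w_k = k log(rho) - nu * log(k!), then normalized so the
    truncated weights sum to one. For nu = 0 this needs rho < 1.
    """
    k = np.arange(K)
    log_fact = np.array([math.lgamma(i + 1) for i in range(K)])  # log(k!)
    log_w = k * np.log(rho) - nu * log_fact
    w = np.exp(log_w - log_w.max())  # shift for numerical safety
    return w / w.sum()

w_geometric = cmp_weights(0.9, 0.0)   # nu = 0: geometric decay
w_poisson = cmp_weights(5.0, 1.0)     # nu = 1: Poisson with mean 5
w_slow = cmp_weights(0.9, 0.5)        # 0 < nu < 1: slow damping
w_fast = cmp_weights(5.0, 2.0)        # nu > 1: fast damping
print(w_geometric[:5], w_poisson[:5])
```

The two special cases can be compared with the closed forms $(1 - \rho)\rho^k$ and $e^{-\rho}\rho^k / k!$ as a sanity check.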

Propagation pattern kernels: NB sub-family
Negative Binomial (NB), at step $k$:
$$w_k(\rho, r) = \binom{k + r - 1}{k} \rho^k (1 - \rho)^r$$
with distribution shape $r$ and damping variable $\rho$.
Distribution shape parameter $r$:
$$r = \begin{cases} 1, & \text{geometric distribution} \\ \infty, & \text{Poisson distribution, with } r \cdot \rho / (1 - \rho) = \text{const} \end{cases}$$
[Figure: propagation patterns of the NB distribution.]
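The two limiting cases of the NB kernel can be checked numerically. In this sketch the function name `nb_weights` is hypothetical, and the Poisson limit is taken under the assumption that the mean $\mu = r\rho/(1 - \rho)$ is held fixed while $r$ grows.

```python
import math
import numpy as np

def nb_weights(rho, r, K=200):
    """Negative Binomial weights w_k = C(k+r-1, k) rho^k (1-rho)^r, k = 0..K-1.

    The binomial coefficient is evaluated through log-gamma so that
    non-integer and very large r are handled.
    """
    k = np.arange(K)
    log_binom = np.array([math.lgamma(i + r) - math.lgamma(i + 1) - math.lgamma(r)
                          for i in range(K)])
    return np.exp(log_binom + k * np.log(rho) + r * np.log1p(-rho))

# r = 1 reduces to the geometric distribution (1 - rho) rho^k.
rho = 0.85
w_r1 = nb_weights(rho, 1.0)

# Large r with fixed mean mu = r rho / (1 - rho) approaches Poisson(mu).
mu, r_big = 3.0, 1e6
w_big = nb_weights(mu / (r_big + mu), r_big)
w_poisson = np.array([math.exp(-mu) * mu ** k / math.factorial(k) for k in range(20)])
print(w_r1[:5], w_big[:5], w_poisson[:5])
```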

Propagation pattern kernels: logarithmic distribution
Logarithmic, at step $k \ge 1$:
$$w_k(\rho) = \frac{-1}{\ln(1 - \rho)} \cdot \frac{\rho^k}{k}, \qquad \rho \in (0, 1)$$
Unique new model in the model family:
⋄ weight decays faster than the geometric distribution
⋄ weight decays slower than the Poisson distribution
⋄ no extra control parameters
[Figure: propagation patterns of logarithmic distributions.]

Propagation pattern kernels: precursor models and new model
Precursor models:
⋄ Brin-Page model (L. Page and S. Brin, 1998), geometric distribution: $w_k(\alpha) = (1 - \alpha)\, \alpha^k$
⋄ Chung's model (F. Chung, PNAS, 2007), Poisson distribution: $w_k(\beta) = e^{-\beta}\, \dfrac{\beta^k}{k!}$
New model in the family:
⋄ log-$\gamma$ model, logarithmic distribution: $w_k(\gamma) = \dfrac{-1}{\ln(1 - \gamma)} \cdot \dfrac{\gamma^k}{k}$
[Figure: one panel of kernel weights over steps 0 to 20 for each of the three models.]

Cumulative propagation on $P$
Link graph $P$ and personalized vector $v$; propagation on $P$ generates $v, Pv, P^2 v, \dots, P^{m-1} v$.
⋄ Geometric kernel (Brin-Page): $x(\alpha) = z_\alpha \sum_k \alpha^k\, P^k v$
⋄ Poisson kernel (Chung): $x(\beta) = z_\beta \sum_k \dfrac{\beta^k}{k!}\, P^k v$
⋄ Logarithmic kernel (log-$\gamma$): $x(\gamma) = z_\gamma \sum_k \dfrac{\gamma^k}{k}\, P^k v$
Here $z_\alpha$, $z_\beta$, $z_\gamma$ are the normalization factors of the respective kernels.
[Figure: for each kernel, the kernel weights over steps 0 to 20 and the resulting distribution, together with the link graph $P$ and the propagated vectors.]
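Since all three kernels weight the same propagated vectors, the sequence $v, Pv, \dots, P^{m-1} v$ can be built once and reused. The sketch below does this on a hypothetical 4-node graph; the parameter values, the truncation length `m`, and the normalization by the truncated weight sum (standing in for $z_\alpha$, $z_\beta$, $z_\gamma$) are assumptions.

```python
import math
import numpy as np

# Hypothetical 4-node example (column-stochastic P, stochastic v).
P = np.array([[0.0, 0.0, 1.0, 0.5],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])
v = np.array([0.5, 0.0, 0.0, 0.5])

# Propagation vectors v, Pv, ..., P^(m-1) v, stored as columns.
m = 150
V = np.empty((4, m))
V[:, 0] = v
for j in range(1, m):
    V[:, j] = P @ V[:, j - 1]

k = np.arange(m)
alpha, beta, gamma = 0.85, 5.0, 0.85

w_geo = alpha ** k                                        # alpha^k
log_fact = np.array([math.lgamma(i + 1) for i in range(m)])
w_poi = np.exp(k * np.log(beta) - log_fact)               # beta^k / k!
w_log = np.zeros(m)
w_log[1:] = gamma ** k[1:] / k[1:]                        # gamma^k / k, k >= 1

def combine(w):
    """Weighted sum of the stored propagation vectors, weights normalized to 1."""
    return V @ (w / w.sum())

x_alpha, x_beta, x_gamma = combine(w_geo), combine(w_poi), combine(w_log)
print(x_alpha, x_beta, x_gamma)
```

Storing the propagated vectors costs `m` vectors of memory, in exchange for evaluating any number of kernels without further products with `P`.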
