PageRank
CS16: Introduction to Data Structures & Algorithms Spring 2020
Outline
The WWW & Search Engines
Basic PageRank
(Real) PageRank
PageRank in practice
The World Wide Web
Created by Tim Berners-Lee in 1989
Hypertext
Growth of the Web
[Figure: number of websites per year, 1991 to 2020; vertical axis from 450,000,000 to 1,800,000,000]
[Figure: number of websites per year, 1991 to 2000; vertical axis from 3,000,000 to 18,000,000; labeled points: Google (2.4M), Yahoo (2K), Altavista (23K)]
Search Engines
[Slide illustration: the word "turtle" repeated many times]
PageRank
From Lecture 01
The PageRank Algorithm
pages it links to
The Basic PageRank Algorithm
$$PR(v) = \sum_{u \in \mathrm{in}(v)} \frac{PR(u)}{|\mathrm{out}(u)|}$$
Basic PageRank: Example 1
[Figure: graph on nodes A, B, C, D, E; for each round, the first row gives the node ranks and the second row the values on the edges, as shown on the slide]
Round 1
.2 .2 .2 .2 .2
.2 .2 .2 .2 .1 .1
Round 2
.1 .4 .1 .4
.1 .1 .4 .2 .2
Round 3
.2 .5 .2 .1
.2 .2 .1 .25 .25
Round 4
.25 .3 .25 .2
.25 .25 .2 .15 .15
Round 5
.15 .45 .15 .25
.15 .15 .25 .225 .225
Round 6
.225 .40 .225 .15
.225 .225 .15 .20 .20
Round 7
.20 .375 .20 .225
.20 .20 .225 .1875 .1875
Round 8
.1875 .425 .1875 .20
.1875 .1875 .20 .2125 .2125
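The rounds above can be replayed with a short Python sketch of the basic update rule. The edge set `EXAMPLE_1` is an assumption: the slide figure did not survive, so the edges below are a hypothetical graph chosen because it reproduces the listed ranks when the four nonzero values are read as belonging to A, B, C, D (with E at 0 after the first update). The function name `basic_pagerank` is also illustrative.

```python
from fractions import Fraction


def basic_pagerank(out_edges, rounds):
    """Apply PR(v) = sum over u in in(v) of PR(u)/|out(u)| for `rounds` rounds.

    Returns the list of rank dictionaries, starting with the initial ranks.
    """
    nodes = list(out_edges)
    rank = {v: Fraction(1, len(nodes)) for v in nodes}  # every node starts at 1/|V|
    history = [rank]
    for _ in range(rounds):
        new_rank = {v: Fraction(0) for v in nodes}
        for u, targets in out_edges.items():
            for v in targets:
                # u splits its previous rank evenly among its outgoing edges
                new_rank[v] += rank[u] / len(targets)
        rank = new_rank
        history.append(rank)
    return history


# Hypothetical edge set (an assumption, not read off the slide figure).
EXAMPLE_1 = {
    "A": ["D"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": ["B"],
    "E": ["D"],
}

history = basic_pagerank(EXAMPLE_1, rounds=7)
for i, rank in enumerate(history, start=1):
    print(f"Round {i}:", {v: float(r) for v, r in rank.items()})
```

Exact fractions are used so the printed values can be compared with the slide values without rounding noise.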
Basic PageRank
Observations
Basic PageRank
BasicPageRank(G, k):  # k is number of “rounds”
    for v in V:
        v.rank = 1/|V|
    for i from 1 to k:
        for v in V:
            v.prevrank = v.rank
        for v in V:
            v.rank = 0
            for u in v.incoming:
                v.rank = v.rank + u.prevrank/|u.outgoing|
Basic PageRank: Example 2
[Figure: graph on nodes A, B, C, D, E]
Basic PageRank on Example 2
Activity #1: k = 3
[Figure: graph on nodes A, B, C, D, E; for each round, the first row gives the node ranks and the second row the values on the edges, as shown on the slide]
Round 1
.2 .2 .2 .2 .2
.2 .1 .1 .2 .2
Round 2
.7 .3
.3
Round 3
1
What’s going on?
Basic PageRank: Example 2
Basic PageRank: Example 3
[Figure: graph on nodes A, C, B, D; successive slides show the node ranks, then the values on the edges]
.25 .25 .25 .25

1/4 1/4 1/4 1/4
1/8 1/4 1/8 1/4 1/4

1/8 1/2 3/8
1/8 1/2 3/8

1/2 1/2
1/2 1/2

1/2 1/2
1/2 1/2
Basic PageRank
Handling Rank Traps
What happens if the node is a sink?
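One way to explore this question is a small Python sketch, assuming the basic update rule from the BasicPageRank pseudocode and a hypothetical two-node graph in which B links to A and A has no outgoing edges. The names `basic_round` and `GRAPH` are illustrative. A sink passes nothing on, so the total rank shrinks each round.

```python
def basic_round(out_edges, rank):
    """One round of the basic update: a node's new rank is the sum of shares sent to it."""
    new_rank = {v: 0.0 for v in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            new_rank[v] += rank[u] / len(targets)
    return new_rank


# Hypothetical graph (an assumption): B links to A, and A is a sink.
GRAPH = {"A": [], "B": ["A"]}

rank = {"A": 0.5, "B": 0.5}
totals = [sum(rank.values())]
for _ in range(2):
    rank = basic_round(GRAPH, rank)
    totals.append(sum(rank.values()))

print(totals)  # [1.0, 0.5, 0.0]
```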
Disappearing PageRank
[Figure: nodes B and A, each with rank 1/2; the edge is labeled d/2; four values of (1-d)/4]
[Figure: nodes B and A, labeled (1-d)/2 and 1/2]
$$\frac{1-d}{2} + \frac{1}{2} = 1 - \frac{d}{2} < 1$$
Handling Sinks
The Real PageRank Algorithm
edges
the “evaporated” pageranks of all nodes
The Real PageRank Algorithm
$$
\begin{aligned}
PR(v) &= \left( \sum_{u \in \mathrm{in}(v)} d \cdot \frac{PR(u)}{|\mathrm{out}(u)|} \right) + \sum_{u \in V} (1 - d) \cdot \frac{PR(u)}{|V|} \\
&= \left( d \cdot \sum_{u \in \mathrm{in}(v)} \frac{PR(u)}{|\mathrm{out}(u)|} \right) + \frac{1 - d}{|V|} \cdot \underbrace{\sum_{u \in V} PR(u)}_{1} \\
&= \left( d \cdot \sum_{u \in \mathrm{in}(v)} \frac{PR(u)}{|\mathrm{out}(u)|} \right) + \frac{1 - d}{|V|} \\
&= \frac{1 - d}{|V|} + d \cdot \sum_{u \in \mathrm{in}(v)} \frac{PR(u)}{|\mathrm{out}(u)|}
\end{aligned}
$$
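The final form of the formula can be sketched in Python as below. This is a minimal sketch, not the course's reference implementation: the function name `pagerank`, the default `d=0.85`, and both test graphs are assumptions made for illustration. On a graph with no sinks the total rank stays at 1; on the hypothetical two-node graph with a sink, one round from the uniform start leaves a total of 1 - d/2.

```python
import math


def pagerank(out_edges, rounds, d=0.85):
    """Iterate PR(v) = (1 - d)/|V| + d * sum over u in in(v) of PR(u)/|out(u)|."""
    n = len(out_edges)
    rank = {v: 1 / n for v in out_edges}
    for _ in range(rounds):
        # every node receives the same (1 - d)/|V| share
        new_rank = {v: (1 - d) / n for v in out_edges}
        for u, targets in out_edges.items():
            for v in targets:
                # the damped part of u's rank is split among its outgoing edges
                new_rank[v] += d * rank[u] / len(targets)
        rank = new_rank
    return rank


cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}   # hypothetical graph with no sinks
with_sink = {"A": [], "B": ["A"]}              # hypothetical graph where A is a sink

cycle_total = sum(pagerank(cycle, rounds=10).values())
sink_total = sum(pagerank(with_sink, rounds=1, d=0.5).values())

print(math.isclose(cycle_total, 1.0))  # True
print(sink_total)                      # 0.75, which is 1 - d/2 for d = 0.5
```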
The Real PageRank
pageranks will stabilize
Alternative Sink Handling
$$PR(v) = \frac{1 - d}{|V|} + d \cdot \left( \sum_{u \in \mathrm{in}(v)} \frac{PR(u)}{|\mathrm{out}(u)|} + \sum_{u \in \mathrm{sinks}(G)} \frac{PR(u)}{|V|} \right)$$
PageRank requirements
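The alternative sink-handling formula can be sketched in Python as follows. This is a hedged illustration: the function name `pagerank_with_sinks`, the default `d=0.85`, and the two-node graph are assumptions. Each sink's rank is spread evenly over all nodes, so the total rank stays at 1 even when sinks are present.

```python
import math


def pagerank_with_sinks(out_edges, rounds, d=0.85):
    """Iterate PR(v) = (1-d)/|V| + d * (sum over in(v) of PR(u)/|out(u)| + sum over sinks of PR(u)/|V|)."""
    n = len(out_edges)
    rank = {v: 1 / n for v in out_edges}
    for _ in range(rounds):
        # rank held by sinks, divided evenly among all |V| nodes
        sink_share = sum(rank[u] for u, targets in out_edges.items() if not targets) / n
        new_rank = {v: (1 - d) / n + d * sink_share for v in out_edges}
        for u, targets in out_edges.items():
            for v in targets:
                new_rank[v] += d * rank[u] / len(targets)
        rank = new_rank
    return rank


with_sink = {"A": [], "B": ["A"]}  # hypothetical graph: B links to A, A is a sink

one_round = pagerank_with_sinks(with_sink, rounds=1, d=0.5)
print(one_round)  # {'A': 0.625, 'B': 0.375}

many_rounds = pagerank_with_sinks(with_sink, rounds=20)
print(math.isclose(sum(many_rounds.values()), 1.0))  # True
```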
PageRank in Practice
some variant
Other Applications of PageRank
Readings
Introduction to Poetry
Kleinberg has a great overview of PageRank