What is this Page Known for? Computing Web Page Reputations Davood - PowerPoint PPT Presentation

What is this Page Known for? Computing Web Page Reputations
Davood Rafiei, Alberto Mendelzon
University of Toronto

Introduction
• Ranking plays an important role in searching the Web.
• But importance is a subjective measure: a high-quality page in computer graphics is not necessarily a high-quality page in databases.
• How do search engines address this problem?

Simple Importance Ranking
• Rank by in-degree:
  – used in citation analysis (1970s);
  – idea: important journals are frequently cited by other journals.
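In-degree ranking amounts to counting incoming links. The sketch below assumes the link graph is given as a list of (source, target) pairs; the function name and the toy graph are hypothetical and only illustrate the counting.

```python
from collections import Counter

def rank_by_in_degree(links):
    """Rank pages by in-degree (number of incoming links).

    links: iterable of (source, target) pairs, one per hyperlink.
    Returns (page, in_degree) pairs, most-linked first; ties broken by name.
    """
    links = list(links)  # iterated twice below, so materialize generators
    in_degree = Counter(target for _, target in links)
    for source, _ in links:
        in_degree.setdefault(source, 0)  # pages nobody links to get 0
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy graph.
links = [("u", "x"), ("v", "x"), ("y", "x"), ("u", "v"), ("x", "y")]
print(rank_by_in_degree(links))
# [('x', 3), ('v', 1), ('y', 1), ('u', 0)]
```

Note that every citation counts the same here, which is exactly the weakness PageRank addresses.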

Importance Ranking: PageRank
• The rank of a page depends not only on the number of its incoming links, but also on the ranks of the pages those links come from.
• Adopted by the Google search engine: high-ranked pages are returned first.
• Limitation: each page is assigned a single universal rank, independent of its topic.
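A common way to compute PageRank is power iteration. The sketch below follows the random-surfer reading used in this talk: d is the probability of a random jump and 1-d the probability of following a link. The default d = 0.15, the convergence tolerance, and the no-dangling-pages assumption are choices made for this illustration, not details from the slides.

```python
def pagerank(out_links, d=0.15, tol=1e-12, max_iter=1000):
    """Power-iteration sketch of PageRank.

    out_links: dict mapping every page to the list of pages it links to.
      Assumes every page has at least one outgoing link and every link
      target is itself a key of out_links.
    d: probability of a random jump to a uniformly chosen page; with
      probability 1-d the surfer follows an outgoing link instead.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Random-jump share, spread evenly over all pages.
        new_rank = {p: d / n for p in pages}
        # Link-following share: each page splits its rank among its out-links.
        for q, targets in out_links.items():
            share = (1 - d) * rank[q] / len(targets)
            for p in targets:
                new_rank[p] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical graph: a and b both link to c; c links back to both.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a", "b"]})
print(max(ranks, key=ranks.get))  # c, the page both others link to
```

The rank is a single number per page; nothing in the computation depends on what the page is about, which is the limitation stated on the slide.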

Our Goal
[Figure: a search engine maps a topic to pages; our system maps a page to its topics.]

Example
What is the page sunsite.unc.edu/javafaq/javafaq.html good for?
• Java FAQ
• comp.lang.java FAQ
• Java Tutorials
• Java Stuff

The Idea
[Figure: pages labeled "search engines compared", "my favorite search engines" and "a review of search engines" all link to page p.]
What can we say about the content of page p?

Random Walk Model 1
• Imagine a user searching for pages on topic t.
• At each step, the user either
  – jumps to a page on topic t chosen uniformly at random, or
  – follows an outgoing link of the current page.
• The one-level rank of a page on topic t is the long-run fraction of the user's visits that land on that page as the walk goes on forever.

Random Walk Model 1
• d: the fraction of steps in which the user makes a random jump.
• 1-d: the fraction of steps in which the user follows a link.
• N_t: the number of pages on topic t.
• R^n(p,t): the probability of visiting page p for topic t at step n.

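Probability of Visiting a Page
$$
R^n(p,t) = (1-d) \sum_{q \to p} \frac{R^{n-1}(q,t)}{O(q)} +
\begin{cases}
  \dfrac{d}{N_t} & \text{if page } p \text{ is on topic } t \\
  0 & \text{otherwise}
\end{cases}
$$
where the sum ranges over the pages q that link to p, and O(q) is the number of outgoing links of page q.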
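Because the rank is defined by a random walk, it can be estimated by simply simulating the walk and counting visits. The sketch below is an illustration under stated assumptions (every page has at least one out-link, every link target appears as a key, d = 0.15, a fixed random seed); it is not how the authors compute ranks, which they do iteratively.

```python
import random
from collections import Counter

def simulate_one_level(out_links, on_topic, d=0.15, steps=200_000, seed=0):
    """Estimate one-level ranks on topic t by simulating Random Walk Model 1.

    out_links: dict page -> list of pages it links to (assumed non-empty
      for every page; every link target is assumed to be a key).
    on_topic: list of the N_t pages on topic t, the targets of random jumps.
    d: probability of a random jump; 1-d is the probability of following a link.
    Returns page -> fraction of steps that visited the page.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    visits = Counter()
    current = rng.choice(on_topic)
    for _ in range(steps):
        if rng.random() < d:
            current = rng.choice(on_topic)            # jump to a random on-topic page
        else:
            current = rng.choice(out_links[current])  # follow a random out-link
        visits[current] += 1
    return {page: visits[page] / steps for page in out_links}


# Hypothetical 3-page cycle a -> b -> c -> a where only page a is on topic t.
estimate = simulate_one_level({"a": ["b"], "b": ["c"], "c": ["a"]}, on_topic=["a"])
print(estimate)
```

For this graph the equation above has the closed-form fixed point R(a,t) = d / (1 - (1-d)^3) ≈ 0.389, with R(b,t) = (1-d) R(a,t) and R(c,t) = (1-d) R(b,t); the simulated fractions should land close to these values.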

Second Scenario
[Figure: pages labeled "search engines compared", "my favorite search engines" and "a review of search engines" link to page p; labels distinguish a good source of links (hub) from good content (authority).]

Random Walk Model 2
• Imagine that at each step the user does one of the following:
  1. jumps to a page on topic t chosen uniformly at random;
  2. follows an outgoing link of the current page (forward visit);
  3. jumps to a page that points to the current page (backward visit).
• The walk strictly alternates between forward visits (step 2) and backward visits (step 3).
• The long-run fraction of forward (backward) visits the user makes into a page, as the walk goes on forever, is its authority (hub) rank on topic t.

Random Walk Model 2
• d, 1-d, N_t: defined as in Model 1.
• A^n(p,t): the probability of a forward visit into page p at step n.
• H^n(p,t): the probability of a backward visit into page p at step n.

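Probability of Visiting a Page
$$
A^n(p,t) = (1-d) \sum_{q \to p} \frac{H^{n-1}(q,t)}{O(q)} +
\begin{cases}
  \dfrac{d}{2N_t} & \text{if page } p \text{ is on topic } t \\
  0 & \text{otherwise}
\end{cases}
$$
$$
H^n(p,t) = (1-d) \sum_{p \to q} \frac{A^{n-1}(q,t)}{I(q)} +
\begin{cases}
  \dfrac{d}{2N_t} & \text{if page } p \text{ is on topic } t \\
  0 & \text{otherwise}
\end{cases}
$$
where O(q) and I(q) are the numbers of outgoing and incoming links of page q.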

Rank Computation
• Done using iterative methods.
• First iteration:
  – topics are extracted from the content of pages;
  – ranks are initialized.
• Next iterations:
  – ranks are propagated through hyperlinks.

Rank Approximation
• A given page p can acquire a high rank on an arbitrarily chosen topic t if
  – page p is on topic t,
  – p can be reached within a few steps from a large fraction of pages on topic t, or
  – p can be reached within a few steps from pages with high reputations on topic t.
• An approximate algorithm examines only page p and the pages not far away from p.
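One way to pick the pages "not far away" from p is a breadth-first search over incoming links, capped at a hypothetical max_steps hops. The slides do not say how far the approximate algorithm looks, so both the cap and the in-link dictionary format below are assumptions.

```python
from collections import deque

def backward_neighborhood(in_links, page, max_steps):
    """Collect pages that can reach `page` in at most `max_steps` link hops.

    in_links: dict page -> list of pages linking to it (hypothetical format).
    Returns dict page -> number of hops needed to reach `page`.
    """
    dist = {page: 0}
    queue = deque([page])
    while queue:
        p = queue.popleft()
        if dist[p] == max_steps:
            continue  # do not expand beyond the hop limit
        for q in in_links.get(p, []):
            if q not in dist:  # first visit is the shortest distance in BFS
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


in_links = {"p": ["a", "b"], "a": ["c"], "c": ["e"], "b": []}
print(backward_neighborhood(in_links, "p", 2))
# {'p': 0, 'a': 1, 'b': 1, 'c': 2}
```

Rank propagation could then be restricted to the returned pages, trading accuracy for far less work than a Web-wide iteration.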

Computing One-Level Reputation

For every page p and term t:
    R(p,t) = 1/N_t  if term t appears in page p
    R(p,t) = 0      otherwise
While R has not converged:
    R1(p,t) = 0 for every page p and term t
    For every link q → p and every term t:
        R1(p,t) += R(q,t) / O(q)
    R(p,t) = (1-d) R1(p,t) for every page p and term t
    R(p,t) += d/N_t if term t appears in page p
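The pseudocode above can be turned into a small runnable sketch. It assumes the whole link graph fits in memory, that N_t is the number of pages in `page_terms` containing t, and that d = 0.15 with a max-change convergence test; these are illustrative choices, not the authors' settings.

```python
from collections import Counter, defaultdict

def one_level_reputation(links, page_terms, d=0.15, tol=1e-10, max_iter=1000):
    """Iterative one-level reputation, following the slide's pseudocode.

    links: list of (q, p) pairs, one per hyperlink q -> p.
    page_terms: dict page -> set of terms appearing in that page.
    Returns nested dict R[p][t]; pairs that are absent have rank 0.
    """
    out_degree = Counter(q for q, _ in links)                              # O(q)
    n_t = Counter(t for terms in page_terms.values() for t in set(terms))  # N_t
    # Initialization: R(p,t) = 1/N_t if term t appears in page p, 0 otherwise.
    rank = {p: {t: 1.0 / n_t[t] for t in terms} for p, terms in page_terms.items()}
    for _ in range(max_iter):
        # For every link q -> p and term t: R1(p,t) += R(q,t) / O(q).
        r1 = defaultdict(lambda: defaultdict(float))
        for q, p in links:
            for t, value in rank.get(q, {}).items():
                r1[p][t] += value / out_degree[q]
        # R(p,t) = (1-d) R1(p,t), plus d/N_t when term t appears in page p.
        new_rank = {}
        for p in set(r1) | set(page_terms):
            terms = {t: (1 - d) * v for t, v in r1.get(p, {}).items()}
            for t in set(page_terms.get(p, ())):
                terms[t] = terms.get(t, 0.0) + d / n_t[t]
            new_rank[p] = terms
        change = max((abs(v - rank.get(p, {}).get(t, 0.0))
                      for p, terms in new_rank.items() for t, v in terms.items()),
                     default=0.0)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical 3-page cycle where only page a contains the term "java".
links = [("a", "b"), ("b", "c"), ("c", "a")]
R = one_level_reputation(links, {"a": {"java"}, "b": set(), "c": set()})
print(R["a"]["java"])  # close to d / (1 - (1-d)**3) for d = 0.15
```

Storing only nonzero (page, term) entries keeps the tables sparse, which matters because most terms appear on few pages.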

Computing Two-Level Reputation

For every page p and term t:
    A(p,t) = H(p,t) = 1/(2N_t)  if term t appears in page p
    A(p,t) = H(p,t) = 0         otherwise
While A or H has not converged:
    A1(p,t) = H1(p,t) = 0 for every page p and term t
    For every link q → p and every term t:
        A1(p,t) += H(q,t) / O(q)
        H1(q,t) += A(p,t) / I(p)
    A(p,t) = (1-d) A1(p,t) and H(p,t) = (1-d) H1(p,t) for every page p and term t
    A(p,t) += d/(2N_t) and H(p,t) += d/(2N_t) if term t appears in page p
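A runnable sketch of the two-level pseudocode, under the same assumptions as a one-level implementation would need (in-memory graph, N_t counted over `page_terms`, d = 0.15, max-change convergence test, all chosen for illustration). The helper names `_damp` and `_max_change` are hypothetical.

```python
from collections import Counter, defaultdict

def _damp(propagated, seed, d):
    """(1-d) * propagated + d * seed; seed holds 1/(2 N_t), so d * seed = d/(2 N_t)."""
    result = {}
    for p in set(propagated) | set(seed):
        terms = {t: (1 - d) * v for t, v in propagated.get(p, {}).items()}
        for t, s in seed.get(p, {}).items():
            terms[t] = terms.get(t, 0.0) + d * s
        result[p] = terms
    return result

def _max_change(old, new):
    return max((abs(v - old.get(p, {}).get(t, 0.0))
                for p, terms in new.items() for t, v in terms.items()), default=0.0)

def two_level_reputation(links, page_terms, d=0.15, tol=1e-10, max_iter=1000):
    """Iterative two-level reputation; returns (authority A, hub H) tables."""
    out_degree = Counter(q for q, _ in links)  # O(q)
    in_degree = Counter(p for _, p in links)   # I(p)
    n_t = Counter(t for terms in page_terms.values() for t in set(terms))
    # A(p,t) = H(p,t) = 1/(2 N_t) if term t appears in page p, 0 otherwise.
    seed = {p: {t: 1.0 / (2 * n_t[t]) for t in terms} for p, terms in page_terms.items()}
    auth = {p: dict(v) for p, v in seed.items()}
    hub = {p: dict(v) for p, v in seed.items()}
    for _ in range(max_iter):
        a1 = defaultdict(lambda: defaultdict(float))
        h1 = defaultdict(lambda: defaultdict(float))
        for q, p in links:  # every link q -> p
            for t, v in hub.get(q, {}).items():
                a1[p][t] += v / out_degree[q]  # A1(p,t) += H(q,t) / O(q)
            for t, v in auth.get(p, {}).items():
                h1[q][t] += v / in_degree[p]   # H1(q,t) += A(p,t) / I(p)
        new_auth, new_hub = _damp(a1, seed, d), _damp(h1, seed, d)
        done = max(_max_change(auth, new_auth), _max_change(hub, new_hub)) < tol
        auth, hub = new_auth, new_hub
        if done:  # stop only once both A and H have converged
            break
    return auth, hub


# Hypothetical graph: hub h links to x and y, g links only to x; x, y contain "db".
links = [("h", "x"), ("h", "y"), ("g", "x")]
auth, hub = two_level_reputation(links, {"x": {"db"}, "y": {"db"}, "h": set(), "g": set()})
print(auth["x"]["db"] > auth["y"]["db"], hub["h"]["db"] > hub["g"]["db"])  # True True
```

Page x, linked from both hubs, ends up the stronger authority; page h, which points at both on-topic pages, ends up the stronger hub.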

Current Implementation
• Given a page, request its incoming links from AltaVista.
• Collect the "snippets" returned by the engine and extract candidate terms and phrases.
• Remove stop words.
• Set O(p) = 7.2 for every page p.
• Initialize the weights and propagate them for one iteration.
• Return highly weighted terms and phrases.
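The steps above can be sketched for single words as follows. Many details are assumptions: the snippets and the term counts N_t (`doc_freq`) are made-up inputs standing in for what a search engine would return, the stop-word list is a tiny placeholder, phrases are not extracted, and only the propagated part of the one-iteration update is kept because the sketch does not look at the page's own text.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "is", "my"}
AVG_OUT_DEGREE = 7.2  # O(p) = 7.2 for every page, as on the slide

def reputation_from_snippets(snippets, doc_freq, d=0.15, top_k=3):
    """One-iteration reputation of a page from snippets of pages linking to it.

    snippets: list of text snippets, one per incoming page q.
    doc_freq: dict term -> N_t (number of pages containing the term);
      terms missing from doc_freq are skipped.
    """
    support = Counter()
    for snippet in snippets:
        words = re.findall(r"[a-z0-9']+", snippet.lower())
        terms = {w for w in words if w not in STOP_WORDS and w in doc_freq}
        support.update(terms)  # each incoming page counts a term at most once
    # R(q,t) = 1/N_t initially; one step: R(p,t) = (1-d) * sum_q R(q,t) / O(q).
    scores = {t: (1 - d) * n / (doc_freq[t] * AVG_OUT_DEGREE) for t, n in support.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


snippets = ["Java FAQ and tutorials", "The comp.lang.java FAQ", "Java tutorials for beginners"]
doc_freq = {"java": 1000, "faq": 5000, "tutorials": 2000, "beginners": 3000}  # made up
print([term for term, _ in reputation_from_snippets(snippets, doc_freq)])
# ['java', 'tutorials', 'faq']
```

Dividing by N_t favors terms that are common among the linking pages but rare on the Web as a whole.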

Example
Reputation of www.macleans.ca:
1 - Maclean's Magazine
2 - macleans
3 - Canadian Universities

Example: Authorities on (+censorship +net)
• www.eff.org: Anti-Censorship, Join the Blue Ribbon, Blue Ribbon Campaign, Electronic Frontier Foundation
• www.cdt.org: Center for Democracy and Technology, Communications Decency Act, Censorship, Free Speech, Blue Ribbon
• www.aclu.org: ACLU, American Civil Liberties Union, Communications Decency Act

Example: Personal Home Pages
• www.w3.org/People/Berners-Lee: History Of The Internet, Tim Berners-Lee, Internet History, W3C
• www-db.stanford.edu/~ullman: Jeffrey D Ullman, Database Systems, Data Mining, Programming Languages
• www.cs.toronto.edu/~mendel: Alberto Mendelzon, Data Warehousing and OLAP, SIGMOD, DBMS

Example: Site Reputation
What is this site known for?
• Russia
• Computer Vision
• Images
• Hockey

Example: Site Reputation
Reputation of the Faculty of Mathematics, Computer Science, Physics and Astronomy at the University of Amsterdam (www.wins.uva.nl):
• Solaris 2 FAQ
• Wiskunde (Dutch for "mathematics")
• Frank Zappa

Limitations
• Our computations are affected by two factors:
  – how well a topic is represented on the Web;
  – how well a page is connected: a few pages, such as www.microsoft.com, have links from a large fraction of all pages on the Web, while a large number of pages have only a few incoming links.

Conclusions
• Introduced a notion of reputation that combines textual content and linkage structure.
• Duality of topics and pages:
  – given a page, we currently find a ranked list of topics for the page;
  – conversely, given a topic, we can also find a ranked list of pages on that topic.
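The duality can be pictured as one table of reputations indexed by (page, topic): reading a row ranks the topics of a page, reading a column ranks the pages on a topic. The table layout, function names, and scores below are hypothetical, made up purely for illustration.

```python
def topics_for_page(reputation, page, k=3):
    """Rank topics for a page: one row of the (page, topic) reputation table."""
    row = [(t, s) for (p, t), s in reputation.items() if p == page]
    return [t for t, _ in sorted(row, key=lambda ts: (-ts[1], ts[0]))[:k]]

def pages_for_topic(reputation, topic, k=3):
    """Rank pages on a topic: one column of the same table."""
    column = [(p, s) for (p, t), s in reputation.items() if t == topic]
    return [p for p, _ in sorted(column, key=lambda ps: (-ps[1], ps[0]))[:k]]


# Hypothetical pages and made-up scores.
reputation = {
    ("page1", "java"): 0.30,
    ("page1", "databases"): 0.45,
    ("page2", "java"): 0.20,
    ("page2", "olap"): 0.50,
}
print(topics_for_page(reputation, "page1"))  # ['databases', 'java']
print(pages_for_topic(reputation, "java"))   # ['page1', 'page2']
```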

Conclusions
• Our proposed methods generalize earlier ranking methods:
  – one-level reputation ranking generalizes PageRank;
  – two-level reputation ranking generalizes the hubs-and-authorities model.
• Ongoing work: a large-scale implementation of the proposed methods.
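To see the first claim, suppose every page is on topic t, so that N_t = N, the total number of pages. The one-level equation then no longer depends on t and reduces to the PageRank iteration with random-jump probability d:

```latex
R^n(p) = \frac{d}{N} + (1-d) \sum_{q \to p} \frac{R^{n-1}(q)}{O(q)}
```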
