Web Search Ranking (COSC 488) Nazli Goharian - PDF document

Web Search Ranking (COSC 488)
Nazli Goharian
nazli@cs.georgetown.edu

Evaluation of Web Search Engines: High Precision Search
• Traditional IR systems are evaluated based on precision and recall.
• Web search engines are evaluated based on the top N documents.
  • Recall estimation is very difficult.
  • Precision over the full result list is of limited concern, as many users do not look beyond the first screen.
=> How fast and how accurately is the first results screen generated?
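Evaluation on the top N documents is commonly expressed as precision at N. The sketch below is a minimal illustration, not part of the original slides: the function name `precision_at_n`, the document IDs, and the relevance judgments are all hypothetical. It divides by N even when fewer than N results are returned, which is one common convention.

```python
def precision_at_n(ranked_ids, relevant_ids, n=10):
    """Fraction of the top-n ranked documents that are judged relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    top = ranked_ids[:n]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    # Divide by n (not len(top)) so short result lists are penalized.
    return hits / n


# Hypothetical ranking and relevance judgments for one query.
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}
p_at_3 = precision_at_n(ranking, relevant, n=3)
print(f"P@3 = {p_at_3:.3f}")
```

Here two of the first three results (d3 and d1) are relevant, so P@3 is 2/3. No recall estimate is needed, which is why this measure suits web search.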

Web Page Ranking
• Evidence of quality for ranking:
  • Domain names: .edu, ..
  • Text content: term count, BM25, …
  • Links: anchor text, number of in/out links (algorithms: HITS, PageRank)
  • Web usage log: clickthrough data, eye tracking, geographical info (IP address, language, ..), query history, ..
  • Query patterns: certain day, time, … for improving efficiency & quality
  • Page layout: title, font size, positions of HTML tags on the page, …
• A problem: Web spam

Anchor Text
• Short, 2-3 terms, describing the linked (destination) page.
• May or may not reflect a different point of view than the destination page author's.
• The anchor text of links to a document d_i is included in the index for d_i.
• Extended anchor text (the text surrounding the anchor text) may also be used.
• Generally weighted based on frequency (notion of idf).
• Spamming problem

Localized Search
• Using geographic information to modify the ranking of results (in addition to SC scores, link-based scores, …).
• Geographic information may be derived from:
  • Location of the device sending the query
  • Context of the query
    • restaurant near Al Capone's hometown
    • restaurant near White Sox stadium
  • Geographic location in the query
    • Chicago restaurants
  • Geographic location in a document's metadata

Link-based Ranking: Authorities and Hubs (HITS)
• HITS: Hyperlink-Induced Topic Search (Kleinberg, 1999)
• Links can indicate popularity.
• Each retrieved web page is assigned two scores: an Authority score and a Hub score (thus, query dependent & query independent).
• Authority page: an authoritative source on a given topic
• Hub page: a page listing pointers to authority pages on a topic
• Authority score: summation of the scores of all the hubs pointing to that authority page
• Hub score: summation of the scores of all authority pages the hub is pointing to

Computing Authority and Hub Scores
• Retrieve all pages containing the query term t. This is called the root set (~200 pages).
• Create a set that is the union of the root set pages, the pages that point to root set pages, and the pages that root set pages point to. This is called the base set.
• Use the base set S to compute the hub and authority scores.
• An iterative algorithm:
  • Initialize hubs and authorities with a score, e.g. 1.
  • Update H(p) and A(p):

$$H(p) = \sum_{u \in S \mid p \to u} A(u), \qquad A(p) = \sum_{u \in S \mid u \to p} H(u)$$

Link-based Ranking: PageRank
• Developed in the mid-90s by Larry Page & Sergey Brin
• A scoring mechanism in Web search (trademarked by Google and patented by Stanford)
• Generally calculated at crawl time (query independent)
• Uses incoming and outgoing links as an indicator of popularity to adjust a Web page's score
• A popular page is defined as a page that
  - many Web pages link to (inlinks)
  - important (popular) pages link to
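The iterative hub and authority update from "Computing Authority and Hub Scores" can be sketched as follows. This is a minimal illustration under stated assumptions, not the slides' own implementation: the function name `hits`, the fixed iteration count, and the toy base set are hypothetical, and the per-iteration L2 normalization (which keeps scores from growing without bound) is added here and does not appear in the slides. Hub scores are computed from the freshly updated authority scores.

```python
import math


def _normalize(scores):
    """Scale scores to unit L2 norm (an assumption; not stated in the slides)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {p: v / norm for p, v in scores.items()}


def hits(out_links, iterations=50):
    """out_links maps each base-set page to the set of base-set pages it points to."""
    pages = set(out_links)
    for targets in out_links.values():
        pages |= set(targets)

    # Invert the link structure: in_links[p] = pages u with a link u -> p.
    in_links = {p: set() for p in pages}
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].add(src)

    hub = {p: 1.0 for p in pages}   # initialize every score to 1
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A(p) = sum of H(u) over pages u that point to p
        auth = {p: sum(hub[u] for u in in_links[p]) for p in pages}
        # H(p) = sum of A(u) over pages u that p points to
        hub = {p: sum(auth[u] for u in out_links.get(p, ())) for p in pages}
        auth = _normalize(auth)
        hub = _normalize(hub)
    return hub, auth


# Hypothetical base set: two hub-like pages pointing at two authority-like pages.
base_set = {"h1": {"a1", "a2"}, "h2": {"a1"}, "a1": set(), "a2": set()}
hub_scores, auth_scores = hits(base_set)
print(max(auth_scores, key=auth_scores.get), max(hub_scores, key=hub_scores.get))
```

In this toy graph a1 is pointed to by both hubs and h1 points to both authorities, so they end up with the highest authority and hub scores respectively.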

Page Rank  ( 1 d ) PageRank ( D )    ( ) i PageRank A d ( ) N C D ... D D i 1 n • PageRank of (A) is defined based on some ratio of PageRank score of each page D i linking into A C(D i ) : number of links out from page D i d : damping factor (from 0-1; commonly 0.85; ~15% cases are random visits) N: total number of pages An Iterative Algorithm: Initially all pages are assigned an arbitrary page rank (1/n), summing to 1 Iteratively calculate the scores until the new scores do not change significantly To converge faster, may initialize page ranks based on number of inlinks, log info, etc. 11 Web Page Ranking • Considering both query dependant and query independent scores (captured during indexing), a global score is generated for each page: • For retrieved results based on query dependant ranking (ex. BM25), rank using Page Rank Or, • Use a linear combination of various relevance evidence (textual, BM25, link,….) SC(D, Q) = a BM25 (Q,D) + (1-a) PageRank (D) 12 12 5
