SLIDE 1

Scale Effects in Web Search

Soroush Ebadian, Parand Alizadeh Under Supervision of Prof. Fazli Social and Economic Networks, Spring 1397

23/2/1397 Sharif University of Technology

1/32

SLIDE 2

Contents

  • Overview on Problem Space
  • Data Description
  • Direct Effects of Scale
  • Indirect Effects of Scale
  • Discussion & Conclusion

SLIDE 3
  • Overview on Problem Space
  • Data Description
  • Direct Effects of Scale
  • Indirect Effects of Scale
  • Discussion & Conclusion

SLIDE 4

Analysis of Web Search Markets

  • Two different worlds:
  • Ranking based on algorithmic innovation and fixed document features
  • Learning from historical queries is critical to ranking quality

Little is known about which one we live in.

SLIDE 5

Analysis of Web Search Markets (cont.)

  • Learning tends to slow down with each additional data point
  • Can any viable entrant easily achieve this?!
  • Fig. 1: A learning curve averaged over many trials

SLIDE 6

Authors of Paper

  • Microsoft AI & Research: 4/5
  • HomeAway Inc. : 1/5

SLIDE 7
  • Overview on Problem Space
  • Data Description
  • Direct Effects of Scale
  • Indirect Effects of Scale
  • Discussion & Conclusion

SLIDE 8

Data Description

  • Two search engines with the same restrictions
  • More than 6 months of data
  • Based on Click-Through Rates (CTR)

               Impressions      Clicks
  Provider 1   > 200 billion    > 100 billion
  Provider 2   > 300 billion    > 150 billion

Table 1. Summary statistics

SLIDE 9
  • Overview on Problem Space
  • Data Description
  • Direct Effects of Scale
  • Indirect Effects of Scale
  • Discussion & Conclusion

SLIDE 10

Benchmark & Target Data

  • Raw log retention time is legally limited
  • Benchmark data: first 3 months
  • Target data: next 9 months
  • Pairs <H(q,d), CTR(q,d)>:
  • H(q,d): historical measure of query q before day d
  • CTR(q,d): CTR of query q on day d
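The construction of the <H(q,d), CTR(q,d)> pairs can be sketched over a toy log. This is a minimal illustration; the record format (day, query, impressions, clicks) and the choice of cumulative impressions as the historical measure H are assumptions, since the slides do not fix the raw-log schema:

```python
from collections import defaultdict

def build_pairs(log):
    """Build <H(q,d), CTR(q,d)> pairs from a toy click log.

    `log` is a hypothetical list of (day, query, impressions, clicks)
    records. H(q,d) is taken here as the cumulative impressions of q
    before day d (one plausible "historical measure").
    """
    daily = defaultdict(lambda: [0, 0])  # (day, query) -> [imps, clicks]
    for day, q, imps, clicks in log:
        daily[(day, q)][0] += imps
        daily[(day, q)][1] += clicks

    history = defaultdict(int)  # query -> impressions seen before today
    pairs = []
    for (day, q), (imps, clicks) in sorted(daily.items()):
        pairs.append((history[q], clicks / imps))  # (H(q,d), CTR(q,d))
        history[q] += imps
    return pairs

log = [(1, "maps", 100, 20), (2, "maps", 200, 50), (3, "maps", 100, 30)]
print(build_pairs(log))  # [(0, 0.2), (100, 0.25), (300, 0.3)]
```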

SLIDE 11

CTR & Historical Occurrences Positive Correlation

  • Generated 270 pairs, grouped into buckets by H(q,d)
  • Fig. 2. CTR shows a positive correlation with the number of historical occurrences (legend: Provider 1, Provider 2; x-axis buckets: 0-10, 10-100, 100-1k, 10k-100k, 100k-1m, 1m-10m, 10m-100m)
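The bucketing by H(q,d) can be sketched with decade-style buckets matching the figure's x-axis. A minimal illustration; the exact bucket boundaries are assumptions:

```python
import math

def bucket(h):
    """Assign a historical-occurrence count to a log-scale bucket
    like the figure's x-axis (0-10, 10-100, ..., 10m-100m)."""
    if h < 10:
        return "0-10"
    lo = 10 ** int(math.log10(h))  # lower decade boundary
    names = {10: "10", 100: "100", 1_000: "1k", 10_000: "10k",
             100_000: "100k", 1_000_000: "1m", 10_000_000: "10m",
             100_000_000: "100m"}
    return f"{names[lo]}-{names[lo * 10]}"

print(bucket(5), bucket(50), bucket(2_500_000))  # 0-10 10-100 1m-10m
```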

SLIDE 12

Regression Analysis

CTR = −0.0530[−0.085, −0.021] + 0.3287[0.315, 0.343] sqrt(log(x))
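As a sanity check, the fitted Provider 1 curve can be evaluated directly. A minimal sketch; the slides do not state the log base, so the natural log is assumed here:

```python
import math

def predicted_ctr(x, intercept=-0.0530, slope=0.3287):
    """Predicted CTR from the fitted Provider 1 model:
    CTR = intercept + slope * sqrt(log(x)).
    Natural log is assumed (not stated on the slide)."""
    return intercept + slope * math.sqrt(math.log(x))

# Predicted CTR grows with history, but with sharply diminishing returns.
for x in (10, 1_000, 100_000, 10_000_000):
    print(x, round(predicted_ctr(x), 3))
```

The sqrt-of-log shape is what makes the "learning slows down with each additional data point" claim concrete: each extra order of magnitude of history buys less CTR than the previous one.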

  • Fig. 3. Provider 1, relationship between CTR and the number of historical examples.

SLIDE 13

Regression Analysis (cont.)

CTR = −0.3871[−0.486, −0.288] + 0.4792[0.438, 0.520] sqrt(log(x))

  • Fig. 4. Provider 2, relationship between CTR and the number of historical examples.

SLIDE 14

Scale Effect Analysis on New Queries

  • Popular queries may be easier to satisfy
  • Control for the same “query difficulty”:
  • (1) the query has fewer than 200 clicks in the three-month benchmark
  • (2) the total number of clicks of the query is between 1000 and 2000 (in a year)

  • Provider 1: 8000 queries
  • Provider 2: 10000 queries

SLIDE 15

Scale Effect Analysis on New Queries (cont.)

  • CTR(q, c): CTR of q over the period from click c+1 to click c+100
  • c ∈ {100, 200, . . . , 900}
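CTR(q, c) as defined above can be sketched over a toy impression stream. An illustration only; representing each impression as a 1 (click) or 0 (no click) is an assumption:

```python
def window_ctr(events, c, width=100):
    """CTR of a query over its (c+1)-th through (c+width)-th clicks.

    `events` is a hypothetical chronological list of per-impression
    outcomes (1 = click, 0 = no click). All impressions between the
    c-th click and the (c+width)-th click count toward the window.
    """
    total_clicks = 0   # clicks seen so far, to locate the window
    imps = clicks = 0  # impressions and clicks inside the window
    for outcome in events:
        if total_clicks >= c and clicks < width:
            imps += 1
            clicks += outcome
        total_clicks += outcome
        if clicks >= width:
            break
    return clicks / imps if imps else None

# Window of clicks 2..3 spans 3 impressions, 2 of them clicks.
print(window_ctr([0, 1, 0, 1, 1, 0, 1], c=1, width=2))
```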

SLIDE 16

Scale Effect Exists in Both

  • Fig. 5. Provider 1, relationship between CTR and the number of historical examples for new queries only.

SLIDE 17

Scale Effect Exists in Both (cont.)

  • Fig. 6. Provider 2, relationship between CTR and the number of historical examples for new queries only.

SLIDE 18
  • Overview on Problem Space
  • Data Description
  • Direct Effects of Scale
  • Indirect Effects of Scale
  • Discussion & Conclusion

SLIDE 19

Constructing Bipartite Knowledge Graph

  • G = <Q, D, E>
  • Q = queries, D = documents
  • eij = click count between qi and dj
  • Represent each query as a bag of words; distinct queries reduce by 7%
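The graph construction can be sketched as follows. A toy illustration; normalizing a query to its sorted bag of tokens is an assumption about how duplicate queries collapse:

```python
from collections import Counter, defaultdict

def build_click_graph(clicks):
    """Build the bipartite query-document click graph G = <Q, D, E>.

    `clicks` is a hypothetical list of (query, document) click events;
    e_ij is the click count between query q_i and document d_j.
    Queries are normalized to bags of words (sorted tokens), which is
    what collapses distinct query strings.
    """
    def bag(q):
        return tuple(sorted(q.lower().split()))

    edges = defaultdict(Counter)  # bag(query) -> {doc: click count}
    for q, d in clicks:
        edges[bag(q)][d] += 1
    return edges

g = build_click_graph([("cheap flights", "d1"),
                       ("flights cheap", "d1"),
                       ("cheap flights", "d2")])
print(len(g))  # 1 distinct query after bag-of-words normalization
```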

SLIDE 20

Summary of Query-Document Graph

  • Cardinality Q: 4.82 billion
  • Cardinality D: 3.26 billion
  • Cardinality E: 11.6 billion
  • Total clicks: > 100 billion

SLIDE 21

Clustering Documents

  • Construct a document similarity matrix using cosine similarity
  • Convert similarity weights to 0 or 1 using a threshold

  • Construct document similarity graph
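The similarity-and-threshold steps can be sketched as below. A minimal illustration; representing each document by its query-click vector is an assumption, since the slides do not fix the feature vectors:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse click vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_edges(docs, threshold=0.8):
    """Thresholded document-similarity graph: keep an edge between two
    documents iff their cosine similarity is >= threshold.
    `docs` maps a document id to a hypothetical query-click vector."""
    ids = sorted(docs)
    return {(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if cosine(docs[a], docs[b]) >= threshold}

docs = {"d1": Counter(maps=4, directions=1),
        "d2": Counter(maps=3, directions=1),
        "d3": Counter(weather=5)}
print(similarity_edges(docs))  # {('d1', 'd2')}
```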

SLIDE 22

Clustering Documents (cont.)

  • Find connected components of the document similarity graph
  • Each connected component is an intent-cluster

  • Construct query/intent-cluster graph
  • Eij = fraction of clicks from qi to clusterj

SLIDE 23

Algorithm 1. Find Connected Components

  • 1. Every document pair is a separate cluster
  • 2. Identify link nodes between pairs and merge

  • 3. Repeat 2 until convergence
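The merge-until-convergence procedure amounts to computing connected components. A minimal union-find sketch of the same idea (not the paper's actual large-scale implementation):

```python
def connected_components(edges):
    """Connected components via union-find; each resulting component
    corresponds to one intent-cluster. `edges` are document pairs
    from the thresholded similarity graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(c) for c in clusters.values())

print(connected_components([("d1", "d2"), ("d2", "d3"), ("d4", "d5")]))
# [['d1', 'd2', 'd3'], ['d4', 'd5']]
```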

SLIDE 24

Evaluation of Clusters

  • Form a 100-query test set and get all their clusters
  • Score edges with 0 or 1 using human auditors
  • Compare thresholds 0.7, 0.8, 0.9 and 0.95

SLIDE 25

Evaluation of Clusters (cont.)

  • Precision: fraction of pairs judged to be relevant
  • Weighted Precision: precision with a Markov weight applied to each pair
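The two precision measures can be sketched as follows. Since the slides do not define the Markov weighting scheme, `weights` is left as a hypothetical pair-to-weight mapping:

```python
def precision(judgments):
    """Fraction of audited pairs judged relevant (1) vs not (0).
    `judgments` maps a document pair to its auditor score."""
    return sum(judgments.values()) / len(judgments)

def weighted_precision(judgments, weights):
    """Precision with a per-pair weight applied. The exact Markov
    weights are not spelled out on the slides, so `weights` is a
    hypothetical pair -> weight mapping."""
    total = sum(weights[p] for p in judgments)
    return sum(weights[p] * j for p, j in judgments.items()) / total

judgments = {("d1", "d2"): 1, ("d1", "d3"): 0, ("d2", "d3"): 1}
weights = {("d1", "d2"): 3.0, ("d1", "d3"): 1.0, ("d2", "d3"): 1.0}
print(round(precision(judgments), 3))           # 0.667
print(weighted_precision(judgments, weights))   # 0.8
```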

SLIDE 26

Evaluation of Clusters (cont.)

  • Pseudo Recall: 1 for threshold 0.7; otherwise, the fraction of pairs each method recovers
  • Weighted Recall: pseudo recall with a Markov weight applied to each pair

SLIDE 27

Evaluation of Clusters (cont.)

Threshold   Precision   W. Precision   Pseudo Recall   W. Recall
0.7         0.69        0.79           1               1
0.8         0.7         0.84           0.76            1.054
0.9         0.68        0.83           0.45            1.04
0.95        0.66        0.83           0.26            1.03

Table 2. Precision and Recall by threshold

SLIDE 28
  • Fig. 7. CDF of the number of intent clusters with an edge to the submitted query.

SLIDE 29
  • Fig. 8. CDF of the number of queries per intent cluster.

SLIDE 30

Impact on CTR

SLIDE 31

Discussion & Conclusion

  • It is unclear whether an increase in scale makes the search problem easier or harder
  • Search engines are among the most complicated engineering tasks ever attempted

SLIDE 32

Thanks for your attention!
