scale effects in web search
Scale Effects in Web Search, by Soroush Ebadian and Parand Alizadeh

  1. Scale Effects in Web Search. Soroush Ebadian, Parand Alizadeh, under the supervision of Prof. Fazli. Social and Economic Networks, Spring 1397. Sharif University of Technology, 23/2/1397.

  2. Contents • Overview on Problem Space • Data Description • Direct Effects of Scale • Indirect Effects of Scale • Discussion & Conclusion

  3. • Overview on Problem Space • Data Description • Direct Effects of Scale • Indirect Effects of Scale • Discussion & Conclusion

  4. Analysis of Web Search Markets • Two different worlds: • Ranking based on algorithmic innovation and fixed document features • Learning from historical queries is critical to ranking quality • Little is known about which one we live in.

  5. Analysis of Web Search Markets (cont.) • Learning tends to slow down with each additional data point • Fig. 1: A learning curve averaged over many trials • Can any viable entrant easily achieve this scale?

  6. Authors of Paper • Microsoft AI & Research: 4/5 • HomeAway Inc.: 1/5

  7. • Overview on Problem Space • Data Description • Direct Effects of Scale • Indirect Effects of Scale • Discussion & Conclusion

  8. Data Description • Two search engines with the same restrictions • More than 6 months • Based on Click-Through-Rates (CTR)

     Table 1. Summary statistics

     | | Provider 1 | Provider 2 |
     |---|---|---|
     | # impressions | > 200 billion | > 300 billion |
     | # clicks | > 100 billion | > 150 billion |

  9. • Overview on Problem Space • Data Description • Direct Effects of Scale • Indirect Effects of Scale • Discussion & Conclusion

  10. Benchmark & Target Data • Raw log retention time is legally limited • Benchmark data: first 3 months • Target data: next 9 months • For each query q and day d, form the pair <H(q,d), CTR(q,d)> • H(q,d): historical measure for query q before day d • CTR(q,d): CTR of query q on day d
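
The pair construction on this slide can be sketched as follows. This is a minimal illustration, not the paper's pipeline: it assumes the log has already been aggregated to one row per query and day, and it assumes the historical measure H(q,d) is the cumulative number of impressions of q before day d. The function name, row layout, and `first_target_day` parameter are hypothetical.

```python
from collections import defaultdict

def history_ctr_pairs(daily_rows, first_target_day):
    """Build (query, day, H, CTR) tuples from rows (query, day, impressions, clicks).

    Assumption: H(q, d) is the total impressions of q on all days before d.
    Days before first_target_day only feed the history (benchmark period).
    """
    history = defaultdict(int)
    pairs = []
    for query, day, impressions, clicks in sorted(daily_rows, key=lambda r: r[1]):
        if day >= first_target_day and impressions > 0:
            pairs.append((query, day, history[query], clicks / impressions))
        history[query] += impressions
    return pairs

rows = [("q", 0, 10, 5), ("q", 1, 20, 10), ("q", 2, 10, 2)]
pairs = history_ctr_pairs(rows, first_target_day=1)
```

Here day 0 plays the role of the benchmark period, so only days 1 and 2 produce pairs.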

  11. CTR & Historical Occurrences • Positive correlation • Generated 270 pairs, grouped into buckets by H(q,d) • Fig. 2. CTR shows a positive correlation with the number of historical occurrences. (CTR axis from 0 to 1; series: Provider 1, Provider 2; H(q,d) buckets: 0-10, 10-100, 100-1k, 10k-100k, 100k-1m, 1m-10m, 10m-100m.)
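
Grouping pairs into buckets by H(q,d) might look like the sketch below. It assumes power-of-ten buckets (0-10, 10-100, and so on) and an unweighted mean of CTR within each bucket; the slides do not say how CTR is aggregated per bucket, and the function names are hypothetical.

```python
from collections import defaultdict

def decade_bucket(h):
    """Power-of-ten bucket index: 0 for h < 10, 1 for 10 <= h < 100, and so on."""
    return len(str(int(h))) - 1

def mean_ctr_by_bucket(pairs):
    """pairs: iterable of (H, CTR). Returns {bucket index: mean CTR}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for h, ctr in pairs:
        bucket = decade_bucket(h)
        sums[bucket] += ctr
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(counts)}

by_bucket = mean_ctr_by_bucket([(5, 0.2), (7, 0.4), (50, 0.5), (1500, 0.9)])
```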

  12. Regression Analysis • CTR = −0.0530 [−0.085, −0.021] + 0.3287 [0.315, 0.343] · sqrt(log(x)), where x is the number of historical examples and each bracketed range is the interval reported for the preceding coefficient • Fig. 3. Provider 1, relationship between CTR and number of historical examples.

  13. Regression Analysis (cont.) • CTR = −0.3871 [−0.486, −0.288] + 0.4792 [0.438, 0.520] · sqrt(log(x)), where x is the number of historical examples and each bracketed range is the interval reported for the preceding coefficient • Fig. 4. Provider 2, relationship between CTR and number of historical examples.
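
To get a feel for the two fitted curves, they can be evaluated at a few values of x. The sketch below uses the point estimates from the slides; the base of the logarithm is not stated there, so base 10 is an assumption (it keeps the predicted CTR below 1 across the bucket range shown in Fig. 2). The names `FITS` and `predicted_ctr` are hypothetical.

```python
import math

# (intercept, slope) point estimates taken from the two regression slides.
FITS = {
    "Provider 1": (-0.0530, 0.3287),
    "Provider 2": (-0.3871, 0.4792),
}

def predicted_ctr(provider, x):
    """Evaluate CTR = intercept + slope * sqrt(log(x)) for x >= 1.

    Assumption: log is base 10; the slides do not state the base.
    """
    intercept, slope = FITS[provider]
    return intercept + slope * math.sqrt(math.log10(x))

for x in (10**2, 10**4, 10**6):
    print(x, round(predicted_ctr("Provider 1", x), 3), round(predicted_ctr("Provider 2", x), 3))
```

Because the curve grows with the square root of the logarithm, each additional order of magnitude of history adds less predicted CTR than the previous one.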

  14. Scale Effect Analysis on New Queries • Popular queries may be easier to satisfy • To keep the same “query difficulty”, select queries such that: • (1) the query has fewer than 200 clicks in the three-month benchmark period • (2) the total number of clicks of the query over the year is between 1000 and 2000 • Provider 1: 8000 queries • Provider 2: 10000 queries

  15. Scale Effect Analysis on New Queries (cont.) • CTR(q, c): CTR of q over the period from its (c+1)-th to its (c+100)-th click • c ∈ {100, 200, ..., 900}
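
A windowed CTR of this kind could be computed as in the following sketch. It assumes each query's impressions are available in chronological order as booleans (clicked or not), and it assumes the window covers every impression after the c-th click up to and including the one that yields click c+100. Both the input layout and the function name are hypothetical.

```python
def windowed_ctr(clicked, c, width=100):
    """CTR over the impressions spanning clicks c+1 .. c+width.

    clicked: chronological iterable of booleans, one per impression.
    Returns None if the query never reaches click c+width.
    """
    clicks_seen = 0
    impressions_in_window = 0
    clicks_in_window = 0
    for was_clicked in clicked:
        if clicks_seen >= c + width:
            break  # window is complete
        if clicks_seen >= c:
            impressions_in_window += 1
            if was_clicked:
                clicks_in_window += 1
        if was_clicked:
            clicks_seen += 1
    if clicks_in_window < width:
        return None
    return clicks_in_window / impressions_in_window

# Toy stream where every other impression is clicked.
stream = [True, False] * 4
ctr = windowed_ctr(stream, c=2, width=2)
```

With the slide's settings one would call it with `width=100` for each c in 100, 200, ..., 900.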

  16. Scale Effect Exists in Both • Fig. 5. Provider 1, relationship between CTR and number of historical examples for new queries only.

  17. Scale Effect Exists in Both (cont.) • Fig. 6. Provider 2, relationship between CTR and number of historical examples for new queries only.

  18. • Overview on Problem Space • Data Description • Direct Effects of Scale • Indirect Effects of Scale • Discussion & Conclusion

  19. Constructing Bipartite Knowledge Graph • G = <Q, D, E> • Q = queries, D = documents • e_ij = click count between q_i and d_j • Representing each query as a bag of words reduces the number of queries by 7%
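
A small sketch of this graph construction, assuming the click log is a list of (query text, document id) pairs. The bag-of-words key is modelled as the sorted tuple of whitespace-separated tokens; lowercasing is an extra assumption not mentioned on the slide, and the function names are hypothetical.

```python
from collections import defaultdict

def bag_of_words_key(query):
    """Order-insensitive query key; the sorted tuple keeps repeated words."""
    return tuple(sorted(query.lower().split()))

def build_click_graph(click_log):
    """Return edge weights {(query key, document): click count}."""
    edges = defaultdict(int)
    for query, doc in click_log:
        edges[(bag_of_words_key(query), doc)] += 1
    return dict(edges)

log = [("cheap flights", "d1"), ("flights cheap", "d1"), ("cheap flights", "d2")]
graph = build_click_graph(log)
```

The two word orders collapse into one query node, which is the kind of reduction in distinct queries the slide refers to.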

  20. Summary of Query-Document Graph • Cardinality Q: 4.82 billion • Cardinality D: 3.26 billion • Cardinality E: 11.6 billion • Total clicks: > 100 billion

  21. Clustering Documents • Construct a document similarity matrix using cosine similarity • Convert similarity weights to 0 or 1 using a threshold • Construct the document similarity graph
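
The three steps can be illustrated on toy data. The sketch assumes each document is represented by a sparse vector of click counts keyed by query, which the slide does not spell out, and it compares all pairs directly, which would not scale to billions of documents. Function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {key: weight} dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_edges(doc_vectors, threshold):
    """Keep an unweighted edge for every document pair at or above the threshold."""
    edges = set()
    for a, b in combinations(sorted(doc_vectors), 2):
        if cosine(doc_vectors[a], doc_vectors[b]) >= threshold:
            edges.add((a, b))
    return edges

docs = {
    "d1": {"q1": 3, "q2": 1},
    "d2": {"q1": 6, "q2": 2},
    "d3": {"q3": 5},
}
edges = similarity_edges(docs, threshold=0.9)
```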

  22. Clustering Documents (cont.) • Find the connected components of the document similarity graph • Each connected component is an intent cluster • Construct the query/intent-cluster graph • E_ij = fraction of clicks from q_i to cluster j
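
Once every document has a cluster label, the query/intent-cluster edge weights follow by aggregation. A minimal sketch, assuming click counts keyed by (query, document) and a `doc_to_cluster` mapping produced by the clustering step; both names are hypothetical.

```python
from collections import defaultdict

def query_cluster_weights(click_counts, doc_to_cluster):
    """Return {(query, cluster): fraction of the query's clicks landing in that cluster}."""
    totals = defaultdict(int)
    per_cluster = defaultdict(int)
    for (query, doc), n in click_counts.items():
        totals[query] += n
        per_cluster[(query, doc_to_cluster[doc])] += n
    return {(q, c): n / totals[q] for (q, c), n in per_cluster.items()}

clicks = {("q1", "d1"): 3, ("q1", "d2"): 1, ("q1", "d3"): 4}
clusters = {"d1": 0, "d2": 0, "d3": 1}
weights = query_cluster_weights(clicks, clusters)
```

By construction the weights of each query sum to 1 across clusters.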

  23. Algorithm 1. Find Connected Components • (1) Every document pair is a separate cluster • (2) Identify link nodes between pairs and merge • (3) Repeat step 2 until convergence
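
One way to read Algorithm 1 is as repeated merging of clusters that share a document. The sketch below follows that reading on an in-memory list of pairs; it is an illustration under that assumption, not the distributed procedure a graph with billions of nodes would need, and documents with no edges do not appear in the output.

```python
def connected_components(pairs):
    """Merge document pairs that share a node until no merge is possible."""
    clusters = [set(pair) for pair in pairs]  # step 1: each pair is its own cluster
    merged = True
    while merged:  # step 3: repeat until convergence
        merged = False
        result = []
        for cluster in clusters:
            for existing in result:
                if existing & cluster:  # step 2: a shared document links the two
                    existing |= cluster
                    merged = True
                    break
            else:
                result.append(cluster)
        clusters = result
    return clusters

components = connected_components([("a", "b"), ("c", "d"), ("b", "c"), ("e", "f")])
```

For large in-memory graphs a union-find structure or a breadth-first search gives the same components with less repeated work.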

  24. Evaluation of Clusters • Form a 100-query test set and get all clusters • Have auditors score each edge as 0 or 1 • Choose the threshold among 0.7, 0.8, 0.9 and 0.95

  25. Evaluation of Clusters (cont.) • Precision: fraction of pairs judged to be relevant • Weighted Precision: precision with a Markov weight applied to each pair

  26. Evaluation of Clusters (cont.) • Pseudo Recall: defined as 1 for threshold 0.7; otherwise, the fraction of pairs each method recovers • Weighted Recall: pseudo recall with a Markov weight applied to each pair
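
The metrics on the last two slides can be sketched as below, with caveats. The slides do not define the Markov weight, so weights are passed in as an opaque, hypothetical per-pair dictionary. Pseudo recall is read here as the relevant mass a threshold recovers divided by the relevant mass recovered at the 0.7 baseline, which is one possible interpretation rather than the paper's stated formula.

```python
def relevant_mass(judgments, weights=None):
    """Sum of weights over pairs judged relevant (weight 1 per pair if unweighted)."""
    return sum((weights[p] if weights else 1.0) for p, j in judgments.items() if j == 1)

def precision(judgments, weights=None):
    """judgments: {pair: 0 or 1} from auditors; weights: optional {pair: weight}."""
    total = sum((weights[p] if weights else 1.0) for p in judgments)
    return relevant_mass(judgments, weights) / total

def pseudo_recall(judgments, baseline_judgments, weights=None, baseline_weights=None):
    """Assumed reading: relevant mass relative to the baseline threshold's relevant mass."""
    return relevant_mass(judgments, weights) / relevant_mass(baseline_judgments, baseline_weights)

baseline = {"a": 1, "b": 1, "c": 0, "d": 1}   # pairs found at the baseline threshold
stricter = {"a": 1, "b": 0}                    # pairs found at a stricter threshold
p_base = precision(baseline)
p_strict = precision(stricter)
r_strict = pseudo_recall(stricter, baseline)
```

Under this reading the baseline has pseudo recall 1 by construction, and a weighted variant can exceed 1 when the weights differ between thresholds.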

  27. Evaluation of Clusters (cont.)

     Table 2. Precision and Recall by threshold

     | Threshold | Precision | W. Precision | Pseudo recall | W. Recall |
     |---|---|---|---|---|
     | 0.7 | 0.69 | 0.79 | 1 | 1 |
     | 0.8 | 0.7 | 0.84 | 0.76 | 1.054 |
     | 0.9 | 0.68 | 0.83 | 0.45 | 1.04 |
     | 0.95 | 0.66 | 0.83 | 0.26 | 1.03 |

  28. Fig. 7. CDF of the number of intent clusters with an edge to the submitted query.

  29. Fig. 8. CDF of the number of queries per intent cluster.

  30. Impact on CTR

  31. Discussion & Conclusion • It is unclear whether an increase in scale makes the search problem easier or harder • Building a search engine is one of the most complicated engineering tasks ever attempted

  32. Thanks for your attention!
