CS293S Summary 2017 Tao Yang


SLIDE 1

CS293S Summary

2017 Tao Yang

SLIDE 2

Search Result Reply Pages

  • Main results
  • Suggestions recommendation
  • Advertisements

SLIDE 3

A Crawler Architecture

  • Olston/Najork. Web crawling. Found. Trends Inf. Retr., 4(3):175-246, March 2010.
SLIDE 4

Offline Architecture

  • Classification
  • Clustering
  • Indexing/MapReduce
  • Click data
  • Feature engineering/management

SLIDE 5


Similarity Analysis

  • Document
  • The set of strings of length k that appear in the document
  • Signatures: short integer vectors that represent the sets and reflect their similarity
  • Locality-sensitive hashing
  • Candidate pairs: those pairs of signatures that we need to test for similarity
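The document-to-candidate-pairs flow on this slide can be sketched roughly as follows. This is a minimal illustration, not the course's implementation: the function names, the shingle length `k=5`, the 64 hash functions, the 16 bands, and the use of seeded `blake2b` as a stand-in for random hash functions are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of all length-k substrings (shingles) of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle under a given seed (assumed hash family)."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=64):
    """Signature: the minimum hash value of the (non-empty) set under each seeded hash."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def lsh_candidate_pairs(signatures, bands=16):
    """Split signatures into bands; documents that agree on a whole band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(doc_id)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "0123456789 9876543210 0123456789",
}
signatures = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
candidates = lsh_candidate_pairs(signatures)
```

Only the pairs in `candidates` would then be compared in full, which is the point of the last stage: most dissimilar pairs never share a band bucket and are never tested.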

SLIDE 6

Online Engine: Architecture, Matching, Ranking

[Figure: online engine architecture. Components shown: client queries, traffic load balancer, frontends, caches, hierarchical cache, clustering middleware, ranking, web page index, document abstract, document description, classification, PageInfo, suggestions, structured DB.]

  • L. Barroso, J. Dean, U. Hölzle. Web Search for a Planet: The Google Cluster Architecture. IEEE Micro, vol. 23 (2003).
SLIDE 7

Document Ranking with Text, Quality, and Click Features

  • Text features
    • TFIDF, BM25
    • Where do they appear? Title/body
    • Proximity (word distance)
  • Document quality and classification
    • Web link scores (e.g. PageRank)
    • Page length, URL type, etc.
  • User behavior data
    • Presentation: what a user sees before a click
    • Clickthrough: frequency and timing of clicks
    • Browsing: what users do after a click
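As one concrete instance of the text features listed above, BM25 can be sketched as below. This is a hedged example rather than the scoring used in the course: the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the non-negative `log(1 + ...)` form of the IDF term are assumptions (several IDF variants are in common use).

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against query_terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["web", "search", "ranking"],
    ["web", "crawler"],
    ["cooking", "recipes"],
]
scores = bm25_scores(["search", "ranking"], docs)
```

In a full ranker this score would be one feature among many, alongside the title/body position, proximity, link, and click features listed on the slide.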

SLIDE 8

Learning to rank

  • Convert the ranking problem to a classification problem.
    • Point-wise learning: given a query-document pair, predict a score (e.g. a relevance score)
    • Pair-wise learning: the input is a pair of documents for a query
    • List-wise learning
  • Bayes, SVM, decision trees, human rules.
  • Bagging/boosting to combine multiple schemes
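The pair-wise conversion to classification can be illustrated with a small sketch. Everything here is an assumption for illustration: the helper names, the toy feature vectors, and the choice of a plain perceptron as the classifier (the slide lists Bayes, SVM, and decision trees; a perceptron is swapped in only to keep the example dependency-free).

```python
from itertools import combinations


def pairwise_examples(features, relevance):
    """Turn one query's documents into (feature difference, label) classification examples."""
    examples = []
    for i, j in combinations(range(len(features)), 2):
        if relevance[i] == relevance[j]:
            continue  # ties express no preference between the two documents
        diff = [a - b for a, b in zip(features[i], features[j])]
        label = 1 if relevance[i] > relevance[j] else -1
        examples.append((diff, label))
    return examples


def train_perceptron(examples, epochs=20):
    """Learn weights w so that the sign of w . diff matches each preference label."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for diff, label in examples:
            if label * sum(wi * xi for wi, xi in zip(w, diff)) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, diff)]
    return w


# Toy data for a single query: each row is a document's feature vector.
features = [[3.0, 0.9], [1.0, 0.2], [2.0, 0.5]]
relevance = [2, 0, 1]  # graded relevance labels (hypothetical)

examples = pairwise_examples(features, relevance)
w = train_perceptron(examples)
doc_scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
ranking = sorted(range(len(features)), key=lambda d: doc_scores[d], reverse=True)
```

The learned weights score individual documents, so ranking at query time only needs a sort by score even though training used document pairs.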
SLIDE 9


Recommendation vs Search Ranking

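  • Collaborative filtering: similarity-guided recommendation

[Figure: signals used by each task. Web page ranking: text content, link popularity, user click data. Item recommendation: user rating, content.]

$$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} (r_{u,i} - \bar{r}_u)\, w_{a,u}}{\sum_{u=1}^{n} w_{a,u}}$$

Here $p_{a,i}$ is the predicted rating of user $a$ for item $i$, $r_{u,i}$ is the rating user $u$ gave item $i$, $\bar{r}_u$ is the mean rating of user $u$, and $w_{a,u}$ is the similarity weight between users $a$ and $u$.

[Figure: user-item rating matrix with user a and item i marked; the matrix is sparse.]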
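The similarity-weighted prediction for user a and item i can be sketched as follows. This is a minimal illustration under stated assumptions: the rating matrix is stored as nested dictionaries to reflect its sparsity, the similarity weights are given rather than computed, users who have not rated item i are skipped, and the fallback to the user's own mean when no neighbor qualifies is a choice made for this example.

```python
def mean_rating(user_ratings):
    """Average of the ratings a user has actually given."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict_rating(ratings, weights, a, i):
    """Predict user a's rating of item i from other users' mean-centered ratings."""
    numerator = 0.0
    denominator = 0.0
    for u, user_ratings in ratings.items():
        if u == a or i not in user_ratings:
            continue  # sparse matrix: only users who rated item i contribute
        w = weights[(a, u)]
        numerator += (user_ratings[i] - mean_rating(user_ratings)) * w
        denominator += w
    base = mean_rating(ratings[a])
    # Assumed fallback: with no usable neighbors, return the user's own mean.
    return base if denominator == 0 else base + numerator / denominator


# Hypothetical ratings and similarity weights.
ratings = {
    "a": {"x": 4, "y": 2},
    "u1": {"x": 5, "y": 3, "z": 4},
    "u2": {"x": 1, "z": 3},
}
weights = {("a", "u1"): 1.0, ("a", "u2"): 0.5}

prediction = predict_rating(ratings, weights, "a", "z")
```

Centering each neighbor's rating on that neighbor's own mean compensates for users who rate systematically high or low.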

SLIDE 10

Search Advertisement