SLIDE 1
Quality-biased Ranking for Queries with Commercial Intent
Alexander Shishkin, Polina Zhinalieva, Kirill Nikolaev
{sisoid, bondy, kvn}@yandex-team.ru
Yandex LLC
WebQuality Workshop 2013
Topical Relevance Scale Vital the most likely
SLIDE 2
SLIDE 3
The Main Problems of Commercial Ranking
Query: "IPhone 5 wholesale"

URL                          Rating
wholesaleiphone5.net         Highly relevant
wholesaleiphone5sale.com     Highly relevant
iphone5wholesale.com         Highly relevant
wholesaleiphone5cool.com     Highly relevant
appleiphone5wholesale.com    Highly relevant

◮ Any rearrangement of SE results makes no sense in terms of relevance metrics
◮ Top positions are saturated with over-optimized sites
SLIDE 4
Are Commercial Sites Really Identical?
best-tyres.ru
tyreservice.ru
SLIDE 5
Over-optimized Document Features
◮ Text features
◮ Link features
SLIDE 6
SEO Ecosystem
[Cycle diagram with the following nodes]
◮ Over-optimized sites in the top-10 SE results
◮ Further optimization of search factors
◮ Webmaster
SLIDE 7
Ecosystem of Commercial Ranking
[Cycle diagram with the following nodes]
◮ Improving the quality of search engine’s results
◮ Introducing new features to capture the site quality
◮ Quality-correlated factors optimization
◮ Webmaster
SLIDE 8
The Main Steps in Our Approach
◮ Step 1: introduce new relevance labels
◮ Step 2: create new ranking features
◮ Step 3: modify ranking function
◮ ??????
◮ PROFIT
SLIDE 9
Components of the Document Quality Score
◮ Assortment for a given query
◮ Design quality
◮ Trustworthiness of the site
◮ Quality of service
◮ Usability features of the site
SLIDE 10
Illustration of Assortment
◮ Assortment for a given query
◮ Design quality
◮ Trustworthiness of the site
◮ Quality of service
◮ Usability features of the site
SLIDE 11
Illustration of Assortment
◮ Assortment for a given query
◮ Design quality
◮ Trustworthiness of the site
◮ Quality of service
◮ Usability features of the site
SLIDE 12
Illustration of Usability Features
◮ Assortment for a given query
◮ Design quality
◮ Trustworthiness of the site
◮ Quality of service
◮ Usability features of the site
SLIDE 13
Illustration of Usability Features
◮ Assortment for a given query
◮ Design quality
◮ Trustworthiness of the site
◮ Quality of service
◮ Usability features of the site
SLIDE 14
Aggregation of Quality Components into the Single Score
Commercial relevance:

\[ R_c(q, d, s) = V(q, d) \cdot \bigl( D(s) + T(s) + S(s) + U(s) \bigr) \]

where
◮ q: search query
◮ d: document
◮ s: the whole site
◮ V(q, d): assortment
◮ D(s): design quality
◮ T(s): trustworthiness
◮ S(s): quality of service
◮ U(s): usability
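The aggregation can be sketched in a few lines of Python. This is only an illustration of the formula: the `SiteQuality` container, the function name, and the numeric scale of each component are assumptions, since the slides do not say how the components are scored.

```python
from dataclasses import dataclass


@dataclass
class SiteQuality:
    """Site-level components; the numeric scale is an assumption."""
    design: float       # D(s)
    trust: float        # T(s)
    service: float      # S(s)
    usability: float    # U(s)


def commercial_relevance(assortment: float, site: SiteQuality) -> float:
    """R_c(q, d, s) = V(q, d) * (D(s) + T(s) + S(s) + U(s))."""
    site_score = site.design + site.trust + site.service + site.usability
    return assortment * site_score


# Hypothetical shop with made-up component scores.
good_shop = SiteQuality(design=1.0, trust=1.0, service=0.5, usability=0.5)
print(commercial_relevance(1.0, good_shop))  # 3.0
print(commercial_relevance(0.0, good_shop))  # 0.0
```

Because the assortment term multiplies the sum, a site with no matching assortment for the query scores zero no matter how good its site-level components are.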
SLIDE 15
Features for Measuring Site Quality
A few examples:
◮ Detailed contact information
◮ Absence of advertising
◮ Number of different product items
◮ Availability of shipping service
◮ Price discounts
◮ . . .
SLIDE 16
Challenges of Commercial Ranking
◮ Assessment is 6 times more time-consuming
◮ Only highly relevant documents are evaluated
◮ New labels cover no more than 5% of the dataset
◮ All topical relevance labels should be used
Solution: extrapolate commercial relevance score to the entire dataset using machine learning.
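The extrapolation step can be pictured as: fit a model on the small labeled subset, then predict a score for every document. The sketch below uses one-feature ordinary least squares purely as a stand-in, since the slides do not name the learner or the features; `fit_line`, the `feature` field, and all data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data: one site-quality feature per document, and an
# assessor label only for the small labeled subset (None = unlabeled).
docs = [
    {"feature": 0.0, "label": 0.0},
    {"feature": 1.0, "label": 2.0},
    {"feature": 2.0, "label": 4.0},
    {"feature": 1.5, "label": None},
    {"feature": 0.5, "label": None},
]

labeled = [d for d in docs if d["label"] is not None]
a, b = fit_line([d["feature"] for d in labeled],
                [d["label"] for d in labeled])

# Extrapolate: every document, labeled or not, gets an estimated score.
for d in docs:
    d["rc_est"] = a * d["feature"] + b
```

In practice the model would take many site-quality features and a stronger learner; the point is only that the estimate is defined for the whole dataset, not just the labeled part.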
SLIDE 17
Learning to Rank with New Relevance Labels
Unified relevance:

\[ R_u(q, d, s) = R_t(q, d) + \alpha \cdot R_c^{est}(q, d, s) \]

where
◮ R_t(q, d): topical relevance score
◮ R_c^{est}(q, d, s): estimate of the commercial relevance score
◮ α: weighting coefficient
And now we use a standard machine learning algorithm . . .
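A small sketch of how the weighting coefficient changes the combined label. The URLs, scores, and the value of α are made up for illustration. In the slides the unified score serves as the label for a learning-to-rank algorithm; sorting by it here only shows what ordering that label prefers.

```python
def unified_relevance(topical: float, commercial_est: float, alpha: float) -> float:
    """R_u = R_t + alpha * R_c^est."""
    return topical + alpha * commercial_est


# (url, topical relevance R_t, estimated commercial relevance R_c^est);
# all values are hypothetical.
candidates = [
    ("a.example", 3.0, 0.5),
    ("b.example", 3.0, 2.5),
    ("c.example", 2.0, 3.0),
]

alpha = 0.5  # assumed value; the slides do not report the one used
ranked = sorted(
    candidates,
    key=lambda c: unified_relevance(c[1], c[2], alpha),
    reverse=True,
)
for url, rt, rc in ranked:
    print(url, unified_relevance(rt, rc, alpha))
```

With α = 0 the label reduces to topical relevance and `a.example` ties with `b.example`; with α = 0.5 the higher-quality `b.example` moves clearly ahead, and `c.example` overtakes `a.example` despite a lower topical label.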
SLIDE 18
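New Metrics for the Method Evaluation
Offline DCG-like metrics:

\[ \mathrm{Goodness}(q) = \sum_{i=1}^{10} \frac{R_c(q, d_i, s_i)}{\log_2(i + 1)} \]

\[ \mathrm{Badness}(q) = \sum_{i=1}^{10} \frac{\bigl[ R_c(q, d_i, s_i) \le th \bigr]}{\log_2(i + 1)} \]

where the bracket equals 1 when the condition holds and 0 otherwise, and th is the threshold for the minimal acceptable site quality.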
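Both metrics can be computed directly from the commercial relevance scores of the ranked results. The sketch below assumes the scores are given in rank order; the example scores and threshold are invented.

```python
import math


def goodness(rc_scores, k=10):
    """Sum of R_c over the top-k results, discounted by log2(position + 1)."""
    return sum(
        rc / math.log2(i + 1)
        for i, rc in enumerate(rc_scores[:k], start=1)
    )


def badness(rc_scores, th, k=10):
    """Discounted count of top-k results whose R_c is at or below th."""
    return sum(
        (1.0 if rc <= th else 0.0) / math.log2(i + 1)
        for i, rc in enumerate(rc_scores[:k], start=1)
    )


# Hypothetical R_c scores for the top three results of one query.
top = [3.0, 0.0, 2.0]
print(goodness(top))      # 3/1 + 0 + 2/2 = 4.0
print(badness(top, 0.5))  # only position 2 is below the threshold: 1/log2(3)
```

A low-quality site hurts Badness more the higher it is ranked, which mirrors the position discount in Goodness.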
SLIDE 19
Changes in New Metrics
◮ Goodness metric: 30% increase
◮ Badness metric: 70% decrease
SLIDE 20
Changes in Online Metrics
A/B experiment:
◮ 7% increase in the Long Clicks per Session metric;
◮ 5% decrease in the Abandonment Rate metric.
Interleaving experiment:
◮ users chose the new ranking results 1% more often than results from the default ranking system.
SLIDE 21