Incorporating Clicks, Attention and Satisfaction into a SERP Evaluation Model



SLIDE 1

Background Motivation Model & Metric Experimental Setup Results Summary

Incorporating Clicks, Attention and Satisfaction into a SERP Evaluation Model

Aleksandr Chuklin¶,§ Maarten de Rijke§ chuklin@google.com derijke@uva.nl

¶Google Research Europe §University of Amsterdam AC–MdR Incorporating Clicks, Attention and Satisfaction. . . 1

SLIDE 2

Background

SLIDE 3

Search Engine Result Page (SERP) Evaluation

Main problem: combining the relevance of individual SERP items ($R_k$) into a whole-page metric.

SLIDE 7

Search Engine Result Page (SERP) Evaluation

Examples

Precision at N:
$$P@N = \frac{1}{N} \sum_{k=1}^{N} R_k$$

Discounted Cumulative Gain (DCG):
$$DCG@N = \sum_{k=1}^{N} \frac{1}{\log_2(1 + k)} \cdot R_k$$

Model-Based Metrics (Chuklin et al. 2013):
$$Utility@N = \sum_{k=1}^{N} P(C_k = 1) \cdot R_k$$

[Figure: example result list with five documents (document 1 to document 5).]
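The three example metrics can be compared side by side with a minimal sketch. The relevance labels and click probabilities below are hypothetical values chosen for illustration; in the model-based setting the click probabilities would come from a trained click model.

```python
import math

def precision_at_n(relevance, n):
    """P@N: mean relevance R_k of the top-n items."""
    return sum(relevance[:n]) / n

def dcg_at_n(relevance, n):
    """DCG@N: each R_k is discounted by log2(1 + k), with ranks k starting at 1."""
    return sum(r / math.log2(1 + k) for k, r in enumerate(relevance[:n], start=1))

def utility_at_n(relevance, click_probs, n):
    """Model-based utility: each R_k is weighted by the click probability P(C_k = 1)."""
    return sum(p * r for p, r in zip(click_probs[:n], relevance[:n]))

# Hypothetical binary relevance labels and click probabilities for five results.
relevance = [1, 0, 1, 1, 0]
click_probs = [0.6, 0.3, 0.2, 0.1, 0.05]

print(precision_at_n(relevance, 5))
print(dcg_at_n(relevance, 5))
print(utility_at_n(relevance, click_probs, 5))
```

All three metrics share the same form, a weighted sum of per-item relevance; they differ only in where the weights come from (uniform, a fixed rank discount, or a click model).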

SLIDE 8

Main Goal of This Paper

A better measure for SERP utility. Namely, improve this (Chuklin et al. 2013):
$$\sum_{k=1}^{N} P(C_k = 1) \cdot R_k$$

SLIDE 9

Motivation

SLIDE 10

Complex Heterogeneous SERPs

SLIDE 11

Motivation 1: Non-Trivial Attention Patterns

[Figure: attention pattern on a SERP.]

Image credits: F. Diaz, R.W. White, G. Buscher, and D. Liebling. Robust models of mouse movement on dynamic web search results pages. In CIKM, 2013. ACM Press.

SLIDE 12

Motivation 2: Satisfaction Without Clicks

High direct page utility (measured by DCG or ERR) leads to a higher abandonment rate (SERPs with no clicks).

[Figure: plot with axis "direct page utility".]

Image credits: from A. Chuklin and P. Serdyukov. Good abandonments in factoid queries. In WWW, 2012.

SLIDE 15

Problems of Existing Models and Evaluation Metrics

  • existing models mostly do not model non-trivial user attention patterns
  • existing models do not use explicit user satisfaction data

SLIDE 16

Model & Metric

SLIDE 17

Clicks + Attention + Satisfaction (CAS) Model

[Figure: diagram of the CAS model relating the SERP items to Utility.]

SLIDE 20

Click Model

Examination assumption: a click happens only when an item was examined and attractive:
$$P(C_k = 1) = P(E_k = 1) \cdot P(C_k = 1 \mid E_k = 1)$$

N.B. Here we assume that $P(C_k = 1 \mid E_k = 1) = \alpha(R_k)$, where $R_k$ comes from the raters and $\alpha$ is a logistic function.
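The examination assumption can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the rating histogram layout, the `weights`, and the `bias` are hypothetical, and the slides only state that the attractiveness is a logistic function of the rater-provided $R_k$.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def attractiveness(rating_histogram, weights, bias):
    """P(C_k = 1 | E_k = 1): logistic function of a linear score over the ratings.

    The linear score over a histogram is an assumption made for this sketch.
    """
    score = bias + sum(w * h for w, h in zip(weights, rating_histogram))
    return sigmoid(score)

def click_probability(p_examined, p_click_given_examined):
    """Examination assumption: P(C_k = 1) = P(E_k = 1) * P(C_k = 1 | E_k = 1)."""
    return p_examined * p_click_given_examined

# Hypothetical item: 70% of raters chose "relevant", 30% "not relevant".
p_attr = attractiveness([0.3, 0.7], weights=[-1.0, 2.0], bias=0.0)
print(click_probability(0.8, p_attr))
```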

SLIDE 24

Attention (Examination) Model

Logistic regression model: $P(E_k = 1) = \varepsilon(\varphi_k)$, where $\varphi_k$ is a vector of features for SERP item $k$.

Feature group | Features | # of features
rank | user-perceived rank of the SERP item (can be different from k) | 1
CSS classes | SERP item type (Web, News, Weather, Currency, Knowledge Panel, etc.) | 10
geometry | offset from the top, first or second column (binary), width (w), height (h), w × h | 5
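A minimal sketch of how such a feature vector could feed a logistic regression is given below. The item-type list is a hypothetical three-entry subset (the slides mention 10 CSS-class features), and the weights, bias, and pixel values are made up for illustration; a real model would learn the weights from attention data.

```python
import math

# Hypothetical subset of SERP item types; the slides list 10 CSS-class features.
ITEM_TYPES = ["web", "news", "weather"]

def attention_features(rank, item_type, top_offset, second_column, width, height):
    """Build the feature vector: rank (1), one-hot item type, geometry (5)."""
    one_hot = [1.0 if item_type == t else 0.0 for t in ITEM_TYPES]
    geometry = [
        float(top_offset),
        1.0 if second_column else 0.0,  # first or second column (binary)
        float(width),
        float(height),
        float(width * height),
    ]
    return [float(rank)] + one_hot + geometry

def examination_probability(features, weights, bias):
    """P(E_k = 1) as a logistic function of a linear score over the features."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)

features = attention_features(rank=2, item_type="news", top_offset=180,
                              second_column=False, width=600, height=90)
weights = [-0.5, 0.0, 0.3, 0.1, -0.002, -0.4, 0.0005, 0.001, 0.0]
print(examination_probability(features, weights, bias=1.0))
```

In practice the geometry features would usually be rescaled before training so that pixel-sized values do not dominate the score.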

SLIDE 29

Satisfaction Model

  • in previous models, satisfaction comes only from clicked results;
  • in our model it also comes from the SERP items that simply attracted attention;

$$P(S = 1) = \sigma(\tau_0 + U) = \sigma\left(\tau_0 + \sum_k P(E_k = 1)\, u_d(D_k) + \sum_k P(C_k = 1)\, u_r(R_k)\right)$$

where $D_k$ and $R_k$ are ratings assigned by the raters for direct snippet relevance and result relevance, respectively; $u_d$ and $u_r$ are linear functions of rating histograms.
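The satisfaction formula can be traced with a small numeric sketch. Everything concrete here is an assumption made for illustration: the two-bin rating histograms, the weight vectors `w_d` and `w_r` standing in for the linear functions $u_d$ and $u_r$, and the intercept `tau0`.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def linear_utility(histogram, weights):
    """Linear function of a rating histogram (stands in for u_d or u_r)."""
    return sum(w * h for w, h in zip(weights, histogram))

def cas_utility(items, w_d, w_r):
    """U = sum_k P(E_k=1) u_d(D_k) + sum_k P(C_k=1) u_r(R_k)."""
    total = 0.0
    for item in items:
        total += item["p_exam"] * linear_utility(item["d_hist"], w_d)
        total += item["p_click"] * linear_utility(item["r_hist"], w_r)
    return total

def satisfaction_probability(items, w_d, w_r, tau0):
    """P(S = 1) = sigma(tau0 + U)."""
    return sigmoid(tau0 + cas_utility(items, w_d, w_r))

# Hypothetical SERP with two items; histograms are [not relevant, relevant].
items = [
    {"p_exam": 0.8, "p_click": 0.4, "d_hist": [0.0, 1.0], "r_hist": [1.0, 0.0]},
    {"p_exam": 0.5, "p_click": 0.1, "d_hist": [1.0, 0.0], "r_hist": [0.0, 1.0]},
]
w_d, w_r, tau0 = [0.0, 1.0], [0.0, 2.0], -1.0

print(cas_utility(items, w_d, w_r))
print(satisfaction_probability(items, w_d, w_r, tau0))
```

The first item contributes utility through attention alone (a useful snippet that is examined), which is the case the click-only formulation cannot capture.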

SLIDE 32

The CAS Metric

Utility that determines the satisfaction probability:
$$U = \underbrace{\sum_k P(E_k = 1)\, u_d(D_k)}_{\text{NEW}} + \underbrace{\sum_k P(C_k = 1)\, u_r(R_k)}_{\text{Chuklin et al. 2013}}$$

Compared with Chuklin et al. 2013, the CAS metric:

  • has an additional term (marked NEW);
  • is trained on mousing and satisfaction data (in addition to clicks).

SLIDE 33

Experimental Setup

SLIDE 35

Dataset

  • 199 queries with explicit unambiguous feedback (satisfied / not satisfied);
  • 1,739 rated results, with two kinds of ratings: direct snippet relevance (D) and result relevance (R).

SLIDE 37

Baselines and CAS Model Variants

  • UBM: a model that agrees well with online team-draft experimental outcomes;
  • PBM: position-based model, a robust model with fewer parameters than UBM;
  • random: a model that predicts click and satisfaction with fixed probabilities (learned from the data);
  • uUBM: from Chuklin et al. 2013. Similar to UBM, but parameters are trained on a different and much bigger dataset.

  • CASnod: a stripped-down version that does not use (D) labels;
  • CASnosat: a version of the CAS model that does not include the satisfaction term while optimizing the model;
  • CASnoreg: a version of the CAS model that does not use regularization while training. All other models were trained with L2-regularization.

SLIDE 38

Results

SLIDE 39

Is the New Metric Really New?

Correlation Between Metrics

Table: Correlation between metrics measured by average Pearson's correlation coefficient.

         | CASnosat | CASnoreg | CAS   | UBM   | PBM   | DCG   | uUBM
CASnod   | 0.593    | 0.564    | 0.633 | 0.470 | 0.487 | 0.546 | 0.441
CASnosat |          | 0.664    | 0.715 | 0.707 | 0.668 | 0.735 | 0.684
CASnoreg |          |          | 0.974 | 0.363 | 0.379 | 0.417 | 0.341
CAS      |          |          |       | 0.377 | 0.394 | 0.440 | 0.360
UBM      |          |          |       |       | 0.814 | 0.972 | 0.882
PBM      |          |          |       |       |       | 0.906 | 0.965
DCG      |          |          |       |       |       |       | 0.943
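Each cell of such a table is a Pearson correlation between the scores that two metrics assign to the same set of SERPs. A minimal sketch of that computation follows; the metric scores are hypothetical, and the averaging mentioned in the table caption is not reproduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Hypothetical scores of two metrics on the same five SERPs.
metric_a = [0.9, 0.4, 0.7, 0.2, 0.5]
metric_b = [1.8, 1.1, 1.3, 0.6, 1.2]
print(pearson(metric_a, metric_b))
```

A value close to 1 means the two metrics rank SERPs almost identically, so a low correlation with the existing metrics is what makes a new metric informative.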

SLIDE 40

Is the New Metric Measuring the Right Thing?

Metric Correlation with True Satisfaction

[Figure: Pearson correlation coefficient between different model-based metrics (CASnod, CASnosat, CASnoreg, CAS, UBM, PBM, random, DCG, uUBM) and the user-reported satisfaction.]

SLIDE 41

Bonus Point

Log-Likelihood of Click Prediction

[Figure: log-likelihood of the click data for CASnod, CASnosat, CASnoreg, CAS, UBM, PBM, random, and uUBM. Note that uUBM was trained on a totally different dataset.]
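For reference, the log-likelihood of observed clicks under a click model can be sketched as a sum of Bernoulli log-probabilities. The probabilities and clicks below are hypothetical, and how the slides aggregate or normalize the value (per item, per SERP, or per query) is not stated, so this sketch simply sums over items.

```python
import math

def click_log_likelihood(click_probs, clicks, eps=1e-12):
    """Sum of log P(observed click outcome) over items.

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for p, clicked in zip(click_probs, clicks):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if clicked else math.log(1.0 - p)
    return total

# Hypothetical predictions for one SERP and the observed clicks.
predicted = [0.6, 0.3, 0.1]
observed = [1, 0, 0]
print(click_log_likelihood(predicted, observed))
```

Values closer to zero indicate that the model assigns higher probability to the clicks that were actually observed.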

SLIDE 42

Summary

SLIDE 46

Summary

  • A model-based metric needs to model satisfaction explicitly and use it for training.
  • Direct snippet relevance (D) is essential for predicting satisfaction.
  • The CAS metric is quite different from the previously used metrics, making it an interesting addition to TREC.
  • When used as a model, CAS consistently predicts user satisfaction with a relatively small penalty in click prediction.

SLIDE 47

Acknowledgments

All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

SLIDE 49

Evaluating the User Model

Log-Likelihood of Satisfaction Prediction

[Figure: log-likelihood of the satisfaction prediction for CASnod, CASnosat, CASnoreg, CAS, UBM, PBM, random, and uUBM. Some models have log-likelihood below −0.8, hence there are no boxes for them.]

SLIDE 50

Analyzing the Attention Features

  • CASrank is the model that only uses the rank to predict attention;
  • CASnogeom only uses the rank and SERP item type information and does not use geometry;
  • CASnoclass does not use the CSS class features (SERP item type).

[Figure, three panels comparing CASrank, CASnogeom, CASnoclass, CASnod, and CAS: Pearson correlation with satisfaction; log-likelihood of clicks; log-likelihood of satisfaction.]

SLIDE 51

Heterogeneous SERPs

12% of the SERPs in our data are heterogeneous, and our metric does well for them.

Table: Pearson correlation between utility of heterogeneous SERPs and user-reported satisfaction.

CAS  | UBM  | PBM   | random | DCG  | uUBM
0.60 | 0.38 | −0.05 | −0.39  | 0.24 | −0.08

CASrank | CASnogeom | CASclass | CASnod | CASnosat | CASnoreg
0.15    | −0.04     | 0.27     | −0.04  | 0.48     | 0.67

SLIDE 52

Spammers

Some raters were filtered out as spammers, but there was still some natural disagreement:

Table: Filtered out workers and agreement scores for remaining workers.

label | % of workers removed | % of ratings removed | Cohen's kappa | Krippendorff's alpha
(D)   | 32%                  | 27%                  | 0.339         | 0.144
(R)   | 41%                  | 29%                  | 0.348         | 0.117
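As a reminder of what the kappa column measures, Cohen's kappa for two raters can be sketched as below. The labels are hypothetical, and the slides do not say how kappa was aggregated across more than two workers, so this only illustrates the pairwise statistic.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance.

    Undefined (division by zero) when chance agreement equals 1,
    for example when both raters always use the same single label.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical binary relevance labels from two workers on four results.
worker_1 = [1, 1, 0, 0]
worker_2 = [1, 0, 0, 0]
print(cohens_kappa(worker_1, worker_2))
```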