Building Recommenders and Search - PowerPoint PPT Presentation



SLIDE 1

Building Recommenders and Search Engines by Re-using User Feedback

Adith Swaminathan
adswamin@microsoft.com

Joint work with Thorsten Joachims and Tobias Schnabel (Cornell University)

Ack: NSF Grants

SLIDE 2

Bio

MSR - DLTC: Counterfactual Evaluation and Learning

SLIDE 3

Summary

"Pay attention to feedback effects, and dis-entangle them" -- David
"Randomize cleverly to break confounding/feedback" -- Yisong
"Use logs collected from interactive systems to evaluate/train new interaction policies"

Now: Simple/pragmatic techniques to tackle biased user feedback

SLIDE 4

Wald's insight: What's missing?

  • Where to add armor?
  • Cover bullet-holes? (Survivor bias!)
  • Beware: Confounding due to missing info

SLIDE 5

Overview

  • "Use user ratings for collaborative filtering"
    – Project: MNAR (Schnabel et al, ICML 2016)
  • "Use user clicks for search ranking"
    – Project: ULTR (Joachims et al, WSDM 2017)

SLIDE 6

Movie Recommendation

[Figure: toy ratings matrix - Romance Lovers and Horror Lovers rating Horror, Romance, and Drama movies. Y: true rating; O: observed (Y/N)]

Data is Missing Not At Random (MNAR)

Example adapted from (Steck et al, 2010)

SLIDE 7

Selection Bias in Recommendations

  • User-induced (e.g. browsing)
  • System-induced (e.g. advertising)

Question: What if we ignore these biases?

SLIDE 8

Evaluating recommendations under Selection Bias

[Figure: the toy ratings matrix again. Y: true rating; O: observed (Y/N); Ŷ: recommended]

Observed ratings are misleading

SLIDE 9

Evaluating rating predictions under Selection Bias

[Figure: two predicted-rating matrices for the Romance Lovers / Horror Lovers × Horror / Romance / Drama example - Ŷ₁, predicted ratings (worse), and Ŷ₂, predicted ratings (better)]

Observed losses are misleading

SLIDE 10

Recommendations as Treatments

Fix selection bias → potential outcomes framework

[Figure: the users × items ratings matrix recast as patients × treatments, with counterfactual outcomes Y and factual outcomes Ỹ]

⇒ Understand assignment mechanism

(Imbens & Rubin, 2015)

SLIDE 11

Assignment Mechanism for Recommendation

Propensities P: P_{u,i} = P(O_{u,i} = 1)

Inverse Propensity Scoring (IPS) is unbiased if P_{u,i} > 0:

R̂_IPS = (1 / (U·I)) Σ_{u,i} 𝟙{O_{u,i} = 1} / P_{u,i} · (Y_{u,i} − Ŷ_{u,i})²

[Figure: toy propensity matrix P for the Horror / Romance / Drama example, with entries p, p/10, p/2 depending on user and movie group]

(Horvitz & Thompson, 1952; Rosenbaum & Rubin, 1983; ...)
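As a sanity check on the estimator above, here is a small simulation (the toy MNAR setup, noise levels, and variable names are illustrative, not from the talk): with known propensities, the IPS-weighted squared error recovers the true prediction error, while the naive average over observed entries is badly biased.

```python
import numpy as np

rng = np.random.default_rng(0)

U, I = 200, 300
Y = rng.integers(1, 6, size=(U, I)).astype(float)     # true ratings 1..5
# The predictor is noisier on low-rated items ...
noise_sd = np.where(Y >= 4, 0.5, 2.0)
Y_hat = Y + rng.normal(size=(U, I)) * noise_sd
# ... and high ratings are 10x more likely to be observed (selection bias).
P = np.where(Y >= 4, 0.5, 0.05)

err = (Y - Y_hat) ** 2
true_mse = err.mean()

# Average both estimators over many observation patterns O ~ Bernoulli(P).
naive_runs, ips_runs = [], []
for _ in range(200):
    O = rng.random((U, I)) < P
    naive_runs.append(err[O].mean())                  # naive: mean over observed
    ips_runs.append(np.sum(O * err / P) / (U * I))    # IPS: reweight by 1/P
naive_est = float(np.mean(naive_runs))
ips_est = float(np.mean(ips_runs))
```

The naive estimate mostly sees well-predicted, high-rated entries and is far too optimistic; the IPS estimate matches the true error.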

SLIDE 12

Debiasing Evaluation

IPS is robust to selection bias

[Figure: estimator error as severity of selection bias increases, two panels]

SLIDE 13

Experimental vs. Observational

  • Controlled Experiments
    – We control the assignment mechanism (e.g. ad placement)
    – Propensities P_{u,i} = P(O_{u,i} = 1) known [Just log propensities!]
    – Requirement: P_{u,i} > 0 (probabilistic assignment)
  • Observational Study
    – Assignment mechanism not under our control (e.g. reviews/ratings)
    – Use features a: P̂_{u,i} = P(O_{u,i} = 1 | a) [Estimate propensity]
    – Requirement: O_{u,i} ⊥ Y_{u,i} | a (unconfounded)

SLIDE 14

Propensity Estimation

  • Supervised Regression Problem: P̂_{u,i} = P(O_{u,i} = 1 | a)
  • Off-the-shelf ML, e.g.,
    – Logistic regression
    – Naïve Bayes
    – Bernoulli Matrix Factorization
    – …

IPS is robust to inaccurate propensities

[Figure: observation indicator matrix O for the Horror / Romance / Drama example]
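One of the off-the-shelf routes above is a Naïve-Bayes-style estimate that conditions the propensity on the rating value via Bayes' rule, using a small missing-at-random sample for the rating marginal. A minimal sketch (the simulated observation probabilities and all names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# True (unknown) chance of observing a rating, indexed by rating value 1..5:
# users mostly rate what they like, so high ratings are observed far more often.
true_prop = np.array([0.02, 0.04, 0.06, 0.20, 0.40])

ratings = rng.integers(1, 6, size=100_000)               # Y, marginally uniform
observed = rng.random(ratings.size) < true_prop[ratings - 1]

# Naive-Bayes estimate: P(O=1 | Y=r) = P(Y=r | O=1) P(O=1) / P(Y=r).
# The first two factors come from the biased log; the marginal P(Y=r) would
# come from a small missing-at-random sample (here, the known uniform marginal).
p_o = observed.mean()
p_r_given_o = np.array([(ratings[observed] == r).mean() for r in range(1, 6)])
p_r = np.full(5, 0.2)
propensity_hat = p_r_given_o * p_o / p_r                 # P-hat(O=1 | Y=r)
```

The estimate recovers the true per-rating observation probabilities up to sampling noise.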

SLIDE 15

Debiased Collaborative Filtering

Propensity-weighted Matrix Factorization:

Ŷ_ERM = argmin_{V,W} Σ_{O_{u,i}=1} (1 / P_{u,i}) (Y_{u,i} − V_u·W_i)² + λ (‖V‖_F² + ‖W‖_F²)

[Figure: graphical model contrasting the generative Missing-Data Model (latent variables; observations O, features, observed ratings Ỹ) with the discriminative Complete-Data Model]

(Marlin et al, 2007; Steck, 2011; ...)
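A minimal sketch of fitting this propensity-weighted objective, using alternating least squares on a noiseless toy matrix (the dimensions, λ, and the choice of ALS as the optimizer are mine; the slide's objective does not prescribe one):

```python
import numpy as np

rng = np.random.default_rng(2)

U, I, K = 30, 40, 4
V_true = rng.normal(size=(U, K))
W_true = rng.normal(size=(I, K))
Y = V_true @ W_true.T                     # noiseless true ratings
P = np.where(Y > 0, 0.8, 0.1)             # MNAR: high ratings observed more often
O = rng.random((U, I)) < P                # observation indicator
weights = O / P                           # IPS weights: 1/P on observed entries

lam = 0.01
V = rng.normal(0, 0.1, (U, K))
W = rng.normal(0, 0.1, (I, K))
for _ in range(20):
    for u in range(U):                    # weighted ridge solve per user ...
        A = (W.T * weights[u]) @ W + lam * np.eye(K)
        V[u] = np.linalg.solve(A, (W.T * weights[u]) @ Y[u])
    for i in range(I):                    # ... and per item
        A = (V.T * weights[:, i]) @ V + lam * np.eye(K)
        W[i] = np.linalg.solve(A, (V.T * weights[:, i]) @ Y[:, i])

observed_mse = np.mean(((V @ W.T - Y) ** 2)[O])
```

Each inner solve is the closed-form minimizer of the IPS-weighted squared error for one row of V (or W), holding the other factor fixed.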

SLIDE 16

Collaborative Filtering Results

  • Two real-world MNAR datasets
    – YAHOO: Song ratings (15400 users; Marlin & Zemel, 2009)
    – COAT: Shopping ratings (300 users; new; Schnabel et al, 2016)
  • Report performance on MAR datasets

http://www.cs.cornell.edu/~schnabts/mnar/

SLIDE 17

Overview

  • "Use user ratings for collaborative filtering"
    – Project: MNAR (Schnabel et al, ICML 2016)
  • "Use user clicks for search ranking"
    – Project: ULTR (Joachims et al, WSDM 2017)

SLIDE 18

Learning-to-Rank from Clicks

[Figure: many presented rankings ȳ over documents A-G, each annotated with the user's clicks]

Query Distribution: x_i ∼ P(X)
Deployed Ranker: ȳ_i = S̄(x_i)
Learning Algorithm produces a New Ranker S(x), which should perform better than S̄(x)

SLIDE 19

Evaluating Rankings

[Figure: the deployed ranker ȳ = S̄("SVM") presents A B C D E F G and receives clicks; the new ranker to evaluate, y = S("SVM"), orders them F G D C E A B; manual relevance labels give the ranks of the relevant documents]

SLIDE 20

Evaluation with Missing Judgments

  • Loss: Δ(y | r)
    – Relevance labels r_i ∈ {0,1}
    – This talk: rank of relevant documents: Δ(y | r) = Σ_i rank(i | y) · r_i
  • Assume:
    – Click implies observed and relevant: c_i = 1 ↔ (o_i = 1) ∧ (r_i = 1)
  • Problem:
    – No click can mean not relevant OR not observed: c_i = 0 ↔ (o_i = 0) ∨ (r_i = 0)

→ Understand observation mechanism

[Figure: presented ranking ȳ over documents A-G with a click]
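The loss above, the sum of the ranks of the relevant documents, is a one-liner; the document ids and rankings below are made up for illustration (lower is better):

```python
def rank_loss(ranking, relevant):
    """Delta(y|r): sum of the 1-based ranks of the relevant documents."""
    return sum(pos + 1 for pos, doc in enumerate(ranking) if doc in relevant)

presented = ["A", "B", "C", "D", "E", "F", "G"]
improved = ["E", "A", "B", "C", "D", "F", "G"]
relevant = {"A", "E"}
loss_presented = rank_loss(presented, relevant)   # A at rank 1, E at rank 5 -> 6
loss_improved = rank_loss(improved, relevant)     # E at rank 1, A at rank 2 -> 3
```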

SLIDE 21

Inverse Propensity Score Estimator

  • Observation Propensities Q(o_i = 1 | x, ȳ, r)
    – Random variable o_i ∈ {0,1} indicates whether relevance label r_i is observed
  • Inverse Propensity Score (IPS) Estimator:
    Δ̂(y | r, o) = Σ_{i: c_i = 1} rank(i | y) / Q(o_i = 1 | x, ȳ, r)
  • Unbiasedness: E_o[Δ̂(y | r, o)] = Δ(y | r)

[Figure: new ranking evaluated against the presented ranking ȳ with propensities Q: A 1.0, B 0.8, C 0.5, D 0.2, E 0.2, F 0.2, G 0.1]

[Horvitz & Thompson, 1952] [Rubin, 1983] [Zadrozny et al., 2003] [Langford & Li, 2009] [Swaminathan & Joachims, 2015]
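Unbiasedness of this estimator can be checked exactly on a tiny example by enumerating every observation pattern o (the documents, propensities, and relevance labels below are invented):

```python
import itertools

def ips_rank_loss(ranking, clicks, q):
    """IPS estimate: sum of rank(i|y) / q_i over clicked documents."""
    return sum((pos + 1) / q[doc]
               for pos, doc in enumerate(ranking) if doc in clicks)

ranking = ["A", "B", "C", "D"]
relevant = {"A", "C"}
q = {"A": 1.0, "B": 0.8, "C": 0.5, "D": 0.2}   # observation propensities

true_loss = 1 + 3                               # ranks of the relevant A and C

# E_o[IPS estimate], with o_i ~ Bernoulli(q_i) independently and a click
# happening iff a document is both observed and relevant.
expected = 0.0
for pattern in itertools.product([0, 1], repeat=len(ranking)):
    prob = 1.0
    observed = set()
    for doc, o in zip(ranking, pattern):
        prob *= q[doc] if o else 1.0 - q[doc]
        if o:
            observed.add(doc)
    expected += prob * ips_rank_loss(ranking, observed & relevant, q)
```

Each clicked document contributes q_i · rank(i|y)/q_i in expectation, so the terms cancel exactly to the full-information loss.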

SLIDE 22

ERM for Partial-Information LTR

  • Unbiased Empirical Risk:
    R̂_IPS(S) = (1/n) Σ_{(x, ȳ, c)} Σ_{i: c_i = 1} rank(i | S(x)) / Q(o_i = 1 | x, ȳ, r)
  • ERM Learning: Ŝ = argmin_S R̂_IPS(S)
  • Questions:
    – How do we optimize this empirical risk in a practical learning algorithm?
    – How do we define and estimate the propensity model Q(o_i = 1 | x, ȳ, r)?

Consistent estimator of true error ⇒ consistent ERM learning

SLIDE 23

Propensity-Weighted SVM Rank

  • Data: {(x_k, d_k, D_k, q_k)}_{k=1..n} (query, clicked result, other candidates, click propensity)
  • Training QP:
    w* = argmin_{w, ξ ≥ 0} (1/2) w·w + (C/n) Σ_k (1/q_k) Σ_j ξ_{kj}
    s.t. ∀ d̄_j ∈ D_1: w·[φ(x_1, d_1) − φ(x_1, d̄_j)] ≥ 1 − ξ_{1j}
         ⋮
         ∀ d̄_j ∈ D_n: w·[φ(x_n, d_n) − φ(x_n, d̄_j)] ≥ 1 − ξ_{nj}
  • Loss Bound: for any w, the rank of the clicked result d under sort(w·φ(x, ·)) is at most Σ_j ξ_j + 1

Optimizes convex upper bound on unbiased IPS risk estimate! [Joachims et al., 2002]
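The QP above is typically handed to an SVM solver; as an illustrative stand-in, here is a subgradient-descent sketch of the same propensity-weighted objective, with hinge losses replacing the slack variables (all data, names, and hyperparameters are invented; this is not the talk's implementation):

```python
import numpy as np

rng = np.random.default_rng(3)

def train_prop_svmrank(phi_click, phi_others, q, C=1.0, lr=0.05, n_steps=300):
    """Subgradient descent on (1/2)||w||^2 + (C/n) sum_k (1/q_k) sum_j hinge."""
    n, dim = phi_click.shape
    w = np.zeros(dim)
    for _ in range(n_steps):
        grad = w.copy()                              # gradient of (1/2) w.w
        for k in range(n):
            diff = phi_click[k] - phi_others[k]      # (m, dim) pair differences
            active = 1.0 - diff @ w > 0              # violated margin constraints
            grad -= (C / (n * q[k])) * diff[active].sum(axis=0)
        w -= lr * grad
    return w

# Toy data: the clicked document's first feature is shifted up by 2.
n, m, dim = 40, 5, 3
phi_click = rng.normal(size=(n, dim)) + np.array([2.0, 0.0, 0.0])
phi_others = rng.normal(size=(n, m, dim))
q = rng.uniform(0.2, 1.0, size=n)                    # click propensities
w = train_prop_svmrank(phi_click, phi_others, q)

# Fraction of (clicked, other) pairs the learned w orders correctly.
pair_acc = float(np.mean(((phi_click[:, None, :] - phi_others) @ w) > 0))
```

Low-propensity clicks get up-weighted by 1/q_k, exactly mirroring the C/(n·q_k) weighting of the slacks in the QP.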

SLIDE 24

Position-Based Propensity Model

  • Model: P(c_i = 1 | r_i, rank(i | ȳ)) = q_{rank(i | ȳ)} · 𝟙[r_i = 1]
  • Assumptions:
    – Examination only depends on rank
    – Click reveals relevance if rank is examined

[Figure: presented ranking ȳ with examination probabilities q_1, ..., q_7 for positions A-G]

[Richardson et al., 2007] [Chuklin et al., 2015] [Wang et al., 2016]

SLIDE 25

Estimating the Propensities

  • Experiment:
    – Click rate at rank 1: q_1 · P(c = 1 | o = 1)
  • Intervention:
    – Swap results at rank 1 and rank k
    – Click rate at rank k: q_k · P(c = 1 | o = 1)

q_1 / q_k = (click rate at rank 1) / (click rate at rank k after swap)

[Langford et al., 2009; Wang et al., 2016]
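A toy simulation of this swap estimate (the true examination probabilities and the click model are invented for illustration): the unknown click-given-examination probability cancels in the ratio of click rates.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented ground truth: examination probability by rank, and the probability
# that an examined relevant result is clicked.
q_true = np.array([1.0, 0.6, 0.4, 0.25, 0.15])    # q_1 ... q_5
p_click_given_exam = 0.3
n = 200_000                                        # impressions per condition

# Condition A: result shown at rank 1.  Condition B: swapped down to rank 4.
clicks_at_1 = rng.random(n) < q_true[0] * p_click_given_exam
clicks_at_4 = rng.random(n) < q_true[3] * p_click_given_exam

# P(click | examined) cancels in the ratio, leaving an estimate of q_1 / q_4.
ratio_hat = clicks_at_1.mean() / clicks_at_4.mean()
```

Here the true ratio q_1 / q_4 = 1.0 / 0.25 = 4, and the estimate recovers it from click counts alone.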

SLIDE 26

Real-World Experiment

  • Arxiv Full-Text Search
    – Run intervention experiment to estimate the q_r
    – Collect training clicks using production ranker
    – Train naïve / propensity SVM-Rank (1000 features)
    – A/B tests via interleaving

SLIDE 27

Overview

  • "Use user ratings for collaborative filtering"
    – Project: MNAR (Schnabel et al, ICML 2016)
  • "Use user clicks for search ranking"
    – Project: ULTR (Joachims et al, WSDM 2017)
  • Discussion

SLIDE 28

Resources

  • Randomized dataset: http://www.cs.cornell.edu/~adith/Criteo/ [NIPS'16 workshop]
  • Tutorial: Off-policy evaluation and optimization: http://www.cs.cornell.edu/~adith/CfactSIGIR2016 [SIGIR'16]
  • Book: Causal Inference for Statistics, Social, and Biomedical Sciences, Imbens & Rubin, 2015.
  • Many open questions!

SLIDE 29

Conclusion

Causality + ML: Simple/pragmatic techniques to tackle biased user feedback

Thanks! adswamin@microsoft.com