Recommendations as Treatments: Debiasing Learning and Evaluation




SLIDE 1

Recommendations as Treatments: Debiasing Learning and Evaluation

ICML 2016, NYC

Tobias Schnabel †, Adith Swaminathan †, Ashudeep Singh †, Navin Chandak §, Thorsten Joachims †

†Cornell University, §Google

Funded in part through NSF Awards IIS-1247637, IIS-1217686, IIS-1513692.

SLIDE 2

Movie recommendation

[Figure: toy rating matrix with two user groups (Horror Lovers, Romance Lovers) and three movie genres (Horror, Romance, Drama), with block ratings of 5, 1, and 3. Panels: Y (true ratings) and O (observed yes/no).]

⇒ Data is Missing Not At Random (MNAR)

Example adapted from (Steck et al., 2010)
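A minimal sketch of why MNAR data is a problem in this toy example. The block layout is an assumption about the figure: horror lovers rate horror 5, romance 1, and drama 3, and romance lovers rate horror 1, romance 5, and drama 3. The observation probabilities (p for the liked genre, p/10 for the disliked genre, p/2 for drama) are borrowed from the propensity slide later in the deck, and p = 0.5 is an arbitrary illustrative value. All function names are hypothetical.

```python
# Toy MNAR example: two user groups, three genres, equal block sizes assumed.
GENRES = ["horror", "romance", "drama"]

# Assumed true block ratings Y (user group -> genre -> rating).
TRUE_RATING = {
    "horror_lovers": {"horror": 5, "romance": 1, "drama": 3},
    "romance_lovers": {"horror": 1, "romance": 5, "drama": 3},
}


def observation_probability(group, genre, p=0.5):
    """Assumed chance that a rating is observed: users mostly rate what they like."""
    if genre == "drama":
        return p / 2
    liked = "horror" if group == "horror_lovers" else "romance"
    return p if genre == liked else p / 10


def true_mean_rating():
    """Average rating over the full (mostly unobserved) matrix."""
    values = [TRUE_RATING[g][m] for g in TRUE_RATING for m in GENRES]
    return sum(values) / len(values)


def expected_observed_mean_rating(p=0.5):
    """Average rating over observed entries, weighting blocks by observation probability."""
    pairs = [(g, m) for g in TRUE_RATING for m in GENRES]
    num = sum(observation_probability(g, m, p) * TRUE_RATING[g][m] for g, m in pairs)
    den = sum(observation_probability(g, m, p) for g, m in pairs)
    return num / den


print(true_mean_rating())               # 3.0
print(expected_observed_mean_rating())  # about 4.125
```

Under these assumptions the observed ratings average about 4.1 while the true average is 3, and the gap does not depend on p because p cancels in the ratio.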

SLIDE 3

Selection Bias in Recommendation

  • Why is there selection bias?
  • User-induced bias (e.g., browsing)
  • System-induced bias (e.g., advertising)
  • Question: What happens if we ignore selection bias?

(Marlin et al., 2007; Steck, 2011; Hernández-Lobato et al., 2014)

SLIDE 4

Evaluating Recommendations under Selection Bias

[Figure: the toy rating matrix (Horror, Romance, Drama movies; Horror Lovers, Romance Lovers users), with panels $\hat{Y}$ (recommendations), Y (true ratings), and O (observed yes/no).]

⇒ Observed ratings are misleading due to selection bias

SLIDE 5

Evaluating Predicted Ratings under Selection Bias

[Figure: two predicted rating matrices for the toy example (Horror, Romance, Drama movies; Horror Lovers, Romance Lovers users).
$\hat{Y}_1$, Pred Ratings (worse): predicts 5 for every horror and romance movie and 3 for every drama, for all users.
$\hat{Y}_2$, Pred Ratings (better): predicts 5 for one of horror or romance and 1 for the other, mirrored between the two user groups, and 5 for every drama.]

SLIDE 6

Evaluating Predicted Ratings under Selection Bias

[Figure: animation step showing the predicted rating matrices $\hat{Y}_1$, Pred Ratings (worse), and $\hat{Y}_2$, Pred Ratings (better), for the toy example.]

SLIDE 7

Evaluating Predicted Ratings under Selection Bias

[Figure: animation step showing the predicted rating matrices $\hat{Y}_1$, Pred Ratings (worse), and $\hat{Y}_2$, Pred Ratings (better), for the toy example.]

⇒ Observed losses are misleading due to selection bias
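The claim that observed losses are misleading can be checked with a short calculation. This is a sketch under assumptions: the true block ratings and observation probabilities are the same assumed toy values as elsewhere in the deck (p, p/10, p/2 with an arbitrary p = 0.5), the two predictors follow the pattern visible in the figure, and mean absolute error is chosen here only for illustration. The naive loss is computed as a ratio of expected sums over observed entries.

```python
GROUPS = ["horror_lovers", "romance_lovers"]
GENRES = ["horror", "romance", "drama"]

# Assumed true block ratings Y.
Y = {"horror_lovers": {"horror": 5, "romance": 1, "drama": 3},
     "romance_lovers": {"horror": 1, "romance": 5, "drama": 3}}

# Y_hat_1: 5 for every horror and romance movie, 3 for dramas (errors of 4 on disliked genres).
Y_HAT_1 = {g: {"horror": 5, "romance": 5, "drama": 3} for g in GROUPS}
# Y_hat_2: correct on horror and romance, 5 for every drama (errors of 2 on dramas).
Y_HAT_2 = {"horror_lovers": {"horror": 5, "romance": 1, "drama": 5},
           "romance_lovers": {"horror": 1, "romance": 5, "drama": 5}}


def propensity(group, genre, p=0.5):
    """Assumed observation probability for each block."""
    if genre == "drama":
        return p / 2
    liked = "horror" if group == "horror_lovers" else "romance"
    return p if genre == liked else p / 10


def true_mae(y_hat):
    """Mean absolute error over all entries, observed or not."""
    errors = [abs(Y[g][m] - y_hat[g][m]) for g in GROUPS for m in GENRES]
    return sum(errors) / len(errors)


def expected_naive_mae(y_hat, p=0.5):
    """Mean absolute error over observed entries only (ratio of expected sums)."""
    pairs = [(g, m) for g in GROUPS for m in GENRES]
    num = sum(propensity(g, m, p) * abs(Y[g][m] - y_hat[g][m]) for g, m in pairs)
    den = sum(propensity(g, m, p) for g, m in pairs)
    return num / den


for name, y_hat in [("Y_hat_1", Y_HAT_1), ("Y_hat_2", Y_HAT_2)]:
    print(name, round(true_mae(y_hat), 3), round(expected_naive_mae(y_hat), 3))
```

Under these assumptions the true error ranks $\hat{Y}_2$ ahead of $\hat{Y}_1$ (about 0.67 versus 1.33), while the naive observed-only error reverses the ranking (0.625 versus 0.25), because the large errors of $\hat{Y}_1$ fall on entries that are rarely observed.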

SLIDE 8

Recommendations as Treatments

  • Question: How can we fix the effects of selection bias?
  • Connection to potential outcomes framework

[Figure: analogy between recommendation and medical treatment. Users correspond to patients and movies to treatments; the full rating matrix Y holds the counterfactual outcomes, and the observed ratings $\tilde{Y}$ are the observed outcomes.]

⇒ Understand assignment mechanism (Imbens & Rubin, 2015)

SLIDE 9

Debiasing Evaluation

[Figure: propensity matrix P for the toy example (Horror, Romance, Drama movies), with block values p, p/10, and p/2.]

  • Assignment mechanism for recommendation: $P_{u,i} = P(O_{u,i} = 1)$
  • Use Inverse-Propensity-Scoring Estimator (IPS) to obtain unbiased estimate:

$$\hat{R}_{IPS}(\hat{Y} \mid P) = \frac{1}{U \cdot I} \sum_{(u,i):\, O_{u,i}=1} \frac{1}{P_{u,i}} \left( Y_{u,i} - \hat{Y}_{u,i} \right)^2$$

(Little & Rubin, 2002; Cortes et al., 2008; Bickel et al., 2009; Sugiyama & Kawanabe, 2012).
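The IPS estimator can be tried out on simulated observations. The sketch below is illustrative only: the toy ratings, the propensities (p = 0.5, p/10, p/2), the matrix size, the number of trials, and all function names are assumptions, not the authors' code. It compares the naive average over observed entries with the IPS-weighted sum divided by U · I, both using squared error.

```python
import random


def build_toy(users_per_group=50, items_per_genre=5, p=0.5):
    """Return assumed true ratings Y and propensities P as users x items nested lists."""
    block_rating = [[5, 1, 3], [1, 5, 3]]  # rows: horror lovers, romance lovers
    block_prop = [[p, p / 10, p / 2], [p / 10, p, p / 2]]
    Y, P = [], []
    for group in range(2):
        for _ in range(users_per_group):
            Y.append([block_rating[group][g] for g in range(3) for _ in range(items_per_genre)])
            P.append([block_prop[group][g] for g in range(3) for _ in range(items_per_genre)])
    return Y, P


def squared_errors(Y, Y_hat):
    return [[(y - yh) ** 2 for y, yh in zip(row, row_hat)] for row, row_hat in zip(Y, Y_hat)]


def true_risk(loss):
    """Average loss over every entry of the matrix."""
    return sum(map(sum, loss)) / (len(loss) * len(loss[0]))


def naive_risk(loss, O):
    """Average loss over observed entries only."""
    observed = [err for row, o_row in zip(loss, O) for err, o in zip(row, o_row) if o]
    return sum(observed) / len(observed)


def ips_risk(loss, O, P):
    """Sum of loss / propensity over observed entries, divided by U * I."""
    total = sum(err / p for row, o_row, p_row in zip(loss, O, P)
                for err, o, p in zip(row, o_row, p_row) if o)
    return total / (len(loss) * len(loss[0]))


def run_demo(trials=200, seed=0):
    rng = random.Random(seed)
    Y, P = build_toy()
    Y_hat = [[5] * 10 + [3] * 5 for _ in Y]  # predicts 5 for horror/romance, 3 for drama
    loss = squared_errors(Y, Y_hat)
    naive, ips = 0.0, 0.0
    for _ in range(trials):
        O = [[rng.random() < p for p in row] for row in P]  # sample which entries are observed
        naive += naive_risk(loss, O) / trials
        ips += ips_risk(loss, O, P) / trials
    return true_risk(loss), naive, ips


true_value, naive_mean, ips_mean = run_demo()
print(true_value, naive_mean, ips_mean)
```

Averaged over the simulated draws of O, the IPS estimate should land close to the true risk (16/3 in this setup), while the naive estimate stays far below it. A single IPS estimate can still vary considerably when some propensities are small.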

SLIDE 10

Propensity estimation

  • Two settings:
  • Experimental: Propensities are under our control; known by design (e.g., ad placement)
  • Observational: Users self-select; need to estimate $P_{u,i}$
  • Estimate parameter of binary random variables: $P_{u,i} = P(O_{u,i} = 1 \mid X, \tilde{Y})$
  • Variety of models: Logistic Regression, Naïve Bayes, etc.

[Figure: binary observation matrix O (1 = rating observed, 0 = missing) over Horror, Romance, and Drama movies.]
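For the observational setting, a logistic regression fitted to the observation indicators gives one possible propensity model. The sketch below uses plain gradient descent on the mean log-loss and standard library calls only. The one-hot block features, the assumed propensities (0.5, 0.05, 0.25), the sample size, and the learning rate are hypothetical choices for illustration, not values from the talk.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Estimated propensity P(O = 1 | x) under the logistic model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic_regression(features, labels, learning_rate=2.0, iterations=500):
    """Fit weights by batch gradient descent on the mean log-loss."""
    dim, n = len(features[0]), len(features)
    w = [0.0] * dim
    for _ in range(iterations):
        grad = [0.0] * dim
        for x, o in zip(features, labels):
            err = predict(w, x) - o  # derivative of the log-loss w.r.t. the logit
            for j in range(dim):
                grad[j] += err * x[j]
        w = [wj - learning_rate * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical one-hot features describing each (user, item) pair.
BLOCK_FEATURES = {"liked": [1.0, 0.0, 0.0], "disliked": [0.0, 1.0, 0.0], "drama": [0.0, 0.0, 1.0]}
ASSUMED_PROPENSITY = {"liked": 0.5, "disliked": 0.05, "drama": 0.25}
SAMPLES_PER_BLOCK = 300

rng = random.Random(0)
features, labels = [], []
for block, x in BLOCK_FEATURES.items():
    for _ in range(SAMPLES_PER_BLOCK):
        features.append(x)
        labels.append(1 if rng.random() < ASSUMED_PROPENSITY[block] else 0)

weights = fit_logistic_regression(features, labels)
estimated = {block: predict(weights, x) for block, x in BLOCK_FEATURES.items()}
# Observed frequency per block, for comparison with the fitted propensities.
empirical = {block: sum(labels[k * SAMPLES_PER_BLOCK:(k + 1) * SAMPLES_PER_BLOCK]) / SAMPLES_PER_BLOCK
             for k, block in enumerate(BLOCK_FEATURES)}
for block in BLOCK_FEATURES:
    print(block, round(estimated[block], 3), round(empirical[block], 3))
```

With one-hot block features the fitted propensities simply approach the observed frequency in each block. Richer features X would let the same model generalize across users and items.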

SLIDE 11

Debiasing Evaluation

  • Robustness to selection bias:

[Figure: two plots with x-axis "Severity of Selection Bias".]

SLIDE 12

Debiasing Evaluation

  • Robustness to inaccurate propensities:

[Figure: two plots with x-axis "More accurate propensities"; the legend includes IPS-est.]

SLIDE 13

Debiasing Learning

  • Empirical Risk Minimization (ERM) successful in many settings (Cortes & Vapnik, 1995)
  • Use ERM together with Inverse-Propensity-Scoring Estimator (IPS):

$$\hat{Y}^{ERM} = \operatorname*{argmin}_{\hat{Y} \in \mathcal{H}} \hat{R}_{IPS}(\hat{Y} \mid P)$$

  • For matrix factorization with MSE loss:

$$\hat{Y}^{ERM} = \operatorname*{argmin}_{V,W} \left[ \sum_{O_{u,i}=1} \frac{1}{P_{u,i}} \left( Y_{u,i} - V_u^T W_i \right)^2 + \lambda \left( \|V\|_F^2 + \|W\|_F^2 \right) \right]$$

The factor $1 / P_{u,i}$ is the propensity weight.
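The propensity-weighted matrix factorization objective can be minimized with ordinary gradient descent. The sketch below is a small pure-Python illustration, not the authors' implementation: the toy data, rank, regularization strength, learning rate, and iteration count are all assumed values, and the propensities are taken as known.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ips_objective(Y, O, P, V, W, lam):
    """Propensity-weighted squared error on observed entries plus Frobenius regularization."""
    loss = sum((Y[u][i] - dot(V[u], W[i])) ** 2 / P[u][i]
               for u, row in enumerate(O) for i, seen in enumerate(row) if seen)
    reg = sum(x * x for r in V for x in r) + sum(x * x for r in W for x in r)
    return loss + lam * reg


def gradient_step(Y, O, P, V, W, lam, lr):
    """One full-batch gradient descent step on ips_objective, updating V and W in place."""
    rank = len(V[0])
    grad_V = [[2 * lam * x for x in r] for r in V]
    grad_W = [[2 * lam * x for x in r] for r in W]
    for u, row in enumerate(O):
        for i, seen in enumerate(row):
            if not seen:
                continue
            scale = -2.0 * (Y[u][i] - dot(V[u], W[i])) / P[u][i]  # 1 / P is the propensity weight
            for k in range(rank):
                grad_V[u][k] += scale * W[i][k]
                grad_W[i][k] += scale * V[u][k]
    for M, G in ((V, grad_V), (W, grad_W)):
        for r, g in zip(M, G):
            for k in range(rank):
                r[k] -= lr * g[k]


def train(Y, O, P, rank=2, lam=0.01, lr=0.0005, iterations=600, seed=0):
    rng = random.Random(seed)
    V = [[rng.gauss(0, 0.3) for _ in range(rank)] for _ in Y]
    W = [[rng.gauss(0, 0.3) for _ in range(rank)] for _ in Y[0]]
    start = ips_objective(Y, O, P, V, W, lam)
    for _ in range(iterations):
        gradient_step(Y, O, P, V, W, lam, lr)
    return V, W, start, ips_objective(Y, O, P, V, W, lam)


# Assumed toy data: 15 users per group, 4 items per genre, propensities p, p/10, p/2 with p = 0.5.
ratings = [[5, 1, 3], [1, 5, 3]]
props = [[0.5, 0.05, 0.25], [0.05, 0.5, 0.25]]
Y = [[ratings[g][m] for m in range(3) for _ in range(4)] for g in range(2) for _ in range(15)]
P = [[props[g][m] for m in range(3) for _ in range(4)] for g in range(2) for _ in range(15)]
data_rng = random.Random(1)
O = [[data_rng.random() < p for p in row] for row in P]

V, W, start_objective, final_objective = train(Y, O, P)
print(start_objective, final_objective)
```

The only change from unweighted matrix factorization is the division by `P[u][i]` in the loss and its gradient, so rarely observed entries count more. Small propensities produce large weights, which is why the learning rate here is kept small.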

SLIDE 14

Generalization Error

  • Theoretical insights: Additional trade-off between bias and variance
  • With probability $1 - \eta$, capacity $|\mathcal{H}|$, maximum loss $\Delta$:

$$R(\hat{Y}^{ERM}) \le \hat{R}_{IPS}(\hat{Y}^{ERM} \mid \hat{P}) + \underbrace{\frac{\Delta}{U \cdot I} \sum_{u,i} \left| 1 - \frac{P_{u,i}}{\hat{P}_{u,i}} \right|}_{\text{Bias}} + \underbrace{\frac{\Delta}{U \cdot I} \sqrt{\frac{\log\left(2|\mathcal{H}|/\eta\right)}{2}} \sqrt{\sum_{u,i} \frac{1}{\hat{P}_{u,i}^2}}}_{\text{Variance}}$$
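The two extra terms in the bound can be evaluated directly to see the trade-off. The sketch below assumes the bias term is $\frac{\Delta}{U \cdot I} \sum_{u,i} |1 - P_{u,i}/\hat{P}_{u,i}|$ and the variance term is $\frac{\Delta}{U \cdot I} \sqrt{\log(2|\mathcal{H}|/\eta)/2} \sqrt{\sum_{u,i} 1/\hat{P}_{u,i}^2}$. The propensity matrices, $\Delta = 16$, $|\mathcal{H}| = 1000$, and $\eta = 0.05$ are hypothetical values chosen only for illustration.

```python
import math


def bias_term(P, P_hat, delta):
    """delta / (U * I) * sum |1 - P / P_hat|; zero when the estimates are exact."""
    n = len(P) * len(P[0])
    total = sum(abs(1 - p / ph) for row, row_hat in zip(P, P_hat) for p, ph in zip(row, row_hat))
    return delta / n * total


def variance_term(P_hat, delta, hypothesis_count, eta):
    """delta / (U * I) * sqrt(log(2|H| / eta) / 2) * sqrt(sum 1 / P_hat^2)."""
    n = len(P_hat) * len(P_hat[0])
    inverse_squares = sum(1 / ph ** 2 for row in P_hat for ph in row)
    return delta / n * math.sqrt(math.log(2 * hypothesis_count / eta) / 2) * math.sqrt(inverse_squares)


# Assumed true propensities for one user per group and one item per genre.
P_TRUE = [[0.5, 0.05, 0.25], [0.05, 0.5, 0.25]]
# Hypothetical estimate that overstates the smallest propensities (0.05 -> 0.2).
P_OVERSTATED = [[0.5, 0.2, 0.25], [0.2, 0.5, 0.25]]

DELTA, H_SIZE, ETA = 16.0, 1000, 0.05  # illustrative constants

for name, P_hat in [("exact", P_TRUE), ("overstated", P_OVERSTATED)]:
    print(name, bias_term(P_TRUE, P_hat, DELTA), variance_term(P_hat, DELTA, H_SIZE, ETA))
```

With exact propensities the bias term is zero but the variance term is large because of the $1/0.05^2$ contributions. Overstating the small propensities introduces bias and shrinks the variance term, which is the additional trade-off named on the slide.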

SLIDE 15

Propensity-scored ERM

  • Approach is modular and discriminative:
  1. Pick and estimate propensity model
  2. Use estimated propensities in ERM objective

[Diagram, discriminative approach: Observations O and Features X feed Propensity estimation; its output and the Observed ratings $\tilde{Y}$ feed ERM.]

[Diagram, generative approach: Latent variables, Missing Data Model, Complete Data Model.]

(Marlin et al., 2007; Steck, 2011; Hernández-Lobato et al., 2014)

SLIDE 16

Debiasing Learning

  • Results on two real-world datasets:
  • COAT: Shopping dataset (300 users; newly collected)
  • YAHOO: Song rating dataset (15400 users; Marlin & Zemel, 2009)
  • Report performance on MAR (missing at random) test data:
  • HL: Latest generative approach (Hernández-Lobato et al., 2014)

SLIDE 17

Conclusions

  • Discriminative propensity scoring:
  • Modular
  • Directly optimizes target loss
  • No latent variables
  • Scalable
  • Data and code: http://www.cs.cornell.edu/~schnabts/mnar/

[Diagram: Observations O and Features X feed Propensity estimation; its output and the Observed ratings $\tilde{Y}$ feed ERM.]