
A Feedback Shift Correction in Predicting Conversion Rates under Delayed Feedback. Shota Yasui, Gota Morishita, Komei Fujita, Masashi Shibata. The Web Conference 2020.


SLIDE 1

A Feedback Shift Correction 
 in Predicting Conversion Rates 
 under Delayed Feedback


Shota Yasui, Gota Morishita, 
 Komei Fujita, Masashi Shibata
 
 The Web Conference 2020


SLIDE 2

Introduction and 
 Problem Setting


SLIDE 3

Conversion Prediction


Predict the conversion rate (CVR) for each request.
Predicting CVR is important for deciding the bid price.

[Diagram labels: User (uses apps), AD Auction, DSP; arrows: request, bid]


SLIDE 4

Ideal loss function


The following loss should be minimized. [Equation not preserved; its terms are annotated as features, conversion, and model.]
The ideal parameters are as follows. [Equation not preserved.]
This is not possible, because we do not observe c due to the delayed feedback.
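The formulas on this slide did not survive extraction. A plausible reconstruction is sketched below, assuming features $x$, the true conversion label $c$, a model $f_\theta$, and a per-sample loss $\ell$ such as log loss. Every symbol other than $c$ is an assumption, not taken from the slide.

```latex
L_{\mathrm{ideal}}(\theta)
  = \mathbb{E}_{(x, c)}\big[\ell\big(c, f_\theta(x)\big)\big],
\qquad
\theta^{*} = \operatorname*{arg\,min}_{\theta} L_{\mathrm{ideal}}(\theta)
```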


SLIDE 5

Delayed Feedback


SLIDE 6

Delayed Feedback


[Timeline for a certain user: timestamp of click, then timestamp of CV, along the time axis; the gap between them is the delay.]

  • A user takes some time to purchase an item after clicking the ad.


SLIDE 7

The problem of Delayed Feedback


  • We cannot observe the CV for this user.

  • This sample is treated as a negative label (mislabeled)!

[Timeline for a certain user: timestamp of click (included in training data), training begins, timestamp of CV (unobserved).]
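The mislabeling can be made concrete with a minimal Python sketch. The function names, the timestamps, and the rule "a CV counts only if it happened before training begins" are illustrative assumptions, not code from the authors.

```python
from typing import Optional


def observed_label(cv_ts: Optional[float], train_start: float) -> int:
    """Label Y seen when training begins: 1 only if the CV already happened."""
    return int(cv_ts is not None and cv_ts <= train_start)


def true_label(cv_ts: Optional[float]) -> int:
    """Label C that would be seen with unlimited waiting time."""
    return int(cv_ts is not None)


# Hypothetical clicks: (click timestamp, CV timestamp or None), in days.
clicks = [(1.0, 1.5), (2.0, None), (6.5, 8.0)]
train_start = 7.0

for click_ts, cv_ts in clicks:
    c, y = true_label(cv_ts), observed_label(cv_ts, train_start)
    status = "mislabeled" if c != y else "correct"
    print(f"click={click_ts} C={c} Y={y} {status}")
```

The third click converts after training begins, so it enters the training data as a negative even though its true label is positive.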


SLIDE 8

The relation between Y and C


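  • C: true label

  • Y: observable label

  • S = 1: correctly labeled

  • S = 0: mislabeled

[Diagram relating C = 1, C = 0, Y = 1, Y = 0 through S. The equations for the probability of being correctly labeled and the probability of mislabeling are not preserved.]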
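The relation between the three variables can be written out as a short derivation. It assumes that a user who never converts is never labeled positive (C = 0 implies Y = 0), so Y = 1 happens exactly when C = 1 and S = 1. The conditioning on features $x$ is also an assumption about the missing formulas.

```latex
P(Y = 1 \mid x) = P(C = 1 \mid x)\, P(S = 1 \mid C = 1, x)
\\
P(Y = 0 \mid x) = P(C = 0 \mid x) + P(C = 1 \mid x)\, P(S = 0 \mid C = 1, x)
```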


SLIDE 9

Bias in standard supervised approach


[Equations not preserved: the ideal loss compared with the actual loss (ERM).]
Inconsistent!


SLIDE 10

Our Solution


Importance Weight Approach


SLIDE 11

Importance Weight (FSIW) approach


We propose a consistent loss based on the importance weight (propensity score).

[Equation not preserved: the ideal loss rewritten as an unbiased (consistent) loss, with the importance weight term labeled.]
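The idea behind the missing equation can be sketched as a standard importance-weighting identity. The notation is an assumption: $p$ is the ideal distribution over $(x, c)$, $q$ is the observed distribution over $(x, y)$, both share the same marginal over $x$, and $q(Y = y \mid x) > 0$. The closed forms of the weights additionally assume that C = 0 implies Y = 0.

```latex
\mathbb{E}_{(x, c) \sim p}\big[\ell(c, f(x))\big]
  = \mathbb{E}_{(x, y) \sim q}\!\left[
      \frac{p(C = y \mid x)}{q(Y = y \mid x)}\, \ell(y, f(x))
    \right]
\\
\frac{p(C = 1 \mid x)}{q(Y = 1 \mid x)} = \frac{1}{P(S = 1 \mid C = 1, x)},
\qquad
\frac{p(C = 0 \mid x)}{q(Y = 0 \mid x)} = P(C = 0 \mid Y = 0, x)
```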


SLIDE 12

Importance Weight (FSIW) approach


Our empirical loss: [equation not preserved; the importance weight term is labeled]
The basic idea is to weight each sample by the conditional density ratio.
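A minimal sketch of such a weighted empirical loss follows, using log loss as the per-sample loss. The function names and the two probability arguments are hypothetical. The weight formulas assume that a non-converting user is never labeled positive, which gives 1 / P(S=1 | C=1, x) for observed positives and P(C=0 | Y=0, x) for observed negatives.

```python
import math
from typing import Sequence


def fsiw_weight(y: int, p_s1_given_c1: float, p_c0_given_y0: float) -> float:
    """Conditional density ratio P(C=y|x) / P(Y=y|x) for one sample."""
    if y == 1:
        return 1.0 / p_s1_given_c1  # observed positives are up-weighted
    return p_c0_given_y0  # observed negatives are down-weighted


def weighted_log_loss(
    y: Sequence[int], pred: Sequence[float], weight: Sequence[float]
) -> float:
    """Mean of weight * log loss over the observed samples."""
    eps = 1e-12
    total = 0.0
    for yi, pi, wi in zip(y, pred, weight):
        pi = min(max(pi, eps), 1.0 - eps)  # avoid log(0)
        total += wi * -(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
    return total / len(y)
```

With all weights set to 1 this reduces to the ordinary log loss, which is why the correction can be added to an existing training loop without changing the model.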


SLIDE 13

How to estimate FSIW


We estimate these probabilities from data old enough to observe S and C.
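One way to read this step is sketched below. It assumes "these probabilities" are P(S=1 | C=1, x) and P(S=1 | Y=0, x), and that labels are recomputed as of a chosen deadline. The helper name and the row layout are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[List[float], Optional[float]]  # (features, CV timestamp or None)


def build_targets(rows: List[Row], deadline: float):
    """Split old clicks into training sets for the two probability models."""
    positives, observed_negatives = [], []
    for x, cv_ts in rows:
        c = int(cv_ts is not None)             # true label, known for old data
        y = int(c == 1 and cv_ts <= deadline)  # label as of the deadline
        s = int(y == c)                        # 1 if correctly labeled
        if c == 1:
            positives.append((x, s))           # target for P(S=1 | C=1, x)
        if y == 0:
            observed_negatives.append((x, s))  # target for P(S=1 | Y=0, x)
    return positives, observed_negatives
```

Each returned list is an ordinary binary classification dataset, so any probabilistic classifier could be fitted on it.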


SLIDE 14

[Timeline over week 1, week 2, week 3 with labels: training data, Counterfactual Deadline, discard]


SLIDE 15

[Timeline over week 1, week 2, week 3 with labels: training data, Counterfactual Deadline, discard]

Train models for [formula not preserved]


SLIDE 16

[Timeline over week 1, week 2, week 3 with labels: training data, Counterfactual Deadline, discard]

Train models for [formula not preserved]

[Timeline over week 1, week 2, week 3 with labels: training data, Importance weight]

Train the CVR model
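The procedure in these three slides can be sketched in Python under several assumptions that are not in the slides. The two probabilities are estimated by per-segment frequencies as a stand-in for real classifiers. The currently observed label is used as a proxy for the true label C. Clicks after the counterfactual deadline are discarded when fitting the weight models. All names, and the `floor` guard against division by zero, are hypothetical.

```python
from collections import defaultdict


def rate_by_segment(pairs, default=1.0):
    """Empirical P(S=1 | segment) from (segment, s) pairs; stand-in for a classifier."""
    hits, total = defaultdict(int), defaultdict(int)
    for seg, s in pairs:
        hits[seg] += s
        total[seg] += 1
    return lambda seg: hits[seg] / total[seg] if total[seg] else default


def fit_weight_models(clicks, cf_deadline):
    """Step 1: learn both probabilities from labels as of the counterfactual deadline.

    clicks: (segment, click_ts, cv_ts or None); cv_ts reflects what is known now.
    """
    pos, neg = [], []
    for seg, click_ts, cv_ts in clicks:
        if click_ts > cf_deadline:
            continue  # clicks after the deadline are discarded
        c = int(cv_ts is not None)
        y = int(c == 1 and cv_ts <= cf_deadline)
        s = int(y == c)
        if c == 1:
            pos.append((seg, s))
        if y == 0:
            neg.append((seg, s))
    return rate_by_segment(pos), rate_by_segment(neg)


def importance_weights(clicks, p_s1_c1, p_s1_y0, floor=1e-3):
    """Step 2: weight every training click by its currently observed label."""
    out = []
    for seg, _click_ts, cv_ts in clicks:
        y = int(cv_ts is not None)
        w = 1.0 / max(p_s1_c1(seg), floor) if y == 1 else p_s1_y0(seg)
        out.append((seg, y, w))
    return out
```

Step 3 is then to train the CVR model on all the training data, passing these values as per-sample weights to whichever learner is in use.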


SLIDE 17

Features of our proposed method


It is just an importance weight, so it:
 ○ can be used for any CVR model
 ○ can fit the delay nonparametrically
 ○ does not increase the time complexity of CVR models


SLIDE 18

Experiment


SLIDE 19

Conversion Logs Dataset


  • Open data provided by Criteo

  • 30 days of click and CV logs

  • Used in Chapelle (2014)

  • Observation period is 30 days


SLIDE 20

Experiment procedure


[Diagram along the time axis: sliding windows, each with train (3 weeks) followed by test; test day = 22, 23, 24, ..., 28]

Iterate for 7 days, averaging these results.
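The sliding-window evaluation can be sketched as follows, assuming days are numbered from 1, each test set is the single day right after a 21-day training window, and the final score is a plain mean. The function names are hypothetical.

```python
def rolling_splits(first_test_day=22, n_iterations=7, train_days=21):
    """Yield (train_day_range, test_day); the test day follows the training window."""
    for test_day in range(first_test_day, first_test_day + n_iterations):
        yield range(test_day - train_days, test_day), test_day


def average(scores):
    """Mean of the per-iteration scores."""
    return sum(scores) / len(scores)


for train_range, test_day in rolling_splits():
    print(f"train days {train_range.start}..{train_range.stop - 1}, test day {test_day}")
```

Each iteration would retrain the model on its own window, score it on the test day, and feed the seven scores to `average`.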

SLIDE 21

Result 1


[Results figure not preserved; methods compared: Proposed Method, Chapelle (2014), Pure Logistic Regression]

  • Normalized log loss (NLL) is the most important metric.

 ○ We use the predicted probability for bidding.
 ○ Log loss (LL) is sensitive to the base CVR.
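The slides do not define NLL. The sketch below uses one common normalization: log loss divided by the log loss of a constant predictor at the base CVR. That choice is an assumption, and the authors' exact definition may differ. It does show why normalizing removes the dependence on the base rate.

```python
import math


def log_loss(y, pred, eps=1e-12):
    """Mean binary log loss with predictions clipped away from 0 and 1."""
    total = 0.0
    for yi, pi in zip(y, pred):
        pi = min(max(pi, eps), 1.0 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)


def normalized_log_loss(y, pred):
    """Log loss relative to always predicting the base CVR (lower is better).

    Assumes y contains both positives and negatives.
    """
    base_rate = sum(y) / len(y)
    return log_loss(y, pred) / log_loss(y, [base_rate] * len(y))
```

Under this definition a value of 1 means the model is no better than predicting the average CVR for every request.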


SLIDE 22

Dynalyst Data


  • DSP at CyberAgent, Inc.

  • 2 experiments


 ○ The same procedure as the first experiment
 ■ Focus on three campaigns
 ■ Baseline model is FFM (Juan 2017)
 ○ Online A/B test


SLIDE 23

Three Campaigns


  • The observation period differs by campaign.


 ○ S: 1 day
 ○ M: 3 days
 ○ L: 7 days


SLIDE 24

Result 2


Only Campaign L shows an improvement.


SLIDE 25

Follow-up Online Experiment @ Campaign L


  • Cost consumption and CV improved.

  • CPA did not change or slightly decreased.

SLIDE 26

Conclusion


  • We proposed a consistent loss to predict CVR under delayed feedback.

  • Our method performs better in two offline experiments and one online experiment.

Thank you for listening!

SLIDE 27

Appendix


SLIDE 28

cumulative distribution of delay


SLIDE 29

effect of counterfactual deadline

