SLIDE 1

Decision trees for uplift modeling

Piotr Rzepakowski

National Institute of Telecommunications, Warsaw, Poland
Warsaw University of Technology, Warsaw, Poland

Szymon Jaroszewicz

National Institute of Telecommunications, Warsaw, Poland
Polish Academy of Sciences, Warsaw, Poland

ICDM 2010

Piotr Rzepakowski & Szymon Jaroszewicz (National Institute of Telecommunications, Warsaw), Decision trees for uplift modeling, ICDM 2010

SLIDE 2

Marketing campaign example

Sample → Pilot campaign → Model P(buy|campaign) → Select targets for campaign

SLIDE 3

Main idea of uplift modeling

We can divide objects into four groups:

1. Responded because of the action
2. Responded regardless of whether the action is taken (unnecessary costs)
3. Did not respond and the action had no impact (unnecessary costs)
4. Did not respond because the action had a negative impact (e.g. customer got annoyed by the campaign, may even churn)

SLIDE 5

Traditional classification vs. uplift modeling

Traditional models predict the conditional probability

P(response|treatment)

Uplift models predict change in behaviour resulting from the action

P(response|treatment) − P(response|no treatment)
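As a minimal sketch of the difference above, assuming binary responses recorded separately for a treated and an untreated group (the data and the function names `response_rate` and `uplift` are hypothetical, not taken from the talk):

```python
def response_rate(responses):
    """Fraction of positive responses (1 = responded, 0 = did not)."""
    return sum(responses) / len(responses)


def uplift(treated, control):
    """Estimate P(response|treatment) - P(response|no treatment)."""
    return response_rate(treated) - response_rate(control)


# Hypothetical pilot data: 1 = responded, 0 = did not respond.
treated = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # 4 of 10 responded
control = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # 2 of 10 responded

print(round(uplift(treated, control), 2))
```

A traditional model would only look at the treated rate (0.4 here); the uplift estimate subtracts the responses that would have happened anyway.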

SLIDE 6

Marketing campaign example (uplift modeling approach)

Treatment sample + Control sample → Pilot campaign → Model P(buy|campaign) − P(buy|no campaign) → Select targets for campaign

SLIDE 7

Related work

Literature

- Surprisingly little attention in literature
- Business whitepapers offering vague descriptions of algorithms used

Two general approaches

- Subtraction of two models
- Modification of model learning algorithms

SLIDE 8

Subtraction of two models

Treatment sample + Control sample → Pilot campaign
Treatment sample → Model P(buy|campaign) (+)
Control sample → Model P(buy|no campaign) (−)
P(buy|campaign) − P(buy|no campaign) → Select targets for campaign
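The two-model approach can be sketched as follows. This is only an illustration: each "model" is a per-segment purchase frequency rather than a real classifier, and the segment names, data, and the helper `fit_rate_model` are hypothetical.

```python
from collections import defaultdict


def fit_rate_model(rows):
    """Estimate P(buy | segment) by counting; rows are (segment, bought) pairs."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [buyers, total]
    for segment, bought in rows:
        counts[segment][0] += bought
        counts[segment][1] += 1
    return {seg: buyers / total for seg, (buyers, total) in counts.items()}


# Hypothetical pilot campaign data.
treatment_rows = [("young", 1), ("young", 1), ("young", 0), ("young", 0),
                  ("old", 1), ("old", 0), ("old", 0), ("old", 0)]
control_rows = [("young", 0), ("young", 0), ("young", 0), ("young", 1),
                ("old", 1), ("old", 0), ("old", 0), ("old", 0)]

model_t = fit_rate_model(treatment_rows)  # P(buy|campaign)
model_c = fit_rate_model(control_rows)    # P(buy|no campaign)

# Subtract the two models and target segments with positive uplift.
uplift_by_segment = {seg: model_t[seg] - model_c[seg] for seg in model_t}
targets = [seg for seg, u in uplift_by_segment.items() if u > 0]
print(uplift_by_segment, targets)
```

Each model is fitted without any knowledge of the other, so neither is optimized for the difference itself; this is the motivation for modifying the learning algorithm instead.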

SLIDE 9

Current approaches to uplift decision trees

Create splits using difference of probabilities (∆∆P)

Example split on the test x < a:

- Parent node: PT = 5%, PC = 3%, ∆P = 2%
- Branch x < a: PT = 8%, PC = 3.5%, ∆P = 4.5%
- Branch x >= a: PT = 3.7%, PC = 2.8%, ∆P = 0.9%
- ∆∆P = 3.6%

Pruning not used (or not described)

Work only for two class problems and binary splits
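The ∆∆P value in this example can be reproduced with a few lines. The sketch below assumes ∆∆P is the absolute difference between the ∆P values of the two branches, which matches the numbers on the slide; the function names are hypothetical.

```python
def delta_p(p_treatment, p_control):
    """Difference of success probabilities in one node."""
    return p_treatment - p_control


def delta_delta_p(left, right):
    """left and right are (PT, PC) pairs for the two branches of a binary split."""
    return abs(delta_p(*left) - delta_p(*right))


left = (0.08, 0.035)    # branch x < a
right = (0.037, 0.028)  # branch x >= a

print(round(delta_delta_p(left, right) * 100, 1))  # 3.6 (percentage points)
```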

SLIDE 10

Our approach to uplift decision trees

- Splitting criteria based on Information Theory
- Pruning strategy designed for uplift modeling
- Multiclass problems and multiway splits possible
- If the control group is empty, the criterion should reduce to one of the classical splitting criteria used for decision tree learning

SLIDE 12

Kullback-Leibler divergence

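Measure difference between treatment and control groups using KL divergence

$$KL\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class})\big) = \sum_{y \in \mathrm{Dom}(\mathrm{Class})} P^T(y) \log \frac{P^T(y)}{P^C(y)}$$

Need KL-divergence conditional on a given test

$$KL\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class}) \mid \mathrm{Test}\big) = \sum_{a \in \mathrm{Dom}(\mathrm{Test})} \frac{N^T(a) + N^C(a)}{N^T + N^C} \, KL\big(P^T(\mathrm{Class} \mid a) : P^C(\mathrm{Class} \mid a)\big)$$

Measures how much the two groups differ given a test’s outcome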
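The two quantities above can be sketched in plain Python. The sketch assumes that the control probability is positive wherever the treatment probability is positive, and it uses the natural logarithm (the slides do not state the base; a different base only rescales the values). The data and the function names are hypothetical.

```python
import math


def kl(p_t, p_c):
    """KL(P^T : P^C) for class distributions given as dicts class -> probability.

    Assumes P^C(y) > 0 wherever P^T(y) > 0.
    """
    return sum(p * math.log(p / p_c[y]) for y, p in p_t.items() if p > 0)


def conditional_kl(branches):
    """KL divergence conditional on a test.

    branches: one (n_t, n_c, p_t, p_c) tuple per outcome a of the test, where
    n_t and n_c are the treatment and control counts falling into that outcome.
    """
    total = sum(n_t + n_c for n_t, n_c, _, _ in branches)
    return sum((n_t + n_c) / total * kl(p_t, p_c)
               for n_t, n_c, p_t, p_c in branches)


# Hypothetical test with two outcomes.
branches = [
    (100, 100, {"buy": 0.4, "no": 0.6}, {"buy": 0.2, "no": 0.8}),
    (100, 100, {"buy": 0.1, "no": 0.9}, {"buy": 0.2, "no": 0.8}),
]

print(kl({"buy": 0.25, "no": 0.75}, {"buy": 0.2, "no": 0.8}))  # before the split
print(conditional_kl(branches))                                 # given the test
```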

SLIDE 14

Final splitting criterion

$$KL_{gain}(\mathrm{Test}) = KL\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class}) \mid \mathrm{Test}\big) - KL\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class})\big)$$

- Measures the increase in difference between treatment and control groups from splitting based on Test
- If the control group is empty, KLgain reduces to entropy gain

$$KL_{ratio}(\mathrm{Test}) = \frac{KL_{gain}(\mathrm{Test})}{KL_{value}(\mathrm{Test})}$$

- Tests with large number of values are punished
- Tests which split the control and treatment groups in different proportions are punished
- Postulates are satisfied
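A minimal sketch of KLgain computed from class counts follows. It assumes both groups are non-empty in every branch and that every class occurs in the control group, so it does not cover the empty-control case mentioned above. KLvalue is not defined in these slides, so KLratio is left out. All names and data are hypothetical.

```python
import math


def normalize(counts):
    """Turn a dict class -> count into a dict class -> probability."""
    total = sum(counts.values())
    return {y: c / total for y, c in counts.items()}


def kl(p_t, p_c):
    """KL(P^T : P^C); assumes P^C(y) > 0 wherever P^T(y) > 0."""
    return sum(p * math.log(p / p_c[y]) for y, p in p_t.items() if p > 0)


def merge(count_dicts):
    """Sum class counts over all branches."""
    merged = {}
    for counts in count_dicts:
        for y, c in counts.items():
            merged[y] = merged.get(y, 0) + c
    return merged


def kl_gain(branches):
    """branches: one (treatment_counts, control_counts) pair per test outcome."""
    t_all = merge(t for t, _ in branches)
    c_all = merge(c for _, c in branches)
    n = sum(t_all.values()) + sum(c_all.values())
    before = kl(normalize(t_all), normalize(c_all))
    after = sum((sum(t.values()) + sum(c.values())) / n
                * kl(normalize(t), normalize(c))
                for t, c in branches)
    return after - before


# Hypothetical split: the treatment effect differs between the two branches.
informative = [({"buy": 40, "no": 60}, {"buy": 20, "no": 80}),
               ({"buy": 10, "no": 90}, {"buy": 20, "no": 80})]
# Hypothetical split: both branches look exactly like the parent node.
uninformative = [({"buy": 20, "no": 80}, {"buy": 10, "no": 90}),
                 ({"buy": 20, "no": 80}, {"buy": 10, "no": 90})]

print(kl_gain(informative), kl_gain(uninformative))
```

In this toy data the first split separates a branch where treatment helps from one where it hurts, so its gain is positive, while the second split leaves the divergence unchanged and gets a gain of zero.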

SLIDE 15

Splitting criterion based on squared Euclidean distance

$$Euclid\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class})\big) = \sum_{y \in \mathrm{Dom}(\mathrm{Class})} \big(P^T(y) - P^C(y)\big)^2$$

- Euclidgain, Euclidratio analogous to KL
- Better statistical properties (values are bounded)
- Symmetry
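The squared Euclidean distance and its two listed properties can be checked with a short sketch. The function name and the example distributions are hypothetical; classes missing from one distribution are treated as having probability zero.

```python
def euclid(p_t, p_c):
    """Squared Euclidean distance between two class distributions (dicts)."""
    classes = set(p_t) | set(p_c)
    return sum((p_t.get(y, 0.0) - p_c.get(y, 0.0)) ** 2 for y in classes)


p_t = {"buy": 0.08, "no": 0.92}
p_c = {"buy": 0.035, "no": 0.965}

print(euclid(p_t, p_c))
# Symmetry: swapping the arguments does not change the value.
print(euclid(p_t, p_c) == euclid(p_c, p_t))
# Boundedness: two distributions with disjoint support give the maximum, 2.
print(euclid({"a": 1.0}, {"b": 1.0}))
```

Unlike the KL divergence, the value stays finite even when a class has zero probability in the control group.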

SLIDE 17

Pruning procedure (maximum class probability difference)

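Definitions

$$Diff(\mathrm{Class}, \mathrm{node}) = P^T(\mathrm{Class} \mid \mathrm{node}) - P^C(\mathrm{Class} \mid \mathrm{node})$$

Maximum class probability difference (MD)

$$MD(\mathrm{node}) = \max_{\mathrm{Class}} \big|Diff(\mathrm{Class}, \mathrm{node})\big|$$

$$sign(\mathrm{node}) = \mathrm{sgn}\big(Diff(\mathrm{Class}^*, \mathrm{node})\big)$$

where Class* is the class for which the maximum is attained.

- Use separate validation sets
- Bottom up procedure
- Keep subtree if, on the validation set:
  - MD of the subtree is greater than if it was replaced with a leaf
  - and the sign of MD is the same in training and validation sets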
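The node-level definitions and the keep-subtree condition can be sketched as below. The slides do not spell out how the MD of a whole subtree is aggregated from its leaves, so `keep_subtree` simply takes the two MD values as inputs; that interface, the function names, and the data are assumptions.

```python
def diff(p_t, p_c, cls):
    """Diff(Class, node) = P^T(Class|node) - P^C(Class|node)."""
    return p_t[cls] - p_c[cls]


def md_and_sign(p_t, p_c):
    """Return (MD(node), sign(node)) for class distributions given as dicts.

    With only two classes both differences have the same magnitude, so the
    class picked by max() (and hence the sign) may depend on dict order or
    rounding; a real implementation would need a fixed tie-breaking rule.
    """
    best = max(p_t, key=lambda cls: abs(diff(p_t, p_c, cls)))
    d = diff(p_t, p_c, best)
    return abs(d), (d > 0) - (d < 0)


def keep_subtree(md_subtree, md_leaf, sign_train, sign_valid):
    """Both conditions from the slide; MD values are measured on validation data."""
    return md_subtree > md_leaf and sign_train == sign_valid


# Hypothetical three-class node.
p_t = {"A": 0.5, "B": 0.25, "C": 0.25}
p_c = {"A": 0.125, "B": 0.375, "C": 0.5}

print(md_and_sign(p_t, p_c))
print(keep_subtree(md_subtree=0.30, md_leaf=0.20, sign_train=1, sign_valid=1))
```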

SLIDE 18

Experimental evaluation

Compared models

1. Euclid - uplift decision trees based on Eratio
2. KL - uplift decision trees based on KLratio
3. DeltaDeltaP - based on the ∆∆P criterion
4. DoubleTree - separate decision trees for the treatment and control groups

SLIDE 19

Method of evaluating uplift classifiers

- Control and treatment datasets are scored using the same model
- Compute lift curves on both datasets
- Uplift curve = lift curve on treatment data − lift curve on control data
- Measure model’s performance based on
  - Area under the uplift curve (AUUC)
  - Height of the uplift curve at the 40th percentile
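The evaluation procedure can be sketched as follows. The normalization is an assumption made here so that curves from datasets of different sizes can be subtracted: each lift curve is expressed as cumulative responses divided by the dataset size, evaluated at shared percentiles. The area is computed with the trapezoid rule. Scores, outcomes, and function names are hypothetical.

```python
def lift_curve(scores, outcomes, fractions):
    """Cumulative responses among the top-scored fraction, divided by dataset size."""
    ranked = [y for _, y in sorted(zip(scores, outcomes), key=lambda pair: -pair[0])]
    n = len(ranked)
    return [sum(ranked[:round(f * n)]) / n for f in fractions]


def uplift_curve(scores_t, y_t, scores_c, y_c, fractions):
    """Lift curve on treatment data minus lift curve on control data."""
    lift_t = lift_curve(scores_t, y_t, fractions)
    lift_c = lift_curve(scores_c, y_c, fractions)
    return [t - c for t, c in zip(lift_t, lift_c)]


def auuc(fractions, uplift):
    """Area under the uplift curve by the trapezoid rule."""
    points = list(zip(fractions, uplift))
    return sum((f1 - f0) * (u0 + u1) / 2
               for (f0, u0), (f1, u1) in zip(points, points[1:]))


# Hypothetical scored data: ten objects per group, same model scores.
fractions = [i / 10 for i in range(11)]
scores = [1.0 - i / 10 for i in range(10)]
y_treatment = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y_control = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

curve = uplift_curve(scores, y_treatment, scores, y_control, fractions)
height_at_40 = curve[fractions.index(0.4)]
print(round(auuc(fractions, curve), 2), round(height_at_40, 2))
```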

SLIDE 20

The uplift curve for the splice dataset

[Figure: uplift curves for the splice dataset. x-axis: Treated objects [%]; y-axis: Cumulative profit increase; curves: Euclid, KL, DoubleTree, DeltaDeltaP]

SLIDE 21

Data preparation

- Lack of publicly available data to test uplift models
- Datasets from UCI repository were split into treatment and control groups based on one attribute
- Procedure of choosing the splitting attribute:
  - If an action was present it was picked (e.g. hepatitis data)
  - Otherwise pick the first attribute which gives a reasonably balanced split
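The fallback rule ("first attribute which gives a reasonably balanced split") might look like the sketch below. The slides give no threshold for "reasonably balanced", so `min_share=0.3` is an arbitrary assumption, as is dropping the splitting attribute from the resulting rows. Attribute names and data are hypothetical.

```python
def first_balanced_attribute(rows, attributes, min_share=0.3):
    """Return (attribute, value) for the first attribute with a balanced value.

    A value counts as balanced if its share lies in [min_share, 1 - min_share].
    """
    for attribute in attributes:
        values = [row[attribute] for row in rows]
        for value in sorted(set(values)):
            share = values.count(value) / len(values)
            if min_share <= share <= 1 - min_share:
                return attribute, value
    return None


def split_on(rows, attribute, treated_value):
    """Split rows into (treatment, control), dropping the splitting attribute."""
    treatment, control = [], []
    for row in rows:
        rest = {k: v for k, v in row.items() if k != attribute}
        (treatment if row[attribute] == treated_value else control).append(rest)
    return treatment, control


# Hypothetical dataset with two candidate attributes.
rows = [
    {"a": "x", "b": "u", "class": 1},
    {"a": "x", "b": "v", "class": 0},
    {"a": "x", "b": "u", "class": 0},
    {"a": "y", "b": "v", "class": 1},
]

attribute, value = first_balanced_attribute(rows, ["a", "b"])
treatment, control = split_on(rows, attribute, value)
print(attribute, value, len(treatment), len(control))
```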

SLIDE 22

Methodology of model comparison

1. Models are evaluated using 2 × 5 crossvalidation
2. Models are compared by ranking on all datasets
3. Check if there are differences in model performance using Friedman’s test, a nonparametric analogue of ANOVA
4. If the test shows significant differences, a post-hoc Nemenyi test is used to assess which of the models are significantly different
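Steps 2 to 4 can be sketched with the standard rank-based formulas: the Friedman chi-square statistic computed from average ranks, and the Nemenyi critical difference CD = q_alpha * sqrt(k(k+1) / (6N)) for k models and N datasets. The scores below are made up for illustration and are not results from the talk; q_alpha = 2.569 is the commonly tabulated critical value for four models at alpha = 0.05 and is passed in as a parameter.

```python
import math


def average_ranks(results):
    """results[d][m] is the score of model m on dataset d (higher is better).

    Returns the average rank of each model (1 = best); ties share the mean rank.
    """
    k = len(results[0])
    totals = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
            for t in range(i, j + 1):
                totals[order[t]] += rank
            i = j + 1
    return [t / len(results) for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic from average ranks."""
    k = len(avg_ranks)
    return (12 * n_datasets / (k * (k + 1))
            * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4))


def nemenyi_cd(q_alpha, k, n_datasets):
    """Two models differ significantly if their average ranks differ by more than CD."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Hypothetical scores of four models on three datasets.
results = [[0.9, 0.8, 0.7, 0.6],
           [0.8, 0.7, 0.6, 0.5],
           [0.7, 0.6, 0.5, 0.4]]

ranks = average_ranks(results)
print(ranks, friedman_statistic(ranks, len(results)), nemenyi_cd(2.569, 4, len(results)))
```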

SLIDE 23

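Results for Area Under Uplift Curve

Nemenyi test at p = 0.01

[Figure: critical difference diagram of average ranks (axis from 1 to 4) for KL, Euclid, DoubleTree and DeltaDeltaP, with the critical difference CD0.01 marked]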

SLIDE 24

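Results for the height of the curve at the 40th percentile

Nemenyi test at p = 0.05

[Figure: critical difference diagram of average ranks (axis from 1 to 4) for KL, Euclid, DoubleTree and DeltaDeltaP, with the critical difference CD0.05 marked]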

SLIDE 25

Summary

- Method for decision tree construction for uplift modeling in the style of modern decision tree learning
  - Information Theory based splitting
  - Dedicated pruning strategy
- Two splitting criteria (KL and Euclidean distance)
- Reduce to standard decision trees if control data absent
- The new method significantly outperforms previous approaches to uplift modeling
- Other applications e.g. medicine
