
Decision trees for uplift modeling
Piotr Rzepakowski (National Institute of Telecommunications, Warsaw, Poland; Warsaw University of Technology, Warsaw, Poland)
Szymon Jaroszewicz (National Institute of Telecommunications, Warsaw, Poland; Polish Academy of Sciences, Warsaw, Poland)
ICDM 2010

Marketing campaign example
Sample → Pilot campaign → Model P(buy | campaign) → Select targets for campaign

Main idea of uplift modeling
We can divide objects into four groups:
1. Responded because of the action
2. Responded regardless of whether the action is taken (unnecessary costs)
3. Did not respond and the action had no impact (unnecessary costs)
4. Did not respond because the action had a negative impact (e.g. customer got annoyed by the campaign, may even churn)


Traditional classification vs. uplift modeling
- Traditional models predict the conditional probability P(response | treatment)
- Uplift models predict the change in behaviour resulting from the action: P(response | treatment) − P(response | no treatment)
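A minimal Python sketch of the quantity an uplift model targets, assuming binary 0/1 responses. The function names and the toy outcome lists are illustrative and do not come from the slides.

```python
def response_rate(outcomes):
    """Fraction of positive responses in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def uplift(treatment_outcomes, control_outcomes):
    """Estimate P(response | treatment) - P(response | no treatment)."""
    return response_rate(treatment_outcomes) - response_rate(control_outcomes)


# Toy data (made up for illustration): 1 = responded, 0 = did not respond.
treated = [1, 0, 0, 1, 0, 0, 0, 1]  # 3 of 8 responded
control = [0, 0, 1, 0, 0, 0, 0, 0]  # 1 of 8 responded

print(uplift(treated, control))
```

A traditional model fitted on the treated objects alone would only see their 37.5% response rate; the uplift estimate also subtracts the 12.5% that would have responded anyway.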

Marketing campaign example (uplift modeling approach)
Treatment sample: receives the pilot campaign
Control sample: receives no campaign
Both samples → Model P(buy | campaign) − P(buy | no campaign) → Select targets for campaign

Related work
Literature
- Surprisingly little attention in the literature
- Business whitepapers offer only vague descriptions of the algorithms used
Two general approaches
- Subtraction of two models
- Modification of model learning algorithms

Subtraction of two models
Treatment sample (pilot campaign) → Model P(buy | campaign)
Control sample → Model P(buy | no campaign)
Difference P(buy | campaign) − P(buy | no campaign) → Select targets for campaign
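The two-model approach can be sketched as follows. This is only an illustration under stated assumptions: each "model" here is a per-segment response-rate table standing in for whatever classifier would actually be trained, and the segment names and rows are invented.

```python
from collections import defaultdict


def fit_rate_model(rows):
    """Stand-in 'model': response rate per segment from (segment, responded) pairs."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, total]
    for segment, responded in rows:
        counts[segment][0] += responded
        counts[segment][1] += 1
    return {seg: r / n for seg, (r, n) in counts.items()}


def two_model_uplift(treatment_rows, control_rows):
    """Fit one model per group, then subtract the control prediction from the treatment one."""
    model_t = fit_rate_model(treatment_rows)   # estimates P(buy | campaign)
    model_c = fit_rate_model(control_rows)     # estimates P(buy | no campaign)
    return {seg: model_t[seg] - model_c[seg] for seg in model_t if seg in model_c}


# Hypothetical pilot data: (segment, bought).
treatment_rows = [("young", 1), ("young", 1), ("young", 0), ("young", 0),
                  ("old", 1), ("old", 0), ("old", 0), ("old", 0)]
control_rows = [("young", 0), ("young", 0), ("young", 0), ("young", 1),
                ("old", 1), ("old", 0), ("old", 0), ("old", 0)]

scores = two_model_uplift(treatment_rows, control_rows)
print(scores)
```

In this toy data only the "young" segment shows a positive difference, so only that segment would be selected for the campaign.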

Current approaches to uplift decision trees
- Create splits using the difference of probabilities (∆∆P)
  Root: P^T = 5%, P^C = 3%, ∆P = 2%
  Branch x >= a: P^T = 8%, P^C = 3.5%, ∆P = 4.5%
  Branch x < a: P^T = 3.7%, P^C = 2.8%, ∆P = 0.9%
  ∆∆P = 3.6%
- Pruning not used (or not described)
- Work only for two-class problems and binary splits
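The arithmetic behind the ∆∆P example can be checked with a short Python sketch. Taking the absolute value of the difference between the two branches is an assumption of this sketch; the slide only shows the resulting 3.6%.

```python
def delta_p(p_treatment, p_control):
    """Difference in success probability between treatment and control in one node."""
    return p_treatment - p_control


def delta_delta_p(left, right):
    """Split score: how far apart the two branches' delta-P values are (absolute value assumed)."""
    return abs(delta_p(*left) - delta_p(*right))


# (P^T, P^C) for the two branches of the example split.
branch_ge_a = (0.08, 0.035)   # delta-P = 4.5%
branch_lt_a = (0.037, 0.028)  # delta-P = 0.9%

print(round(delta_delta_p(branch_ge_a, branch_lt_a), 4))
```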

Our approach to uplift decision trees
- Splitting criteria based on information theory
- Pruning strategy designed for uplift modeling
- Multiclass problems and multiway splits possible
- If the control group is empty, the criterion should reduce to one of the classical splitting criteria used for decision tree learning


Kullback-Leibler divergence
Measure the difference between treatment and control groups using KL divergence:
\[ KL\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class})\big) = \sum_{y \in \mathrm{Dom}(\mathrm{Class})} P^T(y) \log \frac{P^T(y)}{P^C(y)} \]
Need KL divergence conditional on a given test:
\[ KL\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class}) \mid \mathrm{Test}\big) = \sum_{a \in \mathrm{Dom}(\mathrm{Test})} \frac{N^T(a) + N^C(a)}{N^T + N^C} \, KL\big(P^T(\mathrm{Class} \mid a) : P^C(\mathrm{Class} \mid a)\big) \]
Measures how much the two groups differ given a test's outcome
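A small Python sketch of the two quantities on this slide. It assumes natural logarithms and that every class has non-zero probability in the control group (no smoothing is applied); the data layout, a dict from test outcome to a list of class labels, is chosen only for illustration.

```python
import math


def class_distribution(labels, classes):
    """Empirical class distribution of a list of labels, as a dict class -> probability."""
    return {c: labels.count(c) / len(labels) for c in classes}


def kl_divergence(p, q):
    """KL(p : q) = sum_y p(y) * log(p(y) / q(y)); assumes q(y) > 0 wherever p(y) > 0."""
    return sum(p[y] * math.log(p[y] / q[y]) for y in p if p[y] > 0)


def conditional_kl(treatment, control, classes):
    """KL conditional on a test: per-outcome KL weighted by (N^T(a) + N^C(a)) / (N^T + N^C)."""
    n_total = sum(len(v) for v in treatment.values()) + sum(len(v) for v in control.values())
    total = 0.0
    for a in treatment:
        weight = (len(treatment[a]) + len(control[a])) / n_total
        total += weight * kl_divergence(
            class_distribution(treatment[a], classes),
            class_distribution(control[a], classes),
        )
    return total


# Toy data (invented): test outcome -> class labels observed in each group.
classes = [0, 1]
treatment = {"a": [1, 1, 0, 0], "b": [1, 0, 0, 0]}
control = {"a": [1, 0, 0, 0], "b": [1, 0, 0, 0]}

print(conditional_kl(treatment, control, classes))
```

Outcome "b" contributes nothing because both groups have the same class distribution there; all of the conditional divergence comes from outcome "a".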


Final splitting criterion
\[ KL_{gain}(\mathrm{Test}) = KL\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class}) \mid \mathrm{Test}\big) - KL\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class})\big) \]
- Measures the increase in difference between treatment and control groups from splitting based on Test
- If the control group is empty, KL_gain reduces to entropy gain
\[ KL_{ratio} = \frac{KL_{gain}(\mathrm{Test})}{KL_{value}(\mathrm{Test})} \]
- Tests with a large number of values are penalized
- Tests which split the control and treatment groups in different proportions are penalized
- Postulates are satisfied
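The claim that KL gain reduces to entropy gain can be checked numerically. The sketch below assumes that an empty control group is represented by a uniform class distribution (as smoothing of empty counts would give); that modeling choice, the helper names, and the toy labels are assumptions of this sketch. The KL_value normalizer is not defined on the slides, so the ratio is not computed here.

```python
import math
from collections import Counter


def distribution(labels, classes):
    """Empirical class probabilities, in the order given by `classes`."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q):
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def kl_gain_empty_control(branches, classes):
    """KL gain when N^C = 0, with the control distribution assumed uniform."""
    uniform = [1 / len(classes)] * len(classes)
    all_labels = [y for labels in branches.values() for y in labels]
    n = len(all_labels)
    conditional = sum(len(labels) / n * kl(distribution(labels, classes), uniform)
                      for labels in branches.values())
    return conditional - kl(distribution(all_labels, classes), uniform)


def entropy_gain(branches, classes):
    """Classical information gain computed on the treatment data alone."""
    all_labels = [y for labels in branches.values() for y in labels]
    n = len(all_labels)
    return entropy(distribution(all_labels, classes)) - sum(
        len(labels) / n * entropy(distribution(labels, classes))
        for labels in branches.values())


# Toy treatment data (invented): test outcome -> class labels.
branches = {"a": [1, 1, 1, 0], "b": [0, 0, 0, 1, 0, 0]}
classes = [0, 1]
print(kl_gain_empty_control(branches, classes), entropy_gain(branches, classes))
```

With a uniform control distribution over k classes, KL(P^T : uniform) = log k − H(P^T), so the log k terms cancel in the gain and only the entropy difference remains.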

Splitting criterion based on squared Euclidean distance
\[ Euclid\big(P^T(\mathrm{Class}) : P^C(\mathrm{Class})\big) = \sum_{y \in \mathrm{Dom}(\mathrm{Class})} \big(P^T(y) - P^C(y)\big)^2 \]
- Euclid_gain and Euclid_ratio are defined analogously to the KL versions
- Better statistical properties (values are bounded)
- Symmetry
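A brief Python sketch of the squared Euclidean distance between two class distributions, illustrating the symmetry and boundedness mentioned above. The example distributions are invented.

```python
def euclid(p, q):
    """Squared Euclidean distance between two class distributions given as equal-length lists."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


p_treatment = [0.5, 0.5]
p_control = [0.75, 0.25]

print(euclid(p_treatment, p_control))   # same value in either argument order
print(euclid([1.0, 0.0], [0.0, 1.0]))   # largest possible value: all mass on different classes
```

Unlike KL divergence, the value stays finite when a class has zero probability in the control group.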


Pruning procedure (maximum class probability difference)
Definitions
\[ \mathrm{Diff}(\mathrm{Class}, \mathrm{node}) = P^T(\mathrm{Class} \mid \mathrm{node}) - P^C(\mathrm{Class} \mid \mathrm{node}) \]
Maximum class probability difference (MD)
\[ MD(\mathrm{node}) = \max_{\mathrm{Class}} \big|\mathrm{Diff}(\mathrm{Class}, \mathrm{node})\big| \]
\[ \mathrm{sign}(\mathrm{node}) = \mathrm{sgn}\big(\mathrm{Diff}(\mathrm{Class}^*, \mathrm{node})\big) \]
where Class* is the class attaining the maximum in MD(node).
Procedure
- Use separate validation sets
- Bottom-up procedure
- Keep a subtree if, on the validation set:
  - the MD of the subtree is greater than if it were replaced with a leaf
  - and the sign of MD is the same in the training and validation sets
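The definitions above can be sketched in Python as follows. How the MD of a whole subtree is aggregated from its leaves is not spelled out on the slide, so `keep_subtree` simply takes the two MD values as inputs; that interface, the function names, and the example distributions are assumptions of this sketch.

```python
def max_class_difference(p_treatment, p_control):
    """Return (Class*, MD, sign) for one node from its class distributions (dicts)."""
    diffs = {c: p_treatment[c] - p_control[c] for c in p_treatment}
    best = max(diffs, key=lambda c: abs(diffs[c]))       # Class*: largest absolute difference
    sign = (diffs[best] > 0) - (diffs[best] < 0)         # sgn(Diff(Class*, node))
    return best, abs(diffs[best]), sign


def keep_subtree(md_subtree, md_as_leaf, sign_training, sign_validation):
    """Keep rule from the slide; both MD values are measured on the validation set."""
    return md_subtree > md_as_leaf and sign_training == sign_validation


# Invented class distributions for one node.
p_t = {"A": 0.5, "B": 0.25, "C": 0.25}
p_c = {"A": 0.125, "B": 0.375, "C": 0.5}

print(max_class_difference(p_t, p_c))
```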

Experimental evaluation
Compared models:
1. Euclid: uplift decision trees based on Euclid ratio
2. KL: uplift decision trees based on KL ratio
3. DeltaDeltaP: based on the ∆∆P criterion
4. DoubleTree: separate decision trees for the treatment and control groups

Method of evaluating uplift classifiers
- Control and treatment datasets are scored using the same model
- Compute lift curves on both datasets
- Uplift curve = lift curve on treatment data − lift curve on control data
- Measure the model's performance based on:
  - Area under the uplift curve (AUUC)
  - Height of the uplift curve at the 40th percentile
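The evaluation steps above can be sketched in Python. Several details are assumptions of this sketch rather than statements from the slides: each lift curve is expressed as a cumulative response rate (responses in the top p% divided by the group size) so that groups of different sizes are comparable, the area is computed with the trapezoid rule on an x-axis rescaled to [0, 1], and the scores and outcomes are invented.

```python
def lift_curve(scores, outcomes, percentiles):
    """Cumulative response rate when targeting the top p% of objects by model score."""
    ranked = [y for _, y in sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    return [sum(ranked[: (n * p) // 100]) / n for p in percentiles]


def uplift_curve(t_scores, t_outcomes, c_scores, c_outcomes, percentiles):
    """Lift curve on treatment data minus lift curve on control data."""
    lift_t = lift_curve(t_scores, t_outcomes, percentiles)
    lift_c = lift_curve(c_scores, c_outcomes, percentiles)
    return [t - c for t, c in zip(lift_t, lift_c)]


def auuc(curve, percentiles):
    """Trapezoid-rule area under the uplift curve."""
    xs = [p / 100 for p in percentiles]
    return sum((xs[i + 1] - xs[i]) * (curve[i] + curve[i + 1]) / 2 for i in range(len(xs) - 1))


# Invented scores from a single model applied to both groups, with 0/1 outcomes.
percentiles = [0, 25, 50, 75, 100]
curve = uplift_curve([0.9, 0.7, 0.4, 0.1], [1, 1, 0, 0],
                     [0.8, 0.6, 0.3, 0.2], [0, 0, 1, 0],
                     percentiles)
print(curve, auuc(curve, percentiles))
```

The height at a single percentile, such as the 40th, can be read off by calling `uplift_curve` with `percentiles=[40]`.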

The uplift curve for the splice dataset
[Figure: uplift curves for Euclid, KL, DoubleTree, and DeltaDeltaP. x-axis: Treated objects [%]; y-axis: Cumulative profit increase.]

Data preparation
- Lack of publicly available data to test uplift models
- Datasets from the UCI repository were split into treatment and control groups based on one attribute
- Procedure for choosing the splitting attribute:
  - If an action was present, it was picked (e.g. hepatitis data)
  - Otherwise, pick the first attribute which gives a reasonably balanced split
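The attribute-selection procedure might look like the Python sketch below. The slides do not define "reasonably balanced", so the `min_share` threshold, the restriction to two-valued attributes, and all attribute names and rows are hypothetical.

```python
def pick_split_attribute(rows, attributes, action_attribute=None, min_share=0.3):
    """Choose the attribute used to divide a dataset into treatment and control groups."""
    if action_attribute is not None:
        return action_attribute  # an attribute describing an action is picked directly
    for attr in attributes:
        values = [row[attr] for row in rows]
        distinct = sorted(set(values))
        if len(distinct) != 2:   # assumption: only two-valued attributes are considered
            continue
        share = values.count(distinct[0]) / len(values)
        if min(share, 1 - share) >= min_share:  # hypothetical balance threshold
            return attr
    return None


def split_groups(rows, attr, treatment_value):
    """Rows with the given attribute value form the treatment group, the rest the control group."""
    treatment = [r for r in rows if r[attr] == treatment_value]
    control = [r for r in rows if r[attr] != treatment_value]
    return treatment, control


# Invented rows: "flag" is badly unbalanced, "steroid" splits the data in half.
rows = [{"flag": "y" if i == 0 else "n", "steroid": "yes" if i % 2 == 0 else "no"}
        for i in range(10)]

attr = pick_split_attribute(rows, ["flag", "steroid"])
treatment, control = split_groups(rows, attr, "yes")
print(attr, len(treatment), len(control))
```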
