Support Vector Machines for Uplift Modeling
Lukasz Zaniewicz (2), Szymon Jaroszewicz (1, 2)
(1) National Institute of Telecommunications, Warsaw, Poland
(2) Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland

What is uplift modeling?
From the workshop's description: "Traditionally, causal relationships are identified based on controlled experiments. [...] there has been an increasing interest in discovering causal relationships from observational data only."
Suppose we do have data from a controlled experiment.
Question: what can Machine Learning do for us?
Relatively little interest in the ML community.

What is uplift modeling?
Uplift modeling
Given two training datasets:
1. the treatment dataset: individuals on which an action was taken
2. the control dataset: individuals on which no action was taken, used as background
Build a model which predicts the causal influence of the action on a given individual.

Uplift modeling
Notation:
$P^T$: probabilities in the treatment group
$P^C$: probabilities in the control group
Traditional classifiers predict the conditional probability
$P^T(Y \mid X_1, \ldots, X_m)$
Uplift models predict the change in behaviour resulting from the action
$P^T(Y \mid X_1, \ldots, X_m) - P^C(Y \mid X_1, \ldots, X_m)$
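As a rough numeric illustration of the definition above, the uplift within a single segment can be estimated as the difference between the response rates observed in the treatment and control groups. The counts are made-up values, not data from the presentation, and `empirical_uplift` is a hypothetical helper name.

```python
def empirical_uplift(treated_responders, treated_total,
                     control_responders, control_total):
    """Estimate P^T(Y=1) - P^C(Y=1) from response counts in one segment."""
    p_treatment = treated_responders / treated_total
    p_control = control_responders / control_total
    return p_treatment - p_control


# Made-up counts for a single customer segment (illustration only).
uplift = empirical_uplift(treated_responders=30, treated_total=200,
                          control_responders=20, control_total=200)
print(f"estimated uplift: {uplift:+.3f}")
```

A positive value suggests the action increased the response rate in that segment; a negative value suggests it decreased it.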

Why uplift modeling?
A typical marketing campaign
[Diagram: Sample → Pilot campaign → Model $P(\text{buy} \mid X)$ → Select targets for campaign]
But this is not what we need!
We want people who bought because of the campaign.
Not people who bought after the campaign.

A typical marketing campaign
We can divide potential customers into four groups:
1. Responded because of the action (the people we want)
2. Responded, but would have responded anyway (unnecessary costs)
3. Did not respond and the action had no impact (unnecessary costs)
4. Did not respond because the action had a negative impact (negative impact)

Marketing campaign (uplift modeling approach)
[Diagram: Pilot campaign → Treatment sample and Control sample → Model $P^T(\text{buy} \mid X) - P^C(\text{buy} \mid X)$ → Select targets for campaign]

Applications in medicine
A typical medical trial:
treatment group: gets the treatment
control group: gets placebo (or another treatment)
do a statistical test to show that the treatment is better than placebo
With uplift modeling we can find out for whom the treatment works best.
Personalized medicine

Main difficulty of uplift modeling
Rubin's causal inference framework
The fundamental problem of causal inference: our knowledge is always incomplete.
For each training case we know either what happened after the treatment, or what happened if no treatment was given.
Never both!
This makes designing uplift algorithms challenging.
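The point can be made concrete with a minimal sketch in which every individual carries two potential outcomes but only one of them is ever recorded. The `Individual` class, the `observe` function and the two example individuals are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    y_if_treated: int   # outcome if the action is taken (1 = responds)
    y_if_control: int   # outcome if no action is taken


def observe(individual, treated):
    """Return the single outcome that is actually recorded."""
    return individual.y_if_treated if treated else individual.y_if_control


# Two of the four customer groups from the marketing example.
responded_because_of_action = Individual(y_if_treated=1, y_if_control=0)
would_respond_anyway = Individual(y_if_treated=1, y_if_control=1)

# Once treated, both individuals produce the same record, so the training
# data alone cannot tell which group a treated responder belongs to.
print(observe(responded_because_of_action, treated=True),
      observe(would_respond_anyway, treated=True))
```

The two individuals differ only in the outcome that is not observed, which is why a label such as "correctly classified" is never directly available for a single training case.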

The two model approach
An obvious approach to uplift modeling:
1. Build a classifier $M^T$ modeling $P^T(Y \mid X)$ on the treatment sample
2. Build a classifier $M^C$ modeling $P^C(Y \mid X)$ on the control sample
3. The uplift model subtracts the probabilities predicted by the two classifiers:
$M^U(Y \mid X) = M^T(Y \mid X) - M^C(Y \mid X)$
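The three steps can be sketched in a few lines. This is only an illustration under simplifying assumptions: `FrequencyClassifier` is a hypothetical toy model (smoothed response frequencies over one categorical feature) standing in for any probabilistic classifier, and the data are made up.

```python
from collections import defaultdict


class FrequencyClassifier:
    """Toy stand-in for a probabilistic classifier: estimates P(Y=1 | x)
    for one categorical feature from Laplace-smoothed frequencies."""

    def fit(self, xs, ys):
        self.pos = defaultdict(int)
        self.total = defaultdict(int)
        for x, y in zip(xs, ys):
            self.pos[x] += y
            self.total[x] += 1
        return self

    def predict_proba(self, x):
        # Unseen feature values fall back to 0.5 because of the smoothing.
        return (self.pos[x] + 1) / (self.total[x] + 2)


def two_model_uplift(model_t, model_c, x):
    """M^U(Y | x) = M^T(Y | x) - M^C(Y | x)."""
    return model_t.predict_proba(x) - model_c.predict_proba(x)


# Made-up data: one categorical feature, binary outcome (1 = responded).
x_treat = ["young", "young", "young", "old", "old", "old"]
y_treat = [1, 1, 0, 0, 0, 1]
x_ctrl = ["young", "young", "young", "old", "old", "old"]
y_ctrl = [0, 0, 1, 0, 1, 1]

model_t = FrequencyClassifier().fit(x_treat, y_treat)  # step 1
model_c = FrequencyClassifier().fit(x_ctrl, y_ctrl)    # step 2

for segment in ("young", "old"):                       # step 3
    print(segment, round(two_model_uplift(model_t, model_c, segment), 3))
```

Any pair of classifiers exposing a probability estimate could be substituted for the toy model; the subtraction in `two_model_uplift` is the only uplift-specific part.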

Two model approach
Advantages:
Works with existing classification models
Good probability predictions ⇒ good uplift prediction
Disadvantages:
Differences between class probabilities can follow a different pattern than the probabilities themselves
each classifier focuses on changes in class probabilities but ignores the weaker 'uplift signal'
algorithms designed to focus directly on uplift can give better results

Uplift Support Vector Machines
Support Vector Machines (SVMs) are a popular Machine Learning algorithm.
Here we adapt them to the uplift modeling problem.

Uplift Support Vector Machines
Recall that the outcome of an action can be:
positive
negative
neutral
Main idea
Use two parallel hyperplanes dividing the sample space into three areas:
positive (+1)
neutral (0)
negative (−1)

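Uplift Support Vector Machines
[Figure: two parallel hyperplanes, $H_1$ separating the positive (+1) area from the neutral (0) area and $H_2$ separating the neutral (0) area from the negative (−1) area]
$H_1: \langle w, x \rangle + b_1 = 0$
$H_2: \langle w, x \rangle + b_2 = 0$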

Uplift Support Vector Machines
How do we train Uplift SVMs?
Classical SVMs: need to know if a case is classified correctly
Fundamental problem of causal inference ⇒ we never know if a point was classified correctly!
The algorithm must use only the information available.

Uplift Support Vector Machines
Four types of points: $T_+$, $T_-$, $C_+$, $C_-$
Positive area (+1):
$T_-$, $C_+$ definitely misclassified
$T_+$, $C_-$ may be correct, at worst neutral
Negative area (−1):
$T_+$, $C_-$ definitely misclassified
$T_-$, $C_+$ may be correct, at worst neutral
Neutral area (0):
all predictions may be correct or incorrect
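The case analysis above can be written down as a small lookup function. This is a sketch only; `judge` and its string labels are hypothetical names, and the encoding of groups and outcomes is an assumption made for the example.

```python
def judge(group, outcome, area):
    """What can be said about one training point given its predicted area.

    group:   "T" (treatment) or "C" (control)
    outcome: +1 (responded) or -1 (did not respond)
    area:    +1, 0 or -1, the area the model places the point in
    """
    if area == 0:
        return "may be correct or incorrect"
    # T+ and C- are compatible with a positive effect of the action,
    # T- and C+ with a negative one.
    compatible_area = +1 if (group == "T") == (outcome == +1) else -1
    if compatible_area == area:
        return "may be correct, at worst neutral"
    return "definitely misclassified"


for group, outcome in (("T", +1), ("T", -1), ("C", +1), ("C", -1)):
    for area in (+1, 0, -1):
        print(f"{group}{outcome:+d} in area {area:+d}: "
              f"{judge(group, outcome, area)}")
```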

Uplift Support Vector Machines – problem formulation
Penalize points separately for being on the wrong side of each hyperplane.
Points in the neutral area are penalized for crossing one hyperplane:
this prevents all points from being classified as neutral
Points which are definitely misclassified are penalized for crossing two hyperplanes:
such points should be avoided, thus the higher penalty
Other points are not penalized.

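Uplift Support Vector Machines – problem formulation
[Figure: example points of types $T_+$, $T_-$, $C_+$, $C_-$ in the positive (+1), neutral (0) and negative (−1) areas bounded by $H_1$ and $H_2$, with the slack variables $\xi_{i,1}$ and $\xi_{i,2}$ marking the penalized distances to each hyperplane]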

Optimization task – primal form
$$\min_{w,\, b_1,\, b_2 \,\in\, \mathbb{R}^{m+2}} \; \frac{1}{2}\langle w, w \rangle + C_1 \sum_{D^T_+ \cup D^C_-} \xi_{i,1} + C_2 \sum_{D^T_- \cup D^C_+} \xi_{i,1} + C_2 \sum_{D^T_+ \cup D^C_-} \xi_{i,2} + C_1 \sum_{D^T_- \cup D^C_+} \xi_{i,2},$$
subject to:
$\langle w, x_i \rangle + b_1 \leq -1 + \xi_{i,1}$, for $(x_i, y_i) \in D^T_+ \cup D^C_-$,
$\langle w, x_i \rangle + b_1 \geq +1 - \xi_{i,1}$, for $(x_i, y_i) \in D^T_- \cup D^C_+$,
$\langle w, x_i \rangle + b_2 \leq -1 + \xi_{i,2}$, for $(x_i, y_i) \in D^T_+ \cup D^C_-$,
$\langle w, x_i \rangle + b_2 \geq +1 - \xi_{i,2}$, for $(x_i, y_i) \in D^T_- \cup D^C_+$,
$\xi_{i,j} \geq 0$, for $i = 1, \ldots, n$, $j \in \{1, 2\}$.
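To see how the terms of the objective combine, the sketch below evaluates it for fixed $w$, $b_1$, $b_2$, taking each slack variable at its smallest feasible value. It follows the constraint directions exactly as written above and does not solve the optimization problem; the function names, the point encoding and the example values are assumptions made for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def uplift_svm_objective(w, b1, b2, points, c1, c2):
    """Value of the primal objective with minimal feasible slacks.

    points: iterable of (x, group, outcome) with group in {"T", "C"}
            and outcome in {+1, -1}.
    """
    total = 0.5 * dot(w, w)
    for x, group, outcome in points:
        s1 = dot(w, x) + b1
        s2 = dot(w, x) + b2
        if (group == "T") == (outcome == +1):
            # Point in D^T_+ or D^C_-: constraints  s <= -1 + xi
            xi1 = max(0.0, s1 + 1.0)
            xi2 = max(0.0, s2 + 1.0)
            total += c1 * xi1 + c2 * xi2
        else:
            # Point in D^T_- or D^C_+: constraints  s >= +1 - xi
            xi1 = max(0.0, 1.0 - s1)
            xi2 = max(0.0, 1.0 - s2)
            total += c2 * xi1 + c1 * xi2
    return total


# Made-up one-dimensional example: three treated responders (T+).
w, b1, b2 = [1.0], 1.0, -1.0
for x in ([-3.0], [0.0], [3.0]):
    value = uplift_svm_objective(w, b1, b2, [(x, "T", +1)], c1=1.0, c2=2.0)
    print(x, value)
```

In this example the first point satisfies both of its constraints with zero slack, the second needs slack only in the $b_1$ constraint (weighted by $C_1$), and the third needs slack in both (weighted by $C_1$ and $C_2$).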

Optimization task – primal form
We have two penalty parameters:
$C_1$: penalty coefficient for being on the wrong side of one hyperplane
$C_2$: coefficient of the additional penalty for also crossing the second hyperplane
All points classified as neutral are penalized with $C_1 \xi$.
All definitely misclassified points are penalized with $C_1 \xi$ and $C_2 \xi$.
How do $C_1$ and $C_2$ influence the model?

Influence of penalty coefficients $C_1$ and $C_2$ on the model
Lemma. For a well defined model, $C_2 \geq C_1$. Otherwise the order of the hyperplanes would be reversed.
Lemma. If $C_2 = C_1$ then no points are classified as neutral.
Lemma. For a sufficiently large ratio $C_2 / C_1$ no point is penalized for crossing both hyperplanes. (Almost all points are classified as neutral.)

Influence of penalty coefficients $C_1$ and $C_2$ on the model
The $C_1$ coefficient plays the role of the penalty in classical SVMs.
The ratio $C_2 / C_1$ decides on the proportion of cases classified as neutral.

Example: the tamoxifen drug trial data
[Figure: number of cases classified negative, neutral and positive on the tamoxifen data, plotted against the ratio $C_2 / C_1$ (from 1.00 to 1.16)]

Example: the tamoxifen drug trial data
[Figure: uplift [%] among cases classified negative, neutral and positive on the tamoxifen data, plotted against the ratio $C_2 / C_1$ (from 1.00 to 1.16)]

Evaluating uplift models
