Support Vector Machines for Uplift Modeling



  1. Support Vector Machines for Uplift Modeling
     Lukasz Zaniewicz (2), Szymon Jaroszewicz (1, 2)
     (1) National Institute of Telecommunications, Warsaw, Poland
     (2) Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
     Lukasz Zaniewicz, Szymon Jaroszewicz: Uplift SVMs


  3. What is uplift modeling?
     From the workshop's description: "Traditionally, causal relationships are identified based on controlled experiments. [...] there has been an increasing interest in discovering causal relationships from observational data only."
     Suppose we do have data from a controlled experiment. Question: what can Machine Learning do for us?
     This question has attracted relatively little interest in the ML community.

  4. What is uplift modeling?
     Uplift modeling: given two training datasets,
     1. the treatment dataset: individuals on which an action was taken
     2. the control dataset: individuals on which no action was taken (used as background)
     build a model which predicts the causal influence of the action on a given individual.

  5. Uplift modeling
     Notation:
     - $P^T$: probabilities in the treatment group
     - $P^C$: probabilities in the control group
     Traditional classifiers predict the conditional probability $P^T(Y \mid X_1, \ldots, X_m)$.
     Uplift models predict the change in behaviour resulting from the action: $P^T(Y \mid X_1, \ldots, X_m) - P^C(Y \mid X_1, \ldots, X_m)$.
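
As a worked illustration with hypothetical numbers (they are not taken from the slides): suppose that, for individuals with a given feature vector $x$, 12% respond in the treatment group and 8% respond in the control group. The quantity an uplift model should predict is then:

```latex
P^T(Y = 1 \mid x) - P^C(Y = 1 \mid x) = 0.12 - 0.08 = 0.04
```

A traditional classifier trained on the treatment group would report 0.12 for such an individual, although only 4 percentage points can be attributed to the action.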


  7. Why uplift modeling?
     A typical marketing campaign:
     (Diagram: Sample → Pilot campaign → Model $P(\text{buy} \mid X)$ → Select targets for campaign)
     But this is not what we need!
     - We want people who bought because of the campaign,
     - not people who bought after the campaign.

  8. A typical marketing campaign
     We can divide potential customers into four groups:
     1. Responded because of the action (the people we want)
     2. Responded, but would have responded anyway (unnecessary costs)
     3. Did not respond and the action had no impact (unnecessary costs)
     4. Did not respond because the action had a negative impact

  9. Marketing campaign (uplift modeling approach)
     (Diagram: Treatment sample with pilot campaign, plus Control sample → Model $P^T(\text{buy} \mid X) - P^C(\text{buy} \mid X)$ → Select targets for campaign)

  10. Applications in medicine
     A typical medical trial:
     - treatment group: gets the treatment
     - control group: gets a placebo (or another treatment)
     - do a statistical test to show that the treatment is better than the placebo
     With uplift modeling we can find out for whom the treatment works best: personalized medicine.

  11. Main difficulty of uplift modeling
     Rubin's causal inference framework: the fundamental problem of causal inference.
     - Our knowledge is always incomplete.
     - For each training case we know either what happened after the treatment, or what happened if no treatment was given. Never both!
     This makes designing uplift algorithms challenging.

  12. The two model approach
     An obvious approach to uplift modeling:
     1. Build a classifier $M^T$ modeling $P^T(Y \mid X)$ on the treatment sample
     2. Build a classifier $M^C$ modeling $P^C(Y \mid X)$ on the control sample
     3. The uplift model subtracts the probabilities predicted by the two classifiers: $M^U(Y \mid X) = M^T(Y \mid X) - M^C(Y \mid X)$
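
The three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the "classifier" is a smoothed response rate per value of a single discrete feature, and the names `fit_rate_model`, `fit_two_model_uplift` and the toy data are hypothetical. Any classifier that outputs probabilities could take its place.

```python
from collections import defaultdict


def fit_rate_model(rows, smoothing=1.0):
    """Fit a toy classifier: the smoothed response rate per feature value.

    rows is a list of (x, y) pairs with x hashable and y in {0, 1}.
    Returns a function x -> estimated P(Y = 1 | x).
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [number of cases, number of responses]
    for x, y in rows:
        counts[x][0] += 1
        counts[x][1] += y

    def predict(x):
        n, positives = counts.get(x, (0, 0))
        # Laplace smoothing keeps the estimate defined for unseen feature values.
        return (positives + smoothing) / (n + 2.0 * smoothing)

    return predict


def fit_two_model_uplift(treatment_rows, control_rows):
    """Steps 1-3: one model per sample, uplift = difference of predictions."""
    m_t = fit_rate_model(treatment_rows)  # models P^T(Y | X)
    m_c = fit_rate_model(control_rows)    # models P^C(Y | X)
    return lambda x: m_t(x) - m_c(x)


# Hypothetical toy data: (feature value, response).
treatment = [("young", 1), ("young", 1), ("young", 0),
             ("old", 0), ("old", 1), ("old", 0)]
control = [("young", 0), ("young", 1), ("young", 0),
           ("old", 0), ("old", 1), ("old", 0)]

uplift = fit_two_model_uplift(treatment, control)
print(uplift("young"), uplift("old"))
```

In this toy data the response rates differ between the samples only for "young", so that is the only segment with a non-zero predicted uplift.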

  13. Two model approach
     Advantages:
     - Works with existing classification models
     - Good probability predictions ⇒ good uplift prediction
     Disadvantages:
     - Differences between class probabilities can follow a different pattern than the probabilities themselves
     - Each classifier focuses on changes in class probabilities but ignores the weaker "uplift signal"
     - Algorithms designed to focus directly on uplift can give better results

  14. Uplift Support Vector Machines

  15. Uplift Support Vector Machines
     Support Vector Machines (SVMs) are a popular Machine Learning algorithm.
     Here we adapt them to the uplift modeling problem.


  17. Uplift Support Vector Machines
     Recall that the outcome of an action can be positive, negative or neutral.
     Main idea: use two parallel hyperplanes dividing the sample space into three areas:
     - positive (+1)
     - neutral (0)
     - negative (−1)

  18. Uplift Support Vector Machines
     (Figure: two parallel hyperplanes $H_1$ and $H_2$ separating the areas +1, 0 and −1.)
     $H_1: \langle w, x \rangle + b_1 = 0$
     $H_2: \langle w, x \rangle + b_2 = 0$

  19. Uplift Support Vector Machines
     How do we train Uplift SVMs?
     - Classical SVMs need to know if a case is classified correctly.
     - Fundamental problem of causal inference ⇒ we never know if a point was classified correctly!
     - The algorithm must use only the information available.

  20. Uplift Support Vector Machines
     Four types of points: $T_+$, $T_-$, $C_+$, $C_-$.
     Positive area (+1):
     - $T_-$, $C_+$: definitely misclassified
     - $T_+$, $C_-$: may be correct, at worst neutral
     Negative area (−1):
     - $T_+$, $C_-$: definitely misclassified
     - $T_-$, $C_+$: may be correct, at worst neutral
     Neutral area (0): all predictions may be correct or incorrect.
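
The case analysis above is small enough to write down as a lookup. The sketch below is only an illustration; the function name `point_status`, the encoding of groups as "T"/"C" and of outcomes and areas as +1/0/−1, and the returned labels are assumptions made for this example.

```python
def point_status(group, outcome, area):
    """What we can say about a training point that falls into a given area.

    group:   "T" (treatment) or "C" (control)
    outcome: +1 or -1 (observed response)
    area:    +1, 0 or -1 (area predicted by the model)
    """
    if area == 0:
        # In the neutral area nothing can be concluded for any point type.
        return "unknown"
    # T+ and C- are the point types compatible with a positive effect,
    # T- and C+ the ones compatible with a negative effect.
    compatible_area = outcome if group == "T" else -outcome
    if compatible_area == area:
        return "possibly correct"
    return "definitely misclassified"


for group, outcome in [("T", +1), ("T", -1), ("C", +1), ("C", -1)]:
    for area in (+1, 0, -1):
        print(group, outcome, area, point_status(group, outcome, area))
```

Only the "definitely misclassified" label is a certainty; "possibly correct" reflects that the counterfactual outcome of each individual is never observed.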

  21. Uplift Support Vector Machines: problem formulation
     Penalize points separately for being on the wrong side of each hyperplane.
     - Points in the neutral area are penalized for crossing one hyperplane; this prevents all points from being classified as neutral.
     - Points which are definitely misclassified are penalized for crossing two hyperplanes; such points should be avoided, hence the higher penalty.
     - Other points are not penalized.

  22. Uplift Support Vector Machines: problem formulation
     (Figure: points of types $T_+$, $T_-$, $C_+$, $C_-$ in the areas +1, 0 and −1 separated by $H_1$ and $H_2$, with the slack variables $\xi_{i,1}$ and $\xi_{i,2}$ drawn for the penalized points.)

  23. Optimization task: primal form
     $$\min_{w, b_1, b_2 \in \mathbb{R}^{m+2}} \; \frac{1}{2}\langle w, w \rangle + C_1 \sum_{D^T_+ \cup D^C_-} \xi_{i,1} + C_2 \sum_{D^T_- \cup D^C_+} \xi_{i,1} + C_2 \sum_{D^T_+ \cup D^C_-} \xi_{i,2} + C_1 \sum_{D^T_- \cup D^C_+} \xi_{i,2},$$
     subject to:
     $\langle w, x_i \rangle + b_1 \le -1 + \xi_{i,1}$, for $(x_i, y_i) \in D^T_+ \cup D^C_-$,
     $\langle w, x_i \rangle + b_1 \ge +1 - \xi_{i,1}$, for $(x_i, y_i) \in D^T_- \cup D^C_+$,
     $\langle w, x_i \rangle + b_2 \le -1 + \xi_{i,2}$, for $(x_i, y_i) \in D^T_+ \cup D^C_-$,
     $\langle w, x_i \rangle + b_2 \ge +1 - \xi_{i,2}$, for $(x_i, y_i) \in D^T_- \cup D^C_+$,
     $\xi_{i,j} \ge 0$, for $i = 1, \ldots, n$, $j \in \{1, 2\}$.
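
To make the roles of the slack variables concrete, the sketch below evaluates this objective for a fixed candidate solution. It transcribes the constraints with the sign convention written on the slide and, for each point, uses the smallest slacks that satisfy them. It does not solve the optimization problem. The function names, the data encoding as (x, group, outcome) triples and the toy numbers are hypothetical.

```python
def dot(w, x):
    """Inner product <w, x> of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def is_t_plus_or_c_minus(group, outcome):
    """True for points in D^T_+ or D^C_-, False for D^T_- or D^C_+."""
    return (group, outcome) in (("T", +1), ("C", -1))


def slacks(w, b1, b2, x, group, outcome):
    """Smallest (xi_1, xi_2) >= 0 satisfying the constraints for one point."""
    s1 = dot(w, x) + b1
    s2 = dot(w, x) + b2
    if is_t_plus_or_c_minus(group, outcome):
        # Constraints: <w, x> + b_j <= -1 + xi_j
        return max(0.0, 1.0 + s1), max(0.0, 1.0 + s2)
    # Constraints: <w, x> + b_j >= +1 - xi_j
    return max(0.0, 1.0 - s1), max(0.0, 1.0 - s2)


def primal_objective(w, b1, b2, data, c1, c2):
    """Value of the primal objective at (w, b1, b2) with minimal slacks."""
    total = 0.5 * dot(w, w)
    for x, group, outcome in data:
        xi1, xi2 = slacks(w, b1, b2, x, group, outcome)
        if is_t_plus_or_c_minus(group, outcome):
            total += c1 * xi1 + c2 * xi2  # C_1 on xi_{i,1}, C_2 on xi_{i,2}
        else:
            total += c2 * xi1 + c1 * xi2  # C_2 on xi_{i,1}, C_1 on xi_{i,2}
    return total


# Hypothetical one-dimensional example.
data = [([-3.0], "T", +1), ([0.0], "T", +1), ([3.0], "C", -1), ([3.0], "T", -1)]
value = primal_objective([1.0], 1.0, -1.0, data, c1=1.0, c2=2.0)
print(value)
```

In this toy example the point at x = 0 violates only the $H_1$ constraint and pays $C_1 \xi_{i,1}$, while the $C_-$ point at x = 3 violates both constraints and pays $C_1 \xi_{i,1} + C_2 \xi_{i,2}$.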

  24. Optimization task: primal form
     We have two penalty parameters:
     - $C_1$: penalty coefficient for being on the wrong side of one hyperplane
     - $C_2$: coefficient of the additional penalty for also crossing the second hyperplane
     All points classified as neutral are penalized with $C_1 \xi$.
     All definitely misclassified points are penalized with $C_1 \xi$ and $C_2 \xi$.
     How do $C_1$ and $C_2$ influence the model?

  25. Influence of penalty coefficients $C_1$ and $C_2$ on the model
     Lemma: For a well defined model $C_2 \ge C_1$. Otherwise the order of the hyperplanes would be reversed.
     Lemma: If $C_2 = C_1$ then no points are classified as neutral.
     Lemma: For a sufficiently large ratio $C_2 / C_1$ no point is penalized for crossing both hyperplanes. (Almost all points are classified as neutral.)

  26. Influence of penalty coefficients $C_1$ and $C_2$ on the model
     - The $C_1$ coefficient plays the role of the penalty in classical SVMs.
     - The ratio $C_2 / C_1$ determines the proportion of cases classified as neutral.

  27. Example: the tamoxifen drug trial data
     (Figure, tamoxifen data: number of cases classified negative, neutral and positive as a function of $C_2 / C_1$, for $C_2 / C_1$ between 1.00 and 1.16.)

  28. Example: the tamoxifen drug trial data
     (Figure, tamoxifen data: uplift [%] for cases classified negative, neutral and positive as a function of $C_2 / C_1$, for $C_2 / C_1$ between 1.00 and 1.16.)

  29. Evaluating uplift models
