Dynamic Micro Targeting: Fitness-Based Approach to Predicting Individual Preferences (PowerPoint PPT Presentation)


SLIDE 1

Dynamic Micro Targeting: Fitness-Based Approach to Predicting Individual Preferences

Tianyi Jiang, Alexander Tuzhilin
Leonard N. Stern School of Business, New York University
February 2007

SLIDE 2

Personalization Research

From Amazon shopping to choosing your politician, personalizing your decision choices via Data Mining

SLIDE 3

Research Questions

  • How to effectively segment the customer base?
  • What is the “ideal” segmentation of the customer base?
  • Is it practically achievable?
  • What is the distribution of the segment sizes in this ideal segmentation scheme?
  • Is it better to partition customers and products together to achieve better targeting?

SLIDE 4

Customer Segmentation via Direct Grouping Methods

Direct grouping:

  • partition customers C into segments: combine the transactional data of customers Cm, Cm+1, …, Cn into a group Pi = (Cm, Cm+1, …, Cn)
  • build a predictive model for each group
  • define a fitness score for the model, e.g. RAE, RME, etc.
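The three-step recipe above can be sketched as follows; `fit_model` and `fitness` are illustrative stand-ins for whatever predictive model and error measure are plugged in (e.g. a classifier scored by RAE), not the paper's actual implementation:

```python
# Sketch of direct grouping: a partition is a list of customer groups;
# each group's transactions are pooled, one model is built per group,
# and a fitness score is computed for that model.
def direct_grouping_fitness(partition, transactions, fit_model, fitness):
    scores = []
    for group in partition:                    # group = (Cm, ..., Cn)
        pooled = []
        for customer in group:                 # combine transactional data
            pooled.extend(transactions[customer])
        model = fit_model(pooled)              # build a predictive model
        scores.append(fitness(model, pooled))  # e.g. RAE, RME
    return scores

# Toy usage: the "model" is the mean of a group's purchase totals,
# and fitness is the mean absolute error of that constant predictor.
tx = {"C1": [10, 12], "C2": [11], "C3": [50]}
mean_model = lambda data: sum(data) / len(data)
mae = lambda m, data: sum(abs(x - m) for x in data) / len(data)
print(direct_grouping_fitness([("C1", "C2"), ("C3",)], tx, mean_model, mae))
```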

SLIDE 5

Customer Segmentation via Direct Grouping Methods (Example)

  • C are Amazon customers
  • Pi is the customers from New York City
  • X1, …, Xp are these customers’ demographic and purchase attributes, such as age, gender, day of purchase, purchase total, etc.
  • Y is the dependent variable: will they buy a product while visiting Amazon.com
  • The model predicts these customers’ propensity to purchase during an Amazon.com visit
  • The fitness score is the Relative Absolute Error

SLIDE 6

Optimal Customer Segmentation (OCS) Problem

Given the customer base C of N customers and a predictive model:

  • Partition C into the set of mutually exclusive, collectively exhaustive segments P = {P1, ..., Pk}
  • Build a predictive model for each segment Pi
  • Find the optimal partitioning P = {P1, ..., Pk} so that the objective function is maximized over all possible partitions P, where f(Pi) is the fitness function for segment Pi and the weight αi specifies the “importance” of segment i.
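Spelled out, with f(Pi) the fitness of the model built on segment Pi and αi its weight as defined above (the symbol J for the objective is our shorthand, not necessarily the paper's notation):

```latex
\max_{P=\{P_1,\dots,P_k\}} \; J(P) = \sum_{i=1}^{k} \alpha_i \, f(P_i)
```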

SLIDE 7

OCS Solution Space

  • Theorem. The OCS problem is NP-hard.

Therefore, settle for a suboptimal solution:

  • find suboptimal polynomial-time customer segmentation methods that provide reasonable fitness

SLIDE 8

Related Work

  • Combinatorial Optimization Problems in Operations Research (Land et al. 1960, Guignard et al. 1987, Gomory 1958)
  • Customer segmentation and clustering in Marketing Research: clustering, mixture models (Wedel et al. 2000)
  • Data Mining Research on Customer Segmentation: basket shopping, hierarchical, & pattern-based clustering (Brijs et al. 2001, Jiang et al. 2006, Yang et al. 2003)

SLIDE 9

Traditional Segmentation Methods

Hierarchical Clustering (HC)

  • compute some summary statistics from customers’ demographic and transactional data
  • consider these statistics as points in an n-dimensional space
  • group customers into segments by applying various clustering algorithms to these n-dimensional points.

* Jiang, Tuzhilin, “Segmenting Customers from Population to Individuals: Does 1-to-1 Keep Your Customers Forever?” TKDE 18(10), 2006
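A minimal single-linkage sketch of the last two steps in pure Python; the slide does not specify which clustering algorithm or linkage is used, so all names and the choice of single linkage here are illustrative:

```python
# Single-linkage agglomerative clustering: repeatedly merge the two
# closest clusters of customer summary-statistic vectors until k remain.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(points, k):
    """Returns k clusters, each a list of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None                        # (distance, i, j) of closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Two visually obvious groups of customer summary vectors.
pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
print(agglomerate(pts, 2))  # [[0, 1], [2, 3]]
```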

SLIDE 10

Traditional Segmentation Methods

Affinity Propagation (AP)

  • Given n unique customers, AP identifies a set of training points, called exemplars, as cluster centers by recursively propagating “affinity messages” among training points.
  • Similar to greedy K-medoids algorithms, AP picks exemplars as cluster centers during every iteration, where each exemplar in our study is a single customer represented by his/her summary-statistics vector.
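AP itself is a message-passing algorithm; as a rough analogue of the greedy K-medoids behavior the slide compares it to, here is a K-medoids sketch in which each exemplar is one customer's summary-statistics vector (a simplified stand-in, not Frey & Dueck's actual affinity propagation):

```python
# Greedy K-medoids: assign points to the nearest exemplar, then re-pick
# each exemplar as the cluster member minimizing total in-cluster distance.
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def k_medoids(points, k, iters=10):
    medoids = list(range(k))               # seed with the first k customers
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):     # assignment step
            m = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[m].append(i)
        medoids = [min(members,            # exemplar update step
                       key=lambda c: sum(dist(points[c], points[j])
                                         for j in members))
                   for members in clusters.values()]
    return sorted(medoids)

pts = [(0.0, 0.0), (0.2, 0.1), (6.0, 6.0), (6.1, 5.9)]
print(k_medoids(pts, 2))  # exemplars: customers 0 and 2
```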

SLIDE 11

Suboptimal Efficient Solution of the OCS Problem using Direct Grouping

Iterative Merge (IM) Method:

  • start with segments containing individual customers,
  • iteratively merge two existing segments SegA and SegB at a time when
    I. the predictive model based on the combined data performs better, and
    II. combining SegA with any other existing segment would have resulted in worse performance than the combination of SegA and SegB.
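A toy sketch of the merge loop; `fitness` stands in for "train a predictive model on the pooled segment data and score it" (higher is better), which this illustration replaces with a simple size-minus-spread score so that it runs:

```python
# Iterative Merge sketch: greedily merge the pair of segments whose pooled
# data yields the best fitness, provided the merge beats both parts alone.
def iterative_merge(segments, fitness):
    segments = [list(s) for s in segments]
    while True:
        best = None                                  # (fitness, i, j)
        for i in range(len(segments)):
            for j in range(i + 1, len(segments)):
                f = fitness(segments[i] + segments[j])
                # condition I: combined model beats either part alone
                if f > fitness(segments[i]) and f > fitness(segments[j]):
                    # condition II approximated greedily: keep the best pair
                    if best is None or f > best[0]:
                        best = (f, i, j)
        if best is None:
            return segments
        _, i, j = best
        segments[i] += segments[j]
        del segments[j]

# Toy fitness: larger segments are better, widely spread values are penalized.
score = lambda seg: len(seg) - 5 * (max(seg) - min(seg))
print(iterative_merge([[1.0], [1.1], [5.0], [5.1]], score))
# [[1.0, 1.1], [5.0, 5.1]]
```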

SLIDE 12

Product Types × Customer Matrix

( √ stands for a purchase of product type by customer)

                 Customer 1   Customer 2   …   Customer N
Product Type1        √
Product Type2        √            √
…                    …            …        …       …
Product TypeL                     √                √
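A matrix like this is naturally kept as a mapping from product type to the set of customers who purchased it (√ = membership); all names here are illustrative, not from the datasets used in the paper:

```python
# Sparse product-type x customer purchase matrix as a dict of sets.
purchases = {
    "Type1": {"cust1"},
    "Type2": {"cust1", "cust2"},
    "TypeL": {"cust2", "custN"},
}

def bought(product_type, customer):
    """True iff the customer has a check mark for this product type."""
    return customer in purchases.get(product_type, set())

print(bought("Type2", "cust2"))  # True
print(bought("Type1", "cust2"))  # False
```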


SLIDE 13

Micro Targeting Method

Iterative Merge Products (IM_Prod):

  • start with segments containing individual customers’ specific product-type transaction data
  • Bootstrap operation to merge small segments based on K-nearest neighbors of customers’ product-type and demographic summary-statistics vectors
  • Run IM with customers’ product-type segments
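The bootstrap step can be approximated with a 1-nearest-neighbor merge of undersized segments, a deliberate simplification of the K-NN operation described above; `min_size` and the centroid distance are our assumptions, not parameters from the paper:

```python
# Fold every segment smaller than min_size into the segment whose centroid
# of summary-statistics vectors is nearest (1-NN analogue of the bootstrap).
def centroid(seg):
    dim = len(seg[0])
    return [sum(v[d] for v in seg) / len(seg) for d in range(dim)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def bootstrap_small(segments, min_size=2):
    segments = [list(s) for s in segments]
    while len(segments) > 1:
        small = [i for i, s in enumerate(segments) if len(s) < min_size]
        if not small:
            break
        i = small[0]
        ci = centroid(segments[i])
        j = min((k for k in range(len(segments)) if k != i),
                key=lambda k: dist(ci, centroid(segments[k])))
        segments[j].extend(segments[i])   # merge into nearest neighbor
        del segments[i]
    return segments

segs = [[(0.0, 0.0)], [(0.1, 0.1)], [(5.0, 5.0), (5.0, 5.1)]]
print(bootstrap_small(segs, min_size=2))  # two segments of size 2
```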

SLIDE 14

Empirical Comparisons of Different Approaches

Comparing Three Segmentation Approaches:

  • Statistics based
  • Direct grouping based
  • Micro Targeting based

Across five dimensions:

  • Types of datasets (ComScore, Nielsen, Synthetic data)
  • Types of customers (high- vs. low-volume)
  • Types of predictive models (classifiers J48 & Naïve Bayes)
  • Dependent variables (3 variables per dataset)
  • Performance measures:
    • Root Mean Squared Error (RME)
    • Relative Absolute Error (RAE)
    • Correctly Classified Instances (CCI)
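For reference, the three performance measures in their textbook forms (lower RME/RAE and higher CCI are better); the function names are ours:

```python
import math

def rme(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred))
                     / len(actual))

def rae(actual, pred):
    """Relative absolute error: total |error| relative to a mean predictor."""
    mean = sum(actual) / len(actual)
    return (sum(abs(a - p) for a, p in zip(actual, pred))
            / sum(abs(a - mean) for a in actual))

def cci(actual, pred):
    """Fraction of correctly classified instances."""
    return sum(a == p for a, p in zip(actual, pred)) / len(actual)

print(rme([1, 2, 3], [1, 2, 5]))    # sqrt(4/3)
print(rae([1, 2, 3], [1, 2, 5]))    # 2 / (1 + 0 + 1) = 1.0
print(cci(["a", "b"], ["a", "a"]))  # 0.5
```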
SLIDE 15

Data Sets: Customer Types & Transaction Counts

DataSet    Customer Type  % of Total Population  Families  Total Transactions  Avg. Transactions per Family
ComScore   High           5%                     2,230     137,157             62
ComScore   Low            5%                     2,230     24,344              11
Nielsen    High           10%                    156       28,985              186
Nielsen    Low            10%                    156       5,007               32
Syn-High   High           100%                   2,048     204,800             100
Syn-Low    Low            100%                   2,048     20,480              10

SLIDE 16

Statistical Significance

We apply the Mann-Whitney rank test to compare any two performance distributions across

  • 6 datasets
  • 3 variables
  • 2 classifiers
  • 3 performance measures

for a total of 108 pair-wise distribution tests between any two segmentation approaches
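The Mann-Whitney U statistic that underlies the rank test can be computed directly from its pair-counting definition (ties contribute 1/2); a real analysis would use a library implementation with a normal approximation for the p-value:

```python
# U statistic: count pairs (x in a, y in b) with x > y, half-counting ties.
def mann_whitney_u(a, b):
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

print(mann_whitney_u([3, 4, 5], [1, 2, 3]))  # 8.5
```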

SLIDE 17

Statistical Significance

The null hypothesis for comparing distributions generated by methods A and B for a performance measure is:

(I)
H0: The distribution of a performance measure generated by method A is not different from the distribution of the performance measure generated by method B.
H1+: The distribution of the performance measure generated by method A differs from that generated by method B in the positive direction.
H1-: The distribution of the performance measure generated by method A differs from that generated by method B in the negative direction.

SLIDE 18

Empirical Results Comparing All Methods

Performance tests across all statistics-based segmentation methods for Hypothesis Test (I) at 95% significance level (numbers in columns H1+ and H1- indicate the number of statistical tests that reject hypothesis H0. Total significance tests per method to method comparison pair is 108)

Methods    HC (H1+ / H1-)   IM (H1+ / H1-)   IM_Prod (H1+ / H1-)
AP         66 / 18          12 / 57            / 108
HC                           6 / 90            / 108
IM_Prod    108 / 96

SLIDE 19

Empirical Results

Sample CCI score distributions

(“Day of the Week” prediction across High & Low-Volume ComScore Customers)

[Figure: CCI score distribution panels for IM and IM_Prod on High- and Low-Volume datasets]

SLIDE 20

Empirical Results

Error distributions

(“Day of the Week” prediction across High & Low-Volume ComScore Customers)

[Figure: RAE and RME error distribution panels for IM and IM_Prod, High- and Low-Volume datasets]

SLIDE 21

Empirical Results

Segment Size Distribution Generated by IM_Prod and IM

[Figure: histograms of segment size vs. number of segments for IM and IM_Prod, High- and Low-Volume datasets]

SLIDE 22

Empirical Results

Customer Segment Membership Count Distribution

[Figure: customer segment-membership frequency distributions, High- and Low-Volume datasets]

SLIDE 23

Empirical Results

Generated segments in the “Segment Count” × “Average CCI per segment” × “Number of Purchases in Segment” space

[Figure: generated segments plotted in this space for IM and IM_Prod, High- and Low-Volume datasets]

SLIDE 24

IM_Prod Computational Expense

SLIDE 25

Conclusions

  • Partitioning customers based on micro targeting results in the formation of “better” customer segmentations than traditional clustering-based and fitness-based direct grouping approaches
  • Micro targeting produces smaller segments than Direct Grouping methods
  • The above results add support for Micro Segmentation (partitioning based on both customer and product types) approaches to personalization

SLIDE 26

Future Research

  • Improve the method not just in terms of predictive accuracy, but also in terms of standard marketing-oriented performance measures such as customer value, profitability, and other economics-based performance measures
  • Investigate scalability and generalizability of our approach on different types of very large real-world datasets, and extend it to handle incremental or time-series data
