Presenter Daymond Ling, Professor, Seneca College Daymond Ling is a - - PowerPoint PPT Presentation



slide-1
SLIDE 1
slide-2
SLIDE 2

Presenter

Daymond Ling, Professor, Seneca College

Daymond Ling is a Professor at Seneca College of Applied Technology and Arts where he has been teaching advanced analytics and machine learning in the School of Marketing since 2015. Prior to teaching, he was Senior Director, Advanced Analytics at Canadian Imperial Bank of Commerce where he focused on solving all manner of marketing analytics problems related to customer relationship management since 1996. He worked for American Express Canada in Risk Management before CIBC. Daymond received his M.Sc. degree in Operations Research and B.Sc. in Physics Honours from the University of British Columbia. He started using SAS in 1980 and has continued ever since.


slide-3
SLIDE 3

1489-2017 Timing is Everything

Detecting Important Behaviour Triggers

slide-4
SLIDE 4

Predictive Analytics

If the future looks like the past, then past patterns can be used to predict the future

Timeline: Past (Historical Window) | Now | Future (Prediction Window)

  • Prediction Window: target definition of desired outcome within some timeframe
  • Historical Window: historical information up to current period used to find pattern that strongly correlates with target definition

But life is dynamic. What if the process is non-stationary and has changed?

slide-5
SLIDE 5

Think Outside The Box

Compare different people at the same time:

  • 1. Who is likely to buy
  • 2. Who has more money

Follow the same person in time:

  • 1. Who changed?
  • 2. What was the change?

Change your perspective. Ask different questions to get new insights.

slide-6
SLIDE 6

Change Point Analysis

Shift in Mean

slide-7
SLIDE 7

Change Point Analysis

Economy (hundreds):

  • Gross Domestic Product
  • Stock Market Index

Corporate Performance Metrics (hundreds):

  • Number of clients
  • Portfolio balance

Customer Details (tens of millions):

  • EFT payroll of a chequing account
  • Your monthly credit card spend

Change Point Analysis is the problem of estimating the point at which some statistical property changes. This presentation will focus on change in the mean, i.e., the average has shifted.

slide-8
SLIDE 8

Did the Mean shift?

Naïve rule: mean(period 2) different from mean(period 1) by 20%

These graphs meet the 20+% change, but they are False Positives. The naïve rule also generates False Negatives.

Statistical tests of mean difference, e.g., the two-sample t-test, involve the ratio of mean difference to standard deviation; they are not based on the mean difference only.

The issue with the naïve rule is that it does not take data variability into account. The decision rule must be modified to take variability into consideration.
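A minimal sketch of the contrast described above, assuming Welch's form of the two-sample t statistic and made-up numbers (the data and function names are illustrative, not taken from the presentation). Both pairs of periods have means of 100 and 125, so the naïve rule flags both, but only the low-variability pair yields a large t statistic.

```python
from math import sqrt
from statistics import mean, variance


def naive_shift(period1, period2, threshold=0.20):
    """Naive rule: flag a shift when the relative change in means is >= threshold."""
    m1, m2 = mean(period1), mean(period2)
    return abs(m2 - m1) / abs(m1) >= threshold


def welch_t(period1, period2):
    """Welch two-sample t statistic: mean difference divided by its standard error."""
    m1, m2 = mean(period1), mean(period2)
    se = sqrt(variance(period1) / len(period1) + variance(period2) / len(period2))
    return (m2 - m1) / se


# Hypothetical data: every pair has period means of 100 and 125 (a 25% change).
noisy_1 = [40, 160, 70, 130, 100, 100]
noisy_2 = [50, 200, 95, 155, 125, 125]
quiet_1 = [99, 101, 100, 100, 99, 101]
quiet_2 = [124, 126, 125, 125, 124, 126]

for name, p1, p2 in [("noisy", noisy_1, noisy_2), ("quiet", quiet_1, quiet_2)]:
    print(name, "naive rule:", naive_shift(p1, p2), "t statistic:", round(welch_t(p1, p2), 2))
```

The noisy pair is the kind of case the slide calls a False Positive: the means differ by more than 20%, yet the difference is small relative to the spread of the data.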

slide-9
SLIDE 9

Detect Mean Shift via CUSUM Range

CUSUM: cumulative sum of deviations of centered series

  • If deviations are random, they tend to cancel out resulting in small CUSUM range
  • Shifted sections have deviations of the same sign, CUSUM will move away from zero

Decision rule: Large CUSUM range is indicative of shifted mean
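The CUSUM range can be sketched in a few lines. This is an illustration under assumptions: the path is taken to start at 0 before the first observation, and the two series are invented for the example.

```python
def cusum(series):
    """Cumulative sum of deviations from the series mean, starting at 0."""
    m = sum(series) / len(series)
    total, path = 0.0, [0.0]
    for x in series:
        total += x - m
        path.append(total)
    return path


def cusum_range(series):
    """Range of the CUSUM path: maximum minus minimum."""
    path = cusum(series)
    return max(path) - min(path)


# Hypothetical series.
flat = [10, 12, 9, 11, 10, 12, 9, 11]      # noise around a constant mean
shifted = [10, 12, 9, 11, 15, 17, 14, 16]  # same noise, mean moves up by 5 halfway

print("flat range:", cusum_range(flat))
print("shifted range:", cusum_range(shifted))
```

In the flat series the deviations alternate in sign and cancel, so the path stays near zero. In the shifted series the first half sits below the overall mean, so the path drifts steadily away from zero before returning.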

slide-10
SLIDE 10

How Large is Large?

Leverage variability of empirical data to determine “Large”:

  • Calculate Empirical CUSUM
  • Calculate CUSUM distribution by randomly shuffling the data many times
  • P-value of the Empirical CUSUM range is the proportion of the shuffled Distribution >= Empirical

Natural data variation determines whether the empirical pattern is unusual

By using actual data variability to perform significance test, False Positive and False Negative can be minimized
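The three steps above can be sketched as a random-shuffle significance test. The number of shuffles, the fixed seed, and the example series are assumptions made for illustration; the presentation does not state which settings were used.

```python
import random


def cusum_range(series):
    """Max minus min of the cumulative sum of deviations from the mean (path starts at 0)."""
    m = sum(series) / len(series)
    total = lo = hi = 0.0
    for x in series:
        total += x - m
        lo, hi = min(lo, total), max(hi, total)
    return hi - lo


def shuffle_p_value(series, n_shuffles=2000, seed=0):
    """Proportion of shuffled series whose CUSUM range is >= the empirical range."""
    rng = random.Random(seed)
    empirical = cusum_range(series)
    work = list(series)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(work)  # shuffling destroys time order but keeps the data variability
        if cusum_range(work) >= empirical:
            hits += 1
    return hits / n_shuffles


# Hypothetical series: one with a mean shift halfway, one without.
shifted = [10, 12, 9, 11, 10, 12, 9, 11, 15, 17, 14, 16, 15, 17, 14, 16]
flat = [10, 12, 9, 11] * 4

p_shifted = shuffle_p_value(shifted)
p_flat = shuffle_p_value(flat)
print("p-value with shift:", p_shifted)
print("p-value without shift:", p_flat)
```

Because the reference distribution is built from the series itself, a noisy series needs a larger CUSUM range than a quiet one before it is called significant.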

slide-11
SLIDE 11

When Did It Happen?

Two estimators for location of change:

1. Change occurred at Max Absolute CUSUM

  • Simple to compute
  • A little less precise

2. Change occurred at point of Minimum Variance

  • More complex calculation
  • More precise

Recursively split a time series into many sections
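A minimal sketch of both location estimators for a single split. "Minimum variance" is read here as minimizing the pooled residual sum of squares of the two sections, which is an interpretation rather than something the slide spells out; the function names and data are illustrative.

```python
def max_abs_cusum_split(series):
    """Return k so that series[:k] and series[k:] are the two sections,
    with k at the largest absolute CUSUM value."""
    m = sum(series) / len(series)
    total, best_k, best_abs = 0.0, 1, -1.0
    # The CUSUM after the last point is always 0, so only the first n-1 points are candidates.
    for k, x in enumerate(series[:-1], start=1):
        total += x - m
        if abs(total) > best_abs:
            best_k, best_abs = k, abs(total)
    return best_k


def sse(section):
    """Residual sum of squares of a section around its own mean."""
    m = sum(section) / len(section)
    return sum((x - m) ** 2 for x in section)


def min_variance_split(series):
    """Return the k that minimizes the combined residual sum of squares of both sections."""
    return min(range(1, len(series)), key=lambda k: sse(series[:k]) + sse(series[k:]))


# Hypothetical series whose mean shifts after the fourth point.
shifted = [10, 12, 9, 11, 15, 17, 14, 16]
print("max absolute CUSUM split:", max_abs_cusum_split(shifted))
print("minimum variance split:", min_variance_split(shifted))
```

The first estimator needs one pass over the data, while the second evaluates every candidate split, which matches the slide's trade-off between simplicity and precision. Either could be applied again to each resulting section to split a series recursively.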

slide-12
SLIDE 12

Reaction Speed to New Change

V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | Prob
10.53 | 12.15 | 11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 0.15
12.15 | 11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 0.39
11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 0.77
8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 20 | 0.93
13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 20 | 20 | 0.99

  • The first series has no change. Probability of mean shift = 0.15, insignificant.
  • When the series is shifted by one and a single large value is appended, probability increases to 0.39, still insignificant. CPA is robust to a single spike.
  • Shifting the series by two and appending two large values, probability increases to 0.77. CPA is signaling heightened likelihood of mean shift, odds are 3:1.
  • Shifting by three and appending three large values raises the probability to 0.93. With four consecutive values, probability is 0.99.

For short time series, a section of three or more shifted values has P >= 0.90
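The experiment above can be re-run as a sketch. It assumes "Prob" is the share of random shuffles whose CUSUM range is smaller than the observed one; the presentation does not define it, and the shuffle count and seed are arbitrary, so the results need not reproduce the slide's figures exactly.

```python
import random

# First row of the slide's table: the series with no change.
BASE = [10.53, 12.15, 11.64, 8.89, 13.08, 7.53, 9.72, 12.08, 10.13, 12.45, 9.70, 10.58]


def cusum_range(series):
    """Max minus min of the cumulative sum of deviations from the mean (path starts at 0)."""
    m = sum(series) / len(series)
    total = lo = hi = 0.0
    for x in series:
        total += x - m
        lo, hi = min(lo, total), max(hi, total)
    return hi - lo


def shift_probability(series, n_shuffles=2000, seed=0):
    """Share of shuffles with a smaller CUSUM range than observed (assumed meaning of Prob)."""
    rng = random.Random(seed)
    empirical = cusum_range(series)
    work = list(series)
    smaller = 0
    for _ in range(n_shuffles):
        rng.shuffle(work)
        if cusum_range(work) < empirical:
            smaller += 1
    return smaller / n_shuffles


# Row k drops the first k values and appends k copies of the large value 20.
rows = [BASE[k:] + [20.0] * k for k in range(5)]
probs = [shift_probability(row) for row in rows]
for k, p in enumerate(probs):
    print(f"{k} large values appended: probability of mean shift = {p:.2f}")
```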

slide-13
SLIDE 13

Decision Logic

Statistical Significance

  • No False Positive
  • No False Negative

Business Significance

  • Magnitude of change interesting to business

Events of Interest

  • True events for investigation and intervention
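One way to read this slide is that an event of interest must pass both a statistical filter and a business filter. The sketch below assumes that reading; the function name and both thresholds are hypothetical, not values from the presentation.

```python
def is_event_of_interest(shift_probability, mean_before, mean_after,
                         min_probability=0.95, min_abs_change=500.0):
    """Keep a detected change only if it is statistically and commercially meaningful.

    min_probability and min_abs_change are placeholder thresholds for illustration.
    """
    statistically_significant = shift_probability >= min_probability
    business_significant = abs(mean_after - mean_before) >= min_abs_change
    return statistically_significant and business_significant


# Hypothetical detections: (probability of mean shift, mean before, mean after).
print(is_event_of_interest(0.99, 2000.0, 2600.0))  # significant and large
print(is_event_of_interest(0.99, 2000.0, 2100.0))  # significant but too small to matter
print(is_event_of_interest(0.39, 2000.0, 4000.0))  # large but not statistically significant
```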

slide-14
SLIDE 14

Pay Increase

slide-15
SLIDE 15

Large Scale Computation

Numerically intensive computation:

  • 1. Hundreds of millions of time series
  • 2. Random shuffle significance test
  • 3. Minimization of Residual Sum of Squares

End-to-end process built in SAS:

  • Handles varying length time series
  • 28 code modules including data prep
  • 2,000 lines of code
  • In-memory processing for efficiency (in-memory array data structures)

slide-16
SLIDE 16

EFT Payroll Increase

  • Process 155 Million payroll events
  • Each month detect 1,200 – 1,500 clients
  • Can detect large pay spike, e.g., lump sum payments

Step | Records | CPU Time | Elapsed
1. Extract four years of EFT payroll | 155 Million records | 18 minutes | 25 minutes
2. Aggregation (customers with multiple acct) | 145 Million records | 2 minutes | 2 minutes
3. Eliminate low pay and closed accounts | 85 Million records | 5 minutes | 5 minutes
4. Pay frequency determination | 62 Million records | 2 minutes | 2 minutes
5. Remove irregular off cycle pay | 60 Million records | 5 minutes | 5 minutes
6. Kalman Filter smoothing of pay spikes | 60 Million records | 4 minutes | 4 minutes
7. Change Point Detection | 60 Million records | 7 hours | 7 hours
8. Event Selection | 17K Accounts | 1 minute | 1 minute

slide-17
SLIDE 17

Payroll increase examples (clean)


slide-18
SLIDE 18

Payroll increase examples (noisy)


slide-19
SLIDE 19

More money in pocket means…

We identify in one year 21K customers that grow funds by $300 Million and increase card spend by $50 Million

  • 1. Payroll increase more often for younger people, and they open more accounts
  • 2. Younger customers invest more, spend more, borrow more (new car, bigger house)
  • 3. Older people just save their additional income, they don't spend more or borrow more

After Pay Increase:

Age | Customer | Account Increase | Asset (per customer) | Lending (per customer) | Card Spend (per customer) | Funds (total, $Million) | Card Spend (total, $Million)
19 - 35 | 7,800 | 6.6% | $5,900 | $10,800 | $3,200 | $130 | $25
35 - 45 | 5,700 | 4.3% | $5,500 | $5,400 | $2,400 | $62 | $15
45 - 55 | 5,000 | 3.2% | $9,600 | $3,500 | $3,100 | $66 | $15
55 - 65 | 2,500 | 2.2% | $16,100 | $600 | $1,900 | $42 | $5
All | 21,000 | 4.3% | $7,900 | $6,400 | $2,300 | $300 | $50

slide-20
SLIDE 20

Credit Card Spend

slide-21
SLIDE 21

Credit Card Spend Decrease

Scenario:

  • Portfolio of 4 million+ credit cards
  • Annual spend volume below plan by approximately $700 million

Causes:

  • Reduced acquisition?
  • Increased attrition?
  • Slow down in economy?
  • Increased competition in Reward cards?
slide-22
SLIDE 22

Change Point Analysis Process

Step | Records | CPU Time | Elapsed Time
1. Extract three years of account PV | 178 Million | 6 minutes | 1 hour
2. Aggregate to customer | 126 Million | 15 minutes | 15 minutes
3. Eliminate low spend and closed accounts | 48 Million | 6 minutes | 6 minutes
4. Change Point Detection | 48 Million | 4 hours | 4 hours
5. Event selection | 8K customer | 1 minute | 1 minute

  • Process 178 million records
  • Each month detect 600 clients with significant PV decrease

slide-23
SLIDE 23

Losing $1 Million+ per customer in a year…

Each customer lowered spend by $1 Million+ per year

slide-24
SLIDE 24

Losing $500K+ per year…

Each customer lowered spend by $500K+ per year

slide-25
SLIDE 25

Spend Loss Identified

Annual spend of 4 million+ cards was below plan by ~$700 Million. CPA identified 8K clients with annual spend loss of ~$545 Million.

Group | Customer | % Customer | Average Annual Spend Loss | Annual Spend Loss | Index
1 | 100 | 1% | $1 million+ | $109 million | 14.9
2 | 400 | 5% | $266K | $109 million | 3.9
3 | 1K | 13% | $110K | $109 million | 1.6
4 | 2.1K | 26% | $52K | $109 million | 0.8
5 | 4.4K | 55% | $25K | $109 million | 0.4
Total | 8K | 100% | $68K | $545 million | 1.0

slide-26
SLIDE 26

Interest Rate Sensitive Customers

slide-27
SLIDE 27

Savings Account Bonus Rate Promotion

Scenario:

  • Frequent promotion of Bonus Interest Rate on new balance
  • Portfolio balance fluctuates with promotion on/off

Questions:

  • Who is interest rate sensitive?
  • How many are there?
  • How much do they swing the portfolio balance?

slide-28
SLIDE 28

Analysis Process

  • 1. Change Point Analysis to identify accounts with balance changes (multi-period)
  • 2. Check correlation of balance change pattern with campaign periods

[Charts: example account balances, one with a $5 Million fluctuation and one with a $500K fluctuation. Grey = high interest promotion; white = regular low interest.]
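The correlation check in the second step could be sketched as follows. This assumes a plain Pearson correlation between an account's monthly balance and a 0/1 campaign flag, which is one simple choice and not necessarily the measure used in the presentation; the balances and campaign calendar are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical monthly data: 1 = bonus-rate campaign running, 0 = regular rate.
campaign = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
# Balances in $ thousands for two made-up accounts.
rate_sensitive = [100, 100, 600, 620, 610, 110, 100, 590, 600, 105, 100, 100]
steady = [300, 310, 305, 300, 295, 305, 310, 300, 305, 295, 300, 305]

print("rate-sensitive account:", round(pearson(campaign, rate_sensitive), 3))
print("steady account:", round(pearson(campaign, steady), 3))
```

An account whose balance rises and falls with the campaign calendar shows a correlation near 1, while an account that ignores promotions shows a weak one.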

slide-29
SLIDE 29

Outcome

Hot Money | # of Customer | Total ($Million)
$1 Million+ | 815 | $2,167
$500K - $1 Million | 1,313 | $904
$250K - $500K | 2,381 | $838
$100K - $250K | 4,361 | $699
Total | 8,870 | $4,607

Portfolio consists of 400K+ accounts with ~$20 Billion in balance. We found 9K customers that are responsible for ~$4.6 Billion of Hot Money.

Instead of an interest rate issue, the conversation changed to how to look after wealthy customers that are looking for return from safe instruments.

slide-30
SLIDE 30

Back to Predictive Analytics

slide-31
SLIDE 31

Improve Predictive Analytics

1. Better Target Definition

  • Model target definitions for balance/spend increase/decrease use the naïve rule, so the target set is not clean, resulting in a poor model
  • Use CPA to define very clean targets: won't mis-identify, won't miss any
  • Align each individual customer's time window on the exact time of change. This cleanly delineates and improves the historical and prediction windows.

2. Better Input Features

  • Events are cleaner and clearer signals compared to the raw time series
  • Use past event triggers as inputs to predictive models to improve predictability
  • Build customer event database

slide-32
SLIDE 32

Timing is Everything


slide-33
SLIDE 33

Thank You

daymond.ling@senecacollege.ca
