Presenter
Daymond Ling, Professor, Seneca College
Daymond Ling is a Professor at Seneca College of Applied Technology and Arts, where he has been teaching advanced analytics and machine learning in the School of Marketing since 2015. Prior to teaching, he was Senior Director, Advanced Analytics at Canadian Imperial Bank of Commerce, where from 1996 he focused on solving all manner of marketing analytics problems related to customer relationship management. He worked for American Express Canada in Risk Management before CIBC. Daymond received his M.Sc. degree in Operations Research and B.Sc. in Physics Honours from the University of British Columbia. He started using SAS in 1980 and has continued ever since.
1489-2017 Timing is Everything
Detecting Important Behaviour Triggers
Predictive Analytics
If the future looks like the past, then past patterns can be used to predict the future
Historical Window: historical information up to the current period, used to find patterns that strongly correlate with the target definition
Prediction Window: target definition of the desired outcome within some timeframe
Timeline: Past | Now | Future
But life is dynamic. What if the process is non-stationary and has changed?
Think Outside The Box
Compare different people at the same time:
1. Who is likely to buy
2. Who has more money
Follow the same person in time:
1. Who changed?
2. What was the change?
Change your perspective. Ask different questions to get new insights.
Change Point Analysis
Shift in Mean
Change Point Analysis
Economy (hundreds of series):
- Gross Domestic Product
- Stock Market Index
Corporate Performance Metrics (hundreds of series):
- Number of clients
- Portfolio balance
Customer Details (tens of millions of series):
- EFT payroll of a chequing account
- Your monthly credit card spend
Change Point Analysis is the problem of estimating the point at which some statistical property changes. This presentation will focus on change in the mean, i.e., the average has shifted.
Did the Mean shift?
Naïve rule: mean(period 2) differs from mean(period 1) by 20%
These graphs meet the 20+% change, but they are False Positives. The naïve rule also generates False Negatives.
Statistical tests of mean difference, e.g., the two-sample t-test, involve the ratio of the mean difference to the standard deviation; they are not based on the mean difference only.
The issue with the naïve rule is that it does not take data variability into account. The decision rule must be modified to take variability into consideration.
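The contrast between the naïve rule and a variability-aware test can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function names `naive_shift` and `welch_t` are hypothetical, and the Welch form of the two-sample t statistic is an assumption (the slides only say "two-sample t-test").

```python
import math
import statistics

def naive_shift(period1, period2, threshold=0.20):
    """Naive rule: flag a shift when the relative mean difference reaches the threshold."""
    m1, m2 = statistics.fmean(period1), statistics.fmean(period2)
    return abs(m2 - m1) / abs(m1) >= threshold

def welch_t(period1, period2):
    """Two-sample t statistic (Welch form): mean difference over its standard error."""
    m1, m2 = statistics.fmean(period1), statistics.fmean(period2)
    v1, v2 = statistics.variance(period1), statistics.variance(period2)
    se = math.sqrt(v1 / len(period1) + v2 / len(period2))
    return (m2 - m1) / se

# Hypothetical data, not taken from the presentation.
noisy_1 = [100, 40, 160, 20, 180, 100]      # mean 100, very volatile
noisy_2 = [140, 30, 210, 60, 200, 110]      # mean 125 (+25%), but well within the noise
steady_1 = [100, 101, 99, 100, 102, 98]     # mean 100, very stable
steady_2 = [110, 111, 109, 110, 112, 108]   # mean 110 (+10%), a clear shift

# Naive rule fires on the noisy pair (false positive); t statistic is small.
print(naive_shift(noisy_1, noisy_2), round(welch_t(noisy_1, noisy_2), 2))
# Naive rule misses the steady pair (false negative); t statistic is large.
print(naive_shift(steady_1, steady_2), round(welch_t(steady_1, steady_2), 2))
```

The same mean difference can be trivial or decisive depending on the spread of the data, which is why the ratio, not the raw difference, drives the decision.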
Detect Mean Shift via CUSUM Range
CUSUM: cumulative sum of deviations of centered series
- If deviations are random, they tend to cancel out resulting in small CUSUM range
- Shifted sections have deviations of the same sign, CUSUM will move away from zero
Decision rule: Large CUSUM range is indicative of shifted mean
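A minimal sketch of the CUSUM range, assuming the series is centered on its own mean and the cumulative sum starts at zero (a common convention, not stated on the slide). The example series and function names are hypothetical.

```python
from itertools import accumulate
from statistics import fmean

def cusum(series):
    """Cumulative sum of deviations from the series mean, starting at S0 = 0."""
    mean = fmean(series)
    return [0.0] + list(accumulate(x - mean for x in series))

def cusum_range(series):
    """Range (max minus min) of the CUSUM path."""
    s = cusum(series)
    return max(s) - min(s)

# Hypothetical data: one series with no shift, one whose mean moves up halfway.
flat = [10, 11, 9, 10, 11, 9, 10, 11, 9, 10]
shifted = [10, 11, 9, 10, 11, 14, 15, 13, 14, 15]

print(cusum_range(flat))     # deviations cancel, so the range stays small
print(cusum_range(shifted))  # same-sign deviations accumulate, so the range is large
```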
How Large is Large?
Leverage variability of empirical data to determine “Large”:
- Calculate Empirical CUSUM
- Calculate CUSUM distribution by randomly shuffling the data many times
- P-value of the Empirical CUSUM range is the proportion of the shuffled Distribution >= Empirical
Natural data variation determines whether the empirical pattern is unusual
By using actual data variability to perform the significance test, False Positives and False Negatives can be minimized
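The shuffle test described above could look roughly like the following. It is a sketch under assumptions: the number of shuffles, the fixed seed, and the names `cusum_range` and `shuffle_p_value` are illustrative choices, not details from the presentation.

```python
import random
from itertools import accumulate
from statistics import fmean

def cusum_range(series):
    """Range of the cumulative sum of deviations from the mean (S0 = 0)."""
    mean = fmean(series)
    s = [0.0] + list(accumulate(x - mean for x in series))
    return max(s) - min(s)

def shuffle_p_value(series, n_shuffles=1000, seed=0):
    """Proportion of random reorderings whose CUSUM range is >= the observed range."""
    rng = random.Random(seed)
    empirical = cusum_range(series)
    work = list(series)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(work)              # destroys any time ordering, keeps the values
        if cusum_range(work) >= empirical:
            hits += 1
    return hits / n_shuffles

# Hypothetical data: the same 16 values, ordered as a clean shift vs. interleaved.
shifted = [10, 11, 9, 10, 11, 9, 10, 11, 14, 15, 13, 14, 15, 13, 14, 15]
interleaved = [10, 14, 11, 15, 9, 13, 10, 14, 11, 15, 9, 13, 10, 14, 11, 15]

print(shuffle_p_value(shifted))      # small p-value: the ordering is unusual
print(shuffle_p_value(interleaved))  # large p-value: nothing unusual
```

Because the reference distribution is built from the series' own values, a volatile series needs a larger CUSUM range to be flagged than a stable one.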
When Did It Happen?
Two estimators for location of change:
1. Change occurred at Max Absolute CUSUM
- Simple to compute
- A little less precise
2. Change occurred at point of Minimum Variance
- More complex calculation
- More precise
Recursively split a time series into many sections
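Both estimators can be sketched as follows. This assumes "Minimum Variance" means choosing the split that minimizes the combined within-segment sum of squared deviations; the function names and the example series are hypothetical. Each function returns the 0-based index at which the new regime starts.

```python
from itertools import accumulate
from statistics import fmean

def change_by_max_cusum(series):
    """New regime starts right after the point where |CUSUM| peaks."""
    mean = fmean(series)
    s = list(accumulate(x - mean for x in series))
    # The final CUSUM value is always ~0, so it is excluded from the search.
    peak = max(range(len(s) - 1), key=lambda i: abs(s[i]))
    return peak + 1

def sum_sq(segment):
    """Sum of squared deviations of a segment from its own mean."""
    m = fmean(segment)
    return sum((x - m) ** 2 for x in segment)

def change_by_min_variance(series):
    """Split index k that minimizes sum_sq(series[:k]) + sum_sq(series[k:])."""
    return min(range(1, len(series)),
               key=lambda k: sum_sq(series[:k]) + sum_sq(series[k:]))

# Hypothetical data: the mean moves up starting at index 5.
series = [10, 11, 9, 10, 11, 14, 15, 13, 14, 15]
print(change_by_max_cusum(series), change_by_min_variance(series))
```

Recursive splitting would apply the same detection step again to the section on each side of the estimated change, stopping when a section shows no significant shift.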
Reaction Speed to New Change
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | Prob
10.53 | 12.15 | 11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 0.15
12.15 | 11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 0.39
11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 0.77
8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 20 | 0.93
13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 20 | 20 | 0.99
- The first series has no change. Probability of mean shift = 0.15, insignificant.
- When the series is shifted by one and a single large value is appended, probability increases to 0.39, still insignificant. CPA is robust to a single spike.
- Shifting the series by two and appending two large values increases the probability to 0.77. CPA is signaling heightened likelihood of mean shift, odds are 3:1.
- Shifting by three and appending three large values raises the probability to 0.93. With four consecutive values, probability is 0.99.
For short time series, a section of three or more shifted values has P >= 0.90
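The experiment can be re-run on the slide's own twelve values with a simple shuffle test. This sketch assumes the reported probability is the share of random shuffles whose CUSUM range falls below the observed one; the presentation does not define it, so the printed values will not necessarily match the slide exactly. The names `BASE` and `shift_confidence` are hypothetical.

```python
import random
from itertools import accumulate
from statistics import fmean

# The twelve values of the first series on the slide.
BASE = [10.53, 12.15, 11.64, 8.89, 13.08, 7.53,
        9.72, 12.08, 10.13, 12.45, 9.70, 10.58]

def cusum_range(series):
    """Range of the cumulative sum of deviations from the mean (S0 = 0)."""
    mean = fmean(series)
    s = [0.0] + list(accumulate(x - mean for x in series))
    return max(s) - min(s)

def shift_confidence(series, n_shuffles=2000, seed=0):
    """Assumed definition: share of shuffles with a CUSUM range below the observed one."""
    rng = random.Random(seed)
    empirical = cusum_range(series)
    below = sum(cusum_range(rng.sample(series, len(series))) < empirical
                for _ in range(n_shuffles))
    return below / n_shuffles

# Drop the k oldest values and append k large values (20), as on the slide.
for k in range(5):
    window = BASE[k:] + [20.0] * k
    print(k, round(shift_confidence(window), 2))
```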
Decision Logic
Statistical Significance
- No False Positive
- No False Negative
Business Significance
- Magnitude of change interesting to business
Events of Interest
- True events for investigation and intervention
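The two-stage decision can be expressed as a simple filter. Everything specific here is an assumption for illustration: the `Candidate` fields, the significance level `alpha`, and the business threshold `min_change` are hypothetical and would be set by the analyst.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    customer_id: str
    p_value: float       # from the statistical significance test
    mean_before: float   # average level before the detected change
    mean_after: float    # average level after the detected change

def events_of_interest(candidates, alpha=0.05, min_change=500.0):
    """Keep candidates that are both statistically and business significant."""
    return [c for c in candidates
            if c.p_value <= alpha
            and abs(c.mean_after - c.mean_before) >= min_change]

# Hypothetical candidates.
candidates = [
    Candidate("A", 0.01, 2000.0, 2900.0),   # significant and large: keep
    Candidate("B", 0.01, 2000.0, 2050.0),   # significant but too small for the business
    Candidate("C", 0.40, 2000.0, 3000.0),   # large but not statistically significant
]
print([c.customer_id for c in events_of_interest(candidates)])
```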
Pay Increase
Large Scale Computation
Numerically intensive computation:
1. Hundreds of millions of time series
2. Random shuffle significance test
3. Minimization of Residual Sum of Squares
End-to-end process built in SAS:
- Handles varying length time series
- 28 code modules including data prep
- 2,000 lines of code
- In-memory processing for efficiency (in-memory array data structures)
EFT Payroll Increase
Process 155 Million payroll events
Each month detect 1,200 – 1,500 clients
Can detect large pay spike, e.g., lump sum payments
Step | Records | CPU Time | Elapsed
1. Extract four years of EFT payroll | 155 Million records | 18 minutes | 25 minutes
2. Aggregation (customers with multiple acct) | 145 Million records | 2 minutes | 2 minutes
3. Eliminate low pay and closed accounts | 85 Million records | 5 minutes | 5 minutes
4. Pay frequency determination | 62 Million records | 2 minutes | 2 minutes
5. Remove irregular off cycle pay | 60 Million records | 5 minutes | 5 minutes
6. Kalman Filter smoothing of pay spikes | 60 Million records | 4 minutes | 4 minutes
7. Change Point Detection | 60 Million records | 7 hours | 7 hours
8. Event Selection | 17K Accounts | 1 minute | 1 minute
Payroll increase examples (clean)
Payroll increase examples (noisy)
More money in pocket means…
We identify in one year 21K customers that grow funds by $300 Million and increase card spend by $50 Million
1. Payroll increase more often for younger people, and they open more accounts
2. Younger customers invest more, spend more, borrow more (new car, bigger house)
3. Older people just save their additional income, they don't spend more or borrow more
After Pay Increase
Age | Customer | Account Increase | Per Customer Funds Increase: Asset | Per Customer Funds Increase: Lending | Per Customer Card Spend | Total Funds ($Million) | Total Card Spend ($Million)
19 - 35 | 7,800 | 6.6% | $5,900 | $10,800 | $3,200 | $130 | $25
35 - 45 | 5,700 | 4.3% | $5,500 | $5,400 | $2,400 | $62 | $15
45 - 55 | 5,000 | 3.2% | $9,600 | $3,500 | $3,100 | $66 | $15
55 - 65 | 2,500 | 2.2% | $16,100 | $600 | $1,900 | $42 | $5
All | 21,000 | 4.3% | $7,900 | $6,400 | $2,300 | $300 | $50
Credit Card Spend
Credit Card Spend Decrease
Scenario:
- Portfolio of 4 million+ credit cards
- Annual spend volume low to plan by approximately $700 million
Causes:
- Reduced acquisition?
- Increased attrition?
- Slow down in economy?
- Increased competition in Reward cards?
Change Point Analysis Process
Step | Records | CPU Time | Elapsed Time
1. Extract three years of account PV | 178 Million | 6 minutes | 1 hour
2. Aggregate to customer | 126 Million | 15 minutes | 15 minutes
3. Eliminate low spend and closed accounts | 48 Million | 6 minutes | 6 minutes
4. Change Point Detection | 48 Million | 4 hours | 4 hours
5. Event selection | 8K customers | 1 minute | 1 minute
Process 178 million records
Each month detect 600 clients with significant PV decrease
Losing $1 Million+ per customer in a year…
Each customer lowered spend by $1 Million+ per year
Losing $500K+ per year…
Each customer lowered spend by $500K+ per year
Spend Loss Identified
Annual spend of 4 million+ cards was low to plan by ~ $700 Million. CPA identified 8K clients with annual spend loss of ~ $545 Million.
Group | Customer | % Customer | Average Annual Spend Loss | Annual Spend Loss | Index
1 | 100 | 1% | $1 million+ | $109 million | 14.9
2 | 400 | 5% | $266K | $109 million | 3.9
3 | 1K | 13% | $110K | $109 million | 1.6
4 | 2.1K | 26% | $52K | $109 million | 0.8
5 | 4.4K | 55% | $25K | $109 million | 0.4
Total | 8K | 100% | $68K | $545 million | 1.0
Interest Rate Sensitive Customers
Savings Account Bonus Rate Promotion
Scenario:
- Frequent promotion of Bonus Interest Rate on new balance
- Portfolio balance fluctuates with promotion on/off
Questions:
- Who is interest rate sensitive?
- How many are there?
- How much do they swing the portfolio balance?
Analysis Process
1. Change Point Analysis to identify accounts with balance changes (multi-period)
2. Check correlation of balance change pattern with campaign periods
[Charts: example accounts with $5 Million fluctuation and $500K fluctuation. Grey = high interest promotion, White = regular low interest]
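Step 2 could be approximated by correlating each account's balance with a promotion on/off indicator. The sketch below is an assumption about how such a check might be done, not the method used in the presentation; the monthly balances, the indicator, and the `pearson` helper are all hypothetical.

```python
import math
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly data: 1 = bonus rate promotion running, 0 = regular rate.
promo_on = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
# Balances in $K for two made-up accounts.
hot_money = [100, 110, 600, 620, 120, 100, 590, 610, 105, 115, 605, 600]
steady = [300, 310, 305, 295, 300, 310, 290, 305, 300, 295, 310, 300]

print(round(pearson(promo_on, hot_money), 2))  # balance follows the promotion
print(round(pearson(promo_on, steady), 2))     # balance ignores the promotion
```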
Outcome
Hot Money | # of Customer | Total ($Million)
$1 Million+ | 815 | $2,167
$500K - $1 Million | 1,313 | $904
$250K - $500K | 2,381 | $838
$100K - $250K | 4,361 | $699
Total | 8,870 | $4,607
Portfolio consists of 400K+ accounts with ~ $20 Billion in balance. We found 9K customers that are responsible for ~ $4.6 Billion of Hot Money.
Instead of an interest rate issue, the conversation changed to how to look after wealthy customers that are looking for return from safe instruments.
Back to Predictive Analytics
Improve Predictive Analytics
1. Better Target Definition
- Balance/Spend increase/decrease model target definitions use the naïve rule, thus the target set is not clean, resulting in a poor model
- Use CPA to define very clean targets: won't mis-identify, won't miss any
- Align individual customer's time window on the exact time of change. This cleanly delineates and improves the historical and prediction window.
2. Better Input Features
- Events are cleaner and clearer signals compared to the raw time series
- Use past event triggers as inputs to predictive models to improve predictability
- Build customer event database
Timing is Everything
Thank You
daymond.ling@senecacollege.ca