 
MAPA Mapping
Scorecard Calibration using a Monotone Adjacent Pooling Algorithm
Presented at Edinburgh Credit Scoring and Control IX, September 7-9, 2005
Raymond Anderson, Standard Bank Group, Johannesburg, South Africa
Why Calibrate?
• Traditional Retail Norm
– Focus on ranking ability
– Initially an Accept/Reject paradigm
• Consistent meaning
– Across Scorecards
– Over Time
– Across Products
• Moving Forward
– Focus on predictive accuracy
– Pricing
– Provisioning
– Capital Adequacy

On What Measure?
• Good/Bad Definition
– Bad = 60 days past due
– Good = Current
– Indet = Mid Range
• Basel II
– Bad = 90 days past due
– Good = Not Bad
– No Mid Range
NOTE: Basel based upon corporate methodologies. DEFAULT / NOT DEFAULT
Predictive Modeling Techniques
– Logistic Regression
  • Reasonable estimates, if Basel definition used
– Linear Probability Modeling
  • Ranks well, but scores unreliable as estimates
• Generalised Additive Non-parametric Regression
– Decision Trees
  • Use historical results directly
If the Basel definition is not used, or the probability estimates are unreliable, then mapping is necessary.
LPM Score Results
[Figure, two panels. Left, "Score Distribution": score distribution and actual P(Not Default) by LPM score. Right, "Score by Natural Log Odds": actual Ln(Not Default/Default Odds) by LPM score, with linear fit y = 0.0492x - 0.0779, R² = 0.5786.]
Gini = 55.52%
Assumptions
• Ranking Ability paramount!
• Estimates necessary, but secondary

Possible Methodologies
• Score banding / Risk Indicators
– Use Historical Figures
– Grouping Unscientific
• Logit, with score as sole independent
– Simple, but assumes linearity
• Fitting of Lorenz curve (Glößner 2003)
– Very complicated
MAPA Mapping - Process
I. Data Selection & Preparation
II. 1st Pass: MAPA Interpolation
III. 2nd Pass: Correct for Errors
IV. Create Mapping Table
V. Implement Mapping Table
Data Selection & Preparation
A. Out of time/out of sample?
B. Within Universe
C. Rank by Score
D. Set Target Variable

1st Pass: MAPA Interpolation
A. Apply MAPA to identify Pools
B. Calculate Ln(Odds) per Pool
C. Interpolate High and Low Ln(Odds) for each Pool
D. Interpolate Ln(Odds) for each Record
Pool Definition
[Figure, two panels. A: P(Good) by Score over the lower range, with break scores marked: Pool 1 = 53.97% @903, Pool 2 = 62.50% @911, Pool 3 = 62.79% @914, Pool 4 = 72.90% @928. B: Natural Log Odds by Score over the full range, showing break scores and average Ln(Odds) per pool.]
If monotonic, then P(Good) increases with score. Use an iterative process:
a) find the score with the lowest cumulative P(Good);
b) set that score as the upper bound for the pool;
c) clear and repeat with the remaining scores until all scores are pooled.
Gini = 56.22%
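The iterative process above can be sketched in Python. This is a minimal illustration, not the presenter's implementation: the function name `mapa_pools` and its arguments are hypothetical, the input is assumed to be aggregated by score and sorted in ascending score order with at least one record per score, and ties for the lowest cumulative P(Good) are assumed to be resolved in favour of the highest score.

```python
def mapa_pools(scores, goods, totals):
    """Pool adjacent scores so that pooled P(Good) rises with score.

    scores: distinct score values, ascending (assumed layout).
    goods:  number of goods at each score.
    totals: number of records at each score (each > 0).
    Returns a list of (upper_bound_score, pooled_p_good).
    """
    pools = []
    start = 0
    while start < len(scores):
        cum_good = cum_total = 0
        best = None  # (index, cumulative goods, cumulative total)
        for i in range(start, len(scores)):
            cum_good += goods[i]
            cum_total += totals[i]
            # a) lowest cumulative P(Good), compared by cross-multiplying
            #    counts to avoid rounding; "<=" keeps the highest score on ties
            if best is None or cum_good * best[2] <= best[1] * cum_total:
                best = (i, cum_good, cum_total)
        end, pool_good, pool_total = best
        # b) that score becomes the upper bound of the pool
        pools.append((scores[end], pool_good / pool_total))
        # c) repeat with the remaining scores
        start = end + 1
    return pools
```

For example, with P(Good) of 50%, 30%, 80% and 90% at four consecutive scores of equal size, the first two scores are merged into one pool at 40% because the second score breaks monotonicity.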
Interpolation / 1st Pass Results
[Figure, two panels. Left, "Interpolation": Natural Log Odds by Record Number (33,440 total records), showing average Ln(Odds) per pool [B], break records [C] and interpolated record Ln(Odds) [D]. Right, "1st Pass Results": Natural Log Odds by Original Score, showing actual, pool and smoothed values with break scores.]
Use the average P(Good) per pool [B] to interpolate Ln(Odds) for the break records [C], and use these to interpolate Ln(Odds) for all records [D]. Aggregate by score, and we have a smoothed score to Ln(Odds) mapping… but with errors.
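The slide does not spell out the interpolation rule, so the following Python sketch shows only one plausible reading. It assumes each pool's average Ln(Odds) sits at the pool's middle record and interpolates linearly between neighbouring pool midpoints, holding the value flat beyond the first and last midpoints. The function name `interpolate_ln_odds` and the input layout are hypothetical.

```python
import math
from bisect import bisect_right

def interpolate_ln_odds(pools):
    """pools: list of (record_count, p_good), ordered by ascending score,
    with 0 < p_good < 1. Returns one interpolated ln(odds) per record."""
    mids, values = [], []
    start = 0
    for count, p_good in pools:
        mids.append(start + (count - 1) / 2)           # midpoint record position
        values.append(math.log(p_good / (1 - p_good)))  # pool average ln(odds)
        start += count

    result = []
    for r in range(start):
        if r <= mids[0]:
            result.append(values[0])      # flat below the first midpoint
        elif r >= mids[-1]:
            result.append(values[-1])     # flat above the last midpoint
        else:
            k = bisect_right(mids, r) - 1  # mids[k] <= r < mids[k + 1]
            w = (r - mids[k]) / (mids[k + 1] - mids[k])
            result.append(values[k] + w * (values[k + 1] - values[k]))
    return result
```

Because the pooled rates are monotone, the interpolated record-level values are monotone too; averaging them by score would then give the smoothed score to Ln(Odds) mapping described above.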
2nd Pass: Error Correction
[Figure, two panels. Left: Adjusted P(Good) and Bad Distribution by Record Number (Total), series "2nd Pass" and "Spread". Right: Errors and Bad Spread by Score, series "Actual", "Error", "Normal (Std Dev = 3)" and "Count".]
Net deficiency of 31.3 bads out of 1,990 (1.6%), distributed in the same fashion as the bads. The error is corrected by spreading it over the bads, assuming a normal distribution with Z-values from -3 to +3. The P(Good) estimate is adjusted downwards.
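How the normal curve is positioned over the bads is not stated on the slide. As a rough sketch under stated assumptions, the Python below spreads a net deficiency across the bad records, ordered by score, with weights proportional to the standard normal density at evenly spaced Z-values from -3 to +3. The function name `normal_spread` and the even spacing are assumptions, not the presenter's method.

```python
import math

def normal_spread(deficiency, n_bads, z_max=3.0):
    """Split `deficiency` into n_bads adjustments whose sizes follow a
    standard normal density over z in [-z_max, +z_max] (assumed spacing)."""
    if n_bads < 2:
        raise ValueError("need at least two bad records to spread over")
    step = 2 * z_max / (n_bads - 1)
    density = [math.exp(-0.5 * (-z_max + i * step) ** 2) for i in range(n_bads)]
    total = sum(density)
    # Normalise so that the adjustments add up to the net deficiency
    return [deficiency * d / total for d in density]

# Figures from the slide: 31.3 missing bads spread over 1,990 bads
adjustments = normal_spread(31.3, 1990)
```

The adjustments sum to the deficiency, so adding them to the expected bads removes the net error while concentrating the correction in the middle of the bad distribution.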
Step 3: Mapping Table
[Figure, two panels. Left: Natural Log Odds by Original Score, series "Actual", "Pool" and "2nd Pass". Right: New Score (LogLinear) by Original Score (LPM).]

$$S' = S_{BASE} + S_{INCR} \times \frac{\ln(Odds) - \ln(Odds_{BASE})}{\ln(Odds_{INCR})}$$

We now have the final P(Good), and can map Ln(Odds) onto new loglinear scores. The example at right has 32/1 odds at a baseline score of 500, doubling every fifty points.
Gini = 55.51%
Rescale

$$S' = S_{BASE} + \frac{S_{INCR}}{\ln(Odds_{INCR})} \times \left( \ln(Odds) - \ln(Odds_{BASE}) \right)$$

Base Score = 500; Base Odds = 32; Double Every 50 points.

$$Odds \mid S = 989 \; = \; 60.47$$

$$S'_{989} = 500 + \frac{50}{\ln(2)} \times \left( \ln(60.47) - \ln(32) \right) = 545.9$$

Thus, a score of 989 maps to 546. This converts to a Bad Rate of 1.625%:

$$P(Bad \mid S') = \left( 1 + \exp\left( (S' - S_{BASE}) \times \frac{\ln(Odds_{INCR})}{S_{INCR}} + \ln(Odds_{BASE}) \right) \right)^{-1}$$

$$P(Bad \mid S' = 546) = \left( 1 + \exp\left( (546 - 500) \times \frac{\ln(2)}{50} + \ln(32) \right) \right)^{-1} = 1.625\%$$
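The worked example can be checked with a few lines of Python. The function names `rescale` and `p_bad` are illustrative only; the default parameters are the slide's values (base score 500, base odds 32, odds doubling every 50 points).

```python
import math

def rescale(odds, base_score=500.0, base_odds=32.0,
            incr_score=50.0, incr_odds=2.0):
    """Map good/bad odds onto the loglinear score scale."""
    return base_score + incr_score * (
        math.log(odds) - math.log(base_odds)) / math.log(incr_odds)

def p_bad(new_score, base_score=500.0, base_odds=32.0,
          incr_score=50.0, incr_odds=2.0):
    """Invert the scaling: loglinear score back to a probability of bad."""
    ln_odds = ((new_score - base_score) * math.log(incr_odds) / incr_score
               + math.log(base_odds))
    return 1.0 / (1.0 + math.exp(ln_odds))

new_score = rescale(60.47)   # odds observed at original score 989
bad_rate = p_bad(546)        # bad rate at the rounded new score
```

Here `new_score` comes out at roughly 545.9 and `bad_rate` at roughly 1.625%, in line with the figures on the slide.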
Conclusions
• Historical focus on ranking ability (power)
• Need reasonable estimates (accuracy)
• Problems arise where the scorecard build and required definitions differ, where estimates are unreliable, or where there are significant changes to the business environment
• Requirement:
– Business issues drive scorecard development
– Apply transformations to obtain PD estimates for Basel II
Conclusion cont'd

ISSUES
• Always Backward Looking!!!
• Requires Mapping Table
• Small Numbers? Bias?
• Raw scores still needed
– Scorecard Monitoring
– Strategy???
• Endpoint Treatment?
– Historical (Detailed)
– Informed (Constant)

ADVANTAGES
• Conceptually Simple
• Non-Linear
• No Power Loss
• Handles any Binary Transformation
• Allows updates using latest performance
• Other variations may provide improvements