Measuring the Quality of Credit Scoring Models



  1. Measuring the Quality of Credit Scoring Models. Martin Řezáč, Dept. of Mathematics and Statistics, Faculty of Science, Masaryk University. CSCC XI, Edinburgh, August 2009

  2. Content
     1. Introduction
     2. Good/bad client definition
     3. Measuring the quality
     4. Indexes based on distribution function
     5. Indexes based on density function
     6. Some results for normally distributed scores
     7. Conclusions

  3. Introduction
     - It is impossible to use a scoring model effectively without knowing how good it is.
     - Usually one has several scoring models and needs to select just one: the best one.
     - Before measuring the quality of models one should know (among other things):
       - the good/bad definition
       - the expected reject rate

  4. Good/bad client definition
     - A good definition is the basic condition of an effective scoring model.
     - The definition usually depends on:
       - days past due (DPD)
       - amount past due
       - time horizon
     - Generally we consider the following types of client:
       - Good
       - Bad
       - Indeterminate
       - Insufficient
       - Excluded
       - Rejected

  5. Good/bad client definition
     [Diagram: classification of customers. Node labels: Customer; Accepted; Rejected; Default (60 or 90 DPD); Not default; Insufficient; Fraud (first delayed payment, 90 DPD); Early default (2-4 delayed payment, 60 DPD); Late default (5+ delayed payment, 60 DPD). Outcome labels: BAD, GOOD, INDETERMINATE.]

  6. Measuring the quality
     - Once the definition of a good/bad client and the client's score are available, it is possible to evaluate the quality of this score. If the score is the output of a predictive model (scoring function), then we evaluate the quality of this model. We can consider two basic types of quality indexes.
     - First, indexes based on the cumulative distribution function, such as:
       - Kolmogorov-Smirnov statistic (KS)
       - Gini index
       - C-statistic
       - Lift
     - Second, indexes based on the likelihood density function, such as:
       - Mean difference (Mahalanobis distance)
       - Information statistic/value ($I_{val}$)

  7. Indexes based on distribution function
     - Number of good clients: $n$. Number of bad clients: $m$. Good/bad indicator:

       $$D_K = \begin{cases} 1, & \text{client is good} \\ 0, & \text{otherwise.} \end{cases}$$

     - Proportions of good/bad clients:

       $$p_G = \frac{n}{n+m}, \qquad p_B = \frac{m}{n+m}$$

     - Empirical distribution functions, for $a \in [L, H]$:

       $$F_{n.GOOD}(a) = \frac{1}{n} \sum_{i=1}^{n} I(s_i \le a \wedge D_K = 1)$$

       $$F_{m.BAD}(a) = \frac{1}{m} \sum_{i=1}^{m} I(s_i \le a \wedge D_K = 0)$$

       $$F_{N.ALL}(a) = \frac{1}{N} \sum_{i=1}^{N} I(s_i \le a)$$

       where

       $$I(A) = \begin{cases} 1, & A \text{ is true} \\ 0, & \text{otherwise.} \end{cases}$$

     - Kolmogorov-Smirnov statistic (KS):

       $$KS = \max_{a \in [L, H]} \left| F_{m.BAD}(a) - F_{n.GOOD}(a) \right|$$
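The KS definition can be tried out on a handful of scores. The following is a minimal sketch, not the author's implementation: the function names `ecdf` and `ks_statistic` and the toy scores are hypothetical, and it assumes that low scores indicate bad clients. Because both empirical distribution functions are step functions, it is enough to evaluate the gap at the observed scores.

```python
def ecdf(scores, a):
    """Share of scores less than or equal to the threshold a."""
    return sum(s <= a for s in scores) / len(scores)


def ks_statistic(good_scores, bad_scores):
    """Largest absolute gap between the bad and good empirical CDFs."""
    # The gap can only change at an observed score, so those are the candidates.
    thresholds = sorted(set(good_scores) | set(bad_scores))
    return max(abs(ecdf(bad_scores, a) - ecdf(good_scores, a)) for a in thresholds)


# Hypothetical toy scores: bad clients tend to score lower than good ones.
good = [520, 580, 610, 640, 700, 720]
bad = [450, 500, 530, 600]

ks = ks_statistic(good, bad)
print(f"KS = {ks:.3f}")
```

On this toy sample the largest gap occurs at the score 600, where all bad clients but only a third of the good clients lie at or below the threshold.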

  8. Indexes based on distribution function
     - Lorenz curve (LC):

       $$x = F_{m.BAD}(a), \qquad y = F_{n.GOOD}(a), \qquad a \in [L, H]$$

     - Gini index:

       $$Gini = \frac{A}{A + B} = 2A$$

       $$Gini = 1 - \sum_{k=2}^{n+m} \left( F_{m.BAD_k} - F_{m.BAD_{k-1}} \right) \cdot \left( F_{n.GOOD_k} + F_{n.GOOD_{k-1}} \right)$$

       where $F_{m.BAD_k}$ ($F_{n.GOOD_k}$) is the k-th vector value of the empirical distribution function of bad (good) clients.
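The summation formula for the Gini index is a trapezoid rule along the Lorenz curve, which a few lines of code make concrete. This is a sketch under assumptions: the helper names and toy scores are hypothetical, and the point (0, 0) is prepended to both vectors so that the first trapezoid starts at the origin.

```python
def lorenz_points(good_scores, bad_scores):
    """Evaluate both empirical CDFs at every observed score, starting from (0, 0)."""
    thresholds = sorted(set(good_scores) | set(bad_scores))
    f_bad = [0.0] + [sum(s <= a for s in bad_scores) / len(bad_scores) for a in thresholds]
    f_good = [0.0] + [sum(s <= a for s in good_scores) / len(good_scores) for a in thresholds]
    return f_bad, f_good


def gini_from_cdfs(f_bad, f_good):
    """Gini = 1 - sum_k (Fbad_k - Fbad_{k-1}) * (Fgood_k + Fgood_{k-1})."""
    total = 0.0
    for k in range(1, len(f_bad)):
        total += (f_bad[k] - f_bad[k - 1]) * (f_good[k] + f_good[k - 1])
    return 1.0 - total


# Hypothetical toy scores: bad clients tend to score lower than good ones.
good = [520, 580, 610, 640, 700, 720]
bad = [450, 500, 530, 600]

f_bad, f_good = lorenz_points(good, bad)
gini = gini_from_cdfs(f_bad, f_good)
print(f"Gini = {gini:.3f}")
```

Only the steps where the bad distribution function jumps contribute to the sum, so the loop could be restricted to those points for large samples.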

  9. Indexes based on distribution function
     - C-statistic:

       $$c\text{-}stat = A + C = \frac{1 + Gini}{2}$$

       It represents the likelihood that a randomly selected good client has a higher score than a randomly selected bad client, i.e.

       $$c\text{-}stat = P\left( s_1 \ge s_2 \mid D_{K_1} = 1 \wedge D_{K_2} = 0 \right)$$
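The probabilistic reading of the c-statistic can be checked by brute force over all (good, bad) pairs. The sketch below uses a hypothetical function name and toy scores; it assumes there are no tied scores between the two groups, in which case the pair count and the relation to the Gini index agree exactly.

```python
def c_statistic(good_scores, bad_scores):
    """Share of (good, bad) pairs in which the good score is at least the bad score."""
    wins = sum(g >= b for g in good_scores for b in bad_scores)
    return wins / (len(good_scores) * len(bad_scores))


# Hypothetical toy scores without ties between the groups.
good = [520, 580, 610, 640, 700, 720]
bad = [450, 500, 530, 600]

c_stat = c_statistic(good, bad)
implied_gini = 2 * c_stat - 1  # rearranged from c-stat = (1 + Gini) / 2
print(f"c-stat = {c_stat:.3f}, implied Gini = {implied_gini:.3f}")
```

The pairwise loop costs a number of comparisons equal to the product of the two group sizes, so for large portfolios a rank-based computation would be preferable.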

  10. Indexes based on distribution function
     - Another possible indicator of the quality of a scoring model is the cumulative Lift, which says how many times better than random selection (a random model) the scoring model is at a given level of rejection. More precisely, it is the ratio of the proportion of bad clients among clients with a score not exceeding $a$, $a \in [L, H]$, to the proportion of bad clients in the general population. Formally:

       $$Lift(a) = \frac{CumBadRate(a)}{BadRate} = \frac{\dfrac{\sum_{i=1}^{n+m} I(s_i \le a \wedge Y = 0)}{\sum_{i=1}^{n+m} I(s_i \le a)}}{\dfrac{\sum_{i=1}^{n+m} I(Y = 0)}{\sum_{i=1}^{n+m} I(Y = 0 \vee Y = 1)}} = \frac{\dfrac{\sum_{i=1}^{n+m} I(s_i \le a \wedge Y = 0)}{\sum_{i=1}^{n+m} I(s_i \le a)}}{\dfrac{n}{N}}$$

       Here $Y = 0$ marks a bad client. In the last expression $n = \sum_{i} I(Y = 0)$ is the number of bad clients and $N$ is the number of all clients, matching the $F_{n.BAD}$ notation used for Lift.

     - Absolute Lift:

       $$absLift(a) = \frac{BadRate(a)}{BadRate}$$
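As an illustration of the cumulative Lift definition, the following sketch computes it at a single score threshold. The function name, the `is_bad` flag convention (1 for a bad client, whereas the slide writes this as Y = 0) and the toy data are assumptions made for the example.

```python
def cum_lift(scores, is_bad, a):
    """Bad rate among clients with score <= a, divided by the overall bad rate."""
    below = [bad for s, bad in zip(scores, is_bad) if s <= a]
    if not below:
        raise ValueError("no client has a score at or below the threshold")
    cum_bad_rate = sum(below) / len(below)
    bad_rate = sum(is_bad) / len(is_bad)
    return cum_bad_rate / bad_rate


# Hypothetical toy portfolio: 10 clients, 4 of them bad (flag 1).
scores = [450, 500, 520, 530, 580, 600, 610, 640, 700, 720]
is_bad = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

lift_530 = cum_lift(scores, is_bad, 530)
print(f"Lift(530) = {lift_530:.3f}")
```

Rejecting every client with a score of at most 530 removes a group whose bad rate is 75%, against 40% in the whole toy portfolio, hence a lift just under 1.9.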

  11. Indexes based on distribution function
     - Usually it is computed using a table with the numbers of all and bad clients in some bands (deciles).

     | decile | # clients | # bad clients | bad rate | abs. Lift | cum. # bad clients | cum. bad rate | cum. Lift |
     |---|---|---|---|---|---|---|---|
     | 1 | 100 | 16 | 16.0% | 3.20 | 16 | 16.0% | 3.20 |
     | 2 | 100 | 12 | 12.0% | 2.40 | 28 | 14.0% | 2.80 |
     | 3 | 100 | 8 | 8.0% | 1.60 | 36 | 12.0% | 2.40 |
     | 4 | 100 | 5 | 5.0% | 1.00 | 41 | 10.3% | 2.05 |
     | 5 | 100 | 3 | 3.0% | 0.60 | 44 | 8.8% | 1.76 |
     | 6 | 100 | 2 | 2.0% | 0.40 | 46 | 7.7% | 1.53 |
     | 7 | 100 | 1 | 1.0% | 0.20 | 47 | 6.7% | 1.34 |
     | 8 | 100 | 1 | 1.0% | 0.20 | 48 | 6.0% | 1.20 |
     | 9 | 100 | 1 | 1.0% | 0.20 | 49 | 5.4% | 1.09 |
     | 10 | 100 | 1 | 1.0% | 0.20 | 50 | 5.0% | 1.00 |
     | All | 1000 | 50 | 5.0% | | | | |

     [Figure: abs. Lift and cum. Lift by decile.]
     [Figure: Lorenz curve with base line, Gini = 0.55.]
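The lift columns of the decile table follow directly from the client and bad-client counts. The sketch below recomputes them from the counts on this slide; the function name `lift_table` is hypothetical, and bands are assumed to be ordered from the lowest scores to the highest.

```python
def lift_table(n_clients, n_bad):
    """Return (abs. Lift, cum. Lift) per band from client and bad-client counts."""
    bad_rate = sum(n_bad) / sum(n_clients)
    rows, cum_clients, cum_bad = [], 0, 0
    for clients, bad in zip(n_clients, n_bad):
        cum_clients += clients
        cum_bad += bad
        abs_lift = (bad / clients) / bad_rate          # bad rate within the band
        cum_lift = (cum_bad / cum_clients) / bad_rate  # bad rate up to this band
        rows.append((abs_lift, cum_lift))
    return rows


# Counts taken from the decile table on this slide.
clients = [100] * 10
bad = [16, 12, 8, 5, 3, 2, 1, 1, 1, 1]

rows = lift_table(clients, bad)
for decile, (abs_lift, cum_lift) in enumerate(rows, start=1):
    print(f"{decile:>2}  abs. Lift {abs_lift:.2f}  cum. Lift {cum_lift:.2f}")
```

The cumulative lift always ends at 1.00 in the last band, because the cumulative bad rate there equals the overall bad rate.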

  12. Indexes based on distribution function
     - When bad rates are not monotone:
       - LC looks fine
       - Gini is slightly lowered
       - Lift looks strange

     | decile | # clients | # bad clients | bad rate | abs. Lift | cum. # bad clients | cum. bad rate | cum. Lift |
     |---|---|---|---|---|---|---|---|
     | 1 | 100 | 8 | 8.0% | 1.60 | 8 | 8.0% | 1.60 |
     | 2 | 100 | 12 | 12.0% | 2.40 | 20 | 10.0% | 2.00 |
     | 3 | 100 | 16 | 16.0% | 3.20 | 36 | 12.0% | 2.40 |
     | 4 | 100 | 5 | 5.0% | 1.00 | 41 | 10.3% | 2.05 |
     | 5 | 100 | 3 | 3.0% | 0.60 | 44 | 8.8% | 1.76 |
     | 6 | 100 | 2 | 2.0% | 0.40 | 46 | 7.7% | 1.53 |
     | 7 | 100 | 1 | 1.0% | 0.20 | 47 | 6.7% | 1.34 |
     | 8 | 100 | 1 | 1.0% | 0.20 | 48 | 6.0% | 1.20 |
     | 9 | 100 | 1 | 1.0% | 0.20 | 49 | 5.4% | 1.09 |
     | 10 | 100 | 1 | 1.0% | 0.20 | 50 | 5.0% | 1.00 |
     | All | 1000 | 50 | 5.0% | | | | |

     [Figure: abs. Lift and cum. Lift by decile.]
     [Figure: Lorenz curve with base line, Gini = 0.48.]

  13. Indexes based on distribution function
     - When the score is reversed, we obtain reversed figures.

     Original score:

     | decile | # clients | # bad clients | bad rate | abs. Lift | cum. # bad clients | cum. bad rate | cum. Lift |
     |---|---|---|---|---|---|---|---|
     | 1 | 100 | 16 | 16.0% | 3.20 | 16 | 16.0% | 3.20 |
     | 2 | 100 | 12 | 12.0% | 2.40 | 28 | 14.0% | 2.80 |
     | 3 | 100 | 8 | 8.0% | 1.60 | 36 | 12.0% | 2.40 |
     | 4 | 100 | 5 | 5.0% | 1.00 | 41 | 10.3% | 2.05 |
     | 5 | 100 | 3 | 3.0% | 0.60 | 44 | 8.8% | 1.76 |
     | 6 | 100 | 2 | 2.0% | 0.40 | 46 | 7.7% | 1.53 |
     | 7 | 100 | 1 | 1.0% | 0.20 | 47 | 6.7% | 1.34 |
     | 8 | 100 | 1 | 1.0% | 0.20 | 48 | 6.0% | 1.20 |
     | 9 | 100 | 1 | 1.0% | 0.20 | 49 | 5.4% | 1.09 |
     | 10 | 100 | 1 | 1.0% | 0.20 | 50 | 5.0% | 1.00 |
     | All | 1000 | 50 | 5.0% | | | | |

     Reversed score:

     | decile | # clients | # bad clients | bad rate | abs. Lift | cum. # bad clients | cum. bad rate | cum. Lift |
     |---|---|---|---|---|---|---|---|
     | 1 | 100 | 1 | 1.0% | 0.20 | 1 | 1.0% | 0.20 |
     | 2 | 100 | 1 | 1.0% | 0.20 | 2 | 1.0% | 0.20 |
     | 3 | 100 | 1 | 1.0% | 0.20 | 3 | 1.0% | 0.20 |
     | 4 | 100 | 1 | 1.0% | 0.20 | 4 | 1.0% | 0.20 |
     | 5 | 100 | 2 | 2.0% | 0.40 | 6 | 1.2% | 0.24 |
     | 6 | 100 | 3 | 3.0% | 0.60 | 9 | 1.5% | 0.30 |
     | 7 | 100 | 5 | 5.0% | 1.00 | 14 | 2.0% | 0.40 |
     | 8 | 100 | 8 | 8.0% | 1.60 | 22 | 2.8% | 0.55 |
     | 9 | 100 | 12 | 12.0% | 2.40 | 34 | 3.8% | 0.76 |
     | 10 | 100 | 16 | 16.0% | 3.20 | 50 | 5.0% | 1.00 |
     | All | 1000 | 50 | 5.0% | | | | |

     [Figure: abs. Lift and cum. Lift by decile.]
     [Figure: Lorenz curve with base line, Gini = -0.55.]

  14. Indexes based on distribution function
     - The Gini is not enough!
     - SC 1: Gini = 0.42, K-S = 0.34
     - SC 2: Gini = 0.42, K-S = 0.36

     | decile | # clients | SC 1 # bad clients | SC 1 bad rate | SC 2 # bad clients | SC 2 bad rate |
     |---|---|---|---|---|---|
     | 1 | 100 | 35 | 35.0% | 20 | 20.0% |
     | 2 | 100 | 16 | 16.0% | 18 | 18.0% |
     | 3 | 100 | 8 | 8.0% | 17 | 17.0% |
     | 4 | 100 | 8 | 8.0% | 15 | 15.0% |
     | 5 | 100 | 7 | 7.0% | 12 | 12.0% |
     | 6 | 100 | 6 | 6.0% | 6 | 6.0% |
     | 7 | 100 | 6 | 6.0% | 4 | 4.0% |
     | 8 | 100 | 5 | 5.0% | 3 | 3.0% |
     | 9 | 100 | 5 | 5.0% | 3 | 3.0% |
     | 10 | 100 | 4 | 4.0% | 2 | 2.0% |
     | All | 1000 | 100 | 10.0% | 100 | 10.0% |

     [Figures, for each of SC 1 and SC 2: Lorenz curve with base line; distribution functions of good and bad clients with the K-S distance marked.]
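The Gini and K-S values quoted for the two scorecards can be recomputed from the decile counts alone. The sketch below is an illustration under assumptions: the function name is hypothetical, the distribution functions are evaluated only at decile boundaries, and good clients per band are taken as all clients minus bad clients.

```python
def gini_ks_from_bands(n_clients, n_bad):
    """Gini (trapezoid formula) and K-S evaluated at band boundaries."""
    total_bad = sum(n_bad)
    total_good = sum(n_clients) - total_bad
    f_bad = f_good = 0.0
    area_sum = ks = 0.0
    for clients, bad in zip(n_clients, n_bad):
        new_f_bad = f_bad + bad / total_bad
        new_f_good = f_good + (clients - bad) / total_good
        area_sum += (new_f_bad - f_bad) * (new_f_good + f_good)
        ks = max(ks, abs(new_f_bad - new_f_good))
        f_bad, f_good = new_f_bad, new_f_good
    return 1.0 - area_sum, ks


# Bad-client counts per decile, taken from the SC 1 and SC 2 tables on this slide.
clients = [100] * 10
sc1_bad = [35, 16, 8, 8, 7, 6, 6, 5, 5, 4]
sc2_bad = [20, 18, 17, 15, 12, 6, 4, 3, 3, 2]

gini1, ks1 = gini_ks_from_bands(clients, sc1_bad)
gini2, ks2 = gini_ks_from_bands(clients, sc2_bad)
print(f"SC 1: Gini = {gini1:.2f}, K-S = {ks1:.2f}")
print(f"SC 2: Gini = {gini2:.2f}, K-S = {ks2:.2f}")
```

Both scorecards round to the same Gini although their bad clients are distributed quite differently across the deciles, which is the point the slide makes.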

  15. Indexes based on distribution function
     [Figures, for each of SC 1 and SC 2: abs. Lift and cum. Lift by decile.]
     - Lift at 20%: SC 1 = 2.55 > SC 2 = 1.90
     - Lift at 50%: SC 1 = 1.48 < SC 2 = 1.64
     - SC 2 is better if the expected reject rate is around 50%.
     - SC 1 is much better if the expected reject rate is around 20%.
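The comparison of the two scorecards at a given reject rate comes down to the cumulative lift over the first few deciles. A minimal sketch, assuming the SC 1 and SC 2 bad-client counts from the preceding slide and a hypothetical helper name:

```python
def cum_lift_first_bands(n_clients, n_bad, k):
    """Cumulative lift over the k lowest-scored bands."""
    bad_rate = sum(n_bad) / sum(n_clients)
    return (sum(n_bad[:k]) / sum(n_clients[:k])) / bad_rate


# Bad-client counts per decile for the two scorecards (100 clients per decile).
clients = [100] * 10
sc1_bad = [35, 16, 8, 8, 7, 6, 6, 5, 5, 4]
sc2_bad = [20, 18, 17, 15, 12, 6, 4, 3, 3, 2]

for share, k in ((20, 2), (50, 5)):
    lift1 = cum_lift_first_bands(clients, sc1_bad, k)
    lift2 = cum_lift_first_bands(clients, sc2_bad, k)
    better = "SC 1" if lift1 > lift2 else "SC 2"
    print(f"Lift {share}%: SC 1 = {lift1:.2f}, SC 2 = {lift2:.2f}, better: {better}")
```

With deciles of equal size, a reject rate of 20% corresponds to the first two bands and 50% to the first five.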

  16. Indexes based on distribution function
     - Lift can be expressed and computed by the formulae:

       $$Lift(a) = \frac{F_{n.BAD}(a)}{F_{N.ALL}(a)}, \qquad a \in [L, H]$$

       $$Lift_q = \frac{F_{n.BAD}\left( F_{N.ALL}^{-1}(q) \right)}{F_{N.ALL}\left( F_{N.ALL}^{-1}(q) \right)} = \frac{1}{q} F_{n.BAD}\left( F_{N.ALL}^{-1}(q) \right)$$

       $$F_{N.ALL}^{-1}(q) = \min \{ a \in [L, H],\ F_{N.ALL}(a) \ge q \}$$

       $$Lift_{10\%} = 10 \cdot F_{n.BAD}\left( F_{N.ALL}^{-1}(0.1) \right)$$
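The quantile form of Lift can be sketched in a few lines. The function names, the `is_bad` flag convention (1 for a bad client) and the toy data are hypothetical. The 1/q shortcut used here assumes that the distribution function of all clients hits q exactly at the quantile, which holds in the toy data but can fail when scores are tied.

```python
def quantile_all(scores, q):
    """F_all^{-1}(q): the smallest observed score a with F_all(a) >= q."""
    n = len(scores)
    for a in sorted(set(scores)):
        if sum(s <= a for s in scores) / n >= q:
            return a
    raise ValueError("q must lie in (0, 1]")


def lift_q(scores, is_bad, q):
    """Lift_q = F_bad(F_all^{-1}(q)) / q."""
    a_q = quantile_all(scores, q)
    bad_scores = [s for s, bad in zip(scores, is_bad) if bad]
    f_bad = sum(s <= a_q for s in bad_scores) / len(bad_scores)
    return f_bad / q


# Hypothetical toy portfolio: 10 clients, 4 of them bad (flag 1).
scores = [450, 500, 520, 530, 580, 600, 610, 640, 700, 720]
is_bad = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

lift_40 = lift_q(scores, is_bad, 0.4)
print(f"Lift at q = 0.4: {lift_40:.3f}")
```

The 40% quantile of all scores is 530 here, and three of the four bad clients lie at or below it, so the lift is 0.75 / 0.4.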
