A case study on using generalized additive models to fit credit rating scores
Marlene Müller, marlene.mueller@itwm.fraunhofer.de
This version: July 8, 2009, 14:32
Contents
  Application: Credit Rating
  Aim of this Talk
  Case Study
    German Credit Data
    Australian Credit Data
    French Credit Data
    UC2005 Credit Data
  Simulation Study
  Conclusions
  Appendix: Further Plots
    Australian Credit Data
    French Credit Data
    UC2005 Credit Data
                                             regressors
dataset             sample   defaults   continuous   discrete   categorical
German Credit         1000     30.00%            3          –            17
Australian Credit      678     55.90%            3          1             8
French Credit         8178      5.86%            5          3            15
UC2005 Credit         5058     23.92%           12          3            21
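Model formulas (not recoverable from the extracted text): the surviving fragments show a sum over j = 1, ..., p and an index consisting of a term in β, labelled "linear part", plus a second sum over j = 1, ..., p.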
                                        regressors
dataset name   sample   defaults   continuous   discrete   categorical
German           1000     30.00%            3          –            17
(→ see following slides)
w.r.t. gam::gam
gam::gam (not more than around a second, though)
to logit cubic additive functions
Figure: Estimated additive functions for the variables age, amount and duration (mgcv and blue: gam); vertical axes labelled s(age,1), s(amount,4.49) and s(duration,1).
against the empirical cdf of the fitted default sample scores (precisely 1 − F̂ vs. 1 − F̂(·|Y = 1))
diagonal in relation to the corresponding area for the best possible CAP curve (best possible ≅ perfect separation)
F̂(·|Y = 0) and F̂(·|Y = 1), and it holds that AR = 2 AUC − 1
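The relation AR = 2 AUC − 1 can be checked numerically. The following is a minimal sketch in Python with NumPy, not the code used for the talk. It assumes that a higher score means a higher default risk and that there are no tied scores; the function names `auc` and `ar_from_cap` are made up for this illustration.

```python
import numpy as np

def auc(scores, y):
    """AUC as the share of (default, non-default) pairs in which the
    default has the higher score; ties count one half."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    s1, s0 = s[y == 1], s[y == 0]
    diff = s1[:, None] - s0[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (s1.size * s0.size)

def ar_from_cap(scores, y):
    """Accuracy ratio: area between the CAP curve and the diagonal, divided
    by the same area for the best possible CAP curve (perfect separation)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    order = np.argsort(-s, kind="stable")            # riskiest applicants first
    hits = np.concatenate(([0.0], np.cumsum(y[order]) / y.sum()))
    x = np.linspace(0.0, 1.0, y.size + 1)            # share of applicants
    area = np.sum((hits[1:] + hits[:-1]) / 2.0 * np.diff(x))  # trapezoid rule
    p = y.mean()                                     # default rate
    best_area = 1.0 - p / 2.0                        # area under the perfect CAP curve
    return (area - 0.5) / (best_area - 0.5)

# Small example: two defaults, two non-defaults, no ties
scores = [0.9, 0.8, 0.3, 0.1]
y = [1, 0, 1, 0]
print(auc(scores, y), ar_from_cap(scores, y), 2 * auc(scores, y) - 1)
```

Both routes give the same accuracy ratio here, which is what the identity on the slide states.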
Figure: CAP curve and best possible CAP curve; horizontal axis: percentage of applicants, 1 − F(s); both axes run up to 100%.
Panels (German): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 10 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
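The out-of-sample comparison in the figure can be sketched as follows. This is a hedged illustration in Python with NumPy, not the code behind the slides: it assumes that "blockwise" means consecutive blocks of observations, it uses a plain logit fit by Newton iterations as a stand-in for the logit and GAM estimators, and all function names and the toy data are invented for the example.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Plain logistic regression by Newton iterations (stand-in estimator)."""
    X1 = np.column_stack([np.ones(len(X)), X])       # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (y - p)
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_pd(beta, X):
    """Fitted default probabilities for new observations."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X1 @ beta))

def binomial_deviance(y, p, eps=1e-12):
    """-2 times the Bernoulli log-likelihood of y under probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -2.0 * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def blockwise_cv(X, y, n_blocks=10):
    """Hold out each consecutive block once; fit on the remaining blocks."""
    deviances = []
    for test_idx in np.array_split(np.arange(len(y)), n_blocks):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        beta = fit_logit(X[train], y[train])
        deviances.append(binomial_deviance(y[test_idx], predict_pd(beta, X[test_idx])))
    return np.array(deviances)

# Toy data (not one of the credit datasets)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
eta = -1.0 + X[:, 0] - 0.5 * X[:, 1]
y = (rng.uniform(size=1000) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
devs = blockwise_cv(X, y, n_blocks=10)
print(devs.round(1))
```

Each entry of `devs` corresponds to one held-out block, so the ten values play the role of the ten points per estimator in the deviance panels.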
Panels (German-Metric): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 10 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
                                        regressors
dataset name   sample   defaults   continuous   discrete   categorical
Australian        678     55.90%            3          1             8
(as half of the observations have the same value)
and gam::gam
number of CV subsamples is rather small!)
gam::gam (less than a second, though)
estimation:
                                        regressors
dataset name   sample   defaults   continuous   discrete   categorical
French           8178      5.86%            5          3            15
validation instead
known to have nonlinear form in a GAM
a nonlinear function be nonlinear
performance for both mgcv::gam and gam::gam
gam::gam (for the largest model: 20-40 sec. on a 3 GHz Intel CPU for the subsamples of about 7800 observations)
in Müller and Härdle (2003), here used for estimation:
                                        regressors
dataset name   sample   defaults   continuous   discrete   categorical
UC2005           5058     23.92%           12          3            21
subsamples for validation
“proof-of concept”
additive functions)
gam::gam (for the largest model: 5-8 min on a 3 GHz Intel CPU for up to 400 seconds for the subsamples of about 4800 observations)
β1 = 1, β2 = −1, m(t) = 1.5 cos(πt) + c
X1, U, T ∼ Uniform[−1, 1], X2 ∼ m(ρT + (1 − ρ)U) (centered)
nsim = 1000, n ∈ {100, 1000, 10000}, ρ ∈ {0.0, 0.7}, c ∈ {0, −1, −2}
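The simulation design above can be sketched as a data generator. This is only an illustration in Python with NumPy under several assumptions that the slide text does not confirm: the response is taken to be binary with a logit link and index β1 X1 + β2 X2 + m(T), "centered" is read as subtracting the sample mean of X2, and the function names `m` and `simulate` are made up here.

```python
import numpy as np

def m(t, c):
    """Nonparametric component from the slide: m(t) = 1.5 cos(pi t) + c."""
    return 1.5 * np.cos(np.pi * t) + c

def simulate(n, rho, c, rng, beta1=1.0, beta2=-1.0):
    """Draw one sample of size n from the design on the slide.

    Assumptions (not stated in the extracted text): binary response,
    logit link, index beta1*X1 + beta2*X2 + m(T), X2 centered at its
    sample mean.
    """
    x1 = rng.uniform(-1.0, 1.0, n)
    u = rng.uniform(-1.0, 1.0, n)
    t = rng.uniform(-1.0, 1.0, n)
    x2 = m(rho * t + (1.0 - rho) * u, c)    # rho > 0 makes X2 depend on T
    x2 = x2 - x2.mean()                     # centering (assumed: sample mean)
    eta = beta1 * x1 + beta2 * x2 + m(t, c)
    p = 1.0 / (1.0 + np.exp(-eta))          # assumed logit link
    y = rng.binomial(1, p)
    return x1, x2, t, y

rng = np.random.default_rng(42)
x1, x2, t, y = simulate(n=1000, rho=0.7, c=-1.0, rng=rng)
print(y.mean())                             # share of ones in this sample
```

With ρ = 0.0 the regressor X2 is built from U alone and is independent of T; with ρ = 0.7 it is a nonlinear function of a mixture of T and U, so X2 and T are dependent.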
Figures (simulation results, three slides): comparison of gam and mgcv; each slide shows two rows of panels with MSE m(t), MSE beta1 and MSE beta2.
Figure: Comparison of gam and mgcv; two rows of panels with Deviances, Accuracy Ratios (AR) and Kolmogorov Stats (T).
In fact, most of the gam::gam deviances are larger here than the mgcv::gam deviances, and the gam::gam fits have smaller discriminatory power.
Figure: Estimation Time for gam and mgcv (two panels).
Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2004). Nonparametric and Semiparametric Modeling: An Introduction, Springer, New York.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, Vol. 43 of Monographs on Statistics and Applied Probability, Chapman and Hall, London.
Müller, M. (2001). Estimation and testing in generalized partial linear models: a comparative study, Statistics and Computing 11: 299–309.
Müller, M. and Härdle, W. (2003). Exploring credit data, in G. Bol, G. Nakhaeizadeh, S. Rachev, T. Ridder and K.-H. Vollmer (eds), Credit Risk - Measurement, Evaluation and Management, Physica-Verlag.
Speckman, P. E. (1988). Regression analysis for partially linear models, Journal of the Royal Statistical Society, Series B 50: 413–436.
Wood, S. N. (2006). Generalized Additive Models: An Introduction with R, Texts in Statistical Science, Chapman and Hall, London.
Figure (Australian): Estimated additive functions for the variables A2, A3 and A7 (mgcv and blue: gam); vertical axes labelled s(A2,1), s(A3,4.35) and s(A7,6.27).
Panels (Australian): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 7 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
Panels (Australian-Metric): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 7 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
Figure (French): Estimated additive functions for the variables X1, X2, X3, X4 and X6 (mgcv and blue: gam); vertical axes labelled s(X1,1), s(X2,1), s(X3,1), s(X4,5.19) and s(X6,3.34).
Panels (French): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
Panels (French-Signif): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
Panels (French-Metric): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
Figure (UC2005): Estimated additive functions for the variables X1, X4, X5, X13, X14, X15, X26, X28, X29, X30, X33 and X37 (mgcv and blue: gam); vertical axes labelled s(X1,2.13), s(X4,1.84), s(X5,5.24), s(X13,4.43), s(X14,2.11), s(X15,2.77), s(X26,1.64), s(X28,4.43), s(X29,2.6), s(X30,4.43), s(X33,7.99) and s(X37,2.1).
Panels (UC2005): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
Panels (UC2005-Metric): Accuracy Ratios (AR), gam vs. mgcv (AR), Deviances and Estimation Times, by sample, for the estimators logit1, logit2, logit3, logitc, gam and mgcv.
Figure: Out-of-sample comparison (blockwise CV with 20 blocks) for various estimators: accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)