SLIDE 1

A case study on using generalized additive models to fit credit rating scores

Marlene Müller, marlene.mueller@itwm.fraunhofer.de

This version: July 8, 2009, 14:32

SLIDE 2

Contents

  • Application: Credit Rating
  • Aim of this Talk
  • Case Study
      – German Credit Data
      – Australian Credit Data
      – French Credit Data
      – UC2005 Credit Data
  • Simulation Study
  • Conclusions
  • Appendix: Further Plots (Australian, French and UC2005 Credit Data)
SLIDE 3

Application: Credit Rating

  • Basel II: capital requirements of a bank are adapted to the individual credit portfolio
  • core terms: determine a rating score and subsequently default probabilities (PDs) as a function of some explanatory variables
  • further terms: loss given default, portfolio dependence structure
  • in practice: often classical logit/probit-type models to estimate linear predictors (scores) and probabilities (PDs)
  • statistically: a 2-group classification problem

risk management issues

  • credit risk is only one part of a bank’s total risk: it will be aggregated with other risks
  • credit risk estimation from historical data: stress tests simulate future extreme situations; the rating system may need to be easily adapted to possible future changes; there is a possible need to extrapolate to segments without observations

SLIDE 5

(Simplified) Development of Rating Score and Default Probability

raw data:
  X_j: measurements of several variables (“risk factors”)

(nonlinear) transformation:
  X_j → X̃_j = m_j(X_j)
  handles outliers, allows for a nonlinear dependence on the raw risk factors

rating score:
  S = w_1·X̃_1 + … + w_d·X̃_d

default probability:
  PD = P(Y = 1 | X) = G(w_1·X̃_1 + … + w_d·X̃_d)
  (where G is e.g. the logistic or Gaussian cdf: logit or probit)
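A minimal numerical sketch of the two formulas above (the transformations m_j, the weights w_j and the example inputs are hypothetical; in the case study these components are fitted by the GAM estimators discussed later):

```python
import math

def logistic_cdf(u):
    # G(u) = 1 / (1 + exp(-u)): inverse link of the logit model
    return 1.0 / (1.0 + math.exp(-u))

# hypothetical fitted transformations m_j of the raw risk factors
transforms = [
    lambda age: -0.01 * age,                # illustrative: older -> slightly lower risk
    lambda amount: 0.3 * math.log(amount),  # log-transform to handle outliers
]
weights = [1.0, 1.0]  # score weights w_j (hypothetical)

def rating_score(x):
    # S = w_1 * m_1(X_1) + ... + w_d * m_d(X_d)
    return sum(w * m(xj) for w, m, xj in zip(weights, transforms, x))

def default_probability(x):
    # PD = G(S), with G the logistic cdf (logit case)
    return logistic_cdf(rating_score(x))

x = [35, 5000]  # age, credit amount (hypothetical applicant)
pd_val = default_probability(x)
```

With a probit model one would only swap `logistic_cdf` for the Gaussian cdf; the score construction is unchanged.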

SLIDE 10

Aim of this Talk

case study on (cross-sectional) rating data

  • compare different approaches to generalized additive models (GAM)
  • consider models that allow for additional categorical variables / partial linear terms (combination of GAM/GPLM)

generalized additive models allow for a simultaneous fit of the transformations from the raw data, the linear rating score and the default probabilities
SLIDE 11

Outline of the Study

credit data case study: 4 credit datasets

  dataset      sample   defaults   regressors (continuous / discrete / categorical)
  German        1000    30.00%      3 /  – / 17
  Australian     678    55.90%      3 /  1 /  8
  French        8178     5.86%      5 /  3 / 15
  UC2005        5058    23.92%     12 /  3 / 21

  • differences between the different approaches?
  • improvement of default predictions?

simulation study: comparison of additive model (AM) and GAM fits

  • differences between the different approaches?
  • what if regressors show concurvity (the nonlinear analogue of multicollinearity)?
  • do sample size and default rate matter?
SLIDE 12

Generalized Additive Model

logit/probit are special cases of the generalized linear model (GLM)

  E(Y|X) = G(X⊤β)

“classic” generalized additive model

  E(Y|X) = G{ c + ∑_{j=1}^p m_j(X_j) },   m_j nonparametric

generalized additive partial linear model (semiparametric GAM)

  E(Y|X1, X2) = G{ c + X1⊤β + ∑_{j=1}^p m_j(X_{2j}) },   X1⊤β linear part, m_j nonparametric

  • allows for known transformation functions
  • allows one to add / control for categorical regressors
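The semiparametric structure can be illustrated with a toy evaluation of the GAPLM mean function (all components below are hypothetical stand-ins for fitted ones; in the case study the m_j come from gam::gam or mgcv::gam):

```python
import math

def gaplm_mean(x1, x2, beta, m_funcs, c=0.0):
    """E(Y|X1,X2) = G{ c + X1'beta + sum_j m_j(X2_j) } with logistic G."""
    eta = c + sum(b * v for b, v in zip(beta, x1))   # linear part X1'beta
    eta += sum(m(v) for m, v in zip(m_funcs, x2))    # nonparametric additive part
    return 1.0 / (1.0 + math.exp(-eta))              # G = logistic cdf

# categorical regressors enter the linear part as dummies,
# continuous regressors get smooth functions m_j (hypothetical here)
beta = [0.5, -0.2]                                   # coefficients for dummy-coded X1
m_funcs = [lambda t: math.sin(t), lambda t: t * t]   # stand-ins for fitted smooths
p = gaplm_mean(x1=[1, 0], x2=[0.3, -1.2], beta=beta, m_funcs=m_funcs)
```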

SLIDE 15

“Standard” Tools in R

two main approaches for GAM in R:

  • gam::gam — backfitting with local scoring (Hastie and Tibshirani; 1990)
  • mgcv::gam — penalized regression splines (Wood; 2006)

compare these procedures under the default settings of gam::gam and mgcv::gam

competing estimators:

  • logit: binary GLM with G(u) = 1/{1 + exp(−u)} (logistic cdf as link)
  • logit2, logit3: binary GLM with 2nd/3rd order polynomial terms for the continuous regressors
  • logitc: binary GLM with the continuous regressors categorized (4–5 levels)
  • gam: binary GAM using gam::gam with s() terms for the continuous regressors
  • mgcv: binary GAM using mgcv::gam
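The logit variants differ only in how the continuous regressors enter the design matrix; a small plain-Python sketch (hypothetical breakpoints and values) of the polynomial expansion behind logit2/logit3 and the categorization behind logitc:

```python
def poly_terms(x, degree):
    # logit2 / logit3: expand a continuous regressor into x, x^2, (x^3) columns
    return [x ** d for d in range(1, degree + 1)]

def categorize(x, breaks):
    # logitc: replace a continuous regressor by dummies for 4-5 intervals;
    # breaks must be sorted ascending
    k = sum(x > b for b in breaks)        # interval index 0 .. len(breaks)
    dummies = [0] * (len(breaks) + 1)
    dummies[k] = 1
    return dummies[1:]                    # drop the first level as reference

row_logit3 = poly_terms(2.0, 3)                   # [2.0, 4.0, 8.0]
row_logitc = categorize(30.0, [25, 35, 45, 60])   # age falls in (25, 35]
```

Both rows would then be handed to an ordinary binary GLM fit, so the comparison isolates the effect of the regressor representation.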

SLIDE 16

German Credit Data

from http://www.stat.uni-muenchen.de/service/datenarchiv/kredit/kredit_e.html

  dataset   sample   defaults   regressors (continuous / discrete / categorical)
  German     1000    30.00%      3 / – / 17

3 continuous regressors: age, amount, duration (time to maturity)
use 10 CV subsamples for validation
stratified data (true default rate ≈ 5%)

important findings:

  • some observation(s) seem to confuse mgcv::gam in one CV subsample (→ see following slides)
  • however, mgcv::gam seems to improve deviance and discriminatory power w.r.t. gam::gam
  • estimation times of mgcv::gam are between 4 and 7 times higher than for gam::gam (not more than around a second, though)
  • if we only use the continuous regressors, both GAM estimators are comparable to a logit with cubic additive functions

SLIDE 18

German Credit Data: Additive Functions

Figure: estimated additive functions for age (s(age,1)), amount (s(amount,4.49)) and duration (s(duration,1)); mgcv::gam and, in blue, gam::gam
SLIDE 19

How to Compare Binary GLM Fits?

preferably by out-of-sample validation; block cross-validation approach:
  leave out subsamples of x% from the fitting procedure, estimate from the remaining (100−x)% and calculate the validation criteria on the x% left out

two criteria for comparison: deviance (→ goodness of fit) and accuracy ratios AR from CAP curves (→ discriminatory power)

CAP curve (Lorenz curve) and the accuracy ratio AR:

  • plot the empirical cdf of the fitted scores against the empirical cdf of the fitted scores in the default sample (precisely 1 − F̂ vs. 1 − F̂(·|Y = 1))
  • AR is the area between the CAP curve and the diagonal, in relation to the corresponding area for the best possible CAP curve (best possible ≙ perfect separation)
  • relation to ROC: the ROC curve compares F̂(·|Y = 0) and F̂(·|Y = 1), and it holds that AR = 2·AUC − 1

Figure: CAP curve and best possible CAP curve, plotting the percentage of defaults against the percentage of applicants
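The relation AR = 2·AUC − 1 makes the accuracy ratio easy to compute from fitted scores; a sketch with hypothetical scores (defaults oriented towards higher scores), computing AUC by the Mann–Whitney counting method:

```python
def auc(scores, labels):
    # Mann-Whitney AUC: probability that a randomly chosen default
    # scores higher than a randomly chosen non-default (ties count 1/2)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_ratio(scores, labels):
    # AR = 2*AUC - 1: area between CAP curve and diagonal,
    # relative to the best possible (perfect-separation) CAP curve
    return 2.0 * auc(scores, labels) - 1.0

# hypothetical fitted scores and default indicators
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0,   1,   0]
ar = accuracy_ratio(scores, labels)
```

Perfect separation gives AR = 1, a random score gives AR ≈ 0, matching the geometric definition on the slide.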

SLIDE 21

German Credit Data: Comparison

Figure: out-of-sample comparison (blockwise CV with 10 blocks) for the various estimators; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
SLIDE 22

German Credit Data: Models with only Continuous Regressors

Figure: out-of-sample comparison (blockwise CV with 10 blocks) for the various estimators on the continuous regressors only; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
SLIDE 23

Australian Credit Data

from http://archive.ics.uci.edu/ml/datasets/Statlog+(Australian+Credit+Approval)

used for estimation:

  dataset      sample   defaults   regressors (continuous / discrete / categorical)
  Australian     678    55.90%      3 / 1 / 8

use only 7 CV subsamples for validation

  • the original A13 and A14 were dropped since they are actually multicollinear with A10; some observations were dropped because they fell into very sparse categories
  • A10 was transformed to log(1 + A10), but nevertheless used only as a linear predictor (as half of the observations have the same value)

important findings:

  • essentially, the estimated additive function for A2 differs between mgcv::gam and gam::gam
  • gam::gam mostly outperforms all other estimates (recall, however, that the number of CV subsamples is rather small!)
  • estimation times of mgcv::gam are around 3 to 5 times higher than for gam::gam (less than a second, though)
SLIDE 24

French Credit Data

the data were already analyzed with GPLMs in Müller and Härdle (2003); here used for estimation:

  dataset   sample   defaults   regressors (continuous / discrete / categorical)
  French     8178     5.86%      5 / 3 / 15

use the same preprocessing as in Müller and Härdle (2003); the original estimation + validation samples were merged, and 20 CV subsamples are used for validation instead

continuous variables are X1, X2, X3, X4 and X6; in particular, X3, X4 and X6 are known to have nonlinear form in a GAM

important findings:

  • it is confirmed that the additive functions for X3, X4 and X6 should be modelled as nonlinear
  • again, observation(s) "confusing" mgcv::gam in one of the subsamples
  • all estimates show similar discriminatory power, though with a slightly better performance for both mgcv::gam and gam::gam
  • estimation times of mgcv::gam are around 15 to 24 times higher than for gam::gam (for the largest model: 20–40 sec. on a 3 GHz Intel CPU for subsamples of about 7800 observations)
SLIDE 25

UC2005 Credit Data

data from the 2005 UC data mining competition, here used for estimation:

  dataset   sample   defaults   regressors (continuous / discrete / categorical)
  UC2005     5058    23.92%     12 / 3 / 21

the original estimation + validation + quiz samples were merged; again, 20 CV subsamples are used for validation

stratified data (true default rate ≈ 5%)
several of the variables have been preprocessed with a log-transform or to binary
in general, the data haven’t been very carefully analysed; their use is rather meant as a “proof of concept”

important findings:

  • there are again observations "confusing" mgcv::gam in one of the subsamples
  • the performance of mgcv::gam and gam::gam is very similar and outperforms the other approaches (closest to them is the logit fit with cubic additive functions)
  • estimation times of mgcv::gam are around 8 to 40 times higher than for gam::gam (for the largest model: 5–8 min on a 3 GHz Intel CPU for subsamples of about 4800 observations)
SLIDE 26

Simulation Study for (G)PLM

  E(Y|X, T) = G{β1·X1 + β2·X2 + m(T)}   (G = identity for the PLM, logistic cdf for the GPLM)

which of the (G)AM estimators is preferable ...?

  ... to fit the additive component functions and/or the regression function?
  ... w.r.t. discriminatory power in the GPLM/GAM cases?
  ... from a practical point of view (computational speed, numerical stability etc.)?

simulation setup:

  β1 = 1, β2 = −1, m(t) = 1.5 cos(πt) + c
  X1, U, T ∼ Uniform[−1, 1], X2 ∼ m(ρT + (1 − ρ)U) (centered)
  nsim = 1000, n ∈ {100, 1000, 10000}, ρ ∈ {0.0, 0.7}, c ∈ {0, −1, −2}

X2 and T are nonlinearly dependent (if ρ = 0.7) or independent otherwise
sample size n goes up to 10000, which is a possible size for credit data
the intercept c controls the default rate (15%–50%) in the GPLM
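The setup above can be sketched as follows (a plain-Python draft; the exact centering convention and random-number generation of the study are assumptions):

```python
import math
import random

def simulate_gplm(n, rho=0.7, c=-2.0, beta1=1.0, beta2=-1.0, seed=1):
    """Draw one (G)PLM sample: Y ~ Bernoulli(G(beta1*X1 + beta2*X2 + m(T)))."""
    rng = random.Random(seed)

    def m(t):
        return 1.5 * math.cos(math.pi * t) + c

    X1 = [rng.uniform(-1, 1) for _ in range(n)]
    U = [rng.uniform(-1, 1) for _ in range(n)]
    T = [rng.uniform(-1, 1) for _ in range(n)]
    # X2 depends nonlinearly on T when rho > 0 (concurvity with m(T))
    X2 = [m(rho * t + (1 - rho) * u) for t, u in zip(T, U)]
    mean2 = sum(X2) / n
    X2 = [x - mean2 for x in X2]          # center X2 as in the setup
    eta = [beta1 * x1 + beta2 * x2 + m(t) for x1, x2, t in zip(X1, X2, T)]
    pd = [1.0 / (1.0 + math.exp(-e)) for e in eta]   # logistic G for the GPLM
    Y = [1 if rng.random() < p else 0 for p in pd]
    return X1, X2, T, Y, pd

X1, X2, T, Y, pd = simulate_gplm(n=1000)
```

With c = −2 the simulated default rate lands near the lower end of the 15%–50% range quoted above; c = 0 pushes it towards 50%.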

SLIDE 29

Simulation Study: Additive Components for GPLM

Figure: boxplots of the MSE for m(t), beta1 and beta2, gam vs. mgcv; upper row: ρ = 0.7, c = −2, n = 1000; lower row: ρ = 0.7, c = −2, n = 10000
SLIDE 30

Simulation Study: Independent Components vs. Dependent

Figure: boxplots of the MSE for m(t), beta1 and beta2, gam vs. mgcv, n = 10000; panels for c = 0 and c = −2
SLIDE 31

Simulation Study: Comparison with Components for PLM

Figure: boxplots of the MSE for m(t), beta1 and beta2, gam vs. mgcv, in the PLM case with ρ = 0.7, c = 0; upper row: n = 1000, lower row: n = 10000
SLIDE 32

Simulation Study: Deviance and Discriminatory Power for GPLM

Figure: boxplots of deviances, accuracy ratios (AR) and Kolmogorov statistics (T), gam vs. mgcv, ρ = 0.7, c = −2; upper row: n = 1000, lower row: n = 10000

in fact, most of the gam::gam deviances are larger here than the mgcv::gam deviances, and the gam::gam fits have smaller discriminatory power
SLIDE 33

Simulation Study: Estimation Times for GPLM

Figure: boxplots of estimation times (in sec. on a Xeon 2.50 GHz), gam vs. mgcv, ρ = 0.7, c = −2; left: n = 1000, right: n = 10000
SLIDE 34

Conclusions

typically, categorical regressors improve the fit significantly; estimation methods for credit data should therefore use them adequately

backfitting + local scoring (gam::gam) provides fast and numerically stable results

there is, however, clear indication that penalized regression splines (mgcv::gam) may provide more precise estimates of the additive component functions; their current drawbacks are:

  • estimation time (increasing with model complexity and categorical variables)
  • the effects are to be seen only in large samples

thus: no clear recommendation, no “ultimate method”; clearly topics for more research
SLIDE 35

References

Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2004). Nonparametric and Semiparametric Modeling: An Introduction, Springer, New York.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, Vol. 43 of Monographs on Statistics and Applied Probability, Chapman and Hall, London.

Müller, M. (2001). Estimation and testing in generalized partial linear models — a comparative study, Statistics and Computing 11: 299–309.

Müller, M. and Härdle, W. (2003). Exploring credit data, in G. Bol, G. Nakhaeizadeh, S. Rachev, T. Ridder and K.-H. Vollmer (eds), Credit Risk - Measurement, Evaluation and Management, Physica-Verlag.

Speckman, P. E. (1988). Regression analysis for partially linear models, Journal of the Royal Statistical Society, Series B 50: 413–436.

Wood, S. N. (2006). Generalized Additive Models: An Introduction with R, Texts in Statistical Science, Chapman and Hall, London.
SLIDE 36

Australian Credit Data: Additive Functions

Figure: estimated additive functions for A2 (s(A2,1)), A3 (s(A3,4.35)) and A7 (s(A7,6.27)); mgcv::gam and, in blue, gam::gam
SLIDE 37

Australian Credit Data: Comparison

Figure: out-of-sample comparison (blockwise CV with 7 blocks) for the various estimators; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
SLIDE 38

Australian Credit Data: Models with only Continuous Regressors

Figure: out-of-sample comparison (blockwise CV with 7 blocks) for the various estimators on the continuous regressors only; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
SLIDE 39

French Credit Data: Additive Functions

Figure: estimated additive functions for X1 (s(X1,1)), X2 (s(X2,1)), X3 (s(X3,1)), X4 (s(X4,5.19)) and X6 (s(X6,3.34)); mgcv::gam and, in blue, gam::gam
SLIDE 40

French Credit Data: Comparison

Figure: out-of-sample comparison (blockwise CV with 20 blocks) for the various estimators; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
SLIDE 41

French Credit Data: Models with only Significant Regressors

Figure: out-of-sample comparison (blockwise CV with 20 blocks) for the various estimators on the significant regressors only; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
SLIDE 42

French Credit Data: Models with only Metric Regressors

Figure: out-of-sample comparison (blockwise CV with 20 blocks) for the various estimators on the metric regressors only; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
SLIDE 43

UC2005 Credit Data: Additive Functions

Figure: estimated additive functions for X1, X4, X5, X13, X14, X15, X26, X28, X29, X30, X33 and X37; mgcv::gam and, in blue, gam::gam
SLIDE 44

UC2005 Credit Data: Comparison

Figure: out-of-sample comparison (blockwise CV with 20 blocks) for the various estimators; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)
SLIDE 45

UC2005 Credit Data: Models with only Metric Regressors

Figure: out-of-sample comparison (blockwise CV with 20 blocks) for the various estimators on the metric regressors only; accuracy ratios from CAP curves (upper panels), deviance values and estimation times (lower panels)