Advances in Credit Scoring: combining performance and interpretation in Kernel Discriminant Analysis.
Caterina Liberati
DEMS Università degli Studi di Milano-Bicocca, Milan, Italy
caterina.liberati@unimib.it
Liberati November 10th 2017 1 / 36
Motivation
Kernel-Induced Feature Space
[Figure: scatter of data points in the (x1, x2) input plane.]
Kernel-Induced Feature Space
The complexity of the target function to be learned depends on the way it is represented, and the difficulty of the learning task can vary accordingly (figure from Schölkopf and Smola (2002)). For the degree-2 polynomial kernel, the explicit feature map is

\varphi : \mathbb{R}^2 \to \mathbb{R}^3, \qquad (x_1, x_2) \mapsto (z_1, z_2, z_3) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)

and the inner product in feature space reduces to a function of the input-space inner product:

\langle \varphi(x), \varphi(z) \rangle = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)\,(z_1^2, \sqrt{2}\,z_1 z_2, z_2^2)' = \big((x_1, x_2)(z_1, z_2)'\big)^2 = (x \cdot z)^2
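The identity above can be checked numerically with a short sketch (the test values are illustrative):

```python
import numpy as np

def phi(x):
    """Explicit feature map R^2 -> R^3 of the degree-2 polynomial kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
lhs = phi(x) @ phi(z)          # inner product computed in feature space
rhs = (x @ z) ** 2             # kernel evaluated in input space
assert np.isclose(lhs, rhs)    # <phi(x), phi(z)> = (x . z)^2
```

This is the "kernel trick" in miniature: the 3-dimensional inner product never has to be formed explicitly.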
Kernel-Induced Feature Space
In the feature space induced by φ, the discriminant direction ω maximizes the Fisher criterion, the ratio of between-group to within-group variance,

J(\omega) = \frac{\omega' S_B^{\Phi} \omega}{\omega' S_W^{\Phi} \omega} = \frac{B_\omega}{W_\omega},

with ω restricted to the span of the n mapped training points.
Kernel-Induced Feature Space
By the kernel trick, the projected scatters S^Φ_B and S^Φ_W can be easily written in terms of the expansion coefficients α:

B_\omega = \alpha' M \alpha, \qquad M = (M_1 - M_2)(M_1 - M_2)', \qquad (M_g)_i = \frac{1}{n_g} \sum_{k=1}^{n_g} k(x_i, x_k^g), \quad g = 1, 2,

W_\omega = \alpha' N \alpha, \qquad N = \sum_{g=1}^{2} K_g (I - L_g) K_g',

where K_g is the n × n_g matrix with entries (K_g)_{i,k} = k(x_i, x_k^g) and L_g is the n_g × n_g matrix with all entries equal to 1/n_g.
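With these expressions, the leading discriminant can be computed from the Gram matrix alone. A minimal numerical sketch (the function names, the ridge level `reg`, and the rank-one shortcut solving N⁻¹(M₁ − M₂) are illustrative assumptions, not the authors' code):

```python
import numpy as np

def kfda_direction(X1, X2, kernel, reg=1e-3):
    """Kernel Fisher discriminant: maximize (a'Ma)/(a'Na) over the
    expansion coefficients a, where w = sum_i a_i phi(x_i)."""
    X = np.vstack([X1, X2])
    n, n1, n2 = len(X), len(X1), len(X2)
    K = np.array([[kernel(x, z) for z in X] for x in X])   # n x n Gram matrix
    K1, K2 = K[:, :n1], K[:, n1:]                          # kernel columns per group
    M1, M2 = K1.mean(axis=1), K2.mean(axis=1)              # (M_g)_i = (1/n_g) sum_k k(x_i, x_k^g)
    N = sum(Kg @ (np.eye(ng) - np.full((ng, ng), 1.0 / ng)) @ Kg.T
            for Kg, ng in [(K1, n1), (K2, n2)])
    N += reg * np.eye(n)                                   # regularize N before inversion
    # M = (M1-M2)(M1-M2)' has rank one, so the leading generalized
    # eigenvector of (M, N) is proportional to N^{-1}(M1 - M2)
    alpha = np.linalg.solve(N, M1 - M2)
    return alpha, X

def project(alpha, X, kernel, xnew):
    """Score of a new point: sum_i alpha_i k(x_i, xnew)."""
    return sum(a * kernel(x, xnew) for a, x in zip(alpha, X))
```

On two well-separated groups, the projections of the two classes fall on opposite sides of the midpoint between the group means.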
Kernel-Induced Feature Space
The same discriminant can be obtained as the solution of a constrained optimization problem. Setting the partial derivatives of the Lagrangian L(ω, b, ξ, α) to zero yields

\frac{\partial L}{\partial \omega} = 0 \;\Rightarrow\; \omega = \sum_{i=1}^{n} \alpha_i \varphi(x_i), \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i = 0,

while ∂L/∂ξ = 0 and ∂L/∂α_i = 0 return the remaining stationarity and constraint conditions, so the solution again lies in the span of the mapped training points.
Kernel-Induced Feature Space
Several kernels are compared, each with a scale parameter c > 0 (or a degree d for the polynomial kernel):

Cauchy (CAU):        k(x, z) = 1 / (1 + ||x − z||² / c)
Laplace (LAP):       k(x, z) = exp(−||x − z|| / c)
Multiquadric (MUL):  k(x, z) = sqrt(||x − z||² + c²)
Gaussian RBF (RBF):  k(x, z) = exp(−||x − z||² / (2c²))
Polynomial (POLY):   k(x, z) = (x · z)^d

The within-group matrix N in W_ω = α′Nα is typically ill-conditioned, so it is regularized before inversion, N_REG = N + λI (Friedman 1989; Mika et al. 2003), and the kernel, its parameter, and λ are chosen by model selection (SEL).
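The kernels above translate directly into code. A sketch, with the functional forms reconstructed from the slide fragments, so treat them as assumptions:

```python
import numpy as np

# Candidate kernels compared in the study; c > 0 is the scale parameter
# tuned by model selection, d the polynomial degree.
def cauchy(x, z, c=1.0):
    return 1.0 / (1.0 + np.sum((x - z) ** 2) / c)

def laplace(x, z, c=1.0):
    return np.exp(-np.linalg.norm(x - z) / c)

def multiquadric(x, z, c=1.0):
    return np.sqrt(np.sum((x - z) ** 2) + c ** 2)

def rbf(x, z, c=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * c ** 2))

def poly(x, z, d=2):
    return float(np.dot(x, z)) ** d
```

Note the different behavior at x = z: the Cauchy, Laplace, and RBF kernels all equal 1 there, while the multiquadric equals c and the polynomial equals ||x||^(2d).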
Our Proposal
Examples
Examples
Classifier  Parameter  Variables set              Error rate  AUC    Accuracy (good)  Accuracy (bad)
CAU         3.6786     CREDIT+ECO+MANAGE+SEMIO    0.186       0.850  0.863            0.619
LAP         3.6786     CREDIT+ECO+MANAGE+SEMIO    0.199       0.852  0.831            0.678
MUL         5.8893     CREDIT+ECO+MANAGE+SEMIO    0.220       0.873  0.769            0.826
RBF         3.6786     CREDIT+ECO+MANAGE+SEMIO    0.210       0.856  0.801            0.748
POLY        2          CREDIT+ECO+MANAGE+SEMIO    0.333       0.566  0.733            0.398
LDA         -          CREDIT+ECO+MANAGE+SEMIO    0.368       0.522  0.713            0.300
LR          -          CREDIT+ECO+MANAGE+SEMIO    0.159       0.522  0.936            0.458
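The reported metrics can be reproduced from a vector of scores and class labels. A small sketch; the convention that higher scores indicate "good" customers is an assumption for illustration:

```python
import numpy as np

def evaluate(scores, labels, threshold=0.0):
    """Error rate, AUC, and per-class accuracy.
    labels: 1 = good, 0 = bad; higher score = more likely good (assumed)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = (scores > threshold).astype(int)
    error_rate = np.mean(pred != labels)
    acc_good = np.mean(pred[labels == 1] == 1)
    acc_bad = np.mean(pred[labels == 0] == 0)
    # AUC via the Mann-Whitney statistic: P(score_good > score_bad),
    # with ties counted as one half
    good, bad = scores[labels == 1], scores[labels == 0]
    wins = ((good[:, None] > bad[None, :]).sum()
            + 0.5 * (good[:, None] == bad[None, :]).sum())
    auc = wins / (len(good) * len(bad))
    return error_rate, auc, acc_good, acc_bad
```

Reporting accuracy separately by class, as in the table, exposes classifiers such as LR that achieve a low overall error rate while misclassifying most of the bad customers.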
Examples
[Figure: Bad vs. Good groups.]
Examples
Score variable                       ΔR²     rW       b         p-value

MUL (R² = 0.986 on the training sample with CREDIT+ECO+MANAGE+SEMIO sets)
  Pc2                                0.177   18.40%    0.924    0.000
  Pc3                                0.701   71.50%    1.836    0.000
  Bureau                             0.080    8.90%    7.426    0.000

RBF (R² = 0.869 on the training sample with CREDIT+ECO+MANAGE+SEMIO sets)
  Pc2                                0.160   18.90%    0.020    0.000
  Pc3                                0.614   70.90%    0.040    0.000
  Bureau                             0.069    8.70%    0.160    0.000

POLY (R² = 0.682 on the training sample with CREDIT+ECO+MANAGE+SEMIO sets)
  Interests on financial asset (F.)  0.018    3.30%    0.766    0.000
  Total financial assets managed     0.059   11.20%    0.001    0.000
  Factor 3                           0.040    7.80%   62.885    0.000
  Factor 4                           0.009   15.90%  100.860    0.000
  Factor 13                          0.009   14.00%   43.619    0.000

LDA (R² = 1 on the training sample with CREDIT+ECO+MANAGE+SEMIO sets)
  Pc2                                0.176   18.10%    0.095    0.000
  Pc3                                0.712   71.60%    0.190    0.000
  Bureau                             0.082    9.00%    0.774    0.000

LR (R² = 0.394 on the training sample with CREDIT+ECO+MANAGE+SEMIO sets)
  Pc2                                0.093   16.60%    1.031    0.000
  Pc3                                0.298   63.10%    2.006    0.000
  Bureau                             0.039    6.80%    7.888    0.000
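The rW column follows Johnson's (2000) heuristic relative weights, cited in the references: the share of the model R² attributable to each predictor, computed through the best-fitting orthonormal approximation of X. A compact sketch (variable names are mine):

```python
import numpy as np

def relative_weights(X, y):
    """Johnson's (2000) relative weights.
    Returns the raw weights (summing to R^2) and the rescaled shares."""
    n = len(y)
    # scale so that Xs'Xs is the correlation matrix and ys'ys = 1
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) * np.sqrt(n))
    ys = (y - y.mean()) / (y.std() * np.sqrt(n))
    P, d, Qt = np.linalg.svd(Xs, full_matrices=False)
    Lam = Qt.T @ np.diag(d) @ Qt        # Lam @ Lam = corr(X); rows have unit norm
    beta = (P @ Qt).T @ ys              # regress y on Z = P Q' (orthonormal columns)
    eps = (Lam ** 2) @ (beta ** 2)      # raw weights; eps.sum() equals R^2
    return eps, eps / eps.sum()
```

Because the columns of Z are orthonormal and Λ² is the predictor correlation matrix, the raw weights sum exactly to the regression R², so the rescaled shares are directly comparable across classifiers.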
Examples
Selected score (AUC = 0.893):

score = 0.924·Pc2 + 1.836·Pc3 + 0.088·Bureau

where Pc2 = Duty/Pleasure, Pc3 = Attachment/Detachment, and Bureau is a measure of solvency.

[Figure: ROC curve of the score; true positive rate (sensitivity) vs. false positive rate (1 − specificity).]
Examples
ROC curves (AUC) on the training sets.

[Figure: ROC curves (sensitivity vs. 1 − specificity) for the Cauchy, Laplace, and RBF kernel discriminants, FLDA, and logistic regression.]
Examples
Characterization of the group by over-represented categories. Notation: n = sample size; n_q = number of instances sampled without replacement belonging to the q-th group; n_ν = number of instances with the ν-th category; n_νq = number of instances with the ν-th category in group q.

Category     % of category   % of category   % of group    v-test   p-value
             in group        in sample       in category
             (n_νq/n_q)      (n_ν/n)         (n_νq/n_ν)
CB2_az=1     76.08           28.52           59.37         38.42    0.000
CB1_az=1     55.31           23.83           51.64         26.39    0.000
CERI=1       40.02           17.31           51.45         21.09    0.000
CB2_coll=1   18.62            8.89           46.62         11.92    0.000
DOM4=1       46.04           31.60           32.43         11.47    0.000
DOM3=1       13.22            5.70           51.58         11.13    0.000
DOM2=2       23.56           13.91           37.70          9.98    0.000
DOM1=1       30.67           20.23           33.73          9.44    0.000
CB1_coll=1   14.30            7.96           39.95          8.26    0.000
ANAG=1       35.34           28.08           28.01          5.98    0.000
DOM2=1        4.41            2.22           44.14          5.09    0.000
ANAG=2       32.01           26.24           27.15          4.86    0.000
DOM1=2       29.59           24.96           26.38          3.96    0.000
DOM2=3       26.71           22.99           25.85          3.26    0.001
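The v-test column is the test value of Lebart et al. (2003): under the null of drawing n_q instances without replacement, the observed count n_νq follows a hypergeometric law and is standardized. A sketch using the usual normal approximation (an assumption; the exact hypergeometric tail can be substituted):

```python
import math

def v_test(n, n_q, n_v, n_vq):
    """Test value for over-representation of a category (n_v instances
    overall, n_vq in the group) within a group of n_q instances drawn
    without replacement from n. Normal approximation to the
    hypergeometric null; returns (v, one-sided p-value)."""
    expected = n_q * n_v / n
    variance = n_q * (n_v / n) * (1 - n_v / n) * (n - n_q) / (n - 1)
    v = (n_vq - expected) / math.sqrt(variance)
    p = 0.5 * math.erfc(v / math.sqrt(2))   # P(V >= v) under N(0, 1)
    return v, p
```

Large positive v-values flag categories concentrated in the group well beyond chance, which is how the rows of the table are ranked.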
Examples
Category     % of category   % of category   % of group    v-test   p-value
             in group        in set          in category
CB2_az=4     26.09           16.49           89.08         22.19    0.000
CB2_az=3     27.91           18.41           85.33         20.66    0.000
ANAG=4       28.87           21.15           76.82         15.53    0.000
DOM2=4       70.21           60.88           64.92         15.33    0.000
CERI=5       56.45           47.13           67.43         15.03    0.000
CB1_az=5     28.47           21.59           74.24         13.67    0.000
CERI=4       14.50            9.97           81.93         12.67    0.000
CB2_coll=4    8.53            5.48           87.59         11.45    0.000
CB1_az=4     22.11           16.91           73.61         11.34    0.000
DOM3=5       21.54           16.77           72.32         10.41    0.000
DOM4=5       21.01           16.35           72.34         10.27    0.000
CB2_az=5     22.64           17.85           71.41         10.18    0.000
DOM1=4       28.94           23.81           68.40          9.72    0.000
CB2_coll=3    8.67            6.14           79.48          8.73    0.000
CB1_coll=4    7.68            5.48           78.83          7.97    0.000
DOM4=4       17.60           14.23           69.62          7.81    0.000
DOM1=3       35.30           31.00           64.11          7.47    0.000
CB1_az=3     18.81           15.59           67.91          7.16    0.000
CB2_coll=2    7.93            6.54           68.20          4.49    0.000
DOM3=4        4.98            3.96           70.71          4.17    0.000
CB1_coll=2    7.15            5.96           67.45          3.99    0.000
DOM4=3       26.20           24.13           61.11          3.85    0.000
CB1_coll=3    6.40            5.40           66.67          3.51    0.000
CB2_az=2     20.33           18.73           61.11          3.27    0.001
ANAG=3       25.84           24.53           59.30          2.41    0.008
References

Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Proceedings of the 2nd International Symposium on Information Theory, Akademiai Kiado, Budapest, pp 267-281
Baesens B, Van Gestel T, Viaene S, Stepanova M, Suykens J, Vanthienen J (2003) Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society 54:627-635
Barakat N, Bradley AP (2010) Rule extraction from support vector machines: a review. Neurocomputing 74:178-190
Baudat G, Anouar F (2000) Generalized discriminant analysis using a kernel approach. Neural Computation 12:2385-2404
Bozdogan H, Sclove LS (1984) Multi-sample cluster analysis using Akaike's Information Criterion. Annals of the Institute of Statistical Mathematics
Haff LR (1980) Empirical Bayes estimation of the multivariate normal covariance matrix. The Annals of Statistics 8(3):586-597
Huang YM, Hung C, Jiau HC (2006) Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem. Nonlinear Analysis: Real World Applications 7:720-747
James W, Stein C (1961) Estimation with quadratic loss. In: Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, vol 1, Berkeley, pp 361-379
Johnson RM (1966) The minimal transformation to orthonormality. Psychometrika 31:61-66
Johnson J (2000) A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate Behavioral Research 35(1):1-19
Lebart L, Piron M, Steiner JF (2003) La Sémiométrie. Dunod
Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88(2):365-411
Liberati C, Camillo F, Saporta G (2017) Advances in credit scoring: combining performance and interpretation in kernel discriminant analysis. Advances in Data Analysis and Classification 11(1):121-138
Malhotra R, Malhotra DK (2003) Evaluating consumer loans using neural networks. Omega 31:83-96
Mercer J (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London 209:415-446
Mika S, Rätsch G, Weston J, Schölkopf B, Müller KR (2003) Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(5):623-628
Schölkopf B, Smola AJ (2002) Learning with Kernels. MIT Press, Cambridge, MA
Suykens J, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Processing Letters 9(3):293-300
Shurygin A (1983) The linear combination of the simplest discriminator and Fisher's one. In: Applied Statistics, Nauka, Moscow, pp 144-158 (in Russian)
Stein C (1975) Estimation of a covariance matrix. Rietz Lecture, 39th Annual Meeting IMS, Atlanta, GA
Vapnik V (1995) The Nature of Statistical Learning Theory. Springer, New York
Examples
Appendix: regularization of the covariance estimate. The ill-conditioned ML estimate is shrunk towards the well-conditioned target F = p^{-1} tr(Σ̂_MLE) I_p through the convex combination

\hat{\Sigma} = \frac{n}{n+m}\,\hat{\Sigma}_{MLE} + \left(1 - \frac{n}{n+m}\right) F,

where m controls the shrinkage intensity (Haff 1980; Ledoit and Wolf 2004).
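A sketch of a convex-combination shrinkage estimator of this form; the weights and the identity target on the slide are partly reconstructed, so treat the parametrization as an assumption:

```python
import numpy as np

def shrunk_covariance(X, m=10.0):
    """Shrink the ML covariance estimate towards the scaled identity
    F = tr(S)/p * I. m > 0 controls the shrinkage intensity."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                    # ML estimate (singular when p > n)
    F = np.trace(S) / p * np.eye(p)      # well-conditioned target
    w = n / (n + m)
    return w * S + (1 - w) * F           # convex combination of S and F
```

Even when p > n and S is singular, the result is symmetric positive definite, since the identity target contributes a strictly positive floor to every eigenvalue.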