Support Vector Machines for Bankruptcy Analysis


  1. SUPPORT VECTOR MACHINES FOR BANKRUPTCY ANALYSIS. Wolfgang HÄRDLE (2), Rouslan MORO (1, 2), Dorothea SCHÄFER (1). (1) Deutsches Institut für Wirtschaftsforschung (DIW); (2) Center for Applied Statistics and Economics (CASE), Humboldt-Universität zu Berlin. Corporate Bankruptcy Prediction with SVMs

  2. Motivation 2 Linear Discriminant Analysis. Fisher (1936); company scoring: Beaver (1966), Altman (1968). Z-score: $Z_i = a_1 x_{i1} + a_2 x_{i2} + \ldots + a_d x_{id} = a^\top x_i$, where $x_i = (x_{i1}, \ldots, x_{id})^\top \in \mathbb{R}^d$ are financial ratios for the $i$-th company. The classification rule: successful company if $Z_i \geq z$; failure if $Z_i < z$. Corporate Bankruptcy Prediction with SVMs
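
To make the Z-score concrete, here is a minimal sketch that fits Fisher's linear discriminant with scikit-learn and scores companies as $a^\top x_i$. The two "financial ratios", the sample sizes, and the threshold at zero are illustrative assumptions, not the authors' data.

```python
# Z-score sketch: fit a linear discriminant on synthetic financial ratios
# and score companies as a'x + b (all data here is illustrative).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two synthetic "financial ratios" per company, one cluster per class.
X_surviving = rng.normal(loc=[0.3, 0.1], scale=0.1, size=(100, 2))
X_failing = rng.normal(loc=[0.7, -0.1], scale=0.1, size=(100, 2))
X = np.vstack([X_surviving, X_failing])
y = np.array([1] * 100 + [-1] * 100)  # +1 = successful, -1 = failure

lda = LinearDiscriminantAnalysis().fit(X, y)
a = lda.coef_.ravel()          # discriminant coefficients a
z = X @ a + lda.intercept_[0]  # Z-score for each company
print("classified as successful:", (z >= 0).sum())
```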

  3. Motivation 3 Linear Discriminant Analysis. [Figure: scatter plot in the $(X_1, X_2)$ plane of surviving companies (o) and failing companies (x), separated by a linear discriminant boundary.] Corporate Bankruptcy Prediction with SVMs

  4. Motivation 4 Linear Discriminant Analysis. [Figure: distribution densities of the Z-score for failing and surviving companies.] Corporate Bankruptcy Prediction with SVMs

  5. Motivation 5 Company Data: Probability of Default. [Figure.] Source: Falkenstein et al. (2000). Corporate Bankruptcy Prediction with SVMs

  6. Motivation 6 RiskCalc Private Model. Moody's default model for private firms. A semi-parametric model based on the probit regression $E[y_i \mid x_i] = \Phi\{a_0 + \sum_{j=1}^{d} a_j f_j(x_{ij})\}$, where the $f_j$ are estimated non-parametrically on univariate models. Corporate Bankruptcy Prediction with SVMs
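
The actual $f_j$ transforms in RiskCalc are estimated non-parametrically and are proprietary. The sketch below is only a structural illustration: a rank transform stands in for each $f_j$, and the ratios and default flags are fully synthetic.

```python
# Sketch of a RiskCalc-style semi-parametric probit: each ratio x_j is
# passed through a univariate transform f_j (a rank transform stands in
# for the non-parametric estimate), then a probit links the sum to default.
import numpy as np
import statsmodels.api as sm
from scipy.stats import rankdata

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))            # synthetic financial ratios
y = (rng.random(500) < 0.1).astype(int)  # synthetic default flags

# f_j: univariate rank transforms (placeholder for the estimated f_j).
F = np.column_stack([rankdata(X[:, j]) / len(X) for j in range(X.shape[1])])
model = sm.Probit(y, sm.add_constant(F)).fit(disp=0)
pd_hat = model.predict(sm.add_constant(F))  # estimated default probabilities
print(pd_hat[:5])
```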

  7. Motivation 7 Linearly Non-separable Classification Problem. [Figure: scatter plot in the $(X_1, X_2)$ plane where surviving (o) and failing (x) companies overlap and cannot be separated by a single hyperplane.] Corporate Bankruptcy Prediction with SVMs

  8. Outline of the Talk 8 Outline: 1. Motivation 2. Support Vector Machines and their Properties 3. Expected Risk vs. Empirical Risk Minimization 4. Realization of an SVM 5. Non-linear Case 6. Company Classification and Rating with SVMs. Corporate Bankruptcy Prediction with SVMs

  9. Support Vector Machines and Their Properties 9 Support Vector Machines (SVMs). SVMs are a group of methods for classification (and regression) that use large-margin classifiers. ⊡ SVMs possess a flexible structure which is not chosen a priori. ⊡ The properties of SVMs can be derived from statistical learning theory. ⊡ SVMs do not rely on asymptotic properties; they are especially useful when $d/n$ is big, i.e. in most practically significant cases. ⊡ SVMs give a unique solution and outperform Neural Networks. Corporate Bankruptcy Prediction with SVMs

  10. Support Vector Machines and Their Properties 10 Classification Problem. Training set: $\{(x_i, y_i)\}_{i=1}^{n}$ with distribution $P(x, y)$. Find the class $y$ of a new object $x$ using the classifier $f: \mathbb{R}^d \mapsto \{+1, -1\}$ such that the expected risk $R(f)$ is minimal. $x_i \in \mathbb{R}^d$ is the vector of the $i$-th object's characteristics; $y_i \in \{-1, +1\}$ (or $\{0, 1\}$) is the class of the $i$-th object. Regression Problem: setup as for the classification problem, but $y \in \mathbb{R}$. Corporate Bankruptcy Prediction with SVMs

  11. Expected Risk vs. Empirical Risk Minimization 11 Expected Risk Minimization. The expected risk $R(f) = \int \frac{1}{2}|f(x) - y| \, dP(x, y) = E_{P(x,y)}[L(x, y)]$ is minimized with respect to $f$: $f_{opt} = \arg\min_{f \in \mathcal{F}} R(f)$, where the loss $L(x, y) = \frac{1}{2}|f(x) - y|$ equals $0$ if the classification is correct and $1$ if it is wrong. $\mathcal{F}$ is an a priori defined set of (non)linear classifier functions. Corporate Bankruptcy Prediction with SVMs

  12. Expected Risk vs. Empirical Risk Minimization 12 Empirical Risk Minimization. In practice $P(x, y)$ is usually unknown: use Empirical Risk Minimization (ERM) over the training set $\{(x_i, y_i)\}_{i=1}^{n}$: $\hat{R}(f) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}|f(x_i) - y_i|$ and $\hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}(f)$. Corporate Bankruptcy Prediction with SVMs
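
The empirical risk translates directly into code. In this sketch the linear classifier, its weights, and the noisy synthetic labels are all made up for illustration.

```python
# Empirical risk of a classifier f on a training set: the average 0/1 loss
# (1/n) * sum of (1/2)|f(x_i) - y_i| with labels in {-1, +1}.
import numpy as np

def empirical_risk(f, X, y):
    """Average misclassification rate of f over the sample."""
    return np.mean(0.5 * np.abs(f(X) - y))

# Illustrative linear classifier f(x) = sign(a'x + b) with made-up a, b.
a, b = np.array([1.0, -0.5]), 0.1
f = lambda X: np.sign(X @ a + b)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = np.sign(X @ a + rng.normal(scale=0.5, size=200))  # noisy labels
print("empirical risk:", empirical_risk(f, X, y))
```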

  13. Expected Risk vs. Empirical Risk Minimization 13 Empirical Risk vs. Expected Risk. [Figure: risk plotted over the function class, showing the expected risk $R(f)$, the empirical risk $\hat{R}(f)$, and their minimizers $f_{opt}$ and $\hat{f}_n$.] Corporate Bankruptcy Prediction with SVMs

  14. Expected Risk vs. Empirical Risk Minimization 14 Convergence. From the law of large numbers, $\lim_{n \to \infty} \hat{R}(f) = R(f)$. In addition, ERM satisfies $\lim_{n \to \infty} \min_{f \in \mathcal{F}} \hat{R}(f) = \min_{f \in \mathcal{F}} R(f)$ if "$\mathcal{F}$ is not too big". Corporate Bankruptcy Prediction with SVMs
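
A small simulation of the first limit: for a fixed classifier $f$, with label noise chosen so that the expected risk is known by construction (0.2 here), the empirical risk settles toward it as $n$ grows. All values are illustrative.

```python
# Law-of-large-numbers illustration: for a fixed classifier f, the
# empirical risk approaches the expected risk as n grows (synthetic data).
import numpy as np

rng = np.random.default_rng(3)
f = lambda X: np.sign(X[:, 0])  # fixed classifier: sign of the first feature
true_flip = 0.2                 # P(label disagrees with f), i.e. R(f), by design

for n in [100, 1_000, 10_000, 100_000]:
    X = rng.normal(size=(n, 2))
    flip = rng.random(n) < true_flip
    y = np.where(flip, -f(X), f(X))      # labels agree with f except on flips
    r_hat = np.mean(0.5 * np.abs(f(X) - y))
    print(f"n={n:>6}  empirical risk={r_hat:.4f}  (expected risk={true_flip})")
```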

  15. Expected Risk vs. Empirical Risk Minimization 15 Vapnik-Chervonenkis (VC) Bound. A basic result of Statistical Learning Theory (for linear classifiers): $R(f) \leq \hat{R}(f) + \phi\left(\frac{h}{n}, \frac{\ln(\eta)}{n}\right)$, where the bound holds with probability $1 - \eta$ and $\phi\left(\frac{h}{n}, \frac{\ln(\eta)}{n}\right) = \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}$. Corporate Bankruptcy Prediction with SVMs
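
The confidence term $\phi$ is easy to evaluate numerically. This sketch plugs in illustrative values of $h$, $n$, and $\eta = 0.05$ to show how the bound loosens with VC dimension and tightens with sample size.

```python
# Numeric sketch of the VC confidence term phi: it shrinks with the sample
# size n and grows with the VC dimension h (eta = 0.05 is illustrative).
import math

def phi(h, n, eta):
    """VC confidence term from the slide's bound (holds with prob. 1 - eta)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

for n in [100, 1_000, 10_000]:
    print(f"n={n:>5}  h=3:  phi={phi(3, n, 0.05):.3f}   "
          f"h=50: phi={phi(50, n, 0.05):.3f}")
```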

  16. Expected Risk vs. Empirical Risk Minimization 16 Structural Risk Minimization. Structural Risk Minimization: search over a nested model structure $S_{h_1} \subseteq S_{h_2} \subseteq \ldots \subseteq S_{h_k} \subseteq \mathcal{F}$ for the $f \in S_h$ that minimizes the upper bound on the expected risk. $h$ is the VC dimension; $S_h$ is a set of classifier functions of the same complexity described by $h$, e.g. $P(1) \subseteq P(2) \subseteq P(3) \subseteq \ldots \subseteq \mathcal{F}$, where $P(i)$ are polynomials of degree $i$. The functional class $\mathcal{F}$ is given a priori. Corporate Bankruptcy Prediction with SVMs
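
In practice an SRM-style search can be mimicked by fitting a nested family of polynomial-kernel SVMs of growing degree and selecting by validation error; this is a stand-in for minimizing the risk upper bound, not the talk's own procedure, and the data is synthetic.

```python
# SRM-flavoured sketch: fit a nested family P(1) ⊆ P(2) ⊆ ... of
# polynomial-kernel SVMs and pick the degree by validation error.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] - 0.5)   # non-linear ground truth
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for degree in [1, 2, 3, 5]:
    clf = SVC(kernel="poly", degree=degree, coef0=1.0).fit(X_tr, y_tr)
    print(f"degree {degree}: validation error ="
          f" {1 - clf.score(X_va, y_va):.3f}")
```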

  17. Expected Risk vs. Empirical Risk Minimization 17 Vapnik-Chervonenkis (VC) Dimension. Definition: $h$ is the VC dimension of a set of functions if there exists a set of points $\{x_i\}_{i=1}^{h}$ that can be separated in all $2^h$ possible configurations, and no set $\{x_i\}_{i=1}^{q}$ with $q > h$ has this property. Example 1: the functions $f = A\sin(\theta x)$ have an infinite VC dimension. Example 2: three points in the plane can be shattered by a set of linear indicator functions in $2^h = 2^3 = 8$ ways, whereas 4 points cannot be shattered in all $2^4 = 16$ ways; the VC dimension equals $h = 3$. Example 3: the VC dimension of $f = \{\text{hyperplanes in } \mathbb{R}^d\}$ is $h = d + 1$. Corporate Bankruptcy Prediction with SVMs
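
Example 2 can be verified by brute force: enumerate all $2^3$ labelings of three points in general position and check that each is linearly separable. The particular points and the solver settings are incidental assumptions of this sketch.

```python
# Brute-force check that 3 points in general position are shattered by
# linear classifiers: every one of the 2^3 = 8 labelings must be separable.
import itertools
import numpy as np
from sklearn.svm import LinearSVC

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # general position

shattered = True
for labels in itertools.product([-1, 1], repeat=3):
    y = np.array(labels)
    if len(set(labels)) == 1:
        continue  # a one-class labeling is trivially separable
    clf = LinearSVC(C=1e6, max_iter=100_000).fit(pts, y)
    if (clf.predict(pts) != y).any():
        shattered = False
print("3 points shattered by lines in R^2:", shattered)  # True: h = d + 1 = 3
```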

  18. Expected Risk vs. Empirical Risk Minimization 18 VC Dimension (d=2, h=3). [Figure: the eight labelings of three points in the plane, each separated by a line.] Corporate Bankruptcy Prediction with SVMs

  19. Realization of the SVM 19 Linearly Separable Case. The training set: $\{(x_i, y_i)\}_{i=1}^{n}$, $y_i \in \{+1, -1\}$, $x_i \in \mathbb{R}^d$. Find the classifier with the highest "margin": the gap between parallel hyperplanes separating the two classes in which the vectors of neither class can lie. Margin maximization minimizes the VC dimension. [Figure: linearly separable scatter of o's and x's.] Corporate Bankruptcy Prediction with SVMs

  20. Realization of the SVM 20 Linear SVMs: Separable Case. The margin is $d_+ + d_- = 2/\|w\|$. To maximize it, minimize the Euclidean norm $\|w\|$ subject to constraint (1). [Figure: separating hyperplane $x^\top w + b = 0$ with canonical hyperplanes $x^\top w + b = \pm 1$, margin $2/\|w\|$, and distances $d_+$ and $d_-$.] Corporate Bankruptcy Prediction with SVMs

  21. Realization of the SVM 21 Let $x^\top w + b = 0$ be a separating hyperplane. Then $d_+$ ($d_-$) is the shortest distance to the closest objects of class $+1$ ($-1$): $x_i^\top w + b \geq +1$ for $y_i = +1$ and $x_i^\top w + b \leq -1$ for $y_i = -1$. Combine them into one constraint: $y_i(x_i^\top w + b) - 1 \geq 0$, $i = 1, 2, \ldots, n$ (1). The canonical hyperplanes $x_i^\top w + b = \pm 1$ are parallel, and the distance between each of them and the separating hyperplane is $d_+ = d_- = 1/\|w\|$. Corporate Bankruptcy Prediction with SVMs
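
A quick numeric check of constraint (1) and the margin formula, with an illustrative $(w, b)$ already in canonical scaling; the points and weights are made up so that the closest points sit exactly on a canonical hyperplane.

```python
# Given a separating (w, b) rescaled to canonical form, the margin is
# 2/||w|| and every training point satisfies y_i (x_i'w + b) >= 1.
import numpy as np

w = np.array([2.0, 1.0])   # illustrative weights in canonical scaling
b = -1.5
X = np.array([[1.0, 1.0], [2.0, 2.0], [0.0, 0.0], [0.5, -0.5]])
y = np.array([1, 1, -1, -1])

print("constraint y_i(x_i'w + b) - 1 >= 0:", (y * (X @ w + b) - 1 >= 0).all())
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
```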

  22. Realization of the SVM 22 The Lagrangian Formulation. The Lagrangian for the primal problem: $L_P = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \{y_i(x_i^\top w + b) - 1\}$. The Karush-Kuhn-Tucker (KKT) conditions: $\partial L_P / \partial w_k = 0 \Leftrightarrow w_k = \sum_{i=1}^{n} \alpha_i y_i x_{ik}$, $k = 1, \ldots, d$; $\partial L_P / \partial b = 0 \Leftrightarrow \sum_{i=1}^{n} \alpha_i y_i = 0$; $y_i(x_i^\top w + b) - 1 \geq 0$, $i = 1, \ldots, n$; $\alpha_i \geq 0$; $\alpha_i \{y_i(x_i^\top w + b) - 1\} = 0$. Corporate Bankruptcy Prediction with SVMs

  23. Realization of the SVM 23 Substitute the KKT conditions into $L_P$ to obtain the Lagrangian for the dual problem: $L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^\top x_j$. The primal and dual problems are $\min_{w_k, b} \max_{\alpha_i} L_P$ and $\max_{\alpha_i} L_D$ subject to $\alpha_i \geq 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$. Since the optimization problem is convex, the dual and primal formulations give the same solution. Corporate Bankruptcy Prediction with SVMs
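
Off-the-shelf solvers work with exactly this dual. The sketch below fits a linear SVM with a large C to approximate the hard-margin problem on synthetic separable data, then reads off the fitted dual coefficients $\alpha_i y_i$, which are non-zero only for support vectors; the data and the choice of C are assumptions of the example.

```python
# Dual solution in practice: a linear SVM with large C approximates the
# hard-margin problem; dual_coef_ holds alpha_i * y_i for support vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X_pos = rng.normal(loc=[2.0, 2.0], size=(50, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [-1] * 50)   # separable by construction

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("number of support vectors:", clf.support_vectors_.shape[0])
print("alpha_i * y_i:", clf.dual_coef_.ravel())
# The dual constraint sum_i alpha_i y_i = 0 (non-SVs contribute alpha_i = 0):
print("sum alpha_i y_i = 0 holds:", np.isclose(clf.dual_coef_.sum(), 0.0))
```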
