Logistic regression for probability of default


  1. Logistic regression for probability of default
     Credit Risk Modeling in Python
     Michael Crabtree, Data Scientist, Ford Motor Company

  2. Probability of default
     The likelihood that someone will default on a loan is the probability of default.
     It is a probability value between 0 and 1, like 0.86.
     A loan_status of 1 is a default; 0 is a non-default.

  3. Probability of default
     The likelihood that someone will default on a loan is the probability of default.
     It is a probability value between 0 and 1, like 0.86.
     A loan_status of 1 is a default; 0 is a non-default.

     Probability of default   Interpretation             Predicted loan status
     0.4                      Unlikely to default        0
     0.90                     Very likely to default     1
     0.1                      Very unlikely to default   0

  4. Predicting probabilities
     Probabilities of default as an outcome from machine learning
     Learn from data in columns (features)
     Classification models (default, non-default)
     Two most common models:
       Logistic regression
       Decision tree

  5. Logistic regression
     Similar to linear regression, but only produces values between 0 and 1
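
To make the contrast with linear regression concrete, here is a minimal sketch of the logistic (sigmoid) function, which maps a linear score to a value between 0 and 1. The function name `sigmoid` and the sample scores are illustrative and do not come from the slides.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued linear score z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear scores (intercept + sum of coefficient * feature)
scores = [-5.0, -1.0, 0.0, 1.0, 5.0]
probabilities = [sigmoid(z) for z in scores]

for z, p in zip(scores, probabilities):
    print(f"score {z:+.1f} -> probability {p:.3f}")
```

A score of 0 maps to 0.5, and larger scores map to probabilities closer to 1.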

  6. Training a logistic regression
     Logistic regression is available within the scikit-learn package
         from sklearn.linear_model import LogisticRegression
     Called as a function with or without parameters
         clf_logistic = LogisticRegression(solver='lbfgs')
     Uses the method .fit() to train
         clf_logistic.fit(training_columns, np.ravel(training_labels))
     Training columns: all of the columns in our data except loan_status
     Labels: loan_status (0, 1)

  7. Training and testing
     The entire data set is usually split into two parts

  8. Training and testing
     The entire data set is usually split into two parts

     Data subset   Usage                                         Portion
     Train         Learn from the data to generate predictions   60%
     Test          Test learning on new unseen data              40%

  9. Creating the training and test sets
     Separate the data into training columns and labels
         X = cr_loan.drop('loan_status', axis = 1)
         y = cr_loan[['loan_status']]
     Use the train_test_split() function from scikit-learn
         from sklearn.model_selection import train_test_split
         X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=123)
     test_size: proportion of the data held out for the test set (.4 = 40%)
     random_state: a random seed value for reproducibility
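
The idea behind a seeded 60/40 split can be sketched in plain Python. This is not how scikit-learn implements `train_test_split()`; the helper name `simple_split` and the ten stand-in rows are assumptions made for illustration.

```python
import random

def simple_split(rows, test_size=0.4, seed=123):
    """Shuffle row indices with a fixed seed, then hold out test_size of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(10))  # stand-in for 10 loan records
train, test = simple_split(rows)
train_again, test_again = simple_split(rows)  # same seed, same split
print(len(train), len(test))
```

Because the seed is fixed, repeated calls return the same partition, which is the role `random_state` plays on the slide.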

  10. Let's practice!

  11. Predicting the probability of default

  12. Logistic regression coefficients
         # Model intercept
         array([-3.30582292e-10])
         # Coefficients for ['loan_int_rate', 'person_emp_length', 'person_income']
         array([[ 1.28517496e-09, -2.27622202e-09, -2.17211991e-05]])
         # Calculating probability of default
         int_coef_sum = (-3.3e-10
                         + (1.29e-09 * loan_int_rate)
                         + (-2.28e-09 * person_emp_length)
                         + (-2.17e-05 * person_income))
         prob_default = 1 / (1 + np.exp(-int_coef_sum))
         prob_nondefault = 1 - (1 / (1 + np.exp(-int_coef_sum)))
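
As a worked instance of the calculation on this slide, the sketch below plugs one borrower into the rounded intercept and coefficients. The borrower's values are invented for illustration, and `math.exp` stands in for `np.exp` so the example needs only the standard library.

```python
import math

# Rounded intercept and coefficients from the slide
intercept = -3.3e-10
coefs = {
    "loan_int_rate": 1.29e-09,
    "person_emp_length": -2.28e-09,
    "person_income": -2.17e-05,
}

# Hypothetical borrower (values invented for illustration)
borrower = {"loan_int_rate": 11.0, "person_emp_length": 5, "person_income": 50_000}

# Linear score: intercept plus each coefficient times its feature value
int_coef_sum = intercept + sum(coefs[name] * borrower[name] for name in coefs)
prob_default = 1 / (1 + math.exp(-int_coef_sum))
prob_nondefault = 1 - prob_default
print(f"{prob_default:.3f} {prob_nondefault:.3f}")
```

With these coefficients the income term dominates the sum, since the other coefficients are several orders of magnitude smaller.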

  13. Interpreting coefficients
         # Intercept
         intercept = -1.02
         # Coefficient for employment length
         person_emp_length_coef = -0.056
     For every 1 year increase in person_emp_length, the person is less likely to default

  14. Interpreting coefficients
         # Intercept
         intercept = -1.02
         # Coefficient for employment length
         person_emp_length_coef = -0.056
     For every 1 year increase in person_emp_length, the person is less likely to default

     intercept   person_emp_length   value * coef    probability of default
     -1.02       10                  (10 * -0.06)    .17
     -1.02       11                  (11 * -0.06)    .16
     -1.02       12                  (12 * -0.06)    .15
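
The probabilities in the table can be reproduced by passing intercept + value * coef through the logistic function. The sketch below uses the coefficient rounded to -0.06, as the table does; the function name `prob_default` is an assumption.

```python
import math

intercept = -1.02
coef = -0.06  # person_emp_length coefficient, rounded as in the table

def prob_default(emp_length):
    """Probability of default for a given employment length in years."""
    z = intercept + coef * emp_length
    return 1 / (1 + math.exp(-z))

table = [(years, round(prob_default(years), 2)) for years in (10, 11, 12)]
print(table)

# Each extra year of employment multiplies the odds of default by exp(coef)
odds_ratio = math.exp(coef)
print(f"odds ratio per year: {odds_ratio:.3f}")
```

The odds ratio is below 1, which is another way of saying that each additional year lowers the odds of default.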

  15. Using non-numeric columns
     Numeric: loan_int_rate, person_emp_length, person_income
     Non-numeric: cr_loan_clean['loan_intent']
         EDUCATION
         MEDICAL
         VENTURE
         PERSONAL
         DEBTCONSOLIDATION
         HOMEIMPROVEMENT
     Will cause errors with machine learning models in Python unless processed

  16. One-hot encoding
     Represent a string with a number

  17. One-hot encoding
     Represent a string with a number
     0 or 1 in a new column named column_VALUE

  18. Get dummies
     Use the get_dummies() function within pandas
         # Separate the numeric columns
         cred_num = cr_loan.select_dtypes(exclude=['object'])
         # Separate non-numeric columns
         cred_cat = cr_loan.select_dtypes(include=['object'])
         # One-hot encode the non-numeric columns only
         cred_cat_onehot = pd.get_dummies(cred_cat)
         # Union the numeric columns with the one-hot encoded columns
         cr_loan = pd.concat([cred_num, cred_cat_onehot], axis=1)
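
To show what one-hot encoding produces without relying on pandas, here is a hand-rolled sketch over a list of dictionaries. The helper `one_hot` and the three loan records are hypothetical; in practice `pd.get_dummies()` does this work.

```python
def one_hot(records, column):
    """Replace `column` with one 0/1 column per distinct value, named column_VALUE."""
    values = sorted({rec[column] for rec in records})
    encoded = []
    for rec in records:
        # Keep every other column unchanged
        row = {k: v for k, v in rec.items() if k != column}
        for value in values:
            row[f"{column}_{value}"] = 1 if rec[column] == value else 0
        encoded.append(row)
    return encoded

# Hypothetical loan records
loans = [
    {"person_income": 50000, "loan_intent": "EDUCATION"},
    {"person_income": 72000, "loan_intent": "MEDICAL"},
    {"person_income": 38000, "loan_intent": "EDUCATION"},
]
encoded = one_hot(loans, "loan_intent")
print(encoded[0])
```

Each record ends up with exactly one indicator column set to 1, and the numeric column passes through untouched.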

  19. Predicting the future, probably
     Use the .predict_proba() method within scikit-learn
         # Train the model
         clf_logistic.fit(X_train, np.ravel(y_train))
         # Predict using the model
         clf_logistic.predict_proba(X_test)
     Creates an array of probabilities of default
         # Probabilities: [[non-default, default]]
         array([[0.55, 0.45]])

  20. Let's practice!

  21. Credit model performance

  22. Model accuracy scoring
     Calculate accuracy
     Use the .score() method from scikit-learn
         # Check the accuracy against the test data
         clf_logistic1.score(X_test, y_test)
         0.81
     81% of values for loan_status predicted correctly
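
Accuracy is the fraction of predictions that match the true labels. The sketch below computes it by hand on invented labels, chosen to show that a model can score well on accuracy while missing every default.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 1 = default, 0 = non-default
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # a model that never predicts default

acc = accuracy(y_true, y_pred)
missed_defaults = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print(acc, missed_defaults)
```

This is one reason to look at more than accuracy alone when defaults are the minority class.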

  23. ROC curve charts
     Receiver Operating Characteristic curve
     Plots true positive rate (sensitivity) against false positive rate (fall-out)
         fallout, sensitivity, thresholds = roc_curve(y_test, prob_default)
         plt.plot(fallout, sensitivity, color = 'darkorange')
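
The points on an ROC curve come from sweeping a threshold and recording fall-out and sensitivity at each step. Here is a small standard-library sketch of that sweep; the labels, probabilities, threshold list, and the `>=` cut-off rule are assumptions of the sketch, not taken from the slides.

```python
def roc_points(y_true, scores, thresholds):
    """Return (fallout, sensitivity) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for th in thresholds:
        # Count defaults and non-defaults flagged at this threshold
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= th)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= th)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical true labels and predicted probabilities of default
y_true = [0, 0, 1, 0, 1, 1]
prob_default = [0.10, 0.35, 0.40, 0.60, 0.80, 0.90]
points = roc_points(y_true, prob_default, thresholds=[1.1, 0.7, 0.5, 0.3, 0.0])
print(points)
```

Lowering the threshold moves the point from the bottom-left corner (nothing flagged) toward the top-right corner (everything flagged).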

  24. Analyzing ROC charts
     Area Under Curve (AUC): the area under the ROC curve
     A random prediction follows the diagonal (AUC of 0.5); the model's lift is the area between its curve and that diagonal
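
One way to compute the area under a curve from its points is the trapezoid rule. The sketch below applies it to the random-prediction diagonal and to an invented better-than-random curve; the helper name and the curve points are illustrative only.

```python
def auc_trapezoid(points):
    """Area under a curve given (x, y) points sorted by x, via the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# The diagonal of a random model, and a hypothetical better-than-random curve
random_curve = [(0.0, 0.0), (1.0, 1.0)]
model_curve = [(0.0, 0.0), (0.0, 0.5), (0.5, 1.0), (1.0, 1.0)]

auc_random = auc_trapezoid(random_curve)
auc_model = auc_trapezoid(model_curve)
lift = auc_model - auc_random  # area between the model curve and the diagonal
print(auc_random, auc_model, lift)
```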

  25. Default thresholds
     Threshold: the probability cut-off above which a loan is classified as a default

  26. Setting the threshold
     Relabel loans based on our threshold of 0.5
         preds = clf_logistic.predict_proba(X_test)
         preds_df = pd.DataFrame(preds[:,1], columns = ['prob_default'])
         preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.5 else 0)
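
The relabeling rule is the same comparison applied to every probability. A plain-Python sketch, with invented probabilities and a hypothetical helper `relabel`, shows how moving the threshold changes the predicted loan_status:

```python
def relabel(prob_defaults, threshold=0.5):
    """Assign loan_status 1 when the probability of default exceeds the threshold."""
    return [1 if p > threshold else 0 for p in prob_defaults]

# Hypothetical predicted probabilities of default
prob_defaults = [0.05, 0.30, 0.45, 0.55, 0.90]

labels_50 = relabel(prob_defaults, threshold=0.5)
labels_40 = relabel(prob_defaults, threshold=0.4)
print(labels_50, labels_40)
```

Lowering the threshold from 0.5 to 0.4 flags one more loan as a default.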

  27. Credit classification reports
     classification_report() within scikit-learn
         from sklearn.metrics import classification_report
         classification_report(y_test, preds_df['loan_status'], target_names=target_names)

  28. Selecting classification metrics
     Select and store specific components from the classification_report()
     Use the precision_recall_fscore_support() function from scikit-learn
         from sklearn.metrics import precision_recall_fscore_support
         precision_recall_fscore_support(y_test, preds_df['loan_status'])[1][1]
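
The `[1][1]` index on the slide selects the second returned array (recall) and then the second class (default). The sketch below mimics that return layout with a hand-written helper, `prf_support`, which is a hypothetical stand-in and not the scikit-learn function; the labels are invented.

```python
def prf_support(y_true, y_pred, labels=(0, 1)):
    """Per-class precision, recall, F1 and support, returned as four lists."""
    precision, recall, f1, support = [], [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        precision.append(prec)
        recall.append(rec)
        f1.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        support.append(actual)
    return precision, recall, f1, support

# Hypothetical labels: 1 = default, 0 = non-default
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

result = prf_support(y_true, y_pred)
default_recall = result[1][1]  # second list (recall), second class (default)
print(default_recall)
```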

  29. Let's practice!

  30. Model discrimination and impact

  31. Confusion matrices
     Shows the number of correct and incorrect predictions for each loan_status

  32. Default recall for loan status
     Default recall (or sensitivity) is the proportion of true defaults that the model correctly predicts
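
Default recall falls straight out of the confusion matrix counts. The sketch below builds a 2x2 matrix by hand, with rows for the true loan_status and columns for the predicted one; the helper name and the labels are invented for illustration.

```python
def confusion_matrix_counts(y_true, y_pred):
    """Return counts as [[TN, FP], [FN, TP]] with default (1) as the positive class."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix

# Hypothetical labels: 1 = default, 0 = non-default
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

(tn, fp), (fn, tp) = confusion_matrix_counts(y_true, y_pred)
default_recall = tp / (tp + fn)  # true defaults caught / all true defaults
print(tn, fp, fn, tp, default_recall)
```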

  33. Recall portfolio impact
     Classification report: underperforming logistic regression model

  34. Recall portfolio impact
     Classification report: underperforming logistic regression model
     Number of true defaults: 50,000

     Loan amount   Defaults predicted / not predicted   Estimated loss on defaults
     $50           .04 / .96                            (50000 x .96) x 50 = $2,400,000
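
The loss estimate in the table can be written as a small function of default recall. This sketch keeps the slide's simplifications (every loan has the same amount, and the full amount is lost on each missed default); the function name and the second, higher recall value are hypothetical.

```python
def estimated_loss(n_true_defaults, default_recall, loan_amount):
    """Loss from the defaults the model failed to predict."""
    missed_defaults = n_true_defaults * (1 - default_recall)
    return missed_defaults * loan_amount

loss_low_recall = estimated_loss(50_000, 0.04, 50)     # figures from the slide
loss_higher_recall = estimated_loss(50_000, 0.40, 50)  # hypothetical improved model
print(f"${loss_low_recall:,.0f} vs ${loss_higher_recall:,.0f}")
```

Under these assumptions, every point of default recall gained removes the same dollar amount from the estimated loss.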

  35. Recall, precision, and accuracy
     Difficult to maximize all of them because there is a trade-off
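
The trade-off can be seen by sweeping the threshold and watching precision and recall for defaults move in opposite directions. The data below is invented to make the pattern visible; real models will show noisier curves.

```python
def precision_recall_at(y_true, probs, threshold):
    """Precision and recall for the default class at a given threshold."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels (1 = default) and predicted probabilities of default
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
probs = [0.05, 0.15, 0.25, 0.30, 0.45, 0.55, 0.60, 0.70, 0.85, 0.95]

results = {th: precision_recall_at(y_true, probs, th) for th in (0.2, 0.5, 0.8)}
for th, (prec, rec) in results.items():
    print(f"threshold {th}: precision {prec:.2f}, recall {rec:.2f}")
```

In this example a low threshold catches every default at the cost of false alarms, while a high threshold is always right when it flags a loan but misses most defaults.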

  36. Let's practice!
