Understanding credit risk
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
What is credit risk?

- The possibility that someone who has borrowed money will not repay it all
- Calculated risk difference between lending someone money and a government bond
- When someone fails to repay a loan, it is said to be in default
- The likelihood that someone will default on a loan is the probability of default (PD)

Payment   Payment Date   Loan Status
$100      Jun 15         Non-Default
$100      Jul 15         Non-Default
$0        Aug 15         Default
- The dollar amount the firm loses as a result of loan default
- Three primary components:
  - Probability of Default (PD)
  - Exposure at Default (EAD)
  - Loss Given Default (LGD)
- Formula for expected loss:
expected_loss = PD * EAD * LGD
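As a worked illustration of the formula, here is a minimal sketch with made-up numbers. The values for PD, EAD, and LGD are assumptions chosen only to show the arithmetic; they do not come from the course data.

```python
# Hypothetical inputs, chosen only to illustrate the formula
PD = 0.05        # probability of default: 5%
EAD = 10_000.0   # exposure at default: outstanding balance in dollars
LGD = 0.80       # loss given default: share of the exposure not recovered

# Expected loss is the product of the three components
expected_loss = PD * EAD * LGD
print(f"Expected loss: ${expected_loss:,.2f}")
```

With these inputs, a 5% chance of losing 80% of a $10,000 exposure works out to an expected loss of about $400.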
Two primary types of data used:
- Application data
- Behavioral data

Application      Behavioral
Interest Rate    Employment Length
Grade            Historical Default
Amount           Income
- Mix of behavioral and application data
- Contains columns simulating credit bureau data

Column               Column
Income               Loan grade
Age                  Loan amount
Home ownership       Interest rate
Employment length    Loan status
Loan intent          Historical default
Percent Income       Credit history length
pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'], values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)
import matplotlib.pyplot as plt

# Scatter plot of loan interest rate against personal income
plt.scatter(cr_loan['person_income'], cr_loan['loan_int_rate'], c='blue', alpha=0.5)
plt.xlabel("Personal Income")
plt.ylabel("Loan Interest Rate")
plt.show()
- Prepared data allows models to train faster
- Often positively impacts model performance
Possible causes of outliers:
- Problems with data entry systems (human error)
- Issues with data ingestion tools

Feature             Coefficient With Outliers   Coefficient Without Outliers
Interest Rate       0.2                         0.01
Employment Length   0.5                         0.6
Income              0.6                         0.75
Use cross tables with aggregate functions
pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'], values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)
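To show how a cross table can surface an outlier, here is a minimal sketch on a toy stand-in for cr_loan. The rows are invented for illustration, and using aggfunc='max' on person_emp_length is an assumption about which aggregate is useful here, not something the slide prescribes.

```python
import pandas as pd

# Toy stand-in for cr_loan; all values are invented for illustration
cr_loan = pd.DataFrame({
    "person_home_ownership": ["RENT", "RENT", "OWN", "OWN", "MORTGAGE"],
    "loan_status": [0, 1, 0, 1, 0],
    "person_emp_length": [2.0, 123.0, 5.0, 7.0, 10.0],
})

# Maximum employment length per home ownership / loan status group;
# an implausibly large maximum (123 years) points to an outlier
max_emp = pd.crosstab(
    cr_loan["person_home_ownership"],
    cr_loan["loan_status"],
    values=cr_loan["person_emp_length"],
    aggfunc="max",
)
print(max_emp)
```

Groups with no rows, such as MORTGAGE with loan_status 1 in this toy data, appear as NaN in the result.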
Detecting outliers visually:
- Histograms
- Scatter plots
Use the .drop() method within Pandas
indices = cr_loan[cr_loan['person_emp_length'] >= 60].index
cr_loan.drop(indices, inplace=True)
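The same pattern can be tried end to end on a small frame. This is a sketch under stated assumptions: the four rows are invented, and the result is reassigned rather than dropped in place, which is a stylistic choice and not the slide's approach.

```python
import pandas as pd

# Toy stand-in for cr_loan; all values are invented for illustration
cr_loan = pd.DataFrame({
    "person_age": [25, 31, 40, 22],
    "person_emp_length": [3.0, 123.0, 12.0, 60.0],
})

# Index labels of rows whose employment length is 60 years or more
indices = cr_loan[cr_loan["person_emp_length"] >= 60].index

# Drop those rows and keep the result under the same name
cr_loan = cr_loan.drop(indices)
print(cr_loan)
```

Note that the remaining rows keep their original index labels (0 and 2 here); call reset_index(drop=True) afterwards if a contiguous index is needed.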
- NULLs in a row instead of an actual value
- An empty string ''
- Not an entirely empty row
- Can occur in any column in the data
- Negatively affect machine learning model performance
- May bias models in unanticipated ways
- May cause errors for some machine learning models

Missing Data Type        Possible Result
NULL in numeric column   Error
NULL in string column    Error
Generally three ways to handle missing data:
- Replace values where the data is missing
- Remove the rows containing missing data
- Leave the rows with missing data unchanged

Understanding the data determines the course of action

Missing Data          Interpretation                  Action
NULL in loan_status   Loan recently approved          Remove from prediction data
NULL in person_age    Age not recorded or disclosed   Replace with median
- Null values are easily found by using the isnull() function
- Null records can easily be counted with the sum() function
- The .any() method checks all columns
null_columns = cr_loan.columns[cr_loan.isnull().any()]
cr_loan[null_columns].isnull().sum()

# Total number of null values per column
person_home_ownership          25
person_emp_length             895
loan_intent                    25
loan_int_rate                3140
cb_person_default_on_file      15
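To make the three calls concrete, here is a minimal sketch on a toy frame. The data is invented for illustration, and null_counts is a hypothetical variable name introduced only for this example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cr_loan; all values are invented for illustration
cr_loan = pd.DataFrame({
    "person_age": [25, 31, 40, 22],
    "person_emp_length": [3.0, np.nan, 12.0, np.nan],
    "loan_int_rate": [11.5, 9.9, np.nan, 13.2],
})

# isnull() marks each missing cell; any() reduces that to one flag per column
null_columns = cr_loan.columns[cr_loan.isnull().any()]

# sum() counts the missing cells in each flagged column
null_counts = cr_loan[null_columns].isnull().sum()
print(null_counts)
```

Only columns that contain at least one null are reported, so person_age does not appear in the output.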
Replace the missing data using methods like .fillna() with aggregate functions and methods
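# Assign the result back; calling fillna(..., inplace=True) on a selected column
# triggers a warning in recent pandas and may not modify cr_loan
cr_loan['loan_int_rate'] = cr_loan['loan_int_rate'].fillna(cr_loan['loan_int_rate'].mean())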
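A small runnable sketch of the replacement step follows. The four interest rates are invented, and mean_rate is a hypothetical helper name used only here; the same pattern works with .median() where the median is the better fill value.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cr_loan; all values are invented for illustration
cr_loan = pd.DataFrame({"loan_int_rate": [10.0, np.nan, 14.0, np.nan]})

# mean() skips nulls by default, so this averages only 10.0 and 14.0
mean_rate = cr_loan["loan_int_rate"].mean()

# Fill the missing rates and assign the result back to the column
cr_loan["loan_int_rate"] = cr_loan["loan_int_rate"].fillna(mean_rate)
print(cr_loan)
```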
- Uses indices to identify records, the same as with outliers
- Remove the records entirely using the .drop() method
indices = cr_loan[cr_loan['person_emp_length'].isnull()].index
cr_loan.drop(indices, inplace=True)
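The index-based removal can be checked on a toy frame. This sketch uses invented rows, and it also shows dropna(subset=...) as an alternative that is not in the slides but should give the same result here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cr_loan; all values are invented for illustration
cr_loan = pd.DataFrame({
    "person_age": [25, 31, 40],
    "person_emp_length": [3.0, np.nan, 12.0],
})

# Index labels of the rows where employment length is missing
indices = cr_loan[cr_loan["person_emp_length"].isnull()].index
dropped = cr_loan.drop(indices)

# Alternative (an addition, not from the slides): let pandas find the rows
dropped_alt = cr_loan.dropna(subset=["person_emp_length"])
print(dropped)
```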