Understanding credit risk: Credit Risk Modeling in Python


SLIDE 1

Understanding credit risk

CREDIT RISK MODELING IN PYTHON

Michael Crabtree

Data Scientist, Ford Motor Company

SLIDE 2

What is credit risk?

- The possibility that someone who has borrowed money will not repay it all
- Calculated risk: the difference between lending someone money and buying a government bond
- When someone fails to repay a loan, the loan is said to be in default
- The likelihood that someone will default on a loan is the probability of default (PD)

SLIDE 3

What is credit risk?

Payment   Payment Date   Loan Status
$100      Jun 15         Non-Default
$100      Jul 15         Non-Default
$0        Aug 15         Default

SLIDE 4

Expected loss

The dollar amount the firm loses as a result of loan default

Three primary components:
- Probability of Default (PD)
- Exposure at Default (EAD)
- Loss Given Default (LGD)

Formula for expected loss:

expected_loss = PD * EAD * LGD
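As a worked illustration of the formula, the sketch below computes expected loss for a single loan. The three input values are made up for the example and do not come from the course data.

```python
# Hypothetical inputs for one loan (illustrative values only)
PD = 0.03       # probability of default
EAD = 10_000    # exposure at default, in dollars
LGD = 0.20      # loss given default, as a fraction of the exposure

expected_loss = PD * EAD * LGD
print(f"Expected loss: ${expected_loss:,.2f}")
```

Because the three terms multiply, halving any one of them halves the expected loss.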

SLIDE 5

Types of data used

Two primary types of data used:
- Application data
- Behavioral data

Application      Behavioral
Interest Rate    Employment Length
Grade            Historical Default
Amount           Income

SLIDE 6

Data columns

- Mix of behavioral and application data
- Contains columns simulating credit bureau data

Column               Column
Income               Loan grade
Age                  Loan amount
Home ownership       Interest rate
Employment length    Loan status
Loan intent          Historical default
Percent Income       Credit history length

SLIDE 7

Exploring with cross tables

    import pandas as pd

    # Mean interest rate for each home ownership / loan status combination
    pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
                values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)
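To make the cross table concrete, here is a minimal sketch on a tiny, made-up `cr_loan` frame. Only the column names come from the slide; the rows are invented, and the coding of `loan_status` (1 meaning default) is an assumption.

```python
import pandas as pd

# Made-up rows; column names follow the slide
cr_loan = pd.DataFrame({
    "person_home_ownership": ["RENT", "RENT", "OWN", "OWN", "MORTGAGE", "MORTGAGE"],
    "loan_status":           [0, 1, 0, 1, 0, 1],   # assumed coding: 1 = default
    "loan_int_rate":         [10.0, 14.0, 9.0, 13.0, 8.0, 12.0],
})

# One row per ownership type, one column per loan status,
# each cell holding the mean interest rate for that pair
mean_rates = pd.crosstab(
    cr_loan["person_home_ownership"],
    cr_loan["loan_status"],
    values=cr_loan["loan_int_rate"],
    aggfunc="mean",
).round(2)
print(mean_rates)
```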

SLIDE 8

Exploring with visuals

    import matplotlib.pyplot as plt

    # Scatter plot of income against loan interest rate
    plt.scatter(cr_loan['person_income'], cr_loan['loan_int_rate'], c='blue', alpha=0.5)
    plt.xlabel("Personal Income")
    plt.ylabel("Loan Interest Rate")
    plt.show()

SLIDE 9

Let's practice!

SLIDE 10

Outliers in Credit Data

SLIDE 11

Data processing

- Prepared data allows models to train faster
- Often positively impacts model performance

SLIDE 12

Outliers and performance

Possible causes of outliers:
- Problems with data entry systems (human error)
- Issues with data ingestion tools

SLIDE 13

Outliers and performance

Feature             Coefficient With Outliers   Coefficient Without Outliers
Interest Rate       0.2                         0.01
Employment Length   0.5                         0.6
Income              0.6                         0.75
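The effect of outliers on fitted coefficients can be reproduced on a toy problem. The sketch below uses an ordinary least-squares line fit with NumPy, not the course's credit model, and all values are made up; it only illustrates how a single bad record can move a coefficient.

```python
import numpy as np

# Made-up data: y rises by exactly 2 per unit of x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x

# Same data plus one bad record (for example, a data-entry error)
x_out = np.append(x, 6.0)
y_out = np.append(y, 100.0)

# polyfit with degree 1 returns [slope, intercept]
slope_clean = np.polyfit(x, y, 1)[0]
slope_with_outlier = np.polyfit(x_out, y_out, 1)[0]
print(slope_clean, slope_with_outlier)
```

The clean fit recovers a slope of about 2, while the single extreme point pulls the slope far above that.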

SLIDE 14

Detecting outliers with cross tables

Use cross tables with aggregate functions

    pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
                values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)
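Means can hide a single extreme value, so one variation is to aggregate with `'max'` instead. This is a sketch on made-up rows, and the choice of `person_emp_length` as the value column is an assumption for illustration.

```python
import pandas as pd

# Made-up rows; 123 years of employment is an implausible entry
cr_loan = pd.DataFrame({
    "person_home_ownership": ["RENT", "RENT", "OWN", "OWN"],
    "loan_status":           [0, 1, 0, 1],
    "person_emp_length":     [3.0, 123.0, 7.0, 10.0],
})

# Largest employment length in each ownership / status cell
max_emp = pd.crosstab(
    cr_loan["person_home_ownership"],
    cr_loan["loan_status"],
    values=cr_loan["person_emp_length"],
    aggfunc="max",
)
print(max_emp)
```

A cell whose maximum is far outside the plausible range points to the group that contains the outlier.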

SLIDE 15

Detecting outliers visually

- Histograms
- Scatter plots
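A histogram can be inspected even without a plotting backend. The sketch below counts made-up employment lengths per 20-year bin with NumPy and prints a text bar for each bin; the data and the bin width are assumptions for illustration.

```python
import numpy as np

# Made-up employment lengths in years; 123 is an implausible entry
emp_length = np.array([1, 3, 4, 5, 7, 8, 10, 123])

# 20-year bins covering 0 to 140
counts, edges = np.histogram(emp_length, bins=np.arange(0, 141, 20))

# Print one text bar per bin
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{int(left):>3}-{int(right):<3} {'#' * int(n)}")
```

An isolated bar far from the main mass of the data is the same signal a plotted histogram would show.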

SLIDE 16

Removing outliers

Use the .drop() method within Pandas

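    # Index labels of rows with an employment length of 60 years or more
    indices = cr_loan[cr_loan['person_emp_length'] >= 60].index
    # Drop those rows from the DataFrame in place
    cr_loan.drop(indices, inplace=True)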
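The same removal can be written without `inplace=True`, which keeps the original frame available for comparison. This is a sketch on made-up values; the names `cr_loan_dropped` and `cr_loan_filtered` are hypothetical.

```python
import pandas as pd

# Made-up employment lengths in years
cr_loan = pd.DataFrame({"person_emp_length": [2.0, 5.0, 123.0, 8.0, 60.0]})

# Approach from the slide: collect the index labels, then drop them
indices = cr_loan[cr_loan["person_emp_length"] >= 60].index
cr_loan_dropped = cr_loan.drop(indices)

# Equivalent filter: keep every row that is not flagged as an outlier
cr_loan_filtered = cr_loan[~(cr_loan["person_emp_length"] >= 60)]
print(cr_loan_dropped)
```

Both forms keep rows whose employment length is missing, because a comparison against a missing value evaluates to False.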

SLIDE 17

Let's practice!

SLIDE 18

Risk with missing data in loan data

SLIDE 19

What is missing data?

- NULLs in a row instead of an actual value
- An empty string ''
- Not an entirely empty row
- Can occur in any column in the data
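One practical consequence of the empty-string case is that pandas does not count `''` as null. The sketch below, on a made-up column, shows the difference and one way to convert empty strings to proper nulls.

```python
import numpy as np
import pandas as pd

# Made-up column with one empty string and one true null
df = pd.DataFrame({"loan_intent": ["EDUCATION", "", None, "MEDICAL"]})

# isnull() sees only the None; the empty string is a valid value to pandas
nulls_before = int(df["loan_intent"].isnull().sum())

# Convert empty strings to NaN so they are treated as missing
df["loan_intent"] = df["loan_intent"].replace("", np.nan)
nulls_after = int(df["loan_intent"].isnull().sum())
print(nulls_before, nulls_after)
```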

SLIDE 20

Similarities with outliers

- Negatively affect machine learning model performance
- May bias models in unanticipated ways
- May cause errors for some machine learning models

SLIDE 21

Similarities with outliers

Missing Data Type        Possible Result
NULL in numeric column   Error
NULL in string column    Error

SLIDE 22

How to handle missing data

Generally three ways to handle missing data:
- Replace values where the data is missing
- Remove the rows containing missing data
- Leave the rows with missing data unchanged

Understanding the data determines the course of action

SLIDE 23

How to handle missing data

Missing Data           Interpretation                  Action
NULL in loan_status    Loan recently approved          Remove from prediction data
NULL in person_age     Age not recorded or disclosed   Replace with median
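The two actions listed for `loan_status` and `person_age` could be applied as follows. This is a minimal sketch on made-up rows; only the two column names are taken from the slide.

```python
import numpy as np
import pandas as pd

# Made-up rows with one missing age and one missing loan status
cr_loan = pd.DataFrame({
    "person_age":  [25.0, np.nan, 40.0, 31.0],
    "loan_status": [0.0, 1.0, np.nan, 0.0],
})

# NULL loan_status: no outcome recorded yet, so keep only rows that have one
cr_loan = cr_loan[cr_loan["loan_status"].notnull()].copy()

# NULL person_age: fill with the median of the recorded ages
median_age = cr_loan["person_age"].median()
cr_loan["person_age"] = cr_loan["person_age"].fillna(median_age)
print(cr_loan)
```

The order matters slightly: the median here is computed after the rows without a loan status are removed, so it reflects only the rows that remain.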

SLIDE 24

Finding missing data

- Null values are easily found with the isnull() method
- Null records can easily be counted with the sum() method
- The .any() method flags every column that contains at least one null value

    null_columns = cr_loan.columns[cr_loan.isnull().any()]
    cr_loan[null_columns].isnull().sum()

    # Total number of null values per column
    person_home_ownership          25
    person_emp_length             895
    loan_intent                    25
    loan_int_rate                3140
    cb_person_default_on_file      15
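The same two lines can be traced on a small made-up frame, where the result is easy to check by eye. The rows below are invented; the column names follow the course data.

```python
import numpy as np
import pandas as pd

# Made-up rows: person_age is complete, the other two columns have gaps
cr_loan = pd.DataFrame({
    "person_age":        [22, 35, 41, 29],
    "person_emp_length": [1.0, np.nan, 6.0, np.nan],
    "loan_int_rate":     [11.5, 9.9, np.nan, 13.2],
})

# Columns with at least one null, then the null count in each of them
null_columns = cr_loan.columns[cr_loan.isnull().any()]
null_counts = cr_loan[null_columns].isnull().sum()
print(null_counts)
```

Columns without any nulls, such as `person_age` here, are left out of the result.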

SLIDE 25

Replacing Missing data

Replace the missing data using methods like .fillna() with aggregate functions and methods

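    # Assign the result back: fillna(..., inplace=True) on a selected column
    # may not update cr_loan in recent pandas versions
    cr_loan['loan_int_rate'] = cr_loan['loan_int_rate'].fillna(cr_loan['loan_int_rate'].mean())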
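Which aggregate to fill with is a judgment call. The sketch below compares the mean and the median on made-up interest rates that include one extreme value; the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up interest rates with one extreme value and one missing value
rates = pd.Series([10.0, np.nan, 14.0, 36.0], name="loan_int_rate")

# mean() and median() both skip the missing value
filled_mean = rates.fillna(rates.mean())      # fills with the mean of 10, 14, 36
filled_median = rates.fillna(rates.median())  # fills with the median of 10, 14, 36
print(filled_mean[1], filled_median[1])
```

The median is less affected by the extreme rate, so it can be the safer choice when a column still contains outliers.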

SLIDE 26

Dropping missing data

- Uses indices to identify records, the same as with outliers
- Remove the records entirely using the .drop() method

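    # Index labels of rows where employment length is missing
    indices = cr_loan[cr_loan['person_emp_length'].isnull()].index
    # Drop those rows from the DataFrame in place
    cr_loan.drop(indices, inplace=True)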
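pandas also provides `dropna`, which removes the same rows in one call. This sketch on made-up rows checks that the two approaches agree; the names `by_index` and `by_dropna` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps in both columns
cr_loan = pd.DataFrame({
    "person_emp_length": [2.0, np.nan, 5.0, np.nan],
    "loan_int_rate":     [11.0, 12.5, np.nan, 9.0],
})

# Approach from the slide: find the index labels, then drop them
indices = cr_loan[cr_loan["person_emp_length"].isnull()].index
by_index = cr_loan.drop(indices)

# Same rows removed with dropna, restricted to one column
by_dropna = cr_loan.dropna(subset=["person_emp_length"])
print(by_dropna)
```

Passing `subset` matters: without it, `dropna` would also remove the row whose only missing value is in `loan_int_rate`.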

SLIDE 27

Let's practice!
