(Machine) Learning To Detect Fraudsters: presentation by Hany Elemary and Sarah LeBlanc

SLIDE 1

(Machine) Learning To Detect Fraudsters

Hany Elemary Sarah LeBlanc

SLIDE 2

CREDIT CARD FRAUD

TRANSACTION, APPLICATION, CARD NOT FOUND

SLIDE 3

FRAUD DETECTION MODEL PLOT

[Plot: Model Score (0 to 1) against Application Count. Legend: Not Fraud, Fraud. Marked regions: False Positives, False Negatives.]
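The two error regions in the plot follow from choosing a cutoff on the model score. A minimal Python sketch of that counting, assuming a higher score means "more likely fraud"; the scores, labels, and the 0.5 cutoff are hypothetical and do not come from the slides:

```python
def confusion_counts(scores, is_fraud, threshold):
    """Count outcomes when applications scoring >= threshold are flagged as fraud."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, fraud in zip(scores, is_fraud):
        flagged = score >= threshold
        if flagged and fraud:
            counts["TP"] += 1
        elif flagged and not fraud:
            counts["FP"] += 1  # legitimate application mistaken for fraud
        elif not flagged and fraud:
            counts["FN"] += 1  # fraud that was missed
        else:
            counts["TN"] += 1
    return counts


# Hypothetical model scores and true labels, for illustration only.
scores = [0.05, 0.20, 0.35, 0.55, 0.60, 0.45, 0.80, 0.95]
is_fraud = [False, False, False, False, True, True, True, True]

counts = confusion_counts(scores, is_fraud, threshold=0.5)
print(counts)
```

Moving the cutoff left trades false negatives for false positives, and moving it right does the opposite.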

SLIDE 4

MINIMIZE LOSSES

Lost Profitability = (Fraud Cost * FN) + (Opportunity Cost * FP)

Legend: FP (Mistaken fraud), FN (Fraud missed)
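One way to use this formula is to evaluate it at several score cutoffs and keep the cheapest one. The sketch below assumes hypothetical scores, labels, candidate thresholds, and cost figures (500 per missed fraud, 100 per mistaken fraud); only the formula itself comes from the slide:

```python
def lost_profitability(fn, fp, fraud_cost, opportunity_cost):
    """Lost Profitability = (Fraud Cost * FN) + (Opportunity Cost * FP)."""
    return fraud_cost * fn + opportunity_cost * fp


def best_threshold(scores, is_fraud, thresholds, fraud_cost, opportunity_cost):
    """Return the (threshold, loss) pair with the lowest lost profitability."""
    best = None
    for t in thresholds:
        # FN: fraud scored below the cutoff; FP: legitimate scored at or above it.
        fn = sum(1 for s, f in zip(scores, is_fraud) if f and s < t)
        fp = sum(1 for s, f in zip(scores, is_fraud) if not f and s >= t)
        loss = lost_profitability(fn, fp, fraud_cost, opportunity_cost)
        if best is None or loss < best[1]:
            best = (t, loss)
    return best


# Hypothetical data and costs, for illustration only.
scores = [0.05, 0.20, 0.35, 0.55, 0.60, 0.45, 0.80, 0.95]
is_fraud = [False, False, False, False, True, True, True, True]
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]

threshold, loss = best_threshold(
    scores, is_fraud, thresholds, fraud_cost=500, opportunity_cost=100
)
print(threshold, loss)
```

Because a missed fraud is assumed to cost five times as much as a mistaken one here, the cheapest cutoff sits low and tolerates extra false positives.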

SLIDE 5

CURRENT STATE

Diagram labels: Customer; Application Service; Vendor; Fraud Detection; Rules, Model, Strategies

SLIDE 6

PROPOSED STATE

Diagram labels: Customer; Application Service; Vendor; Fraud Detection (appears twice); Strategies, Rules; CHAMPION; CHALLENGER MODELS

SLIDE 7

MODEL TRAINING

Supervised Learning

Historical Data -> Training -> Model -> Classification -> Fraud / Not Fraud
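As a rough illustration of supervised learning on labeled history, the sketch below fits a small logistic regression by gradient descent in plain Python. The choice of logistic regression, the two features, and every data value are assumptions made for this example; the slides do not say which algorithm or features were used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def classify(model, x, threshold=0.5):
    """Map a feature vector to 'Fraud' or 'Not Fraud'."""
    weights, bias = model
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return "Fraud" if score >= threshold else "Not Fraud"


# Hypothetical historical data: [scaled IP-to-zip distance, suspicious-domain flag].
historical_rows = [[0.10, 0], [0.20, 0], [0.15, 0], [0.90, 1], [0.80, 1], [0.95, 1]]
historical_labels = [0, 0, 0, 1, 1, 1]  # 1 = fraud, 0 = not fraud

model = train_logistic_regression(historical_rows, historical_labels)
print(classify(model, [0.85, 1]), classify(model, [0.10, 0]))
```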

SLIDE 8

DATA PATTERNS

Filter, Transform, Impute, Features

SLIDE 9

DATA FILTERING

Low Cardinality

SLIDE 10

DATA FILTERING

High Cardinality

SLIDE 11

DATA FILTERING

Medium Cardinality

SLIDE 12

DATA FILTERING

Medium Cardinality -> Predictive Model Training
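One reading of the filtering slides is that columns with too few or too many distinct values are set aside, and medium-cardinality columns go on to model training. A minimal sketch of that reading follows; the cutoffs (`min_distinct`, `max_ratio`), the column names, and the data are all hypothetical:

```python
def cardinality_class(values, min_distinct=2, max_ratio=0.9):
    """Label a column 'low', 'high', or 'medium' by its count of distinct values."""
    distinct = len(set(values))
    if distinct < min_distinct:
        return "low"   # e.g. a constant column carries no signal
    if distinct / len(values) > max_ratio:
        return "high"  # e.g. a near-unique identifier
    return "medium"


# Hypothetical application table, one list per column.
table = {
    "country": ["US"] * 8,
    "application_id": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
    "email_domain": ["gmail.com", "fraudster.com", "gmail.com", "fraudster.com",
                     "gmail.com", "us.gov", "gmail.com", "fraudster.com"],
}

classes = {name: cardinality_class(values) for name, values in table.items()}
kept = [name for name, label in classes.items() if label == "medium"]
print(classes, kept)
```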

SLIDE 13

DATA TRANSFORMATION

Columns: Email, Fraud Status

Email:
jack.smith@gmail.com
annie.may@fraudster.com
freddy.jr@gmail.com
nicole.jack@fraudster.com
jon.johnston@gmail.com
claudia.penns@us.gov
walter.carson@gmail.com
ben.benjamin@fraudster.com

SLIDE 14

DATA TRANSFORMATION

Columns: Domain name, Fraud Status

Domain name:
gmail.com
fraudster.com
gmail.com
fraudster.com
gmail.com
us.gov
gmail.com
fraudster.com
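The transformation from the Email column to the Domain name column can be sketched in a few lines of Python. The helper name `email_domain` is an assumption for illustration; the addresses are the ones shown on the slide:

```python
from collections import Counter


def email_domain(email):
    """Return the lowercased part after the last '@', or None if there is none."""
    _local, sep, domain = email.strip().rpartition("@")
    if not sep or not domain:
        return None
    return domain.lower()


emails = [
    "jack.smith@gmail.com",
    "annie.may@fraudster.com",
    "freddy.jr@gmail.com",
    "nicole.jack@fraudster.com",
    "jon.johnston@gmail.com",
    "claudia.penns@us.gov",
    "walter.carson@gmail.com",
    "ben.benjamin@fraudster.com",
]

domains = [email_domain(e) for e in emails]
domain_counts = Counter(domains)
print(domain_counts)
```

Collapsing eight distinct addresses into three domains turns a near-unique column into one a model can generalize from.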

SLIDE 15

DATA IMPUTATION

Handling Missing Data

Column 1, Column 2, Column 3, Column 4

SLIDE 16

DATA IMPUTATION

Handling Missing Data

Column 1, Column 2, Column 3, Column 4
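The slides do not say which imputation strategy was used. A common simple approach, sketched here as an assumption, fills missing numeric values with the column median and missing categorical values with the most frequent value; the table contents are hypothetical:

```python
from statistics import median, mode


def impute_column(values):
    """Fill None entries with the median (numeric) or the mode (anything else)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    if all(isinstance(v, (int, float)) for v in present):
        fill = median(present)
    else:
        fill = mode(present)
    return [fill if v is None else v for v in values]


# Hypothetical table with gaps marked as None.
table = {
    "Column 1": [10, None, 30, 20],
    "Column 2": ["a", "b", None, "a"],
    "Column 3": [1.5, 2.5, None, None],
    "Column 4": [None, None, None, None],
}

imputed = {name: impute_column(values) for name, values in table.items()}
print(imputed)
```

A column with no observed values is left untouched here, since any fill value would be invented.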

SLIDE 17

FEATURE SELECTION

IP to Zip Proximity
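A proximity feature like this could be computed as the great-circle distance between the location of the applicant's IP address and the centroid of the stated zip code. The sketch below uses the haversine formula; the lookup tables, coordinates, and function names are hypothetical stand-ins for a real geolocation source, which the slides do not name:

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical lookup tables; a real system would query geolocation data.
ZIP_CENTROIDS = {"10001": (40.75, -73.99)}
IP_LOCATIONS = {
    "203.0.113.7": (34.05, -118.24),   # far from the zip code
    "198.51.100.4": (40.75, -73.99),   # same place as the zip code
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ip_zip_distance_km(ip_address, zip_code):
    """Feature value: distance between the IP location and the zip centroid."""
    return haversine_km(*IP_LOCATIONS[ip_address], *ZIP_CENTROIDS[zip_code])


far = ip_zip_distance_km("203.0.113.7", "10001")
near = ip_zip_distance_km("198.51.100.4", "10001")
print(round(far), round(near))
```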

SLIDE 18

ARCHITECTURE

DATA SCIENTIST WORKFLOW: Raw Data, Transformed Data, Trained Model

DEVELOPER WORKFLOW: Trained Model, Score Applications

SLIDE 19

DATA SCIENTIST WORKFLOW

Raw Data, Transformed Data, Trained Model

Clean, Impute, Transform

Diagram labels: Historical Data Store; Binary Repository
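The data scientist workflow (clean, impute, transform, train, then publish the trained model as a binary artifact) can be sketched end to end in plain Python. Everything here is an assumption for illustration: the toy "model" is just a fraud rate per email domain, the binary repository is a dictionary, the artifact key is made up, and the rows are hypothetical.

```python
import pickle


def clean(rows):
    """Drop rows that have no fraud label."""
    return [r for r in rows if r.get("fraud") is not None]


def impute(rows):
    """Replace a missing email with a placeholder so later steps never see None."""
    return [{**r, "email": r["email"] or "unknown@unknown"} for r in rows]


def transform(rows):
    """Reduce each email to its lowercased domain."""
    return [{"domain": r["email"].rpartition("@")[2].lower(), "fraud": r["fraud"]}
            for r in rows]


def train(rows):
    """Toy model: observed fraud rate per domain."""
    totals, frauds = {}, {}
    for r in rows:
        totals[r["domain"]] = totals.get(r["domain"], 0) + 1
        frauds[r["domain"]] = frauds.get(r["domain"], 0) + int(r["fraud"])
    return {d: frauds[d] / totals[d] for d in totals}


# Hypothetical raw data pulled from a historical data store.
raw_data = [
    {"email": "a@example.com", "fraud": False},
    {"email": "b@example.net", "fraud": True},
    {"email": None, "fraud": False},
    {"email": "c@example.net", "fraud": True},
    {"email": "d@example.com", "fraud": None},  # unlabeled, dropped by clean()
    {"email": "E@Example.com", "fraud": False},
]

transformed = transform(impute(clean(raw_data)))
model = train(transformed)

# Publish: serialize the trained model and store the bytes under a versioned key.
binary_repository = {}
binary_repository["fraud-model/1"] = pickle.dumps(model)

# A consumer (the developer workflow) loads the same artifact back.
restored = pickle.loads(binary_repository["fraud-model/1"])
print(restored)
```

Publishing the model as a versioned binary keeps the training code out of the scoring service; the service only needs to load and call the artifact.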

SLIDE 20

DEVELOPER WORKFLOW

Trained Model, Score Applications

Diagram labels: Message Queue; Model 1, Model 2, Model 3; Shadow Mode; Decisioning & Analytics Platform; Application Service; Vendor; Model Predictions Store; Rules, Model, Strategies; Binary Repository; Model 1

SLIDE 21

DEVELOPER WORKFLOW

Trained Model, Score Applications

Diagram labels: Decisioning & Analytics Platform; Application Service; Model Predictions Store; Message Queue; Model 1, Model 2, Model 3; Shadow Mode; Champion Model; Binary Repository; Vendor; Rules, Strategies
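Shadow mode can be read as follows: one model's score drives the decision, while the other models score the same application and only have their predictions recorded. The sketch below is one possible shape for that loop, using Python's standard `queue.Queue` as a stand-in for the message queue and a list as a stand-in for the model predictions store. The three model functions, the application fields, and the 0.5 cutoff are hypothetical.

```python
import queue


# Hypothetical stand-ins for trained models loaded from the binary repository.
def model_1(app):
    return 0.9 if app["domain"] == "example.net" else 0.1


def model_2(app):
    return 0.7 if app["ip_zip_km"] > 1000 else 0.2


def model_3(app):
    return 0.5


def score_application(app, champion, shadow_models, predictions_store):
    """Score with the champion for the decision; record shadow scores only."""
    name, model = champion
    decision_score = model(app)
    predictions_store.append((app["id"], name, decision_score, "champion"))
    for shadow_name, shadow_model in shadow_models.items():
        predictions_store.append((app["id"], shadow_name, shadow_model(app), "shadow"))
    return decision_score


message_queue = queue.Queue()
for application in [
    {"id": "a1", "domain": "example.net", "ip_zip_km": 5},
    {"id": "a2", "domain": "example.com", "ip_zip_km": 3900},
]:
    message_queue.put(application)

predictions_store = []
decisions = {}
while not message_queue.empty():
    app = message_queue.get()
    score = score_application(
        app,
        champion=("Model 1", model_1),
        shadow_models={"Model 2": model_2, "Model 3": model_3},
        predictions_store=predictions_store,
    )
    decisions[app["id"]] = "Fraud" if score >= 0.5 else "Not Fraud"

print(decisions)
```

Application a2 shows the point of shadowing: Model 2 would have flagged it, but that score is only stored for later comparison and does not change the decision.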

SLIDE 22

ARCHITECTURE

DATA SCIENTIST WORKFLOW

DEVELOPER WORKFLOW

Rul Str

SLIDE 23

VALUE STREAM

Data Ingestion -> Model Training -> Governance -> Publish Model -> Publish Service -> Shadow Mode -> Governance -> Evaluation -> Champion Model

Scale markers: 25, 50, 75, 100
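The Evaluation step that ends in a Champion Model could, for example, compare the recorded shadow predictions against known outcomes and promote the model with the lowest lost profitability. This is a sketch under that assumption; the predictions, outcomes, cutoff, and cost figures are hypothetical, and the slides do not state the actual promotion criterion.

```python
def lost_profitability(predictions, outcomes, threshold, fraud_cost, opportunity_cost):
    """(Fraud Cost * FN) + (Opportunity Cost * FP) for one model's predictions."""
    fn = sum(1 for app_id, score in predictions.items()
             if outcomes[app_id] and score < threshold)
    fp = sum(1 for app_id, score in predictions.items()
             if not outcomes[app_id] and score >= threshold)
    return fraud_cost * fn + opportunity_cost * fp


# Hypothetical scores recorded per model while running in shadow mode.
shadow_predictions = {
    "Model 1": {"a1": 0.9, "a2": 0.2, "a3": 0.6, "a4": 0.4},
    "Model 2": {"a1": 0.8, "a2": 0.1, "a3": 0.3, "a4": 0.7},
    "Model 3": {"a1": 0.4, "a2": 0.3, "a3": 0.2, "a4": 0.8},
}
# Hypothetical confirmed outcomes: True means the application was fraud.
outcomes = {"a1": True, "a2": False, "a3": False, "a4": True}

losses = {
    name: lost_profitability(preds, outcomes, threshold=0.5,
                             fraud_cost=500, opportunity_cost=100)
    for name, preds in shadow_predictions.items()
}
champion = min(losses, key=losses.get)
print(losses, champion)
```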

SLIDE 24

THANK YOU

Sarah LeBlanc: sleblanc@thoughtworks.com, @sarah_g_leblanc
Hany Elemary: helemary@thoughtworks.com, @hanyelemary

Questions?