World Class Payment and Enterprise Solutions
for the global financial sector
Machine Learning and Fraud Detection
February 2020
Tamsin Crossland – Senior Architect
@CrosslandTamsin
Example: flag a credit card transaction if it is more than ten times larger than the average for this customer.
Allows the human experts to apply their subject matter expertise.
Difficult and time-consuming to implement well: every single rule for every possible anomaly must be painstakingly defined.
If the experts omit a rule, the corresponding anomalies go undetected and nobody suspects it.
Today, legacy systems apply about 300 different rules on average to approve a transaction.
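A minimal sketch of the "ten times the customer average" rule, to make the idea concrete. The function name, the `factor` parameter and the sample amounts are hypothetical and do not come from any real rule engine.

```python
from statistics import mean


def exceeds_customer_average(amount, history, factor=10.0):
    """Return True if `amount` is more than `factor` times the customer's average.

    `history` is the list of the customer's previous transaction amounts.
    """
    if not history:
        # No baseline yet: this single rule cannot decide, so it does not flag.
        return False
    return amount > factor * mean(history)


# Hypothetical customer whose past transactions average 30.0
history = [20.0, 35.0, 50.0, 15.0]
print(exceeds_customer_average(250.0, history))  # below 10 x 30.0
print(exceeds_customer_average(350.0, history))  # above 10 x 30.0
```

Even this one rule needs a decision for the empty-history case, which hints at why maintaining hundreds of such rules is laborious.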
Rule Based | Machine Learning
Catches obvious fraudulent scenarios | Finds hidden correlations in data
Large amount of manual work to enumerate all possible detection rules | Automatic detection of possible fraud scenarios
Easier to explain | More difficult to explain
Data Analysis
data mining and data analysis
Contains two days' worth of credit card transactions made in September 2013 by European cardholders.
492 frauds out of 284,807 transactions (0.172%).
Due to confidentiality issues, the original features and further background information about the data cannot be provided.
Contains only numerical input variables, which are the result of a Principal Component Analysis (PCA) transformation (a method of extracting relevant information from confusing data sets).
284315 -> 492
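Reading the arrow as random undersampling of the non-fraud class (284,807 - 492 = 284,315 rows cut down to 492, matching the fraud count) is an assumption based on the later mention of random undersampling. Under that assumption, a minimal sketch on a toy stand-in for the dataset could look like this; `random_undersample` and `label_of` are hypothetical names.

```python
import random


def random_undersample(rows, label_of, seed=0):
    """Keep every fraud row and an equally sized random sample of non-fraud rows."""
    rng = random.Random(seed)
    fraud = [r for r in rows if label_of(r) == 1]
    normal = [r for r in rows if label_of(r) == 0]
    # Discard non-fraud rows at random until both classes have the same size.
    kept = rng.sample(normal, k=min(len(fraud), len(normal)))
    balanced = fraud + kept
    rng.shuffle(balanced)
    return balanced


# Toy stand-in: 1000 rows of (id, label), of which 5 are labelled fraud.
rows = [(i, 1 if i < 5 else 0) for i in range(1000)]
balanced = random_undersample(rows, label_of=lambda r: r[1])
print(len(balanced), sum(label for _, label in balanced))
```

The balanced subset is much smaller than the original data, which is the information loss that oversampling approaches try to avoid.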
Janio Martinez Bachmann
seaborn: a library for making statistical graphics in Python.
imbalanced-learn: a toolbox for imbalanced datasets in machine learning.
Used to show which features heavily influence whether a transaction is fraudulent.
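The slide text does not name the technique. Assuming it refers to a correlation matrix, one way to sketch the idea is to compute the Pearson correlation of each feature with the fraud label and rank features by its absolute value. The feature names and values below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (spread_x * spread_y)


# Hypothetical features: feature_a tracks the label, feature_b does not.
features = {
    "feature_a": [0.1, 0.2, 0.1, 2.5, 2.7, 2.4],
    "feature_b": [1.0, 0.9, 1.1, 1.0, 0.9, 1.1],
}
labels = [0, 0, 0, 1, 1, 1]  # 1 = fraud

# Rank features by how strongly they correlate with the label.
ranking = sorted(features, key=lambda name: abs(pearson(features[name], labels)), reverse=True)
for name in ranking:
    print(name, round(pearson(features[name], labels), 3))
```

A strong positive or negative coefficient suggests the feature is worth a closer look; it does not by itself show that the feature causes fraud.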
t-SNE takes a high-dimensional dataset and reduces it to a low-dimensional graph whilst still retaining a lot of the information.
Solving the Class Imbalance: SMOTE creates synthetic points from the minority class in
Location of the synthetic points: SMOTE picks the distance between the closest neighbors
Final Effect: More information is retained since we didn't have to delete any rows unlike in random undersampling.
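The core of the SMOTE idea described above can be sketched in a few lines: pick a minority-class point, pick one of its nearest minority neighbours, and place a synthetic point somewhere on the line between them. This is a simplified illustration, not the imbalanced-learn implementation; `smote_sketch` and its parameters are hypothetical names.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points by interpolating between minority neighbours.

    `minority` is a list of at least two equal-length coordinate tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k minority points closest to `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority (fraud) points in two dimensions.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=20)
print(len(minority), len(new_points))
```

The original rows are left untouched and only new minority points are added, which is why no information is discarded.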
Compile the model
The following example uses accuracy, the fraction of the transactions that are correctly classified.
with the weights. The loss function is the guide to the terrain, telling the optimizer when it’s moving in the right or wrong direction.
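Accuracy needs care on a dataset this imbalanced. As a small worked example using the class counts quoted earlier (492 frauds out of 284,807 transactions), the sketch below scores a hypothetical "model" that predicts "not fraud" for every transaction.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


N_TOTAL, N_FRAUD = 284_807, 492

# Labels with the same class balance as the dataset (1 = fraud, 0 = not fraud).
actual = [1] * N_FRAUD + [0] * (N_TOTAL - N_FRAUD)

# A "model" that never predicts fraud.
always_no = [0] * N_TOTAL

baseline = accuracy(actual, always_no)
print(f"accuracy of always predicting 'not fraud': {baseline:.2%}")
```

The baseline comes out above 99.8% without catching a single fraud, so a high accuracy figure alone says little about how well frauds are detected.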
            | Predicted: no  | Predicted: yes
Actual: no  | True negative  | False positive
Actual: yes | False negative | True positive
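The four cells of the matrix can be counted directly from actual and predicted labels. The sketch below does that on invented labels and also derives precision and recall, two standard measures built from these cells that are added here for illustration and are not named on the slide.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for binary labels (1 = yes, 0 = no)."""
    counts = Counter(zip(actual, predicted))
    return {
        "tn": counts[(0, 0)],  # actual no,  predicted no
        "fp": counts[(0, 1)],  # actual no,  predicted yes
        "fn": counts[(1, 0)],  # actual yes, predicted no
        "tp": counts[(1, 1)],  # actual yes, predicted yes
    }


def precision_recall(cells):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted_yes = cells["tp"] + cells["fp"]
    actual_yes = cells["tp"] + cells["fn"]
    precision = cells["tp"] / predicted_yes if predicted_yes else 0.0
    recall = cells["tp"] / actual_yes if actual_yes else 0.0
    return precision, recall


# Hypothetical labels for eight transactions.
actual = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0, 1, 0, 0, 1, 0, 1, 0]

cells = confusion_counts(actual, predicted)
precision, recall = precision_recall(cells)
print(cells, precision, recall)
```

In fraud detection a false negative is a missed fraud and a false positive is a legitimate transaction flagged by mistake, so the two error cells usually carry very different costs.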
K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.
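The iterative assign-then-update loop described above can be sketched with the standard library alone. This is a bare-bones illustration under simple assumptions (random initial centroids taken from the data, a fixed number of iterations, Euclidean distance); `k_means` and the sample points are hypothetical.

```python
import math
import random


def k_means(points, k, iterations=20, seed=0):
    """Group `points` (equal-length tuples) into k clusters.

    Returns the final centroids and the cluster index assigned to each point.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Two well-separated hypothetical groups of unlabelled points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = k_means(points, k=2)
print(assignment)
```

No labels are used at any point: the groups emerge purely from feature similarity, and K has to be chosen up front.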