Normal versus abnormal behaviour Charlotte Werger Data Scientist - PowerPoint PPT Presentation

DataCamp Fraud Detection in Python FRAUD DETECTION IN PYTHON Normal versus abnormal behaviour Charlotte Werger Data Scientist

Fraud detection without labels
Use unsupervised learning to distinguish normal from abnormal behaviour. Abnormal behaviour is not necessarily fraudulent. This is challenging, because the results are difficult to validate. But... it is realistic, because very often you don't have reliable labels.

What is normal behaviour?
Thoroughly describe your data: plot histograms, check for outliers, investigate correlations and talk to the fraud analyst. Are there any known historic cases of fraud? What typifies those cases? Normal behaviour for one type of client may not be normal for another, so check patterns within subgroups of the data: is your data homogeneous?
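A minimal sketch of these first descriptive checks with pandas, assuming a DataFrame of numeric transaction features. The data is synthetic and the column names `amount` and `n_transactions` are hypothetical; the 1.5 * IQR rule is just one common convention for flagging outliers, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical transaction data; the column names are illustrative assumptions
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.lognormal(mean=3.0, sigma=0.5, size=1000),
    "n_transactions": rng.poisson(lam=5, size=1000),
})

# Summary statistics per column (count, mean, std, quartiles, min, max)
summary = df.describe()

# Pairwise Pearson correlations between the numeric columns
correlations = df.corr()

# Flag outliers in the amount with the 1.5 * IQR rule
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["amount_outlier"] = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)

print(summary)
print(correlations)
print("Share of amount outliers:", df["amount_outlier"].mean())
# df["amount"].hist(bins=50)  # histogram of the amounts; needs matplotlib
```

Talking to the fraud analyst and reviewing known historic cases has no code equivalent; these checks only tell you what to ask about.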

Customer segmentation: normal behaviour within segments
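To see why segments matter, the sketch below compares a global z-score with a z-score computed within each customer segment. The segments, amounts and the single injected transaction are hypothetical; the point is only that a value can look ordinary across all customers yet be extreme for its own segment.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two customer segments with very different typical spend
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "segment": ["retail"] * 500 + ["business"] * 500,
    "amount": np.concatenate([
        rng.normal(50, 10, 500),     # retail customers spend around 50
        rng.normal(5000, 800, 500),  # business customers spend around 5000
    ]),
})
# One retail transaction of 400: unremarkable overall, unusual for a retail customer
df.loc[0, "amount"] = 400.0

# Global z-score: compares every transaction with all customers at once
df["z_global"] = (df["amount"] - df["amount"].mean()) / df["amount"].std()

# Segment z-score: compares each transaction only with its own segment
grouped = df.groupby("segment")["amount"]
df["z_segment"] = (df["amount"] - grouped.transform("mean")) / grouped.transform("std")

print(df.loc[0, ["segment", "amount", "z_global", "z_segment"]])
```

The injected transaction has a global z-score below 1 in absolute value but a segment z-score far above 3, so it only stands out once normal behaviour is defined per segment.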


Refresher on clustering methods

Clustering: trying to detect patterns in data

K-means clustering: using the distance to cluster centroids
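As a reminder of what k-means computes, here is a minimal NumPy sketch of Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is an illustration, not scikit-learn's implementation; the deterministic farthest-point initialisation is an assumption standing in for the k-means++ initialisation that `KMeans` uses by default, and the function names are hypothetical.

```python
import numpy as np

def assign_clusters(X, centroids):
    # Euclidean distance from every point to every centroid, shape (n_points, k)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # Each point belongs to its nearest centroid
    return dists.argmin(axis=1)

def kmeans_lloyd(X, k, n_iter=100):
    """Minimal k-means (Lloyd's algorithm); returns (centroids, labels)."""
    # Farthest-point initialisation: start from the first point, then repeatedly
    # add the point furthest from all centroids chosen so far
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        labels = assign_clusters(X, centroids)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign_clusters(X, centroids)

# Usage on three well-separated synthetic blobs
rng = np.random.default_rng(42)
true_centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in true_centres])
centroids, labels = kmeans_lloyd(X, k=3)

# Distance of each point to its own centroid: the quantity later used to flag fraud
dist_to_centroid = np.linalg.norm(X - centroids[labels], axis=1)
```

Farthest-point initialisation is easy to follow but sensitive to outliers, which is one reason library implementations prefer k-means++ with several restarts.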


K-means clustering in Python

    # Import the packages
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.cluster import KMeans

    # Transform and scale your data (df is your DataFrame of numeric features)
    X = np.array(df).astype(float)
    scaler = MinMaxScaler()
    X_scaled = scaler.fit_transform(X)

    # Define the k-means model and fit to the data
    kmeans = KMeans(n_clusters=6, random_state=42).fit(X_scaled)

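The right number of clusters
Checking the number of clusters: silhouette method or elbow curve.

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    # Fit k-means for 1 to 9 clusters
    clust = range(1, 10)
    kmeans = [KMeans(n_clusters=i) for i in clust]
    # KMeans.score returns the negative within-cluster sum of squares (inertia)
    score = [km.fit(X_scaled).score(X_scaled) for km in kmeans]

    # Plot the score against the number of clusters and look for the "elbow"
    plt.plot(clust, score)
    plt.xlabel('Number of Clusters')
    plt.ylabel('Score')
    plt.title('Elbow Curve')
    plt.show()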
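The slide names the silhouette method but only shows code for the elbow curve. A minimal sketch of the silhouette method, assuming synthetic blobs in place of `X_scaled`: fit k-means for each candidate k and keep the k with the highest mean silhouette coefficient (values range from -1 to 1; higher means better separated clusters). The four blob centres are chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for X_scaled: four well-separated blobs, scaled to [0, 1]
X, _ = make_blobs(n_samples=600, centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
                  cluster_std=0.6, random_state=7)
X_scaled = MinMaxScaler().fit_transform(X)

# The silhouette score needs at least 2 clusters, so the range starts at 2
clust = range(2, 10)
sil = [
    silhouette_score(X_scaled, KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled))
    for k in clust
]

# Choose the number of clusters with the highest mean silhouette coefficient
best_k = clust[int(np.argmax(sil))]
print(dict(zip(clust, np.round(sil, 3))), "best k:", best_k)
```

Unlike the elbow curve, which needs a visual judgement, the silhouette method gives a single score per k, although on real transaction data the maximum is often much less pronounced than on these clean blobs.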

The Elbow Curve

Assigning fraud versus non-fraud cases

Starting with clustered data

Assign the cluster centroids

Define distances from the cluster centroid

Flag as fraud the points furthest away from their cluster centroid

Flagging fraud based on distance to centroid

    import numpy as np
    from sklearn.cluster import KMeans

    # Run the k-means model on scaled data
    kmeans = KMeans(n_clusters=6, random_state=42).fit(X_scaled)
    # Get the cluster number for each data point
    X_clusters = kmeans.predict(X_scaled)
    # Save the cluster centroids
    X_clusters_centers = kmeans.cluster_centers_
    # Calculate the distance to the cluster centroid for each point
    dist = np.linalg.norm(X_scaled - X_clusters_centers[X_clusters], axis=1)
    # Flag the 7% of points furthest from their centroid as fraud (1), the rest as 0
    km_y_pred = (dist >= np.percentile(dist, 93)).astype(int)
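The same procedure can be run end to end on synthetic data to check that it behaves as intended. In this sketch, assuming three dense groups of normal points plus 20 points placed on a distant ring as stand-in anomalies, the 93rd-percentile cut-off flags roughly 7% of all points, and the injected anomalies end up among them. The data and group positions are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Three dense groups of "normal" points (hypothetical data)
normal = np.vstack([rng.normal(loc, 0.3, size=(300, 2)) for loc in ([0, 0], [4, 0], [2, 4])])
# 20 stand-in anomalies on a ring of radius 7 to 9 around the groups
angles = rng.uniform(0, 2 * np.pi, 20)
radii = rng.uniform(7, 9, 20)
anomalies = np.column_stack([2 + radii * np.cos(angles), 1.33 + radii * np.sin(angles)])
X_scaled = MinMaxScaler().fit_transform(np.vstack([normal, anomalies]))

# Cluster, then measure each point's distance to its own cluster centroid
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)
X_clusters = kmeans.predict(X_scaled)
dist = np.linalg.norm(X_scaled - kmeans.cluster_centers_[X_clusters], axis=1)

# Flag the points at or beyond the 93rd percentile of distance as fraud (1)
km_y_pred = (dist >= np.percentile(dist, 93)).astype(int)
print(km_y_pred.sum(), "of", len(km_y_pred), "points flagged")
```

The percentile fixes the share of flagged points in advance, so the choice of 93 encodes an assumption about how much suspicious activity you expect, not something the model learns.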

Validating your model results
Check with the fraud analyst, investigate and describe the flagged cases in more detail, and compare them to known past cases of fraud.
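When some past fraud cases are known, the comparison can be made quantitative. A minimal sketch with scikit-learn's `confusion_matrix` and `roc_auc_score`; the ten labels and flags below are made up for illustration. With only a handful of known cases, these numbers are a rough sanity check rather than a full validation.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical: known past outcomes (1 = confirmed fraud) for ten transactions...
y_known = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1])
# ...and the model's flags for the same transactions (e.g. km_y_pred)
km_y_pred = np.array([0, 0, 1, 1, 0, 0, 1, 0, 0, 0])

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_known, km_y_pred).ravel()
# ROC AUC of the binary flags (0.5 = no better than chance, 1.0 = perfect)
auc = roc_auc_score(y_known, km_y_pred)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}, ROC AUC={auc:.3f}")
```

This prints `TN=6 FP=1 FN=1 TP=2, ROC AUC=0.762`: two of the three known fraud cases are flagged, at the cost of one false alarm.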

Other clustering fraud detection methods

There are many different clustering methods

And different ways of flagging fraud: using the smallest clusters

In reality it looks more like this

DBSCAN versus k-means
No need to predefine the number of clusters. Instead, adjust the maximum distance between two points for them to count as neighbours (eps), and set the minimum number of samples in a neighbourhood for a point to count as a core point of a cluster (min_samples). Better performance on irregularly shaped data. But... higher computational costs.
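A minimal sketch of the point about irregularly shaped data, assuming scikit-learn's `make_moons` toy data as a stand-in: k-means, even when told the correct number of clusters, can only split the two half-moons along a straight boundary, while DBSCAN, given only `eps` and `min_samples`, recovers both moons. The values `eps=0.2` and `min_samples=5` are tuned to this toy data, not general defaults.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: a standard example of non-convex clusters
X, y_true = make_moons(n_samples=300, noise=0.03, random_state=0)

# k-means must be told the number of clusters and favours roughly round clusters
km_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# DBSCAN gets a neighbourhood radius and a minimum neighbourhood size instead;
# it infers the number of clusters and labels low-density points as noise (-1)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
n_clusters_db = len(set(db_labels)) - (1 if -1 in db_labels else 0)

# Agreement with the true moon membership (1.0 = perfect, around 0 = random)
print("k-means ARI:", round(adjusted_rand_score(y_true, km_labels), 3))
print("DBSCAN ARI: ", round(adjusted_rand_score(y_true, db_labels), 3), "clusters:", n_clusters_db)
```

The flip side is that DBSCAN's result depends strongly on `eps`: on scaled transaction data a radius that works in one region of feature space may merge or fragment clusters elsewhere.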

Implementing DBSCAN

    from sklearn.cluster import DBSCAN

    # Fit DBSCAN on the scaled data
    db = DBSCAN(eps=0.5, min_samples=10, n_jobs=-1).fit(X_scaled)
    # Get the cluster labels (aka numbers); noise points get the label -1
    pred_labels = db.labels_
    # Count the total number of clusters, excluding the noise label -1
    n_clusters_ = len(set(pred_labels)) - (1 if -1 in pred_labels else 0)
    # Print model results
    print('Estimated number of clusters: %d' % n_clusters_)

Output: Estimated number of clusters: 31

Checking the size of the clusters

    import numpy as np
    from sklearn import metrics

    # Print model results (noise points, label -1, are scored as one extra cluster here)
    print("Silhouette Coefficient: %0.3f" % metrics.silhouette_score(X_scaled, pred_labels))

Output: Silhouette Coefficient: 0.359

    # Get sample counts in each cluster, ignoring noise points (label -1)
    counts = np.bincount(pred_labels[pred_labels >= 0])
    print(counts)

Output: [ 763 496 840 355 1086 676 63 306 560 134 28 18 262 128 332 22 22 13 31 38 36 28 14 12 30 10 11 10 21 10 5]
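Building on the counts above, one way to implement the "smallest clusters" idea is to flag every transaction that falls into the n smallest clusters. A minimal sketch, with hypothetical labels standing in for `db.labels_` and `n_smallest = 2` as an arbitrary choice; whether to also flag DBSCAN's noise points (label -1) is a separate design decision.

```python
import numpy as np

# Hypothetical DBSCAN output: one cluster label per transaction (-1 = noise)
pred_labels = np.array([0, 0, 0, 0, 1, 1, 1, 2, 0, 3, 1, 0, -1, 2, 0, 1])

# Sample counts per cluster, ignoring noise points
counts = np.bincount(pred_labels[pred_labels >= 0])

# Cluster labels of the n smallest clusters; n is a tuning choice
n_smallest = 2
smallest_clusters = np.argsort(counts)[:n_smallest]

# Flag transactions in the smallest clusters as suspicious (1), all others as 0
db_y_pred = np.isin(pred_labels, smallest_clusters).astype(int)
print(counts, smallest_clusters, db_y_pred)
```

Here clusters 3 and 2 (sizes 1 and 2) are the smallest, so three transactions are flagged; the noise point is left unflagged under this particular rule.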
