Introduction & Motivation Bart Baesens Professor Data Science - PowerPoint PPT Presentation

DataCamp Fraud Detection in R
Introduction & Motivation
Bart Baesens, Professor Data Science at KU Leuven

What is fraud?
Fraud is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime which appears in many types and forms.

Impact of fraud
Fraud is very rare, but the cost of not detecting it can be huge! Examples:
- Organizations lose 5% of their yearly revenues to fraud
- Businesses lose more than $3.5 trillion to fraud each year
- Credit card companies lose approximately 7 cents per $100 of transactions to fraud
- Fraud accounts for 5-10% of the claim amounts paid for non-life insurance

Types of fraud
- Anti-money laundering
- Check fraud
- (Credit) card fraud
- Click fraud
- Customs fraud
- Counterfeit
- Identity theft
- Insurance fraud
- Mortgage fraud
- Non-delivery fraud
- Online fraud
- Product warranty fraud
- Tax evasion
- Telecommunication fraud
- Theft of inventory
- Threat
- Ticket fraud
- Transit fraud
- Wire fraud
- Workers compensation fraud

Key characteristics of successful fraud analytics models
- Statistical accuracy
- Interpretability
- Regulatory compliance
- Economic cost
Complement expert-based approaches with data-driven techniques.

Challenges of a fraud detection model
- Imbalance: in credit card fraud, typically fewer than 0.5% of transactions are fraudulent
- Operational efficiency: in credit card fraud, a decision is needed in less than 8 seconds
- Avoid harassing good customers

Imbalanced data
After a major storm, an insurance company received many claims.
Fraudulent claims are labeled with 1 and legitimate claims with 0.
The percentage of fraud cases in the data can be determined by combining the functions table() and prop.table():
> prop.table(table(fraud_label))
     0      1
0.9911 0.0089

Imbalanced data
Visualize the imbalance with a pie chart:
> labels <- c("no fraud", "fraud")
> labels <- paste(labels, round(100*prop.table(table(fraud_label)), 2))
> labels <- paste0(labels, "%")
> pie(table(fraud_label), labels, col = c("blue", "red"), main = "Pie chart of storm claims")

Evaluation of supervised method: confusion matrix

Confusion matrix: claims example
Suppose no detection model is used, so all claims are considered legitimate:
> predictions <- rep.int(0, nrow(claims))
> # Levels must match the 0/1 coding of the labels; otherwise every prediction becomes NA
> predictions <- factor(predictions, levels = c(0, 1))
Function confusionMatrix() from package caret:
> library(caret)
> # fraud_label must be a factor with the same levels as predictions
> confusionMatrix(data = predictions, reference = fraud_label)
Confusion Matrix and Statistics
          Reference
Prediction   0   1
         0 614  14
         1   0   0
Accuracy : 0.9777
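The matrix above can also be reproduced without caret, which makes it easier to see why accuracy is misleading here. The sketch below is only an illustration: the label vector is rebuilt from the reported counts (614 legitimate and 14 fraudulent claims), which is an assumption because the original claims data is not shown.

```r
# Hypothetical labels rebuilt from the counts reported in the slide:
# 614 legitimate claims (0) and 14 fraudulent claims (1).
fraud_label <- factor(c(rep(0, 614), rep(1, 14)), levels = c(0, 1))

# "No detection model": every claim is predicted as legitimate (0).
predictions <- factor(rep(0, length(fraud_label)), levels = c(0, 1))

# Rows are predictions, columns are the true labels.
cm <- table(Prediction = predictions, Reference = fraud_label)

accuracy <- sum(diag(cm)) / sum(cm)        # share of correct predictions
recall   <- cm["1", "1"] / sum(cm[, "1"])  # share of frauds that were caught

print(cm)
print(round(accuracy, 4))
print(recall)
```

Accuracy is high only because legitimate claims dominate; recall shows that none of the 14 fraudulent claims is detected.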

Total cost of not detecting fraud: claims example
The total cost of fraud is defined as the sum of the fraudulent amounts.
Total cost if no fraud is detected:
> total_cost <- sum(claim_amount[fraud_label == 1])
> print(total_cost)
[1] 2301508

Time features
Bart Baesens, Professor Data Science at KU Leuven

Analyzing time
Certain events are expected to occur at similar moments in time.
Example: a customer making transactions at similar hours.
Aim: capture information about the time aspect with meaningful features.
Dealing with time can be tricky:
- 00:00 = 24:00
- no natural ordering: is 23:00 < 01:00 or 23:00 > 01:00?

Mean of timestamps
Do not use the arithmetic mean to compute an average timestamp!
Example: transactions made at 01:00, 02:00, 21:00 and 22:00 have an arithmetic mean of 11:30, but no transaction was made close to that time!
> data(timestamps)
> head(timestamps)
[1] "20:27:28" "21:08:41" "01:30:16" "00:57:04" "23:12:14" "22:54:16"
Convert digital timestamps to decimal format in hours:
> library(lubridate)
> ts <- as.numeric(hms(timestamps)) / 3600
> head(ts)
[1] 20.4577778 21.1447222  1.5044444  0.9511111 23.2038889 22.9044444
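The four-transaction example can be checked by hand in base R. This is a minimal sketch, not the course's own code: it treats each hour as an angle on a 24-hour clock and averages the corresponding unit vectors. The variable names are illustrative.

```r
# Transaction times from the example, in decimal hours.
hours <- c(1, 2, 21, 22)

# The arithmetic mean ignores that the clock wraps around at 24:00.
arithmetic_mean <- mean(hours)

# Map each hour to an angle on the 24-hour clock (in radians).
angles <- 2 * pi * hours / 24

# Average the unit vectors and convert the resulting direction back to hours.
mean_angle <- atan2(mean(sin(angles)), mean(cos(angles)))
periodic_mean <- (mean_angle * 24 / (2 * pi)) %% 24

print(arithmetic_mean)  # 11.5, i.e. 11:30
print(periodic_mean)    # 23.5, i.e. 23:30
```

The periodic mean lands at 23:30, between the late-evening and early-morning transactions, which is where the activity actually clusters.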

Circular histogram
> library(ggplot2)
> clock <- ggplot(data.frame(ts), aes(x = ts)) +
    geom_histogram(breaks = seq(0, 24), colour = "blue", fill = "lightblue") +
    coord_polar()
> arithmetic_mean <- mean(ts)
> clock + geom_vline(xintercept = arithmetic_mean, linetype = 2, color = "red", size = 2)

Circular histogram with arithmetic mean

von Mises distribution
Model time as a periodic variable using the von Mises probability distribution (Correa Bahnsen et al., 2016).
Periodic normal distribution = normal distribution wrapped around a circle.
von Mises distribution of a set of timestamps $D = \{t_1, t_2, \ldots, t_n\}$:
$D \sim \text{vonMises}(\mu, \kappa)$
- μ: periodic mean, a measure of location; the distribution is clustered around μ
- 1/κ: periodic variance; κ is a measure of concentration
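For reference, the standard form of the von Mises density is written out below. The slides do not show it, so the mapping from a timestamp $t$ in hours to an angle $\theta$ is stated here as an assumption about how a 24-hour clock is placed on the circle.

```latex
f(\theta \mid \mu, \kappa) = \frac{\exp\bigl(\kappa \cos(\theta - \mu)\bigr)}{2\pi I_0(\kappa)},
\qquad \theta = \frac{2\pi t}{24},
```

Here $I_0$ is the modified Bessel function of the first kind of order 0 and $\mu$ is expressed in radians. For $\kappa = 0$ the density is uniform on the circle, and for large $\kappa$ it is close to a normal density with mean $\mu$ and variance $1/\kappa$.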

Estimating parameters μ and κ
> # Convert the decimal timestamps to class "circular"
> library(circular)
> ts <- circular(ts, units = "hours", template = "clock24")
> head(ts)
Circular Data:
[1] 20.457889 21.144607 1.504422 0.950982 23.203917 4.904397
> estimates <- mle.vonmises(ts)
> p_mean <- estimates$mu %% 24
> concentration <- estimates$kappa
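To see what the estimation step computes, the maximum likelihood estimates can be sketched in base R. This is an illustration under stated assumptions, not the internals of mle.vonmises(): it uses the six decimal timestamps shown earlier, the textbook estimating equations, and no bias correction. The helper `A` and the search interval are choices made for this sketch.

```r
# Timestamps in decimal hours (the six values shown earlier in the slides).
ts_hours <- c(20.4577778, 21.1447222, 1.5044444, 0.9511111, 23.2038889, 22.9044444)
theta <- 2 * pi * ts_hours / 24

# Mean direction: angle of the average unit vector, converted back to hours.
C <- mean(cos(theta))
S <- mean(sin(theta))
mu_hat <- atan2(S, C)
p_mean <- (mu_hat * 24 / (2 * pi)) %% 24

# Mean resultant length: close to 1 when the times are tightly clustered.
R_bar <- sqrt(C^2 + S^2)

# kappa solves I1(kappa) / I0(kappa) = R_bar.
# Exponentially scaled Bessel functions avoid overflow for large kappa.
A <- function(kappa) {
  besselI(kappa, 1, expon.scaled = TRUE) / besselI(kappa, 0, expon.scaled = TRUE)
}
concentration <- uniroot(function(k) A(k) - R_bar, interval = c(1e-6, 500))$root

print(p_mean)
print(concentration)
```

The periodic mean falls late in the evening, consistent with most of the six timestamps lying between 20:00 and 02:00.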

Circular histogram with periodic mean

Confidence interval
Extract new features: confidence interval for the time of a transaction.
$S = \{x_i^{time} \mid i = 1, \ldots, n\}$: set of transactions made by the same customer
(1) Estimate μ(S) and κ(S) based on S:
> estimates <- mle.vonmises(ts)
> p_mean <- estimates$mu %% 24
> concentration <- estimates$kappa
(2) Calculate the density (= likelihood) of the timestamps for the estimated von Mises distribution:
> densities <- dvonmises(ts, mu = p_mean, kappa = concentration)

Feature extraction
Binary feature indicating whether a new transaction time is within the confidence interval (CI) with probability α (e.g. 0.90, 0.95).
A timestamp is within the 90% CI if its density is larger than the cutoff value:
> alpha <- 0.90
> quantile <- qvonmises((1 - alpha)/2, mu = p_mean, kappa = concentration) %% 24
> cutoff <- dvonmises(quantile, mu = p_mean, kappa = concentration)
Binary time feature: TRUE if the timestamp lies inside the CI, FALSE otherwise
> time_feature <- densities >= cutoff
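The density cutoff can be illustrated in base R without the circular package. The sketch below is an approximation of the idea rather than a reimplementation of qvonmises(): it assumes a central interval that is symmetric around the periodic mean, and the parameters `mu`, `kappa` and the new transaction times are hypothetical values chosen for illustration.

```r
# Von Mises density on angles in radians (scaled Bessel function for stability).
dvm <- function(theta, mu, kappa) {
  exp(kappa * (cos(theta - mu) - 1)) /
    (2 * pi * besselI(kappa, 0, expon.scaled = TRUE))
}

# Hypothetical customer profile: transactions cluster around 22:00.
mu <- 2 * pi * 22 / 24
kappa <- 4
alpha <- 0.90

# Probability mass within angular distance d of the periodic mean.
central_mass <- function(d) {
  integrate(dvm, mu - d, mu + d, mu = mu, kappa = kappa)$value
}

# Half-width of the central interval that holds probability alpha.
half_width <- uniroot(function(d) central_mass(d) - alpha,
                      interval = c(1e-6, pi))$root

# The density at the interval boundary serves as the cutoff.
cutoff <- dvm(mu + half_width, mu, kappa)

# Hypothetical new transaction times in decimal hours.
new_hours <- c(21.5, 23.75, 9)
densities <- dvm(2 * pi * new_hours / 24, mu, kappa)
time_feature <- densities >= cutoff

print(time_feature)
```

Because the density decreases with distance from the mean, comparing densities against the boundary density is the same as asking whether a timestamp falls inside the central interval. The two evening transactions are flagged TRUE and the 09:00 transaction FALSE.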

Confidence interval

Example

Confidence interval with moving time window
> print(ts)
[1] 18.42 20.45 20.88  0.75 19.20 23.65  6.08
> time_feature = c(NA, NA)
> for (i in 3:length(ts)) {
    # Previous timestamps
    ts_history <- ts[1:(i-1)]
    # Estimate mu and kappa on historic timestamps
    estimates <- mle.vonmises(ts_history)
    p_mean <- estimates$mu %% 24
    concentration <- estimates$kappa
    # Estimate density of current timestamp
    dens_i <- dvonmises(ts[i], mu = p_mean, kappa = concentration)
    # Check if density is larger than cutoff with confidence level 90%
    alpha <- 0.90
    quantile <- qvonmises((1-alpha)/2, mu=p_mean, kappa=concentration) %% 24
    cutoff <- dvonmises(quantile, mu = p_mean, kappa = concentration)
    time_feature[i] <- dens_i >= cutoff
  }
> print(time_feature)
[1]    NA    NA  TRUE FALSE  TRUE  TRUE FALSE

Frequency features
Tim Verdonck, Professor Data Science at KU Leuven
