Improving Electric Fraud Detection Using Class Imbalance Strategies



SLIDE 1

Outline: Problem description · Data imbalance problem · Strategy proposed · Results and conclusions

Improving Electric fraud detection using class imbalance strategies

  • Eng. Federico Decia
  • Eng. Matías Di Martino
  • Eng. Juan Molinelli
  • Prof. Alicia Fernández

Instituto de Ingeniería Eléctrica, Facultad de Ingeniería Universidad de la República Montevideo, Uruguay.

SLIDE 2

Introduction

SLIDE 3

Problem description

Nontechnical losses represent a very high cost to power supply companies.

Background: research has been carried out in several countries to tackle this problem:

  • Ramos et al., 2010 ← Brazil
  • Nagi and Mohamad, 2010 ← Malaysia
  • Muniz et al., 2009 ← Rio de Janeiro, Brazil

Uruguay: the national electric power company (henceforth called UTE) faces the problem by manually monitoring a group of customers.

SLIDE 4

Problem description

Difficulties

  • Large number of customers (the capital city alone has 500,000).
  • Wide variety of scams and ways to tamper with consumption meters.

SLIDE 5

Problem description

Other factors:

  • Fraud history
  • Building address and dimensions
  • Counter type

SLIDE 6

Problem description

Objective: develop an automatic tool that, based on manually labeled data, detects new suspect customers.

SLIDE 7

Problem description

Data Description

SLIDE 8

Data

DATASET 1

  • 1504 industrial profiles (October 2004 – September 2009)
  • Each profile is represented by the customer's monthly consumption.
  • Labels for each customer are provided by UTE.
  • Used for training and theoretical evaluation.

DATASET 2

  • 3300 industrial profiles (January 2008 – January 2011)
  • Each profile is represented by the customer's monthly consumption.
  • Used for on-field evaluation.

SLIDE 9

Data

SLIDE 10

Class imbalance problem

The class imbalance problem
When working in the fraud detection field, one cannot assume that the number of people who commit fraud is the same as the number who do not; there are usually far fewer samples in the fraud class.


SLIDE 13

Class imbalance problem

Strategies:

  • Under-sampling
  • One-Class SVM and Cost-Sensitive SVM
  • Recall, Precision and F-value
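As an illustration, the under-sampling strategy can be sketched in a few lines: keep every minority-class (fraud) sample and draw a random subset of the majority class. The function name, labels and `ratio` parameter are illustrative, not taken from the slides.

```python
import random

def undersample(X, y, ratio=1.0, seed=0):
    """Keep all minority (label 1) samples and a random subset of
    majority (label 0) samples of size ratio * |minority|."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]  # fraud
    neg = [i for i, label in enumerate(y) if label == 0]  # normal
    keep = rng.sample(neg, min(len(neg), int(ratio * len(pos))))
    idx = sorted(pos + keep)
    return [X[i] for i in idx], [y[i] for i in idx]

# 10 fraud profiles among 100 customers -> balanced 10/10 training set
X = [[float(i)] for i in range(100)]
y = [1] * 10 + [0] * 90
Xb, yb = undersample(X, y)
```

Discarding data is the price of this strategy, which is why the deck also considers cost-sensitive and one-class formulations that use all samples.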

SLIDE 14

Class imbalance problem

Recall_p = TP / (TP + FN)
Recall_n = TN / (TN + FP)
Precision = TP / (TP + FP)
F-value = (1 + β²) · Recall_p · Precision / (β² · Recall_p + Precision)

                    Labeled as Positive    Labeled as Negative
Class Positive      TP (True Positive)     FN (False Negative)
Class Negative      FP (False Positive)    TN (True Negative)

Table: Confusion matrix
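These metrics follow directly from the confusion matrix; a minimal sketch, with the F-value reducing to the usual F1 score when β = 1:

```python
def recall_p(tp, fn):
    # recall on the positive (fraud) class
    return tp / (tp + fn)

def recall_n(tn, fp):
    # recall on the negative class
    return tn / (tn + fp)

def precision(tp, fp):
    return tp / (tp + fp)

def f_value(tp, fp, fn, beta=1.0):
    # weighted harmonic mean of positive recall and precision
    rp, pr = recall_p(tp, fn), precision(tp, fp)
    return (1 + beta ** 2) * rp * pr / (beta ** 2 * rp + pr)

# e.g. tp=50, fp=150, fn=50: recall 0.5, precision 0.25, F1 = 1/3
score = f_value(50, 150, 50)
```

On imbalanced data the F-value is a more informative target than accuracy, since a classifier that labels everything negative gets high accuracy but an undefined or zero F-value.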

SLIDE 15

Class imbalance problem

Strategy proposed

SLIDE 16

Block diagram

Block Diagram

Figure: Block Diagram

The system input corresponds to the last three years of each customer's monthly consumption curve.

SLIDE 17

Block diagram

Features

SLIDE 18

Features

Consumption ratios for the last 3, 6 and 12 months relative to the average consumption.
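One possible reading of this feature, sketched under the assumption that a profile is a list of monthly consumption values (the helper name and `windows` parameter are illustrative):

```python
def consumption_ratios(monthly, windows=(3, 6, 12)):
    """Mean consumption over the last w months divided by the
    overall mean consumption of the profile."""
    overall = sum(monthly) / len(monthly)
    return [(sum(monthly[-w:]) / w) / overall for w in windows]

# A flat consumer yields ratios of 1.0; a recent drop (possible fraud)
# pushes the short-window ratios below 1.0.
flat = consumption_ratios([5.0] * 36)
```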

SLIDE 19

Features

Norm of the difference between the expected consumption and the actual consumption.

Figure: Consumptions are estimated from the consumption of the same month of the previous year, multiplied by the ratio between the mean consumptions of the two years.
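Following the caption's estimation rule, a sketch of this feature might look like the following (assumes at least 24 months of data; names are illustrative):

```python
import math

def deviation_norm(monthly):
    """Norm of (actual - expected) over the last 12 months, where each
    expected value is the same month of the previous year scaled by the
    ratio between the two yearly means."""
    actual, previous = monthly[-12:], monthly[-24:-12]
    scale = (sum(actual) / 12.0) / (sum(previous) / 12.0)
    expected = [c * scale for c in previous]
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)))

# A customer whose curve simply doubled keeps its shape: deviation 0.
d = deviation_norm(list(range(1, 13)) + [2 * v for v in range(1, 13)])
```

Because the expected curve is rescaled to the current yearly mean, this feature reacts to changes in the *shape* of the seasonal pattern rather than to uniform growth or shrinkage.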

SLIDE 20

Features

Difference between the Fourier and wavelet coefficients of the last year and those of the previous years.
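A sketch of the Fourier half of this feature using NumPy; the wavelet counterpart is omitted since it would need an extra library such as PyWavelets, and the number of coefficients compared is an assumption:

```python
import numpy as np

def fourier_feature(monthly, n_coef=5):
    """Distance between the leading FFT coefficient magnitudes of the
    last year and of the previous year."""
    last = np.abs(np.fft.rfft(monthly[-12:]))[:n_coef]
    prev = np.abs(np.fft.rfft(monthly[-24:-12]))[:n_coef]
    return float(np.linalg.norm(last - prev))

# Two identical years -> identical spectra -> feature value 0.
year = [1, 3, 2, 5, 4, 6, 2, 1, 3, 4, 5, 2]
same = fourier_feature(year * 2)
```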

SLIDE 21

Features

Difference between the coefficients of the best-fit polynomial

Figure: Difference in the coefficients of the polynomial that best fits the consumption curve.

SLIDE 22

Features

Distance to the mean customer

Figure: We compare each consumption curve with the mean curve.
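A minimal sketch of this distance, assuming all profiles have the same length:

```python
import numpy as np

def distance_to_mean(profiles):
    """Euclidean distance from each consumption curve to the mean
    curve taken over all customers."""
    P = np.asarray(profiles, dtype=float)
    return np.linalg.norm(P - P.mean(axis=0), axis=1)

# Two symmetric profiles are equally far from their mean.
dists = distance_to_mean([[0.0, 0.0], [2.0, 2.0]])
```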

SLIDE 23

Features

Variance

Figure: Changes in the variance value.
Figure: Comparison with the normal variance value.

SLIDE 24

Features

Global Characteristic

Figure: Modulus of the first five Fourier coefficients.
Figure: Slope of the straight line that best fits the consumption curve.

SLIDE 25

Feature selection

Feature selection

Evaluation methods used:
  • Filter → CfsSubsetEval (Weka)
  • Wrapper, aiming to improve the F-value of the different classifiers considered

Search methods used:
  • Exhaustive search (only for CfsSubsetEval)
  • Best First (for all other methods)

SLIDE 26

Feature selection

Classifiers

SLIDE 27

Classifiers

Classifiers used to tackle this problem:

  • One-Class Support Vector Machine (O-SVM)
  • Cost-Sensitive Support Vector Machine (CS-SVM)
  • Optimum Path Forest (OPF)
  • C4.5 decision tree

SLIDE 28

Classifiers

SVM parameters

  • Compromise parameter: O-SVM → ν; CS-SVM → C
  • Kernel: Gaussian → γ

Optimal parameters were found using 10-fold cross-validation; performance was measured by the F-value.
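A rough scikit-learn analogue of this parameter search, offered only as a sketch: `SVC` with `class_weight` stands in for the cost-sensitive penalty, synthetic data stands in for the real consumption features, and `scoring="f1"` is the F-value with β = 1.

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Synthetic imbalanced data standing in for the consumption features.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1],
                           random_state=0)

# Gaussian (RBF) kernel; grid over C and gamma, 10-fold CV on F-value.
search = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),
    param_grid={"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1]},
    scoring="f1",
    cv=10,
)
search.fit(X, y)
```

The one-class variant would instead tune ν on `sklearn.svm.OneClassSVM`, which is trained on a single class.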

SLIDE 29

Classifiers

Optimum Path Forest

  • Euclidean distance was used as the distance function.
  • Raw input vectors were used (instead of the features proposed here).
  • The majority class is under-sampled during the training step of the algorithm to improve the final performance (in the F-value sense).

SLIDE 30

Classifiers

C4.5
The fourth classifier used is the decision tree proposed by Ross Quinlan: C4.5. As with OPF, we under-sample the majority class during the training step of the algorithm to improve the final performance (in the F-value sense).

SLIDE 31

Classifiers

Combination

SLIDE 32

Classifiers

Combination

Why?
  • Improve final performance
  • More robust and general solution

Decision rule:

g_p(x) = λp_os·dp_os + λp_cs·dp_cs + λp_OPF·dp_OPF + λp_Tree·dp_Tree   (1)
g_n(x) = λn_os·dn_os + λn_cs·dn_cs + λn_OPF·dn_OPF + λn_Tree·dn_Tree   (2)

where di_j(x) = 1 if classifier j labels the sample as class i, and 0 otherwise.

g_p(x) > g_n(x) → x labeled as positive
g_p(x) < g_n(x) → x labeled as negative
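The decision rule of equations (1)–(2) can be sketched as follows; the weight values below are illustrative (the slides do not give numeric weights), and ties are resolved to negative as an assumption since the slides only define the strict cases:

```python
def combine(labels, lam_p, lam_n):
    """Weighted vote: labels maps classifier name -> 'p' or 'n';
    d_i_j = 1 exactly when classifier j voted class i, so each sum
    collects the weights of the classifiers voting that class."""
    g_p = sum(lam_p[j] for j, lab in labels.items() if lab == 'p')
    g_n = sum(lam_n[j] for j, lab in labels.items() if lab == 'n')
    return 'p' if g_p > g_n else 'n'  # tie -> 'n' (assumption)

votes = {'os': 'p', 'cs': 'n', 'OPF': 'p', 'Tree': 'n'}
lam_p = {'os': 0.9, 'cs': 0.8, 'OPF': 0.7, 'Tree': 0.6}
lam_n = {'os': 0.3, 'cs': 0.3, 'OPF': 0.3, 'Tree': 0.3}
# g_p = 0.9 + 0.7 = 1.6 > g_n = 0.3 + 0.3 = 0.6 -> positive
decision = combine(votes, lam_p, lam_n)
```

Separate positive and negative weights let the rule trade Recall against Precision per classifier, which a single accuracy weight cannot do.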

SLIDE 33

Classifiers

Weights

Traditional case (Kuncheva, 2004), weighted majority vote rule:
  • Hypothesis: independence
  • Objective: maximize overall accuracy

λ_j = log( Accuracy_j / (1 − Accuracy_j) )

where Accuracy_j is the ratio of correctly classified samples for the j-th classifier.

SLIDE 34

Classifiers

Example: first and second classifiers

First/Second        Labeled as Positive   Labeled as Negative
Class Positive      0                     100
Class Negative      0                     9900

Third classifier

Third               Labeled as Positive   Labeled as Negative
Class Positive      50                    50
Class Negative      150                   9750

The first two classifiers never label a sample as positive, yet their accuracy (99%) is higher than that of the third (98%), so accuracy-based weights favour the useless classifiers.

SLIDE 35

Classifiers

Weights

With this in mind, but taking into account that we want a solution with a good balance between Recall and Precision, the following weights λ^{p,n}_j were proposed:

  • λ^i_j = log( (1 + Recall^p_j) / (1 − Recall^p_j) )
  • λ^i_j = log( (1 + Fvalue_j) / (1 − Fvalue_j) )
  • λ^i_j = log( Accuracy_j / (1 − Accuracy_j) )
  • λ^p_j = Recall^n_j and λ^n_j = Recall^p_j
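A sketch of two of these weightings: the accuracy-based log-ratio and the recall-swap pair (the Recall- and F-value-based log-ratio forms follow the same pattern). Function names are illustrative.

```python
import math

def weight_accuracy(acc):
    """Accuracy-based weight (Kuncheva): log-odds of being correct.
    Zero at 50% accuracy, growing without bound as accuracy -> 1."""
    return math.log(acc / (1.0 - acc))

def weights_recall_swap(recall_p, recall_n):
    """Last proposed pair: the positive weight is the negative-class
    recall and vice versa, so a classifier that rarely cries wolf
    (high Recall_n) gets a strong say in positive votes."""
    return recall_n, recall_p
```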

SLIDE 36

Classifiers

Optimal weights

  • λ_i ∈ [0 : 0.05 : 1]
  • Exhaustive search
  • F-value maximization (10-fold cross-validation)

Results: every alternative for combining the classifiers' outputs improved on the individual classifiers' performance.

SLIDE 37

Classifiers

Conclusions

SLIDE 38

Results

Labeling results on DATASET 1

Description     Acc. (%)   Rec. (%)   Pre. (%)   F-val. (%)
O-SVM           84.9       54.9       50.8       52.8
CS-SVM          84.5       62.8       49.7       55.5
OPF             80.1       62.2       40.5       49.0
Tree (C4.5)     79.0       64.6       39.0       48.6
Combination     86.2       64.0       54.4       58.8

Table: Data Set 1 labeling results

SLIDE 39

Results

On-field results
Tests were done in the following way:

1. Train the classification algorithm using DATASET 1.
2. Classify samples from DATASET 2. Let's call DATASET 2P the samples of DATASET 2 labeled as positive (associated with abnormal consumption behaviour).
3. Inspect the customers in DATASET 2P.

SLIDE 40

Results

On-field results

  • 340 samples of DATASET 2P inspected (340/560)
  • 11 fraudulent activities detected
  • 4 suspect activities detected (currently being analyzed)


SLIDE 42

Results

Results analysis: the automatic framework achieved a hit rate between 3.3% and 4.4%. Manual fraud detection performed by UTE's experts during 2010 had a hit rate of about 4%.

SLIDE 43

Conclusions

Conclusions

Results are promising, especially taking into account that manual detection considers more information than just the consumption curve, such as fraud history, building dimensions and contracted power, among others. Our software could complement and improve the experts' knowledge.

SLIDE 44

Future work

Future work

Improve the final performance and monitor larger customer sets, aiming to reach all customers in Uruguay. For example, by adding more features to our learning algorithm, such as:

  • Counter type (digital or analog)
  • Customer type (dwelling or industrial)
  • Contracted power

SLIDE 45

Future work

Thank you