Anomaly Detection and Categorization Using Unsupervised Deep - PowerPoint PPT Presentation

Anomaly Detection and Categorization Using Unsupervised Deep Learning
S6340, Thursday 7th April 2016, GPU Technology Conference
A. Stephen McGough, Noura Al Moubayed, Jonathan Cumming, Eduardo Cabrera, Peter Matthews, Toby P. Breckon, Ed Ruck-Keene, Georgios Theodoropoulos
Durham University, UK

Intel Parallel Computing Centre

Why am I here?
• The UK has a major focus on Academic Impact
• Researchers collaborating with Industry
• Durham University has an Impact agenda
• Which paid for this trip
• I’m actively seeking collaborations with Companies / Organizations

The Problem
• “90% of all the data in the world has been generated over the last two years” (IBM)
• “85% of worldwide data is held in un-structured formats” (Berry and Kogan)
• How can we understand it? Or, better still, make use of it?
• How can we determine the most pertinent information? And then act on it?
• How can we find the needle if we are not sure what it looks like or what hay looks like?

Anomaly Detection Framework
[Figure: framework stages] Data Pre-processing → Topic Modelling → Deep Learning Engine → Anomalies and abnormal behaviors → Presentation of results

Topic Modelling

This report presents a proof of concept of our approach to solving anomaly detection problems using unsupervised deep learning. The work focuses on two specific models: deep restricted Boltzmann machines and stacked denoising autoencoders. The approach is tested on two datasets: the VAST Newsfeed Data (text data) and the Commission for Energy Regulation smart meter project dataset (numeric data). Topic modelling is used for feature extraction from the textual data. The results show a high correlation between the outputs of the two modelling techniques. The outliers in the energy data detected by the deep learning model show a clear pattern over the period of recorded data, demonstrating the potential of this approach for anomaly detection in big data problems where there is little or no prior knowledge or labels.

These results show the potential of using unsupervised deep learning methods to address anomaly detection problems. For example, the approach could be used to detect suspicious money transactions and help with the detection of terrorist funding activities, or it could be applied to the detection of potential criminal or terrorist activity using phone or digital records (e.g. Twitter, Facebook, and email).

[Figure: Topics]
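One way to check a claim such as "high correlation between the outputs of the two modelling techniques" is a Pearson correlation over per-record anomaly scores. The sketch below is only an illustration: the talk does not say which correlation measure was used, and the score lists are made-up values, not results from the work.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-record anomaly scores from the two models (illustration only).
drbm_scores = [0.10, 0.12, 0.80, 0.11, 0.95]
sda_scores = [0.20, 0.18, 0.70, 0.22, 0.90]
r = pearson(drbm_scores, sda_scores)
print(f"Pearson r between the two score lists: {r:.3f}")
```

A value of r close to 1 would mean the two models rank the same records as unusual; it says nothing about whether those records are true anomalies.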

Probabilistic Topic Modelling
• Unsupervised analysis of text
• Too many documents to label manually
• Allows us to automatically uncover themes that are latent in a collection of documents
• The same words may have different meanings depending on their co-occurrence with other words in a document
• Statistically identify the topics from a set of documents
• Which words are often found in the same document
• Statistically classify which topics appear in each document
• Which topics appear in each document
[Figure: plate diagram with nodes α, θ, z, w, β and plates labelled Document, Words in Document, Topic]
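The symbols α, θ, z, w, β in the figure match the usual notation for latent Dirichlet allocation (LDA), so the bullets above can be sketched with a small collapsed Gibbs sampler. This is a minimal illustration under that assumption, not the implementation used in the talk; the function name `lda_gibbs`, the hyperparameter values, and the toy documents are all hypothetical.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (theta, phi, vocab), where theta[d][k]
    is the share of topic k in document d and phi[k][v] is the probability
    of vocabulary word v under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # counts per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []                                       # topic z of every token
    for d, doc in enumerate(docs):                         # random initialisation
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
             for counts, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
           for k in range(n_topics)]
    return theta, phi, vocab

# Toy corpus (hypothetical): two loose themes, energy and finance.
docs = [
    "meter energy usage energy peak".split(),
    "energy meter reading peak usage".split(),
    "bank transfer account money".split(),
    "money account transfer bank transfer".split(),
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
```

Each row of `theta` is a per-document topic mixture, which is the kind of fixed-length numeric feature vector that could be passed on to a deep learning model.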

Anomaly Detection: Unsupervised Deep Learning
Deep Restricted Boltzmann Machine (DRBM): more hidden nodes than visible nodes
[Figure: layers of hidden nodes (h) above visible nodes (v); labels: Input Data, Construct, Output Data, Reconstruct, Reconstructed Input Data]
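The construct/reconstruct idea can be sketched with a single Bernoulli restricted Boltzmann machine trained by one-step contrastive divergence (CD-1), scoring each record by its reconstruction error. This is a simplification under stated assumptions: the slide shows a deep model, while this sketch has one hidden layer; the class name `RBM`, the layer sizes, learning rate, and toy data are hypothetical, and the talk does not state its training procedure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Single Bernoulli RBM with more hidden than visible units."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        # "Construct": visible layer to hidden layer
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        # "Reconstruct": hidden layer back to visible layer
        return sigmoid(h @ self.W.T + self.b_v)

    def train(self, data, epochs=200, lr=0.1):
        n = data.shape[0]
        for _ in range(epochs):
            h0 = self.hidden_probs(data)                      # positive phase
            h_sample = (self.rng.random(h0.shape) < h0).astype(float)
            v1 = self.visible_probs(h_sample)                 # one Gibbs step
            h1 = self.hidden_probs(v1)                        # negative phase
            self.W += lr * (data.T @ h0 - v1.T @ h1) / n
            self.b_v += lr * (data - v1).mean(axis=0)
            self.b_h += lr * (h0 - h1).mean(axis=0)

    def reconstruction_error(self, v):
        recon = self.visible_probs(self.hidden_probs(v))
        return ((v - recon) ** 2).mean(axis=1)

# Toy binary data (hypothetical): one dominant pattern, plus one odd record to score.
normal = np.tile(np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]), (50, 1))
odd = np.array([[0.0, 1.0, 0.0, 1.0, 0.0, 1.0]])
rbm = RBM(n_visible=6, n_hidden=10)
rbm.train(normal)
print("normal error:", rbm.reconstruction_error(normal[:1]))
print("odd error:   ", rbm.reconstruction_error(odd))
```

Records the model reconstructs poorly are candidates for anomalies; a deep variant would stack further RBMs on the hidden activations of the previous layer.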

Anomaly Detection: Unsupervised Deep Learning
Stacked Denoising Autoencoder (SDA): fewer hidden nodes than visible nodes
[Figure: layers of hidden nodes (h) between visible nodes (v); labels: Input Data, Construct, Output Data, Reconstruct, Reconstructed Input Data]
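A denoising autoencoder corrupts its input, compresses it through a smaller hidden layer, and is trained to reproduce the clean input. The sketch below shows one such layer with masking noise and plain gradient descent; a stacked model would train further layers on the encoded output. Everything here is an assumption for illustration: the class name `DenoisingAutoencoder`, the linear output layer, the hyperparameters, and the synthetic "meter reading" data are hypothetical and not taken from the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """One denoising autoencoder layer with fewer hidden than visible units."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.normal(0.0, 0.1, (n_visible, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = self.rng.normal(0.0, 0.1, (n_hidden, n_visible))
        self.b2 = np.zeros(n_visible)

    def encode(self, x):
        return sigmoid(x @ self.W1 + self.b1)

    def decode(self, h):
        return h @ self.W2 + self.b2  # linear output for real-valued data

    def train(self, data, epochs=500, lr=0.05, corruption=0.3):
        n = data.shape[0]
        for _ in range(epochs):
            # Masking noise: zero out a random fraction of the inputs.
            mask = self.rng.random(data.shape) >= corruption
            noisy = data * mask
            h = self.encode(noisy)
            out = self.decode(h)
            # Gradient of the squared error against the CLEAN data,
            # averaged over records.
            d_out = 2.0 * (out - data) / n
            d_h = (d_out @ self.W2.T) * h * (1.0 - h)
            self.W2 -= lr * (h.T @ d_out)
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * (noisy.T @ d_h)
            self.b1 -= lr * d_h.sum(axis=0)

    def reconstruction_error(self, x):
        return ((x - self.decode(self.encode(x))) ** 2).mean(axis=1)

# Synthetic readings (hypothetical): a shared 8-value profile plus small noise.
rng = np.random.default_rng(42)
profile = np.array([0.2, 0.2, 0.3, 0.6, 0.8, 0.7, 0.4, 0.2])
readings = profile + rng.normal(0.0, 0.02, size=(100, 8))
dae = DenoisingAutoencoder(n_visible=8, n_hidden=3)
dae.train(readings)
scores = dae.reconstruction_error(readings)
print("mean reconstruction error:", float(scores.mean()))
```

As with the RBM, the per-record reconstruction error serves as the anomaly score; the bottleneck forces the model to keep only the structure shared by most records.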

Overall Methodology
[Figure: two processing paths from Inputs to Outputs]
• (Un)labelled Data → Probabilistic Text Topic Modelling → Unsupervised Deep Learning → Anomalies
• Labelled Data → Probabilistic Text Topic Modelling → Supervised Deep Learning → Stereotypes
• Anomaly categorisation: Pertinent → Pertinent Activity; Benign → Benign Activity
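The last two stages, flagging anomalies and then categorising them as pertinent or benign, could look roughly like the following. This is a stand-in under loud assumptions: the mean-plus-k-standard-deviations threshold and the nearest-stereotype rule are simple substitutes chosen for illustration (the slide names supervised deep learning for the stereotypes), and the function names, scores, and feature vectors are hypothetical.

```python
import math
import statistics

def flag_anomalies(scores, k=2.0):
    """Return indices whose score exceeds mean + k * standard deviation."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    threshold = mean + k * std
    return [i for i, s in enumerate(scores) if s > threshold]

def categorise(feature, stereotypes):
    """Assign the label of the closest stereotype vector (Euclidean distance)."""
    return min(stereotypes, key=lambda label: math.dist(feature, stereotypes[label]))

# Hypothetical reconstruction errors from an unsupervised model.
scores = [0.10, 0.12, 0.09, 0.11, 0.10, 0.95]
# Hypothetical feature vectors for the same records.
features = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.1, 0.1), (0.2, 0.2), (0.9, 0.8)]
# Hypothetical stereotypes, standing in for what a supervised model would learn.
stereotypes = {"pertinent": (1.0, 1.0), "benign": (0.0, 0.0)}

for i in flag_anomalies(scores):
    print(i, categorise(features[i], stereotypes))
```

The split mirrors the figure: the unsupervised path needs no labels to find unusual records, and labelled data is only required for the smaller task of sorting those records into pertinent and benign activity.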
