slide-1
SLIDE 1

Anomaly Detection and Categorization Using Unsupervised Deep Learning

S6340 Thursday 7th April 2016 GPU Technology Conference

  • A. Stephen McGough, Noura Al Moubayed, Jonathan Cumming, Eduardo Cabrera, Peter Matthews, Toby P. Breckon, Ed Ruck-Keene, Georgios Theodoropoulos

Durham University, UK

slide-2
SLIDE 2

Intel Parallel Computing Centre

slide-3
SLIDE 3

Why I’m here?

  • UK has a major focus on Academic Impact
  • Researchers collaborating with Industry
  • Durham University has an Impact agenda
  • Which paid for this trip
  • I’m actively seeking collaborations with Companies / Organizations

slide-4
SLIDE 4

The Problem

  • “90% of all the data in the world has been generated over the last two years” … IBM
  • “85% of worldwide data is held in unstructured formats” … Berry and Kogan
  • How can we understand it? … or better still, make use of it?
  • How can we determine the most pertinent information? … and then act on it?
  • How can we find the needle if we are not sure what it looks like or what hay looks like?

slide-5
SLIDE 5

Anomaly Detection Framework

Data Pre-processing → Topic Modeling → Deep Learning Engine → Anomalies and abnormal behaviors → Presentation of results

slide-6
SLIDE 6

Topic Modelling

This report presents a proof of concept of our approach to solving anomaly detection problems using unsupervised deep learning. The work focuses on two specific models, namely deep restricted Boltzmann machines and stacked denoising autoencoders. The approach is tested on two datasets: the VAST Newsfeed Data and the Commission for Energy Regulation smart meter project dataset, with text data and numeric data respectively. Topic modeling is used for feature extraction from textual data. The results show high correlation between the output of the two modeling techniques. The outliers in energy data detected by the deep learning model show a clear pattern over the period of recorded data, demonstrating the potential of this approach in anomaly detection within big data problems where there is little or no prior knowledge or labels. These results show the potential of using unsupervised deep learning methods to address anomaly detection problems. For example, it could be used to detect suspicious money transactions and help with detection of terrorist funding activities, or it could be applied to the detection of potential criminal or terrorist activity using phone or digital records (e.g. Twitter, Facebook, and email).

Topics

slide-9
SLIDE 9

Probabilistic Topic Modelling

  • Unsupervised analysis of text
  • Too many documents to label manually
  • Allows us to automatically uncover themes that are latent in a collection of documents
  • The same words may have different meanings depending on their co-occurrence with other words in a document
  • Statistically identify the topics from a set of documents
  • Which words are often found in the same document
  • Statistically classify which topics appear in each document
  • Which topics appear in each document

[Figure: plate diagram with variables α, θ, z, β, w; labels: Topic, Document, Words in Document]
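The bullets above describe learning topics from word co-occurrence and then describing each document by its topic mix. A minimal sketch of that idea follows, assuming the plate diagram's variables (α, θ, z, β, w) denote a latent Dirichlet allocation style model; the slides do not name the model or the inference method. The function name `lda_gibbs`, the hyperparameter values and the toy documents are all hypothetical, and collapsed Gibbs sampling is swapped in here only because it fits in a few lines.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for an LDA-style topic model (illustrative only).

    docs is a list of token lists. Returns (theta, vocab, topic_word), where
    theta[d][k] is the smoothed share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # n(d, k)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # n(k, w)
    topic_total = [0] * n_topics                            # n(k)
    assign = []                                             # topic z of every token

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + n_words * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, vocab, topic_word


# Hypothetical toy corpus: two promotional messages and two everyday ones.
docs = [
    "free prize claim now win cash".split(),
    "win free cash prize call now".split(),
    "meeting at lunch see you later".split(),
    "see you at the meeting later today".split(),
]
theta, vocab, topic_word = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is a fixed-length numeric description of one document, which is the kind of feature vector a downstream model can consume in place of raw text.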

slide-10
SLIDE 10

Anomaly Detection: Unsupervised Deep Learning

Deep Restricted Boltzmann Machine (DRBM)

  • More hidden nodes than visible nodes

[Figure: DRBM network diagram with a layer of visible nodes (v) and wider layers of hidden nodes (h); labels: Input Data, Construct, Reconstruct, Reconstructed Input Data, Output Data]

slide-11
SLIDE 11

Anomaly Detection: Unsupervised Deep Learning

Stacked Denoising Autoencoder (SDA)

  • Fewer hidden nodes than visible nodes

[Figure: SDA network diagram with layers of visible nodes (v) and narrower layers of hidden nodes (h); labels: Input Data, Construct, Reconstruct, Reconstructed Input Data, Output Data]
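One common way to turn a construct/reconstruct network into an anomaly detector is to score each input by how badly it is reconstructed. The sketch below illustrates that with a single-layer denoising autoencoder in plain Python; it is not the stacked model from the talk, and the slides do not say that reconstruction error is the score the authors use. The function names, the one-unit hidden layer, the learning rate and the toy data are all assumptions chosen to keep the example small.

```python
import math
import random

def train_denoising_autoencoder(data, n_hidden, epochs=300, lr=0.1,
                                noise=0.2, seed=0):
    """Train a one-hidden-layer denoising autoencoder with plain SGD.

    Returns a function that maps an input vector to its reconstruction.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in

    def forward(x):
        # Sigmoid hidden layer ("construct"), linear output ("reconstruct").
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(w1, b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(w2, b2)]
        return h, y

    for _ in range(epochs):
        for x in data:
            # Corrupt the input by zeroing random features; the target stays clean.
            noisy = [0.0 if rng.random() < noise else xi for xi in x]
            h, y = forward(noisy)
            err = [yi - xi for yi, xi in zip(y, x)]  # gradient of 0.5 * squared error
            # Back-propagate to the hidden layer before touching w2.
            grad_h = [sum(err[i] * w2[i][j] for i in range(n_in)) * h[j] * (1 - h[j])
                      for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    w2[i][j] -= lr * err[i] * h[j]
                b2[i] -= lr * err[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    w1[j][k] -= lr * grad_h[j] * noisy[k]
                b1[j] -= lr * grad_h[j]

    return lambda x: forward(x)[1]


def reconstruction_error(reconstruct, x):
    """Mean squared error between an input and its reconstruction."""
    y = reconstruct(x)
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(x)


# Hypothetical data: normal points follow the pattern [a, a, 1 - a, 1 - a].
normal = [[a, a, 1 - a, 1 - a] for a in (i / 20 for i in range(2, 19))]
reconstruct = train_denoising_autoencoder(normal, n_hidden=1)

normal_scores = [reconstruction_error(reconstruct, x) for x in normal]
anomaly_score = reconstruction_error(reconstruct, [1.0, 0.0, 1.0, 0.0])
print("largest normal error:", round(max(normal_scores), 4))
print("anomaly error:", round(anomaly_score, 4))
```

Because the hidden layer is narrower than the input, the network can only reproduce inputs that resemble the training pattern, so the point that breaks the pattern is reconstructed poorly. A real deployment would still need a threshold on the score, which this sketch does not choose.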

slide-12
SLIDE 12

Overall Methodology

[Figure: overall methodology flow diagram]

Inputs: (Un)labelled Data; Text (via Probabilistic Topic Modelling); Labelled Data (Benign Activity, Pertinent Activity); Labelled Text (via Probabilistic Topic Modelling)

Stages: Unsupervised Deep Learning; Anomalies; Anomaly categorisation; Supervised Deep Learning; Stereotypes

Outputs: Benign; Pertinent

slide-23
SLIDE 23

Example: Anomaly Identification

SPAM and HAM in SMS

  • Auto-identification of SPAM from HAM in SMS messages

  • 5574 SMS messages processed
  • 4827 HAM messages
  • 747 SPAM messages
slide-24
SLIDE 24

Anomaly Identification

SPAM and HAM in SMS

slide-25
SLIDE 25

Comparison

Classifier              SC%     BH%     Acc%    MCC
TM+SDA                  85.59   0.62    97.51   0.899
Logistic Reg. + tok2    95.48   2.09    97.59   0.899
SVM + tok1              83.10   0.18    97.64   0.893
Boosted NB + tok2       84.48   0.53    97.50   0.887
SMO + tok2              82.91   0.29    97.50   0.887
Boosted C4.5 + tok2     81.53   0.62    97.05   0.865
MDL + tok1              75.44   0.35    96.26   0.826
PART + tok2             78.00   1.45    95.87   0.810
Random Forest + tok2    65.23   0.12    95.36   0.782
C4.5 + tok2             75.25   2.08    95.00   0.770
Bern NB + tok1          54.03   0.00    94.00   0.711
MN TF NB + tok1         52.06   0.00    93.74   0.697
MN Bool NB + tok1       51.87   0.00    93.72   0.695
1NN + tok2              43.81   0.00    92.70   0.636
Basic NB + tok1         48.53   1.42    92.05   0.600
Gauss NB + tok1         47.54   1.39    91.95   0.594
1Flex NB + tok1         47.35   2.77    90.72   0.536
Boolean NB + tok1       98.04   26.01   77.13   0.507
3NN + tok2              23.77   0.00    90.10   0.462
EM + tok2               17.09   4.18    85.54   0.185
TR                      0.00    0.00    86.95   -

  • SC% - SPAM Caught
  • BH% - Blocked HAM
  • Acc% - Accuracy
  • MCC - Matthews Correlation Coefficient (a coefficient between -1 and 1, not a percentage)
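The four columns can all be derived from a spam filter's confusion matrix. The sketch below assumes the usual reading of the legend: SC% is the share of spam that was caught, BH% is the share of ham that was wrongly blocked. The function name and the example counts are hypothetical and are not taken from the experiment on the slide.

```python
import math

def spam_filter_metrics(tp, fp, tn, fn):
    """Compute SC%, BH%, Acc% and MCC from confusion-matrix counts.

    tp: spam caught, fn: spam missed, fp: ham blocked, tn: ham delivered.
    """
    sc = 100.0 * tp / (tp + fn)                    # SPAM Caught
    bh = 100.0 * fp / (fp + tn)                    # Blocked HAM
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)  # Accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sc, bh, acc, mcc


# Hypothetical counts for a filter evaluated on 100 spam and 900 ham messages.
sc, bh, acc, mcc = spam_filter_metrics(tp=90, fp=5, tn=895, fn=10)
print(f"SC%={sc:.2f} BH%={bh:.2f} Acc%={acc:.2f} MCC={mcc:.3f}")

# A filter that lets every message through catches no spam and blocks no ham.
print(spam_filter_metrics(tp=0, fp=0, tn=900, fn=100))
```

MCC is useful here because the classes are unbalanced: a filter that lets everything through still scores a high accuracy, while its MCC carries no signal.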

slide-26
SLIDE 26

Performance

  • Approach is computationally intensive
  • Need to reduce execution time to tractable level
  • Use of GPGPUs to improve the performance of the framework
  • Have been used previously with Deep Learning showing significant benefits

  • But focused on Dense Data (images / sound)
  • This is a sparse data problem
slide-27
SLIDE 27

Execution Time

[Figure: two panels plotting Time (s) against Batch Size (10, 100, 1000, 2000) for SDA - CPU, SDA - GPGPU, DRBM - CPU and DRBM - GPGPU; one panel for Data Size 160MB and one for Data Size 1.7GB]

System: Intel Xeon E5-2650 v3 2.3GHz, 64GB RAM, 2 x 300GB 15k RPM SAS
GPU: NVIDIA K40
Dataset: Electric meter readings from Ireland

slide-28
SLIDE 28

Speedup

[Figure: Speedup against Batch Size (10, 100, 1000) for SDA 160MB, SDA 1.7GB, DRBM 160MB and DRBM 1.7GB; the speedup axis runs from 2 to 12]

slide-29
SLIDE 29

Possible Applications:

  • Terrorist activity tracking
  • Acting out of character, predicting activity
  • Ship/Flight tracking data
  • Hijacking, Flight deviations
  • Police crime database
  • Criminal profiling, acting out of character
  • Unwanted information release
  • Topic changes, specific damaging subjects (e.g. WikiLeaks)

  • Student applications
  • Identifying bogus attempts for visa
  • Social media Tracking
  • Social grooming, political persuasion
  • Safety camera tracks
  • Normal movements of people in area
  • Illegal financial transactions
  • Fraud, laundering

  • A. Stephen McGough, Noura Al Moubayed, Jonathan Cumming, Eduardo Cabrera, Peter Matthews, Toby P. Breckon, Ed Ruck-Keene, Georgios Theodoropoulos

Durham University, UK
stephen.mcgough@durham.ac.uk