
SLIDE 1

Correlation-aware Deep Generative Model for Unsupervised Anomaly Detection

Haoyi Fan 1, Fengbin Zhang 1, Ruidong Wang 1, Liang Xi 1, Zuoyong Li 2

Harbin University of Science and Technology 1, Minjiang University 2

isfanhy@hrbust.edu.cn

SLIDE 2

Background

Fig. Normal and anomalous samples in the observed space and in the latent space.

SLIDE 3

Background

Applications: Fraud Detection, Disease Detection, Fault Detection, Intrusion Detection.

Image credits:
https://www.explosion.com/135494/5-effective-strategies-of-fraud-detection-and-prevention-for-ecommerce/
https://blog.exporthub.com/working-with-chinese-manufacturers/
https://planforgermany.com/switching-private-public-health-insurance-germany/
https://towardsdatascience.com/building-an-intrusion-detection-system-using-deep-learning-b9488332b321

SLIDE 4

Background

Unsupervised Anomaly Detection – From the Density Estimation Perspective

Data samples: $X_{\text{train}} = \{x_1, x_2, x_3, \dots, x_n\}$, each $x_i$ assumed to be normal.

SLIDE 5

Background

Unsupervised Anomaly Detection – From the Density Estimation Perspective

Data samples: $X_{\text{train}} = \{x_1, x_2, x_3, \dots, x_n\}$, each $x_i$ assumed to be normal.

Model: $p(x)$

SLIDE 6

Background

Unsupervised Anomaly Detection – From the Density Estimation Perspective

Data samples: $X_{\text{train}} = \{x_1, x_2, \dots, x_n\}$, each $x_i$ assumed to be normal.

Model: $p(x)$

Test samples: $X_{\text{test}} = \{x_1, x_2, \dots, x_n\}$, where $x_t$ is unknown.
If $p(x_t) < \lambda$, $x_t$ is abnormal; if $p(x_t) \geq \lambda$, $x_t$ is normal.

SLIDE 7

Background

Unsupervised Anomaly Detection – From the Density Estimation Perspective

Data samples: $X_{\text{train}} = \{x_1, x_2, \dots, x_n\}$, each $x_i$ assumed to be normal.

Model: $p(x)$

Test samples: $X_{\text{test}} = \{x_1, x_2, \dots, x_n\}$, where $x_t$ is unknown.
If $p(x_t) < \lambda$, $x_t$ is abnormal; if $p(x_t) \geq \lambda$, $x_t$ is normal.

Anomalies reside in the low probability density areas.
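As a concrete illustration of this recipe, the sketch below fits a single Gaussian as a stand-in for $p(x)$ and flags low-density test points. The paper uses a deep generative model; the Gaussian, the 1st-percentile threshold, and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Train: samples assumed normal; fit a single Gaussian as a toy density model p(x).
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
mu = X_train.mean(axis=0)
cov = np.cov(X_train, rowvar=False)
cov_inv = np.linalg.inv(cov)
norm_const = np.sqrt(np.linalg.det(2 * np.pi * cov))

def p(x):
    """Density of the fitted Gaussian at point(s) x."""
    d = x - mu
    expo = -0.5 * np.einsum("...i,ij,...j->...", d, cov_inv, d)
    return np.exp(expo) / norm_const

# Threshold lambda: here, the 1st percentile of training densities (an assumption).
lam = np.quantile(p(X_train), 0.01)

def is_anomaly(x):
    return p(x) < lam  # low-density region -> anomaly

print(is_anomaly(np.array([8.0, 8.0])))   # far from the data: True
print(is_anomaly(np.array([0.0, 0.0])))   # dense region: False
```

Any density model can be slotted in for `p` without changing the thresholding logic.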

SLIDE 8

Background

Correlation among data samples

Conventional feature learning: anomaly detection on representations learned in the feature space only.
Correlation-aware feature learning: graph modeling adds a structure space on top of the feature space before anomaly detection.

How to discover the normal pattern at both the feature level and the structural level?

SLIDE 9

Problem Statement

Anomaly Detection

Given a set of input samples $\mathcal{X} = \{x_i \mid i = 1, \dots, N\}$, each of which is associated with an $F$-dimensional feature $\mathbf{x}_i \in \mathbb{R}^F$, we aim to learn a score function $u(\mathbf{x}_i): \mathbb{R}^F \mapsto \mathbb{R}$ to classify sample $x_i$ based on the threshold $\lambda$:

$y_i = \begin{cases} 1, & \text{if } u(\mathbf{x}_i) \geq \lambda \\ 0, & \text{otherwise} \end{cases}$

where $y_i$ denotes the label of sample $x_i$, with 0 being the normal class and 1 the anomalous class.

Notations

$\mathcal{G}$: graph. $\mathcal{V}$: set of nodes in a graph. $\mathcal{E}$: set of edges in a graph. $N$: number of nodes. $F$: dimension of the attribute. $\mathbf{A} \in \mathbb{R}^{N \times N}$: adjacency matrix of a network. $\mathbf{X} \in \mathbb{R}^{N \times F}$: feature matrix of all nodes.
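The thresholded decision rule above is a one-liner in practice; this tiny sketch (function name and example scores are hypothetical) just makes the label convention explicit.

```python
import numpy as np

def classify(scores, lam):
    """Label rule: y_i = 1 (anomaly) if score u(x_i) >= lam, else 0 (normal)."""
    return (np.asarray(scores) >= lam).astype(int)

print(classify([0.2, 0.9, 1.5], lam=1.0))  # [0 0 1]
```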

SLIDE 10

Method

CADGMM

Components: Graph Construction, Dual-Encoder, Feature Decoder, and Estimation Network.

SLIDE 11

Method

CADGMM

Graph Construction: K-nearest neighbors (e.g. $K = 5$)

Original features: $\mathcal{X} = \{x_i \mid i = 1, \dots, N\}$
Find neighbors by K-NN: $\mathcal{N}_i = \{x_{ik} \mid k = 1, \dots, K\}$
Model the correlation as a graph: $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \mathbf{X}\}$, where $\mathcal{V} = \{v_i = x_i \mid i = 1, \dots, N\}$ and $\mathcal{E} = \{e_{ik} = (v_i, v_{ik}) \mid v_{ik} \in \mathcal{N}_i\}$
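A minimal sketch of this K-NN graph construction, assuming Euclidean distance and a binary adjacency matrix (details the slide leaves open):

```python
import numpy as np

def knn_graph(X, K):
    """Build a K-NN graph over samples X (N x F): returns an N x N binary
    adjacency A where A[i, j] = 1 iff x_j is among the K nearest neighbours
    of x_i (Euclidean distance, self-loops excluded)."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self as neighbour
    nbrs = np.argsort(d2, axis=1)[:, :K]  # K closest indices per row
    A = np.zeros((N, N), dtype=int)
    rows = np.repeat(np.arange(N), K)
    A[rows, nbrs.ravel()] = 1
    return A

# Two tight clusters: each point's nearest neighbour is its cluster mate.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
print(knn_graph(X, K=1))
```

Note the adjacency is directed ("j is a neighbour of i"); symmetrizing it is a common follow-up choice.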

SLIDE 12

Method

CADGMM

Dual-Encoder: a feature encoder (e.g. MLP, CNN, LSTM) and a graph encoder (e.g. GAT), followed by a feature decoder.
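A toy forward pass through such a dual encoder: a single linear layer + tanh stands in for the feature encoder, and plain mean aggregation over K-NN neighbours stands in for the paper's graph attention (GAT). All weights, shapes, and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_encoder(X, W):
    """Toy feature encoder: one linear layer + tanh (paper uses MLP/CNN/LSTM)."""
    return np.tanh(X @ W)

def graph_encoder(H, A):
    """Toy graph encoder: mean aggregation over neighbours given adjacency A.
    The paper uses graph attention; plain averaging stands in here."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)  # avoid division by zero
    return (A @ H) / deg

N, F, D = 4, 3, 2
X = rng.normal(size=(N, F))                          # node features
A = np.array([[0, 1, 0, 0], [1, 0, 0, 0],
              [0, 0, 0, 1], [0, 0, 1, 0]])           # K-NN adjacency (K=1)
W = rng.normal(size=(F, D))                          # random encoder weights

H = feature_encoder(X, W)   # feature-level embedding
Z = graph_encoder(H, A)     # structure-aware embedding
print(Z.shape)              # (4, 2)
```

With K=1, each node's structure-aware embedding is exactly its single neighbour's feature embedding, which makes the aggregation easy to check by hand.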

SLIDE 13

Method

CADGMM

Estimation network

Gaussian Mixture Model

Initial embedding: $\mathbf{Z}$

Membership prediction:
$\mathbf{Z}^{(l)} = \sigma\left(\mathbf{Z}^{(l-1)} \mathbf{W}^{(l-1)} + \mathbf{b}^{(l-1)}\right), \quad \mathbf{Z}^{(0)} = \mathbf{Z}$
$\mathbf{M} = \mathrm{Softmax}\left(\mathbf{Z}^{(L)}\right), \quad \mathbf{M} \in \mathbb{R}^{N \times M}$

Parameter estimation:
$\boldsymbol{\mu}_m = \frac{\sum_{i=1}^{N} \mathbf{M}_{i,m} \mathbf{Z}_i}{\sum_{i=1}^{N} \mathbf{M}_{i,m}}, \quad \boldsymbol{\Sigma}_m = \frac{\sum_{i=1}^{N} \mathbf{M}_{i,m} (\mathbf{Z}_i - \boldsymbol{\mu}_m)(\mathbf{Z}_i - \boldsymbol{\mu}_m)^{T}}{\sum_{i=1}^{N} \mathbf{M}_{i,m}}$

Energy:
$E_{\mathbf{Z}} = -\log \sum_{m=1}^{M} \phi_m \frac{\exp\left(-\frac{1}{2} (\mathbf{Z} - \boldsymbol{\mu}_m)^{T} \boldsymbol{\Sigma}_m^{-1} (\mathbf{Z} - \boldsymbol{\mu}_m)\right)}{\left|2\pi \boldsymbol{\Sigma}_m\right|^{1/2}}, \quad \phi_m = \frac{1}{N} \sum_{i=1}^{N} \mathbf{M}_{i,m}$
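The GMM parameter estimation and sample energy of the estimation network can be sketched in numpy; this is a minimal version driven by given soft memberships (here uniform, for simplicity), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gmm_energy(Z, M):
    """Given embeddings Z (N x D) and soft memberships M (N x K, rows sum
    to 1), estimate mixture weights/means/covariances and return each
    sample's energy E_z = -log p(z). Higher energy = more anomalous."""
    N, D = Z.shape
    K = M.shape[1]
    Nk = M.sum(axis=0)                      # soft counts per component
    phi = Nk / N                            # mixture weights
    mu = (M.T @ Z) / Nk[:, None]            # component means
    energies = np.zeros((N, K))
    for k in range(K):
        diff = Z - mu[k]                                     # (N, D)
        Sigma = (M[:, k, None] * diff).T @ diff / Nk[k]      # weighted cov
        Sigma += 1e-6 * np.eye(D)                            # stability
        inv = np.linalg.inv(Sigma)
        maha = np.einsum("ni,ij,nj->n", diff, inv, diff)     # Mahalanobis
        det = np.linalg.det(2 * np.pi * Sigma)
        energies[:, k] = phi[k] * np.exp(-0.5 * maha) / np.sqrt(det)
    return -np.log(energies.sum(axis=1) + 1e-12)

# 50 inliers plus one obvious outlier; uniform soft memberships.
Z = np.vstack([rng.normal(0, 1, (50, 2)), np.array([[10.0, 10.0]])])
M = np.full((51, 2), 0.5)
E = gmm_energy(Z, M)
print(E[-1] > E[:-1].max())  # True: the outlier has the highest energy
```

In CADGMM the memberships come from the estimation network's softmax rather than being fixed, but the parameter and energy formulas are as above.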

SLIDE 14

Method

Loss and Anomaly Score

Loss function:
$\mathcal{L} = \underbrace{\|\mathbf{X} - \hat{\mathbf{X}}\|_2^2}_{\text{Rec. Error}} + \lambda_1 \underbrace{E_{\mathbf{Z}}}_{\text{Energy}} + \lambda_2 \underbrace{\sum_{m=1}^{M} \sum_{d} \frac{1}{(\boldsymbol{\Sigma}_m)_{dd}}}_{\text{Covariance Penalty}} + \lambda_3 \underbrace{\|\mathbf{Z}\|_2^2}_{\text{Embedding Penalty}}$

Anomaly score: $E_{\text{score}} = E_{\mathbf{Z}}$, with the threshold $\lambda$ chosen from the distribution of scores.

Solution for the problem: $y_i = 1$ if $u(\mathbf{x}_i) \geq \lambda$, otherwise $y_i = 0$.

SLIDE 15

Experiment

Datasets Baselines Evaluation Metrics

Evaluation metrics: Precision, Recall, F1-Score.

Baselines: OC-SVM (Chen et al., 2001), IF (Liu et al., 2008), DSEBM (Zhai et al., 2016), DAGMM (Zong et al., 2018), AnoGAN (Schlegl et al., 2017), ALAD (Zenati et al., 2018).

SLIDE 16

Experiment

Results

Consistent performance improvement!

SLIDE 17

Experiment

Results

Less sensitive to noisy data! More robust!

SLIDE 18

Experiment

Results

  • Fig. Impact of different K values of the K-NN algorithm in graph construction.

Less sensitive to hyper-parameters! Easy to use!

SLIDE 19

Experiment

Results

Explainable and Effective!

  • Fig. Embedding visualization on KDD99 (blue indicates the normal samples and orange the anomalies): (a) DAGMM, (b) CADGMM.

SLIDE 20

Conclusion and Future Works

  • Conventional feature learning models cannot effectively capture the correlation among data samples for anomaly detection.
  • We propose a general representation learning framework to model the complex correlation among data samples for unsupervised anomaly detection.
  • We plan to explore the correlation among samples for extremely high-dimensional data sources such as image or video.
  • We plan to develop an adaptive and learnable graph construction module for more reasonable correlation modeling.

SLIDE 21

Reference

  • [OC-SVM] Chen, Y., Zhou, X.S., Huang, T.S.: One-class SVM for learning in image retrieval. ICIP. 2001.
  • [IF] Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. ICDM. 2008.
  • [DSEBM] Zhai, S., Cheng, Y., Lu, W., Zhang, Z.: Deep structured energy based models for anomaly detection. ICML. 2016.
  • [DAGMM] Zong, B., Song, Q., Min, M.R., Cheng, W., Lumezanu, C., Cho, D., Chen, H.: Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. ICLR. 2018.
  • [AnoGAN] Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. IPMI. 2017.
  • [ALAD] Zenati, H., Romain, M., Foo, C.S., Lecouat, B., Chandrasekhar, V.: Adversarially learned anomaly detection. ICDM. 2018.

SLIDE 22

Thanks for listening!

Contact: isfanhy@hrbust.edu.cn Home Page: https://haoyfan.github.io/