

  1. Correlation-aware Deep Generative Model for Unsupervised Anomaly Detection. Haoyi Fan 1, Fengbin Zhang 1, Ruidong Wang 1, Liang Xi 1, Zuoyong Li 2. 1 Harbin University of Science and Technology, 2 Minjiang University. isfanhy@hrbust.edu.cn

  2. Background. Fig. Anomalies in the observed space and the latent space (normal vs. anomaly).

  3. Background: applications.
      Fraud Detection (https://www.explosion.com/135494/5-effective-strategies-of-fraud-detection-and-prevention-for-ecommerce/)
      Intrusion Detection (https://towardsdatascience.com/building-an-intrusion-detection-system-using-deep-learning-b9488332b321)
      Disease Detection (https://planforgermany.com/switching-private-public-health-insurance-germany/)
      Fault Detection (https://blog.exporthub.com/working-with-chinese-manufacturers/)

  4-7. Background: Unsupervised Anomaly Detection from the Density Estimation Perspective.
      Data samples: X_train = {x_1, x_2, ..., x_N}, where each x_i is assumed normal.
      Model: p(x).
      Test samples: X_test = {x_1, x_2, ..., x_N}, where each x_t is unknown.
      If p(x_t) < λ, x_t is abnormal; if p(x_t) ≥ λ, x_t is normal.
      Anomalies reside in the low probability density areas.
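The density-estimation recipe on these slides can be sketched in a few lines, with a single Gaussian standing in for the model p(x); the helper names and the 5th-percentile threshold below are illustrative choices, not from the paper:

```python
import numpy as np

def fit_gaussian(X):
    """Estimate mean and covariance of the training samples (assumed normal)."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized
    return mu, cov

def log_density(X, mu, cov):
    """log p(x) under the fitted Gaussian."""
    d = X.shape[1]
    diff = X - mu
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return -0.5 * (maha + d * np.log(2 * np.pi) + np.log(np.linalg.det(cov)))

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))   # training samples, assumed normal
x_test = np.array([[0.1, -0.2], [6.0, 6.0]])    # one typical point, one outlier

mu, cov = fit_gaussian(X_train)
# threshold lambda chosen from the training score distribution (5th percentile)
lam = np.percentile(log_density(X_train, mu, cov), 5)
is_anomaly = log_density(x_test, mu, cov) < lam  # low density => abnormal
```

The same thresholding logic carries over unchanged when p(x) is a richer model such as the GMM used later in the talk.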

  8. Background: correlation among data samples.
      Conventional anomaly detection: feature learning in the feature space only.
      Correlation-aware anomaly detection: graph modeling plus feature learning in the structure space.
      How to discover the normal pattern from both the feature level and the structural level?

  9. Problem Statement.
      Anomaly detection: Given a set of input samples X = {x_i | i = 1, ..., N}, each of which is associated with an F-dimensional feature x_i ∈ R^F, we aim to learn a score function u(x_i): R^F ↦ R, to classify sample x_i based on the threshold λ:
      y_i = 1 if u(x_i) ≥ λ, 0 otherwise,
      where y_i denotes the label of sample x_i, with 0 being the normal class and 1 the anomalous class.
      Notations: G: graph; V: set of nodes in a graph; E: set of edges in a graph; N: number of nodes; F: dimension of attribute; A ∈ R^{N×N}: adjacency matrix of a network; X ∈ R^{N×F}: feature matrix of all nodes.

  10. Method: CADGMM pipeline. Graph Construction → Dual-Encoder → Feature Decoder → Estimation Network.

  11. Method: CADGMM, Graph Construction.
      Original features: X = {x_i | i = 1, ..., N}.
      Find neighbors by K-NN (e.g., K = 5): N_i = {x_i^k | k = 1, ..., K}.
      Model correlation as a graph: G = {V, E, X}, with V = {v_i = x_i | i = 1, ..., N} and E = {e_i^k = (v_i, v_i^k) | v_i^k ∈ N_i}.
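The K-NN graph construction step above can be sketched as a minimal numpy helper; `knn_graph` and the toy points are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def knn_graph(X, K):
    """Adjacency A of the K-NN graph: A[i, k] = 1 iff x_k is among the
    K nearest neighbors of x_i (Euclidean distance, self excluded)."""
    N = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :K]                  # K nearest neighbors per node
    A = np.zeros((N, N), dtype=int)
    A[np.repeat(np.arange(N), K), nbrs.ravel()] = 1       # directed edges i -> neighbor
    return A

# three nearby points plus one distant point
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
A = knn_graph(X, K=2)
```

Note the resulting graph is directed (each node has exactly K out-edges); a library routine such as scikit-learn's `kneighbors_graph` could replace this helper in practice.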

  12. Method: CADGMM, Dual-Encoder and Feature Decoder.
      Feature encoder (e.g., MLP, CNN, LSTM); graph encoder (e.g., GAT); feature decoder.

  13. Method: CADGMM, Gaussian Mixture Model (Estimation Network).
      Initial embedding: Z.
      Membership prediction: Z^(l) = σ(Z^(l-1) W^(l-1) + b^(l-1)), with Z^(0) = Z; γ̂ = Softmax(Z^(L)), γ̂ ∈ R^{N×M}.
      Parameter estimation:
      μ_m = (Σ_{i=1}^{N} γ̂_{i,m} z_i) / (Σ_{i=1}^{N} γ̂_{i,m}),
      Σ_m = (Σ_{i=1}^{N} γ̂_{i,m} (z_i - μ_m)(z_i - μ_m)^T) / (Σ_{i=1}^{N} γ̂_{i,m}).
      Energy:
      E(z) = -log Σ_{m=1}^{M} [ ((1/N) Σ_{i=1}^{N} γ̂_{i,m}) · exp(-(1/2)(z - μ_m)^T Σ_m^{-1} (z - μ_m)) / |2πΣ_m|^{1/2} ].
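A minimal numpy sketch of the parameter estimation and energy formulas on this slide; the helper names and the synthetic clusters are illustrative (in the real model, gamma comes from the estimation network's softmax, not hard labels):

```python
import numpy as np

def gmm_params(Z, gamma):
    """Estimate GMM parameters from embeddings Z (N x d) and soft
    memberships gamma (N x M)."""
    phi = gamma.mean(axis=0)                           # mixture weights
    mu = gamma.T @ Z / gamma.sum(axis=0)[:, None]      # component means
    covs = []
    for m in range(gamma.shape[1]):
        diff = Z - mu[m]
        cov = (gamma[:, m, None] * diff).T @ diff / gamma[:, m].sum()
        covs.append(cov + 1e-6 * np.eye(Z.shape[1]))   # numerical stability
    return phi, mu, covs

def gmm_energy(Zq, phi, mu, covs):
    """Sample energy E(z): negative log-likelihood under the estimated GMM."""
    dens = np.zeros(Zq.shape[0])
    for m, cov in enumerate(covs):
        diff = Zq - mu[m]
        maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        dens += phi[m] * np.exp(-0.5 * maha) / np.sqrt(np.linalg.det(2 * np.pi * cov))
    return -np.log(dens + 1e-12)

# two synthetic clusters standing in for learned embeddings
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
gamma = np.zeros((200, 2)); gamma[:100, 0] = 1.0; gamma[100:, 1] = 1.0
phi, mu, covs = gmm_params(Z, gamma)
E = gmm_energy(Z, phi, mu, covs)                              # low for in-cluster points
E_out = gmm_energy(np.array([[20.0, 20.0]]), phi, mu, covs)   # high for an outlier
```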

  14. Method: Loss and Anomaly Score.
      Loss function:
      L = ||X - X̂||_2^2 + λ_1 E(z) + λ_2 Σ_{m=1}^{M} Σ_{j=1}^{d} 1/(Σ_m)_{jj} + λ_3 ||Z||_2^2
      (reconstruction error + energy + covariance penalty + embedding penalty).
      Anomaly score: Score = E(z).
      Solution for the problem: y_i = 1 if u(x_i) ≥ λ, 0 otherwise, with the threshold λ chosen from the distribution of Score.
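The loss above can be sketched as a single function; `cadgmm_loss` and the lam values are illustrative defaults, not the paper's hyper-parameters:

```python
import numpy as np

def cadgmm_loss(X, X_rec, energy, covs, Z, lam1=0.1, lam2=0.005, lam3=0.0001):
    """Total objective: reconstruction error + lam1 * energy
    + lam2 * covariance penalty (sum of 1/diagonal entries, which keeps
    the GMM covariances away from degenerate, near-singular solutions)
    + lam3 * embedding norm penalty."""
    rec = ((X - X_rec) ** 2).sum(axis=1).mean()           # ||X - X_hat||^2
    cov_pen = sum((1.0 / np.diag(S)).sum() for S in covs)  # sum_m sum_j 1/(Sigma_m)_jj
    emb_pen = (Z ** 2).sum(axis=1).mean()                  # ||Z||^2
    return rec + lam1 * energy.mean() + lam2 * cov_pen + lam3 * emb_pen

# toy shapes: 4 samples, 3 input features, 2-d embedding, one mixture component
loss = cadgmm_loss(X=np.ones((4, 3)), X_rec=np.zeros((4, 3)),
                   energy=np.ones(4), covs=[np.eye(2)], Z=np.ones((4, 2)))
```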

  15. Experiment Setup.
      Datasets; Baselines: OC-SVM (Chen et al. 2001), IF (Liu et al. 2008), DSEBM (Zhai et al. 2016), DAGMM (Zong et al. 2018), AnoGAN (Schlegl et al. 2017), ALAD (Zenati et al. 2018);
      Evaluation metrics: Precision, Recall, F1-Score.

  16. Experiment Results. Consistent performance improvement!

  17. Experiment Results. Less sensitive to noisy data; more robust!

  18. Experiment Results. Fig. Impact of different K values of the K-NN algorithm in graph construction. Less sensitive to hyper-parameters; easy to use!

  19. Experiment Results. (a) DAGMM; (b) CADGMM. Fig. Embedding visualization on KDD99 (blue indicates the normal samples and orange the anomalies). Explainable and effective!

  20. Conclusion and Future Works.
      • Conventional feature learning models cannot effectively capture the correlation among data samples for anomaly detection.
      • We propose a general representation learning framework to model the complex correlation among data samples for unsupervised anomaly detection.
      • We plan to explore the correlation among samples for extremely high-dimensional data sources like image or video.
      • We plan to develop an adaptive and learnable graph construction module for more reasonable correlation modeling.

  21. References.
      • [OC-SVM] Chen, Y., Zhou, X.S., Huang, T.S.: One-class SVM for learning in image retrieval. ICIP 2001.
      • [IF] Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. ICDM 2008.
      • [DSEBM] Zhai, S., Cheng, Y., Lu, W., Zhang, Z.: Deep structured energy based models for anomaly detection. ICML 2016.
      • [DAGMM] Zong, B., Song, Q., Min, M.R., Cheng, W., Lumezanu, C., Cho, D., Chen, H.: Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. ICLR 2018.
      • [AnoGAN] Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. IPMI 2017.
      • [ALAD] Zenati, H., Romain, M., Foo, C.S., Lecouat, B., Chandrasekhar, V.: Adversarially learned anomaly detection. ICDM 2018.

  22. Thanks for listening! Contact: isfanhy@hrbust.edu.cn Home Page: https://haoyfan.github.io/
