

DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning. Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar, University of Utah. Background: System Event Log Available


  1. Log Key Anomaly Detection model. Example log key sequence: 25 18 54 57 18 56 … 25 18 54 57 56 18 … A log key sequence reflects ➢ a rigorous set of logic and control flows, and ➢ reads like a (more structured) natural language. Approach: natural language modeling, as a multi-class classifier (history sequence => next key to appear). A log key is detected as abnormal if it does not follow the prediction.

  2. Log Key Anomaly Detection model: uses a long short-term memory (LSTM) architecture.

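To make the LSTM architecture concrete, here is a minimal pure-Python sketch of a single LSTM cell step using the standard gate equations (input, forget, and output gates plus a candidate cell state). The weights are hypothetical, randomly initialised toy values and the one-hot encoding of log keys is an illustrative assumption; DeepLog uses trained, stacked LSTM layers followed by an output over log keys, which this sketch does not reproduce.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def lstm_step(x, h_prev, c_prev, params: Dict[str, Tuple[Matrix, Matrix, List[float]]]):
    """One LSTM time step; returns the new (hidden state, cell state).

    params maps each gate ('i', 'f', 'o', 'g') to (W, U, b):
    input weights, recurrent weights and bias.
    """
    pre = {
        gate: [a + r + bias for a, r, bias in zip(matvec(w, x), matvec(u, h_prev), b)]
        for gate, (w, u, b) in params.items()
    }
    i = [sigmoid(z) for z in pre["i"]]    # input gate
    f = [sigmoid(z) for z in pre["f"]]    # forget gate
    o = [sigmoid(z) for z in pre["o"]]    # output gate
    g = [math.tanh(z) for z in pre["g"]]  # candidate cell state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def random_params(input_size: int, hidden_size: int, seed: int = 0):
    """Hypothetical untrained weights, for illustration only."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in ("i", "f", "o", "g")}


def one_hot(key: int, vocab: List[int]) -> List[float]:
    """Encode a log key as a one-hot vector over an assumed vocabulary."""
    return [1.0 if key == v else 0.0 for v in vocab]


# Feed a window of log keys through the cell, one key per time step.
vocab = [18, 25, 54, 56, 57]
params = random_params(input_size=len(vocab), hidden_size=4)
h, c = [0.0] * 4, [0.0] * 4
for key in [25, 18, 54]:
    h, c = lstm_step(one_hot(key, vocab), h, c, params)
print([round(v, 3) for v in h])
```

The final hidden state summarises the history window; in a next-key classifier it would be mapped to one score per log key.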

  4. Log Key Anomaly Detection model: uses a long short-term memory (LSTM) architecture. Training: log key sequence 25 18 54 57 18 56 … 25 18 54 57 56 18 …, with a history window of size h = 3.

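The training step turns the log key sequence into supervised samples for the multi-class classifier. A minimal sketch, assuming the sequence is already parsed into integer log keys and using h = 3: every window of h consecutive keys becomes an input (the history) and the key right after it becomes the label. The function name `make_training_pairs` is hypothetical.

```python
from typing import List, Tuple


def make_training_pairs(keys: List[int], h: int) -> List[Tuple[List[int], int]]:
    """Slide a window of size h over the log key sequence.

    Each sample pairs the h most recent keys (the history) with the
    key that follows them (the class label to predict).
    """
    return [(keys[t - h:t], keys[t]) for t in range(h, len(keys))]


seq = [25, 18, 54, 57, 18, 56]
for history, nxt in make_training_pairs(seq, h=3):
    print(history, "->", nxt)
# [25, 18, 54] -> 57
# [18, 54, 57] -> 18
# [54, 57, 18] -> 56
```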

  8. Log Key Anomaly Detection model: uses a long short-term memory (LSTM) architecture. Detection: DeepLog checks whether the actual next log key is among its top g most probable predictions.
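The top-g rule can be sketched as follows, assuming the model's output for the next position is available as a mapping from log key to probability (for example, a softmax over all keys). The probabilities in the example are hypothetical, as is the function name.

```python
from typing import Dict


def is_anomalous(probs: Dict[int, float], actual: int, g: int) -> bool:
    """Flag the actual next key if it is not among the g most probable keys."""
    top_g = sorted(probs, key=probs.get, reverse=True)[:g]
    return actual not in top_g


# Hypothetical prediction after the history [25, 18, 54]
probs = {57: 0.45, 56: 0.40, 18: 0.10, 25: 0.05}
print(is_anomalous(probs, 56, g=2))  # False: 56 is in the top 2
print(is_anomalous(probs, 25, g=2))  # True: 25 is not
```

A larger g tolerates more variation in the next key (fewer false positives) at the cost of missing some anomalies.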

  9. Log Key Anomaly Detection model [figure]

  12. Workflow Construction. Input: log key sequence 25 18 54 57 18 56 … 25 18 54 57 56 18 … Output: [figure: constructed workflow]


  14. Workflow Construction Method 1: using the Log Key Anomaly Detection model (LSTM prediction probabilities). An example of concurrency detection: [figure]

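The concurrency example itself is a figure. One way to read "using LSTM prediction probabilities" is sketched below: in the sequences 25 18 54 57 18 56 and 25 18 54 57 56 18, keys 18 and 56 follow 25 18 54 57 in either order, so both get high probability after that history, and each is likely right after the other. This heuristic, the threshold `min_prob`, the helper names, and the lookup table standing in for a trained LSTM are all illustrative assumptions, not DeepLog's exact procedure.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[Sequence[int]], Dict[int, float]]


def concurrent_pairs(predict: Predictor, history: List[int],
                     min_prob: float = 0.3) -> List[Tuple[int, int]]:
    """Heuristic concurrency check (illustrative assumption).

    Keys a and b are reported as concurrent if both are likely next keys
    after `history` and each is also likely right after the other.
    """
    probs = predict(history)
    cands = sorted(k for k, p in probs.items() if p >= min_prob)
    pairs = []
    for i, a in enumerate(cands):
        for b in cands[i + 1:]:
            b_after_a = predict(history + [a]).get(b, 0.0)
            a_after_b = predict(history + [b]).get(a, 0.0)
            if b_after_a >= min_prob and a_after_b >= min_prob:
                pairs.append((a, b))
    return pairs


# Hypothetical probabilities standing in for a trained model.
TABLE = {
    (25, 18, 54, 57): {18: 0.5, 56: 0.5},
    (25, 18, 54, 57, 18): {56: 0.9, 25: 0.1},
    (25, 18, 54, 57, 56): {18: 0.9, 25: 0.1},
}


def toy_predict(history: Sequence[int]) -> Dict[int, float]:
    return TABLE.get(tuple(history), {})


print(concurrent_pairs(toy_predict, [25, 18, 54, 57]))  # [(18, 56)]
```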


  20. Workflow Construction Method 2: a density-based clustering approach. Co-occurrence matrix of log keys $(k_i, k_j)$ within distance $d$: $f_d(k_i, k_j)$: the frequency of $(k_i, k_j)$ appearing together within distance $d$; $f(k_i)$: the frequency of $k_i$ in the input sequence; $p_d(i, j)$: the probability of $(k_i, k_j)$ appearing together within distance $d$.
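A minimal sketch of building the co-occurrence probabilities, under two stated assumptions: "within distance d" means $k_j$ appears 1 to d positions after $k_i$, and the normalisation is $p_d(i, j) = f_d(k_i, k_j) / (d \cdot f(k_i))$, which is how we understand the DeepLog paper defines it (each occurrence of $k_i$ has at most d chances to co-occur with a following key). The function name is hypothetical, and the clustering step that groups keys using these probabilities is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cooccurrence_probs(keys: List[int], d: int) -> Dict[Tuple[int, int], float]:
    """Estimate p_d(i, j) for every ordered key pair seen within distance d.

    f_d counts how often k_j appears 1..d positions after k_i;
    f counts occurrences of each key in the input sequence.
    """
    f = Counter(keys)
    f_d = Counter()
    for t, ki in enumerate(keys):
        for kj in keys[t + 1:t + 1 + d]:
            f_d[(ki, kj)] += 1
    # Assumed normaliser: d chances per occurrence of k_i.
    return {(ki, kj): n / (d * f[ki]) for (ki, kj), n in f_d.items()}


probs = cooccurrence_probs([1, 2, 3, 1, 2, 3], d=1)
print(probs[(1, 2)], probs[(2, 3)], probs[(3, 1)])  # 1.0 1.0 0.5
```

The pair (3, 1) scores only 0.5 because the final 3 has no successor inside the window.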


  23. Parameter Value Anomaly Detection model. Example: log messages of a particular log key: $t_2$: Took 0.61 seconds to deallocate network … $t'_2$: Took 1.1 seconds to deallocate network … … Parameter value vectors over time: $[t_2 - t_1, 0.61]$, $[t'_2 - t'_1, 1.1]$, … Multi-variate time series data anomaly detection problem!
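A sketch of turning the messages of one log key into parameter value vectors of the form [elapsed time since the previous log entry, parameter values…], mirroring $[t_2 - t_1, 0.61]$. It assumes log entries arrive as (timestamp in seconds, message) pairs in log order, and it matches the log key with a hypothetical regular expression; the timestamps and the non-matching messages in the example are made up.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical template for the example log key; numeric parameters are groups.
TEMPLATE = re.compile(r"Took ([\d.]+) seconds to deallocate network")


def parameter_vectors(entries: Sequence[Tuple[float, str]]) -> List[List[float]]:
    """One vector per message matching TEMPLATE: [t_i - t_prev, params...].

    A matching message at index 0 is skipped, since it has no previous entry.
    """
    vectors = []
    for idx in range(1, len(entries)):
        t, msg = entries[idx]
        m = TEMPLATE.search(msg)
        if m:
            t_prev = entries[idx - 1][0]
            vectors.append([t - t_prev] + [float(v) for v in m.groups()])
    return vectors


log = [
    (100.0, "some other log message"),
    (100.5, "Took 0.61 seconds to deallocate network ..."),
    (200.0, "some other log message"),
    (201.25, "Took 1.1 seconds to deallocate network ..."),
]
print(parameter_vectors(log))  # [[0.5, 0.61], [1.25, 1.1]]
```

The resulting sequence of vectors is the multi-variate time series that the parameter value model consumes, one vector per time step.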

  24. Parameter Value Anomaly Detection model. A multi-variate time series data anomaly detection problem: ✓ leverage an LSTM-based approach; ✓ a parameter value vector is given as input at each time step; ✓ an anomaly is detected if the mean squared error (MSE) between the prediction and the actual data is too large.

  25. Parameter Value Anomaly Detection model [figure: value over time; the value history yields a prediction, which is compared with the actual value: MSE > threshold?]
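The per-step check can be sketched as below. How the LSTM produces the predicted vector is out of scope here, the threshold is a hypothetical placeholder, and the function names are illustrative.

```python
from typing import Sequence


def mse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between two equal-length value vectors."""
    if len(predicted) != len(actual):
        raise ValueError("vectors must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def is_value_anomaly(predicted: Sequence[float], actual: Sequence[float],
                     threshold: float) -> bool:
    """Flag the time step if the prediction error exceeds the threshold."""
    return mse(predicted, actual) > threshold


# Hypothetical prediction vs. observation for one time step:
pred, obs = [0.5, 0.5], [0.5, 1.5]
print(mse(pred, obs))                              # 0.5
print(is_value_anomaly(pred, obs, threshold=0.2))  # True
```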


  40. LSTM model online update. Q: How to handle false positives? Flow: the model makes a prediction from the history of the log sequence; is the current log entry an anomaly? If yes, is it a false positive? If yes, update the model using this case: “history -> current”.
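The feedback loop can be sketched as below. To keep it short and runnable, a count table replaces the LSTM; this is a deliberate swap, since DeepLog instead updates the LSTM's weights with the reported case. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


class CountModel:
    """Count-based stand-in for the LSTM (not DeepLog's model).

    Stores how often each next key followed a given history.
    """

    def __init__(self) -> None:
        self.counts: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)

    def update(self, history: Sequence[int], current: int) -> None:
        """Learn from one case: "history -> current"."""
        self.counts[tuple(history)][current] += 1

    def is_anomaly(self, history: Sequence[int], current: int, g: int) -> bool:
        """Anomalous if `current` is not among the g most frequent next keys."""
        seen = self.counts.get(tuple(history), Counter())
        top_g = [k for k, _ in seen.most_common(g)]
        return current not in top_g


model = CountModel()
model.update([25, 18, 54], 57)                   # normal training data
print(model.is_anomaly([25, 18, 54], 56, g=2))   # True: never seen
# An operator reports this alert as a false positive, so update the model.
model.update([25, 18, 54], 56)
print(model.is_anomaly([25, 18, 54], 56, g=2))   # False
```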

  41. Evaluation – log key anomaly detection (higher is better). Evaluation results on HDFS log data [1] (over a million log entries with labeled anomalies). [1] PCA (SOSP'09), IM (USENIX ATC'10), N-gram (baseline language model).


  43. Evaluation – parameter value anomaly detection. Evaluation results on an OpenStack cloud log (generated on CloudLab; VM creation/deletion operations; injected performance anomalies) with different confidence intervals (CIs). MSE: mean squared error.

  44. Evaluation – parameter value anomaly detection [figure: MSE over time with the thresholds for each CI; an ANOMALY and a False Positive are marked]
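The evaluation compares thresholds derived from different confidence intervals. A sketch of one way to obtain such thresholds, assuming (as we understand the DeepLog paper) that prediction errors on normal validation data are modelled as a Gaussian and an error outside the chosen CI is flagged. The validation errors and CI levels below are made up for illustration.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def ci_bounds(validation_errors: List[float], ci: float) -> Tuple[float, float]:
    """Fit a Gaussian to validation MSEs and return the central CI bounds."""
    mu, sigma = mean(validation_errors), stdev(validation_errors)
    z = NormalDist().inv_cdf(0.5 + ci / 2)  # about 1.96 for a 95% CI
    return mu - z * sigma, mu + z * sigma


def outside_ci(error: float, bounds: Tuple[float, float]) -> bool:
    """Flag an error that falls outside the interval."""
    low, high = bounds
    return error < low or error > high


errors = [0.10, 0.12, 0.09, 0.11, 0.08]  # hypothetical validation MSEs
for ci in (0.95, 0.99):
    bounds = ci_bounds(errors, ci)
    print(ci, round(bounds[1], 4), outside_ci(0.2, bounds))
```

A wider CI raises the upper threshold, trading fewer false positives for possibly missed anomalies.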


  48. Evaluation – LSTM model online update (higher is better). Evaluation on the Blue Gene/L log (an HPC log with labeled anomalies, available at https://www.usenix.org/cfdr-data), with and without online model update.


  51. Evaluation – case study: network security log. Dataset: IEEE VAST Challenge 2011 (Mini Challenge 2 – Computer Networking Operations); the dataset contains firewall logs, IDS logs, etc. Detection results [figure]; annotation: could be fixed with prior knowledge of “documented IP”.

  52. Evaluation – workflow construction. Constructed workflow of VM creation (from the previously generated OpenStack cloud log).
