slide-1
SLIDE 1

DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning

Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar
University of Utah

slide-5
SLIDE 5

Background

System Event Log

Available practically on every computer system!

Automatic Analysis?

slide-6
SLIDE 6

Background

Automatically detected anomaly

slide-9
SLIDE 9

Background

System Event Log -> LOG PARSING -> Structured Data (message type, log key, ……)

Logging statement in the source code:
printf("Started service %s on port %d", x, y);

Raw log messages:
Started service A on port 80
Executor updated: app-1 is now LOADING
……

Parsed log keys:
Started service * on port * (log key ID: 1)
Executor updated: * is now LOADING (log key ID: 2)
……
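The parsing step above can be sketched in a few lines of Python. This is only an illustration, not the parser used in the talk: it assumes the log keys are already known, and the names `TEMPLATES`, `compile_template`, and `parse` are hypothetical.

```python
import re

# Hypothetical, hand-written templates; "*" marks a parameter position,
# as in the parsed log keys on the slide.
TEMPLATES = {
    1: "Started service * on port *",
    2: "Executor updated: * is now LOADING",
}

def compile_template(template):
    """Turn a '*' template into a regex that captures each parameter."""
    parts = [re.escape(part) for part in template.split("*")]
    return re.compile("^" + r"(\S+)".join(parts) + "$")

PATTERNS = {key_id: compile_template(t) for key_id, t in TEMPLATES.items()}

def parse(message):
    """Return (log key ID, parameter list), or (None, []) if nothing matches."""
    for key_id, pattern in PATTERNS.items():
        match = pattern.match(message)
        if match:
            return key_id, list(match.groups())
    return None, []
```

For example, `parse("Started service A on port 80")` yields log key ID 1 with the parameters `A` and `80`.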

slide-15
SLIDE 15

Background

System Event Log -> LOG PARSING -> Structured Data (message type, log key, ……) -> LOG ANALYSIS -> Anomaly Detection

printf("Started service %s on port %d", x, y);

LOG ANALYSIS approaches:

  • Message count vector: Xu’SOSP09, Lou’ATC10, etc.
    Problem: Offline batched processing
  • Build workflow model: Lou’KDD10, Beschastnikh’ICSE14, Yu’ASPLOS16, etc.
    Problem: Only for simple execution path anomalies

Common problem: Only log keys (message types) are considered.

slide-16
SLIDE 16

DeepLog

log message (log key underlined) | log key | parameter value vector
$t_1$ Deletion of file1 complete | $k_1$ | [$t_1 - t_0$, file1]
$t_2$ Took 0.61 seconds to deallocate network … | $k_2$ | [$t_2 - t_1$, 0.61]
$t_3$ VM Stopped (Lifecycle Event) | $k_3$ | [$t_3 - t_2$]
… | … | …
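A minimal sketch of how the parameter value vectors in the table could be assembled: each vector starts with the time elapsed since the previous log entry, followed by the parameters of the message. The function name and the tuple layout are assumptions for illustration, as is the choice of 0.0 for the first entry, which has no predecessor here.

```python
def parameter_value_vectors(entries):
    """entries: list of (timestamp, log_key, params) tuples in arrival order.

    Returns one (log_key, [time since previous entry, *params]) pair per entry.
    """
    vectors = []
    previous_time = None
    for timestamp, log_key, params in entries:
        # Assumption: the first entry has no predecessor, so its delta is 0.0.
        delta = 0.0 if previous_time is None else timestamp - previous_time
        vectors.append((log_key, [delta, *params]))
        previous_time = timestamp
    return vectors
```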

slide-22
SLIDE 22

DeepLog

SPELL

A streaming log parser published in ICDM’16

log message | log key | parameters
Deletion of file1 complete. | Deletion of * complete. | [file1]
Deletion of file2 complete. | Deletion of * complete. | [file2]
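The effect shown in the table can be reproduced with a simplified token-by-token comparison. This sketch is not Spell's algorithm; it assumes both messages have the same number of tokens, and `merge_into_key` is a hypothetical name.

```python
def merge_into_key(message_a, message_b):
    """Keep tokens that are equal in both messages, replace the rest with '*'.

    Returns (log key, parameters of a, parameters of b), or None when the
    messages have different token counts (this simple sketch gives up then).
    """
    tokens_a, tokens_b = message_a.split(), message_b.split()
    if len(tokens_a) != len(tokens_b):
        return None
    key, params_a, params_b = [], [], []
    for a, b in zip(tokens_a, tokens_b):
        if a == b:
            key.append(a)
        else:
            key.append("*")
            params_a.append(a)
            params_b.append(b)
    return " ".join(key), params_a, params_b
```

Applied to the two messages on the slide, it returns the log key `Deletion of * complete.` with parameters `[file1]` and `[file2]`.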

slide-28
SLIDE 28

DeepLog

Anomaly Detection

Diagnosis

slide-29
SLIDE 29

DeepLog Architecture

[Figure: DeepLog architecture, showing the Training Stage, the MODELS, and the Detection Stage]

slide-49
SLIDE 49

Log Key Anomaly Detection model

Example log key sequence:
25 18 54 57 18 56 …
25 18 54 57 56 18 …

➢ a rigorous set of logic and control flows
➢ a (more structured) natural language

Natural language modeling as a multi-class classifier: history sequence => next key to appear

A log key is detected to be abnormal if it does not follow the prediction.

slide-55
SLIDE 55

Log Key Anomaly Detection model

Use long short-term memory (LSTM) architecture

Training on the log key sequence, with history window size h=3:
25 18 54 57 18 56 …
25 18 54 57 56 18 …
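A small sketch of how training samples could be cut from a log key sequence, assuming h is the length of the history window and the label is the key that follows it. The function name `training_pairs` is hypothetical.

```python
def training_pairs(sequence, h=3):
    """Slide a window of h keys over the sequence.

    Each window is an input; the key right after it is the label.
    """
    return [
        (tuple(sequence[i:i + h]), sequence[i + h])
        for i in range(len(sequence) - h)
    ]

# First example sequence from the slide.
pairs = training_pairs([25, 18, 54, 57, 18, 56], h=3)
```

With the first example sequence this gives three samples: (25, 18, 54) => 57, (18, 54, 57) => 18, and (54, 57, 18) => 56.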

slide-56
SLIDE 56

Log Key Anomaly Detection model

Use long short-term memory (LSTM) architecture

Detection: DeepLog checks whether the actual next log key is among its top g most probable predictions.
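The detection rule can be sketched as a top-g membership test. The probability dictionary below stands in for the model output and its values are made up for illustration; `is_anomalous` is a hypothetical name.

```python
def is_anomalous(probabilities, actual_key, g):
    """probabilities: dict mapping each candidate next key to its predicted probability.

    The actual key is flagged when it is not among the g most probable candidates.
    """
    top_g = sorted(probabilities, key=probabilities.get, reverse=True)[:g]
    return actual_key not in top_g

# Made-up prediction for the history (25, 18, 54).
example_probabilities = {57: 0.6, 18: 0.3, 56: 0.08, 25: 0.02}
```

A larger g tolerates more alternative continuations, so it flags fewer keys; a smaller g is stricter.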

slide-60
SLIDE 60

Workflow Construction

Input: log key sequence
25 18 54 57 18 56 …
25 18 54 57 56 18 …

Output: [figure not preserved]

slide-66
SLIDE 66

Workflow Construction

Method 1: Using Log Key Anomaly Detection model

  • LSTM prediction probabilities

An example of concurrency detection: [figure not preserved]

slide-68
SLIDE 68

Workflow Construction

Method 2: A density-based clustering approach

Co-occurrence matrix of log keys $(k_i, k_j)$ within distance $d$

$f_d(k_i, k_j)$: the frequency of $(k_i, k_j)$ appearing together within distance $d$

$f(k_i)$: the frequency of $k_i$ in the input sequence

$p_d(i, j)$: the probability of $(k_i, k_j)$ appearing together within distance $d$
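A rough sketch of how the three quantities could be computed from a log key sequence. Two choices here are assumptions, not taken from the slide: a pair is counted when the second key appears within d positions after the first, and the probability is normalised as f_d / (d * f), which keeps it between 0 and 1. The function name `cooccurrence` is hypothetical.

```python
from collections import Counter

def cooccurrence(sequence, d):
    """Return (f, f_d, p_d) for a log key sequence and distance d."""
    f = Counter(sequence)
    f_d = Counter()
    for i, key_i in enumerate(sequence):
        # Assumption: count every key within d positions after key_i.
        for key_j in sequence[i + 1:i + 1 + d]:
            f_d[(key_i, key_j)] += 1
    # Assumed normalisation: divide by d * f(k_i).
    p_d = {pair: count / (d * f[pair[0]]) for pair, count in f_d.items()}
    return f, f_d, p_d
```

Pairs with a high value are candidates for belonging to the same task, which is the kind of signal a density-based clustering step can group on.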

slide-71
SLIDE 71

Parameter Value Anomaly Detection model

Example: Log messages of a particular log key:
$t_2$: Took 0.61 seconds to deallocate network …
$t'_2$: Took 1.1 seconds to deallocate network …
….

Parameter value vectors over time: [$t_2 - t_1$, 0.61], [$t'_2 - t'_1$, 1.1], ….

Multi-variate time series data anomaly detection problem!

slide-72
SLIDE 72

Parameter Value Anomaly Detection model

Multi-variate time series data anomaly detection problem
✓ Leverage LSTM-based approach;
✓ A parameter value vector is given as input at each time step;
✓ An anomaly is detected if the mean-square-error (MSE) between prediction and actual data is too big.
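The last check in the list above can be sketched as a plain threshold test on the MSE. The predicted vector would come from the LSTM; here it is simply passed in, and the function names and threshold values are assumptions for illustration.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between two equal-length vectors."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def is_parameter_anomaly(predicted, actual, threshold):
    """Flag the actual parameter value vector when its MSE exceeds the threshold."""
    return mean_squared_error(predicted, actual) > threshold
```

How the threshold is chosen matters a great deal in practice; this sketch leaves it as an input.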

slide-73
SLIDE 73

Parameter Value Anomaly Detection model

[Figure: parameter value plotted over time; the model predicts the next value from the history, and the prediction is compared with the actual value: MSE > Threshold ?]

slide-88
SLIDE 88

LSTM model online update

Q: How to handle false positives?

Log sequence: history, current

history -> model -> prediction

Anomaly? Yes -> False positive? Yes -> update model using this case: “history -> current”
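The feedback loop can be sketched with a toy model. The `CountingModel` class below is a stand-in for the LSTM (it only counts which key followed each history), and `is_false_positive` represents whatever feedback marks a flagged entry as a false positive; both names are hypothetical.

```python
from collections import Counter, defaultdict

class CountingModel:
    """Toy stand-in for the LSTM: counts which key followed each history."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, history, current):
        self.counts[tuple(history)][current] += 1

    def top_predictions(self, history, g):
        return [key for key, _ in self.counts[tuple(history)].most_common(g)]

def process(model, history, current, g, is_false_positive):
    """Flag `current` if it is not predicted; when the flag is reported as a
    false positive, update the model with the case "history -> current"."""
    anomaly = current not in model.top_predictions(history, g)
    if anomaly and is_false_positive(history, current):
        model.update(history, current)
    return anomaly
```

After such an update the same "history -> current" case is less likely to be flagged again, which is the point of the online update.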

slide-89
SLIDE 89

Evaluation – log key anomaly detection

Evaluation results on HDFS log data [1]
(over a million log entries with labeled anomalies)

[1] PCA (SOSP’09), IM (UsenixATC’10), N-gram (baseline language model)

Up is good

slide-91
SLIDE 91

Evaluation – parameter value anomaly detection

Evaluation results on OpenStack cloud log with different confidence intervals (CIs)

OpenStack cloud log: generated on CloudLab; VM creation/deletion operations; injected performance anomalies.

MSE: mean square error

[Figure: MSE plot with thresholds; annotations: ANOMALY, False Positive]

slide-96
SLIDE 96

Evaluation – LSTM model online update

Evaluation on Blue Gene/L log, with and without online model update. Up is good.

Blue Gene/L log: HPC log with labeled anomalies; available at https://www.usenix.org/cfdr-data

slide-99
SLIDE 99

Evaluation – case study: network security log

Dataset: IEEE VAST Challenge 2011 (Mini Challenge 2 – Computer Networking Operations)
The dataset contains firewall log, IDS log, etc.

Detection results.

Could be fixed with prior knowledge of “documented IP”

slide-103
SLIDE 103

Evaluation – workflow construction

Constructed workflow of VM Creation.
(previously generated OpenStack cloud log)

How does it help to diagnose anomalies?

[Figure: constructed workflow; annotations: Parameter value anomaly, Time difference (performance) anomaly]

Identified anomaly:
Instance took too long to build because of the transition from 52 -> 53

Injected anomaly:
During VM creation, network speed from controller to compute node is throttled.

slide-106
SLIDE 106

Summary

DeepLog
➢ A real-time system log anomaly detection framework.
➢ LSTM is used to model system execution paths and log parameter values.
➢ Workflow models are built to help anomaly diagnosis.
➢ It supports online model update.

Min Du mind@cs.utah.edu
Feifei Li lifeifei@cs.utah.edu

Thank you!