
LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs

Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun and Rong Zhou

2019/9/10


slide-1
SLIDE 1

LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs

Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun and Rong Zhou

slide-2
SLIDE 2

Internet Services

[Figure: bar chart of global IP traffic by year. 2017: 122, 2018: 156, 2019: 201, 2020: 254, 2021: 319, 2022: 396. Source: Cisco VNI]

Global IP traffic will increase more than threefold

The number of services is growing rapidly

The Internet provides various types of services

Stability of services is becoming more important

slide-3
SLIDE 3

Anomaly Detection

■Anomalies will impact revenue and user experience.
■Anomaly detection plays an important role in service management.

slide-4
SLIDE 4

Logs for Anomaly Detection

■Logs are one of the most valuable data sources for anomaly detection
■General: every service and device generates logs
■Diverse: logs record a vast range of runtime information

Unstructured logs:

Type | Timestamp | Detailed message
Switch | Jul 10 19:03:03 | Interface te-1/1/59, changed state to down
Supercomputer | Jun 4 6:45:50 | RAS KERNEL INFO 87 L3 EDRAM error(s) (dcr 0x0157) detected and corrected over 27362 seconds
HDFS | Jun 8 13:42:26 | INFO dfs.DataNode$PacketResponder: PacketResponder 1 for block blk_-1608999687919862906 terminating
Router | Jul 11 11:05:07 | Neighbour(rid:10.231.0.43, addr:10.231.39.61) on vlan23, changed state from Exchange to Loading

slide-5
SLIDE 5

Logs for Anomaly Detection

Logs can reflect anomalies in three scenarios:

■Single log anomaly: a single log can reflect an anomaly, e.g., “power down”. Detected with keywords & regular expressions.
■Quantitative anomalies: a change in the number of logs can reflect an anomaly, e.g., num(down) != num(up). Detected with log sequence pattern detection.
■Sequential anomalies: a change in the order of logs can reflect an anomaly, e.g., OSPF failed to start. Detected with log sequence pattern detection.

Our work focuses on log sequence anomaly detection

slide-6
SLIDE 6

Manual Detection

■The explosion of logs
  • e.g., 10T/day in Huawei
■An operator has incomplete information of the overall system
■Not all anomalies are explicitly displayed
  • Some anomalies hide in the log sequence.

Sequential example:

Runtime logs:
  • OSPF ADJCHG, Nbr 1.1.1.1 on FastEthernet0/0 from Attempt to Init
  • OSPF ADJCHG, Nbr 1.1.1.1 on FastEthernet0/0 from Init to Two-way
  • OSPF ADJCHG, Nbr 1.1.1.1 on FastEthernet0/0 from Two-way to Exstart
  • OSPF ADJCHG, Nbr 1.1.1.1 on FastEthernet0/0 from Two-way to Exstart

Workflow of OSPF (a network protocol) startup: Down → Attempt → Init → Two-way → Exstart → Exchange → Loading → Full

Every log is normal, but OSPF failed to start

Quantitative example:

Runtime logs:
  • Line protocol on Interface ae3, changed state to down
  • Interface ae3, changed state to down
  • Interface ae3, changed state to up

Quantitative relationship of interface flapping: num(interface down) = num(interface up)

An interface down event occurs

Automatically detect anomalies based on unstructured logs
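
The two manual checks on this slide can be written down as a small sketch. It is only an illustration, not LogAnomaly itself: the function names are hypothetical, and it assumes that every log ending in "changed state to down" (including the line-protocol log) counts as a down event.

```python
import re

# OSPF startup workflow, as listed on the slide.
OSPF_WORKFLOW = ["Down", "Attempt", "Init", "Two-way", "Exstart",
                 "Exchange", "Loading", "Full"]

# Matches the "from <state> to <state>" tail of an OSPF ADJCHG log.
TRANSITION = re.compile(r"from (\S+) to (\S+)$")


def last_ospf_state(logs):
    """Return the last state reached according to the logs, or None."""
    state = None
    for line in logs:
        match = TRANSITION.search(line)
        if match:
            state = match.group(2)
    return state


def ospf_started(logs):
    """Startup succeeded only if the final workflow state was reached."""
    return last_ospf_state(logs) == OSPF_WORKFLOW[-1]


def flapping_balanced(logs):
    """Check the relationship num(interface down) == num(interface up)."""
    downs = sum(line.endswith("changed state to down") for line in logs)
    ups = sum(line.endswith("changed state to up") for line in logs)
    return downs == ups


ospf_logs = [
    "OSPF ADJCHG, Nbr 1.1.1.1 on FastEthernet0/0 from Attempt to Init",
    "OSPF ADJCHG, Nbr 1.1.1.1 on FastEthernet0/0 from Init to Two-way",
    "OSPF ADJCHG, Nbr 1.1.1.1 on FastEthernet0/0 from Two-way to Exstart",
    "OSPF ADJCHG, Nbr 1.1.1.1 on FastEthernet0/0 from Two-way to Exstart",
]
interface_logs = [
    "Line protocol on Interface ae3, changed state to down",
    "Interface ae3, changed state to down",
    "Interface ae3, changed state to up",
]

print(last_ospf_state(ospf_logs), ospf_started(ospf_logs))
print(flapping_balanced(interface_logs))
```

With the slide's logs, the OSPF sequence stops at Exstart and the down/up counts differ, even though no single log line looks abnormal. Hand-written rules like these do not scale to every protocol and device, which is the motivation for automatic detection.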

slide-7
SLIDE 7

Previous studies

■Existing log anomaly detection:
  • Quantitative pattern based methods
  • Sequential pattern based methods
  • Examples: LogCluster (ICSE’16), IM (ATC’10), PreFix (SIGMETRICS’18), PCA (SOSP’09), DeepLog (CCS’17)

Logs:

  • L1. Interface ae3, changed state to down
  • L2. Vlan-interface vlan22, changed state to down
  • L3. Interface ae3, changed state to up
  • L4. Interface ae1, changed state to down
  • L5. Vlan-interface vlan22, changed state to up
  • L6. Interface ae1, changed state to up

Templates (log keys):

  • T1. Interface *, changed state to down
  • T2. Vlan-interface *, changed state to down
  • T3. Interface *, changed state to up
  • T4. Vlan-interface *, changed state to up

Logs -> Template indexes: L1->T1, L2->T2, L3->T3, L4->T1, L5->T4, L6->T3

Log template index sequence: T1, T2, T3, T1, T4, T3

Quantitative anomalies detection methods: apply sliding/session windows to the template sequence and build a count matrix, one count vector per window:

     | v1 | v2 | v3 | v4
Cj   | 1  | 1  | 1  | 0
Cj+1 | 1  | 1  | 1  | 0
Cj+2 | 1  | 0  | 1  | 1
Cj+3 | 1  | 0  | 1  | 1

Sequential anomalies detection methods: apply sliding/session windows to the template sequence and predict the next template from each window:

sequence -> next
[v1 v2 v3] -> v1
[v2 v3 v1] -> v4
[v3 v1 v4] -> v3

  • Only comparing template indexes loses the information hidden in template semantics
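
As a rough illustration of how the two families of methods prepare their input, the sketch below slides a fixed-size window over the template index sequence from this slide. The window size of 3 and all function names are assumptions made for the example. Count vectors feed the quantitative methods, and (window, next template) pairs feed the sequential ones.

```python
from collections import Counter

TEMPLATES = ["T1", "T2", "T3", "T4"]
sequence = ["T1", "T2", "T3", "T1", "T4", "T3"]


def sliding_windows(seq, size):
    """All contiguous windows of the given size."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def count_vector(window, templates=TEMPLATES):
    """Number of occurrences of each template inside one window."""
    counts = Counter(window)
    return [counts[t] for t in templates]


def next_template_pairs(seq, size):
    """(window, next template) pairs for next-template prediction."""
    return [(seq[i:i + size], seq[i + size]) for i in range(len(seq) - size)]


# One row of the count matrix per window.
for window in sliding_windows(sequence, 3):
    print(window, count_vector(window))

# Training pairs for a sequential model.
for window, nxt in next_template_pairs(sequence, 3):
    print(window, "->", nxt)
```

Both representations treat T1 and T3 as unrelated symbols, although the templates differ only in "down" versus "up". That is the loss of semantic information the slide points out.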

slide-8
SLIDE 8

Challenges

■Challenge 1: Some templates are similar in semantics but different in indexes
  • Valuable information could be lost if only log template index is used.
■Challenge 2: Services can generate new log templates between two re-trainings
  • Existing approaches cannot address this problem
■Challenge 3: Existing methods cannot detect sequential and quantitative anomalies simultaneously.

slide-9
SLIDE 9

Overview of LogAnomaly

[Figure: LogAnomaly framework.
Offline learning: historical logs → extract templates → template2Vec (with synonyms & antonyms and word vectors) → template vectors; template sequence → vector sequence → model.
Online detection: real-time logs → match against templates (unmatched logs yield temporary templates → template2Vec → temporary vectors → matched to existing vectors) → template sequence → vector sequence → comparison with the model → output.
Model inputs: template vector sequence and count vector, feeding blocks labeled LSTM, LSTM and Attention.]

An anomaly detection system based on unstructured logs

slide-10
SLIDE 10

Template Representation


Address the first challenge and save template semantics.

slide-11
SLIDE 11

Template Representations

Logs:
  • L1. Interface ae3, changed state to down
  • L2. Vlan-interface vlan22, changed state to down
  • L3. Interface ae3, changed state to up
  • L4. Vlan-interface vlan22, changed state to up
  • L5. Interface ae1, changed state to down
  • L6. Vlan-interface vlan20, changed state to down
  • L7. Interface ae1, changed state to up
  • L8. Vlan-interface vlan20, changed state to up

Templates:
  • T1. Interface *, changed state to down
  • T2. Vlan-interface *, changed state to down
  • T3. Interface *, changed state to up
  • T4. Vlan-interface *, changed state to up

Logs -> Templates: L1->T1, L2->T2, L3->T3, L4->T4, L5->T1, L6->T2, L7->T3, L8->T4

Insights:
■Some existing templates have similar semantics
■Some logs containing antonyms look similar but have opposite semantics

Goals:
■Convert log templates to “soft” representations
■Take antonyms and synonyms into consideration
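
The log-to-template mapping on this slide can be reproduced with a small sketch. This is not the template extraction method used by LogAnomaly (the slides do not describe it); it simply assumes the four templates are already known and that each `*` stands for one whitespace-free token. The helper names are hypothetical.

```python
import re

TEMPLATES = {
    "T1": "Interface *, changed state to down",
    "T2": "Vlan-interface *, changed state to down",
    "T3": "Interface *, changed state to up",
    "T4": "Vlan-interface *, changed state to up",
}


def template_to_regex(template):
    """Turn a template into an anchored regex; '*' matches one token."""
    parts = [re.escape(part) for part in template.split("*")]
    return re.compile("^" + r"\S+".join(parts) + "$")


PATTERNS = {name: template_to_regex(t) for name, t in TEMPLATES.items()}


def match_template(log):
    """Return the index of the first matching template, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.match(log):
            return name
    return None


logs = [
    "Interface ae3, changed state to down",
    "Vlan-interface vlan22, changed state to down",
    "Interface ae3, changed state to up",
    "Vlan-interface vlan22, changed state to up",
    "Interface ae1, changed state to down",
    "Vlan-interface vlan20, changed state to down",
    "Interface ae1, changed state to up",
    "Vlan-interface vlan20, changed state to up",
]

print([match_template(log) for log in logs])
```

A log that matches no known template returns `None`. That case is the one the template approximation step later in the deck is meant to handle.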

slide-12
SLIDE 12

Template2Vec

[Figure: template2Vec pipeline. (1) Synonyms and antonyms (e.g., Interface / Vlan-interface, down / up) → (2) word vectors (e.g., Interface → [x1,…,xn], changed → [x1,…,xn]) → (3) template vectors (templates T1 … Tn+1 → vectors V1 … Vn+1, each [x1,…,xn]).]

■template2Vec (template representation method):
  • 1. Construct the set of synonyms and antonyms
    • Combine domain knowledge and WordNet
  • 2. Generate word vectors by using the dLCE [1] algorithm
    • dLCE is a distributional lexical-contrast embedding model
  • 3. Calculate template vectors.

Relations | Word pairs | Adding methods
Synonyms | down, low | WordNet
Synonyms | Interface, port | Operators
Antonyms | DOWN, UP | WordNet
Antonyms | powerDown, powerOn | Operators

[1] Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. Integrating distributional lexical contrast into word embeddings for antonym-synonym distinction. arXiv preprint arXiv:1605.07766, 2016.
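
Step 3 can be illustrated with a minimal sketch. The slide does not say how template vectors are computed from word vectors, so the plain average used here is an assumption. The 3-dimensional word vectors are hand-made toy values, not dLCE output; they are chosen so that the antonyms "down" and "up" point in opposite directions.

```python
# Toy word vectors (hypothetical values, standing in for dLCE embeddings).
WORD_VECTORS = {
    "interface": [0.9, 0.1, 0.0],
    "vlan-interface": [0.8, 0.2, 0.0],
    "changed": [0.0, 1.0, 0.0],
    "state": [0.0, 0.9, 0.1],
    "to": [0.0, 0.5, 0.0],
    "down": [0.0, 0.0, -1.0],
    "up": [0.0, 0.0, 1.0],
}


def tokenize(template):
    """Lowercase the words, strip commas and drop the '*' wildcard."""
    words = (w.strip(",").lower() for w in template.split())
    return [w for w in words if w and w != "*"]


def template_vector(template):
    """Average the vectors of the template's known words."""
    words = [w for w in tokenize(template) if w in WORD_VECTORS]
    if not words:
        raise ValueError("template has no known words")
    dim = len(WORD_VECTORS[words[0]])
    return [sum(WORD_VECTORS[w][i] for w in words) / len(words)
            for i in range(dim)]


v_down = template_vector("Interface *, changed state to down")
v_up = template_vector("Interface *, changed state to up")
print(v_down)
print(v_up)
```

The two template vectors agree on the components shared by "Interface", "changed", "state" and "to", and differ in sign on the component carried by the antonym pair. This is the kind of "soft" representation the deck aims for.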

slide-13
SLIDE 13

Template Approximation


A mechanism to address new templates at runtime

slide-14
SLIDE 14

Template Approximation

[Figure: template approximation between two consecutive trainings. Offline: templates → word vectors → template vectors (existing vectors). Online: real-time logs → temporary templates → temporary vectors → mapped to existing vectors.]

Between two re-trainings:
■Extract a temporary template for a log of a new type
■Map the temporary template vector to one of the existing vectors
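
The mapping step can be sketched as a nearest-neighbor lookup. Cosine similarity is an assumption here, since the slide does not name the similarity measure. The vectors and the function names are hypothetical toy values for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def approximate(temp_vector, existing_vectors):
    """Return the index of the existing template closest to the temporary one."""
    return max(existing_vectors,
               key=lambda name: cosine(temp_vector, existing_vectors[name]))


# Existing template vectors from the last training (toy values).
existing = {
    "T1": [0.2, 0.5, -0.2],  # e.g. "Interface *, changed state to down"
    "T3": [0.2, 0.5, 0.2],   # e.g. "Interface *, changed state to up"
}

# Vector of a temporary template extracted from a new type of log.
temporary = [0.25, 0.5, -0.15]
print(approximate(temporary, existing))
```

Once mapped, the new log can be handled by the existing model as if it were an instance of the closest known template, until the next re-training.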

slide-15
SLIDE 15

Anomaly Detection


Address the third challenge and detect both types of anomalies simultaneously.

slide-16
SLIDE 16

Anomaly detection

Logs:
  • L1 Interface ae3, changed state to down
  • L2 Vlan-interface v2, changed state to down
  • L3 Interface ae3, changed state to up
  • L4 Interface ae1, changed state to down
  • L5 Vlan-interface v2, changed state to up
  • L6 Interface ae1, changed state to up

Templates (log keys):
  • T1 Interface *, changed state to down
  • T2 Vlan-interface *, changed state to down
  • T3 Interface *, changed state to up
  • T4 Vlan-interface *, changed state to up

Template index sequence: T1 T2 T3 T1 T4 T3

Template vector sequence: v1 v2 v3 v1 v4 v3

Sliding windows:
■Sequential pattern (e.g., OSPF starting)
■Quantitative pattern (e.g., up = down)

slide-17
SLIDE 17

Anomaly Detection

■Sort probabilities:
  • For a log sequence, we sort the possible next template vectors by their probabilities of appearing in the next log.
■Top k candidates:
  • If the observed next template vector is among the top k candidates (or is similar enough to one of them), we regard it as normal.

[Figure: the template vector sequence and the count vector feed the model (blocks labeled LSTM, LSTM and Attention); the resulting vector sequence goes through similarity comparison, which raises the alarm.]

Combine sequential and quantitative relationships
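
The top-k decision rule can be sketched as follows. The probabilities are made-up values standing in for the model's output, and the function names are hypothetical. For brevity the sketch checks only exact membership in the top k and omits the "similar enough" fallback mentioned above.

```python
def top_k(probabilities, k):
    """Templates sorted by predicted probability, highest first, cut at k."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:k]


def is_normal(observed, probabilities, k):
    """The observed next template is normal if it is a top-k candidate."""
    return observed in top_k(probabilities, k)


# Hypothetical model output for the next template after some window.
predicted = {"T1": 0.60, "T3": 0.25, "T4": 0.10, "T2": 0.05}

print(top_k(predicted, 2))
print(is_normal("T3", predicted, 2))
print(is_normal("T2", predicted, 2))
```

A larger k tolerates more variation in normal behavior at the cost of missing some anomalies, so k is a tuning parameter of the detector.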

slide-18
SLIDE 18

Evaluation Datasets & Baselines

Datasets:

Datasets | Duration | # of logs | # of anomalies
BGL | 7 months | 4,747,963 | 348,460 (logs)
HDFS | 38.7 hours | 11,175,629 | 16,838 (blocks)

■BGL:
  • Generated by the Blue Gene/L supercomputer.
■HDFS:
  • Collected from more than 200 Amazon nodes.

Baselines:

■LogCluster (ICSE’16)
■Invariants Mining (ATC’10)
■PCA (SOSP’09)
■DeepLog (CCS’17)

slide-19
SLIDE 19

Evaluation of LogAnomaly

[Figure: evaluation results in two panels, BGL dataset and HDFS dataset]

LogAnomaly achieves the best performance

slide-20
SLIDE 20

Case Study

[Figure: timeline of the case study, from Beginning (Sep 25) to End (Oct 19). Marked events: traffic dropped (Oct 13 15:00), LogAnomaly alarmed (Oct 13 15:59), services were impacted (Oct 13 22:15), service recovered (Oct 14 1:16). DeepLog and IM alarms are also marked, with the timestamp Oct 10 08:25.]

Dataset:
■Logs from an aggregation switch deployed in a top cloud service provider.

Anomaly description:
■The traffic forwarded by this switch dropped from 15:00, Oct 13
■The services provided by this switch were impacted from 22:15, Oct 13
■The switch recovered at 1:16, Oct 14.

Results:
■All of LogAnomaly’s alarms were during 15:59 ~ 1:16
■LogAnomaly successfully detected anomalies and generated no false alarm.

slide-21
SLIDE 21

Conclusion

LogAnomaly

■An anomaly detection system based on unstructured logs.

template2Vec

■Represent template without losing semantic information.

Template Approximation

■Merge templates of new types automatically

Evaluation

■Best results on public datasets and real-world switch logs

slide-22
SLIDE 22

Thanks

mwb16@mails.tsinghua.edu.cn


slide-23
SLIDE 23

Evaluation of Online Detection


slide-24
SLIDE 24

Case in Intro

  • L1. 1537885119 IFNET/2/linkDown_active(l):CID=0x807a0405, alarmID=0x0852003; The interface status changes.
  • L2. 1537885119 LACP/4/LACP_STATE_DOWN(l): CID=0x804804, PortName=40GE1/0/3; The LACP state is down. Reason = The interface went down physically.
  • L3. 1537885130 DEVM/3/LocalFaultAlarm_clear(l): CID=0x852003, clearType=service_resume, The local fault alarm has resumed.
  • L4. 1537885135 IFNET/2/linkDown_clear(l): CID=0x807a0405, alarmID=0x0852003; The interface status changes. Physical link is up, mainName=Eth-Trunk104.

  • L5. 1539139152 IFNET/2/linkDown_active(l):CID=0x807a0406, alarmID=0x0852007; The interface status changes.
  • L6. 1539138152 LACP/4/LACP_STATE_DOWN(l): CID=0x804807, PortName=40GE1/0/3; The LACP state is down. Reason = No LCAPDUs were received.
  • L7. 1539138164 DEVM/3/LocalFaultAlarm_clear(l): CID=0x852004, clearType=service_resume, The local fault alarm has resumed.
  • L8. 1539138164 IFNET/2/linkDown_clear(l): CID=0x807a0406, alarmID=0x0852007; The interface status changes. Physical link is up, mainName=Eth-Trunk104.