Device-Agnostic Log Anomaly Classification with Partial Labels
2018/6/23 1 weibin
Weibin Meng, Ying Liu, Shenglin Zhang, Dan Pei, Hui Dong, Lei Song, Xulong Luo
Motivation
Architecture of Datacenter Networks
Figure: datacenter network architecture. Components shown: inter-DC network, core routers, access routers, aggregation switches, ToR switches, servers, and middleboxes (IDPS, firewall, VPN, load balancer). The layers are labeled Core, L3, and L2.
Figure: traffic flow and CPU utilization.
Detailed messages are semi-structured natural language written by device developers.
Message types are ambiguous, which makes accurate classification difficult.
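Figure: rule-based log classification with regular expressions (REs). Operators configure anomalous regular expressions, and each syslog message is matched against them. On a match (Yes), the message is assigned to one of Type 1, …, Type n. Otherwise (No), it is ignored. Logs from Manufacturer A and Manufacturer B each need their own REs.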
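The rule-based practice above can be sketched as follows. The per-manufacturer rule tables, the patterns, the `LINK_DOWN` token, and the function name are all hypothetical illustrations, not rules from the talk. The sketch shows the key limitation: a rule set written for one manufacturer's log format does not match another's.

```python
import re

# Hypothetical per-manufacturer rule sets (pattern, anomaly type); operators would write their own
RULES = {
    "ManufacturerA": [
        (re.compile(r"^Interface \S+ changed state to down$"), "Type 1"),
        (re.compile(r"^Neighbour \S+ changed state from \w+ to \w+$"), "Type 2"),
    ],
    "ManufacturerB": [
        (re.compile(r"LINK_DOWN"), "Type 1"),  # hypothetical vendor-specific wording
    ],
}

def classify_with_rules(manufacturer, message):
    # Return the type of the first matching RE, or None (the log is ignored)
    for pattern, anomaly_type in RULES.get(manufacturer, []):
        if pattern.search(message):
            return anomaly_type
    return None
```

The same message is classified under Manufacturer A's rules but ignored under Manufacturer B's. This is the maintenance burden a device-agnostic method aims to remove.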
Figure: LogClass architecture, with two components.
Offline learning component: historical logs, anomaly records, filtering parameters, vocabulary, feature vectors, top-n keywords, PU binary classifier, multiclass classifier.
Online classification component: real-time logs → filtering parameters → feature vector → detect anomalous logs (PU binary classifier) → classify anomalous logs (multiclass classifier) → alarm.
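The online path can be sketched as a two-stage function: filter parameters, vectorize against the offline vocabulary, detect, classify, then alarm. This is a minimal sketch under stated assumptions. The digit-based parameter filter is an assumption, and `online_classify`, `is_anomalous`, `classify_type`, and `alarm` are hypothetical names. The lambdas in the demo are stand-ins for the trained classifiers, not LogClass's models.

```python
import re
from typing import Callable, List, Optional

Vector = List[int]

def filter_parameters(message: str) -> List[str]:
    # Assumption: any token containing a digit (e.g. te-1/1/59, vlan22) is a parameter
    return [tok for tok in message.split() if not re.search(r"\d", tok)]

def to_vector(tokens: List[str], vocab: List[str]) -> Vector:
    # Bag-of-words counts over the vocabulary learned offline
    return [tokens.count(word) for word in vocab]

def online_classify(
    message: str,
    vocab: List[str],
    is_anomalous: Callable[[Vector], bool],
    classify_type: Callable[[Vector], str],
    alarm: Callable[[str, str], None],
) -> Optional[str]:
    vector = to_vector(filter_parameters(message), vocab)
    if not is_anomalous(vector):          # stage 1: binary (PU) classifier
        return None
    anomaly_type = classify_type(vector)  # stage 2: multiclass classifier
    alarm(message, anomaly_type)
    return anomaly_type

# Hypothetical stand-ins for the artifacts produced by the offline component
VOCAB = ["Interface", "changed", "state", "to", "down",
         "VlanInterface", "Neighbour", "from", "Exchange", "Loading", "up"]
alarms = []
result = online_classify(
    "Interface te-1/1/59 changed state to down",
    VOCAB,
    is_anomalous=lambda v: v[VOCAB.index("down")] > 0,
    classify_type=lambda v: "Type 1",
    alarm=lambda msg, t: alarms.append((msg, t)),
)
```

Passing the classifiers in as callables keeps the online path independent of how the offline component trains them.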
The most common way to construct a text feature vector is the bag-of-words model.
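Logs:
M1: Interface te-1/1/59 changed state to down
M2: VlanInterface vlan22 changed state to up
M3: Neighbour vlan23 changed state from Exchange to Loading
Vocabulary (parameters such as te-1/1/59, vlan22, and vlan23 excluded): Interface, changed, state, to, down, VlanInterface, Neighbour, from, Exchange, Loading, up
Bag-of-words vectors (one entry per vocabulary word, in the order above):
M1: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
M2: [0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1]
M3: [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0]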
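The bag-of-words construction above can be reproduced in a few lines. This is a sketch that assumes parameters are exactly the tokens containing a digit. That rule happens to remove te-1/1/59, vlan22, and vlan23 here, but it is not necessarily LogClass's filter. The function names are hypothetical.

```python
import re
from collections import Counter

LOGS = {
    "M1": "Interface te-1/1/59 changed state to down",
    "M2": "VlanInterface vlan22 changed state to up",
    "M3": "Neighbour vlan23 changed state from Exchange to Loading",
}

def filter_parameters(message):
    # Assumption: tokens containing a digit (te-1/1/59, vlan22, ...) are parameters
    return [tok for tok in message.split() if not re.search(r"\d", tok)]

def build_vocabulary(token_lists):
    # Distinct words in order of first appearance
    vocab = []
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)
    return vocab

def bag_of_words(tokens, vocab):
    # Count of each vocabulary word in one log
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

tokens = {name: filter_parameters(msg) for name, msg in LOGS.items()}
vocab = build_vocabulary(tokens.values())
vectors = {name: bag_of_words(toks, vocab) for name, toks in tokens.items()}
```

Here the vocabulary is ordered by first appearance, so `up` precedes `Neighbour`. Apart from that column order, the vectors match the ones listed above.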
Each vector component is then assigned a weight (e.g., TF-IDF).
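As an illustration of TF-IDF weighting on the three example logs, here is a sketch using the plain variant tf × log(N / df). That variant is an assumption; libraries use several smoothed forms, and the slide does not specify one. Words that appear in every log (changed, state, to) get weight 0, while rarer words such as down keep a positive weight.

```python
import math
from collections import Counter

# Parameter-free tokens of M1-M3 (te-1/1/59, vlan22, vlan23 already removed)
DOCS = {
    "M1": ["Interface", "changed", "state", "to", "down"],
    "M2": ["VlanInterface", "changed", "state", "to", "up"],
    "M3": ["Neighbour", "changed", "state", "from", "Exchange", "to", "Loading"],
}

def tf_idf(docs):
    n_docs = len(docs)
    # Document frequency: number of logs containing each word
    df = Counter(word for tokens in docs.values() for word in set(tokens))
    weights = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        # Raw term count times log(N / df)
        weights[name] = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    return weights

weights = tf_idf(DOCS)
```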
PU learning: a binary classifier is trained from positive (labeled) data and unlabeled data (Gang Niu et al., NIPS'16).
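To make PU learning concrete, here is a minimal sketch that minimizes the unbiased PU risk estimator analyzed by Niu et al. (NIPS'16):

R(g) = π·E_P[l(g(x))] − π·E_P[l(−g(x))] + E_U[l(−g(x))]

It assumes a known class prior π (the fraction of positives in the unlabeled data), a linear scorer, the sigmoid loss, and full-batch gradient descent. The slides do not state which model, loss, or prior estimate LogClass uses. All names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_loss(margin):
    # l(z) = sigmoid(-z): near 0 for large positive margins
    return sigmoid(-margin)

def score(w, b, x):
    # Linear scorer g(x) = w . x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def pu_risk(w, b, positives, unlabeled, prior):
    # Unbiased PU risk: pi*E_P[l(g)] - pi*E_P[l(-g)] + E_U[l(-g)]
    p_pos = sum(sigmoid_loss(score(w, b, x)) for x in positives) / len(positives)
    p_neg = sum(sigmoid_loss(-score(w, b, x)) for x in positives) / len(positives)
    u_neg = sum(sigmoid_loss(-score(w, b, x)) for x in unlabeled) / len(unlabeled)
    return prior * p_pos - prior * p_neg + u_neg

def train_pu(positives, unlabeled, prior, lr=0.5, epochs=500):
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        # d l(g)/dg = -s and d l(-g)/dg = +s with s = sigmoid(g)*sigmoid(-g), so each
        # positive contributes -2*pi*s/n_P and each unlabeled point +s/n_U
        batches = [(positives, -2.0 * prior / len(positives)),
                   (unlabeled, 1.0 / len(unlabeled))]
        for data, weight in batches:
            for x in data:
                g = score(w, b, x)
                coef = weight * sigmoid(g) * sigmoid(-g)
                grad_w = [gw + coef * xi for gw, xi in zip(grad_w, x)]
                grad_b += coef
        w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy 2-D features: labeled anomalies; unlabeled = hidden anomalies + normal logs
positives = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
unlabeled = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]
w, b = train_pu(positives, unlabeled, prior=0.5)
```

The toy prior of 0.5 matches the toy unlabeled set, where 3 of 6 points are hidden anomalies. On real logs the prior would have to be estimated.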
We randomly sampled anomalous logs across all switch types and treated them as unlabeled. The PU learning classifier is more stable than a traditional classifier.
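One way to build such a partially labeled training set from fully labeled data is sketched below. The (switch_type, message) record layout, the switch names, and `make_pu_split` are hypothetical. Sampling ignores the switch type, so hidden anomalies can come from any switch type.

```python
import random

def make_pu_split(anomalies, normal_logs, hidden_fraction, seed=0):
    """Hide the labels of a random fraction of anomalous logs.

    anomalies, normal_logs: lists of (switch_type, message) records.
    Returns (positives, unlabeled).
    """
    rng = random.Random(seed)  # seeded for reproducible splits
    shuffled = list(anomalies)
    rng.shuffle(shuffled)
    n_hidden = round(len(shuffled) * hidden_fraction)
    hidden, positives = shuffled[:n_hidden], shuffled[n_hidden:]
    # Unlabeled set: anomalies whose labels were hidden, plus normal logs
    unlabeled = hidden + list(normal_logs)
    return positives, unlabeled

anomalies = [("switch-A", f"anomaly {i}") for i in range(5)] + \
            [("switch-B", f"anomaly {i}") for i in range(5, 10)]
normal_logs = [("switch-A", f"normal {i}") for i in range(5)]
positives, unlabeled = make_pu_split(anomalies, normal_logs, hidden_fraction=0.3)
```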
LogClass is more accurate than L-LDA and RE, and L-LDA and RE incur larger overheads than LogClass.
Challenges
LogClass
Evaluation
mwb16@mails.tsinghua.edu.cn