Label-Less: A Semi-Automatic Labelling Tool for KPI Anomalies
Nengwen Zhao, Jing Zhu, Rong Liu, Dapeng Liu, Ming Zhang, Dan Pei INFOCOM 2019
Outline
- Operational Background
- Design
- Evaluation
- Experience
Internet-based services
- KPIs (Key Performance Indicators): a set of performance metrics, e.g., search response time, memory usage
- Example services: search engine, online shopping, social network
KPI anomalies are unexpected behaviors that indicate potential failures [IMC15]. Goal: detect them!
Existing anomaly detection approaches:
1. Traditional statistical methods: Holt-Winters, MA, ARIMA, ... [IMC03, CONEXT11, INFOCOM12, SIGCOMM13, ...]
2. Supervised ensemble learning: Opprentice, EGADS, ... [IMC15, KDD15, ...]
3. Unsupervised learning: Donut, Autoencoder, ... [WWW18, AAAI19, IJCAI17, KDD17, ...]
Limitation: Donut [WWW18] is designed only for seasonal KPIs, while real KPI patterns also include variable and stationary shapes.
Existing public datasets fall short for KPI anomaly detection:
- Time series datasets: UCR time series archive, UCI machine learning repository
  - Problem: aimed at classification, clustering, or regression, not anomaly detection
- KPI anomaly detection datasets: Yahoo Benchmark [KDD15], Numenta Anomaly Benchmark [ICMLA15]
  - Problem: each time series has only about 1,400 data points, with anomalies as short as four data points
Challenge 1: Where are the anomalies? Some patterns look anomalous but are normal (e.g., a weekend normal pattern), so operators must gauge each KPI's normal pattern before labeling.
Challenge 2: Labeling is slow. A 6-month-long KPI with a 1-minute interval contains roughly 300,000 data points and takes a few hours to label, even with a labeling tool.
Challenge 3: Scale and diversity.
- Diverse KPI patterns: seasonal, variable, stationary
- Large number of KPIs: millions of KPIs in large companies
- Rare anomalies in each KPI: usually less than 1%
[Diagram: thousands of unlabeled KPIs vs. only 30 labeled KPIs, illustrating the labeling overhead]
Our approach: design a semi-automatic labeling framework.
The framework automates the two most time-consuming manual activities:
1. Visually scanning through the KPIs to gauge their normal patterns and variations → replaced by unsupervised anomaly detection, which turns a raw KPI into potential anomalies
2. Examining KPIs back and forth to check whether similar patterns are labeled consistently → replaced by anomaly similarity search, which takes a labeled anomaly as a template and discovers similar anomalies automatically
Workflow (a code sketch of this loop follows):
- Unsupervised anomaly detection: unlabeled KPIs → preprocessing → feature extraction → Isolation Forest → threshold selection → candidate anomalies
- Operators investigate the candidates and select an anomaly template
- Anomaly similarity search: accelerated DTW → check of top-k similar anomalies
- If all anomalies have been labeled, output the labeled KPIs; otherwise choose another template and repeat
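The loop can be summarized in a few lines of Python. This is a reconstruction from the workflow slide, not the authors' code: `candidate_anomalies` and `topk_similar` are the component sketches given later in this deck, and the two operator callbacks are hypothetical stand-ins for the manual steps.

```python
# Reconstruction of the Label-Less loop from the workflow slide.
# candidate_anomalies / topk_similar are sketched on later slides;
# pick_template / confirm are hypothetical operator-interaction callbacks.
def label_less(kpi_values, segments, pick_template, confirm):
    candidates = candidate_anomalies(kpi_values)      # unsupervised detection
    labels = set()
    while True:
        template = pick_template(candidates, labels)  # operator investigates
        if template is None:                          # everything labeled
            return labels
        topk = topk_similar(template, segments)       # accelerated DTW search
        labels |= confirm(topk)                       # operator checks top-k
```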
Unsupervised anomaly detection has three steps: feature extraction, Isolation Forest, and threshold selection; the output is a set of candidate anomalies per KPI.

Feature extraction:
- Time series prediction models serve as feature extractors; the feature is the prediction error
- [Figure: KPI value vs. predicted value over four days]
- Normal points are well predicted, with small prediction errors; anomalies have large prediction errors (see the sketch below)
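A minimal sketch of such prediction-error features, using a few of the simple predictors named on the next slide (Diff, MA, WMA, EWMA); the window size and smoothing factor are illustrative assumptions, not the authors' settings.

```python
import numpy as np
import pandas as pd

def prediction_error_features(values) -> pd.DataFrame:
    """One column per predictor: |actual - predicted| at each point."""
    s = pd.Series(np.asarray(values, dtype=float))
    w = 10                           # illustrative window size
    weights = np.arange(1, w + 1)    # linear weights for the WMA
    feats = pd.DataFrame(index=s.index)
    feats["diff"] = (s - s.shift(1)).abs()                        # last value
    feats["ma"] = (s - s.rolling(w).mean().shift(1)).abs()        # moving average
    feats["wma"] = (s - s.rolling(w)                              # weighted MA
                        .apply(lambda x: np.average(x, weights=weights))
                        .shift(1)).abs()
    feats["ewma"] = (s - s.ewm(alpha=0.3).mean().shift(1)).abs()  # exponential WMA
    return feats.fillna(0.0)  # first rows have no history; treat as zero error
```

Each predictor is shifted by one step so it only sees past values, keeping the "well predicted vs. poorly predicted" distinction honest.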
Isolation Forest (iForest):
- Intuition: anomalies are "few and different", so random trees isolate them quickly
- Anomaly score: computed from the average path lengths across the iTrees
- Input features come from the prediction models: Diff, WMA, MA, EWMA, HW, ..., ARIMA
- Samples with anomaly score above the threshold β are treated as anomalies, the rest as normal (e.g., scores 1 and 0.9 vs. 0.6, 0.2, 0.1)

Threshold selection:
- Our goal is only to generate candidate potential anomalies, so we aim for high recall and acceptable precision
- [Figure: a KPI and its output anomaly score over three days]
- A code sketch of scoring and thresholding follows.
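A minimal sketch of the scoring and thresholding steps, using scikit-learn's IsolationForest on the features sketched above; the min-max normalization and the default β are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def candidate_anomalies(values, beta: float = 0.6) -> np.ndarray:
    """Return indices of points whose normalized anomaly score exceeds beta."""
    feats = prediction_error_features(values).to_numpy()
    forest = IsolationForest(n_estimators=100, random_state=0).fit(feats)
    # score_samples returns higher = more normal; negate so higher = more
    # anomalous, then min-max normalize so beta is comparable across KPIs.
    raw = -forest.score_samples(feats)
    score = (raw - raw.min()) / (raw.max() - raw.min() + 1e-12)
    return np.where(score > beta)[0]
```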
Anomaly similarity search:
- Input: an anomaly template chosen from the candidate anomalies; output: the top-k similar anomalies for the operator to check
- Distance measure: DTW, since none of the other distance measures consistently outperforms DTW [KDD12]
- [Figure: DTW alignment between two sequences x and y]
We adopt three techniques from existing works to speed up DTW, including:
1. Limiting the permissible warping paths by placing local restrictions on the set of alternative steps considered
2. Using a cheap-to-compute lower bound of DTW to prune segments that cannot possibly be among the top-k similar anomalies

A sketch of both ideas follows.
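A minimal sketch of these two speed-ups, assuming the template and candidate segments are equal-length NumPy arrays. The Sakoe-Chiba band and LB_Keogh are common instances of the two techniques; the window size is illustrative, not necessarily the authors' exact choice.

```python
import numpy as np

def dtw(x, y, window=10):
    """DTW distance restricted to a Sakoe-Chiba band of half-width `window`."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

def lb_keogh(query, candidate, window=10):
    """Cheap lower bound on dtw(query, candidate): distance to query's envelope."""
    lb = 0.0
    for i, c in enumerate(candidate):
        seg = query[max(0, i - window):i + window + 1]
        upper, lower = seg.max(), seg.min()
        if c > upper:
            lb += (c - upper) ** 2
        elif c < lower:
            lb += (c - lower) ** 2
    return float(np.sqrt(lb))

def topk_similar(template, segments, k=5, window=10):
    """Top-k segments closest to the template; LB_Keogh prunes full DTW calls."""
    best = []  # (distance, segment index), kept sorted, length <= k
    for idx, seg in enumerate(segments):
        kth = best[-1][0] if len(best) == k else np.inf
        if lb_keogh(template, seg, window) >= kth:
            continue  # the bound already rules out a top-k finish
        d = dtw(template, seg, window)
        if d < kth:
            best = sorted(best + [(d, idx)])[:k]
    return best
```

Because LB_Keogh never exceeds the true DTW distance, pruning on it cannot discard a genuine top-k match, and every skipped segment saves a full banded-DTW computation.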
Evaluation setup:
- Four datasets containing 30 KPIs
- A time span of about six months
- 1-minute monitoring interval

We evaluate the two components separately: unsupervised anomaly detection and anomaly similarity search.
[Figure: recall of unsupervised anomaly detection on datasets A-D]
[Figure: best F-score of anomaly similarity search on datasets A-D, comparing DTW, ED, and SBD]
[Figure: AUC of anomaly similarity search on datasets A-D, comparing DTW, ED, and SBD]
Efficiency:
- Unsupervised anomaly detection: reduces the search space
- Accelerated DTW: reduces DTW computational complexity (a warping window of width w cuts each comparison from O(n^2) to O(nw), and lower-bound pruning skips most candidates outright)
[Figure: comparison of per-KPI response time (seconds) on datasets A-D: anomaly similarity search vs. naïve search vs. original DTW]
Our accelerated similarity search answers in under 0.5 seconds per KPI: a real-time response.
Experience with the labeling interface:
- Candidate potential anomalies are marked in red
- The anomaly-score threshold can be tuned by operators (see the snippet below)
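For instance, with the `candidate_anomalies` sketch from earlier, retuning the threshold is just a re-run with a different β; the values here are illustrative.

```python
# Operators lower beta to surface more candidates, raise it to see fewer.
loose = candidate_anomalies(kpi_values, beta=0.4)   # more candidates, higher recall
strict = candidate_anomalies(kpi_values, beta=0.8)  # fewer candidates, higher precision
```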
- Given a selected anomaly template, the tool displays the similar anomalies it discovered for the operator to confirm
[Figure: comparison of per-KPI labeling time (minutes) on datasets A-D: Label-Less vs. traditional labeling]
Two operator groups labeled the same KPIs: Group 1 used traditional labeling, Group 2 used Label-Less.
Contact: znw17@mails.tsinghua.edu.cn (INFOCOM 2019)