Label-Less: A Semi-Automatic Labelling Tool for KPI Anomalies


SLIDE 1

Label-Less: A Semi-Automatic Labelling Tool for KPI Anomalies

Nengwen Zhao, Jing Zhu, Rong Liu, Dapeng Liu, Ming Zhang, Dan Pei
INFOCOM 2019

SLIDE 2

Background, Design, Evaluation, Operational Experience

SLIDE 3

Background, Design, Evaluation, Operational Experience

SLIDE 4

Internet-Based Services and KPI

Internet-based services: search engines, online shopping, social networks, …

KPIs (Key Performance Indicators): a set of performance metrics that monitor the service, e.g., search response time, memory usage.

SLIDE 5

KPI Anomaly

KPI anomalies: unexpected behavior that indicates potential failures. Detect it! [IMC15]

SLIDE 6

KPI Anomaly Detection Methods

Existing KPI anomaly detection methods:
1. Traditional statistical methods: Holt-Winters, MA, ARIMA… [CONEXT11, INFOCOM12, SIGCOMM13…]
2. Supervised ensemble learning: Opprentice, EGADS… [IMC15, KDD15…]
3. Unsupervised learning: Donut, Autoencoder… [WWW18, AAAI19, IJCAI17, KDD17…]

SLIDE 7

KPI Anomaly Detection Methods

Existing KPI anomaly detection methods:
1. Traditional statistical methods: Holt-Winters, MA, ARIMA… [IMC03, INFOCOM12…]
2. Supervised ensemble learning: Opprentice, EGADS… [IMC15, KDD15…]
3. Unsupervised learning: Donut, Autoencoder… [WWW18, AAAI19…]

Anomaly detection products in industry: Prometheus, Anodot, Kibana, …

KPI anomaly detection is very important, and many efforts have been devoted to anomaly detection research.

SLIDE 8

KPI Anomaly Detection Methods

The performance in reality is far from satisfying:
  • Lack of generality: KPIs in practice have various types of patterns (seasonal, variable, stationary), while, e.g., Donut is designed only for seasonal KPIs [WWW18].

SLIDE 9

Public Datasets

Type | Datasets | Problems
Time series datasets | UCR time series archive; UCI machine learning repository | Not for anomaly detection
KPI anomaly detection datasets | Yahoo Benchmark [KDD15]; Numenta Anomaly Benchmark [ICMLA15] | KPIs with limited data points; synthetic anomalies (e.g., a KPI with 1,400 data points and only one anomaly segment of four data points)

SLIDE 10

Public Datasets

Public time series datasets
  – UCR time series archive, UCI machine learning repository
  – Problem: aimed at classification, clustering, or regression, not at anomaly detection

Public KPI anomaly detection datasets
  – Yahoo Benchmark [KDD15], Numenta Anomaly Benchmark [ICMLA15]
  – Problems:
    • KPIs with limited data points
    • Synthetic anomalies

The community of KPI anomaly detection is in urgent need of a large-scale and diverse KPI anomaly dataset.

SLIDE 11

Challenges

Obtaining a large-scale KPI anomaly dataset with high-quality ground truth has been a great challenge.

1. Labeling KPI anomalies takes domain knowledge of IT operations. (Is this an anomaly? Where are the anomalies? That dip is a normal weekend pattern!)

SLIDE 12

Challenges

2. It is labor-intensive to carefully examine a several-month-long KPI back and forth and label anomalies in a consistent manner. A 6-month-long KPI with a 1-minute interval has roughly 300,000 data points; labeling it takes a few hours even with a labeling tool.

SLIDE 13

Challenges

3. The number of KPIs that need to be labeled is very large:
  • Diverse KPI patterns (seasonal, variable, stationary)
  • Large number of KPIs (millions of KPIs in large companies)
  • Rare anomalies in each KPI (usually less than 1%)

SLIDE 14

Challenges

Labeling overhead has become the main hurdle to building a large-scale KPI anomaly dataset, which in turn is the main hurdle to effective and practical KPI anomaly detection.

A real example: in the KPI Anomaly Detection Algorithm Competition (http://iops.ai/), labeling overhead meant that out of thousands of unlabeled KPIs, only 30 were labeled.

SLIDE 15

Key Ideas

Design a semi-automatic labeling framework.

1. Replace visually scanning through the KPIs to gauge their normal patterns and variations with unsupervised anomaly detection, which extracts potential anomalies from the raw KPI.
slide-16
SLIDE 16

n Design a semi-automatic labeling framework

16

Key Ideas

Examining KPIs back and forth to check whether similar patterns are labeled consistently Anomaly similarity search: discover similar anomalies automatically 2 Similar anomalies Template Potential anomalies

SLIDE 17

Background, Design, Evaluation, Operational Experience

SLIDE 18

Label-Less Overview

Pipeline: unlabeled KPIs → preprocessing → unsupervised anomaly detection (feature extraction, Isolation Forest, threshold selection) → candidate anomalies → anomaly similarity search (anomaly template, accelerated DTW) → operators investigate the top-k similar anomalies. If all anomalies have been labeled, output the labeled KPIs; otherwise choose another template and repeat.

SLIDE 19

Unsupervised Anomaly Detection

Steps: feature extraction → Isolation Forest → threshold selection.

Feature extraction
  – Time series prediction models serve as feature extractors.
  – Feature: the prediction error.
  – Normal points are well predicted, with small prediction errors; anomalies have large prediction errors.

[Figure: a KPI and its predicted values over four days; candidate anomalies stand out where the two curves diverge.]
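The prediction-error feature can be sketched in a few lines. This is a minimal illustration using a single moving-average predictor; the actual tool uses a set of predictors (MA, WMA, EWMA, Holt-Winters, ARIMA, …) and feeds all their errors to Isolation Forest.

```python
def moving_average_errors(series, window=3):
    """Predict each point as the mean of the previous `window` points;
    the absolute prediction error is the extracted feature."""
    errors = []
    for i, value in enumerate(series):
        if i < window:
            errors.append(0.0)  # not enough history yet
        else:
            predicted = sum(series[i - window:i]) / window
            errors.append(abs(value - predicted))
    return errors

# A sudden spike is poorly predicted, so its error dominates:
series = [1.0, 1.1, 0.9, 1.0, 8.0, 1.0, 1.1]
errors = moving_average_errors(series)
print(max(range(len(errors)), key=errors.__getitem__))  # -> 4 (the spike)
```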

SLIDE 20

Unsupervised Anomaly Detection

Isolation Forest [ICDM08]
  – Anomalies are few and different.
  – Anomaly score: average path lengths across the iTrees of the iForest.
  – Input features: prediction errors from Diff, MA, WMA, EWMA, Holt-Winters, ARIMA, …; samples whose scores exceed a threshold β are flagged as anomalies, the rest as normal.

Threshold selection
  – Our goal is to generate candidate potential anomalies: high recall with acceptable precision.

[Figure: a KPI and its output anomaly score over three days (08-03 to 08-05).]
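The recall-oriented threshold selection can be sketched as follows, assuming anomaly scores where higher means more anomalous (in the real tool these come from Isolation Forest over the prediction-error features); `candidate_rate` is a hypothetical knob, not a parameter from the paper, trading extra operator work for recall. With scikit-learn, such scores could be obtained by negating `IsolationForest().fit(X).score_samples(X)`.

```python
def select_candidates(scores, candidate_rate=0.05):
    """Recall-oriented thresholding: flag the top `candidate_rate`
    fraction of points by anomaly score as candidate anomalies.
    A generous rate keeps recall high at the cost of more candidates
    for operators to vet."""
    k = max(1, int(len(scores) * candidate_rate))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [i for i, s in enumerate(scores) if s >= threshold]

scores = [0.10, 0.20, 0.15, 0.90, 0.12, 0.85, 0.11, 0.10, 0.20, 0.30,
          0.10, 0.20, 0.15, 0.10, 0.12, 0.14, 0.11, 0.10, 0.20, 0.13]
print(select_candidates(scores, candidate_rate=0.1))  # -> [3, 5]
```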

SLIDE 21

Anomaly Similarity Search

Steps: candidate anomalies → anomaly similarity search against an anomaly template → check of the top-k similar anomalies.

DTW as similarity measure
  – None of the other distance measures consistently outperforms DTW [KDD12…].
  – DTW aligns two segments by warping the time axis.

Goals: high accuracy and low response latency.
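The classic dynamic-programming form of DTW is small enough to sketch. This is textbook DTW with an absolute-difference cost, not the paper's accelerated variant:

```python
import math

def dtw(x, y):
    """Textbook DTW distance: a cost matrix filled by dynamic
    programming, each cell extending the cheapest admissible step."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # step in x only
                                 D[i][j - 1],      # step in y only
                                 D[i - 1][j - 1])  # step in both
    return D[n][m]

# A time-shifted spike aligns perfectly under DTW, unlike under
# pointwise (Euclidean) comparison:
print(dtw([0, 0, 5, 0, 0], [0, 5, 0, 0, 0]))  # -> 0.0
```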

SLIDE 22

Anomaly Similarity Search

Accelerated DTW: we adopt the following three techniques from existing works to speed up DTW.
1. Constrained path: limit the permissible warping paths by placing local restrictions on the set of alternative steps considered.
2. Lower bound: use a cheap-to-compute lower bound of DTW to prune segments that cannot possibly be among the top-k similar anomalies.
3. Early stopping: abandon a DTW computation as soon as its partial cumulative distance already exceeds the current k-th best distance.
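Two of the three speed-ups can be sketched together: a Sakoe-Chiba-style band as the constrained path, and early abandoning against a best-so-far distance (the role the current k-th best match plays during top-k search). The lower-bound pruning (e.g., LB_Keogh) is omitted for brevity, and all names and parameters here are illustrative, not the paper's exact implementation.

```python
import math

def dtw_banded(x, y, band=2, best_so_far=math.inf):
    """DTW restricted to a band of width `band` around the diagonal
    (constrained path); abandons early once every reachable cell in a
    row already exceeds `best_so_far` (early stopping)."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        row_min = math.inf
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            row_min = min(row_min, D[i][j])
        if row_min > best_so_far:
            return math.inf  # cannot beat the current k-th best match
    return D[n][m]

# The one-step shift fits inside the band, so the distance is still 0:
print(dtw_banded([0, 0, 5, 0, 0], [0, 5, 0, 0, 0], band=2))  # -> 0.0
# A hopeless comparison is abandoned after the first row:
print(dtw_banded([10] * 5, [0] * 5, band=2, best_so_far=1.0))  # -> inf
```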

SLIDE 23

Background, Design, Evaluation, Operational Experience

SLIDE 24

Datasets and Metrics

Datasets
  • Four datasets containing 30 KPIs
  • A time span of about six months
  • 1-minute monitoring interval

Metrics
  • Unsupervised anomaly detection: recall
  • Anomaly similarity search: best F-score, AUC, and response time
SLIDE 25

Performance of Unsupervised Anomaly Detection

High recall with few false negatives.

[Chart: recall of unsupervised anomaly detection on datasets A-D.]
SLIDE 26

Performance of Anomaly Similarity Search

Comparison with other distance measures (ED, SBD): DTW is a good choice!

[Charts: best F-score and AUC of DTW, ED, and SBD on datasets A-D.]
SLIDE 27

Efficiency of Anomaly Similarity Search

  • Unsupervised anomaly detection reduces the search space.
  • Accelerated DTW reduces the DTW computational complexity.

[Chart: per-KPI response time of anomaly similarity search vs. naive search with original DTW on datasets A-D.]

Under 0.5 second per KPI: real-time response.
SLIDE 28

Background, Design, Evaluation, Operational Experience

SLIDE 29

Labeling Tool with Label-Less

Unsupervised anomaly detection
  – Candidate potential anomalies are marked in red.
  – The threshold can be tuned by operators.

SLIDE 30

Labeling Tool with Label-Less

Anomaly similarity search
  – Given an anomaly template, the tool highlights similar anomalies.

SLIDE 31

Comparison of Labeling Time

Traditional labeling: scanning back and forth; checking labeling consistency.
Label-Less: unsupervised anomaly detection; anomaly similarity search.

SLIDE 32

Comparison of Labeling Time

Experiment: eight voluntary experienced operators, split into two groups (Group 1: traditional labeling; Group 2: Label-Less).

[Chart: per-KPI labeling time in minutes for Label-Less vs. traditional labeling on datasets A-D.]

Label-Less reduces labeling time by more than 90%!

SLIDE 33

Conclusion

  • Labeling overhead has become the main hurdle to researching effective and practical KPI anomaly detection.
  • The semi-automatic labeling tool Label-Less can greatly reduce operators' labeling overhead.
  • Label-Less is an important first step toward an ImageNet-like large-scale KPI anomaly dataset with high-quality ground truth.

SLIDE 34

Thank you! Q&A
znw17@mails.tsinghua.edu.cn
INFOCOM 2019