Opprentice: Towards Practical and Automatic Anomaly Detection


slide-1
SLIDE 1

2015/12/3 Dapeng Liu (liudp10@mails.tsinghua.edu.cn)

Opprentice: Towards Practical and Automatic Anomaly Detection Through Machine Learning

Dapeng Liu, Youjian Zhao, Haowen Xu, Yongqian Sun, Dan Pei, Jiao Luo, Xiaowei Jing, Mei Feng

slide-2
SLIDE 2

KPIs and Anomaly Detection

KPIs (Key Performance Indicators): A set of performance measures that evaluate the service quality

Page views (PV) of Baidu

slide-3
SLIDE 3

KPIs and Anomaly Detection

KPIs (Key Performance Indicators): A set of performance measures that evaluate the service quality

Page views (PV) of Baidu

KPI anomalous (unexpected) behaviors → Potential failures, bugs, attacks...

slide-4
SLIDE 4

KPIs and Anomaly Detection

KPIs (Key Performance Indicators): A set of performance measures that evaluate the service quality

Page views (PV) of Baidu

KPI anomalous (unexpected) behaviors → Potential failures, bugs, attacks...

Anomaly detection matters: Find anomalous behaviors of the KPI curve → Diagnose and fix them → Avoid further impact and revenue loss

slide-5
SLIDE 5

KPIs and Anomaly Detection

KPIs (Key Performance Indicators): A set of performance measures that evaluate the service quality

Page views (PV) of Baidu

KPI anomalous (unexpected) behaviors → Potential failures, bugs, attacks, etc.

Anomaly detection matters: Find anomalous behaviors of the KPI curve → Diagnose and fix them → Avoid further impact and revenue loss

IMC ’15: Dissecting UbuntuOne: Autopsy of a Global-scale Personal Cloud Back-end
IMC ’15: The Dark Menace: Characterizing Network-based Attacks in the Cloud

slide-6
SLIDE 6

How to Build the Anomaly Detection System

Domain experts (Operators)

  • Responsible for the KPIs
  • Knowing the KPI behaviors well

Developers

  • Building the detection system
  • Knowing several anomaly detectors

Simple threshold, Historical Average, Wavelet, Holt-Winters, …

slide-7
SLIDE 7

How to Build the Anomaly Detection System

Operators Developers

Describe anomalies

In practice, it is more complex

slide-8
SLIDE 8

How to Build the Anomaly Detection System

Operators Developers

Describe anomalies

Detectors: Wavelet, Moving Average, Holt-Winters, …

Select detectors & tune parameters → Detection system

In practice, it is more complex

slide-9
SLIDE 9

How to Build the Anomaly Detection System

Operators Developers

Describe anomalies

Detectors: Wavelet, Moving Average, Holt-Winters, …

Select detectors & tune parameters → Detection system → Anomalies

In practice, it is more complex

slide-11
SLIDE 11

How to Build the Anomaly Detection System

Operators Developers

Describe anomalies

Detectors: Wavelet, Moving Average, Holt-Winters, …

Select detectors & tune parameters → Detection system → Anomalies

Challenges

1. Operators have difficulty precisely and formally defining anomalies in advance
2. Selecting and combining suitable detectors is tricky
3. Detectors are not intuitive to tune

slide-13
SLIDE 13

(Operators’ apprentice)

slide-14
SLIDE 14

A More Natural Way

OP

Opprentice

PV

slide-15
SLIDE 15

Design Goal

Operators: label anomalies and give an accuracy preference (precision & recall)

Opprentice: provides anomaly detection

slide-16
SLIDE 16

Design Goal

Operators: label anomalies and give an accuracy preference (precision & recall)

Opprentice: provides anomaly detection

vs.

slide-17
SLIDE 17

Outline

  • Background and Motivation
  • Key Ideas
  • Results
  • Conclusion

slide-18
SLIDE 18

Key Ideas

Detector model:

slide-19
SLIDE 19

Key Ideas

Detector model (for example, Historical Average):

$$\mathrm{severity} = \frac{|\mathrm{value} - \mu|}{\sigma}$$

slide-20
SLIDE 20

Key Ideas

Detector model (for example, Historical Average):

$$\mathrm{severity} = \frac{|\mathrm{value} - \mu|}{\sigma}$$

[Figure labels: sThld, 1]

slide-21
SLIDE 21

Anomaly feature

Key Ideas

Detector model (for example, Historical Average):

$$\mathrm{severity} = \frac{|\mathrm{value} - \mu|}{\sigma}$$

[Figure labels: sThld, 1]
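
The severity formula on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: μ and σ are taken to be the mean and standard deviation of a window of historical values, sThld is read as a severity threshold, and the function names `historical_average_severity` and `is_anomaly` are hypothetical, not from the talk.

```python
from statistics import mean, pstdev


def historical_average_severity(history, value):
    """Return |value - mu| / sigma, with mu and sigma computed from `history`.

    Assumption: mu and sigma are the mean and (population) standard
    deviation of the historical values supplied by the caller.
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A flat history gives no scale; treat any deviation as unbounded.
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


def is_anomaly(severity, s_thld):
    """Hypothetical thresholding step: flag the point if severity >= sThld."""
    return severity >= s_thld


history = [100, 102, 98, 101, 99]
normal = historical_average_severity(history, 100)
spike = historical_average_severity(history, 110)
print(normal, is_anomaly(normal, s_thld=3.0))
print(spike, is_anomaly(spike, s_thld=3.0))
```

With a fixed sThld the detector yields a binary decision; used without a threshold, the raw severity can serve as the anomaly feature named on the slide.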

slide-22
SLIDE 22

Key Ideas

Detector Configurations (detectors with different parameters)

  • Time series decomposition
  • HW 0.2 0.2 0.2
  • HW 0.5 0.7 0.7
  • Differencing-last day
  • Differencing-last season
  • WMA-WIN30
  • Differencing-last slot
  • Historical average-4 season
  • EWMA-0.7

KPI data → Extract features
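
As a rough sketch of the idea on this slide, each detector configuration can be run over the KPI series so that every data point ends up with one severity per configuration, that is, a feature vector. The three detectors below (differencing, EWMA, and a plain moving average standing in for WMA) are simplified stand-ins, and the names `CONFIGURATIONS` and `extract_features` are hypothetical.

```python
def differencing(series, lag):
    """Severity = |x[t] - x[t - lag]|; 0.0 until a lagged point exists."""
    return [abs(series[t] - series[t - lag]) if t >= lag else 0.0
            for t in range(len(series))]


def ewma(series, alpha):
    """Severity = |x[t] - prediction|, where prediction is the EWMA of past points."""
    severities = []
    prediction = series[0]
    for x in series:
        severities.append(abs(x - prediction))
        prediction = alpha * x + (1 - alpha) * prediction
    return severities


def moving_average(series, window):
    """Severity = |x[t] - mean of up to `window` previous points|."""
    severities = []
    for t, x in enumerate(series):
        past = series[max(0, t - window):t]
        prediction = sum(past) / len(past) if past else x
        severities.append(abs(x - prediction))
    return severities


# One entry per (detector, parameters) pair; every entry becomes one feature.
CONFIGURATIONS = [
    ("differencing-lag1", differencing, {"lag": 1}),
    ("ewma-0.7", ewma, {"alpha": 0.7}),
    ("ma-win3", moving_average, {"window": 3}),
]


def extract_features(series):
    """Return one feature vector (list of severities) per data point."""
    columns = [detector(series, **params)
               for _, detector, params in CONFIGURATIONS]
    return [list(row) for row in zip(*columns)]


kpi = [10, 10, 10, 30, 10]
for point, vector in zip(kpi, extract_features(kpi)):
    print(point, vector)
```

Adding a configuration only appends a column, which is why many detectors with many parameter settings can be included without deciding up front which ones are useful.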

slide-24
SLIDE 24

Key Ideas

Classification in the feature space (Supervised machine learning)

slide-25
SLIDE 25

Key Ideas

Classification in the feature space (Supervised machine learning)

Operators

slide-26
SLIDE 26

Address Challenges of Designing Opprentice

  • Labeling overhead
    – Solution: an effective labeling tool

slide-27
SLIDE 27

Address Challenges of Designing Opprentice

  • Labeling overhead
    – Solution: an effective labeling tool
  • Incomplete anomaly types in the historical data
    – Solution: incremental re-training with new data

slide-28
SLIDE 28

Address Challenges of Designing Opprentice

  • Labeling overhead
    – Solution: an effective labeling tool
  • Incomplete anomaly types in the historical data
    – Solution: incremental re-training with new data
  • Class imbalance problem
    – Solution: adjusting classification threshold (cThld) based on the preference

slide-29
SLIDE 29

Address Challenges of Designing Opprentice

  • Labeling overhead
    – Solution: an effective labeling tool
  • Incomplete anomaly types in the historical data
    – Solution: incremental re-training with new data
  • Class imbalance problem
    – Solution: adjusting classification threshold (cThld) based on the preference
  • Irrelevant and redundant features
    – Solution: random forests
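
To make the cThld idea concrete, here is a minimal sketch of picking a classification threshold from an operator's precision and recall preference. It assumes the classifier emits an anomaly score in [0, 1] per point (for a random forest this could be the fraction of trees voting "anomaly"). The selection rule used here, preferring thresholds that satisfy the preference and breaking ties by F-score, is an illustrative assumption and not necessarily the criterion used in the paper; `choose_c_thld` and its parameters are hypothetical names.

```python
def precision_recall(labels, scores, c_thld):
    """Precision and recall when points with score >= c_thld are flagged."""
    predicted = [s >= c_thld for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def choose_c_thld(labels, scores, min_precision, min_recall):
    """Pick a threshold meeting the preference if possible, else best F-score."""
    if not scores:
        raise ValueError("scores must not be empty")
    best = None  # (meets_preference, f_score, c_thld)
    for c_thld in sorted(set(scores)):
        precision, recall = precision_recall(labels, scores, c_thld)
        meets = precision >= min_precision and recall >= min_recall
        candidate = (meets, f_score(precision, recall), c_thld)
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    return best[2]


# Imbalanced toy data: 3 anomalies (label 1) among 10 points.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
c_thld = choose_c_thld(labels, scores, min_precision=0.6, min_recall=0.6)
print(c_thld, precision_recall(labels, scores, c_thld))
```

Because anomalies are rare, a default threshold of 0.5 would miss the anomaly scored 0.4 in this toy data, which is the kind of effect that motivates tuning cThld to the stated preference.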

slide-30
SLIDE 30

Design Overview

Training a classifier

See the paper for full details

slide-31
SLIDE 31

Design Overview

Training a classifier

Detecting anomalies

See the paper for full details

slide-32
SLIDE 32

Outline

  • Background and Motivation
  • Key Ideas
  • Results
  • Conclusion

slide-33
SLIDE 33

Evaluation

slide-34
SLIDE 34

Evaluation

slide-35
SLIDE 35

Random forests vs. Basic Detectors and Static Combinations

[Figure legend: basic detectors; random forest]

slide-36
SLIDE 36

Evaluation

slide-37
SLIDE 37

Random Forests vs. Other Learning Algorithms

(The order of features is based on mutual information)

slide-38
SLIDE 38

Evaluation

See the paper for full details

slide-39
SLIDE 39

Opprentice as a whole

Opprentice achieves 40%, 23%, and 110% more points inside the preference regions than 5-fold cross-validation

[Figure legend: Oracle mode (best case); Opprentice; 5-Fold]

slide-41
SLIDE 41

Conclusion

Opprentice is an automatic and accurate machine learning framework for KPI anomaly detection

Opprentice bridges the gap in applying complex detectors in practice

The idea of Opprentice, i.e., using machine learning to model domain knowledge, could be a very promising way to automate other service management tasks

[Figure labels: Opprentice; defining anomalies; selecting detectors; tuning detectors]

slide-42
SLIDE 42

Thank you

liudp10@mails.tsinghua.edu.cn

On the job market 