SLIDE 1

Incorporating Feedback into Tree-based Anomaly Detection

Shubhomoy Das, Weng-Keen Wong, Alan Fern, Thomas G. Dietterich and Md Amran Siddiqui
School of EECS

SLIDE 2

Anomaly Detection

  • Goal: Identify rare or strange objects


SLIDE 8

Typical Investigation

[Figure: Anomaly Detector $f(x)$ → Ranking . . .]

  • Major problem: statistical anomalies don't necessarily correspond to semantic anomalies
  • Need to deal with a large number of false positives

SLIDE 18

Investigation with Feedback

[Figure: Anomaly Detector $f(x)$ → Ranking . . . ; investigated instances are labeled Nominal or Anomaly]

  • Ranking is adaptive
  • Reduces false positives
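
The feedback loop on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the names `investigate_with_feedback`, `scores`, and `oracle` are hypothetical, and the detector's update rule is abstracted behind `scores(labels)`, which is assumed to recompute every instance's score from the labels gathered so far. One instance is shown to the analyst per iteration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: `scores(labels)` returns one anomaly score per
# instance, recomputed from all feedback collected so far.
Scorer = Callable[[Dict[int, bool]], Sequence[float]]


def investigate_with_feedback(scores: Scorer,
                              oracle: Callable[[int], bool],
                              budget: int) -> List[int]:
    """Show the analyst one instance per iteration and re-rank after each label."""
    labels: Dict[int, bool] = {}  # instance index -> True if the analyst says "Anomaly"
    queried: List[int] = []
    for _ in range(budget):
        s = scores(labels)
        unlabeled = [i for i in range(len(s)) if i not in labels]
        if not unlabeled:
            break
        top = max(unlabeled, key=lambda i: s[i])  # highest-ranked unlabeled instance
        labels[top] = oracle(top)                 # "Anomaly" (True) or "Nominal" (False)
        queried.append(top)
    return queried


# A scorer that ignores feedback reproduces the non-adaptive investigation.
static = lambda labels: [0.9, 0.8, 0.7, 0.1]
print(investigate_with_feedback(static, oracle=lambda i: i == 2, budget=3))  # [0, 1, 2]
```

With a scorer that ignores `labels`, the ranking is fixed, as in the typical investigation; an adaptive scorer can demote instances that resemble ones already labeled Nominal, so true anomalies surface sooner.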

SLIDE 19

Tree-based Anomaly Detection

  • Isolation Forest
  • HS-Trees
  • RS-Forest
  • RPAD
  • Random Projection Forest
  • …


SLIDE 22

Isolation Forest

[Figure: an isolation tree; each split has a < branch and a ≥ branch]

  • Random feature and random split point
  • Deeper leaf indicates nominal
  • Shallow leaf indicates anomaly
  • Typically 100 trees in practice
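
The isolation-tree idea above (random feature, random split point, shallow leaf means anomaly) can be sketched in a few lines. This is a simplified illustration, not a full Isolation Forest: it omits the subsampling and the average-path-length correction $c(n)$ of the original method, and the helper names (`Node`, `build_tree`, `path_length`, `mean_depth`) are hypothetical.

```python
import random
from typing import List, Optional


class Node:
    """Internal node if `feature` is set, otherwise a leaf."""
    def __init__(self, feature: Optional[int] = None, split: float = 0.0,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.feature, self.split, self.left, self.right = feature, split, left, right


def build_tree(data: List[List[float]], depth: int, max_depth: int,
               rng: random.Random) -> Node:
    if depth >= max_depth or len(data) <= 1:
        return Node()
    f = rng.randrange(len(data[0]))                 # random feature
    lo, hi = min(r[f] for r in data), max(r[f] for r in data)
    if lo == hi:
        return Node()
    split = rng.uniform(lo, hi)                     # random split point
    left = [r for r in data if r[f] < split]        # "<" branch
    right = [r for r in data if r[f] >= split]      # ">=" branch
    return Node(f, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(node: Node, x: List[float]) -> int:
    depth = 0
    while node.feature is not None:
        node = node.left if x[node.feature] < node.split else node.right
        depth += 1
    return depth


def mean_depth(trees: List[Node], x: List[float]) -> float:
    # Lower mean depth = isolated sooner = more anomalous.
    return sum(path_length(t, x) for t in trees) / len(trees)


rng = random.Random(0)
data = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)] + [[10.0, 10.0]]
trees = [build_tree(data, 0, 8, rng) for _ in range(100)]
print(mean_depth(trees, [10.0, 10.0]), mean_depth(trees, [0.0, 0.0]))
```

The isolated point at (10, 10) is usually separated by the first split, so its mean depth is far smaller than that of a point in the middle of the cluster.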

SLIDE 23

Weighted Representation of Trees

[Figure: an instance $x$ in a tree with < and ≥ branches]

$z(x) = [-1, 0, 0, -1, 0, 0, 0, -1, -1, \dots]^T$ (extremely sparse)

  • Weights for isolation forest:

$w = [1, 1, 1, 1, 1, 1, 1, 1, 1, \dots]^T$

  • A different set of weights results in other tree-based detectors

$\mathrm{score}(x) = w^T \cdot z(x)$
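
A minimal sketch of $\mathrm{score}(x) = w^T \cdot z(x)$, assuming (our reading of the slide, not a confirmed detail) that $z(x)$ has $-1$ at every node on $x$'s root-to-leaf path in each tree and 0 elsewhere. With all-ones weights the score is then the negative total path length (up to a constant per tree), so shallower leaves score higher. The flat node encoding and the names `z_vector` and `score` are hypothetical, and node ids are assumed to run from 0 to the number of nodes minus 1.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical flat encoding of a forest: global node id -> (feature, split,
# left_id, right_id) for internal nodes, or None for leaves.
Nodes = Dict[int, Optional[Tuple[int, float, int, int]]]


def z_vector(x: Sequence[float], roots: List[int], nodes: Nodes) -> List[float]:
    """-1 at every node on x's root-to-leaf path in each tree, 0 elsewhere."""
    z = [0.0] * len(nodes)
    for node_id in roots:
        while True:
            z[node_id] = -1.0
            node = nodes[node_id]
            if node is None:          # reached a leaf
                break
            feature, split, left, right = node
            node_id = left if x[feature] < split else right
    return z


def score(w: Sequence[float], z: Sequence[float]) -> float:
    return sum(wi * zi for wi, zi in zip(w, z))  # w^T z(x)


# One tree: root 0 splits feature 0 at 0.5; node 2 splits again at 0.8.
nodes: Nodes = {0: (0, 0.5, 1, 2), 1: None, 2: (0, 0.8, 3, 4), 3: None, 4: None}
w_if = [1.0] * len(nodes)                            # isolation-forest weights
print(score(w_if, z_vector([0.2], [0], nodes)))      # -2.0 (shallow leaf)
print(score(w_if, z_vector([0.9], [0], nodes)))      # -3.0 (deeper leaf)
```

Changing `w_if` to any other weight vector reuses the same trees and the same sparse $z(x)$, which is what lets feedback reshape the detector without rebuilding the forest.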

SLIDE 32

Active Anomaly Discovery

[Figure: weight vectors $w^{(t)}, w^{(t+1)}, w^{(t+2)}, \dots$ and thresholds $q_\tau^{(t)}, q_\tau^{(t+1)}, q_\tau^{(t+2)}$, updated as queried instances are labeled Nominal or Anomaly]
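
One way to picture the update from $w^{(t)}$ to $w^{(t+1)}$ is the simplified sketch below. It is an assumption-laden illustration, not the authors' optimization: labeled anomalies scoring below the threshold $q_\tau$ and labeled nominals scoring above it incur a hinge penalty, an L2 term pulls $w$ toward prior weights (for example the all-ones isolation-forest weights), and plain subgradient descent is used with $q_\tau$ recomputed but held fixed within each step. The names and the values of `tau`, `lam`, `lr`, and `steps` are hypothetical and purely illustrative.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], z: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, z))


def quantile_threshold(scores: Sequence[float], tau: float) -> float:
    """Score found a fraction tau of the way down the ranking (assumed definition of q_tau)."""
    ranked = sorted(scores, reverse=True)
    k = min(len(ranked) - 1, int(tau * len(ranked)))
    return ranked[k]


def update_weights(w: Sequence[float], w_prior: Sequence[float],
                   Z: Sequence[Sequence[float]], labeled: List[Tuple[int, bool]],
                   tau: float = 0.03, lam: float = 0.5, lr: float = 0.1,
                   steps: int = 100) -> List[float]:
    """Subgradient descent on hinge losses around q_tau plus lam * ||w - w_prior||^2.

    labeled: (row of Z, is_anomaly) pairs collected from the analyst.
    """
    w = list(w)
    for _ in range(steps):
        q = quantile_threshold([dot(w, z) for z in Z], tau)  # held fixed within the step
        grad = [2 * lam * (wi - pi) for wi, pi in zip(w, w_prior)]
        for i, is_anomaly in labeled:
            s = dot(w, Z[i])
            if is_anomaly and s < q:          # anomaly ranked below the threshold
                grad = [g - zj for g, zj in zip(grad, Z[i])]
            elif not is_anomaly and s > q:    # nominal ranked above the threshold
                grad = [g + zj for g, zj in zip(grad, Z[i])]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Toy data: instance 0 is top-ranked under uniform weights but labeled Nominal.
Z = [[-1.0, 0.0], [0.0, -2.0], [-2.0, -2.0]]
w = update_weights([1.0, 1.0], [1.0, 1.0], Z, labeled=[(0, False)], tau=0.4)
print(w)  # w[0] rises toward 2.0, lowering instance 0's score; w[1] stays 1.0
```

`tau=0.4` is used only because the toy set has three instances; the regularizer keeps $w$ close to its prior, so a single label nudges the ranking rather than overturning it.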

SLIDE 33

Result

[Figure: synthetic dataset with the true anomalies marked]

  • Baseline discovers 12 anomalies in 35 iterations
  • AAD discovers 23 anomalies in 35 iterations

SLIDE 38

Result

[Figure panels: after 0, 10, 20, 25, and 35 rounds of feedback]

SLIDE 43

A closer look at the data with t-SNE

[Figure: t-SNE projections (axes x, y) in four panels: Abalone Baseline, Abalone IF-AAD, ANN Thyroid 1v3 Baseline, ANN Thyroid 1v3 IF-AAD; legend: False Positive, False Negative, True Positive, True Negative]

  • Effect of feedback
  • Increase focus where anomalies have been discovered previously
  • Remove focus from unpromising regions
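
The legend of these plots can be reproduced with a small helper, sketched below under one stated assumption: an instance counts as a predicted positive if it was shown to the analyst within the query budget. The helper name `outcome_labels` is hypothetical; the resulting labels could then color a 2-D embedding, for example one produced by scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(X)`.

```python
from collections import Counter
from typing import Iterable, List, Set


def outcome_labels(n: int, queried: Iterable[int], anomalies: Set[int]) -> List[str]:
    """Treat 'queried within the budget' as a positive prediction (an assumption)."""
    shown = set(queried)
    labels = []
    for i in range(n):
        if i in shown:
            labels.append("True Positive" if i in anomalies else "False Positive")
        else:
            labels.append("False Negative" if i in anomalies else "True Negative")
    return labels


cats = outcome_labels(5, queried=[0, 3], anomalies={0, 1})
print(cats)
print(Counter(cats))
```

Comparing the Baseline and IF-AAD panels then amounts to comparing these per-instance categories for the two query sequences.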

SLIDE 44

[Figure: # anomalies seen (y-axis) vs. iter (x-axis) for IForest Baseline, IF-AAD, and LODA-AAD; panels: Abalone, Covtype, Mammography, Cardiotocography, KDDCup99, ANN Thyroid 1v3]
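
The "# anomalies seen" curve plotted against iterations is a running count of true anomalies among the instances queried so far. A minimal sketch (the name `anomalies_seen` is hypothetical):

```python
from itertools import accumulate
from typing import List, Sequence


def anomalies_seen(query_is_anomaly: Sequence[bool]) -> List[int]:
    """Entry k-1 = number of true anomalies among the first k queries."""
    return list(accumulate(int(y) for y in query_is_anomaly))


curve = anomalies_seen([False, True, True, False, True])
print(curve)  # [0, 1, 2, 2, 3]
```

Read this way, the synthetic-dataset result corresponds to a baseline curve that reaches 12 at iteration 35 and an AAD curve that reaches 23.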

SLIDE 47

Conclusion & Future Work

  • Human feedback is essential and improves the unsupervised learner ☺
  • Extend to other types of anomaly detectors
  • Explanation-based feedback

SLIDE 48

Questions?


SLIDE 49

Extra Slides


SLIDE 50

Results (adjusting tree weights instead of node-weights)

[Figure: # anomalies seen (y-axis) vs. iter (x-axis) for IForest Baseline, IF-AAD, IF-AAD-Tree, and Oracle; panels: Abalone, Covtype, Mammography, Cardiotocography, KDDCup99, ANN Thyroid 1v3]
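
The slide title suggests that the tree-weight variant learns one weight per tree instead of one per node. Assuming that reading, a tree-weight vector $v$ is the special case of the node-weight score in which every node of tree $k$ shares the weight $v_k$. The sketch below checks that equivalence; the names `expand_tree_weights` and `tree_weighted_score` are hypothetical.

```python
from typing import List, Sequence


def expand_tree_weights(v: Sequence[float], node_tree: Sequence[int]) -> List[float]:
    """Every node inherits the weight of the tree it belongs to."""
    return [v[k] for k in node_tree]


def tree_weighted_score(v: Sequence[float], z: Sequence[float],
                        node_tree: Sequence[int]) -> float:
    return sum(v[node_tree[i]] * zi for i, zi in enumerate(z))


node_tree = [0, 0, 0, 1, 1, 1]          # nodes 0-2 in tree 0, nodes 3-5 in tree 1
z = [-1.0, -1.0, 0.0, -1.0, 0.0, -1.0]  # path indicators for one instance
v = [1.0, 2.0]                          # one weight per tree
w = expand_tree_weights(v, node_tree)
print(tree_weighted_score(v, z, node_tree))          # -6.0
print(sum(wi * zi for wi, zi in zip(w, z)))          # -6.0, same score via node weights
```

Tying weights per tree shrinks the number of parameters from the number of nodes to the number of trees, at the cost of a coarser way to redirect the detector's focus.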

SLIDE 51

Timing Plots

[Figure: time (secs) vs. number of queries for IF-AAD; panels: Covtype, Mammography, Shuttle]