Dynamic Adaptation of Temporal Event Correlation Rules Rean - - PowerPoint PPT Presentation

SLIDE 1

Dynamic Adaptation of Temporal Event Correlation Rules

Rean Griffith‡, Gail Kaiser‡, Joseph Hellerstein*, Yixin Diao*
Presented by Rean Griffith (rg2023@cs.columbia.edu)

‡ Programming Systems Lab (PSL), Columbia University
* IBM Thomas J. Watson Research Center

SLIDE 2

Overview

• Introduction
• Problem
• Solution
• System Architecture
• How it works – Feed-forward control
• Experiments
• Results I, II, III
• Conclusions & Future work

SLIDE 3

Introduction

• Temporal event correlation is essential to realizing self-managing distributed systems.

• For example, multiple event streams from multiple event sources can be correlated to detect:
  • system health/liveness
  • processing delays in single- and multi-machine systems
  • denial-of-service attacks
  • anomalous application or machine behavior

SLIDE 4

Problem

• The time bounds that guide event-stream analysis are usually fixed, based on “guesstimates” that ignore dynamic changes in the operating environment.

• Fixed time bounds may result in false alarms that distract administrators from responding to real problems.

• Client-side timestamps are problematic, even with clock synchronization.
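To make the false-alarm problem concrete, here is a minimal Python sketch of a fixed time-bound rule. `TimerRule` and `check_pairs` are hypothetical names, not part of the Event Distiller, and the arrival times are made-up illustrative values. The source emits each end event 90 ms after its start event, well inside a 100 ms bound, but one end event spends an extra 30 ms in transit, so the rule fires even though the source behaved correctly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimerRule:
    """Hypothetical fixed time-bound rule: the end event must arrive
    within bound_ms of the matching start event."""
    bound_ms: float

    def violated(self, start_ms: float, end_ms: float) -> bool:
        # Both times are arrival times observed by the correlation engine.
        return (end_ms - start_ms) > self.bound_ms


def check_pairs(rule: TimerRule, pairs: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of (start_ms, end_ms) pairs that raise an alarm."""
    return [i for i, (start, end) in enumerate(pairs) if rule.violated(start, end)]


if __name__ == "__main__":
    rule = TimerRule(bound_ms=100.0)
    # Illustrative arrival times: each end event leaves the source 90 ms after its
    # start event; the third one spends an extra 30 ms in transit.
    pairs = [(0.0, 90.0), (1000.0, 1090.0), (2000.0, 2120.0)]
    print(check_pairs(rule, pairs))  # [2]: a false alarm caused by transit delay
```

A fixed bound cannot tell a slow source from a slow path, which is the gap the “fuzz” on the next slide is meant to close.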

SLIDE 5

Solution

• Use time bounds as the basis for temporal rules, but introduce an element of “fuzz” based on detected changes in the operating environment.

• To detect these changes, introduce Calibration Event Generators, which generate sequences of events (calibration frames) at a known resolution.

• Use the differences in the arrival times of calibration events to determine how much “fuzz” to apply.

• Only timestamps taken at the receiver count.
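A minimal sketch of the receiver side, assuming each calibration event carries only a frame identifier and the receiver stamps every arrival with its own monotonic clock, so generator-side timestamps never enter the computation. `CalibrationReceiver`, `on_event` and `arrival_gaps_ms` are hypothetical names; the clock is injectable so the example can run with fixed readings.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List


def _monotonic_ms() -> float:
    """Receiver-local clock in milliseconds."""
    return time.monotonic() * 1000.0


class CalibrationReceiver:
    """Hypothetical receiver-side recorder for calibration frames.

    Arrival times come only from the receiver's clock; any timestamp the
    generator might attach is ignored.
    """

    def __init__(self, clock_ms: Callable[[], float] = _monotonic_ms):
        self._clock_ms = clock_ms
        self._arrivals: Dict[int, List[float]] = defaultdict(list)

    def on_event(self, frame_id: int) -> None:
        """Record the receiver-side arrival time of one calibration event."""
        self._arrivals[frame_id].append(self._clock_ms())

    def arrival_gaps_ms(self, frame_id: int) -> List[float]:
        """Differences between consecutive arrival times within one frame."""
        t = self._arrivals.get(frame_id, [])
        return [b - a for a, b in zip(t, t[1:])]


if __name__ == "__main__":
    # Fixed fake clock readings for one three-event frame.
    readings = iter([0.0, 100.0, 230.0])
    rx = CalibrationReceiver(clock_ms=lambda: next(readings))
    for _ in range(3):
        rx.on_event(frame_id=7)
    print(rx.arrival_gaps_ms(7))  # [100.0, 130.0]
```

Comparing these gaps against the generator's known resolution is what turns them into observations of the operating environment.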

SLIDE 6

System Architecture

SLIDE 7

How it works – Feed-forward Control

• Treat the difference in the arrival times of calibration events within a calibration frame, less the generator resolution, as an observation of “propagation skew”.

• Record the last N observations of propagation skew.

• Sort these observations and use the median as the “fuzz” to add to timer rules.

• Using the median prevents overreaction to transient spikes.
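The feed-forward loop above can be sketched in Python as follows. The class and method names are hypothetical, and two details are assumptions the slides leave open: each consecutive pair of events in a frame yields one skew observation, and the initial fuzz factor is returned until N observations have been collected. `collections.deque(maxlen=N)` keeps only the last N observations, and `statistics.median` performs the sort-and-take-the-middle step.

```python
from collections import deque
from statistics import median
from typing import Deque, Iterable


class FuzzEstimator:
    """Sketch of median-based feed-forward fuzz estimation (hypothetical names).

    Assumptions: one skew observation per consecutive event pair in a frame,
    and the initial fuzz is used until the window holds N observations.
    """

    def __init__(self, window: int, resolution_ms: float, initial_fuzz_ms: float):
        if window < 1:
            raise ValueError("window must be >= 1")
        self.resolution_ms = resolution_ms
        self.initial_fuzz_ms = initial_fuzz_ms
        # deque(maxlen=N) silently drops the oldest observation when full.
        self._skews: Deque[float] = deque(maxlen=window)

    def observe_frame(self, arrivals_ms: Iterable[float]) -> None:
        """Record skews from one frame's receiver-side arrival times."""
        t = list(arrivals_ms)
        for a, b in zip(t, t[1:]):
            # Propagation skew: arrival gap less the generator resolution.
            self._skews.append((b - a) - self.resolution_ms)

    def fuzz_ms(self) -> float:
        """Current fuzz: initial value until N observations exist, then the median."""
        if len(self._skews) < self._skews.maxlen:
            return self.initial_fuzz_ms
        return median(self._skews)

    def bound_ms(self, base_bound_ms: float) -> float:
        """A timer rule's fixed bound widened by the current fuzz."""
        return base_bound_ms + self.fuzz_ms()


if __name__ == "__main__":
    est = FuzzEstimator(window=5, resolution_ms=100.0, initial_fuzz_ms=500.0)
    print(est.fuzz_ms())  # 500.0, window not yet full
    est.observe_frame([0.0, 102.0, 205.0, 307.0])  # skews 2, 3, 2
    est.observe_frame([1000.0, 1500.0, 1603.0])    # skews 400 (spike), 3
    print(est.fuzz_ms())         # 3.0
    print(est.bound_ms(1000.0))  # 1003.0
```

In the example, the single 400 ms spike leaves the median at 3 ms, whereas the mean of the same window would be 82 ms. The sketch also shows why a large N keeps the initial fuzz in effect for longer: nothing replaces it until the window fills.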

SLIDE 8

Experiments

All machines: 3GHz, 1GB RAM; one runs Windows XP SP2, the others Linux 2.6.

• Configuration A (3-machine): Event Distiller on Windows XP SP2; Siena Event Router on Linux 2.6; Calibration Event Generator on Linux 2.6
• Configuration B (3-machine): Siena Event Router, Calibration Event Generator and Event Distiller, each on its own Linux 2.6 machine
• Configuration C (2-machine): Siena Event Router + Event Distiller on Windows XP SP2; Calibration Event Generator on Linux 2.6
• Configuration D (2-machine): Siena Event Router + Event Distiller on Linux 2.6; Calibration Event Generator on Linux 2.6

SLIDE 9

Results I – Propagation Skews

[Figure: propagation skews, panels grouped by 3-machine vs. 2-machine and Windows + Linux vs. all Linux]

SLIDE 10

Results II - Autocorrelations

[Figure: autocorrelations for configurations A, B, C and D, panels grouped by 3-machine vs. 2-machine and Windows + Linux vs. all Linux]

SLIDE 11

Results III – Sensitivity to N (Run 3, Configuration C)

• The most accurate N (observation window size) depends on both the actual conditions and the initial fuzz factor setting.
• The generator was set to produce 241/445 “real” failures.
• With a large N, the initial fuzz factor is used for longer, so fewer “real” failures are reported (i.e., real problems are missed).

[Plot: results for N from 50 to 450, initial fuzz factor = 500 ms]
Initial fuzz factor setting = 500 ms: 80%+ accuracy with smaller N.

[Plot: results for N from 50 to 450, initial fuzz factor = 0 ms]
Initial fuzz factor setting = 0 ms: 85%-90% accuracy with smaller N.

SLIDE 12

Conclusions

• There is more to our notion of “propagation skew” than network delays: resource contention at the receiver on certain platforms, as seen in configuration C (2-machine Linux + Windows setup), also affects our observations.

• Near-optimal settings are achieved automatically by managing the trade-off between larger observation windows and the ability to respond quickly to changes in the environment.

• Feed-forward control is useful in building self-regulating systems that rely on temporal event correlation.

SLIDE 13

Comments, Questions, Queries

Thank you for your time and attention.
Contact: Rean Griffith (rg2023@cs.columbia.edu)

SLIDE 14

Event Package

• Events are represented as Siena notifications of ~80 bytes each.