
Workload-sensitive Timing Behavior Anomaly Detection in Large Software Systems


Diploma Thesis André van Hoorn

Software Engineering Division, Faculty II - Department of Computer Science

November 8, 2007

André van Hoorn, Diploma Thesis, November 8, 2007

Structure of Talk

1 Motivation
2 Foundations
3 Hypothesis & Goals
4 Results
5 Conclusions
6 Related Work


Motivation

Availability of enterprise information systems (e.g. banking and online shopping systems) is a critical QoS requirement
Anomaly detection is a means for failure detection and diagnosis to improve availability
Existing anomaly detection approaches based on timing behavior do not explicitly consider varying workload

Foundations

Performance

1 Time Behavior

Response time: time interval elapsed between issued request and respective response
Execution time
Throughput: rate at which a system (resource) handles tasks
(Client-/server-side) think time, ...

2 Resource Utilization

Figure: Operation timing metrics (response time vs. execution time of operations a() and b()).


Workload and Scalability

Workload: amount of work currently requested from or processed by a system
Characteristics: workload intensity, service demand characteristics

Scalability: “ability of a system to continue to meet its response time or throughput objectives as the [workload] increases” [SW01]

Figure: The capacity of a system [Jai91] (response time and throughput vs. workload; knee capacity, usable capacity, nominal capacity).


A Hierarchical Workload Model

Figure: A hierarchical workload model [MAR+00]: session layer (user level), functional layer (application level), and HTTP request layer (protocol level), spanning the range between the business level and the resource level.

Session [MAFM99] “Consecutive and related requests issued by the same user”


Anomaly Detection

Motivation: availability is an important QoS attribute

Availability [MIO87] = MTTF / (MTTF + MTTR)

Goal: improve availability by reducing repair times
Strategy: use unusual behavior as an indicator for failures
Common approach for software systems:

Build a model of “normal behavior” (based on a set of monitored parameters, e.g. timing behavior)
Monitor current behavior
Detect deviations
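The availability formula can be illustrated with a quick calculation (the MTTF/MTTR values are made up for illustration):

```python
def availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR) [MIO87]."""
    return mttf / (mttf + mttr)

# Hypothetical values: MTTF = 1000 h, MTTR = 2 h.
print(round(availability(1000.0, 2.0), 4))  # → 0.998
```

Note that shrinking MTTR (faster repair) raises availability without touching MTTF, which is exactly the lever the anomaly detection strategy targets.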


Descriptive Statistics

Statistics

Minimum, maximum
Sample mean, sample variance
p-quantile x_p: x_p = min{x | F(x) ≥ p}
1st-3rd quartiles: x_0.25, x_0.5 (median), x_0.75
Mode, skewness, ...

Other distribution characteristics: uni-/bi-/multimodal, (a)symmetric, left-/right-skewed
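The quantile definition x_p = min{x | F(x) ≥ p} applies directly to the empirical CDF of a sample; a minimal sketch:

```python
import math

def p_quantile(sample, p):
    """Empirical p-quantile: x_p = min{x | F(x) >= p} for the sample's ECDF."""
    xs = sorted(sample)
    k = max(math.ceil(p * len(xs)), 1)  # smallest k with k/n >= p
    return xs[k - 1]

data = [3, 1, 4, 1, 5, 9, 2, 6]
# 1st quartile, median, 3rd quartile:
print(p_quantile(data, 0.25), p_quantile(data, 0.5), p_quantile(data, 0.75))  # → 1 3 5
```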


Parametric Distribution Families (Examples)

Normal Distribution: 2-parameter N(µ, σ²)

Figure: Densities of N(1, 1), N(−1, 0.9²), and N(1, 0.8²).

Log-normal Distribution: 2-parameter Λ(µ, σ²), 3-parameter Λ(τ, µ, σ²)

Figure: Densities of Λ(0, 1), Λ(0.7, 0.5²), and Λ(2, 0, 1).
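A 3-parameter log-normal variate is a shifted 2-parameter one, which a short pure-Python simulation illustrates (the parameter values are the ones fitted later in the talk, Λ(τ=3.437, µ=−0.3, σ=1.155)):

```python
import math
import random

def rlognorm3(tau, mu, sigma, rng):
    """Draw from the 3-parameter log-normal Λ(τ, µ, σ²): τ + exp(µ + σZ), Z ~ N(0,1)."""
    return tau + math.exp(mu + sigma * rng.gauss(0.0, 1.0))

rng = random.Random(42)
xs = sorted(rlognorm3(3.437, -0.3, 1.155, rng) for _ in range(100_000))

# The median of Λ(τ, µ, σ²) is τ + e^µ, here ≈ 4.18; every sample exceeds τ.
print(round(xs[len(xs) // 2], 2))
```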


Hypothesis & Goals

Hypothesis

Assuming that varying workload implies varying response times:

Hypothesis: A novel workload-sensitive anomaly detection based on response times is realizable if varying workload intensity has a characteristic impact on response time distributions.


Project Goals I

1 Probabilistic Workload Driver

Develop an application-generic methodology for generating realistic user behavior (e.g. based on a probabilistic model)

2 Case Study with Response Time Analysis

Apply & evaluate the workload generation technique
Obtain workload-dependent response times from a sample application
Statistically analyze the impact of workload on response times

3 Workload-Sensitive Anomaly Detection Prototype

Compute a degree of anomaly for operation executions
Implement a workload-sensitive AD prototype

Results: Probabilistic Workload Driver

Probabilistic Workload Driver – Approach

Challenge: generate valid sessions
Constraint: realistic behavior (not “capture & replay”)
Approach:

1 Workload configuration data model, separated into Application Model, User Behavior Model, User Behavior Mix, and Workload Intensity

2 High-level design: iterative execution model, session model composition semantics

3 Implementation: Markov4JMeter (JMeter extension)


Application Model

Session layer models allowed sequences of service calls in a session. Protocol layer contains all protocol-specific (e.g. HTTP) request details.

Figure: Sample application model illustrating separation into session and protocol layer.


User Behavior Model

User behavior model corresponds to a specific application model
Markov chain models probabilistic behavior within a session
States correspond to the states of the session layer
Includes definition of (client-side) think time

Figure: User behavior models B_A,0 and B_A,1 (Markov chains over the states S0, S1, S2 with transition and exit probabilities).
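The session-simulation idea behind such a behavior model can be sketched in a few lines; the transition probabilities below are made up for illustration, not the exact values of B_A,0 or B_A,1:

```python
import random

# Hypothetical transition probabilities; each row sums to 1 and includes
# the probability of leaving the session (Exit).
P = {
    "S0": [("S1", 0.5), ("S2", 0.3), ("Exit", 0.2)],
    "S1": [("S0", 0.1), ("S2", 0.5), ("Exit", 0.4)],
    "S2": [("S0", 0.3), ("S1", 0.3), ("Exit", 0.4)],
}

def simulate_session(start, rng):
    """Walk the Markov chain from `start` until Exit; returns the visited states."""
    session, state = [], start
    while state != "Exit":
        session.append(state)
        r, acc = rng.random(), 0.0
        for nxt, p in P[state]:
            acc += p
            if r < acc:
                state = nxt
                break
        else:  # guard against floating-point round-off on the last row entry
            state = "Exit"
    return session

print(simulate_session("S0", random.Random(7)))
```

Because every state has a positive exit probability, each simulated session terminates with probability 1, mirroring a real user eventually leaving the site.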


User Behavior Mix, Workload Intensity

User Behavior Mix

Assignment of user behavior models B_A,i to an application model A with relative frequencies p_i
Formally, BMIX_A = {(B_A,0, p_0), ..., (B_A,n−1, p_n−1)}

Workload Intensity

Duration function R≥0 → N defining the number of concurrent users
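Both concepts are straightforward to sketch in code; the mix fractions and the ramp-up shape of the duration function below are assumed values, not the thesis' configuration:

```python
import random

# Hypothetical behavior mix: 70 % model B0, 30 % model B1.
behavior_mix = [("B0", 0.7), ("B1", 0.3)]

def draw_behavior(rng):
    """Select a user behavior model B_A,i according to its relative frequency p_i."""
    r, acc = rng.random(), 0.0
    for model, p in behavior_mix:
        acc += p
        if r < acc:
            return model
    return behavior_mix[-1][0]  # fallback for floating-point round-off

def workload_intensity(t):
    """Hypothetical duration function R>=0 -> N: one additional concurrent
    user every 30 s, capped at 195 active sessions."""
    return min(1 + int(t) // 30, 195)

rng = random.Random(0)
models = [draw_behavior(rng) for _ in range(10_000)]
print(models.count("B0") / len(models))  # roughly 0.7
print(workload_intensity(0), workload_intensity(90), workload_intensity(10_000))  # → 1 4 195
```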


Markov4JMeter

Implemented the workload driver as an extension for the existing workload tool Apache JMeter
Markov4JMeter [vH07] released under the GPL
Feedback: “Markov4JMeter has worked very well for us. We have used it in several scripts for the last two months. There have been no bugs. The add-in should be made a part of the JMeter distribution.” (Mark McWhinney, Portata, Inc., Mountain View, CA, Sep 9, 2007)

Results: Case Study with Response Time Analysis

Sample Application

Figure: JPetStore architecture: client (HTTP) → presentation layer (Apache Struts, struts.ActionServlet) → service layer → persistence layer (iBatis DAO framework) → DBMS (SQL).

iBatis JPetStore

Online shopping store
3-layer architecture: presentation layer, service layer, persistence layer

Deployment: application server, database server


Markov4JMeter Profile for JPetStore

Identified 29 request types, grouped into 15 services:

Home†: index†
Help: help
Sign On†: signonForm†, signon†
Sign Off†: signoff†
Register: newAccountForm, newAccount
Edit Account: editAccountForm, editAccount
Browse Category†: viewCategory†, switchProductListPage
Browse Product†: viewProduct†, switchItemListPage
View Item†: viewItem†
Search: searchProducts, switchSearchListPage
Add to Cart†: addItemToCart†
Remove Item: removeItemFromCart
Update Cart: updateCartQuantities
View Cart†: viewCart†, switchCartPage, switchMyListPage
Purchase†: checkout†, newOrderForm†, newOrderData†, newOrderConfirm†, listOrders, viewOrder, switchOrderPage

Focused on services of “typical user sessions”: 9 services / 13 request types (labeled by †)


Application Model

Markov4JMeter Profile for JPetStore (cont’d)

Session-layer states: Home, Sign On, View Category, View Product, View Item, Add to Cart, View Cart, Purchase, Sign Off. Transitions are guarded by and update the session variables signedOn and itemInCart (e.g. [!signedOn]/signedOn:=true, /itemInCart:=true, [signedOn && itemInCart]).

Protocol states of the Purchase application state: checkout, newOrderForm, newOrderData, newOrderConfirm.

Protocol states of the Sign On application state:

signonForm: req.method="GET", req.uri="/jpetstore/shop/signon.shtml", req.body=<>
signon: req.method="POST", req.uri="/jpetstore/shop/signon.shtml", req.body=<username=userId, password=password, submit="Login">

Figure: Session layer of application model and protocol states of 2 application states.


Behavior Models

Markov4JMeter Profile for JPetStore (cont’d)

Defined 2 user behavior models: Browser and Buyer

Figure: Transition graphs of the Browser and Buyer behavior models (Markov chains over the session states with transition and exit probabilities).


Test Plan

Markov4JMeter Profile for JPetStore (cont’d)

DEMO


Experiment Configuration

Distributed deployment

Application server (Apache Tomcat executing JPetStore)
Client (workload driver: JMeter extended by Markov4JMeter)
Database server (JPetStore and Tpmon databases)

Application-level monitoring with Tpmon; instrumented 19 operations
25 × 2 (db/fs) experiment runs

Increasing number of active sessions (concurrent users) across runs
Constant number of active sessions within a single run

Runs 1-13: active sessions 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 55, 65, 75; ramp-up (s): 30, 60, 60, 60, 60, 60, 60, 60, 90, 90, 90; duration (min): 20, 20, 15, 15, 15, 12, 11, 10, 9, 8, 7, 7, 7
Runs 14-25: active sessions 85, 95, 105, 115, 125, 135, 145, 155, 165, 175, 185, 195; ramp-up (s): 120, 120, 120, 120, 180, 180, 180, 180, 180, 180, 180, 200; duration (min): 7, 7, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9

Platform Workload Intensity Metric (PWI)

Quantifies workload intensity on a server node at a given time. Given a time t and a window size ω, PWI expresses the average number of active traces in the time interval [t − ω, t].
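For traces given as (start, end) time pairs, the PWI is the summed overlap of the traces with the window, divided by the window size; a minimal sketch with made-up trace times:

```python
def pwi(traces, t, omega):
    """Platform workload intensity: average number of active traces in
    [t - omega, t], where each trace is a (start, end) time pair."""
    lo = t - omega
    active_time = sum(max(0.0, min(end, t) - max(start, lo))
                      for start, end in traces)
    return active_time / omega

# Two traces overlapping the window [1.0, 2.0] for 0.5 s and 0.75 s:
print(pwi([(0.0, 1.5), (1.25, 3.0)], t=2.0, omega=1.0))  # → 1.25
```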

Figure: Active traces history and PWI over experiment time (window size 61 ms, step size 30 ms; PWI max 1.541, mean 0.3213, median 0.2787, min 0).


Response Time Analysis

1 Analyzed impact of PWI on response time statistics

Minimum, maximum
Mean, variance, and standard deviation
Mode
1st quartile, median, 3rd quartile
Skewness
Outlier ratio

2 Distribution fitting


Experiment Report

Automatic generation of

Plots for each experiment run and operation
PWI vs. response time statistics for each operation

Figure: Example report plots for persistence.sqlmapdao.AccountSqlMapDao.getAccount(...): users vs. quartiles of response times, scatter plot with local regression (N=486), and density plot (mean 3.163 ms, median 3.106 ms, approx. mode 3.057 ms, skewness 1.666, kurtosis 3.614).


Results (1/2)

Workload intensity impacts (most) response time statistics:

Maximum very sensitive
Mean more sensitive than median
Upper quartiles more sensitive than lower quartiles → increasing IQR
Minimum largely unaffected
No correlation with outlier ratio observed
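The different sensitivity of mean and median can be illustrated with made-up samples (not measured data) in which only the upper tail grows under load:

```python
# Illustrative samples: under high load the upper tail grows while the
# body of the distribution stays put.
low  = [3.0, 3.1, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.8, 4.0]
high = [3.0, 3.1, 3.1, 3.2, 3.3, 3.4, 3.9, 5.0, 8.0, 15.0]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    ys, n = sorted(xs), len(xs)
    return ys[n // 2] if n % 2 else (ys[n // 2 - 1] + ys[n // 2]) / 2

# The mean rises from 3.4 to 5.1, while the median stays at 3.35.
print(mean(low), median(low))
print(mean(high), median(high))
```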

Figure: Box-and-whisker plots of response times of service.CatalogService.getItem(...) over experiment time, with mean and median.


Results (2/2)

Distributions right-shifted, long-tailed, right-skewed
Most monotonically increasing curves show the characteristic “performance knees” described by Jain [Jai91]
Identified 4 distribution shapes:

Bimodal with 2 major clusters
Bimodal with minor and major cluster
Multimodal becoming unimodal
Unimodal

Indication for the need of probabilistic workload
In large parts, the 3-parameter log-normal distribution fits the left sides of unimodal data samples


Distribution Fitting with 3-parameter Log-normal Distr.

In large parts, the 3-parameter log-normal distribution fits the left sides of unimodal data samples
In most cases, tails of the response time samples are shorter than those of the estimated distribution

Figure: Density plot of response times with fitted 3-parameter log-normal model (N=11863; mean 4.796 ms, median 4.192 ms, approx. mode 3.697 ms, skewness 2.457, kurtosis 6.998) and QQ plot of sample data vs. Λ(τ=3.437, µ=−0.3, σ=1.155).


Bimodal Distribution Shapes

Figure: Scatter and density plots of response times of presentation.OrderBean.newOrder(...) (N=1681; mean 4.517 ms, median 7.245 ms, approx. mode 0.008206, skewness 0.3665, kurtosis −1.167), a bimodal shape with two major clusters.

Figure: Scatter and density plots of response times of presentation.CartBean.addItemToCart(...) (N=1025; mean 6.447 ms, median 5.938 ms, approx. mode 5.322 ms, skewness 0.5146, kurtosis 3.08), a bimodal shape with minor and major cluster.

Results: Workload-sensitive Anomaly Detection Prototype

Anomaly Detection in Software Timing Behavior

An anomaly is a response time exceeding a given threshold τ
An execution of operation o is a tuple (o, st, rt)
The anomaly detector (AD) must decide for each execution whether or not it is an anomaly (based on historical data)
Quality of an AD: error rate with type I/II errors


Plain Anomaly Detector (PAD) (1/2)

PAD classifies an execution e as anomalous iff its response time exceeds an operation-specific threshold τ:

PAD(e) := 1 if rt > τ, 0 otherwise
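As a sketch (executions modeled as (o, st, rt) tuples; the threshold value is a made-up choice inside the error-free interval [106.4, 144.9] from the constant-workload example):

```python
def pad(execution, tau):
    """Plain anomaly detector: classify execution (o, st, rt) as anomalous (1)
    iff its response time rt exceeds the operation-specific threshold tau."""
    _o, _st, rt = execution
    return 1 if rt > tau else 0

print(pad(("getAccount", 0.0, 150.0), tau=110.0))  # → 1
print(pad(("getAccount", 0.0, 100.0), tau=110.0))  # → 0
```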

Example 1: PAD with Constant Workload Intensity

Figure: Response times over experiment time; 16 anomalies, 84 normal observations.

Error rate is 0 for τ ∈ [106.4, 144.9]


Plain Anomaly Detector (2/2)

Example 2: PAD with Varying Workload Intensity

Figure: Response times over experiment time under varying workload intensity; 16 anomalies (8.0 %), 184 normal observations, maximum normal observation 177.3 ms.

At the minimum error rate of 8 % (τ > 176), no anomaly is detected at all



Workload-Intensity-sensitive Anomaly Det. (WISAD)

WISAD explicitly considers varying workload intensity by including

1 Platform workload intensity (PWI) during time of execution 2 Workload intensity normalization factor
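A sketch of the idea: the response time is first adjusted by a workload intensity normalization factor and only then compared against the threshold. The concrete normalization is not specified on this slide, so `normalize` and the linear slope below are assumptions for illustration:

```python
def wisad(execution, tau, pwi_value, normalize):
    """Workload-intensity-sensitive anomaly detector (sketch): compare the
    workload-normalized response time against the threshold tau.
    `normalize(rt, pwi)` is a hypothetical normalization function."""
    _o, _st, rt = execution
    return 1 if normalize(rt, pwi_value) > tau else 0

# Assumed linear normalization: subtract 2 ms per PWI unit (made-up slope).
def linear(rt, p):
    return rt - 2.0 * p

print(wisad(("getItem", 0.0, 130.0), tau=110.0, pwi_value=8.0, normalize=linear))   # → 1
print(wisad(("getItem", 0.0, 130.0), tau=110.0, pwi_value=12.0, normalize=linear))  # → 0
```

The same 130 ms response time is anomalous under light load but normal under heavy load, which is exactly what a workload-oblivious PAD cannot express.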

Example 3: WISAD with Varying Workload Intensity

Figure: Response times over experiment time with threshold 110; 16 executions classified as anomaly (8.0 %), 184 classified as normal.

Error rate 0 for threshold values between 106–118


Conclusions

1 Probabilistic Workload Driver

Methodology for probabilistic workload modeling based on Markov chains
Design resulted in Markov4JMeter [vH07] (GPL-licensed)

2 Case Study with Response Time Analysis

Evaluated the Markov4JMeter approach
Executed a large number of experiments with varying workload intensity
Analyzed workload intensity vs. response time statistics

3 Workload-sensitive Anomaly Detection Prototype

AD prototype that considers varying workload intensity
Evaluation with “real” data is work in progress [RvHGH07]



Related Work

Workload Generation

Workload characterization of systems in production use, e.g. Arlitt et al. [AKR01], Menascé et al. [MAR+00], [MA03]
Customer Behavior Model Graph (CBMG) by Menascé et al. [MAFM99]
Extended Finite State Machine (EFSM) by Shams et al. [SKF06]
Freely available and commercial workload generators, e.g. Mercury LoadRunner [Mer07], OpenSTA [Ope05], Siege [Ful06]

Response Time Analysis

Response time analysis of ERP systems by Mielke [Mie06]

Timing Behavior Anomaly Detection

Agarwal et al. [AAG+04]


Bibliography I

[AAG+04] Manoj K. Agarwal, Karen Appleby, Manish Gupta, Gautam Kar, Anindya Neogi, and Anca Sailer. Problem determination using dependency graphs and run-time behavior models. In Akhil Sahai and Felix Wu, editors, 15th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2004), Davis, CA, USA, November 15-17, 2004, volume 3278 of Lecture Notes in Computer Science, pages 171–182. Berlin: Springer, 2004.

[AKR01] Martin F. Arlitt, Diwakar Krishnamurthy, and Jerry Rolia. Characterizing the scalability of a large web-based shopping system. ACM Transactions on Internet Technology, 1(1):44–69, 2001.

[Ful06] Jeffrey Fulmer. Siege – homepage, 2006. Last visited August 31, 2007.

[Jai91] Raj Jain. The Art of Computer Systems Performance Analysis. New York: John Wiley & Sons, 1991.

[MA03] Daniel A. Menascé and Vasudeva Akula. Towards workload characterization of auction sites. In Proceedings of the 6th IEEE Workshop on Workload Characterization (WWC-6), Austin, TX, USA, October 27, 2003, pages 12–20. IEEE Press, 2003.

[MAFM99] Daniel A. Menascé, Virgilio A. F. Almeida, Rodrigo Fonseca, and Marco A. Mendes. A methodology for workload characterization of e-commerce sites. In Proceedings of the 1st ACM Conference on Electronic Commerce (EC ’99), Denver, CO, USA, November 3-5, 1999, pages 119–128. ACM Press, 1999.

[MAR+00] Daniel Menascé, Virgilio A. F. Almeida, Rudolf Riedi, Flávia Ribeiro, Rodrigo Fonseca, and Wagner Meira Jr. In search of invariants for e-business workloads. In Proceedings of the 2nd ACM Conference on Electronic Commerce (EC ’00), pages 56–65. ACM Press, 2000.

[Mer07] Mercury Interactive Corporation. Mercury LoadRunner – homepage, 2007. Last visited August 31, 2007.

Bibliography II

[Mie06] Andreas Mielke. Elements for response-time statistics in ERP transaction systems. Performance Evaluation, 63(7):635–653, 2006.

[MIO87] John D. Musa, Anthony Iannino, and Kazuhira Okumoto. Software Reliability: Measurement, Prediction, Application. New York: McGraw-Hill, first edition, 1987.

[MR06] Douglas C. Montgomery and George C. Runger. Applied Statistics and Probability for Engineers. New York: John Wiley & Sons, Inc., fourth edition, 2006.

[Ope05] OpenSTA. OpenSTA (Open System Testing Architecture) – homepage, 2005. Last visited August 31, 2007.

[RvHGH07] Matthias Rohr, André van Hoorn, Simon Giesecke, and Wilhelm Hasselbring. Workload intensity sensitive timing behavior anomaly detection. In preparation, 2007.

[Sil86] B. W. Silverman. Kernel density estimation technique for statistics and data analysis. In Monographs on Statistics and Applied Probability, volume 26. London: Chapman and Hall, 1986.

[SKF06] Mahnaz Shams, Diwakar Krishnamurthy, and Behrouz Far. A model-based approach for testing the performance of web applications. In Proceedings of the 3rd International Workshop on Software Quality Assurance (SOQUA ’06), Portland, OR, USA, November 6, 2006, pages 54–61. ACM Press, 2006.

[SW01] Connie U. Smith and Lloyd G. Williams. Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Amsterdam: Addison-Wesley, 1st edition, 2001.

[vH07] André van Hoorn. Markov4JMeter – homepage, 2007. Last visited August 31, 2007.

Bonus Scenes

Structure

1. Motivation
2. Foundations
3. Hypothesis & Goals
4. Results
5. Conclusions
6. Related Work

Box-and-Whisker Plot

Visualizes:

  • Quartiles (1st quartile, median, 3rd quartile)
  • Interquartile range (IQR)
  • Normal and extreme outliers

The lower whisker extends to the smallest data point within 1.5 interquartile ranges below the first quartile; the upper whisker extends to the largest data point within 1.5 interquartile ranges above the third quartile. Observations between 1.5 IQR and 3 IQR beyond the quartiles are normal outliers; observations farther out are extreme outliers.

Figure: Description of a box-and-whisker plot [MR06].
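The quartile and fence computation behind such a plot can be sketched in Python. This is a minimal sketch: the function name is illustrative, and the 1.5/3 · IQR fences follow the Tukey-style convention described above.

```python
import statistics

def box_whisker_stats(data, k=1.5):
    """Quartiles, IQR, whisker ends, and normal/extreme outliers."""
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4)  # quartile estimates
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers extend to the most extreme data points inside the fences.
    lower_whisker = min(x for x in xs if x >= lo_fence)
    upper_whisker = max(x for x in xs if x <= hi_fence)
    # Normal outliers lie between 1.5 and 3 IQR from the quartiles,
    # extreme outliers beyond 3 IQR.
    normal_out = [x for x in xs
                  if lo_fence - k * iqr <= x < lo_fence
                  or hi_fence < x <= hi_fence + k * iqr]
    extreme_out = [x for x in xs
                   if x < lo_fence - k * iqr or x > hi_fence + k * iqr]
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr,
            "whiskers": (lower_whisker, upper_whisker),
            "normal_outliers": normal_out, "extreme_outliers": extreme_out}
```

For example, in the sample `[1, 2, 3, 4, 5, 100]` the value 100 falls between the 1.5 IQR and 3 IQR fences and is reported as a normal outlier.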


Density Estimation

  • Goal: estimate the underlying density function f̂
  • Parametric (based on a parametric distribution family)
  • Non-parametric (e.g., kernel density estimation [Sil86])

Figure: Kernel density estimations of a data sample using a normal kernel and window sizes 2 and 20.
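The non-parametric estimator can be sketched directly from its definition (a minimal sketch with a normal kernel, as used in the figure; the function name and argument layout are illustrative):

```python
import math

def kde(sample, x, bandwidth):
    """Kernel density estimate f̂(x) with a normal (Gaussian) kernel.

    f̂(x) = 1/(n·h) · Σ_i K((x − x_i)/h), with K the standard normal density.
    """
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                      for xi in sample)
```

A larger bandwidth (window size) averages over more neighbors and yields a smoother, flatter estimate, which is the effect contrasted in the figure.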


Application Model

Workload Configuration Data Model

  • Contains all information needed to generate valid sessions
  • 2-layered hierarchical state machine

Session Layer

  • Non-deterministic finite state machine
  • Application transitions can be labeled with guards and actions
  • Transitions represent valid sequences of service calls in a session

Protocol Layer

  • Contains the protocol details required for session generation
  • Deterministic state machine for each application state
  • Again, transitions can be labeled with guards and actions
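The session layer can be sketched as a probabilistic walk over application states. This is an illustrative sketch only: the state names reuse JPetStore request types from the instrumentation table, but the transition probabilities are invented for illustration, and guards/actions are omitted.

```python
import random

# Hypothetical session-layer transitions: from each application state,
# the next service call is chosen by probability; "EXIT" ends the session.
TRANSITIONS = {
    "index":        [("viewCategory", 0.7), ("signonForm", 0.3)],
    "signonForm":   [("signon", 1.0)],
    "signon":       [("viewCategory", 1.0)],
    "viewCategory": [("viewProduct", 0.6), ("EXIT", 0.4)],
    "viewProduct":  [("viewCategory", 0.5), ("EXIT", 0.5)],
}

def generate_session(entry_state="index", max_len=20, rng=None):
    """Walk the session-layer state machine, yielding a valid call sequence."""
    rng = rng or random.Random(42)
    state, session = entry_state, []
    while state != "EXIT" and len(session) < max_len:
        session.append(state)
        states, probs = zip(*TRANSITIONS[state])
        state = rng.choices(states, weights=probs)[0]
    return session
```

In the full model, each application state additionally owns a deterministic protocol-layer machine that expands the abstract service call into concrete requests.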


Meta Model

Workload Configuration Data Model (cont’d)

Figure: Meta model of the workload configuration (UML class diagram). Classes and attributes include:

  • Workload Configuration (name)
  • Workload Intensity (duration, sessionArrivalFormula)
  • User Behavior Mix and Behavior Assignment (relativeFrequency)
  • Application Model with Application States and Application Transitions (guard, action)
  • User Behavior Model (thinkTime, entryState, finalState) with Behavioral States and Behavioral Transitions (probability)
  • Protocol States (uri, parameters) and Protocol Transitions (guard, action)


High-level Design

Figure: High-level overview of the workload driver (class diagram). The Engine includes a Session Arrival Controller and a Behavior Mix Controller, and initializes and controls the User Simulation Threads; the Behavior Mix Controller assigns a behavior model to each thread, and the Session Arrival Controller schedules session entrance.

  • Architecture and iterative execution model
  • Session model composition


Markov4JMeter – Integration into Apache JMeter

Figure: Integration of Markov4JMeter into Apache JMeter. The GUI thread holds the test plan (configuration, stored as JMX); the non-GUI engine executes a test plan instance that is configured by the JMeter test elements (Thread Group) and the Markov4JMeter test elements (Session Arrival Controller, Behavior Mix Controller), which read behavior models from CSV behavior files.

Markov4JMeter classes (packages markov4jmeter.controller and markov4jmeter.controller.gui):

  • GUI elements: MarkovSessionControllerGui and MarkovStateGui (both extend AbstractControllerGui), with the panels SessionArrivalFormulaPanel, ApplicationTransitionsPanel, and BehaviorMixPanel
  • Non-GUI elements: MarkovSessionController and MarkovState (both extend GenericController), with SessionArrivalFormula, BehaviorMix, and ApplicationTransitions


Instrumentation of JPetStore

Request types (1–13): index, signonForm, signon, viewCategory, viewProduct, viewItem, addItemToCart, viewCart, checkout, newOrderForm, newOrderData, newOrderConfirm, signoff

Activated monitoring points:

  • struts.action.ActionServlet.doGet(HttpServletRequest, HttpServletResponse)
  • struts.action.ActionServlet.doPost(HttpServletRequest, HttpServletResponse)
  • persistence.sqlmapdao.AccountSqlMapDao.getAccount(String, String)
  • persistence.sqlmapdao.ItemSqlMapDao.getItem(String)
  • persistence.sqlmapdao.ItemSqlMapDao.getItemListByProduct(String)
  • persistence.sqlmapdao.OrderSqlMapDao.insertOrder(Order)
  • presentation.AccountBean.signon()
  • presentation.CartBean.addItemToCart()
  • presentation.CatalogBean.viewCategory()
  • presentation.CatalogBean.viewItem()
  • presentation.CatalogBean.viewProduct()
  • presentation.OrderBean.newOrder()
  • service.AccountService.getAccount(String, String)
  • service.CatalogService.getCategory(String)
  • service.CatalogService.getItem(String)
  • service.CatalogService.getItemListByProduct(String)
  • service.CatalogService.getProductListByCategory(String)
  • service.OrderService.getNextId(String)
  • service.OrderService.insertOrder(Order)

Table: Identified monitoring points and coverage of request types.


Constructive Definition of PWI

1. Trace history H ⊆ ℕ², containing tuples of trace start and stop times.

2. Event history E ⊆ ℕ × {−1, 1}:
   E := { (tin, 1), (tout, −1) ∈ ℕ × {−1, 1} | (tin, tout) ∈ H }

3. Active traces history A ⊆ ℕ²:
   A := { (t, k) ∈ ℕ² | ∃ a ∈ {−1, 1} : (t, a) ∈ E ∧ k = Σ_{(t′, b) ∈ E : t′ ≤ t} b }

4. Step function activeTraces_A : ℕ → ℕ:
   activeTraces_A(t) = k, if ∃ t* ∈ ℕ : t* = max{ t′ ∈ ℕ | t′ ≤ t ∧ (t′, k) ∈ A }; 0, else.

5. Platform workload intensity pwi_{A,ω} : ℕ → ℝ⁺:
   pwi_{A,ω}(t) = (1/ω) · Σ_{i=0}^{ω−1} activeTraces_A(t − i)
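The five construction steps can be sketched in Python (an illustrative sketch; lists and dicts stand in for the histories E and A, and the function names mirror the definition):

```python
def active_traces(events):
    """Active traces history A from the event history E.

    events: list of (time, +1/-1) pairs derived from trace start/stop times.
    Returns a dict {event time: running sum of +1/-1 up to that time},
    i.e. the number of concurrently active traces after each event.
    """
    a, k = {}, 0
    for t, b in sorted(events):
        k += b
        a[t] = k
    return a

def pwi(active, t, omega):
    """Platform workload intensity: mean of the step function over
    the window [t - omega + 1, t]."""
    def step(u):
        # activeTraces_A(u): value at the latest event time <= u, else 0.
        keys = [t2 for t2 in active if t2 <= u]
        return active[max(keys)] if keys else 0
    return sum(step(t - i) for i in range(omega)) / omega
```

For the trace history H = {(0, 10), (5, 15)}, two traces are active during [5, 10), so pwi over a window lying in that interval is 2.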


Overview of Plot Types

Figure: Overview of plot types for persistence.sqlmapdao.AccountSqlMapDao.getAccount(...): users vs. quartiles of response times; scatter plot of response times with local regression (N=486); box-and-whisker plot of response times over experiment time; density plot of response times (N=486, bandwidth 0.04385: mean 3.163 ms, median 3.106 ms, approx. mode 3.057 ms, skewness 1.666, kurtosis 3.614); QQ plot of sample data against a 3-parameter log-normal distribution Λ(τ=2.78, µ=−1.084, σ=0.496).



The Four Identified Density Shapes (1/2)

Figure: Scatter and density plots of response times for presentation.OrderBean.newOrder(...) (N=1681, bandwidth 0.9664: mean 4.517 ms, median 7.245 ms, approx. mode 0.008206 ms, skewness 0.3665, kurtosis −1.167) and presentation.CartBean.addItemToCart(...) (N=1025, bandwidth 0.3326: mean 6.447 ms, median 5.938 ms, approx. mode 5.322 ms, skewness 0.5146, kurtosis 3.08).


The Four Identified Density Shapes (2/2)

Figure: Scatter and density plots of response times for service.CatalogService.getItem(...) (N=2756, bandwidth 0.01253: mean 3.632 ms, median 3.623 ms, approx. mode 3.624 ms, skewness 2.127, kurtosis 7.02; and N=11863, bandwidth 0.1509: mean 4.796 ms, median 4.192 ms, approx. mode 3.697 ms, skewness 2.457, kurtosis 6.998) and service.AccountService.getAccount(...) (N=1294, bandwidth 0.05728: mean 3.751 ms, median 3.575 ms, approx. mode 3.502 ms, skewness 2.374, kurtosis 6.374; and N=1284, bandwidth 0.1243: mean 4.135 ms, median 3.785 ms, approx. mode 3.571 ms, skewness 2.542, kurtosis 8.362).


Anomaly Detection in Software Timing Behavior

  • An anomaly is considered a response time exceeding a given mean value by α percent within a period β
  • An execution of operation o is a tuple (o, st, rt)
  • An anomaly detector (AD) must decide, for each execution in a set of executions Y, whether or not it is an anomaly
  • It knows a set of observations X (the history), assumed to contain no anomalies
  • The AD decides by comparing Y with X
  • Quality of an AD: error rate, distinguishing type I and type II errors
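The quality measure can be sketched as follows (a hypothetical helper, not the thesis' code; 1 marks an anomaly in both the detector output and the ground truth, and rates are taken relative to the total number of observations, as in the examples below):

```python
def error_rates(predictions, truth):
    """Type I (false alarm) and type II (missed anomaly) rates of a detector."""
    fp = sum(1 for p, t in zip(predictions, truth) if p and not t)  # type I
    fn = sum(1 for p, t in zip(predictions, truth) if not p and t)  # type II
    n = len(truth)
    return {"type1": fp / n, "type2": fn / n, "total": (fp + fn) / n}
```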


Plain Anomaly Detector (PAD)

  • PAD classifies an execution as anomalous iff its response time exceeds an operation-specific threshold τ
  • The threshold τ is the operation-specific mean response time from the history X, multiplied by a tolerance factor δ

Given history X, set of executions Y, e = (o, st, rt) ∈ Y, and sample mean r̄t_o of o in X, PAD : Y → 𝔹:

  PAD(e) := 1, if rt > δ · r̄t_o; 0, else
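The PAD rule can be sketched as (a minimal sketch; the execution tuple layout follows the definition above, while function names are illustrative):

```python
from statistics import mean

def train_pad(history):
    """Per-operation mean response times from the history X.

    history: list of executions (operation, start_time, response_time),
    assumed to contain no anomalies.
    """
    by_op = {}
    for o, _st, rt in history:
        by_op.setdefault(o, []).append(rt)
    return {o: mean(rts) for o, rts in by_op.items()}

def pad(execution, means, delta):
    """Plain anomaly detector: 1 iff rt exceeds delta * mean response time of o."""
    o, _st, rt = execution
    return int(rt > delta * means[o])
```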


Example 1: PAD with Constant Workload Intensity

Synthetic workload scenario with:

  • Single operation
  • Constant workload intensity

Figure: Response times over experiment time (16 anomalies, 84 normal observations) and error rate vs. threshold.

  • Error rate is 0 for τ ∈ [106.4, 144.9]
  • Assuming r̄t_o ≈ 100, this corresponds to δ ∈ [1.06, 1.44]


Example 2: PAD with Varying Workload Intensity

Synthetic workload scenario with:

  • Single operation
  • Increasing workload intensity

Figure: Response times over experiment time (16 anomalies, i.e. 8.0 percent; 184 normal observations; maximum of normal observations 177.3) and error rate vs. threshold.

  • Minimum error rate is 8 % (for τ > 176)
  • But then: no anomaly is detected at all


Workload-Intensity-sensitive Anomaly Detector (WISAD)

Explicitly considers varying workload intensity by including:

1. Function pwi : ℕ × ℕ → ℝ⁺:
   pwi_A(e) := (1/rt) · Σ_{t=st}^{st+rt} activeTraces_A(t)

2. Function wnf_o : ℝ → ℝ; w ↦ wnf_o(w):
   For a given workload intensity w, wnf_o(w) is a workload intensity normalization factor for the response time threshold that applies to a workload intensity of 1.

Given history X, set of executions Y, e = (o, st, rt) ∈ Y, and historical sample mean r̄t_{o,1} of o at pwi 1:

  WISAD(e) := 1, if rt > r̄t_{o,1} · wnf_o(pwi(e)) · δ; 0, else
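The WISAD rule differs from PAD only in the workload-normalized threshold; it can be sketched as follows (illustrative sketch: `wnf` and `pwi_of` are passed in as functions, and the usage below borrows the wnf_o(x) = (x + 6)/7 from Example 3 purely for illustration):

```python
def wisad(execution, mean_rt_at_pwi1, wnf, delta, pwi_of):
    """Workload-intensity-sensitive detector: the response time threshold
    is the operation's mean at workload intensity 1, scaled by wnf(pwi)."""
    o, _st, rt = execution
    threshold = mean_rt_at_pwi1[o] * wnf(pwi_of(execution)) * delta
    return int(rt > threshold)

# Illustration with the normalization function from Example 3:
wnf = lambda x: (x + 6) / 7          # wnf(1) = 1, so the base mean applies at pwi 1
pwi_of = lambda e: 8.0               # hypothetical measured workload intensity
means1 = {"op": 100.0}               # hypothetical mean response time at pwi 1
```

At pwi 8 the factor wnf(8) = 2 doubles the threshold, so an execution that would be flagged under constant workload is accepted under high workload intensity.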


Example 3: WISAD with Varying Workload Intensity

  • Synthetic varying workload scenario from Example 2
  • Values of pwi follow the equation 1 + st/11.4
  • wnf_o(x) = (x + 6)/7 and r̄t_{o,1} = 100

Figure: Response times over experiment time (16 classified as anomaly, i.e. 8.0 percent; 184 classified as normal; threshold = 110) and total, type I, and type II error rates vs. threshold.

  • Error rate is 0 for threshold values between 106 and 118
