CS626 Data Analysis and Simulation

Today: Trace-Driven Simulation

Reference: Law/Kelton, Simulation Modeling and Analysis, Ch 5.6.

Instructor: Peter Kemper

R 104A, phone 221-3462, email: kemper@cs.wm.edu

What is trace-driven simulation? General idea

- Replace the sequence of pseudo-random numbers in a simulation run by measurement data from the real system (historical data).

Examples

- Workload model: arrival times and types of requests/tasks
- Machine model:
  - service times for tasks
  - failure and repair times

Purpose

- Model validation: Does the model represent the real system well enough? We need to compare the output of a simulation with the measurement data.

Why is trace-driven simulation not used for production runs?

- It can only reproduce what happened historically.
- There is seldom enough data for all scenarios of interest.
- The data are limited to the set of observed values (a finite set of discrete values).
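The general idea can be sketched in C++ under the assumption that a model obtains its interarrival times through one common interface. `InputSource`, `ExponentialSource`, `TraceSource` and `arrivalTimes` are hypothetical names for illustration, not part of Mobius or any other tool. The distribution-based source never runs out, while the trace-based source replays only observed values in recorded order and ends with the data, which mirrors the limitations listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <utility>
#include <vector>

// Source of input values (here: interarrival times) for a simulation run.
struct InputSource {
    virtual ~InputSource() = default;
    // Next value, or std::nullopt if no more values are available.
    virtual std::optional<double> next() = 0;
};

// Ordinary simulation: pseudo-random values drawn from a fitted distribution.
class ExponentialSource : public InputSource {
public:
    ExponentialSource(double rate, unsigned seed) : gen_(seed), dist_(rate) {
        assert(rate > 0.0);
    }
    std::optional<double> next() override { return dist_(gen_); }  // never runs out

private:
    std::mt19937 gen_;
    std::exponential_distribution<double> dist_;
};

// Trace-driven simulation: measured values replayed in recorded order.
// Only observed values can occur, and the source ends with the trace.
class TraceSource : public InputSource {
public:
    explicit TraceSource(std::vector<double> trace) : trace_(std::move(trace)) {}
    std::optional<double> next() override {
        if (pos_ >= trace_.size()) return std::nullopt;  // end of historical data
        return trace_[pos_++];
    }

private:
    std::vector<double> trace_;
    std::size_t pos_ = 0;
};

// Arrival epochs for up to maxArrivals customers; stops early if the source is exhausted.
std::vector<double> arrivalTimes(InputSource& src, std::size_t maxArrivals) {
    std::vector<double> times;
    double clock = 0.0;
    while (times.size() < maxArrivals) {
        std::optional<double> gap = src.next();
        if (!gap) break;  // a trace-driven run must stop at the end of the data
        clock += *gap;
        times.push_back(clock);
    }
    return times;
}
```

The same model code can then be run either way by passing a different `InputSource`; only the trace-based run is limited to as many arrivals as the trace contains.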

Trace-driven simulation

Idea: feed measurement data into the simulation model.

Example from MAP fitting work by Casale, Smirni et al.

Trace-Driven Simulation

Advantages

- Credibility
- Easy Validation: compare simulation results with measured data
- Accurate Workload: models correlations and dependencies in the input
- Detailed Workload: can study the effect of small changes
- Less Randomness: the input is deterministic
- Fair Comparison: alternatives are evaluated on the same input, which is better than random input

Disadvantages

- Complexity: the trace may be more detailed than the simulation model requires
- Representativeness: historical data may be outdated or may reflect a very particular load situation and system configuration
- Finiteness: the simulation must stop at the end of the data
- Space: traces may take an enormous amount of space
- Single Point of Validation: a trace covers only one particular scenario in the design space
- Parameterization: workload data are difficult to parameterize/adjust

Comparing Simulated and Measured Behavior: Basic Inspection Approach

- To compare simulated and measured behavior, run the simulation with input values sampled from a distribution and compare its output to the measurement data.
- This seems to be a classical area for statistical tests:
  - Are both sets of samples from the same distribution?
  - But: such tests assume i.i.d. samples, whereas simulation output is usually correlated and NOT independent.
- What if we compare estimates of performance measures instead?

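Law/Kelton compares two M/M/1 systems:

- System X is M/M/1 with λ = 1, ρ = 0.6
- Model Y is M/M/1 with λ = 1, ρ = 0.5
- Observation: the sequence of delays in queue D_i; we compare the estimated mean delay of the first 200 customers, whose correct values are E(X) = 0.87 and E(Y) = 0.49

  Experiment   µX (est.)   µY (est.)   µX - µY
  1            0.90        0.70         0.20
  2            0.70        0.71        -0.01
  3            1.08        0.35         0.73

The estimated difference varies widely between experiments (the correct difference is 0.87 - 0.49 = 0.38), and experiment 2 even gets its sign wrong.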
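A minimal C++ sketch of the basic inspection approach, assuming the delays are generated with Lindley's recursion $D_{i+1} = \max(0, D_i + S_i - A_{i+1})$ for a queue that starts empty (our choice; the slide does not say how Law/Kelton simulated the queues). System X and model Y each get their own random number stream, so their estimates are independent. `averageDelayMM1` and `independentDifference` are hypothetical names, and other seeds give numbers different from the three experiments above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>

// Average delay in queue of the first n customers of an M/M/1 queue that starts
// empty, using Lindley's recursion D_{i+1} = max(0, D_i + S_i - A_{i+1}).
double averageDelayMM1(double lambda, double rho, std::size_t n, std::mt19937& gen) {
    assert(lambda > 0.0 && rho > 0.0 && n > 0);
    std::exponential_distribution<double> interarrival(lambda);
    std::exponential_distribution<double> service(lambda / rho);  // service rate mu = lambda / rho
    double delay = 0.0;  // D_1 = 0: the first customer finds the system empty
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += delay;
        double s = service(gen);       // S_i
        double a = interarrival(gen);  // A_{i+1}
        delay = std::max(0.0, delay + s - a);
    }
    return sum / static_cast<double>(n);
}

// One experiment of the basic inspection approach: system X (rho = 0.6) and
// model Y (rho = 0.5), both with lambda = 1, driven by independent streams.
double independentDifference(unsigned seedX, unsigned seedY, std::size_t n = 200) {
    std::mt19937 genX(seedX);
    std::mt19937 genY(seedY);
    return averageDelayMM1(1.0, 0.6, n, genX) - averageDelayMM1(1.0, 0.5, n, genY);
}
```

Calling `independentDifference` with a few different seed pairs shows the same effect as the table: single estimates of µX - µY scatter considerably around the correct difference of about 0.38.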

Trace-Driven Simulation: Correlated Inspection Approach

Correlation is good ...

- If system and model face exactly the same observations of the input random variables (RVs), then the comparison should be more precise due to correlation.

Why is that?

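- Say the RV X corresponds to the system and Y to the model.
- Recall: $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X,Y)$; with $a = 1$, $b = -1$ this gives $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X,Y)$.
- If X and Y are independent because the simulation draws from a distribution to produce values for Y, then $\mathrm{Cov}(X,Y) = 0$ and $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
- If the model follows the measured input data of the system, then we can expect X and Y to be positively correlated, so that $\mathrm{Cov}(X,Y) > 0$ and $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X,Y)$ is reduced.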

Law/Kelton contains an illustrative example showing that

- trace-driven and ordinary simulation both produce comparable estimates for the mean of a performance measure, but trace-driven simulation results in a smaller variance of the estimated difference µX - µY.
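The variance argument can be checked numerically with a small C++ experiment. As an assumption, the shared input is emulated by giving system and model the same interarrival times and the same unit-mean exponential variates for service, scaled to each configuration's mean service time (0.6 and 0.5 for λ = 1). This is a common-random-numbers setup that plays the role of the measured input; it is not Law/Kelton's exact example. `averageDelay`, `sampleVariance` and `varianceOfDifference` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Average delay in queue for given interarrival times a and service times s.
// a[i] is the time between arrivals i-1 and i (a[0] is unused); the system
// starts empty and Lindley's recursion D_{i+1} = max(0, D_i + S_i - A_{i+1}) is applied.
double averageDelay(const std::vector<double>& a, const std::vector<double>& s) {
    assert(!a.empty() && a.size() == s.size());
    double delay = 0.0, sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += delay;
        if (i + 1 < a.size()) delay = std::max(0.0, delay + s[i] - a[i + 1]);
    }
    return sum / static_cast<double>(a.size());
}

double sampleVariance(const std::vector<double>& x) {
    assert(x.size() > 1);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return ss / static_cast<double>(x.size() - 1);
}

// Estimates Var(X - Y) over `reps` replications of n customers, where X is the
// system (rho = 0.6) and Y the model (rho = 0.5), both with lambda = 1.
// shared == true: Y is driven by the same input observations as X.
double varianceOfDifference(bool shared, std::size_t reps, std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::exponential_distribution<double> unitExp(1.0);  // mean 1
    std::vector<double> diffs;
    for (std::size_t r = 0; r < reps; ++r) {
        std::vector<double> a(n), e(n);  // input seen by the system
        for (std::size_t i = 0; i < n; ++i) { a[i] = unitExp(gen); e[i] = unitExp(gen); }
        std::vector<double> aY = a, eY = e;  // model input: a copy of the system's ...
        if (!shared) {                       // ... or fresh, independent draws
            for (std::size_t i = 0; i < n; ++i) { aY[i] = unitExp(gen); eY[i] = unitExp(gen); }
        }
        std::vector<double> sX(n), sY(n);
        for (std::size_t i = 0; i < n; ++i) { sX[i] = 0.6 * e[i]; sY[i] = 0.5 * eY[i]; }
        diffs.push_back(averageDelay(a, sX) - averageDelay(aY, sY));
    }
    return sampleVariance(diffs);
}
```

With the same seed and replication count, `varianceOfDifference(true, 1000, 200, 7)` should come out well below `varianceOfDifference(false, 1000, 200, 7)`, because the shared input makes Cov(X,Y) large and positive.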

Technical Issues

Given: a sequence of interarrival times for tasks/requests. How do we incorporate these data into a model (here Mobius)?

- If this is not directly supported, we need to find a work-around ...
- Problem 1: make the data in a file accessible
  - Output gates of an activity allow us to write C++ code segments
  - Open the file and load the data into some internal data structure such as an array
  - Use an immediate, one-time activity to load the data from the file into the array
- Problem 2: store the data such that an activity can access it
  - Define an extended place to hold an array of floating-point values
  - State variables are accessible in activities, since behavior can be state-dependent
- Problem 3: make an activity fire according to the given interarrival times
  - Define a timed activity with a deterministic delay
  - Define the parameter of that delay to be the value at the current position in the array of interarrival times
  - Increment the current position in the output gate of that activity
- Problem 4: check whether the dynamic behavior is as expected (with a trace)
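The four steps can be mirrored in plain, self-contained C++. This is not Mobius code: `TraceState` stands in for the extended place, `loadTrace` for the one-time immediate activity, `arrivalEnabled`/`arrivalDelay` for the deterministic timed activity and `arrivalFired` for its output gate. All names are hypothetical, Mobius' actual gate and state-variable syntax differs, and the file is assumed to contain whitespace-separated numbers, read here from a `std::istream`.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Stand-in for the extended place: array of interarrival times plus current position.
struct TraceState {
    std::vector<double> interarrival;
    std::size_t position = 0;
};

// Problem 1: one-time load of whitespace-separated values from a stream.
void loadTrace(std::istream& in, TraceState& state) {
    double v;
    while (in >> v) {
        if (v < 0.0) throw std::runtime_error("negative interarrival time");
        state.interarrival.push_back(v);
    }
    if (!in.eof()) throw std::runtime_error("malformed entry in trace");  // stopped before end of input
    state.position = 0;
}

// Problem 3: enabling condition and deterministic delay of the timed activity.
bool arrivalEnabled(const TraceState& s) { return s.position < s.interarrival.size(); }
double arrivalDelay(const TraceState& s) {
    assert(arrivalEnabled(s));
    return s.interarrival.at(s.position);
}

// Output gate of the arrival activity: advance to the next interarrival time.
void arrivalFired(TraceState& s) { ++s.position; }

// Problem 4: replay the activity on a copy of the state and record firing times.
std::vector<double> firingTimes(TraceState s) {
    std::vector<double> times;
    double clock = 0.0;
    while (arrivalEnabled(s)) {
        clock += arrivalDelay(s);
        arrivalFired(s);
        times.push_back(clock);
    }
    return times;
}
```

For Problem 4, the firing times returned by `firingTimes` should equal the cumulative sums of the trace; any deviation points to an error in how the delay or the position is handled.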

Improvements

- Change the array into a ring buffer
  - load more data on demand as necessary
  - uses less space
  - requires less configuration effort
- Encapsulate this aspect into a separate atomic model
  - reuse the same model to read multiple files for different input streams
  - requires some way to assign file names appropriately

Note: there are many more ways to do this.

- Mobius supports user-defined libraries with C++ code
- File access can be implemented with dedicated methods in a library:
  - provide an iterator concept to access numerical entries in a file
  - memory-mapped files as an alternative to arrays
  - a robust parser for file access with appropriate exception handling
  - ...
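A sketch of the ring-buffer improvement, again in plain C++ and assuming a stream of whitespace-separated numbers. `TraceRingBuffer` is a hypothetical class, not a Mobius library type: it holds at most `Capacity` values and reads the next batch only when the buffer runs empty, so the length of the trace need not be known in advance and no array size has to be configured to match the file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>

// Fixed-capacity ring buffer over a numeric trace: values are read from the
// stream on demand, so at most Capacity values are held in memory at a time.
template <std::size_t Capacity>
class TraceRingBuffer {
    static_assert(Capacity > 0, "capacity must be positive");

public:
    explicit TraceRingBuffer(std::istream& in) : in_(in) {}

    // Next trace value, refilling from the stream when the buffer is empty;
    // std::nullopt once the trace is exhausted.
    std::optional<double> next() {
        if (count_ == 0) refill();
        if (count_ == 0) return std::nullopt;
        double v = buf_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return v;
    }

private:
    // Reads values from the stream into the free slots (wrapping around).
    void refill() {
        double v;
        while (count_ < Capacity && in_ >> v) {
            buf_[(head_ + count_) % Capacity] = v;
            ++count_;
        }
    }

    std::istream& in_;
    std::array<double, Capacity> buf_{};
    std::size_t head_ = 0;   // index of the oldest unread value
    std::size_t count_ = 0;  // number of unread values in the buffer
};
```

Creating one instance per input file corresponds to reusing the same encapsulated model for different input streams; only the stream (file name) differs between instances.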

Furthermore: Law/Kelton, Section 5.6.2

- If we can obtain
  - m independent sets of system data and
  - n independent sets of simulation data,
  we can take advantage of the independence and calculate confidence intervals for µX - µY.
- Options
  - paired-t approach: requires n = m, but the pairs may be correlated
  - Welch approach: any values n, m > 1, but X and Y must be independent
- Check whether 0 is contained in the interval between the lower and upper bound; if it is not, the difference is statistically significant at the chosen confidence level.
- Statistically significant vs. practically significant
  - Practically significant: the magnitude of the difference invalidates any inference about the system that would be derived from the model
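Both options can be sketched in C++ with the standard formulas: the paired-t interval uses the differences $Z_j = X_j - Y_j$ with $n - 1$ degrees of freedom, and Welch's interval uses the standard error $\sqrt{S_X^2/m + S_Y^2/n}$ with estimated degrees of freedom $\hat f = (S_X^2/m + S_Y^2/n)^2 / [(S_X^2/m)^2/(m-1) + (S_Y^2/n)^2/(n-1)]$. The C++ standard library has no t quantile function, so the sketch assumes the caller looks up $t_{f,1-\alpha/2}$ (e.g. from a table). `MeanDiff`, `pairedT`, `welch`, `confidenceInterval` and `containsZero` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Estimate of mu_X - mu_Y, its estimated standard error, and degrees of freedom.
struct MeanDiff { double estimate; double stdError; double df; };
struct Interval { double lower; double upper; };

double mean(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s / static_cast<double>(v.size());
}

double sampleVariance(const std::vector<double>& v) {
    double m = mean(v), ss = 0.0;
    for (double x : v) ss += (x - m) * (x - m);
    return ss / static_cast<double>(v.size() - 1);
}

// Paired-t: needs as many system as model observations (m = n); X_j and Y_j
// may be correlated because only the differences Z_j = X_j - Y_j are used.
MeanDiff pairedT(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    std::vector<double> z(x.size());
    for (std::size_t j = 0; j < x.size(); ++j) z[j] = x[j] - y[j];
    double n = static_cast<double>(z.size());
    return {mean(z), std::sqrt(sampleVariance(z) / n), n - 1.0};
}

// Welch: sample sizes may differ (both > 1), but X and Y must be independent.
// Assumes the two sample variances are not both zero (otherwise df is undefined).
MeanDiff welch(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() > 1 && y.size() > 1);
    double nx = static_cast<double>(x.size()), ny = static_cast<double>(y.size());
    double vx = sampleVariance(x) / nx, vy = sampleVariance(y) / ny;
    double df = (vx + vy) * (vx + vy) / (vx * vx / (nx - 1.0) + vy * vy / (ny - 1.0));
    return {mean(x) - mean(y), std::sqrt(vx + vy), df};
}

// tCritical is t_{df, 1 - alpha/2}, looked up by the caller.
Interval confidenceInterval(const MeanDiff& d, double tCritical) {
    double h = tCritical * d.stdError;
    return {d.estimate - h, d.estimate + h};
}

bool containsZero(const Interval& ci) { return ci.lower <= 0.0 && 0.0 <= ci.upper; }
```

For example, with `pairedT` on n = 3 pairs (df = 2), one would pass t_{2,0.95} ≈ 2.920 for a 90% interval and then check `containsZero`; for `welch`, the returned (generally non-integer) df determines which t value to look up.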
