CS626 Data Analysis and Simulation
Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu
Today: Trace Driven Simulation
Reference: Law/Kelton, Simulation Modeling and Analysis, Ch 5.6.
What is trace-driven simulation?
General idea
Replace sequence of pseudo random numbers by measurement data from real
system (historical data) in a simulation run.
Examples
Workload model: arrival times and types of requests/tasks
Machine model: service times for tasks, failures and repair times
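A minimal sketch of the general idea, not taken from the lecture: a single-server FIFO queue simulated with Lindley's recursion. The function name `queue_delays` and the small inline trace are hypothetical. The same function runs once on "measured" values (the trace) and once on values drawn from a pseudo random number generator.

```python
import random

def queue_delays(interarrival_times, service_times):
    """Delay in queue of each customer in a single-server FIFO queue.

    Lindley's recursion: D[i] = max(0, D[i-1] + S[i-1] - A[i]), where A[i] is
    the time between the arrivals of customers i-1 and i. The first customer
    finds an empty system and therefore has delay 0.
    """
    delays = []
    prev_delay, prev_service = 0.0, 0.0
    for a, s in zip(interarrival_times, service_times):
        d = max(0.0, prev_delay + prev_service - a)
        delays.append(d)
        prev_delay, prev_service = d, s
    return delays

# Trace-driven run: inputs are (hypothetical) measurements from the real system.
trace_interarrivals = [0.4, 1.2, 0.3, 0.2, 2.0]
trace_services = [1.0, 0.5, 0.9, 0.4, 0.3]
trace_delays = queue_delays(trace_interarrivals, trace_services)

# Ordinary run: inputs are sampled from assumed exponential distributions.
rng = random.Random(1)
random_interarrivals = [rng.expovariate(1.0) for _ in range(5)]
random_services = [rng.expovariate(1.0 / 0.6) for _ in range(5)]
random_delays = queue_delays(random_interarrivals, random_services)

print("trace-driven:", trace_delays)
print("random input:", random_delays)
```

Only the source of the input values changes between the two runs; the model itself is untouched.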
Purpose
Model validation: Does the model represent the real system well enough?
We need to compare the output of a simulation with the measurement data.
Why is trace-driven simulation not used for production runs?
Can only reproduce what happened historically
Seldom enough data for all scenarios of interest
Data limited to set of observed values (finite set of discrete values)
Trace driven simulation
Idea: feed measurement data into simulation model
Example from MAP fitting work by Casale, Smirni et al.
Trace-Driven Simulation
Advantages
Credibility
Easy Validation: Compare simulation results with measured data
Accurate Workload: Models correlation and dependencies
Detailed Workload: Can study effect of small changes
Less Randomness: Input is deterministic
Fair Comparison: Alternatives can be compared on the same input, which is better than random input
Disadvantages
Complexity: May be too detailed for simulation model
Representativeness: Historical data may be outdated, may refer to a very particular load situation and system configuration
Finiteness: Simulation must stop at end of data
Space: May take enormous amount of space
Single Point of Validation: One particular scenario in design space
Parameterization: Workload data difficult to parameterize/adjust
Comparing simulated and measured behavior
Basic Inspection Approach
To compare simulated and measured behavior, run simulation with
input values sampled from a distribution and compare to measurement data.
Seems like a classical area for statistical tests
Are both sets of samples from the same distribution?
But: Tests assume samples are i.i.d., whereas simulated output is usually correlated and NOT independent.
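To illustrate why the i.i.d. assumption fails, here is a small sketch (an illustration under assumed parameters, not part of the lecture) that estimates the lag-1 autocorrelation of successive delays in an M/M/1 queue with λ=1, ρ=.6 and compares it with that of genuinely independent samples. The helper names `mm1_delays` and `lag1_autocorrelation` are hypothetical.

```python
import random

def mm1_delays(n, arrival_rate, service_rate, rng):
    """Delays in queue of the first n customers of an M/M/1 queue (empty start)."""
    delays, d, prev_service = [], 0.0, 0.0
    for _ in range(n):
        a = rng.expovariate(arrival_rate)       # interarrival time
        d = max(0.0, d + prev_service - a)      # Lindley's recursion
        delays.append(d)
        prev_service = rng.expovariate(service_rate)
    return delays

def lag1_autocorrelation(xs):
    """Sample correlation between neighbouring observations xs[i], xs[i+1]."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

rng = random.Random(42)
delays = mm1_delays(20000, 1.0, 1.0 / 0.6, rng)
iid_samples = [rng.expovariate(1.0) for _ in range(20000)]

rho_delays = lag1_autocorrelation(delays)
rho_iid = lag1_autocorrelation(iid_samples)
print("lag-1 autocorrelation of delays:     ", round(rho_delays, 3))
print("lag-1 autocorrelation of i.i.d. data:", round(rho_iid, 3))
```

The delays should come out clearly positively correlated, while the i.i.d. samples stay near zero, which is why a standard two-sample test cannot be applied directly to the delay sequences.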
What if we compare estimates of performance measures?
Law/Kelton compares 2 M/M/1 systems
System X is M/M/1 with λ=1, ρ=.6
Model Y is M/M/1 with λ=1, ρ=.5
Observation: Sequence of delays in queue Di; let’s compare estimated means for the first 200 customers (correct: E(X)=.87, E(Y)=.49)
        µX      µY      µX-µY
Exp 1   0.90    0.70     0.20
Exp 2   0.70    0.71    -0.01
Exp 3   1.08    0.35     0.73
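The experiment can be repeated with a few lines of Python. This is a sketch under the parameters stated on the slide (λ=1, ρ=.6 versus ρ=.5, first 200 customers, empty start); the function name `mean_delay_mm1` and the seed are arbitrary choices, so the printed numbers will not match the three experiments listed above.

```python
import random

def mean_delay_mm1(n, arrival_rate, utilization, rng):
    """Average delay in queue of the first n customers of an M/M/1 queue."""
    service_rate = arrival_rate / utilization
    total, d, prev_service = 0.0, 0.0, 0.0
    for _ in range(n):
        a = rng.expovariate(arrival_rate)
        d = max(0.0, d + prev_service - a)   # Lindley's recursion
        total += d
        prev_service = rng.expovariate(service_rate)
    return total / n

rng = random.Random(2024)
experiments = []
for k in range(3):
    x = mean_delay_mm1(200, 1.0, 0.6, rng)   # "system" X
    y = mean_delay_mm1(200, 1.0, 0.5, rng)   # "model" Y, independent input
    experiments.append((x, y, x - y))
    print(f"Exp {k + 1}: X={x:.2f}  Y={y:.2f}  X-Y={x - y:.2f}")
```

Because X and Y are each a single noisy estimate from independent inputs, the difference X-Y can vary widely from one experiment to the next.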
Trace-driven Simulation, Correlated Inspection Approach
Correlation is good ...
If System and Model face exactly the same observations from input
RVs, then comparison should be more precise due to correlation.
Why is that?
Say RV X corresponds to the system, Y to the model
Recall: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y); with a=1, b=-1 this gives Var(X-Y) = Var(X) + Var(Y) - 2 Cov(X,Y)
If X and Y are independent because the simulation draws from a distribution to produce values for Y, then Cov(X,Y)=0 and Var(X-Y) = Var(X) + Var(Y)
If the model follows the measured input data of the system, then we can expect that X and Y are positively correlated, such that Cov(X,Y) > 0 and Var(X-Y) is reduced.
Law/Kelton contains an illustrative example showing that trace-driven and ordinary simulation both produce comparable estimates for the mean of a performance measure, but trace-driven simulation results in a smaller variance of the estimate.
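The variance argument can be checked empirically. The sketch below is an illustration, not Law/Kelton's own example: it assumes the system X is M/M/1 with λ=1, ρ=.6, the model Y is M/M/1 with ρ=.5, and that the trace-driven model reuses the system's interarrival times and (rescaled) service requirements. All names are hypothetical.

```python
import random
import statistics

def avg_delay(interarrivals, services):
    """Average delay in queue for a single-server FIFO queue, empty start."""
    total, d, prev_service = 0.0, 0.0, 0.0
    for a, s in zip(interarrivals, services):
        d = max(0.0, d + prev_service - a)   # Lindley's recursion
        total += d
        prev_service = s
    return total / len(interarrivals)

def one_experiment(rng, n=200):
    """Return (X - Y_independent, X - Y_trace) for one replication."""
    # System X: its inputs play the role of the measured trace.
    arrivals = [rng.expovariate(1.0) for _ in range(n)]
    work = [rng.expovariate(1.0) for _ in range(n)]   # unit-mean requirements
    x = avg_delay(arrivals, [0.6 * w for w in work])
    # Trace-driven model: same arrivals and requirements, faster server.
    y_trace = avg_delay(arrivals, [0.5 * w for w in work])
    # Ordinary model: fresh, independent samples from the fitted distributions.
    y_indep = avg_delay([rng.expovariate(1.0) for _ in range(n)],
                        [rng.expovariate(1.0 / 0.5) for _ in range(n)])
    return x - y_indep, x - y_trace

rng = random.Random(7)
results = [one_experiment(rng) for _ in range(500)]
diff_indep = [r[0] for r in results]
diff_trace = [r[1] for r in results]
var_indep = statistics.variance(diff_indep)
var_trace = statistics.variance(diff_trace)
print("mean X-Y, independent input:", round(statistics.mean(diff_indep), 3))
print("mean X-Y, trace-driven input:", round(statistics.mean(diff_trace), 3))
print("Var(X-Y), independent input:", round(var_indep, 4))
print("Var(X-Y), trace-driven input:", round(var_trace, 4))
```

Both approaches estimate roughly the same mean difference, but the shared input makes X and Y positively correlated, so the trace-driven differences scatter much less.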
Technical Issues
Given: Sequence of interarrival times for tasks/requests
How to incorporate data into a model (here Mobius)?
If not directly supported, we need to find a work-around ...
Problem 1: Need to make data in file accessible
Output gates of an activity allow us to write C++ code segments
Open file and load data into some internal data structure like an array
Use an immediate, one-time activity to load data from file into array
Problem 2: Store data such that an activity can access it
Define an extended place to hold an array of floating point values
State variables are accessible in activities since behavior can be state-dependent
Problem 3: Make activity fire according to given interarrival times
Define a timed activity with a deterministic delay
Define the parameter of that delay to be the value at the current position in the array of interarrival times
Increment the current position in the output gate of that activity
Problem 4: Check if dynamic behavior is as expected (with trace)
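The four steps can be mimicked outside of Mobius in a few lines of Python. This is only a language-neutral sketch of the logic, not Mobius code and not its API: `load_trace` stands in for the one-time loading activity, the `times` list for the extended place, and `fire` for the timed activity with its output gate. All names and the file format (one interarrival time per line) are assumptions.

```python
import os
import tempfile

def load_trace(path):
    """Problem 1: read one interarrival time per line into a list (the array)."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

class TraceArrivals:
    """Mimics a timed activity whose deterministic delay comes from the trace."""

    def __init__(self, interarrival_times):
        self.times = interarrival_times   # Problem 2: state the activity can read
        self.position = 0                 # current position in the array
        self.clock = 0.0

    def has_next(self):
        return self.position < len(self.times)

    def fire(self):
        """Problem 3: wait for the current delay, then advance the position."""
        self.clock += self.times[self.position]
        self.position += 1
        return self.clock

# Demo with a small hypothetical trace file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("0.5\n1.25\n0.25\n")
    trace_path = f.name

source = TraceArrivals(load_trace(trace_path))
arrival_times = []
while source.has_next():
    arrival_times.append(source.fire())
os.remove(trace_path)

# Problem 4: check that the arrivals happen at the cumulated trace times.
print(arrival_times)  # [0.5, 1.75, 2.0]
```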
Improvements
Change array into a ring buffer
load more data on-demand as necessary
uses less space
requires less configuration effort
Encapsulate aspect into a separate atomic model
Reuse same model to read multiple files for different input streams
Requires some way to assign filenames appropriately
Note: Many more ways to do this
Mobius supports user defined libraries with C++ code
Possible to implement file access with particular methods in a library
Provide an iterator concept to access numerical entries in a file
Memory mapped files as an alternative to arrays
Have a robust parser for file access with an appropriate exception handling
...
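Several of these improvements (on-demand loading, an iterator over numerical entries, a parser with exception handling) can be combined in one small sketch. It is written in Python for brevity and is not a Mobius library; a bounded `collections.deque` stands in for the ring buffer, and the names `iter_trace`, `TraceParseError`, and `chunk_size` are hypothetical.

```python
import os
import tempfile
from collections import deque

class TraceParseError(ValueError):
    """Raised when a line of the trace file is not a valid interarrival time."""

def iter_trace(path, chunk_size=2):
    """Yield interarrival times, keeping at most chunk_size values in memory."""
    buffer = deque(maxlen=chunk_size)
    with open(path) as f:
        line_no = 0
        while True:
            # Refill the buffer on demand until it is full or the file ends.
            while len(buffer) < chunk_size:
                line = f.readline()
                if not line:
                    break
                line_no += 1
                text = line.strip()
                if not text:
                    continue  # tolerate blank lines
                try:
                    value = float(text)
                except ValueError as exc:
                    raise TraceParseError(
                        f"{path}:{line_no}: not a number: {text!r}") from exc
                if value < 0:
                    raise TraceParseError(
                        f"{path}:{line_no}: negative interarrival time")
                buffer.append(value)
            if not buffer:
                return  # end of data: the simulation must stop here
            yield buffer.popleft()

# Demo with a small hypothetical trace file containing a blank line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("0.5\n\n1.5\n2.0\n")
    trace_path = f.name

values = list(iter_trace(trace_path))
os.remove(trace_path)
print(values)  # [0.5, 1.5, 2.0]
```

Since the file name is a parameter, the same iterator can serve several input streams, one file each.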
Furthermore
Law/Kelton, Section 5.6.2
If we can obtain
m independent sets of system data
n independent sets of simulation data
we can take advantage of the independence and calculate confidence intervals for µX-µY
Options
paired-t approach: n=m, but pairs can be correlated
Welch approach: any values of n, m > 1, but X, Y must be independent
Need to check if 0 is contained in interval between lower and upper bound
Statistically significant vs practically significant
Practically significant: Magnitude of difference invalidates any inference about the system that would be derived from the model
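Both interval constructions fit in a short Python sketch using only the standard library. As an illustration it reuses the three experiments from the basic inspection slide as data, which is far too few for a serious analysis. The t quantiles are hard-coded from a standard t table for a 90% interval (2.920 for 2 degrees of freedom, 2.353 for 3); rounding Welch's estimated degrees of freedom down to 3 is a conservative choice made here, not something the slides prescribe. Function names are hypothetical.

```python
import math
import statistics

def paired_t_interval(xs, ys, t_crit):
    """CI for µX-µY from n pairs; t_crit is the t quantile with n-1 df."""
    if len(xs) != len(ys):
        raise ValueError("paired-t approach needs n = m")
    z = [x - y for x, y in zip(xs, ys)]
    half = t_crit * math.sqrt(statistics.variance(z) / len(z))
    center = statistics.mean(z)
    return center - half, center + half

def welch_df(xs, ys):
    """Estimated degrees of freedom for the Welch approach."""
    vx = statistics.variance(xs) / len(xs)
    vy = statistics.variance(ys) / len(ys)
    return (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))

def welch_interval(xs, ys, t_crit):
    """CI for µX-µY from independent samples; t_crit matches welch_df(xs, ys)."""
    vx = statistics.variance(xs) / len(xs)
    vy = statistics.variance(ys) / len(ys)
    half = t_crit * math.sqrt(vx + vy)
    center = statistics.mean(xs) - statistics.mean(ys)
    return center - half, center + half

system = [0.90, 0.70, 1.08]   # estimates for X from Exp 1 to 3
model = [0.70, 0.71, 0.35]    # estimates for Y from Exp 1 to 3

T_90_DF2 = 2.920   # t table, 0.95 quantile, 2 degrees of freedom
T_90_DF3 = 2.353   # t table, 0.95 quantile, 3 degrees of freedom

paired_lo, paired_hi = paired_t_interval(system, model, T_90_DF2)
df = welch_df(system, model)
welch_lo, welch_hi = welch_interval(system, model, T_90_DF3)

print(f"paired-t 90% interval: [{paired_lo:.3f}, {paired_hi:.3f}]")
print(f"Welch df estimate: {df:.2f}")
print(f"Welch 90% interval: [{welch_lo:.3f}, {welch_hi:.3f}]")
print("0 inside paired-t interval:", paired_lo <= 0.0 <= paired_hi)
```

With only three observations per side, both intervals contain 0, so the data do not show a statistically significant difference between system and model even though the true means differ.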