
CS626 Data Analysis and Simulation



  1. CS626 Data Analysis and Simulation
     Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu
     Today: Trace Driven Simulation
     Reference: Law/Kelton, Simulation Modeling and Analysis, Ch. 5.6

  2. What is trace-driven simulation?
     General idea
       - Replace the sequence of pseudo random numbers in a simulation run with measurement data from the real system (historical data).
     Examples
       - Workload model: arrival times and types of requests/tasks
       - Machine model: service times for tasks, failure and repair times
     Purpose
       - Model validation: does the model represent the real system well enough? We need to compare the output of a simulation with the measurement data.
     Why is trace-driven simulation not used for production runs?
       - It can only reproduce what happened historically.
       - There is seldom enough data for all scenarios of interest.
       - Data is limited to the set of observed values (a finite set of discrete values).
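The general idea can be illustrated with a minimal sketch, assuming a single-server FIFO queue and a trace that records one interarrival time and one service time per customer. The function name `delays_from_trace` and the four-customer trace are hypothetical and do not come from the lecture; the point is only that no pseudo random numbers are drawn.

```python
from typing import List, Sequence


def delays_from_trace(interarrivals: Sequence[float],
                      services: Sequence[float]) -> List[float]:
    """Delay in queue of each customer in a FIFO single-server queue.

    interarrivals[i] is the time between arrival i-1 and arrival i
    (interarrivals[0] is the time of the first arrival); services[i] is
    the service time of customer i. Both sequences come from a measured
    trace, so the run is fully deterministic.
    """
    if len(interarrivals) != len(services):
        raise ValueError("trace columns must have equal length")
    delays: List[float] = []
    delay = 0.0  # the first customer finds an empty system
    for i in range(len(services)):
        if i > 0:
            # Lindley recursion: D_i = max(0, D_{i-1} + S_{i-1} - A_i)
            delay = max(0.0, delay + services[i - 1] - interarrivals[i])
        delays.append(delay)
    return delays


# Hypothetical four-customer trace (time units are arbitrary).
trace_interarrivals = [0.5, 1.0, 0.25, 0.25]
trace_services = [0.75, 1.0, 0.5, 0.125]
trace_delays = delays_from_trace(trace_interarrivals, trace_services)
print(trace_delays)  # [0.0, 0.0, 0.75, 1.0]
```

Running the same function on the measured trace and comparing `trace_delays` with the delays measured on the real system is the validation step described above.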

  3. Trace driven simulation
       - Idea: feed measurement data into the simulation model
       - Example from MAP fitting work by Casale, Smirni et al.

  4. Trace-Driven Simulation
     Advantages
       - Credibility
       - Easy validation: compare simulation results with measured data
       - Accurate workload: models correlation and dependencies
       - Detailed workload: can study the effect of small changes
       - Less randomness: input is deterministic
       - Fair comparison: better than random input
     Disadvantages
       - Complexity: may be too detailed for the simulation model
       - Representativeness: historical data may be outdated, or may refer to a very particular load situation and system configuration
       - Finiteness: simulation must stop at the end of the data
       - Space: may take an enormous amount of space
       - Single point of validation: one particular scenario in the design space
       - Parameterization: workload data is difficult to parameterize/adjust

  5. Comparing simulated and measured behavior
     Basic Inspection Approach
       - To compare simulated and measured behavior, run the simulation with input values sampled from a distribution and compare the output to measurement data.
       - Seems a classical area for statistical tests: are both sets of samples from the same distribution?
       - But: tests assume samples are i.i.d., while simulated output is usually correlated and NOT independent.
       - What if we compare estimates of performance measures?
     Law/Kelton compares 2 M/M/1 systems
       - System X is M/M/1 with $\lambda = 1$, $\rho = 0.6$
       - Model Y is M/M/1 with $\lambda = 1$, $\rho = 0.5$
       - Observation: sequence of delays in queue $D_i$. Compare the estimated mean delay of the first 200 customers; the correct values are $E(X) = 0.87$ and $E(Y) = 0.49$, a true difference of 0.38.
       - Exp 1: $\hat{\mu}_X = 0.90$, $\hat{\mu}_Y = 0.70$, $\hat{\mu}_X - \hat{\mu}_Y = 0.20$
       - Exp 2: $\hat{\mu}_X = 0.70$, $\hat{\mu}_Y = 0.71$, $\hat{\mu}_X - \hat{\mu}_Y = -0.01$
       - Exp 3: $\hat{\mu}_X = 1.08$, $\hat{\mu}_Y = 0.35$, $\hat{\mu}_X - \hat{\mu}_Y = 0.73$
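A rough sketch of such an experiment, assuming both queues start empty and that every run draws its own independent exponential interarrival and service times. The helper `mean_delay_mm1` and the seed are hypothetical choices; the numbers printed will differ from the Law/Kelton values quoted above, but they scatter in a similar way from run to run.

```python
import random


def mean_delay_mm1(arrival_rate, utilization, customers, rng):
    """Average delay in queue of the first `customers` customers of an
    M/M/1 queue that starts empty (Lindley recursion)."""
    mean_service = utilization / arrival_rate
    delay, total = 0.0, 0.0
    for _ in range(customers):
        total += delay
        service = rng.expovariate(1.0 / mean_service)  # argument is a rate
        interarrival = rng.expovariate(arrival_rate)
        delay = max(0.0, delay + service - interarrival)
    return total / customers


rng = random.Random(626)  # arbitrary seed, only for reproducibility
experiments = []
for _ in range(3):
    x = mean_delay_mm1(1.0, 0.6, 200, rng)  # plays the role of system X
    y = mean_delay_mm1(1.0, 0.5, 200, rng)  # model Y, independent inputs
    experiments.append((x, y, x - y))

for number, (x, y, diff) in enumerate(experiments, start=1):
    print(f"Exp {number}: X = {x:.2f}  Y = {y:.2f}  X - Y = {diff:.2f}")
```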

  6. Trace-driven Simulation, Correlated Inspection Approach
     Correlation is good ...
       - If system and model face exactly the same observations from the input RVs, the comparison should be more precise due to correlation.
     Why is that?
       - Say RV $X$ corresponds to the system, $Y$ to the model.
       - Recall: $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$
       - If $X$ and $Y$ are independent because the simulation draws from a distribution to produce values for $Y$, then $\mathrm{Cov}(X, Y) = 0$ and $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
       - If the model follows the measured input data of the system, we can expect $X$ and $Y$ to be positively correlated, such that $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y)$ and $\mathrm{Var}(X - Y)$ is reduced.
     Law/Kelton contains an example that illustrates this
       - Trace-driven and ordinary simulation both produce comparable estimates for the mean of a performance measure, but trace-driven simulation results in a smaller variance of the estimate.
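The variance argument can be checked numerically with a small sketch. It assumes the M/M/1 setting from the comparison slide ($\lambda = 1$, $\rho = 0.6$ for the system, $\rho = 0.5$ for the model) and, as a further assumption not stated in the slides, that the trace-driven model reuses the system's interarrival times and rescales its service times by 0.5/0.6. All function names are hypothetical.

```python
import random
import statistics


def mean_delay(interarrivals, services):
    """Average delay in queue over all customers of a FIFO single-server
    queue that starts empty (the first customer has delay 0)."""
    delay, total = 0.0, 0.0
    for i in range(1, len(services)):
        delay = max(0.0, delay + services[i - 1] - interarrivals[i])
        total += delay
    return total / len(services)


def one_experiment(rng, customers=200):
    """Return (X - Y_independent, X - Y_trace) for one replication."""
    arrivals = [rng.expovariate(1.0) for _ in range(customers)]
    services = [rng.expovariate(1.0 / 0.6) for _ in range(customers)]
    x = mean_delay(arrivals, services)  # system, rho = 0.6

    # Correlated inspection: the model is fed the same trace, with the
    # service times rescaled to rho = 0.5 (assumption of this sketch).
    y_trace = mean_delay(arrivals, [s * 0.5 / 0.6 for s in services])

    # Basic inspection: the model draws its own independent inputs.
    own_arrivals = [rng.expovariate(1.0) for _ in range(customers)]
    own_services = [rng.expovariate(1.0 / 0.5) for _ in range(customers)]
    y_indep = mean_delay(own_arrivals, own_services)
    return x - y_indep, x - y_trace


rng = random.Random(626)  # arbitrary seed, only for reproducibility
results = [one_experiment(rng) for _ in range(500)]
var_indep = statistics.variance([r[0] for r in results])
var_trace = statistics.variance([r[1] for r in results])
print(f"Var(X - Y), independent inputs: {var_indep:.4f}")
print(f"Var(X - Y), shared trace:       {var_trace:.4f}")
```

With a shared trace the two delay sequences rise and fall together, so the covariance term removes most of the variance of the difference.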

  7. Technical Issues
     Given: sequence of interarrival times for tasks/requests
     How to incorporate data into a model (here Mobius)?
       - If not directly supported, we need to find a work-around ...
     Problem 1: Need to make data in a file accessible
       - Output gates of an activity allow us to write C++ code segments
       - Open the file and load the data into some internal data structure like an array
       - Use an immediate, one-time activity to load the data from the file into the array
     Problem 2: Store data such that an activity can access it
       - Define an extended place to hold an array of floating point values
       - State variables are accessible in activities since behavior can be state-dependent
     Problem 3: Make an activity fire according to the given interarrival times
       - Define a timed activity with a deterministic delay
       - Define the parameter of that delay to be the value at the current position in the array of interarrival times
       - Increment the current position in the output gate of that activity
     Problem 4: Check if the dynamic behavior is as expected (with trace)
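The four steps above can be sketched outside Mobius in plain Python. This is not Mobius gate code and uses no Mobius API; the file layout (one interarrival time per line), the file name and the function names are assumptions made only for illustration.

```python
import os
import tempfile
from typing import Iterator, List, Tuple


def load_interarrivals(path: str) -> List[float]:
    """Problems 1 and 2: read the trace once and keep it in an array."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def arrival_events(interarrivals: List[float]) -> Iterator[Tuple[int, float]]:
    """Problem 3: every firing waits exactly the delay stored at the
    current position, then the position is advanced."""
    clock = 0.0
    position = 0
    while position < len(interarrivals):
        clock += interarrivals[position]  # deterministic delay
        yield position, clock
        position += 1                     # the "output gate" step


# Problem 4: check the dynamic behavior on a tiny hypothetical trace.
with tempfile.TemporaryDirectory() as folder:
    trace_path = os.path.join(folder, "interarrivals.txt")
    with open(trace_path, "w") as handle:
        handle.write("0.5\n1.25\n0.25\n")
    events = list(arrival_events(load_interarrivals(trace_path)))

print(events)  # [(0, 0.5), (1, 1.75), (2, 2.0)]
```

The event stream ends when the array is exhausted, which mirrors the finiteness limitation of trace-driven runs.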

  8. Improvements
     Change the array into a ring buffer
       - load more data on demand as necessary
       - uses less space
       - requires less configuration effort
     Encapsulate the aspect into a separate atomic model
       - Reuse the same model to read multiple files for different input streams
       - Requires some way to assign filenames appropriately
     Note: many more ways to do this
       - Mobius supports user-defined libraries with C++ code
       - Possible to implement file access with particular methods in a library
       - Provide an iterator concept to access numerical entries in a file
       - Memory-mapped files as an alternative to arrays
       - Have a robust parser for file access with appropriate exception handling
       - ...
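As one possible reading of the iterator and on-demand ideas, the sketch below streams a trace through a small bounded FIFO buffer (a simplification of a true ring buffer) and reports malformed lines with their line number. The names `iter_trace` and `TraceParseError` are hypothetical, and the sketch is plain Python rather than a Mobius C++ library.

```python
import io
from collections import deque
from typing import Iterator, TextIO


class TraceParseError(ValueError):
    """Raised when a trace line is not a valid interarrival time."""


def iter_trace(handle: TextIO, buffer_size: int = 4) -> Iterator[float]:
    """Yield interarrival times while holding at most `buffer_size`
    parsed values in memory at any time."""
    buffer = deque()
    line_number = 0
    while True:
        # Refill on demand instead of loading the whole file up front.
        while len(buffer) < buffer_size:
            line = handle.readline()
            if not line:  # end of file
                break
            line_number += 1
            text = line.strip()
            if not text:  # tolerate blank lines
                continue
            try:
                value = float(text)
            except ValueError as error:
                raise TraceParseError(
                    f"line {line_number}: not a number: {text!r}") from error
            if value < 0.0:
                raise TraceParseError(
                    f"line {line_number}: negative interarrival time")
            buffer.append(value)
        if not buffer:
            return
        yield buffer.popleft()


# An in-memory file stands in for a real trace file here.
values = list(iter_trace(io.StringIO("0.5\n\n1.25\n0.25\n"), buffer_size=2))
print(values)  # [0.5, 1.25, 0.25]
```

Because the function accepts any open text handle, the same code can serve several input streams by opening a different file for each.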

  9. Furthermore
     Law/Kelton, Section 5.6.2
       - If we can obtain m independent sets of system data and n independent sets of simulation data, we can take advantage of the independence and calculate confidence intervals for $\mu_X - \mu_Y$.
     Options
       - Paired-t approach: n = m, but pairs can be correlated
       - Welch approach: any values of n, m > 1, but X, Y must be independent
       - Need to check if 0 is contained in the interval between lower and upper bound
     Statistically significant vs practically significant
       - Practically significant: the magnitude of the difference invalidates any inference about the system that would be derived from the model
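Both options can be sketched with the standard library alone, assuming the caller looks up the t critical value in a table. The sample values reuse the three experiments from the comparison slide purely as example numbers; the function names are hypothetical, and 2.920 is the tabulated 0.95 quantile of the t distribution with 2 degrees of freedom (a 90% two-sided interval).

```python
import math
import statistics


def paired_t_interval(xs, ys, t_critical):
    """Confidence interval for mu_X - mu_Y from paired observations
    (requires n = m; the pairs themselves may be correlated)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need n = m >= 2 paired observations")
    diffs = [x - y for x, y in zip(xs, ys)]
    center = statistics.mean(diffs)
    half_width = t_critical * math.sqrt(statistics.variance(diffs) / len(diffs))
    return center - half_width, center + half_width


def welch_statistics(xs, ys):
    """Return (difference of means, standard error, estimated degrees of
    freedom) for the Welch approach; X and Y must be independent."""
    vx = statistics.variance(xs) / len(xs)
    vy = statistics.variance(ys) / len(ys)
    dof = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return statistics.mean(xs) - statistics.mean(ys), math.sqrt(vx + vy), dof


system = [0.90, 0.70, 1.08]  # estimates for X in Exp 1 to 3
model = [0.70, 0.71, 0.35]   # estimates for Y in Exp 1 to 3

lower, upper = paired_t_interval(system, model, t_critical=2.920)
print(f"paired-t 90% interval: [{lower:.3f}, {upper:.3f}]")
print("0 inside interval:", lower <= 0.0 <= upper)

diff, std_error, dof = welch_statistics(system, model)
print(f"Welch: diff = {diff:.3f}, s.e. = {std_error:.3f}, d.o.f. = {dof:.2f}")
```

For the Welch interval, the t critical value is looked up for the estimated degrees of freedom and applied as `diff ± t * std_error`. With only three pairs the paired-t interval is wide and contains 0, so this data alone would not show a statistically significant difference.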
