 
Computer Science, Informatik 4 Communication and Distributed Systems
Simulation
Modeling and Performance Analysis with Discrete-Event Simulation
Dr. Mesut Güneş
Chapter 9
Input Modeling
Contents
• Data Collection
• Identifying the Distribution with Data
• Parameter Estimation
• Goodness-of-Fit Tests
• Fitting a Nonstationary Poisson Process
• Selecting Input Models without Data
• Multivariate and Time-Series Input Data
Purpose & Overview
• Input models provide the driving force for a simulation model.
• The quality of the output is no better than the quality of the inputs.
• In this chapter, we will discuss the 4 steps of input model development:
  1) Collect data from the real system
  2) Identify a probability distribution to represent the input process
  3) Choose parameters for the distribution
  4) Evaluate the chosen distribution and parameters for goodness of fit
Data Collection
• One of the biggest tasks in solving a real problem
  • GIGO: Garbage-In-Garbage-Out
[Figure: diagram with the labels System, Raw Data (Input), Simulation, Performance Data (Output)]
• Even when the model structure is valid, simulation results can be misleading if the input data is
  • inaccurately collected
  • inappropriately analyzed
  • not representative of the environment
Data Collection
• Suggestions that may enhance and facilitate data collection:
  • Plan ahead: begin with a practice or pre-observation session, and watch for unusual circumstances
  • Analyze the data as it is being collected: check its adequacy
  • Combine homogeneous data sets: successive time periods, or the same time period on successive days
  • Be aware of data censoring: the quantity is not observed in its entirety, so there is a danger of leaving out long process times
  • Check for relationships between variables (scatter diagram)
  • Check for autocorrelation
  • Collect input data, not performance data
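The suggestion to check for autocorrelation can be illustrated with a small sketch. It computes the sample autocorrelation at a given lag; the function name `lag_autocorrelation` and the interarrival times are hypothetical and invented for illustration, they are not taken from the slides. A value close to zero is consistent with independent observations, while a value far from zero suggests that successive observations are related.

```python
from statistics import mean

def lag_autocorrelation(data, lag=1):
    """Sample autocorrelation of `data` at the given lag."""
    n = len(data)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(data) - 1")
    m = mean(data)
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - m) ** 2 for x in data)
    if denom == 0:
        raise ValueError("data has zero variance")
    # Numerator: co-variation of observations that are `lag` apart.
    num = sum((data[i] - m) * (data[i + lag] - m) for i in range(n - lag))
    return num / denom

# Hypothetical interarrival times in minutes (illustrative values only).
interarrival = [1.2, 1.4, 1.3, 2.8, 3.1, 2.9, 1.1, 1.0, 1.3, 3.0, 3.2, 2.7]
r1 = lag_autocorrelation(interarrival, lag=1)
print(f"lag-1 autocorrelation: {r1:.3f}")
```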
Identifying the Distribution
• Histograms
• Scatter Diagrams
• Selecting families of distributions
• Parameter estimation
• Goodness-of-fit tests
• Fitting a non-stationary process
Histograms
• A frequency distribution or histogram is useful in determining the shape of a distribution
• The number of class intervals depends on:
  • The number of observations
  • The dispersion of the data
  • Suggested number of intervals: the square root of the sample size
• For continuous data:
  • Corresponds to the probability density function of a theoretical distribution
• For discrete data:
  • Corresponds to the probability mass function
• If few data points are available
  • combine adjacent cells to eliminate the ragged appearance of the histogram
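The square-root rule for the number of class intervals can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper `histogram` and the exponentially distributed sample are assumptions made for the example, and equal-width intervals are only one possible choice.

```python
import math
import random

def histogram(data, num_bins=None):
    """Return (edges, counts) for equal-width class intervals."""
    n = len(data)
    if num_bins is None:
        # Rule of thumb from the slide: about sqrt(sample size) intervals.
        num_bins = max(1, round(math.sqrt(n)))
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins or 1.0  # guard against all-equal data
    counts = [0] * num_bins
    for x in data:
        # The largest observation is placed in the last interval.
        k = min(int((x - lo) / width), num_bins - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

random.seed(1)
# Hypothetical sample of 100 observations (illustrative only).
sample = [random.expovariate(1 / 5.0) for _ in range(100)]
edges, counts = histogram(sample)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"[{left:6.2f}, {right:6.2f})  {'*' * c}")
```

Passing a smaller `num_bins` explicitly has the same effect as combining adjacent cells when the histogram looks ragged.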
Histograms
• Same data with different interval sizes
[Figure: three histograms of the same data set, drawn with different interval sizes]
Histograms – Example
• Vehicle Arrival Example: The number of vehicles arriving at an intersection between 7 am and 7:05 am was monitored for 100 random workdays.
• There are ample data, so the histogram may have a cell for each possible value in the data range

Arrivals per Period   Frequency
 0                    12
 1                    10
 2                    19
 3                    17
 4                    10
 5                     8
 6                     7
 7                     5
 8                     5
 9                     3
10                     3
11                     1

[Figure: histogram of the frequency of each number of arrivals per period, 0 to 11]
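As a small worked sketch, the frequency table of the vehicle arrival example can be turned into relative frequencies (an empirical probability mass function) and a sample mean. The variable names are chosen for illustration; the counts are those from the table.

```python
# Frequency table from the slide: arrivals per period -> number of periods.
arrivals = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
            6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}

n = sum(arrivals.values())                          # observed periods
rel_freq = {k: f / n for k, f in arrivals.items()}  # empirical pmf
sample_mean = sum(k * f for k, f in arrivals.items()) / n

print(f"periods observed: {n}")                          # 100
print(f"mean arrivals per period: {sample_mean:.2f}")    # 3.64

# Text histogram with one cell per possible value.
for k, f in arrivals.items():
    print(f"{k:2d} {'#' * f}")
```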
Histograms – Example
• Life tests were performed on electronic components at 1.5 times the nominal voltage, and their lifetime was recorded

Component Life    Frequency
0 ≤ x < 3         23
3 ≤ x < 6         10
6 ≤ x < 9          5
9 ≤ x < 12         1
12 ≤ x < 15        1
…
42 ≤ x < 45        1
…
144 ≤ x < 147      1
Histograms – Example
• Stanford University Mobile Activity Traces (SUMATRA)
  • Target community: cellular network research community
  • Traces contain mobility as well as connection information
• Available traces
  • SULAWESI (S.U. Local Area Wireless Environment Signaling Information)
  • BALI (Bay Area Location Information)
• BALI Characteristics
  • San Francisco Bay Area
  • Trace length: 24 hours
  • Number of cells: 90
  • Persons per cell: 1,100
  • Persons in total: 99,000
  • Active persons: 66,550
  • Move events: 243,951
  • Call events: 1,570,807
• Question: How to transform the BALI information so that it is usable with a network simulator, e.g., ns-2?
  • The number of nodes as well as the number of connections is too high for ns-2
Histograms – Example
• Analysis of the BALI Trace
  • Goal: Reduce the amount of data by identifying user groups
• User group
  • Between 2 local minima
  • Communication characteristic is kept in the group
  • A user represents a group
• Groups with different mobility characteristics
  • Intra- and inter-group communication
• Interesting characteristic
  • The number of people with an odd number of movements is negligible!
[Figure: 3D histogram of the number of people over the number of calls and the number of movements]
[Figure: histogram of the number of people over the number of movements]
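The idea of a user group lying between two local minima of the histogram can be sketched in one dimension. This is only an illustration under stated assumptions: the slides work with a histogram over calls and movements, whereas the sketch below uses a single hypothetical list of bin counts, and the helpers `local_minima` and `split_into_groups` are invented for this example.

```python
def local_minima(counts):
    """Indices whose count is strictly below both neighbours."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]]

def split_into_groups(counts):
    """Split bin indices into groups bounded by local minima.

    Assumption: each local minimum starts the next group.
    """
    cuts = [0] + local_minima(counts) + [len(counts)]
    return [list(range(a, b)) for a, b in zip(cuts, cuts[1:]) if a < b]

# Hypothetical histogram: number of people per bin (illustrative only).
people_per_bin = [5, 20, 40, 15, 30, 60, 25, 10, 35, 12]
groups = split_into_groups(people_per_bin)
for g in groups:
    size = sum(people_per_bin[i] for i in g)
    print(f"bins {g[0]}..{g[-1]}: {size} people, represented by one user")
```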
Scatter Diagrams
• A scatter diagram is a quality tool that can show the relationship between paired data
  • Random Variable X = Data 1
  • Random Variable Y = Data 2
  • Draw random variable X on the x-axis and Y on the y-axis
[Figure: three scatter diagrams titled Strong Correlation, Moderate Correlation, and No Correlation]
Scatter Diagrams
• Linear relationship
  • Correlation: Measures how well data line up
  • Slope: Measures the steepness of the data
  • Direction
  • Y Intercept
[Figure: two scatter diagrams titled Positive Correlation and Negative Correlation]
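The quantities listed for a linear relationship can be computed directly from paired data. The sketch below assumes the Pearson correlation coefficient and an ordinary least-squares line, which is one common reading of "correlation", "slope", and "Y intercept"; the slides do not name a specific method. The function `linear_summary` and the sample values are hypothetical.

```python
import math
from statistics import mean

def linear_summary(xs, ys):
    """Return (correlation, slope, intercept) for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    r = sxy / math.sqrt(sxx * syy)   # how well the data line up
    slope = sxy / sxx                # steepness; its sign gives the direction
    intercept = my - slope * mx      # value of Y where the line crosses X = 0
    return r, slope, intercept

# Hypothetical paired observations (illustrative only).
x = [5, 10, 15, 20, 25, 30]
y = [8, 11, 19, 22, 24, 33]
r, slope, intercept = linear_summary(x, y)
print(f"correlation={r:.3f}, slope={slope:.3f}, intercept={intercept:.3f}")
```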
Selecting the Family of Distributions
• A family of distributions is selected based on:
  • The context of the input variable
  • Shape of the histogram
• Frequently encountered distributions:
  • Easier to analyze: Exponential, Normal and Poisson
  • Harder to analyze: Beta, Gamma and Weibull