CS626 Data Analysis and Simulation
Today: Stochastic Input Modeling
Reference: Law/Kelton, Simulation Modeling and Analysis, Ch 6. NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/
Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu. Office hours: Monday, Wednesday 2-4 pm.
Input modeling: deriving a representation of the uncertainty or randomness in a simulation model's inputs.
Common representations:
- Measurement data
- Distributions derived from measurement data <-- focus of "input modeling"; usually requires that the samples are i.i.d. and that the corresponding random variables in the simulation model are i.i.d. (i.i.d. = independent and identically distributed); either theoretical distributions or an empirical distribution
- Time-dependent stochastic processes and other stochastic processes
Examples: time to failure for a machining process; demand per unit time for inventory of a product; number of defective items in a shipment of goods; times between arrivals of calls to a call center.
2
Fitting a theoretical distribution:
1. Hypothesize a candidate family, based on physical characteristics of the process and graphical examination of the data.
2. Estimate: determine values for its unknown parameters.
3. Check the quality of fit, via statistical tests and via graphical analysis.
4. If the fit is rejected, select another candidate and repeat the process, or use an empirical distribution.
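The four steps above can be sketched with SciPy (assumed available); the gamma family, the seed, and the synthetic "measurement" data are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy import stats

# Stand-in for measured i.i.d. data (illustrative; real input modeling
# would start from observations such as interarrival or service times).
rng = np.random.default_rng(42)
data = rng.gamma(shape=2.0, scale=3.0, size=500)

# Steps 1-2: hypothesize a family (here: gamma) and estimate its
# unknown parameters by maximum likelihood, fixing the location at 0.
shape, loc, scale = stats.gamma.fit(data, floc=0)

# Step 3: check the quality of fit, e.g. with a K-S test against the
# fitted distribution (estimated parameters make the p-value optimistic).
ks = stats.kstest(data, "gamma", args=(shape, loc, scale))
print(f"shape={shape:.2f} scale={scale:.2f} K-S p={ks.pvalue:.3f}")
```

If the fit were rejected in step 3, step 4 would repeat the cycle with another family or fall back to an empirical distribution.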
from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
Graphical analysis: plot the fitted distribution and the data in a way that differences can be seen.
Beyond obvious cases, there is a grey area of subjective acceptance/rejection.
Challenges:
- How much difference is significant enough to discard a fitted distribution?
- Which graphical representation is easy to judge?
Options:
- Histogram-based plots
- Probability plots: P-P plot, Q-Q plot
Goodness-of-fit tests: define a measure X for the difference between fitted distribution and data. X is a random variable, so if we find an argument for what distribution X has, we get a statistical test.
Examples: chi-square test (χ²), Kolmogorov-Smirnov test (K-S), Anderson-Darling test (A-D).
4
Goodness-of-fit tests: define a measure X for the difference between the fitted distribution and the data. The test statistic X is a random variable: a small X means a small difference, a large X means a huge difference. If we find an argument for what distribution X has, we get a statistical test.
Say P(X ≤ x) = (1-α), and e.g. this holds for x = 10 and α = 0.05. Then we know: if data is sampled from the given distribution and this is done n times (n → ∞), the measure X will be below 10 in 95% of those cases. If in our case the sample data yields x = 10.7, we can argue that it is too unlikely that the sample data is from the fitted distribution.
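As a numeric sketch of this argument (the degrees of freedom here are an illustrative assumption, not from the slides): if X follows a chi-square distribution with 4 degrees of freedom, the threshold with P(X ≤ x) = 0.95 can be computed with SciPy:

```python
from scipy import stats

# Critical value x with P(X <= x) = 1 - alpha = 0.95 for a chi-square
# distributed test statistic X with 4 degrees of freedom (illustrative).
crit = stats.chi2.ppf(0.95, df=4)
print(round(crit, 2))  # 9.49

# An observed statistic above this threshold would be deemed too
# unlikely under H0 at the 5% level, so H0 would be rejected.
```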
Hypothesis H0, alternative H1.
Alpha / Type I error: rejecting a true hypothesis.
Beta / Type II error: not rejecting a false hypothesis.
Power of a test: (1-β), the probability to correctly reject a false H0.
P-value: the probability of observing a result at least as extreme as the test statistic obtained from the sample, given that H0 is true.
The p-value is the probability of observing a result at least as extreme as the test statistic obtained from the sample, given that H0 is true. Equivalently, it is the Type I error level (significance) at which we would just reject H0.
If the α level (common values: 0.01, 0.05, 0.1) < p-value, we fail to reject H0.
If the p-value is large (> 0.10), then more extreme values than our current one are still reasonably likely, so we fail to reject H0; in this sense it supports H0 that the distribution fits (but not more than that!).
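The decision rule can be stated as a one-liner; the function name and the example p-values below are ours, not the slides':

```python
# Reject H0 exactly when the p-value falls below the chosen alpha level.
def decide(p_value: float, alpha: float = 0.05) -> str:
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.003))  # reject H0
print(decide(0.42))   # fail to reject H0
```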
Chi-square test: partition the range of the fitted distribution into k intervals, let N_j be the observed number of samples in the j-th interval, and let p_j be the theoretical probability of the j-th interval, so n·p_j is the expected frequency. The test statistic is
χ² = Σ_{j=1}^{k} (N_j − n·p_j)² / (n·p_j),
which approximately follows the chi-square distribution with k−1 degrees of freedom if H0 holds (for large n and no parameters estimated from the data).
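A minimal sketch of the statistic in code, assuming SciPy; the hypothesized exponential distribution, the equal-probability intervals, and the sample size are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 200, 10
data = rng.exponential(scale=2.0, size=n)   # illustrative sample

# k equal-probability intervals under the hypothesized Exp(mean=2),
# so p_j = 1/k and the expected frequency is n*p_j for every interval.
edges = stats.expon.ppf(np.linspace(0.0, 1.0, k + 1), scale=2.0)
edges[-1] = max(edges[-2], data.max()) + 1.0  # replace the infinite edge
observed, _ = np.histogram(data, bins=edges)
expected = np.full(k, n / k)

chi2_stat = np.sum((observed - expected) ** 2 / expected)
p_value = stats.chi2.sf(chi2_stat, df=k - 1)  # no estimated parameters
print(f"chi2={chi2_stat:.2f} p={p_value:.3f}")
```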
" " " "
! " " #
B C JB JC KB B J K L M C O @ N ! JB JJ 9(,8(!""#$%&' ."*/0*123
!""#$%&'()*"( +*"#,- ."*/0*123 B JK J JB K J! L J@ M JB C N O @ @ C N C ! L JB L JJ J
! " # ! " # ! #$ %
! &
"#$%&'%()*&%+,%-./0)"% 1!2%.3%()*&%+,%-./0)1% 4"5)6)1578915 " L! !@A L L" ;@A ! L; LM@B "@LN ? LM !L@L "@O B L; L;@! B@BL N A LB@" !@NM A M O@N "@!A M N B@B O N !@" ; ? "@O L" ? "@? P.LL L "@L L"" L""@" !M@AO M@OM LL@A!
#E7F%>,-.F,3*)',. EH.7%>.-.
Anderson-Darling test. Features: compares the empirical distribution function of the data with the distribution function of the hypothesized distribution; higher power than the K-S test.
Frequency comparison. Features: compares a histogram of the data as a line graph with the fitted density or mass function.
!"
3.45.676666 6768 3.45.967666 :;7<8 6 67" 67! 67= 679 67< 67> 67? 67; 67: " 6 < "6 "< !6 !< =6 =< 96
!
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
@AB+#.+2CD0-2.E0(1,$-
F$(.GHID0&,H.J10.K-%L
TUF.$O.2Q0. QLR$2Q01,S0D. D,12(,VC2,$- TUF.$O.2Q0. 0&R,(,HI%. D,12(,VC2,$-. H$-12(CH20D. O($&.2Q0.DI2I
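The K-S statistic behind this comparison is the largest vertical gap between the empirical CDF and the hypothesized CDF. A sketch, assuming SciPy and an illustrative standard-normal sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = np.sort(rng.normal(size=100))            # illustrative sample
n = len(x)

F = stats.norm.cdf(x)                        # hypothesized CDF at the data
i = np.arange(1, n + 1)
d_plus = np.max(i / n - F)                   # ECDF step above hypothesized CDF
d_minus = np.max(F - (i - 1) / n)            # ECDF step below hypothesized CDF
d = max(d_plus, d_minus)                     # K-S statistic

print(np.isclose(d, stats.kstest(x, "norm").statistic))  # True
```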
Modified critical values for the adjusted A-D test statistic: reject H0 if A_n² exceeds the critical value.
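SciPy's Anderson-Darling routine implements this idea for a few families: it returns the statistic together with modified critical values at standard significance levels, and H0 is rejected when the statistic exceeds the critical value. A sketch with an illustrative normal sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=5.0, scale=2.0, size=300)   # illustrative data

# For dist="norm" the parameters are estimated from the sample, and the
# returned critical values are the corresponding modified ones.
res = stats.anderson(sample, dist="norm")
for cv, sl in zip(res.critical_values, res.significance_level):
    verdict = "reject" if res.statistic > cv else "fail to reject"
    print(f"alpha={sl}%: critical={cv:.3f} -> {verdict} H0")
```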
Graphical checks: histogram with fitted distribution; probability plots: P-P plot, Q-Q plot.
Goodness-of-fit tests represent lack of fit by a single summary statistic, while plots show where the lack of fit occurs. A test may accept the fit, but the plots may suggest the opposite, or vice versa.
Exercise: A data set of 50 observations is believed to be from a normal distribution. The following are the p-values from the chi-square test and the K-S test. Chi-square test: 0.___; K-S test: 0.___. What is your conclusion?
Density-histogram plot: compares a histogram of the data with the density function of the fitted distribution.
Q-Q plot: compares the quantiles of the empirical distribution function of the data with the quantiles of the distribution function of the fit; it amplifies differences between the tails of the model and sample distribution functions.
P-P plot: compares the two distribution functions directly; it amplifies differences in the middle of the model and sample distribution functions.
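Both plots can be built from the same plotting positions. A sketch for a normal fit, where the sample, the seed, and the use of moment estimates are our illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = np.sort(rng.normal(size=200))             # illustrative sample, sorted
n = len(x)
p = (np.arange(1, n + 1) - 0.5) / n           # plotting positions

mu, sigma = x.mean(), x.std(ddof=1)           # fitted normal parameters
q_model = stats.norm.ppf(p, loc=mu, scale=sigma)   # model quantiles (Q-Q x-axis)
p_model = stats.norm.cdf(x, loc=mu, scale=sigma)   # model CDF values (P-P x-axis)

# Q-Q plot: points (q_model, x); P-P plot: points (p_model, p).
# Points close to the 45-degree line indicate a good fit.
corr = np.corrcoef(q_model, x)[0, 1]
print(f"Q-Q correlation: {corr:.3f}")          # near 1 for a good fit
```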