
CS626 Data Analysis and Simulation (Instructor: Peter Kemper)



  1. CS626 Data Analysis and Simulation
     Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu
     Office hours: Monday, Wednesday 2-4 pm
     Today: Stochastic Input Modeling
     References:
     - Law/Kelton, Simulation Modeling and Analysis, Ch 6.
     - NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/

  2. What is input modeling?
     Input modeling: deriving a representation of the uncertainty or randomness in a stochastic simulation.
     Common representations:
     - Measurement data
     - Distributions derived from measurement data (the focus of "input modeling"). This usually requires that the samples are i.i.d. (independent and identically distributed) and that the corresponding random variables in the simulation model are i.i.d.
       - theoretical distributions
       - empirical distributions
     - Time-dependent stochastic processes
     - Other stochastic processes
     Examples include:
     - time to failure for a machining process;
     - demand per unit time for inventory of a product;
     - number of defective items in a shipment of goods;
     - times between arrivals of calls to a call center.

  3. Overview of fitting with data
     - Check if key assumptions hold (i.i.d.).
     - Select one or more candidate distributions
       - based on physical characteristics of the process and
       - graphical examination of the data.
     - Fit the distribution to the data: determine values for its unknown parameters.
     - Check the fit to the data
       - via statistical tests and
       - via graphical analysis.
     - If the distribution does not fit,
       - select another candidate and repeat the process, or
       - use an empirical distribution.
     from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
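The "fit, then check" steps of this overview can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate is an exponential distribution, the fit is the maximum-likelihood estimate (fitted mean = sample mean), and the check is the Kolmogorov-Smirnov distance between the empirical and the fitted cdf. The function names and the synthetic sample are hypothetical and not taken from the slides.

```python
import math
import random

def fit_exponential(data):
    """Maximum-likelihood fit of an exponential: fitted mean = sample mean."""
    return sum(data) / len(data)

def exponential_cdf(x, mean):
    """Cdf of an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def ks_distance(data, cdf):
    """Largest gap between the empirical cdf of the data and the fitted cdf."""
    ys = sorted(data)
    n = len(ys)
    d = 0.0
    for j, y in enumerate(ys, start=1):
        f = cdf(y)
        # The empirical cdf jumps from (j-1)/n to j/n at y; check both sides.
        d = max(d, j / n - f, f - (j - 1) / n)
    return d

# Synthetic stand-in for measurement data (assumption: true mean 5.0).
random.seed(626)
sample = [random.expovariate(1 / 5.0) for _ in range(200)]

mean_hat = fit_exponential(sample)
d_n = ks_distance(sample, lambda x: exponential_cdf(x, mean_hat))
print(f"fitted mean = {mean_hat:.3f}, K-S distance = {d_n:.3f}")
```

A small distance alone does not settle the question. Turning it into a test needs critical values, and the standard K-S tables assume the parameters were not estimated from the same data.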

  4. Check the fit to the data
     Graphical analysis
     - Plot the fitted distribution and the data so that differences can be recognized.
     - Beyond obvious cases, there is a grey area of subjective acceptance/rejection.
     - Challenges:
       - How much difference is significant enough to reject a fitted distribution?
       - Which graphical representation is easy to judge?
     - Options:
       - Histogram-based plots
       - Probability plots: P-P plot, Q-Q plot
     Statistical tests
     - Define a measure X for the difference between the fitted distribution and the data.
     - X is a random variable, so if we can argue what distribution X has, we get a statistical test that tells whether a concrete value of X is significant.
     - Goodness-of-fit tests: Chi-square test (χ²), Kolmogorov-Smirnov test (K-S), Anderson-Darling test (A-D)
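As one concrete instance of such a difference measure X, the chi-square statistic compares observed and expected cell counts. The sketch below assumes equiprobable cells under the fitted distribution, which is a common choice but not something the slide prescribes. The helper names, the fitted mean, and the data values are hypothetical.

```python
import math

def chi_square_statistic(data, inv_cdf, k):
    """Chi-square statistic using k equiprobable cells under the fitted cdf."""
    n = len(data)
    # Interior cell boundaries at the fitted quantiles 1/k, 2/k, ..., (k-1)/k.
    edges = [inv_cdf(i / k) for i in range(1, k)]
    counts = [0] * k
    for y in data:
        cell = sum(1 for e in edges if y > e)  # number of boundaries below y
        counts[cell] += 1
    expected = n / k  # every cell has probability 1/k under the fit
    return sum((c - expected) ** 2 / expected for c in counts)

def exponential_inv_cdf(p, mean):
    """Quantile function of an exponential distribution with the given mean."""
    return -mean * math.log(1.0 - p)

# Hypothetical data and fitted mean, for illustration only.
mean_hat = 2.0
data = [0.3, 0.9, 1.4, 2.2, 0.1, 3.8, 5.0, 0.7, 1.1, 2.9]
stat = chi_square_statistic(data, lambda p: exponential_inv_cdf(p, mean_hat), k=5)
print(f"chi-square statistic with 5 cells: {stat:.3f}")
```

Large values of the statistic indicate a poor fit. Changing `k` changes the statistic, which is why the number of cells matters for this test.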

  5. Sample test characteristic for the Chi-square test (all parameters known)
     - One-sided test
     - Right side: critical region (region of rejection)
     - Left side: region of acceptance, where we fail to reject the hypothesis
     - P-value of x: 1 - F(x)
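The p-value 1 - F(x) can be computed without a statistics library in one special case: when the chi-square distribution has an even number of degrees of freedom, its upper tail has a closed form. The sketch below covers only that case (an assumption made to stay within the Python standard library). The function name is hypothetical. With all parameters known, the degrees of freedom are the number of cells minus one.

```python
import math

def chi_square_p_value(x, df):
    """P-value 1 - F(x) of a chi-square statistic x; df must be even.

    For df = 2m: 1 - F(x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0  # i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i    # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

# 9.488 is the tabulated 5% critical value for 4 degrees of freedom.
print(f"p-value for x = 9.488, df = 4: {chi_square_p_value(9.488, 4):.4f}")
```

A statistic to the right of the critical value gives a p-value below the chosen significance level, which is the region of rejection described on this slide.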

  6. Graphic analysis vs. goodness-of-fit tests
     Graphic analysis includes:
     - Histogram with fitted distribution
     - Probability plots: P-P plot, Q-Q plot
     Goodness-of-fit tests
     - represent lack of fit by a summary statistic, while plots show where the lack of fit occurs and whether it is important;
     - may accept the fit even though the plots suggest the opposite, especially when the number of observations is small.
     Example: A data set of 30 observations is believed to be from a normal distribution. The following are the p-values from the Chi-square test and the K-S test:
     - Chi-square test: 0.166
     - K-S test: >0.15
     What is your conclusion?

  7. Density Histogram: compares the sample histogram (mind the bin sizes) with the fitted distribution.
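A density histogram scales each bar so that the total bar area is 1, which makes the bars directly comparable with a fitted density. The following sketch assumes a normal candidate fitted by sample mean and sample standard deviation. The function name and the data values are hypothetical. Two bin counts are shown because the visual impression depends on the bin size.

```python
from statistics import NormalDist

def density_histogram(data, lo, hi, bins):
    """Return (midpoints, heights); bar areas sum to 1 over [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for y in data:
        if lo <= y <= hi:
            # Clamp so that y == hi falls into the last bin.
            i = min(int((y - lo) / width), bins - 1)
            counts[i] += 1
    n = sum(counts)
    mids = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [c / (n * width) for c in counts]
    return mids, heights

# Hypothetical measurements, for illustration only.
data = [9.1, 9.8, 10.2, 10.4, 11.0, 11.3, 12.1, 8.7, 10.0, 9.5]
fitted = NormalDist.from_samples(data)  # sample mean and sample stdev

for bins in (3, 5):
    mids, heights = density_histogram(data, 8.0, 13.0, bins)
    for m, h in zip(mids, heights):
        print(f"bins={bins} mid={m:.2f} hist={h:.3f} fitted pdf={fitted.pdf(m):.3f}")
```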

  8. Frequency Histogram: compares the histogram from the data with the histogram according to the fitted distribution.

  9. Differences in distributions are easier to see along a straight line.

  10. Graphical comparisons
      Frequency Comparisons, features:
      - Graphical comparison of a histogram of the data with the density function of the fitted distribution.
      - Sensitive to how we group the data.
      Probability Plots, features:
      - Graphical comparison of an estimate of the true distribution function of the data with the distribution function of the fit.
      - The Q-Q plot amplifies differences between the tails of the model and sample distribution functions; the P-P plot amplifies differences in the middle.
      Recommendations:
      - Use every graphical tool in the software to examine the fit.
      - If using a histogram-based tool, then play with the widths of the cells.
      - Q-Q plot is very highly recommended!
      from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission

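  11. P-P plots and Q-Q plots
      - Q-Q plot: quantiles of the fitted distribution vs. quantiles of the sample, for q_1, ..., q_n
      - P-P plot: probabilities under the fitted distribution vs. probabilities of the sample (empirical distribution), for p_1, ..., p_n
      This intuitive definition needs an adjustment to handle ties (multiple samples of the same value).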
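The P-P side of this definition can be sketched as follows. The slide's own formulas are not preserved here, so the plotting positions (j - 0.5)/n are an assumption, chosen to match the Q-Q definition on the next slide. The function name and data are hypothetical. Ties are not treated specially in this sketch.

```python
from statistics import NormalDist

def pp_points(data, cdf):
    """Pairs (empirical probability, fitted probability) for a P-P plot."""
    ys = sorted(data)
    n = len(ys)
    # Assumed plotting position for the j-th smallest value: (j - 0.5) / n.
    return [((j - 0.5) / n, cdf(y)) for j, y in enumerate(ys, start=1)]

# Hypothetical measurements, for illustration only.
data = [12.1, 9.4, 10.8, 11.5, 8.9, 10.1, 9.9, 10.6]
fitted = NormalDist.from_samples(data)

for p_emp, p_fit in pp_points(data, fitted.cdf):
    print(f"empirical p = {p_emp:.4f}   fitted p = {p_fit:.4f}")
```

Points close to the diagonal from (0, 0) to (1, 1) indicate agreement between the fitted and the empirical distribution function.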

  12. Q-Q Plot
      Recall that one way to generate data from cdf $F$ is via $Y = F^{-1}(R)$, where $R$ is uniformly distributed on $(0, 1)$.
      The Q-Q plot displays the sorted data $Y_{(1)} \le Y_{(2)} \le \dots \le Y_{(n)}$ vs. $F^{-1}\left(\frac{j - 1/2}{n}\right)$, $j = 1, 2, \dots, n$.
      from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
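The Q-Q construction on this slide translates almost directly into code. The sketch below assumes a normal candidate fitted from the sample and uses `statistics.NormalDist.inv_cdf` as the inverse cdf. The function name and the data values are hypothetical.

```python
from statistics import NormalDist

def qq_points(data, inv_cdf):
    """Pairs (fitted quantile, sorted data value) for a Q-Q plot."""
    ys = sorted(data)
    n = len(ys)
    # The j-th smallest value is paired with the fitted quantile at (j - 1/2) / n.
    return [(inv_cdf((j - 0.5) / n), y) for j, y in enumerate(ys, start=1)]

# Hypothetical measurements, for illustration only.
data = [12.1, 9.4, 10.8, 11.5, 8.9, 10.1, 9.9, 10.6]
fitted = NormalDist.from_samples(data)

for fitted_q, sample_q in qq_points(data, fitted.inv_cdf):
    print(f"fitted quantile = {fitted_q:7.3f}   sample quantile = {sample_q:7.3f}")
```

Since (j - 1/2)/n always lies strictly between 0 and 1, the inverse cdf is never evaluated at 0 or 1, where it would be unbounded for a normal distribution.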

  13. Q-Q Plot Intuition
      - We have a sample $Y_1, Y_2, \dots, Y_n$ and we fit a distribution $\hat{F}$ that we hope is good.
      - If we now generate a random sample of size $n$ from $\hat{F}$, it should look about like $Y_1, Y_2, \dots, Y_n$.
      - The Q-Q plot generates a perfect random sample for comparison.

  14. Features of the Q-Q plot
      - It does not depend on how the data are grouped.
      - It is much better than a density histogram when the number of data points is small.
      - Deviations from a straight line show where the distribution does not match.
      - A straight line implies that the family of distributions is correct.
      - A 45° line implies that the parameters fit as well.
      from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
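The distinction between "straight line" and "45° line" can be made concrete with a least-squares line through the Q-Q points. In the sketch below the data are constructed (not measured) to follow a normal distribution with mean 10 and standard deviation 2 exactly, and they are compared against a standard normal as a deliberately wrong parameter choice. The names and numbers are hypothetical.

```python
from statistics import NormalDist

def fit_line(points):
    """Least-squares slope and intercept of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

std_normal = NormalDist()  # candidate with parameters (0, 1)
n = 50
z = [std_normal.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]

# Idealized data: exact quantiles of a normal with mean 10 and stdev 2.
data = [10.0 + 2.0 * q for q in z]

slope, intercept = fit_line(list(zip(z, sorted(data))))
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The points lie on a straight line, so the normal family is right, but the slope is about 2 and the intercept about 10 instead of 1 and 0. The line is not the 45° line, so the parameters of the candidate do not fit.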

  15. Features of the Q-Q Plot
      A straight line implies the family of distributions is correct; a 45-degree line implies correct parameters.
      [Figure: two Q-Q plots from @RISK, fitted quantile vs. input quantile, both axes from 0 to 120]
      - Exponential(44.468), Shift=-0.58: Poor fit, misses badly in both tails.
      - LogLogistic(-113.32, 156.71, 16.107): Pretty good fit, but misses a bit on the right tail.
      from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission

  16. Examples of Q-Q plot
      [Figure: two Q-Q plots with annotations]
      - The fitting is pretty good.
      - The distribution family seems OK but the parameters are not.

  17. Example of Q-Q plot
      A data set of 30 observations is believed to be from a normal distribution. The following are the p-values from the Chi-square test and the K-S test:
      - Chi-square test: 0.166
      - K-S test: >0.15
      [Figure: Q-Q plot, axes from 10 to 35] Poor fit, misses badly in the left tail.

  18. P-P plot vs. Q-Q plot: sensitive to different kinds of deviations

  19. Should we just use the best fit?
      Software tools
      - exercise a set of distributions
      - optimize parameter settings for data and distribution
      - evaluate statistical tests
      - suggest a "best fit"
      Some concerns about the fully automated solution:
      - Tests represent lack of fit by a single summary statistic, while plots show where the lack of fit occurs and whether it is important.
      - Be sure to try different numbers of histogram cells; it affects the p-value of the χ² test, and your perception of the fit.
      - Be cautious with ranking fits by Chi-Sq, K-S and A-D statistics and always check the Q-Q plot.
      - If there is a strong physical basis for a particular distribution choice, then use it even if it is not the best fit.
      - Don't be afraid to use your brain in addition to software!
      from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission

  20. Overview of fitting with data
      - Select one or more candidate distributions, based on physical characteristics of the process and graphical examination of the data.
      - Fit the distribution to the data (determine values for its unknown parameters).
      - Check the fit to the data via tests and graphical analysis.
      - If the distribution does not fit, then select another candidate and repeat the process.
      What if no distribution provides a good fit?
      from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
