CS626 Data Analysis and Simulation
Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu
Office hours: Monday, Wednesday 2-4 pm
Today: Stochastic Input Modeling
References: Law/Kelton, Simulation Modeling and Analysis, Ch. 6; NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/

What is input modeling?
- Input modeling: deriving a representation of the uncertainty or randomness in a stochastic simulation.
- Common representations:
  - Measurement data
  - Distributions derived from measurement data (the focus of "input modeling")
    - usually requires that the samples are i.i.d. and that the corresponding random variables in the simulation model are i.i.d. (i.i.d. = independent and identically distributed)
    - theoretical distributions
    - empirical distribution
  - Time-dependent stochastic processes
  - Other stochastic processes
- Examples include:
  - time to failure for a machining process;
  - demand per unit time for inventory of a product;
  - number of defective items in a shipment of goods;
  - times between arrivals of calls to a call center.

Overview of fitting with data
- Check whether the key assumptions hold (i.i.d.).
- Select one or more candidate distributions
  - based on physical characteristics of the process and
  - graphical examination of the data.
- Fit the distribution to the data:
  - determine values for its unknown parameters.
- Check the fit to the data
  - via statistical tests and
  - via graphical analysis.
- If the distribution does not fit,
  - select another candidate and repeat the process, or
  - use an empirical distribution.
from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
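
The first and third steps of this workflow (checking independence, then determining parameter values) can be sketched in a few lines. This is a minimal sketch, assuming an exponential candidate (e.g., for interarrival times) and using a common sample lag-1 autocorrelation estimator as a rough independence check; the function names and the synthetic data are hypothetical, not taken from the tutorial.

```python
import random


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation; values near 0 are consistent with independence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)  # assumes the data are not all equal
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


def fit_exponential_mle(xs):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(xs) / sum(xs)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic i.i.d. interarrival times with true rate 0.5 (stand-in for measurement data).
    data = [rng.expovariate(0.5) for _ in range(1000)]
    print("lag-1 autocorrelation:", round(lag1_autocorrelation(data), 3))
    print("fitted rate:", round(fit_exponential_mle(data), 3))
```

A lag-1 value far from 0 suggests dependence between consecutive observations, in which case fitting a single distribution to them is questionable.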

Check the fit to the data
- Graphical analysis
  - Plot the fitted distribution and the data so that differences can be recognized.
  - Beyond obvious cases, there is a grey area of subjective acceptance/rejection.
  - Challenges:
    - How much difference is significant enough to trash a fitted distribution?
    - Which graphical representation is easy to judge?
  - Options:
    - histogram-based plots
    - probability plots: P-P plot, Q-Q plot
- Statistical tests
  - Define a measure X of the difference between the fitted distribution and the data.
  - X is a random variable, so if we can argue what distribution X has when the data really come from the fitted distribution, we obtain a statistical test of whether a concrete observed value of X is significant.
  - Goodness-of-fit tests: chi-square test (χ²), Kolmogorov-Smirnov test (K-S), Anderson-Darling test (A-D)
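
For the K-S test, the measure X is the largest vertical distance between the empirical cdf of the data and the fitted cdf. A minimal sketch, assuming all parameters of the fitted distribution are known (not estimated from the same data); `ks_statistic` and `exponential_cdf` are hypothetical helper names.

```python
import math


def ks_statistic(data, cdf):
    """K-S statistic D_n = max(D+, D-) comparing the empirical cdf of `data` with `cdf`."""
    xs = sorted(data)
    n = len(xs)
    # D+: how far the empirical cdf rises above F, checked just after each jump.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: how far F lies above the empirical cdf, checked just before each jump.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def exponential_cdf(rate):
    """cdf of an exponential distribution with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


if __name__ == "__main__":
    sample = [0.3, 1.1, 0.7, 2.4, 0.2, 1.8, 0.9]  # hypothetical observations
    print("D_n =", round(ks_statistic(sample, exponential_cdf(1.0)), 4))
```

Large values of D_n indicate a poor fit. If the parameters were estimated from the same data, the standard K-S critical values are no longer exact and adjusted critical values are needed.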

Sample test characteristic for the chi-square test (all parameters known)
- The test is one-sided.
- Right side: critical region (region of rejection).
- Left side: region of acceptance, where we fail to reject the hypothesis.
- P-value of an observed value x: 1 - F(x), where F is the cdf of the test statistic under the hypothesis.
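
The statistic behind this test characteristic can be sketched as follows. This is a minimal sketch, assuming k equiprobable intervals under a fully specified fitted distribution, in which case the statistic is approximately χ² with k - 1 degrees of freedom for large n. To stay within the Python standard library, the p-value 1 - F(x) uses the closed form that exists only for an even number of degrees of freedom (so k is chosen odd); a general routine such as SciPy's `scipy.stats.chi2.sf` removes that restriction. The function names are hypothetical.

```python
import math
import random


def chi_square_statistic(data, inverse_cdf, k):
    """Chi-square statistic using k equiprobable intervals of the fitted distribution."""
    n = len(data)
    # Interior interval boundaries a_j = F^{-1}(j / k), j = 1, ..., k - 1.
    bounds = [inverse_cdf(j / k) for j in range(1, k)]
    counts = [0] * k
    for x in data:
        counts[sum(1 for b in bounds if x > b)] += 1  # index of the interval holding x
    expected = n / k  # each interval has probability 1 / k under the fit
    return sum((o - expected) ** 2 / expected for o in counts)


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df), df even: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    rate, k = 1.0, 5  # hypothetical fully specified Exponential(rate = 1) fit; k odd so k - 1 is even
    rng = random.Random(3)
    data = [rng.expovariate(rate) for _ in range(200)]  # synthetic sample, about 40 per interval
    stat = chi_square_statistic(data, lambda p: -math.log(1.0 - p) / rate, k)
    print(f"chi-square = {stat:.3f}, p-value = {chi2_sf_even_df(stat, k - 1):.3f}")
```

Because the number of intervals changes both the statistic and its degrees of freedom, the resulting p-value depends on k.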

Graphic Analysis vs Goodness-of-fit tests
- Graphic analysis includes:
  - histogram with fitted distribution
  - probability plots: P-P plot, Q-Q plot
- Goodness-of-fit tests
  - represent lack of fit by a summary statistic, while plots show where the lack of fit occurs and whether it is important.
  - may accept the fit while the plots suggest the opposite, especially when the number of observations is small.
- Example: A data set of … observations is believed to be from a normal distribution. The following are the p-values from the chi-square test and the K-S test:
  - chi-square test: 0.…
  - K-S test: >0.…
  What is your conclusion?

Density histogram: compares the sample histogram (mind the bin sizes) with the density of the fitted distribution.
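
Minding the bin sizes means scaling each bar so that the histogram has total area 1; only then is it comparable to a density. A minimal sketch with half-open bins [lo, hi) and arbitrary, possibly unequal, bin edges; the function names and the Exponential(1) fit are hypothetical.

```python
import math


def density_histogram(data, edges):
    """Bar heights count / (n * width) for half-open bins [edges[i], edges[i+1])."""
    n = len(data)  # observations outside the edges count toward n but fall in no bar
    heights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        count = sum(1 for x in data if lo <= x < hi)
        heights.append(count / (n * (hi - lo)))
    return heights


def density_at_midpoints(pdf, edges):
    """Fitted density evaluated at bin midpoints, for plotting next to the bars."""
    return [pdf((lo + hi) / 2) for lo, hi in zip(edges[:-1], edges[1:])]


if __name__ == "__main__":
    data = [0.2, 0.4, 0.5, 0.9, 1.3, 1.7, 2.2, 3.1]
    edges = [0.0, 0.5, 1.0, 2.0, 4.0]  # unequal widths on purpose
    fitted_pdf = lambda x: math.exp(-x)  # hypothetical Exponential(1) fit
    for h, f in zip(density_histogram(data, edges), density_at_midpoints(fitted_pdf, edges)):
        print(f"histogram {h:.3f}  fitted density {f:.3f}")
```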

Frequency histogram: compares the histogram of the data with the histogram (expected counts per bin) according to the fitted distribution.
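
In a frequency histogram the fitted distribution enters through expected counts: with n observations, the bin [lo, hi) should hold about n(F(hi) - F(lo)) of them. A minimal sketch; the function names and the Uniform(0, 1) fit are hypothetical.

```python
def observed_frequencies(data, edges):
    """Number of observations in each half-open bin [edges[i], edges[i+1])."""
    return [sum(1 for x in data if lo <= x < hi) for lo, hi in zip(edges[:-1], edges[1:])]


def expected_frequencies(n, cdf, edges):
    """Expected counts n * (F(hi) - F(lo)) per bin under the fitted cdf F."""
    return [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges[:-1], edges[1:])]


if __name__ == "__main__":
    uniform_cdf = lambda x: min(max(x, 0.0), 1.0)  # hypothetical Uniform(0, 1) fit
    data = [0.05, 0.12, 0.33, 0.38, 0.41, 0.59, 0.77, 0.93]
    edges = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(observed_frequencies(data, edges))
    print(expected_frequencies(len(data), uniform_cdf, edges))
```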

Differences between distributions are easier to see along a straight line.

Graphical comparisons
- Frequency comparisons. Features:
  - Graphical comparison of a histogram of the data with the density function of the fitted distribution.
  - Sensitive to how we group the data.
- Probability plots. Features:
  - Graphical comparison of an estimate of the true distribution function of the data with the distribution function of the fit.
  - The Q-Q (P-P) plot amplifies differences between the tails (middle) of the model and sample distribution functions.
- Use every graphical tool in the software to examine the fit.
- If using a histogram-based tool, play with the widths of the cells.
- The Q-Q plot is very highly recommended!
from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission

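P-P plots and Q-Q plots ($F$: fitted cdf, $F_n$: empirical cdf of the $n$ observations)
- Q-Q plot: $F_n^{-1}(q_i)$ vs. $F^{-1}(q_i)$ for $q_1, \dots, q_n$
- P-P plot: $p_i$ vs. $F\big(F_n^{-1}(p_i)\big)$ for $p_1, \dots, p_n$
This intuitive definition needs an adjustment to handle ties (multiple observations with the same value).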
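
A minimal sketch of the P-P plot points, assuming no ties and using the plotting positions p_i = (i - 1/2)/n, the same offset the Q-Q construction in this deck uses; `pp_points` and the Uniform(0, 1) fit are hypothetical.

```python
def pp_points(data, cdf):
    """Pairs (p_i, F(x_(i))) with p_i = (i - 1/2) / n and x_(i) the i-th smallest observation."""
    xs = sorted(data)
    n = len(xs)
    return [((i - 0.5) / n, cdf(x)) for i, x in enumerate(xs, start=1)]


if __name__ == "__main__":
    uniform_cdf = lambda x: min(max(x, 0.0), 1.0)  # hypothetical Uniform(0, 1) fit
    for p, fp in pp_points([0.91, 0.15, 0.48, 0.62], uniform_cdf):
        print(f"{p:.3f}  {fp:.3f}")  # a good fit keeps these pairs close to the 45-degree line
```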

Q-Q Plot
Recall that one way to generate data from a cdf $F$ is via $Y = F^{-1}(R)$, where $R \sim U(0,1)$.
The Q-Q plot displays the sorted data $Y_{(1)} \le Y_{(2)} \le \dots \le Y_{(n)}$ vs. $F^{-1}\left(\frac{j - 1/2}{n}\right)$, $j = 1, 2, \dots, n$.
from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
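
The inverse-transform recipe recalled on this slide can be sketched for an exponential distribution, where F^{-1}(r) = -ln(1 - r)/rate. A minimal sketch, assuming Exponential(rate) as the example distribution; the function names are hypothetical.

```python
import math
import random


def exponential_inverse_cdf(r, rate):
    """F^{-1}(r) for an exponential distribution: -ln(1 - r) / rate, for 0 <= r < 1."""
    return -math.log(1.0 - r) / rate


def sample_by_inversion(n, rate, rng):
    """Draw n values Y = F^{-1}(R) with R ~ Uniform[0, 1) from rng.random()."""
    return [exponential_inverse_cdf(rng.random(), rate) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    ys = sample_by_inversion(10_000, 2.0, rng)
    print("sample mean:", round(sum(ys) / len(ys), 3), "(theoretical mean 1/rate = 0.5)")
```

Feeding the deterministic values (j - 1/2)/n into the same inverse cdf, instead of random R, gives the comparison sample that the Q-Q plot uses.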

Q-Q Plot Intuition
- We have a sample $Y_{(1)}, \dots, Y_{(n)}$ and fit a distribution $F$ that we hope is good.
- If we now generate a random sample of size $n$ from $F$, it should look about like $Y_{(1)}, \dots, Y_{(n)}$.
- The Q-Q plot generates a "perfect" random sample for comparison.
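
The "perfect" sample is the fitted inverse cdf evaluated at the evenly spaced probabilities (j - 1/2)/n. A minimal sketch, assuming a normal fit represented by the standard library's `statistics.NormalDist` (its `inv_cdf` method gives F^{-1}); `qq_points` and the Normal(10, 2) parameters are hypothetical.

```python
import random
from statistics import NormalDist


def qq_points(data, inverse_cdf):
    """Pairs (F^{-1}((j - 1/2) / n), Y_(j)): the 'perfect' sample vs. the sorted data."""
    ys = sorted(data)
    n = len(ys)
    return [(inverse_cdf((j - 0.5) / n), y) for j, y in enumerate(ys, start=1)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(10.0, 2.0) for _ in range(20)]  # synthetic sample
    fitted = NormalDist(10.0, 2.0)  # hypothetical fitted Normal(10, 2)
    for q, y in qq_points(data, fitted.inv_cdf):
        print(f"{q:7.3f}  {y:7.3f}")  # near the 45-degree line if the fit is good
```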

Features of the Q-Q plot
- It does not depend on how the data are grouped.
- It is much better than a density histogram when the number of data points is small.
- Deviations from a straight line show where the distribution does not match.
- A straight line implies that the family of distributions is correct.
- A 45° line implies that the parameters fit as well.
from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
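
The straight-line and 45° criteria can be checked numerically by fitting a least-squares line through the Q-Q points: a correlation close to 1 with a slope far from 1 or an intercept far from 0 points to a correct family with wrong parameters. A minimal sketch, assuming a location-scale family such as the normal, for which wrong parameters only shift and stretch the Q-Q line; `line_fit` and the Normal(10, 2) data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def line_fit(points):
    """Least-squares intercept, slope, and correlation of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(5)
    ys = sorted(rng.gauss(10.0, 2.0) for _ in range(500))  # data really Normal(10, 2)
    std_normal = NormalDist()  # deliberately wrong parameters: Normal(0, 1)
    pts = [(std_normal.inv_cdf((j - 0.5) / len(ys)), y) for j, y in enumerate(ys, start=1)]
    intercept, slope, r = line_fit(pts)
    # r should be close to 1 (right family), with intercept roughly 10 and slope roughly 2.
    print(f"intercept={intercept:.2f} slope={slope:.2f} r={r:.4f}")
```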

Features of the Q-Q Plot
A straight line implies the family of distributions is correct; a 45-degree line implies correct parameters.
[Figure: two Q-Q plots of fitted quantile vs. input quantile (both axes 0 to 120). Exponential(44.468), Shift=-0.58: "Poor fit, misses badly in both tails." LogLogistic(-113.32, 156.71, 16.107): "Pretty good fit, but misses a bit on the right tail."]
from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission

Examples of Q-Q plot
[Figure: two example Q-Q plots, annotated "The distribution family seems OK, but the parameters are not." and "The fitting is pretty good."]

Example of Q-Q plot
A data set of … observations is believed to be from a normal distribution. The following are the p-values from the chi-square test and the K-S test:
- chi-square test: 0.…
- K-S test: >0.…
[Figure: Q-Q plot of the data, annotated "Poor fit in the left tail."]

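P-P plot vs. Q-Q plot: the two are sensitive to different kinds of deviations (the Q-Q plot amplifies differences in the tails, the P-P plot differences in the middle).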

Should we just use the best fit?
- Software tools
  - exercise a set of distributions,
  - optimize parameter settings for data and distribution,
  - evaluate statistical tests,
  - suggest a "best fit".
- Some concerns about the fully automated solution:
  - Tests represent lack of fit by a single summary statistic, while plots show where the lack of fit occurs and whether it is important.
  - Be sure to try different numbers of histogram cells; it affects the p-value of the χ² test and your perception of the fit.
  - Be cautious with ranking fits by chi-square, K-S, and A-D statistics, and always check the Q-Q plot.
  - If there is a strong physical basis for a particular distribution choice, then use it even if it is not the best fit.
  - Don't be afraid to use your brain in addition to software!
from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission

Overview of fitting with data
- Select one or more candidate distributions, based on physical characteristics of the process and graphical examination of the data.
- Fit the distribution to the data (determine values for its unknown parameters).
- Check the fit to the data via tests and graphical analysis.
- If the distribution does not fit, then select another candidate and repeat the process.
What if no distribution provides a good fit?
from WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
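
One fallback listed earlier in the deck is an empirical distribution. A minimal sketch of a continuous, piecewise-linear empirical distribution of the kind described in Law/Kelton: its cdf interpolates linearly between consecutive sorted observations, and it is sampled by inverting that cdf. The function names and the measurements are hypothetical, and the sketch needs at least two observations.

```python
import random


def empirical_inverse_cdf(sorted_xs, u):
    """Inverse of the piecewise-linear empirical cdf through the sorted observations, 0 <= u < 1."""
    p = (len(sorted_xs) - 1) * u
    # 0-based index of the left end of the segment containing p (clamped as a floating-point guard).
    i = min(int(p), len(sorted_xs) - 2)
    frac = p - i
    return sorted_xs[i] + frac * (sorted_xs[i + 1] - sorted_xs[i])


def sample_empirical(data, n, rng):
    """Draw n values by inversion; results stay within [min(data), max(data)]."""
    xs = sorted(data)
    return [empirical_inverse_cdf(xs, rng.random()) for _ in range(n)]


if __name__ == "__main__":
    observed = [2.3, 0.8, 4.1, 1.9, 3.3, 1.2]  # hypothetical measurements
    print([round(v, 2) for v in sample_empirical(observed, 5, random.Random(2))])
```

A limitation of this design is that it can never produce values outside the observed range, which matters if the tails are important for the simulation.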
