SLIDE 1
Experiment Design and Data Analysis
When dealing with measurement and simulation, careful experiment design and data analysis are essential for reducing costs and drawing meaningful conclusions. The two issues are coupled, since it is usually not possible to select all parameters of an experiment without doing a preliminary run and analyzing the data obtained.
SLIDE 2 Simulation Techniques
- Continuous-Time Simulation
- Discrete-Event Simulation
SLIDE 3 A Standard Uniform Random Variable Y
- Let us assume that Y is a random variable uniformly distributed between 0 and 1. Its distribution function is then

  FY(y) = P[Y ≤ y] = 0 if y < 0,  y if 0 ≤ y ≤ 1,  1 if y > 1.
SLIDE 4 Generating a Random Variable X with Distribution G(.)
- Define X = G−1(Y ).
- Then,
FX(x) = P[X ≤ x] = P[G−1(Y ) ≤ x] = P[Y ≤ G(x)] = G(x).
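This construction is the inverse-transform method. A minimal runnable sketch, using an exponential distribution for G purely as an illustration (the rate parameter below is a hypothetical choice):

```python
import math
import random

def sample_exponential(rate):
    # Inverse-transform sampling: draw Y ~ Uniform(0, 1), return X = G^{-1}(Y).
    # For an exponential distribution, G(x) = 1 - exp(-rate * x), so
    # G^{-1}(y) = -ln(1 - y) / rate.
    y = random.random()
    return -math.log(1.0 - y) / rate

random.seed(0)
samples = [sample_exponential(2.0) for _ in range(100_000)]
avg = sum(samples) / len(samples)
print(round(avg, 2))  # should be close to 1/rate = 0.5
```

Any distribution whose CDF G can be inverted in closed form can be sampled the same way.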
SLIDE 5 Fundamentals of Data Analysis
The most fundamental aspect of the systems of interest is that they are driven by a nondeterministic workload. The randomness in the inputs makes the outputs also random. Thus, no single observation from the system would give a reliable indication of the performance.
One way to cope with this randomness is to use several observations in estimating how the system will behave “on average”.
SLIDE 6 Some Questions
- How do we use several observations to estimate the average performance, i.e., what is a good estimator based on several observations?
- Is an estimate based on several observations necessarily more reliable than one based on a single observation?
- How do we characterize the error in our estimate as a function of the number of observations? Or, put another way, given the tolerable error, how do we determine the number of observations?
SLIDE 7 Some Questions (continued)
- How do we perform experiments so that
the error characterization is itself reliable?
- If the number of needed observations is
found to be too large, what can we do to reduce it?
SLIDE 8 Some Assumptions
- Let X denote a performance measure of
interest (e.g., the response time).
- We can regard X as a random variable with some unknown distribution. Let s and σ² denote its mean and variance, respectively.
- Suppose that we obtain the observations X1, X2, · · · , Xn as a sequence of i.i.d. random variables where for each i, E(Xi) = s and Var(Xi) = σ².
SLIDE 9 Sample Mean Estimator X̄

  X̄ = (1/n) ∑_{i=1}^{n} Xi

- X̄ is an unbiased estimator because

  E[X̄] = (1/n) E[∑_{i=1}^{n} Xi] = (1/n) ∑_{i=1}^{n} E[Xi] = s
SLIDE 10 Variance of Sample Mean Estimator X̄

  σ²_X̄ = E[(X̄ − s)²]
       = (1/n²) ∑_{i=1}^{n} ∑_{j=1}^{n} E[(Xi − s)(Xj − s)]
       = (1/n²) ∑_{i=1}^{n} E[(Xi − s)²] + (1/n²) ∑_{i=1}^{n} ∑_{j≠i} E[(Xi − s)(Xj − s)]
       = σ²/n + (2/n²) ∑_{i=1}^{n} ∑_{j>i} Cov(Xi, Xj)
       = σ²/n

  where the last step uses the independence of the Xi's, so all covariances vanish.
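The σ²/n behavior can be checked with a quick simulation. A sketch using Uniform(0, 1) observations (for which σ² = 1/12), chosen purely for illustration:

```python
import random

random.seed(1)

def sample_mean(n):
    # n i.i.d. Uniform(0, 1) observations: mean s = 0.5, variance sigma^2 = 1/12.
    xs = [random.random() for _ in range(n)]
    return sum(xs) / n

# Estimate Var(X-bar) empirically from many independent replications.
reps = 20_000
ratios = {}
for n in (10, 40):
    means = [sample_mean(n) for _ in range(reps)]
    m = sum(means) / reps
    var = sum((x - m) ** 2 for x in means) / (reps - 1)
    ratios[n] = var / ((1 / 12) / n)  # ~1 if Var(X-bar) = sigma^2/n holds
print(ratios)
```

Quadrupling n from 10 to 40 cuts the variance of the sample mean by a factor of four, as the formula predicts.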
SLIDE 11 Variance of Sample Mean Estimator X̄ (continued)
- If σ is finite, then we have

  lim_{n→∞} σ²_X̄ = 0.

- That is, the sample mean converges to the expected value as n → ∞. This is one form of the law of large numbers.
SLIDE 12 Sample Variance Estimator δ²X

  δ²X = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)²
SLIDE 13 Sample Variance Estimator δ²X (continued)
- Consider the quantity φ = ∑_{i=1}^{n} E[(Xi − X̄)²]. Then

  φ = ∑_{i=1}^{n} E[(Xi − s) + (s − X̄)]² = ∑_{i=1}^{n} E[(Xi − s) − (1/n) ∑_{j=1}^{n} (Xj − s)]²

- Expanding the square, taking the expectation operator inside, and noting that E[(Xi − s)²] = σ² for any i, we obtain the expression on the next slide:
SLIDE 14 Sample Variance Estimator δ²X (continued)

  φ = ∑_{i=1}^{n} [σ² − σ²/n − (2/n) ∑_{j≠i} Cov(Xi, Xj) + (1/n²) ∑_{j≠k} Cov(Xj, Xk)]
    = (n − 1)σ² − (2/n) ∑_{i=1}^{n} ∑_{j>i} Cov(Xi, Xj)

  (the two covariance sums combine, since each unordered pair appears twice in the ordered sums)
SLIDE 15 Sample Variance Estimator δ²X (continued)
- It is easy to see that E[δ²X] = φ/(n − 1).
- Thus, we see that if the Xi's are mutually independent, δ²X is an unbiased estimator of σ², but not otherwise in general.
- In this case, we can also define an unbiased estimator of Var(X̄), denoted δ²_X̄, as simply δ²X / n.
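The role of the (n − 1) divisor can be illustrated with a short simulation (Uniform(0, 1) observations with σ² = 1/12, chosen purely for illustration): dividing by n − 1 is unbiased, while dividing by n systematically underestimates.

```python
import random

random.seed(2)

# Monte Carlo check that the (n - 1) divisor makes the sample variance
# unbiased for independent observations; the n divisor is biased low.
n, reps = 5, 50_000
est_n1 = est_n = 0.0
for _ in range(reps):
    xs = [random.random() for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    est_n1 += ss / (n - 1)
    est_n += ss / n
print(round(est_n1 / reps, 4), round(est_n / reps, 4))
# first value ~ sigma^2 = 1/12 ~ 0.0833; second is low by a factor (n-1)/n
```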
SLIDE 16 Characterization of the value of s
- The sample mean X̄ gives some idea about the value of s.
- For a more concrete characterization, we would like to obtain an interval of width e around X̄, such that the real value s lies somewhere in the range X̄ ± e.
- Since X̄ is a random variable, we can specify such a finite range only with a probability P0 < 1.
- The parameter P0 is called the confidence level, and must be chosen a priori.
SLIDE 17 Characterization of the value of s (continued)
- Thus, our problem is to determine e such that Pr(|X̄ − s| ≤ e) = P0.
- The parameter 2e is called the confidence interval, and is expected to increase as P0 increases.
- To determine the value of e, we need to know the distribution of X̄.
- To this end, we use the central limit theorem, and conclude that if n is large, the distribution of X̄ can be approximated as N(s, σ/√n), i.e., normal with mean s and variance σ²/n.
SLIDE 18 Characterization of the value of s (continued)
- Define the normalized random variable

  Y = (X̄ − s) √n / σ

- Then, the distribution of Y must be N(0, 1).
- We can find e′ such that Pr(|Y| ≤ e′) = P0 = 1 − α.
- Let Pr(Y ≤ Zβ) = 1 − β. Zβ can be found from a standard normal table. Then, e′ can be found as e′ = Zα/2.
SLIDE 19 Characterization of the value of s (continued)

  Pr(|Y| ≤ Zα/2) = Pr(|(X̄ − s) √n / σ| ≤ Zα/2) = Pr(|X̄ − s| ≤ Zα/2 σ/√n)

  Hence e = Zα/2 σ/√n, where σ is unknown.

- We could substitute δX for σ, but that will not work because the distribution of the random variable (X̄ − s) √n / δX is then unknown and may differ substantially from the normal distribution.
SLIDE 20 Characterization of the value of s (continued)
- To get around this difficulty, we assume that the distribution of each Xi itself is normal, i.e., N(s, σ). Then, Y = (X̄ − s) √n / δX has the standard t-distribution with (n − 1) degrees of freedom. We denote the latter as Φt,n−1(.).
- Let Pr(Y ≤ tn−1,β) = 1 − β. tn−1,β can be found from a standard table. Then,

  Pr(|Y| ≤ tn−1,α/2) = 1 − α
SLIDE 21 Characterization of the value of s (continued)

  Pr(|(X̄ − s) √n / δX| ≤ tn−1,α/2) = 1 − α

- We can put the above equation in the following alternate form:

  Pr[X̄ − η ≤ s ≤ X̄ + η] = 1 − α,  where η = δX tn−1,α/2 / √n
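The interval X̄ ± η can be computed directly from the observations. A minimal sketch with hypothetical data; the t value must be looked up in a standard table for the chosen confidence level:

```python
import math
from statistics import mean, stdev

def t_confidence_interval(xs, t_value):
    # Two-sided CI for the mean: Xbar +/- eta, eta = delta_X * t / sqrt(n).
    # t_value = t_{n-1, alpha/2}, taken from a standard t table.
    n = len(xs)
    xbar = mean(xs)
    eta = stdev(xs) * t_value / math.sqrt(n)
    return xbar - eta, xbar + eta

# Hypothetical example: n = 5 observations, 95% confidence => t_{4, 0.025} = 2.776.
obs = [12.1, 11.6, 12.4, 12.0, 11.9]
lo, hi = t_confidence_interval(obs, 2.776)
print(round(lo, 3), round(hi, 3))  # lower and upper ends of the interval
```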
SLIDE 22 Characterization of the value of s (continued)
- The last formula can be used in two ways:
  – to determine the confidence interval for a given number of observations, or
  – to determine the number of observations needed to achieve a given confidence interval.
- For the latter, suppose that the desired error (i.e., fractional half-width of the confidence interval) is q. Then

  δX tn−1,α/2 / √n ≤ qX̄  ⇒  n ≥ δ²X t²n−1,α/2 / (q²X̄²)
SLIDE 23 Characterization of the value of s (continued)
- For the latter, suppose that the desired error (i.e., fractional half-width of the confidence interval) is q. Then

  δX tn−1,α/2 / √n ≤ qX̄  ⇒  n ≥ δ²X t²n−1,α/2 / (q²X̄²)

- Since δX, X̄, and tn−1,α/2 depend on n, we should first “guess” some value for n and determine δX, X̄, and tn−1,α/2. Then, we can check if the above inequality is satisfied. If it is not, more observations should be made.
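The guess-and-check procedure can be sketched as a loop. The `observe` and `t_for` hooks below are hypothetical stand-ins for the data source and the t table; the large-n normal approximation tn−1,0.025 ≈ 1.96 is used for 95% confidence:

```python
import random
from statistics import mean, stdev

def observations_needed(observe, q, t_for, n0=10, n_max=10_000):
    # Start with a guessed n0 observations, then keep adding observations
    # until n >= delta_X^2 * t^2 / (q^2 * Xbar^2) is satisfied.
    # `observe()` draws one observation; `t_for(n)` returns t_{n-1, alpha/2}
    # (normally read from a t table).
    xs = [observe() for _ in range(n0)]
    while len(xs) < n_max:
        n, xbar, dx = len(xs), mean(xs), stdev(xs)
        if n >= (dx * t_for(n)) ** 2 / (q * xbar) ** 2:
            return n
        xs.append(observe())
    return n_max

random.seed(3)
# Illustration: Uniform(0.9, 1.1) observations and a 1% desired error;
# here sigma/s ~ 0.0577, so roughly (0.0577 * 1.96 / 0.01)^2 ~ 128 observations.
n = observations_needed(lambda: random.uniform(0.9, 1.1), q=0.01,
                        t_for=lambda n: 1.96)
print(n)
```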
SLIDE 24 Characterization of the value of s (continued)
- In the previous cases, we considered a two-sided confidence interval. In some applications, we only want to find out whether the performance measure of interest exceeds (or remains below) some given threshold.
- For example, to assert that the actual value s exceeds some threshold X̄ − e, let Y = (X̄ − s) √n / δX. Then

  Pr(s ≥ X̄ − e) = P0 = 1 − α
SLIDE 25 Characterization of the value of s (continued)

  Pr(Y ≤ e′) = 1 − α,  where e′ = tn−1,α. Hence

  e = δX tn−1,α / √n
SLIDE 26
Example: Five independent experiments were conducted for determining the average flow rate of the coolant discharged by the cooling system. One hundred observations were taken in each experiment, the means of which are reported below:

  3.07  3.24  3.14  3.11  3.07

Based on this data, could we say that the mean flow rate exceeds 3.00 at a confidence level of 99.5%? What happens if we degrade the confidence level to 97.5%?
SLIDE 27 Solution:
- The sample mean and sample standard deviation can be calculated from the data as: X̄ = 3.126, δX = 0.0702.
- From the table, we get t4,0.005 = 4.604. Thus, we have:

  Pr(Y ≤ 4.604) = Pr[(X̄ − s) √n / δX ≤ 4.604] = Pr[(3.126 − s) √5 / 0.0702 ≤ 4.604] = Pr(s ≥ 2.9815) = 0.995

- Therefore, at the confidence level of 0.995, we cannot be sure that the flow rate exceeds 3.00.
SLIDE 28 Solution: (continued)
- The sample mean and sample standard deviation are the same as before.
- From the table, we get t4,0.025 = 2.776. Thus, we have:

  Pr(Y ≤ 2.776) = Pr[(X̄ − s) √n / δX ≤ 2.776] = Pr[(3.126 − s) √5 / 0.0702 ≤ 2.776] = Pr(s ≥ 3.039) = 0.975

- Therefore, at the confidence level of 0.975, we can be sure that the flow rate exceeds 3.00.
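The two one-sided bounds can be reproduced numerically. A sketch using the reported data and the t-table values from the slides:

```python
import math
from statistics import mean, stdev

# One-sided lower confidence bound for the mean flow rate:
# Pr(s >= Xbar - delta_X * t_{n-1, alpha} / sqrt(n)) = 1 - alpha.
obs = [3.07, 3.24, 3.14, 3.11, 3.07]
n, xbar, dx = len(obs), mean(obs), stdev(obs)

bounds = {}
for level, t in [(0.995, 4.604), (0.975, 2.776)]:  # t_{4, alpha} from a t table
    bounds[level] = xbar - dx * t / math.sqrt(n)
    print(level, round(bounds[level], 3), bounds[level] > 3.00)
```

The 99.5% bound falls below 3.00 while the 97.5% bound does not, matching the conclusions above.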
SLIDE 29 Regression Analysis
- Let X and Y denote the input and output parameters of interest. Let Xi, i = 1 · · · n denote an increasing set of values of the input parameter, and Yi, i = 1 · · · n the corresponding observed values of the output parameter.
- Then we want to determine a function Y = f(X) that is consistent with these observations.
- Because of the effect of uncontrolled variables and measurement errors, we will not observe the true value f(Xi) at point Xi. Instead, what we get is Yi = f(Xi) + εi, where εi is a random variable representing the unknown error such that E(εi) = 0.
SLIDE 30 Regression Analysis (continued)
- Let α1, · · · , αk denote the k unknown parameters of the assumed function f. For clarity, we will write f as f(X; α1, · · · , αk). Presumably, the αj's have some actual values, which we do not know. All we can do is estimate their values from the data. We shall denote the estimated values by using the circumflex (ˆ) symbol. Thus f̂(X) = f(X; α̂1, · · · , α̂k).
SLIDE 31 Regression Analysis (continued)
- As for the Y's, there are three types of values to consider:
  1. Actual values: denoted Ỹ, e.g., Ỹi = f(Xi; α1, · · · , αk).
  2. Observed values: the Yi's, related to the actual values as Yi = Ỹi + εi.
  3. Estimated values: denoted Ŷ, e.g., Ŷi = f̂(Xi) = f(Xi; α̂1, · · · , α̂k).
SLIDE 32 Regression Analysis (continued)
- The total variation of the observed values about the estimated values is given by QE:

  QE = ∑_{i=1}^{n} [Yi − f̂(Xi)]²

- The estimation now involves finding values of the αi's such that QE is minimized. The resulting estimate is called the regression of Y over X.
SLIDE 33 Gauss-Markov Theorem
The least squares method yields an unbiased estimator of the αi's, and minimizes variance in estimated values if the following conditions are satisfied:
- The function f is linear in the unknown parameters. That is, f(X) = α1g1(X) + · · · + αkgk(X), where the gi's are arbitrary but fully known functions.
- There is no uncertainty (or error) in the
values of Xi’s.
- Error in observed values of Y (i.e. ǫi) has
zero mean.
- All measurements are uncorrelated.
SLIDE 34 Gauss-Markov Theorem (continued)
- Unbiased estimator means that E[α̂j] = αj, and hence E(Ŷ) = Ỹ.
- Also,

  Var(Ŷ) = ∑_{i=1}^{k} gi²(X) Var(α̂i)

  Thus, minimum variance estimation of the αi's implies minimum variance for Ŷ.
- From the definition of QE, we have

  QE = ∑_{i=1}^{n} [Yi − α̂1g1(Xi) − · · · − α̂kgk(Xi)]²
SLIDE 35 Gauss-Markov Theorem (continued)
- To find the global minimum, we set the partial derivatives with respect to the α̂j's to zero:

  ∑_{i=1}^{n} gj(Xi) [Yi − α̂1g1(Xi) − · · · − α̂kgk(Xi)] = 0,  for j = 1 · · · k.

- The above equations can be put in the following matrix form:

  Π α̂ = θ

  where α̂ = [α̂1, · · · , α̂k] and θ = [θ1, · · · , θk] are column vectors, and Π = [πjm] is a k × k matrix. The elements of Π and θ are defined as follows:

  πjm = ∑_{i=1}^{n} gj(Xi) gm(Xi)   and   θj = ∑_{i=1}^{n} gj(Xi) Yi
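The construction of Π and θ and the solution of Π α̂ = θ can be sketched as follows; the Gaussian-elimination helper is a generic stand-in for any small linear solver:

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def least_squares(g, xs, ys):
    # Normal equations from the slide: pi_jm = sum g_j(x_i) g_m(x_i),
    # theta_j = sum g_j(x_i) y_i, then solve Pi * alpha_hat = theta.
    k = len(g)
    Pi = [[sum(g[j](x) * g[m](x) for x in xs) for m in range(k)] for j in range(k)]
    theta = [sum(g[j](x) * y for x, y in zip(xs, ys)) for j in range(k)]
    return solve(Pi, theta)

# Fit Y = a1 + a2*x to noiseless data on the line Y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a = least_squares([lambda x: 1.0, lambda x: x], xs, ys)
print([round(v, 6) for v in a])  # -> [1.0, 2.0]
```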
SLIDE 36
Example: Suppose that the hypothesized “actual” function is quadratic and is given by

  Y = f(x) = a1 + a2x + a3x²

Find expressions for the estimated values of the ai's.
SLIDE 37
Solution: Throughout this problem we assume that the summations are over i ranging from 1 to n. Let zj = ∑ xi^j for j = 1 · · · 4. Then, the equations satisfied by the âi's are as follows:

  [ n   z1  z2 ] [ â1 ]   [ ∑ Yi    ]
  [ z1  z2  z3 ] [ â2 ] = [ ∑ xiYi  ]
  [ z2  z3  z4 ] [ â3 ]   [ ∑ xi²Yi ]

from which we can get expressions for the âi's. It is also easy to show from here that E[â1] = a1, i.e., the estimates are unbiased.
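A sketch that builds this 3×3 system and verifies its structure: for noiseless quadratic data the true coefficients satisfy the normal equations exactly (the coefficient values below are hypothetical):

```python
# Build the normal-equation system for the quadratic fit and check that,
# for noiseless data Y = a1 + a2*x + a3*x^2, the true coefficients solve it.
a1, a2, a3 = 1.0, -2.0, 0.5          # hypothetical true coefficients
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [a1 + a2 * x + a3 * x * x for x in xs]

z = [sum(x ** j for x in xs) for j in range(5)]   # z[j] = sum of x_i^j
A = [[z[0], z[1], z[2]],
     [z[1], z[2], z[3]],
     [z[2], z[3], z[4]]]
b = [sum(ys),
     sum(x * y for x, y in zip(xs, ys)),
     sum(x * x * y for x, y in zip(xs, ys))]

# Residual of A * [a1, a2, a3] - b; all entries should be ~0 here.
residuals = [sum(A[r][c] * [a1, a2, a3][c] for c in range(3)) - b[r]
             for r in range(3)]
print(residuals)
```

With noisy observations the system no longer has the true coefficients as an exact solution; solving it then yields the least-squares estimates âi.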
SLIDE 38 Linear Regression
- Linear regression arises in practice quite often. Let Ỹ = α + βx, x̄ = ∑i xi/n, and Ȳ = ∑i Yi/n. It is easy to verify that the following estimates can be derived for α and β:

  β̂ = ∑_{i=1}^{n} (xi − x̄)Yi / ∑_{i=1}^{n} (xi² − x̄²)

  α̂ = Ȳ − β̂ x̄

- Since Ŷ = α̂ + β̂x, from the last equation above, we get

  Ŷ = Ȳ + β̂(x − x̄)
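These estimates are straightforward to compute. A minimal sketch on noiseless data, where the underlying line is recovered exactly:

```python
def linear_fit(xs, ys):
    # Estimates from the slide:
    # beta_hat = sum (x_i - xbar) Y_i / sum (x_i^2 - xbar^2),
    # alpha_hat = Ybar - beta_hat * xbar.
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    beta = (sum((x - xbar) * y for x, y in zip(xs, ys))
            / sum(x * x - xbar * xbar for x in xs))
    alpha = ybar - beta * xbar
    return alpha, beta

# Noiseless check: data on the line Y = 3 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 + 2.0 * x for x in xs]
alpha, beta = linear_fit(xs, ys)
print(alpha, beta)  # -> 3.0 2.0
```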
SLIDE 39 Linear Regression (continued)
Let Ȳ denote the sample average of the Yi's. Since Ỹi = α + βxi, the average of the actual values is α + βx̄. Thus, we get

  Ỹi = (α + βx̄) + β(xi − x̄)

Let zi = xi − x̄ and γ = ∑_{i=1}^{n} zi². Note that ∑i zi = 0 and that β̂ = (1/γ) ∑i zi Yi, since ∑i (xi² − x̄²) = γ. Since Yi = Ỹi + εi, E[Yi] = Ỹi. We can show that the estimates of α and β are unbiased as follows:

  E[β̂] = (1/γ) ∑_{i=1}^{n} zi E[Yi] = (1/γ) ∑_{i=1}^{n} zi Ỹi = (1/γ) [(α + βx̄) ∑_{i=1}^{n} zi + β ∑_{i=1}^{n} zi²] = β

  E[α̂] = E[Ȳ − β̂x̄] = (α + βx̄) − βx̄ = α

which also means that Ŷ is an unbiased estimate of Ỹ, that is, E(Ŷ) = Ỹ.
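The unbiasedness of β̂ can also be checked by simulation, averaging the slope estimate over many replications (the values of α, β, and the error distribution below are illustrative):

```python
import random

random.seed(4)

# Monte Carlo check that beta_hat is unbiased: average the slope estimate
# over many replications of Y_i = alpha + beta*x_i + eps_i with E[eps_i] = 0.
alpha, beta = 1.0, 2.0
xs = [float(i) for i in range(10)]
xbar = sum(xs) / len(xs)
gamma = sum((x - xbar) ** 2 for x in xs)

total = 0.0
reps = 20_000
for _ in range(reps):
    ys = [alpha + beta * x + random.gauss(0.0, 1.0) for x in xs]
    total += sum((x - xbar) * y for x, y in zip(xs, ys)) / gamma
print(round(total / reps, 2))  # should be close to beta = 2
```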