SLIDE 1

ST 370 Probability and Statistics for Engineers

Sampling Distributions

When you carry out an experiment and measure some quantity, the resulting value is regarded as one particular value of a random variable, and the probability distribution of that random variable governs the probabilities of the different possible values. When the experiment is part of a factorial design or a regression design, you observe the values of several random variables, each of which has its own probability distribution.

1 / 14 Sampling Distributions and Statistical Inference Sampling Distributions

SLIDE 2

Any quantity calculated from the observed values, such as an estimate of one of the parameters, is called a statistic. Because it is a function of the observed values of random variables, a statistic is also a random variable. As a random variable, a statistic has a probability distribution, called its sampling distribution, and the standard deviation of that distribution is called the standard error of the statistic. The sampling distribution of a parameter estimate is the key to making statistical inferences about the parameter that it estimates.
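
As a small illustration of these definitions, the following Python sketch simulates the sampling distribution of one simple statistic, the sample mean. The population (normal with mean 10 and standard deviation 2), the sample size and the number of replications are illustrative assumptions, not values from the course.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


# Illustrative (assumed) settings
mu, sigma, n = 10.0, 2.0, 25
means = simulate_sample_means(mu, sigma, n, reps=20000)

# The standard deviation of the simulated statistic approximates its standard error
empirical_se = statistics.stdev(means)
theoretical_se = sigma / n ** 0.5  # sigma / sqrt(n) for a sample mean

print(f"mean of the sample means:   {statistics.fmean(means):.3f}")
print(f"empirical standard error:   {empirical_se:.3f}")
print(f"theoretical standard error: {theoretical_se:.3f}")
```

The list `means` is an approximation to the sampling distribution of the statistic; its spread is what the standard error measures.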

SLIDE 3

Factorial designs

Consider for example the replicated two-factor design, for which the statistical model is

$$Y_{i,j,k} = \mu + \tau_i + \beta_j + (\tau\beta)_{i,j} + \epsilon_{i,j,k}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \; k = 1, \dots, n,$$

where the $\epsilon_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$, are independent random variables, each distributed as $N(0, \sigma^2)$.

SLIDE 4

An equivalent way to write the model is: the $Y_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$, are independent random variables, normally distributed with common variance $\sigma^2$ and expected values

$$E(Y_{i,j,k}) = \mu + \tau_i + \beta_j + (\tau\beta)_{i,j}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \; k = 1, \dots, n.$$

SLIDE 5

With constraints such as

$$\tau_1 = \beta_1 = 0, \quad (\tau\beta)_{i,1} = 0, \; i = 1, \dots, a, \quad \text{and} \quad (\tau\beta)_{1,j} = 0, \; j = 1, \dots, b,$$

the least squares estimates of the parameters can be found.
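
Under these baseline constraints the model has one free parameter per cell, so the least squares fitted values are the cell means and the estimates can be read off from them. The Python sketch below shows this for a small made-up data set; the function name `baseline_estimates` and the data are hypothetical, and indices are 0-based so that index 0 plays the role of level 1.

```python
from statistics import fmean


def baseline_estimates(y):
    """Least squares estimates for a replicated two-factor design under
    tau_1 = beta_1 = 0 and (tau beta)_{i,1} = (tau beta)_{1,j} = 0.

    y[i][j] is the list of replicate observations in cell (i, j);
    index 0 is the baseline level of each factor.
    """
    a, b = len(y), len(y[0])
    cell = [[fmean(y[i][j]) for j in range(b)] for i in range(a)]
    mu = cell[0][0]                              # baseline cell mean
    tau = [cell[i][0] - mu for i in range(a)]    # row effects at baseline column
    beta = [cell[0][j] - mu for j in range(b)]   # column effects at baseline row
    inter = [
        [cell[i][j] - cell[i][0] - cell[0][j] + mu for j in range(b)]
        for i in range(a)
    ]
    return mu, tau, beta, inter


# Made-up data with a = 2, b = 2, n = 2
y = [
    [[10.0, 12.0], [14.0, 16.0]],
    [[20.0, 22.0], [30.0, 32.0]],
]
mu, tau, beta, inter = baseline_estimates(y)
print("mu:", mu)
print("tau:", tau)
print("beta:", beta)
print("interaction:", inter)
```

The constrained parameters come out as exact zeros, and adding the four pieces back together reproduces each cell mean.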

SLIDE 6

Regression designs

The statistical model is

$$Y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_k x_{i,k} + \epsilon_i,$$

where the $\epsilon_i$, $i = 1, \dots, n$, are independent random variables, each distributed as $N(0, \sigma^2)$.

Again, an equivalent way to write the model is: the $Y_i$, $i = 1, \dots, n$, are independent random variables, normally distributed with common variance $\sigma^2$ and expected values

$$E(Y_i) = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_k x_{i,k}.$$
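
For the special case of a single predictor (k = 1), the least squares estimates have a well-known closed form, sketched below in Python. The function name `fit_simple_regression` and the data are made up for illustration; models with several predictors would normally be fitted with a linear algebra or statistics package instead.

```python
from statistics import fmean


def fit_simple_regression(x, y):
    """Least squares estimates (b0, b1) for the model E(Y_i) = b0 + b1 * x_i."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_regression(x, y)
print(f"intercept: {b0:.3f}, slope: {b1:.3f}")
```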

SLIDE 7

General linear model

Any factorial model may be written as a regression model by using indicator variables, so the regression model is the more general form. The key sampling distribution results for the least squares estimates are:

Each parameter estimate $\hat\beta_j$ has a Gaussian distribution, with expected value equal to $\beta_j$ (they are unbiased).

The standard error of each estimate is of the form $a_j \times \sigma$, where $a_j$ is a quantity that can be calculated from the design, and does not depend on the unknown parameters.

So

$$\frac{\hat\beta_j - \beta_j}{a_j \sigma} \sim N(0, 1).$$
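
These results can be checked by simulation. The Python sketch below does so for the slope of a simple linear regression, where the design constant is $a_1 = 1/\sqrt{S_{xx}}$ with $S_{xx} = \sum_i (x_i - \bar{x})^2$. The design points, the "true" parameter values and the number of replications are assumptions chosen only for illustration.

```python
import random
from statistics import fmean, stdev


def slope_estimate(x, y):
    """Least squares slope for a simple linear regression."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx


rng = random.Random(1)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # assumed design
beta0, beta1, sigma = 1.0, 0.5, 2.0            # assumed true values

# a_1 depends only on the design points, not on the unknown parameters
x_bar = fmean(x)
a1 = 1.0 / sum((xi - x_bar) ** 2 for xi in x) ** 0.5

# Repeat the experiment many times and standardize each slope estimate
z = []
for _ in range(20000):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    z.append((slope_estimate(x, y) - beta1) / (a1 * sigma))

print(f"mean of standardized slopes: {fmean(z):.3f}")
print(f"standard deviation:          {stdev(z):.3f}")
```

If the stated results hold, the standardized slopes should have mean near 0 and standard deviation near 1.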

SLIDE 8

The residual mean square $s^2$ has the property that $\nu s^2 / \sigma^2$ has the chi-squared distribution with $\nu$ degrees of freedom, which is a special case of the Gamma distribution. Here $\nu$ is the degrees of freedom for residuals. Also, $s^2$ is independent of the least squares estimates $\hat\beta_j$. So

$$T = \frac{\hat\beta_j - \beta_j}{a_j s} \sim \text{Student's } t \text{ with } \nu \text{ degrees of freedom}.$$

The "standard error" reported by software is the estimated standard error $a_j s$.
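
To make the quantities $s^2$, $\nu$ and $a_j s$ concrete, here is a Python sketch for the slope of a simple linear regression, where $\nu = n - 2$ and $a_1 = 1/\sqrt{S_{xx}}$. The function name and the data are made up for illustration.

```python
from statistics import fmean


def slope_summary(x, y):
    """Return (b1, s2, se_b1, nu) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    nu = n - 2                   # residual degrees of freedom (two parameters fitted)
    s2 = sse / nu                # residual mean square
    se_b1 = (s2 / s_xx) ** 0.5   # estimated standard error a_1 * s
    return b1, s2, se_b1, nu


# Made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, s2, se_b1, nu = slope_summary(x, y)
t_ratio = b1 / se_b1  # value of T when the hypothesized slope is 0

print(f"slope = {b1:.3f}, s^2 = {s2:.4f}, standard error = {se_b1:.4f}, nu = {nu}")
print(f"t ratio for a zero slope: {t_ratio:.2f}")
```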

SLIDE 9

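Confidence intervals

$$
\begin{aligned}
1 - \alpha &= P\left(|T| \le t_{\alpha/2,\nu}\right) \\
&= P\left( \left| \frac{\hat\beta_j - \beta_j}{a_j s} \right| \le t_{\alpha/2,\nu} \right) \\
&= P\left( -t_{\alpha/2,\nu} \le \frac{\hat\beta_j - \beta_j}{a_j s} \le t_{\alpha/2,\nu} \right) \\
&= P\left( -t_{\alpha/2,\nu} \times a_j s \le \hat\beta_j - \beta_j \le t_{\alpha/2,\nu} \times a_j s \right) \\
&= P\left( -\hat\beta_j - t_{\alpha/2,\nu} \times a_j s \le -\beta_j \le -\hat\beta_j + t_{\alpha/2,\nu} \times a_j s \right) \\
&= P\left( \hat\beta_j + t_{\alpha/2,\nu} \times a_j s \ge \beta_j \ge \hat\beta_j - t_{\alpha/2,\nu} \times a_j s \right)
\end{aligned}
$$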

SLIDE 10

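That is, the probability that $\beta_j$ lies between the random limits $\hat\beta_j \pm t_{\alpha/2,\nu} \times a_j s$ is $1 - \alpha$, and

$$\left( \hat\beta_j - t_{\alpha/2,\nu} \times a_j s, \;\; \hat\beta_j + t_{\alpha/2,\nu} \times a_j s \right)$$

is a $100(1 - \alpha)\%$ confidence interval for $\beta_j$.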
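
The coverage statement can be checked by simulation. The Python sketch below repeatedly generates data from a simple linear regression, forms the 95% interval for the slope, and counts how often the interval contains the true slope. The design, the true parameter values and the number of replications are assumptions for illustration; the critical value 2.447 is the tabulated $t_{0.025,6}$, appropriate here because $n = 8$ gives $\nu = 6$.

```python
import random
from statistics import fmean

T_CRIT = 2.447  # t_{0.025, 6} from a standard t table (alpha = 0.05, nu = 6)


def slope_interval(x, y, t_crit):
    """Confidence limits for the slope of a simple linear regression."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = (sse / (n - 2)) ** 0.5                # residual standard deviation
    half_width = t_crit * s / s_xx ** 0.5     # t * a_1 * s
    return b1 - half_width, b1 + half_width


rng = random.Random(2)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # assumed design
beta0, beta1, sigma = 1.0, 0.5, 2.0            # assumed true values

reps = 5000
covered = 0
for _ in range(reps):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    lo, hi = slope_interval(x, y, T_CRIT)
    covered += lo <= beta1 <= hi  # True counts as 1

coverage = covered / reps
print(f"fraction of intervals containing the true slope: {coverage:.3f}")
```

The limits change from one simulated data set to the next while the true slope stays fixed, which is the sense in which the limits are random.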

SLIDE 11

Hypothesis tests

Similarly, under $H_0 : \beta_j = \beta_j^0$, the probability of finding a value at least as large in magnitude as

$$t_{\text{obs}} = \frac{\hat\beta_j - \beta_j^0}{a_j s}$$

is $P(|T| \ge |t_{\text{obs}}|)$, where $T \sim$ Student's $t$ with $\nu$ degrees of freedom. This is the P-value reported by software.
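
Statistical software evaluates this probability from the $t$ distribution function. As a rough sketch of what that involves, the Python code below integrates the Student's $t$ density numerically with Simpson's rule, using only the standard library. The function names and the step count are illustrative choices, and this is not necessarily how any particular package computes the value.

```python
import math


def t_density(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def two_sided_p_value(t_obs, nu, steps=2000):
    """P(|T| >= |t_obs|), using Simpson's rule on [0, |t_obs|].

    `steps` must be even.
    """
    upper = abs(t_obs)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, nu) + t_density(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    central = 2 * total * h / 3  # P(|T| <= |t_obs|), by symmetry of the density
    return 1.0 - central


# 2.306 is the tabulated t_{0.025, 8}, so the result should be close to 0.05
p = two_sided_p_value(2.306, nu=8)
print(f"P-value: {p:.4f}")
```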

SLIDE 12

Nested models

Write $SS_{E,\text{reduced}}$ and $SS_{E,\text{full}}$ for the residual sums of squares of the two models, where the "reduced" model is nested within the "full" model, which contains $r$ additional predictors. The extra sum of squares is

$$SS_{R,\text{extra}} = SS_{E,\text{reduced}} - SS_{E,\text{full}},$$

and if this is large, the $r$ additional predictors have explained a substantial additional amount of variability.

SLIDE 13

The key properties of these sums of squares are:

$SS_{R,\text{extra}}$ and $SS_{E,\text{full}}$ are independent;

$SS_{E,\text{full}} / \sigma^2$ follows the chi-squared distribution with $\nu$ degrees of freedom, where $\nu$ is the residual degrees of freedom for the full model.

Under the null hypothesis that the added predictors all have zero coefficients, $SS_{R,\text{extra}} / \sigma^2$ follows the chi-squared distribution, with $r$ degrees of freedom.

Consequently, under that null hypothesis, the F-statistic

$$F = \frac{SS_{R,\text{extra}} / r}{MS_{E,\text{full}}}$$

follows Fisher's F-distribution with $r$ and $\nu$ degrees of freedom.
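
The arithmetic of the F-statistic is short enough to sketch directly in Python. The function name `extra_ss_f_statistic` and the sums of squares below are hypothetical, chosen only so that the result is easy to verify by hand.

```python
def extra_ss_f_statistic(sse_reduced, sse_full, r, nu):
    """F-statistic for comparing nested models.

    sse_reduced, sse_full: residual sums of squares of the two models
    r:  number of additional predictors in the full model
    nu: residual degrees of freedom of the full model
    """
    ss_extra = sse_reduced - sse_full  # extra sum of squares
    ms_full = sse_full / nu            # residual mean square of the full model
    return (ss_extra / r) / ms_full


# Hypothetical values: (120 - 80) / 2 = 20, and 80 / 20 = 4
f_obs = extra_ss_f_statistic(sse_reduced=120.0, sse_full=80.0, r=2, nu=20)
print(f"F = {f_obs:.2f}")
```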

SLIDE 14

The "test for significance of regression" is a special case, where the reduced model has no predictors, only an intercept. In this case, and in the general nested model comparison, the P-value reported by software is $P(F \ge F_{\text{obs}})$, where $F \sim$ Fisher's F with $r$ and $\nu$ degrees of freedom.
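
Software obtains this probability from the F distribution function. As an illustration of where the distribution comes from, the Python sketch below approximates the P-value by Monte Carlo, building each F draw as a ratio of independent chi-squared variables, each divided by its degrees of freedom. It uses the fact that a chi-squared variable with $k$ degrees of freedom is a Gamma variable with shape $k/2$ and scale 2. The function names, the seed and the number of replications are assumptions; this is a teaching sketch rather than a replacement for an exact calculation.

```python
import random


def simulate_f(r, nu, rng):
    """One draw from the F distribution with r and nu degrees of freedom."""
    chi2_num = rng.gammavariate(r / 2, 2.0)    # chi-squared with r d.f.
    chi2_den = rng.gammavariate(nu / 2, 2.0)   # chi-squared with nu d.f.
    return (chi2_num / r) / (chi2_den / nu)


def f_p_value(f_obs, r, nu, reps=100000, seed=3):
    """Monte Carlo estimate of P(F >= f_obs)."""
    rng = random.Random(seed)
    hits = sum(simulate_f(r, nu, rng) >= f_obs for _ in range(reps))
    return hits / reps


# 4.10 is the tabulated upper 5% point of F with 2 and 10 degrees of freedom,
# so the estimate should be close to 0.05
p = f_p_value(4.10, r=2, nu=10)
print(f"estimated P-value: {p:.3f}")
```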