ST 435/535 Statistical Methods for Quality and Productivity Improvement / Statistical Process Control

Some Continuous Distributions

Normal Distribution

The normal distribution with parameters µ and σ > 0 has probability density function (pdf)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty.$$

The mean and variance are µ and σ², respectively.

Notation: If the random variable X follows the normal distribution with parameters µ and σ, we write X ∼ N(µ, σ²).
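
As a quick numerical check, here is a minimal sketch in R that codes the pdf directly from the formula and compares it with R's built-in dnorm(). It then integrates against the pdf to recover the mean and variance. The values µ = 40 and σ = 2 are illustrative only (borrowed from the complaint example below), and integrating over µ ± 10σ instead of the whole real line is a simplifying assumption.

```r
# Normal pdf coded directly from the formula
normal_pdf <- function(x, mu, sigma) {
  exp(-0.5 * ((x - mu) / sigma)^2) / (sigma * sqrt(2 * pi))
}

mu <- 40     # illustrative mean (hours)
sigma <- 2   # illustrative standard deviation (hours)
x <- c(35, 40, 43)

# The hand-coded pdf and R's dnorm() should agree
cbind(by_hand = normal_pdf(x, mu, sigma),
      dnorm   = dnorm(x, mean = mu, sd = sigma))

# Mean and variance by numerical integration over mu +/- 10 sigma;
# the probability outside this range is negligible
lower_lim <- mu - 10 * sigma
upper_lim <- mu + 10 * sigma
mean_num <- integrate(function(t) t * normal_pdf(t, mu, sigma),
                      lower_lim, upper_lim)$value
var_num  <- integrate(function(t) (t - mean_num)^2 * normal_pdf(t, mu, sigma),
                      lower_lim, upper_lim)$value
c(mean = mean_num, variance = var_num)   # close to mu = 40 and sigma^2 = 4
```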

1 / 25 Modeling Process Quality Important Continuous Distributions

The standard normal distribution has µ = 0 and σ = 1, and pdf

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}, \quad -\infty < x < \infty.$$

The cumulative distribution function (cdf) of the standard normal distribution is

$$\Phi(x) = \int_{-\infty}^{x} \phi(y)\, dy.$$

It cannot be written in closed form, but can be computed or tabulated.
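
To show that Φ is easy to compute despite having no closed form, this sketch integrates ϕ numerically with R's integrate() and compares the result with pnorm(), R's standard normal cdf. The evaluation points are arbitrary choices that reappear in the examples below.

```r
# Standard normal pdf
phi <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)

# Standard normal cdf by numerical integration of phi from -Inf to x
Phi_numeric <- function(x) integrate(phi, lower = -Inf, upper = x)$value

z <- c(-2.5, 0, 1.645)
Phi_vals <- sapply(z, Phi_numeric)
cbind(z = z, numeric = Phi_vals, pnorm = pnorm(z))
```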

Linear transformation

If X has a normal distribution, say X ∼ N(µ, σ²), and Y = a + bX for some constants a and b ≠ 0, then Y is also normally distributed. By the basic rules of expected values, Y has mean a + bµ and variance b²σ², so Y ∼ N(a + bµ, b²σ²).
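
A simulation sketch of this result, assuming hypothetical constants a = −20 and b = 0.5 applied to X ∼ N(40, 4). The sample mean and variance of Y should be close to a + bµ and b²σ², and an empirical probability for Y should match the normal cdf with standard deviation |b|σ. The seed and sample size are arbitrary.

```r
set.seed(1)                  # reproducible draws
mu <- 40; sigma <- 2         # X ~ N(40, 4)
a <- -20; b <- 0.5           # hypothetical constants in Y = a + bX

x <- rnorm(1e5, mean = mu, sd = sigma)
y <- a + b * x

# Sample moments of Y versus a + b*mu = 0 and b^2 * sigma^2 = 1
c(sample_mean = mean(y), theory_mean = a + b * mu)
c(sample_var  = var(y),  theory_var  = b^2 * sigma^2)

# An empirical probability versus the N(a + b*mu, b^2 sigma^2) cdf
# (the standard deviation of Y is |b| * sigma)
c(empirical = mean(y <= 1),
  normal    = pnorm(1, mean = a + b * mu, sd = abs(b) * sigma))
```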

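Standardizing a random variable

If X ∼ N(µ, σ²), then

$$Z = \frac{X - \mu}{\sigma}$$

is the linear transformation above with a = −µ/σ and b = 1/σ, so it has mean 0 and variance 1. Hence Z ∼ N(0, 1), and P(Z ≤ z) = Φ(z). So

$$P(X \le x) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right).$$

We use calculations of Φ(·) to make probability statements about X by standardizing X into Z.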
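
A simulation sketch of standardization, with hypothetical parameters µ = 10 and σ = 3 and an arbitrary cut-off x₀ = 14. After standardizing, the draws should have mean about 0 and standard deviation about 1, and the fraction of draws with X ≤ x₀ should match Φ((x₀ − µ)/σ).

```r
set.seed(42)
mu <- 10; sigma <- 3                    # hypothetical parameters
x_sample <- rnorm(1e5, mean = mu, sd = sigma)
z_sample <- (x_sample - mu) / sigma     # standardize each draw

c(mean_z = mean(z_sample), sd_z = sd(z_sample))   # close to 0 and 1

x0 <- 14                                # hypothetical cut-off
c(empirical = mean(x_sample <= x0),     # fraction of draws with X <= x0
  via_Phi   = pnorm((x0 - mu) / sigma)) # Phi((x0 - mu) / sigma)
```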

Example

The time X to resolve a customer complaint at a certain financial institution is normally distributed with mean µ = 40 hours and standard deviation σ = 2 hours. What proportion of complaints are resolved in at most 35 hours?

$$P(X \le 35) = P\left(Z = \frac{X - 40}{2} \le \frac{35 - 40}{2} = -2.5\right) = \Phi(-2.5) = 0.00621.$$

Use the table in Appendix II or the R function pnorm(-2.5) to find the value 0.00621. So fewer than 1% of complaints are resolved in 35 hours or less.
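
The same calculation in R, done two ways: by standardizing first, and by letting pnorm() standardize internally through its mean and sd arguments. Both should give 0.00621 after rounding.

```r
mu <- 40; sigma <- 2
z <- (35 - mu) / sigma                          # standardized value: -2.5
p_std    <- pnorm(z)                            # Phi(-2.5) via standardization
p_direct <- pnorm(35, mean = mu, sd = sigma)    # pnorm() standardizes internally
round(c(p_std = p_std, p_direct = p_direct), 5) # both about 0.00621
```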

Example, continued

Within what time are 95% of complaints resolved? Since X = 40 + 2Z,

$$0.95 = P(Z \le 1.645) = P(40 + 2Z \le 40 + 2 \times 1.645) = P(X \le 43.29).$$

Use inverse look-up in the same table, or the R function qnorm(0.95), to find the value 1.645. So 95% of complaints are resolved in 43.29 hours or less.
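
The inverse calculation in R: qnorm() gives the standard normal quantile, and back-transforming with X = 40 + 2Z gives the same answer as calling qnorm() with mean and sd directly.

```r
mu <- 40; sigma <- 2
z95 <- qnorm(0.95)                               # about 1.645
x95_std    <- mu + sigma * z95                   # back-transform: X = 40 + 2Z
x95_direct <- qnorm(0.95, mean = mu, sd = sigma) # quantile of N(40, 4) directly
round(c(x95_std = x95_std, x95_direct = x95_direct), 2)   # both about 43.29
```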

Central Limit Theorem

Why is the normal distribution used as a model for data variation? The Central Limit Theorem (CLT) implies that any variable that is the accumulation of many small, independent contributions is approximately normal. Specifically: if $X_1, X_2, \ldots, X_n$ are independent random variables with means $\mu_i$ and variances $\sigma_i^2$, and if $Y_n = X_1 + X_2 + \cdots + X_n$, then the distribution of

$$Z_n = \frac{Y_n - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$$

approaches the standard normal distribution as n approaches infinity.

Note: Some constraints on the variances and on the tails of the distributions (for example, the Lindeberg condition) are needed for the CLT to hold.
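
A simulation sketch of the CLT with non-identical summands, to match the general statement above. As a hypothetical choice, $X_i$ is uniform on (0, i) for i = 1, …, 30, so $\mu_i = i/2$ and $\sigma_i^2 = i^2/12$. Simulated values of $Z_n$ should give probabilities close to the standard normal cdf. The seed, n, and number of replications are arbitrary.

```r
set.seed(123)
n     <- 30
maxs  <- 1:n                     # hypothetical: X_i ~ Uniform(0, i)
mu_i  <- maxs / 2                # E(X_i)
var_i <- maxs^2 / 12             # Var(X_i)

# Simulate Z_n = (Y_n - sum(mu_i)) / sqrt(sum(var_i)) many times
z_n <- replicate(20000, {
  y_n <- sum(runif(n, min = 0, max = maxs))   # one draw of Y_n
  (y_n - sum(mu_i)) / sqrt(sum(var_i))
})

# Compare a few simulated probabilities with the standard normal cdf
q <- c(-1.645, 0, 1)
sim_probs <- sapply(q, function(t) mean(z_n <= t))
rbind(simulated = sim_probs, normal = pnorm(q))
```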

Lognormal distribution

If X is a positive random variable, and W = log X ∼ N(θ, ω²) (natural logarithm), then X has the lognormal distribution with parameters θ and ω. The mean and variance of X are

$$E(X) = e^{\theta + \frac{1}{2}\omega^2}$$

and

$$\operatorname{Var}(X) = \left(e^{\omega^2} - 1\right) e^{2\theta + \omega^2} = \left(e^{\omega^2} - 1\right) E(X)^2.$$

The lognormal distribution is sometimes used as a model for the time to failure of a piece of equipment.
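
A sketch checking the two moment formulas numerically, with hypothetical parameters θ = 1 and ω = 0.5. R's dlnorm() takes meanlog and sdlog, which correspond to θ and ω here, so integrating against it should reproduce the closed forms.

```r
theta <- 1; omega <- 0.5   # hypothetical parameters: log X ~ N(1, 0.25)

# Moments from the closed-form expressions
mean_formula <- exp(theta + omega^2 / 2)
var_formula  <- (exp(omega^2) - 1) * exp(2 * theta + omega^2)

# The same moments by numerical integration against dlnorm(),
# whose meanlog and sdlog arguments play the roles of theta and omega
f <- function(x) dlnorm(x, meanlog = theta, sdlog = omega)
mean_num <- integrate(function(x) x * f(x), 0, Inf)$value
var_num  <- integrate(function(x) (x - mean_num)^2 * f(x), 0, Inf)$value

rbind(formula = c(mean = mean_formula, var = var_formula),
      numeric = c(mean = mean_num,     var = var_num))
```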

Exponential distribution

The pdf of the exponential distribution is

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0,$$

where λ > 0 is the rate parameter of the distribution. The mean and variance are

$$E(X) = \frac{1}{\lambda} \quad \text{and} \quad \operatorname{Var}(X) = \frac{1}{\lambda^2}.$$
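
A sketch checking the mean and variance by numerical integration, with a hypothetical rate λ = 0.1 (mean 10). It also confirms that the formula matches R's dexp(), which is parameterized by the same rate.

```r
lambda <- 0.1                              # hypothetical rate: mean 10
f <- function(x) lambda * exp(-lambda * x) # exponential pdf from the formula

mean_num <- integrate(function(x) x * f(x), 0, Inf)$value
var_num  <- integrate(function(x) (x - mean_num)^2 * f(x), 0, Inf)$value
c(mean = mean_num, theory = 1 / lambda)      # both about 10
c(var  = var_num,  theory = 1 / lambda^2)    # both about 100

# The formula agrees with R's built-in density
all.equal(f(c(0, 5, 20)), dexp(c(0, 5, 20), rate = lambda))
```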

The cdf is

$$F(x) = P(X \le x) = \int_0^x \lambda e^{-\lambda y}\, dy = 1 - e^{-\lambda x}, \quad x \ge 0,$$

and $P(X > x) = 1 - F(x) = e^{-\lambda x}$.
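
A sketch comparing three routes to the cdf, with a hypothetical rate λ = 0.1: the closed form, numerical integration of the pdf, and R's pexp(). The last line uses pexp(..., lower.tail = FALSE) for the survival probability P(X > x).

```r
lambda <- 0.1                                  # hypothetical rate
x <- c(1, 5, 10, 30)

F_formula <- 1 - exp(-lambda * x)              # closed-form cdf
F_numeric <- sapply(x, function(u) integrate(dexp, 0, u, rate = lambda)$value)
cbind(x = x, formula = F_formula, numeric = F_numeric,
      pexp = pexp(x, rate = lambda))

# Survival probability P(X > x) = exp(-lambda * x)
surv <- pexp(x, rate = lambda, lower.tail = FALSE)
surv
```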

Lack of memory

If x′ > 0, then

$$P(X \le x + x' \mid X > x) = \frac{F(x + x') - F(x)}{1 - F(x)} = \frac{e^{-\lambda x} - e^{-\lambda (x + x')}}{e^{-\lambda x}} = 1 - e^{-\lambda x'}.$$

That is, conditionally on X > x, the distribution of X − x is the same as the unconditional distribution of X: "lack of memory".
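
The identity can be checked directly with pexp(). This sketch assumes a hypothetical rate λ = 0.1, elapsed time x = 5, and additional time x′ = 3. The conditional probability computed from the definition should equal the unconditional P(X ≤ x′).

```r
lambda <- 0.1
x  <- 5       # hypothetical time already survived
xp <- 3       # hypothetical additional time x'

# Conditional probability from the definition
cond <- (pexp(x + xp, rate = lambda) - pexp(x, rate = lambda)) /
        (1 - pexp(x, rate = lambda))
# Unconditional probability P(X <= x')
uncond <- pexp(xp, rate = lambda)
c(conditional = cond, unconditional = uncond)   # equal up to rounding
```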

Suppose you are waiting for a bus that runs on average every 10 minutes. If the time until the arrival of the next bus is exponentially distributed, the expected waiting time is 10 minutes. If no bus arrives in the first 5 minutes (or 10 minutes, or ...), the expected waiting time is still a further 10 minutes.
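
A simulation sketch of the bus example, assuming exponential waiting times with mean 10 minutes (rate 1/10 per minute). Among the simulated waits longer than 5 minutes, the remaining time beyond 5 minutes should still average about 10 minutes. The seed and sample size are arbitrary.

```r
set.seed(2)
waits <- rexp(1e6, rate = 1 / 10)   # waiting times with mean 10 minutes

mean(waits)                         # about 10
# For waits longer than 5 minutes, the extra time beyond the first 5
extra <- waits[waits > 5] - 5
mean(extra)                         # also about 10
```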

Also, if x′ is small, then since $1 - e^{-\lambda x'} \approx \lambda x'$,

$$P(X \le x + x' \mid X > x) \approx \lambda x'.$$

When X is the time to failure of some system, λ is the failure rate of the system. In words: P(fail in a short time interval | not failed at the start) ≈ failure rate × length of interval.
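
A short sketch of the approximation, with a hypothetical failure rate λ = 0.1 and a system that has already survived to time 20. As the interval length shrinks, the ratio of the exact conditional probability to λx′ approaches 1.

```r
lambda <- 0.1                        # hypothetical failure rate
x <- 20                              # hypothetical current age of the system
h <- c(1, 0.1, 0.01)                 # interval lengths x'

exact  <- (pexp(x + h, lambda) - pexp(x, lambda)) / (1 - pexp(x, lambda))
approx <- lambda * h                 # failure rate times interval length
ratio  <- exact / approx
cbind(h = h, exact = exact, approx = approx, ratio = ratio)
```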

Gamma distribution

The pdf of the gamma distribution is

$$f(x) = \frac{\lambda}{\Gamma(r)} (\lambda x)^{r - 1} e^{-\lambda x}, \quad x > 0,$$

where r > 0 is the shape parameter and λ > 0 is the rate parameter. The special case r = 1 is the exponential distribution.

Warning: Montgomery calls λ the scale parameter, but the scale of the distribution is really 1/λ.
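
A sketch checking the pdf against R's dgamma(), using a hypothetical shape r = 2.5 and rate λ = 0.8. R's dgamma() accepts either rate or scale, with scale = 1/rate, which is consistent with the warning above. Setting r = 1 reproduces the exponential pdf.

```r
gamma_pdf <- function(x, r, lambda) {
  lambda / gamma(r) * (lambda * x)^(r - 1) * exp(-lambda * x)
}

x <- c(0.5, 2, 7)
r <- 2.5; lambda <- 0.8                     # hypothetical shape and rate
cbind(by_hand = gamma_pdf(x, r, lambda),
      dgamma  = dgamma(x, shape = r, rate = lambda))

# In R, the scale is 1 / rate
all.equal(dgamma(x, shape = r, rate = lambda), dgamma(x, shape = r, scale = 1 / lambda))

# r = 1 reduces to the exponential pdf
all.equal(gamma_pdf(x, 1, lambda), dexp(x, rate = lambda))
```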

The mean and variance are

$$E(X) = \frac{r}{\lambda} \quad \text{and} \quad \operatorname{Var}(X) = \frac{r}{\lambda^2}.$$

If r is an integer and $X_1, X_2, \ldots, X_r$ are independent exponential random variables with rate parameter λ, then $X = X_1 + X_2 + \cdots + X_r$ has the gamma distribution with shape parameter r and rate parameter λ. This integer-shape case is also known as the Erlang distribution.
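
A simulation sketch of the sum-of-exponentials result, with a hypothetical integer shape r = 3 and rate λ = 0.5. Each row of a matrix holds r independent exponential draws, and the row sums are compared with the gamma mean r/λ, variance r/λ², and R's pgamma().

```r
set.seed(3)
r <- 3; lambda <- 0.5            # hypothetical integer shape and rate
n_sim <- 1e5

# Each row holds r independent exponential draws; row sums are Erlang(r, lambda)
draws <- matrix(rexp(n_sim * r, rate = lambda), ncol = r)
x <- rowSums(draws)

c(sample_mean = mean(x), theory = r / lambda)     # theory: 6
c(sample_var  = var(x),  theory = r / lambda^2)   # theory: 12
# Compare an empirical probability with pgamma()
c(empirical = mean(x <= 5), pgamma = pgamma(5, shape = r, rate = lambda))
```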

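Weibull distribution

The pdf of the Weibull distribution is

$$f(x) = \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta - 1} \exp\left[-\left(\frac{x}{\theta}\right)^{\beta}\right], \quad x > 0,$$

where β > 0 is the shape parameter and θ > 0 is the scale parameter. The mean and variance are, forgettably,

$$E(X) = \theta\, \Gamma\left(1 + \frac{1}{\beta}\right)$$

and

$$\operatorname{Var}(X) = \theta^2 \left[\Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma\left(1 + \frac{1}{\beta}\right)^2\right].$$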
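
Since these formulas are easy to forget, here is a sketch that checks them, with hypothetical shape β = 1.5 and scale θ = 2. R's dweibull() uses the same shape and scale parameterization, so the hand-coded pdf should match it, and integrating against it should reproduce the Gamma-function expressions.

```r
weibull_pdf <- function(x, beta, theta) {
  (beta / theta) * (x / theta)^(beta - 1) * exp(-(x / theta)^beta)
}

beta <- 1.5; theta <- 2          # hypothetical shape and scale
x <- c(0.5, 2, 5)
all.equal(weibull_pdf(x, beta, theta), dweibull(x, shape = beta, scale = theta))

# Mean and variance from the Gamma-function formulas
mean_formula <- theta * gamma(1 + 1 / beta)
var_formula  <- theta^2 * (gamma(1 + 2 / beta) - gamma(1 + 1 / beta)^2)

# The same moments by numerical integration
f <- function(x) dweibull(x, shape = beta, scale = theta)
mean_num <- integrate(function(x) x * f(x), 0, Inf)$value
var_num  <- integrate(function(x) (x - mean_num)^2 * f(x), 0, Inf)$value
rbind(formula = c(mean = mean_formula, var = var_formula),
      numeric = c(mean = mean_num,     var = var_num))
```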

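The cdf is

$$F(x) = 1 - \exp\left[-\left(\frac{x}{\theta}\right)^{\beta}\right], \quad x > 0.$$

If Y follows the exponential distribution with rate parameter 1, and $X = \theta Y^{1/\beta}$, then

$$P(X \le x) = P\left(Y \le \left(\frac{x}{\theta}\right)^{\beta}\right) = 1 - \exp\left[-\left(\frac{x}{\theta}\right)^{\beta}\right],$$

so X follows the Weibull distribution.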
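
A simulation sketch of this construction, with hypothetical β = 1.5 and θ = 2. Transforming rate-1 exponential draws by $X = \theta Y^{1/\beta}$ should produce empirical probabilities that match pweibull() and the closed-form cdf.

```r
set.seed(4)
beta <- 1.5; theta <- 2          # hypothetical shape and scale
y <- rexp(1e5, rate = 1)         # Y ~ exponential with rate 1
x <- theta * y^(1 / beta)        # X = theta * Y^(1/beta)

q <- c(0.5, 2, 4)
emp <- sapply(q, function(t) mean(x <= t))
rbind(empirical = emp,
      pweibull  = pweibull(q, shape = beta, scale = theta),
      formula   = 1 - exp(-(q / theta)^beta))
```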

The Weibull distribution is also used to model time to failure, with failure rate

$$\frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta - 1}.$$

If β > 1, the failure rate increases over time, while if β < 1 the failure rate decreases over time. The special case β = 1 is the exponential distribution, with constant failure rate.
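
A sketch of the Weibull failure rate, using the standard identity that the failure rate (hazard) at x equals f(x) / (1 − F(x)). The scale θ = 2 and the evaluation points are hypothetical. The rows show the rate decreasing for β = 0.5, constant for β = 1, and increasing for β = 1.5.

```r
weibull_hazard <- function(x, beta, theta) (beta / theta) * (x / theta)^(beta - 1)

theta <- 2                                  # hypothetical scale
x <- c(0.5, 1, 2, 4)

# The formula equals f(x) / (1 - F(x)) computed from R's Weibull functions
all.equal(weibull_hazard(x, 1.5, theta),
          dweibull(x, 1.5, theta) / pweibull(x, 1.5, theta, lower.tail = FALSE))

# Decreasing for beta < 1, constant for beta = 1, increasing for beta > 1
rbind(beta_0.5 = weibull_hazard(x, 0.5, theta),
      beta_1   = weibull_hazard(x, 1.0, theta),
      beta_1.5 = weibull_hazard(x, 1.5, theta))
```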

Graphical Tools for Comparing Distributions

Quantile-quantile plot

Many distributions are available as possible models for a given problem. We need graphical tools for comparing one distribution with another. One tool that has been found widely useful is the quantile-quantile plot, or Q-Q plot: a plot of the quantiles of one distribution against the matching quantiles of the other distribution. The distributions might be theoretical, or constructed from observations.

Theoretical quantiles

The median m of a distribution with cdf F(·) solves the equation F(m) = 0.5. Similarly, the lower and upper quartiles, $q_{0.25}$ and $q_{0.75}$, solve the equations $F(q_{0.25}) = 0.25$ and $F(q_{0.75}) = 0.75$. In general, for any 0 < p < 1, the pth quantile $q_p$ solves the equation $F(q_p) = p$. If F(·) is not continuous, the solution may not exist for all p, but for many distributions it exists: $q_p = F^{-1}(p)$.
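
A sketch of theoretical quantiles for a hypothetical exponential distribution with rate λ = 0.1. Solving $1 - e^{-\lambda q} = p$ gives $q_p = -\log(1 - p)/\lambda$. This matches R's quantile function qexp() and a generic root-finding solution of F(q) − p = 0 with uniroot(). The search interval (0, 1000) is an assumption that suits this rate.

```r
lambda <- 0.1                          # hypothetical exponential rate
p <- c(0.25, 0.5, 0.75)                # lower quartile, median, upper quartile

# Exponential quantiles in closed form: solve 1 - exp(-lambda * q) = p
q_formula <- -log(1 - p) / lambda
q_builtin <- qexp(p, rate = lambda)    # R's quantile function, F^{-1}(p)

# A generic numerical alternative: find the root of F(q) - p
q_root <- sapply(p, function(pp)
  uniroot(function(q) pexp(q, rate = lambda) - pp,
          interval = c(0, 1000), tol = 1e-10)$root)

cbind(p = p, formula = q_formula, qexp = q_builtin, uniroot = q_root,
      F_of_q = pexp(q_builtin, rate = lambda))   # last column recovers p
```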

Empirical quantiles

If instead of a theoretical distribution we have some observations $x_1, x_2, \ldots, x_n$, we construct empirical quantiles. First order the observations: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. Next, treat the kth ordered value $x_{(k)}$ as an estimate of the $p_k$th quantile, where

$$p_k = \begin{cases} \dfrac{k}{n+1} & \text{(Weibull)} \\[6pt] \dfrac{k - \frac{1}{2}}{n} & \text{(Hazen)} \\[6pt] \quad\vdots & \text{(many other proposals)} \end{cases}$$

There is very little difference between these choices when n is not small.
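
A sketch computing both plotting positions for a hypothetical sample of n = 20 normal observations. The largest difference between the Weibull and Hazen choices is already small at this n. For reference, R's ppoints() uses (k − a)/(n + 1 − 2a) with a = 1/2 when n > 10, which is the Hazen choice.

```r
set.seed(5)
x <- rnorm(20, mean = 50, sd = 5)    # hypothetical sample, n = 20
n <- length(x)
k <- 1:n

x_sorted  <- sort(x)                 # order statistics x_(1) <= ... <= x_(n)
p_weibull <- k / (n + 1)             # Weibull plotting positions
p_hazen   <- (k - 0.5) / n           # Hazen plotting positions

# The two choices differ little even for n = 20
round(cbind(x_sorted, p_weibull, p_hazen)[c(1, 10, 20), ], 3)
max_diff <- max(abs(p_weibull - p_hazen))
max_diff

# R's ppoints() matches Hazen for n > 10
all.equal(ppoints(n), p_hazen)
```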

To compare two theoretical distributions $F_X(\cdot)$ and $F_Y(\cdot)$, plot $F_Y^{-1}(p)$ against $F_X^{-1}(p)$, 0 < p < 1.

To compare empirical quantiles with a theoretical distribution F(·), plot $x_{(k)}$ against $F^{-1}(p_k)$, k = 1, 2, …, n.

To compare two sets of empirical quantiles: if the sample sizes are the same, plot $y_{(k)}$ against $x_{(k)}$. If the sample sizes are unequal, interpolate quantiles from the larger sample to match the quantiles of the smaller sample.
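
A sketch of a normal Q-Q plot built by hand from a hypothetical skewed sample of n = 30, plotting $x_{(k)}$ against $\Phi^{-1}(p_k)$ with Hazen positions from ppoints(). The same points come back from qqnorm(..., plot.it = FALSE), in data order rather than sorted. For two samples, R's qqplot() linearly interpolates the sorted larger sample down to the smaller size, which matches the recipe above.

```r
set.seed(6)
x <- rexp(30, rate = 0.1)                 # hypothetical sample, n = 30
n <- length(x)
p_k  <- ppoints(n)                        # Hazen positions (k - 1/2)/n for n > 10
theo <- qnorm(p_k)                        # F^{-1}(p_k) for F = N(0, 1)
emp  <- sort(x)                           # x_(k)

plot(theo, emp, xlab = "Theoretical quantiles", ylab = "Sample quantiles")

# qqnorm() computes the same points (returned in data order)
qq <- qqnorm(x, plot.it = FALSE)
all.equal(sort(qq$x), theo)
all.equal(sort(qq$y), emp)

# Two samples of unequal size: the larger one is interpolated to 30 points
y <- rexp(50, rate = 0.1)                 # second hypothetical sample, n = 50
qq2 <- qqplot(x, y, plot.it = FALSE)
length(qq2$x)
```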

Interpreting the Q-Q plot

If the distributions are the same, then their quantiles are the same, and the plotted points all fall on the line y = x. If the standardized distributions are the same, then one set of quantiles is a linear function of the other set, and the plotted points all fall on some other straight line. For example, N(µ, σ²) and N(0, 1) are both N(0, 1) when standardized. Since each quantile of N(µ, σ²) is µ + σ times the matching quantile of N(0, 1), their Q-Q plot is the line y = µ + σx. Empirical quantiles are estimates of theoretical quantiles but are not exact, so the plotted points fall close to, but not exactly on, the corresponding line.
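
A sketch of this interpretation for a hypothetical sample of 200 draws from N(40, 4). A least-squares line through the normal Q-Q points should have an intercept near µ = 40 and a slope near σ = 2. The correlation of the points should be close to 1 because they fall close to, but not exactly on, the line.

```r
set.seed(7)
mu <- 40; sigma <- 2                      # hypothetical N(40, 4) population
x <- rnorm(200, mean = mu, sd = sigma)
qq <- qqnorm(x, plot.it = FALSE)          # qq$x: N(0,1) quantiles, qq$y: data

fit <- lm(y ~ x, data = qq)
coef(fit)                                 # intercept near mu, slope near sigma
cor(qq$x, qq$y)                           # close to 1
```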

If the plotted points are systematically nonlinear, then the distributions are different, even when standardized: A simple curve indicates a difference in skewness. An S-curve indicates a difference in tail length.

Example: aluminum contamination in plastic

The R function qqnorm() plots the empirical quantiles of a sample against the theoretical quantiles of the standard normal distribution:

aluminum <- read.csv("Data/Table-03-05.csv")   # read the contamination data
qqnorm(aluminum$Contamination)                  # normal Q-Q plot of the raw values

The curve indicates that the sample values have a skewed distribution. Often, skewness can be reduced or removed by transforming the observations, for example by taking logarithms:

qqnorm(log(aluminum$Contamination))
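
The contamination data file is not reproduced here, so the following sketch substitutes a simulated right-skewed (lognormal) sample. The seed, sample size, and parameters are hypothetical and say nothing about the actual aluminum data. As a rough numerical summary of straightness, it uses the correlation between the two coordinates of the normal Q-Q plot (closer to 1 means straighter), then draws the raw and log-scale plots side by side with qqline() reference lines.

```r
set.seed(8)
contam <- rlnorm(50, meanlog = 3, sdlog = 0.8)   # hypothetical right-skewed data

# Correlation between the coordinates of a normal Q-Q plot
qq_cor <- function(v) {
  qq <- qqnorm(v, plot.it = FALSE)
  cor(qq$x, qq$y)
}

c(raw = qq_cor(contam), log = qq_cor(log(contam)))   # log scale is straighter

op <- par(mfrow = c(1, 2))
qqnorm(contam, main = "Raw values")
qqline(contam)
qqnorm(log(contam), main = "Log values")
qqline(log(contam))
par(op)
```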
