Point Estimates: Parameters I (7.1.1) - PowerPoint PPT Presentation




SLIDE 1

204312 PROBABILITY AND RANDOM PROCESSES FOR COMPUTER ENGINEERS

Lecture 10: Chapter 7

1st Semester, 2007 Monchai Sopitkamon, Ph.D.

Outline: Chapter 7 — Parameter Estimation Using the Sample Mean

Point Estimates (7.1)
Properties of Point Estimates (7.2)
Confidence Intervals (8.1)

Point Estimates: Parameters I (7.1.1)

Parameters denote a quantity that is a property of an unknown prob distribution — for example, a mean, a variance, or a particular quantile of the prob distribution.

Parameters are unknown, and the goal of statistical inference is to estimate them.

Point Estimates: Parameters II

Ex. (Machine Breakdowns): Let p0 be the prob that a machine breakdown is due to operator misuse.

p0 = P(misuse) is a parameter since it depends on the prob distribution that governs the cause of machine breakdowns. In practice, p0 is an unknown quantity, but it can be estimated by counting the number of machine breakdowns due to each cause.

SLIDE 2

Point Estimates: Statistics I

Statistics denote a quantity that is a property of a sample — for example, a sample mean x̄, a sample variance s², or a particular sample quantile.

Statistics are random variables whose observed values (e.g., x̄, s²) can be calculated from a set of data observations (x1, …, xn).

Statistics are used to estimate unknown parameters.
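The statistics named above can be computed directly from a data set; a minimal sketch using Python's standard library, with a small hypothetical set of observations:

```python
# Computing the sample statistics named above -- sample mean, sample
# variance, and a sample quantile -- from a hypothetical data set.
import statistics

observations = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]  # hypothetical x1, ..., xn

x_bar = statistics.mean(observations)      # sample mean x-bar
s2 = statistics.variance(observations)     # sample variance s^2 (n - 1 divisor)
med = statistics.median(observations)      # the 0.5 sample quantile

print(x_bar, s2, med)
```

A different random sample from the same population would give different observed values — which is exactly the sense in which statistics are random variables.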

Point Estimates: Estimation I

A point estimate θ̂ of an unknown parameter θ is a statistic that is a "best guess" of the value of θ.

More than one good point estimate of a parameter is possible.

Point estimates are only as good as the data set from which they are calculated. So the question is how representative the sample is of the population relating to the parameter being estimated.

Point Estimates: Estimation II

The relationship between a point estimate and an unknown parameter θ

Point Estimates: Estimation III

Estimation of the population mean by the sample mean

SLIDE 3

Point Estimates: Estimation IV

Estimating the probability that a machine breakdown is due to operator misuse

Point Estimates: Estimation V

Estimating the population mean and variance of the students’ midterm scores

Outline

Point Estimates (7.1)
Properties of Point Estimates (7.2)
Confidence Intervals (8.1)
Hypothesis Testing (8.2)

Properties of Point Estimates (7.2)

Two basic criteria for determining good point estimates of a particular parameter: unbiased estimates and minimum variance estimates.

These criteria help to decide which statistics to use as point estimates.

SLIDE 4

Properties of Point Estimates: Unbiased Estimates I

A point estimate θ̂ for a parameter θ is unbiased if

E(θ̂) = θ

The property of unbiasedness requires a point estimate θ̂ to have a prob distribution with mean equal to θ.

If a point estimate is not unbiased, then its bias can be defined as

bias = E(θ̂) − θ

The smaller the absolute value of the bias, the better.

Properties of Point Estimates: Unbiased Estimates II

An unbiased point estimate

Properties of Point Estimates: Unbiased Estimates III

A biased point estimate

Properties of Point Estimates: Unbiased Estimates IV

Consider a sequence of Bernoulli trials with a constant unknown success prob p.

Parameter p is estimated by conducting a sequence of n trials, with X counting the number of successes observed. The point estimate of p is

p̂ = X/n

Is this an unbiased point estimate? Since X ∼ B(n, p), E(X) = np. Therefore,

E(p̂) = E(X/n) = (1/n)E(X) = (1/n)np = p

so p̂ is unbiased.
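A simulation sketch (not part of the slides) checking this unbiasedness: averaging p̂ = X/n over many repeated experiments should come close to the true p. The values of n, p, and the repetition count are arbitrary choices for illustration.

```python
# Simulate many Bernoulli(n, p) experiments and average the estimates
# p_hat = X/n; the average should be close to p, since E(p_hat) = p.
import random

random.seed(0)
n, p, reps = 50, 0.3, 20000

total = 0.0
for _ in range(reps):
    x = sum(random.random() < p for _ in range(n))  # X ~ B(n, p)
    total += x / n                                   # p_hat = X / n

avg_p_hat = total / reps
print(avg_p_hat)  # close to p = 0.3
```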

SLIDE 5

Properties of Point Estimates: Unbiased Estimates V

Point Estimate of a Population Mean: If X1, …, Xn is a sample of observations from a prob distribution with a mean µ (i.e., E(Xi) = µ), then the sample mean

μ̂ = X̄ = (X1 + … + Xn)/n = (1/n) Σᵢ₌₁ⁿ Xᵢ

is an unbiased point estimate of the population mean µ.

Properties of Point Estimates: Unbiased Estimates VI

Ex. Machine Breakdowns: Suppose that X1, …, Xn is a sample of observations from a prob distribution with a mean µ and a variance σ². Is the sample mean μ̂ = X̄ an unbiased point estimate of the population mean µ?

Since E(Xi) = µ for 1 ≤ i ≤ n,

E(μ̂) = E(X̄) = E((1/n) Σᵢ₌₁ⁿ Xᵢ) = (1/n) Σᵢ₌₁ⁿ E(Xᵢ) = (1/n) · nµ = µ

so X̄ is unbiased.

Properties of Point Estimates: Unbiased Estimates VII

The sample mean X̄ has expected value E(X̄) = E(X) and variance Var(X̄) = Var(X)/n.

Proof:

E(X̄) = (1/n)(E(X1) + … + E(Xn)) = (1/n) · nE(X) = E(X)

From Var(aY) = a²Var(Y), Var(X̄) = Var(X1 + … + Xn)/n². Since the Xi are iid, Var(X1 + … + Xn) = Var(X1) + … + Var(Xn) = nVar(X). Thus,

Var(X̄) = nVar(X)/n² = Var(X)/n

As n approaches ∞, the variance of the sample mean approaches 0.
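A simulation sketch (not from the slides) illustrating Var(X̄) = Var(X)/n: for X uniform on [0, 1], Var(X) = 1/12, so the sample mean of n = 12 observations should have variance close to 1/144. All numbers here are arbitrary illustration choices.

```python
# Estimate Var(X_bar) empirically and compare against Var(X)/n = (1/12)/12.
import random
import statistics

random.seed(1)
n, reps = 12, 40000

# Collect many realizations of the sample mean of n uniform observations.
means = [statistics.mean(random.random() for _ in range(n)) for _ in range(reps)]
var_of_mean = statistics.pvariance(means)
print(var_of_mean)  # approximately 1/144 ~ 0.00694
```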

Properties of Point Estimates: Unbiased Estimates VIII

Ex. Quiz 7.1 (Y&G): Let X be an exponential RV with E(X) = 1, and let X̄ be the sample mean of n independent samples of X. How many samples n are needed to guarantee that the variance of the sample mean is ≤ 0.01?

Since X is an exponential RV with E(X) = 1, Var(X) = (E(X))² = 1. From

Var(X̄) = Var(X)/n = 1/n = 0.01

we need n = 100 samples.
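A one-line computational check of the Quiz 7.1 arithmetic (assumed code, not part of the slides):

```python
# Smallest n with Var(X)/n <= target, for Var(X) = 1 and target 0.01.
import math

var_x = 1.0      # Var(X) for exponential with E(X) = 1
target = 0.01    # required bound on Var(X_bar)
n = math.ceil(var_x / target)
print(n)  # 100
```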

SLIDE 6

Properties of Point Estimates: Unbiased Estimates IX

Point Estimate of a Population Variance: If X1, …, Xn is a sample of observations from a prob distribution with a variance σ², the sample variance

σ̂² = S² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1) = (Σᵢ₌₁ⁿ Xᵢ² − nX̄²) / (n − 1)

is an unbiased point estimate of the population variance σ²:

E(σ̂²) = E(S²) = σ²

(If the divisor n is used instead of n − 1, the resulting estimate is biased, with expected value (n − 1)σ²/n.)

Properties of Point Estimates: Minimum Variance of Estimates I

The smaller the variance Var(θ̂) of the point estimate, the better the point estimate.
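A simulation sketch (not from the slides) of why the n − 1 divisor is used: averaging both variance estimates over many samples shows the n − 1 version centering on σ² while the n version centers on (n − 1)σ²/n. The distribution and sample size are arbitrary choices.

```python
# Compare the n - 1 and n divisors for the sample variance by averaging
# each estimate over many repeated samples from a standard normal
# distribution (true sigma^2 = 1).
import random

random.seed(2)
n, reps = 5, 40000

avg_unbiased, avg_biased = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)
    avg_unbiased += ss / (n - 1)   # divisor n - 1
    avg_biased += ss / n           # divisor n
avg_unbiased /= reps
avg_biased /= reps

print(avg_unbiased)  # close to sigma^2 = 1.0
print(avg_biased)    # close to (n - 1)/n = 0.8
```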

Properties of Point Estimates: Minimum Variance of Estimates II

The unbiased point estimate θ̂2 is better than the unbiased point estimate θ̂1 because it has a smaller variance.

Properties of Point Estimates: Minimum Variance of Estimates III

Comparing point estimates with different biases and different variances

Properties of Point Estimates: Minimum Variance of Estimates IV

In that case, the point estimate that provides the smaller mean square error (MSE) is preferable.

By definition, MSE(θ̂) = E((θ̂ − θ)²), which reduces to

MSE(θ̂) = Var(θ̂) + bias²

From the previous figure, suppose that

θ̂1 ≈ N(1.1θ, 0.04θ²)  and  θ̂2 ≈ N(1.2θ, 0.02θ²)

Then

MSE(θ̂1) = 0.04θ² + (0.1θ)² = 0.05θ²  (preferable)
MSE(θ̂2) = 0.02θ² + (0.2θ)² = 0.06θ²
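A computational check of the MSE comparison (assumed code matching the numbers above, in units of θ²):

```python
# MSE = Var + bias^2 for the two estimates, in units of theta^2.
var1, bias1 = 0.04, 0.1   # theta_hat_1 ~ N(1.1*theta, 0.04*theta^2)
var2, bias2 = 0.02, 0.2   # theta_hat_2 ~ N(1.2*theta, 0.02*theta^2)

mse1 = var1 + bias1 ** 2
mse2 = var2 + bias2 ** 2
print(mse1, mse2)  # 0.05 and 0.06 -> theta_hat_1 is preferable
```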

SLIDE 7

Properties of Point Estimates: Minimum Variance of Estimates V

From MSE(θ̂) = Var(θ̂) + bias², if θ̂ is the sample mean X̄, this reduces to MSE(X̄) = Var(X̄) + 0, since the sample mean is unbiased (bias = 0):

MSE(X̄) = Var(X)/n

The quantity

eₙ = √MSE(X̄) = √(Var(X)/n)

is the standard error of the estimate. The standard error gives us an indication of how far the sample mean is expected to deviate from the expected value.

Properties of Point Estimates: Minimum Variance of Estimates VI

Specifically, when X is a normal RV (so X̄ is also normal),

P(E(X) − eₙ ≤ X̄ ≤ E(X) + eₙ) = 2Φ(1) − 1 ≈ 0.68

meaning there is about a two-thirds prob. that the sample mean is within one standard error of the expected value.
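A quick check (not slide code) of the value 2Φ(1) − 1, using the identity Φ(x) = (1 + erf(x/√2))/2 for the standard normal CDF:

```python
# Standard normal CDF via math.erf, then evaluate 2*Phi(1) - 1.
import math

def phi(x):
    return (1 + math.erf(x / math.sqrt(2))) / 2

prob = 2 * phi(1) - 1
print(prob)  # ~ 0.6827
```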

Properties of Point Estimates: Minimum Variance of Estimates VII

Ex. 7.5 (Y&G): How many independent trials n are needed to guarantee that p̂, the relative frequency estimate of p, has standard error no greater than 0.1?

From eₙ = √(Var(X̄)) = √(p(1 − p)/n), we need to choose n large enough to guarantee eₙ ≤ 0.1, or eₙ² ≤ 0.01.

From the fact that 0 ≤ p ≤ 1, p(1 − p) ≤ 0.25. Thus eₙ² ≤ 0.25/n, and setting 0.25/n = 0.01 gives n ≥ 25 trials.
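A computational check of the Ex. 7.5 arithmetic (assumed code, not from the slides): the worst case p(1 − p) = 0.25 combined with the requirement eₙ² = 0.25/n ≤ 0.01 gives the minimum number of trials.

```python
# Minimum n so that the worst-case squared standard error 0.25/n <= 0.01.
import math

worst_case = 0.25      # maximum of p*(1 - p) over 0 <= p <= 1
target_se = 0.1        # required bound on the standard error
n = math.ceil(worst_case / target_se ** 2)
print(n)  # 25
```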

Deviation of a RV from the Expected Value I

Analyze how far a RV is away from its expected value by computing |X − μX|, the absolute difference between a RV X and its expected value.

Markov Inequality: For a RV X such that P(X < 0) = 0 and a constant c,

P(X ≥ c²) ≤ E(X)/c²

Note that X must be a nonnegative RV before the Markov Inequality can be used.

SLIDE 8

Deviation of a RV from the Expected Value II

Ex. 7.1 (Y&G): Let X be the height (in feet) of a randomly chosen adult. If the expected height is E(X) = 5.5, the Markov Inequality bounds the prob. that an adult is at least 11 feet tall, P(X ≥ 11):

With c² = 11 and E(X) = 5.5, plugging into P(X ≥ c²) ≤ E(X)/c² gives

P(X ≥ 11) ≤ 5.5/11 = 0.5

Deviation of a RV from the Expected Value III

Chebyshev Inequality: For an arbitrary RV X and a constant c > 0,

P(|X − μX| ≥ c) ≤ Var(X)/c²

The Chebyshev Inequality is good for all kinds of RVs. It also provides a tighter bound than the Markov one, since it uses both the expected value and the variance.

Deviation of a RV from the Expected Value IV

Ex. 7.3 (Y&G): If the height X of an adult has E(X) = 5.5 feet and SD σX = 1 foot, use the Chebyshev Inequality to find an upper bound on P(X ≥ 11).

Rewrite P(X ≥ 11) as P(X − μX ≥ 11 − μX) = P(X − μX ≥ 5.5). From the Chebyshev Inequality,

P(X ≥ 11) = P(X − μX ≥ 5.5) ≤ Var(X)/(5.5)² = 1/30.25 ≈ 0.033
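A side-by-side check (assumed code, not from the slides) of the two bounds on P(X ≥ 11) from the adult-height examples: Markov uses only E(X) = 5.5, while Chebyshev also uses Var(X) = 1 and gives the much tighter bound.

```python
# Markov vs. Chebyshev bounds on P(X >= 11) for E(X) = 5.5, Var(X) = 1.
e_x, var_x = 5.5, 1.0

markov = e_x / 11                    # P(X >= c^2) <= E(X)/c^2 with c^2 = 11
chebyshev = var_x / (11 - e_x) ** 2  # P(|X - mu| >= 5.5) <= Var(X)/5.5^2

print(markov)     # 0.5
print(chebyshev)  # ~ 0.033
```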

Outline

Point Estimates (7.1)
Properties of Point Estimates (7.2)
Confidence Intervals (8.1)
Hypothesis Testing (8.2)

SLIDE 9

Confidence Intervals (8.1)

Combine random variable theory and estimation methods to provide analysis of a single sample of continuous data observations taken from a population.

Describe inference procedures designed to investigate the population mean µ.

The inference procedures include confidence interval construction and hypothesis testing.

Confidence Interval Construction I (8.1.1)

A confidence interval for μ is an interval that contains a set of plausible values of the parameter μ.

It is associated with a confidence level 1 − α, which measures the prob that the confidence interval indeed contains the unknown parameter value.

Confidence Interval Construction II (8.1.1)

Confidence level | α
90%              | 0.10
95%              | 0.05
99%              | 0.01

Confidence Interval Construction: Inferences on a Population Mean (8.1.1)

Assumptions:
1. Sample size is large (n ≥ 30), or
2. For small sample sizes, data observations are normally distributed.

Using the two-sided t-interval:

μ ∈ ( x̄ − tα/2,n−1 s/√n , x̄ + tα/2,n−1 s/√n )

for a confidence interval with confidence level 1 − α for a population mean µ, based on a sample of n continuous data observations with a sample mean x̄ and a sample standard deviation s.

SLIDE 10

Confidence Interval Construction: Inferences on a Population Mean (8.1.1)

μ ∈ ( μ̂ − critical point × s.e.(μ̂) , μ̂ + critical point × s.e.(μ̂) )

A two-sided t-interval

Confidence Interval Construction: Inferences on a Population Mean (8.1.1)

The length of the CI is

L = 2 × critical point × s.e.(μ̂) = 2 tα/2,n−1 s/√n

As s.e.(μ̂) decreases, L also decreases, so that μ̂ = x̄ becomes a more "accurate" estimate of μ.

As the confidence level increases (α decreases), the critical point tα/2,n−1 increases and so L is longer. Higher confidence levels require longer confidence intervals.

Confidence Interval Construction: Inferences on a Population Mean (8.1.1)

Ex. 17 pg. 329: Milk Container Contents. Confidence level 95% (α = 0.05), sample size n = 50, critical point tα/2,n−1 = t0.025,49 = 2.0096 (obtained from Table III or from Excel's TINV).
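A sketch of the Ex. 17 calculation: t = 2.0096 and s = 0.0711 are the values quoted in the slides, but the sample mean x̄ = 2.0727 liters is an assumed value chosen to be consistent with the intervals reported in this deck, not a figure taken from the textbook's data listing.

```python
# Two-sided 95% t-interval for the milk-container example.
import math

n = 50
x_bar = 2.0727   # ASSUMED sample mean (liters), consistent with the slides
s = 0.0711       # sample standard deviation quoted in the slides
t = 2.0096       # t_{0.025, 49} quoted in the slides

half_width = t * s / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)
print(half_width)  # ~ 0.0202
print(ci)
```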

Confidence Interval Construction: Effect of Sample Size on Confidence Intervals (8.1.2)

For a fixed critical point, a confidence interval length L is inversely proportional to the square root of the sample size n: L ∝ 1/√n.

Specifically, the minimum sample size n needed to reduce the confidence interval length to a desired value L is

n ≥ 4 (tα/2,n−1 s / L)²

SLIDE 11

Confidence Interval Construction: Effect of Sample Size on Confidence Intervals (8.1.2)

Ex. 17 pg. 333: With a sample size of n = 50 milk containers, a CI for the mean container content with CL 99% is constructed to be (2.0457, 2.0996). This interval has a length of 2.0996 − 2.0457 = 0.0539 liters. If the engineer decides that they need a 99% CI with a length no larger than 0.04 liters, how much additional sampling is required?

Using the values t0.005,49 = 2.68 and s = 0.0711 from the initial analysis, a total sample size of

n ≥ 4 (tα/2,n−1 s / L)² = 4 × (2.68 × 0.0711 / 0.04)² = 90.77

is required. Therefore, an additional sample of at least 91 − 50 = 41 milk containers should be obtained.
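A computational check of the Ex. 17 sample-size calculation (assumed code, using the t and s values quoted in the slides):

```python
# Minimum total sample size for a 99% CI of length at most 0.04 liters.
import math

t, s, target_length = 2.68, 0.0711, 0.04
current_n = 50

n_required = 4 * (t * s / target_length) ** 2
n_total = math.ceil(n_required)
additional = n_total - current_n
print(n_required)  # ~ 90.77
print(additional)  # 41
```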

Confidence Interval Construction: Simulation Experiment 3: An Investigation of Confidence Intervals (8.1.4)

Confidence Interval Construction: One-Sided Confidence Intervals (8.1.5)

Used to provide an upper bound or a lower bound on the population mean µ.

Upper bound on the population mean µ:

P(μ ≤ X̄ + tα,n−1 s/√n) = 1 − α  →  μ ∈ ( −∞ , x̄ + tα,n−1 s/√n )

Lower bound on the population mean µ:

P(X̄ − tα,n−1 s/√n ≤ μ) = 1 − α  →  μ ∈ ( x̄ − tα,n−1 s/√n , ∞ )

Confidence Interval Construction: One-Sided Confidence Intervals (8.1.5)

Comparison of two-sided and one-sided confidence intervals
SLIDE 12

Confidence Interval Construction: z-Intervals (8.1.6)

Use the standard normal critical point zα/2 instead of tα/2,n−1 for the two-sided confidence interval when using a "known" population SD σ.

To construct a CI for a population mean µ based on a sample of size n with a sample mean x̄, using an assumed known value for the population SD σ, the appropriate CI is

μ ∈ ( x̄ − zα/2 σ/√n , x̄ + zα/2 σ/√n )

A two-sided z-interval
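A sketch of the two-sided z-interval with hypothetical example values (x̄, σ, and n are assumptions; z0.025 = 1.96 for a 95% interval is the standard critical point):

```python
# Two-sided 95% z-interval with known population SD sigma.
import math

x_bar, sigma, n = 10.0, 2.0, 64  # hypothetical sample mean, known SD, size
z = 1.96                          # z_{alpha/2} for alpha = 0.05

half_width = z * sigma / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)
print(ci)  # (9.51, 10.49)
```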

Confidence Interval Construction: z-Intervals (8.1.6)

One-sided 1 − α level CIs for a population mean µ, based on a sample of n observations with a sample mean x̄ and a known value for the population SD σ, are

μ ∈ ( x̄ − zα σ/√n , ∞ )  and  μ ∈ ( −∞ , x̄ + zα σ/√n )

One-sided z-intervals

HW 7

Problems 7.1.2, 7.2.1, 7.3.1, 7.4.2