Inference for Numerical Data III Dajiang Liu @ PHS 525 Feb 23 th , - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Inference for Numerical Data III

Dajiang Liu @ PHS 525 Feb 23th, 2016

slide-2
SLIDE 2

Central Limit Theorem

  • The sample mean point estimate $\bar{x}$ is approximately normal.
  • $\bar{x} \sim N(\mu,\ \sigma^2/n)$, i.e. with mean $\mu$ and standard error $\sigma/\sqrt{n}$
  • The approximation works when:
  • Sample size is “large”
  • A rule of thumb is sample size ≥ 30
  • The population distribution is not strongly skewed (i.e. it is roughly symmetric)
  • There are no outliers
  • The approximation may be poor if any of the three conditions above is not met

slide-3
SLIDE 3

Sampling Distribution for Different Sample Sizes

  • The population distribution does not need to be normal
  • The sample mean is still approximately normal when the sample size is large enough
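A quick simulation can make this concrete. The sketch below is not from the slides: it assumes an exponential population with rate 1 (chosen only because it is clearly skewed), three arbitrary sample sizes, and 5,000 replications. The helper name `sim_means` is hypothetical.

```r
set.seed(525)  # arbitrary seed so the run is reproducible

# Draw `reps` samples of size n from a skewed population (exponential, rate 1;
# an illustrative assumption) and return the mean of each sample.
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

sizes      <- c(5, 30, 100)
means_by_n <- lapply(sizes, sim_means)

# Compare the simulated spread of the sample mean with sigma / sqrt(n).
# For an exponential(1) population, mu = 1 and sigma = 1.
clt_summary <- data.frame(
  n            = sizes,
  mean_of_xbar = sapply(means_by_n, mean),
  sd_of_xbar   = sapply(means_by_n, sd),
  theory_se    = 1 / sqrt(sizes)
)
print(clt_summary)
```

Calling `hist(means_by_n[[1]])` and `hist(means_by_n[[3]])` should show the histogram of sample means becoming more symmetric and bell-shaped as n grows.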

slide-4
SLIDE 4


slide-5
SLIDE 5

4.33 Answer

  • (a) The distribution is right-skewed: most values are small, and there are several very large outliers.
  • (b) As the sample size gets larger, the distribution of the sample mean behaves more like a normal distribution. Yet the upper tail remains heavy, possibly because of the outliers with large values.

slide-6
SLIDE 6
slide-7
SLIDE 7

4.35 Answer

  • (1) -> (b)
  • (2) -> (a)
  • (3) -> (c)
  • The key is to examine the standard error. Sample means from the largest samples have the smallest standard error.
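The relationship behind this matching is SE = σ/√n. A minimal sketch, using a hypothetical population standard deviation and hypothetical sample sizes (neither is taken from the exercise):

```r
sigma <- 3                 # hypothetical population standard deviation
n     <- c(1, 5, 25)       # hypothetical sample sizes
se    <- sigma / sqrt(n)   # standard error of the sample mean

# Larger n gives a smaller standard error, hence a narrower distribution
print(data.frame(n = n, se = se))
```

A sample size of 1 corresponds to plotting individual observations, whose spread is simply σ.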

slide-8
SLIDE 8

One Sample Means with t-distribution

  • The Central Limit Theorem requires large sample sizes
  • In large samples, the sample mean estimate is more likely to be normally distributed
  • In large samples, the sample mean estimate tends to have a smaller standard error

Yet:

  • In many cases, large samples can be hard to obtain
  • The t-distribution can be a helpful alternative for small-sample inference
slide-9
SLIDE 9

The Normality Condition – Modified

  • Central limit theorem modified:
  • The sampling distribution for the mean is nearly normal when the sample observations are independent and come from a nearly normal distribution.
  • Important to note:
  • The modified CLT does not put a constraint on the sample size
  • The modified CLT does require that the population distribution is nearly normal
  • The original CLT does not require the population distribution to be normal
  • Even for small sample sizes, the modified CLT holds.
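Why small samples call for the t-distribution can be checked by simulation. The sketch below is an illustration, not part of the slides: it assumes a standard normal population, a sample size of 5, and 10,000 replications, and it compares how often normal-based and t-based 95% intervals contain the true mean when σ is estimated by s.

```r
set.seed(525)  # arbitrary seed for reproducibility

n    <- 5       # deliberately small sample size (illustrative)
reps <- 10000   # number of simulated samples
mu   <- 0       # true mean of the assumed normal population

covers <- replicate(reps, {
  x      <- rnorm(n, mean = mu, sd = 1)
  se     <- sd(x) / sqrt(n)
  z_half <- qnorm(0.975) * se            # normal-based half-width
  t_half <- qt(0.975, df = n - 1) * se   # t-based half-width
  c(z = abs(mean(x) - mu) <= z_half,
    t = abs(mean(x) - mu) <= t_half)
})

coverage <- rowMeans(covers)  # share of intervals that contain mu
print(coverage)
```

Under these assumptions the t-based intervals should cover the true mean close to 95% of the time, while the normal-based intervals cover it noticeably less often.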
slide-10
SLIDE 10

Degrees of Freedom (df)

  • Degrees of freedom determine the shape of the t-distribution
  • The larger the df, the more closely the t-distribution resembles the normal distribution
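One way to see this convergence is to compare critical values. The sketch below uses base R's `qt` and `qnorm`; the list of df values is an arbitrary choice for illustration.

```r
df <- c(1, 2, 5, 10, 30, 100, 1000)   # arbitrary df values for illustration

# Two-sided 95% critical values of the t-distribution for each df
crit <- data.frame(df = df, t_star = qt(0.975, df = df))
print(crit)

# Corresponding critical value of the standard normal distribution
z_star <- qnorm(0.975)
print(z_star)
```

The t critical values are always larger than the normal one and shrink toward it as df increases.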

slide-11
SLIDE 11

The tails of the t-distribution are heavier than those of the normal distribution

slide-12
SLIDE 12
slide-13
SLIDE 13

Use t-distribution to Obtain Confidence Interval

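  • Confidence intervals obtained using the t-distribution can be more accurate for small samples
  • Procedure for obtaining a t-distribution based confidence interval:
  • Obtain the sample mean point estimate $\bar{x}$
  • Obtain the sample standard deviation $s$
  • Obtain the standard error for the sample mean point estimate

$SE = s/\sqrt{n}$

  • The confidence interval is obtained by

$\bar{x} - t^*_{df} \times SE \le \mu \le \bar{x} + t^*_{df} \times SE$

  • $t^*_{df}$ is the critical t-value with $df = n - 1$ degrees of freedom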
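The procedure can be sketched in R as follows. The helper name `t_conf_int` and the data vector are made up for illustration; `t.test` is the built-in function and is used only as a cross-check.

```r
# t-based confidence interval for a population mean from raw data.
# `t_conf_int` is a hypothetical helper name, not a base R function.
t_conf_int <- function(x, conf_level = 0.95) {
  n      <- length(x)
  xbar   <- mean(x)                                    # sample mean point estimate
  s      <- sd(x)                                      # sample standard deviation
  se     <- s / sqrt(n)                                # standard error of the mean
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # critical t-value
  c(lower = xbar - t_star * se, upper = xbar + t_star * se)
}

x <- c(4.1, 5.3, 3.8, 6.0, 4.7, 5.1)   # made-up data for illustration
print(t_conf_int(x))
print(t.test(x)$conf.int)              # built-in result, for comparison
```

Both calls should report the same interval, since `t.test` uses the same formula with n − 1 degrees of freedom.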
slide-14
SLIDE 14
slide-15
SLIDE 15

Example: What are the normal and t-distribution based confidence intervals?

slide-16
SLIDE 16

Example: What are the normal and t-distribution based 95% confidence intervals?

Answer:

  • $SE = 2.3/\sqrt{19} = 0.53$
  • The normal confidence interval is (3.36, 5.44)
  • The t confidence interval is (3.29, 5.51)
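Both intervals can be reproduced from summary statistics. The inputs below are assumptions inferred from the reported numbers rather than stated outright: a sample mean of 4.4 (the midpoint of both intervals), a standard deviation of 2.3, and a sample size of 19, which together give the standard error of 0.53.

```r
xbar <- 4.4   # sample mean (assumed; midpoint of the reported intervals)
s    <- 2.3   # sample standard deviation (assumed)
n    <- 19    # sample size (assumed)

se <- round(s / sqrt(n), 2)        # rounded to two decimals, as in the answer

z_star <- qnorm(0.975)             # normal critical value, about 1.96
t_star <- qt(0.975, df = n - 1)    # t critical value with 18 df, about 2.10

normal_ci <- xbar + c(-1, 1) * z_star * se
t_ci      <- xbar + c(-1, 1) * t_star * se

print(round(normal_ci, 2))   # 3.36 5.44
print(round(t_ci, 2))        # 3.29 5.51
```

The t interval is wider because the critical value with 18 degrees of freedom is larger than the normal critical value.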

slide-17
SLIDE 17
slide-18
SLIDE 18

Hypothesis Testing with t-Distribution

  • T statistic
  • For a sample of size $n$,
  • Estimate the sample mean $\bar{x}$ and standard deviation $s$
  • To test the hypothesis $H_0: \mu = \mu_0$ vs. $H_A: \mu > \mu_0$
  • A t-statistic can be calculated

$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

  • The p-value can be assessed by $\Pr(T^* > T)$, where $T^*$ is a random variable with a $t_{n-1}$ distribution
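A minimal sketch of this one-sided test in R. The helper name `t_test_upper` and the data are hypothetical; the built-in `t.test` is called only to cross-check the hand computation.

```r
# One-sided, one-sample t-test of H0: mu = mu0 against HA: mu > mu0.
# `t_test_upper` is a hypothetical helper name used only for this sketch.
t_test_upper <- function(x, mu0) {
  n      <- length(x)
  t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))          # T statistic
  p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)   # Pr(T* > T)
  c(t = t_stat, df = n - 1, p_value = p_val)
}

x   <- c(5.2, 4.8, 6.1, 5.5, 5.9, 4.9, 6.3, 5.4)   # made-up data
res <- t_test_upper(x, mu0 = 5)
print(res)

# Cross-check against the built-in test
ref <- t.test(x, mu = 5, alternative = "greater")
print(ref$p.value)
```

For the alternative $\mu < \mu_0$ the lower tail would be used instead, and for a two-sided alternative the tail area is doubled.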

slide-19
SLIDE 19
slide-20
SLIDE 20

Answer and R command:

(a). pt(1.91,df=10,lower.tail=FALSE)
[1] 0.04260244
(b). 2*pt(0.83,df=6,lower.tail=FALSE)
[1] 0.4383084
(c). pt(-3.45,df=16,lower.tail=TRUE)
[1] 0.001646786
(d). pt(2.13,df=28,lower.tail=FALSE)
[1] 0.02104844

slide-21
SLIDE 21
slide-22
SLIDE 22

Answer 5.19

  • (a). $H_0: \mu = 8$ vs. $H_A: \mu < 8$
  • (b). $T = \frac{7.73 - 8}{0.77/\sqrt{25}} = -\frac{0.27}{0.77} \times 5 = -1.75$
  • (c). P-value 0.046
  • (d). Reject the null hypothesis at $\alpha = 0.05$
  • (e). (7.47, 7.99)
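These answers can be checked from summary statistics. The inputs below are assumptions recovered from the worked calculation (sample mean 7.73, standard deviation 0.77, n = 25), and the 90% level for part (e) is also an assumption: it is the level that pairs with a one-sided test at 0.05 and that reproduces the reported interval.

```r
xbar <- 7.73   # sample mean (assumed from the worked answer)
s    <- 0.77   # sample standard deviation (assumed)
n    <- 25     # sample size (assumed)
mu0  <- 8      # null value from part (a)

se     <- s / sqrt(n)
t_stat <- (xbar - mu0) / se          # test statistic for part (b)
p_val  <- pt(t_stat, df = n - 1)     # lower tail, since HA: mu < 8

# 90% t-based interval (assumed confidence level) for part (e)
ci_90 <- xbar + c(-1, 1) * qt(0.95, df = n - 1) * se

print(round(t_stat, 2))   # -1.75
print(round(p_val, 3))
print(round(ci_90, 2))    # 7.47 7.99
```

The null value 8 lies just outside this interval, which agrees with rejecting the null hypothesis in part (d).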