Business Statistics CONTENTS Sampling The central limit theorem - - PowerPoint PPT Presentation


SLIDE 1

SAMPLING, THE CLT, AND THE STANDARD ERROR

Business Statistics

SLIDE 2

▪ Sampling
▪ The central limit theorem
▪ Point and interval estimates for $\mu$
▪ Confidence intervals for $\mu$
▪ Old exam question
▪ Further study

CONTENTS

SLIDE 3

Suppose you're a scissors manufacturer in the UK.

▪ What proportion of your production should be left-handed?

▪ Three strategies:
  ▪ look at Wikipedia ("Studies suggest that 70–90% of the world population is right-handed.[4][5]")
  ▪ ask all persons in the UK (~63 million)
  ▪ ask a sample of persons (100?) in the UK

SAMPLING

SLIDE 4

Sampling is the process of collecting data about a sample (a subset of the population), with the aim of representing the entire population.

▪ Arguments for sampling
  ▪ too costly to probe the entire population
  ▪ too time-consuming
  ▪ too dangerous
  ▪ too destructive
  ▪ etc.

▪ Arguments against sampling
  ▪ limited accuracy → confidence intervals (later in this course)
  ▪ not representative → design of experiments (not in this course)

SAMPLING

SLIDE 5

A sample should be representative
▪ e.g., don't ask people at Schiphol if they're afraid of flying

A sample should be large enough
▪ cf. the "$\sqrt{n}$ law" later on

Choice in sampling
▪ with replacement or without replacement
▪ this has consequences for the probability model

SAMPLING
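The choice between sampling with and without replacement can be illustrated with Python's standard `random` module: `random.Random.choices` draws with replacement and `random.Random.sample` draws without. This is a minimal sketch; the helper name `draw_sample` and the toy population of ten numbered units are illustrative assumptions, not part of the course material.

```python
import random


def draw_sample(population, n, replace, rng):
    """Draw n units from population, with or without replacement (hypothetical helper)."""
    if replace:
        # Each draw is independent: the same unit can be picked more than once.
        return rng.choices(population, k=n)
    # Each unit can be picked at most once, so n must not exceed len(population).
    return rng.sample(population, n)


population = list(range(1, 11))  # a toy population of 10 labelled units
rng = random.Random(42)          # fixed seed so the run is reproducible

with_repl = draw_sample(population, 8, replace=True, rng=rng)
without_repl = draw_sample(population, 8, replace=False, rng=rng)
print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

Without replacement, successive draws are no longer independent (each draw changes what is left), which is one of the consequences for the probability model; with replacement, the draws are independent and identically distributed.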

SLIDE 6

| Population | Sample |
|---|---|
| unknown | known |
| we would like to know | irrelevant |
| parameter | statistic |
| mostly Greek letters ($\pi$, $\sigma$) | mostly Roman letters ($p$, $s$) |
| some deviating notations ($N$) | some deviating notations ($\bar{x}$, $n$) |

SAMPLING

SLIDE 7

▪ Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with mean $\mu_X$ and variance $\sigma_X^2$
  ▪ e.g., body heights of $n$ persons
  ▪ waiting times of $n$ customers
  ▪ failure rates of $n$ cars, ...

▪ Then, for $n$ sufficiently large, the mean $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
  • 1. is normally distributed
  • 2. with mean $\mu_{\bar{X}} = \mu_X$
  • 3. and variance $\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$

THE CENTRAL LIMIT THEOREM

Capital $X$, because it is a random variable! Capital $\bar{X}$, because this is also a random variable!

SLIDE 8

So for large $n$: $\bar{X} \sim N\left(\mu_{\bar{X}} = \mu_X,\ \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}\right)$

▪ or for short: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$

▪ This holds regardless of the distribution of $X$ (as long as its variance $\sigma_X^2$ is finite)!
  ▪ so that's why the normal distribution is called "normal"
  ▪ this fact is called the central limit theorem (CLT)
  ▪ it is one of the most important results of statistics
  ▪ it holds for "sufficiently large" $n$

THE CENTRAL LIMIT THEOREM

SLIDE 9

The CLT for a fair die: distribution of $\bar{X}$ for
▪ $n = 1$
▪ $n = 2$
▪ $n = 5$
▪ $n = 20$

THE CENTRAL LIMIT THEOREM

SLIDE 10

The CLT for a loaded (unfair) die: distribution of $\bar{X}$ for
▪ $n = 1$
▪ $n = 2$
▪ $n = 5$
▪ $n = 20$

THE CENTRAL LIMIT THEOREM
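The die figures can be reproduced exactly, without simulation, by convolving the distribution of one roll with itself $n$ times. This is a sketch in Python using exact fractions; the loaded die below (a six with probability 1/2, every other face 1/10) is a hypothetical example, since the slides do not state the loaded die's probabilities. For either die, the mean of $\bar{X}$ stays at $\mu_X$ while its variance equals $\sigma_X^2 / n$: for the fair die $\mu_X = 3.5$ and $\sigma_X^2 = 35/12$; for this loaded die $\mu_X = 4.5$ and $\sigma_X^2 = 13/4$.

```python
from fractions import Fraction


def mean_distribution(pmf, n):
    """Exact distribution of X-bar for n independent rolls of one die.

    pmf maps each face value to its probability; the result maps each
    possible value of X-bar (as a Fraction) to its probability.
    """
    sums = {0: Fraction(1)}  # distribution of the sum of zero rolls
    for _ in range(n):
        nxt = {}
        # Convolve the current sum distribution with one more roll.
        for s, p in sums.items():
            for face, q in pmf.items():
                nxt[s + face] = nxt.get(s + face, 0) + p * q
        sums = nxt
    # Divide each possible sum by n to get the possible values of X-bar.
    return {Fraction(s, n): p for s, p in sums.items()}


def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: prob}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


fair = {face: Fraction(1, 6) for face in range(1, 7)}
# Hypothetical loaded die: a six comes up half the time, other faces 1/10 each.
loaded = {face: Fraction(1, 10) for face in range(1, 6)}
loaded[6] = Fraction(1, 2)

for name, die in [("fair", fair), ("loaded", loaded)]:
    for n in (1, 2, 5, 20):
        mu, var = mean_and_variance(mean_distribution(die, n))
        print(f"{name:6} n={n:2}: mean = {float(mu):.4f}, variance = {float(var):.4f}")
```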

SLIDE 11

We roll a die 100 times. The outcomes are $X = (X_1, X_2, \ldots, X_{100})$. How is $\bar{X}$ distributed?

EXERCISE 1
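One way to work Exercise 1, assuming the die is fair (the exercise does not say so explicitly): first compute $\mu_X$ and $\sigma_X^2$ for a single roll, then apply the CLT with $n = 100$.

```latex
\begin{aligned}
\mu_X &= \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5,
\qquad E[X^2] = \frac{1+4+9+16+25+36}{6} = \frac{91}{6} \\
\sigma_X^2 &= E[X^2] - \mu_X^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.917 \\
\bar{X} &\sim N\left(3.5,\ \frac{35/12}{100}\right) \approx N(3.5,\ 0.0292),
\qquad \sigma_{\bar{X}} = \sqrt{0.0292} \approx 0.171
\end{aligned}
```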

SLIDE 12

A "proof" of the theorem (for normal populations)
▪ Recall the additive property of the normal distribution:
  ▪ if $X_1 \sim N(\mu_X, \sigma_X^2)$ and $X_2 \sim N(\mu_X, \sigma_X^2)$, then $X_1 + X_2 \sim N(2\mu_X, 2\sigma_X^2)$ (provided $X_1$ and $X_2$ are independent)
▪ Also recall that if $X \sim N(\mu_X, \sigma_X^2)$, then $aX \sim N(a\mu_X, a^2\sigma_X^2)$ for any constant $a$
▪ So, if $X_1 + X_2 \sim N(2\mu_X, 2\sigma_X^2)$, then $\frac{X_1 + X_2}{2} \sim N\left(\mu_X, \frac{\sigma_X^2}{2}\right)$
▪ and more generally: $\frac{X_1 + \cdots + X_n}{n} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$
▪ or equivalently: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$
▪ This proof works for normal populations and all $n$, but the CLT is valid for all populations (with finite variance) and "large" $n$

THE CENTRAL LIMIT THEOREM

You don't need to reproduce such proofs, but following them may help your understanding.
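The step from two variables to $n$ ("and more generally") can be filled in with the same two rules. A sketch, assuming $X_1, \ldots, X_n$ are independent and each $N(\mu_X, \sigma_X^2)$: means and variances of independent normals add, and then the scaling rule is applied with $a = 1/n$.

```latex
\begin{aligned}
X_1 + \cdots + X_n &\sim N\left(n\mu_X,\ n\sigma_X^2\right)
  && \text{(additive property, applied repeatedly)} \\
\bar{X} = \tfrac{1}{n}\left(X_1 + \cdots + X_n\right)
  &\sim N\left(\tfrac{1}{n}\, n\mu_X,\ \tfrac{1}{n^2}\, n\sigma_X^2\right)
   = N\left(\mu_X,\ \tfrac{\sigma_X^2}{n}\right)
  && (a = 1/n)
\end{aligned}
```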

SLIDE 13

Some consequences of the CLT
▪ $\bar{X}$ is an estimator of $\mu_X$
▪ and $\bar{x}$ is the best estimate of $\mu_X$
▪ $\bar{X}$ will be a better estimator for large $n$
▪ because $\sigma_{\bar{X}}$ decreases with $n$
▪ we can use the distribution of $\bar{X}$ to construct a confidence interval for $\mu$

THE CENTRAL LIMIT THEOREM
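Since $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$, the standard error shrinks with the square root of $n$, not with $n$ itself. Writing $\sigma_{\bar{X}}(n)$ for the standard error at sample size $n$ (notation introduced here for illustration), quadrupling the sample size only halves it:

```latex
\sigma_{\bar{X}}(4n) = \frac{\sigma_X}{\sqrt{4n}}
  = \frac{1}{2}\cdot\frac{\sigma_X}{\sqrt{n}}
  = \frac{1}{2}\,\sigma_{\bar{X}}(n)
```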

SLIDE 14

The CLT holds for $n$ "sufficiently" large
▪ More specifically:
  ▪ if $X$ is normally distributed, the CLT holds for all sample sizes $n$
  ▪ if the distribution of $X$ is fairly symmetric without extreme outliers, the CLT gives a pretty good approximation of the distribution of $\bar{X}$ for sample sizes $n \ge 15$
  ▪ for any distribution of $X$ and a sample size $n \ge 30$, the CLT gives a pretty good approximation of the distribution of $\bar{X}$

THE CENTRAL LIMIT THEOREM

SLIDE 15

The effect of asymmetry vs. sample size

THE CENTRAL LIMIT THEOREM
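The asymmetry effect can be quantified: the skewness of a mean of $n$ independent draws equals the population skewness divided by $\sqrt{n}$. This Python simulation sketch assumes an exponential population (skewness 2) as the asymmetric example; the seed and number of repetitions are arbitrary.

```python
import math
import random
import statistics


def skewness(xs):
    """Skewness of a list of numbers: mean cubed standardised deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(7)  # fixed seed for reproducibility
reps = 20_000
results = {}
for n in (1, 5, 15, 30):
    # Each entry is one realisation of X-bar from an exponential(1) population,
    # used here as an assumed example of a strongly asymmetric distribution.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    results[n] = skewness(means)
    print(f"n = {n:2}: simulated skewness {results[n]:.3f}, theory 2/sqrt(n) = {2 / math.sqrt(n):.3f}")
```

Even at $n = 30$ some skewness remains for such an asymmetric population, which is why the thresholds of 15 and 30 are rules of thumb rather than guarantees.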

SLIDE 16

A statistic is a function of the (randomly sampled) data
▪ important example: the statistic $\bar{X}$
▪ defined by $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
▪ in a concrete case, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the best possible estimate of the parameter $\mu$
▪ so the sample mean $\bar{x}$ is the best possible estimate of the population mean $\mu$
▪ because it is just one value, it is a point estimate

POINT AND INTERVAL ESTIMATES FOR $\mu$

SLIDE 17

Due to sampling variation, $\bar{x}$ will be different in each sample
▪ and there will be a distribution of $\bar{x}$-values: the distribution of $\bar{X}$
▪ the true value of $\mu$ may be different from the value of $\bar{x}$ obtained
▪ however, keep in mind that the value of $\bar{x}$ obtained cannot be "too" wrong
▪ we know that $\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$, and 95% of a normal distribution lies within 1.96 standard deviations of its mean, so a specific value $\bar{x}$ lies within $[\mu_{\bar{X}} - 1.96\sigma_{\bar{X}},\ \mu_{\bar{X}} + 1.96\sigma_{\bar{X}}]$ with 95% probability

POINT AND INTERVAL ESTIMATES FOR $\mu$
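The 95% statement can be checked by repeated sampling: draw many samples, compute $\bar{x}$ for each, and count how often it lands within $\mu_{\bar{X}} \pm 1.96\sigma_{\bar{X}}$. This Python sketch assumes a normal population with $\mu_X = 10$, $\sigma_X = 2$ and $n = 25$; these are illustrative values only.

```python
import math
import random
import statistics

mu, sigma, n = 10.0, 2.0, 25          # assumed population values, for illustration only
se = sigma / math.sqrt(n)             # sigma_Xbar = sigma_X / sqrt(n)
lo, hi = mu - 1.96 * se, mu + 1.96 * se

rng = random.Random(1)                # fixed seed for reproducibility
reps = 10_000
inside = 0
for _ in range(reps):
    xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))  # one sample's x-bar
    inside += lo <= xbar <= hi        # True counts as 1, False as 0
print(f"share of x-bar values in [{lo:.3f}, {hi:.3f}]: {inside / reps:.3f}")
```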

SLIDE 18

Conversely, the population value $\mu_{\bar{X}}$ lies within $[\bar{x} - 1.96\sigma_{\bar{X}},\ \bar{x} + 1.96\sigma_{\bar{X}}]$ with 95% probability
▪ and because $\mu_{\bar{X}} = \mu_X$, the population value $\mu_X$ lies within $[\bar{x} - 1.96\sigma_{\bar{X}},\ \bar{x} + 1.96\sigma_{\bar{X}}]$ with 95% probability
▪ this is an interval estimate for $\mu_X$
▪ we say that $[\bar{x} - 1.96\sigma_{\bar{X}},\ \bar{x} + 1.96\sigma_{\bar{X}}]$ is a 95% confidence interval for $\mu_X$: across repeated samples, 95% of the intervals constructed this way contain $\mu_X$

POINT AND INTERVAL ESTIMATES FOR $\mu$

SLIDE 19

So:
▪ we estimate $\mu_X$ by $\bar{x}$
▪ and we know with 95% confidence that $\bar{x} - 1.96\sigma_{\bar{X}} \le \mu_X \le \bar{x} + 1.96\sigma_{\bar{X}}$
▪ the quantity $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$ is the standard deviation of the distribution of the mean $\bar{X}$
▪ it is so important that we give it a special name: the standard error of the mean
▪ sometimes (unfortunately!) abbreviated as the standard error

POINT AND INTERVAL ESTIMATES FOR $\mu$
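The interval $\bar{x} \pm 1.96\sigma_{\bar{X}}$ translates directly into code. This is a Python sketch for the known-$\sigma_X$ case treated here; the function name `mean_ci` is hypothetical, and the multiplier is computed with `statistics.NormalDist().inv_cdf`, which shows where 1.96 comes from (it is the 97.5% quantile of the standard normal, rounded).

```python
import math
from statistics import NormalDist


def mean_ci(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu_X when the population sd sigma_X is known.

    Returns (lower, upper) = xbar -/+ z * sigma / sqrt(n), where z is the
    (1 + confidence) / 2 quantile of the standard normal (about 1.96 for 95%).
    """
    se = sigma / math.sqrt(n)                       # standard error of the mean
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value, about 1.96
    return xbar - z * se, xbar + z * se


print(mean_ci(50.0, 10.0, 100))  # roughly (48.04, 51.96)
```

Passing a different `confidence` (say 0.99) changes only the multiplier $z$; the standard error $\sigma_X / \sqrt{n}$ stays the same.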

SLIDE 20

We sample ($n = 25$) from a normal population $X$ with unknown $\mu_X$ and known $\sigma_X^2 = 4$. We find $\bar{x} = 3$.

  • a. Give a point estimate for $\mu_X$.
  • b. Find the standard error of the mean, $\sigma_{\bar{X}}$.
  • c. Give a 95% confidence interval for $\mu_X$.

EXERCISE 2
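One way to work Exercise 2, using the 1.96 multiplier and the standard error formula $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$ from the slides ($\hat{\mu}_X$ is notation introduced here for the point estimate):

```latex
\begin{aligned}
\text{a.}\quad & \hat{\mu}_X = \bar{x} = 3 \\
\text{b.}\quad & \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{\sqrt{4}}{\sqrt{25}} = \frac{2}{5} = 0.4 \\
\text{c.}\quad & \bar{x} \pm 1.96\,\sigma_{\bar{X}} = 3 \pm 1.96 \cdot 0.4 = 3 \pm 0.784
  \;\Rightarrow\; [2.216,\ 3.784]
\end{aligned}
```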

SLIDE 21

▪ Carefully distinguish:
  ▪ $\mu_X$ (a value, often unknown)
  ▪ $\bar{x}$ (a value from observations)
  ▪ $\bar{X}$ (a distribution, not a value)
  ▪ and its two parameters $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}^2$ (both are values, often unknown), where the CLT claims that $\mu_{\bar{X}} = \mu_X$ and $\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$

▪ Later on, we will follow a similar logic, e.g.
  ▪ $\sigma_X^2$
  ▪ $s_X^2$
  ▪ $S_X^2$
  ▪ and its two parameters

CONCEPTS AND SYMBOLS

SLIDE 22

23 March 2015, Q1h

OLD EXAM QUESTION

SLIDE 23

▪ Doane & Seward 5/E 8.1-8.3
▪ Tutorial exercises week 2
▪ sampling distribution, central limit theorem, standard error

FURTHER STUDY