
STAT 373 / STAT 814 / STAT 714, Week 9 (2019): Population values


  1. Week 9: Example. 1996 Australian Bureau of Statistics census data

     • We will use the data from the 1996 Australian census as an example.
     • Units: Local Government Areas (LGAs) of NSW (182 at the time).
     • Data are in file LGA.MTW, available on the unit iLearn.

     LGAs

     Albury, Armidale, Ashfield, Auburn, Ballina, Balranald, Bankstown, Barraba, Bathurst, Baulkham Hills, Bega Valley, Bellingen, Berrigan, Bingara, Blacktown, …….., Wollongong, Woollahra, Wyong, Yallaroi, Yarrowlumla, Yass, Young

     • We will be using these data to illustrate sampling and estimation techniques.
     • Sampling frame: list of 182 LGAs with IDs from 1 to 182 (N = 182 LGAs).
     • We will estimate quantities such as the total overseas-born population of NSW on the basis of a random sample, and compare our answer with the actual population total.

     Variables in LGA.mtw

     | Variable | Mean | Median |
     |---|---|---|
     | Total M | 18335 | 6997 |
     | Total F | 18803 | 6893 |
     | Total P | 37138 | 13890 |
     | GE15 M | 14286 | 5233 |
     | GE15 F | 14944 | 5193 |
     | GE15 P | 29231 | 10426 |
     | Aborig M | 269.9 | 146.5 |
     | Aborig F | 277.5 | 142.0 |
     | Aborig P | 547.3 | 292.0 |
     | Unempl M | 944 | 345 |
     | Unempl F | 609.7 | 209.5 |
     | Unempl P | 1554 | 586 |
     | AusBorn M | 13302 | 5991 |
     | AusBorn F | 13716 | 6010 |
     | AusBorn P | 27018 | 12001 |
     | OSBorn M | 4246 | 512 |
     | OSBorn F | 4256 | 432 |
     | OSBorn P | 8502 | 935 |
     | AusCit M | 16171 | 6274 |
     | AusCit F | 16599 | 6330 |
     | AusCit P | 32769 | 12605 |

     Overseas-born population

     • Say we wish to estimate the mean LGA OS-born population, $\mu$, and the total NSW OS-born population, $\tau$, on the basis of a simple random sample of size n = 30 LGAs. (Note: You can find instructions on obtaining a SRS in Minitab on slides 46 and 47 of this lecture.)
     • Histogram of number of OS-born in all 182 LGAs, ie, the population (see next page).

     [Figure: histogram of OSBorn P for all 182 LGAs; x-axis OSBorn P (0 to 100000), y-axis Frequency]

     Very skewed population... normal approximation for the sample mean (for n = 30) may be unlikely to hold.
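The slides draw the simple random sample in Minitab. As a rough equivalent, the sketch below draws n = 30 IDs without replacement from a frame numbered 1 to 182. The function name `draw_srs` and the seed value are illustrative assumptions, not part of the lecture material.

```python
import random

N = 182  # number of LGAs in the sampling frame (from the slides)
n = 30   # sample size used in the lecture example


def draw_srs(frame_size, sample_size, seed=None):
    """Draw a simple random sample of IDs 1..frame_size without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(range(1, frame_size + 1), sample_size))


# The seed is arbitrary; change it to obtain a different sample.
sample_ids = draw_srs(N, n, seed=2019)
print(sample_ids)
```

Each ID would then be matched to the corresponding row of the LGA data file to obtain the sampled values.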

  2. Population values

     Descriptive Statistics

     | Variable | N | Mean | Median | TrMean | StDev | SE Mean |
     |---|---|---|---|---|---|---|
     | OSBorn P | 182 | 8502 | 935 | 5918 | 16237 | 1204 |

     | Variable | Minimum | Maximum | Q1 | Q3 |
     |---|---|---|---|---|
     | OSBorn P | 18 | 97203 | 275 | 8921 |

     • Mean: $\mu = 8{,}502$
     • Total: $\tau = N\mu = 182 \times 8{,}502 = 1{,}547{,}364$

     Sample (n = 30) drawn using Minitab (click Calc, Random Data, Sample from Columns and then follow it through)

     | LGA | OS Born | LGA | OS Born |
     |---|---|---|---|
     | Tumbarumba | 278 | Dungog | 377 |
     | Albury | 3998 | Bourke | 162 |
     | Muswellbrook | 930 | Pittwater | 11177 |
     | Yarrowlumla | 1449 | Nambucca | 1477 |
     | Mudgee | 1382 | Junee | 300 |
     | Botany | 16002 | South Sydney | 27729 |
     | Hay | 183 | Narrabri | 611 |
     | Maitland | 3624 | Urana | 48 |
     | Great Lakes | 2763 | Rockdale | 33491 |
     | Warringah | 31893 | Culcairn | 227 |
     | Bland | 266 | Mosman | 7129 |
     | Wagga Wagga | 3787 | Lake Macquarie | 16914 |
     | Crookwell | 184 | Kogarah | 14914 |
     | Yass | 879 | Eurobodalla | 3996 |
     | Holbrook | 137 | Shoalhaven | 9502 |

     Sample Statistics

     Descriptive Statistics

     | Variable | N | Mean | Median | TrMean | StDev | SE Mean |
     |---|---|---|---|---|---|---|
     | OSBorn s | 30 | 6527 | 1463 | 5009 | 9713 | 1773 |

     | Variable | Minimum | Maximum | Q1 | Q3 |
     |---|---|---|---|---|
     | OSBorn s | 48 | 33491 | 275 | 9921 |

     Estimation of the population mean

     Based on the sample of 30 LGAs, we have

     $$\bar{y} = 6527, \qquad s = 9713, \qquad f = \frac{30}{182} = 0.165, \qquad t_{29}^{.975} = 2.0452$$

     $$\text{estimated } SE(\bar{y}) = s\sqrt{(1-f)/n}$$

     95% CI for population mean OS-born:

     $$\bar{y} \pm t_{29}^{.975}\, s\sqrt{(1-f)/n} = 6527 \pm 2.0452 \times 9713 \times \sqrt{(1-0.165)/30} = 6{,}527 \pm 3{,}314 = (3{,}213,\ 9{,}841)$$

     Estimation of the population total

     We have

     $$\hat{y}_T = N\bar{y} = 182 \times 6527 = 1{,}187{,}914, \qquad s = 9713, \qquad f = 0.165, \qquad t_{29}^{.975} = 2.0452$$

     95% CI for total OS-born:

     $$\hat{y}_T \pm t_{29}^{.975}\, N s\sqrt{(1-f)/n} = 1{,}187{,}914 \pm 2.0452 \times 182 \times 9713 \times \sqrt{(1-0.165)/30} = 1{,}187{,}914 \pm 603{,}175 = (584{,}739,\ 1{,}791{,}089)$$

     Large error bound; sample size may be too small.

     NOTE:

     • We find that the true population values of $\mu = 8{,}502$ and $\tau = 1{,}547{,}364$ do in fact lie in the 95% confidence intervals.
     • However, because of the severe skewness of the population values, it would have been more appropriate to stratify the population on some criterion. [We will return to this issue later.]
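The interval calculations for the n = 30 sample can be retraced with a short Python sketch. The 30 values are copied from the sample table, and the t quantile 2.0452 is taken from the slide rather than computed. The function name `srs_intervals` is an illustrative assumption.

```python
import math
import statistics

# OS-born counts for the 30 sampled LGAs, copied row by row from the sample table.
os_born = [
    278, 377, 3998, 162, 930, 11177, 1449, 1477, 1382, 300,
    16002, 27729, 183, 611, 3624, 48, 2763, 33491, 31893, 227,
    266, 7129, 3787, 16914, 184, 14914, 879, 3996, 137, 9502,
]

N = 182            # population size (number of LGAs)
T_29_975 = 2.0452  # t quantile quoted on the slide, not computed here


def srs_intervals(y, N, t):
    """95% CIs for the population mean and total under SRS without replacement."""
    n = len(y)
    ybar = statistics.mean(y)
    s = statistics.stdev(y)            # sample standard deviation (divisor n - 1)
    f = n / N                          # sampling fraction
    se = s * math.sqrt((1 - f) / n)    # SE with finite population correction
    bound = t * se
    return {
        "ybar": ybar,
        "s": s,
        "mean_bound": bound,
        "mean_ci": (ybar - bound, ybar + bound),
        "total_hat": N * ybar,
        "total_bound": N * bound,
        "total_ci": (N * (ybar - bound), N * (ybar + bound)),
    }


result = srs_intervals(os_born, N, T_29_975)
for name, value in result.items():
    print(name, value)
```

The output should land close to the slide values (6,527 ± 3,314 for the mean and 1,187,914 ± 603,175 for the total). Small differences are expected because the slides round the mean, s and f before substituting them. The SE Mean of 1773 in the Minitab output is s/√n, without the finite population correction.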

  3. Sample size required

     Say we wish to estimate the total OS-born in NSW within 200,000 (= error bound) persons of the true value, with a probability of 0.95.

     Take the previous sample as a pilot study. We estimate $\sigma$ as s = 9713.

     Given D = 200,000 (from Lecture 8), we have

     $$n = 182\left[1 + \frac{1}{182}\left(\frac{200000}{1.96 \times 9713}\right)^{2}\right]^{-1} = 113.296$$

     Take n = 114.

     Now let's take a SRS of size n = 114 and see what error bound we get:

     Descriptive Statistics

     | Variable | N | Mean | Median | TrMean | StDev | SE Mean |
     |---|---|---|---|---|---|---|
     | C26 | 114 | 9097 | 935 | 6044 | 17837 | 1671 |

     | Variable | Minimum | Maximum | Q1 | Q3 |
     |---|---|---|---|---|
     | C26 | 72 | 97203 | 275 | 8464 |

     We have

     $$\hat{y}_T = N\bar{y} = 182 \times 9097 = 1{,}655{,}654, \qquad s = 17{,}837, \qquad f = \frac{114}{182} = 0.626$$

     $$z_{.975} = 1.96 \quad \text{(as we have a large sample here)}$$

     95% CI for total OS-born:

     $$\hat{y}_T \pm z_{.975}\, N s\sqrt{(1-f)/n} = 1{,}655{,}654 \pm 1.96 \times 182 \times 17{,}837 \times \sqrt{(1-0.626)/114} = 1{,}655{,}654 \pm 364{,}263$$

     Note:

     • Why has the error bound turned out to be 364,263 (compared to 603,175 when n = 30), still much greater than 200,000 as planned?
     • Recall we used s to estimate the population standard deviation $\sigma$ in the calculation of sample size, n.

     We had:

     Estimate of $\sigma$ from pilot sample: s = 9,713
     Actual value: $\sigma$ = 16,237 >> s

     Note: The pilot sample underestimated $\sigma$, which led us to underestimate the sample size required.

     • If we had used the population standard deviation, $\sigma$ = 16,237, in the calculation of n, we would have obtained

     $$n = 182\left[1 + \frac{1}{182}\left(\frac{200000}{1.96 \times 16237}\right)^{2}\right]^{-1} = 149.5$$

     ie, we need n = 150.
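The sample-size formula and the resulting error bounds can be checked numerically. The sketch below is a minimal illustration with all inputs taken from the slides. The function names are illustrative assumptions.

```python
import math


def srs_sample_size_for_total(N, D, sigma, z=1.96):
    """Sample size so that the bound on the estimated total is D.

    Implements n = N * [1 + (1/N) * (D / (z * sigma))**2] ** -1.
    """
    return N / (1 + (D / (z * sigma)) ** 2 / N)


def total_error_bound(N, n, s, z=1.96):
    """Error bound z * N * s * sqrt((1 - f) / n) for the estimated total."""
    f = n / N
    return z * N * s * math.sqrt((1 - f) / n)


N, D = 182, 200_000

n_pilot = srs_sample_size_for_total(N, D, sigma=9713)   # pilot-sample s
n_true = srs_sample_size_for_total(N, D, sigma=16237)   # population sigma

print(n_pilot, math.ceil(n_pilot))  # sample size implied by the pilot sample
print(n_true, math.ceil(n_true))    # sample size implied by the true sigma

# Bound actually achieved by the n = 114 sample, whose s was 17,837.
print(total_error_bound(N, 114, 17837))

# Bound that n = 150 would give if s equalled the population sigma.
print(total_error_bound(N, 150, 16237))
```

Rounding up with `math.ceil` mirrors the slides' choice of n = 114 and n = 150. The last line suggests that n = 150 meets the 200,000 target only when the sample standard deviation is close to the true σ.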

  4. Estimating a population proportion p

     Say we are interested in the presence/absence of some characteristic, eg,
     – person has HIV/AIDS
     – person watching a particular TV program
     – person supports the use of nuclear power in Australia

     • We may want to estimate the
     – proportion/percentage (p)
     – number (a)
     in the population that possess the characteristic.

     A SRS of size n allows us to estimate
     • p = population proportion
     • a = Np = population total

     Let

     $$u_i = \begin{cases} 1 & \text{if the } i\text{th member of the population has the characteristic} \\ 0 & \text{if the } i\text{th member doesn't have the characteristic} \end{cases}$$

     Then

     $$p = \frac{u_1 + u_2 + \dots + u_N}{N} \quad (= \mu, \text{ population mean of the binary variable } u_i)$$

     and

     $$a = u_1 + u_2 + \dots + u_N \quad (= \tau, \text{ population total})$$

     Let r = number in sample with the characteristic of interest.

     Then we estimate p by

     $$\hat{p} = \frac{r}{n}$$

     and a by

     $$\hat{a} = N\hat{p} = \frac{N}{n}\, r$$

     i.e. Thus we have
     • p = population mean (of the binary variable with values of 0 or 1)
     • a = population total

     Good news
     • We know properties of the estimators of means (and totals), so we know properties of estimators of p and a shown on Slide 21.

     Extra simplification

     Here $u_i^2 = u_i$ since $u_i$ = 0 or 1; population variance worked out as follows:

     $$\begin{aligned}
     \sigma^2 &= \frac{\sum_{i=1}^{N} (u_i - \bar{u})^2}{N-1} \\
     &= \frac{1}{N-1}\left[\sum_{i=1}^{N} u_i^2 - N\mu^2\right] && \text{(NB: } \bar{u} = \mu \text{, the population mean)} \\
     &= \frac{1}{N-1}\left[\sum_{i=1}^{N} u_i - N\bar{u}^2\right] \\
     &= \frac{N}{N-1}\left[\bar{u} - \bar{u}^2\right] && \text{(Recall, } \mu = \bar{u} = p\text{)} \\
     &= \frac{N}{N-1}\, p(1-p) = \frac{N}{N-1}\, pq, && \text{where } q = 1-p
     \end{aligned}$$
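The variance shortcut for a 0/1 variable and the estimators of p and a can be checked on a small made-up example. The population below (182 units, 40 of which have the characteristic) is purely hypothetical and is not taken from the LGA data. The seed and function name are likewise illustrative.

```python
import math
import random


def population_variance(u):
    """Population variance with divisor N - 1, as used in the slides."""
    N = len(u)
    u_bar = sum(u) / N
    return sum((x - u_bar) ** 2 for x in u) / (N - 1)


# Hypothetical binary population: 1 = has the characteristic, 0 = does not.
population = [1] * 40 + [0] * 142
N = len(population)
p = sum(population) / N  # population proportion = mean of the u_i
q = 1 - p

direct = population_variance(population)  # from the definition
shortcut = N / (N - 1) * p * q            # N/(N-1) * p * q
print(direct, shortcut)

# Simple random sample of n = 30 units, drawn without replacement.
rng = random.Random(373)  # arbitrary seed for reproducibility
sample = rng.sample(population, 30)
r = sum(sample)            # number in the sample with the characteristic
p_hat = r / len(sample)    # estimate of p
a_hat = N * p_hat          # estimate of a = N p
print(r, p_hat, a_hat)
```

The two variance values should agree up to floating-point error, which is the identity derived above.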
