2019 STAT 373 / STAT 814 / STAT 714, Week 9: Population values

Week 9: Example: 1996 Australian Bureau of Statistics census data

• We will use the data from the 1996 Australian census as an example.
• Units: Local Government Areas (LGAs) of NSW (182 at the time).
• Data are in the file LGA.MTW, available on the unit iLearn.

LGAs

Albury, Armidale, Ashfield, Auburn, Ballina, Balranald, Bankstown, Barraba, Bathurst, Baulkham Hills, Bega Valley, Bellingen, Berrigan, Bingara, Blacktown, ..., Wollongong, Woollahra, Wyong, Yallaroi, Yarrowlumla, Yass, Young

• We will be using these data to illustrate sampling and estimation techniques.
• Sampling frame: list of 182 LGAs with IDs from 1 to 182 (N = 182 LGAs).
• We will estimate quantities such as the total overseas-born population of NSW on the basis of a random sample, and compare our answer with the actual population total.

Variables in LGA.MTW

Variable     Mean    Median      Variable     Mean    Median
Total M      18335   6997        AusBorn M    13302   5991
Total F      18803   6893        AusBorn F    13716   6010
Total P      37138   13890       AusBorn P    27018   12001
GE15 M       14286   5233        OSBorn M     4246    512
GE15 F       14944   5193        OSBorn F     4256    432
GE15 P       29231   10426       OSBorn P     8502    935
Aborig M     269.9   146.5       AusCit M     16171   6274
Aborig F     277.5   142.0       AusCit F     16599   6330
Aborig P     547.3   292.0       AusCit P     32769   12605
Unempl M     944     345
Unempl F     609.7   209.5
Unempl P     1554    586

Overseas-born population

• Say we wish to estimate the mean LGA OS-born population, $\mu$, and the total NSW OS-born population, $\tau$, on the basis of a simple random sample (SRS) of size n = 30 LGAs. (Note: instructions on obtaining an SRS in Minitab are on slides 46 and 47 of this lecture.)
• Histogram of the number of OS-born in all 182 LGAs, i.e., the population:

[Histogram: Frequency (0 to 100) against OSBorn P (0 to 100,000)]

Very skewed population: the normal approximation for the sample mean (for n = 30) may be unlikely to hold.
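The slides draw the sample in Minitab. As a rough equivalent, the sketch below draws a simple random sample of n = 30 IDs without replacement from a frame numbered 1 to 182. The function name `draw_srs` and the seed value are assumptions made for illustration; they do not come from the lecture, and the IDs drawn will not match the sample shown in the slides.

```python
import random

N = 182  # number of LGAs in the sampling frame (IDs 1..N)
n = 30   # desired sample size


def draw_srs(frame_size, sample_size, seed=None):
    """Return sample_size distinct IDs drawn without replacement from 1..frame_size."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return sorted(rng.sample(range(1, frame_size + 1), sample_size))


# The seed is arbitrary (hypothetical); change or omit it for a fresh draw.
sample_ids = draw_srs(N, n, seed=2019)
print(sample_ids)
```

Each ID would then be looked up in the frame to find the corresponding LGA and its OS-born count.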

Population values

Descriptive Statistics

Variable    N     Mean    Median   TrMean   StDev    SE Mean
OSBorn P    182   8502    935      5918     16237    1204

Variable    Minimum   Maximum   Q1     Q3
OSBorn P    18        97203     275    8921

• Mean: $\mu = 8{,}502$
• Total: $\tau = N\mu = 182 \times 8{,}502 = 1{,}547{,}364$

Sample (n = 30) drawn using Minitab (click Calc, Random Data, Sample from Columns and then follow it through):

LGA              OS Born     LGA               OS Born
Tumbarumba       278         Dungog            377
Albury           3998        Bourke            162
Muswellbrook     930         Pittwater         11177
Yarrowlumla      1449        Nambucca          1477
Mudgee           1382        Junee             300
Botany           16002       South Sydney      27729
Hay              183         Narrabri          611
Maitland         3624        Urana             48
Great Lakes      2763        Rockdale          33491
Warringah        31893       Culcairn          227
Bland            266         Mosman            7129
Wagga Wagga      3787        Lake Macquarie    16914
Crookwell        184         Kogarah           14914
Yass             879         Eurobodalla       3996
Holbrook         137         Shoalhaven        9502

Sample Statistics

Descriptive Statistics

Variable    N    Mean    Median   TrMean   StDev   SE Mean
OSBorn s    30   6527    1463     5009     9713    1773

Variable    Minimum   Maximum   Q1     Q3
OSBorn s    48        33491     275    9921

Estimation of the population mean

Based on the sample of 30 LGAs, we have

$$\bar{y} = 6527, \qquad s = 9713, \qquad f = \frac{30}{182} = 0.165, \qquad t_{29}^{.975} = 2.0452$$

$$\text{estimated } SE(\bar{y}) = s\sqrt{(1-f)/n}$$

95% CI for population mean OS-born:

$$\bar{y} \pm t_{29}^{.975}\, s\sqrt{(1-f)/n} = 6527 \pm 2.0452 \times 9713 \times \sqrt{(1-0.165)/30} = 6{,}527 \pm 3{,}314 = (3{,}213,\ 9{,}841)$$

Estimation of the population total

We have

$$\hat{\tau} = N\bar{y} = 182 \times 6527 = 1{,}187{,}914, \qquad s = 9713, \qquad f = 0.165, \qquad t_{29}^{.975} = 2.0452$$

95% CI for total OS-born:

$$\hat{\tau} \pm t_{29}^{.975}\, N s\sqrt{(1-f)/n} = 1{,}187{,}914 \pm 2.0452 \times 182 \times 9713 \times \sqrt{(1-0.165)/30} = 1{,}187{,}914 \pm 603{,}175 = (584{,}739,\ 1{,}791{,}089)$$

Large error bound; sample size may be too small.

NOTE:

• We find that the true population values of $\mu = 8{,}502$ and $\tau = 1{,}547{,}364$ do in fact lie in the 95% confidence intervals.
• However, because of the severe skewness of the population values, it would have been more appropriate to stratify the population on some criterion. [We will return to this issue later.]
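The interval calculations above can be checked with a short script. The sketch below recomputes the sample mean, standard deviation, and the 95% confidence intervals for the mean and the total from the 30 sampled OS-born counts. The critical value 2.0452 is copied from the slides rather than computed, and the variable names are illustrative assumptions. Because the slides round intermediate values, the results agree with them only up to rounding.

```python
import math
import statistics

# OS-born counts for the 30 sampled LGAs, as listed in the slides
os_born = [
    278, 377, 3998, 162, 930, 11177, 1449, 1477, 1382, 300,
    16002, 27729, 183, 611, 3624, 48, 2763, 33491, 31893, 227,
    266, 7129, 3787, 16914, 184, 14914, 879, 3996, 137, 9502,
]

N = 182            # population size (number of LGAs)
n = len(os_born)   # sample size
t_crit = 2.0452    # 97.5th percentile of t with 29 df, taken from the slides

ybar = statistics.mean(os_born)   # sample mean
s = statistics.stdev(os_born)     # sample standard deviation (divisor n - 1)
f = n / N                         # sampling fraction

# Estimated standard error of the sample mean, with finite population correction
se_mean = s * math.sqrt((1 - f) / n)

# Half-widths (error bounds) of the 95% intervals for the mean and the total
half_mean = t_crit * se_mean
half_total = N * half_mean

tau_hat = N * ybar  # estimated population total

print(f"mean:  {ybar:.0f} +/- {half_mean:.0f}")
print(f"total: {tau_hat:.0f} +/- {half_total:.0f}")
```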

Sample size required

Say we wish to estimate the total OS-born in NSW within 200,000 persons (= error bound) of the true value, with a probability of 0.95.

Take the previous sample as a pilot study. We estimate $\sigma$ by $s = 9713$. Given $D = 200{,}000$, we have (from Lecture 8)

$$n = 182\left[1 + \frac{1}{182}\left(\frac{200000}{1.96 \times 9713}\right)^{2}\right]^{-1} = 113.296$$

Take n = 114.

Now let's take an SRS of size n = 114 and see what error bound we get:

Descriptive Statistics

Variable   N     Mean    Median   TrMean   StDev    SE Mean
C26        114   9097    935      6044     17837    1671

Variable   Minimum   Maximum   Q1     Q3
C26        72        97203     275    8464

We have

$$\hat{\tau} = N\bar{y} = 182 \times 9097 = 1{,}655{,}654, \qquad s = 17{,}837, \qquad f = \frac{114}{182} = 0.626$$

$$z_{.975} = 1.96 \quad \text{(as we have a large sample here)}$$

95% CI for total OS-born:

$$\hat{\tau} \pm z_{.975}\, N s\sqrt{(1-f)/n} = 1{,}655{,}654 \pm 1.96 \times 182 \times 17{,}837 \times \sqrt{(1-0.626)/114} = 1{,}655{,}654 \pm 364{,}263$$

Note:

• Why has the error bound turned out to be 364,263 (compared to 603,175 when n = 30), still much greater than 200,000 as planned?
• Recall we used $s$ to estimate the population standard deviation $\sigma$ in the calculation of the sample size, n.

We had:

Estimate of $\sigma$ from pilot sample: $s = 9{,}713$
Actual value: $\sigma = 16{,}237 \gg s$

Note: The pilot sample underestimated $\sigma$, which led us to underestimate the sample size required.

• If we had used the population standard deviation, $\sigma = 16{,}237$, in the calculation of n, we would have obtained

$$n = 182\left[1 + \frac{1}{182}\left(\frac{200000}{1.96 \times 16237}\right)^{2}\right]^{-1} = 149.5$$

i.e., we need n = 150.
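The sample size and error bound calculations above can be sketched as two small functions. The formula is the one used in the slides, n = N [1 + (1/N) (D / (z sigma))^2]^(-1), and the error bound is z N s sqrt((1 - f)/n). The function names `sample_size_for_total` and `error_bound_for_total` are hypothetical, chosen here for illustration.

```python
import math


def sample_size_for_total(N, sd, D, z=1.96):
    """Exact and rounded-up SRS size needed to estimate a total within D."""
    n_exact = N / (1 + (D / (z * sd)) ** 2 / N)
    return n_exact, math.ceil(n_exact)


def error_bound_for_total(N, n, s, z=1.96):
    """Half-width of the interval for the total under SRS without replacement."""
    f = n / N  # sampling fraction
    return z * N * s * math.sqrt((1 - f) / n)


N = 182
D = 200_000

# Using the pilot-sample standard deviation s = 9713
n_pilot_exact, n_pilot = sample_size_for_total(N, 9713, D)

# Using the true population standard deviation sigma = 16237
n_true_exact, n_true = sample_size_for_total(N, 16237, D)

# Error bound actually achieved by the n = 114 sample (s = 17837)
bound_114 = error_bound_for_total(N, 114, 17837)

print(n_pilot_exact, n_pilot)
print(n_true_exact, n_true)
print(bound_114)
```

Comparing `n_pilot` with `n_true` shows how strongly the required sample size depends on the standard deviation plugged in, which is the point made in the note about the pilot sample.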

Estimating a population proportion p

Say we are interested in the presence/absence of some characteristic, e.g.,
– person has HIV/AIDS
– person watching a particular TV program
– person supports the use of nuclear power in Australia

• We may want to estimate the
– proportion/percentage (p)
– number (a)
in the population that possess the characteristic.

An SRS of size n allows us to estimate
• p = population proportion
• a = Np = population total

Let

$$u_i = \begin{cases} 1 & \text{if the } i\text{th member of the population has the characteristic} \\ 0 & \text{if the } i\text{th member doesn't have the characteristic} \end{cases}$$

Then

$$p = \frac{u_1 + u_2 + \dots + u_N}{N} \quad (= \mu, \text{ population mean of the binary variable } u_i)$$

and

$$a = u_1 + u_2 + \dots + u_N \quad (= \tau, \text{ population total})$$

Let r = number in sample with the characteristic of interest. Then we estimate p by

$$\hat{p} = \frac{r}{n}$$

and a by

$$\hat{a} = N\hat{p} = \frac{N r}{n}$$

i.e.
• p = population mean (of the binary variable with values of 0 or 1)
• a = population total

Good news
• We know properties of the estimators of means (and totals), so we know properties of the estimators of p and a shown on Slide 21.

Extra simplification

Here $u_i^2 = u_i$ since $u_i = 0$ or $1$. Thus we have the population variance worked out as follows:

$$\begin{aligned}
\sigma^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(u_i - \bar{u})^2 \\
&= \frac{1}{N-1}\left[\sum_{i=1}^{N} u_i^2 - N\bar{u}^2\right] \qquad (\text{NB: } \bar{u} = \mu, \text{ the population mean}) \\
&= \frac{1}{N-1}\left[\sum_{i=1}^{N} u_i - N\bar{u}^2\right] \\
&= \frac{N}{N-1}\left[\bar{u} - \bar{u}^2\right] \qquad (\text{recall } \mu = \bar{u} = p) \\
&= \frac{N}{N-1}\,p(1-p) = \frac{N}{N-1}\,pq, \quad \text{where } q = 1 - p
\end{aligned}$$
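The variance identity above, sigma^2 = N p q / (N - 1), can be checked numerically. The sketch below builds a hypothetical 0/1 population (the 30% prevalence and the seed are assumptions made purely for illustration, not data from the lecture), compares the variance computed directly with the closed form, and then forms the estimates p-hat = r/n and a-hat = N p-hat from one simple random sample.

```python
import random
import statistics

rng = random.Random(373)  # arbitrary seed (hypothetical) for reproducibility

N = 182
# Hypothetical binary population: u_i = 1 if unit i has the characteristic
population = [1 if rng.random() < 0.3 else 0 for _ in range(N)]

p = sum(population) / N   # population proportion = mean of the u_i
a = sum(population)       # population total = number with the characteristic
q = 1 - p

# Population variance with divisor N - 1, computed directly and via N p q / (N - 1)
sigma2_direct = statistics.variance(population)
sigma2_formula = N / (N - 1) * p * q

# Simple random sample without replacement and the resulting estimates
n = 30
sample = rng.sample(population, n)
r = sum(sample)       # number in the sample with the characteristic
p_hat = r / n         # estimate of p
a_hat = N * p_hat     # estimate of a

print(sigma2_direct, sigma2_formula)
print(p, p_hat, a, a_hat)
```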
