Introduction to Survey Sampling


A three-day short course sponsored by the Social & Economic Research Institute, Qatar University. Introduction to Survey Sampling. James M. Lepkowski & Michael Traugott, Institute for Social Research, University of Michigan. April 29 May


  1. Exercise 1: Table of Random Digits

     Row  Column
            1-5  6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50 51-55 56-60 61-65 66-70 71-75 76-80 81-85 86-90
       1  49018 34042 72000 49522 85941 84723 51072 56454 67420 05025 25234 10671 05579 90906 54706 79486 57057 40468
       2  97294 25351 12331 82557 13834 91334 32510 47165 08535 27491 87064 23579 72223 45164 98781 20189 17391 75145
       3  97638 18356 31198 39366 37340 76043 77528 21714 44751 81797 28670 50973 07915 45259 45334 88904 47365 37249
       4  34525 30477 75462 34635 51422 60669 62413 52524 79883 26235 46933 23381 72335 74702 77289 83419 28761 68996
       5  79619 43993 89902 64817 88397 35390 44558 91500 87656 83603 00491 37693 75524 04058 77373 61598 60059 32241
       6  54778 70353 54134 19513 89074 07807 74520 59684 47494 58194 29810 91489 45410 28737 55504 50467 94953 25565
       7  12256 17900 33754 11853 65033 24106 41833 68345 62300 33076 70119 60498 70180 06929 34567 37075 57735 44602
       8  33297 14796 91080 67108 85984 81892 37533 24643 37522 71461 96220 16177 04449 38396 09675 64290 96410 49117
       9  75083 44991 46851 46383 00695 54453 34156 49854 68163 83123 89928 39667 15632 43854 04707 41766 01876 20016
      10  66288 63908 74090 52902 69701 72959 64480 78123 81841 92675 08731 20577 94939 43211 63438 93640 75825 57922
      11  84578 05698 92016 94285 26563 36372 55989 94790 36338 30640 81337 56599 05695 42896 57115 73143 49959 84903
      12  55699 23402 30639 39508 41495 44462 11924 70471 97867 82637 18031 38020 70819 64948 17274 67345 31672 66155
      13  51917 88538 58239 58633 80392 89447 81230 97654 52579 34888 06454 94398 16452 76723 00902 81924 73166 85669
      14  36779 68538 88591 96616 84918 29413 99116 66987 41334 43877 00185 90070 43292 01754 01505 25362 39548 60933
      15  49852 36333 84789 65346 46181 61218 54131 57370 64814 44430 43774 72286 11644 33071 74301 02154 37021 04828
      16  66752 08578 57498 17884 83667 59532 73254 83347 85751 18536 55969 73265 06726 80734 29351 36800 77081 10687
      17  61689 45570 53663 66779 85627 27662 34436 58824 18902 49414 05020 98033 85987 53127 72623 00983 92504 54686
      18  19111 76703 32467 51391 85381 48433 68754 89843 02166 59177 80856 71628 27731 90073 04233 34913 46188 28778
      19  46913 70576 16918 46675 02304 83330 55894 39684 20753 48885 72907 37048 80065 58931 78214 36397 97252 69593
      20  22224 48264 96826 15434 52010 22811 07914 89541 61620 83346 96204 52742 27485 37716 71756 79244 04517 20831
      21  84119 49920 29328 03239 15832 72406 94946 45797 70566 19586 26419 40852 70097 02276 93410 87952 71018 96533
      22  75594 56191 18861 44995 44764 76960 12585 01842 19324 46085 33903 77234 07418 42805 21925 86305 12510 87281
      23  34821 90491 28843 85959 72301 14576 94229 43353 55740 86145 73278 89446 36093 39173 07384 32388 17494 52734
      24  23378 01578 09081 20536 31412 00632 16380 14876 26249 00449 26441 14765 05223 08297 54280 35937 02965 79389
      25  09985 71346 32130 58906 97244 07003 91231 23396 47378 19064 01118 04376 83218 01890 94316 40309 41332 30966
      26  43814 09227 11841 44516 62348 31284 58895 88559 19567 82425 00614 68626 10523 96822 79297 16858 52693 63887
      27  26724 80216 75905 54725 46995 75504 79112 50571 57115 02600 35097 04329 78514 02663 48700 57166 30316 97649
      28  37876 85859 19333 87221 44809 50700 57889 43075 99310 32235 62624 88356 51865 21946 52479 69599 29065 26434
      29  23634 07454 63628 30531 52979 28534 03208 75663 33587 27738 04018 32256 32259 14042 27624 94889 91414 72658
      30  10906 61337 16571 98829 96434 25748 01518 97758 93725 64532 79331 25961 82782 23354 47052 36078 12780 78331
      31  09372 97239 72017 99537 99977 96404 04824 64248 68816 02734 38384 87274 18213 67600 18730 17870 02026 34180
      32  86659 47171 96123 33853 64659 76657 53911 09900 70918 07733 89084 42345 22250 13583 52020 96144 25382 10875
      33  78209 23140 94532 89438 43271 89616 63137 85026 15799 62580 70837 50071 74496 94191 45858 13545 66999 77390
      34  15430 43742 77673 21745 34854 31505 05275 16758 58996 70211 97794 60918 98986 14446 72130 43056 13412 86691
      35  64947 43432 14105 78393 03682 47498 75738 76250 69143 19799 31068 31261 31912 47359 26853 62917 40581 40772
      36  71143 09505 65318 29034 89055 17744 48752 69171 08426 08827 14816 61969 68694 19168 67081 26010 68211 80384
      37  03104 54280 49703 72368 99964 68555 57769 27567 55962 31100 26364 61603 48176 04177 00935 05130 83625 66323
      38  56085 69548 50876 92855 52293 11580 22797 94044 67994 50651 26397 01782 73341 80486 72738 66943 75883 10106
      39  41842 68437 92724 67791 21113 47124 28279 50647 09809 26717 48925 14686 24824 38530 62429 57330 33340 07994
      40  28521 08035 30260 91407 04111 18581 84777 87116 96280 09202 31360 02923 83625 19821 35903 86927 36021 90593
      41  85133 15310 42745 84831 82992 73756 67473 62066 83254 02735 55402 39765 92121 07338 39944 36882 74892 00148
      42  28122 35506 71104 96492 90721 22225 23256 30415 63671 27160 19768 08441 38172 15357 73851 53381 20093 42073
      43  56665 12467 44282 00817 58668 70312 66617 75720 93458 74491 72624 45673 68051 53523 58745 13730 93676 87636
      44  19871 89889 70142 63766 71799 97398 23855 08350 11993 16729 23096 75940 45632 05786 46643 52563 30407 28338
      45  48253 37932 79566 98774 02523 54942 15195 01354 03979 36909 21991 08828 45452 75565 90933 08713 36319 70259
      46  80828 98357 85671 69918 30878 48784 81471 43729 60566 81014 68445 82593 59634 16601 05712 80642 26928 11496
      47  09863 88615 26990 94808 32784 51992 60048 09830 75745 30593 64917 90209 55266 57533 68877 37486 91998 30055
      48  05754 47499 53052 86074 01045 90121 12938 84746 55683 64345 22413 08513 04316 38192 73202 99160 56397 77063
      49  32883 01773 11423 07799 12268 59983 60446 16744 12452 81457 56278 49040 31680 66267 05187 69329 28067 78017
      50  82869 70040 36427 18798 57316 09565 11637 30597 11151 46114 30048 60952 48736 39133 79698 90272 80447 88785

  2. Faculty Mem ber Salaries (in $1,000) Seq. ID Division Sex Ran Salary Seq. ID Division Sex Ran Salary Seq. ID Division Sex Ran Salary No. No. No. 1 1 Eng&Prof m 3 $88 51 155 Eng&Prof m 3 $55 101 217 Lit&SocSci m 2 $55 2 2 Medicine f 3 $45 52 156 Biol&Sci m 1 $49 102 218 Medicine m 3 $80 3 9 Medicine m 3 $57 53 157 Eng&Prof m 3 $57 103 219 Eng&Prof m 1 $114 11 Medicine m 1 $133 158 Medicine m 1 $118 220 Lit&SocSci m 1 $63 4 54 104 12 Eng&Prof f 2 $71 159 Medicine m 3 $84 221 Medicine m 1 $112 5 55 105 6 13 Lit&SocSci m 1 $113 56 160 Eng&Prof m 3 $52 106 222 Medicine m 1 $93 7 14 Medicine f 3 $65 57 161 Medicine m 3 $64 107 223 Lit&SocSci m 2 $47 15 Biol&Sci m 3 $47 162 Eng&Prof m 1 $75 224 Biol&Sci m 1 $127 8 58 108 16 Lit&SocSci f 3 $39 163 Medicine f 1 $87 225 Eng&Prof m 2 $121 9 59 109 10 17 Biol&Sci m 1 $74 60 164 Eng&Prof m 3 $58 110 226 Medicine m 3 $58 11 18 Medicine m 1 $88 61 165 Medicine f 3 $39 111 227 Biol&Sci f 3 $97 12 19 Lit&SocSci m 1 $62 62 166 Medicine m 3 $69 112 228 Lit&SocSci m 1 $71 37 Lit&SocSci m 1 $49 167 Medicine f 2 $46 229 Eng&Prof m 1 $72 13 63 113 38 Medicine m 3 $88 179 Eng&Prof f 1 $86 230 Lit&SocSci m 3 $29 14 64 114 15 39 Medicine m 1 $181 65 180 Medicine m 3 $87 115 231 Medicine m 2 $167 16 40 Eng&Prof m 3 $63 66 181 Medicine m 3 $59 116 232 Lit&SocSci m 3 $36 41 Medicine m 2 $94 182 Eng&Prof f 3 $44 233 Medicine m 1 $57 17 67 117 42 Eng&Prof m 1 $91 183 Medicine m 2 $123 234 Biol&Sci m 1 $107 18 68 118 19 43 Medicine m 1 $60 69 184 Lit&SocSci f 3 $37 119 235 Medicine m 2 $88 20 44 Eng&Prof m 3 $55 70 185 Lit&SocSci m 1 $106 120 236 Medicine m 2 $87 21 45 Biol&Sci m 2 $55 71 186 Lit&SocSci m 1 $91 121 237 Lit&SocSci f 2 $43 46 Medicine f 1 $106 187 Lit&SocSci m 1 $78 238 Lit&SocSci m 1 $79 22 72 122 47 Medicine m 1 $116 188 Biol&Sci m 1 $77 239 Medicine m 2 $113 23 73 123 24 48 Medicine m 3 $79 74 189 Medicine m 1 $90 124 240 Medicine m 3 $55 25 49 Lit&SocSci m 1 $61 75 190 Eng&Prof m 2 $71 125 280 Medicine m 3 $57 50 
Lit&SocSci f 3 $37 191 Medicine f 3 $42 281 Eng&Prof m 3 $56 26 76 126 51 Medicine m 2 $72 192 Medicine f 2 $59 282 Eng&Prof m 2 $65 27 77 127 28 52 Eng&Prof m 1 $105 78 193 Eng&Prof m 2 $49 128 283 Medicine m 2 $42 29 59 Medicine m 2 $79 79 194 Biol&Sci m 1 $83 129 284 Medicine m 1 $102 30 133 Medicine m 1 $61 80 195 Lit&SocSci m 1 $34 130 285 Medicine f 3 $40 134 Medicine m 1 $86 196 Medicine f 3 $42 286 Eng&Prof m 3 $53 31 81 131 135 Biol&Sci m 1 $103 197 Medicine m 2 $97 287 Medicine m 3 $82 32 82 132 33 136 Lit&SocSci m 1 $48 83 198 Medicine m 1 $109 133 288 Medicine m 2 $64 34 137 Eng&Prof m 2 $64 84 199 Lit&SocSci f 2 $48 134 289 Eng&Prof m 1 $72 138 Eng&Prof m 1 $78 200 Medicine m 1 $47 290 Biol&Sci f 3 $36 35 85 135 139 Medicine f 2 $53 201 Eng&Prof m 2 $45 291 Lit&SocSci f 1 $66 36 86 136 37 140 Biol&Sci m 1 $85 87 202 Medicine m 3 $83 137 292 Medicine f 3 $66 38 141 Eng&Prof m 1 $61 88 203 Medicine m 2 $51 138 293 Medicine m 2 $102 142 Medicine m 1 $106 204 Biol&Sci m 1 $78 294 Biol&Sci m 1 $103 39 89 139 143 Lit&SocSci m 2 $60 205 Lit&SocSci m 1 $70 295 Medicine m 1 $148 40 90 140 144 Biol&Sci f 1 $73 206 Eng&Prof f 2 $46 296 Lit&SocSci f 1 $60 41 91 141 42 145 Medicine m 1 $70 92 207 Eng&Prof m 1 $85 142 297 Lit&SocSci f 3 $46 43 147 Medicine f 3 $32 93 208 Lit&SocSci m 1 $53 143 298 Lit&SocSci f 1 $57 148 Lit&SocSci m 2 $49 209 Medicine f 3 $40 299 Medicine f 2 $50 44 94 144 149 Eng&Prof m 3 $43 210 Eng&Prof m 1 $87 300 Lit&SocSci m 1 $90 45 95 145 46 150 Medicine m 1 $75 96 211 Lit&SocSci m 1 $71 146 301 Eng&Prof m 3 $63 47 151 Lit&SocSci m 1 $92 97 212 Medicine m 1 $75 147 303 Eng&Prof m 1 $80 152 Medicine m 2 $107 214 Biol&Sci m 1 $85 304 Medicine m 3 $56 48 98 148 153 Biol&Sci m 2 $57 215 Lit&SocSci m 2 $50 305 Medicine m 1 $72 49 99 149 50 154 Medicine m 2 $114 100 216 Medicine m 3 $118 150 306 Eng&Prof m 1 $96 26

  3. Faculty Mem ber Salaries (Continued) Seq. ID Division Sex Ran Salary Seq. ID Division Sex Ran Salary Seq. ID Division Sex Ran Salary No. No. No. 151 307 Medicine m 3 $65 201 440 Medicine m 1 $108 251 496 Medicine m 3 $60 152 308 Lit&SocSci m 3 $37 202 441 Lit&SocSci m 1 $48 252 497 Eng&Prof m 1 $86 153 309 Eng&Prof m 1 $127 203 442 Medicine m 3 $85 253 498 Medicine m 1 $134 310 Lit&SocSci m 1 $90 443 Lit&SocSci m 1 $59 499 Medicine f 3 $63 154 204 254 311 Lit&SocSci m 3 $45 444 Lit&SocSci f 1 $63 500 Medicine m 1 $123 155 205 255 156 312 Eng&Prof f 1 $75 206 445 Lit&SocSci f 2 $46 256 501 Medicine m 3 $85 157 313 Medicine m 2 $60 207 446 Medicine f 3 $41 257 502 Medicine f 3 $42 158 314 Lit&SocSci m 2 $57 208 447 Medicine m 3 $71 258 503 Medicine f 2 $83 315 Medicine m 1 $129 448 Eng&Prof f 3 $44 504 Lit&SocSci m 1 $54 159 209 259 316 Eng&Prof m 1 $102 449 Lit&SocSci m 2 $46 505 Lit&SocSci f 1 $66 160 210 260 317 Eng&Prof m 3 $57 450 Medicine m 3 $85 506 Medicine m 1 $84 161 211 261 162 318 Eng&Prof m 3 $61 212 452 Medicine m 1 $119 262 507 Eng&Prof m 3 $46 163 319 Eng&Prof m 1 $93 213 453 Medicine m 2 $69 263 508 Eng&Prof m 1 $90 164 320 Medicine f 3 $41 214 454 Eng&Prof m 3 $74 264 509 Medicine m 2 $76 321 Medicine m 1 $181 455 Biol&Sci m 1 $59 510 Eng&Prof m 1 $88 165 215 265 322 Medicine f 2 $69 456 Biol&Sci m 1 $53 515 Medicine f 1 $87 166 216 266 167 323 Lit&SocSci m 1 $81 217 457 Medicine f 3 $49 267 516 Eng&Prof m 3 $75 168 324 Biol&Sci m 1 $94 218 459 Eng&Prof m 1 $78 268 517 Eng&Prof m 3 $64 169 325 Lit&SocSci m 2 $53 219 460 Biol&Sci m 1 $68 269 518 Biol&Sci f 3 $52 326 Medicine m 3 $48 461 Eng&Prof m 1 $83 519 Medicine m 2 $109 170 220 270 327 Lit&SocSci m 1 $83 462 Eng&Prof m 1 $105 520 Lit&SocSci m 1 $144 171 221 271 328 Lit&SocSci m 1 $47 463 Lit&SocSci m 3 $37 521 Eng&Prof m 2 $79 172 222 272 173 329 Lit&SocSci m 3 $45 223 464 Medicine m 1 $111 273 522 Biol&Sci m 1 $56 174 330 Medicine f 1 $75 224 465 Medicine f 2 $70 274 530 Biol&Sci m 1 $60 
175 331 Medicine m 3 $49 225 466 Eng&Prof m 1 $57 275 531 Biol&Sci m 3 $52 333 Medicine m 3 $53 467 Eng&Prof m 1 $71 532 Lit&SocSci f 2 $45 176 226 276 334 Eng&Prof m 1 $84 468 Biol&Sci m 3 $36 533 Lit&SocSci m 1 $59 177 227 277 178 335 Eng&Prof m 1 $78 228 469 Eng&Prof f 3 $43 278 534 Eng&Prof m 3 $56 179 336 Lit&SocSci m 1 $102 229 470 Eng&Prof m 1 $120 279 535 Medicine m 1 $123 180 337 Lit&SocSci f 2 $50 230 471 Lit&SocSci m 1 $66 280 536 Medicine m 2 $75 338 Medicine f 2 $49 472 Eng&Prof m 1 $84 537 Eng&Prof m 1 $84 181 231 281 339 Medicine m 1 $54 473 Medicine m 2 $99 538 Medicine m 2 $70 182 232 282 340 Medicine m 3 $35 474 Biol&Sci f 1 $91 539 Medicine m 3 $84 183 233 283 184 341 Medicine m 2 $87 234 475 Eng&Prof m 2 $105 284 540 Eng&Prof m 1 $63 185 342 Lit&SocSci m 1 $52 235 476 Medicine f 2 $60 285 541 Eng&Prof m 1 $121 186 343 Lit&SocSci m 1 $75 236 477 Medicine f 3 $34 286 542 Medicine m 1 $52 344 Medicine f 3 $41 478 Medicine f 3 $42 543 Biol&Sci m 1 $73 187 237 287 345 Eng&Prof m 2 $62 479 Medicine m 2 $80 544 Eng&Prof f 3 $32 188 238 288 189 346 Medicine m 1 $79 239 480 Medicine m 1 $94 289 545 Eng&Prof f 3 $40 190 347 Biol&Sci m 3 $37 240 481 Biol&Sci m 1 $57 290 546 Biol&Sci m 3 $47 191 348 Lit&SocSci m 3 $44 241 482 Medicine m 1 $82 291 547 Medicine m 1 $112 349 Lit&SocSci m 3 $47 483 Lit&SocSci m 1 $70 548 Biol&Sci m 1 $68 192 242 292 353 Medicine m 1 $70 484 Lit&SocSci m 1 $75 550 Medicine m 2 $93 193 243 293 433 Lit&SocSci m 1 $113 485 Medicine m 1 $139 551 Medicine m 1 $124 194 244 294 195 434 Medicine m 3 $55 245 486 Lit&SocSci m 1 $40 295 552 Lit&SocSci f 2 $49 196 435 Lit&SocSci m 1 $50 246 488 Lit&SocSci m 2 $60 296 556 Medicine f 3 $65 197 436 Lit&SocSci f 2 $54 247 489 Eng&Prof f 1 $128 297 557 Eng&Prof m 1 $84 437 Eng&Prof m 3 $53 490 Medicine m 3 $47 558 Medicine f 2 $71 198 248 298 438 Biol&Sci m 1 $79 491 Eng&Prof m 3 $67 559 Medicine f 3 $40 199 249 299 200 439 Biol&Sci m 2 $53 250 495 Eng&Prof m 1 $90 300 560 Medicine m 2 $70 27

  4. Faculty Mem ber Salaries (Continued) Seq. ID Division Sex Ran Salary Seq. ID Division Sex Ran Salary Seq. ID Division Sex Ran Salary No. No. No. 301 561 Eng&Prof m 1 $98 351 636 Lit&SocSci f 1 $72 562 Lit&SocSci m 1 $89 637 Eng&Prof m 1 $94 302 352 563 Medicine f 3 $36 638 Eng&Prof m 3 $52 303 353 564 Medicine m 1 $63 639 Biol&Sci m 1 $66 304 354 565 Eng&Prof m 2 $74 640 Eng&Prof m 3 $68 305 355 566 Medicine f 3 $38 641 Lit&SocSci m 1 $89 306 356 567 Eng&Prof m 3 $76 642 Medicine m 2 $148 307 357 308 568 Medicine m 3 $97 358 643 Medicine m 1 $159 309 569 Medicine m 1 $76 359 644 Biol&Sci m 1 $62 310 570 Eng&Prof m 1 $86 360 645 Lit&SocSci m 1 $70 311 571 Medicine m 3 $59 361 646 Medicine f 3 $109 312 572 Medicine f 2 $60 362 647 Eng&Prof m 1 $120 313 573 Lit&SocSci m 2 $45 363 648 Eng&Prof m 1 $112 314 595 Biol&Sci m 2 $56 364 649 Medicine m 2 $90 596 Lit&SocSci m 1 $63 650 Medicine m 1 $108 315 365 597 Lit&SocSci m 1 $69 651 Eng&Prof m 1 $152 316 366 598 Eng&Prof m 1 $138 652 Medicine f 2 $47 317 367 599 Lit&SocSci f 3 $31 653 Medicine m 1 $116 318 368 600 Medicine f 2 $50 654 Biol&Sci m 1 $77 319 369 601 Eng&Prof m 1 $89 655 Biol&Sci M 1 $57 320 370 602 Eng&Prof m 1 $148 321 322 603 Lit&SocSci m 3 $55 323 604 Lit&SocSci m 1 $81 324 605 Lit&SocSci m 1 $52 325 606 Medicine m 3 $85 326 607 Medicine m 1 $132 327 608 Lit&SocSci m 1 $85 609 Eng&Prof m 1 $66 328 610 Eng&Prof f 1 $94 329 611 Eng&Prof m 2 $77 330 612 Medicine f 2 $76 331 613 Medicine m 1 $109 332 614 Lit&SocSci m 1 $99 333 334 616 Eng&Prof f 2 $78 335 617 Eng&Prof m 1 $98 336 618 Medicine f 3 $41 337 619 M edicine f 3 $37 338 620 Eng&Prof m 3 $89 339 622 Biol&Sci m 2 $55 340 623 Lit&SocSc m 1 $52 341 624 Eng&Prof m 3 $42 342 625 Biol&Sci m 2 $52 343 626 Lit&SocSc m 1 $63 344 627 Lit&SocSc m 1 $95 345 628 M edicine f 3 $75 346 629 M edicine f 3 $106 347 630 Lit&SocSc f 3 $44 348 631 Lit&SocSc m 1 $58 349 632 Lit&SocSc m 1 $79 350 633 Lit&SocSc m 1 $135 28

  5. 3. Historical perspective
     • Historical development
     • The beginnings
     • Development
     • Divergence
     • Framework for comparison
     • Selection bias
     • Development, part II
     • What should we do?

  6. Historical development
     • Sampling practice:
       – Result of attempts to solve practical problems
     • Function of theory
       – Formalize implicit assumptions, and confirm, correct, or extend practice
     • Origins
       – Data gathering
         • health and social problems
         • social physics
       – Census
       – Monography

  7. The beginnings
     • Berne, 1895
       – Kiaer at ISI: Representative method
         • Miniature of country
         • Large number of units
         • Use prior information in selection
       – Von Mayr and others
         • No calculation where observation is possible
         • Cf. Godambe, Basu after 1950
       – Cheysson and others
         • Monography: detailed examination of typical cases

  8. Development
     • 1903 ISI Resolution
       – Four implicit principles
         • Representative
         • Objective
         • Measurability
         • Specification
       – Actuality
         • Multistage proportionate stratified samples (no theory)

  9. Divergence
     • Representative
       – Purposive sampling
       – Expert choice
       – Balanced sampling
     • Objective
       – Randomized selection
       – Bowley, 1906 (colleague of R.A. Fisher)

  10. Separation
      • ISI Commission 1926 report
        – Sampling established as basis for information collection
        – Equal status given to random and purposive sampling
        – No theory for unequal sized clusters
      • No basis for comparing the two methodologies

  11. Framework for comparison
      • Neyman, 1934
      • The sampling distribution
        – Properties of sample under repeated sampling
          • All possible samples and their associated probabilities of occurrence
        – The sampling distribution of an estimator

  12. Conditions for inference
      • Conditions under which different procedures will produce valid estimates
        – Probability sampling
          • “Unbiased” irrespective of population structure
        – Purposive/balanced/quota sampling
          • Tough assumptions about population structure, unlikely to be achieved in practice

  13. Selection bias
      • Italian census storage problem
      • Sample of completed forms to be retained
      • Gini and Galvani, 1929
        – Matched sample communes on 7 variables
        – Other variables, even aspects other than means of 7 variables, showed wide deviations from population values

  14. What should we do?
      • Probability sampling for objectivity
      • Stratification for precision (representativeness)
      • Variance estimation from the sample
      • Complete and comprehensible description of the sampling procedure

  15. 4. Element samples
      • Element samples
      • The sampling distribution
      • Properties of the sampling distribution
      • Central limit theorem
      • Properties of the sample mean for SRS
      • Estimation of variance
      • Determination of sample size
      • Formulas
      • Exercise 2

  16. Element samples
      • A sample design for which the unit of selection is the population element
      • Basic framework: Neyman, 1934
        – Must be applicable to all populations
        – Must not depend on assumptions about the population structure
        – Appropriate for large populations of elements

  17. Element samples
      • Repeated sampling
        – Objective (mechanical) selection of elements
        – Consider possible outcomes of the sampling process
        – Evaluation of the whole set of possible outcomes

  18. The sampling distribution
      • The set of all possible values of the estimator that can be obtained with a given sample design
        – For a given sample we obtain a particular value, the estimate (such as $\bar{y}$)
      • We want to know …
        – … how likely is the estimate to be close to the population value

  19. Sample realization
      • In fact, we select just one sample
      • The estimate may be correct, or incorrect
      • Want to maximize the probability of a satisfactory estimate

  20. Properties of the sampling distribution
      • Unbiasedness
        – Expected value (average value): $E(\bar{y}) = \bar{Y}$
      • Variability from one sample to another
        – Variance of the estimator: $Var(\bar{y})$
        – The square root of the variance is called the standard error of the estimator: $\sqrt{Var(\bar{y})}$
      • Measurable design
        – A design for which the variance can be estimated from the sample itself

  21. Central limit theorem
      • For large samples, the sampling distribution of $\bar{y}$ is Normal
      • Confidence intervals: $\bar{y} \pm z_{(1-\alpha/2)} \sqrt{Var(\bar{y})}$

  22. Properties of the sample mean for SRS
      • Unbiased: $E(\bar{y}) = \bar{Y}$
      • Variance: $Var(\bar{y}) = \left(1 - \frac{n}{N}\right) \frac{S^2}{n}$
        – Consider
          • $(1 - f)$, where $f = n/N$
          • $N$
          • $S^2$
          • $n$
        – Where $S^2 = \frac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})^2$ and, for a proportion, $S^2 \approx P(1 - P)$

  23. Estimation of variance
      • Can use $s^2$ (sample) to estimate $S^2$ (population)
      • Estimate of $Var(\bar{y}) = \left(1 - \frac{n}{N}\right) \frac{S^2}{n}$ (population)
        – $var(\bar{y}) = \left(1 - \frac{n}{N}\right) \frac{s^2}{n}$ (sample)
      • From a single sample we can not only estimate $\bar{Y}$ using $\bar{y}$ but also estimate the precision of $\bar{y}$ using $var(\bar{y})$
      • Note that $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$ and, for a proportion, $s^2 \approx p(1 - p)$
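
The estimators on the preceding slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `srs_summary`, the five sample values, and the choice z = 1.96 (a 95% interval) are not from the slides, and a sample of five is far too small for the Normal approximation to be trusted; it is used here only to show the arithmetic.

```python
import math

def srs_summary(sample, population_size, z=1.96):
    """Mean, estimated variance of the mean, standard error, and
    Normal-theory confidence interval for a simple random sample."""
    n = len(sample)
    mean = sum(sample) / n
    # Element variance s^2 with the n - 1 divisor.
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    # var(y-bar) = (1 - n/N) * s^2 / n, including the finite population correction.
    var_mean = (1 - n / population_size) * s2 / n
    se = math.sqrt(var_mean)
    return mean, var_mean, se, (mean - z * se, mean + z * se)

# Illustrative values only (salaries in $1,000), treated as if they were an SRS.
salaries = [88, 45, 57, 133, 71]
mean, var_mean, se, interval = srs_summary(salaries, population_size=370)
print(mean, var_mean, se, interval)
```

Both the estimate and its precision come from the same single sample, which is the point of a measurable design.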

  24. Determination of sample size
      • What sample size do we need to obtain a given standard error of the estimator?
      • Population variance $S^2$ known (or guessed)
        – Census
        – Other surveys
        – Administrative records
      • Desired standard error
        – Policy requirements in terms of $Var(\bar{y})$
        – Decision making requirements

  25. Sample size formulas
      • In general, $Var(\bar{y}) = \left(1 - \frac{n}{N}\right) \frac{S^2}{n}$
      • For an infinitely large population (or for sampling with replacement), this is $Var(\bar{y}) = \frac{S^2}{n}$
      • We can calculate the necessary sample size to achieve variance $Var(\bar{y})$ as $n = \frac{S^2}{Var(\bar{y})}$

  26. Sample size formulas (continued)
      • In general (that is, not assuming N is large), the variance may be expressed as $Var(\bar{y}) = \left(1 - \frac{n}{N}\right) \frac{S^2}{n} = \frac{S^2}{n'}$
        – Where $n' = \frac{n}{1 - n/N}$

  27. Sample size formulas (continued)
      • We can compute the necessary $n'$ as $n' = \frac{S^2}{Var(\bar{y})}$
      • To calculate the $n$ necessary for a population of a particular size, we use the formula $n = \frac{n'}{1 + n'/N}$
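
A minimal sketch of the two-step calculation above: first $n'$ for an effectively infinite population, then the adjustment for a finite $N$. The function name `required_sample_size` and the numbers in the usage lines are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def required_sample_size(s2, target_se, population_size=None):
    """Sample size needed to reach a target standard error under SRS.

    s2              -- population element variance S^2 (known or guessed)
    target_se       -- desired standard error, sqrt(Var(y-bar))
    population_size -- N; if None, the population is treated as infinite
    """
    # n' = S^2 / Var(y-bar)
    n_prime = s2 / target_se ** 2
    if population_size is None:
        return n_prime
    # n = n' / (1 + n'/N)
    return n_prime / (1 + n_prime / population_size)

# Hypothetical inputs: S = 10 (so S^2 = 100) and a target standard error of 0.5.
print(required_sample_size(100, 0.5))        # infinite population
print(required_sample_size(100, 0.5, 2000))  # finite population of N = 2,000
```

In practice the result would be rounded up to the next whole element. The adjustment matters only when $n'$ is a noticeable fraction of $N$.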

  28. Exercise 2
      • The variability in income levels is comparable across many countries
      • For a country with a value of $S = 2{,}000$ (which would give $S^2 = 4{,}000{,}000$), we want an estimate of the mean income which has a standard error ($\sqrt{Var(\bar{y})}$) of 50.
      • Answer the following questions in groups:

  29. Exercise 2 (continued)
      • Calculate the sample size needed in China with N = 1,400,000,000?
      • What about in the US where N = 320,000,000?
      • What about in Qatar where N = 1,700,000?
      • What about in a small city where N = 100,000?
      • What about in a small town where N = 10,000?

  30. 5. Systematic sampling
      • Systematic sampling
      • Problems with intervals in systematic sampling
      • Solutions
      • Exercise 3

  31. Systematic sampling
      • A simple method of selecting a sample from a list
      • Once the first element is chosen, every k-th element is selected by counting through the list sequentially
      • In probability sampling, the first element is chosen at random

  32. Sampling intervals
      • Determine the sampling interval $k = N/n$
      • Select a random number (RN) from 1 to k
      • Add k repeatedly
      • Example:
        – N = 12,000 dwellings in a city
        – Sample of n = 500 required
        – k = 12,000/500 = 24
        – Take a RN from 01 to 24, say 03
        – Take the third dwelling, and every 24th thereafter: 3, 27, 51, etc.
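
The selection steps above can be sketched as follows, assuming an integer interval k and list positions numbered from 1. The function name `systematic_sample` and its parameters are illustrative, not part of the course material.

```python
import random

def systematic_sample(population_size, interval, start=None, rng=random):
    """Return the 1-based list positions selected by a systematic sample.

    start -- random start between 1 and interval; drawn at random if omitted
    """
    if start is None:
        start = rng.randint(1, interval)  # inclusive on both ends
    # Take the start, then add the interval repeatedly until the list ends.
    return list(range(start, population_size + 1, interval))

# The dwelling example: N = 12,000, k = 24, random start 3.
selected = systematic_sample(12000, 24, start=3)
print(selected[:3], len(selected))
```

Passing `start` explicitly makes the selection reproducible, which is convenient when the random start is read from a printed table of random digits.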

  33. Problems with intervals
      • Take 1 in k where $k = N/n$
      • k may not be an integer
      • Examples
        – N = 9, n = 2, and k = 4.5
        – N = 952, n = 200, and k = 4.76
        – N = 170,345, n = 1,250, and k = 136.272

  34. Solutions: round sampling interval
      • Round the fractional interval
        – Let the sample size vary, depending on the choice of the “integer interval” k
        – Example: N = 9, n = 2, take k = 4 or 5
          • If k = 4 and RN = 1, the sample is elements 1, 5, 9.
          • If RN = 2, 3, or 4, the sample has only two elements
          • If k = 5 and RN = 1, 2, 3, or 4, the sample has two elements
          • If RN = 5, the sample has only one element
        – Under this method, what happens when N = 952 and n = 200?
        – What about for N = 170,345 and n = 1,250?
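
To see how the sample size varies under the rounding method, one can tabulate the number of selections for every possible random start. The sketch below does this for the N = 9 example; the helper name `sample_sizes_by_start` is hypothetical.

```python
def sample_sizes_by_start(population_size, interval):
    """Map each possible random start (1..interval) to the resulting sample size."""
    return {
        start: len(range(start, population_size + 1, interval))
        for start in range(1, interval + 1)
    }

# N = 9 with the interval rounded down (k = 4) and rounded up (k = 5).
print(sample_sizes_by_start(9, 4))
print(sample_sizes_by_start(9, 5))
```

The same helper can be applied to the two larger cases posed on the slide.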

  35. Solutions: elimination or duplication
      • Eliminate, or duplicate, population elements by epsem to get exact multiple
        – Example: N = 9 and n = 2. Eliminate one of 9 at random, and take 1 in 4 of remaining 8.
        – If N = 952 and n = 200, duplicate 48 at random, and take 1 in 5 from the 1,000 listed elements
        – If N = 170,345 and n = 1,250, eliminate 345 at random, and take 1 in 136 of the remainder

  36. Solutions: circular list
      • Treat the list as circular
      • Select one element at random from anywhere on the list
      • Take every [k]th thereafter, where [k] is an integer near N/n, until n selections are made
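
A minimal sketch of the circular method, assuming 1-based list positions and wrapping from the end of the list back to the beginning with modular arithmetic. The function name and parameters are illustrative; the sketch also assumes (n − 1)·k < N so that no element can be selected twice.

```python
import random

def circular_systematic_sample(population_size, sample_size, interval,
                               start=None, rng=random):
    """Select exactly sample_size positions, treating the list as circular."""
    if start is None:
        # The random start may fall anywhere on the list, not just in 1..k.
        start = rng.randint(1, population_size)
    return [
        ((start - 1 + j * interval) % population_size) + 1
        for j in range(sample_size)
    ]

# N = 9, n = 2, k = 4, with a start near the end of the list so that it wraps.
print(circular_systematic_sample(9, 2, 4, start=8))
```

Unlike the rounding method, the sample size here is fixed at n for every random start.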

  37. Exercise 3
      • Consider again the list of 370 faculty member salaries given in Exercise 1 (slides 23-25)
      • Suppose again we seek a sample of n = 20 from this list
      • Each group should select two systematic samples of n = 20 from the list, using as random starts the next appropriate numbers from the random number table (slide 22), that is, the next random number after the last one used in Exercise 1

  38. Exercise 3 (continued)
      • Since N/n is not an integer, use the rounding method (letting the sample size vary depending on the choice of k) for the first sample
      • And the circular list method for the second sample
      • For each sample, compute the mean salary $\bar{y} = \frac{1}{20} \sum_{i=1}^{20} y_i$

  39. 6. Cluster sampling
      • Cluster sampling
      • Equal-sized cluster sampling
      • Effective sample size
      • Design effect
      • Intra-class correlation
      • Exercise 4

  40. Cluster sampling
      • Populations widely distributed geographically
      • Cannot afford to visit n sites drawn randomly from the entire area
      • Cluster sampling reduces the cost of data collection
        – Sample schools and children within them
        – Sample blocks and households within them

  41. Cluster sampling
      • Cluster sampling is also useful when the sampling frame lists clusters and not elements
        – Select clusters and list elements in selected clusters
        – Frame of blocks: list households within selected blocks
      • Clusters are often naturally occurring units
        – Facilitates sample selection

  42. Cluster sampling
      • Suppose we select an SRS of a = 10 classrooms from A = 1,000, and examine the immunization history of all b = 24 children in selected classrooms
      • Here $n = a \times b = 240$
      • We refer to the A classrooms as primary sampling units or PSUs

  43. Cluster sampling
      • For each of the a = 10 selected PSUs, we record the number of children immunized: 9/24, 11/24, 13/24, 15/24, 16/24, 17/24, 18/24, 20/24, 20/24, 21/24
      • Adding the numerators, there are 160 immunized children
      • The overall proportion immunized is $p = 160/240 = 0.67$

  44. Cluster sampling
      • Recall for SRS (without replacement selection of n elements), the sample mean was $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
      • The estimated sampling variance is $var(\bar{y}) = (1 - f) \frac{s^2}{n}$
      • But for an SRS of a equal-sized clusters from A, we have a $p_\alpha$ for each selected PSU

  45. Cluster sampling: variance estimation
      • In cluster sampling, treat the sample as an SRS of a units from A: $var(p) = \frac{(1 - f)}{a} s_a^2$
        – Where $s_a^2 = \frac{1}{a - 1} \sum_{\alpha=1}^{a} (p_\alpha - p)^2$ and $f = a/A$
      • That is, $var(p) = \frac{(1 - f)}{a} \cdot \frac{\sum_{\alpha=1}^{a} (p_\alpha - p)^2}{a - 1}$

  46. Cluster sampling: estimated variance
      • For the illustration,
        – $s_a^2 = \frac{1}{10 - 1} \left[ \left(\frac{9}{24} - \frac{160}{240}\right)^2 + \left(\frac{11}{24} - \frac{160}{240}\right)^2 + \cdots \right] = 0.02816$
        – $var(p) = (1 - f) \frac{s_a^2}{a} = 0.002760$
        – $se(p) = \sqrt{var(p)} = 0.0525$
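
The cluster variance calculation can be sketched as below, treating the ten classroom proportions as an SRS of a units from A. The function name is hypothetical. One caveat: the sketch takes f = a/A = 10/1,000 from the example as stated, which gives var(p) of about 0.00279; the printed 0.002760 appears to correspond to a slightly smaller finite population correction, so small differences in the third significant digit are expected. The value of $s_a^2$ does not depend on f.

```python
import math

def cluster_proportion_variance(cluster_counts, cluster_size, total_clusters):
    """Estimate p, s_a^2 and var(p) for an SRS of equal-sized clusters."""
    a = len(cluster_counts)
    # One proportion per selected cluster (PSU).
    proportions = [count / cluster_size for count in cluster_counts]
    # With equal-sized clusters the overall p is the mean of the cluster proportions.
    p = sum(proportions) / a
    s2_a = sum((p_alpha - p) ** 2 for p_alpha in proportions) / (a - 1)
    f = a / total_clusters
    var_p = (1 - f) * s2_a / a
    return p, s2_a, var_p

immunized = [9, 11, 13, 15, 16, 17, 18, 20, 20, 21]
p, s2_a, var_p = cluster_proportion_variance(immunized, cluster_size=24,
                                             total_clusters=1000)
print(round(p, 4), round(s2_a, 5), round(var_p, 6), round(math.sqrt(var_p), 4))
```

Only the between-cluster variation enters the calculation; the individual children never appear, which is why the precision is driven by a = 10 rather than n = 240.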

  47. Design effect
      • If the sample had instead been an SRS of n = 240 children from all schools, then
        – $p = 160/240$
        – $var_{SRS}(p) = (1 - f) \frac{p(1 - p)}{n - 1} = 0.0009112$

  48. Design effect
      • Compared to cluster sampling, the estimated variance of p is considerably smaller for SRS
      • A ratio quantifies the comparison: $deff(p) = \frac{var(p)}{var_{SRS}(p)} = \frac{0.002760}{0.0009112} = 3.029$

  49. roh
      • The design effect is a function of …
        – the size of the clusters b
        – the degree of homogeneity of elements within clusters
      • The homogeneity is measured by the intra-cluster correlation roh
      • The design effect is given by $deff(p) = 1 + (b - 1)\, roh$

  50. Estimating roh
      • The intra-cluster correlation can be estimated from the design effect: $roh = \frac{deff(p) - 1}{b - 1} = \frac{3.029 - 1}{24 - 1} = 0.088$

  51. Features of roh
      • roh is a property of the clusters and the variable under study
      • roh is substantive, not statistical
      • roh is nearly always positive
        – Elements in a cluster tend to resemble one another
      • Source of roh
        – Environment
        – Self-selection
        – Interaction

  52. Magnitude of roh
      • Magnitude depends on
        – The characteristic (variable) under study (e.g., disease status, age)
        – The nature of the clusters (e.g., households, establishments)
        – The size of the cluster (e.g., household, blocks of households, census tracts)

  53. Effective sample size
      • Alternatively, the actual sample size is n = 240 in the cluster sample, but an SRS that is equally precise would only have to have $n_{eff} = \frac{240}{3.029} \approx 79$
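
The three summary quantities from the preceding slides (design effect, roh, and effective sample size) follow directly from the two variance estimates. The sketch below plugs in the values printed on the slides; the function name `deff_roh_neff` is hypothetical.

```python
def deff_roh_neff(var_design, var_srs, n, b):
    """Design effect, intra-cluster correlation, and effective sample size."""
    deff = var_design / var_srs   # ratio of actual to SRS variance
    roh = (deff - 1) / (b - 1)    # solve deff = 1 + (b - 1) * roh for roh
    n_eff = n / deff              # SRS size giving the same precision
    return deff, roh, n_eff

deff, roh, n_eff = deff_roh_neff(var_design=0.002760, var_srs=0.0009112,
                                 n=240, b=24)
print(round(deff, 3), round(roh, 3), round(n_eff))
```

Because the same finite population correction multiplies both variances, it cancels in the ratio, so deff does not depend on the sampling fraction.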

  54. Examples
      • Consider alternative outcomes for our sample of a = 10 classrooms
        – Homogeneity within, heterogeneity between: 0/24, 0/24, 0/24, 16/24, 24/24, 24/24, 24/24, 24/24, 24/24, 24/24
        – $s_a^2 = 0.2222$
        – $var(p) = 0.02178$
        – $deff = 23.90$
        – $roh = \frac{23.90 - 1}{24 - 1} = 0.996$
        – $n_{eff} = 240/23.9 \approx 10$

  55. Examples
      • Heterogeneity within, homogeneity among: 16/24, 16/24, 16/24, 16/24, 16/24, 16/24, 16/24, 16/24, 16/24, 16/24
        – $s_a^2 = 0.0$
        – $var(p) = 0.0$
        – $deff = 0$
        – $n_{eff} = 240/0$

  56. Exercise 4
      • An equal probability (epsem) sample of n = 2,400 was obtained from a one-stage sample of 60 equal-sized clusters selected by SRS
      • In a journal article describing survey results, we found the following information
        – For a key proportion, p = 0.40
        – And $var(p) = 0.00021795$
      • Estimate deff and roh

  57. 7. Two-stage sampling
      • Two-stage sampling
      • Portability of roh
      • Exercise 5

  58. Two-stage sampling
      • Selecting many elements per cluster increases variances
      • Even small values of roh can be magnified by large b since $deff(p) = 1 + (b - 1)\, roh$
      • Consider the following for $n = a \times b = 240$: $f = \frac{a}{1000} \times \frac{b}{24} = \frac{a \times b}{24000} = \frac{240}{24000} = \frac{1}{100}$

  59. Subsamples of size b
      • Sample a = 20 classrooms and b = 12: $deff(p) = 1 + (12 - 1)(0.088) = 1.97$, $n_{eff} = 122$
      • Sample a = 30 classrooms and b = 8: $deff(p) = 1 + (8 - 1)(0.088) = 1.62$, $n_{eff} = 148$
      • Sample a = 80 classrooms and b = 3: $deff(p) = 1 + (3 - 1)(0.088) = 1.18$, $n_{eff} = 204$
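
The trade-off between the number of clusters and the subsample size can be tabulated with a short loop. This sketch holds n = 240 fixed and uses the rounded roh = 0.088; the function name `design_tradeoff` is illustrative. Because roh is rounded here, the effective sample sizes can differ by about one from the slide values, which appear to rest on the unrounded roh.

```python
def design_tradeoff(n, roh, subsample_sizes):
    """For each subsample size b, return (a, b, deff, n_eff) with n = a * b fixed."""
    rows = []
    for b in subsample_sizes:
        a = n // b                    # number of clusters needed
        deff = 1 + (b - 1) * roh      # design effect for subsamples of size b
        rows.append((a, b, deff, n / deff))
    return rows

for a, b, deff, n_eff in design_tradeoff(240, 0.088, [24, 12, 8, 3]):
    print(f"a={a:3d}  b={b:2d}  deff={deff:.2f}  n_eff={n_eff:.1f}")
```

Spreading the same 240 children over more classrooms lowers deff, at the price of visiting more sites.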

  60. Portability of roh
      • Estimation (design 1)
        – $var_{(1)}(p) = \frac{(1 - f)}{a} s_a^2$
        – $var_{(1),SRS}(p) = \frac{p(1 - p)}{n_{(1)}}$
        – $deff_{(1)} = \frac{var_{(1)}(p)}{var_{(1),SRS}(p)}$
        – $roh = \frac{deff_{(1)} - 1}{b_{(1)} - 1}$
      • Design (design 2)
        – $var_{(2),SRS}(p) = \frac{p(1 - p)}{n_{(2)}}$
        – $deff_{(2)} = 1 + (b_{(2)} - 1)\, roh$
        – $var_{(2)}(p) = deff_{(2)} \times var_{(2),SRS}(p)$
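
The "design" half of the slide, projecting a variance for a new design from a roh estimated under an earlier one, can be sketched as follows. The function name and the inputs in the usage line are assumptions for illustration: they reuse the classroom example (p = 160/240, roh = 0.088) with a hypothetical redesign of a = 30 classrooms and n = 240, and the SRS variance is taken as p(1 − p)/n without a finite population correction.

```python
def project_variance(p, roh, n_new, a_new):
    """Project var(p) for a new equal-sized cluster design with a ported roh."""
    b_new = n_new / a_new                  # new subsample size per cluster
    deff_new = 1 + (b_new - 1) * roh       # design effect under the new design
    var_srs_new = p * (1 - p) / n_new      # SRS variance at the new sample size
    return deff_new * var_srs_new

projected = project_variance(p=160 / 240, roh=0.088, n_new=240, a_new=30)
print(round(projected, 6))
```

The key assumption is that roh carries over: it is treated as a property of the clusters and the variable, not of the particular sample size used to estimate it.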

  61. Exercise 5
      • Suppose the sample described in Exercise 4 (with n = 2,400 and a = 60) is to be repeated with a smaller sample of n = 1,200 and in only a = 30 equal-sized clusters
      • Project how large the sampling variance of p will be under this new design.

  62. Exercise 5 (continued)
      • Now suppose the reduced size of n = 1,200 is retained, but we want to consider a = 60 equal-sized clusters.
      • Project how large the sampling variance of p will be under this new design.

  63. 8. Probability proportionate to size sampling
      • Unequal-sized cluster sampling
      • Sampling with fixed rates
      • Control of subsample size
      • Selection of fixed size subsamples
      • PPS sampling
      • Systematic PPS sampling
      • Exercise 6

  64. Unequal-sized cluster sampling
      • Naturally occurring clusters tend to be unequal in size
      • Fixed sampling rates and unequal sized clusters result in variation in sample size

  65. Consider the following sample of 12 schools:

      School   B_α     School   B_α
         1     308        7     393
         2     823        8     148
         3     146        9     321
         4     809       10     393
         5     827       11     207
         6     775       12     850

  66. Fixed rate sample
      • An epsem sample of n = 100 students is to be selected from the N = 6,000 students in the 12 schools: $f = 100/6000 = 1/60$
      • Two stages: Select a = 2 schools, say an SRS of a = 2 schools (a rate of 2/12 = 1/6)
      • And then choose students at the rate 1/10 within the selected schools: $f = \frac{1}{6} \times \frac{1}{10} = \frac{1}{60}$

  67. Unequal subsample sizes
      • Suppose schools 3 and 8 are chosen
        – Subsampling at the rate of 1/10 yields sample size $n = (146 + 148)/10 = 14.6 + 14.8 = 29.4$
      • On the other hand, if schools 5 and 12 were chosen instead, $n = (827 + 850)/10 = 82.7 + 85 = 167.7$
      • Subsample size varies from 29 to 168 …
        – Sample administration becomes difficult

  68. Sample size variation
      • Variation in the overall sample size is undesirable
      • Since n is a random variable, $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ no longer applies
      • We need to use a ratio estimator $r = \frac{\sum_{\alpha=1}^{a} y_\alpha}{\sum_{\alpha=1}^{a} x_\alpha}$

  69. Control of subsample size
      • In the survey literature, we need to find a way to control the sample size, that is, to keep it from varying
      • A controlled sample size provides administrative convenience in fieldwork
      • It also has greater statistical efficiency
      • Several methods; we discuss two:
        – Select exactly b elements per cluster
        – Probability proportionate to size (PPS)

  70. Selection of fixed subsample sizes
      • Suppose a = 2 schools are chosen at random
      • And b = 50 students are chosen at random per selected school
      • Sample size is n = 2 x 50 = 100
        – Sample size does not vary across samples!
      • But this design, on average across all possible samples, over-represents students in small schools
        – Why?

  71. Selection of fixed subsample sizes
      • For example, for school 3, $f = \frac{1}{6} \times \frac{50}{146} = \frac{1}{17.52}$
      • While for school 12, $f = \frac{1}{6} \times \frac{50}{850} = \frac{1}{102}$
      • If students in large schools are different than those in small, we have bias
      • The bias can be taken care of through weighting (later discussion)
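
A short sketch of why a fixed take of b = 50 per school is not epsem: the overall selection rate for a student depends on the size of the student's school. The function name is hypothetical; the school sizes are taken from the table of 12 schools (school 3 has 146 students, school 4 has 809), and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def overall_rate_fixed_take(school_size, a=2, total_schools=12, take=50):
    """Overall selection probability of a student when a schools are chosen
    by SRS and a fixed number of students is taken per selected school."""
    first_stage = Fraction(a, total_schools)    # 2/12 = 1/6
    second_stage = Fraction(take, school_size)  # 50 / B_alpha
    return first_stage * second_stage

small = overall_rate_fixed_take(146)  # school 3
large = overall_rate_fixed_take(809)  # school 4
print(f"school 3: 1 in {float(1 / small):.2f}")
print(f"school 4: 1 in {float(1 / large):.2f}")
```

Students in the small school are selected at more than five times the rate of those in the large school, which is the over-representation the preceding slide asks about.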

  72. PPS
      • Require a method that is equal chance for students (epsem)
      • And still achieves equal sized subsamples
        – And thus achieves fixed sample sizes
      • Again, consider a = 2 and b = 50
      • “Selection equation:” $f = \frac{1}{60} = P_\alpha \times \frac{50}{B_\alpha}$

  73. PPS: Achieving epsem
      • For example, if school 1 is chosen, then $f = \frac{1}{60} = P_\alpha \times \frac{50}{308} = P_\alpha \times \frac{1}{6.16}$
      • In order to make this epsem for students, we need for each school to be selected with probability … $P_\alpha = \frac{1}{60} \times \frac{B_\alpha}{50}$, or $P_\alpha = \frac{B_\alpha}{3000}$

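  74. PPS: Selection by size
      • Re-expressing this in terms of selecting both schools, $P_\alpha = \frac{2 B_\alpha}{6000} = \frac{2 B_\alpha}{\sum_\alpha B_\alpha}$
      • In general, this becomes, across two stages, $P_\alpha = \frac{a B_\alpha}{\sum_\alpha B_\alpha}$ and $f = P_\alpha \times \frac{b}{B_\alpha} = \frac{a \times b}{\sum_\alpha B_\alpha} = \frac{n}{N}$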
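
The selection equation can be checked numerically: with first-stage probabilities proportional to size and a fixed take of b per school, the overall rate comes out the same for every school. The function name below is hypothetical; the school sizes are those in the table of 12 schools.

```python
from fractions import Fraction

def pps_overall_rates(sizes, a, b):
    """Overall selection rate f = P_alpha * b / B_alpha for every cluster,
    with P_alpha = a * B_alpha / sum(B)."""
    total = sum(sizes)
    return [Fraction(a * size, total) * Fraction(b, size) for size in sizes]

school_sizes = [308, 823, 146, 809, 827, 775, 393, 148, 321, 393, 207, 850]
rates = pps_overall_rates(school_sizes, a=2, b=50)
print(set(rates))
```

The cluster size cancels between the two stages, so the overall rate is n/N regardless of which schools are drawn.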

  75. PPS selection of schools

      School   B_α    Cum. B_α   RN selecting the school
         1     308       308
         2     823      1131     702
         3     146      1277
         4     809      2086     1744
         5     827      2913
         6     775      3688
         7     393      4081
         8     148      4229
         9     321      4550
        10     393      4943
        11     207      5150
        12     850      6000

  76. PPS: Choosing schools
      • Select Random Numbers (RNs) from 1 to 6000:
        – RN = 702
        – RN = 1744
      • Find the first school with cumulative sum greater than or equal to the first RN
      • Find the next school with cumulative sum greater than or equal to the second RN
      • These choose schools 2 and 4:
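
The cumulative-size lookup described above can be sketched with a binary search over the running totals. The function name `pps_select` is illustrative. The sketch applies each random number independently, which corresponds to the two separate draws on the slide; it does not guard against the same school being drawn twice.

```python
from bisect import bisect_left
from itertools import accumulate

def pps_select(sizes, random_numbers):
    """Return the 1-based cluster numbers hit by each random number,
    using cumulative measures of size."""
    cumulative = list(accumulate(sizes))
    # bisect_left finds the first cumulative total >= the random number.
    return [bisect_left(cumulative, rn) + 1 for rn in random_numbers]

school_sizes = [308, 823, 146, 809, 827, 775, 393, 148, 321, 393, 207, 850]
print(pps_select(school_sizes, [702, 1744]))
```

A school of size $B_\alpha$ owns $B_\alpha$ of the 6,000 possible random numbers, which is what makes its selection probability proportional to its size.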
