Sampling
Vorasith Sornsrivichai, M.D., FETP Cert. Epidemiology Unit, Faculty of Medicine, PSU
Objectives
1. Explain the need for survey sampling
2. Define the following terms:
– Reference population, study population, study sample
– Internal validity, external validity
– Probability sampling, equal probability selection method, disproportionate sampling
– Stratification, design effect
3. Describe principles of and steps in sampling for a household survey
– Simple, systematic, stratified random sampling, cluster sampling
Outline of Presentation
sampling
sampling bias
POPULATION SAMPLE
Population (N)
“universe”), from which a sample may be drawn
necessarily a population of persons
Sample (n)
– Random or nonrandom
– Representative or nonrepresentative
that are representative of the whole population
Population (N=100)
[Figure: scatter plot of Wisdom Score (y-axis, 20 to 100) against Age in years (x-axis, 20 to 100)]
Does wisdom come with age?
Sample (n=10)
[Figure: scatter plot of Wisdom Score (y-axis, 20 to 100) against Age in years (x-axis, 20 to 100)]
Does wisdom come with age?
Population (N=100)
[Figure: scatter plot of Wisdom Score (y-axis, 20 to 100) against Age in years (x-axis, 20 to 100)]
Does wisdom come with age?
"Pain makes man think. Thought makes man wise. Wisdom makes life endurable."
~ John Patrick ~
Why Do We Sample Populations?
populations
Hierarchy of Population
population you want to know about
you collect the data
estimate the parameter in the target population
Hierarchy of Population
Reference/external population
Study/target population
Actual population
Study sample/population (Sample)
– Sampling
– Statistical inference: issue of chance
– Internal validity: issue of bias
– External validity (generalizability): issue of population difference
Validity & Precision
[Figure: four panels labelled "Valid and precise", "Valid, not precise", "Not valid, but precise", "Not valid, not precise"]
Validity & Precision
Validity
– Measurement reflects true value of population
– Improved by good design, sampling scheme, quality assurance
Precision
– Repeated measurements agree with one another
– Improved by increasing the sample size
Truth is (almost) Everything
estimate of the target population is better than a big sample that gives a precise but false estimate
Representativeness
Type of Samples
selected is unknown
– Convenience, accidental, or haphazard samples, e.g. man-in-the-street surveys, grab samples
– Purposive or subjective samples, e.g. expert sample, quota sample
population has a known probability of being selected
Probability Sample
selection
– May have an equal chance of being selected
– Or, if a stratified sampling method is used, the chance of being selected can be varied
Probability Sample
– Assigning an identity (label, number) to all individuals in the population
– Arranging them in alphabetical order and numbering in sequence, or simply assigning a number to each, or by grouping according to area of residence and numbering the groups
Probability Sample
– Select individuals (or groups) for study by a random procedure such as use of a table of random numbers (or comparable procedure) to ensure that the chance of selection is known
"To conquer fear is the beginning of wisdom."
~ Bertrand Russell ~
Any questions? :-)
Sampling
subjects from all the subjects in a particular group
– Conclusions based on sample results may be attributed only to the population sampled
– Any extrapolation to a larger or different population is a judgment or a guess and is not part of statistical inference
Definition of Sampling Terms
– Any list of all the sampling units in the population
– Sample drawn from sampling frame in the first stage of sample selection
– Method of selecting sampling units from sampling frame
EPSM (Equal Probability Selection Method)
A sample in which each final unit of selection in the population has an equal probability of selection
– Simple random sampling and systematic random sampling are EPSM samples
Disproportionate Sampling
– Cost efficiency
– Important small subgroup population
sampling fraction or probability of selection (n/N) is not the same for all strata
Sampling Error
a parameter caused by the random nature of the sample
–of mean, proportion, differences, etc
– Sample size
– Amount of variability in measuring factor of interest
Sampling Variation
sample is determined by chance, the result of analysis in two or more samples will differ, purely by chance
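Sampling variation can be seen in a small simulation. The sketch below is illustrative only: the population values, the seed and the helper name `sample_means` are assumptions, not part of the slides. It draws repeated simple random samples and shows that their means differ by chance, and that the spread shrinks as the sample size grows.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Draw `repeats` simple random samples of size n and return their means."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Hypothetical population of 10,000 scores (mean about 50, SD about 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]

means_n10 = sample_means(population, n=10, repeats=200, rng=rng)
means_n100 = sample_means(population, n=100, repeats=200, rng=rng)

# The spread of the sample means is an empirical picture of sampling error.
spread_n10 = statistics.stdev(means_n10)
spread_n100 = statistics.stdev(means_n100)
print(f"n=10:  SD of sample means = {spread_n10:.2f}")
print(f"n=100: SD of sample means = {spread_n100:.2f}")
```

Every sample comes from the same population, yet no two means need agree; only the larger samples cluster more tightly around the population mean.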
Sampling Bias
nonrandom sample of a population
Selection Bias
[Figure: exposure and disease cells (E+ D+, E+ D-, E- D+, E- D-) shown for the population (N) and for the sample (n)]
"Mistakes are the usual bridge between inexperience and wisdom."
~ Phyllis Theroux ~
Selecting a Sampling Method
– Heterogeneity with respect to variable of interest
– Size/geographical distribution
sampling error
Methods in Probability Sampling
Simple Random Sampling (SRS)
– Each person has an equal chance of being selected from the entire population
– Assign each person a number, starting with 1, 2, 3, and so on
– Numbers are selected at random until the desired sample size is attained
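The numbering-and-drawing procedure can be sketched in a few lines. This is a minimal illustration, assuming a pseudo-random generator stands in for the random number table; the function name `simple_random_sample`, the frame size of 48 and the sample size of 8 are hypothetical choices.

```python
import random

def simple_random_sample(frame_size, n, seed=None):
    """Return n distinct unit numbers drawn from 1..frame_size, each equally likely."""
    if not 0 < n <= frame_size:
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    # rng.sample draws without replacement, so no unit is picked twice.
    return sorted(rng.sample(range(1, frame_size + 1), n))

# Hypothetical frame of 48 numbered persons, sample of 8.
chosen = simple_random_sample(frame_size=48, n=8, seed=1)
print(chosen)
```

The numbers returned are then matched back to the persons on the numbered list.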
Random Table
Row  <--------- uniform random digits -------------------------->
 1   57245 39666 18545 50534 57654 25519 35477 71309 12212 98911
 2   42726 58321 59267 72742 53968 63679 54095 56563 09820 86291
 3   82768 32694 62828 19097 09877 32093 23518 08654 64815 19894
 4   97742 58918 33317 34192 06286 39824 74264 01941 95810 26247
 5   48332 38634 20510 09198 56256 04431 22753 20944 95319 29515
 6   26700 40484 28341 25428 08806 98858 04816 16317 94928 05512
 7   66156 16407 57395 86230 47495 13908 97015 58225 82255 01956
 8   64062 10061 01923 29260 32771 71002 58132 58646 69089 63694
 9   24713 95591 26970 37647 26282 89759 69034 55281 64853 50837
10   90417 18344 22436 77006 87841 94322 45526 38145 86554 42733
11   78886 86557 11295 07253 29289 44814 58898 36929 66839 81250
12   39681 54696 38482 48217 73598 93649 92705 34912 18981 74299
13   38265 45196 31143 82190 27279 79883 20219 38823 84543 22119
14   34270 41885 00079 63600 59152 10670 27951 77830 05368 58315
15   73869 34748 75787 88844 89522 71436 04166 06246 20952 56808
16   21732 36017 69149 70330 90500 73110 92908 55789 73450 68282
Simple Random Sampling
1 Albert D.  2 Richard D.  3 Belle H.  4 Raymond L.  5 Stéphane B.  6 Albert T.
7 Jean William V.  8 André D.  9 Denis C.  10 Anthony Q.  11 James B.  12 Denis G.
13 Amanda L.  14 Jennifer L.  15 Philippe K.  16 Eve F.  17 Priscilla O.  18 Frank V.L.
19 Brian F.  20 Hellène H.  21 Isabelle R.  22 Jean T.  23 Samanta D.  24 Berthe L.
25 Monique Q.  26 Régine D.  27 Lucille L.  28 Jérémy W.  29 Gilles D.  30 Renaud S.
31 Pierre K.  32 Mike R.  33 Marie M.  34 Gaétan Z.  35 Fidèle D.  36 Maria P.
37 Anne-Marie G.  38 Michel K.  39 Gaston C.  40 Alain M.  41 Olivier P.  42 Geneviève M.
43 Berthe D.  44 Jean Pierre P.  45 Jacques B.  46 François P.  47 Dominique M.  48 Antoine C.
Simple Random Sampling
– Simple
– Sampling error easily measured
– Need complete & up-to-date list of units
– For a wide geographic area, travel costs is
– Does not always achieve best representativeness
Systematic Sampling
– Units drawn with a constant interval between successive units
– Equal chance of being selected for each unit
– Calculate sampling interval (k = N/n)
– Draw a random number (≤ k) for random starting point
– Draw every kth unit from the first unit
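The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the interval is rounded down to a whole number, and N=100, n=10 and the name `systematic_sample` are hypothetical. When N/n is not a whole number, the realised sample size may differ slightly from n.

```python
import random

def systematic_sample(N, n, seed=None):
    """Select every k-th unit from a list numbered 1..N after a random start."""
    k = N // n                      # sampling interval, rounded down (assumption)
    if k < 1:
        raise ValueError("n cannot exceed N")
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random starting point, 1 <= start <= k
    return list(range(start, N + 1, k))

units = systematic_sample(N=100, n=10, seed=7)
print(units)
```

Because only the starting point is random, the whole sample is fixed once that single number is drawn.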
Systematic Sampling
f=11/93
Systematic Sampling
– Provides better spread, ensures representativeness across list
– Can improve precision
– Easy to implement
– Dangerous if the list has cycles or periodicity
– Travel cost
42
Systematic Sampling Systematic Sampling
f=7/93
Stratified Sampling
– Dividing the population into subgroups according to some important characteristic, e.g. age
– Selecting a random sample out of each subgroup
– If the proportion of the sample drawn from each stratum is the same as the proportion of the population in each stratum (Probability Proportional to Size, PPS), then all strata will be fairly represented in the sample
– Classify population into homogeneous subgroups (strata)
– Draw sample in each stratum
– Combine results of all strata
Stratified Sampling with PPS
[Figure: stratification splits a population of N=25 into strata of 15 and 10; sampling then draws n=5 as 3 and 2]
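The allocation in this example (strata of 15 and 10, n=5 drawn as 3 and 2) can be reproduced with a short sketch. The stratum names "A" and "B", the unit numbering and the function name are hypothetical. Each stratum's share is rounded to a whole unit, so with other numbers the total may not equal n exactly.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to its list of units; returns a sample per stratum."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Same sampling fraction n/N in every stratum (rounded to a whole unit).
        n_h = round(n * len(units) / N)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical strata of 15 and 10 units, numbered 1..25.
strata = {"A": list(range(1, 16)), "B": list(range(16, 26))}
sample = proportional_stratified_sample(strata, n=5, seed=3)
print({name: len(units) for name, units in sample.items()})  # {'A': 3, 'B': 2}
```

Within each stratum the draw is a simple random sample, so the sampling fraction is 5/25 in both strata.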
Stratified Sampling
– More precise if the variable of interest is associated with strata
– All subgroups represented, allowing separate conclusions about each of them
– Sampling error difficult to measure
– Loss of precision if very small numbers sampled in individual strata
Cluster Sampling
– Each unit selected is a group of units (a village, an ED, etc.) rather than an individual
– Random sample of groups (“clusters”) of units
– In selected clusters, all units or a proportion (sample) of units included
– Sampling within cluster may be simple random or systematic
EPI 30 Clusters Survey
Community   Pop. size   Cum. pop. size
1           110         110
2           100         210
3           130         340
4           100         440
5           120         560
6           80          640
7           160         800
8           110         910
|           |           |
50          300         4000
communities (4000) by the number of clusters to be selected (30) = sampling interval (133)
between 1 and 133 (118.) Since 118 lies between 110 and 210, community 2 will be chosen
118 + 133 = 251, so community 3 is chosen and so on
(total sample size/no. of clusters = 210/30 = 7)
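The selection rule in this example (random start 118, sampling interval 133, pick the community whose cumulative population first reaches each threshold) can be sketched as below. Only the first eight communities from the table are used, since the other rows are not shown; the function name `select_clusters` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def select_clusters(pop_sizes, interval, start):
    """Return 1-based community numbers hit by start, start + interval, ..."""
    cumulative = list(accumulate(pop_sizes))
    selected = []
    threshold = start
    while threshold <= cumulative[-1]:
        # First community whose cumulative population reaches the threshold.
        selected.append(bisect_left(cumulative, threshold) + 1)
        threshold += interval
    return selected

# First eight communities from the table; the remaining rows are not shown.
sizes = [110, 100, 130, 100, 120, 80, 160, 110]
print(select_clusters(sizes, interval=133, start=118))  # [2, 3, 4, 5, 7, 7]
```

Community 7 appears twice because its population (160) exceeds the sampling interval. With this selection rule a large community can receive more than one cluster.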
Cluster Sampling
[Figure: area divided into Section 1 to Section 5]
Cluster Sampling
– List of sampling units within population not required
– Less travel/resources required
– Imprecise if clusters homogeneous and therefore sample variation greater than population variation (large design effect)
– Sampling error difficult to measure
Steps in Cluster Sampling
suitable for survey for estimating proportion not so good for testing hypothesis
geographic definition such as rural/urban area in the specific province(s)
Steps in Cluster Sampling
and villages in the study province(s). Each village should be listed with its most recent population or household count.
districts in the study province is relatively stable over time, this enumeration table (with the name of the village in the first column and its population size in the second) will be used as the sampling frame. In the first stage of sampling, the PSU is the village
the number in the third column
then sampling interval = total pop. / 30 = i
Steps in Cluster Sampling
1 to the sampling interval. Say r
population is just over r
population is just over r + i
population is just over r + 2 i
similarly
Steps in Cluster Sampling
transportation, security and availability of other facilities: shelters, food
each selected village. Have an initial visit. Employ 1-2 locals to facilitate the survey. Make sure that there would be no serious problem during the day of survey
interviewers in a non-selected village
Steps in Cluster Sampling
Choose a random starting point and visit consecutive nearest households. Ask for eligible
desired number, check the completeness of the questionnaire and leave the village
Design Effect
Global variance (simple random sampling):
$$\mathrm{Var}_{srs} = \frac{p(1-p)}{n}$$
Cluster variance:
$$\mathrm{Var}_{clust} = \frac{\sum_{i=1}^{k} (p_i - p)^2}{k(k-1)}$$
$$\text{Design effect} = \frac{\mathrm{Var}_{clust}}{\mathrm{Var}_{srs}}$$
p = global proportion, p_i = proportion in each cluster, n = number of subjects, k = number of clusters
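As a numerical illustration of the ratio of cluster variance to SRS variance, the sketch below uses made-up cluster proportions. It assumes clusters of equal size, so that the global proportion is the plain mean of the cluster proportions; the data and the function name are hypothetical.

```python
def design_effect(cluster_props, n):
    """Ratio of cluster variance to SRS variance for an estimated proportion."""
    k = len(cluster_props)
    p = sum(cluster_props) / k          # global proportion, assuming equal cluster sizes
    var_srs = p * (1 - p) / n
    var_clust = sum((p_i - p) ** 2 for p_i in cluster_props) / (k * (k - 1))
    return var_clust / var_srs

# Hypothetical data: 5 clusters of 20 subjects each (n = 100).
props = [0.10, 0.20, 0.30, 0.40, 0.50]
print(round(design_effect(props, n=100), 2))
```

Here the cluster variance is 0.10/20 = 0.005 and the SRS variance is 0.21/100 = 0.0021, giving a design effect of about 2.4.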
Design Effect
= Actual sample size (SS) / Effective SS, e.g. cluster sampling SS / SRS SS
$$\text{Design effect} = 1 + (m - 1)\rho$$
m = cluster size, k = no. of clusters, ρ = intracluster correlation coefficient
The Intracluster Correlation Coefficient (ICC, ρ, rho)
data; by comparing the variance within clusters with the variance between clusters.
variability divided by the sum of the within- cluster and between-cluster variabilities.
$$\mathrm{ICC}\ (\rho) = \frac{S_b^2}{S_b^2 + S_w^2}$$
Effective Sample Size
each (total 128 patients.) Given ρ = 0.017, what is the effective sample size (ESS) after adjusting for clustering?
If m = cluster size = 32, k = no. of cluster = 4, ρ = 0.017
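One way to work this example through is to apply design effect = 1 + (m − 1)ρ and divide the actual sample size by it. The function names below are hypothetical; the numbers come from the example.

```python
def deff_from_icc(m, rho):
    """Design effect for clusters of size m with intracluster correlation rho."""
    return 1 + (m - 1) * rho

def effective_sample_size(total_n, m, rho):
    """Actual sample size divided by the design effect."""
    return total_n / deff_from_icc(m, rho)

deff = deff_from_icc(m=32, rho=0.017)
ess = effective_sample_size(total_n=128, m=32, rho=0.017)
print(f"design effect = {deff:.3f}, effective sample size = {ess:.1f}")
# design effect = 1.527, effective sample size = 83.8
```

Under these inputs the 128 clustered patients carry roughly the information of 84 independently sampled patients.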
Multistage Sampling
– Several chained samples
– Several statistical units
– No complete listing of population required
– Most feasible approach for large populations
– Several sampling lists
– Sampling error difficult to measure
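The idea of chained samples can be sketched with a nested structure in which each stage is sampled only inside the units chosen at the previous stage. Everything here is hypothetical: the four-level hierarchy (regions, provinces, schools, children), the frame sizes, the per-stage sample sizes and the function name. Simple random sampling is assumed at every stage.

```python
import random

def multistage_sample(country, n_per_stage, seed=None):
    """Walk region -> province -> school -> [children], sampling at each stage."""
    rng = random.Random(seed)
    n_regions, n_provinces, n_schools, n_children = n_per_stage
    chosen = []
    for region in rng.sample(sorted(country), n_regions):
        provinces = country[region]
        for province in rng.sample(sorted(provinces), n_provinces):
            schools = provinces[province]
            for school in rng.sample(sorted(schools), n_schools):
                # A list of children is needed only for the selected schools.
                chosen.extend(rng.sample(schools[school], n_children))
    return chosen

# Hypothetical frame: 3 regions x 3 provinces x 4 schools x 30 children.
country = {
    f"R{r}": {
        f"R{r}P{p}": {
            f"R{r}P{p}S{s}": [f"R{r}P{p}S{s}-child{c}" for c in range(30)]
            for s in range(4)
        }
        for p in range(3)
    }
    for r in range(3)
}
sample = multistage_sample(country, n_per_stage=(2, 2, 2, 5), seed=11)
print(len(sample))  # 40
```

Only the selected units need a sampling list at the next stage, which is why no complete listing of the population is required.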
Example: Multistage Sampling
among school children in a country
– Sample of regions drawn from country
– Sample of provinces drawn from each selected region
– Sample of schools drawn in each selected province
– Sample children within selected schools
"A prudent question is one half of wisdom."
~ Francis Bacon ~
Any questions? :-)
"The rain is famous for falling
but if I had the management
I would rain softly and sweetly
but if I caught a sample of the unjust outdoors I would drown him"
~ Mark Twain ~