Statistical Foundations: Sampling
17 February 2020 Modern Research Methods
[Figure: The single experiment: Population, Question, Hypothesis, Experimenter, Data, Analyst, Code, Estimate, Claim]
REPRODUCE = Get the same result from the same dataset.
(Patil, Peng, & Leek, 2019)
REPLICATE = Get the same result with a new dataset.
[Figure: the single-experiment pipeline (Population, Question, Hypothesis, Experimenter, Data, Analyst, Code, Estimate, Claim) for the Original study, a Reproduction, and a Replication]
* Sometimes people are sloppy with these terms and use them interchangeably.
[Figure: Low nameability vs. High nameability]
High Nameability Condition = 75%
Low Nameability Condition = 69%
(There are other measures of the center and dispersion of a distribution, but the mean and the variance are the ones we focus on here.)
[Figure: histograms of six distributions (x-axis: x; y-axis: count)]
Distribution A: Mean = 0, Low variance
Distribution B: Mean = 5, Low variance
Distribution C: Mean = 0, V. variance
Distribution D: Mean = 3, Low variance
Distribution E: Mean = 0, High variance
Distribution F: Mean = 2, High variance
(Thanks to Danielle Navarro, LSR https://learningstatisticswithr.com/)
Variance is the average squared deviation from the mean of a dataset. Standard deviation is the square root of variance.
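A minimal Python sketch of these two definitions, using a small hypothetical dataset. The `variance` function divides by the number of values n, matching "average squared deviation from the mean".

```python
import math

def mean(xs):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

def variance(xs):
    """Average squared deviation from the mean (divides by n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def standard_deviation(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical toy dataset
print(mean(data))                # 5.0
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```

Note that R's `var()` and Python's `statistics.variance` divide by n - 1 instead (the sample variance), which would give 32/7, about 4.57, for the same data; `statistics.pvariance` divides by n like the function above.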
Population
e.g., all people who speak English (1.5 billion), or maybe all people at UW-Madison (44k)
Sample: Zettersten and Lupyan only tested 50 participants.
looks like (and we usually don’t). Challenge: Make (good) inferences about the population from the sample.
[Figure: Population (histogram; y-axis: count)]
[Figure: Sample (histogram; y-axis: count)]
Use the mean of the sample to estimate the mean of the population.
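The idea can be sketched as a small simulation. The population here is hypothetical: 100,000 scores between 0 and 1 drawn from a Beta(7, 3) distribution (mean 0.7), an arbitrary choice made only for illustration. We draw one sample of N = 50 and use its mean as the estimate of the population mean.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: 100,000 scores between 0 and 1.
# Beta(7, 3) is an arbitrary illustrative choice (mean 0.7).
population = [random.betavariate(7, 3) for _ in range(100_000)]
population_mean = sum(population) / len(population)

# Draw one sample of N = 50 scores without replacement.
sample = random.sample(population, 50)
sample_mean = sum(sample) / len(sample)

print(f"population mean:      {population_mean:.3f}")
print(f"sample mean (N = 50): {sample_mean:.3f}")
```

In a real study we only ever see `sample`; the simulation lets us peek at `population_mean` to see how close the estimate lands.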
[Figure: five samples (panels 1 to 5), N = 50 vs. N = a lot (histograms; y-axis: count)]
[Figure: histogram of sample means (about 0.70 to 0.76; y-axis: count)]
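A sketch of the repeated-sampling idea, assuming a hypothetical Beta(7, 3) population (mean 0.7): draw many samples, record each sample's mean, and look at how spread out those means are. N = 500 is an arbitrary stand-in for "a lot", and the helper `sample_means` is hypothetical, written just for this example.

```python
import random
import statistics

random.seed(2)

# Hypothetical population of 100,000 scores (Beta(7, 3), mean 0.7).
population = [random.betavariate(7, 3) for _ in range(100_000)]

def sample_means(pop, n, n_samples):
    """Draw n_samples samples of size n and return the mean of each one."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(n_samples)]

means_50 = sample_means(population, 50, 1000)
means_500 = sample_means(population, 500, 1000)

# The spread of the sample means (the standard error) shrinks as N grows,
# roughly in proportion to 1 / sqrt(N).
print(f"N = 50:  mean of means {statistics.fmean(means_50):.3f}, "
      f"SD {statistics.stdev(means_50):.4f}")
print(f"N = 500: mean of means {statistics.fmean(means_500):.3f}, "
      f"SD {statistics.stdev(means_500):.4f}")
```

Both sets of means center on the population mean; the larger samples just scatter less around it.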
Zorbia Population IQ (N = 97, Mean = 29)
[Figure: distribution of Zorbia IQ scores (x-axis: Zorbia IQ)]

IQ     17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Count   2  1  2  0  2  4  3  4  8  6  4  6  8  8  7  4  9  5  3  6  2  1  1  1
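As a check, the Zorbia scores listed above can be rebuilt from their counts and summarized with the definitions of mean, variance, and standard deviation from earlier (dividing by N, since the 97 scores are treated as the whole population). The variable names are illustrative.

```python
import math

# Number of Zorbians with each IQ score, taken from the scores listed above.
zorbia_counts = {
    17: 2, 18: 1, 19: 2, 21: 2, 22: 4, 23: 3, 24: 4, 25: 8, 26: 6,
    27: 4, 28: 6, 29: 8, 30: 8, 31: 7, 32: 4, 33: 9, 34: 5, 35: 3,
    36: 6, 37: 2, 38: 1, 39: 1, 40: 1,
}
# Expand the counts back into one score per person.
zorbia = [iq for iq, count in zorbia_counts.items() for _ in range(count)]

n = len(zorbia)
mu = sum(zorbia) / n                                    # population mean
variance = sum((x - mu) ** 2 for x in zorbia) / n       # population variance
sigma = math.sqrt(variance)                             # population SD

print(n)                # 97
print(mu)               # 29.0
print(round(sigma, 2))  # 5.09
```

The reconstructed N and mean match the slide (N = 97, Mean = 29).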
come from different populations
Explore this Shiny app: https://gallery.shinyapps.io/CLT_mean/
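The app lets you vary the population shape and the sample size. A rough Python analogue, assuming an exponential population (mean 1, SD 1, strongly right-skewed) chosen only because it is far from normal: draw 5,000 samples of size 30, take each mean, and check that the means behave approximately like a normal distribution centered on the population mean with standard deviation SIGMA / sqrt(n), as the Central Limit Theorem predicts.

```python
import math
import random
import statistics

random.seed(3)

# Hypothetical skewed population: exponential with rate 1 (mean 1, SD 1).
MU, SIGMA = 1.0, 1.0
n = 30            # size of each sample
n_samples = 5000  # how many samples we draw

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(n_samples)]

# If the means are approximately normal with mean MU and SD SIGMA / sqrt(n),
# about 95% of them should fall within 1.96 standard errors of MU.
se = SIGMA / math.sqrt(n)
within = sum(abs(m - MU) <= 1.96 * se for m in means) / n_samples

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"SD of sample means:   {statistics.stdev(means):.3f} (theory: {se:.3f})")
print(f"share within +/- 1.96 SE: {within:.3f}")
```

Raising `n` makes the means cluster more tightly and look more normal, which mirrors what the app shows.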