Statistical Foundations: Sampling 17 February 2020 Modern Research - - PowerPoint PPT Presentation



SLIDE 1

Statistical Foundations: Sampling

17 February 2020 Modern Research Methods

SLIDE 2

The Single Experiment

[Diagram: Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim]

SLIDE 3

Overview of course

1) Philosophy of Cumulative Science
2) The Single Experiment: Experimental data, tools in R for working with data and plotting data, reproducibility
3) Repeating an Experiment: Intro to statistical concepts, replication of experiments
4) Aggregating Many Experiments: Meta-analysis

SLIDE 4

REPRODUCE = Get same result from same dataset.

REPLICATE = Get same result with a new dataset.

(Patil, Peng, & Leek, 2019)

[Diagram: Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim; columns: Original, Reproduction, Replication; legend: Different, Original]

* Sometimes people are sloppy with these terms and use them interchangeably.

SLIDE 5

Low nameability High nameability

SLIDE 6

Replication vs. Reproduction

[Diagram: Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim; columns: Original, Reproduction, Replication; two positions labeled [You]]

SLIDE 7

Replicating Zettersten and Lupyan (2020)

Original vs. Replication [You]

High Nameability Condition = 75%
Low Nameability Condition = 69%

Should you expect to replicate the original finding? Did you replicate it? What would convince you? Discuss with a partner.

SLIDE 8

High Nameability Condition = 75% Low Nameability Condition = 69%

?= ?=

In order to evaluate this replication, we need to think about sampling. In the next few classes, we’re going to discuss sampling in order to reason about the replicability of psychological effects.

SLIDE 9

Reading for today

SLIDE 10

Distributions

Distributions = counts of a variable

Plot with histograms

Two measures:

  • Mean measures center (“central tendency”)
  • Variance measures dispersion.

(There are other measures of the center and dispersion of a distribution, but these are the measures we’re going to focus on here)

[Figure: histogram of Distribution A; x-axis from −5 to 5; y-axis: count]
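
As a minimal sketch of "distributions = counts of a variable", the snippet below tallies how often each value occurs and prints a rough text histogram. The `values` list is made-up illustration data, not data from the course.

```python
from collections import Counter

# Hypothetical observations of some variable (illustration only).
values = [-1, 0, 0, 1, 0, -1, 2, 0, 1, -2, 0, 1]

# A distribution is just the count of each observed value.
counts = Counter(values)

# Print one row per value, with one '#' per observation.
for value in sorted(counts):
    print(f"{value:>3} | {'#' * counts[value]}")
```

The tallest row marks the most common value; how widely the rows spread out from the center is what variance summarizes.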

SLIDE 11

[Figure: histograms of Distributions A through F; x-axis: x, from −5 to 5; y-axis: count]

What is the mean of these distributions? Which ones have low vs. high variance?

  • Mean = 0, low variance
  • Mean = 5, low variance
  • Mean = 0, very high variance
  • Mean = 3, low variance
  • Mean = 0, high variance
  • Mean = 2, high variance
SLIDE 12

Calculating mean

(Thanks to Danielle Navarro, LSR https://learningstatisticswithr.com/)
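
The worked content of this slide is not preserved in the transcript. As a reminder, under the standard definition (the notation is an assumption and may differ from the slide's), the mean of N observations X_1, ..., X_N is their sum divided by N:

```latex
\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i
```

For example, for the three made-up observations 2, 4, and 9, the mean is (2 + 4 + 9) / 3 = 5.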

SLIDE 13

Calculating variance

Variance is the average squared deviation from the mean of a dataset. Standard deviation is the square root of variance.
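
The definition above can be sketched step by step in Python. The `scores` list is made-up illustration data. The sketch divides by N, matching "average squared deviation"; note that `statistics.variance` instead divides by N - 1 (the sample variance), so it returns a slightly larger number.

```python
import math
import statistics

# Hypothetical dataset (illustration only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean.
mean = sum(scores) / len(scores)

# Step 2: squared deviation of each score from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Step 3: variance is the average of those squared deviations.
variance = sum(squared_deviations) / len(scores)

# Step 4: standard deviation is the square root of the variance.
sd = math.sqrt(variance)

print(f"mean = {mean}, variance = {variance}, sd = {sd}")

# Cross-check against the standard library's divide-by-N versions.
assert math.isclose(variance, statistics.pvariance(scores))
assert math.isclose(sd, statistics.pstdev(scores))
```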

SLIDE 14

Our goal as scientists

  • As scientists, we want to estimate parameters about the world.
  • One of the most common parameters is the mean.
  • For example: What is the mean accuracy in the high nameability condition? What is the mean accuracy in the low nameability condition? (Zettersten & Lupyan, 2020)
  • Are the two means different from each other?
  • As psychologists we’re interested in the population of ALL PEOPLE if they had done our experiment.
  • But, to save time and effort, we only measure a sample.

SLIDE 15

Population vs. sample

  • A sample is a random subset of the population.
  • That means there are really two distributions.
  • Population: The distribution of all people (7.53 billion), or maybe all people who speak English (1.5 billion), or maybe all people at UW-Madison (44k)
  • Sample: Zettersten and Lupyan only tested 50 participants.
  • Unlike the Zorbia example, we don’t know what the population looks like (and we usually don’t).

Challenge: Make (good) inferences about the population from the sample.
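
To make the two distributions concrete, here is a minimal sketch that builds a simulated population and draws one random sample of 50 from it. The population (100,000 accuracy scores from a Beta(6, 2) distribution, mean about 0.75) is purely an assumption for illustration; in a real study the population is unknown, which is the whole challenge.

```python
import random

rng = random.Random(2020)  # fixed seed so the sketch is repeatable

# Hypothetical population of accuracy scores (proportion right, 0 to 1).
population = [rng.betavariate(6, 2) for _ in range(100_000)]
population_mean = sum(population) / len(population)

# A sample is a random subset of the population; here N = 50.
sample = rng.sample(population, 50)
sample_mean = sum(sample) / len(sample)

print(f"population mean:  {population_mean:.3f}")
print(f"sample mean:      {sample_mean:.3f}")
print(f"estimation error: {sample_mean - population_mean:+.3f}")
```

The sample mean lands near the population mean but not exactly on it; changing the seed gives a different sample and a different error.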

SLIDE 16

[Figure: two histograms; x-axis: Prop. Right; y-axis: count. Population: N = a lot. Sample: N = 50.]

Use mean of sample to estimate mean of population.

SLIDE 17

[Figure: histograms of five samples (panels 1 to 5) drawn from the population; x-axis: Prop. Right; y-axis: count]

SLIDE 18

Sampling distribution of the mean

[Figure: histogram of the sample means (x-axis: Prop. Right, from 0.70 to 0.76; y-axis: count), shown with the histograms of the five samples (panels 1 to 5)]

SLIDE 19

Two things to know about the sampling distribution of the mean

  • 1. The mean of the sampling distribution is the same as the mean of the population.
  • 2. The variance of the sampling distribution of means gets smaller as the sample size increases. (i.e. we get better at estimating the population with more data)
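
Both facts can be checked by simulation. The sketch below draws many samples at three sample sizes and summarizes the resulting sample means. The population is again a made-up Beta(6, 2) distribution of accuracy scores, and the helper name `sample_means` and the number of repetitions are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(17)  # fixed seed so the sketch is repeatable

# Hypothetical population of accuracy scores (illustration only).
population = [rng.betavariate(6, 2) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_means(sample_size, n_samples=1_000):
    """Draw many random samples and return the mean of each one."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


mean_by_n = {}
variance_by_n = {}
for sample_size in (5, 50, 500):
    means = sample_means(sample_size)
    mean_by_n[sample_size] = statistics.fmean(means)
    variance_by_n[sample_size] = statistics.pvariance(means)
    print(
        f"n = {sample_size:>3}: "
        f"mean of sample means = {mean_by_n[sample_size]:.3f}, "
        f"variance of sample means = {variance_by_n[sample_size]:.6f}"
    )
print(f"population mean = {population_mean:.3f}")
```

At every sample size the sample means center on the population mean (fact 1), while their variance shrinks by roughly a factor of ten each time the sample size grows tenfold (fact 2).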

SLIDE 20

What’s the mean IQ of Zorbia?

Zorbia Population IQ

[Figure: plot of the Zorbia population, one labeled IQ value per person; x-axis: Zorbia IQ]

IQ values: 17 17 18 19 19 22 22 22 22 21 21 24 24 24 24 23 23 23 26 26 26 26 26 26 25 25 25 25 25 25 25 25 28 28 28 28 28 28 27 27 27 27 30 30 30 30 30 30 30 30 29 29 29 29 29 29 29 29 32 32 32 32 31 31 31 31 31 31 31 34 34 34 34 34 33 33 33 33 33 33 33 33 33 36 36 36 36 36 36 35 35 35 37 37 38 40 39

N = 97, Mean = 29
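
Because the whole Zorbia population is listed, its mean can be computed directly rather than estimated. The sketch below stores the listed IQ values as value-to-count pairs (tallied from the values on this slide; the name `ZORBIA_COUNTS` is just a label for this example) and recovers N and the mean.

```python
# Number of Zorbians at each IQ value, tallied from the slide.
ZORBIA_COUNTS = {
    17: 2, 18: 1, 19: 2, 21: 2, 22: 4, 23: 3, 24: 4, 25: 8,
    26: 6, 27: 4, 28: 6, 29: 8, 30: 8, 31: 7, 32: 4, 33: 9,
    34: 5, 35: 3, 36: 6, 37: 2, 38: 1, 39: 1, 40: 1,
}

# Expand the counts back into one IQ value per person.
population = [iq for iq, count in ZORBIA_COUNTS.items() for _ in range(count)]

n_people = len(population)
population_mean = sum(population) / n_people

print(f"N = {n_people}, Mean = {population_mean}")
```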

SLIDE 21

In class simulation

What can we learn from sampling the population? In groups of ~5:

  • 1. Cut the people of Zorbia out.
  • 2. Put them in the envelope.
  • 3. Each person in the group should take a sample of three.
  • 4. Calculate the average.
  • 5. Write it on a sticky note, and add it to the class plot.
  • 6. Do steps 3-5 once more.
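
The classroom activity can also be sketched in code. The function name `class_simulation`, the class size of 30, and the seed are assumptions for illustration; the Zorbia counts are tallied from the population slide. Each simulated student draws three Zorbians from a full envelope, averages them, and does this twice.

```python
import random

# Number of Zorbians at each IQ value, tallied from the population slide.
ZORBIA_COUNTS = {
    17: 2, 18: 1, 19: 2, 21: 2, 22: 4, 23: 3, 24: 4, 25: 8,
    26: 6, 27: 4, 28: 6, 29: 8, 30: 8, 31: 7, 32: 4, 33: 9,
    34: 5, 35: 3, 36: 6, 37: 2, 38: 1, 39: 1, 40: 1,
}
population = [iq for iq, count in ZORBIA_COUNTS.items() for _ in range(count)]


def class_simulation(n_students=30, draws_each=2, sample_size=3, seed=0):
    """Return one sample mean ("sticky note") per draw in the class."""
    rng = random.Random(seed)
    sticky_notes = []
    for _ in range(n_students * draws_each):
        # Step 3: take a sample of three from the full envelope.
        sample = rng.sample(population, sample_size)
        # Steps 4-5: calculate the average and add it to the class plot.
        sticky_notes.append(sum(sample) / sample_size)
    return sticky_notes


notes = class_simulation()
print(f"{len(notes)} sticky notes, from {min(notes):.1f} to {max(notes):.1f}")
print(f"mean of the sample means: {sum(notes) / len(notes):.2f}")
```

Individual notes scatter widely because each rests on only three people, but their average sits close to the population mean of 29.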
SLIDE 22

Key points from Zorbia Simulation

  • More samples give a better estimate of the population mean
  • Two samples from the same population will tend to have somewhat different means
  • Conversely, two different sample means do NOT mean that the samples come from different populations
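
The last two points can be illustrated with a short sketch that repeatedly draws two samples of three from the same Zorbia population and compares their means. The helper name `mean_difference`, the 2-point threshold, and the seed are assumptions for illustration; the counts are tallied from the population slide.

```python
import random

# Number of Zorbians at each IQ value, tallied from the population slide.
ZORBIA_COUNTS = {
    17: 2, 18: 1, 19: 2, 21: 2, 22: 4, 23: 3, 24: 4, 25: 8,
    26: 6, 27: 4, 28: 6, 29: 8, 30: 8, 31: 7, 32: 4, 33: 9,
    34: 5, 35: 3, 36: 6, 37: 2, 38: 1, 39: 1, 40: 1,
}
population = [iq for iq, count in ZORBIA_COUNTS.items() for _ in range(count)]


def mean_difference(rng, sample_size=3):
    """Draw two samples from the SAME population and return the
    difference between their means."""
    first = rng.sample(population, sample_size)
    second = rng.sample(population, sample_size)
    return sum(first) / sample_size - sum(second) / sample_size


rng = random.Random(1)
differences = [mean_difference(rng) for _ in range(1_000)]

# Share of pairs whose means differ by at least 2 IQ points.
share_large = sum(abs(d) >= 2 for d in differences) / len(differences)
print(f"pairs whose means differ by 2+ IQ points: {share_large:.0%}")
```

Every pair here comes from one population, yet sizeable gaps between the two means are common, so an observed difference between two sample means is not, on its own, evidence of two different populations.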

SLIDE 23

Next Time: Distributions and probability

Explore this Shiny app: https://gallery.shinyapps.io/CLT_mean/

SLIDE 24

Acknowledgements

  • Slides 12-13 have content adapted from Danielle Navarro, Learning Statistics with R (https://learningstatisticswithr.com/)