Statistical Foundations: Sampling 17 February 2020 Modern Research - PowerPoint PPT Presentation

Statistical Foundations: Sampling 17 February 2020 Modern Research Methods

The Single Experiment
[Diagram: stages of a single experiment: Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim]

Overview of course
1) Philosophy of Cumulative Science
2) The Single Experiment – Experimental data, tools in R for working with data and plotting data, reproducibility
3) Repeating an Experiment – Intro to statistical concepts, replication of experiments
4) Aggregating Many Experiments – Meta-analysis

[Diagram: columns Original, Reproduction, Replication; rows Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim; cells are marked "Original" or "Different"]
REPRODUCE = Get same result from same dataset.
REPLICATE = Get same result with a new dataset.
* Sometimes people are sloppy with these terms and use them interchangeably.
(Patil, Peng, & Leek, 2019)

High nameability Low nameability

Replication vs. Reproduction
[Diagram: columns Original, Reproduction, Replication; rows Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim; some cells are marked "[You]"]

Replicating Zettersten and Lupyan (2020)
[Table: columns Original and Replication [You]]
High Nameability Condition = 75%
Low Nameability Condition = 69%
Should you expect to replicate the original finding? Did you replicate it? What would convince you? Discuss with a partner.

High Nameability Condition = 75% ?= Low Nameability Condition = 69%
In order to evaluate this replication, we need to think about sampling. In the next few classes, we're going to discuss sampling in order to reason about the replicability of psychological effects.

Reading for today

Distributions
Distributions = counts of a variable. Plot with histograms.
[Figure: histogram of Distribution A; x axis from −5 to 5, y axis = count]
Two measures:
• Mean measures center ("central tendency").
• Variance measures dispersion.
(There are other measures of the center and dispersion of a distribution, but these are the measures we're going to focus on here.)

What is the mean of these distributions? Which ones have low vs. high variance?
[Figure: histograms of Distributions A through F; x axis from −5 to 5, y axis = count]
• Distribution A: Mean = 0, Low variance
• Distribution B: Mean = 5, Low variance
• Distribution C: Mean = 0, V. High variance
• Distribution D: Mean = 3, Low variance
• Distribution E: Mean = 0, High variance
• Distribution F: Mean = 2, High variance

Calculating mean (Thanks to Danielle Navarro, LSR https://learningstatisticswithr.com/)

Calculating variance Variance is the average squared deviation from the mean of a dataset. Standard deviation is the square root of variance.
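The two definitions above can be checked by hand on a small dataset. The sketch below uses five made-up accuracy scores (hypothetical values, not data from any study) and computes the mean, the variance as the average squared deviation, and the standard deviation as its square root. It uses Python rather than the R tools named in the course overview, purely for illustration.

```python
from statistics import mean, pvariance, pstdev

# Hypothetical proportion-correct scores for five participants (made up for illustration)
scores = [0.60, 0.70, 0.75, 0.80, 0.90]

# Mean: sum of the values divided by how many there are
m = mean(scores)

# Variance as defined above: the average squared deviation from the mean
var_by_hand = sum((x - m) ** 2 for x in scores) / len(scores)

# Standard deviation: the square root of the variance
sd_by_hand = var_by_hand ** 0.5

# The library functions pvariance/pstdev use the same divide-by-N definition
print(m, var_by_hand, sd_by_hand)
print(pvariance(scores), pstdev(scores))
```

Note that this divides by N, matching the definition on the slide. Many statistics tools instead divide by N − 1 when the data are a sample (for example `statistics.variance` in Python), so results can differ slightly.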

Our goal as scientists
• As scientists, we want to estimate parameters about the world.
• One of the most common parameters is the mean.
• For example: What is the mean accuracy in the high nameability condition? What is the mean accuracy in the low nameability condition? (Zettersten & Lupyan, 2020)
• Are the two means different from each other?
• As psychologists we're interested in the population of ALL PEOPLE if they had done our experiment.
• But, to save time and effort, we only measure a sample.

Population vs. sample
• A sample is a random subset of the population.
• That means there are really two distributions.
• Population: The distribution of all people (7.53 billion), or maybe all people who speak English (1.5 billion), or maybe all people at UW-Madison (44k)
• Sample: Zettersten and Lupyan only tested 50 participants.
• Unlike the Zorbia example, we don't know what the population looks like (and we usually don't).
Challenge: Make (good) inferences about the population from the sample.

Population: N = a lot
Sample: N = 50
Use mean of sample to estimate mean of population.
[Figure: histograms of Prop. Right for the population (top) and for one sample (bottom); y axis = count]

Population and Sample
[Figure: histograms of Prop. Right for Samples 1 through 5; y axis = count]

Sampling distribution of the mean
[Figure: histograms of Prop. Right for Samples 1 through 5 (top) and a histogram of the sample means, with Prop. Right from 0.70 to 0.76 (bottom); y axis = count]

Two things to know about the sampling distribution of the mean 1. The mean of the sampling distribution is the same as the mean of the population. 2. The variance of the sampling distribution of means gets smaller as the sample size increases. (i.e. we get better at estimating the population with more data)
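Stated as formulas, the two facts above read as follows. This is a sketch under added assumptions: the n observations in each sample are drawn independently from a population with mean μ and variance σ². These symbols do not appear on the slides.

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[\bar{X}] = \mu, \qquad
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}
```

Under these assumptions, quadrupling the sample size cuts the variance of the sample mean to a quarter, which is one way to read "we get better at estimating the population with more data."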

What's the mean IQ of Zorbia?
Zorbia Population IQ: N = 97, Mean = 29
[Figure: stacked histogram showing each individual IQ score in the Zorbia population; x axis = Zorbia IQ, y axis = count]

In class simulation
What can we learn from sampling the population? In groups of ~5:
1. Cut the people of Zorbia out.
2. Put them in the envelope.
3. Each person in the group should take a sample of three.
4. Calculate the average.
5. Write it on a sticky note, and add it to the class plot.
6. Do steps 3-5 once more.

Key points from Zorbia Simulation
• More samples give a better estimate of the population mean
• Two samples from the same population will tend to have somewhat different means
• Conversely, two different sample means do NOT mean that the samples come from different populations
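The envelope exercise can also be run as a rough computer simulation. The sketch below rests on stated assumptions: the population is 97 made-up scores standing in for Zorbia (not the actual values from the handout), and `sample_means` is a hypothetical helper name. It draws many samples of three, compares two of them, and then repeats with a larger sample size to show the spread of the sample means shrinking.

```python
import random
from statistics import mean, pvariance

rng = random.Random(2020)  # fixed seed so the run is repeatable

# ASSUMPTION: 97 made-up IQ scores standing in for Zorbia (not the handout's values)
population = [rng.randint(17, 40) for _ in range(97)]
pop_mean = mean(population)

def sample_means(n, draws=2000):
    """Return the means of `draws` random samples of size n (drawn without replacement)."""
    return [mean(rng.sample(population, n)) for _ in range(draws)]

means_3 = sample_means(3)    # like the in-class exercise: samples of three
means_30 = sample_means(30)  # a larger sample size for comparison

# Two samples from the same population usually give somewhat different means
print("two sample means (n=3):", means_3[0], means_3[1])

# The sample means center on the population mean
print("population mean:", pop_mean, "mean of sample means:", mean(means_3))

# Their spread shrinks as the sample size grows
print("variance of means, n=3:", pvariance(means_3))
print("variance of means, n=30:", pvariance(means_30))
```

Plotting `means_3` as a histogram would give the computer version of the class sticky-note plot.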

Next Time: Distributions and probability Explore this Shiny app: https://gallery.shinyapps.io/CLT_mean/

Acknowledgements • Slides 12-13 have content adapted from Danielle Navarro, Learning Statistics with R (https://learningstatisticswithr.com/)
