STAT 113 Sampling, Randomization and Confounding


SLIDE 1

Sampling Confounding Variables

STAT 113 Sampling, Randomization and Confounding

Colin Reimer Dawson

Oberlin College

September 6, 2019

SLIDE 2

Sampling and Inference: The “Big Picture”

SLIDE 3

Population, Samples, and Inference

Population: All potential cases that we are interested in saying something about

Sample: The set of cases we actually have data for (a subset of the population)

Statistical Inference: Using sample data to obtain information about the population

For inference to be effective, samples ought to be representative of the population.
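
A minimal sketch of the population / sample / inference relationship, assuming a made-up population of weekly study hours (the numbers, the seed, and the variable names are illustrative, not from the course):

```python
import random
import statistics

random.seed(113)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly study hours for 3000 students (made-up data).
population = [random.gauss(15, 5) for _ in range(3000)]

# Sample: the subset of cases we actually have data for.
sample = random.sample(population, 100)

# Inference: use the sample mean as an estimate of the population mean.
population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # what we can actually compute

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Because the sample here is drawn at random from the whole population, the sample mean should land close to the population mean; a non-representative sample carries no such expectation.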

SLIDE 4

Simple Random Sampling

  • To guard against sampling bias, we typically want to collect a random sample. Versions:

  • Simple Random Sampling
  • Stratified Sampling
  • Cluster Sampling
  • Systematic Sampling
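
The four versions listed above can be sketched as follows. This is an illustration under stated assumptions: the sampling frame, the class-year strata, and the groups of 40 used as clusters are all hypothetical.

```python
import random

random.seed(113)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 1200 students, 300 per class year (made-up).
years = ["first-year", "sophomore", "junior", "senior"]
frame = [(i, years[i % 4]) for i in range(1200)]
n = 120  # desired sample size

# Simple random sampling: every subset of size n is equally likely.
srs = random.sample(frame, n)

# Stratified sampling: a separate simple random sample within each stratum.
stratified = []
for year in years:
    stratum = [s for s in frame if s[1] == year]
    stratified.extend(random.sample(stratum, n // len(years)))

# Cluster sampling: split the frame into groups (hypothetical groups of 40),
# randomly choose whole groups, and keep everyone in the chosen groups.
clusters = [frame[k:k + 40] for k in range(0, len(frame), 40)]
cluster_sample = [s for c in random.sample(clusters, n // 40) for s in c]

# Systematic sampling: random starting point, then every step-th case.
step = len(frame) // n
start = random.randrange(step)
systematic = frame[start::step]

for name, smp in [("simple random", srs), ("stratified", stratified),
                  ("cluster", cluster_sample), ("systematic", systematic)]:
    print(f"{name:>13}: {len(smp)} cases")
```

All four draws return 120 cases here; they differ in which subsets of the frame are possible and how likely each one is.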

SLIDE 5

Feasibility of Random Sampling

It is often not feasible to get a truly random sample. Options:

  • Sample from a subset of the population; generalize conservatively
  • Collect a non-random sample, avoid bias related to variables of interest

SLIDE 6

Not all Non-Random Samples are Created Equal

You want to estimate the average hours per week that Oberlin students spend studying. None of the following is random; which would you go with?

  (a) Go to Mudd and use an RNG to select people to ask
  (b) Email every student and use all responses
  (c) Require all students in a random statistics class to respond
  (d) Go to the gym and ask everyone going in
  (e) Stand outside The Local and ask every fifth person entering

SLIDE 7

Not all Non-Random Samples are Created Equal

None of the above are representative in every way, but some are more obviously non-representative for the variable we care about.

Sampling Bias is a problem when the chance a case has of being selected is associated with one or more of the variables being collected.
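
A small simulation can make this definition concrete. It is a sketch under assumptions: the population of study hours is made up, and the biased scheme assumes a student's chance of being selected is proportional to hours studied (roughly what might happen when recruiting in a library).

```python
import random
import statistics

random.seed(113)  # fixed seed so the sketch is repeatable

# Hypothetical population of weekly study hours (made-up, clipped at zero).
population = [max(0.0, random.gauss(15, 5)) for _ in range(5000)]
population_mean = statistics.mean(population)

# Unbiased selection: every student has the same chance of being chosen.
srs = random.sample(population, 1000)
srs_mean = statistics.mean(srs)

# Biased selection: chance of being chosen is proportional to study hours,
# so selection is associated with the variable being collected.
# (random.choices samples with replacement, which is fine for this sketch.)
biased = random.choices(population, weights=population, k=1000)
biased_mean = statistics.mean(biased)

print(f"population mean:    {population_mean:.2f}")
print(f"random sample mean: {srs_mean:.2f}")
print(f"biased sample mean: {biased_mean:.2f}")
```

The biased sample overestimates average study hours because heavy studiers are over-represented; collecting more cases the same way would not fix it.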

SLIDE 8

“Don’t Ask Don’t Tell”

2010 CBS/NYT polls (when DADT was being reconsidered): “Do you favor or oppose [homosexuals / gay men and lesbians] serving openly in the military?”

Wording                   Favor   Oppose
“homosexuals”             44%     42%
“gay men and lesbians”    58%     28%

SLIDE 9

Non-Sampling Bias

Other sources of bias not due to the sampling procedure:

  • Question wording
  • Non-response bias
  • Context

SLIDE 10

Handout: Sell-Out Crowds and Home Team Wins
