Welcome to the course! Mine Cetinkaya-Rundel, Associate Professor of the Practice, Duke University



SLIDE 1

DataCamp Inference for Numerical Data in R

Welcome to the course!

INFERENCE FOR NUMERICAL DATA IN R

Mine Cetinkaya-Rundel

Associate Professor of the Practice, Duke University

SLIDE 2

Rent in Manhattan

On a given day, twenty 1 BR apartments were randomly selected on Craigslist Manhattan from apartments listed as "by owner" (as opposed to by a rental agency).

Is the mean or the median a better measure of typical rent in Manhattan?
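A small sketch can make the mean-versus-median question concrete. The rents in it are hypothetical values chosen for illustration, not the twenty Craigslist listings from the course, and the variable name `rents` is an assumption.

```r
# Hypothetical monthly rents in dollars (illustrative values, NOT the course data).
# One listing is far more expensive than the rest, as often happens with rent data.
rents <- c(1800, 1950, 2100, 2200, 2350, 2400, 2600, 2900, 3500, 7000)

mean_rent   <- mean(rents)    # 2880: pulled upward by the single 7000 listing
median_rent <- median(rents)  # 2375: midpoint of the two middle values, barely affected

mean_rent
median_rent
```

In this made-up sample the mean exceeds eight of the ten rents, while the median stays near the middle of the data. This is the usual argument for preferring the median when a distribution is right-skewed.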

SLIDE 3

Bootstrapping techniques

  • Assume the data is representative
  • Pulling oneself up by one's bootstraps

SLIDE 4

Observed sample

sample median = $2,350

SLIDE 5

Bootstrap population

SLIDE 6

Bootstrapping scheme

  1. Take a bootstrap sample: a random sample taken with replacement from the original sample, of the same size as the original sample.
  2. Calculate the bootstrap statistic: a statistic such as mean, median, proportion, etc. computed on the bootstrap samples.
  3. Repeat steps (1) and (2) many times to create a bootstrap distribution: a distribution of bootstrap statistics.
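The three steps above can be sketched in base R without any packages. This is a minimal illustration, not the course's own code: the data are hypothetical, and the seed, the number of repetitions, and the choice of the median as the statistic are assumptions.

```r
set.seed(123)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical observed sample (illustrative rents in dollars, not the course data)
original_sample <- c(1800, 1950, 2100, 2200, 2350, 2400, 2600, 2900, 3500, 7000)
n_reps <- 1000  # "many times"; the exact number is an assumption

boot_medians <- replicate(n_reps, {
  # Step 1: draw a bootstrap sample with replacement, same size as the original
  boot_sample <- sample(original_sample,
                        size = length(original_sample),
                        replace = TRUE)
  # Step 2: compute the bootstrap statistic on that sample
  median(boot_sample)
})

# Step 3: the repeated statistics together form the bootstrap distribution
length(boot_medians)
summary(boot_medians)
```

Every bootstrap median necessarily lies between the smallest and largest observed values, because resampling never produces values outside the original sample.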

SLIDE 7

Bootstrapping scheme, in R

library(infer)

___ %>%                        # start with data frame
  specify(response = ___)      # specify the variable of interest

SLIDE 8

Bootstrapping scheme, in R

library(infer)

___ %>%                                        # start with data frame
  specify(response = ___) %>%                  # specify the variable of interest
  generate(reps = ___, type = "bootstrap")     # generate bootstrap samples

SLIDE 9

Bootstrapping scheme, in R

library(infer)

___ %>%                                          # start with data frame
  specify(response = ___) %>%                    # specify the variable of interest
  generate(reps = ___, type = "bootstrap") %>%   # generate bootstrap samples
  calculate(stat = "___")                        # calculate bootstrap statistic
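As one possible way to fill in the blanks of the template, the sketch below runs the full pipeline on a small made-up data frame. The data frame name `manhattan`, the column name `rent`, the values, the seed, and the 1000 repetitions are all assumptions made for illustration; it also assumes the infer package is installed.

```r
library(infer)

set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical data frame; `manhattan` and `rent` are assumed names,
# and the values are illustrative, not the course data
manhattan <- data.frame(
  rent = c(1800, 1950, 2100, 2200, 2350, 2400, 2600, 2900, 3500, 7000)
)

rent_med_boot <- manhattan %>%                   # start with data frame
  specify(response = rent) %>%                   # specify the variable of interest
  generate(reps = 1000, type = "bootstrap") %>%  # generate bootstrap samples
  calculate(stat = "median")                     # calculate bootstrap statistic

# One row per bootstrap sample, with the statistic in the `stat` column
head(rent_med_boot)
```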

SLIDE 10

Constructing the bootstrap interval

library(infer)

___ %>%                                          # start with data frame
  specify(response = ___) %>%                    # specify the variable of interest
  generate(reps = ___, type = "bootstrap") %>%   # generate bootstrap samples
  calculate(stat = "___")                        # calculate bootstrap statistic

SLIDE 11

Constructing the bootstrap interval

library(infer)

___ %>%                                          # start with data frame
  specify(response = ___) %>%                    # specify the variable of interest
  generate(reps = ___, type = "bootstrap") %>%   # generate bootstrap samples
  calculate(stat = "___")                        # calculate bootstrap statistic

SLIDE 12

Let's practice!

INFERENCE FOR NUMERICAL DATA IN R

SLIDE 13

Review: Percentile and standard error methods

INFERENCE FOR NUMERICAL DATA IN R

Mine Cetinkaya-Rundel

Associate Professor of the Practice, Duke University

SLIDE 14

Bootstrap distribution

SLIDE 15

Percentile method

SLIDE 16

Percentile method
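The percentile method is commonly implemented by taking the middle portion of the bootstrap distribution as the interval. The sketch below does this in base R for a 95% interval; the confidence level, the hypothetical data, the seed, and the number of repetitions are assumptions, not values taken from the slides.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical sample (illustrative rents, not the course data)
rents <- c(1800, 1950, 2100, 2200, 2350, 2400, 2600, 2900, 3500, 7000)

# Bootstrap distribution of the median (1000 repetitions is an assumption)
boot_medians <- replicate(1000, median(sample(rents, replace = TRUE)))

# 95% percentile interval: cut off 2.5% of the distribution in each tail
ci_percentile <- quantile(boot_medians, probs = c(0.025, 0.975))
ci_percentile
```

For a different confidence level, only the `probs` argument changes, for example `c(0.05, 0.95)` for a 90% interval.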

SLIDE 17

Standard error method

$\text{sample statistic} \pm t^*_{df = n - 1} \times SE_{boot}$

  • df for $t^*$ is $n - 1$, where $n$ is the sample size
  • $SE_{boot}$ is the standard deviation of the bootstrap distribution
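The standard error method can be sketched in base R as follows: the critical value comes from a t distribution with n − 1 degrees of freedom, and the standard error is the standard deviation of the bootstrap distribution. The 95% level, the hypothetical data, the seed, and the number of repetitions are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical sample (illustrative rents, not the course data)
rents <- c(1800, 1950, 2100, 2200, 2350, 2400, 2600, 2900, 3500, 7000)
n <- length(rents)

# Bootstrap distribution of the median (1000 repetitions is an assumption)
boot_medians <- replicate(1000, median(sample(rents, replace = TRUE)))

obs_stat <- median(rents)         # sample statistic
se_boot  <- sd(boot_medians)      # SE = standard deviation of the bootstrap distribution
t_star   <- qt(0.975, df = n - 1) # critical value for a 95% interval, df = n - 1

# sample statistic +/- t* x SE
ci_se <- obs_stat + c(-1, 1) * t_star * se_boot
ci_se
```

Unlike the percentile interval, this interval is symmetric around the observed statistic by construction.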

SLIDE 18

Let's practice!

INFERENCE FOR NUMERICAL DATA IN R

SLIDE 19

Re-centering a bootstrap distribution for hypothesis testing

INFERENCE FOR NUMERICAL DATA IN R

Mine Cetinkaya-Rundel

Associate Professor of the Practice, Duke University

SLIDE 20

Re-centering a bootstrap distribution for hypothesis testing

Bootstrap distributions are by design centered at the observed sample statistic. However, since in a hypothesis test we assume that H0 is true, we shift the bootstrap distribution to be centered at the null value.

p-value = the proportion of simulations that yield a sample statistic at least as favorable to the alternative hypothesis as the observed sample statistic.
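The re-centering idea can be sketched in base R by shifting every bootstrap statistic by the difference between the null value and the observed statistic. Everything specific here is an assumption for illustration: the hypothetical data, the null value of 2000, the one-sided "greater than" alternative, the seed, and the number of repetitions.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical sample (illustrative rents, not the course data)
rents <- c(1800, 1950, 2100, 2200, 2350, 2400, 2600, 2900, 3500, 7000)

null_value <- 2000           # hypothetical null value for the median
obs_stat   <- median(rents)  # observed sample statistic

# Bootstrap distribution, centered (by design) near the observed statistic
boot_medians <- replicate(1000, median(sample(rents, replace = TRUE)))

# Shift the distribution so that it is centered at the null value instead
boot_recentered <- boot_medians + (null_value - obs_stat)

# Assumed alternative: the true median is GREATER than the null value, so
# "at least as favorable to the alternative" means >= the observed statistic
p_value <- mean(boot_recentered >= obs_stat)
p_value
```

For a "less than" alternative the comparison flips to `<=`. A two-sided test needs a convention for combining both tails, which this sketch does not cover.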

SLIDE 21

Re-centering the bootstrap distribution - sketch

SLIDE 22

Let's practice!

INFERENCE FOR NUMERICAL DATA IN R