Parameters and confidence intervals: Foundations of Inference (PowerPoint PPT Presentation)


SLIDE 1

Parameters and confidence intervals

FOUNDATIONS OF INFERENCE

Jo Hardin

Instructor

SLIDE 2

Research questions

Hypothesis test: Under which diet plan will participants lose more weight on average?
Confidence interval: How much should participants expect to lose on average?

Hypothesis test: Which of two car manufacturers are users more likely to recommend to their friends?
Confidence interval: What percent of users are likely to recommend Subaru to their friends?

Hypothesis test: Are education level and average income linearly related?
Confidence interval: For each additional year of education, what is the predicted average income?

SLIDE 3

Parameter

A parameter is a numerical value that describes the population. Examples (continued):

The true average amount all dieters will lose on a particular program
The proportion of individuals in a population who recommend Subaru cars
The average income of all individuals in the population with a particular education level
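To make the parameter/statistic distinction concrete, here is a minimal base-R sketch with a simulated population. The population size, the 25% recommendation rate, and the variable names are hypothetical, chosen only for illustration; in a real study the population proportion is unknown.

```r
# Hypothetical population of 10,000 people; TRUE = recommends Subaru
set.seed(1)
population <- sample(c(TRUE, FALSE), size = 10000, replace = TRUE,
                     prob = c(0.25, 0.75))

# Parameter: the proportion in the whole population (unknown in practice)
p <- mean(population)

# Statistic: the proportion in one random sample of 100 people
one_sample <- sample(population, size = 100)
p_hat <- mean(one_sample)

c(parameter = p, statistic = p_hat)
```

Rerunning the last block with a different seed changes `p_hat` but never `p`, which is the sense in which the parameter is fixed and the statistic varies.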

SLIDE 4

Confidence interval

A range of numbers that (hopefully) captures the true parameter

"95% confident that between 12% and 34% of the entire population recommends Subarus"

SLIDE 5

Let's practice!

SLIDE 6

Bootstrapping

SLIDE 7

Hypothesis testing

How do samples from the null population vary?

Statistic, proportion of successes in sample → p-hat
Parameter, proportion of successes in population → p

SLIDE 8

Confidence intervals

No null population, unlike in hypothesis testing
How does p-hat vary around p?

SLIDES 9-17

SLIDE 18

Polling

# Original data
Source: local data frame [30 x 3]

   flip_num  flip
      <int> <chr>
1         1     H
2         2     H
3         3     H
4         4     T
5         5     H
6         6     H
# ... with 24 more rows

Original data

                 Candidate X   Total voters   Proportion X
Original data             17             30         0.5667

SLIDE 19

Polling

# First resample
Source: local data frame [30 x 3]

   replicate flip_num  flip
       <dbl>    <int> <chr>
1          1        7     H
2          1       17     T
3          1       13     H
4          1       14     H
5          1       24     H
6          1       28     T
# ... with 24 more rows

First resample

                 Candidate X   Total voters   Proportion X
Original data             17             30         0.5667
First resample            14             30         0.4667

SLIDE 20

Polling

# Second resample
Source: local data frame [30 x 3]

   replicate flip_num  flip
       <dbl>    <int> <chr>
1          2       21     H
2          2       19     T
3          2       25     H
4          2       24     T
5          2       21     H
6          2       28     T
7          2       13     H
8          2       23     H
9          2       24     T
10         2       24     T
# ... with 20 more rows

Second resample

                 Candidate X   Total voters   Proportion X
Original data             17             30         0.5667
First resample            14             30         0.4667
Second resample           18             30         0.6

SLIDE 21

Polling

# Third resample
Source: local data frame [30 x 3]

   replicate flip_num  flip
       <dbl>    <int> <chr>
1          3        6     H
2          3       19     H
3          3        1     H
4          3       24     T
5          3       11     H
6          3       28     T
7          3       16     H
8          3       13     H
9          3       21     T
10         3       29     H
# ... with 20 more rows

Third resample

                 Candidate X   Total voters   Proportion X
Original data             17             30         0.5667
First resample            14             30         0.4667
Second resample           18             30         0.6
Third resample            12             30         0.4
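The resampling shown on these slides can be sketched in base R (this is not the course's own code). The vote vector is hypothetical: it only matches the slide's totals (17 votes for candidate X out of 30), not its row order, and "H" marks a vote for X as in the `flip` column. The helper name `resample_prop` is also made up for this sketch.

```r
# Hypothetical poll matching the slide's totals: 17 of 30 votes for candidate X
# ("H" = vote for X, as in the flip column); the row order is made up
votes <- c(rep("H", 17), rep("T", 13))
mean(votes == "H")   # original proportion, 17/30

# One bootstrap resample: draw 30 votes with replacement, return the proportion for X
resample_prop <- function(x) {
  resample <- sample(x, size = length(x), replace = TRUE)
  mean(resample == "H")
}

# Three resamples, analogous to the first, second, and third resamples above
set.seed(2024)
three_props <- replicate(3, resample_prop(votes))
three_props
```

Because each resample draws with replacement, some original votes appear several times and others not at all, which is why the proportion for X moves around 17/30.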

SLIDE 22

Standard error

Obtained a standard error of 0.09 by resampling many times
The standard error describes how the statistic varies around the parameter
The bootstrap provides an approximation of the standard error
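A sketch of the "resample many times" step, again assuming a hypothetical vote vector with 17 of 30 votes for candidate X: the standard deviation of many bootstrap proportions approximates the standard error. For comparison it also evaluates the textbook formula sqrt(p-hat(1 - p-hat)/n), which is standard background rather than something shown on this slide; both should land near the 0.09 quoted above.

```r
set.seed(42)
votes <- c(rep("H", 17), rep("T", 13))   # hypothetical poll: 17 of 30 for X
n <- length(votes)

# 1000 bootstrap proportions: resample n votes with replacement each time
boot_props <- replicate(1000, mean(sample(votes, n, replace = TRUE) == "H"))

# Bootstrap standard error = SD of the resampled proportions
se_boot <- sd(boot_props)

# Formula-based standard error for a proportion, for comparison
p_hat <- mean(votes == "H")
se_formula <- sqrt(p_hat * (1 - p_hat) / n)

round(c(bootstrap = se_boot, formula = se_formula), 3)
```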

SLIDE 23

Variability of p-hat from the population

library(dplyr)

# Compute p-hat for each poll
ex1_props <- recommend %>%
  group_by(poll) %>%
  summarize(prop_yes = mean(vote == "yes"))

# Variability of p-hat
ex1_props %>%
  summarize(sd(prop_yes))

# A tibble: 1 × 1
  `sd(prop_yes)`
           <dbl>
1     0.08523512

SLIDE 24

Variability of p-hat from the sample (bootstrapping)

library(dplyr)
library(infer)

# Select one poll from which to resample
one_poll <- all_polls %>%
  filter(poll == 1) %>%
  select(vote)

# Compute p-hat for each resampled poll
# (calculate() turns each bootstrap replicate into a proportion in the stat column)
ex2_props <- one_poll %>%
  specify(response = vote, success = "yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop")

# Variability of p-hat
ex2_props %>%
  summarize(sd(stat))

# A tibble: 1 × 1
  `sd(stat)`
       <dbl>
1 0.08691885
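The point of these two slides is that the SD of p-hat across many polls drawn from the population (0.085) is close to the SD of p-hat across bootstrap resamples of a single poll (0.087). A base-R sketch of that comparison with a simulated population follows; the population proportion (0.7), the poll size (30), and the 1000 repetitions are hypothetical choices, not the course's `all_polls` data.

```r
set.seed(7)
p_true <- 0.7     # hypothetical population proportion voting "yes"
n <- 30           # voters per poll (hypothetical)

# From the population: 1000 independent polls, one p-hat per poll
pop_props <- rbinom(1000, size = n, prob = p_true) / n
sd_population <- sd(pop_props)

# From one sample: take a single poll, then bootstrap it 1000 times
one_poll <- rbinom(n, size = 1, prob = p_true)          # 1 = "yes"
boot_props <- replicate(1000, mean(sample(one_poll, n, replace = TRUE)))
sd_bootstrap <- sd(boot_props)

round(c(population = sd_population, bootstrap = sd_bootstrap), 3)
```

The bootstrap value depends on how close the single poll's p-hat happens to be to 0.7, so the two numbers agree only approximately, as on the slides.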

SLIDE 25

Let's practice!

SLIDE 26

Variability in p-hat

SLIDE 27

How far are the data from the parameter?

SLIDE 28

How far are the data from the parameter?

SLIDE 29

How far are the data from the parameter?

SLIDE 30

Standard error of p-hat
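For a sample proportion the standard error has a well-known closed form. The formula below is standard background rather than text recovered from this slide; plugging in the 17-of-30 poll from the polling slides gives a value consistent with the bootstrap estimate of about 0.09.

```latex
SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},
\qquad
\hat{p} = \frac{17}{30} \approx 0.567
\;\Rightarrow\;
\sqrt{\frac{0.567 \times 0.433}{30}} \approx 0.090
```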

SLIDE 31

Empirical rule

SLIDE 32

Empirical rule

SLIDE 33

Empirical rule
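The empirical rule says that for a bell-shaped distribution roughly 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean. A base-R sketch checks this against the normal curve and against simulated sample proportions; the population proportion (0.6) and poll size (100) are hypothetical.

```r
# Exact normal-curve areas within 1, 2, and 3 SDs of the mean
within_k <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
round(within_k, 3)   # 0.683 0.954 0.997

# Simulated check: p-hats from 10,000 hypothetical polls of 100 voters, p = 0.6
set.seed(11)
p <- 0.6
n <- 100
p_hats <- rbinom(10000, size = n, prob = p) / n
se <- sqrt(p * (1 - p) / n)

# Fraction of p-hats within 2 standard errors of p (roughly 0.95)
frac_within_2se <- mean(abs(p_hats - p) <= 2 * se)
frac_within_2se
```

This "within 2 SE" fact is what justifies the p-hat plus/minus 2 SE interval used later in the deck.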

SLIDE 34

Let's practice!

SLIDE 35

Interpreting CIs and technical conditions

SLIDE 36

Creating CIs

library(dplyr)

# Compare confidence intervals
# (p_hat is the observed sample proportion, computed beforehand;
#  it is 0.7 here, the midpoint of the interval below)
one_poll_boot %>%
  summarize(
    lower = p_hat - 2 * sd(prop_yes_boot),
    upper = p_hat + 2 * sd(prop_yes_boot))

# A tibble: 1 × 2
     lower    upper
     <dbl>    <dbl>
1 0.536148 0.863852

# Find the 2.5% and 97.5% quantiles of the bootstrapped p-hat values
one_poll_boot %>%
  summarize(
    q025_prop = quantile(prop_yes_boot, probs = .025),
    q975_prop = quantile(prop_yes_boot, probs = .975))

# A tibble: 1 × 2
  q025_prop q975_prop
      <dbl>     <dbl>
1 0.5333333 0.8333333
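The two intervals above can also be computed without `dplyr`. A base-R sketch, assuming a hypothetical poll of 30 votes with 21 "yes" (so p-hat = 0.7, the center of the t-style interval above); the exact endpoints will differ from the slide because the resamples are random.

```r
set.seed(123)
one_poll <- c(rep("yes", 21), rep("no", 9))   # hypothetical poll, p-hat = 0.7
p_hat <- mean(one_poll == "yes")

# 1000 bootstrap proportions (sample() defaults to size = length(one_poll))
prop_yes_boot <- replicate(1000, mean(sample(one_poll, replace = TRUE) == "yes"))

# Bootstrap t-style interval: p-hat plus/minus 2 bootstrap standard errors
t_ci <- p_hat + c(-2, 2) * sd(prop_yes_boot)

# Percentile interval: middle 95% of the bootstrap proportions
pct_ci <- quantile(prop_yes_boot, probs = c(0.025, 0.975))

rbind(t_ci = t_ci, percentile = unname(pct_ci))
```

The t-style interval is symmetric around p-hat by construction; the percentile interval simply follows the shape of the bootstrap distribution.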

SLIDE 37

Motivating CIs

The goal is to find the parameter when all we know is the statistic
We never know whether the interval built from the sample we collected actually captures the true parameter
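Because we never know whether one particular interval captured the parameter, the "95%" describes the method: across many samples, about 95% of intervals built this way capture it. A base-R simulation sketch, assuming a hypothetical population proportion of 0.6 and polls of 100 voters, and using the formula standard error instead of a bootstrap to keep it fast:

```r
set.seed(99)
p <- 0.6          # hypothetical true parameter (known only because we simulate)
n <- 100          # voters per poll
n_polls <- 2000

captured <- replicate(n_polls, {
  p_hat <- rbinom(1, size = n, prob = p) / n
  se <- sqrt(p_hat * (1 - p_hat) / n)      # estimated standard error
  (p_hat - 2 * se <= p) && (p <= p_hat + 2 * se)
})

# Proportion of intervals that captured p (close to 0.95)
coverage <- mean(captured)
coverage
```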

SLIDE 38

Interpreting the CIs

Bootstrap t-CI: (0.536, 0.864)
Percentile interval: (0.533, 0.833)

We are 95% confident that the true proportion of people planning to vote for candidate X is between 0.536 and 0.864 (or between 0.533 and 0.833, using the percentile interval)

SLIDE 39

Technical conditions

Sampling distribution of the statistic is reasonably symmetric and bell-shaped
Sample size is reasonably large
Variability of resampled proportions
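One way to see why these conditions matter: when the sample proportion is near 0 and n is small, the bootstrap distribution is skewed, so the symmetric p-hat plus/minus 2 SE interval can extend below 0, while the percentile interval cannot. A base-R sketch with a hypothetical poll of 2 "yes" out of 30:

```r
set.seed(5)
small_poll <- c(rep("yes", 2), rep("no", 28))   # hypothetical poll: p-hat = 2/30
p_hat <- mean(small_poll == "yes")

boot <- replicate(1000, mean(sample(small_poll, replace = TRUE) == "yes"))

# Bootstrap t-style interval: symmetric by construction, so it can dip below 0
t_ci <- p_hat + c(-2, 2) * sd(boot)

# Percentile interval: follows the skewed bootstrap distribution
pct_ci <- unname(quantile(boot, probs = c(0.025, 0.975)))

# How far each percentile endpoint sits from p-hat (unequal when skewed)
c(below = p_hat - pct_ci[1], above = pct_ci[2] - p_hat)
t_ci
```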

SLIDE 40

Let's practice!

SLIDE 41

Summary of statistical inference

SLIDE 42

Inference

SLIDE 43

Testing

H0: There is no gender discrimination in hiring
HA: Men are more likely to be promoted than women
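A hypothesis like this can be tested by comparing the observed difference in promotion rates with differences generated under H0, where the sex labels are exchangeable. A base-R permutation sketch follows; the counts (21 of 24 men and 14 of 24 women promoted) and the helper `diff_prop` are hypothetical, not data or code from these slides.

```r
set.seed(10)
# Hypothetical data: one entry per personnel file
sex <- rep(c("male", "female"), each = 24)
promoted <- c(rep(TRUE, 21), rep(FALSE, 3),    # men: 21 of 24 promoted
              rep(TRUE, 14), rep(FALSE, 10))   # women: 14 of 24 promoted

# Difference in promotion proportions, men minus women
diff_prop <- function(outcome, group) {
  mean(outcome[group == "male"]) - mean(outcome[group == "female"])
}
observed <- diff_prop(promoted, sex)

# Under H0 the labels are exchangeable: shuffle sex and recompute the difference
null_diffs <- replicate(1000, diff_prop(promoted, sample(sex)))

# One-sided p-value for HA (men more likely to be promoted)
p_value <- mean(null_diffs >= observed)
c(observed = observed, p_value = p_value)
```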

SLIDE 44

Estimation

What proportion of the voters will select candidate X?

SLIDE 45

Bootstrapping

SLIDE 46

Congratulations!