2.6 Statistical Inference ECON 480 Econometrics Fall 2020 Ryan - PowerPoint PPT Presentation

2.6 — Statistical Inference ECON 480 • Econometrics • Fall 2020 Ryan Safner Assistant Professor of Economics  safner@hood.edu  ryansafner/metricsF20  metricsF20.classes.ryansafner.com

Outline
  Why Uncertainty Matters
  Confidence Intervals
  Confidence Intervals Using the infer Package
  Hypothesis Testing
  Digression: p-Values and the Philosophy of Science

Why Uncertainty Matters

Recall: The Two Big Problems with Data
We use econometrics to identify causal relationships and make inferences about them.
1. Problem for identification: endogeneity
   $X$ is exogenous if $cor(x, u) = 0$
   $X$ is endogenous if $cor(x, u) \neq 0$
2. Problem for inference: randomness
   Data is random due to natural sampling variation
   Taking one sample of a population will yield slightly different information than another sample of the same population

Distributions of the OLS Estimators
OLS estimators ($\hat{\beta}_0$ and $\hat{\beta}_1$) are computed from a finite (specific) sample of data.
Our OLS model contains 2 sources of randomness:
  Modeled randomness: $u$ includes all factors affecting $Y$ other than $X$; different samples will have different values of those other factors ($u_i$)
  Sampling randomness: different samples will generate different OLS estimators
Thus, $\hat{\beta}_0, \hat{\beta}_1$ are also random variables, with their own sampling distribution.

The Two Problems: Where We're Heading...Ultimately
$\text{Sample} \underbrace{\rightarrow}_{\text{statistical inference}} \text{Population} \underbrace{\rightarrow}_{\text{causal identification}} \text{Unobserved Parameters}$
We want to identify causal relationships between population variables
  Logically first thing to consider
  Endogeneity problem
We'll use sample statistics to infer something about population parameters
  In practice, we'll only ever have a finite sample distribution of data
  We don't know the population distribution of data
  Randomness problem

Why Sample vs. Population Matters
Population
Population relationship: $Y_i = 3.24 + 0.44 X_i + u_i$, that is, $Y_i = \beta_0 + \beta_1 X_i + u_i$

Why Sample vs. Population Matters
Sample 1: 30 random individuals
Population relationship: $Y_i = 3.24 + 0.44 X_i + u_i$
Sample relationship: $\hat{Y}_i = 3.19 + 0.47 X_i$

Why Sample vs. Population Matters
Let's repeat this process 10,000 times!
This exercise is called a (Monte Carlo) simulation.
I'll show you how to do this next class with the infer package.

Why Sample vs. Population Matters
On average, estimated regression lines from our hypothetical samples provide an unbiased estimate of the true population regression line: $E[\hat{\beta}_1] = \beta_1$
However, any individual line (any one sample) can miss the mark.
This leads to uncertainty about our estimated regression line.
Remember, we only have one sample in reality!
This is why we care about the standard error of our line: $se(\hat{\beta}_1)$!
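The repeated-sampling exercise described above can be sketched in base R. This is only an illustration under stated assumptions: the slides give the population relationship $Y_i = 3.24 + 0.44 X_i + u_i$ and a sample size of 30, but the distributions of $X$ and $u$ used here (uniform and standard normal) are made up for the example, as are the names `draw_slope` and `slopes`.

```r
set.seed(480)

beta_0 <- 3.24   # population intercept (from the slides)
beta_1 <- 0.44   # population slope (from the slides)
n      <- 30     # individuals per sample (from the slides)
reps   <- 10000  # number of hypothetical samples

# Draw one sample and return its OLS slope.
# ASSUMPTION: X ~ Uniform(0, 10) and u ~ Normal(0, 1); the slides do not say.
draw_slope <- function() {
  x <- runif(n, min = 0, max = 10)
  u <- rnorm(n, mean = 0, sd = 1)
  y <- beta_0 + beta_1 * x + u
  cov(x, y) / var(x)  # OLS slope for a one-regressor model
}

slopes <- replicate(reps, draw_slope())

mean(slopes)  # close to beta_1: the estimator is unbiased on average
sd(slopes)    # spread of the sampling distribution, i.e. the standard error
```

Any single element of `slopes` can land noticeably above or below 0.44, which is the uncertainty that $se(\hat{\beta}_1)$ is meant to summarize.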

Confidence Intervals

Statistical Inference
$\text{Sample} \xrightarrow{\text{statistical inference}} \text{Population} \xrightarrow{\text{causal identification}} \text{Unobserved Parameters}$
So what we naturally want to start doing is inferring what the true population regression model is, using our estimated regression model from our sample:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X \xrightarrow{\text{hopefully}} Y_i = \beta_0 + \beta_1 X + u_i$
We can't yet make causal inferences about whether/how $X$ causes $Y$ (coming after the midterm!)

Estimation and Statistical Inference
Our problem with uncertainty is we don't know whether our sample estimate is close or far from the unknown population parameter.
But we can use our errors to learn how well our model statistics likely estimate the true parameters.
Use $\hat{\beta}_1$ and its standard error, $se(\hat{\beta}_1)$, for statistical inference about the true $\beta_1$.
We have two options...

Estimation and Statistical Inference
Point estimate: Use our $\hat{\beta}_1$ and $se(\hat{\beta}_1)$ to determine whether we have statistically significant evidence to reject a hypothesized $\beta_1$
Confidence interval: Use $\hat{\beta}_1$ and $se(\hat{\beta}_1)$ to create a range of values that gives us a good chance of capturing the true $\beta_1$

Accuracy vs. Precision More typical in econometrics to do hypothesis testing (next class)

Generating Confidence Intervals
We can generate our confidence interval by generating a "bootstrap" sampling distribution.
This takes our sample data, and resamples it by selecting random observations with replacement.
This allows us to approximate the sampling distribution of $\hat{\beta}_1$ by simulation!
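To make the resampling step concrete, here is a minimal base-R sketch of a bootstrap for a regression slope. It does not use the course data: the data frame `dat` is simulated purely for illustration, and the helper name `boot_slope` is hypothetical. The infer package, introduced next, wraps the same idea in a tidy pipeline.

```r
set.seed(480)

# ASSUMPTION: a made-up sample of 30 observations standing in for real data.
n   <- 30
x   <- runif(n, 0, 10)
dat <- data.frame(x = x, y = 3.24 + 0.44 * x + rnorm(n))

# One bootstrap replicate: resample rows WITH replacement (same size as the
# original sample), refit the regression, and keep the slope.
boot_slope <- function(d) {
  rows <- sample(nrow(d), size = nrow(d), replace = TRUE)
  coef(lm(y ~ x, data = d[rows, ]))[["x"]]
}

# Repeat 1,000 times to approximate the sampling distribution of the slope.
boot_slopes <- replicate(1000, boot_slope(dat))

# Percentile interval: the middle 95% of the bootstrap slopes.
ci <- quantile(boot_slopes, probs = c(0.025, 0.975))
ci
```

Because rows are drawn with replacement, some observations appear several times in a replicate and others not at all, which is what makes each bootstrap "sample" differ from the original.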

Confidence Intervals Using the infer Package

Confidence Intervals Using the infer Package
The infer package allows you to do statistical inference in a tidy way, following the philosophy of the tidyverse.

    # install first!
    install.packages("infer")

    # load
    library(infer)

Confidence Intervals with the infer Package I
infer allows you to run through these steps manually to understand the process:
1. specify() a model
2. generate() a bootstrap distribution
3. calculate() the confidence interval
4. visualize() with a histogram (optional)

Confidence Intervals with the infer Package II

Bootstrapping

Our Sample
    term         estimate    std.error
    (Intercept)  698.932952  9.4674914
    str           -2.279808  0.4798256

Another "Sample" (bootstrapped from our sample)
    term         estimate    std.error
    (Intercept)  708.270835  9.5041448
    str           -2.797334  0.4802065

Now we want to do this 1,000 times to simulate the unknown sampling distribution of $\hat{\beta}_1$.

The infer Pipeline: Specify
Take our data and pipe it into the specify() function, which is essentially a lm() function for regression (for our purposes).
Pipeline so far: Specify

    data %>% specify(y ~ x)

    CASchool %>%
      specify(testscr ~ str)

    testscr  str
    690.80   17.88991
    661.20   21.52466
    643.60   18.69723
    647.70   17.35714
    640.85   18.67133
    (5 rows)

The infer Pipeline: Generate
Now the magic starts, as we run a number of simulated samples.
Set the number of reps and set type to "bootstrap".
Pipeline so far: Specify, Generate

    %>% generate(reps = n, type = "bootstrap")

    CASchool %>%
      specify(testscr ~ str) %>%
      generate(reps = 1000, type = "bootstrap")

The infer Pipeline: Generate
Pipeline so far: Specify, Generate

    %>% generate(reps = n, type = "bootstrap")

    replicate  testscr  str
    1          642.20   19.22221
    1          664.15   19.93548
    1          671.60   20.34927
    1          640.90   19.59016
    1          677.25   19.34853
    1          672.20   20.20000
    1          621.40   22.61905
    1          657.00   20.86808
    1          664.95   25.80000
    1          635.20   17.75499
    (first 10 rows shown)

replicate: the "sample" number (1-1000)
creates x and y values (data points)

The infer Pipeline: Calculate
For each of the 1,000 replicates, calculate the slope in lm(testscr ~ str).
Calls it the stat.
Pipeline so far: Specify, Generate, Calculate

    %>% calculate(stat = "slope")

    CASchool %>%
      specify(testscr ~ str) %>%
      generate(reps = 1000, type = "bootstrap") %>%
      calculate(stat = "slope")

The infer Pipeline: Calculate
Pipeline so far: Specify, Generate, Calculate

    %>% calculate(stat = "slope")

    replicate  stat
    1          -3.0370939
    2          -2.2228021
    3          -2.6601745
    4          -3.5696240
    5          -2.0007488
    6          -2.0979764
    7          -1.9015875
    8          -2.5362338
    9          -2.3061820
    10         -1.9369460
    (first 10 of 1,000 rows shown)

The infer Pipeline: Calculate
Pipeline so far: Specify, Generate, Calculate

    # save this
    boot <- CASchool %>%
      specify(testscr ~ str) %>%
      generate(reps = 1000, type = "bootstrap") %>%
      calculate(stat = "slope")

boot is (our simulated) sampling distribution of $\hat{\beta}_1$!
We can now use this to estimate the confidence interval from our $\hat{\beta}_1 = -2.28$
And visualize it

Confidence Interval
A 95% confidence interval is the middle 95% of the sampling distribution.

    sampling_dist <- ggplot(data = boot)+
      aes(x = stat)+
      geom_histogram(color = "white", fill = "#e64173")+
      labs(x = expression(hat(beta[1])))+
      theme_pander(base_family = "Fira Sans Condens...",
                   base_size = 20)
    sampling_dist

    lower      upper
    -3.340545  -1.238815
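As a rough cross-check on the bootstrap interval, one can also build an interval directly from the point estimate and its standard error. The sketch below is not from the slides: it assumes the usual large-sample normal approximation, $\hat{\beta}_1 \pm z_{0.975} \, se(\hat{\beta}_1)$, and plugs in the str estimate and standard error reported for our sample. The variable names are illustrative.

```r
# Estimate and standard error for str, as reported for our sample
beta_1_hat <- -2.279808
se_beta_1  <- 0.4798256

# ASSUMPTION: normal approximation to the sampling distribution of the slope
z <- qnorm(0.975)  # about 1.96

ci_normal <- beta_1_hat + c(-1, 1) * z * se_beta_1
ci_normal  # roughly (-3.22, -1.34)
```

This lands close to, but not exactly on, the bootstrap percentile interval of (-3.34, -1.24). The two approaches rest on different assumptions, so some disagreement is expected.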
