
Political Science 209 - Fall 2018: Uncertainty (Florian Hollenbach)



  1. Political Science 209 - Fall 2018 Uncertainty Florian Hollenbach 2nd December 2018

  2. Statistical Inference. Goal: estimate something unobservable from observable data. What we want to estimate: the parameter θ (unobservable). What we do observe: the data. We use the data to compute an estimate θ̂ of the parameter.

  4. Parameters and Estimators. • parameter: the quantity that we are interested in • estimator: a method for computing an estimate of the parameter of interest

  6. Parameters and Estimators. Example: • parameter: support for Jimbo Fisher in the student population • estimator: the sample proportion of supporters

  7. Parameters and Estimators. Example: • parameter: average causal effect of aspirin on headache • estimator: the difference in means between treatment and control groups

  8. Quality of estimators. For the rest of the semester the question becomes: How good is our estimator? 1. How close, in expectation, is the estimator to the truth? 2. How certain or uncertain are we about the estimate?

  10. Quality of estimators. How good is θ̂ as an estimate of θ? Ideally, we want to know the estimation error = θ̂ − θ_truth. But we can never calculate this. Why? Because θ_truth is unknown: if we knew the truth, we would not need an estimate.

  12. Quality of estimators. Instead, we consider two hypothetical scenarios: 1. How well would θ̂ perform over repeated data generating processes? (bias) 2. How well would θ̂ perform as the sample size goes to infinity? (consistency)

  13. Bias. • Imagine the estimate being a random variable itself • Draw infinitely many samples of students and ask about Jimbo. What is the average of the sample average? Or, what is the expectation of the estimator? bias = E(estimation error) = E(estimate − truth) = E(X̄) − p = p − p = 0
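The repeated-sampling argument above can be checked with a small simulation (a sketch, not part of the slides; the true support p = 0.55 and sample size N = 1500 are hypothetical values):

```python
import random

random.seed(42)

p, N = 0.55, 1500      # hypothetical true support and sample size
num_samples = 2000     # number of hypothetical repeated samples

# Draw many samples and record the sample proportion (X-bar) of each
sample_means = []
for _ in range(num_samples):
    supporters = sum(1 for _ in range(N) if random.random() < p)
    sample_means.append(supporters / N)

# Unbiasedness: the average of the sample averages should be close to p
avg_of_means = sum(sample_means) / num_samples
print(round(avg_of_means, 2))
```

Over the simulated samples, the mean of X̄ lands essentially on p, even though any single X̄ typically misses it.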

  14. Bias - Important. That an estimator is unbiased does not mean it is always exactly correct! To remember: bias measures whether, in expectation (on average), the estimator gives us the truth.

  16. Consistency. Essentially, this says that the law of large numbers applies to the estimator: an estimator is said to be consistent if it converges to the parameter (the truth) as N goes to ∞.
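Consistency can be illustrated numerically: as the sample size grows, the sample proportion settles near the truth. A minimal sketch (p = 0.55 is a hypothetical value):

```python
import random

random.seed(0)
p = 0.55  # hypothetical true support

def sample_proportion(n):
    """Sample proportion of supporters in one sample of size n."""
    return sum(1 for _ in range(n) if random.random() < p) / n

# LLN in action: larger samples give estimates closer to p
for n in (100, 10_000, 1_000_000):
    print(n, round(sample_proportion(n), 3))
```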

  17. Variability. Next, we have to consider how certain we are about our results. Consider two estimators: 1. slightly biased: on average off by a bit, but always by the same margin 2. unbiased, but misses the target left and right

  18. Variability. [Figure: bias vs. variance illustration, from the Encyclopedia of Machine Learning]

  19. Variability. We characterize the variability of an estimator by the standard deviation of its sampling distribution. How do we find that? Remember, the sampling distribution is the distribution of our statistic over hypothetical, infinitely many samples.
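The standard deviation of the sampling distribution can itself be simulated and compared with the analytic value √(p(1 − p)/N) derived later in the deck. A sketch with hypothetical p and N:

```python
import random
import statistics

random.seed(1)
p, N = 0.55, 1500   # hypothetical truth and sample size

# Simulate the sampling distribution of the sample proportion
means = [sum(1 for _ in range(N) if random.random() < p) / N
         for _ in range(2000)]

sd_simulated = statistics.stdev(means)    # SD of the sampling distribution
sd_analytic = (p * (1 - p) / N) ** 0.5    # sqrt(p(1-p)/N)
print(round(sd_simulated, 4), round(sd_analytic, 4))
```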

  21. Variability. [Figure omitted]

  22. Standard Error. We estimate the standard deviation of the sampling distribution from the observed data: the standard error. The standard error "describes the (estimated) average degree to which an estimator deviates from its expected value" (Imai 2017).

  24. Polling Example. Say we took a sample of 1500 students and asked whether they support Jimbo or not. Define a random variable X_i = 1 if student i supports Jimbo, X_i = 0 if not. The number of supporters follows a Binomial distribution with success probability p and size N, where p is the proportion of all students who support Jimbo (the population distribution).

  26. Polling Example. Estimator: the sample proportion X̄ = (1/N) ∑_{i=1}^N X_i. In earlier notation: θ_truth = p and θ̂ = X̄.

  29. Polling Example. Estimator: X̄ = (1/N) ∑_{i=1}^N X_i. 1. LLN: X̄ → p (consistent) 2. Expectation: E(X̄) = p (unbiased) 3. Standard error?

  30. Polling Example - standard error. The X_i are i.i.d. Bernoulli random variables with success probability p, so: V(X̄) = V((1/N) ∑_{i=1}^N X_i) = (1/N²) V(∑_{i=1}^N X_i) = (1/N²) ∑_{i=1}^N V(X_i) = (1/N²) × N × V(X_i) = p(1 − p)/N

  33. Polling Example - standard error. V(X̄) = p(1 − p)/N. Standard error: √V(X̄). But we don't know p! Now what? We use our unbiased estimate of p: X̄.

  35. Polling Example - standard error estimate. Estimated standard error: SE = √( X̄(1 − X̄)/N )

  36. Polling Example - standard error estimate. Assume in our sample 55% of students support Jimbo: SE = √(0.55 × (1 − 0.55)/1500) = √(0.55 × 0.45/1500) = 0.013. We can expect our estimate on average to be off by 1.3 percentage points. If X̄ = 0.8, then SE = 0.010. If N = 500 and X̄ = 0.55, then SE = 0.022.
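The standard errors quoted above follow directly from the formula SE = √(X̄(1 − X̄)/N); a quick check in code:

```python
def se_hat(x_bar, n):
    """Estimated standard error of a sample proportion."""
    return (x_bar * (1 - x_bar) / n) ** 0.5

print(round(se_hat(0.55, 1500), 3))  # 0.013, the main example
print(round(se_hat(0.80, 1500), 3))  # 0.010
print(round(se_hat(0.55, 500), 3))   # 0.022
```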

  38. Standard error estimate. The standard error is based on the variance of the sampling distribution and gives an estimate of our uncertainty. Each estimator/statistic has its own sampling distribution, e.g. the difference in means.

  39. Confidence Intervals. Often we don't even know the sampling distribution of our estimator. How could we approximate it? The central limit theorem!

  41. Confidence Intervals. The central limit theorem says: X̄ ≈ N(E(X), V(X)/N), regardless of the distribution of X.

  42. Confidence Intervals. We can use the approximation to the sampling distribution, X̄ ≈ N(E(X), V(X)/N), to construct confidence intervals. Confidence intervals give a range of values that is likely to contain the true value. To start, we select a probability value for our confidence level: usually 95%.

  44. Confidence Intervals. The 95% confidence interval specifies the range of values in which the true parameter will fall for 95% of our hypothetical samples/experiments. Put differently: "Over a hypothetically repeated data generating process, confidence intervals contain the true value of parameter with the probability specified by the confidence level" (Imai 2017).
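This repeated-sampling interpretation can be verified numerically: build a 95% interval from each of many simulated samples and count how often it covers the truth (a sketch; p = 0.55 and N = 1500 are hypothetical values):

```python
import random

random.seed(7)
p, N, z = 0.55, 1500, 1.96   # hypothetical truth, sample size, 95% critical value

reps, covered = 1000, 0
for _ in range(reps):
    x_bar = sum(1 for _ in range(N) if random.random() < p) / N
    se = (x_bar * (1 - x_bar) / N) ** 0.5       # estimated standard error
    if x_bar - z * se <= p <= x_bar + z * se:   # does the CI cover the truth?
        covered += 1

print(covered / reps)  # should be close to 0.95
```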

  46. Confidence interval (1 − α), large sample. The confidence interval is defined as: CI(α) = [X̄ − z_{α/2} × SE, X̄ + z_{α/2} × SE], where z_{α/2} is the critical value, equal to the (1 − α/2) quantile of the standard normal distribution.
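The critical value z_{α/2} can be obtained from the standard normal quantile function in Python's standard library (statistics.NormalDist, Python 3.8+); applied to the polling example's numbers:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2) quantile, about 1.96

# 95% CI for the polling example (X-bar = 0.55, N = 1500)
x_bar, n = 0.55, 1500
se = (x_bar * (1 - x_bar) / n) ** 0.5
ci_low, ci_high = x_bar - z * se, x_bar + z * se
print(round(ci_low, 3), round(ci_high, 3))  # roughly 0.525 to 0.575
```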

  47. Confidence interval. Where do the critical values come from?
