Political Science 209 - Fall 2018 Uncertainty Florian Hollenbach 2nd December 2018

Statistical Inference
Goal: trying to estimate something unobservable from observable data
What we want to estimate: parameter $\theta$, which is unobservable
What you do observe: data
We use data to compute an estimate of the parameter: $\hat{\theta}$

Parameters and Estimators
• parameter: the quantity that we are interested in
• estimator: method to compute an estimate of the parameter of interest

Parameters and Estimators
Example:
• parameter: support for Jimbo Fisher in the student population
• estimator: sample proportion of support

Parameters and Estimators
Example:
• parameter: average causal effect of aspirin on headache
• estimator: difference in means between treatment and control
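
As a rough illustration of the aspirin example, the difference-in-means estimator could be sketched as follows. The outcome lists are hypothetical values invented for this sketch, and the function name `difference_in_means` is an assumption, not something from the slides.

```python
import statistics

# Hypothetical outcomes, invented for illustration only:
# 1 = headache gone after one hour, 0 = headache still present.
treatment = [1, 0, 1, 1, 0, 1]   # took aspirin
control = [0, 0, 1, 0, 1, 0]     # did not take aspirin


def difference_in_means(treated, untreated):
    """Estimator: mean outcome of treated units minus mean outcome of controls."""
    return statistics.mean(treated) - statistics.mean(untreated)


estimate = difference_in_means(treatment, control)
print(f"estimated average causal effect: {estimate:.3f}")
```

The number printed is the estimate; the average causal effect itself remains the unobserved parameter.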

Quality of estimators
For the rest of the semester the question becomes: How good is our estimator?
1. How close in expectation is the estimator to the truth?
2. How certain or uncertain are we about the estimate?

Quality of estimators
How good is $\hat{\theta}$ as an estimate of $\theta$?
• Ideally, we want to know: estimation error = $\hat{\theta} - \theta_{\text{truth}}$
But we can never calculate this. Why?
$\theta_{\text{truth}}$ is unknown
If we knew what the truth was, we would not need an estimate

Quality of estimators
Instead, we consider two hypothetical scenarios:
1. How well would $\hat{\theta}$ perform over a repeated data generating process? (bias)
2. How well would $\hat{\theta}$ perform as the sample size goes to infinity? (consistency)

Bias
• Imagine the estimate being a random variable itself
• Imagine drawing infinitely many samples of students and asking each about Jimbo
What is the average of the sample average? Or what is the expectation of the estimator?
bias = E(estimation error) = E(estimate − truth) = $E(\bar{X}) - p = p - p = 0$
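
The thought experiment of drawing many samples can be approximated by simulation. The sketch below is only illustrative: the true support `p_true = 0.55`, the sample size of 1500, and the 2000 repetitions are assumed values, and in a real poll p is unknown.

```python
import random
import statistics


def sample_proportion(p, n, rng):
    """Draw n Bernoulli(p) responses and return their sample mean."""
    return sum(rng.random() < p for _ in range(n)) / n


rng = random.Random(209)   # fixed seed so the run is reproducible
p_true = 0.55              # assumed truth; unknown in practice
n = 1500                   # assumed sample size
estimates = [sample_proportion(p_true, n, rng) for _ in range(2000)]

# Average of the sample averages minus the truth approximates the bias.
bias_estimate = statistics.mean(estimates) - p_true
print(f"average of the sample averages: {statistics.mean(estimates):.4f}")
print(f"simulated bias: {bias_estimate:+.4f}")
print(f"smallest / largest single estimate: {min(estimates):.3f} / {max(estimates):.3f}")
```

The simulated bias should be close to zero, while the individual estimates still differ from p from sample to sample.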

Bias - Important
Unbiasedness does not mean that an estimator is always exactly correct!
To remember: bias measures whether in expectation (on average) the estimator gives us the truth

Consistency
Essentially saying that the law of large numbers applies to the estimator, i.e.:
An estimator is said to be consistent if it converges to the parameter (truth) as N goes to ∞
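
A small simulation can make the idea of consistency concrete. This is a sketch under assumed values (`p_true = 0.55` and the particular sample sizes are illustrative choices), not a proof.

```python
import random

rng = random.Random(209)   # fixed seed for reproducibility
p_true = 0.55              # assumed truth; unknown in practice


def sample_proportion(p, n, rng):
    """Draw n Bernoulli(p) responses and return their sample mean."""
    return sum(rng.random() < p for _ in range(n)) / n


# Absolute estimation error for one sample at each sample size.
errors = {}
for n in (10, 100, 1000, 10000, 100000):
    estimate = sample_proportion(p_true, n, rng)
    errors[n] = abs(estimate - p_true)
    print(f"N = {n:>6}: estimate = {estimate:.4f}, |error| = {errors[n]:.4f}")
```

The error need not shrink at every single step, since each line is one random sample, but it tends toward zero as N grows.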

Variability
Next, we have to consider how certain we are about our results
Consider two estimators:
1. slightly biased: on average off by a bit, but always by the same margin
2. unbiased, but misses the target to the left and right

Variability
[Figure; source: Encyclopedia of Machine Learning]

Variability
We characterize the variability of an estimator by the standard deviation of its sampling distribution
How do we find that?
Remember, the sampling distribution is the distribution of our statistic over hypothetically infinitely many samples

Standard Error
We estimate the standard deviation of the sampling distribution from the observed data: this estimate is the standard error
The standard error “describes the (estimated) average degree to which an estimator deviates from its expected value” (Imai 2017)

Polling Example
Say we took a sample of 1500 students and asked whether they support Jimbo or not
Define a random variable $X_i = 1$ if student $i$ supports Jimbo, $X_i = 0$ if not
The number of supporters in the sample, $\sum_{i=1}^{N} X_i$, follows a Binomial distribution with success probability $p$ and size $N$, where $p$ is the proportion of all students who support Jimbo (population distribution)

Polling Example
Estimator: ?

Polling Example
Estimator: $\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$
In earlier notation: $\theta_{\text{truth}} = p$ and $\hat{\theta} = \bar{X}$

Polling Example
Estimator: $\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$
1. LLN: $\bar{X} \longrightarrow p$ (consistent)
2. Expectation: $E(\bar{X}) = p$ (unbiased)
3. standard error?

Polling Example - standard error
$X_i$ are i.i.d. Bernoulli random variables with probability $p$
$V(\bar{X}) = \frac{1}{N^2} V\left(\sum_{i=1}^{N} X_i\right) = \frac{1}{N^2} \sum_{i=1}^{N} V(X_i) = \frac{N}{N^2} V(X) = \frac{p \times (1 - p)}{N}$
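
The variance formula can be checked numerically by comparing it with the spread of simulated sample means. The values `p_true = 0.55`, `n = 500`, and 3000 repetitions are assumptions chosen only to keep the sketch fast.

```python
import random
import statistics

rng = random.Random(2018)                 # fixed seed for reproducibility
p_true, n, n_samples = 0.55, 500, 3000    # assumed illustrative values

# Each entry is one sample mean of n i.i.d. Bernoulli(p_true) draws.
estimates = [
    sum(rng.random() < p_true for _ in range(n)) / n
    for _ in range(n_samples)
]

simulated_variance = statistics.pvariance(estimates)
theoretical_variance = p_true * (1 - p_true) / n   # p(1 - p) / N

print(f"simulated   V(X-bar): {simulated_variance:.6f}")
print(f"theoretical V(X-bar): {theoretical_variance:.6f}")
```

The two numbers should agree up to simulation noise, which shrinks as the number of repetitions grows.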

Polling Example - standard error
$V(\bar{X}) = \frac{p \times (1 - p)}{N}$
Standard error: $\sqrt{V(\bar{X})}$
But we don’t know p! Now what?
We use our unbiased estimate of p: $\bar{X}$

Polling Example - standard error estimate
$\sqrt{\widehat{V(\bar{X})}} = \sqrt{\frac{\bar{X}(1 - \bar{X})}{N}}$

Polling Example - standard error estimate
Assume in our sample 55% of students support Jimbo:
$SE = \sqrt{\widehat{V(\bar{X})}} = \sqrt{\frac{0.55 \times (1 - 0.55)}{1500}} = \sqrt{\frac{0.55 \times 0.45}{1500}} = 0.013$
We can expect our estimate on average to be off by 1.3 percentage points
If $\bar{X} = 0.8$, then SE = 0.010
If N = 500, $\bar{X} = 0.55$, then SE = 0.022
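
The three standard errors on this slide can be reproduced with a few lines of code. The function name `standard_error` is a hypothetical helper introduced for this sketch.

```python
import math


def standard_error(x_bar, n):
    """Estimated standard error of a sample proportion: sqrt(x_bar * (1 - x_bar) / n)."""
    return math.sqrt(x_bar * (1 - x_bar) / n)


# (sample proportion, sample size) pairs taken from the slide.
for x_bar, n in [(0.55, 1500), (0.80, 1500), (0.55, 500)]:
    print(f"X-bar = {x_bar:.2f}, N = {n:>4}: SE = {standard_error(x_bar, n):.3f}")
```

Comparing the first and third cases shows that a smaller sample gives a larger standard error for the same sample proportion.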

Standard error estimate
Standard error is based on the variance of the sampling distribution
Gives an estimate of uncertainty
Each estimator/statistic has its own sampling distribution, e.g. difference in means

Confidence Intervals
Often we don’t even know the sampling distribution of our estimators
How could we approximate it?
Central limit theorem!

Confidence Intervals
Central limit theorem says: for large N, $\bar{X} \approx \mathcal{N}\left(E(X), \frac{V(X)}{N}\right)$ regardless of the distribution of X
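
To see the central limit theorem at work on a clearly non-normal X, the sketch below draws sample means of exponential data and checks how many fall within 1.96 standard deviations of E(X), which should be roughly 95% if the normal approximation holds. The exponential distribution, `n = 200`, and 3000 repetitions are assumptions chosen for illustration.

```python
import math
import random

rng = random.Random(209)       # fixed seed for reproducibility
n, n_samples = 200, 3000       # assumed illustrative values
rate = 1.0                     # exponential with E(X) = 1 and V(X) = 1

# Sample means of n skewed (exponential) draws.
sample_means = [
    sum(rng.expovariate(rate) for _ in range(n)) / n
    for _ in range(n_samples)
]

clt_sd = math.sqrt(1.0 / n)    # sqrt(V(X) / N) from the normal approximation
share_within = sum(abs(m - 1.0) <= 1.96 * clt_sd for m in sample_means) / n_samples
print(f"share of sample means within 1.96 SDs of E(X): {share_within:.3f}")
```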

Confidence Intervals
We can use the approximation to the sampling distribution, $\bar{X} \approx \mathcal{N}\left(E(X), \frac{V(X)}{N}\right)$, to construct confidence intervals
Confidence intervals give a range of values that is likely to contain the true value
To start, we select a probability value for our confidence level: usually 95%

Confidence Intervals
The 95% confidence interval is a range of values constructed so that it contains the true parameter in 95% of our hypothetical samples/experiments
Put differently: “Over a hypothetically repeated data generating process, confidence intervals contain the true value of parameter with the probability specified by the confidence level” (Imai 2017)
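
The repeated-sampling interpretation can be illustrated by simulating many polls and counting how often the interval around each sample proportion contains the truth. This is a sketch with assumed values (`p_true = 0.55`, `n = 500`, 2000 repetitions); the interval uses the normal approximation with the estimated standard error.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2018)                 # fixed seed for reproducibility
p_true, n, n_samples = 0.55, 500, 2000    # assumed illustrative values
z = NormalDist().inv_cdf(0.975)           # 95% critical value of the standard normal

covered = 0
for _ in range(n_samples):
    x_bar = sum(rng.random() < p_true for _ in range(n)) / n
    se = math.sqrt(x_bar * (1 - x_bar) / n)
    # Does this sample's interval contain the true proportion?
    if x_bar - z * se <= p_true <= x_bar + z * se:
        covered += 1

coverage = covered / n_samples
print(f"share of intervals containing p: {coverage:.3f}")
```

The interval changes from sample to sample while p stays fixed; the coverage share should land near 0.95.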

Confidence interval
The $(1 - \alpha)$ large sample confidence interval is defined as:
$CI(\alpha) = \left[\bar{X} - z_{\alpha/2} \times SE,\ \bar{X} + z_{\alpha/2} \times SE\right]$
$z_{\alpha/2}$ is the critical value, which equals the $(1 - \alpha/2)$ quantile of the standard normal distribution
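
Applied to the polling numbers from earlier slides (sample proportion 0.55, N = 1500), the definition could be computed as in the sketch below. The function name `confidence_interval` is a hypothetical helper, and the standard error formula assumes a sample proportion.

```python
import math
from statistics import NormalDist


def confidence_interval(x_bar, n, alpha=0.05):
    """Large-sample (1 - alpha) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}
    se = math.sqrt(x_bar * (1 - x_bar) / n)    # estimated standard error
    return x_bar - z * se, x_bar + z * se


lower, upper = confidence_interval(0.55, 1500)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

With these inputs the interval runs from about 0.525 to 0.575, that is, the estimate plus or minus roughly 2.5 percentage points.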

Confidence interval
Where do the critical values come from?
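
One way to answer the question numerically: the critical value is a quantile of the standard normal distribution, which Python's standard library can compute. The helper name `critical_value` is an assumption made for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def critical_value(confidence_level):
    """Return z_{alpha/2}, the (1 - alpha/2) quantile of the standard normal."""
    alpha = 1 - confidence_level
    return standard_normal.inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```

Higher confidence levels give larger critical values and therefore wider intervals.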
