Data Management, Department of Political Science and Government

Data Management
Department of Political Science and Government, Aarhus University
November 24, 2014

Data Management
- Weighting
- Handling missing data
  - Categorizing missing data types
  - Imputation
- Summary measures
  - Scale construction
  - Combining question branches
- Coding and editing
  - Open-ended questions
  - Marking problematic data
- Data preparation
  - Codebook creation
  - File formats
  - Archiving, access, and rights

1. Weighting
2. Missing Data
3. Coding and Data Preparation
4. Wrap-up
5. Preview of Next Time

Goal of Survey Research
- The goal of survey research is to estimate population-level quantities (e.g., means, proportions, totals)
- Samples estimate those quantities with uncertainty (sampling error)
- An estimator is unbiased if its expected value over repeated samples equals the population quantity; any single sample estimate can still differ from it because of sampling error
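The relationship between sampling error and unbiasedness can be illustrated with a toy population. The sketch below is only an illustration: the five population values and the sample size of 2 are hypothetical, not taken from the slides. It enumerates every possible simple random sample, showing that individual sample means scatter around the population mean while their average equals it.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy population (values invented for illustration only).
population = [2, 4, 6, 8, 10]
sample_size = 2

# Enumerate every possible sample of the given size (sampling without replacement).
sample_means = [mean(sample) for sample in combinations(population, sample_size)]

population_mean = mean(population)
expected_sample_mean = mean(sample_means)  # average over all equally likely samples

print("population mean:", population_mean)
print("range of sample means:", min(sample_means), "to", max(sample_means))
print("average of all sample means:", expected_sample_mean)
```

Individual samples miss the population mean (sampling error), but the sample mean is correct on average across all possible samples, which is what unbiasedness refers to.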


Realities of Survey Research
- Sample may not match population for a variety of reasons:
  - Due to constraints on design
  - Due to sampling frame coverage
  - Due to intentional over/under-sampling
  - Due to nonresponse
  - Due to sampling error
- Weights can be used to “correct” a sample
- Weighting is never perfect
  - Limited to observed variables
  - Rarely have good knowledge of coverage, nonresponse, or sampling error
  - Weighting can increase sampling variance

Three Kinds of Weights
- Design Weights
- Nonresponse Weights
- Post-Stratification Weights

Design Weights
- Address design-related unequal probability of selection into a sample
- Applied to complex survey designs:
  - Disproportionate allocation stratified sampling
  - Oversampling of subpopulations
  - Cluster sampling
  - Combinations thereof


Design Weights: Simple Random Sampling
- Imagine sampling frame of 100,000 units
- Sample size will be 1,000
- What is the probability that a unit in the sampling frame is included in the sample?
  $p = \frac{1{,}000}{100{,}000} = .01$
- Design weight for all units is $w = 1/p = 100$
- SRS is self-weighting
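The simple random sampling arithmetic can be reproduced in a few lines. This is a minimal sketch; the function name `srs_design_weight` is hypothetical and chosen only for illustration.

```python
def srs_design_weight(frame_size: int, sample_size: int) -> tuple[float, float]:
    """Return (inclusion probability, design weight) under simple random sampling."""
    if not 0 < sample_size <= frame_size:
        raise ValueError("sample_size must be between 1 and frame_size")
    p = sample_size / frame_size   # every unit has the same chance of selection
    w = frame_size / sample_size   # design weight, equal to 1 / p
    return p, w


# Numbers from the slide: frame of 100,000 units, sample of 1,000.
p, w = srs_design_weight(100_000, 1_000)
print(f"p = {p}, w = {w}")
```

Because every unit receives the same weight, applying the weight changes estimated totals but not estimated means or proportions, which is the sense in which SRS is self-weighting.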


Design Weights: Stratified Sample
- Imagine sampling frame of 100,000 units
  - 90,000 Danes & 10,000 Immigrants
- Sample size will be 1,000 (proportionate allocation)
  - 900 Danes & 100 Immigrants
- What is the probability that a unit in the sampling frame is included in the sample?
  $p_{Danish} = \frac{900}{90{,}000} = .01$
  $p_{Imm} = \frac{100}{10{,}000} = .01$
- Design weight for all units is $w = 1/p = 100$
- Proportionate allocation is self-weighting


Design Weights: Stratified Sample
- Imagine sampling frame of 100,000 units
  - 90,000 Danes & 10,000 Immigrants
- Sample size will be 1,000 (disproportionate allocation)
  - 500 Danes & 500 Immigrants
- What is the probability that a unit in the sampling frame is included in the sample?
  $p_{Danish} = \frac{500}{90{,}000} \approx .0056$
  $p_{Imm} = \frac{500}{10{,}000} = .05$
- Design weights differ across units:
  $w_{Danish} = 1/p_{Danish} = \frac{90{,}000}{500} = 180$
  $w_{Imm} = 1/p_{Imm} = \frac{10{,}000}{500} = 20$
- Disproportionate allocation is not self-weighting
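The stratified examples can be checked with a short sketch that computes one design weight per stratum from the frame and sample counts. The function name `stratum_weights` is hypothetical. Working from the counts rather than from a rounded probability avoids rounding drift: inverting the rounded .0056 gives about 178.57, whereas 90,000 / 500 is exactly 180.

```python
def stratum_weights(frame_counts: dict[str, int],
                    sample_counts: dict[str, int]) -> dict[str, float]:
    """Design weight per stratum: frame size divided by sample size (1 / p)."""
    weights = {}
    for stratum, frame_n in frame_counts.items():
        sample_n = sample_counts[stratum]
        if not 0 < sample_n <= frame_n:
            raise ValueError(f"invalid sample size for stratum {stratum!r}")
        weights[stratum] = frame_n / sample_n
    return weights


frame = {"Danish": 90_000, "Immigrant": 10_000}

# Proportionate allocation: both strata get the same weight (self-weighting).
proportionate = stratum_weights(frame, {"Danish": 900, "Immigrant": 100})

# Disproportionate allocation: weights differ across strata.
sample = {"Danish": 500, "Immigrant": 500}
disproportionate = stratum_weights(frame, sample)

print(proportionate)     # {'Danish': 100.0, 'Immigrant': 100.0}
print(disproportionate)  # {'Danish': 180.0, 'Immigrant': 20.0}

# Sanity check: the weighted sample should add up to the frame size.
weighted_total = sum(sample[s] * disproportionate[s] for s in sample)
print(weighted_total)    # 100000.0
```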


Design Weights: Cluster Sample
- Imagine sampling frame of 1000 units in 5 clusters of varying sizes
- Sample size will be 10 each from 3 clusters
- What is the probability that a unit in the sampling frame is included in the sample?
  $p = \frac{n_{clusters}}{N_{clusters}} \times \frac{10}{N_{cluster}} = \frac{3}{5} \times \frac{10}{N_{cluster}}$, where $N_{cluster}$ is the number of units in the unit's own cluster
- Design weights differ across units:
  - Clusters are equally likely to be sampled
  - Probability of selection within cluster varies with cluster size
- Cluster sampling is rarely self-weighting
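A small sketch makes the dependence on cluster size concrete. It rests on two assumptions that the slide does not spell out: the 10 units within each selected cluster are drawn by simple random sampling, so the within-cluster probability is 10 divided by the cluster's size, and the five cluster sizes below are hypothetical values chosen only so that they sum to 1,000.

```python
def cluster_design_weights(cluster_sizes: dict[str, int],
                           n_clusters_sampled: int,
                           n_per_cluster: int) -> dict[str, float]:
    """Design weight per cluster for a two-stage sample.

    Stage 1: clusters are selected with equal probability.
    Stage 2: n_per_cluster units are drawn by SRS within each selected cluster.
    """
    p_cluster = n_clusters_sampled / len(cluster_sizes)
    weights = {}
    for name, size in cluster_sizes.items():
        p_within = min(n_per_cluster, size) / size  # cannot sample more than exist
        weights[name] = 1 / (p_cluster * p_within)
    return weights


# Hypothetical cluster sizes (sum to 1,000 units, as in the slide's frame).
sizes = {"A": 100, "B": 150, "C": 200, "D": 250, "E": 300}
weights = cluster_design_weights(sizes, n_clusters_sampled=3, n_per_cluster=10)

for name, w in weights.items():
    print(f"cluster {name}: size {sizes[name]}, weight {w:.2f}")
```

Units in larger clusters receive larger weights because each sampled unit stands in for more unsampled neighbours, which is why the design is not self-weighting unless all clusters are the same size.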

Nonresponse Weights
- Correct for nonresponse
- Require knowledge of nonrespondents on variables that have been measured for respondents
- Require that data are missing at random
- Two common methods:
  - Weighting classes
  - Propensity score subclassification


Nonresponse Weights: Example
- Imagine immigrants end up being less likely to respond [1]
  $RR_{Danish} = 1.0$
  $RR_{Imm} = 0.8$
- Using weighting classes:
  $w_{rr,Danish} = 1/1 = 1$
  $w_{rr,Imm} = 1/0.8 = 1.25$
- Can generalize to multiple variables and strata

[1] This refers to a lower RR in this particular survey sample, not in general.
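The weighting-class calculation can be sketched as follows. The slide gives only the response rates, so the sampled and responding counts below are hypothetical numbers chosen to reproduce those rates (1.0 and 0.8); the function name is likewise illustrative.

```python
def weighting_class_weights(sampled: dict[str, int],
                            responded: dict[str, int]) -> dict[str, float]:
    """Nonresponse weight per class: inverse of the class response rate."""
    weights = {}
    for cls, n_sampled in sampled.items():
        n_responded = responded[cls]
        if n_responded == 0:
            raise ValueError(f"no respondents in class {cls!r}; weight undefined")
        weights[cls] = n_sampled / n_responded  # equals 1 / response rate
    return weights


# Hypothetical counts consistent with RR = 1.0 (Danish) and RR = 0.8 (Immigrant).
sampled = {"Danish": 500, "Immigrant": 500}
responded = {"Danish": 500, "Immigrant": 400}

nr_weights = weighting_class_weights(sampled, responded)
print(nr_weights)  # {'Danish': 1.0, 'Immigrant': 1.25}

# Weighted respondents recover the number of units originally sampled per class.
recovered = {cls: responded[cls] * nr_weights[cls] for cls in responded}
print(recovered)   # {'Danish': 500.0, 'Immigrant': 500.0}
```

Extending this to multiple variables amounts to defining the classes as cross-classified cells (for example, origin by age group) and computing one response rate per cell.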

Post-Stratification
- Correct for nonresponse, coverage errors, and sampling errors
