Motivation - STATISTIK AUSTRIA


  1. Motivation
     STATISTIK AUSTRIA, Die Informationsmanager
     ◮ EU-SILC → poverty rates
     ◮ High quality indicators on national level, but estimates on sub-national level have poor accuracy
     ◮ SAE methods → modelling assumptions
     ◮ Use administrative data (see Qinghua and Lanjouw 2009) → not always available
     ◮ Estimate error of differences between waves → many covariates (tedious)
     ◮ Methodology which is easy to apply and yields better estimates on sub-national levels? → R package surveysd
     Johannes Gussenbauer (www.statistik.at) | May 2017

  3. surveysd
     ◮ R package for variance estimation on surveys with rotating panel design
     ◮ Variance estimation via bootstrap techniques
     ◮ Rescaled bootstrap for stratified multistage sampling (Preston, 2009)
     ◮ Improve accuracy by using multiple (consecutive) waves of the survey
     ◮ Average bootstrap replicates over waves (Betti et al., 2012)
     ◮ Easy to use, even for R beginners
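
The idea of estimating a standard error from bootstrap replicate weights can be sketched in base R. This is a toy illustration only: it uses a plain with-replacement bootstrap of households within strata, not the rescaled bootstrap of Preston (2009) and not the surveysd implementation. The data set and the names `draw_factor` and `rate` are hypothetical.

```r
# Toy illustration in base R: with-replacement bootstrap within strata.
# All data and helper names are made up for this sketch.
set.seed(1)

# Hypothetical sample: 3 strata with 40 households each
dat <- data.frame(
  hid    = 1:120,
  strata = rep(c("A", "B", "C"), each = 40),
  w      = rep(c(50, 80, 120), each = 40),  # design weights
  poor   = rbinom(120, 1, 0.15)             # 1 = at risk of poverty
)

# Point estimate: weighted share of poor households, in percent
rate <- function(x, w) 100 * sum(w[x == 1]) / sum(w)

# One replicate: resample households with replacement inside each stratum
# and return how often each household was drawn.
draw_factor <- function(strata) {
  f <- numeric(length(strata))
  for (idx in split(seq_along(strata), strata)) {
    drawn  <- sample(idx, length(idx), replace = TRUE)
    f[idx] <- tabulate(match(drawn, idx), nbins = length(idx))
  }
  f
}

REP    <- 200
boot_w <- replicate(REP, dat$w * draw_factor(dat$strata))  # 120 x REP matrix

est   <- rate(dat$poor, dat$w)                              # point estimate
est_b <- apply(boot_w, 2, function(w) rate(dat$poor, w))    # replicate estimates
se    <- sd(est_b)                                          # bootstrap standard error
```

The standard error is simply the standard deviation of the estimate across replicates; any other estimator can be plugged in for `rate` in the same way.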

  4. Main functionality
     ◮ Draw bootstrap replicates → draw.bootstrap()
     ◮ Calibrate bootstrap replicates → recalib()
     ◮ Estimate standard errors → calc.stError()

  5. Draw bootstrap replicates
     draw.bootstrap(dat, REP=1000, hid="DB030", weights="RB050",
                    year="RB010", strata="DB040", cluster=NULL,
                    totals=NULL, single.PSU=c("merge","mean"),
                    boot.names=NULL, country=NULL, split=FALSE, pid=NULL)
     ◮ Rectangular data set with household identifier
     ◮ Describe sampling design with strata and cluster
     ◮ Automatic detection and dealing with single PSUs
     ◮ Replicates are taken forward to mimic rotational panel design
     ◮ Split households are considered
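
One way to read "replicates are taken forward" is that a household receives its bootstrap factor once, when it enters the panel, and keeps that factor in every later wave. The base R sketch below illustrates this reading under stated assumptions: the panel layout, the column names and the simple with-replacement draw are hypothetical and are not taken from the package.

```r
# Sketch: draw a replicate factor once per household and carry it forward.
# Panel layout and column names are hypothetical.
set.seed(2)

# Rotating panel: a new group of 25 households enters each year and
# stays for up to 4 consecutive years (observation ends in 2011).
panel <- do.call(rbind, lapply(2008:2011, function(entry) {
  expand.grid(hid  = paste0(entry, "_", 1:25),
              year = entry:min(entry + 3, 2011),
              stringsAsFactors = FALSE)
}))

# Entry year of each household
first <- aggregate(year ~ hid, data = panel, FUN = min)
names(first)[2] <- "entry"

# Resample households with replacement within each entry cohort;
# the factor counts how often a household was drawn.
first$factor <- ave(seq_len(nrow(first)), first$entry, FUN = function(i) {
  tabulate(sample.int(length(i), replace = TRUE), nbins = length(i))
})

# Carry the factor forward: every wave of a household gets the same value
panel <- merge(panel, first[, c("hid", "factor")], by = "hid")
```

Because the factor is attached by household identifier, a household that is dropped in one replicate is dropped in all of its waves, which keeps the replicates consistent over time.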

  7. Calibrate Bootstrap Replicates
     recalib(dat, hid="DB030", weights="RB050",
             b.rep=paste0("w",1:1000), year="RB010",
             country=NULL, conP.var=c("RB090"),
             conH.var=c("DB040","DB100"), ...)
     ◮ Calibration with ipu2() from package simPop
     ◮ Define the household and/or personal variables to calibrate on
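
Calibration adjusts each set of replicate weights so that weighted totals match known margins. The sketch below shows plain raking (iterative proportional fitting) on two margins in base R. It is a stand-in for illustration, not ipu2() from simPop, and the function `rake`, the data and the population totals are all hypothetical.

```r
# Sketch: raking (iterative proportional fitting) on two margins.
# This is not ipu2(); data, totals and the helper name are made up.
set.seed(3)
n <- 500
samp <- data.frame(
  region = sample(c("AT1", "AT2", "AT3"), n, replace = TRUE),
  sex    = sample(c("f", "m"), n, replace = TRUE),
  w      = runif(n, 80, 120),
  stringsAsFactors = FALSE
)

# Hypothetical known population totals (both margins sum to 50000)
tot_region <- c(AT1 = 18000, AT2 = 17000, AT3 = 15000)
tot_sex    <- c(f = 25500, m = 24500)

rake <- function(w, g1, t1, g2, t2, tol = 1e-10, maxit = 100) {
  g1 <- as.character(g1)
  g2 <- as.character(g2)
  for (it in seq_len(maxit)) {
    # Scale weights so the first margin is met
    s1 <- sapply(split(w, g1), sum)
    w  <- w * unname((t1[names(s1)] / s1)[g1])
    # Scale weights so the second margin is met
    s2 <- sapply(split(w, g2), sum)
    w  <- w * unname((t2[names(s2)] / s2)[g2])
    # Stop once the first margin still holds after the second adjustment
    s1 <- sapply(split(w, g1), sum)
    if (max(abs(s1 / t1[names(s1)] - 1)) < tol) break
  }
  w
}

samp$w_cal <- rake(samp$w, samp$region, tot_region, samp$sex, tot_sex)
```

In a bootstrap setting the same adjustment would be repeated for every replicate weight column, so that each replicate reproduces the known totals.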

  8. Estimate standard errors
     calc.stError(dat, weights="RB050", b.weights=paste0("w",1:1000),
                  year="RB010", var="HX080", fun="weightedRatio",
                  cross_var=NULL, year.diff=NULL, year.mean=3, bias=FALSE,
                  add.arg=NULL, size.limit=20, cv.limit=10, p=NULL)
     ◮ Use output of recalib() or rectangular data with bootstrap weights
     ◮ Function fun is applied to variable var using each bootstrap weight
     ◮ Predefined functions available; custom functions or functions from other packages can also be used
     ◮ fun must return a double or an integer, and its second argument must be the weight
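
The contract for `fun` described above (first argument the variable, second argument the weight, a single number returned) can be illustrated without the package. In this base R sketch the function `weighted_ratio`, the data and the replicate weights are hypothetical; the replicate weights are faked with a random factor purely to have something to loop over.

```r
# Sketch: apply a function fun(x, w) once per bootstrap weight column.
# Data, replicate weights and the helper name are made up.
set.seed(4)
n <- 300
d <- data.frame(poverty = rbinom(n, 1, 0.2),   # 1 = poor
                RB050   = runif(n, 50, 150))   # design weight

# Hypothetical replicate weights w1..w50: design weight times a random count
bw <- paste0("w", 1:50)
for (b in bw) d[[b]] <- d$RB050 * rpois(n, 1)

# fun(x, w): variable first, weight second, returns a single double
weighted_ratio <- function(x, w) 100 * sum(w[x == 1]) / sum(w)

point  <- weighted_ratio(d$poverty, d$RB050)
reps   <- vapply(bw, function(b) weighted_ratio(d$poverty, d[[b]]), numeric(1))
st_err <- sd(reps)

# Coefficient of variation in percent; a threshold such as 10 could be
# used to flag unreliable estimates.
cv <- 100 * st_err / point
```

Using `vapply` with `numeric(1)` enforces the requirement that the function returns exactly one number per weight column.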

  10. Estimate standard errors
      ◮ Results of point estimates are averaged over year.mean years (optional)
      ◮ Apply filter with equal filter weights over time series
      ◮ Estimate standard errors for differences between waves with year.diff (optional)
      ◮ Estimate errors on subgroups with cross_var (optional)
      ◮ Estimate quantiles using parameter p
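
Averaging over years and taking differences between waves can both be done replicate by replicate, so the standard error of the derived quantity again comes from the spread across replicates. The base R sketch below shows one plausible reading of these options; the replicate estimates are simulated as independent noise, which real panel replicates are not, and the object names are hypothetical.

```r
# Sketch: equal-weight filter over years and wave differences per replicate.
# Replicate estimates are simulated; names are made up.
set.seed(5)
years <- 2008:2016
REP   <- 100

# Rows = years, columns = bootstrap replicates
est_b <- matrix(rnorm(length(years) * REP, mean = 14, sd = 1),
                nrow = length(years),
                dimnames = list(as.character(years), NULL))

# Centered moving average with equal weights over 3 years, per replicate
year_mean <- 3
ma_b <- apply(est_b, 2, function(e) {
  as.numeric(stats::filter(e, rep(1 / year_mean, year_mean), sides = 2))
})
rownames(ma_b) <- as.character(years)

se_single <- apply(est_b, 1, sd)  # standard error per single year
se_mean   <- apply(ma_b, 1, sd)   # standard error of the 3-year average (NA at the edges)

# Difference between two waves, computed per replicate
diff_b  <- est_b["2014", ] - est_b["2008", ]
se_diff <- sd(diff_b)

# Quantiles of the replicate distribution, for example a 95% interval
q_diff <- quantile(diff_b, probs = c(0.025, 0.975))
```

The first and last year have no complete 3-year window, so their averaged standard errors are missing in this sketch.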

  11. Estimate standard errors
      calc.stError(UDB_AT, weights="weights",
                   year="year", b.weights=paste0("w",1:10),
                   var="poverty", cross_var=list("region", c("gender","region")))
      ## Calculated point estimates for variable(s)
      ##
      ## poverty
      ##
      ## using function weightedRatio
      ##
      ## Results hold 448 point estimates for 9 years in 28 subgroups
      ##
      ## Estimted standard error exceeds 10 % of the the point estimate in 246 cases

  12. Estimate standard errors
      # Apply function which is not in package 'surveysd'
      # take the Gini index
      library(laeken, quietly=TRUE)
      # simulate income
      set.seed(1234)
      UDB_AT[, income := exp(rnorm(.N, mean=sample(7:10,1), sd=1)),
             by=list(urban)]
      # gini() returns list
      # calc.stError needs function that returns double or integer
      help_gini <- function(x, w){
        return(gini(x, w)$value)
      }
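
A custom function only has to take the variable first and the weight second and return one number. As a dependency-free illustration of that contract, the sketch below computes a weighted Gini coefficient in base R. The function name `weighted_gini` is hypothetical, and the formula is assumed to match the usual weighted definition (scaled to percent); it is not claimed to reproduce laeken::gini() in every detail.

```r
# Sketch: a weighted Gini coefficient with the (x, w) -> double contract.
# Hypothetical helper; formula assumed, scaled to percent.
weighted_gini <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)       # drop incomplete cases
  x  <- x[ok]
  w  <- w[ok]
  o  <- order(x)                    # sort by income
  x  <- x[o]
  w  <- w[o]
  cw <- cumsum(w)                   # cumulative weights
  100 * ((2 * sum(w * x * cw) - sum(w^2 * x)) / (sum(w) * sum(w * x)) - 1)
}

g_equal <- weighted_gini(c(5, 5, 5, 5), rep(1, 4))  # identical incomes: 0
g_two   <- weighted_gini(c(0, 1), c(1, 1))          # one of two has everything: 50
```

The two checks at the end are hand-verifiable cases: perfectly equal incomes give 0, and two equally weighted units where one holds all income give 50.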

  13. Estimate standard errors
      calc.stError(UDB_AT, fun="help_gini",
                   weights="weights", year="year", b.weights=paste0("w",1:10),
                   var="income", cross_var=list("region", c("gender","region")),
                   year.diff=c("2014-2008"), p=c(.025,.975))
      ## Calculated point estimates for variable(s)
      ##
      ## income
      ##
      ## using function help_gini from .GlobalEnv
      ##
      ## Results hold 504 point estimates for 9 years in 28 subgroups
      ##
      ## Estimted standard error exceeds 10 % of the the point estimate in 22 cases

  14. Plot Method
      plot(res_inc, type="grouping",
           groups="region", sd.type="ribbon")
      [Figure: help_gini of income by year, 2008 to 2016, one panel per region (AT11, AT12, AT13, AT21, AT22, AT31, AT32, AT33, AT34); y-axis from 50 to 65.]

  15. Final Remarks
      ◮ Simple to use R package
      ◮ Supports a harmonious approach for estimating standard errors on surveys with rotating panel design
      ◮ Achieve more accuracy by averaging over multiple years
      ◮ No need for administrative data or modelling assumptions
      ◮ Check it out on GitHub: https://github.com/statistikat/surveysd
