The Role of the Auxiliary Information in Statistical Matching


  1. The Role of the Auxiliary Information in Statistical Matching Income and Consumption
     Gabriella Donatiello, Marcello D’Orazio, Doriana Frattarola, Antony Rizzi, Mauro Scanu, Mattia Spaziani
     Italian National Institute of Statistics (ISTAT)
     New Techniques and Technologies for Statistics (NTTS) 2015, Session 8B - Record linkage and statistical matching
     Charlemagne Conference Centre, Brussels, Belgium, 11 March 2015

  2. THE RESEARCH OBJECTIVE
     • Evaluate the possibility of integrating two different data sources (EU-SILC 2012 and HBS 2011) in order to provide joint information on household income and consumption in Italy at the micro level
     • One aim is to focus on the role of the auxiliary information in improving the matching outputs and overcoming the underlying assumptions
     • Most statistical matching methods assume:
       - conditional independence (CI) of the target variables given the common variables
       - that the observations in the samples are independent and identically distributed (i.i.d.), an assumption that is also difficult to maintain when matching data from complex sample surveys
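     The conditional independence assumption listed above can be written out explicitly. The notation is an assumption of this note, not taken from the slides: X denotes the variables common to both surveys, Y the income variable observed only in SILC, and Z the consumption variable observed only in HBS.

```latex
f(x, y, z) = f(y \mid x)\, f(z \mid x)\, f(x)
\quad\Longleftrightarrow\quad
f(y, z \mid x) = f(y \mid x)\, f(z \mid x)
```

     Because Y and Z are never observed together, the data alone cannot confirm or reject this factorization; auxiliary information is what allows it to be relaxed.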

  3. METHODS FOR MATCHING HBS AND SILC
     1. Application of random hot deck under the conditional independence assumption (CIA) using the R package StatMatch (HBS as the donor data set to impute consumption classes in SILC)
     2. Exploration of uncertainty and introduction of auxiliary information to overcome the CIA and improve the final estimates
     3. Renssen’s weights calibration approach (preserves the marginal distributions of the target variables)
     - Independence between income and consumption is unrealistic
     - The uncertainty of statistical matching (SM) is explored by calculating the Fréchet bounds for the contingency table between the variables of interest
     - HBS household monthly income is used as auxiliary information
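     As a rough illustration of the uncertainty exploration mentioned above, the sketch below computes Fréchet bounds for the cells of an income-by-consumption table, first from the two margins alone and then after conditioning on a common variable. It is a minimal sketch, not the StatMatch implementation; the function names, class labels and probabilities are all made up for the example.

```python
def frechet_bounds(p_y, p_z):
    """Frechet bounds for every cell of the (Y, Z) table, given only the margins."""
    bounds = {}
    for y, py in p_y.items():
        for z, pz in p_z.items():
            lower = max(0.0, py + pz - 1.0)
            upper = min(py, pz)
            bounds[(y, z)] = (lower, upper)
    return bounds


def conditional_frechet_bounds(p_x, p_y_given_x, p_z_given_x):
    """Bounds obtained by averaging the within-class bounds over the classes of X."""
    totals = {}
    for x, px in p_x.items():
        within = frechet_bounds(p_y_given_x[x], p_z_given_x[x])
        for cell, (lower, upper) in within.items():
            low_sum, up_sum = totals.get(cell, (0.0, 0.0))
            totals[cell] = (low_sum + px * lower, up_sum + px * upper)
    return totals


# Hypothetical two-class example: Y = income class, Z = consumption class.
p_income = {"low": 0.6, "high": 0.4}
p_consumption = {"low": 0.7, "high": 0.3}
unconditional = frechet_bounds(p_income, p_consumption)

# Hypothetical common variable X with two classes, consistent with the margins above.
p_x = {"a": 0.5, "b": 0.5}
p_income_given_x = {"a": {"low": 0.9, "high": 0.1}, "b": {"low": 0.3, "high": 0.7}}
p_consumption_given_x = {"a": {"low": 0.9, "high": 0.1}, "b": {"low": 0.5, "high": 0.5}}
conditional = conditional_frechet_bounds(p_x, p_income_given_x, p_consumption_given_x)

for cell in unconditional:
    print(cell, "unconditional:", unconditional[cell], "given X:", conditional[cell])
```

     In this toy example the interval for each cell becomes narrower (or stays the same) once the common variable is used, which is the sense in which informative common or auxiliary variables reduce matching uncertainty.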

  4. HBS INCOME AS AUXILIARY INFORMATION 1/2
     All the available information in the HBS income section is exploited in order to estimate a new income variable, income’s.
     The synthetic variable reduces the underestimation of household income in HBS while preserving 81% of the original distribution.

  5. HBS INCOME AS AUXILIARY INFORMATION 2/2
     The auxiliary information is used in the random hot deck matching procedure to further restrict the subset of potential donors.
     Comparing the consumption classes imputed in SILC using the original HBS income with those imputed using the synthetic income shows an improvement in the estimates for the highest consumption classes.
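     The idea of restricting the subset of potential donors can be sketched as a random hot deck within donation classes: each recipient draws a donor at random among the donors that share its values on the chosen class variables. This is a simplified illustration in plain Python, not the StatMatch routine used by the authors; the function name, variable names and records are hypothetical.

```python
import random


def random_hot_deck(recipients, donors, class_vars, target, seed=0):
    """Impute `target` for each recipient from a random donor in the same donation class."""
    rng = random.Random(seed)

    # Group donors by their combination of class-variable values.
    pools = {}
    for donor in donors:
        key = tuple(donor[v] for v in class_vars)
        pools.setdefault(key, []).append(donor)

    imputed = []
    for recipient in recipients:
        key = tuple(recipient[v] for v in class_vars)
        pool = pools.get(key)
        # None marks recipients whose donation class contains no donor.
        value = rng.choice(pool)[target] if pool else None
        imputed.append({**recipient, target: value})
    return imputed


# Hypothetical HBS donors (consumption observed) and SILC recipients (consumption missing).
hbs = [
    {"area": "North", "income_class": "low", "cons_class": "Under 1000"},
    {"area": "North", "income_class": "high", "cons_class": "4900 or more"},
    {"area": "South", "income_class": "low", "cons_class": "1000-1500"},
    {"area": "South", "income_class": "low", "cons_class": "Under 1000"},
]
silc = [
    {"area": "North", "income_class": "low"},
    {"area": "South", "income_class": "low"},
    {"area": "South", "income_class": "high"},
]

# Adding the income class to the class variables narrows each donor pool.
matched = random_hot_deck(silc, hbs, ("area", "income_class"), "cons_class")
for row in matched:
    print(row)
```

     The third recipient has no donor in its class and is left unimputed here; in practice such cases would need a fallback, for example a coarser set of class variables.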

  6. CALIBRATION OF THE SURVEY WEIGHTS 1/2
     Renssen’s approach is a statistical matching (SM) method that explicitly takes into account the sampling design and the sampling weights.
     The procedure consists of two steps:
     (step 1) harmonization of the distribution of the matching variables (macro areas, number of durable goods, class of income) in the two data sources through the calibration of the survey weights;
     (step 2) estimation of the target variables using the data and the weights after the harmonization.
     [Figure: Comparison of consumption classes after weights calibration (series: HBS, SILC)]
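     Step 1 can be illustrated with the simplest form of calibration, post-stratification on a single categorical matching variable. In the sketch below both surveys are rescaled to a common set of totals, taken here as the average of the two weighted estimates; that choice of target, like the records and names, is an assumption made for the example, and the calibration used in the presentation involves several matching variables.

```python
def weighted_totals(records, weight, var):
    """Weighted total of units in each category of `var`."""
    totals = {}
    for r in records:
        totals[r[var]] = totals.get(r[var], 0.0) + r[weight]
    return totals


def poststratify(records, weight, var, target_totals):
    """Rescale weights so the weighted totals by `var` equal `target_totals`."""
    current = weighted_totals(records, weight, var)
    return [
        {**r, weight: r[weight] * target_totals[r[var]] / current[r[var]]}
        for r in records
    ]


# Hypothetical records with design weights "w".
hbs = [
    {"area": "North", "w": 100.0},
    {"area": "North", "w": 120.0},
    {"area": "South", "w": 80.0},
]
silc = [
    {"area": "North", "w": 150.0},
    {"area": "South", "w": 60.0},
    {"area": "South", "w": 90.0},
]

# Assumed common target: the average of the two surveys' weighted totals.
hbs_totals = weighted_totals(hbs, "w", "area")
silc_totals = weighted_totals(silc, "w", "area")
target = {area: (hbs_totals[area] + silc_totals[area]) / 2 for area in hbs_totals}

hbs_cal = poststratify(hbs, "w", "area", target)
silc_cal = poststratify(silc, "w", "area", target)
print(weighted_totals(hbs_cal, "w", "area"))
print(weighted_totals(silc_cal, "w", "area"))
```

     After this step both data sets reproduce the same distribution of the matching variable, which is the precondition for estimating the target variables in step 2.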

  7. CALIBRATION OF THE SURVEY WEIGHTS 2/2
     Observed (HBS) and imputed (SILC) consumption by macro areas (%)

     Class of               Northwest         Northeast     Central Italy South and Islands             Italy
     consumption         HBS     SILC      HBS     SILC      HBS     SILC      HBS     SILC      HBS     SILC
     Under 1000         18.6     21.0     11.6     13.2     13.5     14.2     56.4     51.6     11.3     11.9
     1000-1500          23.1     22.6     16.2     17.9     17.4     17.7     43.3     41.8     17.5     17.7
     1500-2000          27.2     26.6     18.7     19.0     20.1     20.2     34.0     34.3     19.7     18.9
     2000-2600          28.6     28.6     21.7     22.2     22.1     20.9     27.6     28.3     17.1     17.2
     2600-3300          32.1     32.4     23.8     23.0     21.7     21.5     22.4     23.2     16.4     16.2
     3300-4900          35.8     36.1     24.1     21.1     22.9     23.9     17.2     18.9     12.2     12.2
     4900 or more       43.2     40.9     25.2     23.2     20.2     20.8     11.4     15.1      5.8      5.9
     Total              28.5     28.5     19.8     19.8     19.9     19.9     31.8     31.8    100.0    100.0

     Macro-area columns are row percentages; the Italy column is the distribution over consumption classes.

     Renssen’s procedure fits linear probability models that take the survey weights into account.
     The predicted probabilities at unit level make it possible to reproduce the marginal distribution of the imputed variable as estimated on the donor data set.
     These models can yield estimated probabilities that are negative or greater than one, and further investigation is needed.
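     The caveat about out-of-range probabilities is easy to reproduce with a toy weighted linear probability model. The sketch below fits a weighted least squares line to a 0/1 indicator with a single covariate; the data, weights and function name are invented for illustration and do not come from HBS or SILC.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical donor data: income class (1 = lowest) and an indicator for
# belonging to the lowest consumption class, with survey weights.
income_class = [1, 2, 3, 4, 5]
in_lowest_class = [1, 1, 0, 0, 0]
weights = [2.0, 1.0, 1.0, 1.0, 2.0]

intercept, slope = weighted_linear_fit(income_class, in_lowest_class, weights)
predicted = [intercept + slope * x for x in income_class]
out_of_range = [p for p in predicted if p < 0.0 or p > 1.0]

print("predicted probabilities:", [round(p, 3) for p in predicted])
print("outside [0, 1]:", [round(p, 3) for p in out_of_range])
```

     Here the prediction for the highest income class falls below zero, even though the fitted values still average (with weights) to the observed share, which is why the marginal distribution is preserved while individual predictions may be inadmissible.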

  8. NEW SOURCE OF AUXILIARY INFORMATION 1/3
     An ex-ante collection of wealth/consumption information in SILC could provide new shared variables with high predictive power for data matching.
     A predictive model is used to identify the consumption components that are good predictors of total consumption in HBS:
     i. analyze the structure of total consumption and compare the shares that different items have across the classes of income
     ii. investigate the predictive power of each item with a statistical model
     Method: stepwise regression
     Dependent variable:
     • logarithmic transformation of monthly total consumption
     Covariates:
     • socio-demographic characteristics of the household and the reference person
     • synthetic class of income
     • all main consumption components

  9. NEW SOURCE OF AUXILIARY INFORMATION 2/3
     The figure shows that three large components (total housing costs, food, transport) represent 63% of total consumption.

  10. NEW SOURCE OF AUXILIARY INFORMATION 3/3
     Some simulations on HBS using different classification methods are performed to build a rule for identifying different consumption components, in order to allocate observations to the estimated classes.
     Comparing the overall classification error across models and covariates, it turns out that all the models identify the same set of variables (the union of 1 and 2).
     Food, housing and transport expenditures turn out to be the best predictors of total consumption classes, correctly classifying 56.3% of all households in the HBS survey.

  11. CONCLUDING REMARKS
     - Auxiliary information plays a role in overcoming the CIA in matching procedures
     - The inclusion of one or two questions on savings in HBS can be useful for data integration purposes, as well as for improving the quality of information on HBS household monthly income
     - Another source of auxiliary information could come from the introduction of a few questions on food consumption and transport in SILC
     - Our work is still in progress; in the short term we plan to explore in more depth Renssen’s approach, which seems more flexible and also promising for matching HBS and SILC
