The Role of the Auxiliary Information in Statistical Matching Income and Consumption
Gabriella Donatiello, Marcello D'Orazio, Doriana Frattarola, Antony Rizzi, Mauro Scanu, Mattia Spaziani
Italian National Institute of Statistics (ISTAT)
THE RESEARCH OBJECTIVE
- Evaluate the possibility of integrating two different data sources (EU-SILC 2012 and HBS 2011) in order to provide joint information on household income and consumption in Italy at the micro level
- One of the aims is to focus on the role of auxiliary information in improving the matching outputs and overcoming the underlying assumptions
- Most statistical matching methods assume:
  - conditional independence (CI) of the target variables given the common variables
  - that the observations in the samples are independent and identically distributed (i.i.d.), an assumption that is also difficult to maintain when matching data from complex sample surveys
The role of the auxiliary information in Statistical Matching Income and Consumption, Gabriella Donatiello et al.
METHODS FOR MATCHING HBS AND SILC
1. Application of random hot deck under the conditional independence assumption (CIA) using the R package StatMatch (HBS as donor data set to impute consumption classes in SILC)
2. Exploration of uncertainty and introduction of auxiliary information to overcome the CIA and improve the final estimates
3. Renssen’s weight calibration approach (preserves the marginal distributions of the target variables)
- Independence between income and consumption: unrealistic
- The exploration of statistical matching (SM) uncertainty is carried out by calculating the Fréchet bounds for the contingency table between the variables of interest
- HBS household monthly income used as auxiliary information
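As a minimal sketch of what the Fréchet bounds mean for one contingency table, the Python snippet below computes, for every income-by-consumption cell, the lowest and highest joint probability compatible with the two marginal distributions. It covers only the unconditional case (no common variables), the function name is hypothetical, and the marginal values are made up for illustration; they are not HBS or SILC estimates.

```python
def frechet_bounds(p_row, p_col):
    """Return (lower, upper) tables of Fréchet bounds for the joint cell probabilities.

    For marginals p_i and q_j, every joint table compatible with them satisfies
    max(0, p_i + q_j - 1) <= p_ij <= min(p_i, q_j).
    """
    lower = [[max(0.0, p + q - 1.0) for q in p_col] for p in p_row]
    upper = [[min(p, q) for q in p_col] for p in p_row]
    return lower, upper


# Made-up marginal distributions (assumption, for illustration only).
income_marginal = [0.5, 0.3, 0.2]     # three income classes
consumption_marginal = [0.6, 0.4]     # two consumption classes

lower, upper = frechet_bounds(income_marginal, consumption_marginal)

for i, p in enumerate(income_marginal):
    for j, q in enumerate(consumption_marginal):
        independence = p * q  # the value implied by independence
        print(f"cell ({i}, {j}): {lower[i][j]:.2f} <= p <= {upper[i][j]:.2f}"
              f"  (independence gives {independence:.2f})")
```

The width of each interval is a simple measure of matching uncertainty: the independence value always lies inside it, but so do many other joint distributions.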
HBS INCOME AS AUXILIARY INFORMATION 1/2
All the available information in the HBS income section is exploited to estimate a new income variable.
The synthetic variable reduces the underestimation of household income in HBS while preserving 81% of the original distribution.
HBS INCOME AS AUXILIARY INFORMATION 2/2
The auxiliary information is used in the random hot deck matching procedure to further restrict the subset of potential donors.
Comparing the consumption classes imputed in SILC using the original HBS income with those imputed using the synthetic income shows an improvement in the estimates for the highest consumption classes.
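The donor restriction used in random hot deck can be sketched as follows. This is an illustrative Python version, not the StatMatch (R) code used in the study: the function name, the variable names and the records are all hypothetical. Each recipient (SILC) household receives the consumption class of a donor (HBS) household drawn at random from the same donation class; adding the income class to the class definition narrows the set of potential donors.

```python
import random
from collections import defaultdict


def random_hot_deck(recipients, donors, class_vars, target, seed=0):
    """Impute `target` by drawing a random donor within the same donation class."""
    rng = random.Random(seed)
    pools = defaultdict(list)
    for d in donors:
        pools[tuple(d[v] for v in class_vars)].append(d[target])
    imputed = []
    for r in recipients:
        pool = pools.get(tuple(r[v] for v in class_vars))
        # No donor in the class: left missing here; in practice classes are collapsed.
        value = rng.choice(pool) if pool else None
        imputed.append({**r, target: value})
    return imputed


# Made-up records (assumption, for illustration only).
hbs_donors = [
    {"area": "North", "income_class": "low", "cons_class": "Under 1000"},
    {"area": "North", "income_class": "low", "cons_class": "1000-1500"},
    {"area": "North", "income_class": "high", "cons_class": "3300-4900"},
    {"area": "South", "income_class": "low", "cons_class": "Under 1000"},
]
silc_recipients = [
    {"id": 1, "area": "North", "income_class": "high"},
    {"id": 2, "area": "North", "income_class": "low"},
    {"id": 3, "area": "South", "income_class": "high"},
]

# Donation classes defined by area only, then by area and income class.
by_area = random_hot_deck(silc_recipients, hbs_donors, ["area"], "cons_class")
by_both = random_hot_deck(silc_recipients, hbs_donors, ["area", "income_class"], "cons_class")

for rec in by_both:
    print(rec["id"], rec["cons_class"])
```

With area alone, household 1 could receive any of the three northern donors' classes; once income class is added, only the high-income donor remains eligible. Household 3 shows the cost of finer classes: no donor is left, so the classes would have to be coarsened.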
CALIBRATION OF THE SURVEY WEIGHTS 1/2
Renssen’s approach is an SM method that explicitly takes into account the sampling design and the sampling weights.
The procedure consists of two steps:
- step 1: harmonization of the distribution of the matching variables (macro areas, number of durable goods, class of income) in the two data sources through the calibration of survey weights
- step 2: estimation of the target variables by using the data and the weights after the harmonization
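The harmonization step can be illustrated with a deliberately simplified Python sketch: the weights of both samples are post-stratified on a single categorical matching variable so that the two surveys reproduce the same totals. The common totals are taken here as the plain average of the two survey estimates, which is an assumption made for the example; calibrating on several variables at once would typically need raking or linear calibration. Function names, variable names and weights are hypothetical.

```python
from collections import defaultdict


def weighted_totals(records, var):
    """Sum of the weights 'w' within each category of `var`."""
    totals = defaultdict(float)
    for r in records:
        totals[r[var]] += r["w"]
    return dict(totals)


def poststratify(records, var, target_totals):
    """Rescale weights so that the weighted totals of `var` match `target_totals`."""
    current = weighted_totals(records, var)
    return [{**r, "w": r["w"] * target_totals[r[var]] / current[r[var]]}
            for r in records]


# Made-up weighted samples (assumption, for illustration only).
hbs = [{"area": "North", "w": 100.0}, {"area": "North", "w": 140.0},
       {"area": "South", "w": 160.0}]
silc = [{"area": "North", "w": 200.0}, {"area": "South", "w": 90.0},
        {"area": "South", "w": 110.0}]

t_hbs = weighted_totals(hbs, "area")
t_silc = weighted_totals(silc, "area")

# Common totals: simple average of the two estimates (an assumption of this sketch).
common = {k: (t_hbs[k] + t_silc[k]) / 2 for k in t_hbs}

hbs_cal = poststratify(hbs, "area", common)
silc_cal = poststratify(silc, "area", common)

print(weighted_totals(hbs_cal, "area"))
print(weighted_totals(silc_cal, "area"))
```

After this step both samples give the same estimated distribution of the matching variable, which is what the second step relies on when it estimates the target variables with the harmonized weights.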
Figure: Comparison of consumption classes after weights calibration (HBS and SILC)
CALIBRATION OF THE SURVEY WEIGHTS 2/2
Renssen’s procedure fits linear probability models that take the survey weights into account.
The unit-level predicted probabilities make it possible to reproduce the marginal distribution of the imputed variable as estimated on the donor data set.
These models can yield estimated probabilities that are negative or greater than one, and further investigation is needed.
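A small Python sketch shows both properties of a weighted linear probability model. It fits a 0/1 indicator (membership of one consumption class) on a single numeric covariate by weighted least squares; the single covariate, the function name and the data are assumptions made for the example and do not reproduce the models actually fitted on HBS.

```python
def weighted_lpm(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up donor data (assumption): income in thousand euro, class indicator, weights.
income = [1.0, 2.0, 3.0, 4.0, 5.0]
in_top_class = [0, 0, 0, 1, 1]
weights = [2.0, 1.0, 1.0, 1.0, 2.0]

intercept, slope = weighted_lpm(income, in_top_class, weights)


def predict(x_value):
    """Predicted 'probability' from the linear model; not constrained to [0, 1]."""
    return intercept + slope * x_value


fitted = [predict(v) for v in income]

# The weighted mean of the fitted values equals the weighted share on the donor file.
donor_share = sum(wi * yi for wi, yi in zip(weights, in_top_class)) / sum(weights)
fitted_share = sum(wi * fi for wi, fi in zip(weights, fitted)) / sum(weights)

print(round(donor_share, 4), round(fitted_share, 4))
print(predict(1.0) < 0, predict(6.0) > 1)
```

Because the model includes an intercept, the weighted mean of the predictions matches the class share estimated on the donor data, which is the sense in which the marginal distribution is preserved. The same linearity is what lets predictions fall below zero or above one at the extremes of the covariate.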
Class of consumption           Northwest         Northeast     Central Italy South and Islands             Italy
                            HBS     SILC      HBS     SILC      HBS     SILC      HBS     SILC      HBS     SILC
Under 1000                 18.6     21.0     11.6     13.2     13.5     14.2     56.4     51.6     11.3     11.9
1000-1500                  23.1     22.6     16.2     17.9     17.4     17.7     43.3     41.8     17.5     17.7
1500-2000                  27.2     26.6     18.7     19.0     20.1     20.2     34.0     34.3     19.7     18.9
2000-2600                  28.6     28.6     21.7     22.2     22.1     20.9     27.6     28.3     17.1     17.2
2600-3300                  32.1     32.4     23.8     23.0     21.7     21.5     22.4     23.2     16.4     16.2
3300-4900                  35.8     36.1     24.1     21.1     22.9     23.9     17.2     18.9     12.2     12.2
4900 or more               43.2     40.9     25.2     23.2     20.2     20.8     11.4     15.1      5.8      5.9
Total                      28.5     28.5     19.8     19.8     19.9     19.9     31.8     31.8    100.0    100.0
Observed (HBS) and imputed (SILC) consumption by macro areas (%)
NEW SOURCE OF AUXILIARY INFORMATION 1/3
An ex-ante collection of wealth/consumption information in SILC could provide new shared variables with high predictive power for data matching.
A predictive model is used to identify those consumption components that are good predictors of total consumption in HBS:
i. analyze the structure of total consumption and compare the shares that different items have across the classes of income
ii. investigate the predictive power of each item with a statistical model
Method: stepwise regression
Dependent variable:
- logarithmic transformation of monthly total consumption
Covariates:
- socio-demographic characteristics of the household and of the reference person (r.p.)
- synthetic class of income
- all main consumption components
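The idea behind the stepwise search can be sketched in Python as a forward selection: starting from an intercept-only model, the covariate that most reduces the residual sum of squares is added at each step, and the search stops when the gain in explained variance falls below a threshold. This is a simplification chosen for the example (stepwise procedures in statistical software usually rely on F-tests or information criteria); the function names, the threshold and the data are hypothetical, and the dependent variable is constructed directly on the log scale.

```python
def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes X'X is non-singular).
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))


def forward_stepwise(candidates, y, min_gain=0.05):
    """Add covariates one at a time while each adds at least `min_gain` to R squared."""
    tss = ols_rss([], y)  # intercept-only model: total sum of squares
    rss, selected = tss, []
    while len(selected) < len(candidates):
        trials = {name: ols_rss([candidates[s] for s in selected] + [col], y)
                  for name, col in candidates.items() if name not in selected}
        best = min(trials, key=trials.get)
        if (rss - trials[best]) / tss < min_gain:
            break
        selected.append(best)
        rss = trials[best]
    return selected


# Made-up monthly expenditures for eight households (assumption, illustration only).
candidates = {
    "housing": [500.0, 500.0, 900.0, 900.0, 500.0, 500.0, 900.0, 900.0],
    "food": [300.0, 300.0, 300.0, 300.0, 600.0, 600.0, 600.0, 600.0],
    "tobacco": [10.0, 30.0, 10.0, 30.0, 10.0, 30.0, 10.0, 30.0],
}
# Made-up dependent variable on the log scale, driven by housing and food only.
log_total = [6.0 + 0.001 * h + 0.001 * f
             for h, f in zip(candidates["housing"], candidates["food"])]

selected = forward_stepwise(candidates, log_total)
print(selected)
```

In this toy data set the search keeps housing and food and drops tobacco, because tobacco adds nothing once the other two components are in the model.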
NEW SOURCE OF AUXILIARY INFORMATION 2/3
The figure shows that three large components (total housing costs, food, transport) represent 63% of total consumption.
NEW SOURCE OF AUXILIARY INFORMATION 3/3
Some simulations on HBS using different classification methods are performed to build a rule that identifies different consumption components in order to allocate observations to the estimated classes.
Comparing the overall classification error across models and covariates, it turns out that all the models identify the same set of variables (the union of 1 and 2).
Food, housing and transport expenditures turn out to be the best predictors of total consumption classes, correctly classifying 56.3% of all households in the HBS survey.
CONCLUDING REMARKS
- Role of auxiliary information in overcoming the CIA in matching procedures
- The inclusion of one or two questions on savings in HBS can be useful for data integration purposes, as well as for improving the quality of the information on HBS household monthly income
- Another source of auxiliary information could come from the introduction of a few questions on food consumption and transport in SILC
- Our work is still in progress; in the short term we plan to explore in more depth Renssen’s approach, which seems more flexible and promising for matching HBS and SILC