
slide-1
SLIDE 1

Dealing with measurement and integration errors in administrative data: the case of the Italian multi-source system on small and medium enterprises - NTTS 2015

Dealing with measurement and integration errors in administrative data: the case of the Italian multi-source system on small and medium enterprises

Orietta Luzi, Marco Di Zio, Ugo Guarnera, Roberta Varriale

Italian National Statistical Institute (Istat)

NTTS2015 Conference - Brussels, 10-12 March, 2015

slide-2
SLIDE 2

The «frame SBS»: a multiple-source system for Italian Structural Business Statistics based on administrative and survey data

Frame SBS

  • Statistical information system for estimating structural economic variables on business accounts (Turnover, Purchases of goods and Services, Production Value, Value Added, …) for small and medium enterprises, based on the primary use of integrated administrative/fiscal data, “complemented” with survey data
  • Until now, SBS for enterprises with less than 100 employees (~4.4 mln units in 2011) have been estimated based on a direct sample survey (~100,000 units); administrative data were used as auxiliary information

Variables

  • Main economic aggregates
  • Components of the main economic aggregates (out of scope of the paper)

Components

  • Y1 Income from sales and Services (Turnover)
  • Y2 Changes in stock of finished and semi-finished products
  • Y3 Changes in contract work in progress
  • Y4 Changes in internal work capitalized under fixed assets
  • Y5 Other income and earnings
  • Y6 Purchases of goods
  • Y7 Purchases of services
  • Y8 Use of third party assets
  • Y9 Changes in stocks of raw materials and for resale
  • Y10 Other operating charges
  • PC Personnel Costs

slide-3
SLIDE 3

The sources of the «frame SBS»

  • Financial Statements (FS) of corporate enterprises liable to fill in the financial statement (about 800,000 enterprises each year)
  • The Sector Studies survey (SS), a Fiscal Authority survey that each year includes about 3.5 mln enterprises with a turnover lower than 7.5 mln euros and greater than 30,000 euros, belonging to many economic activity sectors
  • The Tax Return Data (Unico model), based on a unified model of tax declarations by legal form, and IRAP, the Italian regional tax on productive activities
  • The Business Register (BR), used as population list and auxiliary source of information
  • The Social Security Data (SSD), which include firm-level data and employee data on wages and labor cost; auxiliary source of information

slide-4
SLIDE 4

The sources of the «frame SBS»

[Figure: schematic of the integrated data matrix for the N (4.4 mil) units of the SME population. Column labels: Units, ID, Ateco, N Emp, Turn, N Emp, PC, WS, WH, SC, followed by the target variables Y1, …, Yk from each of three administrative sources and Y1, …, Yp from the SME Survey (S), which is observed only for a subset of units. Source labels and coverage: BR; Social Security Data (SSD); Financial Statements (~16% of SMEs); Sector Studies Survey (~80% of SMEs); Tax Returns Data (UNICO, IRAP) (~97% of SMEs); Not covered (~4%).]

slide-5
SLIDE 5

Non-sampling errors

  • Harmonization
  • Measurement errors (consistency errors)
  • Coverage problems
  • (no unit identification errors were possible)

slide-6
SLIDE 6

Non-sampling errors - Harmonization

  • A system of different indicators and quality measures, at both micro and aggregate level, was used to compare and harmonize information on target variables coming from the different sources → hierarchical approach in the use of different sources
  • Subject matter experts
  • Permanent activity, dealing with changes in administrative and fiscal sources
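A hierarchical use of sources can be pictured as a priority-ordered lookup: for each unit, the target value is taken from the first source in the hierarchy that reports it. The sketch below is only an illustration in Python. The priority order (FS, then SS, then Unico), the function name `pick_value` and the toy records are assumptions made for the example, not the ranking actually derived from the indicators and the subject matter experts.

```python
from typing import Optional

# Hypothetical priority order, assumed for illustration only.
SOURCE_PRIORITY = ["FS", "SS", "Unico"]


def pick_value(values_by_source: dict,
               priority: list = SOURCE_PRIORITY) -> tuple:
    """Return (source, value) from the highest-priority source reporting a value."""
    for source in priority:
        value: Optional[float] = values_by_source.get(source)
        if value is not None:
            return source, value
    # No source reports the variable: the unit is not covered.
    return None, None


# Toy turnover values per unit and source (invented numbers).
units = {
    "A": {"FS": 1200.0, "SS": 1180.0},
    "B": {"SS": 310.0, "Unico": 305.0},
    "C": {"Unico": 95.0},
    "D": {},
}

selected = {unit_id: pick_value(values) for unit_id, values in units.items()}
```

Units such as "D", for which no source reports the variable, are the ones that a later imputation step would have to handle.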

slide-7
SLIDE 7

Non-sampling errors - Consistency errors

A two-phase data editing strategy:

  • a. Editing activities on the micro-data observed in each administrative data (AD) source were performed to identify logical/formal data inconsistencies (e.g. balance errors and other kinds of invalid information)
  • b. Specific analyses were devoted to assessing and resolving inconsistencies between variables integrated from different sources:
      → identification of outliers: a trimming approach that analyses the distribution of economic indicators built using information from different sources (such as the per-capita labor cost) and rejects the values exceeding pre-defined thresholds, by domain
      → influential errors: identified using a model-based robust selective editing approach for continuous variables (the selective editing methodology implemented in the R package SeleMix - Selective Editing via Mixture models)
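The trimming step for outliers can be sketched as follows: an indicator is built from variables taken from different sources (here the per-capita labor cost, personnel costs divided by number of employees), and units whose indicator falls outside domain-specific bounds are flagged. This is a minimal Python sketch; the domain names, the threshold values and the record layout are hypothetical, and the selective editing step performed with SeleMix is not reproduced here.

```python
def per_capita_labor_cost(personnel_costs: float, employees: float):
    """Indicator combining personnel costs and employee counts from different sources."""
    if employees <= 0:
        return None  # indicator undefined; the unit is flagged for review
    return personnel_costs / employees


def flag_outliers(units: list, thresholds: dict) -> list:
    """Return the ids of units whose indicator lies outside its domain's bounds."""
    flagged = []
    for unit in units:
        low, high = thresholds[unit["domain"]]
        ratio = per_capita_labor_cost(unit["pc"], unit["employees"])
        if ratio is None or not (low <= ratio <= high):
            flagged.append(unit["id"])
    return flagged


# Hypothetical pre-defined thresholds (euros per employee), by domain.
thresholds = {"manufacturing": (10_000, 80_000), "retail": (8_000, 60_000)}

# Invented records: personnel costs (pc) and number of employees.
units = [
    {"id": 1, "domain": "manufacturing", "pc": 300_000, "employees": 10},
    {"id": 2, "domain": "manufacturing", "pc": 2_000_000, "employees": 5},
    {"id": 3, "domain": "retail", "pc": 20_000, "employees": 4},
    {"id": 4, "domain": "retail", "pc": 90_000, "employees": 3},
]

flagged = flag_outliers(units, thresholds)
```

In this toy data, unit 2 exceeds the upper bound of its domain and unit 3 falls below the lower bound, so both are rejected.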

slide-8
SLIDE 8

Non-sampling errors - Coverage errors

  • Coverage problems
      → unit non-response, arising because the integrated AD sources relate to sub-populations which do not cover the overall SME population as defined for SBS purposes
      → item non-response, mainly because some AD sources are incomplete for some units and do not observe all the target variables required for SBS estimation
  • Predictive approach based on imputation
      → makes it possible to build a complete micro-data file for those variables which are extensively covered by the (integrated) AD sources
      → the unavailable information is predicted (imputed) from the available administrative information using a combination of different techniques (including Predictive Mean Matching, Nearest Neighbor Donor, and other approaches based on logistic and linear regression), applied to separate groups of variables taking into account their distributional characteristics and their relationships with other variables
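Of the imputation techniques listed, Nearest Neighbor Donor is the simplest to illustrate: a unit with a missing target value receives the value observed on the most similar unit (the donor) according to auxiliary information available for everyone. The Python sketch below assumes a single auxiliary variable (a hypothetical `br_turnover` field standing in for register information), imputation classes given by a `domain` field, and invented records; the actual matching variables and classes used in the system are not stated in the slides.

```python
def nearest_neighbor_impute(units: list, target: str, aux: str) -> list:
    """Fill missing `target` values with the value of the closest donor on `aux`.

    Donors are searched only within the same domain (imputation class).
    """
    donors = [u for u in units if u[target] is not None]
    completed = []
    for unit in units:
        record = dict(unit)  # copy, so the input data are left untouched
        if record[target] is None:
            pool = [d for d in donors if d["domain"] == record["domain"]]
            if pool:
                donor = min(pool, key=lambda d: abs(d[aux] - record[aux]))
                record[target] = donor[target]
                record["donor_id"] = donor["id"]  # keep track of the donor used
        completed.append(record)
    return completed


# Invented records; None marks a value not observed in any source.
units = [
    {"id": 1, "domain": "C", "br_turnover": 100.0, "purchases": 60.0},
    {"id": 2, "domain": "C", "br_turnover": 500.0, "purchases": 320.0},
    {"id": 3, "domain": "C", "br_turnover": 450.0, "purchases": None},
    {"id": 4, "domain": "G", "br_turnover": 120.0, "purchases": 90.0},
    {"id": 5, "domain": "G", "br_turnover": 110.0, "purchases": None},
]

completed = nearest_neighbor_impute(units, target="purchases", aux="br_turnover")
```

Predictive Mean Matching follows the same donor logic, except that the distance is computed on the values predicted by a regression model rather than directly on the auxiliary variable.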

slide-9
SLIDE 9

Coverage rate (%) of the SME population by source and some main economic aggregates (year 2011)

Source           Number of units   Number of employees   Revenues   Value Added
FS                          16.1                  38.2       66.2          54.1
SS                          64.0                  49.2       24.5          36.4
Unico                       16.2                   8.3        5.5           6.1
Total covered               96.3                  95.7       96.2          96.6
Not covered                  3.7                   4.3        3.7           3.4
Total                      100.0                 100.0      100.0         100.0
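The table can be checked with simple arithmetic: in each column the FS, SS and Unico rates should add up to the "Total covered" row. The short Python check below uses the figures from the table; the dictionary layout and key names are just a convenient representation chosen for the example.

```python
# Coverage rates (%) by source, copied from the table above.
coverage = {
    "FS":    {"units": 16.1, "employees": 38.2, "revenues": 66.2, "value_added": 54.1},
    "SS":    {"units": 64.0, "employees": 49.2, "revenues": 24.5, "value_added": 36.4},
    "Unico": {"units": 16.2, "employees": 8.3, "revenues": 5.5, "value_added": 6.1},
}
total_covered = {"units": 96.3, "employees": 95.7, "revenues": 96.2, "value_added": 96.6}
not_covered = {"units": 3.7, "employees": 4.3, "revenues": 3.7, "value_added": 3.4}

# Sum the three sources column by column, rounding to one decimal as in the table.
by_source_total = {
    column: round(sum(row[column] for row in coverage.values()), 1)
    for column in total_covered
}

# Covered plus not covered, again rounded to one decimal.
grand_total = {
    column: round(total_covered[column] + not_covered[column], 1)
    for column in total_covered
}
```

The source rows reproduce the "Total covered" row in every column. Covered plus not covered gives 100.0 everywhere except Revenues, where the published figures add to 99.9, presumably a rounding effect.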

slide-10
SLIDE 10

Results

Relative differences between total estimates based on administrative and sample data are considered:

\[ d_t = \frac{Y_{Sample} - Y_{Frame}}{Y_{Frame}} \times 100 = d_s + d_m \]

Here d_t is the percentage difference between the variable estimates based on the new estimation system (Y_{Frame}) and the corresponding estimates based on the SME survey (Y_{Sample}). Its two components are:

\[ d_s = \frac{Y_{Frame,Sample} - Y_{Frame}}{Y_{Frame}} \times 100 \quad \text{: sampling effect} \]

\[ d_m = \frac{Y_{Sample} - Y_{Frame,Sample}}{Y_{Frame}} \times 100 \quad \text{: measurement effect} \]

The largest component in the decomposition of the main economic aggregate estimates is the one associated with the sampling error.

This result is encouraging because it implies that the transition from design-based inference to an estimation approach based on administrative sources would result in a significant improvement of the estimate accuracy.
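The decomposition of the relative difference into a sampling effect and a measurement effect can be illustrated numerically. The Python sketch below implements the three ratios with the frame-based total as the common denominator. The input totals are invented, and reading Y_{Frame,Sample} as the estimate obtained from the administrative values of the sampled units is an assumption, since the slides do not define it explicitly.

```python
import math


def decompose(y_sample: float, y_frame: float, y_frame_sample: float) -> tuple:
    """Return (d_t, d_s, d_m): total, sampling and measurement effects, in percent."""
    d_t = (y_sample - y_frame) / y_frame * 100
    d_s = (y_frame_sample - y_frame) / y_frame * 100
    d_m = (y_sample - y_frame_sample) / y_frame * 100
    return d_t, d_s, d_m


# Invented totals, for illustration only.
d_t, d_s, d_m = decompose(y_sample=1060.0, y_frame=1000.0, y_frame_sample=1045.0)

# The two components add up to the total relative difference.
components_add_up = math.isclose(d_t, d_s + d_m)
```

With these invented totals the total difference is about 6%, split into roughly 4.5% for the sampling effect and 1.5% for the measurement effect, mirroring the qualitative finding that the sampling component is the larger one.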

slide-11
SLIDE 11

Concluding remarks…

  • Overcome some limitations of the current statistical production strategy (costs, burden, accuracy)
  • Expected increase of SBS consistency over time
  • Higher levels of consistency between annual statistics on enterprises and National Accounts, starting from the 2011 Benchmark

… and future work

  • Managing unit identification problems over time (splits, mergers, …)
  • Assessing the accuracy of the estimates for the main economic aggregates
  • Improving inferences for some components of the main economic aggregates in specific economic sectors
  • Consistent estimation w.r.t. the frame information in the different domains of statistics on enterprises (R&D, ICT, etc.)

slide-12
SLIDE 12

Thank you for your attention!