Analysing whether sample survey data can be replaced by administrative data



SLIDE 1

Arnout van Delden (a.vandelden@cbs.nl), Reinder Banning, Arjen de Boer and Jeroen Pannekoek

Analysing whether sample survey data can be replaced by administrative data

SLIDE 2

Outline

  • 1. Understanding fitness for use
  • 2. Conceptual differences
  • 3. Numerical differences
  • 4. Discussion
SLIDE 3

1. Understanding fitness for use

Concepts admin. data:

  • Numerous rules
  • Differ by type of industry

Case study:

  • 2011 new production system
  • Levels and growth rates
  • Can VAT be used for turnover?
  • 324 “base cells” for publication
SLIDE 4

1. Fitness for use: group the base cells

Taking decisions

Group     Target vs. administrative variable
Control   No conceptual differences
Accept    Conceptual differences and small numerical differences
Adjust    Conceptual differences and substantial systematic numerical differences
Reject    Conceptual differences and substantial non-systematic numerical differences
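The decision table above can be read as a simple lookup. The following is a minimal sketch, not the authors' implementation; the function name `assign_group` and the labels "small", "systematic" and "non-systematic" are hypothetical names chosen here for the three numerical outcomes in the table.

```python
def assign_group(conceptual_diff: bool, numerical_diff: str) -> str:
    """Return the group of a base cell according to the decision table.

    conceptual_diff: True if target and administrative variable differ conceptually.
    numerical_diff:  "small", "systematic" or "non-systematic" (hypothetical labels).
    """
    if not conceptual_diff:
        # No conceptual differences: the cell serves as Control,
        # whatever the numerical comparison shows.
        return "Control"
    groups = {
        "small": "Accept",           # small numerical differences
        "systematic": "Adjust",      # substantial, systematic differences
        "non-systematic": "Reject",  # substantial, non-systematic differences
    }
    return groups[numerical_diff]


example = assign_group(conceptual_diff=True, numerical_diff="systematic")
```

In this sketch `example` is "Adjust". How "small" and "systematic" are established numerically is left open here.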

How to assign the base cells to the groups?

SLIDE 5

2. Conceptual differences; find Control

Base cells   Unique (set) of rules                                  Expected effect
85           No regulation                                          VAT = T
64           Foreign services not charged from 2010                 VAT < T
35           International trade regulations, correctly derived     VAT ≈ T
18           Subcontractors shift VAT payment to main contractor;   VAT ≈ T
             foreign turnover not charged
17           Derogation: certain economic activities not charged    VAT ≪ T
16           Subcontractors shift VAT payment to main contractor    VAT ≈ T
89           21 other sets of rules (not specified)
324          Total

SLIDE 6

3. Numerical differences: the data

Yearly turnover: 2009, 2010

  • SBS and VAT
  • Linked at micro level
  • Units exist whole year
  • Extremely small units excluded

Hotels and similar accommodation
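The data preparation listed on this slide (micro-level linkage, units that exist the whole year, very small units excluded) could look roughly as follows. This is a sketch under assumptions: the record layout, the `whole_year` set and the `min_turnover` cut-off are all hypothetical, since the slides do not say how "extremely small" is defined.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, int]  # (unit id, year); hypothetical record key


def link_sbs_vat(sbs: Dict[Key, float], vat: Dict[Key, float],
                 whole_year: Set[Key], min_turnover: float) -> List[dict]:
    """Link SBS and VAT turnover per unit and year, then apply the two filters."""
    linked = []
    for key in sorted(sbs.keys() & vat.keys()):   # micro-level link on (unit, year)
        if key not in whole_year:                 # unit must exist the whole year
            continue
        if min(sbs[key], vat[key]) < min_turnover:  # drop extremely small units
            continue
        unit, year = key
        linked.append({"unit": unit, "year": year,
                       "sbs": sbs[key], "vat": vat[key]})
    return linked


# Illustrative values only, not taken from the presentation.
sbs = {("A", 2009): 100.0, ("B", 2009): 0.5, ("C", 2009): 50.0, ("D", 2010): 70.0}
vat = {("A", 2009): 95.0, ("B", 2009): 0.4, ("C", 2009): 55.0}
whole_year = {("A", 2009), ("B", 2009), ("D", 2010)}
linked = link_sbs_vat(sbs, vat, whole_year, min_turnover=1.0)
```

Only unit A survives here: B is too small, C does not exist the whole year, and D has no VAT record.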

SLIDE 7

3. Numerical data: the model

Linear regression:

$$y_{ki}^{t} = \alpha_k + d\alpha_k\,\delta_{ki}^{t} + \left(\beta_k + d\beta_k\,\delta_{ki}^{t}\right) x_{ki}^{t} + \varepsilon_{ki}^{t}$$

SBS ($y$) and VAT ($x$) for base cell ($k$), unit ($i$), year ($t$) and year dummy ($\delta_{ki}^{t}$)

Regression weights
  • calibration weights (sample to population)
  • weighted residuals (heteroscedasticity)
  • M-estimator (Huber weights against outliers)
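An outlier-robust fit of a regression with an intercept, a slope and year-dummy shifts of both can be sketched with iteratively reweighted least squares. This is an illustration under assumptions, not the authors' estimator: the tuning constant 1.345, the scale estimate (median absolute residual divided by 0.6745), the use of raw residuals for the Huber weights and all function names are choices made here. `base_w` stands for whatever product of calibration and heteroscedasticity weights is used.

```python
from statistics import median


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design(x, dummy):
    """Design rows [1, dummy, x, dummy * x]: intercept, its shift, slope, its shift."""
    return [[1.0, d, xi, d * xi] for xi, d in zip(x, dummy)]


def wls(rows, y, w):
    """Weighted least squares via the normal equations."""
    p = len(rows[0])
    xtwx = [[sum(wi * r[a] * r[b] for r, wi in zip(rows, w)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(wi * r[a] * yi for r, yi, wi in zip(rows, y, w)) for a in range(p)]
    return solve(xtwx, xtwy)


def fit_huber(x, y, dummy, base_w, c=1.345, n_iter=50):
    """IRLS with Huber weights applied on top of the base weights (assumed scheme)."""
    rows = design(x, dummy)
    coef = wls(rows, y, list(base_w))
    for _ in range(n_iter):
        res = [yi - sum(cj * rj for cj, rj in zip(coef, r)) for r, yi in zip(rows, y)]
        scale = median(abs(e) for e in res) / 0.6745
        if scale == 0:
            break
        # Huber weight: 1 for small residuals, c * scale / |residual| for large ones.
        w = [bw * min(1.0, c * scale / abs(e)) if e else bw
             for bw, e in zip(base_w, res)]
        coef = wls(rows, y, w)
    return coef  # [intercept, intercept shift, slope, slope shift]


# Illustrative data: true slope 1.0 in the first year, with one gross outlier.
x09 = [float(v) for v in range(1, 11)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.2]
y09 = [2.0 + xi + e for xi, e in zip(x09, noise)]
y09[-1] += 48.0
y10 = [2.5 + 1.2 * xi - e for xi, e in zip(x09, noise)]
x, y = x09 + x09, y09 + y10
dummy = [0.0] * 10 + [1.0] * 10
ones = [1.0] * 20
ols = wls(design(x, dummy), y, ones)
robust = fit_huber(x, y, dummy, ones)
```

On this toy data the single outlier pulls the ordinary least-squares slope far from 1, while the Huber-weighted slope stays close to it.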

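SLIDE 8

3. Numerical data: indicators for grouping

Indicator and description, with $y$ the SBS turnover and $\hat{y}$ its fitted value for base cell $k$, unit $i$ and year $t$:

  • $R_k^2 = 1 - \dfrac{SS(w)_{k,\mathrm{res}}}{SS(w)_{k,\mathrm{tot}}}$: coefficient of determination, with regression weights $w$
  • $M_k(\hat{y}, y) = \dfrac{\sum_{t}\sum_{i} d_{ki}^{t}\,\lvert \hat{y}_{ki}^{t} - y_{ki}^{t} \rvert}{\sum_{t}\sum_{i} d_{ki}^{t}\,(\hat{y}_{ki}^{t} + y_{ki}^{t})}$: MAPE, mean absolute percentage error, with calibration weights $d$
  • $\alpha_k,\ d\alpha_k,\ \beta_k,\ d\beta_k$: size and p-values of regression coefficients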
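The first two indicators can be computed as below. This is a sketch: the function names are hypothetical, and the total sum of squares is taken around the weighted mean, which is an assumption not stated on the slide.

```python
def weighted_r2(y, y_hat, w):
    """Coefficient of determination with regression weights w."""
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)  # weighted mean (assumed)
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    ss_tot = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot


def weighted_mape(y, y_hat, d):
    """Weighted absolute error relative to the weighted sum of fitted plus observed."""
    num = sum(di * abs(fi - yi) for di, yi, fi in zip(d, y, y_hat))
    den = sum(di * (fi + yi) for di, yi, fi in zip(d, y, y_hat))
    return num / den


# Illustrative values only.
y = [10.0, 20.0, 30.0]
y_hat = [12.0, 18.0, 30.0]
unit_weights = [1.0, 1.0, 1.0]
r2 = weighted_r2(y, y_hat, unit_weights)      # 1 - 8/200
mape = weighted_mape(y, y_hat, unit_weights)  # 4/120
```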

SLIDE 9

Indicators for Reject

$R_k^2$: 20 poorest base cells
  • Sales partly not charged (19)
  • International Trade (1)

[Figure: $R_k^2$, with "← 95% range Control →" marked; labels: "R", "Sea and coastal passenger water transport"]

SLIDE 10

Indicators for group Accept & Adjust

[Figure: slope 2009, with "← 95% range Control →" marked; label: "Import of new passenger motor vehicles"]

SLIDE 11

Conceptual and numerical result in line?

Adjust? Expected effect VAT < T

Base cell   Number of points   Slope (2009)   Change of slope? (2010)   Regulation
45112       1742               1.36           -0.01                     Margin
45402       31                 1.34           NA                        Margin
45194       42                 1.17           0.05                      Margin
45111       55                 1.16           -0.03                     Margin
45191X      210                1.08           -0.04                     Margin
47641       59                 1.02           0.09                      Different moment, Margin
47790       88                 0.99           1.86                      Margin
45320       35                 0.94           0.09                      Margin

SLIDE 12

4. Discussion

Main findings
  • Use outlier-robust regression and indicators
  • The control group is also not error-free (deviations from 1:1)
  • We could not use the significance of regression coefficients
  • Instead, we used the 95% range from the control group
  • We achieved a rough grouping by re-using existing data

Discussion points
  • Some base cells get no decision: conceptual ≠ numerical results
  • Limitation: requires the presence of a control group
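The idea of judging a base cell against the 95% range of the control group can be sketched as follows. The slides do not say how the range was computed; taking the 2.5% and 97.5% percentiles of the control values is an assumption made here, and all names and numbers are illustrative.

```python
def percentile(values, q):
    """Percentile by linear interpolation between order statistics, q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def control_range(control_values, coverage=0.95):
    """Central range of an indicator over the control base cells."""
    tail = (1.0 - coverage) / 2.0
    return percentile(control_values, tail), percentile(control_values, 1.0 - tail)


def outside_control_range(cells, control_values):
    """Flag each base cell whose indicator falls outside the control range."""
    low, high = control_range(control_values)
    return {cell: not (low <= value <= high) for cell, value in cells.items()}


# Hypothetical slopes for control cells and for two cells under study.
control_slopes = [0.92, 0.95, 0.97, 0.98, 0.99, 1.00,
                  1.00, 1.01, 1.02, 1.03, 1.05, 1.08]
low, high = control_range(control_slopes)
flags = outside_control_range({"cell_a": 1.36, "cell_b": 1.02}, control_slopes)
```

Here `cell_a` is flagged as outside the control range and `cell_b` is not. A flag only says the cell behaves differently from the control group; whether it belongs in Adjust or Reject still depends on the other indicators.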