Modelling Measurement Error in Administrative and Survey Variables
Sander Scholtus, Bart Bakker, Arnout van Delden (s.scholtus@cbs.nl)
Outline
– Introduction
– Modelling measurement error
‐ structural equation models
‐ identification by means of an audit sample
– Application
‐ VAT data for Dutch quarterly turnover statistics
– Summary / discussion
Introduction
– Quality of administrative data for statistical purposes
‐ coverage of target population, timeliness, …
‐ measurement issues
– Administrative data: possible conceptual differences from statistical target concepts
– Compare admin. data to survey data
‐ previous presentation: survey data = gold standard
‐ current presentation: measurement errors in both sources
Modelling measurement error
– Basic approach:
‐ link administrative data to survey data
‐ allow for measurement errors in both sources
‐ fit a structural equation model (SEM)
‐ latent variables represent “true” concepts
‐ standardised factor loadings reflect validity of measurement
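[Figure: path diagram of the basic model. The latent “true” variable η1 is measured by the observed admin variable y1 and survey variable y2, with loadings λ11 and λ21, intercepts τ1 and τ2 (paths from the constant 1) and measurement errors ε1 and ε2. Legend: latent, observed, constant.]
$$y_1 = \tau_1 + \lambda_{11}\eta_1 + \varepsilon_1$$
$$y_2 = \tau_2 + \lambda_{21}\eta_1 + \varepsilon_2$$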
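A minimal NumPy sketch of the two measurement equations above, with hypothetical parameter values (not estimates from this study) and the added assumptions that η1 is normal and that ε1, ε2 are independent of η1 and of each other. It checks that the simulated admin and survey variables have the model-implied means τ + λκ and covariance matrix φλλ′ + diag(θ), where κ = E(η1), φ = Var(η1) and θ holds the error variances; κ, φ and θ are notation introduced here for illustration.

```python
import numpy as np

# Hypothetical parameter values, for illustration only.
tau = np.array([0.0, 0.1])      # intercepts tau_1 (admin), tau_2 (survey)
lam = np.array([0.9, 1.0])      # loadings lambda_11, lambda_21
theta = np.array([0.05, 0.02])  # error variances Var(eps_1), Var(eps_2)
kappa, phi = 1.0, 0.5           # E(eta_1), Var(eta_1)

rng = np.random.default_rng(42)
n = 200_000
eta = rng.normal(kappa, np.sqrt(phi), size=n)           # latent "true" values
eps = rng.normal(0.0, np.sqrt(theta), size=(n, 2))      # independent measurement errors
y = tau + np.outer(eta, lam) + eps                      # columns: y1 (admin), y2 (survey)

# Model-implied moments of (y1, y2).
mu_implied = tau + lam * kappa
sigma_implied = phi * np.outer(lam, lam) + np.diag(theta)

print("sample means: ", np.round(y.mean(axis=0), 3))
print("implied means:", np.round(mu_implied, 3))
print("sample cov:\n", np.round(np.cov(y, rowvar=False), 3))
print("implied cov:\n", np.round(sigma_implied, 3))
```

The data from y1 and y2 provide only five moments (two means, two variances, one covariance), fewer than the eight parameters τ1, τ2, λ11, λ21, θ1, θ2, κ and φ.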
Modelling measurement error
– Complications: model identification ‘requires’
‐ multiple (≥ 3) related concepts
‐ multiple (≥ 2) observed variables for each concept
‐ choice of a metric for each latent variable (for evaluating bias)
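[Figure: path diagram of a model with three related latent variables: ξ1 (indicators x1, x2; errors δ1, δ2), η1 (indicators y1, y2; errors ε1, ε2) and η2 (indicators y3, y4; errors ε3, ε4), where η1 and η2 have disturbances ζ1 and ζ2. Legend: latent, observed, constant.]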
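The identification requirements above can be made plausible with the counting rule (t-rule): a model with more free parameters than observed means, variances and covariances cannot be identified. The plain-Python sketch below applies that rule under simplifying assumptions (each indicator loads on one latent variable, uncorrelated errors, all latent means, variances and covariances free, one reference indicator per latent variable). The function name is hypothetical, and the rule is necessary but not sufficient, so it does not reproduce the conditions on the slide exactly.

```python
def count_moments_and_parameters(indicators_per_factor, reference_indicator=True):
    """Counting-rule check for a linear factor model with a mean structure.

    Assumes each indicator loads on exactly one factor, errors are uncorrelated,
    and all latent means, variances and covariances are free parameters.
    """
    p = sum(indicators_per_factor)        # number of observed variables
    k = len(indicators_per_factor)        # number of latent variables
    moments = p + p * (p + 1) // 2        # means + distinct variances/covariances
    params = 3 * p                        # intercept, loading, error variance per indicator
    params += k + k * (k + 1) // 2        # latent means + latent (co)variances
    if reference_indicator:
        params -= 2 * k                   # fix tau = 0 and lambda = 1 for one indicator per factor
    return moments, params


for design in ([2], [3], [2, 2, 2]):
    m, q = count_moments_and_parameters(design)
    status = "passes the count" if q <= m else "not identified"
    print(design, f"moments={m}, free parameters={q}: {status}")
```

Under these assumptions one concept with two indicators has 6 free parameters but only 5 moments, whereas three concepts with two indicators each have 21 free parameters for 27 moments.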
Modelling measurement error
– Standard identification solutions yield ‘arbitrary’ metrics:
‐ reference indicators [e.g., τ1 = 0 and λ11 = 1]
‐ standardise latent variables [E(η1) = 0 and Var(η1) = 1]
– Alternative solution: calibration
‐ collect additional gold-standard data for a random subsample (audit sample / verification study)
‐ simulation results suggest that an audit sample of 50 units is sufficient
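To see why the reference-indicator metric listed above is ‘arbitrary’, the NumPy sketch below takes hypothetical parameters in which the admin variable y1 is biased (τ1 = 0.2, λ11 = 0.8), rewrites the model in the metric η* = τ1 + λ11η1 (so that τ1 = 0 and λ11 = 1), and checks that both parameterisations imply identical means and covariances. The symbols κ = E(η1) and φ = Var(η1) and the helper name are introduced for this example only.

```python
import numpy as np

def implied_moments(tau, lam, theta, kappa, phi):
    """Mean vector and covariance matrix implied by y = tau + lam * eta + eps."""
    tau, lam, theta = map(np.asarray, (tau, lam, theta))
    return tau + lam * kappa, phi * np.outer(lam, lam) + np.diag(theta)

# Hypothetical "true" parameters: y1 (admin) is biased, y2 (survey) has loading 1.
tau, lam, theta = [0.2, 0.1], [0.8, 1.0], [0.05, 0.02]
kappa, phi = 1.0, 0.5

# Same model in the reference-indicator metric eta* = tau1 + lambda11 * eta1.
lam_ref = [1.0, lam[1] / lam[0]]
tau_ref = [0.0, tau[1] - lam[1] * tau[0] / lam[0]]
kappa_ref = tau[0] + lam[0] * kappa
phi_ref = lam[0] ** 2 * phi

mu_a, sigma_a = implied_moments(tau, lam, theta, kappa, phi)
mu_b, sigma_b = implied_moments(tau_ref, lam_ref, theta, kappa_ref, phi_ref)
print(np.allclose(mu_a, mu_b), np.allclose(sigma_a, sigma_b))  # True True
```

Because the data cannot distinguish the two versions, fixing τ1 = 0 and λ11 = 1 makes y1 appear unbiased by construction.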
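$$y_1 = \tau_1 + \lambda_{11}\eta_1 + \varepsilon_1$$
$$y_2 = \tau_2 + \lambda_{21}\eta_1 + \varepsilon_2$$
$$y_3 = \eta_1$$
[Figure: path diagram of the calibrated model. The latent “true” variable η1 is measured by y1 (admin) and y2 (survey) with loadings λ11, λ21 and intercepts τ1, τ2, and by y3 (audit) with loading fixed at 1, intercept 0 and no measurement error.]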
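The audit equation y3 = η1 above is what ties the latent variable to the true metric. The NumPy sketch below simulates hypothetical data, draws a random audit subsample of 50 units in which the true value is observed without error, and recovers τ and λ for y1 and y2 by ordinary least squares within the audit sample. It is a simplified illustration of why calibration works, not the SEM estimator used in this presentation, which combines the audit sample with all other units.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_audit = 2_000, 50

# Hypothetical true parameters for y1 (admin) and y2 (survey).
tau = np.array([0.2, 0.1])
lam = np.array([0.8, 1.0])
theta = np.array([0.05, 0.02])

eta = rng.normal(1.0, np.sqrt(0.5), size=n)             # true values eta_1
y = tau + np.outer(eta, lam) + rng.normal(0.0, np.sqrt(theta), size=(n, 2))

# Audit sample: simple random subsample where y3 = eta_1 is observed without error.
audit = rng.choice(n, size=n_audit, replace=False)
y3 = eta[audit]

# Regress y1 and y2 on y3 within the audit sample: because y3 carries no error,
# the OLS intercepts and slopes estimate tau and lambda in the metric of the truth.
X = np.column_stack([np.ones(n_audit), y3])
coef, *_ = np.linalg.lstsq(X, y[audit], rcond=None)
tau_hat, lam_hat = coef[0], coef[1]

print("tau:    true", tau, "estimated", np.round(tau_hat, 2))
print("lambda: true", lam, "estimated", np.round(lam_hat, 2))
```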
Application: VAT data
– Dutch quarterly turnover statistics
– Main question: is VAT turnover fit for use?
‐ base cells in car trade and transport sector
‐ tax regulations exist, previous analysis inconclusive
‐ large and complex units excluded
– Sources of data:
‐ Business Register (BR)
‐ Profit Declarations (PD; admin. source)
‐ VAT data (admin. source)
‐ Structural Business Statistics (SBS; sample survey)
‐ Audit sample: re-edited SBS data (50 units per base cell)
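The audit sample above consists of 50 re-edited SBS units per base cell. Assuming these are drawn as a simple random sample within each base cell (the slides only state that the subsample is random), the selection step could be sketched as follows; the function name, the (unit_id, base_cell) input format and the example data are hypothetical.

```python
import random
from collections import defaultdict

def draw_audit_sample(units, per_cell=50, seed=2024):
    """Simple random sample of up to `per_cell` units within each base cell.

    `units` is a list of (unit_id, base_cell) pairs; names and format are hypothetical.
    """
    by_cell = defaultdict(list)
    for unit_id, cell in units:
        by_cell[cell].append(unit_id)
    rng = random.Random(seed)
    return {
        cell: sorted(rng.sample(ids, min(per_cell, len(ids))))
        for cell, ids in sorted(by_cell.items())
    }

# Hypothetical SBS units in two base cells.
units = [(i, "45112") for i in range(120)] + [(1000 + i, "other") for i in range(30)]
sample = draw_audit_sample(units)
print({cell: len(ids) for cell, ids in sample.items()})  # {'45112': 50, 'other': 30}
```

Cells with fewer than 50 units are taken in full in this sketch; under simple random sampling within a cell, the inclusion probability per_cell / cell size determines the audit units' design weights.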
Application: VAT data
– Model:
(SBS data removed to avoid multicollinearity with audit data)
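[Figure: path diagram of the model with four latent variables, each measured by several sources: number of employees (BR, SBS, audit), purchases (PD, SBS, audit), total costs (PD, SBS, audit) and turnover (PD, VAT, audit, SBS).]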
Application: VAT data
– Model estimation
‐ used pseudo maximum likelihood (PML) to account for
- complex survey design (SBS + audit sample)
- skewness of the data
‐ examined data transformations:
- variables on original scale
- variables divided by number of legal units (heteroscedasticity)
‐ used R packages lavaan and lavaan.survey
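Pseudo maximum likelihood for SEMs is commonly described as plugging design-weighted sample moments into the normal-theory ML fit function and using design-based (robust) standard errors. The NumPy sketch below illustrates only the point-estimation ingredients under that description, for the covariance part: design-weighted moments after dividing by the number of legal units, and the ML discrepancy F evaluated at those moments. Data, weights and function names are hypothetical; this is not the lavaan.survey implementation, and the design-based variance step is omitted.

```python
import numpy as np

def weighted_moments(Y, w):
    """Design-weighted mean vector and covariance matrix (weights rescaled to sum to 1)."""
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mu = w @ Y
    centred = Y - mu
    sigma = (centred * w[:, None]).T @ centred  # sum_i w_i (y_i - mu)(y_i - mu)'
    return mu, sigma

def ml_discrepancy(S, Sigma):
    """Normal-theory ML fit function F = log|Sigma| - log|S| + tr(S Sigma^-1) - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(S @ np.linalg.inv(Sigma)) - p

# Hypothetical units: turnover, total costs, number of legal units and design weights.
turnover = np.array([12.0, 3.5, 40.0, 7.2, 18.0])
costs = np.array([10.5, 3.0, 36.0, 6.1, 16.2])
legal_units = np.array([2, 1, 5, 1, 3])
weights = np.array([4.0, 10.0, 1.0, 8.0, 2.5])

# One of the transformations examined: divide each variable by the number of legal units.
Y = np.column_stack([turnover, costs]) / legal_units[:, None]
mu_w, S_w = weighted_moments(Y, weights)

# A PML-type point estimate would minimise F(S_w, Sigma(theta)) over the SEM parameters;
# here F is only evaluated for the saturated case Sigma = S_w, where F = 0.
print("weighted means:", np.round(mu_w, 3))
print("F at Sigma = S_w:", round(ml_discrepancy(S_w, S_w), 12))
```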
Application: VAT data
Results for NACE 45112 (“Sale/repair of passenger cars”)
[Figure: estimated path diagram for the model without the SBS indicators: number of employees (BR, audit), purchases (PD, audit), total costs (PD, audit) and turnover (PD, VAT, audit), annotated with the parameter estimates.]
Robust (PML) fit measures: χ² = 66 (df = 47, p = 0.03); CFI = 0.998; TLI = 0.999; RMSEA = 0.032
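The χ² p-value and RMSEA above follow standard formulas. The pure-Python sketch below computes the upper-tail χ² probability from the series expansion of the regularised incomplete gamma function and the usual RMSEA point estimate √(max(χ² − df, 0) / (df(n − 1))). The sample size n is not given on this slide, so the value used is hypothetical, and software packages differ in details (for example n versus n − 1, and how robust, scaled statistics enter).

```python
import math

def chi2_sf(x, df, tol=1e-15, max_iter=10_000):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the series for the regularised lower incomplete gamma function P(df/2, x/2);
    adequate for moderate x and df such as those in a model-fit table.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = total = 1.0 / a
    denom = a
    for _ in range(max_iter):
        denom += 1.0
        term *= z / denom
        total += term
        if term < tol * total:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def rmsea(chi2, df, n):
    """RMSEA point estimate sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

n_hypothetical = 400  # not reported on this slide; illustration only
print("p-value:", round(chi2_sf(66, 47), 3))
print("RMSEA (hypothetical n):", round(rmsea(66, 47, n_hypothetical), 3))
```

With χ² = 66 and df = 47 the computed p-value is close to the reported p = 0.03 (χ² itself is rounded on the slide).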
Application: VAT data
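– Result from SEM on previous slide:
$$\text{Turnover(VAT)} = -0.02 + 0.80 \times \text{Turnover(true)} + \varepsilon$$
– Derive a correction formula through a second SEM:
[Figure: path diagram of the second SEM. True turnover is measured by PD turnover (λ* = 1.03, τ* = –0.01, θ* = 0.06, error ε) and regressed on observed VAT turnover (intercept α, slope β, disturbance ζ).]
$$\text{Turnover(true)} = 0.18 + 1.13 \times \text{Turnover(VAT)} + \zeta$$
(σ = 0.08) (σ = 0.06) (R² = 0.90)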
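The two formulas above can be applied directly to VAT turnover values. The plain-Python sketch below uses the reported coefficients; the units of the (possibly transformed) model variables are not stated on the slide, so the example inputs are hypothetical, as are the function names. For comparison it also inverts the first SEM algebraically (slope 1/0.80 = 1.25); that inversion and the regression-based correction formula (slope 1.13) need not coincide, because predicting true turnover from error-prone VAT turnover is not the same as inverting the measurement equation.

```python
# Coefficients as reported on the slide (units of the model variables are not stated).
FIRST_SEM = {"intercept": -0.02, "loading": 0.80}  # Turnover(VAT) = -0.02 + 0.80 * Turnover(true) + eps
CORRECTION = {"intercept": 0.18, "slope": 1.13}    # Turnover(true) = 0.18 + 1.13 * Turnover(VAT) + zeta

def corrected_turnover(vat):
    """Predict true turnover from VAT turnover with the second-SEM correction formula."""
    return CORRECTION["intercept"] + CORRECTION["slope"] * vat

def inverted_first_sem(vat):
    """Naive algebraic inversion of the first SEM, dropping the error term (for comparison)."""
    return (vat - FIRST_SEM["intercept"]) / FIRST_SEM["loading"]

for vat in (0.5, 1.0, 2.0):  # hypothetical VAT turnover values
    print(f"VAT={vat}: corrected={corrected_turnover(vat):.3f}, inverted={inverted_first_sem(vat):.3f}")
```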