Modelling Measurement Error in Administrative and Survey Variables



SLIDE 1

Sander Scholtus, Bart Bakker, Arnout van Delden (s.scholtus@cbs.nl)

Modelling Measurement Error in Administrative and Survey Variables

SLIDE 2

Outline

– Introduction
– Modelling measurement error
  ‐ structural equation models
  ‐ identification by means of an audit sample
– Application
  ‐ VAT data for Dutch quarterly turnover statistics
– Summary / discussion

SLIDE 3

Introduction

– Quality of administrative data for statistical purposes
  ‐ coverage of target population, timeliness, …
  ‐ measurement issues
– Administrative data: possible conceptual differences
– Compare admin. data to survey data
  ‐ previous presentation: survey data = gold standard
  ‐ current presentation: measurement errors in both sources

SLIDE 4

Modelling measurement error

– Basic approach:
  ‐ link administrative data to survey data
  ‐ allow for measurement errors in both sources
  ‐ fit a structural equation model (SEM)
  ‐ latent variables represent “true” concepts
  ‐ standardised factor loadings reflect validity of measurement

y1 = τ1 + λ11·η1 + ε1
y2 = τ2 + λ21·η1 + ε2

[Path diagram: latent variable η1 (true) measured by the observed variables y1 (admin) and y2 (survey), with loadings λ11 and λ21, intercepts τ1 and τ2 from the constant 1, and errors ε1 and ε2. Legend: latent, observed, constant.]
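To make the link between standardised loadings and validity concrete, here is a minimal sketch. It assumes the measurement equation y = τ + λ·η + ε with ε uncorrelated with η, in which case the standardised loading equals the correlation between the observed and the true variable. The function name and all numerical values are hypothetical and are not taken from the presentation.

```python
import math


def standardised_loading(lam, var_eta, var_eps):
    """Standardised loading for y = tau + lam * eta + eps.

    Assumes eps is uncorrelated with eta, so that
    Var(y) = lam^2 * Var(eta) + Var(eps).
    """
    var_y = lam ** 2 * var_eta + var_eps
    return lam * math.sqrt(var_eta) / math.sqrt(var_y)


# Hypothetical parameter values, for illustration only.
admin_validity = standardised_loading(lam=0.8, var_eta=4.0, var_eps=0.36)
survey_validity = standardised_loading(lam=1.0, var_eta=4.0, var_eps=1.0)
print(round(admin_validity, 3), round(survey_validity, 3))
```

In this made-up example the admin variable has the smaller unstandardised loading but the higher validity, because its error variance is smaller relative to the signal.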

SLIDE 5

Modelling measurement error

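– Complications: model identification ‘requires’
  ‐ multiple (≥ 3) related concepts
  ‐ multiple (≥ 2) observed variables for each concept
  ‐ choice of a metric for each latent variable (for evaluating bias)

[Path diagram: three related latent variables η1, η2 and ξ1, each measured by two observed variables (y1, y2 with errors ε1, ε2; y3, y4 with errors ε3, ε4; x1, x2 with errors δ1, δ2), with disturbances ζ1 and ζ2. Legend: latent, observed, constant.]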
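One way to see why a single concept with two indicators is not enough is a simple counting check, often called the t-rule: a model cannot be identified if it has more free parameters than distinct observed means, variances and covariances. The sketch below is a necessary condition only, not a sufficient one. The parameter counts are my own tally under stated assumptions (reference-indicator metric, freely correlated latent variables), not figures from the presentation.

```python
def n_moments(p, with_means=True):
    """Distinct variances/covariances (and optionally means) of p observed variables."""
    return p * (p + 1) // 2 + (p if with_means else 0)


def passes_t_rule(p, n_free_params, with_means=True):
    """Necessary (not sufficient) condition for identification."""
    return n_free_params <= n_moments(p, with_means)


# One concept, two indicators, no metric fixed yet (assumed tally).
single = dict(intercepts=2, loadings=2, error_variances=2,
              latent_mean=1, latent_variance=1)

# Three concepts with two indicators each; per concept one intercept is
# fixed to 0 and one loading to 1 (reference indicator), assumed tally.
three = dict(intercepts=3, loadings=3, error_variances=6,
             latent_means=3, latent_covariances=6)

print(passes_t_rule(2, sum(single.values())))  # False: 8 parameters, 5 moments
print(passes_t_rule(6, sum(three.values())))   # True: 21 parameters, 27 moments
```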

SLIDE 6

Modelling measurement error

– Standard identification solutions yield ‘arbitrary’ metrics:
  ‐ reference indicators [e.g., τ1 = 0 and λ11 = 1]
  ‐ standardise latent variables [E(η1) = 0 and Var(η1) = 1]
– Alternative solution: calibration
  ‐ collect additional gold standard data for a random subsample (audit sample / verification study)
  ‐ simulation results suggest: audit sample of 50 units is sufficient

y1 = τ1 + λ11·η1 + ε1
y2 = τ2 + λ21·η1 + ε2
y3 = η1

[Path diagram: latent variable η1 (true) measured by y1 (admin) and y2 (survey) with loadings λ11 and λ21, intercepts τ1 and τ2, and errors ε1 and ε2, plus a third observed variable y3 (audit) that measures η1 with loading 1 and without error.]
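The idea behind calibration can be illustrated with a deliberately simplified sketch. If the audit value y3 equals the true value η1, then regressing an observed variable on the audit value within the audit sample recovers its intercept τ and loading λ in the metric of the true variable. Note that this is a stand-in technique: the presentation adds the audit variable to the SEM as an extra indicator rather than running a separate regression. The function name and the simulated data are hypothetical.

```python
import numpy as np


def calibrate_on_audit(y_obs, y_audit):
    """OLS of an observed variable on error-free audit values.

    Returns (tau, lam) for y_obs = tau + lam * y_audit + error.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_audit = np.asarray(y_audit, dtype=float)
    lam, tau = np.polyfit(y_audit, y_obs, deg=1)  # highest degree first
    return float(tau), float(lam)


# Hypothetical audit sample of 50 units.
rng = np.random.default_rng(0)
eta = rng.normal(10.0, 2.0, size=50)                        # audit values, y3 = eta1
y1 = -0.5 + 0.9 * eta + rng.normal(0.0, 0.3, size=50)       # admin variable
tau1, lam11 = calibrate_on_audit(y1, eta)
print(f"tau1 = {tau1:.2f}, lambda11 = {lam11:.2f}")
```

With the metric fixed by the audit values, an intercept different from 0 or a loading different from 1 can be read as bias rather than as an artefact of an arbitrary scale.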

SLIDE 7

Application: VAT data

– Dutch quarterly turnover statistics
– Main question: VAT turnover fit for use?
  ‐ base cells in car trade and transport sector
  ‐ tax regulations exist, previous analysis inconclusive
  ‐ large and complex units excluded
– Sources of data:
  ‐ Business Register (BR)
  ‐ Profit Declarations (PD; admin. source)
  ‐ VAT data (admin. source)
  ‐ Structural Business Statistics (SBS; sample survey)
  ‐ Audit sample: re-edited SBS data (50 units per base cell)

SLIDE 8

Application: VAT data

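– Model:

[Path diagram: four related latent variables, each with its observed indicators]
  ‐ No. of employees: BR, SBS, audit
  ‐ Purchase: PD, SBS, audit
  ‐ Total costs: PD, SBS, audit
  ‐ Turnover: PD, VAT, SBS, audit

(SBS data removed to avoid multicollinearity with audit data)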

SLIDE 9

Application: VAT data

– Model estimation
  ‐ used Pseudo Maximum Likelihood (PML) to account for
    • complex survey design (SBS + audit sample)
    • skewness of the data
  ‐ examined data transformations:
    • variables on original scale
    • variables divided by number of legal units (heteroscedasticity)
  ‐ used R packages lavaan and lavaan.survey

SLIDE 10

Application: VAT data

Results for NACE 45112 (“Sale/repair of passenger cars”)

[Path diagram with parameter estimates: latent variables No. of employees (BR, audit), Purchase (PD, audit), Total costs (PD, audit) and Turnover (PD, VAT, audit). Values as extracted from the diagram, in extraction order: 1 1 1 1 1 1 1 1 1 0.87 1 1 1 1 1.04 1.05 1.03 0.80 0.05 1.03 1.02 55 1.02 –0.02 –0.02 –0.01 –0.02 3.31 1.21 0.02 0.03]

Robust (PML) fit measures: χ² = 66 (df = 47, p = 0.03); CFI = 0.998; TLI = 0.999; RMSEA = 0.032

SLIDE 11

Application: VAT data

– Result from SEM on previous slide:

  Turnover(VAT) = –0.02 + 0.80 × Turnover(true) + ε

– Derive a correction formula through a second SEM:

  Turnover(true) = 0.18 + 1.13 × Turnover(VAT) + ζ
  (σ = 0.08) (σ = 0.06) (R² = 0.90)

[Path diagram: latent Turnover regressed on observed VAT (intercept α, slope β, disturbance ζ) and measured by PD with τ* = –0.01, λ* = 1.03, θ* = 0.06 and error ε.]
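As an illustration, the sketch below applies the correction formula from this slide and contrasts it with simply inverting the measurement equation. The coefficients are the ones reported on the slide. The function names are hypothetical, and the input is assumed to be on the same scale as the variables used in the model, which this transcript does not state.

```python
def corrected_turnover(vat_turnover):
    """Correction formula from the second SEM: 0.18 + 1.13 * Turnover(VAT)."""
    return 0.18 + 1.13 * vat_turnover


def naive_inverse(vat_turnover):
    """Algebraic inverse of Turnover(VAT) = -0.02 + 0.80 * Turnover(true)."""
    return (vat_turnover + 0.02) / 0.80


vat = 1.0  # hypothetical VAT turnover on the model's scale
print(round(corrected_turnover(vat), 3), round(naive_inverse(vat), 3))
```

The two predictions differ because, in the presence of measurement error, a regression of the true value on the observed value is not the algebraic inverse of the measurement equation. In a linear model the regression slope is roughly the inverse loading multiplied by R², and 1.25 × 0.90 ≈ 1.13 is consistent with the reported slope.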

SLIDE 12

Summary / discussion

– Can assess validity and bias of admin. data with SEMs
– Advantages over direct comparison to survey data:
  ‐ allow for measurement errors in all sources
  ‐ objective evaluation of measurement quality
– Possible disadvantages:
  ‐ need multiple related concepts
  ‐ need an audit sample to identify bias
– Suggestion: apply a multi-stage approach
  1. Make a direct comparison to survey data (linear regression)
  2. If inconclusive, determine validity with SEM approach
  3. If validity high, collect audit sample to estimate bias as well
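Step 1 of the suggested approach could look like the minimal sketch below: an ordinary least-squares regression of the admin variable on the linked survey variable. The direction of the regression, the function name and the returned summary are assumptions for illustration. The presentation does not specify them, nor any thresholds for calling the comparison inconclusive.

```python
import numpy as np


def direct_comparison(admin, survey):
    """Regress admin values on linked survey values; return intercept, slope, R^2."""
    admin = np.asarray(admin, dtype=float)
    survey = np.asarray(survey, dtype=float)
    slope, intercept = np.polyfit(survey, admin, deg=1)  # highest degree first
    r = np.corrcoef(survey, admin)[0, 1]
    return {"intercept": float(intercept), "slope": float(slope), "r2": float(r ** 2)}


# Hypothetical linked records, for illustration only.
survey_values = [1.0, 2.0, 3.0, 4.0, 5.0]
admin_values = [1.1, 1.9, 3.2, 3.9, 5.1]
print(direct_comparison(admin_values, survey_values))
```

A slope near 1, an intercept near 0 and a high R² would suggest close agreement. Because this comparison treats the survey as error-free, it cannot by itself separate errors in the admin source from errors in the survey, which is why the later SEM stages are proposed.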