slide-1
SLIDE 1

Probability Sampling Approach to Editing

Maiki Ilves¹, Prof. Thomas Laitila²

¹ Department of Statistics, Örebro University, Sweden

² Department of Statistics, Örebro University and Statistics Sweden

slide-2
SLIDE 2

Introduction

The role of editing:

  1. To assess the quality of the data
  2. To improve the survey by identifying error sources
  3. To correct errors


slide-3
SLIDE 3

Different ways of editing

  • Traditional micro-editing
  • Automated editing
  • Selective editing
  • Macro-editing


slide-8
SLIDE 8

Selective editing - 1

Purpose: prioritize suspicious responses according to their influence on the survey estimates and edit only the most influential responses.

Three stages:

  1. Find suspicious responses
     • editing rules
  2. Prioritize
     • score function, i.e. a function of the measured value and the expected amended value; local score, global score
  3. Determine the cut-off point
     • in a simulation study based on a fully edited dataset
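
The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the range edit rule, the expected value of 5, the cut-off of 5 and all function names are assumptions made for the example, and ranking by the signed score (rather than its absolute value) is also an assumption.

```python
from typing import Sequence


def flag_suspicious(values: Sequence[float], lower: float, upper: float) -> list[int]:
    """Stage 1: a hypothetical range edit rule; returns the indices that fail it."""
    return [k for k, x in enumerate(values) if not (lower <= x <= upper)]


def local_scores(values: Sequence[float], expected: float, indices: Sequence[int]) -> dict[int, float]:
    """Stage 2: score = measured value minus expected amended value."""
    return {k: values[k] - expected for k in indices}


def units_to_pursue(scores: dict[int, float], cutoff: float) -> list[int]:
    """Stage 3: keep units whose score exceeds the cut-off, highest score first."""
    chosen = [k for k, s in scores.items() if s > cutoff]
    return sorted(chosen, key=lambda k: -scores[k])


# Hypothetical reported values for ten units.
observed = [4, 5, 30, 6, 0, 5, 12, 5, 4, 5]
suspicious = flag_suspicious(observed, lower=2, upper=8)
scores = local_scores(observed, expected=5.0, indices=suspicious)
pursued = units_to_pursue(scores, cutoff=5.0)
print(suspicious, pursued)
```

Units 2, 4 and 6 fail the edit rule, but only units 2 and 6 score above the cut-off, so unit 4 is left unedited. Selective editing accepts this kind of residual error in exchange for lower cost.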


slide-9
SLIDE 9

Selective editing - 2

Evaluation: relative pseudo-bias

$$\frac{\hat\theta_q - \hat\theta_{100}}{se(\hat\theta_{100})},$$

where $q$ is the percentage of suspicious responses pursued.
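
A small worked example of the relative pseudo-bias, assuming hypothetical estimates for three values of q. The numbers and the function name are illustrative only.

```python
def relative_pseudo_bias(theta_q: float, theta_100: float, se_100: float) -> float:
    """(theta_q - theta_100) / se(theta_100), as defined on this slide."""
    if se_100 <= 0:
        raise ValueError("standard error must be positive")
    return (theta_q - theta_100) / se_100


# Hypothetical fully edited estimate and its standard error.
theta_100, se_100 = 500.0, 20.0
# Hypothetical estimates after pursuing q% of the suspicious responses.
estimates_by_q = {20: 530.0, 50: 510.0, 80: 502.0}

pseudo_bias = {
    q: relative_pseudo_bias(theta_q, theta_100, se_100)
    for q, theta_q in estimates_by_q.items()
}
print(pseudo_bias)
```

In this toy case the pseudo-bias shrinks as more suspicious responses are pursued. A cut-off would be chosen where it becomes small relative to the standard error.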

slide-11
SLIDE 11

Selective editing - 3

Advantages

  + Reduced costs
  + Reduced response burden
  + Gain in timeliness

Disadvantages

  • How should the effect of editing be taken into account at the estimation stage?
  • The influence of the edited data on different statistical analyses is not known.
  • So far used only on quantitative variables.


slide-12
SLIDE 12

Estimating measurement bias

Literature: Madow (1965), Lessler and Kalsbeek (1992), Rao and Sitter (1997).

Bias estimation through double sampling (two-phase sampling): for all subsampled units the true values are recorded, and the differences between the true values and the observed values are used for bias estimation.


slide-17
SLIDE 17

Probability sampling approach

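Our idea: combine selective editing with bias estimation, and derive an unbiased estimator and its variance for this approach.

[Diagram: population U, divided into U1 and U2; first-phase sample sa, divided into sa1 and sa2; second-phase subsample s2 drawn from sa2.]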


slide-22
SLIDE 22

Unbiased estimator for edited data

Notation:

  • $z_k$, $k \in U_1$: true value
  • $x_k$, $k \in U_2$: observed value
  • $y_k = I_k^{edit} z_k + (1 - I_k^{edit}) x_k$, $k \in U$: observed value after selective editing

We want to estimate $t_z = \sum_U z_k$.

The HT estimator $\hat t_y = \sum_{s_a} y_k / \pi_{ak}$ is biased.

The estimator of the bias is

$$\hat B(\hat t_y) = \sum_{s_2} \frac{e_k}{\pi_{ak}\,\pi_{k|s_{a2}}}, \qquad e_k = x_k - z_k.$$

The bias-corrected estimator is $\hat t_z = \hat t_y - \hat B(\hat t_y)$.
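
The point estimators above can be sketched as follows. The toy data are hypothetical: N = 40 with a first-phase sample of 10 units (so every $\pi_{ak}$ = 0.25), and two re-contacted units in $s_2$ with conditional inclusion probability 0.5. The function and variable names are assumptions for the example.

```python
from typing import Sequence


def ht_total(values: Sequence[float], incl_probs: Sequence[float]) -> float:
    """HT estimator: sum of y_k / pi_ak over the first-phase sample."""
    return sum(v / p for v, p in zip(values, incl_probs))


def bias_estimate(errors: Sequence[float], pi_a: Sequence[float],
                  pi_cond: Sequence[float]) -> float:
    """Sum over s2 of e_k / (pi_ak * pi_{k|sa2}), with e_k = x_k - z_k."""
    return sum(e / (pa * pc) for e, pa, pc in zip(errors, pi_a, pi_cond))


# Hypothetical values y_k after selective editing, for the units in sa.
y_sa = [5, 4, 6, 5, 7, 3, 5, 6, 4, 5]
pi_a = [0.25] * len(y_sa)

# Hypothetical second-phase subsample s2: errors found on re-contact.
e_s2 = [2, -1]
pi_a_s2 = [0.25, 0.25]
pi_cond_s2 = [0.5, 0.5]

t_y = ht_total(y_sa, pi_a)                        # biased HT estimate
b_hat = bias_estimate(e_s2, pi_a_s2, pi_cond_s2)  # estimated bias
t_z = t_y - b_hat                                 # bias-corrected estimate
print(t_y, b_hat, t_z)
```

Each error found in $s_2$ is weighted by both inclusion probabilities, so the two re-contacted units stand in for all unedited units in the population.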

slide-24
SLIDE 24

Precision of the estimators

$$\mathrm{MSE}(\hat t_y) = V(\hat t_y) + B^2(\hat t_y),$$

$$\mathrm{MSE}(\hat t_z) = V(\hat t_y) + V(\hat B(\hat t_y)) - 2\,C(\hat t_y, \hat B(\hat t_y)),$$

where

$$V(\hat t_y) = \sum\sum_{U} \Delta_{akl}\,\frac{y_k}{\pi_{ak}}\,\frac{y_l}{\pi_{al}}, \tag{1}$$

$$V(\hat B(\hat t_y)) = \sum\sum_{U_2} \Delta_{akl}\,\frac{e_k}{\pi_{ak}}\,\frac{e_l}{\pi_{al}} + E_a\left[\sum\sum_{U_2} \Delta_{kl|s_{a2}}\, I_{ak} I_{al}\,\frac{e_k}{\pi_{ak}\pi_{k|s_{a2}}}\,\frac{e_l}{\pi_{al}\pi_{l|s_{a2}}}\right], \tag{2}$$

$$C(\hat t_y, \hat B(\hat t_y)) = \sum_{k \in U}\sum_{l \in U_2} \Delta_{akl}\,\frac{y_k}{\pi_{ak}}\,\frac{e_l}{\pi_{al}}.$$


slide-27
SLIDE 27

One example

One specific two-phase design is considered. First-phase sampling design: SI (simple random sampling without replacement) of size $n_a$. Second-phase sampling design: Poisson sampling with inclusion probability $\pi_{k|s_{a2}}$.


slide-28
SLIDE 28

One example

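Then,

$$\hat V(\hat t_y) = C\,S^2_{y s_a},$$

$$\hat V(\hat B(\hat t_y)) = C\left(S^2_{\check e s_2} + \frac{1}{N - n_a}\sum_{s_2}\left(1 - \pi_{k|s_{a2}}\right)\check e_k^2\right),$$

$$\hat C(\hat t_y, \hat B(\hat t_y)) = \frac{C}{n_a}\left(\sum_{s_2} x_k \check e_k - \frac{1}{n_a - 1}\sum_{s_a} y_k \sum_{s_2} \check e_k\right),$$

where $C = (1 - f_a)N^2 / n_a$, $\check e_k = e_k / \pi_{k|s_{a2}}$ and

$$S^2_{\check e s_2} = \frac{1}{n_a - 1}\left(\sum_{s_2} \check e_k^2 - \frac{1}{n_a}\Big(\sum_{s_2} \check e_k\Big)^2\right).$$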


slide-31
SLIDE 31

Simulation study: purpose

To compare survey estimates under two editing approaches:

  • Approach 1: selective editing only.
  • Approach 2: selective editing followed by bias correction.


slide-37
SLIDE 37

Simulation study: setup

Population size: $N = 10000$

Sample size: $n_a = 1000$

True values: $z \sim Po(5)$

Observed values:

$$x = \begin{cases} z, & 0 \le u < 0.3 \\ Po(5), & 0.3 \le u < 0.6 \\ Po(2), & 0.6 \le u < 0.8 \\ Po(10), & 0.8 \le u \le 1 \end{cases} \qquad \text{where } u \sim U(0, 1).$$

Second-phase inclusion probability: $\pi_{k|s_{a2}}$ constant

Score function: $s_k(x) = x_k - \mu_z$
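
The setup can be reproduced roughly as below. This is a sketch under assumptions, not the authors' simulation code: the seed, the helper names, the use of Knuth's multiplication method for Poisson draws, and taking $\mu_z$ = 5 (the mean of Po(5)) are all choices made for the example.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw from Po(lam) with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def observed_value(rng: random.Random, z: int) -> int:
    """Mixture from the slide: correct with probability 0.3, otherwise a fresh draw."""
    u = rng.random()
    if u < 0.3:
        return z
    if u < 0.6:
        return poisson(rng, 5)
    if u < 0.8:
        return poisson(rng, 2)
    return poisson(rng, 10)


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
N, n_a = 10_000, 1_000

z = [poisson(rng, 5) for _ in range(N)]       # true values
x = [observed_value(rng, zk) for zk in z]     # observed values with errors
sample = rng.sample(range(N), n_a)            # SI first-phase sample

mu_z = 5.0                                    # assumed: mean of Po(5)
scores = {k: x[k] - mu_z for k in sample}     # s_k(x) = x_k - mu_z

print(sum(z) / N, sum(x) / N)
```

Under this error model the observed values have expectation 0.3·5 + 0.3·5 + 0.2·2 + 0.2·10 = 5.4, so the unedited total overstates the true total by about 8%.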


slide-38
SLIDE 38

Simulation study: results

Table 1: Estimated total, its variance and MSE under the two approaches (1000 repetitions).

                      Approach 1     Approach 2
  Edited              20%            12% + 8%
  $\hat t$            46 448         50 067
  $B(\hat t)$         3 572 (8%)     47 (0%)
  Var                 495 508        5 213 649
  $\widehat{Var}$     495 830        5 143 426
  MSE                 13 254 435     5 213 649
  Emp. MSE            13 356 847     5 289 592
  Emp. Root MSE       3 655          2 300
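
As a quick arithmetic check, the Approach 1 column can be related to the decomposition MSE = variance + squared bias. The figures are copied from Table 1. That the small gap to the reported MSE comes from the bias being rounded in the table is an assumption.

```python
import math

# Values copied from Table 1, Approach 1 (selective editing only).
variance_1 = 495_508
bias_1 = 3_572
mse_1_reported = 13_254_435

# MSE = variance + squared bias.
mse_1 = variance_1 + bias_1 ** 2
relative_gap = abs(mse_1 - mse_1_reported) / mse_1_reported

# Square roots of the empirical MSEs for both approaches.
root_mse_1 = math.sqrt(13_356_847)
root_mse_2 = math.sqrt(5_289_592)

print(mse_1, relative_gap, round(root_mse_1), round(root_mse_2))
```

For Approach 1 the squared bias dominates the MSE. For Approach 2 the reported MSE equals the variance, as expected for a bias-corrected estimator.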


slide-39
SLIDE 39

Simulation study: results 2

Instead of the study variable z, the correlated variable w is used in the score function.

Table 2: Size of bias and precision of the estimated total under the two approaches (Corr(w, z) = 0.70, $s_k(w) = w_k - \mu_w$, 1000 repetitions).

  App.   Edited      $B(\hat t_y)$   SE      95% CI          Root MSE
  1      16%         8%              1006    54388 ± 1972    4516
  2      9% + 7%     0%              3738    50010 ± 7326    3738
  2      5% + 11%    0.2%            2910    50076 ± 5704    2910
  2      0% + 16%    0.2%            2376    50074 ± 4657    2376


slide-43
SLIDE 43

Simulation study: conclusions

  • Precision: the two-phase approach generally gives a smaller MSE, except when the bias or the subsample is small.
  • Inference: the two-phase approach makes it possible to describe the effect of editing on the estimates, which is not possible with selective editing alone.
  • Implementation: the same number of responses is pursued, but the approaches differ in timeliness and estimation procedure.
  • Improvement: consider different estimators and ways to draw the second-phase sample more efficiently.


slide-44
SLIDE 44

Thank you!
