
AN EFFICIENT EDITING AND IMPUTATION STRATEGY WITHIN A CORPORATE-WIDE DATA COLLECTION SYSTEM AT INE SPAIN: A PILOT EXPERIENCE
R. López-Ureña, M. Mancebo, S. Rama and David Salgado
david.salgado.fernandez@ine.es
D.G. Methodology, Quality and ICT, Spanish National Statistical Institute
Paris, 24th April 2013

Some Preliminaries
Main goal: to streamline subprocess 5.3 of GSBPM (Review, validate & edit, including editing during data collection (subprocesses 4.x)).
We focus on the selection of questionnaires (detection of errors) under two generic principles: (i) editing must minimize the resources deployed to recontacts, follow-ups and interactive tasks in general; (ii) data quality must be ensured.
Design of E&I strategies.
Pilot experience with the ITI and INORI survey: (i) fixed panel of approximately 11000 industrial establishments selected by cut-off; (ii) data collected monthly through CSAQ, mail, email, fax and telephone at provincial delegations; (iii) Laspeyres indices disseminated for 37 publication cells (NACE Rev. 2); (iv) no geographical breakdown; (v) breakdown into markets (national, euro, non-euro, rest of the world).

Editing Functions
Editing function: a type of task that has to be performed within a data editing process.
The interaction between statistical methodology and information technologies is fundamental. We incorporate this interaction in the design of an E&I strategy by choosing standardizable editing functions.
As a first step in the transition to an industrialized production process, in the editing phase we have focused on the selection of questionnaires.
We distinguish three types of editing functions: (i) survey-specific functions (mainly format and balance edits); (ii) interval-distance functions; (iii) distribution-angle functions.

Interval-Distance Editing Function
General idea: for each variable of level $y^{(q)}$ (total turnover and total new orders received in our survey), (i) we construct a validation interval for the reference period $t$ for each respondent; (ii) we measure the distance from the reported value to this interval; (iii) we compare this distance with the threshold for the reference period $t$.
Construction of the validation interval $I_{kt}^{(q)} = [l_{kt}^{(q)}, u_{kt}^{(q)}]$:
$$I_{kt}^{(q)} = \left[\hat{y}_{kt} - s_t \cdot \hat{\sigma}_{kt},\ \hat{y}_{kt} + s_t \cdot \hat{\sigma}_{kt}\right], \qquad s_t = \tfrac{1}{12}\, s_t^{*} + \tfrac{11}{12}\, s_{t-1},$$
where $\hat{y}$ and $\hat{\sigma}$ denote ARIMA predictions and $s_t^{*} = \operatorname{argmax}_s \mathrm{HitRate}$.
For short time series, or series with too many missing or zero values, we use a ratio edit.
[Figure: number line marking the interval bounds $l_{kt}^{(q)}$ and $u_{kt}^{(q)}$ and the reported value $y_{kt}^{(rep,q)}$.]
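As a minimal sketch of the interval construction, assuming the prediction, its dispersion estimate and the scale factor $s_t$ have already been computed elsewhere (the function name and the numeric values below are hypothetical, not taken from the presentation):

```python
def validation_interval(y_hat, sigma_hat, s_t):
    """Return (lower, upper) = (y_hat - s_t * sigma_hat, y_hat + s_t * sigma_hat).

    y_hat     : predicted value for unit k in period t (assumed given)
    sigma_hat : dispersion estimate attached to the prediction (assumed given)
    s_t       : non-negative scale factor for period t (assumed given)
    """
    if sigma_hat < 0 or s_t < 0:
        raise ValueError("sigma_hat and s_t must be non-negative")
    half_width = s_t * sigma_hat
    return (y_hat - half_width, y_hat + half_width)


# Hypothetical unit: predicted turnover 100, dispersion 10, scale factor 2.
lower, upper = validation_interval(100.0, 10.0, 2.0)
```

A larger $s_t$ widens every interval, so fewer reported values fall outside and fewer questionnaires are flagged.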

Interval-Distance Editing Function
Construction of the distance $d(y_{kt}^{(rep,q)}, I_{kt}^{(q)})$.
If the editing function is an edit:
$$d(y_{kt}^{(rep,q)}, I_{kt}^{(q)}) = \begin{cases} 0 & \text{if } y_{kt}^{(rep,q)} \in I_{kt}^{(q)}, \\ \infty & \text{if } y_{kt}^{(rep,q)} \notin I_{kt}^{(q)}. \end{cases}$$
If the editing function is a score function and $y^{(q)}$ is discrete:
$$d(y_{kt}^{(rep,q)}, I_{kt}^{(q)}) = \omega_k \begin{cases} 0 & \text{if } y_{kt}^{(rep,q)} \in I_{kt}^{(q)}, \\ y_{kt}^{(rep,q)} - u_{kt}^{(q)} & \text{if } y_{kt}^{(rep,q)} > u_{kt}^{(q)}, \\ l_{kt}^{(q)} - y_{kt}^{(rep,q)} & \text{if } y_{kt}^{(rep,q)} < l_{kt}^{(q)}. \end{cases}$$
If the editing function is a score function and $y^{(q)}$ is continuous:
$$d(y_{kt}^{(rep,q)}, I_{kt}^{(q)}) = \omega_k \begin{cases} 0 & \text{if } y_{kt}^{(rep,q)} \in I_{kt}^{(q)}, \\ \dfrac{y_{kt}^{(rep,q)} - u_{kt}^{(q)}}{u_{kt}^{(q)} - l_{kt}^{(q)}} & \text{if } y_{kt}^{(rep,q)} > u_{kt}^{(q)}, \\ \dfrac{l_{kt}^{(q)} - y_{kt}^{(rep,q)}}{u_{kt}^{(q)} - l_{kt}^{(q)}} & \text{if } y_{kt}^{(rep,q)} < l_{kt}^{(q)}. \end{cases}$$
[Figure: number line marking the interval bounds $l_{kt}^{(q)}$ and $u_{kt}^{(q)}$ and the reported value $y_{kt}^{(rep,q)}$.]
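The three cases of the distance can be sketched in a single function. This is an illustrative reading of the slide, not the authors' implementation: the function and parameter names are hypothetical, and the weight defaults to 1 because the slide does not say how $\omega_k$ is chosen.

```python
import math


def interval_distance(y_rep, lower, upper, *, is_edit, is_continuous, weight=1.0):
    """Distance from a reported value to its validation interval [lower, upper].

    is_edit       : True -> hard edit (distance is 0 inside, infinity outside)
    is_continuous : for score functions, True -> divide by the interval length
    weight        : the unit weight omega_k (assumed to be supplied by the caller)
    """
    if lower <= y_rep <= upper:
        return 0.0
    if is_edit:
        return math.inf
    # Amount by which the reported value overshoots the nearest bound.
    excess = y_rep - upper if y_rep > upper else lower - y_rep
    if is_continuous:
        # Assumes upper > lower; a degenerate interval would divide by zero.
        excess /= (upper - lower)
    return weight * excess
```

For example, with the interval [80, 120], a reported value of 70 gives a continuous score of (80 - 70) / 40 = 0.25 for a unit weight of 1.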

Interval-Distance Editing Function
Construction of the threshold $d_{jt}$:
Compute the distance $d_{k(t-1)} = d(y_{k(t-1)}^{(ed,q)}, I_{k(t-1)}^{(q)})$ between the final edited values and their corresponding validation intervals for the preceding period $t-1$ for each unit $k$.
Divide the sample $s$ into $J$ minimal publication cells, $s = \bigcup_{j=1}^{J} s_j$.
For each domain $s_j$ compute the quantile $q_j\left(\{d_{k(t-1)}\}_{k \in s_j}\right)$ over the distribution of distances. The quantile (1st quartile, $p$th percentile, ...) is chosen by a trade-off between cost and precision.
The threshold for unit $k$ is given by $d_{kt} = q_j\left(\{d_{k(t-1)}\}_{k \in s_j}\right)$ if $k \in s_j$.
An establishment $k \in s_j$ is flagged for editing if $d(y_{kt}^{(rep,q)}, I_{kt}^{(q)}) > d_{jt}$.
Standard input for a data collection application, for each variable of level: $l_{kt}$, $u_{kt}$, $\mathrm{edit}_k$ (0, 1), $\mathrm{continuous}_k$ (0, 1), $d_{kt}$.
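A possible sketch of the threshold construction and the flagging rule follows. The data layout (plain dictionaries keyed by unit), the function names and the linear-interpolation quantile are assumptions made for illustration; the slides do not specify which quantile estimator is used, and the distances from the preceding period are assumed to be finite.

```python
import math
from collections import defaultdict


def quantile(values, p):
    """p-quantile (0 <= p <= 1) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def cell_thresholds(prev_distances, cell_of, p):
    """One threshold per publication cell, from the distances of period t-1."""
    by_cell = defaultdict(list)
    for unit, dist in prev_distances.items():
        by_cell[cell_of[unit]].append(dist)
    return {cell: quantile(dists, p) for cell, dists in by_cell.items()}


def flag_units(curr_distances, cell_of, thresholds):
    """Units whose current distance strictly exceeds the threshold of their cell."""
    return [unit for unit, dist in curr_distances.items()
            if dist > thresholds[cell_of[unit]]]


# Hypothetical data: seven units in two publication cells.
cell_of = {"a": "C1", "b": "C1", "c": "C1", "d": "C1", "e": "C1",
           "f": "C2", "g": "C2"}
prev = {"a": 0.0, "b": 0.2, "c": 0.4, "d": 0.6, "e": 0.8, "f": 1.0, "g": 3.0}
thresholds = cell_thresholds(prev, cell_of, p=0.5)
flagged = flag_units({"a": 0.5, "b": 0.1, "f": 2.0, "g": 2.5}, cell_of, thresholds)
```

Raising `p` raises the thresholds and lowers the number of flagged units, which is the cost side of the cost and precision trade-off mentioned above.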

Distribution-Angle Editing Function
General idea: for each set of variables of distributions $\{y^{(q_i)}\}$ (turnover and new orders received by markets in our survey), (i) we define a vector $y_{kt}^{(q)} = \left(y_k^{(q_1)}, \dots, y_k^{(q_I)}\right) / \sum_i y_k^{(q_i)}$; (ii) we determine the angle of this vector with respect to another vector ($y_{k(t-1)}^{(q)}$, $y_{kt}^{(\tilde{q})}$, etc.); (iii) we compare this angle with the threshold for the reference period $t$.
The angle is trivially computed (scalar product).
The thresholds are determined as quantiles over the distribution of angles over each minimal publication cell.
[Figure: the normalized vector $(t_{nat}, t_{euro}) = (T_{nat}, T_{euro}) / (T_{nat} + T_{euro})$ drawn in the unit square, with axes $t_{nat}$ and $t_{euro}$ running from 0 to 1.]
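The angle computation via the scalar product can be sketched as follows. The function names and the example market breakdowns are hypothetical, and clamping the cosine is a numerical safeguard added here rather than something stated in the slides.

```python
import math


def normalize(values):
    """Divide each component by the total, so that the components sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize a breakdown whose total is zero")
    return [v / total for v in values]


def angle_between(a, b):
    """Angle in radians between two non-zero vectors, from their scalar product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b)
    # Rounding can push the cosine slightly outside [-1, 1]; clamp before acos.
    return math.acos(max(-1.0, min(1.0, cosine)))


# Hypothetical breakdowns (national, euro, non-euro, rest of the world).
previous = normalize([60.0, 25.0, 10.0, 5.0])
current = normalize([20.0, 25.0, 10.0, 45.0])
angle = angle_between(previous, current)
```

Because the angle does not change when a vector is rescaled, two months with the same market shares give an angle of zero whatever the total turnover.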

Macro Editing Phase
Mathematical translation of the two principles: (i) editing must minimize the resources deployed to recontacts, follow-ups and interactive tasks in general; (ii) data quality must be ensured.
Optimization problem:
$$\min\ \text{number of questionnaires to edit interactively} \quad \text{s.t.} \quad \text{estimated mean squared error of } y^{(q)} \le \mathrm{bound}^{(q)}, \quad p = 1, \dots, P.$$
For editing field work considerations, instead of a selection, a prioritization of units is determined by concatenating a sequence of optimization problems. This prioritization is carried out for each publication cell.
A fixed number $n_{macro}$ of questionnaires is further edited. These $n_{macro}$ units are allocated among the publication cells proportionally to the estimated mean squared error, to the weights of the cells within the global index, to the proportion of questionnaires reporting zero turnover and to the proportion of imputed questionnaires in the preceding time period having reported zero turnover.

New E&I Strategy
CAWI mode and editing at provincial delegations: (i) editing functions act as edits (CAWI) or as score functions (provincial delegations); (ii) total turnover and total new orders received are controlled by interval-distance functions; (iii) the turnover breakdown is controlled by a distribution-angle function with respect to the preceding time period; (iv) the new orders received breakdown is controlled by a distribution-angle function with respect to the turnover breakdown.
Editing at the central office: (i) $n_{macro} = 100$; (ii) the prediction model is the best among 4 simple time series models; (iii) the observation model considers the occurrence of error as a Bernoulli variable whose value in the positive case follows a normal distribution.

Some conclusions
Simulations have been carried out with real data from 13 consecutive months. While maintaining nearly the same precision, the interactive editing rate decreased from 55% under the traditional strategy to 15%-20% under the proposed strategy.
This strategy was applied in real production conditions in January 2013 (reference month). Preliminary data suggest that the simulations were too optimistic (interactive editing rate of approximately 30%-35%). The simulation of respondent behaviour during CAWI is crucial.
The distribution-angle editing function can be reformulated as an interval-distance editing function.
The interval construction scheme can be adapted to more common sampling designs (rotating panel with stratified random sampling, ...) by (i) aggregating units into homogeneous domains and (ii) using simpler time series models (random walks, etc.).
More implementations are currently under development.
