Robust LH stratified sampling strategy, Maria Caterina Bramati

  1. Robust LH stratified sampling strategy. Maria Caterina Bramati, Sapienza University of Rome. Southampton Research Seminar, July 15th 2014.

  2. Introduction: outline (agenda)
     - Motivation
     - Robustness issues in stratified design
     - Some proposals
     - Simulation study
     - Further issues: time-dependent survey variables

  3. Introduction: key features of surveying firms
     (1) The population has a skewed distribution: a small number of units accounts for a large share of the totals of the study variables.
     (2) Administrative information is available, providing a list of the statistical units of the target population (e.g. tax declarations, social security registers).
     (3) Survey burden for firms and costs for National Statistical Institutes (NSIs).
     (4) Data quality (administrative sources and survey collection).
     (5) At EU level: compliance requirements established by EUROSTAT.

  8. Sampling strategy: the sampling design poses three main problems
     (1) choice of the sampling design;
     (2) sample size determination;
     (3) sample allocation;
     all under constraints such as:
     - costs of the surveying process and statistical burden on firms;
     - statistical precision;
     - legal obligations and requirements (EUROSTAT, NBB, ...);
     - availability of auxiliary information.

  12. Sampling strategy: stratified sampling design
     The population is divided into subgroups (strata) so as to maximize the within-stratum ‘homogeneity’ (with respect to a chosen target variable) and to minimize the between-stratum ‘homogeneity’.
     This requires strata that are:
     - mutually exclusive: each unit belongs to exactly one stratum;
     - collectively exhaustive: no population unit is excluded.
     The choices for problems (1)-(3) should be linked to the quality of the final statistical product, balancing costs and benefits.
     ⇒ The target statistical precision is the constraint under which these choices are made.
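The two requirements are easiest to see when strata are defined by cut-off points b_1 < ... < b_{L-1} on a size variable, as in the LH setting discussed later. The following minimal sketch assumes that setup. `assign_strata` is a hypothetical helper and does not come from the presentation. Each value is mapped to the single interval it falls in, which makes the strata mutually exclusive. The first and last intervals are open-ended, which makes them collectively exhaustive.

```python
from bisect import bisect_right
from collections import Counter


def assign_strata(x_values, bounds):
    """Return a 0-based stratum index for each unit's auxiliary value.

    bounds = [b_1, ..., b_{L-1}] must be strictly increasing. Index 0 holds
    x < b_1, index h holds b_h <= x < b_{h+1}, and index L-1 holds
    x >= b_{L-1}, so each unit lands in exactly one stratum (mutually
    exclusive) and no unit is left out (collectively exhaustive).
    """
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("boundaries must be strictly increasing")
    # bisect_right counts how many boundaries are <= x.
    return [bisect_right(bounds, x) for x in x_values]


x = [1, 5, 10, 50, 100]
labels = assign_strata(x, bounds=[5, 50])
print(labels)
print(Counter(labels))                            # stratum sizes N_h
assert sum(Counter(labels).values()) == len(x)    # every unit is assigned
```

The left-closed intervals are a design choice. A unit whose value equals a boundary goes to the upper stratum, so ties never produce a double assignment.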

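  13. Sampling strategy: LH sampling algorithm
     Under simple random sampling without replacement of $n_h$ units within each stratum $h$ of size $N_h$ (with $S_h$ the sample in stratum $h$), the Horvitz-Thompson (HT) estimator of the total,
     $$\hat{t}_{y,\mathrm{strat}} = \sum_{h=1}^{L} \frac{N_h}{n_h} \sum_{k \in S_h} y_k,$$
     has variance estimated by
     $$\widehat{\mathrm{Var}}(\hat{t}_{y,\mathrm{strat}}) = \sum_{h=1}^{L} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s^2_{yh}}{n_h}, \tag{1}$$
     where
     $$s^2_{yh} = \frac{1}{n_h - 1} \sum_{k \in S_h} \left(y_k - \hat{y}_h\right)^2$$
     and $\hat{y}_h$ is the sample mean of $Y$ within stratum $h$.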
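The estimator and the variance estimate (1) can be computed directly from the stratum samples. The following minimal sketch assumes simple random sampling without replacement within each stratum. The function name and the input layout (one list of observed $y_k$ per stratum) are illustrative assumptions.

```python
from statistics import mean, variance


def stratified_ht_total(samples, pop_sizes):
    """HT estimate of the total and its estimated variance, eq. (1).

    Assumes simple random sampling without replacement in each stratum.
    samples[h]:   observed y_k for k in S_h
    pop_sizes[h]: N_h
    """
    t_hat, var_hat = 0.0, 0.0
    for y_h, N_h in zip(samples, pop_sizes):
        n_h = len(y_h)
        t_hat += N_h * mean(y_h)            # (N_h / n_h) * sum of y_k over S_h
        if n_h < N_h:                       # take-all strata add no variance
            s2_yh = variance(y_h)           # divisor n_h - 1; needs n_h >= 2
            var_hat += N_h ** 2 * (1 - n_h / N_h) * s2_yh / n_h
    return t_hat, var_hat


t_hat, var_hat = stratified_ht_total([[2, 4, 6], [10, 20]], pop_sizes=[30, 2])
print(t_hat, var_hat)
```

The second stratum in the example is fully enumerated ($n_h = N_h$). Its finite population correction is zero, so it contributes to the total but not to the variance.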

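  14. Sampling strategy: LH sampling algorithm
     The Lavallée-Hidiroglou (LH) algorithm with Neyman allocation provides an optimal joint solution to the three problems. With stratum $L$ as a take-all stratum ($n_L = N_L$), the sample size needed to reach the precision $c$ is
     $$n_{\hat{t}_{y,\mathrm{strat}}} = N_L + \frac{\sum_{h=1}^{L-1} W_h^2 s^2_{yh} / a_h}{(cY/N)^2 + \sum_{h=1}^{L-1} W_h s^2_{yh} / N}, \tag{2}$$
     and the Neyman allocation of the remaining $n - N_L$ units to the take-some strata is
     $$a_h = \frac{n_h}{n - N_L} = \frac{W_h s_{yh}}{\sum_{k=1}^{L-1} W_k s_{yk}}, \qquad h = 1, \dots, L-1, \tag{3}$$
     where $n = n_{\hat{t}_{y,\mathrm{strat}}}$, $W_h = N_h / N$, $Y$ in (2) is the population total of the target variable and $c$ is the target coefficient of variation of $\hat{t}_{y,\mathrm{strat}}$.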
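Equations (2) and (3) can be evaluated directly once the stratum sizes and standard deviations are given. The following sketch is a hypothetical helper. It assumes the standard deviations $s_{yh}$ and the total $Y$ are already known or anticipated (in practice they come from auxiliary data), and that stratum $L$ is take-all.

```python
def lh_neyman_sample_size(N_h, s_h, N_L, Y_total, c):
    """Sample size from eq. (2) with Neyman shares from eq. (3).

    N_h, s_h: sizes and (anticipated) standard deviations of the
              take-some strata h = 1, ..., L-1
    N_L:      size of the take-all stratum L
    Y_total:  (anticipated) population total of the target variable
    c:        target coefficient of variation of the estimated total
    Returns (n, a): the total sample size and the Neyman shares a_h.
    """
    N = sum(N_h) + N_L
    W = [Nh / N for Nh in N_h]
    ws = [w * s for w, s in zip(W, s_h)]
    a = [x / sum(ws) for x in ws]                                   # eq. (3)
    num = sum(w ** 2 * s ** 2 / ah for w, s, ah in zip(W, s_h, a))
    den = (c * Y_total / N) ** 2 + sum(w * s ** 2 for w, s in zip(W, s_h)) / N
    return N_L + num / den, a                                       # eq. (2)


n, a = lh_neyman_sample_size(N_h=[800, 150], s_h=[5.0, 20.0], N_L=50,
                             Y_total=50_000, c=0.01)
n_h = [ah * (n - 50) for ah in a]   # take-some allocation; round up in practice
print(round(n, 1), [round(v, 1) for v in n_h])
```

With Neyman shares the numerator of (2) reduces to $(\sum_h W_h s_{yh})^2$, which gives a quick check on the code. This sketch neither rounds $n$ nor caps any $n_h$ that would exceed $N_h$. A production implementation has to handle both.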

  18. Sampling strategy: LH sampling algorithm
     The idea of the LH algorithm is to find the strata boundaries $b_1, \dots, b_{L-1}$ that minimize the sample size $n_{\hat{t}_{y,\mathrm{strat}}}$ subject to a required precision $c$, under an appropriate sample allocation (Neyman, proportional, ...).
     However:
     (1) $s^2_{yh}$ is unknown ⇒ auxiliary information $X$ is used in place of $Y$;
     (2) the number $L$ of strata is chosen by the user;
     (3) administrative records may be of low quality: outliers?
     BUT the auxiliary variable $X \neq Y$, the target variable.
     ⇒ Modified LH algorithm (Rivest, 2002): the discrepancy between $Y$ and $X$ is estimated.
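The boundary search can be illustrated on a small scale. The following sketch is hypothetical and deliberately simple. It does not reproduce the iterative procedure of the LH algorithm. Instead it searches exhaustively over $L-1$ boundaries drawn from a quantile grid, which is practical only for small $L$. For each candidate it evaluates (2) with Neyman allocation. Following point (1), the stratum variances of $X$ and the total of $X$ stand in for $s^2_{yh}$ and $Y$. The Rivest (2002) modification, which would model the discrepancy between $Y$ and $X$, is not implemented here.

```python
from itertools import combinations
from statistics import variance


def lh_size(x, bounds, c):
    """Eq. (2) with Neyman allocation, using X as a proxy for Y.

    Units with x >= b_{L-1} form the take-all stratum L; the others are
    take-some. s2_xh and the total of X stand in for s2_yh and Y.
    """
    N = len(x)
    strata = [[] for _ in range(len(bounds) + 1)]
    for v in x:
        strata[sum(v >= b for b in bounds)].append(v)   # b_h <= x < b_{h+1}
    take_some, N_L = strata[:-1], len(strata[-1])
    if any(len(s) < 2 for s in take_some):
        return float("inf")                             # degenerate split, rejected
    W = [len(s) / N for s in take_some]
    S2 = [variance(s) for s in take_some]
    sum_ws = sum(w * s2 ** 0.5 for w, s2 in zip(W, S2))  # Neyman: numerator = (sum W_h s_h)^2
    den = (c * sum(x) / N) ** 2 + sum(w * s2 for w, s2 in zip(W, S2)) / N
    return N_L + sum_ws ** 2 / den


def grid_search_bounds(x, L, c, n_grid=20):
    """Hypothetical brute-force search over L-1 boundaries from a quantile grid."""
    xs = sorted(x)
    grid = sorted({xs[i * (len(xs) - 1) // n_grid] for i in range(1, n_grid)})
    best = min(combinations(grid, L - 1), key=lambda b: lh_size(x, list(b), c))
    return list(best), lh_size(x, list(best), c)


# Hypothetical skewed register variable: many small firms, a few large ones.
x = [float(i) for i in range(1, 201)] + [500.0, 800.0, 1500.0, 3000.0]
bounds, n = grid_search_bounds(x, L=3, c=0.05)
print(bounds, round(n, 1))
```

Any error in $X$ (point 3) propagates straight into the chosen boundaries, because the objective is computed entirely from $X$.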

  19. Sampling strategy: the effects of outliers in the LH sampling algorithm
     Types of anomalies:
     - erroneous records in the surveyed data $Y$ (vertical outliers);
     - quality issues in the administrative registers $X$ (leverage points);
     - outliers in both variables $(X, Y)$ (good/bad leverage points).
     ⇒ The conditional mean and variance of $Y \mid X$ become unreliable, affecting:
     - the sample size;
     - the strata bounds;
     - the sample allocation.
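The following small numerical illustration of the allocation effect uses hypothetical data. The strata contain one erroneous record: a value in stratum 1 keyed as 1100 instead of 11. That record inflates the stratum's standard deviation and, through the Neyman shares (3), distorts the allocation. The helper `neyman_shares` is an illustrative assumption. It computes the shares from the data with the sample standard deviation.

```python
from statistics import stdev


def neyman_shares(strata):
    """Neyman shares a_h = W_h s_h / sum_k W_k s_k, eq. (3), from stratum data."""
    N = sum(len(s) for s in strata)
    ws = [len(s) / N * stdev(s) for s in strata]
    return [w / sum(ws) for w in ws]


clean = [[10, 12, 11, 13, 9, 10, 12, 11], [40, 45, 42, 44, 41, 43, 39, 46]]
# Hypothetical erroneous record: the 11 in stratum 1 keyed as 1100.
dirty = [clean[0][:2] + [1100] + clean[0][3:], clean[1]]

for name, data in (("clean", clean), ("dirty", dirty)):
    print(name, [round(a, 2) for a in neyman_shares(data)])
```

A single bad value moves the share of stratum 1 from about 0.35 to about 0.99. Almost the whole take-some sample is then spent on the stratum that is in fact the more homogeneous one.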
