Aligning estimates from different surveys using Empirical Likelihood



slide-1
SLIDE 1

Aligning estimates from different surveys using Empirical Likelihood methods

EWA KABZINSKA AND YVES G. BERGER THE UNIVERSITY OF SOUTHAMPTON

slide-2
SLIDE 2

OUTLINE

  • 1. INTRODUCTION
      1. WHY IS IT BENEFICIAL TO COMBINE INFORMATION?
      2. CURRENT APPROACHES
      3. WHY EMPIRICAL LIKELIHOOD?
      4. EMPIRICAL LOGLIKELIHOOD FUNCTION
      5. CONSTRAINTS
  • 2. POINT ESTIMATION
      1. ESTIMATION OF SCALE LOADS
      2. ESTIMATING EQUATIONS
      3. EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION
      4. COMPARISON WITH OTHER APPROACHES
  • 3. CONFIDENCE REGIONS
  • 4. SUMMARY
slide-3
SLIDE 3

INTRODUCTION

  • Often different surveys carried out independently in the same population measure some common variables
  • Population level parameters associated with these common variables may be unknown or unreliable

  • Examples:
  • Household size and composition, tenure type
  • Income, expenditure
  • Educational attainment
  • Ethnic origin
slide-4
SLIDE 4

INTRODUCTION

WHY IS IT BENEFICIAL TO COMBINE INFORMATION?

  • CONSISTENCY - both samples give the same point estimate for the common variables
  • IMPROVED PRECISION - "borrowing strength" from the other samples


slide-7
SLIDE 7

INTRODUCTION

  • Using information from two surveys, we want to obtain a single set of positive weights, which:
      • give the same estimates for the unknown totals of the common variables ($Z$),
      • capture additional benchmark constraints ($X_1$ and $X_2$),
      • may be used for estimation of other population level parameters ($\theta_1^N$ and $\theta_2^N$)
  • Once the weights are created, each survey can be analysed separately

slide-8
SLIDE 8

INTRODUCTION

CURRENT APPROACHES

  • GREG estimators with enlarged number of predictors by Zieschang [1], Renssen and Nieuwenbroek [2] and Merkouris [3]
  • Pseudo Empirical Likelihood estimator of Wu [4]
  • Single sample Empirical Likelihood approach for complex sampling designs by Berger and De La Riva Torres [5]

OTHER RELEVANT WORK

  • Model based projection estimator of Kim and Rao (2011)
  • Weighted Empirical Likelihood approach to the common mean problem by Tsao and Wu (2006)

slide-9
SLIDE 9

INTRODUCTION

WHY EMPIRICAL LIKELIHOOD?

  • Variables of interest are often skewed (e.g. income, expenditure) - Empirical Likelihood is a nonparametric approach
  • EL makes it easy to incorporate additional benchmark constraints
  • Asymmetric, data-driven confidence regions may be obtained easily, without relying on variance estimation
  • Weights are positive by definition
slide-10
SLIDE 10

INTRODUCTION

EMPIRICAL LOGLIKELIHOOD FUNCTION

$$\ell(\boldsymbol{m}) = \ell(\boldsymbol{m}_1, \boldsymbol{m}_2) = \sum_{i \in s_1} \log m_{1i} + \sum_{j \in s_2} \log m_{2j} \qquad (1)$$

slide-11
SLIDE 11

INTRODUCTION

CONSTRAINTS

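$$\sum_{i \in s_1} m_{1i}\, \pi_{1i} = n_1, \qquad \sum_{j \in s_2} m_{2j}\, \pi_{2j} = n_2 \qquad (2)$$

$$\sum_{i \in s_1} m_{1i}\, f(x_{1i}, \vartheta_1) = 0, \qquad \sum_{j \in s_2} m_{2j}\, f(x_{2j}, \vartheta_2) = 0 \qquad (3)$$

$$\sum_{i \in s_1} m_{1i}\, z_{1i} = \sum_{j \in s_2} m_{2j}\, z_{2j} \qquad (4)$$

$$m_{1i} > 0, \qquad m_{2j} > 0 \qquad (5)$$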
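Constraints (2)-(4) can be read as one vector equation over the pooled sample. The stacking below is only an illustrative arrangement: the ordering of the components and the symbol $\boldsymbol{C}$ for the right-hand side are assumptions made here, not taken from the slides.

```latex
\boldsymbol{c}_i =
\begin{pmatrix} \pi_{1i} \\ 0 \\ f(x_{1i}, \vartheta_1) \\ 0 \\ z_{1i} \end{pmatrix}
\ (i \in s_1), \qquad
\boldsymbol{c}_j =
\begin{pmatrix} 0 \\ \pi_{2j} \\ 0 \\ f(x_{2j}, \vartheta_2) \\ -z_{2j} \end{pmatrix}
\ (j \in s_2), \qquad
\boldsymbol{C} = (n_1,\ n_2,\ 0,\ 0,\ 0)^T ,
\qquad\text{so that}\qquad
\sum_{i \in s_1} m_{1i}\, \boldsymbol{c}_i + \sum_{j \in s_2} m_{2j}\, \boldsymbol{c}_j = \boldsymbol{C}.
```

The sign flip on $z_{2j}$ is what turns the alignment constraint (4) into a constraint with a zero right-hand side.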

slide-12
SLIDE 12

POINT ESTIMATION

ESTIMATION OF SCALE LOADS

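Find

$$\hat{\boldsymbol{m}} = \arg\max_{\boldsymbol{m}} \Big\{ \ell(\boldsymbol{m}) = \sum_{i \in s_1 \cup s_2} \log m_i \Big\}$$

subject to constraints (2)-(5).

Solution:

$$\hat{m}_i = (\pi_i + \boldsymbol{\eta}^T \boldsymbol{c}_i)^{-1} \qquad (6)$$

where $\pi_i$ is the inclusion probability of the $i$-th unit, $\boldsymbol{\eta}$ is the vector of Lagrange multipliers and $\boldsymbol{c}_i$ is the vector of constraints (2)-(5).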
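A minimal numerical sketch of this maximisation, under stated assumptions: it pools two toy samples, keeps only the sample-size constraints (2) and the alignment constraint (4), and solves the dual problem with a damped Newton search. The function name `el_weights`, the starting value `lam0`, the toy data and the choice of Newton's method are all illustrative assumptions; the slides do not say which algorithm the authors use.

```python
import numpy as np


def el_weights(c, C, lam0, tol=1e-10, max_iter=100):
    """Maximise sum(log m_i) subject to sum_i m_i * c_i = C and m_i > 0.

    Works on the dual problem: m_i = 1 / (lam @ c_i), where lam minimises
    F(lam) = lam @ C - sum(log(c @ lam)). Damped Newton steps keep c @ lam > 0.
    """
    c = np.asarray(c, dtype=float)
    C = np.asarray(C, dtype=float)
    lam = np.asarray(lam0, dtype=float).copy()

    def dual(v):
        return v @ C - np.log(c @ v).sum()

    for _ in range(max_iter):
        d = c @ lam                                   # 1 / m_i, must stay positive
        grad = C - (c / d[:, None]).sum(axis=0)       # C - sum_i m_i c_i
        if np.max(np.abs(grad)) <= tol * (1.0 + np.max(np.abs(C))):
            break
        hess = (c / d[:, None] ** 2).T @ c            # sum_i c_i c_i^T / d_i^2
        step = np.linalg.solve(hess, grad)
        slack = 1e-12 * (1.0 + abs(dual(lam)))        # tolerate rounding noise
        t = 1.0
        for _ in range(60):                           # halve until feasible and not worse
            cand = lam - t * step
            if np.all(c @ cand > 0) and dual(cand) <= dual(lam) + slack:
                lam = cand
                break
            t /= 2.0
        else:
            break                                     # no acceptable step found
    return 1.0 / (c @ lam), lam


# Toy data (hypothetical): two equal-probability samples sharing a variable z.
rng = np.random.default_rng(0)
N, n1, n2 = 1000, 30, 40
pi1 = np.full(n1, n1 / N)
pi2 = np.full(n2, n2 / N)
z1 = rng.gamma(2.0, 2.0, n1)
z2 = rng.gamma(2.0, 2.0, n2)

# One row per unit: (pi for sample 1, pi for sample 2, signed z).
c = np.vstack([
    np.column_stack([pi1, np.zeros(n1), z1]),
    np.column_stack([np.zeros(n2), pi2, -z2]),
])
C = np.array([n1, n2, 0.0])

# lam0 = (1, 1, 0) gives m_i = 1 / pi_i, i.e. the search starts at the design weights.
m, lam = el_weights(c, C, lam0=[1.0, 1.0, 0.0])
m1, m2 = m[:n1], m[n1:]
print("aligned totals of z:", m1 @ z1, m2 @ z2)
```

Working on the dual keeps the search in three dimensions (one multiplier per constraint) instead of one unknown per sampled unit, and positivity of the weights follows from keeping every denominator positive.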

slide-13
SLIDE 13

POINT ESTIMATION

ESTIMATING EQUATIONS

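Let $\theta_1^N$ and $\theta_2^N$ be fixed, unknown population level parameters of interest, solutions to:

$$\sum_{i \in U} g_{1i}(y_{1i}, \theta_1) = 0, \qquad \sum_{i \in U} g_{2i}(y_{2i}, \theta_2) = 0 \qquad (7)$$

Example: $g_{ti}(y_{ti}, \theta_t) = y_{ti} - \theta_t \pi_i n_t^{-1}$

Aim: point estimators for $\theta_1^N$ and $\theta_2^N$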
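As a worked check of the example estimating function, suppose the inclusion probabilities of survey $t$ sum to the sample size over the population, $\sum_{i \in U} \pi_i = n_t$ (true for fixed-size designs; stated here as an assumption, since the slides do not spell it out). The population estimating equation then identifies the population total:

```latex
\sum_{i \in U} g_{ti}(y_{ti}, \theta_t)
  = \sum_{i \in U} y_{ti} - \frac{\theta_t}{n_t} \sum_{i \in U} \pi_i
  = \sum_{i \in U} y_{ti} - \theta_t = 0
\quad\Longrightarrow\quad
\theta_t^N = \sum_{i \in U} y_{ti}.
```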

slide-14
SLIDE 14

POINT ESTIMATION

EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION

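$$r(\theta_1, \theta_2) = 2\{\ell(\hat{\boldsymbol{m}}) - \ell(\hat{\boldsymbol{m}}^*, \theta_1, \theta_2)\} \qquad (8)$$

  • $\hat{\boldsymbol{m}}$ maximises the empirical loglikelihood under constraints (2)-(5)
  • $\hat{\boldsymbol{m}}^*$ maximises it, for fixed $\theta_1$ and $\theta_2$, under constraints (2)-(5) and

$$\sum_{i \in s_1} \hat{m}^*_{1i}\, g_{1i}(y_{1i}, \theta_1) = 0, \qquad \sum_{j \in s_2} \hat{m}^*_{2j}\, g_{2j}(y_{2j}, \theta_2) = 0$$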

slide-15
SLIDE 15

POINT ESTIMATION

EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION

Point estimators for $\theta_1^N$, $\theta_2^N$:

$$(\hat{\theta}_1, \hat{\theta}_2) = \arg\min_{\theta_1, \theta_2} r(\theta_1, \theta_2) \qquad (9)$$

SOME ASYMPTOTIC PROPERTIES OF THE ESTIMATOR:

  • Equivalent to a GREG-family estimator making use of information from both samples
  • Root-n consistent
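For the example estimating function $g_{ti}(y_{ti}, \theta_t) = y_{ti} - \theta_t \pi_i n_t^{-1}$ the minimiser can be written down directly. This is a sketch of the reasoning rather than a statement from the slides. Because $r \ge 0$, any $(\theta_1, \theta_2)$ for which the weights $\hat{m}$ already satisfy the extra constraints gives $\hat{m}^* = \hat{m}$ and $r = 0$, and is therefore a minimiser. Solving the sample estimating equation with $\hat{m}$ and using the sample-size constraint (2):

```latex
\sum_{i \in s_t} \hat{m}_{ti} \left( y_{ti} - \theta_t \frac{\pi_{ti}}{n_t} \right) = 0
\quad\Longrightarrow\quad
\hat{\theta}_t
  = \frac{n_t \sum_{i \in s_t} \hat{m}_{ti}\, y_{ti}}{\sum_{i \in s_t} \hat{m}_{ti}\, \pi_{ti}}
  = \sum_{i \in s_t} \hat{m}_{ti}\, y_{ti},
\qquad t = 1, 2.
```

In this case the point estimator is the weighted total of $y$ computed with the aligned weights, which matches the idea that each survey can be analysed separately once the weights exist.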
slide-16
SLIDE 16

POINT ESTIMATION

COMPARISON WITH OTHER APPROACHES

Two test populations:

  • 1. Skewed distribution generated according to the model proposed in [6]
  • 2. 2006 British Expenditure and Food Survey [7]
      • x: number of people living in the household and number of rooms in the household
      • z: gross weekly income
      • y: total gross expenditure and total expenditure on housing (row 7 of Table 1), total expenditure on clothing and total expenditure on housing (row 8), total expenditure on clothing and total expenditure on food (row 9)

slide-17
SLIDE 17

POINT ESTIMATION

COMPARISON WITH OTHER APPROACHES

  • 10 000 iterations
  • Two independent samples selected by systematic random sampling
  • Tested estimators:
      • The proposed Empirical Likelihood estimator (EL),
      • Wu’s Pseudo Empirical Likelihood estimator (PEL, labelled WU in Table 1) [4],
      • Renssen and Nieuwenbroek’s GREG-type estimator (RN) [2],
      • Zieschang’s GREG-type estimator (ZG) [1]
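The simulation design above can be sketched as follows. This is an illustration under assumptions, not the authors' code. It draws equal-probability systematic samples with a random start and computes a Monte Carlo relative bias, taken here to mean (mean of estimates - true value) / true value. The lognormal population, the function names, the reduced number of iterations, and the use of a plain Horvitz-Thompson total in place of the EL, WU, RN and ZG estimators are all stand-ins.

```python
import numpy as np


def systematic_sample(N, n, rng):
    """Indices of an equal-probability systematic sample of size n from 0..N-1.

    Uses the fractional interval N / n with a uniform random start, so every
    unit has inclusion probability n / N.
    """
    step = N / n
    start = rng.uniform(0.0, step)
    idx = np.floor(start + step * np.arange(n)).astype(int)
    return np.minimum(idx, N - 1)      # guard against floating-point overshoot


def relative_bias(estimates, true_value):
    """Monte Carlo relative bias: (mean of estimates - truth) / truth."""
    return (np.mean(estimates) - true_value) / true_value


rng = np.random.default_rng(1)
N, n = 2500, 160                                   # sizes taken from one table scenario
y = rng.lognormal(mean=0.0, sigma=1.0, size=N)     # hypothetical skewed population
total = y.sum()

# Horvitz-Thompson estimates of the total over repeated samples
# (2000 iterations here to keep the sketch fast; the slides use 10 000).
est = np.array([
    (N / n) * y[systematic_sample(N, n, rng)].sum()
    for _ in range(2000)
])
print(f"relative bias: {100 * relative_bias(est, total):.2f}%")
```

Swapping the Horvitz-Thompson line for any other estimator of the total leaves the rest of the loop unchanged, which is how several estimators can be compared on the same replicated samples.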
slide-18
SLIDE 18

POINT ESTIMATION

RELATIVE BIASES OF THE ESTIMATORS

| Data | Row | N | $n_1$ | $n_2$ | $\hat{\theta}_1$ (EL) | $\hat{\theta}_1$ (WU) | $\hat{\theta}_1$ (RN) | $\hat{\theta}_1$ (ZG) | $\hat{\theta}_2$ (EL) | $\hat{\theta}_2$ (WU) | $\hat{\theta}_2$ (RN) | $\hat{\theta}_2$ (ZG) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Generated data | 1 | 100000 | 1000 | 1000 | 0.01% | -0.02% | 0.19% | -0.16% | -0.03% | -0.06% | -0.16% | -0.17% |
| Generated data | 2 | 100000 | 200 | 400 | 0.01% | 0.01% | -0.99% | -0.76% | -0.01% | -0.11% | -0.37% | -0.53% |
| Generated data | 3 | 100000 | 200 | 200 | 0.01% | 0.13% | -0.76% | -0.64% | 0.02% | -0.06% | -0.62% | -0.68% |
| Generated data | 4 | 2500 | 160 | 160 | 0.00% | -0.04% | -1.14% | -0.98% | -0.02% | -0.12% | -0.97% | -1.09% |
| Generated data | 5 | 2500 | 140 | 260 | -0.01% | 0.15% | -1.28% | -0.98% | 0.00% | -0.13% | -0.51% | -0.72% |
| Generated data | 6 | 2500 | 240 | 240 | 0.01% | 0.13% | -0.76% | -0.64% | 0.02% | -0.06% | -0.62% | -0.68% |
| Expenditure and Food Survey data | 7 | 6645 | 500 | 500 | -0.11% | 0.07% | -0.57% | -0.31% | -0.05% | 0.21% | -0.56% | -0.20% |
| Expenditure and Food Survey data | 8 | 6645 | 500 | 500 | 0.38% | 0.44% | -0.07% | 0.03% | 0.06% | 0.06% | -0.38% | -0.35% |
| Expenditure and Food Survey data | 9 | 6645 | 500 | 500 | 0.07% | 0.07% | -0.38% | -0.30% | 0.01% | 0.01% | -0.36% | -0.32% |

TABLE 1. RELATIVE BIASES OF THE PROPOSED EMPIRICAL LIKELIHOOD ESTIMATOR (EL), WU’S PSEUDO EMPIRICAL LIKELIHOOD ESTIMATOR [4] (WU), GREG ESTIMATORS PROPOSED BY ZIESCHANG [1] (ZG) AND RENSSEN AND NIEUWENBROEK [2] (RN)

slide-19
SLIDE 19

CONFIDENCE REGIONS

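Under some regularity conditions

$$r(\theta_1^N, \theta_2^N) \xrightarrow{d} \chi^2_2 \qquad (10)$$

The $(1-\alpha)$ Wilks-type confidence region for $\theta_1^N$, $\theta_2^N$ is constructed by choosing:

$$\{\theta_1, \theta_2 : r(\theta_1, \theta_2) \le \chi^2_{df=2,\alpha}\} \qquad (11)$$

CONFIDENCE INTERVALS obtained using a numerical algorithm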
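A small sketch of the region rule, assuming $\chi^2_{df=2,\alpha}$ denotes the upper-$\alpha$ quantile. With 2 degrees of freedom the chi-square survival function is $\exp(-x/2)$, so the cut-off has the closed form $-2\log\alpha$. The function names and the quadratic `toy_r` are hypothetical stand-ins; in practice each evaluation of $r$ needs the two constrained maximisations from the empirical loglikelihood ratio slide.

```python
import math


def chi2_df2_upper_quantile(alpha):
    """Upper-alpha quantile of the chi-square distribution with 2 degrees of freedom.

    For df = 2 the survival function is exp(-x / 2), so the quantile is -2 * log(alpha).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -2.0 * math.log(alpha)


def in_confidence_region(r_value, alpha=0.05):
    """True if a point whose ratio statistic equals r_value lies in the (1 - alpha) region."""
    return r_value <= chi2_df2_upper_quantile(alpha)


# Hypothetical stand-in for r(theta_1, theta_2): a quadratic bowl centred at (10, 20).
def toy_r(theta1, theta2):
    return ((theta1 - 10.0) / 0.5) ** 2 + ((theta2 - 20.0) / 1.0) ** 2


threshold = chi2_df2_upper_quantile(0.05)
print(round(threshold, 4))                        # 5.9915
print(in_confidence_region(toy_r(10.5, 20.0)))    # True  (r = 1.0)
print(in_confidence_region(toy_r(12.0, 20.0)))    # False (r = 16.0)
```

Scanning a grid of $(\theta_1, \theta_2)$ values with this membership test traces out the region. Its shape follows the contours of $r$, which is why it need not be symmetric around the point estimate.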

slide-20
SLIDE 20

SUMMARY

  • We present an Empirical Likelihood approach to combining information from multiple surveys in the presence of benchmark and consistency constraints
  • This approach may be used to estimate a wide class of parameters and can be used in complex sampling designs
  • Under the tested scenarios, the proposed point estimator shows satisfactory performance compared to the other available estimators in terms of relative bias
  • The main advantage lies in the possibility to construct confidence regions using the $\chi^2$ approximation of the empirical log likelihood ratio function
  • A numerical algorithm for constructing confidence intervals is proposed
  • Although the proposed method entails some numerical operations, it is still less computationally intensive than methods such as bootstrap and relatively easy to implement

slide-21
SLIDE 21

LITERATURE

[1] K. D. Zieschang. Sample weighting methods and estimation of totals in the consumer expenditure survey. Journal of the American Statistical Association, 85(412), (1990), 986–1001.

[2] R.H. Renssen and N.J. Nieuwenbroek. Aligning estimates for common variables in two or more sample surveys. Journal of the American Statistical Association, 92(437), (1997), 368–374.

[3] T. Merkouris. Combining independent regression estimators from multiple surveys. Journal of the American Statistical Association, 99(468), (2004), 1131–1139.

[4] Ch. Wu. Combining information from multiple surveys through the empirical likelihood method. Canadian Journal of Statistics, 32(1), (2004), 15–26.

[5] Y.G. Berger and O. De La Riva Torres. Empirical likelihood confidence intervals for complex sampling designs. Southampton Statistical Sciences Research Institute (S3RI Methodology Working Papers), (2012).

[6] Ch. Wu and J.N.K. Rao. Pseudo Empirical Likelihood Ratio Confidence Intervals for Complex Surveys. The Canadian Journal of Statistics, 34, (2006), 359–375.

[7] Office for National Statistics and Department for Environment, Food and Rural Affairs. Expenditure and Food Survey, 2006 [computer file]. 3rd Edition. Colchester, Essex: UK Data Archive [distributor], July 2009. SN: 5986.