
SLIDE 1

Aligning estimates from different surveys using Empirical Likelihood methods

EWA KABZINSKA AND YVES G. BERGER, THE UNIVERSITY OF SOUTHAMPTON

SLIDE 2

OUTLINE

1. INTRODUCTION
   1. WHY IS IT BENEFICIAL TO COMBINE INFORMATION?
   2. CURRENT APPROACHES
   3. WHY EMPIRICAL LIKELIHOOD?
   4. EMPIRICAL LOGLIKELIHOOD FUNCTION
   5. CONSTRAINTS

2. POINT ESTIMATION
   1. ESTIMATION OF SCALE LOADS
   2. ESTIMATING EQUATIONS
   3. EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION
   4. COMPARISON WITH OTHER APPROACHES

3. CONFIDENCE REGIONS
4. SUMMARY

SLIDE 3

INTRODUCTION

Often different surveys carried out independently in the same population measure some common variables

Population level parameters associated with these common variables may be unknown or unreliable

Examples:
Household size and composition, tenure type
Income, expenditure
Educational attainment
Ethnic origin

SLIDE 4

INTRODUCTION

WHY IS IT BENEFICIAL TO COMBINE INFORMATION?

CONSISTENCY - both samples give the same point estimate for the common variables

IMPROVED PRECISION - "borrowing strength" from the other samples

SLIDE 5

INTRODUCTION

SLIDE 6

INTRODUCTION

SLIDE 7

INTRODUCTION

Using information from two surveys, we want to obtain a single set of positive weights, which:

give the same estimates for the unknown totals of the common variables (Z),
capture additional benchmark constraints ($X_1$ and $X_2$),
may be used for estimation of other population level parameters ($\theta_1^N$ and $\theta_2^N$)

Once the weights are created, each survey can be analysed separately

SLIDE 8

INTRODUCTION

CURRENT APPROACHES

GREG estimators with enlarged number of predictors by Zieschang [1], Renssen and Nieuwenbroek [2] and Merkouris [3]
Pseudo Empirical Likelihood estimator of Wu [4]
Single sample Empirical Likelihood approach for complex sampling designs by Berger and De La Riva Torres [5]

OTHER RELEVANT WORK

Model based projection estimator of Kim and Rao (2011)
Weighted Empirical Likelihood approach to the common mean problem by Tsao and Wu (2006)

SLIDE 9

INTRODUCTION

WHY EMPIRICAL LIKELIHOOD?

Variables of interest are often skewed (e.g. income, expenditure) - Empirical Likelihood is a nonparametric approach
EL makes it easy to incorporate additional benchmark constraints
Asymmetric, data-driven confidence regions may be obtained easily, without relying on variance estimation
Weights are positive by definition

SLIDE 10

INTRODUCTION

EMPIRICAL LOGLIKELIHOOD FUNCTION

$$\ell(m) = \ell(m_1, m_2) = \sum_{i \in s_1} \log m_{1i} + \sum_{i \in s_2} \log m_{2i} \qquad (1)$$

SLIDE 11

INTRODUCTION

CONSTRAINTS

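$$\sum_{i \in s_1} m_{1i} \pi_{1i} = n_1, \qquad \sum_{i \in s_2} m_{2i} \pi_{2i} = n_2 \qquad (2)$$

$$\sum_{i \in s_1} m_{1i} f(x_{1i}, \vartheta_1) = 0, \qquad \sum_{i \in s_2} m_{2i} f(x_{2i}, \vartheta_2) = 0 \qquad (3)$$

$$\sum_{i \in s_1} m_{1i} z_{1i} = \sum_{i \in s_2} m_{2i} z_{2i} \qquad (4)$$

$$m_{1i} > 0, \qquad m_{2i} > 0 \qquad (5)$$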
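As a concrete check of the log-likelihood (1) and the constraints, the sketch below evaluates them for a toy pair of samples. The data are invented, the function names are hypothetical, and the scale loads are simply set to the reciprocals of the inclusion probabilities, which satisfies the sample-size constraint (2) by construction. The benchmark constraint (3) is left out because the slides do not specify its function.

```python
import math

def log_likelihood(m1, m2):
    """Empirical log-likelihood (1): sum of log scale loads over both samples."""
    return sum(math.log(m) for m in m1) + sum(math.log(m) for m in m2)

def check_constraints(m1, pi1, z1, m2, pi2, z2, tol=1e-9):
    """Report which of constraints (2), (4) and (5) hold for the given loads."""
    n1, n2 = len(m1), len(m2)  # assumption: n_t is the number of sampled units
    size_ok = (abs(sum(m * p for m, p in zip(m1, pi1)) - n1) < tol
               and abs(sum(m * p for m, p in zip(m2, pi2)) - n2) < tol)
    aligned = abs(sum(m * z for m, z in zip(m1, z1))
                  - sum(m * z for m, z in zip(m2, z2))) < tol
    positive = all(m > 0 for m in m1 + m2)
    return {"(2)": size_ok, "(4)": aligned, "(5)": positive}

# Toy data, invented for illustration only.
pi1, z1 = [0.5, 0.5], [1.0, 3.0]
pi2, z2 = [0.25, 0.5], [1.0, 2.0]
m1 = [1 / p for p in pi1]  # loads 2, 2
m2 = [1 / p for p in pi2]  # loads 4, 2

checks = check_constraints(m1, pi1, z1, m2, pi2, z2)
loglik = log_likelihood(m1, m2)
```

In this toy case both samples already give the same weighted total of z, so the alignment constraint (4) holds without adjustment. In general the loads have to be found by maximising (1) subject to all constraints.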

SLIDE 12

POINT ESTIMATION

ESTIMATION OF SCALE LOADS

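Find
$$\hat{m} = \arg\max_{m} \Big\{ \ell(m) = \sum_{i \in s_1 \cup s_2} \log m_i \Big\}$$

Solution:
$$\hat{m}_i = (\pi_i + \boldsymbol{\eta}^{T} \boldsymbol{c}_i)^{-1} \qquad (6)$$
where $\pi_i$ is the inclusion probability of the i-th unit, $\boldsymbol{\eta}$ is the vector of Lagrange multipliers and $\boldsymbol{c}_i$ is the vector of constraints (2)-(5)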
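To see where a solution of the form (6) comes from, it may help to work through the simplest special case. The derivation below is an illustration under an assumption not made in the slides: a single sample $s$ of size $n$ with only the sample-size constraint $\sum_{i \in s} m_i \pi_i = n$, and a scalar multiplier $\lambda$ introduced here for that constraint.

```latex
\begin{align*}
L(m, \lambda) &= \sum_{i \in s} \log m_i
  - \lambda \Big( \sum_{i \in s} m_i \pi_i - n \Big), \\
\frac{\partial L}{\partial m_i} &= \frac{1}{m_i} - \lambda \pi_i = 0
  \;\Longrightarrow\; m_i = \frac{1}{\lambda \pi_i}, \\
\sum_{i \in s} m_i \pi_i &= \frac{n}{\lambda} = n
  \;\Longrightarrow\; \lambda = 1, \qquad \hat{m}_i = \pi_i^{-1}.
\end{align*}
```

Under this reading, the loads reduce to the reciprocals of the inclusion probabilities when no further constraints are imposed, and each additional constraint shifts them away from that value through the multiplier term in the denominator.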

SLIDE 13

POINT ESTIMATION

ESTIMATING EQUATIONS

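Let $\theta_1^N$ and $\theta_2^N$ be fixed, unknown population level parameters of interest, solutions to:

$$\sum_{i \in U} g_{1i}(y_{1i}, \theta_1) = 0, \qquad \sum_{i \in U} g_{2i}(y_{2i}, \theta_2) = 0 \qquad (7)$$

Example: $g_{ti}(y_{ti}, \theta_t) = y_{ti} - \theta_t \pi_i n_t^{-1}$

Aim: point estimators for $\theta_1^N$ and $\theta_2^N$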
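For the example estimating function (observation minus parameter times inclusion probability divided by sample size), the weighted sample equation is linear in the parameter and can be solved directly. The sketch below does this for one sample with invented data; the function names are hypothetical and the loads are set to reciprocal inclusion probabilities purely for illustration.

```python
def g_total(y, pi, theta, n):
    """Example estimating function: g_i = y_i - theta * pi_i / n."""
    return [yi - theta * p / n for yi, p in zip(y, pi)]

def solve_theta(m, y, pi):
    """Solve sum_i m_i * g_i(theta) = 0 for theta (linear in theta)."""
    n = len(y)
    weighted_y = sum(mi * yi for mi, yi in zip(m, y))
    weighted_pi = sum(mi * p for mi, p in zip(m, pi))
    return n * weighted_y / weighted_pi

# Toy sample, invented for illustration only.
pi = [0.1, 0.2, 0.25, 0.5]
y = [3.0, 5.0, 4.0, 10.0]
m = [1 / p for p in pi]  # loads 10, 5, 4, 2

theta_hat = solve_theta(m, y, pi)
residual = sum(mi * gi for mi, gi in zip(m, g_total(y, pi, theta_hat, len(y))))
```

When the loads satisfy the sample-size constraint, the solution is simply the weighted total of y, so this choice of estimating function targets a population total.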

SLIDE 14

POINT ESTIMATION

EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION

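$$r(\theta_1, \theta_2) = 2 \{ \ell(\hat{m}) - \ell(\hat{m}^*, \theta_1, \theta_2) \} \qquad (8)$$

Here $\hat{m}$ maximises $\ell(m)$ under constraints (2)-(5), and $\hat{m}^*$ maximises $\ell(m)$ under constraints (2)-(5) together with

$$\sum_{i \in s_1} \hat{m}^*_{1i} \, g_{1i}(y_{1i}, \theta_1) = 0, \qquad \sum_{i \in s_2} \hat{m}^*_{2i} \, g_{2i}(y_{2i}, \theta_2) = 0$$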
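The ratio function can be made concrete with a deliberately simplified sketch: one sample instead of two, only the sample-size constraint and the estimating-equation constraint, and the example estimating function for a total. Under these assumptions the constrained loads are 1 / (pi_i + lam * g_i) for a single multiplier lam, which is found here by bisection. The function name, the toy data and the bisection scheme are illustrative choices, not the authors' algorithm.

```python
import math

def el_ratio(theta, y, pi, iters=200):
    """Single-sample empirical log-likelihood ratio for a total theta."""
    n = len(y)
    g = [yi - theta * p / n for yi, p in zip(y, pi)]
    lower = [-p / gi for gi, p in zip(g, pi) if gi > 0]
    upper = [-p / gi for gi, p in zip(g, pi) if gi < 0]
    if not lower or not upper:
        return math.inf  # no positive loads can satisfy the constraint
    lo, hi = max(lower), min(upper)  # loads are positive for lo < lam < hi
    width = hi - lo
    lo, hi = lo + 1e-9 * width, hi - 1e-9 * width

    def h(lam):
        # Constraint sum_i m_i * g_i = 0 with m_i = 1 / (pi_i + lam * g_i).
        return sum(gi / (p + lam * gi) for gi, p in zip(g, pi))

    for _ in range(iters):  # h is decreasing in lam, so bisection applies
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # 2 * {sum log(1 / pi_i) - sum log m_i*} = 2 * sum log(1 + lam * g_i / pi_i)
    return 2 * sum(math.log(1 + lam * gi / p) for gi, p in zip(g, pi))

# Toy sample, invented for illustration only.
pi = [0.1, 0.2, 0.25, 0.5]
y = [3.0, 5.0, 4.0, 10.0]

r_at_weighted_total = el_ratio(91.0, y, pi)  # 91 = sum of y_i / pi_i
r_below = el_ratio(80.0, y, pi)
r_above = el_ratio(100.0, y, pi)
```

In this simplified setting the ratio is zero at the weighted total of y and positive elsewhere, so the value that minimises the ratio is the natural point estimate.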

SLIDE 15

POINT ESTIMATION

EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION

Point estimators for $\theta_1^N$, $\theta_2^N$:

$$(\hat{\theta}_1, \hat{\theta}_2) = \arg\min_{\theta_1, \theta_2} r(\theta_1, \theta_2) \qquad (9)$$

SOME ASYMPTOTIC PROPERTIES OF THE ESTIMATOR:

Equivalent to a GREG-family estimator making use of information from both samples

Root-n consistent

SLIDE 16

POINT ESTIMATION

COMPARISON WITH OTHER APPROACHES

Two test populations:

1. Skewed distribution generated according to the model proposed in [6]

2. 2006 British Expenditure and Food Survey
x: number of people living in the household and number of rooms in the household
z: gross weekly income
y, by scenario number in Table 1:
(7) total gross expenditure and total expenditure on housing
(8) total expenditure on clothing and total expenditure on housing
(9) total expenditure on clothing and total expenditure on food

SLIDE 17

POINT ESTIMATION

COMPARISON WITH OTHER APPROACHES

10 000 iterations
Two independent samples selected by systematic random sampling

Tested estimators:
The proposed Empirical Likelihood estimator (EL),
Wu’s Pseudo Empirical Likelihood estimator (PEL, labelled WU in Table 1) [4],
Renssen and Nieuwenbroek’s GREG-type estimator (RN) [2],
Zieschang’s GREG-type estimator (ZG) [1]
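The shape of such a simulation can be sketched as follows. Everything specific here is an assumption made for illustration: the slides do not say whether the systematic sampling is equal-probability, how relative bias is defined, or how the skewed population is generated. The sketch uses equal-probability systematic sampling, a plain expansion estimator of a total (not one of the four tested estimators), a signed relative bias, and an exponential toy population.

```python
import random

def systematic_sample(N, n, rng):
    """Equal-probability systematic sample: random start, then every k-th unit."""
    k = N // n  # sampling interval (assumes n divides N)
    start = rng.randrange(k)
    return [start + j * k for j in range(n)]

def relative_bias(estimates, truth):
    """Monte Carlo relative bias: (mean of estimates - truth) / truth."""
    return (sum(estimates) / len(estimates) - truth) / truth

rng = random.Random(2024)
N, n, iterations = 1000, 100, 2000
population = [rng.expovariate(1.0) for _ in range(N)]  # skewed toy population
true_total = sum(population)

estimates = []
for _ in range(iterations):
    sample = systematic_sample(N, n, rng)
    # Expansion estimator of the total: N / n times the sample sum.
    estimates.append(N / n * sum(population[i] for i in sample))

rb = relative_bias(estimates, true_total)
```

Replacing the expansion estimator with each tested estimator, and drawing two independent samples per iteration, would give one row of a table of relative biases.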

SLIDE 18

POINT ESTIMATION

RELATIVE BIASES OF THE ESTIMATORS

                               Estimators of θ1              Estimators of θ2
No.  N       n1    n2    EL     WU     RN     ZG     EL     WU     RN     ZG

Generated data
1    100000  1000  1000  0.01%  0.02%  0.19%  0.16%  0.03%  0.06%  0.16%  0.17%
2    100000  200   400   0.01%  0.01%  0.99%  0.76%  0.01%  0.11%  0.37%  0.53%
3    100000  200   200   0.01%  0.13%  0.76%  0.64%  0.02%  0.06%  0.62%  0.68%
4    2500    160   160   0.00%  0.04%  1.14%  0.98%  0.02%  0.12%  0.97%  1.09%
5    2500    140   260   0.01%  0.15%  1.28%  0.98%  0.00%  0.13%  0.51%  0.72%
6    2500    240   240   0.01%  0.13%  0.76%  0.64%  0.02%  0.06%  0.62%  0.68%

Expenditure and Food Survey data
7    6645    500   500   0.11%  0.07%  0.57%  0.31%  0.05%  0.21%  0.56%  0.20%
8    6645    500   500   0.38%  0.44%  0.07%  0.03%  0.06%  0.06%  0.38%  0.35%
9    6645    500   500   0.07%  0.07%  0.38%  0.30%  0.01%  0.01%  0.36%  0.32%

Table 1. Relative biases of the proposed Empirical Likelihood estimator (EL), Wu’s Pseudo Empirical Likelihood estimator [4] (WU), GREG estimators proposed by Zieschang [1] (ZG) and Renssen and Nieuwenbroek [2] (RN)

SLIDE 19

CONFIDENCE REGIONS

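Under some regularity conditions

$$r(\theta_1^N, \theta_2^N) \xrightarrow{d} \chi^2_2 \qquad (10)$$

The $(1-\alpha)$ Wilks-type confidence region for $(\theta_1^N, \theta_2^N)$ is constructed by choosing:

$$\{ (\theta_1, \theta_2) : r(\theta_1, \theta_2) \le \chi^2_{df=2,\alpha} \} \qquad (11)$$

CONFIDENCE INTERVALS obtained using a numerical algorithm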
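Once the ratio function can be evaluated, membership of the confidence region is a threshold check. The sketch below assumes that the critical value in (11) is the upper-alpha quantile of the chi-square distribution with 2 degrees of freedom, which has the closed form -2 log(alpha) because that distribution's survival function is exp(-x / 2). The function names are hypothetical, and the numerical algorithm for confidence intervals mentioned on the slide is not reproduced here.

```python
import math

def chi2_df2_upper_quantile(alpha):
    """Upper-alpha quantile of chi-square with 2 df: solves exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)

def in_confidence_region(r_value, alpha=0.05):
    """Region (11): keep parameter pairs whose ratio value is below the quantile."""
    return r_value <= chi2_df2_upper_quantile(alpha)

threshold_95 = chi2_df2_upper_quantile(0.05)
```

A region can then be traced by evaluating the ratio function over a grid of candidate parameter pairs and keeping those for which `in_confidence_region` returns true.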

SLIDE 20

SUMMARY

We present an Empirical Likelihood approach to combining information from multiple surveys in the presence of benchmark and consistency constraints

This approach may be used to estimate a wide class of parameters and can be used in complex sampling designs

Under the tested scenarios, the proposed point estimator shows satisfactory performance compared to the other available estimators in terms of relative bias

The main advantage lies in the possibility to construct confidence regions using the $\chi^2$ approximation of the empirical log likelihood ratio function

A numerical algorithm for constructing confidence intervals is proposed

Although the proposed method entails some numerical operations, it is still less computationally intensive than methods such as bootstrap and relatively easy to implement

SLIDE 21

LITERATURE

[1] K. D. Zieschang. Sample weighting methods and estimation of totals in the consumer expenditure survey. Journal of the American Statistical Association, 85(412), (1990), 986–1001.
[2] R. H. Renssen and N. J. Nieuwenbroek. Aligning estimates for common variables in two or more sample surveys. Journal of the American Statistical Association, 92(437), (1997), 368–374.
[3] T. Merkouris. Combining independent regression estimators from multiple surveys. Journal of the American Statistical Association, 99(468), (2004), 1131–1139.
[4] Ch. Wu. Combining information from multiple surveys through the empirical likelihood method. Canadian Journal of Statistics, 32(1), (2004), 15–26.
[5] Y. G. Berger and O. De La Riva Torres. Empirical likelihood confidence intervals for complex sampling designs. Southampton Statistical Sciences Research Institute (S3RI Methodology Working Papers), (2012).
[6] Ch. Wu and J. N. K. Rao. Pseudo Empirical Likelihood Ratio Confidence Intervals for Complex Surveys. The Canadian Journal of Statistics, 34, (2006), 359–375.
[7] Office for National Statistics and Department for Environment, Food and Rural Affairs. Expenditure and Food Survey, 2006 [computer file]. 3rd Edition. Colchester, Essex: UK Data Archive [distributor], July 2009. SN: 5986.