Aligning estimates from different surveys using Empirical Likelihood methods
EWA KABZINSKA AND YVES G. BERGER
THE UNIVERSITY OF SOUTHAMPTON
OUTLINE
- 1. INTRODUCTION
  1. WHY IS IT BENEFICIAL TO COMBINE INFORMATION?
  2. CURRENT APPROACHES
  3. WHY EMPIRICAL LIKELIHOOD?
  4. EMPIRICAL LOGLIKELIHOOD FUNCTION
  5. CONSTRAINTS
- 2. POINT ESTIMATION
  1. ESTIMATION OF SCALE LOADS
  2. ESTIMATING EQUATIONS
  3. EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION
  4. COMPARISON WITH OTHER APPROACHES
- 3. CONFIDENCE REGIONS
- 4. SUMMARY
INTRODUCTION
- Often different surveys carried out independently
in the same population measure some common variables
- Population level parameters associated with these
common variables may be unknown or unreliable
- Examples:
- Household size and composition, tenure type
- Income, expenditure
- Educational attainment
- Ethnic origin
INTRODUCTION
WHY IS IT BENEFICIAL TO COMBINE INFORMATION?
- CONSISTENCY - both samples give the same point
estimate for the common variables
- IMPROVED PRECISION - "borrowing strength" from
the other samples
INTRODUCTION
- Using information from two surveys, we want to obtain a single set of positive weights, which:
- give the same estimates for the unknown totals of the common variables (Z),
- capture additional benchmark constraints ($X_1$ and $X_2$),
- may be used for estimation of other population level parameters ($\theta_1^N$ and $\theta_2^N$)
- Once the weights are created, each survey can be analysed separately
INTRODUCTION
CURRENT APPROACHES
- GREG estimators with an enlarged number of predictors by
Zieschang [1], Renssen and Nieuwenbroek [2] and Merkouris [3]
- Pseudo Empirical Likelihood estimator of Wu [4]
- Single sample Empirical Likelihood approach for complex
sampling designs by Berger and De La Riva Torres [5]
OTHER RELEVANT WORK
- Model based projection estimator of Kim and Rao (2011)
- Weighted Empirical Likelihood approach to the common mean problem
by Tsao and Wu (2006)
INTRODUCTION
WHY EMPIRICAL LIKELIHOOD?
- Variables of interest are often skewed (e.g. income, expenditure) - Empirical Likelihood is a nonparametric approach
- EL makes it easy to incorporate additional benchmark constraints
- Asymmetric, data-driven confidence regions may be obtained easily, without relying on variance estimation
- Weights are positive by definition
INTRODUCTION
EMPIRICAL LOGLIKELIHOOD FUNCTION
$$\ell(\boldsymbol{m}) = \ell(\boldsymbol{m}_1, \boldsymbol{m}_2) = \sum_{i \in s_1} \log m_{1i} + \sum_{j \in s_2} \log m_{2j} \tag{1}$$
INTRODUCTION
CONSTRAINTS
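$$\sum_{i \in s_1} m_{1i} \pi_{1i} = n_1, \qquad \sum_{j \in s_2} m_{2j} \pi_{2j} = n_2 \tag{2}$$
$$\sum_{i \in s_1} m_{1i} f(x_{1i}, \vartheta_1) = 0, \qquad \sum_{j \in s_2} m_{2j} f(x_{2j}, \vartheta_2) = 0 \tag{3}$$
$$\sum_{i \in s_1} m_{1i} \boldsymbol{z}_{1i} = \sum_{j \in s_2} m_{2j} \boldsymbol{z}_{2j} \tag{4}$$
$$m_{1i} > 0, \qquad m_{2j} > 0 \tag{5}$$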
POINT ESTIMATION
ESTIMATION OF SCALE LOADS
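Find
$$\hat{\boldsymbol{m}} = \arg\max \Big\{ \ell(\boldsymbol{m}) = \sum_{i \in s_1 \cup s_2} \log m_i \Big\}$$
Solution:
$$\hat{m}_i = (\pi_i + \boldsymbol{\eta}^T \boldsymbol{c}_i)^{-1} \tag{6}$$
where $\pi_i$ is the inclusion probability of the i-th unit, $\boldsymbol{\eta}$ is the vector of Lagrange multipliers and $\boldsymbol{c}$ is the vector of constraints (2)-(5)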
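As an illustration of the form of solution (6), here is a minimal sketch for a simplified case: a single sample with one benchmark constraint, not the full two-sample problem of the slides. It assumes simple random sampling without replacement (so every inclusion probability equals n/N), a known population total `X_total` of one auxiliary variable, and the centred constraint c_i = x_i - X_total * pi_i / n. The function name `el_weights`, the toy population and the bisection solver are assumptions made for this example, not the authors' algorithm.

```python
import random


def el_weights(pi, c, n_iter=200):
    """Return weights m_i = 1 / (pi_i + eta * c_i) and the scalar eta that
    solves sum_i c_i / (pi_i + eta * c_i) = 0, found by bisection."""
    # Every denominator must stay positive, which confines eta to (lo, hi).
    lo = max(-p / ci for p, ci in zip(pi, c) if ci > 0)
    hi = min(-p / ci for p, ci in zip(pi, c) if ci < 0)

    def h(eta):
        return sum(ci / (p + eta * ci) for p, ci in zip(pi, c))

    # h is strictly decreasing on (lo, hi), so its root is unique.
    eta = 0.0  # eta = 0 corresponds to the design weights 1 / pi_i
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if not lo < mid < hi:  # interval exhausted in floating point
            break
        eta = mid
        if h(eta) > 0:
            lo = eta
        else:
            hi = eta
    return [1.0 / (p + eta * ci) for p, ci in zip(pi, c)], eta


# Toy population and sample (hypothetical data, for illustration only).
rng = random.Random(1)
N, n = 1000, 50
x_pop = [rng.uniform(1.0, 5.0) for _ in range(N)]
X_total = sum(x_pop)                                 # known benchmark total
sample = rng.sample(range(N), n)                     # simple random sample
x = [x_pop[i] for i in sample]
pi = [n / N] * n                                     # inclusion probabilities
c = [xi - X_total * p / n for xi, p in zip(x, pi)]   # centred constraint
m, eta = el_weights(pi, c)

print(sum(mi * p for mi, p in zip(m, pi)))    # close to n
print(sum(mi * xi for mi, xi in zip(m, x)))   # close to X_total
```

In this one-constraint case the weights are positive by construction, reproduce the sample size through the sum of m_i * pi_i, and reproduce the benchmark total through the sum of m_i * x_i. With several constraints the multiplier becomes a vector and a multivariate solver would be needed.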
POINT ESTIMATION
ESTIMATING EQUATIONS
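Let $\theta_1^N$ and $\theta_2^N$ be fixed, unknown population level parameters of interest, solutions to:
$$\sum_{i \in U} g_{1i}(y_{1i}, \theta_1) = 0, \qquad \sum_{i \in U} g_{2i}(y_{2i}, \theta_2) = 0 \tag{7}$$
Example: $g_{ti}(y_{ti}, \theta_t) = y_{ti} - \theta_t \pi_{ti} n_t^{-1}$
Aim: point estimators for $\theta_1^N$ and $\theta_2^N$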
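To see which parameter the example estimating function targets, the short derivation below solves the population equation for $\theta_t$. Here $\pi_{ti}$ denotes the inclusion probability of unit $i$ in survey $t$ and $n_t$ the sample size. The derivation assumes a fixed-size sampling design, so that the inclusion probabilities sum to $n_t$ over the population $U$; this assumption is not stated on the slide.

```latex
\sum_{i \in U} \left( y_{ti} - \theta_t \pi_{ti} n_t^{-1} \right) = 0
\;\Longrightarrow\;
\theta_t^N = \frac{n_t \sum_{i \in U} y_{ti}}{\sum_{i \in U} \pi_{ti}}
           = \sum_{i \in U} y_{ti}
\qquad \text{when } \sum_{i \in U} \pi_{ti} = n_t .
```

Under that assumption, the example parameter is the population total of the variable of interest in survey $t$.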
POINT ESTIMATION
EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION
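$$r(\theta_1, \theta_2) = 2 \left\{ \ell(\hat{\boldsymbol{m}}) - \ell(\hat{\boldsymbol{m}}^*, \theta_1, \theta_2) \right\} \tag{8}$$
where $\hat{\boldsymbol{m}}$ maximises $\ell(\boldsymbol{m})$ under constraints (2)-(5), and $\hat{\boldsymbol{m}}^*$ maximises $\ell(\boldsymbol{m})$ under constraints (2)-(5) together with
$$\sum_{i \in s_1} \hat{m}^*_{1i} g_{1i}(y_{1i}, \theta_1) = 0, \qquad \sum_{j \in s_2} \hat{m}^*_{2j} g_{2j}(y_{2j}, \theta_2) = 0$$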
POINT ESTIMATION
EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION
Point estimators for $\theta_1^N$, $\theta_2^N$:
$$(\hat{\theta}_1, \hat{\theta}_2) = \arg\min_{\theta_1, \theta_2} r(\theta_1, \theta_2) \tag{9}$$
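A short worked step may help here. It assumes the estimating equations can be solved exactly with the weights $\hat{m}$ obtained under constraints (2)-(5) alone. The restricted weights satisfy every constraint that $\hat{m}$ satisfies plus two more, so the ratio function is never negative, and it equals zero wherever $\hat{m}$ itself satisfies the extra constraints. For the total-type example estimating function $y_{ti} - \theta_t \pi_{ti} n_t^{-1}$, the minimiser is then a weighted sample total:

```latex
r(\theta_1, \theta_2) \ge 0,
\qquad
r(\hat{\theta}_1, \hat{\theta}_2) = 0
\ \text{ when } \
\sum_{i \in s_t} \hat{m}_{ti} \left( y_{ti} - \hat{\theta}_t \pi_{ti} n_t^{-1} \right) = 0,
\quad t = 1, 2,

\hat{\theta}_t
  = \frac{n_t \sum_{i \in s_t} \hat{m}_{ti} y_{ti}}{\sum_{i \in s_t} \hat{m}_{ti} \pi_{ti}}
  = \sum_{i \in s_t} \hat{m}_{ti} y_{ti}
\qquad \text{using constraint (2).}
```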
SOME ASYMPTOTIC PROPERTIES OF THE ESTIMATOR:
- Equivalent to a GREG-family estimator making use of
information from both samples
- Root-n consistent
POINT ESTIMATION
COMPARISON WITH OTHER APPROACHES
Two test populations:
- 1. Skewed distribution generated according to the model
proposed in [6]
- 2. 2006 British Expenditure and Food Survey
- x: number of people living in the household and number of rooms in
the household
- z: gross weekly income
- y: total gross expenditure and total expenditure on housing (scenario 7), total expenditure on clothing and total expenditure on housing (scenario 8), total expenditure on clothing and total expenditure on food (scenario 9)
POINT ESTIMATION
COMPARISON WITH OTHER APPROACHES
- 10 000 iterations
- Two independent samples selected by systematic random
sampling
- Tested estimators:
- The proposed Empirical Likelihood estimator (EL),
- Wu’s Pseudo Empirical Likelihood estimator (PEL) [4],
- Renssen and Nieuwenbroek’s GREG-type estimator (RN) [2],
- Zieschang’s GREG-type estimator (ZG) [1]
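The sampling step of the simulation can be sketched as follows. The slides do not state the ordering of the population or the inclusion probabilities, so this minimal example assumes an equal-probability systematic sample with a random start and a possibly fractional interval; `systematic_sample` is a hypothetical helper name. The sizes 6645 and 500 are those of the Expenditure and Food Survey scenarios.

```python
import random


def systematic_sample(N, n, rng):
    """Equal-probability systematic sample of n indices from range(N)."""
    step = N / n                    # sampling interval (may be fractional)
    start = rng.random() * step     # random start in [0, step)
    # min(...) guards against a floating-point overshoot at the last unit.
    return [min(int(start + k * step), N - 1) for k in range(n)]


rng = random.Random(2006)
s1 = systematic_sample(6645, 500, rng)   # first sample
s2 = systematic_sample(6645, 500, rng)   # second sample, independent start
```

Each call draws its own random start, which is what makes the two samples independent of each other in this sketch.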
POINT ESTIMATION
RELATIVE BIASES OF THE ESTIMATORS
| Data | Scenario | N | $n_1$ | $n_2$ | $\hat{\theta}_1^{(EL)}$ | $\hat{\theta}_1^{(WU)}$ | $\hat{\theta}_1^{(RN)}$ | $\hat{\theta}_1^{(ZG)}$ | $\hat{\theta}_2^{(EL)}$ | $\hat{\theta}_2^{(WU)}$ | $\hat{\theta}_2^{(RN)}$ | $\hat{\theta}_2^{(ZG)}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Generated data | 1 | 100000 | 1000 | 1000 | 0.01% | -0.02% | 0.19% | -0.16% | -0.03% | -0.06% | -0.16% | -0.17% |
| Generated data | 2 | 100000 | 200 | 400 | 0.01% | 0.01% | -0.99% | -0.76% | -0.01% | -0.11% | -0.37% | -0.53% |
| Generated data | 3 | 100000 | 200 | 200 | 0.01% | 0.13% | -0.76% | -0.64% | 0.02% | -0.06% | -0.62% | -0.68% |
| Generated data | 4 | 2500 | 160 | 160 | 0.00% | -0.04% | -1.14% | -0.98% | -0.02% | -0.12% | -0.97% | -1.09% |
| Generated data | 5 | 2500 | 140 | 260 | -0.01% | 0.15% | -1.28% | -0.98% | 0.00% | -0.13% | -0.51% | -0.72% |
| Generated data | 6 | 2500 | 240 | 240 | 0.01% | 0.13% | -0.76% | -0.64% | 0.02% | -0.06% | -0.62% | -0.68% |
| Expenditure and Food Survey data | 7 | 6645 | 500 | 500 | -0.11% | 0.07% | -0.57% | -0.31% | -0.05% | 0.21% | -0.56% | -0.20% |
| Expenditure and Food Survey data | 8 | 6645 | 500 | 500 | 0.38% | 0.44% | -0.07% | 0.03% | 0.06% | 0.06% | -0.38% | -0.35% |
| Expenditure and Food Survey data | 9 | 6645 | 500 | 500 | 0.07% | 0.07% | -0.38% | -0.30% | 0.01% | 0.01% | -0.36% | -0.32% |

Table 1. Relative biases of the proposed Empirical Likelihood estimator (EL), Wu's Pseudo Empirical Likelihood estimator [4] (WU), GREG estimators proposed by Zieschang [1] (ZG) and Renssen and Nieuwenbroek [2] (RN)
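The slides do not define relative bias. The sketch below assumes the usual Monte Carlo definition, namely the mean of the replicate estimates minus the true value, divided by the true value and expressed in percent. The function name and the toy numbers are invented for illustration and are not taken from the simulation study.

```python
from statistics import fmean


def relative_bias(estimates, true_value):
    """Relative bias in percent: 100 * (mean(estimates) - true) / true."""
    return 100.0 * (fmean(estimates) - true_value) / true_value


# Toy replicate estimates, invented for illustration only.
toy_estimates = [101.0, 103.0, 99.0, 105.0]
rb = relative_bias(toy_estimates, 100.0)
print(f"{rb:.2f}%")   # prints 2.00%
```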
CONFIDENCE REGIONS
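Under some regularity conditions
$$r(\theta_1^N, \theta_2^N) \xrightarrow{d} \chi^2_2 \tag{10}$$
The $(1-\alpha)$ Wilks-type confidence region for $(\theta_1^N, \theta_2^N)$ is constructed by choosing:
$$\left\{ (\theta_1, \theta_2) : r(\theta_1, \theta_2) \le \chi^2_{df=2,\alpha} \right\} \tag{11}$$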
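The cut-off in the confidence region can be computed without tables: a chi-squared variable with 2 degrees of freedom has survival function exp(-x/2), so its upper-alpha critical value is -2 ln(alpha). The sketch below uses that closed form and checks region membership. `toy_r` is a hypothetical quadratic stand-in used only to exercise the check; it is not the empirical log-likelihood ratio function of the slides.

```python
import math


def chi2_df2_critical(alpha):
    """Upper-alpha critical value of the chi-squared distribution with
    2 degrees of freedom, using the closed form -2 * ln(alpha)."""
    return -2.0 * math.log(alpha)


def in_region(r_value, alpha=0.05):
    """True if a point whose ratio statistic is r_value lies in the region."""
    return r_value <= chi2_df2_critical(alpha)


def toy_r(theta1, theta2):
    # Hypothetical stand-in for r(theta1, theta2); NOT the empirical
    # log-likelihood ratio function.
    return ((theta1 - 10.0) / 0.5) ** 2 + ((theta2 - 20.0) / 1.0) ** 2


crit = chi2_df2_critical(0.05)
print(round(crit, 3))                  # 5.991
print(in_region(toy_r(10.0, 20.0)))    # True
print(in_region(toy_r(12.0, 20.0)))    # False
```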
CONFIDENCE INTERVALS obtained using a numerical algorithm
SUMMARY
- We present an Empirical Likelihood approach to combining information from
multiple surveys in the presence of benchmark and consistency constraints
- This approach may be used to estimate a wide class of parameters and can be
used in complex sampling designs
- Under the tested scenarios, the proposed point estimator shows satisfactory
performance compared to the other available estimators in terms of relative bias
- The main advantage lies in the possibility to construct confidence regions using
the $\chi^2$ approximation of the empirical log likelihood ratio function
- A numerical algorithm for constructing confidence intervals is proposed
- Although the proposed method entails some numerical operations, it is still less
computationally intensive than methods such as bootstrap and relatively easy to implement
LITERATURE
[1] K. D. Zieschang. Sample weighting methods and estimation of totals in the consumer expenditure survey. Journal of the American Statistical Association, 85(412), (1990), 986–1001.
[2] R.H. Renssen and N.J. Nieuwenbroek. Aligning estimates for common variables in two or more sample surveys. Journal of the American Statistical Association, 92(437), (1997), 368–374.
[3] Takis Merkouris. Combining independent regression estimators from multiple surveys. Journal of the American Statistical Association, 99(468), (2004), 1131–1139.
[4] Ch. Wu. Combining information from multiple surveys through the empirical likelihood method. Canadian Journal of Statistics, 32(1), (2004), 15–26.
[5] Y.G. Berger and O. De La Riva Torres. Empirical likelihood confidence intervals for complex sampling designs. Southampton Statistical Sciences Research Institute, (S3RI Methodology Working Papers), (2012).
[6] Ch. Wu and J.K. Rao. Pseudo Empirical Likelihood Ratio Confidence Intervals for Complex Surveys. The Canadian Journal of Statistics, 34, (2006), 359–375.
[7] Office for National Statistics and Department for Environment, Food and Rural Affairs. Expenditure and Food Survey, 2006 [computer file]. 3rd Edition. Colchester, Essex: UK Data Archive [distributor], July 2009. SN: 5986.