SLIDE 1

InGrid 2.0 Study of poverty measurement on context-specific environment

Federica Nicolussi and Manuela Cazzaro September 27, 2018 Results from visiting at TARKI Group Budapest

SPMCSE September 27, 2018 1 / 27

SLIDE 2

Overview

1. Introduction
2. Models
   Graphical model
   Multivariate regression parametrization
3. Application
   Results
4. Conclusions
5. Acknowledgement
6. References

SLIDE 4

Introduction

Framework:
q categorical (ordinal) variables Q = {X1, . . . , Xq} collected in a contingency table;
study of the (in)dependence relationships among these variables.

Main goals:
1) to consider different kinds of relationships (marginal, conditional and context-specific independencies in the same model);
2) to represent these relationships through a graphical model [Stratified chain graph model];
3) to represent the variables in a multivariate regression system [Regression parameters].

SLIDE 8

Example of potential relationships among the variables

(AIM 1) Let us consider three variables A, B and C, for instance:
A  Gender (M, F);
B  University position (Student; PhD and Post Doc; Academic staff; Technical and administrative staff);
C  Age (≤ 25, 25 ⊣ 40, > 40).

MARGINAL INDEPENDENCE: A ⊥ C → GENDER ⊥ AGE
CONDITIONAL INDEPENDENCE: A ⊥ B | C → GENDER ⊥ POSITION | AGE
CONTEXT-SPECIFIC INDEPENDENCE: A ⊥ B | C = ck → GENDER ⊥ POSITION | AGE = (> 40)
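The three kinds of independence can be told apart numerically on a small table. The sketch below is only an illustration: it uses a hypothetical joint distribution over three binary variables (not the gender, position and age data of the example), built so that A ⊥ C holds marginally and A ⊥ B holds only in the context C = 1. The helper name `is_independent` is an assumption made for this example.

```python
from itertools import product
from math import isclose

LEVELS = (0, 1)

# Hypothetical distribution: P(C) and P(A, B | C = c) for each context c.
p_c = {0: 0.5, 1: 0.5}
p_ab_given_c = {
    0: {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4},      # A and B dependent
    1: {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},  # A and B independent
}
# Joint probabilities indexed by (a, b, c).
joint = {(a, b, c): p_c[c] * p_ab_given_c[c][(a, b)]
         for a, b, c in product(LEVELS, repeat=3)}


def is_independent(table):
    """True if a two-way table {(x, y): mass} factorises as p(x) * p(y)."""
    total = sum(table.values())
    px = {x: sum(table[(x, y)] for y in LEVELS) / total for x in LEVELS}
    py = {y: sum(table[(x, y)] for x in LEVELS) / total for y in LEVELS}
    return all(isclose(table[(x, y)] / total, px[x] * py[y], abs_tol=1e-12)
               for x, y in product(LEVELS, repeat=2))


# Marginal independence A ⊥ C: sum B out, then check the (A, C) table.
ac_margin = {(a, c): sum(joint[(a, b, c)] for b in LEVELS)
             for a, c in product(LEVELS, repeat=2)}
marginal_a_c = is_independent(ac_margin)

# Context-specific independence A ⊥ B | C = c: check each slice separately.
context_specific = {
    c: is_independent({(a, b): joint[(a, b, c)]
                       for a, b in product(LEVELS, repeat=2)})
    for c in LEVELS
}

# Conditional independence A ⊥ B | C requires the independence in every context.
conditional_a_b = all(context_specific.values())

print(marginal_a_c, context_specific, conditional_a_b)
```

In this toy table the conditional independence fails, while the context-specific statement still holds in one context; this is the situation the labelled arcs of the stratified graph are meant to capture.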

SLIDE 10

Graphical representation:

(AIM 2)
⋆ These different kinds of independence can be highlighted through a graphical model.
⋆ The three kinds of independence are well represented by the so-called Stratified Chain Graph Model (SCGM).

SLIDE 11

Stratified regression chain graph models

A graph is defined by a set of vertices (V) and a set of edges (E). An edge can be undirected or directed (an arrow).

[Figure: three example graphs, two on the vertices A, B, C and one on the vertices A, B, C, D, E]

SLIDE 13

Stratified regression chain graph models

[Figure: chain graph on the vertices A, B, C, D, E]

⋆ Vertices represent variables
⋆ Any missing arc corresponds to an independence
⋆ Markov properties:

missing undirected arc (C − D): C ⊥ D | AB
missing directed arc (B → C): C ⊥ B | A
missing directed arcs (A → E) and (B → E): E ⊥ AB | CD

SLIDE 15

Stratified regression chain graph models

[Figure: chain graph on the vertices A, B, C, D, E]

⋆ Any directed arc links a covariate with a response variable
⋆ Any undirected arc describes a symmetrical dependence (among a set of covariates or among a set of dependent variables)
⋆ A, B: covariates
⋆ C, D, E: dependent variables

SLIDE 16

Stratified regression chain graph models

[Figure: chain graph on the vertices A, B, C, D, E, with the arc C − D labelled AB = (a1, ∗)]

⋆ A labelled arc reports the modality(ies) of the conditioning variable(s) for which the context-specific independence holds, here C ⊥ D | AB = (a1, ∗)

SLIDE 17

Multivariate Regression parametrization

(AIM 3) Given two sets of variables, the response variables (C and D) and the covariates (A and B), the multivariate regression model is:

\[ \eta^{ABC}_{C}(i_C \mid i_A i_B) = \beta^{C}_{\emptyset} + \beta^{C}_{A}(i_A) + \beta^{C}_{B}(i_B) + \beta^{C}_{AB}(i_A i_B) \]

\[ \eta^{ABD}_{D}(i_D \mid i_A i_B) = \beta^{D}_{\emptyset} + \beta^{D}_{A}(i_A) + \beta^{D}_{B}(i_B) + \beta^{D}_{AB}(i_A i_B) \]

\[ \eta^{ABCD}_{CD}(i_{CD} \mid i_A i_B) = \beta^{CD}_{\emptyset} + \beta^{CD}_{A}(i_A) + \beta^{CD}_{B}(i_B) + \beta^{CD}_{AB}(i_A i_B) \]

The η are log-linear parameters (contrasts of logarithms of sums of probabilities) defined on marginal tables, respecting the completeness and hierarchy properties (Bartolucci, Colombi and Forcina, 2007); the regression parameters β are a function of the η parameters.
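As a worked instance of the first regression equation, suppose C is binary with levels 1 and 2 and that baseline (reference-category) logits are used. Both are assumptions made here for illustration; the slides do not state which type of logit is adopted. The parameter is then the logit of C within each covariate configuration:

```latex
\[
\eta^{ABC}_{C}(2 \mid i_A i_B)
  = \log \frac{P(C = 2 \mid A = i_A,\, B = i_B)}{P(C = 1 \mid A = i_A,\, B = i_B)}
  = \beta^{C}_{\emptyset} + \beta^{C}_{A}(i_A) + \beta^{C}_{B}(i_B) + \beta^{C}_{AB}(i_A i_B)
\]
```

Read this way, setting the terms that involve B to zero leaves a logit that depends on A only, which is how a missing arc B → C turns into a constraint on the β parameters.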

SLIDE 18

Model definition (constraints on regression parameters)

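From the missing arcs to the constraints on the regression parameters

MISSING UNDIRECTED ARC (C − D): C ⊥ D | AB

\[ \eta^{ABCD}_{CD}(i_C i_D \mid i_A i_B) = 0 \quad \forall\, i_A, i_B, i_C, i_D \]

MISSING DIRECTED ARC (B → C): B ⊥ C | A

\[ \eta^{ABC}_{C}(i_C \mid i_A i_B) = \beta^{C}_{\emptyset} + \beta^{C}_{A}(i_A) \quad \forall\, i_A, i_B, i_C \]

[Figure: chain graph on the vertices A, B, C, D, E]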

SLIDE 19

Model definition (constraints on regression parameters)

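From the labelled arcs to the constraints on the regression parameters

LABELLED UNDIRECTED ARC (C − D): C ⊥ D | AB = (a1, ∗)

\[ \eta^{ABCD}_{CD}(i_C i_D \mid i'_A i'_B) = 0 \quad \forall\, i_C, i_D \ \text{and} \ i'_A i'_B = (a_1, *) \]

LABELLED DIRECTED ARC (B → C): B ⊥ C | A = a1

\[ \eta^{ABC}_{C}(i_C \mid i'_A i_B) = \beta^{C}_{\emptyset} + \beta^{C}_{A}(i'_A) \quad \forall\, i_B, i_C \ \text{and} \ i'_A = a_1 \]

[Figure: chain graph on the vertices A, B, C, D, E, with arcs labelled AB = (a1, ∗) and A = (a1)]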

SLIDE 20

At a glance

The graph gives a system of independencies;
The unconstrained parameters describe the dependence relationships;
The system of independencies identifies the regression parameters constrained to zero;
The fit of a model is assessed through the likelihood ratio test.
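The likelihood ratio statistic compares observed counts with the counts fitted under the constrained model, G² = 2 Σ o log(o / e). The sketch below only illustrates the statistic: it uses hypothetical counts in a 2 × 2 table and the closed-form fit under plain row/column independence, whereas the models in this talk are fitted by constrained maximum likelihood on marginal parameters. The function names are assumptions made for this example.

```python
from math import log


def g_squared(observed, expected):
    """Likelihood ratio statistic G^2 = 2 * sum(o * log(o / e)); empty cells add 0."""
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


def independence_fit(table):
    """Fitted counts under row/column independence for a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of counts.
counts = [[30, 10],
          [20, 40]]
fitted = independence_fit(counts)

# Flatten both tables cell by cell before comparing them.
observed_cells = [o for row in counts for o in row]
fitted_cells = [e for row in fitted for e in row]

g2 = g_squared(observed_cells, fitted_cells)
df = (len(counts) - 1) * (len(counts[0]) - 1)
print(f"G2 = {g2:.2f} on {df} degree(s) of freedom")
```

A large G² relative to its degrees of freedom speaks against the constraints; a small one, as reported for the selected model later in the talk, means the independencies are compatible with the data.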

SLIDE 21

Selection of the best fitting model

Step 1 Exploratory phase, where we test all SCRGMs with only one missing arc. As reduced model we take the one in which every arc whose removal led to a p-value greater than 0.01 is missing;
Step 2 Starting from the reduced model, we add the arcs back one at a time. We choose the HMM (hierarchical multinomial marginal) model with the lowest AIC;
Step 3 We further simplify the model by replacing the missing arcs with labelled arcs. We choose the model with the lowest AIC among those with a p-value greater than or equal to 0.05.
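The decision rule of Step 3 (lowest AIC among the candidates that are not rejected at the 0.05 level) can be sketched as follows. The class `FittedModel`, the function `select_model` and the numbers are all hypothetical; fitting the candidate models themselves is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class FittedModel:
    name: str       # hypothetical label of a candidate model
    p_value: float  # p-value of its likelihood ratio test
    aic: float      # its AIC


def select_model(candidates, alpha=0.05):
    """Return the lowest-AIC model among those with p-value >= alpha, or None."""
    admissible = [m for m in candidates if m.p_value >= alpha]
    if not admissible:
        return None
    return min(admissible, key=lambda m: m.aic)


# Made-up values, for illustration only.
candidates = [
    FittedModel("M1", p_value=0.30, aic=1510.2),
    FittedModel("M2", p_value=0.03, aic=1498.7),  # lowest AIC, but rejected
    FittedModel("M3", p_value=0.12, aic=1505.4),
]
best = select_model(candidates)
print(best.name if best else "no admissible model")
```

Here the model with the lowest AIC overall is discarded because its p-value falls below the threshold, so the rule picks the best of the remaining candidates.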

SLIDE 22

Data Set

We consider the subjects from 26 European countries whose self-defined current economic status (variable PL031 in the survey) is (i) employee working full-time, (ii) employee working part-time, (iii) self-employed working full-time, (iv) self-employed working part-time, (v) unemployed, (vi) permanently disabled and/or unfit to work, or (vii) fulfilling domestic tasks and care responsibilities. The survey covers 288132 individuals.

SLIDE 23

Variables

G  Gender (1 = male, 2 = female);
A  Age, categorized in 4 classes given by the quartiles (1 = 16 ⊢ 36; 2 = 36 ⊢ 46; 3 = 46 ⊢ 55; 4 = 55 ⊢ 81);
W  Status in employment (1 = self-employed with employees, 2 = self-employed without employees, 3 = employee, 4 = family worker, 5 = unemployed);
H  General health (1 = very good, 2 = good, 3 = fair, 4 = bad, 5 = very bad);
P  Poverty indicator (0 = equivalised disposable income ≥ at-risk-of-poverty threshold, 1 = equivalised disposable income < at-risk-of-poverty threshold).

AIMS:
How gender and age affect the working condition and the perceived general health;
How these variables affect the poverty indicator.

SLIDE 24

Some information

We have 288132 observations.
We collect the 5 variables in a contingency table of 400 cells, of which only 33 are null.
The class of marginal sets is {(G, A); (G, A, W); (G, A, H); (G, A, W, H); (G, A, W, H, P)}
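The table size follows from the number of categories of each variable listed in the variable definitions (2, 4, 5, 5 and 2). A short check, which also gives the size of each marginal table in the class above; the dictionary and variable names are chosen for this example only:

```python
from math import prod

# Number of categories per variable, as listed in the variable definitions.
levels = {"G": 2, "A": 4, "W": 5, "H": 5, "P": 2}

# The class of marginal sets used for the parametrization.
marginal_sets = [
    ("G", "A"),
    ("G", "A", "W"),
    ("G", "A", "H"),
    ("G", "A", "W", "H"),
    ("G", "A", "W", "H", "P"),
]

# Cells in the full table and in each marginal table.
n_cells = prod(levels.values())
marginal_sizes = {m: prod(levels[v] for v in m) for m in marginal_sets}

print(n_cells)
for m, size in marginal_sizes.items():
    print("".join(m), size)
```

The full table has 2 · 4 · 5 · 5 · 2 = 400 cells, matching the figure on the slide.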

SLIDE 25

Mosaic Plots

Representation of the distribution of G and P in two conditional distributions: (left) evidence of dependence between G and P; (right) evidence of independence between G and P.

[Figure: two mosaic plots of P by G, shaded by deviance residuals. Left panel: A, H, W = 4,1,5, p-value = 8.8224e−16. Right panel: A, H, W = 1,5,5, p-value = 0.93214.]

SLIDE 26

SCGR model

G² = 19.82, df = 33, p-value = 0.966

G ⊥ P | AHW = i, i ∈ K1
A ⊥ P | GHW = i, i ∈ K2
H ⊥ P | GAW = i, i ∈ K3

where
K1 = {(4, 1, 1); (2, 2, 1); (1, 4, 3); (4, 1, 4); (4, 3, 4); (1, 5, 5)}
K2 = {(1, 2, 1); (1, 4, 2); (1, 4, 3); (2, 1, 4)}
K3 = {(1, 1, 2); (1, 2, 4); (1, 3, 4)}

[Figure: graph on the vertices G, A, H, W, P, with arcs labelled AHW ∈ K1, GHW ∈ K2 and GAW ∈ K3]
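The reported p-value can be checked against the chi-square reference distribution using only the standard library. The function `chi2_sf` below is a sketch written for this purpose, not part of any package used in the talk; it evaluates the regularized lower incomplete gamma function by its power series, which is adequate when the statistic is not far above its degrees of freedom, as here.

```python
from math import exp, lgamma, log


def chi2_sf(x, df, max_terms=1000):
    """Upper tail P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series: P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_n z^n / ((a+1)...(a+n))
    term, total = 1.0, 1.0
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = exp(a * log(z) - z - lgamma(a + 1.0)) * total
    return 1.0 - lower


# Values reported for the selected model.
p_value = chi2_sf(19.82, 33)
print(f"p-value = {p_value:.3f}")
```

The result should land close to the 0.966 reported above: a G² well below its 33 degrees of freedom gives no evidence against the selected model.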

SLIDE 27

Regression parameters part 1

\[ \eta^{GAH}_{H}(i_H \mid i_{GA}) = \sum_{t \subseteq GA} \beta^{H}_{t}(i_t) \]

i_GA              11      21      12      22      13      23      14      24
i_H = good      -0.07    0.08    0.54    0.63    1.03    1.06    1.35    1.51
i_H = fair      -1.78   -1.52   -0.83   -0.57    0.16    0.35    1.02    1.36
i_H = bad       -3.36   -3.10   -2.43   -2.14   -1.27   -1.02   -0.15    0.33
i_H = very bad  -4.49   -4.57   -3.84   -3.77   -2.75   -2.64   -1.68   -1.03

SLIDE 28

Regression parameters part 2

\[ \eta^{GAW}_{W}(i_W \mid i_{GA}) = \sum_{t \subseteq GA} \beta^{W}_{t}(i_t) \]

i_GA       11      21      12      22      13      23      14      24
W = 2     1.30    1.74    0.95    1.27    0.89    1.36    1.05    1.43
W = 3     3.73    4.59    2.85    3.72    2.62    3.67    2.44    3.41
W = 4    -0.57   -0.04   -2.87   -0.59   -3.01   -0.60   -2.55   -0.50
W = 5     2.57    3.86    1.16    2.66    1.16    2.67    1.59    3.46

SLIDE 29

Regression parameters part 2

\[ \eta^{GAEW}_{EW}(i_{EW} \mid i_{GA}) = \sum_{t \subseteq GA} \beta^{EW}_{t}(i_t) \]

i_GA   i_EW = 22   i_EW = 32   i_EW = 42   i_EW = 23   i_EW = 33   i_EW = 43
22      -2.4754     -1.088      -2.1511     -3.7235     -2.7586     -3.0713
23      -1.0745     -1.8838     -1.8447     -1.3376     -4.3929     -5.0068
24      -2.041      -0.1457      0.4076      0.1554      0.4376      0.1487
25      -2.041      -0.1457      0.4076      0.1554      0.4376      0.1487

SLIDE 30

Regression parameters part 2

i_GAHW  P = 2      i_GAHW  P = 2      i_GAHW  P = 2      i_GAHW  P = 2      i_GAHW  P = 2
1111    -0.02559   2141     0.365155  1222    -0.67201   2252    45.82337   1333   -25.6109
2111    -0.31051   1241    -3.56451   2222     1.110271  1352    23.1385    2333    -3.95803
1211    -1.76053   2241   -22.7201    1322   -20.4415    2352    27.35861   1433    -4.2822
2211    -1.86117   1341   -22.1938    2322   -18.3234    1452    44.81975   2433    17.2771
1311   -20.9713    2341   -20.3191    1422    -1.15438   2452    49.12801   1143    -2.63091
2311   -21.0637    1441    -1.75969   2422     0.370168  1113    -2.2461    2143    20.05227
1411    -0.08826   2441     0.706974  1132    -1.52665   2113    17.3933    1243   -25.0956
2411     0.07576   1151     0.414511  2132     1.107598  1213    -4.36905   2243    -1.6593
1121     0.301576  2151     2.092933  1232    -3.35302   2213    15.82797   1343   -43.9062
2121    -0.07569   1251    -2.34795   2232     0.341225  1313   -23.6601    2343   -20.2286
1221     0.963888  2251     1.280823  1332   -22.5923    2313    -3.04126   1443   -23.1913
2221     0.557474  1351   -20.4344    2332   -18.8195    1413    -2.72196   2443    -0.07466
1321   -18.527     2351   -17.7221    1432    -0.86475   2413    17.73191   1153   -21.1515
2321   -18.4346    1451     1.735888  2432     2.513405  1123    -2.21047   2153    -0.50869
1421     0.45437   2451     3.532973  1142    -0.65964   2123    16.95908   1253    -1.06106
2421    -0.30068   1112     0.056486  2142   -18.6708    1223    -1.844     2253    21.17888
1131    -1.87645   2112     1.590824  1242   -22.9016    2223    17.97288   1353   -19.5181
2131    -2.56222   1212    -1.58953   2242   -17.3608    1323   -21.4503    2353     3.166362
1231    -4.3371    2212     0.573607  1342   -42.3171    2323    -1.28252   1453     2.944493
2231    -3.09182   1312   -21.1281    2342   -36.3713    1423    -2.53632   2453    24.48228
1331   -23.2078    2312   -18.8477    1442   -21.0566    2423    17.27476   1114   -20.9038
2331   -21.4846    1412     0.056285  2442   -16.0784    1133    -4.26928   2114    18.64108
1431    -2.21381   2412     2.166507  1152     0.573049  2133    16.65308   1214   -21.703
2431    -1.72497   1122    -1.41347   2152    25.34966   1233    -6.67417   2214    18.78614
1141    -0.85299   2122    -0.21722   1252    40.778     2233    14.70979   1314   -40.4894

SLIDE 31

Conclusions and further research

⋆ Context-specific (CS) independence allows us to fix one or more modalities of a conditioning variable in order to study their effect on the other variables.
⋆ The graphical representation gives a visual simplification of the relationships among the variables.
∼ With sparse tables the asymptotic theory does not hold.
∼ Testing all possible models is computationally expensive.

SLIDE 32

Acknowledgement

This report is based on data from Eurostat, EU Statistics on Income and Living Conditions [2016]. The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 730998, InGRID-2 Integrating Research Infrastructure for European expertise on Inclusive Growth from data to policy.

SLIDE 33

References

BARTOLUCCI, F., COLOMBI, R., & FORCINA, A. 2007. An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Statistica Sinica, 17, 691-711.
BERGSMA, W. P., & RUDAS, T. 2002. Marginal models for categorical data. Annals of Statistics, 140-159.
CAZZARO, M., & COLOMBI, R. 2014. Marginal Nested Interactions for Contingency Tables. Communications in Statistics-Theory and Methods, 43(13), 2799-2814.
COLOMBI, R., GIORDANO, S., & CAZZARO, M. 2014. hmmm: An R Package for Hierarchical Multinomial Marginal Models. Journal of Statistical Software, 59(11), 1-25.
HØJSGAARD, S. 2004. Statistical inference in context specific interaction models for contingency tables. Scandinavian Journal of Statistics, 31(1), 143-158.
LAURITZEN, S. L., & WERMUTH, N. 1989. Graphical models for associations between variables, some of which are qualitative and some quantitative. Annals of Statistics, 31-57.
MARCHETTI, G. M., LUPPARELLI, M., et al. 2011. Chain graph models of multivariate regression type for categorical data. Bernoulli, 17(3), 827-844.
NICOLUSSI, F. 2013. Marginal parameterizations for conditional independence models and graphical models for categorical data. Ph.D. thesis, University of Milan Bicocca.
NYMAN, H., PENSAR, J., KOSKI, T., & CORANDER, J. 2016. Context-specific independence in graphical log-linear models. Computational Statistics, 31(4), 1493-1512.