
InGrid 2.0 Study of poverty measurement on context-specific environment



  1. InGrid 2.0: Study of poverty measurement on context-specific environment. Federica Nicolussi and Manuela Cazzaro, September 27, 2018. Results from a visit to the TARKI Group, Budapest. SPMCSE September 27, 2018 1 / 27

  2. Overview: 1. Introduction; 2. Models (Graphical model; Multivariate Regression parametrization); 3. Application (Results); 4. Conclusions; 5. Acknowledgement; 6. References.


  4. Introduction. Framework: $q$ categorical (ordinal) variables $Q = \{X_1, \dots, X_q\}$ collected in a contingency table; study of the (in)dependence relationships among these variables. Main goals: 1) to consider different kinds of relationships (marginal, conditional and context-specific independencies in the same model); 2) to represent these relationships through a graphical model [Stratified chain graph model]; 3) to represent the variables in a multivariate regression system [Regression parameters].


  8. Example of potential relationships among the variables (AIM 1). Let us consider 3 variables A, B and C, for instance: A = Gender (M, F); B = University Position (Student, PhD and Post Doc, Academic staff and Technical and administrative staff); C = Age (≤ 25, 25 ⊣ 40, > 40). MARGINAL INDEPENDENCE: A ⊥ C → GENDER ⊥ AGE. CONDITIONAL INDEPENDENCE: A ⊥ B | C → GENDER ⊥ POSITION | AGE. CONTEXT-SPECIFIC INDEPENDENCE: A ⊥ B | C = c_k → GENDER ⊥ POSITION | AGE = (> 40).
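The difference between conditional and context-specific independence can be checked numerically. The following is a minimal sketch, not the authors' method: the joint probabilities are invented for illustration, all three variables are assumed binary, and the function name `independent_given` is hypothetical. It tests whether p(a, b | c) = p(a | c) p(b | c) within each level of C.

```python
# Hypothetical joint probabilities p[(a, b, c)] for three binary variables.
# The numbers are made up for illustration; they do not come from the study.
p = {
    (0, 0, 0): 0.20, (0, 1, 0): 0.05,
    (1, 0, 0): 0.05, (1, 1, 0): 0.20,
    (0, 0, 1): 0.08, (0, 1, 1): 0.12,
    (1, 0, 1): 0.12, (1, 1, 1): 0.18,
}


def independent_given(p, c, tol=1e-9):
    """Return True if A and B are independent in the context C = c."""
    # Keep only the cells of the stratum C = c.
    cells = {(a, b): v for (a, b, k), v in p.items() if k == c}
    total = sum(cells.values())
    # Conditional margins p(a | c) and p(b | c).
    pa, pb = {}, {}
    for (a, b), v in cells.items():
        pa[a] = pa.get(a, 0.0) + v / total
        pb[b] = pb.get(b, 0.0) + v / total
    # Independence in this context: every cell equals the product of margins.
    return all(abs(v / total - pa[a] * pb[b]) < tol
               for (a, b), v in cells.items())


contexts = sorted({c for (_, _, c) in p})
holds_in = [c for c in contexts if independent_given(p, c)]
conditional = len(holds_in) == len(contexts)

print("A indep. B holds in contexts C =", holds_in)
print("A indep. B | C for every level of C:", conditional)
```

In this invented table the independence holds only in one level of C, so it is a context-specific independence and not a full conditional one.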


  10. Graphical representation (AIM 2). ⋆ These different kinds of independencies can be highlighted through a graphical model. ⋆ The three kinds of independencies can be well represented by the so-called Stratified Chain Graph Model (SCGM).

  11. Stratified regression chain graph models. A GRAPH is defined as a set of vertices (V) and a set of edges (E). An edge can be undirected or directed (arrow). [Figure: example graphs with vertices labelled A to E.]


  13. Stratified regression chain graph models. ⋆ Vertices act as variables. ⋆ Any missing arc is a symptom of independence. ⋆ Markov properties: missing undirected arc (C − D): C ⊥ D | AB; missing directed arc (B → C): C ⊥ B | A; missing directed arcs (A → E) and (B → E): E ⊥ AB | CD. [Figure: chain graph on the vertices A, B, C, D, E.]


  15. Stratified regression chain graph models. ⋆ Any directed arc links a covariate with a response variable. ⋆ Any undirected arc describes symmetrical dependence (among a set of covariates or among a set of dependent variables). ⋆ A, B: covariates. ⋆ C, D, E: dependent variables. [Figure: chain graph on the vertices A, B, C, D, E.]

  16. Stratified regression chain graph models. ⋆ Labelled arcs report the modality(ies) of the variable(s) under which a context-specific independence holds, for example C ⊥ D | AB = (a_1, ∗). [Figure: chain graph on the vertices A, B, C, D, E, with a label AB = (a_1, ∗).]

  17. Multivariate Regression parametrization (AIM 3). Given two sets of variables, the response variables (C and D) and the covariates (A and B), the multivariate regression model is:
      $\eta^{ABC}_{C}(i_C \mid i_A i_B) = \beta^{C}_{\emptyset} + \beta^{C}_{A}(i_A) + \beta^{C}_{B}(i_B) + \beta^{C}_{AB}(i_A i_B)$
      $\eta^{ABD}_{D}(i_D \mid i_A i_B) = \beta^{D}_{\emptyset} + \beta^{D}_{A}(i_A) + \beta^{D}_{B}(i_B) + \beta^{D}_{AB}(i_A i_B)$
      $\eta^{ABCD}_{CD}(i_{CD} \mid i_A i_B) = \beta^{CD}_{\emptyset} + \beta^{CD}_{A}(i_A) + \beta^{CD}_{B}(i_B) + \beta^{CD}_{AB}(i_A i_B)$
      The η are log-linear parameters (contrasts of logarithms of sums of probabilities) that are defined on marginal tables, respecting the completeness and hierarchy properties (Bartolucci, Colombi and Forcina, 2007); the regression β parameters are a function of the η parameters.
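To make the first regression equation concrete, here is a minimal sketch under strong simplifying assumptions that are not stated in the slides: A, B and C are binary, η is taken to be the baseline logit of C given (A, B), and the β parameters use reference-category coding with level 0 as the reference. The conditional probabilities are invented for illustration.

```python
import math

# Hypothetical conditional probabilities P(C = 1 | A = a, B = b),
# keyed by (a, b). Invented numbers, not taken from the study.
p_c1 = {(0, 0): 0.20, (1, 0): 0.35, (0, 1): 0.50, (1, 1): 0.50}


def logit(prob):
    """Log-odds of a probability."""
    return math.log(prob / (1.0 - prob))


# eta(a, b): logit of C in the covariate configuration (a, b).
eta = {ab: logit(prob) for ab, prob in p_c1.items()}

# Saturated decomposition with level 0 as reference category (assumption).
beta = {
    "0": eta[(0, 0)],                                            # intercept
    "A": eta[(1, 0)] - eta[(0, 0)],                              # main effect of A
    "B": eta[(0, 1)] - eta[(0, 0)],                              # main effect of B
    "AB": eta[(1, 1)] - eta[(1, 0)] - eta[(0, 1)] + eta[(0, 0)], # interaction
}


def eta_from_beta(a, b):
    """Rebuild eta(a, b) from the regression parameters."""
    return beta["0"] + a * beta["A"] + b * beta["B"] + a * b * beta["AB"]


for (a, b), value in sorted(eta.items()):
    print((a, b), round(value, 4), round(eta_from_beta(a, b), 4))
```

The saturated model reproduces every η exactly; constraining some β terms to zero is what turns it into a model with independencies.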

  18. Model definition (constraints on regression parameters). From the missing arcs to the constraints on regression parameters:
      MISSING UNDIRECTED ARC (C − D): C ⊥ D | AB
      $\eta^{ABCD}_{CD}(i_C i_D \mid i_A i_B) = 0 \quad \forall\, i_A, i_B, i_C, i_D$
      MISSING DIRECTED ARC (B → C): B ⊥ C | A
      $\eta^{ABC}_{C}(i_C \mid i_A i_B) = \beta^{C}_{\emptyset} + \beta^{C}_{A}(i_A) \quad \forall\, i_A, i_B, i_C$
      [Figure: chain graph on the vertices A, B, C, D, E.]

  19. Model definition (constraints on regression parameters). From the labelled arcs to the constraints on regression parameters:
      LABELLED UNDIRECTED ARC (C − D): C ⊥ D | AB = (a_1, ∗)
      $\eta^{ABCD}_{CD}(i_C i_D \mid i'_A i'_B) = 0 \quad \forall\, i_C, i_D \text{ and } i'_A i'_B = (a_1, \ast)$
      LABELLED DIRECTED ARC (B → C): B ⊥ C | A = a_1
      $\eta^{ABC}_{C}(i_C \mid i'_A i_B) = \beta^{C}_{\emptyset} + \beta^{C}_{A}(i'_A) \quad \forall\, i_B, i_C \text{ and } i'_A = a_1$
      [Figure: chain graph on the vertices A, B, C, D, E, with arc labels A = (a_1) and AB = (a_1, ∗).]
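The labelled directed arc constraint says that, in the context A = a_1, the parameter η for C no longer depends on the level of B. A minimal sketch of that check follows, assuming binary variables and a baseline logit for η (both assumptions, as are the invented probabilities and the hypothetical function name `contexts_where_b_drops_out`); level 0 of A plays the role of a_1.

```python
import math

# Hypothetical P(C = 1 | A = a, B = b), keyed by (a, b). Invented numbers.
# In the context A = 0 the probability is the same for both levels of B.
p_c1 = {(0, 0): 0.30, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.60}


def logit(prob):
    """Log-odds of a probability."""
    return math.log(prob / (1.0 - prob))


eta = {ab: logit(prob) for ab, prob in p_c1.items()}


def contexts_where_b_drops_out(eta, tol=1e-12):
    """Levels a of A for which eta(a, b) is constant in b (B indep. C | A = a)."""
    levels_a = sorted({a for a, _ in eta})
    levels_b = sorted({b for _, b in eta})
    first_b = levels_b[0]
    return [
        a for a in levels_a
        if all(abs(eta[(a, b)] - eta[(a, first_b)]) < tol for b in levels_b)
    ]


labels = contexts_where_b_drops_out(eta)
print("Arc B -> C would carry the label A in", labels)
```

If the returned list contained every level of A, the arc B → C would be missing altogether; an empty list corresponds to an unlabelled arc.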

  20. At a glance: the graph gives a system of independencies; the unconstrained parameters describe the dependence relationships; the system of independencies identifies the regression parameters constrained to zero; a model is tested through the likelihood ratio test.

  21. Selection of the best fitting model. Step 1: Exploratory phase, where we test all SCRGMs with only one missing arc. As the reduced model we take the one in which every arc whose removal led to a p-value greater than 0.01 is missing. Step 2: We start from the reduced model and add the arcs back one by one. We choose the HMM model with the lowest AIC. Step 3: We further simplify the model by replacing the missing arcs with labelled arcs. We choose the model with the lowest AIC among those with a p-value greater than or equal to 0.05.
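The selection rule of Step 3 (lowest AIC among the models with p-value at least 0.05) can be sketched as a small filter-then-minimise routine. This is only an illustration: the `Candidate` class, the function `select_model` and all numbers are hypothetical, and fitting the models themselves is outside its scope.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A fitted candidate model summarised by its test p-value and AIC."""
    name: str
    p_value: float
    aic: float


def select_model(candidates: List[Candidate], alpha: float = 0.05) -> Optional[Candidate]:
    """Keep models with p-value >= alpha, then return the one with lowest AIC."""
    admissible = [c for c in candidates if c.p_value >= alpha]
    if not admissible:
        return None
    return min(admissible, key=lambda c: c.aic)


# Invented summaries of three candidate models.
candidates = [
    Candidate("model_1", p_value=0.03, aic=1490.0),  # rejected: p-value < 0.05
    Candidate("model_2", p_value=0.12, aic=1502.5),
    Candidate("model_3", p_value=0.40, aic=1498.1),
]

best = select_model(candidates)
print(best.name if best else "no admissible model")
```

Note that the model with the overall lowest AIC is discarded here because it fails the p-value threshold, which is the point of combining the two criteria.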

  22. Data set. We consider the subjects from 26 European countries whose self-defined current economic status (variable PL031 in the survey) is (i) employee working full-time, (ii) employee working part-time, (iii) self-employed working full-time, (iv) self-employed working part-time, (v) unemployed, (vi) permanently disabled and/or unfit to work, or (vii) fulfilling domestic tasks and care responsibilities. The survey covers 288132 individuals.

  23. Variables.
      G: Gender (1 = male, 2 = female);
      A: Age, categorized in 4 values representing the quartiles (1 = 16 ⊢ 36; 2 = 36 ⊢ 46; 3 = 46 ⊢ 55; 4 = 55 ⊢ 81);
      W: Status in employment (1 = self-employed with employees, 2 = self-employed without employees, 3 = employee, 4 = family worker, 5 = unemployed);
      H: General health (1 = very good, 2 = good, 3 = fair, 4 = bad, 5 = very bad);
      P: Poverty indicator (0 = equivalised disposable income ≥ at-risk-of-poverty threshold, 1 = equivalised disposable income < at-risk-of-poverty threshold).
      AIMS: how gender and age affect the working condition and the general perceived health; how these variables affect the poverty indicator.

  24. Some information. We have 288132 observations. We collect the 5 variables in a contingency table of 400 cells, of which only 33 are empty. The class of marginal sets is {(G, A); (G, A, W); (G, A, H); (G, A, W, H); (G, A, W, H, P)}.
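The cell count follows directly from the number of categories of each variable listed on the previous slide (2 × 4 × 5 × 5 × 2 = 400). As a small sketch, the same arithmetic gives the size of each marginal table in the class of marginal sets; the helper name `n_cells` is only illustrative.

```python
from math import prod

# Number of categories per variable, as listed in the Variables slide.
levels = {"G": 2, "A": 4, "W": 5, "H": 5, "P": 2}

# The class of marginal sets given in the slide.
marginal_sets = [
    ("G", "A"),
    ("G", "A", "W"),
    ("G", "A", "H"),
    ("G", "A", "W", "H"),
    ("G", "A", "W", "H", "P"),
]


def n_cells(variables):
    """Number of cells of the table cross-classifying the given variables."""
    return prod(levels[v] for v in variables)


sizes = {m: n_cells(m) for m in marginal_sets}
total_cells = n_cells(levels)        # full 5-way table
non_empty_cells = total_cells - 33   # 33 empty cells reported in the slide

for m, size in sizes.items():
    print(m, size)
print("total:", total_cells, "non-empty:", non_empty_cells)
```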

  25. Mosaic plots. Representation of the distribution of G and P in two conditional distributions. (Left) evidence of dependence between G and P; (right) evidence of independence between G and P. [Figure: two mosaic plots of G (rows: 1, 2) by P (columns: 0, 1), shaded by deviance residuals. Left panel: A, H, W = 4,1,5, p-value = 8.8224e−16. Right panel: A, H, W = 1,5,5, p-value = 0.93214.]
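A p-value of this kind can be obtained from an independence test on the 2 × 2 table of G by P within one stratum of (A, H, W). The slides do not say which test was used, so the following is only a sketch that assumes a likelihood ratio (G²) test with one degree of freedom; the counts are invented and the function name `g2_test_2x2` is hypothetical.

```python
import math


def g2_test_2x2(table):
    """Likelihood ratio test of independence for a 2x2 table of counts.

    Returns (G2, p_value). With 1 degree of freedom the chi-square tail
    probability equals erfc(sqrt(G2 / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # empty cells contribute zero
                g2 += 2.0 * observed * math.log(observed / expected)
    g2 = max(g2, 0.0)  # guard against tiny negative rounding errors
    return g2, math.erfc(math.sqrt(g2 / 2.0))


# Invented counts: rows are G = 1, 2 and columns are P = 0, 1.
dependent = [[90, 10], [10, 90]]
independent = [[20, 30], [40, 60]]

print(g2_test_2x2(dependent))
print(g2_test_2x2(independent))
```

A very small p-value, as in the left panel, points to dependence between G and P in that stratum; a p-value close to 1, as in the right panel, is compatible with a context-specific independence.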
