The Use of Uncertainty to Choose the Matching Variables in Statistical Matching
Marcello D’Orazio* (madorazi@istat.it)
Marco Di Zio* (dizio@istat.it)
Mauro Scanu* (scanu@istat.it)
*Italian National Institute of Statistics (Istat)
NTTS 2015 conference, Brussels, 10-12 March 2015

Statistical Matching (data fusion or synthetic matching)
A series of statistical methods for integrating two data sources (usually samples) referring to the same target population.
Objective: study the relationship between variables that are not jointly observed in a single data source.
• source A: the common variables X and the variables Y are observed
• source B: the common variables X and the variables Z are observed
• Y and Z are NOT jointly observed
Uncertainty to choose the matching variables, M. D’Orazio, M. Di Zio, M. Scanu – NTTS 2015, Brussels

Objectives of Statistical Matching
• micro: derive a “synthetic” data set with X, Y and Z; for instance:
  - A filled in with Z
  - A filled in with Z and B filled in with Y (file concatenation)
• macro: estimation of parameters; for instance:
  - a correlation coefficient
  - a regression coefficient
  - a contingency table
Various methods are available, depending on the objective (micro or macro) and on the framework (parametric, nonparametric or mixed).
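As a rough illustration of the micro objective (A filled in with Z), the sketch below implements a simple nearest-neighbour distance hot deck: each record in A receives the Z value of the closest donor in B, where closeness is the number of categorical X variables on which the two records differ. This is a swapped-in textbook technique, not the authors' method; the function name `nnd_hotdeck`, the mismatch-count distance and the toy records are all hypothetical, and the StatMatch package cited in the conclusions offers its own, more complete hot deck routines.

```python
def nnd_hotdeck(recipients, donors, xvars, zvar):
    """Fill zvar into each recipient record from its nearest donor.

    Hypothetical sketch: distance = number of mismatching categorical X
    variables; ties go to the first donor in list order; donors may be reused.
    """
    filled = []
    for rec in recipients:
        nearest = min(donors, key=lambda d: sum(rec[v] != d[v] for v in xvars))
        filled.append({**rec, zvar: nearest[zvar]})
    return filled


# Toy samples: A observes X (sex, age) and Y; B observes X and Z.
A = [{"sex": "F", "age": "young", "y": 10},
     {"sex": "M", "age": "old", "y": 20}]
B = [{"sex": "M", "age": "old", "z": "b"},
     {"sex": "F", "age": "young", "z": "a"},
     {"sex": "F", "age": "old", "z": "c"}]

synthetic = nnd_hotdeck(A, B, ["sex", "age"], "z")
print([r["z"] for r in synthetic])  # ['a', 'b']
```

The donor is chosen using X only, so the imputed Z is linked to Y solely through the matching variables; this is one reason why their choice matters.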

Matching Variables
A and B may share many common variables X, but NOT all the X variables will be used. It is necessary to select just the most relevant Xs, called matching variables, i.e. the subset of the Xs connected, at the same time, with both Y and Z.
Many methods can be applied to identify the best predictors of Y and the best predictors of Z, but they imply separate analyses on A and B.
Proposal: perform a single analysis for choosing the matching variables, by searching for the set of common variables that is most effective in reducing the uncertainty on the relationship between Y and Z.
Uncertainty is due to lack of information: Y and Z are NOT jointly observed.
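To make the contrast concrete, the sketch below mimics the conventional route the slide refers to: one separate analysis on A to find the Xs associated with Y, another on B to find the Xs associated with Z, and then the variables connected with both. Cramér's V, the 0.3 threshold and the function names are hypothetical choices made only for illustration; any of the many predictor-selection methods could take their place.

```python
from collections import Counter
from math import sqrt


def cramers_v(records, x, t):
    """Cramér's V between categorical variables x and t (no bias correction)."""
    n = len(records)
    n_xt = Counter((r[x], r[t]) for r in records)
    n_x = Counter(r[x] for r in records)
    n_t = Counter(r[t] for r in records)
    chi2 = 0.0
    for a, na in n_x.items():
        for b, nb in n_t.items():
            expected = na * nb / n
            chi2 += (n_xt[(a, b)] - expected) ** 2 / expected
    m = min(len(n_x), len(n_t)) - 1
    return sqrt(chi2 / (n * m)) if m > 0 else 0.0


def separate_selection(A, B, xs, y, z, threshold=0.3):
    """Two separate analyses: Xs associated with Y in A and with Z in B."""
    x_y = {x for x in xs if cramers_v(A, x, y) >= threshold}  # uses A only
    x_z = {x for x in xs if cramers_v(B, x, z) >= threshold}  # uses B only
    return x_y, x_z, x_y & x_z  # last set: Xs connected with both Y and Z


A = [dict(x1=a, x2=b, y=c) for a, b, c in [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]]
B = [dict(x1=a, x2=b, z=c) for a, b, c in [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]]
print(separate_selection(A, B, ["x1", "x2"], "y", "z"))  # ({'x1'}, {'x2'}, set())
```

On this toy data the two separate analyses retain different variables (x1 for Y, x2 for Z), so no X is connected with both; the proposal on the slide replaces the two analyses with a single criterion based on the uncertainty about the pair (Y, Z).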

Uncertainty bounds
Focus on the case where the X, Y and Z variables are all categorical.
Objective of SM: estimation of the probabilities
$$p_{jk} = \Pr(Y = j, Z = k), \qquad j = 1, \dots, J, \quad k = 1, \dots, K$$
In this case the uncertainty set can be computed by resorting to the Fréchet bounds. By conditioning on X, with $p_h = \Pr(X = h)$, $p_{j|h} = \Pr(Y = j \mid X = h)$ and $p_{k|h} = \Pr(Z = k \mid X = h)$, it is possible to conclude that the probability $p_{jk}$ will lie in the interval $[p_{jk}^{(\mathrm{low})}, p_{jk}^{(\mathrm{up})}]$, where
$$p_{jk}^{(\mathrm{low})} = \sum_h p_h \max\{0,\; p_{j|h} + p_{k|h} - 1\}, \qquad p_{jk}^{(\mathrm{up})} = \sum_h p_h \min\{p_{j|h},\; p_{k|h}\}$$
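The conditional Fréchet bounds can be evaluated directly once p_h, p_{j|h} and p_{k|h} are available. The minimal sketch below takes them as plain nested lists (a hypothetical input layout: `p_j_given_h[h][j]` and `p_k_given_h[h][k]`) and returns the J x K matrices of lower and upper bounds; the probabilities in the example are made up.

```python
def frechet_bounds(p_h, p_j_given_h, p_k_given_h):
    """Conditional Fréchet bounds on p_jk = Pr(Y=j, Z=k).

    p_h[h] = Pr(X=h); p_j_given_h[h][j] = Pr(Y=j | X=h);
    p_k_given_h[h][k] = Pr(Z=k | X=h). Returns (low, up) as J x K lists.
    """
    J, K = len(p_j_given_h[0]), len(p_k_given_h[0])
    low = [[0.0] * K for _ in range(J)]
    up = [[0.0] * K for _ in range(J)]
    for h, ph in enumerate(p_h):
        for j in range(J):
            for k in range(K):
                pj, pk = p_j_given_h[h][j], p_k_given_h[h][k]
                low[j][k] += ph * max(0.0, pj + pk - 1.0)
                up[j][k] += ph * min(pj, pk)
    return low, up


# Made-up example with H = 2 categories of X and binary Y and Z.
p_h = [0.5, 0.5]
p_j_given_h = [[0.8, 0.2], [0.3, 0.7]]
p_k_given_h = [[0.6, 0.4], [0.1, 0.9]]
low, up = frechet_bounds(p_h, p_j_given_h, p_k_given_h)
for j in range(2):
    for k in range(2):
        print(f"p_{j + 1}{k + 1} in [{low[j][k]:.2f}, {up[j][k]:.2f}]")
```

Here p_11 is bounded by [0.20, 0.35]. Without conditioning, the marginals Pr(Y = 1) = 0.55 and Pr(Z = 1) = 0.35 would only give the wider unconditional Fréchet interval [0, 0.35].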

Proposed method for choosing the matching variables
Step 0) Order the Xs according to their ability to minimize
$$d = \frac{1}{J \times K} \sum_{j,k} \left( \hat{p}_{jk}^{(\mathrm{up})} - \hat{p}_{jk}^{(\mathrm{low})} \right)$$
Step 1) Combine the starting variable(s) with each of the remaining ones, taken in the order found in Step 0, and evaluate the associated uncertainty in terms of d.
Step 2) Select the combination of variables that determines the largest decrease in d and go back to Step 1.
The method has been tested with artificial data.
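Steps 0) to 2) can be sketched as a greedy forward search. In the minimal Python version below, d is estimated from two samples of records (dicts): p_{j|h} from A, p_{k|h} from B and, as an assumption of this sketch, p_h from the pooled X values of A and B. The function names, the toy data and the decision to stop with an error when some combination of X categories is empty in A or in B are hypothetical; they are not taken from the authors' R code.

```python
from collections import Counter
from itertools import product


def uncertainty_d(A, B, xvars, y, z):
    """Estimated d = average width of the conditional Fréchet bounds.

    A: records (dicts) with xvars and y; B: records with xvars and z.
    """
    def key(r):
        return tuple(r[v] for v in xvars)

    n_h = Counter(key(r) for r in A + B)  # assumption: p_h from pooled A and B
    n_hA = Counter(key(r) for r in A)
    n_hB = Counter(key(r) for r in B)
    if set(n_hA) != set(n_hB):
        # hypothetical choice: refuse sparse tables instead of patching them
        raise ValueError("some X category combinations are empty in A or in B")
    n_hj = Counter((key(r), r[y]) for r in A)
    n_hk = Counter((key(r), r[z]) for r in B)
    ys, zs = sorted({r[y] for r in A}), sorted({r[z] for r in B})
    n = len(A) + len(B)
    total = 0.0
    for j, k in product(ys, zs):
        for h, nh in n_h.items():
            pj = n_hj[(h, j)] / n_hA[h]  # p_{j|h} estimated on A
            pk = n_hk[(h, k)] / n_hB[h]  # p_{k|h} estimated on B
            # contribution of cell h to p_jk^(up) - p_jk^(low)
            total += nh / n * (min(pj, pk) - max(0.0, pj + pk - 1.0))
    return total / (len(ys) * len(zs))


def select_matching_variables(A, B, xs, y, z):
    """Greedy search (Steps 0-2); returns [(variables, d), ...] by size."""
    # Step 0: order the single Xs by increasing d (stable sort keeps ties)
    order = sorted(xs, key=lambda x: uncertainty_d(A, B, [x], y, z))
    current, remaining = [order[0]], order[1:]
    path = [(tuple(current), uncertainty_d(A, B, current, y, z))]
    while remaining:
        # Step 1: add each remaining X, in Step 0 order, and evaluate d
        scores = [(uncertainty_d(A, B, current + [x], y, z), x) for x in remaining]
        # Step 2: keep the largest decrease in d (first one wins on ties)
        best_d, best_x = min(scores, key=lambda s: s[0])
        current.append(best_x)
        remaining.remove(best_x)
        path.append((tuple(current), best_d))
    return path


# Toy samples: x1 drives both Y (in A) and Z (in B); x2 carries no information.
A = [dict(x1=a, x2=b, y=c) for a, b, c in
     [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1),
      (1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 1, 0)]]
B = [dict(x1=a, x2=b, z=c) for a, b, c in
     [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0),
      (1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 1, 1)]]

for variables, d in select_matching_variables(A, B, ["x2", "x1"], "y", "z"):
    print("*".join(variables), round(d, 4))
```

On the toy samples this prints `x1 0.0` followed by `x1*x2 0.0`: x1 alone already removes all uncertainty, so adding x2 cannot lower d further. With no stopping rule, the search simply continues until every X is included and reports the set retained at each size, in the same spirit as the output tables that follow.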

The data (1)
Bayesian networks are used to generate two artificial samples sharing 3 binary Xs with the following association structure:
[Figures: true association structure; association structure in A; association structure in B]
Output of the procedure:
| X variables | No. of Xs | d |
|---|---|---|
| X1 | 1 | 0.1703 |
| X1*X3 | 2 | 0.1703 |
| X1*X3*X2 | 3 | 0.1699 |

The data (2)
Artificial data resembling EU-SILC: two artificial samples with 7 common variables.
Output of the procedure:
| No. of Xs | d | Best | Combination of X variables |
|---|---|---|---|
| 1 | 0.0878 | Yes | c.age |
| 2 | 0.0781 | Yes | c.age*sex |
| 3 | 0.0714 | Yes | c.age*sex*edu7 |
| 4 | 0.0608 | No | c.age*sex*edu7*area5 |
| 5 | 0.0411 | No | c.age*sex*edu7*area5*hsize5 |
| 6 | 0.0225 | Yes | c.age*sex*edu7*area5*hsize5*urb |
| 7 | 0.0162 | Yes | c.age*edu7*marital*sex*hsize5*area5*urb |
The combinations found with 4 and 5 Xs are very close to optimality.

Conclusions
Pros:
- avoids separate analyses
- is able to find the best solutions or solutions close to them
- is fully automatic; the code is written in R and related to the package StatMatch (D’Orazio, 2015)
Cons:
- dependence on the initial ordering of the variables
- absence of a stopping rule: as the number of Xs increases the uncertainty does not increase (and generally decreases), but the tables become very sparse

Essential References
D’Orazio M., Di Zio M. and Scanu M. (2006) Statistical Matching: Theory and Practice. Wiley, New York.
D’Orazio M. (2015) StatMatch: Statistical Matching. R package version 1.2.3. http://CRAN.R-project.org/package=StatMatch
