Working with Missing Data

Working with Missing Data, Steve Borgatti (LINKS Center Workshop)

2009 LINKS Center Workshop on Social Network Analysis. Slide 1: Working with Missing Data. Steve Borgatti, LINKS Center Workshop on Social Network Analysis.


  1. 2009 LINKS Center Workshop on Social Network Analysis, Slide 1: Working with Missing Data. Steve Borgatti, LINKS Center Workshop on Social Network Analysis. (FRIDAY, ADVANCED Session; (c) 2009 LINKS Center)

  2. Slide 2: The problem
     • Some respondents did not participate in the survey, leaving blank rows in the network data matrix
     • Important note: if you enter data as edgelist or nodelist using dl file, missing values are automatically converted to zeros.
       – How would you convert them back?

                                    1 1 1 1 1 1 1
                  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
                  A A B B C G G L M P P P R S S T
                  - - - - - - - - - - - - - - - -
      1 ACCIAIUOL 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
      2 ALBIZZI   0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0
      3 BARBADORI 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
      4 BISCHERI
      5 CASTELLAN 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0
      6 GINORI    0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      7 GUADAGNI  0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1
      8 LAMBERTES 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
      9 MEDICI    1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1
     10 PAZZI
     11 PERUZZI   0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0
     12 PUCCI     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     13 RIDOLFI   0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1
     14 SALVIATI  0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
     15 STROZZI   0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0
     16 TORNABUON 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0
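
The question on slide 2 (how to turn the imported zeros back into missing values) can be sketched outside the workshop software. The Python below is only an illustration under stated assumptions: `MISSING`, `blank_rows` and the 4-actor toy matrix are hypothetical names and data, not part of the slides. It assumes the list of non-respondents is known from the survey records, because an all-zero row by itself is ambiguous (PUCCI in the matrix above responded and simply reported no ties).

```python
# Hypothetical marker for a blank cell; the slides do not prescribe a code.
MISSING = None


def blank_rows(matrix, nonrespondents):
    """Return a copy of matrix in which every cell of the given rows is MISSING."""
    return [
        [MISSING] * len(row) if i in nonrespondents else list(row)
        for i, row in enumerate(matrix)
    ]


# Toy 4-actor network (made up for illustration). The actor at index 2 did not
# respond, but the import recorded an all-zero row for them.
imported = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],  # non-respondent, wrongly stored as "no ties"
    [0, 0, 1, 0],
]

# The set of non-respondents must come from the survey records, not the matrix.
restored = blank_rows(imported, nonrespondents={2})
```

The function returns a new matrix and leaves the imported one untouched, so both versions can be compared afterwards.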

  3. Slide 3: Size of the problem
     • Counting the number of missing values with Tools | Freq.
       – Select “matrices”

                                    1 1 1 1 1 1 1
                  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
                  A A B B C G G L M P P P R S S T
                  - - - - - - - - - - - - - - - -
      1 ACCIAIUOL 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
      2 ALBIZZI   0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0
      3 BARBADORI 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
      4 BISCHERI
      5 CASTELLAN 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0
      6 GINORI    0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      7 GUADAGNI  0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1
      8 LAMBERTES 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
      9 MEDICI    1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1
     10 PAZZI
     11 PERUZZI   0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0
     12 PUCCI     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     13 RIDOLFI   0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1
     14 SALVIATI  0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
     15 STROZZI   0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0
     16 TORNABUON 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0

     Output:
                  1
              -----
        0.000 0.725
        1.000 0.150
        blank 0.125
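
As a cross-check on the frequency output of slide 3, the same proportions can be recomputed by hand. The Python sketch below is not the Tools | Freq. procedure itself; `ROWS` and `value_proportions` are hypothetical names, and the rows are transcribed from the matrix on the slide. It assumes the diagonal is excluded from the count. That is an inference: the slide's proportions correspond to 174, 36 and 30 cells out of 16 × 15 = 240, not out of 256.

```python
from collections import Counter

# One string per actor, one character per column; None marks a blank row.
ROWS = [
    "0000000010000000",  # 1 ACCIAIUOL
    "0000011010000000",  # 2 ALBIZZI
    "0000100010000000",  # 3 BARBADORI
    None,                # 4 BISCHERI (did not respond)
    "0010000000100010",  # 5 CASTELLAN
    "0100000000000000",  # 6 GINORI
    "0101000100000001",  # 7 GUADAGNI
    "0000001000000000",  # 8 LAMBERTES
    "1110000000001101",  # 9 MEDICI
    None,                # 10 PAZZI (did not respond)
    "0001100000000010",  # 11 PERUZZI
    "0000000000000000",  # 12 PUCCI
    "0000000010000011",  # 13 RIDOLFI
    "0000000011000000",  # 14 SALVIATI
    "0001100000101000",  # 15 STROZZI
    "0000001010001000",  # 16 TORNABUON
]


def value_proportions(rows):
    """Proportion of each value ("0", "1", "blank") among off-diagonal cells."""
    n = len(rows)
    counts = Counter()
    for i, row in enumerate(rows):
        for j in range(n):
            if i == j:
                continue  # assumption: the diagonal is not counted
            counts["blank" if row is None else row[j]] += 1
    total = n * (n - 1)
    return {value: count / total for value, count in counts.items()}


props = value_proportions(ROWS)
for value, share in props.items():
    print(f"{value:>5} {share:.3f}")
```

Under that assumption, the three proportions agree with the 0.725, 0.150 and 0.125 shown on the slide.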

  4. Slide 4: Standard Solutions
     • Convert missings to zeros (since you did NOT observe a tie)
       – Re-run having converted missings to ones, to see how different the results could be
     • Convert missings to zeros and ones at random, using density of the matrix as guide
     • Impute the missing values using other information
       – Symmetry
       – QAP regression
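
The first two options on slide 4 can be sketched in a few lines of Python. This is an illustration under assumptions, not the workshop's procedure: `MISSING`, `observed_density`, `fill_missing` and the toy matrix are hypothetical, "density" is taken to mean the share of 1s among observed off-diagonal cells, and a missing diagonal cell is set to 0 on the assumption that self-ties are not meaningful.

```python
import random

MISSING = None  # hypothetical marker for a blank cell


def observed_density(matrix):
    """Share of 1s among the observed off-diagonal cells."""
    cells = [
        v
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if i != j and v is not MISSING
    ]
    return sum(cells) / len(cells)


def fill_missing(matrix, draw):
    """Copy matrix, replacing each MISSING off-diagonal cell with draw().

    A MISSING diagonal cell becomes 0 (assumption: no self-ties).
    """
    filled = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, v in enumerate(row):
            if v is not MISSING:
                new_row.append(v)
            elif i == j:
                new_row.append(0)
            else:
                new_row.append(draw())
        filled.append(new_row)
    return filled


# Toy data (made up): the actor at index 2 did not respond.
data = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [MISSING] * 4,
    [0, 0, 1, 0],
]

density = observed_density(data)  # 4 ones among 9 observed off-diagonal cells
rng = random.Random(0)            # fixed seed so the random fill is repeatable

as_zeros = fill_missing(data, lambda: 0)  # lower bound: no unobserved ties
as_ones = fill_missing(data, lambda: 1)   # upper bound: all unobserved ties
at_random = fill_missing(data, lambda: 1 if rng.random() < density else 0)
```

Running the same analysis on `as_zeros` and `as_ones` brackets how far the results could move. For the random fill, repeating it with several seeds gives a sense of the variability introduced by the imputation.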

  5. Slide 5: One solution
     • Suppose the data are largely symmetric, and the social relation is logically symmetric
       – Marriage to
       – Saw movie with
     • Then we can impute the missing data from the transpose of the matrix
       – i.e., assume that if A says B is a friend, then if B had participated, s/he would have said A was a friend too
       – So, fill in missing row with the corresponding column

                                    1 1 1 1 1 1 1
                  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
                  A A B B C G G L M P P P R S S T
                  - - - - - - - - - - - - - - - -
      1 ACCIAIUOL 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
      2 ALBIZZI   0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0
      3 BARBADORI 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
      4 BISCHERI
      5 CASTELLAN 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0
      6 GINORI    0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      7 GUADAGNI  0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1
      8 LAMBERTES 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
      9 MEDICI    1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1
     10 PAZZI
     11 PERUZZI   0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0
     12 PUCCI     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     13 RIDOLFI   0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1
     14 SALVIATI  0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
     15 STROZZI   0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0
     16 TORNABUON 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0
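
Before relying on the transpose, it may help to check how symmetric the observed part of the data actually is. The Python below is one possible check, not something the slides specify: `reciprocity_agreement` is a hypothetical helper that looks only at dyads observed in both directions and reports the share on which the two reports agree. The toy matrix is made up and contains one deliberate disagreement.

```python
MISSING = None  # hypothetical marker for a blank cell


def reciprocity_agreement(matrix):
    """Share of dyads observed in both directions with x[i][j] == x[j][i]."""
    agree = 0
    total = 0
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = matrix[i][j], matrix[j][i]
            if a is MISSING or b is MISSING:
                continue  # cannot compare a dyad with an unobserved direction
            total += 1
            if a == b:
                agree += 1
    return agree / total if total else float("nan")


# Toy data (made up): the actor at index 2 did not respond, and actors 1 and 3
# disagree about their tie.
data = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [MISSING] * 4,
    [0, 1, 1, 0],
]

agreement = reciprocity_agreement(data)  # 2 of the 3 comparable dyads agree
```

A value close to 1 is consistent with the "largely symmetric" premise of the slide. A low value suggests that filling rows from columns would impose symmetry the respondents did not report.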

  6. Slide 6: Doing it …
     • In matrix algebra, you can do it with the REPLACENA command:
       – newdata = replacena(olddata transp(olddata))
     • Syntax
       – > <newds> = replacena(<ds1> <ds2>)
       – Where ds1 is the dataset that contains missing values and ds2 is the dataset from which to draw the correct values
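
For readers without the workshop software at hand, the behaviour described on slide 6 can be imitated in plain Python. This is a sketch of the semantics as the slide states them (keep ds1 where it is observed, otherwise take the value from ds2), not the actual implementation of the command. The `replacena` and `transpose` functions here are Python stand-ins, `MISSING`, `ROWS` and `NAMES` are hypothetical names, and the data are transcribed from the matrix on the earlier slides.

```python
MISSING = None  # hypothetical marker for a blank cell

NAMES = ("ACCIAIUOL ALBIZZI BARBADORI BISCHERI CASTELLAN GINORI GUADAGNI "
         "LAMBERTES MEDICI PAZZI PERUZZI PUCCI RIDOLFI SALVIATI STROZZI "
         "TORNABUON").split()

# One string per actor, in the order of NAMES; None marks a blank row.
ROWS = [
    "0000000010000000", "0000011010000000", "0000100010000000", None,
    "0010000000100010", "0100000000000000", "0101000100000001",
    "0000001000000000", "1110000000001101", None,
    "0001100000000010", "0000000000000000", "0000000010000011",
    "0000000011000000", "0001100000101000", "0000001010001000",
]


def transpose(ds):
    """Swap rows and columns."""
    return [list(col) for col in zip(*ds)]


def replacena(ds1, ds2):
    """Cell by cell: keep ds1 where observed, otherwise take the value in ds2."""
    return [
        [b if a is MISSING else a for a, b in zip(row1, row2)]
        for row1, row2 in zip(ds1, ds2)
    ]


olddata = [
    [int(c) for c in row] if row is not None else [MISSING] * len(ROWS)
    for row in ROWS
]
newdata = replacena(olddata, transpose(olddata))

# Ties imputed for the two non-respondents (rows 4 and 10 on the slides).
bischeri_ties = [NAMES[j] for j, v in enumerate(newdata[3]) if v == 1]
pazzi_ties = [NAMES[j] for j, v in enumerate(newdata[9]) if v == 1]
```

With these data, BISCHERI's row is filled with ties to GUADAGNI, PERUZZI and STROZZI, and PAZZI's row with a tie to SALVIATI. Cells that are missing in both the matrix and its transpose (the BISCHERI–PAZZI pair and the two diagonal cells) stay missing and would need one of the other approaches.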
