slide-1
SLIDE 1

SOCIAL NETWORK ANALYSIS USING STATA

10 June 2016
German Stata User Meeting
GESIS, Cologne

Thomas Grund
University College Dublin
thomas.u.grund@gmail.com
www.grund.co.uk

slide-2
SLIDE 2
slide-3
SLIDE 3

NETWORK DYNAMICS

International art fairs: changes 2005-2006

Yogev, T. and Grund, T. (2012) Structural Dynamics and the Market for Contemporary Art: The Case of International Art Fairs. Sociological Focus, 54(1), 23-40.
slide-4
SLIDE 4

CO-OFFENDING IN YOUTH GANG

Grund, T. and Densley, J. (2012) Ethnic Heterogeneity in the Activity and Structure of a Black Street Gang. European Journal of Criminology, 9(3), 388-406.

Grund, T. and Densley, J. (2015) Ethnic Homophily and Triad Closure: Mapping Internal Gang Structure Using Exponential Random Graph Models. Journal of Contemporary Criminal Justice, 31(3), 354-370.

[Figure legend: Caribbean, East Africa, UK, West Africa]

slide-5
SLIDE 5

MANCHESTER UTD – TOTTENHAM

9/9/2006, Old Trafford

Grund, T. (2012) Network Structure and Team Performance: The Case of English Premier League Soccer Teams. Social Networks, 34(4), 682-690.

slide-6
SLIDE 6

SOCIAL NETWORKS

  • Social
      • Friendship, kinship, romantic relationships
  • Government
      • Political alliances, government agencies
  • Markets
      • Trade: flow of goods, supply chains, auctions
      • Labor markets: vacancy chains, getting jobs
  • Organizations and teams
      • Interlocking directorates
      • Within-team communication, email exchange
slide-7
SLIDE 7

DEFINITION

  • Mathematically, a (binary) network is defined as $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$ is a set of “vertices” (or “nodes”) and $E \subseteq \{(i, j) \mid i, j \in V\}$ is a set of “edges” (or “ties”, “arcs”). Edges are simply pairs of vertices, e.g. $E \subseteq \{(1, 2), (2, 5), \ldots\}$.
  • We write $y_{ij} = 1$ if actors $i$ and $j$ are related to each other (i.e., if $(i, j) \in E$), and $y_{ij} = 0$ otherwise.
  • In digraphs (or directed networks) it is possible that $y_{ij} \neq y_{ji}$.
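
As a rough illustration of this definition, the sketch below stores a small directed network as an adjacency matrix. It is written in Python rather than Stata and is not part of nwcommands. The edge list is made up for the example, apart from the pairs (1, 2) and (2, 5) used in the definition.

```python
# Build the adjacency matrix y for a small directed network.
# n and the edge list are illustration data, not taken from the slides.
n = 5
edges = [(1, 2), (2, 5), (5, 2), (3, 1)]  # ordered pairs (i, j): tie from i to j

# y[i][j] = 1 if (i, j) is in E, else 0. Row and column 0 are unused so that
# the vertex labels 1..n can be used directly as indices.
y = [[0] * (n + 1) for _ in range(n + 1)]
for i, j in edges:
    y[i][j] = 1


def is_symmetric(y, n):
    """Return True if y_ij == y_ji for every pair, i.e. the network is undirected."""
    return all(y[i][j] == y[j][i]
               for i in range(1, n + 1)
               for j in range(1, n + 1))


print(y[1][2], y[2][1])    # 1 0: the tie from 1 to 2 is not reciprocated
print(is_symmetric(y, n))  # False
```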
slide-8
SLIDE 8

ADJACENCY MATRIX

slide-9
SLIDE 9

ADJACENCY MATRIX

slide-10
SLIDE 10

ADJACENCY LIST

slide-11
SLIDE 11

ADJACENCY LIST

slide-12
SLIDE 12

NETWORK ANALYSIS

  • Simple description/characterization of networks
  • Calculation of node-level characteristics (e.g. centrality)
  • Components, blocks, cliques, equivalences, ...
  • Visualization of networks
  • Statistical modeling of networks, network dynamics
  • ...
slide-13
SLIDE 13

. findit nwcommands

http://nwcommands.org

slide-14
SLIDE 14

http://nwcommands.org

slide-15
SLIDE 15

Twitter: nwcommands
GoogleGroup: nwcommands
Search “nwcommands” to find a channel with video tutorials.

slide-16
SLIDE 16

NWCOMMANDS

  • Software package for Stata. Almost 100 new Stata commands for handling, manipulating, plotting and analyzing networks.
  • Ideal for existing Stata users. Corresponds to the R packages “network”, “sna”, “igraph”, “networkDynamic”.
  • Designed for small to medium-sized networks (< 10000).
  • Almost all commands have menus. Can be used like Ucinet or Pajek. Ideal for beginners and teaching.
  • Not just specialized commands, but a whole infrastructure for handling/dealing with networks in Stata.
  • Writing your own network commands that build on the nwcommands is very easy.

slide-17
SLIDE 17

LINES OF CODE

Type      Files    LoC
.ado         94  14548
.dlg         57   5707
.sthlp       97   9954

Downloads: over 13 000 (since Jan 2015)

slide-18
SLIDE 18

. nwinstall, all

slide-19
SLIDE 19
slide-20
SLIDE 20

. help nwcommands

slide-21
SLIDE 21

INTUITION

  • Software introduces netname and netlist.
  • Networks are dealt with like normal variables.
  • Many normal Stata commands have a network counterpart that accepts a netname, e.g. nwdrop, nwkeep, nwclear, nwtabulate, nwcorrelate, nwcollapse, nwexpand, nwreplace, nwrecode, nwunab and more.

  • Stata intuition just works.
slide-22
SLIDE 22

SETTING NETWORKS

  • “Setting” a network creates a network quasi-object that has a netname.
  • After that you can refer to the network simply by its netname, just as you refer to a variable by its varname.

Syntax:

slide-23
SLIDE 23
slide-24
SLIDE 24
slide-25
SLIDE 25

LIST ALL NETWORKS

These are the names of the networks in memory. You can refer to these networks by their name. Check out the return vector as well: both commands populate it.

slide-26
SLIDE 26

LOAD NETWORK FROM THE INTERNET

. help netexample

slide-27
SLIDE 27

IMPORT NETWORK

  • A wide array of popular network file formats, e.g. Pajek and Ucinet, is supported by nwimport.
  • Files can be imported directly from the internet as well.
  • Similarly, networks can be exported to other formats with nwexport.

slide-28
SLIDE 28

DROP/KEEP NETWORKS

  • Dropping and keeping networks works almost exactly like dropping and keeping variables.

slide-29
SLIDE 29
slide-30
SLIDE 30

DROP/KEEP NODES

You can also drop/keep nodes of a specific network.

slide-31
SLIDE 31

NODE ATTRIBUTES

  • Every node of a network has a nodeid, which is matched with the observation number in a normal dataset.
  • In this case, the node with nodeid == 1 is the “acciaiuoli” family and they have a wealth of 10.

slide-32
SLIDE 32

nwset nwdrop nwds nwkeep nwcurrent nwimport webnwuse

slide-33
SLIDE 33
slide-34
SLIDE 34
slide-35
SLIDE 35

. webnwuse gang
. nwplot gang, color(Birthplace) scheme(s2network)

slide-36
SLIDE 36

nwplot gang, color(Birthplace) symbol(Prison) size(Arrests)

slide-37
SLIDE 37

. webnwuse florentine
. nwplot flomarriage, lab

[Figure: flomarriage network with node labels acciaiuoli, albizzi, barbadori, bischeri, castellani, ginori, guadagni, lamberteschi, medici, pazzi, peruzzi, pucci, ridolfi, salviati, strozzi, tornabuoni]

slide-38
SLIDE 38

. nwplotmatrix flomarriage, lab

slide-39
SLIDE 39

. nwplotmatrix flomarriage, sortby(wealth) label(wealth)

slide-40
SLIDE 40

. webnwuse klas12
. nwmovie klas12_wave1-klas12_wave4

slide-41
SLIDE 41
slide-42
SLIDE 42

. nwmovie _all, colors(col_t*) sizes(siz_t*) edgecolors(edge_t*)

slide-43
SLIDE 43

nwplot nwplotmatrix nwmovie

slide-44
SLIDE 44
slide-45
SLIDE 45

SUMMARIZE

slide-46
SLIDE 46

SUMMARIZE

slide-47
SLIDE 47

TABULATE NETWORK

slide-48
SLIDE 48

TABULATE TWO NETWORKS

slide-49
SLIDE 49

TABULATE NETWORK AND ATTRIBUTE

slide-50
SLIDE 50

DYAD CENSUS

M: mutual
A: asymmetric
N: null
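
A minimal sketch of what a dyad census counts, in Python rather than Stata: every unordered pair of nodes is classified as mutual (ties in both directions), asymmetric (a tie in one direction only) or null (no tie). The 4-node network is hypothetical, and the function is an illustration, not the implementation behind nwdyads.

```python
from itertools import combinations


def dyad_census(y):
    """Classify every unordered pair of nodes in a directed 0/1 adjacency matrix."""
    census = {"mutual": 0, "asymmetric": 0, "null": 0}
    for i, j in combinations(range(len(y)), 2):
        ties = y[i][j] + y[j][i]  # 2 = both directions, 1 = one direction, 0 = none
        if ties == 2:
            census["mutual"] += 1
        elif ties == 1:
            census["asymmetric"] += 1
        else:
            census["null"] += 1
    return census


# Hypothetical directed network with 4 nodes
y = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(dyad_census(y))  # {'mutual': 1, 'asymmetric': 2, 'null': 3}
```

The three counts always add up to n(n-1)/2, the number of dyads (6 for n = 4).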

slide-51
SLIDE 51
slide-52
SLIDE 52

nwsummarize nwtabulate nwdyads nwtriads

slide-53
SLIDE 53
slide-54
SLIDE 54

TABULATE NETWORK

slide-55
SLIDE 55

RECODE TIE VALUES

slide-56
SLIDE 56

FLORENTINE FAMILIES

[Figure panels: Marriage ties, Business ties]

slide-57
SLIDE 57

REPLACE TIE VALUES

slide-58
SLIDE 58

. help nwreplace

slide-59
SLIDE 59

GENERATE NETWORKS

slide-60
SLIDE 60

. help nwgen

slide-61
SLIDE 61

nwrecode nwreplace nwsync nwtranspose nwsym nwgen

slide-62
SLIDE 62
slide-63
SLIDE 63

FLORENTINE FAMILIES

Who are the neighbors?

slide-64
SLIDE 64

NEIGHBORS

slide-65
SLIDE 65

NEIGHBORS

slide-66
SLIDE 66

CONTEXT

slide-67
SLIDE 67

CONTEXT

What is the average wealth of the “albizzi’s” network neighbors?
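
The question above boils down to averaging a node attribute over a node's neighbors. The Python sketch below shows that calculation on a made-up network with a made-up wealth attribute. The node names and values are hypothetical and are not the Florentine data, and the function only illustrates the idea behind a context variable.

```python
def neighbor_mean(adj, attribute, ego):
    """Mean of `attribute` over the neighbors of `ego`; None if ego has no neighbors."""
    neighbors = adj[ego]
    if not neighbors:
        return None
    return sum(attribute[v] for v in neighbors) / len(neighbors)


# Hypothetical undirected network stored as adjacency lists
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"], "e": []}
# Hypothetical node attribute
wealth = {"a": 10, "b": 30, "c": 50, "d": 20, "e": 5}

print(neighbor_mean(adj, wealth, "a"))  # 40.0, the mean of b (30) and c (50)
print(neighbor_mean(adj, wealth, "e"))  # None, because e is an isolate
```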

slide-68
SLIDE 68

CONTEXT

slide-69
SLIDE 69

CONTEXT

slide-70
SLIDE 70

nwneighbor nwcontext

slide-71
SLIDE 71
slide-72
SLIDE 72

DISTANCE

Length of a shortest connecting path defines the (geodesic) distance between two nodes.

slide-73
SLIDE 73

DISTANCE

[Figure: network with nodes 1 to 5 and its matrix of pairwise distances; off-diagonal entries as extracted: 1 1 1 2 1 2 2 2 1 1 3 3 1 2 2 2 1 3 3 1]

average shortest path length = 1.8

slide-74
SLIDE 74

DISTANCE

slide-75
SLIDE 75

DISTANCE

slide-76
SLIDE 76

PATHS

How can one get from the “peruzzi” to the “medici”?

slide-77
SLIDE 77

PATHS

slide-78
SLIDE 78

PATHS

slide-79
SLIDE 79

PATHS

slide-80
SLIDE 80

nwgeodesic nwpath nwplot

slide-81
SLIDE 81
slide-82
SLIDE 82

CENTRALITY

Well connected actors are in a structurally advantageous position.

  • Getting jobs
  • Better informed
  • Higher status
  • ...

What is “well-connected”?

slide-83
SLIDE 83

DEGREE CENTRALITY

Degree centrality

  • Simply the number of incoming/outgoing ties => indegree centrality, outdegree centrality
  • How many ties does an individual have?

$$C_{odegree}(i) = \sum_{j=1}^{N} y_{ij} \qquad C_{idegree}(i) = \sum_{j=1}^{N} y_{ji}$$
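
In matrix terms, outdegree is a row sum and indegree a column sum of the adjacency matrix. A minimal Python sketch on a hypothetical 3-node directed network (an illustration, not nwdegree itself):

```python
def outdegree(y, i):
    """Number of ties sent by node i: the sum of row i."""
    return sum(y[i])


def indegree(y, i):
    """Number of ties received by node i: the sum of column i."""
    return sum(row[i] for row in y)


# Hypothetical directed network: y[i][j] = 1 means a tie from i to j
y = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

for i in range(len(y)):
    print(i, outdegree(y, i), indegree(y, i))
# 0 2 1
# 1 1 1
# 2 1 2
```

Every tie is sent by one node and received by another, so the outdegrees and the indegrees both sum to the number of ties (4 here).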

slide-84
SLIDE 84

BETWEENNESS CENTRALITY

Betweenness centrality

  • How many shortest paths go through an individual?

[Figure: network with nodes a, b, c, d, e]

$C_{betweenness}(a) = 6$

...

$C_{betweenness}(b) = 0$

slide-85
SLIDE 85

BETWEENNESS CENTRALITY

Betweenness centrality

  • How many shortest paths go through an individual?

[Figure: network with nodes a, b, c, d, e]

What about multiple shortest paths? E.g. there are two shortest paths from c to d (one via a and another one via e).

Give each shortest path a weight inverse to how many shortest paths there are between two nodes.
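
The weighting rule can be sketched with Brandes' algorithm, which counts the shortest paths between each pair of nodes and credits every intermediate node with its share of them. The Python code below is an illustration, not the nwbetween implementation. Both example graphs are assumptions, since the original figures are not available: a star with a in the center, which yields the values 6 and 0 quoted for a and b, and a graph in which c and d are linked via both a and e.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an undirected, unweighted graph given as adjacency lists."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma), record predecessors
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path from s to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, splitting credit among equal-length paths
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints, so halve the totals
    return {v: b / 2 for v, b in bc.items()}


# Assumed star network: a in the center, b, c, d, e as leaves
star = {"a": ["b", "c", "d", "e"], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"]}
print(betweenness(star)["a"], betweenness(star)["b"])  # 6.0 0.0

# Assumed network with two shortest paths from c to d (via a and via e)
two_paths = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a", "e"],
             "d": ["a", "e"], "e": ["c", "d"]}
print(betweenness(two_paths))
# {'a': 3.5, 'b': 0.0, 'c': 1.0, 'd': 1.0, 'e': 0.5}
```

In the second graph the pair (c, d) has two shortest paths, so a and e each receive a credit of 1/2 for it.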

slide-86
SLIDE 86
slide-87
SLIDE 87

nwdegree nwbetween nwevcent nwcloseness nwkatz

slide-88
SLIDE 88
slide-89
SLIDE 89

RANDOM NETWORK

nwrandom 15, prob(.1)

Each tie has the same probability to exist, regardless of any other ties.

nwrandom 15, prob(.5)
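
The random network model above can be sketched in a few lines of Python: every possible tie is drawn independently with the same probability. The sketch treats ties as directed. Whether that matches the default of nwrandom is not stated on the slide, so treat it as an assumption. The `seed` argument is added here only to make the example reproducible.

```python
import random


def random_network(n, prob, seed=None):
    """Adjacency matrix in which each ordered pair (i, j), i != j, is tied with probability prob."""
    rng = random.Random(seed)
    return [[1 if i != j and rng.random() < prob else 0 for j in range(n)]
            for i in range(n)]


def density(y):
    """Share of possible ties that are present (diagonal excluded)."""
    n = len(y)
    return sum(map(sum, y)) / (n * (n - 1))


sparse = random_network(15, 0.1, seed=1)
dense = random_network(15, 0.5, seed=1)
print(density(sparse), density(dense))  # roughly 0.1 and 0.5, varying with the seed
```

Because ties are independent, the expected density equals `prob`.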

slide-90
SLIDE 90

LATTICE

nwlattice 5 5

RING LATTICE

nwring 15, k(2) undirected

slide-91
SLIDE 91

SMALL WORLD NETWORK

nwsmall 10, k(2) shortcuts(3) undirected

slide-92
SLIDE 92

PREFERENTIAL ATTACHMENT NETWORK

nwpref 10, prob(.5)

slide-93
SLIDE 93

HOMOPHILY NETWORK

nwhomophily gender, density(0.05) homophily(5)

slide-94
SLIDE 94

nwrandom nwlattice nwsmall nwpref nwring nwhomophily nwdyadprob

slide-95
SLIDE 95
slide-96
SLIDE 96

Is a particular network pattern more (or less) prominent than expected?

slide-97
SLIDE 97

Question: Is there more or less correlation between these two networks than expected?

𝑑𝑝𝑠𝑠

𝑝𝑐𝑑 = 0.372

slide-98
SLIDE 98

1. Test statistic: $\mathrm{corr}_{obs} = 0.372$

2. Distribution of the test statistic under the null hypothesis: $\mathrm{corr}_{random} = \,?$

slide-99
SLIDE 99

QUADRATIC ASSIGNMENT PROCEDURE

  • Scramble the network by permuting the actors (randomly re-label the nodes), i.e. the actual network does not change; however, the position each node takes does.
  • Re-calculate the test statistic on the permuted networks and compare it with the test statistic on the unscrambled network.

Network structure is ‘controlled’ for. Keeps dependencies.
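
The procedure can be sketched in Python as follows: permute the node labels of one network, recompute the correlation with the other network, and see how often the permuted value is at least as large as the observed one. The two 5-node networks are hypothetical. The one-sided p-value and the function names are choices made for this illustration, not the behavior of nwcorrelate or nwqap.

```python
import random


def offdiag(y):
    """Off-diagonal cells of an adjacency matrix as a flat list."""
    n = len(y)
    return [y[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, z):
    """Pearson correlation of two equally long lists (both must vary)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    vx = sum((a - mx) ** 2 for a in x)
    vz = sum((b - mz) ** 2 for b in z)
    return cov / (vx * vz) ** 0.5


def permute(y, perm):
    """Re-label the nodes: rows and columns are reordered with the same permutation."""
    n = len(y)
    return [[y[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def qap_test(y1, y2, permutations=1000, seed=1):
    """Observed correlation and share of permuted correlations >= the observed one."""
    rng = random.Random(seed)
    observed = pearson(offdiag(y1), offdiag(y2))
    n = len(y1)
    at_least = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        if pearson(offdiag(permute(y1, perm)), offdiag(y2)) >= observed:
            at_least += 1
    return observed, at_least / permutations


# Two hypothetical undirected networks on the same 5 nodes
y1 = [[0, 1, 1, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 0]]
y2 = [[0, 1, 0, 0, 0], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]

observed, p_value = qap_test(y1, y2, permutations=500)
print(observed, p_value)
```

Permuting rows and columns together keeps the structure of the network intact (same number of ties, same degree distribution), which is why the dependencies among ties are preserved under the null.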

slide-100
SLIDE 100

PERMUTATION TEST

[Figure: adjacency matrix before and after a permutation of the node labels]

slide-101
SLIDE 101

GRAPH CORRELATION

nwcorrelate flobusiness flomarriage, permutations(100)

[Figure: histogram of Corr(flobusiness, flomarriage) based on 100 QAP permutations of network flobusiness; x-axis: correlation]

slide-102
SLIDE 102

nwcorrelate nwpermute nwqap nwergm

slide-103
SLIDE 103


slide-104
SLIDE 104
slide-105
SLIDE 105

ERGM

$$\operatorname{logit} P(Y_{ij} = 1 \mid n \text{ actors}, Y_{ij}^{c}) = \sum_{k=1}^{K} \theta_k \, \delta s_k(\mathbf{y})$$

  • $P(Y_{ij} = 1 \mid \ldots)$: probability that there is a tie from i to j, given n actors AND the rest of the network, excluding the dyad in question!
  • $Y_{ij}^{c}$ = all dyads other than $Y_{ij}$
  • $\delta s_k(\mathbf{y})$: amount by which the feature $s_k(\mathbf{y})$ changes when $Y_{ij}$ is toggled from 0 to 1.

slide-106
SLIDE 106

ERGM

𝑄 𝒁 = 𝒛 πœ„ = 𝑓 πœ„π‘ˆπ‘‘ 𝒛 𝑑 πœ„

𝒁 = 𝒔𝒃𝒐𝒆𝒑𝒏 π’˜π’ƒπ’”π’‹π’ƒπ’„π’Žπ’‡, a randomly selected network from the pool of all potential networks 𝒛 = π’‘π’„π’•π’‡π’”π’˜π’‡π’† π’˜π’ƒπ’”π’‹π’ƒπ’„π’Žπ’‡, here observed network

Probability to draw β€˜our’ observed network y from all potential networks A score given to our network y using some parameters πœ„ and the network features s of y

𝜾 = 𝒒𝒃𝒔𝒃𝒏𝒇𝒖𝒇𝒔𝒕, to be estimated

A score given to all

  • ther networks we

could have observed

slide-107
SLIDE 107

ERGM: INTERPRETATION

ERGMs ultimately give you an estimate for various parameters $\theta_k$, which mean:

If a potential tie $Y_{ij} = 1$ (between i and j) would change the network statistic $s_k$ by one unit, this changes the log-odds for the tie $Y_{ij}$ to actually exist by $\theta_k$.

slide-108
SLIDE 108

EXAMPLE

Consider an ERGM for an undirected network with parameters for these three statistics:

  1) number of edges: $s_{edges}(\mathbf{y}) = \sum y_{ij}$
  2) number of 2-stars: $s_{2stars}(\mathbf{y}) = \sum y_{ij} y_{ik}$
  3) number of triangles: $s_{triangles}(\mathbf{y}) = \sum y_{ij} y_{jk} y_{ik}$

Then the 3-parameter ERG distribution function is:

$$P(\mathbf{Y} = \mathbf{y} \mid \theta) \propto e^{\theta_{edges} s_{edges}(\mathbf{y}) + \theta_{2stars} s_{2stars}(\mathbf{y}) + \theta_{triangles} s_{triangles}(\mathbf{y})}$$
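
To make the three statistics concrete, the Python sketch below counts edges, 2-stars and triangles in a small undirected network. It then computes the change statistics for one potential tie and the implied conditional tie probability. The network and the parameter values in `theta` are hypothetical, and nothing here estimates an ERGM. It only evaluates the formulas for given parameters.

```python
from itertools import combinations
import math


def stats(y):
    """(edges, 2-stars, triangles) of an undirected 0/1 adjacency matrix."""
    n = len(y)
    edges = sum(y[i][j] for i, j in combinations(range(n), 2))
    degrees = [sum(row) for row in y]
    two_stars = sum(d * (d - 1) // 2 for d in degrees)  # pairs of ties sharing a node
    triangles = sum(y[i][j] * y[j][k] * y[i][k]
                    for i, j, k in combinations(range(n), 3))
    return (edges, two_stars, triangles)


def change_stats(y, i, j):
    """Change in each statistic when the tie between i and j is toggled from 0 to 1."""
    def with_tie(value):
        z = [row[:] for row in y]
        z[i][j] = z[j][i] = value
        return stats(z)
    return tuple(a - b for a, b in zip(with_tie(1), with_tie(0)))


def tie_probability(theta, delta):
    """Inverse logit of the sum of theta_k times the k-th change statistic."""
    logit = sum(t * d for t, d in zip(theta, delta))
    return 1 / (1 + math.exp(-logit))


# Hypothetical network: triangle 0-1-2 plus a tie between 2 and 3
y = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
theta = (-2.0, 0.1, 0.5)  # hypothetical parameters for edges, 2-stars, triangles

print(stats(y))            # (4, 5, 1)
delta = change_stats(y, 0, 3)
print(delta)               # (1, 3, 1)
print(tie_probability(theta, delta))
```

Adding the tie between nodes 0 and 3 creates one edge, three new 2-stars and one new triangle, so its log-odds are -2.0 + 3(0.1) + 0.5 = -1.2 under these assumed parameters.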

slide-109
SLIDE 109
slide-110
SLIDE 110
