Introduction Problem Approach Properties Measures Summary
Measuring Segregation in Social Networks
Michał Bojanowski, Rense Corten
ICS/Sociology, Utrecht University
July 2, 2010, Sunbelt XXX, Riva del Garda
Outline
1 Introduction: Homophily and segregation
2 Problem
3 Approach: Approach, Notation
4 Properties: Ties, Nodes, Network
5 Measures
6 Summary
Homophily and segregation
Homophily: Contact between similar people occurs at a higher rate than among dissimilar people (McPherson, Smith-Lovin, & Cook, 2001).
Segregation: Nonrandom allocation of people who belong to different groups into social positions and the associated social and physical distances between groups (Bruch & Mare, 2009).
Homophily: Friendship selection in school classes
Moody (2001)
Residential segregation in Seattle
Blacks Asians Whites
Source: Seattle Civil Rights and Labor History Project
Segregation in network terms
Neighborhood structure can be conceptualized as a network in which links correspond to neighborhood proximities.
[Figure: example network with nodes numbered 1 to 29]
Assumption: In static terms, homophily and segregation correspond to the same network phenomenon. We will stick with the segregation label.
Measurement problem
To compare the levels of segregation of different networks (different school classes, different cities, etc.), we need a measure.
Problems with measures
There is an abundance of measures in the literature, but they:
- Stem from different research streams
- Follow different logics
- Hardly ever refer to each other
- Lead to different conclusions given the same problems (data)
So, the problems are:
- Which one to select in a given setting?
- On what grounds should such a selection be made?
Possible approaches
Empirical: Assemble a large set of empirical datasets. Calculate the measures for all of them. See how they correlate, perhaps through PCA or the like.
Theo-pirical: Take a set of probabilistic models of networks (Erdős-Rényi random graph, preferential attachment, small-world, etc.). Generate a collection of networks. Proceed as in the empirical approach.
Theoretical: Come up with a set of properties that the measures might (or might not) possess. Evaluate the differences between the measures in terms of which properties they satisfy.
Actors
Actors: $\mathcal{N} = \{1, 2, \dots, i, \dots, N\}$
Groups of actors: Actors are assigned to $K$ exhaustive and mutually exclusive groups, $G = \{G_1, \dots, G_k, \dots, G_K\}$.
Group membership is denoted with a "type vector" $t = [t_1, \dots, t_i, \dots, t_N]$, where $t_i \in \{1, \dots, K\}$ is the group of actor $i$.
Let $\mathcal{T}$ be the set of all possible type vectors for $\mathcal{N}$.
Network
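Network: Actors form an undirected network, represented as a square binary matrix $X = [x_{ij}]_{N \times N}$. Let $\mathcal{X}$ be the set of all possible networks over the actors in $\mathcal{N}$.
Mixing matrix: A three-dimensional array $M = [m_{ghy}]_{K \times K \times 2}$ defined as
$$m_{gh1} = \sum_{i \in G_g} \sum_{j \in G_h} x_{ij}, \qquad m_{gh0} = \sum_{i \in G_g} \sum_{j \in G_h} (1 - x_{ij})$$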
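The mixing matrix definition can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the function name `mixing_matrix` and the zero-based group labels (0..K-1 instead of 1..K) are assumptions made here. The sums are taken literally as written, so each within-group tie of an undirected network is counted twice and the self-pairs i = j land in the non-tie layer.

```python
import numpy as np

def mixing_matrix(x, t, k):
    """Return M of shape (k, k, 2): M[g, h, 1] counts ties, M[g, h, 0] counts non-ties.

    Assumption: groups are labelled 0..k-1 (the slides use 1..K).
    """
    x = np.asarray(x)
    t = np.asarray(t)
    m = np.zeros((k, k, 2), dtype=int)
    for g in range(k):
        for h in range(k):
            # Sub-block of the adjacency matrix with rows in group g, columns in group h
            block = x[np.ix_(t == g, t == h)]
            m[g, h, 1] = block.sum()
            m[g, h, 0] = (1 - block).sum()
    return m

# Hypothetical example: a path 0-1-2-3 with two groups of two nodes each
x_example = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
t_example = np.array([0, 0, 1, 1])
m_example = mixing_matrix(x_example, t_example, 2)
```

The layer `m_example[:, :, 1]` is what the talk calls the contact layer.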
Segregation index
Segregation measure: A generic segregation index $S(\cdot)$:
$$S : \mathcal{X} \times \mathcal{T} \to \mathbb{R}$$
For a given network and type vector, it assigns a real number.
Adding between-group ties
Property (Monotonicity in between-group ties: MBG)
Let there be two networks $X$ and $Y$ defined on the same set of nodes, a type vector $t$, and two nodes $i$ and $j$ such that $t_i \neq t_j$, $x_{ij} = 0$, and $y_{ij} = 1$. For all other pairs of nodes $(p, q) \neq (i, j)$, $x_{pq} = y_{pq}$, i.e. the networks $X$ and $Y$ are otherwise identical. Network segregation index $S$ is monotonic in between-group ties iff
$$S(X, t) \geq S(Y, t)$$
In words: adding a between-group tie cannot increase segregation.
Adding within-group ties
Property (Monotonicity in within-group ties: MWG)
Let there be two networks $X$ and $Y$ defined on the same set of nodes, a type vector $t$, and two nodes $i$ and $j$ such that $t_i = t_j$, $x_{ij} = 0$, and $y_{ij} = 1$. For all other pairs of nodes $(p, q) \neq (i, j)$, $x_{pq} = y_{pq}$, i.e. the networks $X$ and $Y$ are otherwise identical. Network segregation index $S$ is monotonic in within-group ties iff
$$S(X, t) \leq S(Y, t)$$
In words: adding a within-group tie to the network cannot decrease segregation.
Rewiring between-group tie to within-group
Property (Monotonicity in rewiring: MR)
Let there be two networks $X$ and $Y$, a type vector $t$, and three nodes $i$, $j$ and $k$ such that
1 $x_{ij} = 1$ and $t_i \neq t_j$
2 $y_{ij} = 0$, $y_{ik} = 1$, and $t_i = t_k$
That is, a between-group tie $ij$ in $X$ is rewired to a within-group tie $ik$ in $Y$. Network segregation index $S$ is monotonic in rewiring iff
$$S(X, t) \leq S(Y, t)$$
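The three tie properties can be spot-checked numerically for any candidate index. The sketch below uses a deliberately simple toy index, the share of ties that join same-group nodes; this index and the helper names `within_share` and `with_tie` are hypothetical and are not among the measures discussed in the talk. A single example can only illustrate a property or expose a violation, it cannot prove that the property holds in general.

```python
import numpy as np

def within_share(x, t):
    """Toy index (hypothetical): share of ties that join same-group nodes."""
    x, t = np.asarray(x), np.asarray(t)
    same = t[:, None] == t[None, :]
    total = x.sum()
    return (x * same).sum() / total if total else 0.0

def with_tie(x, i, j, value):
    """Copy of the undirected network x with tie ij set to value (0 or 1)."""
    y = np.array(x, copy=True)
    y[i, j] = y[j, i] = value
    return y

t = np.array([0, 0, 0, 1, 1])
x = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 3), (3, 4)]:
    x = with_tie(x, i, j, 1)

base = within_share(x, t)  # 2 of the 3 ties are within-group

# MBG: adding the between-group tie 0-3 must not increase the index
mbg_ok = base >= within_share(with_tie(x, 0, 3, 1), t)
# MWG: adding the within-group tie 0-2 must not decrease the index
mwg_ok = base <= within_share(with_tie(x, 0, 2, 1), t)
# MR: rewiring the between-group tie 1-3 to the within-group tie 1-2
rewired = with_tie(with_tie(x, 1, 3, 0), 1, 2, 1)
mr_ok = base <= within_share(rewired, t)
```

Swapping `within_share` for another function with the same `(x, t)` signature reuses the same checks.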
Adding isolates
Property (Effect of adding isolates: ISO)
Define two networks $X = [x_{ij}]_{N \times N}$ and $Y = [y_{pq}]_{(N+1) \times (N+1)}$ and associated type vectors $u$ and $w$, which are identical for the $N$ actors and differ by an $(N + 1)$-th node which is an isolate:
1 $\forall p, q \in 1..N: \; y_{pq} = x_{pq}$
2 $\sum_{p=1}^{N+1} y_{p,N+1} = \sum_{q=1}^{N+1} y_{N+1,q} = 0$
3 $\forall k \in 1..N: \; w_k = u_k$
$$S(X, u) \;?\; S(Y, w)$$
In words: how does the segregation level change if isolates are added to the network?
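The isolate construction can be tried out on Freeman's index, which is defined later in the talk as 1 - p/π. The sketch below rests on an assumption that the slides do not spell out: the expected proportion of between-group ties under random tie creation is taken to be π = 2·n1·n2 / (N(N-1)) for two groups of sizes n1 and n2. The function names and the example network are hypothetical, and negative values are not clipped to 0.

```python
import numpy as np

def freeman(x, t):
    """Freeman-style index 1 - p/pi for two groups labelled 0 and 1.

    Assumption: pi = 2*n1*n2 / (N*(N-1)), the share of between-group dyads.
    """
    x, t = np.asarray(x), np.asarray(t)
    n = len(t)
    n1 = int((t == 0).sum())
    n2 = n - n1
    upper = np.triu(x, k=1)                      # count each undirected tie once
    between = upper[t[:, None] != t[None, :]].sum()
    p = between / upper.sum()                    # observed share of between-group ties
    pi = 2 * n1 * n2 / (n * (n - 1))             # expected share (assumed form)
    return 1 - p / pi

def add_isolate(x, t, group):
    """Return (Y, w): the network with one extra isolated node in the given group."""
    x = np.asarray(x)
    n = x.shape[0]
    y = np.zeros((n + 1, n + 1), dtype=x.dtype)
    y[:n, :n] = x
    return y, np.append(t, group)

t = np.array([0, 0, 1, 1, 1, 1])
x = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (2, 3), (4, 5), (1, 2)]:
    x[i, j] = x[j, i] = 1

s_before = freeman(x, t)
s_small = freeman(*add_isolate(x, t, 0))  # isolate joins the smaller group
s_large = freeman(*add_isolate(x, t, 1))  # isolate joins the larger group
```

In this example the index rises when the isolate joins the smaller group and falls when it joins the larger one, so the direction of the change depends on relative group sizes.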
Duplicating the network
Property (Symmetry: S)
Define two identical networks $X$ and $Y$ and some type vector $t$. Network segregation index $S$ satisfies symmetry iff
$$S(X, t) = S(Y, t) = S(Z, z)$$
where the network $Z$ (with type vector $z$) is constructed by considering $X$ and $Y$ together as a single network, namely $Z = [z_{pq}]_{2N \times 2N}$ such that
- $\forall p, q \in \{1, \dots, N\}: \; z_{pq} = x_{pq}$
- $\forall p, q \in \{N + 1, \dots, 2N\}: \; z_{pq} = y_{p-N,\,q-N}$
- otherwise $z_{pq} = 0$
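The duplication construction amounts to a block-diagonal matrix. The sketch below builds Z from two copies of X and checks the property for the assortativity coefficient defined later in the talk. It assumes that the type vector z is simply t repeated twice, which the slide does not state explicitly; the function names and zero-based group labels are also assumptions.

```python
import numpy as np

def assortativity(x, t, k):
    """Assortativity coefficient computed from the contact layer of the mixing matrix."""
    x, t = np.asarray(x), np.asarray(t)
    m1 = np.array([[x[np.ix_(t == g, t == h)].sum() for h in range(k)]
                   for g in range(k)], dtype=float)
    p = m1 / m1.sum()
    expected = (p.sum(axis=1) * p.sum(axis=0)).sum()
    return (np.trace(p) - expected) / (1 - expected)

def duplicate(x, t):
    """Return (Z, z): two disjoint copies of the network, with t repeated (assumed)."""
    x = np.asarray(x)
    n = x.shape[0]
    z = np.zeros((2 * n, 2 * n), dtype=x.dtype)
    z[:n, :n] = x      # first copy
    z[n:, n:] = x      # second copy; off-diagonal blocks stay zero
    return z, np.concatenate([t, t])

x = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
t = np.array([0, 0, 1, 1])

s_single = assortativity(x, t, 2)
z_net, z_types = duplicate(x, t)
s_double = assortativity(z_net, z_types, 2)
```

Duplication doubles every cell of the contact layer, so the proportions and hence the coefficient are unchanged in this example.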
Measures
- Freeman's segregation index (Freeman, 1978)
- Spectral Segregation Index (Echenique & Fryer, 2007)
- Assortativity coefficient (Newman, 2003)
- Gupta-Anderson-May's Q (Gupta et al., 1989)
- Coleman's Homophily Index (Coleman, 1958)
- Segregation Matrix Index (Fershtman, 1997)
- Exponential Random Graph Models (Snijders et al., 2006)
- Conditional Log-linear models for mixing matrix (Koehly, Goodreau & Morris, 2004)
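Measure | Level | Network type | Scale
Freeman | network | U | $[0; 1]$
SSI | node | U | $[0; \infty]$
Assortativity | network | D/U | $\left[-\frac{\sum_g p_{g+} p_{+g}}{1 - \sum_g p_{g+} p_{+g}}; 1\right]$
Gupta-Anderson-May | network | D/U | $\left[-\frac{1}{K-1}; 1\right]$
Coleman | group | D | $[-1; 1]$
Segregation Matrix Index | group | D/U | $[-1; 1]$
Uniform homophily (CLL) | network | D/U | $[-\infty; \infty]$
Differential homophily (CLL) | group | D/U | $[-\infty; \infty]$
Uniform homophily (ERGM) | network | D/U | $[-\infty; \infty]$
Differential homophily (ERGM) | group | D/U | $[-\infty; \infty]$
(D = directed, U = undirected)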
Freeman (1978)
Given two groups,
$$S_{Freeman} = 1 - \frac{p}{\pi}$$
where $p$ is the observed proportion of between-group ties and $\pi$ is the expected proportion given that ties are created randomly. It varies between 0 (random network) and 1 (full segregation of groups).
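A minimal sketch of this index for an undirected network with two groups follows. The slide does not give a formula for π; the code assumes π = 2·n1·n2 / (N(N-1)), i.e. the share of all dyads that are between-group, which is what uniform random tie placement would yield in expectation. The function name and example are hypothetical, and values below 0 (more between-group ties than expected) are left unclipped.

```python
import numpy as np

def freeman_index(x, t):
    """1 - p/pi for an undirected binary network x and two groups labelled 0/1.

    Assumption: pi is the share of between-group dyads, 2*n1*n2 / (N*(N-1)).
    """
    x, t = np.asarray(x), np.asarray(t)
    n = len(t)
    n1 = int((t == 0).sum())
    n2 = n - n1
    upper = np.triu(x, k=1)                          # each tie once
    between = upper[t[:, None] != t[None, :]].sum()  # between-group ties
    p = between / upper.sum()
    pi = 2 * n1 * n2 / (n * (n - 1))
    return 1 - p / pi

t = np.array([0, 0, 1, 1, 1, 1])
mixed = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (2, 3), (4, 5), (1, 2)]:   # one of four ties crosses groups
    mixed[i, j] = mixed[j, i] = 1

segregated = mixed.copy()
segregated[1, 2] = segregated[2, 1] = 0          # drop the only between-group tie

s_mixed = freeman_index(mixed, t)
s_segregated = freeman_index(segregated, t)
```

With no between-group ties p is 0 and the index reaches its maximum of 1.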
Assortativity Coefficient, Newman (2003)
Based on the contact layer of the mixing matrix, $p_{gh} = m_{gh1} / m_{++1}$.
$$S_{Newman} = \frac{\sum_{g=1}^{K} p_{gg} - \sum_{g=1}^{K} p_{g+} p_{+g}}{1 - \sum_{g=1}^{K} p_{g+} p_{+g}}$$
Maximum of 1 for perfect segregation; 0 for a random network. Negative values for "disassortative" networks. The minimum depends on the density.
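The coefficient can be computed directly from the contact layer. The sketch below takes the K×K matrix of tie counts as input; the function name and the example matrices are made up for illustration.

```python
import numpy as np

def newman_assortativity(m1):
    """Assortativity coefficient from the contact layer m1[g, h] = m_gh1."""
    m1 = np.asarray(m1, dtype=float)
    p = m1 / m1.sum()                                  # p_gh = m_gh1 / m_++1
    expected = (p.sum(axis=1) * p.sum(axis=0)).sum()   # sum_g p_g+ * p_+g
    return (np.trace(p) - expected) / (1 - expected)

s_partial = newman_assortativity([[2, 1], [1, 2]])   # some between-group contact
s_perfect = newman_assortativity([[4, 0], [0, 6]])   # no between-group contact
s_disassortative = newman_assortativity([[0, 3], [3, 0]])
```

In the last example all ties run between the two groups and the coefficient is negative.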
Gupta, Anderson & May 1989
Also based on the contact layer of the mixing matrix:
$$S_{GAM} = \frac{\sum_{g=1}^{K} \lambda_g - 1}{K - 1}$$
where $\lambda_g$ are the eigenvalues of $p_{gh}$. It varies between $-1/(K - 1)$ and 1.
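A sketch of the computation follows. One assumption is made loudly here: the contact layer is normalized by rows, so that each row gives the distribution of a group's contacts over groups. With that normalization the index reaches 1 for a fully segregated network and -1/(K-1) when there is no within-group contact, matching the stated range; the slide itself only says "eigenvalues of p_gh". Every group is assumed to have at least one tie, otherwise the row normalization divides by zero.

```python
import numpy as np

def gam_q(m1):
    """Gupta-Anderson-May style Q from the contact layer m1[g, h] = m_gh1.

    Assumption: eigenvalues are taken from the row-normalized contact matrix.
    """
    m1 = np.asarray(m1, dtype=float)
    k = m1.shape[0]
    rows = m1 / m1.sum(axis=1, keepdims=True)    # each row sums to 1
    eigenvalues = np.linalg.eigvals(rows)
    return (eigenvalues.sum().real - 1) / (k - 1)

q_partial = gam_q([[2, 1], [1, 2]])
q_perfect = gam_q([[4, 0], [0, 6]])
q_minimum = gam_q([[0, 3], [3, 0]])
```

Since the eigenvalues of a matrix sum to its trace, the same number can be obtained from the diagonal of the row-normalized matrix without an eigendecomposition.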
Coleman, 1958
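Expected number of ties within group $g$:
$$m^*_{gg} = \sum_{i \in G_g} \eta_i \frac{n_g - 1}{N - 1}$$
$$S^g_{Coleman} = \frac{m_{gg} - m^*_{gg}}{\sum_{i \in G_g} \eta_i - m^*_{gg}} \quad \text{where } m_{gg} \geq m^*_{gg} \qquad (1)$$
$$S^g_{Coleman} = \frac{m_{gg} - m^*_{gg}}{m^*_{gg}} \quad \text{where } m_{gg} < m^*_{gg} \qquad (2)$$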
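Equations (1) and (2) can be sketched as below for a directed network. The slide does not define η_i; the code assumes it is the out-degree of actor i (the number of ties i sends), and takes m_gg to be the number of ties sent within group g. The function name and the example data are hypothetical. The first branch divides by zero when the group sends no ties or contains all actors, a case the sketch does not handle.

```python
import numpy as np

def coleman_index(x, t, g):
    """Coleman-style homophily index of group g in a directed network x.

    Assumption: eta_i is the out-degree of actor i.
    """
    x, t = np.asarray(x), np.asarray(t)
    in_g = t == g
    n, n_g = len(t), int(in_g.sum())
    sent = x[in_g].sum()                         # sum of eta_i over i in G_g
    m_gg = x[np.ix_(in_g, in_g)].sum()           # ties sent within the group
    m_star = sent * (n_g - 1) / (n - 1)          # expected within-group ties
    if m_gg >= m_star:
        return (m_gg - m_star) / (sent - m_star)     # equation (1)
    return (m_gg - m_star) / m_star                  # equation (2)

t = np.array([0, 0, 1, 1])
x = np.zeros((4, 4), dtype=int)
for sender, receiver in [(0, 1), (1, 0), (2, 3), (3, 0)]:
    x[sender, receiver] = 1

s_group0 = coleman_index(x, t, 0)   # both ties sent by group 0 stay inside it
s_group1 = coleman_index(x, t, 1)   # one of two ties sent by group 1 leaves it
```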
Segregation matrix index, Fershtman (1997)
$$S_{SMI} = \frac{d_{11} - d_{12}}{d_{11} + d_{12}} \qquad (3)$$
where $d_{11}$ is the density of within-group ties and $d_{12}$ is the density of between-group ties.
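Equation (3) can be sketched for one group in an undirected network as follows. The slide does not say how the two densities are normalized; the code assumes d11 is the number of ties inside the group divided by the number of within-group dyads, and d12 is the number of ties between the group and everyone else divided by the number of such dyads. Function name and example are hypothetical.

```python
import numpy as np

def smi(x, t, g):
    """Segregation matrix index of group g in an undirected network x.

    Assumption: densities are ties divided by the number of possible dyads.
    """
    x, t = np.asarray(x), np.asarray(t)
    in_g = t == g
    n1 = int(in_g.sum())
    n2 = len(t) - n1
    within = x[np.ix_(in_g, in_g)].sum() / 2     # symmetric block counts ties twice
    between = x[np.ix_(in_g, ~in_g)].sum()
    d11 = within / (n1 * (n1 - 1) / 2)
    d12 = between / (n1 * n2)
    return (d11 - d12) / (d11 + d12)

t = np.array([0, 0, 0, 1, 1])
x = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]:
    x[i, j] = x[j, i] = 1

s_group0 = smi(x, t, 0)   # a closed triangle with a single tie to the other group
```

Here d11 = 1 and d12 = 1/6, so the index is 5/7.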
Conditional Log-Linear Models (Koehly et al, 2004)
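Uniform homophily:
$$\log m_{gh1} = \mu + \lambda^A_g + \lambda^B_h + \lambda^{UHOM}_{gh}, \qquad \lambda^{UHOM}_{gh} = \begin{cases} \lambda^{UHOM} & g = h \\ 0 & g \neq h \end{cases}$$
Differential homophily:
$$\log m_{gh1} = \mu + \lambda^A_g + \lambda^B_h + \lambda^{DHOM}_{gh}, \qquad \lambda^{DHOM}_{gh} = \begin{cases} \lambda^{DHOM}_g & g = h \\ 0 & g \neq h \end{cases}$$
Parameters $\lambda^{UHOM}$ and $\lambda^{DHOM}_g$ serve as measures of homophily/segregation.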
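As a worked special case (not taken from the slides), consider the uniform homophily model with K = 2 groups. Combining the four cell equations, the intercept and the row and column effects cancel:

```latex
\begin{aligned}
\log m_{111} + \log m_{221} - \log m_{121} - \log m_{211}
  &= (\mu + \lambda^A_1 + \lambda^B_1 + \lambda^{UHOM})
   + (\mu + \lambda^A_2 + \lambda^B_2 + \lambda^{UHOM}) \\
  &\quad - (\mu + \lambda^A_1 + \lambda^B_2)
   - (\mu + \lambda^A_2 + \lambda^B_1) \\
  &= 2\,\lambda^{UHOM},
\qquad\text{so}\qquad
\lambda^{UHOM} = \tfrac{1}{2} \log \frac{m_{111}\, m_{221}}{m_{121}\, m_{211}} .
\end{aligned}
```

Assuming the usual identifying constraints on the row and column effects and strictly positive cell counts, the K = 2 model has as many free parameters as cells, so the fitted counts equal the observed ones and the parameter is half the log odds ratio of the contact layer. For example, a contact layer with m_111 = m_221 = 2 and m_121 = m_211 = 1 gives λ^UHOM = ½ log 4 = log 2.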
ERGM
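Exponential Random Graph Models
Uniform homophily:
$$\log\left(\frac{m_{gh1}}{m_{gh0}}\right) = \alpha + \beta^A_g + \beta^B_h + \beta^{UHOM}_{gh}, \qquad \beta^{UHOM}_{gh} = \begin{cases} \beta^{UHOM} & g = h \\ 0 & g \neq h \end{cases}$$
Differential homophily:
$$\log\left(\frac{m_{gh1}}{m_{gh0}}\right) = \alpha + \beta^A_g + \beta^B_h + \beta^{DHOM}_{gh}, \qquad \beta^{DHOM}_{gh} = \begin{cases} \beta^{DHOM}_g & g = h \\ 0 & g \neq h \end{cases}$$
Parameters $\beta^{UHOM}$ and $\beta^{DHOM}_g$ serve as measures of homophily/segregation.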
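A worked special case, again for K = 2 and not taken from the slides, shows how this parameter differs from its log-linear counterpart. Writing the log odds of a tie in cell (g, h) as ℓ_gh, the same cancellation of intercept, row and column effects gives:

```latex
\begin{aligned}
\ell_{gh} &= \log\left(\frac{m_{gh1}}{m_{gh0}}\right), \\
\ell_{11} + \ell_{22} - \ell_{12} - \ell_{21}
  &= 2\,\beta^{UHOM},
\qquad\text{so}\qquad
\beta^{UHOM} = \tfrac{1}{2}\left(\ell_{11} + \ell_{22} - \ell_{12} - \ell_{21}\right).
\end{aligned}
```

Under the assumption that all eight cells of the mixing matrix are strictly positive, the parameter compares tie odds rather than tie counts, so the non-tie layer m_gh0 enters the measure. This is one way to see why adding isolates, which changes only m_gh0, can move this parameter while leaving contact-layer measures unchanged.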
Spectral Segregation Index, Echenique & Fryer (2007)
Segregation level of individual $i$ in group $g$ in component $B$:
$$s^g_i(B) = \frac{1}{S^g_{C_i}} \sum_j r_{ij}\, s^g_j(B) \qquad (4)$$
where $r_{ij}$ are entries in a row-normalized adjacency matrix.
Segregation of individual $i$:
$$S^i_{SSI} = \frac{l_i}{\bar{l}}\, \lambda \qquad (5)$$
where $\lambda$ is the largest eigenvalue of $B$, $l$ is the corresponding eigenvector, and $\bar{l}$ is the mean of its entries.
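Equation (5) can be sketched with a plain eigendecomposition. The input is assumed to be the block of the row-normalized adjacency matrix restricted to the same-group members of one connected component, with rows normalized by each node's total degree in the whole network; the function name and the two small examples are hypothetical.

```python
import numpy as np

def node_ssi(b):
    """Node-level segregation scores and the component-level index for one component.

    Assumption: b holds r_ij for same-group nodes i, j of a single connected
    component, where rows were normalized by total degree in the full network.
    """
    b = np.asarray(b, dtype=float)
    values, vectors = np.linalg.eig(b)
    top = np.argmax(values.real)          # largest eigenvalue
    lam = values[top].real
    vec = vectors[:, top].real            # corresponding eigenvector
    # The ratio vec / mean(vec) does not depend on the eigenvector's scale or sign
    return lam * vec / vec.mean(), lam

# Three same-group nodes tied only to each other: every row sums to 1
triangle = np.array([[0.0, 0.5, 0.5],
                     [0.5, 0.0, 0.5],
                     [0.5, 0.5, 0.0]])
s_triangle, lam_triangle = node_ssi(triangle)

# Two same-group nodes; node 0 also has one tie to another group (row sums to 0.5)
pair = np.array([[0.0, 0.5],
                 [1.0, 0.0]])
s_pair, lam_pair = node_ssi(pair)
```

In the second example the node with an outside tie receives the lower score, and the node scores average to the largest eigenvalue.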
SSI (2)
[Figure: example network with nodes numbered 1 to 29]
[Figure: Node segregation in White's kinship data. Node labels: Mother, Sister, Brother's Wife, Sister's Daughter, Brother's Daughter, Father, Brother, Sister's Husband, Brother's Son, Sister's Son. Legend: Men, Women.]
Summary
Measure | MBG (↘) | MWG (↗) | MR (↗) | ISO | S (→)
Freeman | ↘ | ↗ |  | depends | ↘
SSI | ↘ | ↗ | ↗ | ↘ | →
Assortativity | ↘ | ↗ | ↗ | → | →
Gupta-Anderson-May | ↘ | ↗ | ↗ | → | →
Coleman | ↘ | ↗ | ↗ | depends | ↘
Segregation Matrix Index | ↘ | ↗ | ↗ | depends | →
Uniform homophily (CLL) | ↘ | ↗ | ↗ | → | →
Differential homophily (CLL) | ↘ | ↗ | ↗ | → | →
Uniform homophily (ERGM) | ↘ | ↗ | ↗ | depends | →
Differential homophily (ERGM) | ↘ | ↗ | ↗ | depends | →
Summary
- Measures operate on different levels: individuals, groups, global network.
- Different zero points: random graph, proportionate mixing, full integration.
- MBG and MWG are not very informative: all measures satisfy them.
- Symmetry: all but two measures satisfy it; Coleman and Freeman decrease.
Summary: adding isolates
- Measures based on the contact layer of the mixing matrix are insensitive to isolates.
- SSI is the only one that always decreases.
- The effect on the others depends on relative group sizes.
Summary
- Measures based on the contact layer of the mixing matrix summarize the probability of a node attribute combination given that the tie exists (CLL, assortativity, GAM): explaining attributes given the network.
- Measures that also take disconnected dyads into account (ERGM, Freeman, SSI): explaining tie formation given the attributes.
Further questions
- Stricter formal analysis (axiomatizations). SSI is the only measure derived axiomatically.
- Link to behavioral models: how segregation comes about. For example:
  - Network formation game further justifying Bonacich centrality (Ballester et al., 2006)
  - Coleman's index in Currarini et al. (2010)