SLIDE 1

Homophily Rearrangement Algorithms and Similarity Based Diffusion on Networks Thesis defense

Benedek András Rózemberczki

Central European University

Supervisor: Professor Rosario Nunzio Mantegna 2016.06.13.

SLIDE 2

Introduction Informal model descriptions Simulations Summary References

Overview

◮ Introduction
◮ Informal model descriptions
    ◮ Homophily rearrangement algorithms
    ◮ Similarity based diffusion
◮ Simulations
    ◮ Homophily rearrangement simulations
    ◮ Similarity based diffusion simulations
◮ Summary

SLIDE 3

Introduction Research questions

  • 1. Univariate homophily rearrangement algorithms.
  • 2. Multivariate homophily rearrangement algorithms.
  • 3. Similarity based diffusion model.

Figure 1: The schematics of the modeling framework used in my thesis. Labels in the diagram: Initial state; Pseudo-ordered state; Diffusion process started; Diffusion process ended; Ordered state; Homophily rearrangement; Diffusion initialized; Diffusion; Randomization.

SLIDE 4

The context of homophily

◮ Birds of a feather (McPherson et al., 2001).
◮ Not just a social phenomenon.
◮ Micro-level similarity results in a macro-level outcome (Jackson et al., 2016).

It is present in numerous socio-economic and non socio-economic networks, such as:

◮ Corporate governance networks (Kogut et al., 2012).
◮ Friendships (Epstein, 1986).
◮ Labor market referrals (Fernandez & Fernandez-Mateo, 2006).
◮ Blogs and webpages (Bisgin et al., 2010).
◮ Interactomes (Navlakha & Kingsford, 2010).

SLIDE 5

Homophily rearrangements and diffusion

◮ Homophilous network generation (van Eck & Jager, 2010; Quayle et al., 2006).
◮ Homophily rearrangement is used for randomized experiments (Centola, 2011).
◮ Peer effects are measurable – it would be nice to have large-scale homophily rearrangement algorithms.
◮ The Schelling (1969) and Fagiolo et al. (2007) models are actually homophily rearrangement algorithms.
◮ Later, diffusion can be initiated on the network (Yavas & Yusel, 2014).
◮ The diffusion in my model is probabilistic, not relative-threshold based (Yavas & Yusel, 2014; Halberstam & Knight, 2014).
◮ The seeders have multiple infection trials – unlike in Kempe et al. (2003).

SLIDE 6

Informal model descriptions Homophily rearrangement algorithms I.

Figure 2: Different levels of universal homophily on a 4 × 4 square lattice without periodic boundary conditions. Panels: (a) Perfect heterophily; (b) Homophily; (c) Strong homophily.

The network is defined by the adjacency matrix (W) and the generic feature vector or matrix (x and X respectively).
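As a minimal sketch of this representation, the Python snippet below builds the adjacency matrix W of a 4 × 4 square lattice without periodic boundary conditions and evaluates a simple homophily score on a binary feature vector x. The function names and the score itself (the share of edges joining equal features, rescaled to [−1, 1]) are assumptions made for illustration; the slides do not state which H(x, W) the thesis uses.

```python
def lattice_adjacency(side):
    """Adjacency matrix of a side x side square lattice, no periodic boundaries."""
    n = side * side
    W = [[0] * n for _ in range(n)]
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if c + 1 < side:  # right neighbour
                W[i][i + 1] = W[i + 1][i] = 1
            if r + 1 < side:  # lower neighbour
                W[i][i + side] = W[i + side][i] = 1
    return W


def edge_homophily(x, W):
    """Illustrative score (an assumption, not the thesis measure):
    +1 if every edge joins equal features, -1 if no edge does."""
    same = total = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if W[i][j]:
                total += 1
                same += x[i] == x[j]
    return 2 * same / total - 1


W = lattice_adjacency(4)
# Checkerboard layout: every neighbour differs (perfect heterophily).
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
# Two horizontal blocks: only the edges crossing the middle join different features.
two_halves = [0 if r < 2 else 1 for r in range(4) for c in range(4)]
```

Under this score the checkerboard sits at the heterophilous extreme, while the two-block layout is strongly but not perfectly homophilous because four edges still cross the block boundary.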

SLIDE 7

Homophily rearrangement algorithms II.

The algorithm design has to include:

◮ Homophily measurement function – H(x, W) or H(X, W).
◮ The type of the generic vertex feature matters – for example continuous or categorical.
◮ The target homophily level(s) – φ or Φ.
◮ If there are multiple homophily targets they must have the same sign.

The following homophily rearrangement algorithms were implemented for univariate and multivariate systems:

◮ Heuristic
◮ Heuristic with bag of indices
◮ Greedy
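The slides do not spell out the steps of these algorithms, so the following Python sketch is only one plausible reading of a swap-based heuristic, under stated assumptions: the feature is binary, H is the rescaled share of edges joining equal features, and two randomly chosen vertices exchange features whenever the swap does not move H further from the target φ. All names (`torus_edges`, `homophily`, `rearrange`, `tol`, `max_steps`) are hypothetical.

```python
import random


def torus_edges(side):
    """Edge list of a side x side square lattice with periodic boundaries (side >= 3)."""
    edges = []
    for r in range(side):
        for c in range(side):
            i = r * side + c
            edges.append((i, r * side + (c + 1) % side))    # right neighbour
            edges.append((i, ((r + 1) % side) * side + c))  # lower neighbour
    return edges


def homophily(x, edges):
    """Assumed measure: share of same-feature edges, rescaled to [-1, 1]."""
    same = sum(x[i] == x[j] for i, j in edges)
    return 2 * same / len(edges) - 1


def rearrange(x, edges, phi, tol=0.05, max_steps=20000, seed=0):
    """Swap features of random vertex pairs until |H - phi| <= tol.

    Returns the rearranged feature list and the number of steps used.
    """
    rng = random.Random(seed)
    x = list(x)
    gap = abs(homophily(x, edges) - phi)
    for step in range(max_steps):
        if gap <= tol:
            return x, step
        i, j = rng.sample(range(len(x)), 2)
        if x[i] == x[j]:
            continue  # swapping equal features changes nothing
        x[i], x[j] = x[j], x[i]
        new_gap = abs(homophily(x, edges) - phi)
        if new_gap <= gap:
            gap = new_gap  # keep the swap
        else:
            x[i], x[j] = x[j], x[i]  # undo the swap
    return x, max_steps


edges = torus_edges(6)
x0 = [0] * 18 + [1] * 18
random.Random(1).shuffle(x0)  # randomized starting assignment
x1, steps = rearrange(x0, edges, phi=0.3)
```

Because the procedure only permutes the existing feature values, the feature distribution is preserved and only its placement on the network changes. Recomputing H over all edges after every swap is the simplest option, not the cheapest.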

SLIDE 8

Similarity based diffusion I.

The model setup consists of:

  • 1. Agents (vertices) who are connected by a network (edges).
  • 2. A binary piece of information that agents transmit to each other.
  • 3. Agents have generic vertex features denoted by X.
  • 4. The transmission of the information is probabilistic.

The probability that agent i transmits the information to agent j is expressed by the pairwise transmission probability equation, Equation (1):

$$P_{i,j} = P_0 \cdot \Psi\left(-\gamma \cdot d(X_i, X_j)\right) \tag{1}$$

where Ψ is the base function.

Specifically, Equation (2) describes the pairwise transmission probability that I use later in the simulations:

$$P_{i,j} = P_0 \cdot \exp\left(-\gamma \cdot |x_i - x_j|\right) \tag{2}$$
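Equation (2) translates directly into code. The Python sketch below evaluates it for a scalar feature; the function name and the parameter values in the example are illustrative assumptions, not values taken from the thesis.

```python
import math


def transmission_probability(x_i, x_j, p0, gamma):
    """Equation (2): P_ij = P0 * exp(-gamma * |x_i - x_j|)."""
    return p0 * math.exp(-gamma * abs(x_i - x_j))


# Illustrative values only (p0 and gamma are assumptions).
same = transmission_probability(1.0, 1.0, p0=0.5, gamma=0.4)   # identical features
apart = transmission_probability(0.0, 1.0, p0=0.5, gamma=0.4)  # features one unit apart
```

Since exp(0) = 1, two agents with identical features transmit with probability P0, and the probability decays exponentially as the features move apart.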

SLIDE 9

Similarity based diffusion II.

  • 1. Initially only a single agent has the information.
  • 2. Time is discrete.
  • 3. This is a modification of the susceptible-infected model.
  • 4. Convergence to a fully infected state only happens when the network has one single component.
  • 5. There is no recovery.
  • 6. The γ value can be breed specific – discrimination is built in.
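The assumptions above can be sketched as a discrete-time simulation. The Python code below is a minimal illustration, not the thesis implementation: it assumes a scalar feature, the exponential form of Equation (2), one transmission trial per informed agent, susceptible neighbour and time step, and a hypothetical `gamma_of` callback that lets the sensitivity depend on the transmitting agent.

```python
import math
import random


def simulate_si(neighbors, x, seed_node, p0, gamma_of, max_t=1000, rng=None):
    """Return the number of informed agents after each time step.

    neighbors: dict mapping every vertex to a list of adjacent vertices
    x:         dict mapping every vertex to its scalar feature
    gamma_of:  function giving the sensitivity of a transmitting vertex
    """
    rng = rng or random.Random(0)
    informed = {seed_node}
    history = [1]
    for _ in range(max_t):
        if len(informed) == len(neighbors):
            break  # fully informed state reached
        newly = set()
        for i in informed:
            for j in neighbors[i]:
                if j in informed or j in newly:
                    continue
                p = p0 * math.exp(-gamma_of(i) * abs(x[i] - x[j]))
                if rng.random() < p:
                    newly.add(j)
        informed |= newly  # no recovery: informed agents stay informed
        history.append(len(informed))
    return history


# Path graph 0-1-2-3-4 with alternating features (a heterophilous layout).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = {v: v % 2 for v in neighbors}
history = simulate_si(neighbors, x, seed_node=0, p0=0.8, gamma_of=lambda v: 0.5)
```

Passing a `gamma_of` that returns a larger value for one group of agents is one way to represent a discriminating seeder in this sketch. On a disconnected network the loop runs to `max_t` without reaching the fully informed state.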

SLIDE 10

Homophily rearrangement simulations The notion of unstable results

Figure 3: The convergence of gender based homophily to a target in separate simulation runs – friendship network from Harris (2009). Panels: (a) Simulation run 1; (b) Simulation run 2. Axes: iterative steps (horizontal), inbreeding homophily (vertical). Series: F, M.

SLIDE 11

Relaxation of switching conditions

Figure 4: The convergence of grade based inbreeding homophily to a target vector in two separate simulation runs – based on the school friendship network. Panels: (a) Simulation run 1; (b) Simulation run 2. Axes: iterative steps (horizontal), inbreeding homophily (vertical). Series: 9th, 10th, 11th, 12th.

SLIDE 12

Increased target homophily I.

Figure 5: Expected average and median convergence times of the heuristic algorithm on a square lattice with periodic boundary conditions as a function of target homophily. Panels: (a) Mean solution time; (b) Median solution time. Axes: φ (horizontal), E(t) in panel (a) and Me(t) in panel (b) (vertical).

SLIDE 13

Increased target homophily II.

Figure 6: Expected average and median convergence times of the greedy algorithm on a square lattice with periodic boundary conditions as a function of target homophily. Panels: (a) Mean solution time; (b) Median solution time. Axes: φ (horizontal), E(t) in panel (a) and Me(t) in panel (b) (vertical).

SLIDE 14

System size and feature distribution

Figure 7: Expected convergence time of the heuristic algorithm as a function of system size and balancedness of the feature distribution. Axes: P(X = 1) (horizontal), E(t) (vertical). Series: N = 121, N = 196, N = 256.

SLIDE 15

Multiple features

Figure 8: Expected solution time of the multivariate heuristic homophily rearrangement algorithms as a function of feature correlation and lattice size. Axes: ρ (horizontal), E(t) (vertical). Series: N = 100, N = 144, N = 196, N = 256.

SLIDE 16

Simulation of diffusion Pairwise transmission probability distributions

Figure 9: The distribution of pairwise transmission probabilities. Axes: transmission probability (horizontal), density (vertical). Series: φ = −0.5, φ = 0, φ = 0.5.

SLIDE 17

Ratio of infected nodes

Figure 10: The ratio of infected nodes as a function of time. Axes: t (horizontal), E(Yt)/N (vertical). Series: φ = −0.8, φ = 0, φ = 0.8.

SLIDE 18

Sensitivity to dissimilarity increase

Figure 11: Expected solution time as a function of sensitivity to dissimilarity. Axes: γ (horizontal), E(t) (vertical). Series: φ = −0.8, φ = 0, φ = 0.8.

SLIDE 19

Baseline transmission probability increase

Figure 12: Expected solution time as a function of baseline transmission probability. Axes: P0 (horizontal), E(t) (vertical). Series: φ = −0.8, φ = 0, φ = 0.8.

SLIDE 20

Discriminating seeders I.

Figure 13: Non-homophilous state with discrimination – expected ratio of infected nodes. Axes: t (horizontal), E(Yt)/N (vertical). Series: Discriminator seeder, Non-discriminator seeder.

SLIDE 21

Discriminating seeders II.

Figure 14: Homophilous state with discrimination – expected ratio of infected nodes. Axes: t (horizontal), E(Yt)/N (vertical). Series: Discriminator seeder, Non-discriminator seeder.

SLIDE 22

Summary Summary of results

◮ Models

  • 1. Univariate homophily rearrangement algorithms.
  • 2. Multivariate homophily rearrangement algorithms.
  • 3. Similarity based diffusion model.

◮ Homophily rearrangement simulation results

  • 1. Instability of resulting assignments.
  • 2. Target homophily – convergence time relationship.
  • 3. Network size – convergence time relationship.
  • 4. Feature distribution – convergence time relationship.
  • 5. In multivariate systems the correlation of features affects the convergence.

◮ Similarity based diffusion simulation results

  • 1. Homophily helps the propagation of information.
  • 2. Heterophily slows down the propagation of information.
  • 3. Discrimination has adverse effects.

SLIDE 23

Policy relevance

Homophily rearrangement algorithms

  • 1. Benchmarking the factual level of homophily.
  • 2. Investigation of generic and topological feature relationships.
  • 3. Randomized experiments.
  • 4. Multivariate randomized experiments.

Similarity based diffusion

  • 1. Multiple seeding of innovations – labor market information.
  • 2. Targeted seeding in clusters.
  • 3. Changing talent reference systems (Petersen et al., 2000).
  • 4. Reforming corporate governance boards (Edling et al., 2012).

SLIDE 24

Limitations and possible extensions

Limitations:

◮ Theoretical:

  • 1. Computation of homophily levels is inefficient.
  • 2. Same sign of target homophily levels.

◮ Simulations:

  • 1. Simple topology.
  • 2. Simplistic assumptions about the distribution of generic vertex features.

  • 3. Testing with a few homophily measurement functions.
  • 4. Diffusion only analyzed for univariate systems.

Further research ideas:

  • 1. Scaling up the homophily rearrangement algorithms.
  • 2. Identification of optimal seeding strategies.
  • 3. Sensitivity analysis on topologies other than a lattice.
  • 4. Ensembles of different homophily rearrangement algorithms.

SLIDE 25

Thank you for your kind attention!

SLIDE 26

References I

  • H. Bisgin, et al. (2010). ‘Investigating Homophily in Online Social Networks’. In IEEE (ed.), Web Intelligence and Intelligent Agent Technology (WI-IAT), IEEE/WIC/ACM International Conference, vol. 1.
  • D. Centola (2011). ‘An Experimental Study of Homophily in the Adoption of Health Behavior’. Science 334:1269–1273.
  • C. Edling, et al. (2012). The Small Worlds of Corporate Governance, chap. Testing the Old Boys Network: Diversity and Board Interlocks in Scandinavia, pp. 183–202. MIT Press.
  • J. Epstein (1986). Process and Outcome in Peer Relationships, chap. Friendship Selection: Developmental and Environmental Influences, pp. 129–160. New York: Academic Press.
  • G. Fagiolo, et al. (2007). ‘Segregation in Networks’. Journal of Economic Behavior and Organization 64:316–336.
  • R. M. Fernandez & I. Fernandez-Mateo (2006). ‘Networks, Race, and Hiring’. American Sociological Review 71(1):42–71.

SLIDE 27

References II

  • Y. Halberstam & B. Knight (2014). ‘Homophily, Group Size, and the Diffusion of Political Information in Social Networks’. National Bureau of Economic Research.
  • K. M. Harris (2009). ‘The National Longitudinal Study of Adolescent to Adult Health (Add Health), Waves I and II, 1994–1996; Wave III, 2001–2002; Wave IV, 2007–2009’. Chapel Hill, NC: Carolina Population Center, University of North Carolina at Chapel Hill. DOI: 10.3886/ICPSR27021.v9.
  • M. O. Jackson, et al. (2016). ‘The Economic Consequences of Social Network Structure’. Available at SSRN: http://ssrn.com/abstract=2467812.
  • D. Kempe, et al. (2003). ‘Maximizing the Spread of Influence Through a Social Network’. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 137–146. ACM.

SLIDE 28

References III

  • B. Kogut, et al. (2012). The Small Worlds of Corporate Governance, chap. Generating Rules and the Social Science of Governance, pp. 259–299. MIT Press.
  • M. McPherson, et al. (2001). ‘Birds of a Feather: Homophily in Social Networks’. Annual Review of Sociology 27:415–444.
  • S. Navlakha & C. Kingsford (2010). ‘The Power of Protein Interaction Networks for Associating Genes with Diseases’. Bioinformatics 26(8):1057–1063.
  • T. Petersen, et al. (2000). ‘Offering a Job: Meritocracy and Social Networks’. American Journal of Sociology 106(3):763–816.
  • A. P. Quayle, et al. (2006). ‘Modeling Network Growth with Assortative Mixing’. The European Physical Journal B - Condensed Matter and Complex Systems 50(4):617–630.
  • T. Schelling (1969). ‘Models of Segregation’. American Economic Review 59:488–493.

SLIDE 29

References IV

  • P. van Eck & W. Jager (2010). ‘Social Network Structures in Agent Based Modelling: Finding an Optimal Structure Based on Survey Data (or Finding the Network That Does Not Exist)’. In Proceedings of the 3rd World Congress on social simulation.
  • M. Yavas & G. Yusel (2014). ‘Impact of Homophily on Diffusion Dynamics Over Social Networks’. Social Science Computer Review 33(3):354–372.
