A Tale of Two Communities: Assessing Homophily in Node-Link Diagrams

A Tale of Two Communities: Assessing Homophily in Node-Link Diagrams. 23rd International Symposium on Graph Drawing and Network Visualization, Los Angeles, September 26, 2015. Wouter Meulemans (City University London) and André Schulz (FernUniversität in Hagen).


SLIDE 1

A Tale of Two Communities
Assessing Homophily in Node-Link Diagrams

Wouter Meulemans, City University London
André Schulz, FernUniversität in Hagen

23rd International Symposium on Graph Drawing and Network Visualization
Los Angeles, September 26, 2015

SLIDE 9

A Tale of two Communities Meulemans and Schulz, GD15

Homophily

  • Homophily is a concept from social network analysis: if two individuals with a common characteristic are more likely to form a link, the network exhibits homophily.
    (Example: same-gender links are more likely in a friendship network.)
  • Reason 1 for homophily: “Birds of a feather flock together” (social selection).
  • Reason 2 for homophily: we form characteristics similar to our friends (social influence).
  • Effects opposite to homophily can also occur (heterophily).
  • Homophily is not restricted to social networks.
    (Question: groups = clusters?)

SLIDE 13

Formalizing Homophily

Group A: fraction p of the individuals
Group B: fraction q = 1 − p of the individuals

A random link is

  • with probability p²: A ↔ A
  • with probability q²: B ↔ B
  • with probability 2pq: A ↔ B

Homophily Test

If the fraction of between-group links is significantly smaller than 2pq, we have homophily.
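
The homophily test can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis: the function names and the toy network are made up, and the one-sided exact binomial test (which treats links as independent draws) is an assumption, since the slides do not say which significance test is meant.

```python
from math import comb

def expected_cross_fraction(group):
    """Return 2pq, the cross-group link fraction expected for random links."""
    p = sum(1 for g in group.values() if g == "A") / len(group)
    q = 1.0 - p
    return 2.0 * p * q

def cross_fraction(edges, group):
    """Observed fraction of links whose endpoints lie in different groups."""
    cross = sum(1 for u, v in edges if group[u] != group[v])
    return cross / len(edges)

def binomial_lower_tail(k, m, prob):
    """P(X <= k) for X ~ Binomial(m, prob); assumed choice of test."""
    return sum(comb(m, i) * prob**i * (1.0 - prob) ** (m - i) for i in range(k + 1))

# Hypothetical toy network: two triangles joined by a single cross-group link.
group = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

expected = expected_cross_fraction(group)   # 2pq = 0.5
observed = cross_fraction(edges, group)     # 1 of 7 links crosses the groups
n_cross = sum(1 for u, v in edges if group[u] != group[v])
p_value = binomial_lower_tail(n_cross, len(edges), expected)
print(expected, observed, p_value)
```

A small p-value together with an observed fraction below 2pq would point towards homophily; what counts as "significantly smaller" is left to the chosen significance level.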

SLIDE 17

Degree of Homophily

We want to measure the degree of homophily in a network.

Important Cases

  1. only cross-group links (heterophily): degree of homophily 0
  2. a fraction 2pq of cross-group links (balanced): degree of homophily 1/2
  3. no cross-group links (homophily): degree of homophily 1

All other values are interpolated linearly.

[Figure: degree of homophily (axis ticks 1/2, 1) plotted against the fraction of cross-group links (axis ticks 2pq, 1)]
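
One way to read the linear interpolation is as a piecewise-linear function through the three important cases. The sketch below assumes the anchor points (0 cross-group links → 1, 2pq → 1/2, only cross-group links → 0); the function name and signature are hypothetical.

```python
def degree_of_homophily(cross_fraction, p):
    """Piecewise-linear degree of homophily for a two-group network.

    cross_fraction: observed fraction of cross-group links, in [0, 1].
    p: fraction of individuals in group A (group B has q = 1 - p).
    Assumed anchors: 0 -> 1, 2pq -> 1/2, 1 -> 0.
    """
    if not 0.0 <= cross_fraction <= 1.0:
        raise ValueError("cross_fraction must lie in [0, 1]")
    balanced = 2.0 * p * (1.0 - p)          # the 2pq reference point
    if balanced == 0.0:
        raise ValueError("both groups must be non-empty")
    if cross_fraction <= balanced:
        # From 1 (no cross-group links) down to 1/2 (balanced).
        return 1.0 - 0.5 * cross_fraction / balanced
    # From 1/2 (balanced) down to 0 (only cross-group links).
    return 0.5 * (1.0 - cross_fraction) / (1.0 - balanced)

# Balanced groups (p = 0.5, so 2pq = 0.5).
print(degree_of_homophily(0.0, 0.5))   # 1.0
print(degree_of_homophily(0.5, 0.5))   # 0.5
print(degree_of_homophily(1.0, 0.5))   # 0.0
```

Because 2pq is at most 1/2, the second branch never divides by zero.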

SLIDE 23

Research Questions

Can an observer assess homophily in a node-link diagram?

Subquestions:

  • Which node-link diagram layout is best suited for detecting homophily?
  • Is there a tendency for overestimation or underestimation?
  • Are there general design principles to improve homophily detection?

! We only consider node-link diagrams and the “two-groups scenario”.

SLIDE 25

Layouts (force-directed, polarized, bipartite)

Force-directed

  • layout based on the Fruchterman–Reingold algorithm
  • implementation taken from the d3.js library

SLIDE 26

Layouts (force-directed, polarized, bipartite)

Polarized

  • modification of the force-directed layout
  • additional forces pull blue vertices to the left and red vertices to the right
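
The idea of the polarized layout can be illustrated with a small spring embedder written from scratch. This is a sketch under assumptions, not the d3.js code used in the study: the force formulas follow the usual Fruchterman–Reingold form, and the target positions (x = -1 for blue, x = +1 for red), the pull strength, and the cooling schedule are all made-up parameters.

```python
import math
import random

# Assumed targets: blue is pulled towards x = -1, red towards x = +1.
TARGET_X = {"blue": -1.0, "red": 1.0}

def horizontal_pull(x, colour, strength):
    """Extra force along x that drags a vertex towards its group's side."""
    return strength * (TARGET_X[colour] - x)

def polarized_layout(nodes, edges, colour, strength=0.3, iterations=200, seed=0):
    """Force-directed layout plus a per-group horizontal pull (sketch)."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    k = 1.0 / math.sqrt(len(nodes))   # ideal edge length (assumed)
    temperature = 0.2                 # maximum step length per iteration
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of vertices.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
                force[v][0] -= f * dx / d
                force[v][1] -= f * dy / d
        # Attraction along links.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            force[u][0] -= f * dx / d
            force[u][1] -= f * dy / d
            force[v][0] += f * dx / d
            force[v][1] += f * dy / d
        # Polarizing pull, then a step capped by the current temperature.
        for v in nodes:
            force[v][0] += horizontal_pull(pos[v][0], colour[v], strength)
            length = math.hypot(force[v][0], force[v][1]) or 1e-9
            step = min(length, temperature)
            pos[v][0] += force[v][0] / length * step
            pos[v][1] += force[v][1] / length * step
        temperature *= 0.98
    return pos

# Hypothetical example: two triangles joined by one cross-group link.
nodes = ["a", "b", "c", "d", "e", "f"]
colour = {"a": "blue", "b": "blue", "c": "blue", "d": "red", "e": "red", "f": "red"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
pos = polarized_layout(nodes, edges, colour)
```

Setting strength to 0 gives back a plain force-directed layout, which makes the two conditions easy to compare on the same graph.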

SLIDE 27

Layouts (force-directed, polarized, bipartite)

Bipartite

  • groups are placed on opposing vertical lines
  • barycentric layout + sifting to remove crossings
  • different shapes for cross-group and within-group edges
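
The barycentric step of the bipartite layout can be sketched as follows. This shows only one pass of the standard barycenter heuristic for two-layer crossing reduction, applied to the cross-group links. The sifting step and the within-group edges are left out, and all names are illustrative rather than taken from the authors' implementation.

```python
def count_crossings(left_order, right_order, cross_edges):
    """Count pairwise crossings of straight cross-group links (u left, v right)."""
    left_pos = {v: i for i, v in enumerate(left_order)}
    right_pos = {v: i for i, v in enumerate(right_order)}
    ends = [(left_pos[u], right_pos[v]) for u, v in cross_edges]
    return sum(
        1
        for i in range(len(ends))
        for j in range(i + 1, len(ends))
        if (ends[i][0] - ends[j][0]) * (ends[i][1] - ends[j][1]) < 0
    )

def barycenter_pass(left_order, right_order, cross_edges):
    """Reorder the right line by the mean position of each vertex's left neighbours."""
    left_pos = {v: i for i, v in enumerate(left_order)}
    neighbours = {v: [] for v in right_order}
    for u, v in cross_edges:
        neighbours[v].append(left_pos[u])

    def barycenter(item):
        index, v = item
        ns = neighbours[v]
        # Vertices without cross-group links keep their current position.
        return sum(ns) / len(ns) if ns else float(index)

    return [v for _, v in sorted(enumerate(right_order), key=barycenter)]

# Hypothetical example: three links that all cross each other.
left = ["a", "b", "c"]
right = ["x", "y", "z"]
links = [("a", "z"), ("b", "y"), ("c", "x")]

before = count_crossings(left, right, links)
new_right = barycenter_pass(left, right, links)
after = count_crossings(left, new_right, links)
print(before, new_right, after)
```

In practice the pass is alternated between the two lines until the number of crossings stops improving.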

SLIDE 29

Layouts (force-directed, polarized, bipartite)

Group separation:

  • homophily detection easier?
  • other tasks more difficult?

SLIDE 33

Hypotheses

(x < y means y is better than x)

  • H1: For homophily assessment we have force-directed < polarized < bipartite.
  • H2: For homophily assessment we have unbalanced < balanced.
  • H3: For shortest path queries we have force-directed > polarized > bipartite.

SLIDE 38

User Study Design

Mixed design (too many trials otherwise)

Within subjects:

  • balanced (50:50) and unbalanced (25:75)
  • 3 layouts
  • 5 degree-of-homophily levels (only 3 for unbalanced)
  • 2 tasks (homophily / length of shortest path)

Between subjects:

  • 3 graph sizes (20-28 nodes, 20-40 edges)

Demo of the user study: http://tutte.fernuni-hagen.de/~schulza

SLIDE 41

Evaluating Results

Users have an internal “scale” for the degree of homophily.

[Figure: example curves of estimation against true degree of homophily, labelled “good estimation”, “bad estimation”, “okay estimation but overestimated”, and “good estimation (different personal scale)”]

SLIDE 46

Homophily Results

[Figure: results for all sizes, one panel per layout (Force-Directed, Polarized, Bipartite); axis ticks 0%, 50%, 100%]

  • polarized < bipartite, force-directed (statistical evidence)
  • no difference between force-directed and bipartite

[Figure: deviation (−15% to 15%) and response time (0s to 20s) for All, Size 1, Size 2, Size 3; legend: B, P, FD]

SLIDE 51

Homophily Results - Internal Consistency

[Figure: individual results for Size 3 and All Sizes, one panel per layout (Force-Directed, Polarized, Bipartite); axis ticks 0%, 50%, 100%; figure annotations: 90, 110, 60]

  • individual results; decreasing parts = defects (red)
  • many inconsistencies (not clear from the aggregated data)
  • evidence that bipartite > force-directed
  • tendency to overestimate in the polarized layout
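
The notion of a defect can be made concrete with a short sketch: order one participant's estimates by the true degree of homophily and count every step where the estimate goes down. The data layout and function name are assumptions, and the slides do not state exactly how defects were counted.

```python
def count_defects(responses):
    """Count decreasing steps in one participant's estimates.

    responses: list of (true_degree, estimated_degree) pairs, with distinct
    true degrees assumed. A defect is a step where the estimate decreases
    although the true degree of homophily increases.
    """
    ordered = [estimate for _, estimate in sorted(responses)]
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b < a)

# Hypothetical participant with two decreasing steps.
inconsistent = [(0.0, 0.1), (0.25, 0.4), (0.5, 0.3), (0.75, 0.8), (1.0, 0.7)]
# Hypothetical participant with a shifted personal scale but no defects.
shifted = [(0.0, 0.3), (0.5, 0.6), (1.0, 0.9)]

print(count_defects(inconsistent))  # 2
print(count_defects(shifted))       # 0
```

Counting defects this way rewards a consistent personal scale even when the absolute estimates are off, which matches the distinction drawn on the "Evaluating Results" slide.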

SLIDE 55

Shortest Path Results

[Figure: error rate (0% to 100%) and response time (0s to 20s) for All, Size 1, Size 2, Size 3; legend: B, P, FD]

  • force-directed better than polarized better than bipartite (again supported by statistical evidence)
  • there was one problematic instance in size group 3 for the bipartite layout, caused by collinearities in the layout
  • size was not a big influence
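
For reference, the ground truth for the shortest-path task (the length of a shortest path between two vertices) can be computed with a breadth-first search. This is a generic sketch with a hypothetical function name, not the code used in the study.

```python
from collections import deque

def shortest_path_length(edges, source, target):
    """Number of links on a shortest path, or None if target is unreachable."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return distance[u]
        for v in adjacency.get(u, ()):
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return None

# Hypothetical example: two routes from 0 to 3, plus a separate component.
example_edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 3), (5, 6)]
print(shortest_path_length(example_edges, 0, 3))  # 2
print(shortest_path_length(example_edges, 0, 6))  # None
```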

SLIDE 59

Summary

H1: For homophily assessment we have force-directed < polarized < bipartite.

  • we can only partially accept H1: polarized < bipartite and polarized < force-directed
  • internal consistency data supports force-directed < bipartite

H2: For homophily assessment we have unbalanced < balanced.
H3: For shortest path queries we have force-directed > polarized > bipartite.

  • we can accept H2 and H3 based on our statistical analysis

SLIDE 66

Final thoughts

  • homophily is difficult to assess, but when averaging over a set of individuals we get a good estimate
  • the bipartite layout helped to assess homophily at the cost of more difficult path tracing
  • node separation was not the primary reason for this, since the polarized layout was outperformed
  • the unbalanced case is harder
  • there is a tendency to overestimate in the polarized layout

Thank you for your attention!