EVOLUTION DYNAMICS IN SOCIAL NETWORKS

SLIDE 1

EVOLUTION DYNAMICS IN SOCIAL NETWORKS

Ashwin Bahulkar
Advisor & Collaborators: Boleslaw K. Szymanski, Kevin Chan (1), Omar Lizardo (2)

(1) US Army Research Laboratory
(2) University of Notre Dame, Notre Dame, IN, USA
Supported by Network Science CTA, ARL

SLIDE 2

Overview

  • Link Formation and Dissolution in attribute-rich networks
  • Can we predict the state of a network from node attributes?
  • Which node attributes can predict the formation and dissolution of edges in networks?
  • Coevolution of node-aligned multiple layers in networks
  • Multiple layers: several networks sharing the same node set, with different relations among nodes.
  • Coevolution: Do edges occur in one network before they do so in another?
  • Groups and Influence

SLIDE 3

Motivation

  • Find out which factors affect the evolution of networks
  • Sociological interests: influence policy making in organizations, based on these factors
  • Bring stability to networks in organizations through policies, if desired
  • Infer the cause of instability in networks
  • Build strong, stable teams in organizations
  • Commercial interests: influence advertisement, marketing and reach-out strategies

SLIDE 4

Part 1

Link Formation and Dissolution in Attribute-rich Networks

SLIDE 5

Introduction

  • How much does knowledge of node attributes improve link formation and dissolution prediction?
  • How should these attributes be used to make predictions?
  • Find which attributes are correlated with the formation of new links
  • We introduce the preference model
  • Find which attributes are correlated with the dissolution and persistence of existing links.
  • Track network stability with link prediction

SLIDE 6

Link Prediction

  • Link Formation Prediction:
  • Given: a social network that evolves over time, with the evolution recorded in a sequence of network snapshots.
  • From one network snapshot to another, some new edges are created, some old edges dissolve and some nodes are removed.
  • At any given snapshot, which edges will be created in a future snapshot?
  • Highly unbalanced classification: very few potential links are created
  • Link Dissolution Prediction: similar; predict which links will dissolve.

[Figure: training set and test set, each made of visible snapshots followed by hidden new links]
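The formation task above can be sketched as a labeling step over two consecutive snapshots. This is only an illustration, not the authors' pipeline: the function name `label_candidate_pairs` and the edge-list input format are assumptions, and the "future" snapshot is taken to be the immediately following one.

```python
from itertools import combinations

def label_candidate_pairs(nodes, edges_now, edges_next):
    """Label every node pair that is NOT linked in the current snapshot.

    Label 1: the pair gains an edge in the next snapshot. Label 0: it does not.
    (Hypothetical helper; edges are undirected (u, v) tuples.)
    """
    now = {frozenset(e) for e in edges_now}
    nxt = {frozenset(e) for e in edges_next}
    labels = {}
    for u, v in combinations(sorted(nodes), 2):
        pair = frozenset((u, v))
        if pair in now:
            continue  # already linked: a dissolution candidate, not a formation one
        labels[(u, v)] = 1 if pair in nxt else 0
    return labels

nodes = ["a", "b", "c", "d"]
labels = label_candidate_pairs(nodes, [("a", "b")], [("a", "b"), ("c", "b")])
positives = sum(labels.values())
```

Even in this toy case only one of the five candidate pairs is positive, which is the class imbalance the slide points out.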

SLIDE 7

Related Work

  • Existing link prediction approaches:
  • Topology-based link predictors
  • Machine learning based
  • Markov model or graphical model based
  • Little work on attribute-rich networks; attributes are used in a very simplistic manner
  • Little work on dissolution prediction
  • Attribute-rich data has recently become available to us, although the size of the networks is relatively small

SLIDE 8

Attribute-rich data: NetSense

  • Nodes: about 200 students from the University of Notre Dame, followed for around 2 years, from freshman to junior year.
  • Data collected:
  • Call and message logs between students in the study.
  • Contact data based on Bluetooth-recorded proximity.
  • Nominations of significant peers, opinions on social & political issues, student background and university activities for every student.
  • Frequency:
  • Nominations and opinions were collected in the form of surveys at the beginning of every semester.

SLIDE 9

Evolving NetSense Networks

Network snapshots are taken for every semester of the year: Fall and Spring.

  • Behavioral Networks: Based on calls and texts made in the semester. An edge exists if there is a call or text exchange between two nodes. Typical network size ranges from 150-200 nodes and 200-350 edges. We have snapshots for 4 semesters.
  • Nominative Network: Based on survey answers by students to “Who are your top contacts”.

SLIDE 10

Node Attributes

  • Student background:
  • Major in the Notre Dame programs
  • Behavioral traits
  • Family income, race and religion
  • Opinions on:
  • Politics
  • Abortion and marijuana legalization
  • Homosexuality and gay marriage
  • Habits and Lifestyle:
  • Drinking habits
  • Time spent on weekly activities: studying, partying etc.

SLIDE 11

Attributes for link prediction

  • We use machine learning for link prediction
  • The Homophily Model:
  • “Birds of a feather flock together”
  • Nodes n1, n2 with attribute values a1 = a2: feature value = 1
  • Nodes n1, n2 with attribute values a1 ≠ a2: feature value = 0
  • Does this work? Not so much.
  • Why? Nodes have different “preferences” for different attribute values
  • We introduce the “preference model”.
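The homophily feature described above is just an equality test per attribute. A minimal sketch, assuming each node's attributes are stored in a dictionary (the name `homophily_features` and the sample attribute names are illustrative, not taken from the NetSense data):

```python
def homophily_features(attrs_u, attrs_v, attribute_names):
    """Return one 0/1 feature per attribute: 1 if both nodes share the value."""
    return [1 if attrs_u[a] == attrs_v[a] else 0 for a in attribute_names]

n1 = {"politics": "liberal", "drinking": "never"}
n2 = {"politics": "liberal", "drinking": "weekly"}
features = homophily_features(n1, n2, ["politics", "drinking"])
```

Every matching pair gets the same value regardless of who the nodes are, which is the limitation the preference model is meant to address.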

SLIDE 12

A case for the preference model

Values > 1 indicate a preference for; values < 1 indicate a preference against.

  • Different groups of people have different attributes
  • Still, difficult to generalize preferences on a group-basis
  • Different nodes would have different preferences for attributes

SLIDE 13

Intuition of the Preference Model

  • Population: 60% liberals, 40% conservatives
  • Node 1: liberal; 90% of contacts are liberals, 10% conservatives
  • Strong bias towards liberals, strong bias against conservatives
  • Node 2: conservative; 50% of contacts are liberals, 50% conservatives
  • Only slight bias towards conservatives
  • We capture the bias, or “anomaly”, for every attribute value, for each node, with reference to the population.
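One simple way to put numbers on this intuition is the ratio of a node's contact share to the population share. This ratio is an illustrative assumption, not necessarily the measure used in the talk (the next slide describes a Z-score); the helper name `bias_ratios` is hypothetical.

```python
def bias_ratios(contact_shares, population_shares):
    """Ratio > 1: over-represented among contacts; ratio < 1: under-represented."""
    return {value: contact_shares[value] / population_shares[value]
            for value in population_shares}

population = {"liberal": 0.60, "conservative": 0.40}
node1 = bias_ratios({"liberal": 0.90, "conservative": 0.10}, population)
node2 = bias_ratios({"liberal": 0.50, "conservative": 0.50}, population)
```

Node 1 comes out at about 1.5 for liberals and 0.25 for conservatives, a strong bias in both directions. Node 2 comes out at about 0.83 and 1.25, only a slight tilt towards conservatives even though half of its contacts are liberal.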

SLIDE 14

Individual Preferences of Nodes

  • Features for machine learning:
  • Node Preferences -> Edge Preferences
  • Some network features: number of common neighbors
  • Node preference feature:
  • For an edge with nodes n1 and n2, for attribute a:
  • Feature-value(a) = n1->preference(n2.a) * n2->preference(n1.a)
  • Calculate the preference of node n1 for attribute-value v:
  • n1 has n contacts with attribute-value v.
  • Calculate the Z-score of having n such contacts
  • Z-score = (value - expected mean) / standard deviation
  • Obtain scores, which can vary from -3.4 to +3.4
  • Rescale to the range 0 to 1
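The steps above can be sketched as follows. The slide does not say how the expected mean and standard deviation are obtained, so this sketch assumes a binomial null model (each contact carries value v with probability equal to its population share), clips the Z-score to the quoted range and rescales it linearly. All names here are hypothetical.

```python
import math

Z_LIMIT = 3.4  # score range quoted on the slide

def preference(n_with_value, n_contacts, population_share):
    """Preference of a node for an attribute value, rescaled to [0, 1].

    Assumption: under no preference, the count of contacts with the value
    is Binomial(n_contacts, population_share).
    """
    expected = n_contacts * population_share
    std = math.sqrt(n_contacts * population_share * (1 - population_share))
    if std == 0:
        return 0.5  # no information: treat as neutral
    z = (n_with_value - expected) / std
    z = max(-Z_LIMIT, min(Z_LIMIT, z))      # clip to [-3.4, +3.4]
    return (z + Z_LIMIT) / (2 * Z_LIMIT)    # linear map to [0, 1]

def edge_feature(pref_n1_for_n2_value, pref_n2_for_n1_value):
    """Edge feature for one attribute: product of the two node preferences."""
    return pref_n1_for_n2_value * pref_n2_for_n1_value

# A node with 9 liberal contacts out of 10, in a population that is 60% liberal.
p_liberal = preference(9, 10, 0.60)
p_neutral = preference(6, 10, 0.60)
```

Under these assumptions 0.5 is the neutral point, so an edge feature well above 0.25 suggests that both endpoints lean towards each other's attribute value.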

SLIDE 15

Results with the Preference method

  • Link Prediction: We get about 90% recall with good accuracy, using SVM, linear regression and logistic regression.
  • Link Dissolution Prediction: 80-90% accuracy
  • Below are the plots of recall vs. false positives for different thresholds in linear regression.
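A recall vs. false positives plot of this kind is built by sweeping a decision threshold over the model's scores. A minimal sketch with made-up scores and labels (not the NetSense results); the function name `recall_and_fpr` is hypothetical:

```python
def recall_and_fpr(scores, labels, threshold):
    """Predict a link when score >= threshold; return (recall, false positive rate)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    recall = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return recall, fpr

# Illustrative regression outputs and ground-truth labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
curve = [(t, *recall_and_fpr(scores, labels, t)) for t in (0.25, 0.5, 0.85)]
```

Lowering the threshold raises recall at the cost of more false positives, which matters a great deal when positives are rare.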

SLIDE 16

Results and Ranking of Attributes

  • Ranking of attributes: leave-one-feature-out, and weight in linear regression

Top five attributes per setting. Labels: Behavior, Nomination (networks); Link Creation, Link Dissolution (tasks). Ranked lists in the order they appear on the slide:

  • 1. Political Views 2. Parental Income 3. Common Neighbors 4. Time Volunteering 5. Time Exercising
  • 1. Views on Homosexuality 2. Political Views 3. Time Socializing 4. Time Partying 5. Marijuana Legalization
  • 1. Time Socializing 2. Time in Clubs 3. Marijuana Legalization 4. Time Exercising 5. Time Studying
  • 1. Gay Marriage Legalization 2. Political Views 3. Parental Income 4. Views on Homosexuality 5. Time Camping

SLIDE 17

Track Network Stability by Link Prediction

  • Networks evolve over time
  • Patterns of new link formation also change over time
  • We look at the network of researchers studying Leishmaniasis, a rare disease
  • The network spreads over several countries, including Brazil, India, the US and European Union countries
  • From 1980 to 2015, the leaders of research changed over time, and the nature of link formation also changed
  • We use link prediction to track the change

SLIDE 18

Experiment

  • Perform link prediction over the period 1980 to 2015, divided into seven 5-year snapshots
  • Perform link prediction using older snapshots, and see if the models still apply
  • Perform link prediction only on newly emerging nodes, and compare with older nodes
  • Features:
  • Network topology features, common areas of research, country of origin, recency and strength of collaboration
  • Network size:
  • Ranges from 700 to 5000 nodes, and 1200 to 34,000 edges.

SLIDE 19

Results

  • Using the most recent snapshot, we get recall values between 60-80%
  • Using older snapshots, recall and accuracy both drop by about 8-10%
  • Edges between old nodes vs. new nodes:
  • Until 2000, recall is equivalent for edges among old nodes and edges involving new nodes.
  • After 2000, recall of edges with new nodes is very poor; it increases a little by 2015
  • Possible large-scale disruption in the network in 2000
  • Leadership in research passes from the USA and Europe to India and Brazil, and the focus shifts from fundamental research to more diagnostic and trials-based work

SLIDE 20

Part 2

Coevolution of a Multilayer Node-aligned Network whose Layers Represent Different Social Relations

SLIDE 21

Coevolution of Multiple Layers in Social Networks

  • Continuously evolving cognitive and behavioral layers.
  • Are behavioral edges formed before nominative edges are formed?
  • How likely is a behavioral edge to dissolve after the corresponding edge disappears in the nominative network?

[Figure: nominative network (red edges) and behavioral network (green edges).]

SLIDE 22

Questions

  • Are behavioral edges formed before nomination edges are formed?
  • How likely is a behavioral edge to dissolve after the corresponding edge disappears in the nomination network?
  • Are there any patterns of communication decay following link dissolution in the nomination network?
  • Do symmetric nominations differ from asymmetric nominations?

SLIDE 23

Dataset: NetSense

  • NetSense communication and nomination data is used.
  • Bluetooth interaction data is also used.
  • Behavioral Layers: one layer based on communication edges, and one based on Bluetooth proximity measures. The Bluetooth proximity layer is much denser. We have snapshots for 4 semesters.
  • Nominative Layer: Based on survey answers by students to “Who are your top contacts”.

SLIDE 24

Behavior Before Nomination

  • We can predict future edges with good accuracy and recall, based on the number of calls and texts.
  • Higher communication corresponds to edge formation in the next semester.

Prediction accuracy and recall: 70-80%

SLIDE 25

Behavior After Nomination

  • Does behavior change after nomination? Yes, it does.
  • Contacts have much higher communication than non-contacts who communicate.
  • Newly formed edges communicate less than older edges.

Prediction accuracy and recall: 70-80%

SLIDE 26

ROC curves: calls vs. texts vs. collocations

SLIDE 27

Temporal Features of Nomination

  • Collocations of edges in the nominative layer:
  • Significant collocations on weekends and weekday evenings
  • Collocations of edges not in the nominative layer:
  • Most collocations on weekdays, largely during working hours

SLIDE 28

Slow Progression of Nomination

  • Collocation -> Communication -> Nomination
  • Coevolution of behavioral networks with each other
  • Higher collocation often leads to the creation of an edge in the communication layer
  • Among the new communication edges, those that show more talking and more collocations go on to show nomination.
  • Those which drop contact don’t show nomination.

SLIDE 29

Behavior Decay After Nomination Dissolution

  • Typically, 50-55% of behavior edges not connected in the nominative network stop communicating in the next semester.
  • However, of the edges whose nomination dissolves, 70-75% stop communicating in the next semester.
  • Implication: Dissolution of nomination is faster than formation of nomination.

SLIDE 30

Discussion and Conclusion

  • Good predictions using the “preference model”
  • Important attributes: income and political views, known in the sociological literature, never tested on real-life datasets
  • Future: apply to bigger, more diverse datasets
  • We observe that the process and speed of formation and dissolution of links are different
  • Sociological theories with respect to the coevolution of behavior and nomination have been verified
  • Limitations: small size of data, college campus, several other factors affect links

SLIDE 31

Part 3: Groups and Influence

SLIDE 32

Overview

  • We study the formation of stable groups in a face-to-face interaction network based on Bluetooth proximity records.
  • We discuss a method of identifying small, stable groups which have face-to-face meetings.
  • We discuss how node attributes play a role in the formation of groups
  • We look at the different purposes of groups, which can be identified from the temporal behavior of group meetings.

SLIDE 33

Which Groups do We Aim to Discover?

  • We discover groups with the following properties:
  • Stable over time, a semester for example
  • Repeated, face-to-face (F2F) interactions
  • How is this different from community detection?
  • Based on non-aggregated F2F interactions
  • The Bluetooth interaction network is very dense, with several overlapping communities
  • Discover groups directly from interactions.

SLIDE 34

How are Groups Discovered?

  • Method:
  • Groups are discovered for every semester
  • Take multiple snapshots of the network over time, each covering 10 minutes
  • Identify connected components in each snapshot. Each component is a potential group.
  • However,
  • Members might be missing from certain meetings
  • Visiting members might be present at certain meetings
  • A group might evolve a little over the course of time
  • Merge these components across time using certain rules.
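The snapshot step above can be sketched with a small union-find over proximity sightings. The record format `(timestamp_seconds, node_u, node_v)` and the function name `components_per_window` are assumptions made for illustration; the actual Bluetooth log format is not described on the slide.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute snapshots, as on the slide

def components_per_window(records):
    """records: iterable of (timestamp_seconds, node_u, node_v) sightings.

    Returns {window_index: [frozenset_of_members, ...]}, one frozenset per
    connected component seen in that 10-minute window.
    """
    edges = defaultdict(list)
    for ts, u, v in records:
        edges[ts // WINDOW_SECONDS].append((u, v))

    result = {}
    for window, pairs in edges.items():
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for u, v in pairs:
            parent[find(u)] = find(v)  # union the two components

        members = defaultdict(set)
        for node in list(parent):
            members[find(node)].add(node)
        result[window] = [frozenset(m) for m in members.values()]
    return result

records = [(0, "a", "b"), (30, "b", "c"), (100, "d", "e"), (650, "a", "d")]
snapshots = components_per_window(records)
```

Each component is only a potential group; the merging across windows is a separate step.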

SLIDE 35

Creation of Groups

  • Merging connected components into groups
  • Input: Connected components, with number of meetings
  • Output: Groups
  • Parameters: intersection threshold, membership threshold, member-intersection threshold.
  • Method:
  • Merge groups iteratively, using hierarchical clustering
  • Merge a pair of groups if:
  • Intersection of members > intersection threshold
  • Potential members: members having attendance % > membership threshold
  • Intersection of potential members > member-intersection threshold
  • Merge by adding intersecting and potential members of both groups.
  • Stop merging when no new merges can be made
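A single pairwise merge test might look like the sketch below. The slide does not define the ratios precisely, so this is one reading, chosen because it reproduces the 4/6 and 3/4 figures in the worked example on the "Merging of components" slide: the intersection is shared members over the union, and the potential-member intersection is the share of potential members that both components have in common. The function name `try_merge` and the `attendance` mapping (member to fraction of meetings attended) are hypothetical.

```python
def try_merge(members_1, members_2, attendance,
              intersection_threshold=0.6,
              membership_threshold=0.3,
              member_intersection_threshold=0.5):
    """Return the merged member set, or None if the pair should not merge.

    Assumed reading of the rule, not a confirmed implementation.
    """
    union = members_1 | members_2
    shared = members_1 & members_2
    intersection = len(shared) / len(union)

    # Potential members: those attending often enough to count as regulars.
    potential = {m for m in union if attendance.get(m, 0.0) > membership_threshold}
    if not potential:
        return None
    potential_intersection = len(potential & shared) / len(potential)

    if (intersection > intersection_threshold
            and potential_intersection > member_intersection_threshold):
        return shared | potential  # intersecting plus potential members
    return None

attendance = {"A": 0.5, "B": 0.9, "C": 0.9, "D": 0.8, "E": 0.2, "F": 0.1}
merged = try_merge({"A", "B", "C", "D", "E"}, {"B", "C", "D", "E", "F"}, attendance)
not_merged = try_merge({"A", "B", "C", "D"}, {"C", "D", "E", "F"}, attendance)
```

With these made-up attendance values the first pair has intersection 4/6 and potential-member intersection 3/4, so it merges into A, B, C, D, E. The second pair shares only 2 of 6 members and is left alone. A full implementation would repeat this test over all pairs until no merge applies.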

SLIDE 36

Merging of components

Thresholds: intersection threshold = 0.6, membership threshold = 0.3, member-intersection threshold = 0.5

[Figure: three pairs of components (Component 1, Component 2) over members A to F, with regular and non-regular members marked. Annotation: "The membership threshold decides this value."]

  • Intersection = 4/6 = 0.66, potential members = 4, intersection of potential members = 3/4 = 0.75: new group A, B, C, D, E.
  • Intersection = 4/6 = 0.66, potential members = 3, intersection of potential members = 1/3 = 0.33: no merge.
  • Intersection = 2/6 = 0.33: no merge.

SLIDE 37

Groups Discovered

  • With certain values of the thresholds:
  • intersection threshold = 0.6, membership threshold = 0.3, member-intersection threshold = 0.5
  • Number of nodes: 200
  • Number of groups: 256
  • Average group size: 4.1
  • Average member attendance: 91%
  • Higher thresholds: smaller groups with higher attendance
  • Lower thresholds: larger groups with lower attendance

SLIDE 38

Thresholds and Size of Groups

[Figure: merging Component 1 and Component 2 over members A to F under three threshold settings]

  • Normal threshold levels: new group A, B, C, D, E. All are either intersecting or regular members.
  • Lower threshold levels, larger group: new group A, B, C, D, E, F. All are either intersecting or regular members.
  • Higher threshold levels, smaller groups are formed: new group B, C, D, E. All are either intersecting or regular members.

SLIDE 39

Group Relations and Path Length

  • In the communication layer of the network:
  • Group contacts are 2.5 hops away from each other
  • A random node pair is 3.9 hops away
  • In the contact nomination layer:
  • Group contacts are 3.6 hops away
  • Communication contacts are 1.2 hops away
  • Implication: Although group contacts are not as “close” as communication or nomination contacts, they are “closer” than a random pair of nodes.
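Hop counts like these are average shortest-path lengths in a layer, taken over a chosen set of node pairs. A minimal breadth-first-search sketch on a toy graph; the adjacency-dictionary format, the function names and the choice to skip unreachable pairs are all assumptions.

```python
from collections import deque

def hop_distance(adjacency, source, target):
    """Shortest-path length in hops, or None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def mean_hops(adjacency, pairs):
    """Average hop distance over the reachable pairs (unreachable ones are skipped)."""
    distances = [hop_distance(adjacency, u, v) for u, v in pairs]
    reachable = [d for d in distances if d is not None]
    return sum(reachable) / len(reachable) if reachable else None

# Toy layer: a path a - b - c - d, plus an isolated node z.
layer = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "z": []}
avg = mean_hops(layer, [("a", "b"), ("a", "d"), ("a", "z")])
```

Passing the group-contact pairs and a random sample of pairs to `mean_hops` on the same layer gives the kind of comparison reported above.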

SLIDE 40

Group Contacts and Attributes

  • How many attributes do group contacts agree on?
  • Out of 14 attributes, group contacts agree on 7 attributes
  • Communication contacts agree on 8 attributes
  • Random contacts agree on only 5 attributes
  • Nodes have less bias for group contacts than for communication contacts, for several attributes, like parental income, drinking habits, views on abortion and homosexuality.
  • However, biases exist to a certain extent.
  • Implication: groups are biased, but not as much as communication contacts.
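The agreement counts above amount to counting, for each pair, the attributes on which both nodes hold the same value, then averaging over pairs. A small sketch with invented attribute names and values (three attributes instead of 14); the function names are hypothetical.

```python
def agreement_count(attrs_u, attrs_v):
    """Number of attributes, present for both nodes, with identical values."""
    shared = attrs_u.keys() & attrs_v.keys()
    return sum(1 for a in shared if attrs_u[a] == attrs_v[a])

def mean_agreement(node_attrs, pairs):
    """Average agreement count over a list of (u, v) pairs."""
    counts = [agreement_count(node_attrs[u], node_attrs[v]) for u, v in pairs]
    return sum(counts) / len(counts)

node_attrs = {
    "n1": {"politics": "liberal", "drinking": "weekly", "income": "high"},
    "n2": {"politics": "liberal", "drinking": "never", "income": "high"},
    "n3": {"politics": "conservative", "drinking": "never", "income": "low"},
}
group_pairs = [("n1", "n2")]
random_pairs = [("n1", "n3"), ("n2", "n3")]
group_avg = mean_agreement(node_attrs, group_pairs)
random_avg = mean_agreement(node_attrs, random_pairs)
```

Running the same function over group contacts, communication contacts and random pairs gives three averages that can be compared directly.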

SLIDE 41

Groups with Different Purposes

  • Social and work-related purposes
  • Groups are clustered based on the % of time spent on weekends and weekday evenings
  • 2 well-separated clusters emerge.
  • Social groups have 40% of meetups on weekends and weekday evenings
  • Work groups have about 10% of meetups on weekends and weekday evenings
  • Social groups are more biased than work groups on several attributes.
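The clustering feature above is the share of a group's meetups that fall on weekends or weekday evenings. A minimal sketch of that feature only; the 18:00 start of "evening" is an assumption (the slide does not give a cut-off), and the timestamps are invented.

```python
from datetime import datetime

EVENING_START_HOUR = 18  # assumption: "evening" begins at 18:00

def is_off_hours(ts):
    """True for Saturday/Sunday, or for any time at or after the evening cut-off."""
    return ts.weekday() >= 5 or ts.hour >= EVENING_START_HOUR

def off_hours_share(meeting_times):
    """Fraction of meetups on weekends or weekday evenings."""
    if not meeting_times:
        return 0.0
    return sum(is_off_hours(t) for t in meeting_times) / len(meeting_times)

meetings = [
    datetime(2024, 1, 8, 10, 0),   # Monday morning
    datetime(2024, 1, 8, 19, 0),   # Monday evening
    datetime(2024, 1, 6, 11, 0),   # Saturday
    datetime(2024, 1, 9, 14, 0),   # Tuesday afternoon
]
share = off_hours_share(meetings)
```

Computing this share for every group gives a one-dimensional feature. The slide reports two clusters on it, centred near 40% and 10%.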

SLIDE 42

Clustering of Groups on Attributes

  • Groups are clustered based on attributes
  • Vector for group: % of members with a particular attribute value. All values form the vector.
  • Neat, though based clusters of groups are seen.
  • With k=2:
  • Cluster one: Groups with a majority of members who are mostly conservative, rich, against homosexuality and abortion.
  • Cluster two: Groups with a majority of members who are equally distributed on political and social views, with a liberal bias.
  • With k=3:
  • Cluster two splits into two more clusters; one has strongly liberal members while the other contains less liberal members.
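The group vector described above can be sketched as one entry per (attribute, value) pair, holding the fraction of members with that value. The attribute names and values below are invented for illustration, and `group_vector` is a hypothetical helper; the resulting vectors would then be passed to a clustering routine with k=2 or k=3.

```python
def group_vector(member_attrs, layout):
    """member_attrs: one attribute dict per group member.
    layout: ordered list of (attribute, value) pairs, shared by all groups.
    Returns the fraction of members holding each value, in layout order.
    """
    n = len(member_attrs)
    return [sum(1 for m in member_attrs if m.get(attr) == value) / n
            for attr, value in layout]

layout = [("politics", "liberal"), ("politics", "conservative"), ("drinks", "yes")]
members = [
    {"politics": "liberal", "drinks": "yes"},
    {"politics": "conservative", "drinks": "yes"},
    {"politics": "liberal", "drinks": "no"},
    {"politics": "liberal", "drinks": "yes"},
]
vector = group_vector(members, layout)
```

Using the same `layout` for every group keeps the vectors aligned, so distances between them are meaningful.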

SLIDE 43

Conclusion

  • We demonstrate a methodology for identifying groups of stable and frequently interacting nodes.
  • Group relationships are stronger than random edges, but more fragile than communication relationships.
  • Groups are biased on several attributes.
  • Groups have different purposes: social and work-related.
  • Groups can be well clustered based on attributes: into conservative and liberal-leaning groups.

SLIDE 44

References

  • Analysis of Link Formation, Persistence and Dissolution in NetSense Data, Advances in Social Networks Analysis and Mining (ASONAM), IEEE/ACM International Conference, 2016
  • Influence of Personal Preferences on Link Dynamics in Social Networks, Complexity, 2017
  • Network Analysis to Support Public Health: Evolution of Collaboration among Leishmaniasis Researchers, Scientometrics, 2017
  • Co-evolution of two networks representing different social relations in NetSense, International Workshop on Complex Networks and their Applications, Springer, 2016
  • Coevolution of a multilayer node-aligned network whose layers represent different social relations, Computational Social Networks, Volume 4, 2017
  • Impact of Attributes on Group Formation, Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain, August 28, 2018, pp. 1250-1257.

SLIDE 45

THANK YOU!

Questions?
