

SLIDE 1

Generative models for social network data

Kevin S. Xu (University of Toledo) James R. Foulds (University of California-San Diego) SBP-BRiMS 2016 Tutorial

SLIDE 2

About Us

Kevin S. Xu

  • Assistant professor at University of Toledo
  • 3 years research experience in industry
  • Research interests:
  • Machine learning
  • Network science
  • Physiological data analysis

James R. Foulds

  • Postdoctoral scholar at UCSD
  • Research interests:
  • Bayesian modeling
  • Social networks
  • Text
  • Latent variable models
SLIDE 3

Outline

  • Mathematical representations of social networks and generative models
  • Introduction to generative approach
  • Connections to sociological principles
  • Fitting generative social network models to data
  • Example application scenarios
  • Model selection and evaluation
  • Recent developments in generative social network models
  • Dynamic social network models
SLIDE 4

Social networks today

SLIDE 5

Social network analysis: an interdisciplinary endeavor

  • Sociologists have been studying social networks for decades!
  • First known empirical study of social networks: Jacob Moreno in the 1930s
  • Moreno called them sociograms
  • Recent interest in social network analysis (SNA) from physics, EECS, statistics, and many other disciplines

SLIDE 6

Social networks as graphs

  • A social network can be represented by a graph $G = (V, E)$
  • $V$: vertices, nodes, or actors, typically representing people
  • $E$: edges, links, or ties denoting relationships between nodes
  • Directed graphs are used to represent asymmetric relationships
  • Graphs have no natural representation in a geometric space
  • Two identical graphs can be drawn very differently
  • Moral: visualization provides very limited analysis ability
  • How do we model and analyze social network data?
SLIDE 7

Matrix representation of social networks

  • Represent a graph by an $n \times n$ adjacency matrix or sociomatrix $\mathbf{Y}$
  • $y_{ij} = 1$ if there is an edge between nodes $i$ and $j$
  • $y_{ij} = 0$ otherwise
  • Easily extended to directed and weighted graphs

(The slide shows an example graph and its symmetric 0-1 adjacency matrix, omitted here.)

SLIDE 8

Exchangeability of nodes

  • Nodes are typically assumed to be (statistically) exchangeable by symmetry
  • Row and column permutations of the adjacency matrix do not change the graph
  • This needs to be incorporated into social network models
SLIDE 9

Sociological principles related to edge formation

  • Homophily or assortative mixing
  • Tendency for individuals to bond with similar others
  • Assortative mixing by age, gender, social class, organizational role, node degree, etc.
  • Results in transitivity (triangles) in social networks
  • β€œThe friend of my friend is my friend”
  • Equivalence of nodes
  • Two nodes are structurally equivalent if their relations to all other nodes are identical
  • Approximate equivalence recorded by a similarity measure
  • Two nodes are regularly equivalent if their neighbors are similar (not necessarily common neighbors)

SLIDE 10

Brief history of social network models

  • Early 1900s – sociology and social psychology precursors to SNA (Georg Simmel)
  • 1930s – Graphical depictions of social networks: sociograms (Jacob Moreno)
  • 1960s – Small world / 6 degrees of separation experiment (Stanley Milgram)
  • 1970s – Mathematical models of social networks (ErdΕ‘s-RΓ©nyi-Gilbert)
  • 1980s – Statistical models (Holland and Leinhardt, Frank and Strauss)
  • 1990s – Statistical physicists weigh in: preferential attachment, small world models, power-law degree distributions (BarabΓ‘si et al.)
  • 2000s–today – Machine learning approaches, latent variable models
SLIDE 11

Generative models for social networks

  • A generative model is one that can simulate new networks
  • Two distinct schools of thought:
  • Probability models (non-statistical)
  • Typically simple, 1-2 parameters, not learned from data
  • Can be studied analytically
  • Statistical models
  • More parameters, latent variables
  • Learned from data via statistical techniques
SLIDE 12

Probability and Inference

(Figure, based on one by Larry Wasserman, "All of Statistics": probability runs from the data generating process to the observed data; inference runs from the observed data back to the data generating process.)

Mathematics/physics: ErdΕ‘s-RΓ©nyi, preferential attachment, … Statistics/machine learning: ERGMs, latent variable models, …

SLIDE 13


Probability models for networks (non-statistical)

SLIDE 14

ErdΕ‘s-RΓ©nyi model

  • There are two variants of this model
  • The first variant is a probability distribution over graphs with a fixed number of edges
  • It posits that all graphs on $N$ nodes with $E$ edges are equally likely

SLIDE 15

ErdΕ‘s-RΓ©nyi model

  • The second variant, $G(N, p)$, posits that each edge is β€œon” independently with probability $p$
  • Probability of the adjacency matrix:
    $p(\mathbf{Y} \mid p) = \prod_{i < j} p^{y_{ij}} (1 - p)^{1 - y_{ij}}$
SLIDE 16

ErdΕ‘s-RΓ©nyi model

  • Adjacency matrix likelihood: $p(\mathbf{Y} \mid p) = \prod_{i < j} p^{y_{ij}} (1 - p)^{1 - y_{ij}}$
  • Number of edges is binomial: $E \sim \mathrm{Binomial}\big(\binom{N}{2}, p\big)$
  • For large $N$, this is well approximated by a Poisson distribution with mean $\binom{N}{2} p$
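A minimal sketch of sampling from $G(N, p)$ and checking the edge-count distribution; numpy and the helper name are our own choices, not the tutorial's:

```python
import numpy as np

def sample_er_graph(N, p, rng):
    """Sample a symmetric adjacency matrix from the Erdos-Renyi G(N, p) model."""
    coins = rng.random((N, N)) < p       # Bernoulli(p) coin flip for every pair
    Y = np.triu(coins, k=1)              # keep each unordered pair once, no self-loops
    return (Y | Y.T).astype(int)         # symmetrize

rng = np.random.default_rng(0)
N, p = 500, 0.01
Y = sample_er_graph(N, p, rng)

n_pairs = N * (N - 1) // 2
print("observed edges:", Y.sum() // 2)   # one draw from Binomial(n_pairs, p)
print("expected edges:", n_pairs * p)    # ~ Poisson mean for large N
```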
SLIDE 17

Preferential attachment models

  • The ErdΕ‘s-RΓ©nyi model assumes nodes typically have about the same degree (number of edges)
  • Many real networks have a degree distribution following a power law (possibly controversial?)
  • Preferential attachment is a variant on the $G(N, p)$ model to address this (BarabΓ‘si and Albert, 1999)

SLIDE 18

Preferential attachment models

  • Initially, no edges and $N_0$ nodes
  • For each remaining node $n$:
  • Add $n$ to the network
  • For $i = 1, \dots, m$:
  • Connect $n$ to a random existing node with probability proportional to its degree (plus smoothing counts)
  • A PΓ³lya urn process! Rich get richer.
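A minimal sketch of this process, assuming each arriving node makes $m$ connections and a smoothing count of 1 (both illustrative choices):

```python
import numpy as np

def sample_ba_graph(N, N0, m, rng, smoothing=1.0):
    """Grow a preferential attachment graph; returns the edge list and degrees."""
    edges, degree = [], np.zeros(N)
    for n in range(N0, N):                      # nodes N0..N-1 arrive one at a time
        weights = degree[:n] + smoothing        # rich get richer, plus smoothing counts
        targets = rng.choice(n, size=min(m, n), replace=False,
                             p=weights / weights.sum())
        for t in targets:
            edges.append((n, t))
            degree[n] += 1
            degree[t] += 1
    return edges, degree

rng = np.random.default_rng(0)
edges, degree = sample_ba_graph(N=1000, N0=5, m=2, rng=rng)
print("max degree:", int(degree.max()))         # heavy-tailed: a few hubs emerge
```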

SLIDE 19

Small world models (Watts and Strogatz)

  • Start with $N$ nodes, each connected to its $K$ nearest neighbors in a ring
  • Randomly rewire each edge with probability $\beta$
  • Has low average path length (small world phenomenon, β€œ6 degrees of separation”)

Figure due to Arpad Horvath, https://commons.wikimedia.org/wiki/File:Watts_strogatz.svg
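A minimal sketch of the rewiring procedure under the symbols above; the loop structure is one common way to implement it, not the only one:

```python
import numpy as np

def sample_ws_graph(N, K, beta, rng):
    """Watts-Strogatz: ring lattice with K neighbors, each edge rewired w.p. beta."""
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        for offset in range(1, K // 2 + 1):      # connect to K nearest ring neighbors
            j = (i + offset) % N
            if rng.random() < beta:              # rewire this edge with probability beta
                candidates = [k for k in range(N) if k != i and A[i, k] == 0]
                j = rng.choice(candidates)
            A[i, j] = A[j, i] = 1
    return A

A = sample_ws_graph(N=100, K=4, beta=0.1, rng=np.random.default_rng(0))
print("edges:", A.sum() // 2)
```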

SLIDE 20

Statistical network models

SLIDE 21

Exponential family random graphs (ERGMs)

$p(\mathbf{Y} \mid \boldsymbol{\theta}) = \dfrac{\exp\big( \boldsymbol{\theta}^\top \mathbf{s}(\mathbf{Y}, \mathbf{x}) \big)}{Z(\boldsymbol{\theta})}$

Arbitrary sufficient statistics $\mathbf{s}(\mathbf{Y}, \mathbf{x})$ can involve covariates $\mathbf{x}$ (gender, age, …), e.g. β€œhow many males are friends with females”

SLIDE 22

Exponential family random graphs (ERGMs)

  • Pros:
  • Powerful, flexible representation
  • Can encode complex theories and do substantive social science
  • Handles covariates
  • Mature software tools available, e.g. the ergm package for statnet

SLIDE 23

Exponential family random graphs (ERGMs)

  • Cons:
  • Usual caveats of undirected models apply
  • Computationally intensive, especially learning
  • Inference may be intractable, due to the partition function
  • Model degeneracy can easily happen
  • β€œa seemingly reasonable model can actually be such a bad mis-specification for an observed dataset as to render the observed data virtually impossible” – Goodreau (2007)


SLIDE 25

Triadic closure


If two people have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.

SLIDE 26

Measuring triadic closure

  • Mean clustering coefficient:
    $\bar{C} = \frac{1}{n} \sum_{i} C_i$, where $C_i = \dfrac{\text{number of connected pairs of neighbors of } i}{\text{number of pairs of neighbors of } i}$
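A minimal numpy sketch of this statistic; skipping nodes of degree below 2 is one common convention:

```python
import numpy as np

def mean_clustering_coefficient(Y):
    """Average over nodes of C_i = (edges among i's neighbors) / (pairs of neighbors)."""
    coeffs = []
    for i in range(Y.shape[0]):
        nbrs = np.flatnonzero(Y[i])
        k = len(nbrs)
        if k < 2:
            continue                                  # C_i undefined for degree < 2
        links = Y[np.ix_(nbrs, nbrs)].sum() / 2       # edges among i's neighbors
        coeffs.append(2.0 * links / (k * (k - 1)))
    return float(np.mean(coeffs))
```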

SLIDE 27

Simple ERGM for triadic closure leads to model degeneracy

Depending on the parameters, we could get:

  • Graph is empty with probability close to 1
  • Graph is full with probability close to 1
  • Density and clustering distribution is bimodal, with little mass on the desired density and triadic closure
  • The MLE may not exist!


SLIDE 29

Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., & Morris, M. (2008). statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software, 24(1).

SLIDE 30

What is the problem?

Completes two triangles! If an edge completes more triangles, it becomes overwhelmingly likely to exist. This propagates to create more triangles…

SLIDE 31

Solution

  • Change the model so that there are diminishing returns for completing more triangles
  • A different natural parameter for each possible number of triangles completed by one edge
  • Natural parameters parameterized by a lower-dimensional vector, e.g. encoding geometrically decreasing weights (curved exponential family)
  • Moral of the story: ERGMs are powerful, but require care and expertise to perform well

SLIDE 32

Latent variable models for social networks

  • Model where observed variables are dependent on a set of unobserved or latent variables
  • Observed variables assumed to be conditionally independent given the latent variables
  • Why latent variable models?
  • Adjacency matrix $\mathbf{Y}$ is invariant to row and column permutations
  • Aldous-Hoover theorem implies the existence of a latent variable model of the form $y_{ij} = f(z_i, z_j, \epsilon_{ij})$ for i.i.d. latent variables $z_i$ and some function $f$

SLIDE 33

Latent variable models for social networks

  • Latent variable models allow for heterogeneity of nodes in social networks
  • Each node (actor) $i$ has a latent variable $z_i$
  • Probability of forming an edge between two nodes is independent of all other node pairs given the values of the latent variables:
    $p(\mathbf{Y} \mid \mathbf{z}, \theta) = \prod_{i \neq j} p(y_{ij} \mid z_i, z_j, \theta)$
  • Ideally latent variables should provide an interpretable representation

SLIDE 34

(Continuous) latent space model

  • Motivation: homophily or assortative mixing
  • Probability of an edge between two nodes increases as the characteristics of the nodes become more similar
  • Represent nodes in an unobserved (latent) space of characteristics, or β€œsocial space”
  • Small distance between 2 nodes in latent space β‡’ high probability of edge between the nodes
  • Induces transitivity: observation of edges $(i, j)$ and $(j, k)$ suggests that $i$ and $k$ are not too far apart in latent space β‡’ more likely to also have an edge

SLIDE 35

(Continuous) latent space model

  • (Continuous) latent space model (LSM) proposed by Hoff et al. (2002)
  • Each node has a latent position $z_i \in \mathbb{R}^d$
  • Probabilities of forming edges depend on distances between latent positions
  • Define pairwise affinities $\psi_{ij} = \alpha - \lVert z_i - z_j \rVert_2$

SLIDE 36

Latent space model: generative process

  • 1. Sample node positions in latent space
  • 2. Compute affinities between all pairs of nodes
  • 3. Sample edges between all pairs of nodes

Figure due to P. D. Hoff, Modeling homophily and stochastic equivalence in symmetric relational data, NIPS 2008
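A minimal sketch of steps 1-3 under the definitions above; the Gaussian prior on positions is an assumption for illustration:

```python
import numpy as np

def sample_lsm(n, d, alpha, rng):
    z = rng.normal(size=(n, d))                          # 1. latent positions
    dists = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
    psi = alpha - dists                                  # 2. affinities psi_ij
    P = 1.0 / (1.0 + np.exp(-psi))                       # logistic edge probabilities
    Y = np.triu(rng.random((n, n)) < P, k=1)             # 3. sample edges once per pair
    return (Y | Y.T).astype(int), z

Y, z = sample_lsm(n=50, d=2, alpha=1.0, rng=np.random.default_rng(0))
```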

SLIDE 37

Advantages and disadvantages of latent space model

  • Advantages of latent space model
  • Visual and interpretable spatial representation of the network
  • Models homophily (assortative mixing) well via transitivity
  • Disadvantages of latent space model
  • 2-D latent space representation often may not offer enough degrees of freedom
  • Cannot model disassortative mixing (people preferring to associate with people with different characteristics)

SLIDE 38

Stochastic block model (SBM)

  • First formalized by Holland et al. (1983)
  • Also known as the multi-class ErdΕ‘s-RΓ©nyi model
  • Each node has a categorical latent variable $z_i \in \{1, \dots, L\}$ denoting its class or group
  • Probabilities of forming edges depend on the class memberships of the nodes ($L \times L$ matrix $\mathbf{W}$)
  • Groups often interpreted as functional roles in social networks

SLIDE 39

Stochastic equivalence and block models

  • Stochastic equivalence: generalization of structural equivalence
  • Group members have identical probabilities of forming edges to members of other groups
  • Can model both assortative and disassortative mixing

Figure due to P. D. Hoff, Modeling homophily and stochastic equivalence in symmetric relational data, NIPS 2008

SLIDE 40

Stochastic equivalence vs community detection

Original graph and blockmodel

Figure due to Goldenberg et al. (2009), A Survey of Statistical Network Models, Foundations and Trends in Machine Learning

Nodes can be stochastically equivalent without being densely connected to each other.

SLIDE 41

Stochastic blockmodel: latent representation

(Table: rows Alice, Bob, Claire; columns are latent groups UCSD, UCI, UCLA; each row contains a single 1 marking the node's group.)

SLIDE 42

Reordering the matrix to show the inferred block structure

Kemp, Charles, et al. "Learning systems of concepts with an infinite relational model." AAAI. Vol. 3. 2006.

SLIDE 43

Model structure

Kemp, Charles, et al. "Learning systems of concepts with an infinite relational model." AAAI. Vol. 3. 2006.

Latent groups $\mathbf{z}$; interaction matrix $\mathbf{W}$ (probability of an edge from block $k$ to block $k'$)

SLIDE 44

Stochastic block model generative process

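The process itself appears as a figure on the original slide; the sketch below spells it out, assuming class proportions $\pi$, an $L \times L$ edge-probability matrix $\mathbf{W}$, and directed edges:

```python
import numpy as np

def sample_sbm(n, pi, W, rng):
    z = rng.choice(len(pi), size=n, p=pi)        # 1. sample a class for each node
    P = W[z][:, z]                               # 2. P[i, j] = W[z_i, z_j]
    Y = (rng.random((n, n)) < P).astype(int)     # 3. sample edges independently
    np.fill_diagonal(Y, 0)                       # no self-loops
    return Y, z

pi = np.array([0.5, 0.3, 0.2])
W = np.array([[0.20, 0.02, 0.02],
              [0.02, 0.15, 0.02],
              [0.02, 0.02, 0.25]])
Y, z = sample_sbm(200, pi, W, np.random.default_rng(0))
```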

SLIDE 45

Stochastic block model: latent representation

(Table: rows Alice, Bob, Claire; columns are latent groups Running, Dancing, Fishing; each row contains a single 1.)

Nodes are assigned to only one latent group. Not always an appropriate assumption.

SLIDE 46

Mixed membership stochastic blockmodel (MMSB)

(Table: mixed membership vectors over the latent groups Running, Dancing, Fishing: Alice 0.4 / 0.4 / 0.2; Bob 0.5 / 0.5; Claire 0.1 / 0.9.)

Nodes represented by distributions over latent groups (roles)

Airoldi et al. (2008)

SLIDE 47

Mixed membership stochastic blockmodel (MMSB)

Airoldi et al., (2008)
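The slide shows the MMSB generative process as a figure; here is a minimal sketch following the usual statement of the model, with the Dirichlet parameter $\alpha$ and role-interaction matrix $\mathbf{W}$ as assumed symbols:

```python
import numpy as np

def sample_mmsb(n, alpha, W, rng):
    K = W.shape[0]
    pi = rng.dirichlet(alpha, size=n)            # mixed membership vector per node
    Y = np.zeros((n, n), dtype=int)
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            z_pq = rng.choice(K, p=pi[p])        # role p takes when sending to q
            z_qp = rng.choice(K, p=pi[q])        # role q takes when receiving from p
            Y[p, q] = rng.random() < W[z_pq, z_qp]
    return Y, pi

Y, pi = sample_mmsb(n=30, alpha=np.ones(3), W=np.eye(3) * 0.2 + 0.02,
                    rng=np.random.default_rng(0))
```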

SLIDE 48

Latent feature models

(Figure: binary latent features, Cycling, Fishing, Running, Waltz, Tango, Salsa, for Alice, Bob, Claire.)

Mixed membership implies a kind of β€œconservation of (probability) mass” constraint: if you like cycling more, you must like running less, to sum to one.

Miller, Griffiths, Jordan (2009)


SLIDE 51

Latent feature models

Miller, Griffiths, Jordan (2009)

(Figure: $\mathbf{Z}$ = binary feature matrix with rows Alice, Bob, Claire and columns Cycling, Fishing, Running, Tango, Salsa, Waltz.)

Nodes represented by a binary vector of latent features

SLIDE 52

Latent feature models

  • Latent Feature Relational Model (LFRM) (Miller, Griffiths, Jordan, 2009) likelihood model:
    $\Pr(y_{ij} = 1) = \sigma\big( \mathbf{z}_i \mathbf{W} \mathbf{z}_j^\top \big)$, where $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
  • β€œIf I have feature $k$, and you have feature $l$, add $W_{kl}$ to the log-odds of the probability we interact”
  • Can include terms for network density, covariates, popularity, …, as in the p2 model
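A minimal sketch of this likelihood, with illustrative feature names echoing the earlier slides:

```python
import numpy as np

def lfrm_edge_prob(z_i, z_j, W):
    """Pr(edge i -> j) = sigmoid(z_i^T W z_j): W[k, l] enters the log-odds
    whenever i has feature k and j has feature l."""
    log_odds = z_i @ W @ z_j
    return 1.0 / (1.0 + np.exp(-log_odds))

# e.g. features (Cycling, Fishing, Running): Alice has {Cycling, Running}, Bob has {Fishing}
W = np.array([[ 1.5, -0.5,  0.2],
              [-0.5,  2.0,  0.0],
              [ 0.2,  0.0,  1.0]])
alice = np.array([1, 0, 1])
bob = np.array([0, 1, 0])
print(lfrm_edge_prob(alice, bob, W))
```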

SLIDE 53

Outline

  • Mathematical representations of social networks and generative models
  • Introduction to generative approach
  • Connections to sociological principles
  • Fitting generative social network models to data
  • Example application scenarios
  • Model selection and evaluation
  • Recent developments in generative social network models
  • Dynamic social network models
SLIDE 54

Application 1: Facebook wall posts

  • Network of wall posts on Facebook collected by Viswanath et al. (2009)
  • Nodes: Facebook users
  • Edges: directed edge from $i$ to $j$ if $i$ posts on $j$’s Facebook wall
  • What model should we use?
  • (Continuous) latent space and latent feature models do not handle directed graphs in a straightforward manner
  • Wall posts might not be transitive, unlike friendships
  • Stochastic block model might not be a bad choice as a starting point

SLIDE 55

Model structure

Kemp, Charles, et al. "Learning systems of concepts with an infinite relational model." AAAI. Vol. 3. 2006.

Latent groups $\mathbf{z}$; interaction matrix $\mathbf{W}$ (probability of an edge from block $k$ to block $k'$)

SLIDE 56

Fitting stochastic block model

  • A priori block model: assume that the class (role) of each node is given by some other variable
  • Only need to estimate $W_{ll'}$: the probability that a node in class $l$ connects to a node in class $l'$, for all $l, l'$
  • Likelihood given by $p(\mathbf{Y} \mid \mathbf{z}, \mathbf{W}) = \prod_{i \neq j} W_{z_i z_j}^{y_{ij}} \big( 1 - W_{z_i z_j} \big)^{1 - y_{ij}}$
  • Maximum-likelihood estimate (MLE) given by
    $\hat{W}_{ll'} = \dfrac{\text{number of actual edges in block } (l, l')}{\text{number of possible edges in block } (l, l')}$
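A minimal sketch of this estimator for a directed network with known classes (helper names are our own):

```python
import numpy as np

def sbm_mle(Y, z, L):
    """MLE of W for the a priori block model: edge fraction within each block pair."""
    W_hat = np.zeros((L, L))
    for l in range(L):
        for lp in range(L):
            rows = np.flatnonzero(z == l)
            cols = np.flatnonzero(z == lp)
            actual = Y[np.ix_(rows, cols)].sum()       # actual edges in block (l, l')
            possible = len(rows) * len(cols)           # possible edges in block (l, l')
            if l == lp:
                possible -= len(rows)                  # exclude self-pairs
            W_hat[l, lp] = actual / max(possible, 1)
    return W_hat
```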

SLIDE 57

Estimating latent classes

  • Latent classes (roles) are unknown in this data set
  • First estimate the latent classes $\mathbf{z}$, then use the MLE for $\mathbf{W}$
  • MLE over latent classes is intractable!
  • $\sim L^n$ possible latent class vectors
  • Spectral clustering techniques have been shown to accurately estimate latent classes
  • Use singular vectors of the (possibly transformed) adjacency matrix to estimate classes
  • Many variants with differing theoretical guarantees
SLIDE 58

Spectral clustering for directed SBMs

  • 1. Compute the singular value decomposition $\mathbf{Y} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top$
  • 2. Retain only the first $L$ columns of $\mathbf{U}$ and $\mathbf{V}$, and the first $L$ rows and columns of $\boldsymbol{\Sigma}$
  • 3. Define the coordinate-scaled singular vector matrix $\hat{\mathbf{Z}} = \big[ \mathbf{U} \boldsymbol{\Sigma}^{1/2} \;\; \mathbf{V} \boldsymbol{\Sigma}^{1/2} \big]$
  • 4. Run k-means clustering on the rows of $\hat{\mathbf{Z}}$ to return estimates of the latent classes

Scales to networks with thousands of nodes!
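A minimal sketch of steps 1-4; numpy's SVD and scipy's k-means are implementation choices, not the tutorial's prescription:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster_directed(Y, L, seed=0):
    U, s, Vt = np.linalg.svd(Y.astype(float))          # 1. singular value decomposition
    U_L, V_L, s_L = U[:, :L], Vt[:L].T, s[:L]          # 2. keep the top-L factors
    Z_hat = np.hstack([U_L * np.sqrt(s_L),             # 3. coordinate-scaled
                       V_L * np.sqrt(s_L)])            #    singular vector matrix
    _, labels = kmeans2(Z_hat, L, minit='++', seed=seed)  # 4. k-means on the rows
    return labels
```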

SLIDE 59

Demo of SBM on Facebook wall post network

SLIDE 60

Application 2: social network of bottlenose dolphin interactions

  • Data collected by marine biologists observing interactions between 62 bottlenose dolphins
  • Introduced to the network science community by Lusseau and Newman (2004)
  • Nodes: dolphins
  • Edges: undirected relations denoting frequent interactions between dolphins
  • What model should we use?
  • Social interactions here are in a group setting, so lots of transitivity may be expected
  • Interactions associated with physical proximity
  • Use latent space model to estimate latent positions
SLIDE 61

(Continuous) latent space model

  • (Continuous) latent space model (LSM) proposed by Hoff et al. (2002)
  • Each node has a latent position $z_i \in \mathbb{R}^d$
  • Probabilities of forming edges depend on distances between latent positions
  • Define pairwise affinities $\psi_{ij} = \alpha - \lVert z_i - z_j \rVert_2$, so that
    $p(\mathbf{Y} \mid \mathbf{z}, \alpha) = \prod_{i \neq j} \dfrac{\exp\big( y_{ij} \psi_{ij} \big)}{1 + \exp\big( \psi_{ij} \big)}$

SLIDE 62

Estimation for latent space model

  • Maximum-likelihood estimation
  • Log-likelihood is concave in terms of the pairwise distance matrix $\mathbf{D}$, but not in the latent positions $\mathbf{z}$
  • First find the MLE in terms of $\mathbf{D}$, then use multi-dimensional scaling (MDS) to get an initialization for $\mathbf{z}$
  • Faster approach: replace $\mathbf{D}$ with shortest-path distances in the graph, then use MDS
  • Use non-linear optimization to find the MLE for $\mathbf{z}$
  • Latent space dimension often set to 2 to allow visualization using a scatter plot
  • Scales to ~1000 nodes
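A minimal sketch of the faster initialization (shortest-path distances followed by classical MDS); scipy's shortest_path is an implementation choice:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def mds_init(Y, d=2):
    """Initialize latent positions via classical MDS on graph distances."""
    D = shortest_path(Y, unweighted=True)              # proxy for the distance matrix
    D[np.isinf(D)] = D[np.isfinite(D)].max() + 1       # cap distances across components
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                        # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:d]                # top-d eigenpairs
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```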

SLIDE 63

Demo of latent space model on dolphin network

SLIDE 64

Bayesian inference

  • As a Bayesian, all you have to do is write down your prior beliefs, write down your likelihood, and apply Bayes’ rule:
    $p(\theta \mid \mathbf{Y}) = \dfrac{p(\mathbf{Y} \mid \theta)\, p(\theta)}{p(\mathbf{Y})}$

SLIDE 65

Elements of Bayesian Inference

Posterior $\propto$ Likelihood $\times$ Prior:

$p(\theta \mid \mathbf{Y}) = \dfrac{p(\mathbf{Y} \mid \theta)\, p(\theta)}{p(\mathbf{Y})}$

The marginal likelihood $p(\mathbf{Y})$ (a.k.a. model evidence) is a normalization constant that does not depend on the value of $\theta$. It is the probability of the data under the model, marginalizing over all possible θ’s.

SLIDE 66

The full posterior distribution can be very useful


The mode (MAP estimate) is unrepresentative of the distribution

SLIDE 67

MAP estimates can result in overfitting

SLIDE 68

Markov chain Monte Carlo

  • Goal: approximate/summarize a distribution, e.g. the posterior, with a set of samples
  • Idea: use a Markov chain to simulate the distribution and draw samples from it

SLIDE 69

Gibbs sampling

  • Sampling from a complicated distribution, such as a Bayesian posterior, can be hard.
  • Often, sampling one variable at a time, given all the others, is much easier.
  • Graphical models: the graph structure gives us the Markov blanket

SLIDE 70

Gibbs sampling

  • Update variables one at a time by drawing from their conditional distributions
  • In each iteration, sweep through and update all of the variables, in any order.

SLIDE 78

Gibbs sampling for SBM
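The derivation is shown in the slide's figure; below is a minimal sketch of one (non-collapsed) Gibbs sweep over the class assignments of a directed SBM, holding the current $\mathbf{W}$ fixed and assuming uniform class priors:

```python
import numpy as np

def gibbs_sweep_sbm(Y, z, W, rng):
    """Resample each z_i from its conditional given all other assignments.
    Assumes 0 < W < 1 elementwise."""
    n, L = Y.shape[0], W.shape[0]
    for i in range(n):
        mask = np.arange(n) != i
        logp = np.zeros(L)
        for l in range(L):
            p_out = W[l, z][mask]          # Pr of i's outgoing edges if z_i = l
            p_in = W[z, l][mask]           # Pr of i's incoming edges if z_i = l
            logp[l] = (Y[i, mask] * np.log(p_out)
                       + (1 - Y[i, mask]) * np.log(1 - p_out)).sum()
            logp[l] += (Y[mask, i] * np.log(p_in)
                        + (1 - Y[mask, i]) * np.log(1 - p_in)).sum()
        probs = np.exp(logp - logp.max())  # normalize in a numerically stable way
        z[i] = rng.choice(L, p=probs / probs.sum())
    return z
```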

SLIDE 79

Variational inference

  • Key idea:
  • Approximate the distribution of interest $p(z)$ with another distribution $q(z)$
  • Make $q(z)$ tractable to work with
  • Solve an optimization problem to make $q(z)$ as similar to $p(z)$ as possible, e.g. in KL-divergence

SLIDE 83

Reverse KL, $KL(q \| p)$: blows up if $p$ is small and $q$ isn’t; under-estimates the support. Forwards KL, $KL(p \| q)$: blows up if $q$ is small and $p$ isn’t; over-estimates the support.

Figures due to Kevin Murphy (2012), Machine Learning: A Probabilistic Perspective

SLIDE 84

KL-divergence as an objective function for variational inference

  • Minimizing the KL is equivalent to maximizing the evidence lower bound (ELBO):
    $\mathcal{L}(q) = \underbrace{\mathbb{E}_q\big[ \log p(\mathbf{x}, \mathbf{z}) \big]}_{\text{fit the data well}} + \underbrace{H(q)}_{\text{be flat}}$

SLIDE 88

Mean field variational inference

  • We still need to compute expectations over $\mathbf{z}$
  • However, we have gained the option to restrict $q(\mathbf{z})$ to make these expectations tractable
  • The mean field approach uses a fully factorized $q(\mathbf{z}) = \prod_i q_i(z_i)$
  • The entropy term decomposes nicely: $H(q) = \sum_i H(q_i)$

SLIDE 89

Mean field algorithm

  • Until converged:
  • For each factor $i$:
  • Select variational parameters such that $q_i(z_i) \propto \exp\big( \mathbb{E}_{q_{-i}}\big[ \log p(\mathbf{x}, \mathbf{z}) \big] \big)$
  • Each update monotonically improves the ELBO, so the algorithm must converge

SLIDE 90

Deriving mean field updates for your model

  • Write down the mean field equation explicitly
  • Simplify and apply the expectation
  • Manipulate it until you can recognize it as the log-pdf of a known distribution (hopefully)
  • Reinstate the normalizing constant

SLIDE 91

Mean field vs Gibbs sampling

  • Both mean field and Gibbs sampling iteratively update one variable given the rest
  • Mean field stores an entire distribution for each variable, while Gibbs sampling draws from one.

SLIDE 92

Pros and cons vs Gibbs sampling

  • Pros:
  • Deterministic algorithm, typically converges faster
  • Stores an analytic representation of the distribution, not just samples
  • Non-approximate parallel algorithms
  • Stochastic algorithms can scale to very large data sets
  • No issues with checking convergence
  • Cons:
  • Will never converge to the true distribution, unlike Gibbs sampling
  • Dense representation can mean more communication for parallel algorithms
  • Harder to derive update equations

SLIDE 93

Variational inference algorithm for MMSB (Variational EM)

  • Compute maximum likelihood estimates for the interaction parameters $W_{kk'}$
  • Assume a fully factorized variational distribution for mixed membership vectors and cluster assignments
  • Until converged:
  • For each node:
  • Compute a variational discrete distribution over its latent $z_{p \to q}$ and $z_{q \to p}$ assignments
  • Compute a variational Dirichlet distribution over its mixed membership distribution
  • Maximum likelihood update for $\mathbf{W}$
SLIDE 94

Application of MMSB to Sampson’s Monastery

  • Sampson (1968) studied friendship relationships between novice monks
  • Identified several factions
  • Blockmodel appropriate?
  • Conflicts occurred
  • Two monks expelled
  • Others left

Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems (pp. 33-40).

SLIDE 95

Application of MMSB to Sampson’s Monastery

Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems (pp. 33-40).

Estimated blockmodel

SLIDE 96

Application of MMSB to Sampson’s Monastery

Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems (pp. 33-40).

Estimated blockmodel; least coherent group indicated

SLIDE 97

Application of MMSB to Sampson’s Monastery

Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems (pp. 33-40).

Estimated mixed membership vectors (posterior mean)

SLIDE 98

Application of MMSB to Sampson’s Monastery

Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems (pp. 33-40).

Estimated mixed membership vectors (posterior mean); expelled monks indicated

SLIDE 99

Application of MMSB to Sampson’s Monastery

Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems (pp. 33-40).

Estimated mixed membership vectors (posterior mean); wavering not captured

SLIDE 100

Application of MMSB to Sampson’s Monastery

Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems (pp. 33-40).

Original network (whom do you like?) vs. summary of the network (using the π’s)

SLIDE 101

Application of MMSB to Sampson’s Monastery

Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems (pp. 33-40).

Original network (whom do you like?) vs. denoised network (using the z’s)

SLIDE 102

Scaling up Bayesian inference to large networks

  • Two key strategies: parallel/distributed and stochastic algorithms
  • Parallel/distributed algorithms
  • Compute VB or MCMC updates in parallel
  • Communication overhead may be lower for MCMC
  • Not well understood for MCMC, but works in practice
  • Stochastic algorithms
  • Stochastic variational inference: estimate updates based on subsamples. MMSB: Gopalan et al. (2012)
  • A related subsampling trick for MCMC in latent space models (Raftery et al., 2012)
  • Other general stochastic MCMC algorithms: stochastic gradient Langevin dynamics (Welling and Teh, 2011), Austerity MCMC (Korattikara et al., 2014)

SLIDE 103

Evaluation of unsupervised models

  • Quantitative evaluation
  • Measurable, quantifiable performance metrics
  • Qualitative evaluation
  • Exploratory data analysis (EDA) using the model
  • Human evaluation, user studies,…


SLIDE 104

Evaluation of unsupervised models

  • Intrinsic evaluation
  • Measure inherently good properties of the model
  • Fit to the data (e.g. link prediction), interpretability,…
  • Extrinsic evaluation
  • Study usefulness of model for external tasks
  • Classification, retrieval, part of speech tagging,…


SLIDE 105

Extrinsic evaluation: What will you use your model for?

  • If you have a downstream task in mind, you should probably evaluate based on it!
  • Even if you don’t, you could contrive one for evaluation purposes
  • E.g. use latent representations for: classification, regression, retrieval, ranking, …

SLIDE 106

Posterior predictive checks

  • Sampling data from the posterior predictive distribution allows us to β€œlook into the mind of the model” – G. Hinton

β€œThis use of the word mind is not intended to be metaphorical. We believe that a mental state is the state of a hypothetical, external world in which a high-level internal representation would constitute veridical perception. That hypothetical world is what the figure shows.” Geoff Hinton et al. (2006), A Fast Learning Algorithm for Deep Belief Nets.

SLIDE 107

Posterior predictive checks

  • Does data drawn from the model differ from the observed data, in ways that we care about?
  • PPC:
  • Define a discrepancy function (a.k.a. test statistic) $T(X)$
  • Like a test statistic for a p-value. How extreme is my data set?
  • Simulate new data $X^{(\mathrm{rep})}$ from the posterior predictive
  • Use MCMC to sample parameters from the posterior, then simulate data
  • Compute $T(X^{(\mathrm{rep})})$ and $T(X)$ and compare. Repeat, to estimate $\Pr\big( T(X^{(\mathrm{rep})}) \geq T(X) \mid X \big)$
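A minimal sketch of this recipe with transitivity as the discrepancy function; `posterior_samples` and `simulate` are hypothetical stand-ins for whatever sampler and model are in use:

```python
import numpy as np

def transitivity(Y):
    """Discrepancy function T(X): global clustering coefficient."""
    A = np.triu(Y, 1) + np.triu(Y, 1).T            # force a simple undirected graph
    triangles = np.trace(A @ A @ A) / 6
    deg = A.sum(axis=1)
    triples = (deg * (deg - 1)).sum() / 2          # connected triples (length-2 paths)
    return 3 * triangles / max(triples, 1)

def ppc_pvalue(Y_obs, posterior_samples, simulate, rng, T=transitivity):
    """Estimate Pr(T(X_rep) >= T(X) | X) by simulating from the posterior predictive."""
    t_obs = T(Y_obs)
    t_rep = [T(simulate(params, rng)) for params in posterior_samples]
    return float(np.mean([t >= t_obs for t in t_rep]))
```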

SLIDE 108

Outline

  • Mathematical representations of social networks and generative models
  • Introduction to generative approach
  • Connections to sociological principles
  • Fitting generative social network models to data
  • Example application scenarios
  • Model selection and evaluation
  • Recent developments in generative social network models
  • Dynamic social network models
SLIDE 109

Dynamic social network

  • Relations between people may change over time
  • Need to generalize social network models to account for dynamics

Dynamic social network (Nordlie, 1958; Newcomb, 1961)

SLIDE 110

Dynamic Relational Infinite Feature Model (DRIFT)

  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.
  • Models networks as they evolve over time, by way of changing latent features

(Figure: binary latent features, Cycling, Fishing, Running, Waltz, Tango, Salsa, for Alice, Bob, Claire, changing over time.)

SLIDE 115

Dynamic Relational Infinite Feature Model (DRIFT)

  • Models networks as they evolve over time, by way of changing latent features
  • HMM dynamics for each actor/feature pair (factorial HMM)
  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.
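A minimal sketch of the factorial-HMM dynamics: each (actor, feature) pair follows its own two-state Markov chain. The transition probabilities `a` (feature turns on) and `b` (feature stays on) are assumed symbols for illustration, not the paper's notation:

```python
import numpy as np

def simulate_feature_chains(n_actors, n_features, n_steps, a, b, rng):
    """Z[t, i, k] = 1 if actor i has feature k at time t."""
    Z = np.zeros((n_steps, n_actors, n_features), dtype=int)
    Z[0] = rng.random((n_actors, n_features)) < a        # initial features
    for t in range(1, n_steps):
        p_on = np.where(Z[t - 1] == 1, b, a)             # persistence vs. birth
        Z[t] = rng.random((n_actors, n_features)) < p_on
    return Z

Z = simulate_feature_chains(n_actors=3, n_features=6, n_steps=5,
                            a=0.1, b=0.8, rng=np.random.default_rng(0))
```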

SLIDE 116

Bayesian Inference for DRIFT

  • Markov chain Monte Carlo inference
  • Blocked Gibbs sampler
  • Forward filtering, backward sampling to jointly sample each actor's feature chains
  • β€œSlice sampling” trick with the stick-breaking construction of the IBP to adaptively truncate the number of features but still perform exact inference
  • Metropolis-Hastings updates for the W's
  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.

SLIDE 117

Synthetic Data: Inference on Z’s

  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.

SLIDE 118

Synthetic Data: Predicting the Future

  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.

SLIDE 119

Enron Email Data: Predicting the Future

  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.

SLIDE 120

Enron Email Data: Predicting the Future

  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.

SLIDE 121

Enron Email Data: Missing Data Imputation

  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.

SLIDE 122

Enron Email Data: Edge Probability Over Time

  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.

SLIDE 123

Quantitative Results

  • J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. AISTATS 2011.

SLIDE 124

Hidden Markov dynamic network models

  • Most work on dynamic network modeling assumes hidden Markov structure

– Latent variables and/or parameters follow Markov dynamics
– Graph snapshot at each time generated using a static network model, e.g. stochastic block model or latent feature model as in DRIFT
– Has been used to extend SBMs to dynamic models (Yang et al., 2011; Xu and Hero, 2014)

SLIDE 125

Beyond hidden Markov networks

  • Hidden Markov structure is tractable, but not a very realistic assumption in social interaction networks

– Interaction between two people does not influence future interactions

  • Proposed model: allow the current graph to depend on the current parameters and the previous graph
  • Proposed inference procedure does not require MCMC

– Scales to ~1000 nodes

SLIDE 126

Stochastic block transition model

  • Generate the graph at the initial time step using an SBM
  • Place a Markov model on $\Pi^{t|0}, \Pi^{t|1}$
  • Main idea: parameterize each block $(l, l')$ with two probabilities

– Probability of forming a new edge: $\pi_{ll'}^{t|0} = \Pr\big( y_{ij}^{(t)} = 1 \mid y_{ij}^{(t-1)} = 0 \big)$
– Probability of an existing edge re-occurring: $\pi_{ll'}^{t|1} = \Pr\big( y_{ij}^{(t)} = 1 \mid y_{ij}^{(t-1)} = 1 \big)$
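A minimal sketch of one SBTM transition given the previous snapshot; `pi0` and `pi1` stand in for the block probability matrices $\Pi^{t|0}$ and $\Pi^{t|1}$:

```python
import numpy as np

def sbtm_step(Y_prev, z, pi0, pi1, rng):
    """One SBTM transition: pi0[l, l'] = Pr(new edge forms),
    pi1[l, l'] = Pr(existing edge recurs), by block pair."""
    n = Y_prev.shape[0]
    P_new = pi0[z][:, z]                      # Pr(edge at t | absent at t-1)
    P_stay = pi1[z][:, z]                     # Pr(edge at t | present at t-1)
    P = np.where(Y_prev == 1, P_stay, P_new)
    Y = (rng.random((n, n)) < P).astype(int)
    np.fill_diagonal(Y, 0)
    return Y
```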

SLIDE 127

Application to Facebook wall posts

  • Fit dynamic SBMs to the network of Facebook wall posts

– ~700 nodes, 9 time steps, 5 classes

  • How accurately do the hidden Markov SBM and the SBTM replicate edge durations in the observed network?

– Simulate networks from both models using estimated parameters
– Hidden Markov SBM cannot replicate long-lasting edges in sparse blocks

SLIDE 128

Behaviors of different classes

  • SBTM retains the interpretability of the SBM at each time step
  • Q: Do different classes behave differently in how they form edges?
  • A: Only for the probability of existing edges re-occurring
  • New insight revealed by having separate probabilities in the SBTM
SLIDE 129

Information diffusion in text-based cascades

(Figure: a cascade of posting events at times t = 0, 1, 1.5, 2, 3.5.)

  • Temporal information
  • Content information
  • Network is latent
  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 130

HawkesTopic model for text-based cascades

  • Mutually exciting nature: a posting event can trigger future events
  • Content cascades: the content of a document should be similar to the document that triggered its publication
  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 131

Modeling posting times

The mutually exciting nature is captured via the Multivariate Hawkes Process (MHP) [Liniger 09]. For an MHP, the intensity process $\lambda_v(t)$ for node $v$ takes the form:

$\lambda_v(t) = \underbrace{\mu_v}_{\text{base intensity}} + \underbrace{\sum_{e\,:\,t_e < t} B_{v_e, v}\, \Delta(t - t_e)}_{\text{influence from previous events}}$

  • $B_{u,v}$: influence strength from $u$ to $v$
  • $\Delta(\cdot)$: probability density function of the delay distribution
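A minimal sketch of this intensity with an exponential delay density $\Delta$ (a common modeling choice; others are possible):

```python
import numpy as np

def intensity(v, t, events, mu, B, decay=1.0):
    """lambda_v(t) = mu_v + sum over past events e of B[source(e), v] * Delta(t - t_e),
    with Delta an Exponential(decay) density. events: list of (time, source) pairs."""
    rate = mu[v]                                         # base intensity
    for t_e, u in events:
        if t_e < t:                                      # influence from previous events
            rate += B[u, v] * decay * np.exp(-decay * (t - t_e))
    return rate

mu = np.array([0.2, 0.1])
B = np.array([[0.0, 0.5],
              [0.3, 0.0]])
print(intensity(v=1, t=2.0, events=[(0.0, 0), (1.0, 1)], mu=mu, B=B))
```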

SLIDE 132

Clustered Poisson process interpretation

  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 133

Generating documents

  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 134

Experiments for HawkesTopic

  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 135

Results: EventRegistry

  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 136

Results: EventRegistry

  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 137

Results: ArXiv

  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 138

Results: ArXiv

  • X. He, T. Rekatsinas, J. R. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML 2015.

SLIDE 139

Summary

  • Generative models provide a powerful mechanism for modeling social networks
  • Latent variable models offer flexible yet interpretable models motivated by sociological principles
  • Latent space model
  • Stochastic block model
  • Mixed-membership stochastic block model
  • Latent feature model
  • Many recent advancements in generative models for social networks
  • Dynamic networks, cascades, joint modeling with text
SLIDE 140

Thank you!

SLIDE 141

The giant component

  • Depending on the quantity $Np$ (the expected node degree), a β€œgiant” connected component may emerge
