Configuring random graph models with fixed degree sequences


SLIDE 1

Configuring random graph models with fixed degree sequences

Daniel B. Larremore

June 21, 2017 NetSci

Santa Fe Institute

larremore@santafe.edu @danlarremore

SLIDE 2

Brief note on references: This talk does not include references to literature, which are numerous and important. Most (but not all) references are included in the arXiv paper: arxiv.org/abs/1608.00607

SLIDE 3
Stochastic models, sets, and distributions

  • a generative model is just a recipe: choose parameters → make the network
  • a stochastic generative model is also just a recipe: choose parameters → draw a network
  • since a single stochastic generative model can generate many networks, the model itself corresponds to a set of networks.
  • and since the generative model itself is some combination or composition of random variables, a random graph model is a set of possible networks, each with an associated probability, i.e., a distribution.

this talk: configuration models: uniform distributions over networks w/ fixed deg. seq.

SLIDE 4

Why care about random graphs w/ fixed degree sequence?

Since many networks have broad or peculiar degree sequences, these random graph distributions are commonly used for:

  • Hypothesis testing: Can a particular network’s properties be explained by the degree sequence alone?
  • Modeling: How does the degree distribution affect the epidemic threshold for disease transmission?
  • Null model for Modularity, Stochastic Block Model: Compare an empirical graph with (possibly) community structure to the ensemble of random graphs with the same vertex degrees.

SLIDE 5

Stub Matching to draw from the config. model

the standard algorithm: draw from the distribution by sequential “Stub Matching”

  • 1. initialize each node n with k_n half-edges or stubs.
  • 2. choose two stubs uniformly at random and join to form an edge; repeat until no stubs remain.

[Figure: stub matching for the degree sequence k = {1, 2, 2, 1}, with stubs numbered 1 to 6.]
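The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's own code: the function name `stub_matching` and the edge-list return format are assumptions made here. Shuffling all stubs uniformly and pairing consecutive entries is equivalent to repeatedly choosing two of the remaining stubs uniformly at random.

```python
import random

def stub_matching(degrees, rng=random):
    """Draw one graph by sequential stub matching (hypothetical helper).

    degrees[n] is the degree k_n of node n. The result is a list of
    (u, v) node pairs and may contain self-loops and multiedges.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("degree sequence must have an even sum")
    # Step 1: node n contributes degrees[n] stubs.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    # Step 2: a uniform shuffle, then pair consecutive stubs into edges.
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

# One draw for the degree sequence on the slide.
edges = stub_matching([1, 2, 2, 1], random.Random(0))
```

Every draw preserves the degree sequence, but nothing prevents a node's two stubs from being paired with each other (a self-loop) or two node pairs from coinciding (a multiedge).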

SLIDE 6

Stub Matching to draw from the config. model

[Figure: two draws by stub matching, labeled draw #1 and draw #2.]

SLIDE 7

Are these two different networks? or the same network? The rest of this talk: the answer matters. Are stubs distinguishable or not?

SLIDE 8

The distribution according to stub-matching

When we draw a graph using stub matching, this is the set of graphs that we uniformly sample. 8 of the graphs are simple, while the other 7 have self-loops or multiedges. We therefore say that stub matching uniformly samples the space of stub-labeled loopy multigraphs. Note, however, that this is not a uniform sample over adjacency matrices (rows).

[Figure, panel d: the stub-labeled space.]
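The counts on this slide can be checked by brute force for the degree sequence k = {1, 2, 2, 1} used earlier: six stubs admit 15 perfect matchings. The sketch below enumerates them, counts the simple ones, and then forgets stub labels to see how many matchings map to each vertex-labeled graph. All helper names (`all_matchings`, `vertex_graph`, `is_simple`) are illustrative assumptions.

```python
from collections import Counter

def all_matchings(stubs):
    """Yield every perfect matching of a list of distinguishable stubs."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in all_matchings(remaining):
            yield [(first, partner)] + tail

degrees = [1, 2, 2, 1]
# owner[s] is the node that stub s belongs to; stubs are labeled 0..5 here.
owner = [node for node, k in enumerate(degrees) for _ in range(k)]

def vertex_graph(matching):
    """Forget stub labels: return the sorted multiset of node pairs."""
    return tuple(sorted(tuple(sorted((owner[a], owner[b]))) for a, b in matching))

def is_simple(graph):
    no_loops = all(u != v for u, v in graph)
    no_multiedges = len(set(graph)) == len(graph)
    return no_loops and no_multiedges

matchings = list(all_matchings(list(range(len(owner)))))
graphs = [vertex_graph(m) for m in matchings]
n_simple = sum(is_simple(g) for g in graphs)
multiplicity = Counter(graphs)

print(len(matchings), n_simple, len(matchings) - n_simple)  # 15 8 7
print(sorted(multiplicity.values()))  # [1, 2, 2, 2, 4, 4]
```

The 15 stub-labeled graphs collapse to 6 distinct vertex-labeled graphs with unequal multiplicities, which is why a uniform draw over stub matchings is not uniform over adjacency matrices.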

SLIDE 9

The importance of uniform distributions

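goal: provably uniform sampling for all eight spaces: loopy{0,1} x multigraph{0,1} x {stub-,vertex-}

[Figure: the graph spaces (simple graphs: no multiedges, no self-loops; multigraphs; loopy graphs; loopy multigraphs) and the labelings (panel d: stub-labeled; panel c: vertex-labeled; panel b: graph isomorph.). Removing stub labels takes stub-labeled graphs to vertex-labeled graphs; removing vertex labels takes vertex-labeled graphs to graph isomorph.]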

SLIDE 10
Choosing a space for your configuration model

Question 1: loops? Question 2: multiedges?

example: Are loops reasonable? Would a loop make sense? [tennis matches: no | author citations: yes]

Outcomes: simple (skip Q3) | loopy | multigraph | loopy multigraph

Question 3: vertex- or stub-labeled? [stub-labeled | vertex-labeled]

These configurations are . . .
  • two graphs
  • one graph, drawn two ways
  • one valid; one nonsensical
  • three graphs
  • one graph, drawn three ways
  • one valid; two nonsensical

SLIDE 11

Sampling from configuration models

stub matching samples uniformly from stub-labeled loopy multigraphs

NB: Sampling is easy. Provably uniform sampling is not!

for other spaces, define a Markov chain over the “graph of graphs” G → each vertex is a graph, and directed edges are “double-edge swaps”

[Figure: a double-edge swap; swap this way or the other way]
SLIDE 12

Markov chains for uniform sampling

Prove that:

  • the transition matrix is doubly stochastic (G is regular)
  • the chain is irreducible (G is strongly connected)
  • the chain is aperiodic (G is aperiodic; gcd of all cycles is one)

Straightforward for stub-labeled loopy multigraphs.

Choose two edges uniformly at random and swap them. Accept all swaps and treat each resulting graph as a sample from the uniform distribution. (Each node in G has degree m-choose-2.)

Prove that:

  • the transition matrix is doubly stochastic
  • the chain is irreducible
  • the chain is aperiodic

Easy for stub-labeled multigraphs.

Choose two edges uniformly at random and swap them. Reject swaps that create a self-loop and resample the current graph. (Think of any “rejected swap” as a self-loop in G.)

Easy for simple graphs.

Choose two edges uniformly at random and swap them. Reject swaps that create a self-loop or multiedge and resample the current graph. (Again, treat “rejected swaps” as self-loops in G.)
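The three stub-labeled chains described on this slide differ only in which swaps they reject, so one step function with two flags can sketch all of them. This is an illustrative sketch under assumptions: the names `double_edge_swap_step` and `sample_chain`, the edge-list representation, and the fixed step count are choices made here, and nothing below addresses how many steps are needed for mixing.

```python
import random

def double_edge_swap_step(edges, allow_loops, allow_multiedges, rng):
    """One step of the double-edge swap chain (hypothetical helper).

    `edges` is a list of [u, v] pairs, modified in place. Returns True
    if the swap was accepted. On rejection the graph is left unchanged,
    and the unchanged graph counts as the next state of the chain.
    """
    i, j = rng.sample(range(len(edges)), 2)  # two distinct edges
    u, v = edges[i]
    x, y = edges[j]
    if rng.random() < 0.5:  # swap this way or the other way
        x, y = y, x
    new_a, new_b = (u, x), (v, y)

    if not allow_loops and (u == x or v == y):
        return False
    if not allow_multiedges:
        others = {frozenset(e) for k, e in enumerate(edges) if k not in (i, j)}
        if (frozenset(new_a) in others or frozenset(new_b) in others
                or frozenset(new_a) == frozenset(new_b)):
            return False
    edges[i], edges[j] = list(new_a), list(new_b)
    return True

def sample_chain(edges, n_steps, allow_loops=False, allow_multiedges=False, seed=None):
    """Run the chain for n_steps and return the final edge list."""
    rng = random.Random(seed)
    state = [list(e) for e in edges]
    for _ in range(n_steps):
        double_edge_swap_step(state, allow_loops, allow_multiedges, rng)
    return state

# Simple graphs: reject swaps that create a self-loop or a multiedge.
start = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
final = sample_chain(start, 2000, seed=1)
```

Setting `allow_loops=True, allow_multiedges=True` accepts every swap (loopy multigraphs), and `allow_loops=False, allow_multiedges=True` rejects only self-loops (multigraphs). The flags also permit loops without multiedges, but that combination is not covered by the three cases on this slide.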

SLIDE 13

Markov chains for uniform sampling

For vertex-labeled graphs, we inherit the strong connectedness of G as well as its aperiodicity. However, ensuring that the Markov chain has a uniform distribution as its stationary distribution requires that we adjust transition probabilities. These asymmetric modifications to transition probabilities depend on the number of self-loops and multiedges in the current state.

Intuition: decrease outflow (and increase resampling) of graphs with multiedges or self-loops.

[Figure, panels a and b: unadjusted transitions P = 1/3, 2/3; adjusted transitions P = 1/2, 1/2]

SLIDE 14

Stub-labeled loopy graphs: not connected

counterexample: no double-edge swap connects these two graphs!

but see Nishimura 2017 (arxiv:1701.04888) - The connectivity of graphs of graphs with self-loops and a given degree sequence

SLIDE 15

  • showed that these spaces are far from equivalent, even in thermodynamic lim.
  • introduced (and just outlined) provably uniform sampling methods.

Do {stub labels, self-loops, multiedges} matter for how we sample CMs? yes

Do {stub labels, self-loops, multiedges} matter in applications of CMs? next…
→ hypothesis testing
→ null model for modularity

SLIDE 16

Hypothesis testing

Do barn swallows tend to associate with other swallows of similar color?

Data: bird interactions, bird colors.

Compute color assortativity [correlation over edges]
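As a rough sketch of the statistic and the test, the code below computes a Pearson correlation of a scalar node attribute across edge endpoints and compares an observed value to draws from a null distribution. The function names, the dictionary of node values, and the one-sided definition of the p-value are assumptions made for illustration; the talk does not specify them.

```python
def assortativity(edges, value):
    """Pearson correlation of a scalar node attribute over edges.

    Each undirected edge (u, v) contributes both (u, v) and (v, u),
    so the statistic is symmetric in the two endpoints. Undefined
    (division by zero) if every endpoint has the same value.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var

def empirical_p_value(observed, null_values):
    """Fraction of null draws at least as large as the observed value."""
    return sum(r >= observed for r in null_values) / len(null_values)

# Toy data: two light birds interact, two dark birds interact.
color = {0: 1.0, 1: 1.0, 2: 5.0, 3: 5.0}
r_observed = assortativity([(0, 1), (2, 3)], color)
```

In practice `null_values` would hold the same statistic computed on many graphs sampled uniformly from the chosen configuration model space.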

SLIDE 17
Choose a graph space for barn swallows

Question 1: loops? Nonsensical
Question 2: multiedges? Reasonable [in our data, in fact!]
Question 3: vertex- or stub-labeled? vertex-labeled
[Why? If we interacted today and yesterday, a randomization in which my today interacts with your yesterday is nonsensical!]

This should be modeled as a vertex-labeled multigraph.

SLIDE 18

Assortative pairing of barn swallows

[Figure: null distributions of r in four panels (x-axis: r, from −0.6 to 0.6; y-axis: density), with the p-value shown in each panel:
  • Stub-labeled, simple graphs: p = 0.001
  • Stub-labeled, multigraphs: p = 0.608
  • Vertex-labeled, simple graphs: p = 0.001
  • Vertex-labeled, multigraphs: p = 0.852]

note: for simple graphs and statistics based on the graph adjacency matrix, stub-labeled ≡ vertex-labeled

Uniform sampling means we can compare empirical value to null distribution to draw scientific conclusions.

The choice of graph space matters: careful choice & sampling can flip conclusions!

Sanity check: should be = for simple

NONE of these is centered at zero.

Correct space is meaningfully different.

SLIDE 19


Community Detection

Are there groups of vertices that tend to associate with each other more than we expect by chance? Data: collaborations among geometers. Maximize modularity, e.g.

SLIDE 20

Coauthorship communities (vertex-labeled multigraph)

Modularity: the null term is the expected number of edges in a random degree-preserving null model; specifically, in the stub-labeled loopy multigraph CM

Generic Modularity: the null term is the expected number of edges in any random degree-preserving null model
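For reference, the two quantities can be sketched in standard notation. The notation is assumed here rather than taken from the slides: A_ij is the adjacency matrix, k_i the degree of vertex i, m the number of edges, g_i the community of vertex i, and P_ij the expected number of edges between i and j under whichever null model is chosen. The term k_i k_j / 2m is the usual approximation to that expectation in the stub-labeled loopy multigraph configuration model.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(g_i, g_j)

Q_{\text{generic}} = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - P_{ij} \right] \delta(g_i, g_j)
```

For a space other than stub-labeled loopy multigraphs, P_ij generally has no such closed form and could be estimated by averaging over uniformly sampled graphs.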

[Figure: Similarity of Q and Qgeneric communities. x-axis: number of communities (2 to 10); y-axis: NMI between Eq(6) and Eq(8) partitions (0.4 to 1).]

same community detection algorithm, same initial state, different results

SLIDE 21

Advanced edge swaps

  • a. reversing a directed triangle: required for graph-of-graphs irreducibility in directed networks
  • b. connectivity preserving edge swap: useful if you wish to sample only networks that have a fixed number of connected components
  • c. 3 edge swap: other swaps have been proposed, e.g. to improve mixing time

Proofs, samplers, the history of the configuration model, and applications in the paper

SLIDE 22

The point: graph spaces & stub labels matter, in theory and in practice. Recognizing this exposes a number of unrecognized & unsolved problems. Provably uniform sampling methods exist—some have existed for decades!

SLIDE 23

danlarremore.com/configuration models

Johan Ugander Bailey Fosdick Joel Nishimura

Configuring Random Graph Models with Fixed Degree Sequences Fosdick, Larremore, Nishimura, Ugander. To Appear in SIAM Review. arxiv.org/abs/1608.00607

SLIDE 24

Thank you

@danlarremore larremore@santafe.edu