What is the shell distribution of a graph telling us? Vishesh Karwa



SLIDE 1

What is the shell distribution of a graph telling us?

Vishesh Karwa

Based on joint work with Michael J. Pelsmajer (IIT), Sonja Petrović (IIT), Despina Stasi (Univ of Cyprus/IIT), Dane Wilburne (IIT)
arXiv:1410.7357 - v2 soon. (Monday?)

Carnegie Mellon University Harvard University

AMS Sectional Meeting Oct 4, 2015

vishesh.karwa@gmail.com (CMU/Harvard) Cores ERGM Oct 2015 1 / 19

SLIDE 2

Outline

1 Motivation
2 Shell Distribution ERGM
3 Inference in the Shell Distribution ERGM
4 Application to Real life Example
5 Open Problems
6 The End

SLIDE 3

Motivation

The k-core decomposition of a graph

Definition (Seidman, 1983). The k-core of a graph G is the maximal subgraph in which every vertex has degree at least k. The shell index of a vertex i is the highest k such that i is contained in the k-core of G.
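The definition can be made concrete with a peeling procedure. The following is a minimal, unoptimized sketch (quadratic time, not a linear-time implementation); the function name `shell_indices` and the edge-list input format are assumptions made for illustration.

```python
def shell_indices(n, edges):
    """Return the shell index of each vertex 0..n-1 of an undirected simple graph.

    Repeatedly removes a vertex of minimum remaining degree; the running
    maximum of those degrees is the shell index of the removed vertex.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    shell = [0] * n
    remaining = set(range(n))
    k = 0
    while remaining:
        v = min(remaining, key=lambda x: len(adj[x]))
        k = max(k, len(adj[v]))
        shell[v] = k
        for u in adj[v]:
            adj[u].discard(v)  # v no longer counts toward u's degree
        remaining.remove(v)
    return shell


# Triangle 0-1-2, pendant vertex 3 attached to 2, isolated vertex 4.
example = shell_indices(5, [(0, 1), (1, 2), (0, 2), (2, 3)])
print(example)  # [2, 2, 2, 1, 0]
```

In the example, the triangle is the 2-core, the pendant vertex survives only in the 1-core, and the isolated vertex has shell index 0.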

SLIDE 4

Motivation

“Modeling” a graph via its Core decomposition

A core decomposition has been used as a descriptive tool to explain many properties of observed graphs, such as:

1 Core-periphery or "rich club" structure
2 Importance of a node in a network: the robust degree of a node
3 Visualization of network topology by peeling it into layers

Fast computation of shell indices; Interesting applications and heuristic studies.

SLIDE 5

Motivation

Why do we care?

No clear understanding of what the core structure really represents:

1 1983, 2006: The shell index measures the importance of a node.
2 2007: Wait, it does not.
3 2010: But wait, if you take the degrees into account, it does...

How do we make this question precise? What properties of a network does the core structure really capture?

Goal: How do we make the core decomposition a tool for statistical modeling rather than descriptive analysis?

SLIDE 6

Motivation

Summarizing the k-core decomposition

Recall: the shell index of a vertex i is the highest k such that i is contained in the k-core.
The shell sequence is the sequence of shell indices of the nodes.
The shell distribution is the histogram of the shell sequence.
$n_s(g) = \{0, 2, 3, 13, 0, 0, \ldots, 0\}$
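As a small illustration of the step from shell sequence to shell distribution, here is a sketch that tabulates the histogram. The function name `shell_distribution` and the 18-vertex shell sequence are hypothetical; the sequence was chosen only so that its histogram matches the one shown above.

```python
from collections import Counter


def shell_distribution(shell_sequence):
    """Histogram (n_0, ..., n_{n-1}); n_k is the number of vertices with shell index k."""
    n = len(shell_sequence)
    counts = Counter(shell_sequence)
    return [counts[k] for k in range(n)]


# Hypothetical shell sequence: 2 vertices in shell 1, 3 in shell 2, 13 in shell 3.
sequence = [1] * 2 + [2] * 3 + [3] * 13
distribution = shell_distribution(sequence)
print(distribution[:4])  # [0, 2, 3, 13]
```

The remaining entries of `distribution` are zero, since no vertex has shell index above 3.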

SLIDE 7

Shell Distribution ERGM

Enter: Exponential random graph models

$$P(G, \theta) = \exp\left\{ \sum_{i=1}^{k} \theta_i t_i(g) - \psi(\theta) \right\}$$

ERGMs are natural statistical tools to model networks through their summary statistics.
Large, growing literature on ERGMs: they possess both good and bad (but fixable) properties, see Rinaldo et al. [2009].
Embed the core structure in the ERGM framework and study its properties.

SLIDE 8

Shell Distribution ERGM

The Family of Shell distribution ERGMs

$G_{n,m} := \{g : \mathrm{dgen}(g) = m\}$*
$m$ = degeneracy parameter
$\{n_0(g), \ldots, n_i(g), \ldots, n_{m-1}(g)\}$ = shell distribution
$p_i$ = shell index parameter

$$P(G_{n,m}) = P(G = g;\, p, m) = \varphi(p) \prod_{i=0}^{m-1} p_i^{n_i(g)}$$

For a fixed value of m, this defines a submodel.

*Can also define the model on $G_{n,\le m} = \{g : \mathrm{dgen}(g) \le m\}$.

SLIDE 9

Shell Distribution ERGM

Exponential family form

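$\theta = (\theta_0, \ldots, \theta_{m-1})$ = vector of natural parameters, where $\theta_i = \log \frac{p_i}{p_m}$.

$$P(G = g) = \exp\left\{ \sum_{i=0}^{m-1} n_i(g)\,\theta_i - \psi(\theta) \right\},$$

where

$$\psi(\theta) = \log \sum_{g \in G_{n,m}} \exp\left\{ \sum_{j=0}^{m-1} n_j(g)\,\theta_j \right\}.$$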
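For very small n, the normalizing term ψ(θ) can be evaluated by brute-force enumeration, which makes the definition concrete. The sketch below enumerates all graphs on n = 4 labeled vertices; the helper names `shell_indices` and `log_partition` are hypothetical, and the approach is illustrative only since it does not scale beyond a handful of vertices.

```python
import itertools
import math


def shell_indices(n, edges):
    """Shell index of each vertex, by peeling a minimum-degree vertex at a time."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    shell, remaining, k = [0] * n, set(range(n)), 0
    while remaining:
        v = min(remaining, key=lambda x: len(adj[x]))
        k = max(k, len(adj[v]))
        shell[v] = k
        for u in adj[v]:
            adj[u].discard(v)
        remaining.remove(v)
    return shell


def log_partition(n, m, theta):
    """psi(theta): log of the sum, over graphs with degeneracy exactly m,
    of exp(sum_j n_j(g) * theta_j) with j = 0..m-1."""
    dyads = list(itertools.combinations(range(n), 2))
    exponents = []
    for bits in itertools.product((0, 1), repeat=len(dyads)):
        edges = [d for d, b in zip(dyads, bits) if b]
        shells = shell_indices(n, edges)
        if max(shells) != m:  # keep only graphs in G_{n,m}
            continue
        stats = [shells.count(j) for j in range(m)]  # (n_0, ..., n_{m-1})
        exponents.append(sum(t * s for t, s in zip(theta, stats)))
    top = max(exponents)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(e - top) for e in exponents))


# With theta = 0 every graph has weight 1, so exp(psi) counts the graphs in G_{4,2}.
psi_zero = log_partition(4, 2, [0.0, 0.0])
print(round(math.exp(psi_zero)))  # 25
```

At θ = 0 the value exp(ψ) is simply the number of labeled graphs on 4 vertices with degeneracy exactly 2: all 64 graphs, minus the 38 forests, minus the complete graph.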

SLIDE 10

Shell Distribution ERGM

Exponential family form

Same degree distribution, different shell distribution.
Erdős-Rényi is not a submodel.
Log-linear model only at the "atomic" level.

SLIDE 11

Inference in the Shell Distribution ERGM

Three Inference tasks on ERGMs

1 Characterize the marginal polytope (the convex hull of the sufficient statistics) and the conditions for existence of the MLE.
2 Sample random graphs from the model, for estimation of the MLE or Bayesian inference.
3 Sample graphs from the fiber (the set of all graphs with a fixed shell distribution). Useful for goodness-of-fit testing and for understanding the space of graphs with a fixed shell distribution.

SLIDE 12

Inference in the Shell Distribution ERGM

Marginal Polytope of the model $P(G_{n,\le m})$

The unrestricted model $P(G_{n,\le n-1})$

Theorem. The marginal polytope of $P(G_{n,\le n-1})$ is a dilate of a simplex. All realizable lattice points lie on the boundary of this polytope. The MLE of $P(G_{n,n-1})$ never exists for a sample of size 1.

SLIDE 13

Inference in the Shell Distribution ERGM

Marginal Polytope of the model $P(G_{n,\le m})$

The restricted model $P(G_{n,\le m})$

Theorem. The marginal polytope of $P(G_{n,\le m})$ is a dilate of a simplex. If $n > 2m$, the polytope has a non-empty interior and the MLE may exist.

SLIDE 14

Inference in the Shell Distribution ERGM

Marginal Polytope of the model $P(G_{n,\le m})$

Note: In general, $P(G_{n,=m})$ is better behaved than $P(G_{n,\le m})$.

SLIDE 15

Inference in the Shell Distribution ERGM

An MCMC algorithm to Sample from the model

MCMC scheme: TNT (tie-no-tie) sampler [Hunter et al., Caimo-Friel]

Instead of selecting a dyad at random and flipping its state, the sampler first selects either the set of edges or the set of non-edges and then toggles one dyad from that set. This re-weights the probability of selecting each dyad and gives better mixing properties.

Probability of accepting:

$$\pi = \min\left\{ 1,\ \prod_{i} p_i^{\,n_i(g') - n_i(g)} \cdot \frac{P(g' \to g)}{P(g \to g')} \right\}.$$

Issue: computing $n_i(g') - n_i(g)$.
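To make the acceptance rule concrete, here is a minimal Metropolis-Hastings sketch. It is not the TNT sampler: it swaps in a uniform dyad-toggle proposal, which is symmetric, so the proposal ratio equals 1 and only the product over the $p_i$ remains. Further assumptions: the chain runs on graphs with degeneracy at most m, the vector `p` has strictly positive entries $p_0, \ldots, p_m$, and all function names are hypothetical. The shell distribution is recomputed from scratch at every step, which is exactly the cost flagged as the issue above.

```python
import math
import random


def shell_indices(n, edges):
    """Shell index of each vertex, by peeling a minimum-degree vertex at a time."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    shell, remaining, k = [0] * n, set(range(n)), 0
    while remaining:
        v = min(remaining, key=lambda x: len(adj[x]))
        k = max(k, len(adj[v]))
        shell[v] = k
        for u in adj[v]:
            adj[u].discard(v)
        remaining.remove(v)
    return shell


def shell_histogram(n, edges, m):
    """Return (n_0, ..., n_m), or None if the degeneracy of the graph exceeds m."""
    shells = shell_indices(n, edges)
    if max(shells) > m:
        return None
    hist = [0] * (m + 1)
    for s in shells:
        hist[s] += 1
    return hist


def mh_step(n, edges, hist, log_p, m, rng):
    """One Metropolis-Hastings step with a uniform (symmetric) dyad toggle."""
    u, v = sorted(rng.sample(range(n), 2))
    proposal = edges ^ {(u, v)}  # add the edge if absent, remove it if present
    new_hist = shell_histogram(n, proposal, m)
    if new_hist is None:  # proposal leaves the support: reject
        return edges, hist
    log_ratio = sum(lp * (new - old) for lp, new, old in zip(log_p, new_hist, hist))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal, new_hist
    return edges, hist


def run_chain(n, m, p, steps, seed=0):
    """Run the chain from the empty graph and return the final state."""
    rng = random.Random(seed)
    log_p = [math.log(x) for x in p]
    edges = set()
    hist = shell_histogram(n, edges, m)
    for _ in range(steps):
        edges, hist = mh_step(n, edges, hist, log_p, m, rng)
    return edges, hist


final_edges, final_hist = run_chain(n=6, m=2, p=[0.2, 0.3, 0.5], steps=500)
```

A local update of the shell indices after a single toggle would avoid the full recomputation in `shell_histogram`; the sketch deliberately leaves that out.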

SLIDE 16

Inference in the Shell Distribution ERGM

Understanding the structure of the fiber

Algorithm to sample graphs with fixed Shell Distribution

1 Constructs an arbitrary graph with a given shell distribution.
2 Does so with positive probability for each graph in the fiber.
3 Fast graph discovery.

Bounds on complementary sufficient statistics in the fiber, e.g.:

Proposition. The maximum number of triangles for a graph with sorted shell sequence $s_1 \le \ldots \le s_n = m$ is

$$\binom{m}{3} + \sum_{i=1}^{n-m} \binom{s_i}{2}.$$
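The bound is easy to evaluate, and a simple construction can be used to check it on an example. The sketch below is not the authors' sampling algorithm: `extremal_graph` is a hypothetical construction that makes the last m vertices a clique and joins each earlier vertex i to the last $s_i$ vertices. It assumes the shell sequence is realizable in the sense that at least m + 1 vertices have shell index m. All function names are assumptions.

```python
from itertools import combinations
from math import comb


def max_triangles_bound(shells):
    """Evaluate C(m, 3) + sum_{i=1}^{n-m} C(s_i, 2) for the sorted shell sequence."""
    s = sorted(shells)
    n, m = len(s), s[-1]
    return comb(m, 3) + sum(comb(si, 2) for si in s[: n - m])


def extremal_graph(shells):
    """Hypothetical construction: clique on the last m vertices, and each
    earlier vertex i joined to the last s_i vertices."""
    s = sorted(shells)
    n, m = len(s), s[-1]
    if n <= m or s[n - m - 1] != m:
        raise ValueError("need at least m + 1 vertices with shell index m")
    edges = set(combinations(range(n - m, n), 2))
    for i in range(n - m):
        edges.update((i, j) for j in range(n - s[i], n))
    return n, edges


def count_triangles(n, edges):
    """Count triangles by checking every vertex triple a < b < c."""
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )


def shell_indices(n, edges):
    """Shell index of each vertex, by peeling a minimum-degree vertex at a time."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    shell, remaining, k = [0] * n, set(range(n)), 0
    while remaining:
        v = min(remaining, key=lambda x: len(adj[x]))
        k = max(k, len(adj[v]))
        shell[v] = k
        for u in adj[v]:
            adj[u].discard(v)
        remaining.remove(v)
    return shell


# Shell sequence with distribution (0, 2, 3, 13) on 18 vertices.
shells = [1] * 2 + [2] * 3 + [3] * 13
n, edges = extremal_graph(shells)
print(max_triangles_bound(shells), count_triangles(n, edges))  # 34 34
```

For this shell sequence the constructed graph has the requested shell indices and its triangle count equals the value of the bound, which is consistent with the proposition.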

SLIDE 17

Application to Real life Example

Application to Sampson Data

Sampson data set: 18 monks in a New England monastery.
$n_S(g) = (0, 2, 3, 13)$
$\hat{\theta}_{\mathrm{mle}} = (-7.95, 2.79, 0.91)$, estimated using MCMC MLE.
$\hat{p}_{\mathrm{mle}} = (0.00, 0.82, 0.13, 0.05)$.
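The two estimates are linked by $\theta_i = \log(p_i / p_m)$. A minimal sketch of the inverse map follows, assuming the $p_i$ sum to one (an assumption suggested by the reported values rather than stated explicitly); `natural_to_mean` is a hypothetical name.

```python
import math


def natural_to_mean(theta):
    """Invert theta_i = log(p_i / p_m) for i = 0..m-1, assuming p_0 + ... + p_m = 1."""
    weights = [math.exp(t) for t in theta] + [1.0]  # the last entry is p_m / p_m
    total = sum(weights)
    return [w / total for w in weights]


theta_hat = [-7.95, 2.79, 0.91]
p_hat = natural_to_mean(theta_hat)
print([round(p, 2) for p in p_hat])  # [0.0, 0.82, 0.13, 0.05]
```

The strongly negative first component of the natural parameter corresponds to a value of $p_0$ that is essentially zero, matching the empty shell 0 in the observed shell distribution.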

SLIDE 18

Application to Real life Example

The Polytope for the Sampson Data

Samples from $\hat{\theta}_{\mathrm{mle}}$ using a 40,000-step MCMC with the TNT proposal

SLIDE 19

Application to Real life Example

Typical Graphs from the models

SLIDE 20

Application to Real life Example

Histogram of various summary Statistics

[Figure: histograms of summary statistics. Panels: Edges, Triangles, 2-stars, Centrality, Size of largest shell, Size of innermost shell; final panel: Degree vs. Number of Nodes.]

SLIDE 21

Open Problems

Open Problems

1 Generate uniform random samples from $G_{n,m}$ and $G_{n,\le m}$.
2 Asymptotic formula for the number of graphs in a fiber (e.g. Barvinok and Hartigan for degree sequences).
3 Better Markov chain proposals that move rapidly in the marginal polytope space.
4 Local computation of the core distribution to speed up MCMC.

Questions? Thank you for your attention! arXiv:1410.7357 - v2 soon

SLIDE 22

The End

Questions? Thank you for your attention!
