Statistical inference of network structure, Part 2
Tiago P. Peixoto

SLIDE 1

Statistical inference of network structure

Part 2

Tiago P. Peixoto
University of Bath. Berlin, August 2017

SLIDE 2

Weighted graphs

  • C. Aicher et al., Journal of Complex Networks 3(2), 221-248 (2015); T.P.P, arXiv:1708.01432

Adjacency: $A_{ij} \in \{0,1\}$ or $\mathbb{N}$. Weights: $x_{ij} \in \mathbb{N}$ or $\mathbb{R}$.

SBMs with edge covariates:
$$P(A, x \mid \theta, \gamma, b) = P(x \mid A, \gamma, b)\, P(A \mid \theta, b)$$

Adjacency:
$$P(A \mid \theta = \{\lambda, \kappa\}, b) = \prod_{i<j} \frac{e^{-\lambda_{b_i,b_j}\kappa_i\kappa_j}\,(\lambda_{b_i,b_j}\kappa_i\kappa_j)^{A_{ij}}}{A_{ij}!},$$

Edge covariates:
$$P(x \mid A, \gamma, b) = \prod_{r\le s} P(x_{rs} \mid \gamma_{rs})$$

$P(x \mid \gamma)$ → Exponential, Normal, Geometric, Binomial, Poisson, …
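The factorized weight term above can be made concrete with a minimal sketch (pure Python, hypothetical toy data; not the implementation behind the slides): it groups the observed edge weights by block pair and evaluates the geometric-weight likelihood $\prod_{r\le s} P(x_{rs}\mid\gamma_{rs})$ at the maximum-likelihood parameter of each block pair.

```python
import math
from collections import defaultdict

def geometric_loglik(weights, p):
    # Geometric pmf on x = 0, 1, 2, ...: P(x) = (1 - p)^x * p
    return sum((x * math.log(1 - p) if x > 0 else 0.0) + math.log(p)
               for x in weights)

def weight_loglik(edges, b):
    """log P(x | A, gamma_hat, b): group edge weights by block pair (r, s)
    and fit each pair's geometric parameter p_rs by maximum likelihood."""
    groups = defaultdict(list)
    for (i, j, x) in edges:
        r, s = sorted((b[i], b[j]))
        groups[(r, s)].append(x)
    total = 0.0
    for xs in groups.values():
        mean = sum(xs) / len(xs)
        p_hat = 1.0 / (1.0 + mean)  # MLE of the geometric parameter
        total += geometric_loglik(xs, p_hat)
    return total
```

Here `edges` is a list of `(i, j, weight)` triples and `b` a node-to-block map, both hypothetical names for illustration.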
SLIDE 3

Weighted graphs

T.P.P, arXiv:1708.01432

Nonparametric Bayesian approach:
$$P(b \mid A, x) = \frac{P(A, x \mid b)\, P(b)}{P(A, x)},$$

Marginal likelihood:
$$P(A, x \mid b) = \int P(A, x \mid \theta, \gamma, b)\, P(\theta)\, P(\gamma)\, d\theta\, d\gamma = P(A \mid b)\, P(x \mid A, b),$$

Adjacency part (unweighted):
$$P(A \mid b) = \int P(A \mid \theta, b)\, P(\theta)\, d\theta$$

Weights part:
$$P(x \mid A, b) = \int P(x \mid A, \gamma, b)\, P(\gamma)\, d\gamma = \prod_{r\le s} \int P(x_{rs} \mid \gamma_{rs})\, P(\gamma_{rs})\, d\gamma_{rs}$$
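For some weight families the per-block-pair integral over $\gamma_{rs}$ has a closed form. A minimal sketch assuming geometric weights with a uniform Beta(1,1) prior on the parameter (an illustrative conjugate choice, not necessarily the prior used in the talk):

```python
import math

def geometric_marginal(weights):
    """Marginal likelihood of geometrically distributed weights under a
    uniform Beta(1,1) prior on the parameter p:
      int_0^1 p^n (1-p)^X dp = n! X! / (n + X + 1)!,  X = sum(weights)."""
    n, X = len(weights), sum(weights)
    return math.exp(math.lgamma(n + 1) + math.lgamma(X + 1)
                    - math.lgamma(n + X + 2))
```

Multiplying `geometric_marginal` over all block pairs gives the weights part $P(x\mid A,b)$ of the marginal likelihood above, for this particular weight model.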
SLIDE 4

UN Migrations

SLIDE 5

UN Migrations

[Figure: distribution of the number of migrations (log-log): SBM fit with geometric weights vs. a plain geometric distribution fit.]
SLIDE 6

Votes in congress

[Figure: deputy-by-deputy vote-correlation matrix, split into Government and Opposition blocks, and the probability density of vote correlations: SBM fit on original data vs. SBM fit on shuffled data.]
SLIDE 7

Human connectome

Right hemisphere Left hemisphere

[Figure: probability densities of electrical connectivity (mm⁻¹) and fractional anisotropy (dimensionless) for the right and left hemispheres, each with an SBM fit.]
SLIDE 8

Overlapping groups

[Figure: overlapping communities, panel (c) of Palla et al. 2005.]


SLIDE 10

Overlapping groups

[Figure: overlapping communities, panel (c) of Palla et al. 2005.]

◮ Number of nonoverlapping partitions: $B^N$
◮ Number of overlapping partitions: $2^{BN}$
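The two counts can be checked by brute force for tiny $N$ and $B$, under the reading that $B^N$ counts labelled group assignments (one group per node) and $2^{BN}$ counts assignments of arbitrary group subsets (a sketch of that interpretation, not an authoritative definition):

```python
from itertools import product

def count_nonoverlapping(N, B):
    # Each node takes exactly one of B group labels: B**N assignments.
    return sum(1 for _ in product(range(B), repeat=N))

def count_overlapping(N, B):
    # Each node takes any SUBSET of the B groups: (2**B)**N = 2**(B*N).
    return sum(1 for _ in product(range(2 ** B), repeat=N))
```

The enumeration is exponential, which is exactly the slide's point: the overlapping space is vastly larger.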


SLIDE 12

Group overlap

$$P(A\mid\kappa,\lambda) = \prod_{i<j} \frac{e^{-\lambda_{ij}}\,\lambda_{ij}^{A_{ij}}}{A_{ij}!} \times \prod_i \frac{e^{-\lambda_{ii}/2}\,(\lambda_{ii}/2)^{A_{ii}/2}}{(A_{ii}/2)!}, \qquad \lambda_{ij} = \sum_{rs} \kappa_{ir} \lambda_{rs} \kappa_{js}$$

Labelled half-edges:
$$A_{ij} = \sum_{rs} G^{rs}_{ij}, \qquad P(A\mid\kappa,\lambda) = \sum_G P(G\mid\kappa,\lambda)$$

SLIDE 14

Group overlap

$$P(A\mid\kappa,\lambda) = \prod_{i<j} \frac{e^{-\lambda_{ij}}\,\lambda_{ij}^{A_{ij}}}{A_{ij}!} \times \prod_i \frac{e^{-\lambda_{ii}/2}\,(\lambda_{ii}/2)^{A_{ii}/2}}{(A_{ii}/2)!}, \qquad \lambda_{ij} = \sum_{rs} \kappa_{ir} \lambda_{rs} \kappa_{js}$$

Labelled half-edges: $A_{ij} = \sum_{rs} G^{rs}_{ij}$, so that $P(A\mid\kappa,\lambda) = \sum_G P(G\mid\kappa,\lambda)$.

$$P(G) = \int P(G\mid\kappa,\lambda)\,P(\kappa)\,P(\lambda\mid\bar\lambda)\,d\kappa\,d\lambda = \frac{\bar\lambda^E}{(\bar\lambda+1)^{E+B(B+1)/2}}\, \frac{\prod_{r<s} e_{rs}!\,\prod_r e_{rr}!!}{\prod_{rs}\prod_{i<j} G^{rs}_{ij}!\,\prod_i G^{rs}_{ii}!!} \times \prod_r \frac{(N-1)!}{(e_r+N-1)!} \times \prod_{ir} k^r_i!,$$

Microcanonical equivalence:
$$P(G) = P(G\mid k,e)\,P(k\mid e)\,P(e),$$
$$P(G\mid k,e) = \frac{\prod_{r<s} e_{rs}!\,\prod_r e_{rr}!!\,\prod_{ir} k^r_i!}{\prod_{rs}\prod_{i<j} G^{rs}_{ij}!\,\prod_i G^{rs}_{ii}!!\,\prod_r e_r!}, \qquad P(k\mid e) = \prod_r \left(\!\!\binom{N}{e_r}\!\!\right)^{-1}$$
SLIDE 15

Overlap vs. non-overlap

Social “ego” network (from Facebook)

[Figure: degree distributions $n_k$ vs. $k$ within each inferred group.]

$B = 4$, $\Lambda \simeq 0.053$
SLIDE 16

Overlap vs. non-overlap

Social “ego” network (from Facebook)

[Figure: degree distributions $n_k$ vs. $k$ within each group, for the overlapping and nonoverlapping fits.]

$B = 4$, $\Lambda \simeq 0.053$; $B = 5$, $\Lambda = 1$
SLIDE 17

Overlap vs. non-overlap

[Figure: description length per edge, $\Sigma/E$, vs. mixing $\mu$.]

$B = 4$ (overlapping), $B = 15$ (nonoverlapping)
SLIDE 18

SBM with layers

T.P.P, Phys. Rev. E 92, 042807 (2015)

Collapsed l = 1 l = 2 l = 3

◮ Fairly straightforward; easily combined with degree correction, overlaps, etc.
◮ Edge probabilities are in general different in each layer.
◮ Node memberships can move or stay the same across layers.
◮ Works as a general model for discrete as well as discretized edge covariates.
◮ Works as a model for temporal networks.
SLIDE 19

SBM with layers

Edge covariates:
$$P(\{A_l\}\mid\{\theta\}) = P(A_c\mid\{\theta\}) \prod_{r\le s} \frac{\prod_l m_{rs}^l!}{m_{rs}!}$$

Independent layers:
$$P(\{A_l\}\mid\{\{\theta\}_l\},\{\phi\},\{z_{il}\}) = \prod_l P(A_l\mid\{\theta\}_l,\{\phi\})$$

Embedded models can be of any type: traditional, degree-corrected, overlapping.
SLIDE 20

Layer information can reveal hidden structure

SLIDE 22

... but it can also hide structure!

[Figure: a single network split into $C$ layers. NMI between planted and inferred partitions vs. mixing $c$, for the collapsed network and for layered versions with $E/C \in \{500, 100, 40, 20, 15, 12, 10, 5\}$ edges per layer.]
SLIDE 23

Model selection

Null model: collapsed (aggregated) SBM + fully random layers
$$P(\{G_l\}\mid\{\theta\},\{E_l\}) = P(G_c\mid\{\theta\}) \times \frac{\prod_l E_l!}{E!}$$

(we can also aggregate layers into larger layers)
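The combinatorial factor $\prod_l E_l!/E!$ in the null model is easy to evaluate in log space; a minimal sketch (the function name is illustrative):

```python
import math

def log_layer_factor(layer_edge_counts):
    """log( prod_l E_l! / E! ): the probability of one particular assignment
    of the E aggregated edges to layers, with fixed per-layer counts {E_l},
    when layer membership is fully random."""
    E = sum(layer_edge_counts)
    return (sum(math.lgamma(El + 1) for El in layer_edge_counts)
            - math.lgamma(E + 1))
```

Adding this to the collapsed-SBM log-likelihood gives the null-model term that the layered fit is compared against.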

SLIDE 24

Model selection

Example: Social network of physicians

$N = 241$ physicians. Survey questions:
◮ "When you need information or advice about questions of therapy where do you usually turn?"
◮ "And who are the three or four physicians with whom you most often find yourself discussing cases or therapy in the course of an ordinary week – last week for instance?"
◮ "Would you tell me the first names of your three friends whom you see most often socially?"

T.P.P, Phys. Rev. E 92, 042807 (2015)

SLIDE 27

Model selection

Example: Social network of physicians

$\Lambda = 1$; $\log_{10} \Lambda \approx -50$

SLIDE 28

Example: Brazilian chamber of deputies

Voting network between members of congress (1999-2006)

[Figure: inferred voting blocks labeled by their dominant parties (PT; PMDB, PP, PTB; PDT, PSB, PCdoB; PSDB, PR; DEM, PFL), arranged along a Right/Center/Left axis.]
SLIDE 29

Example: Brazilian chamber of deputies

Voting network between members of congress (1999-2006)

[Figure: inferred voting blocks labeled by their dominant parties, arranged along a Right/Center/Left axis and along a Government/Opposition axis, separately for the 1999-2002 and 2003-2006 legislatures.]
SLIDE 30

Example: Brazilian chamber of deputies

Voting network between members of congress (1999-2006)

[Figure: the same voting blocks for the 1999-2002 and 2003-2006 legislatures.]

$\log_{10} \Lambda \approx -111$; $\Lambda = 1$
SLIDE 31

Real-valued edges?

Idea: layers $\{\ell\}$ → bins of edge values!
$$P(\{G_x\}\mid\{\theta\}_{\{\ell\}},\{\ell\}) = P(\{G_l\}\mid\{\theta\}_{\{\ell\}},\{\ell\}) \times \prod_l \rho(x_l)$$
Bayesian posterior → number (and shape) of bins
SLIDE 32

Movement between groups...

[Figure: movement of deputies between inferred groups; groups labeled by their dominant parties (PT; PMDB, PP, PTB; PDT, PSB, PCdoB; PSDB; PFL, DEM; PPS).]
SLIDE 33

Networks with metadata

Many network datasets contain metadata: annotations that go beyond the mere adjacency between nodes. These are often assumed to be indicators of topological structure, and are used to validate community detection methods, i.e. treated as "ground truth".

SLIDE 34

Example: American college football

Metadata (Conferences)

SLIDE 35

Example: American college football

SBM fit

SLIDE 36

Example: American college football

Discrepancy

SLIDE 41

Example: American college football

Discrepancy

Why the discrepancy? Some hypotheses:
◮ The model is not sufficiently descriptive.
◮ The metadata is not sufficiently descriptive or is inaccurate.
◮ Both.
◮ Neither.
SLIDE 42

Model variations: Annotated networks

M.E.J. Newman and A. Clauset, arXiv:1507.04001

Main idea: treat metadata as data, not "ground truth". Annotations are partitions, $\{x_i\}$, and can be used as priors:
$$P(G, \{x_i\}\mid\theta,\gamma) = \sum_{\{b_i\}} P(G\mid\{b_i\},\theta)\, P(\{b_i\}\mid\{x_i\},\gamma), \qquad P(\{b_i\}\mid\{x_i\},\gamma) = \prod_i \gamma_{b_i x_i}$$

Drawbacks: parametric (i.e. can overfit); annotations are not always partitions.
SLIDE 43

Metadata is often very heterogeneous

Example: IMDB film-actor network. Data: 96,982 films, 275,805 actors, 1,812,657 film-actor edges. Film metadata: title, year, genre, production company, country, user-contributed keywords, etc. Actor metadata: name, age, gender, nationality, etc.

User-contributed keywords (93,448)

[Figure: keyword occurrence distribution and number of keywords per film, $N_k$ vs. $k$ (log-log).]

SLIDE 45

Metadata is often very heterogeneous

Example: IMDB Film-Actor network

Common keywords:

Keyword                        Occurrences
'independent-film'             15513
'based-on-novel'               12303
'character-name-in-title'      11801
'murder'                       11184
'sex'                          9759
'female-nudity'                9239
'nudity'                       5846
'death'                        5791
'husband-wife-relationship'    5568
'love'                         5560
'violence'                     5480
'police'                       5463
'father-son-relationship'      5063

Rare keywords:

Keyword                                    Occurrences
'discriminaton-against-anteaters'          1
'partisan-violence'                        1
'deliberately-leaving-something-behind'    1
'princess-from-outer-space'                1
'reference-to-aleksei-vorobyov'            1
'dead-body-on-the-beach'                   1
'liver-failure'                            1
'hit-with-a-skateboard'                    1
'helping-blind-man-cross-street'           1
'abandoned-pet'                            1
'retired-clown'                            1
'resentment-toward-stepson'                1
'mutilating-a-plant'                       1
SLIDE 46

Better approach: Metadata as data

Main idea: treat metadata as data, not "ground truth". Generalized annotations: $A_{ij}$ → data layer, $T_{ij}$ → annotation layer.

Data, $A$; Metadata, $T$

◮ Joint model for data and metadata (the layered SBM [1]).
◮ Arbitrary types of annotation.
◮ Both data and metadata are clustered into groups.
◮ Fully nonparametric.
SLIDE 47

Example: American college football

SLIDE 48

Prediction of missing edges

$$G' = \underbrace{G}_{\text{Observed}} \cup \underbrace{\delta G}_{\text{Missing}}$$

Posterior probability of missing edges:
$$P(\delta G\mid G,\{b_i\}) = \frac{\int_\theta P(G\cup\delta G\mid\{b_i\},\theta)\,P(\theta)}{\int_\theta P(G\mid\{b_i\},\theta)\,P(\theta)}$$

  • A. Clauset, C. Moore, M. E. J. Newman, Nature, 2008
  • R. Guimerà, M. Sales-Pardo, PNAS, 2009
  • Drug-drug interactions: R. Guimerà, M. Sales-Pardo, PLoS Comput Biol, 2013
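As a toy illustration of ranking candidate missing edges, here is a sketch under a drastically simplified Poisson SBM with fixed rates (the slide's formulation integrates over $\theta$; keeping $\theta$ fixed reduces the posterior ratio to a per-pair likelihood gain, which is all this sketch computes):

```python
def edge_posterior_ratio(lam_ij, a_ij=0):
    """P(G with edge (i,j) added) / P(G) for a Poisson SBM with a FIXED
    rate lam_ij on the pair: the pmf ratio lambda_ij / (A_ij + 1)."""
    return lam_ij / (a_ij + 1)

def rank_missing_edges(candidate_rates):
    """candidate_rates: dict (i, j) -> rate lambda_{b_i, b_j} (toy input);
    rank non-edges by the likelihood gain of adding them."""
    return sorted(candidate_rates,
                  key=lambda e: edge_posterior_ratio(candidate_rates[e]),
                  reverse=True)
```

Pairs whose block-pair rate is high but which are currently non-edges come out on top, which is the intuition behind SBM-based link prediction.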
SLIDE 49

Metadata and prediction of missing nodes

Node probability, with known group membership:
$$P(a_i\mid A,b_i,b) = \frac{\int_\theta P(A,a_i\mid b_i,b,\theta)\,P(\theta)}{\int_\theta P(A\mid b,\theta)\,P(\theta)}$$

Node probability, with unknown group membership:
$$P(a_i\mid A,b) = \sum_{b_i} P(a_i\mid A,b_i,b)\,P(b_i\mid b),$$

Node probability, with unknown group membership but known metadata:
$$P(a_i\mid A,T,b,c) = \sum_{b_i} P(a_i\mid A,b_i,b)\,P(b_i\mid T,b,c),$$

Group membership probability, given metadata:
$$P(b_i\mid T,b,c) = \frac{P(b_i,b\mid T,c)}{P(b\mid T,c)} = \frac{\int_\gamma P(T\mid b_i,b,c,\gamma)\,P(b_i,b)\,P(\gamma)}{\sum_{b'_i}\int_\gamma P(T\mid b'_i,b,c,\gamma)\,P(b'_i,b)\,P(\gamma)}$$

Predictive likelihood ratio:
$$\lambda_i = \frac{P(a_i\mid A,T,b,c)}{P(a_i\mid A,T,b,c) + P(a_i\mid A,b)}$$
$\lambda_i > 1/2$ → the metadata improves the prediction task

SLIDE 51

Metadata and prediction of missing nodes

[Figure: average predictive likelihood ratio $\bar\lambda$ (from roughly 0.3 to 1.0) across datasets: Facebook networks (Penn, Tennessee, Berkeley, Caltech, Princeton, Stanford, Harvard, Vassar), PPI networks (krogan, isobase hs, yu, gastric, pancreas, lung, predicted), Pol. Blogs, PGP, Flickr, Anobii SN, IMDB, Amazon, Debian pkgs., DBLP, Internet AS, APS citations, LFR benchmark.]

$$\lambda_i = \frac{P(a_i\mid A,T,b,c)}{P(a_i\mid A,T,b,c) + P(a_i\mid A,b)}$$
SLIDE 52

Metadata and prediction of missing nodes

[Figure: average predictive likelihood ratio $\bar\lambda$ vs. number of planted groups $B$ (2 to 10), for metadata aligned, misaligned, and random relative to the data, plus misaligned with $N/B = 10^3$.]
SLIDE 53

Metadata predictiveness

Neighbor probability:
$$P_e(i\mid j) = \frac{k_i\, e_{b_i,b_j}}{e_{b_i}\, e_{b_j}}$$

Neighbor probability, given metadata tag:
$$P_t(i) = \sum_j P(i\mid j)\, P_m(j\mid t)$$

Null neighbor probability (no metadata tag):
$$Q(i) = \sum_j P(i\mid j)\, \Pi(j)$$

Kullback-Leibler divergence:
$$D_{\mathrm{KL}}(P_t\,\|\,Q) = \sum_i P_t(i) \ln \frac{P_t(i)}{Q(i)}$$

Relative divergence:
$$\mu_r \equiv \frac{D_{\mathrm{KL}}(P_t\,\|\,Q)}{H(Q)} \quad\to\quad \text{metadata group predictiveness}$$
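The relative divergence $\mu_r$ is straightforward to compute once $P_t$ and $Q$ are in hand; a minimal sketch where both are plain probability vectors over nodes (toy inputs, not the full neighbor-probability construction above):

```python
import math

def kl(p, q):
    # D_KL(p || q) in nats; terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def predictiveness(p_t, q):
    """mu_r = D_KL(P_t || Q) / H(Q): how far a metadata tag shifts the
    neighbor distribution from the tag-free baseline, in units of the
    baseline's entropy."""
    return kl(p_t, q) / entropy(q)
```

A tag that pins the neighbor down completely against a uniform baseline gives $\mu_r = 1$; a tag that changes nothing gives $\mu_r = 0$.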
SLIDE 54

Metadata predictiveness

IMDB film-actor network

[Figure: metadata group predictiveness $\mu_r$ vs. metadata group size $n_r$, for keywords, producers, directors, ratings, country, genre, and production year.]
SLIDE 55

Metadata predictiveness

APS citation network

[Figure: $\mu_r$ vs. $n_r$ for journal, PACS, and date metadata.]
SLIDE 56

Metadata predictiveness

Amazon co-purchases

[Figure: $\mu_r$ vs. $n_r$ for product categories.]
SLIDE 57

Metadata predictiveness

Internet AS

[Figure: $\mu_r$ vs. $n_r$ for country metadata.]
SLIDE 58

Metadata predictiveness

Facebook Penn State

[Figure: $\mu_r$ vs. $n_r$ for dorm, gender, high school, major, and year metadata.]
SLIDE 59

n-order Markov chains with communities

  • T. P. P. and Martin Rosvall, arXiv: 1509.04740

Transitions conditioned on the last $n$ tokens:
$$p(x_t \mid \vec{x}_{t-1}) \;\to\; \text{probability of transition from memory } \vec{x}_{t-1} = \{x_{t-n},\dots,x_{t-1}\} \text{ to token } x_t$$

Instead of such a direct parametrization, we divide the tokens and memories into groups:
$$p(x\mid\vec{x}) = \theta_x \lambda_{b_x b_{\vec{x}}}$$
$\theta_x$ → overall frequency of token $x$; $\lambda_{rs}$ → transition probability from memory group $s$ to token group $r$; $b_x, b_{\vec{x}}$ → group memberships of tokens and memories.

[Figure: (a)-(c) transition graphs for the sequence $\{x_t\}$ = "It␣was␣the␣best␣of␣times", with tokens and memories divided into groups.]
SLIDE 60

n-order Markov chains with communities

$\{x_t\}$ = "It␣was␣the␣best␣of␣times"
$$P(\{x_t\}\mid b) = \int d\lambda\, d\theta\, P(\{x_t\}\mid b,\lambda,\theta)\, P(\theta)\, P(\lambda)$$

The Markov chain likelihood is (almost) identical to the SBM likelihood that generates the bipartite transition graph. Nonparametric → we can select the number of groups and the Markov order based on statistical evidence!

  • T. P. P. and Martin Rosvall, Nature Communications (in press)
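The bipartite memory → token transition graph underlying this likelihood can be built directly from a sequence; a minimal sketch with $n$-gram memories represented as plain substrings (illustrative, not the talk's implementation):

```python
from collections import Counter

def transition_graph(seq, n):
    """Edges of the bipartite memory -> token multigraph of an n-order
    chain: one edge (x_{t-n..t-1}, x_t) per observed transition."""
    return Counter((seq[t - n:t], seq[t]) for t in range(n, len(seq)))

edges = transition_graph("It was the best of times", 2)
```

Fitting an SBM to this multigraph (grouping memories and tokens) is then, per the slide, essentially equivalent to fitting the Markov chain itself.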
SLIDE 61

Bayesian formulation

$$P(\{x_t\}\mid b) = \int d\theta\, d\lambda\, P(\{x_t\}\mid b,\lambda,\theta) \prod_r D_r(\{\theta_x\}) \prod_s D_s(\{\lambda_{rs}\})$$

Noninformative priors → microcanonical model:
$$P(\{x_t\}\mid b) = P(\{x_t\}\mid b,\{e_{rs}\},\{k_x\}) \times P(\{k_x\}\mid\{e_{rs}\},b) \times P(\{e_{rs}\}),$$
where
$P(\{x_t\}\mid b,\{e_{rs}\},\{k_x\})$ → sequence likelihood,
$P(\{k_x\}\mid\{e_{rs}\},b)$ → token frequency likelihood,
$P(\{e_{rs}\})$ → transition count likelihood,
$-\ln P(\{x_t\}, b)$ → description length of the sequence.

Inference ↔ Compression
SLIDE 62

n-order Markov chains with communities

US Air Flights:
n      B_N    B_M     Σ             Σ'
1      384    365     364,385,780   365,211,460
2      386    7605    319,851,871   326,511,545
3      183    2455    318,380,106   339,898,057
4      292    1558    318,842,968   337,988,629
5      297    1573    335,874,766   338,442,011
gzip                  573,452,240
LZMA                  402,125,144

War and peace:
n      B_N    B_M     Σ            Σ'
1      65     71      11,422,564   11,438,753
2      62     435     9,175,833    9,370,379
3      70     1366    7,609,366    8,493,211
4      72     1150    7,574,332    9,282,611
5      71     882     10,181,047   10,992,795
gzip                  9,594,000
LZMA                  7,420,464

Taxi movements:
n      B_N    B_M     Σ           Σ'
1      387    385     2,635,789   2,975,299
2      397    1127    2,554,662   3,258,586
3      393    1036    2,590,811   3,258,586
4      397    1071    2,628,813   3,258,586
5      395    1095    2,664,990   3,258,586
gzip                  4,289,888
LZMA                  2,902,904

"Rock you" password list:
n      B_N    B_M     Σ               Σ'
1      140    147     1,060,272,230   1,060,385,582
2      109    1597    984,697,401     987,185,890
3      114    4703    910,330,062     930,926,370
4      114    5856    889,006,060     940,991,463
5      99     6430    1,000,410,410   1,005,057,233
gzip                  1,315,388,208
LZMA                  1,097,012,288

(SBM can compress your files!)

  • T. P. P. and Martin Rosvall, arXiv: 1509.04740
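The gzip and LZMA baselines in the table are easy to reproduce for any byte string with the standard library; this sketch reproduces only those baselines, not the SBM-based compressor itself:

```python
import lzma
import zlib

def compressed_sizes(data: bytes):
    """Sizes under the two general-purpose baselines used in the table:
    DEFLATE (the algorithm behind gzip, via zlib) and LZMA."""
    return {"zlib": len(zlib.compress(data, 9)),
            "lzma": len(lzma.compress(data))}

sizes = compressed_sizes(b"It was the best of times " * 1000)
```

Comparing these sizes with the description length $-\log_2 P(\{x_t\}, b)/8$ of a fitted model is the sense in which "the SBM can compress your files".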
SLIDE 63

n-order Markov chains with communities

Example: Flight itineraries

$\vec{x}_t = \{x_{t-3},\ \text{Atlanta}\,|\,\text{Las Vegas},\ x_{t-1}\}$

  • T. P. P. and Martin Rosvall, arXiv: 1509.04740
SLIDE 64

Dynamic networks

Each token is an edge: $x_t \to (i,j)_t$. Dynamic network → sequence of edges: $\{x_t\} = \{(i,j)_t\}$

Problem: too many possible tokens! $O(N^2)$
Solution: group the nodes into $B$ groups; a pair of node groups $(r,s)$ → an edge group. Number of tokens: $O(B^2) \ll O(N^2)$

Two-step generative process:
◮ $\{(r,s)_t\}$ (n-order Markov chain of pairs of group labels)
◮ $P((i,j)_t\mid(r,s)_t)$ (static SBM generating edges from group labels)

  • T. P. P. and Martin Rosvall, arXiv: 1509.04740
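The coarse-graining step can be sketched in a couple of lines (toy node-to-group map, undirected edges; names are illustrative):

```python
def edge_tokens(edge_seq, b):
    """Map a sequence of edges (i, j)_t to the coarse-grained token
    sequence (r, s)_t of group-pair labels, shrinking the token
    alphabet from O(N^2) to O(B^2)."""
    return [tuple(sorted((b[i], b[j]))) for (i, j) in edge_seq]
```

The resulting token sequence is what the n-order Markov chain of the previous slides is fitted to, while a static SBM handles which concrete edge each group pair emits.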
SLIDE 65

Dynamic networks

Example: Student proximity

[Figure: static part of the fit; inferred groups align with the school classes (PC, MP, 2BIO1, 2BIO2, 2BIO3, PSI, ...).]

  • T. P. P. and Martin Rosvall, arXiv: 1509.04740
SLIDE 66

Dynamic networks

Example: Student proximity

[Figure: static part of the fit (groups aligned with the school classes), together with the temporal part.]

  • T. P. P. and Martin Rosvall, arXiv: 1509.04740
SLIDE 67

Dynamic networks in continuous time

$x_\tau$ → token at continuous time $\tau$
$$P(\{x_\tau\}) = \underbrace{P(\{x_t\})}_{\text{Discrete chain}} \times \underbrace{P(\{\Delta t\}\mid\{x_t\})}_{\text{Waiting times}}$$

Exponential waiting time distribution:
$$P(\{\Delta t\}\mid\{x_t\},\lambda) = \prod_{\vec{x}} \lambda_{b_{\vec{x}}}^{k_{\vec{x}}}\, e^{-\lambda_{b_{\vec{x}}} \Delta_{\vec{x}}}$$

Bayesian integrated likelihood:
$$P(\{\Delta t\}\mid\{x_t\}) = \prod_r \int_0^\infty d\lambda\, \lambda^{e_r} e^{-\lambda \Delta_r}\, P(\lambda\mid\alpha,\beta) = \prod_r \frac{\Gamma(e_r+\alpha)\,\beta^\alpha}{\Gamma(\alpha)\,(\Delta_r+\beta)^{e_r+\alpha}}.$$

Hyperparameters: $\alpha, \beta$. The noninformative limit $\alpha \to 0$, $\beta \to 0$ leads to the Jeffreys prior $P(\lambda) \propto 1/\lambda$.

  • T. P. P. and Martin Rosvall, arXiv: 1509.04740
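The Gamma-integral identity above can be verified numerically; a sketch using trapezoidal quadrature with a finite cutoff (adequate here, since the integrand decays exponentially; the cutoff and step count are illustrative choices):

```python
import math

def closed_form(e_r, delta_r, alpha, beta):
    # Gamma(e_r + alpha) * beta^alpha / (Gamma(alpha) * (delta_r + beta)^(e_r + alpha))
    return math.exp(math.lgamma(e_r + alpha) + alpha * math.log(beta)
                    - math.lgamma(alpha)
                    - (e_r + alpha) * math.log(delta_r + beta))

def numeric(e_r, delta_r, alpha, beta, lam_max=40.0, steps=200000):
    # Trapezoidal integral of lam^e_r * exp(-lam*delta_r) * Gamma(alpha, beta) prior
    h = lam_max / steps
    total = 0.0
    for k in range(1, steps):
        lam = k * h
        prior = math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                         + (alpha - 1) * math.log(lam) - beta * lam)
        total += lam ** e_r * math.exp(-lam * delta_r) * prior
    return total * h
```

With $\alpha = \beta = 1$, $e_r = 3$, $\Delta_r = 2$ both routes give $\Gamma(4)/3^4 = 6/81$.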
SLIDE 68

Dynamic networks

Continuous time: $\{x_\tau\}$ → sequence of notes in Beethoven's fifth symphony

[Figure: average waiting time (seconds) per memory group, without waiting times ($n = 1$) and with waiting times ($n = 2$).]
SLIDE 69

Nonstationarity

$\{x_t\}$ → concatenation of "War and peace", by Leo Tolstoy, and "À la recherche du temps perdu", by Marcel Proust.

Unmodified chain: $-\log_2 P(\{x_t\}, b) = 7{,}450{,}322$
SLIDE 70

Nonstationarity

$\{x_t\}$ → concatenation of "War and peace", by Leo Tolstoy, and "À la recherche du temps perdu", by Marcel Proust.

Unmodified chain: $-\log_2 P(\{x_t\}, b) = 7{,}450{,}322$
Annotated chain, $x'_t = (x_t, \text{novel})$: $-\log_2 P(\{x'_t\}, b) = 7{,}146{,}465$
SLIDE 71

Latent space models

  • P. D. Hoff, A. E. Raftery, and M. S. Handcock, J. Amer. Stat. Assoc. 97, 1090-1098 (2002)

$$P(G\mid\{\vec{x}_i\}) = \prod_{i>j} p_{ij}^{A_{ij}} (1-p_{ij})^{1-A_{ij}}, \qquad p_{ij} = \exp\!\left(-(\vec{x}_i - \vec{x}_j)^2\right).$$

(Human connectome)

Many other more elaborate embeddings exist (e.g. hyperbolic spaces). Properties:
◮ Softer approach: nodes are not placed into discrete categories.
◮ Exclusively assortative structures.
◮ Formulation for directed graphs is less trivial.
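Sampling from this latent-space model takes only a few lines; a sketch with Gaussian latent positions and the squared-exponential link of the slide (the dimension, scale, and seed are hypothetical parameters):

```python
import math
import random

def sample_latent_space_graph(n, dim=2, scale=1.0, seed=42):
    """Sample a graph with p_ij = exp(-|x_i - x_j|^2): nodes that are
    close in the latent space connect with high probability, which is
    why the model is exclusively assortative."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i):
            d2 = sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
            if rng.random() < math.exp(-d2):
                edges.append((j, i))
    return x, edges

positions, edges = sample_latent_space_graph(30)
```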

SLIDE 72

Discrete vs. continuous

Can we formulate a unified parametrization?

SLIDE 73

The Graphon

$$P(G\mid\{x_i\}) = \prod_{i>j} p_{ij}^{A_{ij}} (1-p_{ij})^{1-A_{ij}}, \qquad p_{ij} = \omega(x_i, x_j), \quad x_i \in [0,1]$$

Properties:
◮ Mostly a theoretical tool.
◮ Cannot be directly inferred (without massively overfitting).
◮ Needs to be parametrized to be practical.
SLIDE 74

The SBM → a piecewise-constant Graphon

[Figure: a piecewise-constant graphon $\omega(x, y)$ corresponding to an SBM.]
SLIDE 75

A “soft” graphon parametrization

$$p_{uv} = \frac{d_u d_v}{2m}\,\omega(x_u, x_v), \qquad \omega(x,y) = \sum_{j,k=0}^{N} c_{jk}\, B_j(x)\, B_k(y)$$

Bernstein polynomials:
$$B_k(x) = \binom{N}{k} x^k (1-x)^{N-k}, \qquad k = 0,\dots,N$$

[Figure: the Bernstein basis polynomials $B_k(x)$, $k = 0,\dots,5$, on $[0,1]$.]
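The Bernstein basis and the resulting $\omega(x,y)$ can be evaluated directly; a sketch where `c` is an $(N{+}1)\times(N{+}1)$ coefficient matrix (names are illustrative):

```python
import math

def bernstein(k, N, x):
    """Bernstein basis polynomial B_k(x) = C(N, k) x^k (1-x)^(N-k)."""
    return math.comb(N, k) * x ** k * (1 - x) ** (N - k)

def omega(c, x, y):
    """Smooth graphon omega(x, y) = sum_{j,k} c_jk B_j(x) B_k(y)."""
    N = len(c) - 1
    return sum(c[j][k] * bernstein(j, N, x) * bernstein(k, N, y)
               for j in range(N + 1) for k in range(N + 1))
```

A useful sanity check is the partition of unity, $\sum_k B_k(x) = 1$, which implies that a constant coefficient matrix yields a constant graphon.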
SLIDE 76

A “soft” graphon parametrization

[Figure: the same Bernstein parametrization, together with the resulting smooth graphon $\omega(x, y)$.]
SLIDE 77

Inferring the model

Semi-parametric Bayesian approach, via an Expectation-Maximization algorithm:

1. Expectation step:
$$q(\vec{x}) = \frac{P(A, \vec{x}\mid c)}{\int P(A, \vec{x}\mid c)\, d^n x}$$

2. Maximization step:
$$P(A\mid c) = \int P(A, \vec{x}\mid c)\, d^n x, \qquad \hat{c}_{jk} = \underset{c_{jk}}{\operatorname{argmax}}\ P(A\mid c)$$

Belief propagation:
$$\eta_{u\to v}(x) = \frac{1}{Z_{u\to v}} \exp\!\left(-\sum_w d_u d_w \int_0^1 q_w(y)\,\omega(x,y)\, dy\right) \times \prod_{\substack{w(\neq v)\\ a_{uw}=1}} \int_0^1 \eta_{w\to u}(y)\,\omega(x,y)\, dy,$$
$$q_{uv}(x,y) = \frac{\eta_{u\to v}(x)\,\eta_{v\to u}(y)\,\omega(x,y)}{\int_0^1 \eta_{u\to v}(x)\,\eta_{v\to u}(y)\,\omega(x,y)\, dx\, dy}.$$

Algorithmic complexity: $O(mN^2)$
SLIDE 78

Example: SBM sample

SLIDE 79

Example: School friendships

[Figure: inferred node parameter vs. age (12-18) in a school friendship network.]
SLIDE 80

Example: C. elegans worm

SLIDE 82

Example: Interstate highway