Statistical network clustering: some recent advances and - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Statistical network clustering: some recent advances and applications to digital humanities

Charles Bouveyron

Laboratoire MAP5, UMR CNRS 8145 Université Paris Descartes charles.bouveyron@parisdescartes.fr – @cbouveyron

1

slide-2
SLIDE 2

Disclaimer

“Essentially, all models are wrong but some are useful” George E.P. Box

2

slide-3
SLIDE 3

Outline

  • Introduction
  • The stochastic block model (SBM)
  • The random subgraph model (RSM)
  • Analysis of an ecclesiastical network
  • Extension to dynamic networks
  • Conclusion

3

slide-5
SLIDE 5

Introduction

The analysis of networks:

is a recent but increasingly important field in statistical learning, with applications in domains ranging from biology to history:
  • biology: analysis of gene regulation processes,
  • social sciences: analysis of political blogs,
  • history: visualization of medieval social networks.

Two main problems are currently well addressed:

  • visualization of the networks,
  • clustering of the network nodes.

Network comparison:

is still an emerging problem in statistical learning; it is mainly addressed through graph structure comparison, which is limited to binary networks.

4

slide-6
SLIDE 6

Introduction

Figure: Clustering of network nodes: communities (left) vs. structures with hubs (right).

5

slide-7
SLIDE 7

Introduction

Key works in probabilistic models:

  • stochastic block model (SBM) by Nowicki and Snijders (2001),
  • latent space model by Hoff, Handcock and Raftery (2002),
  • latent cluster model (LCM) by Handcock, Raftery and Tantrum (2007),
  • mixed membership SBM (MMSBM) by Airoldi et al. (2008),
  • mixture of experts for LCM by Gormley and Murphy (2010),
  • MMSBM for dynamic networks by Xing et al. (2010),
  • overlapping SBM (OSBM) by Latouche et al. (2011).

A good overview is given in:

  • M. Salter-Townshend, A. White, I. Gollini and T. B. Murphy, “Review of Statistical Network Analysis: Models, Algorithms, and Software”, Statistical Analysis and Data Mining, Vol. 5(4), pp. 243–264, 2012.

6

slide-9
SLIDE 9

Introduction: a historical problem

Our colleagues from the LAMOP team were interested in answering the following question: Was the Church organized in the same way within the different kingdoms in Merovingian Gaul? To this end, they built a relational database:

  • from the written acts of the ecclesiastical councils that took place in Gaul during the 6th century (480-614),
  • those acts report who attended (bishops, kings, dukes, priests, monks, ...) and which questions (regarding the Church, faith, ...) were discussed,
  • they also made it possible to characterize the type of relationship between individuals,
  • it took 18 months to build the database.

7

slide-10
SLIDE 10

Introduction: a historical problem

The database contains:

  • 1331 individuals (mostly clergymen) who participated in ecclesiastical councils in Gaul between 480 and 614,
  • 4 types of relationships between individuals, which have been identified as positive, negative, variable or neutral,
  • each individual belongs to one of the 5 regions of Gaul: 3 kingdoms (Austrasia, Burgundy and Neustria) and 2 provinces (Aquitaine and Provence),
  • additional information is also available: social positions, family relationships, birth and death dates, offices held, council dates, ...

8

slide-11
SLIDE 11

Introduction: a historical problem

Figure: Adjacency matrix of the ecclesiastical network, sorted by region (Neustria, Provence, Unknown, Aquitaine, Austrasia, Burgundy).

9

slide-15
SLIDE 15

The stochastic block model (SBM)

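The SBM (Nowicki and Snijders, 2001) assumes that the network (represented by its adjacency matrix X) is generated as follows:

  • each node i is associated with an (unobserved) group among K according to $Z_i \sim \mathcal{M}(\alpha)$, where $\alpha \in [0,1]^K$ and $\sum_{k=1}^{K} \alpha_k = 1$,
  • then, each edge $X_{ij}$ is drawn according to $X_{ij} \mid Z_{ik} Z_{jl} = 1 \sim \mathcal{B}(\pi_{kl})$, where $\pi_{kl} \in [0,1]$,
  • this model is therefore a mixture model: $X_{ij} \sim \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \, \mathcal{B}(\pi_{kl})$.

Here $\mathcal{M}$ denotes the multinomial distribution and $\mathcal{B}$ the Bernoulli distribution.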

11
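The two-step generative process above can be made concrete with a short simulation. The following is a minimal Python sketch, assuming a directed network without self-loops; the function `sample_sbm` and its argument names are illustrative and do not come from any package cited in this presentation.

```python
import random


def sample_sbm(n, alpha, pi, seed=None):
    """Draw (Z, X) from a directed binary SBM (illustrative sketch).

    alpha: list of K group proportions summing to 1.
    pi:    K x K list of connection probabilities pi[k][l].
    Self-loops are excluded (X[i][i] = 0), an assumption of this sketch.
    """
    rng = random.Random(seed)
    k_groups = len(alpha)
    # Z_i ~ M(alpha): one categorical draw per node
    z = rng.choices(range(k_groups), weights=alpha, k=n)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # X_ij | Z_i = k, Z_j = l ~ B(pi_kl)
                x[i][j] = int(rng.random() < pi[z[i]][z[j]])
    return z, x
```

For example, a diagonal-dominant `pi` such as `[[0.3, 0.01], [0.01, 0.3]]` produces community structure, whereas giving one group high connection probabilities towards all groups produces a hub-like structure.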

slide-16
SLIDE 16

The stochastic block model (SBM)

Table: An SBM network (illustration with nodes 1 to 9 and between-group connection probabilities $\pi_{kl}$).

12

slide-19
SLIDE 19

The stochastic block model (SBM)

Inference of the SBM model (maximum likelihood):

log-likelihood:

$\log p(X \mid \alpha, \Pi) = \log \Big( \sum_{Z} p(X, Z \mid \alpha, \Pi) \Big)$, which involves $K^N$ terms!

the Expectation-Maximization (EM) algorithm requires knowledge of $p(Z \mid X, \alpha, \Pi)$,

Problem: $p(Z \mid X, \alpha, \Pi)$ is not tractable (the $Z_i$ are not conditionally independent given X)!

Solutions:

  • variational EM (Daudin et al., 2008) + ICL (Biernacki et al., 2003),
  • variational Bayes EM + ILvb criterion (Latouche et al., 2012).

13
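To see why the sum over Z is a problem, the sketch below evaluates the exact log-likelihood of a tiny directed binary SBM by enumerating all $K^N$ labelings and combining the terms with a log-sum-exp. It is purely illustrative (the function name is hypothetical) and assumes every $\pi_{kl}$ lies strictly between 0 and 1 and every $\alpha_k$ is positive, so that all logarithms are finite.

```python
import itertools
import math


def sbm_log_likelihood_bruteforce(x, alpha, pi):
    """Exact log p(X | alpha, Pi) for a tiny directed binary SBM.

    Sums p(X, Z | alpha, Pi) over all K**N labelings Z; returns the
    log-likelihood and the number of terms that were summed.
    """
    n, k_groups = len(x), len(alpha)
    terms = []
    for z in itertools.product(range(k_groups), repeat=n):  # K**N labelings
        log_p = sum(math.log(alpha[zi]) for zi in z)          # log p(Z | alpha)
        for i in range(n):
            for j in range(n):
                if i != j:
                    p = pi[z[i]][z[j]]
                    log_p += math.log(p if x[i][j] else 1.0 - p)  # log p(X_ij | Z)
        terms.append(log_p)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms)), len(terms)
```

With K = 2 and N = 20, the loop would already visit 2^20 = 1 048 576 labelings, which is why approximate methods such as the variational approaches above are needed.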

slide-22
SLIDE 22

The random subgraph model (RSM)

Before the maths, an example of an RSM network:

Figure: Example of an RSM network.

We observe:

  • the partition of the network into S = 2 subgraphs (node shape),
  • the presence $A_{ij}$ of directed edges between the N nodes,
  • the type $X_{ij} \in \{1, \dots, C\}$ of the edges (C = 3, edge color).

We search for:

  • a partition of the nodes into K = 3 groups (node color), which overlaps with the partition into subgraphs.

15

slide-25
SLIDE 25

The random subgraph model (RSM)

The network (represented by its adjacency matrix X) is assumed to be generated as follows:

  • the presence of an edge between nodes i and j is such that $A_{ij} \sim \mathcal{B}(\gamma_{s_i s_j})$, where $s_i \in \{1, \dots, S\}$ indicates the (observed) subgraph of node i,
  • each node i is also associated with an (unobserved) group among K according to $Z_i \sim \mathcal{M}(\alpha_{s_i})$, where $\alpha_s \in [0,1]^K$ and $\sum_{k=1}^{K} \alpha_{sk} = 1$,
  • finally, each edge $X_{ij}$ can be of C different (observed) types, such that $X_{ij} \mid A_{ij} Z_{ik} Z_{jl} = 1 \sim \mathcal{M}(\Pi_{kl})$, where $\Pi_{kl} \in [0,1]^C$ and $\sum_{c=1}^{C} \Pi_{klc} = 1$.

16
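The three sampling steps of the RSM can be sketched in the same spirit. This Python sketch is an illustration under stated assumptions, not the Rambo implementation: subgraphs and groups are 0-based list indices, edge types are returned as 1..C with 0 meaning "no edge", self-loops are excluded, and the function name `sample_rsm` is hypothetical.

```python
import random


def sample_rsm(subgraph, gamma, alpha, pi, seed=None):
    """Draw (Z, A, X) from the RSM generative process (illustrative sketch).

    subgraph: observed subgraph index s_i of each node (0-based).
    gamma:    S x S edge-presence probabilities, gamma[r][s].
    alpha:    one list of K group proportions per subgraph.
    pi:       K x K lists of C edge-type probabilities, pi[k][l].
    Returned edge types are coded 1..C; 0 means "no edge".
    """
    rng = random.Random(seed)
    n = len(subgraph)
    k_groups, n_types = len(alpha[0]), len(pi[0][0])
    # Z_i ~ M(alpha_{s_i}): group proportions depend on the node's subgraph.
    z = [rng.choices(range(k_groups), weights=alpha[s])[0] for s in subgraph]
    a = [[0] * n for _ in range(n)]
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in this sketch
            # A_ij ~ B(gamma_{s_i s_j}): presence depends only on the subgraphs.
            a[i][j] = int(rng.random() < gamma[subgraph[i]][subgraph[j]])
            if a[i][j]:
                # X_ij | A_ij = 1, Z_i = k, Z_j = l ~ M(Pi_kl): type depends only on groups.
                x[i][j] = rng.choices(range(1, n_types + 1), weights=pi[z[i]][z[j]])[0]
    return z, a, x
```

The separation of roles is visible in the code: whether an edge exists only looks at `subgraph`, while its type only looks at the latent groups `z`.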

slide-26
SLIDE 26

The random subgraph model (RSM)

Table: An RSM network (edge presence governed by $\gamma_{rs}$ between subgraphs, edge types by $\pi_{kl}$ between groups).

17

slide-27
SLIDE 27

The random subgraph model (RSM)

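Figure: SBM model vs. RSM model: (a) SBM, in which $X_{ij}$ depends on $Z_i$, $Z_j$ and Π, with the $Z_i$ drawn according to α; (b) RSM, which adds the edge presence $A_{ij}$ and its parameter γ.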

18

slide-29
SLIDE 29

The random subgraph model (RSM)

Remark 1:

  • the RSM model separates the roles of the known partition and of the latent clusters,
  • this was motivated by historical assumptions on how relationships were created during the 6th century,
  • indeed, the possibility of a connection prevailed over its type and depended mainly on geography.

Remark 2:

  • an alternative approach would consist in letting $X_{ij}$ depend directly on both the latent clusters and the partition,
  • however, this would dramatically increase the number of model parameters ($K^2 S^2 (C + 1) + SK$ instead of $S^2 + K^2 C + SK$),
  • if S = 6, K = 6 and C = 4, the alternative approach has 6 516 parameters while RSM has only 216.

19
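The parameter counts of Remark 2 can be checked with a few lines of Python. The formulas follow the slide's counting convention, which does not subtract the sum-to-one constraints on α and Π; the function names are illustrative.

```python
def n_params_rsm(s, k, c):
    """RSM count: S^2 (gamma) + K^2 C (Pi) + S K (alpha)."""
    return s**2 + k**2 * c + s * k


def n_params_alternative(s, k, c):
    """Alternative where X_ij depends on both clusters and subgraphs: K^2 S^2 (C + 1) + S K."""
    return k**2 * s**2 * (c + 1) + s * k


print(n_params_rsm(6, 6, 4), n_params_alternative(6, 6, 4))  # 216 6516
```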

slide-30
SLIDE 30

The random subgraph model (RSM)

We consider a Bayesian framework:

  • the previous model is fully defined by its joint distribution: $p(X, A, Z \mid \alpha, \gamma, \Pi) = p(X \mid A, Z, \Pi)\, p(A \mid \gamma)\, p(Z \mid \alpha)$,
  • which we complete with conjugate prior distributions for the model parameters:
    the prior distribution for γ is $p(\gamma_{rs}) = \mathrm{Beta}(a_{rs}, b_{rs})$,
    the prior distribution for α is $p(\alpha_s) = \mathrm{Dir}(\chi_s)$,
    the prior distribution for Π is $p(\Pi_{kl}) = \mathrm{Dir}(\Xi_{kl})$.

20
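Conjugacy makes some of these updates explicit. Because γ enters the joint distribution only through $p(A \mid \gamma)$, the posterior of each $\gamma_{rs}$ is again a Beta distribution whose parameters add the observed edge and non-edge counts from subgraph r to subgraph s. A minimal Python sketch, assuming directed pairs (i, j) with i ≠ j and 0-based subgraph labels (the function name is hypothetical):

```python
def gamma_posterior(a_mat, subgraph, prior_a, prior_b, r, s):
    """Beta posterior parameters of gamma_rs given the edge-presence matrix A.

    A_ij ~ B(gamma_{s_i s_j}) and gamma_rs ~ Beta(a_rs, b_rs) give
    gamma_rs | A ~ Beta(a_rs + #edges, b_rs + #non-edges), counted over
    ordered pairs (i, j), i != j, with s_i = r and s_j = s.
    """
    n = len(subgraph)
    edges = non_edges = 0
    for i in range(n):
        for j in range(n):
            if i != j and subgraph[i] == r and subgraph[j] == s:
                if a_mat[i][j]:
                    edges += 1
                else:
                    non_edges += 1
    return prior_a + edges, prior_b + non_edges
```

The posterior mean, (a_rs + #edges) / (a_rs + b_rs + #pairs), is then a smoothed version of the observed edge density between the two subgraphs.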

slide-31
SLIDE 31

The random subgraph model (RSM)

Figure: A graphical representation of the RSM model (with hyperparameters χ, a, b and Ξ).

21

slide-33
SLIDE 33

Model inference through a VBEM algorithm

Given the Bayesian framework introduced above:

  • we aim at estimating the posterior distribution $p(Z, \alpha, \gamma, \Pi \mid X, A)$, which in turn allows us to compute MAP estimates of Z and $(\alpha, \gamma, \Pi)$,
  • as expected, this distribution is not tractable and approximate inference procedures are required,
  • MCMC methods are an option, but they scale poorly with the sample size.

We chose to use variational approaches:

  • because they can deal with large networks (N > 1000),
  • recent theoretical results (Celisse et al., 2012; Mariadassou and Matias, 2013) gave new insights into the convergence properties of variational approaches in this context.

22

slide-35
SLIDE 35

The VBEM algorithm

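We aim at estimating the posterior distribution $p(Z, \theta \mid X)$:

  • we use the decomposition of the marginal log-likelihood: $\log p(X) = \mathcal{L}(q(Z, \theta)) + \mathrm{KL}\big(q(Z, \theta) \,\|\, p(Z, \theta \mid X)\big)$, where:
    $\mathcal{L}(q(Z, \theta)) = \sum_{Z} \int_{\theta} q(Z, \theta) \log \frac{p(X, Z, \theta)}{q(Z, \theta)} \, d\theta$ is a lower bound of the log-likelihood,
    $\mathrm{KL}\big(q(Z, \theta) \,\|\, p(Z, \theta \mid X)\big) = -\sum_{Z} \int_{\theta} q(Z, \theta) \log \frac{p(Z, \theta \mid X)}{q(Z, \theta)} \, d\theta$ is the KL divergence between $q(Z, \theta)$ and $p(Z, \theta \mid X)$,
  • we also assume that q factorizes over Z and θ: $q(Z, \theta) = \prod_{i} q_i(Z_i) \, q_\theta(\theta)$.

The VBEM algorithm:

  • VB-E step: $q_\theta(\theta)$ is fixed and $\mathcal{L}$ is maximized over the $q_i$ $\Rightarrow$ $\log q_j^*(Z_j) = \mathbb{E}_{i \neq j, \theta}[\log p(X, Z, \theta)] + c$,
  • VB-M step: all $q_i(Z_i)$ are now fixed and $\mathcal{L}$ is maximized over $q_\theta$ $\Rightarrow$ $\log q_\theta^*(\theta) = \mathbb{E}_{Z}[\log p(X, Z, \theta)] + c$.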

23
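The mean-field updates can be illustrated on the simpler model of the previous section. The Python sketch below implements a variational EM for a directed binary SBM, a frequentist variant in the spirit of Daudin et al. (2008), and not the Bayesian VBEM for RSM described here: its E step applies the fixed-point formula for $\log q_j^*(Z_j)$ node by node, and its M step replaces the update of $q_\theta$ by closed-form point estimates of α and π. The clipping constant `eps`, the initialization and the function name are assumptions of this sketch.

```python
import math
import random


def vem_sbm(x, k_groups, n_iter=50, n_fixed_point=5, tau_init=None, seed=0, eps=1e-10):
    """Mean-field variational EM for a directed binary SBM (illustrative sketch).

    x: N x N 0/1 adjacency matrix (diagonal ignored).
    Returns tau (N x K, tau[i][k] approximates P(Z_i = k | X)), alpha and pi.
    """
    n = len(x)
    if tau_init is None:
        rng = random.Random(seed)  # random soft start
        tau_init = [[rng.random() + 0.1 for _ in range(k_groups)] for _ in range(n)]
    tau = [[v / sum(row) for v in row] for row in tau_init]  # rows sum to 1

    for _ in range(n_iter):
        # M step: closed-form estimates of alpha and pi given tau.
        alpha = [max(sum(row[k] for row in tau) / n, eps) for k in range(k_groups)]
        pi = [[0.0] * k_groups for _ in range(k_groups)]
        for k in range(k_groups):
            for l in range(k_groups):
                num = den = 0.0
                for i in range(n):
                    for j in range(n):
                        if i != j:
                            w = tau[i][k] * tau[j][l]
                            num += w * x[i][j]
                            den += w
                # Keep pi away from 0 and 1 so the logarithms below stay finite.
                pi[k][l] = min(max(num / max(den, eps), eps), 1 - eps)

        # E step: in-place fixed-point updates, one node at a time.
        for _ in range(n_fixed_point):
            for i in range(n):
                logs = []
                for k in range(k_groups):
                    v = math.log(alpha[k])
                    for j in range(n):
                        if j == i:
                            continue
                        for l in range(k_groups):
                            # Node i is the sender of X_ij (pi_kl) and the receiver of X_ji (pi_lk).
                            v += tau[j][l] * (
                                math.log(pi[k][l] if x[i][j] else 1 - pi[k][l])
                                + math.log(pi[l][k] if x[j][i] else 1 - pi[l][k])
                            )
                    logs.append(v)
                top = max(logs)  # log-sum-exp normalization
                w = [math.exp(v - top) for v in logs]
                total = sum(w)
                tau[i] = [v / total for v in w]
    return tau, alpha, pi
```

Starting from a weakly informative initialization (for instance τ = (0.6, 0.4) for the nodes of one planted community and (0.4, 0.6) for the other), this sketch should separate two perfectly planted communities within a few iterations; in general, the result depends on the initialization.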

slide-37
SLIDE 37

Initialization and choice of K

Initialization of the VBEM algorithm:

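  • the VBEM is known to be sensitive to its initialization,
  • we propose a strategy based on several runs of a k-means algorithm with a specific distance:
    $d(i, j) = \sum_{h=1}^{N} \delta(X_{ih} \neq X_{jh}) A_{ih} A_{jh} + \sum_{h=1}^{N} \delta(X_{hi} \neq X_{hj}) A_{hi} A_{hj}$.

Choice of the number K of groups:

  • once the VBEM algorithm has converged, the lower bound $\mathcal{L}(q)$ is a good approximation of the integrated log-likelihood $\log p(X, A)$,
  • we can thus use $\mathcal{L}(q)$ as a model selection criterion for choosing K; if computed right after the M step:
    $\mathcal{L}(q) = \sum_{r,s}^{S} \log\left(\frac{B(a_{rs}, b_{rs})}{B(a^0_{rs}, b^0_{rs})}\right) + \sum_{s=1}^{S} \log\left(\frac{C(\chi_s)}{C(\chi^0_s)}\right) + \sum_{k,l}^{K} \log\left(\frac{C(\Xi_{kl})}{C(\Xi^0_{kl})}\right) - \sum_{i=1}^{N} \sum_{k=1}^{K} \tau_{ik} \log(\tau_{ik})$,
    where B denotes the Beta function, C its multivariate extension $C(x) = \prod_k \Gamma(x_k) / \Gamma(\sum_k x_k)$, the superscript 0 marks prior hyperparameters, and $\tau_{ik} = q_i(Z_i = k)$.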

24
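The criterion above only needs posterior and prior hyperparameters together with the variational probabilities $\tau_{ik}$. The Python sketch below evaluates it, assuming that B is the Beta function, that C is its multivariate extension $C(x) = \prod_k \Gamma(x_k) / \Gamma(\sum_k x_k)$, and that the superscript 0 marks prior values; the dictionary and list layouts and the function names are hypothetical choices.

```python
import math


def log_mbeta(params):
    """log C(x) = sum_k log Gamma(x_k) - log Gamma(sum_k x_k); the 2-component case is log B(a, b)."""
    return sum(math.lgamma(p) for p in params) - math.lgamma(sum(params))


def lower_bound(post_ab, prior_ab, post_chi, prior_chi, post_xi, prior_xi, tau):
    """Evaluate the criterion L(q) used to choose K.

    post_ab / prior_ab:   dict (r, s) -> (a_rs, b_rs), Beta parameters of gamma_rs
    post_chi / prior_chi: list over subgraphs s of Dirichlet parameters of alpha_s
    post_xi / prior_xi:   dict (k, l) -> Dirichlet parameters of Pi_kl
    tau:                  N x K variational probabilities tau_ik
    """
    value = sum(log_mbeta(post_ab[rs]) - log_mbeta(prior_ab[rs]) for rs in post_ab)
    value += sum(log_mbeta(p) - log_mbeta(q) for p, q in zip(post_chi, prior_chi))
    value += sum(log_mbeta(post_xi[kl]) - log_mbeta(prior_xi[kl]) for kl in post_xi)
    # Entropy term -sum tau log tau, with the convention 0 log 0 = 0.
    value -= sum(t * math.log(t) for row in tau for t in row if t > 0)
    return value
```

Each ratio term is an exact conjugate marginal likelihood: a Beta(1, 1) prior updated by three edges and one non-edge gives B(4, 2)/B(1, 1) = 1/20, the probability of that edge sequence under the Beta-Bernoulli model. To choose K, the bound would be computed after convergence for each candidate K, keeping the largest value.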

slide-39
SLIDE 39

The ecclesiastical network

The data:

  • 1331 individuals (mostly clergymen) who participated in ecclesiastical councils in Gaul between 480 and 614,
  • 4 types of relationships between individuals have been identified (positive, negative, variable or neutral),
  • each individual belongs to one of the 5 regions (3 kingdoms and 2 provinces).

Our modeling allows a multi-level analysis:

  • Z allows us to characterize the clusters found through the social positions of the individuals,
  • parameter Π describes the relations between the clusters found,
  • parameter γ describes the connections between the subgraphs,
  • parameter α describes the distribution of the clusters within the subgraphs.

26

slide-40
SLIDE 40

RSM results: the latent clusters

Figure: Characterization of the K = 6 clusters found by RSM (for each cluster, the number of bishops, priests, abbots, earls, dukes, monks, deacons, kings, queens and archdeacons).

27

slide-41
SLIDE 41

RSM results: the latent clusters

The latent clusters from the historical point of view:

  • clusters 1 and 3 correspond to local, provincial or diocesan councils, mostly interested in local issues (e.g., council of Arles, 554),
  • clusters 2 and 6 correspond to councils dedicated to political questions, usually convened by a king (e.g., Orleans, 511),
  • clusters 4 and 5 correspond to aristocratic assemblies, attended by queens, dukes and earls (e.g., Orleans, 529).

28

slide-42
SLIDE 42

RSM results: the relationships between clusters

Figure: Characterization of the relationships between clusters 1 to 6 (parameter Π): positive and negative relationships.

29

slide-43
SLIDE 43

RSM results: the relationships between clusters

Figure: Characterization of the relationships between clusters 1 to 6 (parameter Π): variable and neutral relationships.

30

slide-44
SLIDE 44

RSM results: the relationships between clusters

The clusters relationships from the historical point of view:

  • positive relations between clusters 3, 5 and 6 mainly correspond to personal friendships between bishops (source effect),
  • negative and variable relations between clusters 4, 5 and 6 reflect conflicts within the power hierarchy,
  • neutral relations between clusters 1, 3 and 6 were expected because these clusters deal with different issues (local / political).

31

slide-45
SLIDE 45

RSM results: the relationships between regions

Figure: Characterization of the relationships between the regions Neustria, Provence, Unknown, Aquitaine, Austrasia and Burgundy (parameter γ in log scale).

32

slide-46
SLIDE 46

RSM results: comparison of the regions

Figure: Characterization of the regions (Neustria, Provence, Unknown, Aquitaine, Austrasia, Burgundy, and total) through the proportions of clusters 1 to 6 (parameter α).

33

slide-47
SLIDE 47

RSM results: comparison of the regions

Figure: PCA for compositional data on the parameter α (components Comp.1 and Comp.2; regions Neustria, Provence, Unknown, Aquitaine, Austrasia and Burgundy).

34

slide-50
SLIDE 50

Dynamic networks: a problem in geography

Clustering of dynamic networks is an increasingly important problem, since most observed networks are in fact not static.

As an example, we will analyze a maritime flow network from 1890 to 2008:

Figure: The maritime flow network in 1890, 1946, 1965 and 2008 (basins: Europe−Atlantic, Asia−Pacific, Middle East & Indian Ocean, Med & Black Sea).

36

slide-51
SLIDE 51

Dynamic networks: a problem in geography

Figure: Adjacency matrix of the maritime flow network organized by subgraph, in 1890 and in 2008.

37

slide-53
SLIDE 53

Only a few works in the literature

To date, only a few models have been proposed to deal with this kind of network:

  • dynamic MMSBM by Xing et al.,
  • dynamic SBM by Yang et al.,
  • another dynamic SBM by Xu et al.,
  • dynamic LPCM by Sarkar et al.,
  • and a few others...

Here, we extend the RSM model (Jernite et al., 2012) to deal with dynamic networks with categorical edges and a known partition into subgraphs.

38

slide-54
SLIDE 54

The dRSM model: the model at time t

At time t, the network (represented by its adjacency matrix $X^{(t)}$) is assumed to be generated as follows:

  • each node i is associated with an (unobserved) group among K according to $Z^{(t)}_i \sim \mathcal{M}(\alpha^{(t)}_{s_i})$, where $\alpha^{(t)}_s \in [0,1]^K$ and $\sum_{k=1}^{K} \alpha^{(t)}_{sk} = 1$,
  • each edge $X^{(t)}_{ij}$ can have C + 1 different (observed) types (0 denotes the absence of an edge), such that $X^{(t)}_{ij} \mid Z^{(t)}_{ik} Z^{(t)}_{jl} = 1 \sim \mathcal{M}(\Pi_{kl})$, where $\Pi_{kl} \in [0,1]^{C+1}$ and $\sum_{c=0}^{C} \Pi_{klc} = 1$.

39

slide-55
SLIDE 55

The dRSM model: modeling the evolution

We rely on a state space model to take into account the dynamics of the network:

  • we introduce the latent variable $\gamma^{(t)}_s$ to link the group proportions over time: $\alpha^{(t)}_{sk} = \exp(\gamma^{(t)}_{sk}) / C(\gamma^{(t)}_s)$, where $\gamma^{(t)}_{sK} = 0$ and $C(\gamma^{(t)}_s) = \sum_{\ell=1}^{K} \exp(\gamma^{(t)}_{s\ell})$,
  • $\gamma^{(t)}_{s \setminus K}$ is further assumed to follow a normal distribution with mean $B \nu^{(t)}$ and covariance matrix Σ: $\gamma^{(t)}_{s \setminus K} \sim \mathcal{N}(B \nu^{(t)}, \Sigma)$.

40
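The link between $\gamma^{(t)}_s$ and $\alpha^{(t)}_s$ is a softmax with the K-th component pinned to 0. A minimal Python sketch of this map (the function name is hypothetical), using the usual max-subtraction trick for numerical stability, which leaves the result unchanged:

```python
import math


def alpha_from_gamma(gamma_free):
    """Map the K - 1 free entries of gamma_s to alpha_s, with gamma_sK fixed to 0."""
    gamma = list(gamma_free) + [0.0]     # append the reference component gamma_sK = 0
    top = max(gamma)                     # subtracting the max leaves the ratios unchanged
    weights = [math.exp(g - top) for g in gamma]
    total = sum(weights)                 # C(gamma_s) rescaled by exp(-top)
    return [w / total for w in weights]  # alpha_sk = exp(gamma_sk) / C(gamma_s)
```

Pinning $\gamma^{(t)}_{sK}$ to 0 is what makes the map one-to-one: without it, adding the same constant to all K components would leave $\alpha^{(t)}_s$ unchanged.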

slide-57
SLIDE 57

The dRSM model: modeling the evolution

The remainder of the model is a classical state space model:

  • $\nu^{(t)}$ depends on $\nu^{(t-1)}$ such that $\nu^{(t)} = A \nu^{(t-1)} + \omega^{(t)}$, where $\omega^{(t)} \sim \mathcal{N}(0, \Phi)$, $\nu^{(1)} = \mu_0 + u$ and $u \sim \mathcal{N}(0, v_0)$.

To avoid model identifiability issues, in the numerical experiments we fix A, B and $v_0$ to the identity matrix $I_{K-1}$ and set all components of $\mu_0$ to zero.

41
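Putting the pieces together, the following Python sketch simulates the latent trajectory for S subgraphs under the settings stated above (A = B = $v_0$ = $I_{K-1}$, $\mu_0 = 0$). As the notation suggests, $\nu^{(t)}$ is taken to be shared by all subgraphs. The diagonal covariances $\Phi = \phi I$ and $\Sigma = \sigma I$, their default values and the function name are assumptions of this sketch; `phi_var` and `sigma_var` are variances, so their square roots are passed to `random.gauss`, which expects a standard deviation.

```python
import math
import random


def simulate_dynamic_gammas(n_times, n_subgraphs, k_groups, phi_var=0.1, sigma_var=0.05, seed=None):
    """Simulate the K - 1 free entries of gamma_s^(t) for every subgraph s and time t (sketch).

    Uses A = B = v0 = I_{K-1} and mu0 = 0; Phi = phi_var * I and
    Sigma = sigma_var * I are simplifying assumptions.
    Returns trajectory[t][s], a list of K - 1 values.
    """
    rng = random.Random(seed)
    d = k_groups - 1
    # nu^(1) = mu0 + u with mu0 = 0 and u ~ N(0, I)
    nu = [rng.gauss(0.0, 1.0) for _ in range(d)]
    trajectory = []
    for t in range(n_times):
        if t > 0:
            # nu^(t) = A nu^(t-1) + omega^(t), with A = I and omega^(t) ~ N(0, phi_var * I)
            nu = [v + rng.gauss(0.0, math.sqrt(phi_var)) for v in nu]
        # gamma_s^(t) ~ N(B nu^(t), Sigma) with B = I, drawn independently for each subgraph
        trajectory.append([[v + rng.gauss(0.0, math.sqrt(sigma_var)) for v in nu]
                           for _ in range(n_subgraphs)])
    return trajectory
```

For instance, `simulate_dynamic_gammas(17, 4, 7)` matches the dimensions of the maritime application (17 time points, 4 subgraphs, 7 groups); each $\gamma^{(t)}_s$ can then be mapped to $\alpha^{(t)}_s$ through the exp / C relation given earlier.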

slide-58
SLIDE 58

The dRSM model: modeling the evolution

Figure: Graphical representation of the dRSM model (variables $X^{(t)}_{ij}$, $Z^{(t)}_i$, $Z^{(t)}_j$, $\gamma^{(t)}$ and $\nu^{(t)}$; parameters Π, B, Σ, $\mu_0$, A and Φ).

42

slide-59
SLIDE 59

Analysis of the maritime flow network: The data

We considered the data from Ducruet (2013):

  • data from Lloyd’s List (Voyage Record) covering the period 1890-2008 at 17 time points,
  • considerable work was needed to extract the records from paper versions and to fill in missing information (capacity, ...),
  • the data contain 176 095 vessels between 4472 ports, but we had to restrict the analysis to the 286 ports present at all time points,
  • 4 types of relations between ports are considered: liquid bulk, passengers, containers and solid bulk.

43

slide-60
SLIDE 60

Analysis of the maritime flow network: The data

Figure: Map of the ports and their maritime basins.

44

slide-61
SLIDE 61

Analysis of the maritime flow network: The results

Figure: Choice of the number of groups K according to BIC (K ranging from 3 to 8).

45

slide-62
SLIDE 62

Analysis of the maritime flow network: The results

Connection probabilities between clusters (rows and columns: clusters 1 to 7):

Cluster 1: 0.2%, 1.3%, 0.4%, 0.6%, 1.2%, 9.3%, 2.8%
Cluster 2: 1.3%, 11.8%, 2%, 6%, 6%, 51.9%, 23.6%
Cluster 3: 0.4%, 2%, 3.3%, 0.3%, 9.4%, 37%, 11.6%
Cluster 4: 0.6%, 6%, 0.3%, 3.3%, 1.2%, 25.4%, 9%
Cluster 5: 1.2%, 6%, 9.4%, 1.2%, 23.7%, 68.9%, 30.5%
Cluster 6: 9.3%, 51.9%, 37%, 25.4%, 68.9%, 86.6%, 83.6%
Cluster 7: 2.8%, 23.6%, 11.6%, 9%, 30.5%, 83.6%, 50.6%

Figure: Estimated values for $\Pi^{1}_{kl}$.

46

slide-63
SLIDE 63

Analysis of the maritime flow network: The results

Figure: Evolution over time (1890 to 2008) of the proportions of groups G1 to G7 in Subgraph 1 (Asia − Pacific), Subgraph 2 (Europe − Atlantic), Subgraph 3 (Medit. − Black Sea) and Subgraph 4 (Middle East − India).

47

slide-66
SLIDE 66

Conclusion

Our contributions:

  • the RSM model takes into account an existing partition into subgraphs,
  • this modeling then allows a comparison of the subgraphs,
  • the dRSM model can deal with evolving networks.

Software: the R package Rambo is available on CRAN.

Publications:

  • C. Bouveyron, L. Jegou, Y. Jernite, S. Lamassé, P. Latouche & P. Rivera, The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul, The Annals of Applied Statistics, 8(1), 377-405, 2014.
  • C. Bouveyron, P. Latouche and R. Zreik, The Dynamic Random Subgraph Model for the Clustering of Evolving Networks, Preprint HAL n°01122393, Laboratoire MAP5, Université Paris Descartes, 2015.
  • C. Bouveyron, C. Ducruet, P. Latouche and R. Zreik, Cluster Identification in Maritime Flows with Stochastic Methods, in Maritime Networks: Spatial Structures and Time Dynamics, Routledge, 2015.

49