Statistical modelling of a terrorist network with the latent class model and Bayesian model comparisons



SLIDE 1

Statistical modelling of a terrorist network with the latent class model and Bayesian model comparisons

Murray Aitkin and Duy Vu, and Brian Francis

murray.aitkin@unimelb.edu.au duy.vu@unimelb.edu.au b.francis@lancaster.ac.uk

School of Mathematics and Statistics, University of Melbourne and Department of Mathematics and Statistics, University of Lancaster, UK

BOB 2015 Terrorist network – p. 1

SLIDE 2

Statistical modelling of social networks

Work supported by the Australian Research Council, 2004-7 and 2012-15.

Aim: to evaluate latent class modelling by maximum likelihood and Bayesian methods for the analysis of social network and criminal career data.

Participants: Murray Aitkin, Pip Pattison, Brian Francis, Duy Vu.

Main contribution: identifying subgroups, their number and membership, by latent class modelling and Bayesian model comparison.

Two examples:

  • Social network of the Natchez, Mississippi women: see Aitkin, M., Vu, D. and Francis, B.J. (2014). Statistical modelling of the group structure of social networks. Social Networks 38, 74-87.
  • Noordin Top terrorist network: Aitkin, M., Vu, D. and Francis, B.J. (2016), to appear in an RSS journal (A or C).

SLIDE 3

Where do network data come from?

For social networks – networks of people or other social creatures – the data come from either

  • direct observation, or
  • indirect data gathering through newspapers and other recording instruments.

These sources of information provide the evidence of connections among actors. The connections are presented mathematically, and are analysed through properties of the mathematical structure.

SLIDE 4

Facebook friendship network (Wikipedia)

SLIDE 5

Unipartite and bipartite networks

The Facebook network is a unipartite network – it represents direct connections between the Facebook users. These connections may be directed (A likes B does not imply that B likes A) or undirected or reciprocal (A and B are connected through something, like Facebook). We discuss bipartite networks, in which the connections between actors are through their joint participation in events.

SLIDE 6

The Natchez social network

SLIDE 7

The adjacency matrix

To perform any analysis we need to re-express the table elements mathematically through a link, or tie variable Yij, with the presence of woman i at event j defining Yij = 1, and her absence from the event defining Yij = 0. We use n to denote the number of rows – women, and r to denote the number of columns – events. The resulting table is expressed as an n × r matrix, called the adjacency matrix, denoted by Y . Marginal totals (T) have been added to the table, giving the total number of events attended by each woman, and the total number of women attending each event. We see that women vary in their propensity to attend events, and events vary in their attractiveness to women. We also give the marital status of woman i in a variable xi, coded 1 for married and 0 for unmarried.
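As a minimal sketch of this construction, the following builds an adjacency matrix and its marginal totals from attendance records. The four actors and three events are hypothetical toy data, not the Natchez table, and the variable names are illustrative only.

```python
# Hypothetical toy data: 4 actors and 3 events (not the Natchez table).
attendance = {
    "actor_1": {1, 2},
    "actor_2": {2, 3},
    "actor_3": {1, 2, 3},
    "actor_4": {3},
}
n_events = 3

# n x r adjacency matrix Y: Y[i][j] = 1 if actor i attended event j + 1, else 0.
Y = [[1 if j in events else 0 for j in range(1, n_events + 1)]
     for events in attendance.values()]

# Marginal totals (T): events per actor (row sums), actors per event (column sums).
row_totals = [sum(row) for row in Y]
col_totals = [sum(col) for col in zip(*Y)]
grand_total = sum(row_totals)
```

The row totals reflect each actor's propensity to attend and the column totals each event's attractiveness, which is the variation the models below try to capture.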

SLIDE 8

Two-mode network data

[Table: 18 × 14 attendance matrix of women (W) by events (E), with marital status x and marginal totals T. The positions of the individual entries are not recoverable here; the marginal totals are listed below.]

Events attended by women 1 to 18 (T): 8 7 8 7 4 4 4 3 4 4 5 6 7 8 5 2 2 2
Women attending events 1 to 14 (T): 3 3 6 4 8 9 10 14 12 5 4 6 3 3
Grand total: 90

SLIDE 9

Two-mode network data – zeros suppressed

[Table: the 18 × 14 attendance matrix of women (W) by events (E) with zero entries left blank; marginal totals T as before, grand total 90.]

SLIDE 10

18 Actors 14 Events

Original Matrix

SLIDE 11

18 Actors 14 Events

Random Shuffled Matrix

SLIDE 12

Probability models for actors and events

Analysis needs to allow for uncertainty in the behaviour of actors: even if they form an established group with other actors, this does not mean that they all attend the same events.

We consider the presence or absence of an actor at an event as a random process: attendance is determined by a possibly large number of factors unknown to us, so we represent the process outcome as a Bernoulli random variable.

The probability that actor i attends event j (Yij = 1) is pij, and the probability that actor i does not attend event j (Yij = 0) is 1 − pij. We want to bring the actor and event structures into the event attendance probability in some way.

SLIDE 13

Models

The "null" model is a single-parameter model, giving the same constant probability pij = p that every actor attends every event, independently across events and actors: all actors have the same attendance probability, and all events have the same attraction probability.

The Rasch model has a parameter for each actor and a parameter for each event:

  • Each actor i has a propensity θi to attend any event.
  • Each event j has an attractiveness φj to any actor.
  • Actors attend events independently.
  • The Rasch model is a main effect or additive exponential random graph model (ERGM), in events and actors, on the logit-transformed probability scale:

$$\operatorname{logit} p_{ij} = \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \theta_i + \phi_j.$$

It has no subgroup structure.
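The Rasch attendance probabilities can be sketched as follows: inverting the logit gives pij = 1 / (1 + exp(−(θi + φj))). The parameter values below are hypothetical, chosen only to illustrate the calculation, and the function name is not from the talk.

```python
import math

def rasch_probabilities(theta, phi):
    """Return the n x r matrix p[i][j] = 1 / (1 + exp(-(theta[i] + phi[j])))."""
    return [[1.0 / (1.0 + math.exp(-(t + f))) for f in phi] for t in theta]

theta = [0.5, -1.0]       # hypothetical actor propensities
phi = [0.0, 1.0, -0.5]    # hypothetical event attractiveness values
p = rasch_probabilities(theta, phi)

# Back-transforming recovers the additive structure on the logit scale.
logit_00 = math.log(p[0][0] / (1.0 - p[0][0]))   # equals theta[0] + phi[0]
```

Setting every θi and φj to zero except for a common constant gives back the null model with a single probability p.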

SLIDE 14

The latent class model

This model specifies a K-class latent structure for actors. The K classes are distinguished by K sets of event attendance parameters qjk, different among classes, but identical within classes. The proportion of actors in class k is πk; θK = (K, {πk}, {qjk}). The class structure is unobserved; it is implied and identified by the actors' different patterns of event attendance.

The (observed data) likelihood L(θK), the probability of the observed data, is given by

$$\Pr[\{y_{ij}\} \mid k, i] = \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}$$

$$\Pr[\{y_{ij}\} \mid i] = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}$$

$$L(\theta_K) = \Pr[\{y_{ij}\}] = \prod_{i=1}^{n} \left[ \sum_{k=1}^{K} \pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}} \right].$$

SLIDE 15

Analysis with the complete data likelihood

Bayesian analysis is greatly simplified by introducing counterfactual missing data: the class identification of each actor. We define Zik = 1 if actor i belongs to class k, and zero otherwise, with (prior) probability πk. If the complete data yij and Zik were observed, the complete data likelihood CL(θK) for the K-class model would be

$$\begin{aligned}
CL(\theta_K) &= \Pr[\{y_{ij}\}, \{Z_{ik}\}] = \Pr[\{y_{ij}\} \mid \{Z_{ik}\}] \cdot \Pr[\{Z_{ik}\}] \\
&= \left[ \prod_{i=1}^{n} \prod_{j=1}^{r} \prod_{k=1}^{K} \left( q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}} \right)^{Z_{ik}} \right] \cdot \left[ \prod_{i=1}^{n} \prod_{k=1}^{K} \pi_k^{Z_{ik}} \right] \\
&= \prod_{i=1}^{n} \prod_{k=1}^{K} \left[ \pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}} \right]^{Z_{ik}}.
\end{aligned}$$

SLIDE 16

MCMC analysis

MCMC iterates between making

  • random draws of the Zik given the current parameter draws, and
  • random draws of the parameters given the current Zik draws.

With flat or non-informative priors on the parameters and the Zik, the conditional distributions can be inferred from the complete data likelihood:

  • the Zik given the parameters are multinomial with probabilities proportional to $\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}$;
  • the parameters given the Zik:
  • the πk are Dirichlet with parameters $Z_{+k} = \sum_{i=1}^{n} Z_{ik}$;
  • the qjk are Beta with parameters $\sum_{i=1}^{n} Z_{ik} y_{ij}$ and $\sum_{i=1}^{n} Z_{ik} (1 - y_{ij})$.
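The two-step iteration can be sketched as a small Gibbs sampler. This is an illustration under stated assumptions, not the authors' code: it assumes uniform priors, so 1 is added to each Dirichlet and Beta parameter to keep every draw proper even when a class is empty; the function name, starting values and toy data are hypothetical.

```python
import math
import random

def gibbs_latent_class(Y, K, n_iter=100, seed=0):
    """Return a list of (pi, q, z) draws; z[i] is the class label of actor i."""
    rng = random.Random(seed)
    n, r = len(Y), len(Y[0])
    pi = [1.0 / K] * K
    q = [[rng.uniform(0.25, 0.75) for _ in range(K)] for _ in range(r)]
    draws = []
    for _ in range(n_iter):
        # Step 1: draw each actor's class given (pi, q), with weights
        # proportional to pi_k * prod_j q_jk^y (1 - q_jk)^(1 - y).
        z = []
        for y in Y:
            log_w = [math.log(max(pi[k], 1e-300)) + sum(
                         math.log(q[j][k]) if y[j] else math.log(1.0 - q[j][k])
                         for j in range(r))
                     for k in range(K)]
            top = max(log_w)
            weights = [math.exp(lw - top) for lw in log_w]
            z.append(rng.choices(range(K), weights=weights)[0])
        # Step 2a: draw pi given z from Dirichlet(Z_+k + 1), via normalised gammas.
        counts = [z.count(k) for k in range(K)]
        g = [rng.gammavariate(c + 1.0, 1.0) for c in counts]
        pi = [gk / sum(g) for gk in g]
        # Step 2b: draw q_jk given z from Beta(attended + 1, not attended + 1).
        for j in range(r):
            for k in range(K):
                s = sum(Y[i][j] for i in range(n) if z[i] == k)
                draw = rng.betavariate(s + 1.0, counts[k] - s + 1.0)
                q[j][k] = min(max(draw, 1e-12), 1.0 - 1e-12)  # keep logs finite
        draws.append((list(pi), [row[:] for row in q], list(z)))
    return draws

# Hypothetical toy data: two groups of actors with disjoint event patterns.
Y_toy = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
toy_draws = gibbs_latent_class(Y_toy, K=2, n_iter=100, seed=1)
```

In practice early draws would be discarded as burn-in, and class labels may swap between draws, so summaries by class label need care.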

SLIDE 17

Bayesian model comparison

  • Bayesian theory allows us to decide which of these (or other) models is the most plausible for a given data set, through the deviance distributions for the competing models (deviance = −2 log likelihood = "badness of fit of the model").
  • This approach (Dempster 1997, Aitkin 1997, Aitkin, Boys and Chadwick 2005, Aitkin 2010) is to use the posterior distributions of the likelihoods,
  • by substituting M (typically 10,000) random draws $\theta_k^{[m]}$ of the parameters $\theta_k$ (k = 1,...,K) from their posterior distributions
  • into the (observed data) likelihoods $L(\theta_k)$,
  • giving M corresponding random draws $L_k^{[m]} = L(\theta_k^{[m]})$ from the posterior distributions of the observed data likelihoods.
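As one concrete case, the substitution can be sketched for the null model, which has a single parameter p. The totals are those of the Natchez table (90 attendances among 18 × 14 = 252 actor-event pairs); the flat prior on p, which gives a Beta(s + 1, N − s + 1) posterior, is an assumption of this sketch, and the draws are shown on the deviance (−2 log likelihood) scale.

```python
import math
import random

# Totals from the Natchez table: 90 attendances in 18 x 14 = 252 actor-event pairs.
s, N = 90, 18 * 14

def null_deviance(p, s, N):
    """Observed data deviance -2 log L(p) for the constant-probability model."""
    return -2.0 * (s * math.log(p) + (N - s) * math.log(1.0 - p))

rng = random.Random(0)
M = 10_000
# Assumption: flat prior on p, so the posterior is Beta(s + 1, N - s + 1).
p_draws = [rng.betavariate(s + 1, N - s + 1) for _ in range(M)]
deviance_draws = [null_deviance(p, s, N) for p in p_draws]

# The smallest attainable deviance is at the maximum likelihood estimate s / N.
min_deviance = null_deviance(s / N, s, N)
```

Every draw lies at or above `min_deviance`; the spread of the draws above that minimum is what the posterior deviance distribution describes.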

SLIDE 18

Model comparison through posterior deviances

  • Because of the scale of likelihoods, we use (observed data) deviances $D_k(\theta_k) = -2 \log L_k(\theta_k)$ rather than likelihoods L.
  • Models are compared for the stochastic ordering of their posterior deviance distributions,
  • initially by graphing the cdfs of the deviance draws $D_k^{[m]} = D_k(\theta_k^{[m]})$ for each number of components;
  • the left-most cdf defines the best-supported model.
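A minimal sketch of the cdf comparison, assuming equal numbers of deviance draws per model. The deviance values are hypothetical, and the order-statistic check is only a simple numerical stand-in for inspecting the graphed cdfs, not the authors' procedure.

```python
def empirical_cdf(draws, x):
    """Proportion of deviance draws less than or equal to x."""
    return sum(d <= x for d in draws) / len(draws)

def cdf_lies_left(dev_a, dev_b):
    """True if each order statistic of dev_a is <= the matching one of dev_b,
    i.e. the empirical cdf of dev_a is never to the right of that of dev_b."""
    return all(a <= b for a, b in zip(sorted(dev_a), sorted(dev_b)))

# Hypothetical deviance draws for two competing models.
dev_two_class = [241.0, 230.0, 250.2, 235.5]
dev_rasch = [265.0, 281.7, 260.1, 270.3]

two_class_is_left = cdf_lies_left(dev_two_class, dev_rasch)
cdf_at_241 = empirical_cdf(dev_two_class, 241.0)
```

When the cdfs cross, neither model is stochastically smaller and the check returns False in both directions.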

SLIDE 19

DIC

The DIC of Spiegelhalter et al. (2002), implemented in BUGS, also uses deviance draws, but these are of the complete data deviance rather than the observed data deviance, and are used only to compute the mean complete data deviance across the draws. The complete data likelihood and deviance treat the latent class as an observed structure, overstating the data information.

The DIC, like AIC and BIC and some other decision criteria, requires a penalty (in this case using the effective number of parameters) on the mean complete data deviance to account for this overstatement. This is not needed for the comparison of observed data deviance distributions: models with increasing numbers of components are effectively penalized for their increasing parametrization, as they have increasingly diffuse deviance distributions because of the decreasing data information about each component.

But does it work? See later.
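For reference, the DIC calculation can be sketched from the standard definition in Spiegelhalter et al. (2002): DIC = mean deviance + pD, where the effective number of parameters pD is the mean deviance minus the deviance at the posterior mean of the parameters. The function name and the numbers are hypothetical, and this is not a reproduction of the BUGS implementation.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) from deviance draws and the deviance at the posterior mean."""
    mean_deviance = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_deviance - deviance_at_posterior_mean   # effective number of parameters
    return mean_deviance + p_d, p_d

# Hypothetical values for illustration only.
dic_value, p_d = dic([100.0, 102.0, 104.0], 99.0)
```

Only the mean of the deviance draws enters the criterion, whereas the posterior deviance comparison uses their whole distribution.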

SLIDE 20

Class membership probabilities

The Bayesian analysis also provides the full posterior distribution of the class membership probabilities. The probability of membership of actor i in class k, given the data, follows from Bayes's theorem:

$$\pi_{ik \mid \text{data}} = \Pr(i \in \text{class } k \mid \text{data}) = \frac{\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}}{\sum_{\ell=1}^{K} \pi_\ell \prod_{j=1}^{r} q_{j\ell}^{y_{ij}} (1 - q_{j\ell})^{1 - y_{ij}}}$$

  • Substituting the parameter draws $\theta_k^{[m]} = (\pi_k^{[m]}, q_{jk}^{[m]})$ into the membership probability gives its posterior distribution from the M values $\pi_{ik \mid \text{data}}^{[m]}$.

Label-switching can be a problem.
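A minimal sketch of the membership calculation for one actor and one set of parameter values (for example a single MCMC draw). The function name and the two-class, two-event values are hypothetical.

```python
import math

def membership_probabilities(y, pi, q):
    """Pr(class k | y) for one actor with 0/1 attendance vector y.

    pi[k] is the class proportion and q[j][k] the attendance probability
    of event j in class k.
    """
    log_w = []
    for k, pk in enumerate(pi):
        lw = math.log(pk)
        for j, yij in enumerate(y):
            lw += math.log(q[j][k]) if yij else math.log(1.0 - q[j][k])
        log_w.append(lw)
    top = max(log_w)                          # rescale before exponentiating
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wk / total for wk in w]

# Hypothetical parameter values: class 1 attends both events with probability 0.9.
pi_toy = [0.5, 0.5]
q_toy = [[0.9, 0.1], [0.9, 0.1]]
probs = membership_probabilities([1, 1], pi_toy, q_toy)
```

Here an actor attending both events has class 1 probability 0.405 / (0.405 + 0.005) = 0.81 / 0.82. Repeating the calculation over all M parameter draws gives the posterior distribution of each membership probability.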

SLIDE 21

The Natchez women

[Figure: "Asymptotic Deviances". CDFs of the deviance draws for the Rasch, K = 2 and K = 3 models; x-axis: Deviance (200 to 320), y-axis: CDF (0 to 1).]

SLIDE 22

The Natchez women

The two-class latent class model fits better than the Rasch, and the three-class model does not fit any better than the two-class model. We conclude that the two-class model is best.

SLIDE 23

Posterior membership distributions for women 1–9

SLIDE 24

Posterior membership distributions for women 10–18

SLIDE 25

Summary

  • Women 1–6 clearly belong to class 1.
  • Women 10–18 clearly belong to class 2.
  • Women 7–9 have grades of membership in both classes:
  • Woman 7 93% in class 1, 7% in class 2.
  • Woman 8 11% in class 1, 89% in class 2.
  • Woman 9 43% in class 1, 57% in class 2.

Woman 9 was claimed by both classes in interviews; woman 8 was placed in a separate class in many analyses.

SLIDE 26

The Noordin Top terrorist network (Wikipedia)

  • Noordin Mohammad Top, a Malaysian citizen, was a Muslim extremist and Indonesia's most wanted Islamist militant.
  • He is thought to have been a key bomb maker and/or financier.
  • Noordin and Azahari Husin were thought to have masterminded
  • the 2003 Marriott hotel bombing in Jakarta,
  • the 2004 Australian embassy bombing in Jakarta,
  • the 2005 Bali bombings and the 2009 JW Marriott-Ritz-Carlton bombings,
  • and Noordin may have assisted in the 2002 Bali bombings.
  • Noordin was an indoctrinator who specialized in recruiting militants into becoming suicide bombers and collecting funds for militant activities.
  • Husin was killed in a police raid on his hideout in Batu, near Malang in East Java, on 9 November 2005.
  • Top was killed during a police raid in Solo, Central Java, on 17 September 2009, conducted by an Indonesian anti-terrorist team.

SLIDE 27

Data source

  • The data come from the book Disrupting Dark Networks by Everton, in the Structural Analysis in the Social Sciences series, Cambridge (2012).
  • Appendix 1 of this book provides data:
  • The subset of the Noordin Top Terrorist Network was drawn primarily from Terrorism in Indonesia: Noordin's Networks, a 2006 publication of the International Crisis Group.
  • It includes relational data on 79 individuals listed in Appendix C of that publication.
  • The data were initially coded as 45 binary items.

Our analysis is restricted to 75 of these individuals: four individuals were eliminated as they were not present at any of the 45 events used in the analysis.

SLIDE 28

74 Actors 45 Events

Original Matrix

SLIDE 29

Interpretation

The appearance of the full 74×45 adjacency matrix is quite different from that for the Natchez women. It is very sparse, and appears random, without the structure we can see in the Natchez women network matrix. Re-ordering the rows and columns does not much change the appearance of the matrix: there is no clear division into sub-groups.

SLIDE 30

The Noordin network

[Figure: "Noordin". CDFs of the deviance draws for the Rasch, K = 2, K = 3 and K = 4 models; x-axis: Deviance (1700 to 1800), y-axis: CDF (0 to 1).]

SLIDE 31

The Noordin network

The three-class model fits better than the two-class or Rasch, and the four-class model does not fit any better than the three-class. We conclude that the three-class model is best. Which actors are in which classes? We need a probabilistic expression for this – the ternary plot.

SLIDE 32

Ternary plot

[Figure: ternary plot with vertices G 1, G 2 and G 3. Labelled points: Azahari Husin 17, Noordin Mohammed Top 23; the remaining points carry numeric labels between 5 and 9.]

SLIDE 33

The Noordin network

Top and Husin define class 1: the planners and leaders. Class 3 contains all the actors who attended 6-9 events: the trainers who meet the planners and train the footsoldiers. Actors who attended 5 or fewer events are spread along the class 2-3 axis; class 2 are the footsoldiers, who are present at organisation, training and operations, and never at finance, meetings or logistics. The division between classes 2 and 3 is not absolute. This may be because of the short "working life" of many actors, killed in actions or arrested and in prison. Trainers who are killed in actions or imprisoned may be replaced by footsoldiers who have survived actions. Of the 74 actors, 45 were dead or in prison by 2006.

SLIDE 34

What works in Bayesian model comparison?

  • AIC chooses over-complex models asymptotically.
  • BIC chooses the true model asymptotically but chooses under-complex models in finite samples.
  • Simulation studies with the posterior deviance distribution:
  • comparison of normal mixture models with 1–7 components on samples generated from each model (Aitkin, Vu and Francis 2015), derived from real astronomical galaxy recession velocity data, with DIC comparison: Aitkin, M., Vu, D. and Francis, B.J. (2015). A new Bayesian approach for determining the number of components in a finite mixture. Metron DOI: 10.1007/s40300-015-0068-1.
  • comparison of four single-population models (normal, lognormal, gamma and multinomial) on samples generated from each model (Aitkin 2010 pp. 188-192), derived from real family income data.

SLIDE 35

Galaxy recession velocity data

82 observations of recession velocities of galaxies, modelled as a mixture of K normals with different means and variances (shown with single normal cdf).

[Figure: cdf of the galaxy recession velocities with the single normal cdf; x-axis: velocity (10 to 35), y-axis: cdf (0 to 1).]

SLIDE 36

Simulations

100 samples of sizes 82, 164, 328, 656 were generated from normal mixture distributions for K = 1,...,7 components. The means, variances and proportions for the simulated data were equal to the MLEs of the galaxy data.

For each sample the posterior deviance for each model was computed for each of 1,000 draws, and for each draw the model with the smallest deviance was called "best". The graphs show the percentages in the 100 samples of "best model" for each K and sample size, by both the DIC and the posterior deviance:

  • DIC (solid), posterior deviance (dash).
  • Panel 1 sample size 82, panel 2 (across) sample size 164, panel 3 (under) sample size 328, panel 4 sample size 656.
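The per-draw "best model" step can be sketched as below for a single simulated sample. The deviance values are hypothetical, and pairing the draws across models by index is an assumption of this sketch; how the per-draw results were aggregated into the plotted percentages is not specified here.

```python
from collections import Counter

def best_model_shares(deviance_draws):
    """Share of draws in which each model has the smallest deviance.

    deviance_draws maps a model label to its list of deviance draws;
    draws are paired across models by index (an assumption of this sketch).
    """
    labels = list(deviance_draws)
    M = len(deviance_draws[labels[0]])
    wins = Counter()
    for m in range(M):
        best = min(labels, key=lambda k: deviance_draws[k][m])
        wins[best] += 1
    return {k: wins[k] / M for k in labels}

# Hypothetical deviance draws for three mixture sizes (not from the talk).
toy_deviances = {
    "K=1": [480.0, 476.5, 482.1, 479.3],
    "K=2": [455.2, 478.0, 451.9, 460.4],
    "K=3": [457.0, 470.1, 453.3, 458.8],
}
shares = best_model_shares(toy_deviances)
```

In this toy case K = 2 and K = 3 each win half of the draws and K = 1 wins none.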

SLIDE 37

Galaxy data simulations

[Figure: four panels, each plotting percent correct (10 to 100) against K (1 to 7).]

SLIDE 38

Simulation results

  • The DIC performed well for up to 3 components but poorly for more than 3.
  • The posterior deviance worked uniformly better: its correct identification probabilities increased steadily with increasing sample size.

The posterior deviance performed similarly in simulations from latent class models with varying numbers of classes (Aitkin, Vu and Francis 2015).

SLIDE 39

Family income data

[Figure: count (0 to 35) against INCOME in hundreds (50 to 250).]

SLIDE 40

Model fit

[Figure: three panels. Normal: population deviate against income deviate. Lognormal: population deviate against log income deviate. Gamma: population deviate against income deviate.]

SLIDE 41

Simulations

1,000 samples of sizes from 10 to 1,000 were generated from normal, lognormal, gamma and multinomial populations. The mean and variance for the parametric models were equal to those of the income data population.

For each sample the posterior deviance for each model was computed for each of 10,000 draws, and for each draw the model with the smallest deviance was called "best". The graphs show the proportions in the 1,000 samples of "best model" for each of the four:

  • normal (solid), lognormal (dotted), gamma (dot-dash), multinomial (dash).
  • Panel 1 true normal, panel 2 (across) true lognormal, panel 3 (under) true gamma, panel 4 true multinomial.

SLIDE 42

Model choice performance

[Figure: four panels, each plotting probability against sample size.]

SLIDE 43

Simulation results

  • For the true multinomial, by sample size 120 the multinomial model was best, and its probability increased rapidly with increasing sample size.
  • For the parametric models:
  • the true normal was best for all sample sizes,
  • the true lognormal was best for sample sizes greater than 120,
  • the true gamma was best for sample sizes greater than 50,
  • the multinomial always had low probability for small sample sizes, but this approached 0.5 in large samples.

SLIDE 44

References

Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood (with Discussion). Statistics and Computing 7, 253-272.

Aitkin, M. (2001). Likelihood and Bayesian analysis of mixtures. Statistical Modelling 1, 287-304.

Aitkin, M., Boys, R.J. and Chadwick, T. (2005). Bayesian point null hypothesis testing via the posterior likelihood ratio. Statistics and Computing 15, 217-230.

Aitkin, M. (2010). Statistical Inference: an Integrated Likelihood/Bayesian Approach. Chapman and Hall/CRC Press, Boca Raton FL.

Aitkin, M. (2011). How many components in a finite mixture? In Mixtures: Estimation and Applications, eds. K.L. Mengersen, C.P. Robert and D.M. Titterington. Wiley, Chichester, 277-292.

Aitkin, M., Vu, D. and Francis, B.J. (2014). Statistical modelling of the group structure of social networks. Social Networks 38, 74-87.

Aitkin, M., Vu, D. and Francis, B.J. (2015). A new Bayesian approach for determining the number of components in a finite mixture. Metron DOI: 10.1007/s40300-015-0068-1.

SLIDE 45

References

Davis, A., Gardner, B.B. and Gardner, M.R. (1941). Deep South: A Social Anthropological Study of Caste and Class. Chicago: University Press.

Dempster, A.P. (1997). The direct use of likelihood in significance testing. Statistics and Computing 7, 247-252.

Everton, S.F. (2012). Disrupting Dark Networks. Cambridge: University Press.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit (with Discussion). Journal of the Royal Statistical Society B 64, 583-639.
