Bayesian nonparametric models for bipartite graphs, François Caron

SLIDE 1

Bayesian nonparametric models for bipartite graphs

François Caron

Department of Statistics, Oxford

Statistics Colloquium, Harvard University

November 11, 2013

  • F. Caron

1 / 27

SLIDE 2

Bipartite networks

[Figure: bipartite graph with reader/customer nodes A1, A2 and book nodes B1–B4]

◮ Scientists authoring papers
◮ Readers reading books
◮ Internet users posting messages on forums
◮ Customers buying items
◮ Objects sharing a set of features

SLIDE 9

Book-crossing community network

5 000 readers, 36 000 books, 50 000 edges

SLIDE 10

Book-crossing community network

Degree distributions on log-log scale

[Figure: empirical degree distributions on log-log axes (x: Degree, y: Distribution) for (a) Readers and (b) Books]

SLIDE 11

Statistical network models

◮ Statistics literature
  ◮ Exponential random graphs, stochastic block models, Rasch models, etc.
  ◮ Do not capture power-law behavior
  ◮ Inference does not scale well with the number of nodes
◮ Physics literature
  ◮ Preferential attachment
  ◮ Lacks interpretable parameters; non-exchangeability

SLIDE 12

Bayesian nonparametrics

◮ Parameter of interest is infinite-dimensional
◮ Allows the complexity of the model to adapt to the data
  ◮ Dirichlet Process Mixtures: clustering/density estimation with unknown number of modes
◮ Attractive power-law properties
  ◮ Language modeling, image segmentation

[Teh, 2006; Sudderth and Jordan, 2008; Blunsom and Cohn, 2011]

SLIDE 13

BNP for networks

◮ Models with some latent structure (e.g. infinite relational model)
  ◮ Number of nodes is fixed and dimension of the latent structure unknown
◮ Here: infinite number of nodes
◮ (Stable) Beta-Bernoulli/Indian Buffet Process
  ◮ Can capture power-law degree distributions for books
  ◮ Poisson degree distribution for readers

[Griffiths and Ghahramani, 2005, Teh and Görür, 2009]

SLIDE 14

Bipartite networks

Aims

◮ Bayesian nonparametric model for bipartite networks with a potentially infinite number of nodes of each type
◮ Each node is modelled using a positive rating parameter that represents its ability to connect to other nodes
◮ Captures power-law behavior
◮ Simple generative model for network growth
◮ Develop an efficient computational procedure for posterior simulation

SLIDE 15

Hierarchical model

◮ Represent a bipartite network by a collection of atomic measures $Z_i$, $i = 1, 2, \ldots$, such that
  $$Z_i = \sum_{j=1}^{\infty} z_{ij}\,\delta_{\theta_j}$$
  ◮ $z_{ij} = 1$ if reader $i$ has read book $j$, 0 otherwise
  ◮ $\{\theta_j\}$ is the set of books
◮ Each book $j$ is assigned a positive “popularity” parameter $w_j$
◮ Each reader $i$ is assigned a positive “interest in reading” parameter $\gamma_i$
◮ The probability that reader $i$ reads book $j$ is
  $$P(z_{ij} = 1 \mid \gamma_i, w_j) = 1 - \exp(-w_j \gamma_i)$$
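As a minimal sketch of the edge model above, the following Python draws a small reader-by-book matrix with independent Bernoulli edges. The function names and the numerical values of the popularity and interest parameters are hypothetical, chosen only for illustration, and the infinite set of books is truncated to four.

```python
import math
import random


def edge_probability(w_j: float, gamma_i: float) -> float:
    """P(z_ij = 1 | gamma_i, w_j) = 1 - exp(-w_j * gamma_i)."""
    return 1.0 - math.exp(-w_j * gamma_i)


def sample_adjacency(gammas, weights, rng):
    """Draw a binary reader-by-book matrix with independent Bernoulli edges."""
    return [
        [1 if rng.random() < edge_probability(w, g) else 0 for w in weights]
        for g in gammas
    ]


# Hypothetical values, for illustration only.
gammas = [2.0, 0.5]                # "interest in reading" of two readers
weights = [1.5, 0.3, 0.05, 0.01]   # "popularity" of four books
rng = random.Random(0)
Z = sample_adjacency(gammas, weights, rng)
```

Under this parameterization a reader with larger $\gamma_i$ and a book with larger $w_j$ are both more likely to be linked, and the probability stays in $[0, 1)$ for any positive parameters.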

SLIDE 19

Data Augmentation

◮ Latent variable formulation
  ◮ Latent scores $s_{ij} \sim \mathrm{Gumbel}(\log(w_j), 1)$
  ◮ All books with a score above $-\log(\gamma_i)$ are retained, others are discarded

[Figure: book popularities (x: popularity, y: books) and latent scores (x: score, y: books) with the threshold $-\log(\gamma_i)$]
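The latent-score formulation reproduces the edge probability because a Gumbel$(\log w_j, 1)$ variable exceeds $-\log \gamma_i$ with probability $1 - \exp(-w_j \gamma_i)$. The sketch below checks this by Monte Carlo. The helper names, the parameter values, and the number of draws are assumptions made for the illustration.

```python
import math
import random


def sample_gumbel(location: float, rng: random.Random) -> float:
    """Draw from Gumbel(location, 1) by inverting its CDF exp(-exp(-(s - location)))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return location - math.log(-math.log(u))


def edge_frequency(w_j: float, gamma_i: float, n_draws: int, rng: random.Random) -> float:
    """Fraction of latent scores s_ij that exceed the threshold -log(gamma_i)."""
    threshold = -math.log(gamma_i)
    hits = sum(sample_gumbel(math.log(w_j), rng) > threshold for _ in range(n_draws))
    return hits / n_draws


# Hypothetical parameter values, for illustration only.
w_j, gamma_i = 0.5, 2.0
rng = random.Random(1)
empirical = edge_frequency(w_j, gamma_i, 200_000, rng)
exact = 1.0 - math.exp(-w_j * gamma_i)
```

With enough draws, `empirical` should agree with `exact` up to Monte Carlo error.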

SLIDE 20

Model for the book popularity parameters

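◮ Random atomic measure
  $$G = \sum_{j=1}^{\infty} w_j\,\delta_{\theta_j}$$
◮ Construction: two-dimensional Poisson process $N = \{w_j, \theta_j\}_{j=1,\ldots}$
◮ Completely Random Measure $G \sim \mathrm{CRM}(\lambda, h)$ characterized by a Lévy measure $\lambda(w)h(\theta)\,dw\,d\theta$
  $$\int_0^{\infty} (1 - e^{-w})\,\lambda(w)\,dw < \infty \;\Rightarrow\; \text{finite total } \sum_{j=1}^{\infty} z_{ij}$$

[Kingman, 1967, Regazzini et al., 2003, Lijoi and Prünster, 2010]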

SLIDE 21

Posterior characterization

◮ Observed bipartite network $Z_1, \ldots, Z_n$
◮ $n$ readers and $K$ books with degree at least one
◮ Cannot derive directly the conditional of $G$ given $Z_1, \ldots, Z_n$ nor the predictive of $Z_{n+1}$ given $Z_1, \ldots, Z_n$
◮ Let
  $$X_i = \sum_{j=1}^{\infty} x_{ij}\,\delta_{\theta_j}$$
  where $x_{ij} = \max(0, s_{ij} + \log(\gamma_i)) \ge 0$ are latent positive scores.

[Figure: latent scores (x: score, y: books) with the threshold $-\log(\gamma_i)$, and the corresponding censored scores (x: censored score, y: books)]

SLIDE 22

Posterior Characterization

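The conditional distribution of $G$ given $X_1, \ldots, X_n$ can be expressed as
$$G = G^* + \sum_{j=1}^{K} w_j\,\delta_{\theta_j}$$
where $G^*$ and $(w_j)$ are mutually independent with
$$G^* \sim \mathrm{CRM}(\lambda^*, h), \qquad \lambda^*(w) = \lambda(w) \exp\Big(-w \sum_{i=1}^{n} \gamma_i\Big)$$
and the masses are
$$P(w_j \mid \text{other}) \propto \lambda(w_j)\, w_j^{m_j} \exp\Big(-w_j \sum_{i=1}^{n} \gamma_i e^{-x_{ij}}\Big)$$
where $m_j$ is the number of readers of book $j$.

Characterization related to that for normalized random measures

[Prünster, 2002, James, 2002, James et al., 2009]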
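As a worked case, assume the generalized gamma Lévy measure used later in the talk for the prior draws, $\lambda(w) = \frac{\alpha}{\Gamma(1-\sigma)} w^{-\sigma-1} e^{-\tau w}$ with $0 \le \sigma < 1$, and read $m_j \ge 1$ as the number of readers of book $j$. Substituting into the conditional for the masses and dropping constants gives a gamma density, written here in the shape/rate convention:

```latex
\begin{aligned}
P(w_j \mid \text{other})
  &\propto w_j^{-\sigma-1} e^{-\tau w_j}\; w_j^{m_j}
     \exp\Big(-w_j \sum_{i=1}^{n} \gamma_i e^{-x_{ij}}\Big) \\
  &= w_j^{(m_j - \sigma) - 1}
     \exp\Big(-w_j \Big(\tau + \sum_{i=1}^{n} \gamma_i e^{-x_{ij}}\Big)\Big), \\
w_j \mid \text{other}
  &\sim \mathrm{Gamma}\Big(m_j - \sigma,\; \tau + \sum_{i=1}^{n} \gamma_i e^{-x_{ij}}\Big).
\end{aligned}
```

The shape $m_j - \sigma$ is positive because every observed book has at least one reader and $\sigma < 1$.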

SLIDE 33

Generative Process for network growth

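Predictive distribution of $Z_{n+1}$ given the latent process $X_1, \ldots, X_n$

[Figure: network growth with readers A1, A2, A3 arriving in turn; reader 1 introduces books B1–B3, reader 2 adds B4 and B5, reader 3 adds B6 and B7; numeric labels per reader: 18 4 14 ... (reader 1), 12 8 13 4 ... (reader 2), 16 10 14 9 6 ... (reader 3)]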

SLIDE 34

Prior Draws

Generalized Gamma process with
$$\lambda(w) = \frac{\alpha}{\Gamma(1-\sigma)}\, w^{-\sigma-1} e^{-\tau w}, \qquad \tau = 1,\ \gamma_i = 2.$$

[Figure: prior draws of the reader-by-book matrix (axes: Books, Readers) for (c) α = 1, σ = 0; (d) α = 5, σ = 0; (e) α = 10, σ = 0; (f) α = 2, σ = 0.1; (g) α = 2, σ = 0.5; (h) α = 2, σ = 0.9]

[Brix, 1999, Lijoi et al., 2007]
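To get a feel for how $\alpha$ and $\sigma$ control the size of such draws, the sketch below evaluates the Laplace exponent $\psi(t) = \int_0^\infty (1 - e^{-tw})\lambda(w)\,dw$ of the generalized gamma process, which has the standard closed form $\frac{\alpha}{\sigma}\big((t+\tau)^\sigma - \tau^\sigma\big)$ for $\sigma > 0$ and $\alpha \log(1 + t/\tau)$ for $\sigma = 0$. The sketch assumes that $h$ is a probability density and that all readers share the same $\gamma$. The number of books with at least one reader among $n$ readers is then Poisson with mean $\psi(n\gamma)$, by thinning of the Poisson process. The function names are hypothetical.

```python
import math


def ggp_laplace_exponent(t: float, alpha: float, sigma: float, tau: float) -> float:
    """psi(t) = integral of (1 - exp(-t w)) * lambda(w) dw for the GGP Levy measure."""
    if sigma == 0.0:
        return alpha * math.log1p(t / tau)  # gamma process case (requires tau > 0)
    return (alpha / sigma) * ((t + tau) ** sigma - tau ** sigma)


def expected_books(n_readers: int, gamma: float, alpha: float, sigma: float, tau: float) -> float:
    """Expected number of books with at least one reader when all gamma_i = gamma."""
    return ggp_laplace_exponent(n_readers * gamma, alpha, sigma, tau)


# Settings matching the panels above: tau = 1, gamma_i = 2, 30 readers (assumed from the axes).
for alpha, sigma in [(1, 0.0), (5, 0.0), (10, 0.0), (2, 0.1), (2, 0.5), (2, 0.9)]:
    print(alpha, sigma, round(expected_books(30, 2.0, alpha, sigma, 1.0), 1))
```

For $\sigma > 0$ the expected count grows like $n^\sigma$ in the number of readers, whereas for $\sigma = 0$ it grows only logarithmically.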

SLIDE 35

Properties of the model

◮ Power-law behavior for the generalized gamma process with $\sigma > 0$
  ◮ The total number of books read by $n$ readers is $O(n^{\sigma})$
  ◮ Asymptotically, the proportion of books read by $m$ readers is $O(m^{-1-\sigma})$
◮ (Stable) Beta-Bernoulli/Indian Buffet process as a special case when
  $$\lambda(w) = \frac{\alpha\,\Gamma(1+c)}{\Gamma(1-\sigma)\,\Gamma(\sigma+c)}\,\gamma\,(1 - e^{-\gamma w})^{-\sigma-1}\, e^{-\gamma w (c+\sigma)}$$

SLIDE 39

Bayesian Inference via Gibbs Sampling

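[Figure: bipartite graph with readers A1, A2 and books B1–B4, annotated with popularity parameters $w_1, \ldots, w_4$, the unobserved mass $w_*$, and latent scores $x_{11}, x_{12}, x_{13}, x_{23}, x_{24}$ on the edges]

◮ Popularity parameters $w_j$ of observed books.
◮ Sum $w_*$ of popularity parameters of unobserved books.
◮ Latent scores $x_{ij}$ associated to observed edges.
◮ Posterior distribution $P(\{w_j\}, w_*, \{x_{ij}\} \mid Z_1, \ldots, Z_n)$

Gibbs sampler for the GGP
  $x_{ij} \mid \text{rest} \sim$ Truncated Gumbel
  $w_j \mid \text{rest} \sim$ Gamma
  $w_* \mid \text{rest} \sim$ Exponentially tilted stable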

[Devroye, 2009]
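The first two Gibbs updates can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the sampler used in the talk: $\gamma_i$, $\sigma$ and $\tau$ are held fixed, the $w_*$ update is omitted because it needs an exponentially tilted stable sampler, and all function names are hypothetical. The truncated Gumbel draw takes $x_{ij}$ to be Gumbel$(\log(w_j\gamma_i), 1)$ restricted to $(0, \infty)$. The gamma draw uses shape $m_j - \sigma$ and rate $\tau + \sum_i \gamma_i e^{-x_{ij}}$, which is the conditional for the masses combined with the generalized gamma Lévy measure.

```python
import math
import random


def sample_truncated_gumbel(w_j: float, gamma_i: float, rng: random.Random) -> float:
    """Draw x_ij from Gumbel(log(w_j * gamma_i), 1) truncated to (0, inf)."""
    rate = w_j * gamma_i
    v = rng.random()
    while v == 0.0:  # keep e strictly positive below
        v = rng.random()
    # e is an Exp(1) variable truncated to (0, rate); then x = log(rate / e) > 0.
    e = -math.log1p(v * math.expm1(-rate))
    return math.log(rate) - math.log(e)


def gibbs_sweep(Z, gammas, weights, sigma, tau, rng):
    """One sweep over x_ij (observed edges) and w_j (observed books); w_* is not updated."""
    n, K = len(Z), len(weights)
    X = [[0.0] * K for _ in range(n)]  # x_ij = 0 wherever there is no edge
    for i in range(n):
        for j in range(K):
            if Z[i][j] == 1:
                X[i][j] = sample_truncated_gumbel(weights[j], gammas[i], rng)
    new_weights = []
    for j in range(K):
        m_j = sum(Z[i][j] for i in range(n))  # degree of book j, at least 1
        rate = tau + sum(gammas[i] * math.exp(-X[i][j]) for i in range(n))
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        new_weights.append(rng.gammavariate(m_j - sigma, 1.0 / rate))
    return X, new_weights


# Toy network with edges x11, x12, x13, x23, x24; parameter values are hypothetical.
Z = [[1, 1, 1, 0], [0, 0, 1, 1]]
gammas = [2.0, 2.0]
weights = [1.0, 1.0, 1.0, 1.0]
rng = random.Random(0)
X, weights = gibbs_sweep(Z, gammas, weights, sigma=0.5, tau=1.0, rng=rng)
```

A full sampler would iterate such sweeps and also update $w_*$, the $\gamma_i$ and the hyperparameters, none of which are covered by this sketch.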

SLIDE 40

Model for the “interest in reading” parameters

◮ Still Poisson degree distribution for readers
◮ Parametric: $\gamma_i$ are independent and identically distributed from a gamma distribution
◮ Nonparametric: $\gamma_i$ are the points of a random atomic measure $\Gamma$
◮ Gibbs sampler can be derived in the same way as for books

SLIDE 41

Application

◮ Evaluate the fit of three models
  ◮ Stable Indian Buffet Process
  ◮ Proposed model where $G$ follows a generalized gamma process of unknown parameters $(\alpha_w, \sigma_w, \tau_w)$
    ◮ with shared and unknown $\gamma_i = \gamma$
    ◮ with nonparametric prior where $\Gamma$ follows a generalized gamma process of unknown parameters $(\alpha_\gamma, \tau_\gamma, \sigma_\gamma)$

[Teh and Görür, 2009, Griffiths and Ghahramani, 2005]

SLIDE 42

Application: IMDB Movie Actor network

280 000 movies, 178 000 actors, 341 000 edges

[Figure: degree distributions on log-log axes (x: Degree; legend: Model, Data); panels (a) S-IBP, (b) GS, (c) GGP, (d) S-IBP, (e) GS, (f) GGP]

Figure: Degree distributions for movies (a-c) and actors (d-f) for the IMDB movie-actor dataset with three different models. Data are represented by red plus signs and samples from the model by blue crosses.

SLIDE 43

Application: Book-crossing community network

5 000 readers, 36 000 books, 50 000 edges

[Figure: degree distributions on log-log axes (x: Degree; legend: Model, Data); panels (a) S-IBP, (b) GS, (c) GGP, (d) S-IBP, (e) GS, (f) GGP]

Figure: Degree distributions for readers (a-c) and books (d-f) for the book-crossing dataset with three different models. Data are represented by red plus signs and samples from the model by blue crosses.

SLIDE 44

Application: Book-crossing community network

5 000 readers, 36 000 books, 50 000 edges

[Figure: posterior histograms of (a) $\sigma_\gamma$ (Readers), x-axis from 0.51 to 0.59, and (b) $\sigma_w$ (Books), x-axis from 0.755 to 0.795]

Figure: Posterior distributions of the power-law parameters $\sigma_\gamma$ and $\sigma_w$

SLIDE 45

Application

◮ Log-likelihood on test dataset

Dataset         S-IBP          SG            GGP
Board           9.82 (29.8)    8.3 (30.8)    −68.6 (31.9)
Forum           −6.7e3         −6.7e3        −5.6e3
Books           83.1           214           4.4e4
Citations       −3.7e4         −3.7e4        −3.4e4
Movielens100k   −6.7e4         −6.7e4        −5.5e4
IMDB            −1.5e5         −1.5e5        −1.1e5

SLIDE 46

Summary

◮ Bayesian nonparametric model for bipartite networks with a potentially infinite number of nodes
◮ Captures power-law behavior
◮ Simple generative model for network growth
◮ Simple computational procedure for posterior simulation
◮ Displays a good fit on a variety of social networks

SLIDE 47

Future work

◮ BNP model for general (non-bipartite) networks
◮ BNP (dynamic) recommender systems
◮ Latent factorial models and dictionary learning

SLIDE 50

Bibliography I

Brix, A. (1999). Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, 31(4):929–953.

Devroye, L. (2009). Random variate generation for exponentially and polynomially tilted stable distributions. ACM Transactions on Modeling and Computer Simulation (TOMACS), 19(4):18.

Griffiths, T. and Ghahramani, Z. (2005). Infinite latent feature models and the Indian buffet process. In NIPS.

James, L., Lijoi, A., and Prünster, I. (2009). Posterior analysis for normalized random measures with independent increments. Scandinavian Journal of Statistics, 36(1):76–97.

James, L. F. (2002). Poisson process partition calculus with applications to exchangeable models and Bayesian nonparametrics. arXiv preprint math/0205093.

Kingman, J. (1967). Completely random measures. Pacific Journal of Mathematics, 21(1):59–78.

SLIDE 51

Bibliography II

Lijoi, A., Mena, R. H., and Prünster, I. (2007). Controlling the reinforcement in Bayesian non-parametric mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):715–740.

Lijoi, A. and Prünster, I. (2010). Models beyond the Dirichlet process. In N. L. Hjort, C. Holmes, P. Müller, and S. G. Walker, editors, Bayesian Nonparametrics. Cambridge University Press.

Prünster, I. (2002). Random probability measures derived from increasing additive processes and their application to Bayesian statistics. PhD thesis, University of Pavia.

Regazzini, E., Lijoi, A., and Prünster, I. (2003). Distributional results for means of normalized random measures with independent increments. The Annals of Statistics, 31(2):560–585.

Teh, Y. and Görür, D. (2009). Indian buffet processes with power-law behavior. In Neural Information Processing Systems (NIPS’2009).
