

SLIDE 1

Copula Mixture Model for Dependency-seeking Clustering

Melanie Rey, Volker Roth

Department of Mathematics and Computer Science, University of Basel, Switzerland

December 16, 2011

SLIDE 2

Dependency-seeking Clustering

◮ Clustering co-occurring samples from different data sources, called views.

◮ The aim is to cluster the points according to their between-view dependence structure.

SLIDE 3

Dependency-seeking Clustering

[Figure: scatter plots of view 1 (X1 vs. X2) and view 2 (Y1 vs. Y2); pooled over both clusters, (1) + (2): cor(X2, Y2) = 0.45.]

SLIDE 4

Dependency-seeking Clustering

[Figure: scatter plots of view 1 (X1 vs. X2) and view 2 (Y1 vs. Y2). Pooled: (1) + (2): cor(X2, Y2) = 0.45; within the clusters: (1): cor(X2, Y2) = 0.8, (2): cor(X1, Y1) = 0.45.]

SLIDE 5

Probabilistic CCA

◮ The probabilistic interpretation of CCA [Bach, 2005]:

\[ Z \sim \mathcal{N}_d(0, I_d), \qquad (X, Y) \mid Z \sim \mathcal{N}_{p+q}(WZ + \mu, \Psi), \]
where \(\Psi\) has the block-diagonal form \(\Psi = \begin{pmatrix} \Psi_x & 0 \\ 0 & \Psi_y \end{pmatrix}\).
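As a sketch, the generative model above can be simulated directly. The dimensions, the loading matrix W, and the noise blocks below are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: latent dimension d = 1, view dimensions p = q = 2.
d, p, q = 1, 2, 2
W = np.array([[1.0], [0.5], [-0.8], [0.6]])   # assumed loading matrix, (p+q) x d
mu = np.zeros(p + q)

# Block-diagonal noise covariance: no cross-view noise terms.
Psi_x = 0.5 * np.eye(p)
Psi_y = 0.5 * np.eye(q)
Psi = np.block([[Psi_x, np.zeros((p, q))],
                [np.zeros((q, p)), Psi_y]])

n = 2000
Z = rng.normal(size=(n, d))                              # Z ~ N_d(0, I_d)
noise = rng.multivariate_normal(np.zeros(p + q), Psi, size=n)
XY = Z @ W.T + mu + noise                                # (X, Y) | Z ~ N_{p+q}(WZ + mu, Psi)
X, Y = XY[:, :p], XY[:, p:]
```

All cross-view covariance comes from the shared latent Z: marginally Cov((X, Y)) = WWᵀ + Ψ, and the noise Ψ contributes only within each view.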

SLIDE 6

Dependency-seeking Clustering

◮ Probabilistic dependency-seeking clustering [Klami, 2006]:

\[ Z \sim \mathrm{Mult}(\theta), \qquad (X, Y) \mid Z \sim \mathcal{N}_{p+q}(\mu_Z, \Psi), \]
where \(\Psi\) has the block-diagonal form \(\Psi = \begin{pmatrix} \Psi_x & 0 \\ 0 & \Psi_y \end{pmatrix}\).

◮ \(\Psi\) block-diagonal → views independent conditioned on the cluster assignment → the cluster structure captures the between-view dependencies.
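A minimal simulation (with assumed weights θ, means µ_z, and within-view covariances, chosen purely for illustration) makes the mechanism concrete: given the cluster label the two views are independent, yet marginally they are dependent through the shared label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-cluster example with p = q = 2, so (X, Y) lives in R^4.
theta = np.array([0.5, 0.5])                      # mixing weights
mu = np.array([[-2.0, -2.0, -2.0, -2.0],          # cluster means mu_z
               [ 2.0,  2.0,  2.0,  2.0]])

# Block-diagonal Psi: no covariance between the views given the cluster.
Psi_x = np.array([[1.0, 0.8], [0.8, 1.0]])
Psi_y = np.array([[1.0, 0.5], [0.5, 1.0]])
Psi = np.block([[Psi_x, np.zeros((2, 2))],
                [np.zeros((2, 2)), Psi_y]])

n = 1000
z = rng.choice(len(theta), size=n, p=theta)       # Z ~ Mult(theta)
XY = np.array([rng.multivariate_normal(mu[k], Psi) for k in z])
X, Y = XY[:, :2], XY[:, 2:]

# Marginally, X1 and Y1 are strongly correlated although Psi has no
# cross-view block: the dependence is carried entirely by the clusters.
```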

SLIDE 7

Clustering of non-Gaussian data

[Figure: scatter plots of view 1 (X1 vs. X2) and view 2 (Y1 vs. Y2) under the Gaussian model and under the copula model, with the corresponding marginal densities.]

SLIDE 8

Meta-Gaussian distribution

◮ Specify dependence by a Gaussian copula with block-diagonal correlation matrix \(P\):
\[ C^G_P(u) = \Phi_P\big(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)\big). \]

◮ Margins are arbitrary continuous distributions:
\[ X^j \mid \theta = X^j \mid \theta_j \sim F^j_{X|\theta}, \quad j = 1, \ldots, p, \qquad Y^j \mid \theta = Y^j \mid \theta_j \sim F^j_{Y|\theta}, \quad j = 1, \ldots, q. \]

◮ Use Sklar's theorem to construct \(F_{\theta,P}\).
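The three-step construction via Sklar's theorem can be sketched by sampling: draw a Gaussian vector with correlation P, push it through Φ to get uniform margins (the Gaussian copula), then apply inverse marginal CDFs. The correlation blocks and the particular Gamma/Gaussian/Beta margins below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm, beta, gamma

rng = np.random.default_rng(2)

# Hypothetical within-cluster correlation matrix P = diag(P_x, P_y)
# (block-diagonal, as on the slide), with p = q = 2.
P_x = np.array([[1.0, 0.7], [0.7, 1.0]])
P_y = np.array([[1.0, 0.4], [0.4, 1.0]])
P = np.block([[P_x, np.zeros((2, 2))],
              [np.zeros((2, 2)), P_y]])

n = 5000
# 1) Gaussian vector with correlation P.
z = rng.multivariate_normal(np.zeros(4), P, size=n)
# 2) Probability integral transform: u has uniform margins and the
#    dependence structure of the Gaussian copula C^G_P.
u = norm.cdf(z)
# 3) Arbitrary continuous margins via inverse CDFs (Sklar's theorem);
#    these margin choices are placeholders for illustration.
x1 = gamma.ppf(u[:, 0], a=2.0)
x2 = norm.ppf(u[:, 1], loc=1.0)
y1 = beta.ppf(u[:, 2], a=2.0, b=5.0)
y2 = beta.ppf(u[:, 3], a=0.5, b=0.5)
```

Within a view the margins stay dependent through P, while the two views are independent given the cluster, since P has no cross-view block.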

SLIDE 9

Meta-Gaussian density

◮ Consider \(F\) with copula \(C\) and margins \(F^1, \ldots, F^d\). If \(F\) has a density, it can be expressed as
\[ f(x_1, \ldots, x_d) = c\big(F^1(x_1), \ldots, F^d(x_d)\big) \prod_{j=1}^{d} f^j(x_j), \]
where \(c(u_1, \ldots, u_d) = \frac{\partial^d C(u_1, \ldots, u_d)}{\partial u_1 \cdots \partial u_d}\) is the copula density of \(C\).

◮ The Gaussian copula density has a simple form, and \(f_{\theta,P}\) is
\[ f_{(X,Y)|\theta,P}(x, y) = |P|^{-\frac{1}{2}} \exp\Big( -\tfrac{1}{2}\, \tilde{x}^{\mathsf T} (P^{-1} - I)\, \tilde{x} \Big) \prod_{j=1}^{p+q} f^j(x_j), \qquad \text{where } \tilde{x}_j = \Phi^{-1}\big(F^j(x_j)\big). \]
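The density formula above translates directly into code; representing the margins as a list of frozen scipy distributions is an implementation convenience assumed here, not notation from the talk:

```python
import numpy as np
from scipy.stats import norm, beta

def meta_gaussian_logpdf(x, P, margins):
    """Log-density of the meta-Gaussian distribution at a point x (shape (d,)).

    `margins` is a list of frozen scipy.stats distributions, one per
    coordinate (an assumed interface, for illustration).
    """
    u = np.array([m.cdf(xj) for m, xj in zip(margins, x)])
    x_tilde = norm.ppf(u)                                # normal scores
    _, logdet = np.linalg.slogdet(P)
    quad = x_tilde @ (np.linalg.inv(P) - np.eye(len(x))) @ x_tilde
    log_margins = sum(m.logpdf(xj) for m, xj in zip(margins, x))
    return -0.5 * logdet - 0.5 * quad + log_margins

# With standard normal margins the formula collapses to the N(0, P)
# log-density; with P = I it is just the product of the margins.
P_demo = np.array([[1.0, 0.5], [0.5, 1.0]])
val = meta_gaussian_logpdf(np.array([0.3, -0.7]), P_demo, [norm(), norm()])
```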

SLIDE 10

Mixture of Copula Model

The joint density of \(X\) and \(Y\) is a Dirichlet process mixture:
\[ f_{(X,Y)}(x, y) = \iint f_{(X,Y)|\theta,P}(x, y)\, d\mu_{\theta,P}\, d\mu_G(\lambda, G_0). \]
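One way to make the DP mixture concrete is a truncated stick-breaking draw of the mixture weights; the concentration λ, the truncation level, and the scalar stand-in for the base measure below are placeholders for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Truncated stick-breaking sketch of a draw G ~ DP(lambda, G0).
lam, K = 1.0, 50
v = rng.beta(1.0, lam, size=K)                              # stick proportions
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))   # mixture weights
atoms = rng.normal(size=K)                                  # stand-in draws from G0

# In the model each atom would be a cluster parameter pair (theta_k, P_k),
# so the mixture density is sum_k w_k * f_{theta_k, P_k}(x, y).
```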

SLIDE 11

The priors

◮ Assume a priori independence of \(\theta\) and \(P\) → specify the priors separately.

◮ Specify prior distributions for \(P_x\) and \(P_y\), where \(P = \begin{pmatrix} P_x & 0 \\ 0 & P_y \end{pmatrix}\), assuming a priori independence.

◮ For \(P_x\) and \(P_y\) we choose the marginally uniform prior [Barnard, 2000]:
\[ f(R; d+1) \propto |R|^{\frac{d(d-1)}{2} - 1} \prod_{i=1}^{d} |R_{ii}|^{-\frac{d+1}{2}}. \]
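For concreteness, the prior above can be evaluated numerically. Note that reading \(R_{ii}\) as the i-th principal submatrix of R (row and column i deleted) is our interpretation of the notation in [Barnard, 2000]:

```python
import numpy as np

def marginally_uniform_logprior(R):
    """Log of the marginally uniform prior of [Barnard, 2000], up to an
    additive constant. R_ii is taken to be the principal submatrix of R
    with row and column i deleted (a notation assumption).
    """
    d = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    log_sub = 0.0
    for i in range(d):
        keep = [j for j in range(d) if j != i]
        _, ld = np.linalg.slogdet(R[np.ix_(keep, keep)])
        log_sub += ld
    return (d * (d - 1) / 2 - 1) * logdet - (d + 1) / 2 * log_sub

# Sanity check: for d = 2 the exponent on |R| is zero and the 1x1
# submatrices have log-determinant zero, so the prior is flat in the
# single correlation -- it is marginally uniform, as the name promises.
```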

SLIDE 12

Inference

◮ MCMC algorithm for the DP with a non-conjugate prior [Neal, 1998].

◮ Simplifies when using data augmentation: introduce the normal scores \((\tilde{X}, \tilde{Y})\),
\[ \tilde{X}^j = \Phi^{-1}\big(F^j(X^j)\big), \qquad \tilde{Y}^j = \Phi^{-1}\big(F^j(Y^j)\big). \]
We then have \((\tilde{X}, \tilde{Y}) \sim \mathcal{N}_{p+q}(0, P)\).
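In practice the marginal CDFs are unknown; one standard plug-in (assumed here, not prescribed by the slide) is the rescaled empirical CDF, which turns the augmentation step into a rank transform:

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(4)

def normal_scores(col):
    """Normal scores Phi^{-1}(F(x)), with F the rescaled empirical CDF.

    Dividing ranks by (n + 1) keeps the arguments strictly inside (0, 1);
    the empirical-CDF plug-in is an assumed choice for illustration.
    """
    n = len(col)
    u = rankdata(col) / (n + 1)
    return norm.ppf(u)

x = rng.gamma(shape=2.0, size=1000)   # a skewed, non-Gaussian margin
x_tilde = normal_scores(x)            # approximately N(0, 1)
```

The transform is monotone, so it preserves the ranks of the data while giving each coordinate an (approximately) standard normal margin.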

SLIDE 13

Simulations

[Figure: view-2 scatter plots (Y1 vs. Y2) under the Gaussian model, the copula model, and the CCA mixture, each paired with a histogram of the first margin Y1 (Gaussian clusters vs. their sum for the Gaussian model; Beta clusters vs. their sum for the copula model).]

SLIDE 14

Real data experiments

Two data sets containing information about the regulation of heat shock in yeast, [Gasch, 2000], [Harbison, 2004].

◮ First view: gene expressions for yeast measured at 4 time points → Gaussian.

◮ Second view: probability scores of binding interactions for 8 different regulators → Beta.

SLIDE 15

Real data experiments

[Figure: histograms of margin Y6 in view 2 under the Gaussian mixture and under the copula mixture.]

SLIDE 16

Conclusion

◮ Dependency-seeking clustering as an alternative to CCA for multi-view analysis.

◮ The Gaussian model produces misleading results when the Gaussian assumption is violated.

◮ Flexibility is increased using a copula mixture model.

Thank you!

melanie.rey@unibas.ch
